Package org.apache.tika.parser.pdf
Class Summary

AccessChecker
    Checks whether a document allows extraction in general, or extraction for accessibility only.
OCRPageCounter
    Counts the number of pages on which OCR would have been run, or was run, depending on the settings.
PDFMarkedContent2XHTML
    Added in Tika 1.24 as an alpha version of a text extractor that builds the text from the marked-content tree and includes/normalizes some of the structural tags.
PDFParser
    PDF parser.
PDFParserConfig
    Config for PDFParser.
PDFParserConfig.OCRStrategyAuto
    Encapsulates the numbers used to control the OCR strategy when it is set to auto.
PDMetadataExtractor

Enum Summary

PDFParserConfig.IMAGE_STRATEGY
PDFParserConfig.OCR_RENDERING_STRATEGY
PDFParserConfig.OCR_STRATEGY
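
A minimal sketch of how PDFParser and PDFParserConfig fit together. It assumes the Tika PDF parser module (and its PDFBox dependency) is on the classpath. The class name PdfTextExample and the specific settings chosen (no OCR, no inline-image extraction) are illustrative assumptions, not Tika defaults. The config is placed in a ParseContext, which PDFParser consults at parse time; settings not set here keep their default values.

```java
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical example class; requires the Tika PDF parser on the classpath.
public class PdfTextExample {

    // Builds a ParseContext carrying a PDFParserConfig. PDFParser looks the
    // config up from the context when parse() is called.
    static ParseContext buildContext() {
        PDFParserConfig config = new PDFParserConfig();
        // Illustrative choice: use only the PDF's text layer, never run OCR.
        config.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.NO_OCR);
        // Illustrative choice: do not extract inline images as embedded resources.
        config.setExtractInlineImages(false);

        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, config);
        return context;
    }

    // Parses one PDF file and returns its body text.
    static String extractText(Path pdf) throws IOException, SAXException, TikaException {
        PDFParser parser = new PDFParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(pdf)) {
            parser.parse(in, handler, metadata, buildContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: PdfTextExample <file.pdf>");
            return;
        }
        System.out.println(extractText(Paths.get(args[0])));
    }
}
```

Passing the config through the ParseContext rather than a constructor argument lets one PDFParser instance be reused with different settings per call.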