Package org.apache.tika.parser.pdf
Class Summary

Class                              Description
AccessChecker                      Checks whether or not a document allows extraction generally or extraction for accessibility only.
PDFMarkedContent2XHTML             This was added in Tika 1.24 as an alpha version of a text extractor that builds the text from the marked text tree and includes/normalizes some of the structural tags.
PDFParserConfig                    Configuration for PDFParser.
PDFParserConfig.OCRStrategyAuto    Encapsulates the numbers used to control the OCR strategy when it is set to auto.
PDFParser                          PDF parser.
PDMetadataExtractor

Enum Summary

Enum                                      Description
PDFParserConfig.IMAGE_STRATEGY
PDFParserConfig.OCR_RENDERING_STRATEGY
PDFParserConfig.OCR_STRATEGY
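The sketch below shows how PDFParser, PDFParserConfig and the PDFParserConfig.OCR_STRATEGY enum typically fit together. It assumes tika-core and the Tika PDF parser module are on the classpath, and it relies on PDFParser looking up a PDFParserConfig from the ParseContext passed to parse(). The class name PdfTextExample and the chosen settings (sort by position, no OCR) are purely illustrative, and the available setters may differ between Tika versions.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class; not part of Tika.
public class PdfTextExample {

    /** Builds a PDFParserConfig with a few illustrative settings. */
    static PDFParserConfig buildConfig() {
        PDFParserConfig config = new PDFParserConfig();
        // Sort text tokens by their x/y position before extracting text.
        config.setSortByPosition(true);
        // Do not run OCR; only the PDF's own text layer is extracted.
        config.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.NO_OCR);
        return config;
    }

    /** Extracts the plain text of a single PDF file. */
    static String extractText(Path pdf) throws Exception {
        Parser parser = new PDFParser();
        ParseContext context = new ParseContext();
        // PDFParser reads its per-call configuration from the ParseContext.
        context.set(PDFParserConfig.class, buildConfig());

        BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(pdf)) {
            parser.parse(in, handler, metadata, context);
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: PdfTextExample <file.pdf>");
            return;
        }
        System.out.println(extractText(Paths.get(args[0])));
    }
}
```

Passing the configuration through the ParseContext, rather than mutating a shared parser, keeps a single PDFParser instance reusable across calls that need different settings.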