org.apache.tika.parser.pdf (Apache Tika 3.0.0 API)

package org.apache.tika.parser.pdf

Related Packages

Package                                 Description
org.apache.tika.parser                  Tika parsers.
org.apache.tika.parser.pdf.image
org.apache.tika.parser.pdf.updates
org.apache.tika.parser.pdf.xmpschemas

Class                                   Description
AccessChecker                           Checks whether a document allows extraction in general, or only extraction for accessibility.
OCRPageCounter                          Counts the number of pages on which OCR would have been run, or was run, depending on the settings.
PDFMarkedContent2XHTML                  Added in Tika 1.24 as an alpha version of a text extractor that builds the text from the marked text tree and includes or normalizes some of the structural tags.
PDFParser                               PDF parser.
PDFParserConfig                         Configuration for PDFParser.
PDFParserConfig.IMAGE_STRATEGY
PDFParserConfig.OCR_RENDERING_STRATEGY
PDFParserConfig.OCR_STRATEGY
PDFParserConfig.OCRStrategyAuto         Encapsulates the numbers used to control the OCR strategy when it is set to auto.
PDFParserConfig.TikaImageType
PDMetadataExtractor
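The AccessChecker summary distinguishes two permission levels: a document may allow text extraction in general, or only extraction for accessibility. The minimal Java sketch below models that decision with plain booleans so both cases can be compared directly. It is a simplified illustration under stated assumptions, not Tika's implementation: the class ExtractionPermissionSketch, the nested DocumentPermissions type, and the allowExtractionForAccessibility flag are hypothetical names chosen here, and the real class's inputs and API are not reproduced.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of the decision that AccessChecker describes.
 * NOT Tika's implementation: every name in this class is illustrative.
 */
public final class ExtractionPermissionSketch {

    /** Hypothetical permission flags, as they might be read from a PDF's security settings. */
    public static final class DocumentPermissions {
        final boolean canExtractContent;
        final boolean canExtractForAccessibility;

        public DocumentPermissions(boolean canExtractContent, boolean canExtractForAccessibility) {
            this.canExtractContent = canExtractContent;
            this.canExtractForAccessibility = canExtractForAccessibility;
        }
    }

    // Caller's opt-in (assumption): accept documents that permit extraction
    // only for accessibility purposes.
    private final boolean allowExtractionForAccessibility;

    public ExtractionPermissionSketch(boolean allowExtractionForAccessibility) {
        this.allowExtractionForAccessibility = allowExtractionForAccessibility;
    }

    /** Returns true if text extraction should proceed for a document with these permissions. */
    public boolean isExtractionAllowed(DocumentPermissions permissions) {
        Objects.requireNonNull(permissions, "permissions");
        if (permissions.canExtractContent) {
            return true; // extraction is allowed generally
        }
        // General extraction is forbidden: proceed only if the caller opted in
        // and the document still grants extraction for accessibility.
        return allowExtractionForAccessibility && permissions.canExtractForAccessibility;
    }
}
```

With the opt-in set to true, a document flagged as accessibility-only is accepted; with it set to false, the same document is rejected, and a document that grants neither permission is always rejected. Keeping the opt-in on the caller's side means restricted documents are refused unless the application explicitly claims the accessibility use case.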
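The OCR entries (OCRPageCounter, PDFParserConfig.OCR_STRATEGY, and PDFParserConfig.OCRStrategyAuto) can be read together as a per-page decision plus a tally of the pages that decision selects. The sketch below illustrates that relationship under stated assumptions. Its local OcrStrategy enum reuses the constant names of PDFParserConfig.OCR_STRATEGY (AUTO, NO_OCR, OCR_ONLY, OCR_AND_TEXT_EXTRACTION), but the AUTO rule, the two thresholds, the PageStats input, and the class OcrDecisionSketch are hypothetical stand-ins for whatever numbers OCRStrategyAuto actually holds; Tika's real heuristic may differ.

```java
import java.util.List;

/**
 * Hypothetical sketch: a per-page OCR decision plus a page tally.
 * The enum constants mirror PDFParserConfig.OCR_STRATEGY by name only;
 * the AUTO rule and thresholds are illustrative assumptions, not Tika's heuristic.
 */
public final class OcrDecisionSketch {

    /** Local stand-in named after the constants of PDFParserConfig.OCR_STRATEGY. */
    public enum OcrStrategy { AUTO, NO_OCR, OCR_ONLY, OCR_AND_TEXT_EXTRACTION }

    /** Hypothetical per-page text statistics gathered before deciding on OCR. */
    public static final class PageStats {
        final int totalChars;
        final int unmappedUnicodeChars;

        public PageStats(int totalChars, int unmappedUnicodeChars) {
            this.totalChars = totalChars;
            this.unmappedUnicodeChars = unmappedUnicodeChars;
        }
    }

    // Assumed AUTO thresholds, playing the role that OCRStrategyAuto's numbers play.
    private final int minTotalCharsPerPage;
    private final int maxUnmappedCharsPerPage;

    public OcrDecisionSketch(int minTotalCharsPerPage, int maxUnmappedCharsPerPage) {
        this.minTotalCharsPerPage = minTotalCharsPerPage;
        this.maxUnmappedCharsPerPage = maxUnmappedCharsPerPage;
    }

    /** Decides whether OCR would run on a single page under the given strategy. */
    public boolean shouldOcr(OcrStrategy strategy, PageStats page) {
        switch (strategy) {
            case NO_OCR:
                return false;
            case OCR_ONLY:
            case OCR_AND_TEXT_EXTRACTION:
                return true; // in this sketch, these strategies OCR every page
            case AUTO:
                // OCR only when the extracted text layer looks missing or unreliable.
                return page.totalChars < minTotalCharsPerPage
                        || page.unmappedUnicodeChars > maxUnmappedCharsPerPage;
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    /** Counts the pages on which OCR would run, in the spirit of OCRPageCounter. */
    public int countOcrPages(OcrStrategy strategy, List<PageStats> pages) {
        int count = 0;
        for (PageStats page : pages) {
            if (shouldOcr(strategy, page)) {
                count++;
            }
        }
        return count;
    }
}
```

For example, with thresholds of 10 and 10, a page with 500 characters and no unmapped ones is skipped under AUTO, while a page with only 3 characters, or one with 50 unmapped characters, is sent to OCR, so countOcrPages reports 2 of 3 pages. Under NO_OCR the same pages yield 0, and under OCR_ONLY they yield 3.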