All Classes (Apache Tika 2.9.1 API)

Class	Description
AbstractChunking	This class specifies the base class for file chunking
AbstractConsumersBuilder
AbstractConverter	Base class for Tika Metadata to XMP converter which provides some needed common functionality.
AbstractDBParser	Abstract class that handles iterating through tables within a database.
AbstractDWGParser
AbstractEmitter
AbstractEncodingDetectorParser	Abstract base class for parsers that use the AutoDetectReader and need to use the `EncodingDetector` configured by `TikaConfig`
AbstractExternalProcessParser	Abstract base class for parsers that call external processes.
AbstractFetcher
AbstractFSConsumer
AbstractImageParser
AbstractListManager
AbstractListManager.LevelTuple
AbstractListManager.ParagraphLevelCounter
AbstractMultipleParser	Abstract base class for parser wrappers which may / will process a given stream multiple times, merging the results of the various parsers used.
AbstractMultipleParser.MetadataPolicy	The various strategies for handling metadata emitted by multiple parsers.
AbstractOfficeParser	Intermediate layer to set `OfficeParserConfig` uniformly.
AbstractOOXMLExtractor	Base class for all Tika OOXML extractors.
AbstractParser	Abstract base class for new parsers.
AbstractProfiler
AbstractProfiler.EXCEPTION_TYPE
AbstractProfiler.PARSE_ERROR_TYPE	If information was gathered from the log file about a parse error
AbstractRecursiveParserWrapperHandler	This is a special handler to be used only with the `RecursiveParserWrapper`.
AbstractTranslator
AbstractXML2003Parser
AccessChecker	Checks whether or not a document allows extraction generally or extraction for accessibility only.
AccessPermissionException	Exception to be thrown when a document does not allow content extraction.
AccessPermissions	Until we can find a common standard, we'll use these options.
Activator
ActiveMimeParser	ActiveMime is a macro container format used in some mso files.
AdapterHelper
AdobeFontMetricParser	Parser for AFM Font Files
AdvancedTypeDetector
AgeRecogniser	Parser for extracting features from text.
AgeRecogniserConfig	Stores URL for AgePredictor
AlphaIdeographFilterFactory	Factory for filter that only allows tokens with characters that "isAlphabetic" or "isIdeographic" through.
AlternativePackaging
AmazonTranscribe	Amazon Transcribe implementation.
AnalyzerManager
AnnotationUtils	This class contains utilities for dealing with tika annotations
AppleSingleFileParser	Parser that strips the header off of AppleSingle and AppleDouble files.
AppParserFactoryBuilder
ArrayNumber	The class is used to represent the number of the array.
AsyncConfig
AsyncEmitter	Worker thread that takes EmitData off the queue, batches it and tries to emit it as a batch
AsyncProcessor	This is the main class for handling async requests.
AsyncRequest
AsyncResource
AsyncStatus
AsyncStatus.ASYNC_STATUS
AttributeDependantMetadataHandler	This adds a Metadata entry for a given node.
AttributeMatcher	Final evaluation state of a `.../@*` XPath expression.
AttributeMetadataHandler	SAX event handler that maps the contents of an XML attribute into a metadata field.
AudioFrame	An Audio Frame in an MP3 file.
AudioParser
AutoDetectParser
AutoDetectParserConfig	This config object can be used to tune how conservative we want to be when parsing data that is extremely compressible and resembles a ZIP bomb.
AutoDetectParserFactory	Simple class for AutoDetectParser
AutoDetectParserFactory	Factory for an AutoDetectParser
AutoDetectReader	An input stream reader that automatically detects the character encoding to be used for converting bytes to characters.
AutoDetectTransformer
AZBlobEmitter	Emit files to Azure blob storage.
AZBlobFetcher	Fetches files from Azure blob storage.
AZBlobPipesIterator
BasicContentHandlerFactory	Basic factory for creating common types of ContentHandlers
BasicContentHandlerFactory.HANDLER_TYPE	Common handler types for content.
BasicObject	Base object for FSSHTTPB.
BasicTikaFSConsumer	Basic FileResourceConsumer that reads files from an input directory and writes content to the output directory.
BasicTikaFSConsumersBuilder
BasicTokenCountStatsCalculator
BatchNoRestartError	FileResourceConsumers should throw this if something catastrophic has happened and the BatchProcess should shutdown and not be restarted.
BatchProcess	This is the main processor class for a single process.
BatchProcess.BATCH_CONSTANTS
BatchProcessBuilder	Builds a BatchProcessor from a combination of runtime arguments and the config file.
BatchProcessDriverCLI
BatchTopCommonTokenCounter	Utility class that runs TopCommonTokenCounter against a directory of table files (named {lang}_table.gz or Leipzig-like afr_...-sentences.txt) and outputs common tokens files for each input table file in the output directory.
BinaryItem
Bit	The class is used to read/set bit value for a byte array
BitConverter
BitReader	A class used to extract values across byte boundaries with arbitrary bit positions.
BitWriter
BodyContentHandler	Content handler decorator that only passes everything inside the XHTML <body/> tag to the underlying handler.
BoilerpipeContentHandler	Uses the boilerpipe library to automatically extract the main content from a web page.
BouncyCastleDigester	Digester that relies on BouncyCastle for MessageDigest implementations.
BoundedInputStream	Very slight modification of Commons' BoundedInputStream so that we can figure out if this hit the bound or not.
BPGParser	Parser for the Better Portable Graphics (BPG) File Format.
BPListDetector	Detector for BPList with utility functions for PList.
ByteDeleter
ByteFlipper
ByteInjector
BytesRefCalculator<T>	Interface for calculators that require a string
BytesRefCalculator.BytesRefCalcInstance<T>
ByteUtil
CachedTranslator	CachedTranslator.
CallablePipesIterator	This is a simple wrapper around `PipesIterator` that allows it to be called in its own thread.
CantFuzzException
CaptionObject	A model for caption objects from graphics and texts; it typically includes a human-readable sentence, the language of the sentence, and a confidence score.
CaptureGroupMetadataFilter	This filter runs a regex against the first value in the "sourceField".
Cell	Cell of content.
CellDecorator	Cell decorator.
CellID
CellIDArray
CellManifestCurrentRevision
CellManifestDataElementData	Cell manifest data element
CharsetDetector	`CharsetDetector` provides a facility for detecting the charset or encoding of character data in an unknown format.
CharsetMatch	This class represents a charset that has been identified by a CharsetDetector as a possible encoding for a set of input data.
CharsetUtils
ChildMatcher	Intermediate evaluation state of a `.../*...` XPath expression.
ChmAccessor<T>	Defines an accessor interface
ChmAssert	Contains chm extractor assertions
ChmBlockInfo	A container that contains chm block information such as: i.
ChmCommons
ChmCommons.EntryType	Represents entry types: uncompressed, compressed
ChmCommons.IntelState	Represents intel file states during decompression
ChmCommons.LzxState	Represents lzx states: started decoding, not started decoding
ChmConstants
ChmDirectoryListingSet	Holds chm listing entries
ChmExtractor	Extracts text from chm file.
ChmItsfHeader	The Header 0000: char[4] 'ITSF' 0004: DWORD 3 (Version number) 0008: DWORD Total header length, including header section table and following data.
ChmItspHeader	Directory header The directory starts with a header; its format is as follows: 0000: char[4] 'ITSP' 0004: DWORD Version number 1 0008: DWORD Length of the directory header 000C: DWORD $0a (unknown) 0010: DWORD $1000 Directory chunk size 0014: DWORD "Density" of quickref section, usually 2 0018: DWORD Depth of the index tree - 1 there is no index, 2 if there is one level of PMGI chunks 001C: DWORD Chunk number of root index chunk, -1 if there is none (though at least one file has 0 despite there being no index chunk, probably a bug) 0020: DWORD Chunk number of first PMGL (listing) chunk 0024: DWORD Chunk number of last PMGL (listing) chunk 0028: DWORD -1 (unknown) 002C: DWORD Number of directory chunks (total) 0030: DWORD Windows language ID 0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC} 0044: DWORD $54 (This is the length again) 0048: DWORD -1 (unknown) 004C: DWORD -1 (unknown) 0050: DWORD -1 (unknown)
ChmLzxBlock	Decompresses a chm block.
ChmLzxcControlData	::DataSpace/Storage//ControlData This file contains $20 bytes of information on the compression.
ChmLzxcResetTable	LZXC reset table For ensuring a decompression.
ChmLzxState
ChmParser
ChmParsingException
ChmPmgiHeader	Description Note: not always exists An index chunk has the following format: 0000: char[4] 'PMGI' 0004: DWORD Length of quickref/free area at end of directory chunk 0008: Directory index entries (to quickref/free area) The quickref area in an PMGI is the same as in an PMGL The format of a directory index entry is as follows: BYTE: length of name BYTEs: name (UTF-8 encoded) ENCINT: directory listing chunk which starts with name Encoded Integers aka ENCINT An ENCINT is a variable-length integer.
ChmPmglHeader	Description There are two types of directory chunks -- index chunks, and listing chunks.
ChmSection
ChmWrapper
ChunkingFactory	This class is used to create instance of AbstractChunking.
ChunkingMethod
CJKBigramAwareLengthFilterFactory	Creates a very narrowly focused TokenFilter that limits tokens based on length _unless_ they've been identified as <DOUBLE> or <SINGLE> by the CJKBigramFilter.
ClassLoaderUtil
ClassParser	Parser for Java .class files.
CleanPhoneText	Class to help de-obfuscate phone numbers in text.
ClearByMimeMetadataFilter	This class clears the entire metadata object if the mime matches the mime filter.
ClimateForcast	Met keys from NCAR CCSM files in the Climate Forecast Convention.
ColInfo
Cols
CommandLineParserBuilder	Reads configurable options from a config file and returns org.apache.commons.cli.Options object to be used in commandline parser.
CommonsDigester	Implementation of `DigestingParser.Digester` that relies on commons.codec.digest.DigestUtils to calculate digest hashes.
CommonsDigester.DigestAlgorithm
CommonsDigesterFactory	Simple factory for `CommonsDigester` with default markLimit = 1000000 and md5 digester.
CommonTokenCountManager
CommonTokenOverlapCounter
CommonTokenResult
CommonTokens
CommonTokensBhattacharyya
CommonTokensCosine
CommonTokensHellinger
CommonTokensKLDivergence
CommonTokensKLDNormed
Compact64bitInt	A 9-byte encoding of values in the range 0x0002000000000000 through 0xFFFFFFFFFFFFFFFF
CompactID	This class is used to represent the CompactID structure.
CompareUtils
CompositeDetector	Content type detector that combines multiple different detection mechanisms.
CompositeDigester
CompositeEncodingDetector
CompositeExternalParser	A Composite Parser that wraps up all the available External Parsers, and provides an easy way to access them.
CompositeMatcher	Composite XPath evaluation state.
CompositeMetadataFilter
CompositeParseContextConfig
CompositeParser	Composite parser that delegates parsing tasks to a component parser based on the declared content type of the incoming document.
CompositePipesReporter
CompositeRenderer
CompositeTagHandler	Takes an array of `ID3Tags` in preference order, and when asked for a given tag, will return it from the first `ID3Tags` that has it.
CompositeTextStatsCalculator
CompressorConstants
CompressorParser	Parser for various compression formats.
CompressorParserOptions	Interface for setting options for the `CompressorParser` by passing via the `ParseContext`.
ConcurrentUtils	Utility Class for Concurrency in Tika
ConfigBase
ConfigurableThreadPoolExecutor	Allows Thread Pool to be Configurable.
ConsumersManager	Simple interface around a collection of consumers that allows for initializing and shutting shared resources (e.g.
ContainerExtractor	Tika container extractor interface.
ContentHandlerDecorator	Decorator base class for the `ContentHandler` interface.
ContentHandlerDecoratorFactory
ContentHandlerExample	Examples of using different Content Handlers to get different parts of the file's contents
ContentHandlerFactory	Interface to allow easier injection of code for getting a new ContentHandler
ContentLengthCalculator
ContentTagParser
ContentTags
ContrastStatistics
CoreNLPNERecogniser	This class offers an implementation of `NERecogniser` based on CRF classifiers from Stanford CoreNLP.
CorruptedFileException	This exception should be thrown when the parse absolutely, positively has to stop.
CreativeCommons	A collection of Creative Commons properties names.
CryptoParser	Decrypts the incoming document stream and delegates further parsing to another parser instance.
CSVMessageBodyWriter
CSVParams
CSVPipesIterator	Iterates through a UTF-8 CSV file.
CSVResult
CTAKESAnnotationProperty	This enumeration includes the properties that an `IdentifiedAnnotation` object can provide.
CTAKESConfig	Configuration for `CTAKESContentHandler`.
CTAKESContentHandler	Class used to extract biomedical information while parsing.
CTAKESParser	CTAKESParser decorates a `Parser` and leverages on `CTAKESContentHandler` to extract biomedical information from clinical text using Apache cTAKES.
CTAKESSerializer	Enumeration for types of cTAKES (UIMA) CAS serializer supported by cTAKES.
CTAKESUtils	This class provides methods to extract biomedical information from plain text using `CTAKESContentHandler` that relies on Apache cTAKES.
CustomMimeInfo
Database
DataElement
DataElementData	Base class of data element
DataElementHash	Specifies a data element hash stream object
DataElementPackage
DataElementParseErrorException
DataElementType	The enumeration of the data element type
DataElementUtils
DataHashObject
DataNodeObjectData	Data Node Object data
DataSizeObject	Data Size Object
DataURIScheme
DataURISchemeParseException
DataURISchemeUtil	Not thread safe.
DateNormalizingMetadataFilter	Some dates in some file formats do not have a timezone.
DateUtils	Date related utility methods and constants
DBBuffer
DBConsumersManager
DBFParser	This is a Tika wrapper around the DBFReader.
DBWriter	This is still in its early stages.
DcXMLParser	Dublin Core metadata parser
DefaultContentHandlerFactoryBuilder	Builds BasicContentHandler with type defined by attribute "basicHandlerType" with possible values: xml, html, text, body, ignore.
DefaultDetector	A composite detector based on all the `Detector` implementations available through the `service provider mechanism`.
DefaultEmbeddedStreamTranslator	Loads EmbeddedStreamTranslators via service loading.
DefaultEncodingDetector	A composite encoding detector based on all the `EncodingDetector` implementations available through the `service provider mechanism`.
DefaultHtmlMapper	The default HTML mapping rules in Tika.
DefaultInputStreamFactory	Passthrough -- returns InputStream as is
DefaultMetadataFilter
DefaultParser	A composite parser based on all the `Parser` implementations available through the `service provider mechanism`.
DefaultProbDetector	A version of `DefaultDetector` for probabilistic mime detectors, which use statistical techniques to blend the results of differing underlying detectors when attempting to detect the type of a given file.
DefaultTranslator	A translator which picks the first available `Translator` implementations available through the `service provider mechanism`.
DefaultZipContainerDetector
DelegatingParser	Base class for parser implementations that want to delegate parts of the task of parsing an input document to another parser.
DeprecatedStreamingZipContainerDetector
DeprecatedZipContainerDetector	A detector that works on Zip documents and tries to figure out basic types -- epub, jar, ear, war, kmz and StarOffice
DescribeMetadata	Print the supported Tika Metadata models and their fields.
Detector	Content type detector.
DetectorResource
DGN8Parser	This is a VERY LIMITED parser.
DIFContentHandler
DIFContentHandler
DIFParser
DigestingAutoDetectParserFactory
DigestingParser
DigestingParser.Digester	Interface for digester.
DigestingParser.DigesterFactory	This is used in `AutoDetectParserConfig` to (optionally) wrap the parser in a digesting parser.
DigestingParser.Encoder	Encodes byte array from a MessageDigest to String
DirectoryListingEntry	The format of a directory listing entry is as follows: BYTE: length of name BYTEs: name (UTF-8 encoded) ENCINT: content section ENCINT: offset ENCINT: length The offset is from the beginning of the content section the file is in, after the section has been decompressed (if appropriate).
DirListParser	Parses the output of /bin/ls and counts the number of files and the number of executables using Tika.
DisplayMetInstance	Grabs a PDF file from a URL and prints its `Metadata`
DL4JInceptionV3Net	`DL4JInceptionV3Net` is an implementation of `ObjectRecogniser`.
DL4JVGG16Net
DocumentSelector	Interface for different document selection strategies for purposes like embedded document extraction by a `ContainerExtractor` instance.
DocumentSelectorConfig
DublinCore	A collection of Dublin Core metadata names.
DumpTikaConfigExample	This class shows how to dump a TikaConfig object to a configuration file.
DurationFormatUtils	Functionality and naming conventions (roughly) copied from org.apache.commons.lang3 so that we didn't have to add another dependency.
DWGParser	DWG (CAD Drawing) parser.
DWGParserConfig
DWGReadFormatRemover	DWGReadFormatRemover removes the formatting from the text from libredwg files so only the raw text remains.
DWGReadParser	DWGReadParser (CAD Drawing) parser.
EightBytesOfData	This class is used to represent a property that contains 8 bytes of data in the PropertySet.rgData stream field.
ElementMappingContentHandler	Content handler decorator that maps element `QName`s using a `Map`.
ElementMappingContentHandler.TargetElement
ElementMatcher	Final evaluation state of an XPath expression that targets an element.
ElementMetadataHandler	SAX event handler that maps the contents of an XML element into a metadata field.
EmbeddedContentHandler	Content handler decorator that prevents the `EmbeddedContentHandler.startDocument()` and `EmbeddedContentHandler.endDocument()` events from reaching the decorated handler.
EmbeddedDocumentExtractor
EmbeddedDocumentExtractorFactory
EmbeddedDocumentUtil	Utility class to handle common issues with embedded documents.
EmbeddedPartMetadata	This class records metadata about embedded parts that exists in the xml of the main document.
EmbeddedResourceHandler	Tika container extractor callback interface.
EmbeddedStreamTranslator	Interface for different filtering of embedded streams.
Embedder	Tika embedder interface
EMFParser	Extracts files embedded in EMF and offers a very rough capability to extract text if there is text stored in the EMF.
EmitData
EmitKey
Emitter
EmitterManager	Utility class that will apply the appropriate fetcher to the fetcherString based on the prefix.
EmptyDetector	Dummy detector that returns application/octet-stream for all documents.
EmptyEmitter
EmptyFetcher
EmptyParser	Dummy parser that always produces an empty XHTML document without even attempting to parse the given document stream.
EmptyTranslator	Dummy translator that always declines to give any text.
EncodingDetector	Character encoding detector.
EncryptedDocumentException
EncryptedPrescriptionDetector
EncryptedPrescriptionParser
EndDocumentShieldingContentHandler	A wrapper around a `ContentHandler` which will ignore normal SAX calls to `EndDocumentShieldingContentHandler.endDocument()`, and only fire them later.
EndianUtils	General Endian Related Utilities.
EndianUtils.BufferUnderrunException
EnviHeaderParser
Epub	EPub properties collection.
EpubContentParser	Parser for EPUB OPS `*.html` files.
EpubParser	Epub parser
Error
ErrorParser	Dummy parser that always throws a `TikaException` without even attempting to parse the given document stream.
EvalConsumerBuilder
EvalConsumersBuilder
EvalExceptionUtils
EvilCOSWriter
ExcelExtractor	Excel parser implementation which uses POI's Event API to handle the contents of a Workbook.
ExceptionUtils
ExcludeFieldMetadataFilter
ExecutableParser	Parser for executable files.
ExGuid
ExGUIDArray
ExpandedTitleContentHandler	Content handler decorator which wraps a `TransformerHandler` in order to allow the `TITLE` tag to render as `<title></title>` rather than `<title/>` which is accomplished by calling the `ContentHandler.characters(char[], int, int)` method with a `length` of 1 but a zero length char array.
ExtendedGUID
ExternalEmbedder	Embedder that uses an external program (like sed or exiftool) to embed text content and metadata into a given document.
ExternalParser	Parser that uses an external program (like catdoc or pdf2txt) to extract text content and metadata from a given document.
ExternalParser	This is a next generation external parser that uses some of the more recent additions to Tika.
ExternalParser.LineConsumer	Consumer contract
ExternalParsersConfigReader	Builds up ExternalParser instances based on XML file(s) which define what to run, for what, and how to process any output metadata.
ExternalParsersConfigReaderMetKeys	Met Keys used by the `ExternalParsersConfigReader`.
ExternalParsersFactory	Creates instances of ExternalParser based on XML configuration files.
ExternalProcess
ExternalTranslator	Abstract class used to interact with command line/external Translators.
ExtractComparer
ExtractComparerBuilder
ExtractEmbeddedFiles
ExtractProfiler
ExtractProfilerBuilder
ExtractReader
ExtractReader.ALTER_METADATA_LIST
ExtractReaderException	Exception when trying to read extract
ExtractReaderException.TYPE
FailedToStartClientException	This should be catastrophic
FallbackParser	Tries multiple parsers in turn, until one succeeds.
FeedParser	Feed parser.
FetchEmitTuple
FetchEmitTuple.ON_PARSE_EXCEPTION
Fetcher	Interface for an object that will fetch an InputStream given a fetch string.
FetcherManager	Utility class to hold multiple fetchers.
FetcherStreamFactory	This class looks for "fetcherName" in the http header.
FetcherStringException	If something goes wrong in parsing the fetcher string
FetchKey	Pair of fetcherName (which fetcher to call) and the key to send to that fetcher to retrieve a specific file.
FictionBookParser
Field	Field annotation is a contract for binding `Param` value from Tika Configuration to an object.
FieldNameMappingFilter
FileCommandDetector	This runs the Linux 'file' command against a file.
FileListPipesIterator	Reads a list of file names/relative paths from a UTF-8 file.
FilenameUtils
FileProcessResult
FileProfiler	This class profiles actual files as opposed to extracts e.g.
FileProfilerBuilder
FileResource	This is a basic interface to handle a logical "file".
FileResourceConsumer	This is a base class for file consumers.
FileResourceCrawler
FileSystem	A collection of metadata elements for file system level metadata
FileSystemEmitter	Emitter to write to a file system.
FileSystemFetcher
FileSystemPipesIterator
FileSystemStatusReporter	This is intended to write summary statistics to disk periodically.
FileTooLongException
FlatOpenDocumentParser
FLVParser	Parser for metadata contained in Flash Videos (.flv).
Font
ForkParser
ForkProxy
ForkResource
FormattingUtils
FormattingUtils.Tag
FourBytesOfData	This class is used to represent a property that contains 4 bytes of data in the PropertySet.rgData stream field.
FrictionlessPackageDetector
FSBatchProcessCLI
FSConsumersManager
FSCrawlerBuilder	Builds either an FSDirectoryCrawler or an FSListCrawler.
FSDirectoryCrawler
FSDirectoryCrawler.CRAWL_ORDER
FSDocumentSelector	Selector that chooses files based on their file name and their size, as determined by TikaCoreProperties.RESOURCE_NAME_KEY and Metadata.CONTENT_LENGTH.
FSFileResource	FileSystem(FS)Resource wraps a file name.
FSListCrawler	Class that "crawls" a list of files.
FSOutputStreamFactory
FSOutputStreamFactory.COMPRESSION
FSProperties
FSUtil	Utility class to handle some common issues when reading from and writing to a file system (FS).
FSUtil.HANDLE_EXISTING
FuzzingCLI
FuzzingCLIConfig
FuzzOne	Forked process that runs against a single input file
GCSEmitter
GCSFetcher	Fetches files from Google Cloud Storage.
GCSPipesIterator
GDALParser	Wraps execution of the Geospatial Data Abstraction Library (GDAL) `gdalinfo` tool used to extract geospatial information out of hundreds of geo file formats.
GeneralTransformer
GenericConverter	Tries to convert as many of the properties in the `Metadata` map as possible to XMP namespaces.
GeoGazetteerClient
Geographic	Geographic schema.
GeographicInformationParser
GeoParser
GeoParserConfig
GeoPointMetadataFilter	If `Metadata` contains a `TikaCoreProperties.LATITUDE` and a `TikaCoreProperties.LONGITUDE`, this filter concatenates those with a comma in the order LATITUDE,LONGITUDE.
GeoTag
GlobalIdTableEntry3FNDX
GlobalIdTableEntryFNDX
GoogleTranslator	An implementation of a REST client to the Google Translate v2 API.
GrabPhoneNumbersExample	Class to demonstrate how to use the `PhoneExtractingContentHandler` to get a list of all of the phone numbers from every file in a directory.
GribParser
GrobidNERecogniser
GrobidRESTParser
GUID
GuidUtil
GZipSpecializationDetector	This is designed to detect commonly gzipped file types such as warc.gz.
H2Util
HandlerConfig
HandlerConfig.PARSE_MODE	`HandlerConfig.PARSE_MODE.RMETA` "recursive metadata" is the same as the -J option in tika-app and the /rmeta endpoint in tika-server.
HDFParser	Since the `NetCDFParser` depends on the NetCDF-Java API, we are able to use it to parse HDF files as well.
HeaderCell
HeifParser
HexCoDec	A set of Hex encoding and decoding utility methods.
HSLFExtractor
HTML
HtmlEncodingDetector	Character encoding detector for determining the character encoding of an HTML document based on the potential charset parameter found in a Content-Type http-equiv meta tag somewhere near the beginning.
HTMLHelper	Helps produce user facing HTML output.
HtmlMapper	HTML mapper used to make incoming HTML documents easier to handle by Tika clients.
HtmlParser	HTML parser.
HttpClientFactory	This holds quite a bit of state and is not thread safe.
HttpClientUtil
HttpFetcher	Based on Apache httpclient
HttpHeaders	A collection of HTTP header names.
HttpParser
HwpStreamReader
HwpTextExtractorV5
HwpV5Parser
ICNSParser	A basic parser class for Apple ICNS icon files
IContentHandlerFactoryBuilder
ICrawlerBuilder
Icu4jEncodingDetector
ID3Tags	Interface that defines the common interface for ID3 tag parsers, such as ID3v1 and ID3v2.3.
ID3Tags.ID3Comment	Represents a comment in ID3 (especially ID3 v2), which is made up of several parts
ID3v1Handler	This is used to parse ID3 Version 1 Tag information from an MP3 file, if available.
ID3v22Handler	This is used to parse ID3 Version 2.2 Tag information from an MP3 file, if available.
ID3v23Handler	This is used to parse ID3 Version 2.3 Tag information from an MP3 file, if available.
ID3v24Handler	This is used to parse ID3 Version 2.4 Tag information from an MP3 file, if available.
ID3v2Frame	A frame of ID3v2 data, which is then passed to a handler to be turned into useful data.
ID3v2Frame.RawTag
ID3v2Frame.TextEncoding
IDBWriter
IdentityHtmlMapper	Alternative HTML mapping rules that pass the input HTML as-is without any modifications.
IDMLParser	Adobe InDesign IDML Parser.
IFileProcessorFutureResult	Stub interface to allow for different result types from different processors
IFSSHTTPBSerializable	FSSHTTPB Serialize interface.
ImageDeskew
ImageDeskew.HoughLine
ImageGraphicsEngine	Copied nearly verbatim from PDFBox
ImageGraphicsEngineFactory
ImageMetadataExtractor	Uses the Metadata Extractor library to read EXIF and IPTC image metadata and map to Tika fields.
ImageParser
ImageUtil
ImportContextImpl	`ImportContextImpl`...
IncludeFieldMetadataFilter
IncrementalUpdateRecord
Initializable	Components that must do special processing across multiple fields at initialization time should implement this interface.
InitializableProblemHandler	This is to be used to handle potential recoverable problems that might arise during initialization.
InputStreamDigester
InputStreamFactory	A factory which returns a fresh `InputStream` for the same resource each time.
InputStreamFactory	Interface to allow for custom/consistent creation of InputStream
IntermediateNodeObject
IntermediateNodeObject.RootNodeObjectBuilder	The class is used to build a root node object.
InterruptableParsingExample	This example demonstrates how to interrupt document parsing if some condition is met.
Interrupter	Class that waits for input on System.in.
InterrupterBuilder	Builds an Interrupter
InterrupterFutureResult
IOUtils
IPADetector
IParserFactoryBuilder
IProperty	The interface of the property in OneNote file.
IPTC	IPTC photo metadata schema.
IptcAnpaParser	Parser for IPTC ANPA News Wire Feeds
ISArchiveParser
ISATabUtils
IsIncrementalUpdate
ITikaToXMPConverter	Interface for the specific `Metadata` to XMP converters
IWork13PackageParser
IWork13PackageParser.IWork13DocumentType
IWork18PackageParser	For now, this parser isn't even registered.
IWork18PackageParser.IWork18DocumentType
IWorkDetector
IWorkPackageParser	A parser for the IWork container files.
IWorkPackageParser.IWORKDocumentType
JackcessParser	Parser that handles Microsoft Access files via Jackcess
JarDetector
JCID	This class is used to represent a JCID
JCIDObject	This class is used to represent the JCID object.
JDBCEmitter	This is only an initial, basic implementation of an emitter for JDBC.
JDBCEmitter.AttachmentStrategy
JDBCEmitter.MultivaluedFieldStrategy
JDBCPipesIterator	Iterates through the results from a SQL call via JDBC.
JDBCPipesReporter	This is an initial draft of a JDBCPipesReporter.
JDBCTableReader	General base class to iterate through rows of a JDBC table
JDBCUtil
JDBCUtil.CREATE_TABLE
JempboxExtractor
JoshuaNetworkTranslator	This translator is designed to work with a TCP-IP available Joshua translation server, specifically the REST-based Joshua server.
JournalParser
JpegParser
JsonEmitData
JsonFetchEmitTuple
JsonFetchEmitTupleList
JSONMessageBodyWriter
JsonMetadata
JsonMetadataList
JSONObjWriter
JsonResponse
JsonResponse
JsonStreamingSerializer
JXLParser	Tries to scrape XMP out of JXL
KafkaEmitter	Emits the now-parsed documents into a specified Apache Kafka topic.
KafkaPipesIterator
KMZDetector
LangModel
Language
LanguageAwareTokenCountStats<T>	Interface for calculators that require language probabilities and token stats
LanguageConfidence
LanguageDetectingParser
LanguageDetector
LanguageDetectorExample
LanguageDetectorTest
LanguageHandler	SAX content handler that updates a language detector based on all the received character content.
LanguageIdentifier	Identifier of the language that best matches a given content profile.
LanguageIDWrapper
LanguageNames	Support for language tags (as defined by https://tools.ietf.org/html/bcp47)
LanguageProfile	Language profile based on ngram counts.
LanguageProfilerBuilder	This class runs a ngram analysis over submitted text, results might be used for automatic language identification.
LanguageResource
LanguageResult
LanguageWriter	Writer that builds a language profile based on all the written content.
Latin1StringsParser	Parser to extract printable Latin1 strings from arbitrary files with pure java without running any external process.
LeafNodeObject
LeafNodeObject.IntermediateNodeObjectBuilder	The class is used to build an intermediate node object.
LeipzigHelper
LeipzigSampler
Lingo24LangDetector	An implementation of a Language Detector using the Premium MT API v1.
Lingo24Translator	An implementation of a REST client for the Premium MT API v1.
Link
LinkContentHandler	Content handler that collects links from an XHTML document.
LinkedCell	Linked cell.
ListDescriptor	Contains the information for a single list in the list or list override tables.
ListManager	Computes the number text which goes at the beginning of each list paragraph
LittleEndianBitConverter	Implement a converter which converts to/from little-endian byte arrays
LoadErrorHandler	Interface for error handling strategies in service class loading.
Location
LoggingPipesReporter	Simple PipesReporter that logs everything at the debug level.
LookaheadInputStream	Stream wrapper that makes it easy to read up to n bytes ahead from a stream that supports the mark feature.
LuceneIndexer
LuceneIndexerExtended
LyricsHandler	This is used to parse Lyrics3 tag information from an MP3 file, if available.
MachineMetadata	Metadata for describing machines, such as their architecture, type and endian-ness
MachineMetadata.Endian
MagicDetector	Content type detection based on magic bytes.
MailDateParser	Dates in emails are a mess.
MailUtil
MappedBufferCleaner	Copied/pasted from the Apache Lucene/Solr project.
MarianTranslator	Translator that uses the Marian NMT decoder for translation.
MarianTranslator.MarianServerClient	Internal Client for marian-server Web Socket Server.
Matcher	XPath element matcher.
MatchingContentHandler	Content handler decorator that only passes the elements, attributes, and text nodes that match the given XPath expression.
MatParser
MboxParser	Mbox (mailbox) parser.
MediaType	Internet media type.
MediaTypeExample
MediaTypeRegistry	Registry of known Internet media types.
Message	A collection of Message related property names.
Metadata	A multi-valued metadata container.
MetadataAwareLuceneIndexer	Builds on the LuceneIndexer from Chapter 5 and adds indexing of Metadata.
MetadataExtractor	OOXML metadata extractor.
MetadataFields	Knows about all declared `Metadata` fields.
MetadataFilter	Filters the metadata in place after the parse
MetadataHandler	Deprecated. Use the `AttributeMetadataHandler` and `ElementMetadataHandler` classes instead
MetadataList	wrapper class to make isWriteable in MetadataListMBW simpler
MetadataListMessageBodyWriter
MetadataResource
MetadataWriteFilter
MetadataWriteFilterFactory
MicrosoftTranslator	Wrapper class to access the Windows translation service.
MidiParser
MIFContentHandler	Content handler for MIF Content and Metadata.
MIFExtractor	Helper Class to Parse and Extract Adobe MIF Files.
MIFParser
MimeBuffer
MimeType	Internet media type.
MimeTypeException	A class to encapsulate MimeType related exceptions.
MimeTypes	This class is a MimeType repository.
MimeTypesFactory	Creates instances of MimeTypes.
MimeTypesReader	A reader for XML files compliant with the freedesktop MIME-info DTD.
MimeTypesReaderMetKeys	Met Keys used by the `MimeTypesReader`.
MiscOLEDetector	A detector that works on a POIFS OLE2 document to figure out exactly what the file is.
MITIENERecogniser	This class offers an implementation of `NERecogniser` based on trained models using state-of-the-art information extraction tools.
MosesTranslator	Translator that uses the Moses decoder for translation.
MP3Frame	A frame in an MP3 file, such as ID3v2 Tags or some audio.
Mp3Parser	The `Mp3Parser` is used to parse ID3 Version 1 Tag information from an MP3 file, if available.
Mp3Parser.ID3TagsAndAudio
MP4Parser	Parser for the MP4 media container format, as well as the older QuickTime format that MP4 is based on.
MSEmbeddedStreamTranslator
MSOfficeBinaryConverter	Tika to XMP mapping for the binary MS formats Word (.doc), Excel (.xls) and PowerPoint (.ppt).
MSOfficeXMLConverter	Tika to XMP mapping for the Office Open XML formats Word (.docx), Excel (.xlsx) and PowerPoint (.pptx).
MSOneStorePackage
MSOneStoreParser
MSOwnerFileParser	Parser for temporary MSOffice files.
MuPDFRenderer
MyFirstTika	Demonstrates how to call the different components within Tika: its `Detector` framework (aka MIME identification and repository), its `Parser` interface, its `org.apache.tika.language.LanguageIdentifier` and other goodies.
NamedAttributeMatcher	Final evaluation state of a `.../@name` XPath expression.
NamedElementMatcher	Intermediate evaluation state of a `.../name...` XPath expression.
NamedEntityParser	This implementation of `Parser` extracts entity names from text content and adds it to the metadata.
NameDetector	Content type detection based on the resource name.
NameEntityExtractor
Namespace	Utility class to hold namespace information.
NERecogniser	Defines a contract for named entity recogniser.
NetCDFParser	A `Parser` for NetCDF files using the UCAR, MIT-licensed NetCDF for Java API.
NetworkParser
NLTKNERecogniser	This class offers an implementation of `NERecogniser` based on the ne_chunk() module of NLTK.
NNExampleModelDetector
NNTrainedModel
NNTrainedModelBuilder
NoData	This class is used to represent a property that contains no data.
NodeMatcher	Final evaluation state of a `.../node()` XPath expression.
NodeObject
NonDetectingEncodingDetector	Always returns the charset passed in via the initializer
NoOpFilter	This filter performs no operations on the metadata and leaves it untouched.
NoTextPDFRenderer	This class extends the PDFRenderer to exclude rendering of electronic text.
NSNormalizerContentHandler	Content handler decorator that maps old OpenOffice 1.0 namespaces to the OpenDocument ones and returns a fake DTD when the parser requests the OpenOffice DTD.
NumberCell	Number cell.
ObjectFromDOMAndQueueBuilder<T>	Same as `ObjectFromDOMBuilder`, but this is for objects that require access to the shared queue.
ObjectFromDOMBuilder<T>	Interface for things that build objects from a DOM Node and a map of runtime attributes
ObjectGroupData	The ObjectGroupData class.
ObjectGroupDataElementData
ObjectGroupDataElementData.Builder	The internal class for building a list of DataElement from a node object.
ObjectGroupDeclarations	Object Group Declarations
ObjectGroupMetadata	Specifies an object group metadata
ObjectGroupMetadataDeclarations	Object Metadata Declaration
ObjectGroupObjectBLOBDataDeclaration	object data BLOB declaration
ObjectGroupObjectData
ObjectGroupObjectDataBLOBReference	object data BLOB reference
ObjectGroupObjectDeclare
ObjectRecogniser	This is a contract for object recognisers used by `ObjectRecognitionParser`
ObjectRecognitionParser	This parser recognises objects from Images.
ObjectSpaceObjectPropSet	This class is used to represent an ObjectSpaceObjectPropSet.
ObjectSpaceObjectPropSet
ObjectSpaceObjectStreamHeader
ObjectSpaceObjectStreamOfContextIDs	This class is used to represent an ObjectSpaceObjectStreamOfContextIDs.
ObjectSpaceObjectStreamOfOIDs	This class is used to represent an ObjectSpaceObjectStreamOfOIDs.
ObjectSpaceObjectStreamOfOSIDs	This class is used to represent an ObjectSpaceObjectStreamOfOSIDs.
OfferLargerThanQueueSize
Office	Office Document properties collection.
OfficeOpenXMLCore	Core properties as defined in the Office Open XML specification part Two that are not in the DublinCore namespace.
OfficeOpenXMLExtended	Extended properties as defined in the Office Open XML specification part Four.
OfficeParser	Defines a Microsoft document content extractor.
OfficeParser.POIFSDocumentType
OfficeParserConfig
OfflineContentHandler	Content handler decorator that always returns an empty stream from the `OfflineContentHandler.resolveEntity(String, String)` method to prevent potential network or other external resources from being accessed by an XML parser.
OldExcelParser	A POI-powered Tika Parser for very old versions of Excel, from pre-OLE2 days, such as Excel 4.
OneByteOfData	This class is used to represent a property that contains 1 byte of data in the PropertySet.rgData stream field.
OneNoteParser	OneNote tika parser capable of parsing Microsoft OneNote files.
OneNotePropertyEnum
OneNoteTreeWalkerOptions	Options when walking the OneNote tree.
OOXMLExtractor	Interface implemented by all Tika OOXML extractors.
OOXMLExtractorFactory	Figures out the correct `OOXMLExtractor` for the supplied document and returns it.
OOXMLParser	Office Open XML (OOXML) parser.
OOXMLTikaBodyPartHandler
OOXMLWordAndPowerPointTextHandler	This class is intended to handle anything that might contain IBodyElements: main document, headers, footers, notes, slides, etc.
OOXMLWordAndPowerPointTextHandler.EditType
OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
OPCPackageDetector
OPCPackageWrapper	This is a wrapper around OPCPackage that calls revert() instead of close().
OpenDocumentContentParser	Parser for ODF `content.xml` files.
OpenDocumentConverter	Tika to XMP mapping for the Open Document formats: Text (.odt), Spreadsheet (.ods), Graphics (.odg) and Presentation (.odp).
OpenDocumentDetector
OpenDocumentMetaParser	Parser for OpenDocument `meta.xml` files.
OpenDocumentParser	OpenOffice parser
OpenNLPDetector	This is based on OpenNLP's language detector.
OpenNLPMetadataFilter
OpenNLPNameFinder	An implementation of `NERecogniser` that finds names in text using Open NLP Model.
OpenNLPNERecogniser	This implementation of `NERecogniser` chains an array of `OpenNLPNameFinder`s for which NER models are available in classpath.
OpenSearchClient
OpenSearchClient
OpenSearchEmitter
OpenSearchEmitter.AttachmentStrategy
OpenSearchEmitter.UpdateStrategy
OpenSearchPipesReporter	As of the 2.5.0 release, this is an ALPHA version.
OPFParser	Use this to parse the .opf files
OptimaizeLangDetector	Implementation of the LanguageDetector API that uses https://github.com/optimaize/language-detector
OptimaizeMetadataFilter
OutlookExtractor	Outlook Message Parser.
OutlookExtractor.RECIPIENT_TYPE
OutlookPSTParser	Parser for MS Outlook PST email storage files
OutputStreamFactory
OverrideDetector	Deprecated. After 2.5.0, this functionality was moved to the CompositeDetector.
PackageConstants
PackageParser	Parser for various packaging formats.
PageBasedRenderResults
PagedText	XMP Paged-text schema.
PageRangeRequest	The range of pages to render.
ParagraphProperties
ParallelFileProcessingResult
Param<T>	This is a serializable model class for parameters from configuration file.
ParamField	This class stores metadata for `Field` annotations, which is used to map them to `Param` at runtime.
ParseContext	Parse context.
ParseContextConfig	Implementations must be thread-safe!
Parser	Tika parser interface.
ParserContainerExtractor	An implementation of `ContainerExtractor` powered by the regular `Parser` API.
ParserDecorator	Decorator base class for the `Parser` interface.
ParseRecord	Use this class to store exceptions, warnings and other information during the parse.
ParserFactory
ParserFactory
ParserFactoryBuilder
ParserFactoryFactory	Lightweight, easily serializable class that contains enough information to build a `ParserFactory`
ParserPostProcessor	Parser decorator that post-processes the results from a decorated parser.
ParserUtils	Helper util methods for Parsers themselves.
ParsingEmbeddedDocumentExtractor	Helper class for parsers of package archives or other compound document formats that support embedded or attached component documents.
ParsingEmbeddedDocumentExtractorFactory
ParsingExample
ParsingReader	Reader for the text content from a given binary stream.
PasswordProvider	Interface for providing a password to a Parser for handling Encrypted and Password Protected Documents.
PasswordProviderConfig
PDDocumentRenderer	Stub interface for the PDFParser to use to figure out if it needs to pass on the PDDocument or create a temp file to be used by a file-based renderer down the road.
PDF	PDF properties collection.
PDFBoxRenderer
PDFMarkedContent2XHTML	This was added in Tika 1.24 as an alpha version of a text extractor that builds the text from the marked text tree and includes/normalizes some of the structural tags.
PDFParser	PDF parser.
PDFParserConfig	Config for PDFParser.
PDFParserConfig.IMAGE_STRATEGY
PDFParserConfig.OCR_RENDERING_STRATEGY
PDFParserConfig.OCR_STRATEGY
PDFParserConfig.OCRStrategyAuto	Encapsulate the numbers used to control OCR Strategy when set to auto
PDFRenderingState
PDFServerConfig	PDF parser configuration, for the request
PDFTransformer
PDFTransformerConfig
PDMetadataExtractor
Pharmacy
PhoneExtractingContentHandler	Class used to extract phone numbers while parsing.
Photoshop	XMP Photoshop metadata schema.
PickBestTextEncodingParser	Deprecated. Currently not suitable for real use, more a demo / prototype!
PipesClient	The PipesClient is designed to be single-threaded.
PipesConfig
PipesConfigBase
PipesException	Fatal exception that means that something went seriously wrong.
PipesIterator	Abstract class that handles the testing for timeouts/thread safety issues.
PipesParser
PipesReporter	This is called asynchronously by the AsyncProcessor.
PipesReporterBase	Base class that includes filtering by `PipesResult.STATUS`
PipesResource
PipesResult
PipesResult.STATUS
PipesServer	This server is forked from the PipesClient.
PipesServer.STATUS
Pkcs7Parser	Basic parser for PKCS7 data.
PListParser	Parser for Apple's plist and bplist.
POIFSContainerDetector	A detector that works on a POIFS OLE2 document to figure out exactly what the file is.
POIXMLTextExtractorDecorator
PooledTimeSeriesParser	Uses the Pooled Time Series algorithm + command line tool, to generate a numeric representation of the video suitable for similarity searches.
PrescriptionParser
PrettyMetadataKeyComparator
ProbabilisticMimeDetectionSelector	Selector for combining different mime detection results based on probability
ProbabilisticMimeDetectionSelector.Builder	Builder class for setting probability parameters
ProcessUtils
ProduceTypeResourceComparator	Resource comparator based on produce type.
ProfilingWriter	Writer that builds a language profile based on all the written content.
Property	XMP property definition.
Property.PropertyType
Property.ValueType
PropertyID	This class is used to represent a PropertyID.
PropertySet	This class is used to represent a PropertySet.
PropertySetObject	This class is used to represent the property set.
PropertyType
PropertyTypeException	XMP property definition violation exception.
PropsUtil	Utility class to handle properties.
PrtArrayOfPropertyValues	The class is used to represent the prtArrayOfPropertyValues.
PrtFourBytesOfLengthFollowedByData	This class is used to represent the prtFourBytesOfLengthFollowedByData.
PRTParser	A basic text extracting parser for the CADKey PRT (CAD Drawing) format.
PSDParser	Parser for the Adobe Photoshop PSD File Format.
QuattroPro	QuattroPro properties collection.
QuattroProParser	Parser for Corel QuattroPro documents (part of Corel WordPerfect Office Suite).
RangeFetcher	This class extracts a range of bytes from a given fetch key.
RarParser	Parser for Rar files.
RDCAnalysisChunking	This class is used to process RDC analysis chunking
RecentFiles	Builds on top of the LuceneIndexer and the Metadata discussions in Chapter 6 to output an RSS (or RDF) feed of files crawled by the LuceneIndexer within the last N minutes.
RecognisedObject	A model for objects recognised from graphics and texts; it typically includes a human-readable label for the object, the language of the label, an id and a confidence score.
RecursiveMetadataResource
RecursiveParserWrapper	This is a helper class that wraps a parser in a recursive handler.
RecursiveParserWrapperFSConsumer	This runs a RecursiveParserWrapper against an input file and outputs the json metadata to an output file.
RecursiveParserWrapperHandler	This is the default implementation of `AbstractRecursiveParserWrapperHandler`.
RegexCaptureParser
RegexNERecogniser	This class offers an implementation of `NERecogniser` based on Regular Expressions.
RegexUtils	Inspired from Nutch code class OutlinkExtractor.
Renderer	Interface for a renderer.
Rendering
RenderingParser
RenderingState	This should be used to track state for each file (embedded or otherwise).
RenderingTracker	Use this in the ParseContext to keep track of unique ids for rendered images in embedded docs.
RenderRequest	Empty interface for requests to a renderer.
RenderResult
RenderResult.STATUS
RenderResults
ReplacementCharset	An implementation of the standard "replacement" charset defined by the W3C.
Report	This class represents a single report.
ReporterBuilder	Interface for reporter builders
RequestTypes	The enumeration of request type.
RereadableInputStream	Wraps an input stream, reading it only once, but making it available for rereading an arbitrary number of times.
ResultsReporter
RevisionManifest
RevisionManifestDataElementData
RevisionManifestObjectGroupReferences	Specifies a revision manifest object group references, each followed by object group extended GUIDs
RevisionManifestRootDeclare	Specifies a revision manifest root declare, each followed by root and object extended GUIDs
RevisionStoreObject	The class is used to represent the revision store object.
RevisionStoreObjectGroup
RFC822Parser	Uses apache-mime4j to parse emails.
RichTextContentHandler	Content handler for Rich Text; it extracts the alt attribute of XHTML <img/> tags and the name attribute of XHTML <a/> tags into the output.
RollbackSoftware	Demonstrates Tika and its ability to sense symlinks.
RTFConverter	Tika to XMP mapping for the RTF format.
RTFMetadata
RTFParser	RTF parser
RTGTranslator	This translator is designed to work with a TCP-IP available RTG translation server, specifically the REST-based RTG server.
RunProperties	WARNING: This class is mutable.
RuntimeSAXException	Use this to throw a SAXException in subclassed methods that don't throw SAXExceptions
S3Emitter	Emits to an existing S3 bucket
S3Fetcher	Fetches files from S3.
S3PipesIterator
SafeContentHandler	Content handler decorator that makes sure that the character events (`SafeContentHandler.characters(char[], int, int)` or `SafeContentHandler.ignorableWhitespace(char[], int, int)`) passed to the decorated content handler contain only valid XML characters.
SafeContentHandler.Output	Internal interface that allows both character and ignorable whitespace content to be filtered the same way.
SAS7BDATParser	Processes the SAS7BDAT data columnar database file used by SAS and other similar languages.
SecureContentHandler	Content handler decorator that attempts to prevent denial of service attacks against Tika parsers.
SentimentAnalysisParser	This parser classifies documents based on the sentiment of document.
SequenceNumberGenerator
SerialNumber
ServerStatus
ServerStatus.STATUS
ServerStatus.TASK
ServerStatusResource
ServerStatusWatcher
ServiceLoader	Internal utility class that Tika uses to look up service providers.
ServiceLoaderUtils	Service Loading and Ordering related utils
SiegfriedDetector	Simple wrapper around Siegfried (https://github.com/richardlehane/siegfried). The default behavior is to run detection, report the results in the metadata and then return null so that other detectors will be used.
SignatureObject	Signature Object
SimpleChunking
SimpleLogReporterBuilder
SimpleTextExtractor
SimpleThreadPoolExecutor	Simple Thread Pool Executor
SimpleTypeDetector
SlowCompositeReaderWrapper	COPIED VERBATIM FROM LUCENE. This class forces a composite reader (eg a `MultiReader` or `DirectoryReader`) to emulate a `LeafReader`.
SolrEmitter
SolrEmitter.AttachmentStrategy
SolrEmitter.UpdateStrategy
SolrPipesIterator	Iterates through results from a Solr query.
SourceCodeParser	Generic Source code parser for Java, Groovy, C++.
SpanSwapper	randomly swaps spans from the input
SpreadsheetMLParser	Parses SpreadsheetML 2003 format Excel files.
SpringExample
SQLite3Parser	This is the main class for parsing SQLite3 files.
StandardHtmlEncodingDetector	An encoding detector that tries to respect the spirit of the HTML spec part 12.2.3 "The input byte stream", or at least the part that is compatible with the implementation of tika.
StandardOrganizations	This class provides a collection of the most important technical standard organizations.
StandardReference	Class that represents a standard reference.
StandardReference.StandardReferenceBuilder
StandardsExtractingContentHandler	StandardsExtractingContentHandler is a Content Handler used to extract standard references while parsing.
StandardsExtractionExample	Class to demonstrate how to use the `StandardsExtractingContentHandler` to get a list of the standard references from every file in a directory.
StandardsText	StandardsText relies on regular expressions to extract standard references from text.
StandardWriteFilter	This is to be used to limit the amount of metadata that a parser can add based on the `StandardWriteFilter.maxTotalEstimatedSize`, `StandardWriteFilter.maxFieldSize`, `StandardWriteFilter.maxValuesPerField`, and `StandardWriteFilter.maxKeySize`.
StandardWriteFilterFactory	Factory class for `StandardWriteFilter`.
StarOfficeDetector
StartXRefOffset
StartXRefScanner	This is a first draft of a scanner to extract incremental updates out of PDFs.
StatefulParser	The RecursiveParserWrapper wraps the parser sent into the parsecontext and then uses that parser to store state (among many other things).
StatusReporter	Basic class to use for reporting status from both the crawler and the consumers.
StatusReporterBuilder
StatusReporterFutureResult	Empty class for what a StatusReporter returns when it finishes.
StoppingEarlyException	Sentinel exception to stop parsing xml once target is found while SAX parsing.
StorageIndexCellMapping	Specifies the storage index cell mappings (with cell identifier, cell mapping extended GUID, and cell mapping serial number)
StorageIndexDataElementData
StorageIndexManifestMapping
StorageIndexRevisionMapping	Specifies the storage index revision mappings (with revision and revision mapping extended GUIDs, and revision mapping serial number)
StorageManifestDataElementData
StorageManifestRootDeclare	Specifies one or more storage manifest root declare.
StorageManifestSchemaGUID	Specifies a storage manifest schema GUID
StrawManTikaAppDriver	Simple single-threaded class that calls tika-app against every file in a directory.
StreamEmitter
StreamGobbler
StreamingDetectContext
StreamingZipContainerDetector	Currently only used in tests.
StreamObject
StreamObjectHeaderEnd
StreamObjectHeaderEnd16bit	A 16-bit header for a compound object would indicate the end of a stream object
StreamObjectHeaderEnd8bit	An 8-bit header for a compound object would indicate the end of a stream object
StreamObjectHeaderStart	This class specifies the base class for 16-bit or 32-bit stream object header start
StreamObjectHeaderStart16bit	A 16-bit header for a compound object would indicate the start of a stream object
StreamObjectHeaderStart32bit	A 32-bit header for a compound object would indicate the start of a stream object
StreamObjectParseErrorException
StreamObjectTypeHeaderEnd
StreamObjectTypeHeaderStart	The enumeration of the stream object type header start
StreamOutRPWFSConsumer	This uses the `JsonStreamingSerializer` to write out a single metadata object at a time.
StringsConfig	Configuration for the "strings" (or strings-alternative) command.
StringsEncoding	Character encoding of the strings that are to be found using the "strings" command.
StringsParser	Parser that uses the "strings" (or strings-alternative) command to find the printable strings in an object, or other binary, file (application/octet-stream).
StringStatsCalculator<T>	Interface for calculators that require a string
StringUtils
SubtreeMatcher	Evaluation state of a `...//...` XPath expression.
SummaryExtractor	Extractor for Common OLE2 (HPSF) metadata
SupplementingParser	Runs the input stream through all available parsers, merging the metadata from them based on the `AbstractMultipleParser.MetadataPolicy` chosen.
SXSLFPowerPointExtractorDecorator	SAX/Streaming pptx extractor
SXWPFWordExtractorDecorator	This is an experimental, alternative extractor for docx files.
SystemUtils	Copied from commons-lang to avoid requiring the dependency
TableInfo
TaggedContentHandler	A content handler decorator that tags potential exceptions so that the handler that caused the exception can easily be identified.
TaggedSAXException	A `SAXException` wrapper that tags the wrapped exception with a given object reference.
TailStream	A specialized input stream implementation which records the last portion read from an underlying stream.
TarWriter
TaskStatus
TeeContentHandler	Content handler proxy that forwards the received SAX events to zero or more underlying content handlers.
TEIDOMParser
TemporaryResources	Utility class for tracking and ultimately closing or otherwise disposing a collection of temporary resources.
TensorflowImageRecParser	This is an implementation of `ObjectRecogniser` powered by Tensorflow convolutional neural network (CNN).
TensorflowRESTCaptioner	Tensorflow image captioner.
TensorflowRESTRecogniser	Tensor Flow image recogniser which has high performance.
TensorflowRESTVideoRecogniser	Tensor Flow video recogniser which has high performance.
TesseractOCRConfig	Configuration for TesseractOCRParser.
TesseractOCRConfig.OUTPUT_TYPE
TesseractOCRParser	TesseractOCRParser powered by tesseract-ocr engine.
TesseractServerConfig	Tesseract configuration, for the request
TextAndAttributeContentHandler
TextAndAttributeXMLParser
TextAndCSVParser	Unless the `TikaCoreProperties.CONTENT_TYPE_USER_OVERRIDE` is set, this parser tries to assess whether the file is a text file, csv or tsv.
TextCell	Text cell.
TextContentHandler	Content handler decorator that only passes the `TextContentHandler.characters(char[], int, int)` and `TextContentHandler.ignorableWhitespace(char[], int, int)` (plus `TextContentHandler.startDocument()` and `TextContentHandler.endDocument()`) events to the decorated content handler.
TextDetector	Content type detection of plain text documents.
TextLangDetector	Language Detection using MIT Lincoln Lab’s Text.jl library https://github.com/trevorlewis/TextREST.jl
TextMatcher	Final evaluation state of a `.../text()` XPath expression.
TextMessageBodyWriter	Returns simple text string for a particular metadata value.
TextOnlyPDFRenderer	This class extends the PDFRenderer to render only the textual elements
TextProfileSignature	Copied nearly directly from Apache Nutch: https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/TextProfileSignature.java
TextSha256Signature	Calculates the base32 encoded SHA-256 checksum on the analyzed text
TextStatistics	Utility class for computing a histogram of the bytes seen in a stream.
TextStatsCalculator	Base text stats interface
TextStatsFromTikaEval	These examples create a new `CompositeTextStatsCalculator` for each call.
TIAParsingExample
TIFF	XMP Exif TIFF schema.
TiffParser
Tika	Facade class for accessing Tika functionality.
TikaActivator	Bundle activator that adjusts the class loading mechanism of the `ServiceLoader` class to work correctly in an OSGi environment.
TikaAsyncCLI
TikaCLI	Simple command line interface for Apache Tika.
TikaClient
TikaClientCLI
TikaClientConfigException
TikaClientException
TikaConfig	Parse xml config file.
TikaConfigException	Tika Config Exception is an exception thrown when there is an error in the Tika config file and/or one or more of the parsers failed to initialize from that erroneous config.
TikaConfigSerializer
TikaConfigSerializer.Mode
TikaCoreProperties	Contains a core set of basic Tika metadata properties, which all parsers will attempt to supply (where the file format permits).
TikaCoreProperties.EmbeddedResourceType	A file might contain different types of embedded documents.
TikaDetectors	Provides details of all the `Detector`s registered with Apache Tika, similar to --list-detectors with the Tika CLI.
TikaEmitterException
TikaEmitterResult
TikaEvalCLI
TikaEvalMetadataFilter
TikaEvalResource
TikaExcelDataFormatter	Overrides Excel's General format to include more significant digits than the MS Spec allows.
TikaExcelGeneralFormat	A Format that allows up to 15 significant digits for integers.
TikaException	Tika exception
TikaFileTypeDetector
TikaGUI	Simple Swing GUI for Apache Tika.
TikaInputStream	Input stream with extended capabilities.
TikaLanguageDetector	This is Tika's original legacy, homegrown language detector.
TikaLoggingFilter
TikaMemoryLimitException
TikaMimeKeys	A collection of Tika metadata keys used in Mime Type resolution
TikaMimeTypes	Provides details of all the mimetypes known to Apache Tika, similar to --list-supported-types with the Tika CLI.
TikaMp4BoxHandler
TikaPagedText	Metadata properties for paged text, metadata appropriate for an individual page (useful for embedded document handlers called on individual pages).
TikaParsers	Provides details of all the `Parser`s registered with Apache Tika, similar to --list-parsers and --list-parser-details within the Tika CLI.
TikaResource
TikaServerCli
TikaServerClientConfig
TikaServerConfig
TikaServerParseException	Simple wrapper exception to be thrown for consistent handling of exceptions that can happen during a parse.
TikaServerParseExceptionMapper
TikaServerProcess
TikaServerResource	Stub interface to allow for loading of resources via SPI
TikaServerStatus
TikaServerWatchDog
TikaServerWriter<T>	Stub interface to allow for SPI loading from other modules without opening up service loading to any generic MessageBodyWriter
TikaTaskTimeout
TikaTimeoutException	Runtime/unchecked version of `TimeoutException`
TikaToXMP
TikaUserDataBox
TikaVersion
TikaWelcome	Provides a basic welcome to the Apache Tika Server.
TikaWelcome.Endpoint
TimeoutConfig
TlsConfig
TMXContentHandler	Content Handler for Translation Memory eXchange (TMX) files.
TMXParser	Parser for Translation Memory eXchange (TMX) files.
TNEFParser	A POI-powered Tika Parser for TNEF (Transport Neutral Encoding Format) messages, aka winmail.dat
ToHTMLContentHandler	SAX event handler that serializes the HTML document to a character stream.
TokenContraster	Computes some corpus contrast statistics.
TokenCounter	Deprecated. use `CompositeTextStatsCalculator` with `TokenEntropy`, `TokenLengths` and `TopNTokens`.
TokenCountPriorityQueue
TokenCountPriorityQueue
TokenCounts
TokenCountStatsCalculator<T>	Interface for calculators that require token stats
TokenEntropy
TokenIntPair
TokenLengths
TokenStatistics
TopCommonTokenCounter	Utility class that reads in a UTF-8 input file with one document per row and outputs the 20000 tokens with the highest document frequencies.
TopNTokens
TotalCounter	Interface for pipesiterators that allow counting of total documents.
TotalCountResult
TotalCountResult.STATUS
ToTextContentHandler	SAX event handler that writes all character content out to a character stream.
ToXMLContentHandler	SAX event handler that serializes the XML document to a character stream.
TrainedModel
TrainedModelDetector
TrainTestSplit
TranscribeTranslateExample	This example demonstrates primitive logic for chaining Tika API calls.
Transformer
TranslateResource
Translator	Interface for Translator services.
TranslatorExample
TrecDocumentGenerator	Generates document summaries for corpus analysis in the Open Relevance project.
TrueTypeParser	Parser for TrueType font files (TTF).
Truncator
TSDParser	Tika parser for Time Stamped Data Envelope (application/timestamped-data)
TwoBytesOfData	This class is used to represent a property that contains 2 bytes of data in the PropertySet.rgData stream field.
TXTParser	Plain text parser.
TypeDetector	Content type detection based on a content type hint.
UByte	The `unsigned byte` type
UInteger	The `unsigned int` type
ULong	The `unsigned long` type
UMath
UnicodeBlockCounter
UniversalEncodingDetector
UnpackerResource
UnrarParser	Parser for Rar files.
Unsigned	A utility class for static access to unsigned number functionality.
UnsupportedFormatException	Parsers should throw this exception when they encounter a file format that they do not support.
UNumber	A base type for unsigned numbers.
URLEmailNormalizingFilterFactory	Factory for filter that normalizes urls and emails to __url__ and __email__ respectively.
UrlFetcher	Simple fetcher for URLs.
UShort	The `unsigned short` type
UuidUtils
VectorGraphicsOnlyPDFRenderer	This class extends the PDFRenderer to render only the vector graphics elements
WACZParser
WARC
WARCParser
WatchDogResult
WebPParser
WMFParser	This parser offers a very rough capability to extract text if there is text stored in the WMF files.
Word2006MLParser
WordExtractor
WordExtractor.TagAndStyle
WordMLParser	Parses wordml 2003 format word files.
WordPerfect	WordPerfect properties collection.
WordPerfectParser	Parser for Corel WordPerfect documents.
WriteLimiter
WriteLimitReachedException
WriteOutContentHandler	SAX event handler that writes content up to an optional write limit out to a character stream or other decorated handler.
XHTMLContentHandler	Content handler decorator that simplifies the task of producing XHTML events for Tika content parsers.
XLIFF12ContentHandler	Content Handler for XLIFF 1.2 documents.
XLIFF12Parser	Parser for XLIFF 1.2 files.
XLSXHREFFormatter
XLZParser	Parser for XLZ Archives.
XMLDOMUtil
XMLErrorLogUpdater	This is a very task specific class that reads a log file and updates the "comparisons" table.
XMLLogMsgHandler
XMLLogReader
XMLParser	XML parser.
XMLProfiler
XMLReaderUtils	Utility functions for reading XML.
XmlRootExtractor	Utility class that uses a `SAXParser` to determine the namespace URI and local name of the root element of an XML file.
XMP
XMPContentHandler	Content handler decorator that simplifies the task of producing XMP output.
XMPDM	XMP Dynamic Media schema.
XMPDM.ChannelTypePropertyConverter	Deprecated. Experimental method, will change shortly
XMPIdq
XMPMessageBodyWriter
XMPMetadata	Provides a conversion of the Metadata map from Tika to the XMP data model by also providing the Metadata API for clients to ease transition.
XMPMetadataExtractor	XMP Metadata Extractor based on Apache XmpBox.
XMPMetadataResource
XMPMM
XMPPacketScanner	This class is a parser for XMP packets.
XMPRights	XMP Rights management schema.
XMPSchemaIllustrator
XMPSchemaPDFUA
XMPSchemaPDFVT
XMPSchemaPDFX	This is somewhat of a hack to handle the older pdfx: schema. See also the more modern `XMPSchemaPDFXId`.
XMPSchemaPDFXId
XPathParser	Parser for a very simple XPath subset.
XPSExtractorDecorator
XPSTextExtractor	Currently, mostly a pass-through class to hold pkg and properties and keep the general framework similar to our other POI-integrated extractors.
XSLFEventBasedPowerPointExtractor
XSLFPowerPointExtractorDecorator
XSSFBExcelExtractorDecorator
XSSFExcelExtractorDecorator
XSSFExcelExtractorDecorator.HeaderFooterFromString
XSSFExcelExtractorDecorator.SheetTextAsHTML	Turns formatted sheet events into HTML
XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer	Captures information on interesting tags, whilst delegating the main work to the formatting handler
XUserDefinedCharset
XUserDefinedCharset.NotImplementedException
XWPFEventBasedWordExtractor	Experimental class that is based on POI's XSSFEventBasedExcelExtractor
XWPFListManager
XWPFNumberingShim	Stub class of POI's XWPFNumbering because onDocumentRead() is protected
XWPFStylesShim	For Tika, all we need (so far) is a mapping between styleId and a style's name.
XWPFWordExtractorDecorator
YandexTranslator	An implementation of a REST client for the Yandex Translate API.
ZeroByteFileException	Exception thrown by the AutoDetectParser when a file contains zero bytes.
ZeroByteFileException.IgnoreZeroByteFileException
ZeroSizeFileDetector	Detector to identify zero-length files as application/x-zerovalue
ZipContainerDetector	Classes that implement this must be able to detect on a ZipFile and in streaming mode.
ZipFilesChunking	This class is used to process zip file chunking
ZipHeader
ZipListFiles	Example code listing from Chapter 1.
ZipSalvager
ZipWriter
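
The `XmlRootExtractor` entry above describes using a `SAXParser` to determine the namespace URI and local name of an XML file's root element. The following is a minimal sketch of that general technique using only the JDK. It is not Tika's implementation: the class name `RootElementSketch` and the method `extractRoot` are hypothetical, and the choice to return `null` on any failure is an assumption made for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch class; not part of the Tika API.
final class RootElementSketch {

    // Records the first element seen, then aborts parsing by throwing.
    private static final class FirstElementHandler extends DefaultHandler {
        QName root; // stays null until the root element is reached

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            root = new QName(uri, localName);
            // The rest of the document is not needed, so stop here.
            throw new SAXException("root element found, aborting parse");
        }
    }

    // Returns the root element's namespace URI and local name,
    // or null if no root element could be read (assumed behavior).
    static QName extractRoot(InputStream in) {
        FirstElementHandler handler = new FirstElementHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to get URI and local name
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.newSAXParser().parse(in, handler);
        } catch (Exception e) {
            // Either the deliberate abort above or a genuine parse error;
            // in both cases handler.root holds whatever was found.
        }
        return handler.root;
    }

    public static void main(String[] args) {
        String xml = "<a:doc xmlns:a=\"urn:example\"><child/></a:doc>";
        QName root = extractRoot(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(root.getNamespaceURI() + " " + root.getLocalPart());
    }
}
```

Aborting with an exception on the first `startElement` call keeps the work proportional to the document prolog rather than the whole file, which matters when the root element is only being used as a type hint.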