@Version("1.0.0")
Package org.apache.tika.extractor
Extraction of component documents.
Interface Summary
Interface: Description
ContainerExtractor: Tika container extractor interface.
DocumentSelector: Interface for different document selection strategies for purposes like embedded document extraction by a ContainerExtractor instance.
EmbeddedBytesSelector
EmbeddedDocumentBytesHandler
EmbeddedDocumentByteStoreExtractorFactory: Factories that create EmbeddedDocumentExtractors requiring an EmbeddedDocumentBytesHandler in the ParseContext should extend this.
EmbeddedDocumentExtractor
EmbeddedDocumentExtractorFactory
EmbeddedResourceHandler: Tika container extractor callback interface.
EmbeddedStreamTranslator: Interface for different filtering of embedded streams.
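The idea of a pluggable document selection strategy can be illustrated with a small self-contained sketch. It deliberately does not use the Tika types: Selector, byContentTypePrefix, selectedNames, and the metadata keys are hypothetical stand-ins chosen for illustration, and the metadata is modelled as a plain Map. Treat it as a sketch of the pattern under those assumptions, not as the library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins only: these are NOT the Tika types. They model a
// pluggable strategy that decides, from metadata alone, whether an embedded
// document should be extracted.
class SelectorSketch {

    /** Hypothetical strategy interface: return true to extract the document. */
    interface Selector {
        boolean select(Map<String, String> metadata);
    }

    /** Hypothetical strategy: accept documents whose content type starts with the prefix. */
    static Selector byContentTypePrefix(String prefix) {
        return metadata -> metadata.getOrDefault("Content-Type", "").startsWith(prefix);
    }

    /** Applies the strategy to each document's metadata and returns the selected names. */
    static List<String> selectedNames(List<Map<String, String>> embedded, Selector selector) {
        List<String> names = new ArrayList<>();
        for (Map<String, String> metadata : embedded) {
            if (selector.select(metadata)) {
                names.add(metadata.getOrDefault("resourceName", "(unnamed)"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Map<String, String>> embedded = List.of(
                Map.of("resourceName", "logo.png", "Content-Type", "image/png"),
                Map.of("resourceName", "notes.txt", "Content-Type", "text/plain"),
                Map.of("resourceName", "photo.jpg", "Content-Type", "image/jpeg"));
        // Only the two image entries pass the strategy.
        System.out.println(selectedNames(embedded, byContentTypePrefix("image/")));
    }
}
```

Keeping the decision behind a single-method interface means the extraction loop does not change when the selection rule does, which is the point of having different strategies.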
Class Summary
Class: Description
AbstractEmbeddedDocumentBytesHandler
BasicEmbeddedBytesSelector
BasicEmbeddedDocumentBytesHandler: For now, this is an in-memory EmbeddedDocumentBytesHandler that stores all the bytes in memory.
DefaultEmbeddedStreamTranslator: Loads EmbeddedStreamTranslators via service loading.
EmbeddedBytesSelector.AcceptAll
EmbeddedDocumentUtil: Utility class to handle common issues with embedded documents.
ParentContentHandler: Simple pointer class to allow parsers to pass on the parent content handler through to the embedded document's parse.
ParserContainerExtractor: An implementation of ContainerExtractor powered by the regular Parser API.
ParsingEmbeddedDocumentExtractor: Helper class for parsers of package archives or other compound document formats that support embedded or attached component documents.
ParsingEmbeddedDocumentExtractorFactory
RUnpackExtractor: Recursive Unpacker and text and metadata extractor.
RUnpackExtractorFactory
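Recursive unpacking of a container whose embedded documents may themselves be containers can be sketched with the JDK alone. The example below uses java.util.zip rather than any Tika class; EntryHandler, unpack, listPaths, and zipOf are hypothetical names introduced for illustration, and treating every entry ending in ".zip" as a nested container is an assumption of the sketch, not a statement about how the classes above detect formats.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative only: plain java.util.zip, not the Tika API.
class RecursiveUnpackSketch {

    /** Hypothetical callback, invoked once per embedded entry. */
    interface EntryHandler {
        void handle(String path, byte[] bytes);
    }

    /** Walks a ZIP stream and descends into entries whose names end in ".zip". */
    static void unpack(InputStream in, String parentPath, EntryHandler handler) throws IOException {
        // The stream is not closed here; the caller owns it.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String path = parentPath + "/" + entry.getName();
            byte[] bytes = zip.readAllBytes(); // reads the current entry only
            handler.handle(path, bytes);
            if (entry.getName().endsWith(".zip")) {
                unpack(new ByteArrayInputStream(bytes), path, handler);
            }
        }
    }

    /** Convenience wrapper: returns the paths of all entries, in visiting order. */
    static List<String> listPaths(byte[] zipBytes) {
        List<String> seen = new ArrayList<>();
        try {
            unpack(new ByteArrayInputStream(zipBytes), "", (path, bytes) -> seen.add(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    /** Builds an in-memory ZIP from name/content pairs, preserving map order. */
    static byte[] zipOf(Map<String, byte[]> entries) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> inner = new LinkedHashMap<>();
        inner.put("deep.txt", "nested".getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> outer = new LinkedHashMap<>();
        outer.put("readme.txt", "top".getBytes(StandardCharsets.UTF_8));
        outer.put("inner.zip", zipOf(inner));
        System.out.println(listPaths(zipOf(outer)));
    }
}
```

The callback receives each embedded entry before the walker descends into it, so a caller sees the nested archive itself as well as its contents.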