Package org.apache.tika.extractor
Class ParserContainerExtractor
- java.lang.Object
  - org.apache.tika.extractor.ParserContainerExtractor

All Implemented Interfaces:
- Serializable, ContainerExtractor
 
public class ParserContainerExtractor extends Object implements ContainerExtractor

An implementation of ContainerExtractor powered by the regular Parser API. This allows you to easily extract out all the embedded resources from within container files supported by normal Tika parsers. By default the AutoDetectParser will be used, to allow extraction from the widest range of containers.

See Also:
- Serialized Form
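
As an illustration of the description above, the following is a minimal sketch of listing the embedded resources of a container file. It assumes tika-core and the Tika parser modules are on the classpath. The class names ListEmbeddedResources and RecordingHandler are hypothetical and not part of Tika, and the input path comes from the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.tika.exception.TikaException;
import org.apache.tika.extractor.ContainerExtractor;
import org.apache.tika.extractor.EmbeddedResourceHandler;
import org.apache.tika.extractor.ParserContainerExtractor;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.mime.MediaType;

public class ListEmbeddedResources {

    /** Hypothetical handler that records the name and type of each embedded resource. */
    public static class RecordingHandler implements EmbeddedResourceHandler {
        public final List<String> seen = new ArrayList<>();

        @Override
        public void handle(String filename, MediaType mediaType, InputStream stream) {
            // The resource stream is deliberately left unread; only metadata is recorded.
            seen.add(filename + " (" + mediaType + ")");
        }
    }

    public static void main(String[] args) throws IOException, TikaException {
        if (args.length != 1) {
            System.err.println("usage: ListEmbeddedResources <container-file>");
            return;
        }
        Path container = Paths.get(args[0]);

        // Default constructor: per the class description, AutoDetectParser is used.
        ContainerExtractor extractor = new ParserContainerExtractor();
        RecordingHandler handler = new RecordingHandler();

        // extract() does not close the stream, so the caller closes it here.
        try (TikaInputStream stream = TikaInputStream.get(container)) {
            if (extractor.isSupported(stream)) {
                // No recurseExtractor: nested containers are reported like any other resource.
                extractor.extract(stream, null, handler);
            }
        }
        handler.seen.forEach(System.out::println);
    }
}
```

Which container formats are reported as supported depends on the parsers available at runtime.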
 
Constructor Summary

Constructors:
- ParserContainerExtractor()
- ParserContainerExtractor(TikaConfig config)
- ParserContainerExtractor(Parser parser, Detector detector)

Method Summary

Modifier and Type, Method, Description:
- void extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler)
  Processes a container file, and extracts all the embedded resources from within it.
- boolean isSupported(TikaInputStream input)
  Is this Container Extractor able to process the supplied container?
 
Constructor Detail

ParserContainerExtractor
public ParserContainerExtractor()

ParserContainerExtractor
public ParserContainerExtractor(TikaConfig config)
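
The constructors listed in the summary could be called as in the sketch below. The class name ExtractorConstruction and its method names are hypothetical. Pairing AutoDetectParser with DefaultDetector in the two-argument form is an assumption made for illustration, not something this page specifies.

```java
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.extractor.ParserContainerExtractor;
import org.apache.tika.parser.AutoDetectParser;

public class ExtractorConstruction {

    /** No-argument form: relies on the library defaults. */
    public static ParserContainerExtractor withDefaults() {
        return new ParserContainerExtractor();
    }

    /** TikaConfig form: here the default configuration is passed explicitly. */
    public static ParserContainerExtractor withConfig() {
        return new ParserContainerExtractor(TikaConfig.getDefaultConfig());
    }

    /** Parser and Detector form: the parser/detector pairing is an assumed example. */
    public static ParserContainerExtractor withExplicitParser() {
        return new ParserContainerExtractor(new AutoDetectParser(), new DefaultDetector());
    }
}
```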
 
Method Detail

isSupported
public boolean isSupported(TikaInputStream input) throws IOException

Description copied from interface: ContainerExtractor
Is this Container Extractor able to process the supplied container?

Specified by:
- isSupported in interface ContainerExtractor
Throws:
- IOException
 
extract
public void extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler) throws IOException, TikaException

Description copied from interface: ContainerExtractor
Processes a container file, and extracts all the embedded resources from within it.

The EmbeddedResourceHandler you supply will be called for each embedded resource in the container. It is up to you whether you process the contents of the resource or not.

The given document stream is consumed but not closed by this method. The responsibility to close the stream remains with the caller.

If required, nested containers (such as a .docx within a .zip) can automatically be recursed into, and processed inline. If no recurseExtractor is given, nested containers will be treated like any other embedded resource.

Specified by:
- extract in interface ContainerExtractor
Parameters:
- stream - the document stream (input)
- recurseExtractor - the extractor to use on any embedded containers
- handler - handler for the embedded files (output)
Throws:
- IOException - if the document stream could not be read
- TikaException - if the container could not be parsed
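
The recursion and stream-ownership rules described for extract can be sketched as follows. This is a hedged example, assuming tika-core and the Tika parser modules are on the classpath. RecursiveExtraction, TallyHandler and tally are hypothetical names, and passing the extractor itself as the recurseExtractor is one possible choice rather than a requirement.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Path;

import org.apache.tika.exception.TikaException;
import org.apache.tika.extractor.ContainerExtractor;
import org.apache.tika.extractor.EmbeddedResourceHandler;
import org.apache.tika.extractor.ParserContainerExtractor;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.mime.MediaType;

public class RecursiveExtraction {

    /** Hypothetical handler that reads every resource and tallies count and size. */
    public static class TallyHandler implements EmbeddedResourceHandler {
        public int resources = 0;
        public long bytes = 0;

        @Override
        public void handle(String filename, MediaType mediaType, InputStream stream) {
            resources++;
            byte[] buffer = new byte[8192];
            try {
                int n;
                while ((n = stream.read(buffer)) != -1) {
                    bytes += n;
                }
            } catch (IOException e) {
                // handle() declares no checked exceptions, so rethrow unchecked.
                throw new UncheckedIOException(e);
            }
        }
    }

    /**
     * Extracts all embedded resources from the given file.
     * With recurse == true the same extractor is used for nested containers;
     * with recurse == false no recurseExtractor is given.
     */
    public static TallyHandler tally(Path container, boolean recurse)
            throws IOException, TikaException {
        ContainerExtractor extractor = new ParserContainerExtractor();
        TallyHandler handler = new TallyHandler();
        // The stream is consumed but not closed by extract(), so close it here.
        try (TikaInputStream stream = TikaInputStream.get(container)) {
            extractor.extract(stream, recurse ? extractor : null, handler);
        }
        return handler;
    }
}
```

Comparing the tallies from tally(path, true) and tally(path, false) on a file with nested containers is one way to observe the effect of the recurseExtractor argument.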
 
 