public class ParserContainerExtractor extends Object implements ContainerExtractor
An implementation of ContainerExtractor powered by the regular Parser API. This allows you to easily extract all the embedded resources from within container files supported by normal Tika parsers. By default the AutoDetectParser will be used, to allow extraction from the widest range of containers.
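|Constructor and Description|
|ParserContainerExtractor(TikaConfig config)|
|Modifier and Type|Method and Description|
|void|extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler) Processes a container file, and extracts all the embedded resources from within it.|
|boolean|isSupported(TikaInputStream input) Is this Container Extractor able to process the supplied container?|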
public ParserContainerExtractor(TikaConfig config)
public boolean isSupported(TikaInputStream input) throws IOException
public void extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler) throws IOException, TikaException
The EmbeddedResourceHandler you supply will be called for each embedded resource in the container. It is up to you whether you process the contents of the resource or not.
The given document stream is consumed but not closed by this method. The responsibility to close the stream remains on the caller.
If required, nested containers (such as a .docx within a .zip) can automatically be recursed into and processed inline. If no recurseExtractor is given, nested containers will be treated like any other embedded resource.
Parameters:
stream - the document stream (input)
recurseExtractor - the extractor to use on any embedded containers
handler - handler for the embedded files (output)
Throws:
IOException - if the document stream could not be read
TikaException - if the container could not be parsed
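The recursion behaviour described for extract can be illustrated with a minimal, standard-library-only sketch. It is not Tika's implementation and uses none of Tika's classes: ResourceHandler, RecursionSketch, zipOf and namesSeen are hypothetical names made up for this example, it handles ZIP files only, and a boolean recurse flag stands in for passing (or not passing) a recurseExtractor. It shows the handler being called once per embedded resource, nested containers being either expanded inline or handed over as ordinary resources, and the input stream being consumed without being closed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class RecursionSketch {

    /** Hypothetical per-resource callback; a stand-in, not Tika's interface. */
    public interface ResourceHandler {
        void handle(String name, InputStream content) throws IOException;
    }

    /**
     * Walks a ZIP stream and calls the handler once per entry. With recurse set,
     * nested .zip entries are expanded inline; otherwise they are passed to the
     * handler like any other entry. The input stream is read but not closed here.
     */
    public static void extract(InputStream in, boolean recurse, ResourceHandler handler)
            throws IOException {
        ZipInputStream zip = new ZipInputStream(in); // deliberately left open for the caller
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            // Buffer the current entry so it can be re-read as its own stream.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            for (int n; (n = zip.read(chunk)) != -1; ) {
                buf.write(chunk, 0, n);
            }
            InputStream content = new ByteArrayInputStream(buf.toByteArray());
            if (recurse && entry.getName().endsWith(".zip")) {
                extract(content, true, handler);          // nested container, processed inline
            } else {
                handler.handle(entry.getName(), content); // ordinary embedded resource
            }
        }
    }

    /** Builds a small in-memory ZIP holding the given name/content pairs. */
    public static byte[] zipOf(String[] names, byte[][] contents) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (int i = 0; i < names.length; i++) {
                zip.putNextEntry(new ZipEntry(names[i]));
                zip.write(contents[i]);
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Returns the names the handler sees for an outer ZIP that contains an inner ZIP. */
    public static List<String> namesSeen(boolean recurse) throws IOException {
        byte[] inner = zipOf(new String[] {"inner.txt"}, new byte[][] {"hi".getBytes()});
        byte[] outer = zipOf(new String[] {"a.txt", "nested.zip"},
                             new byte[][] {"hello".getBytes(), inner});
        List<String> seen = new ArrayList<>();
        extract(new ByteArrayInputStream(outer), recurse, (name, content) -> seen.add(name));
        return seen;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(namesSeen(false)); // prints [a.txt, nested.zip]
        System.out.println(namesSeen(true));  // prints [a.txt, inner.txt]
    }
}
```

Without recursion the nested archive reaches the handler as a single resource. With recursion the handler sees the nested archive's contents instead, which mirrors the difference between omitting and supplying a recurseExtractor.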
Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.