public interface ContainerExtractor extends Serializable
Modifier and Type | Method and Description |
---|---|
void | extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler) Processes a container file, and extracts all the embedded resources from within it. |
boolean | isSupported(TikaInputStream input) Is this Container Extractor able to process the supplied container? |
boolean isSupported(TikaInputStream input) throws IOException
Throws: IOException
void extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler) throws IOException, TikaException
The EmbeddedResourceHandler you supply will be called for each embedded resource in the container. It is up to you whether you process the contents of the resource or not.
The given document stream is consumed but not closed by this method. The responsibility to close the stream remains on the caller.
If required, nested containers (such as a .docx within a .zip) can automatically be recursed into, and processed inline. If no recurseExtractor is given, nested containers will be treated like any other embedded resource.
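The recursion contract described above can be illustrated with a small, self-contained sketch. It does not use Tika itself: `ContainerSketch`, `Extractor`, `ResourceHandler` and `ZipExtractor` are hypothetical stand-ins that only mirror the shape of `isSupported` and `extract`, using plain `InputStream` and `java.util.zip`. It assumes the streams passed to `isSupported` support mark/reset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical stand-ins, NOT Tika's real types: they only mirror the
// shape of the contract described above, using plain InputStream.
class ContainerSketch {

    interface ResourceHandler {
        void handle(String name, InputStream resource) throws IOException;
    }

    interface Extractor {
        boolean isSupported(InputStream input) throws IOException;

        void extract(InputStream stream, Extractor recurseExtractor,
                     ResourceHandler handler) throws IOException;
    }

    /** Zip-only extractor; isSupported expects a stream with mark/reset. */
    static class ZipExtractor implements Extractor {
        @Override
        public boolean isSupported(InputStream input) throws IOException {
            // Peek at the first two bytes ("PK" for zip) and rewind.
            input.mark(2);
            int b0 = input.read();
            int b1 = input.read();
            input.reset();
            return b0 == 'P' && b1 == 'K';
        }

        @Override
        public void extract(InputStream stream, Extractor recurseExtractor,
                            ResourceHandler handler) throws IOException {
            // The stream is consumed but deliberately not closed here;
            // closing it stays the caller's responsibility.
            ZipInputStream zip = new ZipInputStream(stream);
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Buffer the entry so it can be peeked at and then re-read.
                InputStream resource = new ByteArrayInputStream(zip.readAllBytes());
                if (recurseExtractor != null && recurseExtractor.isSupported(resource)) {
                    // Nested container: process its contents inline.
                    recurseExtractor.extract(resource, recurseExtractor, handler);
                } else {
                    // No recursion requested (or not a container): hand it over as-is.
                    handler.handle(entry.getName(), resource);
                }
            }
        }
    }

    /** Builds a one-entry zip archive in memory. */
    static byte[] zip(String name, byte[] content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(content);
            zos.closeEntry();
        }
        return out.toByteArray();
    }

    /** Collects the names of the resources the handler is called with. */
    static List<String> names(byte[] container, boolean recurse) throws IOException {
        List<String> seen = new ArrayList<>();
        Extractor extractor = new ZipExtractor();
        // The caller opens and closes the stream, as the contract requires.
        try (InputStream in = new ByteArrayInputStream(container)) {
            extractor.extract(in, recurse ? extractor : null,
                    (name, resource) -> seen.add(name));
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        byte[] inner = zip("note.txt", "hello".getBytes(StandardCharsets.UTF_8));
        byte[] outer = zip("inner.zip", inner);
        System.out.println(names(outer, false)); // prints [inner.zip]
        System.out.println(names(outer, true));  // prints [note.txt]
    }
}
```

With no recurse extractor, the nested archive reaches the handler as a single resource. With one, the handler sees the nested archive's contents instead. A real implementation would work with `TikaInputStream` and the Tika handler types rather than these simplified interfaces.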
Parameters:
stream - the document stream (input)
recurseExtractor - the extractor to use on any embedded containers
handler - handler for the embedded files (output)
Throws:
IOException - if the document stream could not be read
TikaException - if the container could not be parsed

Copyright © 2007–1969 The Apache Software Foundation. All rights reserved.