java.lang.Object
  org.apache.tika.extractor.ParserContainerExtractor
public class ParserContainerExtractor
An implementation of ContainerExtractor powered by the regular Parser API. This lets you extract all the embedded resources from container files supported by the normal Tika parsers. By default, AutoDetectParser is used, which allows extraction from the widest range of containers.
Constructor Summary |
---|
ParserContainerExtractor() |
ParserContainerExtractor(Parser parser, Detector detector) |
ParserContainerExtractor(TikaConfig config) |
Method Summary | |
---|---|
void | extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler): Processes a container file, and extracts all the embedded resources from within it. |
boolean | isSupported(TikaInputStream input): Is this Container Extractor able to process the supplied container? |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ParserContainerExtractor()
public ParserContainerExtractor(TikaConfig config)
public ParserContainerExtractor(Parser parser, Detector detector)
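The three constructors can be sketched side by side as follows. This is a minimal illustration, not part of the original documentation: the class name ExtractorSetup and its method names are hypothetical, and pairing AutoDetectParser with DefaultDetector in the two-argument form is just one plausible choice. It assumes the Tika core library is on the classpath.

```java
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.detect.Detector;
import org.apache.tika.extractor.ContainerExtractor;
import org.apache.tika.extractor.ParserContainerExtractor;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;

// Hypothetical helper class, used only to illustrate the constructors.
public class ExtractorSetup {

    // No-argument form: per the class description, AutoDetectParser is used by default.
    static ContainerExtractor withDefaults() {
        return new ParserContainerExtractor();
    }

    // Configuration-driven form: the caller supplies the TikaConfig.
    static ContainerExtractor withConfig(TikaConfig config) {
        return new ParserContainerExtractor(config);
    }

    // Explicit form: the caller picks both the parser and the detector.
    // The particular pairing below is an assumption made for this sketch.
    static ContainerExtractor withExplicitParts() {
        Parser parser = new AutoDetectParser();
        Detector detector = new DefaultDetector();
        return new ParserContainerExtractor(parser, detector);
    }
}
```

The explicit form is the one to reach for when extraction should be limited to a specific parser rather than everything AutoDetectParser can handle.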
Method Detail |
---|
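public boolean isSupported(TikaInputStream input) throws java.io.IOException
Description copied from interface: ContainerExtractor
Is this Container Extractor able to process the supplied container?
Specified by: isSupported in interface ContainerExtractor
Throws: java.io.IOException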
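A support check before extraction might look like the sketch below. The class SupportCheck and the method canExtract are hypothetical names introduced for illustration. The result depends on which Tika parser modules are present on the classpath, so treat this as an assumption-laden example rather than a reference usage.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.tika.extractor.ContainerExtractor;
import org.apache.tika.extractor.ParserContainerExtractor;
import org.apache.tika.io.TikaInputStream;

// Hypothetical helper class, used only to illustrate isSupported.
public class SupportCheck {

    // Returns true if the default extractor reports that it can process the file.
    static boolean canExtract(Path file) throws IOException {
        ContainerExtractor extractor = new ParserContainerExtractor();
        // The caller opens the stream, so the caller closes it (try-with-resources).
        try (TikaInputStream stream = TikaInputStream.get(Files.newInputStream(file))) {
            return extractor.isSupported(stream);
        }
    }
}
```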
public void extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler) throws java.io.IOException, TikaException
Description copied from interface: ContainerExtractor
Processes a container file, and extracts all the embedded resources from within it.
The EmbeddedResourceHandler you supply will be called for each embedded resource in the container. It is up to you whether you process the contents of the resource or not.
The given document stream is consumed but not closed by this method. The caller remains responsible for closing the stream.
If required, nested containers (such as a .docx within a .zip) can automatically be recursed into and processed inline. If no recurseExtractor is given, nested containers are treated like any other embedded resource.
Specified by: extract in interface ContainerExtractor
Parameters:
stream - the document stream (input)
recurseExtractor - the extractor to use on any embedded containers
handler - handler for the embedded files (output)
Throws:
java.io.IOException - if the document stream could not be read
TikaException - if the container could not be parsed
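The contract described for extract can be illustrated with a short sketch: a handler is called once per embedded resource, the caller closes the stream, and passing the extractor itself as recurseExtractor turns on recursion into nested containers. The class ListEmbedded, the methods listEmbedded and sampleZip, and the in-memory zip are all hypothetical and exist only for this example. It assumes the Tika parser modules are on the classpath so that zip files are recognised.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.extractor.ContainerExtractor;
import org.apache.tika.extractor.EmbeddedResourceHandler;
import org.apache.tika.extractor.ParserContainerExtractor;
import org.apache.tika.io.TikaInputStream;

// Hypothetical helper class, used only to illustrate extract.
public class ListEmbedded {

    // Collects "name (media type)" for every embedded resource the handler is given.
    static List<String> listEmbedded(byte[] container, boolean recurse)
            throws IOException, TikaException {
        ContainerExtractor extractor = new ParserContainerExtractor();
        List<String> seen = new ArrayList<>();

        // Called once per embedded resource; this sketch ignores the content stream.
        EmbeddedResourceHandler handler =
                (filename, mediaType, stream) -> seen.add(filename + " (" + mediaType + ")");

        // extract consumes the stream but does not close it, so close it here.
        try (TikaInputStream stream = TikaInputStream.get(new ByteArrayInputStream(container))) {
            // A null recurseExtractor treats nested containers like any other resource.
            extractor.extract(stream, recurse ? extractor : null, handler);
        }
        return seen;
    }

    // Builds a tiny zip in memory with a single text entry, for demonstration only.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        for (String entry : listEmbedded(sampleZip(), true)) {
            System.out.println(entry);
        }
    }
}
```

A handler that needs the resource contents would read from the stream it is passed instead of ignoring it.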