public class RecursiveParserWrapper extends ParserDecorator
After parsing a document, call getMetadataList() on the RecursiveParserWrapperHandler that was passed to parse() to retrieve a list of Metadata objects, one for each embedded resource. The first item in the list contains the Metadata for the outer container file.
Content can also be extracted and stored in the TikaCoreProperties.TIKA_CONTENT field of a Metadata object. Select the type of content to be stored at initialization.
If a WriteLimitReachedException is encountered, the wrapper stops processing the current resource and does not process any of the child resources of that resource. However, it will try to parse as much as it can. If a WriteLimitReachedException is reached in the parent document, no child resources will be parsed.
The implementation is based on Jukka's RecursiveMetadataParser and Nick's additions. See: RecursiveMetadataParser.
Note that this wrapper holds all data in memory and is not appropriate for files with content too large to be held in memory.
The unit tests for this class are in the tika-parsers module.
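
The list ordering and write-limit behavior described above can be modeled with a small, self-contained sketch. It is not Tika's implementation and uses no Tika classes: `Resource`, `flatten`, and the map keys are hypothetical names, plain maps stand in for Metadata objects, and a per-resource character limit stands in for the write limit. Skipping every child of a truncated resource is a simplification of the rule stated above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlattenSketch {
    /** Hypothetical stand-in for a container or embedded resource; not a Tika type. */
    static final class Resource {
        final String name;
        final String text;
        final List<Resource> children;

        Resource(String name, String text, List<Resource> children) {
            this.name = name;
            this.text = text;
            this.children = children;
        }
    }

    // Illustrative keys only; the real wrapper stores content under
    // TikaCoreProperties.TIKA_CONTENT in a Metadata object.
    static final String NAME_KEY = "name";
    static final String CONTENT_KEY = "content";
    static final String LIMIT_KEY = "writeLimitReached";

    /** Returns one map per resource, container first. A negative writeLimit means unlimited. */
    static List<Map<String, String>> flatten(Resource root, int writeLimit) {
        List<Map<String, String>> out = new ArrayList<>();
        visit(root, writeLimit, out);
        return out;
    }

    private static void visit(Resource r, int writeLimit, List<Map<String, String>> out) {
        Map<String, String> md = new LinkedHashMap<>();
        md.put(NAME_KEY, r.name);
        boolean limitHit = writeLimit >= 0 && r.text.length() > writeLimit;
        // Keep as much content as the limit allows.
        md.put(CONTENT_KEY, limitHit ? r.text.substring(0, writeLimit) : r.text);
        md.put(LIMIT_KEY, String.valueOf(limitHit));
        out.add(md);
        if (limitHit) {
            return; // limit reached: the children of this resource are not processed
        }
        for (Resource child : r.children) {
            visit(child, writeLimit, out);
        }
    }

    public static void main(String[] args) {
        Resource hidden = new Resource("hidden.png", "", Collections.<Resource>emptyList());
        Resource inner = new Resource("inner.txt", "hello", Collections.<Resource>emptyList());
        Resource big = new Resource("big.docx", "0123456789ABCDEF", Arrays.asList(hidden));
        Resource zip = new Resource("outer.zip", "", Arrays.asList(inner, big));
        for (Map<String, String> md : flatten(zip, 10)) {
            System.out.println(md);
        }
    }
}
```

In this model the first map always describes the container, siblings of a truncated resource are still visited, and a limit hit in the root yields a list with a single entry.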
Constructor and Description |
---|
RecursiveParserWrapper(Parser wrappedParser): Initialize the wrapper with catchEmbeddedExceptions set to true as default. |
RecursiveParserWrapper(Parser wrappedParser, boolean catchEmbeddedExceptions) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> | getSupportedTypes(ParseContext context): Delegates the method call to the decorated parser. |
void | parse(InputStream stream, ContentHandler recursiveParserWrapperHandler, Metadata metadata, ParseContext context): Delegates the method call to the decorated parser. |
Methods inherited from class ParserDecorator: getDecorationName, getWrappedParser, withFallbacks, withoutTypes, withTypes
parse
public RecursiveParserWrapper(Parser wrappedParser)
Initialize the wrapper with catchEmbeddedExceptions set to true as default.
Parameters:
wrappedParser - parser to use for the container documents and the embedded documents
public RecursiveParserWrapper(Parser wrappedParser, boolean catchEmbeddedExceptions)
Parameters:
wrappedParser - parser to wrap
catchEmbeddedExceptions - whether or not to catch and record embedded exceptions. If set to false, embedded exceptions will be thrown and the rest of the file will not be parsed. The following will not be ignored: CorruptedFileException, RuntimeException
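
The difference between catching and recording an embedded exception versus throwing it can be illustrated with a minimal sketch. Everything here is hypothetical and independent of Tika: `EmbeddedParseException` stands in for a checked parse failure, `EmbeddedParser` for the wrapped parser, and a list of strings for the recorded results. As in the description above, a RuntimeException is never swallowed in this model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CatchEmbeddedSketch {
    /** Hypothetical checked exception standing in for a parse failure; not a Tika type. */
    static final class EmbeddedParseException extends Exception {
        EmbeddedParseException(String message) {
            super(message);
        }
    }

    /** Hypothetical parser for a single embedded resource. */
    interface EmbeddedParser {
        String parse(String resourceName) throws EmbeddedParseException;
    }

    static List<String> parseAll(List<String> names, EmbeddedParser parser,
                                 boolean catchEmbeddedExceptions) throws EmbeddedParseException {
        List<String> results = new ArrayList<>();
        for (String name : names) {
            try {
                results.add(name + ": " + parser.parse(name));
            } catch (EmbeddedParseException e) {
                if (!catchEmbeddedExceptions) {
                    throw e; // the remaining resources are not parsed
                }
                // Record the failure next to the resource and keep going.
                results.add(name + ": ERROR " + e.getMessage());
            }
            // RuntimeException is deliberately not caught: it propagates in both modes.
        }
        return results;
    }

    public static void main(String[] args) throws EmbeddedParseException {
        EmbeddedParser parser = name -> {
            if (name.startsWith("bad")) {
                throw new EmbeddedParseException("cannot parse " + name);
            }
            return "ok";
        };
        System.out.println(parseAll(Arrays.asList("a.txt", "bad.doc", "c.txt"), parser, true));
    }
}
```

With the flag set to true the failing resource is recorded and the later one is still parsed; with the flag set to false the same input ends with an exception after the first resource.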
public Set<MediaType> getSupportedTypes(ParseContext context)
Description copied from class: ParserDecorator
Delegates the method call to the decorated parser. Subclasses should override this method (and use super.getSupportedTypes() to invoke the decorated parser) to implement extra decoration.
Specified by: getSupportedTypes in interface Parser
Overrides: getSupportedTypes in class ParserDecorator
Parameters:
context - parse context
public void parse(InputStream stream, ContentHandler recursiveParserWrapperHandler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from class: ParserDecorator
Delegates the method call to the decorated parser. Subclasses should override this method (and use super.parse() to invoke the decorated parser) to implement extra decoration.
Specified by: parse in interface Parser
Overrides: parse in class ParserDecorator
Parameters:
stream -
recursiveParserWrapperHandler - handler must implement RecursiveParserWrapperHandler
metadata -
context -
Throws:
IOException
SAXException
TikaException
IllegalStateException - if the handler is not a RecursiveParserWrapperHandler
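
The handler requirement on parse() follows a common pattern: the parameter is declared with a general type, and the method checks for the specific subtype it needs at run time. A minimal sketch of that pattern, using hypothetical `Handler` and `RecursiveHandler` types in place of ContentHandler and RecursiveParserWrapperHandler, might look like this:

```java
import java.util.ArrayList;
import java.util.List;

final class HandlerCheckSketch {
    /** Hypothetical general handler type (plays the role of ContentHandler). */
    interface Handler {
    }

    /** Hypothetical specialised handler (plays the role of RecursiveParserWrapperHandler). */
    static final class RecursiveHandler implements Handler {
        final List<String> collected = new ArrayList<>();
    }

    /** Accepts any Handler in its signature but rejects unsupported ones at run time. */
    static void parse(String input, Handler handler) {
        if (!(handler instanceof RecursiveHandler)) {
            throw new IllegalStateException("handler must be a RecursiveHandler");
        }
        ((RecursiveHandler) handler).collected.add(input);
    }

    public static void main(String[] args) {
        RecursiveHandler handler = new RecursiveHandler();
        parse("doc", handler);
        System.out.println(handler.collected);
    }
}
```

Keeping the general type in the signature lets the method satisfy an inherited interface contract, at the cost of moving the type check from compile time to run time.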
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.