public class RecursiveParserWrapperHandler extends AbstractRecursiveParserWrapperHandler
This is the default implementation of AbstractRecursiveParserWrapperHandler.
See its documentation for more details.
This caches a metadata object for each embedded file and for the container file.
It places the extracted content in the metadata object under this key:
TikaCoreProperties.TIKA_CONTENT
If memory is a concern, subclass AbstractRecursiveParserWrapperHandler to handle each
embedded document as it is parsed instead of caching all of them.
NOTE: This handler must only be used with the RecursiveParserWrapper.
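The caching behavior described above can be pictured with a small, stdlib-only model. This is a sketch under stated assumptions, not the Tika implementation: `CachingHandlerSketch`, `CONTENT_KEY`, the `demo` helper, and the use of plain `Map<String, String>` objects in place of `Metadata` are all hypothetical stand-ins, and placing the container file first in the list is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only stand-in, NOT the Tika class: models "one metadata object per file".
class CachingHandlerSketch {
    // Hypothetical stand-in for TikaCoreProperties.TIKA_CONTENT.
    static final String CONTENT_KEY = "sketch:content";

    private final List<Map<String, String>> metadataList = new ArrayList<>();

    // After an embedded document is parsed: store its text in its metadata and cache it.
    void endEmbeddedDocument(String extractedText, Map<String, String> metadata) {
        metadataList.add(withContent(extractedText, metadata));
    }

    // After the full parse: cache the container file's metadata.
    // Assumption of this sketch: the container goes first in the list.
    void endDocument(String extractedText, Map<String, String> metadata) {
        metadataList.add(0, withContent(extractedText, metadata));
    }

    List<Map<String, String>> getMetadataList() {
        return metadataList;
    }

    // Copy the metadata and attach the extracted text under the content key.
    private static Map<String, String> withContent(String text, Map<String, String> metadata) {
        Map<String, String> copy = new LinkedHashMap<>(metadata);
        copy.put(CONTENT_KEY, text);
        return copy;
    }

    // Simulates a container with two embedded files.
    static List<Map<String, String>> demo() {
        CachingHandlerSketch handler = new CachingHandlerSketch();
        handler.endEmbeddedDocument("attachment text", Map.of("name", "a.txt"));
        handler.endEmbeddedDocument("image text", Map.of("name", "b.png"));
        handler.endDocument("container text", Map.of("name", "mail.eml"));
        return handler.getMetadataList();
    }
}
```

Because every file's text stays in memory until the caller reads the list, a container with many large attachments can be expensive, which is the memory concern the description mentions.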
| Modifier and Type | Field and Description |
|---|---|
| protected List<Metadata> | metadataList |

Fields inherited from class AbstractRecursiveParserWrapperHandler: EMBEDDED_RESOURCE_LIMIT_REACHED

| Constructor and Description |
|---|
| RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory): Create a handler with no limit on the number of embedded resources. |
| RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources): Create a handler that limits the number of embedded resources that will be parsed. |
| RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, MetadataFilter metadataFilter) |
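To illustrate the difference between the unlimited and the limited constructors, the following stdlib-only sketch models a counter that stops parsing once a maximum is reached. It is not the Tika code: `EmbeddedLimitSketch` and `parseAll` are hypothetical names, and treating `-1` as "no limit" is an assumption of this sketch. Only the method name `hasHitMaximumEmbeddedResources` is taken from this page.

```java
// Stdlib-only stand-in, NOT the Tika class: models the embedded-resource limit.
class EmbeddedLimitSketch {
    // Assumption of this sketch: -1 means "no limit".
    private final int maxEmbeddedResources;
    private int embeddedResources = 0;

    // Mirrors the idea of the one-argument constructor: no limit.
    EmbeddedLimitSketch() {
        this(-1);
    }

    // Mirrors the idea of the two-argument constructor: parse at most this many.
    EmbeddedLimitSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    // Count each embedded document as its parse begins.
    void startEmbeddedDocument() {
        embeddedResources++;
    }

    boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources > -1 && embeddedResources >= maxEmbeddedResources;
    }

    // Hypothetical driver: returns how many of the attachments were actually parsed.
    static int parseAll(EmbeddedLimitSketch handler, int attachmentCount) {
        int parsed = 0;
        for (int i = 0; i < attachmentCount; i++) {
            if (handler.hasHitMaximumEmbeddedResources()) {
                break; // limit reached: skip the remaining attachments
            }
            handler.startEmbeddedDocument();
            parsed++;
        }
        return parsed;
    }
}
```

In this model a limit of 2 over five attachments parses two of them, while the no-argument form parses all five.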
| Modifier and Type | Method and Description |
|---|---|
| void | endDocument(ContentHandler contentHandler, Metadata metadata): This is called after the full parse has completed. |
| void | endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called after parsing an embedded document. |
| List<Metadata> | getMetadataList() |
| void | startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called before parsing an embedded document. |
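For callers who subclass the abstract handler to avoid caching, the callback pattern might look like the stdlib-only sketch below. Everything here is a hypothetical stand-in rather than the Tika API: `BaseHandlerSketch` models only the idea that the base `endDocument` records a limit flag, `LIMIT_REACHED_KEY` stands in for EMBEDDED_RESOURCE_LIMIT_REACHED, and `StreamingHandlerSketch` writes to an in-memory list where a real subclass would write to a stream or queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in base class, NOT the Tika class: only models recording the limit flag.
class BaseHandlerSketch {
    // Hypothetical stand-in for EMBEDDED_RESOURCE_LIMIT_REACHED.
    static final String LIMIT_REACHED_KEY = "sketch:limit-reached";
    protected boolean limitHit;

    void endEmbeddedDocument(String text, Map<String, String> metadata) {
        // nothing to do in the base sketch
    }

    void endDocument(String text, Map<String, String> metadata) {
        if (limitHit) {
            metadata.put(LIMIT_REACHED_KEY, "true");
        }
    }
}

// Handles each document as it finishes instead of caching its metadata.
class StreamingHandlerSketch extends BaseHandlerSketch {
    // Stands in for an output sink such as a file or message queue.
    final List<String> emitted = new ArrayList<>();

    @Override
    void endEmbeddedDocument(String text, Map<String, String> metadata) {
        // Emit a summary right away; the text itself is not retained.
        emitted.add(metadata.get("name") + ": " + text.length() + " chars");
    }

    @Override
    void endDocument(String text, Map<String, String> metadata) {
        // Call super first so the limit flag is present before the metadata is used.
        super.endDocument(text, metadata);
        emitted.add("container limitReached=" + metadata.containsKey(LIMIT_REACHED_KEY));
    }
}
```

Calling the superclass method before using the container metadata is the design point: in this model, skipping it would silently drop the limit flag.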
Methods inherited from class AbstractRecursiveParserWrapperHandler: getContentHandlerFactory, getNewContentHandler, getNewContentHandler, hasHitMaximumEmbeddedResources
Methods inherited from class DefaultHandler: characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning

public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)

public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
Parameters:
maxEmbeddedResources - number of embedded resources that will be parsed

public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, MetadataFilter metadataFilter)

public void startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Overrides: startEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - local content handler to use on the embedded document
metadata - metadata to use for the embedded document
Throws: SAXException

public void endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Overrides: endEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - local content handler used on the embedded document
metadata - metadata from the embedded document
Throws: SAXException

public void endDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Description copied from class: AbstractRecursiveParserWrapperHandler
This is called after the full parse has completed. Make sure to call super.endDocument(...)
in subclasses, because it records in the metadata whether the embedded resource
maximum has been hit.
Overrides: endDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - content handler used on the main document
metadata - metadata from the main document
Throws: SAXException

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.