public class RecursiveParserWrapperHandler extends AbstractRecursiveParserWrapperHandler
This class is an implementation of AbstractRecursiveParserWrapperHandler. See its documentation for more details.
This caches a metadata object for each embedded file and for the container file.
It places the extracted content in the metadata object, with this key: AbstractRecursiveParserWrapperHandler.TIKA_CONTENT
If memory is a concern, subclass AbstractRecursiveParserWrapperHandler to handle each
embedded document.
NOTE: This handler must only be used with the RecursiveParserWrapper.
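The description above can be illustrated with a minimal sketch. It assumes the Tika 1.x API documented on this page, with Tika on the classpath; the class name `RecursiveParseExample` and the method `parseAll` are hypothetical, and `AutoDetectParser` and `BasicContentHandlerFactory` are just one possible choice of wrapped parser and content handler factory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.AbstractRecursiveParserWrapperHandler;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.RecursiveParserWrapperHandler;

// Hypothetical example class; not part of Tika.
public class RecursiveParseExample {

    // Parses the stream and returns one Metadata object per document
    // (the container file plus each embedded file that was parsed).
    public static List<Metadata> parseAll(InputStream stream) throws Exception {
        // The handler must be driven by a RecursiveParserWrapper.
        RecursiveParserWrapper wrapper = new RecursiveParserWrapper(new AutoDetectParser());

        // Plain-text content, write limit of -1 (assumed here to mean "no limit").
        RecursiveParserWrapperHandler handler = new RecursiveParserWrapperHandler(
                new BasicContentHandlerFactory(
                        BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));

        wrapper.parse(stream, handler, new Metadata(), new ParseContext());
        return handler.getMetadataList();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            for (Metadata metadata : parseAll(in)) {
                // Extracted content is stored under the TIKA_CONTENT key.
                System.out.println(
                        metadata.get(AbstractRecursiveParserWrapperHandler.TIKA_CONTENT));
            }
        }
    }
}
```

Because every metadata object, including its extracted content, is held in memory until the parse finishes, this pattern is best suited to files of moderate size.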
Modifier and Type | Field and Description |
---|---|
protected List<Metadata> | metadataList |
Fields inherited from class AbstractRecursiveParserWrapperHandler: CONTAINER_EXCEPTION, EMBEDDED_DEPTH, EMBEDDED_EXCEPTION, EMBEDDED_RESOURCE_LIMIT_REACHED, EMBEDDED_RESOURCE_PATH, PARSE_TIME_MILLIS, TIKA_CONTENT, TIKA_CONTENT_HANDLER, WRITE_LIMIT_REACHED
Constructor and Description |
---|
RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory): Create a handler with no limit on the number of embedded resources. |
RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources): Create a handler that limits the number of embedded resources that will be parsed. |
RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, int maxWriteLimit, MetadataFilter metadataFilter) |
Modifier and Type | Method and Description |
---|---|
void | endDocument(ContentHandler contentHandler, Metadata metadata): This is called after the full parse has completed. |
void | endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called after parsing an embedded document. |
List<Metadata> | getMetadataList() |
void | startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called before parsing an embedded document. |
Methods inherited from class AbstractRecursiveParserWrapperHandler: getContentHandlerFactory, getNewContentHandler, getNewContentHandler, getTotalWriteLimit, hasHitMaximumEmbeddedResources
Methods inherited from class org.xml.sax.helpers.DefaultHandler: characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
Parameters:
maxEmbeddedResources - number of embedded resources that will be parsed
public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, int maxWriteLimit, MetadataFilter metadataFilter)
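As a hedged sketch of the two-argument constructor, the following caps the number of embedded resources and then checks whether the limit was reported. The class `LimitedRecursiveParseExample` and the method `hitEmbeddedLimit` are hypothetical names. The code assumes the Tika 1.x API on this page and only tests whether any returned metadata object carries a value for EMBEDDED_RESOURCE_LIMIT_REACHED, without assuming what that value looks like.

```java
import java.io.InputStream;
import java.util.List;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.AbstractRecursiveParserWrapperHandler;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.RecursiveParserWrapperHandler;

// Hypothetical example class; not part of Tika.
public class LimitedRecursiveParseExample {

    // Returns true if, after parsing with the given cap, any metadata object
    // carries the EMBEDDED_RESOURCE_LIMIT_REACHED key.
    public static boolean hitEmbeddedLimit(InputStream stream, int maxEmbeddedResources)
            throws Exception {
        RecursiveParserWrapper wrapper = new RecursiveParserWrapper(new AutoDetectParser());

        // Second argument: number of embedded resources that will be parsed.
        RecursiveParserWrapperHandler handler = new RecursiveParserWrapperHandler(
                new BasicContentHandlerFactory(
                        BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1),
                maxEmbeddedResources);

        wrapper.parse(stream, handler, new Metadata(), new ParseContext());

        List<Metadata> metadataList = handler.getMetadataList();
        for (Metadata metadata : metadataList) {
            // Assumption: the key is absent when the limit was not reached.
            if (metadata.get(
                    AbstractRecursiveParserWrapperHandler.EMBEDDED_RESOURCE_LIMIT_REACHED)
                    != null) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could use the result to flag container files whose embedded documents were only partially processed.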
public void startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Overrides: startEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - local content handler to use on the embedded document
metadata - metadata to use for the embedded document
Throws:
SAXException
public void endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Overrides: endEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - local content handler used on the embedded document
metadata - metadata from the embedded document
Throws:
SAXException
public void endDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
This is called after the full parse has completed. Subclasses that override this method should call super.endDocument(...), because the AbstractRecursiveParserWrapperHandler implementation adds whether or not the embedded resource maximum has been hit to the metadata.
Overrides: endDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - content handler used on the main document
metadata - metadata from the main document
Throws:
SAXException
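The callbacks described above can be overridden in a subclass. The following is a minimal sketch, assuming the Tika 1.x API on this page; `CountingHandlerExample`, `CountingHandler`, and their methods are hypothetical names introduced for illustration. Each override delegates to the superclass first so that the default caching and bookkeeping still happen.

```java
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.ContentHandlerFactory;
import org.apache.tika.sax.RecursiveParserWrapperHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical example class; not part of Tika.
public class CountingHandlerExample {

    // Counts embedded documents while keeping the default caching behavior.
    public static class CountingHandler extends RecursiveParserWrapperHandler {
        private int embeddedCount;
        private boolean finished;

        public CountingHandler(ContentHandlerFactory factory) {
            super(factory);
        }

        @Override
        public void endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata)
                throws SAXException {
            // Let the superclass cache the embedded document's metadata first.
            super.endEmbeddedDocument(contentHandler, metadata);
            embeddedCount++;
        }

        @Override
        public void endDocument(ContentHandler contentHandler, Metadata metadata)
                throws SAXException {
            // Required: the superclass records the embedded-resource-limit status.
            super.endDocument(contentHandler, metadata);
            finished = true;
        }

        public int getEmbeddedCount() {
            return embeddedCount;
        }

        public boolean isFinished() {
            return finished;
        }
    }

    // Parses the stream with a CountingHandler and returns the handler.
    public static CountingHandler parse(InputStream stream) throws Exception {
        CountingHandler handler = new CountingHandler(
                new BasicContentHandlerFactory(
                        BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));
        new RecursiveParserWrapper(new AutoDetectParser())
                .parse(stream, handler, new Metadata(), new ParseContext());
        return handler;
    }
}
```

The counter lives in the handler rather than in the metadata object, so the sketch does not depend on how the superclass stores or copies metadata internally.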
Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.