public class RecursiveParserWrapperHandler extends AbstractRecursiveParserWrapperHandler
This is the default implementation of AbstractRecursiveParserWrapperHandler.
 See its documentation for more details.
 
 This caches a metadata object for each embedded file and for the container file.
 It places the extracted content in each metadata object under the key
 TikaCoreProperties.TIKA_CONTENT.
 If memory is a concern, subclass AbstractRecursiveParserWrapperHandler to handle each
 embedded document as it is parsed instead of caching them all.
 
 NOTE: This handler must only be used with the RecursiveParserWrapper.
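The trade-off described above (caching every metadata object versus handling each document as it arrives) can be pictured with a small stand-alone model. This is a sketch only, not Tika code: `MetadataCachingSketch`, its two nested handlers, the `Map<String, String>` stand-in for `Metadata`, and the `"content"` and `"resourceName"` key strings are all hypothetical. Placing the container's metadata first in the list is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Stand-alone model of the two strategies; none of these types are Tika classes.
final class MetadataCachingSketch {

    // Hypothetical stand-in for the TikaCoreProperties.TIKA_CONTENT key.
    static final String CONTENT_KEY = "content";

    /** Caching strategy: keeps one metadata map per document until the parse ends. */
    static final class CachingHandler {
        private final List<Map<String, String>> metadataList = new ArrayList<>();

        void endEmbeddedDocument(String name, String text) {
            metadataList.add(toMetadata(name, text));
        }

        void endDocument(String name, String text) {
            // Assumption for this sketch: the container's metadata goes first.
            metadataList.add(0, toMetadata(name, text));
        }

        List<Map<String, String>> getMetadataList() {
            return metadataList;
        }
    }

    /** Streaming strategy: hands each document to a callback and retains nothing. */
    static final class StreamingHandler {
        private final Consumer<Map<String, String>> sink;

        StreamingHandler(Consumer<Map<String, String>> sink) {
            this.sink = sink;
        }

        void endEmbeddedDocument(String name, String text) {
            sink.accept(toMetadata(name, text));
        }

        void endDocument(String name, String text) {
            sink.accept(toMetadata(name, text));
        }
    }

    /** Builds a metadata map that carries the extracted text under CONTENT_KEY. */
    static Map<String, String> toMetadata(String name, String text) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("resourceName", name);
        metadata.put(CONTENT_KEY, text);
        return metadata;
    }

    public static void main(String[] args) {
        CachingHandler caching = new CachingHandler();
        caching.endEmbeddedDocument("image.png", "");
        caching.endEmbeddedDocument("notes.txt", "hello");
        caching.endDocument("report.docx", "body text");
        for (Map<String, String> metadata : caching.getMetadataList()) {
            System.out.println(metadata.get("resourceName") + " -> " + metadata.get(CONTENT_KEY));
        }
    }
}
```

In the caching variant, memory use grows with the number of embedded documents and the size of their extracted text. The streaming variant holds at most one document's metadata at a time, which is the motivation for subclassing when memory is a concern.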
 
| Modifier and Type | Field |
|---|---|
| protected List<Metadata> | metadataList |

Fields inherited from class AbstractRecursiveParserWrapperHandler: EMBEDDED_RESOURCE_LIMIT_REACHED

| Constructor | Description |
|---|---|
| RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory) | Create a handler with no limit on the number of embedded resources. |
| RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources) | Create a handler that limits the number of embedded resources that will be parsed. |
| RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, MetadataFilter metadataFilter) | |
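The effect of a cap on embedded resources, as set through the maxEmbeddedResources constructor argument, can be sketched with a stand-alone counter. This is an illustrative model and not Tika's implementation: the class `EmbeddedLimitSketch` and its `offerEmbedded` and `parsed` methods are hypothetical, and treating a negative limit as "no limit" is an assumption of the sketch. The method name `hasHitMaximumEmbeddedResources` is borrowed from the inherited method listed on this page, but its body here is a guess at plausible bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of an embedded-resource cap; not Tika code.
final class EmbeddedLimitSketch {

    // Assumption for this sketch: a negative value means "no limit".
    private final int maxEmbeddedResources;
    private int embeddedSeen = 0;
    private final List<String> parsed = new ArrayList<>();

    EmbeddedLimitSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    /** True once the number of accepted embedded resources has reached the cap. */
    boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources >= 0 && embeddedSeen >= maxEmbeddedResources;
    }

    /** Returns true if the embedded resource was accepted for parsing. */
    boolean offerEmbedded(String name) {
        if (hasHitMaximumEmbeddedResources()) {
            return false; // cap reached: skip this resource
        }
        embeddedSeen++;
        parsed.add(name);
        return true;
    }

    /** Names of the embedded resources that were accepted. */
    List<String> parsed() {
        return parsed;
    }

    public static void main(String[] args) {
        EmbeddedLimitSketch limited = new EmbeddedLimitSketch(2);
        for (String name : new String[] {"a.txt", "b.txt", "c.txt"}) {
            System.out.println(name + " accepted: " + limited.offerEmbedded(name));
        }
        System.out.println("limit reached: " + limited.hasHitMaximumEmbeddedResources());
    }
}
```

In this model the flag becomes true as soon as the count reaches the cap, even if no resource has been skipped yet. Whether Tika reports the limit in exactly the same way should be checked against AbstractRecursiveParserWrapperHandler.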
| Modifier and Type | Method | Description |
|---|---|---|
| void | endDocument(ContentHandler contentHandler, Metadata metadata) | This is called after the full parse has completed. |
| void | endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) | This is called after parsing an embedded document. |
| List<Metadata> | getMetadataList() | |
| void | startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) | This is called before parsing an embedded document. |
Methods inherited from class AbstractRecursiveParserWrapperHandler: getContentHandlerFactory, getNewContentHandler, getNewContentHandler, hasHitMaximumEmbeddedResources

Methods inherited from class org.xml.sax.helpers.DefaultHandler: characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning

public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)

public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
Parameters:
maxEmbeddedResources - number of embedded resources that will be parsed

public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, MetadataFilter metadataFilter)

public void startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Overrides: startEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - local content handler to use on the embedded document
metadata - metadata to use for the embedded document
Throws: SAXException

public void endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Overrides: endEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - local content handler used on the embedded document
metadata - metadata from the embedded document
Throws: SAXException

public void endDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Description copied from class: AbstractRecursiveParserWrapperHandler
This is called after the full parse has completed. Call this as super.endDocument(...)
 in subclasses, because it adds whether or not the embedded resource
 maximum has been hit to the metadata.
Overrides: endDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - content handler used on the main document
metadata - metadata from the main document
Throws: SAXException

Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.