Package org.apache.tika.sax
Class RecursiveParserWrapperHandler
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.apache.tika.sax.AbstractRecursiveParserWrapperHandler
org.apache.tika.sax.RecursiveParserWrapperHandler
- All Implemented Interfaces:
Serializable, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
This is the default implementation of AbstractRecursiveParserWrapperHandler. See its documentation for more details.
This caches a metadata object for each embedded file and for the container file. It places the extracted content in the metadata object under the key TikaCoreProperties.TIKA_CONTENT.
If memory is a concern, subclass AbstractRecursiveParserWrapperHandler to handle each embedded document individually instead of caching them all.
NOTE: This handler must only be used with the RecursiveParserWrapper.
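The caching behavior described above can be pictured with a small standard-library model. This is only a sketch: it uses no Tika classes, plain maps stand in for Metadata objects, and the class name CachingHandlerSketch, the key "content" (a stand-in for TikaCoreProperties.TIKA_CONTENT), and the choice to put the container's metadata first are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the caching behavior; none of these types are Tika classes.
final class CachingHandlerSketch {
    // Hypothetical key standing in for TikaCoreProperties.TIKA_CONTENT.
    static final String CONTENT_KEY = "content";

    // One metadata map per document (container plus each embedded file).
    private final List<Map<String, String>> metadataList = new ArrayList<>();

    // After an embedded document is parsed: store its text in its own
    // metadata map and cache that map.
    void endEmbeddedDocument(String extractedText, Map<String, String> metadata) {
        metadata.put(CONTENT_KEY, extractedText);
        metadataList.add(metadata);
    }

    // After the full parse: store the container's text and cache its metadata.
    // Placing it at index 0 is an assumption of this sketch.
    void endDocument(String extractedText, Map<String, String> metadata) {
        metadata.put(CONTENT_KEY, extractedText);
        metadataList.add(0, metadata);
    }

    List<Map<String, String>> getMetadataList() {
        return metadataList;
    }

    // Simulates a container with two embedded files.
    static List<Map<String, String>> demo() {
        CachingHandlerSketch handler = new CachingHandlerSketch();
        handler.endEmbeddedDocument("first attachment text", newMetadata("a.txt"));
        handler.endEmbeddedDocument("second attachment text", newMetadata("b.txt"));
        handler.endDocument("container text", newMetadata("container.zip"));
        return handler.getMetadataList();
    }

    private static Map<String, String> newMetadata(String name) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("name", name);
        return metadata;
    }

    public static void main(String[] args) {
        for (Map<String, String> metadata : demo()) {
            System.out.println(metadata.get("name") + " -> " + metadata.get(CONTENT_KEY));
        }
    }
}
```

Because every map stays in the list until the parse ends, memory use grows with the number and size of embedded documents, which is the concern the note about subclassing addresses.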
Field Summary
Fields inherited from class org.apache.tika.sax.AbstractRecursiveParserWrapperHandler
EMBEDDED_RESOURCE_LIMIT_REACHED
-
Constructor Summary
Constructor | Description
RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory) | Create a handler with no limit on the number of embedded resources
RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources) | Create a handler that limits the number of embedded resources that will be parsed
RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, MetadataFilter metadataFilter) |
-
Method Summary
Modifier and Type | Method | Description
void | endDocument(ContentHandler contentHandler, Metadata metadata) | This is called after the full parse has completed.
void | endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) | This is called after parsing an embedded document.
void | startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) | This is called before parsing an embedded document.
Methods inherited from class org.apache.tika.sax.AbstractRecursiveParserWrapperHandler
getContentHandlerFactory, getNewContentHandler, getNewContentHandler, hasHitMaximumEmbeddedResources
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.xml.sax.ContentHandler
declaration
-
Field Details
-
metadataList
-
-
Constructor Details
-
RecursiveParserWrapperHandler
public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
Create a handler with no limit on the number of embedded resources.
-
RecursiveParserWrapperHandler
public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
Create a handler that limits the number of embedded resources that will be parsed.
- Parameters:
maxEmbeddedResources - maximum number of embedded resources that will be parsed
-
RecursiveParserWrapperHandler
public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, MetadataFilter metadataFilter)
-
-
Method Details
-
startEmbeddedDocument
public void startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
This is called before parsing an embedded document.
- Overrides:
startEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
- Parameters:
contentHandler - local content handler to use on the embedded document
metadata - metadata to use for the embedded document
- Throws:
SAXException
-
endEmbeddedDocument
public void endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
This is called after parsing an embedded document.
- Overrides:
endEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
- Parameters:
contentHandler - local content handler used on the embedded document
metadata - metadata from the embedded document
- Throws:
SAXException
-
endDocument
public void endDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Description copied from class: AbstractRecursiveParserWrapperHandler
This is called after the full parse has completed. Override this for custom behavior. Make sure to call this as super.endDocument(...) in subclasses, because it records in the metadata whether or not the embedded resource maximum has been hit.
- Overrides:
endDocument in class AbstractRecursiveParserWrapperHandler
- Parameters:
contentHandler - content handler used on the main document
metadata - metadata from the main document
- Throws:
SAXException
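The advice to call super.endDocument(...) can be illustrated with a small standard-library model. This is a sketch under stated assumptions, not Tika code: BaseHandlerSketch and CustomHandlerSketch are hypothetical classes, plain maps stand in for Metadata, the key "embedded-resource-limit-reached" stands in for EMBEDDED_RESOURCE_LIMIT_REACHED, and the way the limit is counted here (a negative value meaning no limit) is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a base handler that tracks an embedded-resource limit.
class BaseHandlerSketch {
    // Hypothetical key standing in for EMBEDDED_RESOURCE_LIMIT_REACHED.
    static final String LIMIT_REACHED_KEY = "embedded-resource-limit-reached";

    private final int maxEmbeddedResources; // negative means "no limit" in this sketch
    private int embeddedResources = 0;
    private boolean limitHit = false;

    BaseHandlerSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    // Returns true if the embedded document should be parsed,
    // false if the limit has already been reached.
    boolean startEmbeddedDocument() {
        if (maxEmbeddedResources >= 0 && embeddedResources >= maxEmbeddedResources) {
            limitHit = true;
            return false;
        }
        embeddedResources++;
        return true;
    }

    boolean hasHitMaximumEmbeddedResources() {
        return limitHit;
    }

    // Records in the metadata whether the limit was hit.
    void endDocument(Map<String, String> metadata) {
        if (hasHitMaximumEmbeddedResources()) {
            metadata.put(LIMIT_REACHED_KEY, "true");
        }
    }
}

// A subclass with custom end-of-parse behavior.
class CustomHandlerSketch extends BaseHandlerSketch {
    CustomHandlerSketch(int maxEmbeddedResources) {
        super(maxEmbeddedResources);
    }

    @Override
    void endDocument(Map<String, String> metadata) {
        // Without this call the limit flag would never reach the metadata.
        super.endDocument(metadata);
        metadata.put("custom-key", "custom-value");
    }

    // Simulates a parse that encounters the given number of attachments.
    static Map<String, String> demo(int maxEmbeddedResources, int attachments) {
        CustomHandlerSketch handler = new CustomHandlerSketch(maxEmbeddedResources);
        for (int i = 0; i < attachments; i++) {
            handler.startEmbeddedDocument();
        }
        Map<String, String> metadata = new HashMap<>();
        handler.endDocument(metadata);
        return metadata;
    }

    public static void main(String[] args) {
        System.out.println("limit 2, 5 attachments: " + demo(2, 5));
        System.out.println("no limit, 5 attachments: " + demo(-1, 5));
    }
}
```

In this model, removing the super.endDocument(metadata) line leaves the custom entry in place but silently drops the limit flag, which is the failure the description warns about.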
-
getMetadataList
- Returns:
a list of Metadata objects, one for the main document and one for each embedded document
-