Package org.apache.tika.sax
Class AbstractRecursiveParserWrapperHandler
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.apache.tika.sax.AbstractRecursiveParserWrapperHandler
- All Implemented Interfaces:
Serializable, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
- Direct Known Subclasses:
RecursiveParserWrapperHandler
public abstract class AbstractRecursiveParserWrapperHandler
extends DefaultHandler
implements Serializable
This is a special handler to be used only with the RecursiveParserWrapper.
It allows for finer-grained processing of embedded documents than in the legacy handlers.
Subclasses can choose how to process individual embedded documents.
-
Field Summary
-
Constructor Summary
Constructor | Description
AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
-
Method Summary
Modifier and Type | Method | Description
void | endDocument(ContentHandler contentHandler, Metadata metadata) | This is called after the full parse has completed.
void | endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) | This is called after parsing each embedded document.
| getNewContentHandler(OutputStream os, Charset charset)
boolean | hasHitMaximumEmbeddedResources()
void | startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) | This is called before parsing each embedded document.
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
-
Field Details
-
EMBEDDED_RESOURCE_LIMIT_REACHED
-
-
Constructor Details
-
AbstractRecursiveParserWrapperHandler
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
-
AbstractRecursiveParserWrapperHandler
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
-
-
Method Details
-
getNewContentHandler
-
getNewContentHandler
-
startEmbeddedDocument
public void startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
This is called before parsing each embedded document. Override this for custom behavior. Make sure to call super.startEmbeddedDocument(...) in your subclass, because this method tracks the number of embedded documents.
- Parameters:
contentHandler - local handler to be used on this embedded document
metadata - embedded document's metadata
- Throws:
SAXException
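The requirement to delegate to the superclass can be illustrated with a small, JDK-only sketch. Everything here is a stand-in: CountingBaseSketch is a hypothetical model of the counting behavior described above (not Tika's implementation), Map<String, String> replaces Tika's Metadata, the "resourceName" key is illustrative, and treating a negative maximum as "no limit" is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the abstract base class. It models only the
// counting behavior described above; it is NOT Tika's implementation.
abstract class CountingBaseSketch extends DefaultHandler {
    private final int maxEmbeddedResources; // assumption: negative means "no limit"
    private int embeddedResources = 0;

    CountingBaseSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    // Map<String, String> stands in for Tika's Metadata class.
    public void startEmbeddedDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        embeddedResources++;
    }

    public boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources >= 0 && embeddedResources >= maxEmbeddedResources;
    }
}

// A subclass that adds custom behavior but still delegates to the base class.
class NameRecordingHandler extends CountingBaseSketch {
    final List<String> names = new ArrayList<>();

    NameRecordingHandler(int maxEmbeddedResources) {
        super(maxEmbeddedResources);
    }

    @Override
    public void startEmbeddedDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        // Delegating first keeps the embedded-document count accurate.
        super.startEmbeddedDocument(contentHandler, metadata);
        // "resourceName" is an illustrative key, not a documented property.
        names.add(metadata.getOrDefault("resourceName", "(unnamed)"));
    }
}

final class StartEmbeddedDemo {
    // Feeds `count` embedded documents to a handler limited to `max` and returns the handler.
    static NameRecordingHandler run(int max, int count) {
        NameRecordingHandler handler = new NameRecordingHandler(max);
        try {
            for (int i = 0; i < count; i++) {
                Map<String, String> metadata = new HashMap<>();
                metadata.put("resourceName", "attachment-" + i);
                handler.startEmbeddedDocument(new DefaultHandler(), metadata);
            }
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

If the override skipped the super call, the count in this sketch would stay at zero and hasHitMaximumEmbeddedResources() would never report the limit.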
-
endEmbeddedDocument
public void endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
This is called after parsing each embedded document. Override this for custom behavior. This is currently a no-op.
- Parameters:
contentHandler - content handler that was used on this embedded document
metadata - metadata for this embedded document
- Throws:
SAXException
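One way a subclass might use this hook is to capture each embedded document's text together with its metadata. The sketch below is JDK-only and purely illustrative: EmbeddedHookBaseSketch, TextCollector, and CollectingHandler are hypothetical names, Map<String, String> stands in for Tika's Metadata, and the "content" and "resourceName" keys are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical local handler that accumulates character data for one document.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public String toString() {
        return text.toString();
    }
}

// Hypothetical stand-in for the base class; the hook is a no-op, as documented above.
abstract class EmbeddedHookBaseSketch extends DefaultHandler {
    // Map<String, String> stands in for Tika's Metadata class.
    public void endEmbeddedDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        // intentionally empty
    }
}

// Stores a copy of each embedded document's metadata plus its extracted text.
class CollectingHandler extends EmbeddedHookBaseSketch {
    private final List<Map<String, String>> results = new ArrayList<>();

    @Override
    public void endEmbeddedDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        super.endEmbeddedDocument(contentHandler, metadata);
        Map<String, String> copy = new LinkedHashMap<>(metadata);
        copy.put("content", contentHandler.toString()); // "content" is an illustrative key
        results.add(copy);
    }

    List<Map<String, String>> getResults() {
        return results;
    }
}

final class CollectingDemo {
    // Simulates one embedded document whose text is "hello".
    static List<Map<String, String>> run() {
        CollectingHandler handler = new CollectingHandler();
        try {
            TextCollector local = new TextCollector();
            char[] chars = "hello".toCharArray();
            local.characters(chars, 0, chars.length);
            Map<String, String> metadata = new HashMap<>();
            metadata.put("resourceName", "a.txt");
            handler.endEmbeddedDocument(local, metadata);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.getResults();
    }
}
```

Copying the metadata before storing it is a design choice of this sketch, made so that later changes to the caller's map do not alter stored results.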
-
endDocument
public void endDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
This is called after the full parse has completed. Override this for custom behavior. Make sure to call this as super.endDocument(...) in subclasses, because this records in the metadata whether the embedded resource maximum has been hit.
- Parameters:
contentHandler - content handler that was used on the main document
metadata - metadata that was gathered for the main document
- Throws:
SAXException
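The effect of calling super.endDocument(...) can be sketched without Tika on the classpath. In this hypothetical model, LimitFlagBaseSketch writes a flag into the main document's metadata only when the limit was reached; that choice, the key string, and the use of Map<String, String> in place of Tika's Metadata are all assumptions. The real class presumably uses its EMBEDDED_RESOURCE_LIMIT_REACHED field for this, but its exact value is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the base class; NOT Tika's implementation.
abstract class LimitFlagBaseSketch extends DefaultHandler {
    // Hypothetical key, chosen only for this sketch.
    static final String LIMIT_REACHED_KEY = "embedded_resource_limit_reached";

    private final int maxEmbeddedResources; // assumption: negative means "no limit"
    private int embeddedResources = 0;

    LimitFlagBaseSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    public void startEmbeddedDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        embeddedResources++;
    }

    // Assumption: the flag is written only when the limit was reached.
    public void endDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        if (hasHitMaximumEmbeddedResources()) {
            metadata.put(LIMIT_REACHED_KEY, "true");
        }
    }

    public boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources >= 0 && embeddedResources >= maxEmbeddedResources;
    }
}

// Subclass with custom end-of-parse behavior that still delegates to the base class.
class FinishingHandler extends LimitFlagBaseSketch {
    FinishingHandler(int maxEmbeddedResources) {
        super(maxEmbeddedResources);
    }

    @Override
    public void endDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        super.endDocument(contentHandler, metadata); // lets the base class record the limit flag
        metadata.put("finished", "true"); // illustrative custom behavior
    }
}

final class EndDocumentDemo {
    // Simulates a parse with `embeddedCount` embedded documents and returns the main metadata.
    static Map<String, String> run(int max, int embeddedCount) {
        FinishingHandler handler = new FinishingHandler(max);
        Map<String, String> mainMetadata = new HashMap<>();
        try {
            for (int i = 0; i < embeddedCount; i++) {
                handler.startEmbeddedDocument(new DefaultHandler(), new HashMap<>());
            }
            handler.endDocument(new DefaultHandler(), mainMetadata);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return mainMetadata;
    }
}
```

A caller can then inspect the main document's metadata after the parse instead of holding on to the handler and querying hasHitMaximumEmbeddedResources() directly.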
-
hasHitMaximumEmbeddedResources
public boolean hasHitMaximumEmbeddedResources()
- Returns:
whether this handler has hit the maximum embedded resources during the parse
-
getContentHandlerFactory
-