public abstract class AbstractRecursiveParserWrapperHandler extends DefaultHandler implements Serializable
A handler for use with RecursiveParserWrapper.
It allows for finer-grained processing of embedded documents than in the legacy handlers.
Subclasses can choose how to process individual embedded documents.

Modifier and Type | Field and Description |
---|---|
static Property | EMBEDDED_RESOURCE_LIMIT_REACHED |
Constructor and Description |
---|
AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory) |
AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources) |
Modifier and Type | Method and Description |
---|---|
void | endDocument(ContentHandler contentHandler, Metadata metadata): This is called after the full parse has completed. |
void | endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called after parsing each embedded document. |
ContentHandlerFactory | getContentHandlerFactory() |
ContentHandler | getNewContentHandler() |
ContentHandler | getNewContentHandler(OutputStream os, Charset charset) |
boolean | hasHitMaximumEmbeddedResources() |
void | startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called before parsing each embedded document. |
Methods inherited from class org.xml.sax.helpers.DefaultHandler: characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
public static final Property EMBEDDED_RESOURCE_LIMIT_REACHED
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
public ContentHandler getNewContentHandler()
public ContentHandler getNewContentHandler(OutputStream os, Charset charset)
public void startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Parameters:
contentHandler - local handler to be used on this embedded document
metadata - embedded document's metadata
Throws:
SAXException
public void endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Parameters:
contentHandler - content handler that was used on this embedded document
metadata - metadata for this embedded document
Throws:
SAXException
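The pairing of startEmbeddedDocument and endEmbeddedDocument around each embedded document can be pictured with a small stand-in model. This is only a sketch: it does not use Tika's classes, and Doc, LoggingHandler, parseEmbedded and trace are hypothetical names introduced for illustration. The assumption that nested embedded documents are handled between their parent's start and end callbacks is also part of the sketch, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in model only: none of these types are Tika's. "Doc" is a hypothetical
// container with a name and a list of embedded children.
class CallbackOrderSketch {

    static class Doc {
        final String name;
        final List<Doc> embedded = new ArrayList<>();

        Doc(String name, Doc... children) {
            this.name = name;
            for (Doc child : children) {
                embedded.add(child);
            }
        }
    }

    /** Records each callback so the order can be inspected afterwards. */
    static class LoggingHandler {
        final List<String> events = new ArrayList<>();

        void startEmbeddedDocument(String name) { events.add("start " + name); }

        void endEmbeddedDocument(String name) { events.add("end " + name); }

        void endDocument(String name) { events.add("endDocument " + name); }
    }

    /** Hypothetical driver: wraps each embedded document in a start/end pair. */
    static void parseEmbedded(Doc doc, LoggingHandler handler) {
        for (Doc child : doc.embedded) {
            handler.startEmbeddedDocument(child.name);
            parseEmbedded(child, handler); // assumed: nested documents recurse here
            handler.endEmbeddedDocument(child.name);
        }
    }

    /** Runs the driver on a document tree and returns the recorded events. */
    static List<String> trace(Doc root) {
        LoggingHandler handler = new LoggingHandler();
        parseEmbedded(root, handler);
        handler.endDocument(root.name); // once, after the full parse
        return handler.events;
    }

    public static void main(String[] args) {
        Doc root = new Doc("mail.eml",
                new Doc("archive.zip", new Doc("report.pdf")),
                new Doc("photo.jpg"));
        trace(root).forEach(System.out::println);
    }
}
```

In a real subclass, the bodies of the two embedded-document callbacks are where per-document processing would go, for example storing or filtering each embedded document's metadata.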
public void endDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
This is called after the full parse has completed. Call this as super.endDocument(...) in subclasses, because it adds to the metadata whether or not the embedded resource maximum has been hit.
Parameters:
contentHandler - content handler that was used on the main document
metadata - metadata that was gathered for the main document
Throws:
SAXException
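The reason for calling super.endDocument(...) from a subclass can be shown with a small stand-in model. The sketch below does not use Tika's classes: BaseHandler, CollectingHandler, the Map standing in for Metadata, and the key name are all hypothetical. It also assumes that a negative maximum means "no limit" and that the flag is only written when the limit was reached; neither detail is stated on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in model only: these are NOT Tika's classes. A plain Map plays the
// role of Metadata, and the key name below is hypothetical.
class EndDocumentSketch {

    static final String LIMIT_REACHED_KEY = "embedded_resource_limit_reached"; // hypothetical key

    /** Plays the role of the abstract base handler. */
    static abstract class BaseHandler {
        private final int maxEmbeddedResources;
        private int embeddedResources = 0;

        BaseHandler(int maxEmbeddedResources) {
            this.maxEmbeddedResources = maxEmbeddedResources;
        }

        void startEmbeddedDocument(Map<String, String> metadata) {
            embeddedResources++; // the base class keeps the count
        }

        void endDocument(Map<String, String> metadata) {
            // Assumed behavior: record the flag only when the limit was hit.
            if (hasHitMaximumEmbeddedResources()) {
                metadata.put(LIMIT_REACHED_KEY, "true");
            }
        }

        boolean hasHitMaximumEmbeddedResources() {
            // Assumption: a negative maximum means "no limit".
            return maxEmbeddedResources >= 0 && embeddedResources >= maxEmbeddedResources;
        }
    }

    /** A subclass that adds its own end-of-parse work and still calls super. */
    static class CollectingHandler extends BaseHandler {
        CollectingHandler(int maxEmbeddedResources) {
            super(maxEmbeddedResources);
        }

        @Override
        void endDocument(Map<String, String> metadata) {
            metadata.put("processed", "true"); // subclass-specific work
            super.endDocument(metadata);       // keeps the limit flag in the metadata
        }
    }

    /** Simulates a parse with the given number of embedded documents. */
    static Map<String, String> simulate(int maxEmbeddedResources, int embeddedCount) {
        CollectingHandler handler = new CollectingHandler(maxEmbeddedResources);
        for (int i = 0; i < embeddedCount; i++) {
            handler.startEmbeddedDocument(new LinkedHashMap<>());
        }
        Map<String, String> mainMetadata = new LinkedHashMap<>();
        handler.endDocument(mainMetadata);
        return mainMetadata;
    }

    public static void main(String[] args) {
        System.out.println(simulate(2, 3)); // limit of 2, three embedded documents
        System.out.println(simulate(5, 3)); // limit of 5, three embedded documents
    }
}
```

If CollectingHandler omitted the super.endDocument(metadata) call, the main document's metadata in this model would never carry the limit flag, which is the failure the method description warns about.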
public boolean hasHitMaximumEmbeddedResources()
public ContentHandlerFactory getContentHandlerFactory()
Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.