public abstract class AbstractRecursiveParserWrapperHandler extends DefaultHandler implements Serializable
This is a special handler intended for use with the RecursiveParserWrapper.
It allows finer-grained processing of embedded documents than the legacy handlers do.
Subclasses can choose how to process individual embedded documents.

Modifier and Type | Field and Description |
---|---|
static Property | CONTAINER_EXCEPTION |
static Property | EMBEDDED_DEPTH |
static Property | EMBEDDED_EXCEPTION |
static Property | EMBEDDED_RESOURCE_LIMIT_REACHED |
static Property | EMBEDDED_RESOURCE_PATH |
static Property | PARSE_TIME_MILLIS |
static Property | TIKA_CONTENT |
static Property | TIKA_CONTENT_HANDLER: Simple class name of the content handler |
static Property | WRITE_LIMIT_REACHED |

Constructor and Description |
---|
AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory) |
AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, int totalWriteLimit) |

Modifier and Type | Method and Description |
---|---|
void | endDocument(ContentHandler contentHandler, Metadata metadata): This is called after the full parse has completed. |
void | endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called after parsing each embedded document. |
ContentHandlerFactory | getContentHandlerFactory() |
ContentHandler | getNewContentHandler() |
ContentHandler | getNewContentHandler(OutputStream os, Charset charset) |
int | getTotalWriteLimit() |
boolean | hasHitMaximumEmbeddedResources() |
void | startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called before parsing each embedded document. |

Methods inherited from class org.xml.sax.helpers.DefaultHandler: characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
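The method summary implies a fixed callback order: startEmbeddedDocument runs before each embedded document is parsed, endEmbeddedDocument runs after it, and endDocument runs once the full parse has completed. The stdlib-only sketch below models that order for a container holding one attachment that itself holds an attachment, using a hypothetical subclass that collects one metadata map per document. Everything here is illustrative: `LifecycleHandlerModel`, `CollectingHandler`, and `LifecycleDemo` are invented names, `Map<String, String>` stands in for Tika's Metadata, `StringBuilder` stands in for a ContentHandler, and the nesting of the calls is assumed rather than documented. In real use, the RecursiveParserWrapper makes these calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the abstract handler (not Tika's API):
// Map<String, String> replaces Metadata, StringBuilder replaces ContentHandler.
abstract class LifecycleHandlerModel {
    abstract void startEmbeddedDocument(StringBuilder handler, Map<String, String> metadata);
    abstract void endEmbeddedDocument(StringBuilder handler, Map<String, String> metadata);
    abstract void endDocument(StringBuilder handler, Map<String, String> metadata);
}

// Hypothetical subclass that keeps one metadata map per finished document.
class CollectingHandler extends LifecycleHandlerModel {
    final List<Map<String, String>> documents = new ArrayList<>();

    @Override
    void startEmbeddedDocument(StringBuilder handler, Map<String, String> metadata) {
        // Nothing to record yet: the embedded document has not been parsed.
    }

    @Override
    void endEmbeddedDocument(StringBuilder handler, Map<String, String> metadata) {
        metadata.put("content", handler.toString());
        documents.add(metadata); // appended in the order the end callbacks fire
    }

    @Override
    void endDocument(StringBuilder handler, Map<String, String> metadata) {
        metadata.put("content", handler.toString());
        documents.add(0, metadata); // this model chooses to list the container first
    }
}

final class LifecycleDemo {
    // Simulates (by assumption) the calls for: mail.eml > attachment.zip > inner.txt
    static void simulateParse(LifecycleHandlerModel h) {
        Map<String, String> outerMeta = new HashMap<>(Map.of("name", "attachment.zip"));
        StringBuilder outer = new StringBuilder("zip text");
        h.startEmbeddedDocument(outer, outerMeta);

        Map<String, String> innerMeta = new HashMap<>(Map.of("name", "inner.txt"));
        StringBuilder inner = new StringBuilder("inner text");
        h.startEmbeddedDocument(inner, innerMeta);
        h.endEmbeddedDocument(inner, innerMeta); // the innermost document finishes first

        h.endEmbeddedDocument(outer, outerMeta);

        Map<String, String> containerMeta = new HashMap<>(Map.of("name", "mail.eml"));
        h.endDocument(new StringBuilder("mail text"), containerMeta);
    }
}
```

Putting the container first is a choice of this hypothetical subclass; another subclass could instead write each document out as soon as its end callback fires, without buffering a list.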
public static final Property TIKA_CONTENT
public static final Property TIKA_CONTENT_HANDLER
public static final Property PARSE_TIME_MILLIS
public static final Property WRITE_LIMIT_REACHED
public static final Property EMBEDDED_RESOURCE_LIMIT_REACHED
public static final Property EMBEDDED_EXCEPTION
public static final Property CONTAINER_EXCEPTION
public static final Property EMBEDDED_RESOURCE_PATH
public static final Property EMBEDDED_DEPTH
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, int totalWriteLimit)
public ContentHandler getNewContentHandler()
public ContentHandler getNewContentHandler(OutputStream os, Charset charset)
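The contentHandler passed to startEmbeddedDocument is described as a local handler for that embedded document, which suggests each document gets its own handler. Assuming (this is not stated above) that getNewContentHandler() simply asks the ContentHandlerFactory returned by getContentHandlerFactory() for a new handler, the pattern reduces to the sketch below. `FactoryBackedHandler` is a hypothetical name, `Supplier<StringBuilder>` stands in for ContentHandlerFactory, and the OutputStream/Charset overload is left out.

```java
import java.util.function.Supplier;

// Hypothetical model (not Tika's API): Supplier<StringBuilder> stands in for
// ContentHandlerFactory, and StringBuilder stands in for ContentHandler.
class FactoryBackedHandler {
    private final Supplier<StringBuilder> factory;

    FactoryBackedHandler(Supplier<StringBuilder> factory) {
        this.factory = factory;
    }

    Supplier<StringBuilder> getContentHandlerFactory() {
        return factory;
    }

    // Each call asks the factory for a fresh handler, so text written for
    // one document cannot leak into another document's handler.
    StringBuilder getNewContentHandler() {
        return factory.get();
    }
}
```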
public void startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Parameters:
contentHandler - local handler to be used on this embedded document
metadata - embedded document's metadata
Throws:
SAXException
public void endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
Parameters:
contentHandler - content handler that was used on this embedded document
metadata - metadata for this embedded document
Throws:
SAXException
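The EMBEDDED_DEPTH property, together with the fact that startEmbeddedDocument and endEmbeddedDocument are called before and after each embedded document, suggests a simple counter: increment on start, record the value, decrement on end. The sketch below shows that bookkeeping as an assumption, not as a description of Tika's implementation; `DepthTrackingModel` is hypothetical and the `"embedded_depth"` key stands in for EMBEDDED_DEPTH.

```java
import java.util.Map;

// Hypothetical depth bookkeeping (not Tika's code); the "embedded_depth"
// key stands in for the EMBEDDED_DEPTH property.
class DepthTrackingModel {
    private int depth = 0; // 0 = the container document itself

    void startEmbeddedDocument(Map<String, String> metadata) {
        depth++; // one level deeper than the document that contains it
        metadata.put("embedded_depth", Integer.toString(depth));
    }

    void endEmbeddedDocument(Map<String, String> metadata) {
        depth--; // back to the parent's level
    }

    int currentDepth() {
        return depth;
    }
}
```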
public void endDocument(ContentHandler contentHandler, Metadata metadata) throws SAXException
This is called after the full parse has completed. Make sure to call super.endDocument(...) in subclasses, because this method records in the metadata whether the embedded-resource maximum has been hit.
Parameters:
contentHandler - content handler that was used on the main document
metadata - metadata that was gathered for the main document
Throws:
SAXException
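Why the super.endDocument(...) call matters can be shown with a small stdlib-only model: a hypothetical base class writes a limit-reached flag in endDocument, one subclass delegates to super and keeps the flag, and another overrides without delegating and silently loses it. `BaseHandlerModel`, `AuditingHandler`, `ForgetfulHandler`, the `"audited"` entry, and the `"embedded_resource_limit_reached"` key (standing in for EMBEDDED_RESOURCE_LIMIT_REACHED) are all illustrative; the limit outcome is fixed in the constructor only to keep the model short.

```java
import java.util.Map;

// Hypothetical base class (not Tika's code); Map<String, String> stands in for Metadata.
abstract class BaseHandlerModel {
    private final boolean limitHit; // fixed up front to keep the model small

    BaseHandlerModel(boolean limitHit) {
        this.limitHit = limitHit;
    }

    boolean hasHitMaximumEmbeddedResources() {
        return limitHit;
    }

    // Records on the container's metadata whether the embedded-resource maximum was hit.
    void endDocument(Map<String, String> metadata) {
        if (hasHitMaximumEmbeddedResources()) {
            metadata.put("embedded_resource_limit_reached", "true");
        }
    }
}

// Correct override: delegate to super first, then add subclass-specific entries.
class AuditingHandler extends BaseHandlerModel {
    AuditingHandler(boolean limitHit) {
        super(limitHit);
    }

    @Override
    void endDocument(Map<String, String> metadata) {
        super.endDocument(metadata); // keeps the limit flag
        metadata.put("audited", "true");
    }
}

// Faulty override: no super call, so the limit flag never reaches the metadata.
class ForgetfulHandler extends BaseHandlerModel {
    ForgetfulHandler(boolean limitHit) {
        super(limitHit);
    }

    @Override
    void endDocument(Map<String, String> metadata) {
        metadata.put("audited", "true");
    }
}
```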
public boolean hasHitMaximumEmbeddedResources()
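hasHitMaximumEmbeddedResources reports whether the maxEmbeddedResources value given to the three-argument constructor has been reached. A minimal counting model follows, assuming that each embedded document counts once when it starts and that a negative limit means "no limit", which is also taken as the default for the one-argument constructor. Neither convention is stated above; `EmbeddedLimitModel` is a hypothetical class, not Tika's code.

```java
// Hypothetical model of the embedded-resource limit (not Tika's code).
// Assumption: a negative limit means "no limit".
class EmbeddedLimitModel {
    private final int maxEmbeddedResources;
    private int embeddedResources = 0;

    // Assumed analogue of the one-argument constructor: no limit.
    EmbeddedLimitModel() {
        this(-1);
    }

    EmbeddedLimitModel(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    // Assumed to be counted once per embedded document, before it is parsed.
    void startEmbeddedDocument() {
        embeddedResources++;
    }

    boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources >= 0 && embeddedResources >= maxEmbeddedResources;
    }
}
```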
public ContentHandlerFactory getContentHandlerFactory()
public int getTotalWriteLimit()
Copyright © 2007–1969 The Apache Software Foundation. All rights reserved.