Package org.apache.tika.sax
@Version("1.0.0")
package org.apache.tika.sax
SAX utilities.
Class Description
This is a special handler to be used only with the RecursiveParserWrapper.
Basic factory for creating common types of ContentHandlers.
Common handler types for content.
Content handler decorator that only passes everything inside the XHTML <body/> tag to the underlying handler.
Class to help de-obfuscate phone numbers in text.
Decorator base class for the ContentHandler interface.
Interface to allow easier injection of code for getting a new ContentHandler.
Content handler decorator that maps element QNames using a Map.
Content handler decorator that prevents the EmbeddedContentHandler.startDocument() and EmbeddedContentHandler.endDocument() events from reaching the decorated handler.
A wrapper around a ContentHandler which will ignore normal SAX calls to EndDocumentShieldingContentHandler.endDocument(), and only fire them later.
Content handler decorator which wraps a TransformerHandler in order to allow the TITLE tag to render as <title></title> rather than <title/>, which is accomplished by calling the ContentHandler.characters(char[], int, int) method with a length of 1 but a zero length char array.
Content handler that collects links from an XHTML document.
Content handler decorator that always returns an empty stream from the OfflineContentHandler.resolveEntity(String, String) method to prevent potential network or other external resources from being accessed by an XML parser.
Class used to extract phone numbers while parsing.
This is the default implementation of AbstractRecursiveParserWrapperHandler.
Content handler for Rich Text that extracts the alt attribute of XHTML <img/> tags and the name attribute of XHTML <a/> tags into the output.
Content handler decorator that makes sure that the character events (SafeContentHandler.characters(char[], int, int) or SafeContentHandler.ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters.
Internal interface that allows both character and ignorable whitespace content to be filtered the same way.
Content handler decorator that attempts to prevent denial of service attacks against Tika parsers.
This class provides a collection of the most important technical standard organizations.
Class that represents a standard reference.
StandardsExtractingContentHandler is a Content Handler used to extract standard references while parsing.
StandardText relies on regular expressions to extract standard references from text.
Sentinel exception to stop parsing XML once the target is found while SAX parsing.
A content handler decorator that tags potential exceptions so that the handler that caused the exception can easily be identified.
A SAXException wrapper that tags the wrapped exception with a given object reference.
Content handler proxy that forwards the received SAX events to zero or more underlying content handlers.
Content handler decorator that only passes the TextContentHandler.characters(char[], int, int) and TextContentHandler.ignorableWhitespace(char[], int, int) events (plus TextContentHandler.startDocument() and TextContentHandler.endDocument()) to the decorated content handler.
SAX event handler that serializes the HTML document to a character stream.
SAX event handler that writes all character content out to a character stream.
SAX event handler that serializes the XML document to a character stream.
SAX event handler that writes content up to an optional write limit out to a character stream or other decorated handler.
Content handler decorator that simplifies the task of producing XHTML events for Tika content parsers.
Content handler decorator that simplifies the task of producing XMP output.
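
Many of the classes summarized above are decorators: they sit between a SAX parser and another ContentHandler and forward only some events. The following is a minimal sketch of that pattern using only the JDK's org.xml.sax classes. It is not Tika's implementation; the names TextOnlyDecoratorSketch, TextOnlyHandler, RecordingHandler and parseThroughDecorator are hypothetical and exist only for illustration. It mimics the behaviour described for the text-only decorator, which passes character, ignorable whitespace and document start/end events and drops everything else.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TextOnlyDecoratorSketch {

    /** Hypothetical delegate that records whatever events reach it. */
    static class RecordingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int elementsSeen = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /**
     * Hypothetical decorator: forwards character, ignorable whitespace and
     * document start/end events. Element events fall through to the empty
     * DefaultHandler methods and never reach the delegate.
     */
    static class TextOnlyHandler extends DefaultHandler {
        private final ContentHandler delegate;

        TextOnlyHandler(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void startDocument() throws SAXException {
            delegate.startDocument();
        }

        @Override
        public void endDocument() throws SAXException {
            delegate.endDocument();
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            delegate.characters(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
            delegate.ignorableWhitespace(ch, start, length);
        }
    }

    /** Parses the XML string with the decorator wrapped around a recorder. */
    static RecordingHandler parseThroughDecorator(String xml) throws Exception {
        RecordingHandler recorder = new RecordingHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new TextOnlyHandler(recorder));
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        RecordingHandler r = parseThroughDecorator("<p>Hello <b>world</b></p>");
        System.out.println(r.text + " / elements seen: " + r.elementsSeen);
    }
}
```

Because the decorator only overrides the events it wants to pass on, stacking several such handlers stays cheap: each one wraps the next, and the innermost handler sees only what every outer layer allowed through.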