Package org.apache.tika.sax
SAX utilities.
Interface Summary
ContentHandlerFactory: Interface to allow easier injection of code for getting a new ContentHandler.
SafeContentHandler.Output: Internal interface that allows both character and ignorable whitespace content to be filtered the same way.
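SAX handlers are usually stateful, so code that processes several documents needs a fresh handler for each one; that is the kind of injection point a handler factory provides. The following is a minimal sketch of the idea using only the JDK's SAX classes. The `HandlerFactory` interface, its `newHandler()` method, and the `TextCollector` class are hypothetical names invented for this illustration and are not the signatures of ContentHandlerFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class HandlerFactorySketch {
    // Hypothetical factory interface (an assumption, not Tika's API):
    // every call is expected to return a brand-new handler.
    interface HandlerFactory {
        DefaultHandler newHandler();
    }

    // Simple stateful handler that accumulates all character content.
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    // Parses each document with its own handler obtained from the factory,
    // so text from one document never leaks into the next.
    static List<String> extractAll(HandlerFactory factory, List<String> docs) {
        List<String> results = new ArrayList<>();
        try {
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            for (String doc : docs) {
                DefaultHandler handler = factory.newHandler();
                parsers.newSAXParser().parse(new InputSource(new StringReader(doc)), handler);
                results.add(handler.toString());
            }
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalStateException(e);
        }
        return results;
    }

    // Small demonstration: two documents, two independent handlers.
    static List<String> demo() {
        return extractAll(TextCollector::new, Arrays.asList("<a>one</a>", "<b>two</b>"));
    }
}
```

Calling `HandlerFactorySketch.demo()` returns a list with the text of each document kept separate. Passing the factory rather than a single handler instance is the design choice that makes this possible.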
Class Summary
AbstractRecursiveParserWrapperHandler: This is a special handler to be used only with the RecursiveParserWrapper.
BasicContentHandlerFactory: Basic factory for creating common types of ContentHandlers.
BodyContentHandler: Content handler decorator that only passes everything inside the XHTML <body/> tag to the underlying handler.
CleanPhoneText: Class to help de-obfuscate phone numbers in text.
ContentHandlerDecorator: Decorator base class for the ContentHandler interface.
DIFContentHandler
ElementMappingContentHandler: Content handler decorator that maps element QNames using a Map.
ElementMappingContentHandler.TargetElement
EmbeddedContentHandler: Content handler decorator that prevents the EmbeddedContentHandler.startDocument() and EmbeddedContentHandler.endDocument() events from reaching the decorated handler.
EndDocumentShieldingContentHandler: A wrapper around a ContentHandler which will ignore normal SAX calls to EndDocumentShieldingContentHandler.endDocument(), and only fire them later.
ExpandedTitleContentHandler: Content handler decorator which wraps a TransformerHandler in order to allow the TITLE tag to render as <title></title> rather than <title/>, which is accomplished by calling the ContentHandler.characters(char[], int, int) method with a length of 1 but a zero length char array.
Link
LinkContentHandler: Content handler that collects links from an XHTML document.
OfflineContentHandler: Content handler decorator that always returns an empty stream from the OfflineContentHandler.resolveEntity(String, String) method to prevent potential network or other external resources from being accessed by an XML parser.
PhoneExtractingContentHandler: Class used to extract phone numbers while parsing.
RecursiveParserWrapperHandler: This is the default implementation of AbstractRecursiveParserWrapperHandler.
RichTextContentHandler: Content handler for rich text; it extracts the alt attribute of XHTML <img/> tags and the name attribute of XHTML <a/> tags into the output.
SafeContentHandler: Content handler decorator that makes sure that the character events (SafeContentHandler.characters(char[], int, int) or SafeContentHandler.ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters.
SecureContentHandler: Content handler decorator that attempts to prevent denial of service attacks against Tika parsers.
StandardOrganizations: This class provides a collection of the most important technical standard organizations.
StandardReference: Class that represents a standard reference.
StandardReference.StandardReferenceBuilder
StandardsExtractingContentHandler: StandardsExtractingContentHandler is a content handler used to extract standard references while parsing.
StandardsExtractionExample: Class to demonstrate how to use the StandardsExtractingContentHandler to get a list of the standard references from every file in a directory.
StandardsText: StandardsText relies on regular expressions to extract standard references from text.
TaggedContentHandler: A content handler decorator that tags potential exceptions so that the handler that caused the exception can easily be identified.
TeeContentHandler: Content handler proxy that forwards the received SAX events to zero or more underlying content handlers.
TextContentHandler: Content handler decorator that only passes the TextContentHandler.characters(char[], int, int) and TextContentHandler.ignorableWhitespace(char[], int, int) events (plus TextContentHandler.startDocument() and TextContentHandler.endDocument()) to the decorated content handler.
ToHTMLContentHandler: SAX event handler that serializes the HTML document to a character stream.
ToTextContentHandler: SAX event handler that writes all character content out to a character stream.
ToXMLContentHandler: SAX event handler that serializes the XML document to a character stream.
WriteOutContentHandler: SAX event handler that writes content up to an optional write limit out to a character stream or other decorated handler.
XHTMLContentHandler: Content handler decorator that simplifies the task of producing XHTML events for Tika content parsers.
XMPContentHandler: Content handler decorator that simplifies the task of producing XMP output.
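Many of the classes in this summary are decorators: they wrap another ContentHandler and forward only some of the SAX events they receive. As an illustration of that technique, here is a rough sketch of a "body only" filter written against the JDK's `org.xml.sax` classes alone. The class name `BodyOnlySketch` and the helper `extractBodyText` are made up for this example, and the code is not the implementation of BodyContentHandler; it also assumes a parser that is not namespace-aware, so the element name arrives in `qName`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical decorator: forwards element and character events to the
// delegate only while the parser is inside the <body> element.
class BodyOnlySketch extends DefaultHandler {
    private final ContentHandler delegate;
    private int depth = 0; // 0 = outside <body>, 1 = directly in <body>, >1 = nested

    BodyOnlySketch(ContentHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0) {
            depth++;
            delegate.startElement(uri, localName, qName, atts);
        } else if ("body".equals(qName)) {
            depth = 1; // entering <body>; the tag itself is not forwarded
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth > 1) {
            delegate.endElement(uri, localName, qName);
        }
        if (depth > 0) {
            depth--; // reaches 0 again when </body> closes
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth > 0) {
            delegate.characters(ch, start, length);
        }
    }

    // Convenience helper for the example: returns the text found inside <body>.
    static String extractBodyText(String xhtml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler collector = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), new BodyOnlySketch(collector));
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalStateException(e);
        }
        return text.toString();
    }
}
```

For the input `<html><head><title>T</title></head><body><p>Hello</p></body></html>`, `BodyOnlySketch.extractBodyText` returns `Hello`; the title text never reaches the collector. Because the filter and the collector are separate handlers, the same collector could sit behind a different decorator without change.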
Enum Summary
BasicContentHandlerFactory.HANDLER_TYPE: Common handler types for content.
Exception Summary
TaggedSAXException: A SAXException wrapper that tags the wrapped exception with a given object reference.
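Tagging an exception with an object reference lets calling code work out which component raised it when several components can throw the same exception type. The sketch below illustrates the idea with JDK classes only. `TaggedException`, `SaxTask`, `runTagged`, and `findFailingStage` are hypothetical names for this example and do not reflect the actual API of TaggedSAXException or TaggedContentHandler.

```java
import org.xml.sax.SAXException;

class TaggedSketch {
    // Hypothetical wrapper: carries the original SAXException plus a tag object.
    static class TaggedException extends SAXException {
        private final transient Object tag;

        TaggedException(SAXException cause, Object tag) {
            super(cause.getMessage(), cause);
            this.tag = tag;
        }

        // Identity comparison: the tag is a reference, not a value.
        boolean isTaggedWith(Object candidate) {
            return tag == candidate;
        }
    }

    // A unit of work that may fail with a SAXException.
    interface SaxTask {
        void run() throws SAXException;
    }

    // Runs the task and tags any SAXException it throws.
    static void runTagged(Object tag, SaxTask task) throws SAXException {
        try {
            task.run();
        } catch (SAXException e) {
            throw new TaggedException(e, tag);
        }
    }

    // Runs two stages and reports which one failed, based on the tag.
    static String findFailingStage() {
        Object first = new Object();
        Object second = new Object();
        try {
            runTagged(first, () -> { });
            runTagged(second, () -> { throw new SAXException("boom"); });
            return "none";
        } catch (TaggedException e) {
            if (e.isTaggedWith(first)) {
                return "first";
            }
            return e.isTaggedWith(second) ? "second" : "unknown";
        } catch (SAXException e) {
            return "untagged";
        }
    }
}
```

Here `TaggedSketch.findFailingStage()` returns `second`, because only the second stage throws and its exception carries the second tag.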