Package | Description |
---|---|
org.apache.tika.language.detect | |
org.apache.tika.parser.ctakes | |
org.apache.tika.parser.odf | |
org.apache.tika.sax | SAX utilities. |
org.apache.tika.sax.xpath | XPath utilities. |
Modifier and Type | Class and Description |
---|---|
class | LanguageHandler: SAX content handler that updates a language detector based on all the received character content. |
Modifier and Type | Class and Description |
---|---|
class | CTAKESContentHandler: Class used to extract biomedical information while parsing. |
Modifier and Type | Class and Description |
---|---|
class | NSNormalizerContentHandler: Content handler decorator that maps old OpenOffice 1.0 namespaces to the OpenDocument ones and returns a fake DTD when the parser requests the OpenOffice DTD. |
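
The "fake DTD" idea can be illustrated with plain JDK SAX classes. The following is a minimal sketch, not Tika's implementation: `FakeDtdSketch` and `FakeDtdHandler` are hypothetical names, and answering every entity request with an empty stream (rather than only requests for the OpenOffice DTD) is a simplifying assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Tika.
class FakeDtdSketch {

    /** Collects text and answers every external entity request with an empty stream. */
    static class FakeDtdHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int resolved = 0; // number of entity requests that were intercepted

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            resolved++;
            // An empty character stream stands in for the requested DTD,
            // so the parser never opens the real system identifier.
            return new InputSource(new StringReader(""));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the given XML string and returns the handler with its collected state. */
    static FakeDtdHandler parse(String xml) {
        FakeDtdHandler handler = new FakeDtdHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        FakeDtdHandler h = parse(
                "<!DOCTYPE doc SYSTEM \"http://example.invalid/doc.dtd\"><doc>ok</doc>");
        System.out.println(h.text + " (entity requests intercepted: " + h.resolved + ")");
    }
}
```

Returning an empty `InputSource` lets the document parse without the external DTD being fetched, which is the same general approach the OfflineContentHandler description mentions for `resolveEntity`.
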
Modifier and Type | Class and Description |
---|---|
class | BodyContentHandler: Content handler decorator that only passes everything inside the XHTML <body/> tag to the underlying handler. |
class | ElementMappingContentHandler: Content handler decorator that maps element QNames using a Map. |
class | EmbeddedContentHandler: Content handler decorator that prevents the EmbeddedContentHandler.startDocument() and EmbeddedContentHandler.endDocument() events from reaching the decorated handler. |
class | EndDocumentShieldingContentHandler: A wrapper around a ContentHandler which will ignore normal SAX calls to EndDocumentShieldingContentHandler.endDocument(), and only fire them later. |
class | ExpandedTitleContentHandler: Content handler decorator which wraps a TransformerHandler in order to allow the TITLE tag to render as <title></title> rather than <title/>, which is accomplished by calling the ContentHandler.characters(char[], int, int) method with a length of 1 but a zero length char array. |
class | OfflineContentHandler: Content handler decorator that always returns an empty stream from the OfflineContentHandler.resolveEntity(String, String) method to prevent potential network or other external resources from being accessed by an XML parser. |
class | PhoneExtractingContentHandler: Class used to extract phone numbers while parsing. |
class | RichTextContentHandler: Content handler for rich text that extracts the alt attribute of XHTML <img/> tags and the name attribute of XHTML <a/> tags into the output. |
class | SafeContentHandler: Content handler decorator that makes sure that the character events (SafeContentHandler.characters(char[], int, int) or SafeContentHandler.ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters. |
class | SecureContentHandler: Content handler decorator that attempts to prevent denial of service attacks against Tika parsers. |
class | StandardsExtractingContentHandler: Content handler used to extract standard references while parsing. |
class | TaggedContentHandler: A content handler decorator that tags potential exceptions so that the handler that caused the exception can easily be identified. |
class | WriteOutContentHandler: SAX event handler that writes content up to an optional write limit out to a character stream or other decorated handler. |
class | XHTMLContentHandler: Content handler decorator that simplifies the task of producing XHTML events for Tika content parsers. |
class | XMPContentHandler: Content handler decorator that simplifies the task of producing XMP output. |
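
Most classes in this table follow the same decorator pattern: they receive SAX events, filter or transform them, and forward the result to a wrapped handler. The following is a minimal sketch of that pattern using only JDK SAX classes, modelled on the BodyContentHandler description. It is not Tika's implementation: `BodyOnlySketch`, `BodyOnlyHandler` and `TextCollector` are hypothetical names, and matching on the unqualified element name `body` is a simplifying assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Tika.
class BodyOnlySketch {

    /** Decorator that forwards events to the delegate only while inside <body>. */
    static class BodyOnlyHandler extends DefaultHandler {
        private final ContentHandler delegate;
        private int bodyDepth = 0; // 0 outside <body>, 1 at <body>, >1 for nested elements

        BodyOnlyHandler(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (bodyDepth > 0) {
                bodyDepth++;
                delegate.startElement(uri, localName, qName, atts);
            } else if ("body".equals(qName)) {
                bodyDepth = 1; // the <body> element itself is not forwarded
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (bodyDepth > 1) {
                delegate.endElement(uri, localName, qName);
            }
            if (bodyDepth > 0) {
                bodyDepth--;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (bodyDepth > 0) {
                delegate.characters(ch, start, length);
            }
        }
    }

    /** Simple underlying handler that collects character data. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses an XHTML string and returns only the text found inside <body>. */
    static String bodyText(String xhtml) {
        TextCollector collector = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), new BodyOnlyHandler(collector));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return collector.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(bodyText(
                "<html><head><title>T</title></head><body><p>Hello</p> world</body></html>"));
    }
}
```

Because the filtering logic lives in the decorator, the underlying handler stays unaware of it, and several decorators can be stacked around the same handler.
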
Modifier and Type | Class and Description |
---|---|
class | MatchingContentHandler: Content handler decorator that only passes the elements, attributes, and text nodes that match the given XPath expression. |
Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.