Package org.apache.tika.sax
Class XHTMLContentHandler
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.apache.tika.sax.ContentHandlerDecorator
org.apache.tika.sax.SafeContentHandler
org.apache.tika.sax.XHTMLContentHandler
- All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
Content handler decorator that simplifies the task of producing XHTML
events for Tika content parsers.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.tika.sax.SafeContentHandler
SafeContentHandler.Output
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type    Method
                         Description
void                 characters(char[] ch, int start, int length)
void                 characters(String characters)
void                 element(String name, String value)
                         Emits an XHTML element with the given text content.
void                 endDocument()
                         Ends the XHTML document by writing the closing </body> </html> footer and clearing the namespace mappings.
void                 endElement(String name)
void                 endElement(String uri, String local, String name)
                         Ends the given element.
protected boolean    isInvalid(int ch)
                         Checks whether the given Unicode character is an invalid XML character and should be replaced for output.
void                 newline()
void                 startDocument()
                         Starts an XHTML document by setting up the namespace mappings when called for the first time.
void                 startElement(String name)
void                 startElement(String name, String attribute, String value)
void                 startElement(String uri, String local, String name, Attributes attributes)
                         Starts the given element.
void                 startElement(String name, AttributesImpl attributes)
Methods inherited from class org.apache.tika.sax.SafeContentHandler
ignorableWhitespace, writeReplacement
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endPrefixMapping, error, fatalError, handleException, processingInstruction, setContentHandler, setDocumentLocator, skippedEntity, startPrefixMapping, toString, warning
Methods inherited from class org.xml.sax.helpers.DefaultHandler
notationDecl, resolveEntity, unparsedEntityDecl
-
Field Details
-
XHTML
The XHTML namespace URI.
-
ENDLINE
The elements that get appended with the NL character.
-
-
Constructor Details
-
XHTMLContentHandler
-
-
Method Details
-
startDocument
Starts an XHTML document by setting up the namespace mappings when called for the first time. The standard XHTML prefix is generated lazily when the first element is started.
- Specified by:
startDocument in interface ContentHandler
- Overrides:
startDocument in class ContentHandlerDecorator
- Throws:
SAXException
-
endDocument
Ends the XHTML document by writing the following footer and clearing the namespace mappings:
</body>
</html>
- Specified by:
endDocument in interface ContentHandler
- Overrides:
endDocument in class SafeContentHandler
- Throws:
SAXException
-
startElement
public void startElement(String uri, String local, String name, Attributes attributes) throws SAXException
Starts the given element. Table cells and list items are automatically indented by emitting a tab character as ignorable whitespace.
- Specified by:
startElement in interface ContentHandler
- Overrides:
startElement in class SafeContentHandler
- Throws:
SAXException
-
endElement
Ends the given element. Block elements are automatically followed by a newline character.
- Specified by:
endElement in interface ContentHandler
- Overrides:
endElement in class SafeContentHandler
- Throws:
SAXException
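The indentation and newline rules described for startElement and endElement can be pictured with a small decorator built only on the JDK's SAX classes. This is a sketch under stated assumptions, not the Tika source: the class name LayoutSketchHandler, the INDENTED and BLOCK sets, and the choice to emit the newline as ignorable whitespace are all hypothetical and chosen for illustration.

```java
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch for illustration; not Tika's implementation.
class LayoutSketchHandler extends DefaultHandler {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    // Assumed, illustrative subsets; the real element sets belong to the documented class.
    private static final Set<String> INDENTED = Set.of("td", "th", "li");
    private static final Set<String> BLOCK = Set.of("p", "div", "table", "tr", "li");

    private final ContentHandler out;

    LayoutSketchHandler(ContentHandler out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String local, String name, Attributes atts)
            throws SAXException {
        // Table cells and list items get a leading tab as ignorable whitespace.
        if (XHTML.equals(uri) && INDENTED.contains(local)) {
            out.ignorableWhitespace(new char[] {'\t'}, 0, 1);
        }
        out.startElement(uri, local, name, atts);
    }

    @Override
    public void endElement(String uri, String local, String name) throws SAXException {
        out.endElement(uri, local, name);
        // Block elements are followed by a newline (assumed to be ignorable whitespace too).
        if (XHTML.equals(uri) && BLOCK.contains(local)) {
            out.ignorableWhitespace(new char[] {'\n'}, 0, 1);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        out.characters(ch, start, length);
    }

    /** Feeds the events for one list item through the sketch and returns what was emitted. */
    static String demo() {
        StringBuilder sb = new StringBuilder();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String n, Attributes a) {
                sb.append('<').append(l).append('>');
            }
            @Override
            public void endElement(String u, String l, String n) {
                sb.append("</").append(l).append('>');
            }
            @Override
            public void characters(char[] ch, int s, int len) {
                sb.append(ch, s, len);
            }
            @Override
            public void ignorableWhitespace(char[] ch, int s, int len) {
                sb.append(ch, s, len);
            }
        };
        try {
            LayoutSketchHandler h = new LayoutSketchHandler(recorder);
            h.startElement(XHTML, "li", "li", new AttributesImpl());
            h.characters("item".toCharArray(), 0, 4);
            h.endElement(XHTML, "li", "li");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }
}
```

Calling LayoutSketchHandler.demo() yields a tab, then the li element with its text, then a newline, which is the shape of output the two method descriptions imply for a list item.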
-
characters
- Specified by:
characters in interface ContentHandler
- Overrides:
characters in class SafeContentHandler
- Throws:
SAXException
-
startElement
- Throws:
SAXException
-
startElement
- Throws:
SAXException
-
startElement
- Throws:
SAXException
-
endElement
- Throws:
SAXException
-
characters
- Throws:
SAXException
-
newline
- Throws:
SAXException
-
element
Emits an XHTML element with the given text content. If the given text value is null or empty, then the element is not written.
- Parameters:
name - XHTML element name
value - element value, possibly null
- Throws:
SAXException - if the content element could not be written
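The null-or-empty rule for element can be sketched as a start/characters/end sequence guarded by a single check. The helper below is a hypothetical stand-in written against plain SAX: the ElementSketch class, its static element method taking an explicit ContentHandler, and the render helper are assumptions made for illustration and are not part of the documented class.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration; not the Tika source.
class ElementSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Writes name with the given text content to out, or nothing when value is null or empty. */
    static void element(ContentHandler out, String name, String value) throws SAXException {
        if (value == null || value.isEmpty()) {
            return; // nothing to write, so the element is skipped entirely
        }
        out.startElement(XHTML, name, name, new AttributesImpl());
        out.characters(value.toCharArray(), 0, value.length());
        out.endElement(XHTML, name, name);
    }

    /** Records the emitted events as text so the effect is easy to inspect. */
    static String render(String name, String value) {
        StringBuilder sb = new StringBuilder();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                sb.append('<').append(local).append('>');
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                sb.append("</").append(local).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(ch, start, length);
            }
        };
        try {
            element(recorder, name, value);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }
}
```

With this sketch, render("title", "Hello") records a complete title element, while a null or empty value records no events at all. That lets a parser pass optional metadata values straight through without guarding each call.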
-
isInvalid
protected boolean isInvalid(int ch)
Description copied from class: SafeContentHandler
Checks whether the given Unicode character is an invalid XML character and should be replaced for output. Subclasses can override this method to use an alternative definition of which characters should be replaced in the XML output. The default definition from the XML specification is:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
- Overrides:
isInvalid in class SafeContentHandler
- Parameters:
ch - character
- Returns:
true if the character should be replaced, false otherwise
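The Char production quoted above translates directly into a range check on code points. The following is a minimal sketch of that default definition only; the class name XmlCharSketch and the sanitize helper with its replacement character are hypothetical, and an overriding implementation may apply a different definition.

```java
// Hypothetical sketch of the default XML Char check; not the Tika source.
class XmlCharSketch {

    /** True when ch matches the XML Char production. */
    static boolean isValidXmlChar(int ch) {
        return ch == 0x9 || ch == 0xA || ch == 0xD
                || (ch >= 0x20 && ch <= 0xD7FF)
                || (ch >= 0xE000 && ch <= 0xFFFD)
                || (ch >= 0x10000 && ch <= 0x10FFFF);
    }

    /** True when ch should be replaced, mirroring the documented return value. */
    static boolean isInvalid(int ch) {
        return !isValidXmlChar(ch);
    }

    /** Replaces every invalid code point in text with the given replacement character. */
    static String sanitize(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (isInvalid(cp)) {
                sb.append(replacement);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }
}
```

Working on code points rather than char values keeps supplementary characters such as U+1F600 intact, while unpaired surrogates fall outside every allowed range and are replaced.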
-