Package org.apache.tika.sax
Class XHTMLContentHandler
java.lang.Object
  org.xml.sax.helpers.DefaultHandler
    org.apache.tika.sax.ContentHandlerDecorator
      org.apache.tika.sax.SafeContentHandler
        org.apache.tika.sax.XHTMLContentHandler

All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class XHTMLContentHandler extends SafeContentHandler
Content handler decorator that simplifies the task of producing XHTML events for Tika content parsers.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.tika.sax.SafeContentHandler
SafeContentHandler.Output
-
-
Constructor Summary
XHTMLContentHandler(ContentHandler handler, Metadata metadata)
-
Method Summary
void characters(char[] ch, int start, int length)
void characters(String characters)
void element(String name, String value)
    Emits an XHTML element with the given text content.
void endDocument()
    Ends the XHTML document by writing the following footer and clearing the namespace mappings:
void endElement(String name)
void endElement(String uri, String local, String name)
    Ends the given element.
protected boolean isInvalid(int ch)
    Checks whether the given Unicode character is an invalid XML character and should be replaced for output.
void newline()
void startDocument()
    Starts an XHTML document by setting up the namespace mappings when called for the first time.
void startElement(String name)
void startElement(String name, String attribute, String value)
void startElement(String uri, String local, String name, Attributes attributes)
    Starts the given element.
void startElement(String name, AttributesImpl attributes)
Methods inherited from class org.apache.tika.sax.SafeContentHandler
ignorableWhitespace, writeReplacement
-
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endPrefixMapping, error, fatalError, handleException, processingInstruction, setContentHandler, setDocumentLocator, skippedEntity, startPrefixMapping, toString, warning
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
notationDecl, resolveEntity, unparsedEntityDecl
-
Field Detail
-
XHTML
public static final String XHTML
The XHTML namespace URI.
See Also:
Constant Field Values
-
-
Constructor Detail
-
XHTMLContentHandler
public XHTMLContentHandler(ContentHandler handler, Metadata metadata)
-
-
Method Detail
-
startDocument
public void startDocument() throws SAXException
Starts an XHTML document by setting up the namespace mappings when called for the first time. The standard XHTML prefix is generated lazily when the first element is started.
Specified by:
startDocument in interface ContentHandler
Overrides:
startDocument in class ContentHandlerDecorator
Throws:
SAXException
-
endDocument
public void endDocument() throws SAXException
Ends the XHTML document by writing the following footer and clearing the namespace mappings:
    </body>
    </html>
Specified by:
endDocument in interface ContentHandler
Overrides:
endDocument in class SafeContentHandler
Throws:
SAXException
-
startElement
public void startElement(String uri, String local, String name, Attributes attributes) throws SAXException
Starts the given element. Table cells and list items are automatically indented by emitting a tab character as ignorable whitespace.
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class SafeContentHandler
Throws:
SAXException
-
endElement
public void endElement(String uri, String local, String name) throws SAXException
Ends the given element. Block elements are automatically followed by a newline character.
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class SafeContentHandler
Throws:
SAXException
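The whitespace rules described for startElement and endElement can be pictured with a small stand-in written against plain SAX. This is only a sketch, not Tika's implementation: the class name WhitespaceSketch, its start/end/demo methods, the INDENTED and BLOCK element sets, and the choice to emit the newline as ignorable whitespace are all assumptions made for illustration. The namespace string is the standard W3C XHTML namespace.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative stand-in written against plain SAX; this is not Tika code.
final class WhitespaceSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    // Assumed element sets, picked only for this example.
    static final Set<String> INDENTED = new HashSet<>(Arrays.asList("td", "th", "li"));
    static final Set<String> BLOCK =
            new HashSet<>(Arrays.asList("p", "div", "tr", "li", "ul", "ol", "table"));

    private final ContentHandler out;

    WhitespaceSketch(ContentHandler out) {
        this.out = out;
    }

    // Emits a tab before table cells and list items, then the start tag event.
    void start(String name) throws SAXException {
        if (INDENTED.contains(name)) {
            out.ignorableWhitespace(new char[] {'\t'}, 0, 1);
        }
        out.startElement(XHTML_NS, name, name, new AttributesImpl());
    }

    // Emits the end tag event, then a newline after block elements.
    void end(String name) throws SAXException {
        out.endElement(XHTML_NS, name, name);
        if (BLOCK.contains(name)) {
            out.ignorableWhitespace(new char[] {'\n'}, 0, 1);
        }
    }

    // Sends a one-item list through the sketch and returns the recorded events as text.
    static String demo() {
        final StringBuilder seen = new StringBuilder();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen.append('<').append(local).append('>');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                seen.append("</").append(local).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }

            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };
        try {
            WhitespaceSketch xhtml = new WhitespaceSketch(recorder);
            xhtml.start("ul");
            xhtml.start("li");
            recorder.characters("a".toCharArray(), 0, 1);
            xhtml.end("li");
            xhtml.end("ul");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        // Show the tab and newline characters as visible escapes.
        System.out.println(demo().replace("\t", "\\t").replace("\n", "\\n"));
    }
}
```

In this sketch the list item is preceded by a tab, and both the list item and the list are followed by a newline, which keeps text extracted from the event stream readable without changing the element structure.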
-
characters
public void characters(char[] ch, int start, int length) throws SAXException
Specified by:
characters in interface ContentHandler
Overrides:
characters in class SafeContentHandler
Throws:
SAXException
See Also:
TIKA-210
-
startElement
public void startElement(String name) throws SAXException
Throws:
SAXException
-
startElement
public void startElement(String name, String attribute, String value) throws SAXException
Throws:
SAXException
-
startElement
public void startElement(String name, AttributesImpl attributes) throws SAXException
Throws:
SAXException
-
endElement
public void endElement(String name) throws SAXException
Throws:
SAXException
-
characters
public void characters(String characters) throws SAXException
Throws:
SAXException
-
newline
public void newline() throws SAXException
Throws:
SAXException
-
element
public void element(String name, String value) throws SAXException
Emits an XHTML element with the given text content. If the given text value is null or empty, then the element is not written.
Parameters:
name - XHTML element name
value - element value, possibly null
Throws:
SAXException - if the content element could not be written
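The null-or-empty rule documented for element can be sketched with plain SAX calls. The code below is a hedged illustration, not the Tika source: ElementSketch and its render helper are hypothetical names, and the namespace string is the standard W3C XHTML namespace, assumed here to match the XHTML constant.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative stand-in for the documented behaviour; this is not Tika code.
final class ElementSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Writes <name>value</name> as SAX events, or nothing if value is null or empty.
    static void element(ContentHandler out, String name, String value) throws SAXException {
        if (value != null && value.length() > 0) {
            out.startElement(XHTML_NS, name, name, new AttributesImpl());
            out.characters(value.toCharArray(), 0, value.length());
            out.endElement(XHTML_NS, name, name);
        }
    }

    // Hypothetical helper: records the events produced for one call as text.
    static String render(String name, String value) {
        final StringBuilder seen = new StringBuilder();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen.append('<').append(local).append('>');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                seen.append("</").append(local).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };
        try {
            element(recorder, name, value);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("title", "Hello"));
        System.out.println(render("title", null).isEmpty());
    }
}
```

Skipping the element entirely, rather than writing an empty one, lets a parser pass optional metadata values straight through without a null check at every call site.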
-
isInvalid
protected boolean isInvalid(int ch)
Description copied from class: SafeContentHandler
Checks whether the given Unicode character is an invalid XML character and should be replaced for output. Subclasses can override this method to use an alternative definition of which characters should be replaced in the XML output. The default definition from the XML specification is:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Overrides:
isInvalid in class SafeContentHandler
Parameters:
ch - character
Returns:
true if the character should be replaced, false otherwise
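The Char production quoted above translates directly into a range check. The following is a minimal sketch of that default definition only, written without Tika on the classpath: XmlCharSketch, isValidXmlChar, and sanitize are hypothetical names, and the caller-supplied replacement character is an assumption, since this page does not say what the inherited writeReplacement actually emits.

```java
// Illustrative range check for the XML 1.0 Char production; this is not Tika code.
final class XmlCharSketch {

    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int ch) {
        return ch == 0x9 || ch == 0xA || ch == 0xD
                || (ch >= 0x20 && ch <= 0xD7FF)
                || (ch >= 0xE000 && ch <= 0xFFFD)
                || (ch >= 0x10000 && ch <= 0x10FFFF);
    }

    // Mirrors the documented contract: true means the character should be replaced.
    static boolean isInvalid(int ch) {
        return !isValidXmlChar(ch);
    }

    // Hypothetical helper: replaces every invalid code point with the given character.
    static String sanitize(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isInvalid(cp)) {
                sb.append(replacement);
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A NUL character is outside the Char production and gets replaced.
        System.out.println(sanitize("a" + (char) 0 + "b", '?'));
    }
}
```

The check works on int code points rather than char values so that supplementary characters in the #x10000-#x10FFFF range are treated as valid, while an unpaired surrogate is reported as invalid.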
-
-