org.apache.tika.sax
Class XHTMLContentHandler

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by org.apache.tika.sax.ContentHandlerDecorator
          extended by org.apache.tika.sax.SafeContentHandler
              extended by org.apache.tika.sax.XHTMLContentHandler
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class XHTMLContentHandler
extends SafeContentHandler

Content handler decorator that simplifies the task of producing XHTML events for Tika content parsers.
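To show what the decorator saves a parser from writing, the sketch below emits a minimal XHTML document as raw SAX events and serializes it with the JDK's identity TransformerHandler. It uses only JDK classes. The class name RawXhtmlEvents and the html/body/p skeleton are illustrative assumptions, not Tika's exact output.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical example: writes a tiny XHTML document as raw SAX events, with no Tika helper. */
class RawXhtmlEvents {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    static String render(String text) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler out = factory.newTransformerHandler(); // identity serializer
            // Configure output before attaching the result.
            out.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            out.setResult(new StreamResult(writer));

            AttributesImpl none = new AttributesImpl();
            out.startDocument();
            out.startPrefixMapping("", XHTML);            // default namespace is XHTML
            out.startElement(XHTML, "html", "html", none);
            out.startElement(XHTML, "body", "body", none);
            out.startElement(XHTML, "p", "p", none);
            char[] chars = text.toCharArray();
            out.characters(chars, 0, chars.length);
            out.endElement(XHTML, "p", "p");
            // These two closing tags match the footer documented for endDocument().
            out.endElement(XHTML, "body", "body");
            out.endElement(XHTML, "html", "html");
            out.endPrefixMapping("");
            out.endDocument();
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With XHTMLContentHandler, a parser would instead call startDocument(), element("p", "Hello") and endDocument() on the decorator and, per the method descriptions on this page, leave the namespace mapping and the closing </body></html> footer to it.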


Nested Class Summary

Nested classes/interfaces inherited from class org.apache.tika.sax.SafeContentHandler
SafeContentHandler.Output

Field Summary
static java.util.Set<java.lang.String> ENDLINE
          The elements that are followed by a newline character when they end.
static java.lang.String XHTML
          The XHTML namespace URI.

Constructor Summary
XHTMLContentHandler(org.xml.sax.ContentHandler handler, Metadata metadata)

Method Summary
 void characters(char[] ch, int start, int length)
 void characters(java.lang.String characters)
 void element(java.lang.String name, java.lang.String value)
          Emits an XHTML element with the given text content.
 void endDocument()
          Ends the XHTML document by writing the closing </body> and </html> tags and clearing the namespace mappings.
 void endElement(java.lang.String name)
 void endElement(java.lang.String uri, java.lang.String local, java.lang.String name)
          Ends the given element.
 void newline()
 void startDocument()
          Starts an XHTML document by setting up the namespace mappings.
 void startElement(java.lang.String name)
 void startElement(java.lang.String name, org.xml.sax.helpers.AttributesImpl attributes)
 void startElement(java.lang.String name, java.lang.String attribute, java.lang.String value)
 void startElement(java.lang.String uri, java.lang.String local, java.lang.String name, org.xml.sax.Attributes attributes)
          Starts the given element.

Methods inherited from class org.apache.tika.sax.SafeContentHandler
ignorableWhitespace, isInvalid, isInvalid, writeReplacement

Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endPrefixMapping, handleException, processingInstruction, setContentHandler, setDocumentLocator, skippedEntity, startPrefixMapping, toString

Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Detail

XHTML

public static final java.lang.String XHTML
The XHTML namespace URI, "http://www.w3.org/1999/xhtml".

See Also:
Constant Field Values

ENDLINE

public static final java.util.Set<java.lang.String> ENDLINE
The elements that are followed by a newline character when they end.

Constructor Detail

XHTMLContentHandler

public XHTMLContentHandler(org.xml.sax.ContentHandler handler,
                           Metadata metadata)
Method Detail

startDocument

public void startDocument()
                   throws org.xml.sax.SAXException
Starts an XHTML document by setting up the namespace mappings. The standard XHTML prefix is generated lazily when the first element is started.

Specified by:
startDocument in interface org.xml.sax.ContentHandler
Overrides:
startDocument in class ContentHandlerDecorator
Throws:
org.xml.sax.SAXException
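The lazy prefix generation described above can be pictured with the following sketch. LazyPrefixHandler and StartEventLog are hypothetical classes built only on the JDK's SAX API; they model the documented behavior and are not Tika's source. The decorator forwards startDocument() untouched and emits startPrefixMapping("", XHTML) the first time an element starts.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical recorder: stores the SAX events it receives as short strings. */
class StartEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }

    @Override public void startPrefixMapping(String prefix, String uri) {
        events.add("prefix:" + prefix + "=" + uri);
    }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        events.add("start:" + local);
    }
}

/** Hypothetical decorator: declares the XHTML default namespace only when the first element starts. */
class LazyPrefixHandler extends DefaultHandler {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    private final ContentHandler next;
    private boolean prefixDeclared = false;

    LazyPrefixHandler(ContentHandler next) { this.next = next; }

    @Override public void startDocument() throws SAXException {
        next.startDocument(); // no namespace events yet
    }

    @Override public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (!prefixDeclared) {
            next.startPrefixMapping("", XHTML); // emitted once, just before the first element
            prefixDeclared = true;
        }
        next.startElement(uri, local, qName, atts);
    }

    /** Feeds a two-element document through the decorator and returns what reached the log. */
    static List<String> demo() {
        StartEventLog log = new StartEventLog();
        LazyPrefixHandler handler = new LazyPrefixHandler(log);
        try {
            handler.startDocument();
            handler.startElement(XHTML, "p", "p", new AttributesImpl());
            handler.startElement(XHTML, "b", "b", new AttributesImpl());
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.events;
    }
}
```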

endDocument

public void endDocument()
                 throws org.xml.sax.SAXException
Ends the XHTML document by writing the following footer and clearing the namespace mappings:
   </body>
 </html>

Specified by:
endDocument in interface org.xml.sax.ContentHandler
Overrides:
endDocument in class SafeContentHandler
Throws:
org.xml.sax.SAXException
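The footer and cleanup steps can be sketched as a plain sequence of SAX calls. FooterWriter and EndEventLog are hypothetical JDK-only classes, and reading "clearing the namespace mappings" as an endPrefixMapping("") event for the default XHTML prefix is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical recorder for end-of-document events. */
class EndEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void endElement(String uri, String local, String qName) {
        events.add("end:" + local);
    }

    @Override public void endPrefixMapping(String prefix) { events.add("endPrefix:" + prefix); }

    @Override public void endDocument() { events.add("endDocument"); }
}

/** Hypothetical helper that emits the documented footer and then ends the document. */
class FooterWriter {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    static void endXhtmlDocument(ContentHandler out) throws SAXException {
        out.endElement(XHTML, "body", "body");
        out.endElement(XHTML, "html", "html");
        out.endPrefixMapping(""); // assumed meaning of "clearing the namespace mappings"
        out.endDocument();
    }

    static List<String> demo() {
        EndEventLog log = new EndEventLog();
        try {
            endXhtmlDocument(log);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.events;
    }
}
```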

startElement

public void startElement(java.lang.String uri,
                         java.lang.String local,
                         java.lang.String name,
                         org.xml.sax.Attributes attributes)
                  throws org.xml.sax.SAXException
Starts the given element. Table cells and list items are automatically indented by emitting a tab character as ignorable whitespace.

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class SafeContentHandler
Throws:
org.xml.sax.SAXException
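The indentation rule can be modeled as a decorator that emits a tab through ignorableWhitespace() before forwarding certain start tags. IndentingHandler and IndentLog are hypothetical JDK-only classes, and the element set {td, th, li} is an assumed reading of "table cells and list items", not the class's actual list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical recorder: writes start tags and ignorable whitespace into a string. */
class IndentLog extends DefaultHandler {
    final StringBuilder out = new StringBuilder();

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        out.append('<').append(local).append('>');
    }

    @Override public void ignorableWhitespace(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }
}

/** Hypothetical decorator: indents table cells and list items with a tab. */
class IndentingHandler extends DefaultHandler {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    // Assumed element set; the real class's list is not given on this page.
    static final Set<String> INDENT = new HashSet<>(Arrays.asList("td", "th", "li"));
    private static final char[] TAB = {'\t'};
    private final ContentHandler next;

    IndentingHandler(ContentHandler next) { this.next = next; }

    @Override public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (XHTML.equals(uri) && INDENT.contains(local)) {
            next.ignorableWhitespace(TAB, 0, TAB.length); // tab goes before the start tag
        }
        next.startElement(uri, local, qName, atts);
    }

    static String demo() {
        IndentLog log = new IndentLog();
        IndentingHandler handler = new IndentingHandler(log);
        try {
            for (String name : new String[] {"table", "tr", "td", "ul", "li"}) {
                handler.startElement(XHTML, name, name, new AttributesImpl());
            }
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.out.toString();
    }
}
```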

endElement

public void endElement(java.lang.String uri,
                       java.lang.String local,
                       java.lang.String name)
                throws org.xml.sax.SAXException
Ends the given element. Block elements are automatically followed by a newline character.

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class SafeContentHandler
Throws:
org.xml.sax.SAXException
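The newline rule pairs naturally with the ENDLINE field: after forwarding an end tag, the decorator checks whether the element is an XHTML block element and, if so, emits a newline. NewlineAfterBlocks and CloseTagLog are hypothetical JDK-only classes. The set below is an illustrative subset standing in for ENDLINE, and sending the newline as ignorable whitespace is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical recorder: writes end tags and ignorable whitespace into a string. */
class CloseTagLog extends DefaultHandler {
    final StringBuilder out = new StringBuilder();

    @Override public void endElement(String uri, String local, String qName) {
        out.append("</").append(local).append('>');
    }

    @Override public void ignorableWhitespace(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }
}

/** Hypothetical decorator: appends a newline after block-level XHTML elements. */
class NewlineAfterBlocks extends DefaultHandler {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    // Illustrative subset standing in for ENDLINE; its real contents are not listed here.
    static final Set<String> ENDLINE = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("p", "div", "h1", "li", "tr", "br")));
    private static final char[] NL = {'\n'};
    private final ContentHandler next;

    NewlineAfterBlocks(ContentHandler next) { this.next = next; }

    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        next.endElement(uri, local, qName);
        if (XHTML.equals(uri) && ENDLINE.contains(local)) {
            next.ignorableWhitespace(NL, 0, NL.length); // newline after the closing tag
        }
    }

    static String demo() {
        CloseTagLog log = new CloseTagLog();
        NewlineAfterBlocks handler = new NewlineAfterBlocks(log);
        try {
            for (String name : new String[] {"b", "p", "span", "div"}) {
                handler.endElement(XHTML, name, name);
            }
            handler.endElement("urn:other", "p", "p"); // not XHTML, so no newline
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.out.toString();
    }
}
```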

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class SafeContentHandler
Throws:
org.xml.sax.SAXException
See Also:
TIKA-210

startElement

public void startElement(java.lang.String name)
                  throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

startElement

public void startElement(java.lang.String name,
                         java.lang.String attribute,
                         java.lang.String value)
                  throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

startElement

public void startElement(java.lang.String name,
                         org.xml.sax.helpers.AttributesImpl attributes)
                  throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException
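The three short-form startElement overloads are not described on this page. One plausible reading, sketched below, is that each builds attributes and delegates to the namespaced four-argument SAX call using the XHTML URI and the name as both local name and qualified name. ShortFormStarts and LastStart are hypothetical JDK-only classes, and this delegation is an assumption rather than Tika's documented behavior.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical wrapper: short-form element starts delegate to the namespaced SAX call. */
class ShortFormStarts {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    private final ContentHandler next;

    ShortFormStarts(ContentHandler next) { this.next = next; }

    void startElement(String name) throws SAXException {
        startElement(name, new AttributesImpl()); // no attributes
    }

    void startElement(String name, String attribute, String value) throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", attribute, attribute, "CDATA", value); // one un-namespaced attribute
        startElement(name, atts);
    }

    void startElement(String name, AttributesImpl attributes) throws SAXException {
        next.startElement(XHTML, name, name, attributes); // local name doubles as qName
    }
}

/** Hypothetical capture: remembers the last element start it receives. */
class LastStart extends DefaultHandler {
    String uri;
    String local;
    int attributeCount;
    String href;

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        this.uri = uri;
        this.local = local;
        this.attributeCount = atts.getLength();
        this.href = atts.getValue("href");
    }

    static LastStart demo() {
        LastStart capture = new LastStart();
        try {
            new ShortFormStarts(capture).startElement("a", "href", "http://example.com/");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return capture;
    }
}
```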

endElement

public void endElement(java.lang.String name)
                throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

characters

public void characters(java.lang.String characters)
                throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

newline

public void newline()
             throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

element

public void element(java.lang.String name,
                    java.lang.String value)
             throws org.xml.sax.SAXException
Emits an XHTML element with the given text content. If the given text value is null or empty, then the element is not written.

Parameters:
name - XHTML element name
value - element value, possibly null
Throws:
org.xml.sax.SAXException - if the content element could not be written
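The null-or-empty rule for element() can be sketched as a guard in front of a start/characters/end triple. ElementHelper and TagTextLog are hypothetical JDK-only classes that model the documented contract. They are not Tika's implementation, and the recorder does no XML escaping.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical recorder: writes start tags, text and end tags into a string (no escaping). */
class TagTextLog extends DefaultHandler {
    final StringBuilder out = new StringBuilder();

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        out.append('<').append(local).append('>');
    }

    @Override public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }

    @Override public void endElement(String uri, String local, String qName) {
        out.append("</").append(local).append('>');
    }
}

class ElementHelper {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Hypothetical helper: emits <name>value</name>, or nothing when value is null or empty. */
    static void element(ContentHandler out, String name, String value) throws SAXException {
        if (value == null || value.isEmpty()) {
            return; // documented behavior: the element is not written
        }
        out.startElement(XHTML, name, name, new AttributesImpl());
        char[] chars = value.toCharArray();
        out.characters(chars, 0, chars.length);
        out.endElement(XHTML, name, name);
    }

    static String demo() {
        TagTextLog log = new TagTextLog();
        try {
            element(log, "h1", "Title");
            element(log, "p", null); // skipped
            element(log, "p", "");   // skipped
            element(log, "p", "Body text");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.out.toString();
    }
}
```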


Copyright © 2007-2011 The Apache Software Foundation. All Rights Reserved.