org.apache.tika.parser.microsoft.ooxml
Class AbstractOOXMLExtractor

java.lang.Object
  extended by org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
All Implemented Interfaces:
OOXMLExtractor
Direct Known Subclasses:
POIXMLTextExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator

public abstract class AbstractOOXMLExtractor
extends java.lang.Object
implements OOXMLExtractor

Base class for all Tika OOXML extractors. Tika extractors decorate POI extractors so that the parsed content of a document is returned as a sequence of XHTML SAX events. Subclasses must implement buildXHTML(XHTMLContentHandler), which populates the XHTMLContentHandler object received as a parameter.
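The division of labour described above (a public getXHTML entry point plus an abstract buildXHTML hook) resembles a template method. The following is a minimal sketch of that shape using only the JDK's org.xml.sax types. It is not Tika's implementation: SketchExtractor, ParagraphExtractor and render are hypothetical stand-ins, and a plain ContentHandler is assumed in place of XHTMLContentHandler and the wrapped POI extractor.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ExtractorSketch {

    /** Hypothetical stand-in for an abstract extractor base class. */
    abstract static class SketchExtractor {
        static final String XHTML = "http://www.w3.org/1999/xhtml";

        // Template method: emits the document envelope and delegates
        // the body content to the subclass hook.
        public void getXHTML(ContentHandler handler) throws SAXException {
            handler.startDocument();
            start(handler, "html");
            start(handler, "body");
            buildXHTML(handler);
            end(handler, "body");
            end(handler, "html");
            handler.endDocument();
        }

        // Hook that each concrete extractor must implement.
        protected abstract void buildXHTML(ContentHandler handler) throws SAXException;

        static void start(ContentHandler h, String name) throws SAXException {
            h.startElement(XHTML, name, name, new AttributesImpl());
        }

        static void end(ContentHandler h, String name) throws SAXException {
            h.endElement(XHTML, name, name);
        }
    }

    /** Hypothetical subclass that emits one paragraph of text. */
    static class ParagraphExtractor extends SketchExtractor {
        private final String text;

        ParagraphExtractor(String text) {
            this.text = text;
        }

        @Override
        protected void buildXHTML(ContentHandler handler) throws SAXException {
            start(handler, "p");
            char[] chars = text.toCharArray();
            handler.characters(chars, 0, chars.length);
            end(handler, "p");
        }
    }

    /** Runs the sketch and records the SAX events as a tag string. */
    public static String render(String text) {
        StringBuilder out = new StringBuilder();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                out.append('<').append(local).append('>');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                out.append("</").append(local).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        try {
            new ParagraphExtractor(text).getXHTML(recorder);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Hello OOXML"));
    }
}
```

Keeping the envelope in the base class means each concrete extractor only has to describe its own body content, which is the role the documentation above assigns to buildXHTML.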


Field Summary
protected  org.apache.poi.POIXMLTextExtractor extractor
           
 
Constructor Summary
AbstractOOXMLExtractor(org.apache.poi.POIXMLTextExtractor extractor, java.lang.String type)
           
 
Method Summary
protected abstract  void buildXHTML(XHTMLContentHandler xhtml)
          Populates the XHTMLContentHandler object received as a parameter.
 org.apache.poi.POIXMLDocument getDocument()
          Returns the opened document.
 MetadataExtractor getMetadataExtractor()
          POIXMLTextExtractor#getMetadataTextExtractor() is not yet supported for OOXML by POI.
 void getXHTML(org.xml.sax.ContentHandler handler, Metadata metadata)
          Parses the document into a sequence of XHTML SAX events sent to the given content handler.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

extractor

protected org.apache.poi.POIXMLTextExtractor extractor
Constructor Detail

AbstractOOXMLExtractor

public AbstractOOXMLExtractor(org.apache.poi.POIXMLTextExtractor extractor,
                              java.lang.String type)
Method Detail

getDocument

public org.apache.poi.POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.

Specified by:
getDocument in interface OOXMLExtractor
See Also:
OOXMLExtractor.getDocument()

getMetadataExtractor

public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor#getMetadataTextExtractor() is not yet supported for OOXML by POI.

Specified by:
getMetadataExtractor in interface OOXMLExtractor
See Also:
OOXMLExtractor.getMetadataExtractor()

getXHTML

public void getXHTML(org.xml.sax.ContentHandler handler,
                     Metadata metadata)
              throws org.xml.sax.SAXException,
                     org.apache.xmlbeans.XmlException,
                     java.io.IOException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.

Specified by:
getXHTML in interface OOXMLExtractor
Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException
See Also:
OOXMLExtractor.getXHTML(org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata)
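Because getXHTML delivers its output as SAX events, a caller supplies a ContentHandler to receive them. The sketch below shows one possible handler that gathers character data into plain text, using only JDK classes. TextCollector, emitParagraph and demo are hypothetical names, and the paragraph events are simulated by hand here rather than produced by a real extractor.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollectorSketch {

    /** Collects character data and ends each closed <p> with a newline. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("p".equals(localName)) {
                text.append('\n');
            }
        }

        String text() {
            return text.toString();
        }
    }

    // Simulates the events an extractor might send for one paragraph.
    static void emitParagraph(ContentHandler h, String s) throws SAXException {
        String ns = "http://www.w3.org/1999/xhtml";
        h.startElement(ns, "p", "p", new AttributesImpl());
        h.characters(s.toCharArray(), 0, s.length());
        h.endElement(ns, "p", "p");
    }

    /** Feeds two simulated paragraphs to the collector and returns the text. */
    public static String demo() {
        TextCollector collector = new TextCollector();
        try {
            emitParagraph(collector, "first");
            emitParagraph(collector, "second");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.text();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

With a real extractor, an instance of such a handler would be passed as the handler argument of getXHTML in place of the simulated emitParagraph calls.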

buildXHTML

protected abstract void buildXHTML(XHTMLContentHandler xhtml)
                            throws org.xml.sax.SAXException,
                                   org.apache.xmlbeans.XmlException,
                                   java.io.IOException
Populates the XHTMLContentHandler object received as a parameter.

Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException


Copyright © 2007-2010 The Apache Software Foundation. All Rights Reserved.