org.apache.tika.parser.microsoft.ooxml
Interface OOXMLExtractor

All Known Implementing Classes:
AbstractOOXMLExtractor, POIXMLTextExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator

public interface OOXMLExtractor

Interface implemented by all Tika OOXML extractors.

See Also:
POIXMLTextExtractor

Method Summary
 org.apache.poi.POIXMLDocument getDocument()
          Returns the opened document.
 MetadataExtractor getMetadataExtractor()
          Returns the metadata extractor used in place of POIXMLTextExtractor#getMetadataTextExtractor(), which POI does not yet support for OOXML.
 void getXHTML(org.xml.sax.ContentHandler handler, Metadata metadata)
          Parses the document into a sequence of XHTML SAX events sent to the given content handler.
 

Method Detail

getDocument

org.apache.poi.POIXMLDocument getDocument()
Returns the opened document.

See Also:
POIXMLTextExtractor#getDocument()
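
An OOXML file (.docx, .xlsx, .pptx) is a ZIP-based Open Packaging Conventions package, and the POIXMLDocument returned here is POI's object model over such a package. The sketch below does not use POI or Tika. It relies only on java.util.zip to list the parts stored in a package, so the class OoxmlParts and its methods are hypothetical names for illustration, and the three part names built by samplePackage() are a stripped-down stand-in for a real document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class OoxmlParts {
    /** Lists the part names (ZIP entry names) stored in an OOXML package. */
    static List<String> listParts(byte[] pkg) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny hypothetical package; a real .docx contains many more parts. */
    static byte[] samplePackage() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            for (String name : new String[] {"[Content_Types].xml", "_rels/.rels", "word/document.xml"}) {
                out.putNextEntry(new ZipEntry(name));
                out.write("<x/>".getBytes(StandardCharsets.UTF_8)); // placeholder content
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The ZipOutputStream is closed here, so the buffer holds a complete archive.
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(listParts(samplePackage()));
    }
}
```

POI performs this unpacking (and much more, such as resolving relationships between parts) when it opens the document, which is why callers of getDocument() receive an already opened object rather than raw bytes.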

getMetadataExtractor

MetadataExtractor getMetadataExtractor()
Returns the metadata extractor used in place of POIXMLTextExtractor#getMetadataTextExtractor(), which POI does not yet support for OOXML.
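
In an OOXML package the core document properties (title, author and so on) are stored as Dublin Core elements in a core-properties part, conventionally docProps/core.xml. The following is a minimal JDK-only sketch of reading two of those elements with a namespace-aware DOM parser. It is not Tika's MetadataExtractor: the class CorePropsSketch, the method readCoreProperties and the sample XML are hypothetical, and the sketch takes the part's XML as a string instead of locating it inside a package.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class CorePropsSketch {
    // Dublin Core namespace used by dc:title, dc:creator, etc. in the core-properties part
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Reads dc:title and dc:creator (when present) from core-properties XML. */
    static Map<String, String> readCoreProperties(String coreXml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS to match
        Map<String, String> props = new LinkedHashMap<>();
        try {
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(coreXml.getBytes(StandardCharsets.UTF_8)));
            for (String name : new String[] {"title", "creator"}) {
                NodeList nodes = doc.getElementsByTagNameNS(DC_NS, name);
                if (nodes.getLength() > 0) {
                    props.put(name, nodes.item(0).getTextContent());
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable core properties", e);
        }
        return props;
    }

    public static void main(String[] args) {
        String xml = "<cp:coreProperties"
            + " xmlns:cp=\"http://schemas.openxmlformats.org/package/2006/metadata/core-properties\""
            + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
            + "<dc:title>Quarterly report</dc:title><dc:creator>Jane Doe</dc:creator>"
            + "</cp:coreProperties>";
        System.out.println(readCoreProperties(xml));
    }
}
```

Matching on namespace URI rather than on the dc: prefix keeps the sketch correct even if a producer binds Dublin Core to a different prefix.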


getXHTML

void getXHTML(org.xml.sax.ContentHandler handler,
              Metadata metadata)
              throws org.xml.sax.SAXException,
                     org.apache.xmlbeans.XmlException,
                     java.io.IOException
Parses the document into a sequence of XHTML SAX events sent to the given content handler.

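Throws:
org.xml.sax.SAXException - if the content handler rejects a SAX event
org.apache.xmlbeans.XmlException - if XMLBeans cannot parse an XML part of the document
java.io.IOException - if the document cannot be read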
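
Because getXHTML pushes its output as SAX events, a caller consumes it by supplying its own ContentHandler. The sketch below is one such handler that collects the text of each paragraph, assuming paragraphs arrive as XHTML p elements. To stay runnable without Tika, it replays a hand-written XHTML string through the JDK's SAX parser in place of a real extractor; the class ParagraphCollector and its collect helper are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ParagraphCollector extends DefaultHandler {
    private final List<String> paragraphs = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <p> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("p".equals(localName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text inside nested inline elements (e.g. <b>) is also appended.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("p".equals(localName) && current != null) {
            paragraphs.add(current.toString());
            current = null;
        }
    }

    public List<String> getParagraphs() {
        return paragraphs;
    }

    /** Replays an XHTML string as SAX events into a fresh collector. */
    static List<String> collect(String xhtml) {
        ParagraphCollector handler = new ParagraphCollector();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is populated
        try {
            factory.newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not replay XHTML", e);
        }
        return handler.getParagraphs();
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<p>First paragraph</p><p>Second</p></body></html>";
        System.out.println(collect(xhtml));
    }
}
```

With a real extractor, the same handler instance would instead be passed as the first argument of getXHTML(handler, metadata), and the checked exceptions listed above would propagate to the caller.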


Copyright © 2007-2010 The Apache Software Foundation. All Rights Reserved.