org.apache.tika.parser.microsoft.ooxml
Interface OOXMLExtractor

All Known Implementing Classes:
AbstractOOXMLExtractor, POIXMLTextExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator

public interface OOXMLExtractor

Interface implemented by all Tika OOXML extractors.

See Also:
POIXMLTextExtractor

Method Summary
 org.apache.poi.POIXMLDocument getDocument()
          Returns the opened document.
 MetadataExtractor getMetadataExtractor()
          Returns the metadata extractor for the document; POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
 void getXHTML(org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context)
          Parses the document into a sequence of XHTML SAX events sent to the given content handler.
 

Method Detail

getDocument

org.apache.poi.POIXMLDocument getDocument()
Returns the opened document.

See Also:
POIXMLTextExtractor.getDocument()

getMetadataExtractor

MetadataExtractor getMetadataExtractor()
Returns the metadata extractor for the document. POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.


getXHTML

void getXHTML(org.xml.sax.ContentHandler handler,
              Metadata metadata,
              ParseContext context)
              throws org.xml.sax.SAXException,
                     org.apache.xmlbeans.XmlException,
                     java.io.IOException,
                     TikaException
Parses the document into a sequence of XHTML SAX events sent to the given content handler.

Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException
TikaException
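
Example (illustrative sketch only): the handler passed to getXHTML receives ordinary SAX callbacks for XHTML elements. The code below uses only the JDK's org.xml.sax classes and shows a handler that collects paragraph text from such events. The names XhtmlEventSketch, TextCollector, emitSampleXhtml and collectSampleText are hypothetical, and emitSampleXhtml is a hand-written stand-in for the events an implementation might emit; it is not Tika code, and the exact elements a real extractor produces may differ.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlEventSketch {

    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Collects character data and counts <p> elements from XHTML SAX events. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int paragraphs = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("p".equals(localName)) {
                paragraphs++;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // End each paragraph with a newline so the paragraphs stay separated.
            if ("p".equals(localName)) {
                text.append('\n');
            }
        }
    }

    /** Hypothetical stand-in for getXHTML: fires a small, fixed XHTML event sequence. */
    static void emitSampleXhtml(ContentHandler handler) throws SAXException {
        Attributes none = new AttributesImpl();
        handler.startDocument();
        handler.startElement(XHTML, "html", "html", none);
        handler.startElement(XHTML, "body", "body", none);
        for (String para : new String[] {"First paragraph", "Second paragraph"}) {
            handler.startElement(XHTML, "p", "p", none);
            char[] chars = para.toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endElement(XHTML, "p", "p");
        }
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endDocument();
    }

    /** Runs the sample events through a TextCollector and returns the collected text. */
    static String collectSampleText() {
        TextCollector collector = new TextCollector();
        try {
            emitSampleXhtml(collector);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.text.toString();
    }

    public static void main(String[] args) {
        System.out.print(collectSampleText());
    }
}
```

With a real extractor, the same kind of handler would be passed as the first argument of getXHTML together with a Metadata and a ParseContext instance, and the caller would also need to handle the XmlException, IOException and TikaException declared above.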


Copyright © 2007-2010 The Apache Software Foundation. All Rights Reserved.