org.apache.tika.parser.microsoft.ooxml
Interface OOXMLExtractor

All Known Implementing Classes:
AbstractOOXMLExtractor, POIXMLTextExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator

public interface OOXMLExtractor

Interface implemented by all Tika OOXML extractors.

See Also:
POIXMLTextExtractor

Method Summary
 org.apache.poi.POIXMLDocument getDocument()
          Returns the opened document.
 MetadataExtractor getMetadataExtractor()
          Returns the metadata extractor. POI does not yet support POIXMLTextExtractor.getMetadataTextExtractor() for OOXML.
 void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context)
          Parses the document into a sequence of XHTML SAX events sent to the given content handler.
 

Method Detail

getDocument

org.apache.poi.POIXMLDocument getDocument()
Returns the opened document.

See Also:
POIXMLTextExtractor.getDocument()

getMetadataExtractor

MetadataExtractor getMetadataExtractor()
Returns the metadata extractor. POI does not yet support POIXMLTextExtractor.getMetadataTextExtractor() for OOXML.


getXHTML

void getXHTML(ContentHandler handler,
              Metadata metadata,
              ParseContext context)
              throws SAXException,
                     org.apache.xmlbeans.XmlException,
                     IOException,
                     TikaException
Parses the document into a sequence of XHTML SAX events sent to the given content handler.

Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
TikaException
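
Example:
The following is a minimal sketch of a content handler that could receive the XHTML SAX events described above. It is not part of Tika. The class name TextCollectingHandler and the helpers emitSample and runSample are hypothetical, and the events fired in emitSample are hand-written stand-ins for what an extractor might emit, since the exact element sequence is not specified on this page. Only JDK SAX classes are used.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that collects character data from XHTML SAX events. */
class TextCollectingHandler extends DefaultHandler {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Accumulate every run of character data as it arrives.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Assumption: paragraphs arrive as <p> elements; end each with a newline.
        if ("p".equals(localName)) {
            text.append('\n');
        }
    }

    String getText() {
        return text.toString();
    }

    /** Fires a small, hand-written event sequence (a stand-in for a real extractor). */
    static void emitSample(ContentHandler h) throws SAXException {
        AttributesImpl none = new AttributesImpl();
        h.startDocument();
        h.startElement(XHTML, "html", "html", none);
        h.startElement(XHTML, "body", "body", none);
        for (String para : new String[] {"Hello", "World"}) {
            char[] c = para.toCharArray();
            h.startElement(XHTML, "p", "p", none);
            h.characters(c, 0, c.length);
            h.endElement(XHTML, "p", "p");
        }
        h.endElement(XHTML, "body", "body");
        h.endElement(XHTML, "html", "html");
        h.endDocument();
    }

    /** Runs the sample events through a fresh handler and returns the collected text. */
    static String runSample() {
        TextCollectingHandler handler = new TextCollectingHandler();
        try {
            emitSample(handler);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.getText();
    }

    public static void main(String[] args) {
        System.out.print(runSample());
    }
}
```

With a real OOXMLExtractor instance, such a handler would be passed as the first argument to getXHTML(handler, metadata, context), and the caller would need to handle the four checked exceptions listed under Throws.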


Copyright © 2007-2011 The Apache Software Foundation. All Rights Reserved.