org.apache.tika.parser.microsoft.ooxml
Class AbstractOOXMLExtractor
java.lang.Object
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- All Implemented Interfaces:
- OOXMLExtractor
- Direct Known Subclasses:
- POIXMLTextExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator
public abstract class AbstractOOXMLExtractor
- extends java.lang.Object
- implements OOXMLExtractor
Base class for all Tika OOXML extractors.
Tika extractors decorate POI extractors so that the parsed content of
documents is returned as a sequence of XHTML SAX events. Subclasses must
implement the buildXHTML(XHTMLContentHandler) method, which populates the
XHTMLContentHandler object received as a parameter.
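The split between a public entry point and an abstract buildXHTML step reads like a template method. The sketch below illustrates that shape using only the JDK's SAX classes. It is not Tika's implementation: SketchExtractor, ParagraphExtractor, TagCollector and render are hypothetical names, a plain ContentHandler stands in for XHTMLContentHandler, and the bare body envelope is an assumption made for illustration.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ExtractorSketch {

    // Hypothetical stand-in for an abstract extractor base class.
    abstract static class SketchExtractor {
        static final String XHTML = "http://www.w3.org/1999/xhtml";

        // Template method: emits the envelope and delegates the body to the subclass.
        public void getXHTML(ContentHandler handler) throws SAXException {
            handler.startDocument();
            handler.startElement(XHTML, "body", "body", new AttributesImpl());
            buildXHTML(handler);
            handler.endElement(XHTML, "body", "body");
            handler.endDocument();
        }

        // Subclasses populate the handler with document-specific events.
        protected abstract void buildXHTML(ContentHandler xhtml) throws SAXException;
    }

    // Hypothetical subclass that emits one <p> element per string.
    static class ParagraphExtractor extends SketchExtractor {
        private final String[] paragraphs;

        ParagraphExtractor(String... paragraphs) {
            this.paragraphs = paragraphs;
        }

        @Override
        protected void buildXHTML(ContentHandler xhtml) throws SAXException {
            for (String text : paragraphs) {
                xhtml.startElement(XHTML, "p", "p", new AttributesImpl());
                xhtml.characters(text.toCharArray(), 0, text.length());
                xhtml.endElement(XHTML, "p", "p");
            }
        }
    }

    // Minimal handler that records the received events as markup-like text.
    static class TagCollector extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    // Runs the sketch extractor and returns the recorded events.
    static String render(String... paragraphs) {
        TagCollector collector = new TagCollector();
        try {
            new ParagraphExtractor(paragraphs).getXHTML(collector);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.toString();
    }

    public static void main(String[] args) {
        // prints <body><p>Hello</p><p>World</p></body>
        System.out.println(render("Hello", "World"));
    }
}
```

Keeping the envelope in the base class means each subclass only has to describe its own document body.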
Field Summary
protected org.apache.poi.POIXMLTextExtractor extractor
Constructor Summary
AbstractOOXMLExtractor(org.apache.poi.POIXMLTextExtractor extractor,
java.lang.String type)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
extractor
protected org.apache.poi.POIXMLTextExtractor extractor
AbstractOOXMLExtractor
public AbstractOOXMLExtractor(org.apache.poi.POIXMLTextExtractor extractor,
java.lang.String type)
getDocument
public org.apache.poi.POIXMLDocument getDocument()
- Description copied from interface:
OOXMLExtractor
- Returns the opened document.
- Specified by:
getDocument
in interface OOXMLExtractor
- See Also:
OOXMLExtractor.getDocument()
getMetadataExtractor
public MetadataExtractor getMetadataExtractor()
- Description copied from interface:
OOXMLExtractor
- POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported
for OOXML by POI.
- Specified by:
getMetadataExtractor
in interface OOXMLExtractor
- See Also:
OOXMLExtractor.getMetadataExtractor()
getXHTML
public void getXHTML(org.xml.sax.ContentHandler handler,
Metadata metadata)
throws org.xml.sax.SAXException,
org.apache.xmlbeans.XmlException,
java.io.IOException
- Description copied from interface:
OOXMLExtractor
- Parses the document into a sequence of XHTML SAX events sent to the
given content handler.
- Specified by:
getXHTML
in interface OOXMLExtractor
- Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException
- See Also:
OOXMLExtractor.getXHTML(org.xml.sax.ContentHandler,
org.apache.tika.metadata.Metadata)
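From the caller's side, getXHTML pushes events into whatever ContentHandler it is given. The following is a minimal sketch of a handler that gathers plain text from such an event stream. TextCollector, fireSampleEvents and collect are hypothetical names, and the events are fired by hand here because the sketch assumes no Tika or POI classes on the classpath.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollectorSketch {

    // Collects character data and starts a new line after each closed <p>.
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("p".equals(localName)) {
                text.append('\n');
            }
        }

        String text() {
            return text.toString();
        }
    }

    // Hypothetical event source standing in for an extractor's getXHTML call.
    static void fireSampleEvents(ContentHandler handler) throws SAXException {
        String ns = "http://www.w3.org/1999/xhtml";
        handler.startDocument();
        for (String s : new String[] {"First paragraph", "Second paragraph"}) {
            handler.startElement(ns, "p", "p", new AttributesImpl());
            handler.characters(s.toCharArray(), 0, s.length());
            handler.endElement(ns, "p", "p");
        }
        handler.endDocument();
    }

    // Feeds the sample events to a collector and returns the gathered text.
    static String collect() {
        TextCollector collector = new TextCollector();
        try {
            fireSampleEvents(collector);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.text();
    }

    public static void main(String[] args) {
        System.out.print(collect());
    }
}
```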
buildXHTML
protected abstract void buildXHTML(XHTMLContentHandler xhtml)
throws org.xml.sax.SAXException,
org.apache.xmlbeans.XmlException,
java.io.IOException
- Populates the XHTMLContentHandler object received as a parameter.
- Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException
Copyright © 2010 The Apache Software Foundation. All Rights Reserved.