Interface OOXMLExtractor
- All Known Implementing Classes:
AbstractOOXMLExtractor, POIXMLTextExtractorDecorator, SXSLFPowerPointExtractorDecorator, SXWPFWordExtractorDecorator, XPSExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFBExcelExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator
public interface OOXMLExtractor
Interface implemented by all Tika OOXML extractors.
- See Also:
POIXMLTextExtractor
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| org.apache.poi.ooxml.POIXMLDocument | getDocument() | Returns the opened document. |
| MetadataExtractor | getMetadataExtractor() | Returns a MetadataExtractor. POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI. |
| void | getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) | Parses the document into a sequence of XHTML SAX events sent to the given content handler. |
Method Details
getDocument
org.apache.poi.ooxml.POIXMLDocument getDocument()
Returns the opened document.
- See Also:
POIXMLTextExtractor.getDocument()
getMetadataExtractor
MetadataExtractor getMetadataExtractor()
Returns a MetadataExtractor. POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
getXHTML
void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
TikaException
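To illustrate what "a sequence of XHTML SAX events sent to the given content handler" can look like from the receiving side, here is a minimal sketch that uses only the JDK's org.xml.sax classes. The emitXhtml method is a hypothetical stand-in for an extractor and is not Tika's implementation; the class name XhtmlSaxSketch, the TextCollector handler, and the choice of html/body/p elements are likewise assumptions made for this example. With a real OOXMLExtractor, a handler of the same kind would presumably be passed to getXHTML together with a Metadata and a ParseContext.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlSaxSketch {

    public static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /**
     * Hypothetical stand-in for an extractor (NOT Tika code): emits each
     * input string as one XHTML paragraph, expressed as SAX events.
     */
    public static void emitXhtml(ContentHandler handler, String... paragraphs)
            throws SAXException {
        Attributes noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startPrefixMapping("", XHTML_NS);
        handler.startElement(XHTML_NS, "html", "html", noAttrs);
        handler.startElement(XHTML_NS, "body", "body", noAttrs);
        for (String paragraph : paragraphs) {
            handler.startElement(XHTML_NS, "p", "p", noAttrs);
            char[] chars = paragraph.toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endElement(XHTML_NS, "p", "p");
        }
        handler.endElement(XHTML_NS, "body", "body");
        handler.endElement(XHTML_NS, "html", "html");
        handler.endPrefixMapping("");
        handler.endDocument();
    }

    /** Collects character data and appends a newline after each closed p element. */
    public static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("p".equals(localName)) {
                text.append('\n');
            }
        }

        public String text() {
            return text.toString();
        }
    }

    /** Runs the stand-in emitter against a TextCollector and returns the collected text. */
    public static String collectText(String... paragraphs) throws SAXException {
        TextCollector collector = new TextCollector();
        emitXhtml(collector, paragraphs);
        return collector.text();
    }

    public static void main(String[] args) throws SAXException {
        System.out.print(collectText("First paragraph", "Second paragraph"));
    }
}
```

Because the handler only sees events, the same collector works regardless of which document type produced them; a caller that wants markup rather than plain text would swap in a handler that serializes the elements instead of discarding them.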