Class AbstractOOXMLExtractor
- java.lang.Object
- org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor

- All Implemented Interfaces:
- OOXMLExtractor
- Direct Known Subclasses:
- POIXMLTextExtractorDecorator, SXSLFPowerPointExtractorDecorator, SXWPFWordExtractorDecorator, XPSExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator
 
public abstract class AbstractOOXMLExtractor extends Object implements OOXMLExtractor

Base class for all Tika OOXML extractors. Tika extractors decorate POI extractors so that the parsed content of documents is returned as a sequence of XHTML SAX events. Subclasses must implement the buildXHTML(XHTMLContentHandler) method, which populates the XHTMLContentHandler object received as a parameter.
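
The division of work described above, where the base class drives parsing and each subclass fills in the body through buildXHTML, is an instance of the template-method pattern. The sketch below illustrates that pattern only and does not use Tika's classes. RecordingHandler, BaseExtractor, and ParagraphExtractor are hypothetical stand-ins, and the assumption that getXHTML brackets buildXHTML with document start and end events is not confirmed by this page.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateMethodSketch {

    /** Hypothetical stand-in for XHTMLContentHandler: records events as strings. */
    static class RecordingHandler {
        final List<String> events = new ArrayList<>();

        void startDocument() {
            events.add("startDocument");
        }

        void element(String name, String text) {
            events.add("<" + name + ">" + text + "</" + name + ">");
        }

        void endDocument() {
            events.add("endDocument");
        }
    }

    /** Hypothetical stand-in for the abstract base class. */
    abstract static class BaseExtractor {
        // Template method: the framing is fixed here (an assumption about the
        // real class), while the body is delegated to the subclass.
        final void getXHTML(RecordingHandler handler) {
            handler.startDocument();
            buildXHTML(handler);
            handler.endDocument();
        }

        /** Subclasses populate the handler with the document body. */
        protected abstract void buildXHTML(RecordingHandler xhtml);
    }

    /** Hypothetical subclass that emits one paragraph element per string. */
    static class ParagraphExtractor extends BaseExtractor {
        private final List<String> paragraphs;

        ParagraphExtractor(List<String> paragraphs) {
            this.paragraphs = paragraphs;
        }

        @Override
        protected void buildXHTML(RecordingHandler xhtml) {
            for (String p : paragraphs) {
                xhtml.element("p", p);
            }
        }
    }

    /** Runs the sketch and returns the recorded events in order. */
    static List<String> run(List<String> paragraphs) {
        RecordingHandler handler = new RecordingHandler();
        new ParagraphExtractor(paragraphs).getXHTML(handler);
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Hello", "World")));
    }
}
```

Keeping the framing in the base class means each subclass only has to describe its own document body, which is the responsibility this page assigns to buildXHTML.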
Field Summary

Fields (modifier and type, field):
- protected OfficeParserConfig config
- protected static String[] EMBEDDED_RELATIONSHIPS
- protected org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor

Constructor Summary

Constructors:
- AbstractOOXMLExtractor(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor)

Method Summary

Methods (modifier and type, method, description):
- protected abstract void buildXHTML(XHTMLContentHandler xhtml): Populates the XHTMLContentHandler object received as a parameter.
- org.apache.poi.ooxml.POIXMLDocument getDocument(): Returns the opened document.
- protected Map<String,EmbeddedPartMetadata> getEmbeddedPartMetadataMap()
- protected String getJustFileName(String desc)
- protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts(): Returns a list of the main parts of the document, used when searching for embedded resources.
- MetadataExtractor getMetadataExtractor(): POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
- void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context): Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- protected void handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, XHTMLContentHandler xhtml, String rel, EmbeddedPartMetadata embeddedPartMetadata, TikaCoreProperties.EmbeddedResourceType embeddedResourceType): Handles an embedded file in the document.
- protected Map<String,String> loadLinkedRelationships(org.apache.poi.openxml4j.opc.PackagePart bodyPart, boolean includeInternal, Metadata metadata): Used by the SAX docx and pptx decorators to load hyperlinks and other linked objects.
 
Field Detail

EMBEDDED_RELATIONSHIPS
protected static final String[] EMBEDDED_RELATIONSHIPS

config
protected OfficeParserConfig config

extractor
protected org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor
 
Constructor Detail

AbstractOOXMLExtractor
public AbstractOOXMLExtractor(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor)
 
Method Detail

getDocument
public org.apache.poi.ooxml.POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
- Specified by:
- getDocument in interface OOXMLExtractor
- See Also:
- OOXMLExtractor.getDocument()
 
getMetadataExtractor
public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
- Specified by:
- getMetadataExtractor in interface OOXMLExtractor
- See Also:
- OOXMLExtractor.getMetadataExtractor()
 
getXHTML
public void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- Specified by:
- getXHTML in interface OOXMLExtractor
- Throws:
- SAXException
- org.apache.xmlbeans.XmlException
- IOException
- TikaException
- See Also:
- OOXMLExtractor.getXHTML(ContentHandler, Metadata, ParseContext)
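
Because getXHTML reports content as SAX events, the caller supplies an org.xml.sax.ContentHandler to receive them. The following is a minimal sketch of such a handler using only the JDK's SAX classes. TextCollectingHandler and its demo method are hypothetical names. No extractor is constructed here, so demo fires a few events by hand to stand in for one, and the exact events a real extractor emits are an assumption.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class TextCollectingHandler extends DefaultHandler {
    private static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    private final StringBuilder text = new StringBuilder();

    /** Appends each run of character data to the collected text. */
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    /** Ends every paragraph element with a newline. */
    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("p".equals(localName)) {
            text.append('\n');
        }
    }

    public String getText() {
        return text.toString();
    }

    /**
     * Hand-fires a single XHTML paragraph, standing in for the events an
     * extractor would send (hypothetical; no Tika classes are involved).
     */
    static String demo() {
        TextCollectingHandler handler = new TextCollectingHandler();
        Attributes noAttributes = new AttributesImpl();
        char[] chars = "Hello".toCharArray();
        try {
            handler.startDocument();
            handler.startElement(XHTML_NS, "p", "p", noAttributes);
            handler.characters(chars, 0, chars.length);
            handler.endElement(XHTML_NS, "p", "p");
            handler.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.getText();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

With a real extractor, an instance of such a handler would be passed as the first argument of getXHTML in place of the hand-fired events.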
 
getEmbeddedPartMetadataMap
protected Map<String,EmbeddedPartMetadata> getEmbeddedPartMetadataMap()

handleEmbeddedFile
protected void handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, XHTMLContentHandler xhtml, String rel, EmbeddedPartMetadata embeddedPartMetadata, TikaCoreProperties.EmbeddedResourceType embeddedResourceType) throws SAXException, IOException
Handles an embedded file in the document.
- Throws:
- SAXException
- IOException
 
buildXHTML
protected abstract void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Populates the XHTMLContentHandler object received as a parameter.
- Throws:
- SAXException
- org.apache.xmlbeans.XmlException
- IOException
 
getMainDocumentParts
protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts() throws TikaException
Returns a list of the main parts of the document, used when searching for embedded resources. This should include every part of the document that can have resources embedded into it.
- Throws:
- TikaException
 
 