public abstract class AbstractOOXMLExtractor extends Object implements OOXMLExtractor

Subclasses must implement buildXHTML(XHTMLContentHandler), which populates the XHTMLContentHandler object received as parameter.

| Modifier and Type | Field and Description |
|---|---|
| protected OfficeParserConfig | config |
| protected static String[] | EMBEDDED_RELATIONSHIPS |
| protected org.apache.poi.ooxml.extractor.POIXMLTextExtractor | extractor |

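The split between the concrete getXHTML entry point and the abstract buildXHTML hook resembles a template method. The following is a minimal, self-contained sketch of that shape only. It does not use Tika or POI: SketchHandler, SketchExtractor, ParagraphSketchExtractor, and TemplateMethodSketch are hypothetical stand-ins, and the assumption that getXHTML frames a call to buildXHTML with start-of-document and end-of-document events is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for XHTMLContentHandler: records events as strings. */
class SketchHandler {
    final List<String> events = new ArrayList<>();

    void startDocument() { events.add("startDocument"); }
    void element(String name, String text) { events.add(name + ":" + text); }
    void endDocument() { events.add("endDocument"); }
}

/** Hypothetical base class mirroring the concrete/abstract split. */
abstract class SketchExtractor {
    /** Public entry point: fixed framing around the subclass-specific body. */
    final void getXHTML(SketchHandler handler) {
        handler.startDocument();
        buildXHTML(handler); // assumed hook call; subclasses fill in the content
        handler.endDocument();
    }

    /** Subclasses populate the handler received as parameter. */
    protected abstract void buildXHTML(SketchHandler xhtml);
}

/** Hypothetical subclass that emits one "p" element per paragraph. */
class ParagraphSketchExtractor extends SketchExtractor {
    private final List<String> paragraphs;

    ParagraphSketchExtractor(List<String> paragraphs) {
        this.paragraphs = paragraphs;
    }

    @Override
    protected void buildXHTML(SketchHandler xhtml) {
        for (String paragraph : paragraphs) {
            xhtml.element("p", paragraph);
        }
    }
}

class TemplateMethodSketch {
    /** Runs the sketch extractor and returns the recorded events. */
    static List<String> run(List<String> paragraphs) {
        SketchHandler handler = new SketchHandler();
        new ParagraphSketchExtractor(paragraphs).getXHTML(handler);
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("Hello", "World")));
    }
}
```

In this arrangement a subclass only decides what goes between the framing events, so callers can treat every extractor the same way through the base-class entry point.
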

| Constructor and Description |
|---|
| AbstractOOXMLExtractor(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor) |

| Modifier and Type | Method and Description |
|---|---|
| protected abstract void | buildXHTML(XHTMLContentHandler xhtml) Populates the XHTMLContentHandler object received as parameter. |
| org.apache.poi.ooxml.POIXMLDocument | getDocument() Returns the opened document. |
| protected String | getJustFileName(String desc) |
| protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> | getMainDocumentParts() Returns a list of the main parts of the document, used when searching for embedded resources. |
| MetadataExtractor | getMetadataExtractor() POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI. |
| void | getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) Parses the document into a sequence of XHTML SAX events sent to the given content handler. |
| protected void | handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, XHTMLContentHandler xhtml, String rel) Handles an embedded file in the document. |
| protected Map<String,String> | loadLinkedRelationships(org.apache.poi.openxml4j.opc.PackagePart bodyPart, boolean includeInternal, Metadata metadata) Used by the SAX docx and pptx decorators to load hyperlinks and other linked objects. |

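To make the Map<String,String> returned by loadLinkedRelationships more concrete, the sketch below parses a relationships part (the `.rels` XML used by OPC packages) with the JDK's DOM parser and collects relationship targets. It is not Tika's implementation and does not use POI: RelsSketch, loadLinked, and SAMPLE are hypothetical names, keying the map by relationship Id is an assumption, and so is reading includeInternal as "also keep relationships whose TargetMode is not External". The metadata parameter is left out, and no filtering by relationship type is attempted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper; not part of Tika or POI. */
class RelsSketch {
    private static final String TYPE_BASE =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/";

    /** Hand-written relationships part: one external hyperlink, one internal image. */
    static final String SAMPLE =
        "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Type=\"" + TYPE_BASE + "hyperlink\""
        + " Target=\"https://example.org/\" TargetMode=\"External\"/>"
        + "<Relationship Id=\"rId2\" Type=\"" + TYPE_BASE + "image\""
        + " Target=\"media/image1.png\"/>"
        + "</Relationships>";

    /**
     * Collects Id -> Target pairs. External relationships are always kept;
     * internal ones only when includeInternal is true (an assumed reading).
     */
    static Map<String, String> loadLinked(String relsXml, boolean includeInternal) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));

            Map<String, String> links = new LinkedHashMap<>();
            // "*" matches Relationship elements in any namespace.
            NodeList rels = doc.getElementsByTagNameNS("*", "Relationship");
            for (int i = 0; i < rels.getLength(); i++) {
                Element rel = (Element) rels.item(i);
                // TargetMode defaults to internal when the attribute is absent.
                boolean external = "External".equals(rel.getAttribute("TargetMode"));
                if (external || includeInternal) {
                    links.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
                }
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse relationships XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadLinked(SAMPLE, false));
        System.out.println(loadLinked(SAMPLE, true));
    }
}
```

A decorator could then resolve a relationship Id found in the document body, such as the one on a hyperlink element, to its target by a plain map lookup.
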
protected static final String[] EMBEDDED_RELATIONSHIPS
protected OfficeParserConfig config
protected org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor
public AbstractOOXMLExtractor(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor)
public org.apache.poi.ooxml.POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
Specified by: getDocument in interface OOXMLExtractor
See Also: OOXMLExtractor.getDocument()

public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
Specified by: getMetadataExtractor in interface OOXMLExtractor
See Also: OOXMLExtractor.getMetadataExtractor()

public void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Specified by: getXHTML in interface OOXMLExtractor
Throws: SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
See Also: OOXMLExtractor.getXHTML(ContentHandler, Metadata, ParseContext)

protected void handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, XHTMLContentHandler xhtml, String rel) throws SAXException, IOException
Handles an embedded file in the document.
Throws: SAXException, IOException

protected abstract void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Populates the XHTMLContentHandler object received as parameter.
Throws: SAXException, org.apache.xmlbeans.XmlException, IOException

protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts() throws TikaException
Returns a list of the main parts of the document, used when searching for embedded resources.
Throws: TikaException

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.