public abstract class AbstractOOXMLExtractor extends Object implements OOXMLExtractor
Subclasses must implement buildXHTML(XHTMLContentHandler), which populates
the XHTMLContentHandler object received as a parameter.
Modifier and Type | Field and Description |
---|---|
protected OfficeParserConfig | config |
protected static String[] | EMBEDDED_RELATIONSHIPS |
protected org.apache.poi.POIXMLTextExtractor | extractor |
Constructor and Description |
---|
AbstractOOXMLExtractor(ParseContext context, org.apache.poi.POIXMLTextExtractor extractor) |
Modifier and Type | Method and Description |
---|---|
protected abstract void | buildXHTML(XHTMLContentHandler xhtml): Populates the XHTMLContentHandler object received as a parameter. |
org.apache.poi.POIXMLDocument | getDocument(): Returns the opened document. |
protected String | getJustFileName(String desc) |
protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> | getMainDocumentParts(): Returns a list of the main parts of the document, used when searching for embedded resources. |
MetadataExtractor | getMetadataExtractor(): POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI. |
void | getXHTML(ContentHandler handler, Metadata metadata, ParseContext context): Parses the document into a sequence of XHTML SAX events sent to the given content handler. |
protected void | handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, ContentHandler handler, String rel): Handles an embedded file in the document. |
protected Map<String,String> | loadLinkedRelationships(org.apache.poi.openxml4j.opc.PackagePart bodyPart, boolean includeInternal, Metadata metadata): Used by the SAX docx and pptx decorators to load hyperlinks and other linked objects. |
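loadLinkedRelationships is described here only as helping the SAX docx and pptx decorators load hyperlinks and other linked objects. As a rough, JDK-only illustration of the underlying idea, the sketch below reads an OPC relationships part (.rels XML) and maps each hyperlink relationship's Id to its Target. The names LinkedRelationshipsSketch and loadLinks are hypothetical, the code does not use POI's PackagePart API, and the reading of includeInternal (keep relationships that lack TargetMode="External") is an assumption rather than documented behavior.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LinkedRelationshipsSketch {

    // Namespace of OPC relationship parts (*.rels).
    static final String RELS_NS =
            "http://schemas.openxmlformats.org/package/2006/relationships";
    // Standard OOXML relationship type for hyperlinks.
    static final String HYPERLINK_TYPE =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink";

    /**
     * Hypothetical helper: maps relationship Id to Target for every hyperlink
     * relationship in a .rels document. Assumption: relationships without
     * TargetMode="External" are internal and are kept only if includeInternal is true.
     */
    static Map<String, String> loadLinks(String relsXml, boolean includeInternal) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable relationships XML", e);
        }

        Map<String, String> links = new LinkedHashMap<>(); // keeps document order
        NodeList rels = doc.getElementsByTagNameNS(RELS_NS, "Relationship");
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            if (!HYPERLINK_TYPE.equals(rel.getAttribute("Type"))) {
                continue; // skip styles, images and other non-hyperlink relationships
            }
            // getAttribute returns "" when TargetMode is absent, i.e. an internal target
            boolean external = "External".equals(rel.getAttribute("TargetMode"));
            if (external || includeInternal) {
                links.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String rels = "<Relationships xmlns=\"" + RELS_NS + "\">"
                + "<Relationship Id=\"rId1\" Type=\"" + HYPERLINK_TYPE + "\""
                + " Target=\"https://tika.apache.org/\" TargetMode=\"External\"/>"
                + "<Relationship Id=\"rId2\" Type=\"" + HYPERLINK_TYPE + "\" Target=\"other.xml\"/>"
                + "</Relationships>";
        System.out.println(loadLinks(rels, false)); // {rId1=https://tika.apache.org/}
        System.out.println(loadLinks(rels, true));  // {rId1=https://tika.apache.org/, rId2=other.xml}
    }
}
```

A LinkedHashMap is used so that links come back in the order they appear in the relationships part, which keeps any downstream output deterministic.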
protected static final String[] EMBEDDED_RELATIONSHIPS
protected OfficeParserConfig config
protected org.apache.poi.POIXMLTextExtractor extractor
public AbstractOOXMLExtractor(ParseContext context, org.apache.poi.POIXMLTextExtractor extractor)
public org.apache.poi.POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
Specified by: getDocument in interface OOXMLExtractor
See Also: OOXMLExtractor.getDocument()
public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
Specified by: getMetadataExtractor in interface OOXMLExtractor
See Also: OOXMLExtractor.getMetadataExtractor()
public void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
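Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Specified by: getXHTML in interface OOXMLExtractor
Throws: SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
See Also: OOXMLExtractor.getXHTML(ContentHandler, Metadata, ParseContext)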
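The split between the public getXHTML and the abstract buildXHTML has the shape of a template method: the base class drives the SAX event stream and each subclass supplies the format-specific body. The sketch below shows that shape using only the JDK's org.xml.sax types. SketchExtractor, OneParagraph and Recorder are hypothetical names, a plain ContentHandler stands in for XHTMLContentHandler, and only html and body wrappers are emitted; this is an illustration of the pattern, not Tika's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class TemplateMethodSketch {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Hypothetical base class: getXHTML fixes the event skeleton, subclasses fill the body. */
    abstract static class SketchExtractor {
        public final void getXHTML(ContentHandler handler) throws SAXException {
            handler.startDocument();
            start(handler, "html");
            start(handler, "body");
            buildXHTML(handler); // format-specific content supplied by the subclass
            end(handler, "body");
            end(handler, "html");
            handler.endDocument();
        }

        protected abstract void buildXHTML(ContentHandler xhtml) throws SAXException;

        static void start(ContentHandler h, String name) throws SAXException {
            h.startElement(XHTML_NS, name, name, new AttributesImpl());
        }

        static void end(ContentHandler h, String name) throws SAXException {
            h.endElement(XHTML_NS, name, name);
        }
    }

    /** Hypothetical subclass that emits one paragraph of text. */
    static class OneParagraph extends SketchExtractor {
        @Override
        protected void buildXHTML(ContentHandler xhtml) throws SAXException {
            start(xhtml, "p");
            char[] text = "Hello OOXML".toCharArray();
            xhtml.characters(text, 0, text.length);
            end(xhtml, "p");
        }
    }

    /** Records element boundaries and text so the event order can be inspected. */
    static class Recorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("<" + localName + ">");
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("</" + localName + ">");
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            events.add(new String(ch, start, length));
        }
    }

    public static void main(String[] args) throws SAXException {
        Recorder recorder = new Recorder();
        new OneParagraph().getXHTML(recorder);
        // prints <html><body><p>Hello OOXML</p></body></html>
        System.out.println(String.join("", recorder.events));
    }
}
```

Marking getXHTML final in the sketch keeps subclasses from reordering the document-level events; they can only influence what appears inside the body.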
protected void handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, ContentHandler handler, String rel) throws SAXException, IOException
Handles an embedded file in the document.
Throws: SAXException, IOException
protected abstract void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Populates the XHTMLContentHandler object received as a parameter.
Throws: SAXException, org.apache.xmlbeans.XmlException, IOException
protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts() throws TikaException
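Returns a list of the main parts of the document, used when searching for embedded resources.
Throws: TikaException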
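getMainDocumentParts narrows the search for embedded resources to the document's main parts, and EMBEDDED_RELATIONSHIPS names the relationship types that count as embedded. A minimal sketch of such a search, assuming simplified hypothetical Part and Relationship classes in place of POI's PackagePart and PackageRelationship: only relationships of the main parts whose type is in an "embedded" set are collected. The oleObject, package and image type URIs are standard OOXML relationship types chosen as examples; this page does not list the actual contents of EMBEDDED_RELATIONSHIPS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class EmbeddedSearchSketch {

    /** Hypothetical stand-in for a relationship (type URI plus target). */
    static final class Relationship {
        final String type;
        final String target;

        Relationship(String type, String target) {
            this.type = type;
            this.target = target;
        }
    }

    /** Hypothetical stand-in for a package part and its outgoing relationships. */
    static final class Part {
        final String name;
        final List<Relationship> relationships;

        Part(String name, List<Relationship> relationships) {
            this.name = name;
            this.relationships = relationships;
        }
    }

    static final String REL_BASE =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/";

    // Example "embedded" types only; not the actual contents of EMBEDDED_RELATIONSHIPS.
    static final Set<String> EMBEDDED_TYPES = Set.of(
            REL_BASE + "oleObject", REL_BASE + "package", REL_BASE + "image");

    /** Searches only the given main parts and collects targets of embedded-type relationships. */
    static List<String> findEmbeddedTargets(List<Part> mainParts) {
        List<String> targets = new ArrayList<>();
        for (Part part : mainParts) {
            for (Relationship rel : part.relationships) {
                if (EMBEDDED_TYPES.contains(rel.type)) {
                    targets.add(rel.target);
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Part body = new Part("/word/document.xml", List.of(
                new Relationship(REL_BASE + "styles", "styles.xml"),
                new Relationship(REL_BASE + "package", "embeddings/Sheet1.xlsx"),
                new Relationship(REL_BASE + "hyperlink", "https://tika.apache.org/")));
        System.out.println(findEmbeddedTargets(List.of(body))); // [embeddings/Sheet1.xlsx]
    }
}
```

Keeping the list of parts abstract (as getMainDocumentParts does) lets each format decide where embedded objects can live, while the filtering by relationship type stays shared.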
Copyright © 2007–2018 The Apache Software Foundation. All rights reserved.