public class SXWPFWordExtractorDecorator extends AbstractOOXMLExtractor
This extractor will be better than the classic docx extractor for some use cases, and worse for others.
Fields inherited from class AbstractOOXMLExtractor: config, EMBEDDED_RELATIONSHIPS, extractor

| Constructor and Description |
|---|
| SXWPFWordExtractorDecorator(Metadata metadata, ParseContext context, XWPFEventBasedWordExtractor extractor) |
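The constructor takes an XWPFEventBasedWordExtractor, and buildXHTML writes its output into an XHTMLContentHandler. As a rough illustration only, here is a minimal, standard-library-only sketch of what event-based (SAX) extraction of a docx body into XHTML paragraphs can look like. It is not Tika's implementation: the class StreamingDocxSketch and its method toXhtml are hypothetical, it assumes the main part is named word/document.xml, and it recognizes only w:p (paragraph) and w:t (text) elements, ignoring tables, lists, headers, footers, and embedded objects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not Tika's SXWPFWordExtractorDecorator: streams the
// main document part of a .docx with SAX and emits one <p> per w:p element.
final class StreamingDocxSketch {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String toXhtml(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumes the conventional part name; real packages resolve it via relationships.
                if ("word/document.xml".equals(entry.getName())) {
                    return streamParagraphs(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found");
    }

    private static String streamParagraphs(InputStream xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        SAXParser parser = factory.newSAXParser();
        StringBuilder out = new StringBuilder();
        parser.parse(xml, new DefaultHandler() {
            private boolean inText; // true while inside a w:t element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (W_NS.equals(uri) && "p".equals(localName)) {
                    out.append("<p>");
                } else if (W_NS.equals(uri) && "t".equals(localName)) {
                    inText = true;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (W_NS.equals(uri) && "p".equals(localName)) {
                    out.append("</p>\n");
                } else if (W_NS.equals(uri) && "t".equals(localName)) {
                    inText = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inText) {
                    appendEscaped(out, ch, start, length);
                }
            }
        });
        return out.toString();
    }

    // Escapes the characters that are significant in XHTML text content.
    private static void appendEscaped(StringBuilder out, char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
    }
}
```

Because the SAX handler sees one element at a time, memory use stays roughly constant regardless of document size, whereas a DOM-style reader holds the whole part in memory; the cost is that structures spanning many elements (tables, numbering, tracked changes) need extra state to reconstruct. That is one plausible reading of the "better for some use cases, worse for others" trade-off, not Tika's documented rationale.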
| Modifier and Type | Method | Description |
|---|---|---|
| protected void | buildXHTML(XHTMLContentHandler xhtml) | Populates the XHTMLContentHandler object received as a parameter. |
| protected List<org.apache.poi.openxml4j.opc.PackagePart> | getMainDocumentParts() | Returns all items that might contain embedded objects: the main document, headers, footers, comments, etc. |
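getMainDocumentParts returns PackagePart objects for the parts that may hold embedded objects. As a rough, hypothetical approximation using only java.util.zip, the sketch below lists zip entries whose names follow the usual word/ naming convention for the main document, headers, footers, and comments. Real OPC packages locate these parts through relationships rather than fixed names, so this name-based filter (DocxPartListing and mainPartNames are made-up names) can miss parts in unconventionally named packages and is not how Tika or POI resolve them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not a Tika or POI API: approximates "main document parts"
// by matching conventional zip entry names instead of following OPC relationships.
final class DocxPartListing {
    // Main document, numbered headers/footers, and comments under word/ (assumed names).
    private static final Pattern MAIN_PART =
        Pattern.compile("word/(document|header\\d*|footer\\d*|comments)\\.xml");

    // Returns matching entry names in the order they appear in the archive.
    static List<String> mainPartNames(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (MAIN_PART.matcher(entry.getName()).matches()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }
}
```

Styles, media, and content-type entries are skipped because they do not match the pattern.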
Methods inherited from class AbstractOOXMLExtractor: getDocument, getJustFileName, getMetadataExtractor, getXHTML, handleEmbeddedFile, loadLinkedRelationships

public SXWPFWordExtractorDecorator(Metadata metadata, ParseContext context, XWPFEventBasedWordExtractor extractor)

protected void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Description copied from class: AbstractOOXMLExtractor. Populates the XHTMLContentHandler object received as a parameter.
Specified by: buildXHTML in class AbstractOOXMLExtractor
Throws: SAXException, org.apache.xmlbeans.XmlException, IOException

protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
Specified by: getMainDocumentParts in class AbstractOOXMLExtractor

Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.