public class SXWPFWordExtractorDecorator extends AbstractOOXMLExtractor
Compared with the classic docx extractor, this extractor will be better for some use cases and worse for others.
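The constructor takes an XWPFEventBasedWordExtractor, whose name suggests event-based (streaming) processing of the document XML. The following is a minimal sketch of that general idea using only the JDK's SAX parser. It is not Tika's or POI's implementation: the class name `StreamingDocxTextSketch` and the method `extractText` are hypothetical, and the input is assumed to be the already-unzipped `word/document.xml` stream of a docx file.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of Tika or POI.
final class StreamingDocxTextSketch {
    // WordprocessingML main namespace used by docx document parts.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Streams through the XML once, collecting the text of w:t elements
    // and emitting a newline at the end of every w:p (paragraph).
    static String extractText(InputStream documentXml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTextRun;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (W_NS.equals(uri) && "t".equals(localName)) {
                    inTextRun = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTextRun) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (!W_NS.equals(uri)) {
                    return;
                }
                if ("t".equals(localName)) {
                    inTextRun = false;
                } else if ("p".equals(localName)) {
                    text.append('\n');
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(documentXml, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Could not parse document XML", e);
        }
        return text.toString();
    }
}
```

A streaming handler like this never holds the whole document model in memory, which is one plausible reason an event-based extractor suits some use cases better, while anything that needs the full document structure at once may be harder to support.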
Fields inherited from class AbstractOOXMLExtractor: config, EMBEDDED_RELATIONSHIPS, extractor
Constructor and Description |
---|
SXWPFWordExtractorDecorator(Metadata metadata, ParseContext context, XWPFEventBasedWordExtractor extractor) |
Modifier and Type | Method and Description |
---|---|
protected void | buildXHTML(XHTMLContentHandler xhtml): Populates the XHTMLContentHandler object received as parameter. |
protected List<org.apache.poi.openxml4j.opc.PackagePart> | getMainDocumentParts(): Returns all items that might contain embedded objects: main document, headers, footers, comments, etc. |
Methods inherited from class AbstractOOXMLExtractor: getDocument, getJustFileName, getMetadataExtractor, getXHTML, handleEmbeddedFile, loadLinkedRelationships
public SXWPFWordExtractorDecorator(Metadata metadata, ParseContext context, XWPFEventBasedWordExtractor extractor)
protected void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Description copied from class: AbstractOOXMLExtractor
Populates the XHTMLContentHandler object received as parameter.
Specified by: buildXHTML in class AbstractOOXMLExtractor
Throws: SAXException, org.apache.xmlbeans.XmlException, IOException
protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
Specified by: getMainDocumentParts in class AbstractOOXMLExtractor
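To make the idea of "items that might contain embedded objects" concrete, here is a rough sketch that lists the parts of a docx package whose names look like the main document, headers, footers, or comments. It is an assumption-laden illustration using only `java.util.zip`: the real method returns POI `PackagePart` objects, and nothing here claims to reproduce how it locates them. The class `DocxPartNamesSketch`, the method `candidatePartNames`, and the name pattern (including footnotes and endnotes as a guess at "etc.") are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical illustration only; not part of Tika or POI.
final class DocxPartNamesSketch {
    // Assumed naming convention for parts that may hold embedded objects.
    // Footnotes and endnotes are a guess at what "etc." might cover.
    private static final Pattern CANDIDATE = Pattern.compile(
            "word/(document|header\\d*|footer\\d*|comments|footnotes|endnotes)\\.xml");

    // Returns the matching entry names in the order they appear in the zip.
    static List<String> candidatePartNames(InputStream docx) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (CANDIDATE.matcher(entry.getName()).matches()) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Matching on entry names keeps the sketch dependency-free; a package-aware approach would more likely follow the relationships declared inside the package than rely on file names.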
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.