public class OOXMLWordAndPowerPointTextHandler extends DefaultHandler
This class does not generally check for namespaces, and it can be applied to PPTX and DOCX for text extraction.
It can also be used to scrape content from charts. It currently ignores formula (<c:f/>) elements.
This does not work with .xlsx or .vsdx.
TODO: move this into POI?
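The namespace-agnostic approach described above can be illustrated with a much-simplified, hypothetical analogue built only on the JDK's SAX API. This is not Tika's implementation. The class name `SimpleOoxmlTextScraper`, the choice to treat `<*:t>` and `<*:v>` as the only text-bearing elements, and the sample namespace URIs (`urn:example:w`, `urn:example:c`) are all assumptions made for illustration. Elements are matched by local name only, so `w:t` (DOCX) and `a:t` (PPTX) are handled alike, and text under a chart formula element (`<c:f>`) is never emitted.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, much-simplified analogue (not Tika's implementation): elements are matched
// by local name only, so w:t (DOCX) and a:t (PPTX) are treated alike.
class SimpleOoxmlTextScraper extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int textDepth = 0;    // > 0 while inside a text-bearing element (<*:t> or <*:v>)
    private int formulaDepth = 0; // > 0 while inside a chart formula element (<*:f>)

    // Ignore namespaces: prefer localName, else strip any prefix from qName.
    private static String name(String localName, String qName) {
        if (localName != null && !localName.isEmpty()) {
            return localName;
        }
        return qName.substring(qName.indexOf(':') + 1);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (name(localName, qName)) {
            case "t": case "v": textDepth++; break;
            case "f": formulaDepth++; break;
            default: break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (name(localName, qName)) {
            case "t": textDepth--; break;
            case "v": textDepth--; out.append('\n'); break; // one cached chart value per line
            case "f": formulaDepth--; break;
            case "p": out.append('\n'); break;             // end of a w:p or a:p paragraph
            default: break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Formula text such as "Sheet1!$B$1" is never emitted.
        if (textDepth > 0 && formulaDepth == 0) {
            out.append(ch, start, length);
        }
    }

    String getText() {
        return out.toString();
    }

    static String scrape(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SimpleOoxmlTextScraper handler = new SimpleOoxmlTextScraper();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        String docx = "<w:document xmlns:w=\"urn:example:w\"><w:body><w:p>"
            + "<w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space=\"preserve\"> world</w:t></w:r>"
            + "</w:p></w:body></w:document>";
        String chart = "<c:chartSpace xmlns:c=\"urn:example:c\"><c:strRef>"
            + "<c:f>Sheet1!$B$1</c:f><c:strCache><c:pt idx=\"0\"><c:v>Sales</c:v></c:pt>"
            + "</c:strCache></c:strRef></c:chartSpace>";
        System.out.print(scrape(docx));  // prints "Hello world" and a newline
        System.out.print(scrape(chart)); // prints "Sales" and a newline
    }
}
```

Matching on local names keeps one handler usable across DOCX, PPTX and chart parts, at the cost of possibly reacting to an unrelated element that happens to share a local name.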
Modifier and Type | Class and Description |
---|---|
static class | OOXMLWordAndPowerPointTextHandler.EditType |
static interface | OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler |
Constructor and Description |
---|
OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks) |
OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns) |
Modifier and Type | Method and Description |
---|---|
void | characters(char[] ch, int start, int length) |
void | endDocument() |
void | endElement(String uri, String localName, String qName) |
void | endPrefixMapping(String prefix) |
void | ignorableWhitespace(char[] ch, int start, int length) |
void | startDocument() |
void | startElement(String uri, String localName, String qName, Attributes atts) |
void | startPrefixMapping(String prefix, String uri) |
Methods inherited from class org.xml.sax.helpers.DefaultHandler: error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, unparsedEntityDecl, warning
public static final String W_NS
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks)
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)
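Both constructors take a `Map<String,String> hyperlinks`. A plausible reading, which is an assumption and not stated on this page, is that the map goes from a relationship id (the `r:id` attribute on a `w:hyperlink` element) to the target URL declared in the part's `.rels` file. The sketch below builds such a map from `.rels` XML using only the JDK's SAX API; the class `HyperlinkMapBuilder` and its method `fromRels` are hypothetical helpers, not part of Tika or POI.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of Tika): builds a relationship-id -> URL map from a
// part's .rels XML, keeping only hyperlink relationships.
class HyperlinkMapBuilder {
    static final String HYPERLINK_TYPE =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink";

    static Map<String, String> fromRels(String relsXml) throws Exception {
        Map<String, String> links = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Unprefixed attributes have no namespace, hence the "" URI in getValue.
                if ("Relationship".equals(localName)
                        && HYPERLINK_TYPE.equals(atts.getValue("", "Type"))) {
                    links.put(atts.getValue("", "Id"), atts.getValue("", "Target"));
                }
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(relsXml)), handler);
        return links;
    }

    public static void main(String[] args) throws Exception {
        String rels =
            "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
            + "<Relationship Id=\"rId4\" Type=\"" + HYPERLINK_TYPE + "\""
            + " Target=\"https://tika.apache.org/\" TargetMode=\"External\"/>"
            + "<Relationship Id=\"rId5\""
            + " Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/image\""
            + " Target=\"media/image1.png\"/>"
            + "</Relationships>";
        Map<String, String> hyperlinks = fromRels(rels);
        System.out.println(hyperlinks); // {rId4=https://tika.apache.org/}
    }
}
```

Under that assumption, the resulting map would be the `hyperlinks` argument of either constructor. Filtering on the relationship `Type` keeps image and other non-hyperlink relationships out of the map.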
public void startDocument() throws SAXException
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class DefaultHandler
Throws: SAXException
public void endDocument() throws SAXException
Specified by: endDocument in interface ContentHandler
Overrides: endDocument in class DefaultHandler
Throws: SAXException
public void startPrefixMapping(String prefix, String uri) throws SAXException
Specified by: startPrefixMapping in interface ContentHandler
Overrides: startPrefixMapping in class DefaultHandler
Throws: SAXException
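A SAX parser reports `startPrefixMapping`/`endPrefixMapping` events, and fills in `localName`, only when it is namespace-aware. The sketch below shows one way to feed a single part of an OOXML package (for example `word/document.xml`) to any `DefaultHandler` using only the JDK. The class `OoxmlPartParser` and its helpers `parsePart` and `zipOf` are hypothetical and not part of Tika; `zipOf` exists only to build a tiny in-memory package for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of Tika): streams through an OOXML zip package and
// feeds a single part, such as "word/document.xml", to any SAX DefaultHandler.
class OoxmlPartParser {

    static boolean parsePart(InputStream packageStream, String partName, DefaultHandler handler)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace awareness makes the parser report localName and fire
        // startPrefixMapping/endPrefixMapping events.
        factory.setNamespaceAware(true);
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(partName)) {
                    // The ZipInputStream is now positioned at this entry's bytes.
                    factory.newSAXParser().parse(zip, handler);
                    return true;
                }
            }
        }
        return false; // part not found in the package
    }

    // Demo-only helper: builds an in-memory zip containing one UTF-8 part.
    static byte[] zipOf(String partName, String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(partName));
            out.write(xml.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] pkg = zipOf("word/document.xml",
            "<w:document xmlns:w=\"urn:example:w\"><w:body/></w:document>");
        List<String> prefixes = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                prefixes.add(prefix + "=" + uri);
            }
        };
        boolean found = parsePart(new ByteArrayInputStream(pkg), "word/document.xml", recorder);
        System.out.println(found + " " + prefixes); // true [w=urn:example:w]
    }
}
```

Streaming the zip with `ZipInputStream` avoids loading the whole package into memory; a production extractor would more likely open the package through POI's OPC classes, which this sketch deliberately does not attempt to reproduce.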
public void endPrefixMapping(String prefix) throws SAXException
Specified by: endPrefixMapping in interface ContentHandler
Overrides: endPrefixMapping in class DefaultHandler
Throws: SAXException
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Specified by: startElement in interface ContentHandler
Overrides: startElement in class DefaultHandler
Throws: SAXException
public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Overrides: endElement in class DefaultHandler
Throws: SAXException
public void characters(char[] ch, int start, int length) throws SAXException
Specified by: characters in interface ContentHandler
Overrides: characters in class DefaultHandler
Throws: SAXException
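Under the SAX contract, a parser may split one run of character data across several `characters` calls, and only the slice `ch[start .. start + length)` is valid in each call. A handler that extracts text therefore usually buffers until the enclosing element ends. The sketch below, with the hypothetical class name `BufferingTextHandler`, simulates such a split delivery directly; it is an illustration of the SAX contract, not of this class's internals, and it does not handle mixed content.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers character data because SAX may split one text node
// across several characters() calls.
class BufferingTextHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // fresh buffer per element (mixed content is not handled here)
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) is valid; the parser may reuse ch.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buffer.length() > 0) {
            texts.add(buffer.toString());
            buffer.setLength(0);
        }
    }

    List<String> getTexts() {
        return texts;
    }

    public static void main(String[] args) {
        BufferingTextHandler h = new BufferingTextHandler();
        char[] chunk = "..Hel..lo world..".toCharArray();
        h.startElement("", "t", "w:t", null);
        h.characters(chunk, 2, 3); // delivers "Hel"
        h.characters(chunk, 7, 8); // delivers "lo world"
        h.endElement("", "t", "w:t");
        System.out.println(h.getTexts()); // [Hello world]
    }
}
```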
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
Specified by: ignorableWhitespace in interface ContentHandler
Overrides: ignorableWhitespace in class DefaultHandler
Throws: SAXException
Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.