public class OOXMLWordAndPowerPointTextHandler extends DefaultHandler
This class generally does not check namespaces, so it can be applied to both PPTX and DOCX files for text extraction.
It can also be used to scrape content from charts. It currently ignores formula (<c:f/>) elements.
It does not work with .xlsx or .vsdx files.
TODO: move this into POI?
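
The namespace-agnostic approach described above can be illustrated with a minimal, self-contained SAX sketch. This is not the Tika implementation: the class name `LocalNameTextSketch`, the `extract` helper, and the placeholder namespace URIs are hypothetical, and it assumes that run text lives in elements whose local name is `t` and that paragraphs use the local name `p`. It only shows how matching on local names lets one handler read both `w:t` and `a:t` text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of Tika or POI.
class LocalNameTextSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean inText = false; // true while inside an element whose local name is "t"

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Compare the local name only and ignore the namespace URI,
        // so w:t and a:t are handled by the same branch.
        if ("t".equals(localName)) {
            inText = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("t".equals(localName)) {
            inText = false;
        } else if ("p".equals(localName)) {
            out.append('\n'); // assumed: one output line per paragraph
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text node in several chunks, so append each chunk.
        if (inText) {
            out.append(ch, start, length);
        }
    }

    static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        LocalNameTextSketch handler = new LocalNameTextSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The namespace URIs below are placeholders, not the real OOXML URIs.
        String docx = "<w:p xmlns:w=\"urn:example:w\"><w:r><w:t>Hello</w:t></w:r></w:p>";
        String pptx = "<a:p xmlns:a=\"urn:example:a\"><a:r><a:t>World</a:t></a:r></a:p>";
        System.out.print(extract(docx) + extract(pptx));
    }
}
```

Because only text inside `t` elements is collected in this sketch, the content of any other element is skipped without a dedicated rule. The real handler reports content through an XWPFBodyContentsHandler rather than a string buffer.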
| Modifier and Type | Class and Description |
|---|---|
| static class | OOXMLWordAndPowerPointTextHandler.EditType |
| static interface | OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler |
| Constructor and Description |
|---|
| OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks) |
| OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns) |
| Modifier and Type | Method and Description |
|---|---|
| void | characters(char[] ch, int start, int length) |
| void | endDocument() |
| void | endElement(String uri, String localName, String qName) |
| void | endPrefixMapping(String prefix) |
| void | ignorableWhitespace(char[] ch, int start, int length) |
| void | startDocument() |
| void | startElement(String uri, String localName, String qName, Attributes atts) |
| void | startPrefixMapping(String prefix, String uri) |
Methods inherited from class DefaultHandler: error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, unparsedEntityDecl, warning

public static final String W_NS
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks)
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)
public void startDocument() throws SAXException
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class DefaultHandler
Throws: SAXException

public void endDocument() throws SAXException
Specified by: endDocument in interface ContentHandler
Overrides: endDocument in class DefaultHandler
Throws: SAXException

public void startPrefixMapping(String prefix, String uri) throws SAXException
Specified by: startPrefixMapping in interface ContentHandler
Overrides: startPrefixMapping in class DefaultHandler
Throws: SAXException

public void endPrefixMapping(String prefix) throws SAXException
Specified by: endPrefixMapping in interface ContentHandler
Overrides: endPrefixMapping in class DefaultHandler
Throws: SAXException

public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Specified by: startElement in interface ContentHandler
Overrides: startElement in class DefaultHandler
Throws: SAXException

public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Overrides: endElement in class DefaultHandler
Throws: SAXException

public void characters(char[] ch, int start, int length) throws SAXException
Specified by: characters in interface ContentHandler
Overrides: characters in class DefaultHandler
Throws: SAXException

public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
Specified by: ignorableWhitespace in interface ContentHandler
Overrides: ignorableWhitespace in class DefaultHandler
Throws: SAXException

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.