Class OOXMLWordAndPowerPointTextHandler
java.lang.Object
  org.xml.sax.helpers.DefaultHandler
    org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class OOXMLWordAndPowerPointTextHandler extends DefaultHandler
This class is intended to handle anything that might contain IBodyElements: the main document, headers, footers, notes, slides, etc. It does not generally check namespaces, and it can be applied to both PPTX and DOCX for text extraction.
It can also be used to scrape content from charts; it currently ignores formula (<c:f/>) elements.
It does not work with .xlsx or .vsdx.
TODO: move this into POI?
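The description says the handler matches elements without checking namespaces and ignores chart formulas. A minimal JDK-only sketch of that scraping idea, assuming nothing about Tika's internals: the hypothetical ChartTextScraper uses a namespace-unaware parser, matches on the part of the qualified name after the colon, and drops any text inside an f element.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical sketch (not Tika's code): scrapes character data from an OOXML part,
 * matching elements by local name only and skipping formula (f) content.
 */
public class ChartTextScraper extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final StringBuilder pending = new StringBuilder();
    private int formulaDepth = 0; // > 0 while inside an <*:f> element

    // Drop any "prefix:" so c:f, x:f and f are all treated alike (no namespace check).
    private static String localPart(String qName) {
        int colon = qName.indexOf(':');
        return colon < 0 ? qName : qName.substring(colon + 1);
    }

    // Move buffered text to the output, separating text nodes with one space.
    private void flush() {
        String text = pending.toString().trim();
        pending.setLength(0);
        if (text.isEmpty()) return;
        if (out.length() > 0) out.append(' ');
        out.append(text);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
        if (localPart(qName).equals("f")) formulaDepth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
        if (localPart(qName).equals("f")) formulaDepth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text inside a formula element is discarded; everything else is buffered.
        if (formulaDepth == 0) pending.append(ch, start, length);
    }

    public static String scrape(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance(); // namespace-unaware by default
        ChartTextScraper handler = new ChartTextScraper();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }
}
```

For a series whose title and value each carry a c:f reference plus a cached c:v value, only the cached values ("Sales", "42") survive.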
Nested Class Summary
Modifier and Type   Class
static class        OOXMLWordAndPowerPointTextHandler.EditType
static interface    OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
Constructor Summary
OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks)
OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)
Method Summary
Modifier and Type   Method
void                characters(char[] ch, int start, int length)
void                endDocument()
void                endElement(String uri, String localName, String qName)
void                endPrefixMapping(String prefix)
void                ignorableWhitespace(char[] ch, int start, int length)
void                startDocument()
void                startElement(String uri, String localName, String qName, Attributes atts)
void                startPrefixMapping(String prefix, String uri)
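All of these are standard SAX ContentHandler callbacks, so the parser, not the caller, decides when they run. The hypothetical CallbackTrace below (plain JDK, no Tika) logs each callback to show the order for a tiny WordprocessingML-style fragment; the prefix-mapping events appear only because the factory is made namespace-aware.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Records the order in which a SAX parser invokes the ContentHandler callbacks. */
public class CallbackTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override public void startPrefixMapping(String prefix, String uri) {
        events.add("startPrefixMapping " + prefix);
    }

    @Override public void endPrefixMapping(String prefix) {
        events.add("endPrefixMapping " + prefix);
    }

    // localName is guaranteed to be populated when the parser is namespace-aware.
    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + localName);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + localName);
    }

    @Override public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    public static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for prefix-mapping events
        CallbackTrace handler = new CallbackTrace();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }
}
```

For `<w:p xmlns:w="urn:w"><w:r><w:t>Hi</w:t></w:r></w:p>`, the trace opens with startDocument and startPrefixMapping w, nests startElement and endElement for p, r and t around characters Hi, and closes with endPrefixMapping w and endDocument.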
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, unparsedEntityDecl, warning
Field Detail
W_NS
public static final String W_NS
See Also: Constant Field Values
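The field detail does not show the value of W_NS. Assuming it holds the transitional WordprocessingML namespace URI, http://schemas.openxmlformats.org/wordprocessingml/2006/main (confirm on the Constant Field Values page), the sketch below shows the namespace-aware alternative to prefix matching: the parser resolves prefixes, so the check compares the uri argument, and any prefix bound to that namespace matches. WordParagraphCounter and ASSUMED_W_NS are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Counts WordprocessingML paragraphs by namespace URI rather than by prefix. */
public class WordParagraphCounter extends DefaultHandler {
    // Assumption: stands in for W_NS; verify the real constant before relying on it.
    static final String ASSUMED_W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private int paragraphs = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // With a namespace-aware parser, uri is the resolved namespace, not the prefix,
        // so <w:p> and <x:p> match if both prefixes are bound to the same URI.
        if (ASSUMED_W_NS.equals(uri) && "p".equals(localName)) paragraphs++;
    }

    public static int count(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        WordParagraphCounter handler = new WordParagraphCounter();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.paragraphs;
    }
}
```

A DrawingML a:p (as used in PPTX) shares the local name p but lives in a different namespace, so it is not counted here; a handler that skips namespace checks would treat both alike.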
Constructor Detail
OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks)
OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)
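The constructors take a body-contents callback and a hyperlinks map. The sketch below illustrates that kind of design with a hypothetical, much smaller BodyCallbacks interface standing in for XWPFBodyContentsHandler (whose real method set is not listed on this page). It assumes the map is keyed by relationship id (the r:id attribute of a hyperlink element) with target URLs as values, and it does not model includeTextBox or concatenatePhoneticRuns.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: translates SAX events into higher-level body callbacks. */
public class BodyDispatcher extends DefaultHandler {

    /** Hypothetical stand-in for XWPFBodyContentsHandler; not its real method set. */
    public interface BodyCallbacks {
        void startParagraph();
        void run(String text);
        void hyperlinkStart(String url);
        void hyperlinkEnd();
        void endParagraph();
    }

    private final BodyCallbacks callbacks;
    private final Map<String, String> hyperlinks; // assumed: relationship id -> target URL
    private final StringBuilder runText = new StringBuilder();
    private boolean inText = false;

    public BodyDispatcher(BodyCallbacks callbacks, Map<String, String> hyperlinks) {
        this.callbacks = callbacks;
        this.hyperlinks = hyperlinks;
    }

    private static String localPart(String qName) {
        int colon = qName.indexOf(':');
        return colon < 0 ? qName : qName.substring(colon + 1);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (localPart(qName)) {
            case "p": callbacks.startParagraph(); break;
            // Assumes the relationships namespace uses the conventional "r" prefix.
            case "hyperlink": callbacks.hyperlinkStart(hyperlinks.get(atts.getValue("r:id"))); break;
            case "t": inText = true; runText.setLength(0); break;
            default: break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (localPart(qName)) {
            case "t": inText = false; callbacks.run(runText.toString()); break;
            case "hyperlink": callbacks.hyperlinkEnd(); break;
            case "p": callbacks.endParagraph(); break;
            default: break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inText) runText.append(ch, start, length);
    }

    /** Usage helper: parses xml and returns the callbacks as readable strings. */
    public static List<String> record(String xml, Map<String, String> hyperlinks) throws Exception {
        List<String> log = new ArrayList<>();
        BodyCallbacks recorder = new BodyCallbacks() {
            @Override public void startParagraph() { log.add("startParagraph"); }
            @Override public void run(String text) { log.add("run[" + text + "]"); }
            @Override public void hyperlinkStart(String url) { log.add("hyperlinkStart " + url); }
            @Override public void hyperlinkEnd() { log.add("hyperlinkEnd"); }
            @Override public void endParagraph() { log.add("endParagraph"); }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance(); // namespace-unaware: qNames keep prefixes
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new BodyDispatcher(recorder, hyperlinks));
        return log;
    }
}
```

Separating XML walking from what to do with a paragraph or link lets one handler feed different outputs (XHTML, plain text, a test log) by swapping the callback object.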
Method Detail
startDocument
public void startDocument() throws SAXException
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class DefaultHandler
Throws: SAXException
endDocument
public void endDocument() throws SAXException
Specified by: endDocument in interface ContentHandler
Overrides: endDocument in class DefaultHandler
Throws: SAXException
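startDocument and endDocument bracket every parse, which makes them natural places to reset and publish per-document state when one handler instance is reused. This is a generic SAX sketch; PerDocumentCounter is a hypothetical name, and this page does not say whether Tika's handler is meant to be reused across documents.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Resets per-document state in startDocument and publishes it in endDocument. */
public class PerDocumentCounter extends DefaultHandler {
    private int elements;                                    // state for the current parse only
    private final List<Integer> results = new ArrayList<>(); // one entry per completed parse

    @Override public void startDocument() { elements = 0; }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    @Override public void endDocument() { results.add(elements); }

    public List<Integer> results() { return results; }

    public static List<Integer> countEach(String... documents) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        PerDocumentCounter handler = new PerDocumentCounter(); // one handler reused for every parse
        for (String xml : documents) {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        }
        return handler.results();
    }
}
```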
startPrefixMapping
public void startPrefixMapping(String prefix, String uri) throws SAXException
Specified by: startPrefixMapping in interface ContentHandler
Overrides: startPrefixMapping in class DefaultHandler
Throws: SAXException
endPrefixMapping
public void endPrefixMapping(String prefix) throws SAXException
Specified by: endPrefixMapping in interface ContentHandler
Overrides: endPrefixMapping in class DefaultHandler
Throws: SAXException
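startPrefixMapping and endPrefixMapping are delivered only when the parser is namespace-aware; a namespace-unaware parser reports xmlns declarations as ordinary attributes instead. The hypothetical PrefixMappingRecorder shows both modes. SAX does not guarantee the relative order of endPrefixMapping calls, so callers should not depend on it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Records the prefix mappings a SAX parser reports. */
public class PrefixMappingRecorder extends DefaultHandler {
    private final Map<String, String> declared = new LinkedHashMap<>(); // prefix -> namespace URI
    private final List<String> ended = new ArrayList<>();               // prefixes going out of scope

    @Override public void startPrefixMapping(String prefix, String uri) { declared.put(prefix, uri); }
    @Override public void endPrefixMapping(String prefix) { ended.add(prefix); }

    public Map<String, String> declared() { return declared; }
    public List<String> ended() { return ended; }

    public static PrefixMappingRecorder parse(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // false: no prefix-mapping events at all
        PrefixMappingRecorder handler = new PrefixMappingRecorder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```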
startElement
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Specified by: startElement in interface ContentHandler
Overrides: startElement in class DefaultHandler
Throws: SAXException
endElement
public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Overrides: endElement in class DefaultHandler
Throws: SAXException
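startElement and endElement are where document structure turns into plain text. A sketch of one common mapping, assumed for illustration rather than taken from Tika: the end of a paragraph (p) emits a newline, while tab and br emit a tab or newline only inside a run (r), because a tab element under w:tabs in paragraph properties defines a tab stop rather than content. PlainTextFlattener is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Flattens WordprocessingML-style markup into plain text with tabs and line breaks. */
public class PlainTextFlattener extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean inRun = false;  // inside <*:r>
    private boolean inText = false; // inside <*:t>

    private static String localPart(String qName) {
        int colon = qName.indexOf(':');
        return colon < 0 ? qName : qName.substring(colon + 1);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (localPart(qName)) {
            case "r": inRun = true; break;
            case "t": inText = true; break;
            // Outside a run, <w:tab> is a tab-stop definition (under <w:tabs>), not text.
            case "tab": if (inRun) out.append('\t'); break;
            case "br": if (inRun) out.append('\n'); break;
            default: break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (localPart(qName)) {
            case "r": inRun = false; break;
            case "t": inText = false; break;
            case "p": out.append('\n'); break; // paragraph boundary
            default: break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inText) out.append(ch, start, length);
    }

    public static String flatten(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance(); // namespace-unaware
        PlainTextFlattener handler = new PlainTextFlattener();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }
}
```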
characters
public void characters(char[] ch, int start, int length) throws SAXException
Specified by: characters in interface ContentHandler
Overrides: characters in class DefaultHandler
Throws: SAXException
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
Specified by: ignorableWhitespace in interface ContentHandler
Overrides: ignorableWhitespace in class DefaultHandler
Throws: SAXException
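characters may deliver a single text node in several chunks, so handlers usually buffer text until endElement. ignorableWhitespace is reported only when the parser knows an element's content model from a DTD; OOXML parts normally carry no DTD, so whitespace between elements arrives through characters as well. The hypothetical WhitespaceProbe below shows this with the JDK's default parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Buffers text from <*:t> and counts how whitespace is reported. */
public class WhitespaceProbe extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inText = false;
    private int characterCalls = 0;  // every chunk delivered via characters()
    private int ignorableCalls = 0;  // chunks delivered via ignorableWhitespace()

    private static String localPart(String qName) {
        int colon = qName.indexOf(':');
        return colon < 0 ? qName : qName.substring(colon + 1);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (localPart(qName).equals("t")) inText = true;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (localPart(qName).equals("t")) inText = false;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        characterCalls++;
        // Append rather than assign: one text node may arrive in several calls.
        if (inText) text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableCalls++;
    }

    public String text() { return text.toString(); }
    public int characterCalls() { return characterCalls; }
    public int ignorableCalls() { return ignorableCalls; }

    public static WhitespaceProbe probe(String xml) throws Exception {
        WhitespaceProbe handler = new WhitespaceProbe();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```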