Class OOXMLWordAndPowerPointTextHandler
- java.lang.Object
-
- org.xml.sax.helpers.DefaultHandler
-
- org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
-
- All Implemented Interfaces:
ContentHandler
,DTDHandler
,EntityResolver
,ErrorHandler
public class OOXMLWordAndPowerPointTextHandler extends DefaultHandler
This class is intended to handle anything that might contain IBodyElements: main document, headers, footers, notes, slides, etc.This class does not generally check for namespaces, and it can be applied to PPTX and DOCX for text extraction.
TODO: move this into POI?
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
OOXMLWordAndPowerPointTextHandler.EditType
static interface
OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
-
Constructor Summary
Constructors Constructor Description OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks)
OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
characters(char[] ch, int start, int length)
void
endDocument()
void
endElement(String uri, String localName, String qName)
void
endPrefixMapping(String prefix)
void
ignorableWhitespace(char[] ch, int start, int length)
void
startDocument()
void
startElement(String uri, String localName, String qName, Attributes atts)
void
startPrefixMapping(String prefix, String uri)
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, unparsedEntityDecl, warning
-
-
-
-
Field Detail
-
W_NS
public static final String W_NS
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks)
-
OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, Map<String,String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)
-
-
Method Detail
-
startDocument
public void startDocument() throws SAXException
- Specified by:
startDocument
in interfaceContentHandler
- Overrides:
startDocument
in classDefaultHandler
- Throws:
SAXException
-
endDocument
public void endDocument() throws SAXException
- Specified by:
endDocument
in interfaceContentHandler
- Overrides:
endDocument
in classDefaultHandler
- Throws:
SAXException
-
startPrefixMapping
public void startPrefixMapping(String prefix, String uri) throws SAXException
- Specified by:
startPrefixMapping
in interfaceContentHandler
- Overrides:
startPrefixMapping
in classDefaultHandler
- Throws:
SAXException
-
endPrefixMapping
public void endPrefixMapping(String prefix) throws SAXException
- Specified by:
endPrefixMapping
in interfaceContentHandler
- Overrides:
endPrefixMapping
in classDefaultHandler
- Throws:
SAXException
-
startElement
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
- Specified by:
startElement
in interfaceContentHandler
- Overrides:
startElement
in classDefaultHandler
- Throws:
SAXException
-
endElement
public void endElement(String uri, String localName, String qName) throws SAXException
- Specified by:
endElement
in interfaceContentHandler
- Overrides:
endElement
in classDefaultHandler
- Throws:
SAXException
-
characters
public void characters(char[] ch, int start, int length) throws SAXException
- Specified by:
characters
in interfaceContentHandler
- Overrides:
characters
in classDefaultHandler
- Throws:
SAXException
-
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
- Specified by:
ignorableWhitespace
in interfaceContentHandler
- Overrides:
ignorableWhitespace
in classDefaultHandler
- Throws:
SAXException
-
-