org.apache.tika.parser.microsoft.ooxml
Class XWPFWordExtractorDecorator

java.lang.Object
  extended by org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
      extended by org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator
All Implemented Interfaces:
OOXMLExtractor

public class XWPFWordExtractorDecorator
extends AbstractOOXMLExtractor


Field Summary
 
Fields inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
extractor
 
Constructor Summary
XWPFWordExtractorDecorator(ParseContext context, org.apache.poi.xwpf.extractor.XWPFWordExtractor extractor)
           
 
Method Summary
protected  void buildXHTML(XHTMLContentHandler xhtml)
          Populates the XHTMLContentHandler object received as a parameter.
protected  java.util.List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
          Returns the main document parts; a Word document has exactly one.
 
Methods inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
getDocument, getMetadataExtractor, getXHTML, handleEmbedded
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XWPFWordExtractorDecorator

public XWPFWordExtractorDecorator(ParseContext context,
                                  org.apache.poi.xwpf.extractor.XWPFWordExtractor extractor)

Method Detail

buildXHTML

protected void buildXHTML(XHTMLContentHandler xhtml)
                   throws org.xml.sax.SAXException,
                          org.apache.xmlbeans.XmlException,
                          java.io.IOException
Description copied from class: AbstractOOXMLExtractor
Populates the XHTMLContentHandler object received as a parameter.

Specified by:
buildXHTML in class AbstractOOXMLExtractor
Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException
See Also:
XWPFWordExtractor.getText()
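
The buildXHTML contract described above has a template-method shape: the subclass writes its content into a handler that it is handed. The following is a minimal, self-contained sketch of that shape only. DemoHandler, DemoExtractor, and DemoWordExtractor are hypothetical stand-ins, not Tika or POI classes, and the one-element-per-paragraph output is an assumption made for illustration, not a description of what this class actually emits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for XHTMLContentHandler: records output as a string.
final class DemoHandler {
    private final StringBuilder out = new StringBuilder();

    // Appends <name>text</name> to the recorded output.
    void element(String name, String text) {
        out.append('<').append(name).append('>')
           .append(text)
           .append("</").append(name).append('>');
    }

    String result() {
        return out.toString();
    }
}

// Hypothetical stand-in for the abstract base extractor.
abstract class DemoExtractor {
    // Subclasses write their content into the handler they are given.
    protected abstract void buildXHTML(DemoHandler xhtml);

    // Template method: creates the handler, then delegates to the subclass.
    final String getXHTML() {
        DemoHandler xhtml = new DemoHandler();
        buildXHTML(xhtml);
        return xhtml.result();
    }
}

// Hypothetical Word-style subclass: emits one <p> element per paragraph.
final class DemoWordExtractor extends DemoExtractor {
    private final List<String> paragraphs;

    DemoWordExtractor(List<String> paragraphs) {
        this.paragraphs = new ArrayList<>(paragraphs);
    }

    @Override
    protected void buildXHTML(DemoHandler xhtml) {
        for (String paragraph : paragraphs) {
            xhtml.element("p", paragraph);
        }
    }
}
```

With these stand-ins, new DemoWordExtractor(Arrays.asList("Hello", "World")).getXHTML() returns <p>Hello</p><p>World</p>. Keeping buildXHTML abstract lets the base class own handler setup while each format decides only what to write.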

getMainDocumentParts

protected java.util.List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
Returns the main document parts. A Word document has exactly one main part.

Specified by:
getMainDocumentParts in class AbstractOOXMLExtractor
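
Because a Word document has a single main part, an implementation of getMainDocumentParts can return a one-element list. The sketch below illustrates that idea with hypothetical stand-ins: DemoPart, DemoPartSource, and DemoWordPartSource are not Tika or POI classes, and the part name used in the usage note is only an illustrative value.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an OPC package part, identified by its name.
final class DemoPart {
    private final String name;

    DemoPart(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical stand-in for the abstract base extractor.
abstract class DemoPartSource {
    // Each format reports which parts make up its main document.
    protected abstract List<DemoPart> getMainDocumentParts();

    // Example consumer: reports how many main parts the subclass exposes.
    final int countMainParts() {
        return getMainDocumentParts().size();
    }
}

// Hypothetical Word-style subclass: exactly one main part.
final class DemoWordPartSource extends DemoPartSource {
    private final DemoPart mainPart;

    DemoWordPartSource(DemoPart mainPart) {
        this.mainPart = mainPart;
    }

    @Override
    protected List<DemoPart> getMainDocumentParts() {
        // Immutable one-element list holding the single main part.
        return Collections.singletonList(mainPart);
    }
}
```

For example, new DemoWordPartSource(new DemoPart("/word/document.xml")).countMainParts() returns 1. Returning a list even in the single-part case keeps the method signature uniform for formats that may expose several main parts.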


Copyright © 2007-2011 The Apache Software Foundation. All Rights Reserved.