java.lang.Object
  org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
public abstract class AbstractOOXMLExtractor
extends Object
implements OOXMLExtractor
Base class for all Tika OOXML extractors.
Tika extractors decorate POI extractors so that the parsed content of documents is returned as a sequence of XHTML SAX events. Subclasses must implement buildXHTML(XHTMLContentHandler), which populates the XHTMLContentHandler object received as a parameter.
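The division of labour described above is a template-method design: the base class fixes the overall document envelope and the subclass supplies only the body. The following is a minimal, self-contained sketch of that pattern using only JDK SAX classes. It is not Tika's implementation: `SketchExtractor`, `SketchParagraphExtractor` and `EventRecorder` are hypothetical names, a plain `String` stands in for the wrapped POI extractor, and raw `ContentHandler` calls stand in for `XHTMLContentHandler` (so no XHTML namespace or `head` element is emitted).

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified stand-in for AbstractOOXMLExtractor: a String holds
// the text a POI extractor would supply; plain SAX calls replace XHTMLContentHandler.
abstract class SketchExtractor {
    protected final String extractedText;

    SketchExtractor(String extractedText) {
        this.extractedText = extractedText;
    }

    // Template method: the base class emits the document envelope and
    // delegates the body content to the subclass via buildXHTML.
    public final void getXHTML(ContentHandler handler) throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "html", "html", noAttributes);
        handler.startElement("", "body", "body", noAttributes);
        buildXHTML(handler);
        handler.endElement("", "body", "body");
        handler.endElement("", "html", "html");
        handler.endDocument();
    }

    // Subclass hook, analogous to buildXHTML(XHTMLContentHandler).
    protected abstract void buildXHTML(ContentHandler xhtml) throws SAXException;
}

// Hypothetical subclass that emits the extracted text as a single paragraph.
class SketchParagraphExtractor extends SketchExtractor {
    SketchParagraphExtractor(String text) {
        super(text);
    }

    @Override
    protected void buildXHTML(ContentHandler xhtml) throws SAXException {
        char[] chars = extractedText.toCharArray();
        xhtml.startElement("", "p", "p", new AttributesImpl());
        xhtml.characters(chars, 0, chars.length);
        xhtml.endElement("", "p", "p");
    }
}

// Records element boundaries and text so the event order can be inspected.
class EventRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("<" + qName + ">");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("</" + qName + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add(new String(ch, start, length));
    }

    // Runs the extractor against a fresh recorder and returns the recorded events.
    static List<String> record(SketchExtractor extractor) {
        EventRecorder recorder = new EventRecorder();
        try {
            extractor.getXHTML(recorder);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return recorder.events;
    }
}
```

Making getXHTML final in this sketch keeps the envelope identical for every format, so each subclass decides only what appears inside body.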
Field Summary | |
---|---|
protected org.apache.poi.POIXMLTextExtractor | extractor |
Constructor Summary |
---|
AbstractOOXMLExtractor(ParseContext context, org.apache.poi.POIXMLTextExtractor extractor) |
Method Summary | | |
---|---|---|
protected abstract void | buildXHTML(XHTMLContentHandler xhtml) | Populates the XHTMLContentHandler object received as a parameter. |
org.apache.poi.POIXMLDocument | getDocument() | Returns the opened document. |
protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> | getMainDocumentParts() | Returns a list of the main parts of the document, used when searching for embedded resources. |
MetadataExtractor | getMetadataExtractor() | POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI. |
void | getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) | Parses the document into a sequence of XHTML SAX events sent to the given content handler. |
protected void | handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, ContentHandler handler) | Handles an embedded file in the document. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected org.apache.poi.POIXMLTextExtractor extractor
Constructor Detail |
---|
public AbstractOOXMLExtractor(ParseContext context, org.apache.poi.POIXMLTextExtractor extractor)
Method Detail |
---|
public org.apache.poi.POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
Specified by:
getDocument in interface OOXMLExtractor
See Also:
OOXMLExtractor.getDocument()
public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
Specified by:
getMetadataExtractor in interface OOXMLExtractor
See Also:
OOXMLExtractor.getMetadataExtractor()
public void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Specified by:
getXHTML in interface OOXMLExtractor
Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
TikaException
See Also:
OOXMLExtractor.getXHTML(ContentHandler, Metadata, ParseContext)
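getXHTML pushes the parsed document to whatever ContentHandler the caller supplies; in Tika itself, ready-made handlers such as BodyContentHandler usually fill this role. The sketch below shows what a minimal caller-side handler might look like using only JDK SAX classes. `TextCollectingHandler` and its `collect` helper are hypothetical, and a literal XHTML string run through the JDK SAX parser stands in for the events getXHTML would emit.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical caller-side handler: collects the character data carried by
// the SAX events and ends each <p> element with a newline.
class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The default parser is not namespace-aware, so qName holds the element name.
        if ("p".equals(qName)) {
            text.append('\n');
        }
    }

    String getText() {
        return text.toString();
    }

    // Feeds an XHTML string through the JDK SAX parser, standing in for
    // the event stream that getXHTML would send to this handler.
    static String collect(String xhtml) {
        TextCollectingHandler handler = new TextCollectingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.getText();
    }
}
```

Because the handler only reacts to events, the same class would work unchanged whether the events come from a parser, as here, or from an extractor writing them directly.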
protected void handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, ContentHandler handler) throws SAXException, IOException
Handles an embedded file in the document.
Throws:
SAXException
IOException
protected abstract void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Populates the XHTMLContentHandler object received as a parameter.
Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
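protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts() throws TikaException
Returns a list of the main parts of the document, used when searching for embedded resources.
Throws:
TikaException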
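getMainDocumentParts() is described as supplying the main parts that are searched for embedded resources, and handleEmbeddedFile(PackagePart, ContentHandler) as handling each embedded file. The sketch below illustrates, under stated assumptions, how a base class might combine the two hooks. It does not reproduce Tika's code: `SketchPart`, `SketchEmbeddingExtractor`, `SketchWorkbookExtractor` and `handleEmbeddedParts` are hypothetical; a part is reduced to a name plus the names of the parts it embeds, and the part names are only illustrative of OOXML package layout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a package part: a part name plus the names of
// the parts it references as embedded objects.
final class SketchPart {
    final String name;
    final List<String> embeddedTargets;

    SketchPart(String name, List<String> embeddedTargets) {
        this.name = name;
        this.embeddedTargets = embeddedTargets;
    }
}

abstract class SketchEmbeddingExtractor {
    // Names of the embedded parts handled so far, kept for inspection.
    final List<String> handled = new ArrayList<>();

    // Subclass hook, analogous to getMainDocumentParts().
    protected abstract List<SketchPart> getMainDocumentParts();

    // Analogous to handleEmbeddedFile(part, handler); this sketch only records the name.
    protected void handleEmbeddedFile(String partName) {
        handled.add(partName);
    }

    // Searches every main part and passes each embedded target to handleEmbeddedFile.
    final void handleEmbeddedParts() {
        for (SketchPart part : getMainDocumentParts()) {
            for (String target : part.embeddedTargets) {
                handleEmbeddedFile(target);
            }
        }
    }
}

// Hypothetical spreadsheet-style subclass: two worksheets, one with an embedded object.
class SketchWorkbookExtractor extends SketchEmbeddingExtractor {
    @Override
    protected List<SketchPart> getMainDocumentParts() {
        return List.of(
                new SketchPart("/xl/worksheets/sheet1.xml",
                        List.of("/xl/embeddings/oleObject1.bin")),
                new SketchPart("/xl/worksheets/sheet2.xml", List.of()));
    }
}
```

Keeping the traversal in the base class means each format-specific subclass only has to say which parts count as "main"; the search and the per-file handling stay shared.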