org.apache.tika.parser.microsoft
Class ExcelExtractor

java.lang.Object
  extended by org.apache.tika.parser.microsoft.ExcelExtractor

public class ExcelExtractor
extends java.lang.Object

Excel parser implementation which uses POI's Event API to handle the contents of a Workbook.

The Event API uses a much smaller memory footprint than HSSFWorkbook when processing Excel files, but at the cost of more complexity.

With the Event API, a listener is registered for specific record types. As the stream is processed, those records are created, fired off to the listener, and then discarded.
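
The record flow described above can be sketched with plain Java. This is a simplified model, not POI's API: the types `Rec` and `Listener`, the method `extract`, and the record type ids are all hypothetical and chosen only for illustration. It shows a listener that receives only the record types it was registered for, with each record dropped as soon as the callback returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of event-style record processing.
// All names and type ids here are hypothetical; none of them are POI classes.
public class EventApiSketch {

    // Made-up record type ids, for illustration only.
    public static final int LABEL = 1;
    public static final int NUMBER = 2;
    public static final int STYLE = 3;

    /** One record read from the stream: a type id plus its text payload. */
    static final class Rec {
        final int sid;
        final String text;

        Rec(int sid, String text) {
            this.sid = sid;
            this.text = text;
        }
    }

    /** Callback invoked once for every record of a registered type. */
    interface Listener {
        void processRecord(Rec record);
    }

    /**
     * Walks the "stream" (two parallel arrays) once and returns the text of
     * every record whose type id is in {@code wanted}.
     */
    public static List<String> extract(int[] sids, String[] texts, Set<Integer> wanted) {
        final List<String> out = new ArrayList<>();
        Listener listener = record -> out.add(record.text);

        for (int i = 0; i < sids.length; i++) {
            if (!wanted.contains(sids[i])) {
                continue; // type not registered: no record object is created
            }
            Rec record = new Rec(sids[i], texts[i]); // created on demand
            listener.processRecord(record);          // fired off to the listener
            // No reference to the record is kept, so it can be collected
            // before the next one is read.
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sids = {LABEL, STYLE, NUMBER, STYLE};
        String[] texts = {"Name", "bold", "42", "italic"};
        Set<Integer> wanted = new HashSet<>(Arrays.asList(LABEL, NUMBER));
        System.out.println(extract(sids, texts, wanted));
    }
}
```

Because only one record is alive at a time, memory use in this model stays flat regardless of how many records the stream holds. The added complexity is that the listener must keep any cross-record state itself.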

See Also:
HSSFListener, POI Event API How To

Constructor Summary
ExcelExtractor(ParseContext context)
           
 
Method Summary
protected  void copy(org.apache.poi.poifs.filesystem.DirectoryEntry sourceDir, org.apache.poi.poifs.filesystem.DirectoryEntry destDir)
           
protected  void handleEmbeddedResource(TikaInputStream resource, java.lang.String filename, java.lang.String mediaType, XHTMLContentHandler xhtml, boolean outputHtml)
           
protected  void handleEmbededOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, XHTMLContentHandler xhtml)
          Handles an office document that's embedded at the POIFS level.
 boolean isListenForAllRecords()
          Returns true if this parser is configured to listen for all records instead of just the specified few.
protected  void parse(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, XHTMLContentHandler xhtml, java.util.Locale locale)
          Extracts text from an Excel Workbook, writing the extracted content to the specified XHTMLContentHandler.
 void setListenForAllRecords(boolean listenForAllRecords)
          Specifies whether this parser should listen for all records or just for the specified few.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExcelExtractor

public ExcelExtractor(ParseContext context)
Method Detail

isListenForAllRecords

public boolean isListenForAllRecords()
Returns true if this parser is configured to listen for all records instead of just the specified few.


setListenForAllRecords

public void setListenForAllRecords(boolean listenForAllRecords)
Specifies whether this parser should listen for all records or just for the specified few.

Note: Under normal operation this setting should be false (the default), but you can experiment with this setting for testing and debugging purposes.

Parameters:
listenForAllRecords - true if the HSSFListener should be registered to listen for all records or false if the listener should be configured to only receive specified records.
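
The effect of this flag can be modeled with a small stand-alone class. This is a hypothetical sketch, not the Tika implementation: the class `ListenModeSketch`, the method `registeredTypes`, and the record type ids are assumptions made for illustration. Only the getter and setter names and the default of false come from the documentation above.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the listenForAllRecords switch; not Tika's code.
public class ListenModeSketch {

    // Made-up record type ids: every type the stream may contain,
    // and the few that carry content the parser wants.
    private static final int[] ALL_TYPES = {1, 2, 3, 4, 5};
    private static final int[] SPECIFIED_TYPES = {1, 2};

    // Documented default: listen only for the specified record types.
    private boolean listenForAllRecords = false;

    public boolean isListenForAllRecords() {
        return listenForAllRecords;
    }

    public void setListenForAllRecords(boolean listenForAllRecords) {
        this.listenForAllRecords = listenForAllRecords;
    }

    /** Returns the type ids the listener would be registered for. */
    public Set<Integer> registeredTypes() {
        int[] source = listenForAllRecords ? ALL_TYPES : SPECIFIED_TYPES;
        Set<Integer> types = new TreeSet<>();
        for (int type : source) {
            types.add(type);
        }
        return types;
    }

    public static void main(String[] args) {
        ListenModeSketch sketch = new ListenModeSketch();
        System.out.println("default:   " + sketch.registeredTypes());

        sketch.setListenForAllRecords(true); // debugging mode
        System.out.println("debugging: " + sketch.registeredTypes());
    }
}
```

In this model, turning the flag on only widens the set of record types delivered to the listener, which is why it suits debugging rather than normal operation.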

parse

protected void parse(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem,
                     XHTMLContentHandler xhtml,
                     java.util.Locale locale)
              throws java.io.IOException,
                     org.xml.sax.SAXException,
                     TikaException
Extracts text from an Excel Workbook, writing the extracted content to the specified XHTMLContentHandler.

Parameters:
filesystem - POI file system
Throws:
java.io.IOException - if an error occurs processing the workbook or writing the extracted content
org.xml.sax.SAXException
TikaException

handleEmbeddedResource

protected void handleEmbeddedResource(TikaInputStream resource,
                                      java.lang.String filename,
                                      java.lang.String mediaType,
                                      XHTMLContentHandler xhtml,
                                      boolean outputHtml)
                               throws java.io.IOException,
                                      org.xml.sax.SAXException,
                                      TikaException
Throws:
java.io.IOException
org.xml.sax.SAXException
TikaException

handleEmbededOfficeDoc

protected void handleEmbededOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir,
                                      XHTMLContentHandler xhtml)
                               throws java.io.IOException,
                                      org.xml.sax.SAXException,
                                      TikaException
Handles an office document that's embedded at the POIFS level.

Throws:
java.io.IOException
org.xml.sax.SAXException
TikaException

copy

protected void copy(org.apache.poi.poifs.filesystem.DirectoryEntry sourceDir,
                    org.apache.poi.poifs.filesystem.DirectoryEntry destDir)
             throws java.io.IOException
Throws:
java.io.IOException


Copyright © 2007-2011 The Apache Software Foundation. All Rights Reserved.