java.lang.Object org.apache.tika.parser.microsoft.ExcelExtractor
public class ExcelExtractor
Excel parser implementation which uses POI's Event API to handle the contents of a Workbook.
The Event API uses a much smaller memory footprint than HSSFWorkbook when processing Excel files, but at the cost of more complexity.
With the Event API a listener is registered for specific record types and those records are created, fired off to the listener and then discarded as the stream is being processed.
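The listener flow described above can be illustrated with a small, self-contained model. This is only a sketch of the general pattern, not POI's or Tika's implementation: `EventModelSketch`, `RecordType`, `Record`, `RecordListener`, and `processEvents` are hypothetical names invented for the illustration, and the `listenForAllRecords` flag merely mirrors the idea behind the setter documented further down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical miniature of an event-style reader; none of these types are POI classes.
class EventModelSketch {

    // Stand-in for the record types found in a workbook stream.
    enum RecordType { BOF, FONT, LABEL, NUMBER, EOF }

    // Stand-in for a parsed record.
    static final class Record {
        final RecordType type;
        final String value;

        Record(RecordType type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // Stand-in for the listener callback.
    interface RecordListener {
        void processRecord(Record record);
    }

    // Walks the stream once. A Record is created only for a registered type,
    // handed to the listener, and then dropped, so the whole workbook is
    // never held in memory at the same time.
    static void processEvents(Iterator<String[]> rawStream,
                              Set<RecordType> registeredTypes,
                              RecordListener listener) {
        while (rawStream.hasNext()) {
            String[] raw = rawStream.next();
            RecordType type = RecordType.valueOf(raw[0]);
            if (!registeredTypes.contains(type)) {
                continue; // not registered: skipped without building a Record
            }
            Record record = new Record(type, raw[1]);
            listener.processRecord(record);
            // record becomes unreachable here unless the listener kept it
        }
    }

    // Runs the model over a fixed five-record stream and returns what the listener saw.
    static List<String> run(boolean listenForAllRecords) {
        List<String[]> stream = Arrays.asList(
                new String[] {"BOF", ""},
                new String[] {"FONT", "Arial"},
                new String[] {"LABEL", "hello"},
                new String[] {"NUMBER", "42"},
                new String[] {"EOF", ""});

        // Either every record type, or only "the specified few".
        Set<RecordType> registered = listenForAllRecords
                ? EnumSet.allOf(RecordType.class)
                : EnumSet.of(RecordType.LABEL, RecordType.NUMBER);

        List<String> seen = new ArrayList<>();
        processEvents(stream.iterator(), registered,
                record -> seen.add(record.type + ":" + record.value));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

With only the two cell-bearing types registered, the listener in this model sees `[LABEL:hello, NUMBER:42]`; with every type registered it sees all five records. The trade-off noted above shows up even at this scale: memory use stays flat, but the listener has to reassemble any structure it needs from a sequence of individual callbacks.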
See Also: HSSFListener, POI Event API How To

Constructor Summary | |
---|---|
ExcelExtractor(ParseContext context)
|
Method Summary | |
---|---|
protected void | copy(org.apache.poi.poifs.filesystem.DirectoryEntry sourceDir, org.apache.poi.poifs.filesystem.DirectoryEntry destDir) |
protected void | handleEmbeddedResource(TikaInputStream resource, java.lang.String filename, java.lang.String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) |
protected void | handleEmbededOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, XHTMLContentHandler xhtml): Handle an office document that's embedded at the POIFS level. |
boolean | isListenForAllRecords(): Returns true if this parser is configured to listen for all records instead of just the specified few. |
protected void | parse(org.apache.poi.poifs.filesystem.NPOIFSFileSystem filesystem, XHTMLContentHandler xhtml, java.util.Locale locale): Extracts text from an Excel Workbook, writing the extracted content to the specified Appendable. |
void | setListenForAllRecords(boolean listenForAllRecords): Specifies whether this parser should listen for all records or just for the specified few. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ExcelExtractor(ParseContext context)
Method Detail |
---|
public boolean isListenForAllRecords()
Returns: true if this parser is configured to listen for all records instead of just the specified few.
public void setListenForAllRecords(boolean listenForAllRecords)
Specifies whether this parser should listen for all records or just for the specified few.
Note: Under normal operation this setting should be false (the default), but you can experiment with this setting for testing and debugging purposes.
Parameters: listenForAllRecords - true if the HSSFListener should be registered to listen for all records, or false if the listener should be configured to only receive specified records.
protected void parse(org.apache.poi.poifs.filesystem.NPOIFSFileSystem filesystem, XHTMLContentHandler xhtml, java.util.Locale locale) throws java.io.IOException, org.xml.sax.SAXException, TikaException
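Extracts text from an Excel Workbook, writing the extracted content to the specified Appendable.
Parameters: filesystem - POI file system
Throws:
java.io.IOException - if an error occurs processing the workbook or writing the extracted content
org.xml.sax.SAXException
TikaException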
protected void handleEmbeddedResource(TikaInputStream resource, java.lang.String filename, java.lang.String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) throws java.io.IOException, org.xml.sax.SAXException, TikaException
Throws:
java.io.IOException
org.xml.sax.SAXException
TikaException
protected void handleEmbededOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, XHTMLContentHandler xhtml) throws java.io.IOException, org.xml.sax.SAXException, TikaException
Handle an office document that's embedded at the POIFS level.
Throws:
java.io.IOException
org.xml.sax.SAXException
TikaException
protected void copy(org.apache.poi.poifs.filesystem.DirectoryEntry sourceDir, org.apache.poi.poifs.filesystem.DirectoryEntry destDir) throws java.io.IOException
Throws:
java.io.IOException