public class ExcelExtractor extends Object
Excel parser implementation which uses POI's Event API to handle the contents of a Workbook. The Event API uses a much smaller memory footprint than HSSFWorkbook when processing Excel files, but at the cost of more complexity. With the Event API, a listener is registered for specific record types; those records are created, fired off to the listener, and then discarded as the stream is being processed.
See Also: HSSFListener, POI Event API How To
Modifier and Type | Field and Description |
---|---|
protected ParseContext | context |
protected OfficeParserConfig | officeParserConfig |
protected Metadata | parentMetadata |
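
The event model from the class description (a listener registered for specific record types, with each record fired off and then discarded) can be illustrated with a small self-contained sketch. This is a simplified model, not POI's or Tika's actual code: `Rec`, `RecordListener`, `Dispatcher`, the type ids, and `extract` are all hypothetical names invented for illustration. In POI the corresponding roles are played by `HSSFListener` and `HSSFRequest`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of event-style record processing; not POI's actual API. */
public class EventApiSketch {

    // Made-up record type ids, used only in this sketch.
    static final int LABEL = 1, NUMBER = 2, STYLE = 3;

    /** Hypothetical record: a numeric type id plus a text payload. */
    static final class Rec {
        final int typeId;
        final String text;
        Rec(int typeId, String text) { this.typeId = typeId; this.text = text; }
    }

    /** Hypothetical callback, standing in for the role of HSSFListener. */
    interface RecordListener {
        void processRecord(Rec rec);
    }

    /** Routes each record to the listeners registered for its type. */
    static final class Dispatcher {
        private final Map<Integer, List<RecordListener>> byType = new HashMap<>();
        private final List<RecordListener> forAll = new ArrayList<>();

        void addListener(RecordListener listener, int typeId) {
            byType.computeIfAbsent(typeId, k -> new ArrayList<>()).add(listener);
        }

        void addListenerForAllRecords(RecordListener listener) {
            forAll.add(listener);
        }

        /** Fires each record to its listeners; the dispatcher keeps no reference to it. */
        void process(Iterable<Rec> stream) {
            for (Rec rec : stream) {
                for (RecordListener l : forAll) {
                    l.processRecord(rec);
                }
                List<RecordListener> specific =
                        byType.getOrDefault(rec.typeId, Collections.emptyList());
                for (RecordListener l : specific) {
                    l.processRecord(rec);
                }
            }
        }
    }

    /** Runs a fixed sample stream and returns the text the listener received. */
    public static List<String> extract(boolean listenForAllRecords) {
        List<String> seen = new ArrayList<>();
        RecordListener listener = rec -> seen.add(rec.text);
        Dispatcher dispatcher = new Dispatcher();
        if (listenForAllRecords) {
            dispatcher.addListenerForAllRecords(listener);
        } else {
            // Register only for the record types that carry cell content.
            dispatcher.addListener(listener, LABEL);
            dispatcher.addListener(listener, NUMBER);
        }
        dispatcher.process(Arrays.asList(
                new Rec(STYLE, "bold"), new Rec(LABEL, "Total"), new Rec(NUMBER, "42")));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(extract(false)); // [Total, 42]
        System.out.println(extract(true));  // [bold, Total, 42]
    }
}
```

With the flag set to `false`, the `STYLE` record never reaches the listener. This mirrors the trade-off behind `setListenForAllRecords`: listening for everything is mainly useful for testing and debugging.
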
Constructor and Description |
---|
ExcelExtractor(ParseContext context, Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
protected Detector | getDetector() |
protected MimeTypes | getMimeTypes(): Deprecated. |
protected String | getPassword(): Returns the password to be used for this file, or null if no / default password should be used |
protected TikaConfig | getTikaConfig() |
protected void | handleEmbeddedOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, String resourceName, XHTMLContentHandler xhtml, boolean outputHtml): Handle an office document that's embedded at the POIFS level |
protected void | handleEmbeddedOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, XHTMLContentHandler xhtml, boolean outputHtml): Handle an office document that's embedded at the POIFS level |
protected void | handleEmbeddedResource(TikaInputStream resource, Metadata embeddedMetadata, String filename, String relationshipID, org.apache.poi.hpsf.ClassID storageClassID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) |
protected void | handleEmbeddedResource(TikaInputStream resource, String filename, String relationshipID, org.apache.poi.hpsf.ClassID storageClassID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) |
protected void | handleEmbeddedResource(TikaInputStream resource, String filename, String relationshipID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) |
boolean | isListenForAllRecords(): Returns true if this parser is configured to listen for all records instead of just the specified few. |
protected void | parse(org.apache.poi.poifs.filesystem.DirectoryNode root, XHTMLContentHandler xhtml, Locale locale) |
protected void | parse(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, XHTMLContentHandler xhtml, Locale locale): Extracts text from an Excel Workbook, writing the extracted content to the specified XHTMLContentHandler. |
void | setListenForAllRecords(boolean listenForAllRecords): Specifies whether this parser should listen for all records or just for the specified few. |
protected final Metadata parentMetadata
protected final OfficeParserConfig officeParserConfig
protected final ParseContext context
public ExcelExtractor(ParseContext context, Metadata metadata)
public boolean isListenForAllRecords()
Returns true if this parser is configured to listen for all records instead of just the specified few.
public void setListenForAllRecords(boolean listenForAllRecords)
Specifies whether this parser should listen for all records or just for the specified few. Under normal operation this setting should be false (the default), but you can experiment with this setting for testing and debugging purposes.
Parameters:
listenForAllRecords - true if the HSSFListener should be registered to listen for all records, or false if the listener should be configured to only receive specified records.
protected void parse(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, XHTMLContentHandler xhtml, Locale locale) throws IOException, SAXException, TikaException
Extracts text from an Excel Workbook, writing the extracted content to the specified XHTMLContentHandler.
Parameters:
filesystem - POI file system
Throws:
IOException - if an error occurs processing the workbook or writing the extracted content
SAXException
TikaException
protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root, XHTMLContentHandler xhtml, Locale locale) throws IOException, SAXException, TikaException
Throws:
IOException
SAXException
TikaException
protected TikaConfig getTikaConfig()
protected Detector getDetector()
protected MimeTypes getMimeTypes()
Deprecated. Use embeddedDocumentUtil.
protected String getPassword()
protected void handleEmbeddedResource(TikaInputStream resource, String filename, String relationshipID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Throws:
IOException
SAXException
TikaException
protected void handleEmbeddedResource(TikaInputStream resource, String filename, String relationshipID, org.apache.poi.hpsf.ClassID storageClassID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Throws:
IOException
SAXException
TikaException
protected void handleEmbeddedResource(TikaInputStream resource, Metadata embeddedMetadata, String filename, String relationshipID, org.apache.poi.hpsf.ClassID storageClassID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Throws:
IOException
SAXException
TikaException
protected void handleEmbeddedOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Throws:
IOException
SAXException
TikaException
protected void handleEmbeddedOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, String resourceName, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Throws:
IOException
SAXException
TikaException
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.