Package org.apache.tika.parser.microsoft
Class ExcelExtractor
- java.lang.Object
- 
- org.apache.tika.parser.microsoft.ExcelExtractor
 
- 
public class ExcelExtractor extends Object

Excel parser implementation which uses POI's Event API to handle the contents of a Workbook. The Event API has a much smaller memory footprint than HSSFWorkbook when processing Excel files, at the cost of more complexity. With the Event API, a listener is registered for specific record types; those records are created, passed to the listener, and then discarded as the stream is processed.

See Also:
- HSSFListener, POI Event API How To
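The streaming model described above can be sketched with plain JDK types. This is a minimal illustration, not POI's or Tika's actual API: `Kind`, `Rec`, `Listener`, `dispatch`, and `extractText` are all hypothetical names invented for the example. It shows a listener that is registered for a few record kinds and receives each matching record exactly once, while the dispatcher keeps no reference to records it has already passed on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class EventApiSketch {

    /** Hypothetical record kinds standing in for a binary workbook's record types. */
    enum Kind { BOF, SHEET_NAME, STRING_CELL, NUMBER_CELL, FORMAT, EOF }

    /** Hypothetical record: one unit read from the stream. */
    static final class Rec {
        final Kind kind;
        final String payload;

        Rec(Kind kind, String payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    /** Hypothetical callback, playing the role a record listener plays. */
    interface Listener {
        void process(Rec rec);
    }

    /**
     * Walks the stream once and passes only the wanted kinds to the listener.
     * Returns how many records were delivered.
     */
    static int dispatch(Iterator<Rec> stream, Set<Kind> wanted, Listener listener) {
        int delivered = 0;
        while (stream.hasNext()) {
            Rec rec = stream.next();
            if (wanted.contains(rec.kind)) {
                listener.process(rec);
                delivered++;
            }
            // The dispatcher keeps no reference to rec after this iteration.
        }
        return delivered;
    }

    /** Collects text from sheet names and string cells only. */
    static List<String> extractText(List<Rec> records) {
        List<String> text = new ArrayList<>();
        Set<Kind> wanted = EnumSet.of(Kind.SHEET_NAME, Kind.STRING_CELL);
        dispatch(records.iterator(), wanted, rec -> text.add(rec.payload));
        return text;
    }

    /** Sample input; a real parser would read records from a file instead. */
    static List<Rec> sample() {
        return Arrays.asList(
                new Rec(Kind.BOF, ""),
                new Rec(Kind.SHEET_NAME, "Sheet1"),
                new Rec(Kind.FORMAT, "0.00"),
                new Rec(Kind.STRING_CELL, "hello"),
                new Rec(Kind.NUMBER_CELL, "42"),
                new Rec(Kind.EOF, ""));
    }

    public static void main(String[] args) {
        System.out.println(extractText(sample()));
    }
}
```

Because the listener sees records one at a time, any state that spans records (the current sheet, pending formulas, and so on) has to be tracked by the listener itself. That bookkeeping is the extra complexity traded for the smaller memory footprint.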
 
- 
- 
Field Summary
Modifier and Type, Field:
protected ParseContext context
protected OfficeParserConfig officeParserConfig
protected Metadata parentMetadata
 - 
Constructor Summary
ExcelExtractor(ParseContext context, Metadata metadata)
 - 
Method Summary
Modifier and Type, Method, Description:
protected Detector getDetector()
protected MimeTypes getMimeTypes()
    Deprecated.
protected String getPassword()
    Returns the password to be used for this file, or null if no password or the default password should be used.
protected TikaConfig getTikaConfig()
protected void handleEmbeddedOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, String resourceName, XHTMLContentHandler xhtml, boolean outputHtml)
    Handle an office document that's embedded at the POIFS level.
protected void handleEmbeddedOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, XHTMLContentHandler xhtml, boolean outputHtml)
    Handle an office document that's embedded at the POIFS level.
protected void handleEmbeddedResource(TikaInputStream resource, String filename, String relationshipID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml)
protected void handleEmbeddedResource(TikaInputStream resource, String filename, String relationshipID, org.apache.poi.hpsf.ClassID storageClassID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml)
protected void handleEmbeddedResource(TikaInputStream resource, Metadata embeddedMetadata, String filename, String relationshipID, org.apache.poi.hpsf.ClassID storageClassID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml)
boolean isListenForAllRecords()
    Returns true if this parser is configured to listen for all records instead of just the specified few.
protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root, XHTMLContentHandler xhtml, Locale locale)
protected void parse(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, XHTMLContentHandler xhtml, Locale locale)
    Extracts text from an Excel Workbook, writing the extracted content to the specified XHTMLContentHandler.
void setListenForAllRecords(boolean listenForAllRecords)
    Specifies whether this parser should listen for all records or just for the specified few.
 
- 
- 
- 
Field Detail

parentMetadata
protected final Metadata parentMetadata

officeParserConfig
protected final OfficeParserConfig officeParserConfig

context
protected final ParseContext context
 
- 
 - 
Constructor Detail

ExcelExtractor
public ExcelExtractor(ParseContext context, Metadata metadata)
 
- 
 - 
Method Detail

isListenForAllRecords
public boolean isListenForAllRecords()
Returns true if this parser is configured to listen for all records instead of just the specified few.
 - 
setListenForAllRecords
public void setListenForAllRecords(boolean listenForAllRecords)
Specifies whether this parser should listen for all records or just for the specified few. Note: Under normal operation this setting should be false (the default), but you can enable it for testing and debugging purposes.
Parameters:
- listenForAllRecords - true if the HSSFListener should be registered to listen for all records, or false if the listener should be configured to receive only the specified records.
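The effect of this flag can be pictured as a choice between two sets of record kinds at registration time. The sketch below is an assumption-laden illustration, not Tika's implementation: the `Kind` enum, the `SPECIFIED` set, and `registeredKinds()` are hypothetical, and only the getter and setter names mirror the methods documented here.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

public class ListenModeSketch {

    /** Hypothetical record kinds; not POI's actual record classes. */
    enum Kind { BOF, SHEET_NAME, STRING_CELL, NUMBER_CELL, FORMAT, EOF }

    /** Hypothetical "specified few" needed for ordinary text extraction. */
    private static final Set<Kind> SPECIFIED = Collections.unmodifiableSet(
            EnumSet.of(Kind.BOF, Kind.SHEET_NAME, Kind.STRING_CELL, Kind.NUMBER_CELL, Kind.EOF));

    // false is the default, matching the documented normal operation.
    private boolean listenForAllRecords = false;

    public boolean isListenForAllRecords() {
        return listenForAllRecords;
    }

    public void setListenForAllRecords(boolean listenForAllRecords) {
        this.listenForAllRecords = listenForAllRecords;
    }

    /** Returns the kinds a listener would be registered for under the current setting. */
    Set<Kind> registeredKinds() {
        return listenForAllRecords
                ? EnumSet.allOf(Kind.class)
                : EnumSet.copyOf(SPECIFIED);
    }

    public static void main(String[] args) {
        ListenModeSketch sketch = new ListenModeSketch();
        System.out.println(sketch.registeredKinds());
        sketch.setListenForAllRecords(true); // debugging mode: every kind is delivered
        System.out.println(sketch.registeredKinds());
    }
}
```

Listening for everything is useful when investigating which records a problematic file contains, but it delivers records the extractor does not need, which is why the default stays false.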
 
 - 
parse
protected void parse(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, XHTMLContentHandler xhtml, Locale locale) throws IOException, SAXException, TikaException
Extracts text from an Excel Workbook, writing the extracted content to the specified XHTMLContentHandler.
Parameters:
- filesystem - POI file system
Throws:
- IOException - if an error occurs processing the workbook or writing the extracted content
- SAXException
- TikaException
 
 - 
parse
protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root, XHTMLContentHandler xhtml, Locale locale) throws IOException, SAXException, TikaException
Throws:
- IOException
- SAXException
- TikaException
 
 - 
getTikaConfig
protected TikaConfig getTikaConfig()

getDetector
protected Detector getDetector()
 - 
getMimeTypes
protected MimeTypes getMimeTypes()
Deprecated.
Returns:
- mimetypes
 
 - 
getPassword
protected String getPassword()
Returns the password to be used for this file, or null if no password or the default password should be used.
 - 
handleEmbeddedResource
protected void handleEmbeddedResource(TikaInputStream resource, String filename, String relationshipID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Throws:
- IOException
- SAXException
- TikaException
 
 - 
handleEmbeddedResource
protected void handleEmbeddedResource(TikaInputStream resource, String filename, String relationshipID, org.apache.poi.hpsf.ClassID storageClassID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Throws:
- IOException
- SAXException
- TikaException
 
 - 
handleEmbeddedResource
protected void handleEmbeddedResource(TikaInputStream resource, Metadata embeddedMetadata, String filename, String relationshipID, org.apache.poi.hpsf.ClassID storageClassID, String mediaType, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Throws:
- IOException
- SAXException
- TikaException
 
 - 
handleEmbeddedOfficeDoc
protected void handleEmbeddedOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Handle an office document that's embedded at the POIFS level.
Throws:
- IOException
- SAXException
- TikaException
 
 - 
handleEmbeddedOfficeDoc
protected void handleEmbeddedOfficeDoc(org.apache.poi.poifs.filesystem.DirectoryEntry dir, String resourceName, XHTMLContentHandler xhtml, boolean outputHtml) throws IOException, SAXException, TikaException
Handle an office document that's embedded at the POIFS level.
Throws:
- IOException
- SAXException
- TikaException
 
 
- 
 
-