Class XSSFExcelExtractorDecorator
java.lang.Object
    org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
        org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
All Implemented Interfaces:
OOXMLExtractor
Direct Known Subclasses:
XSSFBExcelExtractorDecorator
public class XSSFExcelExtractorDecorator extends AbstractOOXMLExtractor
Nested Class Summary
Nested Classes
protected static class XSSFExcelExtractorDecorator.HeaderFooterFromString
protected static class XSSFExcelExtractorDecorator.SheetTextAsHTML
    Turns formatted sheet events into HTML.
protected static class XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
    Captures information on interesting tags, whilst delegating the main work to the formatting handler.
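The XSSFSheetInterestingPartsCapturer description refers to a common SAX pattern: a handler that watches for particular elements while forwarding events to a delegate that does the actual formatting. The following is a minimal stdlib-only sketch of that pattern, not Tika's class. The class name is hypothetical, and treating the worksheet's sheetProtection element as the "interesting" tag is an assumption made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical delegating handler: notes whether an "interesting" element
// appears, while forwarding document, element and character events to the
// delegate unchanged.
class InterestingPartsCapturerSketch extends DefaultHandler {
    private final DefaultHandler delegate;
    private boolean sawProtection = false;

    InterestingPartsCapturerSketch(DefaultHandler delegate) {
        this.delegate = delegate;
    }

    boolean protectionSeen() {
        return sawProtection;
    }

    @Override
    public void startDocument() throws SAXException {
        delegate.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        delegate.endDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if ("sheetProtection".equals(localName)) {
            sawProtection = true; // assumed example of an "interesting" tag
        }
        delegate.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        delegate.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        delegate.characters(ch, start, length);
    }

    // Parses sheetXml through the capturer; the delegate still receives the events.
    static boolean hasProtection(String sheetXml, DefaultHandler delegate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        InterestingPartsCapturerSketch capturer = new InterestingPartsCapturerSketch(delegate);
        factory.newSAXParser().parse(new InputSource(new StringReader(sheetXml)), capturer);
        return capturer.protectionSeen();
    }
}
```

Because the wrapper only adds bookkeeping, the delegate's output is the same whether or not it is wrapped.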
Field Summary
Fields
protected Map<String,String> drawingHyperlinks
protected org.apache.poi.ss.usermodel.DataFormatter formatter
protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper hfHelper
    Allows access to headers/footers from raw XML strings.
protected Metadata metadata
protected ParseContext parseContext
protected List<org.apache.poi.openxml4j.opc.PackagePart> sheetParts
Fields inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
config, EMBEDDED_RELATIONSHIPS, extractor
Constructor Summary
Constructors
XSSFExcelExtractorDecorator(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
Method Summary
All Methods / Instance Methods / Concrete Methods
protected void addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart)
protected void buildXHTML(XHTMLContentHandler xhtml)
    Populates the XHTMLContentHandler object received as a parameter.
protected void configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
protected void extractHeaderFooter(String hf, XHTMLContentHandler xhtml)
protected void extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, XHTMLContentHandler xhtml)
protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
    In Excel files, sheets have objects embedded in them, and sheet drawings contain the images.
void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context)
    Parses the document into a sequence of XHTML SAX events sent to the given content handler.
protected void processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, XHTMLContentHandler xhtml)
void processSheet(org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler sheetContentsHandler, org.apache.poi.xssf.model.Comments comments, org.apache.poi.xssf.model.StylesTable styles, org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable strings, InputStream sheetInputStream)
Methods inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
getDocument, getJustFileName, getMetadataExtractor, handleEmbeddedFile, loadLinkedRelationships
Field Detail
hfHelper
protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper hfHelper
Allows access to headers/footers from raw XML strings.
formatter
protected final org.apache.poi.ss.usermodel.DataFormatter formatter
sheetParts
protected final List<org.apache.poi.openxml4j.opc.PackagePart> sheetParts
metadata
protected Metadata metadata
parseContext
protected ParseContext parseContext
Constructor Detail
XSSFExcelExtractorDecorator
public XSSFExcelExtractorDecorator(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
Method Detail
configureExtractor
protected void configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
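The constructor and configureExtractor both take a Locale. This page does not say how it is used; given the DataFormatter formatter field (POI's DataFormatter has a constructor that takes a Locale), a plausible reading is that the locale controls how numeric cell values are rendered as text. The stdlib-only sketch below does not use POI and the class name is hypothetical; it only illustrates why the locale matters, since the same number has different textual forms under different locales.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Stdlib-only illustration; POI's DataFormatter is NOT used here.
// Shows that the textual form of a numeric value depends on the locale.
class LocaleFormattingSketch {

    static String format(double value, Locale locale) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        return nf.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234.5, Locale.US));      // 1,234.5
        System.out.println(format(1234.5, Locale.GERMANY)); // 1.234,5
    }
}
```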
getXHTML
public void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Specified by:
getXHTML in interface OOXMLExtractor
Overrides:
getXHTML in class AbstractOOXMLExtractor
Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
TikaException
See Also:
OOXMLExtractor.getXHTML(ContentHandler, Metadata, ParseContext)
buildXHTML
protected void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Description copied from class: AbstractOOXMLExtractor
Populates the XHTMLContentHandler object received as a parameter.
Specified by:
buildXHTML in class AbstractOOXMLExtractor
Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
See Also:
XSSFExcelExtractor.getText()
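getXHTML and buildXHTML produce their output as a stream of SAX events sent to a content handler, rather than as a finished string. Below is a minimal stdlib-only sketch of emitting one sheet that way using org.xml.sax. The element layout (a div with class "page", the sheet name in an h1, then table rows) and every name in it are assumptions for illustration, not taken from this page; Recorder is a hypothetical consumer that serializes the events back to markup so the result can be checked.

```java
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Stdlib-only sketch of emitting one sheet as XHTML SAX events.
// The element layout (div class="page", h1, table/tr/td) is an assumption.
class SheetEventsSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Starts an element; attrPairs holds name/value pairs.
    static void start(ContentHandler h, String name, String... attrPairs) throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        for (int i = 0; i + 1 < attrPairs.length; i += 2) {
            atts.addAttribute("", attrPairs[i], attrPairs[i], "CDATA", attrPairs[i + 1]);
        }
        h.startElement(XHTML, name, name, atts);
    }

    static void end(ContentHandler h, String name) throws SAXException {
        h.endElement(XHTML, name, name);
    }

    static void text(ContentHandler h, String s) throws SAXException {
        h.characters(s.toCharArray(), 0, s.length());
    }

    static void emitSheet(ContentHandler h, String sheetName, List<List<String>> rows)
            throws SAXException {
        start(h, "div", "class", "page");
        start(h, "h1");
        text(h, sheetName);
        end(h, "h1");
        start(h, "table");
        for (List<String> row : rows) {
            start(h, "tr");
            for (String cell : row) {
                start(h, "td");
                text(h, cell);
                end(h, "td");
            }
            end(h, "tr");
        }
        end(h, "table");
        end(h, "div");
    }

    // Consumer that serializes events back to markup (no escaping; demo only).
    static class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(localName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getLocalName(i))
                   .append("=\"").append(atts.getValue(i)).append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(localName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }
}
```

Working with events rather than strings lets the same producer feed any ContentHandler, for example one that only collects text.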
addDrawingHyperLinks
protected void addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart)
extractHyperLinks
protected void extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, XHTMLContentHandler xhtml) throws SAXException
Throws:
SAXException
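extractHyperLinks writes a sheet's hyperlinks to the XHTML output. In the SpreadsheetML file format, a worksheet lists its hyperlinks as hyperlink elements with a ref attribute (the cell) and an r:id attribute that points at a relationship of the sheet part holding the external URL; internal links use a location attribute instead. The stdlib-only sketch below shows only that file-format lookup and is not necessarily how Tika's method works; relTargets is a hypothetical stand-in for the sheet part's relationships.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SheetHyperlinkSketch {
    static final String REL_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Maps each hyperlinked cell reference to its external target.
    // relTargets stands in for the sheet part's relationships (rId -> URL).
    static Map<String, String> resolve(String sheetXml, Map<String, String> relTargets)
            throws Exception {
        Map<String, String> links = new LinkedHashMap<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(sheetXml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("hyperlink".equals(localName)) {
                    String rId = atts.getValue(REL_NS, "id");
                    // Internal links (location attribute, no r:id) are skipped here.
                    if (rId != null && relTargets.containsKey(rId)) {
                        links.put(atts.getValue("", "ref"), relTargets.get(rId));
                    }
                }
            }
        });
        return links;
    }
}
```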
extractHeaderFooter
protected void extractHeaderFooter(String hf, XHTMLContentHandler xhtml) throws SAXException
Throws:
SAXException
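extractHeaderFooter receives a raw header or footer string, and the hfHelper field (a POI HeaderFooterHelper, which exposes getLeftSection, getCenterSection and getRightSection) gives access to its parts. In such strings, &L, &C and &R start the left, centre and right sections, and && stands for a literal ampersand. The sketch below is a simplified stdlib-only splitter written for illustration, not POI's implementation: it leaves other codes such as &P (page number) and && in the text untouched, and assigning text that precedes any section code to the centre section is an assumption.

```java
// Stdlib-only sketch: splits an Excel header/footer string into its
// left (&L), centre (&C) and right (&R) sections.
class HeaderFooterSplitter {

    /** Returns {left, centre, right}. */
    static String[] split(String hf) {
        StringBuilder[] parts = { new StringBuilder(), new StringBuilder(), new StringBuilder() };
        int current = 1; // assumption: text before any section code counts as centre
        for (int i = 0; i < hf.length(); i++) {
            char c = hf.charAt(i);
            if (c == '&' && i + 1 < hf.length()) {
                char next = hf.charAt(i + 1);
                int section = next == 'L' ? 0 : next == 'C' ? 1 : next == 'R' ? 2 : -1;
                if (section >= 0) {
                    current = section;
                    i++; // skip the section letter
                    continue;
                }
                if (next == '&') {
                    // "&&" is an escaped ampersand, not the start of a code; keep it raw
                    parts[current].append("&&");
                    i++;
                    continue;
                }
            }
            parts[current].append(c); // ordinary text and other codes pass through
        }
        return new String[] { parts[0].toString(), parts[1].toString(), parts[2].toString() };
    }
}
```

For example, "&LLeft&CPage &P&RR&&D" splits into "Left", "Page &P" and "R&&D".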
processShapes
protected void processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, XHTMLContentHandler xhtml) throws SAXException
Throws:
SAXException
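processShapes walks the shapes of a sheet drawing, and addDrawingHyperLinks presumably fills the drawingHyperlinks map with the targets of hyperlinks attached to drawing shapes (an assumption; this page only gives the signatures). Drawing shapes can be nested in groups (POI models these as XSSFShapeGroup), so a traversal has to recurse. The stdlib-only sketch below shows that recursion; Shape, Group and TextShape are hypothetical stand-ins for POI's shape classes, and keying the hyperlink map by relationship id is assumed.

```java
import java.util.List;
import java.util.Map;

// Stdlib-only sketch. Shape, Group and TextShape are hypothetical stand-ins
// for POI's drawing shape classes; hyperlinks maps a relationship id to a
// URL, as a drawingHyperlinks-style map might (assumption).
class ShapeWalker {

    interface Shape {}

    static final class Group implements Shape {
        final List<Shape> children;
        Group(List<Shape> children) { this.children = children; }
    }

    static final class TextShape implements Shape {
        final String text;
        final String hyperlinkRelId; // null when the shape has no hyperlink
        TextShape(String text, String hyperlinkRelId) {
            this.text = text;
            this.hyperlinkRelId = hyperlinkRelId;
        }
    }

    // Depth-first walk: groups are expanded in order; each text shape adds one
    // line, followed by its link target when the relationship id resolves.
    static void walk(List<Shape> shapes, Map<String, String> hyperlinks, List<String> out) {
        for (Shape s : shapes) {
            if (s instanceof Group) {
                walk(((Group) s).children, hyperlinks, out);
            } else if (s instanceof TextShape) {
                TextShape t = (TextShape) s;
                String url = t.hyperlinkRelId == null ? null : hyperlinks.get(t.hyperlinkRelId);
                out.add(url == null ? t.text : t.text + " <" + url + ">");
            }
        }
    }
}
```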
processSheet
public void processSheet(org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler sheetContentsHandler, org.apache.poi.xssf.model.Comments comments, org.apache.poi.xssf.model.StylesTable styles, org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException
Throws:
IOException
SAXException
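In POI's event model, a worksheet's XML is parsed by an XSSFSheetXMLHandler that calls back into a SheetContentsHandler with startRow, cell and endRow events; the nested class SheetTextAsHTML turns such events into HTML. Because the sheet XML can omit empty cells, a converter that wants aligned columns has to work out each cell's column from its reference (for example, C5 is column index 2). The stdlib-only sketch below mirrors those three callbacks (without the comment argument). The class name is hypothetical, and padding skipped columns with empty cells is an assumption about the desired output, not a description of Tika's class.

```java
// Stdlib-only sketch modelled on the startRow/cell/endRow callbacks of POI's
// SheetContentsHandler (the comment argument is omitted).
class RowToHtmlSketch {
    private final StringBuilder html = new StringBuilder();
    private int lastCol = -1;

    // "A" -> 0, "Z" -> 25, "AA" -> 26: column letters are bijective base 26.
    static int columnIndex(String cellRef) {
        int col = 0;
        for (int i = 0; i < cellRef.length() && Character.isLetter(cellRef.charAt(i)); i++) {
            col = col * 26 + (Character.toUpperCase(cellRef.charAt(i)) - 'A' + 1);
        }
        return col - 1;
    }

    void startRow(int rowNum) {
        lastCol = -1;
        html.append("<tr>");
    }

    void cell(String cellRef, String formattedValue) {
        int col = cellRef == null ? lastCol + 1 : columnIndex(cellRef);
        for (int c = lastCol + 1; c < col; c++) {
            html.append("<td></td>"); // empty cell for each skipped column (assumption)
        }
        lastCol = col;
        html.append("<td>").append(escape(formattedValue)).append("</td>");
    }

    void endRow(int rowNum) {
        html.append("</tr>");
    }

    String html() {
        return html.toString();
    }

    // Escapes the characters that are special in HTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```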
getMainDocumentParts
protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts() throws TikaException
In Excel files, sheets have objects embedded in them, and sheet drawings contain the images, so both kinds of part are treated as main document parts.
Specified by:
getMainDocumentParts in class AbstractOOXMLExtractor
Throws:
TikaException
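The description of getMainDocumentParts suggests that, for spreadsheets, the parts worth scanning for embedded content are the sheets plus the drawings attached to them. A stdlib-only sketch of that selection follows; it is not Tika's implementation. Rel is a hypothetical stand-in for an OPC package relationship, the relationship-type URI is the standard drawing relationship used by SpreadsheetML, and skipping external targets is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stdlib-only sketch: collects each sheet part followed by the drawing parts it
// references. Rel is a hypothetical stand-in for an OPC package relationship.
class MainPartsSketch {
    static final String DRAWING_REL =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/drawing";

    static final class Rel {
        final String type;
        final String target;
        final boolean external;

        Rel(String type, String target, boolean external) {
            this.type = type;
            this.target = target;
            this.external = external;
        }
    }

    // sheetRels: sheet part name -> relationships of that sheet, iterated in map order.
    static List<String> mainParts(Map<String, List<Rel>> sheetRels) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, List<Rel>> e : sheetRels.entrySet()) {
            parts.add(e.getKey()); // the sheet itself, where embedded objects are attached
            for (Rel r : e.getValue()) {
                // internal drawing parts hold the images placed on the sheet
                if (DRAWING_REL.equals(r.type) && !r.external) {
                    parts.add(r.target);
                }
            }
        }
        return parts;
    }
}
```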