Class XSSFExcelExtractorDecorator
java.lang.Object
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- All Implemented Interfaces:
OOXMLExtractor
- Direct Known Subclasses:
XSSFBExcelExtractorDecorator
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected static class | |
protected static class | | Turns formatted sheet events into HTML
protected static class | | Captures information on interesting tags, whilst delegating the main work to the formatting handler
-
Field Summary
Fields
Modifier and Type | Field | Description
protected final org.apache.poi.ss.usermodel.DataFormatter | formatter |
protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper | hfHelper | Allows access to headers/footers from raw XML strings
protected Metadata | metadata |
protected ParseContext | parseContext |
protected final List<org.apache.poi.openxml4j.opc.PackagePart> | sheetParts |
Fields inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
config, EMBEDDED_RELATIONSHIPS, extractor
-
Constructor Summary
Constructors
Constructor | Description
XSSFExcelExtractorDecorator(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale) |
-
Method Summary
Modifier and Type | Method | Description
protected void | addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart) |
protected void | buildXHTML(XHTMLContentHandler xhtml) | Populates the XHTMLContentHandler object received as parameter.
protected void | configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale) |
protected void | extractHeaderFooter(String hf, XHTMLContentHandler xhtml) |
protected void | extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, XHTMLContentHandler xhtml) |
protected List<org.apache.poi.openxml4j.opc.PackagePart> | getMainDocumentParts() | In Excel files, sheets have items embedded in them, as well as sheet drawings, which hold the images.
void | getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) | Parses the document into a sequence of XHTML SAX events sent to the given content handler.
protected void | processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, XHTMLContentHandler xhtml) |
void | processSheet(org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler sheetContentsHandler, org.apache.poi.xssf.model.Comments comments, org.apache.poi.xssf.model.StylesTable styles, org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable strings, InputStream sheetInputStream) |
Methods inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
getDocument, getEmbeddedPartMetadataMap, getJustFileName, getMetadataExtractor, handleEmbeddedFile, loadLinkedRelationships
-
Field Details
-
hfHelper
protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper hfHelper
Allows access to headers/footers from raw XML strings
-
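As a rough illustration of what reading header/footer sections out of a raw string involves, the sketch below splits a SpreadsheetML header/footer string on its &L, &C and &R section codes using only the JDK. It is not POI's HeaderFooterHelper implementation: the class HeaderFooterSections, its Section enum, and the choice to treat text before any section code as centred are assumptions made for this example.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Tika or POI.
class HeaderFooterSections {

    enum Section { LEFT, CENTER, RIGHT }

    // Splits a raw header/footer string into its left, centre and right parts.
    static Map<Section, String> split(String raw) {
        Map<Section, StringBuilder> buffers = new EnumMap<>(Section.class);
        for (Section s : Section.values()) {
            buffers.put(s, new StringBuilder());
        }
        // Assumption: text before any section code belongs to the centre.
        Section current = Section.CENTER;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '&' && i + 1 < raw.length()) {
                char code = raw.charAt(i + 1);
                if (code == 'L' || code == 'C' || code == 'R') {
                    // Section code: switch target buffer and skip both characters.
                    current = code == 'L' ? Section.LEFT
                            : code == 'C' ? Section.CENTER : Section.RIGHT;
                    i++;
                    continue;
                }
                if (code == '&') {
                    // Escaped ampersand: keep it verbatim, do not read it as a code.
                    buffers.get(current).append("&&");
                    i++;
                    continue;
                }
            }
            // Any other character (including codes such as &P) is kept as-is.
            buffers.get(current).append(c);
        }
        Map<Section, String> result = new EnumMap<>(Section.class);
        for (Section s : Section.values()) {
            result.put(s, buffers.get(s).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Section, String> parts = split("&LLeft text&CPage &P&RConfidential");
        parts.forEach((section, text) -> System.out.println(section + ": " + text));
    }
}
```

Formatting codes other than the three section markers are deliberately left untouched here; a real extractor would likely strip or interpret them before emitting text.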
formatter
protected final org.apache.poi.ss.usermodel.DataFormatter formatter
-
sheetParts
-
drawingHyperlinks
-
metadata
-
parseContext
-
-
Constructor Details
-
XSSFExcelExtractorDecorator
public XSSFExcelExtractorDecorator(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
-
-
Method Details
-
configureExtractor
protected void configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
-
getXHTML
public void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- Specified by:
getXHTML in interface OOXMLExtractor
- Overrides:
getXHTML in class AbstractOOXMLExtractor
- Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
TikaException
- See Also:
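To make "a sequence of XHTML SAX events" concrete, here is a minimal sketch of a handler that could be passed as the ContentHandler argument. It collects the text of each table cell. The class name CellTextCollector and its collect helper are hypothetical, and the assumption that sheet contents arrive as td elements is made for this example rather than stated on this page. To stay self-contained, the demo drives the handler with the JDK's own SAX parser over a small XHTML string instead of a real workbook.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only; not part of Tika.
class CellTextCollector extends DefaultHandler {

    private final List<String> cells = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <td>

    // Works for both namespace-aware and non-namespace-aware event sources.
    private static String name(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("td".equals(name(localName, qName))) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("td".equals(name(localName, qName)) && current != null) {
            cells.add(current.toString());
            current = null;
        }
    }

    List<String> getCells() {
        return cells;
    }

    // Stand-in event source: parses an XHTML string with the JDK SAX parser.
    static List<String> collect(String xhtml) throws Exception {
        CellTextCollector handler = new CellTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getCells();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><table><tr><td>A1</td><td>B1</td></tr></table></body></html>";
        System.out.println(collect(xhtml));
    }
}
```

Because the handler only depends on org.xml.sax, the same instance could in principle be handed to getXHTML in place of the JDK parser used in the demo.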
-
buildXHTML
protected void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Description copied from class: AbstractOOXMLExtractor
Populates the XHTMLContentHandler object received as parameter.
- Specified by:
buildXHTML in class AbstractOOXMLExtractor
- Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
- See Also:
-
XSSFExcelExtractor.getText()
-
addDrawingHyperLinks
protected void addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart) -
extractHyperLinks
protected void extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, XHTMLContentHandler xhtml) throws SAXException
- Throws:
SAXException
-
processShapes
protected void processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, XHTMLContentHandler xhtml) throws SAXException
- Throws:
SAXException
-
getMainDocumentParts
protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts() throws TikaException
In Excel files, sheets have items embedded in them, as well as sheet drawings, which hold the images.
- Specified by:
getMainDocumentParts in class AbstractOOXMLExtractor
- Throws:
TikaException
-