Class XSSFExcelExtractorDecorator
- java.lang.Object
  - org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
    - org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- All Implemented Interfaces:
OOXMLExtractor
- Direct Known Subclasses:
XSSFBExcelExtractorDecorator
public class XSSFExcelExtractorDecorator extends AbstractOOXMLExtractor
-
Nested Class Summary
Nested Classes
protected static class XSSFExcelExtractorDecorator.HeaderFooterFromString
protected static class XSSFExcelExtractorDecorator.SheetTextAsHTML
Turns formatted sheet events into HTML
protected static class XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
Captures information on interesting tags, whilst delegating the main work to the formatting handler
-
Field Summary
Fields
protected Map<String,String> drawingHyperlinks
protected org.apache.poi.ss.usermodel.DataFormatter formatter
protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper hfHelper
Allows access to headers/footers from raw xml strings
protected Metadata metadata
protected ParseContext parseContext
protected List<org.apache.poi.openxml4j.opc.PackagePart> sheetParts
-
Fields inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
config, EMBEDDED_RELATIONSHIPS, extractor
-
Constructor Summary
Constructors
XSSFExcelExtractorDecorator(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
-
Method Summary
Methods
protected void addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart)
protected void buildXHTML(XHTMLContentHandler xhtml)
Populates the XHTMLContentHandler object received as parameter.
protected void configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
protected void extractHeaderFooter(String hf, XHTMLContentHandler xhtml)
protected void extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, XHTMLContentHandler xhtml)
protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
In Excel files, embedded content lives in the sheets themselves and in the sheet drawings, which hold the images.
void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context)
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
protected void processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, XHTMLContentHandler xhtml)
void processSheet(org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler sheetContentsHandler, org.apache.poi.xssf.model.Comments comments, org.apache.poi.xssf.model.StylesTable styles, org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable strings, InputStream sheetInputStream)
-
Methods inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
getDocument, getEmbeddedPartMetadataMap, getJustFileName, getMetadataExtractor, handleEmbeddedFile, loadLinkedRelationships
-
Field Detail
-
hfHelper
protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper hfHelper
Allows access to headers/footers from raw xml strings
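To illustrate what "raw" header/footer strings look like, the following is a minimal sketch of splitting one into its left, center, and right sections using the `&L`, `&C`, and `&R` section codes of the spreadsheet format. The class `HeaderFooterSketch` and its method are hypothetical and are not the POI `HeaderFooterHelper` implementation. Treating text that precedes any section code as the center section is an assumption of this sketch, and other formatting codes (page numbers, fonts, and so on) are passed through untouched.

```java
class HeaderFooterSketch {
    /**
     * Hypothetical helper: splits a raw header/footer string into
     * {left, center, right}. Only &L, &C, &R and the && escape are handled.
     */
    static String[] splitSections(String raw) {
        StringBuilder[] parts = {
            new StringBuilder(), new StringBuilder(), new StringBuilder()
        };
        int current = 1; // assumption: text before any section code is "center"
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            if (ch == '&' && i + 1 < raw.length()) {
                char code = raw.charAt(i + 1);
                if (code == 'L' || code == 'C' || code == 'R') {
                    // Section code: switch the target section, skip both chars.
                    current = (code == 'L') ? 0 : (code == 'C') ? 1 : 2;
                    i++;
                    continue;
                }
                if (code == '&') {
                    // "&&" is an escaped literal ampersand.
                    parts[current].append('&');
                    i++;
                    continue;
                }
            }
            parts[current].append(ch);
        }
        return new String[] {
            parts[0].toString(), parts[1].toString(), parts[2].toString()
        };
    }

    public static void main(String[] args) {
        String[] s = splitSections("&LConfidential&CQ3 Report&RPage 1");
        System.out.println(String.join(" | ", s));
    }
}
```

In real code the sections would be obtained from `hfHelper` rather than parsed by hand; the sketch only shows the shape of the input.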
-
formatter
protected final org.apache.poi.ss.usermodel.DataFormatter formatter
-
sheetParts
protected final List<org.apache.poi.openxml4j.opc.PackagePart> sheetParts
-
metadata
protected Metadata metadata
-
parseContext
protected ParseContext parseContext
-
Constructor Detail
-
XSSFExcelExtractorDecorator
public XSSFExcelExtractorDecorator(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
-
Method Detail
-
configureExtractor
protected void configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
-
getXHTML
public void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- Specified by:
getXHTML in interface OOXMLExtractor
- Overrides:
getXHTML in class AbstractOOXMLExtractor
- Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
TikaException
- See Also:
OOXMLExtractor.getXHTML(ContentHandler, Metadata, ParseContext)
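The phrase "a sequence of XHTML SAX events sent to the given content handler" can be illustrated with JDK classes alone. The sketch below pushes events for one small sheet into an `org.xml.sax.ContentHandler` and serializes them with a JDK `TransformerHandler`. The class `SaxXhtmlSketch`, its methods, and the element names chosen (`div`, `h1`, `table`, `tr`, `td`) are assumptions made for illustration and are not taken from this class's implementation.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class SaxXhtmlSketch {
    private static final AttributesImpl NO_ATTS = new AttributesImpl();

    // Emits <name>text</name> as three SAX events.
    private static void element(ContentHandler h, String name, String text)
            throws SAXException {
        h.startElement("", name, name, NO_ATTS);
        h.characters(text.toCharArray(), 0, text.length());
        h.endElement("", name, name);
    }

    // Hypothetical stand-in for an extractor: one table row per String[].
    static void emitSheet(ContentHandler h, String sheetName, String[][] rows)
            throws SAXException {
        h.startDocument();
        h.startElement("", "div", "div", NO_ATTS);
        element(h, "h1", sheetName);
        h.startElement("", "table", "table", NO_ATTS);
        for (String[] row : rows) {
            h.startElement("", "tr", "tr", NO_ATTS);
            for (String cell : row) {
                element(h, "td", cell);
            }
            h.endElement("", "tr", "tr");
        }
        h.endElement("", "table", "table");
        h.endElement("", "div", "div");
        h.endDocument();
    }

    // Serializes the event stream to markup so it can be inspected.
    static String toMarkup(String sheetName, String[][] rows) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer()
               .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));
        emitSheet(handler, sheetName, rows);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toMarkup("Sheet1", new String[][] {{"A1", "B1"}}));
    }
}
```

Because the consumer is just a `ContentHandler`, the same event stream could equally feed a text-only handler or a downstream filter; nothing is buffered as a DOM tree.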
-
buildXHTML
protected void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Description copied from class: AbstractOOXMLExtractor
Populates the XHTMLContentHandler object received as parameter.
- Specified by:
buildXHTML in class AbstractOOXMLExtractor
- Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
- See Also:
XSSFExcelExtractor.getText()
-
addDrawingHyperLinks
protected void addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart)
-
extractHyperLinks
protected void extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, XHTMLContentHandler xhtml) throws SAXException
- Throws:
SAXException
-
extractHeaderFooter
protected void extractHeaderFooter(String hf, XHTMLContentHandler xhtml) throws SAXException
- Throws:
SAXException
-
processShapes
protected void processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, XHTMLContentHandler xhtml) throws SAXException
- Throws:
SAXException
-
processSheet
public void processSheet(org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler sheetContentsHandler, org.apache.poi.xssf.model.Comments comments, org.apache.poi.xssf.model.StylesTable styles, org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException
- Throws:
IOException
SAXException
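The signature of processSheet suggests an event-driven pass over the sheet XML in which a contents handler is called back per row and cell. A minimal sketch of that pattern, using only the JDK SAX parser, follows. The `CellSink` interface is hypothetical and only loosely modeled on a row/cell callback; the sketch reads raw `<v>` values and deliberately ignores shared strings, styles, and comments, which the real method receives as parameters.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SheetStreamSketch {
    /** Hypothetical callback interface for rows and cells. */
    interface CellSink {
        void startRow(int rowNum);
        void cell(String cellReference, String rawValue);
        void endRow(int rowNum);
    }

    static final String SAMPLE =
        "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
        + "<sheetData><row r=\"1\">"
        + "<c r=\"A1\"><v>3.5</v></c><c r=\"B1\"><v>42</v></c>"
        + "</row></sheetData></worksheet>";

    // Streams the sheet XML and forwards rows and raw cell values to the sink.
    static void processSheet(CellSink sink, InputStream sheetXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(sheetXml, new DefaultHandler() {
            private int rowNum;
            private String cellRef;
            private StringBuilder value; // non-null only while inside <v>

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if ("row".equals(local)) {
                    // Assumes the row carries its 1-based "r" attribute.
                    rowNum = Integer.parseInt(atts.getValue("r")) - 1;
                    sink.startRow(rowNum);
                } else if ("c".equals(local)) {
                    cellRef = atts.getValue("r");
                } else if ("v".equals(local)) {
                    value = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (value != null) {
                    value.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("v".equals(local)) {
                    sink.cell(cellRef, value.toString());
                    value = null;
                } else if ("row".equals(local)) {
                    sink.endRow(rowNum);
                }
            }
        });
    }

    // Convenience wrapper: collects "ref=value" strings for every cell.
    static List<String> collect(String xml) throws Exception {
        List<String> cells = new ArrayList<>();
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        processSheet(new CellSink() {
            public void startRow(int rowNum) { }
            public void cell(String ref, String raw) { cells.add(ref + "=" + raw); }
            public void endRow(int rowNum) { }
        }, in);
        return cells;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(SAMPLE));
    }
}
```

The streaming design keeps memory use proportional to a single cell rather than the whole sheet, which is the usual motivation for an event-based reader on large workbooks.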
-
getMainDocumentParts
protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts() throws TikaException
In Excel files, embedded content lives in the sheets themselves and in the sheet drawings, which hold the images.
- Specified by:
getMainDocumentParts in class AbstractOOXMLExtractor
- Throws:
TikaException
-