Package org.apache.tika.parser.microsoft.ooxml
package org.apache.tika.parser.microsoft.ooxml
Class summary:
- Base class for all Tika OOXML extractors.
- Records metadata about embedded parts that exist in the XML of the main document.
- Parses OOXML field codes (instrText) to extract URLs from HYPERLINK, INCLUDEPICTURE, INCLUDETEXT, IMPORT, and LINK fields.
- OOXML metadata extractor base class.
- Interface implemented by all Tika OOXML extractors.
- Figures out the correct OOXMLExtractor for the supplied document and returns it.
- Office Open XML (OOXML) parser.
- Handles anything that might contain IBodyElements: main document, headers, footers, notes, slides, etc.
- A wrapper around OPCPackage that calls revert() instead of close().
- WARNING: This class is mutable.
- SAX/streaming pptx extractor.
- An experimental, alternative extractor for docx files.
- SAX-based extractor for Visio OOXML (.vsdx) files.
- Turns formatted sheet events into HTML.
- Captures information on interesting tags, whilst delegating the main work to the formatting handler.
- Callback interface for receiving structured document events from the OOXML SAX dispatcher.
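The field-code entry above says URLs are pulled from the instruction text of HYPERLINK, INCLUDEPICTURE, INCLUDETEXT, IMPORT, and LINK fields. The following is a minimal, hypothetical sketch of that idea using only the JDK; the class name `FieldCodeUrlSketch` and the rule "take the first double-quoted argument after the field keyword" are assumptions for illustration, not Tika's actual parser. It ignores switches such as `\o` and does not unescape doubled backslashes in paths.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Tika's field-code parser.
final class FieldCodeUrlSketch {
    // Field types named in the package description.
    private static final Set<String> URL_FIELDS =
            Set.of("HYPERLINK", "INCLUDEPICTURE", "INCLUDETEXT", "IMPORT", "LINK");

    // Leading field keyword, e.g. "HYPERLINK" in ' HYPERLINK "https://..."'.
    private static final Pattern KEYWORD = Pattern.compile("^\\s*([A-Za-z]+)");
    // A double-quoted argument; assumption: the first one holds the URL/path.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static Optional<String> extractUrl(String instrText) {
        Matcher k = KEYWORD.matcher(instrText);
        if (!k.find()) {
            return Optional.empty();
        }
        String keyword = k.group(1).toUpperCase(Locale.ROOT);
        if (!URL_FIELDS.contains(keyword)) {
            return Optional.empty(); // e.g. PAGE, DATE: no URL to report
        }
        // Search for the first quoted argument after the keyword only.
        Matcher q = QUOTED.matcher(instrText);
        q.region(k.end(), instrText.length());
        return q.find() ? Optional.of(q.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractUrl(" HYPERLINK \"https://example.com\" \\o \"tip\""));
        System.out.println(extractUrl(" PAGE \\* MERGEFORMAT "));
    }
}
```

Restricting the keyword to a fixed set keeps non-link fields such as PAGE from being misreported as URLs.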
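One entry describes a factory that "figures out the correct OOXMLExtractor for the supplied document." A common way to do this is to look at the content type of the package's main document part. The sketch below assumes that approach. `ExtractorSketch` and `ExtractorFactorySketch` are hypothetical stand-ins, not Tika's interface or factory. The four keys are the standard main-part content types for docx, pptx, xlsx, and vsdx. Macro-enabled and template variants are deliberately left out.

```java
import java.util.Map;

// Hypothetical stand-in for an extractor; not Tika's OOXMLExtractor.
interface ExtractorSketch {
    String describe();
}

// Hypothetical dispatch table keyed on the main document part's content type.
final class ExtractorFactorySketch {
    private static final Map<String, ExtractorSketch> BY_MAIN_PART_TYPE = Map.of(
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
            () -> "docx extractor",
            "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml",
            () -> "pptx extractor",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml",
            () -> "xlsx extractor",
            "application/vnd.ms-visio.drawing.main+xml",
            () -> "vsdx extractor");

    static ExtractorSketch forMainPartType(String contentType) {
        ExtractorSketch extractor = BY_MAIN_PART_TYPE.get(contentType);
        if (extractor == null) {
            // Fail loudly instead of guessing at an unknown format.
            throw new IllegalArgumentException("Unsupported main part type: " + contentType);
        }
        return extractor;
    }

    public static void main(String[] args) {
        System.out.println(forMainPartType(
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml")
                .describe());
    }
}
```

A lookup table keeps the format-to-extractor mapping in one place. Supporting a new variant then means adding a single entry.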
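The wrapper entry notes that it calls revert() instead of close(). In Apache POI, closing an OPCPackage may write changes back, while revert() closes without saving. That makes revert() the safer choice for a read-only parser. The sketch below shows the pattern with a hypothetical `RevertiblePackageSketch` interface, so it runs without POI on the classpath. It is not Tika's wrapper class.

```java
import java.io.IOException;

// Hypothetical stand-in for a package that can be closed (possibly saving)
// or reverted (closed without saving).
interface RevertiblePackageSketch {
    void close() throws IOException;
    void revert();
}

// Wrapper whose close() delegates to revert(), so try-with-resources never
// writes anything back to the source document.
final class RevertOnCloseSketch implements AutoCloseable {
    private final RevertiblePackageSketch pkg;

    RevertOnCloseSketch(RevertiblePackageSketch pkg) {
        this.pkg = pkg;
    }

    RevertiblePackageSketch get() {
        return pkg;
    }

    @Override
    public void close() {
        pkg.revert();
    }
}

// Test double that counts which method was invoked.
final class RecordingPackage implements RevertiblePackageSketch {
    int closes;
    int reverts;

    @Override public void close() { closes++; }
    @Override public void revert() { reverts++; }

    public static void main(String[] args) {
        RecordingPackage p = new RecordingPackage();
        try (RevertOnCloseSketch wrapped = new RevertOnCloseSketch(p)) {
            wrapped.get(); // ... read parts here ...
        }
        System.out.println("closes=" + p.closes + " reverts=" + p.reverts);
    }
}
```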
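Several entries describe SAX-based extractors and a callback interface that receives structured events from a SAX dispatcher. The sketch below shows that split using only the JDK's SAX API. A `DefaultHandler` subclass recognizes WordprocessingML paragraphs (`w:p`) and text runs (`w:t`) and forwards them to a callback. The interface `DocumentEventsSketch` and its three methods are hypothetical and are not the methods of Tika's actual callback interface.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical callback interface for structured document events.
interface DocumentEventsSketch {
    void startParagraph();
    void text(String s);
    void endParagraph();
}

// Dispatcher: turns raw SAX events from a WordprocessingML body into callbacks.
final class WmlDispatcherSketch extends DefaultHandler {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private final DocumentEventsSketch events;
    private boolean inText; // true while inside a w:t element

    WmlDispatcherSketch(DocumentEventsSketch events) {
        this.events = events;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!W_NS.equals(uri)) return;
        if ("p".equals(localName)) events.startParagraph();
        else if ("t".equals(localName)) inText = true;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!W_NS.equals(uri)) return;
        if ("p".equals(localName)) events.endParagraph();
        else if ("t".equals(localName)) inText = false;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inText) events.text(new String(ch, start, length));
    }

    // Usage: collect one string per paragraph.
    static List<String> paragraphs(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        DocumentEventsSketch collector = new DocumentEventsSketch() {
            public void startParagraph() { current.setLength(0); }
            public void text(String s) { current.append(s); }
            public void endParagraph() { out.add(current.toString()); }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so uri/localName are populated
        SAXParser parser = factory.newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new WmlDispatcherSketch(collector));
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        System.out.println(paragraphs(xml));
    }
}
```

Keeping the XML matching in the dispatcher and the output logic in the callback lets the same event stream feed different consumers. Examples include plain-text collection or HTML output.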