Package org.apache.tika.parser.microsoft.ooxml
Class descriptions:

Base class for all Tika OOXML extractors.
Records metadata about embedded parts that exist in the XML of the main document.
OOXML metadata extractor.
Interface implemented by all Tika OOXML extractors.
Figures out the correct OOXMLExtractor for the supplied document and returns it.
Office Open XML (OOXML) parser.
Intended to handle anything that might contain IBodyElements: main document, headers, footers, notes, slides, etc.
A wrapper around OPCPackage that calls revert() instead of close().
WARNING: This class is mutable.
SAX/Streaming pptx extractor.
An experimental, alternative extractor for docx files.
Turns formatted sheet events into HTML.
Captures information on interesting tags, whilst delegating the main work to the formatting handler.
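
The wrapper that calls revert() instead of close() can be illustrated with a small, self-contained sketch. It does not use the real Tika or POI classes: FakePackage, RevertingWrapper and RevertOnCloseDemo are hypothetical stand-ins. It assumes the usual POI semantics, where close() on a writable package saves its content and revert() closes it without saving.

```java
import java.io.Closeable;

// Hypothetical demo entry point; not part of Tika or POI.
class RevertOnCloseDemo {
    public static void main(String[] args) {
        FakePackage pkg = new FakePackage();
        // try-with-resources calls RevertingWrapper.close(), which reverts.
        try (RevertingWrapper wrapper = new RevertingWrapper(pkg)) {
            // Read-only extraction work would happen here.
            System.out.println("open during use: " + wrapper.getPackage().open);
        }
        System.out.println("open=" + pkg.open + ", saved=" + pkg.saved);
    }
}

// Hypothetical stand-in for a package handle such as OPCPackage.
// It only records which shutdown path was taken.
class FakePackage {
    boolean open = true;
    boolean saved = false;

    // Assumed semantics: persist pending changes, then release the handle.
    void close() {
        saved = true;
        open = false;
    }

    // Assumed semantics: release the handle and discard pending changes.
    void revert() {
        open = false;
    }
}

// Wrapper whose close() delegates to revert(), so automatic resource
// management never writes anything back to the source document.
class RevertingWrapper implements Closeable {
    private final FakePackage pkg;

    RevertingWrapper(FakePackage pkg) {
        this.pkg = pkg;
    }

    FakePackage getPackage() {
        return pkg;
    }

    @Override
    public void close() {
        pkg.revert();
    }
}
```

The point of the design, under these assumptions, is that an extractor only reads the document, so closing the wrapper should release the package without ever saving to it.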