Class XWPFFeatureExtractor

java.lang.Object
org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFFeatureExtractor

public class XWPFFeatureExtractor extends Object
Extracts features that are useful for forensics, e-discovery, and digital preservation: specifically, the presence of tracked changes, hidden text, comments, and comment authors. Because several of these features can be set on run properties, which can occur in many places in a document, this class scrapes the document XML.
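The scraping approach described above can be illustrated with a minimal, JDK-only sketch. It is not the actual implementation of this class: the class name `FeatureScanSketch`, its fields, and the choice of StAX are assumptions made for illustration, and results are stored in plain fields rather than in a Tika `Metadata` object. The element names (`w:ins`, `w:del`, `w:moveFrom`, `w:moveTo`, `w:vanish`, `w:comment`, `w:commentReference`) come from the WordprocessingML schema; the set of elements treated as tracked changes here is deliberately incomplete.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of Tika or POI.
class FeatureScanSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    boolean hasTrackChanges;
    boolean hasHiddenText;
    boolean hasComments;
    final Set<String> commentAuthors = new LinkedHashSet<>();

    // Streams through one XML part and records which features appear anywhere in it.
    void scan(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTDs and external entities, since the input may be untrusted.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT
                            || !W_NS.equals(reader.getNamespaceURI())) {
                        continue;
                    }
                    switch (reader.getLocalName()) {
                        case "ins":
                        case "del":
                        case "moveFrom":
                        case "moveTo":
                            hasTrackChanges = true;
                            break;
                        case "vanish":
                            // <w:vanish/> hides text unless w:val explicitly turns it off.
                            String val = reader.getAttributeValue(W_NS, "val");
                            boolean off = "false".equals(val) || "0".equals(val)
                                    || "off".equals(val);
                            if (!off) {
                                hasHiddenText = true;
                            }
                            break;
                        case "comment":
                            hasComments = true;
                            String author = reader.getAttributeValue(W_NS, "author");
                            if (author != null) {
                                commentAuthors.add(author);
                            }
                            break;
                        case "commentReference":
                            hasComments = true;
                            break;
                        default:
                            break;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML part", e);
        }
    }

    public static void main(String[] args) {
        String ns = "xmlns:w=\"" + W_NS + "\"";
        FeatureScanSketch sketch = new FeatureScanSketch();
        // A tiny stand-in for word/document.xml: an inserted run with hidden text.
        sketch.scan("<w:document " + ns + "><w:body><w:p><w:ins w:id=\"1\"><w:r>"
                + "<w:rPr><w:vanish/></w:rPr><w:t>secret</w:t></w:r></w:ins></w:p>"
                + "</w:body></w:document>");
        // A tiny stand-in for word/comments.xml.
        sketch.scan("<w:comments " + ns + "><w:comment w:id=\"0\" w:author=\"Alice\"/>"
                + "</w:comments>");
        System.out.println("track changes: " + sketch.hasTrackChanges);
        System.out.println("hidden text: " + sketch.hasHiddenText);
        System.out.println("comment authors: " + sketch.commentAuthors);
    }
}
```

Scanning every start element, instead of walking paragraphs and runs through an object model, is what lets a check like this catch run properties wherever they occur, which is the motivation given in the class description.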
  • Constructor Details

    • XWPFFeatureExtractor

      public XWPFFeatureExtractor()
  • Method Details

    • process

      public void process(org.apache.poi.xwpf.usermodel.XWPFDocument xwpfDocument, Metadata metadata, ParseContext parseContext)
    • process

      public void process(org.apache.poi.openxml4j.opc.PackagePart packagePart, Metadata metadata, ParseContext parseContext)