Class AbstractOfficeParser

    • Constructor Detail

      • AbstractOfficeParser

        public AbstractOfficeParser()
    • Method Detail

      • configure

        public void configure(ParseContext parseContext)
        Checks to see if the user has specified an OfficeParserConfig. If so, no changes are made; if not, one is added to the context.
        Parameters:
        parseContext - the context to check for an existing OfficeParserConfig and, if none is present, to add one to
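        The check-then-add behavior described for configure can be sketched as follows. This is a minimal, self-contained model and not Tika code: ContextSketch is a hypothetical stand-in for ParseContext (a map keyed by class), and OfficeConfigSketch is a hypothetical stand-in for OfficeParserConfig.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for OfficeParserConfig; not the Tika class.
class OfficeConfigSketch {
    boolean includeDeletedContent;
}

// Hypothetical stand-in for ParseContext: at most one object per class key.
class ContextSketch {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key) {
        // Class.cast passes null through, so a missing entry yields null.
        return key.cast(entries.get(key));
    }
}

class ConfigureSketch {
    // Mirrors the documented contract: a user-supplied config is left
    // untouched; otherwise the parser's own config is added to the context.
    static void configure(ContextSketch context, OfficeConfigSketch defaults) {
        if (context.get(OfficeConfigSketch.class) == null) {
            context.set(OfficeConfigSketch.class, defaults);
        }
    }
}
```

        Under this model, a config that the caller places in the context before parsing always takes precedence over values set through the setters on the parser.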
      • setIncludeDeletedContent

        @Field
        public void setIncludeDeletedContent(boolean includeDeletedContent)
      • setIncludeMoveFromContent

        @Field
        public void setIncludeMoveFromContent(boolean includeMoveFromContent)
      • setUseSAXDocxExtractor

        @Field
        public void setUseSAXDocxExtractor(boolean useSAXDocxExtractor)
      • setExtractMacros

        @Field
        public void setExtractMacros(boolean extractMacros)
      • setIncludeShapeBasedContent

        @Field
        public void setIncludeShapeBasedContent(boolean includeShapeBasedContent)
      • isIncludeShapeBasedContent

        public boolean isIncludeShapeBasedContent()
      • setUseSAXPptxExtractor

        @Field
        public void setUseSAXPptxExtractor(boolean useSAXPptxExtractor)
      • isUseSAXPptxExtractor

        public boolean isUseSAXPptxExtractor()
      • setConcatenatePhoneticRuns

        @Field
        public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
      • isConcatenatePhoneticRuns

        public boolean isConcatenatePhoneticRuns()
      • isExtractAllAlternativesFromMSG

        public boolean isExtractAllAlternativesFromMSG()
      • setExtractAllAlternativesFromMSG

        @Field
        public void setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG)
        Some .msg files can contain body content in HTML, RTF, and/or plain text. By default, only the first non-null body is included. To extract all non-null body content, which is likely to be duplicative, set this value to true.
        Parameters:
        extractAllAlternativesFromMSG - whether or not to extract all alternative parts from msg files
        Since:
        1.17
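        The difference between the default and extractAllAlternativesFromMSG = true can be illustrated with a small sketch. MsgBodySketch and selectBodies are hypothetical names, not part of Tika, and the order in which the alternative bodies are checked is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

class MsgBodySketch {
    // Hypothetical helper. 'alternatives' holds the candidate bodies
    // (for example HTML, RTF, plain text) in the order they are checked;
    // a body that is absent from the message is represented by null.
    static List<String> selectBodies(List<String> alternatives, boolean extractAll) {
        List<String> selected = new ArrayList<>();
        for (String body : alternatives) {
            if (body == null) {
                continue; // skip alternatives the message does not carry
            }
            selected.add(body);
            if (!extractAll) {
                break; // default behavior: keep only the first non-null body
            }
        }
        return selected;
    }
}
```

        With the flag set to false the result holds at most one body; with true it holds every non-null alternative, so the same text may appear more than once in different encodings.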
      • setByteArrayMaxOverride

        @Field
        public void setByteArrayMaxOverride(int maxOverride)
        WARNING: this sets a static variable in POI. It allows users to override POI's protection against allocating overly large byte arrays. Use it carefully, and please open issues on POI's Bugzilla to raise the limits for specific records. If the value is <= 0, it is ignored.
        Parameters:
        maxOverride - the override for POI's maximum byte array allocation size; ignored if <= 0
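        The effect of a static override on an allocation guard can be modeled roughly as below. ByteArrayLimitSketch and its methods are hypothetical and only illustrate the documented rule that a value <= 0 is ignored; they are not POI's or Tika's implementation.

```java
class ByteArrayLimitSketch {
    // Static, so the value is shared by every caller in the JVM, which is
    // why the warning above asks for care.
    private static int maxOverride = -1;

    static void setMaxOverride(int value) {
        maxOverride = value;
    }

    // An override <= 0 is ignored and the per-record default limit applies.
    static int effectiveLimit(int defaultLimit) {
        return maxOverride > 0 ? maxOverride : defaultLimit;
    }

    // Refuses to allocate more bytes than the effective limit allows.
    static byte[] allocate(int length, int defaultLimit) {
        int limit = effectiveLimit(defaultLimit);
        if (length > limit) {
            throw new IllegalArgumentException(
                    "Requested " + length + " bytes, limit is " + limit);
        }
        return new byte[length];
    }
}
```

        Because the setting is static in this model, raising it for one document raises it for every parse running in the same process.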
      • getByteArrayMaxOverride

        public int getByteArrayMaxOverride()
      • setDateFormatOverride

        @Field
        public void setDateFormatOverride(String format)
      • getDateFormatOverride

        public String getDateFormatOverride()
      • setIncludeHeadersAndFooters

        @Field
        public void setIncludeHeadersAndFooters(boolean includeHeadersAndFooters)
      • isIncludeHeadersAndFooters

        public boolean isIncludeHeadersAndFooters()