Package org.apache.tika.parser.microsoft
Class AbstractOfficeParser
- java.lang.Object
  - org.apache.tika.parser.AbstractParser
    - org.apache.tika.parser.microsoft.AbstractOfficeParser

All Implemented Interfaces:
- Serializable, Parser

Direct Known Subclasses:
- OfficeParser, OOXMLParser, Word2006MLParser
 
public abstract class AbstractOfficeParser extends AbstractParser

Intermediate layer to set OfficeParserConfig uniformly.

See Also:
- Serialized Form
 
Constructor Summary

Constructor               Description
AbstractOfficeParser()

Method Summary

Modifier and Type    Method and Description
void       configure(ParseContext parseContext)
           Checks to see if the user has specified an OfficeParserConfig.
int        getByteArrayMaxOverride()
String     getDateFormatOverride()
boolean    isConcatenatePhoneticRuns()
boolean    isExtractAllAlternativesFromMSG()
boolean    isExtractMacros()
boolean    isIncludeDeletedContent()
boolean    isIncludeHeadersAndFooters()
boolean    isIncludeMoveFromContent()
boolean    isIncludeShapeBasedContent()
boolean    isUseSAXDocxExtractor()
boolean    isUseSAXPptxExtractor()
void       setByteArrayMaxOverride(int maxOverride)
           WARNING: this sets a static variable in POI.
void       setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
void       setDateFormatOverride(String format)
void       setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG)
           Some .msg files can contain body content in html, rtf and/or text.
void       setExtractMacros(boolean extractMacros)
void       setIncludeDeletedContent(boolean includeDeletedConent)
void       setIncludeHeadersAndFooters(boolean includeHeadersAndFooters)
void       setIncludeMoveFromContent(boolean includeMoveFromContent)
void       setIncludeShapeBasedContent(boolean includeShapeBasedContent)
void       setUseSAXDocxExtractor(boolean useSAXDocxExtractor)
void       setUseSAXPptxExtractor(boolean useSAXPptxExtractor)

Methods inherited from class org.apache.tika.parser.AbstractParser:
- parse

Methods inherited from class java.lang.Object:
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.tika.parser.Parser:
- getSupportedTypes, parse

Method Detail

configure
public void configure(ParseContext parseContext)
Checks to see if the user has specified an OfficeParserConfig. If so, no changes are made; if not, one is added to the context.
Parameters:
- parseContext
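
The check-then-add behavior described for configure can be sketched as follows. This is a minimal illustration under stated assumptions, not Tika's implementation: `Context` and `Config` are hypothetical stand-ins for ParseContext and OfficeParserConfig, and the type-keyed map is only one plausible way such a context could store its entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins: these are NOT Tika's ParseContext or OfficeParserConfig.
final class ConfigureSketch {

    /** Minimal type-keyed context, loosely modeled on a parse context. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast(null) returns null, so a missing entry yields null.
            return key.cast(entries.get(key));
        }
    }

    /** Placeholder configuration object with a single illustrative option. */
    static final class Config {
        boolean extractMacros;
    }

    // The parser's own configuration, used only when the caller supplied none.
    private final Config defaults = new Config();

    /** Leaves a user-supplied Config untouched; otherwise registers the parser's own. */
    void configure(Context context) {
        if (context.get(Config.class) == null) {
            context.set(Config.class, defaults);
        }
    }

    public static void main(String[] args) {
        ConfigureSketch parser = new ConfigureSketch();

        // Case 1: the caller already put a Config in the context.
        Config userConfig = new Config();
        userConfig.extractMacros = true;
        Context withUserConfig = new Context();
        withUserConfig.set(Config.class, userConfig);
        parser.configure(withUserConfig);
        System.out.println("user config kept: "
                + (withUserConfig.get(Config.class) == userConfig));

        // Case 2: the context is empty, so the parser's Config is added.
        Context empty = new Context();
        parser.configure(empty);
        System.out.println("default added: " + (empty.get(Config.class) != null));
    }
}
```

The point of the pattern is precedence: a configuration placed in the context by the caller wins over whatever was set on the parser instance.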
 
isIncludeDeletedContent
public boolean isIncludeDeletedContent()
See Also:
- OfficeParserConfig.isIncludeDeletedContent()

setIncludeDeletedContent
@Field
public void setIncludeDeletedContent(boolean includeDeletedConent)

isIncludeMoveFromContent
public boolean isIncludeMoveFromContent()
See Also:
- OfficeParserConfig.isIncludeMoveFromContent()

setIncludeMoveFromContent
@Field
public void setIncludeMoveFromContent(boolean includeMoveFromContent)

isUseSAXDocxExtractor
public boolean isUseSAXDocxExtractor()
See Also:
- OfficeParserConfig.isUseSAXDocxExtractor()

setUseSAXDocxExtractor
@Field
public void setUseSAXDocxExtractor(boolean useSAXDocxExtractor)

isExtractMacros
public boolean isExtractMacros()
Returns:
- whether or not to extract macros
See Also:
- OfficeParserConfig.isExtractMacros()

setExtractMacros
@Field
public void setExtractMacros(boolean extractMacros)

setIncludeShapeBasedContent
@Field
public void setIncludeShapeBasedContent(boolean includeShapeBasedContent)

isIncludeShapeBasedContent
public boolean isIncludeShapeBasedContent()

setUseSAXPptxExtractor
@Field
public void setUseSAXPptxExtractor(boolean useSAXPptxExtractor)

isUseSAXPptxExtractor
public boolean isUseSAXPptxExtractor()

setConcatenatePhoneticRuns
@Field
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)

isConcatenatePhoneticRuns
public boolean isConcatenatePhoneticRuns()

isExtractAllAlternativesFromMSG
public boolean isExtractAllAlternativesFromMSG()

setExtractAllAlternativesFromMSG
@Field
public void setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG)
Some .msg files can contain body content in html, rtf and/or text. The default behavior is to pick the first non-null value and include only that. If you'd like to extract all non-null body content, which is likely duplicative, set this value to true.
Parameters:
- extractAllAlternativesFromMSG - whether or not to extract all alternative parts from msg files
Since:
- 1.17
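
The difference between the default and the extract-all behavior can be illustrated with a small selection routine. This is a sketch only: `MsgBodySketch` and `selectBodies` are hypothetical names, and the order in which the html, rtf and text bodies are considered is an assumption, not something the documentation above specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the code Tika uses to read .msg files.
final class MsgBodySketch {

    /**
     * Returns the bodies that would be emitted.
     * With extractAll == false, only the first non-null candidate is kept.
     * With extractAll == true, every non-null candidate is kept, in order.
     */
    static List<String> selectBodies(List<String> candidates, boolean extractAll) {
        List<String> selected = new ArrayList<>();
        for (String body : candidates) {
            if (body == null) {
                continue; // this alternative is absent from the message
            }
            selected.add(body);
            if (!extractAll) {
                break; // default: stop after the first non-null body
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed order: html, rtf, text. Here the html body is missing.
        List<String> candidates = Arrays.asList(null, "rtf body", "text body");
        System.out.println(selectBodies(candidates, false));
        System.out.println(selectBodies(candidates, true));
    }
}
```

Because the alternatives usually carry the same message in different encodings, enabling the option tends to produce repeated text downstream, which is the duplication the description warns about.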
 
setByteArrayMaxOverride
@Field
public void setByteArrayMaxOverride(int maxOverride)
WARNING: this sets a static variable in POI. This allows users to override POI's protection against allocating overly large byte arrays. Use carefully, and please open issues on POI's Bugzilla to bump values for specific records. If the value is <= 0, it is ignored.
Parameters:
- maxOverride
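
The warning about the static variable is easier to see in a toy model. The class below is a hypothetical stand-in, not POI's or Tika's code: `ByteArrayLimitSketch`, `setMaxOverride` and `effectiveLimit` are invented names, and the default limit passed in is an arbitrary number chosen for the example.

```java
// Hypothetical model of a JVM-wide override; not POI's actual implementation.
final class ByteArrayLimitSketch {

    // Static: one value shared by every parser instance and thread in the JVM.
    private static int maxOverride = -1;

    /** Values <= 0 are ignored, mirroring the rule stated in the description. */
    static void setMaxOverride(int value) {
        if (value > 0) {
            maxOverride = value;
        }
    }

    /** The limit actually applied: the override if one is set, else the default. */
    static int effectiveLimit(int defaultLimit) {
        return maxOverride > 0 ? maxOverride : defaultLimit;
    }

    public static void main(String[] args) {
        int assumedDefault = 1_000_000; // arbitrary value for illustration

        setMaxOverride(0); // ignored, the default still applies
        System.out.println(effectiveLimit(assumedDefault));

        setMaxOverride(5_000_000); // now applies to every caller in the JVM
        System.out.println(effectiveLimit(assumedDefault));
    }
}
```

Since the state is static in this model, a value set through one parser instance is visible to all others, so an override intended for a single document type affects every parse in the process.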
 
getByteArrayMaxOverride
public int getByteArrayMaxOverride()

getDateFormatOverride
public String getDateFormatOverride()

setIncludeHeadersAndFooters
@Field
public void setIncludeHeadersAndFooters(boolean includeHeadersAndFooters)

isIncludeHeadersAndFooters
public boolean isIncludeHeadersAndFooters()