public class OfficeParserConfig extends Object implements Serializable
Constructor and Description |
---|
OfficeParserConfig() |
Modifier and Type | Method and Description |
---|---|
boolean | getConcatenatePhoneticRuns() |
boolean | getExtractAllAlternativesFromMSG() |
boolean | getExtractMacros() |
boolean | getIncludeDeletedContent() |
boolean | getIncludeHeadersAndFooters() |
boolean | getIncludeMoveFromContent() |
boolean | getIncludeShapeBasedContent() |
boolean | getUseSAXDocxExtractor() |
boolean | getUseSAXPptxExtractor() |
void | setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns): Microsoft Excel files can sometimes contain phonetic (furigana) strings. |
void | setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG): Some .msg files can contain body content in html, rtf and/or text. |
void | setExtractMacros(boolean extractMacros): Sets whether or not MSOffice parsers should extract macros. |
void | setIncludeDeletedContent(boolean includeDeletedContent): Sets whether or not the parser should include deleted content. |
void | setIncludeHeadersAndFooters(boolean includeHeadersAndFooters): Whether or not to include headers and footers. |
void | setIncludeMoveFromContent(boolean includeMoveFromContent): With track changes on, when a section is moved, the content is stored in both the "moveFrom" section and in the "moveTo" section. |
void | setIncludeShapeBasedContent(boolean includeShapeBasedContent): In Excel and Word, there can be text stored within drawing shapes. |
void | setUseSAXDocxExtractor(boolean useSAXDocxExtractor): Use the experimental SAX-based streaming DOCX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used. |
void | setUseSAXPptxExtractor(boolean useSAXPptxExtractor): Use the experimental SAX-based streaming PPTX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used. |
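public void setExtractMacros(boolean extractMacros)
Sets whether or not MSOffice parsers should extract macros. Default: false.
Parameters: extractMacros - whether or not to extract macros
public boolean getExtractMacros()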
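public void setIncludeDeletedContent(boolean includeDeletedContent)
Sets whether or not the parser should include deleted content. This has only been implemented in the streaming docx parser (SXWPFWordExtractorDecorator) so far.
Parameters: includeDeletedContent - whether or not to include deleted content
public boolean getIncludeDeletedContent()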
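public void setIncludeMoveFromContent(boolean includeMoveFromContent)
With track changes on, when a section is moved, the content is stored in both the "moveFrom" section and in the "moveTo" section. Set this to true to include the content at its original ("moveFrom") location as well. Default: false. This has only been implemented in the streaming docx parser (SXWPFWordExtractorDecorator) so far.
Parameters: includeMoveFromContent - whether or not to include "moveFrom" content
public boolean getIncludeMoveFromContent()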
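The effect of the two track-changes flags can be pictured with a toy model. The sketch below is not Tika's implementation: `TrackChangesSketch`, `Kind`, `Run`, `sample()` and `extract(...)` are hypothetical names introduced only for illustration, and the assumption is that a moved section appears once as a "moveFrom" run and once as a "moveTo" run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrackChangesSketch {
    // Hypothetical revision kinds for a tracked-changes document (not Tika types).
    enum Kind { NORMAL, DELETED, MOVE_FROM, MOVE_TO }

    // Hypothetical text run: a piece of text plus its revision kind.
    static final class Run {
        final Kind kind;
        final String text;

        Run(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Joins the text of every run that the two flags let through.
    static String extract(List<Run> runs,
                          boolean includeDeletedContent,
                          boolean includeMoveFromContent) {
        List<String> kept = new ArrayList<>();
        for (Run run : runs) {
            if (run.kind == Kind.DELETED && !includeDeletedContent) {
                continue; // deleted text is skipped unless explicitly requested
            }
            if (run.kind == Kind.MOVE_FROM && !includeMoveFromContent) {
                continue; // original location of moved text is skipped unless requested
            }
            kept.add(run.text);
        }
        return String.join(" ", kept);
    }

    // "Intro" was moved from the start to the end; "Draft" was deleted.
    static List<Run> sample() {
        return Arrays.asList(
                new Run(Kind.MOVE_FROM, "Intro"),
                new Run(Kind.NORMAL, "Body"),
                new Run(Kind.DELETED, "Draft"),
                new Run(Kind.MOVE_TO, "Intro"));
    }

    public static void main(String[] args) {
        List<Run> runs = sample();
        System.out.println(extract(runs, false, false)); // Body Intro
        System.out.println(extract(runs, false, true));  // Intro Body Intro
        System.out.println(extract(runs, true, false));  // Body Draft Intro
    }
}
```

With both flags off, the moved text appears only at its new location; turning on the moveFrom flag duplicates it at the old location, which is why it is likely to be useful mainly when the full revision history matters.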
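public void setIncludeShapeBasedContent(boolean includeShapeBasedContent)
In Excel and Word, there can be text stored within drawing shapes. To skip this text, set this to false. Default: true.
Parameters: includeShapeBasedContent - whether or not to include text stored in drawing shapes
public boolean getIncludeShapeBasedContent()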
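public void setIncludeHeadersAndFooters(boolean includeHeadersAndFooters)
Whether or not to include headers and footers. Default: true.
Parameters: includeHeadersAndFooters - whether or not to include headers and footers
public boolean getIncludeHeadersAndFooters()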
public boolean getUseSAXDocxExtractor()
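public void setUseSAXDocxExtractor(boolean useSAXDocxExtractor)
Use the experimental SAX-based streaming DOCX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used. Default: false (classic DOM parser).
Parameters: useSAXDocxExtractor - whether or not to use the experimental SAX-based DOCX parser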
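public void setUseSAXPptxExtractor(boolean useSAXPptxExtractor)
Use the experimental SAX-based streaming PPTX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used. Default: false (classic DOM parser).
Parameters: useSAXPptxExtractor - whether or not to use the experimental SAX-based PPTX parser
public boolean getUseSAXPptxExtractor()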
public boolean getConcatenatePhoneticRuns()
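public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Microsoft Excel files can sometimes contain phonetic (furigana) strings. This setting controls whether the parser concatenates those phonetic runs with the cell text. This is currently only supported by the xls and xlsx parsers (not the xlsb parser), and the default is true.
Parameters: concatenatePhoneticRuns - whether or not to concatenate phonetic runs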
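As a rough illustration of what concatenating a phonetic run means, the sketch below models a single cell with a base string and a furigana reading. It is a toy model, not the xls/xlsx parser code: `PhoneticRunSketch` and `cellText(...)` are hypothetical names, and the single-space separator between the base text and the reading is an assumption.

```java
public class PhoneticRunSketch {
    // Hypothetical cell model: base text plus an optional phonetic (furigana) reading.
    static String cellText(String base, String phonetic, boolean concatenatePhoneticRuns) {
        if (concatenatePhoneticRuns && phonetic != null && !phonetic.isEmpty()) {
            // Assumption: one space separates the base text from its reading.
            return base + " " + phonetic;
        }
        // Flag off, or no reading present: only the base text is returned.
        return base;
    }

    public static void main(String[] args) {
        String base = "\u6771\u4EAC";                      // "Tokyo" in kanji
        String reading = "\u30C8\u30A6\u30AD\u30E7\u30A6"; // its katakana reading
        System.out.println(cellText(base, reading, true));  // kanji followed by the reading
        System.out.println(cellText(base, reading, false)); // kanji only
    }
}
```

Leaving the flag on makes the reading searchable alongside the base text; switching it off keeps the extracted text closer to what the cell visibly displays.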
public void setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG)
Some .msg files can contain body content in html, rtf and/or text.
Parameters: extractAllAlternativesFromMSG - whether or not to extract all alternative parts
public boolean getExtractAllAlternativesFromMSG()
Copyright © 2007–2018 The Apache Software Foundation. All rights reserved.