Modifier and Type | Method and Description |
---|---|
void | HtmlParser.setExtractScripts(boolean extractScripts): Whether or not to extract contents in script entities. |
void | HtmlEncodingDetector.setMarkLimit(int markLimit): How far into the stream to read for charset detection. |

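
The `markLimit` setters listed here bound how many bytes are read for charset detection. The sketch below illustrates that general idea with plain `java.io` mark/reset; it is not Tika's implementation, and the class `MarkLimitSketch` and the method `peek` are hypothetical names introduced only for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkLimitSketch {

    /**
     * Reads up to markLimit bytes for inspection, then rewinds the stream
     * so a later consumer still sees it from the beginning.
     * Hypothetical helper, not part of any library.
     */
    public static byte[] peek(InputStream in, int markLimit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(markLimit);
        try {
            byte[] buf = new byte[markLimit];
            int total = 0;
            while (total < markLimit) {
                int n = in.read(buf, total, markLimit - total);
                if (n < 0) {
                    break; // end of stream before the limit was reached
                }
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // rewind to the marked position
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<meta charset=\"utf-8\"><p>hello</p>"
                .getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(html));

        // Inspect only the first 16 bytes, as a detector with a small limit would.
        byte[] head = peek(in, 16);
        System.out.println(new String(head, StandardCharsets.US_ASCII));

        // The stream is back at its start, so the next read returns the first byte.
        System.out.println((char) in.read());
    }
}
```

A larger limit lets a detector see declarations that appear later in the document, at the cost of buffering more bytes before parsing starts.
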
Modifier and Type | Method and Description |
---|---|
void | StandardHtmlEncodingDetector.setMarkLimit(int markLimit): How far into the stream to read for charset detection. |

Modifier and Type | Method and Description |
---|---|
void | AbstractOfficeParser.setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns) |
void | AbstractOfficeParser.setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG): Some .msg files can contain body content in html, rtf and/or text. |
void | AbstractOfficeParser.setExtractMacros(boolean extractMacros) |
void | AbstractOfficeParser.setIncludeDeletedContent(boolean includeDeletedConent) |
void | AbstractOfficeParser.setIncludeMoveFromContent(boolean includeMoveFromContent) |
void | AbstractOfficeParser.setIncludeShapeBasedContent(boolean includeShapeBasedContent) |
void | AbstractOfficeParser.setUseSAXDocxExtractor(boolean useSAXDocxExtractor) |
void | AbstractOfficeParser.setUseSAXPptxExtractor(boolean useSAXPptxExtractor) |

Modifier and Type | Method and Description |
---|---|
void | TesseractOCRParser.setApplyRotation(boolean applyRotation) |
void | TesseractOCRParser.setColorspace(String colorspace) |
void | TesseractOCRParser.setDensity(int density) |
void | TesseractOCRParser.setDepth(int depth) |
void | TesseractOCRParser.setEnableImageProcessing(int enableImageProcessing) |
void | TesseractOCRParser.setFilter(String filter) |
void | TesseractOCRParser.setImageMagickPath(String imageMagickPath) |
void | TesseractOCRParser.setLanguage(String language) |
void | TesseractOCRParser.setMinFileSizeToOcr(long minFileSizeToOcr) |
void | TesseractOCRParser.setOutputType(String outputType) |
void | TesseractOCRParser.setPageSegMode(String pageSegMode) |
void | TesseractOCRParser.setPreserveInterwordSpacing(boolean preserveInterwordSpacing) |
void | TesseractOCRParser.setResize(int resize) |
void | TesseractOCRParser.setTessdataPath(String tessdataPath) |
void | TesseractOCRParser.setTesseractPath(String tesseractPath) |
void | TesseractOCRParser.setTimeout(int timeout) |

Modifier and Type | Method and Description |
---|---|
void | PDFParser.setOcrImageType(String imageType) |
void | PDFParser.setOcrStrategy(String ocrStrategyString) |
void | PDFParser.setSortByPosition(boolean v): Deprecated. |

Modifier and Type | Method and Description |
---|---|
void | CompressorParser.setMemoryLimitInKb(int memoryLimitInKb) |

Modifier and Type | Method and Description |
---|---|
void | ObjectRecognitionParser.setRecogniser(String recogniserClass) |

Modifier and Type | Field and Description |
---|---|
protected URI | TensorflowRESTRecogniser.apiBaseUri |
protected double | TensorflowRESTRecogniser.minConfidence |
protected int | TensorflowRESTRecogniser.topN |

Modifier and Type | Method and Description |
---|---|
void | RTFParser.setMemoryLimitInKb(int memoryLimitInKb) |

Modifier and Type | Method and Description |
---|---|
void | UniversalEncodingDetector.setMarkLimit(int markLimit): How far into the stream to read for charset detection. |
void | Icu4jEncodingDetector.setMarkLimit(int markLimit): How far into the stream to read for charset detection. |
void | Icu4jEncodingDetector.setStripMarkup(boolean stripMarkup): Whether or not to attempt to strip html-ish markup from the stream before sending it to the underlying detector. |

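
To make the `stripMarkup` description concrete, here is a minimal sketch of what "stripping html-ish markup" before detection could look like: bytes between `<` and `>` are dropped so that only text content reaches a statistical detector. This is an assumption-laden illustration, not the logic used by `Icu4jEncodingDetector`; the class `MarkupStripSketch` and the method `stripMarkup` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class MarkupStripSketch {

    /**
     * Drops every byte from a '<' up to and including the next '>'.
     * Hypothetical helper: it ignores comments, scripts, and quoted
     * attribute values that themselves contain '>'.
     */
    public static byte[] stripMarkup(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean inTag = false;
        for (byte b : input) {
            if (b == '<') {
                inTag = true;          // start skipping
            } else if (b == '>' && inTag) {
                inTag = false;         // tag closed, resume copying
            } else if (!inTag) {
                out.write(b);          // ordinary text byte
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] page = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] text = stripMarkup(page);
        // Only the text bytes remain for the detector to examine.
        System.out.println(text.length + " bytes of text content");
    }
}
```

The motivation for such a step is that ASCII-only tag names and attributes dilute the non-ASCII bytes a detector relies on; removing them leaves a sample dominated by the document's actual text.
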
Modifier and Type | Method and Description |
---|---|
void | WordPerfectParser.setIncludeDeletedContent(boolean includeDeletedContent): Whether or not to include deleted content. |

Copyright © 2007–2019 The Apache Software Foundation. All rights reserved.