Modifier and Type | Method and Description |
---|---|
void | FileCommandDetector.setFilePath(String fileCommandPath) |
void | FileCommandDetector.setMaxBytes(int maxBytes) If this is not called on a TikaInputStream, this detector will spool up to this many bytes to a file to be detected by the 'file' command. |
void | FileCommandDetector.setTimeoutMs(long timeoutMs) |

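
The `setMaxBytes(int maxBytes)` description above mentions spooling a bounded number of bytes to a file. The following is a minimal sketch of that general idea using only the JDK. It is not Tika's implementation; the class `SpoolSketch`, the method `spool`, and the temp-file prefix are hypothetical names chosen for illustration, and the step that runs the external `file` command is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of the Tika API.
public class SpoolSketch {

    /**
     * Copies at most maxBytes from the stream into a new temporary file
     * and returns the path of that file. Fewer bytes are written if the
     * stream ends first.
     */
    public static Path spool(InputStream in, int maxBytes) throws IOException {
        Path tmp = Files.createTempFile("spool-sketch-", ".bin");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            int remaining = maxBytes;
            while (remaining > 0) {
                // Never request more than the remaining byte budget.
                int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
                if (n < 0) {
                    break; // end of stream before the limit was reached
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[100_000]);
        Path spooled = spool(in, 1_000);
        System.out.println(Files.size(spooled));
        Files.delete(spooled);
    }
}
```

A caller would then hand the returned path to whatever external tool performs detection and delete the file afterwards.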
Modifier and Type | Method and Description |
---|---|
void | ExcludeFieldMetadataFilter.setExclude(List<String> exclude) |
void | IncludeFieldMetadataFilter.setInclude(List<String> include) |
void | ClearByMimeMetadataFilter.setMimes(List<String> mimes) |

Modifier and Type | Method and Description |
---|---|
void | GeoParser.setGazetteerRestEndpoint(String gazetteerRestEndpoint) |
void | GeoParser.setNerModelUrl(String nerModelUrl) |

Modifier and Type | Method and Description |
---|---|
void | HtmlParser.setExtractScripts(boolean extractScripts) Whether or not to extract contents in script entities. |
void | HtmlEncodingDetector.setMarkLimit(int markLimit) How far into the stream to read for charset detection. |

Modifier and Type | Method and Description |
---|---|
void | StandardHtmlEncodingDetector.setMarkLimit(int markLimit) How far into the stream to read for charset detection. |

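
The `setMarkLimit(int markLimit)` setters above bound how far a detector reads into the stream. As a rough sketch of what such a limit means in plain `java.io` terms, the code below reads at most `markLimit` bytes and then rewinds, so a later parser still sees the stream from its first byte. This is an assumption-laden illustration, not Tika's detector code; `MarkLimitSketch` and `peek` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of the Tika API.
public class MarkLimitSketch {

    /**
     * Returns up to markLimit bytes from the front of the stream and then
     * resets the stream to the position it had before the call.
     */
    public static byte[] peek(InputStream in, int markLimit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(markLimit);
        try {
            byte[] buffer = new byte[markLimit];
            int total = 0;
            while (total < markLimit) {
                int n = in.read(buffer, total, markLimit - total);
                if (n < 0) {
                    break; // stream is shorter than the limit
                }
                total += n;
            }
            return Arrays.copyOf(buffer, total);
        } finally {
            // Rewind so the next consumer starts at the original position.
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html><head><meta charset=\"UTF-8\"></head></html>"
                .getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(html));
        byte[] prefix = peek(in, 16);
        System.out.println(new String(prefix, StandardCharsets.US_ASCII));
    }
}
```

With a limit this small the charset declaration would fall outside the inspected prefix, which is the trade-off a mark limit controls: a larger value sees more of the document at the cost of more buffering.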
Modifier and Type | Method and Description |
---|---|
void | BPGParser.setMaxRecordLength(int maxRecordLength) |

Modifier and Type | Method and Description |
---|---|
void | AbstractOfficeParser.setByteArrayMaxOverride(int maxOverride) WARNING: this sets a static variable in POI. |
void | AbstractOfficeParser.setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns) |
void | AbstractOfficeParser.setDateFormatOverride(String format) |
void | AbstractOfficeParser.setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG) Some .msg files can contain body content in html, rtf and/or text. |
void | AbstractOfficeParser.setExtractMacros(boolean extractMacros) |
void | AbstractOfficeParser.setIncludeDeletedContent(boolean includeDeletedConent) |
void | AbstractOfficeParser.setIncludeMoveFromContent(boolean includeMoveFromContent) |
void | AbstractOfficeParser.setIncludeShapeBasedContent(boolean includeShapeBasedContent) |
void | AbstractOfficeParser.setUseSAXDocxExtractor(boolean useSAXDocxExtractor) |
void | AbstractOfficeParser.setUseSAXPptxExtractor(boolean useSAXPptxExtractor) |

Modifier and Type | Method and Description |
---|---|
void | MP4Parser.setMaxRecordSize(long maxRecordSize) Override the maximum record size limit. |

Modifier and Type | Method and Description |
---|---|
void | TesseractOCRParser.setApplyRotation(boolean applyRotation) |
void | TesseractOCRParser.setColorspace(String colorspace) |
void | TesseractOCRParser.setDensity(int density) |
void | TesseractOCRParser.setDepth(int depth) |
void | TesseractOCRParser.setEnableImageProcessing(int enableImageProcessing) |
void | TesseractOCRParser.setFilter(String filter) |
void | TesseractOCRParser.setImageMagickPath(String imageMagickPath) |
void | TesseractOCRParser.setLanguage(String language) |
void | TesseractOCRParser.setMaxFileSizeToOcr(long maxFileSizeToOcr) |
void | TesseractOCRParser.setMinFileSizeToOcr(long minFileSizeToOcr) |
void | TesseractOCRParser.setOutputType(String outputType) |
void | TesseractOCRParser.setPageSegMode(String pageSegMode) |
void | TesseractOCRParser.setPreserveInterwordSpacing(boolean preserveInterwordSpacing) |
void | TesseractOCRParser.setResize(int resize) |
void | TesseractOCRParser.setTessdataPath(String tessdataPath) |
void | TesseractOCRParser.setTesseractPath(String tesseractPath) |
void | TesseractOCRParser.setTimeout(int timeout) |

Modifier and Type | Method and Description |
---|---|
void | FlatOpenDocumentParser.setExtractMacros(boolean extractMacros) |
void | OpenDocumentParser.setExtractMacros(boolean extractMacros) |

Modifier and Type | Method and Description |
---|---|
void | PDFParser.setDropThreshold(float dropThreshold) |
void | PDFParser.setEnableAutoSpace(boolean v) If true (the default), the parser should estimate where spaces should be inserted between words. |
void | PDFParser.setExtractAnnotationText(boolean v) If true (the default), text in annotations will be extracted. |
void | PDFParser.setMaxMainMemoryBytes(long maxMainMemoryBytes) |
void | PDFParser.setOcrImageType(String imageType) |
void | PDFParser.setOcrStrategy(String ocrStrategyString) |
void | PDFParser.setSortByPosition(boolean v) If true, sort text tokens by their x/y position before extracting text. |
void | PDFParser.setSuppressDuplicateOverlappingText(boolean v) If true, the parser should try to remove duplicated text over the same region. |

Modifier and Type | Method and Description |
---|---|
void | PackageParser.setDetectCharsetsInEntryNames(boolean detectCharsetsInEntryNames) Whether or not to run the default charset detector against entry names in ZipFiles. |
void | ZipContainerDetector.setMarkLimit(int markLimit) If this is less than 0, the file will be spooled to disk, and detection will run on the full file. |
void | CompressorParser.setMemoryLimitInKb(int memoryLimitInKb) |

Modifier and Type | Method and Description |
---|---|
void | ObjectRecognitionParser.setRecogniser(String recogniserClass) |

Modifier and Type | Field and Description |
---|---|
protected URI | TensorflowRESTRecogniser.apiBaseUri |
protected double | TensorflowRESTRecogniser.minConfidence |
protected int | TensorflowRESTRecogniser.topN |

Modifier and Type | Method and Description |
---|---|
void | RTFParser.setMemoryLimitInKb(int memoryLimitInKb) |

Modifier and Type | Method and Description |
---|---|
void | UniversalEncodingDetector.setMarkLimit(int markLimit) How far into the stream to read for charset detection. |
void | Icu4jEncodingDetector.setMarkLimit(int markLimit) How far into the stream to read for charset detection. |
void | Icu4jEncodingDetector.setStripMarkup(boolean stripMarkup) Whether or not to attempt to strip html-ish markup from the stream before sending it to the underlying detector. |

Modifier and Type | Method and Description |
---|---|
void | WordPerfectParser.setIncludeDeletedContent(boolean includeDeletedContent) Whether or not to include deleted content. |

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.