public class TesseractOCRConfig extends Object implements Serializable
Use this class to enable TesseractOCRParser and set its parameters:
TesseractOCRConfig config = new TesseractOCRConfig();
config.setTesseractPath(tesseractFolder);
parseContext.set(TesseractOCRConfig.class, config);
Parameters can also be set either by editing the existing TesseractOCRConfig.properties file in tika-parser/src/main/resources/org/apache/tika/parser/ocr, or by overriding it with your own copy placed in the package org/apache/tika/parser/ocr on the classpath.
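As a rough illustration of the properties-based route, the sketch below builds a few override properties in memory and exposes them as an InputStream, the form that the TesseractOCRConfig(InputStream) constructor accepts. The key names (language, timeout, density) are assumed to mirror the setter names and should be checked against the bundled TesseractOCRConfig.properties; the values are illustrative only. OcrPropertiesSketch and its methods are hypothetical helpers, not Tika API, and the Tika calls appear only in comments so that the example compiles without Tika on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Tika.
final class OcrPropertiesSketch {

    // Builds a small set of overrides. The key names are assumptions that
    // mirror the setter names (setLanguage, setTimeout, setDensity).
    static Properties overrides() {
        Properties props = new Properties();
        props.setProperty("language", "eng");
        props.setProperty("timeout", "120");
        props.setProperty("density", "300");
        return props;
    }

    // Serializes the properties in .properties format and returns a stream
    // positioned at the start of that text.
    static InputStream asStream(Properties props) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.store(out, "OCR overrides (sketch)");
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        InputStream is = asStream(overrides());
        // With Tika on the classpath, the stream could be passed to the
        // documented constructor instead of being read back here:
        //   TesseractOCRConfig config = new TesseractOCRConfig(is);
        //   parseContext.set(TesseractOCRConfig.class, config);
        Properties check = new Properties();
        check.load(is);
        System.out.println("language = " + check.getProperty("language"));
    }
}
```

The same key=value pairs could instead live in a TesseractOCRConfig.properties file placed in org/apache/tika/parser/ocr on the classpath, as described above.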
Modifier and Type | Class and Description |
---|---|
static class | TesseractOCRConfig.OUTPUT_TYPE |
Constructor and Description |
---|
TesseractOCRConfig(): Default constructor. |
TesseractOCRConfig(InputStream is): Loads properties from InputStream and then tries to close InputStream. |
Modifier and Type | Method and Description |
---|---|
boolean | getApplyRotation() |
String | getColorspace() |
int | getDensity() |
int | getDepth() |
String | getFilter() |
String | getImageMagickPath() |
String | getLanguage() |
int | getMaxFileSizeToOcr() |
int | getMinFileSizeToOcr() |
TesseractOCRConfig.OUTPUT_TYPE | getOutputType() |
String | getPageSegMode() |
boolean | getPreserveInterwordSpacing() |
int | getResize() |
String | getTessdataPath() |
String | getTesseractPath() |
int | getTimeout() |
int | isEnableImageProcessing() |
void | setApplyRotation(boolean applyRotation): Sets whether or not a rotation value should be calculated and passed to ImageMagick. |
void | setColorspace(String colorspace) |
void | setDensity(int density) |
void | setDepth(int depth) |
void | setEnableImageProcessing(int enableImageProcessing): Enables image processing when set to 1 (the flag is an int rather than a boolean). |
void | setFilter(String filter) |
void | setImageMagickPath(String ImageMagickPath): Set the path to the ImageMagick executable, needed if it is not on the system path. |
void | setLanguage(String language): Set the Tesseract language dictionary to be used. |
void | setMaxFileSizeToOcr(int maxFileSizeToOcr): Set the maximum file size to submit to OCR. |
void | setMinFileSizeToOcr(int minFileSizeToOcr): Set the minimum file size to submit to OCR. |
void | setOutputType(TesseractOCRConfig.OUTPUT_TYPE outputType): Set the output type of the OCR process. |
void | setPageSegMode(String pageSegMode): Set the Tesseract page segmentation mode. |
void | setPreserveInterwordSpacing(boolean preserveInterwordSpacing): Whether or not to maintain interword spacing. |
void | setResize(int resize) |
void | setTessdataPath(String tessdataPath): Set the path to the 'tessdata' folder, which contains language files and config files. |
void | setTesseractPath(String tesseractPath): Set the path to the Tesseract executable, needed if it is not on the system path. |
void | setTimeout(int timeout): Set the maximum time (seconds) to wait for the OCR process to terminate. |
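
public TesseractOCRConfig()
Default constructor.

public TesseractOCRConfig(InputStream is)
Loads properties from InputStream and then tries to close InputStream.
Parameters: is - the InputStream to load properties from.

public String getTesseractPath()
See Also: setTesseractPath(String tesseractPath)

public void setTesseractPath(String tesseractPath)
Set the path to the Tesseract executable, needed if it is not on the system path.
Note that if you set this value, it is highly recommended that you also set the path to the 'tessdata' folder using setTessdataPath(java.lang.String).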
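
public String getTessdataPath()
See Also: setTessdataPath(String tessdataPath)

public void setTessdataPath(String tessdataPath)
Set the path to the 'tessdata' folder, which contains language files and config files.

public String getLanguage()
See Also: setLanguage(String language)

public void setLanguage(String language)
Set the Tesseract language dictionary to be used.

public String getPageSegMode()
See Also: setPageSegMode(String pageSegMode)

public void setPageSegMode(String pageSegMode)
Set the Tesseract page segmentation mode.

public void setPreserveInterwordSpacing(boolean preserveInterwordSpacing)
Whether or not to maintain interword spacing. Default is false.
Parameters: preserveInterwordSpacing - true to maintain interword spacing, false otherwise.

public boolean getPreserveInterwordSpacing()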

public int getMinFileSizeToOcr()

public void setMinFileSizeToOcr(int minFileSizeToOcr)
Set the minimum file size to submit to OCR.

public int getMaxFileSizeToOcr()

public void setMaxFileSizeToOcr(int maxFileSizeToOcr)
Set the maximum file size to submit to OCR.
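
To make the two size limits concrete, the sketch below shows how a caller might decide whether a file falls inside the window defined by minFileSizeToOcr and maxFileSizeToOcr. Treating the sizes as bytes and both bounds as inclusive are assumptions, since this page does not state either, and OcrSizeGate and shouldOcr are hypothetical names rather than Tika API.

```java
// Hypothetical helper for illustration; not part of Tika.
final class OcrSizeGate {

    // Returns true when fileSize lies within [minFileSizeToOcr, maxFileSizeToOcr].
    // Inclusive bounds and byte units are assumptions of this sketch.
    static boolean shouldOcr(long fileSize, int minFileSizeToOcr, int maxFileSizeToOcr) {
        return fileSize >= minFileSizeToOcr && fileSize <= maxFileSizeToOcr;
    }

    public static void main(String[] args) {
        int min = 1_000;      // illustrative lower limit
        int max = 5_000_000;  // illustrative upper limit
        System.out.println(shouldOcr(500L, min, max));        // below the window
        System.out.println(shouldOcr(20_000L, min, max));     // inside the window
        System.out.println(shouldOcr(9_000_000L, min, max));  // above the window
    }
}
```

In practice the two limits would come from getMinFileSizeToOcr() and getMaxFileSizeToOcr() on the active configuration.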
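
public void setTimeout(int timeout)
Set the maximum time (seconds) to wait for the OCR process to terminate.

public int getTimeout()
See Also: setTimeout(int timeout)

public void setOutputType(TesseractOCRConfig.OUTPUT_TYPE outputType)
Set the output type of the OCR process.

public TesseractOCRConfig.OUTPUT_TYPE getOutputType()
See Also: setOutputType(OUTPUT_TYPE outputType)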

public int isEnableImageProcessing()
See Also: setEnableImageProcessing(int)

public void setEnableImageProcessing(int enableImageProcessing)
Enables image processing when set to 1 (the flag is an int rather than a boolean).
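
Image processing is tuned through the density, depth, filter and resize setters of this class. The sketch below checks candidate values against the ranges this page documents (density 150-1200; depth one of 2, 4, 8, 16, 32, 64, 256, 4096; filter one of point, hermite, cubic, box, gaussian, catrom, triangle, quadratic, mitchell; resize 100-900) before they are handed to the setters. OcrImageSettingsCheck is a hypothetical helper, not Tika API; whether Tika itself rejects out-of-range values is not stated on this page, and exact lowercase matching of filter names is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Tika.
final class OcrImageSettingsCheck {

    // Valid depth values as listed for setDepth(int).
    private static final List<Integer> DEPTHS =
            Arrays.asList(2, 4, 8, 16, 32, 64, 256, 4096);

    // Valid filter names as listed for setFilter(String).
    private static final List<String> FILTERS = Arrays.asList(
            "point", "hermite", "cubic", "box", "gaussian",
            "catrom", "triangle", "quadratic", "mitchell");

    static boolean isValidDensity(int density) {
        return density >= 150 && density <= 1200;
    }

    static boolean isValidDepth(int depth) {
        return DEPTHS.contains(depth);
    }

    // Exact, case-sensitive match; this strictness is an assumption.
    static boolean isValidFilter(String filter) {
        return filter != null && FILTERS.contains(filter);
    }

    static boolean isValidResize(int resize) {
        return resize >= 100 && resize <= 900;
    }

    public static void main(String[] args) {
        // The documented defaults (300, 4, triangle, 900) should all pass.
        System.out.println(isValidDensity(300) && isValidDepth(4)
                && isValidFilter("triangle") && isValidResize(900));
    }
}
```

A caller could run these checks first and only then invoke setDensity, setDepth, setFilter and setResize on the configuration.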

public int getDensity()

public void setDensity(int density)
Parameters: density - the density to set. Valid range of values is 150-1200. Default value is 300.

public int getDepth()

public void setDepth(int depth)
Parameters: depth - the depth to set. Valid values are 2, 4, 8, 16, 32, 64, 256, 4096. Default value is 4.

public String getColorspace()

public void setColorspace(String colorspace)
Parameters: colorspace - the colorspace to set. Default value is gray.

public String getFilter()

public void setFilter(String filter)
Parameters: filter - the filter to set. Valid values are point, hermite, cubic, box, gaussian, catrom, triangle, quadratic and mitchell. Default value is triangle.

public int getResize()

public void setResize(int resize)
Parameters: resize - the resize to set. Valid range of values is 100-900. Default value is 900.

public String getImageMagickPath()
See Also: setImageMagickPath(String ImageMagickPath)

public void setImageMagickPath(String ImageMagickPath)
Set the path to the ImageMagick executable, needed if it is not on the system path.
Parameters: ImageMagickPath - path to the ImageMagick executable.

public boolean getApplyRotation()

public void setApplyRotation(boolean applyRotation)
Sets whether or not a rotation value should be calculated and passed to ImageMagick.
Parameters: applyRotation - true to calculate and apply rotation, false to skip. Default is false; true requires Python to be installed.

Copyright © 2007–2017 The Apache Software Foundation. All rights reserved.