Class Tess4JConfig

java.lang.Object
org.apache.tika.parser.ocr.tess4j.Tess4JConfig
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
Tess4JConfig.RuntimeConfig

public class Tess4JConfig extends Object implements Serializable
Configuration for Tess4JParser.

This class is not thread-safe and must be synchronized externally.
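Because the configuration object is mutable and unsynchronized, one common approach is to guard every read and write with a single lock and give readers a private copy. The sketch below shows that pattern on a hypothetical stand-in class, OcrSettings, which only mirrors two of the documented properties (language and DPI). It is not Tika's Tess4JConfig, and GuardedSettings is an illustrative name.

```java
// Hypothetical stand-in for a mutable, non-thread-safe config bean.
// It is NOT Tika's Tess4JConfig; the fields only mirror two of its properties.
class OcrSettings {
    private String language = "eng";
    private int dpi = 300;

    String getLanguage() { return language; }
    void setLanguage(String language) { this.language = language; }
    int getDpi() { return dpi; }
    void setDpi(int dpi) { this.dpi = dpi; }

    // Field-by-field copy so a reader can work on a private snapshot.
    OcrSettings copy() {
        OcrSettings c = new OcrSettings();
        c.language = language;
        c.dpi = dpi;
        return c;
    }
}

// External synchronization: every access to the shared instance
// goes through methods that hold the same monitor (this).
class GuardedSettings {
    private final OcrSettings shared = new OcrSettings();

    // Both fields change under one lock, so no reader sees a half-applied update.
    synchronized void update(String language, int dpi) {
        shared.setLanguage(language);
        shared.setDpi(dpi);
    }

    // Readers receive a consistent copy taken under the same lock.
    synchronized OcrSettings snapshot() {
        return shared.copy();
    }
}
```

The simpler alternative, when it fits, is to finish configuring the object on one thread before handing it to others and never modify it afterwards.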

  • Constructor Details

    • Tess4JConfig

      public Tess4JConfig()
  • Method Details

    • getLanguage

      public String getLanguage()
    • setLanguage

      public void setLanguage(String language)
      Set the Tesseract language dictionary to be used. Default is "eng". Multiple languages may be specified, separated by plus characters, e.g. "eng+fra".
    • getDataPath

      public String getDataPath()
    • setDataPath

      public void setDataPath(String dataPath) throws TikaConfigException
      Set the path to the tessdata directory.
      Throws:
      TikaConfigException
    • getPageSegMode

      public int getPageSegMode()
    • setPageSegMode

      public void setPageSegMode(int pageSegMode)
      Set the Tesseract page segmentation mode. Default is 1 (in Tesseract's numbering, automatic page segmentation with orientation and script detection).
    • getOcrEngineMode

      public int getOcrEngineMode()
    • setOcrEngineMode

      public void setOcrEngineMode(int ocrEngineMode)
      Set the Tesseract OCR engine mode. Default is 3 (in Tesseract's numbering, use whichever engine is available).
    • getMaxFileSizeToOcr

      public long getMaxFileSizeToOcr()
    • setMaxFileSizeToOcr

      public void setMaxFileSizeToOcr(long maxFileSizeToOcr)
    • getMinFileSizeToOcr

      public long getMinFileSizeToOcr()
    • setMinFileSizeToOcr

      public void setMinFileSizeToOcr(long minFileSizeToOcr)
    • getPoolSize

      public int getPoolSize()
    • setPoolSize

      public void setPoolSize(int poolSize)
      Set the number of Tesseract instances to keep in the pool. Default is 2. Must be at least 1.
    • getTimeoutSeconds

      public int getTimeoutSeconds()
    • setTimeoutSeconds

      public void setTimeoutSeconds(int timeoutSeconds)
      Set the maximum time, in seconds, to wait for a pooled Tesseract instance. Default is 120.
    • isSkipOcr

      public boolean isSkipOcr()
    • setSkipOcr

      public void setSkipOcr(boolean skipOcr)
    • getDpi

      public int getDpi()
    • setDpi

      public void setDpi(int dpi)
      Set the DPI for image rendering. Default is 300.
    • getMaxImagePixels

      public long getMaxImagePixels()
    • setMaxImagePixels

      public void setMaxImagePixels(long maxImagePixels)
      Set the maximum total pixels (width × height) allowed for an image before OCR is skipped. Default is 100,000,000 (100 megapixels). Set to -1 for no limit (not recommended).
    • getNativeLibPath

      public String getNativeLibPath()
    • setNativeLibPath

      public void setNativeLibPath(String nativeLibPath) throws TikaConfigException
      Set the path to the directory containing native Tesseract/Leptonica shared libraries. On macOS with Homebrew this is typically /opt/homebrew/lib on Apple Silicon, or /usr/local/lib on Intel Macs.
      Throws:
      TikaConfigException
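setLanguage accepts several languages joined with "+", and setDataPath points at the tessdata directory. By Tesseract's convention, each language code is loaded from a file named <lang>.traineddata in that directory. The sketch below splits a language string and reports which trained-data files are missing. TessdataCheck and its methods are hypothetical helpers, not part of Tika, and this is not necessarily the check that makes setDataPath throw TikaConfigException.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for sanity-checking a language string against a tessdata dir.
final class TessdataCheck {
    private TessdataCheck() {}

    // Splits "eng+fra" into ["eng", "fra"], trimming and skipping empty segments.
    static List<String> languages(String spec) {
        List<String> out = new ArrayList<>();
        for (String part : spec.split("\\+")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // Returns the languages whose <lang>.traineddata file is absent from dataPath.
    static List<String> missingLanguages(Path dataPath, String spec) {
        List<String> missing = new ArrayList<>();
        for (String lang : languages(spec)) {
            if (!Files.isRegularFile(dataPath.resolve(lang + ".traineddata"))) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

Running such a check once at startup surfaces a missing language pack before any document is parsed.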
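Several of the properties above (skipOcr, minFileSizeToOcr, maxFileSizeToOcr, maxImagePixels, and dpi) decide whether an image is sent to Tesseract at all. The sketch below shows one plausible way to combine them into a pre-OCR gate. OcrGate is hypothetical. Treating the file-size bounds as inclusive and skipping OCR when width × height exceeds the cap are assumptions. Only the "-1 means no pixel limit" rule comes from setMaxImagePixels. renderedPixels shows the arithmetic behind the DPI setting. For example, a US Letter page (8.5 × 11 in) rendered at the default 300 DPI is 2550 × 3300 = 8,415,000 pixels, well under the default 100,000,000 limit.

```java
// Hypothetical pre-OCR gate; the names and bound semantics are illustrative
// assumptions, not Tika's implementation.
final class OcrGate {
    private OcrGate() {}

    // Documented sentinel for setMaxImagePixels: -1 disables the pixel cap.
    static final long NO_LIMIT = -1L;

    // Pixel count of a page of the given size (inches) rendered at dpi.
    static long renderedPixels(double widthInches, double heightInches, int dpi) {
        long w = Math.round(widthInches * dpi);
        long h = Math.round(heightInches * dpi);
        return w * h;
    }

    static boolean shouldOcr(boolean skipOcr, long fileSize, long minFileSize,
                             long maxFileSize, long width, long height,
                             long maxImagePixels) {
        if (skipOcr) {
            return false; // OCR disabled outright
        }
        if (fileSize < minFileSize || fileSize > maxFileSize) {
            return false; // outside the (assumed inclusive) size window
        }
        if (maxImagePixels != NO_LIMIT && width * height > maxImagePixels) {
            return false; // too many pixels: skip rather than risk memory blowup
        }
        return true;
    }
}
```

Checking the pixel count before decoding or rendering is the point of such a cap, since a very large image can exhaust memory before OCR even starts.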
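setPoolSize and setTimeoutSeconds describe a fixed pool of Tesseract instances that callers wait on for a bounded time. The sketch below is a generic pool built on java.util.concurrent.ArrayBlockingQueue and poll(timeout, unit) that shows those two knobs. InstancePool is a hypothetical class with a placeholder element type, not Tika's pooling code. An empty Optional from borrow means the wait timed out or was interrupted.

```java
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fixed-size pool illustrating poolSize and timeoutSeconds.
final class InstancePool<T> {
    private final BlockingQueue<T> idle;
    private final long timeoutSeconds;

    InstancePool(int poolSize, long timeoutSeconds, Supplier<T> factory) {
        if (poolSize < 1) {
            throw new IllegalArgumentException("poolSize must be at least 1");
        }
        this.idle = new ArrayBlockingQueue<>(poolSize);
        this.timeoutSeconds = timeoutSeconds;
        for (int i = 0; i < poolSize; i++) {
            idle.add(factory.get()); // create every instance up front
        }
    }

    // Waits up to timeoutSeconds for a free instance; empty on timeout or interrupt.
    Optional<T> borrow() {
        try {
            return Optional.ofNullable(idle.poll(timeoutSeconds, TimeUnit.SECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return Optional.empty();
        }
    }

    // Returns an instance so another caller can use it.
    void giveBack(T instance) {
        idle.offer(instance);
    }

    int available() {
        return idle.size();
    }
}
```

Callers should return instances in a finally block. Otherwise a failed OCR call permanently shrinks the pool, and later callers wait out the full timeout.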