Class OcrConfig

java.lang.Object
org.apache.tika.parser.pdf.OcrConfig
All Implemented Interfaces:
Serializable

public class OcrConfig extends Object implements Serializable
Configuration for OCR processing in PDF parsing. Groups all OCR-related settings together.
  • Constructor Details

    • OcrConfig

      public OcrConfig()
  • Method Details

    • getStrategy

      public OcrConfig.Strategy getStrategy()
    • setStrategy

      public void setStrategy(OcrConfig.Strategy strategy)
    • getStrategyAuto

      public OcrConfig.StrategyAuto getStrategyAuto()
    • setStrategyAuto

      public void setStrategyAuto(OcrConfig.StrategyAuto strategyAuto)
    • getRenderingStrategy

      public OcrConfig.RenderingStrategy getRenderingStrategy()
    • setRenderingStrategy

      public void setRenderingStrategy(OcrConfig.RenderingStrategy renderingStrategy)
    • getDpi

      public int getDpi()
    • setDpi

      public void setDpi(int dpi)
    • getImageType

      public OcrConfig.ImageType getImageType()
    • setImageType

      public void setImageType(OcrConfig.ImageType imageType)
    • getImageFormat

      public OcrConfig.ImageFormat getImageFormat()
    • setImageFormat

      public void setImageFormat(OcrConfig.ImageFormat imageFormat)
    • getImageQuality

      public float getImageQuality()
    • setImageQuality

      public void setImageQuality(float imageQuality)
    • getMaxImagePixels

      public long getMaxImagePixels()
    • setMaxImagePixels

      public void setMaxImagePixels(long maxImagePixels)
      Sets the maximum total pixel count (width × height) for a rendered page image. Pages whose rendered image exceeds this limit are skipped for OCR. The default is 100,000,000. Set to -1 for no limit (not recommended).
    • getMaxPagesToOcr

      public int getMaxPagesToOcr()
    • setMaxPagesToOcr

      public void setMaxPagesToOcr(int maxPagesToOcr)
      Sets the maximum number of pages to OCR per document. The default is -1 (no limit). The value must be either -1 or at least 1.
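
The two limits documented for setMaxImagePixels and setMaxPagesToOcr can be illustrated with a small standalone sketch. It does not call Tika. The class OcrLimitsSketch and its helper methods are hypothetical names introduced only for illustration, and the assumption that a page measured in PDF points (72 per inch) renders to round(points / 72 × dpi) pixels per side is a common convention, not something this page confirms about Tika's renderer.

```java
// Hypothetical illustration only; this class is not part of Apache Tika.
public class OcrLimitsSketch {

    // Default documented for setMaxImagePixels.
    static final long DEFAULT_MAX_IMAGE_PIXELS = 100_000_000L;

    // PDF user space uses 72 points per inch.
    static final double POINTS_PER_INCH = 72.0;

    // Estimated width x height in pixels of a page rendered at the given DPI.
    // Assumption: each side is rounded to the nearest whole pixel.
    static long renderedPixels(double widthPts, double heightPts, int dpi) {
        long w = Math.round(widthPts / POINTS_PER_INCH * dpi);
        long h = Math.round(heightPts / POINTS_PER_INCH * dpi);
        return w * h;
    }

    // True if a page of this size stays within the limit; -1 means no limit.
    static boolean withinPixelLimit(long pixels, long maxImagePixels) {
        return maxImagePixels == -1 || pixels <= maxImagePixels;
    }

    // Mirrors the documented rule for maxPagesToOcr: -1 or at least 1.
    static boolean isValidMaxPagesToOcr(int maxPagesToOcr) {
        return maxPagesToOcr == -1 || maxPagesToOcr >= 1;
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches).
        long at300 = renderedPixels(612, 792, 300);   // 2550 * 3300
        long at1200 = renderedPixels(612, 792, 1200); // 10200 * 13200

        System.out.println(at300 + " within limit: "
                + withinPixelLimit(at300, DEFAULT_MAX_IMAGE_PIXELS));
        System.out.println(at1200 + " within limit: "
                + withinPixelLimit(at1200, DEFAULT_MAX_IMAGE_PIXELS));
        System.out.println("maxPagesToOcr=0 valid: " + isValidMaxPagesToOcr(0));
    }
}
```

Under these assumptions a US Letter page at 300 DPI comes to 8,415,000 pixels, well under the default limit, while the same page at 1200 DPI comes to 134,640,000 pixels and would be skipped. A value of 0 for maxPagesToOcr falls outside the documented range.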