Package org.apache.tika.parser.pdf
Class OcrConfig
java.lang.Object
org.apache.tika.parser.pdf.OcrConfig
- All Implemented Interfaces:
Serializable
Configuration for OCR processing in PDF parsing.
Groups all OCR-related settings together.
Nested Class Summary
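Nested Classes
Modifier and Type, Class, Description
static enum OcrConfig.ImageFormat
static enum OcrConfig.ImageType
static enum OcrConfig.RenderingStrategy
static enum OcrConfig.Strategy
static class OcrConfig.StrategyAuto: Configuration for AUTO strategy behavior.
-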
Constructor Summary
Constructors
OcrConfig()
-
Method Summary
Modifier and Type, Method, Description
int getDpi()
OcrConfig.ImageFormat getImageFormat()
float getImageQuality()
OcrConfig.ImageType getImageType()
long getMaxImagePixels()
int getMaxPagesToOcr()
OcrConfig.RenderingStrategy getRenderingStrategy()
OcrConfig.Strategy getStrategy()
OcrConfig.StrategyAuto getStrategyAuto()
void setDpi(int dpi)
void setImageFormat(OcrConfig.ImageFormat imageFormat)
void setImageQuality(float imageQuality)
void setImageType(OcrConfig.ImageType imageType)
void setMaxImagePixels(long maxImagePixels): Set the maximum total pixels (width × height) for a rendered page image.
void setMaxPagesToOcr(int maxPagesToOcr): Set the maximum number of pages to OCR per document.
void setRenderingStrategy(OcrConfig.RenderingStrategy renderingStrategy)
void setStrategy(OcrConfig.Strategy strategy)
void setStrategyAuto(OcrConfig.StrategyAuto strategyAuto)
-
Constructor Details
-
OcrConfig
public OcrConfig()
-
-
Method Details
-
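getStrategy
public OcrConfig.Strategy getStrategy() -
setStrategy
public void setStrategy(OcrConfig.Strategy strategy) -
getStrategyAuto
public OcrConfig.StrategyAuto getStrategyAuto() -
setStrategyAuto
public void setStrategyAuto(OcrConfig.StrategyAuto strategyAuto) -
getRenderingStrategy
public OcrConfig.RenderingStrategy getRenderingStrategy() -
setRenderingStrategy
public void setRenderingStrategy(OcrConfig.RenderingStrategy renderingStrategy) -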
getDpi
public int getDpi() -
setDpi
public void setDpi(int dpi) -
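getImageType
public OcrConfig.ImageType getImageType() -
setImageType
public void setImageType(OcrConfig.ImageType imageType) -
getImageFormat
public OcrConfig.ImageFormat getImageFormat() -
setImageFormat
public void setImageFormat(OcrConfig.ImageFormat imageFormat) -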
getImageQuality
public float getImageQuality() -
setImageQuality
public void setImageQuality(float imageQuality) -
getMaxImagePixels
public long getMaxImagePixels() -
setMaxImagePixels
public void setMaxImagePixels(long maxImagePixels)
Set the maximum total pixels (width × height) for a rendered page image. Pages exceeding this limit are skipped for OCR. Default is 100,000,000. Set to -1 for no limit (not recommended).
-
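To give a feel for how the pixel limit interacts with the DPI setting, here is a minimal sketch of the arithmetic. It is not Tika's implementation: the class `PixelLimitSketch`, its method names, and the rounding choice are hypothetical, and it only assumes the standard PDF convention of 72 points per inch.

```java
public class PixelLimitSketch {
    // PDF user space is measured in points; there are 72 points per inch.
    static final double POINTS_PER_INCH = 72.0;

    /** Pixels along one side when a length in points is rendered at the given DPI. */
    static long renderedPixels(double lengthPts, int dpi) {
        return Math.round(lengthPts * dpi / POINTS_PER_INCH);
    }

    /**
     * Hypothetical check: true if the rendered page (width x height in pixels)
     * exceeds maxImagePixels. A limit of -1 means "no limit".
     */
    static boolean exceedsLimit(double widthPts, double heightPts, int dpi, long maxImagePixels) {
        if (maxImagePixels == -1) {
            return false;
        }
        long width = renderedPixels(widthPts, dpi);
        long height = renderedPixels(heightPts, dpi);
        return width * height > maxImagePixels;
    }

    public static void main(String[] args) {
        long limit = 100_000_000L; // the documented default
        // US Letter is 612 x 792 points.
        System.out.println(exceedsLimit(612, 792, 300, limit));  // 2550 x 3300 pixels
        System.out.println(exceedsLimit(612, 792, 1200, limit)); // 10200 x 13200 pixels
    }
}
```

Under these assumptions, a US Letter page at 300 DPI renders to about 8.4 million pixels and stays well under the default, while the same page at 1200 DPI renders to about 134.6 million pixels and would be skipped.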
getMaxPagesToOcr
public int getMaxPagesToOcr() -
setMaxPagesToOcr
public void setMaxPagesToOcr(int maxPagesToOcr)
Set the maximum number of pages to OCR per document. Default is -1 (no limit). Must be -1 or at least 1.
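The constraint on `maxPagesToOcr` (either -1 or at least 1) can be illustrated with a small sketch. The class `MaxPagesSketch` and its methods are hypothetical; in particular, throwing `IllegalArgumentException` for invalid values and capping the page count with `Math.min` are assumptions, not documented behavior of `OcrConfig`.

```java
public class MaxPagesSketch {
    static final int NO_LIMIT = -1;

    /** Accepts -1 (no limit) or any value >= 1; rejects 0 and other negatives. */
    static int validate(int maxPagesToOcr) {
        if (maxPagesToOcr != NO_LIMIT && maxPagesToOcr < 1) {
            throw new IllegalArgumentException(
                    "maxPagesToOcr must be -1 or at least 1, got " + maxPagesToOcr);
        }
        return maxPagesToOcr;
    }

    /** Number of pages that would be OCRed for a document with pageCount pages. */
    static int pagesToOcr(int pageCount, int maxPagesToOcr) {
        validate(maxPagesToOcr);
        return maxPagesToOcr == NO_LIMIT ? pageCount : Math.min(pageCount, maxPagesToOcr);
    }

    public static void main(String[] args) {
        System.out.println(pagesToOcr(40, NO_LIMIT)); // no cap applied
        System.out.println(pagesToOcr(40, 5));        // capped at 5 pages
    }
}
```

Treating -1 as a sentinel keeps the setting a plain `int`, at the cost of needing an explicit check so that 0 is not silently read as "OCR nothing".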
-