Class TesseractOCRParser

java.lang.Object
org.apache.tika.parser.AbstractExternalProcessParser
org.apache.tika.parser.ocr.TesseractOCRParser
All Implemented Interfaces:
Serializable, Initializable, SelfConfiguring, Parser

public class TesseractOCRParser extends AbstractExternalProcessParser implements Initializable
TesseractOCRParser is powered by the tesseract-ocr engine. To enable this parser, create a TesseractOCRConfig object and pass it through a ParseContext. Tesseract-ocr must be installed and available on the system path; otherwise, the path to its root folder must be provided:

TesseractOCRConfig config = new TesseractOCRConfig();
// Needed only if tesseract is not on the system path
config.setTesseractPath(tesseractFolder);
parseContext.set(TesseractOCRConfig.class, config);

See Also: