Configuration
This section covers configuring Apache Tika.
Overview
Tika 4.x uses JSON configuration files. Configuration controls parsers, detectors, content handlers, and other components.
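As a rough illustration of what a JSON configuration file with per-component sections might look like, the sketch below builds one and round-trips it to confirm it is well-formed JSON. It is written in Python only to stay self-contained. The file name `tika-config.json`, the `parsers` section, the `pdf-parser` key, and the `extractInlineImages` option are assumptions made for illustration, not a schema confirmed by this page; the PDFParser topic listed below is the place to check the real option names.

```python
import json
import tempfile
from pathlib import Path

# ASSUMPTION: the section name ("parsers"), component key ("pdf-parser"),
# and option ("extractInlineImages") are illustrative placeholders, not a
# confirmed Tika 4.x schema. Check the parser-specific pages for real keys.
config = {
    "parsers": [
        {"pdf-parser": {"extractInlineImages": True}},
    ],
}


def write_config(directory: Path) -> Path:
    """Write the sketch configuration as JSON and return the file path."""
    # ASSUMPTION: the file name is hypothetical, chosen by analogy with
    # the tika-config.xml name used by earlier versions.
    path = directory / "tika-config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    config_path = write_config(Path(tmp))
    # Round-trip the file to confirm it parses as well-formed JSON.
    loaded = json.loads(config_path.read_text(encoding="utf-8"))
    print(json.dumps(loaded, indent=2))
```

Keeping each component's options under its own key means a parser's settings can be changed without touching the rest of the file, which is the general shape the topics below assume.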
Tika 3.x and earlier used XML configuration (tika-config.xml). See the Migration Guide for details on converting to JSON.
Topics
Parser Configuration
- PDFParser: PDF parsing options
- TesseractOCRParser: OCR options for image-based text extraction
Other Configuration
- Digesters: Computing cryptographic hashes of documents
- Encoding Detectors: Configuring charset/encoding detection