Getting Started with Apache Tika

Apache Tika can be used in several ways depending on your needs. Choose the approach that best fits your use case.

Choose Your Integration Method

Java API

Use Tika directly in your Java application. Best for tight integration and full control over parsing behavior.

Command Line (tika-app)

Run Tika from the command line. Best for quick extraction, scripting, and one-off tasks.
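For callers that want to script this from Java, the sketch below shells out to tika-app and captures the extracted text. It assumes a tika-app JAR has already been downloaded; the class name `TikaAppRunner` and the file names passed in are hypothetical, and the `--text` and `--encoding` options are used as listed in tika-app's usage output (check `--help` for your version).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: runs tika-app as a child process and returns its output.
class TikaAppRunner {

    // Builds the equivalent of:
    //   java -jar <tika-app jar> --encoding=UTF-8 --text <document>
    static List<String> buildCommand(Path tikaAppJar, Path document) {
        return List.of(
                "java", "-jar", tikaAppJar.toString(),
                "--encoding=UTF-8",   // ask tika-app to emit UTF-8
                "--text",             // plain-text body only
                document.toString());
    }

    // Runs the command and returns everything tika-app wrote to stdout.
    static String extractText(Path tikaAppJar, Path document)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(tikaAppJar, document))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show tika-app warnings
                .start();
        String text = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tika-app exited with status " + exitCode);
        }
        return text;
    }
}
```

A call such as `TikaAppRunner.extractText(Path.of("tika-app.jar"), Path.of("report.pdf"))` starts a new JVM for every document, so this suits one-off jobs rather than bulk processing.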

Server (REST API)

Run Tika as a standalone server with a REST API. Best for language-agnostic integration and microservice architectures.
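As a rough illustration of the REST approach, the sketch below sends a document to a running Tika Server using only the JDK's built-in HTTP client. It assumes the server listens at `http://localhost:9998` (the usual default) and exposes the `/tika` resource, which returns extracted text for a PUT request; the class name `TikaServerClient` is hypothetical, and the base URL should be adjusted to your deployment.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical client: asks a Tika Server instance for the plain text of a document.
class TikaServerClient {

    // Builds a PUT to <serverBase>/tika that requests plain-text output.
    static HttpRequest buildExtractRequest(URI serverBase, byte[] documentBytes) {
        return HttpRequest.newBuilder(serverBase.resolve("/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(documentBytes))
                .build();
    }

    // Reads the file, sends it to the server, and returns the response body.
    static String extractText(URI serverBase, Path document)
            throws IOException, InterruptedException {
        HttpRequest request = buildExtractRequest(serverBase, Files.readAllBytes(document));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Tika Server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Because the exchange is plain HTTP, the same request can be issued from any language; Java is used here only to stay consistent with the rest of the page.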

gRPC

Use Tika over the gRPC protocol. Best for high-performance, cross-language communication.

Which Should I Use?

Use Case                                        Recommended Approach

Java application needing content extraction     Java API
Shell scripts or batch processing               Command Line
Non-Java application (Python, Node.js, etc.)    Server (REST) or gRPC
High-throughput processing pipeline             Server or gRPC with Pipes
Quick one-time extraction                       Command Line

Scalable Processing

For processing large volumes of documents, see Tika Pipes, which provides fault-tolerant, scalable document processing and works with all of the above integration methods.

Resources