Tika Server
This section covers running Apache Tika as a REST server via tika-server.
Overview
Tika Server provides a RESTful HTTP interface for parsing documents and extracting content. It can be deployed as a standalone service or in a containerized environment.
Endpoints
Content Extraction (/tika)
The /tika endpoint extracts content from a document as plain text.
curl -T document.pdf http://localhost:9998/tika
Recursive Metadata (/rmeta)
The /rmeta endpoint returns metadata for the container document and all embedded documents
as a JSON array of metadata objects.
curl -T document.pdf http://localhost:9998/rmeta
The content handler can be specified in the URL path:
- /rmeta/text - plain text content (default)
- /rmeta/html - HTML content
- /rmeta/xml - XHTML content
- /rmeta/markdown - Markdown content
- /rmeta/ignore - metadata only, no content
curl -T document.docx http://localhost:9998/rmeta/markdown
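For programmatic use, the same request can be issued from a script and the returned JSON array walked one metadata object at a time. The following is a minimal sketch, not an official client. It assumes a server at http://localhost:9998 and assumes that extracted content appears under the X-TIKA:content key and that embedded documents carry X-TIKA:embedded_resource_path; check the keys your Tika version actually returns. The function names are hypothetical.

```python
import json
import urllib.request

# Content-handler names taken from the list above.
HANDLERS = {"text", "html", "xml", "markdown", "ignore"}


def rmeta_url(base_url, handler=None):
    """Build the /rmeta URL, optionally with a content-handler suffix."""
    url = base_url.rstrip("/") + "/rmeta"
    if handler is None:
        return url
    if handler not in HANDLERS:
        raise ValueError(f"unknown content handler: {handler}")
    return f"{url}/{handler}"


def fetch_rmeta(path, base_url="http://localhost:9998", handler=None):
    """PUT a file to /rmeta (the request curl -T sends) and decode the JSON array."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(rmeta_url(base_url, handler), data=body, method="PUT")
    req.add_header("Accept", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize(entries):
    """Return (embedded path, content type, content length) per metadata object.

    The key names are assumptions about the server's output; entries without
    content (for example from /rmeta/ignore) report a length of 0.
    """
    rows = []
    for entry in entries:
        content = entry.get("X-TIKA:content") or ""
        rows.append((
            entry.get("X-TIKA:embedded_resource_path", ""),
            entry.get("Content-Type"),
            len(content),
        ))
    return rows
```

With a server running, summarize(fetch_rmeta("document.docx", handler="markdown")) would yield one row for the container document followed by one row per embedded document.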
Topics
- TLS/SSL Configuration - Secure your server with TLS and mutual authentication
Security Configuration
Config Endpoint Protection
By default, the /config endpoints that expose server configuration are disabled for security
reasons. These endpoints can reveal sensitive information about your server configuration,
including parser settings and system properties (see CVE-2015-3271).
The protected endpoints include:
- /config - Returns the server’s full configuration
- /config/parsers - Returns configured parsers
- /config/detectors - Returns configured detectors
- /config/mimeTypes - Returns MIME type mappings
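One way to confirm that a deployment leaves these endpoints disabled is to request each one and check that none answers with HTTP 200. The sketch below is illustrative only: the base URL is an assumption, the function names are hypothetical, and the exact status code a disabled endpoint returns is not specified here, so the check treats anything other than 200 as "not exposed".

```python
import urllib.error
import urllib.request

# Paths taken from the list of protected endpoints above.
CONFIG_PATHS = ["/config", "/config/parsers", "/config/detectors", "/config/mimeTypes"]


def probe(base_url, paths=CONFIG_PATHS, timeout=5):
    """Return {path: HTTP status, or None if the server could not be reached}."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status (e.g. 4xx).
            results[path] = err.code
        except (urllib.error.URLError, OSError):
            # Connection refused, DNS failure, timeout, and similar.
            results[path] = None
    return results


def exposed(results):
    """List the paths that answered with HTTP 200."""
    return sorted(path for path, status in results.items() if status == 200)
```

Calling exposed(probe("http://localhost:9998")) against a server running with default settings should, under these assumptions, return an empty list.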
Enabling Config Endpoints
To enable these endpoints, set enableUnsecureFeatures to true in the server section of the configuration:
{
  "server": {
    "enableUnsecureFeatures": true
  }
}
Only enable enableUnsecureFeatures if you have secured access to Tika Server through
network controls (firewalls, private subnets), a reverse proxy (nginx, Apache httpd), or
2-way TLS authentication. Exposing config endpoints to
untrusted networks can help attackers identify vulnerabilities and craft targeted attacks.
Command Line Usage
You can also enable unsecure features from the command line:
java -jar tika-server-standard.jar --enableUnsecureFeatures
Security Best Practices
- Keep config endpoints disabled in production (default behavior)
- Use network controls to restrict access to the Tika Server (firewall rules, private subnets)
- Consider TLS for encrypted communication - see TLS Configuration
- Run with minimal privileges - don’t run Tika Server as root
- Monitor logs for unusual access patterns