Security
This page covers security considerations when using Apache Tika.
Security Model
Apache Tika’s security model describes the trust boundaries and assumptions that govern how Tika processes content. Understanding this model is essential for deploying Tika securely.
Known Vulnerabilities
For information about known security vulnerabilities (CVEs) in Apache Tika and their remediation, please see:
External Command Security
Apache Tika can be configured to use external system commands for certain operations,
such as the FileCommandDetector and ExternalParser components.
External command configuration should only be performed by trusted administrators. Never allow untrusted users to configure command paths or arguments.
Security Best Practices
- Restrict configuration access: Only allow administrators to modify Tika configuration files that specify external commands.
- Use absolute paths: Always configure external commands with absolute paths to prevent PATH manipulation attacks.
- Sandbox execution: Consider running Tika in a container or sandbox environment to limit the impact of any command execution vulnerabilities.
- Audit command configuration: Regularly review configured external commands and their arguments.
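The absolute-path recommendation can be enforced before a command is ever launched. The following is a minimal sketch, not part of Tika: the class `ExternalCommandCheck` and its methods are hypothetical names introduced for illustration, and `/usr/bin/file` is only an assumed location on a Unix-like system. It rejects any executable that is not given as an absolute path and builds an argument list suitable for `java.lang.ProcessBuilder`, which runs the program directly instead of passing a string through a shell.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a Tika class.
public class ExternalCommandCheck {

    /** Returns true only when the command is a non-empty absolute path. */
    public static boolean isAbsoluteCommand(String command) {
        return command != null
                && !command.isEmpty()
                && Paths.get(command).isAbsolute();
    }

    /**
     * Builds an argument vector (executable first, then its arguments).
     * Rejects bare names such as "file", which would be resolved via PATH.
     */
    public static List<String> buildCommand(String executable, List<String> args) {
        if (!isAbsoluteCommand(executable)) {
            throw new IllegalArgumentException(
                    "External command must be an absolute path: " + executable);
        }
        List<String> commandLine = new ArrayList<>();
        commandLine.add(executable);
        commandLine.addAll(args);
        return commandLine;
    }

    public static void main(String[] args) {
        // Assumed path; adjust to wherever the tool is installed.
        List<String> commandLine =
                buildCommand("/usr/bin/file", Arrays.asList("--mime-type", "sample.pdf"));
        // The list can be handed to ProcessBuilder, which does not invoke a shell.
        ProcessBuilder builder = new ProcessBuilder(commandLine);
        System.out.println(builder.command());
    }
}
```

A check like this only addresses PATH lookup. It does not verify who can write to the target file, so it complements restricted configuration access and sandboxing rather than replacing them.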
Credential Handling
Password Storage in Memory
Tika stores some credentials as Java String objects, which remain in memory until garbage collected. For environments with strict security requirements:
- Use environment variables: Configure credentials via environment variables rather than configuration files where possible.
- Use secret managers: Integrate with HashiCorp Vault, AWS Secrets Manager, or similar services for production deployments.
- Enable encryption: Use the AES encryption option in HttpClientFactory for stored passwords.
- Minimize credential scope: Use credentials with minimum necessary privileges and rotate them regularly.
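As an illustration of the environment-variable approach, the sketch below reads a secret from an environment map and wipes the working copy after use. It is an assumption-laden example, not Tika code: the class `CredentialLookup`, its methods, and the variable name `TIKA_HTTP_PASSWORD` are all hypothetical. Note that `System.getenv()` itself returns String values, so the original String still stays in memory until garbage collected; the `char[]` copy only ensures that application code holds a buffer it can clear.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper for illustration only; not a Tika class.
public class CredentialLookup {

    /**
     * Looks up a secret by name and returns it as a char[] copy.
     * Fails fast when the variable is missing or empty.
     */
    public static char[] requireSecret(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing required secret: " + name);
        }
        return value.toCharArray();
    }

    /** Overwrites the buffer so the working copy no longer holds the secret. */
    public static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] password = null;
        try {
            // TIKA_HTTP_PASSWORD is an assumed variable name, not a Tika setting.
            password = requireSecret(System.getenv(), "TIKA_HTTP_PASSWORD");
            System.out.println("Loaded a secret of length " + password.length);
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        } finally {
            if (password != null) {
                wipe(password);
            }
        }
    }
}
```

Taking the environment as a `Map` parameter keeps the lookup testable and makes it straightforward to substitute values fetched from a secret manager.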