Apache Tika Documentation
Apache Tika 4.0.0-SNAPSHOT

For Maintainers

Table of Contents
  • Topics
  • Development Resources

This section contains documentation for Apache Tika project maintainers and committers.

Topics

  • Publishing the Site - How to build and publish the documentation site

  • Release Guides - How to release Apache Tika

Development Resources

  • JIRA - Issue tracker

  • Maven Snapshots - SNAPSHOT builds

  • CI Builds - Continuous integration builds

  • Confluence Wiki - Legacy wiki (being migrated to these docs)

© Apache Software Foundation. All rights reserved.