Publishing the Documentation Site

This guide covers how to build and publish the Apache Tika documentation site.

Overview

The documentation is built using Antora, a static site generator for AsciiDoc. The site supports multiple versions through Git branches and includes client-side search powered by Lunr.

Prerequisites

  • Maven 3.9+

  • Git

  • Internet access on first build — the Antora plugin downloads Node.js into ~/.cache/tika-antora/ (~100 MB, one-time per machine; reused across clean builds and across worktrees)

Building the Site Locally

The docs module is included in the Maven reactor only when the apache-release profile is active, so the -Papache-release flag is required. Build the site from the repo root:

./mvnw package -Papache-release -pl :tika-docs -DskipTests

The generated site will be at docs/target/site/. The current git commit and date are stamped automatically onto the home page (a generated copy of the playbook lives at docs/antora-playbook-stamped.yml — gitignored).
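
A quick sanity check after the build can catch the common case where the profile flag was forgotten and no site was produced. The following is a minimal sketch, not part of the Tika build; check_site is a hypothetical helper name, and the only path it relies on is the docs/target/site/index.html location mentioned in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Tika build): report whether the
# generated site exists. Defaults to the output directory named in this guide.
check_site() {
    site_dir="${1:-docs/target/site}"
    if [ -f "$site_dir/index.html" ]; then
        echo "OK: $site_dir/index.html exists"
    else
        echo "MISSING: $site_dir/index.html (was the build run with -Papache-release?)" >&2
        return 1
    fi
}
```

Run from the repo root, check_site with no arguments inspects docs/target/site; a non-zero exit status indicates that the site was not generated.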

To skip the stamping or override the playbook:

# build directly with the unstamped playbook
cd docs && mvn antora:antora -Dplaybook=antora-playbook.yml

Previewing the Site

Option 1: Python HTTP server (recommended)

cd docs/target/site
python3 -m http.server 8000

Then open http://localhost:8000 in your browser.

Option 2: Node.js HTTP server

npx http-server docs/target/site -p 8000

Then open http://localhost:8000 in your browser.

Option 3: Open static HTML directly

# Linux
xdg-open docs/target/site/index.html

# macOS
open docs/target/site/index.html

# Windows
start docs/target/site/index.html

Opening static HTML directly may not fully test search and relative links.

Living Documentation

The documentation includes examples that are symlinked to actual test configuration files in the codebase. This ensures examples are always valid and tested. The symlinks are in docs/modules/ROOT/examples/ and point to files in tika-parsers/…​/config-examples/.

When you modify a config example in the codebase, the documentation automatically reflects the change on the next build.
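
Because the examples are symlinks, moving or renaming a config file in the codebase can leave a dangling link behind. One way to detect that before building is sketched below; find_broken_symlinks is a hypothetical helper name, and the default directory is the docs/modules/ROOT/examples/ path given above.

```shell
#!/bin/sh
# Hypothetical helper: print every symlink under a directory whose target
# no longer exists. Uses only POSIX find and test.
find_broken_symlinks() {
    dir="${1:-docs/modules/ROOT/examples}"
    # -type l selects symlinks; `test -e` follows the link and fails
    # when the target is missing, so the negation keeps only dangling links.
    find "$dir" -type l ! -exec test -e {} \; -print
}
```

Run from the repo root, find_broken_symlinks prints nothing when every example link resolves.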

Version Management

Documentation versions are managed through Git branches with the docs/ prefix.

Branch Structure

  • HEAD (main branch) - Current development version (SNAPSHOT)

  • docs/4.0.0 - Released 4.0.0 documentation

  • docs/4.1.0 - Released 4.1.0 documentation

The playbook (antora-playbook.yml) is configured to build all docs/* branches automatically.
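
To see which released versions a build might pick up, the docs/* branches can be listed from Git. The sketch below assumes the remote is named origin (as in the push commands in this guide); doc_versions is a hypothetical helper, and whether Antora reads local or remote branches depends on the playbook, which is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: read branch names on stdin (one per line, in the
# format printed by `git branch -r`) and print the version part of every
# origin/docs/* branch. Other branches are ignored.
doc_versions() {
    sed -n 's|^ *origin/docs/\(.*\)$|\1|p'
}
```

A typical invocation would be `git branch -r | doc_versions`, which for the branch structure above would list 4.0.0 and 4.1.0.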

Publishing to the Site

Build the docs with Maven, then run publish-docs.sh to copy the output to a tika-site SVN checkout (with URL flattening so /docs/tika/X.Y.Z/…​ becomes /docs/X.Y.Z/…​):

./mvnw package -Papache-release -pl :tika-docs -DskipTests
cd docs
./publish-docs.sh /path/to/tika-site/publish

# Then in the SVN checkout:
cd /path/to/tika-site
svn add publish/docs publish/_ --force
svn commit -m "Publish 4.0.0-SNAPSHOT docs"

The Maven package step builds the Antora site (stamping the current git commit and date on the home page); publish-docs.sh copies the output to the site checkout with the correct directory layout:

  • publish/docs/4.0.0-SNAPSHOT/ — the documentation pages

  • publish/_/ — CSS, JS, fonts (shared across versions)

  • publish/docs/index.html — redirect to latest version
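
The URL flattening can be pictured as a simple path rewrite. The sketch below only illustrates the mapping described above (/docs/tika/X.Y.Z/ becoming /docs/X.Y.Z/); it is not the implementation of publish-docs.sh, and flatten_path is a hypothetical name.

```shell
#!/bin/sh
# Illustration only (not the code in publish-docs.sh): drop the "tika"
# component segment that directly follows a leading "docs" segment.
flatten_path() {
    printf '%s\n' "$1" | sed 's|^\(/*docs\)/tika/|\1/|'
}
```

For example, `flatten_path /docs/tika/4.0.0/index.html` yields /docs/4.0.0/index.html, while paths outside docs/tika/, such as the shared /_/ assets, pass through unchanged.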

Publishing a Release

When releasing a new version (e.g., 4.0.0):

# 1. Tag the release as usual
git tag v4.0.0

# 2. Create docs branch from tag
git checkout -b docs/4.0.0 v4.0.0

# 3. Update version in antora.yml
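# (-i.bak works with both GNU and BSD/macOS sed; the dots are escaped so they match literally)
sed -i.bak 's/4\.0\.0-SNAPSHOT/4.0.0/' docs/antora.yml && rm docs/antora.yml.bak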
git commit -am "Set docs version to 4.0.0"
git push origin docs/4.0.0

# 4. Build and publish
./mvnw package -Papache-release -pl :tika-docs -DskipTests
cd docs
./publish-docs.sh /path/to/tika-site/publish

# 5. Commit to SVN
cd /path/to/tika-site
svn add publish/docs publish/_ --force
svn commit -m "Publish 4.0.0 docs"
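
Before building a release, it can help to confirm that the version edit in step 3 actually took effect. The following is a minimal sketch under the assumption that docs/antora.yml should contain no SNAPSHOT string on a release branch; release_version and check_release_descriptor are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: derive the release version by stripping a
# trailing -SNAPSHOT suffix (a version without the suffix is unchanged).
release_version() {
    printf '%s\n' "${1%-SNAPSHOT}"
}

# Hypothetical helper: fail if the component descriptor still mentions
# SNAPSHOT. Defaults to the docs/antora.yml path used in this guide.
check_release_descriptor() {
    file="${1:-docs/antora.yml}"
    if grep -q 'SNAPSHOT' "$file"; then
        echo "ERROR: $file still contains a SNAPSHOT version" >&2
        return 1
    fi
    echo "OK: $file has no SNAPSHOT version"
}
```

Running check_release_descriptor from the repo root on the docs/4.0.0 branch, between steps 3 and 4, would stop a release from being published under a SNAPSHOT version.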

Updating Released Documentation

To fix or update documentation for a released version:

# 1. Check out the docs branch
git checkout docs/4.0.0

# 2. Make changes (docs or config examples)
# Edit files as needed...

# 3. Commit and push
git commit -am "Fix PDF parser example"
git push origin docs/4.0.0

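# 4. Rebuild and republish
./mvnw package -Papache-release -pl :tika-docs -DskipTests
cd docs
./publish-docs.sh /path/to/tika-site/publish

# 5. Commit to SVN (svn add picks up any newly generated files)
cd /path/to/tika-site
svn add publish/docs publish/_ --force
svn commit -m "Update 4.0.0 docs"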
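
A republish can create files that SVN does not yet track, and those would be left out of the commit. The sketch below lists such files from `svn status` output, where a leading "?" marks an unversioned path; unversioned_paths is a hypothetical helper name, and it assumes paths do not begin with whitespace.

```shell
#!/bin/sh
# Hypothetical helper: read `svn status` output on stdin and print the
# paths of unversioned items (status code "?"). The status code and the
# padding after it are stripped; all other status lines are ignored.
unversioned_paths() {
    sed -n 's/^?[[:space:]]*//p'
}
```

In the tika-site checkout, `svn status publish | unversioned_paths` printing nothing suggests that every generated file is already under version control.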

Site Structure

The Antora configuration files and content directories:

  • docs/antora.yml - Component descriptor (name, version, navigation)

  • docs/antora-playbook.yml - Site-wide configuration (sources, UI, extensions)

  • docs/modules/ROOT/nav.adoc - Navigation sidebar structure

  • docs/modules/ROOT/pages/ - Documentation pages

  • docs/modules/ROOT/examples/ - Symlinks to config examples

  • docs/supplemental-ui/ - Custom UI components (header, footer, search)

The site includes client-side search powered by the Lunr extension. The search index is generated at build time and requires no server-side infrastructure.

Logos

Official Apache Tika logos are archived at docs/assets/logos/asf-tika-logos.zip.