Releasing Apache Tika

This guide covers the process for releasing the main Apache Tika project.

Prerequisites

Before starting the release process, ensure you have:

  • Commit access to the Apache Tika repository

  • A valid GPG key published to a public keyserver

  • Maven credentials configured in ~/.m2/settings.xml

  • Access to Apache’s Nexus repository manager

  • SVN client (svn) — release candidates upload to dist.apache.org via SVN, not scp

  • Internet access on first build — the Antora docs build downloads Node.js into ~/.cache/tika-antora/ on first run (~100 MB, one-time per machine; reused across clean builds)
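A quick preflight check can catch a missing tool before the release is under way. The following is a minimal sketch, not part of the Tika build: the helper name missing_tools is hypothetical, and the set of tools to check (git, svn, gpg, mvn) is an assumption drawn from the list above.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of the Tika build.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: report anything the later steps are likely to need.
# missing_tools git svn gpg mvn
```

Empty output means every listed tool was found. The check says nothing about whether your GPG key is published or your credentials are valid.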

Pre-Release Checks

Before starting the release, run the vulnerability and dependency audits, then the full regression tests:

# Identify vulnerable dependencies
mvn ossindex:audit -Dossindex.fail=true

# Check for outdated plugins
mvn versions:display-plugin-updates

# Check for outdated dependencies
mvn versions:display-dependency-updates

# Run full regression tests
mvn -Prelease-profile clean verify

Release Process

Step 1: Clone the Repository

Clone the repository if you haven’t already:

git clone https://github.com/apache/tika.git
cd tika

Step 2: Update Documentation

Update CHANGES.txt with the release date:

Release X.Y.Z - MM/dd/yyyy

Add any changelog entries as needed.
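If you prefer not to type the date by hand, the header line can be generated. This is only a sketch: release_header is a hypothetical helper name, and it assumes the release date is today's date on the machine running it.

```shell
# Print a CHANGES.txt header line for the given version, with today's date
# in MM/dd/yyyy form. release_header is a hypothetical helper name.
release_header() {
  printf 'Release %s - %s\n' "$1" "$(date +%m/%d/%Y)"
}

# Example (replace X.Y.Z with the real version):
# release_header X.Y.Z
```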

Step 3: JIRA Management

  1. Create versions X.Y.Z, X.(Y+1), and X.(Y+2) in JIRA if they don’t exist

  2. Reassign any unresolved X.Y.Z issues to X.(Y+1) via bulk change

Step 4: Verify License Headers

Run the Apache RAT plugin to verify all files have proper license headers:

mvn apache-rat:check

Step 5: Commit Changes

Commit the CHANGES.txt updates:

git add CHANGES.txt
git commit -m "Prepare for X.Y.Z release"
git push

Step 6: Set Maven Memory

Configure Maven memory settings:

export MAVEN_OPTS="-Xms128m -Xmx256m"

Step 7: Prepare the Release

Execute the Maven release prepare goal:

mvn release:prepare

This will prompt you to confirm:

  • The release version (X.Y.Z)

  • The SCM tag name

  • The next development version

Always enter X.Y.Z-rcN as the SCM tag name — never the bare X.Y.Z.

Pushing a tag of the form X.Y.Z (digit.digit.digit, no hyphen) immediately auto-triggers the Docker release workflow, which publishes images to Docker Hub and moves apache/tika:latest. If you enter bare X.Y.Z here, those images go live before the vote has even started.

For the first vote, enter X.Y.Z-rc1; for a second RC, X.Y.Z-rc2; and so on. The bare X.Y.Z tag is created later, in Step 12, after the vote passes, and that push is what triggers the Docker release.
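Because a mistyped tag has immediate consequences, it can help to check the tag name before entering it at the prompt. The sketch below classifies a tag the way this guide describes it. The helper names classify_tag and require_rc_tag are hypothetical, and the regular expressions only approximate "digit.digit.digit, no hyphen"; the Docker workflow's actual trigger pattern may differ.

```shell
# Classify a tag name the way this guide describes it:
#   X.Y.Z      -> "ga"    (pushing it triggers the Docker release)
#   X.Y.Z-rcN  -> "rc"    (the form to use before the vote)
#   otherwise  -> "other"
# The patterns are an approximation; the workflow's real trigger may differ.
classify_tag() {
  if printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo ga
  elif printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-rc[0-9]+$'; then
    echo rc
  else
    echo other
  fi
}

# Return non-zero unless the tag is an RC tag.
require_rc_tag() {
  [ "$(classify_tag "$1")" = "rc" ] || {
    echo "refusing tag '$1': use X.Y.Z-rcN at this step" >&2
    return 1
  }
}

# Example:
# require_rc_tag X.Y.Z-rc1 && mvn release:prepare
```

The maven-release-plugin also accepts the tag name on the command line (-Dtag=X.Y.Z-rc1), which avoids typing it at the interactive prompt.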

Step 8: Perform the Release

Execute the Maven release perform goal:

mvn release:perform -Darguments="-DskipITs"

-DskipITs skips integration tests during the inner build. Tests already ran in the verify phase of release:prepare; re-running them during perform is belt-and-suspenders, and some pipes/elasticsearch chaos-monkey tests are timing-sensitive enough to flake on a tagged build.

If release:perform fails partway through, see Troubleshooting release:perform.

Ensure you have valid Maven credentials in ~/.m2/settings.xml:

<servers>
  <server>
    <id>apache.releases.https</id>
    <username>your-apache-id</username>
    <password>your-password</password>
  </server>
</servers>
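A missing server entry usually surfaces only when the upload fails, so it can be worth confirming the entry exists first. The following is a rough sketch: has_server_id is a hypothetical helper, and it performs a plain text match only, without validating the XML or the credentials.

```shell
# Succeed if the given settings file contains a <server> id matching $2.
# has_server_id is a hypothetical helper; it does a plain text match and
# does not validate the XML or check that the credentials work.
has_server_id() {
  grep -Fq "<id>$2</id>" "$1"
}

# Example:
# has_server_id ~/.m2/settings.xml apache.releases.https || echo "server entry missing"
```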

Step 9: Verify Staging Repository

  1. Access Apache’s Nexus at https://repository.apache.org

  2. Log in with your Apache credentials

  3. Navigate to "Staging Repositories"

  4. Find the org.apache.tika staging repository

  5. Verify it contains all expected artifacts

  6. Click "Close" with an appropriate message

Step 10: Upload Distribution Artifacts

The release-plugin’s antrun task assembles a dist directory at target/checkout/target/X.Y.Z/ containing the source zip, app jar, server tarballs, and parser-package jars (each with .asc and .sha512).

At the end of release:perform, an echo tells you to scp -r … people.apache.org:public_html/tika/. Ignore it: the message is stale. The current ASF release distribution channel is the SVN repo under dist.apache.org, not people.apache.org.

Check out the dist dev SVN repo and copy the prepared dist directory in:

svn co https://dist.apache.org/repos/dist/dev/tika tika-dist-dev
cp -r target/checkout/target/X.Y.Z tika-dist-dev/
cd tika-dist-dev
svn add X.Y.Z
svn commit -m "Stage Apache Tika X.Y.Z RC<n>"

Verify the directory contains all expected artifacts (each with .asc and .sha512):

  • tika-X.Y.Z-src.zip

  • tika-app-X.Y.Z.jar

  • tika-server-standard-X.Y.Z.jar (and tika-server-standard-X.Y.Z.zip)

  • tika-parser-scientific-package-X.Y.Z.jar

  • tika-parser-sqlite3-package-X.Y.Z.jar

  • tika-parser-nlp-package-X.Y.Z.jar

Also:

  • CHANGES.txt (already in the dist directory; rename it to CHANGES-X.Y.Z.txt if your local copy has not been renamed yet)

  • Ensure the KEYS file at the parent directory contains your GPG key
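The "each with .asc and .sha512" check above is easy to get wrong by eye. The sketch below lists files in a directory that lack either companion file. The helper name missing_sidecars is hypothetical, and it checks only that the files exist, not that the signature or checksum is valid. It reports every non-sidecar file, so a file that is intentionally unsigned would also be listed.

```shell
# List every file in a dist directory that lacks a .asc or .sha512 companion.
# missing_sidecars is a hypothetical helper; it checks existence only, not
# that the signature or checksum is valid.
missing_sidecars() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    case "$f" in
      *.asc|*.sha512) continue ;;   # skip the companion files themselves
    esac
    [ -f "$f.asc" ]    || printf 'missing signature: %s\n' "${f##*/}"
    [ -f "$f.sha512" ] || printf 'missing checksum: %s\n' "${f##*/}"
  done
}

# Example:
# missing_sidecars tika-dist-dev/X.Y.Z
# To validate a signature as well: gpg --verify FILE.asc FILE
```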

Step 11: Call the Vote

Send a vote request to the dev@tika.apache.org mailing list:

Subject: [VOTE] Release Apache Tika X.Y.Z

Hi all,

I have created a candidate build for Apache Tika X.Y.Z.

The release candidate artifacts can be found at:
https://dist.apache.org/repos/dist/dev/tika/

The staging repository is:
https://repository.apache.org/content/repositories/orgapachetika-XXXX

The Git tag is:
https://github.com/apache/tika/tree/X.Y.Z-rcN

Please vote:
[ ] +1 Release this package
[ ] +0 No opinion
[ ] -1 Do not release (please provide reason)

This vote will remain open for at least 72 hours.

Step 12: Release the Artifacts

After the vote passes (at least three binding +1 votes from PMC members, and more +1 than -1 votes):

  1. Release the Nexus staging repository (click "Release" button)

  2. Move artifacts from dev to release distribution:

    svn mv https://dist.apache.org/repos/dist/dev/tika/X.Y.Z \
           https://dist.apache.org/repos/dist/release/tika/X.Y.Z \
           -m "Release Apache Tika X.Y.Z"
  3. Create the GA git tag from the winning RC and push it. This auto-triggers the Docker release workflow (see Releasing Tika Docker Images):

    git tag X.Y.Z X.Y.Z-rcN     # point GA tag at the same commit as the winning RC
    git push origin X.Y.Z

    For a prerelease (X.Y.Z-alpha-N, X.Y.Z-beta-N, etc.) where you don’t want :latest to move and don’t want the workflow to auto-fire, skip this substep. Trigger the Docker release manually via workflow_dispatch instead; see the Docker guide.
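The GA tagging substep can be wrapped so that it fails cleanly when the RC tag does not exist. This is a sketch under stated assumptions: ga_tag_from_rc is a hypothetical helper, and it assumes RC tags follow the X.Y.Z-rcN form from Step 7. It resolves the RC tag with ^{commit}, so the GA tag references the commit directly even if the RC tag turns out to be an annotated tag.

```shell
# Create the GA tag on the commit that the winning RC tag points to.
# ga_tag_from_rc is a hypothetical helper. Arguments: version, RC number.
# "^{commit}" peels an annotated tag down to its commit, so the GA tag
# references the commit itself whether or not the RC tag is annotated.
ga_tag_from_rc() {
  rc_tag="$1-rc$2"
  commit=$(git rev-parse --verify --quiet "${rc_tag}^{commit}") || {
    echo "no such tag: $rc_tag" >&2
    return 1
  }
  git tag "$1" "$commit"
}

# Example:
# ga_tag_from_rc X.Y.Z N
# git push origin X.Y.Z    # this push is what triggers the Docker release
```

The helper deliberately does not push; keeping the push as a separate, explicit command leaves the Docker-triggering step under your control.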

Troubleshooting release:perform

The release:perform build can fail midway for reasons unrelated to the release itself. This section captures the recoveries learned during recent releases. Once these are fixed in the build (tracked in the to-fix-before-beta punch list), this section can be slimmed down.

tika-docs assembly fails: "archive cannot be empty"

[ERROR] Failed to create assembly: Error creating assembly archive docs:
        archive cannot be empty

Cause: the Antora plugin is not auto-bound to the package phase, so target/site/ is empty when maven-assembly-plugin runs.

Recovery (resume from tika-docs):

cd target/checkout
mvn deploy -Papache-release -rf :tika-docs -DskipITs

If the antora binding (the recommended fix in the to-fix-before-beta punch list) hasn’t yet landed, you may need to manually build the site first:

cd target/checkout/docs
mvn antora:antora
cd ..
mvn deploy -Papache-release -rf :tika-docs -DskipITs

Antrun error from a child module: "Could not find file …​ -src.zip"

Could not find file .../docs/target/X.Y.Z/tika-X.Y.Z-src.zip
to generate checksum for.

Cause: the root-pom antrun execution lacks <inherited>false</inherited>, so it fires from each child module on a resumed deploy with ${basedir} pointing at the wrong directory.

Recovery (run the antrun once at the root):

cd target/checkout
mvn deploy --non-recursive -Papache-release -Dmaven.deploy.skip=true

--non-recursive runs only the root pom; -Dmaven.deploy.skip=true prevents re-uploading the root pom artifact (already uploaded earlier). The antrun fires in the correct basedir and target/X.Y.Z/ gets populated.

Nexus staging repository: only one repo when I expected two

If release:perform fails partway and you re-run it, you may see only one open staging repository on repository.apache.org even though both invocations uploaded artifacts. This is normal: while the staging repo is open, redeploys overwrite earlier artifacts. Confirm by checking the Last Modified timestamp on a representative artifact (e.g. tika-core-X.Y.Z.jar) — it should match the most recent run.

When in doubt, drop the staging repo and run release:perform cleanly from scratch. It costs ~1 hour but yields a guaranteed single-build set of artifacts.

gRPC distribution zip is huge (~600+ MB)

The tika-grpc-X.Y.Z.zip artifact bundles every pipes plugin with its full transitive closure (microsoft-graph, gcs, az-blob, s3, kafka, etc.) plus multi-platform native libs (rocksdbjni, netty natives). Several hundred MB of that is duplication of dependencies already in the root lib/ directory. This is a known issue tracked for cleanup before beta — see the to-fix-before-4.0.0-beta punch list. The release can ship as-is; the zip is correct, just bloated.

Post-Release

Update Unreleased Modules

Update any modules that weren’t part of the release to the next SNAPSHOT version.

Update Website

Refresh the website documentation to reflect the new release:

  • Update download links

  • Update version numbers in documentation

  • Add release notes

Release Docker and Helm Images

For a GA release, the Docker images publish automatically when the X.Y.Z tag is pushed in Step 12 above — no manual step needed. Watch the "Docker release - tika-server and tika-grpc" workflow run in the Actions tab to confirm. See Releasing Tika Docker Images for the tag scheme, verification steps, and how to publish a manual rebuild (CVE in base image, etc.).

For a prerelease (X.Y.Z-alpha-N, X.Y.Z-beta-N, RC variants), the Docker workflow does not auto-fire — trigger it manually via workflow_dispatch per the Docker guide.

Helm charts are released separately via Releasing Tika Helm Charts.

Send Announcements

Send release announcements to:

Subject: [ANNOUNCE] Apache Tika X.Y.Z Released

The Apache Tika team is pleased to announce the release of Apache Tika X.Y.Z.

Apache Tika is a toolkit for detecting and extracting metadata and text
from various types of files.

This release includes:
[List major changes/features]

For a complete list of changes, see:
https://tika.apache.org/X.Y.Z/changes.html

Download:
https://tika.apache.org/download.html

Thanks to everyone who contributed to this release!

The Apache Tika Team

Register the Release

Register the release at https://reporter.apache.org