Release Artifacts: What Goes Where

A 4.x Tika release publishes to three channels, each with a different audience:

| Channel | URL | Audience |
|---|---|---|
| Maven Central | https://repo1.maven.org/maven2/org/apache/tika/ | Java consumers adding Tika to a Maven / Gradle build. Get lean basic jars plus pom + sources-jar + javadoc-jar. Maven resolves transitive deps for them. |
| Apache dist | https://downloads.apache.org/tika/<version>/ | Humans downloading runnable archives or drop-in plugin zips. No Maven involved. Want fat / self-contained artifacts. |
| Docker Hub | apache/tika, apache/tika-grpc | Container deployers. Get a ready-to-run image with parsers + plugins bundled (the "kitchen sink" by default). |

The driving principle: fat distribution artifacts (zips, shaded jars, runnable bundles) do not go to Maven Central; basic Maven artifacts (slim jars + pom) do not need to clutter Apache dist. Channel-specific shapes keep each ecosystem clean.

Per-artifact matrix

| Artifact | Maven Central | Apache dist | Docker apache/tika | Docker apache/tika-grpc |
|---|---|---|---|---|
| tika-core, tika-parsers-* jars (each module) | ✓ slim jar (Maven-native) | | inside the image | inside the image |
| tika-app-<v>.zip (CLI + GUI) | ✓ slim jar | ✓ assembly zip (slim jar + lib/ deps) | | |
| tika-server-standard-<v>.jar (slim runtime jar) | ✓ slim jar | | | |
| tika-server-standard-<v>.zip (full distribution) | | ✓ | extracted into image | |
| tika-eval-app-<v>.zip (eval CLI) | ✓ slim jar | ✓ assembly zip (slim jar + lib/ deps) | | |
| tika-parser-scientific-package, tika-parser-sqlite3-package, tika-parser-nlp-package | ✓ slim jar (~10 KB, metadata) | -shaded.jar (~20–25 MB, full deps) | inside the image | inside the image |
| tika-pipes-<plugin> (each: solr, http, s3, kafka, …) | ✓ slim jar | ✓ pf4j zip distribution | inside the image | inside the image |
| tika-grpc-<v>.jar (slim) | ✓ | | | inside the image |
| tika-grpc-<v>.zip (Docker build artifact) | ✗ not attached (TIKA-4723: <attach>false</attach>) | | | build context for the image |
| src.zip, KEYS, CHANGES-<v>.txt | | ✓ | | |

Why each shape

Slim vs shaded jars (parser packages)

tika-parser-scientific-package (and sqlite3, nlp) are "drop-in classpath" artifacts. Sysadmins running tika-server who want a parser added to its classpath grab one fat jar and cp it into /tika-extras/. That’s the use case the shaded jar serves on Apache dist.

A Maven consumer wanting the same parsers does not want a 25 MB jar shaded over their classpath — Maven’s transitive dep resolution gives them the same classes via the module jar + its deps. So Central gets the slim (~10 KB) metadata jar; pom transitive deps do the work.

Mechanism: maven-shade-plugin configured with <outputFile>${project.build.directory}/${project.artifactId}-${project.version}-shaded.jar</outputFile> and <shadedArtifactAttached>false</shadedArtifactAttached>. Shade writes the fat jar to a separate file on disk but does not attach it to the Maven artifact set, so mvn deploy only uploads the slim main jar.
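As a rough sketch, the shade configuration described above might look like the fragment below. Only outputFile and shadedArtifactAttached come from this page; the execution layout and the phase binding are assumptions, and the real parser-package poms may carry further settings (filters, transformers) that are not shown here.

```xml
<!-- Sketch only: execution layout is assumed, not copied from the Tika poms. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Write the fat jar to its own file instead of replacing the main jar. -->
        <outputFile>${project.build.directory}/${project.artifactId}-${project.version}-shaded.jar</outputFile>
        <!-- Keep the fat jar out of the attached artifact set, so deploy skips it. -->
        <shadedArtifactAttached>false</shadedArtifactAttached>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this shape, target/ ends up holding both the slim main jar and the -shaded.jar, and only the slim one is part of the project's artifact set.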

pf4j plugin zips (tika-pipes-*)

The .zip for each pipes plugin is the runtime drop-in form: unzip into <server>/plugins/<plugin-name>/ and pf4j discovers it at startup. That’s an Apache dist artifact, not a Maven artifact.

The plugin’s jar is on Maven Central for users building atop the plugin API or embedding it programmatically.

Mechanism: maven-assembly-plugin with <attach>false</attach> in each plugin pom (TIKA-4723). Because <attach>false</attach> also skips the local-repo install, each plugin pom additionally runs maven-install-plugin:install-file during the install phase, writing the zip into the local repo at canonical coordinates (<groupId>:<artifactId>:zip:<version>). Sibling modules (tika-pipes-fork-parser, tika-server-*, tika-grpc, integration tests) declare the zip as a test-scope Maven dep and rely on this mechanism to resolve it from the local repo without ever publishing it to Central.
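The two-step mechanism above could be sketched roughly as follows. The descriptor path, the execution ids, the zip file name, and the use of appendAssemblyId are hypothetical choices made for illustration; only <attach>false</attach>, the install-file goal, the install phase, and the zip coordinates are taken from the description above.

```xml
<!-- Sketch only: descriptor path, execution ids and file names are hypothetical. -->

<!-- 1. Build the plugin zip without attaching it to the project. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <id>plugin-zip</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <descriptors>
          <descriptor>src/main/assembly/assembly.xml</descriptor>
        </descriptors>
        <!-- Assumed: name the zip artifactId-version.zip, with no assembly id suffix. -->
        <appendAssemblyId>false</appendAssemblyId>
        <!-- Not attached, so mvn deploy never uploads the zip. -->
        <attach>false</attach>
      </configuration>
    </execution>
  </executions>
</plugin>

<!-- 2. Put the zip into the local repo by hand, since it is not attached. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-install-plugin</artifactId>
  <executions>
    <execution>
      <id>install-plugin-zip</id>
      <phase>install</phase>
      <goals>
        <goal>install-file</goal>
      </goals>
      <configuration>
        <file>${project.build.directory}/${project.artifactId}-${project.version}.zip</file>
        <groupId>${project.groupId}</groupId>
        <artifactId>${project.artifactId}</artifactId>
        <version>${project.version}</version>
        <packaging>zip</packaging>
      </configuration>
    </execution>
  </executions>
</plugin>

<!-- 3. In a sibling module: resolve the zip from the local repo at test scope.
     tika-pipes-file-system is used here only as an example plugin. -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-pipes-file-system</artifactId>
  <version>${project.version}</version>
  <type>zip</type>
  <scope>test</scope>
</dependency>
```

The point of the split is that the deploy step only sees attached artifacts, while the install-file execution runs regardless of attachment, so local reactor builds keep working.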

tika-grpc

tika-grpc is a standalone gRPC server, parallel to tika-server — not built on top of tika-server-core. It depends directly on tika-core and the parser modules (via tika-parsers-standard-package).

Maven Central gets the slim tika-grpc.jar for users embedding the gRPC server in a Maven build. Apache dist publishes nothing for tika-grpc. Users either pull apache/tika-grpc from Docker Hub or add tika-grpc as a Maven dep.
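For the Maven route, a dependency declaration along these lines should be enough. The groupId follows from the Maven Central path given earlier; the tika.version property is a placeholder introduced for this example and has to be defined by the consuming build.

```xml
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-grpc</artifactId>
  <!-- Placeholder property: set it to the 4.x release you are targeting. -->
  <version>${tika.version}</version>
</dependency>
```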

The Docker image is built by the release workflow from a -Pdocker Maven invocation that produces a runnable layout (jar + deps + bundled plugins) locally; that build output is the build context for the apache/tika-grpc image and isn’t published as a release artifact in its own right.

tika-grpc expects pf4j plugins for full functionality; starting without plugins logs a warning with a download URL pointing at Apache dist. Most fetcher-dependent RPC calls will fail at runtime if no plugins are present.

Server: slim jar on Central, full zip on dist

tika-server-standard-<v>.jar is the slim runtime jar. Its manifest declares Class-Path: lib/, and it expects to be run from a directory that also contains a populated lib/ (and plugins/). On its own, the slim jar can’t run. Maven Central publishes it for embedders, who resolve the contents of lib/ through Maven dependency resolution instead.
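One common way to produce a manifest whose Class-Path points into lib/ is the archiver configuration of maven-jar-plugin. The fragment below is a sketch of that general technique, not a copy of the tika-server-standard pom; whether Tika uses addClasspath with classpathPrefix or writes the manifest entry some other way is an assumption here.

```xml
<!-- Sketch only: illustrates the general technique, not the actual Tika pom. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <!-- Emit a Class-Path manifest entry listing the module's runtime deps. -->
        <addClasspath>true</addClasspath>
        <!-- Prefix each entry with lib/, so deps resolve relative to the jar's directory. -->
        <classpathPrefix>lib/</classpathPrefix>
      </manifest>
    </archive>
  </configuration>
</plugin>
```

A jar built this way only starts when the referenced files actually exist under lib/ next to it, which is why the slim jar needs the surrounding distribution layout.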

tika-server-standard-<v>.zip is the full assembled distribution: the slim jar + lib/ + the bundled tika-pipes-file-system plugin + a startup script. Apache dist publishes this for sysadmins who want unzip + java -jar.

The 4.0.0-alpha-1 release published both the slim jar and the full distribution on dist. Later 4.x releases drop the slim jar from dist (it is only on Central) and drop the -bin.tgz variant (.zip is universally readable). They also drop the legacy -bin classifier, so the full distribution is tika-server-standard-<v>.zip, consistent with tika-app, tika-eval-app, and the pipes plugins.

App / eval-app

Same pattern as the parser packages — Central gets the slim jar (Maven consumers); dist gets the assembled zip with deps under lib/.

Where this is configured in the source tree

The Apache dist staging include list: pom.xml, apache-release profile, the <copy> step inside the antrun task (look for tika-app near the top). One <include> line per artifact pattern. Globs such as tika-pipes-plugins/*/target/*-shaded.jar* cover whole sets of modules with a single line.
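To give a feel for the shape of that include list, here is a minimal Ant sketch. The todir value and both include patterns are illustrative placeholders, not the patterns from the actual pom.xml; check the apache-release profile for the real list.

```xml
<!-- Sketch only: the todir value and the include patterns are illustrative placeholders. -->
<copy todir="${project.build.directory}/dist-staging" flatten="true">
  <fileset dir="${basedir}">
    <!-- One include per artifact pattern. A trailing * also picks up
         companion files next to the artifact, if any are present. -->
    <include name="**/target/tika-app-*.zip*"/>
    <include name="**/target/tika-server-standard-*.zip*"/>
  </fileset>
</copy>
```

flatten="true" drops the module directory structure, so everything lands side by side in the staging directory.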

Per-module shaping: each module’s pom decides what shape its target/ produces (assembly with <attach>false</attach> for plugin zips and app zips; shade with outputFile + <shadedArtifactAttached>false</shadedArtifactAttached> for parser packages).

Maven Central deployment: happens via mvn deploy (or mvn release:perform). Any artifact that’s attached to the Maven project gets uploaded. The whole point of the <attach>false</attach> / <shadedArtifactAttached>false</shadedArtifactAttached> pattern is to keep the fat distribution shapes off Central without disrupting the build process.

Docker image contents: .github/workflows/docker-release.yml (the release publish workflow). The release-tika-grpc job currently assembles a custom build context from per-module outputs (dependency:copy-dependencies, per-plugin cp, parser-package cp). The release-tika-server job builds from tika-server-standard-<v>.zip (unpacked into /opt/tika-server/).