Release Artifacts: What Goes Where
A 4.x Tika release publishes to three channels, each with a different audience:
| Channel | URL | Audience |
|---|---|---|
| Maven Central | | Java consumers adding Tika to a Maven / Gradle build. Get lean basic jars plus pom + sources-jar + javadoc-jar. Maven resolves transitive deps for them. |
| Apache dist | | Humans downloading runnable archives or drop-in plugin zips. No Maven involved. Want fat / self-contained artifacts. |
| Docker Hub | | Container deployers. Get a ready-to-run image with parsers + plugins bundled (the "kitchen sink" by default). |
The driving principle: fat distribution artifacts (zips, shaded jars, runnable bundles) do not go to Maven Central; basic Maven artifacts (slim jars + pom) do not need to clutter Apache dist. Channel-specific shapes keep each ecosystem clean.
Per-artifact matrix
| Artifact | Maven Central | Apache dist | Docker apache/tika | Docker apache/tika-grpc |
|---|---|---|---|---|
| | ✓ slim jar (Maven-native) | — | inside the image | inside the image |
| | ✓ slim jar | ✓ assembly zip (slim jar + | — | — |
| | ✓ slim jar | — | — | — |
| | — | ✓ | extracted into image | — |
| | ✓ slim jar | ✓ assembly zip (slim jar + | — | — |
| | ✓ slim jar (~10 KB, metadata) | ✓ | inside the image | inside the image |
| | ✓ slim jar | ✓ pf4j zip distribution | inside the image | inside the image |
| | ✓ | — | — | inside the image |
| | ✗ not attached (TIKA-4723: | — | — | build context for the image |
| | — | ✓ | — | — |
Why each shape
Slim vs shaded jars (parser packages)
tika-parser-scientific-package (and sqlite3, nlp) are
"drop-in classpath" artifacts. Sysadmins running tika-server who want a
parser added to its classpath grab one fat jar and cp it into
/tika-extras/. That’s the use case the shaded jar serves on Apache dist.
A Maven consumer who wants the same parsers does not want a 25 MB shaded
jar on their classpath: Maven’s transitive dep resolution gives them
the same classes via the module jar + its deps. So Central gets the slim
(~10 KB) metadata jar, and the pom’s transitive deps do the work.
Mechanism: maven-shade-plugin configured with
<outputFile>${project.build.directory}/${project.artifactId}-${project.version}-shaded.jar</outputFile>
and <shadedArtifactAttached>false</shadedArtifactAttached>. Shade writes
the fat jar to a separate file on disk but does not attach it to the
Maven artifact set, so mvn deploy only uploads the slim main jar.
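As a rough illustration of the mechanism just described, a parser-package pom might carry a shade execution along the lines of the sketch below. Only the two configuration elements named above come from this document; the phase binding and overall layout are assumptions and may differ from the actual Tika poms.

```xml
<!-- Sketch only: would sit inside <build><plugins> of a parser-package pom. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- Assumed binding; the real pom may bind differently. -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Write the fat jar to its own file next to the slim main jar. -->
        <outputFile>${project.build.directory}/${project.artifactId}-${project.version}-shaded.jar</outputFile>
        <!-- Keep the fat jar out of the attached artifact set so deploy skips it. -->
        <shadedArtifactAttached>false</shadedArtifactAttached>
      </configuration>
    </execution>
  </executions>
</plugin>
```

As far as the shade plugin documentation describes it, setting outputFile already means the shaded jar neither replaces the main artifact nor gets attached; spelling out shadedArtifactAttached as false mainly makes the intent explicit to anyone reading the pom.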
pf4j plugin zips (tika-pipes-*)
The .zip for each pipes plugin is the runtime drop-in form: unzip into
<server>/plugins/<plugin-name>/ and pf4j discovers it at startup. That’s
an Apache dist artifact, not a Maven artifact.
The plugin’s jar is on Maven Central for users building atop the plugin API or embedding it programmatically.
Mechanism: maven-assembly-plugin with <attach>false</attach> in each
plugin pom (TIKA-4723). Because <attach>false</attach> also skips the
local-repo install, each plugin pom additionally runs
maven-install-plugin:install-file during the install phase, writing
the zip into the local repo at canonical coordinates
(<groupId>:<artifactId>:zip:<version>). Sibling modules
(tika-pipes-fork-parser, tika-server-*, tika-grpc, integration
tests) declare the zip as a test-scope Maven dep and rely on this
mechanism to resolve it from the local repo without ever publishing it
to Central.
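The two-step arrangement above (build the zip unattached, then install it by hand) could look roughly like the following sketch. The execution ids, the descriptor path, and the zip file name are hypothetical placeholders, not taken from the Tika source tree.

```xml
<!-- Sketch only: would sit inside <build><plugins> of a pipes plugin pom. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <id>plugin-zip</id> <!-- hypothetical id -->
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <descriptors>
          <!-- hypothetical descriptor location -->
          <descriptor>src/main/assembly/plugin.xml</descriptor>
        </descriptors>
        <appendAssemblyId>false</appendAssemblyId>
        <!-- Build the zip but do not attach it, so deploy never uploads it. -->
        <attach>false</attach>
      </configuration>
    </execution>
  </executions>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-install-plugin</artifactId>
  <executions>
    <execution>
      <id>install-plugin-zip</id> <!-- hypothetical id -->
      <phase>install</phase>
      <goals>
        <goal>install-file</goal>
      </goals>
      <configuration>
        <!-- Assumed file name; must match whatever the assembly step writes. -->
        <file>${project.build.directory}/${project.artifactId}-${project.version}.zip</file>
        <groupId>${project.groupId}</groupId>
        <artifactId>${project.artifactId}</artifactId>
        <version>${project.version}</version>
        <packaging>zip</packaging>
      </configuration>
    </execution>
  </executions>
</plugin>
```

A sibling module would then resolve the zip from the local repo with an ordinary dependency declaration. The artifactId below is only an example, using the file-system plugin mentioned elsewhere on this page:

```xml
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-pipes-file-system</artifactId>
  <version>${project.version}</version>
  <type>zip</type>
  <scope>test</scope>
</dependency>
```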
tika-grpc
tika-grpc is a standalone gRPC server, parallel to tika-server — not built
on top of tika-server-core. It depends directly on tika-core and the
parser modules (via tika-parsers-standard-package).
Maven Central gets the slim tika-grpc.jar for users embedding the gRPC
server in a Maven build. Apache dist publishes nothing for tika-grpc.
Users either pull apache/tika-grpc from Docker Hub or add tika-grpc
as a Maven dep.
The Docker image is built by the release workflow from a -Pdocker
Maven invocation that produces a runnable layout (jar + deps + bundled
plugins) locally; that build output is the build context for the
apache/tika-grpc image and isn’t published as a release artifact in its
own right.
tika-grpc expects pf4j plugins for full functionality; starting without plugins logs a warning with a download URL pointing at Apache dist. Most fetcher-dependent RPC calls will fail at runtime if no plugins are present.
Server: slim jar on Central, bin.zip on dist
tika-server-standard-<v>.jar is the slim runtime jar — its manifest
declares Class-Path: lib/ and it expects to be run from a directory
that also contains a populated lib/ (and plugins/). On its own, the
slim jar can’t run. Maven Central publishes it for embedders, who get
the contents of lib/ through Maven dep resolution instead.
tika-server-standard-<v>.zip is the full assembled distribution:
the slim jar + lib/ + the bundled tika-pipes-file-system plugin + a
startup script. Apache dist publishes this for sysadmins who want
unzip + java -jar.
The 4.0.0-alpha-1 release published both on dist; later 4.x releases
drop the slim jar from dist (it is only on Central) and drop the -bin.tgz
variant (.zip is universally readable). They also drop the legacy -bin
classifier, so the full distribution is tika-server-standard-<v>.zip,
consistent with tika-app, tika-eval-app, and the pipes plugins.
Where this is configured in the source tree
The Apache dist staging include list: pom.xml, apache-release
profile, the <copy> step inside the antrun task (look for tika-app
near the top). One <include> line per artifact pattern. The
tika-pipes-plugins//target/-shaded.jar* and similar globs cover the
sets.
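For orientation, an Ant copy step with one include per artifact pattern generally has the shape sketched below. The staging directory and every pattern shown are hypothetical stand-ins; the real list lives in the apache-release profile and should be read from there.

```xml
<!-- Sketch only: would sit inside the antrun <target> of the release profile. -->
<!-- The todir value is a hypothetical staging directory. -->
<copy todir="${basedir}/target/dist-staging" flatten="true">
  <fileset dir="${basedir}">
    <!-- One include per artifact pattern; all patterns here are illustrative. -->
    <include name="tika-app/target/*.zip"/>
    <include name="tika-server/**/target/*.zip"/>
    <include name="**/target/*-shaded.jar"/>
  </fileset>
</copy>
```

Because the fileset only picks up files that match an include, an artifact that is built but has no matching pattern never reaches dist, which is why adding a new distributable usually means adding a line here.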
Per-module shaping: each module’s pom decides what shape its target/
produces (assembly with <attach>false</attach> for plugin zips and
app zips; shade with outputFile + <shadedArtifactAttached>false</shadedArtifactAttached>
for parser packages).
Maven Central deployment: happens via mvn deploy (or
mvn release:perform). Any artifact that’s attached to the Maven project
gets uploaded. The whole point of the <attach>false</attach> /
<shadedArtifactAttached>false</shadedArtifactAttached> pattern is to
keep the fat distribution shapes off Central without disrupting the
build process.
Docker image contents: .github/workflows/docker-release.yml (the
release publish workflow). The release-tika-grpc job currently
assembles a custom build context from per-module outputs
(dependency:copy-dependencies, per-plugin cp, parser-package cp).
The release-tika-server job builds from tika-server-standard-<v>.zip
(unpacked into /opt/tika-server/).