Pipes Troubleshooting
This page covers diagnosing problems with the forked PipesServer processes
that Tika Pipes uses for per-document isolation. The most common symptom is a
forked process that dies during startup, or one that becomes unresponsive
mid-run.
When a forked server fails to start
The Tika parent process always logs the exit code of a failed fork. You will see something like:
ERROR clientId=2: Process exited with code 1 before connecting to socket
ERROR Shared server process exited with code 1 before becoming ready
For native JVM crashes (e.g. a segfault in a JNI parser), the JVM writes an
hs_err_pid<N>.log file. We direct that via -XX:ErrorFile= into the
manager’s per-server temp directory, then read it into the parent’s SLF4J
logger before cleanup:
ERROR clientId=2: JVM crash log hs_err_pid12345.log:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f...
...
So for native crashes, read the parent application’s log first — the hs_err contents are inlined there.
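The crash-log handoff described above can be sketched roughly as follows. This is an illustration only, not Tika's actual code: the class name HsErrCollector and both method names are hypothetical, and the sketch returns the log text rather than writing it to SLF4J so that it stays dependency-free. The -XX:ErrorFile option and its %p (PID) placeholder are standard HotSpot features.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative and not part of Tika's API.
final class HsErrCollector {

    /** Builds the JVM option that sends crash logs into tmpDir; HotSpot expands %p to the PID. */
    static String errorFileArg(Path tmpDir) {
        return "-XX:ErrorFile=" + tmpDir.resolve("hs_err_pid%p.log");
    }

    /** Returns one "JVM crash log <name>:\n<contents>" entry per hs_err_pid*.log file in tmpDir. */
    static List<String> readCrashLogs(Path tmpDir) throws IOException {
        List<String> entries = new ArrayList<>();
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(tmpDir, "hs_err_pid*.log")) {
            for (Path log : logs) {
                // Decode leniently: malformed bytes are replaced instead of raising an exception.
                String body = new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
                entries.add("JVM crash log " + log.getFileName() + ":\n" + body);
            }
        }
        return entries;
    }
}
```

A caller would pass errorFileArg(dir) on the child's command line and, after the child exits, log each entry from readCrashLogs(dir) at ERROR level before deleting the directory.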
Child JVM stdout/stderr
By default the child PipesServer JVM inherits its stdout and stderr from
the parent. This is the 12-factor / container-friendly default: when Tika
runs in Docker or Kubernetes, the pipes-server’s log records flow through
to the container’s stdio stream where the runtime (Docker, containerd) and
any log aggregator (fluentd, fluent-bit, Promtail, the K8s log API, etc.)
pick them up automatically. The default pipes-fork-server-default-log4j2.xml
writes to SYSTEM_ERR, so inheritance is what makes those records visible
to your observability stack.
If you don’t want the pipes-server’s output interleaved with your own (for
example, an embedded use case where the parent produces its own structured
stdout, or a test environment where you want a quieter console), set the
system property tika.pipes.server.stdio=discard on the parent JVM:
java -Dtika.pipes.server.stdio=discard -jar your-app.jar ...
With this set, the child’s stdout and stderr are routed to the null sink,
so anything the pipes server writes to its console, including records from
the default SYSTEM_ERR appender, is silently dropped at the OS level.
Records logged via SLF4J inside the child can still be captured by
configuring log4j2.xml / logback.xml to write to a file or network
appender of your own, independent of the stdio setting.
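As a rough illustration of what the switch amounts to, the sketch below maps the property value onto the two java.lang.ProcessBuilder redirects involved. The class and method names (ChildStdio, redirectFor, configure) are hypothetical, and treating every value other than the exact string discard as "inherit" is an assumption, not a statement about how Tika parses the property.

```java
import java.util.List;

// Hypothetical helper; names are illustrative and not part of Tika's API.
final class ChildStdio {

    /** Assumption: only the exact value "discard" silences the child; anything else inherits. */
    static ProcessBuilder.Redirect redirectFor(String stdioProperty) {
        return "discard".equals(stdioProperty)
                ? ProcessBuilder.Redirect.DISCARD   // null sink (Java 9+)
                : ProcessBuilder.Redirect.INHERIT;  // share the parent's stdout/stderr
    }

    /** Applies the same redirect to both stdout and stderr of the process to be spawned. */
    static ProcessBuilder configure(List<String> command) {
        ProcessBuilder.Redirect redirect =
                redirectFor(System.getProperty("tika.pipes.server.stdio"));
        return new ProcessBuilder(command)
                .redirectOutput(redirect)
                .redirectError(redirect);
    }
}
```

With Redirect.DISCARD the operating system throws the bytes away, which is why a console appender in the child has no visible effect while file or network appenders keep working.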
Safety of the inherit default on Windows
Earlier versions of Tika hit a surefire hang on Windows when inheriting
child stdio: a forked child held a duplicate of the parent JVM’s stderr
handle, and any reader upstream of the parent (a maven-surefire controller,
typically) never saw EOF after the parent died — the child kept the pipe
open. That class of hang is now mitigated structurally: every child
PipesServer watches its parent’s process handle via
ProcessHandle.onExit() (see Parent-death detection) and self-terminates
within milliseconds of parent exit. The inherited handle is
released essentially synchronously with the parent’s death, and upstream
readers see EOF promptly.
Parent-death detection
The child PipesServer JVMs watch their parent’s PID via
ProcessHandle.onExit() and self-terminate within milliseconds if the
parent dies. The parent passes its own PID via the
TIKA_PIPES_PARENT_PID environment variable when spawning the child.
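The parent-side half of that handoff might look like the following minimal sketch. Only the environment variable name comes from this page; the class ParentPidEnv and the method withParentPid are hypothetical.

```java
// Hypothetical helper; only the environment variable name is taken from this page.
final class ParentPidEnv {

    static final String PARENT_PID_ENV = "TIKA_PIPES_PARENT_PID";

    /** Records the current JVM's PID in the environment of the process about to be spawned. */
    static ProcessBuilder withParentPid(ProcessBuilder builder) {
        long ownPid = ProcessHandle.current().pid();
        builder.environment().put(PARENT_PID_ENV, Long.toString(ownPid));
        return builder;
    }
}
```

The variable is set on the ProcessBuilder's copy of the environment, so it is visible only to the spawned child, not to the parent itself.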
This matters because the parent (e.g. tika-server) can be killed in ways
that skip its JVM shutdown hooks — for instance,
Process.destroy() on Windows is equivalent to TerminateProcess, which
bypasses all hooks. Without parent-death detection, an orphaned PipesServer
would only notice via TCP RST on its next socket read, and would not
notice at all while busy in a parse, leaving it (and any external
subprocess it had spawned, such as a tesseract OCR worker) running
indefinitely.
When the watcher fires, the child exits via System.exit, which runs
AbstractExternalProcessParser’s shutdown hook and cleans up any
in-flight external subprocesses.
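A child-side watcher along these lines can be sketched with the standard ProcessHandle API. This is not Tika's implementation: the class ParentWatcher and its methods are hypothetical, the exit status 1 is an arbitrary choice, and running the callback immediately when the PID no longer exists is an assumption about the desired behaviour.

```java
import java.util.Optional;

// Hypothetical sketch; class, method names, and exit status are illustrative.
final class ParentWatcher {

    /**
     * Runs onParentExit once the process with the given PID terminates.
     * If no such process exists any more, the callback runs immediately.
     */
    static void watch(long parentPid, Runnable onParentExit) {
        Optional<ProcessHandle> parent = ProcessHandle.of(parentPid);
        if (parent.isEmpty()) {
            onParentExit.run();
            return;
        }
        // onExit() returns a CompletableFuture that completes when the process ends.
        parent.get().onExit().thenRun(onParentExit);
    }

    /** Child-side wiring: read the parent's PID from the environment and exit when it dies. */
    static void install() {
        String pid = System.getenv("TIKA_PIPES_PARENT_PID");
        if (pid != null) {
            // System.exit (unlike Runtime.halt) runs registered shutdown hooks.
            watch(Long.parseLong(pid), () -> System.exit(1));
        }
    }
}
```

Because the callback goes through System.exit rather than Runtime.halt, any shutdown hooks registered in the child still get a chance to clean up external subprocesses.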
Configuration knobs reference
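| System property / env var | Effect |
|---|---|
| tika.pipes.server.stdio | System property on the parent JVM. By default the child inherits the parent’s stdout and stderr; set to discard to route both to the null sink. |
| TIKA_PIPES_PARENT_PID | Environment variable. Set automatically by the parent manager when spawning a child PipesServer; the child watches this PID and self-terminates when the parent exits. |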