Pipes Configuration

The pipes section of the JSON config controls the pipeline process itself: how many forked JVMs to run, timeouts, memory management, and parse behavior.

{
  "pipes": {
    "numClients": 4,
    "socketTimeoutMs": 60000,
    "maxFilesProcessedPerProcess": 10000,
    "parseMode": "RMETA",
    "onParseException": "EMIT",
    "forkedJvmArgs": ["-Xmx512m"]
  }
}

Process Management

| Field | Default | Description |
| --- | --- | --- |
| numClients | 4 | Number of parallel forked JVM processes. Each process handles one document at a time. See Forked-JVM CPU Sizing for guidance on choosing this value relative to host CPU count. |
| forkedJvmArgs | [] | JVM arguments for forked processes (e.g., ["-Xmx512m", "-Xms256m"]). When numClients > 1, Tika auto-injects -XX:ActiveProcessorCount to right-size each fork’s GC and JIT thread pools unless you provide your own; see Forked-JVM CPU Sizing. |
| javaPath | java | Path to the Java executable for forked processes. |
| maxFilesProcessedPerProcess | 10000 | Restart each forked process after it has handled this many files. Limits the impact of slow-building memory leaks in parsing libraries. |
| tempDirectory | system default | Directory for temporary files. Consider a RAM-backed filesystem (e.g., /dev/shm) for better performance. |
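
As an illustration of the process-management fields above, the following sketch sizes a pipeline for a host with spare cores and memory. The specific values (8 clients, a 1 GB heap, 2 processors per fork, a restart after 5000 files) are assumptions chosen for the example, not recommendations. -XX:ActiveProcessorCount is a standard HotSpot flag; supplying it here stands in for the value Tika would otherwise inject.

```json
{
  "pipes": {
    "numClients": 8,
    "forkedJvmArgs": ["-Xmx1g", "-XX:ActiveProcessorCount=2"],
    "javaPath": "java",
    "maxFilesProcessedPerProcess": 5000,
    "tempDirectory": "/dev/shm"
  }
}
```

With settings like these, the forked JVMs together may claim up to roughly 8 × 1 GB of heap, so numClients and -Xmx are best chosen together. Files written to a RAM-backed tempDirectory also count against host memory.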

Timeouts

See also Timeouts for the full timeout model.

| Field | Default | Description |
| --- | --- | --- |
| socketTimeoutMs | 60000 | Maximum time (ms) to wait for data from a forked process. If no heartbeat or result is received within this window, the parse is considered hung. |
| heartbeatIntervalMs | 1000 | Interval (ms) between heartbeats sent from the forked process. Must be significantly less than socketTimeoutMs. |
| startupTimeoutMillis | 240000 | Maximum time (ms) to wait for a forked process to start up. |
| shutdownClientAfterMillis | 300000 | Shut down an idle forked process after this many milliseconds of inactivity. |
| maxWaitForClientMillis | 60000 | Maximum time (ms) to wait for an available forked process when all are busy. |
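
A timeout block for a workload with slow-to-parse documents might look like the sketch below. The values are assumptions for illustration. Only the field names, the defaults left unchanged (startupTimeoutMillis, shutdownClientAfterMillis), and the requirement that heartbeatIntervalMs stay well below socketTimeoutMs come from the descriptions above.

```json
{
  "pipes": {
    "socketTimeoutMs": 120000,
    "heartbeatIntervalMs": 2000,
    "startupTimeoutMillis": 240000,
    "shutdownClientAfterMillis": 300000,
    "maxWaitForClientMillis": 120000
  }
}
```

Here the heartbeat interval is 1/60 of the socket timeout, so many consecutive heartbeats would have to be missed before a parse is treated as hung.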

Parse Behavior

| Field | Default | Description |
| --- | --- | --- |
| parseMode | RMETA | How embedded documents are handled: RMETA (recursive metadata list), CONCATENATE, CONTENT_ONLY, UNPACK. See Parse Modes. |
| onParseException | EMIT | What to do when a parse fails: EMIT (emit error metadata) or SKIP (silently skip). |
| stopOnlyOnFatal | false | When false, stop the pipeline on configuration errors (missing fetcher/emitter). When true, only stop on fatal initialization failures. Use true for server mode, false for batch mode. |
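
For a long-running server deployment, the parse-behavior fields might be combined as sketched below. This assumes they sit in the same pipes object as the other settings on this page. The choice of CONCATENATE is illustrative only, while stopOnlyOnFatal follows the server-mode guidance above.

```json
{
  "pipes": {
    "parseMode": "CONCATENATE",
    "onParseException": "EMIT",
    "stopOnlyOnFatal": true
  }
}
```

Keeping onParseException at EMIT means a failed parse still produces error metadata downstream rather than disappearing silently.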

Async / Emit Batching

These settings control how parsed results are batched before they are sent to emitters.

| Field | Default | Description |
| --- | --- | --- |
| numEmitters | 1 | Number of emitter threads. |
| queueSize | 10000 | Size of the fetch/emit tuple queue. |
| emitWithinMillis | 10000 | Flush the emit batch if nothing has been emitted within this many milliseconds, even if the batch is not full. |
| emitMaxEstimatedBytes | 100000 | Flush the emit batch when the estimated size reaches this many bytes. |
| emitIntermediateResults | false | Emit partial results as they become available (rather than waiting for the full parse to complete). |
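
To make the two flush triggers concrete, the sketch below raises the size threshold and shortens the time threshold. The values are assumptions for illustration, and the example assumes these fields belong in the same pipes object as the other settings on this page.

```json
{
  "pipes": {
    "numEmitters": 2,
    "queueSize": 10000,
    "emitWithinMillis": 5000,
    "emitMaxEstimatedBytes": 1000000,
    "emitIntermediateResults": false
  }
}
```

Read against the descriptions above, a batch under this configuration would be flushed once its estimated size reaches 1,000,000 bytes, or after 5 seconds pass without an emit.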

Shared Server Mode (Experimental)

| Field | Default | Description |
| --- | --- | --- |
| useSharedServer | false | When true, multiple clients share a single forked JVM instead of each having its own. Reduces memory overhead but sacrifices isolation: one crash affects all in-flight requests. Not recommended for production. |

See Shared Server Mode for details.
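
Enabling the experimental mode appears to be a single flag. The minimal sketch below assumes useSharedServer sits alongside numClients in the pipes object, and that numClients then counts the clients sharing the one forked JVM.

```json
{
  "pipes": {
    "numClients": 4,
    "useSharedServer": true
  }
}
```

Because all four clients would depend on the same JVM in this setup, a single crashing document could fail every request in flight, which is the isolation trade-off described above.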