Pipes Reporters

Reporters track the processing status of each document in the pipeline. They record whether a parse succeeded, failed, or timed out, along with timing information.

File System Reporter (file-system-reporter)

Writes a JSON status file that is updated periodically.

Module: tika-pipes-file-system

Field            Default    Description
statusFile       required   Path to the JSON status file.
reportUpdateMs   1000       How often to update the status file (milliseconds).
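
Because the status file is rewritten on a timer, a consumer has little reason to read it more often than reportUpdateMs. The following is a minimal Python sketch of such a consumer, not part of Tika. The function names are hypothetical, the schema of the status file is not assumed (the JSON is returned as parsed), and the retry on a failed parse assumes that a read might land while the file is being rewritten.

```python
import json
import time
from pathlib import Path


def read_status(path, retries=3, delay_s=0.1):
    """Return the parsed status file, or None if it cannot be read.

    Hypothetical helper: retries because the file may not exist yet,
    or a read may land while the reporter is rewriting it.
    """
    for attempt in range(retries):
        try:
            return json.loads(Path(path).read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            if attempt < retries - 1:
                time.sleep(delay_s)
    return None


def poll_status(path, report_update_ms=1000, max_polls=5):
    """Yield up to max_polls snapshots, one per reporting interval."""
    for i in range(max_polls):
        snapshot = read_status(path)
        if snapshot is not None:
            yield snapshot
        if i < max_polls - 1:
            # Match the reporter's interval; polling faster gains nothing.
            time.sleep(report_update_ms / 1000.0)
```

Passing the same value used for reportUpdateMs keeps the reader in step with the writer.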

JDBC Reporter (jdbc-reporter)

Writes per-document status to a SQL database table.

Module: tika-pipes-jdbc

Field              Default    Description
connectionString   required   JDBC connection string.
tableName          required   Table name for status records.
createTable        false      Auto-create the table if it does not exist.
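
The effect of createTable can be illustrated with a small Python sketch. It uses the standard-library sqlite3 module in place of JDBC, and the column names (doc_id, status, elapsed_ms) are hypothetical; the actual schema the reporter writes is not described here. The sketch only shows the behavior of the flag: with it off, the table must already exist.

```python
import sqlite3


def _check_name(table_name):
    # Table names cannot be bound as SQL parameters, so validate first.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")


def open_status_table(conn, table_name, create_table=False):
    """Mirror the createTable flag: create the table only when asked to."""
    _check_name(table_name)
    if create_table:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table_name} ("
            "doc_id TEXT PRIMARY KEY, "  # hypothetical column
            "status TEXT NOT NULL, "     # hypothetical column
            "elapsed_ms INTEGER)"        # hypothetical column
        )
        conn.commit()


def record_status(conn, table_name, doc_id, status, elapsed_ms):
    """Insert or overwrite the status row for one document."""
    _check_name(table_name)
    conn.execute(
        f"INSERT OR REPLACE INTO {table_name} (doc_id, status, elapsed_ms) "
        "VALUES (?, ?, ?)",
        (doc_id, status, elapsed_ms),
    )
    conn.commit()
```

With create_table left at False and no existing table, record_status raises sqlite3.OperationalError, which is the kind of failure to expect when createTable is false and the table has not been provisioned beforehand.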

Elasticsearch Reporter (es-pipes-reporter)

Writes per-document parse status back into the Elasticsearch index via upsert.

Module: tika-pipes-es

Field            Default    Description
esUrl            required   Elasticsearch endpoint (including index).
keyPrefix        tika_      Prefix for status fields (e.g., tika_parse_status).
includeRouting   false      Include routing in upsert requests.
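
To show how the three fields interact, here is a minimal Python sketch that builds an upsert request by hand. It assumes the Elasticsearch `_update` endpoint with `doc_as_upsert`; the request the reporter actually sends (for example, whether it goes through the bulk API) is not specified here. The function name, the sample status key, and the routing value are hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def build_upsert(es_url, doc_id, status, key_prefix="tika_",
                 include_routing=False, routing=None):
    """Return (url, body) for an _update call that upserts status fields.

    es_url is expected to include the index, as esUrl does.
    """
    url = f"{es_url.rstrip('/')}/_update/{quote(doc_id, safe='')}"
    if include_routing and routing is not None:
        # Routing travels as a query parameter on the update request.
        url += "?" + urlencode({"routing": routing})
    # Prefix every status key so it cannot collide with document fields.
    fields = {f"{key_prefix}{name}": value for name, value in status.items()}
    body = json.dumps({"doc": fields, "doc_as_upsert": True})
    return url, body
```

With the default prefix, a status key named parse_status is written as tika_parse_status, which keeps the reporter's fields apart from the fields of the indexed document.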

OpenSearch Reporter (opensearch-pipes-reporter)

Same as the Elasticsearch reporter, but for OpenSearch. The endpoint is set with openSearchUrl instead of esUrl.

Module: tika-pipes-opensearch