Pipes Reporters

A pipes reporter records per-document processing status — success, parse exception, timeout, OOM — as the pipeline runs. Reporters are observational; they do not gate parsing or emission.

The Reporter Contract

Each reporter implements PipesReporter#report(FetchEmitTuple t, PipesResult result, long elapsed) and gets called once per processed document. Reporters typically buffer status records in memory and flush them on a background thread, so per-document calls stay cheap.
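
The buffer-then-flush pattern described above can be sketched in plain Java. This is an illustration under stated assumptions, not Tika's implementation: BufferedReporterSketch, StatusRecord, and the simplified report(String, String, long) signature are hypothetical stand-ins for the real PipesReporter, FetchEmitTuple, and PipesResult types, and the in-memory sink stands in for a file, table, or index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a buffering reporter; none of these names are Tika classes.
final class BufferedReporterSketch implements AutoCloseable {

    /** Hypothetical status record: document id, status label, elapsed time. */
    public record StatusRecord(String docId, String status, long elapsed) {}

    // Thread-safe buffer written by pipeline threads and drained by the flusher.
    private final ConcurrentLinkedQueue<StatusRecord> buffer = new ConcurrentLinkedQueue<>();
    // Stand-in for the real destination (status file, SQL table, search index).
    private final List<StatusRecord> sink = new ArrayList<>();
    private final ScheduledExecutorService flusher;

    BufferedReporterSketch(long flushIntervalMs) {
        flusher = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "reporter-flusher");
            t.setDaemon(true); // do not keep the JVM alive just for reporting
            return t;
        });
        flusher.scheduleAtFixedRate(this::flush, flushIntervalMs, flushIntervalMs,
                TimeUnit.MILLISECONDS);
    }

    /** Called once per processed document; it only enqueues, so it stays cheap. */
    public void report(String docId, String status, long elapsed) {
        buffer.add(new StatusRecord(docId, status, elapsed));
    }

    /** Drains everything buffered so far into the sink; returns the count moved. */
    public synchronized int flush() {
        int moved = 0;
        StatusRecord next;
        while ((next = buffer.poll()) != null) {
            sink.add(next);
            moved++;
        }
        return moved;
    }

    /** Snapshot of the records that have been flushed so far. */
    public synchronized List<StatusRecord> flushed() {
        return List.copyOf(sink);
    }

    /** Stops the background flusher and performs one final flush. */
    @Override
    public void close() {
        flusher.shutdown();
        flush();
    }
}
```

Because report only appends to a concurrent queue, the cost of writing to the destination is paid on the flusher thread rather than on the thread that processed the document. The final flush in close keeps records from being lost at shutdown.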

Wiring Reporters Into a Pipeline

Reporters live under the plural top-level pipes-reporters key. The keys inside that block are reporter type-names; multiple reporters may run together.

{
  "pipes-reporters": {
    "file-system-reporter": {
      "statusFile": "/var/log/tika/status.json",
      "reportUpdateMs": 1000
    },
    "jdbc-reporter": {
      "connectionString": "jdbc:h2:mem:reports;DB_CLOSE_DELAY=-1"
    }
  }
}

Each entry’s outer key is the reporter’s component name. There is no separate ID layer, because no other component refers to a reporter.

Available Reporters

| Plugin | Component name | Notes |
| --- | --- | --- |
| File System | file-system-reporter | Writes a JSON status file periodically. Pair with an external watcher; see Live status for watching applications. |
| JDBC | jdbc-reporter | Writes per-doc status rows to a SQL table. |
| OpenSearch | opensearch-pipes-reporter | Writes per-doc status records to an OpenSearch index. |
| Elasticsearch | es-pipes-reporter | Writes per-doc status records to an Elasticsearch index. |
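
For the file-system reporter, an external watcher can be as simple as a poller that re-reads the status file whenever its modification time changes. The sketch below is a hypothetical illustration, not part of Tika: StatusFilePoller and pollOnce are made-up names, and the file contents are treated as opaque text because the status file's schema is not described here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical watcher sketch; the status file is read as opaque text.
final class StatusFilePoller {
    private final Path statusFile;
    private long lastSeenModified = -1L; // no version seen yet

    StatusFilePoller(Path statusFile) {
        this.statusFile = statusFile;
    }

    /**
     * Returns the file's contents if the file exists and its modification
     * time differs from the one seen on the previous call; otherwise empty.
     * A reader may still observe a partially written file, so callers that
     * parse the contents should tolerate a failed parse and retry later.
     */
    Optional<String> pollOnce() throws IOException {
        if (!Files.exists(statusFile)) {
            return Optional.empty();
        }
        long modified = Files.getLastModifiedTime(statusFile).toMillis();
        if (modified == lastSeenModified) {
            return Optional.empty();
        }
        lastSeenModified = modified;
        return Optional.of(Files.readString(statusFile, StandardCharsets.UTF_8));
    }
}
```

A watching application would call pollOnce on a timer, for example at roughly the reporter's reportUpdateMs interval, and act only when a value is present.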

For the full plugin / interface matrix, see Plugins.