Pipes Reporters
A pipes reporter records per-document processing status — success, parse exception, timeout, OOM — as the pipeline runs. Reporters are observational; they do not gate parsing or emission.
The Reporter Contract
Each reporter implements PipesReporter#report(FetchEmitTuple t, PipesResult result, long elapsed), which is called once per processed document. Reporters typically buffer status records in memory and flush them on a background thread, so the per-document call stays cheap.
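The buffer-and-flush pattern can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not Tika's implementation: `StatusRecord`, `BufferedStatusReporter`, and the `sink` callback are hypothetical names, and the sketch takes a document ID and a status string instead of the real `FetchEmitTuple` and `PipesResult` arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical status record. Sketch only: Tika's reporter receives
// a FetchEmitTuple and a PipesResult rather than plain strings.
record StatusRecord(String id, String status, long elapsedMs) {}

// Hypothetical reporter illustrating buffer-in-memory, flush-in-background.
class BufferedStatusReporter implements AutoCloseable {
    private final ConcurrentLinkedQueue<StatusRecord> buffer = new ConcurrentLinkedQueue<>();
    private final Consumer<List<StatusRecord>> sink;
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "status-flusher");
        t.setDaemon(true); // do not keep the JVM alive just for flushing
        return t;
    });

    BufferedStatusReporter(Consumer<List<StatusRecord>> sink, long flushEveryMs) {
        this.sink = sink;
        flusher.scheduleAtFixedRate(this::flush, flushEveryMs, flushEveryMs, TimeUnit.MILLISECONDS);
    }

    // Per-document call: only enqueues, so the pipeline thread does no I/O here.
    void report(String id, String status, long elapsedMs) {
        buffer.add(new StatusRecord(id, status, elapsedMs));
    }

    // Drains everything buffered so far and hands it to the sink as one batch.
    synchronized void flush() {
        List<StatusRecord> batch = new ArrayList<>();
        StatusRecord r;
        while ((r = buffer.poll()) != null) {
            batch.add(r);
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);
        }
    }

    @Override
    public void close() {
        flusher.shutdown();
        try {
            flusher.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush(); // final drain so records buffered since the last tick are not lost
    }
}
```

The per-document `report` call only enqueues a record; all sink I/O happens in `flush`, which runs on the scheduler thread and once more in `close`.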
Wiring Reporters Into a Pipeline
Reporters are configured under the top-level pipes-reporters key (note the plural). Each key inside that block is a reporter type name, and several reporters can be configured to run at the same time.
```json
{
  "pipes-reporters": {
    "file-system-reporter": {
      "statusFile": "/var/log/tika/status.json",
      "reportUpdateMs": 1000
    },
    "jdbc-reporter": {
      "connectionString": "jdbc:h2:mem:reports;DB_CLOSE_DELAY=-1"
    }
  }
}
```
Each entry’s outer key is the reporter’s component name. There is no separate ID layer, because no other component ever references a reporter.
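When several reporters are configured, each one sees every processed document. A hypothetical fan-out in plain Java might look like the sketch below; `StatusReporter` and `FanOutReporter` are illustrative names invented here, not Tika classes. Keying reporters by component name mirrors the configuration above, and catching each reporter's exceptions is a design choice of this sketch, motivated by reporters being observational rather than gating.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-document callback standing in for PipesReporter#report.
interface StatusReporter {
    void report(String id, String status, long elapsedMs);
}

// Hypothetical fan-out: every registered reporter receives every call.
// Reporters are keyed only by component name (e.g. "file-system-reporter"),
// since nothing else needs to reference them.
class FanOutReporter implements StatusReporter {
    private final Map<String, StatusReporter> reporters = new LinkedHashMap<>();

    FanOutReporter register(String componentName, StatusReporter reporter) {
        if (reporters.putIfAbsent(componentName, reporter) != null) {
            throw new IllegalArgumentException("duplicate reporter: " + componentName);
        }
        return this;
    }

    @Override
    public void report(String id, String status, long elapsedMs) {
        for (Map.Entry<String, StatusReporter> e : reporters.entrySet()) {
            try {
                e.getValue().report(id, status, elapsedMs);
            } catch (RuntimeException ex) {
                // Sketch's choice: a failing reporter is logged and skipped so it
                // cannot stop the other reporters or the document being processed.
                System.err.println("reporter " + e.getKey() + " failed: " + ex);
            }
        }
    }
}
```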
Available Reporters
| Plugin | Component name | Notes |
|---|---|---|
|  | file-system-reporter | Writes a JSON status file periodically. Pair with an external watcher — see Live status for watching applications. |
|  | jdbc-reporter | Writes per-doc status rows to a SQL table. |
|  |  | Writes per-doc status records to an OpenSearch index. |
|  |  | Writes per-doc status records to an Elasticsearch index. |
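To illustrate what writing a periodic JSON status file involves, the sketch below counts documents per status and rewrites the whole file in one step. `StatusFileWriter` and the `{"counts": {...}}` layout are assumptions made for illustration; the real file-system reporter's output format is not described on this page. A background thread would call `writeSnapshot` on an interval, presumably the one set by `reportUpdateMs`; that scheduling is omitted here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical status-file writer; the JSON layout is an assumption, not Tika's format.
class StatusFileWriter {
    private final Path statusFile;
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    StatusFileWriter(Path statusFile) {
        this.statusFile = statusFile;
    }

    // Per-document call: only bumps an in-memory counter.
    void report(String status) {
        counts.computeIfAbsent(status, k -> new LongAdder()).increment();
    }

    // Meant to run periodically on a background thread.
    // Writes a temp file next to the target, then renames it over the target,
    // so a watcher never reads a half-written file.
    synchronized void writeSnapshot() throws IOException {
        StringBuilder json = new StringBuilder("{\"counts\":{");
        boolean first = true;
        // TreeMap gives a stable key order; assumes status names need no JSON escaping.
        for (Map.Entry<String, LongAdder> e : new TreeMap<>(counts).entrySet()) {
            if (!first) {
                json.append(',');
            }
            json.append('"').append(e.getKey()).append("\":").append(e.getValue().sum());
            first = false;
        }
        json.append("}}");
        Path tmp = statusFile.resolveSibling(statusFile.getFileName() + ".tmp");
        Files.writeString(tmp, json, StandardCharsets.UTF_8);
        Files.move(tmp, statusFile, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The temp-file-then-rename step is what makes it safe for an external watcher to poll the status file while the pipeline keeps running.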
For the full plugin / interface matrix, see Plugins.