Emitters
An emitter writes parse results to a destination such as a file on disk, a row in a database, a document in a search index, or a message on a queue.
The Emitter Contract
Each emitter implements Emitter#emit(EmitData emitData), where EmitData carries the emit key, the parsed Metadata, and (for content-emitting strategies) the extracted content.
The emit key is supplied by the iterator on each FetchEmitTuple and tells the emitter where to put the result. Its shape depends on the emitter:
- file-system / S3 / GCS / Azure Blob: a key/path relative to basePath or prefix.
- OpenSearch / Elasticsearch / Solr: the _id field value, taken from the metadata field named by the emitter's idField.
- JDBC: the value bound to the first ? placeholder in the insert template.
- Kafka: the Kafka record key.
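As a rough illustration of the contract and the file-system key shape described above, the sketch below resolves an emit key against a base path and writes the metadata and content as text. SketchEmitData, SketchEmitter, and FileSystemLikeEmitter are simplified, hypothetical stand-ins rather than the library's actual classes. Appending the file extension to the key and rejecting keys that escape the base path are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for EmitData: emit key, metadata, optional content.
record SketchEmitData(String emitKey, Map<String, String> metadata, String content) {}

// Hypothetical stand-in for the emitter interface.
interface SketchEmitter {
    void emit(SketchEmitData emitData) throws IOException;
}

// Sketch of a file-system-style emitter: the emit key is a path relative to basePath.
final class FileSystemLikeEmitter implements SketchEmitter {
    private final Path basePath;
    private final String fileExtension;

    FileSystemLikeEmitter(Path basePath, String fileExtension) {
        this.basePath = basePath.toAbsolutePath().normalize();
        this.fileExtension = fileExtension;
    }

    // Resolve the emit key under basePath (assumption: the extension is appended
    // to the key) and reject keys that would land outside basePath.
    Path resolve(String emitKey) {
        Path target = basePath.resolve(emitKey + "." + fileExtension).normalize();
        if (!target.startsWith(basePath)) {
            throw new IllegalArgumentException("emit key escapes basePath: " + emitKey);
        }
        return target;
    }

    @Override
    public void emit(SketchEmitData emitData) throws IOException {
        Path target = resolve(emitData.emitKey());
        Files.createDirectories(target.getParent());
        StringBuilder out = new StringBuilder();
        // TreeMap gives a stable key order in the written file.
        new TreeMap<>(emitData.metadata())
                .forEach((k, v) -> out.append(k).append('=').append(v).append('\n'));
        if (emitData.content() != null) {
            out.append(emitData.content());
        }
        Files.writeString(target, out.toString(), StandardCharsets.UTF_8);
    }
}
```

With a base path of /data/output and an extension of json, an emit key such as reports/q1 would resolve to /data/output/reports/q1.json in this sketch.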
Emitters are intended to be safe under concurrent use; the pipeline’s worker pool may call emit() from many threads.
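To make the concurrency expectation concrete, here is a minimal sketch of an emitter whose emit() can be called from several worker threads at once. InMemoryEmitter and WorkerPoolDemo are hypothetical names used only for illustration. An in-memory map stands in for a real destination, and the worker counts are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical emitter that keeps all shared state in thread-safe structures.
final class InMemoryEmitter {
    private final Map<String, String> sink = new ConcurrentHashMap<>();
    private final AtomicLong emitted = new AtomicLong();

    // Safe to call from many threads: there are no unsynchronized mutable fields.
    void emit(String emitKey, String content) {
        sink.put(emitKey, content);
        emitted.incrementAndGet();
    }

    long emittedCount() { return emitted.get(); }

    int distinctKeys() { return sink.size(); }
}

final class WorkerPoolDemo {
    // Simulates a worker pool: each worker emits perWorker results with unique keys.
    static InMemoryEmitter run(int workers, int perWorker) throws Exception {
        InMemoryEmitter emitter = new InMemoryEmitter();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int worker = w;
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < perWorker; i++) {
                        emitter.emit("worker-" + worker + "/doc-" + i, "content " + i);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return emitter;
    }
}
```

An emitter that instead held a plain HashMap or a shared, non-thread-safe client in a field would need its own synchronization to meet the same expectation.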
Wiring Emitters Into a Pipeline
Emitters live under the top-level emitters key. Each emitter gets an ID (the outer map key) and a type-name (the inner map key); the iterator references the ID through its emitterId field.
{
  "emitters": {
    "output": {
      "file-system-emitter": {
        "basePath": "/data/output",
        "fileExtension": "json"
      }
    }
  },
  "pipes-iterator": {
    "file-system-pipes-iterator": {
      "basePath": "/data/input",
      "fetcherId": "...",
      "emitterId": "output"
    }
  }
}
A pipeline may declare multiple emitters and choose between them at iterator-config time. Within a single iterator, each emitted FetchEmitTuple carries exactly one emitter ID.
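The ID-based wiring can be pictured as a lookup table from emitter ID to emitter instance. The sketch below illustrates that idea under simplifying assumptions: RoutedTuple, RecordingEmitter, and EmitterRegistry are hypothetical names, and a real pipeline would build its emitters from the JSON configuration rather than register them by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down tuple: only the fields needed for routing.
record RoutedTuple(String emitterId, String emitKey) {}

// Hypothetical emitter that just records the keys it receives.
final class RecordingEmitter {
    final List<String> keys = new ArrayList<>();

    void emit(String emitKey) { keys.add(emitKey); }
}

// Maps emitter IDs (the outer keys under "emitters") to emitter instances.
final class EmitterRegistry {
    private final Map<String, RecordingEmitter> byId = new HashMap<>();

    void register(String emitterId, RecordingEmitter emitter) {
        if (byId.putIfAbsent(emitterId, emitter) != null) {
            throw new IllegalArgumentException("duplicate emitter ID: " + emitterId);
        }
    }

    // Each tuple names exactly one emitter ID; unknown IDs fail fast.
    void route(RoutedTuple tuple) {
        RecordingEmitter emitter = byId.get(tuple.emitterId());
        if (emitter == null) {
            throw new IllegalStateException("no emitter with ID: " + tuple.emitterId());
        }
        emitter.emit(tuple.emitKey());
    }
}
```

Failing fast on an unknown ID is a design choice for this sketch; it surfaces a mismatch between the iterator's emitterId and the declared emitters as early as possible.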
Available Emitters
| Plugin | Component name | Notes |
|---|---|---|
| | | Local / mounted filesystem. |
| | | S3 or S3-compatible. |
| | | GCS via ADC. |
| | | SAS-token auth. |
| | | REST-based bulk indexing. |
| | | REST-based bulk indexing; ApiKey or basic auth. |
| | | SolrCloud (URLs or ZooKeeper). |
| | | Any RDBMS with a JDBC driver. |
| | | Standard Kafka producer. |
For the full plugin / interface matrix, see Plugins.