Emitters

An emitter writes parse results to a destination — a file on disk, a row in a database, a document in a search index, a message on a queue, etc.

The Emitter Contract

Each emitter implements Emitter#emit(EmitData emitData), where EmitData carries the emit key, the parsed Metadata, and (for content-emitting strategies) the extracted content.
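The contract above can be pictured with a minimal, self-contained sketch. The types below (SketchEmitData, SketchEmitter, InMemoryEmitter) are hypothetical stand-ins invented for illustration, not the project's real classes. In particular, the real Metadata type is assumed to be richer than the plain string map used here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for EmitData: emit key, parsed metadata, optional content.
record SketchEmitData(String emitKey, Map<String, String> metadata, String content) {}

// Hypothetical stand-in for the emitter interface: a single emit() method.
interface SketchEmitter {
    void emit(SketchEmitData emitData);
}

// Toy emitter that records results in memory instead of writing to a real destination.
final class InMemoryEmitter implements SketchEmitter {
    // CopyOnWriteArrayList keeps this toy safe if emit() is called from several threads.
    private final List<SketchEmitData> written = new CopyOnWriteArrayList<>();

    @Override
    public void emit(SketchEmitData emitData) {
        // content may be null when the chosen strategy does not emit content
        written.add(emitData);
    }

    // Returns an immutable snapshot of everything emitted so far.
    List<SketchEmitData> emitted() {
        return List.copyOf(written);
    }
}

class EmitterContractSketch {
    public static void main(String[] args) {
        InMemoryEmitter emitter = new InMemoryEmitter();
        emitter.emit(new SketchEmitData("docs/a.pdf", Map.of("title", "A"), "hello"));
        System.out.println(emitter.emitted().get(0).emitKey());
    }
}
```

A real emitter would replace the in-memory list with a write to its destination, using the emit key to decide where the result goes.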

The emit key is supplied by the iterator on each FetchEmitTuple and tells the emitter where to put the result. Its shape depends on the emitter:

  • file-system / S3 / GCS / Azure Blob — a key/path relative to basePath or prefix.

  • OpenSearch / Elasticsearch / Solr — the _id field value, taken from the metadata field named by the emitter’s idField.

  • JDBC — the value bound to the first ? placeholder in the insert template.

  • Kafka — the Kafka record key.
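For the file-style case in the list above, the sketch below shows one way an emit key could be mapped to a path under basePath. The helper EmitKeyPaths is hypothetical, and two details are assumptions rather than documented behavior: that fileExtension is appended after a dot, and that keys escaping basePath are rejected.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of any real emitter.
final class EmitKeyPaths {
    private EmitKeyPaths() {}

    // Maps an emit key to a target path that is guaranteed to sit under basePath.
    static Path resolve(String basePath, String emitKey, String fileExtension) {
        Path base = Paths.get(basePath).toAbsolutePath().normalize();
        // Assumption: a non-empty extension is appended as ".<ext>".
        String name = (fileExtension == null || fileExtension.isEmpty())
                ? emitKey
                : emitKey + "." + fileExtension;
        Path target = base.resolve(name).normalize();
        // Reject absolute keys and "../" tricks that would leave basePath.
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("emit key escapes basePath: " + emitKey);
        }
        return target;
    }
}

class EmitKeyPathsDemo {
    public static void main(String[] args) {
        System.out.println(EmitKeyPaths.resolve("/data/output", "docs/a.pdf", "json"));
    }
}
```

The other emitter families interpret the key differently (document ID, bound parameter, record key), so this sketch applies only to path-like destinations.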

Emitters are intended to be safe under concurrent use; the pipeline’s worker pool may call emit() from many threads.
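To make the concurrency expectation concrete, here is a small sketch in which a worker pool calls emit() on one shared instance. ConcurrentSketchEmitter and emitFromPool are hypothetical names, and the in-memory map stands in for a real destination. The example only illustrates the calling pattern, not how any shipped emitter achieves thread safety.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical emitter whose only shared state is a thread-safe map.
final class ConcurrentSketchEmitter {
    // ConcurrentHashMap allows many threads to write without external locking.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    void emit(String emitKey, String content) {
        store.put(emitKey, content);
    }

    int size() {
        return store.size();
    }

    // Calls emit() from a fixed-size pool and returns how many keys were stored.
    static int emitFromPool(int threads, int documents) {
        ConcurrentSketchEmitter emitter = new ConcurrentSketchEmitter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < documents; i++) {
            final String key = "doc-" + i;
            pool.submit(() -> emitter.emit(key, "content of " + key));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return emitter.size();
    }
}

class ConcurrentEmitDemo {
    public static void main(String[] args) {
        System.out.println(ConcurrentSketchEmitter.emitFromPool(8, 1000));
    }
}
```

Because every key is unique and the map is thread-safe, no write is lost. An emitter that kept results in an unsynchronized collection could drop or corrupt entries under the same load.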

Wiring Emitters Into a Pipeline

Emitters live under the top-level emitters key. Each emitter gets an ID (the outer map key) and a type-name (the inner map key); the iterator references the ID through its emitterId field.

{
  "emitters": {
    "output": {
      "file-system-emitter": {
        "basePath": "/data/output",
        "fileExtension": "json"
      }
    }
  },
  "pipes-iterator": {
    "file-system-pipes-iterator": {
      "basePath": "/data/input",
      "fetcherId": "...",
      "emitterId": "output"
    }
  }
}

A pipeline may declare multiple emitters and choose between them at iterator-config time. Within a single iterator, each emitted FetchEmitTuple carries exactly one emitter ID.
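The ID-based lookup described above can be sketched as a small registry. SketchTuple and EmitterRegistrySketch are hypothetical names, the second emitter ID ("index") is made up for illustration, and the real pipeline's lookup and error handling may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a FetchEmitTuple: exactly one emitter ID plus the emit key.
record SketchTuple(String emitterId, String emitKey) {}

// Hypothetical registry mirroring the "emitters" map: emitter ID -> type name.
final class EmitterRegistrySketch {
    private final Map<String, String> typeById = new LinkedHashMap<>();

    void register(String id, String typeName) {
        // IDs are the outer map keys in the config, so they must be unique.
        if (typeById.putIfAbsent(id, typeName) != null) {
            throw new IllegalArgumentException("duplicate emitter ID: " + id);
        }
    }

    // Looks up which emitter type a tuple is routed to.
    String typeFor(SketchTuple tuple) {
        String typeName = typeById.get(tuple.emitterId());
        if (typeName == null) {
            throw new IllegalArgumentException("unknown emitter ID: " + tuple.emitterId());
        }
        return typeName;
    }
}

class EmitterRegistryDemo {
    public static void main(String[] args) {
        EmitterRegistrySketch registry = new EmitterRegistrySketch();
        registry.register("output", "file-system-emitter");
        registry.register("index", "opensearch-emitter");
        System.out.println(registry.typeFor(new SketchTuple("index", "doc-1")));
    }
}
```

Failing fast on an unknown ID, as this sketch does, is a design choice: a typo in emitterId surfaces as an error instead of silently discarding results.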

Available Emitters

Plugin                  Component name        Notes
File System             file-system-emitter   Local / mounted filesystem.
Amazon S3               s3-emitter            S3 or S3-compatible.
Google Cloud Storage    gcs-emitter           GCS via ADC.
Azure Blob Storage      az-blob-emitter       SAS-token auth.
OpenSearch              opensearch-emitter    REST-based bulk indexing.
Elasticsearch           es-emitter            REST-based bulk indexing; ApiKey or basic auth.
Apache Solr             solr-emitter          SolrCloud (URLs or ZooKeeper).
JDBC                    jdbc-emitter          Any RDBMS with a JDBC driver.
Apache Kafka            kafka-emitter         Standard Kafka producer.

For the full plugin / interface matrix, see Plugins.