Emitters
Emitters write parsed results to a destination. Each emitter is identified by
its component name and an id that is referenced by the pipes iterator.
File System Emitter (file-system-emitter)
Writes parsed metadata as JSON files to a local or mounted filesystem.
Module: tika-pipes-file-system

    {
      "emitters": [
        {
          "file-system-emitter": {
            "id": "my-emitter",
            "basePath": "/data/output",
            "fileExtension": "json",
            "onExists": "REPLACE",
            "prettyPrint": true
          }
        }
      ]
    }

| Field | Default | Description |
|---|---|---|
| basePath | required | Base output directory. |
| fileExtension | | Extension for output files. |
| onExists | | Behavior when the output file exists, e.g. REPLACE. |
| prettyPrint | | Pretty-print JSON output. |
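The relationship between component name and id can be illustrated with a small lookup. The following is a minimal Python sketch, not part of Tika: it assumes the config layout shown in the example above (an `emitters` list of single-key objects), and the helper name `index_emitters` and the duplicate-id check are hypothetical additions for illustration.

```python
import json

# Config in the same layout as the example above; values are illustrative.
CONFIG_TEXT = """
{
  "emitters": [
    {
      "file-system-emitter": {
        "id": "my-emitter",
        "basePath": "/data/output",
        "fileExtension": "json",
        "onExists": "REPLACE",
        "prettyPrint": true
      }
    }
  ]
}
"""


def index_emitters(config):
    """Map each emitter id to a (component_name, settings) pair."""
    by_id = {}
    for entry in config.get("emitters", []):
        # Each entry is a single-key object: component name -> settings.
        for component, settings in entry.items():
            emitter_id = settings["id"]
            # Assumption of this sketch: ids must be unique across emitters.
            if emitter_id in by_id:
                raise ValueError(f"duplicate emitter id: {emitter_id}")
            by_id[emitter_id] = (component, settings)
    return by_id


emitters = index_emitters(json.loads(CONFIG_TEXT))
component, settings = emitters["my-emitter"]
print(component, settings["basePath"])  # file-system-emitter /data/output
```

A lookup like this is a quick way to check that an id referenced elsewhere in the configuration actually resolves to a configured emitter.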
Elasticsearch Emitter (es-emitter)
Sends parsed documents to Elasticsearch via the _bulk API. Uses plain HTTP — no dependency on the ES Java client.
Module: tika-pipes-es
| Field | Default | Description |
|---|---|---|
| esUrl | required | Full URL including index. |
| | | Metadata field used as the document ID. |
| | none | Base64-encoded |
| | | Join-field name for |
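For readers unfamiliar with the _bulk API mentioned above, the request body is newline-delimited JSON: one action line followed by one document line per document, ending with a newline. The following Python sketch shows that format only. It is not the emitter's implementation; the `index` action, the helper name `build_bulk_body`, and the `doc_key` and `title` fields are assumptions chosen for illustration.

```python
import json


def build_bulk_body(docs, id_field):
    """Serialize documents as NDJSON for an Elasticsearch _bulk request."""
    lines = []
    for doc in docs:
        # Action line: index the document under the id taken from id_field.
        # (Using the "index" action is an assumption of this sketch.)
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical parsed documents; field names are illustrative only.
docs = [
    {"doc_key": "a.pdf", "title": "First"},
    {"doc_key": "b.pdf", "title": "Second"},
]
body = build_bulk_body(docs, id_field="doc_key")
print(body, end="")
```

A body like this would typically be sent with an HTTP POST and a `Content-Type: application/x-ndjson` header, which is why no Elasticsearch client library is needed.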
OpenSearch Emitter (opensearch-emitter)
Sends documents to OpenSearch. Configured identically to the ES emitter
but uses openSearchUrl instead of esUrl.
Module: tika-pipes-opensearch
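Since the only documented difference from the ES emitter is the URL field name, an existing ES emitter configuration can be adapted mechanically. The Python sketch below illustrates that rename. It assumes the emitter entry uses the same single-key layout as the file-system example, and the helper name `to_opensearch_entry`, the id, and the URL are hypothetical.

```python
def to_opensearch_entry(es_settings):
    """Copy ES emitter settings, renaming esUrl to openSearchUrl."""
    settings = dict(es_settings)  # leave the caller's dict untouched
    settings["openSearchUrl"] = settings.pop("esUrl")
    # Assumed layout: component name -> settings, as in the file-system example.
    return {"opensearch-emitter": settings}


# Hypothetical ES emitter settings; the id and URL are placeholders.
es_settings = {"id": "search-emitter", "esUrl": "http://localhost:9200/my-index"}
entry = to_opensearch_entry(es_settings)
print(entry)
```

All other fields are carried over unchanged, matching the statement that the two emitters are otherwise configured identically.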
S3 Emitter (s3-emitter)
Writes parsed metadata as JSON objects to Amazon S3.
Module: tika-pipes-s3
| Field | Default | Description |
|---|---|---|
| | required | S3 bucket name. |
| | required | AWS region. |
| | none | S3 key prefix for output objects. |
| | | Credentials type: |
| | | File extension for output keys. |
Azure Blob Emitter (az-blob-emitter)
Writes parsed metadata to Azure Blob Storage.
Module: tika-pipes-az-blob
Solr Emitter (solr-emitter)
Indexes parsed documents into Apache Solr.
Module: tika-pipes-solr
| Field | Default | Description |
|---|---|---|
| | required | Solr collection name. |
| | required | List of Solr URLs. |
| | | Field name for document ID. |
| | | Milliseconds before auto-commit (-1 = server default). |
| | | How to handle embedded documents. |