Emitters

Emitters write parsed results to a destination. Each emitter is configured under its component name (for example, file-system-emitter) and carries an id, which the pipes iterator references.

File System Emitter (file-system-emitter)

Writes parsed metadata as JSON files to a local or mounted filesystem.

Module: tika-pipes-file-system

{
  "emitters": [
    {
      "file-system-emitter": {
        "id": "my-emitter",
        "basePath": "/data/output",
        "fileExtension": "json",
        "onExists": "REPLACE",
        "prettyPrint": true
      }
    }
  ]
}

Field          Default    Description
-------------  ---------  -----------
basePath       required   Base output directory.
fileExtension  json       Extension for output files.
onExists       EXCEPTION  Behavior when output file exists: SKIP, REPLACE, EXCEPTION.
prettyPrint    false      Pretty-print JSON output.

Elasticsearch Emitter (es-emitter)

Sends parsed documents to Elasticsearch via the _bulk API. The emitter issues plain HTTP requests and has no dependency on the Elasticsearch Java client.

Module: tika-pipes-es

Field                  Default             Description
---------------------  ------------------  -----------
esUrl                  required            Full URL including index (e.g., https://localhost:9200/my-index).
idField                _id                 Metadata field used as the document _id.
apiKey                 none                Base64-encoded id:api_key for authentication.
attachmentStrategy     SEPARATE_DOCUMENTS  SEPARATE_DOCUMENTS or PARENT_CHILD.
updateStrategy         OVERWRITE           OVERWRITE (full replace) or UPSERT (field-level merge).
embeddedFileFieldName  embedded            Join-field name for PARENT_CHILD mode.
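
As a rough sketch of how the fields listed above might fit together, the following configuration indexes embedded files as children of their container document and merges fields on update. It assumes the es-emitter is wrapped in the same "emitters" array, and takes the same "id" field, as the file-system emitter example earlier in this section; the id value and index URL are placeholders.

```json
{
  "emitters": [
    {
      "es-emitter": {
        "id": "my-es-emitter",
        "esUrl": "https://localhost:9200/my-index",
        "idField": "_id",
        "attachmentStrategy": "PARENT_CHILD",
        "updateStrategy": "UPSERT",
        "embeddedFileFieldName": "embedded"
      }
    }
  ]
}
```

The apiKey field is left out here because it defaults to none; a secured cluster would need it set to the Base64-encoded id:api_key value.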

OpenSearch Emitter (opensearch-emitter)

Sends parsed documents to OpenSearch. It is configured identically to the Elasticsearch emitter, except that it uses openSearchUrl instead of esUrl.

Module: tika-pipes-opensearch
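
A minimal sketch of an OpenSearch configuration, assuming the same wrapper structure and "id" field as the file-system emitter example earlier in this section. Only the URL field name differs from the Elasticsearch emitter; the id value and index URL are placeholders.

```json
{
  "emitters": [
    {
      "opensearch-emitter": {
        "id": "my-opensearch-emitter",
        "openSearchUrl": "https://localhost:9200/my-index",
        "updateStrategy": "OVERWRITE"
      }
    }
  ]
}
```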

S3 Emitter (s3-emitter)

Writes parsed metadata as JSON objects to Amazon S3.

Module: tika-pipes-s3

Field                Default   Description
-------------------  --------  -----------
bucket               required  S3 bucket name.
region               required  AWS region.
prefix               none      S3 key prefix for output objects.
credentialsProvider  profile   Credentials type: profile, static, instance.
fileExtension        json      File extension for output keys.
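
The fields above could be combined roughly as follows. This sketch assumes the same wrapper structure and "id" field as the file-system emitter example earlier in this section; the bucket name, region, and prefix are placeholders. Depending on the chosen credentialsProvider, additional fields that are not listed in the table may be required.

```json
{
  "emitters": [
    {
      "s3-emitter": {
        "id": "my-s3-emitter",
        "bucket": "my-bucket",
        "region": "us-east-1",
        "prefix": "tika-output",
        "credentialsProvider": "profile",
        "fileExtension": "json"
      }
    }
  ]
}
```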

GCS Emitter (gcs-emitter)

Writes parsed metadata to Google Cloud Storage.

Module: tika-pipes-gcs

Azure Blob Emitter (az-blob-emitter)

Writes parsed metadata to Azure Blob Storage.

Module: tika-pipes-az-blob

Solr Emitter (solr-emitter)

Indexes parsed documents into Apache Solr.

Module: tika-pipes-solr

Field               Default             Description
------------------  ------------------  -----------
solrCollection      required            Solr collection name.
solrUrls            required            List of Solr URLs.
idField             id                  Field name for document ID.
commitWithin        -1                  Milliseconds before auto-commit (-1 = server default).
attachmentStrategy  SEPARATE_DOCUMENTS  How to handle embedded documents.
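
A sketch of a Solr configuration built from the fields above, assuming the same wrapper structure and "id" field as the file-system emitter example earlier in this section. Here solrUrls is written as a JSON array because the table describes it as a list; the collection name and URL are placeholders, and commitWithin is set to one second purely for illustration.

```json
{
  "emitters": [
    {
      "solr-emitter": {
        "id": "my-solr-emitter",
        "solrCollection": "my-collection",
        "solrUrls": ["http://localhost:8983/solr"],
        "idField": "id",
        "commitWithin": 1000,
        "attachmentStrategy": "SEPARATE_DOCUMENTS"
      }
    }
  ]
}
```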

JDBC Emitter (jdbc-emitter)

Writes parsed metadata to a SQL database via JDBC.

Module: tika-pipes-jdbc

Field       Default   Description
----------  --------  -----------
connection  required  JDBC connection string.
insert      required  SQL INSERT statement with ? placeholders.
keys        required  Ordered list of metadata keys to bind to placeholders.
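
To illustrate how insert and keys relate, the sketch below binds three metadata keys, in order, to the three ? placeholders of the INSERT statement. It assumes the same wrapper structure and "id" field as the file-system emitter example earlier in this section, and it assumes keys is a plain JSON array because the table calls it an ordered list. The connection string, table name, and column names are hypothetical.

```json
{
  "emitters": [
    {
      "jdbc-emitter": {
        "id": "my-jdbc-emitter",
        "connection": "jdbc:sqlite:/data/output/tika.db",
        "insert": "insert into parsed_docs (name, mime, length) values (?, ?, ?)",
        "keys": ["resourceName", "Content-Type", "Content-Length"]
      }
    }
  ]
}
```

With this ordering, resourceName would fill the first placeholder, Content-Type the second, and Content-Length the third, so the number of keys should match the number of placeholders.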

Kafka Emitter (kafka-emitter)

Sends parsed metadata as messages to Apache Kafka.

Module: tika-pipes-kafka

Field             Default   Description
----------------  --------  -----------
topic             required  Kafka topic name.
bootstrapServers  required  Kafka broker addresses.
acks              all       Acknowledgment requirement.
lingerMs          0         Batch wait time in milliseconds.
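
A sketch of a Kafka configuration using the fields above, assuming the same wrapper structure and "id" field as the file-system emitter example earlier in this section. The topic name and broker address are placeholders, and bootstrapServers is assumed to take a host:port string.

```json
{
  "emitters": [
    {
      "kafka-emitter": {
        "id": "my-kafka-emitter",
        "topic": "tika-output",
        "bootstrapServers": "localhost:9092",
        "acks": "all",
        "lingerMs": 5
      }
    }
  ]
}
```

Raising lingerMs above its default of 0 lets the producer wait briefly to batch messages, trading a little latency for fewer requests.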