Pipes Plugins

Tika Pipes is extensible through plugins. Each plugin lives in its own Maven module and can implement one or more of the four pipes extension points:

  • Fetcher — retrieves document bytes from a source.

  • Emitter — writes parsed results to a destination.

  • Iterator (PipesIterator) — enumerates documents to process as FetchEmitTuple records.

  • Reporter (PipesReporter) — records per-document processing status.
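The following is a minimal sketch of how the four roles cooperate in one pass of a pipeline. It assumes simplified, hypothetical interfaces (`SimpleFetcher`, `SimpleEmitter`, `SimpleIterator`, `SimpleReporter`) and a hypothetical `Task` record standing in for a FetchEmitTuple. None of these are Tika's actual `org.apache.tika.pipes` API, and a trivial text copy replaces real parsing.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PipesRolesSketch {

    // Hypothetical stand-in for a FetchEmitTuple: an id, where to fetch from, where to emit to.
    record Task(String id, String fetchKey, String emitKey) {}

    // Fetcher role (hypothetical signature): retrieve raw document bytes from a source.
    interface SimpleFetcher {
        byte[] fetch(String fetchKey) throws Exception;
    }

    // Emitter role (hypothetical signature): write a parsed result to a destination.
    interface SimpleEmitter {
        void emit(String emitKey, Map<String, String> result) throws Exception;
    }

    // Iterator role: enumerate the tasks to process.
    interface SimpleIterator extends Iterable<Task> {}

    // Reporter role: record the per-document outcome.
    interface SimpleReporter {
        void report(Task task, String status);
    }

    // One pass of a pipeline: iterate -> fetch -> "parse" -> emit -> report.
    static void run(SimpleIterator tasks, SimpleFetcher fetcher,
                    SimpleEmitter emitter, SimpleReporter reporter) {
        for (Task t : tasks) {
            try {
                byte[] bytes = fetcher.fetch(t.fetchKey());
                // Stand-in for Tika parsing: keep the text and its byte length.
                Map<String, String> result = new HashMap<>();
                result.put("content", new String(bytes, StandardCharsets.UTF_8));
                result.put("length", Integer.toString(bytes.length));
                emitter.emit(t.emitKey(), result);
                reporter.report(t, "OK");
            } catch (Exception e) {
                // A failure on one document is reported and the loop moves on.
                reporter.report(t, "FAILED: " + e.getMessage());
            }
        }
    }

    // Wires in-memory implementations of all four roles and returns the report lines.
    static List<String> demo() {
        Map<String, byte[]> source = Map.of(
                "a.txt", "hello".getBytes(StandardCharsets.UTF_8),
                "b.txt", "world!".getBytes(StandardCharsets.UTF_8));
        Map<String, Map<String, String>> sink = new HashMap<>();
        List<String> statuses = new ArrayList<>();

        SimpleIterator tasks = () -> List.of(
                new Task("1", "a.txt", "a.json"),
                new Task("2", "missing.txt", "missing.json"),
                new Task("3", "b.txt", "b.json")).iterator();
        SimpleFetcher fetcher = key -> {
            byte[] b = source.get(key);
            if (b == null) {
                throw new IllegalArgumentException("not found: " + key);
            }
            return b;
        };
        SimpleEmitter emitter = sink::put;
        SimpleReporter reporter = (t, status) -> statuses.add(t.id() + " " + status);

        run(tasks, fetcher, emitter, reporter);
        return statuses;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Keeping the reporter separate from the emitter means a missing source document still produces a status line ("2 FAILED: ...") even though nothing is emitted for it.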

Many plugins implement more than one of these interfaces (e.g., the S3 plugin provides a fetcher, an emitter, and an iterator). The pages below document each plugin once, with one section per interface it implements.
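One way to picture a multi-interface plugin is a single class that serves several roles against the same backend, so that shared settings (for S3, things like bucket and credentials) are defined once. The sketch below assumes hypothetical role interfaces (`FetchRole`, `EmitRole`, `IterateRole`) and a hypothetical in-memory store in place of real storage. It illustrates the idea only and does not reflect how Tika's plugins are actually structured.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiRolePluginSketch {

    // Hypothetical, simplified role interfaces (not Tika's actual API).
    interface FetchRole { byte[] fetch(String key); }
    interface EmitRole { void emit(String key, byte[] data); }
    interface IterateRole { List<String> listKeys(); }

    // One plugin class serving three roles against the same backend.
    // Shared configuration (here a key prefix and the store itself) is set once.
    static class InMemoryStorePlugin implements FetchRole, EmitRole, IterateRole {
        private final Map<String, byte[]> store;
        private final String inputPrefix;

        InMemoryStorePlugin(Map<String, byte[]> store, String inputPrefix) {
            this.store = store;
            this.inputPrefix = inputPrefix;
        }

        @Override
        public List<String> listKeys() {
            // Copy matching keys so later emits cannot disturb this iteration.
            List<String> keys = new ArrayList<>();
            for (String k : store.keySet()) {
                if (k.startsWith(inputPrefix)) {
                    keys.add(k);
                }
            }
            return keys;
        }

        @Override
        public byte[] fetch(String key) {
            byte[] data = store.get(key);
            if (data == null) {
                throw new IllegalArgumentException("no such key: " + key);
            }
            return data;
        }

        @Override
        public void emit(String key, byte[] data) {
            store.put(key, data);
        }
    }

    // Uses the same plugin instance as iterator, fetcher and emitter.
    static Map<String, byte[]> demo() {
        Map<String, byte[]> store = new TreeMap<>();
        store.put("in/a.txt", "alpha".getBytes(StandardCharsets.UTF_8));
        store.put("in/b.txt", "beta".getBytes(StandardCharsets.UTF_8));
        store.put("other/c.txt", "gamma".getBytes(StandardCharsets.UTF_8));

        InMemoryStorePlugin plugin = new InMemoryStorePlugin(store, "in/");
        for (String key : plugin.listKeys()) {
            String text = new String(plugin.fetch(key), StandardCharsets.UTF_8);
            // Stand-in for parsing: upper-case the text and emit it under "out/".
            String outKey = "out/" + key.substring("in/".length());
            plugin.emit(outKey, text.toUpperCase().getBytes(StandardCharsets.UTF_8));
        }
        return store;
    }

    public static void main(String[] args) {
        demo().forEach((k, v) ->
                System.out.println(k + " -> " + new String(v, StandardCharsets.UTF_8)));
    }
}
```

Only keys under the configured prefix are iterated, so `other/c.txt` is left alone and no `out/c.txt` is produced.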

Plugin / Interface Matrix

Plugin Fetcher Emitter Iterator Reporter

File System

Amazon S3

Google Cloud Storage

Azure Blob Storage

OpenSearch

Elasticsearch

Solr

JDBC

Kafka

HTTP

Google Drive

Microsoft Graph

Atlassian JWT

CSV

JSON

Interface Overviews

For descriptions of the interfaces themselves — their contracts, the shared concepts (FetchKey, FetchEmitTuple, fetcherId/emitterId wiring, etc.), and how they fit into a pipeline — see: