CSV Plugin

The CSV plugin (tika-pipes-csv) provides an iterator that reads work items from a CSV file. It is iterator-only, so pair it with a fetcher and an emitter.

Interface   Component name       Class
Iterator    csv-pipes-iterator   CSVPipesIterator

CSV Iterator (csv-pipes-iterator)

Reads each row of the CSV as a work item and emits one FetchEmitTuple per row.

{
  "pipes-iterator": {
    "csv-pipes-iterator": {
      "csvPath": "/data/work-items.csv",
      "idColumn": "doc_id",
      "fetchKeyColumn": "source_path",
      "emitKeyColumn": "output_path",
      "fetcherId": "fsf",
      "emitterId": "fse"
    }
  }
}
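To make the row-to-tuple mapping concrete, here is a minimal Java sketch of what happens to one row under a configuration like the one above. It is not the CSVPipesIterator source: the `Tuple` record and `RowMapper.toTuple` method are hypothetical stand-ins for FetchEmitTuple and the plugin's internals, and the sketch assumes the row has already been split into fields. Returning null for an unconfigured column and throwing for a column missing from the header are assumptions of this sketch, not documented plugin behavior.

```java
import java.util.List;

// Hypothetical stand-in for FetchEmitTuple; not the Tika class.
record Tuple(String id, String fetchKey, String emitKey,
             String fetcherId, String emitterId) {}

class RowMapper {
    // Looks up a column by its header name (not by index).
    // Returns null when the column is not configured, since the
    // *Column settings are optional; throws if the name is not a header.
    static String column(List<String> header, List<String> row, String name) {
        if (name == null) {
            return null;
        }
        int idx = header.indexOf(name);
        if (idx < 0) {
            throw new IllegalArgumentException("no such column in header: " + name);
        }
        return idx < row.size() ? row.get(idx) : null;
    }

    // Builds one tuple for one row and binds the configured fetcher and emitter IDs.
    static Tuple toTuple(List<String> header, List<String> row,
                         String idColumn, String fetchKeyColumn, String emitKeyColumn,
                         String fetcherId, String emitterId) {
        return new Tuple(
                column(header, row, idColumn),
                column(header, row, fetchKeyColumn),
                column(header, row, emitKeyColumn),
                fetcherId,
                emitterId);
    }

    public static void main(String[] args) {
        // Hypothetical header and row matching the example config's column names.
        List<String> header = List.of("doc_id", "source_path", "output_path");
        List<String> row = List.of("42", "in/report.pdf", "out/report.json");
        System.out.println(toTuple(header, row,
                "doc_id", "source_path", "output_path", "fsf", "fse"));
    }
}
```

Because lookups go through the header, the columns can appear in any order in the file; only their names have to match the config.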

Configuration

Field                   Default    Description
csvPath                 required   Path to the CSV file on disk.
idColumn                optional   Column whose value becomes the iterator’s row identifier.
fetchKeyColumn          optional   Column whose value becomes the fetch key on each emitted tuple.
emitKeyColumn           optional   Column whose value becomes the emit key on each emitted tuple.
fetcherId / emitterId   required   IDs of the fetcher and emitter to bind to each emitted tuple. See Pipes Iterators for the shared iterator contract.
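The required/optional split in the table can be checked before any rows are processed. The Java sketch below is hypothetical, not Tika's own validation code: it takes the settings as a plain `Map<String, String>` (an assumption; the plugin itself reads the JSON shown earlier) and reports missing required fields plus any configured column name that does not appear in the CSV header.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for a csv-pipes-iterator style config.
class CsvIteratorConfigCheck {
    static final List<String> REQUIRED = List.of("csvPath", "fetcherId", "emitterId");
    static final List<String> COLUMN_KEYS = List.of("idColumn", "fetchKeyColumn", "emitKeyColumn");

    // Returns human-readable problems; an empty list means the config looks usable.
    static List<String> validate(Map<String, String> config, List<String> header) {
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = config.get(key);
            if (value == null || value.isBlank()) {
                problems.add("missing required field: " + key);
            }
        }
        for (String key : COLUMN_KEYS) {
            String column = config.get(key);
            // Column settings are optional, so only check the ones that are set.
            if (column != null && !header.contains(column)) {
                problems.add(key + " refers to unknown header '" + column + "'");
            }
        }
        return problems;
    }
}
```

Checking column names against the header up front surfaces a typo such as `source-path` vs `source_path` once, rather than as a missing value on every row.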

Notes

  • The CSV must have a header row: column names in the config refer to header values, not column indexes.

  • The iterator streams rows rather than loading the whole file into memory, so very large CSV files can be processed.

  • For row-shaped work items in JSONL (one JSON object per line), use the JSON iterator instead.
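The first two notes (a header row is required, and rows are streamed) can be illustrated with a small Java sketch built only on java.io. It reads the header line first, then yields one row at a time from a BufferedReader while holding at most one look-ahead line, so memory use does not grow with file size. The `HeaderedCsvReader` class is hypothetical and not part of the plugin, and its naive comma split assumes fields contain no quoted commas or embedded newlines; a real CSV parser handles those cases.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical streaming reader: header first, then one row per next() call.
class HeaderedCsvReader implements Iterator<Map<String, String>> {
    private final BufferedReader in;
    private final List<String> header;
    private String nextLine; // single look-ahead line; the file is never fully buffered

    HeaderedCsvReader(Reader source) {
        this.in = new BufferedReader(source);
        String first = readLine(in);
        if (first == null || first.isBlank()) {
            throw new IllegalArgumentException("CSV has no header row");
        }
        // Naive split: assumes no quoted fields containing commas.
        this.header = Arrays.asList(first.split(",", -1));
        this.nextLine = readLine(in);
    }

    private static String readLine(BufferedReader r) {
        try {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    List<String> header() {
        return header;
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    // Returns the current row keyed by header name, then advances one line.
    @Override
    public Map<String, String> next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String[] fields = nextLine.split(",", -1);
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i < header.size() && i < fields.length; i++) {
            row.put(header.get(i), fields[i]);
        }
        nextLine = readLine(in);
        return row;
    }
}
```

In use, wrap a `java.io.FileReader` for a real file, or a `java.io.StringReader` for a quick check, and call `row.get("source_path")` on each result to pull a value by header name.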