CSV Plugin
The CSV plugin (tika-pipes-csv) provides an iterator that reads work items from a CSV file. It is iterator-only — pair it with a fetcher and emitter.
| Interface | Component name | Class |
|---|---|---|
| Iterator | csv-pipes-iterator | |
CSV Iterator (csv-pipes-iterator)
Reads each row of the CSV as a work item and emits one FetchEmitTuple per row.
{
  "pipes-iterator": {
    "csv-pipes-iterator": {
      "csvPath": "/data/work-items.csv",
      "idColumn": "doc_id",
      "fetchKeyColumn": "source_path",
      "emitKeyColumn": "output_path",
      "fetcherId": "fsf",
      "emitterId": "fse"
    }
  }
}
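The row-to-tuple mapping that this configuration describes can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the plugin's implementation: the sample CSV contents are made up, and a plain dict stands in for a FetchEmitTuple, whose real fields are not shown here.

```python
import csv
import io

# Column and component settings copied from the JSON example above.
CONFIG = {
    "idColumn": "doc_id",
    "fetchKeyColumn": "source_path",
    "emitKeyColumn": "output_path",
    "fetcherId": "fsf",
    "emitterId": "fse",
}

# Made-up sample data; the header names match the configured columns.
SAMPLE_CSV = """doc_id,source_path,output_path
1,in/a.pdf,out/a.json
2,in/b.docx,out/b.json
"""


def rows_to_tuples(csv_file, config):
    """Yield one dict per CSV row (a hypothetical stand-in for a FetchEmitTuple)."""
    for row in csv.DictReader(csv_file):
        yield {
            "id": row[config["idColumn"]],
            "fetcherId": config["fetcherId"],
            "fetchKey": row[config["fetchKeyColumn"]],
            "emitterId": config["emitterId"],
            "emitKey": row[config["emitKeyColumn"]],
        }


tuples = list(rows_to_tuples(io.StringIO(SAMPLE_CSV), CONFIG))
```

Each data row produces exactly one entry, and the fetcher and emitter IDs are the same on every entry because they come from the config rather than from the CSV.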
Configuration
| Field | Default | Description |
|---|---|---|
| csvPath | required | Path to the CSV file on disk. |
| idColumn | optional | Column whose value becomes the iterator’s row identifier. |
| fetchKeyColumn | optional | Column whose value becomes the fetch key on each emitted tuple. |
| emitKeyColumn | optional | Column whose value becomes the emit key on each emitted tuple. |
| fetcherId, emitterId | required | IDs of the fetcher and emitter to bind to each emitted tuple. See Pipes Iterators for the shared iterator contract. |
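A quick pre-flight check can catch a missing required field before a run starts. The sketch below is a hypothetical helper, not part of Tika. It assumes the config has the same nesting as the JSON example in this section and that the required fields are csvPath, fetcherId, and emitterId.

```python
import json

# Fields this section marks as required (names taken from the JSON example).
REQUIRED_FIELDS = ("csvPath", "fetcherId", "emitterId")


def missing_required(config_text):
    """Return the required CSV iterator fields absent from a JSON config string."""
    config = json.loads(config_text)
    iterator = config["pipes-iterator"]["csv-pipes-iterator"]
    return [name for name in REQUIRED_FIELDS if name not in iterator]


example = """
{
  "pipes-iterator": {
    "csv-pipes-iterator": {
      "csvPath": "/data/work-items.csv",
      "fetcherId": "fsf"
    }
  }
}
"""
problems = missing_required(example)
```

Here `problems` lists `emitterId`, the one required field the example config leaves out.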
Notes
- The CSV must have a header row: column names in the config refer to header values, not column indexes.
- For very large CSV files, the iterator streams rows rather than loading them all into memory.
- For row-shaped work items in JSONL (one JSON object per line), use the JSON iterator instead.
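Because the configured column names must match header values, it can help to verify the header before starting a run. The following sketch is a hypothetical stand-alone check, not a Tika API. It reads only the first row, so it stays cheap even for very large files.

```python
import csv
import io


def missing_columns(csv_file, column_names):
    """Return the configured column names that do not appear in the CSV header row."""
    reader = csv.reader(csv_file)
    header = next(reader, None)  # only the first row is read
    if header is None:
        # An empty file has no header, so every configured column is missing.
        return list(column_names)
    return [name for name in column_names if name not in header]


# Made-up sample: the header lacks the output_path column.
sample = io.StringIO("doc_id,source_path\n1,in/a.pdf\n")
missing = missing_columns(sample, ["doc_id", "source_path", "output_path"])
```

For a file on disk, open it with `open(path, newline="")` as the `csv` module documentation recommends, and pass the file object to `missing_columns`.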