Class CSVPipesIterator
- java.lang.Object
  - org.apache.tika.config.ConfigBase
    - org.apache.tika.pipes.pipesiterator.PipesIterator
      - org.apache.tika.pipes.pipesiterator.csv.CSVPipesIterator
- All Implemented Interfaces:
Iterable<FetchEmitTuple>, Callable<Integer>, Initializable
public class CSVPipesIterator extends PipesIterator implements Initializable
Iterates through a UTF-8 CSV file. This adds all columns (except for the 'idColumn', 'fetchKeyColumn' and 'emitKeyColumn', if specified) to the metadata object.
- If an 'idColumn' is specified, this will use that column's value as the id.
- If no 'idColumn' is specified, but a 'fetchKeyColumn' is specified, the string in the 'fetchKeyColumn' will be used as the 'id'.
- The 'idColumn' value is not added to the metadata.
- If a 'fetchKeyColumn' is specified, this will use that column's value as the fetchKey.
- If no 'fetchKeyColumn' is specified, this will send the metadata from the other columns.
- The 'fetchKeyColumn' value is not added to the metadata.
- If an 'emitKeyColumn' is specified, this will use that column's value as the emit key.
- If an 'emitKeyColumn' is not specified, this will use the value from the 'fetchKeyColumn'.
- The 'emitKeyColumn' value is not added to the metadata.
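The fallback rules above can be sketched in plain Java. This is an illustration of the documented behavior only, not the Tika implementation: the class `CsvRowResolutionSketch`, the holder `Resolved`, and the method `resolve` are hypothetical names, and a `null` column name stands for "not specified". What happens to the id when neither 'idColumn' nor 'fetchKeyColumn' is specified is not stated above; the sketch assumes it is simply left as `null`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvRowResolutionSketch {

    /** Hypothetical holder for the values derived from one CSV row. */
    static final class Resolved {
        final String id;
        final String fetchKey;
        final String emitKey;
        final Map<String, String> metadata;

        Resolved(String id, String fetchKey, String emitKey, Map<String, String> metadata) {
            this.id = id;
            this.fetchKey = fetchKey;
            this.emitKey = emitKey;
            this.metadata = metadata;
        }
    }

    /**
     * Applies the documented rules to one row (column header -> cell value).
     * A null column name means that column was not specified.
     */
    static Resolved resolve(Map<String, String> row, String idColumn,
                            String fetchKeyColumn, String emitKeyColumn) {
        // Fetch key: taken from 'fetchKeyColumn' when specified, otherwise absent.
        String fetchKey = fetchKeyColumn == null ? null : row.get(fetchKeyColumn);
        // Id: 'idColumn' wins; otherwise fall back to the fetch key (assumed null if neither).
        String id = idColumn != null ? row.get(idColumn) : fetchKey;
        // Emit key: 'emitKeyColumn' wins; otherwise fall back to the fetch key.
        String emitKey = emitKeyColumn != null ? row.get(emitKeyColumn) : fetchKey;

        // Every remaining column becomes metadata; the three key columns are excluded.
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            String column = e.getKey();
            if (column.equals(idColumn) || column.equals(fetchKeyColumn)
                    || column.equals(emitKeyColumn)) {
                continue;
            }
            metadata.put(column, e.getValue());
        }
        return new Resolved(id, fetchKey, emitKey, metadata);
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("path", "docs/a.pdf");
        row.put("author", "Smith");

        // Only 'fetchKeyColumn' is specified, so id and emit key both fall back to it.
        Resolved r = resolve(row, null, "path", null);
        System.out.println(r.id + " " + r.fetchKey + " " + r.emitKey + " " + r.metadata);
        // prints: docs/a.pdf docs/a.pdf docs/a.pdf {author=Smith}
    }
}
```

In the real class, the corresponding column names are supplied through `setIdColumn`, `setFetchKeyColumn` and `setEmitKeyColumn`.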
-
-
Field Summary
-
Fields inherited from class org.apache.tika.pipes.pipesiterator.PipesIterator
COMPLETED_SEMAPHORE, DEFAULT_MAX_WAIT_MS, DEFAULT_QUEUE_SIZE
-
-
Constructor Summary
Constructors
Constructor           Description
CSVPipesIterator()
-
Method Summary
Modifier and Type    Method                                                             Description
void                 checkInitialization(InitializableProblemHandler problemHandler)
protected void       enqueue()
void                 setCsvPath(String csvPath)
void                 setCsvPath(Path csvPath)
void                 setEmitKeyColumn(String emitKeyColumn)
void                 setFetchKeyColumn(String fetchKeyColumn)
void                 setIdColumn(String idColumn)
-
Methods inherited from class org.apache.tika.pipes.pipesiterator.PipesIterator
build, call, getEmitterName, getFetcherName, getHandlerConfig, getOnParseException, initialize, iterator, setEmitterName, setFetcherName, setHandlerType, setMaxEmbeddedResources, setMaxWaitMs, setOnParseException, setOnParseException, setParseMode, setParseMode, setQueueSize, setThrowOnWriteLimitReached, setWriteLimit, tryToAdd
-
Methods inherited from class org.apache.tika.config.ConfigBase
buildComposite, buildComposite, buildSingle, buildSingle, configure, handleSettings
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.tika.config.Initializable
initialize
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Method Detail
-
enqueue
protected void enqueue() throws InterruptedException, IOException, TimeoutException
- Specified by:
enqueue in class PipesIterator
- Throws:
InterruptedException
IOException
TimeoutException
-
checkInitialization
public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
- Specified by:
checkInitialization in interface Initializable
- Overrides:
checkInitialization in class PipesIterator
- Parameters:
problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
- Throws:
TikaConfigException