Class CSVPipesIterator
- java.lang.Object
  - org.apache.tika.config.ConfigBase
    - org.apache.tika.pipes.pipesiterator.PipesIterator
      - org.apache.tika.pipes.pipesiterator.csv.CSVPipesIterator
- All Implemented Interfaces:
Iterable<FetchEmitTuple>, Callable<Integer>, Initializable
public class CSVPipesIterator extends PipesIterator implements Initializable
Iterates through a UTF-8 CSV file. This adds all columns (except for the 'idColumn', 'fetchKeyColumn' and 'emitKeyColumn', if specified) to the metadata object.
- If an 'idColumn' is specified, this will use that column's value as the id.
- If no 'idColumn' is specified, but a 'fetchKeyColumn' is specified, the string in the 'fetchKeyColumn' will be used as the 'id'.
- The 'idColumn' value is not added to the metadata.
- If a 'fetchKeyColumn' is specified, this will use that column's value as the fetchKey.
- If no 'fetchKeyColumn' is specified, this will send the metadata from the other columns.
- The 'fetchKeyColumn' value is not added to the metadata.
- If an 'emitKeyColumn' is specified, this will use that column's value as the emit key.
- If an 'emitKeyColumn' is not specified, this will use the value from the 'fetchKeyColumn'.
- The 'emitKeyColumn' value is not added to the metadata.
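The column rules above can be sketched in plain Java. This is an illustration only, not Tika's implementation: the class `CsvRowResolver`, its nested `Resolved` holder, and the `resolve` method are hypothetical names, and it assumes the row has already been split into cells. Cases the description does not cover (for example, no 'idColumn' and no 'fetchKeyColumn') are simply left as `null` here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mirrors the documented column rules; not part of Tika.
final class CsvRowResolver {

    // Simple holder for the values derived from one CSV row.
    static final class Resolved {
        final String id;
        final String fetchKey;
        final String emitKey;
        final Map<String, String> metadata;

        Resolved(String id, String fetchKey, String emitKey, Map<String, String> metadata) {
            this.id = id;
            this.fetchKey = fetchKey;
            this.emitKey = emitKey;
            this.metadata = metadata;
        }
    }

    // Any of idColumn, fetchKeyColumn, emitKeyColumn may be null, meaning "not specified".
    static Resolved resolve(List<String> header, List<String> row,
                            String idColumn, String fetchKeyColumn, String emitKeyColumn) {
        Map<String, String> metadata = new LinkedHashMap<>();
        String id = null;
        String fetchKey = null;
        String emitKey = null;

        for (int i = 0; i < header.size() && i < row.size(); i++) {
            String name = header.get(i);
            String value = row.get(i);
            boolean special = false;
            // Independent checks, so one column may serve more than one role.
            if (name.equals(idColumn)) { id = value; special = true; }
            if (name.equals(fetchKeyColumn)) { fetchKey = value; special = true; }
            if (name.equals(emitKeyColumn)) { emitKey = value; special = true; }
            // Only the remaining columns are copied into the metadata.
            if (!special) {
                metadata.put(name, value);
            }
        }

        // No 'idColumn': fall back to the fetch key (null if that is also unspecified).
        if (idColumn == null) {
            id = fetchKey;
        }
        // No 'emitKeyColumn': fall back to the fetch key.
        if (emitKeyColumn == null) {
            emitKey = fetchKey;
        }
        return new Resolved(id, fetchKey, emitKey, metadata);
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("id", "path", "out", "author");
        List<String> row = Arrays.asList("42", "docs/a.pdf", "json/a.json", "Ada");

        Resolved r = resolve(header, row, "id", "path", "out");
        System.out.println(r.id + " | " + r.fetchKey + " | " + r.emitKey + " | " + r.metadata);
    }
}
```

Passing `null` for `idColumn` and `emitKeyColumn` in the call above exercises the two fallback rules: both the id and the emit key then take the value of the 'fetchKeyColumn', and the unused `id` and `out` columns end up in the metadata.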
-
Field Summary
-
Fields inherited from class org.apache.tika.pipes.pipesiterator.PipesIterator
COMPLETED_SEMAPHORE, DEFAULT_MAX_WAIT_MS, DEFAULT_QUEUE_SIZE
-
Constructor Summary
Constructor
- CSVPipesIterator()
-
Method Summary
Modifier and Type, Method
- void checkInitialization(InitializableProblemHandler problemHandler)
- protected void enqueue()
- void setCsvPath(String csvPath)
- void setCsvPath(Path csvPath)
- void setEmitKeyColumn(String emitKeyColumn)
- void setFetchKeyColumn(String fetchKeyColumn)
- void setIdColumn(String idColumn)
-
Methods inherited from class org.apache.tika.pipes.pipesiterator.PipesIterator
build, call, getEmitterName, getFetcherName, getHandlerConfig, getOnParseException, initialize, iterator, setEmitterName, setFetcherName, setHandlerType, setMaxEmbeddedResources, setMaxWaitMs, setOnParseException, setOnParseException, setParseMode, setParseMode, setQueueSize, setThrowOnWriteLimitReached, setWriteLimit, tryToAdd
-
Methods inherited from class org.apache.tika.config.ConfigBase
buildComposite, buildComposite, buildSingle, buildSingle, configure, handleSettings
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.tika.config.Initializable
initialize
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Method Detail
-
enqueue
protected void enqueue() throws InterruptedException, IOException, TimeoutException
- Specified by: enqueue in class PipesIterator
- Throws: InterruptedException, IOException, TimeoutException
-
checkInitialization
public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
- Specified by: checkInitialization in interface Initializable
- Overrides: checkInitialization in class PipesIterator
- Parameters: problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
- Throws: TikaConfigException
-