Class CSVPipesIterator
java.lang.Object
org.apache.tika.config.ConfigBase
org.apache.tika.pipes.pipesiterator.PipesIterator
org.apache.tika.pipes.pipesiterator.csv.CSVPipesIterator
- All Implemented Interfaces:
Iterable<FetchEmitTuple>, Callable<Integer>, Initializable
Iterates through a UTF-8 CSV file. This adds all columns
(except for the 'idColumn', 'fetchKeyColumn' and 'emitKeyColumn', if specified)
to the metadata object.
- If an 'idColumn' is specified, this will use that column's value as the id.
- If no 'idColumn' is specified, but a 'fetchKeyColumn' is specified, the string in the 'fetchKeyColumn' will be used as the 'id'.
- The 'idColumn' value is not added to the metadata.
- If a 'fetchKeyColumn' is specified, this will use that column's value as the fetchKey.
- If no 'fetchKeyColumn' is specified, this will send the metadata from the other columns.
- The 'fetchKeyColumn' value is not added to the metadata.
- If an 'emitKeyColumn' is specified, this will use that column's value as the emit key.
- If an 'emitKeyColumn' is not specified, this will use the value from the 'fetchKeyColumn'.
- The 'emitKeyColumn' value is not added to the metadata.
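The column rules above can be sketched in plain Java without any Tika dependency. This is only an illustration of the fallbacks as described here, not the class's actual implementation: ColumnResolutionSketch, Resolved, and resolve are hypothetical names, and a CSV row is modeled as a simple header-to-value map. The behavior when neither an 'idColumn' nor a 'fetchKeyColumn' is configured is not described above, so the sketch simply leaves those values null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnResolutionSketch {

    /** Hypothetical result holder; this is not a Tika class. */
    public static final class Resolved {
        public final String id;
        public final String fetchKey;
        public final String emitKey;
        public final Map<String, String> metadata;

        Resolved(String id, String fetchKey, String emitKey, Map<String, String> metadata) {
            this.id = id;
            this.fetchKey = fetchKey;
            this.emitKey = emitKey;
            this.metadata = metadata;
        }
    }

    /**
     * Applies the documented fallbacks to one CSV row (header name -> cell value).
     * Any of the three column-name arguments may be null, meaning "not specified".
     */
    public static Resolved resolve(Map<String, String> row, String fetchKeyColumn,
                                   String emitKeyColumn, String idColumn) {
        // fetchKey comes from the fetchKeyColumn, if one is configured.
        String fetchKey = fetchKeyColumn == null ? null : row.get(fetchKeyColumn);
        // emitKey falls back to the fetchKeyColumn value when no emitKeyColumn is configured.
        String emitKey = emitKeyColumn == null ? fetchKey : row.get(emitKeyColumn);
        // id falls back to the fetchKeyColumn value when no idColumn is configured.
        String id = idColumn == null ? fetchKey : row.get(idColumn);

        // Every remaining column becomes metadata; the three special columns are skipped.
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            String column = cell.getKey();
            if (column.equals(fetchKeyColumn) || column.equals(emitKeyColumn)
                    || column.equals(idColumn)) {
                continue;
            }
            metadata.put(column, cell.getValue());
        }
        return new Resolved(id, fetchKey, emitKey, metadata);
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("path", "docs/a.pdf");
        row.put("author", "Ada");
        // Only a fetchKeyColumn is configured, so id and emitKey both fall back to it.
        Resolved r = resolve(row, "path", null, null);
        System.out.println(r.id + " | " + r.fetchKey + " | " + r.emitKey + " | " + r.metadata);
    }
}
```

In the real class, the column names would be supplied through setFetchKeyColumn, setEmitKeyColumn, and setIdColumn rather than passed as arguments.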
-
Field Summary
Fields inherited from class org.apache.tika.pipes.pipesiterator.PipesIterator
COMPLETED_SEMAPHORE, DEFAULT_MAX_WAIT_MS, DEFAULT_QUEUE_SIZE
-
Constructor Summary
-
Method Summary
Modifier and Type    Method    Description
void                 checkInitialization(InitializableProblemHandler problemHandler)
protected void       enqueue()
void                 setCsvPath(String csvPath)
void                 setCsvPath(Path csvPath)
void                 setEmitKeyColumn(String emitKeyColumn)
void                 setFetchKeyColumn(String fetchKeyColumn)
void                 setIdColumn(String idColumn)
Methods inherited from class org.apache.tika.pipes.pipesiterator.PipesIterator
build, call, getEmitterName, getFetcherName, getHandlerConfig, getOnParseException, initialize, iterator, setEmitterName, setFetcherName, setHandlerType, setMaxEmbeddedResources, setMaxWaitMs, setOnParseException, setOnParseException, setParseMode, setParseMode, setQueueSize, setThrowOnWriteLimitReached, setWriteLimit, tryToAdd
Methods inherited from class org.apache.tika.config.ConfigBase
buildComposite, buildComposite, buildSingle, buildSingle, configure, handleSettings
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.tika.config.Initializable
initialize
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
CSVPipesIterator
public CSVPipesIterator()
-
-
Method Details
-
setCsvPath
-
setFetchKeyColumn
-
setEmitKeyColumn
-
setIdColumn
-
setCsvPath
-
enqueue
- Specified by:
enqueue in class PipesIterator
- Throws:
InterruptedException
IOException
TimeoutException
-
checkInitialization
public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
- Specified by:
checkInitialization in interface Initializable
- Overrides:
checkInitialization in class PipesIterator
- Parameters:
problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
- Throws:
TikaConfigException
-