Package org.apache.tika.batch
Class BatchProcess
java.lang.Object
org.apache.tika.batch.BatchProcess
- All Implemented Interfaces:
Callable<ParallelFileProcessingResult>
This is the main processor class for a single process. This class can only be run once.
It requires a FileResourceCrawler and FileResourceConsumers, and it can also support a StatusReporter and an Interrupter.
This is designed to shut down if a parser has timed out or if there is an OutOfMemoryError. Consider using BatchProcessDriverCLI as a daemon/watchdog that monitors and can restart this batch process.
Note that this class redirects stderr to stdout so that it can communicate with the parent process on stderr without interference.
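The run-once restriction and the stream redirection described above can be illustrated with a minimal sketch. It uses only the JDK and is not the Tika implementation: the class name RunOnceProcessSketch, the returned string, and the status message are hypothetical, and restoring stderr at the end is an assumption made so the example cleans up after itself.

```java
import java.io.PrintStream;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for illustration only; this is not org.apache.tika.batch.BatchProcess.
final class RunOnceProcessSketch implements Callable<String> {

    // Flips to true on the first call so that any later call can be rejected.
    private final AtomicBoolean alreadyExecuted = new AtomicBoolean(false);

    @Override
    public String call() {
        if (!alreadyExecuted.compareAndSet(false, true)) {
            throw new IllegalStateException("This process can only be run once.");
        }
        // Keep a handle on the original stderr as the channel to the parent process.
        PrintStream parentChannel = System.err;
        // From here on, anything else written to System.err ends up on stdout.
        System.setErr(System.out);
        try {
            // ... the real work (crawling and consuming files) would happen here ...
            parentChannel.println("status: finished");
            return "finished";
        } finally {
            // Assumption of this sketch: put stderr back once the run is over.
            System.setErr(parentChannel);
        }
    }
}
```

A second call on the same instance throws IllegalStateException, so a caller that needs to process another batch would create a new instance.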
-
Constructor Summary
Constructor and Description
BatchProcess(FileResourceCrawler fileResourceCrawler, ConsumersManager consumersManager, StatusReporter reporter, Interrupter interrupter)
-
Method Summary
Modifier and Type, Method, and Description
ParallelFileProcessingResult call()
- Runs main execution loop.
void setMaxAliveTimeSeconds(int maxAliveTimeSeconds)
- The maximum amount of time that this process can be alive.
void setPauseOnEarlyTerminationMillis(long pauseOnEarlyTerminationMillis)
- If there is an early termination via an interrupt or too many timed out consumers or because a consumer or other Runnable threw a Throwable, pause this long before interrupting the consumers and other threads.
void setTimeoutCheckPulseMillis(long timeoutCheckPulseMillis)
void setTimeoutThresholdMillis(long timeoutThresholdMillis)
- The amount of time allowed before a consumer should be timed out.
-
Constructor Details
-
BatchProcess
public BatchProcess(FileResourceCrawler fileResourceCrawler, ConsumersManager consumersManager, StatusReporter reporter, Interrupter interrupter)
-
-
Method Details
-
call
public ParallelFileProcessingResult call() throws InterruptedException
Runs main execution loop. Redirects stderr to stdout to keep communication with the parent process over stderr clean.
- Specified by:
call in interface Callable<ParallelFileProcessingResult>
- Returns:
- result of the processing
- Throws:
InterruptedException
-
setPauseOnEarlyTerminationMillis
public void setPauseOnEarlyTerminationMillis(long pauseOnEarlyTerminationMillis)
If there is an early termination via an interrupt or too many timed out consumers or because a consumer or other Runnable threw a Throwable, pause this long before interrupting the consumers and other threads. Typically makes sense for this to be the same or slightly larger than timeoutThresholdMillis.
- Parameters:
pauseOnEarlyTerminationMillis
- how long to pause if there is an early termination
-
setTimeoutThresholdMillis
public void setTimeoutThresholdMillis(long timeoutThresholdMillis)
The amount of time allowed before a consumer should be timed out.
- Parameters:
timeoutThresholdMillis
- threshold in milliseconds before declaring a consumer timed out
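The threshold check can be pictured with a small JDK-only sketch. It is not taken from the Tika source: the class TimeoutCheckSketch, the method findTimedOut, and the map of start times keyed by consumer id are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of org.apache.tika.batch.
final class TimeoutCheckSketch {

    // Returns the ids of consumers whose current task has been running
    // for longer than thresholdMillis at time nowMillis.
    static List<String> findTimedOut(Map<String, Long> startedAtMillis,
                                     long nowMillis,
                                     long thresholdMillis) {
        List<String> timedOut = new ArrayList<>();
        for (Map.Entry<String, Long> entry : startedAtMillis.entrySet()) {
            long elapsedMillis = nowMillis - entry.getValue();
            if (elapsedMillis > thresholdMillis) {
                timedOut.add(entry.getKey());
            }
        }
        return timedOut;
    }
}
```

A monitoring loop could run a check like this at a fixed interval. That this interval is what timeoutCheckPulseMillis controls is an assumption based on the parameter name, since this page gives no description for that setter.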
-
setTimeoutCheckPulseMillis
public void setTimeoutCheckPulseMillis(long timeoutCheckPulseMillis)
-
setMaxAliveTimeSeconds
public void setMaxAliveTimeSeconds(int maxAliveTimeSeconds)
The maximum amount of time that this process can be alive. To avoid memory leaks, it is sometimes beneficial to shut down (and restart) the process periodically. If the value is < 0, the process will run until completion, interruption or exception.
- Parameters:
maxAliveTimeSeconds
- maximum amount of time in seconds to remain alive
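The rule for negative values can be sketched as a simple predicate. This is a JDK-only illustration, not the Tika implementation: MaxAliveSketch and shouldShutDown are hypothetical names, and using a strict "greater than" comparison at the limit is an assumption of the sketch.

```java
// Hypothetical helper for illustration only; not part of org.apache.tika.batch.
final class MaxAliveSketch {

    // Returns true when the process has been alive longer than the configured maximum.
    // A negative maxAliveTimeSeconds means "no limit", so the answer is always false.
    static boolean shouldShutDown(long startMillis, long nowMillis, int maxAliveTimeSeconds) {
        if (maxAliveTimeSeconds < 0) {
            return false;
        }
        long aliveMillis = nowMillis - startMillis;
        // Multiply by 1000L so the conversion to milliseconds is done in long arithmetic.
        return aliveMillis > maxAliveTimeSeconds * 1000L;
    }
}
```

A watchdog such as the one suggested in the class description could then restart the process after it shuts itself down.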
-