Package org.apache.tika.batch
Class BatchProcess
java.lang.Object
    org.apache.tika.batch.BatchProcess
All Implemented Interfaces:
Callable<ParallelFileProcessingResult>
public class BatchProcess extends Object implements Callable<ParallelFileProcessingResult>
This is the main processor class for a single process. This class can only be run once. It requires a FileResourceCrawler and FileResourceConsumers, and it can also support a StatusReporter and an Interrupter. It is designed to shut down if a parser has timed out or if there is an OutOfMemoryError. Consider using BatchProcessDriverCLI as a daemon/watchdog that monitors and can restart this batch process.
Note that this class redirects stderr to stdout so that it can communicate with the parent process on stderr without interference.
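The "can only be run once" contract can be enforced with a guard flag that flips on the first call. The following is a minimal, stdlib-only sketch of that pattern; `RunOnceTask` is a hypothetical stand-in returning a `String`, not Tika's actual `BatchProcess` implementation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a single-use batch task; not Tika's BatchProcess.
public class RunOnceTask implements Callable<String> {
    // Flips from false to true on the first call() and never resets.
    private final AtomicBoolean alreadyExecuted = new AtomicBoolean(false);

    @Override
    public String call() {
        // compareAndSet succeeds only for the first caller, even across threads.
        if (!alreadyExecuted.compareAndSet(false, true)) {
            throw new IllegalStateException("This task can only be run once.");
        }
        return "done";
    }

    public static void main(String[] args) {
        RunOnceTask task = new RunOnceTask();
        System.out.println(task.call()); // prints "done"
        try {
            task.call();
        } catch (IllegalStateException e) {
            System.out.println("second call rejected: " + e.getMessage());
        }
    }
}
```

Using an `AtomicBoolean` rather than a plain field keeps the check correct if two threads race to call the same instance.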
Nested Class Summary
Modifier and Type    Class
static class         BatchProcess.BATCH_CONSTANTS
Constructor Summary
Constructor
BatchProcess(FileResourceCrawler fileResourceCrawler, ConsumersManager consumersManager, StatusReporter reporter, Interrupter interrupter)
Method Summary
Modifier and Type               Method and Description
ParallelFileProcessingResult    call()
                                Runs main execution loop.
void                            setMaxAliveTimeSeconds(int maxAliveTimeSeconds)
                                The maximum amount of time that this process can be alive.
void                            setPauseOnEarlyTerminationMillis(long pauseOnEarlyTerminationMillis)
                                If there is an early termination (via an interrupt, because too many consumers have timed out, or because a consumer or other Runnable threw a Throwable), pause this long before killing the consumers and other threads.
void                            setTimeoutCheckPulseMillis(long timeoutCheckPulseMillis)
void                            setTimeoutThresholdMillis(long timeoutThresholdMillis)
                                The amount of time allowed before a consumer should be timed out.
Constructor Detail
BatchProcess
public BatchProcess(FileResourceCrawler fileResourceCrawler, ConsumersManager consumersManager, StatusReporter reporter, Interrupter interrupter)
Method Detail
-
call
public ParallelFileProcessingResult call() throws InterruptedException
Runs main execution loop.
Redirects stdout to stderr to keep clean communications over stdout with parent process.
Specified by:
    call in interface Callable<ParallelFileProcessingResult>
Returns:
    result of the processing
Throws:
    InterruptedException
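A minimal sketch of the stream redirection described in the class overview: stderr is pointed at stdout so stray warnings do not pollute the parent's channel, while the original stderr is kept as a private channel for status lines. `StreamRedirectDemo`, `runWithErrOnStdout`, and the `DONE` status message are hypothetical; this illustrates the general technique, not Tika's code.

```java
import java.io.PrintStream;

// Hypothetical sketch of redirecting stderr to stdout while keeping a private
// channel to a parent process; not Tika's implementation.
public class StreamRedirectDemo {

    /**
     * Runs work with System.err pointed at System.out, so warnings written to
     * stderr end up on stdout. Status lines go to parentChannel, which should be
     * the original stderr captured before redirection.
     */
    public static void runWithErrOnStdout(Runnable work, PrintStream parentChannel) {
        PrintStream savedErr = System.err;
        System.setErr(System.out);           // library warnings now land on stdout
        try {
            work.run();
            parentChannel.println("DONE");   // hypothetical status message for the parent
        } finally {
            System.setErr(savedErr);         // restore the original stderr afterwards
        }
    }

    public static void main(String[] args) {
        // System.err is evaluated here, before redirection, so it is the real stderr.
        runWithErrOnStdout(() -> System.err.println("parser warning"), System.err);
    }
}
```

Running `main` sends "parser warning" to stdout and "DONE" to stderr, so a parent reading only stderr sees just the status line.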
setPauseOnEarlyTerminationMillis
public void setPauseOnEarlyTerminationMillis(long pauseOnEarlyTerminationMillis)
If there is an early termination (via an interrupt, because too many consumers have timed out, or because a consumer or other Runnable threw a Throwable), pause this long before killing the consumers and other threads. Typically it makes sense for this to be the same as or slightly larger than timeoutThresholdMillis.
Parameters:
    pauseOnEarlyTerminationMillis - how long to pause if there is an early termination
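The pause-then-kill behavior can be approximated with a standard `ExecutorService`: stop accepting work, wait up to the pause for running consumers to finish, then interrupt whatever is left. This sketch assumes consumers run on an executor and respond to interruption; `EarlyTerminationDemo` and `pauseThenKill` are hypothetical names, not part of Tika.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "pause, then kill" on early termination; not Tika's code.
public class EarlyTerminationDemo {

    /**
     * Gives running consumers up to pauseMillis to finish, then interrupts any
     * that remain. Returns true if everything finished within the pause.
     */
    public static boolean pauseThenKill(ExecutorService consumers, long pauseMillis)
            throws InterruptedException {
        consumers.shutdown();                 // no new work is accepted
        boolean finished = consumers.awaitTermination(pauseMillis, TimeUnit.MILLISECONDS);
        if (!finished) {
            consumers.shutdownNow();          // interrupt stragglers and drop queued tasks
        }
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { /* a consumer that finishes immediately */ });
        pool.submit(() -> {
            try {
                Thread.sleep(10_000);         // a consumer stuck on a slow file
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // expected: false, since the second task sleeps far longer than the pause
        System.out.println("all finished in time: " + pauseThenKill(pool, 200));
    }
}
```

Choosing a pause at least as long as the timeout threshold gives a consumer that is merely slow a fair chance to complete before it is interrupted.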
setTimeoutThresholdMillis
public void setTimeoutThresholdMillis(long timeoutThresholdMillis)
The amount of time allowed before a consumer should be timed out.
Parameters:
    timeoutThresholdMillis - threshold in milliseconds before declaring a consumer timed out
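One way to apply a timeout threshold is to record when each consumer started its current file and periodically compare the elapsed time against the threshold. In this sketch `ConsumerTimeoutTracker` and its methods are hypothetical, and treating "strictly longer than the threshold" as timed out is an assumption; the caller passes in the clock value so the logic is deterministic and easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker illustrating a per-consumer timeout threshold; not Tika's code.
public class ConsumerTimeoutTracker {
    private final long timeoutThresholdMillis;
    // consumer id -> time (ms) at which it started working on its current file
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

    public ConsumerTimeoutTracker(long timeoutThresholdMillis) {
        this.timeoutThresholdMillis = timeoutThresholdMillis;
    }

    public void started(String consumerId, long nowMillis) {
        startTimes.put(consumerId, nowMillis);
    }

    public void finished(String consumerId) {
        startTimes.remove(consumerId);
    }

    /** Returns the ids of consumers that have spent longer than the threshold on their current file. */
    public List<String> timedOut(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : startTimes.entrySet()) {
            if (nowMillis - e.getValue() > timeoutThresholdMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ConsumerTimeoutTracker tracker = new ConsumerTimeoutTracker(60_000); // illustrative 60 s threshold
        tracker.started("consumer-1", 0);
        tracker.started("consumer-2", 30_000);
        System.out.println(tracker.timedOut(70_000)); // prints [consumer-1]
    }
}
```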
setTimeoutCheckPulseMillis
public void setTimeoutCheckPulseMillis(long timeoutCheckPulseMillis)
setMaxAliveTimeSeconds
public void setMaxAliveTimeSeconds(int maxAliveTimeSeconds)
The maximum amount of time that this process can be alive. To avoid memory leaks, it is sometimes beneficial to shut down (and restart) the process periodically. If the value is < 0, the process will run until completion, interruption or exception.
Parameters:
    maxAliveTimeSeconds - maximum amount of time in seconds to remain alive
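The rule for maxAliveTimeSeconds can be expressed as a small predicate. This sketch assumes the process is due for shutdown once the elapsed time reaches the limit (the documentation does not say whether the boundary is inclusive); `MaxAliveCheck.shouldShutDown` is a hypothetical name.

```java
// Hypothetical predicate for the max-alive rule; not Tika's code.
public class MaxAliveCheck {

    /**
     * Returns true once the process has been alive for at least maxAliveTimeSeconds.
     * A negative value means no limit: run until completion, interruption or exception.
     */
    public static boolean shouldShutDown(long startMillis, long nowMillis, int maxAliveTimeSeconds) {
        if (maxAliveTimeSeconds < 0) {
            return false;
        }
        long elapsedMillis = nowMillis - startMillis;
        // Multiply as long to avoid int overflow for large second values.
        return elapsedMillis >= maxAliveTimeSeconds * 1000L;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        System.out.println(shouldShutDown(start, start + 5_000, 3));  // prints true
        System.out.println(shouldShutDown(start, start + 5_000, -1)); // prints false
    }
}
```

A main loop could evaluate this periodically and, when it returns true, exit cleanly so that a watchdog such as BatchProcessDriverCLI can restart the process.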