Package org.apache.tika.batch
Class FileResourceConsumer
java.lang.Object
org.apache.tika.batch.FileResourceConsumer
- All Implemented Interfaces:
Callable<IFileProcessorFutureResult>
- Direct Known Subclasses:
AbstractFSConsumer, AbstractProfiler
public abstract class FileResourceConsumer
extends Object
implements Callable<IFileProcessorFutureResult>
This is a base class for file consumers. The
goal of this class is to abstract out the multithreading
and record-keeping components.
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
IFileProcessorFutureResult | call() |
org.apache.tika.batch.FileStarted | checkForTimedOutMillis(long staleThresholdMillis) | Checks to see if the currentFile being processed (if there is one) should be timed out (still being worked on after staleThresholdMillis).
protected void | close |
protected void | flushAndClose(Closeable closeable) |
org.apache.tika.batch.FileStarted | getCurrentFile() | Returns the name and start time of a file that is currently being processed.
int | getNumHandledExceptions() |
int | getNumResourcesConsumed() |
protected String | getXMLifiedLogMsg(String type, String resourceId, String... attrs) |
protected String | getXMLifiedLogMsg(String type, String resourceId, Throwable t, String... attrs) | Use this for structured output that captures resourceId and other attributes.
protected void | incrementHandledExceptions() | Make sure to call this appropriately!
boolean | isStillActive() | Returns whether the consumer could still process a file or is still processing one (ACTIVELY_CONSUMING or ASKED_TO_SHUTDOWN).
protected void | parse(String resourceId, Parser parser, InputStream is, ContentHandler handler, Metadata m, ParseContext parseContext) | Utility method to handle logging equivalently among all implementing classes.
void | pleaseShutdown() | This politely asks the consumer to shut down.
abstract boolean | processFileResource(FileResource fileResource) | Main piece of code that needs to be implemented.
-
Field Details
-
LOG
protected static final org.slf4j.Logger LOG
-
TIMED_OUT
-
OOM
-
IO_IS
-
IO_OS
-
PARSE_ERR
-
PARSE_EX
-
ELAPSED_MILLIS
-
-
Constructor Details
-
FileResourceConsumer
-
-
Method Details
-
call
- Specified by:
call in interface Callable<IFileProcessorFutureResult>
-
processFileResource
Main piece of code that needs to be implemented. Clients are responsible for closing streams and handling the exceptions that they'd like to handle. Unchecked throwables can be thrown past this method. When an unchecked throwable is thrown, this class logs the error and then rethrows it. Clients/subclasses should make sure to catch and handle everything they can. The design goal is that the whole process should close up and shut down soon after an unchecked exception or error is thrown.
Make sure to call incrementHandledExceptions() appropriately in your implementation of this method.
- Parameters:
fileResource - resource to process
- Returns:
whether or not a file was successfully processed
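The contract described above (handle what you can, count what you handled, let unchecked throwables escape) can be illustrated with a minimal, self-contained sketch. It does not use the Tika classes: SketchConsumer, CountingConsumer, processResource and doWork are hypothetical stand-ins, and only the method names incrementHandledExceptions and getNumHandledExceptions mirror this page.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a consumer base class; this is NOT the Tika API.
abstract class SketchConsumer {
    private final AtomicInteger handledExceptions = new AtomicInteger();

    // Subclasses call this each time they catch and handle an exception.
    protected void incrementHandledExceptions() {
        handledExceptions.incrementAndGet();
    }

    public int getNumHandledExceptions() {
        return handledExceptions.get();
    }

    // Returns true only if the resource was processed successfully.
    public abstract boolean processResource(String resourceId);
}

class CountingConsumer extends SketchConsumer {
    @Override
    public boolean processResource(String resourceId) {
        try {
            doWork(resourceId);
            return true;
        } catch (IOException e) {
            // An expected, recoverable failure: record it and report "not processed".
            incrementHandledExceptions();
            return false;
        }
        // RuntimeException and Error are deliberately not caught, so they
        // propagate to the caller, which can then shut the process down.
    }

    // Hypothetical work step used only to trigger the two failure paths.
    private void doWork(String resourceId) throws IOException {
        if (resourceId.startsWith("bad")) {
            throw new IOException("cannot read " + resourceId);
        }
        if (resourceId.startsWith("fatal")) {
            throw new IllegalStateException("unrecoverable: " + resourceId);
        }
    }
}
```

In this sketch a resource id starting with "bad" increments the handled-exception count and returns false, while one starting with "fatal" escapes as an unchecked exception.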
-
incrementHandledExceptions
protected void incrementHandledExceptions()
Make sure to call this appropriately!
-
isStillActive
public boolean isStillActive()
Returns whether the consumer could still process a file or is still processing one (ACTIVELY_CONSUMING or ASKED_TO_SHUTDOWN).
- Returns:
whether this consumer is still active
-
pleaseShutdown
public void pleaseShutdown()
This politely asks the consumer to shut down. Before processing another file, the consumer will check to see if it has been asked to terminate.
This offers another method for politely requesting that a FileResourceConsumer stop processing, besides passing it a PoisonFileResource.
-
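The two polite ways of stopping a consumer mentioned for pleaseShutdown() (a shutdown flag checked before each new item, and a poison item placed on the queue) might look roughly like the following sketch. PoliteConsumer and its POISON sentinel are hypothetical stand-ins, not the Tika implementation, and the non-blocking poll() is a simplification chosen to keep the example deterministic.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;

// Hypothetical consumer illustrating flag-based and poison-based shutdown.
class PoliteConsumer implements Callable<Integer> {
    // Sentinel instance playing the role of a poison resource.
    // A distinct String object is created so it can be recognized by identity.
    static final String POISON = new String("POISON");

    private final BlockingQueue<String> queue;
    // volatile so that a request made on another thread becomes visible here.
    private volatile boolean askedToShutdown = false;

    PoliteConsumer(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    public void pleaseShutdown() {
        askedToShutdown = true;
    }

    @Override
    public Integer call() {
        int consumed = 0;
        // The flag is checked before taking each new item, never mid-item.
        while (!askedToShutdown) {
            // Simplification: a long-running consumer would block with a timeout.
            String next = queue.poll();
            if (next == null || next == POISON) {
                break; // queue drained, or poison received
            }
            consumed++; // "process" the item
        }
        return consumed;
    }
}
```

Because the flag is only read between items, a request to shut down never interrupts work that is already in progress.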
getCurrentFile
public org.apache.tika.batch.FileStarted getCurrentFile()
Returns the name and start time of a file that is currently being processed. If no file is currently being processed, this will return null.
- Returns:
FileStarted or null
-
getNumResourcesConsumed
public int getNumResourcesConsumed()
-
getNumHandledExceptions
public int getNumHandledExceptions()
-
checkForTimedOutMillis
public org.apache.tika.batch.FileStarted checkForTimedOutMillis(long staleThresholdMillis)
Checks to see if the currentFile being processed (if there is one) should be timed out (still being worked on after staleThresholdMillis).
If the consumer should be timed out, this will return the currentFile and set the state to TIMED_OUT.
If the consumer was already timed out earlier, is not processing a file, or has been working on a file for less than staleThresholdMillis, then this will return null.
- Parameters:
staleThresholdMillis - threshold to determine whether the consumer has gone stale
- Returns:
null or the FileStarted that triggered the stale condition
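The three null-returning cases and the single non-null case described for checkForTimedOutMillis can be modeled with a small state machine. The sketch below is an assumption-laden illustration: StaleCheckSketch, Started, start and finish are hypothetical names, the current time is passed in explicitly to make the behavior easy to test, and treating "exactly at the threshold" as not yet stale is a choice made here, not something this page specifies.

```java
// Hypothetical model of a stale-file check; not the Tika implementation.
class StaleCheckSketch {
    enum State { ACTIVELY_CONSUMING, TIMED_OUT }

    // Stand-in for a record holding a resource id and its start time.
    static final class Started {
        final String resourceId;
        final long startedMillis;

        Started(String resourceId, long startedMillis) {
            this.resourceId = resourceId;
            this.startedMillis = startedMillis;
        }
    }

    private State state = State.ACTIVELY_CONSUMING;
    private Started current; // null while no file is being processed

    synchronized void start(String resourceId, long nowMillis) {
        current = new Started(resourceId, nowMillis);
    }

    synchronized void finish() {
        current = null;
    }

    // Returns the in-progress file only on the call that first detects staleness.
    synchronized Started checkForTimedOut(long staleThresholdMillis, long nowMillis) {
        if (state == State.TIMED_OUT || current == null) {
            return null; // already timed out earlier, or idle
        }
        long elapsed = nowMillis - current.startedMillis;
        if (elapsed > staleThresholdMillis) {
            state = State.TIMED_OUT;
            return current;
        }
        return null; // still within the threshold
    }
}
```

A monitoring thread could call such a check periodically and act on the first non-null result; later calls return null because the state has already moved to TIMED_OUT.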
-
getXMLifiedLogMsg
-
getXMLifiedLogMsg
Use this for structured output that captures resourceId and other attributes.
- Parameters:
type - entity name for exception
resourceId - resourceId string
t - throwable; can be null
attrs - array of key0, value0, key1, value1, etc.
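To make the key0, value0, key1, value1 convention concrete, here is a minimal sketch of how such a structured message could be built with the JDK's StAX writer. The output layout (an element named after type, a resourceId attribute, the extra attributes, and the throwable's string form as text) is an assumption for illustration only; the actual format produced by getXMLifiedLogMsg is not specified on this page. XmlLogSketch and xmlLogMsg are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper; the element/attribute layout is an assumed format.
class XmlLogSketch {
    static String xmlLogMsg(String type, String resourceId, Throwable t, String... attrs) {
        if (attrs.length % 2 != 0) {
            throw new IllegalArgumentException("attrs must be key/value pairs");
        }
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement(type);
            w.writeAttribute("resourceId", resourceId);
            // Walk the varargs two at a time: attrs[i] is a key, attrs[i + 1] its value.
            for (int i = 0; i < attrs.length; i += 2) {
                w.writeAttribute(attrs[i], attrs[i + 1]);
            }
            if (t != null) {
                // The writer escapes special characters in the text content.
                w.writeCharacters(String.valueOf(t));
            }
            w.writeEndElement();
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Using an XML writer rather than string concatenation means attribute values such as resource ids containing `&` or `<` are escaped automatically.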
-
close
-
flushAndClose
-
parse
protected void parse(String resourceId, Parser parser, InputStream is, ContentHandler handler, Metadata m, ParseContext parseContext) throws Throwable
Utility method to handle logging equivalently among all implementing classes. Use, override or avoid as desired.
- Parameters:
resourceId - resourceId
parser - parser to use
is - inputStream (will be closed by this method!)
handler - handler for the content
m - metadata
parseContext - parse context
- Throws:
Throwable - logs and then throws whatever was thrown (if anything)
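The behavior documented for parse (log whatever is thrown, rethrow it, and always close the input stream) follows a common try/catch/finally shape. The sketch below shows that shape with hypothetical stand-ins: ParseWrapperSketch, StreamParser and the in-memory log list are illustrative only and replace the Tika Parser, ContentHandler, Metadata, ParseContext and SLF4J logger.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper illustrating "log, rethrow, always close".
class ParseWrapperSketch {
    // Stand-in for a parser; any exception type may be thrown.
    interface StreamParser {
        void parse(InputStream is) throws Exception;
    }

    // In-memory log used instead of a real logger so the sketch is self-contained.
    final List<String> log = new ArrayList<>();

    void parse(String resourceId, StreamParser parser, InputStream is) throws Throwable {
        long start = System.currentTimeMillis();
        try {
            parser.parse(is);
        } catch (Throwable t) {
            long elapsed = System.currentTimeMillis() - start;
            // Record the failure with the resource id, then let it propagate unchanged.
            log.add("failed resourceId=" + resourceId + " elapsedMillis=" + elapsed + " cause=" + t);
            throw t;
        } finally {
            // The stream is closed whether or not parsing succeeded.
            try {
                is.close();
            } catch (IOException e) {
                log.add("close failed resourceId=" + resourceId + " cause=" + e);
            }
        }
    }
}
```

Callers of a method shaped like this should not reuse the stream afterwards, since it is closed on both the success and the failure path.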
-