Package org.apache.tika.batch
Class FileResourceConsumer
- java.lang.Object
  - org.apache.tika.batch.FileResourceConsumer
- All Implemented Interfaces:
  Callable<IFileProcessorFutureResult>
- Direct Known Subclasses:
  AbstractFSConsumer, AbstractProfiler
public abstract class FileResourceConsumer extends Object implements Callable<IFileProcessorFutureResult>
This is a base class for file consumers. The goal of this class is to abstract out the multithreading and recordkeeping components.
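As a rough illustration of that split, here is a hypothetical, heavily simplified analogue in plain Java. It is not Tika's implementation: `Job`, `SketchConsumer`, `NonEmptyIdConsumer` and `SketchDemo` are invented names, the 100 ms poll timeout is an assumption, and `call()` returns a plain `Integer` count instead of an `IFileProcessorFutureResult`. The base class owns the queue loop and the bookkeeping, so a subclass only has to say how one item is processed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a queued FileResource.
final class Job {
    final String id;

    Job(String id) {
        this.id = id;
    }
}

// Hypothetical analogue of the base class: it owns the queue loop and the counter.
abstract class SketchConsumer implements Callable<Integer> {
    private final ArrayBlockingQueue<Job> queue;
    private int numProcessed = 0;

    SketchConsumer(ArrayBlockingQueue<Job> queue) {
        this.queue = queue;
    }

    // The only piece a subclass supplies; returns true on success.
    abstract boolean process(Job job);

    @Override
    public Integer call() throws InterruptedException {
        while (true) {
            // Assumption: 100 ms without a new item means the work is done.
            Job job = queue.poll(100, TimeUnit.MILLISECONDS);
            if (job == null) {
                return numProcessed;
            }
            if (process(job)) {
                numProcessed++;
            }
        }
    }
}

// Example subclass: a job "succeeds" when its id is non-empty.
final class NonEmptyIdConsumer extends SketchConsumer {
    NonEmptyIdConsumer(ArrayBlockingQueue<Job> queue) {
        super(queue);
    }

    @Override
    boolean process(Job job) {
        return !job.id.isEmpty();
    }
}

// Runs one consumer synchronously over the given ids and returns its count.
final class SketchDemo {
    static int run(String... ids) {
        ArrayBlockingQueue<Job> queue = new ArrayBlockingQueue<>(ids.length + 1);
        for (String id : ids) {
            queue.add(new Job(id));
        }
        try {
            return new NonEmptyIdConsumer(queue).call();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

For example, `SketchDemo.run("a", "", "b")` returns 2, because the empty id counts as a failure.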
-
Constructor Summary
FileResourceConsumer(ArrayBlockingQueue<FileResource> fileQueue)
-
Method Summary
- IFileProcessorFutureResult call()
- org.apache.tika.batch.FileStarted checkForTimedOutMillis(long staleThresholdMillis)
  Checks whether the current file being processed (if there is one) should be timed out, that is, is still being worked on after staleThresholdMillis.
- protected void close(Closeable closeable)
- protected void flushAndClose(Closeable closeable)
- org.apache.tika.batch.FileStarted getCurrentFile()
  Returns the name and start time of the file currently being processed.
- int getNumHandledExceptions()
- int getNumResourcesConsumed()
- protected String getXMLifiedLogMsg(String type, String resourceId, String... attrs)
- protected String getXMLifiedLogMsg(String type, String resourceId, Throwable t, String... attrs)
  Use this for structured output that captures resourceId and other attributes.
- protected void incrementHandledExceptions()
  Make sure to call this appropriately!
- boolean isStillActive()
  Returns whether the consumer can still process a file or is still processing one (ACTIVELY_CONSUMING or ASKED_TO_SHUTDOWN).
- protected void parse(String resourceId, Parser parser, InputStream is, ContentHandler handler, Metadata m, ParseContext parseContext)
  Utility method to handle logging equivalently among all implementing classes.
- void pleaseShutdown()
  This politely asks the consumer to shut down.
- abstract boolean processFileResource(FileResource fileResource)
  Main piece of code that needs to be implemented.
-
Field Detail
-
LOG
protected static final org.slf4j.Logger LOG
-
TIMED_OUT
public static String TIMED_OUT
-
OOM
public static String OOM
-
IO_IS
public static String IO_IS
-
IO_OS
public static String IO_OS
-
PARSE_ERR
public static String PARSE_ERR
-
PARSE_EX
public static String PARSE_EX
-
ELAPSED_MILLIS
public static String ELAPSED_MILLIS
-
Constructor Detail
-
FileResourceConsumer
public FileResourceConsumer(ArrayBlockingQueue<FileResource> fileQueue)
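The constructor's bounded ArrayBlockingQueue lends itself to one producer feeding several consumers. The sketch below assumes that arrangement rather than reproducing Tika's wiring: it uses plain strings instead of FileResource objects, a hypothetical `POISON` string in place of a PoisonFileResource, and the names `SharedQueueDemo` and `run` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedQueueDemo {
    // Hypothetical sentinel playing the role of a poison resource.
    static final String POISON = "<poison>";

    // One producer, several consumers, one bounded queue.
    // Returns the total number of real items the consumers took.
    static int run(int items, int consumers, int capacity) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int c = 0; c < consumers; c++) {
                results.add(pool.submit(() -> {
                    int taken = 0;
                    // take() blocks until an item is available.
                    while (!queue.take().equals(POISON)) {
                        taken++;
                    }
                    return taken;
                }));
            }
            // put() blocks while the queue is full, so the producer never
            // gets more than `capacity` items ahead of the consumers.
            for (int i = 0; i < items; i++) {
                queue.put("file-" + i);
            }
            // One poison item per consumer, so every consumer stops.
            for (int c = 0; c < consumers; c++) {
                queue.put(POISON);
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

`SharedQueueDemo.run(10, 3, 2)` returns 10 however the items happen to be split among the three consumers. Bounding the queue keeps memory use flat when the producer is faster than the consumers.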
-
Method Detail
-
call
public IFileProcessorFutureResult call()
Specified by:
call in interface Callable<IFileProcessorFutureResult>
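Because the class implements Callable, a consumer would normally be handed to an ExecutorService and its result collected through a Future. This sketch assumes that usage and substitutes a plain `Callable<String>` for a consumer returning IFileProcessorFutureResult; `CallDemo.outcome` is a hypothetical helper. It shows that a value returned from call() comes back from Future.get(), whereas an unchecked throwable escaping call() arrives wrapped in an ExecutionException.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CallDemo {
    // Runs the task on a single worker thread and describes how it finished.
    static String outcome(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(task);
            return "returned: " + future.get();
        } catch (ExecutionException e) {
            // Whatever escaped call() is available as the cause.
            return "threw: " + e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }
}
```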
-
processFileResource
public abstract boolean processFileResource(FileResource fileResource)
This is the main method that subclasses must implement. Implementations are responsible for closing streams and for handling whichever exceptions they want to handle. Unchecked throwables can, of course, propagate out of this method; when one does, the consumer logs the error and then rethrows it. Subclasses should catch and handle everything they can: the design goal is that the whole process closes up and shuts down soon after an unchecked exception or error is thrown. Make sure to call incrementHandledExceptions() appropriately in your implementation of this method.
Parameters:
fileResource - resource to process
Returns:
whether or not a file was successfully processed
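A minimal sketch of an implementation that follows these rules, assuming a hypothetical reduced base class (`CountingBase`) and byte-array resources instead of FileResource: the method closes the stream it opened, counts the checked exception it chooses to handle, and lets unchecked throwables propagate. `FirstByteConsumer`, `processResource` and `open` are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, much-reduced stand-in for the base class: just the counter.
abstract class CountingBase {
    private int numHandledExceptions = 0;

    protected void incrementHandledExceptions() {
        numHandledExceptions++;
    }

    int getNumHandledExceptions() {
        return numHandledExceptions;
    }

    // Analogue of processFileResource; the "resource" is raw bytes here.
    abstract boolean processResource(byte[] resource);
}

// Example implementation that reads the first byte of each resource.
final class FirstByteConsumer extends CountingBase {
    int lastFirstByte = -1;

    @Override
    boolean processResource(byte[] resource) {
        // Checked IOExceptions are handled and counted; unchecked throwables
        // (e.g. a NullPointerException for a null resource) are not caught.
        try (InputStream is = open(resource)) { // opened here, so closed here
            lastFirstByte = is.read();
            return true;
        } catch (IOException e) {
            incrementHandledExceptions();
            return false;
        }
    }

    // Hypothetical opener that fails on empty input, to exercise the catch block.
    private static InputStream open(byte[] resource) throws IOException {
        if (resource.length == 0) {
            throw new IOException("empty resource");
        }
        return new ByteArrayInputStream(resource);
    }
}
```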
-
incrementHandledExceptions
protected void incrementHandledExceptions()
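Increments the count of handled exceptions reported by getNumHandledExceptions(). Make sure to call this appropriately, for example from your implementation of processFileResource(FileResource) whenever it catches and handles an exception.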
-
isStillActive
public boolean isStillActive()
Returns whether the consumer can still process a file or is still processing one (state ACTIVELY_CONSUMING or ASKED_TO_SHUTDOWN).
Returns:
whether this consumer is still active
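The activity check can be pictured as a small state machine. In this hypothetical sketch, the states ACTIVELY_CONSUMING, ASKED_TO_SHUTDOWN and TIMED_OUT come from this page, while COMPLETED, `ConsumerState` and `StateTracker` are assumptions; it is not Tika's actual state type.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical state set. ACTIVELY_CONSUMING, ASKED_TO_SHUTDOWN and TIMED_OUT
// are named in the docs; COMPLETED is an assumption.
enum ConsumerState { ACTIVELY_CONSUMING, ASKED_TO_SHUTDOWN, TIMED_OUT, COMPLETED }

final class StateTracker {
    // States in which the consumer may still take a file or is finishing one.
    private static final Set<ConsumerState> ACTIVE =
            EnumSet.of(ConsumerState.ACTIVELY_CONSUMING, ConsumerState.ASKED_TO_SHUTDOWN);

    // volatile so that a monitoring thread sees updates from the worker thread.
    private volatile ConsumerState state = ConsumerState.ACTIVELY_CONSUMING;

    void setState(ConsumerState newState) {
        state = newState;
    }

    boolean isStillActive() {
        return ACTIVE.contains(state);
    }
}
```

ASKED_TO_SHUTDOWN still counts as active because the consumer may be finishing the file it already took.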
-
pleaseShutdown
public void pleaseShutdown()
This politely asks the consumer to shut down. Before processing another file, the consumer checks whether it has been asked to terminate. This offers a way, other than passing it a PoisonFileResource, to politely request that a FileResourceConsumer stop processing.
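A minimal sketch of the two stop signals, under stated assumptions: a volatile flag set by pleaseShutdown() is checked before each new item, and a hypothetical `POISON` string stands in for a PoisonFileResource. The class name `PoliteConsumer` is invented, and stopping on an empty queue is an assumption that keeps the sketch finite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

final class PoliteConsumer {
    // Hypothetical sentinel playing the role of a PoisonFileResource.
    static final String POISON = "<poison>";

    private final ArrayBlockingQueue<String> queue;
    // volatile: pleaseShutdown() is normally called from another thread.
    private volatile boolean askedToShutdown = false;
    final List<String> processed = new ArrayList<>();

    PoliteConsumer(ArrayBlockingQueue<String> queue) {
        this.queue = queue;
    }

    void pleaseShutdown() {
        askedToShutdown = true;
    }

    // Stops when asked to, when the poison item arrives, or (an assumption
    // that keeps the sketch finite) when the queue is empty.
    void run() {
        while (!askedToShutdown) { // checked before taking each new item
            String item = queue.poll();
            if (item == null || item.equals(POISON)) {
                return;
            }
            processed.add(item);
        }
    }
}
```

With the queue a, b, POISON, c, run() processes a and b and leaves c in the queue; a consumer asked to shut down before run() processes nothing.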
-
getCurrentFile
public org.apache.tika.batch.FileStarted getCurrentFile()
Returns the name and start time of the file currently being processed. If no file is currently being processed, this returns null.
Returns:
FileStarted or null
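One plausible way to track the current file is sketched below, assuming a volatile field that is set when work starts and cleared in a finally block. `Started` is a hypothetical stand-in for org.apache.tika.batch.FileStarted and `CurrentFileTracker` is an invented name.

```java
// Hypothetical stand-in for org.apache.tika.batch.FileStarted.
final class Started {
    final String resourceId;
    final long startMillis;

    Started(String resourceId, long startMillis) {
        this.resourceId = resourceId;
        this.startMillis = startMillis;
    }
}

final class CurrentFileTracker {
    // volatile so that a watchdog thread can read it while the worker writes it.
    private volatile Started current = null;

    // Sets the current file for the duration of one unit of work.
    void process(String resourceId, Runnable work) {
        current = new Started(resourceId, System.currentTimeMillis());
        try {
            work.run();
        } finally {
            current = null; // idle again, even if the work threw
        }
    }

    // Name and start time of the file in progress, or null when idle.
    Started getCurrentFile() {
        return current;
    }
}
```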
-
getNumResourcesConsumed
public int getNumResourcesConsumed()
-
getNumHandledExceptions
public int getNumHandledExceptions()
-
checkForTimedOutMillis
public org.apache.tika.batch.FileStarted checkForTimedOutMillis(long staleThresholdMillis)
Checks whether the current file being processed (if there is one) should be timed out, that is, is still being worked on after staleThresholdMillis. If the consumer should be timed out, this returns the current file and sets the state to TIMED_OUT.
If the consumer was already timed out earlier, is not processing a file, or has been working on its file for less than staleThresholdMillis, this returns null.
Parameters:
staleThresholdMillis - threshold to determine whether the consumer has gone stale
Returns:
null, or the FileStarted that triggered the stale condition
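The timeout rule above can be sketched as follows. This is hypothetical: `StartedFile` stands in for org.apache.tika.batch.FileStarted, and unlike the documented method, `checkForTimedOutMillis` here also takes the current time as `nowMillis` so that the behaviour can be checked deterministically. Synchronizing the methods keeps the check-and-set atomic, so the stale file is reported only once.

```java
// Hypothetical stand-in for org.apache.tika.batch.FileStarted.
final class StartedFile {
    final String resourceId;
    final long startMillis;

    StartedFile(String resourceId, long startMillis) {
        this.resourceId = resourceId;
        this.startMillis = startMillis;
    }
}

final class TimeoutChecker {
    private StartedFile current = null;
    private boolean timedOut = false;

    synchronized void start(String resourceId, long nowMillis) {
        current = new StartedFile(resourceId, nowMillis);
    }

    synchronized void finish() {
        current = null;
    }

    // Unlike the documented method, the clock is passed in (nowMillis) for testability.
    synchronized StartedFile checkForTimedOutMillis(long staleThresholdMillis, long nowMillis) {
        if (timedOut || current == null) {
            return null; // already timed out, or idle
        }
        if (nowMillis - current.startMillis < staleThresholdMillis) {
            return null; // still within the allowed time
        }
        timedOut = true; // report the stale file only once
        return current;
    }

    synchronized boolean isTimedOut() {
        return timedOut;
    }
}
```

Started at 0 with a 1000 ms threshold, a check at 999 returns null, a check at 1500 returns the file and marks the consumer as timed out, and any later check returns null.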
-
getXMLifiedLogMsg
protected String getXMLifiedLogMsg(String type, String resourceId, String... attrs)
-
getXMLifiedLogMsg
protected String getXMLifiedLogMsg(String type, String resourceId, Throwable t, String... attrs)
Use this for structured output that captures resourceId and other attributes.
Parameters:
type - entity name for exception
resourceId - resourceId string
t - throwable, can be null
attrs - array of key0, value0, key1, value1, etc.
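A minimal sketch of such a structured message, assuming the StAX XMLStreamWriter from the JDK: `type` becomes the element name, `resourceId` and each key/value pair become attributes, and the throwable's stack trace becomes the text content. The element layout and the class name `XmlLogMsg` are assumptions, not the documented output format.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class XmlLogMsg {
    // Element named after `type`; attributes for resourceId and each key/value pair;
    // the stack trace of `t` (if any) as escaped text content.
    static String build(String type, String resourceId, Throwable t, String... attrs) {
        if (attrs.length % 2 != 0) {
            throw new IllegalArgumentException("attrs must be key/value pairs");
        }
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement(type);
            xml.writeAttribute("resourceId", resourceId);
            for (int i = 0; i < attrs.length; i += 2) {
                xml.writeAttribute(attrs[i], attrs[i + 1]);
            }
            if (t != null) {
                StringWriter trace = new StringWriter();
                t.printStackTrace(new PrintWriter(trace));
                xml.writeCharacters(trace.toString()); // the writer escapes <, & and >
            }
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Because the writer escapes attribute values and text, a resourceId or exception message containing `<` or `&` still yields well-formed XML.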
-
close
protected void close(Closeable closeable)
-
flushAndClose
protected void flushAndClose(Closeable closeable)
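close and flushAndClose carry no description here. Judging only from their names and the Closeable parameter, a plausible reading is a quiet close plus a flush-then-close; the sketch below assumes that behaviour, with `Closing` as an invented class name and System.err standing in for the logger.

```java
import java.io.Closeable;
import java.io.Flushable;
import java.io.IOException;

final class Closing {
    // Closes quietly: null is ignored and an IOException is reported, not thrown.
    static void close(Closeable closeable) {
        if (closeable == null) {
            return;
        }
        try {
            closeable.close();
        } catch (IOException e) {
            System.err.println("close failed: " + e.getMessage());
        }
    }

    // Flushes first when the object supports it, then closes it the same way.
    static void flushAndClose(Closeable closeable) {
        if (closeable instanceof Flushable) { // false for null
            try {
                ((Flushable) closeable).flush();
            } catch (IOException e) {
                System.err.println("flush failed: " + e.getMessage());
            }
        }
        close(closeable);
    }
}
```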
-
parse
protected void parse(String resourceId, Parser parser, InputStream is, ContentHandler handler, Metadata m, ParseContext parseContext) throws Throwable
Utility method to handle logging equivalently among all implementing classes. Use, override or avoid as desired.
Parameters:
resourceId - resourceId
parser - parser to use
is - inputStream (will be closed by this method!)
handler - handler for the content
m - metadata
parseContext - parse context
Throws:
Throwable - logs and then rethrows whatever was thrown, if anything
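A minimal sketch of the log-then-rethrow pattern described above, assuming a hypothetical `StreamParser` interface in place of Tika's Parser (handler, metadata and parse context omitted) and a list of strings in place of the logger. `ParseHelper`, `succeeded` and `TrackingStream` are invented for illustration. The finally block is what guarantees the input stream is closed whether or not parsing fails.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse call; handler, metadata and context omitted.
interface StreamParser {
    void parse(InputStream is) throws Exception;
}

final class ParseHelper {
    // Stand-in for the logger.
    final List<String> log = new ArrayList<>();

    // Runs the parser, always closes the stream, and logs then rethrows any failure.
    void parse(String resourceId, StreamParser parser, InputStream is) throws Throwable {
        try {
            parser.parse(is);
        } catch (Throwable t) {
            log.add("failed " + resourceId + ": " + t);
            throw t;
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                log.add("could not close stream for " + resourceId);
            }
        }
    }

    // Convenience wrapper for callers that only need success or failure.
    boolean succeeded(String resourceId, StreamParser parser, InputStream is) {
        try {
            parse(resourceId, parser, is);
            return true;
        } catch (Throwable t) {
            return false;
        }
    }
}

// Helper stream that remembers whether it was closed.
final class TrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackingStream(byte[] bytes) {
        super(bytes);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}
```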