Package org.apache.tika.batch
Class FileResourceCrawler
java.lang.Object
org.apache.tika.batch.FileResourceCrawler
- All Implemented Interfaces:
Callable<IFileProcessorFutureResult>
- Direct Known Subclasses:
FSDirectoryCrawler, FSListCrawler
public abstract class FileResourceCrawler
extends Object
implements Callable<IFileProcessorFutureResult>
-
Field Summary
-
Constructor Summary
Constructor | Description
FileResourceCrawler(ArrayBlockingQueue<FileResource> queue, int numConsumers)
-
Method Summary
Modifier and Type | Method | Description
org.apache.tika.batch.FileResourceCrawlerFutureResult | call()
int | getAdded()
int | getConsidered()
boolean | isActive() | If the crawler stops for any reason, it is no longer active.
boolean | isQueueEmpty() | Use sparingly.
protected boolean | select
void | setDocumentSelector(DocumentSelector documentSelector)
void | setMaxConsecWaitInMillis(long maxConsecWaitInMillis)
void | setMaxFilesToAdd(int maxFilesToAdd) | Maximum number of files to add.
void | setMaxFilesToConsider(int maxFilesToConsider) | Maximum number of files to consider.
void | shutDownNoPoison() | Set to true to shut down the FileResourceCrawler without adding poison.
abstract void | start() | Implement this to control the addition of FileResources.
protected int | tryToAdd(FileResource fileResource)
boolean | wasTimedOut() | Returns whether the crawler timed out while trying to add a resource to the queue.
-
Field Details
-
LOG
protected static final org.slf4j.Logger LOG
-
SKIPPED
protected static final int SKIPPED
-
ADDED
protected static final int ADDED
-
STOP_NOW
protected static final int STOP_NOW
-
-
Constructor Details
-
FileResourceCrawler
public FileResourceCrawler(ArrayBlockingQueue<FileResource> queue, int numConsumers)
- Parameters:
queue - shared queue
numConsumers - number of consumers (the crawler needs to know how many poisons to add when done)
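The reason the constructor takes numConsumers can be illustrated with a generic poison-pill hand-off. The following is a minimal sketch using only java.util.concurrent, not Tika code: PoisonPillSketch, POISON, and run are hypothetical names, and plain strings stand in for FileResource objects. The producer queues one poison object per consumer so that every consumer eventually sees one and stops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the poison-pill hand-off; not Tika code. */
class PoisonPillSketch {
    // Sentinel object; consumers recognise it by identity, not by content.
    static final String POISON = new String("POISON");

    /** Returns the number of real items handled across all consumers. */
    static int run(int numConsumers, List<String> items) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
        ExecutorService pool = Executors.newFixedThreadPool(numConsumers);
        try {
            // Each consumer counts items until it takes a poison object.
            Callable<Integer> consumer = () -> {
                int handled = 0;
                while (true) {
                    String next = queue.take();
                    if (next == POISON) {
                        return handled;
                    }
                    handled++;
                }
            };
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < numConsumers; i++) {
                results.add(pool.submit(consumer));
            }
            for (String item : items) {
                queue.put(item);
            }
            // One poison per consumer, which is why the count must be known.
            for (int i = 0; i < numConsumers; i++) {
                queue.put(POISON);
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(3, List.of("a.pdf", "b.doc", "c.txt", "d.html", "e.xml")));
    }
}
```

If fewer poison objects than consumers were queued, at least one consumer in this sketch would block in take() indefinitely.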
-
-
Method Details
-
start
public abstract void start() throws InterruptedException
Implement this to control the addition of FileResources. Call tryToAdd(org.apache.tika.batch.FileResource) to add FileResources to the queue.
- Throws:
InterruptedException
-
call
public org.apache.tika.batch.FileResourceCrawlerFutureResult call()
- Specified by:
call in interface Callable<IFileProcessorFutureResult>
-
tryToAdd
protected int tryToAdd(FileResource fileResource) throws InterruptedException
- Parameters:
fileResource - resource to add
- Returns:
int status of the attempt (SKIPPED, ADDED, STOP_NOW) to add the resource to the queue
- Throws:
InterruptedException
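The contract between start() and tryToAdd can be sketched with a small stand-in hierarchy. Everything here is hypothetical and self-contained: StartSketch, MiniCrawler, and ListCrawler are invented names, strings replace FileResource, a Predicate replaces DocumentSelector, and the constant values are made up because their real values are not given above. As a simplification, a full queue ends the crawl immediately rather than waiting.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Predicate;

/** Hypothetical miniature of the start()/tryToAdd contract; not Tika code. */
class StartSketch {

    abstract static class MiniCrawler {
        // Illustrative values only; the real constants' values are not shown in this page.
        protected static final int SKIPPED = 0;
        protected static final int ADDED = 1;
        protected static final int STOP_NOW = 2;

        private final ArrayBlockingQueue<String> queue;
        private final Predicate<String> selector;
        private int added = 0;

        MiniCrawler(ArrayBlockingQueue<String> queue, Predicate<String> selector) {
            this.queue = queue;
            this.selector = selector;
        }

        /** Subclasses decide where resources come from and call tryToAdd for each. */
        abstract void start();

        protected int tryToAdd(String resource) {
            if (!selector.test(resource)) {
                return SKIPPED;
            }
            // Simplification: a full queue stops the crawl at once instead of waiting.
            if (!queue.offer(resource)) {
                return STOP_NOW;
            }
            added++;
            return ADDED;
        }

        int getAdded() {
            return added;
        }
    }

    /** Feeds a fixed list of names and stops as soon as tryToAdd reports STOP_NOW. */
    static class ListCrawler extends MiniCrawler {
        private final List<String> names;

        ListCrawler(ArrayBlockingQueue<String> queue, Predicate<String> selector, List<String> names) {
            super(queue, selector);
            this.names = names;
        }

        @Override
        void start() {
            for (String name : names) {
                if (tryToAdd(name) == STOP_NOW) {
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        ListCrawler crawler = new ListCrawler(queue, n -> !n.endsWith(".tmp"),
                List.of("a.pdf", "b.tmp", "c.pdf", "d.pdf"));
        crawler.start();
        System.out.println(crawler.getAdded() + " added, " + queue.size() + " queued");
    }
}
```

In this run, b.tmp is skipped, a.pdf and c.pdf are added, and d.pdf hits the full two-slot queue, so the loop stops with two resources added.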
-
isActive
public boolean isActive()
If the crawler stops for any reason, it is no longer active.
- Returns:
whether crawler is active or not
-
setMaxConsecWaitInMillis
public void setMaxConsecWaitInMillis(long maxConsecWaitInMillis)
-
setDocumentSelector
public void setDocumentSelector(DocumentSelector documentSelector)
-
getConsidered
public int getConsidered()
-
select
-
setMaxFilesToAdd
public void setMaxFilesToAdd(int maxFilesToAdd)
Maximum number of files to add. If maxFilesToAdd < 0 (default), then this crawler will add all documents.
- Parameters:
maxFilesToAdd - maximum number of files to add to the queue
-
setMaxFilesToConsider
public void setMaxFilesToConsider(int maxFilesToConsider)
Maximum number of files to consider. A file counts as considered whether or not the DocumentSelector selects it. If maxFilesToConsider < 0 (default), then this crawler will consider all documents.
- Parameters:
maxFilesToConsider - maximum number of files to consider adding to the queue
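The difference between the two limits can be shown with a small counting loop. This is a sketch under stated assumptions rather than the crawler's actual logic: LimitSketch and crawl are hypothetical names, a Predicate stands in for DocumentSelector, and the exact boundary behavior (stopping once a count reaches its limit) is an assumption. A negative limit means no limit, as described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of "considered" versus "added" counts; not Tika code. */
class LimitSketch {

    /** Returns {considered, added}. A negative limit disables that limit. */
    static int[] crawl(List<String> names, Predicate<String> selector,
                       int maxFilesToConsider, int maxFilesToAdd) {
        int considered = 0;
        int added = 0;
        for (String name : names) {
            if (maxFilesToConsider >= 0 && considered >= maxFilesToConsider) {
                break;
            }
            if (maxFilesToAdd >= 0 && added >= maxFilesToAdd) {
                break;
            }
            // Every file that reaches this point is considered, selected or not.
            considered++;
            if (selector.test(name)) {
                added++;
            }
        }
        return new int[] {considered, added};
    }

    public static void main(String[] args) {
        List<String> names = List.of("a.pdf", "b.tmp", "c.pdf", "d.tmp", "e.pdf");
        Predicate<String> pdfOnly = n -> n.endsWith(".pdf");
        System.out.println(Arrays.toString(crawl(names, pdfOnly, -1, -1))); // [5, 3]
        System.out.println(Arrays.toString(crawl(names, pdfOnly, 2, -1)));  // [2, 1]
        System.out.println(Arrays.toString(crawl(names, pdfOnly, -1, 1)));  // [1, 1]
    }
}
```

With a consider limit of 2, the rejected b.tmp still uses up one of the two slots, so only one file is added.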
-
isQueueEmpty
public boolean isQueueEmpty()
Use sparingly. This synchronizes on the queue!
- Returns:
whether this queue contains any non-poison file resources
-
wasTimedOut
public boolean wasTimedOut()
Returns whether the crawler timed out while trying to add a resource to the queue. If the crawler timed out while trying to add poison, this is not set to true.
- Returns:
whether this was timed out or not
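One way a timed-out flag of this kind can arise is sketched below, assuming the crawler retries a timed offer until a maximum consecutive wait has elapsed. TimedAddSketch, add, and PAUSE_MILLIS are hypothetical names, and the retry strategy is an assumption; only ArrayBlockingQueue.offer(E, long, TimeUnit) is a real API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a bounded wait when adding to a full queue; not Tika code. */
class TimedAddSketch {
    // Hypothetical length of a single timed offer attempt.
    private static final long PAUSE_MILLIS = 10;

    private final ArrayBlockingQueue<String> queue;
    private final long maxConsecWaitInMillis;
    private boolean timedOut = false;

    TimedAddSketch(ArrayBlockingQueue<String> queue, long maxConsecWaitInMillis) {
        this.queue = queue;
        this.maxConsecWaitInMillis = maxConsecWaitInMillis;
    }

    /** Returns true if the resource was queued before the wait budget ran out. */
    boolean add(String resource) {
        long waited = 0;
        try {
            while (waited < maxConsecWaitInMillis) {
                if (queue.offer(resource, PAUSE_MILLIS, TimeUnit.MILLISECONDS)) {
                    return true;
                }
                waited += PAUSE_MILLIS;
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag; an interrupt is not recorded as a timeout.
            Thread.currentThread().interrupt();
            return false;
        }
        timedOut = true;
        return false;
    }

    boolean wasTimedOut() {
        return timedOut;
    }

    public static void main(String[] args) {
        TimedAddSketch crawler = new TimedAddSketch(new ArrayBlockingQueue<>(1), 30);
        System.out.println(crawler.add("a.pdf") + " " + crawler.wasTimedOut());
        // No consumer drains the queue, so the second add exhausts its wait budget.
        System.out.println(crawler.add("b.pdf") + " " + crawler.wasTimedOut());
    }
}
```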
-
getAdded
public int getAdded()
- Returns:
number of files that this crawler added to the queue
-
shutDownNoPoison
public void shutDownNoPoison()
Set to true to shut down the FileResourceCrawler without adding poison. Do this only if you've already called another mechanism to request that consumers shut down. This prevents a potential deadlock issue where the crawler is trying to add to the queue, but it is full.
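The deadlock described here can be sketched as follows: the queue is full, the consumers have already been told to stop, and nothing will ever free a slot for the poison. This is a hypothetical illustration, not the crawler's implementation. NoPoisonShutdownSketch, addPoison, and demo are invented names, and the retry loop that checks a flag between timed offers is an assumption about how such a flag could be honored.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of shutting down without queuing poison; not Tika code. */
class NoPoisonShutdownSketch {
    static final String POISON = new String("POISON");

    private final ArrayBlockingQueue<String> queue;
    private final AtomicBoolean shutDownNoPoison = new AtomicBoolean(false);

    NoPoisonShutdownSketch(ArrayBlockingQueue<String> queue) {
        this.queue = queue;
    }

    void shutDownNoPoison() {
        shutDownNoPoison.set(true);
    }

    /** Tries to queue one poison per consumer; gives up once the flag is set. */
    int addPoison(int numConsumers) {
        int queued = 0;
        try {
            for (int i = 0; i < numConsumers; i++) {
                boolean offered = false;
                // Short timed offers let the loop notice the flag between attempts.
                while (!offered && !shutDownNoPoison.get()) {
                    offered = queue.offer(POISON, 10, TimeUnit.MILLISECONDS);
                }
                if (!offered) {
                    break;
                }
                queued++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return queued;
    }

    /** Full queue, no consumers: returns the poison count seen after shutdown, or -1 if stuck. */
    static int demo() {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.add("stuck.pdf");
        NoPoisonShutdownSketch crawler = new NoPoisonShutdownSketch(queue);
        AtomicInteger queued = new AtomicInteger(-1);
        Thread worker = new Thread(() -> queued.set(crawler.addPoison(2)));
        worker.start();
        try {
            Thread.sleep(50); // give the worker time to start retrying
            crawler.shutDownNoPoison();
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return queued.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the flag, the worker in demo() would keep retrying against the full queue for as long as no consumer removed an item.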
-