Package org.apache.tika.batch
Class FileResourceCrawler
java.lang.Object
org.apache.tika.batch.FileResourceCrawler
All Implemented Interfaces:
- Callable<IFileProcessorFutureResult>

Direct Known Subclasses:
- FSDirectoryCrawler, FSListCrawler
public abstract class FileResourceCrawler
extends Object
implements Callable<IFileProcessorFutureResult>
Field Summary
- LOG, SKIPPED, ADDED, STOP_NOW

Constructor Summary
- FileResourceCrawler(ArrayBlockingQueue<FileResource> queue, int numConsumers)

Method Summary
- org.apache.tika.batch.FileResourceCrawlerFutureResult call()
- int getAdded()
- int getConsidered()
- boolean isActive(): If the crawler stops for any reason, it is no longer active.
- boolean isQueueEmpty(): Use sparingly.
- protected boolean select(Metadata m)
- void setDocumentSelector(DocumentSelector documentSelector)
- void setMaxConsecWaitInMillis(long maxConsecWaitInMillis)
- void setMaxFilesToAdd(int maxFilesToAdd): Maximum number of files to add.
- void setMaxFilesToConsider(int maxFilesToConsider): Maximum number of files to consider.
- void shutDownNoPoison(): Shut down the FileResourceCrawler without adding poison.
- abstract void start(): Implement this to control the addition of FileResources.
- protected int tryToAdd(FileResource fileResource)
- boolean wasTimedOut(): Returns whether the crawler timed out while trying to add a resource to the queue.
- 
Field Details

- LOG
  protected static final org.slf4j.Logger LOG

- SKIPPED
  protected static final int SKIPPED

- ADDED
  protected static final int ADDED

- STOP_NOW
  protected static final int STOP_NOW
Constructor Details

- FileResourceCrawler
  public FileResourceCrawler(ArrayBlockingQueue<FileResource> queue, int numConsumers)
  Parameters:
  - queue - shared queue
  - numConsumers - number of consumers (the crawler needs to know how many poison entries to add when it is done)
 
 
Method Details

- start
  public abstract void start() throws InterruptedException
  Implement this to control the addition of FileResources. Call tryToAdd(org.apache.tika.batch.FileResource) to add FileResources to the queue.
  Throws:
  - InterruptedException
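To illustrate the contract that start() and tryToAdd(FileResource) describe, the loop below is a self-contained sketch of what a concrete start() implementation does: enumerate resources and offer each one to the shared queue, stopping when the crawler signals STOP_NOW. The FileResource stand-in, the queue capacity, the file names, and the offer timeout are assumptions for illustration, not Tika's actual implementation.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CrawlLoopSketch {
    // Hypothetical stand-in for org.apache.tika.batch.FileResource.
    record FileResource(String name) {}

    // Status constants mirroring the crawler's SKIPPED / ADDED / STOP_NOW fields.
    static final int SKIPPED = 0, ADDED = 1, STOP_NOW = 2;

    // Simplified tryToAdd: offer with a timeout (as maxConsecWaitInMillis
    // suggests); the real method also applies the DocumentSelector.
    static int tryToAdd(ArrayBlockingQueue<FileResource> queue, FileResource r)
            throws InterruptedException {
        return queue.offer(r, 100, TimeUnit.MILLISECONDS) ? ADDED : STOP_NOW;
    }

    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<FileResource> queue = new ArrayBlockingQueue<>(10);
        // The shape of a concrete start(): hand each resource to tryToAdd
        // and stop crawling as soon as STOP_NOW is returned.
        for (String name : List.of("a.pdf", "b.docx", "c.txt")) {
            if (tryToAdd(queue, new FileResource(name)) == STOP_NOW) {
                break;
            }
        }
        System.out.println(queue.size()); // all three fit in the queue
    }
}
```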
 
- call
  public org.apache.tika.batch.FileResourceCrawlerFutureResult call()
  Specified by:
  - call in interface Callable<IFileProcessorFutureResult>
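Because the class implements Callable, a crawler instance is typically submitted to an ExecutorService, which invokes call() on a worker thread and exposes the crawl result through a Future. The lambda below is a hypothetical stand-in for a concrete crawler; in real code you would submit the FileResourceCrawler itself.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for a FileResourceCrawler; it implements Callable, so it
        // can be submitted directly to the pool just like this lambda.
        Callable<String> crawler = () -> "crawl complete";
        Future<String> result = pool.submit(crawler);
        System.out.println(result.get()); // blocks until call() returns
        pool.shutdown();
    }
}
```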
 
- tryToAdd
  protected int tryToAdd(FileResource fileResource) throws InterruptedException
  Parameters:
  - fileResource - resource to add
  Returns:
  - int status of the attempt (SKIPPED, ADDED, STOP_NOW) to add the resource to the queue
  Throws:
  - InterruptedException
 
- isActive
  public boolean isActive()
  If the crawler stops for any reason, it is no longer active.
  Returns:
  - whether the crawler is active or not
 
- setMaxConsecWaitInMillis
  public void setMaxConsecWaitInMillis(long maxConsecWaitInMillis)
- setDocumentSelector
  public void setDocumentSelector(DocumentSelector documentSelector)
- getConsidered
  public int getConsidered()
- select
  protected boolean select(Metadata m)
- setMaxFilesToAdd
  public void setMaxFilesToAdd(int maxFilesToAdd)
  Maximum number of files to add. If maxFilesToAdd < 0 (default), then this crawler will add all documents.
  Parameters:
  - maxFilesToAdd - maximum number of files to add to the queue
 
- setMaxFilesToConsider
  public void setMaxFilesToConsider(int maxFilesToConsider)
  Maximum number of files to consider. A file is considered whether or not the DocumentSelector selects a document. If maxFilesToConsider < 0 (default), then this crawler will add all documents.
  Parameters:
  - maxFilesToConsider - maximum number of files to consider adding to the queue
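Both limit setters share the convention that a negative value (the default) means no limit at all. A minimal sketch of that check follows; the helper name underLimit is made up for illustration and is not part of the Tika API.

```java
public class LimitSketch {
    // Hypothetical helper showing the "< 0 means unlimited" convention
    // used by setMaxFilesToAdd and setMaxFilesToConsider.
    static boolean underLimit(int countSoFar, int max) {
        return max < 0 || countSoFar < max;
    }

    public static void main(String[] args) {
        System.out.println(underLimit(1000, -1)); // default: no limit
        System.out.println(underLimit(4, 5));     // still under the limit
        System.out.println(underLimit(5, 5));     // limit reached
    }
}
```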
 
- isQueueEmpty
  public boolean isQueueEmpty()
  Use sparingly. This synchronizes on the queue!
  Returns:
  - whether this queue contains any non-poison file resources
 
- wasTimedOut
  public boolean wasTimedOut()
  Returns whether the crawler timed out while trying to add a resource to the queue. If the crawler timed out while trying to add poison, this is not set to true.
  Returns:
  - whether the crawler timed out or not
 
- getAdded
  public int getAdded()
  Returns:
  - number of files that this crawler added to the queue
 
- shutDownNoPoison
  public void shutDownNoPoison()
  Shut down the FileResourceCrawler without adding poison. Do this only if you have already used another mechanism to request that the consumers shut down. This prevents a potential deadlock in which the crawler blocks while trying to add to a full queue.
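The poison mechanism that shutDownNoPoison() bypasses can be sketched as follows: on normal completion the crawler enqueues one poison entry per consumer (the numConsumers constructor argument), and each consumer terminates when it dequeues one. The POISON sentinel, queue capacity, and file name below are assumptions for illustration, not Tika's internals.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class PoisonSketch {
    // Hypothetical stand-in for org.apache.tika.batch.FileResource.
    record FileResource(String name) {}

    // Sentinel "poison" entry; a consumer exits when it dequeues it.
    static final FileResource POISON = new FileResource("<poison>");

    public static void main(String[] args) throws InterruptedException {
        int numConsumers = 2;
        ArrayBlockingQueue<FileResource> queue = new ArrayBlockingQueue<>(10);
        queue.put(new FileResource("a.pdf"));
        // On normal shutdown the crawler adds one poison per consumer so
        // every consumer thread wakes up and terminates.
        for (int i = 0; i < numConsumers; i++) {
            queue.put(POISON);
        }
        // One consumer's loop: drain until poison is seen (the second
        // poison stays in the queue for the other consumer).
        int processed = 0;
        while (queue.take() != POISON) {
            processed++;
        }
        System.out.println(processed);
    }
}
```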
 