public abstract class FileResourceCrawler extends Object implements Callable<IFileProcessorFutureResult>
Modifier and Type | Field and Description |
---|---|
protected static int | ADDED |
protected static org.slf4j.Logger | logger |
protected static int | SKIPPED |
protected static int | STOP_NOW |
Constructor and Description |
---|
FileResourceCrawler(ArrayBlockingQueue<FileResource> queue, int numConsumers) |
Modifier and Type | Method and Description |
---|---|
org.apache.tika.batch.FileResourceCrawlerFutureResult | call() |
int | getAdded() |
int | getConsidered() |
boolean | isActive() If the crawler stops for any reason, it is no longer active. |
boolean | isQueueEmpty() Use sparingly. |
protected boolean | select(Metadata m) |
void | setDocumentSelector(DocumentSelector documentSelector) |
void | setMaxConsecWaitInMillis(long maxConsecWaitInMillis) |
void | setMaxFilesToAdd(int maxFilesToAdd) Maximum number of files to add. |
void | setMaxFilesToConsider(int maxFilesToConsider) Maximum number of files to consider. |
void | shutDownNoPoison() Shuts down the FileResourceCrawler without adding poison. |
abstract void | start() Implement this to control the addition of FileResources. |
protected int | tryToAdd(FileResource fileResource) |
boolean | wasTimedOut() Returns whether the crawler timed out while trying to add a resource to the queue. |
protected static final int SKIPPED
protected static final int ADDED
protected static final int STOP_NOW
protected static org.slf4j.Logger logger
public FileResourceCrawler(ArrayBlockingQueue<FileResource> queue, int numConsumers)
queue - shared queue
numConsumers - number of consumers (needs to know how many poisons to add when done)
public abstract void start() throws InterruptedException
Implement this to control the addition of FileResources. Call tryToAdd(org.apache.tika.batch.FileResource) to add FileResources to the queue.
Throws: InterruptedException
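The contract described for the constructor and start() can be illustrated with a small self-contained sketch. It does not use the real Tika classes: SketchCrawler, ListCrawler, crawl(), and the POISON marker are hypothetical stand-ins, and plain strings take the place of FileResource objects. The sketch assumes that the crawler first runs start() and then adds one poison item per consumer, which is why the constructor needs numConsumers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical stand-in for the crawler pattern; not the real Tika class.
abstract class SketchCrawler {
    static final String POISON = "<<poison>>"; // assumed end-of-input marker
    private final ArrayBlockingQueue<String> queue;
    private final int numConsumers;
    private int added = 0;

    SketchCrawler(ArrayBlockingQueue<String> queue, int numConsumers) {
        this.queue = queue;
        this.numConsumers = numConsumers;
    }

    // Subclasses decide which resources to offer, in the spirit of start().
    abstract void start() throws InterruptedException;

    // Blocks until there is room in the shared queue, then counts the item.
    protected void tryToAdd(String resource) throws InterruptedException {
        queue.put(resource);
        added++;
    }

    // Runs the crawl, then adds one poison item per consumer.
    void crawl() throws InterruptedException {
        start();
        for (int i = 0; i < numConsumers; i++) {
            queue.put(POISON);
        }
    }

    int getAdded() {
        return added;
    }
}

// Hypothetical subclass that "crawls" a fixed list of names.
class ListCrawler extends SketchCrawler {
    private final List<String> names;

    ListCrawler(ArrayBlockingQueue<String> queue, int numConsumers, List<String> names) {
        super(queue, numConsumers);
        this.names = names;
    }

    @Override
    void start() throws InterruptedException {
        for (String name : names) {
            tryToAdd(name);
        }
    }
}
```

With two consumers and three names, the queue in this sketch ends up holding the three names followed by two poison items, so each consumer can take exactly one poison and stop.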
public org.apache.tika.batch.FileResourceCrawlerFutureResult call()
Specified by: call in interface Callable<IFileProcessorFutureResult>
protected int tryToAdd(FileResource fileResource) throws InterruptedException
fileResource - resource to add
Throws: InterruptedException
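tryToAdd returns an int, and the class defines the constants SKIPPED, ADDED, and STOP_NOW. This page does not say how the return value maps to those constants, so the following sketch shows only one plausible reading. Everything in it is an assumption: StatusSketch is a hypothetical class, the constant values are made up, a java.util.function.Predicate stands in for DocumentSelector, and the rule that a reached limit yields STOP_NOW is a guess. The only part taken from this page is that a negative limit (the default) means no limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of status-code logic; values and rules are assumptions.
class StatusSketch {
    static final int SKIPPED = 0;
    static final int ADDED = 1;
    static final int STOP_NOW = 2;

    private final List<String> queue = new ArrayList<>();
    private final Predicate<String> selector; // stand-in for a DocumentSelector
    private int maxFilesToAdd = -1;           // < 0 means no limit
    private int maxFilesToConsider = -1;      // < 0 means no limit
    private int added = 0;
    private int considered = 0;

    StatusSketch(Predicate<String> selector) {
        this.selector = selector;
    }

    void setMaxFilesToAdd(int maxFilesToAdd) {
        this.maxFilesToAdd = maxFilesToAdd;
    }

    void setMaxFilesToConsider(int maxFilesToConsider) {
        this.maxFilesToConsider = maxFilesToConsider;
    }

    int getAdded() {
        return added;
    }

    int getConsidered() {
        return considered;
    }

    int tryToAdd(String name) {
        // Stop once either limit has been reached.
        if (maxFilesToAdd >= 0 && added >= maxFilesToAdd) {
            return STOP_NOW;
        }
        if (maxFilesToConsider >= 0 && considered >= maxFilesToConsider) {
            return STOP_NOW;
        }
        // A file counts as considered whether or not it is selected.
        considered++;
        if (!selector.test(name)) {
            return SKIPPED;
        }
        queue.add(name);
        added++;
        return ADDED;
    }
}
```

In this sketch a caller's start() loop would stop iterating as soon as it sees STOP_NOW, while SKIPPED and ADDED both mean "keep going".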
public boolean isActive()
public void setMaxConsecWaitInMillis(long maxConsecWaitInMillis)
public void setDocumentSelector(DocumentSelector documentSelector)
public int getConsidered()
protected boolean select(Metadata m)
public void setMaxFilesToAdd(int maxFilesToAdd)
If maxFilesToAdd < 0 (default), then this crawler will add all documents.
maxFilesToAdd - maximum number of files to add to the queue
public void setMaxFilesToConsider(int maxFilesToConsider)
If maxFilesToConsider < 0 (default), then this crawler will consider all documents.
maxFilesToConsider - maximum number of files to consider adding to the queue
public boolean isQueueEmpty()
public boolean wasTimedOut()
If the crawler timed out while trying to add poison, this is not set to true.
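One way a timeout flag of this kind can arise is through a timed offer on a bounded queue. The sketch below uses ArrayBlockingQueue.offer(E, long, TimeUnit) from the JDK, which returns false if no space becomes available within the wait. TimeoutSketch is a hypothetical class, and treating maxConsecWaitInMillis as the offer timeout is an assumption about how setMaxConsecWaitInMillis relates to wasTimedOut(); the real crawler may wait and count differently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a timed add; not the real Tika implementation.
class TimeoutSketch {
    private final ArrayBlockingQueue<String> queue;
    private long maxConsecWaitInMillis = 100L; // assumed default for the sketch
    private boolean timedOut = false;

    TimeoutSketch(ArrayBlockingQueue<String> queue) {
        this.queue = queue;
    }

    void setMaxConsecWaitInMillis(long maxConsecWaitInMillis) {
        this.maxConsecWaitInMillis = maxConsecWaitInMillis;
    }

    boolean wasTimedOut() {
        return timedOut;
    }

    // Returns true if the resource was queued, false if the wait ran out.
    boolean tryToAdd(String resource) throws InterruptedException {
        boolean queued = queue.offer(resource, maxConsecWaitInMillis, TimeUnit.MILLISECONDS);
        if (!queued) {
            // The queue stayed full for the whole wait, so record the timeout.
            timedOut = true;
        }
        return queued;
    }
}
```

With a queue of capacity one and no consumer draining it, the first add in this sketch succeeds and the second gives up after the configured wait, leaving wasTimedOut() true.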
public int getAdded()
public void shutDownNoPoison()
Copyright © 2007-2015 The Apache Software Foundation. All Rights Reserved.