public abstract class FileResourceCrawler extends Object implements Callable<IFileProcessorFutureResult>
Modifier and Type | Field and Description |
---|---|
protected static int | ADDED |
protected static org.slf4j.Logger | LOG |
protected static int | SKIPPED |
protected static int | STOP_NOW |
Constructor and Description |
---|
FileResourceCrawler(ArrayBlockingQueue<FileResource> queue, int numConsumers) |
Modifier and Type | Method and Description |
---|---|
org.apache.tika.batch.FileResourceCrawlerFutureResult | call() |
int | getAdded() |
int | getConsidered() |
boolean | isActive(): If the crawler stops for any reason, it is no longer active. |
boolean | isQueueEmpty(): Use sparingly. |
protected boolean | select(Metadata m) |
void | setDocumentSelector(DocumentSelector documentSelector) |
void | setMaxConsecWaitInMillis(long maxConsecWaitInMillis) |
void | setMaxFilesToAdd(int maxFilesToAdd): Maximum number of files to add. |
void | setMaxFilesToConsider(int maxFilesToConsider): Maximum number of files to consider. |
void | shutDownNoPoison(): Shuts down the FileResourceCrawler without adding poison. |
abstract void | start(): Implement this to control the addition of FileResources. |
protected int | tryToAdd(FileResource fileResource) |
boolean | wasTimedOut(): Returns whether the crawler timed out while trying to add a resource to the queue. |
protected static final org.slf4j.Logger LOG
protected static final int SKIPPED
protected static final int ADDED
protected static final int STOP_NOW
public FileResourceCrawler(ArrayBlockingQueue<FileResource> queue, int numConsumers)
queue - shared queue
numConsumers - number of consumers (needs to know how many poisons to add when done)
public abstract void start() throws InterruptedException
Implement this to control the addition of FileResources. Call tryToAdd(org.apache.tika.batch.FileResource) to add FileResources to the queue.
Throws: InterruptedException
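The relationship between the constructor's numConsumers argument and start() can be illustrated with a minimal poison-pill sketch. This is a simplified stand-in built only on java.util.concurrent, not the Tika class: String takes the place of FileResource, and the names SketchCrawler, ListCrawler, POISON, add and crawl are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Simplified stand-in for a crawler, not the Tika class.
// String replaces FileResource; POISON is a hypothetical sentinel.
abstract class SketchCrawler {
    static final String POISON = "__POISON__";

    private final ArrayBlockingQueue<String> queue;
    private final int numConsumers;
    private int added = 0;

    SketchCrawler(ArrayBlockingQueue<String> queue, int numConsumers) {
        this.queue = queue;
        this.numConsumers = numConsumers;
    }

    // Subclasses decide which resources to hand over, as start() does.
    abstract void start() throws InterruptedException;

    // Blocking add used by subclasses (status codes are left out here).
    protected void add(String resource) throws InterruptedException {
        queue.put(resource);
        added++;
    }

    // Runs the crawl, then adds one poison per consumer so each can exit.
    int crawl() {
        try {
            start();
            for (int i = 0; i < numConsumers; i++) {
                queue.put(POISON);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return added;
    }
}

// Hypothetical subclass that crawls a fixed list of paths.
class ListCrawler extends SketchCrawler {
    private final List<String> paths;

    ListCrawler(ArrayBlockingQueue<String> queue, int numConsumers, List<String> paths) {
        super(queue, numConsumers);
        this.paths = paths;
    }

    @Override
    void start() throws InterruptedException {
        for (String path : paths) {
            add(path);
        }
    }
}
```

In this sketch each consumer is assumed to stop at the first poison it takes from the queue, which is why the crawler needs exactly one poison per consumer.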
public org.apache.tika.batch.FileResourceCrawlerFutureResult call()
Specified by: call in interface Callable<IFileProcessorFutureResult>
protected int tryToAdd(FileResource fileResource) throws InterruptedException
fileResource - resource to add
Throws: InterruptedException
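One plausible reading of how tryToAdd, the SKIPPED, ADDED and STOP_NOW codes, setMaxConsecWaitInMillis and wasTimedOut fit together is sketched below. Everything here is an assumption for illustration: the constant values, the 10 ms polling step, the Predicate standing in for a DocumentSelector, and the class name AddSketch are all hypothetical, and this page does not confirm that the real class behaves this way.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Illustrative stand-in, not the Tika class. String replaces FileResource.
class AddSketch {
    // Hypothetical values; this page does not state the real constants' values.
    static final int SKIPPED = 0;
    static final int ADDED = 1;
    static final int STOP_NOW = 2;

    private static final long STEP_MILLIS = 10; // assumed polling interval

    private final ArrayBlockingQueue<String> queue;
    private final Predicate<String> selector; // stands in for a DocumentSelector
    private final long maxConsecWaitInMillis;
    private boolean timedOut = false;

    AddSketch(ArrayBlockingQueue<String> queue, Predicate<String> selector,
              long maxConsecWaitInMillis) {
        this.queue = queue;
        this.selector = selector;
        this.maxConsecWaitInMillis = maxConsecWaitInMillis;
    }

    // The documented method propagates InterruptedException; this sketch
    // converts an interrupt into STOP_NOW to stay self-contained.
    int tryToAdd(String resource) {
        if (!selector.test(resource)) {
            return SKIPPED; // rejected by the selector, nothing queued
        }
        long waited = 0;
        try {
            // Retry a timed offer until it succeeds or the wait budget is spent.
            while (!queue.offer(resource, STEP_MILLIS, TimeUnit.MILLISECONDS)) {
                waited += STEP_MILLIS;
                if (waited >= maxConsecWaitInMillis) {
                    timedOut = true;
                    return STOP_NOW;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return STOP_NOW;
        }
        return ADDED;
    }

    boolean wasTimedOut() {
        return timedOut;
    }
}
```

A start() implementation written against this sketch would typically stop iterating as soon as it sees STOP_NOW, and keep going on SKIPPED or ADDED.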
public boolean isActive()
public void setMaxConsecWaitInMillis(long maxConsecWaitInMillis)
public void setDocumentSelector(DocumentSelector documentSelector)
public int getConsidered()
protected boolean select(Metadata m)
public void setMaxFilesToAdd(int maxFilesToAdd)
Maximum number of files to add. If maxFilesToAdd < 0 (default), then this crawler will add all documents.
maxFilesToAdd - maximum number of files to add to the queue
public void setMaxFilesToConsider(int maxFilesToConsider)
Maximum number of files to consider. If maxFilesToConsider < 0 (default), then this crawler will add all documents.
maxFilesToConsider - maximum number of files to consider adding to the queue
public boolean isQueueEmpty()
public boolean wasTimedOut()
public int getAdded()
public void shutDownNoPoison()
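The "negative means unlimited" convention described for setMaxFilesToAdd and setMaxFilesToConsider can be sketched as a pair of counters. This is an assumption-laden illustration, not the Tika implementation: the class LimitSketch, the offer method, and the choice to count a file as considered before the selector is applied are all hypothetical.

```java
// Illustrative stand-in, not the Tika class.
class LimitSketch {
    private int maxFilesToAdd = -1;      // negative (default): no limit
    private int maxFilesToConsider = -1; // negative (default): no limit
    private int added = 0;
    private int considered = 0;

    void setMaxFilesToAdd(int maxFilesToAdd) {
        this.maxFilesToAdd = maxFilesToAdd;
    }

    void setMaxFilesToConsider(int maxFilesToConsider) {
        this.maxFilesToConsider = maxFilesToConsider;
    }

    // A negative limit disables the check entirely.
    private static boolean underLimit(int count, int max) {
        return max < 0 || count < max;
    }

    // Returns true if the candidate was added. The 'selected' flag stands in
    // for the result of a document selector (an assumption of this sketch).
    boolean offer(boolean selected) {
        if (!underLimit(considered, maxFilesToConsider)
                || !underLimit(added, maxFilesToAdd)) {
            return false; // a limit has been reached
        }
        considered++;
        if (!selected) {
            return false; // considered, but filtered out
        }
        added++;
        return true;
    }

    int getAdded() {
        return added;
    }

    int getConsidered() {
        return considered;
    }
}
```

With both limits left at their defaults, every selected candidate is added; setting either limit to a non-negative value caps the corresponding counter.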
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.