Package org.apache.tika.batch.fs
Class FSDirectoryCrawler
java.lang.Object
  org.apache.tika.batch.FileResourceCrawler
    org.apache.tika.batch.fs.FSDirectoryCrawler
All Implemented Interfaces:
Callable<IFileProcessorFutureResult>
public class FSDirectoryCrawler extends FileResourceCrawler
Nested Class Summary
Modifier and Type    Class
static class         FSDirectoryCrawler.CRAWL_ORDER
Field Summary
Fields inherited from class org.apache.tika.batch.FileResourceCrawler
ADDED, LOG, SKIPPED, STOP_NOW
Constructor Summary
Constructor
FSDirectoryCrawler(ArrayBlockingQueue<FileResource> fileQueue, int numConsumers, Path root, Path startDirectory, FSDirectoryCrawler.CRAWL_ORDER crawlOrder)
FSDirectoryCrawler(ArrayBlockingQueue<FileResource> fileQueue, int numConsumers, Path root, FSDirectoryCrawler.CRAWL_ORDER crawlOrder)
Method Summary
Modifier and Type    Method and Description
void                 handleFirstFileInDirectory(Path f)
                     Override this if you have any special handling for the first actual file that the crawler comes across in a directory.
void                 start()
                     Implement this to control the addition of FileResources.
Methods inherited from class org.apache.tika.batch.FileResourceCrawler
call, getAdded, getConsidered, isActive, isQueueEmpty, select, setDocumentSelector, setMaxConsecWaitInMillis, setMaxFilesToAdd, setMaxFilesToConsider, shutDownNoPoison, tryToAdd, wasTimedOut
Constructor Detail
FSDirectoryCrawler
public FSDirectoryCrawler(ArrayBlockingQueue<FileResource> fileQueue, int numConsumers, Path root, FSDirectoryCrawler.CRAWL_ORDER crawlOrder)
FSDirectoryCrawler
public FSDirectoryCrawler(ArrayBlockingQueue<FileResource> fileQueue, int numConsumers, Path root, Path startDirectory, FSDirectoryCrawler.CRAWL_ORDER crawlOrder)
Method Detail
start
public void start() throws InterruptedException
Description copied from class: FileResourceCrawler
Implement this to control the addition of FileResources. Call FileResourceCrawler.tryToAdd(org.apache.tika.batch.FileResource) to add FileResources to the queue.
Specified by:
start in class FileResourceCrawler
Throws:
InterruptedException
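The crawl-and-queue pattern that start() describes can be illustrated with the JDK alone. The following is a minimal sketch, not Tika's implementation: CrawlSketch, its crawl method, and its tryToAdd helper are hypothetical stand-ins, and the queue holds plain Path objects rather than FileResource instances. It assumes a depth-first walk in sorted order and a bounded wait when the queue is full.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a crawler; none of these names come from Tika.
class CrawlSketch {

    // Plays the role of tryToAdd: waits up to maxWaitMillis for space in the
    // bounded queue and reports whether the file was accepted.
    static boolean tryToAdd(ArrayBlockingQueue<Path> queue, Path file, long maxWaitMillis)
            throws InterruptedException {
        return queue.offer(file, maxWaitMillis, TimeUnit.MILLISECONDS);
    }

    // Plays the role of start(): walks the tree depth-first in sorted order
    // and returns how many regular entries were queued.
    static int crawl(Path dir, ArrayBlockingQueue<Path> queue, long maxWaitMillis)
            throws IOException, InterruptedException {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                entries.add(p);
            }
        }
        Collections.sort(entries);
        int added = 0;
        for (Path p : entries) {
            if (Files.isDirectory(p)) {
                added += crawl(p, queue, maxWaitMillis);
            } else if (tryToAdd(queue, p, maxWaitMillis)) {
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("crawl-sketch");
        Files.createDirectories(root.resolve("sub"));
        Files.createFile(root.resolve("a.txt"));
        Files.createFile(root.resolve("sub").resolve("b.txt"));
        ArrayBlockingQueue<Path> queue = new ArrayBlockingQueue<>(10);
        System.out.println("queued " + crawl(root, queue, 100L) + " files");
    }
}
```

In this sketch a full queue simply causes the file to be skipped after the wait expires; a real crawler would more likely keep retrying or stop once a maximum wait is exceeded.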
handleFirstFileInDirectory
public void handleFirstFileInDirectory(Path f)
Override this if you have any special handling for the first actual file that the crawler comes across in a directory. For example, it might be handy to call mkdirs() on an output directory if your FileResourceConsumers are writing to a file.
Parameters:
f - file to handle
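The mkdirs() idea mentioned above can be sketched as a helper that mirrors the input directory under an output root. This is an illustration under stated assumptions, not code from Tika: FirstFileHookSketch, ensureOutputDir, inputRoot, and outputRoot are hypothetical names, and Files.createDirectories is used as the NIO counterpart of File.mkdirs(). An override of handleFirstFileInDirectory(Path f) could delegate to a helper of this kind.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the names here are illustrative and not part of Tika.
class FirstFileHookSketch {

    // Creates, under outputRoot, the directory that mirrors the parent of
    // firstFile relative to inputRoot, and returns that directory.
    // Assumes firstFile lies beneath inputRoot and both are absolute paths.
    static Path ensureOutputDir(Path inputRoot, Path outputRoot, Path firstFile)
            throws IOException {
        Path relativeDir = inputRoot.relativize(firstFile.getParent());
        Path target = outputRoot.resolve(relativeDir);
        // Like mkdirs(): creates missing parents and is a no-op if the
        // directory already exists.
        return Files.createDirectories(target);
    }

    public static void main(String[] args) throws Exception {
        Path inputRoot = Files.createTempDirectory("in");
        Path outputRoot = Files.createTempDirectory("out");
        Path dir = Files.createDirectories(inputRoot.resolve("docs").resolve("2020"));
        Path first = Files.createFile(dir.resolve("report.pdf"));
        Path made = ensureOutputDir(inputRoot, outputRoot, first);
        System.out.println(made + " exists: " + Files.isDirectory(made));
    }
}
```

Doing this once per directory, when its first file is seen, avoids repeating the directory check for every file that consumers later write.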