Package org.apache.tika.eval.app
Class FileProfiler
java.lang.Object
org.apache.tika.batch.FileResourceConsumer
org.apache.tika.eval.app.AbstractProfiler
org.apache.tika.eval.app.FileProfiler
- All Implemented Interfaces:
- Callable<IFileProcessorFutureResult>
This class profiles actual files, as opposed to extracts, which are profiled by, e.g., 
ExtractProfiler.
 This does _not_ parse files, but it does run file type identification and digest the
 raw bytes.
 If the 'file' command is available on the command line, this will also run the FileCommandDetector.
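The behavior described above (identify the type and digest the raw bytes, without parsing) can be illustrated with a minimal JDK-only sketch. This is not FileProfiler's implementation: the class name RawFileProfileSketch and its methods are hypothetical, SHA-256 is an assumed digest algorithm, and Files.probeContentType stands in for Tika's detectors.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not part of org.apache.tika.eval.app.
final class RawFileProfileSketch {

    /** Returns the hex-encoded SHA-256 of the file's raw bytes; the content is never parsed. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading through the DigestInputStream updates the digest; the bytes are discarded.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Best-effort type identification from the JDK; may return null when the type is unknown. */
    static String detectType(Path file) throws IOException {
        return Files.probeContentType(file);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: RawFileProfileSketch <file>");
            return;
        }
        Path file = Path.of(args[0]);
        System.out.println(file + "\t" + detectType(file) + "\t" + sha256Hex(file));
    }
}
```

The digest is computed while streaming, so the file is never loaded into memory as a whole, which matters when profiling large corpora.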
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.tika.eval.app.AbstractProfiler
- AbstractProfiler.EXCEPTION_TYPE, AbstractProfiler.PARSE_ERROR_TYPE
Field Summary
Fields inherited from class org.apache.tika.eval.app.AbstractProfiler
- FALSE, ID, MIME_TABLE, REF_EXTRACT_EXCEPTION_TYPES, REF_PARSE_ERROR_TYPES, REF_PARSE_EXCEPTION_TYPES, TRUE, writer
Fields inherited from class org.apache.tika.batch.FileResourceConsumer
- ELAPSED_MILLIS, IO_IS, IO_OS, OOM, PARSE_ERR, PARSE_EX, TIMED_OUT
Constructor Summary
Constructors
- FileProfiler(ArrayBlockingQueue<FileResource> fileQueue, Path inputDir, IDBWriter dbWriter)
Method Summary
Methods (modifier and type, method, description)
- boolean processFileResource(FileResource fileResource): Main piece of code that needs to be implemented.
- static void USAGE()
Methods inherited from class org.apache.tika.eval.app.AbstractProfiler
- calcTextStats, closeWriter, getContent, getFileLength, getPathsFromExtractCrawl, getPathsFromSrcCrawl, getSourceFileLength, loadCommonTokens, setMaxContentLength, setMaxContentLengthForLangId, setMaxTokens, truncateContent, writeContentData, writeExceptionData, writeExtractException, writeProfileData
Methods inherited from class org.apache.tika.batch.FileResourceConsumer
- call, checkForTimedOutMillis, close, flushAndClose, getCurrentFile, getNumHandledExceptions, getNumResourcesConsumed, getXMLifiedLogMsg, getXMLifiedLogMsg, incrementHandledExceptions, isStillActive, parse, pleaseShutdown
Field Details
- DETECT_EXCEPTION
- FILE_PROFILES
- FILE_MIME_TABLE
Constructor Details
- FileProfiler
Method Details
- USAGE
public static void USAGE()
- processFileResource
Description copied from class: FileResourceConsumer
Main piece of code that needs to be implemented. Clients are responsible for closing streams and handling the exceptions that they'd like to handle. Unchecked throwables can be thrown past this, of course. When an unchecked throwable is thrown, this logs the error and then rethrows the exception. Clients/subclasses should make sure to catch and handle everything they can. The design goal is that the whole process should close up and shut down soon after an unchecked exception or error is thrown. Make sure to call FileResourceConsumer.incrementHandledExceptions() appropriately in your implementation of this method.
- Specified by:
- processFileResource in class FileResourceConsumer
- Parameters:
- fileResource - resource to process
- Returns:
- whether or not a file was successfully processed
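The contract described for processFileResource (close your own streams, handle what you can, count handled exceptions, let unchecked throwables propagate) might be sketched as follows. This is a self-contained illustration under stated assumptions: ConsumerContractSketch and StreamSource are hypothetical stand-ins, not Tika classes, and the counter merely mimics what FileResourceConsumer.incrementHandledExceptions() is documented to track.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a FileResourceConsumer subclass; not Tika code.
final class ConsumerContractSketch {

    /** Hypothetical stand-in for a resource that can open a stream over its raw bytes. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    private int handledExceptions = 0;

    /** Mimics the bookkeeping role of FileResourceConsumer.incrementHandledExceptions(). */
    void incrementHandledExceptions() {
        handledExceptions++;
    }

    int getNumHandledExceptions() {
        return handledExceptions;
    }

    /** Returns true only if the resource was read to the end without a handled exception. */
    boolean processFileResource(StreamSource resource) {
        // try-with-resources: the implementation, not the caller, closes the stream.
        try (InputStream in = resource.open()) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Consume the raw bytes; real work (detection, digesting) would happen here.
            }
            return true;
        } catch (IOException e) {
            // A handled failure: record it and report that this resource was not processed.
            incrementHandledExceptions();
            return false;
        }
        // Unchecked throwables are deliberately not caught, so they propagate to the caller.
    }
}
```

Catching only IOException keeps the handled/unhandled boundary explicit: anything unchecked still reaches the surrounding framework, which, per the description above, logs and rethrows it.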