Package org.apache.tika.eval.app
Class FileProfiler
- java.lang.Object
  - org.apache.tika.batch.FileResourceConsumer
    - org.apache.tika.eval.app.AbstractProfiler
      - org.apache.tika.eval.app.FileProfiler
- All Implemented Interfaces:
Callable<IFileProcessorFutureResult>
public class FileProfiler extends AbstractProfiler
This class profiles actual files, as opposed to extracts (see, e.g., ExtractProfiler). It does _not_ parse files, but it does run file type identification and digest the raw bytes. If the 'file' command is available on the command line, it also runs the FileCommandDetector.
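To make "digests the raw bytes without parsing" concrete, here is a minimal sketch that uses only the JDK. It is not FileProfiler's implementation: the class `RawFileSketch`, its method names, the choice of SHA-256, and the toy magic-byte check are all assumptions made for illustration. Tika's own detectors are far more thorough than the two-signature check shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration class; not part of Apache Tika.
class RawFileSketch {

    private static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Digest an in-memory byte array without interpreting its content.
    static String digestBytes(byte[] raw) {
        return toHex(newSha256().digest(raw));
    }

    // Stream a file through the digest so it is never fully held in memory.
    static String digestFile(Path file) throws IOException {
        MessageDigest md = newSha256();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    // Toy type identification from leading bytes only (assumption: two signatures).
    static String sniffType(byte[] head) {
        if (startsWith(head, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".bin");
        try {
            byte[] content = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
            Files.write(tmp, content);
            // Prints the guessed type followed by the hex digest of the file.
            System.out.println(sniffType(content) + " " + digestFile(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both steps read bytes only; neither extracts text or metadata, which is the distinction the class description draws between profiling files and profiling extracts.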
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.tika.eval.app.AbstractProfiler
AbstractProfiler.EXCEPTION_TYPE, AbstractProfiler.PARSE_ERROR_TYPE
-
-
Field Summary
Fields
Modifier and Type    Field               Description
static String        DETECT_EXCEPTION
static TableInfo     FILE_MIME_TABLE
static TableInfo     FILE_PROFILES
-
Fields inherited from class org.apache.tika.eval.app.AbstractProfiler
FALSE, ID, MIME_TABLE, REF_EXTRACT_EXCEPTION_TYPES, REF_PARSE_ERROR_TYPES, REF_PARSE_EXCEPTION_TYPES, TRUE, writer
-
Fields inherited from class org.apache.tika.batch.FileResourceConsumer
ELAPSED_MILLIS, IO_IS, IO_OS, OOM, PARSE_ERR, PARSE_EX, TIMED_OUT
-
-
Constructor Summary
Constructors
Constructor    Description
FileProfiler(ArrayBlockingQueue<FileResource> fileQueue, Path inputDir, IDBWriter dbWriter)
-
Method Summary
Modifier and Type    Method                                            Description
boolean              processFileResource(FileResource fileResource)    Main piece of code that needs to be implemented.
static void          USAGE()
-
Methods inherited from class org.apache.tika.eval.app.AbstractProfiler
calcTextStats, closeWriter, getContent, getFileLength, getPathsFromExtractCrawl, getPathsFromSrcCrawl, getSourceFileLength, loadCommonTokens, setMaxContentLength, setMaxContentLengthForLangId, setMaxTokens, truncateContent, writeContentData, writeExceptionData, writeExtractException, writeProfileData
-
Methods inherited from class org.apache.tika.batch.FileResourceConsumer
call, checkForTimedOutMillis, close, flushAndClose, getCurrentFile, getNumHandledExceptions, getNumResourcesConsumed, getXMLifiedLogMsg, getXMLifiedLogMsg, incrementHandledExceptions, isStillActive, parse, pleaseShutdown
-
Field Detail
-
DETECT_EXCEPTION
public static final String DETECT_EXCEPTION
- See Also:
- Constant Field Values
-
FILE_PROFILES
public static TableInfo FILE_PROFILES
-
FILE_MIME_TABLE
public static TableInfo FILE_MIME_TABLE
-
-
Constructor Detail
-
FileProfiler
public FileProfiler(ArrayBlockingQueue<FileResource> fileQueue, Path inputDir, IDBWriter dbWriter)
-
-
Method Detail
-
USAGE
public static void USAGE()
-
processFileResource
public boolean processFileResource(FileResource fileResource)
Description copied from class: FileResourceConsumer
Main piece of code that needs to be implemented. Clients are responsible for closing streams and handling the exceptions that they'd like to handle. Unchecked throwables can be thrown past this, of course. When an unchecked throwable is thrown, this logs the error and then rethrows the exception. Clients/subclasses should make sure to catch and handle everything they can. The design goal is that the whole process should close up and shut down soon after an unchecked exception or error is thrown. Make sure to call FileResourceConsumer.incrementHandledExceptions() appropriately in your implementation of this method.
- Specified by:
processFileResource in class FileResourceConsumer
- Parameters:
fileResource - resource to process
- Returns:
whether or not a file was successfully processed
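The contract described for processFileResource can be sketched with plain JDK types. The code below does not use the real Tika classes: `ConsumerContractSketch` and its nested `Resource` interface are hypothetical stand-ins, and only the names `incrementHandledExceptions` and `getNumHandledExceptions` are borrowed from FileResourceConsumer to show where the real calls would go. It shows one plausible reading of the contract: close the stream yourself, count the exceptions you choose to handle, and let unchecked throwables propagate.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a consumer; not the Tika FileResourceConsumer class.
class ConsumerContractSketch {

    // Hypothetical stand-in for a resource that can open a stream.
    interface Resource {
        InputStream open() throws IOException;
    }

    private int handledExceptions = 0;

    void incrementHandledExceptions() {
        handledExceptions++;
    }

    int getNumHandledExceptions() {
        return handledExceptions;
    }

    // Returns true only if the resource was read to the end without an IOException.
    boolean processResource(Resource resource) {
        // try-with-resources closes the stream on every exit path.
        try (InputStream in = resource.open()) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // A real consumer would digest or inspect the bytes here.
            }
            return true;
        } catch (IOException e) {
            // A handled, expected failure: record it and report "not processed".
            incrementHandledExceptions();
            return false;
        }
        // RuntimeException and Error are deliberately not caught, so they
        // propagate to the caller, which can then shut the process down.
    }
}
```

Catching only the checked exception keeps the handled-exception count meaningful while leaving unexpected failures visible to whatever supervises the consumer.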