public class FileProfiler extends AbstractProfiler
ExtractProfiler.
This does _not_ parse files, but does run file type identification and digests the
raw bytes.
If the 'file' command is available on the command line, this will also run the FileCommandDetector.
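
The behaviour described above (identify the file type and digest the raw bytes without parsing, and check for the 'file' command) can be sketched with the JDK alone. This is a minimal illustration, not FileProfiler's actual implementation. The class name `RawFileProfileSketch` and its methods are hypothetical. `Files.probeContentType` is a stand-in for Tika's detection and is often only extension-based. SHA-256 is an assumed choice of digest. The availability check assumes that `file --version` exits with status 0 when the command is installed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration class; not part of Tika.
final class RawFileProfileSketch {

    /** Streams raw bytes through SHA-256 (assumed algorithm); nothing is parsed. */
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Convenience overload for bytes that are already in memory. */
    static String sha256Hex(byte[] rawBytes) {
        try {
            return sha256Hex(new ByteArrayInputStream(rawBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** JDK stand-in for type identification; falls back to a generic type. */
    static String probeType(Path file) throws IOException {
        String type = Files.probeContentType(file);
        return type == null ? "application/octet-stream" : type;
    }

    /** Returns true if a 'file' command can be started and exits with status 0. */
    static boolean fileCommandAvailable() {
        try {
            Process process = new ProcessBuilder("file", "--version")
                    .redirectErrorStream(true)
                    .start();
            try (InputStream out = process.getInputStream()) {
                byte[] sink = new byte[1024];
                while (out.read(sink) != -1) {
                    // Discard output so the child cannot block on a full pipe.
                }
            }
            return process.waitFor() == 0;
        } catch (IOException e) {
            return false; // command not found or could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: RawFileProfileSketch <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        try (InputStream in = Files.newInputStream(file)) {
            System.out.println("sha256: " + sha256Hex(in));
        }
        System.out.println("type: " + probeType(file));
        System.out.println("'file' available: " + fileCommandAvailable());
    }
}
```

Streaming the bytes through the digest keeps memory use constant regardless of file size, which matters when profiling a large corpus.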
Nested classes inherited from class AbstractProfiler: AbstractProfiler.EXCEPTION_TYPE, AbstractProfiler.PARSE_ERROR_TYPE

| Modifier and Type | Field and Description |
|---|---|
| static String | DETECT_EXCEPTION |
| static TableInfo | FILE_MIME_TABLE |
| static TableInfo | FILE_PROFILES |

Fields inherited from class AbstractProfiler: FALSE, ID, MIME_TABLE, REF_EXTRACT_EXCEPTION_TYPES, REF_PARSE_ERROR_TYPES, REF_PARSE_EXCEPTION_TYPES, TRUE, writer

Fields inherited from class FileResourceConsumer: ELAPSED_MILLIS, IO_IS, IO_OS, OOM, PARSE_ERR, PARSE_EX, TIMED_OUT

| Constructor and Description |
|---|
| FileProfiler(ArrayBlockingQueue<FileResource> fileQueue, Path inputDir, IDBWriter dbWriter) |

| Modifier and Type | Method and Description |
|---|---|
| boolean | processFileResource(FileResource fileResource) Main piece of code that needs to be implemented. |
| static void | USAGE() |

Methods inherited from class AbstractProfiler: calcTextStats, closeWriter, getContent, getFileLength, getPathsFromExtractCrawl, getPathsFromSrcCrawl, getSourceFileLength, loadCommonTokens, setMaxContentLength, setMaxContentLengthForLangId, setMaxTokens, truncateContent, writeContentData, writeExceptionData, writeExtractException, writeProfileData

Methods inherited from class FileResourceConsumer: call, checkForTimedOutMillis, close, flushAndClose, getCurrentFile, getNumHandledExceptions, getNumResourcesConsumed, getXMLifiedLogMsg, getXMLifiedLogMsg, incrementHandledExceptions, isStillActive, parse, pleaseShutdown

public static final String DETECT_EXCEPTION
public static TableInfo FILE_PROFILES
public static TableInfo FILE_MIME_TABLE
public FileProfiler(ArrayBlockingQueue<FileResource> fileQueue, Path inputDir, IDBWriter dbWriter)
public static void USAGE()
public boolean processFileResource(FileResource fileResource)
Description copied from class: FileResourceConsumer
Make sure to call FileResourceConsumer.incrementHandledExceptions() appropriately in
your implementation of this method.
Specified by: processFileResource in class FileResourceConsumer
Parameters: fileResource - resource to process

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.