public class FileProfiler extends AbstractProfiler
This class profiles actual files, as opposed to extracts (see ExtractProfiler).
It does _not_ parse files, but it does run file type identification and computes digests of the raw bytes.
If the 'file' command is available on the command line, the profiler also runs the FileCommandDetector.
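The two per-file steps above (identify the type, digest the raw bytes) can be sketched with the JDK alone. This is a minimal illustration, not Tika's implementation: SHA-256 is an assumed algorithm choice, and the hypothetical `detectMime` helper is a toy magic-number check standing in for Tika's real detectors (including FileCommandDetector).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RawBytesProfileSketch {

    // Hex-encodes the SHA-256 digest of the raw bytes (algorithm choice is an assumption).
    static String sha256Hex(byte[] bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(bytes)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every JDK is required to ship SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical toy detector: checks a couple of magic numbers, never parses content.
    static String detectMime(byte[] bytes) {
        if (startsWith(bytes, new byte[] {'%', 'P', 'D', 'F'})) {
            return "application/pdf";
        }
        if (startsWith(bytes, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] raw = "%PDF-1.7 example".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectMime(raw) + " " + sha256Hex(raw));
    }
}
```

Both steps only read bytes, which matches the point above that no parsing takes place.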
Nested classes/interfaces inherited from class AbstractProfiler: AbstractProfiler.EXCEPTION_TYPE, AbstractProfiler.PARSE_ERROR_TYPE
Modifier and Type | Field and Description |
---|---|
static String | DETECT_EXCEPTION |
static TableInfo | FILE_MIME_TABLE |
static TableInfo | FILE_PROFILES |
Fields inherited from class AbstractProfiler: FALSE, ID, MIME_TABLE, REF_EXTRACT_EXCEPTION_TYPES, REF_PARSE_ERROR_TYPES, REF_PARSE_EXCEPTION_TYPES, TRUE, writer
Fields inherited from class FileResourceConsumer: ELAPSED_MILLIS, IO_IS, IO_OS, OOM, PARSE_ERR, PARSE_EX, TIMED_OUT
Constructor and Description |
---|
FileProfiler(ArrayBlockingQueue<FileResource> fileQueue, Path inputDir, IDBWriter dbWriter) |
Modifier and Type | Method and Description |
---|---|
boolean | processFileResource(FileResource fileResource): Main piece of code that needs to be implemented. |
static void | USAGE() |
Methods inherited from class AbstractProfiler: calcTextStats, closeWriter, getContent, getFileLength, getPathsFromExtractCrawl, getPathsFromSrcCrawl, getSourceFileLength, loadCommonTokens, setMaxContentLength, setMaxContentLengthForLangId, setMaxTokens, truncateContent, writeContentData, writeExceptionData, writeExtractException, writeProfileData
Methods inherited from class FileResourceConsumer: call, checkForTimedOutMillis, close, flushAndClose, getCurrentFile, getNumHandledExceptions, getNumResourcesConsumed, getXMLifiedLogMsg (two overloads), incrementHandledExceptions, isStillActive, parse, pleaseShutdown
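FileResourceConsumer leaves the exception bookkeeping for processFileResource to the implementation: the implementation, not the base class, is expected to call incrementHandledExceptions(). The sketch below models that contract with JDK types only. All class names are hypothetical stand-ins, the resource is a plain `byte[]` rather than a FileResource, and `true` is assumed to mean "handled successfully".

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HandledExceptionsSketch {

    // Hypothetical stand-in for FileResourceConsumer's exception counter.
    abstract static class Consumer<R> {
        private final AtomicInteger handledExceptions = new AtomicInteger();

        protected void incrementHandledExceptions() {
            handledExceptions.incrementAndGet();
        }

        public int getNumHandledExceptions() {
            return handledExceptions.get();
        }

        // Subclasses implement the per-resource work and their own exception accounting.
        public abstract boolean processFileResource(R resource);
    }

    // Hypothetical profiler: a failure is counted and reported instead of propagated.
    static class Profiler extends Consumer<byte[]> {
        @Override
        public boolean processFileResource(byte[] bytes) {
            try {
                if (bytes == null) {
                    throw new IllegalArgumentException("no bytes to profile");
                }
                // Detection and digesting of the raw bytes would happen here.
                return true;
            } catch (RuntimeException e) {
                // The implementation, not the base class, records the handled exception.
                incrementHandledExceptions();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        Profiler p = new Profiler();
        p.processFileResource(new byte[] {1, 2});
        p.processFileResource(null);
        System.out.println("handled exceptions: " + p.getNumHandledExceptions()); // prints "handled exceptions: 1"
    }
}
```

Catching inside the method keeps one bad file from stopping the consumer while still leaving a count that getNumHandledExceptions() can report.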
public static final String DETECT_EXCEPTION
public static TableInfo FILE_PROFILES
public static TableInfo FILE_MIME_TABLE
public FileProfiler(ArrayBlockingQueue<FileResource> fileQueue, Path inputDir, IDBWriter dbWriter)
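The constructor takes an ArrayBlockingQueue shared with whatever crawler fills it, and the profiler consumes resources from that queue. The following is a simplified model of that producer/consumer arrangement using JDK types only. The `Path` elements, the `POISON` end-of-crawl sentinel, and the resolve-against-inputDir step are assumptions for illustration, not tika-batch's actual shutdown or path-handling mechanism.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueConsumerSketch {

    // Hypothetical sentinel marking the end of the crawl.
    static final Path POISON = Paths.get("__end_of_queue__");

    // Drains the queue until the sentinel arrives, resolving each entry against inputDir.
    static List<Path> consume(ArrayBlockingQueue<Path> queue, Path inputDir) {
        List<Path> processed = new ArrayList<>();
        try {
            while (true) {
                Path next = queue.poll(1, TimeUnit.SECONDS);
                if (next == null) {
                    continue; // queue empty for now; a real consumer would also check for shutdown
                }
                if (POISON.equals(next)) {
                    break;
                }
                processed.add(inputDir.resolve(next));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status and stop consuming
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<Path> queue = new ArrayBlockingQueue<>(10);
        // The producer thread plays the role of the crawler that fills fileQueue.
        Thread producer = new Thread(() -> {
            try {
                queue.put(Paths.get("a.pdf"));
                queue.put(Paths.get("docs", "b.docx"));
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        List<Path> done = consume(queue, Paths.get("input"));
        producer.join();
        System.out.println(done);
    }
}
```

A bounded queue means a fast crawler blocks in `put` once the consumer falls behind, which keeps memory use flat.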
public static void USAGE()
public boolean processFileResource(FileResource fileResource)
Description copied from class: FileResourceConsumer
Main piece of code that needs to be implemented. Implementations are responsible for calling FileResourceConsumer.incrementHandledExceptions() appropriately.
Specified by: processFileResource in class FileResourceConsumer
Parameters: fileResource - resource to process

Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.