public class FileProfiler extends AbstractProfiler
This class profiles actual files, as opposed to extracts (e.g. ExtractProfiler).
This class does _not_ parse files, but it does run file type identification and computes digests of the raw bytes.
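This page does not say which digest algorithms FileProfiler computes or which detector it uses for type identification. As a rough, JDK-only illustration of digesting raw bytes without parsing them, the sketch below streams a file through SHA-256 (an assumed algorithm) and calls `Files.probeContentType` as a stand-in for type identification. Neither choice is taken from FileProfiler itself, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RawDigestSketch {

    /** Streams the raw bytes of a file through SHA-256; the content is never parsed. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // DigestInputStream updates md as bytes are read; nothing else to do
            }
        }
        return toHex(md.digest());
    }

    /** Lower-case hex encoding of a byte array. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("profile", ".txt");
        Files.write(tmp, "abc".getBytes(StandardCharsets.US_ASCII));
        // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(sha256Hex(tmp));
        // JDK stand-in for type identification; may print null on some platforms
        System.out.println(Files.probeContentType(tmp));
        Files.delete(tmp);
    }
}
```

Because DigestInputStream updates the digest as bytes are read, the file is never held in memory in full, which matters when profiling large inputs.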
If the 'file' command is available on the system path, the profiler also runs the FileCommandDetector.

Nested classes/interfaces inherited from class AbstractProfiler: AbstractProfiler.EXCEPTION_TYPE, AbstractProfiler.PARSE_ERROR_TYPE
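The class description says FileCommandDetector runs only when the 'file' command is available, but it does not say how that availability is checked. A minimal sketch of one way to probe for an external command from Java follows, assuming that `file --version` exits with status 0 when the tool is installed; FileCommandCheck and its methods are hypothetical names, and Tika's own check may work differently.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class FileCommandCheck {

    /**
     * Returns true if the command can be started and exits with status 0 within
     * five seconds. Output is discarded; only the exit status matters.
     */
    static boolean isCommandAvailable(String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        try {
            Process p = pb.start();
            if (!p.waitFor(5, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            // start() throws IOException when the executable is not on the PATH
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Assumed probe: "file --version"; FileCommandDetector may check differently. */
    static boolean isFileCommandAvailable() {
        return isCommandAvailable("file", "--version");
    }

    public static void main(String[] args) {
        System.out.println("file command available: " + isFileCommandAvailable());
    }
}
```

Treating a launch failure, a non-zero exit, and a timeout all as "unavailable" keeps the caller's decision simple: either run the extra detector or skip it.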
Modifier and Type | Field and Description |
---|---|
static String | DETECT_EXCEPTION |
static TableInfo | FILE_MIME_TABLE |
static TableInfo | FILE_PROFILES |
Fields inherited from class AbstractProfiler: FALSE, ID, MIME_TABLE, REF_EXTRACT_EXCEPTION_TYPES, REF_PARSE_ERROR_TYPES, REF_PARSE_EXCEPTION_TYPES, TRUE, writer

Fields inherited from class FileResourceConsumer: ELAPSED_MILLIS, IO_IS, IO_OS, OOM, PARSE_ERR, PARSE_EX, TIMED_OUT
Constructor and Description |
---|
FileProfiler(ArrayBlockingQueue<FileResource> fileQueue, Path inputDir, IDBWriter dbWriter) |
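The constructor takes a bounded ArrayBlockingQueue of FileResource objects and an input directory, but this page does not show how the queue gets filled. The sketch below is a hypothetical producer that walks `inputDir` and enqueues relative paths. It uses java.nio.file.Path as a stand-in for FileResource and an empty path as an assumed end-of-input marker; QueueFeederSketch, `feed`, and `POISON` are illustrative names only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.stream.Stream;

public class QueueFeederSketch {

    /** Hypothetical end-of-input marker; no regular file relativizes to the empty path. */
    static final Path POISON = Path.of("");

    /**
     * Enqueues every regular file under inputDir as a path relative to inputDir,
     * then the POISON marker. put() blocks while the queue is full, so slow
     * consumers throttle the walk. Returns the number of files enqueued.
     */
    static int feed(Path inputDir, ArrayBlockingQueue<Path> queue)
            throws IOException, InterruptedException {
        int count = 0;
        try (Stream<Path> walk = Files.walk(inputDir)) {
            Iterator<Path> it = walk.filter(Files::isRegularFile).iterator();
            while (it.hasNext()) {
                queue.put(inputDir.relativize(it.next()));
                count++;
            }
        }
        queue.put(POISON);
        return count;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("profile-input");
        Files.writeString(dir.resolve("a.txt"), "alpha");
        Files.writeString(dir.resolve("b.txt"), "beta");
        // capacity must be at least files + 1 here, since nothing drains the queue in this demo
        ArrayBlockingQueue<Path> queue = new ArrayBlockingQueue<>(10);
        System.out.println(feed(dir, queue) + " files queued: " + queue);
    }
}
```

Using `put` rather than `offer` means the walk blocks whenever consumers fall behind, which is the usual reason for handing consumers a bounded queue.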
Modifier and Type | Method and Description |
---|---|
boolean | processFileResource(FileResource fileResource): Main piece of code that needs to be implemented. |
static void | USAGE() |
Methods inherited from class AbstractProfiler: calcTextStats, closeWriter, getContent, getFileLength, getPathsFromExtractCrawl, getPathsFromSrcCrawl, getSourceFileLength, loadCommonTokens, setMaxContentLength, setMaxContentLengthForLangId, setMaxTokens, truncateContent, writeContentData, writeExceptionData, writeExtractException, writeProfileData

Methods inherited from class FileResourceConsumer: call, checkForTimedOutMillis, close, flushAndClose, getCurrentFile, getNumHandledExceptions, getNumResourcesConsumed, getXMLifiedLogMsg, getXMLifiedLogMsg, incrementHandledExceptions, isStillActive, parse, pleaseShutdown
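FileProfiler inherits its run loop (call, incrementHandledExceptions, getNumResourcesConsumed and related methods) from FileResourceConsumer and implements processFileResource. The sketch below is a hypothetical, much simplified analogue of that division of labor, not the FileResourceConsumer source. In this sketch the base class drains the queue until it sees an assumed poison-pill value, and the subclass calls incrementHandledExceptions() whenever it catches an exception, in line with the processFileResource documentation. MiniConsumer, LengthProfiler and the string "resources" are illustrative stand-ins.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified analogue of a queue-driven resource consumer. */
abstract class MiniConsumer<T> implements Callable<Integer> {
    private final ArrayBlockingQueue<T> queue;
    private final T poison;
    private final AtomicInteger consumed = new AtomicInteger();
    private final AtomicInteger handledExceptions = new AtomicInteger();

    MiniConsumer(ArrayBlockingQueue<T> queue, T poison) {
        this.queue = queue;
        this.poison = poison;
    }

    /** Subclasses do the real work; return true if the resource was processed. */
    protected abstract boolean processFileResource(T resource);

    /** Subclasses call this whenever they catch and handle an exception. */
    protected void incrementHandledExceptions() {
        handledExceptions.incrementAndGet();
    }

    int getNumResourcesConsumed() { return consumed.get(); }

    int getNumHandledExceptions() { return handledExceptions.get(); }

    @Override
    public Integer call() throws InterruptedException {
        while (true) {
            T next = queue.take(); // blocks until a resource or the poison pill arrives
            if (next.equals(poison)) {
                return consumed.get();
            }
            consumed.incrementAndGet();
            processFileResource(next); // result ignored in this sketch
        }
    }
}

public class ConsumerSketch {

    /** Toy "profiler" that sums name lengths and rejects names starting with "bad". */
    static class LengthProfiler extends MiniConsumer<String> {
        long totalLength = 0;

        LengthProfiler(ArrayBlockingQueue<String> queue) {
            super(queue, ""); // the empty string is this sketch's poison pill
        }

        @Override
        protected boolean processFileResource(String resource) {
            try {
                if (resource.startsWith("bad")) {
                    throw new IllegalStateException("cannot profile " + resource);
                }
                totalLength += resource.length();
                return true;
            } catch (IllegalStateException e) {
                incrementHandledExceptions(); // record the handled failure, keep going
                return false;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(8);
        for (String s : new String[] {"a.pdf", "bad.doc", "c.txt", ""}) {
            queue.put(s);
        }
        LengthProfiler profiler = new LengthProfiler(queue);
        int consumed = profiler.call();
        // prints "3 consumed, 1 handled exceptions"
        System.out.println(consumed + " consumed, " + profiler.getNumHandledExceptions() + " handled exceptions");
    }
}
```

Keeping the loop, counters and shutdown signal in the base class leaves each profiler responsible only for one resource at a time, which is the split the inherited-method list above suggests.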
Field detail:
public static TableInfo FILE_PROFILES
public static TableInfo FILE_MIME_TABLE
public static final String DETECT_EXCEPTION

Constructor detail:
public FileProfiler(ArrayBlockingQueue<FileResource> fileQueue, Path inputDir, IDBWriter dbWriter)

Method detail:
public static void USAGE()
public boolean processFileResource(FileResource fileResource)
Description copied from class: FileResourceConsumer
Main piece of code that needs to be implemented. Call FileResourceConsumer.incrementHandledExceptions() appropriately in your implementation of this method.
Specified by: processFileResource in class FileResourceConsumer
Parameters: fileResource - resource to process

Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.