Package org.apache.tika.eval
Class ExtractProfiler
- java.lang.Object
  - org.apache.tika.batch.FileResourceConsumer
    - org.apache.tika.eval.AbstractProfiler
      - org.apache.tika.eval.ExtractProfiler
All Implemented Interfaces:
Callable<IFileProcessorFutureResult>

public class ExtractProfiler extends AbstractProfiler

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.tika.eval.AbstractProfiler
AbstractProfiler.EXCEPTION_TYPE, AbstractProfiler.PARSE_ERROR_TYPE

Field Summary
Modifier and Type   Field
static TableInfo    CONTAINER_TABLE
static TableInfo    CONTENTS_TABLE
static TableInfo    EMBEDDED_FILE_PATH_TABLE
static TableInfo    EXCEPTION_TABLE
static TableInfo    EXTRACT_EXCEPTION_TABLE
static TableInfo    PROFILE_TABLE
static TableInfo    TAGS_TABLE

Fields inherited from class org.apache.tika.eval.AbstractProfiler
FALSE, ID, MIME_TABLE, REF_EXTRACT_EXCEPTION_TYPES, REF_PARSE_ERROR_TYPES, REF_PARSE_EXCEPTION_TYPES, TRUE, writer

Fields inherited from class org.apache.tika.batch.FileResourceConsumer
ELAPSED_MILLIS, IO_IS, IO_OS, OOM, PARSE_ERR, PARSE_EX, TIMED_OUT

Constructor Summary
ExtractProfiler(ArrayBlockingQueue<FileResource> queue, Path inputDir, Path extracts, ExtractReader extractReader, IDBWriter dbWriter)

Method Summary
Modifier and Type   Method                                           Description
boolean             processFileResource(FileResource fileResource)   Main piece of code that needs to be implemented.
static void         USAGE()

Methods inherited from class org.apache.tika.eval.AbstractProfiler
calcTextStats, closeWriter, getContent, getFileLength, getPathsFromExtractCrawl, getPathsFromSrcCrawl, getSourceFileLength, loadCommonTokens, setMaxContentLength, setMaxContentLengthForLangId, setMaxTokens, truncateContent, writeContentData, writeExceptionData, writeExtractException, writeProfileData

Methods inherited from class org.apache.tika.batch.FileResourceConsumer
call, checkForTimedOutMillis, close, flushAndClose, getCurrentFile, getNumHandledExceptions, getNumResourcesConsumed, getXMLifiedLogMsg, getXMLifiedLogMsg, incrementHandledExceptions, isStillActive, parse, pleaseShutdown
Field Detail

EXTRACT_EXCEPTION_TABLE
public static TableInfo EXTRACT_EXCEPTION_TABLE

EXCEPTION_TABLE
public static TableInfo EXCEPTION_TABLE

CONTAINER_TABLE
public static TableInfo CONTAINER_TABLE

PROFILE_TABLE
public static TableInfo PROFILE_TABLE

EMBEDDED_FILE_PATH_TABLE
public static TableInfo EMBEDDED_FILE_PATH_TABLE

CONTENTS_TABLE
public static TableInfo CONTENTS_TABLE

TAGS_TABLE
public static TableInfo TAGS_TABLE

Constructor Detail

ExtractProfiler
public ExtractProfiler(ArrayBlockingQueue<FileResource> queue, Path inputDir, Path extracts, ExtractReader extractReader, IDBWriter dbWriter)
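The constructor receives the ArrayBlockingQueue<FileResource> that this consumer drains, and the class is a Callable<IFileProcessorFutureResult>. The following is a minimal sketch of that producer/consumer shape using only java.util.concurrent. It deliberately does not use Tika. SketchResource, SketchConsumer, QueueSketch and runAll are hypothetical stand-ins. An Integer count replaces IFileProcessorFutureResult. Stopping when poll() times out on an empty queue is an assumption and does not reflect FileResourceConsumer's actual shutdown logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for FileResource: it carries only an identifier.
final class SketchResource {
    final String id;

    SketchResource(String id) {
        this.id = id;
    }
}

// Hypothetical consumer modeled loosely on the Callable shape of
// FileResourceConsumer; this is not Tika's implementation.
abstract class SketchConsumer implements Callable<Integer> {
    private final ArrayBlockingQueue<SketchResource> queue;

    SketchConsumer(ArrayBlockingQueue<SketchResource> queue) {
        this.queue = queue;
    }

    // Subclasses supply the per-resource work, as ExtractProfiler does.
    abstract boolean processFileResource(SketchResource resource);

    @Override
    public Integer call() throws InterruptedException {
        int processed = 0;
        while (true) {
            // Assumption: an empty queue after a short wait means no more work.
            SketchResource resource = queue.poll(100, TimeUnit.MILLISECONDS);
            if (resource == null) {
                return processed;
            }
            if (processFileResource(resource)) {
                processed++;
            }
        }
    }
}

final class QueueSketch {
    // Fills a bounded queue, runs `consumers` workers over it, and returns
    // how many resources the workers reported as successfully processed.
    static int runAll(List<String> ids, int consumers) {
        ArrayBlockingQueue<SketchResource> queue = new ArrayBlockingQueue<>(ids.size() + 1);
        for (String id : ids) {
            queue.add(new SketchResource(id)); // capacity is sufficient, so add() cannot fail
        }
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < consumers; i++) {
                futures.add(pool.submit(new SketchConsumer(queue) {
                    @Override
                    boolean processFileResource(SketchResource resource) {
                        // Hypothetical rule: only ".json" extracts count as profiled.
                        return resource.id.endsWith(".json");
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every worker shares one bounded queue, adding consumers spreads the same set of resources across threads without any resource being handled twice.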
Method Detail

USAGE
public static void USAGE()

processFileResource
public boolean processFileResource(FileResource fileResource)
Description copied from class: FileResourceConsumer
Main piece of code that needs to be implemented. Clients are responsible for closing streams and for handling the exceptions that they'd like to handle. Unchecked throwables can, of course, be thrown past this method. When an unchecked throwable is thrown, this logs the error and then rethrows the exception. Clients/subclasses should make sure to catch and handle everything they can. The design goal is that the whole process should close up and shut down soon after an unchecked exception or error is thrown. Make sure to call FileResourceConsumer.incrementHandledExceptions() appropriately in your implementation of this method.

Specified by:
processFileResource in class FileResourceConsumer
Parameters:
fileResource - resource to process
Returns:
- whether or not a file was successfully processed
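The contract above can be illustrated with a minimal sketch that does not depend on Tika. ContractSketch, profile() and callOnce() are hypothetical. The AtomicInteger counter only stands in for FileResourceConsumer.incrementHandledExceptions(), and a String path replaces FileResource. The sketch catches checked failures, counts them, and reports them as false. Unchecked throwables propagate, and the caller logs and rethrows them, mirroring the behavior described above.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the documented contract; not ExtractProfiler's code.
class ContractSketch {
    // Stand-in for the counter behind FileResourceConsumer's handled-exception count.
    private final AtomicInteger handledExceptions = new AtomicInteger();

    void incrementHandledExceptions() {
        handledExceptions.incrementAndGet();
    }

    int getNumHandledExceptions() {
        return handledExceptions.get();
    }

    // Hypothetical work step: a ".missing" path raises a checked exception the
    // consumer can handle; a ".fatal" path raises an unchecked one.
    private void profile(String path) throws IOException {
        if (path.endsWith(".missing")) {
            throw new IOException("cannot read " + path);
        }
        if (path.endsWith(".fatal")) {
            throw new IllegalStateException("unrecoverable input " + path);
        }
        // Real work would go here; any streams opened should use
        // try-with-resources so they are closed on every path.
    }

    // Handle what can be handled: count it and report failure with false.
    // Unchecked throwables are deliberately not caught here.
    boolean processFileResource(String path) {
        try {
            profile(path);
            return true;
        } catch (IOException e) {
            incrementHandledExceptions();
            return false;
        }
    }

    // Mirrors the documented caller behavior: log the unchecked throwable, then rethrow it.
    boolean callOnce(String path) {
        try {
            return processFileResource(path);
        } catch (RuntimeException | Error t) {
            System.err.println("unchecked throwable while processing " + path + ": " + t);
            throw t;
        }
    }
}
```

Incrementing the counter before returning false keeps the handled-exception count in step with files that failed gracefully. A fatal input still escapes, so the surrounding batch can shut down.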