Package org.apache.tika.eval.app
Class ExtractComparer
java.lang.Object
org.apache.tika.batch.FileResourceConsumer
org.apache.tika.eval.app.AbstractProfiler
org.apache.tika.eval.app.ExtractComparer
All Implemented Interfaces:
Callable<IFileProcessorFutureResult>
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.tika.eval.app.AbstractProfiler
AbstractProfiler.EXCEPTION_TYPE, AbstractProfiler.PARSE_ERROR_TYPE
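Field Summary
Modifier and Type    Field    Description
static TableInfo     REF_PAIR_NAMES
static TableInfo     COMPARISON_CONTAINERS
static TableInfo     CONTENT_COMPARISONS
static TableInfo     PROFILES_A
static TableInfo     PROFILES_B
static TableInfo     EMBEDDED_FILE_PATH_TABLE_A
static TableInfo     EMBEDDED_FILE_PATH_TABLE_B
static TableInfo     CONTENTS_TABLE_A
static TableInfo     CONTENTS_TABLE_B
static TableInfo     TAGS_TABLE_A
static TableInfo     TAGS_TABLE_B
static TableInfo     EXCEPTION_TABLE_A
static TableInfo     EXCEPTION_TABLE_B
static TableInfo     EXTRACT_EXCEPTION_TABLE_A
static TableInfo     EXTRACT_EXCEPTION_TABLE_B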
Fields inherited from class org.apache.tika.eval.app.AbstractProfiler
FALSE, ID, MIME_TABLE, REF_EXTRACT_EXCEPTION_TYPES, REF_PARSE_ERROR_TYPES, REF_PARSE_EXCEPTION_TYPES, TRUE, writer
Fields inherited from class org.apache.tika.batch.FileResourceConsumer
ELAPSED_MILLIS, IO_IS, IO_OS, OOM, PARSE_ERR, PARSE_EX, TIMED_OUT
Constructor Summary
Constructor    Description
ExtractComparer(ArrayBlockingQueue<FileResource> queue, Path inputDir, Path extractsA, Path extractsB, ExtractReader extractReader, IDBWriter writer)
Method Summary
Modifier and Type    Method    Description
protected void       compareFiles(org.apache.tika.eval.app.EvalFilePaths fpsA, org.apache.tika.eval.app.EvalFilePaths fpsB)
boolean              processFileResource(FileResource fileResource)    Main piece of code that needs to be implemented.
static void          USAGE()
Methods inherited from class org.apache.tika.eval.app.AbstractProfiler
calcTextStats, closeWriter, getContent, getFileLength, getPathsFromExtractCrawl, getPathsFromSrcCrawl, getSourceFileLength, loadCommonTokens, setMaxContentLength, setMaxContentLengthForLangId, setMaxTokens, truncateContent, writeContentData, writeExceptionData, writeExtractException, writeProfileData
Methods inherited from class org.apache.tika.batch.FileResourceConsumer
call, checkForTimedOutMillis, close, flushAndClose, getCurrentFile, getNumHandledExceptions, getNumResourcesConsumed, getXMLifiedLogMsg, getXMLifiedLogMsg, incrementHandledExceptions, isStillActive, parse, pleaseShutdown
Field Details
REF_PAIR_NAMES
COMPARISON_CONTAINERS
CONTENT_COMPARISONS
PROFILES_A
PROFILES_B
EMBEDDED_FILE_PATH_TABLE_A
EMBEDDED_FILE_PATH_TABLE_B
CONTENTS_TABLE_A
CONTENTS_TABLE_B
TAGS_TABLE_A
TAGS_TABLE_B
EXCEPTION_TABLE_A
EXCEPTION_TABLE_B
EXTRACT_EXCEPTION_TABLE_A
EXTRACT_EXCEPTION_TABLE_B
Constructor Details
ExtractComparer
public ExtractComparer(ArrayBlockingQueue<FileResource> queue, Path inputDir, Path extractsA, Path extractsB, ExtractReader extractReader, IDBWriter writer)
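The constructor takes an ArrayBlockingQueue<FileResource> and the class is a Callable, which suggests a producer/consumer arrangement. The following is a minimal sketch of that general pattern only. It uses plain strings in place of FileResource, and the QueueConsumer class, the consumeAll helper, and the POISON end-of-input marker are all hypothetical names invented for this illustration; this page does not show how tika-batch actually feeds the queue or signals shutdown.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class QueueConsumerSketch {

    /** Hypothetical end-of-input marker; an assumption of this sketch, not Tika API. */
    static final String POISON = "__END__";

    /** Hypothetical consumer: takes items from the queue until it sees the marker. */
    static class QueueConsumer implements Callable<Integer> {
        private final ArrayBlockingQueue<String> queue;

        QueueConsumer(ArrayBlockingQueue<String> queue) {
            this.queue = queue;
        }

        @Override
        public Integer call() throws InterruptedException {
            int consumed = 0;
            while (true) {
                String item = queue.take(); // blocks until an item is available
                if (POISON.equals(item)) {
                    return consumed;
                }
                consumed++;
            }
        }
    }

    /** Feeds the items to one consumer thread and returns how many it consumed. */
    static int consumeAll(String... items) throws Exception {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = pool.submit(new QueueConsumer(queue));
            for (String item : items) {
                queue.put(item); // blocks while the queue is full
            }
            queue.put(POISON);
            return result.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(consumeAll("a.pdf", "b.docx", "c.html"));
    }
}
```

The bounded queue applies back-pressure: the producer blocks in put() whenever the consumer falls behind, so memory use stays limited regardless of how many inputs are crawled.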
Method Details
USAGE
public static void USAGE()
processFileResource
public boolean processFileResource(FileResource fileResource)
Description copied from class: FileResourceConsumer
Main piece of code that needs to be implemented. Clients are responsible for closing streams and handling the exceptions that they'd like to handle. Unchecked throwables can still propagate past this method; when one is thrown, the error is logged and the throwable is then rethrown. Clients/subclasses should make sure to catch and handle everything they can. The design goal is that the whole process should close up and shut down soon after an unchecked exception or error is thrown. Make sure to call FileResourceConsumer.incrementHandledExceptions() appropriately in your implementation of this method.
Specified by:
processFileResource in class FileResourceConsumer
Parameters:
fileResource - resource to process
Returns:
whether or not a file was successfully processed
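The contract described for processFileResource can be illustrated with a stand-in base class. This is a sketch under stated assumptions, not the Tika implementation: SketchConsumer, LengthConsumer, processResource, and the string inputs are hypothetical and only mimic the described pattern (handle the exceptions you choose to handle, count them, return whether processing succeeded, and let unchecked throwables propagate).

```java
import java.io.IOException;
import java.util.List;

public class HandledExceptionSketch {

    /** Hypothetical stand-in for a consumer base class; not the Tika API. */
    abstract static class SketchConsumer {
        private int handledExceptions = 0;

        /** Mirrors the idea of incrementHandledExceptions() named in the description. */
        protected void incrementHandledExceptions() {
            handledExceptions++;
        }

        int getNumHandledExceptions() {
            return handledExceptions;
        }

        /** Returns whether the resource was successfully processed. */
        abstract boolean processResource(String resource);
    }

    /** Hypothetical subclass: handles the checked exception it expects and counts it. */
    static class LengthConsumer extends SketchConsumer {
        @Override
        boolean processResource(String resource) {
            // Only IOException is caught; unchecked throwables are left to propagate.
            try {
                read(resource);
                return true;
            } catch (IOException e) {
                // An exception we chose to handle: record it and report failure.
                incrementHandledExceptions();
                return false;
            }
        }

        /** Simulated read that fails on an empty resource name. */
        private int read(String resource) throws IOException {
            if (resource.isEmpty()) {
                throw new IOException("empty resource");
            }
            return resource.length();
        }
    }

    /** Processes each resource and returns how many succeeded. */
    static int run(SketchConsumer consumer, List<String> resources) {
        int ok = 0;
        for (String r : resources) {
            if (consumer.processResource(r)) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        LengthConsumer consumer = new LengthConsumer();
        int ok = run(consumer, List.of("a.txt", "", "b.txt"));
        System.out.println(ok + " processed, "
                + consumer.getNumHandledExceptions() + " handled exception(s)");
    }
}
```

Keeping the handled-exception count separate from the boolean return value lets a caller distinguish "this file failed in an expected way" from an unchecked failure that should bring the whole run down.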
compareFiles
protected void compareFiles(org.apache.tika.eval.app.EvalFilePaths fpsA, org.apache.tika.eval.app.EvalFilePaths fpsB) throws IOException
Throws:
IOException