Package org.apache.tika.eval.app
Class ExtractComparer
java.lang.Object
    org.apache.tika.batch.FileResourceConsumer
        org.apache.tika.eval.app.AbstractProfiler
            org.apache.tika.eval.app.ExtractComparer

All Implemented Interfaces:
Callable<IFileProcessorFutureResult>
public class ExtractComparer extends AbstractProfiler
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.tika.eval.app.AbstractProfiler
AbstractProfiler.EXCEPTION_TYPE, AbstractProfiler.PARSE_ERROR_TYPE
Field Summary
Modifier and Type    Field
static TableInfo     COMPARISON_CONTAINERS
static TableInfo     CONTENT_COMPARISONS
static TableInfo     CONTENTS_TABLE_A
static TableInfo     CONTENTS_TABLE_B
static TableInfo     EMBEDDED_FILE_PATH_TABLE_A
static TableInfo     EMBEDDED_FILE_PATH_TABLE_B
static TableInfo     EXCEPTION_TABLE_A
static TableInfo     EXCEPTION_TABLE_B
static TableInfo     EXTRACT_EXCEPTION_TABLE_A
static TableInfo     EXTRACT_EXCEPTION_TABLE_B
static TableInfo     PROFILES_A
static TableInfo     PROFILES_B
static TableInfo     REF_PAIR_NAMES
static TableInfo     TAGS_TABLE_A
static TableInfo     TAGS_TABLE_B
Fields inherited from class org.apache.tika.eval.app.AbstractProfiler
FALSE, ID, MIME_TABLE, REF_EXTRACT_EXCEPTION_TYPES, REF_PARSE_ERROR_TYPES, REF_PARSE_EXCEPTION_TYPES, TRUE, writer
Fields inherited from class org.apache.tika.batch.FileResourceConsumer
ELAPSED_MILLIS, IO_IS, IO_OS, OOM, PARSE_ERR, PARSE_EX, TIMED_OUT
Constructor Summary
Constructor
ExtractComparer(ArrayBlockingQueue<FileResource> queue, Path inputDir, Path extractsA, Path extractsB, ExtractReader extractReader, IDBWriter writer)
Method Summary
Modifier and Type    Method and Description
protected void       compareFiles(org.apache.tika.eval.app.EvalFilePaths fpsA, org.apache.tika.eval.app.EvalFilePaths fpsB)
boolean              processFileResource(FileResource fileResource)
                     Main piece of code that needs to be implemented.
static void          USAGE()
Methods inherited from class org.apache.tika.eval.app.AbstractProfiler
calcTextStats, closeWriter, getContent, getFileLength, getPathsFromExtractCrawl, getPathsFromSrcCrawl, getSourceFileLength, loadCommonTokens, setMaxContentLength, setMaxContentLengthForLangId, setMaxTokens, truncateContent, writeContentData, writeExceptionData, writeExtractException, writeProfileData
Methods inherited from class org.apache.tika.batch.FileResourceConsumer
call, checkForTimedOutMillis, close, flushAndClose, getCurrentFile, getNumHandledExceptions, getNumResourcesConsumed, getXMLifiedLogMsg, getXMLifiedLogMsg, incrementHandledExceptions, isStillActive, parse, pleaseShutdown
Field Detail
REF_PAIR_NAMES
public static TableInfo REF_PAIR_NAMES

COMPARISON_CONTAINERS
public static TableInfo COMPARISON_CONTAINERS

CONTENT_COMPARISONS
public static TableInfo CONTENT_COMPARISONS

PROFILES_A
public static TableInfo PROFILES_A

PROFILES_B
public static TableInfo PROFILES_B

EMBEDDED_FILE_PATH_TABLE_A
public static TableInfo EMBEDDED_FILE_PATH_TABLE_A

EMBEDDED_FILE_PATH_TABLE_B
public static TableInfo EMBEDDED_FILE_PATH_TABLE_B

CONTENTS_TABLE_A
public static TableInfo CONTENTS_TABLE_A

CONTENTS_TABLE_B
public static TableInfo CONTENTS_TABLE_B

TAGS_TABLE_A
public static TableInfo TAGS_TABLE_A

TAGS_TABLE_B
public static TableInfo TAGS_TABLE_B

EXCEPTION_TABLE_A
public static TableInfo EXCEPTION_TABLE_A

EXCEPTION_TABLE_B
public static TableInfo EXCEPTION_TABLE_B

EXTRACT_EXCEPTION_TABLE_A
public static TableInfo EXTRACT_EXCEPTION_TABLE_A

EXTRACT_EXCEPTION_TABLE_B
public static TableInfo EXTRACT_EXCEPTION_TABLE_B

Constructor Detail
ExtractComparer
public ExtractComparer(ArrayBlockingQueue<FileResource> queue, Path inputDir, Path extractsA, Path extractsB, ExtractReader extractReader, IDBWriter writer)
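The constructor takes an ArrayBlockingQueue<FileResource>, and the class implements Callable<IFileProcessorFutureResult> through FileResourceConsumer, which suggests that several consumers pull work from one shared queue. The following is a minimal sketch of that producer/consumer pattern using only java.util.concurrent. SketchConsumer and ConsumerPoolSketch are hypothetical names, String ids stand in for FileResource, and stopping when the queue stays empty for 100 ms is an assumption chosen for brevity; none of this is the Tika batch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a FileResourceConsumer subclass.
// Each "resource" is a plain String id instead of a FileResource.
final class SketchConsumer implements Callable<Integer> {
    private final ArrayBlockingQueue<String> queue;

    SketchConsumer(ArrayBlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public Integer call() throws InterruptedException {
        int consumed = 0;
        while (true) {
            // Assumption: an empty poll after 100 ms means no more work is coming.
            String resource = queue.poll(100, TimeUnit.MILLISECONDS);
            if (resource == null) {
                return consumed;
            }
            consumed++; // a real consumer would process the resource here
        }
    }
}

final class ConsumerPoolSketch {
    // Fills one shared queue, runs nConsumers consumers against it,
    // and returns how many resources were consumed in total.
    static int runAll(List<String> ids, int nConsumers) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(Math.max(1, ids.size()));
        queue.addAll(ids);
        ExecutorService pool = Executors.newFixedThreadPool(nConsumers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < nConsumers; i++) {
                futures.add(pool.submit(new SketchConsumer(queue)));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the queue is thread-safe, each id is taken by exactly one consumer, so the per-consumer counts sum to the number of queued ids.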
Method Detail
USAGE
public static void USAGE()
processFileResource
public boolean processFileResource(FileResource fileResource)
Description copied from class: FileResourceConsumer
Main piece of code that needs to be implemented. Clients are responsible for closing streams and handling the exceptions that they'd like to handle. Unchecked throwables can be thrown past this, of course. When an unchecked throwable is thrown, this logs the error and then rethrows the exception. Clients/subclasses should make sure to catch and handle everything they can. The design goal is that the whole process should close up and shut down soon after an unchecked exception or error is thrown. Make sure to call FileResourceConsumer.incrementHandledExceptions() appropriately in your implementation of this method.
Specified by:
processFileResource in class FileResourceConsumer
Parameters:
fileResource - resource to process
Returns:
whether or not a file was successfully processed
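A minimal sketch of the contract described above, assuming a hypothetical base class SketchResourceConsumer in place of FileResourceConsumer: a recoverable checked exception is caught inside processFileResource, counted with incrementHandledExceptions(), and reported by returning false, while unchecked throwables escape and are logged and rethrown by a caller-side wrapper. SketchComparer, readExtract, and the ".bad" naming rule are invented solely to simulate a read failure.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical base class mirroring the documented FileResourceConsumer contract.
abstract class SketchResourceConsumer {
    private final AtomicInteger handledExceptions = new AtomicInteger();

    protected void incrementHandledExceptions() {
        handledExceptions.incrementAndGet();
    }

    int getNumHandledExceptions() {
        return handledExceptions.get();
    }

    // Subclasses implement this and handle everything they can inside it.
    abstract boolean processFileResource(String resource);

    // Caller-side wrapper: unchecked throwables are logged, then rethrown.
    boolean process(String resource) {
        try {
            return processFileResource(resource);
        } catch (RuntimeException | Error t) {
            System.err.println("unhandled throwable on " + resource + ": " + t);
            throw t;
        }
    }
}

// Hypothetical subclass standing in for a comparer-style consumer.
final class SketchComparer extends SketchResourceConsumer {
    @Override
    boolean processFileResource(String resource) {
        try {
            readExtract(resource);
            return true;
        } catch (IOException e) {
            // Recoverable failure: record it and report "not processed".
            incrementHandledExceptions();
            return false;
        }
    }

    // Hypothetical reader: names ending in ".bad" simulate an I/O error.
    private void readExtract(String resource) throws IOException {
        if (resource.endsWith(".bad")) {
            throw new IOException("cannot read " + resource);
        }
    }
}
```

Keeping the checked-exception handling inside processFileResource lets one unreadable file be counted and skipped, while an unchecked failure still propagates and triggers the shutdown behavior the description calls for.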
compareFiles
protected void compareFiles(org.apache.tika.eval.app.EvalFilePaths fpsA, org.apache.tika.eval.app.EvalFilePaths fpsB) throws IOException
Throws:
IOException
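compareFiles receives one EvalFilePaths for extract set A and one for extract set B, and the constructor takes an inputDir plus the two roots extractsA and extractsB. Assuming, hypothetically, that each extract mirrors the source file's path relative to inputDir with an optional file-name suffix, the two extract locations could be derived with java.nio.file.Path as below. ExtractPairSketch, pairFor, and the suffix convention are illustrations only, not the actual EvalFilePaths logic.

```java
import java.nio.file.Path;

// Hypothetical helper; the real EvalFilePaths construction and extract naming may differ.
final class ExtractPairSketch {
    // Resolves the same relative path (plus a hypothetical suffix) under both extract roots.
    // sourceFile is assumed to lie strictly under inputDir.
    static Path[] pairFor(Path inputDir, Path sourceFile, Path extractsA, Path extractsB, String suffix) {
        Path relative = inputDir.relativize(sourceFile);
        // Append the suffix to the file name only, keeping any parent directories.
        Path named = relative.resolveSibling(relative.getFileName().toString() + suffix);
        return new Path[] {extractsA.resolve(named), extractsB.resolve(named)};
    }
}
```

For example, with inputDir "in", a source file "in/docs/x.pdf", and suffix ".json", the pair would be "a/docs/x.pdf.json" and "b/docs/x.pdf.json" for roots "a" and "b".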