Package org.apache.tika.eval.app
Class ExtractComparer
java.lang.Object
  org.apache.tika.batch.FileResourceConsumer
    org.apache.tika.eval.app.AbstractProfiler
      org.apache.tika.eval.app.ExtractComparer
-
All Implemented Interfaces:
Callable<IFileProcessorFutureResult>
public class ExtractComparer extends AbstractProfiler
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.tika.eval.app.AbstractProfiler
AbstractProfiler.EXCEPTION_TYPE, AbstractProfiler.PARSE_ERROR_TYPE
-
-
Field Summary
Fields
Modifier and Type   Field
static TableInfo    COMPARISON_CONTAINERS
static TableInfo    CONTENT_COMPARISONS
static TableInfo    CONTENTS_TABLE_A
static TableInfo    CONTENTS_TABLE_B
static TableInfo    EMBEDDED_FILE_PATH_TABLE_A
static TableInfo    EMBEDDED_FILE_PATH_TABLE_B
static TableInfo    EXCEPTION_TABLE_A
static TableInfo    EXCEPTION_TABLE_B
static TableInfo    EXTRACT_EXCEPTION_TABLE_A
static TableInfo    EXTRACT_EXCEPTION_TABLE_B
static TableInfo    PROFILES_A
static TableInfo    PROFILES_B
static TableInfo    REF_PAIR_NAMES
static TableInfo    TAGS_TABLE_A
static TableInfo    TAGS_TABLE_B
-
Fields inherited from class org.apache.tika.eval.app.AbstractProfiler
FALSE, ID, MIME_TABLE, REF_EXTRACT_EXCEPTION_TYPES, REF_PARSE_ERROR_TYPES, REF_PARSE_EXCEPTION_TYPES, TRUE, writer
-
Fields inherited from class org.apache.tika.batch.FileResourceConsumer
ELAPSED_MILLIS, IO_IS, IO_OS, OOM, PARSE_ERR, PARSE_EX, TIMED_OUT
-
-
Constructor Summary
Constructors
Constructor
ExtractComparer(ArrayBlockingQueue<FileResource> queue, Path inputDir, Path extractsA, Path extractsB, ExtractReader extractReader, IDBWriter writer)
-
Method Summary
Modifier and Type   Method and Description
protected void      compareFiles(org.apache.tika.eval.app.EvalFilePaths fpsA, org.apache.tika.eval.app.EvalFilePaths fpsB)
boolean             processFileResource(FileResource fileResource)
                    Main piece of code that needs to be implemented.
static void         USAGE()
-
Methods inherited from class org.apache.tika.eval.app.AbstractProfiler
calcTextStats, closeWriter, getContent, getFileLength, getPathsFromExtractCrawl, getPathsFromSrcCrawl, getSourceFileLength, loadCommonTokens, setMaxContentLength, setMaxContentLengthForLangId, setMaxTokens, truncateContent, writeContentData, writeExceptionData, writeExtractException, writeProfileData
-
Methods inherited from class org.apache.tika.batch.FileResourceConsumer
call, checkForTimedOutMillis, close, flushAndClose, getCurrentFile, getNumHandledExceptions, getNumResourcesConsumed, getXMLifiedLogMsg, getXMLifiedLogMsg, incrementHandledExceptions, isStillActive, parse, pleaseShutdown
-
Field Detail
-
REF_PAIR_NAMES
public static TableInfo REF_PAIR_NAMES
-
COMPARISON_CONTAINERS
public static TableInfo COMPARISON_CONTAINERS
-
CONTENT_COMPARISONS
public static TableInfo CONTENT_COMPARISONS
-
PROFILES_A
public static TableInfo PROFILES_A
-
PROFILES_B
public static TableInfo PROFILES_B
-
EMBEDDED_FILE_PATH_TABLE_A
public static TableInfo EMBEDDED_FILE_PATH_TABLE_A
-
EMBEDDED_FILE_PATH_TABLE_B
public static TableInfo EMBEDDED_FILE_PATH_TABLE_B
-
CONTENTS_TABLE_A
public static TableInfo CONTENTS_TABLE_A
-
CONTENTS_TABLE_B
public static TableInfo CONTENTS_TABLE_B
-
TAGS_TABLE_A
public static TableInfo TAGS_TABLE_A
-
TAGS_TABLE_B
public static TableInfo TAGS_TABLE_B
-
EXCEPTION_TABLE_A
public static TableInfo EXCEPTION_TABLE_A
-
EXCEPTION_TABLE_B
public static TableInfo EXCEPTION_TABLE_B
-
EXTRACT_EXCEPTION_TABLE_A
public static TableInfo EXTRACT_EXCEPTION_TABLE_A
-
EXTRACT_EXCEPTION_TABLE_B
public static TableInfo EXTRACT_EXCEPTION_TABLE_B
-
-
Constructor Detail
-
ExtractComparer
public ExtractComparer(ArrayBlockingQueue<FileResource> queue, Path inputDir, Path extractsA, Path extractsB, ExtractReader extractReader, IDBWriter writer)
-
-
Method Detail
-
USAGE
public static void USAGE()
-
processFileResource
public boolean processFileResource(FileResource fileResource)
Description copied from class: FileResourceConsumer
Main piece of code that needs to be implemented. Clients are responsible for closing streams and handling the exceptions that they'd like to handle. Unchecked throwables can be thrown past this method. When an unchecked throwable is thrown, the consumer logs the error and then rethrows it. Clients/subclasses should make sure to catch and handle everything they can. The design goal is that the whole process should close up and shut down soon after an unchecked exception or error is thrown. Make sure to call FileResourceConsumer.incrementHandledExceptions() appropriately in your implementation of this method.
Specified by:
processFileResource in class FileResourceConsumer
Parameters:
fileResource - resource to process
Returns:
whether or not a file was successfully processed
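The contract described above can be sketched as follows. This is a minimal, self-contained illustration and not Tika code: Resource, Consumer, and ByteCounter are hypothetical stand-ins for FileResource, FileResourceConsumer, and a concrete subclass. Only the names incrementHandledExceptions() and getNumHandledExceptions() mirror methods listed on this page, and their behavior here is assumed to be a simple counter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the processFileResource contract; all types here are hypothetical stand-ins. */
public class ProcessContractSketch {

    /** Hypothetical stand-in for FileResource: something that can open a stream. */
    interface Resource {
        InputStream open() throws IOException;
    }

    /** Hypothetical stand-in for the consumer base class (assumed to keep a simple counter). */
    static abstract class Consumer {
        private final AtomicInteger handled = new AtomicInteger();

        protected void incrementHandledExceptions() {
            handled.incrementAndGet();
        }

        int getNumHandledExceptions() {
            return handled.get();
        }

        abstract boolean processFileResource(Resource resource);
    }

    /** Example implementation: counts bytes and handles the exceptions it can. */
    static class ByteCounter extends Consumer {
        long bytesSeen = 0;

        @Override
        boolean processFileResource(Resource resource) {
            // try-with-resources: the implementation closes its own stream.
            try (InputStream is = resource.open()) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = is.read(buf)) != -1) {
                    bytesSeen += n;
                }
                return true;
            } catch (IOException e) {
                // A handled, checked failure: record it and report "not processed".
                incrementHandledExceptions();
                return false;
            }
            // Unchecked throwables are deliberately not caught; they propagate to the caller.
        }
    }

    public static void main(String[] args) {
        ByteCounter counter = new ByteCounter();
        boolean ok = counter.processFileResource(
                () -> new ByteArrayInputStream(new byte[] {1, 2, 3}));
        boolean failed = counter.processFileResource(
                () -> { throw new IOException("unreadable"); });
        // prints "true false 3 1"
        System.out.println(ok + " " + failed + " " + counter.bytesSeen
                + " " + counter.getNumHandledExceptions());
    }
}
```

The return value and the handled-exception counter are kept separate in this sketch: a false return reports that the file was not processed, while the counter records that the failure was caught rather than allowed to stop the run.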
-
compareFiles
protected void compareFiles(org.apache.tika.eval.app.EvalFilePaths fpsA, org.apache.tika.eval.app.EvalFilePaths fpsB) throws IOException
Throws:
IOException