Package org.apache.tika.quality
Class TextQualityComparison
java.lang.Object
org.apache.tika.quality.TextQualityComparison
Result of comparing two candidate strings for text quality via
TextQualityDetector.compare(java.lang.String, java.lang.String, java.lang.String, java.lang.String).
A typical use is charset-decoding arbitration: given raw bytes decoded two different ways (e.g. cp1251 vs cp1252), pass each decoded string with a label and let the detector pick the cleaner one.
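The decoding step that precedes the comparison can be sketched with the standard library alone. This is a minimal illustration, not part of Tika: the class name DecodeTwoWays, the helper decode, and the sample bytes are assumptions made for the example. The call to TextQualityDetector.compare is left out because this page does not state the order of its four String arguments.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration; not part of org.apache.tika.quality.
class DecodeTwoWays {
    // Sample input (assumed): the Russian word for "hello" encoded as windows-1251.
    static final byte[] SAMPLE = {
        (byte) 0xCF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2, (byte) 0xE5, (byte) 0xF2
    };

    /** Decodes the same raw bytes with the named charset. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Two candidate decodings of identical bytes.
        String asCp1251 = decode(SAMPLE, "windows-1251");
        String asCp1252 = decode(SAMPLE, "windows-1252");
        System.out.println("windows-1251: " + asCp1251);
        System.out.println("windows-1252: " + asCp1252);
        // Each candidate string, together with its label (here the charset name),
        // would then be handed to the detector for comparison.
    }
}
```

Only one of the two decodings yields readable Cyrillic; the other is the mojibake that the quality comparison is meant to reject.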
The delta field is the absolute difference between the two z-scores.
A delta near zero means the model is uncertain; larger values indicate
confident discrimination. As a rough guide: delta > 1.0 is useful signal,
delta > 3.0 is confident.
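The rough guide above can be turned into a small helper. This is a sketch under stated assumptions: the class DeltaGuide, the enum Confidence, and the bucket name WEAK for values at or below 1.0 are invented for illustration and are not part of the Tika API. Only the absolute-difference definition and the 1.0 and 3.0 thresholds come from the description.

```java
// Hypothetical helper for illustration; not part of org.apache.tika.quality.
class DeltaGuide {
    // Bucket names are assumptions; the source only gives the two thresholds.
    enum Confidence { WEAK, USEFUL, CONFIDENT }

    /** Absolute difference between two z-scores, as the delta field is described. */
    static float delta(float zScoreA, float zScoreB) {
        return Math.abs(zScoreA - zScoreB);
    }

    /** Maps a delta to the rough guide: > 3.0 confident, > 1.0 useful signal. */
    static Confidence classify(float delta) {
        if (delta > 3.0f) {
            return Confidence.CONFIDENT;
        }
        if (delta > 1.0f) {
            return Confidence.USEFUL;
        }
        return Confidence.WEAK; // near zero: the model is uncertain
    }

    public static void main(String[] args) {
        float d = delta(-0.5f, 2.0f);
        System.out.println("delta=" + d + " -> " + classify(d));
    }
}
```

A caller might accept the winner only when the result is CONFIDENT and fall back to another heuristic otherwise; where to draw that line is a policy choice, not something this page prescribes.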
Constructor Summary
Constructors
Constructor    Description
TextQualityComparison(String winner, float delta, TextQualityScore scoreA, TextQualityScore scoreB, String labelA, String labelB)
Method Summary
Modifier and Type    Method        Description
float                delta()       Absolute difference in z-scores between the two candidates.
                     labelA()      Label supplied for candidate A (e.g. a charset name or encoding description).
                     labelB()      Label supplied for candidate B.
                     scoreA()      Quality score for candidate A.
                     scoreB()      Quality score for candidate B.
                     toString()
                     winner()      Returns "A" if candidate A is cleaner, "B" otherwise.
Constructor Details
TextQualityComparison
public TextQualityComparison(String winner, float delta, TextQualityScore scoreA, TextQualityScore scoreB, String labelA, String labelB)
Method Details
winner
delta
public float delta()
Absolute difference in z-scores between the two candidates. Small delta = uncertain; large delta = confident.
scoreA
Quality score for candidate A.
scoreB
Quality score for candidate B.
labelA
Label supplied for candidate A (e.g. a charset name or encoding description).
labelB
Label supplied for candidate B.
toString