Class TextQualityComparison

java.lang.Object
org.apache.tika.quality.TextQualityComparison

public final class TextQualityComparison extends Object
Result of comparing two candidate strings for text quality via TextQualityDetector.compare(java.lang.String, java.lang.String, java.lang.String, java.lang.String).

A typical use is charset-decoding arbitration: given raw bytes decoded two different ways (e.g. cp1251 vs cp1252), pass each decoded string with a label and let the detector pick the cleaner one.
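As a minimal sketch of the preparation step for charset arbitration, the snippet below decodes one byte sequence under two charsets using only the JDK. The class name `CharsetCandidates` and the sample bytes are illustrative assumptions, and the sketch assumes the running JVM provides both windows-1251 and windows-1252. The call to the detector is left out because this page does not show how a TextQualityDetector is obtained or the order of the compare arguments.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Tika: builds the two candidate strings.
final class CharsetCandidates {

    // Decode the same raw bytes under the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The Russian word "Привет" encoded as windows-1251.
        byte[] raw = {(byte) 0xCF, (byte) 0xF0, (byte) 0xE8,
                      (byte) 0xE2, (byte) 0xE5, (byte) 0xF2};

        String candidateA = decode(raw, "windows-1251"); // readable Cyrillic
        String candidateB = decode(raw, "windows-1252"); // accented Latin mojibake

        System.out.println("windows-1251: " + candidateA);
        System.out.println("windows-1252: " + candidateB);
        // Each string, together with its charset name as the label, would then
        // be passed to TextQualityDetector.compare(...).
    }
}
```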

The delta field is the absolute difference between the two z-scores. A delta near zero means the model is uncertain; larger values indicate confident discrimination. As a rough guide, a delta above 1.0 is a useful signal and a delta above 3.0 indicates a confident result.
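The rough guide for delta can be sketched as a small classifier. The class `DeltaConfidence` and its `Level` enum are hypothetical names introduced for illustration; only the 1.0 and 3.0 thresholds come from the text, and they are treated as strict lower bounds.

```java
// Hypothetical helper, not part of Tika: maps a delta value to a rough confidence level.
final class DeltaConfidence {

    enum Level { UNCERTAIN, USEFUL, CONFIDENT }

    // Thresholds follow the rough guide: > 3.0 is confident, > 1.0 is useful signal.
    static Level classify(float delta) {
        if (delta > 3.0f) {
            return Level.CONFIDENT;
        }
        if (delta > 1.0f) {
            return Level.USEFUL;
        }
        return Level.UNCERTAIN;
    }

    public static void main(String[] args) {
        for (float d : new float[] {0.2f, 1.5f, 4.0f}) {
            System.out.println(d + " -> " + classify(d));
        }
    }
}
```

In real code the argument would come from delta() on a comparison result.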

  • Constructor Details

  • Method Details

    • winner

      public String winner()
      Returns "A" if candidate A is cleaner, "B" otherwise. Check delta() to gauge confidence.
    • delta

      public float delta()
      Absolute difference in z-scores between the two candidates. A small delta means the comparison is uncertain; a large delta means it is confident.
    • scoreA

      public TextQualityScore scoreA()
      Quality score for candidate A.
    • scoreB

      public TextQualityScore scoreB()
      Quality score for candidate B.
    • labelA

      public String labelA()
      Label supplied for candidate A (e.g. a charset name or encoding description).
    • labelB

      public String labelB()
      Label supplied for candidate B.
    • toString

      public String toString()
      Overrides:
      toString in class Object
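
Because winner() returns the letter "A" or "B" rather than a label, callers usually need to map it back to labelA() or labelB(). The sketch below shows one way to do that while rejecting low-confidence results. It works on plain values so that it stays self-contained; the class `WinnerLabel`, the method `pick`, and the `minDelta` parameter are assumptions made for illustration, not part of the documented API.

```java
import java.util.Optional;

// Hypothetical helper, not part of Tika: resolves the winning label from accessor values.
final class WinnerLabel {

    // Returns the label of the winning candidate, or empty if delta is below minDelta.
    static Optional<String> pick(String winner, float delta,
                                 String labelA, String labelB, float minDelta) {
        if (delta < minDelta) {
            return Optional.empty(); // too close to call
        }
        return Optional.of("A".equals(winner) ? labelA : labelB);
    }

    public static void main(String[] args) {
        // Values as they might be read from winner(), delta(), labelA(), labelB().
        Optional<String> chosen = pick("A", 2.5f, "windows-1251", "windows-1252", 1.0f);
        System.out.println(chosen.orElse("undecided"));
    }
}
```

Returning an empty Optional for a small delta leaves the fallback decision to the caller, for example keeping a previously detected charset.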