Enum Class StructuralEncodingRules.Utf8Result

java.lang.Object
java.lang.Enum<StructuralEncodingRules.Utf8Result>
org.apache.tika.ml.chardetect.StructuralEncodingRules.Utf8Result
All Implemented Interfaces:
Serializable, Comparable<StructuralEncodingRules.Utf8Result>, Constable
Enclosing class:
StructuralEncodingRules

public static enum StructuralEncodingRules.Utf8Result extends Enum<StructuralEncodingRules.Utf8Result>
Outcome of the UTF-8 structural check.
  • Enum Constant Details

    • LIKELY_UTF8

      public static final StructuralEncodingRules.Utf8Result LIKELY_UTF8
      Sample is grammatically valid UTF-8 and contains at least one complete multi-byte sequence. This is not a guarantee: short CJK probes occasionally pass UTF-8 grammar by coincidence (on our training corpus, the false-positive rate is ≤ 0.77% at 16 B, ≤ 0.05% at 256 B, and ≤ 0.01% at 4 KB). Callers should add UTF-8 to the candidate pool rather than treat it as a hard winner, and let downstream language-signal arbitration resolve genuine false positives.
    • NOT_UTF8

      public static final StructuralEncodingRules.Utf8Result NOT_UTF8
      Sample contains at least one invalid UTF-8 sequence.
    • AMBIGUOUS

      public static final StructuralEncodingRules.Utf8Result AMBIGUOUS
      Sample is structurally valid but contains no complete multi-byte sequence (pure ASCII, or only a truncated lead byte at the end of the probe). UTF-8 can be neither confirmed nor ruled out; pass the sample to the statistical model.
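
The three outcomes above can be illustrated with a minimal sketch of a UTF-8 grammar check. This is not the Tika implementation: the class Utf8GrammarSketch, its nested Result enum, and the check method are hypothetical names introduced for illustration, and the byte ranges follow the well-formed UTF-8 sequences defined by the Unicode Standard. A sequence cut off at the end of the probe is assumed to be tolerated rather than counted as invalid.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration; not the Tika implementation. */
final class Utf8GrammarSketch {

    /** Local mirror of the three documented outcomes. */
    enum Result { LIKELY_UTF8, NOT_UTF8, AMBIGUOUS }

    static Result check(byte[] probe) {
        boolean sawMultiByte = false;
        int i = 0;
        while (i < probe.length) {
            int lead = probe[i] & 0xFF;
            if (lead < 0x80) {          // plain ASCII carries no UTF-8 evidence
                i++;
                continue;
            }
            int len;
            if (lead >= 0xC2 && lead <= 0xDF) {
                len = 2;
            } else if (lead >= 0xE0 && lead <= 0xEF) {
                len = 3;
            } else if (lead >= 0xF0 && lead <= 0xF4) {
                len = 4;
            } else {
                // Stray continuation byte, overlong lead (C0/C1), or F5..FF.
                return Result.NOT_UTF8;
            }
            for (int k = 1; k < len; k++) {
                if (i + k >= probe.length) {
                    // Assumption: a sequence truncated at probe end is not an error.
                    return sawMultiByte ? Result.LIKELY_UTF8 : Result.AMBIGUOUS;
                }
                int c = probe[i + k] & 0xFF;
                int lo = 0x80;
                int hi = 0xBF;
                if (k == 1) {
                    if (lead == 0xE0) lo = 0xA0;        // reject overlong 3-byte forms
                    else if (lead == 0xED) hi = 0x9F;   // reject UTF-16 surrogates
                    else if (lead == 0xF0) lo = 0x90;   // reject overlong 4-byte forms
                    else if (lead == 0xF4) hi = 0x8F;   // reject values above U+10FFFF
                }
                if (c < lo || c > hi) {
                    return Result.NOT_UTF8;
                }
            }
            sawMultiByte = true;
            i += len;
        }
        return sawMultiByte ? Result.LIKELY_UTF8 : Result.AMBIGUOUS;
    }

    public static void main(String[] args) {
        System.out.println(check("hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println(check("h\u00e9llo".getBytes(StandardCharsets.UTF_8)));
        System.out.println(check("h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

In this sketch, pure ASCII input yields AMBIGUOUS, the UTF-8 encoding of "héllo" yields LIKELY_UTF8, and the ISO-8859-1 encoding of the same string yields NOT_UTF8 because the byte 0xE9 is not followed by a continuation byte.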
  • Method Details

    • values

      public static StructuralEncodingRules.Utf8Result[] values()
      Returns an array containing the constants of this enum class, in the order they are declared.
      Returns:
      an array containing the constants of this enum class, in the order they are declared
    • valueOf

      public static StructuralEncodingRules.Utf8Result valueOf(String name)
      Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
      Parameters:
      name - the name of the enum constant to be returned.
      Returns:
      the enum constant with the specified name
      Throws:
      IllegalArgumentException - if this enum class has no constant with the specified name
      NullPointerException - if the argument is null
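
Because valueOf throws on any name that is not an exact match, code that reads constant names from configuration or logs may want a tolerant wrapper. The following is a minimal sketch using a local stand-in enum so that it compiles on its own; Utf8ResultLookupSketch and tryParse are hypothetical names, and the constant order shown is assumed to match the declaration order of the real enum.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; uses a local stand-in enum. */
final class Utf8ResultLookupSketch {

    /** Stand-in with the same constant names as the documented enum. */
    enum Utf8Result { LIKELY_UTF8, NOT_UTF8, AMBIGUOUS }

    /** Returns the constant with exactly this name, or empty if there is none. */
    static Optional<Utf8Result> tryParse(String name) {
        if (name == null) {
            return Optional.empty();            // valueOf(null) would throw NullPointerException
        }
        try {
            return Optional.of(Utf8Result.valueOf(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();            // wrong case, extra whitespace, or unknown name
        }
    }

    public static void main(String[] args) {
        // values() iterates in declaration order.
        for (Utf8Result r : Utf8Result.values()) {
            System.out.println(r.ordinal() + " " + r.name());
        }
        System.out.println(tryParse("NOT_UTF8").isPresent());
        System.out.println(tryParse("not_utf8").isPresent());
    }
}
```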
    • isDecisive

      public boolean isDecisive()
      Returns true when the grammar check produced a directional answer (either LIKELY_UTF8 or NOT_UTF8). Returns false for AMBIGUOUS, which means the probe carries no UTF-8-specific evidence.
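
One way a caller might act on isDecisive is sketched below, following the advice under LIKELY_UTF8 to add UTF-8 to the candidate pool rather than declare it the winner. The enum here is a local stand-in, and DecisiveSketch and adjust are hypothetical names. Dropping UTF-8 from the pool on NOT_UTF8 is an assumption made for this example, not documented behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical caller-side sketch; not part of the documented API. */
final class DecisiveSketch {

    /** Stand-in enum mirroring the documented isDecisive() contract. */
    enum Utf8Result {
        LIKELY_UTF8, NOT_UTF8, AMBIGUOUS;

        boolean isDecisive() {
            return this != AMBIGUOUS;
        }
    }

    /** Returns a copy of the candidate pool adjusted for the structural result. */
    static Set<Charset> adjust(Set<Charset> candidates, Utf8Result result) {
        Set<Charset> out = new LinkedHashSet<>(candidates);
        if (!result.isDecisive()) {
            return out;                              // no evidence: leave it to the statistical model
        }
        if (result == Utf8Result.LIKELY_UTF8) {
            out.add(StandardCharsets.UTF_8);         // a candidate, not a hard winner
        } else {
            out.remove(StandardCharsets.UTF_8);      // assumption: invalid bytes rule UTF-8 out
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Charset> pool = new LinkedHashSet<>();
        pool.add(StandardCharsets.ISO_8859_1);
        System.out.println(adjust(pool, Utf8Result.LIKELY_UTF8));
        System.out.println(adjust(pool, Utf8Result.AMBIGUOUS));
    }
}
```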
    • toCharset

      public Charset toCharset()