Package org.apache.tika.ml.chardetect
Enum Class StructuralEncodingRules.Utf8Result
java.lang.Object
java.lang.Enum<StructuralEncodingRules.Utf8Result>
org.apache.tika.ml.chardetect.StructuralEncodingRules.Utf8Result
- All Implemented Interfaces:
Serializable, Comparable<StructuralEncodingRules.Utf8Result>, Constable
- Enclosing class:
- StructuralEncodingRules
public static enum StructuralEncodingRules.Utf8Result
extends Enum<StructuralEncodingRules.Utf8Result>
Outcome of the UTF-8 structural check.
-
Nested Class Summary
Nested classes/interfaces inherited from class java.lang.Enum
Enum.EnumDesc<E extends Enum<E>> -
Enum Constant Summary
Enum Constants
Enum Constant - Description
AMBIGUOUS - Sample is structurally valid but contains no complete multi-byte sequence (pure ASCII, or only a truncated lead at probe-end).
LIKELY_UTF8 - Sample is grammatically valid UTF-8 and contains at least one complete multi-byte sequence.
NOT_UTF8 - Sample contains at least one invalid UTF-8 sequence.
Method Summary
Modifier and Type - Method - Description
boolean - isDecisive() - Returns true when the grammar check produced a directional answer (either LIKELY_UTF8 or NOT_UTF8).
static StructuralEncodingRules.Utf8Result - valueOf(String name) - Returns the enum constant of this class with the specified name.
static StructuralEncodingRules.Utf8Result[] - values() - Returns an array containing the constants of this enum class, in the order they are declared.
-
Enum Constant Details
-
LIKELY_UTF8
Sample is grammatically valid UTF-8 and contains at least one complete multi-byte sequence. This is not a guarantee: short CJK probes occasionally pass the UTF-8 grammar by coincidence (on our training corpus, the false-positive rate is ≤ 0.77% at 16 bytes, ≤ 0.05% at 256 bytes, and ≤ 0.01% at 4 KB). Callers should add UTF-8 to the candidate pool rather than treat it as a hard winner, and let downstream language-signal arbitration resolve genuine false positives.
NOT_UTF8
Sample contains at least one invalid UTF-8 sequence.
AMBIGUOUS
Sample is structurally valid but contains no complete multi-byte sequence (pure ASCII, or only a truncated lead at probe-end). Cannot confirm or deny UTF-8; pass to the statistical model.
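The three outcomes above can be illustrated with a small structural check. This is a minimal sketch under stated assumptions, not Tika's implementation: the class `Utf8StructuralSketch`, its nested enum, and the `classify` method are hypothetical names, and the byte ranges follow the standard UTF-8 well-formedness table. It also assumes that a truncated sequence at probe-end is ignored rather than counted as invalid.

```java
import java.nio.charset.StandardCharsets;

final class Utf8StructuralSketch {

    // Hypothetical stand-in that mirrors the three documented outcomes.
    enum Utf8Result { LIKELY_UTF8, NOT_UTF8, AMBIGUOUS }

    // Classifies a probe by UTF-8 grammar alone (assumed logic, for illustration).
    static Utf8Result classify(byte[] probe) {
        boolean sawComplete = false; // at least one complete multi-byte sequence seen
        int i = 0;
        while (i < probe.length) {
            int lead = probe[i] & 0xFF;
            if (lead < 0x80) { // ASCII carries no UTF-8-specific evidence
                i++;
                continue;
            }
            int trailing;
            if (lead >= 0xC2 && lead <= 0xDF) {
                trailing = 1;
            } else if (lead >= 0xE0 && lead <= 0xEF) {
                trailing = 2;
            } else if (lead >= 0xF0 && lead <= 0xF4) {
                trailing = 3;
            } else {
                // Stray continuation byte, overlong lead (C0, C1), or F5..FF.
                return Utf8Result.NOT_UTF8;
            }
            for (int j = 1; j <= trailing; j++) {
                if (i + j >= probe.length) {
                    // Sequence is cut off by the end of the probe: not an error.
                    return sawComplete ? Utf8Result.LIKELY_UTF8 : Utf8Result.AMBIGUOUS;
                }
                int c = probe[i + j] & 0xFF;
                int lo = 0x80;
                int hi = 0xBF;
                if (j == 1) { // the first continuation byte has tighter bounds
                    if (lead == 0xE0) lo = 0xA0;      // rejects overlong 3-byte forms
                    else if (lead == 0xED) hi = 0x9F; // rejects surrogates
                    else if (lead == 0xF0) lo = 0x90; // rejects overlong 4-byte forms
                    else if (lead == 0xF4) hi = 0x8F; // rejects values above U+10FFFF
                }
                if (c < lo || c > hi) {
                    return Utf8Result.NOT_UTF8;
                }
            }
            sawComplete = true;
            i += trailing + 1;
        }
        return sawComplete ? Utf8Result.LIKELY_UTF8 : Utf8Result.AMBIGUOUS;
    }

    public static void main(String[] args) {
        System.out.println(classify("hello".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(classify("h\u00e9llo".getBytes(StandardCharsets.UTF_8)));
        System.out.println(classify(new byte[] {(byte) 0xC3, 0x28}));
    }
}
```

In this sketch a pure-ASCII probe and a probe that ends on a lone lead byte both yield AMBIGUOUS, which matches the description of that constant.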
-
-
Method Details
-
values
Returns an array containing the constants of this enum class, in the order they are declared.
- Returns:
- an array containing the constants of this enum class, in the order they are declared
-
valueOf
Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
- Parameters:
name - the name of the enum constant to be returned.
- Returns:
- the enum constant with the specified name
- Throws:
IllegalArgumentException - if this enum class has no constant with the specified name
NullPointerException - if the argument is null
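Because valueOf rejects unknown names and surrounding whitespace, code that reads a constant name from configuration or logs may want a tolerant wrapper. The following is a sketch only: `ValueOfSketch`, the local enum, and `parseOrAmbiguous` are hypothetical, and falling back to AMBIGUOUS is an illustrative choice, not documented Tika behavior.

```java
final class ValueOfSketch {

    // Hypothetical stand-in with the same constant names as the documented enum.
    enum Utf8Result { LIKELY_UTF8, NOT_UTF8, AMBIGUOUS }

    // Trims the input and falls back to AMBIGUOUS instead of throwing.
    static Utf8Result parseOrAmbiguous(String name) {
        if (name == null) {
            return Utf8Result.AMBIGUOUS; // valueOf(null) would throw NullPointerException
        }
        try {
            return Utf8Result.valueOf(name.trim()); // exact, case-sensitive match required
        } catch (IllegalArgumentException e) {
            return Utf8Result.AMBIGUOUS; // no constant with that name
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrAmbiguous(" NOT_UTF8 "));
        System.out.println(parseOrAmbiguous("not_utf8"));
    }
}
```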
-
isDecisive
public boolean isDecisive()
Returns true when the grammar check produced a directional answer (either LIKELY_UTF8 or NOT_UTF8). Returns false for AMBIGUOUS, which means the probe carries no UTF-8-specific evidence.
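A caller might use isDecisive() to decide whether the structural result should adjust the candidate pool at all. The sketch below is hypothetical: `Utf8RoutingSketch`, its local enum, and `adjustPool` are illustrative names, charsets are represented as plain strings, and removing UTF-8 from the pool on NOT_UTF8 is an assumption rather than documented behavior.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class Utf8RoutingSketch {

    // Hypothetical stand-in mirroring the documented constants and isDecisive().
    enum Utf8Result {
        LIKELY_UTF8, NOT_UTF8, AMBIGUOUS;

        boolean isDecisive() {
            return this != AMBIGUOUS;
        }
    }

    // Returns a copy of the statistical model's candidates, adjusted by the structural result.
    static Set<String> adjustPool(Utf8Result result, Set<String> modelCandidates) {
        Set<String> pool = new LinkedHashSet<>(modelCandidates);
        if (!result.isDecisive()) {
            return pool; // AMBIGUOUS: leave the decision to the statistical model
        }
        if (result == Utf8Result.LIKELY_UTF8) {
            pool.add("UTF-8"); // a candidate, not a hard winner
        } else {
            pool.remove("UTF-8"); // assumed handling of NOT_UTF8
        }
        return pool;
    }

    public static void main(String[] args) {
        System.out.println(adjustPool(Utf8Result.LIKELY_UTF8, Set.of("Shift_JIS")));
    }
}
```

Keeping UTF-8 as one candidate among several leaves room for downstream arbitration to overrule a coincidental grammar match on a short probe.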
toCharset
-