Class DecodeEquivalence
For single-byte codepages, the mapping from byte value (0x00..0xFF) to
Unicode codepoint is a fixed table. Two charsets decode a probe
byte-for-byte identically iff their byte-to-char tables agree on every
byte value that appears in the probe. ASCII bytes (below 0x80)
map identically in every Latin-family codepage and are skipped; the check
reduces to "do these charsets agree on every high byte present in this
probe?"
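A minimal Java sketch of the table-based check described above follows. It is not the actual DecodeEquivalence source. The class name `ByteTableSketch`, the single-byte test (the charset's encoder reports `maxBytesPerChar() == 1.0f`), and the choice to let unmappable bytes decode to U+FFFD are illustrative assumptions. Caching is omitted here, so both tables are rebuilt on every call.

```java
import java.nio.charset.Charset;

/** Hypothetical sketch of the table-based comparison; not the real class. */
final class ByteTableSketch {

    /**
     * Builds the 256-entry byte-to-char table for cs, or returns null when cs
     * does not look single-byte. The single-byte test below is an assumption.
     */
    static char[] byteTable(Charset cs) {
        if (!cs.canEncode() || cs.newEncoder().maxBytesPerChar() != 1.0f) {
            return null; // multi-byte, stateful, or decode-only: not resolvable here
        }
        char[] table = new char[256];
        for (int b = 0; b < 256; b++) {
            // String's constructor replaces unmappable bytes with U+FFFD, so
            // two charsets that both leave a byte unmapped compare as equal.
            String s = new String(new byte[] {(byte) b}, cs);
            if (s.length() != 1) {
                return null; // one byte did not decode to exactly one char
            }
            table[b] = s.charAt(0);
        }
        return table;
    }

    /** True if a and b map every high byte present in probe to the same char. */
    static boolean byteIdenticalOnProbe(byte[] probe, Charset a, Charset b) {
        char[] ta = byteTable(a);
        char[] tb = byteTable(b);
        if (ta == null || tb == null) {
            return false; // either table unresolvable
        }
        for (byte raw : probe) {
            int v = raw & 0xFF; // Java bytes are signed; widen to 0..255
            if (v < 0x80) {
                continue; // ASCII: assumed identical, skipped
            }
            if (ta[v] != tb[v]) {
                return false; // short-circuit on the first disagreement
            }
        }
        return true;
    }
}
```

With ISO-8859-1 and windows-1252, a probe whose only high byte is 0xE9 compares equal (both map it to é), while a probe containing 0x80 does not (U+0080 versus €).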
Cost: O(probe.length) per call in the worst case; in practice the check
typically short-circuits on the first disagreement. Byte-to-char tables are
computed lazily on first use and cached for the lifetime of the process.
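Lazy, process-lifetime caching could look like the following sketch, built on `ConcurrentHashMap.computeIfAbsent`. The class `TableCacheSketch`, its `builds` counter, and the pluggable builder are hypothetical; a real implementation would more likely hold the map in a static field. One property worth noting: `computeIfAbsent` records no mapping when the builder returns null, which fits the documented rule that nothing is cached for a charset whose table cannot be resolved.

```java
import java.nio.charset.Charset;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of a lazily populated byte-table cache. */
final class TableCacheSketch {
    private final ConcurrentHashMap<Charset, char[]> tables = new ConcurrentHashMap<>();
    private final Function<Charset, char[]> builder; // returns null if unresolvable
    final AtomicInteger builds = new AtomicInteger(); // counts builder invocations

    TableCacheSketch(Function<Charset, char[]> builder) {
        this.builder = builder;
    }

    /** Returns the cached table, building it on first use. */
    char[] tableFor(Charset cs) {
        // The builder runs at most once per key while a value is present.
        // A null result stores no mapping, so an unresolvable charset is
        // simply retried (and again not cached) on the next call.
        return tables.computeIfAbsent(cs, c -> {
            builds.incrementAndGet();
            return builder.apply(c);
        });
    }

    boolean isCached(Charset cs) {
        return tables.containsKey(cs);
    }
}
```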
This is the inference-time counterpart to the broader
CharsetConfusables#POTENTIAL_DECODE_EQUIV_FAMILIES declaration: the
families enumerate which pairs are potentially byte-identical, and this
class decides whether they are actually byte-identical on a specific
probe.
Method Summary
Modifier and Type | Method | Description
static boolean | byteIdenticalOnProbe(byte[] probe, Charset a, Charset b) | Returns true if decoding probe under charsets a and b produces bit-identical character sequences.
Method Details
byteIdenticalOnProbe
Returns true if decoding probe under charsets a and b produces bit-identical character sequences. Only the high-byte positions (bytes >= 0x80) are compared; all Latin-family charsets agree on ASCII.
Returns false (and caches nothing) if either charset's byte table cannot be resolved (e.g. stateful, multi-byte, or JVM-unsupported). Callers should restrict invocation to single-byte charsets, typically via CharsetConfusables#potentialDecodeEquivPeersOf(String).