Class ConfusableGroups
java.lang.Object
org.apache.tika.langdetect.charsoup.ConfusableGroups
Loads the shared confusable language groups from
confusables.txt on the classpath. This is the single source
of truth used by CharSoupLanguageDetector (production inference),
CompareDetectors (evaluation), and TrainLanguageModel
(filterPool). The Python contamination filter reads the same file directly.
Method Details
load
Load and return the confusable groups. Each entry is an array of ISO 639-3 codes that are considered mutually confusable.
Throws:
RuntimeException - if the resource cannot be read
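The shape of the returned data can be illustrated with a small, self-contained sketch. This is not the Tika implementation: the class name ConfusableGroupsSketch, the methods parse and areConfusable, the List<String[]> return type, and the file layout (one group per line, comma-separated codes, '#' for comments) are all assumptions made for illustration. The actual format of confusables.txt and the exact signature of load are not given here. The sample codes are illustrative and are not taken from the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Tika class.
final class ConfusableGroupsSketch {

    // ASSUMED format: one group per line, comma-separated ISO 639-3 codes,
    // '#' starts a comment. The real confusables.txt layout may differ.
    static List<String[]> parse(Reader source) {
        List<String[]> groups = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash); // drop comment text
                }
                String[] codes = Arrays.stream(line.split(","))
                        .map(String::trim)
                        .filter(code -> !code.isEmpty())
                        .toArray(String[]::new);
                // A group needs at least two codes to express confusability.
                if (codes.length > 1) {
                    groups.add(codes);
                }
            }
        } catch (IOException e) {
            // Mirrors the documented behaviour: failure surfaces as a RuntimeException.
            throw new UncheckedIOException("Could not read confusable groups", e);
        }
        return groups;
    }

    // True if both codes appear together in at least one group.
    static boolean areConfusable(List<String[]> groups, String a, String b) {
        for (String[] group : groups) {
            List<String> codes = Arrays.asList(group);
            if (codes.contains(a) && codes.contains(b)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative data, not the contents of the real file.
        String sample = "# Scandinavian\nnob, nno, dan\nhrv, srp, bos\n";
        List<String[]> groups = parse(new StringReader(sample));
        for (String[] group : groups) {
            System.out.println(String.join(" ", group));
        }
        System.out.println(areConfusable(groups, "nob", "dan"));
    }
}
```

In this sketch a caller would typically load the groups once and reuse the list, since every consumer is meant to see the same grouping.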