Class ConfusableGroups

java.lang.Object
org.apache.tika.langdetect.charsoup.ConfusableGroups

public final class ConfusableGroups extends Object
Loads the shared confusable language groups from confusables.txt on the classpath. This is the single source of truth used by CharSoupLanguageDetector (production inference), CompareDetectors (evaluation), and TrainLanguageModel (filterPool). The Python contamination filter reads the same file directly.
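The classpath-loading pattern described above can be sketched as follows. This is not the Tika implementation: the class name GroupFileSketch, both method names, and above all the file format (one group per line, comma-separated codes, '#' starting a comment) are assumptions made for illustration, since the layout of confusables.txt is not documented on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; not the Tika class. */
final class GroupFileSketch {

    /**
     * Parses groups from a reader.
     * ASSUMED format: one group per line, codes separated by commas,
     * '#' starts a comment, blank lines are ignored.
     */
    static String[][] parse(Reader source) {
        List<String[]> groups = new ArrayList<>();
        try {
            BufferedReader reader = new BufferedReader(source);
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash); // drop the comment part
                }
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                groups.add(line.split("\\s*,\\s*"));
            }
        } catch (IOException e) {
            // Surface read failures as an unchecked exception.
            throw new UncheckedIOException("Failed to read groups", e);
        }
        return groups.toArray(new String[0][]);
    }

    /**
     * Loads groups from a classpath resource. A leading '/' makes the
     * name absolute; otherwise it is resolved against this class's package.
     */
    static String[][] loadFromClasspath(String resourceName) {
        try (InputStream in = GroupFileSketch.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new RuntimeException("Resource not found on classpath: " + resourceName);
            }
            return parse(new InputStreamReader(in, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read " + resourceName, e);
        }
    }
}
```

Keeping the parsing step separate from the classpath lookup lets the parser be exercised with an in-memory reader, without a resource file on the classpath.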
  • Method Details

    • load

      public static String[][] load()
      Loads and returns the confusable groups. Each entry is an array of ISO 639-3 codes that are considered mutually confusable.
      Throws:
      RuntimeException - if the resource cannot be read
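
      A caller that needs to ask whether two language codes are mutually confusable may find it convenient to index the returned String[][] by code. The following is a minimal sketch under stated assumptions: ConfusableLookup, buildIndex, and areConfusable are hypothetical names that are not part of Tika, and the sketch relies only on the documented return shape (one array of ISO 639-3 codes per group).

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Tika. */
final class ConfusableLookup {

    /**
     * Maps each code to the set of other codes that share a group with it.
     * A code that appears in several groups accumulates peers from all of them.
     */
    static Map<String, Set<String>> buildIndex(String[][] groups) {
        Map<String, Set<String>> byCode = new HashMap<>();
        for (String[] group : groups) {
            for (String code : group) {
                Set<String> peers = byCode.computeIfAbsent(code, k -> new LinkedHashSet<>());
                for (String other : group) {
                    if (!other.equals(code)) {
                        peers.add(other);
                    }
                }
            }
        }
        return byCode;
    }

    /** Returns true if the two codes share at least one group. */
    static boolean areConfusable(Map<String, Set<String>> index, String a, String b) {
        Set<String> peers = index.get(a);
        return peers != null && peers.contains(b);
    }
}
```

      With Tika on the classpath, the index would be built once with ConfusableLookup.buildIndex(ConfusableGroups.load()) and reused. Because load() throws RuntimeException when the resource cannot be read, a caller that wants a fallback has to catch it explicitly. As an illustrative input (not necessarily the contents of confusables.txt), the groups {"hrv", "srp", "bos"} and {"ind", "zsm"} would make areConfusable(index, "hrv", "srp") true and areConfusable(index, "hrv", "ind") false.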