Class CommonTokenCountManager

java.lang.Object
org.apache.tika.eval.core.tokens.CommonTokenCountManager

public class CommonTokenCountManager extends Object
  • Constructor Details

    • CommonTokenCountManager

      public CommonTokenCountManager()
    • CommonTokenCountManager

      public CommonTokenCountManager(Path commonTokensDir, String defaultLangCode)
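The two-argument constructor takes a directory of common-token lists (`commonTokensDir`) and a fallback language (`defaultLangCode`). The following is a minimal, dependency-free sketch of how such a directory might be read. It is not Tika's implementation. The directory layout it assumes (one UTF-8 file per language, named by its language code, one token per line) and the class name `CommonTokenDirLoader` are hypothetical, chosen for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Tika's implementation. ASSUMES one UTF-8 file per
// language in commonTokensDir, named by its language code, one token per line.
public class CommonTokenDirLoader {

    // Trims each line and keeps the non-blank ones as tokens.
    static Set<String> parseTokens(List<String> lines) {
        Set<String> tokens = new HashSet<>();
        for (String line : lines) {
            String token = line.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Maps each regular file's name (treated as a language code) to its token set.
    static Map<String, Set<String>> load(Path commonTokensDir) throws IOException {
        Map<String, Set<String>> tokensByLang = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(commonTokensDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
                    tokensByLang.put(file.getFileName().toString(), parseTokens(lines));
                }
            }
        }
        return tokensByLang;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory holding two tiny token lists.
        Path dir = Files.createTempDirectory("common_tokens");
        Files.write(dir.resolve("en"), List.of("the", "and", ""), StandardCharsets.UTF_8);
        Files.write(dir.resolve("de"), List.of("der", "und"), StandardCharsets.UTF_8);

        Map<String, Set<String>> tokensByLang = load(dir);
        System.out.println(tokensByLang.size()); // prints 2
        System.out.println(tokensByLang.get("en").contains("the")); // prints true
    }
}
```

Blank lines are skipped, so a trailing newline in a token file does not add an empty token.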
  • Method Details

    • getTokens

      public Set<String> getTokens(String lang)
    • getLangs

      public Set<String> getLangs()
    • getLangTokens

      public org.apache.commons.lang3.tuple.Pair<String,LangModel> getLangTokens(String lang)
      Parameters:
      lang - the requested language code
      Returns:
      pair of the language code actually used and the LangModel of common tokens for that language
    • close

      public void close() throws IOException
      Throws:
      IOException
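The Returns note on getLangTokens(String) says the pair carries the language code actually used. This implies the requested code may be replaced, presumably by the `defaultLangCode` given to the two-argument constructor. The following is a minimal sketch of that lookup under stated assumptions. It is not Tika's code. `java.util.Map.Entry` stands in for commons-lang3 `Pair`, and a plain `Set<String>` stands in for `LangModel`, so the example needs no dependencies. The class `LangFallbackSketch` is hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch, not Tika's implementation. ASSUMES getLangTokens falls
// back to the default language code when the requested code has no token list.
// Map.Entry stands in for commons-lang3 Pair; Set<String> stands in for LangModel.
public class LangFallbackSketch {

    private final Map<String, Set<String>> tokensByLang;
    private final String defaultLangCode;

    public LangFallbackSketch(Map<String, Set<String>> tokensByLang, String defaultLangCode) {
        this.tokensByLang = Objects.requireNonNull(tokensByLang);
        this.defaultLangCode = Objects.requireNonNull(defaultLangCode);
    }

    // Returns (language code actually used, common tokens for that code).
    public Map.Entry<String, Set<String>> getLangTokens(String lang) {
        // Null is checked first because Map.of(...) maps throw NPE on containsKey(null).
        String actual = (lang != null && tokensByLang.containsKey(lang)) ? lang : defaultLangCode;
        Set<String> tokens = tokensByLang.getOrDefault(actual, Set.of());
        return new SimpleImmutableEntry<>(actual, tokens);
    }

    public static void main(String[] args) {
        LangFallbackSketch manager = new LangFallbackSketch(
                Map.of("en", Set.of("the", "and"), "de", Set.of("der", "und")), "en");
        System.out.println(manager.getLangTokens("de").getKey()); // prints de
        System.out.println(manager.getLangTokens("xx").getKey()); // prints en
        System.out.println(manager.getLangTokens(null).getKey()); // prints en
    }
}
```

Returning the code actually used, rather than only the tokens, lets a caller detect that a fallback happened. For example, the caller could record that "xx" was scored against the English list.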