Class TextStatsFromTikaEval

  • public class TextStatsFromTikaEval
    extends Object
    These examples create a new CompositeTextStatsCalculator for each call. This is extremely inefficient because the language ID model and the common words lists have to be loaded again on every call.
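The cost described above is usually avoided by building the expensive object once and sharing it across calls. The following is a minimal sketch of that pattern only: `CostlyCalculator` and `ReusingStats` are hypothetical stand-ins, not tika-eval classes, and the sketch assumes the real calculator can safely be reused between calls.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a calculator that is costly to construct.
// It is NOT part of tika-eval; it only counts how often it is built.
final class CostlyCalculator {
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    CostlyCalculator() {
        // Simulates one-time setup such as loading models and word lists.
        CONSTRUCTIONS.incrementAndGet();
    }

    // Counts whitespace-separated tokens in the given text.
    int tokenCount(String txt) {
        String trimmed = txt.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

// Hypothetical wrapper that builds the calculator once and reuses it.
final class ReusingStats {
    // Created once when the class is initialized, then shared by every call.
    private static final CostlyCalculator SHARED = new CostlyCalculator();

    static int tokenCount(String txt) {
        return SHARED.tokenCount(txt);
    }
}
```

However many times `ReusingStats.tokenCount` is called, the stand-in calculator is constructed only once, which is the behavior to aim for when the setup involves loading models.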
    • Constructor Detail

      • TextStatsFromTikaEval

        public TextStatsFromTikaEval()
    • Method Detail

      • getOOV

        public double getOOV(String txt)
        Uses the default language ID models and the default common tokens lists in tika-eval to calculate the out-of-vocabulary (OOV) percentage for a given string.
        Parameters:
        txt - the string to evaluate
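To make the idea of an out-of-vocabulary measure concrete, here is a simplified sketch that does not use tika-eval at all. The class `OovSketch`, its tiny `COMMON` word set, the tokenization rule, and the choice to return a fraction between 0.0 and 1.0 are all assumptions made for illustration; tika-eval relies on its own language detection and per-language common tokens lists.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of an out-of-vocabulary (OOV) measure.
final class OovSketch {
    // Assumed, tiny common-words list used only for this example.
    static final Set<String> COMMON = Set.of("the", "cat", "sat", "on", "mat");

    // Returns the fraction (0.0 to 1.0) of alphabetic tokens not in COMMON.
    static double oov(String txt) {
        int total = 0;
        int common = 0;
        // Lowercase, then split on any run of non-letter characters.
        for (String token : txt.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator yields one empty token
            }
            total++;
            if (COMMON.contains(token)) {
                common++;
            }
        }
        if (total == 0) {
            return 0.0; // assumption: text with no alphabetic tokens scores 0.0
        }
        return 1.0 - (double) common / total;
    }
}
```

With this word list, text made up entirely of listed words scores 0.0 and text with no listed words scores 1.0. A high value suggests that the text is garbled or in a language other than the one the word list covers.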