public class TextStatsFromTikaEval extends Object
Wraps a tika-eval CompositeTextStatsCalculator that is built anew for each call. This is extremely inefficient because the language ID model and the common-words lists have to be loaded again on every call.
Constructor and Description |
---|
TextStatsFromTikaEval() |
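The class description notes that reloading the language ID model and common-words lists on every call is expensive. A common way around that cost is to load such resources once and reuse them. The sketch below contrasts the two patterns under stated assumptions: `ExpensiveResources`, `PerCallScorer`, and `LoadOnceScorer` are hypothetical names, the "expensive" load is simulated by a tiny hard-coded word set plus a load counter, and none of this is tika-eval API. It needs Java 9+ for `Set.of`.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for costly resources (e.g. a language ID model and
// common-words lists). NOT a tika-eval class; the "load" is simulated.
final class ExpensiveResources {
    // Counts how many times load() has run, to make the cost visible.
    static final AtomicInteger LOADS = new AtomicInteger();

    final Set<String> commonWords;

    private ExpensiveResources(Set<String> commonWords) {
        this.commonWords = commonWords;
    }

    static ExpensiveResources load() {
        LOADS.incrementAndGet();
        return new ExpensiveResources(Set.of("the", "of", "and", "a", "to"));
    }
}

// Pattern criticized in the class description: rebuild everything per call.
final class PerCallScorer {
    int countCommon(String txt) {
        ExpensiveResources r = ExpensiveResources.load(); // reloaded every call
        return count(r, txt);
    }

    // Counts tokens (split on non-word characters) found in the common-words set.
    static int count(ExpensiveResources r, String txt) {
        int n = 0;
        for (String tok : txt.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (r.commonWords.contains(tok)) {
                n++;
            }
        }
        return n;
    }
}

// Alternative: load once when the scorer is created, then reuse.
final class LoadOnceScorer {
    private final ExpensiveResources resources = ExpensiveResources.load();

    int countCommon(String txt) {
        return PerCallScorer.count(resources, txt);
    }
}
```

Calling `PerCallScorer.countCommon` three times triggers three loads, while a single `LoadOnceScorer` triggers one load no matter how many calls follow. Because the cached set is immutable, one `LoadOnceScorer` can be shared across threads for read-only scoring.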
Modifier and Type | Method and Description |
---|---|
double | getOOV(String txt) Use the default language ID models and the default common-tokens lists in tika-eval to calculate the out-of-vocabulary percentage for a given string. |
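The out-of-vocabulary (OOV) idea behind `getOOV` can be illustrated with a simplified sketch. Assumptions: `SimpleOov` and its `oov` method are hypothetical; OOV is taken here as the fraction of letter-only tokens missing from a common-words list; and the language identification step that tika-eval uses to pick which common-tokens list applies is omitted, so the caller passes the list directly. This is not tika-eval's implementation and may differ from it in tokenization and in how the ratio is defined or scaled.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified OOV calculation (not tika-eval's code).
final class SimpleOov {
    private SimpleOov() {}

    /**
     * Returns the fraction (0.0 to 1.0) of letter-only tokens in txt that are
     * not in commonWords. Returns 0.0 when txt has no letter tokens (an
     * arbitrary choice made for this sketch).
     */
    static double oov(String txt, Set<String> commonWords) {
        int tokens = 0;
        int common = 0;
        // Split on runs of non-letter characters; lowercase for matching.
        for (String token : txt.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            tokens++;
            if (commonWords.contains(token)) {
                common++;
            }
        }
        return tokens == 0 ? 0.0 : 1.0 - (double) common / tokens;
    }
}
```

For example, with the common words `the` and `on`, the text "The cat sat on the mat" has six tokens, three of them common, so this sketch reports an OOV of 0.5. Multiplying by 100 would give a percentage.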
public double getOOV(String txt)
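Parameters: txt - the string for which to calculate the out-of-vocabulary percentage
Returns: the out-of-vocabulary percentage for txt
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.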