public class TextStatsFromTikaEval extends Object
Creates a new CompositeTextStatsCalculator
for each call. This is extremely inefficient because the language id
model and the common words lists have to be loaded on every call.

| Constructor and Description |
|---|
| TextStatsFromTikaEval() |
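
The cost described above can be illustrated with a minimal, self-contained sketch. It does not use tika-eval: `ReuseSketch`, `CostlyCalculator`, `perCall`, and `shared` are hypothetical names, and the constructor only simulates an expensive model and word-list load by incrementing a counter. It contrasts building a calculator on every call with building one shared instance once.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; none of these types exist in tika-eval. */
final class ReuseSketch {

    /** Counts how often the simulated load step has run. */
    static final AtomicInteger LOADS = new AtomicInteger();

    /** Stand-in for a calculator whose construction is expensive. */
    static final class CostlyCalculator {
        private final Set<String> commonWords;

        CostlyCalculator() {
            LOADS.incrementAndGet();  // stands in for loading models and word lists
            commonWords = Set.of("the", "cat", "sat");
        }

        boolean isCommon(String token) {
            return commonWords.contains(token);
        }
    }

    /** Per-call construction: pays the setup cost on every invocation. */
    static boolean perCall(String token) {
        return new CostlyCalculator().isCommon(token);
    }

    /** Lazy holder: the shared instance is built once, on first use. */
    private static final class Holder {
        static final CostlyCalculator INSTANCE = new CostlyCalculator();
    }

    /** Shared instance: later calls reuse the already-loaded state. */
    static boolean shared(String token) {
        return Holder.INSTANCE.isCommon(token);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            perCall("cat");
        }
        System.out.println("loads after 3 per-call lookups: " + LOADS.get());
        for (int i = 0; i < 3; i++) {
            shared("cat");
        }
        System.out.println("loads after 3 shared lookups: " + LOADS.get());
    }
}
```

Whether a single CompositeTextStatsCalculator can safely be shared across threads is not stated on this page, so the shared-instance pattern here is an assumption to verify before applying it to the real class.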
| Modifier and Type | Method and Description |
|---|---|
| double | getOOV(String txt) Use the default language id models and the default common tokens lists in tika-eval to calculate the out-of-vocabulary percentage for a given string. |
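
To make the idea of an out-of-vocabulary measure concrete, here is a rough sketch that does not call tika-eval and is not its implementation. The class `OovSketch`, the tiny `COMMON` word list, the letter-only tokenization, and the choice to return a fraction between 0 and 1 are all assumptions made for illustration; the real method also runs language identification to pick a common tokens list, which is omitted here.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration only; not the tika-eval algorithm. */
final class OovSketch {

    /** Assumed common-words list; real lists are per language and far larger. */
    static final Set<String> COMMON = Set.of("the", "quick", "brown", "fox");

    /**
     * Fraction of tokens not found in COMMON, in the range [0, 1].
     * Returns 0 for text with no tokens (an arbitrary choice for this sketch).
     */
    static double oov(String txt) {
        int total = 0;
        int unknown = 0;
        // Split on runs of non-letter characters; a simplification of real tokenization.
        for (String token : txt.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;  // a leading separator yields one empty token
            }
            total++;
            if (!COMMON.contains(token)) {
                unknown++;
            }
        }
        return total == 0 ? 0.0 : (double) unknown / total;
    }

    public static void main(String[] args) {
        System.out.println(oov("The quick brown fox"));  // every token is in COMMON
        System.out.println(oov("The qwxz fox zzkj"));    // two of four tokens are unknown
    }
}
```

A high value suggests text that shares few words with the expected vocabulary, which is why a measure of this kind is useful for spotting garbled extraction output.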
public double getOOV(String txt)
txt - the string for which to calculate the out-of-vocabulary percentage

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.