Package org.apache.tika.eval.core.tokens
Class AnalyzerManager
java.lang.Object
org.apache.tika.eval.core.tokens.AnalyzerManager
Manages tokenization for tika-eval. Uses TikaEvalTokenizer in STANDARD mode, which emits alphabetic, ideographic, and numeric tokens with NFKD normalization, case folding, and CJK bigrams. No minimum length filter or skip list is applied; those are used only in COMMON_TOKENS mode.
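The processing steps named above (NFKD normalization, case folding, CJK bigrams) can be illustrated with a minimal JDK-only sketch. This is not TikaEvalTokenizer's implementation: the class TokenizerSketch and its methods are hypothetical, lowercasing with Locale.ROOT only approximates Unicode case folding, and only characters the JDK reports as ideographic are bigrammed.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of tika-eval.
final class TokenizerSketch {

    static List<String> tokenize(String text) {
        // NFKD replaces compatibility characters, e.g. the "ﬁ" ligature becomes "fi".
        // Lowercasing with Locale.ROOT is an approximation of case folding.
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFKD)
                .toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();   // current alphabetic/numeric run
        List<Integer> han = new ArrayList<>();      // current run of ideographs
        for (int i = 0; i < normalized.length(); ) {
            int cp = normalized.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.getType(cp) == Character.NON_SPACING_MARK && word.length() > 0) {
                word.appendCodePoint(cp);           // keep decomposed accents with their word
            } else if (Character.isIdeographic(cp)) {
                flushWord(word, tokens);
                han.add(cp);
            } else if (Character.isLetterOrDigit(cp)) {
                flushHan(han, tokens);
                word.appendCodePoint(cp);
            } else {                                // whitespace, punctuation, symbols
                flushWord(word, tokens);
                flushHan(han, tokens);
            }
        }
        flushWord(word, tokens);
        flushHan(han, tokens);
        return tokens;
    }

    // Emit the pending word, if any, as a single token.
    private static void flushWord(StringBuilder word, List<String> out) {
        if (word.length() > 0) {
            out.add(word.toString());
            word.setLength(0);
        }
    }

    // Emit overlapping bigrams for a run of ideographs; a lone ideograph is emitted as-is.
    private static void flushHan(List<Integer> han, List<String> out) {
        if (han.size() == 1) {
            out.add(new String(Character.toChars(han.get(0))));
        }
        for (int k = 0; k + 1 < han.size(); k++) {
            out.add(new StringBuilder()
                    .appendCodePoint(han.get(k))
                    .appendCodePoint(han.get(k + 1))
                    .toString());
        }
        han.clear();
    }
}
```

Under these assumptions, a three-ideograph run such as 東京都 yields the two overlapping bigrams 東京 and 京都, while full-width Latin letters and ligatures are mapped to their plain lowercase forms before tokens are emitted.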
Method Summary
Modifier and Type        Method                      Description
int                      getMaxTokens()              Get the max token limit.
static AnalyzerManager   newInstance(int maxTokens)
TokenCounts              tokenize(text)              Tokenize the given text and return a TokenCounts object.
void                     tokenize(text, consumer)    Tokenize and stream tokens to a consumer, respecting maxTokens limit.
Method Details
newInstance
tokenize
Tokenize the given text and return a TokenCounts object.
Parameters:
text - input text
Returns:
token counts
tokenize
Tokenize and stream tokens to a consumer, respecting maxTokens limit.
Parameters:
text - input text
consumer - receives each token string
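The idea of streaming tokens to a consumer under a token cap can be sketched without the library. Everything below is an assumption made for illustration: the class CappedStreamSketch and the method streamTokens are hypothetical, the consumer is assumed to be a java.util.function.Consumer<String>, tokens are split on whitespace only, and the sketch simply stops delivering tokens once the cap is reached. The page above does not state how AnalyzerManager itself behaves at the limit.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for illustration; not tika-eval's implementation.
final class CappedStreamSketch {

    // Passes at most maxTokens whitespace-separated tokens to the consumer
    // and returns how many were delivered.
    static int streamTokens(String text, int maxTokens, Consumer<String> consumer) {
        int emitted = 0;
        for (String token : text.trim().split("\\s+")) {
            if (emitted >= maxTokens) {
                break;                  // cap reached: stop without reading further tokens
            }
            if (token.isEmpty()) {
                continue;               // split() yields one empty string for blank input
            }
            consumer.accept(token);
            emitted++;
        }
        return emitted;
    }
}
```

A caller could pass a method reference such as list::add to collect tokens, or a lambda that updates a counter; streaming this way avoids building a full token list when only a bounded prefix of the text is needed.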
getMaxTokens
public int getMaxTokens()
Get the max token limit.