Package org.apache.tika.eval.core.tokens
Class descriptions:
Factory for a filter that lets through only tokens whose characters are "isAlphabetic" or "isIdeographic".
Creates a very narrowly focused TokenFilter that limits tokens based on length _unless_ the CJKBigramFilter has identified them as <DOUBLE> or <SINGLE>.
Computes some corpus contrast statistics.
Deprecated. Use CompositeTextStatsCalculator with TokenEntropy, TokenLengths and TopNTokens.
Factory for a filter that normalizes URLs and emails to __url__ and __email__, respectively.
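For the factory whose filter only allows tokens with "isAlphabetic" or "isIdeographic" characters through, the predicate behind such a filter can be sketched in plain Java. This is not Tika's implementation and does not use Lucene's TokenFilter API. The class name AlphaIdeographSketch is hypothetical, and the rule that a token passes when at least one of its code points satisfies Character.isAlphabetic or Character.isIdeographic is an assumption; a stricter variant would require every code point to qualify.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not Tika's filter: assumes a token is kept when at least
// one of its code points is alphabetic or ideographic.
class AlphaIdeographSketch {

    // True if any code point in the token satisfies Character.isAlphabetic
    // or Character.isIdeographic. Empty tokens are rejected.
    static boolean accept(String token) {
        return token.codePoints()
                .anyMatch(cp -> Character.isAlphabetic(cp) || Character.isIdeographic(cp));
    }

    // Keeps the tokens that pass accept(), preserving their order.
    static List<String> filter(List<String> tokens) {
        return tokens.stream()
                .filter(AlphaIdeographSketch::accept)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The fourth token is a two-ideograph CJK word written with Unicode escapes.
        List<String> tokens = List.of("hello", "123", "---", "\u6771\u4eac", "v2");
        // Keeps "hello", the ideograph token and "v2"; drops "123" and "---".
        System.out.println(filter(tokens));
    }
}
```

Digits and punctuation are neither alphabetic nor ideographic, so purely numeric or symbolic tokens are dropped, while mixed tokens such as "v2" survive under the at-least-one assumption.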
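The length-limiting TokenFilter that exempts tokens marked <DOUBLE> or <SINGLE> by the CJKBigramFilter can be illustrated with a small plain-Java sketch. It models a token as a (term, type) pair instead of using Lucene attributes. The class name, the minLength/maxLength parameters, counting length in code points and treating both bounds as inclusive are all assumptions, not Tika's API. The type strings "<DOUBLE>" and "<SINGLE>" match the ones named in the description.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not Tika's factory. Assumptions: length is counted in
// code points, bounds are inclusive, and "<DOUBLE>"/"<SINGLE>" mark CJK
// bigram and unigram tokens emitted by CJKBigramFilter.
class CjkAwareLengthFilterSketch {

    // Minimal stand-in for a token: term text plus token type.
    record Token(String term, String type) {}

    static final String DOUBLE_TYPE = "<DOUBLE>";
    static final String SINGLE_TYPE = "<SINGLE>";

    private final int minLength; // hypothetical parameter
    private final int maxLength; // hypothetical parameter

    CjkAwareLengthFilterSketch(int minLength, int maxLength) {
        if (minLength < 0 || maxLength < minLength) {
            throw new IllegalArgumentException("require 0 <= minLength <= maxLength");
        }
        this.minLength = minLength;
        this.maxLength = maxLength;
    }

    // CJK bigram/unigram tokens bypass the length check; all others must fit the bounds.
    boolean accept(Token t) {
        if (DOUBLE_TYPE.equals(t.type()) || SINGLE_TYPE.equals(t.type())) {
            return true;
        }
        int len = t.term().codePointCount(0, t.term().length());
        return len >= minLength && len <= maxLength;
    }

    List<Token> filter(List<Token> tokens) {
        return tokens.stream().filter(this::accept).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CjkAwareLengthFilterSketch f = new CjkAwareLengthFilterSketch(3, 20);
        List<Token> tokens = List.of(
                new Token("a", "<ALPHANUM>"),            // too short: dropped
                new Token("tokenization", "<ALPHANUM>"), // within bounds: kept
                new Token("\u6771\u4eac", DOUBLE_TYPE),  // short, but a CJK bigram: kept
                new Token("\u4eac", SINGLE_TYPE));       // short, but a CJK unigram: kept
        f.filter(tokens).forEach(t -> System.out.println(t.term() + " " + t.type()));
    }
}
```

The exemption matters because CJK bigrams are always one or two characters long, so an ordinary minimum-length rule tuned for alphabetic languages would discard all of them.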
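The class that "computes some corpus contrast statistics" is not specific about which statistics it produces. Purely as an illustration, two common ways to contrast the token counts of two texts are sketched below: a Dice coefficient over the sets of unique tokens and a count-weighted overlap. Both measures, and the class and method names, are assumptions chosen for the example, not a description of Tika's output.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: which contrast statistics Tika computes is not
// stated here; these two measures are assumptions.
class ContrastSketch {

    // Counts occurrences of each token.
    static Map<String, Integer> count(String... tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Dice coefficient over unique tokens:
    // 2 * |shared types| / (|types in A| + |types in B|). Returns 0.0 if both are empty.
    static double dice(Map<String, Integer> a, Map<String, Integer> b) {
        int denom = a.size() + b.size();
        if (denom == 0) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(a.keySet());
        shared.retainAll(b.keySet());
        return 2.0 * shared.size() / denom;
    }

    // Count-weighted overlap: 2 * sum of min(countA, countB) over shared tokens,
    // divided by the total number of tokens in both inputs.
    static double overlap(Map<String, Integer> a, Map<String, Integer> b) {
        long totalA = a.values().stream().mapToLong(Integer::longValue).sum();
        long totalB = b.values().stream().mapToLong(Integer::longValue).sum();
        if (totalA + totalB == 0) {
            return 0.0;
        }
        long shared = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer countB = b.get(e.getKey());
            if (countB != null) {
                shared += Math.min(e.getValue(), countB);
            }
        }
        return 2.0 * shared / (totalA + totalB);
    }

    public static void main(String[] args) {
        Map<String, Integer> a = count("the", "cat", "sat", "the");
        Map<String, Integer> b = count("the", "dog", "sat");
        System.out.println("dice=" + dice(a, b));       // 2*2/(3+3), about 0.667
        System.out.println("overlap=" + overlap(a, b)); // 2*2/(4+3), about 0.571
    }
}
```

The two measures answer different questions: Dice ignores frequency and asks how much of the vocabulary is shared, while the weighted overlap rewards agreement on how often shared tokens occur.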
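For the factory whose filter normalizes URLs and emails to __url__ and __email__, the replacement step can be sketched as follows. The sketch assumes an upstream tokenizer has already tagged URLs and emails with the token types "<URL>" and "<EMAIL>", as Lucene's UAX29URLEmailTokenizer does; whether Tika's filter keys on token type or on pattern matching is not stated here. The class name and the (term, type) Token record are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not Tika's filter. Assumes URLs and emails arrive
// already tagged with the token types "<URL>" and "<EMAIL>".
class UrlEmailNormalizerSketch {

    // Minimal stand-in for a token: term text plus token type.
    record Token(String term, String type) {}

    static final String URL_TYPE = "<URL>";
    static final String EMAIL_TYPE = "<EMAIL>";

    // Replaces the term of URL and email tokens with a fixed placeholder;
    // all other tokens pass through unchanged.
    static Token normalize(Token t) {
        if (URL_TYPE.equals(t.type())) {
            return new Token("__url__", t.type());
        }
        if (EMAIL_TYPE.equals(t.type())) {
            return new Token("__email__", t.type());
        }
        return t;
    }

    static List<String> normalizedTerms(List<Token> tokens) {
        return tokens.stream()
                .map(UrlEmailNormalizerSketch::normalize)
                .map(Token::term)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("see", "<ALPHANUM>"),
                new Token("https://tika.apache.org", URL_TYPE),
                new Token("or", "<ALPHANUM>"),
                new Token("user@example.com", EMAIL_TYPE));
        System.out.println(normalizedTerms(tokens)); // [see, __url__, or, __email__]
    }
}
```

Collapsing every URL and address into one placeholder presumably keeps these one-off strings from inflating vocabulary counts while still recording that a URL or email occurred.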
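The deprecated entry points to CompositeTextStatsCalculator combined with TokenEntropy, TokenLengths and TopNTokens. The sketch below does not use or reproduce those Tika classes; it computes, in plain Java, the kind of statistic each name suggests. Assumptions: entropy is Shannon entropy in bits (base 2), token length is summarized as a frequency-weighted mean in code points, and top-N ties are broken alphabetically. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical plain-Java sketch of statistics suggested by the names
// TokenEntropy, TokenLengths and TopNTokens; not Tika's implementation.
class TokenStatsSketch {

    // Counts occurrences of each token.
    static Map<String, Integer> count(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Shannon entropy, in bits, of the token frequency distribution.
    static double entropy(Map<String, Integer> counts) {
        long total = counts.values().stream().mapToLong(Integer::longValue).sum();
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts.values()) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Mean token length in code points, weighted by how often each token occurs.
    static double meanLength(Map<String, Integer> counts) {
        long total = 0;
        long codePoints = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String term = e.getKey();
            int c = e.getValue();
            total += c;
            codePoints += (long) term.codePointCount(0, term.length()) * c;
        }
        return total == 0 ? 0.0 : (double) codePoints / total;
    }

    // The n most frequent tokens; ties are broken alphabetically.
    static List<String> topN(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(List.of("the", "cat", "the", "hat"));
        System.out.println("entropy=" + entropy(counts));       // 1.5 bits, up to rounding
        System.out.println("meanLength=" + meanLength(counts)); // 3.0
        System.out.println("top2=" + topN(counts, 2));          // [the, cat]
    }
}
```

Splitting these into separate calculators, as the deprecation note suggests, lets each statistic be added or omitted independently instead of being bundled into one counter.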