Package | Description |
---|---|
org.apache.tika.eval.langid | |
org.apache.tika.eval.textstats | |
Modifier and Type | Class and Description |
---|---|
class | LanguageIDWrapper: The most efficient way to call this in a multithreaded environment is to call LanguageIDWrapper.loadBuiltInModels() before instantiating the |
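The note on LanguageIDWrapper describes a load-once pattern: shared models are loaded a single time, up front, and worker threads then create their own wrappers. The following is a minimal sketch of that pattern only. It does not use the Tika class; `SharedModelWrapper`, its model map, and `runWorkers` are hypothetical stand-ins invented for illustration, and the real `loadBuiltInModels()` may behave differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in, NOT org.apache.tika.eval.langid.LanguageIDWrapper.
public class SharedModelWrapper {
    // Stand-in for expensive models shared by every wrapper instance.
    private static volatile Map<String, String> models;
    // Counts how many times the models were actually loaded.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    // Loads the shared models once; later calls return without reloading.
    public static synchronized void loadBuiltInModels() {
        if (models == null) {
            Map<String, String> loaded = new ConcurrentHashMap<>();
            loaded.put("eng", "english-model");
            loaded.put("fra", "french-model");
            LOAD_COUNT.incrementAndGet();
            models = loaded;
        }
    }

    public SharedModelWrapper() {
        // Fallback if the caller skipped the eager step; every constructor
        // call then passes through the synchronized method above.
        loadBuiltInModels();
    }

    public boolean supports(String lang) {
        return models.containsKey(lang);
    }

    // Loads the models up front, then builds one wrapper per worker thread.
    public static int runWorkers(int n) {
        loadBuiltInModels();
        ExecutorService pool = Executors.newFixedThreadPool(n);
        AtomicInteger ready = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.submit(() -> {
                if (new SharedModelWrapper().supports("eng")) {
                    ready.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ready.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4) + " workers ready, models loaded "
                + LOAD_COUNT.get() + " time(s)");
    }
}
```

In this sketch the models are built exactly once, however many wrappers are created afterwards.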
Modifier and Type | Interface and Description |
---|---|
interface | BytesRefCalculator<T>: Interface for calculators that require a string |
interface | LanguageAwareTokenCountStats<T>: Interface for calculators that require language probabilities and token stats |
interface | StringStatsCalculator<T>: Interface for calculators that require a string |
interface | TokenCountStatsCalculator<T>: Interface for calculators that require token stats |
Modifier and Type | Class and Description |
---|---|
class | BasicTokenCountStatsCalculator |
class | CommonTokens |
class | CommonTokensBhattacharyya |
class | CommonTokensCosine |
class | CommonTokensHellinger |
class | CommonTokensKLDivergence |
class | CommonTokensKLDNormed |
class | ContentLengthCalculator |
class | TextProfileSignature: Copied nearly directly from Apache Nutch (https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/TextProfileSignature.java). See the documentation (https://nutch.apache.org/apidocs/apidocs-2.0/org/apache/nutch/crawl/TextProfileSignature.html). This returns the base32-encoded SHA-256 |
class | TextSha256Signature: Calculates the base32-encoded SHA-256 checksum on the analyzed text |
class | TokenEntropy |
class | TokenLengths |
class | TopNTokens |
class | UnicodeBlockCounter |
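TextProfileSignature and TextSha256Signature both return a base32-encoded SHA-256 value. As a rough illustration of that kind of output, the sketch below hashes a string with the JDK's `MessageDigest` and encodes the digest with a hand-written RFC 4648 base32 encoder. The class name `Base32Sha256Sketch` and its methods are hypothetical. The sketch also makes three assumptions: it hashes the raw UTF-8 bytes rather than analyzed text, it does no profile computation, and it pads the output with `=`. The Tika classes may differ on any of these.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not the Tika implementation.
public class Base32Sha256Sketch {
    // RFC 4648 base32 alphabet.
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    // Encodes bytes as padded RFC 4648 base32.
    static String base32(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0; // holds pending bits; only the low `bits` bits matter
        int bits = 0;   // number of pending bits not yet emitted
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                sb.append(ALPHABET[(buffer >> (bits - 5)) & 0x1F]);
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Left-align the remaining bits in a final 5-bit group.
            sb.append(ALPHABET[(buffer << (5 - bits)) & 0x1F]);
        }
        while (sb.length() % 8 != 0) {
            sb.append('=');
        }
        return sb.toString();
    }

    // Returns the base32-encoded SHA-256 digest of the UTF-8 bytes of text.
    static String signature(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return base32(digest.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(signature("the quick brown fox"));
    }
}
```

A SHA-256 digest is 32 bytes, so the padded base32 string is always 56 characters long: 52 data characters followed by 4 padding characters.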
Constructor and Description |
---|
CompositeTextStatsCalculator(List<TextStatsCalculator> calculators) |
CompositeTextStatsCalculator(List<TextStatsCalculator> calculators, org.apache.lucene.analysis.Analyzer analyzer, LanguageIDWrapper languageIDWrapper) |
Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.