Interface | Description |
---|---|
BytesRefCalculator<T> | Interface for calculators that require a string |
BytesRefCalculator.BytesRefCalcInstance<T> | |
LanguageAwareTokenCountStats<T> | Interface for calculators that require language probabilities and token stats |
StringStatsCalculator<T> | Interface for calculators that require a string |
TextStatsCalculator | Base text stats interface |
TokenCountStatsCalculator<T> | Interface for calculators that require token stats |

Class | Description |
---|---|
BasicTokenCountStatsCalculator | |
CommonTokens | |
CommonTokensBhattacharyya | |
CommonTokensCosine | |
CommonTokensHellinger | |
CommonTokensKLDivergence | |
CommonTokensKLDNormed | |
CompositeTextStatsCalculator | |
ContentLengthCalculator | |
TextProfileSignature | Copied nearly directly from Apache Nutch: https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/TextProfileSignature.java. See documentation: https://nutch.apache.org/apidocs/apidocs-2.0/org/apache/nutch/crawl/TextProfileSignature.html. Returns the base32-encoded SHA-256 digest. |
TextSha256Signature | Calculates the base32-encoded SHA-256 checksum on the analyzed text |
TokenCountPriorityQueue | |
TokenEntropy | |
TokenLengths | |
TopNTokens | |
UnicodeBlockCounter | |

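
The TextSha256Signature entry above describes a base32-encoded SHA-256 checksum. As a rough illustration of that idea only, the following is a minimal sketch and not the class's actual implementation: the class name `Base32Sha256Sketch` and both method names are hypothetical, the input is assumed to be a plain string hashed as UTF-8 bytes (the real class works on analyzed text), and the RFC 4648 alphabet with `=` padding is an assumption about the output format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not part of the package listed above.
public class Base32Sha256Sketch {

    // RFC 4648 base32 alphabet (assumed; the library's choice may differ).
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    /** Encodes bytes as base32, padded with '=' to a multiple of 8 characters. */
    public static String base32(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0; // holds the bits that have not been emitted yet
        int bits = 0;   // number of valid low-order bits in buffer
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                sb.append(ALPHABET[(buffer >> (bits - 5)) & 0x1F]);
                bits -= 5;
            }
            buffer &= (1 << bits) - 1; // keep only the leftover bits
        }
        if (bits > 0) {
            // Left-align the final partial group and zero-fill the rest.
            sb.append(ALPHABET[(buffer << (5 - bits)) & 0x1F]);
        }
        while (sb.length() % 8 != 0) {
            sb.append('=');
        }
        return sb.toString();
    }

    /** Hashes the UTF-8 bytes of the text with SHA-256 and base32-encodes the digest. */
    public static String sha256Base32(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return base32(md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Base32("the quick brown fox"));
    }
}
```

A SHA-256 digest is 32 bytes, so under these assumptions the encoded signature is 52 base32 characters followed by 4 padding characters.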
Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.