Class TextProfileSignature
java.lang.Object
    org.apache.tika.eval.core.textstats.TextProfileSignature
All Implemented Interfaces:
    TextStatsCalculator, TokenCountStatsCalculator<String>
public class TextProfileSignature extends Object implements TokenCountStatsCalculator<String>
Copied nearly directly from Apache Nutch: https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/TextProfileSignature.java
See documentation: https://nutch.apache.org/apidocs/apidocs-2.0/org/apache/nutch/crawl/TextProfileSignature.html
This implementation returns the signature as a Base32-encoded SHA-256 digest.
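As a rough illustration of what a Base32-encoded SHA-256 string looks like, the sketch below hashes a string with the JDK's MessageDigest and encodes the 32 digest bytes with a small hand-written RFC 4648 Base32 encoder. The class name Base32Sha256Sketch, the sample input, and the choice to omit '=' padding are assumptions made for illustration; this is not the code used inside TextProfileSignature, which hashes a text profile rather than the raw text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Tika.
final class Base32Sha256Sketch {
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // RFC 4648 Base32 without '=' padding (omitting padding is an assumption).
    static String base32(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0; // bits read but not yet emitted
        int bits = 0;   // number of valid low-order bits in buffer
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                sb.append(ALPHABET.charAt((buffer >> (bits - 5)) & 0x1F));
                bits -= 5;
            }
            buffer &= (1 << bits) - 1; // keep only the leftover bits
        }
        if (bits > 0) {
            // Left-align the final partial group, filling with zero bits.
            sb.append(ALPHABET.charAt((buffer << (5 - bits)) & 0x1F));
        }
        return sb.toString();
    }

    // SHA-256 over the UTF-8 bytes of the input, rendered as Base32.
    static String sha256Base32(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return base32(md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every conforming Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Base32("hello world"));
    }
}
```

A SHA-256 digest is 256 bits, so the unpadded Base32 form is always 52 characters long; a padded encoder would append four '=' characters.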
Constructor Summary
Constructor               Description
TextProfileSignature()
Method Summary
Modifier and Type    Method                                   Description
String               calculate(TokenCounts tokenCounts)
void                 setMinTokenLength(int minTokenLength)    Be careful -- for CJK languages, the default analyzer uses character bigrams.
void                 setQuantRate(float quantRate)
Method Detail
calculate
public String calculate(TokenCounts tokenCounts)
Specified by:
    calculate in interface TokenCountStatsCalculator<String>
setMinTokenLength
public void setMinTokenLength(int minTokenLength)
Be careful -- for CJK languages, the default analyzer uses character bigrams. If you set minTokenLength > 2, all CJK tokens will be ignored.
Parameters:
    minTokenLength - include tokens of this length or greater.
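The warning above can be illustrated with a small stand-alone sketch. It assumes that token length is measured with String.length() and that CJK text arrives as two-character bigrams; the class MinTokenLengthSketch, its filter method, and the sample tokens are hypothetical and do not reproduce the Tika implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a minimum-token-length filter; not Tika code.
final class MinTokenLengthSketch {

    // Keeps only tokens whose length is at least minTokenLength.
    static Map<String, Integer> filter(Map<String, Integer> counts, int minTokenLength) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().length() >= minTokenLength) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Two CJK character bigrams (written as Unicode escapes) plus one Latin token.
    static Map<String, Integer> sample() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("\u6771\u4eac", 3); // bigram, length 2
        counts.put("\u4eac\u90fd", 2); // bigram, length 2
        counts.put("signature", 1);    // Latin token, length 9
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(filter(sample(), 2).size()); // bigrams are kept
        System.out.println(filter(sample(), 3).size()); // bigrams are dropped
    }
}
```

With a minimum length of 2 all three sample tokens survive; raising it to 3 leaves only the Latin token, which is the situation the warning describes.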
setQuantRate
public void setQuantRate(float quantRate)
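This page gives no description for quantRate. The documentation of the Nutch class this one is copied from describes a quantization step: a value QUANT is computed as the quantization rate times the maximum token frequency, QUANT is at least 2 whenever the maximum frequency is above 1, and token frequencies are rounded down to a multiple of QUANT. The sketch below follows that description on the assumption that the Tika copy behaves the same way; the class QuantSketch and its method names are hypothetical, and the exact rounding used by either project should be checked against the linked source.

```java
// Hypothetical sketch of the quantization described in the Nutch documentation;
// not the Tika implementation.
final class QuantSketch {

    // Derives the quantization step from the highest token frequency.
    static int quant(int maxFreq, float quantRate) {
        int quant = Math.round(maxFreq * quantRate);
        if (quant < 2) {
            // At least 2 when any token repeats, otherwise 1.
            quant = (maxFreq > 1) ? 2 : 1;
        }
        return quant;
    }

    // Rounds a frequency down to the nearest multiple of quant.
    static int quantize(int freq, int quant) {
        return (freq / quant) * quant;
    }

    public static void main(String[] args) {
        int q = quant(1000, 0.01f);
        System.out.println(q);
        System.out.println(quantize(37, q));
    }
}
```

Under this reading, a larger quantRate makes the signature coarser: more low-frequency tokens round down to zero, so near-duplicate texts are more likely to share a signature.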