Class TextProfileSignature
java.lang.Object
org.apache.tika.eval.core.textstats.TextProfileSignature
- All Implemented Interfaces:
 TextStatsCalculator, TokenCountStatsCalculator<String>
Copied nearly directly from Apache Nutch:
 https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/TextProfileSignature.java
See the Nutch documentation: https://nutch.apache.org/apidocs/apidocs-2.0/org/apache/nutch/crawl/TextProfileSignature.html
This implementation returns the Base32-encoded SHA-256 digest of the text profile.
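The following is a minimal sketch of how such a signature can be computed, loosely following the steps described in the Nutch documentation linked above (length filter, frequency quantization, sort, hash). It is not the Tika or Nutch source: the class name ProfileSketch, the method names, the tie-breaking order, and the exact layout of the profile string are assumptions made for illustration, and the Base32 encoder is hand-rolled because the JDK does not ship one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; NOT the Tika or Nutch implementation.
final class ProfileSketch {

    // RFC 4648 Base32 with '=' padding.
    static String base32(byte[] data) {
        final String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
        StringBuilder sb = new StringBuilder();
        int buffer = 0;
        int bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                sb.append(alphabet.charAt((buffer >> (bits - 5)) & 31));
                bits -= 5;
            }
        }
        if (bits > 0) {
            sb.append(alphabet.charAt((buffer << (5 - bits)) & 31));
        }
        while (sb.length() % 8 != 0) {
            sb.append('=');
        }
        return sb.toString();
    }

    // Hypothetical helper: builds a quantized token profile and hashes it.
    static String signature(Map<String, Integer> counts, int minTokenLength, float quantRate) {
        // Highest frequency among tokens that pass the length filter.
        int maxFreq = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().length() >= minTokenLength) {
                maxFreq = Math.max(maxFreq, e.getValue());
            }
        }
        // Quantization step; when maxFreq > 1 it is at least 2, so singletons are dropped.
        int quant = Math.round(maxFreq * quantRate);
        if (quant < 2) {
            quant = (maxFreq > 1) ? 2 : 1;
        }
        List<Map.Entry<String, Integer>> profile = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().length() < minTokenLength) {
                continue;
            }
            int quantized = (e.getValue() / quant) * quant; // round down to a multiple of quant
            if (quantized >= quant) {
                profile.add(Map.entry(e.getKey(), quantized));
            }
        }
        // Decreasing frequency; ties broken by token text (an assumption, for determinism).
        profile.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        StringBuilder text = new StringBuilder();
        for (Map.Entry<String, Integer> e : profile) {
            text.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return base32(sha256.digest(text.toString().getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("tika", 12);
        counts.put("parser", 7);
        counts.put("typo", 1); // dropped by quantization
        System.out.println(signature(counts, 2, 0.01f));
    }
}
```

Because low-frequency tokens are quantized away before hashing, two texts that differ only in a few rare tokens produce the same signature in this sketch, which is the point of a profile-based signature for near-duplicate detection.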
Constructor Summary
Constructors
Constructor               Description
TextProfileSignature()
Method Summary
Modifier and Type   Method                                  Description
String              calculate(TokenCounts tokenCounts)
void                setMinTokenLength(int minTokenLength)   Be careful -- for CJK languages, the default analyzer uses character bigrams.
void                setQuantRate(float quantRate)
Constructor Details
TextProfileSignature
public TextProfileSignature() 
Method Details
calculate
public String calculate(TokenCounts tokenCounts)
- Specified by:
 calculate in interface TokenCountStatsCalculator<String>
setMinTokenLength
public void setMinTokenLength(int minTokenLength)
Be careful -- for CJK languages, the default analyzer uses character bigrams. You will "ignore" all CJK-language tokens if you set minTokenLength > 2!
- Parameters:
 minTokenLength - include tokens of this length or greater.
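To make the CJK warning concrete, here is a small sketch of why a minimum length above 2 removes every bigram token. The bigrams helper is a simplified stand-in for the analyzer's character-bigram tokenization, not the analyzer itself, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the bigram logic approximates, but is not, the default analyzer.
final class MinTokenLengthSketch {

    // Overlapping two-character tokens, e.g. "abc" -> ["ab", "bc"].
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    // Keeps tokens of minTokenLength or greater, as the parameter description states.
    static List<String> keep(List<String> tokens, int minTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() >= minTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = bigrams("東京都");
        System.out.println(keep(tokens, 2).size()); // both bigrams survive
        System.out.println(keep(tokens, 3).size()); // every bigram is filtered out
    }
}
```

With a minimum length of 2 both bigrams are kept; with 3 the list is empty, so a CJK document would contribute nothing to the profile.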
setQuantRate
public void setQuantRate(float quantRate)  