Class TextProfileSignature

  • All Implemented Interfaces:
    TextStatsCalculator, TokenCountStatsCalculator<String>

    public class TextProfileSignature
    extends Object
    implements TokenCountStatsCalculator<String>
    Copied nearly directly from Apache Nutch: https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/TextProfileSignature.java

    See documentation: https://nutch.apache.org/apidocs/apidocs-2.0/org/apache/nutch/crawl/TextProfileSignature.html

    This returns the Base32-encoded SHA-256 digest.
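
    As a rough illustration of what a "Base32-encoded SHA-256" value looks like, the sketch below hashes a string with the JDK's MessageDigest and encodes the 32-byte digest with a small hand-written RFC 4648 Base32 encoder. The class name Base32Sha256Sketch, the sample profile string, and the use of '=' padding are assumptions made for this example; this is not the class's actual implementation, and the exact text it hashes is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; not the TextProfileSignature implementation.
class Base32Sha256Sketch {

    // RFC 4648 Base32 alphabet.
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // Encode bytes as Base32, padding with '=' to a multiple of 8 characters.
    static String base32(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0; // low bits hold the not-yet-emitted input bits
        int bits = 0;   // number of pending bits in buffer
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                sb.append(ALPHABET.charAt((buffer >> (bits - 5)) & 31));
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Left-align the remaining bits in a final 5-bit group.
            sb.append(ALPHABET.charAt((buffer << (5 - bits)) & 31));
        }
        while (sb.length() % 8 != 0) {
            sb.append('=');
        }
        return sb.toString();
    }

    // SHA-256 of the UTF-8 bytes of the input, Base32-encoded.
    static String signature(String profile) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return base32(sha256.digest(profile.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample input; the real profile text is built by the class.
        System.out.println(signature("hello world"));
    }
}
```

    A SHA-256 digest is 32 bytes, so the padded Base32 form in this sketch is always 56 characters long.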

    • Constructor Detail

      • TextProfileSignature

        public TextProfileSignature()
    • Method Detail

      • setMinTokenLength

        public void setMinTokenLength(int minTokenLength)
        Be careful: for CJK languages, the default analyzer uses character bigrams, so setting minTokenLength > 2 causes all CJK tokens to be ignored.
        Parameters:
        minTokenLength - include tokens of this length or greater.
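
        The CJK caveat can be seen in a minimal sketch. The bigrams helper below is a hypothetical stand-in for an analyzer that emits character bigrams, and filter mimics a "length or greater" check; neither is part of this class, and the class name MinTokenLengthSketch is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the TextProfileSignature implementation.
class MinTokenLengthSketch {

    // Stand-in for a CJK analyzer: emits overlapping character bigrams.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    // Keep only tokens whose length is minTokenLength or greater.
    static List<String> filter(List<String> tokens, int minTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() >= minTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Six CJK characters, written as Unicode escapes to avoid encoding issues.
        List<String> tokens = bigrams("\u81ea\u7136\u8a00\u8a9e\u51e6\u7406");
        System.out.println(filter(tokens, 2).size()); // all bigrams are kept
        System.out.println(filter(tokens, 3).size()); // every bigram is dropped
    }
}
```

        Because every bigram has length 2, a threshold of 2 keeps them all, while any threshold above 2 removes them all.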
      • setQuantRate

        public void setQuantRate(float quantRate)