Class CJKBigramAwareLengthFilterFactory

java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenFilterFactory
org.apache.tika.eval.core.tokens.CJKBigramAwareLengthFilterFactory

public class CJKBigramAwareLengthFilterFactory extends org.apache.lucene.analysis.TokenFilterFactory
Creates a very narrowly focused TokenFilter that filters tokens by length unless the CJKBigramFilter has identified them as <DOUBLE> or <SINGLE>.

This class is intended to be used when generating "common tokens" files.
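
The rule described above can be illustrated with a minimal, Lucene-free sketch. It is not the class's actual implementation: the `Token` holder, the `accept` and `filter` helpers, and the inclusive `min`/`max` bounds (measured in UTF-16 chars) are hypothetical names and assumptions introduced only for illustration. The only details taken from Lucene are the `<DOUBLE>` and `<SINGLE>` type strings that CJKBigramFilter assigns to the tokens it emits.

```java
import java.util.ArrayList;
import java.util.List;

public class LengthUnlessCjkSketch {

    // Token type strings assigned by Lucene's CJKBigramFilter.
    static final String DOUBLE_TYPE = "<DOUBLE>";
    static final String SINGLE_TYPE = "<SINGLE>";

    // Hypothetical stand-in for a Lucene token: term text plus type attribute.
    static final class Token {
        final String text;
        final String type;

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Assumption: min and max are inclusive bounds on the term length.
    static boolean accept(Token token, int min, int max) {
        // CJK bigrams and unigrams bypass the length check entirely.
        if (DOUBLE_TYPE.equals(token.type) || SINGLE_TYPE.equals(token.type)) {
            return true;
        }
        int length = token.text.length();
        return length >= min && length <= max;
    }

    // Returns the text of every token that passes accept(), in input order.
    static List<String> filter(List<Token> tokens, int min, int max) {
        List<String> kept = new ArrayList<>();
        for (Token token : tokens) {
            if (accept(token, min, max)) {
                kept.add(token.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("a", "<ALPHANUM>"));                    // too short
        tokens.add(new Token("tika", "<ALPHANUM>"));                 // within bounds
        tokens.add(new Token("\u6771\u4EAC", DOUBLE_TYPE));          // CJK bigram
        tokens.add(new Token("\u4EAC", SINGLE_TYPE));                // CJK unigram
        tokens.add(new Token("supercalifragilistic", "<ALPHANUM>")); // too long
        System.out.println(filter(tokens, 4, 10));
    }
}
```

With bounds of 4 and 10, the sketch keeps the in-range alphanumeric token and both CJK tokens, even though the CJK tokens are shorter than the minimum. A plain length filter would discard them, which is presumably why the exemption matters when building "common tokens" files for CJK text.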

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
     

    Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory

    LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
  • Constructor Summary

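    Constructors
    Constructor
    Description
    CJKBigramAwareLengthFilterFactory()
    CJKBigramAwareLengthFilterFactory(Map<String,String> args)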
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.lucene.analysis.TokenStream
    create(org.apache.lucene.analysis.TokenStream tokenStream)
     

    Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory

    availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters

    Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory

    defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • CJKBigramAwareLengthFilterFactory

      public CJKBigramAwareLengthFilterFactory()
    • CJKBigramAwareLengthFilterFactory

      public CJKBigramAwareLengthFilterFactory(Map<String,String> args)
  • Method Details

    • create

      public org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)
      Specified by:
      create in class org.apache.lucene.analysis.TokenFilterFactory