Package org.apache.tika.eval.core.tokens
Class CJKBigramAwareLengthFilterFactory
java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenFilterFactory
org.apache.tika.eval.core.tokens.CJKBigramAwareLengthFilterFactory
public class CJKBigramAwareLengthFilterFactory
extends org.apache.lucene.analysis.TokenFilterFactory
Creates a narrowly focused TokenFilter that filters tokens by length
unless the CJKBigramFilter has typed them as <DOUBLE> or <SINGLE>.
This class is intended to be used when generating "common tokens" files.
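The rule described above can be sketched in plain Java without a Lucene dependency. This is an illustration only, not the Tika implementation: the Token class, the accept and filter helpers, and the min and max parameter names are hypothetical, and the inclusive bounds are an assumption. Only the <DOUBLE> and <SINGLE> type labels come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
final class CjkAwareLengthSketch {

    // Type labels named in the class description (set by the CJKBigramFilter).
    static final String DOUBLE_TYPE = "<DOUBLE>";
    static final String SINGLE_TYPE = "<SINGLE>";

    /** Hypothetical stand-in for a token: its term text plus its type label. */
    static final class Token {
        final String term;
        final String type;

        Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    /**
     * Returns true if the token should be kept. CJK bigram/unigram tokens
     * bypass the length check; every other token must have a length in
     * [min, max] (inclusive bounds are an assumption).
     */
    static boolean accept(Token token, int min, int max) {
        if (DOUBLE_TYPE.equals(token.type) || SINGLE_TYPE.equals(token.type)) {
            return true;
        }
        int len = token.term.length();
        return len >= min && len <= max;
    }

    /** Applies accept() to each token and returns the survivors in order. */
    static List<Token> filter(List<Token> tokens, int min, int max) {
        List<Token> kept = new ArrayList<>();
        for (Token token : tokens) {
            if (accept(token, min, max)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

With min = 3 and max = 20, a two-character token typed <DOUBLE> is kept even though it is shorter than min, while a one-character token of any other type is dropped.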
Field Summary
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Method Summary
Modifier and Type: org.apache.lucene.analysis.TokenStream
Method: create(org.apache.lucene.analysis.TokenStream tokenStream)
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Details
NAME
Constructor Details
CJKBigramAwareLengthFilterFactory
public CJKBigramAwareLengthFilterFactory()
CJKBigramAwareLengthFilterFactory
Method Details
create
public org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)
Specified by:
create in class org.apache.lucene.analysis.TokenFilterFactory