Package org.apache.tika.eval.core.tokens
Class URLEmailNormalizingFilterFactory
java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenFilterFactory
org.apache.tika.eval.core.tokens.URLEmailNormalizingFilterFactory
public class URLEmailNormalizingFilterFactory
extends org.apache.lucene.analysis.TokenFilterFactory
Factory for a filter that normalizes URLs and emails to __url__ and __email__,
respectively. WARNING: This will not work correctly unless the
UAX29URLEmailTokenizer is used! This filter must run _before_ the
AlphaIdeographFilterFactory, or else the emails/URLs will already
have been removed!
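The ordering constraint above can be illustrated with a small stdlib-only simulation. This is a sketch under stated assumptions, not Tika's or Lucene's implementation: the `Token` class, the `normalize` and `alphaOnly` helpers, and the rule "keep only letters and underscores" are hypothetical stand-ins, and the type labels `<URL>`, `<EMAIL>`, and `<ALPHANUM>` are modeled on the labels a URL/email-aware tokenizer is assumed to attach to tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only simulation; none of these types are Tika or Lucene classes.
class UrlEmailOrderingSketch {

    // Hypothetical token: term text plus a tokenizer-assigned type label.
    static final class Token {
        final String text;
        final String type;
        Token(String text, String type) { this.text = text; this.type = type; }
    }

    // Stand-in for the normalizing filter: rewrites by token type only,
    // so it does nothing if the tokenizer never labels URLs or emails.
    static List<Token> normalize(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            if ("<URL>".equals(t.type)) {
                out.add(new Token("__url__", t.type));
            } else if ("<EMAIL>".equals(t.type)) {
                out.add(new Token("__email__", t.type));
            } else {
                out.add(t);
            }
        }
        return out;
    }

    // Simplified stand-in for the later alphabetic filter (assumed rule):
    // keep a token only if every character is a letter or an underscore.
    static List<Token> alphaOnly(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            boolean keep = !t.text.isEmpty();
            for (int i = 0; i < t.text.length() && keep; i++) {
                char c = t.text.charAt(i);
                keep = Character.isLetter(c) || c == '_';
            }
            if (keep) out.add(t);
        }
        return out;
    }

    // Collects the term text of each token for easy comparison.
    static List<String> texts(List<Token> in) {
        List<String> out = new ArrayList<>();
        for (Token t : in) out.add(t.text);
        return out;
    }

    // Sample stream as a URL/email-aware tokenizer might label it.
    static List<Token> sample() {
        return Arrays.asList(
            new Token("contact", "<ALPHANUM>"),
            new Token("user@example.com", "<EMAIL>"),
            new Token("https://example.com", "<URL>"));
    }

    public static void main(String[] args) {
        // Normalize first: the placeholders survive the alphabetic filter.
        System.out.println(texts(alphaOnly(normalize(sample())))); // [contact, __email__, __url__]
        // Alphabetic filter first: the URL and email are gone before normalization runs.
        System.out.println(texts(normalize(alphaOnly(sample())))); // [contact]
    }
}
```

In the second ordering the normalizing step has nothing left to rewrite, which is the failure mode the warning describes. The same happens if the tokens never carry URL or email type labels in the first place, which is why the tokenizer choice matters.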
Field Summary
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Method Summary
Modifier and Type    Method    Description
org.apache.lucene.analysis.TokenStream    create(org.apache.lucene.analysis.TokenStream tokenStream)
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Details
NAME
URL
EMAIL
Constructor Details
URLEmailNormalizingFilterFactory
public URLEmailNormalizingFilterFactory()
URLEmailNormalizingFilterFactory
Method Details
create
public org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)
Specified by:
create in class org.apache.lucene.analysis.TokenFilterFactory