Package org.apache.tika.eval.core.tokens
Class URLEmailNormalizingFilterFactory
- java.lang.Object
  - org.apache.lucene.analysis.util.AbstractAnalysisFactory
    - org.apache.lucene.analysis.util.TokenFilterFactory
      - org.apache.tika.eval.core.tokens.URLEmailNormalizingFilterFactory
public class URLEmailNormalizingFilterFactory extends org.apache.lucene.analysis.util.TokenFilterFactory
Factory for a filter that normalizes URLs and email addresses to __url__ and __email__, respectively. WARNING: This will not work correctly unless the UAX29URLEmailTokenizer is used! This filter must run _before_ the AlphaIdeographFilterFactory filter, or else the emails and URLs will already have been removed!
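The ordering warning can be illustrated with a stdlib-only Java sketch. It does not use Tika or Lucene: tokens are a hypothetical Token(text, type) record, the "<URL>" and "<EMAIL>" type strings are the ones UAX29URLEmailTokenizer assigns, and ALPHA_IDEOGRAPH_STAND_IN is a simplified, hypothetical stand-in for the AlphaIdeographFilterFactory filter (it keeps only tokens made of letters, ideographs, or underscores), not that filter's actual logic. Assuming, consistent with the warning, that normalization keys on those token types, running the normalizer first lets the placeholders survive, while running the stand-in first drops the raw URL and email before they can be normalized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Stdlib-only sketch of why filter order matters; not Tika's implementation.
// Assumptions: each stage is a List<Token> -> List<Token> function, and the
// placeholder strings mirror the __url__/__email__ values named in the Javadoc.
class FilterOrderSketch {

    // Hypothetical token holder standing in for Lucene's term and type attributes.
    record Token(String text, String type) {}

    // Mirrors the normalizing filter: URL- and email-typed tokens become placeholders.
    static final UnaryOperator<List<Token>> NORMALIZE = tokens -> {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            if (t.type().equals("<URL>")) {
                out.add(new Token("__url__", t.type()));
            } else if (t.type().equals("<EMAIL>")) {
                out.add(new Token("__email__", t.type()));
            } else {
                out.add(t);
            }
        }
        return out;
    };

    // Hypothetical stand-in for the alpha/ideograph filter: drops any token that
    // contains a character other than a letter, an ideograph, or '_'.
    static final UnaryOperator<List<Token>> ALPHA_IDEOGRAPH_STAND_IN = tokens -> {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            boolean keep = !t.text().isEmpty() && t.text().codePoints().allMatch(
                    cp -> Character.isAlphabetic(cp) || Character.isIdeographic(cp) || cp == '_');
            if (keep) {
                out.add(t);
            }
        }
        return out;
    };

    // Applies the stages in order and returns the surviving token texts.
    @SafeVarargs
    static List<String> run(List<Token> tokens, UnaryOperator<List<Token>>... stages) {
        List<Token> current = tokens;
        for (UnaryOperator<List<Token>> stage : stages) {
            current = stage.apply(current);
        }
        List<String> texts = new ArrayList<>();
        for (Token t : current) {
            texts.add(t.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("see", "<ALPHANUM>"),
                new Token("https://tika.apache.org", "<URL>"),
                new Token("dev@tika.apache.org", "<EMAIL>"));
        // Normalize first: the placeholders pass the stand-in filter.
        System.out.println(run(tokens, NORMALIZE, ALPHA_IDEOGRAPH_STAND_IN));
        // Filter first: the raw URL and email are gone before normalization runs.
        System.out.println(run(tokens, ALPHA_IDEOGRAPH_STAND_IN, NORMALIZE));
    }
}
```

Running main prints [see, __url__, __email__] followed by [see].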
Constructor Summary
- URLEmailNormalizingFilterFactory(Map<String,String> args)
Method Summary
- org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)
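A method with this signature presumably follows the usual Lucene decorator shape: the returned stream wraps the input stream and rewrites tokens as they are pulled, rather than consuming the input up front. The sketch below mirrors that shape with the standard library only. Iterator<Token> stands in for TokenStream, Token is a hypothetical (text, type) record, the "<URL>"/"<EMAIL>" type strings are assumed to come from UAX29URLEmailTokenizer, and the placeholder strings are taken from the class description; it illustrates the pattern, not Tika's implementation.

```java
import java.util.Iterator;
import java.util.List;

// Stdlib-only sketch of the decorator shape behind create(TokenStream).
// Iterator<Token> is a stand-in for Lucene's TokenStream (an assumption).
class CreateDecoratorSketch {

    // Hypothetical token holder standing in for Lucene's term and type attributes.
    record Token(String text, String type) {}

    // Wraps the upstream iterator: each next() pulls one upstream token and
    // rewrites URL- or email-typed tokens to their placeholders.
    static Iterator<Token> create(Iterator<Token> upstream) {
        return new Iterator<>() {
            @Override
            public boolean hasNext() {
                return upstream.hasNext();
            }

            @Override
            public Token next() {
                Token t = upstream.next();
                return switch (t.type()) {
                    case "<URL>" -> new Token("__url__", t.type());
                    case "<EMAIL>" -> new Token("__email__", t.type());
                    default -> t;
                };
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Token> upstream = List.of(
                new Token("mail", "<ALPHANUM>"),
                new Token("dev@tika.apache.org", "<EMAIL>")).iterator();
        // Nothing is rewritten until the caller pulls tokens from the wrapper.
        Iterator<Token> filtered = create(upstream);
        while (filtered.hasNext()) {
            System.out.println(filtered.next().text());
        }
    }
}
```

Here main prints mail and then __email__. Chaining further filters works the same way: each create call wraps the stream returned by the previous one, which is why the order of the calls decides which filter sees the raw tokens.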
Methods inherited from class org.apache.lucene.analysis.util.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Detail
URL
public static final String URL
- See Also:
- Constant Field Values
EMAIL
public static final String EMAIL
- See Also:
- Constant Field Values