Package org.apache.tika.eval.core.tokens
Class URLEmailNormalizingFilterFactory
- java.lang.Object
  - org.apache.lucene.analysis.AbstractAnalysisFactory
    - org.apache.lucene.analysis.TokenFilterFactory
      - org.apache.tika.eval.core.tokens.URLEmailNormalizingFilterFactory
public class URLEmailNormalizingFilterFactory extends org.apache.lucene.analysis.TokenFilterFactory
Factory for a filter that normalizes URLs and emails to __url__ and __email__, respectively. WARNING: This will not work correctly unless the UAX29URLEmailTokenizer is used! The filter must also run _before_ the AlphaIdeographFilterFactory, or else the emails/URLs will already have been removed!
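The normalization described above can be pictured with a minimal plain-Java sketch. It is not the Tika implementation and uses no Lucene classes: the `Token` class and the `normalize` method are hypothetical stand-ins, the placeholder strings are copied from the description (the authoritative values are the URL and EMAIL constants under Field Detail), and the sketch assumes the filter recognizes URLs and emails by the `<URL>` and `<EMAIL>` token types that UAX29URLEmailTokenizer assigns.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: plain Java, no Lucene or Tika classes involved.
class UrlEmailNormalizeSketch {

    // Placeholder strings as written in the class description; the real
    // values are the URL and EMAIL constants of the factory (assumption).
    static final String URL = "__url__";
    static final String EMAIL = "__email__";

    // Type labels that UAX29URLEmailTokenizer assigns to URL and email tokens.
    static final String URL_TYPE = "<URL>";
    static final String EMAIL_TYPE = "<EMAIL>";

    // Hypothetical stand-in for a token's term text plus its type attribute.
    static final class Token {
        final String text;
        final String type;

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Replaces the text of URL and email tokens with a fixed placeholder and
    // passes every other token through unchanged.
    static List<Token> normalize(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            if (URL_TYPE.equals(t.type)) {
                out.add(new Token(URL, t.type));
            } else if (EMAIL_TYPE.equals(t.type)) {
                out.add(new Token(EMAIL, t.type));
            } else {
                out.add(t);
            }
        }
        return out;
    }
}
```

Under these assumptions, a stream of tokens typed `<ALPHANUM>`, `<URL>`, and `<EMAIL>` keeps the first token's text and rewrites the other two to the placeholders. It also suggests why the tokenizer matters: a tokenizer that never emits the `<URL>` and `<EMAIL>` types gives a type-based filter nothing to match.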
Constructor Summary
Constructors
- URLEmailNormalizingFilterFactory()
- URLEmailNormalizingFilterFactory(Map<String,String> args)
Method Summary
Instance Methods
- org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Detail
NAME
public static final String NAME
- See Also: Constant Field Values
URL
public static final String URL
- See Also: Constant Field Values
EMAIL
public static final String EMAIL
- See Also: Constant Field Values