public class TopCommonTokenCounter extends Object
The CommonTokensAnalyzer intentionally drops tokens shorter than 4 characters, but includes bigrams for CJK.
It also has an include list for __email__ and __url__ and a skip list for common HTML markup terms.
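The filtering rules described above can be illustrated with a minimal sketch. This is not the analyzer's actual implementation: the class name `TokenFilterSketch`, the helper `isCjk`, and the skip-list entries are hypothetical and chosen only for illustration. It also assumes the include list takes precedence over the other rules and that "CJK" means the Han, Hiragana, Katakana, and Hangul scripts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class TokenFilterSketch {
    // Include list taken from the description above.
    static final Set<String> INCLUDE = Set.of("__email__", "__url__");
    // Hypothetical skip-list entries; the real list is not shown in this document.
    static final Set<String> SKIP = Set.of("body", "html", "span", "table");
    // Tokens with fewer code points than this are dropped.
    static final int MIN_LENGTH = 4;

    // Assumption: treat Han, Hiragana, Katakana, and Hangul as CJK.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (INCLUDE.contains(token)) {
                kept.add(token);      // include list wins over every other rule
                continue;
            }
            if (SKIP.contains(token)) {
                continue;             // common markup term
            }
            int[] cps = token.codePoints().toArray();
            boolean allCjk = cps.length > 0
                    && Arrays.stream(cps).allMatch(TokenFilterSketch::isCjk);
            if (allCjk) {
                // Emit overlapping bigrams; a lone CJK character yields nothing here.
                for (int i = 0; i + 1 < cps.length; i++) {
                    kept.add(new String(cps, i, 2));
                }
                continue;
            }
            if (cps.length >= MIN_LENGTH) {
                kept.add(token);      // long enough to keep
            }
        }
        return kept;
    }
}
```

Under these assumptions, calling `TokenFilterSketch.filter(List.of("the", "__url__", "html", "tokenizer", "東京都"))` returns `[__url__, tokenizer, 東京, 京都]`: the short token and the markup term are dropped, and the CJK run is split into overlapping bigrams.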
| Constructor and Description |
|---|
| TopCommonTokenCounter() |
Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.