Factory for a filter that lets through only tokens with characters that are "isAlphabetic" or "isIdeographic".
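A minimal sketch of the per-token test such a filter might apply. It assumes that "isAlphabetic" and "isIdeographic" refer to `Character.isAlphabetic(int)` and `Character.isIdeographic(int)`, and that *every* code point must pass. The description above does not say whether one qualifying character is enough. The class `AlphaIdeographicCheck` and its method `isAllowed` are hypothetical, and the sketch is not the factory's real API.

```java
// Hypothetical helper illustrating the keep/drop rule; not the factory's actual API.
public final class AlphaIdeographicCheck {

    private AlphaIdeographicCheck() {}

    /**
     * Returns true if every code point in the token is alphabetic or ideographic.
     * Assumption: "all characters must qualify" (the original wording is ambiguous).
     */
    public static boolean isAllowed(String token) {
        if (token.isEmpty()) {
            return false; // assumption: empty tokens are dropped
        }
        // Iterate by code point so supplementary characters are tested correctly.
        return token.codePoints()
                .allMatch(cp -> Character.isAlphabetic(cp) || Character.isIdeographic(cp));
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "\u6771\u4eac", "abc123", "e-mail"};
        for (String t : samples) {
            System.out.println(t + " -> " + isAllowed(t));
        }
    }
}
```

Iterating over code points rather than `char`s avoids misclassifying characters outside the Basic Multilingual Plane, which are stored as surrogate pairs in a Java `String`.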
Creates a very narrowly focused TokenFilter that limits tokens based on length _unless_ they've been identified as <DOUBLE> or <SINGLE> by the CJKBigramFilter.
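The decision rule described above can be sketched as a single predicate. `CJKBigramFilter` labels its bigram and unigram output with the token types `<DOUBLE>` and `<SINGLE>`, and those tokens bypass the length check. The class name `CjkAwareLengthCheck`, the `keep` method, and the `min`/`max` bounds are hypothetical illustrations, not the filter's actual configuration. Length is measured in UTF-16 `char`s, as `String.length()` reports it.

```java
import java.util.Set;

// Hypothetical sketch of the keep/drop decision; names and bounds are illustrative.
public final class CjkAwareLengthCheck {

    // Token types that CJKBigramFilter assigns to bigrams and unigrams.
    private static final Set<String> CJK_TYPES = Set.of("<DOUBLE>", "<SINGLE>");

    private CjkAwareLengthCheck() {}

    /**
     * Keeps CJK bigram/unigram tokens unconditionally; otherwise keeps the token
     * only if its length (in UTF-16 chars) lies within [min, max].
     */
    public static boolean keep(String term, String type, int min, int max) {
        if (CJK_TYPES.contains(type)) {
            return true; // CJK tokens are exempt from the length limit
        }
        int len = term.length();
        return len >= min && len <= max;
    }

    public static void main(String[] args) {
        System.out.println(keep("a", "<ALPHANUM>", 2, 20));    // false: too short
        System.out.println(keep("\u4e2d", "<SINGLE>", 2, 20)); // true: CJK unigram exempt
    }
}
```

Exempting the CJK types matters because a CJK unigram is a single character, so an ordinary minimum-length rule would discard it even though it carries meaning.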
Computes some corpus contrast statistics.
Factory for a filter that normalizes URLs and emails to __url__ and __email__, respectively.
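A minimal sketch of the normalization step, assuming each URL or email arrives as a single whole token. The regular expressions are simplified illustrations and not the patterns the real filter uses. A real implementation might instead rely on token types such as the `<URL>` and `<EMAIL>` types emitted by Lucene's `UAX29URLEmailTokenizer`. The class `UrlEmailNormalizer` and its `normalize` method are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the patterns below are simplified illustrations,
// not the patterns the actual filter uses.
public final class UrlEmailNormalizer {

    // Simplified email shape: local-part@domain.tld
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    // Simplified URL shape: starts with http(s):// or www., followed by non-space chars
    private static final Pattern URL =
            Pattern.compile("(?i)(https?://|www\\.)\\S+");

    private UrlEmailNormalizer() {}

    /** Replaces a whole-token email with "__email__" and a whole-token URL with "__url__". */
    public static String normalize(String token) {
        if (EMAIL.matcher(token).matches()) {
            return "__email__";
        }
        if (URL.matcher(token).matches()) {
            return "__url__";
        }
        return token; // anything else passes through unchanged
    }

    public static void main(String[] args) {
        System.out.println(normalize("https://lucene.apache.org/")); // __url__
        System.out.println(normalize("user@example.com"));           // __email__
        System.out.println(normalize("lucene"));                     // lucene
    }
}
```

Collapsing every URL and email into one placeholder keeps them from inflating the vocabulary with one-off terms, while still recording that a URL or email occurred at that position.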
Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.