Class | Description |
---|---|
BatchTopCommonTokenCounter | Utility class that runs TopCommonTokenCounter against a directory of table files (named {lang}_table.gz or Leipzig-like afr_...-sentences.txt) and writes a common-tokens file for each input table file to the output directory. |
CommonTokenOverlapCounter | |
LeipzigHelper | |
LeipzigSampler | |
SlowCompositeReaderWrapper | COPIED VERBATIM FROM LUCENE. This class forces a composite reader (e.g. a MultiReader or DirectoryReader) to emulate a LeafReader. |
TopCommonTokenCounter | Utility class that reads in a UTF-8 input file with one document per row and outputs the 20000 tokens with the highest document frequencies. |
TrainTestSplit | |
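
The TopCommonTokenCounter entry above ranks tokens by document frequency, meaning the number of rows (documents) that contain a token at least once, not the total number of occurrences. The following is a minimal sketch of that idea, not the class's actual implementation. The class name `DocFreqSketch`, the method `topTokensByDocFreq`, and the lowercase-plus-whitespace tokenization are all assumptions made for illustration, and the rows are passed in memory rather than read from a UTF-8 file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class DocFreqSketch {

    // Hypothetical helper (not part of the package): returns up to topN tokens
    // ordered by descending document frequency, ties broken alphabetically.
    public static List<String> topTokensByDocFreq(List<String> rows, int topN) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String row : rows) {
            // Assumption: lowercase + whitespace tokenization stands in for
            // whatever analyzer the real class uses.
            String[] tokens = row.toLowerCase(Locale.ROOT).trim().split("\\s+");
            // A set ensures each token is counted at most once per row.
            Set<String> seen = new HashSet<>(Arrays.asList(tokens));
            for (String token : seen) {
                if (!token.isEmpty()) {
                    docFreq.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(docFreq.keySet());
        ranked.sort((a, b) -> {
            int byFreq = Integer.compare(docFreq.get(b), docFreq.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size())));
    }

    public static void main(String[] args) {
        // One document per row; "the" appears twice in the last row
        // but still counts once for that row.
        List<String> rows = Arrays.asList("the cat sat", "the dog sat", "the the bird");
        System.out.println(topTokensByDocFreq(rows, 2)); // prints [the, sat]
    }
}
```

In the class described above the cutoff would be 20000 rather than 2; the small value here only keeps the example output readable.
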
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.