public class OptimaizeLangDetector extends LanguageDetector
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MAX_CHARS_FOR_DETECTION |
static int | DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION |
Fields inherited from class LanguageDetector: mixedLanguages, shortText
Constructor and Description |
---|
OptimaizeLangDetector() |
OptimaizeLangDetector(int maxCharsForDetection) |
Modifier and Type | Method and Description |
---|---|
void | addText(char[] cbuf, int off, int len) Add statistics about this text for the current document. |
List<LanguageResult> | detectAll() Detect languages based on previously submitted text (via addText calls). |
boolean | hasEnoughText() Tell the caller whether more text is required for the current document before the language can be reliably detected. |
boolean | hasModel(String language) Provide information about whether a model exists for a specific language. |
LanguageDetector | loadModels() Load (or re-load) all available language models. |
LanguageDetector | loadModels(Set<String> languages) Load (or re-load) the models specified in languages. |
void | reset() Reset statistics about the current document being processed. |
LanguageDetector | setPriors(Map<String,Float> languageProbabilities) Set the a-priori probabilities for these languages. |
Methods inherited from class LanguageDetector: addText, detect, detect, detectAll, getDefaultLanguageDetector, getLanguageDetectors, getLanguageDetectors, isMixedLanguages, isShortText, setMixedLanguages, setShortText
public static final int DEFAULT_MAX_CHARS_FOR_DETECTION
public static final int DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION
public OptimaizeLangDetector()
public OptimaizeLangDetector(int maxCharsForDetection)
public LanguageDetector loadModels()
Description copied from class: LanguageDetector
Load (or re-load) all available language models.
Specified by:
loadModels in class LanguageDetector
public LanguageDetector loadModels(Set<String> languages) throws IOException
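Description copied from class: LanguageDetector
Load (or re-load) the models specified in languages.
Specified by:
loadModels in class LanguageDetector
Parameters:
languages - list of target languages.
Throws:
IOException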
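As an illustration of preparing the `languages` argument, the sketch below builds a `Set<String>` of language codes. It assumes the codes are two-letter ISO 639-1 names, as the `hasModel` parameter description suggests. The class `LanguageSetSketch` and the method `targetLanguages` are hypothetical helpers, not part of this API, and the check against `Locale.getISOLanguages()` only confirms that a code is well formed, not that a model exists for it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the documented API.
final class LanguageSetSketch {

    // All two-letter ISO 639 language codes known to the JDK.
    private static final Set<String> ISO_639_1 =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));

    /**
     * Normalizes the given codes (trim, lower-case), rejects anything that
     * is not a two-letter ISO 639 code, and drops duplicates.
     */
    static Set<String> targetLanguages(String... codes) {
        Set<String> result = new LinkedHashSet<>();
        for (String code : codes) {
            String normalized = code.trim().toLowerCase(Locale.ROOT);
            if (!ISO_639_1.contains(normalized)) {
                throw new IllegalArgumentException(
                        "Not a two-letter ISO 639 code: " + code);
            }
            result.add(normalized);
        }
        return result;
    }

    public static void main(String[] args) {
        // The resulting set has the type expected by loadModels(Set<String>).
        Set<String> languages = targetLanguages("EN", " de ", "en");
        System.out.println(languages);
    }
}
```

A set built this way could then be passed to `loadModels(Set<String> languages)`; whether a model is actually available for each code is what `hasModel(String language)` reports.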
public boolean hasModel(String language)
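Description copied from class: LanguageDetector
Provide information about whether a model exists for a specific language.
Specified by:
hasModel in class LanguageDetector
Parameters:
language - ISO 639-1 name for language
public LanguageDetector setPriors(Map<String,Float> languageProbabilities) throws IOException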
Description copied from class: LanguageDetector
Set the a-priori probabilities for these languages. If hasModel() returns false for any of the languages, an IllegalArgumentException is thrown.
Use of these probabilities is detector-specific, and thus might not impact the results at all. As such, these should be viewed as a hint.
Specified by:
setPriors in class LanguageDetector
Parameters:
languageProbabilities - Map from language to probability
Throws:
IOException
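The shape of the `languageProbabilities` argument can be sketched with plain collections. The example below builds a `Map<String,Float>` from raw weights and mirrors the documented failure mode by throwing `IllegalArgumentException` for any language the supplied check rejects. The class `PriorsSketch`, the method `buildPriors`, and the `Predicate` standing in for `hasModel` are hypothetical. Scaling the weights so they sum to 1 and rejecting negative weights are assumptions of this sketch, not requirements stated in this documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper, not part of the documented API.
final class PriorsSketch {

    /**
     * Turns raw non-negative weights into a language -> probability map.
     * The hasModel predicate is a stand-in for the detector's hasModel(String).
     */
    static Map<String, Float> buildPriors(Map<String, Float> weights,
                                          Predicate<String> hasModel) {
        float total = 0f;
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            if (!hasModel.test(e.getKey())) {
                throw new IllegalArgumentException(
                        "No model for language: " + e.getKey());
            }
            if (e.getValue() < 0f) {
                throw new IllegalArgumentException(
                        "Negative weight for language: " + e.getKey());
            }
            total += e.getValue();
        }
        if (total <= 0f) {
            throw new IllegalArgumentException("Weights must sum to a positive value");
        }
        // Assumption of this sketch: scale so the probabilities sum to 1.
        Map<String, Float> priors = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            priors.put(e.getKey(), e.getValue() / total);
        }
        return priors;
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("en", 3f);
        weights.put("de", 1f);
        Set<String> available = Set.of("en", "de", "fr"); // stand-in for loaded models
        System.out.println(buildPriors(weights, available::contains));
    }
}
```

The returned map has the type `setPriors(Map<String,Float> languageProbabilities)` accepts. Because the use of priors is detector-specific, results may not change even with a well-formed map.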
public void reset()
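Description copied from class: LanguageDetector
Reset statistics about the current document being processed.
Specified by:
reset in class LanguageDetector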
public void addText(char[] cbuf, int off, int len)
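Description copied from class: LanguageDetector
Add statistics about this text for the current document.
Specified by:
addText in class LanguageDetector
Parameters:
cbuf - Character buffer
off - Offset into cbuf to first character in the run of text
len - Number of characters in the run of text.
public List<LanguageResult> detectAll()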
Specified by:
detectAll in class LanguageDetector
Throws:
IllegalStateException - if no models have been loaded with loadModels() or loadModels(java.util.Set)
public boolean hasEnoughText()
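Description copied from class: LanguageDetector
Tell the caller whether more text is required for the current document before the language can be reliably detected.
Implementations can override this to do early termination of stats collection, which can improve performance with longer documents.
Note that detect() can be called even when this returns false.
Overrides:
hasEnoughText in class LanguageDetector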
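The early-termination idea can be illustrated with a small chunked-read loop: text is pushed through `addText(char[] cbuf, int off, int len)` until `hasEnoughText()` reports true or the input runs out. To keep the sketch runnable without the library on the classpath, it uses a hypothetical `TextStats` interface that copies only those two method signatures, plus a toy `CountingStats` implementation with a fixed character limit. Neither is part of this API, and the real detector's stopping rule is not described here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ChunkedFeedSketch {

    /** Hypothetical stand-in exposing the two documented method signatures. */
    interface TextStats {
        void addText(char[] cbuf, int off, int len);
        boolean hasEnoughText();
    }

    /** Toy implementation: counts characters and reports "enough" at a fixed limit. */
    static final class CountingStats implements TextStats {
        private final int limit;
        private int seen;

        CountingStats(int limit) {
            this.limit = limit;
        }

        @Override
        public void addText(char[] cbuf, int off, int len) {
            seen += len; // only len characters starting at off are valid
        }

        @Override
        public boolean hasEnoughText() {
            return seen >= limit;
        }
    }

    /**
     * Reads fixed-size chunks and stops as soon as the sink has enough text.
     * Returns the number of characters that were fed.
     */
    static int feed(Reader reader, TextStats stats, int chunkSize) {
        char[] buf = new char[chunkSize];
        int fed = 0;
        try {
            int n;
            while (!stats.hasEnoughText()
                    && (n = reader.read(buf, 0, buf.length)) != -1) {
                stats.addText(buf, 0, n); // pass the count actually read, not buf.length
                fed += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fed;
    }

    public static void main(String[] args) {
        CountingStats stats = new CountingStats(10);
        Reader text = new StringReader("The quick brown fox jumps over the lazy dog");
        System.out.println(feed(text, stats, 4)); // prints 12
    }
}
```

With a limit of 10 and chunks of 4, the loop stops after the third chunk, so most of the input is never read. Since `detect()` can be called even when `hasEnoughText()` returns false, a short input that ends before the limit is still usable.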
Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.