public class OptimaizeLangDetector extends LanguageDetector
| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_MAX_CHARS_FOR_DETECTION |
| static int | DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION |

Fields inherited from class LanguageDetector: mixedLanguages, shortText

| Constructor and Description |
|---|
| OptimaizeLangDetector() |
| OptimaizeLangDetector(int maxCharsForDetection) |

| Modifier and Type | Method | Description |
|---|---|---|
| void | addText(char[] cbuf, int off, int len) | Add statistics about this text for the current document. |
| List<LanguageResult> | detectAll() | Detect languages based on previously submitted text (via addText calls). |
| boolean | hasEnoughText() | Tell the caller whether more text is required for the current document before the language can be reliably detected. |
| boolean | hasModel(String language) | Provide information about whether a model exists for a specific language. |
| LanguageDetector | loadModels() | Load (or re-load) all available language models. |
| LanguageDetector | loadModels(Set<String> languages) | Load (or re-load) the models specified in languages. |
| void | reset() | Reset statistics about the current document being processed. |
| LanguageDetector | setPriors(Map<String,Float> languageProbabilities) | Set the a-priori probabilities for these languages. |

Methods inherited from class LanguageDetector: addText, detect, detect, detectAll, getDefaultLanguageDetector, getLanguageDetectors, getLanguageDetectors, isMixedLanguages, isShortText, setMixedLanguages, setShortText

public static final int DEFAULT_MAX_CHARS_FOR_DETECTION

public static final int DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION

public OptimaizeLangDetector()

public OptimaizeLangDetector(int maxCharsForDetection)

public LanguageDetector loadModels()
Specified by: loadModels in class LanguageDetector

public LanguageDetector loadModels(Set<String> languages) throws IOException
Specified by: loadModels in class LanguageDetector
Parameters: languages - list of target languages.
Throws: IOException

public boolean hasModel(String language)
Specified by: hasModel in class LanguageDetector
Parameters: language - ISO 639-1 name for language

public LanguageDetector setPriors(Map<String,Float> languageProbabilities) throws IOException
Description copied from class: LanguageDetector
If hasModel() returns false for any of the languages, an IllegalArgumentException is thrown.
Use of these probabilities is detector-specific, and thus might not impact the results at all. As such, these should be viewed as a hint.
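
Because setPriors rejects any language for which hasModel() returns false, it can help to filter a candidate map before passing it in. The following is a minimal sketch that uses only the JDK. The names `PriorsSketch` and `keepSupported` and the `xx` language code are hypothetical and exist only for illustration, and the `Predicate` is a stand-in for a real hasModel check.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class PriorsSketch {

    /**
     * Keeps only the entries whose language passes the hasModel check, so the
     * result could be handed to setPriors without triggering the
     * IllegalArgumentException described above. Hypothetical helper, not part of Tika.
     */
    public static Map<String, Float> keepSupported(Map<String, Float> priors,
                                                   Predicate<String> hasModel) {
        Map<String, Float> supported = new LinkedHashMap<>();
        for (Map.Entry<String, Float> entry : priors.entrySet()) {
            if (hasModel.test(entry.getKey())) {
                supported.put(entry.getKey(), entry.getValue());
            }
        }
        return supported;
    }

    public static void main(String[] args) {
        Map<String, Float> priors = new LinkedHashMap<>();
        priors.put("en", 0.6f);
        priors.put("de", 0.3f);
        priors.put("xx", 0.1f); // made-up code standing in for a language without a model

        // Stand-in check (assumption): with a detector instance available,
        // a method reference to its hasModel(String) would fit here instead.
        Predicate<String> hasModel = lang -> !lang.equals("xx");

        System.out.println(keepSupported(priors, hasModel).keySet()); // prints [en, de]
    }
}
```

The sketch leaves the probability values untouched, since the description above does not say whether they need to sum to 1.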
Specified by: setPriors in class LanguageDetector
Parameters: languageProbabilities - Map from language to probability
Throws: IOException

public void reset()
Specified by: reset in class LanguageDetector

public void addText(char[] cbuf, int off, int len)
Specified by: addText in class LanguageDetector
Parameters:
cbuf - Character buffer
off - Offset into cbuf to first character in the run of text
len - Number of characters in the run of text.

public List<LanguageResult> detectAll()
Specified by: detectAll in class LanguageDetector
Throws: IllegalStateException - if no models have been loaded with loadModels() or loadModels(java.util.Set)

public boolean hasEnoughText()
Description copied from class: LanguageDetector
Implementations can override this to do early termination of stats collection, which can improve performance with longer documents.
Note that detect() can be called even when this returns false.
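
The early-termination idea can be sketched as a loop that hands text over in chunks through the cbuf/off/len parameters of addText and stops once hasEnoughText() returns true. The example below is JDK-only and rests on assumptions: `TextStats` is a hypothetical stand-in interface that mirrors the two documented method signatures, and `CountingStats` is a toy implementation that reports enough text after a fixed number of characters. Neither is part of Tika.

```java
public class EarlyTerminationSketch {

    /** Hypothetical stand-in mirroring addText and hasEnoughText; not a Tika type. */
    public interface TextStats {
        void addText(char[] cbuf, int off, int len);
        boolean hasEnoughText();
    }

    /**
     * Feeds the text in chunks of at most chunkSize characters and stops as soon
     * as the sink reports it has enough. Returns the number of characters fed.
     */
    public static int feed(char[] text, TextStats stats, int chunkSize) {
        int off = 0;
        while (off < text.length && !stats.hasEnoughText()) {
            int len = Math.min(chunkSize, text.length - off);
            stats.addText(text, off, len); // off/len select the current run within text
            off += len;
        }
        return off;
    }

    /** Toy sink that claims to have enough text once it has seen `limit` characters. */
    public static final class CountingStats implements TextStats {
        private final int limit;
        private int seen;

        public CountingStats(int limit) {
            this.limit = limit;
        }

        @Override
        public void addText(char[] cbuf, int off, int len) {
            seen += len;
        }

        @Override
        public boolean hasEnoughText() {
            return seen >= limit;
        }
    }

    public static void main(String[] args) {
        char[] text = "The quick brown fox jumps over the lazy dog".toCharArray();
        CountingStats stats = new CountingStats(10);
        // Chunks of 4 characters: the limit of 10 is first reached after 12 characters.
        System.out.println(feed(text, stats, 4)); // prints 12
    }
}
```

With a real detector, a small adapter that delegates both methods to the detector instance would presumably let the same loop be reused. Detection can still be requested if the loop runs out of text before hasEnoughText() returns true.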
Overrides: hasEnoughText in class LanguageDetector

Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.