public class TikaLanguageDetector extends LanguageDetector
Because it works only on trigrams, it is not suitable for short texts.
Better-performing language detectors exist. This module remains in the hope that we will get around to improving it, because the approach is elegant and could be improved with fairly little effort.
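To see why a trigram-only approach struggles with short input, consider a minimal sketch that counts overlapping three-character windows. It illustrates the general technique only and is not Tika's implementation; the class and method names (TrigramSketch, trigramCounts) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of trigram counting; not the Tika implementation.
final class TrigramSketch {

    /** Counts every overlapping 3-character window in the text. */
    static Map<String, Integer> trigramCounts(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            // merge() inserts 1 for a new trigram or adds 1 to an existing count
            counts.merge(text.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(trigramCounts("hi"));         // too short for any trigram
        System.out.println(trigramCounts("hello"));      // three distinct trigrams
        System.out.println(trigramCounts("the theory")); // "the" occurs twice
    }
}
```

A two-character string yields no trigrams at all and a five-character word yields only three, which leaves very little evidence to compare against a language model.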
Fields inherited from class LanguageDetector: mixedLanguages, shortText

| Constructor and Description |
|---|
| TikaLanguageDetector() |

| Modifier and Type | Method and Description |
|---|---|
| void | addText(char[] cbuf, int off, int len)<br>Add statistics about this text for the current document. |
| List<LanguageResult> | detectAll()<br>Detect languages based on previously submitted text (via addText calls). |
| boolean | hasModel(String language)<br>Provide information about whether a model exists for a specific language. |
| LanguageDetector | loadModels()<br>Load (or re-load) all available language models. |
| LanguageDetector | loadModels(Set<String> languages)<br>Load (or re-load) the models specified in languages. |
| void | reset()<br>Reset statistics about the current document being processed. |
| LanguageDetector | setPriors(Map<String,Float> languageProbabilities)<br>Not supported. |

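The addText(char[] cbuf, int off, int len) signature matches the buffer convention of java.io.Reader.read, so text can be submitted chunk by chunk as it is read. The following is a minimal sketch of that pattern. TextSink is a hypothetical stand-in that only mirrors the addText signature, so the example runs without Tika on the classpath; the class and helper names are likewise assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch of feeding text in chunks; not part of the Tika API.
final class AddTextFeedSketch {

    /** Stand-in for the detector: mirrors only the addText signature. */
    interface TextSink {
        void addText(char[] cbuf, int off, int len);
    }

    /** Reads the reader in fixed-size chunks and forwards each chunk to the sink. */
    static void feed(Reader reader, TextSink sink, int chunkSize) throws IOException {
        char[] cbuf = new char[chunkSize];
        int n;
        while ((n = reader.read(cbuf, 0, cbuf.length)) != -1) {
            // Only the first n characters are valid; the rest of cbuf is stale.
            sink.addText(cbuf, 0, n);
        }
    }

    /** Convenience helper: collects everything the sink receives into a String. */
    static String collect(String text, int chunkSize) {
        StringBuilder seen = new StringBuilder();
        try {
            feed(new StringReader(text), (buf, off, len) -> seen.append(buf, off, len), chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect("Bonjour le monde", 5));
    }
}
```

With a real detector, the sink would presumably be the detector's own addText method, followed by a call to detectAll() once all text has been submitted.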
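Methods inherited from class LanguageDetector: addText, detect, detect, detectAll, getDefaultLanguageDetector, getLanguageDetectors, getLanguageDetectors, hasEnoughText, isMixedLanguages, isShortText, setMixedLanguages, setShortText

public LanguageDetector loadModels() throws IOException

Description copied from class: LanguageDetector
Load (or re-load) all available language models.
Specified by: loadModels in class LanguageDetector
Throws: IOException

public LanguageDetector loadModels(Set<String> languages) throws IOException

Description copied from class: LanguageDetector
Load (or re-load) the models specified in languages.
Specified by: loadModels in class LanguageDetector
Parameters:
languages - list of target languages.
Throws: IOException

public boolean hasModel(String language)

Description copied from class: LanguageDetector
Provide information about whether a model exists for a specific language.
Specified by: hasModel in class LanguageDetector
Parameters:
language - ISO 639-1 name for language

public LanguageDetector setPriors(Map<String,Float> languageProbabilities) throws IOException

Not supported.
Specified by: setPriors in class LanguageDetector
Parameters:
languageProbabilities - Map from language to probability
Throws: IOException

public void reset()

Description copied from class: LanguageDetector
Reset statistics about the current document being processed.
Specified by: reset in class LanguageDetector

public void addText(char[] cbuf, int off, int len)

Description copied from class: LanguageDetector
Add statistics about this text for the current document.
Specified by: addText in class LanguageDetector
Parameters:
cbuf - Character buffer
off - Offset into cbuf to first character in the run of text
len - Number of characters in the run of text.

public List<LanguageResult> detectAll()

Description copied from class: LanguageDetector
Detect languages based on previously submitted text (via addText calls).
Specified by: detectAll in class LanguageDetector

Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.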