public abstract class LanguageDetector extends Object
Modifier and Type | Field and Description |
---|---|
protected boolean | mixedLanguages |
protected boolean | shortText |
Constructor and Description |
---|
LanguageDetector() |
Modifier and Type | Method and Description |
---|---|
abstract void | addText(char[] cbuf, int off, int len) Add statistics about this text for the current document. |
void | addText(CharSequence text) Add the given text to the statistics for the current document. |
LanguageResult | detect() |
LanguageResult | detect(CharSequence text) |
abstract List<LanguageResult> | detectAll() Detect languages based on previously submitted text (via addText calls). |
List<LanguageResult> | detectAll(String text) Utility wrapper that detects the language of a given chunk of text. |
static LanguageDetector | getDefaultLanguageDetector() |
static List<LanguageDetector> | getLanguageDetectors() |
static List<LanguageDetector> | getLanguageDetectors(ServiceLoader loader) |
boolean | hasEnoughText() Tell the caller whether more text is required for the current document before the language can be reliably detected. |
abstract boolean | hasModel(String language) Provide information about whether a model exists for a specific language. |
boolean | isMixedLanguages() |
boolean | isShortText() |
abstract LanguageDetector | loadModels() Load (or re-load) all available language models. |
abstract LanguageDetector | loadModels(Set<String> languages) Load (or re-load) the models specified in languages. |
abstract void | reset() Reset statistics about the current document being processed. |
LanguageDetector | setMixedLanguages(boolean mixedLanguages) |
abstract LanguageDetector | setPriors(Map<String,Float> languageProbabilities) Set the a-priori probabilities for these languages. |
LanguageDetector | setShortText(boolean shortText) |
public static LanguageDetector getDefaultLanguageDetector()
public static List<LanguageDetector> getLanguageDetectors()
public static List<LanguageDetector> getLanguageDetectors(ServiceLoader loader)
public boolean isMixedLanguages()
public LanguageDetector setMixedLanguages(boolean mixedLanguages)
public boolean isShortText()
public LanguageDetector setShortText(boolean shortText)
public abstract LanguageDetector loadModels() throws IOException
IOException
public abstract LanguageDetector loadModels(Set<String> languages) throws IOException
languages
- list of target languages.
IOException
public abstract boolean hasModel(String language)
language
- ISO 639-1 name for language
public abstract LanguageDetector setPriors(Map<String,Float> languageProbabilities) throws IOException
If hasModel() returns false for any of the languages, an IllegalArgumentException is thrown.
Use of these probabilities is detector-specific, and thus might not impact the results at all. As such, these should be viewed as a hint.
languageProbabilities
- Map from language to probability
IOException
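Because setPriors rejects any language for which hasModel() returns false, it can help to filter the map before the call. The following is a minimal sketch, not part of LanguageDetector: `keepSupported` is a hypothetical helper, the hasModel check is modeled as a `Predicate<String>` so the snippet runs without the library, and the set of model names is a stand-in.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class PriorsCheck {

    /**
     * Hypothetical helper (not part of the documented API): returns a copy of
     * the priors containing only the languages accepted by hasModel.
     */
    public static Map<String, Float> keepSupported(Map<String, Float> priors,
                                                   Predicate<String> hasModel) {
        Map<String, Float> supported = new LinkedHashMap<>();
        for (Map.Entry<String, Float> entry : priors.entrySet()) {
            if (hasModel.test(entry.getKey())) {
                supported.put(entry.getKey(), entry.getValue());
            }
        }
        return supported;
    }

    public static void main(String[] args) {
        // Stand-in for the languages a detector has models for (assumption).
        Set<String> models = new HashSet<>(Arrays.asList("en", "de", "fr"));

        Map<String, Float> priors = new LinkedHashMap<>();
        priors.put("en", 0.6f);
        priors.put("xx", 0.1f); // no model: would trigger IllegalArgumentException
        priors.put("de", 0.3f);

        System.out.println(keepSupported(priors, models::contains));
    }
}
```

With a real detector instance, `detector::hasModel` could be passed as the predicate and the filtered map handed to setPriors. Since the probabilities are only a hint, dropping unsupported entries without renormalizing is one possible choice, not a requirement stated here.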
public abstract void reset()
public abstract void addText(char[] cbuf, int off, int len)
cbuf
- Character buffer
off
- Offset into cbuf to first character in the run of text
len
- Number of characters in the run of text.
public void addText(CharSequence text)
text
- Characters to add to current statistics.
public boolean hasEnoughText()
Implementations can override this to do early termination of stats collection, which can improve performance with longer documents.
Note that detect() can be called even when this returns false.
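The early-termination idea can be sketched as a chunked feed loop that stops reading once the detector reports it has enough text. This is an illustration under assumptions: `TextSink` is a hypothetical interface that mirrors only addText(char[], int, int) and hasEnoughText(), and `CountingSink` is a toy implementation that claims to have enough after a fixed number of characters; neither is part of the documented API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class EarlyStopFeed {

    /** Hypothetical stand-in exposing the two documented methods the loop needs. */
    public interface TextSink {
        void addText(char[] cbuf, int off, int len);
        boolean hasEnoughText();
    }

    /**
     * Feeds the reader to the sink in chunks and stops as soon as the sink
     * reports enough text. Returns the number of characters submitted.
     */
    public static long feed(Reader in, TextSink sink, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        char[] buf = new char[chunkSize];
        long total = 0;
        try {
            int n;
            while (!sink.hasEnoughText() && (n = in.read(buf, 0, buf.length)) != -1) {
                sink.addText(buf, 0, n); // off = 0, len = characters actually read
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Toy sink for the demo: reports "enough" after a fixed character count. */
    public static final class CountingSink implements TextSink {
        private final int threshold;
        private int seen;

        public CountingSink(int threshold) {
            this.threshold = threshold;
        }

        @Override
        public void addText(char[] cbuf, int off, int len) {
            seen += len;
        }

        @Override
        public boolean hasEnoughText() {
            return seen >= threshold;
        }

        public int seen() {
            return seen;
        }
    }

    public static void main(String[] args) {
        Reader in = new StringReader("The quick brown fox jumps over the lazy dog.");
        long sent = feed(in, new CountingSink(10), 4);
        System.out.println("characters submitted: " + sent);
    }
}
```

A real detector could be wrapped in a small adapter implementing the same two methods; after the loop, detect() or detectAll() would be called regardless of whether the threshold was reached.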
public abstract List<LanguageResult> detectAll()
public LanguageResult detect()
public List<LanguageResult> detectAll(String text)
text
- String to add to current statistics.
public LanguageResult detect(CharSequence text)
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.