Class OptimaizeLangDetector
- java.lang.Object
  - org.apache.tika.language.detect.LanguageDetector
    - org.apache.tika.langdetect.optimaize.OptimaizeLangDetector

public class OptimaizeLangDetector extends LanguageDetector

Implementation of the LanguageDetector API that uses https://github.com/optimaize/language-detector

Field Summary

Modifier and Type   Field
static int          DEFAULT_MAX_CHARS_FOR_DETECTION
static int          DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION

Fields inherited from class org.apache.tika.language.detect.LanguageDetector:
mixedLanguages, shortText
 
Constructor Summary

Constructor
OptimaizeLangDetector()
OptimaizeLangDetector(int maxCharsForDetection)

Method Summary

Modifier and Type       Method and Description
void                    addText(char[] cbuf, int off, int len)
                        Add statistics about this text for the current document.
List<LanguageResult>    detectAll()
                        Detect languages based on previously submitted text (via addText calls).
boolean                 hasEnoughText()
                        Tell the caller whether more text is required for the current document before the language can be reliably detected.
boolean                 hasModel(String language)
                        Provide information about whether a model exists for a specific language.
LanguageDetector        loadModels()
                        Load (or re-load) all available language models.
LanguageDetector        loadModels(Set<String> languages)
                        Load (or re-load) the models specified in languages.
void                    reset()
                        Reset statistics about the current document being processed.
LanguageDetector        setPriors(Map<String,Float> languageProbabilities)
                        Set the a-priori probabilities for these languages.

Methods inherited from class org.apache.tika.language.detect.LanguageDetector:
addText, detect, detect, detectAll, getDefaultLanguageDetector, getLanguageDetectors, getLanguageDetectors, isMixedLanguages, isShortText, setMixedLanguages, setShortText
 
Field Detail

DEFAULT_MAX_CHARS_FOR_DETECTION
public static final int DEFAULT_MAX_CHARS_FOR_DETECTION
See Also: Constant Field Values

DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION
public static final int DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION
See Also: Constant Field Values
 
Method Detail

loadModels
public LanguageDetector loadModels()

Description copied from class: LanguageDetector
Load (or re-load) all available language models. This must be called after any settings that would impact the models being loaded (e.g. mixed language/short text), but before any of the document processing routines (below) are called. Note that it only needs to be called once.

Specified by: loadModels in class LanguageDetector
Returns: this
 
loadModels
public LanguageDetector loadModels(Set<String> languages) throws IOException

Description copied from class: LanguageDetector
Load (or re-load) the models specified in languages. These use the ISO 639-1 names, with an optional "-" suffix and region code for more specific specification (e.g. "zh-CN" for Chinese in China).

Specified by: loadModels in class LanguageDetector
Parameters: languages - set of target languages.
Returns: this
Throws: IOException
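
The language names described above have the same shape as the tags that the JDK's java.util.Locale understands, so a tag such as "zh-CN" can be split into its language and region parts with the standard library. This is only an illustrative sketch: the class and method names are hypothetical, and this page does not say how Tika itself parses these names.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tika.
final class LanguageTagSketch {

    /** Returns the language part of a tag such as "zh-CN". */
    static String languagePart(String tag) {
        return Locale.forLanguageTag(tag).getLanguage();
    }

    /** Returns the region part of a tag, or an empty string when the tag has none. */
    static String regionPart(String tag) {
        return Locale.forLanguageTag(tag).getCountry();
    }

    public static void main(String[] args) {
        System.out.println(languagePart("zh-CN") + " / " + regionPart("zh-CN"));
        System.out.println(languagePart("en") + " / " + regionPart("en"));
    }
}
```

A caller could use such a helper to check that every entry has a two-letter language part before building the Set<String> passed to loadModels.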
 
hasModel
public boolean hasModel(String language)

Description copied from class: LanguageDetector
Provide information about whether a model exists for a specific language.

Specified by: hasModel in class LanguageDetector
Parameters: language - ISO 639-1 name for language
Returns: true if a model for this language exists.
 
setPriors
public LanguageDetector setPriors(Map<String,Float> languageProbabilities) throws IOException

Description copied from class: LanguageDetector
Set the a-priori probabilities for these languages. The provided map uses the language as the key, and the probability (0.0 < probability < 1.0) of text being in that language as the value. Note that if the probabilities don't sum to 1.0, these values will be normalized.

If hasModel() returns false for any of the languages, an IllegalArgumentException is thrown.

Use of these probabilities is detector-specific, and thus might not impact the results at all. As such, these should be viewed as a hint.

Specified by: setPriors in class LanguageDetector
Parameters: languageProbabilities - Map from language to probability
Returns: this
Throws: IOException
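
To make the normalization mentioned above concrete, the following sketch rescales a map of priors so that its values sum to 1.0. It is an assumption about what "normalized" means here (dividing each value by the total), not Tika's actual code, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika.
final class PriorsSketch {

    /** Returns a copy of the priors in which every value is divided by the total. */
    static Map<String, Float> normalize(Map<String, Float> priors) {
        float sum = 0f;
        for (float p : priors.values()) {
            if (p <= 0f) {
                throw new IllegalArgumentException("probability must be positive: " + p);
            }
            sum += p;
        }
        Map<String, Float> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : priors.entrySet()) {
            normalized.put(e.getKey(), e.getValue() / sum);
        }
        return normalized;
    }

    public static void main(String[] args) {
        Map<String, Float> priors = new LinkedHashMap<>();
        priors.put("en", 0.6f); // these two values sum to 0.8, not 1.0
        priors.put("de", 0.2f);
        System.out.println(normalize(priors));
    }
}
```

After rescaling, "en" carries about three quarters of the total weight and "de" about one quarter, which preserves the 3:1 ratio of the input.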
 
reset
public void reset()

Description copied from class: LanguageDetector
Reset statistics about the current document being processed.

Specified by: reset in class LanguageDetector
 
addText
public void addText(char[] cbuf, int off, int len)

Description copied from class: LanguageDetector
Add statistics about this text for the current document. Note that we assume an implicit word break exists before/after each of these runs of text.

Specified by: addText in class LanguageDetector
Parameters:
cbuf - Character buffer
off - Offset into cbuf to first character in the run of text
len - Number of characters in the run of text.
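
The cbuf/off/len convention above is the same one used by java.io.Reader and StringBuilder: only the characters cbuf[off] through cbuf[off + len - 1] belong to the run. The sketch below uses a hypothetical collector (not Tika's implementation) that copies each run and models the implicit word break as a single space between runs.

```java
// Hypothetical collector, not part of Tika.
final class TextRunSketch {
    private final StringBuilder text = new StringBuilder();

    /** Appends the run cbuf[off .. off + len) and treats the run boundary as a word break. */
    void addText(char[] cbuf, int off, int len) {
        if (text.length() > 0) {
            text.append(' '); // assumed stand-in for the implicit word break
        }
        text.append(cbuf, off, len);
    }

    /** Returns everything collected so far. */
    String collected() {
        return text.toString();
    }

    public static void main(String[] args) {
        char[] buffer = "xxHello worldxx".toCharArray();
        TextRunSketch sketch = new TextRunSketch();
        sketch.addText(buffer, 2, 5); // the run "Hello"
        sketch.addText(buffer, 8, 5); // the run "world"
        System.out.println(sketch.collected());
    }
}
```

Characters outside the given window (the "xx" padding in this buffer) are never read, which is what lets a caller reuse one buffer across several calls.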
 
detectAll
public List<LanguageResult> detectAll()

Detect languages based on previously submitted text (via addText calls).

Specified by: detectAll in class LanguageDetector
Returns: the detected list of languages
Throws: IllegalStateException - if no models have been loaded with loadModels() or loadModels(java.util.Set)
 
hasEnoughText
public boolean hasEnoughText()

Description copied from class: LanguageDetector
Tell the caller whether more text is required for the current document before the language can be reliably detected.

Implementations can override this to do early termination of stats collection, which can improve performance with longer documents.

Note that detect() can be called even when this returns false.

Overrides: hasEnoughText in class LanguageDetector
Returns: true if we have enough text for reliable detection.
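
The early termination described above can be sketched as a feeding loop that stops once hasEnoughText() returns true. To keep the example free of dependencies, it uses a hypothetical TextSink interface that mirrors the two documented signatures, plus a hypothetical CappedSink whose character cap is an assumption for illustration. This page does not state how OptimaizeLangDetector itself decides that it has enough text.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of Tika.
final class EarlyStopSketch {

    /** Stand-in exposing the two documented method signatures. */
    interface TextSink {
        void addText(char[] cbuf, int off, int len);
        boolean hasEnoughText();
    }

    /** Sink that reports enough text once maxChars characters have arrived. */
    static final class CappedSink implements TextSink {
        private final int maxChars;
        private int seen;

        CappedSink(int maxChars) {
            this.maxChars = maxChars;
        }

        @Override
        public void addText(char[] cbuf, int off, int len) {
            seen += len;
        }

        @Override
        public boolean hasEnoughText() {
            return seen >= maxChars;
        }
    }

    /** Feeds chunks until the sink has enough text; returns the number of chunks used. */
    static int feed(TextSink sink, List<String> chunks) {
        int used = 0;
        for (String chunk : chunks) {
            if (sink.hasEnoughText()) {
                break; // skip the rest of a long document
            }
            char[] cbuf = chunk.toCharArray();
            sink.addText(cbuf, 0, cbuf.length);
            used++;
        }
        return used;
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList("first chunk", "second chunk", "third chunk");
        System.out.println(feed(new CappedSink(15), chunks));
    }
}
```

The same loop shape applies when the sink is a real detector: check hasEnoughText() before each addText call, then request the detection result once the loop ends.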
 
 