OptimaizeLangDetector (Apache Tika 2.4.1 API)

java.lang.Object
- org.apache.tika.language.detect.LanguageDetector
- - org.apache.tika.langdetect.optimaize.OptimaizeLangDetector

```
public class OptimaizeLangDetector
extends LanguageDetector
```
Implementation of the LanguageDetector API that uses the Optimaize language-detector library (https://github.com/optimaize/language-detector).

Field Summary

Fields
Modifier and Type	Field and Description
`static int`	`DEFAULT_MAX_CHARS_FOR_DETECTION`
`static int`	`DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION`

Fields inherited from class org.apache.tika.language.detect.LanguageDetector
mixedLanguages, shortText

Constructor Summary

Constructors
Constructor and Description
`OptimaizeLangDetector()`
`OptimaizeLangDetector(int maxCharsForDetection)`

Method Summary

Modifier and Type	Method and Description
`void`	`addText(char[] cbuf, int off, int len)` Add statistics about this text for the current document.
`List<LanguageResult>`	`detectAll()` Detect languages based on previously submitted text (via addText calls).
`boolean`	`hasEnoughText()` Tell the caller whether more text is required for the current document before the language can be reliably detected.
`boolean`	`hasModel(String language)` Provide information about whether a model exists for a specific language.
`LanguageDetector`	`loadModels()` Load (or re-load) all available language models.
`LanguageDetector`	`loadModels(Set<String> languages)` Load (or re-load) the models specified in `languages`.
`void`	`reset()` Reset statistics about the current document being processed.
`LanguageDetector`	`setPriors(Map<String,Float> languageProbabilities)` Set the a-priori probabilities for these languages.

Methods inherited from class org.apache.tika.language.detect.LanguageDetector
addText, detect, detect, detectAll, getDefaultLanguageDetector, getLanguageDetectors, getLanguageDetectors, isMixedLanguages, isShortText, setMixedLanguages, setShortText

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_MAX_CHARS_FOR_DETECTION
```
public static final int DEFAULT_MAX_CHARS_FOR_DETECTION
```
    See Also:
    
    Constant Field Values
  - DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION
```
public static final int DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - OptimaizeLangDetector
```
public OptimaizeLangDetector()
```
  - OptimaizeLangDetector
```
public OptimaizeLangDetector(int maxCharsForDetection)
```
- Method Detail
  - loadModels
```
public LanguageDetector loadModels()
```
    Description copied from class: LanguageDetector
    
    Load (or re-load) all available language models. This must be called after any settings that affect which models are loaded (e.g. mixed-language or short-text mode), but before any of the document-processing methods (reset(), addText(), detectAll()) are called. It only needs to be called once.
    
    Specified by:
    
    loadModels in class LanguageDetector
    
    Returns:
    
    this
  - loadModels
```
public LanguageDetector loadModels(Set<String> languages)
                            throws IOException
```
    Description copied from class: LanguageDetector
    
    Load (or re-load) the models specified in languages. These use ISO 639-1 names, with an optional "-" suffix for a more specific variant (e.g. "zh-CN" for Chinese in China).
    
    Specified by:
    
    loadModels in class LanguageDetector
    
    Parameters:
    
    languages - set of target languages.
    
    Returns:
    
    this
    
    Throws:
    
    IOException
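    To make the naming convention concrete, here is a minimal sketch of a helper that splits such a name into its ISO 639-1 code and optional qualifier. LangTag is a hypothetical illustration, not part of Tika; it assumes the qualifier is a single alphanumeric token such as "CN", and it does not check that the two-letter code is a registered ISO 639-1 language.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Tika): splits a name such as "en" or
 * "zh-CN" into an ISO 639-1 code and an optional "-" qualifier.
 */
final class LangTag {
    // Two lowercase letters, optionally followed by "-" and an alphanumeric qualifier.
    private static final Pattern TAG = Pattern.compile("([a-z]{2})(?:-([A-Za-z0-9]+))?");

    final String language;  // ISO 639-1 code, e.g. "zh"
    final String qualifier; // e.g. "CN", or null when no "-" part is present

    private LangTag(String language, String qualifier) {
        this.language = language;
        this.qualifier = qualifier;
    }

    /** Parses a name; throws IllegalArgumentException if it does not fit the pattern. */
    static LangTag parse(String name) {
        Matcher m = TAG.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an ISO 639-1 style name: " + name);
        }
        return new LangTag(m.group(1), m.group(2)); // group(2) is null if absent
    }
}
```

    A caller could, for example, check a configured set such as {"en", "zh-CN"} this way before passing it to loadModels(Set).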
  - hasModel
```
public boolean hasModel(String language)
```
    Description copied from class: LanguageDetector
    
    Provide information about whether a model exists for a specific language.
    
    Specified by:
    
    hasModel in class LanguageDetector
    
    Parameters:
    
    language - ISO 639-1 name of the language
    
    Returns:
    
    true if a model for this language exists.
  - setPriors
```
public LanguageDetector setPriors(Map<String,Float> languageProbabilities)
                           throws IOException
```
    Description copied from class: LanguageDetector
    
    Set the a-priori probabilities for these languages. The provided map uses the language as the key and, as the value, the probability (0.0 < probability < 1.0) that the text is in that language. If the probabilities don't sum to 1.0, they will be normalized.
    If hasModel() returns false for any of the languages, an IllegalArgumentException is thrown.
    Use of these probabilities is detector-specific, so they might not affect the results at all; treat them as a hint.
    
    Specified by:
    
    setPriors in class LanguageDetector
    
    Parameters:
    
    languageProbabilities - Map from language to probability
    
    Returns:
    
    this
    
    Throws:
    
    IOException
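    The validation and normalization rules above can be sketched as follows, where each prior p_i becomes p_i / (sum of all p_j). Priors.normalize is a hypothetical helper, not Tika's implementation: the set of languages with models stands in for hasModel(), and the sketch only requires each value to be positive before rescaling, so it also accepts unnormalized weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not Tika's implementation) of the setPriors() contract:
 * every language must have a model, and the values are rescaled to sum to 1.0.
 */
final class Priors {
    static Map<String, Float> normalize(Map<String, Float> priors, Set<String> languagesWithModels) {
        float sum = 0f;
        for (Map.Entry<String, Float> e : priors.entrySet()) {
            // Stand-in for hasModel(): unknown languages are rejected.
            if (!languagesWithModels.contains(e.getKey())) {
                throw new IllegalArgumentException("No model for language: " + e.getKey());
            }
            float p = e.getValue();
            if (!(p > 0f)) { // rejects zero, negative and NaN values
                throw new IllegalArgumentException("Prior must be positive: " + e.getKey());
            }
            sum += p;
        }
        // Divide each weight by the total so the results sum to 1.0.
        Map<String, Float> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : priors.entrySet()) {
            normalized.put(e.getKey(), e.getValue() / sum);
        }
        return normalized;
    }
}
```

    With inputs {en=0.6, fr=0.2}, which sum to 0.8, the sketch returns {en=0.75, fr=0.25}.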
  - reset
```
public void reset()
```
    Description copied from class: LanguageDetector
    
    Reset statistics about the current document being processed.
    
    Specified by:
    
    reset in class LanguageDetector
  - addText
```
public void addText(char[] cbuf,
                    int off,
                    int len)
```
    Description copied from class: LanguageDetector
    
    Add statistics about this text for the current document. Note that we assume an implicit word break exists before/after each of these runs of text.
    
    Specified by:
    
    addText in class LanguageDetector
    
    Parameters:
    
    cbuf - Character buffer
    
    off - Offset into cbuf to first character in the run of text
    
    len - Number of characters in the run of text.
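    The off and len arguments select one run inside cbuf, and each run is treated as if surrounded by word breaks. The sketch below shows one way a collector could honour that assumption by inserting a space between runs; RunCollector is hypothetical and is not taken from Tika's source.

```java
/**
 * Hypothetical sketch (not Tika code): collects runs of text so that
 * consecutive runs are always separated by a word break.
 */
final class RunCollector {
    private final StringBuilder text = new StringBuilder();

    void addText(char[] cbuf, int off, int len) {
        // Implicit word break: never glue this run onto the previous one.
        if (text.length() > 0) {
            text.append(' ');
        }
        // Copy only the len characters starting at cbuf[off].
        text.append(cbuf, off, len);
    }

    String text() {
        return text.toString();
    }
}
```

    Feeding "hello" (off = 2, len = 5 within "xxhelloyy") and then "world" yields "hello world" rather than "helloworld".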
  - detectAll
```
public List<LanguageResult> detectAll()
```
    Detect languages based on previously submitted text (via addText calls).
    
    Specified by:
    
    detectAll in class LanguageDetector
    
    Returns:
    
    the detected list of languages
    
    Throws:
    
    IllegalStateException - if no models have been loaded with loadModels() or loadModels(java.util.Set)
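    The call order documented for loadModels() and the IllegalStateException above can be modelled with a simple state check. DetectorLifecycle is a hypothetical skeleton, not Tika code: it performs no scoring and just returns the loaded language names as placeholder candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical skeleton (not Tika code) of the documented call order:
 * loadModels(...), then per document reset(), addText(...), detectAll().
 * Only the state checks are modelled; no language scoring is done.
 */
final class DetectorLifecycle {
    private final Set<String> models = new LinkedHashSet<>();
    private final StringBuilder current = new StringBuilder();

    DetectorLifecycle loadModels(Set<String> languages) {
        models.clear();
        models.addAll(languages);
        return this; // mirrors "Returns: this", allowing call chaining
    }

    void reset() {
        current.setLength(0); // forget the previous document's text
    }

    void addText(CharSequence text) {
        current.append(text);
    }

    List<String> detectAll() {
        if (models.isEmpty()) {
            throw new IllegalStateException("No models loaded; call loadModels() first");
        }
        if (current.length() == 0) {
            return Collections.emptyList(); // sketch choice: nothing to score yet
        }
        return new ArrayList<>(models); // placeholder: a real detector would rank these
    }
}
```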
  - hasEnoughText
```
public boolean hasEnoughText()
```
    Description copied from class: LanguageDetector
    
    Tell the caller whether more text is required for the current document before the language can be reliably detected.
    Implementations can override this to do early termination of stats collection, which can improve performance with longer documents.
    Note that detect() can be called even when this returns false.
    
    Overrides:
    
    hasEnoughText in class LanguageDetector
    
    Returns:
    
    true if we have enough text for reliable detection.
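    A typical reason to consult hasEnoughText() is to stop reading a long document early. The sketch below feeds a document line by line, so each run ends at a line break and no word is split across addText() calls. TextSink and LineFeeder are hypothetical names; TextSink merely mirrors the addText(char[], int, int) and hasEnoughText() signatures of this class and is not a Tika interface.

```java
import java.io.BufferedReader;
import java.io.IOException;

/** Hypothetical interface mirroring two OptimaizeLangDetector methods; not a Tika type. */
interface TextSink {
    void addText(char[] cbuf, int off, int len);
    boolean hasEnoughText();
}

/** Hypothetical helper: feeds a document line by line until the sink has enough text. */
final class LineFeeder {
    /** Returns the number of characters passed to the sink. */
    static long feed(BufferedReader reader, TextSink sink) throws IOException {
        long total = 0;
        String line;
        // Check hasEnoughText() before each read so long inputs stop early.
        while (!sink.hasEnoughText() && (line = reader.readLine()) != null) {
            char[] chars = line.toCharArray();
            // Each line is one run; line breaks are natural word breaks.
            sink.addText(chars, 0, chars.length);
            total += chars.length;
        }
        return total;
    }
}
```

    To use it with a real detector, wrap an OptimaizeLangDetector in a TextSink that forwards both calls, call reset() before each new document, and call detectAll() (or detect()) once feeding stops.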
