Class TextLangDetector


  • public class TextLangDetector
    extends LanguageDetector
    Language detection using MIT Lincoln Lab’s Text.jl library, served through TextREST.jl (https://github.com/trevorlewis/TextREST.jl).

    Start the TextREST.jl server before using this detector.

    • Constructor Detail

      • TextLangDetector

        public TextLangDetector()
    • Method Detail

      • canRun

        protected static boolean canRun()
      • loadModels

        public LanguageDetector loadModels()
                                    throws IOException
        Description copied from class: LanguageDetector
        Load (or re-load) all available language models. This must be called after any settings that would impact the models being loaded (e.g. mixed language/short text), but before any of the document processing routines (below) are called. Note that it only needs to be called once.
        Specified by:
        loadModels in class LanguageDetector
        Returns:
        this
        Throws:
        IOException
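        The call order described above (configure, then load models, then process documents) can be pictured with a small state machine. This is a hypothetical sketch, not Tika's implementation: the class LifecycleSketch, its setShortText() setting and the isReady() helper are illustrative names only. It assumes that changing a model-affecting setting marks the models as stale until loadModels() runs again, which matches the "load (or re-load)" wording.

```java
import java.io.IOException;

// Hypothetical sketch of the call order described for loadModels(); not Tika's code.
class LifecycleSketch {
    private boolean shortText = false;      // stand-in for a model-affecting setting
    private boolean modelsCurrent = false;  // true once models match the settings

    LifecycleSketch setShortText(boolean shortText) {
        this.shortText = shortText;
        modelsCurrent = false;  // the setting changed, so models must be (re-)loaded
        return this;
    }

    LifecycleSketch loadModels() throws IOException {
        // A real detector would read model files or contact a server here,
        // choosing models according to settings such as shortText.
        modelsCurrent = true;
        return this;  // mirrors the documented "Returns: this"
    }

    void addText(String text) {
        if (!modelsCurrent) {
            throw new IllegalStateException("call loadModels() after changing settings");
        }
        // ... accumulate statistics for the current document ...
    }

    boolean isReady() {
        return modelsCurrent;
    }
}
```

        In this sketch, calling addText() before loadModels(), or after a setting change without a reload, fails fast instead of silently using mismatched models.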
      • hasModel

        public boolean hasModel(String language)
        Description copied from class: LanguageDetector
        Provide information about whether a model exists for a specific language.
        Specified by:
        hasModel in class LanguageDetector
        Parameters:
        language - ISO 639-1 name for language
        Returns:
        true if a model for this language exists.
      • setPriors

        public LanguageDetector setPriors(Map<String,Float> languageProbabilities)
                                   throws IOException
        Description copied from class: LanguageDetector
        Set the a priori probabilities for these languages. The provided map uses the language as the key and, as the value, the probability (0.0 < probability < 1.0) that the text is in that language. If the probabilities don't sum to 1.0, they will be normalized.

        If hasModel() returns false for any of the languages, an IllegalArgumentException is thrown.

        How these probabilities are used is detector-specific, and they might not affect the results at all. Treat them as a hint.

        Specified by:
        setPriors in class LanguageDetector
        Parameters:
        languageProbabilities - Map from language to probability
        Returns:
        this
        Throws:
        IOException
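        The validation and normalization rules above can be sketched as follows. This is a hypothetical illustration, not the detector's actual code: PriorsSketch, its constructor taking the set of available models, and getPriors() are invented for the example. Rejecting a map whose values sum to zero is an extra assumption of the sketch, since normalization is undefined in that case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the prior handling described for setPriors():
// reject languages without a model, then normalize so the values sum to 1.0.
class PriorsSketch {
    private final Set<String> availableModels;  // ISO 639-1 codes with a model
    private Map<String, Float> priors = new HashMap<>();

    PriorsSketch(Set<String> availableModels) {
        this.availableModels = availableModels;
    }

    boolean hasModel(String language) {
        return availableModels.contains(language);
    }

    PriorsSketch setPriors(Map<String, Float> languageProbabilities) {
        double sum = 0.0;
        for (Map.Entry<String, Float> e : languageProbabilities.entrySet()) {
            if (!hasModel(e.getKey())) {
                throw new IllegalArgumentException("No model for language: " + e.getKey());
            }
            sum += e.getValue();
        }
        if (sum <= 0.0) {
            // Assumption of this sketch: a zero total cannot be normalized.
            throw new IllegalArgumentException("Probabilities must sum to a positive value");
        }
        Map<String, Float> normalized = new HashMap<>();
        for (Map.Entry<String, Float> e : languageProbabilities.entrySet()) {
            normalized.put(e.getKey(), (float) (e.getValue() / sum));
        }
        priors = normalized;
        return this;  // mirrors the documented "Returns: this"
    }

    Map<String, Float> getPriors() {
        return priors;
    }
}
```

        For example, priors of 0.6 for "en" and 0.2 for "fr" sum to 0.8 and are normalized to 0.75 and 0.25.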
      • reset

        public void reset()
        Description copied from class: LanguageDetector
        Reset statistics about the current document being processed.
        Specified by:
        reset in class LanguageDetector
      • addText

        public void addText(char[] cbuf,
                            int off,
                            int len)
        Description copied from class: LanguageDetector
        Add statistics about this text for the current document. Note that an implicit word break is assumed before and after each run of text.
        Specified by:
        addText in class LanguageDetector
        Parameters:
        cbuf - Character buffer
        off - Offset into cbuf to first character in the run of text
        len - Number of characters in the run of text.
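        The reset()/addText() contract can be illustrated with a buffer that collects each run and inserts a space between runs, standing in for the implicit word break. This is a minimal sketch assuming the detector simply buffers text until detection; TextAccumulator and currentText() are hypothetical names, and a real detector may keep statistics rather than raw text. StringBuilder.append(char[], int, int) throws IndexOutOfBoundsException for an invalid off/len pair.

```java
// Hypothetical sketch of per-document accumulation: each addText() run is
// treated as if a word break surrounds it, so runs are joined with a space.
class TextAccumulator {
    private final StringBuilder text = new StringBuilder();

    void reset() {
        text.setLength(0);  // discard everything gathered for the previous document
    }

    void addText(char[] cbuf, int off, int len) {
        if (len == 0) {
            return;  // an empty run adds nothing
        }
        if (text.length() > 0) {
            text.append(' ');  // implicit word break between runs
        }
        text.append(cbuf, off, len);  // copies cbuf[off] .. cbuf[off + len - 1]
    }

    String currentText() {
        return text.toString();
    }
}
```

        Because of the implicit break, two runs "hello" and "world" are never fused into the single token "helloworld".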
      • detectAll

        public List<LanguageResult> detectAll()
        Description copied from class: LanguageDetector
        Detect languages based on previously submitted text (via addText calls).
        Specified by:
        detectAll in class LanguageDetector
        Returns:
        list of all possible languages with at least medium confidence, sorted by confidence from highest to lowest. There will always be at least one result, which might have a confidence of NONE.
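        The return-value rules above (keep at least medium confidence, sort best first, never return an empty list) can be sketched with stand-in types. Confidence, Result and DetectAllSketch are hypothetical and do not reproduce Tika's LanguageResult. Breaking ties by a raw score, and falling back to the single best candidate (or a NONE placeholder) when nothing reaches MEDIUM, are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical confidence levels, ordered from strongest to weakest.
enum Confidence { STRONG, MEDIUM, WEAK, NONE }

// Hypothetical stand-in for a detection result.
final class Result {
    final String language;
    final Confidence confidence;
    final float rawScore;

    Result(String language, Confidence confidence, float rawScore) {
        this.language = language;
        this.confidence = confidence;
        this.rawScore = rawScore;
    }
}

class DetectAllSketch {
    static List<Result> detectAll(List<Result> candidates) {
        List<Result> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> {
            int byConfidence = a.confidence.compareTo(b.confidence);  // STRONG sorts first
            return byConfidence != 0 ? byConfidence : Float.compare(b.rawScore, a.rawScore);
        });
        List<Result> kept = new ArrayList<>();
        for (Result r : sorted) {
            if (r.confidence.compareTo(Confidence.MEDIUM) <= 0) {  // STRONG or MEDIUM
                kept.add(r);
            }
        }
        if (kept.isEmpty()) {
            // Assumption: return the best candidate, or a NONE placeholder if there is none.
            kept.add(sorted.isEmpty() ? new Result("", Confidence.NONE, 0.0f) : sorted.get(0));
        }
        return kept;
    }
}
```

        Callers can therefore always read the first element safely, but should check its confidence before trusting the language.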