Package org.apache.tika.language.detect
Class LanguageHandler
java.lang.Object
  org.xml.sax.helpers.DefaultHandler
    org.apache.tika.sax.ContentHandlerDecorator
      org.apache.tika.sax.WriteOutContentHandler
        org.apache.tika.language.detect.LanguageHandler
All Implemented Interfaces: ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class LanguageHandler extends WriteOutContentHandler
SAX content handler that updates a language detector based on all the received character content.
Since: Apache Tika 0.10
-
Constructor Summary
Constructors
LanguageHandler()
LanguageHandler(LanguageDetector detector)
LanguageHandler(LanguageWriter writer)
-
Method Summary
Modifier and Type   Method          Description
LanguageDetector    getDetector()   Returns the language detector used by this content handler.
LanguageResult      getLanguage()   Returns the detected language based on text handled thus far.
Methods inherited from class org.apache.tika.sax.WriteOutContentHandler
characters, ignorableWhitespace
-
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endDocument, endElement, endPrefixMapping, error, fatalError, handleException, processingInstruction, setContentHandler, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, toString, warning
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
notationDecl, resolveEntity, unparsedEntityDecl
-
Constructor Detail
-
LanguageHandler
public LanguageHandler() throws IOException
Throws: IOException
-
LanguageHandler
public LanguageHandler(LanguageWriter writer)
-
LanguageHandler
public LanguageHandler(LanguageDetector detector)
-
Method Detail
-
getDetector
public LanguageDetector getDetector()
Returns the language detector used by this content handler. Note that the returned detector is updated whenever this content handler receives new SAX events.
Returns: language detector
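The note on getDetector() means that a caller holds a live reference, not a snapshot: text that arrives after the call is still visible through the detector obtained earlier. The following is a minimal sketch of that pattern using only the JDK's SAX classes. SketchDetector, SketchLanguageHandler, and SketchDemo are hypothetical names invented for illustration; this is not Tika's implementation, and the stand-in detector only records text rather than detecting a language.

```java
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a language detector; it only records the text it is given. */
class SketchDetector {
    private final StringBuilder seen = new StringBuilder();

    void addText(char[] ch, int start, int length) {
        seen.append(ch, start, length);
    }

    String textSoFar() {
        return seen.toString();
    }
}

/** Hypothetical handler that forwards every characters() event to its detector. */
class SketchLanguageHandler extends DefaultHandler {
    private final SketchDetector detector;

    SketchLanguageHandler(SketchDetector detector) {
        this.detector = detector;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Each SAX character event updates the shared detector.
        detector.addText(ch, start, length);
    }

    SketchDetector getDetector() {
        return detector;
    }
}

class SketchDemo {
    public static void main(String[] args) {
        SketchLanguageHandler handler = new SketchLanguageHandler(new SketchDetector());

        // Obtain the detector before any events have been received.
        SketchDetector detector = handler.getDetector();

        char[] first = "Hello, ".toCharArray();
        handler.characters(first, 0, first.length);
        char[] second = "world".toCharArray();
        handler.characters(second, 0, second.length);

        // The reference obtained earlier reflects both events.
        System.out.println(detector.textSoFar()); // prints "Hello, world"
    }
}
```

With the real class, the same reasoning suggests querying getLanguage() only after the parse has delivered the content of interest, since the result covers only the text handled up to that point.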
-
getLanguage
public LanguageResult getLanguage()
Returns the detected language based on text handled thus far.
Returns: LanguageResult