java.lang.Object
  org.apache.tika.parser.txt.CharsetMatch

public class CharsetMatch
extends java.lang.Object
implements java.lang.Comparable<CharsetMatch>
This class represents a charset that a CharsetDetector has identified as a possible encoding for a set of input data. From an instance of this class, you can ask for a confidence level in the charset identification, or for a Java Reader or String that gives access to the original byte data in Unicode form.
Instances of this class are created only by CharsetDetectors. Note: this class has a natural ordering that is inconsistent with equals. The natural ordering is based on the match confidence value.
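The note about natural ordering matters when matches are kept in sorted collections. The sketch below uses a hypothetical stand-in class `Match` (not part of Tika) that, like CharsetMatch, orders by confidence alone while inheriting identity-based `equals`. It shows a `TreeSet` silently dropping a second match with the same confidence, while a `List` keeps both. The charset names and confidence values are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class OrderingPitfall {
    // Hypothetical stand-in for CharsetMatch: compareTo() looks only at
    // confidence, while equals() is Object's identity comparison.
    static final class Match implements Comparable<Match> {
        final String name;
        final int confidence;

        Match(String name, int confidence) {
            this.name = name;
            this.confidence = confidence;
        }

        @Override
        public int compareTo(Match other) {
            return Integer.compare(confidence, other.confidence);
        }
    }

    public static void main(String[] args) {
        // Illustrative confidence values.
        Match latin1 = new Match("ISO-8859-1", 40);
        Match cp1252 = new Match("windows-1252", 40);

        // TreeSet treats compareTo() == 0 as a duplicate, so the second
        // match is dropped even though the two objects are not equal().
        TreeSet<Match> set = new TreeSet<>();
        set.add(latin1);
        set.add(cp1252);
        System.out.println(set.size()); // 1

        // A List keeps every match; sort it with the highest confidence first.
        List<Match> list = new ArrayList<>(List.of(latin1, cp1252, new Match("UTF-8", 80)));
        list.sort(Comparator.reverseOrder());
        System.out.println(list.get(0).name); // UTF-8
    }
}
```

When several candidate matches must be retained, a `List` or an array sorted in descending order is the safer container.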
Field Summary

| Type | Field and Description |
|---|---|
| static int | BOM: Bit flag indicating the match is based on the presence of a byte order mark (BOM). |
| static int | DECLARED_ENCODING: Bit flag indicating the match is based on the declared encoding. |
| static int | ENCODING_SCHEME: Bit flag indicating the match is based on the encoding scheme. |
| static int | LANG_STATISTICS: Bit flag indicating the match is based on language statistics. |
Method Summary

| Type | Method and Description |
|---|---|
| int | compareTo(CharsetMatch other): Compare this CharsetMatch to another by match confidence. |
| int | getConfidence(): Get an indication of the confidence in the charset detected. |
| java.lang.String | getLanguage(): Get the ISO code for the language of the detected charset. |
| int | getMatchType(): Return flags indicating what it was about the input data that caused this charset to be considered as a possible match. |
| java.lang.String | getName(): Get the name of the detected charset. |
| java.io.Reader | getReader(): Create a java.io.Reader for reading the Unicode character data corresponding to the original byte data supplied to the charset detection operation. |
| java.lang.String | getString(): Create a Java String from the Unicode character data corresponding to the original byte data supplied to the charset detection operation. |
| java.lang.String | getString(int maxLength): Create a Java String from the Unicode character data corresponding to the original byte data supplied to the charset detection operation, limiting its length to maxLength when the data came from an input stream. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
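Field Detail

public static final int ENCODING_SCHEME
Bit flag indicating the match is based on the encoding scheme.
See Also: getMatchType(), Constant Field Values

public static final int BOM
Bit flag indicating the match is based on the presence of a byte order mark (BOM).
See Also: getMatchType(), Constant Field Values

public static final int DECLARED_ENCODING
Bit flag indicating the match is based on the declared encoding.
See Also: getMatchType(), Constant Field Values

public static final int LANG_STATISTICS
Bit flag indicating the match is based on language statistics.
See Also: getMatchType(), Constant Field Values

Method Detail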
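public java.io.Reader getReader()
Create a java.io.Reader for reading the Unicode character data corresponding to the original byte data supplied to the charset detection operation.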
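A Reader obtained from getReader() is typically read to the end and then closed. The helper below is a plain-JDK sketch; the class and method names `ReadAllChars.readAll` are hypothetical, and a `StringReader` stands in for the Reader a CharsetMatch would supply.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadAllChars {
    // Drains a Reader (for example, one returned by CharsetMatch.getReader())
    // into a String, closing the Reader when done.
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for the Reader a CharsetMatch would return.
        System.out.println(readAll(new StringReader("hello, world"))); // hello, world
    }
}
```

Reading through a fixed buffer avoids loading the data one character at a time; for small inputs, getString() may be the simpler choice.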
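public java.lang.String getString() throws java.io.IOException
Create a Java String from the Unicode character data corresponding to the original byte data supplied to the charset detection operation.
Throws: java.io.IOException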
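public java.lang.String getString(int maxLength) throws java.io.IOException
Create a Java String from the Unicode character data corresponding to the original byte data supplied to the charset detection operation.
Parameters: maxLength - the maximum length of the String to be created when the source of the data is an input stream, or -1 for unlimited length.
Throws: java.io.IOException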
public int getConfidence()
Get an indication of the confidence in the charset detected.
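public int getMatchType()
Return flags indicating what it was about the input data that caused this charset to be considered as a possible match. The flags are the bit constants BOM, DECLARED_ENCODING, ENCODING_SCHEME and LANG_STATISTICS.
Note: currently, this method always returns zero.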
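Because the match type is a bit mask, each flag is tested with a bitwise AND. This is a minimal sketch with a hypothetical helper `hasFlag`; the numeric flag values in `main` are made up for illustration, and real code should use the CharsetMatch constants instead. Since getMatchType() is documented as currently always returning zero, every such check would presently return false.

```java
public class MatchTypeFlags {
    // Tests whether a bit flag is set in a match-type mask. In real code the
    // mask would come from CharsetMatch.getMatchType() and the flag would be
    // one of the CharsetMatch constants (BOM, DECLARED_ENCODING,
    // ENCODING_SCHEME, LANG_STATISTICS).
    static boolean hasFlag(int matchType, int flag) {
        return (matchType & flag) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical flag values for illustration only.
        int firstFlag = 0x1;
        int secondFlag = 0x2;
        int mask = firstFlag | secondFlag;
        System.out.println(hasFlag(mask, firstFlag)); // true
        // A zero mask, as getMatchType() currently returns, has no flags set.
        System.out.println(hasFlag(0, firstFlag));    // false
    }
}
```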
public java.lang.String getName()
Get the name of the detected charset.
See Also: Charset, InputStreamReader
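The charset name can be handed to the standard java.nio.charset.Charset and java.io.InputStreamReader APIs to decode the original bytes yourself, for example when a Reader is needed for a charset other than the one getReader() uses. The sketch below is plain JDK code; the class and method names are hypothetical, and the fallback policy (use a default charset when the name is missing, illegal, or unsupported by the running JVM) is an assumption, not Tika behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeByName {
    // Resolves a charset name (e.g. from CharsetMatch.getName()) to a Charset,
    // falling back when the name is null, syntactically illegal, or not
    // supported by this JVM.
    static Charset resolve(String name, Charset fallback) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : fallback;
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and a null name.
            return fallback;
        }
    }

    // Decodes the original bytes with the resolved charset. For streaming
    // input, new InputStreamReader(in, charset) accepts the same Charset.
    static String decode(byte[] data, String name, Charset fallback) {
        return new String(data, resolve(name, fallback));
    }

    public static void main(String[] args) {
        // "caf\u00e9" encoded in ISO-8859-1.
        byte[] latin1 = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        String text = decode(latin1, "ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println(text.equals("caf\u00e9")); // true
    }
}
```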
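public java.lang.String getLanguage()
Get the ISO code for the language of the detected charset.
Returns: the language code, or null if the language cannot be determined.

public int compareTo(CharsetMatch other)
Compare this CharsetMatch to another by match confidence.
Specified by: compareTo in interface java.lang.Comparable<CharsetMatch>
Parameters: other - the CharsetMatch object to compare against.
Throws: java.lang.ClassCastException - if the argument is not a CharsetMatch.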
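Because compareTo() considers only confidence, an explicit Comparator with a tie-breaker gives a total order that is safe for sorted sets and produces deterministic output. This sketch uses a hypothetical stand-in class exposing getConfidence() and getName() like CharsetMatch; with the real class, the same chain would presumably be written with CharsetMatch::getConfidence and CharsetMatch::getName, which is untested here.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class TieBreakOrdering {
    // Hypothetical stand-in exposing the same getters as CharsetMatch.
    static final class Match {
        private final String name;
        private final int confidence;

        Match(String name, int confidence) {
            this.name = name;
            this.confidence = confidence;
        }

        String getName() { return name; }
        int getConfidence() { return confidence; }
    }

    // Highest confidence first; ties broken by charset name so that distinct
    // charsets with equal confidence are both kept in a sorted set.
    static final Comparator<Match> BEST_FIRST =
            Comparator.comparingInt(Match::getConfidence).reversed()
                      .thenComparing(Match::getName);

    public static void main(String[] args) {
        TreeSet<Match> set = new TreeSet<>(BEST_FIRST);
        set.add(new Match("ISO-8859-1", 40));
        set.add(new Match("windows-1252", 40));
        set.add(new Match("UTF-8", 80));
        System.out.println(set.size());            // 3
        System.out.println(set.first().getName()); // UTF-8
    }
}
```

Two matches with the same name and the same confidence would still compare as equal under this Comparator, which is usually the desired de-duplication.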