public class TextStatistics extends Object
Constructor and Description |
---|
TextStatistics() |
Modifier and Type | Method | Description |
---|---|---|
void | addData(byte[] buffer, int offset, int length) | |
int | count() | Returns the total number of bytes seen so far. |
int | count(int b) | Returns the number of occurrences of the given byte. |
int | countControl() | Counts control characters (i.e. ... |
int | countEightBit() | Counts eight bit characters, i.e. ... |
int | countSafeAscii() | Counts "safe" (i.e. ... |
boolean | isMostlyAscii() | Checks whether at least one byte was seen and that the bytes that were seen were mostly plain text (i.e. ... |
boolean | looksLikeUTF8() | Checks whether the observed byte stream looks like UTF-8 encoded text. |
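The counting methods summarized above can be pictured as a 256-slot histogram that addData fills in. The following is a minimal sketch under that assumption; ByteHistogram is a hypothetical stand-in written for illustration, not the TextStatistics source, and its countEightBit assumes that "eight bit" means bytes with the high bit set (0x80 to 0xFF).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not the TextStatistics implementation.
class ByteHistogram {
    private final int[] counts = new int[256]; // one slot per unsigned byte value
    private int total = 0;

    /** Adds length bytes of buffer, starting at offset, to the histogram. */
    void addData(byte[] buffer, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            counts[buffer[i] & 0xff]++; // mask maps the signed byte to 0..255
            total++;
        }
    }

    /** Total number of bytes added so far. */
    int count() {
        return total;
    }

    /** Number of occurrences of the given byte value. */
    int count(int b) {
        return counts[b & 0xff];
    }

    /** Assumption: "eight bit" means the high bit is set (0x80..0xFF). */
    int countEightBit() {
        int n = 0;
        for (int b = 0x80; b < 0x100; b++) {
            n += counts[b];
        }
        return n;
    }

    public static void main(String[] args) {
        ByteHistogram h = new ByteHistogram();
        // "abca" plus e-acute, encoded as five single bytes in ISO-8859-1
        byte[] data = "abca\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        h.addData(data, 0, data.length);
        System.out.println(h.count());         // prints 5
        System.out.println(h.count('a'));      // prints 2
        System.out.println(h.countEightBit()); // prints 1
    }
}
```

Because the histogram only accumulates, addData can be called repeatedly on successive chunks of a stream before any of the counts are read.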
public void addData(byte[] buffer, int offset, int length)
public boolean isMostlyAscii()
public boolean looksLikeUTF8()
Returns: true if the seen bytes look like UTF-8, false otherwise
public int count()
public int count(int b)
Parameters: b - byte
public int countControl()
This definition of control characters is based on section 4 of the "Content-Type Processing Model" Internet-draft (draft-abarth-mime-sniff-01).
+-------------------------+
| Binary data byte ranges |
+-------------------------+
| 0x00 -- 0x08            |
| 0x0B                    |
| 0x0E -- 0x1A            |
| 0x1C -- 0x1F            |
+-------------------------+
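The byte ranges in the table above translate directly into a range test. The sketch below is only an illustration of that table; ControlByteCounter and its methods are hypothetical names, and the code counts over a raw array rather than over accumulated statistics.

```java
// Hypothetical helper for illustration; encodes only the byte ranges listed in the table.
class ControlByteCounter {

    /** Returns true if b (0..255) falls in one of the binary data byte ranges. */
    static boolean isControl(int b) {
        return (b >= 0x00 && b <= 0x08)
                || b == 0x0B
                || (b >= 0x0E && b <= 0x1A)
                || (b >= 0x1C && b <= 0x1F);
    }

    /** Counts the bytes of data that fall in the binary data byte ranges. */
    static int countControl(byte[] data) {
        int n = 0;
        for (byte value : data) {
            if (isControl(value & 0xff)) { // mask maps the signed byte to 0..255
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // NUL, tab, LF, VT, CR, ESC, US, 'A': only NUL, VT and US are in the table
        byte[] sample = {0x00, 0x09, 0x0A, 0x0B, 0x0D, 0x1B, 0x1F, 'A'};
        System.out.println(countControl(sample)); // prints 3
    }
}
```

Note that the ranges skip 0x09, 0x0A, 0x0C, 0x0D and 0x1B, so tab, line feed, form feed, carriage return and escape are not counted.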
public int countSafeAscii()
See Also: countControl()
public int countEightBit()
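Eight-bit bytes are where the UTF-8 question arises, since every byte of a multi-byte UTF-8 sequence has its high bit set. For comparison with looksLikeUTF8(), the sketch below shows a different technique: strict validation with the JDK's CharsetDecoder, which needs the bytes themselves rather than accumulated counts. StrictUtf8Check is a hypothetical name, and nothing here describes how looksLikeUTF8() is implemented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; a strict decoder check, not the heuristic of looksLikeUTF8().
class StrictUtf8Check {

    /** Returns true if data decodes as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of replacing
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);        // 0xC3 0xA9
        byte[] latin1 = "\u00e9".getBytes(StandardCharsets.ISO_8859_1); // 0xE9
        System.out.println(isValidUtf8(utf8));   // prints true
        System.out.println(isValidUtf8(latin1)); // prints false
    }
}
```

A strict check like this rejects input on the first malformed sequence and accepts empty input, so it answers a narrower question than a "looks like" test over byte statistics.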
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.