java.lang.Object
  org.xml.sax.helpers.DefaultHandler
    org.apache.tika.sax.ContentHandlerDecorator
      org.apache.tika.sax.SafeContentHandler
public class SafeContentHandler
extends ContentHandlerDecorator
Content handler decorator that makes sure that the character events (characters(char[], int, int) or ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters. All invalid characters are replaced with spaces.
The XML standard defines the following Unicode character ranges as valid XML characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Note that this class currently detects only those invalid characters whose UTF-16 representation fits in a single char (that is, characters in the Basic Multilingual Plane). It also does not ensure that the UTF-16 encoding of incoming characters is correct.
Nested Class Summary | |
---|---|
protected static interface | SafeContentHandler.Output: Internal interface that allows both character and ignorable whitespace content to be filtered the same way. |
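The Output interface lets one filtering loop serve both character events. The sketch below illustrates that idea using only the JDK's SAX classes; it is not Tika's source, and FilteringHandler, its Output interface, and filter() are hypothetical names. As the class description states, each invalid character is replaced with a single space, while runs of valid characters are forwarded unchanged.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical sketch (not Tika code): one filter loop shared by
 * characters() and ignorableWhitespace(), with the destination passed
 * in as a small Output callback.
 */
class FilteringHandler extends DefaultHandler {

    /** Destination for filtered text; each SAX event supplies its own. */
    interface Output {
        void write(char[] ch, int start, int length) throws SAXException;
    }

    private static final char[] SPACE = {' '};

    private final ContentHandler delegate;

    FilteringHandler(ContentHandler delegate) {
        this.delegate = delegate;
    }

    /** True if the code point is outside the XML 1.0 Char production. */
    static boolean isInvalid(int cp) {
        return !(cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF));
    }

    private void filter(char[] ch, int start, int length, Output out)
            throws SAXException {
        int end = start + length;
        int runStart = start; // first char of the current run of valid text
        int i = start;
        while (i < end) {
            // Decode a surrogate pair into one code point where possible.
            int cp = Character.codePointAt(ch, i, end);
            int next = i + Character.charCount(cp);
            if (isInvalid(cp)) {
                if (i > runStart) {
                    out.write(ch, runStart, i - runStart); // flush valid run
                }
                out.write(SPACE, 0, 1); // replacement for the invalid char
                runStart = next;
            }
            i = next;
        }
        if (end > runStart) {
            out.write(ch, runStart, end - runStart); // trailing valid run
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        filter(ch, start, length, delegate::characters);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        filter(ch, start, length, delegate::ignorableWhitespace);
    }

    public static void main(String[] args) throws SAXException {
        StringBuilder seen = new StringBuilder();
        ContentHandler sink = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };
        char[] text = "a\u0001b".toCharArray();
        new FilteringHandler(sink).characters(text, 0, text.length);
        System.out.println("[" + seen + "]"); // prints [a b]
    }
}
```

Passing a method reference such as delegate::characters as the Output keeps the scanning logic in one place, so the two event types cannot drift apart in how they treat invalid characters.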
Constructor Summary | |
---|---|
SafeContentHandler(org.xml.sax.ContentHandler handler) |
Method Summary | |
---|---|
void | characters(char[] ch, int start, int length) |
void | endDocument() |
void | endElement(java.lang.String uri, java.lang.String localName, java.lang.String name) |
void | ignorableWhitespace(char[] ch, int start, int length) |
protected boolean | isInvalid(char ch): Deprecated. Use isInvalid(int) instead. |
protected boolean | isInvalid(int ch): Checks whether the given Unicode character is an invalid XML character and should be replaced for output. |
void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String name, org.xml.sax.Attributes atts) |
protected void | writeReplacement(SafeContentHandler.Output output): Outputs the replacement for an invalid character. |
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator |
---|
endPrefixMapping, handleException, processingInstruction, setContentHandler, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, toString |
Methods inherited from class org.xml.sax.helpers.DefaultHandler |
---|
error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public SafeContentHandler(org.xml.sax.ContentHandler handler)
Method Detail |
---|
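protected boolean isInvalid(int ch)
Checks whether the given Unicode character is an invalid XML character and should be replaced for output. Valid XML characters are defined as:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Parameters:
ch - the character (Unicode code point) to check
Returns:
true if the character should be replaced, false otherwise
protected boolean isInvalid(char ch)
Deprecated. Use isInvalid(int) instead.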
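The difference between the two overloads matters for characters outside the Basic Multilingual Plane, which UTF-16 stores as two surrogate chars. The sketch below is illustrative only, not Tika's implementation, and CodePointDemo and its methods are hypothetical names. It encodes the Char production quoted above, then contrasts naively applying it to each UTF-16 char (surrogate halves fall in the excluded #xD800-#xDFFF gap) with decoding surrogate pairs into code points first, which is what an int-based check allows.

```java
/**
 * Hypothetical sketch (not Tika code): per-char versus per-code-point
 * checks against the XML 1.0 Char production.
 */
class CodePointDemo {

    /** True if cp lies inside one of the valid XML Char ranges. */
    static boolean isValid(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Checks each UTF-16 char on its own: surrogate halves look invalid. */
    static int countInvalidPerChar(String s) {
        int invalid = 0;
        for (int i = 0; i < s.length(); i++) {
            if (!isValid(s.charAt(i))) {
                invalid++;
            }
        }
        return invalid;
    }

    /** Decodes surrogate pairs first, so a supplementary char is one unit. */
    static int countInvalidPerCodePoint(String s) {
        int invalid = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValid(cp)) {
                invalid++;
            }
            i += Character.charCount(cp);
        }
        return invalid;
    }

    public static void main(String[] args) {
        // "a", U+1F600 (a surrogate pair in UTF-16), "b", and U+0000.
        String text = "a\uD83D\uDE00b\u0000";
        System.out.println(countInvalidPerChar(text));      // 3
        System.out.println(countInvalidPerCodePoint(text)); // 1
    }
}
```

Only the NUL character is genuinely invalid here; the per-char count also flags both halves of the valid supplementary character.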
protected void writeReplacement(SafeContentHandler.Output output) throws org.xml.sax.SAXException
Outputs the replacement for an invalid character.
Parameters:
output - where the replacement is written to
Throws:
org.xml.sax.SAXException - if the replacement could not be written
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String name, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class ContentHandlerDecorator
Throws:
org.xml.sax.SAXException
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String name) throws org.xml.sax.SAXException
Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class ContentHandlerDecorator
Throws:
org.xml.sax.SAXException
public void endDocument() throws org.xml.sax.SAXException
Specified by:
endDocument in interface org.xml.sax.ContentHandler
Overrides:
endDocument in class ContentHandlerDecorator
Throws:
org.xml.sax.SAXException
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class ContentHandlerDecorator
Throws:
org.xml.sax.SAXException
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException
Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
Overrides:
ignorableWhitespace in class ContentHandlerDecorator
Throws:
org.xml.sax.SAXException