Package org.apache.tika.sax
Class SafeContentHandler
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.apache.tika.sax.ContentHandlerDecorator
org.apache.tika.sax.SafeContentHandler
- All Implemented Interfaces:
- ContentHandler, DTDHandler, EntityResolver, ErrorHandler
- Direct Known Subclasses:
- XHTMLContentHandler, XMPContentHandler
Content handler decorator that makes sure that the character events (characters(char[], int, int) or ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters. All invalid characters are replaced with the Unicode replacement character U+FFFD (though a subclass may change this by overriding the writeReplacement(Output) method).
 The XML standard defines the following Unicode character ranges as valid XML characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Note that currently this class only detects those invalid characters whose UTF-16 representation fits a single char. Also, this class does not ensure that the UTF-16 encoding of incoming characters is correct.
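Because the handler does not verify the UTF-16 encoding of its input, a caller that cannot trust its source may want to check for unpaired surrogates before passing text along. The following is a minimal standalone sketch of such a check; the class name Utf16Check and the method isWellFormed are hypothetical and are not part of Tika.

```java
final class Utf16Check {
    /**
     * Returns true if every surrogate in ch[start, start + length) belongs to
     * a high/low pair. Hypothetical helper, not a Tika API.
     */
    static boolean isWellFormed(char[] ch, int start, int length) {
        int end = start + length;
        for (int i = start; i < end; i++) {
            char c = ch[i];
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate inside the range.
                if (i + 1 >= end || !Character.isLowSurrogate(ch[i + 1])) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1F600 lies above U+FFFF, so its UTF-16 form needs two chars.
        char[] pair = Character.toChars(0x1F600);
        System.out.println(pair.length);                        // 2
        System.out.println(isWellFormed(pair, 0, pair.length)); // true
        System.out.println(isWellFormed(pair, 0, 1));           // false: pair cut in half
    }
}
```

A supplementary character occupies two array slots, which is why a range that splits the pair is reported as ill-formed.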
Nested Class Summary
- protected static interface Output
  Internal interface that allows both character and ignorable whitespace content to be filtered the same way.
Constructor Summary
Method Summary
- void characters(char[] ch, int start, int length)
- void endDocument()
- void endElement(String uri, String localName, String name)
- void ignorableWhitespace(char[] ch, int start, int length)
- protected boolean isInvalid(int ch)
  Checks whether the given Unicode character is an invalid XML character and should be replaced for output.
- void startElement(String uri, String localName, String name, Attributes atts)
- protected void writeReplacement(Output output)
  Outputs the replacement for an invalid character.
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
- endPrefixMapping, error, fatalError, handleException, processingInstruction, setContentHandler, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, toString, warning
Methods inherited from class org.xml.sax.helpers.DefaultHandler
- notationDecl, resolveEntity, unparsedEntityDecl
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.xml.sax.ContentHandler
- declaration
Constructor Details
SafeContentHandler
 
Method Details
isInvalid
protected boolean isInvalid(int ch)
Checks whether the given Unicode character is an invalid XML character and should be replaced for output. Subclasses can override this method to use an alternative definition of which characters should be replaced in the XML output. The default definition from the XML specification is:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
- Parameters:
- ch - character
- Returns:
- true if the character should be replaced, false otherwise
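The Char production can be transcribed into a plain range check. This is an illustrative sketch, not the code used by SafeContentHandler; the class name XmlChars is hypothetical. An overriding subclass could use the same shape with different ranges.

```java
final class XmlChars {
    /**
     * Returns true if the code point falls outside the XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    static boolean isInvalid(int cp) {
        boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
        return !valid;
    }

    public static void main(String[] args) {
        System.out.println(isInvalid(0x0));     // true: NUL is not an XML character
        System.out.println(isInvalid(0x9));     // false: tab is allowed
        System.out.println(isInvalid(0xD800));  // true: surrogate code points are excluded
        System.out.println(isInvalid(0xFFFE));  // true: just above the #xFFFD bound
        System.out.println(isInvalid(0x1F600)); // false: supplementary characters are allowed
    }
}
```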
 
writeReplacement
Outputs the replacement for an invalid character. Subclasses can override this method to use a custom replacement.
- Parameters:
- output - where the replacement is written to
- Throws:
- SAXException - if the replacement could not be written
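The override hook can be pictured with a small standalone model. This sketch is not Tika's implementation: ReplacingFilter, Sink, and the filter method are hypothetical names, the isInvalid rule is deliberately simplified to C0 control characters, and Sink merely stands in for the output parameter of writeReplacement.

```java
class ReplacingFilter {
    /** Hypothetical stand-in for the destination that receives filtered text. */
    interface Sink {
        void write(char[] ch, int start, int length);
    }

    /** Simplified rule for this sketch only: C0 controls other than tab, LF and CR. */
    protected boolean isInvalid(int cp) {
        return cp < 0x20 && cp != 0x9 && cp != 0xA && cp != 0xD;
    }

    /** Default replacement is U+FFFD; subclasses may write something else. */
    protected void writeReplacement(Sink out) {
        out.write(new char[] {'\uFFFD'}, 0, 1);
    }

    /** Copies valid runs to the sink and calls writeReplacement for each invalid code point. */
    final void filter(char[] ch, int start, int length, Sink out) {
        int end = start + length;
        int runStart = start;
        int i = start;
        while (i < end) {
            int cp = Character.codePointAt(ch, i, end);
            int next = i + Character.charCount(cp);
            if (isInvalid(cp)) {
                if (i > runStart) {
                    out.write(ch, runStart, i - runStart); // flush the valid run so far
                }
                writeReplacement(out);
                runStart = next;
            }
            i = next;
        }
        if (end > runStart) {
            out.write(ch, runStart, end - runStart); // flush the trailing valid run
        }
    }

    public static void main(String[] args) {
        // Subclass that swaps the default U+FFFD for a question mark.
        ReplacingFilter questionMarks = new ReplacingFilter() {
            @Override
            protected void writeReplacement(Sink out) {
                out.write(new char[] {'?'}, 0, 1);
            }
        };
        char[] in = {'a', (char) 0x01, 'b'};
        StringBuilder sb = new StringBuilder();
        questionMarks.filter(in, 0, in.length, sb::append);
        System.out.println(sb); // a?b
    }
}
```

Keeping the replacement in its own overridable method means a subclass can change what is written without touching the scanning loop.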
 
startElement
public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException
- Specified by:
- startElement in interface ContentHandler
- Overrides:
- startElement in class ContentHandlerDecorator
- Throws:
- SAXException
 
endElement
- Specified by:
- endElement in interface ContentHandler
- Overrides:
- endElement in class ContentHandlerDecorator
- Throws:
- SAXException
 
endDocument
- Specified by:
- endDocument in interface ContentHandler
- Overrides:
- endDocument in class ContentHandlerDecorator
- Throws:
- SAXException
 
characters
- Specified by:
- characters in interface ContentHandler
- Overrides:
- characters in class ContentHandlerDecorator
- Throws:
- SAXException
 
ignorableWhitespace
- Specified by:
- ignorableWhitespace in interface ContentHandler
- Overrides:
- ignorableWhitespace in class ContentHandlerDecorator
- Throws:
- SAXException
 