Package org.apache.tika.sax
Class SecureContentHandler
- java.lang.Object
  - org.xml.sax.helpers.DefaultHandler
    - org.apache.tika.sax.ContentHandlerDecorator
      - org.apache.tika.sax.SecureContentHandler
- All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class SecureContentHandler extends ContentHandlerDecorator
Content handler decorator that attempts to prevent denial of service attacks against Tika parsers. Currently this class simply compares the number of output characters to the number of input bytes and keeps track of the XML nesting levels. An exception gets thrown if the output seems excessive compared to the input document, which is a strong indication of a zip bomb.
- Since:
- Apache Tika 0.4
- See Also:
- TIKA-216
-
-
Constructor Summary
Constructor and Description
SecureContentHandler(ContentHandler handler, TikaInputStream stream)
Decorates the given content handler with zip bomb prevention based on the count of bytes read from the given counting input stream.
-
Method Summary
Modifier and Type, Method, and Description
protected void advance(int length)
Records the given number of output characters (or more accurately UTF-16 code units).
void characters(char[] ch, int start, int length)
void endElement(String uri, String localName, String name)
long getMaximumCompressionRatio()
Returns the maximum compression ratio.
int getMaximumDepth()
Returns the maximum XML element nesting level.
int getMaximumPackageEntryDepth()
Returns the maximum package entry nesting level.
long getOutputThreshold()
Returns the configured output threshold.
void ignorableWhitespace(char[] ch, int start, int length)
void setMaximumCompressionRatio(long ratio)
Sets the ratio between output characters and input bytes.
void setMaximumDepth(int depth)
Sets the maximum XML element nesting level.
void setMaximumPackageEntryDepth(int depth)
Sets the maximum package entry nesting level.
void setOutputThreshold(long threshold)
Sets the threshold for output characters before the zip bomb prevention is activated.
void startElement(String uri, String localName, String name, Attributes atts)
void throwIfCauseOf(SAXException e)
Converts the given SAXException to a corresponding TikaException if it's caused by this instance detecting a zip bomb.
-
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endDocument, endPrefixMapping, handleException, processingInstruction, setContentHandler, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, toString
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning
-
Constructor Detail
-
SecureContentHandler
public SecureContentHandler(ContentHandler handler, TikaInputStream stream)
Decorates the given content handler with zip bomb prevention based on the count of bytes read from the given counting input stream. The resulting decorator can be passed to a Tika parser along with the given counting input stream.
- Parameters:
handler - the content handler to be decorated
stream - the input stream to be parsed
-
-
Method Detail
-
getOutputThreshold
public long getOutputThreshold()
Returns the configured output threshold.
- Returns:
- output threshold
-
setOutputThreshold
public void setOutputThreshold(long threshold)
Sets the threshold for output characters before the zip bomb prevention is activated. This avoids false positives in cases where an otherwise normal document for some reason starts with a highly compressible sequence of bytes.
- Parameters:
threshold - new output threshold
-
getMaximumCompressionRatio
public long getMaximumCompressionRatio()
Returns the maximum compression ratio.
- Returns:
- maximum compression ratio
-
setMaximumCompressionRatio
public void setMaximumCompressionRatio(long ratio)
Sets the maximum allowed ratio between output characters and input bytes. If this ratio is exceeded (after the output threshold has been reached) then an exception gets thrown.
- Parameters:
ratio - new maximum compression ratio
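The way the output threshold and the maximum compression ratio combine can be sketched as a single predicate. This is only an illustration of the rule described above, not Tika's source: the class `RatioCheck`, the method `looksLikeZipBomb`, the strict `>` comparisons, and the numbers in the usage note are all assumptions made for the example.

```java
/** Hypothetical helper illustrating the threshold-plus-ratio rule. */
public final class RatioCheck {
    private RatioCheck() {}

    /**
     * Returns true only when BOTH conditions hold:
     * the output has passed the threshold, and the output is larger
     * than ratio times the number of input bytes consumed so far.
     */
    public static boolean looksLikeZipBomb(
            long outputChars, long inputBytes, long threshold, long ratio) {
        return outputChars > threshold && outputChars > inputBytes * ratio;
    }
}
```

With an assumed threshold of 1000 and ratio of 100, `looksLikeZipBomb(500, 1, 1000, 100)` is false because the threshold has not been reached, even though the raw ratio is 500:1. This is the false-positive case that the output threshold is meant to avoid.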
-
getMaximumDepth
public int getMaximumDepth()
Returns the maximum XML element nesting level.
- Returns:
- maximum XML element nesting level
-
setMaximumDepth
public void setMaximumDepth(int depth)
Sets the maximum XML element nesting level. If this depth level is exceeded then an exception gets thrown.
- Parameters:
depth - maximum XML element nesting level
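A nesting limit of this kind can be sketched with the plain JDK SAX API. The class below is a hypothetical stand-in, not Tika's implementation: `DepthGuard` and `accepts` are names invented for the example, and the sketch assumes the limit is checked on each `startElement` call.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that rejects documents nested deeper than a limit. */
public class DepthGuard extends DefaultHandler {
    private final int maxDepth;
    private int depth = 0;

    public DepthGuard(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    @Override
    public void startElement(String uri, String localName, String name, Attributes atts)
            throws SAXException {
        depth++; // entering one more level of nesting
        if (depth > maxDepth) {
            throw new SAXException("Nesting depth " + depth + " exceeds limit " + maxDepth);
        }
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        depth--; // leaving the current level
    }

    /**
     * Parses the XML and returns false if a SAXException is raised,
     * which covers both the depth limit and malformed input.
     */
    public static boolean accepts(String xml, int maxDepth) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DepthGuard(maxDepth));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, `DepthGuard.accepts("<a><b><c/></b></a>", 2)` rejects the document because the `c` element sits at depth 3.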
-
getMaximumPackageEntryDepth
public int getMaximumPackageEntryDepth()
Returns the maximum package entry nesting level.
- Returns:
- maximum package entry nesting level
-
setMaximumPackageEntryDepth
public void setMaximumPackageEntryDepth(int depth)
Sets the maximum package entry nesting level. If this depth level is exceeded then an exception gets thrown.
- Parameters:
depth - maximum package entry nesting level
-
throwIfCauseOf
public void throwIfCauseOf(SAXException e) throws TikaException
Converts the given SAXException to a corresponding TikaException if it's caused by this instance detecting a zip bomb.
- Parameters:
e - SAX exception
- Throws:
TikaException - zip bomb exception
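The idea of converting only those exceptions that this particular instance raised can be sketched with a marker exception that remembers its owner. Everything here is hypothetical and JDK-only: `Guard`, `trip`, `GuardSAXException`, and `LimitExceededException` (a stand-in for TikaException) are invented for the example and do not describe Tika's internals.

```java
import org.xml.sax.SAXException;

/** Hypothetical guard showing an owner-checked exception conversion. */
public class Guard {

    /** Hypothetical checked exception standing in for TikaException. */
    public static class LimitExceededException extends Exception {
        public LimitExceededException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Marker exception that records which guard created it. */
    private static final class GuardSAXException extends SAXException {
        private final Guard owner;

        GuardSAXException(String message, Guard owner) {
            super(message);
            this.owner = owner;
        }
    }

    /** Creates the SAXException this guard would throw on a violation. */
    public SAXException trip(String message) {
        return new GuardSAXException(message, this);
    }

    /** Rethrows as LimitExceededException only if this guard created e. */
    public void throwIfCauseOf(SAXException e) throws LimitExceededException {
        if (e instanceof GuardSAXException && ((GuardSAXException) e).owner == this) {
            throw new LimitExceededException("Limit exceeded", e);
        }
        // Any other SAXException is left for the caller to handle.
    }
}
```

A caller would typically catch the SAXException from parsing, pass it to `throwIfCauseOf`, and treat a normal return as "this was some other SAX failure".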
-
advance
protected void advance(int length) throws SAXException
Records the given number of output characters (or more accurately UTF-16 code units). Throws an exception if the recorded number of characters greatly exceeds the number of input bytes read.
- Parameters:
length - number of new output characters produced
- Throws:
SAXException - if a zip bomb is detected
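The distinction between characters and UTF-16 code units matters for text outside the Basic Multilingual Plane, where one character occupies two `char` values. A small sketch using only `java.lang.String` illustrates the difference; the class `CodeUnits` and its methods are hypothetical names for the example.

```java
/** Hypothetical helper contrasting UTF-16 code units with code points. */
public final class CodeUnits {
    private CodeUnits() {}

    /** Length of the char[] a SAX characters() call would carry for this text. */
    public static int codeUnits(String text) {
        return text.toCharArray().length;
    }

    /** Number of Unicode code points in the same text. */
    public static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }
}
```

For the string `"a\uD83D\uDE00"` (the letter a followed by one supplementary character written as a surrogate pair), `codeUnits` returns 3 while `codePoints` returns 2, so a count based on `length` is in code units.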
-
startElement
public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException
- Specified by:
startElement in interface ContentHandler
- Overrides:
startElement in class ContentHandlerDecorator
- Throws:
SAXException
-
endElement
public void endElement(String uri, String localName, String name) throws SAXException
- Specified by:
endElement in interface ContentHandler
- Overrides:
endElement in class ContentHandlerDecorator
- Throws:
SAXException
-
characters
public void characters(char[] ch, int start, int length) throws SAXException
- Specified by:
characters in interface ContentHandler
- Overrides:
characters in class ContentHandlerDecorator
- Throws:
SAXException
-
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
- Specified by:
ignorableWhitespace in interface ContentHandler
- Overrides:
ignorableWhitespace in class ContentHandlerDecorator
- Throws:
SAXException