org.apache.tika.sax
Class SecureContentHandler

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by org.apache.tika.sax.ContentHandlerDecorator
          extended by org.apache.tika.sax.SecureContentHandler
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class SecureContentHandler
extends ContentHandlerDecorator

Content handler decorator that attempts to prevent denial of service attacks against Tika parsers.

Currently this class simply compares the number of output characters to the number of input bytes and throws an exception if the output is excessively large relative to the input, which is a strong indication of a zip bomb.
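The comparison described above can be pictured as a single predicate. The following is a minimal, stand-alone sketch and not Tika's actual source: the class and method names are hypothetical, the threshold and ratio values are chosen only for illustration, and the exact boundary handling (strict versus non-strict comparison) is an assumption.

```java
// Stand-alone sketch of the output/input comparison; not Tika's actual source.
final class ZipBombCheck {
    private ZipBombCheck() {}

    /**
     * Returns true when the output is past the threshold and is also more
     * than maxRatio times larger than the input read so far.
     */
    static boolean looksLikeZipBomb(long outputChars, long inputBytes,
                                    long outputThreshold, long maxRatio) {
        return outputChars > outputThreshold
                && outputChars > inputBytes * maxRatio;
    }

    public static void main(String[] args) {
        long threshold = 1_000_000L; // assumed value, for illustration only
        long ratio = 100L;           // assumed value, for illustration only

        // Small output: below the threshold, so not flagged despite a 500:1 ratio.
        System.out.println(looksLikeZipBomb(5_000, 10, threshold, ratio));
        // Large output from a proportionally large input: not flagged.
        System.out.println(looksLikeZipBomb(2_000_000, 500_000, threshold, ratio));
        // Large output from a tiny input: flagged.
        System.out.println(looksLikeZipBomb(2_000_000, 1_000, threshold, ratio));
    }
}
```

The first case shows why an output threshold is useful: a short document that happens to expand heavily at the start is not reported until the output grows large enough to matter.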

Since:
Apache Tika 0.4
See Also:
TIKA-216

Constructor Summary
SecureContentHandler(org.xml.sax.ContentHandler handler, CountingInputStream stream)
          Decorates the given content handler with zip bomb prevention based on the count of bytes read from the given counting input stream.
 
Method Summary
 void characters(char[] ch, int start, int length)
           
 long getMaximumCompressionRatio()
          Returns the maximum compression ratio.
 long getOutputThreshold()
          Returns the configured output threshold.
 void ignorableWhitespace(char[] ch, int start, int length)
           
 void setMaximumCompressionRatio(long ratio)
          Sets the maximum allowed ratio of output characters to input bytes.
 void setOutputThreshold(long threshold)
          Sets the threshold for output characters before the zip bomb prevention is activated.
 void throwIfCauseOf(org.xml.sax.SAXException e)
          Converts the given SAXException to a corresponding TikaException if it's caused by this instance detecting a zip bomb.
 
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endDocument, endElement, endPrefixMapping, handleException, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, toString
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SecureContentHandler

public SecureContentHandler(org.xml.sax.ContentHandler handler,
                            CountingInputStream stream)
Decorates the given content handler with zip bomb prevention based on the count of bytes read from the given counting input stream. The resulting decorator can be passed to a Tika parser along with the given counting input stream.

Parameters:
handler - the content handler to be decorated
stream - the input stream to be parsed, wrapped into a CountingInputStream decorator
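The reason the decorator and the parser must share the same counting stream can be illustrated with JDK-only stand-ins. This is a hedged sketch, not Tika code: `ByteCounter` stands in for `CountingInputStream`, `GuardedHandler` stands in for this class, the toy "parser" loop and the limits (1,000 characters, ratio 10) are all assumptions made for the example, and whether the real class checks before or after forwarding is not stated in this page.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for CountingInputStream: counts bytes as they are read. */
class ByteCounter extends FilterInputStream {
    private long count;

    ByteCounter(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b != -1) count++;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    long getByteCount() { return count; }
}

/** Hypothetical stand-in for the decorator: checks the ratio, then forwards text. */
class GuardedHandler extends DefaultHandler {
    private final ContentHandler delegate;
    private final ByteCounter stream;
    private final long threshold;
    private final long ratio;
    private long chars;

    GuardedHandler(ContentHandler delegate, ByteCounter stream, long threshold, long ratio) {
        this.delegate = delegate;
        this.stream = stream;
        this.threshold = threshold;
        this.ratio = ratio;
    }

    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        advance(length);
        delegate.characters(ch, start, length);
    }

    @Override public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        advance(length);
        delegate.ignorableWhitespace(ch, start, length);
    }

    /** Adds to the output count and compares it with the bytes read so far. */
    private void advance(int length) throws SAXException {
        chars += length;
        if (chars > threshold && chars > stream.getByteCount() * ratio) {
            throw new SAXException("Suspected zip bomb: " + chars
                    + " characters from " + stream.getByteCount() + " bytes");
        }
    }
}

class WiringDemo {
    /** Toy "parser": emits `expansion` characters per input byte; true if it finishes. */
    static boolean parses(int inputBytes, int expansion) {
        ByteCounter in = new ByteCounter(new ByteArrayInputStream(new byte[inputBytes]));
        GuardedHandler handler = new GuardedHandler(new DefaultHandler(), in, 1_000, 10);
        char[] chunk = new char[expansion];
        try {
            while (in.read() != -1) {
                handler.characters(chunk, 0, chunk.length);
            }
            return true;
        } catch (SAXException e) {
            return false; // the guard aborted the parse
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses(500, 5));  // modest expansion: completes
        System.out.println(parses(500, 50)); // extreme expansion: aborted
    }
}
```

In this sketch the handler only sees a meaningful byte count because the parser loop reads from the very same `ByteCounter` instance. If the parser were handed a different stream, the count would stay at zero and any output past the threshold would be reported.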

Method Detail

getOutputThreshold

public long getOutputThreshold()
Returns the configured output threshold.

Returns:
output threshold

setOutputThreshold

public void setOutputThreshold(long threshold)
Sets the threshold for output characters before the zip bomb prevention is activated. This avoids false positives in cases where an otherwise normal document for some reason starts with a highly compressible sequence of bytes.

Parameters:
threshold - new output threshold

getMaximumCompressionRatio

public long getMaximumCompressionRatio()
Returns the maximum compression ratio.

Returns:
maximum compression ratio

setMaximumCompressionRatio

public void setMaximumCompressionRatio(long ratio)
Sets the maximum allowed ratio of output characters to input bytes. If this ratio is exceeded after the output threshold has been reached, an exception is thrown.

Parameters:
ratio - new maximum compression ratio

throwIfCauseOf

public void throwIfCauseOf(org.xml.sax.SAXException e)
                    throws TikaException
Converts the given SAXException to a corresponding TikaException if it's caused by this instance detecting a zip bomb.

Parameters:
e - SAX exception
Throws:
TikaException - if the given exception was caused by this instance detecting a zip bomb
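One way to picture "caused by this instance" is a marker exception that remembers which handler raised it. The sketch below is an assumption about the general pattern, not Tika's implementation: `Guard` stands in for this class, `BombException` stands in for `TikaException`, and `trip()` is a hypothetical helper that simulates a detection.

```java
import org.xml.sax.SAXException;

/** Hypothetical stand-in for TikaException. */
class BombException extends Exception {
    BombException(String message, Throwable cause) { super(message, cause); }
}

/** Hypothetical stand-in for the secure handler. */
class Guard {
    /** Marker exception that remembers which guard raised it. */
    private static final class GuardSaxException extends SAXException {
        private final Guard source;

        GuardSaxException(Guard source, String message) {
            super(message);
            this.source = source;
        }
    }

    /** Simulates this guard detecting a zip bomb in the middle of a parse. */
    void trip() throws SAXException {
        throw new GuardSaxException(this, "Suspected zip bomb");
    }

    /** Rethrows e as a BombException only if this guard raised it; otherwise returns. */
    void throwIfCauseOf(SAXException e) throws BombException {
        if (e instanceof GuardSaxException && ((GuardSaxException) e).source == this) {
            throw new BombException("Zip bomb detected", e);
        }
    }
}

class ThrowIfCauseOfDemo {
    static String classify(Guard guard, SAXException e) {
        try {
            guard.throwIfCauseOf(e);
            return "unrelated";
        } catch (BombException bomb) {
            return "zip bomb";
        }
    }

    public static void main(String[] args) {
        Guard guard = new Guard();
        try {
            guard.trip();
        } catch (SAXException e) {
            System.out.println(classify(guard, e));       // raised by this guard
            System.out.println(classify(new Guard(), e)); // raised by another guard
        }
        System.out.println(classify(guard, new SAXException("malformed input")));
    }
}
```

A caller would typically invoke the method inside the `catch (SAXException e)` block around a parse: if the call returns normally, the exception came from somewhere else and can be handled as an ordinary SAX failure.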

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class ContentHandlerDecorator
Throws:
org.xml.sax.SAXException

ignorableWhitespace

public void ignorableWhitespace(char[] ch,
                                int start,
                                int length)
                         throws org.xml.sax.SAXException
Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
Overrides:
ignorableWhitespace in class ContentHandlerDecorator
Throws:
org.xml.sax.SAXException


Copyright © 2010 The Apache Software Foundation. All Rights Reserved.