org.apache.tika.sax
Class SecureContentHandler

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by org.apache.tika.sax.ContentHandlerDecorator
          extended by org.apache.tika.sax.SecureContentHandler
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler

public class SecureContentHandler
extends ContentHandlerDecorator

Content handler decorator that attempts to prevent denial of service attacks against Tika parsers.

Currently this class simply compares the number of output characters to the number of input bytes and keeps track of the XML nesting levels. An exception is thrown if the output seems excessive compared to the input document, which is a strong indication of a zip bomb.

Since:
Apache Tika 0.4
See Also:
TIKA-216

Constructor Summary
SecureContentHandler(ContentHandler handler, TikaInputStream stream)
          Decorates the given content handler with zip bomb prevention based on the count of bytes read from the given input stream.
 
Method Summary
 void characters(char[] ch, int start, int length)
           
 void endElement(String uri, String localName, String name)
           
 long getMaximumCompressionRatio()
          Returns the maximum compression ratio.
 int getMaximumDepth()
          Returns the maximum XML element nesting level.
 int getMaximumPackageEntryDepth()
          Returns the maximum package entry nesting level.
 long getOutputThreshold()
          Returns the configured output threshold.
 void ignorableWhitespace(char[] ch, int start, int length)
           
 void setMaximumCompressionRatio(long ratio)
          Sets the maximum allowed ratio between output characters and input bytes.
 void setMaximumDepth(int depth)
          Sets the maximum XML element nesting level.
 void setMaximumPackageEntryDepth(int depth)
          Sets the maximum package entry nesting level.
 void setOutputThreshold(long threshold)
          Sets the threshold for output characters before the zip bomb prevention is activated.
 void startElement(String uri, String localName, String name, Attributes atts)
           
 void throwIfCauseOf(SAXException e)
          Converts the given SAXException to a corresponding TikaException if it's caused by this instance detecting a zip bomb.
 
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endDocument, endPrefixMapping, handleException, processingInstruction, setContentHandler, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, toString
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SecureContentHandler

public SecureContentHandler(ContentHandler handler,
                            TikaInputStream stream)
Decorates the given content handler with zip bomb prevention based on the count of bytes read from the given input stream. The resulting decorator can be passed to a Tika parser along with the same input stream.

Parameters:
handler - the content handler to be decorated
stream - the input stream to be parsed
Method Detail

getOutputThreshold

public long getOutputThreshold()
Returns the configured output threshold.

Returns:
output threshold

setOutputThreshold

public void setOutputThreshold(long threshold)
Sets the threshold for output characters before the zip bomb prevention is activated. This avoids false positives in cases where an otherwise normal document for some reason starts with a highly compressible sequence of bytes.

Parameters:
threshold - new output threshold

getMaximumCompressionRatio

public long getMaximumCompressionRatio()
Returns the maximum compression ratio.

Returns:
maximum compression ratio

setMaximumCompressionRatio

public void setMaximumCompressionRatio(long ratio)
Sets the maximum allowed ratio between output characters and input bytes. If this ratio is exceeded after the output threshold has been reached, an exception is thrown.

Parameters:
ratio - new maximum compression ratio
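
The interaction between the output threshold and the maximum compression ratio can be illustrated with a minimal sketch. This is not Tika's implementation: the class name RatioGuardSketch, the method isSuspicious, and the numeric limits are hypothetical and chosen only for illustration. It assumes the check is skipped until the output exceeds the threshold, and that the output is then compared against the input byte count multiplied by the ratio.

```java
// Illustrative sketch only; names and limits are assumptions, not Tika code.
final class RatioGuardSketch {
    private final long outputThreshold; // output characters allowed before the ratio check applies
    private final long maximumRatio;    // allowed output characters per input byte

    RatioGuardSketch(long outputThreshold, long maximumRatio) {
        this.outputThreshold = outputThreshold;
        this.maximumRatio = maximumRatio;
    }

    /** Returns true if the output looks excessive for the amount of input consumed. */
    boolean isSuspicious(long outputChars, long inputBytes) {
        if (outputChars <= outputThreshold) {
            // Small outputs are never flagged, which avoids false positives
            // on documents that merely start with highly compressible data.
            return false;
        }
        // Note: this multiplication could overflow for extreme values;
        // a production check would need to guard against that.
        return outputChars > inputBytes * maximumRatio;
    }

    public static void main(String[] args) {
        RatioGuardSketch guard = new RatioGuardSketch(1_000, 100);
        System.out.println(guard.isSuspicious(900, 1));      // false: below the threshold
        System.out.println(guard.isSuspicious(5_000, 100));  // false: 5000 <= 100 * 100
        System.out.println(guard.isSuspicious(50_000, 100)); // true: 50000 > 100 * 100
    }
}
```

In this sketch a higher threshold tolerates a longer compressible prefix, while a lower ratio flags expansion sooner once the threshold has been passed.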

getMaximumDepth

public int getMaximumDepth()
Returns the maximum XML element nesting level.

Returns:
maximum XML element nesting level

setMaximumPackageEntryDepth

public void setMaximumPackageEntryDepth(int depth)
Sets the maximum package entry nesting level. If this depth level is exceeded then an exception gets thrown.

Parameters:
depth - maximum package entry nesting level

getMaximumPackageEntryDepth

public int getMaximumPackageEntryDepth()
Returns the maximum package entry nesting level.

Returns:
maximum package entry nesting level

setMaximumDepth

public void setMaximumDepth(int depth)
Sets the maximum XML element nesting level. If this depth level is exceeded then an exception gets thrown.

Parameters:
depth - maximum XML element nesting level
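
A nesting limit of this kind can be sketched with a plain SAX handler that counts open elements. The example below uses only JDK classes and is an assumption about the general technique, not a copy of this class: DepthLimitSketch and parsesWithinLimit are hypothetical names, and the real handler also forwards events to the decorated handler, which is omitted here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch only; names are assumptions, not Tika code.
final class DepthLimitSketch extends DefaultHandler {
    private final int maximumDepth;
    private int currentDepth = 0;

    DepthLimitSketch(int maximumDepth) {
        this.maximumDepth = maximumDepth;
    }

    @Override
    public void startElement(String uri, String localName, String name, Attributes atts)
            throws SAXException {
        // Each opening tag goes one level deeper; fail once the limit is passed.
        currentDepth++;
        if (currentDepth > maximumDepth) {
            throw new SAXException(
                    "Nesting depth " + currentDepth + " exceeds limit " + maximumDepth);
        }
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        // Each closing tag returns to the parent level.
        currentDepth--;
    }

    /** Returns false if parsing fails with a SAXException, including a depth violation. */
    static boolean parsesWithinLimit(String xml, int maximumDepth) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)),
                           new DepthLimitSketch(maximumDepth));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            // I/O or parser configuration problems are not depth violations.
            throw new IllegalStateException(e);
        }
    }
}
```

For the input `<a><b><c/></b></a>`, this sketch accepts a limit of 3 and rejects a limit of 2, because the `c` element sits at depth 3.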

throwIfCauseOf

public void throwIfCauseOf(SAXException e)
                    throws TikaException
Converts the given SAXException to a corresponding TikaException if it's caused by this instance detecting a zip bomb.

Parameters:
e - SAX exception
Throws:
TikaException - zip bomb exception
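
One way to decide whether a SAXException "is caused by this instance" is to tag the exception with the handler that raised it and then walk the cause chain. The sketch below shows that pattern using JDK classes only. It is an assumption about the approach, not the actual implementation: CauseCheckSketch, GuardException, ZipBombDetected, and fail are hypothetical names, and ZipBombDetected stands in for TikaException.

```java
import org.xml.sax.SAXException;

// Illustrative sketch only; names are assumptions, not Tika code.
final class CauseCheckSketch {

    /** Stand-in for TikaException in this sketch. */
    static final class ZipBombDetected extends Exception {
        ZipBombDetected(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Marker exception that remembers which guard instance raised it. */
    private static final class GuardException extends SAXException {
        private final transient CauseCheckSketch source;

        GuardException(String message, CauseCheckSketch source) {
            super(message);
            this.source = source;
        }
    }

    /** Creates the exception this guard would throw on detecting a problem. */
    SAXException fail(String message) {
        return new GuardException(message, this);
    }

    /** Rethrows as ZipBombDetected only if this guard appears in the cause chain. */
    void throwIfCauseOf(SAXException e) throws ZipBombDetected {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof GuardException && ((GuardException) t).source == this) {
                throw new ZipBombDetected("Zip bomb detected: " + t.getMessage(), e);
            }
        }
        // Not ours: return normally so the caller can handle the original exception.
    }
}
```

A caller would typically catch the SAXException from parsing, call throwIfCauseOf on it, and rethrow the original exception if the method returns normally.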

startElement

public void startElement(String uri,
                         String localName,
                         String name,
                         Attributes atts)
                  throws SAXException
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class ContentHandlerDecorator
Throws:
SAXException

endElement

public void endElement(String uri,
                       String localName,
                       String name)
                throws SAXException
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class ContentHandlerDecorator
Throws:
SAXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws SAXException
Specified by:
characters in interface ContentHandler
Overrides:
characters in class ContentHandlerDecorator
Throws:
SAXException

ignorableWhitespace

public void ignorableWhitespace(char[] ch,
                                int start,
                                int length)
                         throws SAXException
Specified by:
ignorableWhitespace in interface ContentHandler
Overrides:
ignorableWhitespace in class ContentHandlerDecorator
Throws:
SAXException


Copyright © 2007-2011 The Apache Software Foundation. All Rights Reserved.