Class SecureContentHandler

  • All Implemented Interfaces:
    ContentHandler, DTDHandler, EntityResolver, ErrorHandler

    public class SecureContentHandler
    extends ContentHandlerDecorator
    Content handler decorator that attempts to prevent denial of service attacks against Tika parsers.

    Currently this class simply compares the number of output characters to the number of input bytes and keeps track of the XML nesting levels. An exception is thrown if the output seems excessive compared to the input document, which is a strong indication of a zip bomb.

    Since:
    Apache Tika 0.4
    See Also:
    TIKA-216
    • Constructor Detail

      • SecureContentHandler

        public SecureContentHandler(ContentHandler handler,
                                    TikaInputStream stream)
        Decorates the given content handler with zip bomb prevention based on the count of bytes read from the given input stream. The resulting decorator can be passed to a Tika parser along with the same input stream.
        Parameters:
        handler - the content handler to be decorated
        stream - the input stream to be parsed
    • Method Detail

      • getOutputThreshold

        public long getOutputThreshold()
        Returns the configured output threshold.
        Returns:
        output threshold
      • setOutputThreshold

        public void setOutputThreshold(long threshold)
        Sets the threshold for output characters before the zip bomb prevention is activated. This avoids false positives in cases where an otherwise normal document for some reason starts with a highly compressible sequence of bytes.
        Parameters:
        threshold - new output threshold
      • getMaximumCompressionRatio

        public long getMaximumCompressionRatio()
        Returns the maximum compression ratio.
        Returns:
        maximum compression ratio
      • setMaximumCompressionRatio

        public void setMaximumCompressionRatio(long ratio)
        Sets the maximum allowed ratio between output characters and input bytes. If this ratio is exceeded (after the output threshold has been reached), an exception is thrown.
        Parameters:
        ratio - new maximum compression ratio
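
        The interplay between the output threshold and the maximum compression ratio can be illustrated with a minimal standalone sketch. It does not use Tika and is not the library's implementation: the class name RatioCheckSketch, the explicit bytesRead argument, and the limit values are all assumptions made for illustration.

```java
public class RatioCheckSketch {
    // Hypothetical stand-ins for the configured limits (not Tika's defaults).
    private final long outputThreshold;
    private final long maximumCompressionRatio;

    // Running total of output characters recorded so far.
    private long characterCount = 0;

    public RatioCheckSketch(long outputThreshold, long maximumCompressionRatio) {
        this.outputThreshold = outputThreshold;
        this.maximumCompressionRatio = maximumCompressionRatio;
    }

    /**
     * Records {@code length} new output characters and reports whether the
     * output now looks excessive relative to {@code bytesRead} input bytes.
     * The ratio is only consulted once the output threshold has been passed.
     */
    public boolean advance(int length, long bytesRead) {
        characterCount += length;
        return characterCount > outputThreshold
                && characterCount > bytesRead * maximumCompressionRatio;
    }

    public static void main(String[] args) {
        RatioCheckSketch check = new RatioCheckSketch(1_000, 100);
        // 500 characters from 1 byte: high ratio, but still below the threshold.
        System.out.println(check.advance(500, 1));
        // 1,100 characters from 1,000 bytes: above the threshold, ratio is fine.
        System.out.println(check.advance(600, 1_000));
        // 201,100 characters from 1,000 bytes: exceeds 100 characters per byte.
        System.out.println(check.advance(200_000, 1_000));
    }
}
```

        In this sketch only the last call reports a problem. The first call shows why a threshold is useful: a short, highly compressible prefix alone does not trigger the check.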
      • getMaximumDepth

        public int getMaximumDepth()
        Returns the maximum XML element nesting level.
        Returns:
        maximum XML element nesting level
      • setMaximumDepth

        public void setMaximumDepth(int depth)
        Sets the maximum XML element nesting level. If this depth level is exceeded then an exception gets thrown.
        Parameters:
        depth - maximum XML element nesting level
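
        A nesting limit of this kind can be sketched with a plain SAX handler from the JDK. This is an illustration under stated assumptions, not Tika's code: the class DepthLimitSketch, the helper withinLimit, and the choice to fail only when the depth is strictly greater than the limit are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DepthLimitSketch extends DefaultHandler {
    private final int maximumDepth;
    private int currentDepth = 0;

    public DepthLimitSketch(int maximumDepth) {
        this.maximumDepth = maximumDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        currentDepth++;
        // Assumption: "exceeded" means strictly greater than the limit.
        if (currentDepth > maximumDepth) {
            throw new SAXException("Nesting depth " + currentDepth
                    + " exceeds limit " + maximumDepth);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        currentDepth--;
    }

    /**
     * Hypothetical helper: parses the XML and returns false if a SAXException
     * is raised (depth limit exceeded, or the XML itself is malformed).
     */
    public static boolean withinLimit(String xml, int maximumDepth) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DepthLimitSketch(maximumDepth));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b><c/></b></a>"; // three levels of nesting
        System.out.println(withinLimit(xml, 3));
        System.out.println(withinLimit(xml, 2));
    }
}
```

        With a limit of 3 the document parses normally; with a limit of 2 the handler aborts parsing when the third level is opened.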
      • getMaximumPackageEntryDepth

        public int getMaximumPackageEntryDepth()
        Returns the maximum package entry nesting level.
        Returns:
        maximum package entry nesting level
      • setMaximumPackageEntryDepth

        public void setMaximumPackageEntryDepth(int depth)
        Sets the maximum package entry nesting level. If this depth level is exceeded then an exception gets thrown.
        Parameters:
        depth - maximum package entry nesting level
      • advance

        protected void advance(int length)
                        throws SAXException
        Records the given number of output characters (or, more accurately, UTF-16 code units). Throws an exception if the recorded number of characters greatly exceeds the number of input bytes read.
        Parameters:
        length - number of new output characters produced
        Throws:
        SAXException - if a zip bomb is detected
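
        The distinction between characters and UTF-16 code units matters because Java char counts, such as the length argument of a SAX characters callback, are measured in code units. The small sketch below uses only the JDK; the class and method names are hypothetical.

```java
public class Utf16LengthSketch {
    /** Number of UTF-16 code units, which is what a char-based length counts. */
    public static int codeUnits(String text) {
        return text.length();
    }

    /** Number of Unicode code points, which is closer to "characters" as read by a person. */
    public static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, which needs a surrogate pair in UTF-16.
        String text = "a\uD83D\uDE00";
        System.out.println(codeUnits(text));  // prints 3
        System.out.println(codePoints(text)); // prints 2
    }
}
```

        Text outside the Basic Multilingual Plane therefore contributes two units per code point to a count of this kind.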