Class URLEnabledInputStreamFactory

  • All Implemented Interfaces:
    InputStreamFactory

    public class URLEnabledInputStreamFactory
    extends Object
    implements InputStreamFactory
    This class looks for a "fileUrl" header in the HTTP request. If the header value is not null and not empty, the factory returns a new TikaInputStream opened from that URL.
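
    The header check described above can be sketched roughly as follows. This is a simplified illustration, not the class's actual source: the class name FileUrlHeaderSketch, the method choose, and the use of a plain Map for headers are hypothetical, a plain java.io.InputStream stands in for TikaInputStream, and falling back to the request body when the header is absent is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Map;

// Hypothetical sketch; not the real URLEnabledInputStreamFactory.
public class FileUrlHeaderSketch {

    // Picks the stream to parse: the URL named by the "fileUrl" header when it
    // is present and non-empty, otherwise the request body (assumed fallback).
    static InputStream choose(Map<String, String> headers, InputStream body)
            throws IOException {
        // Exact-key lookup; a real server would match header names
        // case-insensitively.
        String fileUrl = headers.get("fileUrl");
        if (fileUrl != null && !fileUrl.isEmpty()) {
            // No validation at all: any scheme the JVM supports is fetched,
            // including file:, which is the risk the warning describes.
            return URI.create(fileUrl).toURL().openStream();
        }
        return body;
    }
}
```

    Note that the sketch opens whatever URL it is given, so the caller of the service effectively chooses what the server reads.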

    This is not meant to be used in place of a robust, responsible crawler. Rather, this is a convenience factory.

    WARNING: Unless you carefully lock down access to the server, anyone with access to this service gains the server's read access. In short, anyone who can reach this service could request and receive "file:///etc/supersensitive_file_dont_read.txt". Likewise, if your server has access to your intranet and you expose this service to the public, the public will have access to your intranet. See CVE-2015-3271.
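
    To make the risk concrete, here is a minimal sketch of the kind of check a deployment might place in front of the "fileUrl" value. It is not part of this class or of Tika: the class name FileUrlGuardSketch, the method isAllowed, and the allowed scheme and host ("https", "docs.example.com") are all hypothetical. It does not follow redirects or guard against DNS tricks, so it is no substitute for locking down access to the server.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical guard; names and allowlist values are illustrative only.
public class FileUrlGuardSketch {

    private static final Set<String> ALLOWED_SCHEMES = Set.of("https");
    private static final Set<String> ALLOWED_HOSTS = Set.of("docs.example.com");

    // Returns true only when the URL parses and both its scheme and host
    // appear on the allowlists above.
    static boolean isAllowed(String fileUrl) {
        if (fileUrl == null || fileUrl.isEmpty()) {
            return false;
        }
        final URI uri;
        try {
            uri = URI.create(fileUrl);
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return false; // e.g. file:///... has no host component
        }
        return ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))
                && ALLOWED_HOSTS.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

    With these hypothetical allowlists, the "file:///etc/supersensitive_file_dont_read.txt" request from the warning is rejected because its scheme is not "https" and it carries no host.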