Class S3Fetcher

java.lang.Object
org.apache.tika.pipes.fetcher.AbstractFetcher
org.apache.tika.pipes.fetcher.s3.S3Fetcher
All Implemented Interfaces:
Initializable, Fetcher, RangeFetcher

public class S3Fetcher extends AbstractFetcher implements Initializable, RangeFetcher
Fetches files from Amazon S3. For example, to fetch s3://my_bucket/path/to/my_file.pdf, the bucket ("my_bucket") must be specified via the tika-config or before initialization, and the fetch key is "path/to/my_file.pdf".
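The class description splits an s3:// URI across two settings: the bucket, set on the fetcher, and the fetch key, passed to fetch. A minimal sketch of that split, using a hypothetical S3Location helper that is not part of Tika:

```java
// Hypothetical helper, not part of Tika: splits an s3:// URI into the bucket
// (configured on S3Fetcher via setBucket) and the fetch key (passed to fetch).
// Plain string parsing is used because java.net.URI reports no host for
// bucket names containing '_', such as "my_bucket".
final class S3Location {
    final String bucket;
    final String fetchKey;

    private S3Location(String bucket, String fetchKey) {
        this.bucket = bucket;
        this.fetchKey = fetchKey;
    }

    static S3Location parse(String s3Uri) {
        final String scheme = "s3://";
        if (!s3Uri.startsWith(scheme)) {
            throw new IllegalArgumentException("not an s3:// URI: " + s3Uri);
        }
        String rest = s3Uri.substring(scheme.length());
        int slash = rest.indexOf('/');
        // Require a non-empty bucket and a non-empty key.
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("expected s3://bucket/key: " + s3Uri);
        }
        return new S3Location(rest.substring(0, slash), rest.substring(slash + 1));
    }
}
```

For the example above, parse("s3://my_bucket/path/to/my_file.pdf") yields the bucket my_bucket and the fetch key path/to/my_file.pdf.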
  • Constructor Details

    • S3Fetcher

      public S3Fetcher()
    • S3Fetcher

      public S3Fetcher(S3FetcherConfig s3FetcherConfig)
  • Method Details

    • fetch

      public InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext) throws TikaException, IOException
      Specified by:
      fetch in interface Fetcher
      Throws:
      TikaException
      IOException
    • fetch

      public InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext) throws TikaException, IOException
      Specified by:
      fetch in interface RangeFetcher
      Throws:
      TikaException
      IOException
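The range variant of fetch returns only part of the object. The Javadoc does not say whether endRange is inclusive; the sketch below assumes inclusive byte offsets, as in the HTTP "Range: bytes=a-b" header that S3 GetObject accepts. ByteRange is a hypothetical helper, not Tika or AWS SDK code:

```java
// Hypothetical helper, not part of Tika or the AWS SDK. It assumes startRange
// and endRange are inclusive byte offsets, as in an HTTP "Range: bytes=a-b" header.
final class ByteRange {
    private ByteRange() {}

    private static void validate(long startRange, long endRange) {
        if (startRange < 0 || endRange < startRange) {
            throw new IllegalArgumentException("invalid range: " + startRange + "-" + endRange);
        }
    }

    // Header value requesting bytes startRange..endRange, both ends included.
    static String rangeHeader(long startRange, long endRange) {
        validate(startRange, endRange);
        return "bytes=" + startRange + "-" + endRange;
    }

    // Number of bytes in an inclusive range: both endpoints count.
    static long length(long startRange, long endRange) {
        validate(startRange, endRange);
        return endRange - startRange + 1;
    }
}
```

Under this assumption, fetching the first kilobyte means startRange = 0 and endRange = 1023, which covers 1024 bytes.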
    • setSpoolToTemp

      @Field public void setSpoolToTemp(boolean spoolToTemp)
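The Javadoc gives no description for setSpoolToTemp. Assuming, from its name, that it makes the fetcher copy the object to a local temporary file before returning a stream, a minimal sketch of that technique looks like this (TempSpooler is hypothetical, not Tika's code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not Tika's implementation: copies a remote stream to a
// temporary file, closes the remote stream, and returns a stream over the
// local copy that deletes the file when closed.
final class TempSpooler {
    private TempSpooler() {}

    static InputStream spool(InputStream remote) throws IOException {
        Path tmp = Files.createTempFile("s3-fetch-", ".tmp");
        try (InputStream in = remote) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Do not leave a partial file behind if the copy fails.
            Files.deleteIfExists(tmp);
            throw e;
        }
        return Files.newInputStream(tmp, StandardOpenOption.READ, StandardOpenOption.DELETE_ON_CLOSE);
    }
}
```

One plausible reason to spool is that the remote connection is released as soon as the copy completes, while the parser reads from local disk at its own pace.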
    • setRegion

      @Field public void setRegion(String region)
    • setProfile

      @Field public void setProfile(String profile)
    • setBucket

      @Field public void setBucket(String bucket)
    • setThrottleSeconds

      @Field public void setThrottleSeconds(String commaDelimitedLongs) throws TikaConfigException
      Sets the seconds to throttle retries, as a comma-delimited list of longs, e.g.: 30,60,120,600
      Parameters:
      commaDelimitedLongs - comma-delimited list of throttle durations, in seconds
      Throws:
      TikaConfigException
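A minimal sketch of parsing the comma-delimited string into the long[] accepted by the array overload. ThrottleParser is hypothetical, IllegalArgumentException stands in for TikaConfigException so the example has no Tika dependency, and rejecting negative values is an assumption:

```java
// Hypothetical sketch, not Tika's implementation. IllegalArgumentException
// stands in for TikaConfigException to keep the example dependency-free.
final class ThrottleParser {
    private ThrottleParser() {}

    static long[] parse(String commaDelimitedLongs) {
        String[] parts = commaDelimitedLongs.split(",");
        long[] seconds = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            String token = parts[i].trim();
            try {
                seconds[i] = Long.parseLong(token);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("not a long: '" + token + "'", e);
            }
            // Assumption: a negative wait is a configuration error.
            if (seconds[i] < 0) {
                throw new IllegalArgumentException("negative throttle value: " + token);
            }
        }
        return seconds;
    }
}
```

With this sketch, parse("30,60,120,600") produces the array {30, 60, 120, 600}; surrounding whitespace in each token is tolerated.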
    • setThrottleSeconds

      public void setThrottleSeconds(long[] throttleSeconds)
    • getThrottleSeconds

      public long[] getThrottleSeconds()
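The throttle schedule can be read as a list of pauses between attempts. The following sketch assumes one initial attempt plus one retry per entry, sleeping that entry's number of seconds before each retry; the exact retry count in S3Fetcher may differ, and ThrottledRetry, IoCall and the injectable sleeper are hypothetical names:

```java
import java.io.IOException;
import java.util.function.LongConsumer;

// Hypothetical sketch, not Tika's implementation: one initial attempt, then one
// retry per entry in throttleSeconds, sleeping that many seconds before each retry.
final class ThrottledRetry {
    private ThrottledRetry() {}

    interface IoCall<T> {
        T call() throws IOException;
    }

    static <T> T run(IoCall<T> action, long[] throttleSeconds, LongConsumer sleepSeconds)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= throttleSeconds.length; attempt++) {
            if (attempt > 0) {
                // Wait according to the schedule before retrying.
                sleepSeconds.accept(throttleSeconds[attempt - 1]);
            }
            try {
                return action.call();
            } catch (IOException e) {
                last = e;
            }
        }
        // The loop runs at least once, so last is non-null here.
        throw last;
    }

    // Production sleeper: converts seconds to milliseconds.
    static void sleep(long seconds) {
        try {
            Thread.sleep(seconds * 1000L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while throttling", e);
        }
    }
}
```

In production the sleeper would be ThrottledRetry::sleep; passing a recording lambda instead lets a test check the schedule without waiting.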
    • setPrefix

      @Field public void setPrefix(String prefix)
      Prefix to prepend to the fetch key before fetching. A '/' is automatically appended to the end of the prefix.
      Parameters:
      prefix - the prefix to prepend to each fetch key
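A sketch of how the prefix and the fetch key might combine into the final object key, using a hypothetical KeyPrefix helper. It assumes the '/' is appended only when the prefix does not already end with one, and that an empty prefix leaves the key unchanged; the Javadoc does not specify either case:

```java
// Hypothetical helper, not Tika's code. Assumes '/' is added only when missing,
// and that an empty or null prefix leaves the fetch key unchanged.
final class KeyPrefix {
    private KeyPrefix() {}

    static String normalize(String prefix) {
        if (prefix == null || prefix.isEmpty()) {
            return "";
        }
        return prefix.endsWith("/") ? prefix : prefix + "/";
    }

    static String fullKey(String prefix, String fetchKey) {
        return normalize(prefix) + fetchKey;
    }
}
```

Under these assumptions, the prefix archive/2023 and the fetch key my_file.pdf resolve to the object key archive/2023/my_file.pdf.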
    • setExtractUserMetadata

      @Field public void setExtractUserMetadata(boolean extractUserMetadata)
      Whether or not to extract user metadata from the S3Object.
      Parameters:
      extractUserMetadata - true to extract the S3Object's user metadata
    • setMaxConnections

      @Field public void setMaxConnections(int maxConnections)
    • setCredentialsProvider

      @Field public void setCredentialsProvider(String credentialsProvider)
    • setMaxLength

      @Field public void setMaxLength(long maxLength)
    • setSleepBeforeRetryMillis

      @Deprecated @Field public void setSleepBeforeRetryMillis(long sleepBeforeRetryMillis)
      Parameters:
      sleepBeforeRetryMillis - amount of time, in milliseconds, to sleep if there was a failure
    • setAccessKey

      @Field public void setAccessKey(String accessKey)
    • setSecretKey

      @Field public void setSecretKey(String secretKey)
    • initialize

      public void initialize(Map<String,Param> params) throws TikaConfigException
      Initializes the S3 client. Note that S3's RuntimeExceptions, e.g. AmazonClientException, are wrapped in a TikaConfigException.
      Specified by:
      initialize in interface Initializable
      Parameters:
      params - params to use for initialization
      Throws:
      TikaConfigException
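The wrapping described for initialize is a common pattern for turning unchecked client-construction errors into a checked configuration error. A minimal sketch, where ConfigException stands in for TikaConfigException and a Supplier stands in for building the real S3 client (both hypothetical, not Tika's code):

```java
import java.util.function.Supplier;

// Hypothetical stand-in for TikaConfigException.
class ConfigException extends Exception {
    ConfigException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical sketch of the wrapping pattern; the Supplier stands in for
// whatever code builds the real S3 client.
final class ClientInitializer {
    private ClientInitializer() {}

    static <C> C initialize(Supplier<C> clientBuilder) throws ConfigException {
        try {
            return clientBuilder.get();
        } catch (RuntimeException e) {
            // SDK errors such as AmazonClientException are unchecked;
            // surface them to the caller as a configuration error.
            throw new ConfigException("can't initialize s3 client", e);
        }
    }
}
```

Callers then handle a single checked exception type at configuration time, and the original SDK error remains available through getCause().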
    • checkInitialization

      public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
      Specified by:
      checkInitialization in interface Initializable
      Parameters:
      problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
      Throws:
      TikaConfigException
    • setEndpointConfigurationService

      @Field public void setEndpointConfigurationService(String endpointConfigurationService)
    • setPathStyleAccessEnabled

      @Field public void setPathStyleAccessEnabled(boolean pathStyleAccessEnabled)
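setPathStyleAccessEnabled selects between S3's two URL addressing styles: path-style places the bucket in the URL path, while virtual-hosted style places it in the host name. Path-style is commonly needed for S3-compatible services reached through a custom endpoint, which setEndpointConfigurationService presumably configures. The sketch below uses a hypothetical S3Addressing helper, not AWS SDK code, to show where the bucket lands in each style:

```java
// Hypothetical helper, not Tika or AWS SDK code: builds an object URL in
// either path-style or virtual-hosted style for a given endpoint.
final class S3Addressing {
    private S3Addressing() {}

    static String objectUrl(String endpoint, String bucket, String key, boolean pathStyle) {
        String base = endpoint.endsWith("/") ? endpoint.substring(0, endpoint.length() - 1) : endpoint;
        if (pathStyle) {
            // scheme://endpoint/bucket/key
            return base + "/" + bucket + "/" + key;
        }
        // scheme://bucket.endpoint/key : the bucket becomes part of the host name.
        int schemeEnd = base.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("endpoint needs a scheme: " + endpoint);
        }
        String scheme = base.substring(0, schemeEnd + 3);
        String host = base.substring(schemeEnd + 3);
        return scheme + bucket + "." + host + "/" + key;
    }
}
```

The example bucket is my-bucket rather than my_bucket because a bucket name used in a host name must be DNS-compatible.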