Package org.apache.tika.pipes.fetcher.s3
Class S3Fetcher
java.lang.Object
org.apache.tika.pipes.fetcher.AbstractFetcher
org.apache.tika.pipes.fetcher.s3.S3Fetcher
- All Implemented Interfaces:
Initializable, Fetcher, RangeFetcher
Fetches files from S3. For example, for the file s3://my_bucket/path/to/my_file.pdf,
the bucket ("my_bucket") must be specified via the tika-config or before
initialization, and the fetch key is "path/to/my_file.pdf".
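To make the split between bucket and fetch key concrete, here is a minimal sketch using a hypothetical helper, S3UriSplitter (not part of Tika), that derives both values from an s3:// URI. It only illustrates the mapping described above; the real S3Fetcher receives just the fetch key, with the bucket coming from configuration.

```java
// Hypothetical helper, not part of Tika: shows how an s3:// URI divides into
// the bucket (configured on S3Fetcher) and the fetch key (passed to fetch()).
final class S3UriSplitter {

    private static final String SCHEME = "s3://";

    /** Returns {bucket, fetchKey} for a URI such as s3://my_bucket/path/to/my_file.pdf. */
    static String[] split(String s3Uri) {
        if (!s3Uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("not an s3:// URI: " + s3Uri);
        }
        String rest = s3Uri.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        // Require a non-empty bucket and a non-empty key after the first '/'.
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("expected s3://bucket/key: " + s3Uri);
        }
        return new String[] {rest.substring(0, slash), rest.substring(slash + 1)};
    }

    public static void main(String[] args) {
        String[] parts = split("s3://my_bucket/path/to/my_file.pdf");
        System.out.println("bucket   = " + parts[0]); // bucket   = my_bucket
        System.out.println("fetchKey = " + parts[1]); // fetchKey = path/to/my_file.pdf
    }
}
```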
Constructor Summary
Constructors
S3Fetcher()
Method Summary
- void checkInitialization(InitializableProblemHandler problemHandler)
- InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext)
- InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext)
- long[] getThrottleSeconds()
- void initialize(Map<String, Param> params): This initializes the S3 client.
- void setAccessKey(String accessKey)
- void setBucket(String bucket)
- void setCredentialsProvider(String credentialsProvider)
- void setEndpointConfigurationService(String endpointConfigurationService)
- void setExtractUserMetadata(boolean extractUserMetadata): Whether or not to extract user metadata from the S3Object.
- void setMaxConnections(int maxConnections)
- void setMaxLength(long maxLength)
- void setPathStyleAccessEnabled(boolean pathStyleAccessEnabled)
- void setPrefix(String prefix): Prefix to prepend to the fetch key before fetching.
- void setProfile(String profile)
- void setRegion(String region)
- void setSecretKey(String secretKey)
- void setSleepBeforeRetryMillis(long sleepBeforeRetryMillis): Deprecated.
- void setSpoolToTemp(boolean spoolToTemp)
- void setThrottleSeconds(long[] throttleSeconds)
- void setThrottleSeconds(String commaDelimitedLongs): Set seconds to throttle retries as a comma-delimited list, e.g. 30,60,120,600.
Methods inherited from class org.apache.tika.pipes.fetcher.AbstractFetcher
getName, setName
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.tika.pipes.fetcher.RangeFetcher
fetch
Constructor Details
S3Fetcher
public S3Fetcher()
Method Details
fetch
public InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext) throws TikaException, IOException
- Specified by: fetch in interface Fetcher
- Throws: TikaException, IOException
fetch
public InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext) throws TikaException, IOException
- Specified by: fetch in interface RangeFetcher
- Throws: TikaException, IOException
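A ranged fetch against S3 ultimately becomes an HTTP Range request. The sketch below shows the header such a request would carry, assuming (not confirmed by this page) that startRange and endRange are inclusive byte offsets, as in HTTP Range semantics. RangeHeaderSketch is a hypothetical helper, not Tika code.

```java
// Hypothetical helper, not part of Tika: formats the HTTP Range header value
// for a ranged GET, ASSUMING both offsets are inclusive byte positions.
final class RangeHeaderSketch {

    static String rangeHeader(long startRange, long endRange) {
        if (startRange < 0 || endRange < startRange) {
            throw new IllegalArgumentException("invalid range: " + startRange + "-" + endRange);
        }
        return "bytes=" + startRange + "-" + endRange;
    }

    // Number of bytes covered when both ends are inclusive.
    static long length(long startRange, long endRange) {
        return endRange - startRange + 1;
    }

    public static void main(String[] args) {
        System.out.println(rangeHeader(0, 1023)); // bytes=0-1023
        System.out.println(length(0, 1023));      // 1024
    }
}
```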
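setSpoolToTemp
public void setSpoolToTemp(boolean spoolToTemp)
setRegion
public void setRegion(String region)
setProfile
public void setProfile(String profile)
setBucket
public void setBucket(String bucket)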
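setThrottleSeconds
public void setThrottleSeconds(String commaDelimitedLongs) throws TikaConfigException
Set seconds to throttle retries as a comma-delimited list, e.g. 30,60,120,600.
- Parameters: commaDelimitedLongs - comma-delimited list of seconds
- Throws: TikaConfigException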
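A minimal sketch of turning a value such as "30,60,120,600" into the long[] accepted by the other setThrottleSeconds overload. ThrottleParser is hypothetical; IllegalArgumentException stands in for TikaConfigException so the example compiles without Tika, and trimming whitespace and rejecting negative values are assumptions, not documented behavior.

```java
import java.util.Arrays;

// Hypothetical parser, not Tika's implementation. IllegalArgumentException
// stands in for TikaConfigException.
final class ThrottleParser {

    static long[] parse(String commaDelimitedLongs) {
        String[] parts = commaDelimitedLongs.split(",");
        long[] seconds = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i].trim(); // assumption: surrounding spaces are tolerated
            try {
                seconds[i] = Long.parseLong(p);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("not a long: '" + p + "'", e);
            }
            if (seconds[i] < 0) { // assumption: negative waits are rejected
                throw new IllegalArgumentException("negative throttle value: " + p);
            }
        }
        return seconds;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("30,60,120,600"))); // [30, 60, 120, 600]
    }
}
```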
setThrottleSeconds
public void setThrottleSeconds(long[] throttleSeconds)
getThrottleSeconds
public long[] getThrottleSeconds()
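This page does not spell out how the throttle array drives retries. One plausible reading, sketched here as an assumption, is one initial attempt followed by one retry per entry, sleeping throttleSeconds[i] before retry i. ThrottledRetry is hypothetical and is not the fetcher's actual retry loop.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical retry loop, not Tika's code. ASSUMPTION: one initial attempt,
// then one retry per entry, waiting throttleSeconds[i] seconds before retry i.
final class ThrottledRetry {

    static <T> T call(Callable<T> action, long[] throttleSeconds) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= throttleSeconds.length; attempt++) {
            if (attempt > 0) {
                TimeUnit.SECONDS.sleep(throttleSeconds[attempt - 1]);
            }
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again if budget remains
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("transient failure " + calls[0]);
            }
            return "ok after " + calls[0] + " attempts";
        }, new long[] {0, 0, 0}); // zero-second waits keep the demo fast
        System.out.println(result); // ok after 3 attempts
    }
}
```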
setPrefix
public void setPrefix(String prefix)
Prefix to prepend to the fetch key before fetching. A '/' is automatically added at the end of the prefix.
- Parameters: prefix - the key prefix
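The prefix rule can be sketched as follows. PrefixedKey is a hypothetical helper, and it assumes that a prefix which already ends in '/' is left unchanged rather than receiving a second slash.

```java
// Hypothetical helper, not part of Tika. ASSUMPTION: a prefix already ending
// in '/' is not given a second slash.
final class PrefixedKey {

    static String normalizePrefix(String prefix) {
        return prefix.endsWith("/") ? prefix : prefix + "/";
    }

    // The object key actually requested: normalized prefix + fetch key.
    static String apply(String prefix, String fetchKey) {
        return normalizePrefix(prefix) + fetchKey;
    }

    public static void main(String[] args) {
        System.out.println(apply("archive/2023", "path/to/my_file.pdf"));
        // archive/2023/path/to/my_file.pdf
    }
}
```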
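setExtractUserMetadata
public void setExtractUserMetadata(boolean extractUserMetadata)
Whether or not to extract user metadata from the S3Object.
- Parameters: extractUserMetadata - true to extract the object's user metadata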
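S3 user-defined metadata is the set of key/value pairs stored with an object (sent as x-amz-meta-* headers). The sketch below copies such a map into a plain result map when the flag is set. UserMetadataSketch and its "s3:" key prefix are hypothetical; this page does not say which keys Tika uses in its Metadata object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Tika's implementation. The "s3:" key prefix is
// an ASSUMPTION chosen for illustration.
final class UserMetadataSketch {

    static final String KEY_PREFIX = "s3:";

    static Map<String, String> extract(Map<String, String> userMetadata, boolean extractUserMetadata) {
        Map<String, String> out = new LinkedHashMap<>();
        if (!extractUserMetadata) {
            return out; // flag off: nothing is copied
        }
        for (Map.Entry<String, String> e : userMetadata.entrySet()) {
            out.put(KEY_PREFIX + e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> user = new LinkedHashMap<>();
        user.put("department", "legal");
        System.out.println(extract(user, true)); // {s3:department=legal}
    }
}
```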
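setMaxConnections
public void setMaxConnections(int maxConnections)
setCredentialsProvider
public void setCredentialsProvider(String credentialsProvider)
setMaxLength
public void setMaxLength(long maxLength)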
setSleepBeforeRetryMillis
public void setSleepBeforeRetryMillis(long sleepBeforeRetryMillis)
Deprecated.
- Parameters: sleepBeforeRetryMillis - amount of time in milliseconds to sleep if there was a failure
setAccessKey
public void setAccessKey(String accessKey)
setSecretKey
public void setSecretKey(String secretKey)
initialize
public void initialize(Map<String, Param> params) throws TikaConfigException
This initializes the S3 client. Note: we wrap S3's RuntimeExceptions (e.g. AmazonClientException) in a TikaConfigException.
- Specified by: initialize in interface Initializable
- Parameters: params - params to use for initialization
- Throws: TikaConfigException
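The wrapping described above converts unchecked client-construction failures into a checked configuration exception. Here is a minimal sketch of that pattern; ConfigException is a hypothetical stand-in for TikaConfigException, and the Supplier-based client factory is an illustrative assumption, not the fetcher's real construction code.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for TikaConfigException so the sketch compiles without Tika.
class ConfigException extends Exception {
    ConfigException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class ClientInitSketch {

    // Builds a client via the supplied factory; any RuntimeException it throws
    // (the role AmazonClientException plays in the real fetcher) is wrapped.
    static <C> C initialize(Supplier<C> clientFactory) throws ConfigException {
        try {
            return clientFactory.get();
        } catch (RuntimeException e) {
            throw new ConfigException("can't initialize s3 client", e);
        }
    }

    public static void main(String[] args) {
        try {
            initialize(() -> { throw new IllegalStateException("bad credentials"); });
        } catch (ConfigException e) {
            System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
            // can't initialize s3 client <- bad credentials
        }
    }
}
```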
checkInitialization
public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
- Specified by: checkInitialization in interface Initializable
- Parameters: problemHandler - if there is a problem and no custom InitializableProblemHandler has been configured via Initializable parameters, this is called to respond.
- Throws: TikaConfigException
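Since the class description states that the bucket must be specified, a natural post-initialization check is that it is non-empty. The sketch below routes such a problem to a handler, in the spirit of the problemHandler parameter. ProblemHandler and CheckInitSketch are hypothetical stand-ins (the real InitializableProblemHandler may differ), and checking only the bucket is an assumption about what S3Fetcher validates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for InitializableProblemHandler; not Tika's interface.
interface ProblemHandler {
    void handle(String message);
}

final class CheckInitSketch {

    // A handler that fails fast, analogous to a "throw" policy.
    static final ProblemHandler THROW = msg -> { throw new IllegalStateException(msg); };

    // ASSUMPTION: only the bucket is validated here, for illustration.
    static void checkInitialization(String bucket, ProblemHandler handler) {
        if (bucket == null || bucket.isBlank()) {
            handler.handle("'bucket' must be set before initialization");
        }
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        checkInitialization("my_bucket", warnings::add); // no problem reported
        checkInitialization("", warnings::add);          // reported to the handler
        System.out.println(warnings); // ['bucket' must be set before initialization]
    }
}
```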
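setEndpointConfigurationService
public void setEndpointConfigurationService(String endpointConfigurationService)
setPathStyleAccessEnabled
public void setPathStyleAccessEnabled(boolean pathStyleAccessEnabled)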
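S3 supports two addressing styles: virtual-hosted (bucket in the host name) and path-style (bucket as the first path segment). Path-style is often required by S3-compatible services reached through a custom endpoint. The hypothetical helper below only contrasts the two URL shapes; the endpoint and bucket names are examples, and key URL-encoding is deliberately omitted.

```java
// Hypothetical helper, not part of Tika: contrasts the two S3 addressing styles.
// Key URL-encoding is omitted for brevity.
final class S3AddressingSketch {

    static String objectUrl(String endpoint, String bucket, String key, boolean pathStyleAccessEnabled) {
        return pathStyleAccessEnabled
                ? "https://" + endpoint + "/" + bucket + "/" + key   // path-style
                : "https://" + bucket + "." + endpoint + "/" + key;  // virtual-hosted
    }

    public static void main(String[] args) {
        String endpoint = "s3.us-east-1.amazonaws.com";
        System.out.println(objectUrl(endpoint, "my-bucket", "path/to/my_file.pdf", true));
        // https://s3.us-east-1.amazonaws.com/my-bucket/path/to/my_file.pdf
        System.out.println(objectUrl(endpoint, "my-bucket", "path/to/my_file.pdf", false));
        // https://my-bucket.s3.us-east-1.amazonaws.com/path/to/my_file.pdf
    }
}
```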