Class HttpFetcher
java.lang.Object
org.apache.tika.pipes.fetcher.AbstractFetcher
org.apache.tika.pipes.fetcher.http.HttpFetcher
- All Implemented Interfaces:
Initializable, Fetcher, RangeFetcher
Based on Apache httpclient
-
Method Summary
Modifier and Type, Method, Description
void checkInitialization(InitializableProblemHandler problemHandler)
InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext)
InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext)
org.apache.http.client.HttpClient getHttpClient()
void initialize(Map<String, Param> params)
static Map<String, Collection<String>> parseHeaders(String headersString)
void setAuthScheme(String authScheme)
void setConnectTimeout(int connectTimeout)
void setHttpClient(org.apache.http.client.HttpClient httpClient)
void setHttpClientFactory(HttpClientFactory httpClientFactory)
void setHttpFetcherConfig(HttpFetcherConfig httpFetcherConfig)
void setHttpHeaders(List<String> headers) - Which HTTP headers to capture in the metadata.
void setHttpRequestHeaders(List<String> headers) - The HTTP request headers to send with each HTTP fetch request.
void setJwtExpiresInSeconds(int jwtExpiresInSeconds)
void setJwtIssuer(String jwtIssuer)
void setJwtPrivateKeyBase64(String jwtPrivateKeyBase64)
void setJwtSecret(String jwtSecret)
void setJwtSubject(String jwtSubject)
void setMaxConnections(int maxConnections)
void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
void setMaxErrMsgSize(int maxErrMsgSize)
void setMaxRedirects(int maxRedirects)
void setMaxSpoolSize(long maxSpoolSize) - Set the maximum number of bytes to spool to a temp file.
void setNtDomain(String domain)
void setOverallTimeout(long overallTimeout) - Sets an overall timeout on the request.
void setPassword(String password)
void setProxyHost(String proxyHost)
void setProxyPort(int proxyPort)
void setRequestTimeout(int requestTimeout)
void setSocketTimeout(int socketTimeout)
void setUserAgent(String userAgent) - The User-Agent to send with the request.
void setUserName(String userName)
Methods inherited from class org.apache.tika.pipes.fetcher.AbstractFetcher
getName, setName
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.tika.pipes.fetcher.RangeFetcher
fetch
-
Field Details
-
HTTP_HEADER_PREFIX
-
HTTP_FETCH_PREFIX
-
HTTP_STATUS_CODE
HTTP status code
-
HTTP_NUM_REDIRECTS
Number of redirects
-
HTTP_TARGET_URL
If there were redirects, this captures the final URL visited
-
HTTP_TARGET_IP_ADDRESS
-
HTTP_FETCH_TRUNCATED
-
HTTP_CONTENT_ENCODING
-
HTTP_CONTENT_TYPE
-
-
Constructor Details
-
HttpFetcher
public HttpFetcher()
-
HttpFetcher
-
-
Method Details
-
fetch
public InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
- Specified by:
fetch in interface Fetcher
- Throws:
IOException
TikaException
-
fetch
public InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
- Specified by:
fetch in interface RangeFetcher
- Throws:
IOException
TikaException
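The range variant presumably maps onto an HTTP range request. The following is a minimal sketch, assuming startRange and endRange are inclusive byte offsets in the sense of the HTTP `Range: bytes=first-last` syntax; this page does not confirm that reading. The names RangeHeaderSketch, rangeHeaderValue and expectedLength are hypothetical and not part of Tika.

```java
final class RangeHeaderSketch {

    /** Builds the value of an HTTP Range header for an inclusive byte range (assumed semantics). */
    static String rangeHeaderValue(long startRange, long endRange) {
        if (startRange < 0 || endRange < startRange) {
            throw new IllegalArgumentException(
                    "invalid range: " + startRange + "-" + endRange);
        }
        return "bytes=" + startRange + "-" + endRange;
    }

    /** Number of bytes a server would return for the inclusive range, if it honors it in full. */
    static long expectedLength(long startRange, long endRange) {
        return endRange - startRange + 1;
    }

    public static void main(String[] args) {
        // Ask for the first kibibyte of a resource.
        System.out.println("Range: " + rangeHeaderValue(0, 1023));
        System.out.println("Expected bytes: " + expectedLength(0, 1023));
    }
}
```

Because both offsets are inclusive under this assumption, a request for 0 to 1023 covers 1024 bytes.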
-
setUserName
-
setPassword
-
setNtDomain
-
setAuthScheme
-
setProxyHost
-
setProxyPort
-
setConnectTimeout
-
setRequestTimeout
-
setSocketTimeout
-
setMaxConnections
-
setMaxConnectionsPerRoute
-
setMaxSpoolSize
Set the maximum number of bytes to spool to a temp file. If this value is -1, the full stream will be spooled to a temp file. Default size is -1.
- Parameters:
maxSpoolSize
-
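The spool limit described above can be illustrated with a small, self-contained sketch. It assumes that a non-negative limit simply stops copying once the limit is reached and records that the stream was cut short (the HTTP_FETCH_TRUNCATED field suggests something similar, but this page does not spell out the behavior). SpoolSketch and its members are hypothetical names, not Tika API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SpoolSketch {

    /** Where the bytes went, how many were written, and whether the limit cut the stream short. */
    static final class Result {
        final Path file;
        final long bytesWritten;
        final boolean truncated;

        Result(Path file, long bytesWritten, boolean truncated) {
            this.file = file;
            this.bytesWritten = bytesWritten;
            this.truncated = truncated;
        }
    }

    /** Copies the stream to a temp file; maxSpoolSize of -1 means no limit. */
    static Result spool(InputStream in, long maxSpoolSize) {
        try {
            Path tmp = Files.createTempFile("spool-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            long written = 0;
            boolean truncated = false;
            byte[] buf = new byte[8192];
            try (OutputStream out = Files.newOutputStream(tmp)) {
                while (true) {
                    int toRead = buf.length;
                    if (maxSpoolSize >= 0) {
                        long remaining = maxSpoolSize - written;
                        if (remaining <= 0) {
                            // Limit reached: the result is truncated only if more data exists.
                            truncated = in.read() != -1;
                            break;
                        }
                        toRead = (int) Math.min(buf.length, remaining);
                    }
                    int n = in.read(buf, 0, toRead);
                    if (n == -1) {
                        break; // end of stream
                    }
                    out.write(buf, 0, n);
                    written += n;
                }
            }
            return new Result(tmp, written, truncated);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = spool(new ByteArrayInputStream(new byte[10]), 4);
        System.out.println(r.bytesWritten + " bytes spooled, truncated=" + r.truncated);
    }
}
```

With a limit of -1 the loop never consults the limit, so the whole stream is written, which matches the documented default.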
-
setMaxRedirects
-
setHttpRequestHeaders
The HTTP request headers to send with each HTTP fetch request.
- Parameters:
headers
- The headers to add to the HTTP GET requests.
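This page does not state how each entry in the headers list is written. The sketch below assumes each string has the conventional `Name: value` form and shows one way such strings could be split before being attached to a GET request. RequestHeaderSketch and split are hypothetical names, not Tika API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class RequestHeaderSketch {

    /** Splits "Name: value" strings at the first colon; entries without a name are skipped. */
    static List<Map.Entry<String, String>> split(List<String> headers) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String header : headers) {
            int colon = header.indexOf(':');
            if (colon <= 0) {
                continue; // no colon, or an empty header name
            }
            String name = header.substring(0, colon).trim();
            String value = header.substring(colon + 1).trim();
            out.add(new AbstractMap.SimpleImmutableEntry<>(name, value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> configured = Arrays.asList("Accept-Language: en", "X-Time: 12:30");
        for (Map.Entry<String, String> e : split(configured)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Splitting at the first colon only keeps values that themselves contain colons, such as times or URLs, intact.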
-
parseHeaders
-
setHttpHeaders
Which HTTP headers to capture in the metadata. Keys will be prepended with HTTP_HEADER_PREFIX.
- Parameters:
headers
-
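The key prefixing described above can be sketched as follows. The actual value of HTTP_HEADER_PREFIX is not shown on this page, so the example uses a placeholder prefix; the case-insensitive matching of header names is also an assumption. HeaderCaptureSketch and capture are hypothetical names, not Tika API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HeaderCaptureSketch {

    /** Copies only the wanted headers into a new map, prepending the prefix to each key. */
    static Map<String, List<String>> capture(Map<String, List<String>> responseHeaders,
                                             List<String> wanted,
                                             String prefix) {
        // HTTP header names are case-insensitive, so index them that way (assumed behavior).
        TreeMap<String, List<String>> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, List<String>> e : responseHeaders.entrySet()) {
            if (e.getKey() != null) {
                byName.put(e.getKey(), e.getValue());
            }
        }
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (String name : wanted) {
            List<String> values = byName.get(name);
            if (values != null) {
                metadata.put(prefix + name, new ArrayList<>(values));
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, List<String>> response = new HashMap<>();
        response.put("Content-Type", Collections.singletonList("text/html"));
        response.put("ETag", Collections.singletonList("abc"));
        // "example-prefix:" is a placeholder, not the real HTTP_HEADER_PREFIX value.
        System.out.println(capture(response, Collections.singletonList("ETag"), "example-prefix:"));
    }
}
```

Headers that are not in the wanted list never reach the metadata, which keeps the captured set small and predictable.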
-
setOverallTimeout
Sets an overall timeout on the request. If a server is very slow or the file is very large, the other timeouts might not be triggered.
- Parameters:
overallTimeout
-
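Connect and socket timeouts only bound individual phases or gaps between packets, so a server that keeps trickling data can hold a request open indefinitely. One generic way to impose a deadline on the whole operation is sketched below with a Future; this is not necessarily how HttpFetcher implements its overall timeout. OverallTimeoutSketch and callWithDeadline are hypothetical names, not Tika API.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class OverallTimeoutSketch {

    /** Runs the task and returns its result, or an empty Optional if the deadline passes first. */
    static <T> Optional<T> callWithDeadline(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the task must respond to interruption
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulates a slow server: the task would take 2 seconds, the deadline is 50 ms.
        Optional<String> result = callWithDeadline(() -> {
            Thread.sleep(2000);
            return "done";
        }, 50);
        System.out.println(result.isPresent() ? result.get() : "timed out");
    }
}
```

The deadline covers the entire call regardless of how many bytes arrive in the meantime, which is the gap the per-phase timeouts leave open.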
-
setMaxErrMsgSize
-
setUserAgent
The User-Agent to send with the request. By default, httpclient adds a value such as "Apache-HttpClient/4.5.13 (Java/x.y.z)".
- Parameters:
userAgent
-
-
setJwtIssuer
-
setJwtSubject
-
setJwtExpiresInSeconds
-
setJwtSecret
-
setJwtPrivateKeyBase64
-
initialize
- Specified by:
initialize in interface Initializable
- Parameters:
params - params to use for initialization
- Throws:
TikaConfigException
-
checkInitialization
public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
- Specified by:
checkInitialization in interface Initializable
- Parameters:
problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
- Throws:
TikaConfigException
-
setHttpClientFactory
-
setHttpClient
public void setHttpClient(org.apache.http.client.HttpClient httpClient)
-
getHttpClient
public org.apache.http.client.HttpClient getHttpClient()
-
getHttpFetcherConfig
-
setHttpFetcherConfig
-