Class HttpFetcher
java.lang.Object
org.apache.tika.pipes.fetcher.AbstractFetcher
org.apache.tika.pipes.fetcher.http.HttpFetcher
- All Implemented Interfaces:
Initializable, Fetcher, RangeFetcher
Based on Apache HttpClient
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type   Method   Description
void checkInitialization(InitializableProblemHandler problemHandler)
InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext)
InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext)
org.apache.http.client.HttpClient getHttpClient()
HttpFetcherConfig getHttpFetcherConfig()
void initialize(Map<String, Param> params)
static Map<String,Collection<String>> parseHeaders(String headersString)
void setAuthScheme(String authScheme)
void setConnectTimeout(int connectTimeout)
void setHttpClient(org.apache.http.client.HttpClient httpClient)
void setHttpClientFactory(HttpClientFactory httpClientFactory)
void setHttpFetcherConfig(HttpFetcherConfig httpFetcherConfig)
void setHttpHeaders(List<String> headers)
    Which http headers should we capture in the metadata.
void setHttpRequestHeaders(List<String> headers)
    Which http request headers should we send in the http fetch requests.
void setJwtExpiresInSeconds(int jwtExpiresInSeconds)
void setJwtIssuer(String jwtIssuer)
void setJwtPrivateKeyBase64(String jwtPrivateKeyBase64)
void setJwtSecret(String jwtSecret)
void setJwtSubject(String jwtSubject)
void setMaxConnections(int maxConnections)
void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
void setMaxErrMsgSize(int maxErrMsgSize)
void setMaxRedirects(int maxRedirects)
void setMaxSpoolSize(long maxSpoolSize)
    Set the maximum number of bytes to spool to a temp file.
void setNtDomain(String domain)
void setOverallTimeout(long overallTimeout)
    This sets an overall timeout on the request.
void setPassword(String password)
void setProxyHost(String proxyHost)
void setProxyPort(int proxyPort)
void setRequestTimeout(int requestTimeout)
void setSocketTimeout(int socketTimeout)
void setUserAgent(String userAgent)
    When making the request, what User-Agent is sent in the request.
void setUserName(String userName)
Methods inherited from class org.apache.tika.pipes.fetcher.AbstractFetcher
getName, setName
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.tika.pipes.fetcher.RangeFetcher
fetch
-
Field Details
-
HTTP_HEADER_PREFIX
-
HTTP_FETCH_PREFIX
-
HTTP_STATUS_CODE
HTTP status code -
HTTP_NUM_REDIRECTS
Number of redirects -
HTTP_TARGET_URL
If there were redirects, this captures the final URL visited -
HTTP_TARGET_IP_ADDRESS
-
HTTP_FETCH_TRUNCATED
-
HTTP_CONTENT_ENCODING
-
HTTP_CONTENT_TYPE
-
-
Constructor Details
-
HttpFetcher
public HttpFetcher() -
HttpFetcher
-
-
Method Details
-
fetch
public InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
- Specified by:
fetch in interface Fetcher
- Throws:
IOException
TikaException
-
fetch
public InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
- Specified by:
fetch in interface RangeFetcher
- Throws:
IOException
TikaException
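The range overload takes a start and an end offset, which presumably map onto an HTTP Range request. The sketch below shows how such a header value is formed under standard HTTP byte-range semantics. It assumes both offsets are zero-based and inclusive, which this page does not state, and RangeHeaderSketch and rangeHeaderValue are hypothetical names that are not part of Tika.

```java
// Hypothetical helper, not part of Tika: builds the value of an HTTP Range header.
final class RangeHeaderSketch {

    // Assumption: both offsets are zero-based and inclusive,
    // as in the HTTP "bytes" range unit.
    static String rangeHeaderValue(long startRange, long endRange) {
        if (startRange < 0 || endRange < startRange) {
            throw new IllegalArgumentException(
                    "invalid range: " + startRange + "-" + endRange);
        }
        return "bytes=" + startRange + "-" + endRange;
    }

    public static void main(String[] args) {
        // Header line that would request the first 1024 bytes of a resource.
        System.out.println("Range: " + rangeHeaderValue(0, 1023));
    }
}
```

Under these assumptions, a call such as fetch(fetchKey, 0, 1023, metadata, parseContext) would correspond to a request for the first 1024 bytes.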
-
setUserName
-
setPassword
-
setNtDomain
-
setAuthScheme
-
setProxyHost
-
setProxyPort
-
setConnectTimeout
-
setRequestTimeout
-
setSocketTimeout
-
setMaxConnections
-
setMaxConnectionsPerRoute
-
setMaxSpoolSize
Set the maximum number of bytes to spool to a temp file. If this value is -1, the full stream will be spooled to a temp file. Default size is -1.
- Parameters:
maxSpoolSize - the maximum number of bytes to spool, or -1 to spool the full stream
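To make the -1 convention concrete, here is a minimal sketch of a spool limit written with the JDK only. It is an illustration of the documented behavior, not Tika's implementation; SpoolLimitSketch and spool are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of a spool limit; not Tika's implementation.
final class SpoolLimitSketch {

    // Copies at most maxSpoolSize bytes of the stream to a temp file.
    // Assumption: a negative value such as -1 means "no limit",
    // mirroring the documented default.
    static Path spool(InputStream in, long maxSpoolSize) throws IOException {
        Path tmp = Files.createTempFile("spool-sketch", ".bin");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            long remaining = maxSpoolSize < 0 ? Long.MAX_VALUE : maxSpoolSize;
            while (remaining > 0) {
                int toRead = (int) Math.min(buffer.length, remaining);
                int read = in.read(buffer, 0, toRead);
                if (read < 0) {
                    break; // end of stream reached before the limit
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        Path limited = spool(new ByteArrayInputStream(data), 10);
        Path full = spool(new ByteArrayInputStream(data), -1);
        System.out.println(Files.size(limited) + " bytes vs " + Files.size(full) + " bytes");
        Files.delete(limited);
        Files.delete(full);
    }
}
```

A caller that hits the limit before the end of the stream has a truncated copy; the HTTP_FETCH_TRUNCATED field listed above suggests the fetcher records that condition in the metadata, though this page does not describe how.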
-
setMaxRedirects
-
setHttpRequestHeaders
Which http request headers should we send in the http fetch requests.
- Parameters:
headers - The headers to add to the HTTP GET requests.
-
parseHeaders
-
setHttpHeaders
Which http headers should we capture in the metadata. Keys will be prepended with HTTP_HEADER_PREFIX
- Parameters:
headers - the names of the headers to capture
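The capture rule described above can be sketched as a filter over the response headers. This is an illustration only, not Tika's implementation: HeaderCaptureSketch and capture are hypothetical names, PREFIX is a placeholder rather than the real value of HTTP_HEADER_PREFIX, and case-insensitive name matching is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Tika's implementation.
final class HeaderCaptureSketch {

    // Placeholder value: the real HTTP_HEADER_PREFIX constant is defined on HttpFetcher.
    static final String PREFIX = "example-prefix:";

    // Keeps only the response headers named in headersToCapture and
    // prepends PREFIX to each key. Assumption: names match case-insensitively.
    static Map<String, String> capture(Map<String, String> responseHeaders,
                                       List<String> headersToCapture) {
        Map<String, String> captured = new LinkedHashMap<>();
        for (String wanted : headersToCapture) {
            for (Map.Entry<String, String> e : responseHeaders.entrySet()) {
                if (e.getKey().equalsIgnoreCase(wanted)) {
                    captured.put(PREFIX + wanted, e.getValue());
                }
            }
        }
        return captured;
    }

    public static void main(String[] args) {
        Map<String, String> response = new LinkedHashMap<>();
        response.put("Content-Type", "text/html");
        response.put("Server", "example");
        // Only Content-Type is requested, so Server is dropped.
        System.out.println(capture(response, Arrays.asList("Content-Type")));
    }
}
```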
-
setOverallTimeout
This sets an overall timeout on the request. If a server is very slow or the file is very large, the other timeouts might not be triggered.
- Parameters:
overallTimeout - the overall timeout for the request
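The reason an overall timeout is needed can be shown with the JDK alone: a transfer that delivers a little data at regular intervals never trips a per-read timeout, yet the whole request can still take arbitrarily long. The sketch below bounds total time with Future.get. It is not how HttpFetcher implements the setting; OverallTimeoutSketch and callWithOverallTimeout are hypothetical names, and milliseconds are an assumed unit.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of an overall deadline; not Tika's implementation.
final class OverallTimeoutSketch {

    // Runs the task and returns its result, or throws TimeoutException if the
    // task has not finished within overallTimeoutMillis (assumed unit: ms).
    static <T> T callWithOverallTimeout(Callable<T> task, long overallTimeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(overallTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow task
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates a slow transfer: each "read" takes 50 ms, so a per-read
        // timeout of, say, 100 ms never fires, but the total is about 1 s.
        Callable<Integer> slowTransfer = () -> {
            int chunks = 0;
            for (int i = 0; i < 20; i++) {
                Thread.sleep(50);
                chunks++;
            }
            return chunks;
        };
        try {
            callWithOverallTimeout(slowTransfer, 200);
            System.out.println("finished in time");
        } catch (TimeoutException e) {
            System.out.println("overall timeout reached");
        }
    }
}
```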
-
setMaxErrMsgSize
-
setUserAgent
When making the request, what User-Agent is sent in the request. By default httpclient adds e.g. "Apache-HttpClient/4.5.13 (Java/x.y.z)"
- Parameters:
userAgent - the User-Agent value to send
-
setJwtIssuer
-
setJwtSubject
-
setJwtExpiresInSeconds
-
setJwtSecret
-
setJwtPrivateKeyBase64
-
initialize
- Specified by:
initialize in interface Initializable
- Parameters:
params - params to use for initialization
- Throws:
TikaConfigException
-
checkInitialization
public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
- Specified by:
checkInitialization in interface Initializable
- Parameters:
problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
- Throws:
TikaConfigException
-
setHttpClientFactory
-
setHttpClient
public void setHttpClient(org.apache.http.client.HttpClient httpClient) -
getHttpClient
public org.apache.http.client.HttpClient getHttpClient() -
getHttpFetcherConfig
-
setHttpFetcherConfig
-