Class HttpFetcher
java.lang.Object
  org.apache.tika.pipes.fetcher.AbstractFetcher
    org.apache.tika.pipes.fetcher.http.HttpFetcher
- All Implemented Interfaces:
Initializable, Fetcher, RangeFetcher
public class HttpFetcher extends AbstractFetcher implements Initializable, RangeFetcher
Based on Apache httpclient
-
-
Field Summary
Fields (modifier and type, field, description)
static Property HTTP_CONTENT_ENCODING
static Property HTTP_CONTENT_TYPE
static String HTTP_FETCH_PREFIX
static Property HTTP_FETCH_TRUNCATED
static String HTTP_HEADER_PREFIX
static Property HTTP_NUM_REDIRECTS - Number of redirects
static Property HTTP_STATUS_CODE - HTTP status code
static Property HTTP_TARGET_IP_ADDRESS
static Property HTTP_TARGET_URL - If there were redirects, this captures the final URL visited
-
Constructor Summary
Constructors
HttpFetcher()
HttpFetcher(HttpFetcherConfig httpFetcherConfig)
-
Method Summary
Methods (modifier and type, method, description)
void checkInitialization(InitializableProblemHandler problemHandler)
InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext)
InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext)
org.apache.http.client.HttpClient getHttpClient()
HttpFetcherConfig getHttpFetcherConfig()
void initialize(Map<String,Param> params)
static Map<String,Collection<String>> parseHeaders(String headersString)
void setAuthScheme(String authScheme)
void setConnectTimeout(int connectTimeout)
void setHttpClient(org.apache.http.client.HttpClient httpClient)
void setHttpClientFactory(HttpClientFactory httpClientFactory)
void setHttpFetcherConfig(HttpFetcherConfig httpFetcherConfig)
void setHttpHeaders(List<String> headers) - Which HTTP headers should be captured in the metadata.
void setHttpRequestHeaders(List<String> headers) - Which HTTP request headers should be sent in the HTTP fetch requests.
void setJwtExpiresInSeconds(int jwtExpiresInSeconds)
void setJwtIssuer(String jwtIssuer)
void setJwtPrivateKeyBase64(String jwtPrivateKeyBase64)
void setJwtSecret(String jwtSecret)
void setJwtSubject(String jwtSubject)
void setMaxConnections(int maxConnections)
void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
void setMaxErrMsgSize(int maxErrMsgSize)
void setMaxRedirects(int maxRedirects)
void setMaxSpoolSize(long maxSpoolSize) - Set the maximum number of bytes to spool to a temp file.
void setNtDomain(String domain)
void setOverallTimeout(long overallTimeout) - This sets an overall timeout on the request.
void setPassword(String password)
void setProxyHost(String proxyHost)
void setProxyPort(int proxyPort)
void setRequestTimeout(int requestTimeout)
void setSocketTimeout(int socketTimeout)
void setUserAgent(String userAgent) - When making the request, what User-Agent is sent in the request.
void setUserName(String userName)
-
Methods inherited from class org.apache.tika.pipes.fetcher.AbstractFetcher
getName, setName
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.tika.pipes.fetcher.RangeFetcher
fetch
-
-
-
-
Field Detail
-
HTTP_HEADER_PREFIX
public static String HTTP_HEADER_PREFIX
-
HTTP_FETCH_PREFIX
public static String HTTP_FETCH_PREFIX
-
HTTP_STATUS_CODE
public static Property HTTP_STATUS_CODE
HTTP status code
-
HTTP_NUM_REDIRECTS
public static Property HTTP_NUM_REDIRECTS
Number of redirects
-
HTTP_TARGET_URL
public static Property HTTP_TARGET_URL
If there were redirects, this captures the final URL visited
-
HTTP_TARGET_IP_ADDRESS
public static Property HTTP_TARGET_IP_ADDRESS
-
HTTP_FETCH_TRUNCATED
public static Property HTTP_FETCH_TRUNCATED
-
HTTP_CONTENT_ENCODING
public static Property HTTP_CONTENT_ENCODING
-
HTTP_CONTENT_TYPE
public static Property HTTP_CONTENT_TYPE
-
-
Constructor Detail
-
HttpFetcher
public HttpFetcher()
-
HttpFetcher
public HttpFetcher(HttpFetcherConfig httpFetcherConfig)
-
-
Method Detail
-
fetch
public InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
- Specified by:
fetch in interface Fetcher
- Throws:
IOException
TikaException
-
fetch
public InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
- Specified by:
fetch in interface RangeFetcher
- Throws:
IOException
TikaException
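A ranged fetch over HTTP is normally expressed with a Range request header. The following is a minimal sketch of how startRange and endRange could map onto that header, using only the JDK. RangeHeaderSketch and rangeHeader are hypothetical names, not part of Tika, and treating endRange as an inclusive byte offset is an assumption; this page does not state how HttpFetcher interprets the two values.

```java
// Hypothetical helper, not part of Tika: builds an HTTP Range header value.
final class RangeHeaderSketch {

    // Assumption: both offsets are zero-based and endRange is inclusive,
    // which is how the HTTP "bytes=first-last" range syntax works.
    static String rangeHeader(long startRange, long endRange) {
        if (startRange < 0 || endRange < startRange) {
            throw new IllegalArgumentException(
                    "invalid range: " + startRange + "-" + endRange);
        }
        return "bytes=" + startRange + "-" + endRange;
    }

    public static void main(String[] args) {
        // Ask for the first 1024 bytes of a resource.
        System.out.println("Range: " + rangeHeader(0, 1023));
    }
}
```

A server that honors the header typically answers with status 206 (Partial Content); a server that ignores it returns the full body with status 200, so callers should not assume the returned stream is limited to the requested range.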
-
setProxyPort
@Field public void setProxyPort(int proxyPort)
-
setConnectTimeout
@Field public void setConnectTimeout(int connectTimeout)
-
setRequestTimeout
@Field public void setRequestTimeout(int requestTimeout)
-
setSocketTimeout
@Field public void setSocketTimeout(int socketTimeout)
-
setMaxConnections
@Field public void setMaxConnections(int maxConnections)
-
setMaxConnectionsPerRoute
@Field public void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
-
setMaxSpoolSize
@Field public void setMaxSpoolSize(long maxSpoolSize)
Set the maximum number of bytes to spool to a temp file. If this value is -1, the full stream will be spooled to a temp file. The default is -1.
- Parameters:
maxSpoolSize - the maximum number of bytes to spool, or -1 to spool the full stream
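The idea of a spool limit can be illustrated with plain JDK I/O. The sketch below copies at most maxSpoolSize bytes of a stream to a temp file and treats -1 as "no limit", matching the description above. SpoolSketch and spool are hypothetical names, not Tika code, and simply stopping at the limit is an assumption; this page does not describe what HttpFetcher does once the limit is reached.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika: bounded spooling to a temp file.
final class SpoolSketch {

    // Copies at most maxSpoolSize bytes from in to a new temp file.
    // A negative maxSpoolSize (e.g. -1) means the whole stream is copied.
    static Path spool(InputStream in, long maxSpoolSize) throws IOException {
        Path tmp = Files.createTempFile("spool-sketch", ".bin");
        byte[] buf = new byte[8192];
        long written = 0;
        try (OutputStream out = Files.newOutputStream(tmp)) {
            while (maxSpoolSize < 0 || written < maxSpoolSize) {
                int want = buf.length;
                if (maxSpoolSize >= 0) {
                    // Never read past the remaining budget.
                    want = (int) Math.min(want, maxSpoolSize - written);
                }
                int n = in.read(buf, 0, want);
                if (n < 0) {
                    break; // end of stream
                }
                out.write(buf, 0, n);
                written += n;
            }
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path p = spool(new ByteArrayInputStream(new byte[100]), 10);
        System.out.println("spooled bytes: " + Files.size(p));
        Files.deleteIfExists(p);
    }
}
```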
-
setMaxRedirects
@Field public void setMaxRedirects(int maxRedirects)
-
setHttpRequestHeaders
@Field public void setHttpRequestHeaders(List<String> headers)
Which HTTP request headers should be sent in the HTTP fetch requests.
- Parameters:
headers - The headers to add to the HTTP GET requests.
-
parseHeaders
public static Map<String,Collection<String>> parseHeaders(String headersString)
-
setHttpHeaders
@Field public void setHttpHeaders(List<String> headers)
Which HTTP headers should be captured in the metadata. Keys will be prepended with HTTP_HEADER_PREFIX.
- Parameters:
headers - the names of the HTTP headers to capture in the metadata
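The key-prefixing behavior described above can be sketched with plain maps. In this illustration, HeaderCaptureSketch and capture are hypothetical names, the response headers are modeled as a simple Map rather than httpclient or Tika Metadata objects, and the prefix is passed in as a parameter because the actual value of HTTP_HEADER_PREFIX is not shown on this page. Case-insensitive name matching is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: selects response headers by name
// and stores them under prefixed keys.
final class HeaderCaptureSketch {

    static Map<String, String> capture(Map<String, String> responseHeaders,
                                       List<String> wanted,
                                       String prefix) {
        Map<String, String> captured = new LinkedHashMap<>();
        for (String name : wanted) {
            for (Map.Entry<String, String> e : responseHeaders.entrySet()) {
                // Assumption: header names are matched case-insensitively.
                if (e.getKey().equalsIgnoreCase(name)) {
                    captured.put(prefix + name, e.getValue());
                }
            }
        }
        return captured;
    }

    public static void main(String[] args) {
        Map<String, String> response = new LinkedHashMap<>();
        response.put("Content-Type", "text/html");
        response.put("ETag", "abc");
        // "example-prefix:" is a placeholder, not the real HTTP_HEADER_PREFIX.
        System.out.println(capture(response, List.of("etag"), "example-prefix:"));
    }
}
```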
-
setOverallTimeout
@Field public void setOverallTimeout(long overallTimeout)
Sets an overall timeout on the request. If a server is very slow or the file is very large, the other timeouts might never be triggered.
- Parameters:
overallTimeout - the overall timeout for the request
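An overall timeout differs from connect and socket timeouts in that it bounds the whole operation, even when bytes keep trickling in. A minimal JDK-only sketch of that idea follows. OverallTimeoutSketch and callWithOverallTimeout are hypothetical names, milliseconds are an assumed unit (this page does not state the unit of overallTimeout), and running the work on a separate thread is one possible technique, not necessarily what HttpFetcher does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika: bounds the total time of a task.
final class OverallTimeoutSketch {

    // Runs task on a worker thread and gives up after overallTimeoutMillis,
    // no matter how much partial progress the task has made.
    static <T> T callWithOverallTimeout(Callable<T> task, long overallTimeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(overallTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            callWithOverallTimeout(() -> {
                Thread.sleep(5_000); // stands in for a very slow download
                return "done";
            }, 100);
        } catch (TimeoutException e) {
            System.out.println("gave up after the overall timeout");
        }
    }
}
```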
-
setMaxErrMsgSize
@Field public void setMaxErrMsgSize(int maxErrMsgSize)
-
setUserAgent
@Field public void setUserAgent(String userAgent)
Sets the User-Agent sent with the request. By default, httpclient sends a value such as "Apache-HttpClient/4.5.13 (Java/x.y.z)".
- Parameters:
userAgent - the User-Agent value to send
-
setJwtExpiresInSeconds
@Field public void setJwtExpiresInSeconds(int jwtExpiresInSeconds)
-
initialize
public void initialize(Map<String,Param> params) throws TikaConfigException
- Specified by:
initialize in interface Initializable
- Parameters:
params - params to use for initialization
- Throws:
TikaConfigException
-
checkInitialization
public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
- Specified by:
checkInitialization in interface Initializable
- Parameters:
problemHandler - if there is a problem and no custom InitializableProblemHandler has been configured via Initializable parameters, this is called to respond.
- Throws:
TikaConfigException
-
setHttpClientFactory
public void setHttpClientFactory(HttpClientFactory httpClientFactory)
-
setHttpClient
public void setHttpClient(org.apache.http.client.HttpClient httpClient)
-
getHttpClient
public org.apache.http.client.HttpClient getHttpClient()
-
getHttpFetcherConfig
public HttpFetcherConfig getHttpFetcherConfig()
-
setHttpFetcherConfig
public void setHttpFetcherConfig(HttpFetcherConfig httpFetcherConfig)
-
-