Class HttpFetcher
java.lang.Object
  org.apache.tika.pipes.fetcher.AbstractFetcher
    org.apache.tika.pipes.fetcher.http.HttpFetcher
All Implemented Interfaces: Initializable, Fetcher, RangeFetcher
public class HttpFetcher extends AbstractFetcher implements Initializable, RangeFetcher
Based on Apache HttpClient.
-
Field Summary
Fields (Modifier and Type, Field, Description):
static Property HTTP_CONTENT_ENCODING
static Property HTTP_CONTENT_TYPE
static String HTTP_FETCH_PREFIX
static Property HTTP_FETCH_TRUNCATED
static String HTTP_HEADER_PREFIX
static Property HTTP_NUM_REDIRECTS - Number of redirects
static Property HTTP_STATUS_CODE - http status code
static Property HTTP_TARGET_IP_ADDRESS
static Property HTTP_TARGET_URL - If there were redirects, this captures the final URL visited
-
Constructor Summary
Constructors:
HttpFetcher()
HttpFetcher(HttpFetcherConfig httpFetcherConfig)
-
Method Summary
Methods (Modifier and Type, Method, Description):
void checkInitialization(InitializableProblemHandler problemHandler)
InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext)
InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext)
org.apache.http.client.HttpClient getHttpClient()
HttpFetcherConfig getHttpFetcherConfig()
void initialize(Map<String,Param> params)
static Map<String,List<String>> parseHeaders(String headersString)
void setAuthScheme(String authScheme)
void setConnectTimeout(int connectTimeout)
void setHttpClient(org.apache.http.client.HttpClient httpClient)
void setHttpClientFactory(HttpClientFactory httpClientFactory)
void setHttpFetcherConfig(HttpFetcherConfig httpFetcherConfig)
void setHttpHeaders(List<String> headers) - Which http headers should we capture in the metadata.
void setHttpRequestHeaders(List<String> headers) - Which http request headers should we send in the http fetch requests.
void setJwtExpiresInSeconds(int jwtExpiresInSeconds)
void setJwtIssuer(String jwtIssuer)
void setJwtPrivateKeyBase64(String jwtPrivateKeyBase64)
void setJwtSecret(String jwtSecret)
void setJwtSubject(String jwtSubject)
void setMaxConnections(int maxConnections)
void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
void setMaxErrMsgSize(int maxErrMsgSize)
void setMaxRedirects(int maxRedirects)
void setMaxSpoolSize(long maxSpoolSize) - Set the maximum number of bytes to spool to a temp file.
void setNtDomain(String domain)
void setOverallTimeout(long overallTimeout) - This sets an overall timeout on the request.
void setPassword(String password)
void setProxyHost(String proxyHost)
void setProxyPort(int proxyPort)
void setRequestTimeout(int requestTimeout)
void setSocketTimeout(int socketTimeout)
void setUserAgent(String userAgent) - When making the request, what User-Agent is sent in the request.
void setUserName(String userName)
-
Methods inherited from class org.apache.tika.pipes.fetcher.AbstractFetcher
getName, setName
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.tika.pipes.fetcher.RangeFetcher
fetch
-
Field Detail
-
HTTP_HEADER_PREFIX
public static String HTTP_HEADER_PREFIX
-
HTTP_FETCH_PREFIX
public static String HTTP_FETCH_PREFIX
-
HTTP_STATUS_CODE
public static Property HTTP_STATUS_CODE
HTTP status code
-
HTTP_NUM_REDIRECTS
public static Property HTTP_NUM_REDIRECTS
Number of redirects
-
HTTP_TARGET_URL
public static Property HTTP_TARGET_URL
If there were redirects, this captures the final URL visited
-
HTTP_TARGET_IP_ADDRESS
public static Property HTTP_TARGET_IP_ADDRESS
-
HTTP_FETCH_TRUNCATED
public static Property HTTP_FETCH_TRUNCATED
-
HTTP_CONTENT_ENCODING
public static Property HTTP_CONTENT_ENCODING
-
HTTP_CONTENT_TYPE
public static Property HTTP_CONTENT_TYPE
-
-
Constructor Detail
-
HttpFetcher
public HttpFetcher()
-
HttpFetcher
public HttpFetcher(HttpFetcherConfig httpFetcherConfig)
-
-
Method Detail
-
fetch
public InputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
- Specified by: fetch in interface Fetcher
- Throws:
IOException
TikaException
-
fetch
public InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
- Specified by: fetch in interface RangeFetcher
- Throws:
IOException
TikaException
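This page does not say how the range overload maps startRange and endRange onto the HTTP request. The sketch below shows one common way a byte range is expressed, as an HTTP Range header with inclusive offsets. It uses the JDK's java.net.http classes rather than Apache HttpClient, and the class name RangeRequestSketch, the method rangeRequest, and the inclusive treatment of endRange are assumptions made for illustration, not the HttpFetcher implementation.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RangeRequestSketch {

    /**
     * Builds a GET request that asks for bytes startRange..endRange.
     * Assumption: both offsets are inclusive, as in the HTTP Range header syntax.
     */
    public static HttpRequest rangeRequest(String url, long startRange, long endRange) {
        if (startRange < 0 || endRange < startRange) {
            throw new IllegalArgumentException(
                    "invalid range: " + startRange + "-" + endRange);
        }
        return HttpRequest.newBuilder(URI.create(url))
                // e.g. "bytes=0-99" requests the first 100 bytes
                .header("Range", "bytes=" + startRange + "-" + endRange)
                .GET()
                .build();
    }
}
```

Building the request does not contact the server, so the header can be inspected before anything is sent.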
-
setProxyPort
@Field public void setProxyPort(int proxyPort)
-
setConnectTimeout
@Field public void setConnectTimeout(int connectTimeout)
-
setRequestTimeout
@Field public void setRequestTimeout(int requestTimeout)
-
setSocketTimeout
@Field public void setSocketTimeout(int socketTimeout)
-
setMaxConnections
@Field public void setMaxConnections(int maxConnections)
-
setMaxConnectionsPerRoute
@Field public void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
-
setMaxSpoolSize
@Field public void setMaxSpoolSize(long maxSpoolSize)
Set the maximum number of bytes to spool to a temp file. If this value is -1, the full stream will be spooled to a temp file. The default is -1.
- Parameters:
maxSpoolSize
-
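To make the meaning of the limit concrete, here is a minimal sketch of spooling a stream to a temp file with a byte cap, where -1 means no cap. It is written against the JDK only. The class SpoolSketch, its Spooled result type, and the choice to stop copying and report truncation once the cap is reached are assumptions made for illustration; this page does not describe what HttpFetcher does when the limit is hit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpoolSketch {

    /** Where the bytes went, how many were written, and whether the copy stopped early. */
    public static final class Spooled {
        public final Path file;
        public final long bytesWritten;
        public final boolean truncated;

        Spooled(Path file, long bytesWritten, boolean truncated) {
            this.file = file;
            this.bytesWritten = bytesWritten;
            this.truncated = truncated;
        }
    }

    /**
     * Copies the stream to a temp file, writing at most maxSpoolSize bytes.
     * A value of -1 spools the full stream (this sketch treats any negative value that way).
     */
    public static Spooled spool(InputStream in, long maxSpoolSize) {
        try {
            Path tmp = Files.createTempFile("spool-sketch-", ".bin");
            tmp.toFile().deleteOnExit();
            long written = 0;
            boolean truncated = false;
            byte[] buf = new byte[8192];
            try (OutputStream out = Files.newOutputStream(tmp)) {
                while (true) {
                    int toRead = buf.length;
                    if (maxSpoolSize >= 0) {
                        long remaining = maxSpoolSize - written;
                        if (remaining <= 0) {
                            // Cap reached: probe one byte to see whether data was left behind.
                            truncated = in.read() != -1;
                            break;
                        }
                        toRead = (int) Math.min(buf.length, remaining);
                    }
                    int n = in.read(buf, 0, toRead);
                    if (n == -1) {
                        break; // end of stream
                    }
                    out.write(buf, 0, n);
                    written += n;
                }
            }
            return new Spooled(tmp, written, truncated);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 100-byte input, a limit of -1 writes all 100 bytes, while a limit of 10 writes 10 bytes and sets truncated to true.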
-
setMaxRedirects
@Field public void setMaxRedirects(int maxRedirects)
-
setHttpRequestHeaders
@Field public void setHttpRequestHeaders(List<String> headers)
Which http request headers should we send in the http fetch requests.
- Parameters:
headers - The headers to add to the HTTP GET requests.
-
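The expected format of each entry in the headers list is not given on this page. The sketch below assumes each entry is a single "Name: value" string and shows how such a list could be applied to an outgoing GET request. It uses the JDK's java.net.http classes rather than Apache HttpClient; the class RequestHeadersSketch, the method withHeaders, and the entry format are all assumptions made for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

public class RequestHeadersSketch {

    /**
     * Builds a GET request carrying the given headers.
     * Assumption: each entry looks like "Name: value"; only the first colon separates
     * the name from the value, so values may themselves contain colons.
     */
    public static HttpRequest withHeaders(String url, List<String> headers) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        for (String header : headers) {
            int colon = header.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected 'Name: value' but got: " + header);
            }
            String name = header.substring(0, colon).trim();
            String value = header.substring(colon + 1).trim();
            builder.header(name, value);
        }
        return builder.build();
    }
}
```

For example, passing the entries "Accept: text/html" and "X-Trace-Id: abc:123" yields a request with an Accept header of text/html and an X-Trace-Id header of abc:123.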
setHttpHeaders
@Field public void setHttpHeaders(List<String> headers)
Which http headers should we capture in the metadata. Keys will be prepended with HTTP_HEADER_PREFIX
- Parameters:
headers
-
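As a rough illustration of capturing selected response headers under prefixed keys, the sketch below filters a header map and prepends a prefix to each retained key. It uses plain JDK collections instead of Tika's Metadata class. The class HeaderCaptureSketch, the placeholder prefix value, and the case-insensitive name matching are assumptions; the actual value of HTTP_HEADER_PREFIX is not shown on this page.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class HeaderCaptureSketch {

    // Placeholder only: stands in for HttpFetcher.HTTP_HEADER_PREFIX, whose value is not given here.
    public static final String PREFIX = "http-header:";

    /**
     * Returns the response headers whose names appear in 'wanted', keyed by PREFIX + name.
     * Assumption: names are matched case-insensitively, as HTTP header names usually are.
     */
    public static Map<String, List<String>> capture(
            Map<String, List<String>> responseHeaders, List<String> wanted) {
        Set<String> wantedLower = new HashSet<>();
        for (String name : wanted) {
            wantedLower.add(name.toLowerCase(Locale.ROOT));
        }
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : responseHeaders.entrySet()) {
            if (wantedLower.contains(entry.getKey().toLowerCase(Locale.ROOT))) {
                // Copy the values so later changes to the response map do not leak through.
                metadata.put(PREFIX + entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        return metadata;
    }
}
```

Given response headers ETag and Server and a capture list containing only "etag", the result holds a single key, PREFIX followed by "ETag".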
-
setOverallTimeout
@Field public void setOverallTimeout(long overallTimeout)
This sets an overall timeout on the request. If a server is very slow or the file is very large, the other timeouts might not be triggered.
- Parameters:
overallTimeout
-
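The reason an overall timeout helps is that a response which keeps trickling data never stalls long enough to trip a per-read timeout. The sketch below illustrates the idea with the JDK's concurrency utilities: the whole task gets one deadline, regardless of how much progress it makes. The class OverallTimeoutSketch, the method callWithOverallTimeout, and the use of milliseconds are assumptions for illustration; this page does not state the unit of overallTimeout or how HttpFetcher enforces it.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OverallTimeoutSketch {

    /**
     * Runs the task with a single deadline for the whole call.
     * Returns the result, or an empty Optional if the deadline passed first.
     */
    public static <T> Optional<T> callWithOverallTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupt the worker so a slow task does not keep running in the background.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A task that sleeps 20 ms between many small steps makes steady progress, yet it still returns an empty Optional when given a 100 ms overall deadline.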
-
setMaxErrMsgSize
@Field public void setMaxErrMsgSize(int maxErrMsgSize)
-
setUserAgent
@Field public void setUserAgent(String userAgent)
The User-Agent to send with the request. By default, httpclient sends a value such as "Apache-HttpClient/4.5.13 (Java/x.y.z)".
- Parameters:
userAgent
-
-
setJwtExpiresInSeconds
@Field public void setJwtExpiresInSeconds(int jwtExpiresInSeconds)
-
initialize
public void initialize(Map<String,Param> params) throws TikaConfigException
- Specified by: initialize in interface Initializable
- Parameters:
params - params to use for initialization
- Throws:
TikaConfigException
-
checkInitialization
public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
- Specified by: checkInitialization in interface Initializable
- Parameters:
problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
- Throws:
TikaConfigException
-
setHttpClientFactory
public void setHttpClientFactory(HttpClientFactory httpClientFactory)
-
setHttpClient
public void setHttpClient(org.apache.http.client.HttpClient httpClient)
-
getHttpClient
public org.apache.http.client.HttpClient getHttpClient()
-
getHttpFetcherConfig
public HttpFetcherConfig getHttpFetcherConfig()
-
setHttpFetcherConfig
public void setHttpFetcherConfig(HttpFetcherConfig httpFetcherConfig)
-
-