Class HttpFetcher
java.lang.Object
  org.apache.tika.plugins.AbstractTikaExtension
    org.apache.tika.pipes.fetcher.http.HttpFetcher

All Implemented Interfaces:
Fetcher, RangeFetcher, TikaExtension, org.pf4j.ExtensionPoint
A Fetcher based on Apache HttpClient.
Field Summary
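Modifier and Type   Field                     Description
static Property     HTTP_CONTENT_ENCODING
static Property     HTTP_CONTENT_TYPE
static String       HTTP_FETCH_PREFIX
static Property     HTTP_FETCH_TRUNCATED
static String       HTTP_HEADER_PREFIX
static Property     HTTP_NUM_REDIRECTS        Number of redirects.
static Property     HTTP_STATUS_CODE          HTTP status code.
static Property     HTTP_TARGET_IP_ADDRESS
static Property     HTTP_TARGET_URL           If there were redirects, this captures the final URL visited.

Fields inherited from class org.apache.tika.plugins.AbstractTikaExtension:
pluginConfig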
Constructor Summary
Constructor                                                                       Description
HttpFetcher(ExtensionConfig pluginConfig, HttpFetcherConfig httpFetcherConfig)
Method Summary
Modifier and Type                      Method                                                        Description
static HttpFetcher                     build(ExtensionConfig pluginConfig)
TikaInputStream                        fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext)
TikaInputStream                        fetch(String fetchKey, Metadata metadata, ParseContext parseContext)   Fetches a resource and returns it as a TikaInputStream.
static Map<String,Collection<String>>  parseHeaders(String headersString)
void                                   setHttpClient(org.apache.http.client.HttpClient httpClient)
void                                   setHttpClientFactory(HttpClientFactory httpClientFactory)
void                                   setHttpFetcherConfig(HttpFetcherConfig httpFetcherConfig)
void                                   setJwtGenerator(JwtGenerator jwtGenerator)

Methods inherited from class org.apache.tika.plugins.AbstractTikaExtension:
getExtensionConfig
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.tika.pipes.api.fetcher.RangeFetcher:
fetch
Methods inherited from interface org.apache.tika.plugins.TikaExtension:
getExtensionConfig
Field Details
HTTP_HEADER_PREFIX
public static String HTTP_HEADER_PREFIX

HTTP_FETCH_PREFIX
public static String HTTP_FETCH_PREFIX
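The field names suggest that HTTP_HEADER_PREFIX namespaces metadata keys copied from HTTP response headers (compare HTTP_CONTENT_TYPE and HTTP_CONTENT_ENCODING), while HTTP_FETCH_PREFIX namespaces facts about the fetch itself; this page does not confirm either reading. The standalone sketch below illustrates header namespacing under two assumptions: a prefix value of "http-header:" and lower-cased header names. PrefixedHeaders and toMetadata are hypothetical names, and the code uses plain Java maps rather than Tika's Metadata class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch, not Tika code: copies HTTP response headers into a
 * metadata-style map, namespacing each key with a header prefix.
 */
final class PrefixedHeaders {
    // ASSUMED value for illustration only; this page does not show the real HTTP_HEADER_PREFIX string.
    static final String HEADER_PREFIX = "http-header:";

    static Map<String, List<String>> toMetadata(Map<String, List<String>> responseHeaders) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> header : responseHeaders.entrySet()) {
            // HTTP header names are case-insensitive, so normalize before prefixing;
            // names that differ only in case are merged under one key.
            String key = HEADER_PREFIX + header.getKey().toLowerCase(Locale.ROOT);
            metadata.computeIfAbsent(key, k -> new ArrayList<>()).addAll(header.getValue());
        }
        return metadata;
    }
}
```

Because the names are lower-cased, "X-Trace" and "x-trace" end up under a single key with both values, in encounter order.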
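HTTP_STATUS_CODE
public static Property HTTP_STATUS_CODE
HTTP status code.

HTTP_NUM_REDIRECTS
public static Property HTTP_NUM_REDIRECTS
Number of redirects.

HTTP_TARGET_URL
public static Property HTTP_TARGET_URL
If there were redirects, this captures the final URL visited.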
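HTTP_NUM_REDIRECTS and HTTP_TARGET_URL together describe a redirect chain. The standalone sketch below does not call Apache HttpClient or Tika; it simulates server replies with a map from request URL to Location header, resolves relative locations with java.net.URI, and returns the two values those fields are documented to capture. RedirectTrace, follow, and the maxRedirects limit are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.util.Map;

/**
 * Hypothetical sketch, not Tika or Apache HttpClient code: walks a simulated
 * redirect chain and reports the redirect count and the final URL.
 */
final class RedirectTrace {
    final int numRedirects;
    final String targetUrl;

    private RedirectTrace(int numRedirects, String targetUrl) {
        this.numRedirects = numRedirects;
        this.targetUrl = targetUrl;
    }

    /**
     * @param startUrl     the URL originally requested
     * @param locations    simulated 3xx replies: request URL -> Location header value;
     *                     a URL missing from the map stands for a final, non-redirect reply
     * @param maxRedirects hypothetical safety limit against redirect loops
     */
    static RedirectTrace follow(String startUrl, Map<String, String> locations, int maxRedirects) {
        URI current = URI.create(startUrl);
        int redirects = 0;
        while (locations.containsKey(current.toString())) {
            if (redirects >= maxRedirects) {
                throw new IllegalStateException("more than " + maxRedirects + " redirects from " + startUrl);
            }
            // A Location value may be relative, so resolve it against the URL that returned it.
            current = current.resolve(locations.get(current.toString()));
            redirects++;
        }
        return new RedirectTrace(redirects, current.toString());
    }
}
```

When no redirect occurs, targetUrl simply equals the requested URL, which is why the field description is phrased as "if there were redirects".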
HTTP_TARGET_IP_ADDRESS
public static Property HTTP_TARGET_IP_ADDRESS

HTTP_FETCH_TRUNCATED
public static Property HTTP_FETCH_TRUNCATED
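HTTP_FETCH_TRUNCATED carries no description on this page. One plausible reading, which is an assumption, is that it flags a response body cut off at a size limit. The standalone sketch below shows such a limit: it copies at most maxBytes from an InputStream and then probes for one extra byte, so a body of exactly maxBytes is not misreported as truncated. BoundedBody, read, and maxBytes are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch, not Tika code: reads at most maxBytes of a body and
 * records whether any data was left unread.
 */
final class BoundedBody {
    final byte[] bytes;
    final boolean truncated;

    private BoundedBody(byte[] bytes, boolean truncated) {
        this.bytes = bytes;
        this.truncated = truncated;
    }

    static BoundedBody read(InputStream in, int maxBytes) throws IOException {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int remaining = maxBytes;
        while (remaining > 0) {
            int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
            if (n == -1) {
                // End of stream reached within the limit: nothing was cut off.
                return new BoundedBody(out.toByteArray(), false);
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        // Limit reached: probe one more byte to tell "exactly maxBytes" apart from "more data".
        boolean truncated = in.read() != -1;
        return new BoundedBody(out.toByteArray(), truncated);
    }
}
```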
HTTP_CONTENT_ENCODING
public static Property HTTP_CONTENT_ENCODING

HTTP_CONTENT_TYPE
public static Property HTTP_CONTENT_TYPE
Constructor Details
HttpFetcher
public HttpFetcher(ExtensionConfig pluginConfig, HttpFetcherConfig httpFetcherConfig)
Method Details
build
public static HttpFetcher build(ExtensionConfig pluginConfig) throws TikaConfigException, IOException
Throws:
TikaConfigException
IOException
fetch
public TikaInputStream fetch(String fetchKey, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
Description copied from interface: Fetcher
Fetches a resource and returns it as a TikaInputStream.
Specified by:
fetch in interface Fetcher
Parameters:
fetchKey - the key identifying the resource to fetch (interpretation depends on the implementation, e.g., file path, URL, S3 key)
metadata - metadata object to be updated with resource information
parseContext - the parse context
Returns:
a TikaInputStream for reading the resource content
Throws:
IOException - if an I/O error occurs during fetching
TikaException - if a Tika-specific error occurs during fetching
fetch
public TikaInputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata, ParseContext parseContext) throws IOException, TikaException
Specified by:
fetch in interface RangeFetcher
Throws:
IOException
TikaException
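The ranged fetch presumably maps onto an HTTP Range request. This page does not say whether endRange is inclusive; assuming startRange and endRange are inclusive byte offsets, as in HTTP byte ranges (RFC 9110), the request header value could be built as in the standalone sketch below. ByteRange, header, and length are hypothetical names.

```java
/**
 * Hypothetical helper, not Tika code: formats the value of an HTTP Range
 * request header. ASSUMPTION: both offsets are inclusive, following HTTP
 * byte-range semantics; HttpFetcher's own interpretation is not documented here.
 */
final class ByteRange {
    static String header(long startRange, long endRange) {
        if (startRange < 0 || endRange < startRange) {
            throw new IllegalArgumentException("invalid range: " + startRange + "-" + endRange);
        }
        // "bytes=0-99" asks for the first 100 bytes: both ends are included.
        return "bytes=" + startRange + "-" + endRange;
    }

    static long length(long startRange, long endRange) {
        return endRange - startRange + 1;
    }
}
```

A server that honors the request answers with 206 Partial Content; a server that ignores Range may answer 200 with the full body, so a caller would need to check the status code before trusting the byte count.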
parseHeaders
public static Map<String,Collection<String>> parseHeaders(String headersString)
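The expected format of headersString is not documented on this page. The sketch below is a standalone re-implementation, not Tika's code: it assumes one "Name: value" pair per line, splits on the first colon so values such as "example.com:8080" survive intact, and gathers repeated names into one collection, which is one way to fill the Map<String,Collection<String>> return type. HeaderParsingSketch is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Tika's parseHeaders. ASSUMED input format:
 * one "Name: value" pair per line; blank or colon-less lines are skipped.
 */
final class HeaderParsingSketch {
    static Map<String, Collection<String>> parseHeaders(String headersString) {
        Map<String, Collection<String>> headers = new LinkedHashMap<>();
        if (headersString == null) {
            return headers;
        }
        // \R matches any line break, including \r\n.
        for (String line : headersString.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // blank line, missing name, or no colon
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            // Repeated header names accumulate all of their values in order.
            headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return headers;
    }
}
```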
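setHttpClientFactory
public void setHttpClientFactory(HttpClientFactory httpClientFactory)

setHttpFetcherConfig
public void setHttpFetcherConfig(HttpFetcherConfig httpFetcherConfig) throws TikaConfigException
Throws:
TikaConfigException

setHttpClient
public void setHttpClient(org.apache.http.client.HttpClient httpClient)

getHttpFetcherConfig

setJwtGenerator
public void setJwtGenerator(JwtGenerator jwtGenerator)

getJwtGenerator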
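This page does not document what a JwtGenerator produces. Assuming it yields a signed JSON Web Token that the fetcher sends as a bearer credential (an assumption, not confirmed here), the standalone sketch below shows what such a token looks like: an HS256-signed JWT in compact form (RFC 7519), built with the JDK's Mac and Base64 classes, plus the Authorization header value that would carry it. Hs256JwtSketch, sign, and authorizationHeader are hypothetical names and do not reflect Tika's JwtGenerator interface.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/**
 * Hypothetical sketch, not Tika's JwtGenerator API: builds an HS256-signed
 * JWT (header.payload.signature, each part base64url-encoded without padding).
 */
final class Hs256JwtSketch {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();
    private static final String HEADER_JSON = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";

    static String sign(String payloadJson, byte[] secret) {
        String signingInput = encode(HEADER_JSON) + "." + encode(payloadJson);
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] signature = mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + B64URL.encodeToString(signature);
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is a required JDK algorithm, so failure here means a broken runtime or key.
            throw new IllegalStateException(e);
        }
    }

    // Value for an "Authorization" request header carrying the token.
    static String authorizationHeader(String jwt) {
        return "Bearer " + jwt;
    }

    private static String encode(String json) {
        return B64URL.encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }
}
```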