HTTP Plugin
The HTTP plugin (`tika-pipes-http`) provides a fetcher that downloads documents over HTTP(S). It is fetcher-only, so pair it with an emitter and an iterator from other plugins.
| Interface | Component name | Class |
|---|---|---|
| Fetcher | `http-fetcher` | |
HTTP Fetcher (http-fetcher)
Fetches document bytes from an HTTP(S) URL. The fetch key is the URL.
```json
{
  "fetchers": {
    "httpf": {
      "http-fetcher": {
        "userName": "tika",
        "password": "REDACTED",
        "authScheme": "basic",
        "userAgent": "tika-pipes/1.0",
        "maxConnections": 2000,
        "maxConnectionsPerRoute": 1000,
        "connectTimeoutMillis": 30000,
        "socketTimeoutMillis": 60000,
        "requestTimeoutMillis": 60000,
        "overallTimeoutMillis": 120000,
        "maxRedirects": 5,
        "maxSpoolSize": -1,
        "httpHeaders": ["Accept: application/octet-stream"]
      }
    }
  }
}
```
Configuration
| Field | Default | Description |
|---|---|---|
| `userName`, `password` | optional | Basic-auth credentials. |
| | optional | NT domain for NTLM auth. |
| `authScheme` | optional | Auth scheme hint, e.g. `basic`. |
| | optional | Outbound HTTP proxy. |
| `userAgent` | no default | Value sent in the `User-Agent` request header. |
| `maxConnections` | | HTTP connection-pool size. |
| `maxConnectionsPerRoute` | | Per-route connection-pool size. |
| `connectTimeoutMillis` | | TCP connect timeout, in milliseconds. |
| `socketTimeoutMillis` | | Socket read timeout, in milliseconds. |
| `requestTimeoutMillis` | | Connection-manager request timeout, in milliseconds. |
| `overallTimeoutMillis` | | Hard cap, in milliseconds, on the total time for a single fetch operation. |
| `maxRedirects` | | Maximum number of redirects to follow. |
| `maxSpoolSize` | | Maximum bytes to spool locally before failing. |
| | | Maximum bytes of the error response body to capture in the exception. |
| `httpHeaders` | empty | Extra HTTP headers, each formatted as a `"Name: value"` string. |
| | empty | Structured per-request headers. |
| | optional | JWT claims, for endpoints that accept JWT-bearer auth. |
| | optional | HMAC secret for symmetric-key JWT signing. |
| | optional | Base64-encoded private key for asymmetric (RSA/ECDSA) JWT signing. Mutually exclusive with the HMAC secret. |
Notes
- Both basic auth and JWT auth may be configured at the same time, but only one applies to a given request: JWT takes precedence when present.
- For zero-redirect crawling, leave `maxRedirects` at `0`. The fetcher then returns the redirect response as-is, so the caller can decide what to do with it.
- `overallTimeoutMillis` is enforced by the fetcher itself, not by the HTTP client. It covers cases the lower-level timeouts can miss, such as zombie connections and slow drains. In a slow drain, the server keeps trickling bytes just often enough that the socket read timeout never fires.
- For Atlassian Cloud endpoints that require an Atlassian Connect JWT, use the dedicated Atlassian JWT fetcher instead. It has the correct claim layout built in.
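Here is a minimal sketch of the zero-redirect setup from the notes. It assumes every omitted field falls back to its default, and the fetcher ID `httpf-noredirect` is hypothetical. `maxRedirects` is set explicitly even though the notes describe `0` as the value to leave in place. This keeps the intent visible in the config itself.

```json
{
  "fetchers": {
    "httpf-noredirect": {
      "http-fetcher": {
        "maxRedirects": 0,
        "overallTimeoutMillis": 120000
      }
    }
  }
}
```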