HTTP Plugin

The HTTP plugin (tika-pipes-http) provides a fetcher that downloads documents over HTTP(S). It is fetcher-only, so pair it with an emitter and an iterator from other plugins.

| Interface | Component name | Class |
|---|---|---|
| Fetcher | http-fetcher | HttpFetcher |

HTTP Fetcher (http-fetcher)

Fetches document bytes from an HTTP(S) URL. The fetch key is the URL.

{
  "fetchers": {
    "httpf": {
      "http-fetcher": {
        "userName": "tika",
        "password": "REDACTED",
        "authScheme": "basic",
        "userAgent": "tika-pipes/1.0",
        "maxConnections": 2000,
        "maxConnectionsPerRoute": 1000,
        "connectTimeoutMillis": 30000,
        "socketTimeoutMillis": 60000,
        "requestTimeoutMillis": 60000,
        "overallTimeoutMillis": 120000,
        "maxRedirects": 5,
        "maxSpoolSize": -1,
        "httpHeaders": ["Accept: application/octet-stream"]
      }
    }
  }
}

Configuration

| Field | Default | Description |
|---|---|---|
| userName / password | optional | Basic-auth credentials. |
| ntDomain | optional | NT domain for NTLM auth. |
| authScheme | optional | Auth scheme hint: basic, digest, ntlm, or unset. |
| proxyHost / proxyPort | optional | Outbound HTTP proxy. |
| userAgent | no default | User-Agent header sent on each request. |
| maxConnections | 2000 | HTTP connection-pool size. |
| maxConnectionsPerRoute | 1000 | Per-route connection-pool size. |
| connectTimeoutMillis | 120000 | TCP connect timeout. |
| socketTimeoutMillis | 120000 | Socket read timeout. |
| requestTimeoutMillis | 120000 | Connection-manager request timeout. |
| overallTimeoutMillis | 120000 | Hard cap on total time for a single fetch operation. |
| maxRedirects | 0 | Maximum number of redirects to follow. 0 means follow none. |
| maxSpoolSize | -1 | Maximum bytes to spool locally before failing. -1 means no limit. |
| maxErrMsgSize | 10000000 | Maximum bytes of error response body to capture into the exception. |
| httpHeaders | empty | Extra HTTP headers, formatted as "Header: value" strings (list). |
| httpRequestHeaders | empty | Structured per-request headers as a Header → [values] map. Used when a header has multiple values. |
| jwtIssuer / jwtSubject / jwtExpiresInSeconds | optional | JWT claims, for endpoints that accept JWT-bearer auth. |
| jwtSecret | optional | HMAC secret for symmetric-key JWT signing. |
| jwtPrivateKeyBase64 | optional | Base64-encoded private key for asymmetric (RSA/ECDSA) JWT signing. Mutually exclusive with jwtSecret. |
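
As an illustration of the proxy and NTLM fields, which the sample configuration at the top of this page does not exercise, a fetcher behind an outbound proxy might be configured roughly as follows. This is a sketch, not a tested configuration: the fetcher id http-ntlm, the account name, the domain, and the proxy host are hypothetical placeholders, and it assumes proxyPort and maxErrMsgSize are plain JSON numbers. Fields that are left out keep the defaults listed in the table.

```json
{
  "fetchers": {
    "http-ntlm": {
      "http-fetcher": {
        "userName": "svc-tika",
        "password": "REDACTED",
        "ntDomain": "EXAMPLE",
        "authScheme": "ntlm",
        "proxyHost": "proxy.example.internal",
        "proxyPort": 8080,
        "maxErrMsgSize": 100000
      }
    }
  }
}
```

Lowering maxErrMsgSize from its 10000000 default, as done here, is one way to keep large error pages from inflating exception messages.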

Notes

  • Basic auth and JWT auth can both be configured at the same time, but only one applies per request: JWT takes precedence when it is configured.

  • For zero-redirect crawling, leave maxRedirects at 0. The fetcher returns the redirect response as-is so the caller can decide what to do.

  • overallTimeoutMillis is enforced by the fetcher itself, not the HTTP client — it covers slow drains and zombie connections that the lower-level timeouts may miss.

  • For Atlassian Cloud endpoints that require an Atlassian Connect JWT, use the dedicated Atlassian JWT fetcher instead — it has the correct claim layout baked in.
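
The notes above can be combined into a single configuration sketch: JWT-bearer auth signed with an HMAC secret, no redirect following, and an overall cap that sits above the lower-level timeouts. The fetcher id http-jwt and all claim values are hypothetical, and the sketch assumes jwtExpiresInSeconds is a plain JSON number. Because jwtSecret and jwtPrivateKeyBase64 are mutually exclusive, only one of them is set.

```json
{
  "fetchers": {
    "http-jwt": {
      "http-fetcher": {
        "jwtIssuer": "tika-pipes",
        "jwtSubject": "document-fetcher",
        "jwtExpiresInSeconds": 300,
        "jwtSecret": "REDACTED",
        "maxRedirects": 0,
        "connectTimeoutMillis": 10000,
        "socketTimeoutMillis": 30000,
        "overallTimeoutMillis": 60000
      }
    }
  }
}
```

With maxRedirects at 0, a redirect from the endpoint is returned to the caller rather than followed, so the caller has to handle redirect responses itself.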