Google Drive Plugin

The Google Drive plugin (tika-pipes-google-drive) provides a fetcher that retrieves files from Google Drive. It is fetcher-only, so pair it with an emitter and an iterator from other plugins.

Interface  Component name        Class
Fetcher    google-drive-fetcher  GoogleDriveFetcher

Google Drive Fetcher (google-drive-fetcher)

Fetches files from Google Drive by file ID. The fetch key is the Drive file ID.

{
  "fetchers": {
    "gdf": {
      "google-drive-fetcher": {
        "applicationName": "tika-pipes",
        "serviceAccountKeyBase64": "REDACTED_BASE64_SERVICE_ACCOUNT_JSON",
        "subjectUser": "user@example.com",
        "scopes": ["https://www.googleapis.com/auth/drive.readonly"],
        "spoolToTemp": true
      }
    }
  }
}

Configuration

Field                    Default     Description
applicationName          tika-pipes  Application name sent to the Google API for logging/quota tracking.
serviceAccountKeyBase64  optional    Base64-encoded service-account JSON key. If absent, the SDK falls back to Application Default Credentials (env var GOOGLE_APPLICATION_CREDENTIALS or workload identity).
subjectUser              optional    For domain-wide delegation: the user to impersonate (e.g., user@example.com).
scopes                   empty       OAuth scopes to request. Typical: ["https://www.googleapis.com/auth/drive.readonly"].
spoolToTemp              false       If true, files are spooled to a temp file before being parsed.
throttleSeconds          optional    Rate-limit array: each consecutive failure triggers a sleep of the corresponding number of seconds (first failure, first entry; second failure, second entry; and so on).
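The throttleSeconds description can be read as a simple retry schedule. The Python sketch below only illustrates that reading and is not the plugin's code: the function name fetch_with_throttle, the injectable sleep parameter, and the choice to re-raise the error once every delay has been used are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def fetch_with_throttle(
    fetch: Callable[[], T],
    throttle_seconds: Optional[Sequence[float]],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); after each consecutive failure, sleep for the
    corresponding entry of throttle_seconds and try again."""
    delays = list(throttle_seconds or [])
    failures = 0
    while True:
        try:
            return fetch()
        except Exception:
            # Assumption: when no delay is left for this failure,
            # the last error is propagated to the caller.
            if failures >= len(delays):
                raise
            sleep(delays[failures])
            failures += 1
```

Under this reading, a value such as [10, 30, 60] would allow up to four attempts in total, waiting 10, 30, and 60 seconds between them.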

Notes

  • The plugin uses Google’s official google-api-services-drive SDK.

  • For domain-wide delegation, the service account must be authorized for the requested scopes in the Google Workspace admin console; the Tika configuration alone is not enough.

  • Service-account credentials in serviceAccountKeyBase64 are sensitive — use environment-variable substitution or external secret stores rather than checking the encoded JSON into source control.
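One way to keep the key out of source control is to generate the fetcher configuration at deploy time from a secret held in the environment. The Python sketch below is a minimal illustration under stated assumptions: the environment variable GDRIVE_SA_KEY_JSON, the output file name tika-config.json, and the helper build_fetcher_config are hypothetical, and the sketch assumes the plugin expects standard (not URL-safe) base64 of the raw JSON key. It writes only the fetchers section shown earlier; emitter and iterator settings would still have to be added.

```python
import base64
import json
import os


def build_fetcher_config(raw_key_json: str, subject_user: str) -> dict:
    """Return the fetchers section with the service-account key base64-encoded."""
    # Fail early if the secret is not valid JSON.
    json.loads(raw_key_json)
    encoded = base64.b64encode(raw_key_json.encode("utf-8")).decode("ascii")
    return {
        "fetchers": {
            "gdf": {
                "google-drive-fetcher": {
                    "applicationName": "tika-pipes",
                    "serviceAccountKeyBase64": encoded,
                    "subjectUser": subject_user,
                    "scopes": ["https://www.googleapis.com/auth/drive.readonly"],
                    "spoolToTemp": True,
                }
            }
        }
    }


if __name__ == "__main__":
    # GDRIVE_SA_KEY_JSON is a hypothetical variable holding the raw JSON key.
    raw = os.environ.get("GDRIVE_SA_KEY_JSON")
    if raw:
        config = build_fetcher_config(raw, "user@example.com")
        # tika-config.json is a hypothetical output path.
        with open("tika-config.json", "w", encoding="utf-8") as out:
            json.dump(config, out, indent=2)
```

The generated file still contains the encoded key, so it should be treated as a secret itself, for example by writing it to a location excluded from version control.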