Package org.apache.tika.parser.vlm
Class GeminiVLMParser
java.lang.Object
org.apache.tika.parser.vlm.AbstractVLMParser
org.apache.tika.parser.vlm.GeminiVLMParser
- All Implemented Interfaces:
Serializable, Initializable, SelfConfiguring, Parser
VLM parser for the Google Gemini generateContent API.
Supports both images and PDFs natively. Gemini processes PDFs with native vision, so it interprets layout, charts, tables, and diagrams rather than only extracting text.
The API key is sent as a key query parameter (not a Bearer header).
The default base URL points to the public Gemini API; change it to target Vertex AI or a proxy.
Configuration key: "gemini-vlm-parser"
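As a rough illustration of the key-as-query-parameter convention described above, the following sketch assembles a generateContent URL. The class GeminiUrlSketch, its helper methods, the base URL constant, and the model name are assumptions made for this example and are not taken from the parser's source; the endpoint shape follows the public Gemini REST API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Apache Tika.
class GeminiUrlSketch {

    // Assumed root of the public Gemini REST API; the parser's actual default may differ.
    static final String ASSUMED_BASE_URL = "https://generativelanguage.googleapis.com/v1beta";

    // Local stand-in for trailing-slash normalization of a configured base URL.
    static String stripTrailingSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }

    // The API key travels as the "key" query parameter, not as an Authorization header.
    static String generateContentUrl(String baseUrl, String model, String apiKey) {
        return stripTrailingSlash(baseUrl)
                + "/models/" + model + ":generateContent"
                + "?key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "example-model" and "MY_API_KEY" are placeholders.
        System.out.println(generateContentUrl(ASSUMED_BASE_URL + "/", "example-model", "MY_API_KEY"));
    }
}
```

Pointing the same helper at a proxy only requires a different baseUrl argument, which mirrors the role of the base URL setting described above.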
- Since:
- Apache Tika 4.0
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.tika.parser.vlm.AbstractVLMParser
AbstractVLMParser.HttpCall
Field Summary
Fields inherited from class org.apache.tika.parser.vlm.AbstractVLMParser
VLM_COMPLETION_TOKENS, VLM_META, VLM_MODEL, VLM_PROMPT_TOKENS
Constructor Summary
Constructors
Method Summary
Modifier and Type    Method    Description
protected AbstractVLMParser.HttpCall    buildHttpCall(VLMOCRConfig config, String base64Data, String mimeType)    Build a fully formed AbstractVLMParser.HttpCall for the target API.
protected String
protected String    extractResponseText(String responseBody, Metadata metadata)    Parse the API response body and extract the model's text output.
protected String    getHealthCheckUrl(VLMOCRConfig config)
Methods inherited from class org.apache.tika.parser.vlm.AbstractVLMParser
getApiKey, getBaseUrl, getConfig, getDefaultConfig, getMaxFileSizeToOcr, getMaxTokens, getMinFileSizeToOcr, getModel, getPrompt, getSupportedTypes, getTimeoutSeconds, initialize, isInlineContent, isServerAvailable, isSkipOcr, parse, setApiKey, setBaseUrl, setInlineContent, setMaxFileSizeToOcr, setMaxTokens, setMinFileSizeToOcr, setModel, setPrompt, setSkipOcr, setTimeoutSeconds, stripTrailingSlash
-
Constructor Details
-
GeminiVLMParser
public GeminiVLMParser()
GeminiVLMParser
-
GeminiVLMParser
-
-
Method Details
-
buildHttpCall
protected AbstractVLMParser.HttpCall buildHttpCall(VLMOCRConfig config, String base64Data, String mimeType)
Description copied from class: AbstractVLMParser
Build a fully formed AbstractVLMParser.HttpCall for the target API.
- Specified by:
buildHttpCall in class AbstractVLMParser
- Parameters:
config - resolved config for this parse
base64Data - base64-encoded version of the file bytes
mimeType - the MIME type of the input (e.g. image/png)
- Returns:
- a ready-to-execute AbstractVLMParser.HttpCall
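To make the inputs of buildHttpCall more concrete, the sketch below shows one way a generateContent request body could be assembled from a prompt, base64 data, and a MIME type. The class GeminiRequestBodySketch and its methods are hypothetical, and the JSON field names (contents, parts, inline_data, mime_type, generationConfig.maxOutputTokens) follow the public Gemini REST API rather than this parser's source, which may build the body differently, for instance with a JSON library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Apache Tika.
class GeminiRequestBodySketch {

    // Minimal JSON string escaping for the prompt and MIME type.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // One text part carrying the prompt, one inline_data part carrying the file.
    static String requestBody(String prompt, String base64Data, String mimeType, int maxTokens) {
        return "{\"contents\":[{\"parts\":["
                + "{\"text\":\"" + jsonEscape(prompt) + "\"},"
                + "{\"inline_data\":{\"mime_type\":\"" + jsonEscape(mimeType) + "\","
                + "\"data\":\"" + base64Data + "\"}}"
                + "]}],"
                + "\"generationConfig\":{\"maxOutputTokens\":" + maxTokens + "}}";
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for a real image or PDF.
        byte[] fileBytes = "not a real image".getBytes(StandardCharsets.UTF_8);
        String base64Data = Base64.getEncoder().encodeToString(fileBytes);
        System.out.println(requestBody("Extract all text from this page.", base64Data, "image/png", 1024));
    }
}
```

Because the base64 alphabet contains no characters that need JSON escaping, the encoded data is embedded as-is; only the prompt and MIME type pass through jsonEscape.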
-
extractResponseText
Description copied from class: AbstractVLMParser
Parse the API response body and extract the model's text output. Implementations should also populate AbstractVLMParser.VLM_PROMPT_TOKENS and AbstractVLMParser.VLM_COMPLETION_TOKENS in metadata when the information is available.
- Specified by:
extractResponseText in class AbstractVLMParser
- Parameters:
responseBody - raw JSON response body
metadata - metadata to enrich with token counts
- Returns:
- the extracted text content
- Throws:
TikaException
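The token-count enrichment mentioned above can be pictured with the dependency-free sketch below. It assumes the response carries a usageMetadata object with promptTokenCount and candidatesTokenCount fields, as in the public Gemini REST API. The class GeminiUsageSketch, its methods, and the map keys are hypothetical; an actual implementation would more likely use a JSON parser and write into Tika's Metadata object instead of a Map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache Tika.
class GeminiUsageSketch {

    // Finds the first occurrence of "field": <digits> and returns the number, or null if absent.
    static Integer intField(String json, String field) {
        Matcher m = Pattern
                .compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*(\\d+)")
                .matcher(json);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    // Collects whichever token counts are present; missing fields are simply skipped.
    static Map<String, Integer> tokenCounts(String responseBody) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Integer prompt = intField(responseBody, "promptTokenCount");
        Integer completion = intField(responseBody, "candidatesTokenCount");
        if (prompt != null) {
            counts.put("prompt", prompt);
        }
        if (completion != null) {
            counts.put("completion", completion);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "{\"candidates\":[{\"content\":{\"parts\":[{\"text\":\"hello\"}]}}],"
                + "\"usageMetadata\":{\"promptTokenCount\":12,\"candidatesTokenCount\":34}}";
        System.out.println(tokenCounts(sample));
    }
}
```

The regular expression keeps the example free of third-party dependencies; it is not a substitute for real JSON parsing, especially for the text parts, which can contain escaped characters.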
-
getSupportedMediaTypes
- Specified by:
getSupportedMediaTypes in class AbstractVLMParser
- Returns:
- the set of media types this parser handles (images, PDFs, etc.)
-
configKey
- Specified by:
configKey in class AbstractVLMParser
- Returns:
- the JSON config key for ParseContextConfig lookup (e.g. "openai-vlm-parser", "gemini-vlm-parser")
-
getHealthCheckUrl
- Specified by:
getHealthCheckUrl in class AbstractVLMParser
- Returns:
- an optional health-check URL to probe at init time, or null to skip the probe
-