Package org.apache.tika.parser.vlm
Class AbstractVLMParser
java.lang.Object
org.apache.tika.parser.vlm.AbstractVLMParser
- All Implemented Interfaces:
Serializable, Initializable, SelfConfiguring, Parser
- Direct Known Subclasses:
ClaudeVLMParser, GeminiVLMParser, OpenAIVLMParser
Abstract base class for parsers that delegate to a remote Vision-Language
Model (VLM) endpoint for OCR and document understanding.
Subclasses only need to implement the API-specific request/response serialization and declare their supported media types. All common logic (HTTP transport, timeout handling, inline content, markdown-to-XHTML rendering, config resolution) lives here.
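The split described above resembles a template-method design: shared flow in the base class, API-specific hooks in subclasses. The following is a minimal, self-contained sketch of that idea only. `SketchBaseParser`, `EchoSketchParser`, their method names, and the injected `transport` function are all hypothetical stand-ins, not the actual Tika classes or signatures.

```java
import java.util.Base64;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the base class: shared flow here, API specifics in hooks.
abstract class SketchBaseParser {

    // Hook: serialize the request for one particular API.
    protected abstract String buildRequestBody(String base64Data, String mimeType);

    // Hook: pull the model's text out of that API's response.
    protected abstract String extractText(String responseBody);

    // Hook: media types this parser accepts.
    protected abstract Set<String> supportedMediaTypes();

    // Shared logic. The transport is passed in so the sketch runs without a network.
    final String run(byte[] fileBytes, String mimeType, UnaryOperator<String> transport) {
        if (!supportedMediaTypes().contains(mimeType)) {
            throw new IllegalArgumentException("Unsupported media type: " + mimeType);
        }
        String base64Data = Base64.getEncoder().encodeToString(fileBytes);
        String responseBody = transport.apply(buildRequestBody(base64Data, mimeType));
        return extractText(responseBody);
    }
}

// Hypothetical subclass with a toy wire format (not a real VLM API).
final class EchoSketchParser extends SketchBaseParser {

    @Override
    protected String buildRequestBody(String base64Data, String mimeType) {
        return mimeType + ":" + base64Data;
    }

    @Override
    protected String extractText(String responseBody) {
        return responseBody.startsWith("RESP:") ? responseBody.substring(5) : responseBody;
    }

    @Override
    protected Set<String> supportedMediaTypes() {
        return Set.of("image/png");
    }
}
```

In this sketch a new backend would only supply the three hooks; the encoding and media-type check stay in one place.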
- Since:
- Apache Tika 4.0
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description:
protected static final record AbstractVLMParser.HttpCall - Encapsulates a fully built HTTP request for a VLM API call.
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description:
protected abstract AbstractVLMParser.HttpCall buildHttpCall(VLMOCRConfig config, String base64Data, String mimeType) - Build a fully formed AbstractVLMParser.HttpCall for the target API.
protected abstract String configKey
protected abstract String extractResponseText(String responseBody, Metadata metadata) - Parse the API response body and extract the model's text output.
getApiKey
getBaseUrl
protected VLMOCRConfig getConfig(ParseContext parseContext)
protected VLMOCRConfig getDefaultConfig
protected abstract String getHealthCheckUrl(VLMOCRConfig config)
long getMaxFileSizeToOcr()
int getMaxTokens()
long getMinFileSizeToOcr()
getModel()
getPrompt
getSupportedMediaTypes
getSupportedTypes(ParseContext context) - Returns the set of media types supported by this parser when used with the given parse context.
int getTimeoutSeconds()
void initialize - Called after all properties have been set to allow for validation and initialization that depends on multiple properties.
boolean isInlineContent()
boolean isServerAvailable()
boolean isSkipOcr()
void parse(TikaInputStream tis, ContentHandler handler, Metadata metadata, ParseContext parseContext) - Parses a document stream into a sequence of XHTML SAX events.
void setApiKey
void setBaseUrl(String baseUrl)
void setInlineContent(boolean inlineContent)
void setMaxFileSizeToOcr(long maxFileSizeToOcr)
void setMaxTokens(int maxTokens)
void setMinFileSizeToOcr(long minFileSizeToOcr)
void setModel
void setPrompt
void setSkipOcr(boolean skipOcr)
void setTimeoutSeconds(int timeoutSeconds)
protected static String stripTrailingSlash(String url)
-
Field Details
-
VLM_META
Metadata namespace for VLM properties.
-
VLM_MODEL
-
VLM_PROMPT_TOKENS
-
VLM_COMPLETION_TOKENS
-
-
Constructor Details
-
AbstractVLMParser
-
-
Method Details
-
buildHttpCall
protected abstract AbstractVLMParser.HttpCall buildHttpCall(VLMOCRConfig config, String base64Data, String mimeType)
Build a fully formed AbstractVLMParser.HttpCall for the target API.
- Parameters:
config - resolved config for this parse
base64Data - base64-encoded version of the file bytes
mimeType - the MIME type of the input (e.g. image/png)
- Returns:
- a ready-to-execute AbstractVLMParser.HttpCall
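As an illustration of the inputs described above, the sketch below shows how file bytes become the `base64Data` argument and how an implementation might combine it with `mimeType` into a data URI. `SketchHttpCall` is a hypothetical stand-in, since the components of the real `AbstractVLMParser.HttpCall` record are not listed on this page; the endpoint path, header names, and JSON shape are likewise assumptions.

```java
import java.util.Base64;
import java.util.Map;

// Hypothetical stand-in for AbstractVLMParser.HttpCall; the real record's components may differ.
record SketchHttpCall(String url, Map<String, String> headers, String body) {}

final class BuildHttpCallSketch {

    // Encodes raw file bytes as plain (non-URL-safe) base64.
    static String toBase64(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    // Builds a data URI from the MIME type and the base64 payload.
    static String toDataUri(String mimeType, String base64Data) {
        return "data:" + mimeType + ";base64," + base64Data;
    }

    // Assembles an illustrative call. The path, headers, and body layout are assumptions,
    // not the format of any particular VLM API.
    static SketchHttpCall build(String baseUrl, String apiKey, String base64Data, String mimeType) {
        String body = "{\"image\":\"" + toDataUri(mimeType, base64Data) + "\"}";
        return new SketchHttpCall(
                baseUrl + "/v1/example",
                Map.of("Authorization", "Bearer " + apiKey, "Content-Type", "application/json"),
                body);
    }
}
```

String concatenation is tolerable for the body here only because base64 output and a MIME type contain no characters that need JSON escaping; a real implementation would more likely use a JSON library.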
-
extractResponseText
protected abstract String extractResponseText(String responseBody, Metadata metadata) throws TikaException
Parse the API response body and extract the model's text output. Implementations should also populate VLM_PROMPT_TOKENS and VLM_COMPLETION_TOKENS in metadata when the information is available.
- Parameters:
responseBody - raw JSON response body
metadata - metadata to enrich with token counts
- Returns:
- the extracted text content
- Throws:
TikaException
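The "when the information is available" clause suggests token counts should be treated as optional. The sketch below shows one way to model that with `OptionalInt`. The field names `prompt_tokens` and `completion_tokens` are assumptions borrowed from common API conventions, and the regex lookup is a shortcut to keep the example dependency-free; a real implementation would more likely use a JSON parser.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenUsageSketch {

    // Finds the first non-negative integer stored under the given JSON field name.
    // Returns an empty OptionalInt when the field is absent, so callers can skip
    // populating the corresponding metadata property.
    static OptionalInt findIntField(String responseBody, String fieldName) {
        Pattern pattern = Pattern.compile("\"" + Pattern.quote(fieldName) + "\"\\s*:\\s*(\\d+)");
        Matcher matcher = pattern.matcher(responseBody);
        if (!matcher.find()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(matcher.group(1)));
    }
}
```

A caller could then write the count into metadata only when `isPresent()` is true, leaving the property unset otherwise.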
-
getSupportedMediaTypes
- Returns:
- the set of media types this parser handles (images, PDFs, etc.)
-
configKey
- Returns:
- the JSON config key for ParseContextConfig lookup (e.g. "openai-vlm-parser", "gemini-vlm-parser")
-
getHealthCheckUrl
- Returns:
- an optional health-check URL to probe at init time, or null to skip the probe
-
getSupportedTypes
Description copied from interface: Parser
Returns the set of media types supported by this parser when used with the given parse context.
- Specified by:
getSupportedTypes in interface Parser
- Parameters:
context - parse context
- Returns:
- immutable set of media types
-
parse
public void parse(TikaInputStream tis, ContentHandler handler, Metadata metadata, ParseContext parseContext) throws IOException, SAXException, TikaException
Description copied from interface: Parser
Parses a document stream into a sequence of XHTML SAX events. Fills in related document metadata in the given metadata object.
The given document stream is consumed but not closed by this method. The responsibility to close the stream remains with the caller.
Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.
- Specified by:
parse in interface Parser
- Parameters:
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
parseContext - parse context
- Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
-
initialize
Description copied from interface: Initializable
Called after all properties have been set to allow for validation and initialization that depends on multiple properties.
- Specified by:
initialize in interface Initializable
- Throws:
TikaConfigException - if there is a problem with the configuration
-
getConfig
- Throws:
TikaConfigException
IOException
-
stripTrailingSlash
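This page gives only the signature of this helper. As a rough sketch of what a method with this name and signature might do, the version below removes every trailing `/` so that appending a path to a base URL does not produce a double slash. Whether the actual method strips one slash or all of them, and how it treats null, is not documented here; both choices below are assumptions.

```java
final class UrlSketch {

    // Removes all trailing '/' characters from the URL.
    // Returning null for null input is an assumption of this sketch.
    static String stripTrailingSlash(String url) {
        if (url == null) {
            return null;
        }
        int end = url.length();
        while (end > 0 && url.charAt(end - 1) == '/') {
            end--;
        }
        return url.substring(0, end);
    }
}
```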
-
getDefaultConfig
-
getBaseUrl
-
setBaseUrl
- Throws:
TikaConfigException
-
getModel
-
setModel
-
getPrompt
-
setPrompt
-
getMaxTokens
public int getMaxTokens() -
setMaxTokens
public void setMaxTokens(int maxTokens) -
getTimeoutSeconds
public int getTimeoutSeconds() -
setTimeoutSeconds
public void setTimeoutSeconds(int timeoutSeconds) -
getApiKey
-
setApiKey
- Throws:
TikaConfigException
-
isInlineContent
public boolean isInlineContent() -
setInlineContent
public void setInlineContent(boolean inlineContent) -
isSkipOcr
public boolean isSkipOcr() -
setSkipOcr
public void setSkipOcr(boolean skipOcr) -
getMinFileSizeToOcr
public long getMinFileSizeToOcr() -
setMinFileSizeToOcr
public void setMinFileSizeToOcr(long minFileSizeToOcr) -
getMaxFileSizeToOcr
public long getMaxFileSizeToOcr() -
setMaxFileSizeToOcr
public void setMaxFileSizeToOcr(long maxFileSizeToOcr) -
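The skipOcr, minFileSizeToOcr, and maxFileSizeToOcr properties suggest a gate that decides whether a file is sent to the remote model at all. The sketch below shows one plausible form of such a check. The class and method names are hypothetical, and treating both size bounds as inclusive is an assumption, since this page does not state the exact semantics.

```java
final class OcrSizeGateSketch {

    // Returns true when OCR is enabled and the file size lies within [minSize, maxSize].
    // Inclusive bounds are an assumption of this sketch.
    static boolean shouldOcr(long fileSize, long minSize, long maxSize, boolean skipOcr) {
        if (skipOcr) {
            return false;
        }
        return fileSize >= minSize && fileSize <= maxSize;
    }
}
```

A gate like this keeps very small files (often decorative images) and very large files (expensive to upload) away from a paid remote endpoint.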
isServerAvailable
public boolean isServerAvailable()
-