Class AbstractVLMParser

java.lang.Object
org.apache.tika.parser.vlm.AbstractVLMParser
All Implemented Interfaces:
Serializable, Initializable, SelfConfiguring, Parser
Direct Known Subclasses:
ClaudeVLMParser, GeminiVLMParser, OpenAIVLMParser

public abstract class AbstractVLMParser extends Object implements Parser, Initializable
Abstract base class for parsers that delegate to a remote Vision-Language Model (VLM) endpoint for OCR and document understanding.

Subclasses only need to implement the API-specific request/response serialization and declare their supported media types. All common logic (HTTP transport, timeout handling, inline content, markdown-to-XHTML rendering, config resolution) lives here.
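The split described above is the template-method pattern: the base class owns the shared pipeline, and subclasses fill in a few API-specific hooks. The following is a minimal, self-contained sketch of that split. It uses hypothetical stand-in types (`MiniVlmParser`, `Request`, `EchoVlmParser`), not Tika classes, and does not reproduce this class's actual implementation. The transport is passed in as a function so the sketch runs without a network.

```java
import java.util.Base64;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in, NOT a Tika class: illustrates the base-class/subclass split only.
abstract class MiniVlmParser {

    // Hypothetical minimal request: target URL plus a JSON body.
    record Request(String url, String jsonBody) {}

    // API-specific hooks (loosely analogous to buildHttpCall, extractResponseText,
    // and getSupportedMediaTypes).
    protected abstract Request buildRequest(String base64Data, String mimeType);

    protected abstract String extractText(String responseBody);

    protected abstract Set<String> supportedMimeTypes();

    // Shared logic every subclass inherits: type check, base64 encoding,
    // transport call, and response extraction.
    public final String parse(byte[] data, String mimeType, Function<Request, String> transport) {
        if (!supportedMimeTypes().contains(mimeType)) {
            throw new IllegalArgumentException("Unsupported type: " + mimeType);
        }
        String base64 = Base64.getEncoder().encodeToString(data);
        Request request = buildRequest(base64, mimeType);
        return extractText(transport.apply(request));
    }
}

// Hypothetical subclass: only the API-specific pieces are written here.
final class EchoVlmParser extends MiniVlmParser {

    @Override
    protected Request buildRequest(String base64Data, String mimeType) {
        String body = "{\"mime\":\"" + mimeType + "\",\"data\":\"" + base64Data + "\"}";
        return new Request("http://localhost:8080/v1/echo", body);
    }

    @Override
    protected String extractText(String responseBody) {
        return responseBody.trim();
    }

    @Override
    protected Set<String> supportedMimeTypes() {
        return Set.of("image/png", "image/jpeg");
    }
}
```

Declaring `parse` as `final` in the sketch keeps the shared pipeline non-overridable, so a new backend only supplies the three hooks.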

Since:
Apache Tika 4.0
  • Field Details

    • VLM_META

      public static final String VLM_META
      Metadata namespace for VLM properties.
    • VLM_MODEL

      public static final Property VLM_MODEL
    • VLM_PROMPT_TOKENS

      public static final Property VLM_PROMPT_TOKENS
    • VLM_COMPLETION_TOKENS

      public static final Property VLM_COMPLETION_TOKENS
  • Constructor Details

    • AbstractVLMParser

      protected AbstractVLMParser(VLMOCRConfig config)
  • Method Details

    • buildHttpCall

      protected abstract AbstractVLMParser.HttpCall buildHttpCall(VLMOCRConfig config, String base64Data, String mimeType)
      Build a fully formed AbstractVLMParser.HttpCall for the target API.
      Parameters:
      config - resolved config for this parse
      base64Data - base64-encoded version of the file bytes
      mimeType - the MIME type of the input (e.g. image/png)
      Returns:
      a ready-to-execute AbstractVLMParser.HttpCall
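A minimal sketch of what a subclass might assemble in this hook, using the JDK's `java.net.http.HttpRequest` as a stand-in for `AbstractVLMParser.HttpCall` (whose fields are not documented on this page). The endpoint path `/v1/generate`, the JSON field names, and the `HttpCallSketch` class are hypothetical and do not match any particular vendor's API; the base64 payload is embedded as a `data:` URL.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

final class HttpCallSketch {

    // Escapes the characters JSON requires inside a string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    // Builds a POST with a hypothetical JSON schema, the configured timeout,
    // and an optional bearer token.
    static HttpRequest build(String baseUrl, String apiKey, String model, String prompt,
                             int maxTokens, int timeoutSeconds, String base64Data, String mimeType) {
        String dataUrl = "data:" + mimeType + ";base64," + base64Data;
        String body = "{\"model\":\"" + jsonEscape(model) + "\","
                + "\"max_tokens\":" + maxTokens + ","
                + "\"prompt\":\"" + jsonEscape(prompt) + "\","
                + "\"image\":\"" + jsonEscape(dataUrl) + "\"}";
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(baseUrl + "/v1/generate"))
                .timeout(Duration.ofSeconds(timeoutSeconds))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body));
        if (apiKey != null && !apiKey.isEmpty()) {
            builder.header("Authorization", "Bearer " + apiKey);
        }
        return builder.build();
    }
}
```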
    • extractResponseText

      protected abstract String extractResponseText(String responseBody, Metadata metadata) throws TikaException
      Parse the API response body and extract the model's text output. Implementations should also populate VLM_PROMPT_TOKENS and VLM_COMPLETION_TOKENS in metadata when the information is available.
      Parameters:
      responseBody - raw JSON response body
      metadata - metadata to enrich with token counts
      Returns:
      the extracted text content
      Throws:
      TikaException
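Because token counts are optional, an implementation can copy them into metadata only when the response reports them. The sketch below assumes an OpenAI Chat Completions-style `usage` object with `prompt_tokens` and `completion_tokens` fields. It writes into a plain `Map<String, String>` standing in for Tika's `Metadata`, and uses the hypothetical keys `promptTokens` and `completionTokens` rather than the real `VLM_PROMPT_TOKENS` and `VLM_COMPLETION_TOKENS` properties. Regex matching keeps it dependency-free for illustration only; a real implementation should use a JSON parser.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenUsageSketch {

    // Illustration only: a production parser should use a real JSON library.
    private static final Pattern PROMPT = Pattern.compile("\"prompt_tokens\"\\s*:\\s*(\\d+)");
    private static final Pattern COMPLETION = Pattern.compile("\"completion_tokens\"\\s*:\\s*(\\d+)");

    // Copies token counts into metadata only when the response actually reports them.
    static void recordUsage(String responseBody, Map<String, String> metadata) {
        putIfFound(PROMPT, responseBody, "promptTokens", metadata);
        putIfFound(COMPLETION, responseBody, "completionTokens", metadata);
    }

    private static void putIfFound(Pattern pattern, String body, String key, Map<String, String> metadata) {
        Matcher m = pattern.matcher(body);
        if (m.find()) {
            metadata.put(key, m.group(1));
        }
    }
}
```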
    • getSupportedMediaTypes

      protected abstract Set<MediaType> getSupportedMediaTypes()
      Returns:
      the set of media types this parser handles (images, PDFs, etc.)
    • configKey

      protected abstract String configKey()
      Returns:
      the JSON config key for ParseContextConfig lookup (e.g. "openai-vlm-parser", "gemini-vlm-parser")
    • getHealthCheckUrl

      protected abstract String getHealthCheckUrl(VLMOCRConfig config)
      Returns:
      an optional health-check URL to probe at init time, or null to skip the probe
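A minimal init-time probe sketch using `java.net.http.HttpClient`. It follows the documented rule that a `null` URL skips the probe, and additionally assumes that a skipped probe counts as available and that any 2xx response counts as healthy. The class name `HealthProbeSketch` is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class HealthProbeSketch {

    // Returns true when the probe is skipped (null URL) or the endpoint answers 2xx.
    static boolean probe(String healthCheckUrl, Duration timeout) {
        if (healthCheckUrl == null) {
            return true; // assumption: a skipped probe does not mark the server unavailable
        }
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(timeout)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(healthCheckUrl))
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() / 100 == 2;
        } catch (IOException e) {
            return false; // unreachable or timed out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```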
    • getSupportedTypes

      public Set<MediaType> getSupportedTypes(ParseContext context)
      Description copied from interface: Parser
      Returns the set of media types supported by this parser when used with the given parse context.
      Specified by:
      getSupportedTypes in interface Parser
      Parameters:
      context - parse context
      Returns:
      immutable set of media types
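One plausible way for the public accessor to honour the "immutable set" contract is to return an unmodifiable copy of what a subclass declares. The sketch assumes that relationship between `getSupportedTypes` and `getSupportedMediaTypes`, uses plain strings in place of Tika's `MediaType`, and introduces the hypothetical names `SupportedTypesSketch` and `PngOnlySketch`.

```java
import java.util.LinkedHashSet;
import java.util.Set;

abstract class SupportedTypesSketch {

    // Subclass hook, analogous to getSupportedMediaTypes().
    protected abstract Set<String> declaredTypes();

    // Public accessor: an unmodifiable snapshot, so callers cannot alter the parser's types.
    public final Set<String> supportedTypes() {
        return Set.copyOf(declaredTypes());
    }
}

final class PngOnlySketch extends SupportedTypesSketch {

    private final Set<String> types = new LinkedHashSet<>(Set.of("image/png"));

    @Override
    protected Set<String> declaredTypes() {
        return types;
    }

    // Exists only to show that later changes do not leak into earlier snapshots.
    void addType(String type) {
        types.add(type);
    }
}
```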
    • parse

      public void parse(TikaInputStream tis, ContentHandler handler, Metadata metadata, ParseContext parseContext) throws IOException, SAXException, TikaException
      Description copied from interface: Parser
      Parses a document stream into a sequence of XHTML SAX events. Fills in related document metadata in the given metadata object.

      The given document stream is consumed but not closed by this method. The responsibility to close the stream remains on the caller.

      Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.

      Specified by:
      parse in interface Parser
      Parameters:
      tis - the document stream (input)
      handler - handler for the XHTML SAX events (output)
      metadata - document metadata (input and output)
      parseContext - parse context
      Throws:
      IOException - if the document stream could not be read
      SAXException - if the SAX events could not be processed
      TikaException - if the document could not be parsed
    • initialize

      public void initialize() throws TikaConfigException
      Description copied from interface: Initializable
      Called after all properties have been set to allow for validation and initialization that depends on multiple properties.
      Specified by:
      initialize in interface Initializable
      Throws:
      TikaConfigException - if there is a problem with the configuration
    • getConfig

      protected VLMOCRConfig getConfig(ParseContext parseContext) throws TikaConfigException, IOException
      Throws:
      TikaConfigException
      IOException
    • stripTrailingSlash

      protected static String stripTrailingSlash(String url)
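A helper like this typically normalizes a configured base URL so that API paths can be appended without producing `//`. This page does not document its exact behaviour, so the sketch below assumes that `null` passes through unchanged and that every trailing `/` is removed; `UrlSketch` is a hypothetical name.

```java
final class UrlSketch {

    // Assumption: null passes through and all trailing '/' characters are removed.
    static String stripTrailingSlash(String url) {
        if (url == null) {
            return null;
        }
        int end = url.length();
        while (end > 0 && url.charAt(end - 1) == '/') {
            end--;
        }
        return url.substring(0, end);
    }
}
```

With this normalization, `stripTrailingSlash("http://localhost:11434/") + "/v1/models"` yields a single slash at the join.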
    • getDefaultConfig

      protected VLMOCRConfig getDefaultConfig()
    • getBaseUrl

      public String getBaseUrl()
    • setBaseUrl

      public void setBaseUrl(String baseUrl) throws TikaConfigException
      Throws:
      TikaConfigException
    • getModel

      public String getModel()
    • setModel

      public void setModel(String model)
    • getPrompt

      public String getPrompt()
    • setPrompt

      public void setPrompt(String prompt)
    • getMaxTokens

      public int getMaxTokens()
    • setMaxTokens

      public void setMaxTokens(int maxTokens)
    • getTimeoutSeconds

      public int getTimeoutSeconds()
    • setTimeoutSeconds

      public void setTimeoutSeconds(int timeoutSeconds)
    • getApiKey

      public String getApiKey()
    • setApiKey

      public void setApiKey(String apiKey) throws TikaConfigException
      Throws:
      TikaConfigException
    • isInlineContent

      public boolean isInlineContent()
    • setInlineContent

      public void setInlineContent(boolean inlineContent)
    • isSkipOcr

      public boolean isSkipOcr()
    • setSkipOcr

      public void setSkipOcr(boolean skipOcr)
    • getMinFileSizeToOcr

      public long getMinFileSizeToOcr()
    • setMinFileSizeToOcr

      public void setMinFileSizeToOcr(long minFileSizeToOcr)
    • getMaxFileSizeToOcr

      public long getMaxFileSizeToOcr()
    • setMaxFileSizeToOcr

      public void setMaxFileSizeToOcr(long maxFileSizeToOcr)
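The `skipOcr`, `minFileSizeToOcr`, and `maxFileSizeToOcr` properties suggest a gate applied before any request is sent to the model. This page does not document the exact semantics, so the sketch below assumes inclusive bounds and that `skipOcr` disables OCR regardless of size; `OcrGateSketch` is a hypothetical name.

```java
final class OcrGateSketch {

    // Assumptions: bounds are inclusive, and skipOcr disables OCR regardless of size.
    static boolean shouldOcr(boolean skipOcr, long minFileSizeToOcr, long maxFileSizeToOcr, long fileSize) {
        if (skipOcr) {
            return false;
        }
        return fileSize >= minFileSizeToOcr && fileSize <= maxFileSizeToOcr;
    }
}
```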
    • isServerAvailable

      public boolean isServerAvailable()