Class OpenAIImageEmbeddingParser

java.lang.Object
org.apache.tika.inference.OpenAIImageEmbeddingParser
All Implemented Interfaces:
Closeable, Serializable, AutoCloseable, Initializable, SelfConfiguring, Parser

public class OpenAIImageEmbeddingParser extends Object implements Parser, Initializable, Closeable
Parser that sends images to a CLIP-like embedding endpoint (OpenAI-compatible /v1/embeddings with image input) and stores the resulting vector in metadata.

This parser registers for the same image/ocr-* media types used by the PDF renderer's OCR pipeline, so it slots into the existing ocrStrategy mechanism. When configured, each rendered page image is sent to the embedding endpoint and the vector is stored as a serialized Chunk with a PaginatedLocator (when page number metadata is available).

The image is sent in the Jina CLIP format: {"input": [{"image": "data:image/png;base64,..."}]}.
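A minimal sketch of building a request body in the format above, using only java.util.Base64. It assumes the rendered page image is already PNG-encoded. The class and method names are hypothetical. It builds only the input array shown above; this page does not say whether the parser also sends other fields, such as the configured model.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical sketch of the Jina CLIP-style body:
 * {"input": [{"image": "data:image/png;base64,..."}]}.
 * Assumes the image bytes are PNG.
 */
public class JinaClipRequestSketch {

    /** Builds the JSON body for a single PNG image. */
    static String buildRequestBody(byte[] pngBytes) {
        // Standard (non-URL-safe) Base64, as used inside a data: URI
        String b64 = Base64.getEncoder().encodeToString(pngBytes);
        // Base64 output only contains [A-Za-z0-9+/=], so no JSON escaping is needed
        return "{\"input\": [{\"image\": \"data:image/png;base64," + b64 + "\"}]}";
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real caller would pass the rendered page's PNG bytes
        byte[] fakePng = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(buildRequestBody(fakePng));
    }
}
```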

Configuration key: "openai-image-embedding-parser"

Thread safety: instances are safe for concurrent parse(TikaInputStream, ContentHandler, Metadata, ParseContext) calls once fully constructed. Setters must not be called while any parse(TikaInputStream, ContentHandler, Metadata, ParseContext) call is in progress.
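The thread-safety rule above amounts to "configure on one thread, then share." A sketch of that pattern follows. StandInParser is a hypothetical placeholder for the real parser, so the example runs without Tika on the classpath. All setters run before any task is submitted. Workers then only read the configuration. ExecutorService.submit establishes a happens-before edge, so the workers see the configured state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Configure-then-share sketch. StandInParser is hypothetical and only
 * mimics the "setters first, concurrent parse calls later" contract.
 */
public class ConfigureThenShareSketch {

    static class StandInParser {
        private String model;

        void setModel(String model) { this.model = model; }

        // Read-only use of the configuration, safe to call from many threads
        String parse(int page) { return model + ":page-" + page; }
    }

    public static void main(String[] args) throws Exception {
        StandInParser parser = new StandInParser();
        // 1. Configure on a single thread (the real parser's initialize() would also run here)
        parser.setModel("example-clip-model");

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int page = 1; page <= 8; page++) {
                final int p = page;
                // 2. submit() happens-before the task runs, so workers see the configured model
                results.add(pool.submit(() -> parser.parse(p)));
            }
            for (Future<String> f : results) {
                System.out.println(f.get());
            }
        } finally {
            pool.shutdown();
        }
        // 3. No setter is called while tasks are running
    }
}
```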

Since:
Apache Tika 4.0
  • Constructor Details

    • OpenAIImageEmbeddingParser

      public OpenAIImageEmbeddingParser()
    • OpenAIImageEmbeddingParser

      public OpenAIImageEmbeddingParser(ImageEmbeddingConfig config)
    • OpenAIImageEmbeddingParser

      public OpenAIImageEmbeddingParser(JsonConfig jsonConfig)
  • Method Details

    • getSupportedTypes

      public Set<MediaType> getSupportedTypes(ParseContext context)
      Description copied from interface: Parser
      Returns the set of media types supported by this parser when used with the given parse context.
      Specified by:
      getSupportedTypes in interface Parser
      Parameters:
      context - parse context
      Returns:
      immutable set of media types
    • parse

      public void parse(TikaInputStream tis, ContentHandler handler, Metadata metadata, ParseContext parseContext) throws IOException, SAXException, TikaException
      Description copied from interface: Parser
      Parses a document stream into a sequence of XHTML SAX events. Fills in related document metadata in the given metadata object.

      The given document stream is consumed but not closed by this method. The responsibility to close the stream remains on the caller.

      Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.

      Specified by:
      parse in interface Parser
      Parameters:
      tis - the document stream (input)
      handler - handler for the XHTML SAX events (output)
      metadata - document metadata (input and output)
      parseContext - parse context
      Throws:
      IOException - if the document stream could not be read
      SAXException - if the SAX events could not be processed
      TikaException - if the document could not be parsed
    • initialize

      public void initialize() throws TikaConfigException
      Description copied from interface: Initializable
      Called after all properties have been set to allow for validation and initialization that depends on multiple properties.
      Specified by:
      initialize in interface Initializable
      Throws:
      TikaConfigException - if there is a problem with the configuration
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
    • getBaseUrl

      public String getBaseUrl()
    • setBaseUrl

      public void setBaseUrl(String baseUrl) throws TikaConfigException
      Throws:
      TikaConfigException
    • getModel

      public String getModel()
    • setModel

      public void setModel(String model)
    • getApiKey

      public String getApiKey()
    • setApiKey

      public void setApiKey(String apiKey) throws TikaConfigException
      Throws:
      TikaConfigException
    • getTimeoutSeconds

      public int getTimeoutSeconds()
    • setTimeoutSeconds

      public void setTimeoutSeconds(int timeoutSeconds)
    • isSkipEmbedding

      public boolean isSkipEmbedding()
    • setSkipEmbedding

      public void setSkipEmbedding(boolean skipEmbedding)
    • getMinFileSizeToEmbed

      public long getMinFileSizeToEmbed()
    • setMinFileSizeToEmbed

      public void setMinFileSizeToEmbed(long minFileSizeToEmbed)
    • getMaxFileSizeToEmbed

      public long getMaxFileSizeToEmbed()
    • setMaxFileSizeToEmbed

      public void setMaxFileSizeToEmbed(long maxFileSizeToEmbed)
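These settings have no descriptions on this page. The sketch below shows one plausible way skipEmbedding, minFileSizeToEmbed and maxFileSizeToEmbed could decide whether an image is sent. It rests on two assumptions, not documented behavior: sizes are measured in bytes and both bounds are inclusive. The class and method names are hypothetical.

```java
/**
 * Hypothetical gating sketch. Byte-based sizes and inclusive bounds are
 * assumptions; consult the parser source for the actual rules.
 */
public class EmbeddingGateSketch {

    static boolean shouldEmbed(long imageBytes, boolean skipEmbedding,
                               long minFileSizeToEmbed, long maxFileSizeToEmbed) {
        if (skipEmbedding) {
            return false; // embedding switched off entirely
        }
        // Assumption: both bounds are inclusive
        return imageBytes >= minFileSizeToEmbed && imageBytes <= maxFileSizeToEmbed;
    }

    public static void main(String[] args) {
        System.out.println(shouldEmbed(50_000, false, 1_000, 10_000_000)); // true
        System.out.println(shouldEmbed(500, false, 1_000, 10_000_000));    // false: below minimum
        System.out.println(shouldEmbed(50_000, true, 1_000, 10_000_000));  // false: skipped
    }
}
```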
    • getEmbeddingsPath

      public String getEmbeddingsPath()
    • setEmbeddingsPath

      public void setEmbeddingsPath(String embeddingsPath)
      Set the URL path for embeddings requests. Default is /v1/embeddings. For Azure OpenAI, use /openai/deployments/{deployment}/embeddings?api-version=2024-02-01.
    • getApiKeyHeaderName

      public String getApiKeyHeaderName()
    • setApiKeyHeaderName

      public void setApiKeyHeaderName(String apiKeyHeaderName)
      Set the HTTP header name for API key authentication. Default is Authorization. For Azure OpenAI, set to api-key.
    • getApiKeyPrefix

      public String getApiKeyPrefix()
    • setApiKeyPrefix

      public void setApiKeyPrefix(String apiKeyPrefix)
      Set the prefix prepended to the API key in the auth header. Default is "Bearer " (with trailing space). For Azure OpenAI, set to "" (empty string).
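The endpoint settings above (base URL, embeddings path, header name, key prefix, timeout) combine into one HTTP request. A sketch using the JDK's java.net.http.HttpRequest builder follows. It builds the request but does not send it. Joining baseUrl and embeddingsPath by plain concatenation is an assumption. The hosts, key, deployment name and timeout value are placeholders, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/**
 * Hypothetical sketch of assembling the embeddings request from the
 * documented settings. baseUrl + embeddingsPath concatenation is assumed.
 */
public class EmbeddingRequestSketch {

    static HttpRequest buildRequest(String baseUrl, String embeddingsPath,
                                    String apiKeyHeaderName, String apiKeyPrefix,
                                    String apiKey, int timeoutSeconds, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + embeddingsPath))
                .timeout(Duration.ofSeconds(timeoutSeconds))
                .header("Content-Type", "application/json")
                // e.g. "Authorization: Bearer <key>" by default, or "api-key: <key>" for Azure
                .header(apiKeyHeaderName, apiKeyPrefix + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String body = "{\"input\": [{\"image\": \"data:image/png;base64,...\"}]}";

        // Documented defaults: /v1/embeddings, Authorization header, "Bearer " prefix
        HttpRequest openAi = buildRequest("http://localhost:8080", "/v1/embeddings",
                "Authorization", "Bearer ", "my-key", 60, body);

        // Documented Azure OpenAI settings; resource and deployment names are placeholders
        HttpRequest azure = buildRequest("https://my-resource.openai.azure.com",
                "/openai/deployments/my-deployment/embeddings?api-version=2024-02-01",
                "api-key", "", "my-key", 60, body);

        System.out.println(openAi.uri() + " " + openAi.headers().firstValue("Authorization").orElse(""));
        System.out.println(azure.uri() + " " + azure.headers().firstValue("api-key").orElse(""));
    }
}
```

With Azure, the empty prefix means the raw key is sent in the api-key header. With the defaults, the key travels as a bearer token.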