Class OpenAIEmbeddingFilter

All Implemented Interfaces:
Closeable, Serializable, AutoCloseable

public class OpenAIEmbeddingFilter extends AbstractEmbeddingFilter
Metadata filter that calls an OpenAI-compatible /v1/embeddings endpoint to produce an embedding vector for each text chunk.

Works with OpenAI, vLLM, Ollama, sentence-transformers servers, and any endpoint that implements the OpenAI Embeddings API.

Configuration key: "openai-embedding-filter"
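
For orientation, the sketch below shows the JSON body that the OpenAI Embeddings API accepts: a model name plus an array of input strings. It is a minimal illustration and not this filter's actual implementation. The class name EmbeddingsRequestSketch and its helper methods are hypothetical, and the model name is only an example value.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Tika: builds an OpenAI-style embeddings request body.
public class EmbeddingsRequestSketch {

    // Quote a string for JSON, escaping the characters that may not appear raw.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // One request can carry several chunk texts in the "input" array.
    static String buildBody(String model, List<String> texts) {
        String inputs = texts.stream()
                .map(EmbeddingsRequestSketch::jsonString)
                .collect(Collectors.joining(","));
        return "{\"model\":" + jsonString(model) + ",\"input\":[" + inputs + "]}";
    }

    public static void main(String[] args) {
        // Example values only; the model name depends on the server in use.
        System.out.println(buildBody("text-embedding-3-small",
                List.of("first chunk", "second \"quoted\" chunk")));
    }
}
```

In the OpenAI API, the response carries one entry per input under data, each with an embedding array and an index that identifies the input it belongs to. Other compatible servers are expected to follow the same shape.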

Since:
Apache Tika 4.0
  • Constructor Details

    • OpenAIEmbeddingFilter

      public OpenAIEmbeddingFilter()
    • OpenAIEmbeddingFilter

      public OpenAIEmbeddingFilter(InferenceConfig config)
  • Method Details

    • embed

      protected void embed(List<Chunk> chunks, InferenceConfig config) throws IOException, TikaException
      Description copied from class: AbstractEmbeddingFilter
      Call the embeddings endpoint to fill in the vector on each chunk. Implementations should call Chunk.setVector(float[]) on every chunk in the list.
      Specified by:
      embed in class AbstractEmbeddingFilter
      Parameters:
      chunks - the text chunks to embed
      config - the resolved config for this call
      Throws:
      IOException - on HTTP errors
      TikaException - on API-level errors
    • getEmbeddingsPath

      public String getEmbeddingsPath()
    • setEmbeddingsPath

      public void setEmbeddingsPath(String embeddingsPath)
      Set the URL path for embeddings requests. Default is /v1/embeddings. For Azure OpenAI, use /openai/deployments/{deployment}/embeddings?api-version=2024-02-01.
    • getApiKeyHeaderName

      public String getApiKeyHeaderName()
    • setApiKeyHeaderName

      public void setApiKeyHeaderName(String apiKeyHeaderName)
      Set the HTTP header name for API key authentication. Default is Authorization. For Azure OpenAI, set to api-key.
    • getApiKeyPrefix

      public String getApiKeyPrefix()
    • setApiKeyPrefix

      public void setApiKeyPrefix(String apiKeyPrefix)
      Set the prefix prepended to the API key in the auth header. Default is "Bearer " (with trailing space). For Azure OpenAI, set to "" (empty string).
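
The three settings above (path, header name, key prefix) together determine where the request goes and how it is authenticated. The sketch below illustrates that combination with the JDK's java.net.http.HttpRequest, comparing the documented defaults with the Azure OpenAI values. It is an assumption about how such settings are typically applied, not the filter's own code. The class EmbeddingAuthSketch, the buildRequest method, the base URLs, the deployment name my-deployment, and the keys are all hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical illustration, not part of Tika: shows how path, header name,
// and key prefix could combine into a single HTTP request.
public class EmbeddingAuthSketch {

    static HttpRequest buildRequest(String baseUrl, String embeddingsPath,
                                    String apiKeyHeaderName, String apiKeyPrefix,
                                    String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + embeddingsPath))
                .header("Content-Type", "application/json")
                // The auth header value is the prefix followed directly by the key.
                .header(apiKeyHeaderName, apiKeyPrefix + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String body = "{\"model\":\"text-embedding-3-small\",\"input\":[\"hello\"]}";

        // Documented defaults: /v1/embeddings, Authorization, "Bearer ".
        HttpRequest openai = buildRequest("https://api.openai.com",
                "/v1/embeddings", "Authorization", "Bearer ", "sk-example", body);

        // Azure OpenAI values from the setter descriptions; {deployment} is
        // replaced here with the placeholder name "my-deployment".
        HttpRequest azure = buildRequest("https://example.openai.azure.com",
                "/openai/deployments/my-deployment/embeddings?api-version=2024-02-01",
                "api-key", "", "azure-example-key", body);

        System.out.println(openai.uri() + " " + openai.headers().map());
        System.out.println(azure.uri() + " " + azure.headers().map());
    }
}
```

With the filter itself, the same effect would come from calling setEmbeddingsPath, setApiKeyHeaderName, and setApiKeyPrefix with the Azure values before use. Note that the {deployment} placeholder in the path must be replaced with a real deployment name, since braces are not valid in a URL.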