Package org.apache.tika.inference
Class OpenAIEmbeddingFilter
java.lang.Object
org.apache.tika.metadata.filter.MetadataFilter
org.apache.tika.inference.AbstractEmbeddingFilter
org.apache.tika.inference.OpenAIEmbeddingFilter
- All Implemented Interfaces:
Closeable, Serializable, AutoCloseable
Metadata filter that calls an OpenAI-compatible /v1/embeddings endpoint to produce vectors for each text chunk.
Works with OpenAI, vLLM, Ollama, sentence-transformers servers, and any endpoint that implements the OpenAI Embeddings API.
Configuration key: "openai-embedding-filter"
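As a rough illustration of what an OpenAI-compatible embeddings call carries, the sketch below builds the JSON request body that the OpenAI Embeddings API accepts: a "model" string and an "input" array of texts. The class name EmbeddingRequestBodySketch, its helper methods, and the model name are hypothetical and are not part of Tika; how this filter serializes its requests internally is not documented on this page.

```java
import java.util.List;

// Hypothetical sketch, not Tika code: shows the general shape of an
// OpenAI-style embeddings request body ({"model": ..., "input": [...]}).
class EmbeddingRequestBodySketch {

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds one request body for a batch of chunk texts.
    static String requestBody(String model, List<String> texts) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"model\":").append(quote(model)).append(",\"input\":[");
        for (int i = 0; i < texts.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(texts.get(i)));
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // The model name here is only an example value.
        System.out.println(requestBody("text-embedding-3-small",
                List.of("first chunk", "second chunk")));
    }
}
```

In the OpenAI Embeddings API, the response contains a "data" array with one "embedding" per input, which is what a filter like this one would copy onto each chunk.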
- Since:
- Apache Tika 4.0
Constructor Summary
Constructors -
Method Summary
Modifier and Type    Method    Description
protected void    embed(List<Chunk> chunks, InferenceConfig config)    Call the embeddings endpoint to fill in vectors on each chunk.
void    setApiKeyHeaderName(String apiKeyHeaderName)    Set the HTTP header name for API key authentication.
void    setApiKeyPrefix(String apiKeyPrefix)    Set the prefix prepended to the API key in the auth header.
void    setEmbeddingsPath(String embeddingsPath)    Set the URL path for embeddings requests.
Methods inherited from class org.apache.tika.inference.AbstractEmbeddingFilter
filter, getApiKey, getBaseUrl, getContentField, getDefaultConfig, getMaxBatchSize, getMaxChunkChars, getMaxChunks, getModel, getOutputField, getOverlapChars, getTimeoutSeconds, isClearContentAfterChunking, isSkipEmbedding, setApiKey, setBaseUrl, setClearContentAfterChunking, setContentField, setMaxBatchSize, setMaxChunkChars, setMaxChunks, setModel, setOutputField, setOverlapChars, setSkipEmbedding, setTimeoutSeconds
Methods inherited from class org.apache.tika.metadata.filter.MetadataFilter
close, filter
-
Constructor Details
-
OpenAIEmbeddingFilter
public OpenAIEmbeddingFilter() -
OpenAIEmbeddingFilter
-
-
Method Details
-
embed
Description copied from class: AbstractEmbeddingFilter
Call the embeddings endpoint to fill in vectors on each chunk. Implementations should call Chunk.setVector(float[]) on each chunk in the list.
- Specified by:
embed in class AbstractEmbeddingFilter
- Parameters:
chunks - the text chunks to embed
config - the resolved config for this call
- Throws:
IOException - on HTTP errors
TikaException - on API-level errors
-
getEmbeddingsPath
-
setEmbeddingsPath
Set the URL path for embeddings requests. Default is /v1/embeddings. For Azure OpenAI, use /openai/deployments/{deployment}/embeddings?api-version=2024-02-01.
-
getApiKeyHeaderName
-
setApiKeyHeaderName
Set the HTTP header name for API key authentication. Default is Authorization. For Azure OpenAI, set to api-key.
-
getApiKeyPrefix
-
setApiKeyPrefix
Set the prefix prepended to the API key in the auth header. Default is "Bearer " (with trailing space). For Azure OpenAI, set to "" (empty string).
-
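The three settings above (embeddings path, API key header name, API key prefix) together determine where a request goes and how it is authenticated. The following is a minimal sketch of that combination, assuming the request URL is the base URL followed by the embeddings path and the header value is the prefix followed by the API key. The class EmbeddingEndpointSketch, its methods, the host names, the deployment name, and the key values are all hypothetical; this is not the filter's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Tika code: combines the documented defaults and
// the documented Azure OpenAI overrides into a URL and an auth header.
class EmbeddingEndpointSketch {

    // Assumption: URL = base URL + embeddings path, joined with a single slash.
    static String requestUrl(String baseUrl, String embeddingsPath) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        String path = embeddingsPath.startsWith("/")
                ? embeddingsPath
                : "/" + embeddingsPath;
        return base + path;
    }

    // Assumption: header value = prefix + API key, with no extra separator.
    static Map<String, String> authHeader(String headerName, String prefix, String apiKey) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(headerName, prefix + apiKey);
        return headers;
    }

    public static void main(String[] args) {
        // Documented defaults: /v1/embeddings, Authorization, "Bearer ".
        System.out.println(requestUrl("https://api.openai.com", "/v1/embeddings"));
        System.out.println(authHeader("Authorization", "Bearer ", "sk-example"));

        // Documented Azure OpenAI overrides; host and deployment name are made up.
        String azurePath = "/openai/deployments/{deployment}/embeddings?api-version=2024-02-01"
                .replace("{deployment}", "my-deployment");
        System.out.println(requestUrl("https://example-resource.openai.azure.com", azurePath));
        System.out.println(authHeader("api-key", "", "azure-example-key"));
    }
}
```

With the real filter, the corresponding values would be supplied through setEmbeddingsPath, setApiKeyHeaderName and setApiKeyPrefix, alongside the inherited setBaseUrl and setApiKey.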