Package org.apache.tika.inference


  • Classes
    Class
    Description
    Base class for metadata filters that chunk text content and call a remote embeddings endpoint to produce vectors for each chunk.
    A content chunk with multimodal locators and an optional embedding vector.
    Serializes and deserializes a list of Chunk objects to/from JSON.
    Configuration for image embedding parsers that call a CLIP-like vector endpoint.
    Runtime-only config that prevents modification of security-sensitive and cost-sensitive fields (baseUrl, apiKey, model) at parse time.
    Configuration for the inference metadata filters.
    Runtime-only config that prevents modification of security-sensitive and cost-sensitive fields (baseUrl, apiKey, model) at parse time.
    Splits markdown text into chunks that respect structural boundaries.
    Metadata filter that calls an OpenAI-compatible /v1/embeddings endpoint to produce vectors for each text chunk.
    Parser that sends images to a CLIP-like embedding endpoint (OpenAI-compatible /v1/embeddings with image input) and stores the resulting vector in metadata.
    Serializes and deserializes float vectors as base64-encoded big-endian float32 byte arrays.
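
The last entry above describes float vectors stored as base64-encoded big-endian float32 byte arrays. The following is a minimal sketch of that encoding using only the JDK, for illustration. The class and method names (VectorCodecSketch, encode, decode) are hypothetical and are not the package's actual API; the real serializer may differ in details such as validation or null handling.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical illustration class; not part of org.apache.tika.inference.
class VectorCodecSketch {

    // Writes each float as 4 big-endian bytes, then base64-encodes the result.
    static String encode(float[] vector) {
        ByteBuffer buf = ByteBuffer
                .allocate(vector.length * Float.BYTES)
                .order(ByteOrder.BIG_ENDIAN);
        for (float v : vector) {
            buf.putFloat(v);
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    // Reverses encode(): base64-decodes, then reads 4 bytes per float.
    static float[] decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        if (bytes.length % Float.BYTES != 0) {
            throw new IllegalArgumentException(
                    "Byte length " + bytes.length + " is not a multiple of 4");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        float[] out = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] original = {1.0f, -2.5f, 0.0f};
        String encoded = encode(original);
        float[] restored = decode(encoded);
        System.out.println(encoded);
        System.out.println(Arrays.equals(original, restored));
    }
}
```

Fixing the byte order explicitly keeps the encoded string independent of the platform that produced it, and float32 takes 4 bytes per dimension before base64 expansion.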