Class ChunkSerializer

java.lang.Object
org.apache.tika.inference.ChunkSerializer

public final class ChunkSerializer extends Object
Serializes and deserializes a list of Chunk objects to/from JSON. Vectors are stored as base64-encoded little-endian float32 via VectorSerializer. Locators are nested under a "locators" object with optional text, paginated, spatial, and temporal arrays.
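The vector encoding described above can be illustrated with the JDK alone. The following is a minimal sketch of base64-encoded little-endian float32 packing; it is not the VectorSerializer source. The class name VectorEncodingSketch and its encode/decode methods are hypothetical names chosen for illustration, and the choice of the standard (non-URL-safe) base64 alphabet with padding is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical illustration class; not part of Tika.
final class VectorEncodingSketch {

    // Pack each float as 4 little-endian bytes, then base64-encode the bytes.
    static String encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector) {
            buf.putFloat(v);
        }
        // Assumption: standard base64 alphabet with padding.
        return Base64.getEncoder().encodeToString(buf.array());
    }

    // Reverse the steps: base64-decode, then read 4 bytes per float.
    static float[] decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] vector = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buf.getFloat();
        }
        return vector;
    }

    public static void main(String[] args) {
        float[] original = {1.0f, -2.5f, 0.125f};
        float[] restored = decode(encode(original));
        // The round trip is lossless because the raw float bits are preserved.
        System.out.println(Arrays.equals(original, restored));
    }
}
```

Under these assumptions, a one-element vector holding 1.0f (bit pattern 0x3F800000) is stored as the bytes 00 00 80 3F and encodes to "AACAPw==". Storing raw bytes this way is more compact than a JSON array of decimal numbers and avoids float-to-text rounding.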
  • Method Details

    • toJson

      public static String toJson(List<Chunk> chunks) throws IOException
      Serializes the given chunks to a JSON array string.
      Throws:
      IOException
    • mergeInto

      public static void mergeInto(Metadata metadata, List<Chunk> newChunks) throws IOException
      Reads any existing chunks from the metadata field, appends the new chunks, and writes the merged list back. This allows multiple components (text chunker, image embedder, etc.) to contribute to the same chunks array.
      Parameters:
      metadata - the metadata to read from and write to
      newChunks - chunks to append
      Throws:
      IOException
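The read, append, and write-back behaviour described for mergeInto can be sketched with stand-in types. This is an illustration of the pattern only, not the Tika implementation: a Map<String, String> stands in for Metadata, chunk labels (plain strings) stand in for Chunk, a newline-joined string stands in for the JSON array, and the field name "example:chunks" is hypothetical because this page does not state the real key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Tika.
final class MergeSketch {

    // Hypothetical field name; the real metadata key is not given on this page.
    static final String FIELD = "example:chunks";

    // Stand-in codec: one chunk label per line (the real class writes JSON).
    static String serialize(List<String> chunks) {
        return String.join("\n", chunks);
    }

    static List<String> deserialize(String stored) {
        if (stored == null || stored.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(stored.split("\n")));
    }

    // Read existing chunks, append the new ones, write the merged list back.
    static void mergeInto(Map<String, String> metadata, List<String> newChunks) {
        List<String> merged = deserialize(metadata.get(FIELD));
        merged.addAll(newChunks);
        metadata.put(FIELD, serialize(merged));
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        // Two independent components contribute to the same field.
        mergeInto(metadata, Arrays.asList("text-chunk-0", "text-chunk-1"));
        mergeInto(metadata, Arrays.asList("image-chunk-0"));
        System.out.println(deserialize(metadata.get(FIELD)));
    }
}
```

Because each call re-reads the stored value before writing, components can run in any order without overwriting each other's chunks; in this sketch, earlier contributions keep their position and later ones are appended.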
    • fromJson

      public static List<Chunk> fromJson(String json) throws IOException
      Deserializes a JSON array string back into a list of chunks.
      Throws:
      IOException