Package org.apache.tika.inference
Class ChunkSerializer
java.lang.Object
org.apache.tika.inference.ChunkSerializer
Serializes and deserializes a list of Chunk objects to/from JSON.
Vectors are stored as base64-encoded little-endian float32 via VectorSerializer. Locators are nested under a "locators" object with optional text, paginated, spatial, and temporal arrays.
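The following sketch illustrates the vector encoding described above. It packs a float[] into little-endian float32 bytes and base64-encodes them using only the JDK (java.nio.ByteBuffer and java.util.Base64). The class and method names are hypothetical. This is not the VectorSerializer API, only an assumed equivalent of the stated storage format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

/** Hypothetical helper illustrating the stored format; not the actual VectorSerializer API. */
public class VectorEncodingSketch {

    /** Writes each float as 4 little-endian bytes (IEEE 754 single precision), then base64-encodes the result. */
    public static String encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN); // ByteBuffer defaults to big-endian
        for (float v : vector) {
            buf.putFloat(v);
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        // 1.0f is 0x3F800000, so its little-endian bytes are 00 00 80 3F
        System.out.println(encode(new float[] {1.0f})); // prints AACAPw==
    }
}
```

The byte order must be set explicitly. A ByteBuffer left at its default order would produce big-endian output that a little-endian reader decodes to different values.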
Method Summary
Modifier and Type | Method | Description
 | fromJson | Deserialize a JSON array string back to a list of chunks.
static void | mergeInto | Reads any existing chunks from the metadata field, appends the new chunks, and writes the merged list back.
static String | toJson | Serialize chunks to a JSON array string.
Method Details
toJson
Serialize chunks to a JSON array string.
Throws:
IOException
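The class overview says locators are nested under a "locators" object whose text, paginated, spatial, and temporal arrays are optional. Here is a minimal sketch of how such an object could be emitted during serialization. It assumes that absent or empty arrays are simply omitted and that each array element is already a serialized JSON fragment. The class name, the method name, and the "page" field in the usage example are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of building the nested "locators" object; not Tika code. */
public class LocatorsJsonSketch {
    // Keys named in the ChunkSerializer overview, in a fixed order for stable output.
    private static final List<String> KEYS = List.of("text", "paginated", "spatial", "temporal");

    /**
     * Builds {"text":[...],"paginated":[...],...}, skipping any key whose array is
     * missing or empty. Items are assumed to be pre-serialized JSON values.
     */
    public static String locatorsJson(Map<String, List<String>> locators) {
        StringJoiner obj = new StringJoiner(",", "{", "}");
        for (String key : KEYS) {
            List<String> items = locators.get(key);
            if (items == null || items.isEmpty()) {
                continue; // optional array: leave the key out entirely
            }
            obj.add("\"" + key + "\":[" + String.join(",", items) + "]");
        }
        return obj.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> locators = new LinkedHashMap<>();
        locators.put("paginated", List.of("{\"page\":3}")); // hypothetical locator shape
        System.out.println(locatorsJson(locators)); // prints {"paginated":[{"page":3}]}
    }
}
```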
mergeInto
Reads any existing chunks from the metadata field, appends the new chunks, and writes the merged list back. This allows multiple components (text chunker, image embedder, etc.) to contribute to the same chunks array.
Parameters:
metadata - the metadata to read from and write to
newChunks - chunks to append
Throws:
IOException
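The read-append-write behaviour of mergeInto can be sketched generically. In this hypothetical version, a Map<String, String> stands in for the metadata object and the field name "chunks" is made up. A toy comma-separated codec also replaces the real JSON round trip through fromJson and toJson. None of these names are the Tika API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of read-append-write merging; not the Tika implementation. */
public class MergeSketch {

    /** Decodes whatever is stored under field, appends newItems after it, and stores the re-encoded list. */
    public static <T> void mergeInto(Map<String, String> metadata, String field,
                                     List<T> newItems,
                                     Function<String, List<T>> decode,
                                     Function<List<T>, String> encode) {
        List<T> merged = new ArrayList<>();
        String existing = metadata.get(field);
        if (existing != null && !existing.isEmpty()) {
            merged.addAll(decode.apply(existing)); // earlier contributions stay first
        }
        merged.addAll(newItems);
        metadata.put(field, encode.apply(merged));
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        // Toy codec: a comma-separated string stands in for the JSON array.
        Function<String, List<String>> decode = s -> new ArrayList<>(Arrays.asList(s.split(",")));
        Function<List<String>, String> encode = l -> String.join(",", l);

        mergeInto(metadata, "chunks", List.of("text-1", "text-2"), decode, encode);
        mergeInto(metadata, "chunks", List.of("image-1"), decode, encode);
        System.out.println(metadata.get("chunks")); // prints text-1,text-2,image-1
    }
}
```

Earlier contributions are decoded and placed first. So when a text chunker and an image embedder run one after the other, both sets of chunks end up in the same list, in call order.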
fromJson
Deserialize a JSON array string back to a list of chunks.
Throws:
IOException
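Deserialization has to reverse the vector encoding from the class overview. Here is a sketch of that step, assuming the same base64 little-endian float32 layout. The class and method names are hypothetical and not part of the Tika API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

/** Hypothetical helper; not the actual VectorSerializer API. */
public class VectorDecodingSketch {

    /** Base64-decodes the string and reads consecutive little-endian float32 values. */
    public static float[] decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        if (bytes.length % Float.BYTES != 0) {
            // A valid vector is always a whole number of 4-byte floats.
            throw new IllegalArgumentException("byte length not a multiple of 4: " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] v = decode("AACAPw=="); // bytes 00 00 80 3F
        System.out.println(v.length + " " + v[0]); // prints 1 1.0
    }
}
```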