Class Chunk

java.lang.Object
org.apache.tika.inference.Chunk

public class Chunk extends Object
A content chunk with multimodal locators and an optional embedding vector.

The text field holds the chunk's textual content. It may be null for non-text chunks such as image regions or audio segments.

The Locators object identifies where this chunk comes from in the original content across multiple modalities (text offsets, page/bbox, spatial regions, temporal ranges).

  • Constructor Details

    • Chunk

      public Chunk(String text, Locators locators)
    • Chunk

      public Chunk(String text, int startOffset, int endOffset)
      Convenience constructor for text-only chunks with character offsets.
  • Method Details

    • getText

      public String getText()
    • getLocators

      public Locators getLocators()
    • getStartOffset

      public int getStartOffset()
      Convenience accessor that returns the start offset from the first TextLocator, or -1 if the chunk has no TextLocator.
    • getEndOffset

      public int getEndOffset()
      Convenience accessor that returns the end offset from the first TextLocator, or -1 if the chunk has no TextLocator.
    • getVector

      public float[] getVector()
    • setVector

      public void setVector(float[] vector)
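
Usage sketch. The following is a minimal, hedged illustration of how the text-only constructor, the offset accessors, and setVector might be used together. So that it runs without Tika on the classpath, it declares a small stand-in Chunk class that mirrors only the signatures documented above; it is not the Tika implementation. The split and toyEmbed helpers, the treatment of the end offset as exclusive, and the two-dimensional "embedding" are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkUsageSketch {

    // Stand-in that mirrors only the documented signatures; NOT the Tika source.
    // With Tika available, org.apache.tika.inference.Chunk would be used instead.
    public static final class Chunk {
        private final String text;
        private final int startOffset;
        private final int endOffset;
        private float[] vector; // stays unset until setVector is called

        public Chunk(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        public String getText() { return text; }
        public int getStartOffset() { return startOffset; }
        public int getEndOffset() { return endOffset; }
        public float[] getVector() { return vector; }
        public void setVector(float[] vector) { this.vector = vector; }
    }

    // Hypothetical helper: cuts content into fixed-size windows.
    // Assumption: the end offset is exclusive (the page above does not say).
    public static List<Chunk> split(String content, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (int start = 0; start < content.length(); start += size) {
            int end = Math.min(start + size, content.length());
            chunks.add(new Chunk(content.substring(start, end), start, end));
        }
        return chunks;
    }

    // Hypothetical toy "embedding": [character count, vowel count].
    // A real pipeline would call an embedding model here.
    public static float[] toyEmbed(String text) {
        int vowels = 0;
        for (char c : text.toCharArray()) {
            if ("aeiouAEIOU".indexOf(c) >= 0) {
                vowels++;
            }
        }
        return new float[] { text.length(), vowels };
    }

    public static void main(String[] args) {
        String content = "Apache Tika extracts text";
        for (Chunk chunk : split(content, 10)) {
            // Attach the vector after construction, as setVector suggests.
            chunk.setVector(toyEmbed(chunk.getText()));
            System.out.printf("[%d,%d) \"%s\" dims=%d%n",
                    chunk.getStartOffset(), chunk.getEndOffset(),
                    chunk.getText(), chunk.getVector().length);
        }
    }
}
```

Keeping the offsets on the chunk lets a caller map a vector match back to its position in the original content. For chunks built with a Locators object rather than character offsets, the documented -1 return value should be checked before using the offsets.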