Package org.apache.tika.inference
Class Chunk
java.lang.Object
org.apache.tika.inference.Chunk
A content chunk with multimodal locators and an optional embedding vector.
The text field holds the textual content of the chunk (may be
null for non-text chunks such as image regions or audio segments).
The Locators object identifies where this chunk comes from in
the original content across multiple modalities (text offsets, page/bbox,
spatial regions, temporal ranges).
Constructor Summary
Constructors
Method Summary
Modifier and Type    Method                      Description
int                  getEndOffset()              Convenience: returns the end offset from the first TextLocator, or -1 if none.
                     getLocators()
int                  getStartOffset()            Convenience: returns the start offset from the first TextLocator, or -1 if none.
                     getText()
float[]              getVector()
void                 setVector(float[] vector)
Constructor Details
Chunk
Chunk
Convenience constructor for text-only chunks with character offsets.
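The parameter list of this constructor is not shown on this page. As a rough illustration only, the following stand-in class mirrors the documented accessors and assumes a hypothetical `(String text, int startOffset, int endOffset)` signature. `ChunkSketch` is not part of Tika, and it stores the offsets directly instead of modelling the Locators object.

```java
/** Stand-in for illustration only; not the real org.apache.tika.inference.Chunk. */
public class ChunkSketch {
    private final String text;     // may be null for non-text chunks
    private final int startOffset; // -1 would mean "no text locator"
    private final int endOffset;
    private float[] vector;        // optional embedding, null until set

    // Hypothetical signature: the real parameter list is not shown in this page.
    public ChunkSketch(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    public String getText() { return text; }
    public int getStartOffset() { return startOffset; }
    public int getEndOffset() { return endOffset; }
    public float[] getVector() { return vector; }
    public void setVector(float[] vector) { this.vector = vector; }

    public static void main(String[] args) {
        ChunkSketch chunk = new ChunkSketch("Tika", 7, 11);
        System.out.println(chunk.getText() + " [" + chunk.getStartOffset()
                + ", " + chunk.getEndOffset() + ")");
        // The embedding is attached later, once it has been computed.
        chunk.setVector(new float[] {0.12f, -0.40f, 0.88f});
        System.out.println("dimensions: " + chunk.getVector().length);
    }
}
```

Check the actual constructor signature in the class source before relying on the argument order assumed here.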
Method Details
getText
getLocators
getStartOffset
public int getStartOffset()
Convenience: returns the start offset from the first TextLocator, or -1 if none.
getEndOffset
public int getEndOffset()
Convenience: returns the end offset from the first TextLocator, or -1 if none.
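Because both offset getters return -1 when the chunk has no TextLocator, callers should check for the sentinel before slicing the source text. A minimal sketch, assuming the offsets are character indices into the original string and that the end offset is exclusive (neither is stated on this page). `OffsetSketch` and `slice` are hypothetical names, and the method takes plain ints so that it runs without Tika on the classpath.

```java
public class OffsetSketch {

    /**
     * Returns the part of source covered by [start, end), or null when either
     * offset is the -1 sentinel or the range does not fit the source.
     * Assumption: offsets are char indices and end is exclusive.
     */
    public static String slice(String source, int start, int end) {
        if (start < 0 || end < 0) {
            return null; // no TextLocator: nothing to slice
        }
        if (start > end || end > source.length()) {
            return null; // offsets do not match this source text
        }
        return source.substring(start, end);
    }

    public static void main(String[] args) {
        String source = "Apache Tika extracts text.";
        // With a real Chunk, the two ints would come from
        // chunk.getStartOffset() and chunk.getEndOffset().
        System.out.println(slice(source, 7, 11));  // prints "Tika"
        System.out.println(slice(source, -1, -1)); // prints "null"
    }
}
```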
getVector
public float[] getVector()
setVector
public void setVector(float[] vector)
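The embedding is described as optional, so it is reasonable to assume (though not confirmed here) that getVector() can return null until setVector has been called. One common use of such vectors is comparing chunks by cosine similarity; the sketch below shows that with a null guard. `VectorSketch` and `cosine` are hypothetical helpers that operate on plain float[] arrays, not part of the Tika API.

```java
public class VectorSketch {

    /**
     * Cosine similarity of two embedding vectors.
     * Returns NaN when either vector is null, empty, all zeros,
     * or when the lengths differ.
     */
    public static double cosine(float[] a, float[] b) {
        if (a == null || b == null || a.length == 0 || a.length != b.length) {
            return Double.NaN;
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return Double.NaN;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // With real chunks these arrays would come from getVector().
        float[] first = {1f, 0f};
        float[] second = {0f, 1f};
        System.out.println(cosine(first, first));
        System.out.println(cosine(first, second));
        System.out.println(cosine(first, null));
    }
}
```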