org.apache.tika.ml.LinearModel

public class LinearModel extends Object

INT8-quantized multinomial logistic regression model for classification.

Binary format (big-endian, magic "LDM1"):

   Offset  Field
   0       4B magic: 0x4C444D31 ("LDM1")
   4       4B version: 1 or 2
   8       4B numBuckets (B)
   12      4B numClasses (C)
   16+     Labels:  C entries of [2B length + UTF-8 bytes]
           Scales:  C × 4B float (per-class dequantization)
           Biases:  C × 4B float (per-class bias term)
           V2 only:
             1B hasCalibration flag
             If hasCalibration: ClassMean: C × 4B float, then ClassStd: C × 4B float
           Weights: B × C bytes (bucket-major, INT8 signed)
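As a sketch only, the fixed header and label table could be read with java.io.DataInputStream, whose readInt and readUnsignedShort are big-endian like the format. The class LdmHeaderSketch and its fields are hypothetical (not part of LinearModel), and the 2-byte label length is assumed to be unsigned.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: reads the fixed LDM1 header and the label table. */
final class LdmHeaderSketch {
    static final int MAGIC = 0x4C444D31; // ASCII "LDM1"

    final int version;
    final int numBuckets;
    final int numClasses;
    final String[] labels;

    private LdmHeaderSketch(int version, int numBuckets, int numClasses, String[] labels) {
        this.version = version;
        this.numBuckets = numBuckets;
        this.numClasses = numClasses;
        this.labels = labels;
    }

    static LdmHeaderSketch read(InputStream in) throws IOException {
        // DataInputStream decodes multi-byte values as big-endian, matching the format.
        DataInputStream din = new DataInputStream(in);
        int magic = din.readInt();
        if (magic != MAGIC) {
            throw new IOException("bad magic: 0x" + Integer.toHexString(magic));
        }
        int version = din.readInt();
        if (version != 1 && version != 2) {
            throw new IOException("unsupported version: " + version);
        }
        int numBuckets = din.readInt();
        int numClasses = din.readInt();
        String[] labels = new String[numClasses];
        for (int c = 0; c < numClasses; c++) {
            int len = din.readUnsignedShort(); // assumed: unsigned 16-bit byte count
            byte[] utf8 = new byte[len];
            din.readFully(utf8);
            labels[c] = new String(utf8, StandardCharsets.UTF_8);
        }
        return new LdmHeaderSketch(version, numBuckets, numClasses, labels);
    }
}
```

A real reader would continue with the scales, biases, optional calibration block, and weights in the order listed above.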

Weights are stored in bucket-major order: weights[bucket * numClasses + class]. This layout suits the sparse dot product in predict(int[]): each non-zero bucket reads a contiguous run of numClasses bytes, which favors SIMD and cache prefetching.
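A minimal sketch of a sparse dot product over this flat layout, assuming each logit is dequantized as biases[c] + scales[c] * Σ features[b] * weights[b * numClasses + c]. The class SparseDotSketch is hypothetical, and the per-bucket clipping that predictLogits(int[]) applies is deliberately left out because its threshold is not documented here.

```java
/**
 * Hypothetical sketch of a sparse dot product over bucket-major INT8 weights.
 * Assumes logit[c] = bias[c] + scale[c] * sum_b(features[b] * w[b * C + c]);
 * per-bucket clipping is omitted.
 */
final class SparseDotSketch {
    static float[] logits(int[] features, byte[] flatWeights, int numClasses,
                          float[] scales, float[] biases) {
        // Accumulate integer products first; dequantize once per class at the end.
        long[] acc = new long[numClasses];
        for (int b = 0; b < features.length; b++) {
            int count = features[b];
            if (count == 0) {
                continue; // sparse: empty buckets are never touched
            }
            int base = b * numClasses; // start of this bucket's contiguous run
            for (int c = 0; c < numClasses; c++) {
                acc[c] += (long) count * flatWeights[base + c]; // Java byte is signed INT8
            }
        }
        float[] out = new float[numClasses];
        for (int c = 0; c < numClasses; c++) {
            out[c] = biases[c] + scales[c] * acc[c];
        }
        return out;
    }
}
```

Only non-zero buckets cost anything, and each one reads numClasses adjacent bytes, which is the access pattern the bucket-major layout is meant to serve.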

Calibration (V2): optional per-class mean/std of training-set logits. When present, predictCalibratedLogits(int[]) standardizes raw logits so cross-specialist pooling can compare "unusually confident" signals on equal footing. V1 files are still readable; calibration is absent and predictCalibratedLogits(int[]) falls back to raw logits.

Field Summary

Fields

   Modifier and Type   Field        Description
   static final int    MAGIC
   static final int    VERSION      Latest version we emit.
   static final int    VERSION_V1
   static final int    VERSION_V2

Constructor Summary

Constructors

   Constructor
       Description

   LinearModel(int numBuckets, int numClasses, String[] labels, float[] scales, float[] biases, byte[][] weights)
       Construct without calibration (V1-compatible).
   LinearModel(int numBuckets, int numClasses, String[] labels, float[] scales, float[] biases, byte[][] weights, float[] classMean, float[] classStd)
       Construct with optional calibration.

Method Summary

   Modifier and Type    Method
                            Description

   static float         entropy(float[] probs)
                            Shannon entropy (in bits) of a probability distribution.
   float[]              getBiases()
   float[]              getClassMean()
   float[]              getClassStd()
   String               getLabel(int classIndex)
   String[]             getLabels()
   int                  getNumBuckets()
   int                  getNumClasses()
   float[]              getScales()
   byte[][]             getWeights()
                            Return weights in class-major [class][bucket] layout.
   boolean              hasCalibration()
                            true if this model carries per-class calibration statistics.
   static LinearModel   load(InputStream is)
                            Load a model from an input stream.
   static LinearModel   loadFromClasspath(String resourcePath)
                            Load a model from the classpath.
   static LinearModel   loadFromPath(Path path)
                            Load a model from a file on disk.
   float[]              predict(int[] features)
                            Compute softmax probabilities for the given feature vector.
   float[]              predictCalibratedLogits(int[] features)
                            Compute calibrated logits: (raw - classMean[c]) / classStd[c] for each class, if the model carries calibration statistics, else raw logits (no-op).
   float[]              predictLogits(int[] features)
                            Compute raw logits for the given feature vector (before softmax).
   float[]              predictLogitsDense(float[] features)
                            Compute logits for a dense float feature vector.
   void                 save(OutputStream os)
                            Write the model in LDM binary format.
   static float[]       softmax(float[] logits)
                            In-place softmax with numerical stability.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- MAGIC
  
  public static final int MAGIC
  
  Magic number at offset 0 of the (uncompressed) binary format: 0x4C444D31, the ASCII bytes "LDM1".
  See Also:
  
  Constant Field Values
- VERSION_V1
  
  public static final int VERSION_V1
  
  Format version 1: no hasCalibration flag and no calibration block.
  See Also:
  
  Constant Field Values
- VERSION_V2
  
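  public static final int VERSION_V2
  
  Format version 2: adds the 1-byte hasCalibration flag and the optional per-class calibration block.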
  See Also:
  
  Constant Field Values
- VERSION
  
  public static final int VERSION
  
  Latest version we emit.
  See Also:
  
  Constant Field Values
Constructor Details
- LinearModel
  
  public LinearModel(int numBuckets, int numClasses, String[] labels, float[] scales, float[] biases, byte[][] weights)
  
  Construct without calibration (V1-compatible). The weights argument is class-major, weights[class][bucket] (numClasses rows of numBuckets bytes); it is transposed internally to the flat bucket-major layout.
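One way the class-major-to-bucket-major transposition could look, written as a hypothetical helper (TransposeSketch is not a LinearModel method, and the length checks are an assumption about input validation):

```java
/** Hypothetical sketch: flatten class-major weights[class][bucket] into bucket-major order. */
final class TransposeSketch {
    static byte[] toBucketMajor(byte[][] classMajor, int numBuckets, int numClasses) {
        if (classMajor.length != numClasses) {
            throw new IllegalArgumentException("expected " + numClasses + " class rows");
        }
        byte[] flat = new byte[numBuckets * numClasses];
        for (int c = 0; c < numClasses; c++) {
            if (classMajor[c].length != numBuckets) {
                throw new IllegalArgumentException("row " + c + " must have " + numBuckets + " buckets");
            }
            for (int b = 0; b < numBuckets; b++) {
                // Element (class c, bucket b) lands at bucket-major index b * C + c.
                flat[b * numClasses + c] = classMajor[c][b];
            }
        }
        return flat;
    }
}
```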
- LinearModel
  
  public LinearModel(int numBuckets, int numClasses, String[] labels, float[] scales, float[] biases, byte[][] weights, float[] classMean, float[] classStd)
  
  Construct with optional calibration. Pass classMean and classStd (each of length numClasses) to enable z-score calibration in predictCalibratedLogits(int[]); pass null for both to skip. Any classStd[c] == 0 is rewritten to 1.0f to avoid divide-by-zero.
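The null and zero-std rules above might be handled as in this hypothetical sketch (CalibrationArgsSketch is not part of LinearModel). Throwing on a half-null or wrong-length pair, and copying classStd instead of modifying it, are assumptions; the documentation only specifies passing null for both and replacing zero entries with 1.0f.

```java
/** Hypothetical sketch of the constructor's calibration-argument handling. */
final class CalibrationArgsSketch {
    /** Returns a sanitized copy of classStd, or null when calibration is disabled. */
    static float[] sanitizeStd(float[] classMean, float[] classStd, int numClasses) {
        if (classMean == null && classStd == null) {
            return null; // no calibration requested
        }
        if (classMean == null || classStd == null
                || classMean.length != numClasses || classStd.length != numClasses) {
            throw new IllegalArgumentException(
                    "classMean and classStd must both be null or both have length numClasses");
        }
        float[] std = classStd.clone(); // assumption: caller's array is left untouched
        for (int c = 0; c < numClasses; c++) {
            if (std[c] == 0f) {
                std[c] = 1.0f; // avoid divide-by-zero in the z-score
            }
        }
        return std;
    }
}
```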
Method Details
- loadFromClasspath
  
  public static LinearModel loadFromClasspath(String resourcePath) throws IOException
  
  Load a model from the classpath. Transparently handles both plain LDM1 binaries and gzip-compressed LDM1 binaries (detected by magic bytes).
  
  Throws:
  
  IOException
- loadFromPath
  
  public static LinearModel loadFromPath(Path path) throws IOException
  
  Load a model from a file on disk. Transparently handles both plain and gzip-compressed LDM1 files.
  
  Throws:
  
  IOException
- load
  
  public static LinearModel load(InputStream is) throws IOException
  
  Load a model from an input stream. Transparently handles both plain LDM1 binaries and gzip-compressed ones: if the first two bytes are the gzip magic 0x1F 0x8B the stream is wrapped in a GZIPInputStream before reading.
  
  Throws:
  
  IOException
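The gzip detection described for load(InputStream) can be sketched with mark/reset on a java.io.BufferedInputStream, so the two sniffed bytes are not consumed before the real parse. GzipSniffSketch is a hypothetical name; BufferedInputStream and java.util.zip.GZIPInputStream are standard JDK classes.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch: wrap the stream in GZIPInputStream only if it starts with 0x1F 0x8B. */
final class GzipSniffSketch {
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);          // remember the start; we peek at most 2 bytes
        int b0 = in.read();  // -1 at end of stream, which simply fails the check
        int b1 = in.read();
        in.reset();          // rewind so the caller sees the stream from byte 0
        if (b0 == 0x1F && b1 == 0x8B) {
            return new GZIPInputStream(in);
        }
        return in;
    }
}
```

Either way the returned stream starts at the LDM1 magic, so one parser handles both plain and compressed files.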
- save
  
  public void save(OutputStream os) throws IOException
  
  Write the model in LDM1 binary format. Always emits V2: the hasCalibration flag is always written, and the ClassMean/ClassStd block follows it only when this model has calibration.
  
  Throws:
  
  IOException
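A hypothetical writer for the V2 layout, following the field order in the binary format table. LdmV2WriterSketch is not the real save implementation; it takes weights that are already bucket-major, and the unsigned 16-bit label length and the flush-without-close behavior (the caller owns the stream) are assumptions.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a V2 LDM1 writer; calibration is written only if both arrays are non-null. */
final class LdmV2WriterSketch {
    static void write(OutputStream os, String[] labels, float[] scales, float[] biases,
                      float[] classMean, float[] classStd,
                      byte[] bucketMajor, int numBuckets) throws IOException {
        int numClasses = labels.length;
        DataOutputStream out = new DataOutputStream(os); // big-endian, like the format
        out.writeInt(0x4C444D31); // magic "LDM1"
        out.writeInt(2);          // version 2
        out.writeInt(numBuckets);
        out.writeInt(numClasses);
        for (String label : labels) {
            byte[] utf8 = label.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > 0xFFFF) {
                throw new IOException("label too long for a 2-byte length: " + label);
            }
            out.writeShort(utf8.length); // assumed unsigned 16-bit length prefix
            out.write(utf8);
        }
        for (float s : scales) out.writeFloat(s);
        for (float b : biases) out.writeFloat(b);
        boolean hasCalibration = classMean != null && classStd != null;
        out.writeByte(hasCalibration ? 1 : 0); // V2 always carries the flag
        if (hasCalibration) {
            for (float m : classMean) out.writeFloat(m);
            for (float s : classStd) out.writeFloat(s);
        }
        out.write(bucketMajor, 0, numBuckets * numClasses); // B x C signed bytes
        out.flush(); // flush but do not close: the caller owns os
    }
}
```

With two labels "a" and "bc", two classes, and three buckets, the output is 16 (header) + 7 (labels) + 16 (scales and biases) + 1 (flag) + 6 (weights) = 46 bytes, or 62 bytes with calibration.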
- predictLogits
  
  public float[] predictLogits(int[] features)
  
  Compute raw logits for the given feature vector (before softmax). Uses a sparse inner loop — only non-zero buckets are visited.
  
  Parameters:
  
  features - int array of size numBuckets
  
  Returns:
  
  float array of size numClasses (raw, unnormalized logits)
- predictLogitsDense
  
  public float[] predictLogitsDense(float[] features)
  
  Compute logits for a dense float feature vector. Unlike predictLogits(int[]), which assumes sparse integer counts and applies per-bucket clipping to suppress single-feature dominance in hashed representations, this method performs a plain dot product. That suits adjudicator / meta-model feature vectors, where each slot is already a calibrated quantity (specialist logit, z-score, one-hot flag, etc.).
  
  Parameters:
  
  features - float array of length numBuckets
  
  Returns:
  
  float array of length numClasses (raw logits)
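For contrast with the sparse path, a plain dense dot product might look like this. DenseDotSketch is hypothetical, and it assumes the same per-class dequantization (biases[c] + scales[c] * Σ features[b] * weights[b * numClasses + c]) as the integer path, which the documentation does not state explicitly.

```java
/** Hypothetical sketch: dense dot product, no sparsity skip and no clipping. */
final class DenseDotSketch {
    static float[] logits(float[] features, byte[] flatWeights, int numClasses,
                          float[] scales, float[] biases) {
        float[] acc = new float[numClasses];
        for (int b = 0; b < features.length; b++) {
            float x = features[b]; // used as-is: slots are already calibrated quantities
            int base = b * numClasses;
            for (int c = 0; c < numClasses; c++) {
                acc[c] += x * flatWeights[base + c];
            }
        }
        float[] out = new float[numClasses];
        for (int c = 0; c < numClasses; c++) {
            out[c] = biases[c] + scales[c] * acc[c];
        }
        return out;
    }
}
```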
- predict
  
  public float[] predict(int[] features)
  
  Compute softmax probabilities for the given feature vector.
  
  Parameters:
  
  features - int array of size numBuckets
  
  Returns:
  
  float array of size numClasses (softmax probabilities, sum ≈ 1.0)
- predictCalibratedLogits
  
  public float[] predictCalibratedLogits(int[] features)
  
  Compute calibrated logits: (raw - classMean[c]) / classStd[c] for each class, if the model carries calibration statistics, else raw logits (no-op). Calibrated logits are comparable across specialists with different natural logit scales — they express "how many standard deviations above this class's training-set mean" rather than raw weight arithmetic.
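The z-score step itself is small. This hypothetical sketch (CalibratedLogitsSketch is not part of LinearModel) applies it to an already computed raw logit vector and passes the raw logits through unchanged when no calibration is present, mirroring the no-op fallback described above.

```java
/** Hypothetical sketch: per-class z-score of raw logits. */
final class CalibratedLogitsSketch {
    static float[] calibrate(float[] raw, float[] classMean, float[] classStd) {
        if (classMean == null || classStd == null) {
            return raw; // no calibration statistics: raw logits pass through
        }
        float[] z = new float[raw.length];
        for (int c = 0; c < raw.length; c++) {
            // How many training-set standard deviations this logit sits above the class mean.
            z[c] = (raw[c] - classMean[c]) / classStd[c];
        }
        return z;
    }
}
```

For example, a raw logit of 3.0 for a class with mean 1.0 and std 2.0 becomes 1.0, i.e. one standard deviation above that class's usual score.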
- hasCalibration
  
  public boolean hasCalibration()
  
  true if this model carries per-class calibration statistics.
- getClassMean
  
  public float[] getClassMean()
- getClassStd
  
  public float[] getClassStd()
- softmax
  
  public static float[] softmax(float[] logits)
  
  In-place softmax with numerical stability.
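A common way to get that numerical stability is to subtract the maximum logit before exponentiating, so every exponent is at most 0 and cannot overflow. The sketch below assumes that technique; SoftmaxSketch is hypothetical and may differ from the real implementation in details such as accumulating the sum in double.

```java
/** Hypothetical sketch: max-shifted softmax that overwrites and returns its input array. */
final class SoftmaxSketch {
    static float[] softmaxInPlace(float[] logits) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : logits) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Shifting by max leaves the result unchanged but keeps exp() in range.
            logits[i] = (float) Math.exp(logits[i] - max);
            sum += logits[i];
        }
        for (int i = 0; i < logits.length; i++) {
            logits[i] /= sum;
        }
        return logits;
    }
}
```

Without the shift, logits such as {1000, 1000} would overflow exp() to infinity and yield NaN; with it, they yield {0.5, 0.5}.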
- entropy
  
  public static float entropy(float[] probs)
  
  Shannon entropy (in bits) of a probability distribution.
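A hypothetical sketch of Shannon entropy in bits, H = -Σ p log2(p), treating 0 · log 0 as 0 (EntropySketch is not part of LinearModel):

```java
/** Hypothetical sketch: Shannon entropy in bits of a probability vector. */
final class EntropySketch {
    static float entropyBits(float[] probs) {
        double h = 0.0;
        for (float p : probs) {
            if (p > 0f) { // zero-probability entries contribute nothing
                h -= p * (Math.log(p) / Math.log(2.0)); // log2(p) via change of base
            }
        }
        return (float) h;
    }
}
```

A uniform distribution over 2 classes has 1 bit of entropy and over 4 classes 2 bits; a one-hot distribution has 0, so low entropy indicates a confident prediction.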
- getNumBuckets
  
  public int getNumBuckets()
- getNumClasses
  
  public int getNumClasses()
- getLabels
  
  public String[] getLabels()
- getLabel
  
  public String getLabel(int classIndex)
- getScales
  
  public float[] getScales()
- getBiases
  
  public float[] getBiases()
- getWeights
  
  public byte[][] getWeights()
  
  Return weights in class-major [class][bucket] layout. Creates a new array each call.
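The inverse of the constructor's transposition could be sketched as follows, returning fresh class-major arrays as getWeights() is documented to do. ClassMajorSketch is hypothetical and assumes the flat bucket-major array described in the class overview.

```java
/** Hypothetical sketch: rebuild class-major [class][bucket] arrays from bucket-major storage. */
final class ClassMajorSketch {
    static byte[][] toClassMajor(byte[] flat, int numBuckets, int numClasses) {
        byte[][] out = new byte[numClasses][numBuckets]; // newly allocated on every call
        for (int b = 0; b < numBuckets; b++) {
            int base = b * numClasses;
            for (int c = 0; c < numClasses; c++) {
                out[c][b] = flat[base + c];
            }
        }
        return out;
    }
}
```

Allocating a copy keeps callers from mutating the model's internal weights, at the cost of an O(B × C) copy per call.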
