Interface FeatureExtractor<T>

Type Parameters:
T - the raw input type (e.g. String for text, byte[] for raw bytes)
All Known Implementing Classes:
Utf16ColumnFeatureExtractor

public interface FeatureExtractor<T>
Generic feature extractor that maps an input of type T to a fixed-length integer feature vector suitable for a LinearModel.
  • Method Summary

    Modifier and Type
    Method
    Description
    int[]
    extract(T input)
    Extracts features from the given input.
    default int
    extractSparseInto(T input, int[] dense, int[] touched)
    Sparse extraction into caller-owned reusable buffers: populates dense with feature counts, writes the indices of non-zero entries into touched, and returns how many indices were written.
    int
    getNumBuckets()
    Returns the number of hash buckets (feature-vector dimension).
  • Method Details

    • extract

      int[] extract(T input)
      Extracts features from the given input.
      Parameters:
      input - raw input (may be null)
      Returns:
      int array of length getNumBuckets() with feature counts
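      As an illustration of the contract above, the following is a minimal sketch of one possible implementation. It is not taken from this library: the class names HashedExtractorSketch and ByteBigramExtractor, the bigram hashing scheme, and the choice to return an all-zero vector for null input are all assumptions made for the example. The interface is redeclared locally so the snippet compiles on its own.

```java
import java.nio.charset.StandardCharsets;

public class HashedExtractorSketch {

    // Local stand-in mirroring the two abstract methods documented on this page.
    interface FeatureExtractor<T> {
        int[] extract(T input);
        int getNumBuckets();
    }

    // Hypothetical extractor: hashes each pair of adjacent UTF-8 bytes into a bucket.
    static final class ByteBigramExtractor implements FeatureExtractor<String> {
        private final int numBuckets;

        ByteBigramExtractor(int numBuckets) {
            if (numBuckets <= 0) {
                throw new IllegalArgumentException("numBuckets must be positive");
            }
            this.numBuckets = numBuckets;
        }

        @Override
        public int[] extract(String input) {
            // Always return a vector of length getNumBuckets().
            int[] counts = new int[numBuckets];
            if (input == null) {
                return counts; // assumption: null is treated like empty input
            }
            byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
            for (int i = 0; i + 1 < bytes.length; i++) {
                // Illustrative hash only; a real extractor may use a different one.
                int hash = 31 * (bytes[i] & 0xFF) + (bytes[i + 1] & 0xFF);
                counts[Math.floorMod(hash, numBuckets)]++;
            }
            return counts;
        }

        @Override
        public int getNumBuckets() {
            return numBuckets;
        }
    }

    public static void main(String[] args) {
        FeatureExtractor<String> fx = new ByteBigramExtractor(64);
        int[] v = fx.extract("hello");
        int total = 0;
        for (int c : v) {
            total += c;
        }
        System.out.println(v.length + " buckets, " + total + " bigrams counted");
    }
}
```

      Whatever hashing scheme is used, the returned array keeps the same length for every input, which is what lets a linear model index its weights by bucket.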
    • getNumBuckets

      int getNumBuckets()
      Returns:
      number of hash buckets (feature-vector dimension)
    • extractSparseInto

      default int extractSparseInto(T input, int[] dense, int[] touched)
      Sparse extraction into caller-owned reusable buffers: populates dense with feature counts, writes the indices of non-zero entries into touched, and returns how many indices were written. Callers are responsible for resetting the touched entries of dense to zero before reusing the buffers.

      The default implementation delegates to extract(T). Extractors that can do better (for example, by avoiding allocation of the full dense vector, or by scanning the input only once) should override it.
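      The buffer-reuse pattern described for extractSparseInto can be sketched as follows. This is a guess at one plausible shape for the default method, not the library's actual body: it assumes dense is all zeros on entry and that both buffers have at least getNumBuckets() elements. The names SparseExtractSketch, CharBucketExtractor, and scoreAndClear, and the use of an int[] of weights in place of a LinearModel, are hypothetical.

```java
public class SparseExtractSketch {

    // Local stand-in for the documented interface.
    interface FeatureExtractor<T> {
        int[] extract(T input);
        int getNumBuckets();

        // One plausible default: delegate to extract, then copy the non-zero counts.
        // Assumes dense is zeroed on entry and both buffers are large enough.
        default int extractSparseInto(T input, int[] dense, int[] touched) {
            int[] full = extract(input);
            int n = 0;
            for (int i = 0; i < full.length; i++) {
                if (full[i] != 0) {
                    dense[i] = full[i];
                    touched[n++] = i;
                }
            }
            return n;
        }
    }

    // Hypothetical toy extractor: each char lands in bucket (char code mod numBuckets).
    static final class CharBucketExtractor implements FeatureExtractor<String> {
        private final int numBuckets;

        CharBucketExtractor(int numBuckets) {
            this.numBuckets = numBuckets;
        }

        @Override
        public int[] extract(String input) {
            int[] counts = new int[numBuckets];
            if (input == null) {
                return counts; // assumption: null behaves like empty input
            }
            for (int i = 0; i < input.length(); i++) {
                counts[input.charAt(i) % numBuckets]++;
            }
            return counts;
        }

        @Override
        public int getNumBuckets() {
            return numBuckets;
        }
    }

    // Scores one input against the weights, then zeroes only the touched entries
    // so the same buffers can be passed to the next call.
    static long scoreAndClear(FeatureExtractor<String> fx, String input,
                              int[] weights, int[] dense, int[] touched) {
        int n = fx.extractSparseInto(input, dense, touched);
        long score = 0;
        for (int k = 0; k < n; k++) {
            int idx = touched[k];
            score += (long) weights[idx] * dense[idx];
            dense[idx] = 0; // caller-side clearing, as the contract requires
        }
        return score;
    }

    public static void main(String[] args) {
        FeatureExtractor<String> fx = new CharBucketExtractor(8);
        int[] weights = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] dense = new int[fx.getNumBuckets()];   // allocated once, reused
        int[] touched = new int[fx.getNumBuckets()];
        for (String s : new String[] {"aab", "", "abc"}) {
            System.out.println("'" + s + "' -> " + scoreAndClear(fx, s, weights, dense, touched));
        }
    }
}
```

      Clearing only the touched indices keeps the per-input cost proportional to the number of non-zero features rather than to the full vector dimension, which is the point of reusing the buffers.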