Class EncodingDetectorContext

java.lang.Object
org.apache.tika.detect.EncodingDetectorContext

public class EncodingDetectorContext extends Object
Context object that collects encoding detection results from base detectors. CompositeEncodingDetector stores it in the ParseContext so that a MetaEncodingDetector can see all candidates and arbitrate among them, and removes it after detection to prevent contamination during recursive parsing.

Each base detector contributes a ranked List of EncodingResults. The context exposes the top result from each detector as the primary signal, and provides access to all candidates for richer arbitration strategies.

Since:
Apache Tika 3.2
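The bookkeeping described above can be sketched as a small standalone Java model. This is a hypothetical illustration, not Tika's implementation. Candidate stands in for EncodingResult, and DetectorContextSketch, DetectorResult and topCharsets() are names invented for this sketch. The model records each detector's ranked list, rejects empty lists (mirroring the addResult contract), and exposes each detector's top charset as the primary signal.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified model of the context; not Tika's EncodingDetectorContext. */
final class DetectorContextSketch {

    /** Stand-in for EncodingResult: a charset plus the detector's confidence in it. */
    static final class Candidate {
        final Charset charset;
        final float confidence;

        Candidate(Charset charset, float confidence) {
            this.charset = charset;
            this.confidence = confidence;
        }
    }

    /** The ranked candidates contributed by one named detector. */
    static final class DetectorResult {
        final String detectorName;
        final List<Candidate> ranked;

        DetectorResult(String detectorName, List<Candidate> ranked) {
            this.detectorName = detectorName;
            this.ranked = List.copyOf(ranked); // defensive, immutable copy
        }

        /** The detector's highest-ranked candidate, used as its primary signal. */
        Candidate top() {
            return ranked.get(0);
        }
    }

    private final List<DetectorResult> results = new ArrayList<>();

    /** Records one detector's ranked list (highest confidence first); rejects empty lists. */
    void addResult(List<Candidate> ranked, String detectorName) {
        if (ranked == null || ranked.isEmpty()) {
            throw new IllegalArgumentException("ranked results must not be empty");
        }
        results.add(new DetectorResult(detectorName, ranked));
    }

    /** All per-detector results as an unmodifiable view, in detection order. */
    List<DetectorResult> getResults() {
        return Collections.unmodifiableList(results);
    }

    /** The top charset from each detector, in detection order. */
    List<Charset> topCharsets() {
        List<Charset> tops = new ArrayList<>();
        for (DetectorResult r : results) {
            tops.add(r.top().charset);
        }
        return tops;
    }
}
```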
  • Constructor Details

    • EncodingDetectorContext

      public EncodingDetectorContext()
  • Method Details

    • addResult

      public void addResult(List<EncodingResult> encodingResults, String detectorName)
      Records the ranked results from a base (child) detector.
      Parameters:
      encodingResults - ranked results, highest confidence first; must not be empty
      detectorName - simple class name of the detector
    • getResults

      public List<EncodingDetectorContext.Result> getResults()
      Returns:
      unmodifiable list of all per-detector results in detection order
    • getUniqueCharsets

      public Set<Charset> getUniqueCharsets()
      Returns the unique charsets from all results of every detector, in detection order (top result first within each detector).

      Using all candidates, rather than only each detector's top result, matters when a single detector supplies a ranked list (e.g., Mojibuster on a short probe returns [windows-1252, windows-1250, Shift-JIS]). If only the top result were used, CharSoup would see a single charset and report "unanimous" without ever attempting arbitration.
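A minimal sketch, assuming each detector's output is reduced to a ranked list of Charsets, shows why the choice matters. The class and method names are hypothetical. A LinkedHashSet keeps first-seen order, so detection order and the top-first order within each detector are preserved.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch contrasting "all candidates" with "top result only" deduplication. */
final class UniqueCharsetsSketch {

    /** Unique charsets from every candidate of every detector, in detection order. */
    static Set<Charset> uniqueFromAll(List<List<Charset>> perDetector) {
        Set<Charset> unique = new LinkedHashSet<>(); // preserves first-seen order
        for (List<Charset> ranked : perDetector) {
            unique.addAll(ranked); // top result first within each detector
        }
        return unique;
    }

    /** Unique charsets from only each detector's top result. */
    static Set<Charset> uniqueFromTopOnly(List<List<Charset>> perDetector) {
        Set<Charset> unique = new LinkedHashSet<>();
        for (List<Charset> ranked : perDetector) {
            unique.add(ranked.get(0));
        }
        return unique;
    }

    public static void main(String[] args) {
        // One detector returning a ranked list, as in the Mojibuster example.
        List<List<Charset>> perDetector = List.of(List.of(
                Charset.forName("windows-1252"),
                Charset.forName("windows-1250"),
                Charset.forName("Shift_JIS")));

        System.out.println(uniqueFromTopOnly(perDetector)); // one charset: looks "unanimous"
        System.out.println(uniqueFromAll(perDetector));     // three charsets: arbitration needed
    }
}
```

Run as-is, the first printed set holds a single charset and the second holds all three candidates.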

    • getTopConfidenceFor

      public float getTopConfidenceFor(Charset charset)
      Returns the highest confidence seen for the given charset across all detector results (not just top results). Useful for arbitrators that want to propagate the base detector's confidence for the winning charset.
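A hypothetical sketch of this lookup follows. Scored stands in for EncodingResult, and returning 0f for a charset that no detector proposed is an assumption of the sketch rather than documented behavior. The point it illustrates is that every candidate is scanned, so a charset that is never any detector's top result still reports its best confidence.

```java
import java.nio.charset.Charset;
import java.util.List;

/** Hypothetical sketch; not Tika's getTopConfidenceFor implementation. */
final class TopConfidenceSketch {

    /** Stand-in for EncodingResult. */
    static final class Scored {
        final Charset charset;
        final float confidence;

        Scored(Charset charset, float confidence) {
            this.charset = charset;
            this.confidence = confidence;
        }
    }

    /**
     * Highest confidence for the charset across every candidate of every detector.
     * Returns 0f if the charset never appears (an assumption of this sketch).
     */
    static float topConfidenceFor(List<List<Scored>> perDetector, Charset charset) {
        float best = 0f;
        for (List<Scored> ranked : perDetector) {
            for (Scored s : ranked) { // not just ranked.get(0)
                if (s.charset.equals(charset) && s.confidence > best) {
                    best = s.confidence;
                }
            }
        }
        return best;
    }
}
```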
    • setArbitrationInfo

      public void setArbitrationInfo(String info)
      Set by the meta detector to describe how it reached its decision. Example values include "unanimous", "scored", "no-stream", and "empty-stream".
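As a hypothetical illustration of how a meta detector might choose one of these labels: the label strings come from the list above, but the decision rule and the hasStream/streamEmpty inputs are assumptions, not CharSoup's actual logic. Consistent with the getUniqueCharsets description, a single unique charset yields "unanimous".

```java
import java.nio.charset.Charset;
import java.util.Set;

/** Hypothetical sketch; the rule and parameters are illustrative assumptions. */
final class ArbitrationInfoSketch {

    /** Picks an arbitration label from the stream state and the unique candidate charsets. */
    static String describe(boolean hasStream, boolean streamEmpty, Set<Charset> uniqueCharsets) {
        if (!hasStream) {
            return "no-stream";    // nothing to read
        }
        if (streamEmpty) {
            return "empty-stream"; // stream present but contains no bytes
        }
        if (uniqueCharsets.isEmpty()) {
            throw new IllegalArgumentException("expected at least one candidate charset");
        }
        if (uniqueCharsets.size() == 1) {
            return "unanimous";    // every candidate names the same charset
        }
        return "scored";           // several candidates: the meta detector would score them
    }
}
```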
    • getArbitrationInfo

      public String getArbitrationInfo()
      Returns:
      the arbitration description recorded by the meta detector via setArbitrationInfo(String)