Class EncodingResult

java.lang.Object
org.apache.tika.detect.EncodingResult

public class EncodingResult extends Object
A charset detection result pairing a Charset with a confidence score and an EncodingResult.ResultType indicating the nature of the evidence.

Result types

  • EncodingResult.ResultType.DECLARATIVE — the document explicitly stated its encoding (BOM, HTML <meta charset>). These are authoritative claims about author intent and get preference over inferred results when consistent with the actual bytes.
  • EncodingResult.ResultType.STRUCTURAL — byte-grammar proof (ISO-2022 escape sequences, UTF-8 multibyte validation). The encoding is proven by the byte structure itself, independent of any declaration.
  • EncodingResult.ResultType.STATISTICAL — probabilistic inference from a statistical model. The confidence float is meaningful here for ranking among candidates; for DECLARATIVE and STRUCTURAL results it is conventionally 1.0 but carries no additional information.
Since:
Apache Tika 4.0
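
The three result types suggest an ordering when several candidates compete. The following is a minimal sketch of one such ordering, not Tika's actual arbitration logic. CharsetArbitrationSketch, Candidate, rank and pickBest are hypothetical names, and the ResultType enum is a local stand-in for EncodingResult.ResultType. Two assumptions go beyond the text above: STRUCTURAL is placed ahead of DECLARATIVE, and the check that a declaration is consistent with the actual bytes is omitted.

```java
import java.nio.charset.Charset;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in types for illustration; none of these are Tika classes.
class CharsetArbitrationSketch {

    // Local mirror of the three documented result types.
    enum ResultType { DECLARATIVE, STRUCTURAL, STATISTICAL }

    static final class Candidate {
        final Charset charset;
        final float confidence;
        final ResultType type;

        Candidate(Charset charset, float confidence, ResultType type) {
            this.charset = charset;
            this.confidence = confidence;
            this.type = type;
        }
    }

    // Assumption: byte-grammar proof first, then explicit declarations,
    // then statistical guesses. The documentation only states that
    // DECLARATIVE is preferred over inferred results.
    static int rank(ResultType type) {
        switch (type) {
            case STRUCTURAL:
                return 0;
            case DECLARATIVE:
                return 1;
            default:
                return 2;
        }
    }

    // Returns the preferred candidate, or null if the list is empty.
    // Confidence only breaks ties within the same evidence type, which is
    // where the documentation says it is meaningful (STATISTICAL results).
    static Candidate pickBest(List<Candidate> candidates) {
        Comparator<Candidate> byEvidence = Comparator.comparingInt(c -> rank(c.type));
        Comparator<Candidate> byConfidenceDesc =
                (a, b) -> Float.compare(b.confidence, a.confidence);
        return candidates.stream()
                .min(byEvidence.thenComparing(byConfidenceDesc))
                .orElse(null);
    }
}
```

With this ordering, a DECLARATIVE candidate wins over a STATISTICAL one even when the statistical confidence is higher, and two STATISTICAL candidates are separated by confidence alone.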
  • Constructor Details

    • EncodingResult

      public EncodingResult(Charset charset, float confidence)
      Constructs a STATISTICAL result. Existing detectors that do not yet classify their evidence type default to statistical (probabilistic) treatment. This is the safe assumption, because a statistical result remains open to arbitration against other candidates.
      Parameters:
      charset - the detected charset; must not be null
      confidence - detection confidence in [0.0, 1.0]
    • EncodingResult

      public EncodingResult(Charset charset, float confidence, String label)
      Constructs a STATISTICAL result with a detector-specific label.
      Parameters:
      charset - the detected charset; must not be null
      confidence - detection confidence in [0.0, 1.0]
      label - the detector's original label (e.g. "IBM420-ltr"); if null, defaults to charset.name()
    • EncodingResult

      public EncodingResult(Charset charset, float confidence, String label, EncodingResult.ResultType resultType)
      Constructs a result with an explicit EncodingResult.ResultType.
      Parameters:
      charset - the detected charset; must not be null
      confidence - detection confidence in [0.0, 1.0]
      label - the detector's original label; if null, defaults to charset.name()
      resultType - the nature of the evidence; must not be null
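
The relationship between the three constructors can be sketched with a self-contained stand-in class. EncodingResultSketch and its accessor names are hypothetical, since the real accessors belong to Method Details. The sketch assumes that the shorter constructors delegate to the four-argument one and that a null charset or resultType is rejected with a NullPointerException. The documentation above states only that these arguments must not be null, and it does not say how a confidence outside [0.0, 1.0] is handled, so the sketch stores the value unchecked.

```java
import java.nio.charset.Charset;
import java.util.Objects;

// Hypothetical stand-in mirroring the documented constructor contracts;
// this is not the Tika class itself.
class EncodingResultSketch {

    enum ResultType { DECLARATIVE, STRUCTURAL, STATISTICAL }

    private final Charset charset;
    private final float confidence;
    private final String label;
    private final ResultType resultType;

    // Two-argument form: STATISTICAL, label defaults to charset.name().
    EncodingResultSketch(Charset charset, float confidence) {
        this(charset, confidence, null, ResultType.STATISTICAL);
    }

    // Three-argument form: STATISTICAL with a detector-specific label.
    EncodingResultSketch(Charset charset, float confidence, String label) {
        this(charset, confidence, label, ResultType.STATISTICAL);
    }

    // Four-argument form: explicit evidence type.
    EncodingResultSketch(Charset charset, float confidence, String label,
                         ResultType resultType) {
        // Assumption: nulls fail fast; the documented contract only says
        // "must not be null".
        this.charset = Objects.requireNonNull(charset, "charset");
        this.resultType = Objects.requireNonNull(resultType, "resultType");
        // Assumption: no range check, because out-of-range handling is undocumented.
        this.confidence = confidence;
        // Documented behavior: a null label falls back to charset.name().
        this.label = (label != null) ? label : charset.name();
    }

    Charset getCharset() { return charset; }
    float getConfidence() { return confidence; }
    String getLabel() { return label; }
    ResultType getResultType() { return resultType; }
}
```

Against the real class, the equivalent calls would be new EncodingResult(StandardCharsets.UTF_8, 0.9f) for a statistical guess and new EncodingResult(StandardCharsets.UTF_8, 1.0f, null, EncodingResult.ResultType.STRUCTURAL) for a result proven by multibyte validation.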
  • Method Details