Package org.apache.tika.detect
Class EncodingDetectorContext
java.lang.Object
org.apache.tika.detect.EncodingDetectorContext
Context object that collects encoding detection results from base detectors. Stored in ParseContext by CompositeEncodingDetector so that a MetaEncodingDetector can see all candidates and arbitrate. Removed after detection to prevent contamination during recursive parsing.
Each base detector contributes a ranked List of EncodingResults. The context exposes the top result from each detector as the primary signal, and provides access to all candidates for richer arbitration strategies.
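The collection model described above can be sketched with plain Java stand-ins. The names below (CollectingContextSketch, Candidate, DetectorResult, topCharsets) are hypothetical and are not the Tika types. Rejecting an empty list with IllegalArgumentException is an assumption based on the "must not be empty" requirement on addResult, not confirmed behavior.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the Tika classes.
final class CollectingContextSketch {

    /** One candidate: a charset and the detector's confidence in it. */
    static final class Candidate {
        final Charset charset;
        final float confidence;

        Candidate(Charset charset, float confidence) {
            this.charset = charset;
            this.confidence = confidence;
        }
    }

    /** One detector's contribution: its ranked candidates and its name. */
    static final class DetectorResult {
        final List<Candidate> ranked;
        final String detectorName;

        DetectorResult(List<Candidate> ranked, String detectorName) {
            this.ranked = List.copyOf(ranked);
            this.detectorName = detectorName;
        }

        /** The highest-ranked candidate (lists are ordered best first). */
        Candidate top() {
            return ranked.get(0);
        }
    }

    private final List<DetectorResult> results = new ArrayList<>();

    /** Records one detector's ranked list; empty lists are rejected (assumed). */
    void addResult(List<Candidate> ranked, String detectorName) {
        if (ranked.isEmpty()) {
            throw new IllegalArgumentException("ranked results must not be empty");
        }
        results.add(new DetectorResult(ranked, detectorName));
    }

    /** Read-only view of all per-detector results in detection order. */
    List<DetectorResult> getResults() {
        return Collections.unmodifiableList(results);
    }

    /** The primary signal: the top charset from each detector, in detection order. */
    List<Charset> topCharsets() {
        List<Charset> tops = new ArrayList<>();
        for (DetectorResult r : results) {
            tops.add(r.top().charset);
        }
        return tops;
    }
}
```

Keeping the full ranked list per detector, rather than only the top entry, is what lets an arbitrator later look past each detector's first choice.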
Since: Apache Tika 3.2
Nested Class Summary
Nested Classes
static class: A single detector's contribution: its ranked list of candidates and its name.
Constructor Summary
Constructors
EncodingDetectorContext()
Method Summary
Modifier and Type, Method, Description
void addResult(List<EncodingResult> encodingResults, String detectorName): Record the ranked results from a child detector.
float getTopConfidenceFor(Charset charset): Returns the highest confidence seen for the given charset across all detector results (not just top results).
getUniqueCharsets: Returns the unique charsets from ALL results of every detector, in detection order (top result first within each detector).
void setArbitrationInfo(String info): Set by the meta detector to describe how it reached its decision.
Constructor Details
EncodingDetectorContext
public EncodingDetectorContext()
Method Details
addResult
Record the ranked results from a child detector.
Parameters:
encodingResults - ranked results, highest confidence first; must not be empty
detectorName - simple class name of the detector
getResults
Returns:
unmodifiable list of all per-detector results in detection order
getUniqueCharsets
Returns the unique charsets from ALL results of every detector, in detection order (top result first within each detector).
Using all candidates rather than just each detector's top-1 is important when a single detector returns a ranked list (e.g., Mojibuster on a short probe returns [windows-1252, windows-1250, Shift-JIS]). If only the top-1 were used, CharSoup would see a single charset and return "unanimous" without ever attempting arbitration.
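The difference between the two strategies can be illustrated with a small, self-contained sketch. UniqueCharsetsSketch and both method names are hypothetical, and the input is simplified to a list of ranked charset lists (one per detector); this is not the Tika implementation. A LinkedHashSet is used here because it removes duplicates while preserving first-seen order.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch for illustration; not the Tika implementation.
final class UniqueCharsetsSketch {

    /** Unique charsets across ALL candidates, preserving first-seen order. */
    static Set<Charset> uniqueFromAll(List<List<Charset>> rankedPerDetector) {
        Set<Charset> unique = new LinkedHashSet<>();
        for (List<Charset> ranked : rankedPerDetector) {
            // Each list is ordered best first, so the top result is added first.
            unique.addAll(ranked);
        }
        return unique;
    }

    /** Unique charsets when only each detector's top-1 candidate is considered. */
    static Set<Charset> uniqueFromTopOnly(List<List<Charset>> rankedPerDetector) {
        Set<Charset> unique = new LinkedHashSet<>();
        for (List<Charset> ranked : rankedPerDetector) {
            unique.add(ranked.get(0));
        }
        return unique;
    }
}
```

For a single detector that returns three ranked candidates, uniqueFromAll yields three charsets, so an arbitrator has something to compare, while uniqueFromTopOnly yields one and the result would look unanimous.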
getTopConfidenceFor
Returns the highest confidence seen for the given charset across all detector results (not just top results). Useful for arbitrators that want to propagate the base detector's confidence for the winning charset.
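A minimal sketch of such a lookup, assuming candidates are simple (charset, confidence) pairs. TopConfidenceSketch and Candidate are hypothetical names, and returning 0 when no detector reported the charset is an assumption; the documentation above does not say what the real method returns in that case.

```java
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical sketch for illustration; not the Tika implementation.
final class TopConfidenceSketch {

    /** One candidate: a charset and the detector's confidence in it. */
    static final class Candidate {
        final Charset charset;
        final float confidence;

        Candidate(Charset charset, float confidence) {
            this.charset = charset;
            this.confidence = confidence;
        }
    }

    /**
     * Highest confidence reported for the charset by any detector at any rank.
     * Returns 0 if the charset was never reported (assumed fallback).
     */
    static float topConfidenceFor(Charset charset, List<List<Candidate>> rankedPerDetector) {
        float best = 0f;
        for (List<Candidate> ranked : rankedPerDetector) {
            // Scan every candidate, not only the first entry of each list.
            for (Candidate c : ranked) {
                if (c.charset.equals(charset) && c.confidence > best) {
                    best = c.confidence;
                }
            }
        }
        return best;
    }
}
```

Scanning every rank matters because the winning charset may have been only a second or third choice for the detector that was most confident about it.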
setArbitrationInfo
Set by the meta detector to describe how it reached its decision. Example values: "unanimous", "scored", "no-stream", "empty-stream".
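As a rough illustration of how a meta detector might produce two of these labels, the sketch below returns "unanimous" when only one distinct charset was proposed and "scored" otherwise. ArbitrationSketch, Decision and arbitrate are hypothetical names, and the scoring rule (highest confidence wins) is an assumption made for the example; it is not a description of how any Tika detector actually scores candidates. The "no-stream" and "empty-stream" cases are not modeled.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical sketch of a meta detector's decision step; not Tika's logic.
final class ArbitrationSketch {

    /** The chosen charset plus a short label saying how it was chosen. */
    static final class Decision {
        final Charset charset;
        final String arbitrationInfo;

        Decision(Charset charset, String arbitrationInfo) {
            this.charset = charset;
            this.arbitrationInfo = arbitrationInfo;
        }
    }

    /**
     * @param confidenceByCharset unique candidate charsets mapped to their
     *                            highest reported confidence; must not be empty
     */
    static Decision arbitrate(Map<Charset, Float> confidenceByCharset) {
        if (confidenceByCharset.isEmpty()) {
            throw new IllegalArgumentException("no candidates to arbitrate");
        }
        if (confidenceByCharset.size() == 1) {
            // Only one distinct candidate: there is nothing to arbitrate.
            Charset only = confidenceByCharset.keySet().iterator().next();
            return new Decision(only, "unanimous");
        }
        // Assumed scoring rule: the highest confidence wins; on a tie the
        // first entry in the map's iteration order is kept.
        Map.Entry<Charset, Float> best = null;
        for (Map.Entry<Charset, Float> e : confidenceByCharset.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return new Decision(best.getKey(), "scored");
    }
}
```

A caller would then pass the label from the returned Decision to setArbitrationInfo so that later consumers can see how the choice was made.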
getArbitrationInfo