Class EvalCharsetDetectors

java.lang.Object
org.apache.tika.ml.chardetect.tools.EvalCharsetDetectors

public class EvalCharsetDetectors extends Object
Compares MojibusterEncodingDetector against ICU4J and juniversalchardet.

Supports:

  • --lengths 20,50,100,200,full — per-probe-length accuracy sweep
  • --confusion — top-confusion report for the ML-All detector
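The page does not say how the length sweep or the confusion report is computed. The following is a minimal sketch under stated assumptions: each test file has a known charset label, a probe length N means the first N bytes of the file, and "full" means the untruncated file. The `Detector` interface, the `Sample` class and the `FULL` sentinel are hypothetical stand-ins, not Tika, ICU4J or juniversalchardet API. The toy detector in `main` exists only to make the example runnable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProbeLengthSweep {

    /** Hypothetical stand-in for any detector under test. */
    interface Detector {
        String detect(byte[] probe);
    }

    /** One labelled test document: raw bytes plus its known charset (assumed available). */
    static final class Sample {
        final byte[] bytes;
        final String label;
        Sample(byte[] bytes, String label) { this.bytes = bytes; this.label = label; }
    }

    /** Hypothetical sentinel probe length meaning "use the whole document" ("full"). */
    static final int FULL = -1;

    /** Returns the first len bytes, or the whole array when len is FULL or exceeds its size. */
    static byte[] truncate(byte[] data, int len) {
        if (len == FULL || len >= data.length) {
            return data;
        }
        return Arrays.copyOf(data, len);
    }

    /** Fraction of samples whose detected charset matches the label at this probe length. */
    static double accuracy(List<Sample> samples, Detector detector, int len) {
        if (samples.isEmpty()) return 0.0;
        int correct = 0;
        for (Sample s : samples) {
            if (s.label.equalsIgnoreCase(detector.detect(truncate(s.bytes, len)))) correct++;
        }
        return (double) correct / samples.size();
    }

    /** Counts "expected -> predicted" pairs over every misclassified sample. */
    static Map<String, Integer> confusions(List<Sample> samples, Detector detector, int len) {
        Map<String, Integer> counts = new HashMap<>();
        for (Sample s : samples) {
            String predicted = detector.detect(truncate(s.bytes, len));
            if (!s.label.equalsIgnoreCase(predicted)) {
                counts.merge(s.label + " -> " + predicted, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** The k most frequent confusion pairs, most frequent first. */
    static List<Map.Entry<String, Integer>> topConfusions(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries.subList(0, Math.min(k, entries.size()));
    }

    public static void main(String[] args) {
        // Toy detector for illustration only: any byte >= 0x80 is reported as UTF-8.
        Detector toy = probe -> {
            for (byte b : probe) {
                if (b < 0) return "UTF-8";
            }
            return "US-ASCII";
        };
        List<Sample> samples = List.of(
            new Sample("hello".getBytes(StandardCharsets.US_ASCII), "US-ASCII"),
            new Sample("h\u00e9llo".getBytes(StandardCharsets.UTF_8), "UTF-8"));
        for (int len : new int[] {1, FULL}) {
            String name = len == FULL ? "full" : Integer.toString(len);
            System.out.printf("len=%s accuracy=%.2f%n", name, accuracy(samples, toy, len));
        }
        System.out.println(topConfusions(confusions(samples, toy, 1), 5));
    }
}
```

Short probes hide the distinguishing bytes (here the 1-byte probe of the UTF-8 sample is plain `h`), which is the kind of effect a per-length sweep is meant to expose.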

Usage:

   java EvalCharsetDetectors \
     [--model /path/to/chardetect.bin] \
     --data  /path/to/test-dir \
     [--lengths 20,50,100,200,full] \
     [--confusion]
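The usage line implies four options: optional `--model`, required `--data`, an optional comma-separated `--lengths` list that may contain the word `full`, and a bare `--confusion` switch. A minimal parsing sketch follows; the `EvalArgs` class and its `FULL` sentinel are hypothetical, and treating a missing `--lengths` as "full only" is an assumption, not documented behaviour of the tool.

```java
import java.util.ArrayList;
import java.util.List;

public class EvalArgs {
    /** Hypothetical sentinel standing for the "full" (untruncated) probe length. */
    static final int FULL = -1;

    String model;                              // --model path; null when not given
    String data;                               // --data directory (required)
    List<Integer> lengths = new ArrayList<>(); // parsed --lengths values
    boolean confusion;                         // true when --confusion is present

    /** Parses a spec such as "20,50,100,200,full" into byte counts, mapping "full" to FULL. */
    static List<Integer> parseLengths(String spec) {
        List<Integer> out = new ArrayList<>();
        for (String part : spec.split(",")) {
            String p = part.trim();
            if (p.equalsIgnoreCase("full")) {
                out.add(FULL);
            } else {
                int n = Integer.parseInt(p);
                if (n <= 0) throw new IllegalArgumentException("probe length must be positive: " + p);
                out.add(n);
            }
        }
        return out;
    }

    /** Reads the value following a flag, failing clearly if it is missing. */
    static String requireValue(String[] args, int i, String flag) {
        if (i >= args.length) throw new IllegalArgumentException(flag + " needs a value");
        return args[i];
    }

    static EvalArgs parse(String[] args) {
        EvalArgs a = new EvalArgs();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--model": a.model = requireValue(args, ++i, "--model"); break;
                case "--data": a.data = requireValue(args, ++i, "--data"); break;
                case "--lengths": a.lengths = parseLengths(requireValue(args, ++i, "--lengths")); break;
                case "--confusion": a.confusion = true; break;
                default: throw new IllegalArgumentException("unknown argument: " + args[i]);
            }
        }
        if (a.data == null) throw new IllegalArgumentException("--data is required");
        // Assumption: with no --lengths, evaluate whole documents only.
        if (a.lengths.isEmpty()) a.lengths.add(FULL);
        return a;
    }

    public static void main(String[] args) {
        EvalArgs a = parse(new String[] {
            "--data", "/path/to/test-dir", "--lengths", "20,50,100,200,full", "--confusion"});
        System.out.println(a.data + " " + a.lengths + " confusion=" + a.confusion);
    }
}
```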
 
  • Constructor Details

    • EvalCharsetDetectors

      public EvalCharsetDetectors()
  • Method Details