Class EvalJunkDetector
For each script's dev set, scores clean sentences alongside several corruption modes at various injection rates and string lengths. Computes per-cell Cohen's d (discrimination power) and TPR/FPR at a fixed z-score threshold.
Output files in --output-dir:
- detail.tsv — one row per (script, distortion, rate, length):
script, distortion, param, length, n_clean, n_corrupt, mean_clean_z, mean_corrupt_z, cohens_d, fpr, tpr
- summary.tsv — macro-averaged Cohen's d and FPR/TPR per (distortion, rate, length) across all scripts.
- compare.tsv — pairwise codec-comparison accuracy using the
JunkDetector.compare(java.lang.String, java.lang.String, java.lang.String, java.lang.String) API, stratified by string length. This is the primary metric for the charset-arbitration use case; a larger mean delta means better discrimination at that length.
Why char-remap is not in summary.tsv: The character-level wrong-codec
substitution (e.g. CP1252→CP1255, replacing umlauts with Hebrew letters) is added
to training at a 5% rate. At that rate it is too subtle to detect via the absolute
JunkDetector.score(java.lang.String) API — z-score distributions barely separate (Cohen's d ≈ 0).
The distortion trains the LR to distinguish subtly-wrong from correct decodings, which
only manifests as larger pairwise deltas in JunkDetector.compare(java.lang.String, java.lang.String, java.lang.String, java.lang.String). Measuring it
via summary.tsv would produce misleading d≈0 "failure" rows; see compare.tsv instead.
Cohen's d = (mean_clean_z − mean_corrupt_z) / pooled_std. Higher d = better discrimination. FPR = fraction of clean text falsely flagged; TPR = fraction of corrupted text correctly flagged. Both use threshold = −2.0.
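The per-cell statistics defined above can be sketched as follows. This is a minimal illustration, not the evaluator's actual code: the class name CellStats is hypothetical, the pooled standard deviation is assumed to be the usual (n − 1)-weighted pooling of the two sample variances, and "flagged" is assumed to mean a z-score strictly below the threshold.

```java
import java.util.Arrays;

/** Illustrative helper; not part of EvalJunkDetector. */
public final class CellStats {

    /** Arithmetic mean of the values. */
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) {
            ss += (x - m) * (x - m);
        }
        return ss / (xs.length - 1);
    }

    /** Cohen's d = (mean_clean_z - mean_corrupt_z) / pooled_std. */
    static double cohensD(double[] clean, double[] corrupt) {
        int n1 = clean.length;
        int n2 = corrupt.length;
        // Assumption: pooled variance weights each sample variance by (n - 1).
        double pooledVar =
                ((n1 - 1) * variance(clean) + (n2 - 1) * variance(corrupt)) / (n1 + n2 - 2);
        return (mean(clean) - mean(corrupt)) / Math.sqrt(pooledVar);
    }

    /** Fraction of z-scores strictly below the threshold (assumed to mean "flagged"). */
    static double flaggedFraction(double[] zs, double threshold) {
        long flagged = Arrays.stream(zs).filter(z -> z < threshold).count();
        return (double) flagged / zs.length;
    }

    public static void main(String[] args) {
        // Made-up z-scores, for illustration only.
        double[] cleanZ = {0.4, -0.3, 1.1, -2.4, 0.0, 0.7};
        double[] corruptZ = {-3.2, -4.1, -1.5, -2.8, -5.0, -3.6};
        double threshold = -2.0;

        System.out.println("cohens_d = " + cohensD(cleanZ, corruptZ));
        System.out.println("fpr      = " + flaggedFraction(cleanZ, threshold));
        System.out.println("tpr      = " + flaggedFraction(corruptZ, threshold));
    }
}
```

Under these assumptions, FPR is flaggedFraction applied to the clean scores and TPR is the same function applied to the corrupted scores.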
To compare two model versions: run eval before and after, then diff the summary and compare TSVs. The "macro_d" column in summary.tsv and the "mean_delta" columns in compare.tsv are the headline metrics.
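A before/after comparison of two summary.tsv files might look like the sketch below. It is only an illustration under stated assumptions: the class name SummaryDiff is hypothetical, the files are assumed to be tab-separated with a header row containing a "macro_d" column, and the first three columns are assumed to be the (distortion, rate, length) key. Adjust keyCols if the real layout differs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative helper; not part of EvalJunkDetector. */
public final class SummaryDiff {

    /** Maps each row's key (first keyCols columns, tab-joined) to its macro_d value. */
    static Map<String, Double> macroD(List<String> lines, int keyCols) {
        String[] header = lines.get(0).split("\t", -1);
        int idx = Arrays.asList(header).indexOf("macro_d");
        if (idx < 0) {
            throw new IllegalArgumentException("no macro_d column in header");
        }
        Map<String, Double> out = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.split("\t", -1);
            String key = String.join("\t", Arrays.copyOfRange(f, 0, keyCols));
            out.put(key, Double.parseDouble(f[idx]));
        }
        return out;
    }

    /** For every key present in both runs, returns after minus before. */
    static Map<String, Double> delta(Map<String, Double> before, Map<String, Double> after) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : after.entrySet()) {
            Double old = before.get(e.getKey());
            if (old != null) {
                out.put(e.getKey(), e.getValue() - old);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = summary.tsv from the old model, args[1] = summary.tsv from the new model
        Map<String, Double> before = macroD(Files.readAllLines(Paths.get(args[0])), 3);
        Map<String, Double> after = macroD(Files.readAllLines(Paths.get(args[1])), 3);
        delta(before, after).forEach((k, v) -> System.out.printf("%s\t%+.3f%n", k, v));
    }
}
```

A positive delta means macro_d rose for that cell in the newer run. The same approach would apply to the "mean_delta" columns of compare.tsv by changing the column name looked up in the header.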
Usage:
java EvalJunkDetector \
--model /path/to/junkdetect.bin (default: classpath)
--data-dir ~/datasets/madlad/junkdetect
--output-dir /path/to/results (default: data-dir/eval)
--split dev|test (default: dev)
--samples 200
--compare-n 200 (qualifying pairs per codec pair per length)
--seed 42
--lengths 5,9,15,30,50,100,200
--compare-lengths 5,9,15,30,50
--rates 0.01,0.05,0.10,0.25,0.50,0.90
--threshold -2.0
Constructor Details
EvalJunkDetector
public EvalJunkDetector()
Method Details
main
Throws:
Exception