rigor
statistical accountability for the second-model · cream-ochre-black · 2026-05-16
no data yet
Phase-1 framework deployed; awaiting first metric writes from npm run score-predictions and scripts/check_drift.py.
Empty state is expected until the second-model has accumulated outcomes. Tables read: model_drift_metrics, plus model_predictions joined with prediction_outcomes.
rolling accuracy per signal
(no metric rows yet)
REJECT triggers after 14 consecutive days of MCC < 0.10; alerts fire when sMAPE > 1.0 or Brier > 0.30.
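As a sketch of how the REJECT rule could be evaluated (the function names and the daily-MCC input are illustrative, not the actual score-predictions implementation):

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0.0 when any
    marginal (row or column of the confusion matrix) is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def reject_triggered(daily_mcc: list[float],
                     threshold: float = 0.10, run: int = 14) -> bool:
    """True once `run` consecutive daily MCC values fall below `threshold`."""
    streak = 0
    for value in daily_mcc:
        streak = streak + 1 if value < threshold else 0
        if streak >= run:
            return True
    return False
```

A single good day resets the streak, so brief recoveries keep a signal alive; only a sustained 14-day slump produces the REJECT.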
probability of backtest overfitting
(no pbo rows yet — write '<signal>:pbo' to model_drift_metrics)
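PBO here presumably follows the combinatorially symmetric cross-validation scheme of Bailey et al.; a simplified sketch, assuming a (periods × configurations) performance matrix (the `pbo` helper and its block count are illustrative, not the deployed computation):

```python
from itertools import combinations
import numpy as np

def pbo(returns: np.ndarray, n_blocks: int = 8) -> float:
    """Probability of backtest overfitting, simplified CSCV style.
    `returns` is (T, N): T periods x N model configurations.
    For each half/half split of the time blocks, pick the in-sample
    winner and check where it ranks out-of-sample; PBO is the share
    of splits where the winner lands in the lower OOS half."""
    blocks = np.array_split(returns, n_blocks, axis=0)
    idx = range(n_blocks)
    splits = list(combinations(idx, n_blocks // 2))
    below_median = 0
    for train_ids in splits:
        test_ids = [i for i in idx if i not in train_ids]
        train = np.vstack([blocks[i] for i in train_ids])
        test = np.vstack([blocks[i] for i in test_ids])
        best = train.mean(axis=0).argmax()            # in-sample winner
        test_means = test.mean(axis=0)
        oos_rank = (test_means <= test_means[best]).mean()
        if oos_rank <= 0.5:                           # winner in lower OOS half
            below_median += 1
    return below_median / len(splits)
```

A configuration that genuinely dominates yields PBO near 0; pure noise drifts toward 0.5, since the in-sample winner's out-of-sample rank is then a coin flip.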
feature-drift PSI (60-day window)
(no drift rows yet — run scripts/check_drift.py on a weekly cron)
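A minimal sketch of the PSI computation the weekly check_drift.py run would presumably perform, with decile bins fitted on the baseline sample (the function and the thresholds in its docstring follow the common rule of thumb, not a confirmed spec):

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a baseline feature sample
    (`expected`) and a recent window (`actual`).  Common reading:
    < 0.10 stable, 0.10-0.25 moderate shift, > 0.25 major shift."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    # Clip the recent window into the baseline range so out-of-range
    # values land in the outermost bins instead of being dropped.
    a_clipped = np.clip(actual, edges[0], edges[-1])
    a_pct = np.histogram(a_clipped, bins=edges)[0] / len(actual)
    eps = 1e-6                                      # avoid log(0) on empty bins
    e_pct = np.clip(e_pct, eps, None)
    a_pct = np.clip(a_pct, eps, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

Identical distributions score 0; a one-standard-deviation shift in a roughly normal feature lands well above the 0.25 alert line.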
calibration plots (predicted × realised)
(no resolved predictions yet — calibration appears once prediction_outcomes is populated)
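Once prediction_outcomes is populated, the reliability data behind these plots can come from a simple binning pass like this (a sketch; the helper name and inputs are hypothetical, and the plots themselves would be drawn from the returned rows):

```python
import numpy as np

def calibration_table(p_pred: np.ndarray, outcome: np.ndarray, bins: int = 10):
    """Bin predicted probabilities into equal-width buckets and compare
    the mean forecast in each bucket with the realised hit rate — the
    raw data behind a reliability (calibration) diagram."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    which = np.clip(np.digitize(p_pred, edges) - 1, 0, bins - 1)
    rows = []
    for b in range(bins):
        mask = which == b
        if mask.any():
            rows.append((float(p_pred[mask].mean()),   # mean predicted prob
                         float(outcome[mask].mean()),  # realised frequency
                         int(mask.sum())))             # bucket size
    return rows
```

For a well-calibrated signal the first two columns track each other; the gap per bucket, weighted by bucket size, is the usual expected-calibration-error summary.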