inference calibration
Per-model calibration metrics for the inference-rebuild. Read-only. Source: inference_models × arb_predictions ⋈ arb_prediction_outcomes.
active model registry
| active | version | game | tier | fit at (UTC) | train rows | WF CRPS | WF log-score | cov 50% | cov 90% |
|---|---|---|---|---|---|---|---|---|---|
| ● | v1.1-magic-tier3-2026-05-16T0320Z | magic | 3 | 2026-05-16 03:20 | 10,144 | — | — | — | — |
| ● | v1.1-pokemon-tier3-2026-05-16T0145Z | pokemon | 3 | 2026-05-16 01:45 | 7,628 | — | — | — | — |
| ○ | v1.1-pokemon-tier3-2026-05-16T0423Z | pokemon | 3 | 2026-05-16 04:23 | 7,628 | — | — | — | — |
| ○ | v1.1-magic-tier3-2026-05-16T0144Z | magic | 3 | 2026-05-16 01:44 | 10,144 | — | — | — | — |
coverage + proper score
Tier 2 (clearing-prob): proper score = Brier (lower is better). Tier 3 (log-price): proper score = mean |log-residual| (lower is better); cov 50%, the share of realised outcomes that fall inside the 50% prediction interval, should be ≈ 50%.
| tier | game | model | N w/ outcome | cov 50% | proper score |
|---|---|---|---|---|---|
| 3 | magic | v1.1-magic-tier3-2026-05-16T0320Z | 22,039 | 58.0% | 0.7178 |
| 3 | pokemon | v1.1-pokemon-tier3-2026-05-16T0145Z | 2,006 | 54.8% | 0.8307 |
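A minimal sketch of how the three quantities above could be computed. It assumes the 50% interval is available as per-prediction lower and upper bounds (for example p25 and p75) and that all prices are strictly positive. The function names are illustrative and are not taken from the inference-rebuild code.

```python
import math


def brier_score(probs, outcomes):
    """Tier 2: mean squared gap between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def mean_abs_log_residual(realised, predicted_p50):
    """Tier 3: mean of |log(realised) - log(predicted p50)|."""
    residuals = [math.log(r) - math.log(p) for r, p in zip(realised, predicted_p50)]
    return sum(abs(x) for x in residuals) / len(residuals)


def interval_coverage(realised, lower, upper):
    """Share of realised values inside [lower, upper] (bounds are assumed inputs)."""
    hits = sum(1 for r, lo, hi in zip(realised, lower, upper) if lo <= r <= hi)
    return hits / len(realised)
```

A well-calibrated 50% interval gives `interval_coverage` near 0.5; values above that suggest intervals that are wider than necessary, values below suggest intervals that are too narrow.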
tier-3 residuals by stratum
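Residual = log(realised) − log(predicted p50), per (game × price band). Median should be ≈ 0 if unbiased; a negative median means realised prices came in below the predicted p50. Mean |resid| is an accuracy proxy for the p50 (lower is better).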
| game | price band | n | median resid | mean \|resid\| |
|---|---|---|---|---|
| magic | A: <$10 | 1,697 | -0.548 | 0.651 |
| magic | B: $10-50 | 12,672 | -0.378 | 0.589 |
| magic | C: $50-200 | 5,905 | -0.282 | 0.900 |
| magic | D: $200-1k | 1,765 | +1.064 | 1.096 |
| pokemon | A: <$10 | 1,665 | -0.823 | 0.829 |
| pokemon | B: $10-50 | 336 | -0.684 | 0.817 |
| pokemon | C: $50-200 | 5 | +2.412 | 2.267 |
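The per-stratum figures can be reproduced along these lines. This is a sketch under stated assumptions: the band is assigned from the predicted p50 (the page does not say which price is banded), band edges are half-open, prices of $1k and above are skipped because no band is listed for them, and all names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

# Upper edges of the price bands shown in the table (half-open is an assumption).
BANDS = [(10, "A: <$10"), (50, "B: $10-50"), (200, "C: $50-200"), (1000, "D: $200-1k")]


def price_band(price):
    """Return the band label for a price, or None if it is $1k or above."""
    for upper, label in BANDS:
        if price < upper:
            return label
    return None


def residuals_by_stratum(rows):
    """rows: iterable of (game, predicted_p50, realised) with positive prices."""
    groups = defaultdict(list)
    for game, predicted_p50, realised in rows:
        band = price_band(predicted_p50)  # assumption: band by predicted p50
        if band is None:
            continue
        groups[(game, band)].append(math.log(realised) - math.log(predicted_p50))
    return {
        key: {
            "n": len(res),
            "median_resid": median(res),
            "mean_abs_resid": sum(abs(x) for x in res) / len(res),
        }
        for key, res in groups.items()
    }


def pooled_mean_abs(strata):
    """n-weighted mean of per-stratum mean |resid|; strata is [(n, mean_abs), ...]."""
    total_n = sum(n for n, _ in strata)
    return sum(n * m for n, m in strata) / total_n
```

As a cross-check, `pooled_mean_abs([(1697, 0.651), (12672, 0.589), (5905, 0.900), (1765, 1.096)])` gives about 0.7177 for magic, in line with the 0.7178 proper score reported above; the small gap is consistent with rounding in the per-band values.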