feat: solo-error penalty + fast healthcheck (instant Traefik restart)
Build & push image / build-and-push (push) Successful in 20s
CI / test (push) Successful in 10m22s

Dashboard: add an editable 'solo-error penalty ×' (default 1.5) — a false positive only one model made (a unique wrong claim, derived from reporter count) multiplies its FP penalty, mirroring the solo-find bonus. Client-side; store stays point-free.

Deploy: speed up the healthcheck (image HEALTHCHECK + compose example: interval 30s->5s, start_period 10s, start_interval 1s). Traefik gates routing on the Docker health status, so the old 30s-to-first-probe meant ~30s of 502s after a restart; the daemon binds the port in ms, so it now goes healthy in ~1s. Data is on the volume; only fire-and-forget emits in the ~1s window are at risk.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-27 12:45:07 -04:00
parent c15f860853
commit 14cbee8e25
3 changed files with 27 additions and 4 deletions
+4 -2
View File
@@ -81,6 +81,7 @@
<span class="small mut">critical</span><input type="number" id="p_critical" value="20">
<span class="small mut" style="margin-left:18px">false-positive penalty ×</span><input type="number" id="fp_mult" value="-0.5" step="0.5" title="A false positive scores this × the severity the model CLAIMED (its lens verdict). e.g. a Blocking-claimed FP at -0.5 = high(8) × -0.5 = -4 pts.">
<span class="small mut" style="margin-left:18px">solo-find bonus ×</span><input type="number" id="solo_bonus" value="1.5" step="0.5" min="1" title="A confirmed finding that NO other model reported scores this × its severity points — rewarding a model for catching what the swarm missed. 1 = no bonus.">
<span class="small mut" style="margin-left:18px">solo-error penalty ×</span><input type="number" id="solo_err" value="1.5" step="0.5" min="1" title="A false positive that NO other model made (a unique wrong claim) multiplies its FP penalty by this — noisier than a shared mistake. 1 = no extra penalty.">
</div>
</div>
</div>
@@ -168,6 +169,7 @@ function curve(){
}
function fpMult(){ const v = parseFloat(document.getElementById("fp_mult").value); return isNaN(v) ? 0 : v; }
function soloBonus(){ const v = parseFloat(document.getElementById("solo_bonus").value); return isNaN(v) ? 1 : v; }
function soloErr(){ const v = parseFloat(document.getElementById("solo_err").value); return isNaN(v) ? 1 : v; }
// A false positive has no graded severity, so penalize it by the severity the
// MODEL claimed — its lens verdict (raw_severity) — mapped onto the curve. The
// louder the wrong cry, the bigger the penalty.
@@ -235,7 +237,7 @@ function aggregate(f){
else { m.ungraded.add(r.finding_id); }
}
const fpm = fpMult(), sb = soloBonus();
const fpm = fpMult(), sb = soloBonus(), se = soloErr();
const out = [...M.values()].map(m => {
const sevCounts = Object.fromEntries(SEVS.map(s=>[s,0]));
let confirmedPoints = 0, solo = 0;
@@ -245,7 +247,7 @@ function aggregate(f){
if (isSolo) solo++;
confirmedPoints += (c[sevv] || 0) * (isSolo ? sb : 1);
}
let fpPen = 0; for (const k of m.fp.values()) fpPen += (c[k]||0) * fpm; // negative when fpm<0
let fpPen = 0; for (const [fid, k] of m.fp){ const soloE = (reporters.get(fid)?.size || 1) === 1; fpPen += (c[k]||0) * fpm * (soloE ? se : 1); } // solo (unique) errors penalized extra
const points = confirmedPoints + fpPen; // NET: solo-boosted confirmed + FP penalty
const findings = m.findings.size, confirmed = m.confirmed.size;
return { model:m.model, provider:m.provider, runs:m.runs, minutes:m.minutes,