feat: solo-error penalty + fast healthcheck (instant Traefik restart)

Dashboard: add an editable 'solo-error penalty ×' (default 1.5). A false positive that only one model made (a unique wrong claim, derived from reporter count) has its FP penalty multiplied by this value, mirroring the solo-find bonus. Client-side; store stays point-free.

Deploy: speed up the healthcheck. The image gets a HEALTHCHECK (interval 5s, timeout 3s, start_period 10s); the compose example changes interval 30s->5s and timeout 5s->3s, and adds start_period 10s and start_interval 1s. Traefik gates routing on the Docker health status, so the old 30s-to-first-probe meant ~30s of 502s after a restart. The daemon binds the port in ms, so with start_interval 1s it now goes healthy in ~1s. Data is on the volume; only fire-and-forget emits in the ~1s window are at risk.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -15,5 +15,10 @@ ENV GADFLY_REPORTS_ADDR=:8090 \
 GADFLY_REPORTS_DB=/data/gadfly-reports.db
 EXPOSE 8090
 VOLUME ["/data"]
+# Fast probe so an orchestrator (e.g. Traefik) resumes routing within a few seconds
+# of a (re)start — the daemon binds the port in milliseconds. First probe at
+# --interval (5s); --start-period keeps early failures from flapping the status.
+HEALTHCHECK --interval=5s --timeout=3s --start-period=10s --retries=3 \
+CMD wget -q -O - http://localhost:8090/healthz || exit 1
 ENTRYPOINT ["/usr/local/bin/gadfly-reports"]
 CMD ["serve"]
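The timing claims in the commit message and the comments above (30s, a few seconds, ~1s) follow from when Docker runs the first probe. The snippet below is a deliberately simplified model of that scheduling, not Docker's implementation: `firstProbeDelay` and its field names are hypothetical, durations are in seconds, and it ignores any engine-side default for `start_interval`.

```javascript
// Simplified model: how long after container start the first health probe runs.
// Assumption: with an explicit start_interval and a non-zero start_period
// (Docker 25+), probes run every start_interval during the start period;
// otherwise the first probe runs one full interval after start.
function firstProbeDelay({ interval, startPeriod = 0, startInterval = null }) {
  const fastStart = startInterval !== null && startPeriod > 0;
  return fastStart ? startInterval : interval;
}

// The three configurations that appear in this commit.
const oldCompose = firstProbeDelay({ interval: 30 });
const image = firstProbeDelay({ interval: 5, startPeriod: 10 });
const newCompose = firstProbeDelay({ interval: 5, startPeriod: 10, startInterval: 1 });

console.log({ oldCompose, image, newCompose });
// prints { oldCompose: 30, image: 5, newCompose: 1 }
```

Under this model the image's own HEALTHCHECK reaches its first probe at about 5s, while the compose example with `start_interval: 1s` reaches it at about 1s, which matches the "few seconds" and "~1s" wording used in the two places.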
@@ -57,9 +57,14 @@ services:
 networks: [traefik]
 healthcheck:
 test: ["CMD", "wget", "-q", "-O", "-", "http://localhost:8090/healthz"]
-interval: 30s
-timeout: 5s
+# Fast probe so Traefik resumes routing within ~1s of a restart (the daemon
+# binds the port in milliseconds). Without a fast probe Traefik 502s until the
+# first check — the usual "why is it down for 30s after restart".
+interval: 5s
+timeout: 3s
 retries: 3
+start_period: 10s
+start_interval: 1s # probe every 1s during start_period (needs Docker 25+)
 labels:
 - "traefik.enable=true"
 - "traefik.http.routers.gadfly-reports.rule=Host(`reports.example.com`)"
@@ -83,6 +88,13 @@ Traefik bits to your setup — the **host** (`reports.example.com`), the **entry
 to the container's `:8090`. Then point `gadfly`'s `GADFLY_FINDINGS_URL` and `gadfly-mcp`'s
 `--store` at `https://reports.example.com` (with the same token).
 
+On `docker compose pull && docker compose up -d`, the fast healthcheck lets Traefik resume routing
+within ~1s (the daemon starts in milliseconds — Traefik just won't route to a container whose health
+probe hasn't passed yet, which is the "down for 30s after restart" gotcha). Your data lives on the
+`gadfly-reports-data` volume and survives restarts; the only loss exposure is a review POSTing
+findings during that ~1s window, since gadfly's emit is fire-and-forget (no retry) — negligible
+against reviews that take minutes.
+
 ## HTTP API (the canonical contract)
 
 | Method & path | Body / query | Purpose |
@@ -143,6 +155,10 @@ number of models that reported one is known, so a confirmed finding that **only
 The `solo` column counts those. This is derived from the data (reporter count); the grader never has
 to flag it. Set the bonus to `1` to disable.
 
+Its mirror, **solo-error penalty ×** (default `1.5`), multiplies the FP penalty when a false positive
+was made by **only that model** — a unique wrong claim is noisier than a shared mistake. So a
+Blocking-claimed solo FP costs `high(8) × -0.5 × 1.5 = -6` vs `-4` for a shared one. Set to `1` to disable.
+
 Auth: the `/ui` shell is public (it holds no data); paste the store token into its **connect** box,
 or open `/ui?token=<token>` once (remembered in `localStorage`). Prefer your own dashboard? Point
 Grafana/Metabase/etc. at the SQLite file or the same `/export` + `/scoreboard` + `/runs` JSON.
@@ -81,6 +81,7 @@
 <span class="small mut">critical</span><input type="number" id="p_critical" value="20">
 <span class="small mut" style="margin-left:18px">false-positive penalty ×</span><input type="number" id="fp_mult" value="-0.5" step="0.5" title="A false positive scores this × the severity the model CLAIMED (its lens verdict). e.g. a Blocking-claimed FP at -0.5 = high(8) × -0.5 = -4 pts.">
 <span class="small mut" style="margin-left:18px">solo-find bonus ×</span><input type="number" id="solo_bonus" value="1.5" step="0.5" min="1" title="A confirmed finding that NO other model reported scores this × its severity points — rewarding a model for catching what the swarm missed. 1 = no bonus.">
+<span class="small mut" style="margin-left:18px">solo-error penalty ×</span><input type="number" id="solo_err" value="1.5" step="0.5" min="1" title="A false positive that NO other model made (a unique wrong claim) multiplies its FP penalty by this — noisier than a shared mistake. 1 = no extra penalty.">
 </div>
 </div>
 </div>
@@ -168,6 +169,7 @@ function curve(){
 }
 function fpMult(){ const v = parseFloat(document.getElementById("fp_mult").value); return isNaN(v) ? 0 : v; }
 function soloBonus(){ const v = parseFloat(document.getElementById("solo_bonus").value); return isNaN(v) ? 1 : v; }
+function soloErr(){ const v = parseFloat(document.getElementById("solo_err").value); return isNaN(v) ? 1 : v; }
 // A false positive has no graded severity, so penalize it by the severity the
 // MODEL claimed — its lens verdict (raw_severity) — mapped onto the curve. The
 // louder the wrong cry, the bigger the penalty.
@@ -235,7 +237,7 @@ function aggregate(f){
 else { m.ungraded.add(r.finding_id); }
 }
 
-const fpm = fpMult(), sb = soloBonus();
+const fpm = fpMult(), sb = soloBonus(), se = soloErr();
 const out = [...M.values()].map(m => {
 const sevCounts = Object.fromEntries(SEVS.map(s=>[s,0]));
 let confirmedPoints = 0, solo = 0;
@@ -245,7 +247,7 @@ function aggregate(f){
 if (isSolo) solo++;
 confirmedPoints += (c[sevv] || 0) * (isSolo ? sb : 1);
 }
-let fpPen = 0; for (const k of m.fp.values()) fpPen += (c[k]||0) * fpm; // negative when fpm<0
+let fpPen = 0; for (const [fid, k] of m.fp){ const soloE = (reporters.get(fid)?.size || 1) === 1; fpPen += (c[k]||0) * fpm * (soloE ? se : 1); } // solo (unique) errors penalized extra
 const points = confirmedPoints + fpPen; // NET: solo-boosted confirmed + FP penalty
 const findings = m.findings.size, confirmed = m.confirmed.size;
 return { model:m.model, provider:m.provider, runs:m.runs, minutes:m.minutes,
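To check the arithmetic of the new `fpPen` loop against the README example (`high(8) × -0.5 × 1.5 = -6` vs `-4`), here is a standalone sketch. It is not the dashboard code: `fpPenalty`, `severityPoints` and the sample data are hypothetical stand-ins for `aggregate`'s internal state, and only `high = 8` and the two default multipliers come from the text above.

```javascript
// Hypothetical severity curve; only high = 8 is taken from the README example.
const severityPoints = { high: 8 };

// fp: Map of finding id -> severity the model claimed.
// reporters: Map of finding id -> Set of models that reported it.
function fpPenalty(fp, reporters, points, fpMult, soloErr) {
  let total = 0;
  for (const [fid, sev] of fp) {
    // A false positive is "solo" when exactly one model reported it.
    const isSoloError = (reporters.get(fid)?.size || 1) === 1;
    total += (points[sev] || 0) * fpMult * (isSoloError ? soloErr : 1);
  }
  return total;
}

// One solo false positive (f1) and one shared with another model (f2).
const fp = new Map([["f1", "high"], ["f2", "high"]]);
const reporters = new Map([
  ["f1", new Set(["model-a"])],
  ["f2", new Set(["model-a", "model-b"])],
]);

const penalty = fpPenalty(fp, reporters, severityPoints, -0.5, 1.5);
console.log(penalty); // prints -10  (-6 for the solo FP, -4 for the shared one)
```

Passing `1` as the last argument reproduces the pre-change behaviour, where both false positives cost `-4` each.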