feat: solo-error penalty + fast healthcheck (instant Traefik restart)

Dashboard: add an editable 'solo-error penalty ×' (default 1.5). When a false positive was made by only one model (a unique wrong claim, derived from reporter count), its FP penalty is multiplied by this factor, mirroring the solo-find bonus. Client-side; store stays point-free.

Deploy: speed up the healthcheck (image HEALTHCHECK + compose example: interval 30s->5s, start_period 10s, start_interval 1s). Traefik gates routing on the Docker health status, so the old 30s-to-first-probe meant ~30s of 502s after a restart; the daemon binds the port in ms, so it now goes healthy in ~1s. Data is on the volume; only fire-and-forget emits in the ~1s window are at risk.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -57,9 +57,14 @@ services:
     networks: [traefik]
     healthcheck:
       test: ["CMD", "wget", "-q", "-O", "-", "http://localhost:8090/healthz"]
-      interval: 30s
-      timeout: 5s
+      # Fast probe so Traefik resumes routing within ~1s of a restart (the daemon
+      # binds the port in milliseconds). Without a fast probe Traefik 502s until the
+      # first check: the usual "why is it down for 30s after restart".
+      interval: 5s
+      timeout: 3s
       retries: 3
+      start_period: 10s
+      start_interval: 1s  # probe every 1s during start_period (needs Docker 25+)
     labels:
       - "traefik.enable=true"
       - "traefik.http.routers.gadfly-reports.rule=Host(`reports.example.com`)"
@@ -83,6 +88,13 @@ Traefik bits to your setup — the **host** (`reports.example.com`), the **entry
 to the container's `:8090`. Then point `gadfly`'s `GADFLY_FINDINGS_URL` and `gadfly-mcp`'s
 `--store` at `https://reports.example.com` (with the same token).
 
+On `docker compose pull && docker compose up -d`, the fast healthcheck lets Traefik resume routing
+within ~1s. The daemon starts in milliseconds, but Traefik won't route to a container whose health
+probe hasn't passed yet, which is the "down for 30s after restart" gotcha. Your data lives on the
+`gadfly-reports-data` volume and survives restarts; the only loss exposure is a review POSTing
+findings during that ~1s window, since gadfly's emit is fire-and-forget (no retry). That is negligible
+against reviews that take minutes.
+
 ## HTTP API (the canonical contract)
 
 | Method & path | Body / query | Purpose |
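The restart gap described in this hunk comes down to when Docker runs the first health probe. A rough Python sketch of that reasoning follows. It assumes the first probe fires one probe-interval after container start (`start_interval` while inside `start_period`, otherwise `interval`) and that the daemon is already listening by then. `first_probe_delay` is a hypothetical helper for illustration, not part of gadfly, Docker, or Traefik.

```python
from typing import Optional


def first_probe_delay(interval_s: float,
                      start_period_s: float = 0.0,
                      start_interval_s: Optional[float] = None) -> float:
    """Approximate seconds from container start to the first health probe.

    Simplified model (an assumption, not Docker's exact scheduler): while
    inside start_period, probes are spaced by start_interval if it is set;
    otherwise the regular interval applies.
    """
    if start_interval_s is not None and start_period_s > 0:
        return start_interval_s
    return interval_s


# Old compose example: interval 30s, no start_interval.
old = first_probe_delay(interval_s=30)
# New compose example: interval 5s, start_period 10s, start_interval 1s.
new = first_probe_delay(interval_s=5, start_period_s=10, start_interval_s=1)
print(f"old: ~{old:.0f}s before Traefik routes, new: ~{new:.0f}s")
```

Under this model the unroutable window shrinks from about 30s to about 1s, which matches the figures quoted in the commit message.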
@@ -143,6 +155,10 @@ number of models that reported one is known, so a confirmed finding that **only
 The `solo` column counts those. This is derived from the data (reporter count); the grader never has
 to flag it. Set the bonus to `1` to disable.
 
+Its mirror, **solo-error penalty ×** (default `1.5`), multiplies the FP penalty when a false positive
+was made by **only that model**: a unique wrong claim is noisier than a shared mistake. So a
+Blocking-claimed solo FP costs `high(8) × -0.5 × 1.5 = -6` vs `-4` for a shared one. Set to `1` to disable.
+
 Auth: the `/ui` shell is public (it holds no data); paste the store token into its **connect** box,
 or open `/ui?token=<token>` once (remembered in `localStorage`). Prefer your own dashboard? Point
 Grafana/Metabase/etc. at the SQLite file or the same `/export` + `/scoreboard` + `/runs` JSON.
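The solo-error arithmetic in this hunk can be checked with a few lines of Python. This is only an illustration: the dashboard computes scores client-side, and the function and parameter names here are hypothetical. The severity value (`high` = 8), the FP multiplier (`-0.5`), and the default penalty (`1.5`) are taken from the worked example in the README text.

```python
def fp_penalty(severity_points: float,
               reporters: int,
               fp_multiplier: float = -0.5,
               solo_error_penalty: float = 1.5) -> float:
    """Score for a false positive claimed at the given severity.

    A false positive counts as "solo" when exactly one model reported it;
    only then is the solo-error penalty applied. A penalty of 1 disables it.
    """
    score = severity_points * fp_multiplier
    if reporters == 1:
        score *= solo_error_penalty
    return score


print(fp_penalty(8, reporters=1))  # solo FP at high severity: -6.0
print(fp_penalty(8, reporters=3))  # shared FP at high severity: -4.0
print(fp_penalty(8, reporters=1, solo_error_penalty=1))  # penalty disabled: -4.0
```

Deriving "solo" from the reporter count keeps the grader out of the loop, in the same way the README describes for the solo-find bonus.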