Invert the PR scope from opt-in to exclusion: untick a PR to drop it
from the comparison. The excluded set persists in localStorage, and new
PRs are included automatically as they arrive. The list is now reverse
chronological (most recent run/report first) with each PR's date shown,
the footer states the total PR count so you can confirm at a glance that
nothing was truncated, and the scrollable list gets min-height:0 so it
shrinks and scrolls within the modal instead of overflowing it.
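A minimal sketch of the exclusion model, assuming PRs arrive as { repo, pr, lastRun } objects with an ISO lastRun date and that the set is stored as a JSON array under a hypothetical key "grt-excluded-prs" (the message only says localStorage). Storing what is excluded, rather than what is selected, is what lets PRs that first appear later default to included.

```javascript
// Hypothetical storage key: the commit only says the set lives in localStorage.
const EXCLUDED_KEY = "grt-excluded-prs";

// A PR is identified by its repo#pr label.
function prKey(p) {
  return `${p.repo}#${p.pr}`;
}

// `storage` is anything with the Web Storage getItem/setItem methods:
// window.localStorage in the browser, a small stub elsewhere.
function loadExcluded(storage) {
  try {
    return new Set(JSON.parse(storage.getItem(EXCLUDED_KEY) || "[]"));
  } catch (err) {
    return new Set(); // unreadable entry: treat as "nothing excluded"
  }
}

function saveExcluded(storage, excluded) {
  storage.setItem(EXCLUDED_KEY, JSON.stringify([...excluded]));
}

// Unticking a checkbox adds the PR to the excluded set; ticking removes it.
function setIncluded(excluded, key, included) {
  if (included) excluded.delete(key);
  else excluded.add(key);
}

// Anything not explicitly excluded is in scope, so PRs that first appear
// after the set was saved are included automatically. Newest lastRun first.
function listPRs(allPRs, excluded) {
  return allPRs
    .map((p) => ({ ...p, included: !excluded.has(prKey(p)) }))
    .sort((a, b) => Date.parse(b.lastRun) - Date.parse(a.lastRun));
}

// Footer text (wording assumed): the total makes truncation easy to rule out.
function footerText(rows) {
  const n = rows.filter((r) => r.included).length;
  return `${n} of ${rows.length} PRs included`;
}
```

In the browser, loadExcluded(window.localStorage) on page load and saveExcluded after every checkbox change would be enough to survive reloads.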
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replace the cramped PR multi-select with a modal: every repo#pr as a
checkbox (showing its model coverage), a search box, and all/none buttons
that act only on the current search results. The model hider moves to the
same popup style, replacing the per-row × and the hidden-chips bar; both
pickers open from buttons in the filter row that show their current state.
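The search-scoped all/none behavior can be sketched as below, assuming the selection is a Set of repo#pr labels and that model coverage is available as a Map from label to a Set of model names. The label formats ("acme/api#12 (2 models)", "PRs: 1/3") are illustrative, not taken from the page.

```javascript
// Case-insensitive substring match on repo#pr labels; an empty query
// matches everything.
function matchPRs(keys, query) {
  const q = query.trim().toLowerCase();
  return q ? keys.filter((k) => k.toLowerCase().includes(q)) : keys.slice();
}

// "all" / "none" only touch PRs matching the current search; PRs hidden by
// the search keep whatever state they had.
function applyToMatches(selected, keys, query, select) {
  for (const k of matchPRs(keys, query)) {
    if (select) selected.add(k);
    else selected.delete(k);
  }
  return selected;
}

// Checkbox label with model coverage (format assumed).
function checkboxLabel(key, modelsByPR) {
  const n = (modelsByPR.get(key) || new Set()).size;
  return `${key} (${n} model${n === 1 ? "" : "s"})`;
}

// Text for the filter-row button, so the current state is visible without
// opening the modal (format assumed).
function pickerLabel(selected, total) {
  return selected.size === total ? "PRs: all" : `PRs: ${selected.size}/${total}`;
}
```

Keeping all/none tied to the search results means "search, then none" removes exactly the matching PRs without disturbing the rest of the selection.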
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
UI: a repo#pr multi-select (each PR labeled with how many models ran it)
scopes the whole table (runs, minutes, findings, points) to the chosen
PRs, so a model with 2 runs can be compared fairly against one with 60 on
the same PRs.
API: GET /scoreboard accepts ?repo= and ?pr= (repeatable or comma-separated).
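The server side is Go, so the sketch below only illustrates the parameter rule from the client's point of view: repeated and comma-separated values are treated as one list, and per-model totals are restricted to the chosen PRs. multiParam, scoreboardQuery and scopedTotals are hypothetical helpers, and runs are assumed to look like { model, repo, pr, minutes }.

```javascript
// Read a query parameter that may be repeated and/or comma-separated, so
// ?pr=12&pr=40 and ?pr=12,40 yield the same list.
function multiParam(params, name) {
  return params
    .getAll(name)
    .flatMap((v) => v.split(","))
    .map((v) => v.trim())
    .filter((v) => v !== "");
}

// Build the scoreboard query string from the picker's selection.
function scoreboardQuery(repos, prs) {
  const params = new URLSearchParams();
  for (const r of repos) params.append("repo", r);
  for (const p of prs) params.append("pr", String(p));
  return params.toString();
}

// Per-model totals restricted to the chosen repo#pr keys, so every model is
// measured over the same set of PRs.
function scopedTotals(runs, prKeys) {
  const keep = new Set(prKeys);
  const totals = new Map();
  for (const r of runs) {
    if (!keep.has(`${r.repo}#${r.pr}`)) continue;
    const t = totals.get(r.model) || { runs: 0, minutes: 0 };
    t.runs += 1;
    t.minutes += r.minutes;
    totals.set(r.model, t);
  }
  return totals;
}
```

A request would then look like fetching "/scoreboard?" + scoreboardQuery(["acme/api"], [12, 40]); URLSearchParams percent-encodes the slash in the repo name.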
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Each scoreboard row gets a × to hide that model, e.g. retired models (m1
etc.) you no longer want in the view. Hidden models drop out of the
table, the totals, and the findings drill-down; the set persists in
localStorage (grt-hidden) across reloads, and a "hidden (N): …" bar shows
click-to-restore chips plus a "show all" button.
Solo-ness is still computed against ALL models (hiding is a view filter,
not a rescoring), so hiding one model never makes another's shared finds
look solo.
Updates the README Dashboard section.
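The ordering that keeps hiding honest (count reporters over all findings first, then drop hidden models) can be sketched as follows. The grt-hidden key comes from the message; storing it as a JSON array, the finding shape { id, model } with a content-addressed id, and the exact bar text are assumptions.

```javascript
const HIDDEN_KEY = "grt-hidden"; // localStorage key named in the commit

// `storage` has the Web Storage getItem/setItem shape (localStorage in the browser).
function loadHidden(storage) {
  try {
    return new Set(JSON.parse(storage.getItem(HIDDEN_KEY) || "[]"));
  } catch (err) {
    return new Set(); // unreadable entry: nothing hidden
  }
}

function saveHidden(storage, hidden) {
  storage.setItem(HIDDEN_KEY, JSON.stringify([...hidden]));
}

// "show all": clear the persisted set.
function showAll(storage) {
  saveHidden(storage, new Set());
  return new Set();
}

// Text for the restore bar (format assumed).
function hiddenBarText(hidden) {
  return `hidden (${hidden.size}): ${[...hidden].join(", ")}`;
}

// Solo-ness first, over ALL findings: count distinct models per finding id.
// Only afterwards are hidden models dropped from what the table shows.
function visibleSoloFinds(findings, hidden) {
  const reporters = new Map();
  for (const f of findings) {
    if (!reporters.has(f.id)) reporters.set(f.id, new Set());
    reporters.get(f.id).add(f.model);
  }
  const solo = new Map(); // model -> Set of finding ids only that model reported
  for (const f of findings) {
    if (hidden.has(f.model)) continue; // view filter, not a rescoring
    if (!solo.has(f.model)) solo.set(f.model, new Set());
    if (reporters.get(f.id).size === 1) solo.get(f.model).add(f.id);
  }
  return solo;
}
```

If the reporter count were taken after filtering, hiding m1 would turn a finding shared by m1 and m2 into a solo find for m2; counting first avoids that.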
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Dashboard: add an editable 'solo-error penalty ×' (default 1.5): a false positive that only one model reported (a unique wrong claim, derived from the reporter count) has its FP penalty multiplied by this factor, mirroring the solo-find bonus. Client-side only; the store stays point-free.
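A sketch of how the two multipliers might compose for a single false positive, assuming the penalty is the claimed severity's points times the false-positive penalty ×, multiplied again by the solo-error penalty × when the reporter count is 1. The CURVE values are hypothetical; on the page the curve is editable.

```javascript
// Hypothetical points curve; the real one is editable on the page.
const CURVE = { small: 1, medium: 3, high: 8 };

// Penalty for one false positive: claimed-severity points × fpPenaltyX,
// and × soloErrorX on top when no other model made the same wrong claim.
function fpPenalty(claimedSeverity, reporterCount, fpPenaltyX, soloErrorX, curve = CURVE) {
  const base = (curve[claimedSeverity] || 0) * fpPenaltyX;
  return reporterCount === 1 ? base * soloErrorX : base;
}
```

With the defaults shown, a wrong "high" claim made by one model costs 8 × 1 × 1.5 = 12 points, while the same claim shared by two models costs 8.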
Deploy: speed up the healthcheck (image HEALTHCHECK and compose example: interval 30s -> 5s, start_period 10s, start_interval 1s). Traefik only routes to a container once Docker reports it healthy, so the old 30s wait before the first probe meant ~30s of 502s after every restart; the daemon binds its port within milliseconds, so it now goes healthy in ~1s. Data lives on the volume, so the only things at risk are fire-and-forget emits sent during that ~1s window.
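The ~30s to ~1s arithmetic can be checked with a deliberately simplified model of the probe schedule. It assumes that, inside start_period with start_interval set, probes fire every start_interval and otherwise every interval; that a probe is instantaneous; and that one passing probe marks the container healthy. It is an illustration, not Docker's actual scheduler.

```javascript
// Simplified model (an assumption, not Docker's exact scheduler). All times
// are seconds since container start; `readyAt` is when the daemon listens.
function firstHealthyAt({ interval, startPeriod = 0, startInterval = 0 }, readyAt) {
  if (!(interval > 0)) throw new Error("interval must be positive");
  let t = 0;
  for (;;) {
    // Fast probes only while still inside the start period.
    const step = startInterval > 0 && t < startPeriod ? startInterval : interval;
    t += step;
    if (t >= readyAt) return t; // first passing probe: Traefik can route from here
  }
}
```

With the daemon ready at 0.05s, the old settings give firstHealthyAt({ interval: 30 }, 0.05) = 30 and the new ones give firstHealthyAt({ interval: 5, startPeriod: 10, startInterval: 1 }, 0.05) = 1, which is the 502 window described above.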
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds an editable 'solo-find bonus ×' (default 1.5). A confirmed finding reported by exactly one model (derived from the global reporter count per content-addressed finding, so no grader flagging is needed) scores its severity points × bonus. A new 'solo' column counts uniquely caught confirmed findings. Solo-ness is computed over ALL data, so the model filter can't fake it. Client-side only; the store stays point-free.
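A minimal sketch of the solo-find scoring, assuming findings shaped like { id, model, grade, severity } where id is the content-addressed key and grade "confirmed" marks a confirmed finding (both field values are assumptions). Reporter counts are built from every report, confirmed or not, before any filtering.

```javascript
// Points for confirmed findings, with the solo bonus applied to findings
// that exactly one model reported. Returns model -> { points, solo }.
function scoreConfirmed(findings, curve, soloBonus) {
  // Distinct models per finding id, over ALL findings (no model filter yet).
  const reporters = new Map();
  for (const f of findings) {
    if (!reporters.has(f.id)) reporters.set(f.id, new Set());
    reporters.get(f.id).add(f.model);
  }
  const byModel = new Map();
  for (const f of findings) {
    if (f.grade !== "confirmed") continue; // only confirmed reports score here
    const row = byModel.get(f.model) || { points: 0, solo: 0 };
    const isSolo = reporters.get(f.id).size === 1;
    row.points += (curve[f.severity] || 0) * (isSolo ? soloBonus : 1);
    if (isSolo) row.solo += 1; // feeds the 'solo' column
    byModel.set(f.model, row);
  }
  return byModel;
}
```

With a hypothetical curve { small: 1, medium: 3, high: 8 }, a model with one shared high and one solo medium finding scores 8 + 3 × 1.5 = 12.5 and shows solo = 1.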
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds an editable 'false-positive penalty ×' to the dashboard. A false positive has no graded severity, so it is penalized by the severity the model CLAIMED (its lens verdict / raw_severity, mapped onto the curve: Blocking -> high, Minor -> small). points(net) = confirmed points − penalty × Σ points[claimed], so a model with a few good finds but many false positives nets down, even below zero, and sorts to the bottom. Adds an 'fp pen' column; net points, pts-min and pts-run are shown in red when negative. Client-side only; the store stays point-free.
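The net-points rule can be sketched as follows, assuming the penalty is subtracted (the "nets down" behavior) and that false positives carry a rawSeverity field. Only the Blocking and Minor mappings come from the message; falling back to the lowercased label for anything else is an assumption, as is the "neg" CSS class name.

```javascript
// Claimed-severity mapping from the commit; other labels are assumed to
// already name a curve key once lowercased.
const CLAIMED_TO_CURVE = { Blocking: "high", Minor: "small" };

function claimedCurveKey(rawSeverity) {
  return CLAIMED_TO_CURVE[rawSeverity] || String(rawSeverity).toLowerCase();
}

// fpPen feeds the 'fp pen' column; net can go negative.
function netPoints(confirmedPoints, falsePositives, curve, penalty) {
  let fpPen = 0;
  for (const fp of falsePositives) {
    fpPen += penalty * (curve[claimedCurveKey(fp.rawSeverity)] || 0);
  }
  return { fpPen, net: confirmedPoints - fpPen };
}

// Hypothetical class used to render negative values in red.
function cssClassFor(value) {
  return value < 0 ? "neg" : "";
}
```

For example, 4 confirmed points against two claimed Blocking and one claimed Minor false positive (hypothetical curve high = 8, small = 1, penalty 1) gives fp pen 17 and net −13, so the row sorts to the bottom.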
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Serves a self-contained vanilla-JS dashboard (embedded via go:embed): a per-model performance table (runs, minutes, findings, confirmed/false-positive/ungraded counts, points, points-per-minute, points-per-run, and a by-severity breakdown) with drill-down filters (date range, repo, provider, model, lens, grade/severity), free-text search, and a click-to-scope findings detail table.
Scoring stays client-side: the page has an editable points curve and computes points and value per minute in the browser, so the store remains point-free. Adds GET /runs (lists all runs, including zero-finding ones) so run and minute counts cover every run and respond to the filters. The /ui shell is public (it carries no data); data endpoints stay token-gated, and the JS sends the token.
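A sketch of the client-side aggregation, assuming /runs yields { model, minutes } records and findings carry { model, grade, severity }; the field names, the "confirmed" grade value and the Bearer header used by getJSON are assumptions, since the message only says the JS sends the token. Counting runs from /runs rather than from findings is what keeps zero-finding runs in the runs and minutes columns.

```javascript
// Fetch a token-gated endpoint (header scheme assumed).
async function getJSON(path, token) {
  const res = await fetch(path, { headers: { Authorization: `Bearer ${token}` } });
  if (!res.ok) throw new Error(`${path}: HTTP ${res.status}`);
  return res.json();
}

// Per-model table rows, highest points first.
function scoreboard(runs, findings, curve) {
  const rows = new Map();
  const row = (model) => {
    if (!rows.has(model)) rows.set(model, { model, runs: 0, minutes: 0, findings: 0, points: 0 });
    return rows.get(model);
  };
  // Runs come from /runs, so models with zero findings still get a row.
  for (const r of runs) {
    const x = row(r.model);
    x.runs += 1;
    x.minutes += r.minutes;
  }
  for (const f of findings) {
    const x = row(f.model);
    x.findings += 1;
    if (f.grade === "confirmed") x.points += curve[f.severity] || 0;
  }
  for (const x of rows.values()) {
    x.ptsPerMin = x.minutes > 0 ? x.points / x.minutes : 0;
    x.ptsPerRun = x.runs > 0 ? x.points / x.runs : 0;
  }
  return [...rows.values()].sort((a, b) => b.points - a.points);
}
```

Because the curve is just an argument, editing it on the page only means re-running scoreboard; nothing about points has to be stored server-side.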
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>