feat: PR filter — compare models on the same set of PRs

UI: a repo#pr multi-select (labeled with how many models ran each PR) scopes the whole table (runs, minutes, findings, points) to the chosen PRs, so a model with 2 runs can be fairly compared against one with 60.

API: GET /scoreboard accepts ?repo= and ?pr= (repeatable or comma-list).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 22:55:43 -04:00
parent 2f003dd132
commit 1af115fdf1
6 changed files with 202 additions and 19 deletions
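
For reviewers, here is a minimal sketch of how a "repeatable or comma-list" parameter such as `?pr=10,11&pr=12` could be normalized into one flat list. It is not the code from this commit: the function name `parse_pr_filter` is hypothetical, and it assumes PR numbers are integers and that `repo` accepts the same repeatable/comma-list form as `pr`.

```python
from urllib.parse import parse_qs


def parse_pr_filter(query: str) -> tuple[list[str], list[int]]:
    """Return (repos, prs) from a raw query string.

    Hypothetical helper, not taken from the commit. Empty lists mean
    "no filter". Assumes PR numbers are integers.
    """
    params = parse_qs(query)  # repeated keys become lists of values

    def flatten(key: str) -> list[str]:
        # Accept both ?pr=10&pr=11 and ?pr=10,11 (or a mix of the two).
        values: list[str] = []
        for raw in params.get(key, []):
            values.extend(part.strip() for part in raw.split(",") if part.strip())
        return values

    repos = flatten("repo")
    prs = [int(p) for p in flatten("pr")]  # raises ValueError on non-numeric input
    return repos, prs


repos, prs = parse_pr_filter("repo=steve/x&pr=10,11&pr=12")
print(repos, prs)
```

A real handler would likely turn the `ValueError` from a non-numeric `pr` into a 4xx response rather than let it propagate; that choice is left open here.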
@@ -106,7 +106,7 @@ against reviews that take minutes.
 | `POST /findings/{id}/grade` | `{is_real, severity?, usefulness?, notes?, grader?}` | record a triage grade |
 | `GET /export` | — | flat report×finding×run×latest-grade rows — the dashboard feed |
 | `GET /runs` | — | list all runs (timing/tokens), oldest first |
-| `GET /scoreboard` | — | points-free per-model rollup |
+| `GET /scoreboard` | `?repo=<repo>` `&pr=<n>` (repeatable or comma-list, e.g. `?pr=10,11`) | points-free per-model rollup, optionally narrowed to specific PRs so models are compared on the same work |

 `POST /runs` body: `{run_id, repo, pr, model, provider, lenses, duration_secs, input_tokens?, output_tokens?, cost_usd?}`
 (re-posting the same `run_id` updates it).
@@ -138,6 +138,11 @@ ungraded, points, **points-per-minute**, points-per-run, by-severity — with **
 (date range, repo, provider, model, lens, grade/severity), free-text search, and a click-to-scope
 findings detail table.

+Comparisons can be scoped to **specific PRs**: a multi-select lists every `repo#pr` with how many
+models ran it (`steve/x#12 · 3/5 models`) — pick the PRs you want and the entire table (runs,
+minutes, findings, points) counts only those, so a model with 2 runs can be compared against one
+with 60 on exactly the work you choose.
+
 True to the store's "no points" rule, **scoring lives in the browser**: the page has an editable
 points curve (default `trivial=1, small=3, medium=5, high=8, critical=20`) and computes
 `points = Σ weight[severity]·count` and `value/min = points / minutes` on the fly — retune it without