Commit Graph

2 Commits

Author SHA1 Message Date
steve 1af115fdf1 feat: PR filter — compare models on the same set of PRs
Build & push image / build-and-push (push) Successful in 13s
CI / test (push) Successful in 9m51s
UI: a repo#pr multi-select (labeled with how many models ran each PR)
scopes the whole table — runs, minutes, findings, points — to the chosen
PRs, so a model with 2 runs can be fairly compared against one with 60.
API: GET /scoreboard accepts ?repo= and ?pr= (repeatable or comma-list).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 22:56:49 -04:00
steve ddcf42a3ce feat: gadfly-reports — findings store + scoreboard daemon
Build & push image / build-and-push (push) Successful in 1m13s
CI / test (push) Successful in 10m39s
SQLite-backed HTTP store for Gadfly review findings, per-review run timings, and human/Claude grades, with a points-free per-model scoreboard. Pure fact store: it computes no points or rankings (the dashboard maps severity->points client-side and retunes without re-scoring). Findings are content-addressed by location so cross-model reports collapse for consensus; one grade per finding, latest wins. Pure-Go SQLite (CGO-free) + Docker image CI + tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 23:55:24 -04:00