gadfly-reports

steve/gadfly-reports

Commit Graph

Author

SHA1

Message

Date

steve

c15f860853

feat(ui): solo-find bonus — reward a model for catching what others missed

Build & push image / build-and-push (push) Successful in 20s

CI / test (push) Successful in 10m20s

Adds an editable 'solo-find bonus ×' (default 1.5). A confirmed finding reported by exactly one model (derived from the global reporter count per content-addressed finding — no grader flagging needed) scores severity × bonus. New 'solo' column counts uniquely-caught confirmed findings. Solo-ness is computed over ALL data so the model filter can't fake it. Client-side only; store stays point-free.
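The solo-find rule described in this commit could be sketched roughly as below. This is an illustration, not the dashboard's actual code: the field names (`id`, `model`, `grade`, `severity`), the `POINTS` curve values, and both function names are assumptions. The one detail taken from the message is that reporter counts are derived from all findings, before any model filter is applied.

```javascript
// Hypothetical points curve; the real dashboard lets the user edit these values.
const POINTS = { high: 8, medium: 4, small: 1 };

// Count distinct reporting models per content-addressed finding id.
// Must be called with ALL findings (unfiltered) so a model filter
// cannot make a shared finding look solo.
function reporterCounts(allFindings) {
  const models = new Map();
  for (const f of allFindings) {
    if (!models.has(f.id)) models.set(f.id, new Set());
    models.get(f.id).add(f.model);
  }
  return new Map([...models].map(([id, set]) => [id, set.size]));
}

// Points for one finding: severity points, multiplied by the bonus
// when exactly one model reported it. Unconfirmed findings score 0 here.
function confirmedPoints(finding, counts, soloBonus = 1.5) {
  if (finding.grade !== "confirmed") return 0;
  const base = POINTS[finding.severity] ?? 0;
  return counts.get(finding.id) === 1 ? base * soloBonus : base;
}
```

Under these assumed curve values, a confirmed `high` finding reported by a single model would score 8 × 1.5 = 12, while the same finding reported by two models would score 8 for each.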

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-27 12:24:29 -04:00

steve

0cb6b25f11

feat(ui): false-positive penalty (severity-scaled, default -0.5)

Build & push image / build-and-push (push) Successful in 20s

CI / test (push) Successful in 10m24s

Adds an editable 'false-positive penalty ×' to the dashboard. A false positive carries no graded severity, so it's penalized by the severity the model CLAIMED (its lens verdict / raw_severity, mapped onto the curve: Blocking->high, Minor->small). points(net) = confirmed points + Σ penalty×points[claimed], so a model with a few good finds but many false positives nets down — even negative — and sorts to the bottom. Adds an 'fp pen' column; net points/pts-min/pts-run shown red when negative. Client-side only; the store stays point-free.
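A minimal sketch of the net-points formula above, points(net) = confirmed points + Σ penalty × points[claimed]. The `CURVE` values, the `grade` strings, and the function name `netPoints` are assumptions made for illustration; only the Blocking->high and Minor->small mappings and the -0.5 default come from the commit message.

```javascript
// Hypothetical points curve (editable in the real dashboard).
const CURVE = { high: 8, medium: 4, small: 1 };

// Mapping from the model's claimed verdict onto the curve; only these
// two entries are named in the commit message.
const CLAIM_TO_SEVERITY = { Blocking: "high", Minor: "small" };

// Net points for one model's findings: confirmed findings add their
// graded severity points, false positives add penalty * claimed points.
function netPoints(findings, fpPenalty = -0.5) {
  let confirmed = 0;
  let fpPen = 0;
  for (const f of findings) {
    if (f.grade === "confirmed") {
      confirmed += CURVE[f.severity] ?? 0;
    } else if (f.grade === "false_positive") {
      const claimed = CLAIM_TO_SEVERITY[f.raw_severity];
      fpPen += fpPenalty * (CURVE[claimed] ?? 0);
    }
  }
  return { confirmed, fpPen, net: confirmed + fpPen };
}
```

With these assumed values, one confirmed `small` finding (1 point) and three false positives claimed as Blocking (3 × -0.5 × 8 = -12) would net -11, which is the "nets down, even negative" case the message describes.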

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-27 09:50:18 -04:00

steve

35ebc53561

feat: built-in read-only dashboard at /ui + GET /runs

Build & push image / build-and-push (push) Successful in 26s

CI / test (push) Successful in 10m24s

Serves a self-contained vanilla-JS dashboard (embedded via go:embed): a per-model performance table — runs, minutes, findings, confirmed/false-positive/ungraded, points, points-per-minute, points-per-run, by-severity — with drill-down filters (date range, repo, provider, model, lens, grade/severity), free-text search, and a click-to-scope findings detail table.

Scoring stays client-side: the page has an editable points curve and computes points + value-per-minute in the browser, so the store remains point-free. Adds GET /runs (lists all runs, incl. zero-finding ones) so minutes/runs are filterable. The /ui shell is public (carries no data); data endpoints stay token-gated and the JS sends the token.
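To illustrate why listing zero-finding runs matters for the per-minute and per-run columns, here is a rough sketch of the aggregation. The run fields (`model`, `minutes`), the `pointsByModel` input, and the function name are all hypothetical; the shape of the real GET /runs response is not shown in the commit message.

```javascript
// Aggregate runs per model, then derive points-per-minute and
// points-per-run. Runs with no findings still add to runs and minutes,
// so they lower a model's rates instead of disappearing.
function perModelRates(runs, pointsByModel) {
  const rows = new Map();
  for (const r of runs) {
    const row = rows.get(r.model) ?? { runs: 0, minutes: 0 };
    row.runs += 1;
    row.minutes += r.minutes;
    rows.set(r.model, row);
  }
  return [...rows].map(([model, row]) => {
    const points = pointsByModel[model] ?? 0;
    return {
      model,
      runs: row.runs,
      minutes: row.minutes,
      points,
      // Guard against division by zero for runs with no recorded time.
      ptsPerMin: row.minutes > 0 ? points / row.minutes : 0,
      ptsPerRun: points / row.runs,
    };
  });
}
```

Because the points are passed in rather than stored, this keeps the same split the message describes: the store holds runs and findings, and the browser applies whatever curve the user has set.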

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-27 00:22:39 -04:00

3 Commits