🪰📋 gadfly-reports
A small durable store + scoreboard for Gadfly review findings. Gadfly (and any CI) POST each model's findings and per-review timing here; a human or Claude — via gadfly-mcp — later grades each finding. It's a single Go binary backed by SQLite, speaking a tiny HTTP API.
🤖 Heads up: this is a vibe-coded project
gadfly-reports was built almost entirely by an AI agent (Claude Code) — the design, the code, and these docs. It's small and it's tested, but treat it accordingly: it's a homelab-grade service, not a hardened product, and there may be the occasional AI-flavored rough edge. Issues and PRs welcome.
What it stores — and what it deliberately doesn't
gadfly-reports is a pure fact store:
- `runs`: one per model's review of a PR, holding wall-clock duration, lens count, and optional token counts and cost.
- `findings`: content-addressed by location (`repo` + `pr` + `lens` + `file` + `line`), so the same issue raised by several models collapses to one finding with many reports. That collapse is what makes cross-model consensus and per-model precision measurable.
- `grades`: a triage verdict per finding, with `is_real`, `severity` (one of `trivial`, `small`, `medium`, `high`, `critical`), optional `usefulness` (1–5), `notes`, and `grader`. Grade history is kept; the latest grade wins.
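Content addressing means a finding's identity is derived from its location, not from who reported it. The sketch below illustrates the idea only: the real ID format gadfly-reports uses is not documented here, so `findingKey`, the `Location` type, and the SHA-256-over-NUL-separated-fields encoding are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// Location holds the five fields that identify a finding.
type Location struct {
	Repo string
	PR   int
	Lens string
	File string
	Line int
}

// findingKey is a HYPOTHETICAL content address: SHA-256 over the
// NUL-separated location fields. It is not gadfly-reports' actual ID scheme.
func findingKey(l Location) string {
	parts := []string{l.Repo, strconv.Itoa(l.PR), l.Lens, l.File, strconv.Itoa(l.Line)}
	sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	loc := Location{Repo: "steve/example", PR: 42, Lens: "security", File: "main.go", Line: 17}
	// Two models reporting the same location produce the same key,
	// so their reports collapse onto a single finding.
	fromModelA, fromModelB := findingKey(loc), findingKey(loc)
	fmt.Println(fromModelA == fromModelB) // true
}
```

The NUL separator keeps adjacent fields from running together (e.g. file `a` + lens `bc` vs. file `ab` + lens `c`), which a plain concatenation would conflate.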
It stores no points and computes no rankings. Mapping severity → points and ranking models by "value per minute" (or per token) is a client/dashboard concern, so you can retune the curve any time without migrating or re-scoring stored data.
Run it
```sh
# from source
go run gitea.stevedudenhoeffer.com/steve/gadfly-reports@latest serve

# or Docker (image published by CI on every push to main)
docker run -d --name gadfly-reports -p 8090:8090 -v gadfly-reports-data:/data \
  -e GADFLY_REPORTS_TOKEN=change-me \
  gitea.stevedudenhoeffer.com/steve/gadfly-reports:latest
```
HTTP API (the canonical contract)
| Method & path | Body / query | Purpose |
|---|---|---|
| `GET /healthz` | (none) | liveness (open even when a token is set) |
| `POST /runs` | one run object | upsert a model's review of a PR (timing/tokens) |
| `POST /reports` | JSON array of report objects | record findings and which model reported each |
| `POST /findings/{id}/grade` | `{is_real, severity?, usefulness?, notes?, grader?}` | record a triage grade |
| `GET /export` | (none) | flat report × finding × run × latest-grade rows; the dashboard feed |
| `GET /scoreboard` | (none) | points-free per-model rollup |
`POST /runs` body: `{run_id, repo, pr, model, provider, lenses, duration_secs, input_tokens?, output_tokens?, cost_usd?}`. Re-posting the same `run_id` updates the existing run.

`POST /reports` array element: `{repo, pr, lens, file, line, title, model, provider, run_id, raw_severity, detail}`.

`GET /scoreboard` element: `{model, provider, runs, minutes, input_tokens, output_tokens, findings, confirmed, false_positive, ungraded, by_severity: {severity: count}}`.
If `GADFLY_REPORTS_TOKEN` is set, every route except `/healthz` requires an `Authorization: Bearer <token>` header.
Configuration
| Env | Default | Meaning |
|---|---|---|
| `GADFLY_REPORTS_ADDR` | `:8090` | listen address |
| `GADFLY_REPORTS_DB` | `gadfly-reports.db` (`/data/gadfly-reports.db` in Docker) | SQLite path |
| `GADFLY_REPORTS_TOKEN` | (empty) | bearer token callers must present (empty = open) |
CLI flags `--addr`, `--db`, and `--token` override the corresponding environment variables.
Dashboards
Point anything at the JSON endpoints (or open the SQLite file read-only). `GET /export` is the flat feed; `GET /scoreboard` is the per-model rollup. Compute points and value per minute in the dashboard, e.g. with a curve like `trivial=1, small=3, medium=5, high=8, critical=20`:

`points = Σ weight[severity] · by_severity[severity]`, `value/min = points / minutes`
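The formula can be applied directly to decoded `GET /scoreboard` rows. A minimal sketch, assuming `minutes` decodes as a number and that `by_severity` holds integer counts; which findings `by_severity` counts is not specified here, and the sample row is invented. `ScoreRow`, `points`, and `valuePerMinute` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScoreRow decodes the GET /scoreboard fields this sketch needs.
type ScoreRow struct {
	Model      string         `json:"model"`
	Provider   string         `json:"provider"`
	Minutes    float64        `json:"minutes"`
	BySeverity map[string]int `json:"by_severity"`
}

// weights is the example curve; retune freely, since the store keeps no points.
var weights = map[string]float64{
	"trivial": 1, "small": 3, "medium": 5, "high": 8, "critical": 20,
}

// points sums weight[severity] * count; severities missing from weights score 0.
func points(r ScoreRow) float64 {
	var p float64
	for sev, n := range r.BySeverity {
		p += weights[sev] * float64(n)
	}
	return p
}

// valuePerMinute returns 0 when no time was recorded, avoiding a divide by zero.
func valuePerMinute(r ScoreRow) float64 {
	if r.Minutes <= 0 {
		return 0
	}
	return points(r) / r.Minutes
}

func main() {
	raw := `[{"model":"example-model","provider":"example","minutes":4,
	          "by_severity":{"small":2,"high":1,"critical":1}}]`
	var rows []ScoreRow
	if err := json.Unmarshal([]byte(raw), &rows); err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Printf("%s: %.0f points, %.2f/min\n", r.Model, points(r), valuePerMinute(r))
	}
	// example-model: 34 points, 8.50/min
}
```

Here 2 small (2 × 3) + 1 high (8) + 1 critical (20) gives 34 points over 4 minutes, or 8.5 per minute. Changing `weights` re-ranks models instantly, with nothing to migrate in the store.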
How it fits together
- gadfly POSTs findings here after each review when `GADFLY_FINDINGS_URL` points at this store (advisory; off by default).
- gadfly-mcp is the MCP server Claude uses to list findings and record grades against this store.
Build / test
```sh
go build ./...
go test ./...
gofmt -l .   # must be clean (prints no files)
```
License
MIT © 2026 Steve Dudenhoeffer.