# 🪰📋 gadfly-reports

A small **durable store + scoreboard** for [Gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly)
review findings. Gadfly (and any CI) POST each model's findings and per-review timing here; a human
or Claude, via [gadfly-mcp](https://gitea.stevedudenhoeffer.com/steve/gadfly-mcp), later grades
each finding. It's a single Go binary backed by SQLite, speaking a tiny HTTP API.

> ### 🤖 Heads up: this is a vibe-coded project
> gadfly-reports was built almost entirely by an AI agent (Claude Code), including the design, the
> code, and these docs. It's small and it's tested, but treat it accordingly: it's a homelab-grade
> service, not a hardened product, and there may be the occasional AI-flavored rough edge. Issues
> and PRs welcome.

## What it stores — and what it deliberately doesn't

gadfly-reports is a **pure fact store**:

- **runs**: one per model's review of a PR, recording wall-clock duration, lens count, and optional
  token/cost figures.
- **findings**: **content-addressed by location** (`repo + pr + lens + file + line`), so the *same*
  issue raised by several models collapses to one finding with many **reports**. That collapse is
  what makes cross-model **consensus** and per-model **precision** measurable.
- **grades**: a triage verdict per finding, made up of `is_real`, `severity`
  (`trivial|small|medium|high|critical`), optional `usefulness` (1–5), notes, and grader. Grade
  history is kept; the latest wins.

It stores **no points and computes no rankings.** Mapping severity → points and ranking models by
"value per minute" (or per token) is a **client/dashboard concern**, so you can retune the curve any
time without migrating or re-scoring stored data.
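
To make the content-addressing idea above concrete, here is a minimal sketch of deriving a finding key from its location. The README only states *which* fields form the address; the `Location` type, the `FindingID` function, the SHA-256 hash, and the NUL-separated encoding are all assumptions for illustration, not the store's actual scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Location is a hypothetical struct holding the five fields the README names
// as a finding's address.
type Location struct {
	Repo string
	PR   int
	Lens string
	File string
	Line int
}

// FindingID derives a stable key from the location alone. The hash function
// and the NUL-separated encoding are assumptions, not the store's real scheme.
func FindingID(l Location) string {
	key := strings.Join([]string{
		l.Repo, fmt.Sprint(l.PR), l.Lens, l.File, fmt.Sprint(l.Line),
	}, "\x00")
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:8])
}

func main() {
	loc := Location{Repo: "steve/gadfly", PR: 12, Lens: "security", File: "main.go", Line: 42}
	a := FindingID(loc) // reported by one model
	b := FindingID(loc) // same location reported by another model
	fmt.Println(a == b) // true: both reports attach to one finding

	loc.Line = 43
	fmt.Println(a == FindingID(loc)) // false: a different line is a different finding
}
```

Because the key ignores the model, title, and detail text, two models that flag the same line under the same lens land on the same finding, which is what lets reports be counted per finding.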

## Run it

```sh
# from source
go run gitea.stevedudenhoeffer.com/steve/gadfly-reports@latest serve

# or Docker (image published by CI on every push to main)
docker run -d --name gadfly-reports -p 8090:8090 -v gadfly-reports-data:/data \
  -e GADFLY_REPORTS_TOKEN=change-me \
  gitea.stevedudenhoeffer.com/steve/gadfly-reports:latest
```

## HTTP API (the canonical contract)

| Method & path | Body / query | Purpose |
|---|---|---|
| `GET /healthz` | none | liveness (open even when a token is set) |
| `POST /runs` | one run object | upsert a model's review of a PR (timing/tokens) |
| `POST /reports` | JSON **array** of report objects | record findings + which model reported each |
| `POST /findings/{id}/grade` | `{is_real, severity?, usefulness?, notes?, grader?}` | record a triage grade |
| `GET /export` | none | flat report×finding×run×latest-grade rows (the dashboard feed) |
| `GET /scoreboard` | none | points-free per-model rollup |

`POST /runs` body: `{run_id, repo, pr, model, provider, lenses, duration_secs, input_tokens?, output_tokens?, cost_usd?}`
(re-posting the same `run_id` updates it).
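
As a rough client-side illustration of the `POST /runs` body above, the sketch below builds (but does not send) the request. The JSON keys come from the contract; the Go field types (for example `pr` and `lenses` as integers), the `Run` and `NewRunRequest` names, and the sample values are assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Run mirrors the documented POST /runs body. The Go types are assumptions;
// pointer fields model the optional keys so they are omitted when unset.
type Run struct {
	RunID        string   `json:"run_id"`
	Repo         string   `json:"repo"`
	PR           int      `json:"pr"`
	Model        string   `json:"model"`
	Provider     string   `json:"provider"`
	Lenses       int      `json:"lenses"`
	DurationSecs float64  `json:"duration_secs"`
	InputTokens  *int64   `json:"input_tokens,omitempty"`
	OutputTokens *int64   `json:"output_tokens,omitempty"`
	CostUSD      *float64 `json:"cost_usd,omitempty"`
}

// NewRunRequest builds the upsert request without sending it.
func NewRunRequest(baseURL, token string, r Run) (*http.Request, error) {
	body, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/runs", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		// Only needed when the server has GADFLY_REPORTS_TOKEN set.
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := NewRunRequest("http://localhost:8090", "change-me", Run{
		RunID: "example-run-1", Repo: "steve/gadfly", PR: 12,
		Model: "example-model", Provider: "example-provider",
		Lenses: 4, DurationSecs: 93.5,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /runs
	// Send with http.DefaultClient.Do(req); re-posting the same run_id updates the run.
}
```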

`POST /reports` array element: `{repo, pr, lens, file, line, title, model, provider, run_id, raw_severity, detail}`.

`GET /scoreboard` element: `{model, provider, runs, minutes, input_tokens, output_tokens, findings, confirmed, false_positive, ungraded, by_severity:{severity:count}}`.

If `GADFLY_REPORTS_TOKEN` is set, every route except `/healthz` requires `Authorization: Bearer <token>`.

## Configuration

| Env | Default | Meaning |
|-----|---------|---------|
| `GADFLY_REPORTS_ADDR` | `:8090` | listen address |
| `GADFLY_REPORTS_DB` | `gadfly-reports.db` (`/data/gadfly-reports.db` in Docker) | SQLite path |
| `GADFLY_REPORTS_TOKEN` | *(empty)* | bearer token callers must present (empty = open) |

CLI flags `--addr` / `--db` / `--token` override the env.
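
One common way to get "flags override env, env overrides defaults" in Go is to use the environment value as each flag's default. The sketch below shows that pattern with the variable names and defaults from the table; the `Config`, `envOr`, and `ParseConfig` names are hypothetical, the `serve` subcommand is ignored, and the real binary may wire this up differently.

```go
package main

import (
	"flag"
	"fmt"
)

// Config holds the three documented settings.
type Config struct{ Addr, DB, Token string }

// envOr returns the environment value for key, or def when it is empty.
func envOr(getenv func(string) string, key, def string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return def
}

// ParseConfig seeds each flag's default from the environment, so an explicit
// flag wins over the env var, and the env var wins over the built-in default.
func ParseConfig(args []string, getenv func(string) string) (Config, error) {
	fs := flag.NewFlagSet("gadfly-reports", flag.ContinueOnError)
	var c Config
	fs.StringVar(&c.Addr, "addr", envOr(getenv, "GADFLY_REPORTS_ADDR", ":8090"), "listen address")
	fs.StringVar(&c.DB, "db", envOr(getenv, "GADFLY_REPORTS_DB", "gadfly-reports.db"), "SQLite path")
	fs.StringVar(&c.Token, "token", getenv("GADFLY_REPORTS_TOKEN"), "bearer token (empty = open)")
	err := fs.Parse(args)
	return c, err
}

func main() {
	// A fake environment keeps the example deterministic; pass os.Getenv in real use.
	env := map[string]string{"GADFLY_REPORTS_ADDR": ":9000"}
	getenv := func(k string) string { return env[k] }

	c, _ := ParseConfig(nil, getenv)
	fmt.Println(c.Addr) // :9000 (env beats the default)

	c, _ = ParseConfig([]string{"--addr", ":7000"}, getenv)
	fmt.Println(c.Addr) // :7000 (flag beats env)
}
```

Passing `getenv` in as a function keeps the precedence logic testable without touching the process environment.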

## Dashboards

Point anything at the JSON endpoints (or the SQLite file read-only). `GET /export` is the flat feed;
`GET /scoreboard` is the per-model rollup. Compute points and value-per-minute **in the dashboard**,
e.g. with a curve like `trivial=1, small=3, medium=5, high=8, critical=20` →
`points = Σ weight[severity]·by_severity[severity]`, `value/min = points / minutes`.
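
The formula above works out as follows for one scoreboard element. This is a minimal dashboard-side sketch: the `ScoreRow` type keeps only the fields the calculation needs, the `Points` and `ValuePerMinute` names are hypothetical, the sample counts are made up, and returning 0 for a zero-minute row is an assumption about how a dashboard might handle that case.

```go
package main

import "fmt"

// ScoreRow holds the GET /scoreboard fields this calculation uses.
type ScoreRow struct {
	Model      string         `json:"model"`
	Minutes    float64        `json:"minutes"`
	BySeverity map[string]int `json:"by_severity"`
}

// Weights is the example curve from the text; retune it freely, since the
// store keeps only severity counts and never the points themselves.
var Weights = map[string]float64{
	"trivial": 1, "small": 3, "medium": 5, "high": 8, "critical": 20,
}

// Points computes the sum over severities of weight[severity] * count.
func Points(row ScoreRow, w map[string]float64) float64 {
	var total float64
	for sev, n := range row.BySeverity {
		total += w[sev] * float64(n)
	}
	return total
}

// ValuePerMinute divides points by review minutes.
func ValuePerMinute(row ScoreRow, w map[string]float64) float64 {
	if row.Minutes <= 0 {
		return 0 // assumed handling: avoid dividing by zero
	}
	return Points(row, w) / row.Minutes
}

func main() {
	row := ScoreRow{
		Model:      "example-model",
		Minutes:    4,
		BySeverity: map[string]int{"small": 2, "high": 1, "critical": 1},
	}
	fmt.Println(Points(row, Weights))         // 2*3 + 1*8 + 1*20 = 34
	fmt.Println(ValuePerMinute(row, Weights)) // 34 / 4 = 8.5
}
```

Swapping in a different `Weights` map changes the ranking immediately, with no change to stored data.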

## How it fits together

- **[gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly)** POSTs findings here after each
  review when `GADFLY_FINDINGS_URL` points at this store (advisory; off by default).
- **[gadfly-mcp](https://gitea.stevedudenhoeffer.com/steve/gadfly-mcp)** is the MCP server Claude
  uses to list findings and record grades against this store.

## Build / test

```sh
go build ./...
go test ./...
gofmt -l .   # must be clean
```

## License

MIT © 2026 Steve Dudenhoeffer.