# 🪰📋 gadfly-reports

A small **durable store + scoreboard** for [Gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly) review findings. Gadfly (and any CI) POST each model's findings and per-review timing here; a human or Claude, via [gadfly-mcp](https://gitea.stevedudenhoeffer.com/steve/gadfly-mcp), later grades each finding. It's a single Go binary backed by SQLite, speaking a tiny HTTP API.

> ### 🤖 Heads up: this is a vibe-coded project
> gadfly-reports was built almost entirely by an AI agent (Claude Code): the design, the code, and
> these docs. It's small and it's tested, but treat it accordingly: it's a homelab-grade service,
> not a hardened product, and there may be the occasional AI-flavored rough edge. Issues and PRs
> welcome.

## What it stores, and what it deliberately doesn't

gadfly-reports is a **pure fact store**:

- **runs**: one per model's review of a PR, recording wall-clock duration, lens count, optional token/cost.
- **findings**: **content-addressed by location** (`repo + pr + lens + file + line`), so the *same* issue raised by several models collapses to one finding with many **reports**. That collapse is what makes cross-model **consensus** and per-model **precision** measurable.
- **grades**: a triage verdict per finding: `is_real`, `severity` (`trivial|small|medium|high|critical`), optional `usefulness` (1–5), notes, grader. Grade history is kept; the latest wins.

It stores **no points and computes no rankings.** Mapping severity → points and ranking models by "value per minute" (or per token) is a **client/dashboard concern**, so you can retune the curve any time without migrating or re-scoring stored data.
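
The content-addressed collapse described above can be sketched in a few lines of Go. This is an illustration only, not the store's actual code: the `Location` and `Report` types, the `Collapse` function, the field types (for example `pr` as an integer) and the sample values are all assumptions made for the example. Only the five key fields come from the description above.

```go
package main

import "fmt"

// Location is the content address of a finding: repo + pr + lens + file + line.
// Type and field names are hypothetical; the store's real schema may differ.
type Location struct {
	Repo string
	PR   int
	Lens string
	File string
	Line int
}

// Report is one model's claim about one location (hypothetical shape).
type Report struct {
	Loc   Location
	Model string
}

// Collapse groups reports by location and returns, for each finding,
// the distinct models that reported it, in first-seen order.
func Collapse(reports []Report) map[Location][]string {
	seen := make(map[Location]map[string]bool)
	out := make(map[Location][]string)
	for _, r := range reports {
		if seen[r.Loc] == nil {
			seen[r.Loc] = make(map[string]bool)
		}
		if seen[r.Loc][r.Model] {
			continue // the same model re-reporting the same location
		}
		seen[r.Loc][r.Model] = true
		out[r.Loc] = append(out[r.Loc], r.Model)
	}
	return out
}

func main() {
	// Sample values are made up for the example.
	shared := Location{"steve/app", 12, "security", "main.go", 40}
	solo := Location{"steve/app", 12, "security", "main.go", 97}

	findings := Collapse([]Report{
		{shared, "model-a"},
		{shared, "model-b"},
		{solo, "model-a"},
	})

	// Three reports collapse to two findings; the reporter count per
	// finding is what a consensus measure would be built on.
	fmt.Println(len(findings), len(findings[shared]), len(findings[solo]))
}
```

Because the key is the location rather than the wording of the finding, two models that describe the same line differently still land on the same finding, and the per-finding reporter list falls out of the grouping for free.
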

## Run it

```sh
# from source
go run gitea.stevedudenhoeffer.com/steve/gadfly-reports@latest serve

# or Docker (image published by CI on every push to main)
docker run -d --name gadfly-reports -p 8090:8090 -v gadfly-reports-data:/data \
  -e GADFLY_REPORTS_TOKEN=change-me \
  gitea.stevedudenhoeffer.com/steve/gadfly-reports:latest
```

### Deploy behind Traefik (expose over a domain)

```yaml
# docker-compose.yml: publish gadfly-reports at https://reports.example.com via Traefik.
services:
  gadfly-reports:
    image: gitea.stevedudenhoeffer.com/steve/gadfly-reports:latest
    restart: unless-stopped
    environment:
      # Auth is built in: callers (gadfly emit, gadfly-mcp) send this as a bearer
      # token; /healthz stays open. ADDR and DB default to :8090 and
      # /data/gadfly-reports.db inside the image.
      GADFLY_REPORTS_TOKEN: ${GADFLY_REPORTS_TOKEN:?set GADFLY_REPORTS_TOKEN in .env}
    volumes:
      - gadfly-reports-data:/data
    networks: [traefik]
    healthcheck:
      test: ["CMD", "wget", "-q", "-O", "-", "http://localhost:8090/healthz"]
      # Fast probe so Traefik resumes routing within ~1s of a restart (the daemon
      # binds the port in milliseconds). Without a fast probe Traefik 502s until the
      # first check (the usual "why is it down for 30s after restart").
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 10s
      start_interval: 1s   # probe every 1s during start_period (needs Docker 25+)
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.gadfly-reports.rule=Host(`reports.example.com`)"
      - "traefik.http.routers.gadfly-reports.entrypoints=websecure"
      - "traefik.http.routers.gadfly-reports.tls=true"
      - "traefik.http.routers.gadfly-reports.tls.certresolver=letsencrypt"
      - "traefik.http.services.gadfly-reports.loadbalancer.server.port=8090"

volumes:
  gadfly-reports-data:

networks:
  traefik:
    external: true   # the network your Traefik instance is attached to
```

Put `GADFLY_REPORTS_TOKEN=` in a `.env` beside the compose file.

Tailor the three Traefik bits to your setup: the **host** (`reports.example.com`), the **entrypoint** (`websecure`) and the **certresolver** (`letsencrypt`) must match your Traefik config, and the `traefik` network must be the external one Traefik watches. Traefik terminates TLS and forwards to the container's `:8090`.

Then point `gadfly`'s `GADFLY_FINDINGS_URL` and `gadfly-mcp`'s `--store` at `https://reports.example.com` (with the same token).

On `docker compose pull && docker compose up -d`, the fast healthcheck lets Traefik resume routing within ~1s (the daemon starts in milliseconds; Traefik just won't route to a container whose health probe hasn't passed yet, which is the "down for 30s after restart" gotcha). Your data lives on the `gadfly-reports-data` volume and survives restarts; the only loss exposure is a review POSTing findings during that ~1s window, since gadfly's emit is fire-and-forget (no retry), which is negligible against reviews that take minutes.

## HTTP API (the canonical contract)

| Method & path | Body / query | Purpose |
|---|---|---|
| `GET /healthz` | (none) | liveness (open even when a token is set) |
| `GET /` · `GET /ui` | (none) | **view-only dashboard**: HTML shell, public; its JS fetches the gated endpoints with the token |
| `POST /runs` | one run object | upsert a model's review of a PR (timing/tokens) |
| `POST /reports` | JSON **array** of report objects | record findings + which model reported each |
| `POST /findings/{id}/grade` | `{is_real, severity?, usefulness?, notes?, grader?}` | record a triage grade |
| `GET /export` | (none) | flat report×finding×run×latest-grade rows (the dashboard feed) |
| `GET /runs` | (none) | list all runs (timing/tokens), oldest first |
| `GET /scoreboard` | (none) | points-free per-model rollup |

`POST /runs` body: `{run_id, repo, pr, model, provider, lenses, duration_secs, input_tokens?, output_tokens?, cost_usd?}` (re-posting the same `run_id` updates it).

`POST /reports` array element: `{repo, pr, lens, file, line, title, model, provider, run_id, raw_severity, detail}`.

`GET /scoreboard` element: `{model, provider, runs, minutes, input_tokens, output_tokens, findings, confirmed, false_positive, ungraded, by_severity:{severity:count}}`.

If `GADFLY_REPORTS_TOKEN` is set, every route except the public view shell (`/healthz`, `/`, `/ui`) requires `Authorization: Bearer `. The `/ui` shell carries no data itself (its JS sends the token on each fetch), so the public shell leaks nothing.

## Configuration

| Env | Default | Meaning |
|-----|---------|---------|
| `GADFLY_REPORTS_ADDR` | `:8090` | listen address |
| `GADFLY_REPORTS_DB` | `gadfly-reports.db` (`/data/gadfly-reports.db` in Docker) | SQLite path |
| `GADFLY_REPORTS_TOKEN` | *(empty)* | bearer token callers must present (empty = open) |

CLI flags `--addr` / `--db` / `--token` override the env.

## Dashboard

A built-in **read-only dashboard** ships at **`/ui`** (hit the host root and you're redirected there). It's a single self-contained page that pulls `/runs` + `/export` and does everything in your browser: a **per-model performance table** (runs, minutes, findings, confirmed / false-positive / ungraded, points, **points-per-minute**, points-per-run, by-severity) with **drill-down filters** (date range, repo, provider, model, lens, grade/severity), free-text search, and a click-to-scope findings detail table.

True to the store's "no points" rule, **scoring lives in the browser**: the page has an editable points curve (default `trivial=1, small=3, medium=5, high=8, critical=20`) and computes `points = Σ weight[severity]·count` and `value/min = points / minutes` on the fly, so you can retune it without touching stored data.

There's also an editable **false-positive penalty ×** (default `-0.5`). A false positive has no graded severity, so it's penalized by the severity the model **claimed** (its lens verdict: Blocking→high, Minor→small): `penalty × points[claimed]`.
So a Blocking-claimed FP at `-0.5` costs `high(8) × -0.5 = -4`, and a model with the odd good find but many false positives nets *down*, even negative, instead of coasting on its hits.

And an editable **solo-find bonus ×** (default `1.5`). Because findings are content-addressed, the number of models that reported one is known, so a confirmed finding that **only that model** caught (no other model reported it) scores `severity × bonus`, rewarding catching what the swarm missed. The `solo` column counts those. This is derived from the data (reporter count); the grader never has to flag it. Set the bonus to `1` to disable.

Its mirror, **solo-error penalty ×** (default `1.5`), multiplies the FP penalty when a false positive was made by **only that model**: a unique wrong claim is noisier than a shared mistake. So a Blocking-claimed solo FP costs `high(8) × -0.5 × 1.5 = -6` vs `-4` for a shared one. Set to `1` to disable.

Auth: the `/ui` shell is public (it holds no data); paste the store token into its **connect** box, or open `/ui?token=` once (remembered in `localStorage`).

Prefer your own dashboard? Point Grafana/Metabase/etc. at the SQLite file or the same `/export` + `/scoreboard` + `/runs` JSON.

## How it fits together

- **[gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly)** POSTs findings here after each review when `GADFLY_FINDINGS_URL` points at this store (advisory; off by default).
- **[gadfly-mcp](https://gitea.stevedudenhoeffer.com/steve/gadfly-mcp)** is the MCP server Claude uses to list findings and record grades against this store.

## Build / test

```sh
go build ./...
go test ./...
gofmt -l .   # must be clean
```

## License

MIT © 2026 Steve Dudenhoeffer.