gadfly-reports/README.md
steve ddcf42a3ce
feat: gadfly-reports — findings store + scoreboard daemon
SQLite-backed HTTP store for Gadfly review findings, per-review run timings, and human/Claude grades, with a points-free per-model scoreboard. Pure fact store: it computes no points or rankings (the dashboard maps severity->points client-side and retunes without re-scoring). Findings are content-addressed by location so cross-model reports collapse for consensus; one grade per finding, latest wins. Pure-Go SQLite (CGO-free) + Docker image CI + tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 23:55:24 -04:00

# 🪰📋 gadfly-reports
A small **durable store + scoreboard** for [Gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly)
review findings. Gadfly (and any CI) POST each model's findings and per-review timing here; a human
or Claude — via [gadfly-mcp](https://gitea.stevedudenhoeffer.com/steve/gadfly-mcp) — later grades
each finding. It's a single Go binary backed by SQLite, speaking a tiny HTTP API.
> ### 🤖 Heads up: this is a vibe-coded project
> gadfly-reports was built almost entirely by an AI agent (Claude Code) — the design, the code, and
> these docs. It's small and it's tested, but treat it accordingly: it's a homelab-grade service,
> not a hardened product, and there may be the occasional AI-flavored rough edge. Issues and PRs
> welcome.
## What it stores — and what it deliberately doesn't
gadfly-reports is a **pure fact store**:
- **runs** — one per model's review of a PR: wall-clock duration, lens count, optional token/cost.
- **findings** — **content-addressed by location** (`repo + pr + lens + file + line`), so the *same*
issue raised by several models collapses to one finding with many **reports**. That collapse is
what makes cross-model **consensus** and per-model **precision** measurable.
- **grades** — a triage verdict per finding: `is_real`, `severity`
(`trivial|small|medium|high|critical`), optional `usefulness` (1–5), notes, grader. Grade history
is kept; the latest wins.
It stores **no points and computes no rankings.** Mapping severity → points and ranking models by
"value per minute" (or per token) is a **client/dashboard concern**, so you can retune the curve any
time without migrating or re-scoring stored data.
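
To illustrate why content addressing makes cross-model reports collapse, here is a minimal sketch. It assumes the finding id is a SHA-256 hash over the five location fields; the README does not document the actual derivation, so `Location`, `FindingID`, the field types, and the sample values are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Location holds the tuple the README lists as a finding's address.
// The Go field types are assumptions; the README does not specify them.
type Location struct {
	Repo string
	PR   int
	Lens string
	File string
	Line int
}

// FindingID is a hypothetical derivation: SHA-256 over the location fields,
// NUL-separated so adjacent fields cannot run together. The real store may
// compute its ids differently.
func FindingID(loc Location) string {
	h := sha256.New()
	fmt.Fprintf(h, "%s\x00%d\x00%s\x00%s\x00%d", loc.Repo, loc.PR, loc.Lens, loc.File, loc.Line)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := Location{Repo: "steve/gadfly", PR: 12, Lens: "security", File: "main.go", Line: 40}
	b := a // the same location, as if reported by a second model
	c := a
	c.Line = 41 // a neighbouring line is a different finding

	fmt.Println(FindingID(a) == FindingID(b)) // true
	fmt.Println(FindingID(a) == FindingID(c)) // false
}
```

Because the model name is not part of the key, two models flagging the same line under the same lens land on one finding, and each contributes its own report row.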
## Run it
```sh
# from source
go run gitea.stevedudenhoeffer.com/steve/gadfly-reports@latest serve
# or Docker (image published by CI on every push to main)
docker run -d --name gadfly-reports -p 8090:8090 -v gadfly-reports-data:/data \
-e GADFLY_REPORTS_TOKEN=change-me \
gitea.stevedudenhoeffer.com/steve/gadfly-reports:latest
```
## HTTP API (the canonical contract)
| Method & path | Body / query | Purpose |
|---|---|---|
| `GET /healthz` | — | liveness (open even when a token is set) |
| `POST /runs` | one run object | upsert a model's review of a PR (timing/tokens) |
| `POST /reports` | JSON **array** of report objects | record findings + which model reported each |
| `POST /findings/{id}/grade` | `{is_real, severity?, usefulness?, notes?, grader?}` | record a triage grade |
| `GET /export` | — | flat report×finding×run×latest-grade rows — the dashboard feed |
| `GET /scoreboard` | — | points-free per-model rollup |
`POST /runs` body: `{run_id, repo, pr, model, provider, lenses, duration_secs, input_tokens?, output_tokens?, cost_usd?}`
(re-posting the same `run_id` updates it).
`POST /reports` array element: `{repo, pr, lens, file, line, title, model, provider, run_id, raw_severity, detail}`.
`GET /scoreboard` element: `{model, provider, runs, minutes, input_tokens, output_tokens, findings, confirmed, false_positive, ungraded, by_severity:{severity:count}}`.
If `GADFLY_REPORTS_TOKEN` is set, every route except `/healthz` requires `Authorization: Bearer <token>`.
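
As a rough client-side illustration of the contract above, the sketch below builds a `POST /reports` request with the bearer header. The JSON field names come from the element shape listed above; the Go types (`int` for `pr` and `line`, `string` for everything else), the `Report` and `NewReportsRequest` names, and the sample values are assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Report mirrors one element of the POST /reports array.
// Field types are assumed; only the JSON names come from the README.
type Report struct {
	Repo        string `json:"repo"`
	PR          int    `json:"pr"`
	Lens        string `json:"lens"`
	File        string `json:"file"`
	Line        int    `json:"line"`
	Title       string `json:"title"`
	Model       string `json:"model"`
	Provider    string `json:"provider"`
	RunID       string `json:"run_id"`
	RawSeverity string `json:"raw_severity"`
	Detail      string `json:"detail"`
}

// NewReportsRequest encodes reports as a JSON array and attaches the
// bearer token when one is configured (an empty token means an open store).
func NewReportsRequest(baseURL, token string, reports []Report) (*http.Request, error) {
	body, err := json.Marshal(reports)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/reports", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := NewReportsRequest("http://localhost:8090", "change-me", []Report{{
		Repo: "steve/gadfly", PR: 12, Lens: "security", File: "main.go", Line: 40,
		Title: "unchecked error", Model: "example-model", Provider: "example-provider",
		RunID: "run-1", RawSeverity: "medium", Detail: "illustrative payload only",
	}})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running store: resp, err := http.DefaultClient.Do(req)
}
```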
## Configuration
| Env | Default | Meaning |
|-----|---------|---------|
| `GADFLY_REPORTS_ADDR` | `:8090` | listen address |
| `GADFLY_REPORTS_DB` | `gadfly-reports.db` (`/data/gadfly-reports.db` in Docker) | SQLite path |
| `GADFLY_REPORTS_TOKEN` | *(empty)* | bearer token callers must present (empty = open) |
CLI flags `--addr` / `--db` / `--token` override the env.
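
One common way to get "flags override the env" in Go is to use each environment variable as the flag's default. The sketch below shows that pattern with the standard `flag` package; it is not necessarily how gadfly-reports wires its own configuration, and `Config`, `envOr`, and `parseConfig` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config holds the three settings from the table above.
type Config struct {
	Addr, DB, Token string
}

// envOr returns the environment value for key, or fallback when it is unset or empty.
func envOr(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return fallback
}

// parseConfig seeds each flag's default from the environment, so an explicit
// flag on the command line wins over the env, which wins over the built-in default.
func parseConfig(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("serve", flag.ContinueOnError)
	fs.StringVar(&c.Addr, "addr", envOr("GADFLY_REPORTS_ADDR", ":8090"), "listen address")
	fs.StringVar(&c.DB, "db", envOr("GADFLY_REPORTS_DB", "gadfly-reports.db"), "SQLite path")
	fs.StringVar(&c.Token, "token", envOr("GADFLY_REPORTS_TOKEN", ""), "bearer token (empty = open)")
	err := fs.Parse(args)
	return c, err
}

func main() {
	cfg, err := parseConfig(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// Avoid printing the token itself; only report whether auth is enabled.
	fmt.Printf("addr=%s db=%s auth=%t\n", cfg.Addr, cfg.DB, cfg.Token != "")
}
```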
## Dashboards
Point anything at the JSON endpoints (or the SQLite file read-only). `GET /export` is the flat feed;
`GET /scoreboard` is the per-model rollup. Compute points and value-per-minute **in the dashboard**,
e.g. with a curve like `trivial=1, small=3, medium=5, high=8, critical=20`:
`points = Σ weight[severity]·by_severity[severity]`, `value/min = points / minutes`.
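
The formulas above can be sketched as a small dashboard-side helper. The weights are the example curve from this section; `ScoreRow` keeps only the `/scoreboard` fields the calculation needs, and the Go types and sample counts are assumptions.

```go
package main

import "fmt"

// ScoreRow is a trimmed-down /scoreboard element (types assumed).
type ScoreRow struct {
	Model      string         `json:"model"`
	Minutes    float64        `json:"minutes"`
	BySeverity map[string]int `json:"by_severity"`
}

// weights is the example curve; retune it freely, since nothing is stored server-side.
var weights = map[string]float64{
	"trivial": 1, "small": 3, "medium": 5, "high": 8, "critical": 20,
}

// Points computes Σ weight[severity] · by_severity[severity].
// Severities missing from w contribute zero.
func Points(row ScoreRow, w map[string]float64) float64 {
	var total float64
	for sev, n := range row.BySeverity {
		total += w[sev] * float64(n)
	}
	return total
}

// ValuePerMinute divides points by review minutes, returning 0 for rows without timing.
func ValuePerMinute(row ScoreRow, w map[string]float64) float64 {
	if row.Minutes <= 0 {
		return 0
	}
	return Points(row, w) / row.Minutes
}

func main() {
	row := ScoreRow{
		Model:      "example-model",
		Minutes:    4,
		BySeverity: map[string]int{"small": 2, "high": 1, "critical": 1},
	}
	fmt.Println(Points(row, weights))         // 34  (2*3 + 8 + 20)
	fmt.Println(ValuePerMinute(row, weights)) // 8.5 (34 / 4)
}
```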
## How it fits together
- **[gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly)** POSTs findings here after each
review when `GADFLY_FINDINGS_URL` points at this store (advisory; off by default).
- **[gadfly-mcp](https://gitea.stevedudenhoeffer.com/steve/gadfly-mcp)** is the MCP server Claude
uses to list findings and record grades against this store.
## Build / test
```sh
go build ./...
go test ./...
gofmt -l . # must be clean
```
## License
MIT © 2026 Steve Dudenhoeffer.