steve ddcf42a3ce
Build & push image / build-and-push (push) Successful in 1m13s
CI / test (push) Successful in 10m39s
feat: gadfly-reports — findings store + scoreboard daemon
SQLite-backed HTTP store for Gadfly review findings, per-review run timings, and human/Claude grades, with a points-free per-model scoreboard. Pure fact store: it computes no points or rankings (the dashboard maps severity->points client-side and retunes without re-scoring). Findings are content-addressed by location so cross-model reports collapse for consensus; one grade per finding, latest wins. Pure-Go SQLite (CGO-free) + Docker image CI + tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 23:55:24 -04:00

🪰📋 gadfly-reports

A small durable store + scoreboard for Gadfly review findings. Gadfly (and any CI) POST each model's findings and per-review timing here; a human or Claude — via gadfly-mcp — later grades each finding. It's a single Go binary backed by SQLite, speaking a tiny HTTP API.

🤖 Heads up: this is a vibe-coded project

gadfly-reports was built almost entirely by an AI agent (Claude Code) — the design, the code, and these docs. It's small and it's tested, but treat it accordingly: it's a homelab-grade service, not a hardened product, and there may be the occasional AI-flavored rough edge. Issues and PRs welcome.

What it stores — and what it deliberately doesn't

gadfly-reports is a pure fact store:

  • runs — one per model's review of a PR: wall-clock duration, lens count, optional token/cost.
  • findingscontent-addressed by location (repo + pr + lens + file + line), so the same issue raised by several models collapses to one finding with many reports. That collapse is what makes cross-model consensus and per-model precision measurable.
  • grades — a triage verdict per finding: is_real, severity (trivial|small|medium|high|critical), optional usefulness (15), notes, grader. Grade history is kept; the latest wins.

It stores no points and computes no rankings. Mapping severity → points and ranking models by "value per minute" (or per token) is a client/dashboard concern, so you can retune the curve any time without migrating or re-scoring stored data.

Run it

# from source
go run gitea.stevedudenhoeffer.com/steve/gadfly-reports@latest serve

# or Docker (image published by CI on every push to main)
docker run -d --name gadfly-reports -p 8090:8090 -v gadfly-reports-data:/data \
  -e GADFLY_REPORTS_TOKEN=change-me \
  gitea.stevedudenhoeffer.com/steve/gadfly-reports:latest

HTTP API (the canonical contract)

Method & path Body / query Purpose
GET /healthz liveness (open even when a token is set)
POST /runs one run object upsert a model's review of a PR (timing/tokens)
POST /reports JSON array of report objects record findings + which model reported each
POST /findings/{id}/grade {is_real, severity?, usefulness?, notes?, grader?} record a triage grade
GET /export flat report×finding×run×latest-grade rows — the dashboard feed
GET /scoreboard points-free per-model rollup

POST /runs body: {run_id, repo, pr, model, provider, lenses, duration_secs, input_tokens?, output_tokens?, cost_usd?} (re-posting the same run_id updates it).

POST /reports array element: {repo, pr, lens, file, line, title, model, provider, run_id, raw_severity, detail}.

GET /scoreboard element: {model, provider, runs, minutes, input_tokens, output_tokens, findings, confirmed, false_positive, ungraded, by_severity:{severity:count}}.

If GADFLY_REPORTS_TOKEN is set, every route except /healthz requires Authorization: Bearer <token>.

Configuration

Env Default Meaning
GADFLY_REPORTS_ADDR :8090 listen address
GADFLY_REPORTS_DB gadfly-reports.db (/data/gadfly-reports.db in Docker) SQLite path
GADFLY_REPORTS_TOKEN (empty) bearer token callers must present (empty = open)

CLI flags --addr / --db / --token override the env.

Dashboards

Point anything at the JSON endpoints (or the SQLite file read-only). GET /export is the flat feed; GET /scoreboard is the per-model rollup. Compute points and value-per-minute in the dashboard, e.g. with a curve like trivial=1, small=3, medium=5, high=8, critical=20points = Σ weight[severity]·by_severity[severity], value/min = points / minutes.

How it fits together

  • gadfly POSTs findings here after each review when GADFLY_FINDINGS_URL points at this store (advisory; off by default).
  • gadfly-mcp is the MCP server Claude uses to list findings and record grades against this store.

Build / test

go build ./...
go test ./...
gofmt -l .   # must be clean

License

MIT © 2026 Steve Dudenhoeffer.

S
Description
No description provided
Readme MIT 120 KiB
Languages
Go 60.6%
HTML 37.6%
Dockerfile 1.8%