feat: gadfly-mcp — MCP server for grading gadfly-reports findings

Thin, stateless stdio MCP server (official Go SDK) that exposes a gadfly-reports store to an MCP client (e.g. Claude). Tools: list_findings, record_finding_grade, scoreboard (grader forced to claude). Launch via 'go run ...@latest' — nothing to install. Core logic tested against httptest, no daemon required. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 23:55:24 -04:00
parent bb6bd209b4
commit f92e54e3ed
9 changed files with 730 additions and 28 deletions
@@ -0,0 +1,26 @@
 name: CI
 on:
  push:
    branches: [main]
  pull_request:
    types: [opened, synchronize, reopened]
  workflow_dispatch: {}
 jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: "1.26"
      - name: Build
        run: go build ./...
      - name: Vet
        run: go vet ./...
      - name: gofmt
        run: test -z "$(gofmt -l .)" || { gofmt -l .; echo "gofmt needed"; exit 1; }
      - name: Test (race)
        run: go test -race ./...
@@ -1,27 +1,2 @@
-# ---> Go
+# build output
-# If you prefer the allow list template instead of the deny list, see community template:
+/gadfly-mcp
 # https://github.com/github/gitignore/blob/main/community/Golang/Go.AllowList.gitignore
 #
 # Binaries for programs and plugins
 *.exe
 *.exe~
 *.dll
 *.so
 *.dylib
 # Test binary, built with `go test -c`
 *.test
 # Output of the go coverage tool, specifically when used with LiteIDE
 *.out
 # Dependency directories (remove the comment below to include it)
 # vendor/
 # Go workspace file
 go.work
 go.work.sum
 # env file
 .env
@@ -0,0 +1,50 @@
 # gadfly-mcp — Developer Guide
 A stdio [MCP](https://modelcontextprotocol.io) server exposing the
 [gadfly-reports](https://gitea.stevedudenhoeffer.com/steve/gadfly-reports) findings store to an MCP
 client (e.g. Claude). It is a **thin, stateless HTTP client** to the store — it never opens SQLite
 and never imports the store's package.
 > This is a public, **vibe-coded** project (built largely by an AI agent). Keep that honest in the
 > README; it's homelab-grade.
 ## Shape
 - Single `main.go` (`package main`) at the repo root, so the launch path is just
  `go run gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest` — no `cmd/` subpath. This is the
  whole point: the client compiles + caches it on demand; nothing to install or manage.
 - Uses the official Go MCP SDK (`github.com/modelcontextprotocol/go-sdk`): `mcp.NewServer` →
  `mcp.AddTool[In,Out]` (input schemas inferred from struct + `jsonschema` tags) → `server.Run(ctx,
  &mcp.StdioTransport{})`.
 - Config: `--store` flag (default `$GADFLY_REPORTS_URL`, else `http://localhost:8090`); bearer token
  from `$GADFLY_REPORTS_TOKEN`, sent on every request when set.
 ## Contract with gadfly-reports
 The store's HTTP/JSON API is the contract — its README is the **source of truth**. This client
 mirrors only the subset it needs with small local structs (`exportRow`, `modelStat`, `gradeReq`,
 …). If you change a field here, check it against gadfly-reports' `server.go`/`store.go`. Endpoints
 used: `GET /export`, `POST /findings/{id}/grade`, `GET /scoreboard`.
 Three tools: `list_findings`, `record_finding_grade`, `scoreboard`. The grader is always forced to
 `"claude"`. The store holds **no points**; ranking by points/value-per-minute is a client concern —
 say so in the `scoreboard` tool description.
 ## Tests
 The core logic (`groupFindings` / `listFindings` / `recordGrade` / `scoreboard`) is factored free of
 MCP types and tested against an `httptest.Server`, so tests need no real daemon. Keep it that way —
 add a test when you add a tool or change the grouping/filtering.
 ```sh
 go build ./...
 go vet ./...
 gofmt -l .        # must be empty
 go test -race ./...
 ```
 ## When making changes
 - Keep this a thin client: no SQLite, no business logic the store should own.
 - Keep the launch path a root `package main` (don't move it under `cmd/`), so `go run …@latest`
  stays the one-liner the README documents.
@@ -0,0 +1,21 @@
 MIT License
 Copyright (c) 2026 Steve Dudenhoeffer
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
@@ -1,2 +1,62 @@
-# gadfly-mcp
+# 🪰🔌 gadfly-mcp
 An [MCP](https://modelcontextprotocol.io) server that lets an MCP client (e.g. Claude) read and
 **grade** [Gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly) review findings stored in
 [gadfly-reports](https://gitea.stevedudenhoeffer.com/steve/gadfly-reports). It's a tiny, stateless
 stdio process — a thin HTTP client to the store — so there's nothing to install or manage: your MCP
 client launches it on demand with `go run …@latest`.
 > ### 🤖 Heads up: this is a vibe-coded project
 > gadfly-mcp was built almost entirely by an AI agent (Claude Code) — code and docs. It's small and
 > tested, but treat it as homelab-grade. Issues and PRs welcome.
 ## Add it to Claude
 The store (`gadfly-reports`) runs persistently somewhere; this MCP server is throwaway. Point your
 client at it via `go run` (first launch compiles + caches; needs Go + access to the module host):
 ```jsonc
 {
  "mcpServers": {
    "gadfly": {
      "command": "go",
      "args": [
        "run", "gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest",
        "--store", "https://gadfly-reports.your-host:8090"
      ],
      "env": { "GADFLY_REPORTS_TOKEN": "the-same-token-the-store-uses" }
    }
  }
 }
 ```
 `--store` defaults to `$GADFLY_REPORTS_URL` (else `http://localhost:8090`). If the store requires a
 bearer token, set `GADFLY_REPORTS_TOKEN`.
 ## Tools
 | Tool | Args | Does |
 |---|---|---|
 | `list_findings` | `repo?`, `pr?`, `only_ungraded?` | lists findings (one entry per finding; reports from multiple models grouped, distinct models listed) |
 | `record_finding_grade` | `finding_id`, `is_real`, `severity?`, `usefulness?`, `notes?` | records a triage grade (grader is always `claude`) |
 | `scoreboard` | `model?` | per-model rollup (runs, minutes, tokens, confirmed-by-severity histogram) |
 `severity` is one of `trivial|small|medium|high|critical` (set it when `is_real=true`; omit for a
 false positive). **Points are not stored or returned** — gadfly-reports keeps raw facts, so any
 "value per minute / per token" ranking is computed client-side (map severity → points, divide by
 minutes). Use `scoreboard` for the raw material.
 Typical flow: *"List the ungraded gadfly findings on PR 2, look into each against the code, and
 record a grade for each."*
 ## Build / test
 ```sh
 go build ./...
 go test ./...
 gofmt -l .   # must be clean
 ```
 ## License
 MIT © 2026 Steve Dudenhoeffer.
@@ -0,0 +1,14 @@
 module gitea.stevedudenhoeffer.com/steve/gadfly-mcp
 go 1.26
 require github.com/modelcontextprotocol/go-sdk v1.6.1
 require (
 	github.com/google/jsonschema-go v0.4.3 // indirect
 	github.com/segmentio/asm v1.1.3 // indirect
 	github.com/segmentio/encoding v0.5.4 // indirect
 	github.com/yosida95/uritemplate/v3 v3.0.2 // indirect
 	golang.org/x/oauth2 v0.35.0 // indirect
 	golang.org/x/sys v0.41.0 // indirect
 )
@@ -0,0 +1,20 @@
 github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
 github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
 github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
 github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
 github.com/google/jsonschema-go v0.4.3 h1:/DBOLZTfDow7pe2GmaJNhltueGTtDKICi8V8p+DQPd0=
 github.com/google/jsonschema-go v0.4.3/go.mod h1:r5quNTdLOYEz95Ru18zA0ydNbBuYoo9tgaYcxEYhJVE=
 github.com/modelcontextprotocol/go-sdk v1.6.1 h1:0zOSupjKUxPKSocPT1Wtago+mUHU2/uZ4xSOY0FGReU=
 github.com/modelcontextprotocol/go-sdk v1.6.1/go.mod h1:kzm3kzFL1/+AziGOE0nUs3gvPoNxMCvkxokMkuFapXQ=
 github.com/segmentio/asm v1.1.3 h1:WM03sfUOENvvKexOLp+pCqgb/WDjsi7EK8gIsICtzhc=
 github.com/segmentio/asm v1.1.3/go.mod h1:Ld3L4ZXGNcSLRg4JBsZ3//1+f/TjYl0Mzen/DQy1EJg=
 github.com/segmentio/encoding v0.5.4 h1:OW1VRern8Nw6ITAtwSZ7Idrl3MXCFwXHPgqESYfvNt0=
 github.com/segmentio/encoding v0.5.4/go.mod h1:HS1ZKa3kSN32ZHVZ7ZLPLXWvOVIiZtyJnO1gPH1sKt0=
 github.com/yosida95/uritemplate/v3 v3.0.2 h1:Ed3Oyj9yrmi9087+NczuL5BwkIc4wvTb5zIM+UJPGz4=
 github.com/yosida95/uritemplate/v3 v3.0.2/go.mod h1:ILOh0sOhIJR3+L/8afwt/kE++YT040gmv5BQTMR2HP4=
 golang.org/x/oauth2 v0.35.0 h1:Mv2mzuHuZuY2+bkyWXIHMfhNdJAdwW3FuWeCPYN5GVQ=
 golang.org/x/oauth2 v0.35.0/go.mod h1:lzm5WQJQwKZ3nwavOZ3IS5Aulzxi68dUSgRHujetwEA=
 golang.org/x/sys v0.41.0 h1:Ivj+2Cp/ylzLiEU89QhWblYnOE9zerudt9Ftecq2C6k=
 golang.org/x/sys v0.41.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
 golang.org/x/tools v0.42.0 h1:uNgphsn75Tdz5Ji2q36v/nsFSfR/9BRFvqhGBaJGd5k=
 golang.org/x/tools v0.42.0/go.mod h1:Ma6lCIwGZvHK6XtgbswSoWroEkhugApmsXyrUmBhfr0=
@@ -0,0 +1,362 @@
 // Command gadfly-mcp is a stdio MCP server that exposes a gadfly-reports findings
 // store to an MCP client (e.g. Claude). It is a THIN HTTP client to the gadfly-reports daemon: it
 // never opens the SQLite database directly and does not import the daemon's
 // package, so it mirrors the store's JSON shapes with small local structs.
 //
 // Launch it with:
 //
 //	go run gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest
 //
 // Configuration:
 //
 //	--store        base URL of the gadfly-reports daemon
 //	               (default $GADFLY_REPORTS_URL, else http://localhost:8090)
 //	$GADFLY_REPORTS_TOKEN  if set, sent as "Authorization: Bearer <token>" on every request
 //
 // Tools: list_findings, record_finding_grade, scoreboard. The grader is always
 // "claude". gadfly-reports stores no points; the scoreboard tool's points are a
 // client-side concern (severity -> points, divided by minutes).
 package main
 import (
 	"bytes"
 	"context"
 	"encoding/json"
 	"flag"
 	"fmt"
 	"io"
 	"log"
 	"net/http"
 	"net/url"
 	"os"
 	"strings"
 	"time"
 	"github.com/modelcontextprotocol/go-sdk/mcp"
 )
 // ---- local mirrors of the store's JSON shapes (see ../../store.go) ----
 // exportRow mirrors store.ExportRow: one report joined with its finding, run
 // timing, and latest grade. Many rows can share a finding_id (one per reporting
 // model), which is why list_findings groups them.
 type exportRow struct {
 	FindingID  string `json:"finding_id"`
 	Repo       string `json:"repo"`
 	PR         int    `json:"pr"`
 	Lens       string `json:"lens"`
 	File       string `json:"file"`
 	Line       int    `json:"line"`
 	Title      string `json:"title"`
 	Model      string `json:"model"`
 	Provider   string `json:"provider"`
 	RunID      string `json:"run_id"`
 	Graded     bool   `json:"graded"`
 	IsReal     *bool  `json:"is_real"`
 	Severity   string `json:"severity"`
 	Usefulness *int   `json:"usefulness"`
 	Grader     string `json:"grader"`
 }
 // modelStat mirrors store.ModelStat: the points-free per-model rollup.
 type modelStat struct {
 	Model         string         `json:"model"`
 	Provider      string         `json:"provider"`
 	Runs          int            `json:"runs"`
 	Minutes       float64        `json:"minutes"`
 	InputTokens   int64          `json:"input_tokens"`
 	OutputTokens  int64          `json:"output_tokens"`
 	Findings      int            `json:"findings"`
 	Confirmed     int            `json:"confirmed"`
 	FalsePositive int            `json:"false_positive"`
 	Ungraded      int            `json:"ungraded"`
 	BySeverity    map[string]int `json:"by_severity"`
 }
 // findingOut is one grouped finding as list_findings returns it.
 type findingOut struct {
 	FindingID  string   `json:"finding_id"`
 	Repo       string   `json:"repo"`
 	PR         int      `json:"pr"`
 	Lens       string   `json:"lens"`
 	File       string   `json:"file,omitempty"`
 	Line       int      `json:"line,omitempty"`
 	Title      string   `json:"title"`
 	Models     []string `json:"models"`
 	Graded     bool     `json:"graded"`
 	IsReal     *bool    `json:"is_real,omitempty"`
 	Severity   string   `json:"severity,omitempty"`
 	Usefulness *int     `json:"usefulness,omitempty"`
 	Grader     string   `json:"grader,omitempty"`
 }
 // gradeReq is the POST body for /findings/{id}/grade. grader is always "claude".
 type gradeReq struct {
 	IsReal     bool   `json:"is_real"`
 	Severity   string `json:"severity,omitempty"`
 	Usefulness *int   `json:"usefulness,omitempty"`
 	Notes      string `json:"notes,omitempty"`
 	Grader     string `json:"grader"`
 }
 // ---- thin HTTP client to the gadfly-reports daemon ----
 type client struct {
 	base  string
 	token string
 	hc    *http.Client
 }
 func newClient(base, token string) *client {
 	return &client{
 		base:  strings.TrimRight(base, "/"),
 		token: token,
 		hc:    &http.Client{Timeout: 30 * time.Second},
 	}
 }
 // do issues a request, attaching the bearer token if configured, and returns the
 // response body. A non-2xx status becomes an error carrying the body (which the
 // daemon shapes as {"error":...}).
 func (c *client) do(ctx context.Context, method, path string, body io.Reader) ([]byte, error) {
 	req, err := http.NewRequestWithContext(ctx, method, c.base+path, body)
 	if err != nil {
 		return nil, err
 	}
 	if body != nil {
 		req.Header.Set("Content-Type", "application/json")
 	}
 	if c.token != "" {
 		req.Header.Set("Authorization", "Bearer "+c.token)
 	}
 	resp, err := c.hc.Do(req)
 	if err != nil {
 		return nil, err
 	}
 	defer resp.Body.Close()
 	b, err := io.ReadAll(resp.Body)
 	if err != nil {
 		return nil, err
 	}
 	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
 		return nil, fmt.Errorf("gadfly-reports %s %s: %s: %s", method, path, resp.Status, strings.TrimSpace(string(b)))
 	}
 	return b, nil
 }
 func (c *client) getJSON(ctx context.Context, path string, out any) error {
 	b, err := c.do(ctx, http.MethodGet, path, nil)
 	if err != nil {
 		return err
 	}
 	return json.Unmarshal(b, out)
 }
 func (c *client) postJSON(ctx context.Context, path string, in any) ([]byte, error) {
 	buf, err := json.Marshal(in)
 	if err != nil {
 		return nil, err
 	}
 	return c.do(ctx, http.MethodPost, path, bytes.NewReader(buf))
 }
 // ---- core logic (kept free of MCP types so it is directly testable) ----
 // groupFindings collapses export rows (one per reporting model) into one entry
 // per finding_id, preserving first-seen order, with distinct reporting models.
 // Filters: repo (exact, when non-empty), pr (when non-nil), only-ungraded.
 func groupFindings(rows []exportRow, repo string, pr *int, onlyUngraded bool) []findingOut {
 	type acc struct {
 		out  *findingOut
 		seen map[string]bool
 	}
 	byID := map[string]*acc{}
 	var order []string
 	for _, r := range rows {
 		if repo != "" && r.Repo != repo {
 			continue
 		}
 		if pr != nil && r.PR != *pr {
 			continue
 		}
 		a, ok := byID[r.FindingID]
 		if !ok {
 			a = &acc{
 				out: &findingOut{
 					FindingID:  r.FindingID,
 					Repo:       r.Repo,
 					PR:         r.PR,
 					Lens:       r.Lens,
 					File:       r.File,
 					Line:       r.Line,
 					Title:      r.Title,
 					Graded:     r.Graded,
 					IsReal:     r.IsReal,
 					Severity:   r.Severity,
 					Usefulness: r.Usefulness,
 					Grader:     r.Grader,
 				},
 				seen: map[string]bool{},
 			}
 			byID[r.FindingID] = a
 			order = append(order, r.FindingID)
 		}
 		if r.Model != "" && !a.seen[r.Model] {
 			a.seen[r.Model] = true
 			a.out.Models = append(a.out.Models, r.Model)
 		}
 	}
 	out := make([]findingOut, 0, len(order))
 	for _, id := range order {
 		f := byID[id].out
 		if onlyUngraded && f.Graded {
 			continue
 		}
 		if f.Models == nil {
 			f.Models = []string{}
 		}
 		out = append(out, *f)
 	}
 	return out
 }
 // listFindings fetches /export, groups + filters it, and returns pretty JSON.
 func listFindings(ctx context.Context, c *client, repo string, pr *int, onlyUngraded bool) (string, error) {
 	var rows []exportRow
 	if err := c.getJSON(ctx, "/export", &rows); err != nil {
 		return "", err
 	}
 	return prettyJSON(groupFindings(rows, repo, pr, onlyUngraded))
 }
 // recordGrade POSTs a grade for findingID (grader forced to "claude").
 func recordGrade(ctx context.Context, c *client, findingID string, g gradeReq) (string, error) {
 	g.Grader = "claude"
 	path := "/findings/" + url.PathEscape(findingID) + "/grade"
 	b, err := c.postJSON(ctx, path, g)
 	if err != nil {
 		return "", err
 	}
 	verdict := "false positive"
 	if g.IsReal {
 		verdict = "real"
 		if g.Severity != "" {
 			verdict += " (" + g.Severity + ")"
 		}
 	}
 	return fmt.Sprintf("graded finding %s as %s [%s]", findingID, verdict, strings.TrimSpace(string(b))), nil
 }
 // scoreboard fetches /scoreboard, optionally narrows to one model, returns JSON.
 func scoreboard(ctx context.Context, c *client, model string) (string, error) {
 	var stats []modelStat
 	if err := c.getJSON(ctx, "/scoreboard", &stats); err != nil {
 		return "", err
 	}
 	if model != "" {
 		filtered := make([]modelStat, 0, 1)
 		for _, s := range stats {
 			if s.Model == model {
 				filtered = append(filtered, s)
 			}
 		}
 		stats = filtered
 	}
 	return prettyJSON(stats)
 }
 func prettyJSON(v any) (string, error) {
 	b, err := json.MarshalIndent(v, "", "  ")
 	if err != nil {
 		return "", err
 	}
 	return string(b), nil
 }
 func textResult(s string) *mcp.CallToolResult {
 	return &mcp.CallToolResult{Content: []mcp.Content{&mcp.TextContent{Text: s}}}
 }
 // ---- MCP tool input shapes (json/jsonschema tags drive the input schema) ----
 type listFindingsIn struct {
 	Repo         string `json:"repo,omitempty" jsonschema:"filter to this repository (exact match)"`
 	PR           *int   `json:"pr,omitempty" jsonschema:"filter to this pull request number"`
 	OnlyUngraded bool   `json:"only_ungraded,omitempty" jsonschema:"when true, return only findings that have no grade yet"`
 }
 type recordGradeIn struct {
 	FindingID  string `json:"finding_id" jsonschema:"the finding id to grade"`
 	IsReal     bool   `json:"is_real" jsonschema:"true if the finding is a genuine problem, false if a false positive"`
 	Severity   string `json:"severity,omitempty" jsonschema:"required when is_real is true: one of trivial, small, medium, high, critical; omit when is_real is false"`
 	Usefulness *int   `json:"usefulness,omitempty" jsonschema:"optional 1..5 rating of how useful the finding was"`
 	Notes      string `json:"notes,omitempty" jsonschema:"optional free-text rationale for the grade"`
 }
 type scoreboardIn struct {
 	Model string `json:"model,omitempty" jsonschema:"optional: narrow the scoreboard to a single model"`
 }
 func main() {
 	store := flag.String("store", envOr("GADFLY_REPORTS_URL", "http://localhost:8090"), "base URL of the gadfly-reports store daemon")
 	flag.Parse()
 	c := newClient(*store, os.Getenv("GADFLY_REPORTS_TOKEN"))
 	server := mcp.NewServer(&mcp.Implementation{Name: "gadfly-mcp", Version: "0.1.0"}, nil)
 	mcp.AddTool(server, &mcp.Tool{
 		Name:        "list_findings",
 		Description: "List Gadfly review findings from the gadfly-reports store, one entry per finding (reports from multiple models are grouped, with the distinct reporting models listed). Optionally filter by repo, pr, or only_ungraded to focus on findings that still need a grade.",
 	}, func(ctx context.Context, _ *mcp.CallToolRequest, in listFindingsIn) (*mcp.CallToolResult, any, error) {
 		out, err := listFindings(ctx, c, in.Repo, in.PR, in.OnlyUngraded)
 		if err != nil {
 			return nil, nil, err
 		}
 		return textResult(out), nil, nil
 	})
 	mcp.AddTool(server, &mcp.Tool{
 		Name:        "record_finding_grade",
 		Description: "Grade a single finding in the gadfly-reports store (grader is always \"claude\"). Set is_real=true with a severity (trivial|small|medium|high|critical) for a genuine problem, or is_real=false (omit severity) for a false positive.",
 	}, func(ctx context.Context, _ *mcp.CallToolRequest, in recordGradeIn) (*mcp.CallToolResult, any, error) {
 		if strings.TrimSpace(in.FindingID) == "" {
 			return nil, nil, fmt.Errorf("finding_id is required")
 		}
 		msg, err := recordGrade(ctx, c, in.FindingID, gradeReq{
 			IsReal:     in.IsReal,
 			Severity:   in.Severity,
 			Usefulness: in.Usefulness,
 			Notes:      in.Notes,
 		})
 		if err != nil {
 			return nil, nil, err
 		}
 		return textResult(msg), nil, nil
 	})
 	mcp.AddTool(server, &mcp.Tool{
 		Name:        "scoreboard",
 		Description: "Per-model rollup from the gadfly-reports store (runs, minutes, tokens, findings, confirmed/false-positive/ungraded counts, and a confirmed-by-severity histogram). NOTE: gadfly-reports stores no points; any points/value-per-minute ranking is computed CLIENT-SIDE by mapping severity to points and dividing by minutes. Optionally filter to a single model.",
 	}, func(ctx context.Context, _ *mcp.CallToolRequest, in scoreboardIn) (*mcp.CallToolResult, any, error) {
 		out, err := scoreboard(ctx, c, in.Model)
 		if err != nil {
 			return nil, nil, err
 		}
 		return textResult(out), nil, nil
 	})
 	if err := server.Run(context.Background(), &mcp.StdioTransport{}); err != nil {
 		log.Printf("gadfly-reports mcp: %v", err)
 		os.Exit(1)
 	}
 }
 func envOr(key, def string) string {
 	if v := os.Getenv(key); v != "" {
 		return v
 	}
 	return def
 }
@@ -0,0 +1,174 @@
 package main
 import (
 	"context"
 	"encoding/json"
 	"io"
 	"net/http"
 	"net/http/httptest"
 	"testing"
 )
 func intp(i int) *int    { return &i }
 func boolp(b bool) *bool { return &b }
 // sample /export rows: finding f1 reported by two models (one ungraded? no,
 // graded real), f2 reported once and graded false-positive, f3 ungraded, plus a
 // row for a different repo/pr to exercise filtering.
 func sampleRows() []exportRow {
 	return []exportRow{
 		{FindingID: "f1", Repo: "acme/widget", PR: 7, Lens: "security", File: "a.go", Line: 10, Title: "SQL injection", Model: "gpt-4o", Provider: "openai", Graded: true, IsReal: boolp(true), Severity: "high", Usefulness: intp(5), Grader: "claude"},
 		{FindingID: "f1", Repo: "acme/widget", PR: 7, Lens: "security", File: "a.go", Line: 10, Title: "SQL injection", Model: "qwen2.5-coder:7b", Provider: "ollama", Graded: true, IsReal: boolp(true), Severity: "high", Usefulness: intp(5), Grader: "claude"},
 		{FindingID: "f1", Repo: "acme/widget", PR: 7, Lens: "security", File: "a.go", Line: 10, Title: "SQL injection", Model: "gpt-4o", Provider: "openai", Graded: true, IsReal: boolp(true), Severity: "high", Usefulness: intp(5), Grader: "claude"}, // dup model -> deduped
 		{FindingID: "f2", Repo: "acme/widget", PR: 7, Lens: "correctness", File: "b.go", Line: 22, Title: "off by one", Model: "gpt-4o", Provider: "openai", Graded: true, IsReal: boolp(false), Grader: "claude"},
 		{FindingID: "f3", Repo: "acme/widget", PR: 7, Lens: "performance", File: "c.go", Line: 3, Title: "n+1 query", Model: "qwen2.5-coder:7b", Provider: "ollama", Graded: false},
 		{FindingID: "f4", Repo: "other/repo", PR: 99, Lens: "docs", File: "d.go", Line: 1, Title: "typo", Model: "gpt-4o", Provider: "openai", Graded: false},
 	}
 }
 func TestGroupFindings_GroupingAndDedup(t *testing.T) {
 	got := groupFindings(sampleRows(), "", nil, false)
 	if len(got) != 4 {
 		t.Fatalf("want 4 grouped findings, got %d", len(got))
 	}
 	var f1 *findingOut
 	for i := range got {
 		if got[i].FindingID == "f1" {
 			f1 = &got[i]
 		}
 	}
 	if f1 == nil {
 		t.Fatal("f1 missing")
 	}
 	if len(f1.Models) != 2 {
 		t.Fatalf("f1 should have 2 distinct models, got %v", f1.Models)
 	}
 	if f1.Models[0] != "gpt-4o" || f1.Models[1] != "qwen2.5-coder:7b" {
 		t.Fatalf("f1 model order/dedup wrong: %v", f1.Models)
 	}
 	if !f1.Graded || f1.IsReal == nil || !*f1.IsReal || f1.Severity != "high" {
 		t.Fatalf("f1 grade not propagated: %+v", f1)
 	}
 }
 func TestGroupFindings_FilterRepoAndPR(t *testing.T) {
 	got := groupFindings(sampleRows(), "other/repo", nil, false)
 	if len(got) != 1 || got[0].FindingID != "f4" {
 		t.Fatalf("repo filter failed: %+v", got)
 	}
 	pr := 99
 	got = groupFindings(sampleRows(), "", &pr, false)
 	if len(got) != 1 || got[0].FindingID != "f4" {
 		t.Fatalf("pr filter failed: %+v", got)
 	}
 	pr = 7
 	got = groupFindings(sampleRows(), "acme/widget", &pr, false)
 	if len(got) != 3 {
 		t.Fatalf("combined repo+pr filter want 3, got %d", len(got))
 	}
 }
 func TestGroupFindings_OnlyUngraded(t *testing.T) {
 	got := groupFindings(sampleRows(), "", nil, true)
 	ids := map[string]bool{}
 	for _, f := range got {
 		ids[f.FindingID] = true
 		if f.Graded {
 			t.Fatalf("only_ungraded returned a graded finding: %s", f.FindingID)
 		}
 	}
 	if !ids["f3"] || !ids["f4"] || ids["f1"] || ids["f2"] {
 		t.Fatalf("only_ungraded set wrong: %v", ids)
 	}
 }
 func TestListFindings_EmptyModelsIsArray(t *testing.T) {
 	// A row with no model should still produce models:[] (not null) in JSON.
 	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
 		_ = json.NewEncoder(w).Encode([]exportRow{{FindingID: "x", Repo: "r", PR: 1, Lens: "l", Title: "t"}})
 	}))
 	defer srv.Close()
 	out, err := listFindings(context.Background(), newClient(srv.URL, ""), "", nil, false)
 	if err != nil {
 		t.Fatal(err)
 	}
 	var parsed []findingOut
 	if err := json.Unmarshal([]byte(out), &parsed); err != nil {
 		t.Fatalf("output not valid JSON: %v\n%s", err, out)
 	}
 	if len(parsed) != 1 || parsed[0].Models == nil {
 		t.Fatalf("expected models:[] non-nil, got %+v", parsed)
 	}
 }
 func TestRecordGrade_PathBodyAndAuth(t *testing.T) {
 	var gotPath, gotAuth string
 	var gotBody gradeReq
 	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		gotPath = r.URL.Path
 		gotAuth = r.Header.Get("Authorization")
 		b, _ := io.ReadAll(r.Body)
 		_ = json.Unmarshal(b, &gotBody)
 		_ = json.NewEncoder(w).Encode(map[string]string{"finding_id": "abc123"})
 	}))
 	defer srv.Close()
 	c := newClient(srv.URL, "sekret")
 	msg, err := recordGrade(context.Background(), c, "abc123", gradeReq{IsReal: true, Severity: "high", Usefulness: intp(4)})
 	if err != nil {
 		t.Fatal(err)
 	}
 	if gotPath != "/findings/abc123/grade" {
 		t.Fatalf("wrong path: %s", gotPath)
 	}
 	if gotAuth != "Bearer sekret" {
 		t.Fatalf("auth header not sent: %q", gotAuth)
 	}
 	if gotBody.Grader != "claude" {
 		t.Fatalf("grader should be forced to claude, got %q", gotBody.Grader)
 	}
 	if !gotBody.IsReal || gotBody.Severity != "high" || gotBody.Usefulness == nil || *gotBody.Usefulness != 4 {
 		t.Fatalf("body not forwarded correctly: %+v", gotBody)
 	}
 	if msg == "" {
 		t.Fatal("expected a confirmation message")
 	}
 }
 func TestRecordGrade_StoreErrorSurfaced(t *testing.T) {
 	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
 		w.WriteHeader(http.StatusBadRequest)
 		_ = json.NewEncoder(w).Encode(map[string]string{"error": "unknown finding_id"})
 	}))
 	defer srv.Close()
 	_, err := recordGrade(context.Background(), newClient(srv.URL, ""), "nope", gradeReq{IsReal: false})
 	if err == nil {
 		t.Fatal("expected non-2xx to surface as an error")
 	}
 }
 func TestScoreboard_FilterByModel(t *testing.T) {
 	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
 		_ = json.NewEncoder(w).Encode([]modelStat{
 			{Model: "gpt-4o", Provider: "openai", Runs: 3, Findings: 10},
 			{Model: "qwen2.5-coder:7b", Provider: "ollama", Runs: 5, Findings: 4},
 		})
 	}))
 	defer srv.Close()
 	out, err := scoreboard(context.Background(), newClient(srv.URL, ""), "gpt-4o")
 	if err != nil {
 		t.Fatal(err)
 	}
 	var parsed []modelStat
 	if err := json.Unmarshal([]byte(out), &parsed); err != nil {
 		t.Fatal(err)
 	}
 	if len(parsed) != 1 || parsed[0].Model != "gpt-4o" {
 		t.Fatalf("model filter failed: %+v", parsed)
 	}
 }