feat: gadfly-mcp — MCP server for grading gadfly-reports findings
CI / test (push) Successful in 10m10s

Thin, stateless stdio MCP server (official Go SDK) that exposes a gadfly-reports store to an MCP client (e.g. Claude). Tools: list_findings, record_finding_grade, scoreboard (grader forced to claude). Launch via 'go run ...@latest' — nothing to install. Core logic tested against httptest, no daemon required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-26 23:55:24 -04:00
parent bb6bd209b4
commit f92e54e3ed
9 changed files with 730 additions and 28 deletions
+26
View File
@@ -0,0 +1,26 @@
name: CI
on:
push:
branches: [main]
pull_request:
types: [opened, synchronize, reopened]
workflow_dispatch: {}
jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: "1.26"
- name: Build
run: go build ./...
- name: Vet
run: go vet ./...
- name: gofmt
run: test -z "$(gofmt -l .)" || { gofmt -l .; echo "gofmt needed"; exit 1; }
- name: Test (race)
run: go test -race ./...
+2 -27
View File
@@ -1,27 +1,2 @@
# ---> Go # build output
# If you prefer the allow list template instead of the deny list, see community template: /gadfly-mcp
# https://github.com/github/gitignore/blob/main/community/Golang/Go.AllowList.gitignore
#
# Binaries for programs and plugins
*.exe
*.exe~
*.dll
*.so
*.dylib
# Test binary, built with `go test -c`
*.test
# Output of the go coverage tool, specifically when used with LiteIDE
*.out
# Dependency directories (remove the comment below to include it)
# vendor/
# Go workspace file
go.work
go.work.sum
# env file
.env
+50
View File
@@ -0,0 +1,50 @@
# gadfly-mcp — Developer Guide
A stdio [MCP](https://modelcontextprotocol.io) server exposing the
[gadfly-reports](https://gitea.stevedudenhoeffer.com/steve/gadfly-reports) findings store to an MCP
client (e.g. Claude). It is a **thin, stateless HTTP client** to the store — it never opens SQLite
and never imports the store's package.
> This is a public, **vibe-coded** project (built largely by an AI agent). Keep that honest in the
> README; it's homelab-grade.
## Shape
- Single `main.go` (`package main`) at the repo root, so the launch path is just
`go run gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest` — no `cmd/` subpath. This is the
whole point: the client compiles + caches it on demand; nothing to install or manage.
- Uses the official Go MCP SDK (`github.com/modelcontextprotocol/go-sdk`): `mcp.NewServer`
`mcp.AddTool[In,Out]` (input schemas inferred from struct + `jsonschema` tags) → `server.Run(ctx,
&mcp.StdioTransport{})`.
- Config: `--store` flag (default `$GADFLY_REPORTS_URL`, else `http://localhost:8090`); bearer token
from `$GADFLY_REPORTS_TOKEN`, sent on every request when set.
## Contract with gadfly-reports
The store's HTTP/JSON API is the contract — its README is the **source of truth**. This client
mirrors only the subset it needs with small local structs (`exportRow`, `modelStat`, `gradeReq`,
…). If you change a field here, check it against gadfly-reports' `server.go`/`store.go`. Endpoints
used: `GET /export`, `POST /findings/{id}/grade`, `GET /scoreboard`.
Three tools: `list_findings`, `record_finding_grade`, `scoreboard`. The grader is always forced to
`"claude"`. The store holds **no points**; ranking by points/value-per-minute is a client concern —
say so in the `scoreboard` tool description.
## Tests
The core logic (`groupFindings` / `listFindings` / `recordGrade` / `scoreboard`) is factored free of
MCP types and tested against an `httptest.Server`, so tests need no real daemon. Keep it that way —
add a test when you add a tool or change the grouping/filtering.
```sh
go build ./...
go vet ./...
gofmt -l . # must be empty
go test -race ./...
```
## When making changes
- Keep this a thin client: no SQLite, no business logic the store should own.
- Keep the launch path a root `package main` (don't move it under `cmd/`), so `go run …@latest`
stays the one-liner the README documents.
+21
View File
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Steve Dudenhoeffer
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+61 -1
View File
@@ -1,2 +1,62 @@
# gadfly-mcp # 🪰🔌 gadfly-mcp
An [MCP](https://modelcontextprotocol.io) server that lets an MCP client (e.g. Claude) read and
**grade** [Gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly) review findings stored in
[gadfly-reports](https://gitea.stevedudenhoeffer.com/steve/gadfly-reports). It's a tiny, stateless
stdio process — a thin HTTP client to the store — so there's nothing to install or manage: your MCP
client launches it on demand with `go run …@latest`.
> ### 🤖 Heads up: this is a vibe-coded project
> gadfly-mcp was built almost entirely by an AI agent (Claude Code) — code and docs. It's small and
> tested, but treat it as homelab-grade. Issues and PRs welcome.
## Add it to Claude
The store (`gadfly-reports`) runs persistently somewhere; this MCP server is throwaway. Point your
client at it via `go run` (first launch compiles + caches; needs Go + access to the module host):
```jsonc
{
"mcpServers": {
"gadfly": {
"command": "go",
"args": [
"run", "gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest",
"--store", "https://gadfly-reports.your-host:8090"
],
"env": { "GADFLY_REPORTS_TOKEN": "the-same-token-the-store-uses" }
}
}
}
```
`--store` defaults to `$GADFLY_REPORTS_URL` (else `http://localhost:8090`). If the store requires a
bearer token, set `GADFLY_REPORTS_TOKEN`.
## Tools
| Tool | Args | Does |
|---|---|---|
| `list_findings` | `repo?`, `pr?`, `only_ungraded?` | lists findings (one entry per finding; reports from multiple models grouped, distinct models listed) |
| `record_finding_grade` | `finding_id`, `is_real`, `severity?`, `usefulness?`, `notes?` | records a triage grade (grader is always `claude`) |
| `scoreboard` | `model?` | per-model rollup (runs, minutes, tokens, confirmed-by-severity histogram) |
`severity` is one of `trivial|small|medium|high|critical` (set it when `is_real=true`; omit for a
false positive). **Points are not stored or returned** — gadfly-reports keeps raw facts, so any
"value per minute / per token" ranking is computed client-side (map severity → points, divide by
minutes). Use `scoreboard` for the raw material.
Typical flow: *"List the ungraded gadfly findings on PR 2, look into each against the code, and
record a grade for each."*
## Build / test
```sh
go build ./...
go test ./...
gofmt -l . # must be clean
```
## License
MIT © 2026 Steve Dudenhoeffer.
+14
View File
@@ -0,0 +1,14 @@
module gitea.stevedudenhoeffer.com/steve/gadfly-mcp
go 1.26
require github.com/modelcontextprotocol/go-sdk v1.6.1
require (
github.com/google/jsonschema-go v0.4.3 // indirect
github.com/segmentio/asm v1.1.3 // indirect
github.com/segmentio/encoding v0.5.4 // indirect
github.com/yosida95/uritemplate/v3 v3.0.2 // indirect
golang.org/x/oauth2 v0.35.0 // indirect
golang.org/x/sys v0.41.0 // indirect
)
+20
View File
@@ -0,0 +1,20 @@
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
github.com/google/jsonschema-go v0.4.3 h1:/DBOLZTfDow7pe2GmaJNhltueGTtDKICi8V8p+DQPd0=
github.com/google/jsonschema-go v0.4.3/go.mod h1:r5quNTdLOYEz95Ru18zA0ydNbBuYoo9tgaYcxEYhJVE=
github.com/modelcontextprotocol/go-sdk v1.6.1 h1:0zOSupjKUxPKSocPT1Wtago+mUHU2/uZ4xSOY0FGReU=
github.com/modelcontextprotocol/go-sdk v1.6.1/go.mod h1:kzm3kzFL1/+AziGOE0nUs3gvPoNxMCvkxokMkuFapXQ=
github.com/segmentio/asm v1.1.3 h1:WM03sfUOENvvKexOLp+pCqgb/WDjsi7EK8gIsICtzhc=
github.com/segmentio/asm v1.1.3/go.mod h1:Ld3L4ZXGNcSLRg4JBsZ3//1+f/TjYl0Mzen/DQy1EJg=
github.com/segmentio/encoding v0.5.4 h1:OW1VRern8Nw6ITAtwSZ7Idrl3MXCFwXHPgqESYfvNt0=
github.com/segmentio/encoding v0.5.4/go.mod h1:HS1ZKa3kSN32ZHVZ7ZLPLXWvOVIiZtyJnO1gPH1sKt0=
github.com/yosida95/uritemplate/v3 v3.0.2 h1:Ed3Oyj9yrmi9087+NczuL5BwkIc4wvTb5zIM+UJPGz4=
github.com/yosida95/uritemplate/v3 v3.0.2/go.mod h1:ILOh0sOhIJR3+L/8afwt/kE++YT040gmv5BQTMR2HP4=
golang.org/x/oauth2 v0.35.0 h1:Mv2mzuHuZuY2+bkyWXIHMfhNdJAdwW3FuWeCPYN5GVQ=
golang.org/x/oauth2 v0.35.0/go.mod h1:lzm5WQJQwKZ3nwavOZ3IS5Aulzxi68dUSgRHujetwEA=
golang.org/x/sys v0.41.0 h1:Ivj+2Cp/ylzLiEU89QhWblYnOE9zerudt9Ftecq2C6k=
golang.org/x/sys v0.41.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/tools v0.42.0 h1:uNgphsn75Tdz5Ji2q36v/nsFSfR/9BRFvqhGBaJGd5k=
golang.org/x/tools v0.42.0/go.mod h1:Ma6lCIwGZvHK6XtgbswSoWroEkhugApmsXyrUmBhfr0=
+362
View File
@@ -0,0 +1,362 @@
// Command gadfly-mcp is a stdio MCP server that exposes a gadfly-reports findings
// store to an MCP client (e.g. Claude). It is a THIN HTTP client to the gadfly-reports daemon: it
// never opens the SQLite database directly and does not import the daemon's
// package, so it mirrors the store's JSON shapes with small local structs.
//
// Launch it with:
//
// go run gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest
//
// Configuration:
//
// --store base URL of the gadfly-reports daemon
// (default $GADFLY_REPORTS_URL, else http://localhost:8090)
// $GADFLY_REPORTS_TOKEN if set, sent as "Authorization: Bearer <token>" on every request
//
// Tools: list_findings, record_finding_grade, scoreboard. The grader is always
// "claude". gadfly-reports stores no points; the scoreboard tool's points are a
// client-side concern (severity -> points, divided by minutes).
package main
import (
"bytes"
"context"
"encoding/json"
"flag"
"fmt"
"io"
"log"
"net/http"
"net/url"
"os"
"strings"
"time"
"github.com/modelcontextprotocol/go-sdk/mcp"
)
// ---- local mirrors of the store's JSON shapes (see ../../store.go) ----
// exportRow mirrors store.ExportRow: one report joined with its finding, run
// timing, and latest grade. Many rows can share a finding_id (one per reporting
// model), which is why list_findings groups them.
type exportRow struct {
FindingID string `json:"finding_id"`
Repo string `json:"repo"`
PR int `json:"pr"`
Lens string `json:"lens"`
File string `json:"file"`
Line int `json:"line"`
Title string `json:"title"`
Model string `json:"model"`
Provider string `json:"provider"`
RunID string `json:"run_id"`
Graded bool `json:"graded"`
IsReal *bool `json:"is_real"`
Severity string `json:"severity"`
Usefulness *int `json:"usefulness"`
Grader string `json:"grader"`
}
// modelStat mirrors store.ModelStat: the points-free per-model rollup.
type modelStat struct {
Model string `json:"model"`
Provider string `json:"provider"`
Runs int `json:"runs"`
Minutes float64 `json:"minutes"`
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
Findings int `json:"findings"`
Confirmed int `json:"confirmed"`
FalsePositive int `json:"false_positive"`
Ungraded int `json:"ungraded"`
BySeverity map[string]int `json:"by_severity"`
}
// findingOut is one grouped finding as list_findings returns it.
type findingOut struct {
FindingID string `json:"finding_id"`
Repo string `json:"repo"`
PR int `json:"pr"`
Lens string `json:"lens"`
File string `json:"file,omitempty"`
Line int `json:"line,omitempty"`
Title string `json:"title"`
Models []string `json:"models"`
Graded bool `json:"graded"`
IsReal *bool `json:"is_real,omitempty"`
Severity string `json:"severity,omitempty"`
Usefulness *int `json:"usefulness,omitempty"`
Grader string `json:"grader,omitempty"`
}
// gradeReq is the POST body for /findings/{id}/grade. grader is always "claude".
type gradeReq struct {
IsReal bool `json:"is_real"`
Severity string `json:"severity,omitempty"`
Usefulness *int `json:"usefulness,omitempty"`
Notes string `json:"notes,omitempty"`
Grader string `json:"grader"`
}
// ---- thin HTTP client to the gadfly-reports daemon ----
type client struct {
base string
token string
hc *http.Client
}
func newClient(base, token string) *client {
return &client{
base: strings.TrimRight(base, "/"),
token: token,
hc: &http.Client{Timeout: 30 * time.Second},
}
}
// do issues a request, attaching the bearer token if configured, and returns the
// response body. A non-2xx status becomes an error carrying the body (which the
// daemon shapes as {"error":...}).
func (c *client) do(ctx context.Context, method, path string, body io.Reader) ([]byte, error) {
req, err := http.NewRequestWithContext(ctx, method, c.base+path, body)
if err != nil {
return nil, err
}
if body != nil {
req.Header.Set("Content-Type", "application/json")
}
if c.token != "" {
req.Header.Set("Authorization", "Bearer "+c.token)
}
resp, err := c.hc.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
b, err := io.ReadAll(resp.Body)
if err != nil {
return nil, err
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return nil, fmt.Errorf("gadfly-reports %s %s: %s: %s", method, path, resp.Status, strings.TrimSpace(string(b)))
}
return b, nil
}
func (c *client) getJSON(ctx context.Context, path string, out any) error {
b, err := c.do(ctx, http.MethodGet, path, nil)
if err != nil {
return err
}
return json.Unmarshal(b, out)
}
func (c *client) postJSON(ctx context.Context, path string, in any) ([]byte, error) {
buf, err := json.Marshal(in)
if err != nil {
return nil, err
}
return c.do(ctx, http.MethodPost, path, bytes.NewReader(buf))
}
// ---- core logic (kept free of MCP types so it is directly testable) ----
// groupFindings collapses export rows (one per reporting model) into one entry
// per finding_id, preserving first-seen order, with distinct reporting models.
// Filters: repo (exact, when non-empty), pr (when non-nil), only-ungraded.
func groupFindings(rows []exportRow, repo string, pr *int, onlyUngraded bool) []findingOut {
type acc struct {
out *findingOut
seen map[string]bool
}
byID := map[string]*acc{}
var order []string
for _, r := range rows {
if repo != "" && r.Repo != repo {
continue
}
if pr != nil && r.PR != *pr {
continue
}
a, ok := byID[r.FindingID]
if !ok {
a = &acc{
out: &findingOut{
FindingID: r.FindingID,
Repo: r.Repo,
PR: r.PR,
Lens: r.Lens,
File: r.File,
Line: r.Line,
Title: r.Title,
Graded: r.Graded,
IsReal: r.IsReal,
Severity: r.Severity,
Usefulness: r.Usefulness,
Grader: r.Grader,
},
seen: map[string]bool{},
}
byID[r.FindingID] = a
order = append(order, r.FindingID)
}
if r.Model != "" && !a.seen[r.Model] {
a.seen[r.Model] = true
a.out.Models = append(a.out.Models, r.Model)
}
}
out := make([]findingOut, 0, len(order))
for _, id := range order {
f := byID[id].out
if onlyUngraded && f.Graded {
continue
}
if f.Models == nil {
f.Models = []string{}
}
out = append(out, *f)
}
return out
}
// listFindings fetches /export, groups + filters it, and returns pretty JSON.
func listFindings(ctx context.Context, c *client, repo string, pr *int, onlyUngraded bool) (string, error) {
var rows []exportRow
if err := c.getJSON(ctx, "/export", &rows); err != nil {
return "", err
}
return prettyJSON(groupFindings(rows, repo, pr, onlyUngraded))
}
// recordGrade POSTs a grade for findingID (grader forced to "claude").
func recordGrade(ctx context.Context, c *client, findingID string, g gradeReq) (string, error) {
g.Grader = "claude"
path := "/findings/" + url.PathEscape(findingID) + "/grade"
b, err := c.postJSON(ctx, path, g)
if err != nil {
return "", err
}
verdict := "false positive"
if g.IsReal {
verdict = "real"
if g.Severity != "" {
verdict += " (" + g.Severity + ")"
}
}
return fmt.Sprintf("graded finding %s as %s [%s]", findingID, verdict, strings.TrimSpace(string(b))), nil
}
// scoreboard fetches /scoreboard, optionally narrows to one model, returns JSON.
func scoreboard(ctx context.Context, c *client, model string) (string, error) {
var stats []modelStat
if err := c.getJSON(ctx, "/scoreboard", &stats); err != nil {
return "", err
}
if model != "" {
filtered := make([]modelStat, 0, 1)
for _, s := range stats {
if s.Model == model {
filtered = append(filtered, s)
}
}
stats = filtered
}
return prettyJSON(stats)
}
func prettyJSON(v any) (string, error) {
b, err := json.MarshalIndent(v, "", " ")
if err != nil {
return "", err
}
return string(b), nil
}
func textResult(s string) *mcp.CallToolResult {
return &mcp.CallToolResult{Content: []mcp.Content{&mcp.TextContent{Text: s}}}
}
// ---- MCP tool input shapes (json/jsonschema tags drive the input schema) ----
type listFindingsIn struct {
Repo string `json:"repo,omitempty" jsonschema:"filter to this repository (exact match)"`
PR *int `json:"pr,omitempty" jsonschema:"filter to this pull request number"`
OnlyUngraded bool `json:"only_ungraded,omitempty" jsonschema:"when true, return only findings that have no grade yet"`
}
type recordGradeIn struct {
FindingID string `json:"finding_id" jsonschema:"the finding id to grade"`
IsReal bool `json:"is_real" jsonschema:"true if the finding is a genuine problem, false if a false positive"`
Severity string `json:"severity,omitempty" jsonschema:"required when is_real is true: one of trivial, small, medium, high, critical; omit when is_real is false"`
Usefulness *int `json:"usefulness,omitempty" jsonschema:"optional 1..5 rating of how useful the finding was"`
Notes string `json:"notes,omitempty" jsonschema:"optional free-text rationale for the grade"`
}
type scoreboardIn struct {
Model string `json:"model,omitempty" jsonschema:"optional: narrow the scoreboard to a single model"`
}
func main() {
store := flag.String("store", envOr("GADFLY_REPORTS_URL", "http://localhost:8090"), "base URL of the gadfly-reports store daemon")
flag.Parse()
c := newClient(*store, os.Getenv("GADFLY_REPORTS_TOKEN"))
server := mcp.NewServer(&mcp.Implementation{Name: "gadfly-mcp", Version: "0.1.0"}, nil)
mcp.AddTool(server, &mcp.Tool{
Name: "list_findings",
Description: "List Gadfly review findings from the gadfly-reports store, one entry per finding (reports from multiple models are grouped, with the distinct reporting models listed). Optionally filter by repo, pr, or only_ungraded to focus on findings that still need a grade.",
}, func(ctx context.Context, _ *mcp.CallToolRequest, in listFindingsIn) (*mcp.CallToolResult, any, error) {
out, err := listFindings(ctx, c, in.Repo, in.PR, in.OnlyUngraded)
if err != nil {
return nil, nil, err
}
return textResult(out), nil, nil
})
mcp.AddTool(server, &mcp.Tool{
Name: "record_finding_grade",
Description: "Grade a single finding in the gadfly-reports store (grader is always \"claude\"). Set is_real=true with a severity (trivial|small|medium|high|critical) for a genuine problem, or is_real=false (omit severity) for a false positive.",
}, func(ctx context.Context, _ *mcp.CallToolRequest, in recordGradeIn) (*mcp.CallToolResult, any, error) {
if strings.TrimSpace(in.FindingID) == "" {
return nil, nil, fmt.Errorf("finding_id is required")
}
msg, err := recordGrade(ctx, c, in.FindingID, gradeReq{
IsReal: in.IsReal,
Severity: in.Severity,
Usefulness: in.Usefulness,
Notes: in.Notes,
})
if err != nil {
return nil, nil, err
}
return textResult(msg), nil, nil
})
mcp.AddTool(server, &mcp.Tool{
Name: "scoreboard",
Description: "Per-model rollup from the gadfly-reports store (runs, minutes, tokens, findings, confirmed/false-positive/ungraded counts, and a confirmed-by-severity histogram). NOTE: gadfly-reports stores no points; any points/value-per-minute ranking is computed CLIENT-SIDE by mapping severity to points and dividing by minutes. Optionally filter to a single model.",
}, func(ctx context.Context, _ *mcp.CallToolRequest, in scoreboardIn) (*mcp.CallToolResult, any, error) {
out, err := scoreboard(ctx, c, in.Model)
if err != nil {
return nil, nil, err
}
return textResult(out), nil, nil
})
if err := server.Run(context.Background(), &mcp.StdioTransport{}); err != nil {
log.Printf("gadfly-reports mcp: %v", err)
os.Exit(1)
}
}
func envOr(key, def string) string {
if v := os.Getenv(key); v != "" {
return v
}
return def
}
+174
View File
@@ -0,0 +1,174 @@
package main
import (
"context"
"encoding/json"
"io"
"net/http"
"net/http/httptest"
"testing"
)
func intp(i int) *int { return &i }
func boolp(b bool) *bool { return &b }
// sample /export rows: finding f1 reported by two models (one ungraded? no,
// graded real), f2 reported once and graded false-positive, f3 ungraded, plus a
// row for a different repo/pr to exercise filtering.
func sampleRows() []exportRow {
return []exportRow{
{FindingID: "f1", Repo: "acme/widget", PR: 7, Lens: "security", File: "a.go", Line: 10, Title: "SQL injection", Model: "gpt-4o", Provider: "openai", Graded: true, IsReal: boolp(true), Severity: "high", Usefulness: intp(5), Grader: "claude"},
{FindingID: "f1", Repo: "acme/widget", PR: 7, Lens: "security", File: "a.go", Line: 10, Title: "SQL injection", Model: "qwen2.5-coder:7b", Provider: "ollama", Graded: true, IsReal: boolp(true), Severity: "high", Usefulness: intp(5), Grader: "claude"},
{FindingID: "f1", Repo: "acme/widget", PR: 7, Lens: "security", File: "a.go", Line: 10, Title: "SQL injection", Model: "gpt-4o", Provider: "openai", Graded: true, IsReal: boolp(true), Severity: "high", Usefulness: intp(5), Grader: "claude"}, // dup model -> deduped
{FindingID: "f2", Repo: "acme/widget", PR: 7, Lens: "correctness", File: "b.go", Line: 22, Title: "off by one", Model: "gpt-4o", Provider: "openai", Graded: true, IsReal: boolp(false), Grader: "claude"},
{FindingID: "f3", Repo: "acme/widget", PR: 7, Lens: "performance", File: "c.go", Line: 3, Title: "n+1 query", Model: "qwen2.5-coder:7b", Provider: "ollama", Graded: false},
{FindingID: "f4", Repo: "other/repo", PR: 99, Lens: "docs", File: "d.go", Line: 1, Title: "typo", Model: "gpt-4o", Provider: "openai", Graded: false},
}
}
func TestGroupFindings_GroupingAndDedup(t *testing.T) {
got := groupFindings(sampleRows(), "", nil, false)
if len(got) != 4 {
t.Fatalf("want 4 grouped findings, got %d", len(got))
}
var f1 *findingOut
for i := range got {
if got[i].FindingID == "f1" {
f1 = &got[i]
}
}
if f1 == nil {
t.Fatal("f1 missing")
}
if len(f1.Models) != 2 {
t.Fatalf("f1 should have 2 distinct models, got %v", f1.Models)
}
if f1.Models[0] != "gpt-4o" || f1.Models[1] != "qwen2.5-coder:7b" {
t.Fatalf("f1 model order/dedup wrong: %v", f1.Models)
}
if !f1.Graded || f1.IsReal == nil || !*f1.IsReal || f1.Severity != "high" {
t.Fatalf("f1 grade not propagated: %+v", f1)
}
}
func TestGroupFindings_FilterRepoAndPR(t *testing.T) {
got := groupFindings(sampleRows(), "other/repo", nil, false)
if len(got) != 1 || got[0].FindingID != "f4" {
t.Fatalf("repo filter failed: %+v", got)
}
pr := 99
got = groupFindings(sampleRows(), "", &pr, false)
if len(got) != 1 || got[0].FindingID != "f4" {
t.Fatalf("pr filter failed: %+v", got)
}
pr = 7
got = groupFindings(sampleRows(), "acme/widget", &pr, false)
if len(got) != 3 {
t.Fatalf("combined repo+pr filter want 3, got %d", len(got))
}
}
func TestGroupFindings_OnlyUngraded(t *testing.T) {
got := groupFindings(sampleRows(), "", nil, true)
ids := map[string]bool{}
for _, f := range got {
ids[f.FindingID] = true
if f.Graded {
t.Fatalf("only_ungraded returned a graded finding: %s", f.FindingID)
}
}
if !ids["f3"] || !ids["f4"] || ids["f1"] || ids["f2"] {
t.Fatalf("only_ungraded set wrong: %v", ids)
}
}
func TestListFindings_EmptyModelsIsArray(t *testing.T) {
// A row with no model should still produce models:[] (not null) in JSON.
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
_ = json.NewEncoder(w).Encode([]exportRow{{FindingID: "x", Repo: "r", PR: 1, Lens: "l", Title: "t"}})
}))
defer srv.Close()
out, err := listFindings(context.Background(), newClient(srv.URL, ""), "", nil, false)
if err != nil {
t.Fatal(err)
}
var parsed []findingOut
if err := json.Unmarshal([]byte(out), &parsed); err != nil {
t.Fatalf("output not valid JSON: %v\n%s", err, out)
}
if len(parsed) != 1 || parsed[0].Models == nil {
t.Fatalf("expected models:[] non-nil, got %+v", parsed)
}
}
func TestRecordGrade_PathBodyAndAuth(t *testing.T) {
var gotPath, gotAuth string
var gotBody gradeReq
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
gotPath = r.URL.Path
gotAuth = r.Header.Get("Authorization")
b, _ := io.ReadAll(r.Body)
_ = json.Unmarshal(b, &gotBody)
_ = json.NewEncoder(w).Encode(map[string]string{"finding_id": "abc123"})
}))
defer srv.Close()
c := newClient(srv.URL, "sekret")
msg, err := recordGrade(context.Background(), c, "abc123", gradeReq{IsReal: true, Severity: "high", Usefulness: intp(4)})
if err != nil {
t.Fatal(err)
}
if gotPath != "/findings/abc123/grade" {
t.Fatalf("wrong path: %s", gotPath)
}
if gotAuth != "Bearer sekret" {
t.Fatalf("auth header not sent: %q", gotAuth)
}
if gotBody.Grader != "claude" {
t.Fatalf("grader should be forced to claude, got %q", gotBody.Grader)
}
if !gotBody.IsReal || gotBody.Severity != "high" || gotBody.Usefulness == nil || *gotBody.Usefulness != 4 {
t.Fatalf("body not forwarded correctly: %+v", gotBody)
}
if msg == "" {
t.Fatal("expected a confirmation message")
}
}
func TestRecordGrade_StoreErrorSurfaced(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusBadRequest)
_ = json.NewEncoder(w).Encode(map[string]string{"error": "unknown finding_id"})
}))
defer srv.Close()
_, err := recordGrade(context.Background(), newClient(srv.URL, ""), "nope", gradeReq{IsReal: false})
if err == nil {
t.Fatal("expected non-2xx to surface as an error")
}
}
func TestScoreboard_FilterByModel(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
_ = json.NewEncoder(w).Encode([]modelStat{
{Model: "gpt-4o", Provider: "openai", Runs: 3, Findings: 10},
{Model: "qwen2.5-coder:7b", Provider: "ollama", Runs: 5, Findings: 4},
})
}))
defer srv.Close()
out, err := scoreboard(context.Background(), newClient(srv.URL, ""), "gpt-4o")
if err != nil {
t.Fatal(err)
}
var parsed []modelStat
if err := json.Unmarshal([]byte(out), &parsed); err != nil {
t.Fatal(err)
}
if len(parsed) != 1 || parsed[0].Model != "gpt-4o" {
t.Fatalf("model filter failed: %+v", parsed)
}
}