feat: gadfly-mcp — MCP server for grading gadfly-reports findings
CI / test (push) Successful in 10m10s
CI / test (push) Successful in 10m10s
Thin, stateless stdio MCP server (official Go SDK) that exposes a gadfly-reports store to an MCP client (e.g. Claude). Tools: list_findings, record_finding_grade, scoreboard (grader forced to claude). Launch via 'go run ...@latest' — nothing to install. Core logic tested against httptest, no daemon required. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,26 @@
|
||||
name: CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main]
|
||||
pull_request:
|
||||
types: [opened, synchronize, reopened]
|
||||
workflow_dispatch: {}
|
||||
|
||||
jobs:
|
||||
test:
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 10
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version: "1.26"
|
||||
- name: Build
|
||||
run: go build ./...
|
||||
- name: Vet
|
||||
run: go vet ./...
|
||||
- name: gofmt
|
||||
run: test -z "$(gofmt -l .)" || { gofmt -l .; echo "gofmt needed"; exit 1; }
|
||||
- name: Test (race)
|
||||
run: go test -race ./...
|
||||
+2
-27
@@ -1,27 +1,2 @@
|
||||
# ---> Go
|
||||
# If you prefer the allow list template instead of the deny list, see community template:
|
||||
# https://github.com/github/gitignore/blob/main/community/Golang/Go.AllowList.gitignore
|
||||
#
|
||||
# Binaries for programs and plugins
|
||||
*.exe
|
||||
*.exe~
|
||||
*.dll
|
||||
*.so
|
||||
*.dylib
|
||||
|
||||
# Test binary, built with `go test -c`
|
||||
*.test
|
||||
|
||||
# Output of the go coverage tool, specifically when used with LiteIDE
|
||||
*.out
|
||||
|
||||
# Dependency directories (remove the comment below to include it)
|
||||
# vendor/
|
||||
|
||||
# Go workspace file
|
||||
go.work
|
||||
go.work.sum
|
||||
|
||||
# env file
|
||||
.env
|
||||
|
||||
# build output
|
||||
/gadfly-mcp
|
||||
|
||||
@@ -0,0 +1,50 @@
|
||||
# gadfly-mcp — Developer Guide
|
||||
|
||||
A stdio [MCP](https://modelcontextprotocol.io) server exposing the
|
||||
[gadfly-reports](https://gitea.stevedudenhoeffer.com/steve/gadfly-reports) findings store to an MCP
|
||||
client (e.g. Claude). It is a **thin, stateless HTTP client** to the store — it never opens SQLite
|
||||
and never imports the store's package.
|
||||
|
||||
> This is a public, **vibe-coded** project (built largely by an AI agent). Keep that honest in the
|
||||
> README; it's homelab-grade.
|
||||
|
||||
## Shape
|
||||
|
||||
- Single `main.go` (`package main`) at the repo root, so the launch path is just
|
||||
`go run gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest` — no `cmd/` subpath. This is the
|
||||
whole point: the client compiles + caches it on demand; nothing to install or manage.
|
||||
- Uses the official Go MCP SDK (`github.com/modelcontextprotocol/go-sdk`): `mcp.NewServer` →
|
||||
`mcp.AddTool[In,Out]` (input schemas inferred from struct + `jsonschema` tags) → `server.Run(ctx,
|
||||
&mcp.StdioTransport{})`.
|
||||
- Config: `--store` flag (default `$GADFLY_REPORTS_URL`, else `http://localhost:8090`); bearer token
|
||||
from `$GADFLY_REPORTS_TOKEN`, sent on every request when set.
|
||||
|
||||
## Contract with gadfly-reports
|
||||
|
||||
The store's HTTP/JSON API is the contract — its README is the **source of truth**. This client
|
||||
mirrors only the subset it needs with small local structs (`exportRow`, `modelStat`, `gradeReq`,
|
||||
…). If you change a field here, check it against gadfly-reports' `server.go`/`store.go`. Endpoints
|
||||
used: `GET /export`, `POST /findings/{id}/grade`, `GET /scoreboard`.
|
||||
|
||||
Three tools: `list_findings`, `record_finding_grade`, `scoreboard`. The grader is always forced to
|
||||
`"claude"`. The store holds **no points**; ranking by points/value-per-minute is a client concern —
|
||||
say so in the `scoreboard` tool description.
|
||||
|
||||
## Tests
|
||||
|
||||
The core logic (`groupFindings` / `listFindings` / `recordGrade` / `scoreboard`) is factored free of
|
||||
MCP types and tested against an `httptest.Server`, so tests need no real daemon. Keep it that way —
|
||||
add a test when you add a tool or change the grouping/filtering.
|
||||
|
||||
```sh
|
||||
go build ./...
|
||||
go vet ./...
|
||||
gofmt -l . # must be empty
|
||||
go test -race ./...
|
||||
```
|
||||
|
||||
## When making changes
|
||||
|
||||
- Keep this a thin client: no SQLite, no business logic the store should own.
|
||||
- Keep the launch path a root `package main` (don't move it under `cmd/`), so `go run …@latest`
|
||||
stays the one-liner the README documents.
|
||||
@@ -0,0 +1,21 @@
|
||||
MIT License
|
||||
|
||||
Copyright (c) 2026 Steve Dudenhoeffer
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
@@ -1,2 +1,62 @@
|
||||
# gadfly-mcp
|
||||
# 🪰🔌 gadfly-mcp
|
||||
|
||||
An [MCP](https://modelcontextprotocol.io) server that lets an MCP client (e.g. Claude) read and
|
||||
**grade** [Gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly) review findings stored in
|
||||
[gadfly-reports](https://gitea.stevedudenhoeffer.com/steve/gadfly-reports). It's a tiny, stateless
|
||||
stdio process — a thin HTTP client to the store — so there's nothing to install or manage: your MCP
|
||||
client launches it on demand with `go run …@latest`.
|
||||
|
||||
> ### 🤖 Heads up: this is a vibe-coded project
|
||||
> gadfly-mcp was built almost entirely by an AI agent (Claude Code) — code and docs. It's small and
|
||||
> tested, but treat it as homelab-grade. Issues and PRs welcome.
|
||||
|
||||
## Add it to Claude
|
||||
|
||||
The store (`gadfly-reports`) runs persistently somewhere; this MCP server is throwaway. Point your
|
||||
client at it via `go run` (first launch compiles + caches; needs Go + access to the module host):
|
||||
|
||||
```jsonc
|
||||
{
|
||||
"mcpServers": {
|
||||
"gadfly": {
|
||||
"command": "go",
|
||||
"args": [
|
||||
"run", "gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest",
|
||||
"--store", "https://gadfly-reports.your-host:8090"
|
||||
],
|
||||
"env": { "GADFLY_REPORTS_TOKEN": "the-same-token-the-store-uses" }
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
`--store` defaults to `$GADFLY_REPORTS_URL` (else `http://localhost:8090`). If the store requires a
|
||||
bearer token, set `GADFLY_REPORTS_TOKEN`.
|
||||
|
||||
## Tools
|
||||
|
||||
| Tool | Args | Does |
|
||||
|---|---|---|
|
||||
| `list_findings` | `repo?`, `pr?`, `only_ungraded?` | lists findings (one entry per finding; reports from multiple models grouped, distinct models listed) |
|
||||
| `record_finding_grade` | `finding_id`, `is_real`, `severity?`, `usefulness?`, `notes?` | records a triage grade (grader is always `claude`) |
|
||||
| `scoreboard` | `model?` | per-model rollup (runs, minutes, tokens, confirmed-by-severity histogram) |
|
||||
|
||||
`severity` is one of `trivial|small|medium|high|critical` (set it when `is_real=true`; omit for a
|
||||
false positive). **Points are not stored or returned** — gadfly-reports keeps raw facts, so any
|
||||
"value per minute / per token" ranking is computed client-side (map severity → points, divide by
|
||||
minutes). Use `scoreboard` for the raw material.
|
||||
|
||||
Typical flow: *"List the ungraded gadfly findings on PR 2, look into each against the code, and
|
||||
record a grade for each."*
|
||||
|
||||
## Build / test
|
||||
|
||||
```sh
|
||||
go build ./...
|
||||
go test ./...
|
||||
gofmt -l . # must be clean
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
MIT © 2026 Steve Dudenhoeffer.
|
||||
|
||||
@@ -0,0 +1,14 @@
|
||||
module gitea.stevedudenhoeffer.com/steve/gadfly-mcp
|
||||
|
||||
go 1.26
|
||||
|
||||
require github.com/modelcontextprotocol/go-sdk v1.6.1
|
||||
|
||||
require (
|
||||
github.com/google/jsonschema-go v0.4.3 // indirect
|
||||
github.com/segmentio/asm v1.1.3 // indirect
|
||||
github.com/segmentio/encoding v0.5.4 // indirect
|
||||
github.com/yosida95/uritemplate/v3 v3.0.2 // indirect
|
||||
golang.org/x/oauth2 v0.35.0 // indirect
|
||||
golang.org/x/sys v0.41.0 // indirect
|
||||
)
|
||||
@@ -0,0 +1,20 @@
|
||||
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
|
||||
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
|
||||
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
|
||||
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
|
||||
github.com/google/jsonschema-go v0.4.3 h1:/DBOLZTfDow7pe2GmaJNhltueGTtDKICi8V8p+DQPd0=
|
||||
github.com/google/jsonschema-go v0.4.3/go.mod h1:r5quNTdLOYEz95Ru18zA0ydNbBuYoo9tgaYcxEYhJVE=
|
||||
github.com/modelcontextprotocol/go-sdk v1.6.1 h1:0zOSupjKUxPKSocPT1Wtago+mUHU2/uZ4xSOY0FGReU=
|
||||
github.com/modelcontextprotocol/go-sdk v1.6.1/go.mod h1:kzm3kzFL1/+AziGOE0nUs3gvPoNxMCvkxokMkuFapXQ=
|
||||
github.com/segmentio/asm v1.1.3 h1:WM03sfUOENvvKexOLp+pCqgb/WDjsi7EK8gIsICtzhc=
|
||||
github.com/segmentio/asm v1.1.3/go.mod h1:Ld3L4ZXGNcSLRg4JBsZ3//1+f/TjYl0Mzen/DQy1EJg=
|
||||
github.com/segmentio/encoding v0.5.4 h1:OW1VRern8Nw6ITAtwSZ7Idrl3MXCFwXHPgqESYfvNt0=
|
||||
github.com/segmentio/encoding v0.5.4/go.mod h1:HS1ZKa3kSN32ZHVZ7ZLPLXWvOVIiZtyJnO1gPH1sKt0=
|
||||
github.com/yosida95/uritemplate/v3 v3.0.2 h1:Ed3Oyj9yrmi9087+NczuL5BwkIc4wvTb5zIM+UJPGz4=
|
||||
github.com/yosida95/uritemplate/v3 v3.0.2/go.mod h1:ILOh0sOhIJR3+L/8afwt/kE++YT040gmv5BQTMR2HP4=
|
||||
golang.org/x/oauth2 v0.35.0 h1:Mv2mzuHuZuY2+bkyWXIHMfhNdJAdwW3FuWeCPYN5GVQ=
|
||||
golang.org/x/oauth2 v0.35.0/go.mod h1:lzm5WQJQwKZ3nwavOZ3IS5Aulzxi68dUSgRHujetwEA=
|
||||
golang.org/x/sys v0.41.0 h1:Ivj+2Cp/ylzLiEU89QhWblYnOE9zerudt9Ftecq2C6k=
|
||||
golang.org/x/sys v0.41.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
|
||||
golang.org/x/tools v0.42.0 h1:uNgphsn75Tdz5Ji2q36v/nsFSfR/9BRFvqhGBaJGd5k=
|
||||
golang.org/x/tools v0.42.0/go.mod h1:Ma6lCIwGZvHK6XtgbswSoWroEkhugApmsXyrUmBhfr0=
|
||||
@@ -0,0 +1,362 @@
|
||||
// Command gadfly-mcp is a stdio MCP server that exposes a gadfly-reports findings
|
||||
// store to an MCP client (e.g. Claude). It is a THIN HTTP client to the gadfly-reports daemon: it
|
||||
// never opens the SQLite database directly and does not import the daemon's
|
||||
// package, so it mirrors the store's JSON shapes with small local structs.
|
||||
//
|
||||
// Launch it with:
|
||||
//
|
||||
// go run gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest
|
||||
//
|
||||
// Configuration:
|
||||
//
|
||||
// --store base URL of the gadfly-reports daemon
|
||||
// (default $GADFLY_REPORTS_URL, else http://localhost:8090)
|
||||
// $GADFLY_REPORTS_TOKEN if set, sent as "Authorization: Bearer <token>" on every request
|
||||
//
|
||||
// Tools: list_findings, record_finding_grade, scoreboard. The grader is always
|
||||
// "claude". gadfly-reports stores no points; the scoreboard tool's points are a
|
||||
// client-side concern (severity -> points, divided by minutes).
|
||||
package main
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"context"
|
||||
"encoding/json"
|
||||
"flag"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"net/http"
|
||||
"net/url"
|
||||
"os"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/modelcontextprotocol/go-sdk/mcp"
|
||||
)
|
||||
|
||||
// ---- local mirrors of the store's JSON shapes (see ../../store.go) ----
|
||||
|
||||
// exportRow mirrors store.ExportRow: one report joined with its finding, run
|
||||
// timing, and latest grade. Many rows can share a finding_id (one per reporting
|
||||
// model), which is why list_findings groups them.
|
||||
type exportRow struct {
|
||||
FindingID string `json:"finding_id"`
|
||||
Repo string `json:"repo"`
|
||||
PR int `json:"pr"`
|
||||
Lens string `json:"lens"`
|
||||
File string `json:"file"`
|
||||
Line int `json:"line"`
|
||||
Title string `json:"title"`
|
||||
Model string `json:"model"`
|
||||
Provider string `json:"provider"`
|
||||
RunID string `json:"run_id"`
|
||||
Graded bool `json:"graded"`
|
||||
IsReal *bool `json:"is_real"`
|
||||
Severity string `json:"severity"`
|
||||
Usefulness *int `json:"usefulness"`
|
||||
Grader string `json:"grader"`
|
||||
}
|
||||
|
||||
// modelStat mirrors store.ModelStat: the points-free per-model rollup.
|
||||
type modelStat struct {
|
||||
Model string `json:"model"`
|
||||
Provider string `json:"provider"`
|
||||
Runs int `json:"runs"`
|
||||
Minutes float64 `json:"minutes"`
|
||||
InputTokens int64 `json:"input_tokens"`
|
||||
OutputTokens int64 `json:"output_tokens"`
|
||||
Findings int `json:"findings"`
|
||||
Confirmed int `json:"confirmed"`
|
||||
FalsePositive int `json:"false_positive"`
|
||||
Ungraded int `json:"ungraded"`
|
||||
BySeverity map[string]int `json:"by_severity"`
|
||||
}
|
||||
|
||||
// findingOut is one grouped finding as list_findings returns it.
|
||||
type findingOut struct {
|
||||
FindingID string `json:"finding_id"`
|
||||
Repo string `json:"repo"`
|
||||
PR int `json:"pr"`
|
||||
Lens string `json:"lens"`
|
||||
File string `json:"file,omitempty"`
|
||||
Line int `json:"line,omitempty"`
|
||||
Title string `json:"title"`
|
||||
Models []string `json:"models"`
|
||||
Graded bool `json:"graded"`
|
||||
IsReal *bool `json:"is_real,omitempty"`
|
||||
Severity string `json:"severity,omitempty"`
|
||||
Usefulness *int `json:"usefulness,omitempty"`
|
||||
Grader string `json:"grader,omitempty"`
|
||||
}
|
||||
|
||||
// gradeReq is the POST body for /findings/{id}/grade. grader is always "claude".
|
||||
type gradeReq struct {
|
||||
IsReal bool `json:"is_real"`
|
||||
Severity string `json:"severity,omitempty"`
|
||||
Usefulness *int `json:"usefulness,omitempty"`
|
||||
Notes string `json:"notes,omitempty"`
|
||||
Grader string `json:"grader"`
|
||||
}
|
||||
|
||||
// ---- thin HTTP client to the gadfly-reports daemon ----
|
||||
|
||||
type client struct {
|
||||
base string
|
||||
token string
|
||||
hc *http.Client
|
||||
}
|
||||
|
||||
func newClient(base, token string) *client {
|
||||
return &client{
|
||||
base: strings.TrimRight(base, "/"),
|
||||
token: token,
|
||||
hc: &http.Client{Timeout: 30 * time.Second},
|
||||
}
|
||||
}
|
||||
|
||||
// do issues a request, attaching the bearer token if configured, and returns the
|
||||
// response body. A non-2xx status becomes an error carrying the body (which the
|
||||
// daemon shapes as {"error":...}).
|
||||
func (c *client) do(ctx context.Context, method, path string, body io.Reader) ([]byte, error) {
|
||||
req, err := http.NewRequestWithContext(ctx, method, c.base+path, body)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if body != nil {
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
}
|
||||
if c.token != "" {
|
||||
req.Header.Set("Authorization", "Bearer "+c.token)
|
||||
}
|
||||
resp, err := c.hc.Do(req)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
b, err := io.ReadAll(resp.Body)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||
return nil, fmt.Errorf("gadfly-reports %s %s: %s: %s", method, path, resp.Status, strings.TrimSpace(string(b)))
|
||||
}
|
||||
return b, nil
|
||||
}
|
||||
|
||||
func (c *client) getJSON(ctx context.Context, path string, out any) error {
|
||||
b, err := c.do(ctx, http.MethodGet, path, nil)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
return json.Unmarshal(b, out)
|
||||
}
|
||||
|
||||
func (c *client) postJSON(ctx context.Context, path string, in any) ([]byte, error) {
|
||||
buf, err := json.Marshal(in)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return c.do(ctx, http.MethodPost, path, bytes.NewReader(buf))
|
||||
}
|
||||
|
||||
// ---- core logic (kept free of MCP types so it is directly testable) ----
|
||||
|
||||
// groupFindings collapses export rows (one per reporting model) into one entry
|
||||
// per finding_id, preserving first-seen order, with distinct reporting models.
|
||||
// Filters: repo (exact, when non-empty), pr (when non-nil), only-ungraded.
|
||||
func groupFindings(rows []exportRow, repo string, pr *int, onlyUngraded bool) []findingOut {
|
||||
type acc struct {
|
||||
out *findingOut
|
||||
seen map[string]bool
|
||||
}
|
||||
byID := map[string]*acc{}
|
||||
var order []string
|
||||
|
||||
for _, r := range rows {
|
||||
if repo != "" && r.Repo != repo {
|
||||
continue
|
||||
}
|
||||
if pr != nil && r.PR != *pr {
|
||||
continue
|
||||
}
|
||||
a, ok := byID[r.FindingID]
|
||||
if !ok {
|
||||
a = &acc{
|
||||
out: &findingOut{
|
||||
FindingID: r.FindingID,
|
||||
Repo: r.Repo,
|
||||
PR: r.PR,
|
||||
Lens: r.Lens,
|
||||
File: r.File,
|
||||
Line: r.Line,
|
||||
Title: r.Title,
|
||||
Graded: r.Graded,
|
||||
IsReal: r.IsReal,
|
||||
Severity: r.Severity,
|
||||
Usefulness: r.Usefulness,
|
||||
Grader: r.Grader,
|
||||
},
|
||||
seen: map[string]bool{},
|
||||
}
|
||||
byID[r.FindingID] = a
|
||||
order = append(order, r.FindingID)
|
||||
}
|
||||
if r.Model != "" && !a.seen[r.Model] {
|
||||
a.seen[r.Model] = true
|
||||
a.out.Models = append(a.out.Models, r.Model)
|
||||
}
|
||||
}
|
||||
|
||||
out := make([]findingOut, 0, len(order))
|
||||
for _, id := range order {
|
||||
f := byID[id].out
|
||||
if onlyUngraded && f.Graded {
|
||||
continue
|
||||
}
|
||||
if f.Models == nil {
|
||||
f.Models = []string{}
|
||||
}
|
||||
out = append(out, *f)
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// listFindings fetches /export, groups + filters it, and returns pretty JSON.
|
||||
func listFindings(ctx context.Context, c *client, repo string, pr *int, onlyUngraded bool) (string, error) {
|
||||
var rows []exportRow
|
||||
if err := c.getJSON(ctx, "/export", &rows); err != nil {
|
||||
return "", err
|
||||
}
|
||||
return prettyJSON(groupFindings(rows, repo, pr, onlyUngraded))
|
||||
}
|
||||
|
||||
// recordGrade POSTs a grade for findingID (grader forced to "claude").
|
||||
func recordGrade(ctx context.Context, c *client, findingID string, g gradeReq) (string, error) {
|
||||
g.Grader = "claude"
|
||||
path := "/findings/" + url.PathEscape(findingID) + "/grade"
|
||||
b, err := c.postJSON(ctx, path, g)
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
verdict := "false positive"
|
||||
if g.IsReal {
|
||||
verdict = "real"
|
||||
if g.Severity != "" {
|
||||
verdict += " (" + g.Severity + ")"
|
||||
}
|
||||
}
|
||||
return fmt.Sprintf("graded finding %s as %s [%s]", findingID, verdict, strings.TrimSpace(string(b))), nil
|
||||
}
|
||||
|
||||
// scoreboard fetches /scoreboard, optionally narrows to one model, returns JSON.
|
||||
func scoreboard(ctx context.Context, c *client, model string) (string, error) {
|
||||
var stats []modelStat
|
||||
if err := c.getJSON(ctx, "/scoreboard", &stats); err != nil {
|
||||
return "", err
|
||||
}
|
||||
if model != "" {
|
||||
filtered := make([]modelStat, 0, 1)
|
||||
for _, s := range stats {
|
||||
if s.Model == model {
|
||||
filtered = append(filtered, s)
|
||||
}
|
||||
}
|
||||
stats = filtered
|
||||
}
|
||||
return prettyJSON(stats)
|
||||
}
|
||||
|
||||
func prettyJSON(v any) (string, error) {
|
||||
b, err := json.MarshalIndent(v, "", " ")
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
return string(b), nil
|
||||
}
|
||||
|
||||
func textResult(s string) *mcp.CallToolResult {
|
||||
return &mcp.CallToolResult{Content: []mcp.Content{&mcp.TextContent{Text: s}}}
|
||||
}
|
||||
|
||||
// ---- MCP tool input shapes (json/jsonschema tags drive the input schema) ----
|
||||
|
||||
type listFindingsIn struct {
|
||||
Repo string `json:"repo,omitempty" jsonschema:"filter to this repository (exact match)"`
|
||||
PR *int `json:"pr,omitempty" jsonschema:"filter to this pull request number"`
|
||||
OnlyUngraded bool `json:"only_ungraded,omitempty" jsonschema:"when true, return only findings that have no grade yet"`
|
||||
}
|
||||
|
||||
type recordGradeIn struct {
|
||||
FindingID string `json:"finding_id" jsonschema:"the finding id to grade"`
|
||||
IsReal bool `json:"is_real" jsonschema:"true if the finding is a genuine problem, false if a false positive"`
|
||||
Severity string `json:"severity,omitempty" jsonschema:"required when is_real is true: one of trivial, small, medium, high, critical; omit when is_real is false"`
|
||||
Usefulness *int `json:"usefulness,omitempty" jsonschema:"optional 1..5 rating of how useful the finding was"`
|
||||
Notes string `json:"notes,omitempty" jsonschema:"optional free-text rationale for the grade"`
|
||||
}
|
||||
|
||||
type scoreboardIn struct {
|
||||
Model string `json:"model,omitempty" jsonschema:"optional: narrow the scoreboard to a single model"`
|
||||
}
|
||||
|
||||
func main() {
|
||||
store := flag.String("store", envOr("GADFLY_REPORTS_URL", "http://localhost:8090"), "base URL of the gadfly-reports store daemon")
|
||||
flag.Parse()
|
||||
|
||||
c := newClient(*store, os.Getenv("GADFLY_REPORTS_TOKEN"))
|
||||
|
||||
server := mcp.NewServer(&mcp.Implementation{Name: "gadfly-mcp", Version: "0.1.0"}, nil)
|
||||
|
||||
mcp.AddTool(server, &mcp.Tool{
|
||||
Name: "list_findings",
|
||||
Description: "List Gadfly review findings from the gadfly-reports store, one entry per finding (reports from multiple models are grouped, with the distinct reporting models listed). Optionally filter by repo, pr, or only_ungraded to focus on findings that still need a grade.",
|
||||
}, func(ctx context.Context, _ *mcp.CallToolRequest, in listFindingsIn) (*mcp.CallToolResult, any, error) {
|
||||
out, err := listFindings(ctx, c, in.Repo, in.PR, in.OnlyUngraded)
|
||||
if err != nil {
|
||||
return nil, nil, err
|
||||
}
|
||||
return textResult(out), nil, nil
|
||||
})
|
||||
|
||||
mcp.AddTool(server, &mcp.Tool{
|
||||
Name: "record_finding_grade",
|
||||
Description: "Grade a single finding in the gadfly-reports store (grader is always \"claude\"). Set is_real=true with a severity (trivial|small|medium|high|critical) for a genuine problem, or is_real=false (omit severity) for a false positive.",
|
||||
}, func(ctx context.Context, _ *mcp.CallToolRequest, in recordGradeIn) (*mcp.CallToolResult, any, error) {
|
||||
if strings.TrimSpace(in.FindingID) == "" {
|
||||
return nil, nil, fmt.Errorf("finding_id is required")
|
||||
}
|
||||
msg, err := recordGrade(ctx, c, in.FindingID, gradeReq{
|
||||
IsReal: in.IsReal,
|
||||
Severity: in.Severity,
|
||||
Usefulness: in.Usefulness,
|
||||
Notes: in.Notes,
|
||||
})
|
||||
if err != nil {
|
||||
return nil, nil, err
|
||||
}
|
||||
return textResult(msg), nil, nil
|
||||
})
|
||||
|
||||
mcp.AddTool(server, &mcp.Tool{
|
||||
Name: "scoreboard",
|
||||
Description: "Per-model rollup from the gadfly-reports store (runs, minutes, tokens, findings, confirmed/false-positive/ungraded counts, and a confirmed-by-severity histogram). NOTE: gadfly-reports stores no points; any points/value-per-minute ranking is computed CLIENT-SIDE by mapping severity to points and dividing by minutes. Optionally filter to a single model.",
|
||||
}, func(ctx context.Context, _ *mcp.CallToolRequest, in scoreboardIn) (*mcp.CallToolResult, any, error) {
|
||||
out, err := scoreboard(ctx, c, in.Model)
|
||||
if err != nil {
|
||||
return nil, nil, err
|
||||
}
|
||||
return textResult(out), nil, nil
|
||||
})
|
||||
|
||||
if err := server.Run(context.Background(), &mcp.StdioTransport{}); err != nil {
|
||||
log.Printf("gadfly-reports mcp: %v", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
}
|
||||
|
||||
func envOr(key, def string) string {
|
||||
if v := os.Getenv(key); v != "" {
|
||||
return v
|
||||
}
|
||||
return def
|
||||
}
|
||||
+174
@@ -0,0 +1,174 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"io"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"testing"
|
||||
)
|
||||
|
||||
func intp(i int) *int { return &i }
|
||||
func boolp(b bool) *bool { return &b }
|
||||
|
||||
// sample /export rows: finding f1 reported by two models (one ungraded? no,
|
||||
// graded real), f2 reported once and graded false-positive, f3 ungraded, plus a
|
||||
// row for a different repo/pr to exercise filtering.
|
||||
func sampleRows() []exportRow {
|
||||
return []exportRow{
|
||||
{FindingID: "f1", Repo: "acme/widget", PR: 7, Lens: "security", File: "a.go", Line: 10, Title: "SQL injection", Model: "gpt-4o", Provider: "openai", Graded: true, IsReal: boolp(true), Severity: "high", Usefulness: intp(5), Grader: "claude"},
|
||||
{FindingID: "f1", Repo: "acme/widget", PR: 7, Lens: "security", File: "a.go", Line: 10, Title: "SQL injection", Model: "qwen2.5-coder:7b", Provider: "ollama", Graded: true, IsReal: boolp(true), Severity: "high", Usefulness: intp(5), Grader: "claude"},
|
||||
{FindingID: "f1", Repo: "acme/widget", PR: 7, Lens: "security", File: "a.go", Line: 10, Title: "SQL injection", Model: "gpt-4o", Provider: "openai", Graded: true, IsReal: boolp(true), Severity: "high", Usefulness: intp(5), Grader: "claude"}, // dup model -> deduped
|
||||
{FindingID: "f2", Repo: "acme/widget", PR: 7, Lens: "correctness", File: "b.go", Line: 22, Title: "off by one", Model: "gpt-4o", Provider: "openai", Graded: true, IsReal: boolp(false), Grader: "claude"},
|
||||
{FindingID: "f3", Repo: "acme/widget", PR: 7, Lens: "performance", File: "c.go", Line: 3, Title: "n+1 query", Model: "qwen2.5-coder:7b", Provider: "ollama", Graded: false},
|
||||
{FindingID: "f4", Repo: "other/repo", PR: 99, Lens: "docs", File: "d.go", Line: 1, Title: "typo", Model: "gpt-4o", Provider: "openai", Graded: false},
|
||||
}
|
||||
}
|
||||
|
||||
func TestGroupFindings_GroupingAndDedup(t *testing.T) {
|
||||
got := groupFindings(sampleRows(), "", nil, false)
|
||||
if len(got) != 4 {
|
||||
t.Fatalf("want 4 grouped findings, got %d", len(got))
|
||||
}
|
||||
var f1 *findingOut
|
||||
for i := range got {
|
||||
if got[i].FindingID == "f1" {
|
||||
f1 = &got[i]
|
||||
}
|
||||
}
|
||||
if f1 == nil {
|
||||
t.Fatal("f1 missing")
|
||||
}
|
||||
if len(f1.Models) != 2 {
|
||||
t.Fatalf("f1 should have 2 distinct models, got %v", f1.Models)
|
||||
}
|
||||
if f1.Models[0] != "gpt-4o" || f1.Models[1] != "qwen2.5-coder:7b" {
|
||||
t.Fatalf("f1 model order/dedup wrong: %v", f1.Models)
|
||||
}
|
||||
if !f1.Graded || f1.IsReal == nil || !*f1.IsReal || f1.Severity != "high" {
|
||||
t.Fatalf("f1 grade not propagated: %+v", f1)
|
||||
}
|
||||
}
|
||||
|
||||
func TestGroupFindings_FilterRepoAndPR(t *testing.T) {
|
||||
got := groupFindings(sampleRows(), "other/repo", nil, false)
|
||||
if len(got) != 1 || got[0].FindingID != "f4" {
|
||||
t.Fatalf("repo filter failed: %+v", got)
|
||||
}
|
||||
|
||||
pr := 99
|
||||
got = groupFindings(sampleRows(), "", &pr, false)
|
||||
if len(got) != 1 || got[0].FindingID != "f4" {
|
||||
t.Fatalf("pr filter failed: %+v", got)
|
||||
}
|
||||
|
||||
pr = 7
|
||||
got = groupFindings(sampleRows(), "acme/widget", &pr, false)
|
||||
if len(got) != 3 {
|
||||
t.Fatalf("combined repo+pr filter want 3, got %d", len(got))
|
||||
}
|
||||
}
|
||||
|
||||
func TestGroupFindings_OnlyUngraded(t *testing.T) {
|
||||
got := groupFindings(sampleRows(), "", nil, true)
|
||||
ids := map[string]bool{}
|
||||
for _, f := range got {
|
||||
ids[f.FindingID] = true
|
||||
if f.Graded {
|
||||
t.Fatalf("only_ungraded returned a graded finding: %s", f.FindingID)
|
||||
}
|
||||
}
|
||||
if !ids["f3"] || !ids["f4"] || ids["f1"] || ids["f2"] {
|
||||
t.Fatalf("only_ungraded set wrong: %v", ids)
|
||||
}
|
||||
}
|
||||
|
||||
func TestListFindings_EmptyModelsIsArray(t *testing.T) {
|
||||
// A row with no model should still produce models:[] (not null) in JSON.
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
|
||||
_ = json.NewEncoder(w).Encode([]exportRow{{FindingID: "x", Repo: "r", PR: 1, Lens: "l", Title: "t"}})
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
out, err := listFindings(context.Background(), newClient(srv.URL, ""), "", nil, false)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
var parsed []findingOut
|
||||
if err := json.Unmarshal([]byte(out), &parsed); err != nil {
|
||||
t.Fatalf("output not valid JSON: %v\n%s", err, out)
|
||||
}
|
||||
if len(parsed) != 1 || parsed[0].Models == nil {
|
||||
t.Fatalf("expected models:[] non-nil, got %+v", parsed)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRecordGrade_PathBodyAndAuth(t *testing.T) {
|
||||
var gotPath, gotAuth string
|
||||
var gotBody gradeReq
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
gotPath = r.URL.Path
|
||||
gotAuth = r.Header.Get("Authorization")
|
||||
b, _ := io.ReadAll(r.Body)
|
||||
_ = json.Unmarshal(b, &gotBody)
|
||||
_ = json.NewEncoder(w).Encode(map[string]string{"finding_id": "abc123"})
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
c := newClient(srv.URL, "sekret")
|
||||
msg, err := recordGrade(context.Background(), c, "abc123", gradeReq{IsReal: true, Severity: "high", Usefulness: intp(4)})
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if gotPath != "/findings/abc123/grade" {
|
||||
t.Fatalf("wrong path: %s", gotPath)
|
||||
}
|
||||
if gotAuth != "Bearer sekret" {
|
||||
t.Fatalf("auth header not sent: %q", gotAuth)
|
||||
}
|
||||
if gotBody.Grader != "claude" {
|
||||
t.Fatalf("grader should be forced to claude, got %q", gotBody.Grader)
|
||||
}
|
||||
if !gotBody.IsReal || gotBody.Severity != "high" || gotBody.Usefulness == nil || *gotBody.Usefulness != 4 {
|
||||
t.Fatalf("body not forwarded correctly: %+v", gotBody)
|
||||
}
|
||||
if msg == "" {
|
||||
t.Fatal("expected a confirmation message")
|
||||
}
|
||||
}
|
||||
|
||||
func TestRecordGrade_StoreErrorSurfaced(t *testing.T) {
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
|
||||
w.WriteHeader(http.StatusBadRequest)
|
||||
_ = json.NewEncoder(w).Encode(map[string]string{"error": "unknown finding_id"})
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
_, err := recordGrade(context.Background(), newClient(srv.URL, ""), "nope", gradeReq{IsReal: false})
|
||||
if err == nil {
|
||||
t.Fatal("expected non-2xx to surface as an error")
|
||||
}
|
||||
}
|
||||
|
||||
func TestScoreboard_FilterByModel(t *testing.T) {
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
|
||||
_ = json.NewEncoder(w).Encode([]modelStat{
|
||||
{Model: "gpt-4o", Provider: "openai", Runs: 3, Findings: 10},
|
||||
{Model: "qwen2.5-coder:7b", Provider: "ollama", Runs: 5, Findings: 4},
|
||||
})
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
out, err := scoreboard(context.Background(), newClient(srv.URL, ""), "gpt-4o")
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
var parsed []modelStat
|
||||
if err := json.Unmarshal([]byte(out), &parsed); err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(parsed) != 1 || parsed[0].Model != "gpt-4o" {
|
||||
t.Fatalf("model filter failed: %+v", parsed)
|
||||
}
|
||||
}
|
||||
Reference in New Issue
Block a user