Thin, stateless stdio MCP server (official Go SDK) that exposes a gadfly-reports store to an MCP client (e.g. Claude). Tools: list_findings, record_finding_grade, scoreboard (grader forced to claude). Launch via 'go run ...@latest'; nothing to install. Core logic tested against httptest, no daemon required.
🪰🔌 gadfly-mcp
An MCP server that lets an MCP client (e.g. Claude) read and
grade Gadfly review findings stored in
gadfly-reports. It's a tiny, stateless
stdio process — a thin HTTP client to the store — so there's nothing to install or manage: your MCP
client launches it on demand with go run …@latest.
🤖 Heads up: this is a vibe-coded project
gadfly-mcp was built almost entirely by an AI agent (Claude Code) — code and docs. It's small and tested, but treat it as homelab-grade. Issues and PRs welcome.
Add it to Claude
The store (gadfly-reports) runs persistently somewhere; this MCP server is throwaway. Have your
client launch it via go run (the first launch compiles and caches the binary; this requires Go and network access to the module host):
{
  "mcpServers": {
    "gadfly": {
      "command": "go",
      "args": [
        "run", "gitea.stevedudenhoeffer.com/steve/gadfly-mcp@latest",
        "--store", "https://gadfly-reports.your-host:8090"
      ],
      "env": { "GADFLY_REPORTS_TOKEN": "the-same-token-the-store-uses" }
    }
  }
}
--store defaults to $GADFLY_REPORTS_URL (else http://localhost:8090). If the store requires a
bearer token, set GADFLY_REPORTS_TOKEN.
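
The precedence described above can be sketched in Go as follows. This is a minimal illustration, not gadfly-mcp's actual code: the helper names resolveStoreURL and newStoreRequest and the /example path are hypothetical, and sending the token as an Authorization: Bearer header is an assumption based on the phrase "bearer token".

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// resolveStoreURL applies the documented precedence: an explicit --store
// value wins, then $GADFLY_REPORTS_URL, then the localhost default.
func resolveStoreURL(flagValue, envValue string) string {
	if flagValue != "" {
		return flagValue
	}
	if envValue != "" {
		return envValue
	}
	return "http://localhost:8090"
}

// newStoreRequest builds a GET request against the store and attaches the
// token only when one is configured. The Bearer header format is an assumption.
func newStoreRequest(baseURL, path, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	base := resolveStoreURL("", os.Getenv("GADFLY_REPORTS_URL"))
	// "/example" is a placeholder path, not a real gadfly-reports endpoint.
	req, err := newStoreRequest(base, "/example", os.Getenv("GADFLY_REPORTS_TOKEN"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Leaving the header off when no token is set keeps the same code path usable against a store that runs without authentication.
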
Tools
| Tool | Args | Does |
|---|---|---|
| list_findings | repo?, pr?, only_ungraded? | lists findings (one entry per finding; reports from multiple models grouped, distinct models listed) |
| record_finding_grade | finding_id, is_real, severity?, usefulness?, notes? | records a triage grade (grader is always claude) |
| scoreboard | model? | per-model rollup (runs, minutes, tokens, confirmed-by-severity histogram) |
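
To make the optional arguments (those marked with ?) concrete, here is a hedged Go sketch of how a caller might model the record_finding_grade arguments. Only the argument names come from the table; the GradeArgs type, the encodeArgs helper, the example ID, and the Go field types (finding_id as a string, usefulness as an integer) are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// GradeArgs mirrors the record_finding_grade argument names from the table.
// Field types are assumptions; optional arguments are dropped when unset.
type GradeArgs struct {
	FindingID  string `json:"finding_id"`
	IsReal     bool   `json:"is_real"`
	Severity   string `json:"severity,omitempty"`
	Usefulness *int   `json:"usefulness,omitempty"`
	Notes      string `json:"notes,omitempty"`
}

// encodeArgs serializes the arguments to the JSON object a tool call would carry.
func encodeArgs(a GradeArgs) (string, error) {
	b, err := json.Marshal(a)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// A false positive: is_real is false and severity is left out entirely.
	out, err := encodeArgs(GradeArgs{FindingID: "f-123", IsReal: false, Notes: "not reachable"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```
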
severity is one of trivial|small|medium|high|critical (set it when is_real=true; omit for a
false positive). Points are not stored or returned — gadfly-reports keeps raw facts, so any
"value per minute / per token" ranking is computed client-side (map severity → points, divide by
minutes). Use scoreboard for the raw material.
Typical flow: "List the ungraded gadfly findings on PR 2, look into each against the code, and record a grade for each."
Build / test
go build ./...
go test ./...
gofmt -l . # must be clean
License
MIT © 2026 Steve Dudenhoeffer.