feat: specialist suite — configurable + custom review lenses (one consolidated comment)
Build & push image / build-and-push (push) Successful in 8s

Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).

- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
  maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
  improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
  GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
  Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
  Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
  focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
  specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
  CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Steve Dudenhoeffer
2026-06-25 19:23:05 -04:00
parent 676c9d4f07
commit 7809d1b93d
13 changed files with 581 additions and 38 deletions
+17 -6
View File
@@ -29,11 +29,13 @@ verifies each one against the actual code, and posts its findings as a comment.
``` ```
cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout) cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout)
main.go agent orchestration: review pass + adversarial recheck pass, budgets main.go orchestration: loop specialists, each a review pass + adversarial recheck
specialists.go specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
consolidate.go verdict parsing + one-comment consolidation (a section per specialist)
model.go provider/model resolution (majordomo.Parse) + env endpoint aliases model.go provider/model resolution (majordomo.Parse) + env endpoint aliases
tools.go the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff) tools.go the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
recheck.go second-pass verification prompt + verdict recompute recheck.go second-pass verification prompt + verdict recompute
*_test.go sandbox, recheck, wrap-up, and spec/endpoint-parse unit tests *_test.go sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests
scripts/run.sh fetch PR diff+meta, run the binary, upsert ONE labeled PR comment scripts/run.sh fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
scripts/system-prompt.txt the reviewer persona + verification discipline (generic, not repo-specific) scripts/system-prompt.txt the reviewer persona + verification discipline (generic, not repo-specific)
entrypoint.sh container brains: trigger gating, PR clone, model loop (the logic that entrypoint.sh container brains: trigger gating, PR clone, model loop (the logic that
@@ -78,10 +80,19 @@ GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly
## Configuration ## Configuration
The full env reference lives in the **README** (`Models & providers` + `Configuration`). The full env reference lives in the **README** (`Specialists`, `Models & providers`,
Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`), `GADFLY_MODEL`/`GADFLY_MODELS`, `Configuration`). Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`),
`GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via `GADFLY_ENDPOINT_<NAME>` / `GADFLY_MODEL`/`GADFLY_MODELS`, `GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via
`GADFLY_ALIAS_<NAME>` (http-capable) and majordomo `LLM_*` DSNs (HTTPS-only). `GADFLY_ENDPOINT_<NAME>` / `GADFLY_ALIAS_<NAME>` (http-capable) and majordomo `LLM_*` DSNs
(HTTPS-only).
**Specialists (the swarm):** the reviewer runs a suite of focused lenses, one consolidated
comment with a section each. Default suite = security/correctness/maintainability/performance/
error-handling; opt-in built-ins = tests/docs/conventions/improvements. Select via
`GADFLY_SPECIALISTS` (csv or `all`); define/override via `GADFLY_SPECIALIST_<NAME>` env or a repo
`.gadfly.yml` (`specialists:` + `define:`). See `cmd/gadfly/specialists.go`. Cost ≈
specialists × models × 2 passes — keep the default model count low (entrypoint defaults to one).
Dynamic `auto` selection (a cheap model picks lenses per-diff) is the planned next step.
**Tested vs untested:** only the Ollama paths (local + OpenAI-compatible pointed at Ollama) **Tested vs untested:** only the Ollama paths (local + OpenAI-compatible pointed at Ollama)
are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are
+40
View File
@@ -102,6 +102,46 @@ which Gadfly also honors — but those are **HTTPS-only**, so for plaintext loca
> **Gitea Actions note:** repo `vars`/`secrets` aren't auto-exposed as env — add each alias to > **Gitea Actions note:** repo `vars`/`secrets` aren't auto-exposed as env — add each alias to
> the stub workflow's `env:` block, e.g. `GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}`. > the stub workflow's `env:` block, e.g. `GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}`.
## Specialists (the review swarm)
Instead of one generic reviewer, Gadfly runs a **suite of specialists** — each a focused lens
with its own review (+recheck) pass — and merges them into **one comment**, a collapsible
section per lens, led by an overall verdict (the worst across lenses; the optional
`improvements` lens never escalates it).
**Default suite** (when nothing is configured):
`security`, `correctness`, `maintainability` (code cleanliness), `performance`, `error-handling`.
**Also built in** (opt-in by name): `tests`, `docs`, `conventions`, and `improvements`
(strict & quiet — at most 12 high-value, non-blocking suggestions, silent otherwise).
Select which run with **`GADFLY_SPECIALISTS`** (comma-separated names, or `all`):
```yaml
GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"
```
**Define your own** — two ways, which compose (env overrides file overrides built-ins):
```yaml
# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>" (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"
```
```yaml
# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
- name: migrations
title: "🗃️ DB migrations"
focus: "Review schema migrations for destructive ops, missing indexes, table locks."
```
> **Cost:** each specialist is its own review+recheck, so cost ≈ *specialists × models × 2*.
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, and a future
> `auto` mode will let a cheap model pick only the lenses a given diff actually needs.
### Triggers ### Triggers
1. A **new/reopened/ready** non-draft PR — automatic. 1. A **new/reopened/ready** non-draft PR — automatic.
+94
View File
@@ -0,0 +1,94 @@
package main
import (
"fmt"
"strings"
)
// verdict is the severity a specialist (or the whole review) lands on.
type verdict int
const (
verdictUnknown verdict = iota // couldn't parse; treated below Minor for display, not blocking
verdictClean // "No material issues found"
verdictMinor // "Minor issues"
verdictBlocking // "Blocking issues found"
)
func (v verdict) label() string {
switch v {
case verdictClean:
return "No material issues found"
case verdictMinor:
return "Minor issues"
case verdictBlocking:
return "Blocking issues found"
default:
return "Reviewed"
}
}
// parseVerdict extracts a specialist's self-reported verdict from its output.
// The base prompt tells each lens to lead with one of the three phrases.
func parseVerdict(out string) verdict {
l := strings.ToLower(out)
switch {
case strings.Contains(l, "blocking issues found"):
return verdictBlocking
case strings.Contains(l, "no material issues"):
return verdictClean
case strings.Contains(l, "minor issues"):
return verdictMinor
default:
return verdictUnknown
}
}
// specialistResult pairs a specialist with its rendered review and verdict.
type specialistResult struct {
spec Specialist
out string
verdict verdict
}
// worstVerdict returns the most severe verdict across results. The optional
// "improvements" lens never escalates the overall verdict (it's advisory).
func worstVerdict(results []specialistResult) verdict {
worst := verdictClean
for _, r := range results {
if r.spec.Name == "improvements" {
continue
}
if r.verdict > worst {
worst = r.verdict
}
}
return worst
}
// renderConsolidated assembles the single comment body: an overall verdict line
// followed by one verbatim section per specialist. run.sh wraps this with the
// "🪰 Gadfly review — <model>" header and the advisory footer.
func renderConsolidated(results []specialistResult) string {
var b strings.Builder
fmt.Fprintf(&b, "**Verdict: %s** — %d reviewers: %s\n",
worstVerdict(results).label(), len(results), strings.Join(specialistNames(results), ", "))
for _, r := range results {
body := strings.TrimSpace(r.out)
if body == "" {
body = "_(no output)_"
}
fmt.Fprintf(&b, "\n<details><summary><b>%s</b> — %s</summary>\n\n%s\n\n</details>\n",
r.spec.Title, r.verdict.label(), body)
}
return strings.TrimRight(b.String(), "\n")
}
func specialistNames(results []specialistResult) []string {
names := make([]string, 0, len(results))
for _, r := range results {
names = append(names, r.spec.Name)
}
return names
}
+36 -15
View File
@@ -65,8 +65,10 @@ import (
) )
const ( const (
defaultMaxSteps = 24 defaultMaxSteps = 24
defaultTimeoutSecs = 300 // defaultTimeoutSecs is the overall deadline shared by ALL specialist passes
// in one invocation; the default suite runs several lenses sequentially.
defaultTimeoutSecs = 600
defaultMaxDiffChars = 60000 defaultMaxDiffChars = 60000
// defaultWrapUpReserve is how many steps before the cap the agent is told // defaultWrapUpReserve is how many steps before the cap the agent is told
// to stop investigating and write its final answer. Reserving a margin is // to stop investigating and write its final answer. Reserving a margin is
@@ -130,34 +132,53 @@ func run() error {
return fmt.Errorf("resolve model: %w", err) return fmt.Errorf("resolve model: %w", err)
} }
specialists, serrs := resolveSpecialists(repoDir)
for _, e := range serrs {
fmt.Fprintln(os.Stderr, "gadfly:", e)
}
if len(specialists) == 0 {
return errors.New("no specialists resolved to run")
}
timeout := time.Duration(envInt("GADFLY_TIMEOUT_SECS", defaultTimeoutSecs)) * time.Second timeout := time.Duration(envInt("GADFLY_TIMEOUT_SECS", defaultTimeoutSecs)) * time.Second
ctx, cancel := context.WithTimeout(context.Background(), timeout) ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel() defer cancel()
// Pass 1 — review: produce the draft. base := string(systemBytes)
draft, err := runAgent(ctx, mdl, fsTools, string(systemBytes), buildTask(diff), task := buildTask(diff)
envInt("GADFLY_MAX_STEPS", defaultMaxSteps)) results := make([]specialistResult, 0, len(specialists))
if err != nil { for _, sp := range specialists {
return fmt.Errorf("review pass: %w", err) out := reviewWithSpecialist(ctx, mdl, fsTools, base, sp, task, diff)
results = append(results, specialistResult{spec: sp, out: out, verdict: parseVerdict(out)})
}
fmt.Println(renderConsolidated(results))
return nil
}
// reviewWithSpecialist runs one lens end-to-end: a review pass under the
// specialist's composed prompt, then the shared adversarial recheck pass. A
// failed pass is rendered as an inline notice (advisory — one lens failing
// never sinks the others or the job).
func reviewWithSpecialist(ctx context.Context, mdl llm.Model, fsTools *repoFS, base string, sp Specialist, task, diff string) string {
draft, err := runAgent(ctx, mdl, fsTools, composeSpecialistPrompt(base, sp), task,
envInt("GADFLY_MAX_STEPS", defaultMaxSteps))
if err != nil {
fmt.Fprintf(os.Stderr, "gadfly: specialist %q review pass failed: %v\n", sp.Name, err)
return fmt.Sprintf("⚠️ This reviewer failed to complete: %v", err)
} }
// Pass 2 — recheck: adversarially re-verify the draft's findings and drop
// the unconfirmed ones. Skipped for a clean draft (nothing to verify) or
// when disabled. A recheck failure is non-fatal — we emit the unverified
// draft rather than losing the review entirely.
final := draft final := draft
if shouldRecheck(draft) { if shouldRecheck(draft) {
rechecked, rerr := runAgent(ctx, mdl, fsTools, recheckSystemPrompt, buildRecheckTask(draft, diff), rechecked, rerr := runAgent(ctx, mdl, fsTools, recheckSystemPrompt, buildRecheckTask(draft, diff),
envInt("GADFLY_RECHECK_MAX_STEPS", defaultRecheckMaxSteps)) envInt("GADFLY_RECHECK_MAX_STEPS", defaultRecheckMaxSteps))
if rerr != nil { if rerr != nil {
fmt.Fprintln(os.Stderr, "gadfly: recheck pass failed; emitting unverified draft:", rerr) fmt.Fprintf(os.Stderr, "gadfly: specialist %q recheck failed; emitting unverified draft: %v\n", sp.Name, rerr)
} else { } else {
final = rechecked final = rechecked
} }
} }
return final
fmt.Println(final)
return nil
} }
// runAgent runs one agent pass (its own fresh toolbox over the sandbox) and // runAgent runs one agent pass (its own fresh toolbox over the sandbox) and
+219
View File
@@ -0,0 +1,219 @@
package main
import (
"fmt"
"os"
"path/filepath"
"sort"
"strings"
"gopkg.in/yaml.v3"
)
// Specialist is one review lens: a focused instruction appended to the base
// system prompt, run as its own review (+recheck) pass and rendered as its own
// section in the consolidated comment.
type Specialist struct {
Name string // lowercase id, used in GADFLY_SPECIALISTS and section anchors
Title string // human label for the section header
Focus string // lens-specific instruction appended to the base prompt
}
// builtinSpecialists are the lenses Gadfly ships with. The default suite (below)
// is the "biggies"; the rest are opt-in by name via GADFLY_SPECIALISTS, a
// .gadfly.yml `specialists:` list, or `all`.
var builtinSpecialists = map[string]Specialist{
"security": {"security", "🔒 Security", "" +
"Authn/authz gaps, injection (SQL/command/path/template), SSRF, unsafe deserialization, " +
"secret/credential leakage, missing input validation, unsafe handling of untrusted data, " +
"and insecure defaults. Trace tainted input to where it's used."},
"correctness": {"correctness", "✅ Correctness", "" +
"Logic bugs and incorrect behavior introduced by the change. Pay special attention to " +
"SEMANTIC / domain correctness: do NOT trust a constant, conversion factor, formula, unit, " +
"or threshold because it looks reasonable — independently RE-DERIVE the expected value from " +
"first principles (units, dimensions, boundary values) and compare. A plausible-looking magic " +
"number is exactly where real bugs hide."},
"maintainability": {"maintainability", "🧹 Code cleanliness & maintainability", "" +
"Readability and structure: dead/duplicated code, confusing names, overly long or deeply " +
"nested functions, leaky abstractions, copy-paste that should be shared, and changes that " +
"don't follow the patterns the surrounding code already uses. Prefer concrete, low-churn fixes."},
"performance": {"performance", "⚡ Performance", "" +
"Efficiency regressions: N+1 queries, unnecessary allocations or copies, work inside hot loops, " +
"unbounded growth, missing pagination/limits, blocking calls on hot paths, and avoidable " +
"quadratic behavior. Only flag impact you can justify, not micro-optimizations."},
"error-handling": {"error-handling", "🧯 Error handling & edge cases", "" +
"Ignored or swallowed errors, missing cleanup/rollback/`defer`, panics on bad input, and " +
"unhandled edge cases: nil/null, empty collections, zero/negative, integer overflow, and " +
"boundary (off-by-one) conditions. Check the unhappy paths the diff introduces."},
// --- opt-in (not in the default suite) ---
"tests": {"tests", "🧪 Tests", "" +
"Whether the change is adequately tested: new logic without tests, changed behavior whose " +
"tests weren't updated, missing edge-case/error-path coverage, and brittle or assertion-free tests."},
"docs": {"docs", "📝 Docs", "" +
"Documentation drift: public APIs/flags/config changed without updating doc comments, README, " +
"or examples; stale comments now contradicting the code; missing docs for new exported surface."},
"conventions": {"conventions", "📐 Conventions", "" +
"Adherence to THIS repo's own conventions. Discover them — read any README / CONTRIBUTING / " +
"CLAUDE.md / AGENTS.md / lint config the repo ships, and hold the change to the documented and " +
"surrounding-code patterns."},
"improvements": {"improvements", "💡 Improvements (optional)", "" +
"OPTIONAL, non-blocking suggestions only. Surface at most 12 genuinely high-value blind spots " +
"or simplifications the author likely wants to know about. Be quiet: if nothing rises to that " +
"bar, reply exactly 'No suggestions.' Never nitpick, never pad. These never affect the verdict."},
}
// defaultSuite is used when nothing selects specialists explicitly.
var defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}
// fileConfig is the optional .gadfly.yml schema.
type fileConfig struct {
// Specialists selects which run (names). Empty => the default suite.
Specialists []string `yaml:"specialists"`
// Define adds or overrides specialist definitions.
Define []struct {
Name string `yaml:"name"`
Title string `yaml:"title"`
Focus string `yaml:"focus"`
} `yaml:"define"`
}
// resolveSpecialists builds the active specialist list from (in increasing
// precedence) the built-ins, a repo .gadfly.yml, and GADFLY_SPECIALIST_<NAME>
// env definitions. The active SELECTION is GADFLY_SPECIALISTS (csv or "all"),
// else the file's `specialists:`, else the default suite. Unknown names are
// reported as errors and skipped; valid ones still run.
func resolveSpecialists(repoDir string) ([]Specialist, []error) {
var errs []error
// 1. registry seeded with built-ins.
registry := make(map[string]Specialist, len(builtinSpecialists))
for k, v := range builtinSpecialists {
registry[k] = v
}
// 2. config file: definitions + (maybe) the selection.
var fileSelection []string
if cfg, ok, err := loadFileConfig(repoDir); err != nil {
errs = append(errs, err)
} else if ok {
for _, d := range cfg.Define {
name := strings.ToLower(strings.TrimSpace(d.Name))
if name == "" || strings.TrimSpace(d.Focus) == "" {
errs = append(errs, fmt.Errorf(".gadfly.yml: specialist define needs non-empty name and focus"))
continue
}
registry[name] = Specialist{Name: name, Title: titleOr(d.Title, name), Focus: d.Focus}
}
fileSelection = cfg.Specialists
}
// 3. env definitions: GADFLY_SPECIALIST_<NAME>="focus".
for _, kv := range os.Environ() {
key, val, _ := strings.Cut(kv, "=")
if !strings.HasPrefix(key, "GADFLY_SPECIALIST_") || len(key) == len("GADFLY_SPECIALIST_") {
continue
}
name := strings.ToLower(strings.TrimPrefix(key, "GADFLY_SPECIALIST_"))
if strings.TrimSpace(val) == "" {
errs = append(errs, fmt.Errorf("%s: empty specialist focus", key))
continue
}
existing, ok := registry[name]
title := titleOr("", name)
if ok {
title = existing.Title
}
registry[name] = Specialist{Name: name, Title: title, Focus: val}
}
// 4. selection: env > file > default suite.
var names []string
switch {
case strings.TrimSpace(os.Getenv("GADFLY_SPECIALISTS")) != "":
names = splitCSV(os.Getenv("GADFLY_SPECIALISTS"))
case len(fileSelection) > 0:
names = fileSelection
default:
names = defaultSuite
}
if len(names) == 1 && strings.EqualFold(names[0], "all") {
names = sortedKeys(registry)
}
// 5. resolve names -> specialists.
var out []Specialist
seen := map[string]bool{}
for _, raw := range names {
name := strings.ToLower(strings.TrimSpace(raw))
if name == "" || seen[name] {
continue
}
seen[name] = true
sp, ok := registry[name]
if !ok {
errs = append(errs, fmt.Errorf("unknown specialist %q (built-ins: %s; or define it via .gadfly.yml / GADFLY_SPECIALIST_%s)",
name, strings.Join(sortedKeys(builtinSpecialists), ", "), strings.ToUpper(name)))
continue
}
out = append(out, sp)
}
return out, errs
}
// composeSpecialistPrompt appends a specialist's lens to the base system prompt.
func composeSpecialistPrompt(base string, sp Specialist) string {
return strings.TrimRight(base, "\n") +
"\n\n## Your review lens: " + sp.Title + "\n" +
"Focus your review on this lens; other reviewers cover the rest. " + sp.Focus
}
func loadFileConfig(repoDir string) (fileConfig, bool, error) {
candidates := []string{}
if p := strings.TrimSpace(os.Getenv("GADFLY_CONFIG")); p != "" {
if !filepath.IsAbs(p) {
p = filepath.Join(repoDir, p)
}
candidates = append(candidates, p)
} else {
candidates = append(candidates, filepath.Join(repoDir, ".gadfly.yml"), filepath.Join(repoDir, ".gadfly.yaml"))
}
for _, p := range candidates {
data, err := os.ReadFile(p)
if err != nil {
continue // not found / unreadable: try next
}
var cfg fileConfig
if err := yaml.Unmarshal(data, &cfg); err != nil {
return fileConfig{}, false, fmt.Errorf("parse %s: %w", filepath.Base(p), err)
}
return cfg, true, nil
}
return fileConfig{}, false, nil
}
func titleOr(title, name string) string {
if strings.TrimSpace(title) != "" {
return title
}
return name
}
func splitCSV(s string) []string {
var out []string
for _, p := range strings.Split(s, ",") {
if t := strings.TrimSpace(p); t != "" {
out = append(out, t)
}
}
return out
}
func sortedKeys(m map[string]Specialist) []string {
keys := make([]string, 0, len(m))
for k := range m {
keys = append(keys, k)
}
sort.Strings(keys)
return keys
}
+118
View File
@@ -0,0 +1,118 @@
package main
import (
"os"
"path/filepath"
"testing"
)
func names(specs []Specialist) []string {
out := make([]string, len(specs))
for i, s := range specs {
out[i] = s.Name
}
return out
}
func eq(a, b []string) bool {
if len(a) != len(b) {
return false
}
for i := range a {
if a[i] != b[i] {
return false
}
}
return true
}
func TestResolveSpecialists_DefaultSuite(t *testing.T) {
t.Setenv("GADFLY_SPECIALISTS", "")
specs, errs := resolveSpecialists(t.TempDir())
if len(errs) != 0 {
t.Fatalf("unexpected errors: %v", errs)
}
if !eq(names(specs), defaultSuite) {
t.Errorf("default = %v, want %v", names(specs), defaultSuite)
}
}
func TestResolveSpecialists_EnvSelection(t *testing.T) {
t.Setenv("GADFLY_SPECIALISTS", "security, tests")
specs, errs := resolveSpecialists(t.TempDir())
if len(errs) != 0 {
t.Fatalf("unexpected errors: %v", errs)
}
if !eq(names(specs), []string{"security", "tests"}) {
t.Errorf("got %v", names(specs))
}
}
func TestResolveSpecialists_UnknownNameErrors(t *testing.T) {
t.Setenv("GADFLY_SPECIALISTS", "security,bogus")
specs, errs := resolveSpecialists(t.TempDir())
if len(errs) == 0 {
t.Fatal("expected an error for unknown specialist")
}
if !eq(names(specs), []string{"security"}) {
t.Errorf("valid ones should still resolve, got %v", names(specs))
}
}
func TestResolveSpecialists_EnvCustomDefinition(t *testing.T) {
t.Setenv("GADFLY_SPECIALIST_MIGRATIONS", "Review DB migrations for destructive ops.")
t.Setenv("GADFLY_SPECIALISTS", "migrations")
specs, errs := resolveSpecialists(t.TempDir())
if len(errs) != 0 {
t.Fatalf("unexpected errors: %v", errs)
}
if len(specs) != 1 || specs[0].Name != "migrations" || specs[0].Focus == "" {
t.Fatalf("custom specialist not registered: %+v", specs)
}
}
func TestResolveSpecialists_FileConfig(t *testing.T) {
dir := t.TempDir()
cfg := `specialists: [security, migrations]
define:
- name: migrations
title: "DB migrations"
focus: "Review schema migrations for destructive or unindexed changes."
`
if err := os.WriteFile(filepath.Join(dir, ".gadfly.yml"), []byte(cfg), 0o644); err != nil {
t.Fatal(err)
}
t.Setenv("GADFLY_SPECIALISTS", "") // let the file drive selection
specs, errs := resolveSpecialists(dir)
if len(errs) != 0 {
t.Fatalf("unexpected errors: %v", errs)
}
if !eq(names(specs), []string{"security", "migrations"}) {
t.Errorf("got %v", names(specs))
}
}
func TestParseVerdictAndWorst(t *testing.T) {
cases := map[string]verdict{
"VERDICT: No material issues found.": verdictClean,
"Minor issues\n- nit": verdictMinor,
"**Blocking issues found**": verdictBlocking,
"something unparseable": verdictUnknown,
}
for in, want := range cases {
if got := parseVerdict(in); got != want {
t.Errorf("parseVerdict(%q) = %v, want %v", in, got, want)
}
}
results := []specialistResult{
{spec: Specialist{Name: "security"}, verdict: verdictMinor},
{spec: Specialist{Name: "correctness"}, verdict: verdictBlocking},
{spec: Specialist{Name: "improvements"}, verdict: verdictBlocking}, // must not count
}
if w := worstVerdict(results[:2]); w != verdictBlocking {
t.Errorf("worst = %v, want blocking", w)
}
if w := worstVerdict([]specialistResult{results[0], results[2]}); w != verdictMinor {
t.Errorf("improvements should not escalate; worst = %v, want minor", w)
}
}
+3 -1
View File
@@ -41,7 +41,9 @@
# empty => fall back to "is a repo collaborator" # empty => fall back to "is a repo collaborator"
set -uo pipefail set -uo pipefail
DEFAULT_MODELS="qwen3-coder:480b-cloud,gpt-oss:120b-cloud" # One model by default: the specialist suite already provides breadth, so a
# multi-model default would multiply cost (models × specialists × 2 passes).
DEFAULT_MODELS="qwen3-coder:480b-cloud"
TRIGGER_PHRASE="${GADFLY_TRIGGER_PHRASE:-@gadfly review}" TRIGGER_PHRASE="${GADFLY_TRIGGER_PHRASE:-@gadfly review}"
SCRIPTS_DIR="/app/scripts" SCRIPTS_DIR="/app/scripts"
WORKDIR="${WORKDIR:-/tmp/gadfly}" WORKDIR="${WORKDIR:-/tmp/gadfly}"
+34
View File
@@ -0,0 +1,34 @@
# Optional per-repo Gadfly config. Place at the repo ROOT as `.gadfly.yml`
# (or point GADFLY_CONFIG at it). It's read from the PR's checked-out tree, so
# it's version-controlled and reviewed like any other file.
#
# Precedence for specialist DEFINITIONS: built-ins < this file < GADFLY_SPECIALIST_* env.
# Precedence for SELECTION: GADFLY_SPECIALISTS env > this file's `specialists:` > default suite.
# Which specialists run. Omit this key to use the default suite
# (security, correctness, maintainability, performance, error-handling).
# Built-in opt-ins you can add: tests, docs, conventions, improvements. Or "all".
specialists:
- security
- correctness
- maintainability
- tests
- migrations # a custom one, defined below
# Add new specialists, or override a built-in by reusing its name (e.g. give
# `security` a repo-specific focus). `focus` is appended to the base reviewer
# prompt as that lens's instruction.
define:
- name: migrations
title: "🗃️ DB migrations"
focus: >
Review database schema migrations for destructive operations (DROP/ALTER
that loses data), missing or unindexed foreign keys, non-idempotent steps,
and changes that would lock a large table on deploy. Check that any new
column added to a domain struct is wired through every storage layer.
- name: security
title: "🔒 Security (house rules)"
focus: >
In addition to the usual security review, this repo requires: all web
routes use the auth middleware, no secrets in logs, and all external HTTP
calls set a timeout. Flag any violation.
+1
View File
@@ -10,6 +10,7 @@ set the secrets/vars it references. Gadfly is advisory only — it never blocks
| [`local-ollama.yml`](local-ollama.yml) | a **local/LAN Ollama** daemon | nothing (or `GADFLY_BASE_URL` for a remote host) | | [`local-ollama.yml`](local-ollama.yml) | a **local/LAN Ollama** daemon | nothing (or `GADFLY_BASE_URL` for a remote host) |
| [`openai-compatible.yml`](openai-compatible.yml) | any **OpenAI-compatible** endpoint (local Ollama `/v1`, gateway, vLLM, OpenRouter…) | `GADFLY_BASE_URL` (+ a key for most gateways) | | [`openai-compatible.yml`](openai-compatible.yml) | any **OpenAI-compatible** endpoint (local Ollama `/v1`, gateway, vLLM, OpenRouter…) | `GADFLY_BASE_URL` (+ a key for most gateways) |
| [`endpoint-aliases.yml`](endpoint-aliases.yml) | **several named backends** at once (one comment each) | repo vars `GADFLY_ENDPOINT_<NAME>` | | [`endpoint-aliases.yml`](endpoint-aliases.yml) | **several named backends** at once (one comment each) | repo vars `GADFLY_ENDPOINT_<NAME>` |
| [`.gadfly.yml`](.gadfly.yml) | **per-repo specialist config** (not a workflow — goes at your repo root) | — |
Common to all: Common to all:
- **Triggers:** new/reopened/ready non-draft PR (auto), `@gadfly review` comment (allowed users), - **Triggers:** new/reopened/ready non-draft PR (auto), `@gadfly review` comment (allowed users),
+5
View File
@@ -43,6 +43,11 @@ jobs:
OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }} OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }}
OLLAMA_REVIEW_MODELS: ${{ vars.OLLAMA_REVIEW_MODELS }} OLLAMA_REVIEW_MODELS: ${{ vars.OLLAMA_REVIEW_MODELS }}
GADFLY_ALLOWED_USERS: ${{ vars.GADFLY_ALLOWED_USERS }} GADFLY_ALLOWED_USERS: ${{ vars.GADFLY_ALLOWED_USERS }}
# Specialist suite (optional). Empty = default suite
# (security,correctness,maintainability,performance,error-handling).
# csv to choose; "all" for everything; or define custom ones via a repo
# .gadfly.yml / GADFLY_SPECIALIST_<NAME>. See README "Specialists".
GADFLY_SPECIALISTS: ${{ vars.GADFLY_SPECIALISTS }}
# --- Models & providers (optional; default = Ollama Cloud) ---------- # --- Models & providers (optional; default = Ollama Cloud) ----------
# Gadfly is majordomo-powered, so it can target other backends. Set a # Gadfly is majordomo-powered, so it can target other backends. Set a
# provider for bare model ids; point at a different endpoint with a # provider for bare model ids; point at a different endpoint with a
+4 -1
View File
@@ -2,7 +2,10 @@ module gitea.stevedudenhoeffer.com/steve/gadfly
go 1.26.2 go 1.26.2
require gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b require (
gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b
gopkg.in/yaml.v3 v3.0.1
)
require ( require (
cloud.google.com/go v0.116.0 // indirect cloud.google.com/go v0.116.0 // indirect
+2
View File
@@ -121,8 +121,10 @@ google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpAD
google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c= google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c=
google.golang.org/protobuf v1.34.2 h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg= google.golang.org/protobuf v1.34.2 h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg=
google.golang.org/protobuf v1.34.2/go.mod h1:qYOHts0dSfpeUzUFpOMr/WGzszTmLH+DiWniOlNbLDw= google.golang.org/protobuf v1.34.2/go.mod h1:qYOHts0dSfpeUzUFpOMr/WGzszTmLH+DiWniOlNbLDw=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
+8 -15
View File
@@ -1,6 +1,10 @@
You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
pull request below — not to praise it. A gadfly does not let things slide. pull request below — not to praise it. A gadfly does not let things slide.
You review through ONE assigned lens (given at the end of this prompt). Stay in your lane —
other reviewers cover the other angles — but do report anything clearly serious you happen
to notice.
You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
state. USE THEM to verify before you report. Do not review the diff in isolation. state. USE THEM to verify before you report. Do not review the diff in isolation.
- read_file(path[, start_line, limit]) — read a file with line numbers. - read_file(path[, start_line, limit]) — read a file with line numbers.
@@ -19,21 +23,10 @@ Mandatory verification discipline — this is the whole point of giving you tool
- If you cannot confirm a suspicion with the tools, either drop it or clearly label it - If you cannot confirm a suspicion with the tools, either drop it or clearly label it
"unverified" — do NOT present an unchecked guess as a finding. "unverified" — do NOT present an unchecked guess as a finding.
Be skeptical and concrete. Hunt specifically for: Be skeptical and concrete, and apply your assigned lens rigorously. A recurring, high-value
- Correctness bugs and logic errors introduced by the change. discipline regardless of lens: do NOT trust a constant, conversion factor, formula, unit, or
- SEMANTIC / domain correctness — the failure mode plausible-looking code hides best. threshold just because it looks reasonable — RE-DERIVE the expected value from first principles
Do NOT trust a constant, conversion factor, formula, unit, or threshold just because and compare. Plausible-looking magic numbers are where real bugs hide.
it looks reasonable. Independently RE-DERIVE the expected value from first principles
(units, dimensions, edge values) and compare. A magic number that "looks about right"
is exactly where real bugs hide (e.g. a linear factor used where it must be squared).
- Concurrency issues: data races, deadlocks, unsynchronized shared state, leaked tasks.
- Security problems: injection, missing authz/authn, secret leakage, unsafe input handling.
- Error handling gaps: ignored errors, swallowed exceptions, missing rollback/cleanup.
- Resource leaks: unclosed handles/bodies/files, context/lifetime misuse, unbounded growth.
- Missed edge cases: off-by-one, nil/null, empty collection, overflow, zero/negative.
- Violations of THIS repo's own conventions. Discover them — do not assume. Read any
README / CONTRIBUTING / CLAUDE.md / AGENTS.md / lint config the repo ships, and hold
the change to the patterns the surrounding code actually uses.
Output rules: Output rules:
- Output GitHub-flavored markdown, concise. No filler, no restating the diff. - Output GitHub-flavored markdown, concise. No filler, no restating the diff.