From 7809d1b93d30a1b437c637d66b30fe4d6332f915 Mon Sep 17 00:00:00 2001 From: Steve Dudenhoeffer Date: Thu, 25 Jun 2026 19:23:05 -0400 Subject: [PATCH] =?UTF-8?q?feat:=20specialist=20suite=20=E2=80=94=20config?= =?UTF-8?q?urable=20+=20custom=20review=20lenses=20(one=20consolidated=20c?= =?UTF-8?q?omment)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the single generic review with a suite of focused specialists, each its own review+recheck pass, merged into ONE comment (a collapsible section per lens, led by the worst verdict; the optional `improvements` lens never escalates it). - cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness, maintainability, performance, error-handling) + opt-in (tests, docs, conventions, improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via GADFLY_SPECIALIST_ env and a repo .gadfly.yml (specialists + define). Precedence: built-ins < file < env. Unknown names error but don't sink the run. - cmd/gadfly/consolidate.go: verdict parse + one-comment render. - main.go: loop specialists; per-lens failure is an inline notice, never fatal. Default timeout bumped to 600s (suite runs sequentially). - base system prompt trimmed to persona+tools+discipline+output; lens-specific focus is appended per specialist (semantic re-derivation discipline kept in base). - entrypoint default models -> single model (suite already gives breadth; cost ~= specialists × models × 2). Adds gopkg.in/yaml.v3. - docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var, CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step. Co-Authored-By: Claude Opus 4.8 (1M context) --- CLAUDE.md | 23 +++- README.md | 40 ++++++ cmd/gadfly/consolidate.go | 94 ++++++++++++++ cmd/gadfly/main.go | 51 +++++--- cmd/gadfly/specialists.go | 219 ++++++++++++++++++++++++++++++++ cmd/gadfly/specialists_test.go | 118 +++++++++++++++++ entrypoint.sh | 4 +- examples/.gadfly.yml | 34 +++++ examples/README.md | 1 + examples/adversarial-review.yml | 5 + go.mod | 5 +- go.sum | 2 + scripts/system-prompt.txt | 23 ++-- 13 files changed, 581 insertions(+), 38 deletions(-) create mode 100644 cmd/gadfly/consolidate.go create mode 100644 cmd/gadfly/specialists.go create mode 100644 cmd/gadfly/specialists_test.go create mode 100644 examples/.gadfly.yml diff --git a/CLAUDE.md b/CLAUDE.md index ceebca4..25cfa70 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -29,11 +29,13 @@ verifies each one against the actual code, and posts its findings as a comment. ``` cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout) - main.go agent orchestration: review pass + adversarial recheck pass, budgets + main.go orchestration: loop specialists, each a review pass + adversarial recheck + specialists.go specialist lenses: built-ins, default suite, env + .gadfly.yml resolution + consolidate.go verdict parsing + one-comment consolidation (a section per specialist) model.go provider/model resolution (majordomo.Parse) + env endpoint aliases tools.go the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff) recheck.go second-pass verification prompt + verdict recompute - *_test.go sandbox, recheck, wrap-up, and spec/endpoint-parse unit tests + *_test.go sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests scripts/run.sh fetch PR diff+meta, run the binary, upsert ONE labeled PR comment scripts/system-prompt.txt the reviewer persona + verification discipline (generic, not repo-specific) entrypoint.sh container brains: trigger gating, PR clone, model loop (the logic that @@ -78,10 +80,19 @@ GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly ## Configuration -The full env reference lives in the **README** (`Models & providers` + `Configuration`). -Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`), `GADFLY_MODEL`/`GADFLY_MODELS`, -`GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via `GADFLY_ENDPOINT_` / -`GADFLY_ALIAS_` (http-capable) and majordomo `LLM_*` DSNs (HTTPS-only). +The full env reference lives in the **README** (`Specialists`, `Models & providers`, +`Configuration`). Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`), +`GADFLY_MODEL`/`GADFLY_MODELS`, `GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via +`GADFLY_ENDPOINT_` / `GADFLY_ALIAS_` (http-capable) and majordomo `LLM_*` DSNs +(HTTPS-only). + +**Specialists (the swarm):** the reviewer runs a suite of focused lenses, one consolidated +comment with a section each. Default suite = security/correctness/maintainability/performance/ +error-handling; opt-in built-ins = tests/docs/conventions/improvements. Select via +`GADFLY_SPECIALISTS` (csv or `all`); define/override via `GADFLY_SPECIALIST_` env or a repo +`.gadfly.yml` (`specialists:` + `define:`). See `cmd/gadfly/specialists.go`. Cost ≈ +specialists × models × 2 passes — keep the default model count low (entrypoint defaults to one). +Dynamic `auto` selection (a cheap model picks lenses per-diff) is the planned next step. **Tested vs untested:** only the Ollama paths (local + OpenAI-compatible pointed at Ollama) are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are diff --git a/README.md b/README.md index 21b74ed..226bacf 100644 --- a/README.md +++ b/README.md @@ -102,6 +102,46 @@ which Gadfly also honors — but those are **HTTPS-only**, so for plaintext loca > **Gitea Actions note:** repo `vars`/`secrets` aren't auto-exposed as env — add each alias to > the stub workflow's `env:` block, e.g. `GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}`. +## Specialists (the review swarm) + +Instead of one generic reviewer, Gadfly runs a **suite of specialists** — each a focused lens +with its own review (+recheck) pass — and merges them into **one comment**, a collapsible +section per lens, led by an overall verdict (the worst across lenses; the optional +`improvements` lens never escalates it). + +**Default suite** (when nothing is configured): +`security`, `correctness`, `maintainability` (code cleanliness), `performance`, `error-handling`. + +**Also built in** (opt-in by name): `tests`, `docs`, `conventions`, and `improvements` +(strict & quiet — at most 1–2 high-value, non-blocking suggestions, silent otherwise). + +Select which run with **`GADFLY_SPECIALISTS`** (comma-separated names, or `all`): + +```yaml +GADFLY_SPECIALISTS: "security,correctness,maintainability,tests" +``` + +**Define your own** — two ways, which compose (env overrides file overrides built-ins): + +```yaml +# 1. env: GADFLY_SPECIALIST_="" (also overrides a built-in by reusing its name) +GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes." +GADFLY_SPECIALISTS: "security,correctness,migrations" +``` + +```yaml +# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml: +specialists: [security, correctness, maintainability, migrations] +define: + - name: migrations + title: "🗃️ DB migrations" + focus: "Review schema migrations for destructive ops, missing indexes, table locks." +``` + +> **Cost:** each specialist is its own review+recheck, so cost ≈ *specialists × models × 2*. +> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, and a future +> `auto` mode will let a cheap model pick only the lenses a given diff actually needs. + ### Triggers 1. A **new/reopened/ready** non-draft PR — automatic. diff --git a/cmd/gadfly/consolidate.go b/cmd/gadfly/consolidate.go new file mode 100644 index 0000000..d354335 --- /dev/null +++ b/cmd/gadfly/consolidate.go @@ -0,0 +1,94 @@ +package main + +import ( + "fmt" + "strings" +) + +// verdict is the severity a specialist (or the whole review) lands on. +type verdict int + +const ( + verdictUnknown verdict = iota // couldn't parse; treated below Minor for display, not blocking + verdictClean // "No material issues found" + verdictMinor // "Minor issues" + verdictBlocking // "Blocking issues found" +) + +func (v verdict) label() string { + switch v { + case verdictClean: + return "No material issues found" + case verdictMinor: + return "Minor issues" + case verdictBlocking: + return "Blocking issues found" + default: + return "Reviewed" + } +} + +// parseVerdict extracts a specialist's self-reported verdict from its output. +// The base prompt tells each lens to lead with one of the three phrases. +func parseVerdict(out string) verdict { + l := strings.ToLower(out) + switch { + case strings.Contains(l, "blocking issues found"): + return verdictBlocking + case strings.Contains(l, "no material issues"): + return verdictClean + case strings.Contains(l, "minor issues"): + return verdictMinor + default: + return verdictUnknown + } +} + +// specialistResult pairs a specialist with its rendered review and verdict. +type specialistResult struct { + spec Specialist + out string + verdict verdict +} + +// worstVerdict returns the most severe verdict across results. The optional +// "improvements" lens never escalates the overall verdict (it's advisory). +func worstVerdict(results []specialistResult) verdict { + worst := verdictClean + for _, r := range results { + if r.spec.Name == "improvements" { + continue + } + if r.verdict > worst { + worst = r.verdict + } + } + return worst +} + +// renderConsolidated assembles the single comment body: an overall verdict line +// followed by one verbatim section per specialist. run.sh wraps this with the +// "🪰 Gadfly review — " header and the advisory footer. +func renderConsolidated(results []specialistResult) string { + var b strings.Builder + fmt.Fprintf(&b, "**Verdict: %s** — %d reviewers: %s\n", + worstVerdict(results).label(), len(results), strings.Join(specialistNames(results), ", ")) + + for _, r := range results { + body := strings.TrimSpace(r.out) + if body == "" { + body = "_(no output)_" + } + fmt.Fprintf(&b, "\n
%s — %s\n\n%s\n\n
\n", + r.spec.Title, r.verdict.label(), body) + } + return strings.TrimRight(b.String(), "\n") +} + +func specialistNames(results []specialistResult) []string { + names := make([]string, 0, len(results)) + for _, r := range results { + names = append(names, r.spec.Name) + } + return names +} diff --git a/cmd/gadfly/main.go b/cmd/gadfly/main.go index 145767e..7adc073 100644 --- a/cmd/gadfly/main.go +++ b/cmd/gadfly/main.go @@ -65,8 +65,10 @@ import ( ) const ( - defaultMaxSteps = 24 - defaultTimeoutSecs = 300 + defaultMaxSteps = 24 + // defaultTimeoutSecs is the overall deadline shared by ALL specialist passes + // in one invocation; the default suite runs several lenses sequentially. + defaultTimeoutSecs = 600 defaultMaxDiffChars = 60000 // defaultWrapUpReserve is how many steps before the cap the agent is told // to stop investigating and write its final answer. Reserving a margin is @@ -130,34 +132,53 @@ func run() error { return fmt.Errorf("resolve model: %w", err) } + specialists, serrs := resolveSpecialists(repoDir) + for _, e := range serrs { + fmt.Fprintln(os.Stderr, "gadfly:", e) + } + if len(specialists) == 0 { + return errors.New("no specialists resolved to run") + } + timeout := time.Duration(envInt("GADFLY_TIMEOUT_SECS", defaultTimeoutSecs)) * time.Second ctx, cancel := context.WithTimeout(context.Background(), timeout) defer cancel() - // Pass 1 — review: produce the draft. - draft, err := runAgent(ctx, mdl, fsTools, string(systemBytes), buildTask(diff), - envInt("GADFLY_MAX_STEPS", defaultMaxSteps)) - if err != nil { - return fmt.Errorf("review pass: %w", err) + base := string(systemBytes) + task := buildTask(diff) + results := make([]specialistResult, 0, len(specialists)) + for _, sp := range specialists { + out := reviewWithSpecialist(ctx, mdl, fsTools, base, sp, task, diff) + results = append(results, specialistResult{spec: sp, out: out, verdict: parseVerdict(out)}) + } + + fmt.Println(renderConsolidated(results)) + return nil +} + +// reviewWithSpecialist runs one lens end-to-end: a review pass under the +// specialist's composed prompt, then the shared adversarial recheck pass. A +// failed pass is rendered as an inline notice (advisory — one lens failing +// never sinks the others or the job). +func reviewWithSpecialist(ctx context.Context, mdl llm.Model, fsTools *repoFS, base string, sp Specialist, task, diff string) string { + draft, err := runAgent(ctx, mdl, fsTools, composeSpecialistPrompt(base, sp), task, + envInt("GADFLY_MAX_STEPS", defaultMaxSteps)) + if err != nil { + fmt.Fprintf(os.Stderr, "gadfly: specialist %q review pass failed: %v\n", sp.Name, err) + return fmt.Sprintf("⚠️ This reviewer failed to complete: %v", err) } - // Pass 2 — recheck: adversarially re-verify the draft's findings and drop - // the unconfirmed ones. Skipped for a clean draft (nothing to verify) or - // when disabled. A recheck failure is non-fatal — we emit the unverified - // draft rather than losing the review entirely. final := draft if shouldRecheck(draft) { rechecked, rerr := runAgent(ctx, mdl, fsTools, recheckSystemPrompt, buildRecheckTask(draft, diff), envInt("GADFLY_RECHECK_MAX_STEPS", defaultRecheckMaxSteps)) if rerr != nil { - fmt.Fprintln(os.Stderr, "gadfly: recheck pass failed; emitting unverified draft:", rerr) + fmt.Fprintf(os.Stderr, "gadfly: specialist %q recheck failed; emitting unverified draft: %v\n", sp.Name, rerr) } else { final = rechecked } } - - fmt.Println(final) - return nil + return final } // runAgent runs one agent pass (its own fresh toolbox over the sandbox) and diff --git a/cmd/gadfly/specialists.go b/cmd/gadfly/specialists.go new file mode 100644 index 0000000..9c8a002 --- /dev/null +++ b/cmd/gadfly/specialists.go @@ -0,0 +1,219 @@ +package main + +import ( + "fmt" + "os" + "path/filepath" + "sort" + "strings" + + "gopkg.in/yaml.v3" +) + +// Specialist is one review lens: a focused instruction appended to the base +// system prompt, run as its own review (+recheck) pass and rendered as its own +// section in the consolidated comment. +type Specialist struct { + Name string // lowercase id, used in GADFLY_SPECIALISTS and section anchors + Title string // human label for the section header + Focus string // lens-specific instruction appended to the base prompt +} + +// builtinSpecialists are the lenses Gadfly ships with. The default suite (below) +// is the "biggies"; the rest are opt-in by name via GADFLY_SPECIALISTS, a +// .gadfly.yml `specialists:` list, or `all`. +var builtinSpecialists = map[string]Specialist{ + "security": {"security", "🔒 Security", "" + + "Authn/authz gaps, injection (SQL/command/path/template), SSRF, unsafe deserialization, " + + "secret/credential leakage, missing input validation, unsafe handling of untrusted data, " + + "and insecure defaults. Trace tainted input to where it's used."}, + "correctness": {"correctness", "✅ Correctness", "" + + "Logic bugs and incorrect behavior introduced by the change. Pay special attention to " + + "SEMANTIC / domain correctness: do NOT trust a constant, conversion factor, formula, unit, " + + "or threshold because it looks reasonable — independently RE-DERIVE the expected value from " + + "first principles (units, dimensions, boundary values) and compare. A plausible-looking magic " + + "number is exactly where real bugs hide."}, + "maintainability": {"maintainability", "🧹 Code cleanliness & maintainability", "" + + "Readability and structure: dead/duplicated code, confusing names, overly long or deeply " + + "nested functions, leaky abstractions, copy-paste that should be shared, and changes that " + + "don't follow the patterns the surrounding code already uses. Prefer concrete, low-churn fixes."}, + "performance": {"performance", "⚡ Performance", "" + + "Efficiency regressions: N+1 queries, unnecessary allocations or copies, work inside hot loops, " + + "unbounded growth, missing pagination/limits, blocking calls on hot paths, and avoidable " + + "quadratic behavior. Only flag impact you can justify, not micro-optimizations."}, + "error-handling": {"error-handling", "🧯 Error handling & edge cases", "" + + "Ignored or swallowed errors, missing cleanup/rollback/`defer`, panics on bad input, and " + + "unhandled edge cases: nil/null, empty collections, zero/negative, integer overflow, and " + + "boundary (off-by-one) conditions. Check the unhappy paths the diff introduces."}, + + // --- opt-in (not in the default suite) --- + "tests": {"tests", "🧪 Tests", "" + + "Whether the change is adequately tested: new logic without tests, changed behavior whose " + + "tests weren't updated, missing edge-case/error-path coverage, and brittle or assertion-free tests."}, + "docs": {"docs", "📝 Docs", "" + + "Documentation drift: public APIs/flags/config changed without updating doc comments, README, " + + "or examples; stale comments now contradicting the code; missing docs for new exported surface."}, + "conventions": {"conventions", "📐 Conventions", "" + + "Adherence to THIS repo's own conventions. Discover them — read any README / CONTRIBUTING / " + + "CLAUDE.md / AGENTS.md / lint config the repo ships, and hold the change to the documented and " + + "surrounding-code patterns."}, + "improvements": {"improvements", "💡 Improvements (optional)", "" + + "OPTIONAL, non-blocking suggestions only. Surface at most 1–2 genuinely high-value blind spots " + + "or simplifications the author likely wants to know about. Be quiet: if nothing rises to that " + + "bar, reply exactly 'No suggestions.' Never nitpick, never pad. These never affect the verdict."}, +} + +// defaultSuite is used when nothing selects specialists explicitly. +var defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"} + +// fileConfig is the optional .gadfly.yml schema. +type fileConfig struct { + // Specialists selects which run (names). Empty => the default suite. + Specialists []string `yaml:"specialists"` + // Define adds or overrides specialist definitions. + Define []struct { + Name string `yaml:"name"` + Title string `yaml:"title"` + Focus string `yaml:"focus"` + } `yaml:"define"` +} + +// resolveSpecialists builds the active specialist list from (in increasing +// precedence) the built-ins, a repo .gadfly.yml, and GADFLY_SPECIALIST_ +// env definitions. The active SELECTION is GADFLY_SPECIALISTS (csv or "all"), +// else the file's `specialists:`, else the default suite. Unknown names are +// reported as errors and skipped; valid ones still run. +func resolveSpecialists(repoDir string) ([]Specialist, []error) { + var errs []error + + // 1. registry seeded with built-ins. + registry := make(map[string]Specialist, len(builtinSpecialists)) + for k, v := range builtinSpecialists { + registry[k] = v + } + + // 2. config file: definitions + (maybe) the selection. + var fileSelection []string + if cfg, ok, err := loadFileConfig(repoDir); err != nil { + errs = append(errs, err) + } else if ok { + for _, d := range cfg.Define { + name := strings.ToLower(strings.TrimSpace(d.Name)) + if name == "" || strings.TrimSpace(d.Focus) == "" { + errs = append(errs, fmt.Errorf(".gadfly.yml: specialist define needs non-empty name and focus")) + continue + } + registry[name] = Specialist{Name: name, Title: titleOr(d.Title, name), Focus: d.Focus} + } + fileSelection = cfg.Specialists + } + + // 3. env definitions: GADFLY_SPECIALIST_="focus". + for _, kv := range os.Environ() { + key, val, _ := strings.Cut(kv, "=") + if !strings.HasPrefix(key, "GADFLY_SPECIALIST_") || len(key) == len("GADFLY_SPECIALIST_") { + continue + } + name := strings.ToLower(strings.TrimPrefix(key, "GADFLY_SPECIALIST_")) + if strings.TrimSpace(val) == "" { + errs = append(errs, fmt.Errorf("%s: empty specialist focus", key)) + continue + } + existing, ok := registry[name] + title := titleOr("", name) + if ok { + title = existing.Title + } + registry[name] = Specialist{Name: name, Title: title, Focus: val} + } + + // 4. selection: env > file > default suite. + var names []string + switch { + case strings.TrimSpace(os.Getenv("GADFLY_SPECIALISTS")) != "": + names = splitCSV(os.Getenv("GADFLY_SPECIALISTS")) + case len(fileSelection) > 0: + names = fileSelection + default: + names = defaultSuite + } + if len(names) == 1 && strings.EqualFold(names[0], "all") { + names = sortedKeys(registry) + } + + // 5. resolve names -> specialists. + var out []Specialist + seen := map[string]bool{} + for _, raw := range names { + name := strings.ToLower(strings.TrimSpace(raw)) + if name == "" || seen[name] { + continue + } + seen[name] = true + sp, ok := registry[name] + if !ok { + errs = append(errs, fmt.Errorf("unknown specialist %q (built-ins: %s; or define it via .gadfly.yml / GADFLY_SPECIALIST_%s)", + name, strings.Join(sortedKeys(builtinSpecialists), ", "), strings.ToUpper(name))) + continue + } + out = append(out, sp) + } + return out, errs +} + +// composeSpecialistPrompt appends a specialist's lens to the base system prompt. +func composeSpecialistPrompt(base string, sp Specialist) string { + return strings.TrimRight(base, "\n") + + "\n\n## Your review lens: " + sp.Title + "\n" + + "Focus your review on this lens; other reviewers cover the rest. " + sp.Focus +} + +func loadFileConfig(repoDir string) (fileConfig, bool, error) { + candidates := []string{} + if p := strings.TrimSpace(os.Getenv("GADFLY_CONFIG")); p != "" { + if !filepath.IsAbs(p) { + p = filepath.Join(repoDir, p) + } + candidates = append(candidates, p) + } else { + candidates = append(candidates, filepath.Join(repoDir, ".gadfly.yml"), filepath.Join(repoDir, ".gadfly.yaml")) + } + for _, p := range candidates { + data, err := os.ReadFile(p) + if err != nil { + continue // not found / unreadable: try next + } + var cfg fileConfig + if err := yaml.Unmarshal(data, &cfg); err != nil { + return fileConfig{}, false, fmt.Errorf("parse %s: %w", filepath.Base(p), err) + } + return cfg, true, nil + } + return fileConfig{}, false, nil +} + +func titleOr(title, name string) string { + if strings.TrimSpace(title) != "" { + return title + } + return name +} + +func splitCSV(s string) []string { + var out []string + for _, p := range strings.Split(s, ",") { + if t := strings.TrimSpace(p); t != "" { + out = append(out, t) + } + } + return out +} + +func sortedKeys(m map[string]Specialist) []string { + keys := make([]string, 0, len(m)) + for k := range m { + keys = append(keys, k) + } + sort.Strings(keys) + return keys +} diff --git a/cmd/gadfly/specialists_test.go b/cmd/gadfly/specialists_test.go new file mode 100644 index 0000000..1b73756 --- /dev/null +++ b/cmd/gadfly/specialists_test.go @@ -0,0 +1,118 @@ +package main + +import ( + "os" + "path/filepath" + "testing" +) + +func names(specs []Specialist) []string { + out := make([]string, len(specs)) + for i, s := range specs { + out[i] = s.Name + } + return out +} + +func eq(a, b []string) bool { + if len(a) != len(b) { + return false + } + for i := range a { + if a[i] != b[i] { + return false + } + } + return true +} + +func TestResolveSpecialists_DefaultSuite(t *testing.T) { + t.Setenv("GADFLY_SPECIALISTS", "") + specs, errs := resolveSpecialists(t.TempDir()) + if len(errs) != 0 { + t.Fatalf("unexpected errors: %v", errs) + } + if !eq(names(specs), defaultSuite) { + t.Errorf("default = %v, want %v", names(specs), defaultSuite) + } +} + +func TestResolveSpecialists_EnvSelection(t *testing.T) { + t.Setenv("GADFLY_SPECIALISTS", "security, tests") + specs, errs := resolveSpecialists(t.TempDir()) + if len(errs) != 0 { + t.Fatalf("unexpected errors: %v", errs) + } + if !eq(names(specs), []string{"security", "tests"}) { + t.Errorf("got %v", names(specs)) + } +} + +func TestResolveSpecialists_UnknownNameErrors(t *testing.T) { + t.Setenv("GADFLY_SPECIALISTS", "security,bogus") + specs, errs := resolveSpecialists(t.TempDir()) + if len(errs) == 0 { + t.Fatal("expected an error for unknown specialist") + } + if !eq(names(specs), []string{"security"}) { + t.Errorf("valid ones should still resolve, got %v", names(specs)) + } +} + +func TestResolveSpecialists_EnvCustomDefinition(t *testing.T) { + t.Setenv("GADFLY_SPECIALIST_MIGRATIONS", "Review DB migrations for destructive ops.") + t.Setenv("GADFLY_SPECIALISTS", "migrations") + specs, errs := resolveSpecialists(t.TempDir()) + if len(errs) != 0 { + t.Fatalf("unexpected errors: %v", errs) + } + if len(specs) != 1 || specs[0].Name != "migrations" || specs[0].Focus == "" { + t.Fatalf("custom specialist not registered: %+v", specs) + } +} + +func TestResolveSpecialists_FileConfig(t *testing.T) { + dir := t.TempDir() + cfg := `specialists: [security, migrations] +define: + - name: migrations + title: "DB migrations" + focus: "Review schema migrations for destructive or unindexed changes." +` + if err := os.WriteFile(filepath.Join(dir, ".gadfly.yml"), []byte(cfg), 0o644); err != nil { + t.Fatal(err) + } + t.Setenv("GADFLY_SPECIALISTS", "") // let the file drive selection + specs, errs := resolveSpecialists(dir) + if len(errs) != 0 { + t.Fatalf("unexpected errors: %v", errs) + } + if !eq(names(specs), []string{"security", "migrations"}) { + t.Errorf("got %v", names(specs)) + } +} + +func TestParseVerdictAndWorst(t *testing.T) { + cases := map[string]verdict{ + "VERDICT: No material issues found.": verdictClean, + "Minor issues\n- nit": verdictMinor, + "**Blocking issues found**": verdictBlocking, + "something unparseable": verdictUnknown, + } + for in, want := range cases { + if got := parseVerdict(in); got != want { + t.Errorf("parseVerdict(%q) = %v, want %v", in, got, want) + } + } + results := []specialistResult{ + {spec: Specialist{Name: "security"}, verdict: verdictMinor}, + {spec: Specialist{Name: "correctness"}, verdict: verdictBlocking}, + {spec: Specialist{Name: "improvements"}, verdict: verdictBlocking}, // must not count + } + if w := worstVerdict(results[:2]); w != verdictBlocking { + t.Errorf("worst = %v, want blocking", w) + } + if w := worstVerdict([]specialistResult{results[0], results[2]}); w != verdictMinor { + t.Errorf("improvements should not escalate; worst = %v, want minor", w) + } +} diff --git a/entrypoint.sh b/entrypoint.sh index fa704c3..d67843a 100644 --- a/entrypoint.sh +++ b/entrypoint.sh @@ -41,7 +41,9 @@ # empty => fall back to "is a repo collaborator" set -uo pipefail -DEFAULT_MODELS="qwen3-coder:480b-cloud,gpt-oss:120b-cloud" +# One model by default: the specialist suite already provides breadth, so a +# multi-model default would multiply cost (models × specialists × 2 passes). +DEFAULT_MODELS="qwen3-coder:480b-cloud" TRIGGER_PHRASE="${GADFLY_TRIGGER_PHRASE:-@gadfly review}" SCRIPTS_DIR="/app/scripts" WORKDIR="${WORKDIR:-/tmp/gadfly}" diff --git a/examples/.gadfly.yml b/examples/.gadfly.yml new file mode 100644 index 0000000..d4331fb --- /dev/null +++ b/examples/.gadfly.yml @@ -0,0 +1,34 @@ +# Optional per-repo Gadfly config. Place at the repo ROOT as `.gadfly.yml` +# (or point GADFLY_CONFIG at it). It's read from the PR's checked-out tree, so +# it's version-controlled and reviewed like any other file. +# +# Precedence for specialist DEFINITIONS: built-ins < this file < GADFLY_SPECIALIST_* env. +# Precedence for SELECTION: GADFLY_SPECIALISTS env > this file's `specialists:` > default suite. + +# Which specialists run. Omit this key to use the default suite +# (security, correctness, maintainability, performance, error-handling). +# Built-in opt-ins you can add: tests, docs, conventions, improvements. Or "all". +specialists: + - security + - correctness + - maintainability + - tests + - migrations # a custom one, defined below + +# Add new specialists, or override a built-in by reusing its name (e.g. give +# `security` a repo-specific focus). `focus` is appended to the base reviewer +# prompt as that lens's instruction. +define: + - name: migrations + title: "🗃️ DB migrations" + focus: > + Review database schema migrations for destructive operations (DROP/ALTER + that loses data), missing or unindexed foreign keys, non-idempotent steps, + and changes that would lock a large table on deploy. Check that any new + column added to a domain struct is wired through every storage layer. + - name: security + title: "🔒 Security (house rules)" + focus: > + In addition to the usual security review, this repo requires: all web + routes use the auth middleware, no secrets in logs, and all external HTTP + calls set a timeout. Flag any violation. diff --git a/examples/README.md b/examples/README.md index c8d8919..4879543 100644 --- a/examples/README.md +++ b/examples/README.md @@ -10,6 +10,7 @@ set the secrets/vars it references. Gadfly is advisory only — it never blocks | [`local-ollama.yml`](local-ollama.yml) | a **local/LAN Ollama** daemon | nothing (or `GADFLY_BASE_URL` for a remote host) | | [`openai-compatible.yml`](openai-compatible.yml) | any **OpenAI-compatible** endpoint (local Ollama `/v1`, gateway, vLLM, OpenRouter…) | `GADFLY_BASE_URL` (+ a key for most gateways) | | [`endpoint-aliases.yml`](endpoint-aliases.yml) | **several named backends** at once (one comment each) | repo vars `GADFLY_ENDPOINT_` | +| [`.gadfly.yml`](.gadfly.yml) | **per-repo specialist config** (not a workflow — goes at your repo root) | — | Common to all: - **Triggers:** new/reopened/ready non-draft PR (auto), `@gadfly review` comment (allowed users), diff --git a/examples/adversarial-review.yml b/examples/adversarial-review.yml index b34b0f1..f03674f 100644 --- a/examples/adversarial-review.yml +++ b/examples/adversarial-review.yml @@ -43,6 +43,11 @@ jobs: OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }} OLLAMA_REVIEW_MODELS: ${{ vars.OLLAMA_REVIEW_MODELS }} GADFLY_ALLOWED_USERS: ${{ vars.GADFLY_ALLOWED_USERS }} + # Specialist suite (optional). Empty = default suite + # (security,correctness,maintainability,performance,error-handling). + # csv to choose; "all" for everything; or define custom ones via a repo + # .gadfly.yml / GADFLY_SPECIALIST_. See README "Specialists". + GADFLY_SPECIALISTS: ${{ vars.GADFLY_SPECIALISTS }} # --- Models & providers (optional; default = Ollama Cloud) ---------- # Gadfly is majordomo-powered, so it can target other backends. Set a # provider for bare model ids; point at a different endpoint with a diff --git a/go.mod b/go.mod index 9ad88f6..94d28b2 100644 --- a/go.mod +++ b/go.mod @@ -2,7 +2,10 @@ module gitea.stevedudenhoeffer.com/steve/gadfly go 1.26.2 -require gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b +require ( + gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b + gopkg.in/yaml.v3 v3.0.1 +) require ( cloud.google.com/go v0.116.0 // indirect diff --git a/go.sum b/go.sum index d482ba0..e6fff57 100644 --- a/go.sum +++ b/go.sum @@ -121,8 +121,10 @@ google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpAD google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c= google.golang.org/protobuf v1.34.2 h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg= google.golang.org/protobuf v1.34.2/go.mod h1:qYOHts0dSfpeUzUFpOMr/WGzszTmLH+DiWniOlNbLDw= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= diff --git a/scripts/system-prompt.txt b/scripts/system-prompt.txt index ef7787d..169af2f 100644 --- a/scripts/system-prompt.txt +++ b/scripts/system-prompt.txt @@ -1,6 +1,10 @@ You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the pull request below — not to praise it. A gadfly does not let things slide. +You review through ONE assigned lens (given at the end of this prompt). Stay in your lane — +other reviewers cover the other angles — but do report anything clearly serious you happen +to notice. + You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out state. USE THEM to verify before you report. Do not review the diff in isolation. - read_file(path[, start_line, limit]) — read a file with line numbers. @@ -19,21 +23,10 @@ Mandatory verification discipline — this is the whole point of giving you tool - If you cannot confirm a suspicion with the tools, either drop it or clearly label it "unverified" — do NOT present an unchecked guess as a finding. -Be skeptical and concrete. Hunt specifically for: -- Correctness bugs and logic errors introduced by the change. -- SEMANTIC / domain correctness — the failure mode plausible-looking code hides best. - Do NOT trust a constant, conversion factor, formula, unit, or threshold just because - it looks reasonable. Independently RE-DERIVE the expected value from first principles - (units, dimensions, edge values) and compare. A magic number that "looks about right" - is exactly where real bugs hide (e.g. a linear factor used where it must be squared). -- Concurrency issues: data races, deadlocks, unsynchronized shared state, leaked tasks. -- Security problems: injection, missing authz/authn, secret leakage, unsafe input handling. -- Error handling gaps: ignored errors, swallowed exceptions, missing rollback/cleanup. -- Resource leaks: unclosed handles/bodies/files, context/lifetime misuse, unbounded growth. -- Missed edge cases: off-by-one, nil/null, empty collection, overflow, zero/negative. -- Violations of THIS repo's own conventions. Discover them — do not assume. Read any - README / CONTRIBUTING / CLAUDE.md / AGENTS.md / lint config the repo ships, and hold - the change to the patterns the surrounding code actually uses. +Be skeptical and concrete, and apply your assigned lens rigorously. A recurring, high-value +discipline regardless of lens: do NOT trust a constant, conversion factor, formula, unit, or +threshold just because it looks reasonable — RE-DERIVE the expected value from first principles +and compare. Plausible-looking magic numbers are where real bugs hide. Output rules: - Output GitHub-flavored markdown, concise. No filler, no restating the diff.