feat: specialist suite — configurable + custom review lenses (one consolidated comment)
Build & push image / build-and-push (push) Successful in 8s

Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).

- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
  maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
  improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
  GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
  Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
  Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
  focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
  specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
  CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Steve Dudenhoeffer
2026-06-25 19:23:05 -04:00
parent 676c9d4f07
commit 7809d1b93d
13 changed files with 581 additions and 38 deletions
+17 -6
View File
@@ -29,11 +29,13 @@ verifies each one against the actual code, and posts its findings as a comment.
```
cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout)
main.go agent orchestration: review pass + adversarial recheck pass, budgets
main.go orchestration: loop specialists, each a review pass + adversarial recheck
specialists.go specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
consolidate.go verdict parsing + one-comment consolidation (a section per specialist)
model.go provider/model resolution (majordomo.Parse) + env endpoint aliases
tools.go the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
recheck.go second-pass verification prompt + verdict recompute
*_test.go sandbox, recheck, wrap-up, and spec/endpoint-parse unit tests
*_test.go sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests
scripts/run.sh fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
scripts/system-prompt.txt the reviewer persona + verification discipline (generic, not repo-specific)
entrypoint.sh container brains: trigger gating, PR clone, model loop (the logic that
@@ -78,10 +80,19 @@ GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly
## Configuration
The full env reference lives in the **README** (`Models & providers` + `Configuration`).
Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`), `GADFLY_MODEL`/`GADFLY_MODELS`,
`GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via `GADFLY_ENDPOINT_<NAME>` /
`GADFLY_ALIAS_<NAME>` (http-capable) and majordomo `LLM_*` DSNs (HTTPS-only).
The full env reference lives in the **README** (`Specialists`, `Models & providers`,
`Configuration`). Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`),
`GADFLY_MODEL`/`GADFLY_MODELS`, `GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via
`GADFLY_ENDPOINT_<NAME>` / `GADFLY_ALIAS_<NAME>` (http-capable) and majordomo `LLM_*` DSNs
(HTTPS-only).
**Specialists (the swarm):** the reviewer runs a suite of focused lenses, one consolidated
comment with a section each. Default suite = security/correctness/maintainability/performance/
error-handling; opt-in built-ins = tests/docs/conventions/improvements. Select via
`GADFLY_SPECIALISTS` (csv or `all`); define/override via `GADFLY_SPECIALIST_<NAME>` env or a repo
`.gadfly.yml` (`specialists:` + `define:`). See `cmd/gadfly/specialists.go`. Cost ≈
specialists × models × 2 passes — keep the default model count low (entrypoint defaults to one).
Dynamic `auto` selection (a cheap model picks lenses per-diff) is the planned next step.
**Tested vs untested:** only the Ollama paths (local + OpenAI-compatible pointed at Ollama)
are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are
+40
View File
@@ -102,6 +102,46 @@ which Gadfly also honors — but those are **HTTPS-only**, so for plaintext loca
> **Gitea Actions note:** repo `vars`/`secrets` aren't auto-exposed as env — add each alias to
> the stub workflow's `env:` block, e.g. `GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}`.
## Specialists (the review swarm)
Instead of one generic reviewer, Gadfly runs a **suite of specialists** — each a focused lens
with its own review (+recheck) pass — and merges them into **one comment**, a collapsible
section per lens, led by an overall verdict (the worst across lenses; the optional
`improvements` lens never escalates it).
**Default suite** (when nothing is configured):
`security`, `correctness`, `maintainability` (code cleanliness), `performance`, `error-handling`.
**Also built in** (opt-in by name): `tests`, `docs`, `conventions`, and `improvements`
(strict & quiet — at most 12 high-value, non-blocking suggestions, silent otherwise).
Select which run with **`GADFLY_SPECIALISTS`** (comma-separated names, or `all`):
```yaml
GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"
```
**Define your own** — two ways, which compose (env overrides file overrides built-ins):
```yaml
# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>" (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"
```
```yaml
# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
- name: migrations
title: "🗃️ DB migrations"
focus: "Review schema migrations for destructive ops, missing indexes, table locks."
```
> **Cost:** each specialist is its own review+recheck, so cost ≈ *specialists × models × 2*.
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, and a future
> `auto` mode will let a cheap model pick only the lenses a given diff actually needs.
### Triggers
1. A **new/reopened/ready** non-draft PR — automatic.
+94
View File
@@ -0,0 +1,94 @@
package main
import (
"fmt"
"strings"
)
// verdict is the severity a specialist (or the whole review) lands on.
type verdict int
const (
verdictUnknown verdict = iota // couldn't parse; treated below Minor for display, not blocking
verdictClean // "No material issues found"
verdictMinor // "Minor issues"
verdictBlocking // "Blocking issues found"
)
func (v verdict) label() string {
switch v {
case verdictClean:
return "No material issues found"
case verdictMinor:
return "Minor issues"
case verdictBlocking:
return "Blocking issues found"
default:
return "Reviewed"
}
}
// parseVerdict extracts a specialist's self-reported verdict from its output.
// The base prompt tells each lens to lead with one of the three phrases.
func parseVerdict(out string) verdict {
l := strings.ToLower(out)
switch {
case strings.Contains(l, "blocking issues found"):
return verdictBlocking
case strings.Contains(l, "no material issues"):
return verdictClean
case strings.Contains(l, "minor issues"):
return verdictMinor
default:
return verdictUnknown
}
}
// specialistResult pairs a specialist with its rendered review and verdict.
type specialistResult struct {
spec Specialist
out string
verdict verdict
}
// worstVerdict returns the most severe verdict across results. The optional
// "improvements" lens never escalates the overall verdict (it's advisory).
func worstVerdict(results []specialistResult) verdict {
worst := verdictClean
for _, r := range results {
if r.spec.Name == "improvements" {
continue
}
if r.verdict > worst {
worst = r.verdict
}
}
return worst
}
// renderConsolidated assembles the single comment body: an overall verdict line
// followed by one verbatim section per specialist. run.sh wraps this with the
// "🪰 Gadfly review — <model>" header and the advisory footer.
func renderConsolidated(results []specialistResult) string {
var b strings.Builder
fmt.Fprintf(&b, "**Verdict: %s** — %d reviewers: %s\n",
worstVerdict(results).label(), len(results), strings.Join(specialistNames(results), ", "))
for _, r := range results {
body := strings.TrimSpace(r.out)
if body == "" {
body = "_(no output)_"
}
fmt.Fprintf(&b, "\n<details><summary><b>%s</b> — %s</summary>\n\n%s\n\n</details>\n",
r.spec.Title, r.verdict.label(), body)
}
return strings.TrimRight(b.String(), "\n")
}
func specialistNames(results []specialistResult) []string {
names := make([]string, 0, len(results))
for _, r := range results {
names = append(names, r.spec.Name)
}
return names
}
+36 -15
View File
@@ -65,8 +65,10 @@ import (
)
const (
defaultMaxSteps = 24
defaultTimeoutSecs = 300
defaultMaxSteps = 24
// defaultTimeoutSecs is the overall deadline shared by ALL specialist passes
// in one invocation; the default suite runs several lenses sequentially.
defaultTimeoutSecs = 600
defaultMaxDiffChars = 60000
// defaultWrapUpReserve is how many steps before the cap the agent is told
// to stop investigating and write its final answer. Reserving a margin is
@@ -130,34 +132,53 @@ func run() error {
return fmt.Errorf("resolve model: %w", err)
}
specialists, serrs := resolveSpecialists(repoDir)
for _, e := range serrs {
fmt.Fprintln(os.Stderr, "gadfly:", e)
}
if len(specialists) == 0 {
return errors.New("no specialists resolved to run")
}
timeout := time.Duration(envInt("GADFLY_TIMEOUT_SECS", defaultTimeoutSecs)) * time.Second
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
// Pass 1 — review: produce the draft.
draft, err := runAgent(ctx, mdl, fsTools, string(systemBytes), buildTask(diff),
envInt("GADFLY_MAX_STEPS", defaultMaxSteps))
if err != nil {
return fmt.Errorf("review pass: %w", err)
base := string(systemBytes)
task := buildTask(diff)
results := make([]specialistResult, 0, len(specialists))
for _, sp := range specialists {
out := reviewWithSpecialist(ctx, mdl, fsTools, base, sp, task, diff)
results = append(results, specialistResult{spec: sp, out: out, verdict: parseVerdict(out)})
}
fmt.Println(renderConsolidated(results))
return nil
}
// reviewWithSpecialist runs one lens end-to-end: a review pass under the
// specialist's composed prompt, then the shared adversarial recheck pass. A
// failed pass is rendered as an inline notice (advisory — one lens failing
// never sinks the others or the job).
func reviewWithSpecialist(ctx context.Context, mdl llm.Model, fsTools *repoFS, base string, sp Specialist, task, diff string) string {
draft, err := runAgent(ctx, mdl, fsTools, composeSpecialistPrompt(base, sp), task,
envInt("GADFLY_MAX_STEPS", defaultMaxSteps))
if err != nil {
fmt.Fprintf(os.Stderr, "gadfly: specialist %q review pass failed: %v\n", sp.Name, err)
return fmt.Sprintf("⚠️ This reviewer failed to complete: %v", err)
}
// Pass 2 — recheck: adversarially re-verify the draft's findings and drop
// the unconfirmed ones. Skipped for a clean draft (nothing to verify) or
// when disabled. A recheck failure is non-fatal — we emit the unverified
// draft rather than losing the review entirely.
final := draft
if shouldRecheck(draft) {
rechecked, rerr := runAgent(ctx, mdl, fsTools, recheckSystemPrompt, buildRecheckTask(draft, diff),
envInt("GADFLY_RECHECK_MAX_STEPS", defaultRecheckMaxSteps))
if rerr != nil {
fmt.Fprintln(os.Stderr, "gadfly: recheck pass failed; emitting unverified draft:", rerr)
fmt.Fprintf(os.Stderr, "gadfly: specialist %q recheck failed; emitting unverified draft: %v\n", sp.Name, rerr)
} else {
final = rechecked
}
}
fmt.Println(final)
return nil
return final
}
// runAgent runs one agent pass (its own fresh toolbox over the sandbox) and
+219
View File
@@ -0,0 +1,219 @@
package main
import (
"fmt"
"os"
"path/filepath"
"sort"
"strings"
"gopkg.in/yaml.v3"
)
// Specialist is one review lens: a focused instruction appended to the base
// system prompt, run as its own review (+recheck) pass and rendered as its own
// section in the consolidated comment.
type Specialist struct {
Name string // lowercase id, used in GADFLY_SPECIALISTS and section anchors
Title string // human label for the section header
Focus string // lens-specific instruction appended to the base prompt
}
// builtinSpecialists are the lenses Gadfly ships with. The default suite (below)
// is the "biggies"; the rest are opt-in by name via GADFLY_SPECIALISTS, a
// .gadfly.yml `specialists:` list, or `all`.
var builtinSpecialists = map[string]Specialist{
"security": {"security", "🔒 Security", "" +
"Authn/authz gaps, injection (SQL/command/path/template), SSRF, unsafe deserialization, " +
"secret/credential leakage, missing input validation, unsafe handling of untrusted data, " +
"and insecure defaults. Trace tainted input to where it's used."},
"correctness": {"correctness", "✅ Correctness", "" +
"Logic bugs and incorrect behavior introduced by the change. Pay special attention to " +
"SEMANTIC / domain correctness: do NOT trust a constant, conversion factor, formula, unit, " +
"or threshold because it looks reasonable — independently RE-DERIVE the expected value from " +
"first principles (units, dimensions, boundary values) and compare. A plausible-looking magic " +
"number is exactly where real bugs hide."},
"maintainability": {"maintainability", "🧹 Code cleanliness & maintainability", "" +
"Readability and structure: dead/duplicated code, confusing names, overly long or deeply " +
"nested functions, leaky abstractions, copy-paste that should be shared, and changes that " +
"don't follow the patterns the surrounding code already uses. Prefer concrete, low-churn fixes."},
"performance": {"performance", "⚡ Performance", "" +
"Efficiency regressions: N+1 queries, unnecessary allocations or copies, work inside hot loops, " +
"unbounded growth, missing pagination/limits, blocking calls on hot paths, and avoidable " +
"quadratic behavior. Only flag impact you can justify, not micro-optimizations."},
"error-handling": {"error-handling", "🧯 Error handling & edge cases", "" +
"Ignored or swallowed errors, missing cleanup/rollback/`defer`, panics on bad input, and " +
"unhandled edge cases: nil/null, empty collections, zero/negative, integer overflow, and " +
"boundary (off-by-one) conditions. Check the unhappy paths the diff introduces."},
// --- opt-in (not in the default suite) ---
"tests": {"tests", "🧪 Tests", "" +
"Whether the change is adequately tested: new logic without tests, changed behavior whose " +
"tests weren't updated, missing edge-case/error-path coverage, and brittle or assertion-free tests."},
"docs": {"docs", "📝 Docs", "" +
"Documentation drift: public APIs/flags/config changed without updating doc comments, README, " +
"or examples; stale comments now contradicting the code; missing docs for new exported surface."},
"conventions": {"conventions", "📐 Conventions", "" +
"Adherence to THIS repo's own conventions. Discover them — read any README / CONTRIBUTING / " +
"CLAUDE.md / AGENTS.md / lint config the repo ships, and hold the change to the documented and " +
"surrounding-code patterns."},
"improvements": {"improvements", "💡 Improvements (optional)", "" +
"OPTIONAL, non-blocking suggestions only. Surface at most 12 genuinely high-value blind spots " +
"or simplifications the author likely wants to know about. Be quiet: if nothing rises to that " +
"bar, reply exactly 'No suggestions.' Never nitpick, never pad. These never affect the verdict."},
}
// defaultSuite is used when nothing selects specialists explicitly.
var defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}
// fileConfig is the optional .gadfly.yml schema.
type fileConfig struct {
// Specialists selects which run (names). Empty => the default suite.
Specialists []string `yaml:"specialists"`
// Define adds or overrides specialist definitions.
Define []struct {
Name string `yaml:"name"`
Title string `yaml:"title"`
Focus string `yaml:"focus"`
} `yaml:"define"`
}
// resolveSpecialists builds the active specialist list from (in increasing
// precedence) the built-ins, a repo .gadfly.yml, and GADFLY_SPECIALIST_<NAME>
// env definitions. The active SELECTION is GADFLY_SPECIALISTS (csv or "all"),
// else the file's `specialists:`, else the default suite. Unknown names are
// reported as errors and skipped; valid ones still run.
func resolveSpecialists(repoDir string) ([]Specialist, []error) {
var errs []error
// 1. registry seeded with built-ins.
registry := make(map[string]Specialist, len(builtinSpecialists))
for k, v := range builtinSpecialists {
registry[k] = v
}
// 2. config file: definitions + (maybe) the selection.
var fileSelection []string
if cfg, ok, err := loadFileConfig(repoDir); err != nil {
errs = append(errs, err)
} else if ok {
for _, d := range cfg.Define {
name := strings.ToLower(strings.TrimSpace(d.Name))
if name == "" || strings.TrimSpace(d.Focus) == "" {
errs = append(errs, fmt.Errorf(".gadfly.yml: specialist define needs non-empty name and focus"))
continue
}
registry[name] = Specialist{Name: name, Title: titleOr(d.Title, name), Focus: d.Focus}
}
fileSelection = cfg.Specialists
}
// 3. env definitions: GADFLY_SPECIALIST_<NAME>="focus".
for _, kv := range os.Environ() {
key, val, _ := strings.Cut(kv, "=")
if !strings.HasPrefix(key, "GADFLY_SPECIALIST_") || len(key) == len("GADFLY_SPECIALIST_") {
continue
}
name := strings.ToLower(strings.TrimPrefix(key, "GADFLY_SPECIALIST_"))
if strings.TrimSpace(val) == "" {
errs = append(errs, fmt.Errorf("%s: empty specialist focus", key))
continue
}
existing, ok := registry[name]
title := titleOr("", name)
if ok {
title = existing.Title
}
registry[name] = Specialist{Name: name, Title: title, Focus: val}
}
// 4. selection: env > file > default suite.
var names []string
switch {
case strings.TrimSpace(os.Getenv("GADFLY_SPECIALISTS")) != "":
names = splitCSV(os.Getenv("GADFLY_SPECIALISTS"))
case len(fileSelection) > 0:
names = fileSelection
default:
names = defaultSuite
}
if len(names) == 1 && strings.EqualFold(names[0], "all") {
names = sortedKeys(registry)
}
// 5. resolve names -> specialists.
var out []Specialist
seen := map[string]bool{}
for _, raw := range names {
name := strings.ToLower(strings.TrimSpace(raw))
if name == "" || seen[name] {
continue
}
seen[name] = true
sp, ok := registry[name]
if !ok {
errs = append(errs, fmt.Errorf("unknown specialist %q (built-ins: %s; or define it via .gadfly.yml / GADFLY_SPECIALIST_%s)",
name, strings.Join(sortedKeys(builtinSpecialists), ", "), strings.ToUpper(name)))
continue
}
out = append(out, sp)
}
return out, errs
}
// composeSpecialistPrompt appends a specialist's lens to the base system prompt.
func composeSpecialistPrompt(base string, sp Specialist) string {
return strings.TrimRight(base, "\n") +
"\n\n## Your review lens: " + sp.Title + "\n" +
"Focus your review on this lens; other reviewers cover the rest. " + sp.Focus
}
func loadFileConfig(repoDir string) (fileConfig, bool, error) {
candidates := []string{}
if p := strings.TrimSpace(os.Getenv("GADFLY_CONFIG")); p != "" {
if !filepath.IsAbs(p) {
p = filepath.Join(repoDir, p)
}
candidates = append(candidates, p)
} else {
candidates = append(candidates, filepath.Join(repoDir, ".gadfly.yml"), filepath.Join(repoDir, ".gadfly.yaml"))
}
for _, p := range candidates {
data, err := os.ReadFile(p)
if err != nil {
continue // not found / unreadable: try next
}
var cfg fileConfig
if err := yaml.Unmarshal(data, &cfg); err != nil {
return fileConfig{}, false, fmt.Errorf("parse %s: %w", filepath.Base(p), err)
}
return cfg, true, nil
}
return fileConfig{}, false, nil
}
func titleOr(title, name string) string {
if strings.TrimSpace(title) != "" {
return title
}
return name
}
func splitCSV(s string) []string {
var out []string
for _, p := range strings.Split(s, ",") {
if t := strings.TrimSpace(p); t != "" {
out = append(out, t)
}
}
return out
}
func sortedKeys(m map[string]Specialist) []string {
keys := make([]string, 0, len(m))
for k := range m {
keys = append(keys, k)
}
sort.Strings(keys)
return keys
}
+118
View File
@@ -0,0 +1,118 @@
package main
import (
"os"
"path/filepath"
"testing"
)
func names(specs []Specialist) []string {
out := make([]string, len(specs))
for i, s := range specs {
out[i] = s.Name
}
return out
}
func eq(a, b []string) bool {
if len(a) != len(b) {
return false
}
for i := range a {
if a[i] != b[i] {
return false
}
}
return true
}
func TestResolveSpecialists_DefaultSuite(t *testing.T) {
t.Setenv("GADFLY_SPECIALISTS", "")
specs, errs := resolveSpecialists(t.TempDir())
if len(errs) != 0 {
t.Fatalf("unexpected errors: %v", errs)
}
if !eq(names(specs), defaultSuite) {
t.Errorf("default = %v, want %v", names(specs), defaultSuite)
}
}
func TestResolveSpecialists_EnvSelection(t *testing.T) {
t.Setenv("GADFLY_SPECIALISTS", "security, tests")
specs, errs := resolveSpecialists(t.TempDir())
if len(errs) != 0 {
t.Fatalf("unexpected errors: %v", errs)
}
if !eq(names(specs), []string{"security", "tests"}) {
t.Errorf("got %v", names(specs))
}
}
func TestResolveSpecialists_UnknownNameErrors(t *testing.T) {
t.Setenv("GADFLY_SPECIALISTS", "security,bogus")
specs, errs := resolveSpecialists(t.TempDir())
if len(errs) == 0 {
t.Fatal("expected an error for unknown specialist")
}
if !eq(names(specs), []string{"security"}) {
t.Errorf("valid ones should still resolve, got %v", names(specs))
}
}
func TestResolveSpecialists_EnvCustomDefinition(t *testing.T) {
t.Setenv("GADFLY_SPECIALIST_MIGRATIONS", "Review DB migrations for destructive ops.")
t.Setenv("GADFLY_SPECIALISTS", "migrations")
specs, errs := resolveSpecialists(t.TempDir())
if len(errs) != 0 {
t.Fatalf("unexpected errors: %v", errs)
}
if len(specs) != 1 || specs[0].Name != "migrations" || specs[0].Focus == "" {
t.Fatalf("custom specialist not registered: %+v", specs)
}
}
func TestResolveSpecialists_FileConfig(t *testing.T) {
dir := t.TempDir()
cfg := `specialists: [security, migrations]
define:
- name: migrations
title: "DB migrations"
focus: "Review schema migrations for destructive or unindexed changes."
`
if err := os.WriteFile(filepath.Join(dir, ".gadfly.yml"), []byte(cfg), 0o644); err != nil {
t.Fatal(err)
}
t.Setenv("GADFLY_SPECIALISTS", "") // let the file drive selection
specs, errs := resolveSpecialists(dir)
if len(errs) != 0 {
t.Fatalf("unexpected errors: %v", errs)
}
if !eq(names(specs), []string{"security", "migrations"}) {
t.Errorf("got %v", names(specs))
}
}
func TestParseVerdictAndWorst(t *testing.T) {
cases := map[string]verdict{
"VERDICT: No material issues found.": verdictClean,
"Minor issues\n- nit": verdictMinor,
"**Blocking issues found**": verdictBlocking,
"something unparseable": verdictUnknown,
}
for in, want := range cases {
if got := parseVerdict(in); got != want {
t.Errorf("parseVerdict(%q) = %v, want %v", in, got, want)
}
}
results := []specialistResult{
{spec: Specialist{Name: "security"}, verdict: verdictMinor},
{spec: Specialist{Name: "correctness"}, verdict: verdictBlocking},
{spec: Specialist{Name: "improvements"}, verdict: verdictBlocking}, // must not count
}
if w := worstVerdict(results[:2]); w != verdictBlocking {
t.Errorf("worst = %v, want blocking", w)
}
if w := worstVerdict([]specialistResult{results[0], results[2]}); w != verdictMinor {
t.Errorf("improvements should not escalate; worst = %v, want minor", w)
}
}
+3 -1
View File
@@ -41,7 +41,9 @@
# empty => fall back to "is a repo collaborator"
set -uo pipefail
DEFAULT_MODELS="qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
# One model by default: the specialist suite already provides breadth, so a
# multi-model default would multiply cost (models × specialists × 2 passes).
DEFAULT_MODELS="qwen3-coder:480b-cloud"
TRIGGER_PHRASE="${GADFLY_TRIGGER_PHRASE:-@gadfly review}"
SCRIPTS_DIR="/app/scripts"
WORKDIR="${WORKDIR:-/tmp/gadfly}"
+34
View File
@@ -0,0 +1,34 @@
# Optional per-repo Gadfly config. Place at the repo ROOT as `.gadfly.yml`
# (or point GADFLY_CONFIG at it). It's read from the PR's checked-out tree, so
# it's version-controlled and reviewed like any other file.
#
# Precedence for specialist DEFINITIONS: built-ins < this file < GADFLY_SPECIALIST_* env.
# Precedence for SELECTION: GADFLY_SPECIALISTS env > this file's `specialists:` > default suite.
# Which specialists run. Omit this key to use the default suite
# (security, correctness, maintainability, performance, error-handling).
# Built-in opt-ins you can add: tests, docs, conventions, improvements. Or "all".
specialists:
- security
- correctness
- maintainability
- tests
- migrations # a custom one, defined below
# Add new specialists, or override a built-in by reusing its name (e.g. give
# `security` a repo-specific focus). `focus` is appended to the base reviewer
# prompt as that lens's instruction.
define:
- name: migrations
title: "🗃️ DB migrations"
focus: >
Review database schema migrations for destructive operations (DROP/ALTER
that loses data), missing or unindexed foreign keys, non-idempotent steps,
and changes that would lock a large table on deploy. Check that any new
column added to a domain struct is wired through every storage layer.
- name: security
title: "🔒 Security (house rules)"
focus: >
In addition to the usual security review, this repo requires: all web
routes use the auth middleware, no secrets in logs, and all external HTTP
calls set a timeout. Flag any violation.
+1
View File
@@ -10,6 +10,7 @@ set the secrets/vars it references. Gadfly is advisory only — it never blocks
| [`local-ollama.yml`](local-ollama.yml) | a **local/LAN Ollama** daemon | nothing (or `GADFLY_BASE_URL` for a remote host) |
| [`openai-compatible.yml`](openai-compatible.yml) | any **OpenAI-compatible** endpoint (local Ollama `/v1`, gateway, vLLM, OpenRouter…) | `GADFLY_BASE_URL` (+ a key for most gateways) |
| [`endpoint-aliases.yml`](endpoint-aliases.yml) | **several named backends** at once (one comment each) | repo vars `GADFLY_ENDPOINT_<NAME>` |
| [`.gadfly.yml`](.gadfly.yml) | **per-repo specialist config** (not a workflow — goes at your repo root) | — |
Common to all:
- **Triggers:** new/reopened/ready non-draft PR (auto), `@gadfly review` comment (allowed users),
+5
View File
@@ -43,6 +43,11 @@ jobs:
OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }}
OLLAMA_REVIEW_MODELS: ${{ vars.OLLAMA_REVIEW_MODELS }}
GADFLY_ALLOWED_USERS: ${{ vars.GADFLY_ALLOWED_USERS }}
# Specialist suite (optional). Empty = default suite
# (security,correctness,maintainability,performance,error-handling).
# csv to choose; "all" for everything; or define custom ones via a repo
# .gadfly.yml / GADFLY_SPECIALIST_<NAME>. See README "Specialists".
GADFLY_SPECIALISTS: ${{ vars.GADFLY_SPECIALISTS }}
# --- Models & providers (optional; default = Ollama Cloud) ----------
# Gadfly is majordomo-powered, so it can target other backends. Set a
# provider for bare model ids; point at a different endpoint with a
+4 -1
View File
@@ -2,7 +2,10 @@ module gitea.stevedudenhoeffer.com/steve/gadfly
go 1.26.2
require gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b
require (
gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b
gopkg.in/yaml.v3 v3.0.1
)
require (
cloud.google.com/go v0.116.0 // indirect
+2
View File
@@ -121,8 +121,10 @@ google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpAD
google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c=
google.golang.org/protobuf v1.34.2 h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg=
google.golang.org/protobuf v1.34.2/go.mod h1:qYOHts0dSfpeUzUFpOMr/WGzszTmLH+DiWniOlNbLDw=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
+8 -15
View File
@@ -1,6 +1,10 @@
You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
pull request below — not to praise it. A gadfly does not let things slide.
You review through ONE assigned lens (given at the end of this prompt). Stay in your lane —
other reviewers cover the other angles — but do report anything clearly serious you happen
to notice.
You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
state. USE THEM to verify before you report. Do not review the diff in isolation.
- read_file(path[, start_line, limit]) — read a file with line numbers.
@@ -19,21 +23,10 @@ Mandatory verification discipline — this is the whole point of giving you tool
- If you cannot confirm a suspicion with the tools, either drop it or clearly label it
"unverified" — do NOT present an unchecked guess as a finding.
Be skeptical and concrete. Hunt specifically for:
- Correctness bugs and logic errors introduced by the change.
- SEMANTIC / domain correctness — the failure mode plausible-looking code hides best.
Do NOT trust a constant, conversion factor, formula, unit, or threshold just because
it looks reasonable. Independently RE-DERIVE the expected value from first principles
(units, dimensions, edge values) and compare. A magic number that "looks about right"
is exactly where real bugs hide (e.g. a linear factor used where it must be squared).
- Concurrency issues: data races, deadlocks, unsynchronized shared state, leaked tasks.
- Security problems: injection, missing authz/authn, secret leakage, unsafe input handling.
- Error handling gaps: ignored errors, swallowed exceptions, missing rollback/cleanup.
- Resource leaks: unclosed handles/bodies/files, context/lifetime misuse, unbounded growth.
- Missed edge cases: off-by-one, nil/null, empty collection, overflow, zero/negative.
- Violations of THIS repo's own conventions. Discover them — do not assume. Read any
README / CONTRIBUTING / CLAUDE.md / AGENTS.md / lint config the repo ships, and hold
the change to the patterns the surrounding code actually uses.
Be skeptical and concrete, and apply your assigned lens rigorously. A recurring, high-value
discipline regardless of lens: do NOT trust a constant, conversion factor, formula, unit, or
threshold just because it looks reasonable — RE-DERIVE the expected value from first principles
and compare. Plausible-looking magic numbers are where real bugs hide.
Output rules:
- Output GitHub-flavored markdown, concise. No filler, no restating the diff.