feat: specialist suite — configurable + custom review lenses (one consolidated comment)
Build & push image / build-and-push (push) Successful in 8s
Build & push image / build-and-push (push) Successful in 8s
Replace the single generic review with a suite of focused specialists, each its own review+recheck pass, merged into ONE comment (a collapsible section per lens, led by the worst verdict; the optional `improvements` lens never escalates it). - cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness, maintainability, performance, error-handling) + opt-in (tests, docs, conventions, improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define). Precedence: built-ins < file < env. Unknown names error but don't sink the run. - cmd/gadfly/consolidate.go: verdict parse + one-comment render. - main.go: loop specialists; per-lens failure is an inline notice, never fatal. Default timeout bumped to 600s (suite runs sequentially). - base system prompt trimmed to persona+tools+discipline+output; lens-specific focus is appended per specialist (semantic re-derivation discipline kept in base). - entrypoint default models -> single model (suite already gives breadth; cost ~= specialists × models × 2). Adds gopkg.in/yaml.v3. - docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var, CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -29,11 +29,13 @@ verifies each one against the actual code, and posts its findings as a comment.
|
|||||||
|
|
||||||
```
|
```
|
||||||
cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout)
|
cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout)
|
||||||
main.go agent orchestration: review pass + adversarial recheck pass, budgets
|
main.go orchestration: loop specialists, each a review pass + adversarial recheck
|
||||||
|
specialists.go specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
|
||||||
|
consolidate.go verdict parsing + one-comment consolidation (a section per specialist)
|
||||||
model.go provider/model resolution (majordomo.Parse) + env endpoint aliases
|
model.go provider/model resolution (majordomo.Parse) + env endpoint aliases
|
||||||
tools.go the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
|
tools.go the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
|
||||||
recheck.go second-pass verification prompt + verdict recompute
|
recheck.go second-pass verification prompt + verdict recompute
|
||||||
*_test.go sandbox, recheck, wrap-up, and spec/endpoint-parse unit tests
|
*_test.go sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests
|
||||||
scripts/run.sh fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
|
scripts/run.sh fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
|
||||||
scripts/system-prompt.txt the reviewer persona + verification discipline (generic, not repo-specific)
|
scripts/system-prompt.txt the reviewer persona + verification discipline (generic, not repo-specific)
|
||||||
entrypoint.sh container brains: trigger gating, PR clone, model loop (the logic that
|
entrypoint.sh container brains: trigger gating, PR clone, model loop (the logic that
|
||||||
@@ -78,10 +80,19 @@ GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly
|
|||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
The full env reference lives in the **README** (`Models & providers` + `Configuration`).
|
The full env reference lives in the **README** (`Specialists`, `Models & providers`,
|
||||||
Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`), `GADFLY_MODEL`/`GADFLY_MODELS`,
|
`Configuration`). Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`),
|
||||||
`GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via `GADFLY_ENDPOINT_<NAME>` /
|
`GADFLY_MODEL`/`GADFLY_MODELS`, `GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via
|
||||||
`GADFLY_ALIAS_<NAME>` (http-capable) and majordomo `LLM_*` DSNs (HTTPS-only).
|
`GADFLY_ENDPOINT_<NAME>` / `GADFLY_ALIAS_<NAME>` (http-capable) and majordomo `LLM_*` DSNs
|
||||||
|
(HTTPS-only).
|
||||||
|
|
||||||
|
**Specialists (the swarm):** the reviewer runs a suite of focused lenses, one consolidated
|
||||||
|
comment with a section each. Default suite = security/correctness/maintainability/performance/
|
||||||
|
error-handling; opt-in built-ins = tests/docs/conventions/improvements. Select via
|
||||||
|
`GADFLY_SPECIALISTS` (csv or `all`); define/override via `GADFLY_SPECIALIST_<NAME>` env or a repo
|
||||||
|
`.gadfly.yml` (`specialists:` + `define:`). See `cmd/gadfly/specialists.go`. Cost ≈
|
||||||
|
specialists × models × 2 passes — keep the default model count low (entrypoint defaults to one).
|
||||||
|
Dynamic `auto` selection (a cheap model picks lenses per-diff) is the planned next step.
|
||||||
|
|
||||||
**Tested vs untested:** only the Ollama paths (local + OpenAI-compatible pointed at Ollama)
|
**Tested vs untested:** only the Ollama paths (local + OpenAI-compatible pointed at Ollama)
|
||||||
are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are
|
are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are
|
||||||
|
|||||||
@@ -102,6 +102,46 @@ which Gadfly also honors — but those are **HTTPS-only**, so for plaintext loca
|
|||||||
> **Gitea Actions note:** repo `vars`/`secrets` aren't auto-exposed as env — add each alias to
|
> **Gitea Actions note:** repo `vars`/`secrets` aren't auto-exposed as env — add each alias to
|
||||||
> the stub workflow's `env:` block, e.g. `GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}`.
|
> the stub workflow's `env:` block, e.g. `GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}`.
|
||||||
|
|
||||||
|
## Specialists (the review swarm)
|
||||||
|
|
||||||
|
Instead of one generic reviewer, Gadfly runs a **suite of specialists** — each a focused lens
|
||||||
|
with its own review (+recheck) pass — and merges them into **one comment**, a collapsible
|
||||||
|
section per lens, led by an overall verdict (the worst across lenses; the optional
|
||||||
|
`improvements` lens never escalates it).
|
||||||
|
|
||||||
|
**Default suite** (when nothing is configured):
|
||||||
|
`security`, `correctness`, `maintainability` (code cleanliness), `performance`, `error-handling`.
|
||||||
|
|
||||||
|
**Also built in** (opt-in by name): `tests`, `docs`, `conventions`, and `improvements`
|
||||||
|
(strict & quiet — at most 1–2 high-value, non-blocking suggestions, silent otherwise).
|
||||||
|
|
||||||
|
Select which run with **`GADFLY_SPECIALISTS`** (comma-separated names, or `all`):
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Define your own** — two ways, which compose (env overrides file overrides built-ins):
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>" (also overrides a built-in by reusing its name)
|
||||||
|
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
|
||||||
|
GADFLY_SPECIALISTS: "security,correctness,migrations"
|
||||||
|
```
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
|
||||||
|
specialists: [security, correctness, maintainability, migrations]
|
||||||
|
define:
|
||||||
|
- name: migrations
|
||||||
|
title: "🗃️ DB migrations"
|
||||||
|
focus: "Review schema migrations for destructive ops, missing indexes, table locks."
|
||||||
|
```
|
||||||
|
|
||||||
|
> **Cost:** each specialist is its own review+recheck, so cost ≈ *specialists × models × 2*.
|
||||||
|
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, and a future
|
||||||
|
> `auto` mode will let a cheap model pick only the lenses a given diff actually needs.
|
||||||
|
|
||||||
### Triggers
|
### Triggers
|
||||||
|
|
||||||
1. A **new/reopened/ready** non-draft PR — automatic.
|
1. A **new/reopened/ready** non-draft PR — automatic.
|
||||||
|
|||||||
@@ -0,0 +1,94 @@
|
|||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"fmt"
|
||||||
|
"strings"
|
||||||
|
)
|
||||||
|
|
||||||
|
// verdict is the severity a specialist (or the whole review) lands on.
|
||||||
|
type verdict int
|
||||||
|
|
||||||
|
const (
|
||||||
|
verdictUnknown verdict = iota // couldn't parse; treated below Minor for display, not blocking
|
||||||
|
verdictClean // "No material issues found"
|
||||||
|
verdictMinor // "Minor issues"
|
||||||
|
verdictBlocking // "Blocking issues found"
|
||||||
|
)
|
||||||
|
|
||||||
|
func (v verdict) label() string {
|
||||||
|
switch v {
|
||||||
|
case verdictClean:
|
||||||
|
return "No material issues found"
|
||||||
|
case verdictMinor:
|
||||||
|
return "Minor issues"
|
||||||
|
case verdictBlocking:
|
||||||
|
return "Blocking issues found"
|
||||||
|
default:
|
||||||
|
return "Reviewed"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// parseVerdict extracts a specialist's self-reported verdict from its output.
|
||||||
|
// The base prompt tells each lens to lead with one of the three phrases.
|
||||||
|
func parseVerdict(out string) verdict {
|
||||||
|
l := strings.ToLower(out)
|
||||||
|
switch {
|
||||||
|
case strings.Contains(l, "blocking issues found"):
|
||||||
|
return verdictBlocking
|
||||||
|
case strings.Contains(l, "no material issues"):
|
||||||
|
return verdictClean
|
||||||
|
case strings.Contains(l, "minor issues"):
|
||||||
|
return verdictMinor
|
||||||
|
default:
|
||||||
|
return verdictUnknown
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// specialistResult pairs a specialist with its rendered review and verdict.
|
||||||
|
type specialistResult struct {
|
||||||
|
spec Specialist
|
||||||
|
out string
|
||||||
|
verdict verdict
|
||||||
|
}
|
||||||
|
|
||||||
|
// worstVerdict returns the most severe verdict across results. The optional
|
||||||
|
// "improvements" lens never escalates the overall verdict (it's advisory).
|
||||||
|
func worstVerdict(results []specialistResult) verdict {
|
||||||
|
worst := verdictClean
|
||||||
|
for _, r := range results {
|
||||||
|
if r.spec.Name == "improvements" {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if r.verdict > worst {
|
||||||
|
worst = r.verdict
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return worst
|
||||||
|
}
|
||||||
|
|
||||||
|
// renderConsolidated assembles the single comment body: an overall verdict line
|
||||||
|
// followed by one verbatim section per specialist. run.sh wraps this with the
|
||||||
|
// "🪰 Gadfly review — <model>" header and the advisory footer.
|
||||||
|
func renderConsolidated(results []specialistResult) string {
|
||||||
|
var b strings.Builder
|
||||||
|
fmt.Fprintf(&b, "**Verdict: %s** — %d reviewers: %s\n",
|
||||||
|
worstVerdict(results).label(), len(results), strings.Join(specialistNames(results), ", "))
|
||||||
|
|
||||||
|
for _, r := range results {
|
||||||
|
body := strings.TrimSpace(r.out)
|
||||||
|
if body == "" {
|
||||||
|
body = "_(no output)_"
|
||||||
|
}
|
||||||
|
fmt.Fprintf(&b, "\n<details><summary><b>%s</b> — %s</summary>\n\n%s\n\n</details>\n",
|
||||||
|
r.spec.Title, r.verdict.label(), body)
|
||||||
|
}
|
||||||
|
return strings.TrimRight(b.String(), "\n")
|
||||||
|
}
|
||||||
|
|
||||||
|
func specialistNames(results []specialistResult) []string {
|
||||||
|
names := make([]string, 0, len(results))
|
||||||
|
for _, r := range results {
|
||||||
|
names = append(names, r.spec.Name)
|
||||||
|
}
|
||||||
|
return names
|
||||||
|
}
|
||||||
+36
-15
@@ -65,8 +65,10 @@ import (
|
|||||||
)
|
)
|
||||||
|
|
||||||
const (
|
const (
|
||||||
defaultMaxSteps = 24
|
defaultMaxSteps = 24
|
||||||
defaultTimeoutSecs = 300
|
// defaultTimeoutSecs is the overall deadline shared by ALL specialist passes
|
||||||
|
// in one invocation; the default suite runs several lenses sequentially.
|
||||||
|
defaultTimeoutSecs = 600
|
||||||
defaultMaxDiffChars = 60000
|
defaultMaxDiffChars = 60000
|
||||||
// defaultWrapUpReserve is how many steps before the cap the agent is told
|
// defaultWrapUpReserve is how many steps before the cap the agent is told
|
||||||
// to stop investigating and write its final answer. Reserving a margin is
|
// to stop investigating and write its final answer. Reserving a margin is
|
||||||
@@ -130,34 +132,53 @@ func run() error {
|
|||||||
return fmt.Errorf("resolve model: %w", err)
|
return fmt.Errorf("resolve model: %w", err)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
specialists, serrs := resolveSpecialists(repoDir)
|
||||||
|
for _, e := range serrs {
|
||||||
|
fmt.Fprintln(os.Stderr, "gadfly:", e)
|
||||||
|
}
|
||||||
|
if len(specialists) == 0 {
|
||||||
|
return errors.New("no specialists resolved to run")
|
||||||
|
}
|
||||||
|
|
||||||
timeout := time.Duration(envInt("GADFLY_TIMEOUT_SECS", defaultTimeoutSecs)) * time.Second
|
timeout := time.Duration(envInt("GADFLY_TIMEOUT_SECS", defaultTimeoutSecs)) * time.Second
|
||||||
ctx, cancel := context.WithTimeout(context.Background(), timeout)
|
ctx, cancel := context.WithTimeout(context.Background(), timeout)
|
||||||
defer cancel()
|
defer cancel()
|
||||||
|
|
||||||
// Pass 1 — review: produce the draft.
|
base := string(systemBytes)
|
||||||
draft, err := runAgent(ctx, mdl, fsTools, string(systemBytes), buildTask(diff),
|
task := buildTask(diff)
|
||||||
envInt("GADFLY_MAX_STEPS", defaultMaxSteps))
|
results := make([]specialistResult, 0, len(specialists))
|
||||||
if err != nil {
|
for _, sp := range specialists {
|
||||||
return fmt.Errorf("review pass: %w", err)
|
out := reviewWithSpecialist(ctx, mdl, fsTools, base, sp, task, diff)
|
||||||
|
results = append(results, specialistResult{spec: sp, out: out, verdict: parseVerdict(out)})
|
||||||
|
}
|
||||||
|
|
||||||
|
fmt.Println(renderConsolidated(results))
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// reviewWithSpecialist runs one lens end-to-end: a review pass under the
|
||||||
|
// specialist's composed prompt, then the shared adversarial recheck pass. A
|
||||||
|
// failed pass is rendered as an inline notice (advisory — one lens failing
|
||||||
|
// never sinks the others or the job).
|
||||||
|
func reviewWithSpecialist(ctx context.Context, mdl llm.Model, fsTools *repoFS, base string, sp Specialist, task, diff string) string {
|
||||||
|
draft, err := runAgent(ctx, mdl, fsTools, composeSpecialistPrompt(base, sp), task,
|
||||||
|
envInt("GADFLY_MAX_STEPS", defaultMaxSteps))
|
||||||
|
if err != nil {
|
||||||
|
fmt.Fprintf(os.Stderr, "gadfly: specialist %q review pass failed: %v\n", sp.Name, err)
|
||||||
|
return fmt.Sprintf("⚠️ This reviewer failed to complete: %v", err)
|
||||||
}
|
}
|
||||||
|
|
||||||
// Pass 2 — recheck: adversarially re-verify the draft's findings and drop
|
|
||||||
// the unconfirmed ones. Skipped for a clean draft (nothing to verify) or
|
|
||||||
// when disabled. A recheck failure is non-fatal — we emit the unverified
|
|
||||||
// draft rather than losing the review entirely.
|
|
||||||
final := draft
|
final := draft
|
||||||
if shouldRecheck(draft) {
|
if shouldRecheck(draft) {
|
||||||
rechecked, rerr := runAgent(ctx, mdl, fsTools, recheckSystemPrompt, buildRecheckTask(draft, diff),
|
rechecked, rerr := runAgent(ctx, mdl, fsTools, recheckSystemPrompt, buildRecheckTask(draft, diff),
|
||||||
envInt("GADFLY_RECHECK_MAX_STEPS", defaultRecheckMaxSteps))
|
envInt("GADFLY_RECHECK_MAX_STEPS", defaultRecheckMaxSteps))
|
||||||
if rerr != nil {
|
if rerr != nil {
|
||||||
fmt.Fprintln(os.Stderr, "gadfly: recheck pass failed; emitting unverified draft:", rerr)
|
fmt.Fprintf(os.Stderr, "gadfly: specialist %q recheck failed; emitting unverified draft: %v\n", sp.Name, rerr)
|
||||||
} else {
|
} else {
|
||||||
final = rechecked
|
final = rechecked
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
return final
|
||||||
fmt.Println(final)
|
|
||||||
return nil
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// runAgent runs one agent pass (its own fresh toolbox over the sandbox) and
|
// runAgent runs one agent pass (its own fresh toolbox over the sandbox) and
|
||||||
|
|||||||
@@ -0,0 +1,219 @@
|
|||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"fmt"
|
||||||
|
"os"
|
||||||
|
"path/filepath"
|
||||||
|
"sort"
|
||||||
|
"strings"
|
||||||
|
|
||||||
|
"gopkg.in/yaml.v3"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Specialist is one review lens: a focused instruction appended to the base
|
||||||
|
// system prompt, run as its own review (+recheck) pass and rendered as its own
|
||||||
|
// section in the consolidated comment.
|
||||||
|
type Specialist struct {
|
||||||
|
Name string // lowercase id, used in GADFLY_SPECIALISTS and section anchors
|
||||||
|
Title string // human label for the section header
|
||||||
|
Focus string // lens-specific instruction appended to the base prompt
|
||||||
|
}
|
||||||
|
|
||||||
|
// builtinSpecialists are the lenses Gadfly ships with. The default suite (below)
|
||||||
|
// is the "biggies"; the rest are opt-in by name via GADFLY_SPECIALISTS, a
|
||||||
|
// .gadfly.yml `specialists:` list, or `all`.
|
||||||
|
var builtinSpecialists = map[string]Specialist{
|
||||||
|
"security": {"security", "🔒 Security", "" +
|
||||||
|
"Authn/authz gaps, injection (SQL/command/path/template), SSRF, unsafe deserialization, " +
|
||||||
|
"secret/credential leakage, missing input validation, unsafe handling of untrusted data, " +
|
||||||
|
"and insecure defaults. Trace tainted input to where it's used."},
|
||||||
|
"correctness": {"correctness", "✅ Correctness", "" +
|
||||||
|
"Logic bugs and incorrect behavior introduced by the change. Pay special attention to " +
|
||||||
|
"SEMANTIC / domain correctness: do NOT trust a constant, conversion factor, formula, unit, " +
|
||||||
|
"or threshold because it looks reasonable — independently RE-DERIVE the expected value from " +
|
||||||
|
"first principles (units, dimensions, boundary values) and compare. A plausible-looking magic " +
|
||||||
|
"number is exactly where real bugs hide."},
|
||||||
|
"maintainability": {"maintainability", "🧹 Code cleanliness & maintainability", "" +
|
||||||
|
"Readability and structure: dead/duplicated code, confusing names, overly long or deeply " +
|
||||||
|
"nested functions, leaky abstractions, copy-paste that should be shared, and changes that " +
|
||||||
|
"don't follow the patterns the surrounding code already uses. Prefer concrete, low-churn fixes."},
|
||||||
|
"performance": {"performance", "⚡ Performance", "" +
|
||||||
|
"Efficiency regressions: N+1 queries, unnecessary allocations or copies, work inside hot loops, " +
|
||||||
|
"unbounded growth, missing pagination/limits, blocking calls on hot paths, and avoidable " +
|
||||||
|
"quadratic behavior. Only flag impact you can justify, not micro-optimizations."},
|
||||||
|
"error-handling": {"error-handling", "🧯 Error handling & edge cases", "" +
|
||||||
|
"Ignored or swallowed errors, missing cleanup/rollback/`defer`, panics on bad input, and " +
|
||||||
|
"unhandled edge cases: nil/null, empty collections, zero/negative, integer overflow, and " +
|
||||||
|
"boundary (off-by-one) conditions. Check the unhappy paths the diff introduces."},
|
||||||
|
|
||||||
|
// --- opt-in (not in the default suite) ---
|
||||||
|
"tests": {"tests", "🧪 Tests", "" +
|
||||||
|
"Whether the change is adequately tested: new logic without tests, changed behavior whose " +
|
||||||
|
"tests weren't updated, missing edge-case/error-path coverage, and brittle or assertion-free tests."},
|
||||||
|
"docs": {"docs", "📝 Docs", "" +
|
||||||
|
"Documentation drift: public APIs/flags/config changed without updating doc comments, README, " +
|
||||||
|
"or examples; stale comments now contradicting the code; missing docs for new exported surface."},
|
||||||
|
"conventions": {"conventions", "📐 Conventions", "" +
|
||||||
|
"Adherence to THIS repo's own conventions. Discover them — read any README / CONTRIBUTING / " +
|
||||||
|
"CLAUDE.md / AGENTS.md / lint config the repo ships, and hold the change to the documented and " +
|
||||||
|
"surrounding-code patterns."},
|
||||||
|
"improvements": {"improvements", "💡 Improvements (optional)", "" +
|
||||||
|
"OPTIONAL, non-blocking suggestions only. Surface at most 1–2 genuinely high-value blind spots " +
|
||||||
|
"or simplifications the author likely wants to know about. Be quiet: if nothing rises to that " +
|
||||||
|
"bar, reply exactly 'No suggestions.' Never nitpick, never pad. These never affect the verdict."},
|
||||||
|
}
|
||||||
|
|
||||||
|
// defaultSuite is used when nothing selects specialists explicitly.
|
||||||
|
var defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}
|
||||||
|
|
||||||
|
// fileConfig is the optional .gadfly.yml schema.
|
||||||
|
type fileConfig struct {
|
||||||
|
// Specialists selects which run (names). Empty => the default suite.
|
||||||
|
Specialists []string `yaml:"specialists"`
|
||||||
|
// Define adds or overrides specialist definitions.
|
||||||
|
Define []struct {
|
||||||
|
Name string `yaml:"name"`
|
||||||
|
Title string `yaml:"title"`
|
||||||
|
Focus string `yaml:"focus"`
|
||||||
|
} `yaml:"define"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// resolveSpecialists builds the active specialist list from (in increasing
|
||||||
|
// precedence) the built-ins, a repo .gadfly.yml, and GADFLY_SPECIALIST_<NAME>
|
||||||
|
// env definitions. The active SELECTION is GADFLY_SPECIALISTS (csv or "all"),
|
||||||
|
// else the file's `specialists:`, else the default suite. Unknown names are
|
||||||
|
// reported as errors and skipped; valid ones still run.
|
||||||
|
func resolveSpecialists(repoDir string) ([]Specialist, []error) {
|
||||||
|
var errs []error
|
||||||
|
|
||||||
|
// 1. registry seeded with built-ins.
|
||||||
|
registry := make(map[string]Specialist, len(builtinSpecialists))
|
||||||
|
for k, v := range builtinSpecialists {
|
||||||
|
registry[k] = v
|
||||||
|
}
|
||||||
|
|
||||||
|
// 2. config file: definitions + (maybe) the selection.
|
||||||
|
var fileSelection []string
|
||||||
|
if cfg, ok, err := loadFileConfig(repoDir); err != nil {
|
||||||
|
errs = append(errs, err)
|
||||||
|
} else if ok {
|
||||||
|
for _, d := range cfg.Define {
|
||||||
|
name := strings.ToLower(strings.TrimSpace(d.Name))
|
||||||
|
if name == "" || strings.TrimSpace(d.Focus) == "" {
|
||||||
|
errs = append(errs, fmt.Errorf(".gadfly.yml: specialist define needs non-empty name and focus"))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
registry[name] = Specialist{Name: name, Title: titleOr(d.Title, name), Focus: d.Focus}
|
||||||
|
}
|
||||||
|
fileSelection = cfg.Specialists
|
||||||
|
}
|
||||||
|
|
||||||
|
// 3. env definitions: GADFLY_SPECIALIST_<NAME>="focus".
|
||||||
|
for _, kv := range os.Environ() {
|
||||||
|
key, val, _ := strings.Cut(kv, "=")
|
||||||
|
if !strings.HasPrefix(key, "GADFLY_SPECIALIST_") || len(key) == len("GADFLY_SPECIALIST_") {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
name := strings.ToLower(strings.TrimPrefix(key, "GADFLY_SPECIALIST_"))
|
||||||
|
if strings.TrimSpace(val) == "" {
|
||||||
|
errs = append(errs, fmt.Errorf("%s: empty specialist focus", key))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
existing, ok := registry[name]
|
||||||
|
title := titleOr("", name)
|
||||||
|
if ok {
|
||||||
|
title = existing.Title
|
||||||
|
}
|
||||||
|
registry[name] = Specialist{Name: name, Title: title, Focus: val}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 4. selection: env > file > default suite.
|
||||||
|
var names []string
|
||||||
|
switch {
|
||||||
|
case strings.TrimSpace(os.Getenv("GADFLY_SPECIALISTS")) != "":
|
||||||
|
names = splitCSV(os.Getenv("GADFLY_SPECIALISTS"))
|
||||||
|
case len(fileSelection) > 0:
|
||||||
|
names = fileSelection
|
||||||
|
default:
|
||||||
|
names = defaultSuite
|
||||||
|
}
|
||||||
|
if len(names) == 1 && strings.EqualFold(names[0], "all") {
|
||||||
|
names = sortedKeys(registry)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 5. resolve names -> specialists.
|
||||||
|
var out []Specialist
|
||||||
|
seen := map[string]bool{}
|
||||||
|
for _, raw := range names {
|
||||||
|
name := strings.ToLower(strings.TrimSpace(raw))
|
||||||
|
if name == "" || seen[name] {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
seen[name] = true
|
||||||
|
sp, ok := registry[name]
|
||||||
|
if !ok {
|
||||||
|
errs = append(errs, fmt.Errorf("unknown specialist %q (built-ins: %s; or define it via .gadfly.yml / GADFLY_SPECIALIST_%s)",
|
||||||
|
name, strings.Join(sortedKeys(builtinSpecialists), ", "), strings.ToUpper(name)))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
out = append(out, sp)
|
||||||
|
}
|
||||||
|
return out, errs
|
||||||
|
}
|
||||||
|
|
||||||
|
// composeSpecialistPrompt appends a specialist's lens to the base system prompt.
|
||||||
|
func composeSpecialistPrompt(base string, sp Specialist) string {
|
||||||
|
return strings.TrimRight(base, "\n") +
|
||||||
|
"\n\n## Your review lens: " + sp.Title + "\n" +
|
||||||
|
"Focus your review on this lens; other reviewers cover the rest. " + sp.Focus
|
||||||
|
}
|
||||||
|
|
||||||
|
func loadFileConfig(repoDir string) (fileConfig, bool, error) {
|
||||||
|
candidates := []string{}
|
||||||
|
if p := strings.TrimSpace(os.Getenv("GADFLY_CONFIG")); p != "" {
|
||||||
|
if !filepath.IsAbs(p) {
|
||||||
|
p = filepath.Join(repoDir, p)
|
||||||
|
}
|
||||||
|
candidates = append(candidates, p)
|
||||||
|
} else {
|
||||||
|
candidates = append(candidates, filepath.Join(repoDir, ".gadfly.yml"), filepath.Join(repoDir, ".gadfly.yaml"))
|
||||||
|
}
|
||||||
|
for _, p := range candidates {
|
||||||
|
data, err := os.ReadFile(p)
|
||||||
|
if err != nil {
|
||||||
|
continue // not found / unreadable: try next
|
||||||
|
}
|
||||||
|
var cfg fileConfig
|
||||||
|
if err := yaml.Unmarshal(data, &cfg); err != nil {
|
||||||
|
return fileConfig{}, false, fmt.Errorf("parse %s: %w", filepath.Base(p), err)
|
||||||
|
}
|
||||||
|
return cfg, true, nil
|
||||||
|
}
|
||||||
|
return fileConfig{}, false, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func titleOr(title, name string) string {
|
||||||
|
if strings.TrimSpace(title) != "" {
|
||||||
|
return title
|
||||||
|
}
|
||||||
|
return name
|
||||||
|
}
|
||||||
|
|
||||||
|
func splitCSV(s string) []string {
|
||||||
|
var out []string
|
||||||
|
for _, p := range strings.Split(s, ",") {
|
||||||
|
if t := strings.TrimSpace(p); t != "" {
|
||||||
|
out = append(out, t)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return out
|
||||||
|
}
|
||||||
|
|
||||||
|
func sortedKeys(m map[string]Specialist) []string {
|
||||||
|
keys := make([]string, 0, len(m))
|
||||||
|
for k := range m {
|
||||||
|
keys = append(keys, k)
|
||||||
|
}
|
||||||
|
sort.Strings(keys)
|
||||||
|
return keys
|
||||||
|
}
|
||||||
@@ -0,0 +1,118 @@
|
|||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"os"
|
||||||
|
"path/filepath"
|
||||||
|
"testing"
|
||||||
|
)
|
||||||
|
|
||||||
|
func names(specs []Specialist) []string {
|
||||||
|
out := make([]string, len(specs))
|
||||||
|
for i, s := range specs {
|
||||||
|
out[i] = s.Name
|
||||||
|
}
|
||||||
|
return out
|
||||||
|
}
|
||||||
|
|
||||||
|
func eq(a, b []string) bool {
|
||||||
|
if len(a) != len(b) {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
for i := range a {
|
||||||
|
if a[i] != b[i] {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestResolveSpecialists_DefaultSuite(t *testing.T) {
|
||||||
|
t.Setenv("GADFLY_SPECIALISTS", "")
|
||||||
|
specs, errs := resolveSpecialists(t.TempDir())
|
||||||
|
if len(errs) != 0 {
|
||||||
|
t.Fatalf("unexpected errors: %v", errs)
|
||||||
|
}
|
||||||
|
if !eq(names(specs), defaultSuite) {
|
||||||
|
t.Errorf("default = %v, want %v", names(specs), defaultSuite)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestResolveSpecialists_EnvSelection(t *testing.T) {
|
||||||
|
t.Setenv("GADFLY_SPECIALISTS", "security, tests")
|
||||||
|
specs, errs := resolveSpecialists(t.TempDir())
|
||||||
|
if len(errs) != 0 {
|
||||||
|
t.Fatalf("unexpected errors: %v", errs)
|
||||||
|
}
|
||||||
|
if !eq(names(specs), []string{"security", "tests"}) {
|
||||||
|
t.Errorf("got %v", names(specs))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestResolveSpecialists_UnknownNameErrors(t *testing.T) {
|
||||||
|
t.Setenv("GADFLY_SPECIALISTS", "security,bogus")
|
||||||
|
specs, errs := resolveSpecialists(t.TempDir())
|
||||||
|
if len(errs) == 0 {
|
||||||
|
t.Fatal("expected an error for unknown specialist")
|
||||||
|
}
|
||||||
|
if !eq(names(specs), []string{"security"}) {
|
||||||
|
t.Errorf("valid ones should still resolve, got %v", names(specs))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestResolveSpecialists_EnvCustomDefinition(t *testing.T) {
|
||||||
|
t.Setenv("GADFLY_SPECIALIST_MIGRATIONS", "Review DB migrations for destructive ops.")
|
||||||
|
t.Setenv("GADFLY_SPECIALISTS", "migrations")
|
||||||
|
specs, errs := resolveSpecialists(t.TempDir())
|
||||||
|
if len(errs) != 0 {
|
||||||
|
t.Fatalf("unexpected errors: %v", errs)
|
||||||
|
}
|
||||||
|
if len(specs) != 1 || specs[0].Name != "migrations" || specs[0].Focus == "" {
|
||||||
|
t.Fatalf("custom specialist not registered: %+v", specs)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestResolveSpecialists_FileConfig(t *testing.T) {
|
||||||
|
dir := t.TempDir()
|
||||||
|
cfg := `specialists: [security, migrations]
|
||||||
|
define:
|
||||||
|
- name: migrations
|
||||||
|
title: "DB migrations"
|
||||||
|
focus: "Review schema migrations for destructive or unindexed changes."
|
||||||
|
`
|
||||||
|
if err := os.WriteFile(filepath.Join(dir, ".gadfly.yml"), []byte(cfg), 0o644); err != nil {
|
||||||
|
t.Fatal(err)
|
||||||
|
}
|
||||||
|
t.Setenv("GADFLY_SPECIALISTS", "") // let the file drive selection
|
||||||
|
specs, errs := resolveSpecialists(dir)
|
||||||
|
if len(errs) != 0 {
|
||||||
|
t.Fatalf("unexpected errors: %v", errs)
|
||||||
|
}
|
||||||
|
if !eq(names(specs), []string{"security", "migrations"}) {
|
||||||
|
t.Errorf("got %v", names(specs))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestParseVerdictAndWorst(t *testing.T) {
|
||||||
|
cases := map[string]verdict{
|
||||||
|
"VERDICT: No material issues found.": verdictClean,
|
||||||
|
"Minor issues\n- nit": verdictMinor,
|
||||||
|
"**Blocking issues found**": verdictBlocking,
|
||||||
|
"something unparseable": verdictUnknown,
|
||||||
|
}
|
||||||
|
for in, want := range cases {
|
||||||
|
if got := parseVerdict(in); got != want {
|
||||||
|
t.Errorf("parseVerdict(%q) = %v, want %v", in, got, want)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
results := []specialistResult{
|
||||||
|
{spec: Specialist{Name: "security"}, verdict: verdictMinor},
|
||||||
|
{spec: Specialist{Name: "correctness"}, verdict: verdictBlocking},
|
||||||
|
{spec: Specialist{Name: "improvements"}, verdict: verdictBlocking}, // must not count
|
||||||
|
}
|
||||||
|
if w := worstVerdict(results[:2]); w != verdictBlocking {
|
||||||
|
t.Errorf("worst = %v, want blocking", w)
|
||||||
|
}
|
||||||
|
if w := worstVerdict([]specialistResult{results[0], results[2]}); w != verdictMinor {
|
||||||
|
t.Errorf("improvements should not escalate; worst = %v, want minor", w)
|
||||||
|
}
|
||||||
|
}
|
||||||
+3
-1
@@ -41,7 +41,9 @@
|
|||||||
# empty => fall back to "is a repo collaborator"
|
# empty => fall back to "is a repo collaborator"
|
||||||
set -uo pipefail
|
set -uo pipefail
|
||||||
|
|
||||||
DEFAULT_MODELS="qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
|
# One model by default: the specialist suite already provides breadth, so a
|
||||||
|
# multi-model default would multiply cost (models × specialists × 2 passes).
|
||||||
|
DEFAULT_MODELS="qwen3-coder:480b-cloud"
|
||||||
TRIGGER_PHRASE="${GADFLY_TRIGGER_PHRASE:-@gadfly review}"
|
TRIGGER_PHRASE="${GADFLY_TRIGGER_PHRASE:-@gadfly review}"
|
||||||
SCRIPTS_DIR="/app/scripts"
|
SCRIPTS_DIR="/app/scripts"
|
||||||
WORKDIR="${WORKDIR:-/tmp/gadfly}"
|
WORKDIR="${WORKDIR:-/tmp/gadfly}"
|
||||||
|
|||||||
@@ -0,0 +1,34 @@
|
|||||||
|
# Optional per-repo Gadfly config. Place at the repo ROOT as `.gadfly.yml`
|
||||||
|
# (or point GADFLY_CONFIG at it). It's read from the PR's checked-out tree, so
|
||||||
|
# it's version-controlled and reviewed like any other file.
|
||||||
|
#
|
||||||
|
# Precedence for specialist DEFINITIONS: built-ins < this file < GADFLY_SPECIALIST_* env.
|
||||||
|
# Precedence for SELECTION: GADFLY_SPECIALISTS env > this file's `specialists:` > default suite.
|
||||||
|
|
||||||
|
# Which specialists run. Omit this key to use the default suite
|
||||||
|
# (security, correctness, maintainability, performance, error-handling).
|
||||||
|
# Built-in opt-ins you can add: tests, docs, conventions, improvements. Or "all".
|
||||||
|
specialists:
|
||||||
|
- security
|
||||||
|
- correctness
|
||||||
|
- maintainability
|
||||||
|
- tests
|
||||||
|
- migrations # a custom one, defined below
|
||||||
|
|
||||||
|
# Add new specialists, or override a built-in by reusing its name (e.g. give
|
||||||
|
# `security` a repo-specific focus). `focus` is appended to the base reviewer
|
||||||
|
# prompt as that lens's instruction.
|
||||||
|
define:
|
||||||
|
- name: migrations
|
||||||
|
title: "🗃️ DB migrations"
|
||||||
|
focus: >
|
||||||
|
Review database schema migrations for destructive operations (DROP/ALTER
|
||||||
|
that loses data), missing or unindexed foreign keys, non-idempotent steps,
|
||||||
|
and changes that would lock a large table on deploy. Check that any new
|
||||||
|
column added to a domain struct is wired through every storage layer.
|
||||||
|
- name: security
|
||||||
|
title: "🔒 Security (house rules)"
|
||||||
|
focus: >
|
||||||
|
In addition to the usual security review, this repo requires: all web
|
||||||
|
routes use the auth middleware, no secrets in logs, and all external HTTP
|
||||||
|
calls set a timeout. Flag any violation.
|
||||||
@@ -10,6 +10,7 @@ set the secrets/vars it references. Gadfly is advisory only — it never blocks
|
|||||||
| [`local-ollama.yml`](local-ollama.yml) | a **local/LAN Ollama** daemon | nothing (or `GADFLY_BASE_URL` for a remote host) |
|
| [`local-ollama.yml`](local-ollama.yml) | a **local/LAN Ollama** daemon | nothing (or `GADFLY_BASE_URL` for a remote host) |
|
||||||
| [`openai-compatible.yml`](openai-compatible.yml) | any **OpenAI-compatible** endpoint (local Ollama `/v1`, gateway, vLLM, OpenRouter…) | `GADFLY_BASE_URL` (+ a key for most gateways) |
|
| [`openai-compatible.yml`](openai-compatible.yml) | any **OpenAI-compatible** endpoint (local Ollama `/v1`, gateway, vLLM, OpenRouter…) | `GADFLY_BASE_URL` (+ a key for most gateways) |
|
||||||
| [`endpoint-aliases.yml`](endpoint-aliases.yml) | **several named backends** at once (one comment each) | repo vars `GADFLY_ENDPOINT_<NAME>` |
|
| [`endpoint-aliases.yml`](endpoint-aliases.yml) | **several named backends** at once (one comment each) | repo vars `GADFLY_ENDPOINT_<NAME>` |
|
||||||
|
| [`.gadfly.yml`](.gadfly.yml) | **per-repo specialist config** (not a workflow — goes at your repo root) | — |
|
||||||
|
|
||||||
Common to all:
|
Common to all:
|
||||||
- **Triggers:** new/reopened/ready non-draft PR (auto), `@gadfly review` comment (allowed users),
|
- **Triggers:** new/reopened/ready non-draft PR (auto), `@gadfly review` comment (allowed users),
|
||||||
|
|||||||
@@ -43,6 +43,11 @@ jobs:
|
|||||||
OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }}
|
OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }}
|
||||||
OLLAMA_REVIEW_MODELS: ${{ vars.OLLAMA_REVIEW_MODELS }}
|
OLLAMA_REVIEW_MODELS: ${{ vars.OLLAMA_REVIEW_MODELS }}
|
||||||
GADFLY_ALLOWED_USERS: ${{ vars.GADFLY_ALLOWED_USERS }}
|
GADFLY_ALLOWED_USERS: ${{ vars.GADFLY_ALLOWED_USERS }}
|
||||||
|
# Specialist suite (optional). Empty = default suite
|
||||||
|
# (security,correctness,maintainability,performance,error-handling).
|
||||||
|
# csv to choose; "all" for everything; or define custom ones via a repo
|
||||||
|
# .gadfly.yml / GADFLY_SPECIALIST_<NAME>. See README "Specialists".
|
||||||
|
GADFLY_SPECIALISTS: ${{ vars.GADFLY_SPECIALISTS }}
|
||||||
# --- Models & providers (optional; default = Ollama Cloud) ----------
|
# --- Models & providers (optional; default = Ollama Cloud) ----------
|
||||||
# Gadfly is majordomo-powered, so it can target other backends. Set a
|
# Gadfly is majordomo-powered, so it can target other backends. Set a
|
||||||
# provider for bare model ids; point at a different endpoint with a
|
# provider for bare model ids; point at a different endpoint with a
|
||||||
|
|||||||
@@ -2,7 +2,10 @@ module gitea.stevedudenhoeffer.com/steve/gadfly
|
|||||||
|
|
||||||
go 1.26.2
|
go 1.26.2
|
||||||
|
|
||||||
require gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b
|
require (
|
||||||
|
gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b
|
||||||
|
gopkg.in/yaml.v3 v3.0.1
|
||||||
|
)
|
||||||
|
|
||||||
require (
|
require (
|
||||||
cloud.google.com/go v0.116.0 // indirect
|
cloud.google.com/go v0.116.0 // indirect
|
||||||
|
|||||||
@@ -121,8 +121,10 @@ google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpAD
|
|||||||
google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c=
|
google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c=
|
||||||
google.golang.org/protobuf v1.34.2 h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg=
|
google.golang.org/protobuf v1.34.2 h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg=
|
||||||
google.golang.org/protobuf v1.34.2/go.mod h1:qYOHts0dSfpeUzUFpOMr/WGzszTmLH+DiWniOlNbLDw=
|
google.golang.org/protobuf v1.34.2/go.mod h1:qYOHts0dSfpeUzUFpOMr/WGzszTmLH+DiWniOlNbLDw=
|
||||||
|
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
|
||||||
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
||||||
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||||
|
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
|
||||||
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||||
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
||||||
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
||||||
|
|||||||
@@ -1,6 +1,10 @@
|
|||||||
You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
|
You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
|
||||||
pull request below — not to praise it. A gadfly does not let things slide.
|
pull request below — not to praise it. A gadfly does not let things slide.
|
||||||
|
|
||||||
|
You review through ONE assigned lens (given at the end of this prompt). Stay in your lane —
|
||||||
|
other reviewers cover the other angles — but do report anything clearly serious you happen
|
||||||
|
to notice.
|
||||||
|
|
||||||
You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
|
You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
|
||||||
state. USE THEM to verify before you report. Do not review the diff in isolation.
|
state. USE THEM to verify before you report. Do not review the diff in isolation.
|
||||||
- read_file(path[, start_line, limit]) — read a file with line numbers.
|
- read_file(path[, start_line, limit]) — read a file with line numbers.
|
||||||
@@ -19,21 +23,10 @@ Mandatory verification discipline — this is the whole point of giving you tool
|
|||||||
- If you cannot confirm a suspicion with the tools, either drop it or clearly label it
|
- If you cannot confirm a suspicion with the tools, either drop it or clearly label it
|
||||||
"unverified" — do NOT present an unchecked guess as a finding.
|
"unverified" — do NOT present an unchecked guess as a finding.
|
||||||
|
|
||||||
Be skeptical and concrete. Hunt specifically for:
|
Be skeptical and concrete, and apply your assigned lens rigorously. A recurring, high-value
|
||||||
- Correctness bugs and logic errors introduced by the change.
|
discipline regardless of lens: do NOT trust a constant, conversion factor, formula, unit, or
|
||||||
- SEMANTIC / domain correctness — the failure mode plausible-looking code hides best.
|
threshold just because it looks reasonable — RE-DERIVE the expected value from first principles
|
||||||
Do NOT trust a constant, conversion factor, formula, unit, or threshold just because
|
and compare. Plausible-looking magic numbers are where real bugs hide.
|
||||||
it looks reasonable. Independently RE-DERIVE the expected value from first principles
|
|
||||||
(units, dimensions, edge values) and compare. A magic number that "looks about right"
|
|
||||||
is exactly where real bugs hide (e.g. a linear factor used where it must be squared).
|
|
||||||
- Concurrency issues: data races, deadlocks, unsynchronized shared state, leaked tasks.
|
|
||||||
- Security problems: injection, missing authz/authn, secret leakage, unsafe input handling.
|
|
||||||
- Error handling gaps: ignored errors, swallowed exceptions, missing rollback/cleanup.
|
|
||||||
- Resource leaks: unclosed handles/bodies/files, context/lifetime misuse, unbounded growth.
|
|
||||||
- Missed edge cases: off-by-one, nil/null, empty collection, overflow, zero/negative.
|
|
||||||
- Violations of THIS repo's own conventions. Discover them — do not assume. Read any
|
|
||||||
README / CONTRIBUTING / CLAUDE.md / AGENTS.md / lint config the repo ships, and hold
|
|
||||||
the change to the patterns the surrounding code actually uses.
|
|
||||||
|
|
||||||
Output rules:
|
Output rules:
|
||||||
- Output GitHub-flavored markdown, concise. No filler, no restating the diff.
|
- Output GitHub-flavored markdown, concise. No filler, no restating the diff.
|
||||||
|
|||||||
Reference in New Issue
Block a user