feat: claude-code reviewer engine (#2)
Build & push image / build-and-push (push) Successful in 28s
Phase 1: a second review engine alongside the majordomo agent loop. For each lens, shell out to the Claude Code CLI (`claude -p --output-format json`) inside the checked-out repo so it verifies findings with its own read tools, then reuse gadfly's verdict-parse + recheck + consolidate + emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`; auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY; read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node + @anthropic-ai/claude-code.

Also bumped the dogfood pin to the status-board image (PR #2 was the first dogfood with the live board + full fleet).

Folded in the swarm's own review findings: minimal subprocess env (no GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr), process-group cleanup on timeout, rune-safe error truncation, and engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings graded via the gadfly MCP.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
This commit was merged in pull request #2.
@@ -45,7 +45,7 @@ jobs:
     # every PR with the 3-lens suite — the slow local lane dominates wall time.
     timeout-minutes: 90
     steps:
-      - uses: docker://gitea.stevedudenhoeffer.com/steve/gadfly:sha-d7f364d
+      - uses: docker://gitea.stevedudenhoeffer.com/steve/gadfly:sha-c3d09d3
         env:
           GITEA_API: ${{ github.server_url }}/api/v1/repos/${{ github.repository }}
           GITEA_TOKEN: ${{ secrets.GITEA_TOKEN }}
@@ -30,6 +30,7 @@ verifies each one against the actual code, and posts its findings as a comment.
 ```
 cmd/gadfly/        the reviewer binary — pure producer of review markdown (stdout)
   main.go          orchestration: loop specialists, each a review pass + adversarial recheck
+  engine.go        reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
   specialists.go   specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
   auto.go          dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
   delegate.go      worker-tier delegate_investigation tool (cheap sub-agent does legwork)
+6
-1
@@ -24,7 +24,12 @@ RUN --mount=type=cache,target=/go/pkg/mod \
     go build -trimpath -ldflags="-s -w" -o /out/gadfly ./cmd/gadfly
 
 FROM alpine:3.20
-RUN apk add --no-cache bash git curl jq ca-certificates
+RUN apk add --no-cache bash git curl jq ca-certificates nodejs npm
+# Bundle the Claude Code CLI so the `claude-code` review engine works out of the
+# box (GADFLY_MODELS=claude-code or claude-code/<model>). This adds Node + the
+# CLI to the image (notably larger); ollama-only users pay the size but nothing
+# else. Auth is provided at runtime via CLAUDE_CODE_OAUTH_TOKEN / ANTHROPIC_API_KEY.
+RUN npm install -g @anthropic-ai/claude-code && npm cache clean --force
 COPY --from=build /out/gadfly /usr/local/bin/gadfly
 COPY scripts /app/scripts
 COPY entrypoint.sh /entrypoint.sh
@@ -79,6 +79,48 @@ majordomo failover chain / alias) is used verbatim.
 > and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you
 > try a cloud provider and it works (or doesn't), please open an issue.
 
+### Claude Code engine (`claude-code`)
+
+Besides the majordomo model loop, Gadfly can review through the **[Claude Code](https://claude.com/claude-code)
+CLI**: for each lens it shells out to `claude -p` *inside the checked-out repo*, so Claude Code
+uses its **own** read tools (Read/Grep/Glob) to verify findings against real code, then Gadfly
+parses the result and runs the same verdict-parse → recheck → consolidate → emit pipeline. The
+CLI is bundled in the image (Node + `@anthropic-ai/claude-code`).
+
+Select it as a model id — bare `claude-code` (CLI default model) or `claude-code/<model>` (the
+suffix becomes `--model`, e.g. `claude-code/sonnet`, `claude-code/opus`):
+
+```yaml
+GADFLY_MODELS: "claude-code/sonnet,claude-code/opus"
+```
+
+Auth is read from the environment: the default is a **Pro/Max subscription** via
+`CLAUDE_CODE_OAUTH_TOKEN` (from `claude setup-token`; no `--bare`), falling back to
+`ANTHROPIC_API_KEY`. Don't set both. Tuning knobs (all optional):
+
+| Env | Default | Meaning |
+|-----|---------|---------|
+| `GADFLY_CLAUDE_MODEL` | *(from the spec suffix)* | overrides the `--model` value |
+| `GADFLY_CLAUDE_PERMISSION_MODE` | `plan` | `--permission-mode` (read-only `plan` keeps it from editing) |
+| `GADFLY_CLAUDE_ALLOWED_TOOLS` | *(unset)* | `--allowedTools` value, passed verbatim (e.g. `Read,Grep,Glob`) |
+| `GADFLY_CLAUDE_EXTRA_ARGS` | *(unset)* | extra CLI args, **whitespace-split** (no shell quoting) and appended after the defaults (e.g. `--max-turns 30`) |
+| `GADFLY_CLAUDE_BIN` | `claude` | CLI binary path |
+
+> These are **operator** knobs (workflow env), not PR-author input. Because
+> `GADFLY_CLAUDE_EXTRA_ARGS` is appended *after* the defaults, it can override the
+> read-only `--permission-mode plan` (e.g. passing `--permission-mode acceptEdits`),
+> so keep it read-only unless you mean otherwise. It's whitespace-split, so values
+> can't contain spaces — use `GADFLY_CLAUDE_ALLOWED_TOOLS` / `_PERMISSION_MODE` /
+> `_MODEL` for those. The subprocess runs with a **minimal environment** (its auth
+> token + `PATH`/`HOME`/locale/`GADFLY_CLAUDE_*`), not the runner's full env, so the
+> Gitea token and provider keys aren't handed to the CLI.
+
+> **Untested, like the cloud providers.** This wires the CLI in and is exercised by its unit
+> tests, but a live subscription-auth run hasn't been validated end-to-end here — and using
+> subscription auth in automated CI is a gray area in Anthropic's terms. `auto` specialist
+> selection and the `delegate_investigation` worker are majordomo-only and are skipped with this
+> engine (Claude Code does its own legwork).
+
 ### Endpoint aliases via env vars
 
 For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by
@@ -264,6 +306,7 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
 | `GADFLY_PROVIDER` | `ollama-cloud` | provider prefix for a bare model id |
 | `GADFLY_BASE_URL` | — | override endpoint (OpenAI/Ollama-compatible servers) |
 | `GADFLY_API_KEY` | — | provider key; falls back to the provider's standard env |
+| `claude-code` model id | — | route a model through the bundled Claude Code CLI (`claude-code` / `claude-code/<model>`); see [Claude Code engine](#claude-code-engine-claude-code) for its `GADFLY_CLAUDE_*` knobs |
 | `GADFLY_SPECIALISTS` | default suite | csv of lenses, `all`, or `auto` (dynamic selection) |
 | `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
 | `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |
@@ -0,0 +1,227 @@
+package main
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"fmt"
+	"os"
+	"os/exec"
+	"strings"
+	"syscall"
+	"unicode/utf8"
+
+	llm "gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+)
+
+// reviewEngine runs a single agent pass against the checked-out repo and returns
+// the model's text answer. It is the one primitive both review passes use — the
+// draft review and the adversarial recheck — so the rest of the pipeline
+// (specialist composition, recheck orchestration, consolidation, emit) is
+// engine-agnostic. Two implementations:
+//
+//   - majordomoEngine: the original path — a majordomo tool-using agent loop
+//     (read_file/grep/… over a sandboxed repoFS).
+//   - claudeCodeEngine: shells out to the `claude` CLI in print mode, which
+//     brings its OWN repo tools; gadfly just feeds it the prompt and reads back
+//     the final text.
+//
+// maxSteps is the tool-step budget for engines that have one (majordomo); the
+// claude-code engine manages its own loop and ignores it.
+type reviewEngine interface {
+	runPass(ctx context.Context, system, task string, maxSteps int) (string, error)
+}
+
+// majordomoEngine drives the in-process majordomo agent over the repo sandbox.
+type majordomoEngine struct {
+	mdl     llm.Model
+	fsTools *repoFS
+}
+
+func (e *majordomoEngine) runPass(ctx context.Context, system, task string, maxSteps int) (string, error) {
+	return runAgent(ctx, e.mdl, e.fsTools, system, task, maxSteps)
+}
+
+// claudeCodeEngine reviews by shelling out to the `claude` CLI (Claude Code) in
+// non-interactive print mode. Claude Code reads the checked-out tree with its
+// own read tools (so it verifies findings against real code, like the agentic
+// majordomo path), and we parse its final answer out of `--output-format json`.
+//
+// Auth is inherited from the environment: the default backend is a Pro/Max
+// subscription via CLAUDE_CODE_OAUTH_TOKEN (no `--bare`). See README.
+type claudeCodeEngine struct {
+	bin            string   // CLI binary (GADFLY_CLAUDE_BIN, default "claude")
+	model          string   // --model value ("" = CLI default)
+	repoDir        string   // cwd for the CLI, so its tools read the checked-out tree
+	permissionMode string   // --permission-mode (default "plan": read-only, no edits)
+	allowedTools   string   // --allowedTools value, passed verbatim ("" = omit)
+	extraArgs      []string // appended verbatim (GADFLY_CLAUDE_EXTRA_ARGS)
+}
+
+// isClaudeCodeSpec reports whether a GADFLY_MODEL spec selects the claude-code
+// engine: the bare id "claude-code" or a "claude-code/<model>" form.
+func isClaudeCodeSpec(model string) bool {
+	m := strings.TrimSpace(model)
+	return m == "claude-code" || strings.HasPrefix(m, "claude-code/")
+}
+
+// newClaudeCodeEngine builds the engine from the GADFLY_MODEL spec and the
+// optional GADFLY_CLAUDE_* overrides. The model after the slash in
+// "claude-code/<model>" becomes --model (e.g. "claude-code/sonnet" → "sonnet");
+// GADFLY_CLAUDE_MODEL overrides it. It does not verify the CLI is installed —
+// a missing binary surfaces as a normal pass error (advisory, never fatal).
+func newClaudeCodeEngine(spec, repoDir string) *claudeCodeEngine {
+	model := strings.TrimSpace(os.Getenv("GADFLY_CLAUDE_MODEL"))
+	if model == "" {
+		if _, after, ok := strings.Cut(strings.TrimSpace(spec), "/"); ok {
+			model = strings.TrimSpace(after)
+		}
+	}
+	return &claudeCodeEngine{
+		bin:            envOr("GADFLY_CLAUDE_BIN", "claude"),
+		model:          model,
+		repoDir:        repoDir,
+		permissionMode: envOr("GADFLY_CLAUDE_PERMISSION_MODE", "plan"),
+		allowedTools:   strings.TrimSpace(os.Getenv("GADFLY_CLAUDE_ALLOWED_TOOLS")),
+		extraArgs:      strings.Fields(os.Getenv("GADFLY_CLAUDE_EXTRA_ARGS")),
+	}
+}
+
+// args assembles the `claude` argv for one pass. Factored out (and pure) so it
+// can be unit-tested without invoking the CLI. The system prompt is layered on
+// top of Claude Code's own via --append-system-prompt; the task is the -p
+// prompt.
+func (e *claudeCodeEngine) args(system, task string) []string {
+	a := []string{"-p", task, "--output-format", "json", "--append-system-prompt", system}
+	if e.model != "" {
+		a = append(a, "--model", e.model)
+	}
+	if e.permissionMode != "" {
+		a = append(a, "--permission-mode", e.permissionMode)
+	}
+	if e.allowedTools != "" {
+		a = append(a, "--allowedTools", e.allowedTools)
+	}
+	return append(a, e.extraArgs...)
+}
+
+// claudeResult is the subset of `claude --output-format json` we read.
+type claudeResult struct {
+	Result  string `json:"result"`
+	IsError bool   `json:"is_error"`
+	Subtype string `json:"subtype"`
+}
+
+func (e *claudeCodeEngine) runPass(ctx context.Context, system, task string, _ int) (string, error) {
+	cmd := exec.CommandContext(ctx, e.bin, e.args(system, task)...)
+	cmd.Dir = e.repoDir
+	cmd.Env = claudeEnv() // minimal env — don't hand GITEA_TOKEN et al. to the CLI
+	// Put the CLI and the Node children it spawns in their own process group and
+	// kill the WHOLE group on context cancel, so a timed-out lens can't leave
+	// orphaned claude/node processes behind in the container.
+	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
+	cmd.Cancel = func() error {
+		if cmd.Process != nil {
+			_ = syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
+		}
+		return nil
+	}
+	var stdout, stderr bytes.Buffer
+	cmd.Stdout = &stdout
+	cmd.Stderr = &stderr
+
+	runErr := cmd.Run()
+
+	// A cancelled/timed-out run must surface as an error, never as whatever
+	// partial bytes the CLI flushed before it was killed.
+	if ctx.Err() != nil {
+		return "", fmt.Errorf("claude -p %v", ctx.Err())
+	}
+
+	var res claudeResult
+	parsed := json.Unmarshal(bytes.TrimSpace(stdout.Bytes()), &res) == nil
+
+	// Clean exit: trust the parsed JSON answer, and ONLY it — never fall back to
+	// the raw JSON envelope when the result is empty.
+	if runErr == nil && parsed {
+		if res.IsError {
+			return "", fmt.Errorf("claude reported error (%s): %s", res.Subtype, truncateForErr(res.Result))
+		}
+		if out := strings.TrimSpace(res.Result); out != "" {
+			return out, nil
+		}
+		return "", fmt.Errorf("claude -p returned an empty result")
+	}
+
+	if runErr != nil {
+		// Prefer the CLI's own structured error message when it gave one.
+		if parsed && res.IsError && strings.TrimSpace(res.Result) != "" {
+			return "", fmt.Errorf("claude reported error (%s): %s", res.Subtype, truncateForErr(res.Result))
+		}
+		detail := truncateForErr(stderr.String())
+		if detail == "" {
+			detail = truncateForErr(stdout.String())
+		}
+		if detail != "" {
+			return "", fmt.Errorf("claude -p failed: %v: %s", runErr, detail)
+		}
+		return "", fmt.Errorf("claude -p failed: %v", runErr)
+	}
+
+	// Clean exit but stdout wasn't the expected JSON envelope: degrade to the raw
+	// text so a CLI format change still yields a review instead of nothing.
+	if raw := strings.TrimSpace(stdout.String()); raw != "" {
+		return raw, nil
+	}
+	return "", fmt.Errorf("claude -p produced no parseable output")
+}
+
+// claudeEnv builds a minimal environment for the `claude` subprocess: only what
+// the CLI needs (PATH/HOME, its auth tokens, locale, Node/XDG/GADFLY_CLAUDE_*
+// knobs), deliberately dropping the rest of the runner's secrets — GITEA_TOKEN,
+// GADFLY_FINDINGS_TOKEN, provider keys — so they never reach the third-party
+// CLI. Defense in depth: the parent already holds them, but the CLI has no need.
+func claudeEnv() []string {
+	keep := func(k string) bool {
+		switch k {
+		case "PATH", "HOME", "USER", "LOGNAME", "TMPDIR", "LANG", "TERM", "SHELL":
+			return true
+		}
+		return strings.HasPrefix(k, "LC_") ||
+			strings.HasPrefix(k, "CLAUDE_") ||
+			strings.HasPrefix(k, "ANTHROPIC_") ||
+			strings.HasPrefix(k, "GADFLY_CLAUDE_") ||
+			strings.HasPrefix(k, "NODE_") ||
+			strings.HasPrefix(k, "XDG_")
+	}
+	var env []string
+	for _, kv := range os.Environ() {
+		if k, _, ok := strings.Cut(kv, "="); ok && keep(k) {
+			env = append(env, kv)
+		}
+	}
+	return env
+}
+
+// truncateForErr caps CLI error detail so a stderr dump can't bloat the comment,
+// cutting on a rune boundary so it never emits invalid UTF-8.
+func truncateForErr(s string) string {
+	s = strings.TrimSpace(s)
+	const max = 800
+	if len(s) <= max {
+		return s
+	}
+	cut := max
+	for cut > 0 && !utf8.RuneStart(s[cut]) {
+		cut--
+	}
+	return s[:cut] + "…"
+}
+
+// envOr returns the env var value or a default when unset/blank.
+func envOr(name, def string) string {
+	if v := strings.TrimSpace(os.Getenv(name)); v != "" {
+		return v
+	}
+	return def
+}
@@ -0,0 +1,219 @@
+package main
+
+import (
+	"context"
+	"os"
+	"slices"
+	"strings"
+	"testing"
+	"unicode/utf8"
+)
+
+func TestIsClaudeCodeSpec(t *testing.T) {
+	cases := map[string]bool{
+		"claude-code":                 true,
+		"claude-code/sonnet":          true,
+		"claude-code/opus":            true,
+		"claude-code/claude-opus-4-8": true,
+		" claude-code ":               true, // trimmed
+		"qwen3-coder:480b-cloud":      false,
+		"claude-code-extra":           false, // not the bare id, not a "/" form
+		"sonnet":                      false,
+		"":                            false,
+	}
+	for spec, want := range cases {
+		if got := isClaudeCodeSpec(spec); got != want {
+			t.Errorf("isClaudeCodeSpec(%q) = %v, want %v", spec, got, want)
+		}
+	}
+}
+
+func TestNewClaudeCodeEngineModel(t *testing.T) {
+	// model derived from the spec's "/<model>" suffix
+	t.Setenv("GADFLY_CLAUDE_MODEL", "")
+	if e := newClaudeCodeEngine("claude-code/sonnet", "/repo"); e.model != "sonnet" {
+		t.Errorf("model = %q, want sonnet", e.model)
+	}
+	// bare spec → CLI default (no --model)
+	if e := newClaudeCodeEngine("claude-code", "/repo"); e.model != "" {
+		t.Errorf("model = %q, want empty for bare spec", e.model)
+	}
+	// GADFLY_CLAUDE_MODEL overrides the spec suffix
+	t.Setenv("GADFLY_CLAUDE_MODEL", "opus")
+	if e := newClaudeCodeEngine("claude-code/sonnet", "/repo"); e.model != "opus" {
+		t.Errorf("model = %q, want opus (env override)", e.model)
+	}
+}
+
+func TestClaudeCodeEngineDefaults(t *testing.T) {
+	t.Setenv("GADFLY_CLAUDE_BIN", "")
+	t.Setenv("GADFLY_CLAUDE_PERMISSION_MODE", "")
+	t.Setenv("GADFLY_CLAUDE_ALLOWED_TOOLS", "")
+	t.Setenv("GADFLY_CLAUDE_EXTRA_ARGS", "")
+	e := newClaudeCodeEngine("claude-code", "/repo")
+	if e.bin != "claude" {
+		t.Errorf("bin = %q, want claude", e.bin)
+	}
+	if e.permissionMode != "plan" {
+		t.Errorf("permissionMode = %q, want plan", e.permissionMode)
+	}
+	if e.repoDir != "/repo" {
+		t.Errorf("repoDir = %q, want /repo", e.repoDir)
+	}
+}
+
+// argAfter returns the value following flag in args, or "" if absent.
+func argAfter(args []string, flag string) string {
+	if i := slices.Index(args, flag); i >= 0 && i+1 < len(args) {
+		return args[i+1]
+	}
+	return ""
+}
+
+func TestClaudeCodeArgs(t *testing.T) {
+	t.Setenv("GADFLY_CLAUDE_MODEL", "")
+	t.Setenv("GADFLY_CLAUDE_PERMISSION_MODE", "")
+	t.Setenv("GADFLY_CLAUDE_ALLOWED_TOOLS", "Read,Grep,Glob")
+	t.Setenv("GADFLY_CLAUDE_EXTRA_ARGS", "--max-turns 30")
+	e := newClaudeCodeEngine("claude-code/sonnet", "/repo")
+	args := e.args("SYS-PROMPT", "TASK-PROMPT")
+
+	// task is the -p value; json output; system appended; model + policy present.
+	if argAfter(args, "-p") != "TASK-PROMPT" {
+		t.Errorf("-p = %q, want TASK-PROMPT", argAfter(args, "-p"))
+	}
+	if argAfter(args, "--output-format") != "json" {
+		t.Errorf("--output-format = %q, want json", argAfter(args, "--output-format"))
+	}
+	if argAfter(args, "--append-system-prompt") != "SYS-PROMPT" {
+		t.Errorf("--append-system-prompt = %q, want SYS-PROMPT", argAfter(args, "--append-system-prompt"))
+	}
+	if argAfter(args, "--model") != "sonnet" {
+		t.Errorf("--model = %q, want sonnet", argAfter(args, "--model"))
+	}
+	if argAfter(args, "--permission-mode") != "plan" {
+		t.Errorf("--permission-mode = %q, want plan", argAfter(args, "--permission-mode"))
+	}
+	if argAfter(args, "--allowedTools") != "Read,Grep,Glob" {
+		t.Errorf("--allowedTools = %q, want Read,Grep,Glob", argAfter(args, "--allowedTools"))
+	}
+	// extra args appended verbatim (split on whitespace)
+	if !strings.Contains(strings.Join(args, " "), "--max-turns 30") {
+		t.Errorf("extra args not appended: %v", args)
+	}
+}
+
+func TestClaudeCodeArgsBareModelOmitsFlag(t *testing.T) {
+	t.Setenv("GADFLY_CLAUDE_MODEL", "")
+	t.Setenv("GADFLY_CLAUDE_ALLOWED_TOOLS", "") // omit when blank
+	t.Setenv("GADFLY_CLAUDE_EXTRA_ARGS", "")
+	e := newClaudeCodeEngine("claude-code", "/repo")
+	args := e.args("s", "t")
+	if slices.Contains(args, "--model") {
+		t.Errorf("--model should be omitted for a bare claude-code spec: %v", args)
+	}
+	if slices.Contains(args, "--allowedTools") {
+		t.Errorf("--allowedTools should be omitted when blank: %v", args)
+	}
+}
+
+func TestClaudeEnvFilters(t *testing.T) {
+	t.Setenv("GITEA_TOKEN", "secret-gitea")
+	t.Setenv("OLLAMA_API_KEY", "secret-ollama")
+	t.Setenv("GADFLY_API_KEY", "secret-gadfly")
+	t.Setenv("GADFLY_FINDINGS_TOKEN", "secret-findings")
+	t.Setenv("CLAUDE_CODE_OAUTH_TOKEN", "keep-claude")
+	t.Setenv("ANTHROPIC_API_KEY", "keep-anthropic")
+	t.Setenv("GADFLY_CLAUDE_MODEL", "keep-knob")
+
+	env := claudeEnv()
+	has := func(k string) bool {
+		for _, kv := range env {
+			if strings.HasPrefix(kv, k+"=") {
+				return true
+			}
+		}
+		return false
+	}
+	// kept: the CLI's auth + its own knobs + PATH
+	for _, k := range []string{"CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_API_KEY", "GADFLY_CLAUDE_MODEL", "PATH"} {
+		if !has(k) {
+			t.Errorf("claudeEnv dropped %s, but it should be kept", k)
+		}
+	}
+	// dropped: the runner's secrets the CLI doesn't need
+	for _, k := range []string{"GITEA_TOKEN", "OLLAMA_API_KEY", "GADFLY_API_KEY", "GADFLY_FINDINGS_TOKEN"} {
+		if has(k) {
+			t.Errorf("claudeEnv leaked %s into the subprocess env", k)
+		}
+	}
+}
+
+func TestTruncateForErrRuneSafe(t *testing.T) {
+	// 900 multibyte runes (3 bytes each) -> well over the 800-byte cap; the cut
+	// must land on a rune boundary so the result stays valid UTF-8.
+	s := strings.Repeat("€", 900)
+	got := truncateForErr(s)
+	if !utf8.ValidString(got) {
+		t.Fatalf("truncateForErr produced invalid UTF-8")
+	}
+	if !strings.HasSuffix(got, "…") {
+		t.Fatalf("truncateForErr should append an ellipsis when truncating")
+	}
+	// short strings pass through untouched
+	if truncateForErr(" hi ") != "hi" {
+		t.Fatalf("truncateForErr should trim and pass short strings through")
+	}
+}
+
+// stubClaude writes an executable shell stub that prints body and exits code,
+// and returns an engine pointed at it.
+func stubClaude(t *testing.T, body string, code int) *claudeCodeEngine {
+	t.Helper()
+	dir := t.TempDir()
+	path := dir + "/claude-stub.sh"
+	script := "#!/bin/sh\nprintf '%s' " + shSingleQuote(body) + "\nexit " + itoa(code) + "\n"
+	if err := os.WriteFile(path, []byte(script), 0o755); err != nil {
+		t.Fatal(err)
+	}
+	return &claudeCodeEngine{bin: path, repoDir: dir}
+}
+
+func shSingleQuote(s string) string { return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'" }
+func itoa(i int) string             { return string(rune('0' + i)) } // single-digit exit codes only
+
+func TestRunPassCleanResult(t *testing.T) {
+	e := stubClaude(t, `{"result":"REVIEW TEXT","is_error":false}`, 0)
+	out, err := e.runPass(context.Background(), "sys", "task", 0)
+	if err != nil || out != "REVIEW TEXT" {
+		t.Fatalf("clean result: got (%q, %v), want (REVIEW TEXT, nil)", out, err)
+	}
+}
+
+func TestRunPassEmptyResultIsError(t *testing.T) {
+	// JSON parses, exit 0, but result empty: must NOT return the raw JSON blob.
+	e := stubClaude(t, `{"result":"","is_error":false}`, 0)
+	out, err := e.runPass(context.Background(), "sys", "task", 0)
+	if err == nil {
+		t.Fatalf("empty result should be an error, got out=%q", out)
+	}
+	if strings.Contains(out, "{") {
+		t.Fatalf("empty result must not leak raw JSON, got %q", out)
+	}
+}
+
+func TestRunPassIsErrorFlag(t *testing.T) {
+	e := stubClaude(t, `{"result":"boom","is_error":true,"subtype":"error_max_turns"}`, 0)
+	_, err := e.runPass(context.Background(), "sys", "task", 0)
+	if err == nil || !strings.Contains(err.Error(), "claude reported error") {
+		t.Fatalf("is_error should surface as an error, got %v", err)
+	}
+}
+
+func TestRunPassNonZeroNoJSON(t *testing.T) {
+	e := stubClaude(t, "fatal: auth failed", 1)
+	_, err := e.runPass(context.Background(), "sys", "task", 0)
+	if err == nil || !strings.Contains(err.Error(), "claude -p failed") {
+		t.Fatalf("non-zero exit should error with detail, got %v", err)
+	}
+}
@@ -104,7 +104,7 @@ func TestRunSpecialists_FansOut(t *testing.T) {
 	}
 	specs := threeLenses()
 
-	results := runSpecialists(mdl, fs, "sys", specs, "task", "diff")
+	results := runSpecialists(&majordomoEngine{mdl: mdl, fsTools: fs}, "sys", specs, "task", "diff")
 
 	if got := peak(); got != 3 {
 		t.Errorf("peak concurrent lenses = %d, want 3", got)
@@ -124,7 +124,7 @@ func TestRunSpecialists_SequentialByDefault(t *testing.T) {
 	}
 	specs := threeLenses()
 
-	results := runSpecialists(mdl, fs, "sys", specs, "task", "diff")
+	results := runSpecialists(&majordomoEngine{mdl: mdl, fsTools: fs}, "sys", specs, "task", "diff")
 
 	if got := peak(); got != 1 {
 		t.Errorf("peak concurrent lenses = %d, want 1 (sequential by default)", got)
@@ -146,7 +146,7 @@ func TestRunSpecialists_PerProviderFanOut(t *testing.T) {
 	}
 	specs := threeLenses()
 
-	results := runSpecialists(mdl, fs, "sys", specs, "task", "diff")
+	results := runSpecialists(&majordomoEngine{mdl: mdl, fsTools: fs}, "sys", specs, "task", "diff")
 
 	if got := peak(); got != 3 {
 		t.Errorf("peak concurrent lenses = %d, want 3 (m1 per-provider override)", got)
+46
-30
@@ -149,17 +149,27 @@ func run() error {
|
|||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
mdl, err := resolveModel()
|
// Resolve the review engine. The claude-code engine shells out to the
|
||||||
if err != nil {
|
// `claude` CLI (its own repo tools); every other spec is a majordomo model.
|
||||||
return fmt.Errorf("resolve model: %w", err)
|
// Auto-selection and the delegate worker are majordomo-only — with
|
||||||
}
|
// claude-code they're skipped (Claude Code does its own legwork).
|
||||||
|
ccSpec := isClaudeCodeSpec(os.Getenv("GADFLY_MODEL"))
|
||||||
// Optional cheap worker for delegate_investigation. Non-fatal: a bad worker
|
var eng reviewEngine
|
||||||
// spec just disables delegation rather than sinking the review.
|
if ccSpec {
|
||||||
if worker, werr := resolveWorkerModel(); werr != nil {
|
eng = newClaudeCodeEngine(os.Getenv("GADFLY_MODEL"), fsTools.root)
|
||||||
fmt.Fprintln(os.Stderr, "gadfly: worker model disabled:", werr)
|
} else {
|
||||||
} else if worker != nil {
|
mdl, merr := resolveModel()
|
||||||
fsTools.worker = worker
|
if merr != nil {
|
||||||
|
return fmt.Errorf("resolve model: %w", merr)
|
||||||
|
}
|
||||||
|
// Optional cheap worker for delegate_investigation. Non-fatal: a bad
|
||||||
|
// worker spec just disables delegation rather than sinking the review.
|
||||||
|
if worker, werr := resolveWorkerModel(); werr != nil {
|
||||||
|
fmt.Fprintln(os.Stderr, "gadfly: worker model disabled:", werr)
|
||||||
|
} else if worker != nil {
|
||||||
|
fsTools.worker = worker
|
||||||
|
}
|
||||||
|
eng = &majordomoEngine{mdl: mdl, fsTools: fsTools}
|
||||||
}
|
}
|
||||||
|
|
||||||
specialists, registry, auto, serrs := resolveSpecialists(repoDir)
|
specialists, registry, auto, serrs := resolveSpecialists(repoDir)
|
||||||
@@ -168,20 +178,26 @@ func run() error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Dynamic selection: a (cheap) model picks the lenses this diff needs.
|
// Dynamic selection: a (cheap) model picks the lenses this diff needs.
|
||||||
|
// Majordomo-only — the selector is an llm.Model.
|
||||||
if auto {
|
if auto {
|
||||||
selector, serr := resolveSelectorModel(mdl)
|
if ccSpec {
|
||||||
if serr != nil {
|
fmt.Fprintln(os.Stderr, "gadfly: auto-select is not supported with the claude-code engine; using the default suite")
|
||||||
return fmt.Errorf("resolve selector model: %w", serr)
|
|
||||||
}
|
|
||||||
selCtx, cancel := context.WithTimeout(context.Background(), autoSelectTimeout)
|
|
||||||
picked, aerr := autoSelectSpecialists(selCtx, selector, os.Getenv("GADFLY_TITLE"), os.Getenv("GADFLY_BODY"), diff, registry)
|
|
||||||
cancel()
|
|
||||||
if aerr != nil {
|
|
||||||
fmt.Fprintln(os.Stderr, "gadfly: auto-select failed; falling back to the default suite:", aerr)
|
|
||||||
specialists = suiteFromRegistry(registry, defaultSuite)
|
specialists = suiteFromRegistry(registry, defaultSuite)
|
||||||
} else {
|
} else {
|
||||||
specialists = picked
|
selector, serr := resolveSelectorModel(eng.(*majordomoEngine).mdl)
|
||||||
fmt.Fprintln(os.Stderr, "gadfly: auto-selected specialists:", specialistNamesOf(specialists))
|
if serr != nil {
|
||||||
|
return fmt.Errorf("resolve selector model: %w", serr)
|
||||||
|
}
|
||||||
|
selCtx, cancel := context.WithTimeout(context.Background(), autoSelectTimeout)
|
||||||
|
picked, aerr := autoSelectSpecialists(selCtx, selector, os.Getenv("GADFLY_TITLE"), os.Getenv("GADFLY_BODY"), diff, registry)
|
||||||
|
cancel()
|
||||||
|
if aerr != nil {
|
||||||
|
fmt.Fprintln(os.Stderr, "gadfly: auto-select failed; falling back to the default suite:", aerr)
|
||||||
|
specialists = suiteFromRegistry(registry, defaultSuite)
|
||||||
|
} else {
|
||||||
|
specialists = picked
|
||||||
|
fmt.Fprintln(os.Stderr, "gadfly: auto-selected specialists:", specialistNamesOf(specialists))
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -191,7 +207,7 @@ func run() error {
|
|||||||
|
|
||||||
base := string(systemBytes)
|
base := string(systemBytes)
|
||||||
task := buildTask(diff)
|
task := buildTask(diff)
|
||||||
results := runSpecialists(mdl, fsTools, base, specialists, task, diff)
|
results := runSpecialists(eng, base, specialists, task, diff)
|
||||||
|
|
||||||
fmt.Println(renderConsolidated(results))
|
fmt.Println(renderConsolidated(results))
|
||||||
|
|
||||||
@@ -215,7 +231,7 @@ func run() error {
|
|||||||
// per-provider model concurrency, so total concurrent backend requests ≈
|
// per-provider model concurrency, so total concurrent backend requests ≈
|
||||||
// (models at once) × (lenses at once). To fan lenses out without oversubscribing
|
// (models at once) × (lenses at once). To fan lenses out without oversubscribing
|
||||||
// the backend, run models one at a time (provider lane cap 1) and raise this.
|
// the backend, run models one at a time (provider lane cap 1) and raise this.
|
||||||
func runSpecialists(mdl llm.Model, fsTools *repoFS, base string, specialists []Specialist, task, diff string) []specialistResult {
|
func runSpecialists(eng reviewEngine, base string, specialists []Specialist, task, diff string) []specialistResult {
|
||||||
results := make([]specialistResult, len(specialists))
|
results := make([]specialistResult, len(specialists))
|
||||||
|
|
||||||
// Optional live status board: publishes this model's per-lens progress to a
|
// Optional live status board: publishes this model's per-lens progress to a
|
||||||
@@ -244,7 +260,7 @@ func runSpecialists(mdl llm.Model, fsTools *repoFS, base string, specialists []S
|
|||||||
}
|
}
|
||||||
}()
|
}()
|
||||||
sw.set(sp.Name, lensRunning, "", false)
|
sw.set(sp.Name, lensRunning, "", false)
|
||||||
out, errored := reviewWithSpecialist(mdl, fsTools, base, sp, task, diff)
|
out, errored := reviewWithSpecialist(eng, base, sp, task, diff)
|
||||||
v := parseVerdict(out)
|
v := parseVerdict(out)
|
||||||
results[i] = specialistResult{spec: sp, out: out, verdict: v, errored: errored}
|
results[i] = specialistResult{spec: sp, out: out, verdict: v, errored: errored}
|
||||||
sw.set(sp.Name, lensFinished, v.label(), errored)
|
sw.set(sp.Name, lensFinished, v.label(), errored)
|
||||||
@@ -290,12 +306,12 @@ func providerOverride(envName, provider string) (int, bool) {
|
|||||||
// specialist's composed prompt, then the shared adversarial recheck pass. The
|
// specialist's composed prompt, then the shared adversarial recheck pass. The
|
||||||
// returned bool is true when the review pass failed (rendered as an inline
|
// returned bool is true when the review pass failed (rendered as an inline
|
||||||
// notice — advisory; one lens failing never sinks the others or the job).
|
// notice — advisory; one lens failing never sinks the others or the job).
|
||||||
func reviewWithSpecialist(mdl llm.Model, fsTools *repoFS, base string, sp Specialist, task, diff string) (string, bool) {
|
func reviewWithSpecialist(eng reviewEngine, base string, sp Specialist, task, diff string) (string, bool) {
|
||||||
timeout := time.Duration(envInt("GADFLY_TIMEOUT_SECS", defaultTimeoutSecs)) * time.Second
|
timeout := time.Duration(envInt("GADFLY_TIMEOUT_SECS", defaultTimeoutSecs)) * time.Second
|
||||||
ctx, cancel := context.WithTimeout(context.Background(), timeout)
|
ctx, cancel := context.WithTimeout(context.Background(), timeout)
|
||||||
defer cancel()
|
defer cancel()
|
||||||
|
|
||||||
draft, err := runAgent(ctx, mdl, fsTools, composeSpecialistPrompt(base, sp), task,
|
draft, err := eng.runPass(ctx, composeSpecialistPrompt(base, sp), task,
|
||||||
envInt("GADFLY_MAX_STEPS", defaultMaxSteps))
|
envInt("GADFLY_MAX_STEPS", defaultMaxSteps))
|
||||||
if err != nil {
|
if err != nil {
|
||||||
fmt.Fprintf(os.Stderr, "gadfly: specialist %q review pass failed: %v\n", sp.Name, err)
|
fmt.Fprintf(os.Stderr, "gadfly: specialist %q review pass failed: %v\n", sp.Name, err)
|
||||||
@@ -304,7 +320,7 @@ func reviewWithSpecialist(mdl llm.Model, fsTools *repoFS, base string, sp Specia
|
|||||||
|
|
||||||
final := draft
|
final := draft
|
||||||
if shouldRecheck(draft) {
|
if shouldRecheck(draft) {
|
||||||
rechecked, rerr := runAgent(ctx, mdl, fsTools, recheckSystemPrompt, buildRecheckTask(draft, diff),
|
rechecked, rerr := eng.runPass(ctx, recheckSystemPrompt, buildRecheckTask(draft, diff),
|
||||||
envInt("GADFLY_RECHECK_MAX_STEPS", defaultRecheckMaxSteps))
|
envInt("GADFLY_RECHECK_MAX_STEPS", defaultRecheckMaxSteps))
|
||||||
if rerr != nil {
|
if rerr != nil {
|
||||||
fmt.Fprintf(os.Stderr, "gadfly: specialist %q recheck failed; emitting unverified draft: %v\n", sp.Name, rerr)
|
fmt.Fprintf(os.Stderr, "gadfly: specialist %q recheck failed; emitting unverified draft: %v\n", sp.Name, rerr)
|
||||||
@@ -415,7 +431,7 @@ func buildTask(diff string) string {
|
|||||||
truncNote := ""
|
truncNote := ""
|
||||||
if maxDiff > 0 && len(diff) > maxDiff {
|
if maxDiff > 0 && len(diff) > maxDiff {
|
||||||
diff = diff[:maxDiff]
|
diff = diff[:maxDiff]
|
||||||
truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars in this message; call get_diff for the full text.]", maxDiff)
|
truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars in this message; read the changed files (or call get_diff, if available) for the full text.]", maxDiff)
|
||||||
}
|
}
|
||||||
|
|
||||||
var b strings.Builder
|
var b strings.Builder
|
||||||
@@ -425,7 +441,7 @@ func buildTask(diff string) string {
|
|||||||
if strings.TrimSpace(body) != "" {
|
if strings.TrimSpace(body) != "" {
|
||||||
fmt.Fprintf(&b, "PR description:\n%s\n\n", body)
|
fmt.Fprintf(&b, "PR description:\n%s\n\n", body)
|
||||||
}
|
}
|
||||||
b.WriteString("Review the following unified diff. Before reporting any cross-file or compile-correctness issue, use your tools (read_file, grep, find_files) to verify it against the actual checked-out code — do not rely on the diff alone.\n\n")
|
b.WriteString("Review the following unified diff. Before reporting any cross-file or compile-correctness issue, use your repository read tools to verify it against the actual checked-out code — do not rely on the diff alone.\n\n")
|
||||||
fmt.Fprintf(&b, "```diff\n%s\n```%s", diff, truncNote)
|
fmt.Fprintf(&b, "```diff\n%s\n```%s", diff, truncNote)
|
||||||
return b.String()
|
return b.String()
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -16,15 +16,14 @@ const defaultRecheckMaxSteps = 16
|
|||||||
// against the real code before letting it survive — the antidote to a
|
// against the real code before letting it survive — the antidote to a
|
||||||
// single-pass reviewer that reads a couple of files, mis-connects them, and
|
// single-pass reviewer that reads a couple of files, mis-connects them, and
|
||||||
// posts a confident but wrong "blocking" verdict.
|
// posts a confident but wrong "blocking" verdict.
|
||||||
const recheckSystemPrompt = `You are a VERIFICATION GATE for an automated adversarial code review of the
|
const recheckSystemPrompt = `You are a VERIFICATION GATE for an automated adversarial code review. You are
|
||||||
"mort" project (a large Go Discord bot). You are given a DRAFT review produced
|
given a DRAFT review produced by another model. Your job is NOT to write a new
|
||||||
by another model. Your job is NOT to write a new review — it is to confirm or
|
review — it is to confirm or reject each finding in the draft against the ACTUAL
|
||||||
reject each finding in the draft against the ACTUAL code, then output the
|
code, then output the corrected review.
|
||||||
corrected review.
|
|
||||||
|
|
||||||
You have the same read-only repository tools as the original reviewer:
|
You have read-only access to the checked-out repository — use your tools to read
|
||||||
- read_file(path[, start_line, limit]), list_dir([path]), grep(pattern[, path,
|
files and search the code to independently verify each finding against the real
|
||||||
max_results]), find_files(name[, max_results]), get_diff().
|
source.
|
||||||
|
|
||||||
For EVERY finding in the draft:
|
For EVERY finding in the draft:
|
||||||
1. Independently reproduce the reasoning by reading the actual files with your
|
1. Independently reproduce the reasoning by reading the actual files with your
|
||||||
@@ -84,7 +83,7 @@ func buildRecheckTask(draft, diff string) string {
|
|||||||
truncNote := ""
|
truncNote := ""
|
||||||
if maxDiff > 0 && len(diff) > maxDiff {
|
if maxDiff > 0 && len(diff) > maxDiff {
|
||||||
diff = diff[:maxDiff]
|
diff = diff[:maxDiff]
|
||||||
truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars here; call get_diff for the full text.]", maxDiff)
|
truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars here; read the changed files (or call get_diff, if available) for the full text.]", maxDiff)
|
||||||
}
|
}
|
||||||
|
|
||||||
var b strings.Builder
|
var b strings.Builder
|
||||||
|
|||||||
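Review note: both `buildTask` and `buildRecheckTask` cap the diff with `diff[:maxDiff]`, which slices by byte and can cut a multi-byte UTF-8 character in half. A minimal sketch of a rune-safe cut follows. `truncateRuneSafe` is a hypothetical helper, not a function from the commit, and it assumes the input is valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateRuneSafe returns at most max bytes of s, backing the cut point up
// so it never lands inside a multi-byte UTF-8 sequence. A max of zero or
// less means "no cap", matching the maxDiff > 0 guard in the diff.
func truncateRuneSafe(s string, max int) string {
	if max <= 0 || len(s) <= max {
		return s
	}
	cut := max
	// s[cut] is the first byte that would be dropped; if it is a
	// continuation byte, the cut would split a rune, so step back.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	fmt.Printf("%q\n", truncateRuneSafe("héllo", 2))
	fmt.Printf("%q\n", truncateRuneSafe("héllo", 3))
}
```

The result can be up to three bytes shorter than the cap, which is harmless here because the truncation note already tells the reviewer to read the full files.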
@@ -36,6 +36,9 @@
|
|||||||
# e.g. "ollama" local, "openai", "anthropic", "google")
|
# e.g. "ollama" local, "openai", "anthropic", "google")
|
||||||
# GADFLY_BASE_URL override backend endpoint (OpenAI/Ollama-compatible servers)
|
# GADFLY_BASE_URL override backend endpoint (OpenAI/Ollama-compatible servers)
|
||||||
# GADFLY_API_KEY provider key (else provider's standard env: OPENAI_API_KEY, …)
|
# GADFLY_API_KEY provider key (else provider's standard env: OPENAI_API_KEY, …)
|
||||||
|
# CLAUDE_CODE_OAUTH_TOKEN auth for the claude-code engine (GADFLY_MODELS entry
|
||||||
|
# "claude-code"/"claude-code/<model>"); Pro/Max subscription
|
||||||
|
# token from `claude setup-token`. Else ANTHROPIC_API_KEY.
|
||||||
# GADFLY_TRIGGER_PHRASE comment phrase that triggers a re-review (default "@gadfly review")
|
# GADFLY_TRIGGER_PHRASE comment phrase that triggers a re-review (default "@gadfly review")
|
||||||
# GADFLY_ALLOWED_USERS comma-separated usernames allowed to comment-trigger;
|
# GADFLY_ALLOWED_USERS comma-separated usernames allowed to comment-trigger;
|
||||||
# empty => fall back to "is a repo collaborator"
|
# empty => fall back to "is a repo collaborator"
|
||||||
|
|||||||
@@ -10,6 +10,7 @@ set the secrets/vars it references. Gadfly is advisory only — it never blocks
|
|||||||
| [`local-ollama.yml`](local-ollama.yml) | a **local/LAN Ollama** daemon | nothing (or `GADFLY_BASE_URL` for a remote host) |
|
| [`local-ollama.yml`](local-ollama.yml) | a **local/LAN Ollama** daemon | nothing (or `GADFLY_BASE_URL` for a remote host) |
|
||||||
| [`openai-compatible.yml`](openai-compatible.yml) | any **OpenAI-compatible** endpoint (local Ollama `/v1`, gateway, vLLM, OpenRouter…) | `GADFLY_BASE_URL` (+ a key for most gateways) |
|
| [`openai-compatible.yml`](openai-compatible.yml) | any **OpenAI-compatible** endpoint (local Ollama `/v1`, gateway, vLLM, OpenRouter…) | `GADFLY_BASE_URL` (+ a key for most gateways) |
|
||||||
| [`endpoint-aliases.yml`](endpoint-aliases.yml) | **several named backends** at once (one comment each) | repo vars `GADFLY_ENDPOINT_<NAME>` |
|
| [`endpoint-aliases.yml`](endpoint-aliases.yml) | **several named backends** at once (one comment each) | repo vars `GADFLY_ENDPOINT_<NAME>` |
|
||||||
|
| [`claude-code.yml`](claude-code.yml) | the bundled **Claude Code CLI** engine (`claude-code/<model>`) | secret `CLAUDE_CODE_OAUTH_TOKEN` (or `ANTHROPIC_API_KEY`) |
|
||||||
| [`.gadfly.yml`](.gadfly.yml) | **per-repo specialist config** (not a workflow — goes at your repo root) | — |
|
| [`.gadfly.yml`](.gadfly.yml) | **per-repo specialist config** (not a workflow — goes at your repo root) | — |
|
||||||
|
|
||||||
Common to all:
|
Common to all:
|
||||||
|
|||||||
@@ -0,0 +1,71 @@
|
|||||||
|
# Gadfly reviewing via the Claude Code CLI engine.
|
||||||
|
# Copy to .gitea/workflows/adversarial-review.yml in your repo.
|
||||||
|
#
|
||||||
|
# Instead of a majordomo model, each lens shells out to the bundled `claude` CLI
|
||||||
|
# inside the checked-out repo (it uses its own Read/Grep/Glob tools to verify
|
||||||
|
# findings), then Gadfly runs its usual verdict + recheck + consolidate pipeline.
|
||||||
|
#
|
||||||
|
# Auth: a Pro/Max subscription token from `claude setup-token` (no --bare),
|
||||||
|
# stored as the CLAUDE_CODE_OAUTH_TOKEN secret. Falls back to ANTHROPIC_API_KEY
|
||||||
|
# if you'd rather pay per-token — set only ONE.
|
||||||
|
#
|
||||||
|
# Heads-up: this engine is wired but not yet validated end-to-end here, and using
|
||||||
|
# subscription auth in automated CI is a gray area in Anthropic's terms — read
|
||||||
|
# the README's "Claude Code engine" note before relying on it.
|
||||||
|
|
||||||
|
name: Adversarial Review (Gadfly)
|
||||||
|
|
||||||
|
on:
|
||||||
|
pull_request:
|
||||||
|
types: [opened, reopened, ready_for_review]
|
||||||
|
issue_comment:
|
||||||
|
types: [created]
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
pr_number: { description: "PR number to review", required: true }
|
||||||
|
|
||||||
|
permissions:
|
||||||
|
contents: read
|
||||||
|
issues: write
|
||||||
|
pull-requests: write
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: gadfly-${{ github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number }}
|
||||||
|
cancel-in-progress: true
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
review:
|
||||||
|
# Security: only trusted users may trigger a secret-bearing run via a PR
|
||||||
|
# comment. Replace the username(s) below with your maintainers — keep them in
|
||||||
|
# sync with GADFLY_ALLOWED_USERS (the in-container belt-and-suspenders check).
|
||||||
|
if: >-
|
||||||
|
github.event_name != 'issue_comment'
|
||||||
|
|| github.actor == 'your-username'
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
timeout-minutes: 30
|
||||||
|
steps:
|
||||||
|
- uses: docker://gitea.stevedudenhoeffer.com/steve/gadfly:latest
|
||||||
|
env:
|
||||||
|
GITEA_API: ${{ github.server_url }}/api/v1/repos/${{ github.repository }}
|
||||||
|
GITEA_TOKEN: ${{ secrets.GITEA_TOKEN }}
|
||||||
|
# --- Claude Code engine ---
|
||||||
|
# Pro/Max subscription token (preferred). Or set ANTHROPIC_API_KEY
|
||||||
|
# instead for per-token billing — but never both.
|
||||||
|
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
|
||||||
|
# ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||||
|
# bare "claude-code" uses the CLI default model; "claude-code/<model>"
|
||||||
|
# sets --model (sonnet/opus/haiku, or a full id). One comment per entry.
|
||||||
|
GADFLY_MODELS: "claude-code/sonnet"
|
||||||
|
# Optional CLI tuning (defaults are read-only-safe):
|
||||||
|
# GADFLY_CLAUDE_PERMISSION_MODE: plan # read-only; never edits
|
||||||
|
# GADFLY_CLAUDE_ALLOWED_TOOLS: "Read,Grep,Glob"
|
||||||
|
# GADFLY_CLAUDE_EXTRA_ARGS: "--max-turns 30"
|
||||||
|
GADFLY_ALLOWED_USERS: "your-username"
|
||||||
|
# --- event context (leave as-is) ---
|
||||||
|
EVENT_NAME: ${{ github.event_name }}
|
||||||
|
PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }}
|
||||||
|
PR_BRANCH: ${{ github.head_ref }}
|
||||||
|
IS_DRAFT: ${{ github.event.pull_request.draft }}
|
||||||
|
COMMENT_BODY: ${{ github.event.comment.body }}
|
||||||
|
COMMENT_ID: ${{ github.event.comment.id }}
|
||||||
|
ACTOR: ${{ github.actor }}
|
||||||
@@ -23,6 +23,12 @@
|
|||||||
# GADFLY_REPO_DIR (checked-out repo; default: this script's repo)
|
# GADFLY_REPO_DIR (checked-out repo; default: this script's repo)
|
||||||
# antigravity: `agy` on PATH with credentials already seeded (~/.gemini)
|
# antigravity: `agy` on PATH with credentials already seeded (~/.gemini)
|
||||||
#
|
#
|
||||||
|
# claude-code engine: when MODEL is "claude-code" or "claude-code/<model>" the
|
||||||
|
# binary shells out to the bundled `claude` CLI instead of a majordomo model.
|
||||||
|
# Its auth (CLAUDE_CODE_OAUTH_TOKEN, else ANTHROPIC_API_KEY) and GADFLY_CLAUDE_*
|
||||||
|
# tuning are read straight from the inherited environment — same as the other
|
||||||
|
# provider keys (OPENAI_API_KEY, …) — so no extra wiring is needed here.
|
||||||
|
#
|
||||||
# Optional:
|
# Optional:
|
||||||
# MAX_DIFF_CHARS diff truncation cap for the prompt (default 60000)
|
# MAX_DIFF_CHARS diff truncation cap for the prompt (default 60000)
|
||||||
# GADFLY_STATUS_FILE per-model JSON path for the live status board (set by
|
# GADFLY_STATUS_FILE per-model JSON path for the live status board (set by
|
||||||
|
|||||||