feat: claude-code reviewer engine (per-lens claude -p shell-out) #2

Merged
steve merged 2 commits from feat/claude-code-engine into main 2026-06-27 20:40:41 +00:00
Owner

Phase 1 of the gadfly-games build.

Adds a second review engine alongside the majordomo agent loop. For each lens it shells out to the Claude Code CLI (claude -p) inside the checked-out repo, so Claude Code verifies findings with its own read tools — then gadfly runs the same verdict-parse → recheck → consolidate → emit pipeline, unchanged.

Design

  • cmd/gadfly/engine.go — new reviewEngine interface with two impls:
    • majordomoEngine wraps the existing runAgent path (no behaviour change).
    • claudeCodeEngine execs claude -p … --output-format json and parses .result.
    • main.go's runSpecialists/reviewWithSpecialist are now engine-agnostic; the per-lens recheck reuses the same runPass primitive.
  • Selection: a model id of claude-code (CLI default model) or claude-code/<model> (suffix → --model, e.g. claude-code/sonnet).
  • Auth inherits from the env: Pro/Max via CLAUDE_CODE_OAUTH_TOKEN (no --bare), else ANTHROPIC_API_KEY. Read-only by default (--permission-mode plan); tunable via GADFLY_CLAUDE_PERMISSION_MODE / _ALLOWED_TOOLS / _EXTRA_ARGS / _MODEL / _BIN.
  • auto selection + the delegate_investigation worker are majordomo-only and are skipped with this engine.
  • Dockerfile bundles Node + @anthropic-ai/claude-code (notably larger image; ollama-only users pay only the size).

Also

  • Bumps the dogfood image pin to :sha-c3d09d3 so gadfly's own PRs now review with the live status board shipped in Phase 3 — this PR is the first to show the board.

Honesty

The CLI invocation is covered by unit tests, but a live subscription-auth run hasn't been validated end-to-end here, and subscription-auth-in-CI is a gray area in Anthropic's terms — flagged in the README and example. Phase 2 will add the pluggable backends (Ollama-through-the-harness, etc.) and actually exercise it.

Tests

engine_test.go covers spec detection, --model derivation, and argv building (no live CLI call). gofmt clean · go vet quiet · go build ok · go test -race ./... green · shells bash -n clean.

🤖 Generated with Claude Code

**Phase 1** of the gadfly-games build. Adds a second review **engine** alongside the majordomo agent loop. For each lens it shells out to the **Claude Code CLI** (`claude -p`) *inside the checked-out repo*, so Claude Code verifies findings with its own read tools — then gadfly runs the same verdict-parse → recheck → consolidate → emit pipeline, unchanged. ### Design - **`cmd/gadfly/engine.go`** — new `reviewEngine` interface with two impls: - `majordomoEngine` wraps the existing `runAgent` path (no behaviour change). - `claudeCodeEngine` execs `claude -p … --output-format json` and parses `.result`. - `main.go`'s `runSpecialists`/`reviewWithSpecialist` are now engine-agnostic; the per-lens recheck reuses the same `runPass` primitive. - **Selection**: a model id of `claude-code` (CLI default model) or `claude-code/<model>` (suffix → `--model`, e.g. `claude-code/sonnet`). - **Auth** inherits from the env: Pro/Max via `CLAUDE_CODE_OAUTH_TOKEN` (no `--bare`), else `ANTHROPIC_API_KEY`. Read-only by default (`--permission-mode plan`); tunable via `GADFLY_CLAUDE_PERMISSION_MODE` / `_ALLOWED_TOOLS` / `_EXTRA_ARGS` / `_MODEL` / `_BIN`. - `auto` selection + the `delegate_investigation` worker are majordomo-only and are skipped with this engine. - **Dockerfile** bundles Node + `@anthropic-ai/claude-code` (notably larger image; ollama-only users pay only the size). ### Also - Bumps the **dogfood image pin to `:sha-c3d09d3`** so gadfly's own PRs now review with the live status board shipped in Phase 3 — **this PR is the first to show the board**. ### Honesty The CLI invocation is covered by unit tests, but a live subscription-auth run hasn't been validated end-to-end here, and subscription-auth-in-CI is a gray area in Anthropic's terms — flagged in the README and example. Phase 2 will add the pluggable backends (Ollama-through-the-harness, etc.) and actually exercise it. ### Tests `engine_test.go` covers spec detection, `--model` derivation, and argv building (no live CLI call). `gofmt` clean · `go vet` quiet · `go build` ok · `go test -race ./...` green · shells `bash -n` clean. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 1 commit 2026-06-27 19:27:14 +00:00
feat: claude-code reviewer engine (per-lens claude -p shell-out)
Build & push image / build-and-push (pull_request) Successful in 46s
Adversarial Review (Gadfly) / review (pull_request) Failing after 30m16s
4237a18d09
Phase 1 of the gadfly-games build. Adds a second review engine alongside
the majordomo agent loop: for each lens, shell out to the Claude Code CLI
(`claude -p`) inside the checked-out repo so it verifies findings with
its OWN read tools, then reuse gadfly's verdict-parse + recheck +
consolidate + emit pipeline unchanged.

- cmd/gadfly/engine.go: new reviewEngine interface with two impls —
  majordomoEngine (wraps the existing runAgent path) and claudeCodeEngine
  (exec `claude -p ... --output-format json`, parse `.result`). main.go's
  runSpecialists/reviewWithSpecialist are now engine-agnostic.
- Select via a model id: `claude-code` (CLI default) or
  `claude-code/<model>` (suffix → --model). Auth inherits from the env:
  Pro/Max via CLAUDE_CODE_OAUTH_TOKEN (no --bare), else ANTHROPIC_API_KEY.
  Read-only by default (--permission-mode plan); tunable via GADFLY_CLAUDE_*.
- auto-select + delegate worker are majordomo-only and are skipped with
  this engine (Claude Code does its own legwork).
- Dockerfile bundles Node + @anthropic-ai/claude-code (larger image).
- Docs: README "Claude Code engine" section + config rows, examples/
  claude-code.yml stub, examples/README + CLAUDE.md updated. Honest note
  that subscription-auth-in-CI is untested here / a ToS gray area.
- Bumps the dogfood image pin to :sha-c3d09d3 so gadfly's own PRs now
  review with the live status board from Phase 3.

New engine_test.go covers spec detection, model derivation, and argv
building (no live CLI call). gofmt clean, go vet quiet, go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly — live review status

9/10 reviewers finished · updated 2026-06-27 19:59:36Z

deepseek-v4-pro:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — Minor issues
  • error-handling — Minor issues

gemma4:cloud · ollama-cloud — done

  • security — Blocking issues found
  • correctness — No material issues found
  • error-handling — Blocking issues found

glm-5.1:cloud · ollama-cloud — done

  • security — Minor issues
  • correctness — No material issues found
  • error-handling — Minor issues

glm-5.2:cloud · ollama-cloud — done

  • security — Minor issues
  • correctness — Minor issues
  • error-handling — Minor issues

gpt-oss:120b-cloud · ollama-cloud — done

  • security — No material issues found
  • ⚠️ correctness — could not complete
  • error-handling — No material issues found

kimi-k2.7-code:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — Minor issues
  • error-handling — Minor issues

m5/qwen3.6:35b-mlx · m5 — 2/3 lenses

  • security — No material issues found
  • ⚠️ correctness — could not complete
  • 🔄 error-handling — running

minimax-m3:cloud · ollama-cloud — done

  • security — Blocking issues found
  • correctness — Minor issues
  • error-handling — Minor issues

nemotron-3-super:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — Minor issues

qwen3-coder:480b-cloud · ollama-cloud — done

  • security — Minor issues
  • correctness — No material issues found
  • error-handling — No material issues found

Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.

<!-- gadfly-status-board --> ## 🪰 Gadfly — live review status 9/10 reviewers finished · updated 2026-06-27 19:59:36Z #### `deepseek-v4-pro:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **error-handling** — Minor issues #### `gemma4:cloud` · ollama-cloud — ✅ done - ✅ **security** — Blocking issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — Blocking issues found #### `glm-5.1:cloud` · ollama-cloud — ✅ done - ✅ **security** — Minor issues - ✅ **correctness** — No material issues found - ✅ **error-handling** — Minor issues #### `glm-5.2:cloud` · ollama-cloud — ✅ done - ✅ **security** — Minor issues - ✅ **correctness** — Minor issues - ✅ **error-handling** — Minor issues #### `gpt-oss:120b-cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ⚠️ **correctness** — could not complete - ✅ **error-handling** — No material issues found #### `kimi-k2.7-code:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **error-handling** — Minor issues #### `m5/qwen3.6:35b-mlx` · m5 — ⏳ 2/3 lenses - ✅ **security** — No material issues found - ⚠️ **correctness** — could not complete - 🔄 **error-handling** — running #### `minimax-m3:cloud` · ollama-cloud — ✅ done - ✅ **security** — Blocking issues found - ✅ **correctness** — Minor issues - ✅ **error-handling** — Minor issues #### `nemotron-3-super:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — Minor issues #### `qwen3-coder:480b-cloud` · ollama-cloud — ✅ done - ✅ **security** — Minor issues - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found <sub>Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.</sub>

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues

Through the security lens, the new claude-code engine is a command-injection surface and a privilege boundary. I verified the relevant paths against the actual code; the design is mostly sound, but two things deserve a fix.

  • cmd/gadfly/engine.go:82,92-103GADFLY_CLAUDE_* values are forwarded to the CLI with no validation, so the "read-only by default" guarantee is not enforced at the code level. The argv is built via exec.CommandContext(ctx, e.bin, e.args(...)...) (engine.go:114), which does not go through a shell, so there's no classic shell injection — good. But e.permissionMode = envOr("GADFLY_CLAUDE_PERMISSION_MODE", "plan") (line 82) is appended verbatim as --permission-mode (lines 97-98), GADFLY_CLAUDE_ALLOWED_TOOLS is forwarded as --allowedTools (lines 100-101), and e.extraArgs = strings.Fields(os.Getenv("GADFLY_CLAUDE_EXTRA_ARGS")) is appended verbatim (line 103). plan is only the default, not enforced: an operator (or a compromised/mis-set env) can flip the CLI into a write-capable mode (GADFLY_CLAUDE_PERMISSION_MODE=acceptEdits, or --allowedTools Write,Bash,Edit). In the threat model these are operator-set env vars (not PR-author-controlled), so this is low severity, but the README's "read-only by default" premise is not enforced at the code level. Suggested fix: validate permissionMode against an allowlist (plan, and optionally view/bypassPermissions with a warning), and warn/reject when a non-read-only mode or an --allowedTools containing Write/Bash/Edit is selected.

  • cmd/gadfly/engine.go:116 — full process env (os.Environ()) is inherited by the claude subprocess, including GITEA_TOKEN. Verified: entrypoint.sh:60-61 requires GITEA_TOKEN, and scripts/run.sh:164-174 invokes the gadfly binary inheriting the script's environment (which contains GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, OLLAMA_API_KEY, etc.). The claude -p CLI then runs with all of those visible in its environment, and the CLI is an LLM agent with Read/Grep/Glob tools pointed at the checked-out repo tree (cmd.Dir = e.repoDir, line 115). A model prompted via attacker-influenced diff content to "verify findings against real code" can be induced to read .env files or any file in the tree, and the model's own tool output flows back into the PR comment via renderConsolidated (main.go:210). This is a credential-leakage path the majordomo path does not have — repoFS tools (tools.go) are sandboxed to the repo directory and do not expose process env. There is no env allowlist at engine.go:116. The 800-char truncateForErr (line 150) only caps error detail, not successful res.Result output, so it is not a sufficient mitigation. Suggested fix: build an explicit minimal env for the subprocess (only CLAUDE_CODE_OAUTH_TOKEN/ANTHROPIC_API_KEY + needed HOME/PATH/GADFLY_CLAUDE_*), rather than os.Environ(), so the Gitea token and other provider keys can't be exfiltrated via the agent's read tools or surfaced in output.

No other material security issues in this lens. The exec.CommandContext use avoids shell injection; cmd.Dir is set to the checked-out repo (operator-controlled); and isClaudeCodeSpec (engine.go:64-67) is a plain equality/prefix check with no globbing, so no path-injection there.

(Both findings above were verified by reading engine.go, entrypoint.sh, scripts/run.sh, tools.go, and main.go.)

🎯 Correctness — Minor issues

Both findings confirmed against the actual code.

VERDICT: Minor issues found

  • cmd/gadfly/main.go:434 & cmd/gadfly/recheck.go:87The shared task/recheck prompts name majordomo-specific tools that the claude-code engine does not have. I verified buildTask (line 444) instructs the reviewer to "use your tools (read_file, grep, find_files)", and both buildTask (line 434) and buildRecheckTask (line 87) emit the note "call get_diff for the full text" when the diff exceeds defaultMaxDiffChars (60000, main.go:87). These are gadfly's in-process majordomo tool names. The claudeCodeEngine execs the claude CLI, which provides its own differently-named tools (Read/Grep/Glob) and has no get_diff tool (confirmed engine.go:113-147 only sets cwd + argv; it does not register any gadfly tools). Concrete impact: (1) the prompt tells the model to use read_file/find_files/get_diff, which don't exist in that environment — at best noise, at worst the model wastes turns or fails to verify; (2) on any diff > 60k chars the reviewer is told to "call get_diff" to recover the truncated portion, but it cannot, so the claude-code engine silently reviews against an incomplete diff and can neither see nor flag the missing part. The majordomo path is unaffected because it does expose those tools. Suggested fix: make the tool-name hint and the truncation-recovery note engine-aware (e.g. a toolHint/fullDiffHint the engine supplies), or have claudeCodeEngine inject the full diff into the task instead of relying on a get_diff call.

  • cmd/gadfly/main.go:187Type assertion eng.(*majordomoEngine).mdl is correct but fragile. Verified: in the auto && !ccSpec branch eng is guaranteed to be *majordomoEngine (set at main.go:172, with ccSpec false), so the assertion cannot panic. No bug today, but it relies on the invariant that the non-claude branch is the only other engine impl; a third engine would silently break. Not material to this PR — noted only because I verified it's safe.

🧯 Error handling & edge cases — Minor issues

Minor issues found

  • cmd/gadfly/engine.go:113-121claudeCodeEngine.runPass shells out with exec.CommandContext but relies on Go's default child-process handling: when ctx expires (per-lens timeout in reviewWithSpecialist, main.go:311), cmd is killed but the claude CLI may have spawned its own child processes (Node). Those grandchildren are not killed by default (no Setpgid/SysProcAttr, confirmed absent in the package), so a timed-out lens can leave orphaned claude processes holding an OAuth session slot. Inside a multi-lens, multi-model CI run this can pile up background processes. Suggested fix: set cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true} and kill the whole group on context cancel (or cmd.Cancel + cmd.WaitDelay), so the CLI tree dies with the parent. Severity is minor because it only manifests on timeout/unhappy paths and the container is torn down after the run.

  • cmd/gadfly/engine.go:125-132 — The structured-answer path returns success whenever json.Unmarshal succeeds and res.Result is non-empty, even if runErr != nil (it only checks res.IsError). If the CLI exits non-zero but still emits a JSON object on stdout with a populated result field (no is_error), runPass swallows runErr and returns that text as a valid draft. For an advisory reviewer this degrades to plausible-looking output instead of an error notice; the recheck pass would then treat it as a real draft. Suggested fix: if runErr != nil, prefer the error branch (fall through to runErr handling) instead of trusting the stdout JSON, or at least require !res.IsError && runErr == nil before accepting.

  • cmd/gadfly/engine.go:153-154 (truncateForErr)s[:max] slices bytes, so truncation can split a multi-byte UTF-8 rune mid-sequence and produce invalid UTF-8 in the error string (which is later rendered into the review comment / stderr). Edge case, cosmetic. Suggested fix: use a rune-safe truncation, e.g. loop over utf8.DecodeRuneInString.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 8s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> ## VERDICT: Minor issues Through the security lens, the new `claude-code` engine is a command-injection surface and a privilege boundary. I verified the relevant paths against the actual code; the design is mostly sound, but two things deserve a fix. - **`cmd/gadfly/engine.go:82,92-103` — `GADFLY_CLAUDE_*` values are forwarded to the CLI with no validation, so the "read-only by default" guarantee is not enforced at the code level.** The argv is built via `exec.CommandContext(ctx, e.bin, e.args(...)...)` (`engine.go:114`), which does **not** go through a shell, so there's no classic shell injection — good. But `e.permissionMode = envOr("GADFLY_CLAUDE_PERMISSION_MODE", "plan")` (line 82) is appended verbatim as `--permission-mode` (lines 97-98), `GADFLY_CLAUDE_ALLOWED_TOOLS` is forwarded as `--allowedTools` (lines 100-101), and `e.extraArgs = strings.Fields(os.Getenv("GADFLY_CLAUDE_EXTRA_ARGS"))` is appended verbatim (line 103). `plan` is only the *default*, not enforced: an operator (or a compromised/mis-set env) can flip the CLI into a write-capable mode (`GADFLY_CLAUDE_PERMISSION_MODE=acceptEdits`, or `--allowedTools Write,Bash,Edit`). In the threat model these are operator-set env vars (not PR-author-controlled), so this is low severity, but the README's "read-only by default" premise is not enforced at the code level. Suggested fix: validate `permissionMode` against an allowlist (`plan`, and optionally `view`/`bypassPermissions` with a warning), and warn/reject when a non-read-only mode or an `--allowedTools` containing `Write`/`Bash`/`Edit` is selected. - **`cmd/gadfly/engine.go:116` — full process env (`os.Environ()`) is inherited by the `claude` subprocess, including `GITEA_TOKEN`.** Verified: `entrypoint.sh:60-61` requires `GITEA_TOKEN`, and `scripts/run.sh:164-174` invokes the gadfly binary inheriting the script's environment (which contains `GITEA_TOKEN`, `GADFLY_FINDINGS_TOKEN`, `OLLAMA_API_KEY`, etc.). The `claude -p` CLI then runs with all of those visible in its environment, and the CLI is an LLM agent with `Read`/`Grep`/`Glob` tools pointed at the checked-out repo tree (`cmd.Dir = e.repoDir`, line 115). A model prompted via attacker-influenced diff content to "verify findings against real code" can be induced to read `.env` files or any file in the tree, and the model's own tool output flows back into the PR comment via `renderConsolidated` (`main.go:210`). This is a credential-leakage path the majordomo path does not have — `repoFS` tools (`tools.go`) are sandboxed to the repo directory and do not expose process env. There is no env allowlist at `engine.go:116`. The 800-char `truncateForErr` (line 150) only caps *error* detail, not successful `res.Result` output, so it is not a sufficient mitigation. Suggested fix: build an explicit minimal env for the subprocess (only `CLAUDE_CODE_OAUTH_TOKEN`/`ANTHROPIC_API_KEY` + needed `HOME`/`PATH`/`GADFLY_CLAUDE_*`), rather than `os.Environ()`, so the Gitea token and other provider keys can't be exfiltrated via the agent's read tools or surfaced in output. No other material security issues in this lens. The `exec.CommandContext` use avoids shell injection; `cmd.Dir` is set to the checked-out repo (operator-controlled); and `isClaudeCodeSpec` (`engine.go:64-67`) is a plain equality/prefix check with no globbing, so no path-injection there. (Both findings above were verified by reading `engine.go`, `entrypoint.sh`, `scripts/run.sh`, `tools.go`, and `main.go`.) </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> Both findings confirmed against the actual code. **VERDICT: Minor issues found** - `cmd/gadfly/main.go:434` & `cmd/gadfly/recheck.go:87` — **The shared task/recheck prompts name majordomo-specific tools that the claude-code engine does not have.** I verified `buildTask` (line 444) instructs the reviewer to "use your tools (read_file, grep, find_files)", and both `buildTask` (line 434) and `buildRecheckTask` (line 87) emit the note "call `get_diff` for the full text" when the diff exceeds `defaultMaxDiffChars` (60000, `main.go:87`). These are gadfly's in-process majordomo tool names. The `claudeCodeEngine` execs the `claude` CLI, which provides its own differently-named tools (`Read`/`Grep`/`Glob`) and has **no** `get_diff` tool (confirmed `engine.go:113-147` only sets cwd + argv; it does not register any gadfly tools). Concrete impact: (1) the prompt tells the model to use `read_file`/`find_files`/`get_diff`, which don't exist in that environment — at best noise, at worst the model wastes turns or fails to verify; (2) on any diff > 60k chars the reviewer is told to "call `get_diff`" to recover the truncated portion, but it cannot, so the claude-code engine silently reviews against an incomplete diff and can neither see nor flag the missing part. The majordomo path is unaffected because it does expose those tools. Suggested fix: make the tool-name hint and the truncation-recovery note engine-aware (e.g. a `toolHint`/`fullDiffHint` the engine supplies), or have `claudeCodeEngine` inject the full diff into the task instead of relying on a `get_diff` call. - `cmd/gadfly/main.go:187` — **Type assertion `eng.(*majordomoEngine).mdl` is correct but fragile.** Verified: in the `auto && !ccSpec` branch `eng` is guaranteed to be `*majordomoEngine` (set at `main.go:172`, with `ccSpec` false), so the assertion cannot panic. No bug today, but it relies on the invariant that the non-claude branch is the only other engine impl; a third engine would silently break. Not material to this PR — noted only because I verified it's safe. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> **Minor issues found** - **`cmd/gadfly/engine.go:113-121`** — `claudeCodeEngine.runPass` shells out with `exec.CommandContext` but relies on Go's default child-process handling: when `ctx` expires (per-lens timeout in `reviewWithSpecialist`, `main.go:311`), `cmd` is killed but the `claude` CLI may have spawned its own child processes (Node). Those grandchildren are *not* killed by default (no `Setpgid`/`SysProcAttr`, confirmed absent in the package), so a timed-out lens can leave orphaned `claude` processes holding an OAuth session slot. Inside a multi-lens, multi-model CI run this can pile up background processes. Suggested fix: set `cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}` and kill the whole group on context cancel (or `cmd.Cancel` + `cmd.WaitDelay`), so the CLI tree dies with the parent. Severity is minor because it only manifests on timeout/unhappy paths and the container is torn down after the run. - **`cmd/gadfly/engine.go:125-132`** — The structured-answer path returns success whenever `json.Unmarshal` succeeds and `res.Result` is non-empty, *even if `runErr != nil`* (it only checks `res.IsError`). If the CLI exits non-zero but still emits a JSON object on stdout with a populated `result` field (no `is_error`), `runPass` swallows `runErr` and returns that text as a valid draft. For an advisory reviewer this degrades to plausible-looking output instead of an error notice; the recheck pass would then treat it as a real draft. Suggested fix: if `runErr != nil`, prefer the error branch (fall through to `runErr` handling) instead of trusting the stdout JSON, or at least require `!res.IsError && runErr == nil` before accepting. - **`cmd/gadfly/engine.go:153-154` (`truncateForErr`)** — `s[:max]` slices bytes, so truncation can split a multi-byte UTF-8 rune mid-sequence and produce invalid UTF-8 in the error string (which is later rendered into the review comment / stderr). Edge case, cosmetic. Suggested fix: use a rune-safe truncation, e.g. loop over `utf8.DecodeRuneInString`. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 8s</sub>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

VERDICT: Blocking issues found

  • cmd/gadfly/engine.go:103extraArgs appended after --permission-mode plan defeats the read-only default. args() builds --permission-mode plan and then appends e.extraArgs (from GADFLY_CLAUDE_EXTRA_ARGS, strings.Fields-split). Most CLIs (and Claude Code specifically) treat the last occurrence of a flag as the winning one, so an attacker who can set env vars in CI can use GADFLY_CLAUDE_EXTRA_ARGS="--permission-mode bypassPermissions" to turn the subprocess fully write-capable — let it edit files, run arbitrary shell commands, exfiltrate secrets, etc. The "read-only by default" claim in the README/example is bypassed by the documented tuning knob. Verified by reading engine.go:92-104 (extraArgs is the final append, after --permission-mode) and runPass (cmd.Dir = e.repoDir, full env inherited). Fix: either reject permission-mode inside extraArgs, strip --permission-mode overrides from extraArgs, or place the user overrides before the safe default. Consider the same for --allowedTools.

  • cmd/gadfly/engine.go:116 — full os.Environ() is passed to the claude subprocess, leaking the Gitea token into a third-party CLI. cmd.Env = os.Environ() carries GITEA_TOKEN, GITEA_API, GADFLY_API_KEY, GADFLY_FINDINGS_TOKEN, and any other secrets the runner inherited (confirmed via grep — no other cmd.Env sanitization anywhere in the repo). Claude Code is a third-party node binary installed via npm install -g @anthropic-ai/claude-code (verified in Dockerfile:32); its telemetry / error reporting path may observe and transmit env. The entrypoint already filters/handles env separately for the worker (entrypoint.sh:139-142 only re-exports a small set), but the binary does not. Fix: build a minimal env (only PATH, HOME, CLAUDE_CODE_OAUTH_TOKEN / ANTHROPIC_API_KEY, locale, and CLAUDE_CODE_* knobs), dropping GITEA_*, GADFLY_API_KEY, GADFLY_FINDINGS_TOKEN, etc.

  • cmd/gadfly/engine.go:79,114GADFLY_CLAUDE_BIN is used as a literal executable with no validation. With the env inherited, anyone able to set env (a privileged operator or any compromise of the env namespace) can point bin at any binary, which then runs in e.repoDir with the full env — full RCE as the review job. Expected scope for this knob is "which claude binary", so enforce that it resolves to the bundled claude install (e.g. filepath.EvalSymlinks check, or refuse paths outside /usr/local/bin/the npm prefix), or document and accept the trust assumption.

🎯 Correctness — Minor issues

VERDICT: Minor issues

  • cmd/gadfly/engine.go:113-147runPass swallows context cancellation when JSON parsing succeeds. After cmd.Run() (which respects ctx via exec.CommandContext), the code goes straight to json.Unmarshal(...) and returns res.Result without consulting ctx.Err(). If the CLI flushes a JSON envelope just before the timeout fires, the caller (reviewWithSpecialist at main.go:316) only checks err and will emit the truncated/random text as a real review. Low-likelihood but real; fix is an early if err := ctx.Err(); err != nil { return "", err } before the JSON-parse branch, plus treating runErr != nil && ctx.Err() != nil as a context error rather than a generic claude -p failed.

  • cmd/gadfly/engine.go:129strings.TrimSpace(res.Result) collapses legitimate single-space/blank results. A guardrail/blocked response that prints {"result": " "} yields out == "" here and falls through to the raw := strings.TrimSpace(stdout.String()) fallback, returning the entire raw JSON envelope ({"result":" ", "is_error":true, ...}) as the "review" — parseVerdict then tries to read it as review markdown. Same shape as the cancellation finding: the degraded-mode fallback isn't context-/error-aware. Fix: on IsError == true always go through the error path (don't trust res.Result as an answer); or detect that the JSON parse succeeded but res.Result was empty + runErr == nil and treat that as a parse-failure error too.

  • cmd/gadfly/engine.go:84strings.Fields(GADFLY_CLAUDE_EXTRA_ARGS) silently splits quoted arguments. The README's "passed verbatim" promise is misleading: a user-supplied --append-system-prompt "system prompt text" becomes four argv tokens. The README/example currently only shows --max-turns 30 (no quotes needed), so this is documentation/behavior drift rather than a code bug, but worth either documenting GADFLY_CLAUDE_EXTRA_ARGS as shell-tokenized or parsing with github.com/google/shlex / kballard/go-shellquote.

  • cmd/gadfly/main.go:187Type assertion eng.(*majordomoEngine).mdl has no safety net. Guarded by ccSpec == false in the enclosing else branch today, so it can't panic on the current call site — but it's a fragility foot-gun: if the auto-select block is ever refactored so eng could be a claudeCodeEngine, this panics. Minor; hoisting var mdl *majordomoEngine in the non-cc branch (or just threading the model through) would be clearer.

  • scripts/run.sh:56-59Cosmetic: MODEL_PROVIDER for claude-code/<model> displays as claude-code in the comment header. entrypoint.sh:181 always sets PROVIDER=ollama, and run.sh derives MODEL_PROVIDER="${MODEL%%/*}", so claude-code/sonnet renders provider claude-code while every other model gets a real backend name (m1pro, ollama-cloud, …). Cosmetic only — no logic bug — but a reader of the comment header will be confused.

🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

Findings (🧯 Error handling & edge cases)

  • truncateForErr slices a string at an arbitrary byte boundarycmd/gadfly/engine.go:149-157

    const max = 800
    if len(s) > max {
        return s[:max] + "…"
    }
    

    max = 800 is a byte index, but s is arbitrary CLI output (the claude CLI emits UTF-8 — model names like claude-opus-4-5, accented words, emoji). If byte 800 lands mid-rune, s[:800] returns invalid UTF-8, then a literal (itself 3 bytes) is appended next to corrupted bytes. The string is interpolated into an fmt.Errorf that flows to the inline ⚠️ reviewer failed notice.
    Fix: truncate on a rune boundary, e.g. if utf8.RuneCountInString(s) > max { return string([]rune(s)[:max]) + "…" }, or keep byte-slicing but trim the trailing partial rune with utf8.DecodeLastRuneInString.

  • GADFLY_CLAUDE_EXTRA_ARGS is split with strings.Fields, silently corrupting quoted valuescmd/gadfly/engine.go:84

    extraArgs: strings.Fields(os.Getenv("GADFLY_CLAUDE_EXTRA_ARGS")),
    

    A user setting --append-system-prompt "hello world" (or --system-prompt-file /tmp/path with spaces/x.txt, etc.) gets three tokens: --append-system-prompt, "hello, world". The CLI then either errors on a malformed flag or silently misbehaves. The unit test (TestClaudeCodeArgs, engine_test.go:70-101) only exercises a value with no whitespace (--max-turns 30), so this edge case is uncovered.
    Fix: if you want verbatim passthrough, document that and use a shell-style split (shlex.Split) or accept the env as a \n-delimited list. At minimum, add a test with a value containing spaces to lock in the contract.

  • eng.(*majordomoEngine) type assertion is locally safe but brittlecmd/gadfly/main.go:187

    selector, serr := resolveSelectorModel(eng.(*majordomoEngine).mdl)
    

    Only reached when ccSpec is false (it's inside the else arm at lines 186-200), so today it cannot panic — but the assertion sits next to the ccSpec branch that builds the engine, and a future refactor that lifts it (or adds a third engine) will turn this into a runtime panic with no test to catch it. The existing panic-recovery in runSpecialists (lines 256-261) would catch it but emit a noisy ⚠️ This reviewer panicked per lens instead of a clear "auto-select requires the majordomo engine" message.
    Fix: either expose mdl llm.Model on the interface (via a model() accessor with a concrete return), or gate the assertion explicitly with if e, ok := eng.(*majordomoEngine); ok { … } and fall back to the default suite otherwise.

Verified clean in this lens

  • runPass failure paths: stderr/stdout truncation is applied to error detail before fmt.Errorf; non-fatal (rendered as an inline notice, never sinks the job). cmd/gadfly/engine.go:113-147.
  • runSpecialists panic recovery (lines 256-261) is preserved across the engine refactor.
  • reviewWithSpecialist timeout + cancel are correctly scoped per-lens (lines 310-312).
  • Recheck-pass failures fall back to the draft with a stderr notice (lines 325-329) — explicit, non-fatal, by design.
  • claude-code engine correctly skips the worker model + auto-select (lines 158-172, 183-185) without leaving stale state on fsTools.worker (the ccSpec branch never assigns fsTools.worker, leaving it as the zero value nil).
  • os.Environ() passthrough inherits auth tokens without leaking them into argv; the only env-derived strings that reach os/exec are bin (defaulted) and extraArgs (user-supplied).

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 31s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Blocking issues found</summary> VERDICT: Blocking issues found - **`cmd/gadfly/engine.go:103` — `extraArgs` appended *after* `--permission-mode plan` defeats the read-only default.** `args()` builds `--permission-mode plan` and then appends `e.extraArgs` (from `GADFLY_CLAUDE_EXTRA_ARGS`, `strings.Fields`-split). Most CLIs (and Claude Code specifically) treat the last occurrence of a flag as the winning one, so an attacker who can set env vars in CI can use `GADFLY_CLAUDE_EXTRA_ARGS="--permission-mode bypassPermissions"` to turn the subprocess fully write-capable — let it edit files, run arbitrary shell commands, exfiltrate secrets, etc. The "read-only by default" claim in the README/example is bypassed by the documented tuning knob. Verified by reading `engine.go:92-104` (extraArgs is the final `append`, after `--permission-mode`) and `runPass` (`cmd.Dir = e.repoDir`, full env inherited). Fix: either reject `permission-mode` inside `extraArgs`, strip `--permission-mode` overrides from `extraArgs`, or place the user overrides *before* the safe default. Consider the same for `--allowedTools`. - **`cmd/gadfly/engine.go:116` — full `os.Environ()` is passed to the `claude` subprocess, leaking the Gitea token into a third-party CLI.** `cmd.Env = os.Environ()` carries `GITEA_TOKEN`, `GITEA_API`, `GADFLY_API_KEY`, `GADFLY_FINDINGS_TOKEN`, and any other secrets the runner inherited (confirmed via `grep` — no other `cmd.Env` sanitization anywhere in the repo). Claude Code is a third-party node binary installed via `npm install -g @anthropic-ai/claude-code` (verified in `Dockerfile:32`); its telemetry / error reporting path may observe and transmit env. The entrypoint already filters/handles env separately for the worker (`entrypoint.sh:139-142` only re-exports a small set), but the binary does not. Fix: build a minimal env (only `PATH`, `HOME`, `CLAUDE_CODE_OAUTH_TOKEN` / `ANTHROPIC_API_KEY`, locale, and `CLAUDE_CODE_*` knobs), dropping `GITEA_*`, `GADFLY_API_KEY`, `GADFLY_FINDINGS_TOKEN`, etc. - **`cmd/gadfly/engine.go:79,114` — `GADFLY_CLAUDE_BIN` is used as a literal executable with no validation.** With the env inherited, anyone able to set env (a privileged operator or any compromise of the env namespace) can point `bin` at any binary, which then runs in `e.repoDir` with the full env — full RCE as the review job. Expected scope for this knob is "which `claude` binary", so enforce that it resolves to the bundled `claude` install (e.g. `filepath.EvalSymlinks` check, or refuse paths outside `/usr/local/bin`/the npm prefix), or document and accept the trust assumption. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> ## VERDICT: Minor issues - `cmd/gadfly/engine.go:113-147` — **`runPass` swallows context cancellation when JSON parsing succeeds.** After `cmd.Run()` (which respects `ctx` via `exec.CommandContext`), the code goes straight to `json.Unmarshal(...)` and returns `res.Result` without consulting `ctx.Err()`. If the CLI flushes a JSON envelope just before the timeout fires, the caller (`reviewWithSpecialist` at `main.go:316`) only checks `err` and will emit the truncated/random text as a real review. Low-likelihood but real; fix is an early `if err := ctx.Err(); err != nil { return "", err }` before the JSON-parse branch, plus treating `runErr != nil && ctx.Err() != nil` as a context error rather than a generic `claude -p failed`. - `cmd/gadfly/engine.go:129` — **`strings.TrimSpace(res.Result)` collapses legitimate single-space/blank results.** A guardrail/blocked response that prints `{"result": " "}` yields `out == ""` here and falls through to the `raw := strings.TrimSpace(stdout.String())` fallback, returning the entire raw JSON envelope (`{"result":" ", "is_error":true, ...}`) as the "review" — `parseVerdict` then tries to read it as review markdown. Same shape as the cancellation finding: the degraded-mode fallback isn't context-/error-aware. Fix: on `IsError == true` always go through the error path (don't trust `res.Result` as an answer); or detect that the JSON parse succeeded but `res.Result` was empty + `runErr == nil` and treat that as a parse-failure error too. - `cmd/gadfly/engine.go:84` — **`strings.Fields(GADFLY_CLAUDE_EXTRA_ARGS)` silently splits quoted arguments.** The README's "passed verbatim" promise is misleading: a user-supplied `--append-system-prompt "system prompt text"` becomes four argv tokens. The README/example currently only shows `--max-turns 30` (no quotes needed), so this is documentation/behavior drift rather than a code bug, but worth either documenting `GADFLY_CLAUDE_EXTRA_ARGS` as shell-tokenized or parsing with `github.com/google/shlex` / `kballard/go-shellquote`. - `cmd/gadfly/main.go:187` — **Type assertion `eng.(*majordomoEngine).mdl` has no safety net.** Guarded by `ccSpec == false` in the enclosing `else` branch today, so it can't panic on the current call site — but it's a fragility foot-gun: if the auto-select block is ever refactored so `eng` could be a `claudeCodeEngine`, this panics. Minor; hoisting `var mdl *majordomoEngine` in the non-cc branch (or just threading the model through) would be clearer. - `scripts/run.sh:56-59` — **Cosmetic: `MODEL_PROVIDER` for `claude-code/<model>` displays as `claude-code` in the comment header.** `entrypoint.sh:181` always sets `PROVIDER=ollama`, and run.sh derives `MODEL_PROVIDER="${MODEL%%/*}"`, so `claude-code/sonnet` renders provider `claude-code` while every other model gets a real backend name (m1pro, ollama-cloud, …). Cosmetic only — no logic bug — but a reader of the comment header will be confused. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> ## VERDICT: Minor issues ### Findings (🧯 Error handling & edge cases) - **`truncateForErr` slices a string at an arbitrary byte boundary** — `cmd/gadfly/engine.go:149-157` ```go const max = 800 if len(s) > max { return s[:max] + "…" } ``` `max = 800` is a byte index, but `s` is arbitrary CLI output (the `claude` CLI emits UTF-8 — model names like `claude-opus-4-5`, accented words, emoji). If byte 800 lands mid-rune, `s[:800]` returns invalid UTF-8, then a literal `…` (itself 3 bytes) is appended next to corrupted bytes. The string is interpolated into an `fmt.Errorf` that flows to the inline `⚠️ reviewer failed` notice. **Fix:** truncate on a rune boundary, e.g. `if utf8.RuneCountInString(s) > max { return string([]rune(s)[:max]) + "…" }`, or keep byte-slicing but trim the trailing partial rune with `utf8.DecodeLastRuneInString`. - **`GADFLY_CLAUDE_EXTRA_ARGS` is split with `strings.Fields`, silently corrupting quoted values** — `cmd/gadfly/engine.go:84` ```go extraArgs: strings.Fields(os.Getenv("GADFLY_CLAUDE_EXTRA_ARGS")), ``` A user setting `--append-system-prompt "hello world"` (or `--system-prompt-file /tmp/path with spaces/x.txt`, etc.) gets three tokens: `--append-system-prompt`, `"hello`, `world"`. The CLI then either errors on a malformed flag or silently misbehaves. The unit test (`TestClaudeCodeArgs`, `engine_test.go:70-101`) only exercises a value with no whitespace (`--max-turns 30`), so this edge case is uncovered. **Fix:** if you want verbatim passthrough, document that and use a shell-style split (`shlex.Split`) or accept the env as a `\n`-delimited list. At minimum, add a test with a value containing spaces to lock in the contract. - **`eng.(*majordomoEngine)` type assertion is locally safe but brittle** — `cmd/gadfly/main.go:187` ```go selector, serr := resolveSelectorModel(eng.(*majordomoEngine).mdl) ``` Only reached when `ccSpec` is false (it's inside the `else` arm at lines 186-200), so today it cannot panic — but the assertion sits next to the `ccSpec` branch that builds the engine, and a future refactor that lifts it (or adds a third engine) will turn this into a runtime panic with no test to catch it. The existing panic-recovery in `runSpecialists` (lines 256-261) would catch it but emit a noisy `⚠️ This reviewer panicked` per lens instead of a clear "auto-select requires the majordomo engine" message. **Fix:** either expose `mdl llm.Model` on the interface (via a `model()` accessor with a concrete return), or gate the assertion explicitly with `if e, ok := eng.(*majordomoEngine); ok { … }` and fall back to the default suite otherwise. ### Verified clean in this lens - `runPass` failure paths: stderr/stdout truncation is applied to error detail before `fmt.Errorf`; non-fatal (rendered as an inline notice, never sinks the job). `cmd/gadfly/engine.go:113-147`. - `runSpecialists` panic recovery (lines 256-261) is preserved across the engine refactor. - `reviewWithSpecialist` timeout + cancel are correctly scoped per-lens (lines 310-312). - Recheck-pass failures fall back to the draft with a stderr notice (lines 325-329) — explicit, non-fatal, by design. - `claude-code` engine correctly skips the worker model + auto-select (lines 158-172, 183-185) without leaving stale state on `fsTools.worker` (the ccSpec branch never assigns `fsTools.worker`, leaving it as the zero value `nil`). - `os.Environ()` passthrough inherits auth tokens without leaking them into argv; the only env-derived strings that reach `os/exec` are `bin` (defaulted) and `extraArgs` (user-supplied). </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 31s</sub>

🪰 Gadfly review — m5/qwen3.6:35b-mlx (m5)

Reviewing… this comment will update with findings and run time.

<!-- gadfly-review:ollama:m5/qwen3.6:35b-mlx --> ### 🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5) ⏳ Reviewing… this comment will update with findings and run time.

🪰 Gadfly review — glm-5.1:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

GADFLY_BIN is a shell script variable (in scripts/run.sh), not a Go-level concern. The draft's finding 3 acknowledges it's the same pattern and "low risk since it's admin-controlled."

Let me now verify the exact claim about "commander" — the draft says "Most CLI parsers (including Claude Code's commander)" — this is a claim about an external package. Let me check if there's any documentation or code that confirms this.

The @anthropic-ai/claude-code package is installed via npm in the Dockerfile. It's not in this repo. The claim about commander being the CLI parser is speculative but reasonable for Node.js CLIs. More importantly, the underlying issue — that extraArgs can override --permission-mode plan — is real regardless of which CLI parser is used. Even if "last wins" isn't how commander works, many CLI parsers do honor the last flag, and even those that don't may still be subvertable by other means (like --dangerously-skip-permissions).

Now let me verify the README's claim about safety:

Looking at the README at line 104: GADFLY_CLAUDE_PERMISSION_MODE | plan | --permission-mode (read-only plan keeps it from editing)

The README advertises plan as a safety feature but doesn't mention that EXTRA_ARGS can override it. This is a valid documentation gap.

Let me now make my final assessment:

Finding 1: Valid. extraArgs are appended after --permission-mode plan, so GADFLY_CLAUDE_EXTRA_ARGS="--permission-mode default" could potentially override the safety default. The README advertises plan as a safety feature without noting this caveat. The claim about commander is speculative (external package) but the core issue is real. Minor severity since these are admin-controlled env vars.

Finding 2: Valid. GADFLY_CLAUDE_ALLOWED_TOOLS is passed verbatim with no validation. Under --permission-mode plan, tools like Bash and Write should be restricted, but the code doesn't enforce this. Minor severity — defense-in-depth concern.

Finding 3: Valid but the lowest value. GADFLY_CLAUDE_BIN is exec'd without path validation, same pattern as GADFLY_BIN. The draft itself says "low risk since it's admin-controlled." This is more of an observation than an actionable finding. I'll keep it but it's very minor.

All three findings survive verification. The verdict remains "Minor issues."

VERDICT: Minor issues

  • cmd/gadfly/engine.go:92–103GADFLY_CLAUDE_EXTRA_ARGS can override --permission-mode plan safety default. extraArgs are appended after --permission-mode plan in the argv (return append(a, e.extraArgs...)). Most CLI parsers apply the last occurrence of a flag, so GADFLY_CLAUDE_EXTRA_ARGS="--permission-mode default" could silently downgrade from read-only to full write access. No validation or documentation warns about this. The README (line 104) advertises plan as a safety feature ("read-only plan keeps it from editing") without noting that EXTRA_ARGS can negate it. Fix: either validate extraArgs against overrides of --permission-mode / --dangerously-skip-permissions, or append --permission-mode after extraArgs so the hard-coded default always wins.

  • cmd/gadfly/engine.go:83,100–101 — No validation on GADFLY_CLAUDE_ALLOWED_TOOLS values. The env var is passed verbatim as --allowedTools. A value like Bash,Write,Edit could grant write/execute capability. Under --permission-mode plan this should be vetoed, but the code doesn't enforce that, and the EXTRA_ARGS override above makes it moot. A brief validation or at least a README warning that this knob should not expand beyond read-only tool names would add defense-in-depth.

  • cmd/gadfly/engine.go:79,114GADFLY_CLAUDE_BIN allows arbitrary binary execution. The env var is passed directly to exec.CommandContext. A mis-set value (or a compromised CI variable) would execute an attacker-chosen binary with the full container environment (including GITEA_TOKEN, CLAUDE_CODE_OAUTH_TOKEN, etc.). This is the same threat model as the pre-existing GADFLY_BIN, but there is no path resolution safety (e.g., checking it resolves to an absolute path or is on a safe list). Low risk since it's admin-controlled, but worth noting for defense-in-depth.

🎯 Correctness — No material issues found

VERDICT: No material issues found

I reviewed the diff through the correctness lens, verifying env var flow, CLI argument construction, JSON parsing logic, and branching in main.go. Key things I checked:

  • GADFLY_MODEL (singular) vs GADFLY_MODELS (plural): Not a bug — entrypoint.sh iterates GADFLY_MODELS, sets MODEL per iteration, and run.sh maps MODELGADFLY_MODEL for the Go binary. Consistent.

  • isClaudeCodeSpec branching: The ccSpec guard in main.go:156 correctly prevents the eng.(*majordomoEngine).mdl type assertion (line 187) from ever executing for a claude-code engine — that code path only runs when ccSpec is false.

  • CLI argv construction (args() method): Arguments are passed via exec.Command (not a shell), so no injection or quoting issues. The order and flags are syntactically correct.

  • claudeResult JSON parsing: The struct tags (result, is_error, subtype) match the documented Claude Code JSON output format. The fallback logic (try JSON → try raw stdout → error) degrades gracefully.

  • truncateForErr uses len() (bytes) rather than rune count, so it could split a multi-byte UTF-8 character at position 800. Impact: a garbled byte in an error message trailer. Not material.

  • strings.Fields on GADFLY_CLAUDE_EXTRA_ARGS: Splits on all whitespace, which is correct for space-separated flags like --max-turns 30. Arguments containing literal spaces can't be represented, but that's an acceptable limitation for the documented use.

  • provider_of("claude-code/sonnet") in entrypoint.sh returns "claude-code", grouping all claude-code specs into one provider lane — correct behavior.

No logic bugs, domain errors, or semantic correctness issues found.

🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

  • cmd/gadfly/engine.go:150-156truncateForErr truncates on byte boundary, can split a multi-byte UTF-8 character. len(s) > max and s[:max] operate on bytes. If the 800th byte falls mid-rune, the returned string is invalid UTF-8. It is then embedded in an fmt.Errorf message that ultimately appears in a review comment. Low-severity (cosmetic, error path only), but straightforwardly fixable by truncating on a rune boundary (e.g., loop backward from max to find a valid boundary, or use utf8.RuneStart).

  • cmd/gadfly/engine.go:113-146 — Timeout-killed CLI process always surfaces as error, never as partial output. When context.WithTimeout expires, exec.CommandContext sends SIGKILL; the child's stdout is typically incomplete JSON. The JSON parse then fails, runErr is non-nil, and the function returns a fmt.Errorf. By contrast, the majordomo engine's runAgent can return a partial answer on step-budget exhaustion via its wrap-up steer and finalization fallback. A slow claude-code lens that times out mid-write always produces the ⚠️ error notice rather than potentially usable partial text. Not a bug — truncated JSON can't be parsed — but a behavioral gap on the unhappy path worth noting.

  • cmd/gadfly/recheck.go:27,87 & cmd/gadfly/main.go:434,444 — Prompts reference get_diff / read_file / grep / find_files tools that don't exist in the Claude Code engine. When the claude-code engine is active, the system prompt (scripts/system-prompt.txt), the review task (buildTask), and the recheck prompt (recheckSystemPrompt + buildRecheckTask) all instruct the model to call read_file, grep, find_files, and get_diff — which are the majordomo agent's tools, not Claude Code's (Read, Grep, Glob). The most impactful case is get_diff: the diff is truncated in the prompt and the model is told to "call get_diff for the full text," but no such tool exists for Claude Code. The model can still use its own tools to verify findings against the repo, so this doesn't cause a crash — but the instruction to call a nonexistent tool is misleading. Low-impact but an edge case where prompts are engine-specific yet delivered unconditionally.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 38s

<!-- gadfly-review:ollama:glm-5.1:cloud --> ### 🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> `GADFLY_BIN` is a shell script variable (in `scripts/run.sh`), not a Go-level concern. The draft's finding 3 acknowledges it's the same pattern and "low risk since it's admin-controlled." Let me now verify the exact claim about "commander" — the draft says "Most CLI parsers (including Claude Code's `commander`)" — this is a claim about an external package. Let me check if there's any documentation or code that confirms this. The `@anthropic-ai/claude-code` package is installed via npm in the Dockerfile. It's not in this repo. The claim about `commander` being the CLI parser is speculative but reasonable for Node.js CLIs. More importantly, the underlying issue — that extraArgs can override `--permission-mode plan` — is real regardless of which CLI parser is used. Even if "last wins" isn't how `commander` works, many CLI parsers do honor the last flag, and even those that don't may still be subvertable by other means (like `--dangerously-skip-permissions`). Now let me verify the README's claim about safety: Looking at the README at line 104: `GADFLY_CLAUDE_PERMISSION_MODE` | `plan` | `--permission-mode` (read-only `plan` keeps it from editing) The README advertises `plan` as a safety feature but doesn't mention that `EXTRA_ARGS` can override it. This is a valid documentation gap. Let me now make my final assessment: **Finding 1**: Valid. `extraArgs` are appended after `--permission-mode plan`, so `GADFLY_CLAUDE_EXTRA_ARGS="--permission-mode default"` could potentially override the safety default. The README advertises `plan` as a safety feature without noting this caveat. The claim about `commander` is speculative (external package) but the core issue is real. **Minor** severity since these are admin-controlled env vars. **Finding 2**: Valid. `GADFLY_CLAUDE_ALLOWED_TOOLS` is passed verbatim with no validation. Under `--permission-mode plan`, tools like `Bash` and `Write` *should* be restricted, but the code doesn't enforce this. **Minor** severity — defense-in-depth concern. **Finding 3**: Valid but the lowest value. `GADFLY_CLAUDE_BIN` is exec'd without path validation, same pattern as `GADFLY_BIN`. The draft itself says "low risk since it's admin-controlled." This is more of an observation than an actionable finding. I'll keep it but it's very minor. All three findings survive verification. The verdict remains "Minor issues." **VERDICT: Minor issues** - **`cmd/gadfly/engine.go:92–103` — `GADFLY_CLAUDE_EXTRA_ARGS` can override `--permission-mode plan` safety default.** `extraArgs` are appended after `--permission-mode plan` in the argv (`return append(a, e.extraArgs...)`). Most CLI parsers apply the last occurrence of a flag, so `GADFLY_CLAUDE_EXTRA_ARGS="--permission-mode default"` could silently downgrade from read-only to full write access. No validation or documentation warns about this. The README (line 104) advertises `plan` as a safety feature ("read-only `plan` keeps it from editing") without noting that `EXTRA_ARGS` can negate it. Fix: either validate `extraArgs` against overrides of `--permission-mode` / `--dangerously-skip-permissions`, or append `--permission-mode` *after* `extraArgs` so the hard-coded default always wins. - **`cmd/gadfly/engine.go:83,100–101` — No validation on `GADFLY_CLAUDE_ALLOWED_TOOLS` values.** The env var is passed verbatim as `--allowedTools`. A value like `Bash,Write,Edit` could grant write/execute capability. Under `--permission-mode plan` this *should* be vetoed, but the code doesn't enforce that, and the EXTRA_ARGS override above makes it moot. A brief validation or at least a README warning that this knob should not expand beyond read-only tool names would add defense-in-depth. - **`cmd/gadfly/engine.go:79,114` — `GADFLY_CLAUDE_BIN` allows arbitrary binary execution.** The env var is passed directly to `exec.CommandContext`. A mis-set value (or a compromised CI variable) would execute an attacker-chosen binary with the full container environment (including `GITEA_TOKEN`, `CLAUDE_CODE_OAUTH_TOKEN`, etc.). This is the same threat model as the pre-existing `GADFLY_BIN`, but there is no path resolution safety (e.g., checking it resolves to an absolute path or is on a safe list). Low risk since it's admin-controlled, but worth noting for defense-in-depth. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **VERDICT: No material issues found** I reviewed the diff through the correctness lens, verifying env var flow, CLI argument construction, JSON parsing logic, and branching in `main.go`. Key things I checked: - **`GADFLY_MODEL` (singular) vs `GADFLY_MODELS` (plural)**: Not a bug — `entrypoint.sh` iterates `GADFLY_MODELS`, sets `MODEL` per iteration, and `run.sh` maps `MODEL` → `GADFLY_MODEL` for the Go binary. Consistent. - **`isClaudeCodeSpec` branching**: The `ccSpec` guard in `main.go:156` correctly prevents the `eng.(*majordomoEngine).mdl` type assertion (line 187) from ever executing for a claude-code engine — that code path only runs when `ccSpec` is false. - **CLI argv construction** (`args()` method): Arguments are passed via `exec.Command` (not a shell), so no injection or quoting issues. The order and flags are syntactically correct. - **`claudeResult` JSON parsing**: The struct tags (`result`, `is_error`, `subtype`) match the documented Claude Code JSON output format. The fallback logic (try JSON → try raw stdout → error) degrades gracefully. - **`truncateForErr`** uses `len()` (bytes) rather than `rune` count, so it could split a multi-byte UTF-8 character at position 800. Impact: a garbled byte in an error message trailer. Not material. - **`strings.Fields` on `GADFLY_CLAUDE_EXTRA_ARGS`**: Splits on all whitespace, which is correct for space-separated flags like `--max-turns 30`. Arguments containing literal spaces can't be represented, but that's an acceptable limitation for the documented use. - **`provider_of("claude-code/sonnet")`** in entrypoint.sh returns `"claude-code"`, grouping all claude-code specs into one provider lane — correct behavior. No logic bugs, domain errors, or semantic correctness issues found. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> VERDICT: Minor issues - **`cmd/gadfly/engine.go:150-156` — `truncateForErr` truncates on byte boundary, can split a multi-byte UTF-8 character.** `len(s) > max` and `s[:max]` operate on bytes. If the 800th byte falls mid-rune, the returned string is invalid UTF-8. It is then embedded in an `fmt.Errorf` message that ultimately appears in a review comment. Low-severity (cosmetic, error path only), but straightforwardly fixable by truncating on a rune boundary (e.g., loop backward from `max` to find a valid boundary, or use `utf8.RuneStart`). - **`cmd/gadfly/engine.go:113-146` — Timeout-killed CLI process always surfaces as error, never as partial output.** When `context.WithTimeout` expires, `exec.CommandContext` sends SIGKILL; the child's stdout is typically incomplete JSON. The JSON parse then fails, `runErr` is non-nil, and the function returns a `fmt.Errorf`. By contrast, the majordomo engine's `runAgent` can return a partial answer on step-budget exhaustion via its wrap-up steer and finalization fallback. A slow `claude-code` lens that times out mid-write always produces the ⚠️ error notice rather than potentially usable partial text. Not a bug — truncated JSON can't be parsed — but a behavioral gap on the unhappy path worth noting. - **`cmd/gadfly/recheck.go:27,87` & `cmd/gadfly/main.go:434,444` — Prompts reference `get_diff` / `read_file` / `grep` / `find_files` tools that don't exist in the Claude Code engine.** When the claude-code engine is active, the system prompt (`scripts/system-prompt.txt`), the review task (`buildTask`), and the recheck prompt (`recheckSystemPrompt` + `buildRecheckTask`) all instruct the model to call `read_file`, `grep`, `find_files`, and `get_diff` — which are the majordomo agent's tools, not Claude Code's (Read, Grep, Glob). The most impactful case is `get_diff`: the diff is truncated in the prompt and the model is told to "call get_diff for the full text," but no such tool exists for Claude Code. The model can still use its own tools to verify findings against the repo, so this doesn't cause a crash — but the instruction to call a nonexistent tool is misleading. Low-impact but an edge case where prompts are engine-specific yet delivered unconditionally. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 38s</sub>

🪰 Gadfly review — kimi-k2.7-code:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found.

Security-relevant points checked:

  • Shell-out is direct, not via sh -c: cmd/gadfly/engine.go:114 uses exec.CommandContext(ctx, e.bin, e.args(system, task)...), so GADFLY_CLAUDE_EXTRA_ARGS (split with strings.Fields) and the prompts become separate argv entries; no shell metacharacter injection is possible.
  • Credential inheritance is intentional and limited: cmd/gadfly/engine.go:116 sets cmd.Env = os.Environ() only to pass CLAUDE_CODE_OAUTH_TOKEN / ANTHROPIC_API_KEY to the CLI, matching how scripts/run.sh already passes through provider keys. No code logs, prints, or persists the token.
  • Working directory is the sandbox root: cmd/gadfly/main.go:159 sets the CLI cwd to fsTools.root, which newRepoFS resolves to an absolute, symlink-clean path (cmd/gadfly/tools.go:45-64), so the CLI's tools operate inside the intended checkout.
  • Read-only default is enforced via --permission-mode plan: cmd/gadfly/engine.go:82 and engine.go:98.
  • No unsafe deserialization: the only JSON parse (cmd/gadfly/engine.go:125) unmarshals into a plain struct and never acts on the result as code.

The trade-off that the in-process sandbox is replaced by the CLI's own file access is documented honestly in the README and PR description, and is not a hidden security regression.

🎯 Correctness — Minor issues

Minor issues

  • cmd/gadfly/main.go:187eng.(*majordomoEngine).mdl breaks the reviewEngine abstraction. It is safe today because the surrounding else branch always assigns a *majordomoEngine, but the cast is a latent panic if that branch ever changes and is brittle for testing/mocking. Better to keep the resolved llm.Model in a local variable before wrapping it, or add a small getter.

  • cmd/gadfly/engine.go:84 / README.md:105GADFLY_CLAUDE_EXTRA_ARGS is parsed with strings.Fields, which splits on any whitespace and discards quotes. That contradicts the README's "appended verbatim" wording and makes it impossible to pass a value containing spaces (e.g. --some-flag "two words" becomes three argv entries, not two). Either document the whitespace-split behavior explicitly, or use a shell-style tokenizer so quoted values work.

🧯 Error handling & edge cases — Minor issues

Minor issues

  • cmd/gadfly/engine.go:84GADFLY_CLAUDE_EXTRA_ARGS is split with strings.Fields, which silently mishandles quoted arguments (e.g. --some-flag "value with spaces" becomes two tokens with the quotes still attached). The README says extra args are “appended verbatim”, but that is not what strings.Fields does. Use a small shell-style splitter or document the limitation.

  • cmd/gadfly/engine.go:153truncateForErr caps by byte length (len(s) > 800), not rune length, so a multi-byte error message can be truncated mid-rune. Use utf8.RuneCountInString and slice by runes. (minor cosmetic risk in log output)

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 8s

<!-- gadfly-review:ollama:kimi-k2.7-code:cloud --> ### 🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> No material issues found. Security-relevant points checked: - **Shell-out is direct, not via `sh -c`**: `cmd/gadfly/engine.go:114` uses `exec.CommandContext(ctx, e.bin, e.args(system, task)...)`, so `GADFLY_CLAUDE_EXTRA_ARGS` (split with `strings.Fields`) and the prompts become separate argv entries; no shell metacharacter injection is possible. - **Credential inheritance is intentional and limited**: `cmd/gadfly/engine.go:116` sets `cmd.Env = os.Environ()` only to pass `CLAUDE_CODE_OAUTH_TOKEN` / `ANTHROPIC_API_KEY` to the CLI, matching how `scripts/run.sh` already passes through provider keys. No code logs, prints, or persists the token. - **Working directory is the sandbox root**: `cmd/gadfly/main.go:159` sets the CLI cwd to `fsTools.root`, which `newRepoFS` resolves to an absolute, symlink-clean path (`cmd/gadfly/tools.go:45-64`), so the CLI's tools operate inside the intended checkout. - **Read-only default is enforced via `--permission-mode plan`**: `cmd/gadfly/engine.go:82` and `engine.go:98`. - **No unsafe deserialization**: the only JSON parse (`cmd/gadfly/engine.go:125`) unmarshals into a plain struct and never acts on the result as code. The trade-off that the in-process sandbox is replaced by the CLI's own file access is documented honestly in the README and PR description, and is not a hidden security regression. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> **Minor issues** - `cmd/gadfly/main.go:187` — `eng.(*majordomoEngine).mdl` breaks the `reviewEngine` abstraction. It is safe *today* because the surrounding `else` branch always assigns a `*majordomoEngine`, but the cast is a latent panic if that branch ever changes and is brittle for testing/mocking. Better to keep the resolved `llm.Model` in a local variable before wrapping it, or add a small getter. - `cmd/gadfly/engine.go:84` / `README.md:105` — `GADFLY_CLAUDE_EXTRA_ARGS` is parsed with `strings.Fields`, which splits on any whitespace and discards quotes. That contradicts the README's "appended verbatim" wording and makes it impossible to pass a value containing spaces (e.g. `--some-flag "two words"` becomes three argv entries, not two). Either document the whitespace-split behavior explicitly, or use a shell-style tokenizer so quoted values work. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> **Minor issues** - `cmd/gadfly/engine.go:84` — `GADFLY_CLAUDE_EXTRA_ARGS` is split with `strings.Fields`, which silently mishandles quoted arguments (e.g. `--some-flag "value with spaces"` becomes two tokens with the quotes still attached). The README says extra args are “appended verbatim”, but that is not what `strings.Fields` does. Use a small shell-style splitter or document the limitation. - `cmd/gadfly/engine.go:153` — `truncateForErr` caps by byte length (`len(s) > 800`), not rune length, so a multi-byte error message can be truncated mid-rune. Use `utf8.RuneCountInString` and slice by runes. *(minor cosmetic risk in log output)* </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 8s</sub>

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

The security posture of this change is sound. I traced the key data flows and verified against the actual code:

  • Command construction (engine.go:92-103, engine.go:113-114): exec.CommandContext is used, not a shell. The task (which contains untrusted PR title/body/diff) and system prompt are passed as direct argv elements — no shell injection vector. Verified by reading engine.go:92-103 and main.go:424-447 (buildTask).

  • os.Environ() inheritance (engine.go:116): Passes the full environment to the claude child process. This is standard Go practice and the claude CLI is designed to read only CLAUDE_CODE_OAUTH_TOKEN / ANTHROPIC_API_KEY from it. The parent process already holds the same secrets in its own environment — passing them to a child on the same machine doesn't cross a trust boundary. Not a material finding.

  • Read-only default (engine.go:82, engine.go:97-98): --permission-mode plan is the hardcoded default, enforced in args(). Even if the LLM were prompt-injected by a malicious diff, the CLI's permission layer (not the model) blocks modifications. Verified.

  • --allowedTools and --extraArgs (engine.go:83-84, engine.go:100-103): Both are sourced from admin-controlled env vars (GADFLY_CLAUDE_*), not from PR content. No untrusted input reaches these flags.

  • JSON deserialization (engine.go:124-125): claudeResult only has string/bool fields. Go's encoding/json is memory-safe. The input is from a locally-executed trusted binary's stdout. No unsafe deserialization risk.

  • Pre-existing .git/config token exposure (entrypoint.sh:126): The clone URL embeds GITEA_TOKEN, which lands in .git/config. Both the majordomo read_file tool and the claude CLI's Read tool can access it. This is not introduced by this PR and is not made worse by it — the majordomo path already has the same exposure. Worth noting but outside the scope of this review.

  • No new credential leakage paths: Error messages are truncated to 800 chars (engine.go:150-157). The review output pipeline (verdict parse → recheck → consolidate → emit) is unchanged and doesn't introduce new channels for secret exfiltration.

🎯 Correctness — Minor issues

Both findings are confirmed against the actual code. Let me verify the exact line numbers and logic flow:

Finding 1 (lines 125-145): The control flow is exactly as described. When JSON parses successfully (line 125), res.IsError is false (line 126), and res.Result is empty (line 129), the code falls through both the error check at line 134 (runErr is nil) and lands at line 143, which returns the raw JSON string as the review text. Confirmed.

Finding 2 (line 84): strings.Fields is the sole parsing mechanism for GADFLY_CLAUDE_EXTRA_ARGS, and it cannot represent arguments containing spaces. Confirmed.

Both are real, minor issues. The verdict stands.


Verdict: Minor issues

  • cmd/gadfly/engine.go:125-145runPass returns raw JSON as review text when Result is empty but JSON parses successfully. If the CLI exits 0 and emits {"result":"","is_error":false}, the JSON unmarshals cleanly, res.Result is empty, runErr is nil, and the code falls through to the raw-stdout fallback at line 143 — which returns the JSON string itself as the "review." The consolidated comment would then contain a raw JSON blob instead of a meaningful review. The fix is to treat an empty Result in a successfully-parsed JSON response as an error (or at minimum not fall back to raw JSON):

    if jerr := json.Unmarshal(bytes.TrimSpace(stdout.Bytes()), &res); jerr == nil {
        if res.IsError {
            return "", fmt.Errorf("claude reported error (%s): %s", res.Subtype, truncateForErr(res.Result))
        }
        if out := strings.TrimSpace(res.Result); out != "" {
            return out, nil
        }
        // JSON parsed but result is empty — don't fall through to raw JSON.
        return "", fmt.Errorf("claude -p returned empty result")
    }
    

    This is an edge case (the CLI shouldn't normally produce empty results), but the current fallback would silently emit garbage.

  • cmd/gadfly/engine.go:84strings.Fields on GADFLY_CLAUDE_EXTRA_ARGS cannot represent arguments containing spaces. If a user needs to pass something like --system-prompt-file /path/with spaces/prompt.txt, strings.Fields would split it into three arguments. This is a limitation rather than a bug per se, but it means the _EXTRA_ARGS knob can't express all valid CLI invocations. A shell-compatible split (e.g., shlex or a CSV approach) would be more robust.

🧯 Error handling & edge cases — Minor issues

All three findings are confirmed against the actual code. The verdict remains Minor issues.


Verdict: Minor issues

  • cmd/gadfly/engine.go:154truncateForErr can split a multi-byte UTF-8 character (verified by reading the function). s[:max] uses byte-index slicing; if max falls mid-codepoint the error string will contain an invalid UTF-8 sequence. In practice CLI error output is almost certainly ASCII, so this is cosmetic — but it's a latent bug. Fix: use string([]rune(s)[:max]) or iterate with for i := range s to find a safe cut point.

  • cmd/gadfly/main.go:310–312 — shared timeout context starves the recheck pass (verified by reading reviewWithSpecialist). A single context.WithTimeout is created and used for both the draft runPass and the recheck runPass. For the majordomo engine this is mitigated by the internal step budget, but the claude-code engine is an external process with no step cap — the timeout is its only constraint. If the draft CLI call consumes, say, 280 of 300 seconds, the recheck gets ≤20 seconds and is very likely to fail (gracefully — it falls back to the draft). This is pre-existing architecture, but the claude-code engine makes it materially worse because there is no intermediate step-budget backstop. Consider splitting the timeout or giving the recheck its own deadline.

  • cmd/gadfly/engine.go:134–139runErr detail can be empty when stderr and stdout are both blank (verified by reading runPass). If claude exits non-zero with no output at all (e.g. immediate crash before writing anything), the error message becomes "claude -p failed: <runErr>: " with a trailing ": ". Harmless but sloppy. A if detail != "" { ... } guard or strings.TrimRight on the format string would clean it up.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 52s

<!-- gadfly-review:ollama:deepseek-v4-pro:cloud --> ### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## Verdict: No material issues found The security posture of this change is sound. I traced the key data flows and verified against the actual code: - **Command construction** (`engine.go:92-103`, `engine.go:113-114`): `exec.CommandContext` is used, not a shell. The `task` (which contains untrusted PR title/body/diff) and `system` prompt are passed as direct `argv` elements — no shell injection vector. Verified by reading `engine.go:92-103` and `main.go:424-447` (buildTask). - **`os.Environ()` inheritance** (`engine.go:116`): Passes the full environment to the `claude` child process. This is standard Go practice and the `claude` CLI is designed to read only `CLAUDE_CODE_OAUTH_TOKEN` / `ANTHROPIC_API_KEY` from it. The parent process already holds the same secrets in its own environment — passing them to a child on the same machine doesn't cross a trust boundary. Not a material finding. - **Read-only default** (`engine.go:82`, `engine.go:97-98`): `--permission-mode plan` is the hardcoded default, enforced in `args()`. Even if the LLM were prompt-injected by a malicious diff, the CLI's permission layer (not the model) blocks modifications. Verified. - **`--allowedTools` and `--extraArgs`** (`engine.go:83-84`, `engine.go:100-103`): Both are sourced from admin-controlled env vars (`GADFLY_CLAUDE_*`), not from PR content. No untrusted input reaches these flags. - **JSON deserialization** (`engine.go:124-125`): `claudeResult` only has `string`/`bool` fields. Go's `encoding/json` is memory-safe. The input is from a locally-executed trusted binary's stdout. No unsafe deserialization risk. - **Pre-existing `.git/config` token exposure** (`entrypoint.sh:126`): The clone URL embeds `GITEA_TOKEN`, which lands in `.git/config`. Both the majordomo `read_file` tool and the `claude` CLI's Read tool can access it. This is not introduced by this PR and is not made worse by it — the majordomo path already has the same exposure. Worth noting but outside the scope of this review. - **No new credential leakage paths**: Error messages are truncated to 800 chars (`engine.go:150-157`). The review output pipeline (verdict parse → recheck → consolidate → emit) is unchanged and doesn't introduce new channels for secret exfiltration. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> Both findings are confirmed against the actual code. Let me verify the exact line numbers and logic flow: **Finding 1 (lines 125-145):** The control flow is exactly as described. When JSON parses successfully (line 125), `res.IsError` is false (line 126), and `res.Result` is empty (line 129), the code falls through both the error check at line 134 (`runErr` is nil) and lands at line 143, which returns the raw JSON string as the review text. Confirmed. **Finding 2 (line 84):** `strings.Fields` is the sole parsing mechanism for `GADFLY_CLAUDE_EXTRA_ARGS`, and it cannot represent arguments containing spaces. Confirmed. Both are real, minor issues. The verdict stands. --- ## Verdict: Minor issues - **`cmd/gadfly/engine.go:125-145` — `runPass` returns raw JSON as review text when `Result` is empty but JSON parses successfully.** If the CLI exits 0 and emits `{"result":"","is_error":false}`, the JSON unmarshals cleanly, `res.Result` is empty, `runErr` is nil, and the code falls through to the raw-stdout fallback at line 143 — which returns the JSON string itself as the "review." The consolidated comment would then contain a raw JSON blob instead of a meaningful review. The fix is to treat an empty `Result` in a successfully-parsed JSON response as an error (or at minimum not fall back to raw JSON): ```go if jerr := json.Unmarshal(bytes.TrimSpace(stdout.Bytes()), &res); jerr == nil { if res.IsError { return "", fmt.Errorf("claude reported error (%s): %s", res.Subtype, truncateForErr(res.Result)) } if out := strings.TrimSpace(res.Result); out != "" { return out, nil } // JSON parsed but result is empty — don't fall through to raw JSON. return "", fmt.Errorf("claude -p returned empty result") } ``` This is an edge case (the CLI shouldn't normally produce empty results), but the current fallback would silently emit garbage. - **`cmd/gadfly/engine.go:84` — `strings.Fields` on `GADFLY_CLAUDE_EXTRA_ARGS` cannot represent arguments containing spaces.** If a user needs to pass something like `--system-prompt-file /path/with spaces/prompt.txt`, `strings.Fields` would split it into three arguments. This is a limitation rather than a bug per se, but it means the `_EXTRA_ARGS` knob can't express all valid CLI invocations. A shell-compatible split (e.g., `shlex` or a CSV approach) would be more robust. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> All three findings are confirmed against the actual code. The verdict remains **Minor issues**. --- ## Verdict: Minor issues - **`cmd/gadfly/engine.go:154` — `truncateForErr` can split a multi-byte UTF-8 character** (verified by reading the function). `s[:max]` uses byte-index slicing; if `max` falls mid-codepoint the error string will contain an invalid UTF-8 sequence. In practice CLI error output is almost certainly ASCII, so this is cosmetic — but it's a latent bug. Fix: use `string([]rune(s)[:max])` or iterate with `for i := range s` to find a safe cut point. - **`cmd/gadfly/main.go:310–312` — shared timeout context starves the recheck pass** (verified by reading `reviewWithSpecialist`). A single `context.WithTimeout` is created and used for *both* the draft `runPass` and the recheck `runPass`. For the majordomo engine this is mitigated by the internal step budget, but the claude-code engine is an external process with no step cap — the timeout is its *only* constraint. If the draft CLI call consumes, say, 280 of 300 seconds, the recheck gets ≤20 seconds and is very likely to fail (gracefully — it falls back to the draft). This is pre-existing architecture, but the claude-code engine makes it materially worse because there is no intermediate step-budget backstop. Consider splitting the timeout or giving the recheck its own deadline. - **`cmd/gadfly/engine.go:134–139` — `runErr` detail can be empty when stderr and stdout are both blank** (verified by reading `runPass`). If `claude` exits non-zero with no output at all (e.g. immediate crash before writing anything), the error message becomes `"claude -p failed: <runErr>: "` with a trailing `": "`. Harmless but sloppy. A `if detail != "" { ... }` guard or `strings.TrimRight` on the format string would clean it up. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 52s</sub>

🪰 Gadfly review — nemotron-3-super:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found


Note: After thorough review through the security lens, no material security issues were identified in the changes. The introduction of the Claude Code engine properly isolates untrusted input (prompts/diffs) as arguments to a trusted binary, uses a read-only permission mode by default, and inherits environment variables from a trusted workflow context. All potential injection vectors are mitigated by using exec.CommandContext with argument arrays rather than shell execution. The environment variable handling assumes trusted workflow configuration, which aligns with the project's threat model. The compliance note about Anthropic's terms is noted but falls outside the strict security lens scope.

🎯 Correctness — No material issues found

No material issues found

🧯 Error handling & edge cases — Minor issues

Review Summary: Error Handling & Edge Cases

Verdict: Minor issues

Findings

  • cmd/gadfly/engine.go:113-146 - In claudeCodeEngine.runPass, when the Claude Code CLI returns a successful JSON response (IsError=false) with an empty or whitespace-only result field, the function incorrectly falls back to returning the raw stdout (the entire JSON) or an error if stdout is empty. This violates the contract of returning the model's actual output. For example, if the CLI returns {"result": " ", "is_error": false}, the function would return the raw JSON string instead of an empty string. This could cause downstream processing (like verdict parsing) to receive malformed input.
    Fix: After confirming !res.IsError, return res.Result directly without checking for emptiness, matching the Majordomo engine's behavior of returning the model's output verbatim.
    Verified by reading the function in cmd/gadfly/engine.go.

No other error handling or edge case issues were found within the scope of this lens. The context propagation, error handling for command execution, and JSON fallback logic are otherwise sound.

-- No material issues found in other files reviewed (main.go changes, etc.) for this lens.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 28s

<!-- gadfly-review:ollama:nemotron-3-super:cloud --> ### 🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> No material issues found --- **Note**: After thorough review through the security lens, no material security issues were identified in the changes. The introduction of the Claude Code engine properly isolates untrusted input (prompts/diffs) as arguments to a trusted binary, uses a read-only permission mode by default, and inherits environment variables from a trusted workflow context. All potential injection vectors are mitigated by using `exec.CommandContext` with argument arrays rather than shell execution. The environment variable handling assumes trusted workflow configuration, which aligns with the project's threat model. The compliance note about Anthropic's terms is noted but falls outside the strict security lens scope. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> No material issues found </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> ## Review Summary: Error Handling & Edge Cases ### Verdict: Minor issues ### Findings - **cmd/gadfly/engine.go:113-146** - In `claudeCodeEngine.runPass`, when the Claude Code CLI returns a successful JSON response (`IsError=false`) with an empty or whitespace-only `result` field, the function incorrectly falls back to returning the raw stdout (the entire JSON) or an error if stdout is empty. This violates the contract of returning the model's actual output. For example, if the CLI returns `{"result": " ", "is_error": false}`, the function would return the raw JSON string instead of an empty string. This could cause downstream processing (like verdict parsing) to receive malformed input. **Fix**: After confirming `!res.IsError`, return `res.Result` directly without checking for emptiness, matching the Majordomo engine's behavior of returning the model's output verbatim. *Verified by reading the function in `cmd/gadfly/engine.go`.* No other error handling or edge case issues were found within the scope of this lens. The context propagation, error handling for command execution, and JSON fallback logic are otherwise sound. -- No material issues found in other files reviewed (main.go changes, etc.) for this lens. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 28s</sub>

🪰 Gadfly review — gpt-oss:120b-cloud (ollama-cloud)

Verdict: No material issues found · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

🎯 Correctness⚠️ could not complete

⚠️ This reviewer failed to complete: agent: tool-error guard tripped: 4 consecutive steps with only failing tool calls

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

  • The new claudeCodeEngine correctly checks and propagates errors from the CLI execution, JSON unmarshalling, and result validation; no errors are silently ignored.
  • All added resources (exec.Cmd, buffers) are scoped to the function; there are no missing defer statements or resource leaks.
  • Context cancellation, empty or nil inputs, and malformed CLI output are all handled with explicit error returns.
  • Modifications in main.go and runSpecialists preserve existing error handling pathways; failures from the new engine are reported via the same errored flag used elsewhere.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 45s

<!-- gadfly-review:ollama:gpt-oss:120b-cloud --> ### 🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud) **Verdict: No material issues found · ⚠️ 1/3 lens(es) errored** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> No material issues found </details> <details><summary><b>🎯 Correctness</b> — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: tool-error guard tripped: 4 consecutive steps with only failing tool calls </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> **Verdict:** No material issues found - The new `claudeCodeEngine` correctly checks and propagates errors from the CLI execution, JSON unmarshalling, and result validation; no errors are silently ignored. - All added resources (`exec.Cmd`, buffers) are scoped to the function; there are no missing `defer` statements or resource leaks. - Context cancellation, empty or nil inputs, and malformed CLI output are all handled with explicit error returns. - Modifications in `main.go` and `runSpecialists` preserve existing error handling pathways; failures from the new engine are reported via the same `errored` flag used elsewhere. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 45s</sub>

🪰 Gadfly review — qwen3-coder:480b-cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues

  • cmd/gadfly/engine.go:84 — The newClaudeCodeEngine function uses strings.Fields to parse GADFLY_CLAUDE_EXTRA_ARGS, which splits on any whitespace. This could be problematic if users intend to pass arguments with spaces.
  • cmd/gadfly/engine.go:116 — The claudeCodeEngine.runPass method inherits the entire environment via os.Environ(), which could inadvertently expose sensitive environment variables to the claude CLI.
  • examples/claude-code.yml:63 — The workflow example restricts triggering to specific users for security, but this might be too restrictive for some use cases.
🎯 Correctness — No material issues found

No material issues found.

The changes introduce a new claudeCodeEngine that shells out to the Claude Code CLI, alongside the existing majordomoEngine. The logic for selecting between these engines and handling their different behaviors (e.g., auto-selection being majordomo-only) appears sound and well-tested. The abstraction through the reviewEngine interface correctly allows both engines to be used interchangeably in the rest of the pipeline.

I verified key parts of the implementation:

  • The isClaudeCodeSpec function correctly identifies Claude Code model specifications.
  • The newClaudeCodeEngine function properly parses the model specification and environment variables.
  • The args method constructs the correct command-line arguments for the Claude CLI.
  • The runPass method handles execution and parsing of the CLI output appropriately.
  • The integration into main.go correctly selects the engine and adjusts behavior (like disabling auto-selection) when using Claude Code.

The unit tests cover the core logic of the new engine well. The changes are logically correct and maintain the existing behavior for the majordomo engine while adding the new functionality in a clean, modular way.

🧯 Error handling & edge cases — No material issues found

No material issues found.

The changes introduce a new claudeCodeEngine that shells out to the Claude Code CLI. The error handling and edge cases appear to be well-covered:

  1. The runPass method in claudeCodeEngine handles various error scenarios:

    • It checks for structured JSON output even when the command fails
    • Falls back to raw stdout if JSON parsing fails but output exists
    • Properly handles execution errors with stderr/stdout truncation
    • Uses exec.CommandContext for proper timeout handling
  2. Environment variable handling is robust:

    • Uses envOr helper for safe default values
    • Properly trims whitespace from environment variables
    • Handles optional flags (like --model and --allowedTools) by omitting them when empty
  3. The integration with the existing pipeline is clean:

    • reviewWithSpecialist now uses the engine abstraction properly
    • Context timeouts are preserved
    • Error handling maintains the existing behavior (advisory notices rather than fatal errors)
  4. Unit tests cover the key edge cases:

    • Model specification parsing
    • Environment variable overrides
    • Argument construction including omitted flags
    • Both bare and suffixed model specifications

No blocking issues were found in error handling or edge case management within the scope of this change.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 8s

<!-- gadfly-review:ollama:qwen3-coder:480b-cloud --> ### 🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> VERDICT: Minor issues - `cmd/gadfly/engine.go:84` — The `newClaudeCodeEngine` function uses `strings.Fields` to parse `GADFLY_CLAUDE_EXTRA_ARGS`, which splits on any whitespace. This could be problematic if users intend to pass arguments with spaces. - `cmd/gadfly/engine.go:116` — The `claudeCodeEngine.runPass` method inherits the entire environment via `os.Environ()`, which could inadvertently expose sensitive environment variables to the `claude` CLI. - `examples/claude-code.yml:63` — The workflow example restricts triggering to specific users for security, but this might be too restrictive for some use cases. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> No material issues found. The changes introduce a new `claudeCodeEngine` that shells out to the Claude Code CLI, alongside the existing `majordomoEngine`. The logic for selecting between these engines and handling their different behaviors (e.g., auto-selection being majordomo-only) appears sound and well-tested. The abstraction through the `reviewEngine` interface correctly allows both engines to be used interchangeably in the rest of the pipeline. I verified key parts of the implementation: - The `isClaudeCodeSpec` function correctly identifies Claude Code model specifications. - The `newClaudeCodeEngine` function properly parses the model specification and environment variables. - The `args` method constructs the correct command-line arguments for the Claude CLI. - The `runPass` method handles execution and parsing of the CLI output appropriately. - The integration into `main.go` correctly selects the engine and adjusts behavior (like disabling auto-selection) when using Claude Code. The unit tests cover the core logic of the new engine well. The changes are logically correct and maintain the existing behavior for the majordomo engine while adding the new functionality in a clean, modular way. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> No material issues found. The changes introduce a new `claudeCodeEngine` that shells out to the Claude Code CLI. The error handling and edge cases appear to be well-covered: 1. The `runPass` method in `claudeCodeEngine` handles various error scenarios: - It checks for structured JSON output even when the command fails - Falls back to raw stdout if JSON parsing fails but output exists - Properly handles execution errors with stderr/stdout truncation - Uses `exec.CommandContext` for proper timeout handling 2. Environment variable handling is robust: - Uses `envOr` helper for safe default values - Properly trims whitespace from environment variables - Handles optional flags (like `--model` and `--allowedTools`) by omitting them when empty 3. The integration with the existing pipeline is clean: - `reviewWithSpecialist` now uses the engine abstraction properly - Context timeouts are preserved - Error handling maintains the existing behavior (advisory notices rather than fatal errors) 4. Unit tests cover the key edge cases: - Model specification parsing - Environment variable overrides - Argument construction including omitted flags - Both bare and suffixed model specifications No blocking issues were found in error handling or edge case management within the scope of this change. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 8s</sub>

🪰 Gadfly review — gemma4:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

VERDICT: Blocking issues found

  • Command Injection via Environment Variables: cmd/gadfly/engine.go:84 and cmd/gadfly/engine.go:85 read GADFLY_CLAUDE_ALLOWED_TOOLS and GADFLY_CLAUDE_EXTRA_ARGS and pass them directly to exec.CommandContext. While exec.Command does not use a shell by default, strings.Fields on GADFLY_CLAUDE_EXTRA_ARGS allows a user to inject any arbitrary flag into the claude CLI. More critically, since these environment variables are often set in CI YAML files (which can be modified in PRs if the CI is misconfigured or if they are sourced from untrusted repo-level configs), this allows an attacker to pass flags like --permission-mode agent or other destructive flags to the CLI, bypassing the intended plan (read-only) restriction.

    • Fix: Implement a strict allow-list of permitted flags/arguments or forbid the use of GADFLY_CLAUDE_EXTRA_ARGS entirely in favor of a structured configuration.
  • Insecure Default/Override of Permission Mode: cmd/gadfly/engine.go:83 allows GADFLY_CLAUDE_PERMISSION_MODE to be overridden by an environment variable. The README claims plan is read-only, but by allowing this to be tuned via env vars, any user who can trigger the CI (or modify the workflow if permissions are loose) can switch the engine to a mode that allows the LLM to write to the filesystem.

    • Fix: Hardcode the plan mode or restrict overrides to a set of known-safe read-only modes.
  • Potential Token Leakage in Logs: cmd/gadfly/engine.go:134-139 captures stdout and stderr and includes them in the error return value, which is then printed to os.Stderr in main.go:317 and main.go:326. If the claude CLI outputs sensitive information (like the OAuth token or API key) during a failure or in a debug trace, those secrets will be leaked to the CI logs.

    • Fix: Implement a scrubbing mechanism to remove known secret values (from os.Environ()) from the CLI output before returning the error string.
🎯 Correctness — No material issues found

No material issues found

🧯 Error handling & edge cases — Blocking issues found

VERDICT: Blocking issues found

  • cmd/gadfly/main.go:156: The ccSpec check uses os.Getenv("GADFLY_MODEL") instead of the GADFLY_MODELS (plural) environment variable used by the rest of the application to resolve the model list.

    • Impact: Blocking. The claude-code engine cannot be triggered via the standard GADFLY_MODELS configuration variable, which is what the documentation and example YAMLs specify.
    • Fix: Ensure the engine selection logic uses the same model spec string that is passed to the specialist loop.
  • cmd/gadfly/engine.go:125: Potential for json.Unmarshal to fail silently on non-JSON output. While there is a fallback to raw stdout at line 143, the code attempts to Unmarshal the entire stdout buffer first. If the claude CLI outputs a mix of non-JSON warnings/logs followed by a JSON block (a common pattern for CLI tools), json.Unmarshal will fail, and the code will fall back to treating the entire buffer (including the JSON) as a raw string.

    • Impact: Degradation of structured error handling to raw text if the CLI is chatty.
    • Fix: Use a regex or a scanner to isolate the JSON block if it doesn't start at byte 0.
  • cmd/gadfly/engine.go:121: The cmd.Run() call is made, and then stdout is processed. However, if the command is killed by the ctx timeout (defined in reviewWithSpecialist), cmd.Run() returns an error. The current logic attempts to parse JSON first regardless of runErr. If the process was killed, stdout might be partial/malformed JSON, leading to jerr != nil and then falling through to the runErr != nil block. While functionally correct, the error message "claude -p failed: signal: killed" might mask the fact that a partial JSON result was actually produced.

    • Impact: Minor loss of diagnostic detail during timeouts.
    • Fix: Check runErr explicitly for context cancellation/timeout before falling back to raw stdout.
  • cmd/gadfly/engine.go:135: In the error path where runErr != nil, the code attempts to use stderr.String() then stdout.String(). Because stdout is also used for the JSON attempt, this is correct, but truncateForErr is applied. If the error is a massive stack trace or log dump, we only see the first 800 characters. Given this is an adversarial reviewer, the "unhappy path" of the engine itself should be as transparent as possible.

    • Impact: Minimal; just a limitation of the truncation logic.
    • Fix: (No change required, but noted).

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 25s

<!-- gadfly-review:ollama:gemma4:cloud --> ### 🪰 Gadfly review — `gemma4:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Blocking issues found</summary> VERDICT: Blocking issues found * **Command Injection via Environment Variables**: `cmd/gadfly/engine.go:84` and `cmd/gadfly/engine.go:85` read `GADFLY_CLAUDE_ALLOWED_TOOLS` and `GADFLY_CLAUDE_EXTRA_ARGS` and pass them directly to `exec.CommandContext`. While `exec.Command` does not use a shell by default, `strings.Fields` on `GADFLY_CLAUDE_EXTRA_ARGS` allows a user to inject any arbitrary flag into the `claude` CLI. More critically, since these environment variables are often set in CI YAML files (which can be modified in PRs if the CI is misconfigured or if they are sourced from untrusted repo-level configs), this allows an attacker to pass flags like `--permission-mode agent` or other destructive flags to the CLI, bypassing the intended `plan` (read-only) restriction. * **Fix**: Implement a strict allow-list of permitted flags/arguments or forbid the use of `GADFLY_CLAUDE_EXTRA_ARGS` entirely in favor of a structured configuration. * **Insecure Default/Override of Permission Mode**: `cmd/gadfly/engine.go:83` allows `GADFLY_CLAUDE_PERMISSION_MODE` to be overridden by an environment variable. The README claims `plan` is read-only, but by allowing this to be tuned via env vars, any user who can trigger the CI (or modify the workflow if permissions are loose) can switch the engine to a mode that allows the LLM to write to the filesystem. * **Fix**: Hardcode the `plan` mode or restrict overrides to a set of known-safe read-only modes. * **Potential Token Leakage in Logs**: `cmd/gadfly/engine.go:134-139` captures `stdout` and `stderr` and includes them in the error return value, which is then printed to `os.Stderr` in `main.go:317` and `main.go:326`. If the `claude` CLI outputs sensitive information (like the OAuth token or API key) during a failure or in a debug trace, those secrets will be leaked to the CI logs. * **Fix**: Implement a scrubbing mechanism to remove known secret values (from `os.Environ()`) from the CLI output before returning the error string. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> No material issues found </details> <details><summary><b>🧯 Error handling & edge cases</b> — Blocking issues found</summary> VERDICT: Blocking issues found * `cmd/gadfly/main.go:156`: The `ccSpec` check uses `os.Getenv("GADFLY_MODEL")` instead of the `GADFLY_MODELS` (plural) environment variable used by the rest of the application to resolve the model list. - **Impact**: **Blocking**. The `claude-code` engine cannot be triggered via the standard `GADFLY_MODELS` configuration variable, which is what the documentation and example YAMLs specify. - **Fix**: Ensure the engine selection logic uses the same model spec string that is passed to the specialist loop. * `cmd/gadfly/engine.go:125`: Potential for `json.Unmarshal` to fail silently on non-JSON output. While there is a fallback to raw stdout at line 143, the code attempts to `Unmarshal` the entire stdout buffer first. If the `claude` CLI outputs a mix of non-JSON warnings/logs followed by a JSON block (a common pattern for CLI tools), `json.Unmarshal` will fail, and the code will fall back to treating the entire buffer (including the JSON) as a raw string. - **Impact**: Degradation of structured error handling to raw text if the CLI is chatty. - **Fix**: Use a regex or a scanner to isolate the JSON block if it doesn't start at byte 0. * `cmd/gadfly/engine.go:121`: The `cmd.Run()` call is made, and then stdout is processed. However, if the command is killed by the `ctx` timeout (defined in `reviewWithSpecialist`), `cmd.Run()` returns an error. The current logic attempts to parse JSON *first* regardless of `runErr`. If the process was killed, `stdout` might be partial/malformed JSON, leading to `jerr != nil` and then falling through to the `runErr != nil` block. While functionally correct, the error message "claude -p failed: signal: killed" might mask the fact that a partial JSON result was actually produced. - **Impact**: Minor loss of diagnostic detail during timeouts. - **Fix**: Check `runErr` explicitly for context cancellation/timeout before falling back to raw stdout. * `cmd/gadfly/engine.go:135`: In the error path where `runErr != nil`, the code attempts to use `stderr.String()` then `stdout.String()`. Because `stdout` is also used for the JSON attempt, this is correct, but `truncateForErr` is applied. If the error is a massive stack trace or log dump, we only see the first 800 characters. Given this is an adversarial reviewer, the "unhappy path" of the engine itself should be as transparent as possible. - **Impact**: Minimal; just a limitation of the truncation logic. - **Fix**: (No change required, but noted). </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 25s</sub>
steve added 1 commit 2026-06-27 20:39:52 +00:00
fix: fold in claude-code-engine review findings
Build & push image / build-and-push (pull_request) Successful in 14s
2ca1ce0b6b
The dogfood swarm reviewed PR #2 (9 cloud reviewers; m5 wedged by a host
reboot and skipped this once). 66 findings graded via the gadfly MCP
(~half real, half false-positive/clean-verification). Folding in the
warranted ones:

- engine.go: claudeEnv() builds a MINIMAL subprocess environment (auth
  token + PATH/HOME/locale/GADFLY_CLAUDE_*), no longer handing GITEA_TOKEN
  and provider keys to the third-party CLI (4-model consensus).
- engine.go: runPass rewrite — check ctx.Err() first (don't emit a
  review from a timed-out run), treat an empty parsed result as an error
  instead of returning the raw JSON envelope, only trust a JSON answer on
  a clean exit, and drop the dangling ": " when there's no error detail.
- engine.go: put the CLI in its own process group (Setpgid) and SIGKILL
  the whole group on cancel, so a timed-out lens can't orphan node procs.
- engine.go: rune-safe truncateForErr.
- prompts: genericized the tool-name hints in buildTask + recheck so the
  claude-code engine isn't told to call majordomo-only tools (read_file/
  get_diff); also dropped the mort-specific framing from the recheck
  prompt (it must stay generic per CLAUDE.md).
- README: documented that GADFLY_CLAUDE_EXTRA_ARGS is whitespace-split
  and can override the read-only default, and that the subprocess gets a
  minimal env.

Left as-is (graded, noted in finding notes): operator-knob override of
read-only (intentional escape hatch), shared per-lens timeout (by design),
GADFLY_CLAUDE_BIN trust (operator-controlled, like GADFLY_BIN).

New tests: claudeEnv filtering, rune-safe truncation, and runPass paths
(clean / empty-result / is_error / non-zero) via a stub binary. gofmt
clean, go vet quiet, go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve merged commit 86f12c126f into main 2026-06-27 20:40:41 +00:00
steve deleted branch feat/claude-code-engine 2026-06-27 20:40:41 +00:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/gadfly#2