feat: claude-code reviewer engine (per-lens claude -p shell-out)
#2
Reference in New Issue
Block a user
Delete Branch "feat/claude-code-engine"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Phase 1 of the gadfly-games build.
Adds a second review engine alongside the majordomo agent loop. For each lens it shells out to the Claude Code CLI (
claude -p) inside the checked-out repo, so Claude Code verifies findings with its own read tools — then gadfly runs the same verdict-parse → recheck → consolidate → emit pipeline, unchanged.Design
cmd/gadfly/engine.go— newreviewEngineinterface with two impls:majordomoEnginewraps the existingrunAgentpath (no behaviour change).claudeCodeEngineexecsclaude -p … --output-format jsonand parses.result.main.go'srunSpecialists/reviewWithSpecialistare now engine-agnostic; the per-lens recheck reuses the samerunPassprimitive.claude-code(CLI default model) orclaude-code/<model>(suffix →--model, e.g.claude-code/sonnet).CLAUDE_CODE_OAUTH_TOKEN(no--bare), elseANTHROPIC_API_KEY. Read-only by default (--permission-mode plan); tunable viaGADFLY_CLAUDE_PERMISSION_MODE/_ALLOWED_TOOLS/_EXTRA_ARGS/_MODEL/_BIN.autoselection + thedelegate_investigationworker are majordomo-only and are skipped with this engine.@anthropic-ai/claude-code(notably larger image; ollama-only users pay only the size).Also
:sha-c3d09d3so gadfly's own PRs now review with the live status board shipped in Phase 3 — this PR is the first to show the board.Honesty
The CLI invocation is covered by unit tests, but a live subscription-auth run hasn't been validated end-to-end here, and subscription-auth-in-CI is a gray area in Anthropic's terms — flagged in the README and example. Phase 2 will add the pluggable backends (Ollama-through-the-harness, etc.) and actually exercise it.
Tests
engine_test.gocovers spec detection,--modelderivation, and argv building (no live CLI call).gofmtclean ·go vetquiet ·go buildok ·go test -race ./...green · shellsbash -nclean.🤖 Generated with Claude Code
claude -pshell-out)🪰 Gadfly — live review status
9/10 reviewers finished · updated 2026-06-27 19:59:36Z
deepseek-v4-pro:cloud· ollama-cloud — ✅ donegemma4:cloud· ollama-cloud — ✅ doneglm-5.1:cloud· ollama-cloud — ✅ doneglm-5.2:cloud· ollama-cloud — ✅ donegpt-oss:120b-cloud· ollama-cloud — ✅ donekimi-k2.7-code:cloud· ollama-cloud — ✅ donem5/qwen3.6:35b-mlx· m5 — ⏳ 2/3 lensesminimax-m3:cloud· ollama-cloud — ✅ donenemotron-3-super:cloud· ollama-cloud — ✅ doneqwen3-coder:480b-cloud· ollama-cloud — ✅ doneLive status board. Findings are posted in each model's own comment. Advisory only — does not block merge.
🪰 Gadfly review —
glm-5.2:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
VERDICT: Minor issues
Through the security lens, the new
claude-codeengine is a command-injection surface and a privilege boundary. I verified the relevant paths against the actual code; the design is mostly sound, but two things deserve a fix.cmd/gadfly/engine.go:82,92-103—GADFLY_CLAUDE_*values are forwarded to the CLI with no validation, so the "read-only by default" guarantee is not enforced at the code level. The argv is built viaexec.CommandContext(ctx, e.bin, e.args(...)...)(engine.go:114), which does not go through a shell, so there's no classic shell injection — good. Bute.permissionMode = envOr("GADFLY_CLAUDE_PERMISSION_MODE", "plan")(line 82) is appended verbatim as--permission-mode(lines 97-98),GADFLY_CLAUDE_ALLOWED_TOOLSis forwarded as--allowedTools(lines 100-101), ande.extraArgs = strings.Fields(os.Getenv("GADFLY_CLAUDE_EXTRA_ARGS"))is appended verbatim (line 103).planis only the default, not enforced: an operator (or a compromised/mis-set env) can flip the CLI into a write-capable mode (GADFLY_CLAUDE_PERMISSION_MODE=acceptEdits, or--allowedTools Write,Bash,Edit). In the threat model these are operator-set env vars (not PR-author-controlled), so this is low severity, but the README's "read-only by default" premise is not enforced at the code level. Suggested fix: validatepermissionModeagainst an allowlist (plan, and optionallyview/bypassPermissionswith a warning), and warn/reject when a non-read-only mode or an--allowedToolscontainingWrite/Bash/Editis selected.cmd/gadfly/engine.go:116— full process env (os.Environ()) is inherited by theclaudesubprocess, includingGITEA_TOKEN. Verified:entrypoint.sh:60-61requiresGITEA_TOKEN, andscripts/run.sh:164-174invokes the gadfly binary inheriting the script's environment (which containsGITEA_TOKEN,GADFLY_FINDINGS_TOKEN,OLLAMA_API_KEY, etc.). Theclaude -pCLI then runs with all of those visible in its environment, and the CLI is an LLM agent withRead/Grep/Globtools pointed at the checked-out repo tree (cmd.Dir = e.repoDir, line 115). A model prompted via attacker-influenced diff content to "verify findings against real code" can be induced to read.envfiles or any file in the tree, and the model's own tool output flows back into the PR comment viarenderConsolidated(main.go:210). This is a credential-leakage path the majordomo path does not have —repoFStools (tools.go) are sandboxed to the repo directory and do not expose process env. There is no env allowlist atengine.go:116. The 800-chartruncateForErr(line 150) only caps error detail, not successfulres.Resultoutput, so it is not a sufficient mitigation. Suggested fix: build an explicit minimal env for the subprocess (onlyCLAUDE_CODE_OAUTH_TOKEN/ANTHROPIC_API_KEY+ neededHOME/PATH/GADFLY_CLAUDE_*), rather thanos.Environ(), so the Gitea token and other provider keys can't be exfiltrated via the agent's read tools or surfaced in output.No other material security issues in this lens. The
exec.CommandContextuse avoids shell injection;cmd.Diris set to the checked-out repo (operator-controlled); andisClaudeCodeSpec(engine.go:64-67) is a plain equality/prefix check with no globbing, so no path-injection there.(Both findings above were verified by reading
engine.go,entrypoint.sh,scripts/run.sh,tools.go, andmain.go.)🎯 Correctness — Minor issues
Both findings confirmed against the actual code.
VERDICT: Minor issues found
cmd/gadfly/main.go:434&cmd/gadfly/recheck.go:87— The shared task/recheck prompts name majordomo-specific tools that the claude-code engine does not have. I verifiedbuildTask(line 444) instructs the reviewer to "use your tools (read_file, grep, find_files)", and bothbuildTask(line 434) andbuildRecheckTask(line 87) emit the note "callget_difffor the full text" when the diff exceedsdefaultMaxDiffChars(60000,main.go:87). These are gadfly's in-process majordomo tool names. TheclaudeCodeEngineexecs theclaudeCLI, which provides its own differently-named tools (Read/Grep/Glob) and has noget_difftool (confirmedengine.go:113-147only sets cwd + argv; it does not register any gadfly tools). Concrete impact: (1) the prompt tells the model to useread_file/find_files/get_diff, which don't exist in that environment — at best noise, at worst the model wastes turns or fails to verify; (2) on any diff > 60k chars the reviewer is told to "callget_diff" to recover the truncated portion, but it cannot, so the claude-code engine silently reviews against an incomplete diff and can neither see nor flag the missing part. The majordomo path is unaffected because it does expose those tools. Suggested fix: make the tool-name hint and the truncation-recovery note engine-aware (e.g. atoolHint/fullDiffHintthe engine supplies), or haveclaudeCodeEngineinject the full diff into the task instead of relying on aget_diffcall.cmd/gadfly/main.go:187— Type assertioneng.(*majordomoEngine).mdlis correct but fragile. Verified: in theauto && !ccSpecbranchengis guaranteed to be*majordomoEngine(set atmain.go:172, withccSpecfalse), so the assertion cannot panic. No bug today, but it relies on the invariant that the non-claude branch is the only other engine impl; a third engine would silently break. Not material to this PR — noted only because I verified it's safe.🧯 Error handling & edge cases — Minor issues
Minor issues found
cmd/gadfly/engine.go:113-121—claudeCodeEngine.runPassshells out withexec.CommandContextbut relies on Go's default child-process handling: whenctxexpires (per-lens timeout inreviewWithSpecialist,main.go:311),cmdis killed but theclaudeCLI may have spawned its own child processes (Node). Those grandchildren are not killed by default (noSetpgid/SysProcAttr, confirmed absent in the package), so a timed-out lens can leave orphanedclaudeprocesses holding an OAuth session slot. Inside a multi-lens, multi-model CI run this can pile up background processes. Suggested fix: setcmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}and kill the whole group on context cancel (orcmd.Cancel+cmd.WaitDelay), so the CLI tree dies with the parent. Severity is minor because it only manifests on timeout/unhappy paths and the container is torn down after the run.cmd/gadfly/engine.go:125-132— The structured-answer path returns success wheneverjson.Unmarshalsucceeds andres.Resultis non-empty, even ifrunErr != nil(it only checksres.IsError). If the CLI exits non-zero but still emits a JSON object on stdout with a populatedresultfield (nois_error),runPassswallowsrunErrand returns that text as a valid draft. For an advisory reviewer this degrades to plausible-looking output instead of an error notice; the recheck pass would then treat it as a real draft. Suggested fix: ifrunErr != nil, prefer the error branch (fall through torunErrhandling) instead of trusting the stdout JSON, or at least require!res.IsError && runErr == nilbefore accepting.cmd/gadfly/engine.go:153-154(truncateForErr) —s[:max]slices bytes, so truncation can split a multi-byte UTF-8 rune mid-sequence and produce invalid UTF-8 in the error string (which is later rendered into the review comment / stderr). Edge case, cosmetic. Suggested fix: use a rune-safe truncation, e.g. loop overutf8.DecodeRuneInString.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 8s
🪰 Gadfly review —
minimax-m3:cloud(ollama-cloud)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — Blocking issues found
VERDICT: Blocking issues found
cmd/gadfly/engine.go:103—extraArgsappended after--permission-mode plandefeats the read-only default.args()builds--permission-mode planand then appendse.extraArgs(fromGADFLY_CLAUDE_EXTRA_ARGS,strings.Fields-split). Most CLIs (and Claude Code specifically) treat the last occurrence of a flag as the winning one, so an attacker who can set env vars in CI can useGADFLY_CLAUDE_EXTRA_ARGS="--permission-mode bypassPermissions"to turn the subprocess fully write-capable — let it edit files, run arbitrary shell commands, exfiltrate secrets, etc. The "read-only by default" claim in the README/example is bypassed by the documented tuning knob. Verified by readingengine.go:92-104(extraArgs is the finalappend, after--permission-mode) andrunPass(cmd.Dir = e.repoDir, full env inherited). Fix: either rejectpermission-modeinsideextraArgs, strip--permission-modeoverrides fromextraArgs, or place the user overrides before the safe default. Consider the same for--allowedTools.cmd/gadfly/engine.go:116— fullos.Environ()is passed to theclaudesubprocess, leaking the Gitea token into a third-party CLI.cmd.Env = os.Environ()carriesGITEA_TOKEN,GITEA_API,GADFLY_API_KEY,GADFLY_FINDINGS_TOKEN, and any other secrets the runner inherited (confirmed viagrep— no othercmd.Envsanitization anywhere in the repo). Claude Code is a third-party node binary installed vianpm install -g @anthropic-ai/claude-code(verified inDockerfile:32); its telemetry / error reporting path may observe and transmit env. The entrypoint already filters/handles env separately for the worker (entrypoint.sh:139-142only re-exports a small set), but the binary does not. Fix: build a minimal env (onlyPATH,HOME,CLAUDE_CODE_OAUTH_TOKEN/ANTHROPIC_API_KEY, locale, andCLAUDE_CODE_*knobs), droppingGITEA_*,GADFLY_API_KEY,GADFLY_FINDINGS_TOKEN, etc.cmd/gadfly/engine.go:79,114—GADFLY_CLAUDE_BINis used as a literal executable with no validation. With the env inherited, anyone able to set env (a privileged operator or any compromise of the env namespace) can pointbinat any binary, which then runs ine.repoDirwith the full env — full RCE as the review job. Expected scope for this knob is "whichclaudebinary", so enforce that it resolves to the bundledclaudeinstall (e.g.filepath.EvalSymlinkscheck, or refuse paths outside/usr/local/bin/the npm prefix), or document and accept the trust assumption.🎯 Correctness — Minor issues
VERDICT: Minor issues
cmd/gadfly/engine.go:113-147—runPassswallows context cancellation when JSON parsing succeeds. Aftercmd.Run()(which respectsctxviaexec.CommandContext), the code goes straight tojson.Unmarshal(...)and returnsres.Resultwithout consultingctx.Err(). If the CLI flushes a JSON envelope just before the timeout fires, the caller (reviewWithSpecialistatmain.go:316) only checkserrand will emit the truncated/random text as a real review. Low-likelihood but real; fix is an earlyif err := ctx.Err(); err != nil { return "", err }before the JSON-parse branch, plus treatingrunErr != nil && ctx.Err() != nilas a context error rather than a genericclaude -p failed.cmd/gadfly/engine.go:129—strings.TrimSpace(res.Result)collapses legitimate single-space/blank results. A guardrail/blocked response that prints{"result": " "}yieldsout == ""here and falls through to theraw := strings.TrimSpace(stdout.String())fallback, returning the entire raw JSON envelope ({"result":" ", "is_error":true, ...}) as the "review" —parseVerdictthen tries to read it as review markdown. Same shape as the cancellation finding: the degraded-mode fallback isn't context-/error-aware. Fix: onIsError == truealways go through the error path (don't trustres.Resultas an answer); or detect that the JSON parse succeeded butres.Resultwas empty +runErr == niland treat that as a parse-failure error too.cmd/gadfly/engine.go:84—strings.Fields(GADFLY_CLAUDE_EXTRA_ARGS)silently splits quoted arguments. The README's "passed verbatim" promise is misleading: a user-supplied--append-system-prompt "system prompt text"becomes four argv tokens. The README/example currently only shows--max-turns 30(no quotes needed), so this is documentation/behavior drift rather than a code bug, but worth either documentingGADFLY_CLAUDE_EXTRA_ARGSas shell-tokenized or parsing withgithub.com/google/shlex/kballard/go-shellquote.cmd/gadfly/main.go:187— Type assertioneng.(*majordomoEngine).mdlhas no safety net. Guarded byccSpec == falsein the enclosingelsebranch today, so it can't panic on the current call site — but it's a fragility foot-gun: if the auto-select block is ever refactored soengcould be aclaudeCodeEngine, this panics. Minor; hoistingvar mdl *majordomoEnginein the non-cc branch (or just threading the model through) would be clearer.scripts/run.sh:56-59— Cosmetic:MODEL_PROVIDERforclaude-code/<model>displays asclaude-codein the comment header.entrypoint.sh:181always setsPROVIDER=ollama, and run.sh derivesMODEL_PROVIDER="${MODEL%%/*}", soclaude-code/sonnetrenders providerclaude-codewhile every other model gets a real backend name (m1pro, ollama-cloud, …). Cosmetic only — no logic bug — but a reader of the comment header will be confused.🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
Findings (🧯 Error handling & edge cases)
truncateForErrslices a string at an arbitrary byte boundary —cmd/gadfly/engine.go:149-157max = 800is a byte index, butsis arbitrary CLI output (theclaudeCLI emits UTF-8 — model names likeclaude-opus-4-5, accented words, emoji). If byte 800 lands mid-rune,s[:800]returns invalid UTF-8, then a literal…(itself 3 bytes) is appended next to corrupted bytes. The string is interpolated into anfmt.Errorfthat flows to the inline⚠️ reviewer failednotice.Fix: truncate on a rune boundary, e.g.
if utf8.RuneCountInString(s) > max { return string([]rune(s)[:max]) + "…" }, or keep byte-slicing but trim the trailing partial rune withutf8.DecodeLastRuneInString.GADFLY_CLAUDE_EXTRA_ARGSis split withstrings.Fields, silently corrupting quoted values —cmd/gadfly/engine.go:84A user setting
--append-system-prompt "hello world"(or--system-prompt-file /tmp/path with spaces/x.txt, etc.) gets three tokens:--append-system-prompt,"hello,world". The CLI then either errors on a malformed flag or silently misbehaves. The unit test (TestClaudeCodeArgs,engine_test.go:70-101) only exercises a value with no whitespace (--max-turns 30), so this edge case is uncovered.Fix: if you want verbatim passthrough, document that and use a shell-style split (
shlex.Split) or accept the env as a\n-delimited list. At minimum, add a test with a value containing spaces to lock in the contract.eng.(*majordomoEngine)type assertion is locally safe but brittle —cmd/gadfly/main.go:187Only reached when
ccSpecis false (it's inside theelsearm at lines 186-200), so today it cannot panic — but the assertion sits next to theccSpecbranch that builds the engine, and a future refactor that lifts it (or adds a third engine) will turn this into a runtime panic with no test to catch it. The existing panic-recovery inrunSpecialists(lines 256-261) would catch it but emit a noisy⚠️ This reviewer panickedper lens instead of a clear "auto-select requires the majordomo engine" message.Fix: either expose
mdl llm.Modelon the interface (via amodel()accessor with a concrete return), or gate the assertion explicitly withif e, ok := eng.(*majordomoEngine); ok { … }and fall back to the default suite otherwise.Verified clean in this lens
runPassfailure paths: stderr/stdout truncation is applied to error detail beforefmt.Errorf; non-fatal (rendered as an inline notice, never sinks the job).cmd/gadfly/engine.go:113-147.runSpecialistspanic recovery (lines 256-261) is preserved across the engine refactor.reviewWithSpecialisttimeout + cancel are correctly scoped per-lens (lines 310-312).claude-codeengine correctly skips the worker model + auto-select (lines 158-172, 183-185) without leaving stale state onfsTools.worker(the ccSpec branch never assignsfsTools.worker, leaving it as the zero valuenil).os.Environ()passthrough inherits auth tokens without leaking them into argv; the only env-derived strings that reachos/execarebin(defaulted) andextraArgs(user-supplied).Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 31s
🪰 Gadfly review —
m5/qwen3.6:35b-mlx(m5)⏳ Reviewing… this comment will update with findings and run time.
🪰 Gadfly review —
glm-5.1:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
GADFLY_BINis a shell script variable (inscripts/run.sh), not a Go-level concern. The draft's finding 3 acknowledges it's the same pattern and "low risk since it's admin-controlled."Let me now verify the exact claim about "commander" — the draft says "Most CLI parsers (including Claude Code's
commander)" — this is a claim about an external package. Let me check if there's any documentation or code that confirms this.The
@anthropic-ai/claude-codepackage is installed via npm in the Dockerfile. It's not in this repo. The claim aboutcommanderbeing the CLI parser is speculative but reasonable for Node.js CLIs. More importantly, the underlying issue — that extraArgs can override--permission-mode plan— is real regardless of which CLI parser is used. Even if "last wins" isn't howcommanderworks, many CLI parsers do honor the last flag, and even those that don't may still be subvertable by other means (like--dangerously-skip-permissions).Now let me verify the README's claim about safety:
Looking at the README at line 104:
GADFLY_CLAUDE_PERMISSION_MODE|plan|--permission-mode(read-onlyplankeeps it from editing)The README advertises
planas a safety feature but doesn't mention thatEXTRA_ARGScan override it. This is a valid documentation gap.Let me now make my final assessment:
Finding 1: Valid.
extraArgsare appended after--permission-mode plan, soGADFLY_CLAUDE_EXTRA_ARGS="--permission-mode default"could potentially override the safety default. The README advertisesplanas a safety feature without noting this caveat. The claim aboutcommanderis speculative (external package) but the core issue is real. Minor severity since these are admin-controlled env vars.Finding 2: Valid.
GADFLY_CLAUDE_ALLOWED_TOOLSis passed verbatim with no validation. Under--permission-mode plan, tools likeBashandWriteshould be restricted, but the code doesn't enforce this. Minor severity — defense-in-depth concern.Finding 3: Valid but the lowest value.
GADFLY_CLAUDE_BINis exec'd without path validation, same pattern asGADFLY_BIN. The draft itself says "low risk since it's admin-controlled." This is more of an observation than an actionable finding. I'll keep it but it's very minor.All three findings survive verification. The verdict remains "Minor issues."
VERDICT: Minor issues
cmd/gadfly/engine.go:92–103—GADFLY_CLAUDE_EXTRA_ARGScan override--permission-mode plansafety default.extraArgsare appended after--permission-mode planin the argv (return append(a, e.extraArgs...)). Most CLI parsers apply the last occurrence of a flag, soGADFLY_CLAUDE_EXTRA_ARGS="--permission-mode default"could silently downgrade from read-only to full write access. No validation or documentation warns about this. The README (line 104) advertisesplanas a safety feature ("read-onlyplankeeps it from editing") without noting thatEXTRA_ARGScan negate it. Fix: either validateextraArgsagainst overrides of--permission-mode/--dangerously-skip-permissions, or append--permission-modeafterextraArgsso the hard-coded default always wins.cmd/gadfly/engine.go:83,100–101— No validation onGADFLY_CLAUDE_ALLOWED_TOOLSvalues. The env var is passed verbatim as--allowedTools. A value likeBash,Write,Editcould grant write/execute capability. Under--permission-mode planthis should be vetoed, but the code doesn't enforce that, and the EXTRA_ARGS override above makes it moot. A brief validation or at least a README warning that this knob should not expand beyond read-only tool names would add defense-in-depth.cmd/gadfly/engine.go:79,114—GADFLY_CLAUDE_BINallows arbitrary binary execution. The env var is passed directly toexec.CommandContext. A mis-set value (or a compromised CI variable) would execute an attacker-chosen binary with the full container environment (includingGITEA_TOKEN,CLAUDE_CODE_OAUTH_TOKEN, etc.). This is the same threat model as the pre-existingGADFLY_BIN, but there is no path resolution safety (e.g., checking it resolves to an absolute path or is on a safe list). Low risk since it's admin-controlled, but worth noting for defense-in-depth.🎯 Correctness — No material issues found
VERDICT: No material issues found
I reviewed the diff through the correctness lens, verifying env var flow, CLI argument construction, JSON parsing logic, and branching in
main.go. Key things I checked:GADFLY_MODEL(singular) vsGADFLY_MODELS(plural): Not a bug —entrypoint.shiteratesGADFLY_MODELS, setsMODELper iteration, andrun.shmapsMODEL→GADFLY_MODELfor the Go binary. Consistent.isClaudeCodeSpecbranching: TheccSpecguard inmain.go:156correctly prevents theeng.(*majordomoEngine).mdltype assertion (line 187) from ever executing for a claude-code engine — that code path only runs whenccSpecis false.CLI argv construction (
args()method): Arguments are passed viaexec.Command(not a shell), so no injection or quoting issues. The order and flags are syntactically correct.claudeResultJSON parsing: The struct tags (result,is_error,subtype) match the documented Claude Code JSON output format. The fallback logic (try JSON → try raw stdout → error) degrades gracefully.truncateForErruseslen()(bytes) rather thanrunecount, so it could split a multi-byte UTF-8 character at position 800. Impact: a garbled byte in an error message trailer. Not material.strings.FieldsonGADFLY_CLAUDE_EXTRA_ARGS: Splits on all whitespace, which is correct for space-separated flags like--max-turns 30. Arguments containing literal spaces can't be represented, but that's an acceptable limitation for the documented use.provider_of("claude-code/sonnet")in entrypoint.sh returns"claude-code", grouping all claude-code specs into one provider lane — correct behavior.No logic bugs, domain errors, or semantic correctness issues found.
🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
cmd/gadfly/engine.go:150-156—truncateForErrtruncates on byte boundary, can split a multi-byte UTF-8 character.len(s) > maxands[:max]operate on bytes. If the 800th byte falls mid-rune, the returned string is invalid UTF-8. It is then embedded in anfmt.Errorfmessage that ultimately appears in a review comment. Low-severity (cosmetic, error path only), but straightforwardly fixable by truncating on a rune boundary (e.g., loop backward frommaxto find a valid boundary, or useutf8.RuneStart).cmd/gadfly/engine.go:113-146— Timeout-killed CLI process always surfaces as error, never as partial output. Whencontext.WithTimeoutexpires,exec.CommandContextsends SIGKILL; the child's stdout is typically incomplete JSON. The JSON parse then fails,runErris non-nil, and the function returns afmt.Errorf. By contrast, the majordomo engine'srunAgentcan return a partial answer on step-budget exhaustion via its wrap-up steer and finalization fallback. A slowclaude-codelens that times out mid-write always produces the ⚠️ error notice rather than potentially usable partial text. Not a bug — truncated JSON can't be parsed — but a behavioral gap on the unhappy path worth noting.cmd/gadfly/recheck.go:27,87&cmd/gadfly/main.go:434,444— Prompts referenceget_diff/read_file/grep/find_filestools that don't exist in the Claude Code engine. When the claude-code engine is active, the system prompt (scripts/system-prompt.txt), the review task (buildTask), and the recheck prompt (recheckSystemPrompt+buildRecheckTask) all instruct the model to callread_file,grep,find_files, andget_diff— which are the majordomo agent's tools, not Claude Code's (Read, Grep, Glob). The most impactful case isget_diff: the diff is truncated in the prompt and the model is told to "call get_diff for the full text," but no such tool exists for Claude Code. The model can still use its own tools to verify findings against the repo, so this doesn't cause a crash — but the instruction to call a nonexistent tool is misleading. Low-impact but an edge case where prompts are engine-specific yet delivered unconditionally.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 38s
🪰 Gadfly review —
kimi-k2.7-code:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found.
Security-relevant points checked:
sh -c:cmd/gadfly/engine.go:114usesexec.CommandContext(ctx, e.bin, e.args(system, task)...), soGADFLY_CLAUDE_EXTRA_ARGS(split withstrings.Fields) and the prompts become separate argv entries; no shell metacharacter injection is possible.cmd/gadfly/engine.go:116setscmd.Env = os.Environ()only to passCLAUDE_CODE_OAUTH_TOKEN/ANTHROPIC_API_KEYto the CLI, matching howscripts/run.shalready passes through provider keys. No code logs, prints, or persists the token.cmd/gadfly/main.go:159sets the CLI cwd tofsTools.root, whichnewRepoFSresolves to an absolute, symlink-clean path (cmd/gadfly/tools.go:45-64), so the CLI's tools operate inside the intended checkout.--permission-mode plan:cmd/gadfly/engine.go:82andengine.go:98.cmd/gadfly/engine.go:125) unmarshals into a plain struct and never acts on the result as code.The trade-off that the in-process sandbox is replaced by the CLI's own file access is documented honestly in the README and PR description, and is not a hidden security regression.
🎯 Correctness — Minor issues
Minor issues
cmd/gadfly/main.go:187—eng.(*majordomoEngine).mdlbreaks thereviewEngineabstraction. It is safe today because the surroundingelsebranch always assigns a*majordomoEngine, but the cast is a latent panic if that branch ever changes and is brittle for testing/mocking. Better to keep the resolvedllm.Modelin a local variable before wrapping it, or add a small getter.cmd/gadfly/engine.go:84/README.md:105—GADFLY_CLAUDE_EXTRA_ARGSis parsed withstrings.Fields, which splits on any whitespace and discards quotes. That contradicts the README's "appended verbatim" wording and makes it impossible to pass a value containing spaces (e.g.--some-flag "two words"becomes three argv entries, not two). Either document the whitespace-split behavior explicitly, or use a shell-style tokenizer so quoted values work.🧯 Error handling & edge cases — Minor issues
Minor issues
cmd/gadfly/engine.go:84—GADFLY_CLAUDE_EXTRA_ARGSis split withstrings.Fields, which silently mishandles quoted arguments (e.g.--some-flag "value with spaces"becomes two tokens with the quotes still attached). The README says extra args are “appended verbatim”, but that is not whatstrings.Fieldsdoes. Use a small shell-style splitter or document the limitation.cmd/gadfly/engine.go:153—truncateForErrcaps by byte length (len(s) > 800), not rune length, so a multi-byte error message can be truncated mid-rune. Useutf8.RuneCountInStringand slice by runes. (minor cosmetic risk in log output)Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 8s
🪰 Gadfly review —
deepseek-v4-pro:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
The security posture of this change is sound. I traced the key data flows and verified against the actual code:
Command construction (
engine.go:92-103,engine.go:113-114):exec.CommandContextis used, not a shell. Thetask(which contains untrusted PR title/body/diff) andsystemprompt are passed as directargvelements — no shell injection vector. Verified by readingengine.go:92-103andmain.go:424-447(buildTask).os.Environ()inheritance (engine.go:116): Passes the full environment to theclaudechild process. This is standard Go practice and theclaudeCLI is designed to read onlyCLAUDE_CODE_OAUTH_TOKEN/ANTHROPIC_API_KEYfrom it. The parent process already holds the same secrets in its own environment — passing them to a child on the same machine doesn't cross a trust boundary. Not a material finding.Read-only default (
engine.go:82,engine.go:97-98):--permission-mode planis the hardcoded default, enforced inargs(). Even if the LLM were prompt-injected by a malicious diff, the CLI's permission layer (not the model) blocks modifications. Verified.--allowedToolsand--extraArgs(engine.go:83-84,engine.go:100-103): Both are sourced from admin-controlled env vars (GADFLY_CLAUDE_*), not from PR content. No untrusted input reaches these flags.JSON deserialization (
engine.go:124-125):claudeResultonly hasstring/boolfields. Go'sencoding/jsonis memory-safe. The input is from a locally-executed trusted binary's stdout. No unsafe deserialization risk.Pre-existing
.git/configtoken exposure (entrypoint.sh:126): The clone URL embedsGITEA_TOKEN, which lands in.git/config. Both the majordomoread_filetool and theclaudeCLI's Read tool can access it. This is not introduced by this PR and is not made worse by it — the majordomo path already has the same exposure. Worth noting but outside the scope of this review.No new credential leakage paths: Error messages are truncated to 800 chars (
engine.go:150-157). The review output pipeline (verdict parse → recheck → consolidate → emit) is unchanged and doesn't introduce new channels for secret exfiltration.🎯 Correctness — Minor issues
Both findings are confirmed against the actual code. Let me verify the exact line numbers and logic flow:
Finding 1 (lines 125-145): The control flow is exactly as described. When JSON parses successfully (line 125),
res.IsErroris false (line 126), andres.Resultis empty (line 129), the code falls through both the error check at line 134 (runErris nil) and lands at line 143, which returns the raw JSON string as the review text. Confirmed.Finding 2 (line 84):
strings.Fieldsis the sole parsing mechanism forGADFLY_CLAUDE_EXTRA_ARGS, and it cannot represent arguments containing spaces. Confirmed.Both are real, minor issues. The verdict stands.
Verdict: Minor issues
cmd/gadfly/engine.go:125-145—runPassreturns raw JSON as review text whenResultis empty but JSON parses successfully. If the CLI exits 0 and emits{"result":"","is_error":false}, the JSON unmarshals cleanly,res.Resultis empty,runErris nil, and the code falls through to the raw-stdout fallback at line 143 — which returns the JSON string itself as the "review." The consolidated comment would then contain a raw JSON blob instead of a meaningful review. The fix is to treat an emptyResultin a successfully-parsed JSON response as an error (or at minimum not fall back to raw JSON):This is an edge case (the CLI shouldn't normally produce empty results), but the current fallback would silently emit garbage.
cmd/gadfly/engine.go:84—strings.FieldsonGADFLY_CLAUDE_EXTRA_ARGScannot represent arguments containing spaces. If a user needs to pass something like--system-prompt-file /path/with spaces/prompt.txt,strings.Fieldswould split it into three arguments. This is a limitation rather than a bug per se, but it means the_EXTRA_ARGSknob can't express all valid CLI invocations. A shell-compatible split (e.g.,shlexor a CSV approach) would be more robust.🧯 Error handling & edge cases — Minor issues
All three findings are confirmed against the actual code. The verdict remains Minor issues.
Verdict: Minor issues
cmd/gadfly/engine.go:154—truncateForErrcan split a multi-byte UTF-8 character (verified by reading the function).s[:max]uses byte-index slicing; ifmaxfalls mid-codepoint the error string will contain an invalid UTF-8 sequence. In practice CLI error output is almost certainly ASCII, so this is cosmetic — but it's a latent bug. Fix: usestring([]rune(s)[:max])or iterate withfor i := range sto find a safe cut point.cmd/gadfly/main.go:310–312— shared timeout context starves the recheck pass (verified by readingreviewWithSpecialist). A singlecontext.WithTimeoutis created and used for both the draftrunPassand the recheckrunPass. For the majordomo engine this is mitigated by the internal step budget, but the claude-code engine is an external process with no step cap — the timeout is its only constraint. If the draft CLI call consumes, say, 280 of 300 seconds, the recheck gets ≤20 seconds and is very likely to fail (gracefully — it falls back to the draft). This is pre-existing architecture, but the claude-code engine makes it materially worse because there is no intermediate step-budget backstop. Consider splitting the timeout or giving the recheck its own deadline.cmd/gadfly/engine.go:134–139—runErrdetail can be empty when stderr and stdout are both blank (verified by readingrunPass). Ifclaudeexits non-zero with no output at all (e.g. immediate crash before writing anything), the error message becomes"claude -p failed: <runErr>: "with a trailing": ". Harmless but sloppy. Aif detail != "" { ... }guard orstrings.TrimRighton the format string would clean it up.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 52s
🪰 Gadfly review —
nemotron-3-super:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found
Note: After thorough review through the security lens, no material security issues were identified in the changes. The introduction of the Claude Code engine properly isolates untrusted input (prompts/diffs) as arguments to a trusted binary, uses a read-only permission mode by default, and inherits environment variables from a trusted workflow context. All potential injection vectors are mitigated by using
exec.CommandContextwith argument arrays rather than shell execution. The environment variable handling assumes trusted workflow configuration, which aligns with the project's threat model. The compliance note about Anthropic's terms is noted but falls outside the strict security lens scope.🎯 Correctness — No material issues found
No material issues found
🧯 Error handling & edge cases — Minor issues
Review Summary: Error Handling & Edge Cases
Verdict: Minor issues
Findings
claudeCodeEngine.runPass, when the Claude Code CLI returns a successful JSON response (IsError=false) with an empty or whitespace-onlyresultfield, the function incorrectly falls back to returning the raw stdout (the entire JSON) or an error if stdout is empty. This violates the contract of returning the model's actual output. For example, if the CLI returns{"result": " ", "is_error": false}, the function would return the raw JSON string instead of an empty string. This could cause downstream processing (like verdict parsing) to receive malformed input.Fix: After confirming
!res.IsError, returnres.Resultdirectly without checking for emptiness, matching the Majordomo engine's behavior of returning the model's output verbatim.Verified by reading the function in
cmd/gadfly/engine.go.No other error handling or edge case issues were found within the scope of this lens. The context propagation, error handling for command execution, and JSON fallback logic are otherwise sound.
-- No material issues found in other files reviewed (main.go changes, etc.) for this lens.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 28s
🪰 Gadfly review —
gpt-oss:120b-cloud(ollama-cloud)Verdict: No material issues found · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found
🎯 Correctness — ⚠️ could not complete
⚠️ This reviewer failed to complete: agent: tool-error guard tripped: 4 consecutive steps with only failing tool calls
🧯 Error handling & edge cases — No material issues found
Verdict: No material issues found
claudeCodeEnginecorrectly checks and propagates errors from the CLI execution, JSON unmarshalling, and result validation; no errors are silently ignored.exec.Cmd, buffers) are scoped to the function; there are no missingdeferstatements or resource leaks.main.goandrunSpecialistspreserve existing error handling pathways; failures from the new engine are reported via the sameerroredflag used elsewhere.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 45s
🪰 Gadfly review —
qwen3-coder:480b-cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
VERDICT: Minor issues
cmd/gadfly/engine.go:84— ThenewClaudeCodeEnginefunction usesstrings.Fieldsto parseGADFLY_CLAUDE_EXTRA_ARGS, which splits on any whitespace. This could be problematic if users intend to pass arguments with spaces.cmd/gadfly/engine.go:116— TheclaudeCodeEngine.runPassmethod inherits the entire environment viaos.Environ(), which could inadvertently expose sensitive environment variables to theclaudeCLI.examples/claude-code.yml:63— The workflow example restricts triggering to specific users for security, but this might be too restrictive for some use cases.🎯 Correctness — No material issues found
No material issues found.
The changes introduce a new
claudeCodeEnginethat shells out to the Claude Code CLI, alongside the existingmajordomoEngine. The logic for selecting between these engines and handling their different behaviors (e.g., auto-selection being majordomo-only) appears sound and well-tested. The abstraction through thereviewEngineinterface correctly allows both engines to be used interchangeably in the rest of the pipeline.I verified key parts of the implementation:
isClaudeCodeSpecfunction correctly identifies Claude Code model specifications.newClaudeCodeEnginefunction properly parses the model specification and environment variables.argsmethod constructs the correct command-line arguments for the Claude CLI.runPassmethod handles execution and parsing of the CLI output appropriately.main.gocorrectly selects the engine and adjusts behavior (like disabling auto-selection) when using Claude Code.The unit tests cover the core logic of the new engine well. The changes are logically correct and maintain the existing behavior for the majordomo engine while adding the new functionality in a clean, modular way.
🧯 Error handling & edge cases — No material issues found
No material issues found.
The changes introduce a new
claudeCodeEnginethat shells out to the Claude Code CLI. The error handling and edge cases appear to be well-covered:The
runPassmethod inclaudeCodeEnginehandles various error scenarios:exec.CommandContextfor proper timeout handlingEnvironment variable handling is robust:
envOrhelper for safe default values--modeland--allowedTools) by omitting them when emptyThe integration with the existing pipeline is clean:
reviewWithSpecialistnow uses the engine abstraction properlyUnit tests cover the key edge cases:
No blocking issues were found in error handling or edge case management within the scope of this change.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 8s
🪰 Gadfly review —
gemma4:cloud(ollama-cloud)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — Blocking issues found
VERDICT: Blocking issues found
Command Injection via Environment Variables:
cmd/gadfly/engine.go:84andcmd/gadfly/engine.go:85readGADFLY_CLAUDE_ALLOWED_TOOLSandGADFLY_CLAUDE_EXTRA_ARGSand pass them directly toexec.CommandContext. Whileexec.Commanddoes not use a shell by default,strings.FieldsonGADFLY_CLAUDE_EXTRA_ARGSallows a user to inject any arbitrary flag into theclaudeCLI. More critically, since these environment variables are often set in CI YAML files (which can be modified in PRs if the CI is misconfigured or if they are sourced from untrusted repo-level configs), this allows an attacker to pass flags like--permission-mode agentor other destructive flags to the CLI, bypassing the intendedplan(read-only) restriction.GADFLY_CLAUDE_EXTRA_ARGSentirely in favor of a structured configuration.Insecure Default/Override of Permission Mode:
cmd/gadfly/engine.go:83allowsGADFLY_CLAUDE_PERMISSION_MODEto be overridden by an environment variable. The README claimsplanis read-only, but by allowing this to be tuned via env vars, any user who can trigger the CI (or modify the workflow if permissions are loose) can switch the engine to a mode that allows the LLM to write to the filesystem.planmode or restrict overrides to a set of known-safe read-only modes.Potential Token Leakage in Logs:
cmd/gadfly/engine.go:134-139capturesstdoutandstderrand includes them in the error return value, which is then printed toos.Stderrinmain.go:317andmain.go:326. If theclaudeCLI outputs sensitive information (like the OAuth token or API key) during a failure or in a debug trace, those secrets will be leaked to the CI logs.os.Environ()) from the CLI output before returning the error string.🎯 Correctness — No material issues found
No material issues found
🧯 Error handling & edge cases — Blocking issues found
VERDICT: Blocking issues found
cmd/gadfly/main.go:156: TheccSpeccheck usesos.Getenv("GADFLY_MODEL")instead of theGADFLY_MODELS(plural) environment variable used by the rest of the application to resolve the model list.claude-codeengine cannot be triggered via the standardGADFLY_MODELSconfiguration variable, which is what the documentation and example YAMLs specify.cmd/gadfly/engine.go:125: Potential forjson.Unmarshalto fail silently on non-JSON output. While there is a fallback to raw stdout at line 143, the code attempts toUnmarshalthe entire stdout buffer first. If theclaudeCLI outputs a mix of non-JSON warnings/logs followed by a JSON block (a common pattern for CLI tools),json.Unmarshalwill fail, and the code will fall back to treating the entire buffer (including the JSON) as a raw string.cmd/gadfly/engine.go:121: Thecmd.Run()call is made, and then stdout is processed. However, if the command is killed by thectxtimeout (defined inreviewWithSpecialist),cmd.Run()returns an error. The current logic attempts to parse JSON first regardless ofrunErr. If the process was killed,stdoutmight be partial/malformed JSON, leading tojerr != niland then falling through to therunErr != nilblock. While functionally correct, the error message "claude -p failed: signal: killed" might mask the fact that a partial JSON result was actually produced.runErrexplicitly for context cancellation/timeout before falling back to raw stdout.cmd/gadfly/engine.go:135: In the error path whererunErr != nil, the code attempts to usestderr.String()thenstdout.String(). Becausestdoutis also used for the JSON attempt, this is correct, buttruncateForErris applied. If the error is a massive stack trace or log dump, we only see the first 800 characters. Given this is an adversarial reviewer, the "unhappy path" of the engine itself should be as transparent as possible.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 25s