fix: fold in claude-code-engine review findings
Build & push image / build-and-push (pull_request) Successful in 14s

The dogfood swarm reviewed PR #2 (9 cloud reviewers; m5 wedged by a host
reboot and skipped this once). 66 findings graded via the gadfly MCP
(~half real, half false-positive/clean-verification). Folding in the
warranted ones:

- engine.go: claudeEnv() builds a MINIMAL subprocess environment (auth
  token + PATH/HOME/locale/GADFLY_CLAUDE_*), no longer handing GITEA_TOKEN
  and provider keys to the third-party CLI (4-model consensus).
- engine.go: runPass rewrite — check ctx.Err() first (don't emit a
  review from a timed-out run), treat an empty parsed result as an error
  instead of returning the raw JSON envelope, only trust a JSON answer on
  a clean exit, and drop the dangling ": " when there's no error detail.
- engine.go: put the CLI in its own process group (Setpgid) and SIGKILL
  the whole group on cancel, so a timed-out lens can't orphan node procs.
- engine.go: rune-safe truncateForErr.
- prompts: genericized the tool-name hints in buildTask + recheck so the
  claude-code engine isn't told to call majordomo-only tools (read_file/
  get_diff); also dropped the mort-specific framing from the recheck
  prompt (it must stay generic per CLAUDE.md).
- README: documented that GADFLY_CLAUDE_EXTRA_ARGS is whitespace-split
  and can override the read-only default, and that the subprocess gets a
  minimal env.

Left as-is (graded, noted in finding notes): operator-knob override of
read-only (intentional escape hatch), shared per-lens timeout (by design),
GADFLY_CLAUDE_BIN trust (operator-controlled, like GADFLY_BIN).

New tests: claudeEnv filtering, rune-safe truncation, and runPass paths
(clean / empty-result / is_error / non-zero) via a stub binary. gofmt
clean, go vet quiet, go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-27 16:39:50 -04:00
parent 4237a18d09
commit 2ca1ce0b6b
5 changed files with 196 additions and 22 deletions
+10 -1
View File
@@ -103,9 +103,18 @@ Auth is read from the environment: the default is a **Pro/Max subscription** via
| `GADFLY_CLAUDE_MODEL` | *(from the spec suffix)* | overrides the `--model` value | | `GADFLY_CLAUDE_MODEL` | *(from the spec suffix)* | overrides the `--model` value |
| `GADFLY_CLAUDE_PERMISSION_MODE` | `plan` | `--permission-mode` (read-only `plan` keeps it from editing) | | `GADFLY_CLAUDE_PERMISSION_MODE` | `plan` | `--permission-mode` (read-only `plan` keeps it from editing) |
| `GADFLY_CLAUDE_ALLOWED_TOOLS` | *(unset)* | `--allowedTools` value, passed verbatim (e.g. `Read,Grep,Glob`) | | `GADFLY_CLAUDE_ALLOWED_TOOLS` | *(unset)* | `--allowedTools` value, passed verbatim (e.g. `Read,Grep,Glob`) |
| `GADFLY_CLAUDE_EXTRA_ARGS` | *(unset)* | extra CLI args appended verbatim (e.g. `--max-turns 30`) | | `GADFLY_CLAUDE_EXTRA_ARGS` | *(unset)* | extra CLI args, **whitespace-split** (no shell quoting) and appended after the defaults (e.g. `--max-turns 30`) |
| `GADFLY_CLAUDE_BIN` | `claude` | CLI binary path | | `GADFLY_CLAUDE_BIN` | `claude` | CLI binary path |
> These are **operator** knobs (workflow env), not PR-author input. Because
> `GADFLY_CLAUDE_EXTRA_ARGS` is appended *after* the defaults, it can override the
> read-only `--permission-mode plan` (e.g. passing `--permission-mode acceptEdits`),
> so keep it read-only unless you mean otherwise. It's whitespace-split, so values
> can't contain spaces — use `GADFLY_CLAUDE_ALLOWED_TOOLS` / `_PERMISSION_MODE` /
> `_MODEL` for those. The subprocess runs with a **minimal environment** (its auth
> token + `PATH`/`HOME`/locale/`GADFLY_CLAUDE_*`), not the runner's full env, so the
> Gitea token and provider keys aren't handed to the CLI.
> **Untested, like the cloud providers.** This wires the CLI in and is exercised by its unit > **Untested, like the cloud providers.** This wires the CLI in and is exercised by its unit
> tests, but a live subscription-auth run hasn't been validated end-to-end here — and using > tests, but a live subscription-auth run hasn't been validated end-to-end here — and using
> subscription auth in automated CI is a gray area in Anthropic's terms. `auto` specialist > subscription auth in automated CI is a gray area in Anthropic's terms. `auto` specialist
+72 -10
View File
@@ -8,6 +8,8 @@ import (
"os" "os"
"os/exec" "os/exec"
"strings" "strings"
"syscall"
"unicode/utf8"
llm "gitea.stevedudenhoeffer.com/steve/majordomo/llm" llm "gitea.stevedudenhoeffer.com/steve/majordomo/llm"
) )
@@ -113,47 +115,107 @@ type claudeResult struct {
func (e *claudeCodeEngine) runPass(ctx context.Context, system, task string, _ int) (string, error) { func (e *claudeCodeEngine) runPass(ctx context.Context, system, task string, _ int) (string, error) {
cmd := exec.CommandContext(ctx, e.bin, e.args(system, task)...) cmd := exec.CommandContext(ctx, e.bin, e.args(system, task)...)
cmd.Dir = e.repoDir cmd.Dir = e.repoDir
cmd.Env = os.Environ() // inherits CLAUDE_CODE_OAUTH_TOKEN / ANTHROPIC_API_KEY cmd.Env = claudeEnv() // minimal env — don't hand GITEA_TOKEN et al. to the CLI
// Put the CLI and the Node children it spawns in their own process group and
// kill the WHOLE group on context cancel, so a timed-out lens can't leave
// orphaned claude/node processes behind in the container.
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
cmd.Cancel = func() error {
if cmd.Process != nil {
_ = syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
}
return nil
}
var stdout, stderr bytes.Buffer var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout cmd.Stdout = &stdout
cmd.Stderr = &stderr cmd.Stderr = &stderr
runErr := cmd.Run() runErr := cmd.Run()
// Prefer a structured answer; the CLI prints JSON even on some failures. // A cancelled/timed-out run must surface as an error, never as whatever
// partial bytes the CLI flushed before it was killed.
if ctx.Err() != nil {
return "", fmt.Errorf("claude -p %v", ctx.Err())
}
var res claudeResult var res claudeResult
if jerr := json.Unmarshal(bytes.TrimSpace(stdout.Bytes()), &res); jerr == nil { parsed := json.Unmarshal(bytes.TrimSpace(stdout.Bytes()), &res) == nil
// Clean exit: trust the parsed JSON answer, and ONLY it — never fall back to
// the raw JSON envelope when the result is empty.
if runErr == nil && parsed {
if res.IsError { if res.IsError {
return "", fmt.Errorf("claude reported error (%s): %s", res.Subtype, truncateForErr(res.Result)) return "", fmt.Errorf("claude reported error (%s): %s", res.Subtype, truncateForErr(res.Result))
} }
if out := strings.TrimSpace(res.Result); out != "" { if out := strings.TrimSpace(res.Result); out != "" {
return out, nil return out, nil
} }
return "", fmt.Errorf("claude -p returned an empty result")
} }
if runErr != nil { if runErr != nil {
// Prefer the CLI's own structured error message when it gave one.
if parsed && res.IsError && strings.TrimSpace(res.Result) != "" {
return "", fmt.Errorf("claude reported error (%s): %s", res.Subtype, truncateForErr(res.Result))
}
detail := truncateForErr(stderr.String()) detail := truncateForErr(stderr.String())
if detail == "" { if detail == "" {
detail = truncateForErr(stdout.String()) detail = truncateForErr(stdout.String())
} }
return "", fmt.Errorf("claude -p failed: %v: %s", runErr, detail) if detail != "" {
return "", fmt.Errorf("claude -p failed: %v: %s", runErr, detail)
}
return "", fmt.Errorf("claude -p failed: %v", runErr)
} }
// Ran cleanly but we couldn't pull a result out of the JSON: fall back to
// raw stdout so a format change degrades to "use the text" instead of empty. // Clean exit but stdout wasn't the expected JSON envelope: degrade to the raw
// text so a CLI format change still yields a review instead of nothing.
if raw := strings.TrimSpace(stdout.String()); raw != "" { if raw := strings.TrimSpace(stdout.String()); raw != "" {
return raw, nil return raw, nil
} }
return "", fmt.Errorf("claude -p produced no parseable output") return "", fmt.Errorf("claude -p produced no parseable output")
} }
// truncateForErr caps CLI error detail so a stderr dump can't bloat the comment. // claudeEnv builds a minimal environment for the `claude` subprocess: only what
// the CLI needs (PATH/HOME, its auth tokens, locale, Node/XDG/GADFLY_CLAUDE_*
// knobs), deliberately dropping the rest of the runner's secrets — GITEA_TOKEN,
// GADFLY_FINDINGS_TOKEN, provider keys — so they never reach the third-party
// CLI. Defense in depth: the parent already holds them, but the CLI has no need.
func claudeEnv() []string {
keep := func(k string) bool {
switch k {
case "PATH", "HOME", "USER", "LOGNAME", "TMPDIR", "LANG", "TERM", "SHELL":
return true
}
return strings.HasPrefix(k, "LC_") ||
strings.HasPrefix(k, "CLAUDE_") ||
strings.HasPrefix(k, "ANTHROPIC_") ||
strings.HasPrefix(k, "GADFLY_CLAUDE_") ||
strings.HasPrefix(k, "NODE_") ||
strings.HasPrefix(k, "XDG_")
}
var env []string
for _, kv := range os.Environ() {
if k, _, ok := strings.Cut(kv, "="); ok && keep(k) {
env = append(env, kv)
}
}
return env
}
// truncateForErr caps CLI error detail so a stderr dump can't bloat the comment,
// cutting on a rune boundary so it never emits invalid UTF-8.
func truncateForErr(s string) string { func truncateForErr(s string) string {
s = strings.TrimSpace(s) s = strings.TrimSpace(s)
const max = 800 const max = 800
if len(s) > max { if len(s) <= max {
return s[:max] + "…" return s
} }
return s cut := max
for cut > 0 && !utf8.RuneStart(s[cut]) {
cut--
}
return s[:cut] + "…"
} }
// envOr returns the env var value or a default when unset/blank. // envOr returns the env var value or a default when unset/blank.
+104
View File
@@ -1,9 +1,12 @@
package main package main
import ( import (
"context"
"os"
"slices" "slices"
"strings" "strings"
"testing" "testing"
"unicode/utf8"
) )
func TestIsClaudeCodeSpec(t *testing.T) { func TestIsClaudeCodeSpec(t *testing.T) {
@@ -113,3 +116,104 @@ func TestClaudeCodeArgsBareModelOmitsFlag(t *testing.T) {
t.Errorf("--allowedTools should be omitted when blank: %v", args) t.Errorf("--allowedTools should be omitted when blank: %v", args)
} }
} }
func TestClaudeEnvFilters(t *testing.T) {
t.Setenv("GITEA_TOKEN", "secret-gitea")
t.Setenv("OLLAMA_API_KEY", "secret-ollama")
t.Setenv("GADFLY_API_KEY", "secret-gadfly")
t.Setenv("GADFLY_FINDINGS_TOKEN", "secret-findings")
t.Setenv("CLAUDE_CODE_OAUTH_TOKEN", "keep-claude")
t.Setenv("ANTHROPIC_API_KEY", "keep-anthropic")
t.Setenv("GADFLY_CLAUDE_MODEL", "keep-knob")
env := claudeEnv()
has := func(k string) bool {
for _, kv := range env {
if strings.HasPrefix(kv, k+"=") {
return true
}
}
return false
}
// kept: the CLI's auth + its own knobs + PATH
for _, k := range []string{"CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_API_KEY", "GADFLY_CLAUDE_MODEL", "PATH"} {
if !has(k) {
t.Errorf("claudeEnv dropped %s, but it should be kept", k)
}
}
// dropped: the runner's secrets the CLI doesn't need
for _, k := range []string{"GITEA_TOKEN", "OLLAMA_API_KEY", "GADFLY_API_KEY", "GADFLY_FINDINGS_TOKEN"} {
if has(k) {
t.Errorf("claudeEnv leaked %s into the subprocess env", k)
}
}
}
func TestTruncateForErrRuneSafe(t *testing.T) {
// 900 multibyte runes (3 bytes each) -> well over the 800-byte cap; the cut
// must land on a rune boundary so the result stays valid UTF-8.
s := strings.Repeat("€", 900)
got := truncateForErr(s)
if !utf8.ValidString(got) {
t.Fatalf("truncateForErr produced invalid UTF-8")
}
if !strings.HasSuffix(got, "…") {
t.Fatalf("truncateForErr should append an ellipsis when truncating")
}
// short strings pass through untouched
if truncateForErr(" hi ") != "hi" {
t.Fatalf("truncateForErr should trim and pass short strings through")
}
}
// stubClaude writes an executable shell stub that prints body and exits code,
// and returns an engine pointed at it.
func stubClaude(t *testing.T, body string, code int) *claudeCodeEngine {
t.Helper()
dir := t.TempDir()
path := dir + "/claude-stub.sh"
script := "#!/bin/sh\nprintf '%s' " + shSingleQuote(body) + "\nexit " + itoa(code) + "\n"
if err := os.WriteFile(path, []byte(script), 0o755); err != nil {
t.Fatal(err)
}
return &claudeCodeEngine{bin: path, repoDir: dir}
}
func shSingleQuote(s string) string { return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'" }
func itoa(i int) string { return string(rune('0' + i)) } // single-digit exit codes only
func TestRunPassCleanResult(t *testing.T) {
e := stubClaude(t, `{"result":"REVIEW TEXT","is_error":false}`, 0)
out, err := e.runPass(context.Background(), "sys", "task", 0)
if err != nil || out != "REVIEW TEXT" {
t.Fatalf("clean result: got (%q, %v), want (REVIEW TEXT, nil)", out, err)
}
}
func TestRunPassEmptyResultIsError(t *testing.T) {
// JSON parses, exit 0, but result empty: must NOT return the raw JSON blob.
e := stubClaude(t, `{"result":"","is_error":false}`, 0)
out, err := e.runPass(context.Background(), "sys", "task", 0)
if err == nil {
t.Fatalf("empty result should be an error, got out=%q", out)
}
if strings.Contains(out, "{") {
t.Fatalf("empty result must not leak raw JSON, got %q", out)
}
}
func TestRunPassIsErrorFlag(t *testing.T) {
e := stubClaude(t, `{"result":"boom","is_error":true,"subtype":"error_max_turns"}`, 0)
_, err := e.runPass(context.Background(), "sys", "task", 0)
if err == nil || !strings.Contains(err.Error(), "claude reported error") {
t.Fatalf("is_error should surface as an error, got %v", err)
}
}
func TestRunPassNonZeroNoJSON(t *testing.T) {
e := stubClaude(t, "fatal: auth failed", 1)
_, err := e.runPass(context.Background(), "sys", "task", 0)
if err == nil || !strings.Contains(err.Error(), "claude -p failed") {
t.Fatalf("non-zero exit should error with detail, got %v", err)
}
}
+2 -2
View File
@@ -431,7 +431,7 @@ func buildTask(diff string) string {
truncNote := "" truncNote := ""
if maxDiff > 0 && len(diff) > maxDiff { if maxDiff > 0 && len(diff) > maxDiff {
diff = diff[:maxDiff] diff = diff[:maxDiff]
truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars in this message; call get_diff for the full text.]", maxDiff) truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars in this message; read the changed files (or call get_diff, if available) for the full text.]", maxDiff)
} }
var b strings.Builder var b strings.Builder
@@ -441,7 +441,7 @@ func buildTask(diff string) string {
if strings.TrimSpace(body) != "" { if strings.TrimSpace(body) != "" {
fmt.Fprintf(&b, "PR description:\n%s\n\n", body) fmt.Fprintf(&b, "PR description:\n%s\n\n", body)
} }
b.WriteString("Review the following unified diff. Before reporting any cross-file or compile-correctness issue, use your tools (read_file, grep, find_files) to verify it against the actual checked-out code — do not rely on the diff alone.\n\n") b.WriteString("Review the following unified diff. Before reporting any cross-file or compile-correctness issue, use your repository read tools to verify it against the actual checked-out code — do not rely on the diff alone.\n\n")
fmt.Fprintf(&b, "```diff\n%s\n```%s", diff, truncNote) fmt.Fprintf(&b, "```diff\n%s\n```%s", diff, truncNote)
return b.String() return b.String()
} }
+8 -9
View File
@@ -16,15 +16,14 @@ const defaultRecheckMaxSteps = 16
// against the real code before letting it survive — the antidote to a // against the real code before letting it survive — the antidote to a
// single-pass reviewer that reads a couple of files, mis-connects them, and // single-pass reviewer that reads a couple of files, mis-connects them, and
// posts a confident but wrong "blocking" verdict. // posts a confident but wrong "blocking" verdict.
const recheckSystemPrompt = `You are a VERIFICATION GATE for an automated adversarial code review of the const recheckSystemPrompt = `You are a VERIFICATION GATE for an automated adversarial code review. You are
"mort" project (a large Go Discord bot). You are given a DRAFT review produced given a DRAFT review produced by another model. Your job is NOT to write a new
by another model. Your job is NOT to write a new review — it is to confirm or review — it is to confirm or reject each finding in the draft against the ACTUAL
reject each finding in the draft against the ACTUAL code, then output the code, then output the corrected review.
corrected review.
You have the same read-only repository tools as the original reviewer: You have read-only access to the checked-out repository — use your tools to read
- read_file(path[, start_line, limit]), list_dir([path]), grep(pattern[, path, files and search the code to independently verify each finding against the real
max_results]), find_files(name[, max_results]), get_diff(). source.
For EVERY finding in the draft: For EVERY finding in the draft:
1. Independently reproduce the reasoning by reading the actual files with your 1. Independently reproduce the reasoning by reading the actual files with your
@@ -84,7 +83,7 @@ func buildRecheckTask(draft, diff string) string {
truncNote := "" truncNote := ""
if maxDiff > 0 && len(diff) > maxDiff { if maxDiff > 0 && len(diff) > maxDiff {
diff = diff[:maxDiff] diff = diff[:maxDiff]
truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars here; call get_diff for the full text.]", maxDiff) truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars here; read the changed files (or call get_diff, if available) for the full text.]", maxDiff)
} }
var b strings.Builder var b strings.Builder