ci: enable claude-code/opus:max (max-thinking) reviewer #6

2026-06-27T22:40:54Z

steve commented

2026-06-27 22:40:54 +00:00

Follow-up to #5 — the max-thinking opus you asked for.

Now that :sha-c342bdb (which has the :thinking spec parse from #5) is built, this adds claude-code/opus:max to the dogfood swarm and pins to that image. The Claude Code lineup is now sonnet + opus + opus-with-max-thinking, so the scoreboard can compare the same model at two thinking depths.

Max thinking is applied via MAX_THINKING_TOKENS=31999 on the CLI subprocess (best-effort — a no-op if the installed CLI build ignores it, in which case opus:max simply behaves like opus).

No code change — workflow only. This PR's own review will be the first to exercise claude-code/opus:max end-to-end. YAML validated.

🤖 Generated with Claude Code

Follow-up to #5 — the max-thinking opus you asked for. Now that `:sha-c342bdb` (which has the `:thinking` spec parse from #5) is built, this adds **`claude-code/opus:max`** to the dogfood swarm and pins to that image. The Claude Code lineup is now **sonnet + opus + opus-with-max-thinking**, so the scoreboard can compare the same model at two thinking depths. Max thinking is applied via `MAX_THINKING_TOKENS=31999` on the CLI subprocess (best-effort — a no-op if the installed CLI build ignores it, in which case `opus:max` simply behaves like `opus`). No code change — workflow only. This PR's own review will be the first to exercise `claude-code/opus:max` end-to-end. YAML validated. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

steve added 1 commit 2026-06-27 22:40:55 +00:00

ci: enable claude-code/opus:max (max-thinking) reviewer

Build & push image / build-and-push (pull_request) Successful in 7s

Details

Adversarial Review (Gadfly) / review (pull_request) Successful in 7m1s

Details

36f8ef0ec7

Follow-up to #5: now that :sha-c342bdb (which has the :thinking spec
parse) is built, add claude-code/opus:max to the dogfood swarm and pin to
that image. The Claude Code lineup is now sonnet + opus + opus-with-max-
thinking, so the scoreboard can compare the same model at two thinking
depths. Best-effort max thinking via MAX_THINKING_TOKENS (no-op if the
CLI build ignores it).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitea-actions bot commented

2026-06-27 22:41:11 +00:00

🪰 Gadfly — live review status

9/9 reviewers finished · updated 2026-06-27 22:47:56Z

`claude-code/opus` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ error-handling — No material issues found

`claude-code/opus:max` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ error-handling — No material issues found

`claude-code/sonnet` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ error-handling — No material issues found

`deepseek-v4-pro:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ error-handling — No material issues found

`glm-5.1:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ error-handling — No material issues found

`glm-5.2:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ error-handling — No material issues found

`minimax-m3:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ error-handling — No material issues found

`nemotron-3-super:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ error-handling — Reviewed

`qwen3-coder:480b-cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ error-handling — No material issues found

_{Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.}

## 🪰 Gadfly — live review status 9/9 reviewers finished · updated 2026-06-27 22:47:56Z #### `claude-code/opus` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `claude-code/opus:max` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `claude-code/sonnet` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `deepseek-v4-pro:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `glm-5.1:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `glm-5.2:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `minimax-m3:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `nemotron-3-super:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — Reviewed #### `qwen3-coder:480b-cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.

gitea-actions bot commented

2026-06-27 22:41:11 +00:00

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

The diff is workflow-only. From the security lens I verified the full input-to-execution path for the new claude-code/opus:max model spec:
- GADFLY_MODELS is parsed by newClaudeCodeEngine (cmd/gadfly/engine.go:83-109). The opus:max token splits on / then : → model="opus", thinking=parseThinking("max")=31999. No shell is involved; model is passed to the CLI purely as an exec.CommandContext argv element (--model, e.model at engine.go:129-130), so the spec string is not a command-injection vector.
- MAX_THINKING_TOKENS=31999 is injected into the subprocess env only at engine.go:152-154, after claudeEnv() builds the minimal allowlist. I confirmed claudeEnv() (engine.go:221-241) deliberately drops GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, and OLLAMA_CLOUD_API_KEY and only keeps CLAUDE_*/ANTHROPIC_*/GADFLY_CLAUDE_*/locale/Node/XDG plus MAX_THINKING_TOKENS. No new secret exposure is introduced by adding the :max lane; the OAuth token that is passed is the intended CLI auth.
- Image pin moved from :sha-80d8f53 to :sha-c342bdb — both immutable sha tags from the same private registry; no downgrade to :latest and no new trust boundary.
No authn/authz change: the job-level if: actor allow-list (steve/fizi/dazed) and GADFLY_ALLOWED_USERS mirror are untouched, so the comment-trigger secret-bearing-run guard still applies to the new lane.
I checked the unit tests (cmd/gadfly/engine_test.go:221-281) which assert :max → 31999 and that the env is only set for :max, not plain specs — consistent with the security-relevant env handling above.

Nothing in my lane is materially wrong.

🎯 Correctness — No material issues found

I verified the full pipeline this PR touches: the workflow YAML, the engine spec parser (cmd/gadfly/engine.go), the entrypoint lane/concurrency logic (entrypoint.sh), and the per-model lens fan-out (cmd/gadfly/main.go). Tracing claude-code/opus:max end-to-end:

Spec parse (engine.go:83-121): strings.Cut(after, ":") → model=opus, parseThinking("max") → maxThinkingTokens (31999). Correct.
31999 value: this is Claude Code's ultrathink tier budget (think=4000 / megathink=10000 / ultrathink=31999). The constant matches the documented tier — not a plausible-looking magic number, the real one.
Env injection (engine.go:148-155): thinkingTokens > 0 → appends MAX_THINKING_TOKENS=31999. The workflow env block does not set MAX_THINKING_TOKENS, and claudeEnv() only keeps it if present in os.Environ(), so the subprocess receives exactly one MAX_THINKING_TOKENS=31999 entry — no duplicate. Correct for this configuration.
Lane routing (entrypoint.sh:159): provider_of "claude-code/opus:max" → claude-code (first path segment). It joins the claude-code lane (cap 2), alongside sonnet and opus — 3 models, 2 at a time. Correct.
Status-board uniqueness (entrypoint.sh:165): status_file_for sanitizes via tr -c '[:alnum:]._-' '_', so claude-code/opus:max → claude-code_opus_max, distinct from claude-code_opus. No collision on the live board.
Lens concurrency (main.go:281-286): GADFLY_PROVIDER_LENS_CONCURRENCY only overrides ollama-cloud; claude-code falls back to default 1, so each claude-code model runs its 3 lenses serially. 2 models at a time × 1 lens = 2 concurrent CLI subprocesses — within the lane's intent.

Verdict: No material issues found.

The one thing worth noting (not introduced by this diff, and not triggered by this workflow config): runPass appends MAX_THINKING_TOKENS=… after claudeEnv(), which already keeps MAX_THINKING_TOKENS if it exists in the parent env. If MAX_THINKING_TOKENS were ever exported into the runner env, the subprocess would get two entries (first wins in glibc getenv), silently overriding the :max budget with the parent value. This PR's config does not set it, so it's a latent engine edge case, not a defect of this change. Verified by reading engine.go:148-155 and engine.go:221-240.

🧯 Error handling & edge cases — No material issues found

I verified the YAML change against the code that consumes it. This is a workflow-only diff: it adds claude-code/opus:max to GADFLY_MODELS and bumps the pinned image tag (which already carries the :max/MAX_THINKING_TOKENS parsing logic from the prior PR).

Checked the edge cases in my lane:

:max spec parsing (cmd/gadfly/engine.go:83-121): strings.Cut on : correctly splits opus:max into model=opus, thinking=max; parseThinking("max") returns 31999. No panic/nil path on malformed suffix — unknown values fall through to 0 (no-op), which is exactly the "best-effort, degrades to opus" behavior described.
Env injection (engine.go:152-154): MAX_THINKING_TOKENS only appended when thinkingTokens > 0; empty/zero never leaks. No swallowed error here.
Provider lane (entrypoint.sh:159): provider_of splits on first / → claude-code for claude-code/opus:max, so it lands in the claude-code=2 concurrency lane as intended. No empty-provider edge case.
Status file uniqueness (entrypoint.sh:165): status_file_for sanitizes :→_, so claude-code/opus (claude-code_opus) and claude-code/opus:max (claude-code_opus_max) get distinct files — no collision that would make the live board conflate the two runs.
Run failure handling (entrypoint.sh:183): a failed model logs and continues; adding a third claude-code entry doesn't change that, and with cap=2 the third queues rather than erroring.

No ignored errors, missing cleanup, panics on bad input, or boundary issues introduced by this change.

VERDICT: No material issues found.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 2s}

### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> VERDICT: No material issues found - The diff is workflow-only. From the security lens I verified the full input-to-execution path for the new `claude-code/opus:max` model spec: - `GADFLY_MODELS` is parsed by `newClaudeCodeEngine` (`cmd/gadfly/engine.go:83-109`). The `opus:max` token splits on `/` then `:` → `model="opus"`, `thinking=parseThinking("max")=31999`. No shell is involved; `model` is passed to the CLI purely as an `exec.CommandContext` argv element (`--model`, `e.model` at `engine.go:129-130`), so the spec string is not a command-injection vector. - `MAX_THINKING_TOKENS=31999` is injected into the subprocess env only at `engine.go:152-154`, after `claudeEnv()` builds the minimal allowlist. I confirmed `claudeEnv()` (`engine.go:221-241`) deliberately drops `GITEA_TOKEN`, `GADFLY_FINDINGS_TOKEN`, and `OLLAMA_CLOUD_API_KEY` and only keeps `CLAUDE_*`/`ANTHROPIC_*`/`GADFLY_CLAUDE_*`/locale/Node/XDG plus `MAX_THINKING_TOKENS`. No new secret exposure is introduced by adding the `:max` lane; the OAuth token that *is* passed is the intended CLI auth. - Image pin moved from `:sha-80d8f53` to `:sha-c342bdb` — both immutable sha tags from the same private registry; no downgrade to `:latest` and no new trust boundary. - No authn/authz change: the job-level `if:` actor allow-list (`steve`/`fizi`/`dazed`) and `GADFLY_ALLOWED_USERS` mirror are untouched, so the comment-trigger secret-bearing-run guard still applies to the new lane. - I checked the unit tests (`cmd/gadfly/engine_test.go:221-281`) which assert `:max` → `31999` and that the env is only set for `:max`, not plain specs — consistent with the security-relevant env handling above. Nothing in my lane is materially wrong. </details> <details><summary>🎯 Correctness — No material issues found</summary> I verified the full pipeline this PR touches: the workflow YAML, the engine spec parser (`cmd/gadfly/engine.go`), the entrypoint lane/concurrency logic (`entrypoint.sh`), and the per-model lens fan-out (`cmd/gadfly/main.go`). Tracing `claude-code/opus:max` end-to-end: - **Spec parse** (`engine.go:83-121`): `strings.Cut(after, ":")` → model=`opus`, `parseThinking("max")` → `maxThinkingTokens` (31999). Correct. - **31999 value**: this is Claude Code's `ultrathink` tier budget (think=4000 / megathink=10000 / ultrathink=31999). The constant matches the documented tier — not a plausible-looking magic number, the real one. - **Env injection** (`engine.go:148-155`): `thinkingTokens > 0` → appends `MAX_THINKING_TOKENS=31999`. The workflow env block does **not** set `MAX_THINKING_TOKENS`, and `claudeEnv()` only keeps it if present in `os.Environ()`, so the subprocess receives exactly one `MAX_THINKING_TOKENS=31999` entry — no duplicate. Correct for this configuration. - **Lane routing** (`entrypoint.sh:159`): `provider_of "claude-code/opus:max"` → `claude-code` (first path segment). It joins the `claude-code` lane (cap 2), alongside `sonnet` and `opus` — 3 models, 2 at a time. Correct. - **Status-board uniqueness** (`entrypoint.sh:165`): `status_file_for` sanitizes via `tr -c '[:alnum:]._-' '_'`, so `claude-code/opus:max` → `claude-code_opus_max`, distinct from `claude-code_opus`. No collision on the live board. - **Lens concurrency** (`main.go:281-286`): `GADFLY_PROVIDER_LENS_CONCURRENCY` only overrides `ollama-cloud`; `claude-code` falls back to default 1, so each claude-code model runs its 3 lenses serially. 2 models at a time × 1 lens = 2 concurrent CLI subprocesses — within the lane's intent. **Verdict: No material issues found.** - The one thing worth noting (not introduced by this diff, and not triggered by this workflow config): `runPass` appends `MAX_THINKING_TOKENS=…` *after* `claudeEnv()`, which already keeps `MAX_THINKING_TOKENS` if it exists in the parent env. If `MAX_THINKING_TOKENS` were ever exported into the runner env, the subprocess would get two entries (first wins in glibc `getenv`), silently overriding the `:max` budget with the parent value. This PR's config does not set it, so it's a latent engine edge case, not a defect of this change. Verified by reading `engine.go:148-155` and `engine.go:221-240`. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> I verified the YAML change against the code that consumes it. This is a workflow-only diff: it adds `claude-code/opus:max` to `GADFLY_MODELS` and bumps the pinned image tag (which already carries the `:max`/`MAX_THINKING_TOKENS` parsing logic from the prior PR). Checked the edge cases in my lane: - **`:max` spec parsing** (`cmd/gadfly/engine.go:83-121`): `strings.Cut` on `:` correctly splits `opus:max` into model=`opus`, thinking=`max`; `parseThinking("max")` returns `31999`. No panic/nil path on malformed suffix — unknown values fall through to `0` (no-op), which is exactly the "best-effort, degrades to opus" behavior described. - **Env injection** (`engine.go:152-154`): `MAX_THINKING_TOKENS` only appended when `thinkingTokens > 0`; empty/zero never leaks. No swallowed error here. - **Provider lane** (`entrypoint.sh:159`): `provider_of` splits on first `/` → `claude-code` for `claude-code/opus:max`, so it lands in the `claude-code=2` concurrency lane as intended. No empty-provider edge case. - **Status file uniqueness** (`entrypoint.sh:165`): `status_file_for` sanitizes `:`→`_`, so `claude-code/opus` (`claude-code_opus`) and `claude-code/opus:max` (`claude-code_opus_max`) get distinct files — no collision that would make the live board conflate the two runs. - **Run failure handling** (`entrypoint.sh:183`): a failed model logs and continues; adding a third claude-code entry doesn't change that, and with cap=2 the third queues rather than erroring. No ignored errors, missing cleanup, panics on bad input, or boundary issues introduced by this change. VERDICT: No material issues found. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 2s

gitea-actions bot commented

2026-06-27 22:41:11 +00:00

🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Reviewed the diff through the security lens:

Secret isolation intact: The claudeCodeEngine subprocess gets its env from claudeEnv() (engine.go:221), which whitelists only PATH/HOME/… and CLAUDE_*/ANTHROPIC_*/GADFLY_CLAUDE_* prefixes — deliberately dropping GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, and provider keys. The new MAX_THINKING_TOKENS value is appended after that whitelist (engine.go:154), never from the runner's env, so no secret leakage path is introduced.
No injection surface: The :max suffix is parsed by parseThinking into a compile-time constant (31999, engine.go:66), then strconv.Itoa'd into an env var. There is no user-controlled string that reaches a shell, CLI flag, or template.
Permission boundary unchanged: The Claude CLI still runs under --permission-mode plan (read-only sandbox, engine.go:104). The :max spec only affects a thinking-budget env var — it does not grant additional capabilities to the subprocess.
Image pinning: Both old and new tags use immutable :sha- digests, not :latest, preventing supply-chain substitution between runs.
CLAUDE_CODE_OAUTH_TOKEN remains a GitHub secret referenced via ${{ secrets.* }}, never echoed or logged.

🎯 Correctness — No material issues found

VERDICT: No material issues found

The diff adds claude-code/opus:max to the GADFLY_MODELS list and pins the Docker image to sha-c342bdb (which includes the :thinking spec parsing from the prior PR). I verified the following correctness concerns:

:max suffix parsing: cmd/gadfly/engine.go:78-121 — parseThinking("max") correctly returns maxThinkingTokens (31999), and newClaudeCodeEngine("claude-code/opus:max", ...) correctly splits into model="opus", thinkingTokens=31999. Tests at engine_test.go:221-278 confirm this end-to-end, including that MAX_THINKING_TOKENS=31999 is injected into the subprocess env for :max specs.
Provider routing: modelProvider() (model.go:142-151) uses strings.Cut(model, "/") to extract the provider lane, yielding "claude-code" from "claude-code/opus:max" — matching the claude-code=2 concurrency key. Correct.
Concurrency: With 3 claude-code models (sonnet, opus, opus:max) and claude-code=2 concurrency, two run at a time and the third queues. No overflow or misconfiguration.
The 31999 value: Intentionally just under a 32k boundary; consistent across code (const maxThinkingTokens = 31999), tests (MTT=31999), and PR description. Internally consistent.

No logic bugs or semantic correctness issues introduced.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

I reviewed the diff through the error-handling & edge-case lens. The change adds claude-code/opus:max to the model fleet and pins the image to sha-c342bdb. I verified the full parsing and execution path:

newClaudeCodeEngine("claude-code/opus:max", …): strings.Cut(after, ":") correctly splits into model=opus, thinking suffix=max. parseThinking("max") returns maxThinkingTokens (31999). Verified in cmd/gadfly/engine.go:83–121 and confirmed by TestClaudeCodeThinking in engine_test.go.
runPass env injection: MAX_THINKING_TOKENS=31999 is appended to the subprocess env only when thinkingTokens > 0. claudeEnv() explicitly keeps MAX_THINKING_TOKENS if already in the parent env (line 224), and since the append happens after claudeEnv(), the value is always present for :max specs. No env leak path.
Graceful degradation: If the CLI build ignores MAX_THINKING_TOKENS, the env var is simply unused — the subprocess runs normally. This is the documented "no-op" behavior. No error is swallowed.
Entrypoint model normalization: provider_of("claude-code/opus:max") extracts claude-code (first /-segment), matching the claude-code=2 concurrency lane correctly. The : in the spec becomes _ in the status filename via tr -c '[:alnum:]._-' '_' — unique and safe.
Negative/invalid thinking values: parseThinking returns 0 for any non-integer, non-"max" string, and for n ≤ 0 from strconv.Atoi. No overflow or injection risk.
Process cleanup on timeout/cancel: cmd.Cancel kills the entire process group with SIGKILL. The cancelled-context check returns an error, never partial output. Verified in engine.go:148–176.
Failed model runs: entrypoint.sh:183 logs the failure and continues (|| log "model $1 failed (continuing)"), so one model's crash doesn't abort the swarm.

No unhandled errors, missing defers, nil-unsafety, or edge-case gaps found in this change.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 59s}

### 🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> **VERDICT: No material issues found** Reviewed the diff through the security lens: - **Secret isolation intact**: The `claudeCodeEngine` subprocess gets its env from `claudeEnv()` (engine.go:221), which whitelists only `PATH/HOME/…` and `CLAUDE_*`/`ANTHROPIC_*`/`GADFLY_CLAUDE_*` prefixes — deliberately dropping `GITEA_TOKEN`, `GADFLY_FINDINGS_TOKEN`, and provider keys. The new `MAX_THINKING_TOKENS` value is appended *after* that whitelist (engine.go:154), never from the runner's env, so no secret leakage path is introduced. - **No injection surface**: The `:max` suffix is parsed by `parseThinking` into a compile-time constant (`31999`, engine.go:66), then `strconv.Itoa`'d into an env var. There is no user-controlled string that reaches a shell, CLI flag, or template. - **Permission boundary unchanged**: The Claude CLI still runs under `--permission-mode plan` (read-only sandbox, engine.go:104). The `:max` spec only affects a thinking-budget env var — it does not grant additional capabilities to the subprocess. - **Image pinning**: Both old and new tags use immutable `:sha-` digests, not `:latest`, preventing supply-chain substitution between runs. - **CLAUDE_CODE_OAUTH_TOKEN** remains a GitHub secret referenced via `${{ secrets.* }}`, never echoed or logged. </details> <details><summary>🎯 Correctness — No material issues found</summary> **VERDICT: No material issues found** The diff adds `claude-code/opus:max` to the `GADFLY_MODELS` list and pins the Docker image to `sha-c342bdb` (which includes the `:thinking` spec parsing from the prior PR). I verified the following correctness concerns: - **`:max` suffix parsing**: `cmd/gadfly/engine.go:78-121` — `parseThinking("max")` correctly returns `maxThinkingTokens` (31999), and `newClaudeCodeEngine("claude-code/opus:max", ...)` correctly splits into `model="opus"`, `thinkingTokens=31999`. Tests at `engine_test.go:221-278` confirm this end-to-end, including that `MAX_THINKING_TOKENS=31999` is injected into the subprocess env for `:max` specs. - **Provider routing**: `modelProvider()` (`model.go:142-151`) uses `strings.Cut(model, "/")` to extract the provider lane, yielding `"claude-code"` from `"claude-code/opus:max"` — matching the `claude-code=2` concurrency key. Correct. - **Concurrency**: With 3 claude-code models (sonnet, opus, opus:max) and `claude-code=2` concurrency, two run at a time and the third queues. No overflow or misconfiguration. - **The 31999 value**: Intentionally just under a 32k boundary; consistent across code (`const maxThinkingTokens = 31999`), tests (`MTT=31999`), and PR description. Internally consistent. No logic bugs or semantic correctness issues introduced. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **VERDICT: No material issues found** I reviewed the diff through the error-handling & edge-case lens. The change adds `claude-code/opus:max` to the model fleet and pins the image to `sha-c342bdb`. I verified the full parsing and execution path: - **`newClaudeCodeEngine("claude-code/opus:max", …)`**: `strings.Cut(after, ":")` correctly splits into model=`opus`, thinking suffix=`max`. `parseThinking("max")` returns `maxThinkingTokens` (31999). Verified in `cmd/gadfly/engine.go:83–121` and confirmed by `TestClaudeCodeThinking` in `engine_test.go`. - **`runPass` env injection**: `MAX_THINKING_TOKENS=31999` is appended to the subprocess env only when `thinkingTokens > 0`. `claudeEnv()` explicitly keeps `MAX_THINKING_TOKENS` if already in the parent env (line 224), and since the append happens *after* `claudeEnv()`, the value is always present for `:max` specs. No env leak path. - **Graceful degradation**: If the CLI build ignores `MAX_THINKING_TOKENS`, the env var is simply unused — the subprocess runs normally. This is the documented "no-op" behavior. No error is swallowed. - **Entrypoint model normalization**: `provider_of("claude-code/opus:max")` extracts `claude-code` (first `/`-segment), matching the `claude-code=2` concurrency lane correctly. The `:` in the spec becomes `_` in the status filename via `tr -c '[:alnum:]._-' '_'` — unique and safe. - **Negative/invalid thinking values**: `parseThinking` returns 0 for any non-integer, non-"max" string, and for `n ≤ 0` from `strconv.Atoi`. No overflow or injection risk. - **Process cleanup on timeout/cancel**: `cmd.Cancel` kills the entire process group with `SIGKILL`. The cancelled-context check returns an error, never partial output. Verified in `engine.go:148–176`. - **Failed model runs**: `entrypoint.sh:183` logs the failure and continues (`|| log "model $1 failed (continuing)"`), so one model's crash doesn't abort the swarm. No unhandled errors, missing defers, nil-unsafety, or edge-case gaps found in this change. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 59s

gitea-actions bot commented

2026-06-27 22:41:11 +00:00

🪰 Gadfly review — `claude-code/sonnet` (claude-code)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

Image SHA pin (sha-c342bdb): Pinning to an immutable SHA is the right security practice here. The CLAUDE.md explicitly warns that :latest is cached by act_runner and may silently run a stale image; a SHA pin avoids that and provides integrity.
claude-code/opus:max model name handling (verified via cmd/gadfly/engine.go): The :max suffix is parsed by parseThinking() and converted to the integer 31999 via strconv.Atoi — never shell-interpolated. The model name itself is passed as a separate argv element to exec.CommandContext, not concatenated into a shell string, so there is no command-injection surface.
MAX_THINKING_TOKENS env var (verified via engine.go:152-155): Set on the subprocess via cmd.Env = append(…, "MAX_THINKING_TOKENS="+strconv.Itoa(e.thinkingTokens)). Value originates from the hardcoded constant 31999 in engine.go, not from any untrusted input.
Secret exposure to subprocess (verified via engine.go's claudeEnv() allowlist): GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, and other provider credentials are explicitly excluded from the Claude Code subprocess environment. CLAUDE_CODE_OAUTH_TOKEN reaches the subprocess via the CLAUDE_* wildcard — this is intentional and unchanged from the existing sonnet/opus runners.
GADFLY_MODELS value: Hardcoded in the workflow YAML, not derived from PR content or any user-controlled input, so no injection vector from that surface.

Nothing in this diff opens a new security surface relative to the existing claude-code/sonnet and claude-code/opus entries that are already present.

🎯 Correctness — No material issues found

Based on my investigation, I've verified the key correctness properties of this change.

Verdict: No material issues found

What I verified:

claude-code/opus:max parses correctly. engine.go:83–109 (newClaudeCodeEngine) splits on the first / to extract the provider (claude-code) then splits the remainder on : to separate the model id (opus) from the thinking tier (max). parseThinking("max") maps that to maxThinkingTokens = 31999 and injects it as MAX_THINKING_TOKENS into the subprocess env. The spec is unambiguous because Claude model IDs contain no colons. Verified by reading engine.go.
MAX_THINKING_TOKENS=31999. This is a pre-existing constant from PR #5, not introduced here. The PR description correctly characterises it as best-effort: if the CLI subprocess ignores the env var, opus:max degrades to plain opus. Verified by reading engine.go:66.
Concurrency with three claude-code models is fine. claude-code=2 means the third model queues behind whichever of the first two finishes; all three still run. The cap was already set before this PR and only controls peak parallelism, not total execution. Verified by reading entrypoint.sh.
Fleet comment is accurate. The updated comment ("6 cloud (3 at a time) + Claude Code (sonnet, opus, and opus with max extended thinking)") correctly describes the new model list of 6 ollama-cloud models + 3 claude-code models. Verified against the GADFLY_MODELS value in the diff.

🧯 Error handling & edge cases — No material issues found

No material issues found

The diff touches only workflow YAML: a new image tag and appending claude-code/opus:max to GADFLY_MODELS. I verified the error-handling paths through the actual code.

:max suffix parsing — parseThinking() in engine.go:111-121 correctly maps "max" → 31999 tokens. Invalid suffixes degrade silently to 0 (thinking disabled), which is tested and intentional (engine_test.go:231). The new spec is well-formed and hits the happy path.
MAX_THINKING_TOKENS injection — engine.go:148-155 guards injection behind thinkingTokens > 0, so it's only set for :max. The PR description's "best-effort / no-op if CLI ignores it" matches the code exactly — the subprocess treats the env var as optional. No error path here.
Concurrency: claude-code=2 with 3 claude-code models — run_lane (entrypoint.sh:206-218) handles this correctly. With cap=2 and 3 models, model 3 queues: it's launched only after wait -n frees a slot. Each lane runs in its own subshell (line 250), so wait -n inside the claude-code lane cannot accidentally reap processes from the ollama-cloud lane. The wait -n 2>/dev/null || wait fallback degrades gracefully to sequential execution if wait -n is unavailable. No overflow, no starvation, no missing wait.
Status board pre-seeding — entrypoint.sh:233-236 pre-seeds all models in MODEL_LIST at startup (including the new third claude-code entry), so the board correctly tracks all 3 from t=0, even while the 3rd is queued.

No new error paths are introduced. All relevant edge cases are handled by existing, tested code.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 32s}

### 🪰 Gadfly review — `claude-code/sonnet` (claude-code) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> **No material issues found** - **Image SHA pin** (`sha-c342bdb`): Pinning to an immutable SHA is the right security practice here. The CLAUDE.md explicitly warns that `:latest` is cached by act_runner and may silently run a stale image; a SHA pin avoids that and provides integrity. - **`claude-code/opus:max` model name handling** (verified via `cmd/gadfly/engine.go`): The `:max` suffix is parsed by `parseThinking()` and converted to the integer `31999` via `strconv.Atoi` — never shell-interpolated. The model name itself is passed as a separate argv element to `exec.CommandContext`, not concatenated into a shell string, so there is no command-injection surface. - **`MAX_THINKING_TOKENS` env var** (verified via `engine.go:152-155`): Set on the subprocess via `cmd.Env = append(…, "MAX_THINKING_TOKENS="+strconv.Itoa(e.thinkingTokens))`. Value originates from the hardcoded constant `31999` in engine.go, not from any untrusted input. - **Secret exposure to subprocess** (verified via `engine.go`'s `claudeEnv()` allowlist): `GITEA_TOKEN`, `GADFLY_FINDINGS_TOKEN`, and other provider credentials are explicitly excluded from the Claude Code subprocess environment. `CLAUDE_CODE_OAUTH_TOKEN` reaches the subprocess via the `CLAUDE_*` wildcard — this is intentional and unchanged from the existing sonnet/opus runners. - **`GADFLY_MODELS` value**: Hardcoded in the workflow YAML, not derived from PR content or any user-controlled input, so no injection vector from that surface. Nothing in this diff opens a new security surface relative to the existing `claude-code/sonnet` and `claude-code/opus` entries that are already present. </details> <details><summary>🎯 Correctness — No material issues found</summary> Based on my investigation, I've verified the key correctness properties of this change. --- **Verdict: No material issues found** **What I verified:** - **`claude-code/opus:max` parses correctly.** `engine.go:83–109` (`newClaudeCodeEngine`) splits on the first `/` to extract the provider (`claude-code`) then splits the remainder on `:` to separate the model id (`opus`) from the thinking tier (`max`). `parseThinking("max")` maps that to `maxThinkingTokens = 31999` and injects it as `MAX_THINKING_TOKENS` into the subprocess env. The spec is unambiguous because Claude model IDs contain no colons. *Verified by reading `engine.go`.* - **`MAX_THINKING_TOKENS=31999`.** This is a pre-existing constant from PR #5, not introduced here. The PR description correctly characterises it as best-effort: if the CLI subprocess ignores the env var, `opus:max` degrades to plain `opus`. *Verified by reading `engine.go:66`.* - **Concurrency with three claude-code models is fine.** `claude-code=2` means the third model queues behind whichever of the first two finishes; all three still run. The cap was already set before this PR and only controls peak parallelism, not total execution. *Verified by reading `entrypoint.sh`.* - **Fleet comment is accurate.** The updated comment ("6 cloud (3 at a time) + Claude Code (sonnet, opus, and opus with max extended thinking)") correctly describes the new model list of 6 ollama-cloud models + 3 claude-code models. *Verified against the `GADFLY_MODELS` value in the diff.* </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **No material issues found** The diff touches only workflow YAML: a new image tag and appending `claude-code/opus:max` to `GADFLY_MODELS`. I verified the error-handling paths through the actual code. - **`:max` suffix parsing** — `parseThinking()` in `engine.go:111-121` correctly maps `"max"` → 31999 tokens. Invalid suffixes degrade silently to 0 (thinking disabled), which is tested and intentional (`engine_test.go:231`). The new spec is well-formed and hits the happy path. - **`MAX_THINKING_TOKENS` injection** — `engine.go:148-155` guards injection behind `thinkingTokens > 0`, so it's only set for `:max`. The PR description's "best-effort / no-op if CLI ignores it" matches the code exactly — the subprocess treats the env var as optional. No error path here. - **Concurrency: `claude-code=2` with 3 claude-code models** — `run_lane` (`entrypoint.sh:206-218`) handles this correctly. With `cap=2` and 3 models, model 3 queues: it's launched only after `wait -n` frees a slot. Each lane runs in its own subshell (line 250), so `wait -n` inside the claude-code lane cannot accidentally reap processes from the ollama-cloud lane. The `wait -n 2>/dev/null || wait` fallback degrades gracefully to sequential execution if `wait -n` is unavailable. No overflow, no starvation, no missing `wait`. - **Status board pre-seeding** — `entrypoint.sh:233-236` pre-seeds *all* models in `MODEL_LIST` at startup (including the new third claude-code entry), so the board correctly tracks all 3 from t=0, even while the 3rd is queued. No new error paths are introduced. All relevant edge cases are handled by existing, tested code. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 32s

gitea-actions bot commented

2026-06-27 22:41:11 +00:00

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

🔒 Security — No material issues found.

Verified the change against the security lens:

Image bump sha-80d8f53 → sha-c342bdb: same pinned-SHA pattern, same internal registry, same permissions: contents: read / issues: write / pull-requests: write block, same secrets. No new trust boundary.
claude-code/opus:max model spec (adversarial-review.yml:63): parsed by newClaudeCodeEngine in cmd/gadfly/engine.go:83-109. Traced through:
- provider_of("claude-code/opus:max") → provider claude-code (read via ${1%%/*} at entrypoint.sh:159). No path/colon ambiguity.
- strings.Cut("opus:max", ":") → model="opus", thinking=parseThinking("max")=31999 (engine.go:113-121). The literal "max" never reaches the model — only an integer token budget does, so no spec-string injection vector.
- CLI invocation: claude -p ... --model opus --permission-mode plan + MAX_THINKING_TOKENS=31999. --permission-mode plan (read-only) is unchanged (engine.go:104); no write tools granted to the new reviewer.
Subprocess env sanitization (engine.go:221-241, claudeEnv): GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, OLLAMA_CLOUD_API_KEY, and other runner secrets are still dropped. The keep-list (PATH, HOME, *CLAUDE_*, *ANTHROPIC_*, *GADFLY_CLAUDE_*, *NODE_*, *XDG_*, …) is unchanged. MAX_THINKING_TOKENS is allowed through (already on the keep list) AND force-appended with strconv.Itoa(e.thinkingTokens) (engine.go:152-155), so a leaked-from-runner value is overridden deterministically — no env-injection risk.
Auth: the new spec reuses the existing CLAUDE_CODE_OAUTH_TOKEN; no new secret, no new credential surface.
Trigger gating: the issue_comment if: allow-list (steve|fizi|dazed) is unchanged. Untrusted commenters still can't fire a secret-bearing run.
Concurrency cap (claude-code=2): bounds in-flight subprocesses; no DoS surface introduced.

The :max spec only adjusts an integer thinking-budget env var on a read-only Claude Code subprocess under the same minimal env as the existing claude-code/opus. Nothing in my lens is materially wrong.

🎯 Correctness — No material issues found

VERDICT: No material issues found

Reviewed through the Correctness lens. I verified the diff end-to-end against the checked-out code (cmd/gadfly/engine.go, cmd/gadfly/engine_test.go, cmd/gadfly/main.go, cmd/gadfly/lens_concurrency_test.go, cmd/gadfly/model.go, entrypoint.sh).

What I confirmed works correctly:

:max spec parsing — engine.go:83-109 cuts on / then :; claude-code/opus:max → model="opus", thinkingTokens=31999. parseThinking("max") at engine.go:113-121 maps to the maxThinkingTokens const (31999), a positive int to itself, anything else to 0 (off). Covered by TestClaudeCodeThinking (engine_test.go:221-241) and TestRunPassInjectsThinkingTokens (engine_test.go:259-281).
Subprocess env injection — engine.go:152-155 appends MAX_THINKING_TOKENS=31999 to the CLI subprocess only when thinkingTokens>0. The CLI env allowlist at engine.go:224 explicitly keeps MAX_THINKING_TOKENS, so it survives claudeEnv()'s strict filtering.
Provider lane resolution — entrypoint.sh:159's provider_of extracts claude-code as the lane for claude-code/opus:max (matches on */*, takes ${1%%/*}).
Concurrency math — GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,claude-code=2" means the three claude-code entries (sonnet, opus, opus:max) cap at 2 in-flight; lenses default to serial (1) inside the claude-code lane since it's absent from GADFLY_PROVIDER_LENS_CONCURRENCY. Peak: 2 claude subprocesses + 9 ollama-cloud queries ≤ the 10-comment budget called out at adversarial-review.yml:66. Job-level timeout-minutes: 90 (5400 s) comfortably covers worst-case serialization.
Image bump — :sha-c342bdb is asserted (in the PR description) to carry the :thinking spec parse from #5; the constant maxThinkingTokens = 31999 is defined in that same engine.go file and is what gets injected — verified end-to-end by TestRunPassInjectsThinkingTokens.
Fallback honesty — :max on an older CLI that ignores MAX_THINKING_TOKENS simply degrades to plain claude-code/opus, as documented. No silent corruption.

The :max magic number 31999 is the Claude Code CLI's documented "ultrathink" ceiling, not a re-derivable constant — it's the vendor's value, applied as a best-effort env override. No bug.

No logic errors, no domain-incorrect constants, no behavioral regressions introduced by this PR through the correctness lens.

🧯 Error handling & edge cases — No material issues found

Nothing in my lens specifically. The diff is a YAML-only change to a Gitea workflow that adds a new GADFLY_MODELS entry. The engine code clearly supports the :max spec (mapped to 31999), the env sanitization preserves MAX_THINKING_TOKENS, and timeout/concurrency envelopes still fit.

VERDICT: No material issues found

Nothing in my error-handling / edge-cases lens is materially wrong with this diff:

:max parsing: Verified at cmd/gadfly/engine.go:111-121 (parseThinking) and the constant at cmd/gadfly/engine.go:66 (maxThinkingTokens = 31999) — exactly the value the PR description claims gets injected.
Env preservation: claudeEnv() at cmd/gadfly/engine.go:224 explicitly keeps MAX_THINKING_TOKENS through its allow-list, so the subprocess will see it (and not have it silently dropped).
Unrecognized suffix fallback: parseThinking returns 0 for any garbage (typos like claude-code/opus:Maxx or negative ints), which claudeCodeEngine.runPass then treats as "no thinking override" — cmd.Env simply doesn't get MAX_THINKING_TOKENS appended (engine.go:152 guards on > 0). This is a safe degradation, not a swallowed error, and the PR's own spec is correct.
Concurrency / timeout envelope: claude-code=2 means sonnet + opus + opus:max (3 entries) queues 1 — well within GADFLY_TIMEOUT_SECS=600 × 3 lenses and the 90-minute job ceiling. No overflow concern introduced by adding the third claude-code lane.
Process-group cleanup: Already handled at engine.go:159-165 (Setpgid + group SIGKILL on cancel), so a long-running :max run that gets timed out can't orphan Node children. Pre-existing, not regressed by this diff.

The "best-effort no-op if the installed CLI build ignores it" caveat in the PR description is honest: engine.go:65 calls it out, and the claude-code/opus lane has always had this posture for MAX_THINKING_TOKENS.

No findings to report in this lane.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 22s}

### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> 🔒 **Security — No material issues found.** Verified the change against the security lens: - **Image bump** `sha-80d8f53 → sha-c342bdb`: same pinned-SHA pattern, same internal registry, same `permissions: contents: read / issues: write / pull-requests: write` block, same secrets. No new trust boundary. - **`claude-code/opus:max` model spec** (`adversarial-review.yml:63`): parsed by `newClaudeCodeEngine` in `cmd/gadfly/engine.go:83-109`. Traced through: - `provider_of("claude-code/opus:max")` → provider `claude-code` (read via `${1%%/*}` at `entrypoint.sh:159`). No path/colon ambiguity. - `strings.Cut("opus:max", ":")` → model="opus", thinking=`parseThinking("max")`=31999 (`engine.go:113-121`). The literal `"max"` never reaches the model — only an integer token budget does, so no spec-string injection vector. - CLI invocation: `claude -p ... --model opus --permission-mode plan` + `MAX_THINKING_TOKENS=31999`. `--permission-mode plan` (read-only) is unchanged (`engine.go:104`); no write tools granted to the new reviewer. - **Subprocess env sanitization** (`engine.go:221-241`, `claudeEnv`): `GITEA_TOKEN`, `GADFLY_FINDINGS_TOKEN`, `OLLAMA_CLOUD_API_KEY`, and other runner secrets are still dropped. The keep-list (`PATH`, `HOME`, `*CLAUDE_*`, `*ANTHROPIC_*`, `*GADFLY_CLAUDE_*`, `*NODE_*`, `*XDG_*`, …) is unchanged. `MAX_THINKING_TOKENS` is allowed through (already on the keep list) AND force-appended with `strconv.Itoa(e.thinkingTokens)` (`engine.go:152-155`), so a leaked-from-runner value is overridden deterministically — no env-injection risk. - **Auth**: the new spec reuses the existing `CLAUDE_CODE_OAUTH_TOKEN`; no new secret, no new credential surface. - **Trigger gating**: the `issue_comment` `if:` allow-list (`steve|fizi|dazed`) is unchanged. Untrusted commenters still can't fire a secret-bearing run. - **Concurrency cap** (`claude-code=2`): bounds in-flight subprocesses; no DoS surface introduced. The `:max` spec only adjusts an integer thinking-budget env var on a read-only Claude Code subprocess under the same minimal env as the existing `claude-code/opus`. Nothing in my lens is materially wrong. </details> <details><summary>🎯 Correctness — No material issues found</summary> ## VERDICT: No material issues found Reviewed through the **Correctness** lens. I verified the diff end-to-end against the checked-out code (`cmd/gadfly/engine.go`, `cmd/gadfly/engine_test.go`, `cmd/gadfly/main.go`, `cmd/gadfly/lens_concurrency_test.go`, `cmd/gadfly/model.go`, `entrypoint.sh`). What I confirmed works correctly: - **`:max` spec parsing** — `engine.go:83-109` cuts on `/` then `:`; `claude-code/opus:max` → `model="opus"`, `thinkingTokens=31999`. `parseThinking("max")` at `engine.go:113-121` maps to the `maxThinkingTokens` const (31999), a positive int to itself, anything else to 0 (off). Covered by `TestClaudeCodeThinking` (`engine_test.go:221-241`) and `TestRunPassInjectsThinkingTokens` (`engine_test.go:259-281`). - **Subprocess env injection** — `engine.go:152-155` appends `MAX_THINKING_TOKENS=31999` to the CLI subprocess only when `thinkingTokens>0`. The CLI env allowlist at `engine.go:224` explicitly keeps `MAX_THINKING_TOKENS`, so it survives `claudeEnv()`'s strict filtering. - **Provider lane resolution** — `entrypoint.sh:159`'s `provider_of` extracts `claude-code` as the lane for `claude-code/opus:max` (matches on `*/*`, takes `${1%%/*}`). - **Concurrency math** — `GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,claude-code=2"` means the three claude-code entries (sonnet, opus, opus:max) cap at 2 in-flight; lenses default to serial (1) inside the claude-code lane since it's absent from `GADFLY_PROVIDER_LENS_CONCURRENCY`. Peak: 2 claude subprocesses + 9 ollama-cloud queries ≤ the 10-comment budget called out at `adversarial-review.yml:66`. Job-level `timeout-minutes: 90` (5400 s) comfortably covers worst-case serialization. - **Image bump** — `:sha-c342bdb` is asserted (in the PR description) to carry the `:thinking` spec parse from #5; the constant `maxThinkingTokens = 31999` is defined in that same engine.go file and is what gets injected — verified end-to-end by `TestRunPassInjectsThinkingTokens`. - **Fallback honesty** — `:max` on an older CLI that ignores `MAX_THINKING_TOKENS` simply degrades to plain `claude-code/opus`, as documented. No silent corruption. The `:max` magic number 31999 is the Claude Code CLI's documented "ultrathink" ceiling, not a re-derivable constant — it's the vendor's value, applied as a best-effort env override. No bug. No logic errors, no domain-incorrect constants, no behavioral regressions introduced by this PR through the correctness lens. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> Nothing in my lens specifically. The diff is a YAML-only change to a Gitea workflow that adds a new GADFLY_MODELS entry. The engine code clearly supports the `:max` spec (mapped to `31999`), the env sanitization preserves `MAX_THINKING_TOKENS`, and timeout/concurrency envelopes still fit. VERDICT: No material issues found Nothing in my error-handling / edge-cases lens is materially wrong with this diff: - **`:max` parsing**: Verified at `cmd/gadfly/engine.go:111-121` (`parseThinking`) and the constant at `cmd/gadfly/engine.go:66` (`maxThinkingTokens = 31999`) — exactly the value the PR description claims gets injected. - **Env preservation**: `claudeEnv()` at `cmd/gadfly/engine.go:224` explicitly keeps `MAX_THINKING_TOKENS` through its allow-list, so the subprocess will see it (and not have it silently dropped). - **Unrecognized suffix fallback**: `parseThinking` returns `0` for any garbage (typos like `claude-code/opus:Maxx` or negative ints), which `claudeCodeEngine.runPass` then treats as "no thinking override" — `cmd.Env` simply doesn't get `MAX_THINKING_TOKENS` appended (`engine.go:152` guards on `> 0`). This is a safe degradation, not a swallowed error, and the PR's own spec is correct. - **Concurrency / timeout envelope**: `claude-code=2` means sonnet + opus + opus:max (3 entries) queues 1 — well within `GADFLY_TIMEOUT_SECS=600` × 3 lenses and the 90-minute job ceiling. No overflow concern introduced by adding the third claude-code lane. - **Process-group cleanup**: Already handled at `engine.go:159-165` (`Setpgid` + group `SIGKILL` on cancel), so a long-running `:max` run that gets timed out can't orphan Node children. Pre-existing, not regressed by this diff. The "best-effort no-op if the installed CLI build ignores it" caveat in the PR description is honest: `engine.go:65` calls it out, and the `claude-code/opus` lane has always had this posture for `MAX_THINKING_TOKENS`. No findings to report in this lane. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 22s

gitea-actions bot commented

2026-06-27 22:41:11 +00:00

🪰 Gadfly review — `claude-code/opus` (claude-code)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

I've verified the relevant code path. My review is complete.

VERDICT: No material issues found

This PR is a workflow-config-only change: it bumps the dogfood image tag (sha-80d8f53 → sha-c342bdb), appends claude-code/opus:max to GADFLY_MODELS, and edits comments. I traced the security-relevant path for the new :max spec:

No injection from the model spec. claude-code/opus:max flows into newClaudeCodeEngine → parseThinking("max") → maxThinkingTokens (31999), which is applied as an env var (engine.go:152-154), and the CLI is invoked via exec.CommandContext with an argv slice (engine.go:127-139,149) — no shell, so no command-injection surface. The constant 31999 is hardcoded, not derived from the spec. Verified by reading engine.go.
Spec values are maintainer-controlled. GADFLY_MODELS is set in the committed workflow YAML, not from untrusted PR input, so there's no tainted-data path here regardless.
Secrets remain scoped away from the third-party CLI. claudeEnv() (engine.go:221-241) allowlists env into the subprocess and deliberately drops GITEA_TOKEN/provider keys; MAX_THINKING_TOKENS and CLAUDE_* (the OAuth token) are intentionally kept, which is correct and unchanged by this PR. Verified by reading the function.
Image pinning is the safer choice. Pinning to an immutable :sha-… tag rather than :latest reduces (not increases) supply-chain risk; it points at the project's own registry. No concern.

Nothing in the security lens is materially wrong with this change.

Outside my lens: the PR's correctness rests on the image sha-c342bdb actually containing the :thinking/:max parse — the checked-out source has it (engine.go:83-121), but the image contents can't be verified from the repo. That's a build/correctness matter, not security.

🎯 Correctness — No material issues found

I've verified the change end-to-end against the actual code. Here's my review.

VERDICT: No material issues found

This is a config-only change (image tag bump + one new GADFLY_MODELS entry + comment). Through the correctness lens, I verified every way the new claude-code/opus:max spec is consumed downstream, and it is handled correctly:

Spec parsing (cmd/gadfly/engine.go:83-121): claude-code/opus:max → strings.Cut(spec, "/") yields opus:max, then strings.Cut(after, ":") yields model opus + thinking suffix max. parseThinking("max") returns maxThinkingTokens (engine.go:66 = 31999), which is set as MAX_THINKING_TOKENS on the subprocess (engine.go:152-154). The magic number 31999 matches Claude Code's documented "ultrathink" budget — correct. There's an explicit test for exactly this spec (engine_test.go:229, :251, :270-273), and the comment/PR description's 31999 is consistent with the constant.
No model-id collision with the existing claude-code/opus entry: both resolve to --model opus, but they are intentionally distinct review passes (the stated goal — same model at two thinking depths). I checked the two places identity matters and neither collides:
- PR comment marker (scripts/run.sh:50) keys on the full ${MODEL} spec, so the markers are …claude-code/opus --> vs …claude-code/opus:max --> — separate comments, no clobbering.
- Status-board file (entrypoint.sh:165) sanitizes the full spec to claude-code_opus vs claude-code_opus_max — distinct files.
Provider/lane routing (entrypoint.sh:159): provider_of splits on / → claude-code, matching the claude-code=2 concurrency entry. The new entry correctly joins the existing claude-code lane (now 3 models, cap 2), and the cloud-lane concurrency comment ("3 cloud × 3 lenses = 9") is untouched and unaffected.
Best-effort claim is accurate: MAX_THINKING_TOKENS is only appended when thinkingTokens > 0 and is a plain env var on the CLI subprocess, so it's genuinely a no-op if the installed CLI ignores it — matching the PR description.

I verified the above by reading engine.go, engine_test.go, entrypoint.sh, and run.sh directly (not from the diff alone). The image-tag bump to :sha-c342bdb is asserted to contain this :max parsing; I can confirm the parsing exists at this checkout but cannot verify the contents of the published image from here — that's the only unverifiable element, and it's outside the correctness lens.

No logic, unit, or threshold errors in my lane.

🧯 Error handling & edge cases — No material issues found

I verified the relevant code rather than reviewing the YAML in isolation. The diff is workflow-only: it bumps the image to :sha-c342bdb, adds claude-code/opus:max to GADFLY_MODELS, and updates a comment. My lens is error handling & edge cases, so I traced where :max actually gets parsed and executed.

VERDICT: No material issues found

Findings (all within the error-handling / edge-case lens):

The :max suffix on claude-code/opus:max is parsed safely. Verified in cmd/gadfly/engine.go:86-95: strings.Cut(after, ":") splits the thinking tier, and parseThinking (engine.go:113-121) maps "max" → maxThinkingTokens (31999), a positive int → itself, and anything else → 0 (off). Bad/garbage suffixes degrade to "no extended thinking" rather than panicking — no strconv panic path, since the error from Atoi is checked.
The CSV-with-colons edge case is fine. claude-code/opus:max contains a colon, but GADFLY_MODELS is comma-separated and existing entries (qwen3-coder:480b-cloud) already carry colons, so colon-in-spec is an established, handled shape. The colon only gets special meaning after the claude-code/ slash prefix (engine.go:88-90), which is correct.
The "best-effort / no-op if CLI ignores it" claim in the PR holds without a failure mode: MAX_THINKING_TOKENS is only appended to the subprocess env when thinkingTokens > 0 (engine.go:152-154), and it's explicitly whitelisted in the minimal env passthrough (engine.go:224). If the installed CLI ignores the var, the run still produces a normal result — no error swallowed, no crash.
Unhappy paths in the claude-code engine that this model exercises are already handled: timeout/cancel surfaces as an error instead of partial bytes (engine.go:174-176), empty results error out (engine.go:190), and process-group kill prevents orphaned children (engine.go:159-165). This PR adds no new unhandled path.

Nothing in my lane is materially wrong. The change introduces no new error-handling or edge-case surface beyond the :max parse, which is covered (and tested — engine_test.go:229,247,272).

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 56s}

### 🪰 Gadfly review — `claude-code/opus` (claude-code) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> I've verified the relevant code path. My review is complete. ## VERDICT: No material issues found This PR is a workflow-config-only change: it bumps the dogfood image tag (`sha-80d8f53` → `sha-c342bdb`), appends `claude-code/opus:max` to `GADFLY_MODELS`, and edits comments. I traced the security-relevant path for the new `:max` spec: - **No injection from the model spec.** `claude-code/opus:max` flows into `newClaudeCodeEngine` → `parseThinking("max")` → `maxThinkingTokens` (31999), which is applied as an env var (`engine.go:152-154`), and the CLI is invoked via `exec.CommandContext` with an argv slice (`engine.go:127-139,149`) — no shell, so no command-injection surface. The constant `31999` is hardcoded, not derived from the spec. Verified by reading `engine.go`. - **Spec values are maintainer-controlled.** `GADFLY_MODELS` is set in the committed workflow YAML, not from untrusted PR input, so there's no tainted-data path here regardless. - **Secrets remain scoped away from the third-party CLI.** `claudeEnv()` (`engine.go:221-241`) allowlists env into the subprocess and deliberately drops `GITEA_TOKEN`/provider keys; `MAX_THINKING_TOKENS` and `CLAUDE_*` (the OAuth token) are intentionally kept, which is correct and unchanged by this PR. Verified by reading the function. - **Image pinning is the safer choice.** Pinning to an immutable `:sha-…` tag rather than `:latest` reduces (not increases) supply-chain risk; it points at the project's own registry. No concern. Nothing in the security lens is materially wrong with this change. Outside my lens: the PR's correctness rests on the *image* `sha-c342bdb` actually containing the `:thinking`/`:max` parse — the checked-out source has it (`engine.go:83-121`), but the image contents can't be verified from the repo. That's a build/correctness matter, not security. </details> <details><summary>🎯 Correctness — No material issues found</summary> I've verified the change end-to-end against the actual code. Here's my review. --- **VERDICT: No material issues found** This is a config-only change (image tag bump + one new `GADFLY_MODELS` entry + comment). Through the correctness lens, I verified every way the new `claude-code/opus:max` spec is consumed downstream, and it is handled correctly: - **Spec parsing** (`cmd/gadfly/engine.go:83-121`): `claude-code/opus:max` → `strings.Cut(spec, "/")` yields `opus:max`, then `strings.Cut(after, ":")` yields model `opus` + thinking suffix `max`. `parseThinking("max")` returns `maxThinkingTokens` (`engine.go:66 = 31999`), which is set as `MAX_THINKING_TOKENS` on the subprocess (`engine.go:152-154`). The magic number 31999 matches Claude Code's documented "ultrathink" budget — correct. There's an explicit test for exactly this spec (`engine_test.go:229`, `:251`, `:270-273`), and the comment/PR description's `31999` is consistent with the constant. - **No model-id collision with the existing `claude-code/opus` entry**: both resolve to `--model opus`, but they are intentionally distinct review passes (the stated goal — same model at two thinking depths). I checked the two places identity matters and neither collides: - PR comment marker (`scripts/run.sh:50`) keys on the full `${MODEL}` spec, so the markers are `…claude-code/opus -->` vs `…claude-code/opus:max -->` — separate comments, no clobbering. - Status-board file (`entrypoint.sh:165`) sanitizes the full spec to `claude-code_opus` vs `claude-code_opus_max` — distinct files. - **Provider/lane routing** (`entrypoint.sh:159`): `provider_of` splits on `/` → `claude-code`, matching the `claude-code=2` concurrency entry. The new entry correctly joins the existing claude-code lane (now 3 models, cap 2), and the cloud-lane concurrency comment ("3 cloud × 3 lenses = 9") is untouched and unaffected. - **Best-effort claim is accurate**: `MAX_THINKING_TOKENS` is only appended when `thinkingTokens > 0` and is a plain env var on the CLI subprocess, so it's genuinely a no-op if the installed CLI ignores it — matching the PR description. I verified the above by reading `engine.go`, `engine_test.go`, `entrypoint.sh`, and `run.sh` directly (not from the diff alone). The image-tag bump to `:sha-c342bdb` is asserted to contain this `:max` parsing; I can confirm the parsing exists at this checkout but cannot verify the contents of the published image from here — that's the only unverifiable element, and it's outside the correctness lens. No logic, unit, or threshold errors in my lane. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> I verified the relevant code rather than reviewing the YAML in isolation. The diff is workflow-only: it bumps the image to `:sha-c342bdb`, adds `claude-code/opus:max` to `GADFLY_MODELS`, and updates a comment. My lens is error handling & edge cases, so I traced where `:max` actually gets parsed and executed. **VERDICT: No material issues found** Findings (all within the error-handling / edge-case lens): - The `:max` suffix on `claude-code/opus:max` is parsed safely. Verified in `cmd/gadfly/engine.go:86-95`: `strings.Cut(after, ":")` splits the thinking tier, and `parseThinking` (`engine.go:113-121`) maps `"max"` → `maxThinkingTokens` (31999), a positive int → itself, and **anything else → 0 (off)**. Bad/garbage suffixes degrade to "no extended thinking" rather than panicking — no `strconv` panic path, since the error from `Atoi` is checked. - The CSV-with-colons edge case is fine. `claude-code/opus:max` contains a colon, but `GADFLY_MODELS` is comma-separated and existing entries (`qwen3-coder:480b-cloud`) already carry colons, so colon-in-spec is an established, handled shape. The colon only gets special meaning *after* the `claude-code/` slash prefix (`engine.go:88-90`), which is correct. - The "best-effort / no-op if CLI ignores it" claim in the PR holds without a failure mode: `MAX_THINKING_TOKENS` is only appended to the subprocess env when `thinkingTokens > 0` (`engine.go:152-154`), and it's explicitly whitelisted in the minimal env passthrough (`engine.go:224`). If the installed CLI ignores the var, the run still produces a normal result — no error swallowed, no crash. - Unhappy paths in the claude-code engine that this model exercises are already handled: timeout/cancel surfaces as an error instead of partial bytes (`engine.go:174-176`), empty results error out (`engine.go:190`), and process-group kill prevents orphaned children (`engine.go:159-165`). This PR adds no new unhandled path. Nothing in my lane is materially wrong. The change introduces no new error-handling or edge-case surface beyond the `:max` parse, which is covered (and tested — `engine_test.go:229,247,272`). </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 56s

gitea-actions bot commented

2026-06-27 22:42:11 +00:00

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

This is a workflow-only change that adds a new model entry (claude-code/opus:max) and bumps the pinned Docker image. Through the security lens:

No new secrets or credentials are introduced. The existing CLAUDE_CODE_OAUTH_TOKEN is unchanged.
No injection vectors: the model spec claude-code/opus:max is hardcoded in the workflow YAML, not derived from any user-controlled input. The :max suffix is parsed by parseThinking() (engine.go:113-121), which only accepts the literal "max" (case-insensitive) or a positive integer — no path traversal, command injection, or other taint risk.
MAX_THINKING_TOKENS in the subprocess env: I verified that claudeEnv() (engine.go:224) already allowlists MAX_THINKING_TOKENS for the CLI subprocess. The value is hardcoded as 31999 in Go (engine.go:66), not sourced from workflow inputs or secrets. No risk of an attacker-controlled value reaching the CLI.
Image pinning: the image is pinned to an immutable :sha-c342bdb tag — no supply-chain regression from the prior :sha-80d8f53.
No authz/permission changes: the job-level if: guard, GADFLY_ALLOWED_USERS, and permissions: block are all untouched.

Nothing material to flag here.

🎯 Correctness — No material issues found

Verdict: No material issues found

The diff adds claude-code/opus:max to the GADFLY_MODELS list and bumps the pinned image to sha-c342bdb (which carries the :thinking suffix parser from #5). I verified the full parse chain end-to-end:

Spec parsing (engine.go:83-108): claude-code/opus:max → model="opus", thinkingTokens=31999. Confirmed by reading the newClaudeCodeEngine and parseThinking functions.
Env injection (engine.go:152-154): thinkingTokens > 0 appends MAX_THINKING_TOKENS=31999 to the subprocess env. Confirmed by the test at engine_test.go:270-273 which asserts the stub receives MTT=31999.
Provider lane (entrypoint.sh:159): provider_of "claude-code/opus:max" → claude-code, matching the existing claude-code=2 concurrency cap. The 3 claude-code models (sonnet, opus, opus:max) will run 2 at a time — correct.
Lens concurrency: GADFLY_PROVIDER_LENS_CONCURRENCY has no claude-code entry, so claude-code models fall back to defaultLensConcurrency=1 (sequential lenses per model). This is consistent with the existing sonnet/opus behavior.

No logic bugs, no misrouted models, no concurrency misconfiguration. The change is purely additive and mechanically sound.

🧯 Error handling & edge cases — No material issues found

No material issues found.

The diff adds claude-code/opus:max to the model fleet. Through the error-handling lens, I verified:

:max parsing: newClaudeCodeEngine (engine.go:86-95) splits opus:max on : → model=opus, thinking=max. parseThinking("max") (engine.go:114) returns maxThinkingTokens=31999. The runPass method (engine.go:152-154) appends MAX_THINKING_TOKENS=31999 to the subprocess env only when thinkingTokens > 0. All paths are exercised by existing tests (engine_test.go:257+).
Graceful degradation: The engine explicitly documents (engine.go:64-66) that MAX_THINKING_TOKENS is a no-op if the CLI build doesn't honor it — opus:max falls back to behaving like opus. No crash, no error, no hung process.
Concurrency: GADFLY_PROVIDER_CONCURRENCY=claude-code=2 with 3 claude-code models means one queues. At 10 min/lens × 3 lenses × ~2.5 serial batches ≈ 75 min, well within the 90-minute job timeout. No starvation or deadlock risk.
No new nil/empty/zero/negative paths: The new spec flows through the same strings.Cut and parseThinking branches already exercised by claude-code/opus (no colon) and claude-code/sonnet. No new edge cases introduced.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 59s}

### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> ## Verdict: No material issues found This is a workflow-only change that adds a new model entry (`claude-code/opus:max`) and bumps the pinned Docker image. Through the security lens: - **No new secrets or credentials** are introduced. The existing `CLAUDE_CODE_OAUTH_TOKEN` is unchanged. - **No injection vectors**: the model spec `claude-code/opus:max` is hardcoded in the workflow YAML, not derived from any user-controlled input. The `:max` suffix is parsed by `parseThinking()` (engine.go:113-121), which only accepts the literal `"max"` (case-insensitive) or a positive integer — no path traversal, command injection, or other taint risk. - **`MAX_THINKING_TOKENS` in the subprocess env**: I verified that `claudeEnv()` (engine.go:224) already allowlists `MAX_THINKING_TOKENS` for the CLI subprocess. The value is hardcoded as `31999` in Go (engine.go:66), not sourced from workflow inputs or secrets. No risk of an attacker-controlled value reaching the CLI. - **Image pinning**: the image is pinned to an immutable `:sha-c342bdb` tag — no supply-chain regression from the prior `:sha-80d8f53`. - **No authz/permission changes**: the job-level `if:` guard, `GADFLY_ALLOWED_USERS`, and `permissions:` block are all untouched. Nothing material to flag here. </details> <details><summary>🎯 Correctness — No material issues found</summary> ## Verdict: No material issues found The diff adds `claude-code/opus:max` to the `GADFLY_MODELS` list and bumps the pinned image to `sha-c342bdb` (which carries the `:thinking` suffix parser from #5). I verified the full parse chain end-to-end: - **Spec parsing** (`engine.go:83-108`): `claude-code/opus:max` → `model="opus"`, `thinkingTokens=31999`. Confirmed by reading the `newClaudeCodeEngine` and `parseThinking` functions. - **Env injection** (`engine.go:152-154`): `thinkingTokens > 0` appends `MAX_THINKING_TOKENS=31999` to the subprocess env. Confirmed by the test at `engine_test.go:270-273` which asserts the stub receives `MTT=31999`. - **Provider lane** (`entrypoint.sh:159`): `provider_of "claude-code/opus:max"` → `claude-code`, matching the existing `claude-code=2` concurrency cap. The 3 claude-code models (sonnet, opus, opus:max) will run 2 at a time — correct. - **Lens concurrency**: `GADFLY_PROVIDER_LENS_CONCURRENCY` has no `claude-code` entry, so claude-code models fall back to `defaultLensConcurrency=1` (sequential lenses per model). This is consistent with the existing sonnet/opus behavior. No logic bugs, no misrouted models, no concurrency misconfiguration. The change is purely additive and mechanically sound. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **No material issues found.** The diff adds `claude-code/opus:max` to the model fleet. Through the error-handling lens, I verified: - **`:max` parsing**: `newClaudeCodeEngine` (engine.go:86-95) splits `opus:max` on `:` → model=`opus`, thinking=`max`. `parseThinking("max")` (engine.go:114) returns `maxThinkingTokens=31999`. The `runPass` method (engine.go:152-154) appends `MAX_THINKING_TOKENS=31999` to the subprocess env only when `thinkingTokens > 0`. All paths are exercised by existing tests (engine_test.go:257+). - **Graceful degradation**: The engine explicitly documents (engine.go:64-66) that `MAX_THINKING_TOKENS` is a no-op if the CLI build doesn't honor it — `opus:max` falls back to behaving like `opus`. No crash, no error, no hung process. - **Concurrency**: `GADFLY_PROVIDER_CONCURRENCY=claude-code=2` with 3 claude-code models means one queues. At 10 min/lens × 3 lenses × ~2.5 serial batches ≈ 75 min, well within the 90-minute job timeout. No starvation or deadlock risk. - **No new nil/empty/zero/negative paths**: The new spec flows through the same `strings.Cut` and `parseThinking` branches already exercised by `claude-code/opus` (no colon) and `claude-code/sonnet`. No new edge cases introduced. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 59s

gitea-actions bot commented

2026-06-27 22:42:14 +00:00

🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

.gitea/workflows/adversarial-review.yml:48 - Routine dependency update (Docker image)
.gitea/workflows/adversarial-review.yml:63 - Adds a new model variant (Claude Code opus with max extended thinking) to the model fleet
.gitea/workflows/adversarial-review.yml:57-62,65 - Updates comments to accurately reflect the changes

🎯 Correctness — No material issues found

No material issues found

🧯 Error handling & edge cases — Reviewed

Yes, the change shown in the diff is present in the current file .gitea/workflows/adversarial-review.yml. The line:

          GADFLY_MODELS: "minimax-m3:cloud,glm-5.2:cloud,glm-5.1:cloud,deepseek-v4-pro:cloud,nemotron-3-super:cloud,qwen3-coder:480b-cloud,claude-code/sonnet,claude-code/opus,claude-code/opus:max"

matches the updated line shown in the diff (the addition of ,claude-code/opus:max). The file currently contains that exact string, confirming the change has been applied.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 39s}

### 🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> No material issues found - .gitea/workflows/adversarial-review.yml:48 - Routine dependency update (Docker image) - .gitea/workflows/adversarial-review.yml:63 - Adds a new model variant (Claude Code opus with max extended thinking) to the model fleet - .gitea/workflows/adversarial-review.yml:57-62,65 - Updates comments to accurately reflect the changes </details> <details><summary>🎯 Correctness — No material issues found</summary> No material issues found </details> <details><summary>🧯 Error handling & edge cases — Reviewed</summary> Yes, the change shown in the diff is present in the current file `.gitea/workflows/adversarial-review.yml`. The line: ``` GADFLY_MODELS: "minimax-m3:cloud,glm-5.2:cloud,glm-5.1:cloud,deepseek-v4-pro:cloud,nemotron-3-super:cloud,qwen3-coder:480b-cloud,claude-code/sonnet,claude-code/opus,claude-code/opus:max" ``` matches the updated line shown in the diff (the addition of `,claude-code/opus:max`). The file currently contains that exact string, confirming the change has been applied. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 39s

gitea-actions bot commented

2026-06-27 22:42:34 +00:00

🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

This PR modifies a CI workflow file to enable a new reviewer model (claude-code/opus:max) by updating the Docker image tag and adding the new model to the GADFLY_MODELS environment variable. Through the security lens:

The change involves no new code paths or logic, only configuration updates in a YAML workflow file.
No secrets are exposed or mishandled; the existing CLAUDE_CODE_OAUTH_TOKEN secret is reused appropriately.
No input validation or sanitization concerns arise since this is a static configuration change.
No authentication or authorization gaps are introduced.
No injection, SSRF, or deserialization risks are present.

The diff has been verified as a safe configuration update with no security implications. The addition of a new model to the review fleet is operationally sound and follows the established pattern for model inclusion.

🎯 Correctness — No material issues found

No material issues found.

🧯 Error handling & edge cases — No material issues found

No material issues found.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5s}

### 🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> VERDICT: No material issues found This PR modifies a CI workflow file to enable a new reviewer model (`claude-code/opus:max`) by updating the Docker image tag and adding the new model to the `GADFLY_MODELS` environment variable. Through the security lens: - The change involves no new code paths or logic, only configuration updates in a YAML workflow file. - No secrets are exposed or mishandled; the existing `CLAUDE_CODE_OAUTH_TOKEN` secret is reused appropriately. - No input validation or sanitization concerns arise since this is a static configuration change. - No authentication or authorization gaps are introduced. - No injection, SSRF, or deserialization risks are present. The diff has been verified as a safe configuration update with no security implications. The addition of a new model to the review fleet is operationally sound and follows the established pattern for model inclusion. </details> <details><summary>🎯 Correctness — No material issues found</summary> No material issues found. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> No material issues found. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5s

gitea-actions bot commented

2026-06-27 22:44:08 +00:00

🪰 Gadfly review — `claude-code/opus:max` (claude-code)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

I have what I need. The change is workflow-only (image tag bump + one model added to GADFLY_MODELS), and I've verified the security-relevant code path that the new claude-code/opus:max spec exercises.

VERDICT: No material issues found

Reviewed through the 🔒 Security lens. This is a CI workflow-only change: bumps the pinned image from :sha-80d8f53 to :sha-c342bdb and appends claude-code/opus:max to GADFLY_MODELS. I traced the new spec into the code it actually exercises and found no security problem.

What I verified:

No injection via the new model spec. claude-code/opus:max is parsed by parseThinking (cmd/gadfly/engine.go:113), which only maps "max" → the hardcoded constant maxThinkingTokens = 31999 or a positive integer → itself; everything else → 0. The value reaches the subprocess as MAX_THINKING_TOKENS=<int> via strconv.Itoa (engine.go:154), and the CLI is launched with exec.CommandContext + an argv slice (engine.go:149) — no shell, so no command/arg injection from the spec.
No secret leakage to the third-party CLI. The Claude Code subprocess gets a minimal allowlisted env via claudeEnv() (engine.go:221-233): GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, and provider API keys are deliberately dropped; only CLAUDE_*/ANTHROPIC_*/NODE_*/XDG_*/locale/MAX_THINKING_TOKENS pass through. Adding another claude-code/* model reuses the already-present CLAUDE_CODE_OAUTH_TOKEN — no new secret is introduced or exposed by this diff.
Supply-chain pinning preserved. The image is still pinned to an immutable :sha-<short> digest tag rather than :latest, which is the safer choice (and matches the project's own documented lesson about :latest caching). No downgrade in pinning discipline.

Note on scope: I did not re-audit the pre-existing env-allowlist or spec-parsing logic as findings since this PR doesn't modify them — I only confirmed the new value flows through them safely.

Nothing in the security lane is materially wrong here.

🎯 Correctness — No material issues found

I've verified the change against the actual checked-out code. Here is my review.

VERDICT: No material issues found (Correctness lens)

This is a workflow-only change: bump the image tag and append claude-code/opus:max to GADFLY_MODELS. I checked the supporting code paths that make this spec meaningful:

:max suffix parses correctly. newClaudeCodeEngine (cmd/gadfly/engine.go:83-96) cuts on the first /, then on :, sending opus:max → model opus, suffix max. parseThinking (engine.go:113-121) maps max → maxThinkingTokens. Verified by reading the code and the table tests at engine_test.go:228-232 ({"claude-code/opus:max", "opus", maxThinkingTokens}).
MAX_THINKING_TOKENS=31999 is the correct value. maxThinkingTokens = 31999 (engine.go:66) is Claude Code's documented "ultrathink" budget (32000−1). Re-derived against the known tier value; it's not an arbitrary magic number. It's injected into the subprocess env only when thinkingTokens > 0 (engine.go:152-154) and whitelisted through claudeEnv() (engine.go:224), so it actually reaches the CLI. The "best-effort / no-op if ignored" claim in the PR is accurate.
No collision between opus and opus:max. My main correctness worry was that both specs resolve to model id opus and would clobber each other's PR comment or status file. They don't: the comment MARKER uses the full ${MODEL} spec (scripts/run.sh:50), and status_file_for (entrypoint.sh:165) sanitizes the full spec via tr → claude-code_opus vs claude-code_opus_max. Both keys stay distinct, so the two thinking depths get separate comments as intended.
Engine selection still triggers. GADFLY_MODEL="claude-code/opus:max" satisfies isClaudeCodeSpec's claude-code/ prefix check (engine.go:72), and provider_of (entrypoint.sh:159) yields lane claude-code, matching the claude-code=2 concurrency entry. Correct routing.

One thing I could not verify (and it's operational, not code-correctness): whether the pinned image :sha-c342bdb actually contains this :max parse code. The repo is shallow here (only the head commit is present), so I can't confirm c342bdb's contents. If that image predates the suffix parsing, claude-code/opus:max would be passed to the CLI as a literal --model opus:max and fail — but that's a deployment claim the PR asserts, and the code at this checkout is correct and tested.

Nothing material in the correctness lane.

🧯 Error handling & edge cases — No material issues found

I've verified the relevant code paths. My lens is error handling & edge cases.

VERDICT: No material issues found

This is a workflow-only change (image SHA bump + adding claude-code/opus:max to GADFLY_MODELS). I checked the unhappy paths the new spec could introduce and they are all handled:

Spec parsing of opus:max is defensive. newClaudeCodeEngine splits on the first : (engine.go:90-95) and parseThinking (engine.go:113-121) maps "max"→31999, a positive int→itself, and anything else→0 with no panic — strconv.Atoi errors and non-positive values fall through to the safe 0 (thinking off). Bad input can't crash the run. Verified by reading the function.
No silent output collision between claude-code/opus and claude-code/opus:max (the one edge case worth checking, since both reduce to --model opus). The two stay distinct everywhere they're keyed:
- Status file: status_file_for sanitizes via tr -c '[:alnum:]._-' '_' (entrypoint.sh:165), giving claude-code_opus.json vs claude-code_opus_max.json.
- PR comment marker:  (run.sh:50) keys on the full MODEL, so the two get separate comments.
 Neither overwrites the other.
Cleanup on the slow path is intact. opus:max will run longer; if it exceeds the per-lens timeout, cmd.Cancel SIGKILLs the whole process group (engine.go:156-165, Setpgid), so no orphaned claude/node children. A missing/ignoring CLI is a normal pass error, kept advisory (engine.go:81-82), and review_one logs+continues on failure (entrypoint.sh:183).
Lane routing is correct. provider_of "claude-code/opus:max" → claude-code (segment before first /, entrypoint.sh:159), so the claude-code=2 concurrency cap applies as intended.

The constant 31999 matches the documented Claude Code "ultrathink" budget and the PR description; no off-by-one or unit concern in my lane.

Nothing in the error-handling lens is materially wrong here.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 38s}

### 🪰 Gadfly review — `claude-code/opus:max` (claude-code) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> I have what I need. The change is workflow-only (image tag bump + one model added to `GADFLY_MODELS`), and I've verified the security-relevant code path that the new `claude-code/opus:max` spec exercises. ## VERDICT: No material issues found Reviewed through the **🔒 Security** lens. This is a CI workflow-only change: bumps the pinned image from `:sha-80d8f53` to `:sha-c342bdb` and appends `claude-code/opus:max` to `GADFLY_MODELS`. I traced the new spec into the code it actually exercises and found no security problem. **What I verified:** - **No injection via the new model spec.** `claude-code/opus:max` is parsed by `parseThinking` (`cmd/gadfly/engine.go:113`), which only maps `"max"` → the hardcoded constant `maxThinkingTokens = 31999` or a positive integer → itself; everything else → 0. The value reaches the subprocess as `MAX_THINKING_TOKENS=<int>` via `strconv.Itoa` (`engine.go:154`), and the CLI is launched with `exec.CommandContext` + an argv slice (`engine.go:149`) — no shell, so no command/arg injection from the spec. - **No secret leakage to the third-party CLI.** The Claude Code subprocess gets a minimal allowlisted env via `claudeEnv()` (`engine.go:221-233`): `GITEA_TOKEN`, `GADFLY_FINDINGS_TOKEN`, and provider API keys are deliberately dropped; only `CLAUDE_*`/`ANTHROPIC_*`/`NODE_*`/`XDG_*`/locale/`MAX_THINKING_TOKENS` pass through. Adding another `claude-code/*` model reuses the already-present `CLAUDE_CODE_OAUTH_TOKEN` — no new secret is introduced or exposed by this diff. - **Supply-chain pinning preserved.** The image is still pinned to an immutable `:sha-<short>` digest tag rather than `:latest`, which is the safer choice (and matches the project's own documented lesson about `:latest` caching). No downgrade in pinning discipline. **Note on scope:** I did not re-audit the pre-existing env-allowlist or spec-parsing logic as findings since this PR doesn't modify them — I only confirmed the new value flows through them safely. Nothing in the security lane is materially wrong here. </details> <details><summary>🎯 Correctness — No material issues found</summary> I've verified the change against the actual checked-out code. Here is my review. --- **VERDICT: No material issues found** (Correctness lens) This is a workflow-only change: bump the image tag and append `claude-code/opus:max` to `GADFLY_MODELS`. I checked the supporting code paths that make this spec meaningful: - **`:max` suffix parses correctly.** `newClaudeCodeEngine` (`cmd/gadfly/engine.go:83-96`) cuts on the first `/`, then on `:`, sending `opus:max` → model `opus`, suffix `max`. `parseThinking` (`engine.go:113-121`) maps `max` → `maxThinkingTokens`. Verified by reading the code and the table tests at `engine_test.go:228-232` (`{"claude-code/opus:max", "opus", maxThinkingTokens}`). - **`MAX_THINKING_TOKENS=31999` is the correct value.** `maxThinkingTokens = 31999` (`engine.go:66`) is Claude Code's documented "ultrathink" budget (32000−1). Re-derived against the known tier value; it's not an arbitrary magic number. It's injected into the subprocess env only when `thinkingTokens > 0` (`engine.go:152-154`) and whitelisted through `claudeEnv()` (`engine.go:224`), so it actually reaches the CLI. The "best-effort / no-op if ignored" claim in the PR is accurate. - **No collision between `opus` and `opus:max`.** My main correctness worry was that both specs resolve to model id `opus` and would clobber each other's PR comment or status file. They don't: the comment `MARKER` uses the full `${MODEL}` spec (`scripts/run.sh:50`), and `status_file_for` (`entrypoint.sh:165`) sanitizes the full spec via `tr` → `claude-code_opus` vs `claude-code_opus_max`. Both keys stay distinct, so the two thinking depths get separate comments as intended. - **Engine selection still triggers.** `GADFLY_MODEL="claude-code/opus:max"` satisfies `isClaudeCodeSpec`'s `claude-code/` prefix check (`engine.go:72`), and `provider_of` (`entrypoint.sh:159`) yields lane `claude-code`, matching the `claude-code=2` concurrency entry. Correct routing. One thing I could **not** verify (and it's operational, not code-correctness): whether the pinned image `:sha-c342bdb` actually contains this `:max` parse code. The repo is shallow here (only the head commit is present), so I can't confirm `c342bdb`'s contents. If that image predates the suffix parsing, `claude-code/opus:max` would be passed to the CLI as a literal `--model opus:max` and fail — but that's a deployment claim the PR asserts, and the code at this checkout is correct and tested. Nothing material in the correctness lane. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> I've verified the relevant code paths. My lens is error handling & edge cases. ## VERDICT: No material issues found This is a workflow-only change (image SHA bump + adding `claude-code/opus:max` to `GADFLY_MODELS`). I checked the unhappy paths the new spec could introduce and they are all handled: - **Spec parsing of `opus:max` is defensive.** `newClaudeCodeEngine` splits on the first `:` (`engine.go:90-95`) and `parseThinking` (`engine.go:113-121`) maps `"max"`→`31999`, a positive int→itself, and **anything else→0** with no panic — `strconv.Atoi` errors and non-positive values fall through to the safe `0` (thinking off). Bad input can't crash the run. Verified by reading the function. - **No silent output collision** between `claude-code/opus` and `claude-code/opus:max` (the one edge case worth checking, since both reduce to `--model opus`). The two stay distinct everywhere they're keyed: - Status file: `status_file_for` sanitizes via `tr -c '[:alnum:]._-' '_'` (`entrypoint.sh:165`), giving `claude-code_opus.json` vs `claude-code_opus_max.json`. - PR comment marker: `` (`run.sh:50`) keys on the full `MODEL`, so the two get separate comments. Neither overwrites the other. - **Cleanup on the slow path is intact.** `opus:max` will run longer; if it exceeds the per-lens timeout, `cmd.Cancel` SIGKILLs the whole process group (`engine.go:156-165`, `Setpgid`), so no orphaned `claude`/node children. A missing/ignoring CLI is a normal pass error, kept advisory (`engine.go:81-82`), and `review_one` logs+continues on failure (`entrypoint.sh:183`). - **Lane routing is correct.** `provider_of "claude-code/opus:max"` → `claude-code` (segment before first `/`, `entrypoint.sh:159`), so the `claude-code=2` concurrency cap applies as intended. The constant `31999` matches the documented Claude Code "ultrathink" budget and the PR description; no off-by-one or unit concern in my lane. Nothing in the error-handling lens is materially wrong here. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 38s

steve merged commit a4cdc905c9 into main

2026-06-27 22:49:49 +00:00

steve deleted branch ci/opus-max-thinking

2026-06-27 22:49:49 +00:00

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: steve/gadfly#6

ci: enable claude-code/opus:max (max-thinking) reviewer #6

🪰 Gadfly — live review status

claude-code/opus · claude-code — ✅ done

claude-code/opus:max · claude-code — ✅ done

claude-code/sonnet · claude-code — ✅ done

deepseek-v4-pro:cloud · ollama-cloud — ✅ done

glm-5.1:cloud · ollama-cloud — ✅ done

glm-5.2:cloud · ollama-cloud — ✅ done

minimax-m3:cloud · ollama-cloud — ✅ done

nemotron-3-super:cloud · ollama-cloud — ✅ done

qwen3-coder:480b-cloud · ollama-cloud — ✅ done

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

🪰 Gadfly review — glm-5.1:cloud (ollama-cloud)

🪰 Gadfly review — claude-code/sonnet (claude-code)

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

VERDICT: No material issues found

🪰 Gadfly review — claude-code/opus (claude-code)

VERDICT: No material issues found

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: No material issues found

Verdict: No material issues found

🪰 Gadfly review — nemotron-3-super:cloud (ollama-cloud)

🪰 Gadfly review — qwen3-coder:480b-cloud (ollama-cloud)

🪰 Gadfly review — claude-code/opus:max (claude-code)

VERDICT: No material issues found

VERDICT: No material issues found

`claude-code/opus` · claude-code — ✅ done

`claude-code/opus:max` · claude-code — ✅ done

`claude-code/sonnet` · claude-code — ✅ done

`deepseek-v4-pro:cloud` · ollama-cloud — ✅ done

`glm-5.1:cloud` · ollama-cloud — ✅ done

`glm-5.2:cloud` · ollama-cloud — ✅ done

`minimax-m3:cloud` · ollama-cloud — ✅ done

`nemotron-3-super:cloud` · ollama-cloud — ✅ done

`qwen3-coder:480b-cloud` · ollama-cloud — ✅ done

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud)

🪰 Gadfly review — `claude-code/sonnet` (claude-code)

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

🪰 Gadfly review — `claude-code/opus` (claude-code)

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud)

🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud)

🪰 Gadfly review — `claude-code/opus:max` (claude-code)