ci: enable claude-code/opus:max (max-thinking) reviewer #6
Follow-up to #5 — the max-thinking opus you asked for.
Now that :sha-c342bdb (which has the :thinking spec parse from #5) is built, this adds claude-code/opus:max to the dogfood swarm and pins to that image. The Claude Code lineup is now sonnet + opus + opus-with-max-thinking, so the scoreboard can compare the same model at two thinking depths.
Max thinking is applied via MAX_THINKING_TOKENS=31999 on the CLI subprocess (best-effort — a no-op if the installed CLI build ignores it, in which case opus:max simply behaves like opus).
No code change — workflow only. This PR's own review will be the first to exercise claude-code/opus:max end-to-end. YAML validated.
🤖 Generated with Claude Code
🪰 Gadfly — live review status
9/9 reviewers finished · updated 2026-06-27 22:47:56Z
- claude-code/opus · claude-code — ✅ done
- claude-code/opus:max · claude-code — ✅ done
- claude-code/sonnet · claude-code — ✅ done
- deepseek-v4-pro:cloud · ollama-cloud — ✅ done
- glm-5.1:cloud · ollama-cloud — ✅ done
- glm-5.2:cloud · ollama-cloud — ✅ done
- minimax-m3:cloud · ollama-cloud — ✅ done
- nemotron-3-super:cloud · ollama-cloud — ✅ done
- qwen3-coder:480b-cloud · ollama-cloud — ✅ done
Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.
🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)
Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
- claude-code/opus:max model spec: GADFLY_MODELS is parsed by newClaudeCodeEngine (cmd/gadfly/engine.go:83-109). The opus:max token splits on / then : → model="opus", thinking=parseThinking("max")=31999. No shell is involved; model is passed to the CLI purely as an exec.CommandContext argv element (--model, e.model at engine.go:129-130), so the spec string is not a command-injection vector.
- MAX_THINKING_TOKENS=31999 is injected into the subprocess env only at engine.go:152-154, after claudeEnv() builds the minimal allowlist. I confirmed claudeEnv() (engine.go:221-241) deliberately drops GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, and OLLAMA_CLOUD_API_KEY and only keeps CLAUDE_*/ANTHROPIC_*/GADFLY_CLAUDE_*/locale/Node/XDG plus MAX_THINKING_TOKENS. No new secret exposure is introduced by adding the :max lane; the OAuth token that is passed is the intended CLI auth.
- :sha-80d8f53 to :sha-c342bdb — both immutable sha tags from the same private registry; no downgrade to :latest and no new trust boundary.
- The if: actor allow-list (steve/fizi/dazed) and GADFLY_ALLOWED_USERS mirror are untouched, so the comment-trigger secret-bearing-run guard still applies to the new lane.
- Tests (cmd/gadfly/engine_test.go:221-281) assert :max → 31999 and that the env is only set for :max, not plain specs — consistent with the security-relevant env handling above.
Nothing in my lane is materially wrong.
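The env handling described in this review can be illustrated with a small Go sketch. It is not the project's claudeEnv(): the names filterEnv and subprocessEnv, the exact keep-list, and the sample values are assumptions pieced together from the reviewers' descriptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Assumed allowlist, reconstructed from the review text. The real
// claudeEnv() may keep a different set of names.
var keepExact = map[string]bool{"PATH": true, "HOME": true, "MAX_THINKING_TOKENS": true}
var keepPrefixes = []string{"CLAUDE_", "ANTHROPIC_", "GADFLY_CLAUDE_", "NODE_", "XDG_"}

// filterEnv (hypothetical name) keeps only allowlisted "KEY=value" entries.
func filterEnv(parent []string) []string {
	var out []string
	for _, kv := range parent {
		key, _, ok := strings.Cut(kv, "=")
		if !ok {
			continue // not a KEY=value entry
		}
		if keepExact[key] || hasAnyPrefix(key, keepPrefixes) {
			out = append(out, kv)
		}
	}
	return out
}

func hasAnyPrefix(s string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}

// subprocessEnv (hypothetical name) appends the thinking budget after
// filtering, and only when it is positive.
func subprocessEnv(parent []string, thinkingTokens int) []string {
	env := filterEnv(parent)
	if thinkingTokens > 0 {
		env = append(env, "MAX_THINKING_TOKENS="+strconv.Itoa(thinkingTokens))
	}
	return env
}

func main() {
	parent := []string{
		"PATH=/usr/bin",
		"GITEA_TOKEN=secret",
		"CLAUDE_CODE_OAUTH_TOKEN=oauth",
		"OLLAMA_CLOUD_API_KEY=key",
	}
	fmt.Println(subprocessEnv(parent, 31999)) // :max lane
	fmt.Println(subprocessEnv(parent, 0))     // plain opus lane
}
```

One relevant detail of the standard library: the os/exec documentation states that when Cmd.Env contains duplicate keys, only the last value in the slice is used, so a value appended after the filter takes effect even if the parent environment exported the same key.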
🎯 Correctness — No material issues found
I verified the full pipeline this PR touches: the workflow YAML, the engine spec parser (cmd/gadfly/engine.go), the entrypoint lane/concurrency logic (entrypoint.sh), and the per-model lens fan-out (cmd/gadfly/main.go). Tracing claude-code/opus:max end-to-end:
- engine.go:83-121: strings.Cut(after, ":") → model=opus, parseThinking("max") → maxThinkingTokens (31999). Correct.
- ultrathink tier budget (think=4000 / megathink=10000 / ultrathink=31999): the constant matches the documented tier — not a plausible-looking magic number, the real one.
- engine.go:148-155: thinkingTokens > 0 → appends MAX_THINKING_TOKENS=31999. The workflow env block does not set MAX_THINKING_TOKENS, and claudeEnv() only keeps it if present in os.Environ(), so the subprocess receives exactly one MAX_THINKING_TOKENS=31999 entry — no duplicate. Correct for this configuration.
- entrypoint.sh:159: provider_of "claude-code/opus:max" → claude-code (first path segment). It joins the claude-code lane (cap 2), alongside sonnet and opus — 3 models, 2 at a time. Correct.
- entrypoint.sh:165: status_file_for sanitizes via tr -c '[:alnum:]._-' '_', so claude-code/opus:max → claude-code_opus_max, distinct from claude-code_opus. No collision on the live board.
- main.go:281-286: GADFLY_PROVIDER_LENS_CONCURRENCY only overrides ollama-cloud; claude-code falls back to default 1, so each claude-code model runs its 3 lenses serially. 2 models at a time × 1 lens = 2 concurrent CLI subprocesses — within the lane's intent.
Verdict: No material issues found.
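Note: runPass appends MAX_THINKING_TOKENS=… after claudeEnv(), which already keeps MAX_THINKING_TOKENS if it exists in the parent env. If MAX_THINKING_TOKENS were ever exported into the runner env, the env slice built for the subprocess would contain two entries. Which one takes effect depends on how the slice is consumed: a raw duplicate environment read through glibc getenv resolves to the first entry, but Go's os/exec documents that only the last value for a duplicate key in Cmd.Env is used, so if the slice is passed as cmd.Env the appended :max value would be expected to win. This PR's config does not set it, so it's a latent engine edge case, not a defect of this change. Verified by reading engine.go:148-155 and engine.go:221-240.
🧯 Error handling & edge cases — No material issues found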
I verified the YAML change against the code that consumes it. This is a workflow-only diff: it adds claude-code/opus:max to GADFLY_MODELS and bumps the pinned image tag (which already carries the :max/MAX_THINKING_TOKENS parsing logic from the prior PR).
Checked the edge cases in my lane:
- :max spec parsing (cmd/gadfly/engine.go:83-121): strings.Cut on : correctly splits opus:max into model=opus, thinking=max; parseThinking("max") returns 31999. No panic/nil path on malformed suffix — unknown values fall through to 0 (no-op), which is exactly the "best-effort, degrades to opus" behavior described.
- engine.go:152-154: MAX_THINKING_TOKENS only appended when thinkingTokens > 0; empty/zero never leaks. No swallowed error here.
- entrypoint.sh:159: provider_of splits on first / → claude-code for claude-code/opus:max, so it lands in the claude-code=2 concurrency lane as intended. No empty-provider edge case.
- entrypoint.sh:165: status_file_for sanitizes : → _, so claude-code/opus (claude-code_opus) and claude-code/opus:max (claude-code_opus_max) get distinct files — no collision that would make the live board conflate the two runs.
- entrypoint.sh:183: a failed model logs and continues; adding a third claude-code entry doesn't change that, and with cap=2 the third queues rather than erroring.
No ignored errors, missing cleanup, panics on bad input, or boundary issues introduced by this change.
VERDICT: No material issues found.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 2s
🪰 Gadfly review — glm-5.1:cloud (ollama-cloud)
Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
Reviewed the diff through the security lens:
- Secret isolation intact: The claudeCodeEngine subprocess gets its env from claudeEnv() (engine.go:221), which whitelists only PATH/HOME/… and CLAUDE_*/ANTHROPIC_*/GADFLY_CLAUDE_* prefixes — deliberately dropping GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, and provider keys. The new MAX_THINKING_TOKENS value is appended after that whitelist (engine.go:154), never from the runner's env, so no secret leakage path is introduced.
- No injection surface: The :max suffix is parsed by parseThinking into a compile-time constant (31999, engine.go:66), then strconv.Itoa'd into an env var. There is no user-controlled string that reaches a shell, CLI flag, or template.
- Permission boundary unchanged: The Claude CLI still runs under --permission-mode plan (read-only sandbox, engine.go:104). The :max spec only affects a thinking-budget env var — it does not grant additional capabilities to the subprocess.
- Image pinning: Both old and new tags use immutable :sha- digests, not :latest, preventing supply-chain substitution between runs.
- CLAUDE_CODE_OAUTH_TOKEN remains a GitHub secret referenced via ${{ secrets.* }}, never echoed or logged.
🎯 Correctness — No material issues found
VERDICT: No material issues found
The diff adds claude-code/opus:max to the GADFLY_MODELS list and pins the Docker image to sha-c342bdb (which includes the :thinking spec parsing from the prior PR). I verified the following correctness concerns:
- :max suffix parsing: cmd/gadfly/engine.go:78-121 — parseThinking("max") correctly returns maxThinkingTokens (31999), and newClaudeCodeEngine("claude-code/opus:max", ...) correctly splits into model="opus", thinkingTokens=31999. Tests at engine_test.go:221-278 confirm this end-to-end, including that MAX_THINKING_TOKENS=31999 is injected into the subprocess env for :max specs.
- Provider routing: modelProvider() (model.go:142-151) uses strings.Cut(model, "/") to extract the provider lane, yielding "claude-code" from "claude-code/opus:max" — matching the claude-code=2 concurrency key. Correct.
- Concurrency: With 3 claude-code models (sonnet, opus, opus:max) and claude-code=2 concurrency, two run at a time and the third queues. No overflow or misconfiguration.
- The 31999 value: Intentionally just under a 32k boundary; consistent across code (const maxThinkingTokens = 31999), tests (MTT=31999), and PR description. Internally consistent.
No logic bugs or semantic correctness issues introduced.
🧯 Error handling & edge cases — No material issues found
VERDICT: No material issues found
I reviewed the diff through the error-handling & edge-case lens. The change adds claude-code/opus:max to the model fleet and pins the image to sha-c342bdb. I verified the full parsing and execution path:
- newClaudeCodeEngine("claude-code/opus:max", …): strings.Cut(after, ":") correctly splits into model=opus, thinking suffix=max. parseThinking("max") returns maxThinkingTokens (31999). Verified in cmd/gadfly/engine.go:83–121 and confirmed by TestClaudeCodeThinking in engine_test.go.
- runPass env injection: MAX_THINKING_TOKENS=31999 is appended to the subprocess env only when thinkingTokens > 0. claudeEnv() explicitly keeps MAX_THINKING_TOKENS if already in the parent env (line 224), and since the append happens after claudeEnv(), the value is always present for :max specs. No env leak path.
- Graceful degradation: If the CLI build ignores MAX_THINKING_TOKENS, the env var is simply unused — the subprocess runs normally. This is the documented "no-op" behavior. No error is swallowed.
- Entrypoint model normalization: provider_of("claude-code/opus:max") extracts claude-code (first /-segment), matching the claude-code=2 concurrency lane correctly. The : in the spec becomes _ in the status filename via tr -c '[:alnum:]._-' '_' — unique and safe.
- Negative/invalid thinking values: parseThinking returns 0 for any non-integer, non-"max" string, and for n ≤ 0 from strconv.Atoi. No overflow or injection risk.
- Process cleanup on timeout/cancel: cmd.Cancel kills the entire process group with SIGKILL. The cancelled-context check returns an error, never partial output. Verified in engine.go:148–176.
- Failed model runs: entrypoint.sh:183 logs the failure and continues (|| log "model $1 failed (continuing)"), so one model's crash doesn't abort the swarm.
No unhandled errors, missing defers, nil-unsafety, or edge-case gaps found in this change.
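The process-group cleanup mentioned in this review follows a common Go pattern on Unix-like systems, sketched below. It is an illustration only, not the engine's code: runShell is a hypothetical helper, the sh scripts stand in for the Claude CLI, and the sketch assumes Go 1.20 or newer (for cmd.Cancel) on a platform where syscall.SysProcAttr has Setpgid.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runShell (hypothetical helper) runs a shell script in its own process
// group and kills the whole group if the timeout expires first.
func runShell(script string, timeoutMillis int) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", "-c", script)
	// Make the child the leader of a new process group.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	cmd.Cancel = func() error {
		// A negative PID addresses the whole group, so grandchildren die too.
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}

	out, err := cmd.Output()
	if ctx.Err() != nil {
		// Report the cancellation and discard any partial output.
		return nil, fmt.Errorf("run cancelled: %w", ctx.Err())
	}
	return out, err
}

func main() {
	// The background sleep is a grandchild that would outlive a plain kill.
	_, err := runShell("sleep 30 & wait", 200)
	fmt.Println("timed-out run:", err)

	out, err := runShell("echo ok", 5000)
	fmt.Printf("normal run: %q, err=%v\n", out, err)
}
```

Checking ctx.Err() before returning is what guarantees a timed-out run surfaces as an error instead of as truncated output.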
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 59s
🪰 Gadfly review — claude-code/sonnet (claude-code)
Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found
- Image SHA pin (sha-c342bdb): Pinning to an immutable SHA is the right security practice here. The CLAUDE.md explicitly warns that :latest is cached by act_runner and may silently run a stale image; a SHA pin avoids that and provides integrity.
- claude-code/opus:max model name handling (verified via cmd/gadfly/engine.go): The :max suffix is parsed by parseThinking() and mapped to the integer 31999 (numeric suffixes are the ones that go through strconv.Atoi), never shell-interpolated. The model name itself is passed as a separate argv element to exec.CommandContext, not concatenated into a shell string, so there is no command-injection surface.
- MAX_THINKING_TOKENS env var (verified via engine.go:152-155): Set on the subprocess via cmd.Env = append(…, "MAX_THINKING_TOKENS="+strconv.Itoa(e.thinkingTokens)). Value originates from the hardcoded constant 31999 in engine.go, not from any untrusted input.
- Secret exposure to subprocess (verified via engine.go's claudeEnv() allowlist): GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, and other provider credentials are explicitly excluded from the Claude Code subprocess environment. CLAUDE_CODE_OAUTH_TOKEN reaches the subprocess via the CLAUDE_* wildcard — this is intentional and unchanged from the existing sonnet/opus runners.
- GADFLY_MODELS value: Hardcoded in the workflow YAML, not derived from PR content or any user-controlled input, so no injection vector from that surface.
Nothing in this diff opens a new security surface relative to the existing claude-code/sonnet and claude-code/opus entries that are already present.
🎯 Correctness — No material issues found
Based on my investigation, I've verified the key correctness properties of this change.
Verdict: No material issues found
What I verified:
- claude-code/opus:max parses correctly. engine.go:83–109 (newClaudeCodeEngine) splits on the first / to extract the provider (claude-code) then splits the remainder on : to separate the model id (opus) from the thinking tier (max). parseThinking("max") maps that to maxThinkingTokens = 31999 and injects it as MAX_THINKING_TOKENS into the subprocess env. The spec is unambiguous because Claude model IDs contain no colons. Verified by reading engine.go.
- MAX_THINKING_TOKENS=31999. This is a pre-existing constant from PR #5, not introduced here. The PR description correctly characterises it as best-effort: if the CLI subprocess ignores the env var, opus:max degrades to plain opus. Verified by reading engine.go:66.
- Concurrency with three claude-code models is fine. claude-code=2 means the third model queues behind whichever of the first two finishes; all three still run. The cap was already set before this PR and only controls peak parallelism, not total execution. Verified by reading entrypoint.sh.
- Fleet comment is accurate. The updated comment ("6 cloud (3 at a time) + Claude Code (sonnet, opus, and opus with max extended thinking)") correctly describes the new model list of 6 ollama-cloud models + 3 claude-code models. Verified against the GADFLY_MODELS value in the diff.
🧯 Error handling & edge cases — No material issues found
No material issues found
The diff touches only workflow YAML: a new image tag and appending claude-code/opus:max to GADFLY_MODELS. I verified the error-handling paths through the actual code.
- :max suffix parsing — parseThinking() in engine.go:111-121 correctly maps "max" → 31999 tokens. Invalid suffixes degrade silently to 0 (thinking disabled), which is tested and intentional (engine_test.go:231). The new spec is well-formed and hits the happy path.
- MAX_THINKING_TOKENS injection — engine.go:148-155 guards injection behind thinkingTokens > 0, so it's only set for :max. The PR description's "best-effort / no-op if CLI ignores it" matches the code exactly — the subprocess treats the env var as optional. No error path here.
- Concurrency: claude-code=2 with 3 claude-code models — run_lane (entrypoint.sh:206-218) handles this correctly. With cap=2 and 3 models, model 3 queues: it's launched only after wait -n frees a slot. Each lane runs in its own subshell (line 250), so wait -n inside the claude-code lane cannot accidentally reap processes from the ollama-cloud lane. The wait -n 2>/dev/null || wait fallback degrades gracefully to sequential execution if wait -n is unavailable. No overflow, no starvation, no missing wait.
- Status board pre-seeding — entrypoint.sh:233-236 pre-seeds all models in MODEL_LIST at startup (including the new third claude-code entry), so the board correctly tracks all 3 from t=0, even while the 3rd is queued.
No new error paths are introduced. All relevant edge cases are handled by existing, tested code.
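The lane behavior described here (cap 2, three models, the third waits for a free slot) is implemented in shell with wait -n according to this review. As a Go analog rather than a port of entrypoint.sh, the same bound can be sketched with a buffered channel used as a semaphore; runLane and its signature are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLane (hypothetical) runs work for every model, keeping at most limit
// in flight. It returns the highest number of tasks seen running at once.
func runLane(models []string, limit int, work func(model string)) int {
	slots := make(chan struct{}, limit) // one token per allowed in-flight task
	var wg sync.WaitGroup
	var inFlight, peak atomic.Int32

	for _, m := range models {
		slots <- struct{}{} // blocks while the lane is full: the queued model waits here
		wg.Add(1)
		go func(model string) {
			defer wg.Done()
			defer func() { <-slots }() // free the slot after the counter is decremented

			n := inFlight.Add(1)
			for { // record a new peak if this task raised it
				p := peak.Load()
				if n <= p || peak.CompareAndSwap(p, n) {
					break
				}
			}
			work(model)
			inFlight.Add(-1)
		}(m)
	}
	wg.Wait()
	return int(peak.Load())
}

func main() {
	models := []string{"claude-code/sonnet", "claude-code/opus", "claude-code/opus:max"}
	peak := runLane(models, 2, func(m string) {
		time.Sleep(50 * time.Millisecond) // stand-in for a review run
		fmt.Println("finished", m)
	})
	fmt.Println("peak in flight:", peak)
}
```

Because a slot is taken before the task starts and released only after it ends, the number of running tasks can never exceed the cap, and every model still runs.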
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 32s
🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)
Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
🔒 Security — No material issues found.
Verified the change against the security lens:
- sha-80d8f53 → sha-c342bdb: same pinned-SHA pattern, same internal registry, same permissions: contents: read / issues: write / pull-requests: write block, same secrets. No new trust boundary.
- claude-code/opus:max model spec (adversarial-review.yml:63): parsed by newClaudeCodeEngine in cmd/gadfly/engine.go:83-109. Traced through:
  - provider_of("claude-code/opus:max") → provider claude-code (read via ${1%%/*} at entrypoint.sh:159). No path/colon ambiguity.
  - strings.Cut("opus:max", ":") → model="opus", thinking=parseThinking("max")=31999 (engine.go:113-121). The literal "max" never reaches the model — only an integer token budget does, so no spec-string injection vector.
  - claude -p ... --model opus --permission-mode plan + MAX_THINKING_TOKENS=31999.
- --permission-mode plan (read-only) is unchanged (engine.go:104); no write tools granted to the new reviewer.
- engine.go:221-241 (claudeEnv): GITEA_TOKEN, GADFLY_FINDINGS_TOKEN, OLLAMA_CLOUD_API_KEY, and other runner secrets are still dropped. The keep-list (PATH, HOME, *CLAUDE_*, *ANTHROPIC_*, *GADFLY_CLAUDE_*, *NODE_*, *XDG_*, …) is unchanged.
- MAX_THINKING_TOKENS is allowed through (already on the keep list) AND force-appended with strconv.Itoa(e.thinkingTokens) (engine.go:152-155), so a leaked-from-runner value is overridden deterministically — no env-injection risk.
- CLAUDE_CODE_OAUTH_TOKEN; no new secret, no new credential surface.
- The issue_comment if: allow-list (steve|fizi|dazed) is unchanged. Untrusted commenters still can't fire a secret-bearing run.
- claude-code=2: bounds in-flight subprocesses; no DoS surface introduced.
The :max spec only adjusts an integer thinking-budget env var on a read-only Claude Code subprocess under the same minimal env as the existing claude-code/opus. Nothing in my lens is materially wrong.
🎯 Correctness — No material issues found
VERDICT: No material issues found
Reviewed through the Correctness lens. I verified the diff end-to-end against the checked-out code (cmd/gadfly/engine.go, cmd/gadfly/engine_test.go, cmd/gadfly/main.go, cmd/gadfly/lens_concurrency_test.go, cmd/gadfly/model.go, entrypoint.sh).
What I confirmed works correctly:
- :max spec parsing — engine.go:83-109 cuts on / then :; claude-code/opus:max → model="opus", thinkingTokens=31999. parseThinking("max") at engine.go:113-121 maps to the maxThinkingTokens const (31999), a positive int to itself, anything else to 0 (off). Covered by TestClaudeCodeThinking (engine_test.go:221-241) and TestRunPassInjectsThinkingTokens (engine_test.go:259-281).
- engine.go:152-155 appends MAX_THINKING_TOKENS=31999 to the CLI subprocess only when thinkingTokens>0. The CLI env allowlist at engine.go:224 explicitly keeps MAX_THINKING_TOKENS, so it survives claudeEnv()'s strict filtering.
- entrypoint.sh:159's provider_of extracts claude-code as the lane for claude-code/opus:max (matches on */*, takes ${1%%/*}).
- GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,claude-code=2" means the three claude-code entries (sonnet, opus, opus:max) cap at 2 in-flight; lenses default to serial (1) inside the claude-code lane since it's absent from GADFLY_PROVIDER_LENS_CONCURRENCY. Peak: 2 claude subprocesses + 9 ollama-cloud queries ≤ the 10-comment budget called out at adversarial-review.yml:66. Job-level timeout-minutes: 90 (5400 s) comfortably covers worst-case serialization.
- :sha-c342bdb is asserted (in the PR description) to carry the :thinking spec parse from #5; the constant maxThinkingTokens = 31999 is defined in that same engine.go file and is what gets injected — verified end-to-end by TestRunPassInjectsThinkingTokens.
- :max on an older CLI that ignores MAX_THINKING_TOKENS simply degrades to plain claude-code/opus, as documented. No silent corruption.
The :max magic number 31999 is the Claude Code CLI's documented "ultrathink" ceiling, not a re-derivable constant — it's the vendor's value, applied as a best-effort env override. No bug.
No logic errors, no domain-incorrect constants, no behavioral regressions introduced by this PR through the correctness lens.
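The peak-concurrency arithmetic in this review (model cap per provider times lenses per model) can be made concrete with a short Go sketch. The helpers parseCaps and peakCalls are hypothetical, and the lens setting "ollama-cloud=3" is an assumption inferred from the "3 cloud × 3 lenses = 9" comment quoted elsewhere in this thread; only the GADFLY_PROVIDER_CONCURRENCY value is taken verbatim from the review.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCaps (hypothetical) turns "a=3,b=2" into a map, skipping entries
// that are malformed or not positive.
func parseCaps(s string) map[string]int {
	caps := map[string]int{}
	for _, part := range strings.Split(s, ",") {
		name, val, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(val)
		if err != nil || n <= 0 {
			continue
		}
		caps[name] = n
	}
	return caps
}

// peakCalls (hypothetical) sums models-in-flight × lenses-in-flight per
// provider, using defaultLens where a provider has no lens override.
func peakCalls(modelCaps, lensCaps map[string]int, defaultLens int) int {
	total := 0
	for provider, models := range modelCaps {
		lenses, ok := lensCaps[provider]
		if !ok {
			lenses = defaultLens
		}
		total += models * lenses
	}
	return total
}

func main() {
	modelCaps := parseCaps("ollama-cloud=3,claude-code=2") // value quoted in the review
	lensCaps := parseCaps("ollama-cloud=3")                // assumed lens override
	// ollama-cloud: 3 models × 3 lenses; claude-code: 2 models × 1 lens.
	fmt.Println("peak concurrent calls:", peakCalls(modelCaps, lensCaps, 1))
}
```

Under those assumptions the claude-code lane contributes 2 × 1 concurrent CLI subprocesses regardless of how many claude-code models are listed, which is why adding a third entry lengthens the queue but not the peak.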
🧯 Error handling & edge cases — No material issues found
Nothing in my lens specifically. The diff is a YAML-only change to a Gitea workflow that adds a new GADFLY_MODELS entry. The engine code clearly supports the :max spec (mapped to 31999), the env sanitization preserves MAX_THINKING_TOKENS, and timeout/concurrency envelopes still fit.
VERDICT: No material issues found
Nothing in my error-handling / edge-cases lens is materially wrong with this diff:
- :max parsing: Verified at cmd/gadfly/engine.go:111-121 (parseThinking) and the constant at cmd/gadfly/engine.go:66 (maxThinkingTokens = 31999) — exactly the value the PR description claims gets injected.
- claudeEnv() at cmd/gadfly/engine.go:224 explicitly keeps MAX_THINKING_TOKENS through its allow-list, so the subprocess will see it (and not have it silently dropped).
- parseThinking returns 0 for any garbage (typos like claude-code/opus:Maxx or negative ints), which claudeCodeEngine.runPass then treats as "no thinking override" — cmd.Env simply doesn't get MAX_THINKING_TOKENS appended (engine.go:152 guards on > 0). This is a safe degradation, not a swallowed error, and the PR's own spec is correct.
- claude-code=2 means sonnet + opus + opus:max (3 entries) queues 1 — well within GADFLY_TIMEOUT_SECS=600 × 3 lenses and the 90-minute job ceiling. No overflow concern introduced by adding the third claude-code lane.
- engine.go:159-165 (Setpgid + group SIGKILL on cancel): a long-running :max run that gets timed out can't orphan Node children. Pre-existing, not regressed by this diff.
The "best-effort no-op if the installed CLI build ignores it" caveat in the PR description is honest: engine.go:65 calls it out, and the claude-code/opus lane has always had this posture for MAX_THINKING_TOKENS.
No findings to report in this lane.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 22s
🪰 Gadfly review — claude-code/opus (claude-code)
Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
I've verified the relevant code path. My review is complete.
VERDICT: No material issues found
This PR is a workflow-config-only change: it bumps the dogfood image tag (sha-80d8f53 → sha-c342bdb), appends claude-code/opus:max to GADFLY_MODELS, and edits comments. I traced the security-relevant path for the new :max spec:
- claude-code/opus:max flows into newClaudeCodeEngine → parseThinking("max") → maxThinkingTokens (31999), which is applied as an env var (engine.go:152-154), and the CLI is invoked via exec.CommandContext with an argv slice (engine.go:127-139,149) — no shell, so no command-injection surface. The constant 31999 is hardcoded, not derived from the spec. Verified by reading engine.go.
- GADFLY_MODELS is set in the committed workflow YAML, not from untrusted PR input, so there's no tainted-data path here regardless.
- claudeEnv() (engine.go:221-241) allowlists env into the subprocess and deliberately drops GITEA_TOKEN/provider keys; MAX_THINKING_TOKENS and CLAUDE_* (the OAuth token) are intentionally kept, which is correct and unchanged by this PR. Verified by reading the function.
- A :sha-… tag rather than :latest reduces (not increases) supply-chain risk; it points at the project's own registry. No concern.
Nothing in the security lens is materially wrong with this change.
Outside my lens: the PR's correctness rests on the image sha-c342bdb actually containing the :thinking/:max parse — the checked-out source has it (engine.go:83-121), but the image contents can't be verified from the repo. That's a build/correctness matter, not security.
🎯 Correctness — No material issues found
I've verified the change end-to-end against the actual code. Here's my review.
VERDICT: No material issues found
This is a config-only change (image tag bump + one new GADFLY_MODELS entry + comment). Through the correctness lens, I verified every way the new claude-code/opus:max spec is consumed downstream, and it is handled correctly:
- cmd/gadfly/engine.go:83-121: claude-code/opus:max → strings.Cut(spec, "/") yields opus:max, then strings.Cut(after, ":") yields model opus + thinking suffix max. parseThinking("max") returns maxThinkingTokens (engine.go:66 = 31999), which is set as MAX_THINKING_TOKENS on the subprocess (engine.go:152-154). The magic number 31999 matches Claude Code's documented "ultrathink" budget — correct. There's an explicit test for exactly this spec (engine_test.go:229, :251, :270-273), and the comment/PR description's 31999 is consistent with the constant.
- Existing claude-code/opus entry: both resolve to --model opus, but they are intentionally distinct review passes (the stated goal — same model at two thinking depths). I checked the two places identity matters and neither collides:
  - The comment marker (scripts/run.sh:50) keys on the full ${MODEL} spec, so the markers are …claude-code/opus --> vs …claude-code/opus:max --> — separate comments, no clobbering.
  - The status file path (entrypoint.sh:165) sanitizes the full spec to claude-code_opus vs claude-code_opus_max — distinct files.
- entrypoint.sh:159: provider_of splits on / → claude-code, matching the claude-code=2 concurrency entry. The new entry correctly joins the existing claude-code lane (now 3 models, cap 2), and the cloud-lane concurrency comment ("3 cloud × 3 lenses = 9") is untouched and unaffected.
- MAX_THINKING_TOKENS is only appended when thinkingTokens > 0 and is a plain env var on the CLI subprocess, so it's genuinely a no-op if the installed CLI ignores it — matching the PR description.
I verified the above by reading engine.go, engine_test.go, entrypoint.sh, and run.sh directly (not from the diff alone). The image-tag bump to :sha-c342bdb is asserted to contain this :max parsing; I can confirm the parsing exists at this checkout but cannot verify the contents of the published image from here — that's the only unverifiable element, and it's outside the correctness lens.
No logic, unit, or threshold errors in my lane.
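The identity check in this review comes down to whether two specs still differ after sanitization. The reviews describe the shell transform as tr -c '[:alnum:]._-' '_'; the Go function below is an analog for ASCII input, not the entrypoint's code, and the name sanitize is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize (hypothetical) replaces every byte that is not an ASCII letter,
// digit, '.', '_' or '-' with '_', mirroring the effect the reviews
// attribute to tr -c '[:alnum:]._-' '_'.
func sanitize(spec string) string {
	var b strings.Builder
	for i := 0; i < len(spec); i++ {
		c := spec[i]
		switch {
		case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z', c >= '0' && c <= '9',
			c == '.', c == '_', c == '-':
			b.WriteByte(c)
		default:
			b.WriteByte('_')
		}
	}
	return b.String()
}

func main() {
	for _, spec := range []string{"claude-code/opus", "claude-code/opus:max", "qwen3-coder:480b-cloud"} {
		fmt.Printf("%s -> %s\n", spec, sanitize(spec))
	}
}
```

Since "/" and ":" both map to "_", the two opus entries stay distinct only because the suffix text itself differs; two specs that differed solely in which separator they used would collide under this scheme.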
🧯 Error handling & edge cases — No material issues found
I verified the relevant code rather than reviewing the YAML in isolation. The diff is workflow-only: it bumps the image to :sha-c342bdb, adds claude-code/opus:max to GADFLY_MODELS, and updates a comment. My lens is error handling & edge cases, so I traced where :max actually gets parsed and executed.
VERDICT: No material issues found
Findings (all within the error-handling / edge-case lens):
- The :max suffix on claude-code/opus:max is parsed safely. Verified in cmd/gadfly/engine.go:86-95: strings.Cut(after, ":") splits the thinking tier, and parseThinking (engine.go:113-121) maps "max" → maxThinkingTokens (31999), a positive int → itself, and anything else → 0 (off). Bad/garbage suffixes degrade to "no extended thinking" rather than panicking — no strconv panic path, since the error from Atoi is checked.
- claude-code/opus:max contains a colon, but GADFLY_MODELS is comma-separated and existing entries (qwen3-coder:480b-cloud) already carry colons, so colon-in-spec is an established, handled shape. The colon only gets special meaning after the claude-code/ slash prefix (engine.go:88-90), which is correct.
- MAX_THINKING_TOKENS is only appended to the subprocess env when thinkingTokens > 0 (engine.go:152-154), and it's explicitly whitelisted in the minimal env passthrough (engine.go:224). If the installed CLI ignores the var, the run still produces a normal result — no error swallowed, no crash.
- […] (engine.go:174-176), empty results error out (engine.go:190), and process-group kill prevents orphaned children (engine.go:159-165). This PR adds no new unhandled path.
Nothing in my lane is materially wrong. The change introduces no new error-handling or edge-case surface beyond the :max parse, which is covered (and tested — engine_test.go:229,247,272).
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 56s
🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)
Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
This is a workflow-only change that adds a new model entry (claude-code/opus:max) and bumps the pinned Docker image. Through the security lens:
- CLAUDE_CODE_OAUTH_TOKEN is unchanged.
- claude-code/opus:max is hardcoded in the workflow YAML, not derived from any user-controlled input. The :max suffix is parsed by parseThinking() (engine.go:113-121), which only accepts the literal "max" (case-insensitive) or a positive integer — no path traversal, command injection, or other taint risk.
- MAX_THINKING_TOKENS in the subprocess env: I verified that claudeEnv() (engine.go:224) already allowlists MAX_THINKING_TOKENS for the CLI subprocess. The value is hardcoded as 31999 in Go (engine.go:66), not sourced from workflow inputs or secrets. No risk of an attacker-controlled value reaching the CLI.
- :sha-c342bdb tag — no supply-chain regression from the prior :sha-80d8f53.
- The if: guard, GADFLY_ALLOWED_USERS, and permissions: block are all untouched.
Nothing material to flag here.
🎯 Correctness — No material issues found
Verdict: No material issues found
The diff adds claude-code/opus:max to the GADFLY_MODELS list and bumps the pinned image to sha-c342bdb (which carries the :thinking suffix parser from #5). I verified the full parse chain end-to-end:
- engine.go:83-108: claude-code/opus:max → model="opus", thinkingTokens=31999. Confirmed by reading the newClaudeCodeEngine and parseThinking functions.
- engine.go:152-154: thinkingTokens > 0 appends MAX_THINKING_TOKENS=31999 to the subprocess env. Confirmed by the test at engine_test.go:270-273 which asserts the stub receives MTT=31999.
- entrypoint.sh:159: provider_of "claude-code/opus:max" → claude-code, matching the existing claude-code=2 concurrency cap. The 3 claude-code models (sonnet, opus, opus:max) will run 2 at a time — correct.
- GADFLY_PROVIDER_LENS_CONCURRENCY has no claude-code entry, so claude-code models fall back to defaultLensConcurrency=1 (sequential lenses per model). This is consistent with the existing sonnet/opus behavior.
No logic bugs, no misrouted models, no concurrency misconfiguration. The change is purely additive and mechanically sound.
🧯 Error handling & edge cases — No material issues found
No material issues found.
The diff adds claude-code/opus:max to the model fleet. Through the error-handling lens, I verified:
- :max parsing: newClaudeCodeEngine (engine.go:86-95) splits opus:max on : → model=opus, thinking=max. parseThinking("max") (engine.go:114) returns maxThinkingTokens=31999. The runPass method (engine.go:152-154) appends MAX_THINKING_TOKENS=31999 to the subprocess env only when thinkingTokens > 0. All paths are exercised by existing tests (engine_test.go:257+).
- Graceful degradation: The engine explicitly documents (engine.go:64-66) that MAX_THINKING_TOKENS is a no-op if the CLI build doesn't honor it — opus:max falls back to behaving like opus. No crash, no error, no hung process.
- Concurrency: GADFLY_PROVIDER_CONCURRENCY=claude-code=2 with 3 claude-code models means one queues. At 10 min/lens × 3 lenses × ~2.5 serial batches ≈ 75 min, well within the 90-minute job timeout. No starvation or deadlock risk.
- No new nil/empty/zero/negative paths: The new spec flows through the same strings.Cut and parseThinking branches already exercised by claude-code/opus (no colon) and claude-code/sonnet. No new edge cases introduced.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 59s
🪰 Gadfly review — nemotron-3-super:cloud (ollama-cloud)
Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found
🎯 Correctness — No material issues found
No material issues found
🧯 Error handling & edge cases — Reviewed
Yes, the change shown in the diff is present in the current file .gitea/workflows/adversarial-review.yml. The line matches the updated line shown in the diff (the addition of ,claude-code/opus:max). The file currently contains that exact string, confirming the change has been applied.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 39s
🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud)
Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
This PR modifies a CI workflow file to enable a new reviewer model (`claude-code/opus:max`) by updating the Docker image tag and adding the new model to the `GADFLY_MODELS` environment variable. Through the security lens:
- The `CLAUDE_CODE_OAUTH_TOKEN` secret is reused appropriately.

The diff has been verified as a safe configuration update with no security implications. The addition of a new model to the review fleet is operationally sound and follows the established pattern for model inclusion.
🎯 Correctness — No material issues found
No material issues found.
🧯 Error handling & edge cases — No material issues found
No material issues found.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5s
🪰 Gadfly review — `claude-code/opus:max` (claude-code)
Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
I have what I need. The change is workflow-only (image tag bump + one model added to `GADFLY_MODELS`), and I've verified the security-relevant code path that the new `claude-code/opus:max` spec exercises.

VERDICT: No material issues found
Reviewed through the 🔒 Security lens. This is a CI workflow-only change: it bumps the pinned image from `:sha-80d8f53` to `:sha-c342bdb` and appends `claude-code/opus:max` to `GADFLY_MODELS`. I traced the new spec into the code it actually exercises and found no security problem.

What I verified:
- `claude-code/opus:max` is parsed by `parseThinking` (`cmd/gadfly/engine.go:113`), which only maps `"max"` → the hardcoded constant `maxThinkingTokens = 31999` or a positive integer → itself; everything else → 0. The value reaches the subprocess as `MAX_THINKING_TOKENS=<int>` via `strconv.Itoa` (`engine.go:154`), and the CLI is launched with `exec.CommandContext` + an argv slice (`engine.go:149`). No shell is involved, so the spec cannot inject commands or arguments.
- `claudeEnv()` (`engine.go:221-233`): `GITEA_TOKEN`, `GADFLY_FINDINGS_TOKEN`, and provider API keys are deliberately dropped; only `CLAUDE_*`/`ANTHROPIC_*`/`NODE_*`/`XDG_*`/locale/`MAX_THINKING_TOKENS` pass through. Adding another `claude-code/*` model reuses the already-present `CLAUDE_CODE_OAUTH_TOKEN`, so no new secret is introduced or exposed by this diff.
- The image is pinned to a `:sha-<short>` digest tag rather than `:latest`, which is the safer choice (and matches the project's own documented lesson about `:latest` caching). No downgrade in pinning discipline.

Note on scope: I did not re-audit the pre-existing env-allowlist or spec-parsing logic as findings since this PR doesn't modify them. I only confirmed the new value flows through them safely.
Nothing in the security lane is materially wrong here.
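The env handling this section describes (a minimal allowlist, then `MAX_THINKING_TOKENS` appended only when a budget is set) could look roughly like the sketch below. It is an assumption-laden illustration rather than the project's `claudeEnv()`: the names `filterEnv`, `withThinking`, and `allowedPrefixes` are hypothetical, and the exact allowlist entries (including treating locale as `LANG` and `LC_*`) are guesses based on the families the review names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// allowedPrefixes is an assumed allowlist built from the variable families
// named in the review; the real list in claudeEnv() may differ.
var allowedPrefixes = []string{
	"CLAUDE_", "ANTHROPIC_", "NODE_", "XDG_", "LC_", "LANG=", "MAX_THINKING_TOKENS=",
}

// filterEnv keeps only "KEY=value" entries matching the allowlist, so tokens
// such as GITEA_TOKEN never reach the subprocess.
func filterEnv(environ []string) []string {
	var out []string
	for _, kv := range environ {
		for _, prefix := range allowedPrefixes {
			if strings.HasPrefix(kv, prefix) {
				out = append(out, kv)
				break
			}
		}
	}
	return out
}

// withThinking appends MAX_THINKING_TOKENS only when a budget is set.
// If the key is already present, os/exec uses the last duplicate in Cmd.Env.
func withThinking(env []string, thinkingTokens int) []string {
	if thinkingTokens > 0 {
		env = append(env, "MAX_THINKING_TOKENS="+strconv.Itoa(thinkingTokens))
	}
	return env
}

func main() {
	parent := []string{
		"GITEA_TOKEN=secret",
		"OLLAMA_CLOUD_API_KEY=key",
		"CLAUDE_CODE_OAUTH_TOKEN=token",
		"LANG=C.UTF-8",
	}
	fmt.Println(withThinking(filterEnv(parent), 31999)) // opus:max lane
	fmt.Println(withThinking(filterEnv(parent), 0))     // plain opus lane
}
```

An allowlist fails closed: a secret added to the parent environment later is dropped by default unless someone deliberately adds it to the list.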
🎯 Correctness — No material issues found
I've verified the change against the actual checked-out code. Here is my review.
VERDICT: No material issues found (Correctness lens)
This is a workflow-only change: bump the image tag and append `claude-code/opus:max` to `GADFLY_MODELS`. I checked the supporting code paths that make this spec meaningful:
- `:max` suffix parses correctly. `newClaudeCodeEngine` (`cmd/gadfly/engine.go:83-96`) cuts on the first `/`, then on `:`, sending `opus:max` → model `opus`, suffix `max`. `parseThinking` (`engine.go:113-121`) maps `max` → `maxThinkingTokens`. Verified by reading the code and the table tests at `engine_test.go:228-232` (`{"claude-code/opus:max", "opus", maxThinkingTokens}`).
- `MAX_THINKING_TOKENS=31999` is the correct value. `maxThinkingTokens = 31999` (`engine.go:66`) is Claude Code's documented "ultrathink" budget (32000−1). Re-derived against the known tier value; it's not an arbitrary magic number. It's injected into the subprocess env only when `thinkingTokens > 0` (`engine.go:152-154`) and whitelisted through `claudeEnv()` (`engine.go:224`), so it actually reaches the CLI. The "best-effort / no-op if ignored" claim in the PR is accurate.
- No collision between `opus` and `opus:max`. My main correctness worry was that both specs resolve to model id `opus` and would clobber each other's PR comment or status file. They don't: the comment `MARKER` uses the full `${MODEL}` spec (`scripts/run.sh:50`), and `status_file_for` (`entrypoint.sh:165`) sanitizes the full spec via `tr` → `claude-code_opus` vs `claude-code_opus_max`. Both keys stay distinct, so the two thinking depths get separate comments as intended.
- `GADFLY_MODEL="claude-code/opus:max"` satisfies `isClaudeCodeSpec`'s `claude-code/` prefix check (`engine.go:72`), and `provider_of` (`entrypoint.sh:159`) yields lane `claude-code`, matching the `claude-code=2` concurrency entry. Correct routing.

One thing I could not verify (and it's operational, not code-correctness): whether the pinned image `:sha-c342bdb` actually contains this `:max` parse code. The repo is shallow here (only the head commit is present), so I can't confirm `c342bdb`'s contents. If that image predates the suffix parsing, `claude-code/opus:max` would be passed to the CLI as a literal `--model opus:max` and fail. That is a deployment claim the PR asserts, though, and the code at this checkout is correct and tested.

Nothing material in the correctness lane.
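To make the key-collision argument concrete, here is a small sketch of the sanitization step. It approximates `tr -c '[:alnum:]._-' '_'` for ASCII input; the function name `sanitizeKey` is hypothetical, and the real script does this in shell rather than Go.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeKey replaces every character outside [A-Za-z0-9._-] with '_',
// approximating `tr -c '[:alnum:]._-' '_'` in the C locale for ASCII specs.
func sanitizeKey(spec string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
			return r
		case r == '.', r == '_', r == '-':
			return r
		}
		return '_'
	}, spec)
}

func main() {
	for _, spec := range []string{"claude-code/opus", "claude-code/opus:max"} {
		fmt.Printf("%s -> %s.json\n", spec, sanitizeKey(spec))
	}
}
```

Because the `:max` suffix survives sanitization as `_max`, the two specs map to different file names even though both resolve to the same `--model` value.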
🧯 Error handling & edge cases — No material issues found
I've verified the relevant code paths. My lens is error handling & edge cases.
VERDICT: No material issues found
This is a workflow-only change (image SHA bump + adding `claude-code/opus:max` to `GADFLY_MODELS`). I checked the unhappy paths the new spec could introduce and they are all handled:
- Parsing `opus:max` is defensive. `newClaudeCodeEngine` splits on the first `:` (`engine.go:90-95`) and `parseThinking` (`engine.go:113-121`) maps `"max"`→31999, a positive int→itself, and anything else→0 with no panic: `strconv.Atoi` errors and non-positive values fall through to the safe `0` (thinking off). Bad input can't crash the run. Verified by reading the function.
- No collision between `claude-code/opus` and `claude-code/opus:max` (the one edge case worth checking, since both reduce to `--model opus`). The two stay distinct everywhere they're keyed:
  - `status_file_for` sanitizes via `tr -c '[:alnum:]._-' '_'` (`entrypoint.sh:165`), giving `claude-code_opus.json` vs `claude-code_opus_max.json`.
  - The comment marker `<!-- gadfly-review:${PROVIDER}:${MODEL} -->` (`run.sh:50`) keys on the full `MODEL`, so the two get separate comments.

  Neither overwrites the other.
- `opus:max` will run longer; if it exceeds the per-lens timeout, `cmd.Cancel` SIGKILLs the whole process group (`engine.go:156-165`, `Setpgid`), so no orphaned `claude`/node children. A missing/ignoring CLI is a normal pass error, kept advisory (`engine.go:81-82`), and `review_one` logs+continues on failure (`entrypoint.sh:183`).
- `provider_of "claude-code/opus:max"` → `claude-code` (segment before first `/`, `entrypoint.sh:159`), so the `claude-code=2` concurrency cap applies as intended.

The constant `31999` matches the documented Claude Code "ultrathink" budget and the PR description; no off-by-one or unit concern in my lane.

Nothing in the error-handling lens is materially wrong here.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 38s