The swarm reviewed PR #5 with 8 reviewers. With the telemetry fix from
#4 now live, it produced only 13 findings, versus 43 on the comparably
clean #4, so the fix works. Folded in the two warranted ones:
- engine: keep MAX_THINKING_TOKENS in claudeEnv() so a globally-set value
reaches the CLI too (not just the per-spec :max append). (minimax)
- test: TestRunPassInjectsThinkingTokens verifies that runPass actually
puts MAX_THINKING_TOKENS in the subprocess env (31999 for :max, unset
for a plain spec). Previously the parse was tested but the injection
wasn't. (minimax)
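As a rough illustration of the two points above, the sketch below shows one way a globally set MAX_THINKING_TOKENS could pass through an environment filter while a per-spec budget still takes precedence. Everything here is an assumption made for illustration: the allowlist, filterEnv, withBudget, and lookup are hypothetical names, and the real claudeEnv() and runPass may be structured differently. It relies on the documented os/exec behavior that, for duplicate keys in Cmd.Env, the last entry wins.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const thinkingVar = "MAX_THINKING_TOKENS"

// allowed is a hypothetical allowlist; the real claudeEnv() may filter
// the environment in a different way.
var allowed = map[string]bool{"PATH": true, "HOME": true, thinkingVar: true}

// filterEnv keeps only the allowlisted KEY=VALUE entries from parent.
func filterEnv(parent []string) []string {
	var out []string
	for _, kv := range parent {
		key, _, ok := strings.Cut(kv, "=")
		if ok && allowed[key] {
			out = append(out, kv)
		}
	}
	return out
}

// withBudget appends a per-run budget after the inherited entries.
// A budget of 0 or less means "no suffix", so nothing is appended.
func withBudget(env []string, budget int) []string {
	if budget <= 0 {
		return env
	}
	return append(env, thinkingVar+"="+strconv.Itoa(budget))
}

// lookup returns the effective value for key. The last matching entry
// wins, mirroring how exec.Cmd.Env resolves duplicate keys.
func lookup(env []string, key string) (string, bool) {
	val, found := "", false
	for _, kv := range env {
		if k, v, ok := strings.Cut(kv, "="); ok && k == key {
			val, found = v, true
		}
	}
	return val, found
}

func main() {
	parent := []string{"PATH=/usr/bin", "SECRET=x", thinkingVar + "=8000"}

	plain := withBudget(filterEnv(parent), 0)     // plain spec: global value survives
	maxed := withBudget(filterEnv(parent), 31999) // ":max" spec: per-run value wins

	fmt.Println(lookup(plain, thinkingVar))
	fmt.Println(lookup(maxed, thinkingVar))
}
```

Appending the per-run value rather than replacing the inherited one keeps the filter and the override independent, which is also what makes the injection easy to assert on in a test.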
The concern that MAX_THINKING_TOKENS is unverified (minimax, qwen) is the
same caveat already documented, so it is left as-is. gofmt, go vet, and
go test -race are green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per Steve: add Claude Code opus to gadfly's own swarm, and prep a
max-thinking variant.
- Dogfood workflow: add claude-code/opus alongside claude-code/sonnet
(claude-code lane bumped to 2 so they run in parallel), and bump the
image pin to :sha-80d8f53 so the clean-lens telemetry fix from #4 is
actually live in dogfood reviews.
- Engine: a "claude-code/<model>:<thinking>" spec now sets an
extended-thinking budget for that run via MAX_THINKING_TOKENS on the
subprocess. The suffix is ":max" (high ultrathink tier) or ":<n>". This
is best-effort: it is a no-op if the CLI build ignores the variable, and
it never errors. It ships the capability so a follow-up can enable
claude-code/opus:max once this image builds (the currently pinned image
predates the parse and would mis-route it).
- README documents the :thinking suffix; new tests cover the spec parse.
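For readers unfamiliar with the suffix, here is a minimal sketch of how such a spec could be split into a model and a thinking budget. It is not the engine's actual parser: splitThinking and maxBudget are hypothetical names, ":<n>" is assumed to be a positive token count, and treating a malformed suffix as "no budget" is an assumption chosen to match the "never errors" intent. The 31999 value for ":max" is the one the injection test is described as checking.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxBudget is the token budget assumed for the ":max" suffix.
const maxBudget = 31999

// splitThinking is a hypothetical helper. It splits
// "provider/model[:thinking]" into the bare spec and a token budget.
// A budget of 0 means MAX_THINKING_TOKENS should not be set for the run.
func splitThinking(spec string) (string, int) {
	base, suffix, found := strings.Cut(spec, ":")
	if !found {
		return spec, 0 // plain spec, no thinking suffix
	}
	if suffix == "max" {
		return base, maxBudget
	}
	n, err := strconv.Atoi(suffix)
	if err != nil || n <= 0 {
		// Assumption: a malformed suffix is not an error; the spec is
		// returned unchanged with no budget.
		return spec, 0
	}
	return base, n
}

func main() {
	specs := []string{
		"claude-code/sonnet",
		"claude-code/opus:max",
		"claude-code/opus:4096",
	}
	for _, s := range specs {
		base, budget := splitThinking(s)
		fmt.Printf("%-24s -> spec=%s budget=%d\n", s, base, budget)
	}
}
```

An image built before the parse existed would presumably treat the whole string, suffix included, as the model name, which is why enabling claude-code/opus:max waits for the new image.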
gofmt clean, go vet quiet, go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>