feat(reusable): ship the curated swarm as the default config consumers inherit #10
Reference in New Issue
Block a user
Delete Branch "feat/default-swarm-config"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Makes the curated swarm the default Gadfly config, defined once in the reusable workflow's input defaults, so consumers subscribe by just calling it (no
with:block) and inherit it.Default swarm (was all
""→ image default)modelsminimax-m3:cloud, glm-5.2:cloud, deepseek-v4-pro:cloud, claude-code/sonnet, claude-code/opus, claude-code/opus:maxspecialistssecurity, correctness, maintainability, performance, error-handling(5-lens suite)provider_concurrencyollama-cloud=3,claude-code=3(all three claudes at once)provider_lens_concurrencyollama-cloud=3timeout_secs/max_steps600/14timeout_minutes45 → 90Consumers now inherit
adversarial-review.ymlslimmed to justallowed_users(uses the local reusable ref, so this PR's own review self-tests the new defaults).Opinionated default — documented
The default needs
OLLAMA_CLOUD_API_KEYandCLAUDE_CODE_OAUTH_TOKEN. A consumer with only one (or a different provider) overridesmodels:. The publicexamples/reusable.ymlkeeps a cloud-onlymodels:override so a copy-paste works with just the Ollama key; README / examples / CLAUDE.md updated to match.Advisory-only invariant unchanged.
🤖 Generated with Claude Code
🪰 Gadfly — live review status
6/6 reviewers finished · updated 2026-06-28 02:20:34Z
claude-code/opus· claude-code — ✅ doneclaude-code/opus:max· claude-code — ✅ doneclaude-code/sonnet· claude-code — ✅ donedeepseek-v4-pro:cloud· ollama-cloud — ✅ doneglm-5.2:cloud· ollama-cloud — ✅ doneminimax-m3:cloud· ollama-cloud — ✅ doneLive status board. Findings are posted in each model's own comment. Advisory only — does not block merge.
🪰 Gadfly review —
minimax-m3:cloud(ollama-cloud)Verdict: Blocking issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — Minor issues
VERDICT: Minor issues
Caller-side actor gate is now load-bearing for a larger blast radius —
.gitea/workflows/review-reusable.yml:33-56+entrypoint.sh:71-82The reusable enforces no
issue_commentactor gate itself; that gate lives in the caller (adversarial-review.yml:36-41,examples/reusable.yml:48-50). The in-containerGADFLY_ALLOWED_USERS=""fallback only checks "is a repo collaborator" (entrypoint.sh:78-82). Before this PR the default swarm was""(one model, viaentrypoint.sh:52DEFAULT_MODELS="qwen3-coder:480b-cloud"), so a missed gate had small blast radius. After this PR the default swarm pullsOLLAMA_CLOUD_API_KEY+CLAUDE_CODE_OAUTH_TOKENplus 5 lenses × 6 models × 2 passes, so any collaborator who can comment@gadfly reviewon a copy-pasted stub now triggers a full multi-provider secret-bearing run. The docs warn about this, but the warning is one line in a comment; consider one of: (a) move the gate into the reusable with ainputs.bypass_actor_gateopt-out (default off), or (b) add a pre-flightallowed_users: required: trueso a caller can't inherit""by accident. Verified by reading both consumers'if:blocks and the in-container collaborator check.Image is tag-pinned, not digest-pinned —
.gitea/workflows/review-reusable.yml:88:sha-c342bdbis a namespace prefix, not asha256:…digest. The internal registry can re-tag and serve a different image at the same tag. Pre-existing, not introduced by this PR, but worth noting since the comment claims "immutable" — only digest pinning is truly immutable. Suggest switching toimage@sha256:…for the consumer-facing reusable to match the comment's claim. Unverified that the registry actually enforces immutability on the:sha-*namespace.No new injection / SSRF / deserialization / secret-leakage vectors introduced by this PR. Verified by tracing: each input flows
${{ inputs.X }}→env:ondocker://…→entrypoint.shenv read (${GADFLY_…:-}). No new YAMLrun:blocks, noeval, no template expansion of attacker-controlled strings, no new outbound URLs.workflow_call.secretsall stayrequired: false, so an unforwarded secret stays empty — no accidental leak.secrets: inheritis documented as discouraged in both consumer files; verified least-privilege is preserved.timeout_minutesraised 45 → 90 —.gitea/workflows/review-reusable.yml:56. Larger window for a comment-triggered (or stale) job to hold the runner and burn secrets. Mitigated byconcurrency.cancel-in-progress: truein both callers, so not material.Outside my lens: nothing.
🎯 Correctness — No material issues found
VERDICT: No material issues found
I checked the change through the Correctness lens (logic, semantic correctness, re-derived values):
cmd/gadfly/specialists.go:67definesdefaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}. The new reusablespecialistsdefault (review-reusable.yml:44) is the exact same 5 — verified character-for-character. So the workflow input default and the Go binary's built-in default suite agree.modelsdefault (minimax-m3:cloud,glm-5.2:cloud,deepseek-v4-pro:cloud,claude-code/sonnet,claude-code/opus,claude-code/opus:max) splits cleanly into 3 ollama-cloud + 3 claude-code models. Verified againstentrypoint.sh:159(provider_of—claude-code/<x>⇒ providerclaude-code) and:167-176(provider_capparsesprovider=NCSVs), soprovider_concurrency: "ollama-cloud=3,claude-code=3"resolves to "cap 3 per lane" as documented.claude-code/*entries share theclaude-codeprovider, so the cap-3 lane runs sonnet/opus/opus:max in parallel — matches the PR's "all three claudes at once" claim. (Bonus: opus:max setsMAX_THINKING_TOKENS=31999perengine.go:63-66— well-formed spec.)timeout_minutes: 90andtimeout_secs: 600: rough arithmetic — 6 models × 5 lenses × 2 passes = 60 lens invocations per provider;600sper lens × 10 sequential lens-passes per model = 6000s (~100 min) if fully serial, but lanes + within-lane concurrency drop that well under 90 min. Tight but internally consistent; the PR explicitly addresses the concern in the comment.examples/reusable.yml:66deliberately overridesmodelsto a cloud-only CSV; its comment correctly instructs users to delete the override (and forward the Claude token) to get the full default. Verified..gitea/workflows/adversarial-review.ymlslimming: the only remainingwith:key isallowed_users, which is the one consumer-specific value. Verified the reusable passes every GADFLY_* env through (lines 120-129), so no input is lost.No magic numbers, conversion factors, or thresholds in this diff that need re-derivation — they're config defaults, not formulas. The structural change is self-consistent and the defaults align with the Go binary's own defaults.
🧹 Code cleanliness & maintainability — Minor issues
Now let me go through each finding carefully:
Finding 1 (duplicate header paragraphs): Lines 15-19 vs 35-41. Looking at the actual file, lines 15-19 describe the consumer example with
with: { allowed_users: "..." }and then the inputs section starts. Lines 35-41 are insideon.workflow_call.inputsand explain the default swarm. They DO cover similar ground but for different audiences (top-of-file summary vs inline inputs doc). The duplication claim has merit — both paragraphs explain "3 cloud + Claude Code engine, 5-lens suite, all three claudes concurrent" almost verbatim. CONFIRMED.Finding 2 (mixed defaults misleading header): Line 18 says "Set any input to
""to fall back to the image's built-in default" — but this is now in the new top header (the consumer-facing part), andwith:inputs defaulting to non-empty curated values won't fall back to image defaults when set to"". Wait, actually re-reading: the diff shows the comment is at line 18 in the "Inputs ship the DEFAULT swarm..." paragraph, but in the current file the""fallback rule is at line 41 (in theworkflow_call.inputsblock). Let me re-verify.Looking at the current file, line 41 says "Set any input to
""to fall back to the image/entrypoint built-in default." This is inside theon.workflow_call.inputscomment block. The claim in the finding references "line 18" — but the file has been restructured. The draft's line numbers don't match the current file. The line at 18 is in the top header section. However, the broader concern is real: the""fallback rule applies to inputs with empty string defaults, but the workflow now has a mix. CONFIRMED in spirit but line numbers stale.Finding 3 (adversarial-review.yml comments): The
# Security:comment at lines 33-35 is the only place that explains the mirror relationship withallowed_users. The input is at line 54. That's 17 lines. The finding is accurate. CONFIRMED.Finding 4 (examples/reusable.yml comment placement): Line 67-68 says "Other inputs inherit the default... override any of them here" and sits immediately above
allowed_users: "your-username"at line 69. This is misleading becauseallowed_usersIS being explicitly set. CONFIRMED.Finding 5 ("two passes" terminology): The draft says "two passes" is found in
entrypoint.sh:51andCLAUDE.md:13,103. My grep finds zero matches in those files. The actual usage is only inreview-reusable.yml:55. The claim about entrypoint.sh/CLAUDE.md usage is wrong — but the core point that "two passes" is undefined in the workflow file stands, since it's a workflow-internal term with no definition nearby. The finding's supporting evidence is partially false, but the finding itself (the doc gap) is valid. I'll CONFIRM the finding but drop the false evidence details.Finding 6 (Phase 4 fossil): CLAUDE.md:52 is the only remaining "(Phase 4)" reference, and the workflow files no longer mention it. CONFIRMED.
Now let me write the corrected review.
VERDICT: Minor issues
.gitea/workflows/review-reusable.yml:15-19and:35-41— Two near-duplicate header paragraphs explain the same "3 cloud + Claude Code engine, 5-lens suite, all three claudes concurrent" story. The first (top-of-file, lines 15–19) is the consumer-facing summary; the second (insideon.workflow_call.inputs, lines 35–41) repeats the same "default swarm" explanation. Trim the second to a short cross-reference so the explanation lives in one place..gitea/workflows/review-reusable.yml:41— The inline comment says "Set any input to""to fall back to the image/entrypoint built-in default" as a general rule, but the inputs block now mixes empty-string defaults (provider,base_url,worker_model,trigger_phrase,allowed_users) with curated non-empty defaults (models,specialists,provider_concurrency,provider_lens_concurrency,timeout_secs,max_steps). Empty-string defaults do fall through to image defaults; non-empty curated defaults fall back to the workflow default, not the image. A per-input marker (# "" → image default) on the five empty-default fields would prevent a maintainer from conflating the two..gitea/workflows/adversarial-review.yml:33-35vs:54— The# Security:comment is the only place that explains the actor-gate mirrorsallowed_users, and it sits ~17 lines above the input it mirrors. A one-line# mirror of allowed_users below (and the in-container belt-and-suspenders check)next to thewith:block would make the security coupling obvious at the point of definition.examples/reusable.yml:67-68— The comment "Other inputs inherit the default (5-lens suite, concurrency, 90-min cap); override any of them here…" sits immediately aboveallowed_users: "your-username", which IS being explicitly set — not inherited. A copy-paste reader would reasonably conclude theallowed_users:line should be deleted too. Move the comment abovemodels:(where it's accurate) or split it into two annotations..gitea/workflows/review-reusable.yml:54-55— The timeout rationale says "over two passes can run long", but the workflow file never defines what a "pass" is (and the phrase doesn't appear elsewhere in the repo). Either expand to "two review passes (seeentrypoint.sh)" or drop the phrase.CLAUDE.md:52— The lone remaining "(Phase 4)" reference, now orphaned since the workflow files dropped the term. Either restore a matching reference in.gitea/workflows/review-reusable.ymlor drop the parenthetical so the doc and the workflow agree.⚡ Performance — Blocking issues found
VERDICT: Blocking issues found
.gitea/workflows/review-reusable.yml:43— ~6× compute cost per review is now the default, contradicting documented intent. Verified:entrypoint.sh:52shipsDEFAULT_MODELS="qwen3-coder:480b-cloud"(one model), andCLAUDE.md:103explicitly warns "Cost ≈ specialists × models × 2 passes — keep the default model count low (entrypoint defaults to one)." This PR replaces that with 6 models as the defaultmodelsinput (minimax-m3:cloud, glm-5.2:cloud, deepseek-v4-pro:cloud, claude-code/sonnet, claude-code/opus, claude-code/opus:max). With the new 5-lens default suite (review-reusable.yml:44), that is5 specialists × 6 models × 2 passes = 60 model invocations per PR— every consumer that omitswith:now runs ~6× the token/$$ of the prior default. The PR compensates by settingtimeout_secsto600(line 49) and the job cap to90(line 56, up from 45), but raising the budget makes the oversubscription easier to fit, not cheaper to run.CLAUDE.md:103reads as a deliberate design constraint that this PR overrides silently. Either keep the "default = cheap" invariant (1–2 models, opt into the swarm) or updateCLAUDE.mdand the README cost note explicitly so consumers know what they're inheriting..gitea/workflows/review-reusable.yml:47-48— multiplicative backend fan-out is now the default, and the in-repo code already warns against this. Verified:cmd/gadfly/main.go:230-233warns "Caution: this fans out WITHIN one model. It multiplies with entrypoint.sh's per-provider model concurrency, so total concurrent backend requests ≈ (models at once) × (lenses at once)." The new defaults shipprovider_concurrency: ollama-cloud=3,claude-code=3(line 47) ×provider_lens_concurrency: ollama-cloud=3(line 48) → 9 simultaneous Ollama Cloud requests by default for every consumer that just calls the reusable. Every consumer that omitswith:also inherits 3 concurrentclaudeCLI processes (each spawning Node children in their own process group perengine.go:159'sSetpgid: true). This is fine for a curated set of repos, but a "default" implies safe-for-anyone; the multiplicative fan-out is the exact pattern the codebase flags as a footgun. Consider droppingprovider_lens_concurrency's default to""(fall through to the image's safer built-in default) and letting consumers raise it explicitly, or pickprovider_concurrency: ollama-cloud=2,claude-code=2to keep parallelism but cap the multiplier..gitea/workflows/review-reusable.yml:50—max_steps: 14(vs. image defaultdefaultMaxSteps = 24incmd/gadfly/main.go:81) is a perf-positive change (fewer wasted tool steps per lens) — verified, flagging only so it's not a silent behavior drift: any consumer that previously omittedwith:and relied on the image's 24-step cap now silently gets 14. That is the intended direction here, just worth a one-liner in the PR description /CHANGELOGso consumers know the cap shrank.Verified by reading:
entrypoint.sh:50-52,cmd/gadfly/main.go:80-90,220-234,cmd/gadfly/engine.go:63-77,150-165,.gitea/workflows/review-reusable.yml,.gitea/workflows/adversarial-review.yml,CLAUDE.md:98-107.🧯 Error handling & edge cases — No material issues found
VERDICT: No material issues found
Reviewed the diff through the error-handling & edge-cases lens. The change is purely workflow YAML + docs (no Go/shell code touched), so I focused on whether the new defaults introduce any new failure modes, swallowed errors, or unhandled edge cases in the existing plumbing they feed into.
What I verified:
modelsdefault (review-reusable.yml:43): all six specs parse cleanly viaentrypoint.sh'sprovider_of(claude-code/*→ laneclaude-code, bare*:cloud→ laneollama-cloud).provider_concurrencyandprovider_lens_concurrencydefaults cover both lanes — no provider falls through with an unset cap.specialistsdefault (review-reusable.yml:44): all five names (security, correctness, maintainability, performance, error-handling) are present incmd/gadfly/specialists.go'sbuiltinSpecialistsand exactly match thedefaultSuitefallback — no "unknown specialist" warnings."", as the new docs claim:GADFLY_MODELS→entrypoint.sh:156(${GADFLY_MODELS:-…$DEFAULT_MODELS}).GADFLY_SPECIALISTS→cmd/gadfly/specialists.go:136(empty trimmed →defaultSuite).GADFLY_PROVIDER_CONCURRENCY→provider_cap(entrypoint.sh:167) returnsDEFAULT_CONCwhen no/blank entry matches.GADFLY_TIMEOUT_SECS/GADFLY_MAX_STEPS→envInt(main.go:450) treats"", unparseable, and<=0as missing → returns the built-in default.run_lane(entrypoint.sh:208) still guards[ "$cap" -ge 1 ] 2>/dev/null || cap=1, so even a malformedprovider_concurrencyvalue can't drive concurrency to zero/negative.run_lanestill runs models withreview_one … || log "model $1 failed (continuing)", so missingOLLAMA_CLOUD_API_KEYorCLAUDE_CODE_OAUTH_TOKENwon't crash the swarm — individual lenses mark themselves errored on the status board and the run completes.timeout_minutes: 90: callers that previously pinned 90 now inherit the same value from the reusable default — no behavior change for existing callers (adversarial-review.ymldropped its redundanttimeout_minutes: 90line, matching the new default).Nothing in this diff introduces a swallowed error, missing cleanup, panic on bad input, or an unhandled nil/empty/zero boundary. The "inherit by default, override per-input, empty falls back to image default" contract holds end-to-end.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 59s
🪰 Gadfly review —
claude-code/opus(claude-code)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
This PR is a defaults-and-docs change to the reusable workflow plumbing. I traced the security-relevant paths (secret forwarding, actor/authz gating, untrusted-input flow) against the actual files and found nothing material in my lens.
What I verified:
Authz gate preserved. The caller-side actor gate in
.gitea/workflows/adversarial-review.yml:36-41still restrictsissue_commenttriggers tosteve/fizi/dazed, andallowed_users: "steve,fizi,dazed"(line 54) keeps the in-container belt-and-suspenders check (entrypoint.sh:68-83) intact. The example stubexamples/reusable.yml:48-50,69likewise retains both itsif:gate andallowed_users. The diff does not weaken either.Least-privilege secret forwarding unchanged.
review-reusable.ymlstill declares each secret explicitly (:64-74) and maps them through env (:104-118); the recommendation againstsecrets: inheritis unchanged. Addingclaude-code/*to the defaultmodelsonly means the run usesCLAUDE_CODE_OAUTH_TOKEN, which is forwarded the same explicit way (adversarial-review.yml:47) and resolves to empty if a caller omits it — no new leakage path.No new tainted-input sink. The PR touches only input defaults (model list, specialist names, concurrency, timeouts) and comments/docs. The untrusted-data surfaces (
COMMENT_BODY,ACTOR, etc.) and their handling inentrypoint.share untouched, so no new injection/SSRF/deserialization vector is introduced.allowed_usersdefault of""is pre-existing. With it empty,actor_allowed()falls back to a Gitea collaborator (write-access) check (entrypoint.sh:78-82) — a reasonable control, and not a regression introduced here (it defaulted to""before this PR too).Note on scope: the per-lens timeout and the 45→90 wall-clock bump are advisory/cost concerns, not security — those belong to other lenses.
🎯 Correctness — No material issues found
VERDICT: No material issues found (Correctness lens)
This PR is purely a defaults/config + docs change in the reusable workflow YAML. I verified the substance against the checked-out code rather than the diff alone:
review-reusable.yml:44defaultsecurity,correctness,maintainability,performance,error-handlingis identical todefaultSuiteincmd/gadfly/specialists.go:67, and every name is a registered built-in lens (specialists.go:26-44). No invalid/typo'd lens.entrypoint.sh:159provider_ofreturnsollama-cloudfor the three bare*:cloudids andclaude-codefor the threeclaude-code/*specs. The defaultprovider_concurrency: "ollama-cloud=3,claude-code=3"(review-reusable.yml:47) therefore keys correctly onto exactly 3 cloud + 3 claude models — so "all three claudes at once" is accurate, not off-by-one.claude-code/opus:maxparses correctly. VerifiednewClaudeCodeEngine(engine.go:83-108): the:maxsuffix is split off as the thinking tier (→maxThinkingTokens,engine.go:114), leavingmodel=opus. The:maxis not mistaken for part of the model id.GADFLY_*env (review-reusable.yml:120-130); omittingwith:in a caller applies these defaults normally. The slimmedadversarial-review.ymlstill forwards the secrets the default swarm needs (Ollama + Claude Code) at lines 46-47.status_file_for(entrypoint.sh:165) sanitizesclaude-code/opus→claude-code_opusandclaude-code/opus:max→claude-code_opus_max, which stay distinct.Nothing in the correctness lens is materially wrong. (
timeout_secs=600/max_steps=14/provider_lens_concurrency=ollama-cloud=3carry over the previous explicit values, so there's no behavioral regression for me to flag — and tuning judgment on those falls under the performance lens, not mine.)🧹 Code cleanliness & maintainability — Minor issues
I've verified all claims against the actual code. Let me confirm my assessment:
Duplicated default-swarm prose: Confirmed. The roster description is genuinely repeated across
review-reusable.yml:15-19and:35-41(two blocks, same file),adversarial-review.yml:6-7and:51-53, andexamples/reusable.yml:8-12and:63-66. The single source of truth is theinputs:defaults atreview-reusable.yml:43-56. All citations check out.The concurrency "outside my lens" flag: I investigated
lensConcurrency()(cmd/gadfly/main.go:281-286). Whenprovider_lens_concurrencyhas noclaude-codeentry, it gracefully falls back to theGADFLY_LENS_CONCURRENCYscalar (default 1) — claude-code lenses run serially while the 3 claude models run concurrently. That asymmetry is handled cleanly by the code and is a plausible deliberate choice (claude-code with extended thinking is slow/expensive). It is not a confirmed defect, and the draft itself only raised it as an open question — dropping it.VERDICT: Minor issues
One confirmed, low-severity maintainability concern:
review-reusable.yml:15-19and again atreview-reusable.yml:35-41(same file, two blocks), plusadversarial-review.yml:6-7and:51-53, andexamples/reusable.yml:8-12and:63-66. The single source of truth — theinputs:defaults atreview-reusable.yml:43-56— is clean, but the prose copies will drift the next time the swarm changes (this PR itself had to hand-edit every copy). Concrete fix: inreview-reusable.ymlcollapse the two near-identical blocks (file header 15-19 and theworkflow_callcomment 35-41) into one, and in the satellite files replace the spelled-out roster with a back-reference ("inherits the default swarm — seereview-reusable.yml") rather than re-listing the exact models/lenses. Mild because the repo's house style is deliberately verbose comments, so the redundancy partly follows convention.⚡ Performance — No material issues found
I've now verified the full timing/concurrency model end-to-end. Here's my assessment through the Performance lens.
VERDICT: No material issues found
This is a config-defaults change (YAML/markdown only — no code paths altered). I traced the actual cost/wall-clock implications against
entrypoint.shandcmd/gadfly/main.gorather than eyeballing the diff:Cost is comparable to the prior config, not a regression. Cost ≈ models × specialists × 2 passes. The old
adversarial-review.ymlran 9 models × 3 lenses = 54 pass-units; the new default runs 6 models × 5 lenses = 60 pass-units. Marginally higher, not a step-change, and it's an intentional, documented "opinionated default."The 90-min job cap comfortably fits the worst-case lane. Verified the concurrency mapping:
provider_lens_concurrencyonly listsollama-cloud=3, so the claude-code lane falls back todefaultLensConcurrency = 1(main.go:281-285,:99) — its 5 lenses run sequentially. With per-lensGADFLY_TIMEOUT_SECS=600shared across review+recheck (main.go:310-311), a single claude model is bounded at 5 × 10 min = 50 min; the three claude models run concurrently (claude-code=3), so the lane ≈ 50 min. The ollama lane (3 models × 3 lenses concurrent) is faster (~20 min). Lanes run in parallel, so worst-case wall-clock ≈ 50 min < 90 min cap. The 45→90 bump is adequate and is a ceiling (kill-guard), not a guaranteed runtime.No new N+1 / unbounded growth / hot-loop work. The status board still renders one comment; backend fan-out (
models × lenses, e.g. 9 concurrent ollama-cloud requests) is unchanged in mechanism and deliberately bounded by the existing caps.One thing I checked and deliberately did not flag as an issue: the claude-code lane runs its 5 lenses serially (lens-concurrency 1) because the lens-concurrency map omits
claude-code. That's slower per-model than it could be, but (a) it fits the cap with margin, and (b) raising it would put 3 models × N lenses concurrent claude-CLI processes against the same token, so serial-lens there is a reasonable rate-limit tradeoff, not a performance defect.Nothing in my lane is materially wrong.
🧯 Error handling & edge cases — No material issues found
I've reviewed the actual flow end-to-end. This diff is config-default and documentation changes only; the error-handling paths it touches live in the unchanged scripts, so the relevant question for my lens is whether the new defaults drive any unhappy path that isn't gracefully handled. I traced the most obvious one.
VERDICT: No material issues found
This PR only changes input default values in
review-reusable.ymlplus docs/examples. No control-flow or error-handling code changed. I verified the one edge case the new defaults actually introduce:Inheriting the default
modelswithoutCLAUDE_CODE_OAUTH_TOKEN(the new default now includesclaude-code/sonnet|opus|opus:max). A consumer that forwards onlyOLLAMA_CLOUD_API_KEYand omitswith:will run three claude-code reviewers with no auth. I traced this path and it is handled gracefully, not a crash or CI failure:entrypoint.sh:181always invokesrun.shwithPROVIDER=ollama; the per-model failure is swallowed (run.sh:183... || log "...(continuing)").run.sh, the up-front key guard (run.sh:151) only fires forollama-cloud; aclaude-code/*model skips it and runs the binary. If theclaudeCLI exits non-zero or empty,run.sh:176-181capturesrcand replacesREVIEWwith a⚠️ … failedcomment, thenrun.shstill exits 0.The
examples/reusable.ymlcloud-onlymodels:override (examples/reusable.yml:66) inheritsprovider_concurrency: "ollama-cloud=3,claude-code=3"; the strayclaude-code=3entry is inert with no claude models —provider_cap/run_lane(entrypoint.sh:167-218) only build lanes for providers that appear in the model list, so no empty lane or divide-by-anything. Verified.New numeric/string defaults (
timeout_secs: "600",max_steps: "14",provider_lens_concurrency: "ollama-cloud=3",timeout_minutes: 90) are all well-formed for their existing parsers; the per-lens timeout (run.sh/binary) still bounds a hung lane independent of the 90-min job cap, so no starvation regression.Nothing in the error-handling / edge-case lens is materially wrong here. The unhappy paths the new defaults can reach were already defensively handled by the unchanged scripts.
(Note: I did not produce a plan file — the actual task here is an adversarial code review, which is read-only, so I've delivered the review directly.)
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 27s
🪰 Gadfly review —
claude-code/opus:max(claude-code)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
Reviewed through the 🔒 Security lens. This PR changes only workflow input defaults (models/specialists/concurrency/timeouts) plus docs. I traced every security-relevant control that could be affected and verified each against the checked-out files:
review-reusable.yml:64-74still declares each secret asrequired: falseand forwards them explicitly (:105-118); the calleradversarial-review.yml:45-49forwards only the four it uses, notsecrets: inherit. No new secret is exposed and nothing switched toinherit. Verified by reading both files.if:allow-list (adversarial-review.yml:36-41, hardcoded steve/fizi/dazed) and the in-container checkactor_allowed()(entrypoint.sh:68-83), driven by the still-forwardedGADFLY_ALLOWED_USERS(review-reusable.yml:129, set to"steve,fizi,dazed"at:54). Theallowed_usersdefault stays"", which falls back to a collaborator/write-access check — not fail-open to the public. The PR doesn't touch this logic.models,provider_concurrency, etc.) are consumer-/maintainer-controlled inputs, not PR-author-controlled. They flow to env and are parsed inentrypoint.sh(provider_cap, model-list split) with fixed, non-tainted values. No untrusted-input path was added or widened.examples/reusable.yml:48-50keeps theissue_commentactor gate, and:57-59keeps explicit secret forwarding. The new "omitwith:to inherit defaults" guidance does not remove any auth requirement.Nothing in this change weakens authn/authz, leaks credentials, or adds an injection/SSRF/deserialization vector. Honest result for this lens: clean.
Note: I did not duplicate non-security observations (e.g. the doc/comment churn or the model-list correctness), which belong to other lenses.
(Plan mode is active, but a review is read-only — I made no edits and there is no implementation to plan; this is the requested review output.)
🎯 Correctness — No material issues found
VERDICT: No material issues found
Reviewed strictly through the Correctness lens. This PR only moves default values into the reusable workflow's
inputsblock and updates docs — no executable code changed. I verified the new defaults actually resolve and behave correctly end-to-end:GADFLY_*env var inreview-reusable.yml:120-130, so the new defaults genuinely flow toentrypoint.sh. Verified by reading the full step env block.claude-code/sonnet|opus|opus:maxand the:cloud-suffixed cloud ids are valid;claude-code/opus:max→ max thinking tokens is explicitly covered byengine_test.go:229, andprovider_ofinentrypoint.sh:159correctly extractsclaude-codefromclaude-code/opus:max(splits on first/).provider_concurrency: "ollama-cloud=3,claude-code=3"→provider_cap(entrypoint.sh:167-176) yields cap 3 for theclaude-codelane, and there are exactly 3 claude models, so the "all three claudes at once" claim holds.GADFLY_TIMEOUT_SECSis the deadline per specialist shared across both passes (main.go:48, 82-86), so worst case is 5 lenses × 600s = 50 min when a lane runs lenses serially (claude-code has noprovider_lens_concurrencyentry → falls back to default 1). 50 < 90, so the wall-clock default doesn't truncate a normal run. No off-by-timeout bug.adversarial-review.yml:46-47forwards bothOLLAMA_CLOUD_API_KEYandCLAUDE_CODE_OAUTH_TOKEN, which the now-claude-inclusive inherited default requires; the localuses: ./...ref means this PR's review really does exercise the new defaults.examples/reusable.ymloverrides onlymodels:to three:cloudids; inheritingclaude-code=3inprovider_concurrencyis a harmless unused entry since no claude models are present.Nothing in the correctness lane is wrong here. (Note: README/
examplesdoc-accuracy obligations and the lens-concurrency tuning trade-off belong to the maintainability/performance lenses, so I'm leaving those to those reviewers.)🧹 Code cleanliness & maintainability — Minor issues
Both findings verified against the actual files.
Finding 1 — confirmed.
review-reusable.yml:15-19(file header) and:35-41(theworkflow_call.inputscomment) both describe the same default swarm in near-verbatim prose, ~20 lines apart, both added by this PR. Header: "Inputs ship the DEFAULT swarm … 3 cloud models + the Claude Code engine, 5-lens suite, all three claudes concurrent." Inputs block: "Inputs ship the DEFAULT Gadfly swarm … 3 strong cloud models + the Claude Code engine (sonnet/opus/opus:max), the 5-lens suite, all three claudes concurrent." Genuine duplication that will drift.Finding 2 — confirmed.
examples/README.md:9is not in the diff (unchanged) and now reads "take the defaults or override a few inputs | secretOLLAMA_CLOUD_API_KEY." But this PR's new defaults require bothOLLAMA_CLOUD_API_KEYandCLAUDE_CODE_OAUTH_TOKEN(perreview-reusable.yml:43+ the new note atreusable.yml:8-12), and the example it points to (examples/reusable.yml:62-66) explicitly overridesmodels:to a cloud-only set so it does NOT take the defaults. The row contradicts the example and the new defaults; per the CLAUDE.md maintenance rule it should have been updated in this change.Both survive.
VERDICT: Minor issues
Duplicated default-swarm blurb within
review-reusable.yml(.gitea/workflows/review-reusable.yml:15-19and:35-41). This PR adds the same paragraph twice in one file ~20 lines apart: the file header says "Inputs ship the DEFAULT swarm … 3 cloud models + the Claude Code engine, 5-lens suite, all three claudes concurrent," and the newworkflow_callinputs comment repeats it almost verbatim ("Inputs ship the DEFAULT Gadfly swarm … 3 strong cloud models + the Claude Code engine (sonnet/opus/opus:max), the 5-lens suite, all three claudes concurrent"). Two copies will drift independently when the swarm changes. Fix: keep the detailed description on theinputs:block (next to the actual defaults) and reduce the header to a one-line pointer, or vice-versa.examples/README.md:9is now stale and was not updated with this change (maintenance rule: "Keep the README andexamples/current … in the same change"). The row still reads "take the defaults or override a few inputs | secretOLLAMA_CLOUD_API_KEY." After this PR the reusable's actual defaults require bothOLLAMA_CLOUD_API_KEYandCLAUDE_CODE_OAUTH_TOKEN; the shippedexamples/reusable.ymlonly works with the single key because it overridesmodels:to a cloud-only set (examples/reusable.yml:66) — i.e. it explicitly does not take the defaults. The "take the defaults" wording now contradicts the example it points at. Fix: reword to something like "cloud-only override out of the box; delete it (and forward the Claude token) to inherit the full default swarm."Both are minor wording/consistency issues, not functional. Everything else checked is consistent: the reusable defaults (
models,specialists,provider_concurrency,timeout_secs,max_steps,timeout_minutes:90) match whatadversarial-review.yml, the README blurb, andCLAUDE.mdnow describe; the slimmedadversarial-review.ymlcorrectly drops the redundantwith:overrides and keeps onlyallowed_users; no stale references to the removed models (glm-5.1,nemotron-3-super,qwen3-coder:480b-cloud) or the oldclaude-code=2/timeout_minutes: 45survive inREADME.md/examples/; model-list strings are spacing-consistent acrossreview-reusable.yml:43andexamples/reusable.yml:66.⚡ Performance — No material issues found
Verified the concurrency/timeout model against
entrypoint.shandcmd/gadfly/main.goso I could check the wall-clock math behind the new defaults.VERDICT: No material issues found
This PR is config-defaults + docs only — no algorithmic code, no hot loops, no query/allocation paths. Through the performance lens the relevant question is whether the new swarm defaults fit their wall-clock budget. They do.
Wall-clock math checks out (verified):
GADFLY_TIMEOUT_SECSis per-lens, shared by both passes (cmd/gadfly/main.go:48), default 300, now defaulted to 600 (10 min/lens).provider_lens_concurrencydefaults toollama-cloud=3only — claude-code is omitted, so claude lenses run serially at the scalar default of 1 (main.go:282-285).provider_concurrencycaps models-at-once (entrypoint.sh:206-217).claude-code=3runs all three claudes concurrently.timeout_minutes: 45 → 90bump is necessary and correct, not a regression: the old 45-min cap would have killed the 5-lens serial claude lane mid-run. The new cap leaves comfortable margin.One observation, not a defect (deliberate tradeoff):
provider_lens_concurrencysetsollama-cloud=3but no entry forclaude-code, so each claude model serializes its 5 lenses (the ~50-min lane). Parallelizing claude lenses would cut that lane's wall-clock substantially, but doing so under a single OAuth token / extended-thinking model would likely hit rate limits and isn't clearly a win — so leaving it serial is a reasonable choice. Flagging only so it's a conscious decision, not an oversight.Cost footprint is out of my lane: the default jumps from the image's 1-model default to 6 models × 5 lenses × 2 passes (≈60 passes/PR) for any consumer who omits
with:. That's an intentional, documented "curated swarm" cost decision — a maintainability/cost question, not an efficiency regression, so I'll leave it to that lens.No performance changes to request.
🧯 Error handling & edge cases — No material issues found
I've verified the unhappy-path behavior this default change reaches. Here's my review through the error-handling & edge-cases lens.
VERDICT: No material issues found
This diff is config-only: it changes default input values in
review-reusable.yml, slimsadversarial-review.ymlto inherit them, and updates docs. It touches no error-handling code — the machinery that handles the unhappy paths lives inentrypoint.sh/scripts/run.sh, which this PR doesn't modify. I checked the edge cases the new default could newly trigger, and all degrade gracefully:New default needs two creds, but a consumer may forward only one. The default
modelsnow includesclaude-code/sonnet|opus|opus:maxalongside the cloud models. I verified a consumer forwarding onlyOLLAMA_CLOUD_API_KEY(the historically common case) does not crash or block:scripts/run.sh:151-152posts a per-model "⚠️ … skipped" notice.CLAUDE_CODE_OAUTH_TOKEN→ the binary exits nonzero, caught atscripts/run.sh:176-181and surfaced as a "⚠️ Agentic reviewer … failed" comment. Per-model failure is further isolated byentrypoint.sh:183(|| log "model … failed (continuing)"), so one failing model never aborts the lane or the job. Advisory-only invariant holds. This is noisy (3 extra warning comments) but that's a docs/UX concern, not an error-handling defect — and the PR explicitly documents the two-credential requirement.entrypoint.sh:189-191, so the board still reaches N/N.Empty-string fallback path is preserved. Each input still resolves to
""→ image default if explicitly overridden, andentrypoint.sh/run.shalready treat empty config envs as "use built-in default." Changing the YAML default from""to a concrete value doesn't open a new nil/empty branch.One thing I considered and deliberately leave to the performance/correctness lens rather than claim here:
timeout_secs=600per lens × 5 lenses × 2 passes against the new 90-minute job cap is a tight wall-clock budget for the slowest claude-code model (lens concurrency is set only forollama-cloud, notclaude-code). That's a resource-budgeting/boundary question, not a swallowed-error or unhandled-input defect, so it's out of my lane — flagging it only so it isn't missed.Within error handling & edge cases, the diff is clean.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 3s
🪰 Gadfly review —
claude-code/sonnet(claude-code)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
No material issues found
This PR's security-relevant surface is entirely in how new workflow input defaults (
models,specialists,provider_concurrency, etc.) propagate to the container. I verified all of the following:Default strings are hardcoded YAML literals, not consumer-supplied at runtime. They reach the container as
GADFLY_*env vars — the safe channel. No shell execution touches these values directly;entrypoint.shparses the model CSV into a bash array and strips whitespace withtr, notevalor unquoted glob expansion.CLAUDE_CODE_OAUTH_TOKENis already properly declared in theworkflow_call.secretsblock (required: false) and is forwarded to the container via env at.gitea/workflows/review-reusable.yml:110. It is never logged, never passed as a CLI argument, and never echoed — it flows through env inheritance to the gadfly binary, same as every other API key.Least-privilege secret forwarding is preserved. Both the updated
adversarial-review.ymlandexamples/reusable.ymlstill enumerate secrets explicitly rather than usingsecrets: inherit. No new secrets are introduced.allowed_usersdefault remains""(no-ACL-by-default), but this is unchanged behavior from before the PR. The entrypoint's username comparison (splitting on,viaIFS) has no shell injection risk given Gitea's username character constraints.No model-name injection risk. Even if a consumer passes an adversarial
models:string, it ends up in a bash array element passed asMODEL="$value"torun.sh, then asGADFLY_MODEL="$MODEL"in a subprocess env — never evaluated as shell code.🎯 Correctness — No material issues found
No material issues found
I verified the key correctness claims by reading
entrypoint.sh,cmd/gadfly/main.go, and both workflow files.Model-to-lane assignment (verified,
entrypoint.sh:159,model.go:146-155):provider_of("claude-code/opus:max")→"claude-code"in both bash and Go. Theprovider_concurrency: "ollama-cloud=3,claude-code=3"cap therefore applies correctly to all three claude-code models, allowing all three to run in parallel as documented.Lens concurrency for claude-code (verified,
main.go:281-286):GADFLY_PROVIDER_LENS_CONCURRENCY="ollama-cloud=3"has noclaude-codeentry, solensConcurrency()falls back todefaultLensConcurrency=1— claude-code lenses run sequentially. This is intentional (not a bug), and the timeout math is sound: 5 lenses × 600s max = 3000s ≈ 50 min per model, with 3 models in parallel → ~50 min wall-clock for the claude-code lane, well inside the 90-min job cap.timeout_secs shared across both passes (verified,
main.go:310-324): Both the review pass and recheck pass consume the same 600s context per lens. This is pre-existing behavior, unchanged by this PR.Empty-string fallback (verified,
entrypoint.sh:156):MODELS="${GADFLY_MODELS:-...}"uses:-, so a consumer passingmodels: ""correctly falls back toDEFAULT_MODELS="qwen3-coder:480b-cloud"as the comment claims.Cross-file consistency (verified):
adversarial-review.yml'sif:actor gate (steve,fizi,dazed) matches theallowed_usersinput;examples/reusable.ymloverrides models to cloud-only and comments the CLAUDE_CODE_OAUTH_TOKEN consistently; all doc claims (3 cloud + 3 claude-code, 5-lens suite, claude-code=3) match the actual input defaults inreview-reusable.yml.🧹 Code cleanliness & maintainability — Minor issues
All three findings check out against the actual code. Let me confirm the exact line numbers before writing the final review.
The actual file contents confirm:
adversarial-review.ymllines 6-7 and 51-53 both describe the same swarm in slightly different words — confirmed.review-reusable.yml:43default andexamples/reusable.yml:66both containminimax-m3:cloud,glm-5.2:cloud,deepseek-v4-pro:cloudverbatim — confirmed.examples/README.md:9still says "take the defaults or override a few inputs" butexamples/reusable.ymlnow always ships an activemodels:override (cloud-only, to avoid needing the Claude token) — confirmed stale.All findings survive. Line numbers in the draft were slightly off (off by 1-3 lines); correcting them below.
Minor issues
Redundant swarm description in
adversarial-review.yml(.gitea/workflows/adversarial-review.yml:6-7and:51-53). The file-header comment ("It inherits the default swarm: 3 cloud models + Claude Code (sonnet, opus, opus:max), 5-lens suite, 3 claudes at once.") and thewith:block comment ("Inherit the default swarm (3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite, all three claudes concurrent) from review-reusable.yml.") state the same thing in slightly different words. Every time the swarm changes both comments need to be updated in sync. One of them can be dropped; thewith:comment is closer to the point of use and the more useful of the two.Cloud model subset duplicated across files (
.gitea/workflows/review-reusable.yml:43default vsexamples/reusable.yml:66). The three cloud models (minimax-m3:cloud,glm-5.2:cloud,deepseek-v4-pro:cloud) appear verbatim as the first three items in the full default inreview-reusable.ymland as the standalone override inexamples/reusable.yml. YAML workflow inputs can't reference constants, so this is somewhat forced, but a rename (e.g.glm-5.2→glm-6) will silently miss one of the two sites. A one-line comment inexamples/reusable.ymlnoting which models to keep in sync with the defaults would reduce the maintenance risk.examples/README.mddescription column is slightly stale (examples/README.md:9). The table saysreusable.ymllets you "take the defaults or override a few inputs." After this PR, the example file ships an activemodels:override (a cloud-only subset, not the default), so it no longer illustrates "taking the defaults." TheNeedscolumn (OLLAMA_CLOUD_API_KEY) is still accurate for the example as written, but a reader who looks at the table to decide whether to usereusable.ymloradversarial-review.ymlgets a subtly wrong description of what the slim stub does. CLAUDE.md's maintenance rule ("Any change to … the consumer stub MUST be reflected in … the relevant files underexamples/") applies here.⚡ Performance — No material issues found
The finding hinges on the claim that the job wall-clock worst case is
5 × 2 × 600 = 6,000 s. The code contradicts this.In
reviewWithSpecialist(main.go:310–312), a singlectxis created with the 600 s deadline, and both the review pass (line 314) and the recheck pass (line 323) run under that same context. The two passes share one 600 s budget, not one each. The correct worst-case is:The finding's "2 passes × 600 s" multiplier is wrong. The finding cannot be confirmed and is dropped.
No material issues found
🧯 Error handling & edge cases — No material issues found
No material issues found
Reviewed through the error handling & edge cases lens. Changes are confined to YAML workflow defaults and documentation.
Verification performed:
""fallback claim (workflow comment: "Set any input to""to fall back to the image/entrypoint built-in default") — verified correct on both sides:GADFLY_MODELS=""→entrypoint.sh:156${GADFLY_MODELS:-...}correctly falls through to$DEFAULT_MODELS.GADFLY_SPECIALISTS=""→specialists.go:136checksstrings.TrimSpace(os.Getenv(...)) != "", so empty string falls through todefaultSuite. No regression.Failed claude-code models when
CLAUDE_CODE_OAUTH_TOKENis absent —entrypoint.sh:183|| log "model $1 failed (continuing)"catches it;entrypoint.sh:189-191then marks the pre-seeded status filedone:trueso the board completes. Graceful degradation is intact.Malformed/absent
provider_concurrency—provider_cap()atentrypoint.sh:167-176handles empty or missing=Nentries by falling back to$DEFAULT_CONC=1. The newclaude-code=3entry in the default is syntactically clean.PROVIDERS=""(empty model list edge case) —entrypoint.sh:249-253for p in $PROVIDERSsimply doesn't iterate;LANE_PIDS=()stays empty; the conditionalwaitis skipped. No hang.Actor gate consistency —
adversarial-review.yml:36-41Giteaif:condition and theallowed_users: "steve,fizi,dazed"input are consistent. The belt-and-suspenders duplication is pre-existing and intentional.with:containing only comments +allowed_users— valid YAML; no parse issue.Nothing in this diff introduces new unhappy paths, ignores errors, or removes an existing guard.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 14m 38s
🪰 Gadfly review —
deepseek-v4-pro:cloud(ollama-cloud)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
No material issues found
The diff moves configuration defaults (model list, specialist suite, concurrency, timeouts) from the caller workflow into the reusable workflow's
workflow_callinput defaults. Through the security lens:required: false, and the caller (adversarial-review.yml) forwards the same four secrets as before. No new secrets are exposed, and the least-privilege pattern is preserved.if:condition inadversarial-review.ymlstill gatesissue_commentevents on the same three usernames (steve,fizi,dazed). The in-containerGADFLY_ALLOWED_USERScheck (entrypoint.sh:71-76) is unchanged and still provides the belt-and-suspenders layer.allowed_usersdefault of"": The fallback inentrypoint.sh:78-82(collaborator check) is the same behavior as before this PR. No weakening.entrypoint.sh— the values are used in parameter expansions,readinto arrays, and (for the status file path) sanitized viatr -c '[:alnum:]._-' '_'(line 165). No command substitution or eval on these inputs.examples/reusable.ymluses"your-username"as a placeholder — a copy-paste consumer must customize it, but the worst case is a non-functional comment trigger, not a security bypass (PR events still fire, and the in-container check would deny an unknown actor).No authn/authz gaps, injection vectors, secret leaks, or insecure defaults introduced by this change.
🎯 Correctness — Minor issues
Both findings are confirmed against the actual code. Let me output the corrected review.
Verdict: Minor issues
max_steps: "14"is now the inherited default, down from the binary's built-in 24 (review-reusable.yml:50). Previously, a consumer that omittedmax_stepsgot""→ the Go binary used its owndefaultMaxStepsof 24 (percmd/gadfly/main.go:81and theenvIntfallback at line 450–459). Now they get 14 — a 42% drop. The binary has a tool-free finalization fallback for step-exhausted passes (finalizeInstructionat line 113, invoked inrunAgent), so it won't silently produce nothing, but reviews may be less thorough. This was already gadfly's own value in the oldadversarial-review.yml, so it's a known tradeoff, but it's now imposed on all inheriting consumers.timeout_secs: "600"unchanged while the default specialist count grew from 3→5 (review-reusable.yml:49). The oldadversarial-review.ymlran 3 specialists per model with a 600s binary deadline; the new default runs 5. For cloud models with lens concurrency=3, 5 specialists run in 2 batches (3+2), each batch doing review+recheck = 4 serial rounds → ~150s per round, which is fine. But for claude-code models (provider_lens_concurrencyhas noclaude-codeentry, solensConcurrency()atcmd/gadfly/main.go:281-286falls back to 1), 5 specialists × 2 passes = 10 sequential rounds → ~60s per round. Claude Code with extended thinking (opus:max) can exceed 60s per call. This was already the case in the old config (same 600s, same lens-concurrency gap for claude-code), so it's not a new regression — but the PR expands the default specialist suite without addressing the claude-code lens-concurrency bottleneck.🧹 Code cleanliness & maintainability — No material issues found
No material issues found
The diff is clean from a maintainability standpoint:
adversarial-review.ymlno longer repeats config values that are now defaults inreview-reusable.yml— thewith:block shrinks from 8 explicit overrides to justallowed_users. This is the right pattern: define once, inherit everywhere.review-reusable.yml,adversarial-review.yml, andexamples/reusable.ymlall describe the same default swarm (3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite) with matching terminology. Verified by reading all three files.examples/reusable.ymlexplains why it overridesmodels:(to work with just the Ollama key) and how to restore the full default (delete the override + forward the Claude token). No magic — a copy-paste consumer can understand the tradeoff from the file alone.adversarial-review.yml(glm-5.1:cloud,nemotron-3-super:cloud,qwen3-coder:480b-cloud) don't appear in any other workflow or example file — they were only in that one caller. Verified via grep.adversarial-review.ymldropped "Phase-4" from its comment whileCLAUDE.mdretains it — is harmless internal project jargon, not a maintenance hazard.⚡ Performance — No material issues found
No material issues found.
This is a pure configuration change (YAML defaults + docs). I traced every new default through the actual runtime code (
entrypoint.sh,cmd/gadfly/main.go,cmd/gadfly/specialists.go) to verify the performance impact:models: 6 models (3 cloud + 3 claude-code) vs the image's prior built-inDEFAULT_MODELSof 1 (qwen3-coder:480b-cloudatentrypoint.sh:52). This is a deliberate 6× work increase for default consumers, not a regression — the PR explicitly ships a "curated swarm."specialists: The new default (security,correctness,maintainability,performance,error-handling) is byte-identical to the Go binary'sdefaultSuiteatcmd/gadfly/specialists.go:67. When the env was empty, the code already fell through to this exact same 5-lens set (specialists.go:141). Zero behavioral change.provider_concurrency:ollama-cloud=3,claude-code=3replaces the prior empty-string fallback ofGADFLY_CONCURRENCYdefault 1 (entrypoint.sh:157). 3× more parallel models per provider — intentional throughput increase.provider_lens_concurrency:ollama-cloud=3replaces the prior fallback ofGADFLY_LENS_CONCURRENCYdefault 1 (main.go:285). 3× more parallel lenses for cloud models — intentional.timeout_secs: 600 replaces the binary default of 300 (main.go:310). Needed for Claude Code extended-thinking models; the 90-minute job cap accommodates 5 sequential lenses × 600s = 50 min per claude-code model.max_steps: 14 replaces the binary default of 24 (main.go:315). This is a 42% reduction in tool steps per pass — a performance improvement (faster completion), not a regression.timeout_minutes: 90 replaces 45. Needed to fit the larger default swarm; the arithmetic checks out (slowest lane: 3 claude-code models in parallel, each ~50 min worst-case, fits under 90).No N+1 queries, no allocations, no hot-loop issues, no unbounded growth — there is no code change to evaluate those against. The configuration values are internally consistent and the parallelism limits are bounded.
🧯 Error handling & edge cases — No material issues found
Verdict: No material issues found
The diff changes only YAML workflow defaults and documentation. I traced every new default value through the runtime code (
entrypoint.sh,scripts/run.sh,cmd/gadfly/main.go,cmd/gadfly/specialists.go) and confirmed that all error paths are already handled:modelsdefault with claude-code entries:entrypoint.sh:156resolves models via${GADFLY_MODELS:-${OLLAMA_REVIEW_MODELS:-$DEFAULT_MODELS}}. The claude-code engine path inrun.sh:134-184handles missingCLAUDE_CODE_OAUTH_TOKEN/ binary-not-found / non-zero exit — all produce a⚠️error comment, never a crash.specialistsdefault:resolveSpecialists(specialists.go:90-169) parses the CSV, validates each name against the registry, and skips unknowns with an error logged — no panic on unrecognized names.provider_concurrency/provider_lens_concurrency:provider_cap()(entrypoint.sh:167-176) strips whitespace, checks for non-empty values, and falls back toDEFAULT_CONC=1on any parse failure.lensConcurrency()(main.go:281) has the same fallback pattern, confirmed by the test matrix inlens_concurrency_test.go:160-188(including "scalar invalid falls back to default").timeout_secs: "600"/max_steps: "14": Both are valid integer strings;envInt(main.go:310,315) falls back to built-in defaults on parse failure. The binary's tool-free finalization fallback (main.go:44-45) guarantees output even on step exhaustion.timeout_minutes: 90: Gitea Actions job-level cap; the oldadversarial-review.ymlalready used 90 explicitly, so no behavioral change for the dogfooding caller.allowed_users: ""default unchanged:entrypoint.sh:68-83handles empty by falling back to the collaborator check — no auth bypass.The one configuration tension worth noting (though not an unhandled error):
timeout_secs: "600"is a single per-model deadline shared across all lenses, andprovider_lens_concurrencyhas noclaude-codeentry, so claude-code models run 5 lenses sequentially within those 600 seconds. On a non-trivial PR where a claude-code opus:max lens pass exceeds ~60 seconds, the binary will hit its deadline. But this is handled —run.sh:176-181catches the non-zero exit and posts a⚠️error comment. The system degrades gracefully rather than silently failing or crashing.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 32s
🪰 Gadfly review —
glm-5.2:cloud(ollama-cloud)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
Through the security lens, I checked the actual checked-out code rather than trusting the diff:
review-reusable.yml:52keepsallowed_usersdefault as""(not a new permissive default), andGADFLY_ALLOWED_USERSis still consumed byentrypoint.sh:71-73as the in-container belt-and-suspenders check. Bothadversarial-review.yml:54andexamples/reusable.yml:69set it explicitly. The comment-trigger actor gate inadversarial-review.yml:36-41is unchanged. No authn/authz regression.workflow_call.secrets(review-reusable.yml:64-74), nosecrets: inheritintroduced. The new defaultmodelsvalue referencesclaude-code/*providers, but that does not cause any additional secret to be read —CLAUDE_CODE_OAUTH_TOKENwas already mapped atreview-reusable.yml:110and only resolves if the caller forwards it. A consumer who omits it simply gets an empty env (harmless fail), no leak.GADFLY_*env vars (review-reusable.yml:120-130) and consumed by the pinned image. They are not interpolated into shell, URLs, or templates within the workflow itself. Changing a default value does not expand the input→sink attack surface.OLLAMA_CLOUD_API_KEYandCLAUDE_CODE_OAUTH_TOKEN) is a usability/correctness concern, not a security gap — missing credentials degrade to empty env, they don't bypass any check or leak data. Theexamples/reusable.yml:62-68cloud-onlymodels:override correctly aligns with forwarding onlyOLLAMA_CLOUD_API_KEY.Nothing in the security lane is materially wrong.
🎯 Correctness — No material issues found
VERDICT: No material issues found
modelsdefault (review-reusable.yml:43) →GADFLY_MODELSenv (:120) → consumed asMODELSinentrypoint.sh:156. The 6 model ids parse fine, andprovider_of(entrypoint.sh:159) splits them into lanes:ollama-cloud(3 models) andclaude-code(3 models).provider_concurrency: "ollama-cloud=3,claude-code=3"(:47) →GADFLY_PROVIDER_CONCURRENCY→ parsed byprovider_cap(entrypoint.sh:167-176). Both keys match the lane names derived from the model specs, so caps resolve to 3/3 as intended. Comment "all 3 claudes at once" is accurate (3 claude-code models, cap 3).provider_lens_concurrency: "ollama-cloud=3"(:48) →GADFLY_PROVIDER_LENS_CONCURRENCY→ parsed byproviderOverride/lensConcurrency(cmd/gadfly/main.go:281-302). The provider keyollama-cloudmatchesmodelProvider()'s output for bare cloud models. Claude Code models have no override entry, so they correctly fall back toGADFLY_LENS_CONCURRENCYscalar default (1) — consistent with the comment intent of not oversubscribing the Claude lane.timeout_secs: "600"→GADFLY_TIMEOUT_SECS→reviewWithSpecialist(main.go:310). Per-lens, matches the documented semantic.max_steps: "14"→GADFLY_MAX_STEPS→eng.runPass(main.go:315). Correct.timeout_minutes: 90(number type) →jobs.review.timeout-minutes(:86). Valid.allowed_usersremains""default in the reusable, and the self-review caller overrides it with"steve,fizi,dazed"(adversarial-review.yml:54); the removedtimeout_minutes: 90override is now inherited from the default (also 90), so no behavior change there. No regression.examples/reusable.ymlcloud-onlymodels:override correctly omits the claude-code models so it works with onlyOLLAMA_CLOUD_API_KEY— verified the override string is a strict subset of the default's cloud portion.No correctness defects found in the change's lane.
🧹 Code cleanliness & maintainability — Minor issues
Both findings are confirmed against the actual code. The default-swarm description does appear twice in
review-reusable.yml(header lines 15-19 and inputs-block lines 35-41), and the same description is repeated across the five named files. Note: the phrasing varies slightly between files (not strictly verbatim), but the substance of the duplication is real.VERDICT: Minor issues
.gitea/workflows/review-reusable.yml:15-19vs35-41— the default-swarm description ("3 cloud models + Claude Code engine sonnet/opus/opus:max, 5-lens suite, all three claudes concurrent, needs both keys") is stated twice in the same file: once in the top header comment and again in theon.workflow_call.inputsblock comment. They say the same thing in slightly different words, so any change to the default (e.g. dropping a model, bumping concurrency) requires editing two prose blocks in one file, which drift easily. Consider trimming the inputs-block comment to a one-liner pointing at the header (or vice versa) so the canonical description lives in one place. Verified by reading the full file.examples/reusable.yml:8-12+review-reusable.yml:35-41+adversarial-review.yml:6-7+CLAUDE.md:49-53+README.md:42-43— the same default-swarm description is now copy-pasted across five locations (with minor wording variations, not strictly verbatim). Cross-file doc repetition is partly unavoidable, but the core phrasing ("3 cloud models + Claude Code engine (sonnet/opus/opus:max), 5-lens suite") recurs in each; if the default changes this is a multi-file edit prone to drift. Not blocking, but worth a shared note (e.g. a singleexamples/README.mdtable the others link to). Verified by reading all five files.Otherwise the change is a clean, low-churn consolidation: the
adversarial-review.ymlcaller correctly shed its duplicated config now that the reusable owns the defaults, and thetimeout_minutes: 90removal eliminates a previously-repeated value. Theexamples/reusable.ymlcloud-only override is well-commented and consistent with the secrets it forwards.⚡ Performance — No material issues found
Verdict: No material issues found
This is a workflow-input-defaults change in YAML (no code paths touched), so I verified the actual concurrency model against the entrypoint and binary before concluding.
Checked:
entrypoint.sh:144-218— provider lanes run in parallel; within a lane,provider_capbounds concurrent models; lens-level concurrency is separate (resolved per-model in the Go binary).cmd/gadfly/main.go:281-302—lensConcurrencyreadsGADFLY_PROVIDER_LENS_CONCURRENCYper provider, else falls back toGADFLY_LENS_CONCURRENCY(default 1). Confirmed the new defaultprovider_lens_concurrency: "ollama-cloud=3"leaves claude-code lenses serial (1), while ollama-cloud fans out to 3.cmd/gadfly/main.go:81-99— image defaults (maxSteps=24,timeoutSecs=300,lensConcurrency=1) confirm the new defaults (14/600/3-vs-1) are deliberate tuning, not regressions.Wall-clock reasoning for the new default swarm:
provider_concurrency=3→ all concurrent; lenses serial): worst case 5 lenses × 600s = ~50 min per model, all 3 in parallel → ~50 min lane wall-clock. The 90-mintimeout_minutesdefault accommodates this with margin. ✓lens_concurrency=3): bounded fan-out (3 models × 3 lenses = 9 concurrent), well within the 90-min budget since lenses batch 5/3 = 2 waves. ✓The increase from 3→5 lenses and the 45→90 min default are justified by the worst-case serial claude-code path and are consistent with the existing gadfly caller (which already used 90 min). No performance regression introduced.
🧯 Error handling & edge cases — No material issues found
Verdict: No material issues found
Through the error-handling / edge-cases lens, this diff only changes input default values and documentation; it touches no error-handling logic, no cleanup paths, and no boundary arithmetic. I verified the consuming code paths:
review-reusable.yml:120-130— inputs flow straight into env vars; empty-string fallback is unchanged and the entrypoint still treats""as "use image default" (entrypoint.sh:156).entrypoint.sh:167-176provider_capparsesollama-cloud=3,claude-code=3correctly:item%%=*/item#*=split yieldsclaude-code→3, matchingprovider_of's first-path-segment rule (claude-code/opus:max→claude-code). No off-by-one; the[ "$cap" -ge 1 ] 2>/dev/null || cap=1guard (entrypoint.sh:208) still clamps a malformed/zero value to 1.entrypoint.sh:212-217lane loop:inflight=$((inflight+1))againstcap=3with ≤3 models per lane — no overflow, no overrun;wait -n 2>/dev/null || waitfallback intact.cmd/gadfly/main.go:281-299lensConcurrency/providerOverride: the new defaultprovider_lens_concurrency="ollama-cloud=3"has noclaude-codeentry, so claude-code lenses fall back to the scalar default (1, serial). That's a deliberate config choice, not an unhandled case;strconvparse errors are ignored viaerr == nil && n > 0.One pre-existing, non-blocking observation (not introduced by this PR, so not a finding against the diff):
review_oneswallows per-model failures (run.sh ... || log "model $1 failed (continuing)",entrypoint.sh:183). The new default swarm requires bothOLLAMA_CLOUD_API_KEYandCLAUDE_CODE_OAUTH_TOKEN; a consumer that inherits the default while forwarding only one secret will get silent per-model failures for the un-authed lane rather than a clear abort. The PR documents this requirement andexamples/reusable.ymlships a cloud-onlymodels:override to avoid it, so the edge case is covered by guidance rather than by enforced validation — acceptable for an advisory-only workflow.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 30s