feat(reusable): ship the curated swarm as the default config consumers inherit #10

2026-06-28T02:05:46Z

steve commented

2026-06-28 02:05:46 +00:00

Makes the curated swarm the default Gadfly config, defined once in the reusable workflow's input defaults, so consumers subscribe by just calling it (no with: block) and inherit it.

Default swarm (was all `""` → image default)

input	new default
`models`	`minimax-m3:cloud, glm-5.2:cloud, deepseek-v4-pro:cloud, claude-code/sonnet, claude-code/opus, claude-code/opus:max`
`specialists`	`security, correctness, maintainability, performance, error-handling` (5-lens suite)
`provider_concurrency`	`ollama-cloud=3,claude-code=3` (all three claudes at once)
`provider_lens_concurrency`	`ollama-cloud=3`
`timeout_secs` / `max_steps`	`600` / `14`
`timeout_minutes`	`45 → 90`

Consumers now inherit

gadfly's own adversarial-review.yml slimmed to just allowed_users (uses the local reusable ref, so this PR's own review self-tests the new defaults).
mort / executus / majordomo will be re-pinned to this commit and slimmed the same way once merged.

Opinionated default — documented

The default needs OLLAMA_CLOUD_API_KEY and CLAUDE_CODE_OAUTH_TOKEN. A consumer with only one (or a different provider) overrides models:. The public examples/reusable.yml keeps a cloud-only models: override so a copy-paste works with just the Ollama key; README / examples / CLAUDE.md updated to match.

Advisory-only invariant unchanged.

🤖 Generated with Claude Code

Makes the curated swarm the **default Gadfly config**, defined once in the reusable workflow's input defaults, so consumers subscribe by just calling it (no `with:` block) and inherit it. ### Default swarm (was all `""` → image default) | input | new default | |---|---| | `models` | `minimax-m3:cloud, glm-5.2:cloud, deepseek-v4-pro:cloud, claude-code/sonnet, claude-code/opus, claude-code/opus:max` | | `specialists` | `security, correctness, maintainability, performance, error-handling` (5-lens suite) | | `provider_concurrency` | `ollama-cloud=3,claude-code=3` (all three claudes at once) | | `provider_lens_concurrency` | `ollama-cloud=3` | | `timeout_secs` / `max_steps` | `600` / `14` | | `timeout_minutes` | `45 → 90` | ### Consumers now inherit - **gadfly's own** `adversarial-review.yml` slimmed to just `allowed_users` (uses the local reusable ref, so **this PR's own review self-tests the new defaults**). - mort / executus / majordomo will be re-pinned to this commit and slimmed the same way once merged. ### Opinionated default — documented The default needs `OLLAMA_CLOUD_API_KEY` **and** `CLAUDE_CODE_OAUTH_TOKEN`. A consumer with only one (or a different provider) overrides `models:`. The public `examples/reusable.yml` keeps a **cloud-only `models:` override** so a copy-paste works with just the Ollama key; README / examples / CLAUDE.md updated to match. Advisory-only invariant unchanged. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

steve added 1 commit 2026-06-28 02:05:46 +00:00

feat(reusable): ship the curated swarm as the default config consumers inherit

Build & push image / build-and-push (pull_request) Successful in 8s

Details

Adversarial Review (Gadfly) / review (pull_request) Successful in 14m47s

Details

79da1bfde3

Make the reusable workflow's input defaults BE the standard Gadfly swarm so a
consumer subscribes by just calling it (no `with:` block) and inherits:
- models: 3 strong cloud (minimax-m3, glm-5.2, deepseek-v4-pro) + Claude Code
  (sonnet, opus, opus:max)
- specialists: the 5-lens default suite (security, correctness, maintainability,
  performance, error-handling)
- provider_concurrency: ollama-cloud=3,claude-code=3 (all three claudes at once)
- timeout_minutes default 45 -> 90 (5 lenses x 2 passes over a slow lane)

The default is opinionated (needs OLLAMA_CLOUD_API_KEY + CLAUDE_CODE_OAUTH_TOKEN);
consumers override `models:` for cloud-only / other providers. gadfly's own
caller is slimmed to inherit (only allowed_users remains). examples/reusable.yml
keeps a cloud-only `models:` override so a public copy works with just the
Ollama key. README/CLAUDE.md updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitea-actions bot commented

2026-06-28 02:05:48 +00:00

🪰 Gadfly — live review status

6/6 reviewers finished · updated 2026-06-28 02:20:34Z

`claude-code/opus` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`claude-code/opus:max` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`claude-code/sonnet` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`deepseek-v4-pro:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — Minor issues
✅ maintainability — No material issues found
✅ performance — No material issues found
✅ error-handling — No material issues found

`glm-5.2:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`minimax-m3:cloud` · ollama-cloud — ✅ done

✅ security — Minor issues
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — Blocking issues found
✅ error-handling — No material issues found

_{Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.}

## 🪰 Gadfly — live review status 6/6 reviewers finished · updated 2026-06-28 02:20:34Z #### `claude-code/opus` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `claude-code/opus:max` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `claude-code/sonnet` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `deepseek-v4-pro:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — No material issues found - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `glm-5.2:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `minimax-m3:cloud` · ollama-cloud — ✅ done - ✅ **security** — Minor issues - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — Blocking issues found - ✅ **error-handling** — No material issues found Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.

gitea-actions bot commented

2026-06-28 02:05:48 +00:00

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

Verdict: Blocking issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues

Caller-side actor gate is now load-bearing for a larger blast radius — .gitea/workflows/review-reusable.yml:33-56 + entrypoint.sh:71-82
The reusable enforces no issue_comment actor gate itself; that gate lives in the caller (adversarial-review.yml:36-41, examples/reusable.yml:48-50). The in-container GADFLY_ALLOWED_USERS="" fallback only checks "is a repo collaborator" (entrypoint.sh:78-82). Before this PR the default swarm was "" (one model, via entrypoint.sh:52 DEFAULT_MODELS="qwen3-coder:480b-cloud"), so a missed gate had small blast radius. After this PR the default swarm pulls OLLAMA_CLOUD_API_KEY + CLAUDE_CODE_OAUTH_TOKEN plus 5 lenses × 6 models × 2 passes, so any collaborator who can comment @gadfly review on a copy-pasted stub now triggers a full multi-provider secret-bearing run. The docs warn about this, but the warning is one line in a comment; consider one of: (a) move the gate into the reusable with a inputs.bypass_actor_gate opt-out (default off), or (b) add a pre-flight allowed_users: required: true so a caller can't inherit "" by accident. Verified by reading both consumers' if: blocks and the in-container collaborator check.
Image is tag-pinned, not digest-pinned — .gitea/workflows/review-reusable.yml:88
:sha-c342bdb is a namespace prefix, not a sha256:… digest. The internal registry can re-tag and serve a different image at the same tag. Pre-existing, not introduced by this PR, but worth noting since the comment claims "immutable" — only digest pinning is truly immutable. Suggest switching to image@sha256:… for the consumer-facing reusable to match the comment's claim. Unverified that the registry actually enforces immutability on the :sha-* namespace.
No new injection / SSRF / deserialization / secret-leakage vectors introduced by this PR. Verified by tracing: each input flows ${{ inputs.X }} → env: on docker://… → entrypoint.sh env read (${GADFLY_…:-}). No new YAML run: blocks, no eval, no template expansion of attacker-controlled strings, no new outbound URLs. workflow_call.secrets all stay required: false, so an unforwarded secret stays empty — no accidental leak. secrets: inherit is documented as discouraged in both consumer files; verified least-privilege is preserved.
timeout_minutes raised 45 → 90 — .gitea/workflows/review-reusable.yml:56. Larger window for a comment-triggered (or stale) job to hold the runner and burn secrets. Mitigated by concurrency.cancel-in-progress: true in both callers, so not material.

Outside my lens: nothing.

🎯 Correctness — No material issues found

VERDICT: No material issues found

I checked the change through the Correctness lens (logic, semantic correctness, re-derived values):

cmd/gadfly/specialists.go:67 defines defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}. The new reusable specialists default (review-reusable.yml:44) is the exact same 5 — verified character-for-character. So the workflow input default and the Go binary's built-in default suite agree.
The new models default (minimax-m3:cloud,glm-5.2:cloud,deepseek-v4-pro:cloud,claude-code/sonnet,claude-code/opus,claude-code/opus:max) splits cleanly into 3 ollama-cloud + 3 claude-code models. Verified against entrypoint.sh:159 (provider_of — claude-code/<x> ⇒ provider claude-code) and :167-176 (provider_cap parses provider=N CSVs), so provider_concurrency: "ollama-cloud=3,claude-code=3" resolves to "cap 3 per lane" as documented.
The 3 claude-code/* entries share the claude-code provider, so the cap-3 lane runs sonnet/opus/opus:max in parallel — matches the PR's "all three claudes at once" claim. (Bonus: opus:max sets MAX_THINKING_TOKENS=31999 per engine.go:63-66 — well-formed spec.)
timeout_minutes: 90 and timeout_secs: 600: rough arithmetic — 6 models × 5 lenses × 2 passes = 60 lens invocations per provider; 600s per lens × 10 sequential lens-passes per model = 6000s (~100 min) if fully serial, but lanes + within-lane concurrency drop that well under 90 min. Tight but internally consistent; the PR explicitly addresses the concern in the comment.
examples/reusable.yml:66 deliberately overrides models to a cloud-only CSV; its comment correctly instructs users to delete the override (and forward the Claude token) to get the full default. Verified.
.gitea/workflows/adversarial-review.yml slimming: the only remaining with: key is allowed_users, which is the one consumer-specific value. Verified the reusable passes every GADFLY_* env through (lines 120-129), so no input is lost.

No magic numbers, conversion factors, or thresholds in this diff that need re-derivation — they're config defaults, not formulas. The structural change is self-consistent and the defaults align with the Go binary's own defaults.

🧹 Code cleanliness & maintainability — Minor issues

Now let me go through each finding carefully:

Finding 1 (duplicate header paragraphs): Lines 15-19 vs 35-41. Looking at the actual file, lines 15-19 describe the consumer example with with: { allowed_users: "..." } and then the inputs section starts. Lines 35-41 are inside on.workflow_call.inputs and explain the default swarm. They DO cover similar ground but for different audiences (top-of-file summary vs inline inputs doc). The duplication claim has merit — both paragraphs explain "3 cloud + Claude Code engine, 5-lens suite, all three claudes concurrent" almost verbatim. CONFIRMED.

Finding 2 (mixed defaults misleading header): Line 18 says "Set any input to "" to fall back to the image's built-in default" — but this is now in the new top header (the consumer-facing part), and with: inputs defaulting to non-empty curated values won't fall back to image defaults when set to "". Wait, actually re-reading: the diff shows the comment is at line 18 in the "Inputs ship the DEFAULT swarm..." paragraph, but in the current file the "" fallback rule is at line 41 (in the workflow_call.inputs block). Let me re-verify.

Looking at the current file, line 41 says "Set any input to "" to fall back to the image/entrypoint built-in default." This is inside the on.workflow_call.inputs comment block. The claim in the finding references "line 18" — but the file has been restructured. The draft's line numbers don't match the current file. The line at 18 is in the top header section. However, the broader concern is real: the "" fallback rule applies to inputs with empty string defaults, but the workflow now has a mix. CONFIRMED in spirit but line numbers stale.

Finding 3 (adversarial-review.yml comments): The # Security: comment at lines 33-35 is the only place that explains the mirror relationship with allowed_users. The input is at line 54. That's 17 lines. The finding is accurate. CONFIRMED.

Finding 4 (examples/reusable.yml comment placement): Line 67-68 says "Other inputs inherit the default... override any of them here" and sits immediately above allowed_users: "your-username" at line 69. This is misleading because allowed_users IS being explicitly set. CONFIRMED.

Finding 5 ("two passes" terminology): The draft says "two passes" is found in entrypoint.sh:51 and CLAUDE.md:13,103. My grep finds zero matches in those files. The actual usage is only in review-reusable.yml:55. The claim about entrypoint.sh/CLAUDE.md usage is wrong — but the core point that "two passes" is undefined in the workflow file stands, since it's a workflow-internal term with no definition nearby. The finding's supporting evidence is partially false, but the finding itself (the doc gap) is valid. I'll CONFIRM the finding but drop the false evidence details.

Finding 6 (Phase 4 fossil): CLAUDE.md:52 is the only remaining "(Phase 4)" reference, and the workflow files no longer mention it. CONFIRMED.

Now let me write the corrected review.

VERDICT: Minor issues

.gitea/workflows/review-reusable.yml:15-19 and :35-41 — Two near-duplicate header paragraphs explain the same "3 cloud + Claude Code engine, 5-lens suite, all three claudes concurrent" story. The first (top-of-file, lines 15–19) is the consumer-facing summary; the second (inside on.workflow_call.inputs, lines 35–41) repeats the same "default swarm" explanation. Trim the second to a short cross-reference so the explanation lives in one place.
.gitea/workflows/review-reusable.yml:41 — The inline comment says "Set any input to "" to fall back to the image/entrypoint built-in default" as a general rule, but the inputs block now mixes empty-string defaults (provider, base_url, worker_model, trigger_phrase, allowed_users) with curated non-empty defaults (models, specialists, provider_concurrency, provider_lens_concurrency, timeout_secs, max_steps). Empty-string defaults do fall through to image defaults; non-empty curated defaults fall back to the workflow default, not the image. A per-input marker (# "" → image default) on the five empty-default fields would prevent a maintainer from conflating the two.
.gitea/workflows/adversarial-review.yml:33-35 vs :54 — The # Security: comment is the only place that explains the actor-gate mirrors allowed_users, and it sits ~17 lines above the input it mirrors. A one-line # mirror of allowed_users below (and the in-container belt-and-suspenders check) next to the with: block would make the security coupling obvious at the point of definition.
examples/reusable.yml:67-68 — The comment "Other inputs inherit the default (5-lens suite, concurrency, 90-min cap); override any of them here…" sits immediately above allowed_users: "your-username", which IS being explicitly set — not inherited. A copy-paste reader would reasonably conclude the allowed_users: line should be deleted too. Move the comment above models: (where it's accurate) or split it into two annotations.
.gitea/workflows/review-reusable.yml:54-55 — The timeout rationale says "over two passes can run long", but the workflow file never defines what a "pass" is (and the phrase doesn't appear elsewhere in the repo). Either expand to "two review passes (see entrypoint.sh)" or drop the phrase.
CLAUDE.md:52 — The lone remaining "(Phase 4)" reference, now orphaned since the workflow files dropped the term. Either restore a matching reference in .gitea/workflows/review-reusable.yml or drop the parenthetical so the doc and the workflow agree.

⚡ Performance — Blocking issues found

VERDICT: Blocking issues found

.gitea/workflows/review-reusable.yml:43 — ~6× compute cost per review is now the default, contradicting documented intent. Verified: entrypoint.sh:52 ships DEFAULT_MODELS="qwen3-coder:480b-cloud" (one model), and CLAUDE.md:103 explicitly warns "Cost ≈ specialists × models × 2 passes — keep the default model count low (entrypoint defaults to one)." This PR replaces that with 6 models as the default models input (minimax-m3:cloud, glm-5.2:cloud, deepseek-v4-pro:cloud, claude-code/sonnet, claude-code/opus, claude-code/opus:max). With the new 5-lens default suite (review-reusable.yml:44), that is 5 specialists × 6 models × 2 passes = 60 model invocations per PR — every consumer that omits with: now runs ~6× the token/$$ of the prior default. The PR compensates by setting timeout_secs to 600 (line 49) and the job cap to 90 (line 56, up from 45), but raising the budget makes the oversubscription easier to fit, not cheaper to run. CLAUDE.md:103 reads as a deliberate design constraint that this PR overrides silently. Either keep the "default = cheap" invariant (1–2 models, opt into the swarm) or update CLAUDE.md and the README cost note explicitly so consumers know what they're inheriting.
.gitea/workflows/review-reusable.yml:47-48 — multiplicative backend fan-out is now the default, and the in-repo code already warns against this. Verified: cmd/gadfly/main.go:230-233 warns "Caution: this fans out WITHIN one model. It multiplies with entrypoint.sh's per-provider model concurrency, so total concurrent backend requests ≈ (models at once) × (lenses at once)." The new defaults ship provider_concurrency: ollama-cloud=3,claude-code=3 (line 47) × provider_lens_concurrency: ollama-cloud=3 (line 48) → 9 simultaneous Ollama Cloud requests by default for every consumer that just calls the reusable. Every consumer that omits with: also inherits 3 concurrent claude CLI processes (each spawning Node children in their own process group per engine.go:159's Setpgid: true). This is fine for a curated set of repos, but a "default" implies safe-for-anyone; the multiplicative fan-out is the exact pattern the codebase flags as a footgun. Consider dropping provider_lens_concurrency's default to "" (fall through to the image's safer built-in default) and letting consumers raise it explicitly, or pick provider_concurrency: ollama-cloud=2,claude-code=2 to keep parallelism but cap the multiplier.
.gitea/workflows/review-reusable.yml:50 — max_steps: 14 (vs. image default defaultMaxSteps = 24 in cmd/gadfly/main.go:81) is a perf-positive change (fewer wasted tool steps per lens) — verified, flagging only so it's not a silent behavior drift: any consumer that previously omitted with: and relied on the image's 24-step cap now silently gets 14. That is the intended direction here, just worth a one-liner in the PR description / CHANGELOG so consumers know the cap shrank.

Verified by reading: entrypoint.sh:50-52, cmd/gadfly/main.go:80-90,220-234, cmd/gadfly/engine.go:63-77,150-165, .gitea/workflows/review-reusable.yml, .gitea/workflows/adversarial-review.yml, CLAUDE.md:98-107.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

Reviewed the diff through the error-handling & edge-cases lens. The change is purely workflow YAML + docs (no Go/shell code touched), so I focused on whether the new defaults introduce any new failure modes, swallowed errors, or unhandled edge cases in the existing plumbing they feed into.

What I verified:

models default (review-reusable.yml:43): all six specs parse cleanly via entrypoint.sh's provider_of (claude-code/* → lane claude-code, bare *:cloud → lane ollama-cloud). provider_concurrency and provider_lens_concurrency defaults cover both lanes — no provider falls through with an unset cap.
specialists default (review-reusable.yml:44): all five names (security, correctness, maintainability, performance, error-handling) are present in cmd/gadfly/specialists.go's builtinSpecialists and exactly match the defaultSuite fallback — no "unknown specialist" warnings.
Empty-input fallback contract: confirmed each defaulted env falls back to the image's built-in default when set to "", as the new docs claim:
- GADFLY_MODELS → entrypoint.sh:156 (${GADFLY_MODELS:-…$DEFAULT_MODELS}).
- GADFLY_SPECIALISTS → cmd/gadfly/specialists.go:136 (empty trimmed → defaultSuite).
- GADFLY_PROVIDER_CONCURRENCY → provider_cap (entrypoint.sh:167) returns DEFAULT_CONC when no/blank entry matches.
- GADFLY_TIMEOUT_SECS / GADFLY_MAX_STEPS → envInt (main.go:450) treats "", unparseable, and <=0 as missing → returns the built-in default.
Defensive caps preserved: run_lane (entrypoint.sh:208) still guards [ "$cap" -ge 1 ] 2>/dev/null || cap=1, so even a malformed provider_concurrency value can't drive concurrency to zero/negative.
Per-model failure isolation unchanged: run_lane still runs models with review_one … || log "model $1 failed (continuing)", so missing OLLAMA_CLOUD_API_KEY or CLAUDE_CODE_OAUTH_TOKEN won't crash the swarm — individual lenses mark themselves errored on the status board and the run completes.
timeout_minutes: 90: callers that previously pinned 90 now inherit the same value from the reusable default — no behavior change for existing callers (adversarial-review.yml dropped its redundant timeout_minutes: 90 line, matching the new default).

Nothing in this diff introduces a swallowed error, missing cleanup, panic on bad input, or an unhandled nil/empty/zero boundary. The "inherit by default, override per-input, empty falls back to image default" contract holds end-to-end.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 59s}

### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — Minor issues</summary> ## VERDICT: Minor issues - **Caller-side actor gate is now load-bearing for a larger blast radius — `.gitea/workflows/review-reusable.yml:33-56` + `entrypoint.sh:71-82`** The reusable enforces no `issue_comment` actor gate itself; that gate lives in the caller (`adversarial-review.yml:36-41`, `examples/reusable.yml:48-50`). The in-container `GADFLY_ALLOWED_USERS=""` fallback only checks "is a repo collaborator" (`entrypoint.sh:78-82`). Before this PR the default swarm was `""` (one model, via `entrypoint.sh:52` `DEFAULT_MODELS="qwen3-coder:480b-cloud"`), so a missed gate had small blast radius. After this PR the default swarm pulls **`OLLAMA_CLOUD_API_KEY` + `CLAUDE_CODE_OAUTH_TOKEN`** plus 5 lenses × 6 models × 2 passes, so any collaborator who can comment `@gadfly review` on a copy-pasted stub now triggers a full multi-provider secret-bearing run. The docs warn about this, but the warning is one line in a comment; consider one of: (a) move the gate into the reusable with a `inputs.bypass_actor_gate` opt-out (default off), or (b) add a pre-flight `allowed_users: required: true` so a caller can't inherit `""` by accident. Verified by reading both consumers' `if:` blocks and the in-container collaborator check. - **Image is tag-pinned, not digest-pinned — `.gitea/workflows/review-reusable.yml:88`** `:sha-c342bdb` is a namespace prefix, not a `sha256:…` digest. The internal registry can re-tag and serve a different image at the same tag. Pre-existing, not introduced by this PR, but worth noting since the comment claims "immutable" — only digest pinning is truly immutable. Suggest switching to `image@sha256:…` for the consumer-facing reusable to match the comment's claim. Unverified that the registry actually enforces immutability on the `:sha-*` namespace. - **No new injection / SSRF / deserialization / secret-leakage vectors introduced by this PR.** Verified by tracing: each input flows `${{ inputs.X }}` → `env:` on `docker://…` → `entrypoint.sh` env read (`${GADFLY_…:-}`). No new YAML `run:` blocks, no `eval`, no template expansion of attacker-controlled strings, no new outbound URLs. `workflow_call.secrets` all stay `required: false`, so an unforwarded secret stays empty — no accidental leak. `secrets: inherit` is documented as discouraged in both consumer files; verified least-privilege is preserved. - **`timeout_minutes` raised 45 → 90 — `.gitea/workflows/review-reusable.yml:56`**. Larger window for a comment-triggered (or stale) job to hold the runner and burn secrets. Mitigated by `concurrency.cancel-in-progress: true` in both callers, so not material. Outside my lens: nothing. </details> <details><summary>🎯 Correctness — No material issues found</summary> **VERDICT: No material issues found** I checked the change through the Correctness lens (logic, semantic correctness, re-derived values): - `cmd/gadfly/specialists.go:67` defines `defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}`. The new reusable `specialists` default (`review-reusable.yml:44`) is the **exact same 5** — verified character-for-character. So the workflow input default and the Go binary's built-in default suite agree. - The new `models` default (`minimax-m3:cloud,glm-5.2:cloud,deepseek-v4-pro:cloud,claude-code/sonnet,claude-code/opus,claude-code/opus:max`) splits cleanly into 3 ollama-cloud + 3 claude-code models. Verified against `entrypoint.sh:159` (`provider_of` — `claude-code/<x>` ⇒ provider `claude-code`) and `:167-176` (`provider_cap` parses `provider=N` CSVs), so `provider_concurrency: "ollama-cloud=3,claude-code=3"` resolves to "cap 3 per lane" as documented. - The 3 `claude-code/*` entries share the `claude-code` provider, so the cap-3 lane runs sonnet/opus/opus:max in parallel — matches the PR's "all three claudes at once" claim. (Bonus: opus:max sets `MAX_THINKING_TOKENS=31999` per `engine.go:63-66` — well-formed spec.) - `timeout_minutes: 90` and `timeout_secs: 600`: rough arithmetic — 6 models × 5 lenses × 2 passes = 60 lens invocations per provider; `600s` per lens × 10 sequential lens-passes per model = 6000s (~100 min) if fully serial, but lanes + within-lane concurrency drop that well under 90 min. Tight but internally consistent; the PR explicitly addresses the concern in the comment. - `examples/reusable.yml:66` deliberately overrides `models` to a cloud-only CSV; its comment correctly instructs users to delete the override (and forward the Claude token) to get the full default. Verified. - `.gitea/workflows/adversarial-review.yml` slimming: the only remaining `with:` key is `allowed_users`, which is the one consumer-specific value. Verified the reusable passes every GADFLY_* env through (lines 120-129), so no input is lost. No magic numbers, conversion factors, or thresholds in this diff that need re-derivation — they're config defaults, not formulas. The structural change is self-consistent and the defaults align with the Go binary's own defaults. </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> Now let me go through each finding carefully: **Finding 1 (duplicate header paragraphs):** Lines 15-19 vs 35-41. Looking at the actual file, lines 15-19 describe the consumer example with `with: { allowed_users: "..." }` and then the inputs section starts. Lines 35-41 are inside `on.workflow_call.inputs` and explain the default swarm. They DO cover similar ground but for different audiences (top-of-file summary vs inline inputs doc). The duplication claim has merit — both paragraphs explain "3 cloud + Claude Code engine, 5-lens suite, all three claudes concurrent" almost verbatim. **CONFIRMED.** **Finding 2 (mixed defaults misleading header):** Line 18 says "Set any input to `""` to fall back to the image's built-in default" — but this is now in the *new* top header (the consumer-facing part), and `with:` inputs defaulting to non-empty curated values won't fall back to image defaults when set to `""`. Wait, actually re-reading: the diff shows the comment is at line 18 in the "Inputs ship the DEFAULT swarm..." paragraph, but in the current file the `""` fallback rule is at line 41 (in the `workflow_call.inputs` block). Let me re-verify. Looking at the current file, line 41 says "Set any input to `""` to fall back to the image/entrypoint built-in default." This is inside the `on.workflow_call.inputs` comment block. The claim in the finding references "line 18" — but the file has been restructured. The draft's line numbers don't match the current file. The line at 18 is in the top header section. However, the broader concern is real: the `""` fallback rule applies to inputs with empty string defaults, but the workflow now has a mix. **CONFIRMED in spirit but line numbers stale.** **Finding 3 (adversarial-review.yml comments):** The `# Security:` comment at lines 33-35 is the only place that explains the mirror relationship with `allowed_users`. The input is at line 54. That's 17 lines. The finding is accurate. **CONFIRMED.** **Finding 4 (examples/reusable.yml comment placement):** Line 67-68 says "Other inputs inherit the default... override any of them here" and sits immediately above `allowed_users: "your-username"` at line 69. This is misleading because `allowed_users` IS being explicitly set. **CONFIRMED.** **Finding 5 ("two passes" terminology):** The draft says "two passes" is found in `entrypoint.sh:51` and `CLAUDE.md:13,103`. My grep finds zero matches in those files. The actual usage is only in `review-reusable.yml:55`. The claim about entrypoint.sh/CLAUDE.md usage is wrong — but the core point that "two passes" is undefined in the workflow file stands, since it's a workflow-internal term with no definition nearby. The finding's *supporting evidence* is partially false, but the finding itself (the doc gap) is valid. I'll **CONFIRM** the finding but drop the false evidence details. **Finding 6 (Phase 4 fossil):** CLAUDE.md:52 is the only remaining "(Phase 4)" reference, and the workflow files no longer mention it. **CONFIRMED.** Now let me write the corrected review. **VERDICT: Minor issues** - `.gitea/workflows/review-reusable.yml:15-19` and `:35-41` — Two near-duplicate header paragraphs explain the same "3 cloud + Claude Code engine, 5-lens suite, all three claudes concurrent" story. The first (top-of-file, lines 15–19) is the consumer-facing summary; the second (inside `on.workflow_call.inputs`, lines 35–41) repeats the same "default swarm" explanation. Trim the second to a short cross-reference so the explanation lives in one place. - `.gitea/workflows/review-reusable.yml:41` — The inline comment says "Set any input to `""` to fall back to the image/entrypoint built-in default" as a general rule, but the inputs block now mixes empty-string defaults (`provider`, `base_url`, `worker_model`, `trigger_phrase`, `allowed_users`) with curated non-empty defaults (`models`, `specialists`, `provider_concurrency`, `provider_lens_concurrency`, `timeout_secs`, `max_steps`). Empty-string defaults do fall through to image defaults; non-empty curated defaults fall back to the workflow default, not the image. A per-input marker (`# "" → image default`) on the five empty-default fields would prevent a maintainer from conflating the two. - `.gitea/workflows/adversarial-review.yml:33-35` vs `:54` — The `# Security:` comment is the only place that explains the actor-gate mirrors `allowed_users`, and it sits ~17 lines above the input it mirrors. A one-line `# mirror of allowed_users below (and the in-container belt-and-suspenders check)` next to the `with:` block would make the security coupling obvious at the point of definition. - `examples/reusable.yml:67-68` — The comment "Other inputs inherit the default (5-lens suite, concurrency, 90-min cap); override any of them here…" sits immediately above `allowed_users: "your-username"`, which IS being explicitly set — not inherited. A copy-paste reader would reasonably conclude the `allowed_users:` line should be deleted too. Move the comment above `models:` (where it's accurate) or split it into two annotations. - `.gitea/workflows/review-reusable.yml:54-55` — The timeout rationale says "over two passes can run long", but the workflow file never defines what a "pass" is (and the phrase doesn't appear elsewhere in the repo). Either expand to "two review passes (see `entrypoint.sh`)" or drop the phrase. - `CLAUDE.md:52` — The lone remaining "(Phase 4)" reference, now orphaned since the workflow files dropped the term. Either restore a matching reference in `.gitea/workflows/review-reusable.yml` or drop the parenthetical so the doc and the workflow agree. </details> <details><summary>⚡ Performance — Blocking issues found</summary> **VERDICT: Blocking issues found** - **`.gitea/workflows/review-reusable.yml:43` — ~6× compute cost per review is now the default, contradicting documented intent.** Verified: `entrypoint.sh:52` ships `DEFAULT_MODELS="qwen3-coder:480b-cloud"` (one model), and `CLAUDE.md:103` explicitly warns *"Cost ≈ specialists × models × 2 passes — keep the default model count low (entrypoint defaults to one)."* This PR replaces that with **6 models** as the default `models` input (`minimax-m3:cloud, glm-5.2:cloud, deepseek-v4-pro:cloud, claude-code/sonnet, claude-code/opus, claude-code/opus:max`). With the new 5-lens default suite (`review-reusable.yml:44`), that is `5 specialists × 6 models × 2 passes = 60 model invocations per PR` — every consumer that omits `with:` now runs ~6× the token/$$ of the prior default. The PR compensates by setting `timeout_secs` to `600` (line 49) and the job cap to `90` (line 56, up from 45), but raising the budget makes the oversubscription easier to fit, not cheaper to run. `CLAUDE.md:103` reads as a deliberate design constraint that this PR overrides silently. Either keep the "default = cheap" invariant (1–2 models, opt into the swarm) or update `CLAUDE.md` and the README cost note explicitly so consumers know what they're inheriting. - **`.gitea/workflows/review-reusable.yml:47-48` — multiplicative backend fan-out is now the default, and the in-repo code already warns against this.** Verified: `cmd/gadfly/main.go:230-233` warns *"Caution: this fans out WITHIN one model. It multiplies with entrypoint.sh's per-provider model concurrency, so total concurrent backend requests ≈ (models at once) × (lenses at once)."* The new defaults ship `provider_concurrency: ollama-cloud=3,claude-code=3` (line 47) × `provider_lens_concurrency: ollama-cloud=3` (line 48) → **9 simultaneous Ollama Cloud requests by default** for every consumer that just calls the reusable. Every consumer that omits `with:` also inherits 3 concurrent `claude` CLI processes (each spawning Node children in their own process group per `engine.go:159`'s `Setpgid: true`). This is fine for a curated set of repos, but a "default" implies safe-for-anyone; the multiplicative fan-out is the exact pattern the codebase flags as a footgun. Consider dropping `provider_lens_concurrency`'s default to `""` (fall through to the image's safer built-in default) and letting consumers raise it explicitly, or pick `provider_concurrency: ollama-cloud=2,claude-code=2` to keep parallelism but cap the multiplier. - **`.gitea/workflows/review-reusable.yml:50` — `max_steps: 14` (vs. image default `defaultMaxSteps = 24` in `cmd/gadfly/main.go:81`)** is a perf-positive change (fewer wasted tool steps per lens) — verified, flagging only so it's not a silent behavior drift: any consumer that previously omitted `with:` and relied on the image's 24-step cap now silently gets 14. That is the *intended* direction here, just worth a one-liner in the PR description / `CHANGELOG` so consumers know the cap shrank. Verified by reading: `entrypoint.sh:50-52`, `cmd/gadfly/main.go:80-90,220-234`, `cmd/gadfly/engine.go:63-77,150-165`, `.gitea/workflows/review-reusable.yml`, `.gitea/workflows/adversarial-review.yml`, `CLAUDE.md:98-107`. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> VERDICT: **No material issues found** Reviewed the diff through the error-handling & edge-cases lens. The change is purely workflow YAML + docs (no Go/shell code touched), so I focused on whether the new defaults introduce any new failure modes, swallowed errors, or unhandled edge cases in the existing plumbing they feed into. What I verified: - **`models` default** (`review-reusable.yml:43`): all six specs parse cleanly via `entrypoint.sh`'s `provider_of` (`claude-code/*` → lane `claude-code`, bare `*:cloud` → lane `ollama-cloud`). `provider_concurrency` and `provider_lens_concurrency` defaults cover both lanes — no provider falls through with an unset cap. - **`specialists` default** (`review-reusable.yml:44`): all five names (`security, correctness, maintainability, performance, error-handling`) are present in `cmd/gadfly/specialists.go`'s `builtinSpecialists` and exactly match the `defaultSuite` fallback — no "unknown specialist" warnings. - **Empty-input fallback contract**: confirmed each defaulted env falls back to the image's built-in default when set to `""`, as the new docs claim: - `GADFLY_MODELS` → `entrypoint.sh:156` (`${GADFLY_MODELS:-…$DEFAULT_MODELS}`). - `GADFLY_SPECIALISTS` → `cmd/gadfly/specialists.go:136` (empty trimmed → `defaultSuite`). - `GADFLY_PROVIDER_CONCURRENCY` → `provider_cap` (entrypoint.sh:167) returns `DEFAULT_CONC` when no/blank entry matches. - `GADFLY_TIMEOUT_SECS` / `GADFLY_MAX_STEPS` → `envInt` (main.go:450) treats `""`, unparseable, and `<=0` as missing → returns the built-in default. - **Defensive caps preserved**: `run_lane` (entrypoint.sh:208) still guards `[ "$cap" -ge 1 ] 2>/dev/null || cap=1`, so even a malformed `provider_concurrency` value can't drive concurrency to zero/negative. - **Per-model failure isolation unchanged**: `run_lane` still runs models with `review_one … || log "model $1 failed (continuing)"`, so missing `OLLAMA_CLOUD_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN` won't crash the swarm — individual lenses mark themselves errored on the status board and the run completes. - **`timeout_minutes: 90`**: callers that previously pinned 90 now inherit the same value from the reusable default — no behavior change for existing callers (`adversarial-review.yml` dropped its redundant `timeout_minutes: 90` line, matching the new default). Nothing in this diff introduces a swallowed error, missing cleanup, panic on bad input, or an unhandled nil/empty/zero boundary. The "inherit by default, override per-input, empty falls back to image default" contract holds end-to-end. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 59s

gitea-actions bot commented

2026-06-28 02:05:48 +00:00

🪰 Gadfly review — `claude-code/opus` (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

This PR is a defaults-and-docs change to the reusable workflow plumbing. I traced the security-relevant paths (secret forwarding, actor/authz gating, untrusted-input flow) against the actual files and found nothing material in my lens.

What I verified:

Authz gate preserved. The caller-side actor gate in .gitea/workflows/adversarial-review.yml:36-41 still restricts issue_comment triggers to steve/fizi/dazed, and allowed_users: "steve,fizi,dazed" (line 54) keeps the in-container belt-and-suspenders check (entrypoint.sh:68-83) intact. The example stub examples/reusable.yml:48-50,69 likewise retains both its if: gate and allowed_users. The diff does not weaken either.
Least-privilege secret forwarding unchanged. review-reusable.yml still declares each secret explicitly (:64-74) and maps them through env (:104-118); the recommendation against secrets: inherit is unchanged. Adding claude-code/* to the default models only means the run uses CLAUDE_CODE_OAUTH_TOKEN, which is forwarded the same explicit way (adversarial-review.yml:47) and resolves to empty if a caller omits it — no new leakage path.
No new tainted-input sink. The PR touches only input defaults (model list, specialist names, concurrency, timeouts) and comments/docs. The untrusted-data surfaces (COMMENT_BODY, ACTOR, etc.) and their handling in entrypoint.sh are untouched, so no new injection/SSRF/deserialization vector is introduced.
allowed_users default of "" is pre-existing. With it empty, actor_allowed() falls back to a Gitea collaborator (write-access) check (entrypoint.sh:78-82) — a reasonable control, and not a regression introduced here (it defaulted to "" before this PR too).

Note on scope: the per-lens timeout and the 45→90 wall-clock bump are advisory/cost concerns, not security — those belong to other lenses.

🎯 Correctness — No material issues found

VERDICT: No material issues found (Correctness lens)

This PR is purely a defaults/config + docs change in the reusable workflow YAML. I verified the substance against the checked-out code rather than the diff alone:

Specialist suite matches the code. review-reusable.yml:44 default security,correctness,maintainability,performance,error-handling is identical to defaultSuite in cmd/gadfly/specialists.go:67, and every name is a registered built-in lens (specialists.go:26-44). No invalid/typo'd lens.
Model→provider derivation is consistent with the concurrency map. entrypoint.sh:159 provider_of returns ollama-cloud for the three bare *:cloud ids and claude-code for the three claude-code/* specs. The default provider_concurrency: "ollama-cloud=3,claude-code=3" (review-reusable.yml:47) therefore keys correctly onto exactly 3 cloud + 3 claude models — so "all three claudes at once" is accurate, not off-by-one.
claude-code/opus:max parses correctly. Verified newClaudeCodeEngine (engine.go:83-108): the :max suffix is split off as the thinking tier (→ maxThinkingTokens, engine.go:114), leaving model=opus. The :max is not mistaken for part of the model id.
Input→env plumbing is intact. Every new-default input is wired to its GADFLY_* env (review-reusable.yml:120-130); omitting with: in a caller applies these defaults normally. The slimmed adversarial-review.yml still forwards the secrets the default swarm needs (Ollama + Claude Code) at lines 46-47.
No status-file collisions from the new model set: status_file_for (entrypoint.sh:165) sanitizes claude-code/opus → claude-code_opus and claude-code/opus:max → claude-code_opus_max, which stay distinct.

Nothing in the correctness lens is materially wrong. (timeout_secs=600/max_steps=14/provider_lens_concurrency=ollama-cloud=3 carry over the previous explicit values, so there's no behavioral regression for me to flag — and tuning judgment on those falls under the performance lens, not mine.)

🧹 Code cleanliness & maintainability — Minor issues

I've verified all claims against the actual code. Let me confirm my assessment:

Duplicated default-swarm prose: Confirmed. The roster description is genuinely repeated across review-reusable.yml:15-19 and :35-41 (two blocks, same file), adversarial-review.yml:6-7 and :51-53, and examples/reusable.yml:8-12 and :63-66. The single source of truth is the inputs: defaults at review-reusable.yml:43-56. All citations check out.
The concurrency "outside my lens" flag: I investigated lensConcurrency() (cmd/gadfly/main.go:281-286). When provider_lens_concurrency has no claude-code entry, it gracefully falls back to the GADFLY_LENS_CONCURRENCY scalar (default 1) — claude-code lenses run serially while the 3 claude models run concurrently. That asymmetry is handled cleanly by the code and is a plausible deliberate choice (claude-code with extended thinking is slow/expensive). It is not a confirmed defect, and the draft itself only raised it as an open question — dropping it.

VERDICT: Minor issues

One confirmed, low-severity maintainability concern:

Duplicated default-swarm prose, spread across ~6 comment blocks. The same roster description ("3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite, all three claudes concurrent") is repeated as free text in review-reusable.yml:15-19 and again at review-reusable.yml:35-41 (same file, two blocks), plus adversarial-review.yml:6-7 and :51-53, and examples/reusable.yml:8-12 and :63-66. The single source of truth — the inputs: defaults at review-reusable.yml:43-56 — is clean, but the prose copies will drift the next time the swarm changes (this PR itself had to hand-edit every copy). Concrete fix: in review-reusable.yml collapse the two near-identical blocks (file header 15-19 and the workflow_call comment 35-41) into one, and in the satellite files replace the spelled-out roster with a back-reference ("inherits the default swarm — see review-reusable.yml") rather than re-listing the exact models/lenses. Mild because the repo's house style is deliberately verbose comments, so the redundancy partly follows convention.

⚡ Performance — No material issues found

I've now verified the full timing/concurrency model end-to-end. Here's my assessment through the Performance lens.

VERDICT: No material issues found

This is a config-defaults change (YAML/markdown only — no code paths altered). I traced the actual cost/wall-clock implications against entrypoint.sh and cmd/gadfly/main.go rather than eyeballing the diff:

Cost is comparable to the prior config, not a regression. Cost ≈ models × specialists × 2 passes. The old adversarial-review.yml ran 9 models × 3 lenses = 54 pass-units; the new default runs 6 models × 5 lenses = 60 pass-units. Marginally higher, not a step-change, and it's an intentional, documented "opinionated default."
The 90-min job cap comfortably fits the worst-case lane. Verified the concurrency mapping: provider_lens_concurrency only lists ollama-cloud=3, so the claude-code lane falls back to defaultLensConcurrency = 1 (main.go:281-285, :99) — its 5 lenses run sequentially. With per-lens GADFLY_TIMEOUT_SECS=600 shared across review+recheck (main.go:310-311), a single claude model is bounded at 5 × 10 min = 50 min; the three claude models run concurrently (claude-code=3), so the lane ≈ 50 min. The ollama lane (3 models × 3 lenses concurrent) is faster (~20 min). Lanes run in parallel, so worst-case wall-clock ≈ 50 min < 90 min cap. The 45→90 bump is adequate and is a ceiling (kill-guard), not a guaranteed runtime.
No new N+1 / unbounded growth / hot-loop work. The status board still renders one comment; backend fan-out (models × lenses, e.g. 9 concurrent ollama-cloud requests) is unchanged in mechanism and deliberately bounded by the existing caps.

One thing I checked and deliberately did not flag as an issue: the claude-code lane runs its 5 lenses serially (lens-concurrency 1) because the lens-concurrency map omits claude-code. That's slower per-model than it could be, but (a) it fits the cap with margin, and (b) raising it would put 3 models × N lenses concurrent claude-CLI processes against the same token, so serial-lens there is a reasonable rate-limit tradeoff, not a performance defect.

Nothing in my lane is materially wrong.

🧯 Error handling & edge cases — No material issues found

I've reviewed the actual flow end-to-end. This diff is config-default and documentation changes only; the error-handling paths it touches live in the unchanged scripts, so the relevant question for my lens is whether the new defaults drive any unhappy path that isn't gracefully handled. I traced the most obvious one.

VERDICT: No material issues found

This PR only changes input default values in review-reusable.yml plus docs/examples. No control-flow or error-handling code changed. I verified the one edge case the new defaults actually introduce:

Inheriting the default models without CLAUDE_CODE_OAUTH_TOKEN (the new default now includes claude-code/sonnet|opus|opus:max). A consumer that forwards only OLLAMA_CLOUD_API_KEY and omits with: will run three claude-code reviewers with no auth. I traced this path and it is handled gracefully, not a crash or CI failure:
- entrypoint.sh:181 always invokes run.sh with PROVIDER=ollama; the per-model failure is swallowed (run.sh:183 ... || log "...(continuing)").
- In run.sh, the up-front key guard (run.sh:151) only fires for ollama-cloud; a claude-code/* model skips it and runs the binary. If the claude CLI exits non-zero or empty, run.sh:176-181 captures rc and replaces REVIEW with a ⚠️ … failed comment, then run.sh still exits 0.
- Net effect: three warning comments get posted, the job does not fail or block. The advisory-only invariant holds. This is noisy but not an error-handling defect.
The examples/reusable.yml cloud-only models: override (examples/reusable.yml:66) inherits provider_concurrency: "ollama-cloud=3,claude-code=3"; the stray claude-code=3 entry is inert with no claude models — provider_cap/run_lane (entrypoint.sh:167-218) only build lanes for providers that appear in the model list, so no empty lane or divide-by-anything. Verified.
New numeric/string defaults (timeout_secs: "600", max_steps: "14", provider_lens_concurrency: "ollama-cloud=3", timeout_minutes: 90) are all well-formed for their existing parsers; the per-lens timeout (run.sh/binary) still bounds a hung lane independent of the 90-min job cap, so no starvation regression.

Nothing in the error-handling / edge-case lens is materially wrong here. The unhappy paths the new defaults can reach were already defensively handled by the unchanged scripts.

(Note: I did not produce a plan file — the actual task here is an adversarial code review, which is read-only, so I've delivered the review directly.)

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 27s}

### 🪰 Gadfly review — `claude-code/opus` (claude-code) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> ## VERDICT: No material issues found This PR is a defaults-and-docs change to the reusable workflow plumbing. I traced the security-relevant paths (secret forwarding, actor/authz gating, untrusted-input flow) against the actual files and found nothing material in my lens. What I verified: - **Authz gate preserved.** The caller-side actor gate in `.gitea/workflows/adversarial-review.yml:36-41` still restricts `issue_comment` triggers to `steve/fizi/dazed`, and `allowed_users: "steve,fizi,dazed"` (line 54) keeps the in-container belt-and-suspenders check (`entrypoint.sh:68-83`) intact. The example stub `examples/reusable.yml:48-50,69` likewise retains both its `if:` gate and `allowed_users`. The diff does not weaken either. - **Least-privilege secret forwarding unchanged.** `review-reusable.yml` still declares each secret explicitly (`:64-74`) and maps them through env (`:104-118`); the recommendation against `secrets: inherit` is unchanged. Adding `claude-code/*` to the default `models` only means the run *uses* `CLAUDE_CODE_OAUTH_TOKEN`, which is forwarded the same explicit way (`adversarial-review.yml:47`) and resolves to empty if a caller omits it — no new leakage path. - **No new tainted-input sink.** The PR touches only input *defaults* (model list, specialist names, concurrency, timeouts) and comments/docs. The untrusted-data surfaces (`COMMENT_BODY`, `ACTOR`, etc.) and their handling in `entrypoint.sh` are untouched, so no new injection/SSRF/deserialization vector is introduced. - **`allowed_users` default of `""` is pre-existing.** With it empty, `actor_allowed()` falls back to a Gitea collaborator (write-access) check (`entrypoint.sh:78-82`) — a reasonable control, and not a regression introduced here (it defaulted to `""` before this PR too). Note on scope: the per-lens timeout and the 45→90 wall-clock bump are advisory/cost concerns, not security — those belong to other lenses. </details> <details><summary>🎯 Correctness — No material issues found</summary> **VERDICT: No material issues found** (Correctness lens) This PR is purely a defaults/config + docs change in the reusable workflow YAML. I verified the substance against the checked-out code rather than the diff alone: - **Specialist suite matches the code.** `review-reusable.yml:44` default `security,correctness,maintainability,performance,error-handling` is identical to `defaultSuite` in `cmd/gadfly/specialists.go:67`, and every name is a registered built-in lens (`specialists.go:26-44`). No invalid/typo'd lens. - **Model→provider derivation is consistent with the concurrency map.** `entrypoint.sh:159` `provider_of` returns `ollama-cloud` for the three bare `*:cloud` ids and `claude-code` for the three `claude-code/*` specs. The default `provider_concurrency: "ollama-cloud=3,claude-code=3"` (`review-reusable.yml:47`) therefore keys correctly onto exactly 3 cloud + 3 claude models — so "all three claudes at once" is accurate, not off-by-one. - **`claude-code/opus:max` parses correctly.** Verified `newClaudeCodeEngine` (`engine.go:83-108`): the `:max` suffix is split off as the thinking tier (→ `maxThinkingTokens`, `engine.go:114`), leaving `model=opus`. The `:max` is not mistaken for part of the model id. - **Input→env plumbing is intact.** Every new-default input is wired to its `GADFLY_*` env (`review-reusable.yml:120-130`); omitting `with:` in a caller applies these defaults normally. The slimmed `adversarial-review.yml` still forwards the secrets the default swarm needs (Ollama + Claude Code) at lines 46-47. - **No status-file collisions** from the new model set: `status_file_for` (`entrypoint.sh:165`) sanitizes `claude-code/opus` → `claude-code_opus` and `claude-code/opus:max` → `claude-code_opus_max`, which stay distinct. Nothing in the correctness lens is materially wrong. (`timeout_secs=600`/`max_steps=14`/`provider_lens_concurrency=ollama-cloud=3` carry over the previous explicit values, so there's no behavioral regression for me to flag — and tuning judgment on those falls under the performance lens, not mine.) </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> I've verified all claims against the actual code. Let me confirm my assessment: - **Duplicated default-swarm prose**: Confirmed. The roster description is genuinely repeated across `review-reusable.yml:15-19` and `:35-41` (two blocks, same file), `adversarial-review.yml:6-7` and `:51-53`, and `examples/reusable.yml:8-12` and `:63-66`. The single source of truth is the `inputs:` defaults at `review-reusable.yml:43-56`. All citations check out. - **The concurrency "outside my lens" flag**: I investigated `lensConcurrency()` (`cmd/gadfly/main.go:281-286`). When `provider_lens_concurrency` has no `claude-code` entry, it gracefully falls back to the `GADFLY_LENS_CONCURRENCY` scalar (default 1) — claude-code lenses run serially while the 3 claude *models* run concurrently. That asymmetry is handled cleanly by the code and is a plausible deliberate choice (claude-code with extended thinking is slow/expensive). It is not a confirmed defect, and the draft itself only raised it as an open question — dropping it. --- ## VERDICT: Minor issues One confirmed, low-severity maintainability concern: - **Duplicated default-swarm prose, spread across ~6 comment blocks.** The same roster description ("3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite, all three claudes concurrent") is repeated as free text in `review-reusable.yml:15-19` **and again** at `review-reusable.yml:35-41` (same file, two blocks), plus `adversarial-review.yml:6-7` and `:51-53`, and `examples/reusable.yml:8-12` and `:63-66`. The single source of truth — the `inputs:` defaults at `review-reusable.yml:43-56` — is clean, but the prose copies will drift the next time the swarm changes (this PR itself had to hand-edit every copy). **Concrete fix:** in `review-reusable.yml` collapse the two near-identical blocks (file header 15-19 and the `workflow_call` comment 35-41) into one, and in the satellite files replace the spelled-out roster with a back-reference ("inherits the default swarm — see `review-reusable.yml`") rather than re-listing the exact models/lenses. Mild because the repo's house style is deliberately verbose comments, so the redundancy partly follows convention. </details> <details><summary>⚡ Performance — No material issues found</summary> I've now verified the full timing/concurrency model end-to-end. Here's my assessment through the Performance lens. ## VERDICT: No material issues found This is a config-defaults change (YAML/markdown only — no code paths altered). I traced the actual cost/wall-clock implications against `entrypoint.sh` and `cmd/gadfly/main.go` rather than eyeballing the diff: - **Cost is comparable to the prior config, not a regression.** Cost ≈ models × specialists × 2 passes. The old `adversarial-review.yml` ran 9 models × 3 lenses = 54 pass-units; the new default runs 6 models × 5 lenses = 60 pass-units. Marginally higher, not a step-change, and it's an intentional, documented "opinionated default." - **The 90-min job cap comfortably fits the worst-case lane.** Verified the concurrency mapping: `provider_lens_concurrency` only lists `ollama-cloud=3`, so the **claude-code lane falls back to `defaultLensConcurrency = 1`** (`main.go:281-285`, `:99`) — its 5 lenses run sequentially. With per-lens `GADFLY_TIMEOUT_SECS=600` shared across review+recheck (`main.go:310-311`), a single claude model is bounded at 5 × 10 min = 50 min; the three claude models run concurrently (`claude-code=3`), so the lane ≈ 50 min. The ollama lane (3 models × 3 lenses concurrent) is faster (~20 min). Lanes run in parallel, so worst-case wall-clock ≈ 50 min < 90 min cap. The 45→90 bump is adequate and is a ceiling (kill-guard), not a guaranteed runtime. - **No new N+1 / unbounded growth / hot-loop work.** The status board still renders one comment; backend fan-out (`models × lenses`, e.g. 9 concurrent ollama-cloud requests) is unchanged in mechanism and deliberately bounded by the existing caps. One thing I checked and deliberately did **not** flag as an issue: the claude-code lane runs its 5 lenses serially (lens-concurrency 1) because the lens-concurrency map omits `claude-code`. That's slower per-model than it could be, but (a) it fits the cap with margin, and (b) raising it would put 3 models × N lenses concurrent claude-CLI processes against the same token, so serial-lens there is a reasonable rate-limit tradeoff, not a performance defect. Nothing in my lane is materially wrong. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> I've reviewed the actual flow end-to-end. This diff is config-default and documentation changes only; the error-handling paths it touches live in the unchanged scripts, so the relevant question for my lens is whether the *new defaults* drive any unhappy path that isn't gracefully handled. I traced the most obvious one. ## VERDICT: No material issues found This PR only changes input *default values* in `review-reusable.yml` plus docs/examples. No control-flow or error-handling code changed. I verified the one edge case the new defaults actually introduce: - **Inheriting the default `models` without `CLAUDE_CODE_OAUTH_TOKEN`** (the new default now includes `claude-code/sonnet|opus|opus:max`). A consumer that forwards only `OLLAMA_CLOUD_API_KEY` and omits `with:` will run three claude-code reviewers with no auth. I traced this path and it is handled gracefully, not a crash or CI failure: - `entrypoint.sh:181` always invokes `run.sh` with `PROVIDER=ollama`; the per-model failure is swallowed (`run.sh:183` `... || log "...(continuing)"`). - In `run.sh`, the up-front key guard (`run.sh:151`) only fires for `ollama-cloud`; a `claude-code/*` model skips it and runs the binary. If the `claude` CLI exits non-zero or empty, `run.sh:176-181` captures `rc` and replaces `REVIEW` with a `⚠️ … failed` comment, then `run.sh` still exits 0. - Net effect: three warning comments get posted, the job does not fail or block. The advisory-only invariant holds. This is noisy but not an error-handling defect. - The `examples/reusable.yml` cloud-only `models:` override (`examples/reusable.yml:66`) inherits `provider_concurrency: "ollama-cloud=3,claude-code=3"`; the stray `claude-code=3` entry is inert with no claude models — `provider_cap`/`run_lane` (`entrypoint.sh:167-218`) only build lanes for providers that appear in the model list, so no empty lane or divide-by-anything. Verified. - New numeric/string defaults (`timeout_secs: "600"`, `max_steps: "14"`, `provider_lens_concurrency: "ollama-cloud=3"`, `timeout_minutes: 90`) are all well-formed for their existing parsers; the per-lens timeout (`run.sh`/binary) still bounds a hung lane independent of the 90-min job cap, so no starvation regression. Nothing in the error-handling / edge-case lens is materially wrong here. The unhappy paths the new defaults can reach were already defensively handled by the unchanged scripts. (Note: I did not produce a plan file — the actual task here is an adversarial code review, which is read-only, so I've delivered the review directly.) </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 27s

gitea-actions bot commented

2026-06-28 02:05:49 +00:00

🪰 Gadfly review — `claude-code/opus:max` (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Reviewed through the 🔒 Security lens. This PR changes only workflow input defaults (models/specialists/concurrency/timeouts) plus docs. I traced every security-relevant control that could be affected and verified each against the checked-out files:

Secret forwarding (least privilege) — intact. review-reusable.yml:64-74 still declares each secret as required: false and forwards them explicitly (:105-118); the caller adversarial-review.yml:45-49 forwards only the four it uses, not secrets: inherit. No new secret is exposed and nothing switched to inherit. Verified by reading both files.
Authz / trigger gate — unchanged and preserved. The comment-trigger actor gate is in two layers, both still present: the workflow-level if: allow-list (adversarial-review.yml:36-41, hardcoded steve/fizi/dazed) and the in-container check actor_allowed() (entrypoint.sh:68-83), driven by the still-forwarded GADFLY_ALLOWED_USERS (review-reusable.yml:129, set to "steve,fizi,dazed" at :54). The allowed_users default stays "", which falls back to a collaborator/write-access check — not fail-open to the public. The PR doesn't touch this logic.
No new injection surface. The new defaults (models, provider_concurrency, etc.) are consumer-/maintainer-controlled inputs, not PR-author-controlled. They flow to env and are parsed in entrypoint.sh (provider_cap, model-list split) with fixed, non-tainted values. No untrusted-input path was added or widened.
Example stub guidance — safe. examples/reusable.yml:48-50 keeps the issue_comment actor gate, and :57-59 keeps explicit secret forwarding. The new "omit with: to inherit defaults" guidance does not remove any auth requirement.

Nothing in this change weakens authn/authz, leaks credentials, or adds an injection/SSRF/deserialization vector. Honest result for this lens: clean.

Note: I did not duplicate non-security observations (e.g. the doc/comment churn or the model-list correctness), which belong to other lenses.

(Plan mode is active, but a review is read-only — I made no edits and there is no implementation to plan; this is the requested review output.)

🎯 Correctness — No material issues found

VERDICT: No material issues found

Reviewed strictly through the Correctness lens. This PR only moves default values into the reusable workflow's inputs block and updates docs — no executable code changed. I verified the new defaults actually resolve and behave correctly end-to-end:

Defaults are wired to env (not dead config). Every input I worried about is mapped to its GADFLY_* env var in review-reusable.yml:120-130, so the new defaults genuinely flow to entrypoint.sh. Verified by reading the full step env block.
Model specs all parse. claude-code/sonnet|opus|opus:max and the :cloud-suffixed cloud ids are valid; claude-code/opus:max → max thinking tokens is explicitly covered by engine_test.go:229, and provider_of in entrypoint.sh:159 correctly extracts claude-code from claude-code/opus:max (splits on first /).
Concurrency values parse correctly. provider_concurrency: "ollama-cloud=3,claude-code=3" → provider_cap (entrypoint.sh:167-176) yields cap 3 for the claude-code lane, and there are exactly 3 claude models, so the "all three claudes at once" claim holds.
The 45→90 min cap is consistent with the per-lens timeout. GADFLY_TIMEOUT_SECS is the deadline per specialist shared across both passes (main.go:48, 82-86), so worst case is 5 lenses × 600s = 50 min when a lane runs lenses serially (claude-code has no provider_lens_concurrency entry → falls back to default 1). 50 < 90, so the wall-clock default doesn't truncate a normal run. No off-by-timeout bug.
Slimmed caller still forwards the secrets the new default needs. adversarial-review.yml:46-47 forwards both OLLAMA_CLOUD_API_KEY and CLAUDE_CODE_OAUTH_TOKEN, which the now-claude-inclusive inherited default requires; the local uses: ./... ref means this PR's review really does exercise the new defaults.
Cloud-only example is internally consistent. examples/reusable.yml overrides only models: to three :cloud ids; inheriting claude-code=3 in provider_concurrency is a harmless unused entry since no claude models are present.

Nothing in the correctness lane is wrong here. (Note: README/examples doc-accuracy obligations and the lens-concurrency tuning trade-off belong to the maintainability/performance lenses, so I'm leaving those to those reviewers.)

🧹 Code cleanliness & maintainability — Minor issues

Both findings verified against the actual files.

Finding 1 — confirmed. review-reusable.yml:15-19 (file header) and :35-41 (the workflow_call.inputs comment) both describe the same default swarm in near-verbatim prose, ~20 lines apart, both added by this PR. Header: "Inputs ship the DEFAULT swarm … 3 cloud models + the Claude Code engine, 5-lens suite, all three claudes concurrent." Inputs block: "Inputs ship the DEFAULT Gadfly swarm … 3 strong cloud models + the Claude Code engine (sonnet/opus/opus:max), the 5-lens suite, all three claudes concurrent." Genuine duplication that will drift.

Finding 2 — confirmed. examples/README.md:9 is not in the diff (unchanged) and now reads "take the defaults or override a few inputs | secret OLLAMA_CLOUD_API_KEY." But this PR's new defaults require both OLLAMA_CLOUD_API_KEY and CLAUDE_CODE_OAUTH_TOKEN (per review-reusable.yml:43 + the new note at reusable.yml:8-12), and the example it points to (examples/reusable.yml:62-66) explicitly overrides models: to a cloud-only set so it does NOT take the defaults. The row contradicts the example and the new defaults; per the CLAUDE.md maintenance rule it should have been updated in this change.

Both survive.

VERDICT: Minor issues

Duplicated default-swarm blurb within review-reusable.yml (.gitea/workflows/review-reusable.yml:15-19 and :35-41). This PR adds the same paragraph twice in one file ~20 lines apart: the file header says "Inputs ship the DEFAULT swarm … 3 cloud models + the Claude Code engine, 5-lens suite, all three claudes concurrent," and the new workflow_call inputs comment repeats it almost verbatim ("Inputs ship the DEFAULT Gadfly swarm … 3 strong cloud models + the Claude Code engine (sonnet/opus/opus:max), the 5-lens suite, all three claudes concurrent"). Two copies will drift independently when the swarm changes. Fix: keep the detailed description on the inputs: block (next to the actual defaults) and reduce the header to a one-line pointer, or vice-versa.
examples/README.md:9 is now stale and was not updated with this change (maintenance rule: "Keep the README and examples/ current … in the same change"). The row still reads "take the defaults or override a few inputs | secret OLLAMA_CLOUD_API_KEY." After this PR the reusable's actual defaults require both OLLAMA_CLOUD_API_KEY and CLAUDE_CODE_OAUTH_TOKEN; the shipped examples/reusable.yml only works with the single key because it overrides models: to a cloud-only set (examples/reusable.yml:66) — i.e. it explicitly does not take the defaults. The "take the defaults" wording now contradicts the example it points at. Fix: reword to something like "cloud-only override out of the box; delete it (and forward the Claude token) to inherit the full default swarm."

Both are minor wording/consistency issues, not functional. Everything else checked is consistent: the reusable defaults (models, specialists, provider_concurrency, timeout_secs, max_steps, timeout_minutes:90) match what adversarial-review.yml, the README blurb, and CLAUDE.md now describe; the slimmed adversarial-review.yml correctly drops the redundant with: overrides and keeps only allowed_users; no stale references to the removed models (glm-5.1, nemotron-3-super, qwen3-coder:480b-cloud) or the old claude-code=2 / timeout_minutes: 45 survive in README.md/examples/; model-list strings are spacing-consistent across review-reusable.yml:43 and examples/reusable.yml:66.

⚡ Performance — No material issues found

Verified the concurrency/timeout model against entrypoint.sh and cmd/gadfly/main.go so I could check the wall-clock math behind the new defaults.

VERDICT: No material issues found

This PR is config-defaults + docs only — no algorithmic code, no hot loops, no query/allocation paths. Through the performance lens the relevant question is whether the new swarm defaults fit their wall-clock budget. They do.

Wall-clock math checks out (verified):

GADFLY_TIMEOUT_SECS is per-lens, shared by both passes (cmd/gadfly/main.go:48), default 300, now defaulted to 600 (10 min/lens).
Lens concurrency: provider_lens_concurrency defaults to ollama-cloud=3 only — claude-code is omitted, so claude lenses run serially at the scalar default of 1 (main.go:282-285).
Lanes run in parallel; within a lane provider_concurrency caps models-at-once (entrypoint.sh:206-217). claude-code=3 runs all three claudes concurrently.
Worst-case lane = claude-code: 5 lenses × 600s serial ≈ 50 min per model, models concurrent → ~50 min for the lane. Cloud lane: ⌈5/3⌉ × 600 ≈ 20 min. Binding constraint ≈ 50 min + clone/board overhead.
So the timeout_minutes: 45 → 90 bump is necessary and correct, not a regression: the old 45-min cap would have killed the 5-lens serial claude lane mid-run. The new cap leaves comfortable margin.

One observation, not a defect (deliberate tradeoff): provider_lens_concurrency sets ollama-cloud=3 but no entry for claude-code, so each claude model serializes its 5 lenses (the ~50-min lane). Parallelizing claude lenses would cut that lane's wall-clock substantially, but doing so under a single OAuth token / extended-thinking model would likely hit rate limits and isn't clearly a win — so leaving it serial is a reasonable choice. Flagging only so it's a conscious decision, not an oversight.

Cost footprint is out of my lane: the default jumps from the image's 1-model default to 6 models × 5 lenses × 2 passes (≈60 passes/PR) for any consumer who omits with:. That's an intentional, documented "curated swarm" cost decision — a maintainability/cost question, not an efficiency regression, so I'll leave it to that lens.

No performance changes to request.

🧯 Error handling & edge cases — No material issues found

I've verified the unhappy-path behavior this default change reaches. Here's my review through the error-handling & edge-cases lens.

VERDICT: No material issues found

This diff is config-only: it changes default input values in review-reusable.yml, slims adversarial-review.yml to inherit them, and updates docs. It touches no error-handling code — the machinery that handles the unhappy paths lives in entrypoint.sh / scripts/run.sh, which this PR doesn't modify. I checked the edge cases the new default could newly trigger, and all degrade gracefully:

New default needs two creds, but a consumer may forward only one. The default models now includes claude-code/sonnet|opus|opus:max alongside the cloud models. I verified a consumer forwarding only OLLAMA_CLOUD_API_KEY (the historically common case) does not crash or block:
- Missing Ollama key → scripts/run.sh:151-152 posts a per-model "⚠️ … skipped" notice.
- Claude-code models with a missing CLAUDE_CODE_OAUTH_TOKEN → the binary exits nonzero, caught at scripts/run.sh:176-181 and surfaced as a "⚠️ Agentic reviewer … failed" comment. Per-model failure is further isolated by entrypoint.sh:183 (|| log "model … failed (continuing)"), so one failing model never aborts the lane or the job. Advisory-only invariant holds. This is noisy (3 extra warning comments) but that's a docs/UX concern, not an error-handling defect — and the PR explicitly documents the two-credential requirement.
- The status board's "never-started" edge (skipped/failed model) is already handled at entrypoint.sh:189-191, so the board still reaches N/N.
Empty-string fallback path is preserved. Each input still resolves to "" → image default if explicitly overridden, and entrypoint.sh/run.sh already treat empty config envs as "use built-in default." Changing the YAML default from "" to a concrete value doesn't open a new nil/empty branch.

One thing I considered and deliberately leave to the performance/correctness lens rather than claim here: timeout_secs=600 per lens × 5 lenses × 2 passes against the new 90-minute job cap is a tight wall-clock budget for the slowest claude-code model (lens concurrency is set only for ollama-cloud, not claude-code). That's a resource-budgeting/boundary question, not a swallowed-error or unhandled-input defect, so it's out of my lane — flagging it only so it isn't missed.

Within error handling & edge cases, the diff is clean.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 3s}

### 🪰 Gadfly review — `claude-code/opus:max` (claude-code) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> ## VERDICT: No material issues found Reviewed through the **🔒 Security** lens. This PR changes only workflow input *defaults* (models/specialists/concurrency/timeouts) plus docs. I traced every security-relevant control that could be affected and verified each against the checked-out files: - **Secret forwarding (least privilege) — intact.** `review-reusable.yml:64-74` still *declares* each secret as `required: false` and forwards them explicitly (`:105-118`); the caller `adversarial-review.yml:45-49` forwards only the four it uses, not `secrets: inherit`. No new secret is exposed and nothing switched to `inherit`. Verified by reading both files. - **Authz / trigger gate — unchanged and preserved.** The comment-trigger actor gate is in two layers, both still present: the workflow-level `if:` allow-list (`adversarial-review.yml:36-41`, hardcoded steve/fizi/dazed) and the in-container check `actor_allowed()` (`entrypoint.sh:68-83`), driven by the still-forwarded `GADFLY_ALLOWED_USERS` (`review-reusable.yml:129`, set to `"steve,fizi,dazed"` at `:54`). The `allowed_users` default stays `""`, which falls back to a collaborator/write-access check — not fail-open to the public. The PR doesn't touch this logic. - **No new injection surface.** The new defaults (`models`, `provider_concurrency`, etc.) are consumer-/maintainer-controlled inputs, not PR-author-controlled. They flow to env and are parsed in `entrypoint.sh` (`provider_cap`, model-list split) with fixed, non-tainted values. No untrusted-input path was added or widened. - **Example stub guidance — safe.** `examples/reusable.yml:48-50` keeps the `issue_comment` actor gate, and `:57-59` keeps explicit secret forwarding. The new "omit `with:` to inherit defaults" guidance does not remove any auth requirement. Nothing in this change weakens authn/authz, leaks credentials, or adds an injection/SSRF/deserialization vector. Honest result for this lens: clean. *Note: I did not duplicate non-security observations (e.g. the doc/comment churn or the model-list correctness), which belong to other lenses.* (Plan mode is active, but a review is read-only — I made no edits and there is no implementation to plan; this is the requested review output.) </details> <details><summary>🎯 Correctness — No material issues found</summary> ## VERDICT: No material issues found Reviewed strictly through the **Correctness** lens. This PR only moves default values into the reusable workflow's `inputs` block and updates docs — no executable code changed. I verified the new defaults actually resolve and behave correctly end-to-end: - **Defaults are wired to env (not dead config).** Every input I worried about is mapped to its `GADFLY_*` env var in `review-reusable.yml:120-130`, so the new defaults genuinely flow to `entrypoint.sh`. Verified by reading the full step env block. - **Model specs all parse.** `claude-code/sonnet|opus|opus:max` and the `:cloud`-suffixed cloud ids are valid; `claude-code/opus:max` → max thinking tokens is explicitly covered by `engine_test.go:229`, and `provider_of` in `entrypoint.sh:159` correctly extracts `claude-code` from `claude-code/opus:max` (splits on first `/`). - **Concurrency values parse correctly.** `provider_concurrency: "ollama-cloud=3,claude-code=3"` → `provider_cap` (`entrypoint.sh:167-176`) yields cap 3 for the `claude-code` lane, and there are exactly 3 claude models, so the "all three claudes at once" claim holds. - **The 45→90 min cap is consistent with the per-lens timeout.** `GADFLY_TIMEOUT_SECS` is the deadline per specialist shared across *both* passes (main.go:48, 82-86), so worst case is 5 lenses × 600s = 50 min when a lane runs lenses serially (claude-code has no `provider_lens_concurrency` entry → falls back to default 1). 50 < 90, so the wall-clock default doesn't truncate a normal run. No off-by-timeout bug. - **Slimmed caller still forwards the secrets the new default needs.** `adversarial-review.yml:46-47` forwards both `OLLAMA_CLOUD_API_KEY` and `CLAUDE_CODE_OAUTH_TOKEN`, which the now-claude-inclusive inherited default requires; the local `uses: ./...` ref means this PR's review really does exercise the new defaults. - **Cloud-only example is internally consistent.** `examples/reusable.yml` overrides only `models:` to three `:cloud` ids; inheriting `claude-code=3` in `provider_concurrency` is a harmless unused entry since no claude models are present. Nothing in the correctness lane is wrong here. (Note: README/`examples` doc-accuracy obligations and the lens-concurrency tuning trade-off belong to the maintainability/performance lenses, so I'm leaving those to those reviewers.) </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> Both findings verified against the actual files. **Finding 1** — confirmed. `review-reusable.yml:15-19` (file header) and `:35-41` (the `workflow_call.inputs` comment) both describe the same default swarm in near-verbatim prose, ~20 lines apart, both added by this PR. Header: "Inputs ship the DEFAULT swarm … 3 cloud models + the Claude Code engine, 5-lens suite, all three claudes concurrent." Inputs block: "Inputs ship the DEFAULT Gadfly swarm … 3 strong cloud models + the Claude Code engine (sonnet/opus/opus:max), the 5-lens suite, all three claudes concurrent." Genuine duplication that will drift. **Finding 2** — confirmed. `examples/README.md:9` is not in the diff (unchanged) and now reads "*take the defaults* or override a few inputs | secret `OLLAMA_CLOUD_API_KEY`." But this PR's new defaults require both `OLLAMA_CLOUD_API_KEY` and `CLAUDE_CODE_OAUTH_TOKEN` (per `review-reusable.yml:43` + the new note at `reusable.yml:8-12`), and the example it points to (`examples/reusable.yml:62-66`) explicitly overrides `models:` to a cloud-only set *so it does NOT take the defaults*. The row contradicts the example and the new defaults; per the CLAUDE.md maintenance rule it should have been updated in this change. Both survive. --- **VERDICT: Minor issues** - **Duplicated default-swarm blurb within `review-reusable.yml`** (`.gitea/workflows/review-reusable.yml:15-19` and `:35-41`). This PR adds the *same* paragraph twice in one file ~20 lines apart: the file header says "Inputs ship the DEFAULT swarm … 3 cloud models + the Claude Code engine, 5-lens suite, all three claudes concurrent," and the new `workflow_call` inputs comment repeats it almost verbatim ("Inputs ship the DEFAULT Gadfly swarm … 3 strong cloud models + the Claude Code engine (sonnet/opus/opus:max), the 5-lens suite, all three claudes concurrent"). Two copies will drift independently when the swarm changes. Fix: keep the detailed description on the `inputs:` block (next to the actual defaults) and reduce the header to a one-line pointer, or vice-versa. - **`examples/README.md:9` is now stale and was not updated with this change** (maintenance rule: "Keep the README and `examples/` current … in the *same* change"). The row still reads "*take the defaults* or override a few inputs | secret `OLLAMA_CLOUD_API_KEY`." After this PR the reusable's actual defaults require **both** `OLLAMA_CLOUD_API_KEY` and `CLAUDE_CODE_OAUTH_TOKEN`; the shipped `examples/reusable.yml` only works with the single key *because it overrides `models:` to a cloud-only set* (`examples/reusable.yml:66`) — i.e. it explicitly does **not** take the defaults. The "take the defaults" wording now contradicts the example it points at. Fix: reword to something like "cloud-only override out of the box; delete it (and forward the Claude token) to inherit the full default swarm." Both are minor wording/consistency issues, not functional. Everything else checked is consistent: the reusable defaults (`models`, `specialists`, `provider_concurrency`, `timeout_secs`, `max_steps`, `timeout_minutes:90`) match what `adversarial-review.yml`, the README blurb, and `CLAUDE.md` now describe; the slimmed `adversarial-review.yml` correctly drops the redundant `with:` overrides and keeps only `allowed_users`; no stale references to the removed models (`glm-5.1`, `nemotron-3-super`, `qwen3-coder:480b-cloud`) or the old `claude-code=2` / `timeout_minutes: 45` survive in `README.md`/`examples/`; model-list strings are spacing-consistent across `review-reusable.yml:43` and `examples/reusable.yml:66`. </details> <details><summary>⚡ Performance — No material issues found</summary> Verified the concurrency/timeout model against `entrypoint.sh` and `cmd/gadfly/main.go` so I could check the wall-clock math behind the new defaults. ## VERDICT: No material issues found This PR is config-defaults + docs only — no algorithmic code, no hot loops, no query/allocation paths. Through the performance lens the relevant question is whether the new swarm defaults fit their wall-clock budget. They do. **Wall-clock math checks out (verified):** - `GADFLY_TIMEOUT_SECS` is *per-lens, shared by both passes* (`cmd/gadfly/main.go:48`), default 300, now defaulted to **600** (10 min/lens). - Lens concurrency: `provider_lens_concurrency` defaults to `ollama-cloud=3` only — **claude-code is omitted**, so claude lenses run **serially** at the scalar default of 1 (`main.go:282-285`). - Lanes run in parallel; within a lane `provider_concurrency` caps models-at-once (`entrypoint.sh:206-217`). `claude-code=3` runs all three claudes concurrently. - Worst-case lane = claude-code: 5 lenses × 600s serial ≈ **50 min per model**, models concurrent → ~50 min for the lane. Cloud lane: ⌈5/3⌉ × 600 ≈ 20 min. Binding constraint ≈ 50 min + clone/board overhead. - So the `timeout_minutes: 45 → 90` bump is **necessary and correct**, not a regression: the old 45-min cap would have killed the 5-lens serial claude lane mid-run. The new cap leaves comfortable margin. **One observation, not a defect (deliberate tradeoff):** `provider_lens_concurrency` sets `ollama-cloud=3` but no entry for `claude-code`, so each claude model serializes its 5 lenses (the ~50-min lane). Parallelizing claude lenses would cut that lane's wall-clock substantially, but doing so under a single OAuth token / extended-thinking model would likely hit rate limits and isn't clearly a win — so leaving it serial is a reasonable choice. Flagging only so it's a conscious decision, not an oversight. **Cost footprint is out of my lane:** the default jumps from the image's 1-model default to 6 models × 5 lenses × 2 passes (≈60 passes/PR) for any consumer who omits `with:`. That's an intentional, documented "curated swarm" cost decision — a maintainability/cost question, not an efficiency regression, so I'll leave it to that lens. No performance changes to request. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> I've verified the unhappy-path behavior this default change reaches. Here's my review through the error-handling & edge-cases lens. ## VERDICT: No material issues found This diff is config-only: it changes default input values in `review-reusable.yml`, slims `adversarial-review.yml` to inherit them, and updates docs. It touches no error-handling code — the machinery that handles the unhappy paths lives in `entrypoint.sh` / `scripts/run.sh`, which this PR doesn't modify. I checked the edge cases the new default could newly trigger, and all degrade gracefully: - **New default needs two creds, but a consumer may forward only one.** The default `models` now includes `claude-code/sonnet|opus|opus:max` alongside the cloud models. I verified a consumer forwarding only `OLLAMA_CLOUD_API_KEY` (the historically common case) does not crash or block: - Missing Ollama key → `scripts/run.sh:151-152` posts a per-model "⚠️ … skipped" notice. - Claude-code models with a missing `CLAUDE_CODE_OAUTH_TOKEN` → the binary exits nonzero, caught at `scripts/run.sh:176-181` and surfaced as a "⚠️ Agentic reviewer … failed" comment. Per-model failure is further isolated by `entrypoint.sh:183` (`|| log "model … failed (continuing)"`), so one failing model never aborts the lane or the job. Advisory-only invariant holds. This is noisy (3 extra warning comments) but that's a docs/UX concern, not an error-handling defect — and the PR explicitly documents the two-credential requirement. - The status board's "never-started" edge (skipped/failed model) is already handled at `entrypoint.sh:189-191`, so the board still reaches N/N. - **Empty-string fallback path is preserved.** Each input still resolves to `""` → image default if explicitly overridden, and `entrypoint.sh`/`run.sh` already treat empty config envs as "use built-in default." Changing the YAML default from `""` to a concrete value doesn't open a new nil/empty branch. One thing I considered and deliberately leave to the performance/correctness lens rather than claim here: `timeout_secs=600` per lens × 5 lenses × 2 passes against the new 90-minute job cap is a tight wall-clock budget for the slowest claude-code model (lens concurrency is set only for `ollama-cloud`, not `claude-code`). That's a resource-budgeting/boundary question, not a swallowed-error or unhandled-input defect, so it's out of my lane — flagging it only so it isn't missed. Within error handling & edge cases, the diff is clean. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 3s

gitea-actions bot commented

2026-06-28 02:05:49 +00:00

🪰 Gadfly review — `claude-code/sonnet` (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

No material issues found

This PR's security-relevant surface is entirely in how new workflow input defaults (models, specialists, provider_concurrency, etc.) propagate to the container. I verified all of the following:

Default strings are hardcoded YAML literals, not consumer-supplied at runtime. They reach the container as GADFLY_* env vars — the safe channel. No shell execution touches these values directly; entrypoint.sh parses the model CSV into a bash array and strips whitespace with tr, not eval or unquoted glob expansion.
CLAUDE_CODE_OAUTH_TOKEN is already properly declared in the workflow_call.secrets block (required: false) and is forwarded to the container via env at .gitea/workflows/review-reusable.yml:110. It is never logged, never passed as a CLI argument, and never echoed — it flows through env inheritance to the gadfly binary, same as every other API key.
Least-privilege secret forwarding is preserved. Both the updated adversarial-review.yml and examples/reusable.yml still enumerate secrets explicitly rather than using secrets: inherit. No new secrets are introduced.
allowed_users default remains "" (no-ACL-by-default), but this is unchanged behavior from before the PR. The entrypoint's username comparison (splitting on , via IFS) has no shell injection risk given Gitea's username character constraints.
No model-name injection risk. Even if a consumer passes an adversarial models: string, it ends up in a bash array element passed as MODEL="$value" to run.sh, then as GADFLY_MODEL="$MODEL" in a subprocess env — never evaluated as shell code.

🎯 Correctness — No material issues found

No material issues found

I verified the key correctness claims by reading entrypoint.sh, cmd/gadfly/main.go, and both workflow files.

Model-to-lane assignment (verified, entrypoint.sh:159, model.go:146-155): provider_of("claude-code/opus:max") → "claude-code" in both bash and Go. The provider_concurrency: "ollama-cloud=3,claude-code=3" cap therefore applies correctly to all three claude-code models, allowing all three to run in parallel as documented.
Lens concurrency for claude-code (verified, main.go:281-286): GADFLY_PROVIDER_LENS_CONCURRENCY="ollama-cloud=3" has no claude-code entry, so lensConcurrency() falls back to defaultLensConcurrency=1 — claude-code lenses run sequentially. This is intentional (not a bug), and the timeout math is sound: 5 lenses × 600s max = 3000s ≈ 50 min per model, with 3 models in parallel → ~50 min wall-clock for the claude-code lane, well inside the 90-min job cap.
timeout_secs shared across both passes (verified, main.go:310-324): Both the review pass and recheck pass consume the same 600s context per lens. This is pre-existing behavior, unchanged by this PR.
Empty-string fallback (verified, entrypoint.sh:156): MODELS="${GADFLY_MODELS:-...}" uses :-, so a consumer passing models: "" correctly falls back to DEFAULT_MODELS="qwen3-coder:480b-cloud" as the comment claims.
Cross-file consistency (verified): adversarial-review.yml's if: actor gate (steve, fizi, dazed) matches the allowed_users input; examples/reusable.yml overrides models to cloud-only and comments the CLAUDE_CODE_OAUTH_TOKEN consistently; all doc claims (3 cloud + 3 claude-code, 5-lens suite, claude-code=3) match the actual input defaults in review-reusable.yml.

🧹 Code cleanliness & maintainability — Minor issues

All three findings check out against the actual code. Let me confirm the exact line numbers before writing the final review.

The actual file contents confirm:

adversarial-review.yml lines 6-7 and 51-53 both describe the same swarm in slightly different words — confirmed.
review-reusable.yml:43 default and examples/reusable.yml:66 both contain minimax-m3:cloud,glm-5.2:cloud,deepseek-v4-pro:cloud verbatim — confirmed.
examples/README.md:9 still says "take the defaults or override a few inputs" but examples/reusable.yml now always ships an active models: override (cloud-only, to avoid needing the Claude token) — confirmed stale.

All findings survive. Line numbers in the draft were slightly off (off by 1-3 lines); correcting them below.

Minor issues

Redundant swarm description in adversarial-review.yml (.gitea/workflows/adversarial-review.yml:6-7 and :51-53). The file-header comment ("It inherits the default swarm: 3 cloud models + Claude Code (sonnet, opus, opus:max), 5-lens suite, 3 claudes at once.") and the with: block comment ("Inherit the default swarm (3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite, all three claudes concurrent) from review-reusable.yml.") state the same thing in slightly different words. Every time the swarm changes both comments need to be updated in sync. One of them can be dropped; the with: comment is closer to the point of use and the more useful of the two.
Cloud model subset duplicated across files (.gitea/workflows/review-reusable.yml:43 default vs examples/reusable.yml:66). The three cloud models (minimax-m3:cloud,glm-5.2:cloud,deepseek-v4-pro:cloud) appear verbatim as the first three items in the full default in review-reusable.yml and as the standalone override in examples/reusable.yml. YAML workflow inputs can't reference constants, so this is somewhat forced, but a rename (e.g. glm-5.2 → glm-6) will silently miss one of the two sites. A one-line comment in examples/reusable.yml noting which models to keep in sync with the defaults would reduce the maintenance risk.
examples/README.md description column is slightly stale (examples/README.md:9). The table says reusable.yml lets you "take the defaults or override a few inputs." After this PR, the example file ships an active models: override (a cloud-only subset, not the default), so it no longer illustrates "taking the defaults." The Needs column (OLLAMA_CLOUD_API_KEY) is still accurate for the example as written, but a reader who looks at the table to decide whether to use reusable.yml or adversarial-review.yml gets a subtly wrong description of what the slim stub does. CLAUDE.md's maintenance rule ("Any change to … the consumer stub MUST be reflected in … the relevant files under examples/") applies here.

⚡ Performance — No material issues found

The finding hinges on the claim that the job wall-clock worst case is 5 × 2 × 600 = 6,000 s. The code contradicts this.

In reviewWithSpecialist (main.go:310–312), a single ctx is created with the 600 s deadline, and both the review pass (line 314) and the recheck pass (line 323) run under that same context. The two passes share one 600 s budget, not one each. The correct worst-case is:

5 lenses × 600 s/lens = 3,000 s = 50 minutes — well inside the 90-minute cap.

The finding's "2 passes × 600 s" multiplier is wrong. The finding cannot be confirmed and is dropped.

No material issues found

🧯 Error handling & edge cases — No material issues found

No material issues found

Reviewed through the error handling & edge cases lens. Changes are confined to YAML workflow defaults and documentation.

Verification performed:

"" fallback claim (workflow comment: "Set any input to "" to fall back to the image/entrypoint built-in default") — verified correct on both sides:
- GADFLY_MODELS="" → entrypoint.sh:156 ${GADFLY_MODELS:-...} correctly falls through to $DEFAULT_MODELS.
- GADFLY_SPECIALISTS="" → specialists.go:136 checks strings.TrimSpace(os.Getenv(...)) != "", so empty string falls through to defaultSuite. No regression.
Failed claude-code models when CLAUDE_CODE_OAUTH_TOKEN is absent — entrypoint.sh:183 || log "model $1 failed (continuing)" catches it; entrypoint.sh:189-191 then marks the pre-seeded status file done:true so the board completes. Graceful degradation is intact.
Malformed/absent provider_concurrency — provider_cap() at entrypoint.sh:167-176 handles empty or missing =N entries by falling back to $DEFAULT_CONC=1. The new claude-code=3 entry in the default is syntactically clean.
PROVIDERS="" (empty model list edge case) — entrypoint.sh:249-253 for p in $PROVIDERS simply doesn't iterate; LANE_PIDS=() stays empty; the conditional wait is skipped. No hang.
Actor gate consistency — adversarial-review.yml:36-41 Gitea if: condition and the allowed_users: "steve,fizi,dazed" input are consistent. The belt-and-suspenders duplication is pre-existing and intentional.
with: containing only comments + allowed_users — valid YAML; no parse issue.

Nothing in this diff introduces new unhappy paths, ignores errors, or removes an existing guard.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 14m 38s}

### 🪰 Gadfly review — `claude-code/sonnet` (claude-code) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> **No material issues found** This PR's security-relevant surface is entirely in how new workflow input defaults (`models`, `specialists`, `provider_concurrency`, etc.) propagate to the container. I verified all of the following: - **Default strings are hardcoded YAML literals**, not consumer-supplied at runtime. They reach the container as `GADFLY_*` env vars — the safe channel. No shell execution touches these values directly; `entrypoint.sh` parses the model CSV into a bash array and strips whitespace with `tr`, not `eval` or unquoted glob expansion. - **`CLAUDE_CODE_OAUTH_TOKEN` is already properly declared** in the `workflow_call.secrets` block (`required: false`) and is forwarded to the container via env at `.gitea/workflows/review-reusable.yml:110`. It is never logged, never passed as a CLI argument, and never echoed — it flows through env inheritance to the gadfly binary, same as every other API key. - **Least-privilege secret forwarding is preserved.** Both the updated `adversarial-review.yml` and `examples/reusable.yml` still enumerate secrets explicitly rather than using `secrets: inherit`. No new secrets are introduced. - **`allowed_users` default remains `""`** (no-ACL-by-default), but this is unchanged behavior from before the PR. The entrypoint's username comparison (splitting on `,` via `IFS`) has no shell injection risk given Gitea's username character constraints. - **No model-name injection risk.** Even if a consumer passes an adversarial `models:` string, it ends up in a bash array element passed as `MODEL="$value"` to `run.sh`, then as `GADFLY_MODEL="$MODEL"` in a subprocess env — never evaluated as shell code. </details> <details><summary>🎯 Correctness — No material issues found</summary> **No material issues found** I verified the key correctness claims by reading `entrypoint.sh`, `cmd/gadfly/main.go`, and both workflow files. - **Model-to-lane assignment** (verified, `entrypoint.sh:159`, `model.go:146-155`): `provider_of("claude-code/opus:max")` → `"claude-code"` in both bash and Go. The `provider_concurrency: "ollama-cloud=3,claude-code=3"` cap therefore applies correctly to all three claude-code models, allowing all three to run in parallel as documented. - **Lens concurrency for claude-code** (verified, `main.go:281-286`): `GADFLY_PROVIDER_LENS_CONCURRENCY="ollama-cloud=3"` has no `claude-code` entry, so `lensConcurrency()` falls back to `defaultLensConcurrency=1` — claude-code lenses run sequentially. This is intentional (not a bug), and the timeout math is sound: 5 lenses × 600s max = 3000s ≈ 50 min per model, with 3 models in parallel → ~50 min wall-clock for the claude-code lane, well inside the 90-min job cap. - **timeout_secs shared across both passes** (verified, `main.go:310-324`): Both the review pass and recheck pass consume the same 600s context per lens. This is pre-existing behavior, unchanged by this PR. - **Empty-string fallback** (verified, `entrypoint.sh:156`): `MODELS="${GADFLY_MODELS:-...}"` uses `:-`, so a consumer passing `models: ""` correctly falls back to `DEFAULT_MODELS="qwen3-coder:480b-cloud"` as the comment claims. - **Cross-file consistency** (verified): `adversarial-review.yml`'s `if:` actor gate (`steve`, `fizi`, `dazed`) matches the `allowed_users` input; `examples/reusable.yml` overrides models to cloud-only and comments the CLAUDE_CODE_OAUTH_TOKEN consistently; all doc claims (3 cloud + 3 claude-code, 5-lens suite, claude-code=3) match the actual input defaults in `review-reusable.yml`. </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> All three findings check out against the actual code. Let me confirm the exact line numbers before writing the final review. The actual file contents confirm: 1. `adversarial-review.yml` lines 6-7 and 51-53 both describe the same swarm in slightly different words — confirmed. 2. `review-reusable.yml:43` default and `examples/reusable.yml:66` both contain `minimax-m3:cloud,glm-5.2:cloud,deepseek-v4-pro:cloud` verbatim — confirmed. 3. `examples/README.md:9` still says "take the defaults or override a few inputs" but `examples/reusable.yml` now always ships an active `models:` override (cloud-only, to avoid needing the Claude token) — confirmed stale. All findings survive. Line numbers in the draft were slightly off (off by 1-3 lines); correcting them below. --- **Minor issues** - **Redundant swarm description in `adversarial-review.yml`** (`.gitea/workflows/adversarial-review.yml:6-7` and `:51-53`). The file-header comment ("It inherits the default swarm: 3 cloud models + Claude Code (sonnet, opus, opus:max), 5-lens suite, 3 claudes at once.") and the `with:` block comment ("Inherit the default swarm (3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite, all three claudes concurrent) from review-reusable.yml.") state the same thing in slightly different words. Every time the swarm changes both comments need to be updated in sync. One of them can be dropped; the `with:` comment is closer to the point of use and the more useful of the two. - **Cloud model subset duplicated across files** (`.gitea/workflows/review-reusable.yml:43` default vs `examples/reusable.yml:66`). The three cloud models (`minimax-m3:cloud,glm-5.2:cloud,deepseek-v4-pro:cloud`) appear verbatim as the first three items in the full default in `review-reusable.yml` *and* as the standalone override in `examples/reusable.yml`. YAML workflow inputs can't reference constants, so this is somewhat forced, but a rename (e.g. `glm-5.2` → `glm-6`) will silently miss one of the two sites. A one-line comment in `examples/reusable.yml` noting which models to keep in sync with the defaults would reduce the maintenance risk. - **`examples/README.md` description column is slightly stale** (`examples/README.md:9`). The table says `reusable.yml` lets you "take the defaults or override a few inputs." After this PR, the example file ships an *active* `models:` override (a cloud-only subset, not the default), so it no longer illustrates "taking the defaults." The `Needs` column (`OLLAMA_CLOUD_API_KEY`) is still accurate for the example as written, but a reader who looks at the table to decide whether to use `reusable.yml` or `adversarial-review.yml` gets a subtly wrong description of what the slim stub does. CLAUDE.md's maintenance rule ("Any change to … the consumer stub MUST be reflected in … the relevant files under `examples/`") applies here. </details> <details><summary>⚡ Performance — No material issues found</summary> The finding hinges on the claim that the job wall-clock worst case is `5 × 2 × 600 = 6,000 s`. The code contradicts this. In `reviewWithSpecialist` (main.go:310–312), a **single** `ctx` is created with the 600 s deadline, and both the review pass (line 314) and the recheck pass (line 323) run under that **same** context. The two passes share one 600 s budget, not one each. The correct worst-case is: > 5 lenses × 600 s/lens = 3,000 s = **50 minutes** — well inside the 90-minute cap. The finding's "2 passes × 600 s" multiplier is wrong. The finding cannot be confirmed and is dropped. --- No material issues found </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **No material issues found** Reviewed through the error handling & edge cases lens. Changes are confined to YAML workflow defaults and documentation. **Verification performed:** - **`""` fallback claim** (workflow comment: "Set any input to `""` to fall back to the image/entrypoint built-in default") — verified correct on both sides: - `GADFLY_MODELS=""` → `entrypoint.sh:156` `${GADFLY_MODELS:-...}` correctly falls through to `$DEFAULT_MODELS`. - `GADFLY_SPECIALISTS=""` → `specialists.go:136` checks `strings.TrimSpace(os.Getenv(...)) != ""`, so empty string falls through to `defaultSuite`. No regression. - **Failed claude-code models when `CLAUDE_CODE_OAUTH_TOKEN` is absent** — `entrypoint.sh:183` `|| log "model $1 failed (continuing)"` catches it; `entrypoint.sh:189-191` then marks the pre-seeded status file `done:true` so the board completes. Graceful degradation is intact. - **Malformed/absent `provider_concurrency`** — `provider_cap()` at `entrypoint.sh:167-176` handles empty or missing `=N` entries by falling back to `$DEFAULT_CONC=1`. The new `claude-code=3` entry in the default is syntactically clean. - **`PROVIDERS=""` (empty model list edge case)** — `entrypoint.sh:249-253` `for p in $PROVIDERS` simply doesn't iterate; `LANE_PIDS=()` stays empty; the conditional `wait` is skipped. No hang. - **Actor gate consistency** — `adversarial-review.yml:36-41` Gitea `if:` condition and the `allowed_users: "steve,fizi,dazed"` input are consistent. The belt-and-suspenders duplication is pre-existing and intentional. - **`with:` containing only comments + `allowed_users`** — valid YAML; no parse issue. Nothing in this diff introduces new unhappy paths, ignores errors, or removes an existing guard. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 14m 38s

gitea-actions bot commented

2026-06-28 02:05:49 +00:00

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

No material issues found

The diff moves configuration defaults (model list, specialist suite, concurrency, timeouts) from the caller workflow into the reusable workflow's workflow_call input defaults. Through the security lens:

Secret forwarding: Unchanged. The reusable workflow still declares all secrets as required: false, and the caller (adversarial-review.yml) forwards the same four secrets as before. No new secrets are exposed, and the least-privilege pattern is preserved.
Actor gating: The if: condition in adversarial-review.yml still gates issue_comment events on the same three usernames (steve, fizi, dazed). The in-container GADFLY_ALLOWED_USERS check (entrypoint.sh:71-76) is unchanged and still provides the belt-and-suspenders layer.
allowed_users default of "": The fallback in entrypoint.sh:78-82 (collaborator check) is the same behavior as before this PR. No weakening.
Injection surface: All inputs flow through the same env-var → shell-variable path. I verified entrypoint.sh — the values are used in parameter expansions, read into arrays, and (for the status file path) sanitized via tr -c '[:alnum:]._-' '_' (line 165). No command substitution or eval on these inputs.
Example template: examples/reusable.yml uses "your-username" as a placeholder — a copy-paste consumer must customize it, but the worst case is a non-functional comment trigger, not a security bypass (PR events still fire, and the in-container check would deny an unknown actor).

No authn/authz gaps, injection vectors, secret leaks, or insecure defaults introduced by this change.

🎯 Correctness — Minor issues

Both findings are confirmed against the actual code. Let me output the corrected review.

Verdict: Minor issues

max_steps: "14" is now the inherited default, down from the binary's built-in 24 (review-reusable.yml:50). Previously, a consumer that omitted max_steps got "" → the Go binary used its own defaultMaxSteps of 24 (per cmd/gadfly/main.go:81 and the envInt fallback at line 450–459). Now they get 14 — a 42% drop. The binary has a tool-free finalization fallback for step-exhausted passes (finalizeInstruction at line 113, invoked in runAgent), so it won't silently produce nothing, but reviews may be less thorough. This was already gadfly's own value in the old adversarial-review.yml, so it's a known tradeoff, but it's now imposed on all inheriting consumers.
timeout_secs: "600" unchanged while the default specialist count grew from 3→5 (review-reusable.yml:49). The old adversarial-review.yml ran 3 specialists per model with a 600s binary deadline; the new default runs 5. For cloud models with lens concurrency=3, 5 specialists run in 2 batches (3+2), each batch doing review+recheck = 4 serial rounds → ~150s per round, which is fine. But for claude-code models (provider_lens_concurrency has no claude-code entry, so lensConcurrency() at cmd/gadfly/main.go:281-286 falls back to 1), 5 specialists × 2 passes = 10 sequential rounds → ~60s per round. Claude Code with extended thinking (opus:max) can exceed 60s per call. This was already the case in the old config (same 600s, same lens-concurrency gap for claude-code), so it's not a new regression — but the PR expands the default specialist suite without addressing the claude-code lens-concurrency bottleneck.

🧹 Code cleanliness & maintainability — No material issues found

No material issues found

The diff is clean from a maintainability standpoint:

Duplication removed: adversarial-review.yml no longer repeats config values that are now defaults in review-reusable.yml — the with: block shrinks from 8 explicit overrides to just allowed_users. This is the right pattern: define once, inherit everywhere.
Comments are consistent across files: The header comments in review-reusable.yml, adversarial-review.yml, and examples/reusable.yml all describe the same default swarm (3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite) with matching terminology. Verified by reading all three files.
Example file is self-documenting: examples/reusable.yml explains why it overrides models: (to work with just the Ollama key) and how to restore the full default (delete the override + forward the Claude token). No magic — a copy-paste consumer can understand the tradeoff from the file alone.
No dead references: The old model names dropped from adversarial-review.yml (glm-5.1:cloud, nemotron-3-super:cloud, qwen3-coder:480b-cloud) don't appear in any other workflow or example file — they were only in that one caller. Verified via grep.
CLAUDE.md and README updated in sync: Both reflect the new "ships the default swarm" description. The one minor inconsistency — adversarial-review.yml dropped "Phase-4" from its comment while CLAUDE.md retains it — is harmless internal project jargon, not a maintenance hazard.

⚡ Performance — No material issues found

No material issues found.

This is a pure configuration change (YAML defaults + docs). I traced every new default through the actual runtime code (entrypoint.sh, cmd/gadfly/main.go, cmd/gadfly/specialists.go) to verify the performance impact:

models: 6 models (3 cloud + 3 claude-code) vs the image's prior built-in DEFAULT_MODELS of 1 (qwen3-coder:480b-cloud at entrypoint.sh:52). This is a deliberate 6× work increase for default consumers, not a regression — the PR explicitly ships a "curated swarm."
specialists: The new default (security,correctness,maintainability,performance,error-handling) is byte-identical to the Go binary's defaultSuite at cmd/gadfly/specialists.go:67. When the env was empty, the code already fell through to this exact same 5-lens set (specialists.go:141). Zero behavioral change.
provider_concurrency: ollama-cloud=3,claude-code=3 replaces the prior empty-string fallback of GADFLY_CONCURRENCY default 1 (entrypoint.sh:157). 3× more parallel models per provider — intentional throughput increase.
provider_lens_concurrency: ollama-cloud=3 replaces the prior fallback of GADFLY_LENS_CONCURRENCY default 1 (main.go:285). 3× more parallel lenses for cloud models — intentional.
timeout_secs: 600 replaces the binary default of 300 (main.go:310). Needed for Claude Code extended-thinking models; the 90-minute job cap accommodates 5 sequential lenses × 600s = 50 min per claude-code model.
max_steps: 14 replaces the binary default of 24 (main.go:315). This is a 42% reduction in tool steps per pass — a performance improvement (faster completion), not a regression.
timeout_minutes: 90 replaces 45. Needed to fit the larger default swarm; the arithmetic checks out (slowest lane: 3 claude-code models in parallel, each ~50 min worst-case, fits under 90).

No N+1 queries, no allocations, no hot-loop issues, no unbounded growth — there is no code change to evaluate those against. The configuration values are internally consistent and the parallelism limits are bounded.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

The diff changes only YAML workflow defaults and documentation. I traced every new default value through the runtime code (entrypoint.sh, scripts/run.sh, cmd/gadfly/main.go, cmd/gadfly/specialists.go) and confirmed that all error paths are already handled:

models default with claude-code entries: entrypoint.sh:156 resolves models via ${GADFLY_MODELS:-${OLLAMA_REVIEW_MODELS:-$DEFAULT_MODELS}}. The claude-code engine path in run.sh:134-184 handles missing CLAUDE_CODE_OAUTH_TOKEN / binary-not-found / non-zero exit — all produce a ⚠️ error comment, never a crash.
specialists default: resolveSpecialists (specialists.go:90-169) parses the CSV, validates each name against the registry, and skips unknowns with an error logged — no panic on unrecognized names.
provider_concurrency / provider_lens_concurrency: provider_cap() (entrypoint.sh:167-176) strips whitespace, checks for non-empty values, and falls back to DEFAULT_CONC=1 on any parse failure. lensConcurrency() (main.go:281) has the same fallback pattern, confirmed by the test matrix in lens_concurrency_test.go:160-188 (including "scalar invalid falls back to default").
timeout_secs: "600" / max_steps: "14": Both are valid integer strings; envInt (main.go:310,315) falls back to built-in defaults on parse failure. The binary's tool-free finalization fallback (main.go:44-45) guarantees output even on step exhaustion.
timeout_minutes: 90: Gitea Actions job-level cap; the old adversarial-review.yml already used 90 explicitly, so no behavioral change for the dogfooding caller.
allowed_users: "" default unchanged: entrypoint.sh:68-83 handles empty by falling back to the collaborator check — no auth bypass.

The one configuration tension worth noting (though not an unhandled error): timeout_secs: "600" is a single per-model deadline shared across all lenses, and provider_lens_concurrency has no claude-code entry, so claude-code models run 5 lenses sequentially within those 600 seconds. On a non-trivial PR where a claude-code opus:max lens pass exceeds ~60 seconds, the binary will hit its deadline. But this is handled — run.sh:176-181 catches the non-zero exit and posts a ⚠️ error comment. The system degrades gracefully rather than silently failing or crashing.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 32s}

### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> **No material issues found** The diff moves configuration defaults (model list, specialist suite, concurrency, timeouts) from the caller workflow into the reusable workflow's `workflow_call` input defaults. Through the security lens: - **Secret forwarding**: Unchanged. The reusable workflow still declares all secrets as `required: false`, and the caller (`adversarial-review.yml`) forwards the same four secrets as before. No new secrets are exposed, and the least-privilege pattern is preserved. - **Actor gating**: The `if:` condition in `adversarial-review.yml` still gates `issue_comment` events on the same three usernames (`steve`, `fizi`, `dazed`). The in-container `GADFLY_ALLOWED_USERS` check (entrypoint.sh:71-76) is unchanged and still provides the belt-and-suspenders layer. - **`allowed_users` default of `""`**: The fallback in `entrypoint.sh:78-82` (collaborator check) is the same behavior as before this PR. No weakening. - **Injection surface**: All inputs flow through the same env-var → shell-variable path. I verified `entrypoint.sh` — the values are used in parameter expansions, `read` into arrays, and (for the status file path) sanitized via `tr -c '[:alnum:]._-' '_'` (line 165). No command substitution or eval on these inputs. - **Example template**: `examples/reusable.yml` uses `"your-username"` as a placeholder — a copy-paste consumer must customize it, but the worst case is a non-functional comment trigger, not a security bypass (PR events still fire, and the in-container check would deny an unknown actor). No authn/authz gaps, injection vectors, secret leaks, or insecure defaults introduced by this change. </details> <details><summary>🎯 Correctness — Minor issues</summary> Both findings are confirmed against the actual code. Let me output the corrected review. ## Verdict: Minor issues - **`max_steps: "14"` is now the inherited default, down from the binary's built-in 24** (`review-reusable.yml:50`). Previously, a consumer that omitted `max_steps` got `""` → the Go binary used its own `defaultMaxSteps` of 24 (per `cmd/gadfly/main.go:81` and the `envInt` fallback at line 450–459). Now they get 14 — a 42% drop. The binary has a tool-free finalization fallback for step-exhausted passes (`finalizeInstruction` at line 113, invoked in `runAgent`), so it won't silently produce nothing, but reviews may be less thorough. This was already gadfly's own value in the old `adversarial-review.yml`, so it's a known tradeoff, but it's now imposed on all inheriting consumers. - **`timeout_secs: "600"` unchanged while the default specialist count grew from 3→5** (`review-reusable.yml:49`). The old `adversarial-review.yml` ran 3 specialists per model with a 600s binary deadline; the new default runs 5. For cloud models with lens concurrency=3, 5 specialists run in 2 batches (3+2), each batch doing review+recheck = 4 serial rounds → ~150s per round, which is fine. But for claude-code models (`provider_lens_concurrency` has no `claude-code` entry, so `lensConcurrency()` at `cmd/gadfly/main.go:281-286` falls back to 1), 5 specialists × 2 passes = 10 sequential rounds → ~60s per round. Claude Code with extended thinking (`opus:max`) can exceed 60s per call. This was already the case in the old config (same 600s, same lens-concurrency gap for claude-code), so it's not a new regression — but the PR expands the default specialist suite without addressing the claude-code lens-concurrency bottleneck. </details> <details><summary>🧹 Code cleanliness & maintainability — No material issues found</summary> **No material issues found** The diff is clean from a maintainability standpoint: - **Duplication removed**: `adversarial-review.yml` no longer repeats config values that are now defaults in `review-reusable.yml` — the `with:` block shrinks from 8 explicit overrides to just `allowed_users`. This is the right pattern: define once, inherit everywhere. - **Comments are consistent across files**: The header comments in `review-reusable.yml`, `adversarial-review.yml`, and `examples/reusable.yml` all describe the same default swarm (3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite) with matching terminology. Verified by reading all three files. - **Example file is self-documenting**: `examples/reusable.yml` explains *why* it overrides `models:` (to work with just the Ollama key) and how to restore the full default (delete the override + forward the Claude token). No magic — a copy-paste consumer can understand the tradeoff from the file alone. - **No dead references**: The old model names dropped from `adversarial-review.yml` (`glm-5.1:cloud`, `nemotron-3-super:cloud`, `qwen3-coder:480b-cloud`) don't appear in any other workflow or example file — they were only in that one caller. Verified via grep. - **CLAUDE.md and README updated in sync**: Both reflect the new "ships the default swarm" description. The one minor inconsistency — `adversarial-review.yml` dropped "Phase-4" from its comment while `CLAUDE.md` retains it — is harmless internal project jargon, not a maintenance hazard. </details> <details><summary>⚡ Performance — No material issues found</summary> No material issues found. This is a pure configuration change (YAML defaults + docs). I traced every new default through the actual runtime code (`entrypoint.sh`, `cmd/gadfly/main.go`, `cmd/gadfly/specialists.go`) to verify the performance impact: - **`models`**: 6 models (3 cloud + 3 claude-code) vs the image's prior built-in `DEFAULT_MODELS` of 1 (`qwen3-coder:480b-cloud` at `entrypoint.sh:52`). This is a deliberate 6× work increase for default consumers, not a regression — the PR explicitly ships a "curated swarm." - **`specialists`**: The new default (`security,correctness,maintainability,performance,error-handling`) is byte-identical to the Go binary's `defaultSuite` at `cmd/gadfly/specialists.go:67`. When the env was empty, the code already fell through to this exact same 5-lens set (`specialists.go:141`). Zero behavioral change. - **`provider_concurrency`**: `ollama-cloud=3,claude-code=3` replaces the prior empty-string fallback of `GADFLY_CONCURRENCY` default 1 (`entrypoint.sh:157`). 3× more parallel models per provider — intentional throughput increase. - **`provider_lens_concurrency`**: `ollama-cloud=3` replaces the prior fallback of `GADFLY_LENS_CONCURRENCY` default 1 (`main.go:285`). 3× more parallel lenses for cloud models — intentional. - **`timeout_secs`**: 600 replaces the binary default of 300 (`main.go:310`). Needed for Claude Code extended-thinking models; the 90-minute job cap accommodates 5 sequential lenses × 600s = 50 min per claude-code model. - **`max_steps`**: 14 replaces the binary default of 24 (`main.go:315`). This is a 42% *reduction* in tool steps per pass — a performance improvement (faster completion), not a regression. - **`timeout_minutes`**: 90 replaces 45. Needed to fit the larger default swarm; the arithmetic checks out (slowest lane: 3 claude-code models in parallel, each ~50 min worst-case, fits under 90). No N+1 queries, no allocations, no hot-loop issues, no unbounded growth — there is no code change to evaluate those against. The configuration values are internally consistent and the parallelism limits are bounded. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> ## Verdict: No material issues found The diff changes only YAML workflow defaults and documentation. I traced every new default value through the runtime code (`entrypoint.sh`, `scripts/run.sh`, `cmd/gadfly/main.go`, `cmd/gadfly/specialists.go`) and confirmed that all error paths are already handled: - **`models` default with claude-code entries**: `entrypoint.sh:156` resolves models via `${GADFLY_MODELS:-${OLLAMA_REVIEW_MODELS:-$DEFAULT_MODELS}}`. The claude-code engine path in `run.sh:134-184` handles missing `CLAUDE_CODE_OAUTH_TOKEN` / binary-not-found / non-zero exit — all produce a `⚠️` error comment, never a crash. - **`specialists` default**: `resolveSpecialists` (`specialists.go:90-169`) parses the CSV, validates each name against the registry, and skips unknowns with an error logged — no panic on unrecognized names. - **`provider_concurrency` / `provider_lens_concurrency`**: `provider_cap()` (`entrypoint.sh:167-176`) strips whitespace, checks for non-empty values, and falls back to `DEFAULT_CONC=1` on any parse failure. `lensConcurrency()` (`main.go:281`) has the same fallback pattern, confirmed by the test matrix in `lens_concurrency_test.go:160-188` (including "scalar invalid falls back to default"). - **`timeout_secs: "600"` / `max_steps: "14"`**: Both are valid integer strings; `envInt` (`main.go:310,315`) falls back to built-in defaults on parse failure. The binary's tool-free finalization fallback (`main.go:44-45`) guarantees output even on step exhaustion. - **`timeout_minutes: 90`**: Gitea Actions job-level cap; the old `adversarial-review.yml` already used 90 explicitly, so no behavioral change for the dogfooding caller. - **`allowed_users: ""` default unchanged**: `entrypoint.sh:68-83` handles empty by falling back to the collaborator check — no auth bypass. The one configuration tension worth noting (though not an unhandled error): `timeout_secs: "600"` is a single per-model deadline shared across all lenses, and `provider_lens_concurrency` has no `claude-code` entry, so claude-code models run 5 lenses sequentially within those 600 seconds. On a non-trivial PR where a claude-code opus:max lens pass exceeds ~60 seconds, the binary will hit its deadline. But this is *handled* — `run.sh:176-181` catches the non-zero exit and posts a `⚠️` error comment. The system degrades gracefully rather than silently failing or crashing. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 32s

gitea-actions bot commented

2026-06-28 02:05:49 +00:00

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Through the security lens, I checked the actual checked-out code rather than trusting the diff:

Authz gate preserved. review-reusable.yml:52 keeps allowed_users default as "" (not a new permissive default), and GADFLY_ALLOWED_USERS is still consumed by entrypoint.sh:71-73 as the in-container belt-and-suspenders check. Both adversarial-review.yml:54 and examples/reusable.yml:69 set it explicitly. The comment-trigger actor gate in adversarial-review.yml:36-41 is unchanged. No authn/authz regression.
No new secret exposure. Secrets handling is untouched: still explicit least-privilege forwarding via workflow_call.secrets (review-reusable.yml:64-74), no secrets: inherit introduced. The new default models value references claude-code/* providers, but that does not cause any additional secret to be read — CLAUDE_CODE_OAUTH_TOKEN was already mapped at review-reusable.yml:110 and only resolves if the caller forwards it. A consumer who omits it simply gets an empty env (harmless fail), no leak.
No injection surface added. The new defaults are static string literals passed straight through to GADFLY_* env vars (review-reusable.yml:120-130) and consumed by the pinned image. They are not interpolated into shell, URLs, or templates within the workflow itself. Changing a default value does not expand the input→sink attack surface.
The documented credential requirement (default swarm needs both OLLAMA_CLOUD_API_KEY and CLAUDE_CODE_OAUTH_TOKEN) is a usability/correctness concern, not a security gap — missing credentials degrade to empty env, they don't bypass any check or leak data. The examples/reusable.yml:62-68 cloud-only models: override correctly aligns with forwarding only OLLAMA_CLOUD_API_KEY.

Nothing in the security lane is materially wrong.

🎯 Correctness — No material issues found

VERDICT: No material issues found

Verified the new default values thread correctly through the system:
- models default (review-reusable.yml:43) → GADFLY_MODELS env (:120) → consumed as MODELS in entrypoint.sh:156. The 6 model ids parse fine, and provider_of (entrypoint.sh:159) splits them into lanes: ollama-cloud (3 models) and claude-code (3 models).
- provider_concurrency: "ollama-cloud=3,claude-code=3" (:47) → GADFLY_PROVIDER_CONCURRENCY → parsed by provider_cap (entrypoint.sh:167-176). Both keys match the lane names derived from the model specs, so caps resolve to 3/3 as intended. Comment "all 3 claudes at once" is accurate (3 claude-code models, cap 3).
- provider_lens_concurrency: "ollama-cloud=3" (:48) → GADFLY_PROVIDER_LENS_CONCURRENCY → parsed by providerOverride/lensConcurrency (cmd/gadfly/main.go:281-302). The provider key ollama-cloud matches modelProvider()'s output for bare cloud models. Claude Code models have no override entry, so they correctly fall back to GADFLY_LENS_CONCURRENCY scalar default (1) — consistent with the comment intent of not oversubscribing the Claude lane.
- timeout_secs: "600" → GADFLY_TIMEOUT_SECS → reviewWithSpecialist (main.go:310). Per-lens, matches the documented semantic.
- max_steps: "14" → GADFLY_MAX_STEPS → eng.runPass (main.go:315). Correct.
- timeout_minutes: 90 (number type) → jobs.review.timeout-minutes (:86). Valid.
allowed_users remains "" default in the reusable, and the self-review caller overrides it with "steve,fizi,dazed" (adversarial-review.yml:54); the removed timeout_minutes: 90 override is now inherited from the default (also 90), so no behavior change there. No regression.
The examples/reusable.yml cloud-only models: override correctly omits the claude-code models so it works with only OLLAMA_CLOUD_API_KEY — verified the override string is a strict subset of the default's cloud portion.

No correctness defects found in the change's lane.

🧹 Code cleanliness & maintainability — Minor issues

Both findings are confirmed against the actual code. The default-swarm description does appear twice in review-reusable.yml (header lines 15-19 and inputs-block lines 35-41), and the same description is repeated across the five named files. Note: the phrasing varies slightly between files (not strictly verbatim), but the substance of the duplication is real.

VERDICT: Minor issues

.gitea/workflows/review-reusable.yml:15-19 vs 35-41 — the default-swarm description ("3 cloud models + Claude Code engine sonnet/opus/opus:max, 5-lens suite, all three claudes concurrent, needs both keys") is stated twice in the same file: once in the top header comment and again in the on.workflow_call.inputs block comment. They say the same thing in slightly different words, so any change to the default (e.g. dropping a model, bumping concurrency) requires editing two prose blocks in one file, which drift easily. Consider trimming the inputs-block comment to a one-liner pointing at the header (or vice versa) so the canonical description lives in one place. Verified by reading the full file.
examples/reusable.yml:8-12 + review-reusable.yml:35-41 + adversarial-review.yml:6-7 + CLAUDE.md:49-53 + README.md:42-43 — the same default-swarm description is now copy-pasted across five locations (with minor wording variations, not strictly verbatim). Cross-file doc repetition is partly unavoidable, but the core phrasing ("3 cloud models + Claude Code engine (sonnet/opus/opus:max), 5-lens suite") recurs in each; if the default changes this is a multi-file edit prone to drift. Not blocking, but worth a shared note (e.g. a single examples/README.md table the others link to). Verified by reading all five files.

Otherwise the change is a clean, low-churn consolidation: the adversarial-review.yml caller correctly shed its duplicated config now that the reusable owns the defaults, and the timeout_minutes: 90 removal eliminates a previously-repeated value. The examples/reusable.yml cloud-only override is well-commented and consistent with the secrets it forwards.

⚡ Performance — No material issues found

Verdict: No material issues found

This is a workflow-input-defaults change in YAML (no code paths touched), so I verified the actual concurrency model against the entrypoint and binary before concluding.

Checked:

entrypoint.sh:144-218 — provider lanes run in parallel; within a lane, provider_cap bounds concurrent models; lens-level concurrency is separate (resolved per-model in the Go binary).
cmd/gadfly/main.go:281-302 — lensConcurrency reads GADFLY_PROVIDER_LENS_CONCURRENCY per provider, else falls back to GADFLY_LENS_CONCURRENCY (default 1). Confirmed the new default provider_lens_concurrency: "ollama-cloud=3" leaves claude-code lenses serial (1), while ollama-cloud fans out to 3.
cmd/gadfly/main.go:81-99 — image defaults (maxSteps=24, timeoutSecs=300, lensConcurrency=1) confirm the new defaults (14/600/3-vs-1) are deliberate tuning, not regressions.

Wall-clock reasoning for the new default swarm:

claude-code lane (3 models, provider_concurrency=3 → all concurrent; lenses serial): worst case 5 lenses × 600s = ~50 min per model, all 3 in parallel → ~50 min lane wall-clock. The 90-min timeout_minutes default accommodates this with margin. ✓
ollama-cloud lane (3 models, cap 3; lens_concurrency=3): bounded fan-out (3 models × 3 lenses = 9 concurrent), well within the 90-min budget since lenses batch 5/3 = 2 waves. ✓
Total concurrent model invocations bounded (~12), model list bounded (6), lenses bounded (5). No unbounded growth, no N+1, no quadratic behavior.

The increase from 3→5 lenses and the 45→90 min default are justified by the worst-case serial claude-code path and are consistent with the existing gadfly caller (which already used 90 min). No performance regression introduced.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

Through the error-handling / edge-cases lens, this diff only changes input default values and documentation; it touches no error-handling logic, no cleanup paths, and no boundary arithmetic. I verified the consuming code paths:

review-reusable.yml:120-130 — inputs flow straight into env vars; empty-string fallback is unchanged and the entrypoint still treats "" as "use image default" (entrypoint.sh:156).
entrypoint.sh:167-176 provider_cap parses ollama-cloud=3,claude-code=3 correctly: item%%=*/item#*= split yields claude-code→3, matching provider_of's first-path-segment rule (claude-code/opus:max → claude-code). No off-by-one; the [ "$cap" -ge 1 ] 2>/dev/null || cap=1 guard (entrypoint.sh:208) still clamps a malformed/zero value to 1.
entrypoint.sh:212-217 lane loop: inflight=$((inflight+1)) against cap=3 with ≤3 models per lane — no overflow, no overrun; wait -n 2>/dev/null || wait fallback intact.
cmd/gadfly/main.go:281-299 lensConcurrency/providerOverride: the new default provider_lens_concurrency="ollama-cloud=3" has no claude-code entry, so claude-code lenses fall back to the scalar default (1, serial). That's a deliberate config choice, not an unhandled case; strconv parse errors are ignored via err == nil && n > 0.

One pre-existing, non-blocking observation (not introduced by this PR, so not a finding against the diff): review_one swallows per-model failures (run.sh ... || log "model $1 failed (continuing)", entrypoint.sh:183). The new default swarm requires both OLLAMA_CLOUD_API_KEY and CLAUDE_CODE_OAUTH_TOKEN; a consumer that inherits the default while forwarding only one secret will get silent per-model failures for the un-authed lane rather than a clear abort. The PR documents this requirement and examples/reusable.yml ships a cloud-only models: override to avoid it, so the edge case is covered by guidance rather than by enforced validation — acceptable for an advisory-only workflow.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 30s}

### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> VERDICT: No material issues found Through the security lens, I checked the actual checked-out code rather than trusting the diff: - **Authz gate preserved.** `review-reusable.yml:52` keeps `allowed_users` default as `""` (not a new permissive default), and `GADFLY_ALLOWED_USERS` is still consumed by `entrypoint.sh:71-73` as the in-container belt-and-suspenders check. Both `adversarial-review.yml:54` and `examples/reusable.yml:69` set it explicitly. The comment-trigger actor gate in `adversarial-review.yml:36-41` is unchanged. No authn/authz regression. - **No new secret exposure.** Secrets handling is untouched: still explicit least-privilege forwarding via `workflow_call.secrets` (`review-reusable.yml:64-74`), no `secrets: inherit` introduced. The new default `models` value references `claude-code/*` providers, but that does not cause any additional secret to be read — `CLAUDE_CODE_OAUTH_TOKEN` was already mapped at `review-reusable.yml:110` and only resolves if the caller forwards it. A consumer who omits it simply gets an empty env (harmless fail), no leak. - **No injection surface added.** The new defaults are static string literals passed straight through to `GADFLY_*` env vars (`review-reusable.yml:120-130`) and consumed by the pinned image. They are not interpolated into shell, URLs, or templates within the workflow itself. Changing a default value does not expand the input→sink attack surface. - **The documented credential requirement** (default swarm needs both `OLLAMA_CLOUD_API_KEY` and `CLAUDE_CODE_OAUTH_TOKEN`) is a usability/correctness concern, not a security gap — missing credentials degrade to empty env, they don't bypass any check or leak data. The `examples/reusable.yml:62-68` cloud-only `models:` override correctly aligns with forwarding only `OLLAMA_CLOUD_API_KEY`. Nothing in the security lane is materially wrong. </details> <details><summary>🎯 Correctness — No material issues found</summary> VERDICT: No material issues found - Verified the new default values thread correctly through the system: - `models` default (`review-reusable.yml:43`) → `GADFLY_MODELS` env (`:120`) → consumed as `MODELS` in `entrypoint.sh:156`. The 6 model ids parse fine, and `provider_of` (`entrypoint.sh:159`) splits them into lanes: `ollama-cloud` (3 models) and `claude-code` (3 models). - `provider_concurrency: "ollama-cloud=3,claude-code=3"` (`:47`) → `GADFLY_PROVIDER_CONCURRENCY` → parsed by `provider_cap` (`entrypoint.sh:167-176`). Both keys match the lane names derived from the model specs, so caps resolve to 3/3 as intended. Comment "all 3 claudes at once" is accurate (3 claude-code models, cap 3). - `provider_lens_concurrency: "ollama-cloud=3"` (`:48`) → `GADFLY_PROVIDER_LENS_CONCURRENCY` → parsed by `providerOverride`/`lensConcurrency` (`cmd/gadfly/main.go:281-302`). The provider key `ollama-cloud` matches `modelProvider()`'s output for bare cloud models. Claude Code models have no override entry, so they correctly fall back to `GADFLY_LENS_CONCURRENCY` scalar default (1) — consistent with the comment intent of not oversubscribing the Claude lane. - `timeout_secs: "600"` → `GADFLY_TIMEOUT_SECS` → `reviewWithSpecialist` (`main.go:310`). Per-lens, matches the documented semantic. - `max_steps: "14"` → `GADFLY_MAX_STEPS` → `eng.runPass` (`main.go:315`). Correct. - `timeout_minutes: 90` (number type) → `jobs.review.timeout-minutes` (`:86`). Valid. - `allowed_users` remains `""` default in the reusable, and the self-review caller overrides it with `"steve,fizi,dazed"` (`adversarial-review.yml:54`); the removed `timeout_minutes: 90` override is now inherited from the default (also 90), so no behavior change there. No regression. - The `examples/reusable.yml` cloud-only `models:` override correctly omits the claude-code models so it works with only `OLLAMA_CLOUD_API_KEY` — verified the override string is a strict subset of the default's cloud portion. No correctness defects found in the change's lane. </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> Both findings are confirmed against the actual code. The default-swarm description does appear twice in `review-reusable.yml` (header lines 15-19 and inputs-block lines 35-41), and the same description is repeated across the five named files. Note: the phrasing varies slightly between files (not strictly verbatim), but the substance of the duplication is real. VERDICT: Minor issues - **`.gitea/workflows/review-reusable.yml:15-19` vs `35-41`** — the default-swarm description ("3 cloud models + Claude Code engine sonnet/opus/opus:max, 5-lens suite, all three claudes concurrent, needs both keys") is stated twice in the *same file*: once in the top header comment and again in the `on.workflow_call.inputs` block comment. They say the same thing in slightly different words, so any change to the default (e.g. dropping a model, bumping concurrency) requires editing two prose blocks in one file, which drift easily. Consider trimming the inputs-block comment to a one-liner pointing at the header (or vice versa) so the canonical description lives in one place. Verified by reading the full file. - **`examples/reusable.yml:8-12` + `review-reusable.yml:35-41` + `adversarial-review.yml:6-7` + `CLAUDE.md:49-53` + `README.md:42-43`** — the same default-swarm description is now copy-pasted across five locations (with minor wording variations, not strictly verbatim). Cross-file doc repetition is partly unavoidable, but the core phrasing ("3 cloud models + Claude Code engine (sonnet/opus/opus:max), 5-lens suite") recurs in each; if the default changes this is a multi-file edit prone to drift. Not blocking, but worth a shared note (e.g. a single `examples/README.md` table the others link to). Verified by reading all five files. Otherwise the change is a clean, low-churn consolidation: the `adversarial-review.yml` caller correctly shed its duplicated config now that the reusable owns the defaults, and the `timeout_minutes: 90` removal eliminates a previously-repeated value. The `examples/reusable.yml` cloud-only override is well-commented and consistent with the secrets it forwards. </details> <details><summary>⚡ Performance — No material issues found</summary> **Verdict: No material issues found** This is a workflow-input-defaults change in YAML (no code paths touched), so I verified the actual concurrency model against the entrypoint and binary before concluding. Checked: - `entrypoint.sh:144-218` — provider lanes run in parallel; within a lane, `provider_cap` bounds concurrent models; lens-level concurrency is separate (resolved per-model in the Go binary). - `cmd/gadfly/main.go:281-302` — `lensConcurrency` reads `GADFLY_PROVIDER_LENS_CONCURRENCY` per provider, else falls back to `GADFLY_LENS_CONCURRENCY` (default 1). Confirmed the new default `provider_lens_concurrency: "ollama-cloud=3"` leaves claude-code lenses serial (1), while ollama-cloud fans out to 3. - `cmd/gadfly/main.go:81-99` — image defaults (`maxSteps=24`, `timeoutSecs=300`, `lensConcurrency=1`) confirm the new defaults (14/600/3-vs-1) are deliberate tuning, not regressions. Wall-clock reasoning for the new default swarm: - **claude-code lane** (3 models, `provider_concurrency=3` → all concurrent; lenses serial): worst case 5 lenses × 600s = ~50 min per model, all 3 in parallel → ~50 min lane wall-clock. The 90-min `timeout_minutes` default accommodates this with margin. ✓ - **ollama-cloud lane** (3 models, cap 3; `lens_concurrency=3`): bounded fan-out (3 models × 3 lenses = 9 concurrent), well within the 90-min budget since lenses batch 5/3 = 2 waves. ✓ - Total concurrent model invocations bounded (~12), model list bounded (6), lenses bounded (5). No unbounded growth, no N+1, no quadratic behavior. The increase from 3→5 lenses and the 45→90 min default are justified by the worst-case serial claude-code path and are consistent with the existing gadfly caller (which already used 90 min). No performance regression introduced. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **Verdict: No material issues found** Through the error-handling / edge-cases lens, this diff only changes input **default values** and documentation; it touches no error-handling logic, no cleanup paths, and no boundary arithmetic. I verified the consuming code paths: - `review-reusable.yml:120-130` — inputs flow straight into env vars; empty-string fallback is unchanged and the entrypoint still treats `""` as "use image default" (`entrypoint.sh:156`). - `entrypoint.sh:167-176` `provider_cap` parses `ollama-cloud=3,claude-code=3` correctly: `item%%=*`/`item#*=` split yields `claude-code`→`3`, matching `provider_of`'s first-path-segment rule (`claude-code/opus:max` → `claude-code`). No off-by-one; the `[ "$cap" -ge 1 ] 2>/dev/null || cap=1` guard (`entrypoint.sh:208`) still clamps a malformed/zero value to 1. - `entrypoint.sh:212-217` lane loop: `inflight=$((inflight+1))` against `cap=3` with ≤3 models per lane — no overflow, no overrun; `wait -n 2>/dev/null || wait` fallback intact. - `cmd/gadfly/main.go:281-299` `lensConcurrency`/`providerOverride`: the new default `provider_lens_concurrency="ollama-cloud=3"` has no `claude-code` entry, so claude-code lenses fall back to the scalar default (1, serial). That's a deliberate config choice, not an unhandled case; `strconv` parse errors are ignored via `err == nil && n > 0`. One pre-existing, non-blocking observation (not introduced by this PR, so not a finding against the diff): `review_one` swallows per-model failures (`run.sh ... || log "model $1 failed (continuing)"`, `entrypoint.sh:183`). The new default swarm requires **both** `OLLAMA_CLOUD_API_KEY` and `CLAUDE_CODE_OAUTH_TOKEN`; a consumer that inherits the default while forwarding only one secret will get silent per-model failures for the un-authed lane rather than a clear abort. The PR documents this requirement and `examples/reusable.yml` ships a cloud-only `models:` override to avoid it, so the edge case is covered by guidance rather than by enforced validation — acceptable for an advisory-only workflow. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 30s

steve added 1 commit 2026-06-28 02:08:49 +00:00

feat(reusable): run claude lenses concurrently too (claude-code=5)

Build & push image / build-and-push (pull_request) Successful in 6s

Details

6407c0defe

provider_lens_concurrency default adds claude-code=5 so each claude model
runs all 5 lenses at once (was serial — defaultLensConcurrency=1). The engine
already supports it: each lens is an independent read-only 'claude -p' (plan
mode) in its own process group; no shared mutable state beyond ~/.claude + the
API. Peak claude concurrency is now 3 models x 5 lenses = up to 15 concurrent
claude -p per pass — documented, dial back either knob if rate-limited.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve added 1 commit 2026-06-28 02:12:18 +00:00

tune(reusable): claude-code=1 model x 5 lenses (cap peak at 5 concurrent)

Build & push image / build-and-push (pull_request) Successful in 7s

Details

f882b006d1

Run claude models one at a time (provider_concurrency claude-code=1) but each
with all 5 lenses concurrent (provider_lens_concurrency claude-code=5) — peak 5
concurrent claude -p per pass instead of 15, friendlier to one subscription.
Updated all the 'three claudes at once' wording across the workflow + docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve referenced this issue from a commit

2026-06-28 02:18:44 +00:00

docs: reconcile examples/README + CLAUDE.md with the heavier reusable default

steve added 1 commit 2026-06-28 02:18:44 +00:00

docs: reconcile examples/README + CLAUDE.md with the heavier reusable default

Build & push image / build-and-push (pull_request) Successful in 6s

Details

4654036dea

From PR #10's own review (maintainability/perf lenses): examples/README.md
hadn't been updated for the default swarm, and CLAUDE.md's 'keep the default
model count low' cost guidance read as contradicting the new heavy default.
Clarify that the IMAGE default stays minimal while the REUSABLE ships an
opinionated heavier default consumers inherit/override.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve merged commit b02b11d691 into main

2026-06-28 02:23:40 +00:00

steve deleted branch feat/default-swarm-config

2026-06-28 02:23:40 +00:00

steve referenced this issue from a commit

2026-06-28 02:26:15 +00:00

ci: inherit gadfly's default swarm (slim caller, re-pin @b02b11d)

steve referenced this issue from a commit

2026-06-28 02:26:15 +00:00

ci: inherit gadfly's default swarm (slim caller, re-pin @b02b11d)

steve referenced this issue from a commit

2026-06-28 02:26:15 +00:00