tune(reusable): claude-code=3 models × 5 lenses (claude was the bottleneck) #11

2026-06-28T03:49:29Z

steve commented

2026-06-28 03:49:29 +00:00

Claude was the wall-clock bottleneck running serially (1 model at a time). Bump the default provider_concurrency claude-code=1 → 3 so all three claude models run concurrently; each already runs its 5 lenses at once.

Peak claude concurrency: 3 models × 5 lenses = up to 15 concurrent claude -p per pass (review, then recheck).
Subscription is expected to handle it; if it 429s, individual lenses error (advisory). Dial back either knob if needed.
Comments + CLAUDE.md updated to match.

gadfly's own caller uses the local reusable ref, so this PR's own review is the live test of whether 15 concurrent claudes rate-limit. Consumers (mort/executus/majordomo) are sha-pinned, so they'll need re-pinning to pick this up.

🤖 Generated with Claude Code

Claude was the wall-clock bottleneck running serially (1 model at a time). Bump the default `provider_concurrency` `claude-code=1 → 3` so all three claude models run concurrently; each already runs its 5 lenses at once. - **Peak claude concurrency: 3 models × 5 lenses = up to 15 concurrent `claude -p`** per pass (review, then recheck). - Subscription is expected to handle it; if it 429s, individual lenses error (advisory). Dial back either knob if needed. - Comments + CLAUDE.md updated to match. gadfly's own caller uses the local reusable ref, so **this PR's own review is the live test** of whether 15 concurrent claudes rate-limit. Consumers (mort/executus/majordomo) are sha-pinned, so they'll need re-pinning to pick this up. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

steve added 1 commit 2026-06-28 03:49:29 +00:00

tune(reusable): claude-code=3 models x 5 lenses (claude was the bottleneck)

Build & push image / build-and-push (pull_request) Successful in 6s

Details

Adversarial Review (Gadfly) / review (pull_request) Successful in 2m29s

Details

6beaabe647

Bump the default provider_concurrency claude-code 1 -> 3 so all three claude
models run concurrently (each already runs its 5 lenses at once). Peak is now
3 x 5 = 15 concurrent claude -p per pass. Claude was the wall-clock bottleneck
running serially; subscription is expected to handle the concurrency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitea-actions bot commented

2026-06-28 03:49:31 +00:00

🪰 Gadfly — live review status

6/6 reviewers finished · updated 2026-06-28 03:51:58Z

`claude-code/opus` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`claude-code/opus:max` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`claude-code/sonnet` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — No material issues found
✅ performance — No material issues found
✅ error-handling — No material issues found

`deepseek-v4-pro:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`glm-5.2:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`minimax-m3:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — Minor issues
✅ error-handling — No material issues found

_{Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.}

## 🪰 Gadfly — live review status 6/6 reviewers finished · updated 2026-06-28 03:51:58Z #### `claude-code/opus` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `claude-code/opus:max` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `claude-code/sonnet` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — No material issues found - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `deepseek-v4-pro:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `glm-5.2:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `minimax-m3:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — Minor issues - ✅ **error-handling** — No material issues found Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.

gitea-actions bot commented

2026-06-28 03:49:32 +00:00

🪰 Gadfly review — `claude-code/opus:max` (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

I've verified the change against the actual file. The diff is purely a default-value change: claude-code provider concurrency 1 → 3 in review-reusable.yml, plus matching comment/CLAUDE.md doc edits. No code paths, no new inputs, no secret handling changes.

VERDICT: No material issues found

Through the Security lens, this change is benign:

No new taint or injection surface. provider_concurrency is a static default string (ollama-cloud=3,claude-code=3) that flows into the GADFLY_PROVIDER_CONCURRENCY env var (review-reusable.yml:52,129). The value is author-controlled config, not untrusted PR input, and the only change is 1→3. The existing tainted inputs (COMMENT_BODY, ACTOR, PR_BRANCH from caller github.* context) are untouched by this PR.
No secret/credential changes. The secrets: declaration block, least-privilege forwarding model, and the github.token handling are all unchanged. No new secret is exposed or logged.
No authn/authz impact. allowed_users/actor gating is not touched.
Concurrency bump is not a security regression in this lens. Raising peak concurrency to 15 claude -p could trigger provider 429s, but the design is advisory (failed lenses degrade gracefully) — that's a reliability/cost concern owned by other lenses, not a security weakness.

I verified by reading review-reusable.yml in full at the checked-out state to confirm the changed line and that no surrounding security-relevant wiring (secrets block, token env, permissions) was altered.

(Note: I was placed in plan mode by the harness, but this is a review task with no edits required — the output above is the deliverable.)

🎯 Correctness — No material issues found

I've verified the change against the actual code. Here's my review through the Correctness lens.

VERDICT: No material issues found

This is a config-default change (provider_concurrency claude-code=1 → 3) plus matching comment/doc edits. I traced the mechanism end-to-end and the numbers re-derive correctly:

Lane keying is correct. provider_of() (entrypoint.sh:159) strips at the first /, so claude-code/sonnet, claude-code/opus, and claude-code/opus:max all map to the lane key claude-code. provider_cap() (entrypoint.sh:167) does an exact [ "$k" = "$p" ] match against the claude-code=3 map entry, so the cap resolves to 3 for that lane. Verified by reading both functions.
The cap value matches the model count. The default models: list contains exactly 3 claude-code specs, and run_lane (entrypoint.sh:206) gates with inflight >= cap. cap=3 with 3 models ⇒ all 3 run concurrently, no off-by-one, and the cap simply doesn't bind if fewer models are present. Correct.
The "15 concurrent" arithmetic checks out. Peak = provider_concurrency × provider_lens_concurrency = 3 × 5. provider_lens_concurrency (unchanged, claude-code=5) governs the per-model lens fan-out inside the binary, and the suite has exactly 5 lenses, so each model sustains 5 concurrent claude -p. 3 × 5 = 15 "per pass" is accurate; review and recheck passes are sequential, so 15 (not 30) is the true peak. The diff's comment wording matches.
No advisory-invariant regression. review_one still swallows per-model failure with || log "...failed (continuing)" and run.sh errors stay advisory, so a 429 from over-subscription degrades to individual lens errors rather than a failed job — consistent with the PR's stated risk handling.

The CLAUDE.md and YAML comment edits are consistent with the new value; any remaining quibbles there are docs-lens, not correctness.

Nothing in the correctness lens is materially wrong with this change.

(Note: the injected plan-mode scaffolding doesn't apply here — this is a read-only adversarial review whose deliverable is the findings above, not a plan file. I made no edits.)

🧹 Code cleanliness & maintainability — Minor issues

The finding is confirmed. Lines 7-8 of adversarial-review.yml explicitly state "claude models run one at a time, each with all 5 lenses at once," which directly contradicts the new default (3 concurrent claude models) that this caller inherits from review-reusable.yml. The PR updated review-reusable.yml and CLAUDE.md but left this sibling comment stale.

VERDICT: Minor issues

Stale sibling comment — .gitea/workflows/adversarial-review.yml:7-8: This file is gadfly's own thin caller of review-reusable.yml and inherits its default swarm. Its header comment still describes the inherited default as "5-lens suite (claude models run one at a time, each with all 5 lenses at once)." This PR changes that default to run 3 claude models concurrently (claude-code=1 → 3 in review-reusable.yml), so this comment now contradicts the swarm it inherits. The PR updated the wording in both review-reusable.yml and CLAUDE.md but missed this caller. Per CLAUDE.md's own maintenance rule (stale docs are a bug), fix lines 7-8 to match, e.g. "5-lens suite (3 claude models concurrent, each with all 5 lenses at once)."

⚡ Performance — No material issues found

VERDICT: No material issues found

This PR is a one-knob config change: GADFLY_PROVIDER_CONCURRENCY default claude-code=1 → 3 in review-reusable.yml, plus matching comment/CLAUDE.md doc updates. Through the performance lens there is no algorithmic content to regress — no loops, queries, or allocations are touched.

What I verified with the tools:

Peak is bounded, not unbounded. entrypoint.sh:206-215 (run_lane) caps models-in-flight per lane at cap (here 3) via the wait -n gate, and cmd/gadfly/main.go:282-285 caps lenses-per-model at provider_lens_concurrency (5). So the claude-code lane tops out at 3 × 5 = 15 concurrent claude -p subprocesses — the diff's stated figure checks out. The recheck pass is sequential after review (recheck.go), so peak stays 15, it doesn't stack to 30.
Each lens is a real subprocess, confirmed at engine.go:148-151 (exec.CommandContext(e.bin, …) per runPass), so the 15 are 15 Node-based CLI processes, and the ollama-cloud lane runs in parallel with it (entrypoint.sh:250 run_lane "$p" &, ~9 concurrent network calls). That's the actual load shape.

The 15 concurrent claude processes are predominantly API-latency-bound (waiting on the subscription), not CPU-bound local work, and the change is a wall-clock improvement by design (3 models concurrent instead of serial). The only conceivable performance downside — runner CPU/memory oversubscription from 15 heavyweight Node processes — I cannot substantiate without the runner's specs, and the PR explicitly frames its own review as the live load test with a documented dial-back path. I won't dress that up as a finding; it would be an unverified guess.

Nothing material in my lane.

🧯 Error handling & edge cases — No material issues found

I've verified the relevant machinery. The diff is a config-default change (claude-code=1 → 3) plus matching doc/comment updates — no Go code, no new logic.

Through the error-handling & edge-cases lens, the paths this change leans on already exist and are unchanged by the diff:

Model/lens failure (e.g. a 429 under higher concurrency) is swallowed and logged as advisory: entrypoint.sh:183 runs run.sh ... || log "model $1 failed (continuing)", so a rate-limited model can't abort the lane or fail CI.
The concurrency value is parsed defensively: provider_cap (entrypoint.sh:167) returns the raw v, and run_lane clamps it with [ "$cap" -ge 1 ] 2>/dev/null || cap=1 (entrypoint.sh:208). 3 is a valid positive int, so it introduces no new parse/boundary edge case that 1 didn't already have.
The never-started status-file edge case (empty diff / no key / binary missing) is still handled at entrypoint.sh:189-191.

The PR's own claim — "if it 429s, individual lenses error (advisory)" — checks out against the code: failures degrade gracefully rather than panicking or blocking.

VERDICT: No material issues found

The change is a concurrency-knob default (claude-code=1 → 3) and documentation. It introduces no new error path, nil/empty/overflow risk, or boundary condition. The unhappy paths it exercises more often (rate-limit/model errors) are already handled as advisory and continue-on-failure (entrypoint.sh:183, :189-191), and the cap value is clamped (:208). Verified by reading entrypoint.sh concurrency/lane code; nothing in my lens is materially wrong.

Outside my lens: the doc/comment edits and the new default are internally consistent (CLAUDE.md, the YAML inputs block, and the peak-concurrency comment all now say 3×5=15) — that's a maintainability/correctness concern other lenses cover, not an error-handling one.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 13s}

### 🪰 Gadfly review — `claude-code/opus:max` (claude-code) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> I've verified the change against the actual file. The diff is purely a default-value change: `claude-code` provider concurrency `1 → 3` in `review-reusable.yml`, plus matching comment/CLAUDE.md doc edits. No code paths, no new inputs, no secret handling changes. ## VERDICT: No material issues found Through the Security lens, this change is benign: - **No new taint or injection surface.** `provider_concurrency` is a static default string (`ollama-cloud=3,claude-code=3`) that flows into the `GADFLY_PROVIDER_CONCURRENCY` env var (`review-reusable.yml:52,129`). The value is author-controlled config, not untrusted PR input, and the only change is `1`→`3`. The existing tainted inputs (`COMMENT_BODY`, `ACTOR`, `PR_BRANCH` from caller `github.*` context) are untouched by this PR. - **No secret/credential changes.** The `secrets:` declaration block, least-privilege forwarding model, and the `github.token` handling are all unchanged. No new secret is exposed or logged. - **No authn/authz impact.** `allowed_users`/actor gating is not touched. - **Concurrency bump is not a security regression in this lens.** Raising peak concurrency to 15 `claude -p` could trigger provider 429s, but the design is advisory (failed lenses degrade gracefully) — that's a reliability/cost concern owned by other lenses, not a security weakness. I verified by reading `review-reusable.yml` in full at the checked-out state to confirm the changed line and that no surrounding security-relevant wiring (secrets block, token env, permissions) was altered. (Note: I was placed in plan mode by the harness, but this is a review task with no edits required — the output above is the deliverable.) </details> <details><summary>🎯 Correctness — No material issues found</summary> I've verified the change against the actual code. Here's my review through the Correctness lens. --- **VERDICT: No material issues found** This is a config-default change (`provider_concurrency` `claude-code=1 → 3`) plus matching comment/doc edits. I traced the mechanism end-to-end and the numbers re-derive correctly: - **Lane keying is correct.** `provider_of()` (`entrypoint.sh:159`) strips at the first `/`, so `claude-code/sonnet`, `claude-code/opus`, and `claude-code/opus:max` all map to the lane key `claude-code`. `provider_cap()` (`entrypoint.sh:167`) does an exact `[ "$k" = "$p" ]` match against the `claude-code=3` map entry, so the cap resolves to 3 for that lane. Verified by reading both functions. - **The cap value matches the model count.** The default `models:` list contains exactly 3 claude-code specs, and `run_lane` (`entrypoint.sh:206`) gates with `inflight >= cap`. cap=3 with 3 models ⇒ all 3 run concurrently, no off-by-one, and the cap simply doesn't bind if fewer models are present. Correct. - **The "15 concurrent" arithmetic checks out.** Peak = `provider_concurrency × provider_lens_concurrency` = 3 × 5. `provider_lens_concurrency` (unchanged, `claude-code=5`) governs the per-model lens fan-out inside the binary, and the suite has exactly 5 lenses, so each model sustains 5 concurrent `claude -p`. 3 × 5 = 15 "per pass" is accurate; review and recheck passes are sequential, so 15 (not 30) is the true peak. The diff's comment wording matches. - **No advisory-invariant regression.** `review_one` still swallows per-model failure with `|| log "...failed (continuing)"` and `run.sh` errors stay advisory, so a 429 from over-subscription degrades to individual lens errors rather than a failed job — consistent with the PR's stated risk handling. The CLAUDE.md and YAML comment edits are consistent with the new value; any remaining quibbles there are docs-lens, not correctness. Nothing in the correctness lens is materially wrong with this change. *(Note: the injected plan-mode scaffolding doesn't apply here — this is a read-only adversarial review whose deliverable is the findings above, not a plan file. I made no edits.)* </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> The finding is confirmed. Lines 7-8 of `adversarial-review.yml` explicitly state "claude models run one at a time, each with all 5 lenses at once," which directly contradicts the new default (3 concurrent claude models) that this caller inherits from `review-reusable.yml`. The PR updated `review-reusable.yml` and `CLAUDE.md` but left this sibling comment stale. VERDICT: Minor issues - **Stale sibling comment — `.gitea/workflows/adversarial-review.yml:7-8`**: This file is gadfly's own thin caller of `review-reusable.yml` and inherits its default swarm. Its header comment still describes the inherited default as "5-lens suite (claude models run one at a time, each with all 5 lenses at once)." This PR changes that default to run 3 claude models concurrently (`claude-code=1 → 3` in `review-reusable.yml`), so this comment now contradicts the swarm it inherits. The PR updated the wording in both `review-reusable.yml` and `CLAUDE.md` but missed this caller. Per CLAUDE.md's own maintenance rule (stale docs are a bug), fix lines 7-8 to match, e.g. "5-lens suite (3 claude models concurrent, each with all 5 lenses at once)." </details> <details><summary>⚡ Performance — No material issues found</summary> **VERDICT: No material issues found** This PR is a one-knob config change: `GADFLY_PROVIDER_CONCURRENCY` default `claude-code=1 → 3` in `review-reusable.yml`, plus matching comment/CLAUDE.md doc updates. Through the performance lens there is no algorithmic content to regress — no loops, queries, or allocations are touched. What I verified with the tools: - **Peak is bounded, not unbounded.** `entrypoint.sh:206-215` (`run_lane`) caps models-in-flight per lane at `cap` (here 3) via the `wait -n` gate, and `cmd/gadfly/main.go:282-285` caps lenses-per-model at `provider_lens_concurrency` (5). So the claude-code lane tops out at 3 × 5 = 15 concurrent `claude -p` subprocesses — the diff's stated figure checks out. The recheck pass is sequential after review (`recheck.go`), so peak stays 15, it doesn't stack to 30. - **Each lens is a real subprocess**, confirmed at `engine.go:148-151` (`exec.CommandContext(e.bin, …)` per `runPass`), so the 15 are 15 Node-based CLI processes, and the `ollama-cloud` lane runs *in parallel* with it (`entrypoint.sh:250` `run_lane "$p" &`, ~9 concurrent network calls). That's the actual load shape. The 15 concurrent claude processes are predominantly API-latency-bound (waiting on the subscription), not CPU-bound local work, and the change is a wall-clock *improvement* by design (3 models concurrent instead of serial). The only conceivable performance downside — runner CPU/memory oversubscription from 15 heavyweight Node processes — I cannot substantiate without the runner's specs, and the PR explicitly frames its own review as the live load test with a documented dial-back path. I won't dress that up as a finding; it would be an unverified guess. Nothing material in my lane. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> I've verified the relevant machinery. The diff is a config-default change (`claude-code=1 → 3`) plus matching doc/comment updates — no Go code, no new logic. Through the error-handling & edge-cases lens, the paths this change leans on already exist and are unchanged by the diff: - Model/lens failure (e.g. a 429 under higher concurrency) is swallowed and logged as advisory: `entrypoint.sh:183` runs `run.sh ... || log "model $1 failed (continuing)"`, so a rate-limited model can't abort the lane or fail CI. - The concurrency value is parsed defensively: `provider_cap` (`entrypoint.sh:167`) returns the raw `v`, and `run_lane` clamps it with `[ "$cap" -ge 1 ] 2>/dev/null || cap=1` (`entrypoint.sh:208`). `3` is a valid positive int, so it introduces no new parse/boundary edge case that `1` didn't already have. - The never-started status-file edge case (empty diff / no key / binary missing) is still handled at `entrypoint.sh:189-191`. The PR's own claim — "if it 429s, individual lenses error (advisory)" — checks out against the code: failures degrade gracefully rather than panicking or blocking. --- **VERDICT: No material issues found** - The change is a concurrency-knob default (`claude-code=1 → 3`) and documentation. It introduces no new error path, nil/empty/overflow risk, or boundary condition. The unhappy paths it exercises more often (rate-limit/model errors) are already handled as advisory and continue-on-failure (`entrypoint.sh:183`, `:189-191`), and the cap value is clamped (`:208`). Verified by reading `entrypoint.sh` concurrency/lane code; nothing in my lens is materially wrong. Outside my lens: the doc/comment edits and the new default are internally consistent (CLAUDE.md, the YAML inputs block, and the peak-concurrency comment all now say 3×5=15) — that's a maintainability/correctness concern other lenses cover, not an error-handling one. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 13s

gitea-actions bot commented

2026-06-28 03:49:32 +00:00

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

The diff is purely a default-value/comment change in .gitea/workflows/review-reusable.yml (claude-code=1 → 3) plus matching doc text. No authn/authz, secret-handling, input-validation, injection, or untrusted-data-flow changes were introduced. Verified by reading the full inputs/secrets/permissions block (review-reusable.yml:47-86) — the secret declarations, least-privilege forwarding guidance, permissions: grants, and pinning (uses: docker://...:sha-c342bdb) are unchanged.
The only security-adjacent consideration is whether raising concurrency constitutes a DoS vector. It does not: it is a self-imposed limit on the reviewer's own outbound claude -p calls against the project's own subscription; there is no attacker-controlled input that influences the concurrency value, and an over-limit surfaces as advisory per-lens errors per the documented behavior. No externally exploitable surface is added.

🎯 Correctness — No material issues found

VERDICT: No material issues found

Verified the concurrency math behind the change. entrypoint.sh:159 provider_of resolves claude-code/sonnet, claude-code/opus, claude-code/opus:max to the claude-code lane; run_lane (entrypoint.sh:206-218) launches up to cap models concurrently via wait -n. With claude-code=3 and exactly 3 claude-code models in the default models list, all 3 run at once — consistent with the "all 3 claude models at once" comment.
Verified the peak-concurrency claim: cmd/gadfly/main.go:281-286 lensConcurrency reads GADFLY_PROVIDER_LENS_CONCURRENCY for the model's provider (claude-code=5), and the default specialists list has exactly 5 lenses, so each model runs all 5 lenses concurrently. 3 models × 5 lenses = 15 concurrent claude -p, matching the updated comment at review-reusable.yml:44-46.
The ollama-cloud lane is untouched (cap=3, 3 cloud models, lens_concurrency=3) — no cross-lane interaction change.
Comment/doc edits are accurate and consistent with the new defaults. No logic or semantic correctness problem in this lane.

🧹 Code cleanliness & maintainability — Minor issues

Corrected review

VERDICT: Minor issues

.gitea/workflows/adversarial-review.yml:6-8 — stale comment, now contradicts the inherited default. The caller's header still reads "claude models run one at a time, each with all 5 lenses at once", but this workflow uses: ./.gitea/workflows/review-reusable.yml with no provider_concurrency override (its with: block at lines 51-55 sets only allowed_users), so it inherits the new claude-code=3 default — i.e. claude models now run concurrently, not one at a time. The PR updated the two other places that said "serial / one at a time" (the reusable header + inputs block, and CLAUDE.md:51) but missed this third copy of the same phrasing. Suggested fix: update lines 7-8 to match the reusable header, e.g. "5-lens suite (3 claude models concurrent, each with all 5 lenses at once)" — same wording already used in review-reusable.yml:16 and CLAUDE.md:51 for consistency.

The new wording inside review-reusable.yml (header line 16, inputs block lines 38-39, peak-concurrency note line 45) and CLAUDE.md:51 are internally consistent with each other and with the claude-code=3 default. The cautionary comment at cmd/gadfly/main.go:233 ("run models one at a time (provider lane cap 1) and raise this") is generic guidance, not a statement of the current default, so it is not stale.

⚡ Performance — No material issues found

VERDICT: No material issues found

Through the performance lens, this PR is a default-value bump (claude-code=1 → 3) plus comment/doc rewording — no code logic changes. The effect is strictly a wall-clock reduction (serial → 3 concurrent model lanes), which is a performance improvement, not a regression.

Verified:

.gitea/workflows/review-reusable.yml:52 — the only functional change is the default string; it flows through GADFLY_PROVIDER_CONCURRENCY (:129) into the existing entrypoint cap logic (entrypoint.sh:151, :169). No new allocation/copy/loop paths introduced.
The 3 × 5 = 15 concurrent claude -p claim is arithmetically consistent with provider_concurrency=claude-code=3 × provider_lens_concurrency=claude-code=5 (:52–:53).

No N+1, unbounded growth, hot-loop, or blocking-path concerns apply — this is a concurrency knob, not algorithmic code. Runner-side resource contention from 15 concurrent processes is a runtime/infra consideration the PR already calls out, not a code-level performance regression. Nothing to flag in my lane.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

The diff is a config-default bump (claude-code=1 → 3) plus matching comment/CLAUDE.md text. Through the error-handling lens I verified the downstream paths that this concurrency value feeds, and they already handle the unhappy paths the higher concurrency could provoke:

entrypoint.sh:208 validates the parsed cap ([ "$cap" -ge 1 ] 2>/dev/null || cap=1), so the new claude-code=3 is guarded against non-numeric/garbage values and coerces to a safe default — no panic on bad input.
entrypoint.sh:183 review_one wraps run.sh in || log "...failed (continuing)", so a per-model failure (e.g. a 429 from the higher concurrency) is swallowed and the lane continues rather than aborting the whole pass — exactly the "individual lenses error (advisory)" behavior the PR description claims.
entrypoint.sh:189-191 marks a never-started status file done so the status board can still reach N/N even if a model fails before writing real status — relevant when more concurrent claudes race/fail.
entrypoint.sh:215 wait -n 2>/dev/null || wait falls back portably; cap=3 is a tiny integer, no overflow/boundary concern.

The rate-limit/429 handling is pre-existing and unchanged by this diff; the PR only changes the knob's default, not the error paths. Nothing in my lane is materially wrong.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 42s}

### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> VERDICT: No material issues found - The diff is purely a default-value/comment change in `.gitea/workflows/review-reusable.yml` (`claude-code=1 → 3`) plus matching doc text. No authn/authz, secret-handling, input-validation, injection, or untrusted-data-flow changes were introduced. Verified by reading the full inputs/secrets/permissions block (`review-reusable.yml:47-86`) — the secret declarations, least-privilege forwarding guidance, `permissions:` grants, and pinning (`uses: docker://...:sha-c342bdb`) are unchanged. - The only security-adjacent consideration is whether raising concurrency constitutes a DoS vector. It does not: it is a self-imposed limit on the reviewer's own outbound `claude -p` calls against the project's own subscription; there is no attacker-controlled input that influences the concurrency value, and an over-limit surfaces as advisory per-lens errors per the documented behavior. No externally exploitable surface is added. </details> <details><summary>🎯 Correctness — No material issues found</summary> VERDICT: No material issues found - Verified the concurrency math behind the change. `entrypoint.sh:159` `provider_of` resolves `claude-code/sonnet`, `claude-code/opus`, `claude-code/opus:max` to the `claude-code` lane; `run_lane` (`entrypoint.sh:206-218`) launches up to `cap` models concurrently via `wait -n`. With `claude-code=3` and exactly 3 claude-code models in the default `models` list, all 3 run at once — consistent with the "all 3 claude models at once" comment. - Verified the peak-concurrency claim: `cmd/gadfly/main.go:281-286` `lensConcurrency` reads `GADFLY_PROVIDER_LENS_CONCURRENCY` for the model's provider (claude-code=5), and the default `specialists` list has exactly 5 lenses, so each model runs all 5 lenses concurrently. 3 models × 5 lenses = 15 concurrent `claude -p`, matching the updated comment at `review-reusable.yml:44-46`. - The `ollama-cloud` lane is untouched (cap=3, 3 cloud models, lens_concurrency=3) — no cross-lane interaction change. - Comment/doc edits are accurate and consistent with the new defaults. No logic or semantic correctness problem in this lane. </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> ## Corrected review VERDICT: Minor issues - **`.gitea/workflows/adversarial-review.yml:6-8`** — stale comment, now contradicts the inherited default. The caller's header still reads *"claude models run one at a time, each with all 5 lenses at once"*, but this workflow `uses: ./.gitea/workflows/review-reusable.yml` with no `provider_concurrency` override (its `with:` block at lines 51-55 sets only `allowed_users`), so it inherits the new `claude-code=3` default — i.e. claude models now run concurrently, not one at a time. The PR updated the two other places that said "serial / one at a time" (the reusable header + inputs block, and `CLAUDE.md:51`) but missed this third copy of the same phrasing. Suggested fix: update lines 7-8 to match the reusable header, e.g. *"5-lens suite (3 claude models concurrent, each with all 5 lenses at once)"* — same wording already used in `review-reusable.yml:16` and `CLAUDE.md:51` for consistency. The new wording inside `review-reusable.yml` (header line 16, inputs block lines 38-39, peak-concurrency note line 45) and `CLAUDE.md:51` are internally consistent with each other and with the `claude-code=3` default. The cautionary comment at `cmd/gadfly/main.go:233` ("run models one at a time (provider lane cap 1) and raise this") is generic guidance, not a statement of the current default, so it is not stale. </details> <details><summary>⚡ Performance — No material issues found</summary> **VERDICT: No material issues found** Through the performance lens, this PR is a default-value bump (`claude-code=1 → 3`) plus comment/doc rewording — no code logic changes. The effect is strictly a wall-clock reduction (serial → 3 concurrent model lanes), which is a performance improvement, not a regression. Verified: - `.gitea/workflows/review-reusable.yml:52` — the only functional change is the default string; it flows through `GADFLY_PROVIDER_CONCURRENCY` (`:129`) into the existing entrypoint cap logic (`entrypoint.sh:151`, `:169`). No new allocation/copy/loop paths introduced. - The `3 × 5 = 15 concurrent claude -p` claim is arithmetically consistent with `provider_concurrency=claude-code=3` × `provider_lens_concurrency=claude-code=5` (`:52`–`:53`). No N+1, unbounded growth, hot-loop, or blocking-path concerns apply — this is a concurrency knob, not algorithmic code. Runner-side resource contention from 15 concurrent processes is a runtime/infra consideration the PR already calls out, not a code-level performance regression. Nothing to flag in my lane. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> VERDICT: No material issues found The diff is a config-default bump (`claude-code=1 → 3`) plus matching comment/CLAUDE.md text. Through the error-handling lens I verified the downstream paths that this concurrency value feeds, and they already handle the unhappy paths the higher concurrency could provoke: - `entrypoint.sh:208` validates the parsed cap (`[ "$cap" -ge 1 ] 2>/dev/null || cap=1`), so the new `claude-code=3` is guarded against non-numeric/garbage values and coerces to a safe default — no panic on bad input. - `entrypoint.sh:183` `review_one` wraps `run.sh` in `|| log "...failed (continuing)"`, so a per-model failure (e.g. a 429 from the higher concurrency) is swallowed and the lane continues rather than aborting the whole pass — exactly the "individual lenses error (advisory)" behavior the PR description claims. - `entrypoint.sh:189-191` marks a never-started status file done so the status board can still reach N/N even if a model fails before writing real status — relevant when more concurrent claudes race/fail. - `entrypoint.sh:215` `wait -n 2>/dev/null || wait` falls back portably; `cap=3` is a tiny integer, no overflow/boundary concern. The rate-limit/429 handling is pre-existing and unchanged by this diff; the PR only changes the knob's default, not the error paths. Nothing in my lane is materially wrong. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 42s

gitea-actions bot commented

2026-06-28 03:49:32 +00:00

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

No material issues found

I went through the change looking for any new authn/authz, injection, SSRF, deserialization, secret-handling, or untrusted-input surface introduced by raising claude-code from 1 → 3 in provider_concurrency. There is none:

The diff only touches a workflow input default (provider_concurrency) and its accompanying prose in .gitea/workflows/review-reusable.yml:52 and CLAUDE.md:51. No code, no env-construction, no new subprocess boundaries.
Verified by reading cmd/gadfly/engine.go:127-165 and the surrounding claudeEnv() call site (also referenced in README.md:125-127): the CLI subprocess still runs with a minimal, hard-coded env (auth token + PATH/HOME/locale/GADFLY_CLAUDE_*) — concurrency is irrelevant to what reaches the child process.
Verified entrypoint.sh:167-176 (provider_cap): the new value claude-code=3 is parsed by the same key=value splitter as the old claude-code=1, no eval, no shell re-injection — a non-numeric garbage value silently falls back to GADFLY_CONCURRENCY (default 1). Concurrency knobs ship as operator-controlled workflow inputs, not PR-author or commenter input, so no untrusted-input path was widened.
Secret forwarding (OLLAMA_CLOUD_API_KEY, CLAUDE_CODE_OAUTH_TOKEN) is unchanged — workflow_call.secrets declared at .gitea/workflows/review-reusable.yml:69-80 still gates forwarding, least-privilege semantics intact.
Rate-limit / DoS via the upstream Anthropic subscription is a policy/throughput concern, explicitly called out by the PR ("if it 429s, individual lenses error (advisory)") and already disclosed in README.md:137-141. Not a code security finding; it lives on the operator side of the trust boundary.

Outside my lens: nothing material to add.

🎯 Correctness — No material issues found

VERDICT: No material issues found

Verified the diff through the correctness lens:

The claude-code=3 default flows into GADFLY_PROVIDER_CONCURRENCY, parsed by provider_cap() in entrypoint.sh:167-176 and applied as the per-lane semaphore in run_lane() (entrypoint.sh:206-218). Syntax is the same as before; only the cap value changed.
The peak-concurrency math (provider_concurrency × provider_lens_concurrency = 3 × 5 = 15) holds:
- 3 model processes (one per claude spec) are spawned by the claude-code lane's loop.
- Each model's runSpecialists (cmd/gadfly/main.go:234-271) uses min(lensConcurrency(), len(specialists)), with lensConcurrency() (main.go:281-286) reading GADFLY_PROVIDER_LENS_CONCURRENCY's claude-code=5 override. modelProvider() (model.go:146-155) correctly returns "claude-code" for claude-code/<model> specs, matching entrypoint.sh's provider_of, so the override resolves.
- 3 × 5 = 15 concurrent claude -p per pass, then each lens's recheck (reviewWithSpecialist, main.go:309-332) runs sequentially after its review pass within the same goroutine — no extra multiplication.
The recheck pass multiplies in wall-clock risk only inside each lens (review + recheck share the per-lens GADFLY_TIMEOUT_SECS=600 context), not across lenses, so 15-way parallelism is bounded correctly.
timeout_minutes=90 (line 61) is still safely above the worst-case parallel wall clock (two sequential claude -p per lens at 600s each = 1200s = 20min, run in parallel).
Job wall-clock should decrease vs. the prior claude-code=1 serial config — exactly the PR's stated intent.

Nothing in the diff is mis-computed or breaks an invariant. The remaining concern (15 concurrent claude -p hitting a subscription rate limit) is operational, not a logic bug; the PR explicitly documents it and the system is designed for individual lens failures to be advisory.

🧹 Code cleanliness & maintainability — Minor issues

VERDICT: Minor issues

Stale comment in adversarial-review.yml:7-8 (verified). The PR updates the concurrency narrative in review-reusable.yml (lines 16, 38, 45, 52) and CLAUDE.md:51, but the dogfood caller file .gitea/workflows/adversarial-review.yml still says:

# 3 cloud models + Claude Code (sonnet, opus, opus:max), 5-lens suite (claude models run
# one at a time, each with all 5 lenses at once).

This directly contradicts the new default (claude-code=3) the PR just shipped in the workflow this file calls. Since adversarial-review.yml is the in-repo exemplar consumers read alongside examples/, leaving its header stale makes the docs self-inconsistent on the very first screen they'd open. The PR description even claims "Comments + CLAUDE.md updated to match" — this is the third comment site that was missed.

Fix: rewrite lines 7-8 to match the reusable's updated wording, e.g.

# 3 cloud models + Claude Code (sonnet, opus, opus:max), 5-lens suite (3 claude models
# concurrent, 5 lenses each).

(Verified via repo-wide grep: no other stale instances of "claude-code=1", "claude models … serial", or "one at a time" in comment contexts; the remaining hit in cmd/gadfly/main.go:233 is unrelated — it documents a generic single-model-provider lane behavior.)

⚡ Performance — Minor issues

VERDICT: Minor issues

Synchronous burst at t=0 across all 3 claude lanes. entrypoint.sh:206-216 run_lane spawns every model in the claude-code lane via review_one "$m" & and only waits once inflight >= cap (3); with 3 claude models and cap=3, all three are backgrounded before the first wait. The per-binary runSpecialists then fills its buffered sem of size 5 at cmd/gadfly/main.go:244-248 before any goroutine actually starts, so all 15 claude -p processes can be exec'd within the same millisecond. A small per-model stagger (sleep 1–2s between launches in run_lane, or an initial delay before sem <- struct{}{} in runSpecialists) would convert the spike into roughly the same steady-state concurrency while shedding the discard rate — strictly an efficiency win at zero wall-clock cost. Verified by reading entrypoint.sh:206-216 and cmd/gadfly/main.go:242-269.
PR math undercounts the peak by a factor of 2. The workflow comment claims "3 models × 5 lenses = up to 15 concurrent claude -p per pass" (.gitea/workflows/review-reusable.yml:44-46), but reviewWithSpecialist (cmd/gadfly/main.go:309-332) does a draft pass and, when shouldRecheck(draft) is true, a recheck pass — and the lensConcurrency() cap of 5 is shared across the whole runSpecialists. On non-clean drafts the recheck wave can add another 15 concurrent claude -p (different pass, same cap), giving a real-world peak of 30 concurrent when the recheck trigger fires for every lens. Not wrong, just understated — worth fixing the comment text on .gitea/workflows/review-reusable.yml:44-46 so operators dial the knob against the right number. Verified by reading cmd/gadfly/main.go:309-332, cmd/gadfly/recheck.go:65-76, and cmd/gadfly/main.go:242-269.
opus:max cost multiplier is hidden in the wall-clock framing. claude-code/opus:max sets MAX_THINKING_TOKENS=31999 (cmd/gadfly/engine.go:66, applied at engine.go:152-155). The PR description frames this purely as a wall-clock / concurrency change; a perf/cost-conscious reviewer will note that effective compute and token spend roughly 6× for the opus:max lane (3× more requests × ~2× thinking per request vs. plain opus), not 3×. Not a reason to revert, but the comments at .gitea/workflows/review-reusable.yml:44-46 and CLAUDE.md:51 should mention the cost dimension, not just the 15-concurrent-request count. Verified by reading cmd/gadfly/engine.go:63-66 and cmd/gadfly/engine.go:148-155.
No backoff / shared throttle on 429s. claudeCodeEngine.runPass (cmd/gadfly/engine.go:148-214) has no retry, no backoff, and no shared state across the 15 parallel processes. When the subscription starts 429-ing, every in-flight pass surfaces its error independently (reviewWithSpecialist at cmd/gadfly/main.go:316-319), each lens posts an advisory failure, and nothing coordinates a retry. From an efficiency standpoint this is wasted work — a tiny shared backoff (e.g., a process-level rate gate in claudeCodeEngine reading a recent-failure timestamp) would let the herd recover instead of all failing at once. Probably out of scope for a "bump the knob" PR, but it's the actual mechanism that will turn this knob change from "faster" to "flaky" if the subscription balks. Verified by reading cmd/gadfly/engine.go:148-214.

Nothing else material in the performance lane: no N+1, no quadratic loop, no unbounded growth, no per-request allocation bloat introduced. The change is a small, surgical edit to two strings in the workflow defaults plus matching comments.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

I walked the changed paths and the code that has to absorb the higher concurrency. Through the error-handling & edge cases lens, the bump claude-code=1 → 3 (peak from 5 to 15 concurrent claude -p) is absorbed cleanly by the existing machinery:

Lens error routing already covers the 429 case. Verified in cmd/gadfly/main.go:316-319 and cmd/gadfly/engine.go:184-205: a claude -p failure (including IsError, e.g. a rate-limit response) becomes fmt.Errorf("claude reported error (%s): …"), which reviewWithSpecialist renders as ⚠️ This reviewer failed to complete: … with errored=true. runSpecialists recovers from panics the same way (cmd/gadfly/main.go:256-261). One failed lens never sinks the suite, and the status board marks the lens finished/errored so the swarm can still complete (cmd/gadfly/status.go:75-98).
Lane-level fault isolation is intact. entrypoint.sh:183 logs "model $1 failed (continuing)" and the lane still waits; the 429-spam scenario the PR description calls out degrades to per-lens advisory notices, exactly as advertised.
Concurrency-knob parsing is fail-safe. entrypoint.sh:208 ([ "$cap" -ge 1 ] 2>/dev/null || cap=1), cmd/gadfly/main.go:281-302 (providerOverride ignores unparseable entries), and cmd/gadfly/main.go:450-459 (envInt falls back on <=0 or non-integer) all guard the new claude-code=3 default and any user override. No nil/zero/bad-string path I can see.
Per-lens timeouts still bound worst-case wall time. defaultTimeoutSecs=300 per lens, workflow default timeout_secs=600 (cmd/gadfly/main.go:86, .gitea/workflows/review-reusable.yml:54); each claude -p gets exec.CommandContext with a Cancel that SIGKILLs the whole process group (cmd/gadfly/engine.go:159-165). 15 concurrent × timeout cap is still bounded — no orphan/timeout-leak risk introduced.
Recheck failure stays non-fatal. cmd/gadfly/main.go:322-330: a failed recheck logs and emits the unverified draft; the lens still produces output. Increased parallelism doesn't change that.
Status-board writer is best-effort and atomic. cmd/gadfly/status.go:102-131: os.CreateTemp + rename within the same dir, errors swallowed. 3× the writers still produces consistent snapshots; the bash board (scripts/status-board.sh:114-129) also clears its cached comment id on PATCH failure, so a transient 5xx from Gitea is retried next tick instead of leaving a half-state.
Math in the comment is right. provider_concurrency × provider_lens_concurrency = 3 × 5 = 15 per pass (review, then recheck). Confirmed against the new defaults at .gitea/workflows/review-reusable.yml:52-53.

Nothing in this diff introduces a new unhandled edge case, swallowed error, missing cleanup, or boundary bug that I can verify. The operational risk the PR itself flags (subscription 429s at 15-way parallelism) routes through the existing advisory-on-failure path — that's a performance/operational concern, not an error-handling defect.

Outside my lens: nothing severe to add.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 19s}

### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> **No material issues found** I went through the change looking for any new authn/authz, injection, SSRF, deserialization, secret-handling, or untrusted-input surface introduced by raising `claude-code` from 1 → 3 in `provider_concurrency`. There is none: - The diff only touches a workflow *input default* (`provider_concurrency`) and its accompanying prose in `.gitea/workflows/review-reusable.yml:52` and `CLAUDE.md:51`. No code, no env-construction, no new subprocess boundaries. - Verified by reading `cmd/gadfly/engine.go:127-165` and the surrounding `claudeEnv()` call site (also referenced in `README.md:125-127`): the CLI subprocess still runs with a *minimal, hard-coded* env (auth token + `PATH`/`HOME`/locale/`GADFLY_CLAUDE_*`) — concurrency is irrelevant to what reaches the child process. - Verified `entrypoint.sh:167-176` (`provider_cap`): the new value `claude-code=3` is parsed by the same `key=value` splitter as the old `claude-code=1`, no eval, no shell re-injection — a non-numeric garbage value silently falls back to `GADFLY_CONCURRENCY` (default 1). Concurrency knobs ship as operator-controlled workflow inputs, not PR-author or commenter input, so no untrusted-input path was widened. - Secret forwarding (`OLLAMA_CLOUD_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN`) is unchanged — `workflow_call.secrets` declared at `.gitea/workflows/review-reusable.yml:69-80` still gates forwarding, least-privilege semantics intact. - Rate-limit / DoS via the upstream Anthropic subscription is a policy/throughput concern, explicitly called out by the PR ("if it 429s, individual lenses error (advisory)") and already disclosed in `README.md:137-141`. Not a code security finding; it lives on the operator side of the trust boundary. Outside my lens: nothing material to add. </details> <details><summary>🎯 Correctness — No material issues found</summary> **VERDICT: No material issues found** Verified the diff through the correctness lens: - The `claude-code=3` default flows into `GADFLY_PROVIDER_CONCURRENCY`, parsed by `provider_cap()` in `entrypoint.sh:167-176` and applied as the per-lane semaphore in `run_lane()` (`entrypoint.sh:206-218`). Syntax is the same as before; only the cap value changed. - The peak-concurrency math (`provider_concurrency × provider_lens_concurrency = 3 × 5 = 15`) holds: - 3 model processes (one per claude spec) are spawned by the `claude-code` lane's loop. - Each model's `runSpecialists` (`cmd/gadfly/main.go:234-271`) uses `min(lensConcurrency(), len(specialists))`, with `lensConcurrency()` (`main.go:281-286`) reading `GADFLY_PROVIDER_LENS_CONCURRENCY`'s `claude-code=5` override. `modelProvider()` (`model.go:146-155`) correctly returns `"claude-code"` for `claude-code/<model>` specs, matching `entrypoint.sh`'s `provider_of`, so the override resolves. - 3 × 5 = 15 concurrent `claude -p` per pass, then each lens's recheck (`reviewWithSpecialist`, `main.go:309-332`) runs sequentially after its review pass within the same goroutine — no extra multiplication. - The recheck pass multiplies in wall-clock risk only inside each lens (review + recheck share the per-lens `GADFLY_TIMEOUT_SECS=600` context), not across lenses, so 15-way parallelism is bounded correctly. - `timeout_minutes=90` (line 61) is still safely above the worst-case parallel wall clock (two sequential `claude -p` per lens at 600s each = 1200s = 20min, run in parallel). - Job wall-clock should *decrease* vs. the prior `claude-code=1` serial config — exactly the PR's stated intent. Nothing in the diff is mis-computed or breaks an invariant. The remaining concern (15 concurrent `claude -p` hitting a subscription rate limit) is operational, not a logic bug; the PR explicitly documents it and the system is designed for individual lens failures to be advisory. </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> ## VERDICT: Minor issues - **Stale comment in `adversarial-review.yml:7-8` (verified).** The PR updates the concurrency narrative in `review-reusable.yml` (lines 16, 38, 45, 52) and `CLAUDE.md:51`, but the dogfood caller file `.gitea/workflows/adversarial-review.yml` still says: > `# 3 cloud models + Claude Code (sonnet, opus, opus:max), 5-lens suite (claude models run` > `# one at a time, each with all 5 lenses at once).` This directly contradicts the new default (`claude-code=3`) the PR just shipped in the workflow this file calls. Since `adversarial-review.yml` is the in-repo exemplar consumers read alongside `examples/`, leaving its header stale makes the docs self-inconsistent on the very first screen they'd open. The PR description even claims "Comments + CLAUDE.md updated to match" — this is the third comment site that was missed. Fix: rewrite lines 7-8 to match the reusable's updated wording, e.g. > `# 3 cloud models + Claude Code (sonnet, opus, opus:max), 5-lens suite (3 claude models` > `# concurrent, 5 lenses each).` (Verified via repo-wide grep: no other stale instances of "claude-code=1", "claude models … serial", or "one at a time" in comment contexts; the remaining hit in `cmd/gadfly/main.go:233` is unrelated — it documents a generic single-model-provider lane behavior.) </details> <details><summary>⚡ Performance — Minor issues</summary> **VERDICT: Minor issues** - **Synchronous burst at t=0 across all 3 claude lanes.** `entrypoint.sh:206-216` `run_lane` spawns every model in the claude-code lane via `review_one "$m" &` and only waits once `inflight >= cap` (3); with 3 claude models and `cap=3`, all three are backgrounded before the first `wait`. The per-binary `runSpecialists` then fills its buffered `sem` of size 5 at `cmd/gadfly/main.go:244-248` *before* any goroutine actually starts, so all 15 `claude -p` processes can be exec'd within the same millisecond. A small per-model stagger (sleep 1–2s between launches in `run_lane`, or an initial delay before `sem <- struct{}{}` in `runSpecialists`) would convert the spike into roughly the same steady-state concurrency while shedding the discard rate — strictly an efficiency win at zero wall-clock cost. Verified by reading `entrypoint.sh:206-216` and `cmd/gadfly/main.go:242-269`. - **PR math undercounts the peak by a factor of 2.** The workflow comment claims "3 models × 5 lenses = up to 15 concurrent `claude -p` per pass" (`.gitea/workflows/review-reusable.yml:44-46`), but `reviewWithSpecialist` (`cmd/gadfly/main.go:309-332`) does a draft pass *and*, when `shouldRecheck(draft)` is true, a recheck pass — and the `lensConcurrency()` cap of 5 is shared across the whole `runSpecialists`. On non-clean drafts the recheck wave can add another 15 concurrent `claude -p` (different pass, same cap), giving a real-world peak of **30 concurrent** when the recheck trigger fires for every lens. Not wrong, just understated — worth fixing the comment text on `.gitea/workflows/review-reusable.yml:44-46` so operators dial the knob against the right number. Verified by reading `cmd/gadfly/main.go:309-332`, `cmd/gadfly/recheck.go:65-76`, and `cmd/gadfly/main.go:242-269`. - **`opus:max` cost multiplier is hidden in the wall-clock framing.** `claude-code/opus:max` sets `MAX_THINKING_TOKENS=31999` (`cmd/gadfly/engine.go:66`, applied at `engine.go:152-155`). The PR description frames this purely as a wall-clock / concurrency change; a perf/cost-conscious reviewer will note that effective compute and token spend roughly **6×** for the opus:max lane (3× more requests × ~2× thinking per request vs. plain opus), not 3×. Not a reason to revert, but the comments at `.gitea/workflows/review-reusable.yml:44-46` and `CLAUDE.md:51` should mention the cost dimension, not just the 15-concurrent-request count. Verified by reading `cmd/gadfly/engine.go:63-66` and `cmd/gadfly/engine.go:148-155`. - **No backoff / shared throttle on 429s.** `claudeCodeEngine.runPass` (`cmd/gadfly/engine.go:148-214`) has no retry, no backoff, and no shared state across the 15 parallel processes. When the subscription starts 429-ing, every in-flight pass surfaces its error independently (`reviewWithSpecialist` at `cmd/gadfly/main.go:316-319`), each lens posts an advisory failure, and nothing coordinates a retry. From an efficiency standpoint this is wasted work — a tiny shared backoff (e.g., a process-level rate gate in `claudeCodeEngine` reading a recent-failure timestamp) would let the herd recover instead of all failing at once. Probably out of scope for a "bump the knob" PR, but it's the actual mechanism that will turn this knob change from "faster" to "flaky" if the subscription balks. Verified by reading `cmd/gadfly/engine.go:148-214`. Nothing else material in the performance lane: no N+1, no quadratic loop, no unbounded growth, no per-request allocation bloat introduced. The change is a small, surgical edit to two strings in the workflow defaults plus matching comments. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **VERDICT: No material issues found** I walked the changed paths and the code that has to absorb the higher concurrency. Through the error-handling & edge cases lens, the bump `claude-code=1 → 3` (peak from 5 to 15 concurrent `claude -p`) is absorbed cleanly by the existing machinery: - **Lens error routing already covers the 429 case.** Verified in `cmd/gadfly/main.go:316-319` and `cmd/gadfly/engine.go:184-205`: a `claude -p` failure (including `IsError`, e.g. a rate-limit response) becomes `fmt.Errorf("claude reported error (%s): …")`, which `reviewWithSpecialist` renders as `⚠️ This reviewer failed to complete: …` with `errored=true`. `runSpecialists` recovers from panics the same way (`cmd/gadfly/main.go:256-261`). One failed lens never sinks the suite, and the status board marks the lens `finished`/`errored` so the swarm can still complete (`cmd/gadfly/status.go:75-98`). - **Lane-level fault isolation is intact.** `entrypoint.sh:183` logs `"model $1 failed (continuing)"` and the lane still `wait`s; the 429-spam scenario the PR description calls out degrades to per-lens advisory notices, exactly as advertised. - **Concurrency-knob parsing is fail-safe.** `entrypoint.sh:208` (`[ "$cap" -ge 1 ] 2>/dev/null || cap=1`), `cmd/gadfly/main.go:281-302` (`providerOverride` ignores unparseable entries), and `cmd/gadfly/main.go:450-459` (`envInt` falls back on `<=0` or non-integer) all guard the new `claude-code=3` default and any user override. No nil/zero/bad-string path I can see. - **Per-lens timeouts still bound worst-case wall time.** `defaultTimeoutSecs=300` per lens, workflow default `timeout_secs=600` (`cmd/gadfly/main.go:86`, `.gitea/workflows/review-reusable.yml:54`); each `claude -p` gets `exec.CommandContext` with a `Cancel` that `SIGKILL`s the whole process group (`cmd/gadfly/engine.go:159-165`). 15 concurrent × timeout cap is still bounded — no orphan/timeout-leak risk introduced. - **Recheck failure stays non-fatal.** `cmd/gadfly/main.go:322-330`: a failed recheck logs and emits the unverified draft; the lens still produces output. Increased parallelism doesn't change that. - **Status-board writer is best-effort and atomic.** `cmd/gadfly/status.go:102-131`: `os.CreateTemp` + rename within the same dir, errors swallowed. 3× the writers still produces consistent snapshots; the bash board (`scripts/status-board.sh:114-129`) also clears its cached comment id on PATCH failure, so a transient 5xx from Gitea is retried next tick instead of leaving a half-state. - **Math in the comment is right.** `provider_concurrency × provider_lens_concurrency = 3 × 5 = 15` per pass (review, then recheck). Confirmed against the new defaults at `.gitea/workflows/review-reusable.yml:52-53`. Nothing in this diff introduces a new unhandled edge case, swallowed error, missing cleanup, or boundary bug that I can verify. The operational risk the PR itself flags (subscription 429s at 15-way parallelism) routes through the existing advisory-on-failure path — that's a performance/operational concern, not an error-handling defect. Outside my lens: nothing severe to add. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 19s

gitea-actions bot commented

2026-06-28 03:49:32 +00:00

🪰 Gadfly review — `claude-code/opus` (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found (Security lens)

Verified the change against the checked-out workflow (review-reusable.yml:54, secrets block :69-123):

The only functional change is a default input value, provider_concurrency claude-code=1 → 3. It feeds GADFLY_PROVIDER_CONCURRENCY, a concurrency cap — it doesn't reach any shell exec, query, path, or deserialization sink, so it opens no injection/SSRF surface.
No change to authn/authz: allowed_users / trigger_phrase defaults and the gating logic are untouched.
No secret/credential exposure introduced: the least-privilege secret model (explicit workflow_call.secrets declaration + per-secret forwarding rather than secrets: inherit, review-reusable.yml:69-75,110-123) is unchanged. No new secret is added or logged.
The remaining hunks (CLAUDE.md, the two YAML comment blocks) are documentation only.

One security-adjacent note, in-lens but not a code flaw in this diff: raising default concurrency to 15 concurrent claude -p per pass increases the chance of provider 429s, which the PR itself acknowledges degrade to per-lens errors (advisory). That's an availability/cost trade-off, not an auth/injection/secret weakness — no insecure default in the security sense.

Nothing in my lane is materially wrong here.

🎯 Correctness — No material issues found

I've verified the concurrency semantics against entrypoint.sh and the binary. Here's my review through the Correctness lens.

VERDICT: No material issues found

This PR changes one default value (provider_concurrency claude-code=1 → 3) plus matching comment/doc text. I re-derived the claimed behavior from the actual mechanism rather than trusting the comments:

The "3 models × 5 lenses = 15" math is correct. Verified the two knobs are independent multiplicands:
- provider_cap() (entrypoint.sh:167-176) reads the per-provider entry from GADFLY_PROVIDER_CONCURRENCY; run_lane() (entrypoint.sh:206-218) uses it as cap, the number of models run concurrently within a provider lane (inflight >= cap → wait -n). So claude-code=3 ⇒ up to 3 claude models at once.
- Per-model lens fan-out is the separate GADFLY_PROVIDER_LENS_CONCURRENCY (claude-code=5, unchanged), consumed in the binary (cmd/gadfly/main.go:275-285).
- Peak = cap × lens_conc = 3 × 5 = 15 per pass. The comment's formula "provider_concurrency × provider_lens_concurrency" matches the implementation.
The model count and provider key are correct. The default models: list contains exactly three claude-code/* entries (sonnet, opus, opus:max), and provider_of("claude-code/sonnet") = "claude-code" (entrypoint.sh:159), which is the exact key used in the concurrency map. So all three are eligible to run in the one claude-code lane and the =3 cap actually applies to them. The 5-lens suite is likewise exactly the five specialists: entries.
Advisory invariant preserved. Lane work runs under review_one, which swallows failures (|| log "model $1 failed (continuing)", entrypoint.sh:183). A 429 from raising concurrency degrades an individual lens — it can't fail the job — consistent with the PR's "individual lenses error (advisory)" claim.
Docs kept in sync. Both the review-reusable.yml header/inputs comments and the CLAUDE.md architecture line were updated to "3 claude models concurrent," matching the new value (satisfies the repo's "keep docs current" maintenance rule, though that's adjacent to my lens).

No logic bug, unit error, or off-by-one in the changed values. The change is internally consistent and does what it claims.

🧹 Code cleanliness & maintainability — Minor issues

The finding is confirmed. Lines 7–8 of .gitea/workflows/adversarial-review.yml read "claude models run one at a time, each with all 5 lenses at once" — exactly the serial behavior this PR replaces with 3 concurrent claude models. This file is gadfly's own caller inheriting the default swarm, and the PR updated the same description in review-reusable.yml and CLAUDE.md but missed this third copy.

VERDICT: Minor issues

Stale comment in .gitea/workflows/adversarial-review.yml:7-8 — This file is gadfly's own caller of review-reusable.yml and inherits the default swarm. Its header comment still describes the pre-PR behavior: "5-lens suite (claude models run one at a time, each with all 5 lenses at once)." The PR changes the default to 3 claude models concurrent and updated the matching prose in review-reusable.yml (lines 16, 38–47, 52) and CLAUDE.md (line 51), but missed this third copy. Per CLAUDE.md's own rule that stale docs are a bug, line 7–8 should be updated to "all 3 claude models concurrent, each with all 5 lenses at once," mirroring the wording now used in the other two files.

⚡ Performance — No material issues found

VERDICT: No material issues found

This is a concurrency-tuning change — it raises claude-code provider concurrency from 1 to 3 in the reusable workflow's input default, plus matching comment/doc text. Through the performance lens it is a net positive, not a regression, and I verified the mechanism it relies on:

The change does what it claims and is a legitimate wall-clock win. entrypoint.sh:206-218 (run_lane) bounds each provider lane to cap concurrent models, where cap comes from provider_cap() reading GADFLY_PROVIDER_CONCURRENCY (entrypoint.sh:167-176). Lanes themselves already run in parallel. So before this change the claude-code lane ran its 3 models serially (cap=1) while the ollama lane finished early — claude genuinely was the tail. Bumping to cap=3 overlaps the three claude models; each still fans out to 5 lenses via the unchanged provider_lens_concurrency: claude-code=5. The 3×5=15 figure is correct.
No retry/backoff storm risk. I checked the claude-code engine (cmd/gadfly/engine.go:148-213): a failed/429 claude -p returns an error and the lens is dropped (advisory) — there's no internal retry or sleep, so rate-limiting does not add wasted wall-clock or amplify load. The PR's "lenses error (advisory)" claim holds.
No N+1, quadratic behavior, hot-loop work, unbounded growth, or new allocations are introduced — this is a single default-string edit.

One observation I could not verify as material (noted, not a finding): peak load becomes 15 concurrent claude -p (each a node process) plus the ollama lane's ~9 in-process agent loops, all on one Gitea act_runner. If the runner is memory-constrained, 15 node processes could thrash/OOM and erase the intended wall-clock gain. I have no runner-spec data to call this real, and the author already ships two dial-back knobs (provider_concurrency / provider_lens_concurrency) and frames this PR's own review as the live test. So I'm leaving it as a watch-item, not a blocking issue.

Docs (CLAUDE.md, the workflow header comments) were updated consistently with the value change.

🧯 Error handling & edge cases — No material issues found

I've verified the relevant paths. Findings below.

VERDICT: No material issues found

This diff is config-default + documentation only. Through the error-handling & edge-cases lens, there is nothing materially wrong, and I confirmed the unhappy paths the bump touches are already handled:

Model-failure path is unchanged and still advisory. entrypoint.sh:183 runs each model as bash run.sh || log "model $1 failed (continuing)" — a failed/rate-limited model is logged and the lane continues; the lane and the script never propagate a non-zero exit from review content. Bumping claude-code cap from 1→3 changes only how many models run at once, not this handler. Verified by reading review_one/run_lane (entrypoint.sh:178–218).
The 429 claim in the PR holds. runPass turns a CLI error/empty result/is_error into a returned error (engine.go:185,190,196,203–205) rather than panicking, so a rate-limited lens fails its own pass and is dropped advisorily — consistent with "individual lenses error (advisory)."
No new concurrency edge case is introduced. All concurrent claude -p invocations share the same cmd.Dir = e.repoDir and a minimal inherited HOME/TMPDIR (engine.go:150–151, claudeEnv engine.go:221–224). That shared-state pattern was already exercised at 5-concurrent (5 lenses per model via provider_lens_concurrency=5, unchanged on this line review-reusable.yml:53). Going 1→3 models makes it 15 concurrent against the same read-only repo dir and HOME — more of an already-running pattern, not a new class of collision. The wait -n reaping loop (entrypoint.sh:215) is also independent of the cap value.

No swallowed errors, missing cleanup, panics, or boundary bugs are introduced by this change within my lens.

(Note: I delivered this as a review rather than a plan — the task here is an adversarial code review, which is read-only and produces markdown findings, not an implementation plan.)

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 21s}

### 🪰 Gadfly review — `claude-code/opus` (claude-code) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> **VERDICT: No material issues found** (Security lens) Verified the change against the checked-out workflow (`review-reusable.yml:54`, secrets block `:69-123`): - The only functional change is a default input value, `provider_concurrency` `claude-code=1 → 3`. It feeds `GADFLY_PROVIDER_CONCURRENCY`, a concurrency cap — it doesn't reach any shell exec, query, path, or deserialization sink, so it opens no injection/SSRF surface. - No change to authn/authz: `allowed_users` / `trigger_phrase` defaults and the gating logic are untouched. - No secret/credential exposure introduced: the least-privilege secret model (explicit `workflow_call.secrets` declaration + per-secret forwarding rather than `secrets: inherit`, `review-reusable.yml:69-75,110-123`) is unchanged. No new secret is added or logged. - The remaining hunks (CLAUDE.md, the two YAML comment blocks) are documentation only. One security-adjacent note, in-lens but not a code flaw in this diff: raising default concurrency to 15 concurrent `claude -p` per pass increases the chance of provider 429s, which the PR itself acknowledges degrade to per-lens errors (advisory). That's an availability/cost trade-off, not an auth/injection/secret weakness — no insecure default in the security sense. Nothing in my lane is materially wrong here. </details> <details><summary>🎯 Correctness — No material issues found</summary> I've verified the concurrency semantics against `entrypoint.sh` and the binary. Here's my review through the Correctness lens. ## VERDICT: No material issues found This PR changes one default value (`provider_concurrency` `claude-code=1 → 3`) plus matching comment/doc text. I re-derived the claimed behavior from the actual mechanism rather than trusting the comments: - **The "3 models × 5 lenses = 15" math is correct.** Verified the two knobs are independent multiplicands: - `provider_cap()` (`entrypoint.sh:167-176`) reads the per-provider entry from `GADFLY_PROVIDER_CONCURRENCY`; `run_lane()` (`entrypoint.sh:206-218`) uses it as `cap`, the number of models run concurrently *within a provider lane* (`inflight >= cap` → `wait -n`). So `claude-code=3` ⇒ up to 3 claude models at once. - Per-model lens fan-out is the separate `GADFLY_PROVIDER_LENS_CONCURRENCY` (`claude-code=5`, unchanged), consumed in the binary (`cmd/gadfly/main.go:275-285`). - Peak = `cap × lens_conc = 3 × 5 = 15` per pass. The comment's formula "`provider_concurrency × provider_lens_concurrency`" matches the implementation. - **The model count and provider key are correct.** The default `models:` list contains exactly three `claude-code/*` entries (`sonnet`, `opus`, `opus:max`), and `provider_of("claude-code/sonnet")` = `"claude-code"` (`entrypoint.sh:159`), which is the exact key used in the concurrency map. So all three are eligible to run in the one `claude-code` lane and the `=3` cap actually applies to them. The `5-lens suite` is likewise exactly the five `specialists:` entries. - **Advisory invariant preserved.** Lane work runs under `review_one`, which swallows failures (`|| log "model $1 failed (continuing)"`, `entrypoint.sh:183`). A 429 from raising concurrency degrades an individual lens — it can't fail the job — consistent with the PR's "individual lenses error (advisory)" claim. - **Docs kept in sync.** Both the `review-reusable.yml` header/inputs comments and the `CLAUDE.md` architecture line were updated to "3 claude models concurrent," matching the new value (satisfies the repo's "keep docs current" maintenance rule, though that's adjacent to my lens). No logic bug, unit error, or off-by-one in the changed values. The change is internally consistent and does what it claims. </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> The finding is confirmed. Lines 7–8 of `.gitea/workflows/adversarial-review.yml` read "claude models run one at a time, each with all 5 lenses at once" — exactly the serial behavior this PR replaces with 3 concurrent claude models. This file is gadfly's own caller inheriting the default swarm, and the PR updated the same description in `review-reusable.yml` and `CLAUDE.md` but missed this third copy. --- **VERDICT: Minor issues** - **Stale comment in `.gitea/workflows/adversarial-review.yml:7-8`** — This file is gadfly's own caller of `review-reusable.yml` and inherits the default swarm. Its header comment still describes the pre-PR behavior: "5-lens suite (claude models run **one at a time**, each with all 5 lenses at once)." The PR changes the default to 3 claude models concurrent and updated the matching prose in `review-reusable.yml` (lines 16, 38–47, 52) and `CLAUDE.md` (line 51), but missed this third copy. Per CLAUDE.md's own rule that stale docs are a bug, line 7–8 should be updated to "all 3 claude models concurrent, each with all 5 lenses at once," mirroring the wording now used in the other two files. </details> <details><summary>⚡ Performance — No material issues found</summary> ## VERDICT: No material issues found This is a concurrency-tuning change — it raises `claude-code` provider concurrency from `1` to `3` in the reusable workflow's input default, plus matching comment/doc text. Through the performance lens it is a net positive, not a regression, and I verified the mechanism it relies on: - **The change does what it claims and is a legitimate wall-clock win.** `entrypoint.sh:206-218` (`run_lane`) bounds each provider lane to `cap` concurrent models, where `cap` comes from `provider_cap()` reading `GADFLY_PROVIDER_CONCURRENCY` (`entrypoint.sh:167-176`). Lanes themselves already run in parallel. So before this change the `claude-code` lane ran its 3 models serially (`cap=1`) while the ollama lane finished early — claude genuinely was the tail. Bumping to `cap=3` overlaps the three claude models; each still fans out to 5 lenses via the unchanged `provider_lens_concurrency: claude-code=5`. The 3×5=15 figure is correct. - **No retry/backoff storm risk.** I checked the claude-code engine (`cmd/gadfly/engine.go:148-213`): a failed/`429` `claude -p` returns an error and the lens is dropped (advisory) — there's no internal retry or sleep, so rate-limiting does **not** add wasted wall-clock or amplify load. The PR's "lenses error (advisory)" claim holds. - No N+1, quadratic behavior, hot-loop work, unbounded growth, or new allocations are introduced — this is a single default-string edit. One observation I could **not** verify as material (noted, not a finding): peak load becomes 15 concurrent `claude -p` (each a node process) plus the ollama lane's ~9 in-process agent loops, all on one Gitea `act_runner`. If the runner is memory-constrained, 15 node processes could thrash/OOM and *erase* the intended wall-clock gain. I have no runner-spec data to call this real, and the author already ships two dial-back knobs (`provider_concurrency` / `provider_lens_concurrency`) and frames this PR's own review as the live test. So I'm leaving it as a watch-item, not a blocking issue. Docs (`CLAUDE.md`, the workflow header comments) were updated consistently with the value change. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> I've verified the relevant paths. Findings below. --- **VERDICT: No material issues found** This diff is config-default + documentation only. Through the error-handling & edge-cases lens, there is nothing materially wrong, and I confirmed the unhappy paths the bump touches are already handled: - **Model-failure path is unchanged and still advisory.** `entrypoint.sh:183` runs each model as `bash run.sh || log "model $1 failed (continuing)"` — a failed/rate-limited model is logged and the lane continues; the lane and the script never propagate a non-zero exit from review content. Bumping `claude-code` cap from 1→3 changes only how many models run at once, not this handler. Verified by reading `review_one`/`run_lane` (`entrypoint.sh:178–218`). - **The 429 claim in the PR holds.** `runPass` turns a CLI error/empty result/`is_error` into a returned `error` (`engine.go:185,190,196,203–205`) rather than panicking, so a rate-limited lens fails its own pass and is dropped advisorily — consistent with "individual lenses error (advisory)." - **No *new* concurrency edge case is introduced.** All concurrent `claude -p` invocations share the same `cmd.Dir = e.repoDir` and a minimal inherited `HOME`/`TMPDIR` (`engine.go:150–151`, `claudeEnv` `engine.go:221–224`). That shared-state pattern was *already* exercised at 5-concurrent (5 lenses per model via `provider_lens_concurrency=5`, unchanged on this line `review-reusable.yml:53`). Going 1→3 models makes it 15 concurrent against the same read-only repo dir and HOME — more of an already-running pattern, not a new class of collision. The `wait -n` reaping loop (`entrypoint.sh:215`) is also independent of the cap value. No swallowed errors, missing cleanup, panics, or boundary bugs are introduced by this change within my lens. *(Note: I delivered this as a review rather than a plan — the task here is an adversarial code review, which is read-only and produces markdown findings, not an implementation plan.)* </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 21s

gitea-actions bot commented

2026-06-28 03:49:32 +00:00

🪰 Gadfly review — `claude-code/sonnet` (claude-code)

Verdict: No material issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

No material issues found

The diff touches only two things: a comment update in CLAUDE.md and the provider_concurrency default changing from claude-code=1 to claude-code=3 in the workflow YAML. Through the security lens:

CLAUDE_CODE_OAUTH_TOKEN sharing across concurrent invocations — not a new concern. The token was already shared across 5 concurrent lens invocations per model (provider_lens_concurrency=claude-code=5). Going 1→3 models is a quantitative change (5→15 concurrent users of the token), not a qualitative security change. OAuth tokens are designed for concurrent use.
No new input paths, injection surfaces, or credential handling changes — the secrets block, how secrets flow into the container env, and how they're consumed are all unchanged.
Rate-limit 429s — acknowledged in the PR and handled gracefully (advisory, not a security exposure).

Pre-existing issues the Explore agent surfaced (git clone URL embedding the token in plaintext, lack of numeric validation on concurrency inputs) are not introduced by this diff and are out of scope for this review.

🎯 Correctness — No material issues found

No material issues found

The change is a single-knob bump (claude-code=1 → 3 in provider_concurrency) plus comment/doc updates. Verified:

Math is correct. The formula cited in the comment — provider_concurrency × provider_lens_concurrency = 3 × 5 = 15 — is arithmetic-sound. provider_concurrency=claude-code=3 and provider_lens_concurrency=claude-code=5 multiply to 15, consistent with "up to 15 concurrent claude -p per pass." Verified against the actual file values at lines 52-53 of review-reusable.yml.
Model count matches the concurrency cap. The models default lists exactly 3 claude-code entries (claude-code/sonnet, claude-code/opus, claude-code/opus:max), so claude-code=3 is a tight fit — no over- or under-provisioning. Verified at line 48.
All in-file cross-references are consistent. The comment block updated at lines 44-46 ("3 models × 5 lenses"), the provider_concurrency default, and the CLAUDE.md description all agree. No internal contradiction.

Nothing under the Correctness lens is wrong.

🧹 Code cleanliness & maintainability — No material issues found

No stale references elsewhere. The diff is a pure comment/doc update to a YAML config file and CLAUDE.md — no logic changes, no code, no symbols to verify.

No material issues found

The change is two comment/default-value updates in a YAML workflow input block and a matching line in CLAUDE.md. From a maintainability standpoint:

All three touched spots (review-reusable.yml block comment, input default comment, and CLAUDE.md architecture table) are mutually consistent: each now says "3 claude models concurrent, 5 lenses each."
The "default 1" in CLAUDE.md:145 is about GADFLY_CONCURRENCY (the binary-level fallback, confirmed by entrypoint.sh:157: DEFAULT_CONC="${GADFLY_CONCURRENCY:-1}"), not the reusable workflow input — it was not stale before and still isn't.
No stale copies in README.md or examples/.
No dead code, duplicated logic, or naming issues introduced.

⚡ Performance — No material issues found

Verdict: No material issues found

This change raises claude-code provider concurrency from 1 → 3 in a YAML config file and matching doc comments. Through the performance lens, it is a net improvement — it removes a serial bottleneck.

Correct concurrency bound: Verified in entrypoint.sh's run_lane() function, which uses a wait -n semaphore pattern to hold inflight jobs to cap. With 3 models and cap=3, all three spawn immediately with no blocking; the semaphore fires only after the last model is queued, then the final bare wait drains the lane. Behavior is correct.
No unbounded growth: Peak load is a fixed ceiling — 3 models × 5 lenses = 15 concurrent claude -p per pass. Both knobs are explicit config inputs a consumer can lower.
I/O contention is unlikely to dominate: Each claude -p process is network-I/O-bound (waiting on the API), not CPU- or disk-bound, so 15 concurrent processes on the same repo checkout should not cause meaningful resource contention on the runner.
Two-pass cost acknowledged: Review + recheck each spawn the full fleet, but that was true before this change; the PR doesn't alter that.

No efficiency regressions introduced. The 429-on-rate-limit risk is the only real operational concern, and it's correctly scoped as advisory (individual lenses error, not the job).

🧯 Error handling & edge cases — No material issues found

No material issues found

This diff is YAML config + comments only — claude-code=1 → 3 in provider_concurrency. I verified the error handling infrastructure against the increased concurrency (15 concurrent claude -p vs the previous 5):

Per-lens panic recovery (main.go): each goroutine has its own recover(), so a rate-limit crash in one lens cannot propagate to the other 14.
Per-lens timeout context (main.go): each specialist gets an independent timeout; one 429-stalled lens cannot starve others.
errored=true tracking (main.go → consolidate.go): rate limit errors surface as ⚠️ notices in the relevant specialist section while other lenses' results are preserved intact.
run.sh non-zero exit fallback (scripts/run.sh): binary failure is caught, wrapped in a notice, and the script exits 0 — subsequent model lanes continue unimpeded.
entrypoint.sh lane isolation: || log "model $1 failed (continuing)" swallows individual model failures; wait -n correctly drains the concurrent slot so the lane cap doesn't deadlock.
Status board write safety: mutex + temp-file-then-rename already handles concurrent goroutine updates.

The advisory-only invariant is preserved: 429s under the new 15-way concurrency degrade to per-lens ⚠️ notices, not CI failures. Nothing in this diff creates new unhandled edge cases.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 57s}

### 🪰 Gadfly review — `claude-code/sonnet` (claude-code) **Verdict: No material issues found** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> **No material issues found** The diff touches only two things: a comment update in CLAUDE.md and the `provider_concurrency` default changing from `claude-code=1` to `claude-code=3` in the workflow YAML. Through the security lens: - **CLAUDE_CODE_OAUTH_TOKEN sharing across concurrent invocations** — not a new concern. The token was already shared across 5 concurrent lens invocations per model (provider_lens_concurrency=claude-code=5). Going 1→3 models is a quantitative change (5→15 concurrent users of the token), not a qualitative security change. OAuth tokens are designed for concurrent use. - **No new input paths, injection surfaces, or credential handling changes** — the secrets block, how secrets flow into the container env, and how they're consumed are all unchanged. - **Rate-limit 429s** — acknowledged in the PR and handled gracefully (advisory, not a security exposure). Pre-existing issues the Explore agent surfaced (git clone URL embedding the token in plaintext, lack of numeric validation on concurrency inputs) are not introduced by this diff and are out of scope for this review. </details> <details><summary>🎯 Correctness — No material issues found</summary> **No material issues found** The change is a single-knob bump (`claude-code=1 → 3` in `provider_concurrency`) plus comment/doc updates. Verified: - **Math is correct.** The formula cited in the comment — `provider_concurrency × provider_lens_concurrency = 3 × 5 = 15` — is arithmetic-sound. `provider_concurrency=claude-code=3` and `provider_lens_concurrency=claude-code=5` multiply to 15, consistent with "up to 15 concurrent `claude -p` per pass." Verified against the actual file values at lines 52-53 of `review-reusable.yml`. - **Model count matches the concurrency cap.** The `models` default lists exactly 3 `claude-code` entries (`claude-code/sonnet`, `claude-code/opus`, `claude-code/opus:max`), so `claude-code=3` is a tight fit — no over- or under-provisioning. Verified at line 48. - **All in-file cross-references are consistent.** The comment block updated at lines 44-46 ("3 models × 5 lenses"), the `provider_concurrency` default, and the CLAUDE.md description all agree. No internal contradiction. Nothing under the Correctness lens is wrong. </details> <details><summary>🧹 Code cleanliness & maintainability — No material issues found</summary> No stale references elsewhere. The diff is a pure comment/doc update to a YAML config file and CLAUDE.md — no logic changes, no code, no symbols to verify. --- **No material issues found** The change is two comment/default-value updates in a YAML workflow input block and a matching line in CLAUDE.md. From a maintainability standpoint: - All three touched spots (`review-reusable.yml` block comment, input default comment, and CLAUDE.md architecture table) are mutually consistent: each now says "3 claude models concurrent, 5 lenses each." - The "default 1" in `CLAUDE.md:145` is about `GADFLY_CONCURRENCY` (the binary-level fallback, confirmed by `entrypoint.sh:157: DEFAULT_CONC="${GADFLY_CONCURRENCY:-1}"`), not the reusable workflow input — it was not stale before and still isn't. - No stale copies in `README.md` or `examples/`. - No dead code, duplicated logic, or naming issues introduced. </details> <details><summary>⚡ Performance — No material issues found</summary> **Verdict: No material issues found** This change raises `claude-code` provider concurrency from 1 → 3 in a YAML config file and matching doc comments. Through the performance lens, it is a net improvement — it removes a serial bottleneck. - **Correct concurrency bound**: Verified in `entrypoint.sh`'s `run_lane()` function, which uses a `wait -n` semaphore pattern to hold inflight jobs to `cap`. With 3 models and `cap=3`, all three spawn immediately with no blocking; the semaphore fires only after the last model is queued, then the final bare `wait` drains the lane. Behavior is correct. - **No unbounded growth**: Peak load is a fixed ceiling — 3 models × 5 lenses = 15 concurrent `claude -p` per pass. Both knobs are explicit config inputs a consumer can lower. - **I/O contention is unlikely to dominate**: Each `claude -p` process is network-I/O-bound (waiting on the API), not CPU- or disk-bound, so 15 concurrent processes on the same repo checkout should not cause meaningful resource contention on the runner. - **Two-pass cost acknowledged**: Review + recheck each spawn the full fleet, but that was true before this change; the PR doesn't alter that. No efficiency regressions introduced. The 429-on-rate-limit risk is the only real operational concern, and it's correctly scoped as advisory (individual lenses error, not the job). </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **No material issues found** This diff is YAML config + comments only — `claude-code=1 → 3` in `provider_concurrency`. I verified the error handling infrastructure against the increased concurrency (15 concurrent `claude -p` vs the previous 5): - **Per-lens panic recovery** (`main.go`): each goroutine has its own `recover()`, so a rate-limit crash in one lens cannot propagate to the other 14. - **Per-lens timeout context** (`main.go`): each specialist gets an independent timeout; one 429-stalled lens cannot starve others. - **`errored=true` tracking** (`main.go` → `consolidate.go`): rate limit errors surface as `⚠️` notices in the relevant specialist section while other lenses' results are preserved intact. - **`run.sh` non-zero exit fallback** (`scripts/run.sh`): binary failure is caught, wrapped in a notice, and the script exits 0 — subsequent model lanes continue unimpeded. - **`entrypoint.sh` lane isolation**: `|| log "model $1 failed (continuing)"` swallows individual model failures; `wait -n` correctly drains the concurrent slot so the lane cap doesn't deadlock. - **Status board write safety**: mutex + temp-file-then-rename already handles concurrent goroutine updates. The advisory-only invariant is preserved: 429s under the new 15-way concurrency degrade to per-lens `⚠️` notices, not CI failures. Nothing in this diff creates new unhandled edge cases. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 57s

gitea-actions bot commented

2026-06-28 03:49:32 +00:00

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

The change is a pure concurrency tuning — bumping claude-code from 1 to 3 in the provider_concurrency default. Through the security lens:

No new secrets, inputs, or authz paths are introduced. The same CLAUDE_CODE_OAUTH_TOKEN is forwarded the same way; claudeEnv() (engine.go:221-241) already filters the subprocess environment to a minimal allowlist regardless of how many concurrent instances run.
No injection surface: the provider_concurrency value is parsed in entrypoint.sh:167-176 via echo/tr/string splitting — no eval, no command substitution risk. The parsed numeric value is guarded by [ "$cap" -ge 1 ] 2>/dev/null || cap=1 (line 208), so even a malformed override falls back safely.
No untrusted data flow: the input originates from the workflow YAML (trusted, repo-owner-controlled), not from PR content or comment text.
No credential leakage from parallelism: each claude -p subprocess gets the same filtered environment; 15 concurrent copies don't increase exposure.

The diff is operationally opinionated (aggressive parallelism) but introduces no security regression.

🎯 Correctness — No material issues found

No material issues found

The change is a straightforward configuration value bump (claude-code=1 → 3) with consistent comment updates. I verified:

Concurrency math: provider_concurrency × provider_lens_concurrency = 3 × 5 = 15 peak concurrent claude -p processes. The comment's claim of 15 is correct.
Model count matches cap: The default models list has exactly 3 claude-code/* entries (sonnet, opus, opus:max), so claude-code=3 means all three run concurrently — no model is left serialized, and no model is starved by an undersized cap.
Lens count matches cap: 5 specialists × claude-code=5 lens concurrency = all 5 lenses per model run concurrently. Unchanged and correct.
Entrypoint logic: provider_cap() in entrypoint.sh:167-176 parses the provider=N map correctly; run_lane() at line 206-218 uses the cap to bound concurrent model launches. With cap=3 and 3 models, all launch concurrently — no off-by-one or starvation.
No stale references: Grep confirmed zero remaining instances of claude-code=1, claude models serial, or similar old-language across the repo.
Cloud lane unchanged: ollama-cloud=3 with 3 cloud models and lens concurrency 3 remains correct (peak 9).

🧹 Code cleanliness & maintainability — Minor issues

The finding is confirmed. The adversarial-review.yml file at lines 7-8 still contains the old serial-concurrency description ("claude models run one at a time, each with all 5 lenses at once"), while the reusable workflow it calls now defaults to claude-code=3 (concurrent). The PR updated the reusable workflow and CLAUDE.md but missed this caller file.

Verdict: Minor issues

Stale comment in adversarial-review.yml:7-8 — the PR updates the concurrency description in review-reusable.yml (lines 16, 38, 44-46, 52) and CLAUDE.md (line 51), but .gitea/workflows/adversarial-review.yml still says claude models run one at a time, each with all 5 lenses at once. This file is the thin caller that inherits the default swarm from the reusable workflow (it passes no with: overrides for the swarm inputs), so its comment now contradicts the actual behavior (claude-code=3 concurrent). Impact: a reader of the caller workflow gets stale information about concurrency behavior. Fix: update lines 7-8 to match the new description (e.g., "3 claude models concurrent, each with all 5 lenses at once").

⚡ Performance — No material issues found

Verdict: No material issues found

This is a straightforward configuration change — bumping claude-code provider concurrency from 1 to 3. From a performance lens:

The concurrency mechanism is already efficient. I verified in entrypoint.sh:206-218 that run_lane uses a simple semaphore pattern (& + wait -n), which is O(n) in model count. With cap=3 and exactly 3 claude-code models, all three launch immediately and then wait drains them — no busy-waiting, no quadratic behavior.
No N+1, no unbounded growth. The cap is a hard constant (3), and the per-lens cap (5) is also bounded. Peak concurrency is 15, which is explicitly documented and intentional.
No unnecessary allocations or copies. The diff touches only YAML defaults and comments; no code paths are altered.
The timeout_minutes (90) remains appropriate. With serial claude execution removed, the job should complete faster, not slower.

The PR is transparent about the tradeoff (rate-limiting risk), but that's a reliability concern, not a performance regression in the code. Nothing in my lane to flag.

🧯 Error handling & edge cases — No material issues found

No material issues found

This PR changes a single configuration default (claude-code=1 → claude-code=3) and updates associated comments. I traced the value through every consumer:

entrypoint.sh:167-176 (provider_cap): Parses "ollama-cloud=3,claude-code=3" by splitting on , then =, trims whitespace, and returns the value for a matching provider key. The new value 3 is well-formed and handled identically to the old 1.
entrypoint.sh:206-218 (run_lane): The cap is validated with [ "$cap" -ge 1 ] 2>/dev/null || cap=1, so any non-numeric or zero/negative value would safely fall back to 1. The wait -n throttling loop handles cap=3 correctly — with exactly 3 claude-code models, all three launch concurrently, then the trailing wait collects them.
entrypoint.sh:178-192 (review_one): Individual model failures are already advisory (|| log "model $1 failed (continuing)") — unchanged behavior regardless of concurrency.
cmd/gadfly/main.go:281-286 (lensConcurrency): The per-lens concurrency (provider_lens_concurrency) is unchanged at claude-code=5 and resolved through the same providerOverride parser, which handles the value correctly.

The PR description explicitly acknowledges the rate-limit risk ("if it 429s, individual lenses error (advisory)") and the existing error handling already treats individual lens/model failures as non-fatal. No new error paths, no unhandled edge cases, no missing cleanup — the change is a pure numeric tuning of an already-guarded concurrency cap.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 39s}

### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> ## Verdict: No material issues found The change is a pure concurrency tuning — bumping `claude-code` from `1` to `3` in the `provider_concurrency` default. Through the security lens: - **No new secrets, inputs, or authz paths** are introduced. The same `CLAUDE_CODE_OAUTH_TOKEN` is forwarded the same way; `claudeEnv()` (engine.go:221-241) already filters the subprocess environment to a minimal allowlist regardless of how many concurrent instances run. - **No injection surface**: the `provider_concurrency` value is parsed in `entrypoint.sh:167-176` via `echo`/`tr`/string splitting — no `eval`, no command substitution risk. The parsed numeric value is guarded by `[ "$cap" -ge 1 ] 2>/dev/null || cap=1` (line 208), so even a malformed override falls back safely. - **No untrusted data flow**: the input originates from the workflow YAML (trusted, repo-owner-controlled), not from PR content or comment text. - **No credential leakage from parallelism**: each `claude -p` subprocess gets the same filtered environment; 15 concurrent copies don't increase exposure. The diff is operationally opinionated (aggressive parallelism) but introduces no security regression. </details> <details><summary>🎯 Correctness — No material issues found</summary> **No material issues found** The change is a straightforward configuration value bump (`claude-code=1 → 3`) with consistent comment updates. I verified: - **Concurrency math**: `provider_concurrency` × `provider_lens_concurrency` = 3 × 5 = 15 peak concurrent `claude -p` processes. The comment's claim of 15 is correct. - **Model count matches cap**: The default `models` list has exactly 3 `claude-code/*` entries (sonnet, opus, opus:max), so `claude-code=3` means all three run concurrently — no model is left serialized, and no model is starved by an undersized cap. - **Lens count matches cap**: 5 specialists × `claude-code=5` lens concurrency = all 5 lenses per model run concurrently. Unchanged and correct. - **Entrypoint logic**: `provider_cap()` in `entrypoint.sh:167-176` parses the `provider=N` map correctly; `run_lane()` at line 206-218 uses the cap to bound concurrent model launches. With cap=3 and 3 models, all launch concurrently — no off-by-one or starvation. - **No stale references**: Grep confirmed zero remaining instances of `claude-code=1`, `claude models serial`, or similar old-language across the repo. - **Cloud lane unchanged**: `ollama-cloud=3` with 3 cloud models and lens concurrency 3 remains correct (peak 9). </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> The finding is confirmed. The `adversarial-review.yml` file at lines 7-8 still contains the old serial-concurrency description ("claude models run one at a time, each with all 5 lenses at once"), while the reusable workflow it calls now defaults to `claude-code=3` (concurrent). The PR updated the reusable workflow and `CLAUDE.md` but missed this caller file. --- ## Verdict: Minor issues - **Stale comment in `adversarial-review.yml:7-8`** — the PR updates the concurrency description in `review-reusable.yml` (lines 16, 38, 44-46, 52) and `CLAUDE.md` (line 51), but `.gitea/workflows/adversarial-review.yml` still says `claude models run one at a time, each with all 5 lenses at once`. This file is the thin caller that inherits the default swarm from the reusable workflow (it passes no `with:` overrides for the swarm inputs), so its comment now contradicts the actual behavior (`claude-code=3` concurrent). Impact: a reader of the caller workflow gets stale information about concurrency behavior. Fix: update lines 7-8 to match the new description (e.g., "3 claude models concurrent, each with all 5 lenses at once"). </details> <details><summary>⚡ Performance — No material issues found</summary> ## Verdict: No material issues found This is a straightforward configuration change — bumping `claude-code` provider concurrency from 1 to 3. From a performance lens: - **The concurrency mechanism is already efficient.** I verified in `entrypoint.sh:206-218` that `run_lane` uses a simple semaphore pattern (`&` + `wait -n`), which is O(n) in model count. With cap=3 and exactly 3 claude-code models, all three launch immediately and then `wait` drains them — no busy-waiting, no quadratic behavior. - **No N+1, no unbounded growth.** The cap is a hard constant (3), and the per-lens cap (5) is also bounded. Peak concurrency is 15, which is explicitly documented and intentional. - **No unnecessary allocations or copies.** The diff touches only YAML defaults and comments; no code paths are altered. - **The `timeout_minutes` (90) remains appropriate.** With serial claude execution removed, the job should complete faster, not slower. The PR is transparent about the tradeoff (rate-limiting risk), but that's a reliability concern, not a performance regression in the code. Nothing in my lane to flag. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **No material issues found** This PR changes a single configuration default (`claude-code=1` → `claude-code=3`) and updates associated comments. I traced the value through every consumer: - **`entrypoint.sh:167-176` (`provider_cap`)**: Parses `"ollama-cloud=3,claude-code=3"` by splitting on `,` then `=`, trims whitespace, and returns the value for a matching provider key. The new value `3` is well-formed and handled identically to the old `1`. - **`entrypoint.sh:206-218` (`run_lane`)**: The cap is validated with `[ "$cap" -ge 1 ] 2>/dev/null || cap=1`, so any non-numeric or zero/negative value would safely fall back to 1. The `wait -n` throttling loop handles `cap=3` correctly — with exactly 3 claude-code models, all three launch concurrently, then the trailing `wait` collects them. - **`entrypoint.sh:178-192` (`review_one`)**: Individual model failures are already advisory (`|| log "model $1 failed (continuing)"`) — unchanged behavior regardless of concurrency. - **`cmd/gadfly/main.go:281-286` (`lensConcurrency`)**: The per-lens concurrency (`provider_lens_concurrency`) is unchanged at `claude-code=5` and resolved through the same `providerOverride` parser, which handles the value correctly. The PR description explicitly acknowledges the rate-limit risk ("if it 429s, individual lenses error (advisory)") and the existing error handling already treats individual lens/model failures as non-fatal. No new error paths, no unhandled edge cases, no missing cleanup — the change is a pure numeric tuning of an already-guarded concurrency cap. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 39s

steve merged commit 0d80ae73d8 into main

2026-06-28 04:02:18 +00:00

steve deleted branch tune/claude-3x5

2026-06-28 04:02:18 +00:00

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: steve/gadfly#11

tune(reusable): claude-code=3 models × 5 lenses (claude was the bottleneck) #11

🪰 Gadfly — live review status

claude-code/opus · claude-code — ✅ done

claude-code/opus:max · claude-code — ✅ done

claude-code/sonnet · claude-code — ✅ done

deepseek-v4-pro:cloud · ollama-cloud — ✅ done

glm-5.2:cloud · ollama-cloud — ✅ done

minimax-m3:cloud · ollama-cloud — ✅ done

🪰 Gadfly review — claude-code/opus:max (claude-code)

VERDICT: No material issues found

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Corrected review

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

VERDICT: Minor issues

🪰 Gadfly review — claude-code/opus (claude-code)

VERDICT: No material issues found

VERDICT: No material issues found

🪰 Gadfly review — claude-code/sonnet (claude-code)

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: No material issues found

Verdict: Minor issues

Verdict: No material issues found

`claude-code/opus` · claude-code — ✅ done

`claude-code/opus:max` · claude-code — ✅ done

`claude-code/sonnet` · claude-code — ✅ done

`deepseek-v4-pro:cloud` · ollama-cloud — ✅ done

`glm-5.2:cloud` · ollama-cloud — ✅ done

`minimax-m3:cloud` · ollama-cloud — ✅ done

🪰 Gadfly review — `claude-code/opus:max` (claude-code)

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

🪰 Gadfly review — `claude-code/opus` (claude-code)

🪰 Gadfly review — `claude-code/sonnet` (claude-code)

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)