ci: switch gadfly review to the reusable workflow (curated swarm, 5 lenses) #6

2026-06-28T01:58:37Z

steve commented

2026-06-28 01:58:37 +00:00

Replaces majordomo's full self-contained Gadfly stub with a thin caller of steve/gadfly's reusable workflow that inherits the default swarm.

uses: steve/gadfly/.gitea/workflows/review-reusable.yml@b02b11d (immutable pin)
Inherits the default swarm (defined once in gadfly's review-reusable.yml, steve/gadfly#10): 3 cloud models (minimax-m3, glm-5.2, deepseek-v4-pro) + Claude Code (sonnet, opus, opus:max), 5-lens suite, claude models serial / 5 lenses each. (This is the config originally prototyped in this PR — now centralized so all consumers share it.)
Least-privilege secrets: explicit forward of OLLAMA_CLOUD_API_KEY / CLAUDE_CODE_OAUTH_TOKEN / findings tokens (not secrets: inherit); GITEA_TOKEN is the auto job token.
The stub now carries only triggers + actor gate + secret forwarding + allowed_users; all swarm tuning lives centrally in gadfly.

Advisory only — never blocks a merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

steve added 1 commit 2026-06-28 01:58:37 +00:00

ci: switch gadfly review to the reusable workflow (curated swarm, 5 lenses)

Adversarial Review (Gadfly) / review (pull_request) Successful in 8m49s

CI / Tidy (pull_request) Successful in 9m37s

CI / Build & Test (pull_request) Successful in 10m14s

ecf39087a9

Replace majordomo's full self-contained Gadfly stub with a thin caller of
steve/gadfly's reusable workflow, matching mort/executus's hardened pattern:
- explicit secret forwarding (least privilege); GITEA_TOKEN is the auto token.
- pinned to an immutable gadfly commit (@20a5c43), not @main.

Curated swarm tuned for majordomo:
- 3 strong cloud models (minimax-m3, glm-5.2, deepseek-v4-pro) — dropped
  qwen3-coder:480b, nemotron-3-super, glm-5.1.
- Claude Code engine (sonnet, opus, opus:max), claude-code=3 so all three
  claudes run at once.
- 5-lens default suite (security, correctness, maintainability, performance,
  error-handling) for every model (lenses are global).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitea-actions bot commented

2026-06-28 01:58:40 +00:00

🪰 Gadfly — live review status

6/6 reviewers finished · updated 2026-06-28 02:07:25Z

`claude-code/opus` · claude-code — ✅ done

✅ security — Minor issues
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`claude-code/opus:max` · claude-code — ✅ done

✅ security — Minor issues
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`claude-code/sonnet` · claude-code — ✅ done

✅ security — Minor issues
✅ correctness — No material issues found
✅ maintainability — No material issues found
✅ performance — No material issues found
✅ error-handling — No material issues found

`deepseek-v4-pro:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — Minor issues
✅ maintainability — Minor issues
✅ performance — Minor issues
✅ error-handling — Minor issues

`glm-5.2:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — No material issues found
✅ performance — Minor issues
✅ error-handling — No material issues found

`minimax-m3:cloud` · ollama-cloud — ✅ done

✅ security — Minor issues
✅ correctness — Minor issues
✅ maintainability — Minor issues
✅ performance — Minor issues
✅ error-handling — Minor issues

Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.

gitea-actions bot commented

2026-06-28 01:58:40 +00:00

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

The change is a net security improvement over the previous workflow:

Explicit secret forwarding replaces the old implicit model where the Docker container had access to all job-context secrets. Only 4 secrets are now forwarded (OLLAMA_CLOUD_API_KEY, CLAUDE_CODE_OAUTH_TOKEN, GADFLY_FINDINGS_URL, GADFLY_FINDINGS_TOKEN), and GITEA_TOKEN is handled automatically by the runner — this is genuine least-privilege.
Immutable pinning — @20a5c431f22d43bab0f711c14ce74de3a735c110 prevents a push to steve/gadfly from silently changing the code that runs with forwarded secrets. The old workflow also pinned (sha-d7f364d), so this is parity, not regression.
Actor gate preserved — the if: condition restricting issue_comment triggers to steve, fizi, dazed is unchanged. pull_request and workflow_dispatch remain trusted triggers (require repo write access).
No user-controlled input in with: parameters — all values (models, specialists, provider_concurrency, timeout_secs, max_steps, allowed_users, timeout_minutes) are hardcoded strings; nothing interpolates github.event.comment.body or any other attacker-controlled value.
permissions: block unchanged — contents: read, issues: write, pull-requests: write remains at the workflow level, constraining what the called reusable workflow can do.

The one new credential (CLAUDE_CODE_OAUTH_TOKEN) is expected given the addition of Claude Code models to the swarm, and it's forwarded through the same pinned, least-privilege channel as the others. The trust boundary (code authored by steve/gadfly) is identical to the old Docker-image approach.

🎯 Correctness — Minor issues

Now let me verify the two findings against the actual code.

Finding 1 — provider_lens_concurrency omits claude-code entry:

The actual file at line 55 shows provider_lens_concurrency: "ollama-cloud=3" with no claude-code entry. However, the draft itself marks this as unverified because the reusable workflow at 20a5c43 lives in steve/gadfly, not this repo. I cannot inspect that workflow either — I have no way to determine the default lens-concurrency behavior for unspecified providers. Per the instructions, I must drop any finding I cannot positively confirm.

Finding 2 — vars.GADFLY_SPECIALISTS override silently dropped:

Confirmed against the actual file. The old workflow (visible in the diff) resolved GADFLY_SPECIALISTS: ${{ vars.GADFLY_SPECIALISTS || 'security,correctness,error-handling' }}. The new workflow at line 53 hardcodes specialists: "security,correctness,maintainability,performance,error-handling" with no reference to vars.GADFLY_SPECIALISTS. A grep for GADFLY_SPECIALISTS across the entire repo returns zero matches. The behavioral contract change is real: any repo-level GADFLY_SPECIALISTS variable set in Gitea's web UI is now silently ignored.

Verdict: Minor issues

vars.GADFLY_SPECIALISTS repo-variable override silently dropped — adversarial-review.yml:53. The old workflow resolved GADFLY_SPECIALISTS: ${{ vars.GADFLY_SPECIALISTS || 'security,correctness,error-handling' }}, allowing repo-level customization of the lens suite without editing the workflow file. The new workflow hardcodes specialists: "security,correctness,maintainability,performance,error-handling" with no fallback to vars. If GADFLY_SPECIALISTS was set in the repo's Gitea settings (unverifiable from repo files — repo variables live in the web UI, not in the tree), it is now silently ignored. The PR description frames the 5-lens suite as intentional, so this is likely a deliberate simplification, but it is a behavioral contract change worth flagging.
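One way to keep the repo-variable escape hatch while still defaulting to the 5-lens suite is to restore the fallback expression on the `specialists` input. This is a minimal sketch, not the workflow as shipped. It assumes that Gitea evaluates the `vars` context inside a caller job's `with:` block the way GitHub Actions does, and that the reusable workflow takes `specialists` as a comma-separated string. Neither is confirmed from this repo. All other keys are elided.

```yaml
# Sketch only: caller job in .gitea/workflows/adversarial-review.yml.
# Triggers, permissions, if:, secrets:, and the remaining with: inputs are omitted.
jobs:
  review:
    uses: steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c431f22d43bab0f711c14ce74de3a735c110
    with:
      # A repo-level GADFLY_SPECIALISTS variable wins when it is set and non-empty;
      # otherwise the curated 5-lens suite from this PR applies.
      # Assumption: the vars context is available in with: on this runner.
      specialists: ${{ vars.GADFLY_SPECIALISTS || 'security,correctness,maintainability,performance,error-handling' }}
```

With this shape, a quick spot-check would only need `GADFLY_SPECIALISTS=security` set in the repo settings, and clearing the variable would return the run to the curated default.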

🧹 Code cleanliness & maintainability — Minor issues

Verdict: Minor issues

provider_lens_concurrency asymmetry — provider_concurrency declares limits for both providers (ollama-cloud=3,claude-code=3), but provider_lens_concurrency only sets ollama-cloud=3 with no entry for claude-code (.gitea/workflows/adversarial-review.yml:55). The old workflow only had cloud models so the single entry was complete; now that claude-code models are in the swarm, the asymmetry between the two concurrency knobs is confusing. A future maintainer will wonder whether omitting claude-code from lens concurrency was deliberate or an oversight. If intentional, a one-line comment explaining why (e.g., # claude-code manages its own lens fan-out) would eliminate the ambiguity.
Hardcoded specialists removes the repo-variable escape hatch — the old workflow allowed overriding the lens suite at runtime via ${{ vars.GADFLY_SPECIALISTS || '…' }}; the new workflow hardcodes the 5-lens string (.gitea/workflows/adversarial-review.yml:53). The PR description frames this as a deliberate "curated swarm" choice, which is fine, but it means any ad-hoc adjustment (e.g., running only security for a quick spot-check) now requires editing and committing the workflow file rather than toggling a repo variable. Worth acknowledging as a flexibility trade-off, even if intentional.

⚡ Performance — Minor issues

Now let me verify the second finding — the timeout change — by confirming the old value in the diff:

The diff clearly shows:

Old: timeout-minutes: 45 (removed)
New: timeout_minutes: 90 (line 59)

This is confirmed. The timeout doubled from 45 to 90 minutes.

Now for the first finding: the draft itself admits it cannot verify the reusable workflow's default behavior for provider_lens_concurrency because the reusable workflow lives in steve/gadfly, outside this repo. The draft speculates that the default is 1 for unspecified providers, but this is unconfirmable from the code in this repository. I cannot positively confirm this finding.

Verdict: Minor issues

Timeout doubled from 45 → 90 minutes (.gitea/workflows/adversarial-review.yml:59): The old job-level timeout-minutes: 45 is replaced by timeout_minutes: 90 passed to the reusable workflow. Even accounting for the heavier Claude Code models and the lens-suite expansion (3→5), this is a 2× ceiling increase. A hung or slow model can now consume CI runner time for twice as long before being killed.

🧯 Error handling & edge cases — Minor issues

I've read the actual workflow file and confirmed the finding. The review job (line 32) has no timeout-minutes — only the uses:, secrets:, and with: blocks. The old code (visible in the diff) had timeout-minutes: 45 directly on the job. The timeout_minutes: 90 on line 59 is merely an input to the reusable workflow, not a platform-level job timeout. The finding is confirmed.

Verdict: Minor issues

Missing job-level timeout-minutes backstop (adversarial-review.yml:44): The old workflow had timeout-minutes: 45 on the review job — a hard platform-level cap that would kill the job even if the container/reusable-workflow hung. The new workflow removes this and passes timeout_minutes: 90 only as an input to the reusable workflow. If the reusable workflow fails to enforce its own timeout (e.g., stuck waiting for a runner, a bug in the reusable workflow's timeout wiring, or the reusable workflow itself hangs before its internal timeout-minutes takes effect), the calling job has no backstop and will run until the platform default (typically 360 minutes). This is a regression in defensive error handling — the old code had belt and suspenders; the new code relies entirely on the callee to time itself out.
Fix: Add timeout-minutes: 90 (or 120) directly on the review: job, above or below the uses: line. This keeps the platform-level kill switch independent of the reusable workflow's internal timeout.
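The suggested backstop could look like the sketch below, with one caveat. On GitHub Actions, a job that calls a reusable workflow accepts only a restricted set of keys (`name`, `uses`, `with`, `secrets`, `strategy`, `needs`, `if`, `concurrency`, `permissions`), and `timeout-minutes` is not among them. Whether Gitea's runner honors it on a caller job is an assumption that should be checked before relying on it. The value 120 is one of the two options named in the fix, and the other keys are elided.

```yaml
# Sketch only: job-level cap on the caller, independent of the callee's own budget.
jobs:
  review:
    # Assumption: the Gitea runner enforces timeout-minutes on a job that uses
    # a reusable workflow. GitHub Actions does not allow this key here.
    timeout-minutes: 120
    uses: steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c431f22d43bab0f711c14ce74de3a735c110
    with:
      # Input forwarded to the reusable workflow, as in the PR. Whether the
      # callee turns it into a real job timeout cannot be confirmed from this repo.
      timeout_minutes: 90
```

If the runner rejects or ignores the key, the cap would have to be enforced inside the reusable workflow, on the job that actually runs the review.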

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 57s

gitea-actions bot commented

2026-06-28 01:58:40 +00:00

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — Minor issues

The findings all check out against the code:

Lines 36-41 — actor allow-list for issue_comment confirmed.
Lines 22-25 — narrowed permissions confirmed.
Line 44 — SHA pin to 20a5c431f22d43bab0f711c14ce74de3a735c110 confirmed.
Line 52 — claude-code/* models receive CLAUDE_CODE_OAUTH_TOKEN confirmed.
Lines 28-29 — concurrency cancel-in-progress keyed on PR number confirmed.
CLAUDE.md:144-147 — stale doc ("6 ollama-cloud models", "3-lens suite") confirmed; the file still has that text.

VERDICT: Minor issues

.gitea/workflows/adversarial-review.yml:36-41 — issue_comment trigger guard is job-level only in this repo. The if: allow-lists actors (steve, fizi, dazed) for the comment path; the allowed_users: "steve,fizi,dazed" input is forwarded as belt-and-suspenders, but whether the upstream steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c431f22d43bab0f711c14ce74de3a735c110 actually enforces it in-container cannot be confirmed from this repo. The SHA pin is correct, so a push to gadfly main cannot silently alter what runs. Suggest confirming in the gadfly repo that allowed_users is wired into an if: on the commenting step.
.gitea/workflows/adversarial-review.yml:44 — third-party reusable workflow pinned to commit SHA. steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c431f22d43bab0f711c14ce74de3a735c110 (full SHA, not @main). Correct supply-chain call: a force-push to steve/gadfly main cannot silently change what runs with our secrets. No issue.
.gitea/workflows/adversarial-review.yml:48 — CLAUDE_CODE_OAUTH_TOKEN is a long-lived user-scoped OAuth credential. Acceptable because the swarm at line 52 explicitly invokes claude-code/sonnet,claude-code/opus,claude-code/opus:max, so the OAuth token is required. Not over-scoping. The SHA pin is the primary defense if the reusable workflow's execution environment is compromised.
.gitea/workflows/adversarial-review.yml:22-25 — permissions correctly narrowed. contents: read, issues: write, pull-requests: write; no secrets: inherit. Forwarded secrets (OLLAMA_CLOUD_API_KEY, CLAUDE_CODE_OAUTH_TOKEN, GADFLY_FINDINGS_URL, GADFLY_FINDINGS_TOKEN) are least-privilege. Net improvement over the prior GITEA_TOKEN + broader set.
.gitea/workflows/adversarial-review.yml:27-29 — concurrency cancel-in-progress keyed on PR number. Prevents stale reviews from leaking a later PR's secrets/state into an earlier run's comment thread.
CLAUDE.md:144-147 — stale doc, not a security finding. Still describes the reviewer as "a fleet of 6 ollama-cloud models … 3-lens suite (security, correctness, error-handling)" but the workflow now runs 3 cloud + 3 claude-code models with a 5-lens suite (security, correctness, maintainability, performance, error-handling). Doc drift that could mislead an operator triaging which reviewers ran. Suggest updating in a follow-up.

🎯 Correctness — Minor issues

VERDICT: Minor issues

.gitea/workflows/adversarial-review.yml:54-55 — provider_lens_concurrency is asymmetric between providers. Line 54 sets provider_concurrency: "ollama-cloud=3,claude-code=3" (both providers get an explicit lane width), but line 55 sets provider_lens_concurrency: "ollama-cloud=3" with no claude-code=… entry. The lens suite also grew from 3 lenses (security, correctness, error-handling) to 5 (adding maintainability, performance) per the specialists: input on line 53. The omitted claude-code lens-concurrency setting means the three claude-code models (sonnet, opus, opus:max) inherit whatever the gadfly default is for that key — most plausibly 1, making each claude model run all 5 lenses serially. With timeout_secs: 600 per lens on line 56 and three claude models running in parallel lanes (per provider_concurrency=claude-code=3), the worst case is one claude model at ~50 min, which still fits inside timeout_minutes: 90 on line 59, but the asymmetry should be either intentional and documented in the header comment, or fixed by adding claude-code=3 to provider_lens_concurrency. Confirmed by reading lines 54-55 of the new file.
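The two ways to resolve the asymmetry can be sketched side by side. The PR description says the claude models run their lenses serially, so the first variant (keep the omission, document it) may be what was intended. Both variants assume that `provider_lens_concurrency` uses the same comma-separated `provider=N` syntax as `provider_concurrency`, and that an unset provider falls back to a lens width of 1. Neither can be confirmed without reading the reusable workflow in steve/gadfly.

```yaml
# Sketch only: the two concurrency inputs on the caller job; other keys omitted.
jobs:
  review:
    uses: steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c431f22d43bab0f711c14ce74de3a735c110
    with:
      # Model-level width: up to 3 models per provider run at once.
      provider_concurrency: "ollama-cloud=3,claude-code=3"

      # Variant 1: keep claude-code lenses serial and say so.
      # claude-code is omitted on purpose, so each claude model runs its
      # 5 lenses one at a time (assumed default width of 1, unverified).
      provider_lens_concurrency: "ollama-cloud=3"

      # Variant 2: mirror the cloud provider (assumed syntax, unverified).
      # provider_lens_concurrency: "ollama-cloud=3,claude-code=3"
```

Variant 1 changes no behavior and only removes the ambiguity for the next maintainer. Variant 2 trades more simultaneous Claude Code sessions for a shorter claude chain.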

🧹 Code cleanliness & maintainability — Minor issues

VERDICT: Minor issues

CLAUDE.md:144-147 — docs now drift from the workflow (verified).
CLAUDE.md still says Gadfly is "the standalone agentic adversarial reviewer", "a fleet of 6 ollama-cloud models", running "the 3-lens suite (security, correctness, error-handling)". The PR ships a thin caller of steve/gadfly's reusable workflow with 3 cloud + 3 claude-code models and the 5-lens suite (adds maintainability, performance). The repo's own rule at CLAUDE.md:135-137 ("README.md must match reality in the same commit that changes behavior") is violated here.
.gitea/workflows/adversarial-review.yml:39-41 and :58 — duplicated allow-list (verified).
The job-level if: guard hard-codes github.actor == 'steve' || ... == 'fizi' || ... == 'dazed', and the same list is re-passed via allowed_users: "steve,fizi,dazed". Two places to keep in sync; will drift the next time someone is added or removed. Minor cleanup suggestion.
.gitea/workflows/adversarial-review.yml:55 — provider_lens_concurrency only sets ollama-cloud=3 (verified).
Lens-level concurrency is provider-scoped, and only ollama-cloud is explicit. provider_concurrency on line 54 sets both (ollama-cloud=3,claude-code=3), but provider_lens_concurrency omits claude-code, so it falls through to whatever the reusable workflow defaults to. The header comment (lines 4-6) asserts "Every model runs the 5-lens default suite … lenses are global, so this applies to the cloud AND claude models", which makes the omission ambiguous — a one-line note clarifying the fall-through behavior would make the file self-explanatory.

⚡ Performance — Minor issues

VERDICT: Minor issues

.gitea/workflows/adversarial-review.yml:54-55 — asymmetric lens parallelism (unverified). provider_concurrency: "ollama-cloud=3,claude-code=3" is set for both providers, but provider_lens_concurrency is only set for ollama-cloud. With 5 lenses per model, if the reusable workflow's default lens concurrency for an unset provider is 1, then each of the 3 Claude models will run its 5 lenses sequentially while the 3 cloud models run all 5 lenses concurrently — so the claude lanes are up to 5× slower per model than the cloud lanes despite the PR description framing the two providers as equivalent. Suggested fix: add provider_lens_concurrency: "claude-code=5" (or whatever the upper bound should be). I could not verify the reusable workflow's default behavior without checking steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c43…, which is outside this repo — flagging as unverified.
.gitea/workflows/adversarial-review.yml:53,59 — ~67% more reviewer work, only 2× timeout headroom. Lens count went from 3 → 5, so total reviewer invocations went from 6 models × 3 = 18 to 6 models × 5 = 30 (~67% more). Job timeout was doubled from 45 → 90 min, which is roughly the right shape but leaves little slack if any single model stalls near timeout_secs: "600" × 5 lenses. Combined with the 5 claude-code jobs (each an OAuth-authenticated CLI run, generally slower than a cloud HTTP call), the wall-clock per PR can grow noticeably. Mitigation is already in place (concurrency.cancel-in-progress: true keyed on PR number, line 27-29), so pile-up across PRs is bounded. Worth keeping an eye on runner-pool saturation if PR throughput rises — not a code fix, just a tuning note.
No N+1, no hot-loop, no unbounded growth concerns in this diff — it's a static config swap from a self-contained container step to a reusable workflow caller, and concurrency + cancel-in-progress are preserved unchanged.

🧯 Error handling & edge cases — Minor issues

Clean — no stale references.

Summary of findings within my lens:

Minor: provider_lens_concurrency (line 55) only specifies ollama-cloud=3. The claude-code provider has no entry, so its lenses run serially (1 at a time). With 5 lenses × 600s each, that's up to 50 min per claude model wall time. This is a missed optimization (adding claude-code=3 would reduce it to ~20 min), but NOT a timeout concern: the cloud chain finishes in ~20 min (5 lenses / 3 concurrent × 600s) and runs in parallel with the claude chain, so the overall worst-case wall time is ~50 min, well within the 90 min budget. The original review's claim that "a single slow model can push the run past 90 min" is not supported by the math.
Minor: cancel-in-progress: true (lines 27-29) silently kills in-flight reviews on every push to the PR, losing partial findings. Intentional for cost, but from an error-handling view the partial work is discarded without a tombstone. Low severity.
Unverified, not reported as a finding: max_steps: "14" and timeout_secs: "600" are strings; whether the reusable workflow declares these as type: number is in another repo and not checkable from this checkout.

VERDICT

Minor issues

.gitea/workflows/adversarial-review.yml:55 — provider_lens_concurrency: "ollama-cloud=3" omits the claude-code provider, so its 5 lenses run serially (up to 50 min per model). This is a missed optimization — adding claude-code=3 to mirror the cloud provider would cut claude wall time to ~20 min, and the 3 claude models already run concurrently via provider_concurrency: claude-code=3 (line 54). The 90 min timeout_minutes budget is adequate as-is (cloud chain finishes in ~20 min in parallel, so overall wall time is bounded by the ~50 min claude chain), but the asymmetry between cloud and claude lens concurrency is worth flagging. (Verified by reading lines 52-59.)
.gitea/workflows/adversarial-review.yml:27-29 — concurrency.cancel-in-progress: true silently discards in-flight reviews on every push to the PR, losing partial findings. Intentional for cost but worth flagging from an error-handling view: an on-call engineer re-running a review after a fix will see the prior run's partial lens outputs vanish without trace. No fix needed unless you want a tombstone; just noting it's an unhandled partial-failure mode. (Verified by reading lines 27-29.)
Unverified, not reported as a finding: max_steps: "14" and timeout_secs: "600" are strings; whether the reusable workflow declares these as type: number (and silently coerces) is in another repo and not checkable from this checkout. Flagging only so a future reviewer confirms the called workflow's inputs: types match.
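
For illustration, the symmetric setting suggested in the first finding might look like the fragment below. It is a sketch only: it assumes provider_lens_concurrency accepts the same comma-separated provider=N form that provider_concurrency uses on line 54, and that the pinned reusable workflow honors a claude-code entry. Neither is confirmed from this checkout.

```yaml
with:
  # Existing value from line 54, unchanged.
  provider_concurrency: "ollama-cloud=3,claude-code=3"
  # Line 55 is currently "ollama-cloud=3". The claude-code entry is the
  # proposed addition (assumed syntax, mirroring provider_concurrency).
  provider_lens_concurrency: "ollama-cloud=3,claude-code=3"
```

If serial Claude lenses are intentional (for example to limit load on the OAuth-backed account), a one-line comment next to line 55 would document that instead.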

Automated adversarial review by Gadfly. Advisory only; does not block merge. · ⏱️ reviewed in 4m 31s


gitea-actions bot commented

2026-06-28 01:58:40 +00:00

🪰 Gadfly review — `claude-code/opus` (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — Minor issues

Verified. The file matches the diff exactly, and the reusable workflow indeed lives in another repo (only adversarial-review.yml and ci.yaml are present here). All verifiable claims in the finding hold:

pull_request: types: [opened, reopened, ready_for_review] (lines 12-13) ✓
The if: guard restricts only issue_comment; pull_request falls through event_name != 'issue_comment' and runs for any author (lines 36-41) ✓
CLAUDE_CODE_OAUTH_TOKEN is newly forwarded (line 48) ✓
The job holds pull-requests: write (line 25) ✓
The inline comment asserts pull_request ... [is] already trusted (lines 33-34) ✓

The finding is correctly hedged on the only unverifiable piece (whether the Gitea instance leaks secrets to fork PRs). It survives.

VERDICT: Minor issues

This diff is mostly a positive security refactor (pinning + least-privilege secret forwarding). The one substantive concern is the author-trust boundary for pull_request events, now carrying a heavier credential set.

Findings

.gitea/workflows/adversarial-review.yml:11-13, 36-50 — pull_request runs are not author-gated, and this diff adds a new high-value credential to what's forwarded into them. The if: guard (lines 36-41) only restricts issue_comment events to steve/fizi/dazed. For pull_request (opened, reopened, ready_for_review) any PR author triggers the secret-bearing reusable workflow. The diff newly forwards CLAUDE_CODE_OAUTH_TOKEN (line 48) — an account-scoped OAuth token, more sensitive than the cloud API key — alongside OLLAMA_CLOUD_API_KEY and the findings token. If this Gitea instance forwards secrets to fork PRs (environment-dependent — not verifiable from the repo), an untrusted contributor opening a PR would cause those secrets to be present in a job that also holds pull-requests: write (line 25) and runs LLM agents over attacker-controlled diff/PR content — a plausible prompt-injection exfiltration path (read env secret → post as a comment). The inline comment's premise that "pull_request … [is] already trusted" (lines 33-34) is the unstated assumption doing all the work; it only holds if forks can't open PRs or the instance withholds secrets from fork PRs.
- Impact: potential leak of CLAUDE_CODE_OAUTH_TOKEN / OLLAMA_CLOUD_API_KEY to an untrusted PR author.
- Fix: gate pull_request on author membership too (or run untrusted PRs behind a label/maintainer-approval gate), and/or confirm the Gitea instance does not pass secrets to fork PRs. At minimum, make the trust assumption explicit rather than implicit.
- Confidence: the trigger/guard config is verified; whether secrets actually reach fork PRs is unverified (instance config) — this is why it's Minor, not Blocking. Note also this exposure largely pre-existed; the diff increases blast radius by adding the OAuth token.
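
One way to apply the author-gating fix is sketched below. It is a minimal illustration, not the workflow's actual guard: the job id `review` is hypothetical, any comment-specific conditions in the existing `if:` are left out, and it assumes `github.actor` on a pull_request event is the user who opened or reopened the PR.

```yaml
jobs:
  review:   # hypothetical job id; keep the one already in the file
    # Sketch: require an allow-listed actor for every trigger,
    # not only for issue_comment.
    if: >-
      github.actor == 'steve' ||
      github.actor == 'fizi' ||
      github.actor == 'dazed'
    uses: steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c431f22d43bab0f711c14ce74de3a735c110
```

The trade-off is that PRs opened by anyone outside the list get no automatic review until an allow-listed user triggers one by comment or dispatch.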

Verified as sound (not issues)

Pinning (line 44): …@20a5c431f22d43bab0f711c14ce74de3a735c110 is a full 40-char commit SHA (immutable) — correctly prevents a @main push from silently changing code that runs with forwarded secrets.
Least-privilege secret forwarding (lines 46-50): explicit named secrets instead of secrets: inherit is a genuine improvement and correctly scoped to what the swarm uses.
issue_comment guard (lines 36-41): unchanged and intact; still restricts comment-triggered runs to trusted actors.

Could not verify

The reusable workflow steve/gadfly/.gitea/workflows/review-reusable.yml lives in another repo and is not checked out here (.gitea/workflows/ contains only adversarial-review.yml and ci.yaml). I could not confirm that it declares these four secrets:/with: inputs in its workflow_call contract, nor how it handles the forwarded tokens internally (e.g. whether it echoes/logs them, or how GITEA_TOKEN is provisioned). Those are the highest-leverage things to check on the gadfly side, but they're outside this repo's diff.

🎯 Correctness — No material issues found

I've read the full workflow file and confirmed the diff is the entire change (one file). Now let me assess what's verifiable in-repo versus what lives in the external reusable workflow.

The reusable workflow being called (steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c43) is not in this checkout, so its input/secret contract cannot be verified from here. I'll be explicit about that boundary.

VERDICT: No material issues found

Through the Correctness lens, the change is internally consistent and the YAML structure is valid for a reusable-workflow caller. Notes below; none are blocking.

Verified (read adversarial-review.yml in full):

Caller-job structure is correct. A job using uses: may not also declare runs-on/steps/env/timeout-minutes; the diff correctly removed all of those, and the only job-level key retained alongside uses:/with:/secrets: is if:, which is permitted on a reusable-workflow-calling job. (:44-59)
Swarm values match the prose, re-derived by counting. models (:52) lists exactly the 6 described — 3 cloud (minimax-m3:cloud, glm-5.2:cloud, deepseek-v4-pro:cloud) + 3 Claude (sonnet, opus, opus:max). claude-code=3 (:54) equals the Claude model count, so "all three run concurrently" is accurate, not an off-by-one. The three dropped models named in the PR (qwen3-coder:480b, nemotron-3-super, glm-5.1) are correctly absent.
specialists is the claimed 5-lens set (:53), and the if: actor allow-list (:39-41) is consistent with the allowed_users: "steve,fizi,dazed" input (:58).

Could NOT verify (external — flagged, not asserted as bugs):

Input/secret name contract. Every key under with: (models, specialists, provider_concurrency, provider_lens_concurrency, timeout_secs, max_steps, allowed_users, timeout_minutes) and secrets: must exactly match the on.workflow_call.inputs/secrets declared in review-reusable.yml@20a5c43. A single name or type mismatch makes the job fail at dispatch. This is the main correctness exposure and it lives entirely in the pinned external commit, which isn't fetchable from this repo. Confirm against that commit before merge.
PR/ref derivation for issue_comment triggers. The old step explicitly forwarded PR, PR_BRANCH (github.head_ref), COMMENT_BODY, COMMENT_ID, and the workflow_dispatch pr_number. The new caller forwards none of these — it relies on the reusable workflow deriving them from the inherited github context and checking out the PR head itself (note github.head_ref is empty on issue_comment, so the reusable side must resolve the head ref from the PR number). Whether it does is internal to the external workflow.
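
To make the contract check concrete, the declarations to look for on the gadfly side would have roughly the shape below. This is a hypothetical excerpt, not the contents of review-reusable.yml at the pinned commit: every type and required value is an assumption, and it presumes Gitea Actions follows the GitHub Actions workflow_call syntax.

```yaml
# Hypothetical excerpt of review-reusable.yml; the real declarations at the
# pinned commit were not readable from this checkout.
on:
  workflow_call:
    inputs:
      timeout_secs:
        type: string      # assumed; the caller passes "600"
      max_steps:
        type: string      # assumed; the caller passes "14"
      timeout_minutes:
        type: number      # assumed; the caller passes 90 unquoted
    secrets:
      OLLAMA_CLOUD_API_KEY:
        required: false   # assumed
      CLAUDE_CODE_OAUTH_TOKEN:
        required: false   # assumed
```

Each key under with: and secrets: in the caller needs a matching entry of this kind, with a type compatible with the value passed.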

Minor (consistency, not a bug):

timeout_minutes: 90 (:59) is passed unquoted (number) while all sibling inputs are quoted strings ("600", "14"). Harmless if the reusable workflow declares it as type: number; would fail input validation only if it's declared type: string. Cosmetically inconsistent regardless.

No logic, unit, threshold, or magic-number defects found within this file.

🧹 Code cleanliness & maintainability — Minor issues

Both findings verified against the actual files. The doc contradiction is real (CLAUDE.md:144-146 says "6 ollama-cloud models" + "3-lens suite"; the workflow now uses 3 cloud + 3 claude-code models and a 5-lens suite). The quoting inconsistency is also present as described (lines 56-57 quoted, line 59 unquoted).

VERDICT: Minor issues

CLAUDE.md:144-146 — documentation now contradicts the workflow this PR changes. The operating manual still says the workflow runs "a fleet of 6 ollama-cloud models, each running the 3-lens suite (security, correctness, error-handling)." After this PR the swarm is 3 cloud + 3 Claude-Code models (adversarial-review.yml:52) running the 5-lens suite (adversarial-review.yml:53). The doc is wrong on both the provider mix and the lens count. The file's own conventions make this binding — the header ("keep the README/support-matrix/examples updated in the same commit as the behavior they describe — that in-sync promise is part of the project's credibility") and the Gates section ("README.md must match reality in the same commit that changes behavior"). Fix: update those lines in this same commit to match (e.g. "a curated swarm of 3 cloud + 3 Claude-Code models running the 5-lens suite (security, correctness, maintainability, performance, error-handling)").
adversarial-review.yml:56-59 — inconsistent scalar quoting in the with: block (trivial). timeout_secs: "600" and max_steps: "14" are quoted strings, but timeout_minutes: 90 is an unquoted number. Cosmetic only; harmonize for readability (quote all three, or none). The input-type question (whether the reusable workflow declares timeout_minutes as string) lives in the reusable repo, which isn't in this checkout, so only the local style inconsistency is reported here.

Everything else in this lane reads clean: the dead env-var plumbing (GADFLY_MODELS, the GADFLY_* block) was fully removed rather than left commented out, the header comment was rewritten to match the new structure, and the if:-guard / allowed_users duplication is pre-existing and self-documented as deliberate belt-and-suspenders.

⚡ Performance — No material issues found

VERDICT: No material issues found

This is a CI workflow config change (the file matches the diff exactly; verified by reading .gitea/workflows/adversarial-review.yml). Through the performance lens there is no efficiency regression I can justify:

The review job's cost is intentionally bounded: timeout_secs: 600 per lens, max_steps: 14, timeout_minutes: 90, and concurrency.cancel-in-progress: true (line 29) kills superseded runs so redundant work doesn't pile up.
The model×lens fan-out rises from 6×3=18 to 6×5=30 tasks, but the model count stays at 6 (3 cloud + 3 Claude in place of the earlier 6 cloud models) and the extra work is deeper per-model coverage: a deliberate tuning called out in the PR, not unbounded growth.
Concurrency knobs are explicit: provider_concurrency: "ollama-cloud=3,claude-code=3" runs all three Claude models concurrently, keeping wall-clock down.

One observation, unverified and not a finding: provider_lens_concurrency is set only for ollama-cloud=3 and omits claude-code (line 55). If the reusable workflow defaults Claude lens concurrency to 1, each Claude model would run its 5 lenses serially (up to ~5×600s ≈ 50 min). That still fits inside the 90-min budget with claude-code=3 running the models in parallel, so there's no overrun — but Claude lanes may finish later than they would with explicit lens parallelism. I cannot confirm the default because the reusable workflow lives in steve/gadfly (review-reusable.yml@20a5c43), which is not in this repo and not readable from this checkout. If wall-clock matters, worth setting claude-code lens concurrency explicitly; otherwise it's a non-issue.

Nothing material in the performance lane.

🧯 Error handling & edge cases — No material issues found

I've verified the diff against the checked-out file. The change converts a self-contained docker:// job into a thin caller of an external reusable workflow (steve/gadfly/.gitea/workflows/review-reusable.yml@<sha>). That target lives in another repo and is not present in this checkout, so the actual error-handling logic (PR-number derivation, per-model auth failure, empty-response/failover) now lives outside this diff and cannot be inspected here.

Within my lens, here is what I checked:

PR-number resolution across trigger paths — The old job explicitly forwarded PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }}, a three-way fallback covering the pull_request, issue_comment, and workflow_dispatch triggers (all three are still declared in on:). The new caller forwards none of this and passes no pr_number in with:. The issue_comment and workflow_dispatch unhappy paths now depend entirely on the reusable workflow re-deriving the PR from the inherited github.event context. The concurrency group still references all three (github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number), so the trigger surface is unchanged. I could not verify whether the reusable workflow replicates that fallback — if it only handles pull_request, the comment/dispatch edge cases would silently no-op. Unverified (external file); flagging as the one edge-case-adjacent risk worth a glance when reviewing the gadfly side.
Unset secret forwarding — CLAUDE_CODE_OAUTH_TOKEN / GADFLY_FINDINGS_* forward to empty strings if undefined in the repo. This is advisory-only and non-blocking by design, and the failure handling is inside the reusable workflow — not a defect introduced by this diff.
No swallowed errors, missing cleanup/rollback, nil/empty/zero, or off-by-one concerns exist in this declarative caller — there is no imperative logic in my control here.
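
For reference when checking the gadfly side, the removed three-way fallback could be reproduced inside the reusable workflow along these lines. This is a hypothetical step, not the workflow's actual contents: the step name and the failure behavior are illustrative, and it assumes the called workflow sees the caller's github.event context.

```yaml
steps:
  - name: Resolve PR number   # hypothetical step name
    env:
      # Same fallback the old self-contained job used:
      # pull_request, then issue_comment, then workflow_dispatch.
      PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }}
    run: |
      if [ -z "$PR" ]; then
        # Surface the gap instead of silently doing nothing.
        echo "no PR number resolved for this event" >&2
        exit 1
      fi
      echo "reviewing PR #$PR"
```

If the reusable workflow only reads github.event.pull_request.number, the issue_comment and workflow_dispatch paths would resolve to an empty value.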

VERDICT: No material issues found

The error-handling-relevant logic has been delegated to an external reusable workflow that isn't part of this diff; nothing within the changed file mishandles an error or edge case. The only thing I'd want a second look at — and it's unverifiable from this repo — is whether the reusable workflow reproduces the removed three-way PR-number fallback so the issue_comment and workflow_dispatch paths don't silently fail to find a PR.

Automated adversarial review by Gadfly. Advisory only; does not block merge. · ⏱️ reviewed in 7m 11s

If wall-clock matters, worth setting `claude-code` lens concurrency explicitly; otherwise it's a non-issue. Nothing material in the performance lane. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> I've verified the diff against the checked-out file. The change converts a self-contained `docker://` job into a thin caller of an external reusable workflow (`steve/gadfly/.gitea/workflows/review-reusable.yml@<sha>`). That target lives in another repo and is not present in this checkout, so the actual error-handling logic (PR-number derivation, per-model auth failure, empty-response/failover) now lives outside this diff and cannot be inspected here. Within my lens, here is what I checked: - **PR-number resolution across trigger paths** — The old job explicitly forwarded `PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }}`, a three-way fallback covering the `pull_request`, `issue_comment`, and `workflow_dispatch` triggers (all three are still declared in `on:`). The new caller forwards none of this and passes no `pr_number` in `with:`. The `issue_comment` and `workflow_dispatch` unhappy paths now depend entirely on the reusable workflow re-deriving the PR from the inherited `github.event` context. The `concurrency` group still references all three (`github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number`), so the trigger surface is unchanged. I **could not verify** whether the reusable workflow replicates that fallback — if it only handles `pull_request`, the comment/dispatch edge cases would silently no-op. Unverified (external file); flagging as the one edge-case-adjacent risk worth a glance when reviewing the gadfly side. - **Unset secret forwarding** — `CLAUDE_CODE_OAUTH_TOKEN` / `GADFLY_FINDINGS_*` forward to empty strings if undefined in the repo. 
This is advisory-only and non-blocking by design, and the failure handling is inside the reusable workflow — not a defect introduced by this diff. - No swallowed errors, missing cleanup/rollback, nil/empty/zero, or off-by-one concerns exist in this declarative caller — there is no imperative logic in my control here. --- **VERDICT: No material issues found** The error-handling-relevant logic has been delegated to an external reusable workflow that isn't part of this diff; nothing within the changed file mishandles an error or edge case. The only thing I'd want a second look at — and it's unverifiable from this repo — is whether the reusable workflow reproduces the removed three-way PR-number fallback so the `issue_comment` and `workflow_dispatch` paths don't silently fail to find a PR. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 11s

gitea-actions bot commented

2026-06-28 01:58:40 +00:00

🪰 Gadfly review — `claude-code/opus:max` (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — Minor issues

I've verified the draft against the actual workflow file. All the in-repo factual claims check out:

Lines 36-41: the if: guard only restricts issue_comment; for any other event (pull_request, workflow_dispatch) it evaluates true and the job runs unconditionally. ✓
Lines 12-13: pull_request triggers (opened, reopened, ready_for_review). ✓
Line 44: immutable SHA pin on the reusable workflow. ✓
Lines 46-50: secrets block forwarded; line 48 adds CLAUDE_CODE_OAUTH_TOKEN. ✓
Line 58: allowed_users is an in-container check passed as a with: input (runs after secrets are in the job env). ✓
Comment at :33-34 does assume pull_request is "already trusted." ✓

The finding is grounded in the file, and the draft appropriately flags the out-of-repo contingency (Gitea's fork/untrusted-PR secret behavior) rather than asserting it. It survives.

VERDICT: Minor issues

This change is, on balance, a security improvement — explicit least-privilege secret forwarding (replacing broad env passing) and an immutable SHA pin on the external reusable workflow (review-reusable.yml@20a5c431…, :44) are exactly the right hardening for "secrets forwarded to code in another repo." One material observation and a couple of notes:

Finding — trusted-user gating covers only `issue_comment`, while a newly-added high-value secret now rides the ungated `pull_request` surface

.gitea/workflows/adversarial-review.yml:36-41 — the if: guard only restricts issue_comment. For pull_request (opened, reopened, ready_for_review, :12-13) and workflow_dispatch, the job runs unconditionally and the secrets: block (:46-50) is forwarded to the reusable workflow.
This PR adds CLAUDE_CODE_OAUTH_TOKEN (:48) to that forwarded set — a far more powerful, longer-lived credential than the previously-forwarded OLLAMA_CLOUD_API_KEY. The reusable workflow then runs an agentic Claude Code reviewer over attacker-controlled PR diff/file content. That is a classic prompt-injection → credential-exfiltration surface: untrusted text in the PR is the model's input, and the OAuth token sits in the same job environment.
Impact: if this Gitea instance ever accepts a PR (or fork PR) from an account outside steve/fizi/dazed, that author triggers a secret-bearing, agentic run with no actor check. The blast radius of a successful injection is now an Anthropic OAuth token, not just an Ollama key.
Suggested fix: gate pull_request by author association / allowed_users too (or confirm and document that Gitea withholds secrets from untrusted/fork PRs, the way the comment at :33-34 assumes — "pull_request … already trusted"). The in-container allowed_users check (:58) is belt-and-suspenders but runs after secrets are already in the job env, so it doesn't protect the token from injection-based exfiltration.
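
To make the gating asymmetry concrete, here is a minimal Python sketch of the guard's truth table as this review describes it. The function names `job_runs` and `job_runs_gated` and their parameters are hypothetical. The real guard is a workflow `if:` expression that is not reproduced on this page, so this models the described behavior rather than the actual condition.

```python
# Hypothetical model of the job-level `if:` guard as described in this review.
# Only the allow-list (steve, fizi, dazed) and the event names come from the review.
ALLOWED_USERS = {"steve", "fizi", "dazed"}


def job_runs(event_name: str, actor: str, comment_is_on_pr: bool = False) -> bool:
    """Return True when the review job would start for this event."""
    if event_name == "issue_comment":
        # Comment path: must be a comment on a PR, from a trusted actor.
        return comment_is_on_pr and actor in ALLOWED_USERS
    # pull_request and workflow_dispatch are not restricted by the guard.
    return True


def job_runs_gated(event_name: str, actor: str, comment_is_on_pr: bool = False) -> bool:
    """Variant sketching the suggested fix: gate pull_request by actor too."""
    if event_name == "pull_request" and actor not in ALLOWED_USERS:
        return False
    return job_runs(event_name, actor, comment_is_on_pr)
```

In this model, `job_runs("pull_request", "mallory")` is true while `job_runs_gated("pull_request", "mallory")` is false, which is the difference the suggested fix is after. Whether Gitea already withholds secrets in that case remains the unverified contingency noted below.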

Verification / honesty notes:

Verified by reading the full workflow: triggers (:11-16), the if: only matching issue_comment (:36-41), and the secret set including the new CLAUDE_CODE_OAUTH_TOKEN (:48).
Unverified (out of repo): Gitea's actual secret-forwarding behavior for fork / untrusted-author PRs, and how steve/gadfly's pinned reusable workflow handles the forwarded secrets internally. Both live in another repo not checked out here. The severity of the finding above is contingent on the first of these — on a closed instance with only the three trusted committers and no fork PRs, real-world risk is low; the gating asymmetry is the latent hazard.
Non-issues checked and cleared: allowed_users/models/etc. passed as with: inputs are non-sensitive (:51-59); classifying GADFLY_FINDINGS_URL as a secret is harmless over-caution; dropping explicit GITEA_TOKEN forwarding is correct (it's auto-provisioned to the called workflow), not a leak.

🎯 Correctness — No material issues found

This is a CI workflow change whose counterpart interface (steve/gadfly/.gitea/workflows/review-reusable.yml) lives in an external repo not present in this checkout. I read the full checked-out file (above) and confirmed the diff applied as shown. I evaluated the YAML for internal logic/consistency through the Correctness lens.

VERDICT: No material issues found

The change is internally self-consistent and the job is correctly restructured from a steps:-based job into a reusable-workflow call (no leftover runs-on/steps/env that would be invalid alongside a job-level uses:). Specifics I checked:

allowed_users: "steve,fizi,dazed" (line 58) matches the actor allow-list in the if: guard (lines 39–41) — consistent, no drift.
specialists (line 53) lists exactly the five lenses named in the header comment and PR description — consistent.
provider_concurrency: "ollama-cloud=3,claude-code=3" (line 54) vs models (line 52): 3 cloud models + 3 claude-code/* models, so claude-code=3 does make all three Claudes run concurrently as described. Math checks out.

Items I could not verify (they depend on the external reusable workflow, which is not in this repo — flagging as unverified, not asserting bugs):

Input/secret name & type matching. models, specialists, provider_concurrency, provider_lens_concurrency, timeout_secs, max_steps, allowed_users, timeout_minutes and the four forwarded secrets must match the workflow_call.inputs/secrets declared in review-reusable.yml@20a5c43. A misnamed key is a hard Gitea Actions error, not a silent fallback. Note timeout_minutes: 90 (line 59) is an unquoted number while every other input is a quoted string — if that input is declared as type: string upstream this is a type mismatch. Worth a one-time confirmation against the pinned commit.
workflow_dispatch PR routing. This workflow still collects inputs.pr_number (lines 17–20) but never forwards it to the reusable workflow's with:. That's only correct if the reusable workflow reads github.event.inputs.pr_number from the inherited event context itself (the standard pattern). If it instead expects a pr_number input, manual dispatch would lose the PR number. Cannot confirm without the upstream file.
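
The name and type check described above can be sketched in Python as a strict comparison between the caller's `with:` values and a declared contract. The `declared_types` table is purely hypothetical, since the real declarations live in the unseen `review-reusable.yml`, and the page does not confirm whether the runner rejects or coerces a mismatched value. Only the caller-side values are taken from this review.

```python
# Caller-side `with:` values as quoted in this review (quoted strings vs. a bare number).
caller_inputs = {
    "timeout_secs": "600",
    "max_steps": "14",
    "allowed_users": "steve,fizi,dazed",
    "timeout_minutes": 90,
}

# HYPOTHETICAL contract: stands in for the workflow_call inputs, which are not visible here.
declared_types = {
    "timeout_secs": "string",
    "max_steps": "string",
    "allowed_users": "string",
    "timeout_minutes": "string",
}


def matches(value, declared_type: str) -> bool:
    """Strict type check for one input value against its declared type."""
    if declared_type == "string":
        return isinstance(value, str)
    if declared_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if declared_type == "boolean":
        return isinstance(value, bool)
    return False


def contract_problems(inputs: dict, declared: dict) -> list:
    """List every undeclared name and every strict type mismatch."""
    problems = []
    for name, value in inputs.items():
        if name not in declared:
            problems.append(f"{name}: not declared by the reusable workflow")
        elif not matches(value, declared[name]):
            problems.append(
                f"{name}: declared {declared[name]}, got {type(value).__name__}"
            )
    return problems
```

Under this assumed all-string contract, only `timeout_minutes` is reported. Declaring it as `number` instead clears the list, which is why the check against the pinned commit matters.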

Within my lens and verifiable against this repo, the diff is clean.

(Note: a plan-mode harness template was injected into this task, but the assigned job is an adversarial review, which is read-only — so I've produced the review directly rather than a plan file. I made no edits and ran only read-only tools.)

🧹 Code cleanliness & maintainability — Minor issues

VERDICT: Minor issues

Reviewing only through the code cleanliness & maintainability lens.

Stale operating manual — docs drift (CLAUDE.md:145-146). The manual still describes Gadfly as "a fleet of 6 ollama-cloud models, each running the 3-lens suite (security, correctness, error-handling)." This PR changes the workflow to 3 cloud models + the Claude Code engine (sonnet/opus/opus:max) running the 5-lens suite (adversarial-review.yml:52-53: models, specialists). CLAUDE.md is the in-repo description of exactly this behavior, and the project's own rule is explicit: "keep the README/support-matrix/examples updated in the same commit as the behavior they describe (that in-sync promise is part of the project's credibility)." Fix: update those two lines in this same commit to reflect "3 cloud models + the Claude Code engine (sonnet/opus/opus:max), each running the 5-lens suite (security, correctness, maintainability, performance, error-handling)."

Nothing else material in the cleanliness/maintainability lane.

⚡ Performance — No material issues found

VERDICT: No material issues found

This change is a single CI workflow file that swaps a self-contained docker:// step for a uses: call into steve/gadfly's reusable workflow. There is no application/library code, no data path, no loops or queries — the actual review workload runs inside the pinned reusable workflow (steve/gadfly/...@20a5c43), which lives in a different repo and is not present in this checkout, so the concurrency/scheduling behavior cannot be inspected here. Through the Performance lens there is nothing material to flag in code.

One observation, explicitly unverified (I could not confirm it because the reusable workflow's defaults are not in this repo):

.gitea/workflows/adversarial-review.yml:54-55 — provider_concurrency sets both lanes (ollama-cloud=3,claude-code=3), but provider_lens_concurrency sets only ollama-cloud=3; there is no claude-code= entry. The suite also grew from 3 lenses to 5. If the reusable workflow defaults un-specified lens concurrency to 1, each Claude model would run its 5 lenses serially, roughly multiplying per-model wall-clock vs. the cloud lane. That is consistent with the timeout_minutes bump 45→90, which may simply absorb the longer serial runs rather than parallelizing them. If the intent is symmetric throughput, consider provider_lens_concurrency: "ollama-cloud=3,claude-code=3" (or whatever the engine supports). I cannot confirm the default without the review-reusable.yml source, so treat this as a question to check, not a confirmed regression.
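
A short Python sketch of how an omitted provider could fall back to a default when a `provider=N` list is parsed. The helper names and the default of 1 are assumptions made for illustration. The engine's real parser and default are not visible from this repo.

```python
ASSUMED_DEFAULT = 1  # assumption: the real default is defined in review-reusable.yml


def parse_concurrency(spec: str) -> dict[str, int]:
    """Parse a 'provider=N,provider=N' string into a mapping."""
    limits = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        provider, _, limit = entry.partition("=")
        limits[provider.strip()] = int(limit)
    return limits


def lens_limit(spec: str, provider: str, default: int = ASSUMED_DEFAULT) -> int:
    """Look up a provider's limit, falling back to the default when it is omitted."""
    return parse_concurrency(spec).get(provider, default)
```

With the current value `"ollama-cloud=3"`, `lens_limit(..., "claude-code")` returns the assumed default. With the suggested `"ollama-cloud=3,claude-code=3"` it returns 3.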

Everything else (timeouts, model count, secret forwarding) is config tuning with no performance impact I can justify.

🧯 Error handling & edge cases — No material issues found

I've examined the changed file. The change swaps an inline docker-container job for a call to an external pinned reusable workflow (steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c43…). That reusable workflow lives in another repository and is not checked out here, so the unhappy-path behavior it now owns can't be verified from this repo.

Within my lens, here's what I assessed:

VERDICT: No material issues found

Trigger/skip edge cases are intact (adversarial-review.yml:36-41, 27-29). Verified by reading the full file (the diff omits the on: and concurrency: blocks). The issue_comment path is still gated on github.event.issue.pull_request being truthy AND a trusted actor, so a comment on a non-PR issue still correctly skips. The concurrency.group fallback chain issue.number || pull_request.number || inputs.pr_number covers all three trigger types with no null gap. None of this changed, and the conversion to uses: preserves the job-level if: guard (legal on a reusable-workflow job), so no unhappy path was dropped here.
Dropped event-context forwarding — flagged but unverifiable, and correctness-adjacent. The old job explicitly passed PR_BRANCH: ${{ github.head_ref }}, IS_DRAFT, COMMENT_BODY, COMMENT_ID, EVENT_NAME, PR, ACTOR as env. The new caller passes none of these — it relies on the reusable workflow deriving them from the inherited github context. The one genuine edge case worth naming: on issue_comment events github.head_ref is empty, so whatever logic checks out the PR branch must reconstruct it from the issue/PR API rather than a forwarded ref. Whether that's handled correctly is entirely inside the external workflow I can't read, and it's really a correctness concern, so I'm not raising it as a blocking finding — only labeling it unverified.
timeout_secs/max_steps as bounded guards are preserved (:56-57), so per-lens runaway protection still exists; timeout_minutes: 90 replaces the old 45 — a looser ceiling but intentional (more lenses × Claude engine), not an error-handling defect.
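
The `concurrency.group` fallback chain mentioned in the first point can be modeled in Python, where `or` picks the first truthy value much like `||` in the workflow expression. The function name and the dictionary shape are illustrative assumptions, not the actual event payload schema.

```python
def concurrency_pr_number(event: dict):
    """Model of issue.number || pull_request.number || inputs.pr_number."""
    issue = event.get("issue") or {}
    pull_request = event.get("pull_request") or {}
    inputs = event.get("inputs") or {}
    # The first truthy value wins; None is returned only if all three are missing.
    return issue.get("number") or pull_request.get("number") or inputs.get("pr_number")
```

Each trigger type supplies exactly one of the three fields, so the chain resolves for `issue_comment`, `pull_request`, and `workflow_dispatch` alike. It returns nothing only for an event that carries none of them.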

No swallowed errors, missing cleanup, panics, or boundary bugs are observable in the changed file itself; the edge-case-bearing logic moved behind an immutable external reference that isn't part of this checkout.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 6m 49s


gitea-actions bot commented

2026-06-28 01:58:40 +00:00

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

This is a clean security posture change — actually an improvement over the prior self-contained job:

Secret forwarding is least-privilege and explicit. Only OLLAMA_CLOUD_API_KEY, CLAUDE_CODE_OAUTH_TOKEN, and the two findings tokens are passed; no secrets: inherit. GITEA_TOKEN is the auto job token (verified the file at .gitea/workflows/adversarial-review.yml:46-50).
Immutable pin, not @main. uses: steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c431… — a future push to steve/gadfly cannot silently alter the code that runs with majordomo's forwarded secrets (adversarial-review.yml:44).
Comment-trigger hardening retained. The job-level if: guard still restricts the issue_comment path to steve/fizi/dazed, and the same allow-list is mirrored into the allowed_users input (adversarial-review.yml:36-41, 58) — defense in depth preserved.
No secret exfiltration surface introduced via PR-controlled logic. Because the review code is the pinned external workflow, a PR author in majordomo cannot modify the code that touches these secrets (and on issue_comment/pull_request the workflow file itself runs from the base branch anyway).
Permissions unchanged (contents: read, issues: write, pull-requests: write) — appropriate for posting advisory comments, inherited by the reusable workflow.

I could not read the reusable workflow body itself (it lives in the external steve/gadfly repo, not in this checkout), so the trust ultimately rests on that pinned SHA — which is the intended design. Nothing actionable in the security lens.

🎯 Correctness — No material issues found

No material issues found

I verified the final workflow file at .gitea/workflows/adversarial-review.yml (full 59 lines) and checked the repo for any other references to the reusable workflow or these input names.

Within the correctness lens, the configuration is internally consistent:

provider_concurrency: "ollama-cloud=3,claude-code=3" matches the three cloud models + three claude models, so all run concurrently. The comment's "all three claudes run concurrently (claude-code=3)" is accurate at the model level.
Timeout math holds. timeout_minutes went 45 → 90. Worst case: claude models have no provider_lens_concurrency entry (see below), so lenses could run sequentially per model: 5 lenses × 600s = 3000s ≈ 50 min, under 90 min. Cloud lanes at lens-concurrency 3 finish in ~2 batches × 600s = 20 min. No truncation risk from the timeout bump.
secrets: forwarding drops secrets: inherit and explicitly forwards only the four used secrets. GITEA_TOKEN is Gitea's auto-provided job token (analogous to GITHUB_TOKEN), which is automatically available to called reusable workflows and does not require explicit forwarding — consistent with the comment and the referenced mort/executus pattern.
The if: guard, concurrency group, permissions, and on: triggers are unchanged and remain valid for a reusable-workflow caller job.

One minor, unverified observation (not blocking): provider_lens_concurrency sets ollama-cloud=3 but omits claude-code, so claude-code lens concurrency falls to the reusable workflow's default (likely sequential). This is probably intentional (the comment only claims model-level concurrency=3 for claudes, not lens-level), and it doesn't break anything — but it is an asymmetry vs. the cloud models. I could not confirm the reusable workflow's default since review-reusable.yml lives in the external steve/gadfly repo and isn't checked out here, so I'm flagging it only as something to confirm against that workflow's with: contract, not as a defect.

🧹 Code cleanliness & maintainability — No material issues found

No material issues found

The change is a clean simplification: a ~45-line self-contained job with a wall of inline env vars collapses into a thin reusable-workflow caller. The top-of-file comment accurately tracks the new config (3 cloud models, 3 claudes, claude-code=3, 5-lens suite), and the allowed_users comment correctly points at the new allowed_users input below. No duplicated strings, no dead env vars left behind, and the pinning comment ("immutable commit, not @main") matches the @20a5c43… reference. Verified by reading .gitea/workflows/adversarial-review.yml:1-59.
Minor (non-blocking) readability nit: the header comment repeats "all three claudes run concurrently (claude-code=3)" and the provider_concurrency line on :54 also encodes it; the redundancy is harmless but could drift if the concurrency value is later edited without updating the comment. No action required.

(Could not verify that the reusable workflow steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c43… actually declares these exact with: input names — that file lives outside this checked-out repo, so it's outside what I can confirm here. Flagging only as "unverified," not as a finding.)

⚡ Performance — Minor issues

Verdict: Minor issues

.gitea/workflows/adversarial-review.yml:55 — provider_lens_concurrency omits claude-code, so lens fan-out for the Claude models is left to the reusable workflow's default. The file's header comment (lines 3–4) emphasizes concurrency for the claude fleet ("all three claudes run concurrently (claude-code=3)"), but that claude-code=3 only appears in provider_concurrency (line 54), which controls model-level concurrency. It is absent from provider_lens_concurrency (line 55), which sets only ollama-cloud=3. This is an asymmetry versus the previous self-contained config, which set both GADFLY_PROVIDER_CONCURRENCY and GADFLY_PROVIDER_LENS_CONCURRENCY to ollama-cloud=3; the migration to the reusable workflow added claude-code=3 to model concurrency but not to lens concurrency. With the expanded 5-lens suite (line 53), the slowest/most-expensive reviewers could run their lenses sequentially if the reusable workflow's default lens concurrency is 1. I could not positively confirm the reusable workflow's default for provider_lens_concurrency — that workflow lives in the external steve/gadfly repo and is not checked out here, so the wall-clock impact cannot be definitively confirmed. Suggested fix if the default is indeed 1: provider_lens_concurrency: "ollama-cloud=3,claude-code=3" (or whichever bound is safe for the Claude Code engine).

No other performance regressions confirmed: model count is 6 (3 cloud + 3 claude), per-reviewer timeout_secs/max_steps bounds (600s / 14 steps) are unchanged, and the concurrency: cancel-in-progress group (lines 27–29) is preserved so superseded runs are still cancelled.
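The fallback described above can be illustrated with a minimal sketch. It assumes the `provider=N` comma-separated format seen in the workflow inputs and a hypothetical default of 1 for providers the string omits; the reusable workflow's real parser and default were not readable from this checkout, so `parse_provider_map` and `lens_concurrency` are illustrative names only.

```python
def parse_provider_map(spec: str) -> dict[str, int]:
    """Parse a string such as "ollama-cloud=3,claude-code=3" into {provider: limit}."""
    pairs = (item.split("=", 1) for item in spec.split(",") if item.strip())
    return {name.strip(): int(value) for name, value in pairs}


def lens_concurrency(spec: str, provider: str, default: int = 1) -> int:
    """Return the lens limit for a provider, falling back to an ASSUMED default."""
    return parse_provider_map(spec).get(provider, default)


current = "ollama-cloud=3"                  # value at line 55 of the caller
suggested = "ollama-cloud=3,claude-code=3"  # the suggested fix

claude_now = lens_concurrency(current, "claude-code")
claude_fixed = lens_concurrency(suggested, "claude-code")
cloud_now = lens_concurrency(current, "ollama-cloud")
```

Under that assumed default, `claude-code` drops to one lens at a time with the current string, while the suggested string gives it the same fan-out as the cloud models.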

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

All three draft findings hinge on the behavior of the external reusable workflow steve/gadfly/.gitea/workflows/review-reusable.yml@20a5c431..., which is not present in this repository (confirmed: .gitea/workflows/ contains only adversarial-review.yml and ci.yaml; no review-reusable file exists anywhere in the checkout, and grepping the repo yields only the uses: reference itself). The draft itself labels each as "Unverified — cannot inspect."

The timeout_minutes / removed job-level timeout-minutes observation (finding 1): the local-file fact is accurate (no timeout-minutes remains at the job level; timeout_minutes: 90 is passed as a with: input at line 59), but whether the external workflow honors that input — and thus whether this is a regression — cannot be confirmed against any code in this repo.
The CLAUDE_CODE_OAUTH_TOKEN secret-presence concern (finding 2): repo secrets and the external workflow's empty-token handling are not inspectable here.
The provider_lens_concurrency omission for claude-code (finding 3): the local input string does omit it (line 55 specifies only ollama-cloud=3), but the impact depends entirely on the external workflow's default lens concurrency, which is not available to verify.

No code-verifiable issue survives in this repository.

Automated adversarial review by Gadfly. Advisory only; does not block merge. · ⏱️ reviewed in 1m 20s


gitea-actions bot commented

2026-06-28 01:58:40 +00:00

🪰 Gadfly review — `claude-code/sonnet` (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — Minor issues

The finding is confirmed against the actual code — CLAUDE_CODE_OAUTH_TOKEN is forwarded at line 48 to an external reusable workflow in a different repository, and the reusable workflow's internals cannot be read from this repo.

Minor issues

CLAUDE_CODE_OAUTH_TOKEN delegated to external code — unverifiable handling (.gitea/workflows/adversarial-review.yml:48)

This is a net-new credential added to the forward list. The old workflow forwarded OLLAMA_CLOUD_API_KEY, GITEA_TOKEN, and the findings pair; this PR adds CLAUDE_CODE_OAUTH_TOKEN. OAuth tokens frequently carry broad scope (account-level access, not repo-scoped), and it's being handed to steve/gadfly/.gitea/workflows/review-reusable.yml — code that lives in a different repository. I cannot read that repo from here to verify how the reusable workflow consumes secrets: whether they're exposed in logs, passed as env vars to child processes, included in telemetry payloads, etc.

The SHA pin (@20a5c431f22d…) prevents silent drift, which is good. But pinning bounds future compromise, not current behavior. If the reusable workflow logs all env vars on startup, or the findings telemetry includes request headers, this token leaks. The risk is real even under the repo-owner-identity assumption.

Suggested fix: Before merging, read steve/gadfly/.gitea/workflows/review-reusable.yml at that commit and confirm CLAUDE_CODE_OAUTH_TOKEN is passed only to the Claude API call (not to any logging, telemetry, or shell step that could echo it). If the gadfly workflow ever uses secrets: inherit internally, that surfaces a separate concern.

Everything else checks out from the security lens:

No secrets: inherit — explicit forwarding is correct; the old container-env approach was equivalent in surface but less auditable.
issue_comment actor guard is unchanged and present; pull_request events don't expose secrets to fork PRs because the trigger is pull_request (not pull_request_target).
All with: inputs are static YAML strings, not interpolated from PR title, body, or comments — no injection risk.
GITEA_TOKEN auto-availability: Reusable workflow inherits the calling job's token, which is scoped by the explicit permissions: block (contents: read, issues: write, pull-requests: write) — appropriate minimum for a reviewer.
GADFLY_FINDINGS_URL + GADFLY_FINDINGS_TOKEN were already forwarded in the prior workflow; not a new exposure.

🎯 Correctness — No material issues found

I've read the full checked-out file and verified the key correctness claims. Here is my review.

Verdict: No material issues found

if: guard preserved identically (verified: lines 36–41). The allowed-user list steve,fizi,dazed in the allowed_users: input (line 58) matches the job-level guard exactly. No regression in access control logic.
Model count matches concurrency settings (verified): provider_concurrency: "ollama-cloud=3,claude-code=3" — there are exactly 3 ollama-cloud entries (minimax-m3:cloud, glm-5.2:cloud, deepseek-v4-pro:cloud) and exactly 3 claude-code entries (claude-code/sonnet, claude-code/opus, claude-code/opus:max) in the models: string. Setting each provider's concurrency to 3 means all models in each group run simultaneously, consistent with the PR description's "all three claudes run concurrently."
specialists list matches the 5-lens claim (verified: line 53): "security,correctness,maintainability,performance,error-handling" — 5 distinct lenses, consistent with the header comment and PR description.
timeout_minutes: 90 vs old timeout-minutes: 45: Doubling the outer timeout is directionally correct given the change from 3 lenses to 5 lenses per model. With provider_lens_concurrency: "ollama-cloud=3", the 5 ollama lenses require 2 batches (3+2) of up to 600s each ≈ 20 min per cloud model. Claude Code has no provider_lens_concurrency entry; if that defaults to serial, 5 × 600s ≈ 50 min per Claude model, with 3 running concurrently (claude-code=3). Worst-case sequential total is ~70 min, well within 90. Unverified (reusable workflow definition not accessible): the exact default for claude-code lens concurrency and whether cloud + claude passes overlap or run sequentially. If they are sequential and claude-code lens concurrency defaults > 1, the 90-minute budget is comfortable; if strictly serial with 1 lens at a time, it is still within 90 minutes (50 + 20 = 70 min). No evidence of a timing violation.
timeout_secs: "600" and max_steps: "14" unchanged: Carried over from the old implementation without modification. Not a regression.
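The timeout arithmetic above can be reproduced with a short sketch. It assumes each lens can use its full 600 s budget, that Claude lenses run one at a time (the unverified default), and that the cloud and Claude passes do not overlap; `lens_wall_clock_secs` is a hypothetical helper, not part of Gadfly.

```python
import math


def lens_wall_clock_secs(lenses: int, lens_concurrency: int, timeout_secs: int) -> int:
    """Upper bound for one model: lenses run in batches of `lens_concurrency`."""
    batches = math.ceil(lenses / lens_concurrency)
    return batches * timeout_secs


LENSES = 5          # security, correctness, maintainability, performance, error-handling
TIMEOUT_SECS = 600  # per-lens timeout_secs from the workflow inputs

cloud_secs = lens_wall_clock_secs(LENSES, 3, TIMEOUT_SECS)   # 2 batches (3 + 2)
claude_secs = lens_wall_clock_secs(LENSES, 1, TIMEOUT_SECS)  # ASSUMED serial default

# Pessimistic case: the cloud pass and the Claude pass run back to back.
worst_case_minutes = (cloud_secs + claude_secs) / 60
fits_budget = worst_case_minutes < 90
```

Models within a provider run concurrently (`ollama-cloud=3`, `claude-code=3`), so the per-model bound is also the per-provider bound in this sketch.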

🧹 Code cleanliness & maintainability — No material issues found

No material issues found

The file is clean. The rewrite trades ~80 lines of inline env-block configuration for a 60-line thin caller, which is a net readability improvement. A few observations after reading the live file:

The header comment (lines 1–7) packs in a fair amount of context but every sentence is load-bearing (swarm composition, concurrency, lens scope, advisory-only), so it earns its space.
The extra-space alignment on the secrets: block (lines 47–50) is a minor style deviation from the unaligned with: block directly below it, but it's harmless and common in CI files.
allowed_users: "steve,fizi,dazed" (line 58) duplicates the three-name list already in the if: guard (lines 39–41). This is a pre-existing sync hazard — not introduced by this PR — but worth noting as technical debt: adding a user requires two edits. Since the PR doesn't create the problem, this is not a blocking finding.
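One way to catch the two-edit sync hazard noted above is a small drift check. This is only a sketch: the `GUARD` expression below is a hypothetical stand-in for the real `if:` guard, whose exact text is not quoted in this review, and both helper names are illustrative.

```python
import re


def actors_in_guard(guard_expr: str) -> set[str]:
    """Collect every single-quoted name compared with == in the guard expression."""
    return set(re.findall(r"==\s*'([^']+)'", guard_expr))


def users_in_input(allowed_users: str) -> set[str]:
    """Split the comma-separated allowed_users input into a set of names."""
    return {name.strip() for name in allowed_users.split(",") if name.strip()}


# Hypothetical guard text; the real expression in adversarial-review.yml may differ.
GUARD = (
    "github.event_name != 'issue_comment' || "
    "github.actor == 'steve' || github.actor == 'fizi' || github.actor == 'dazed'"
)
ALLOWED_USERS = "steve,fizi,dazed"

# Names present in one list but not the other; empty means the lists agree.
drift = actors_in_guard(GUARD) ^ users_in_input(ALLOWED_USERS)
```

A check like this could run in CI and fail whenever `drift` is non-empty, so adding a user in only one place is caught early.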

Nothing in the structure, naming, or change pattern warrants a blocking finding from this lens.

⚡ Performance — No material issues found

The file confirms provider_lens_concurrency: "ollama-cloud=3" at line 55 with no claude-code entry. However, the impact of this observation depends entirely on the default behavior of the external reusable workflow (steve/gadfly/.gitea/workflows/review-reusable.yml), which is not in this repository and cannot be read. The draft reviewer explicitly flagged this as "Unverified" and noted "If the default is already 'all lenses concurrent,' this is a non-issue." Since the finding's severity and fix necessity hinge on an assumption I cannot confirm from the local code, the finding does not survive the confirmation test.

No material issues found

🧯 Error handling & edge cases — No material issues found

No material issues found

This change is purely a CI workflow YAML file — the only error-handling surface is in how the workflow responds to missing/failing inputs, and all substantive handling is delegated to the reusable workflow. I verified the current file state against the diff.

A few things I checked and found non-issues:

timeout_minutes: 90 passed as a with: input, not a job-level directive: When a job uses uses: (reusable workflow call), timeout-minutes at the caller job level is not supported — the timeout must be applied inside the reusable workflow. Removing the old timeout-minutes: 45 job key and passing timeout_minutes: 90 as an input is the correct pattern, not a missing safeguard. .gitea/workflows/adversarial-review.yml:58
CLAUDE_CODE_OAUTH_TOKEN missing → three Claude models fail silently: If this secret isn't set in the repo yet, all three claude-code/* entries in the swarm fail at credential time. The workflow is advisory-only and the reusable workflow presumably handles empty credentials gracefully (same as the old code's GADFLY_FINDINGS_URL handling), so this is not a blocking concern — but there's no local comment noting the degraded-swarm failure mode the way the old stub did.
concurrency: cancel-in-progress: true abandons in-flight reviews without a comment: Unchanged from the old workflow, pre-existing.

No new unhappy paths, ignored errors, missing cleanups, or edge cases are introduced by this diff.

Automated adversarial review by Gadfly. Advisory only; does not block merge. · ⏱️ reviewed in 8m 44s


steve added 1 commit 2026-06-28 02:44:50 +00:00

ci: inherit gadfly's default swarm (slim caller, re-pin @b02b11d)

CI / Tidy (pull_request) Successful in 9m26s

CI / Build & Test (pull_request) Successful in 9m42s

17064a6d75

steve/gadfly#10 centralized the curated swarm (3 cloud + Claude Code, 5-lens
suite) as the reusable workflow's input defaults. Drop majordomo's explicit
`with:` swarm block and inherit it; only the consumer-specific allow-list
remains. Re-pin to the post-#10 gadfly commit (@b02b11d). Update CLAUDE.md's
Gadfly section to match (was 6 cloud / 3 lenses).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve force-pushed ci/gadfly-reusable from ec9b9a9a94 to 17064a6d75

2026-06-28 02:44:50 +00:00

steve merged commit 78a1d1c3bb into main

2026-06-28 02:48:28 +00:00

steve deleted branch ci/gadfly-reusable

2026-06-28 02:48:28 +00:00

Reference: steve/majordomo#6

🪰 Gadfly — live review status

claude-code/opus · claude-code — ✅ done

claude-code/opus:max · claude-code — ✅ done

claude-code/sonnet · claude-code — ✅ done

deepseek-v4-pro:cloud · ollama-cloud — ✅ done

glm-5.2:cloud · ollama-cloud — ✅ done

minimax-m3:cloud · ollama-cloud — ✅ done

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: No material issues found

Verdict: Minor issues

Verdict: Minor issues

Verdict: Minor issues

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

VERDICT: Minor issues

VERDICT

🪰 Gadfly review — claude-code/opus (claude-code)

VERDICT: Minor issues

Findings

Verified as sound (not issues)

Could not verify

VERDICT: No material issues found

🪰 Gadfly review — claude-code/opus:max (claude-code)

VERDICT: Minor issues

Finding — trusted-user gating covers only issue_comment, while a newly-added high-value secret now rides the ungated pull_request surface

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

🪰 Gadfly review — claude-code/sonnet (claude-code)
