ci(gadfly): trim the weakest reviewers from the swarm #5
Removes the four lowest-value reviewers per your call:
- `m5/qwen3.6:35b-mlx` (also drops the last local Mac → cloud-only now)
- `gemma4:cloud`
- `gpt-oss:120b-cloud`
- `kimi-k2.7-code:cloud`
Result: a cloud-only fleet of 6: `minimax-m3`, `glm-5.2`, `glm-5.1`, `deepseek-v4-pro`, `nemotron-3-super`, `qwen3-coder:480b`.
Also removes `GADFLY_ENDPOINT_M5` and the `m5=1` concurrency entry (no local lanes left), drops the per-job timeout 90m → 45m, and keeps the README/CLAUDE.md descriptions in sync.
🤖 Generated with Claude Code
🪰 Gadfly review: `glm-5.2:cloud` (ollama-cloud)
Verdict: No material issues found (3 reviewers: security, correctness, error-handling)
🔒 Security: No material issues found
Verdict: No material issues found
- The diff removes the `GADFLY_ENDPOINT_M5` secret mapping and the `m5=1` provider-concurrency entry, leaving only cloud providers. All remaining secrets (`GITEA_TOKEN`, `OLLAMA_CLOUD_API_KEY`, `GADFLY_FINDINGS_URL`, `GADFLY_FINDINGS_TOKEN`) are still explicitly mapped, so no credential is silently dropped or left dangling. Verified via `grep` that no remaining reference to `GADFLY_ENDPOINT_M5` exists in the workflow or repo config.
- The trusted-user gate (`if:` + `GADFLY_ALLOWED_USERS`) is unchanged, so authn/authz for the secret-bearing comment trigger is intact. Verified at `.gitea/workflows/adversarial-review.yml:38-43,68`.
🎯 Correctness: No material issues found
VERDICT: No material issues found
- `GADFLY_MODELS` (`.gitea/workflows/adversarial-review.yml:58`) now contains exactly the six advertised survivors (`minimax-m3`, `glm-5.2`, `glm-5.1`, `deepseek-v4-pro`, `nemotron-3-super`, `qwen3-coder:480b`), and the four removed models (`m5/qwen3.6`, `gemma4`, `gpt-oss`, `kimi-k2.7`) are gone from the env. Verified by reading the file.
- `GADFLY_PROVIDER_CONCURRENCY` (`:59`) correctly drops the `m5=1` entry now that no `m5` model remains in `GADFLY_MODELS`; no orphaned concurrency reference.
- The `GADFLY_ENDPOINT_M5` env mapping is removed (the `:50` area no longer references it), consistent with dropping the local Mac lane. No code path in the workflow references it after the change (verified via grep across the repo; remaining `m5/...` / `GADFLY_ENDPOINT_M5` hits are unrelated docs/tests/examples, not this workflow).
- `timeout-minutes: 45` (`:48`) is internally consistent with the comment justification (no slow local lane dominating wall time). This is a config threshold with no first-principles "correct" value to re-derive against; the change is self-consistent with the removed lane.
- The comment at `:4` still says "this build is what carries foreman provider-type support". That is now slightly stale since no foreman/Mac endpoint is used, but it is a documentation nit outside the correctness lens (no behavioral impact; foreman support is still carried by the image regardless).
No logic/behavioral defects introduced through the correctness lens.
🧯 Error handling & edge cases: No material issues found
Verdict: No material issues found
Through the error-handling / edge-case lens, I checked the unhappy paths this config diff touches:
- `GADFLY_MODELS` now lists only `:cloud` / `:480b-cloud` models, which all resolve to the `ollama-cloud` provider (confirmed via `builtin.go:20`, `ProviderOllamaCloud = "ollama-cloud"`, and `parse_test.go` showing `:cloud` / `:480b-cloud` tags stay under `ollama-cloud`). `GADFLY_PROVIDER_CONCURRENCY` and `GADFLY_PROVIDER_LENS_CONCURRENCY` both still specify `ollama-cloud=3`, and the removed `m5=1` entry is no longer needed because no model references provider `m5`. No orphaned provider name that would cause a parse/startup error. Verified.
- Timeout 90m → 45m: the six models run three at a time (`GADFLY_PROVIDER_CONCURRENCY` and `GADFLY_PROVIDER_LENS_CONCURRENCY` are both `ollama-cloud=3`) under a 600s per-lens deadline (`GADFLY_TIMEOUT_SECS`). Worst case ≈ 2 × 600s ≈ 20 min of deadline-bound work, well under the 45m job cap. The reduction is safe with margin. Verified against the values in the file (`adversarial-review.yml:48,58,59,60,65`).
- Removed secret mapping (`GADFLY_ENDPOINT_M5`): dropping the env mapping is consistent with removing the only foreman model. If the repo secret still exists unused, it's harmless (no longer passed in). No nil/empty fallback path introduced.
(One stale-comment note: `adversarial-review.yml:4` still says the image "carries foreman provider-type support," now that no foreman/Mac model remains, but that's a documentation issue outside my lens, not an error-handling defect.)
No edge-case regressions introduced by this change.
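The timeout argument in this review reduces to one line of arithmetic. A minimal sketch of it, assuming lenses within a model run in parallel (so a model takes at most one `GADFLY_TIMEOUT_SECS` deadline) and models are processed in full waves of three. The function name is illustrative only and is not part of Gadfly:

```python
import math


def worst_case_wall_minutes(models: int, model_slots: int, lens_timeout_secs: int) -> float:
    """Upper bound on wall time if every lens runs to its deadline.

    Assumption: a model's lenses run in parallel, so one model costs at most
    one lens timeout, and models are scheduled `model_slots` at a time.
    """
    waves = math.ceil(models / model_slots)
    return waves * lens_timeout_secs / 60


# Values quoted in the review: 6 models, ollama-cloud=3, 600s per lens, 45m job cap.
bound = worst_case_wall_minutes(models=6, model_slots=3, lens_timeout_secs=600)
job_cap_minutes = 45
print(f"worst case ~{bound:.0f} min, headroom {job_cap_minutes / bound:.2f}x")
```

With a single slot the same formula gives 60 minutes, which is why the 45-minute cap relies on the provider concurrency staying at 3.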
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 0s
🪰 Gadfly review: `glm-5.1:cloud` (ollama-cloud)
Verdict: No material issues found (3 reviewers: security, correctness, error-handling)
🔒 Security: No material issues found
VERDICT: No material issues found
The change removes four reviewer models and a local Mac endpoint from the Gadfly fleet. From the security lens:
- The `GADFLY_ENDPOINT_M5` secret is no longer injected into the workflow, which is a slight net positive (one fewer credential in the job environment, one fewer network egress path to a foreman host).
- The allowed-users list (`steve`, `fizi`, `dazed`) and the job-level `if:` guard are unchanged, so the authz posture is the same.
The remaining secrets (`GITEA_TOKEN`, `OLLAMA_CLOUD_API_KEY`, `GADFLY_FINDINGS_URL`, `GADFLY_FINDINGS_TOKEN`) are unchanged and already appropriately scoped. Nothing to flag.
🎯 Correctness: No material issues found
VERDICT: No material issues found
- The four removed models (`m5/qwen3.6:35b-mlx`, `gemma4:cloud`, `gpt-oss:120b-cloud`, `kimi-k2.7-code:cloud`) have no remaining active references in the codebase (only a stale test fixture reference to `gpt-oss:120b` in `provider/ollama/ollama_test.go:250`, which is unrelated provider-test data, not Gadfly config).
- `GADFLY_ENDPOINT_M5` and the `m5=1` concurrency entry are fully removed; no stale references found.
- The `GADFLY_MODELS` value lists exactly 6 models, all with the `:cloud` provider suffix, consistent with the single `ollama-cloud=3` concurrency entry.
- `timeout-minutes: 45` vs `GADFLY_TIMEOUT_SECS: "600"` (10 min per lens): with `ollama-cloud=3` model concurrency and 6 models, worst-case wall time is 2 batches × 600s ≈ 20 min, well within 45 min. No correctness issue.
🧯 Error handling & edge cases: No material issues found
VERDICT: No material issues found
Reviewed through the error-handling & edge-case lens:
- Removed `GADFLY_ENDPOINT_M5` secret: No remaining references to this env var anywhere in the repo (verified via grep), so a missing/empty secret cannot cause a runtime error.
- Removed `m5=1` from `GADFLY_PROVIDER_CONCURRENCY`: All `m5/qwen3.6:35b-mlx` models are gone from `GADFLY_MODELS`, so the stale concurrency entry would be unreachable anyway. Removing it is correct and avoids any "unknown provider" parsing edge case in Gadfly.
- `GADFLY_ENDPOINT_M5` will become an orphaned secret, but that's a hygiene concern, not an error-handling issue: Gitea won't inject it unless referenced.
- The remaining models are all `:cloud`-suffixed and all route to the single `ollama-cloud` provider, whose concurrency entries (`ollama-cloud=3`) are intact. No orphaned provider references.
No edge-case or error-handling defects found in this change.
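The "no orphaned provider" check that the reviewers describe can be expressed as a small set comparison. This is a sketch under stated assumptions: `GADFLY_MODELS` is taken to be a comma-separated list, and `provider_of` encodes a hypothetical routing rule (a `name/` prefix selects that provider, a `cloud` or `-cloud` tag selects `ollama-cloud`). Gadfly's real parser lives in the external image and may differ:

```python
def provider_of(model: str) -> str:
    """Hypothetical routing rule, NOT Gadfly's actual parser."""
    if "/" in model:
        return model.split("/", 1)[0]      # e.g. "m5/qwen3.6:35b-mlx" -> "m5"
    tag = model.rsplit(":", 1)[-1]
    if tag == "cloud" or tag.endswith("-cloud"):
        return "ollama-cloud"
    return "ollama"                        # placeholder default for this sketch


def parse_concurrency(spec: str) -> dict[str, int]:
    """Parse a string such as "ollama-cloud=3,m5=1" into {provider: limit}."""
    pairs = (item.split("=", 1) for item in spec.split(",") if item)
    return {name.strip(): int(limit) for name, limit in pairs}


def check_providers(models_csv: str, concurrency_spec: str) -> dict[str, list[str]]:
    """Report providers used without a limit, and limits with no model behind them."""
    used = {provider_of(m.strip()) for m in models_csv.split(",") if m.strip()}
    declared = set(parse_concurrency(concurrency_spec))
    return {"missing": sorted(used - declared), "orphaned": sorted(declared - used)}


# Model names as they appear in the review headers of this PR.
NEW_FLEET = (
    "minimax-m3:cloud,glm-5.2:cloud,glm-5.1:cloud,"
    "deepseek-v4-pro:cloud,nemotron-3-super:cloud,qwen3-coder:480b-cloud"
)

print(check_providers(NEW_FLEET, "ollama-cloud=3"))
print(check_providers(NEW_FLEET, "ollama-cloud=3,m5=1"))  # the entry this PR removes
```

The second call reports `m5` as orphaned, which is the stale entry the PR drops. Whether Gadfly would treat such an entry as a no-op or as a parse error is not something this repo can confirm.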
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 30s
🪰 Gadfly review: `minimax-m3:cloud` (ollama-cloud)
Verdict: Minor issues (3 reviewers: security, correctness, error-handling)
🔒 Security — No material issues found
Reviewing through the Security lens — this PR is purely a fleet-size trim and timeout shrink in the CI workflow, with parallel documentation updates. No new code paths, no new trust boundaries, no new taint sources, no input handling changes.
What I verified:
- `GADFLY_ENDPOINT_M5` is gone from the workflow (grep across repo returns zero hits outside the deleted block). The foreman DSN plumbing for `m5` is not referenced anywhere else (`.env.example` and `README.md` still describe the `LLM_M*` foreman DSN feature generically; those are the library's documented capabilities, not a CI integration, and they're not security-relevant to this change).
- The remaining `secrets.*` bindings (`GITEA_TOKEN`, `OLLAMA_CLOUD_API_KEY`, `GADFLY_FINDINGS_URL`, `GADFLY_FINDINGS_TOKEN`) and the trusted-user `if:` guard (`steve|fizi|dazed`) plus `GADFLY_ALLOWED_USERS` are unchanged.
- The `permissions:` block (`contents: read`, `issues: write`, `pull-requests: write`) is unchanged, so there are still no elevated scopes.
- `timeout-minutes: 90 → 45` reduces the worst-case window in which a runaway reviewer run holds secrets in memory and posts comments. That's a mild improvement on the secrets-exposure surface, not a regression.
- `GADFLY_MODELS` / `GADFLY_PROVIDER_CONCURRENCY` / `GADFLY_PROVIDER_LENS_CONCURRENCY` are untrusted-input-free envs set by the workflow author; no user input is interpolated into them. Model name strings are not executed anywhere a security lens cares about.
No material issues found.
The change is purely a fleet trim and a documentation sync. Through the security lens it is a small net positive (one fewer secret plumbed into the job, smaller worst-case runtime window for the secrets-bearing run). Nothing in the diff opens a new injection, SSRF, deserialization, authz, or secret-leakage vector.
🎯 Correctness — No material issues found
All consistent. The CLAUDE.md is updated and README.md has no Gadfly-fleet claim (its references to `m5/qwen3:30b`, `m1/qwen3:30b` are in library code examples, not workflow descriptions).
Let me verify the per-lens concurrency math one more time. The fleet has 6 cloud models. With `GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3"`, at most 3 of those 6 are running concurrently (sliced into 2 batches of 3). Each running provider fans out its 3 lenses (`GADFLY_PROVIDER_LENS_CONCURRENCY: "ollama-cloud=3"`). Each lens has a 600s deadline. Worst-case wall time: ~1200s = 20min (two sequential provider-batches, each running 3 lenses in parallel for up to 600s). The 45-min job timeout is fine.
Actually wait, I should think about this more carefully. If provider concurrency is 3, then up to 3 of the 6 cloud models run in parallel. As each finishes, the next one starts. That's a single wave shape, not 2 sequential batches. So in the worst case where ALL 6 take 600s each, with 3 slots, the last one wouldn't start until 3 × 600s = 1800s = 30min in, past the 45-min timeout.
Let me re-derive: if provider concurrency = 3 and there are 6 models, the queue is processed 3-at-a-time. If each model takes 600s (max), the LAST model starts at (5/3) × 600s ≈ 1000s into the run and runs for 600s, finishing at ~1600s ≈ 27min. Within 45min. ✓
But there's still concern: if lenses run concurrently within a model (3 lenses parallel), and the model takes 600s total wall time (its slowest lens), then with provider concurrency = 3, total worst case ≈ 27 min. With provider concurrency = 1 (serial per provider), it would be 6 × 600s = 60min, which would EXCEED 45min. But provider concurrency = 3 means parallel, so we're fine.
Actually, let me look at this differently: the per-model wall time is bounded by its slowest lens (lenses run in parallel within a provider, so model wall time ≤ lens timeout = 600s). With 6 models and concurrency 3, the 4th/5th/6th models can start as soon as any of the first 3 finishes. So total wall time ≤ 600s × 2 = 1200s = 20min in the absolute worst case. Definitely under 45min. ✓
So 90m → 45m is reasonable.
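The slot-filling argument in the preceding paragraphs can be checked with a short simulation rather than by hand. A minimal sketch, assuming each model occupies one provider slot for at most one lens deadline (600s) and that a queued model starts as soon as any slot frees up. `makespan` is a hypothetical helper, not part of Gadfly:

```python
import heapq


def makespan(durations: list[int], slots: int) -> int:
    """Finish time when jobs start in order, each on the earliest-free slot."""
    free_at = [0] * slots            # time at which each slot becomes free
    heapq.heapify(free_at)
    finish = 0
    for duration in durations:
        start = heapq.heappop(free_at)   # earliest-free slot takes the next job
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


all_slow = [600] * 6                 # every model runs to the 600s deadline
print(makespan(all_slow, slots=3) / 60, "min with 3 slots")
print(makespan(all_slow, slots=1) / 60, "min with 1 slot")
```

Under these assumptions six 600s models finish at 1200s with three slots and at 3600s with one, which supports the final 20-minute figure over the intermediate 30- and 27-minute estimates.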
Now let me look one more time at the comment update on line 55-57:
The original was nearly identical; this is just a cleanup removing the M5-specific phrasing. ✓
VERDICT: No material issues found.
No material issues found.
This PR is a self-contained trim of the review fleet and its supporting documentation. Verified each claim:
- Model list (`.gitea/workflows/adversarial-review.yml:58`). The new `GADFLY_MODELS` lists exactly 6 entries (`minimax-m3`, `glm-5.2`, `glm-5.1`, `deepseek-v4-pro`, `nemotron-3-super`, `qwen3-coder:480b-cloud`), which matches the PR's "cloud-only fleet of 6" claim.
- Removed models. Each of the four (`m5/qwen3.6:35b-mlx`, `gemma4:cloud`, `gpt-oss:120b-cloud`, `kimi-k2.7-code:cloud`) was present in the original model string and is absent in the new one; the six remaining names are the original minus those four. 10 → 6 ✓.
- `GADFLY_ENDPOINT_M5` removal is safe. After the model list change, no `m5/...` reference remains in `GADFLY_MODELS`, so the secret mapping has no consumer (grep confirmed the only `GADFLY_ENDPOINT_M5` occurrence was the removed line).
- `m5=1` concurrency removal is safe. Same reason: no `m5` provider in `GADFLY_MODELS`, so the entry would have been a no-op.
- Timeout 90m → 45m. With `GADFLY_PROVIDER_CONCURRENCY=ollama-cloud=3`, `GADFLY_PROVIDER_LENS_CONCURRENCY=ollama-cloud=3`, and a per-lens deadline of 600s: each model's wall time is bounded by its slowest lens (≤600s because lenses within a provider run in parallel), and at most 2 full provider-waves are queued, giving a worst-case wall time ≤ 1200s ≈ 20 min. 45 min is comfortably above that bound; the old 90 min was paying for the slow local Mac lane that no longer exists.
- Documentation (`CLAUDE.md:143–146`). Updated to "fleet of 6 ollama-cloud models"; README.md has no fleet-size claim to update (its `m5/qwen3` references are library usage examples in code blocks, not workflow descriptions).
Nothing to flag in the correctness lane: model set, concurrency knobs, timeout budget, and secret mapping all line up.
🧯 Error handling & edge cases — Minor issues
These are general mort library concerns about per-target retries inside majordomo, not about gadfly's multi-model posting behavior. The draft's claim about gadfly's runtime behavior ("zero findings with no diagnostic lane") is unverifiable from this repo since gadfly is an external tool.
Now let me write the corrected review. The first finding's factual core (provider diversity removed; old graceful-degradation comment deleted) is verifiable, but the dramatic conclusion about gadfly going all-or-nothing isn't. Findings 2 and 3 contain arithmetic that's partially wrong (Finding 2's "2× worst case") and partially unverifiable speculation (rate limiting). Let me produce a clean, verifiable output.
VERDICT: Minor issues
- `.gitea/workflows/adversarial-review.yml:45–66` (verified by diff vs. previous file): The fleet now consists solely of `*:cloud` models (`GADFLY_MODELS`, line 58) and `GADFLY_PROVIDER_CONCURRENCY` only declares `ollama-cloud=3` (line 59); `GADFLY_ENDPOINT_M5` and the M5 entry in `GADFLY_MODELS` were removed. The header comment that previously documented graceful degradation ("If the Mac is offline that model's comment shows an error and the others still post") is also gone. `CLAUDE.md:143–146` is updated consistently to "6 ollama-cloud models". The factual loss of the only non-cloud provider lane is confirmed by the diff. Whether this constitutes a regression in review reliability depends on gadfly's behavior when the cloud provider is unreachable, which lives in the external `gitea.stevedudenhoeffer.com/steve/gadfly:sha-d7f364d` image and is not in this repo, so that part of the claim is not verifiable here. Worth restating the new failure mode honestly in the workflow header (or adding a second provider lane) before merging, but not blocking on what can be checked in-tree.
- `.gitea/workflows/adversarial-review.yml:48,58–60,65` (timeout/concurrency math): `timeout-minutes: 45` vs. `GADFLY_TIMEOUT_SECS: "600"` × `ceil(6/3)=2` model-batches ≈ 20 min worst-case if all requests run to deadline, i.e. ~2.25× headroom, not the "2×" the draft asserts. The 45-min job budget is reasonable. `GADFLY_PROVIDER_CONCURRENCY=3` × `GADFLY_PROVIDER_LENS_CONCURRENCY=3` = up to 9 concurrent ollama-cloud calls per CI runner; that arithmetic is correct. The claim that ollama-cloud rate-limits "will eat 429s and burn the full 45 min" is speculation about an external service's behavior and not verifiable from this repo. No actionable change required based on what is in-tree.
Automated adversarial review by Gadfly. Advisory only; does not block merge. · ⏱️ reviewed in 2m 27s
🪰 Gadfly review: `deepseek-v4-pro:cloud` (ollama-cloud)
Verdict: No material issues found (3 reviewers: security, correctness, error-handling)
🔒 Security: No material issues found
No material issues found.
The diff is a net security positive through the security lens:
- `GADFLY_ENDPOINT_M5` secret removed: the foreman URL+token that connected to a local Mac is no longer exposed as an environment variable in the workflow. Verified via `grep` that no references to `GADFLY_ENDPOINT_M5` or `GADFLY_ENDPOINT` remain anywhere in the repo. This shrinks the secret surface area.
- The remaining secrets (`GITEA_TOKEN`, `OLLAMA_CLOUD_API_KEY`, `GADFLY_FINDINGS_URL`, `GADFLY_FINDINGS_TOKEN`) are unchanged. No new user-controlled input flows into the container.
- The `if:` condition restricting comment-triggered runs to `steve`, `fizi`, `dazed` is untouched, and `GADFLY_ALLOWED_USERS` mirrors it.
🎯 Correctness: No material issues found
No material issues found
The diff is a straightforward fleet trim: four models are removed from `GADFLY_MODELS`, their associated provider registration (`GADFLY_ENDPOINT_M5`) and concurrency entry (`m5=1`) are dropped, and the timeout is halved. I verified:
- `m5/qwen3.6:35b-mlx` was the sole consumer of the `m5` provider (registered via `GADFLY_ENDPOINT_M5`) and the sole entry behind `m5=1` in `GADFLY_PROVIDER_CONCURRENCY`. All three are removed together, so nothing is left behind that would reference a missing provider or concurrency key.
- Six models remain in the `GADFLY_MODELS` string, all `:cloud`-suffixed, all served by `ollama-cloud`, consistent with the `ollama-cloud=3` concurrency settings.
- `m5/qwen3:30b` (line 81) and other repo references are majordomo library examples/docs. They describe the library's own env-DSN provider feature, not Gadfly's fleet, and are unaffected by this change.
🧯 Error handling & edge cases: No material issues found
No material issues found.
The diff is a clean removal: the four models, the `GADFLY_ENDPOINT_M5` env var, and the `m5=1` concurrency entry are all dropped together. I verified:
- `grep` confirms `GADFLY_ENDPOINT_M5`, `m5/qwen`, and `m5=1` no longer appear in the workflow file. The remaining `GADFLY_MODELS` list (6 entries, all `:cloud`) is consistent with the single `GADFLY_PROVIDER_CONCURRENCY` entry (`ollama-cloud=3`).
- With no model referencing the `m5` provider, the container won't attempt to resolve `GADFLY_ENDPOINT_M5`, so removing the env var is safe.
- `GADFLY_TIMEOUT_SECS: "600"` still bounds individual runaway reviews regardless.
Automated adversarial review by Gadfly. Advisory only; does not block merge. · ⏱️ reviewed in 31s
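Several reviewers report a repo-wide grep for `GADFLY_ENDPOINT_M5`, with slightly different conclusions about leftover hits in docs and tests. A minimal sketch of an equivalent scan that lists each hit by file and line so it can be classified; `find_references` is a hypothetical helper and the repo layout in the usage note is an assumption:

```python
from pathlib import Path


def find_references(root: str, needle: str) -> list[tuple[str, int]]:
    """Return (relative path, line number) for every line containing `needle`."""
    hits: list[tuple[str, int]] = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything inside the .git metadata directory.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file: ignore for this sketch
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits
```

Calling `find_references(".", "GADFLY_ENDPOINT_M5")` from a checkout would return an empty list if the variable is gone everywhere, or the specific files and lines that still mention it.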
🪰 Gadfly review: `nemotron-3-super:cloud` (ollama-cloud)
Verdict: No material issues found (3 reviewers: security, correctness, error-handling)
🔒 Security — No material issues found
No material issues found
🎯 Correctness — No material issues found
No material issues found. The changes align with the PR description: removed the four specified reviewers, removed the corresponding endpoint and concurrency setting, reduced the timeout from 90 to 45 minutes, and updated the comments accordingly. All changes were verified in the workflow file. No correctness issues were detected.
🧯 Error handling & edge cases — No material issues found
No material issues found
Reasoning (Error handling & edge cases lens):
The changes are limited to workflow configuration (`.gitea/workflows/adversarial-review.yml`) and documentation (`CLAUDE.md`). No actual Gadfly source code is modified, so error handling concerns are limited to the workflow's operational aspects:
- Timeout reduction (90 → 45 min): The workflow timeout was halved after removing the slow local Mac lane and three cloud models. The comment states "Plenty of headroom for the cloud lanes." Without runtime data, this is a judgment call but not an outright error handling flaw; it is a configuration adjustment. If the timeout proves insufficient in practice, the workflow would fail visibly (step timeout), which is an observable failure mode, not a silent error.
- Removed Mac lane secrets/concurrency:
  - `GADFLY_ENDPOINT_M5` and its associated comment were removed.
  - `GADFLY_PROVIDER_CONCURRENCY` changed from `"ollama-cloud=3,m5=1"` to `"ollama-cloud=3"`.
  - Verified via `grep` that `GADFLY_ENDPOINT_M5` is no longer referenced anywhere in the repository. No dangling references or orphaned cleanup obligations remain.
- Error propagation: The workflow step runs a container; if the container fails (e.g., missing secrets, runtime errors), the step fails and the workflow stops, which is appropriate for a CI job. No swallowed errors or missing rollback logic is evident because the job allocates no persistent resources requiring cleanup.
- Documentation: Changes in `CLAUDE.md` merely update the fleet size description (9 → 6 models, Mac removal) and remove references to the Mac lane. No error handling implications.
No material error handling or edge case defects were identified in the scoped changes.
Verdict: No material issues found.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 49s
🪰 Gadfly review: `qwen3-coder:480b-cloud` (ollama-cloud)
Verdict: No material issues found (3 reviewers: security, correctness, error-handling)
🔒 Security — No material issues found
VERDICT: No material issues found
The changes in this PR are focused on reducing the number of reviewers in the Gadfly swarm by removing four models and the local M5 Mac endpoint. From a security perspective, the modifications involve:
- Removal of the `GADFLY_ENDPOINT_M5` environment variable and associated configuration
These are operational changes that do not introduce new security vulnerabilities or weaken existing security controls. The removal of local endpoints actually reduces the attack surface by eliminating a potential local network access point. The changes are consistent across both the workflow configuration and documentation, maintaining the advisory-only nature of the review system.
No security-related issues were identified in the diff. All changes appear to be intentional operational adjustments that maintain the existing security posture.
🎯 Correctness — No material issues found
VERDICT: No material issues found
The changes correctly update the documentation and workflow configuration to reflect the reduced fleet of reviewers. The timeout reduction from 90m to 45m is consistent with removing the slower local models. The model list and concurrency settings have been appropriately updated to reflect the cloud-only configuration. No correctness issues were identified in the changes.
🧯 Error handling & edge cases — No material issues found
VERDICT: No material issues found
The changes in this PR are focused on updating documentation and configuration to reflect the removal of several reviewers from the Gadfly swarm. Through the lens of error handling and edge cases:
- The removal of `GADFLY_ENDPOINT_M5` and related configurations appears consistent across both the workflow file and documentation.
- The reduction of `timeout-minutes` from 90 to 45 is accompanied by explanatory comments that clarify the change.
- `GADFLY_MODELS` has been appropriately trimmed, and the concurrency settings have been updated to remove the `m5=1` entry.
All changes appear to be consistent and do not introduce new error handling or edge case concerns. The removal of local Mac reviewers and the corresponding configuration updates are handled cleanly.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 10s