ci: add Gadfly adversarial PR reviewer + document the review loop #2
Reference in New Issue
Block a user
Delete Branch "ci/gadfly-adversarial-review"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Installs the standalone Gadfly agentic adversarial reviewer for majordomo, mirroring executus's setup on the latest pinned image (
sha-d7f364d).What's here
.gitea/workflows/adversarial-review.yml— runs the full fleet on every PR: 9 ollama-cloud models (minimax-m3,glm-5.2,glm-5.1,kimi-k2.7-code,deepseek-v4-pro,nemotron-3-super,gpt-oss:120b,qwen3-coder:480b,gemma4) + the M1/M5 Macs via foreman, each running the 3-lens suite (security, correctness, error-handling). Advisory only — never blocks a merge. Posts one consolidated comment per model, plus findings telemetry to gadfly-reports.main), wait for Gadfly to finish and weigh its findings, then grade each finding back to the gadfly-reports MCP (record_finding_grade/list_findings/scoreboard).Notes
OLLAMA_CLOUD_API_KEY,GADFLY_ENDPOINT_M1/_M5,GADFLY_FINDINGS_*). The M1/M5 Macs must be awake/reachable; an offline box only fails its own lane.🤖 Generated with Claude Code
🪰 Gadfly review —
glm-5.2:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
VERDICT: Minor issues
.gitea/workflows/adversarial-review.yml:49— image pinned by a mutable:sha-d7f364dtag, not a content-addressable digest. The header comment (lines 3-4) calls the image "pinned to an immutable :sha- tag," but:sha-d7f364dis still a movable tag, not a@sha256:...digest. Anyone with push access togitea.stevedudenhoeffer.com/steve/gadflycan re-tagsha-d7f364dto a different (malicious) image, which would then run withGITEA_TOKEN,OLLAMA_CLOUD_API_KEY, and the foreman/telemetry tokens in its env. Mitigation is real but incomplete: pin by registry digest (docker://.../gadfly@sha256:...) rather than a tag named like a sha. Confirmed by reading the file (line 49 uses only a tag)..gitea/workflows/adversarial-review.yml:52—GITEA_TOKENhanded to the third-party container withissues: write/pull-requests: write(lines 25-27) and no per-secret scope bounding visible in the workflow. Combined with the un-digested image pull above, a compromised/re-tagged Gadfly image could exfiltrate the token and act beyond posting review comments. The secret's actual scope is out-of-repo config and couldn't be verified from the repo; flagging as a hardening recommendation to confirmGITEA_TOKENis repo-scoped (read issues + write comments, nocontents: write) rather than a broad token..gitea/workflows/adversarial-review.yml:90—COMMENT_BODY: ${{ github.event.comment.body }}is untrusted PR-attacker-controlled input passed verbatim into the secret-bearing container env. Low risk on the workflow side: it's an env var, there is norun:shell step, and the workflow does no shell interpolation of it, so this is not a workflow-side injection. The one tainted attacker-controlled value reaching the container, though — the container's own logic must treatCOMMENT_BODY/ACTOR/PR_BRANCHas untrusted (no shell-out/templating with them). Advisory only.Note: the
issue_commentdesign is otherwise sound — the job-levelif:allow-list (lines 38-43) and the mirroredGADFLY_ALLOWED_USERS(line 78) gate who can trigger a secret-bearing run, and there is noactions/checkoutof PR code into the container (line 49 pulls a pinned image directly), so a PR author cannot tamper with the workflow that consumes the secrets.🎯 Correctness — Minor issues
Draft review
VERDICT: Minor issues
Through the correctness lens, I verified the workflow YAML against the checked-out repo (the Gadfly container source isn't in this repo, so container-internal semantics are out of reach). Findings:
.gitea/workflows/adversarial-review.yml:15—pull_requesttrigger omitssynchronize, so pushing new commits to an open PR does not re-run Gadfly. The triggers are[opened, reopened, ready_for_review]only. The documented loop inCLAUDE.md:136("Push to a PR... let Gadfly review it") and the PR description ("runs the full fleet on every PR") imply review tracks PR updates, but the only re-trigger path after a follow-up push is a manual comment (issue_comment: created). Verified by reading theon:block. Fix: addsynchronize(and optionallyedited) to thepull_requesttypes, or document in CLAUDE.md that re-review requires a comment..gitea/workflows/adversarial-review.yml:88-89—PR_BRANCHandIS_DRAFTare empty forissue_commentandworkflow_dispatchtriggers.github.head_ref(line 88) andgithub.event.pull_request.draft(line 89) are only populated onpull_requestevents. For comment/dispatch runs both env vars resolve to empty strings. The inline comment at line 81-82 saysGADFLY_REPO/GADFLY_PRare "derived in-container," yetPR(line 87) is also passed explicitly, andPR_BRANCHis only ever passed (never derived per the comment). If the Gadfly container consumesPR_BRANCHdirectly to fetch the PR ref, comment/dispatch-triggered reviews operate with an empty branch. I could not verify the container's behavior (no Gadfly source in this checkout), so this is flagged as a suspected gap — confirm the container re-derives the head ref from the PR number viaGITEA_APIwhenPR_BRANCHis empty..gitea/workflows/adversarial-review.yml:29-31— concurrency group shared across all trigger types; a skipped comment-triggered run may cancel an in-progress review. The groupgadfly-<number>(line 30) resolves to the same value forpull_request,issue_comment(PR comment →issue.number== PR number), andworkflow_dispatch. Withcancel-in-progress: true(line 31), a comment by a non-allow-listed user evaluates theif:guard (lines 38-43) false, but the workflow run still enters the same concurrency group. If Gitea evaluates concurrency at run-enqueue time (as GitHub Actions does, before jobif:evaluation), a bystander's comment could abort a running review. I could not confirm Gitea's exact semantics here; worth verifying and, if affected, gating the concurrency group or moving the allow-list check earlier.I confirmed the expression ordering in
PR(line 87) and theif:guard (lines 38-43) are correct: forissue_comment,github.event.pull_request.numberis undefined → falsy → falls through togithub.event.issue.number, andgithub.event.issue.pull_requestcorrectly restricts comment triggers to PRs. TheGADFLY_MODELScloud-tag forms (:cloud,:120b-cloud,:480b-cloud) are consistent with ollama tag naming and not a correctness defect.🧯 Error handling & edge cases — Minor issues
Verdict: Minor issues
.gitea/workflows/adversarial-review.yml:38-43— every comment by an allowed actor re-runs the whole fleet; no command gating. Theissue_commentbranch of theif:only checks the actor and thatgithub.event.issue.pull_requestis set — it never inspectsCOMMENT_BODY, which is forwarded into the env at line 90 but is not used by the gate. So any casual reply ("looks good", a typo) from steve/fizi/dazed launches the full 9-model + M1/M5 suite. Combined withcancel-in-progress: true(line 31), a second conversational comment cancels a review already in flight and restarts it from scratch, losing partial work and churning the slow local lanes. Fix: gate on a command token, e.g. add&& startsWith(github.event.comment.body, '/gadfly')to theif:..gitea/workflows/adversarial-review.yml:14-15— draft PRs get reviewed twice.pull_requesttriggers onopened(which fires for drafts too) and again onready_for_reviewwhen the draft is flipped ready.IS_DRAFTis forwarded (line 89) but the workflow doesn't gate on it, so WIP draft code gets a full 90-min review on open and a second one on ready-for-review. Fix: either dropopenedin favor ofready_for_review/synchronize, or add&& github.event.pull_request.draft == falseto theif:for PR events (the run shouldn't start at all for drafts)..gitea/workflows/adversarial-review.yml:87-88—PR_BRANCHresolves to empty on non-pull_requesttriggers (unverified impact).PR_BRANCH: ${{ github.head_ref }}is only populated onpull_requestevents. Forworkflow_dispatch,PRis the inputpr_numberbutPR_BRANCHis empty; forissue_comment, bothgithub.event.pull_request.numberandgithub.head_refare undefined andPRfalls back togithub.event.issue.numberwhilePR_BRANCHstays empty. Whether this breaks the container depends on the in-container code (it may re-derive the branch from the API viaGITEA_API/PR). Worth confirming the container handles an emptyPR_BRANCHby resolving it from the PR API rather than trusting the env var.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 37s
🪰 Gadfly review —
minimax-m3:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
VERDICT: Minor issues
Findings (security lens)
.gitea/workflows/adversarial-review.yml:49— Docker image pinned by mutable registry, not digest. The tag is:sha-d7f364d, but thedocker://reference does not resolve to an immutable content hash; the registry can replace the tag. The comment on lines 3-4 calls it an "immutable :sha- tag," which isn't what the Gitea runner actually fetches — a registry compromise or force-push to the same tag leaksGITEA_TOKEN,OLLAMA_CLOUD_API_KEY, theGADFLY_ENDPOINT_M*URLs/tokens, and theGADFLY_FINDINGS_*secrets into attacker-controlled code. Pin by digest (@sha256:...)..gitea/workflows/adversarial-review.yml:90-91—COMMENT_BODYandCOMMENT_IDfrom theissue_commentevent are set unconditionally in the step'senv:block, alongside the secrets. Even with the job-levelif:guard on lines 38-43 gating whether the job runs at all, the comment text is placed in the same environment asGITEA_TOKENand the API keys. Any bug, log line, or telemetry path in the Gadfly container that dumps its own environment leaks the comment body and the token. Restructure so user-supplied event fields are set only after the guard, or pass them via a file written by a trusted step..gitea/workflows/adversarial-review.yml:38-43— allow-list is case-sensitive and exact-string compared.github.actor == 'steve'(and'fizi','dazed') won't match SSO-aliased variants (Steve,STEVE, trailing space, etc.), so the guard fails closed or open depending on the actor's display name. Either lowercase both sides or, better, source the list from avars.*entry so it isn't editable by a PR that lands via push-to-main..gitea/workflows/adversarial-review.yml:83-84— findings URL is a secret, so the third-party container can be redirected to any host with the token in hand.GADFLY_FINDINGS_URLshould be a non-secretvars.GADFLY_FINDINGS_URL; only the bearer (GADFLY_FINDINGS_TOKEN) needs to be a secret. Today an SSRF-from-the-inside by Gadfly can POST findings (and the token) to an attacker-chosen endpoint..gitea/workflows/adversarial-review.yml:88—PR_BRANCH: ${{ github.head_ref }}is attacker-controllable and flows into a third-party container as a label/telemetry field. Branch names from PRs (including forks, wherehead_refis the contributor's branch) can contain..,/, or shell metacharacters. Combined with finding 2, this widens the injection surface for any code path that echoes the value back. Treat allgithub.event.*fields as tainted and validate before they leave the trusted boundary..gitea/workflows/adversarial-review.yml:24-27— theGITEA_TOKENgranted here is write-scoped to bothissuesandpull-requests, and is the same generic org secret used elsewhere. A single bug in the Gadfly container = ability to edit PR titles, merge PRs, close/lock issues, etc. Issue a fine-scoped token (ReadAndWriteonpull_requestsonly, since the workflow only needs to post review comments) and pass it as a dedicated secret rather than the org-wideGITEA_TOKEN.CLAUDE.md:145-151— documentsmcp__gadfly__record_finding_grade/mcp__gadfly__list_findings/mcp__gadfly__scoreboardas available MCP tools, but no MCP wiring is referenced anywhere in the repo.grepformcp__gadflyandgadflyshows only the workflow and these doc lines — no MCP server config, no endpoint, no credential. Operators are told to call a tool that isn't surfaced. Either point the doc at the real MCP server definition or remove the instructions.Note: findings about
GADFLY_ALLOWED_USERSbeing "string in workflow, not a secret" and about thepull_requestfrom-fork secret-leak were folded into theif:allow-list finding above; the env-var ordering concern is covered by theCOMMENT_BODY/COMMENT_IDfinding.🎯 Correctness — Minor issues
The comment on lines 64-65 is admittedly ugly prose —
LENSES:andLENSare abbreviated shorthand, hard to parse, and the second occurrence redundantly repeats "ollama-cloud=3" already stated. That's a fair cosmetic observation. However, the draft's own framing (it's a comment, not a behavior bug) is honest, and it's a nit. Whether to keep it depends on whether a cosmetic finding rises to "minor issue" — I'd say it does qualify as a real but trivial observation, distinct from the (false) separator-typo claim. I'll keep it as a cosmetic note but not let it drive the verdict.Both findings about the model-spec separator/typo are wrong. The cosmetic comment note is real but minor.
Correctness Review
VERDICT: Minor issues
Findings
.gitea/workflows/adversarial-review.yml:64-65— cosmetic comment nit. The linesLENSES:,LENS) where the actual env vars below them areGADFLY_PROVIDER_CONCURRENCYandGADFLY_PROVIDER_LENS_CONCURRENCY, and the second occurrence ofollama-cloud=3merely restates the first. Not a behavior bug; just hard to read adjacent to the env block it documents.What I checked and found clean
.gitea/workflows/adversarial-review.yml:68—GADFLY_MODELSvalue. The original draft flaggedgpt-oss:120b-cloudandqwen3-coder:480b-cloudas separator typos. False positive. The parse grammar (docs/adr/0003-parse-grammar.md:17-23,provider/ollama/ollama_test.go:250,parse_test.go:65-77) isprovider "/" modelwith the model passed verbatim. The 9 comma-separated cloud entries are bare alias tokens (no/); the two Mac entries arem1/<model>andm5/<model>targets.-cloudin the alias names is the registered naming convention, parallel to theminimax-m3:cloudstyle — not a separator mismatch. The same-cloud-suffixed idqwen3-coder:480b-cloudalready appears verbatim as a real model id inparse_test.go:68,README.md:80,examples/live/main.go:33,examples/tiers/main.go:20. No fix needed.if:guard, concurrency group, env-var event context, permissions, timeout, allow-list mirror: all consistent with the draft's own analysis.🧯 Error handling & edge cases — No material issues found
VERDICT: No material issues found
I read the workflow file end-to-end (only 92 lines) and evaluated each draft finding against it. Every finding either self-corrects, is explicitly marked unverified, or is a speculative "nice-to-have" rather than a confirmed bug. None survive verification:
.gitea/workflows/adversarial-review.yml:87—PRfallback chain. The reviewer themselves concludes "Verified by reading the file; no real bug here" — the||short-circuit is correct on all three trigger paths. Not a finding..gitea/workflows/adversarial-review.yml:88-90— empty strings on non-PR events. Explicitly marked "Unverified — I don't have the Gadfly source in this repo." The workflow comment block (lines 85-86) labels these as "leave as-is" event context, and the reviewer concedes the container likely tolerates empties. Pure speculation about external code..gitea/workflows/adversarial-review.yml:68—GADFLY_MODELSparser shape. Explicitly marked "Unverified; only a real bug if the Gadfly parser is strict about thename:tagshape." The reviewer even notes the per-model isolation (line 59 comment) would mask a parser bug. Speculation about an external parser..gitea/workflows/adversarial-review.yml:49— no checkout / preflight step. The reviewer concedes "the secret is documented as required, and a missing-token failure is loud". This is a quality-of-life suggestion ("would benefit from a 1-line precondition"), not a confirmed material defect — the workflow is correct as written and fails clearly on misconfig..gitea/workflows/adversarial-review.yml:31—cancel-in-progress: true. Explicitly hedged as "Not an error-handling bug per se — just an interaction worth being aware of." Not a bug.Nothing else in the file is materially wrong: the
if:allow-list guard (lines 38-43) correctly mirrorsGADFLY_ALLOWED_USERS(line 78), the concurrency group key (line 30) is well-formed, the secret mappings are explicit (Gitea's no-auto-expose behavior is acknowledged in the comment on line 60), and the timeouts/concurrency tunables are internally consistent (cloud lens concurrency 3 + provider concurrency 3 for ollama-cloud, serial 1 for each Mac lane).Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 11s
🪰 Gadfly review —
m1/qwen3:14b(m1)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — Blocking issues found
Blocking issues found
Secrets exposure risk: The workflow passes multiple sensitive secrets (
OLLAMA_CLOUD_API_KEY,GADFLY_ENDPOINT_M1,GADFLY_ENDPOINT_M5,GADFLY_FINDINGS_URL,GADFLY_FINDINGS_TOKEN) directly to the Docker container. If the container has any vulnerabilities (e.g., insecure deserialization, command injection), these secrets could be leaked. The PR mentions using a pinned SHA for the image, but this doesn't eliminate the risk of vulnerabilities in the image itself. Verified by reading the diff.Insecure endpoint format: The
GADFLY_ENDPOINT_M1andGADFLY_ENDPOINT_M5secrets use a custom formatforeman|https://<host>|<token>, which is not standard. If the foreman implementation doesn't properly validate these endpoints, it could lead to SSRF or other attacks. This format isn't validated in the workflow, creating a potential injection risk. Verified by reading the diff.Lack of input validation: The
GADFLY_MODELSenvironment variable is set to a hardcoded list of models, but there's no validation to ensure these values are safe or properly formatted. If this variable were user-controllable, it could allow injection attacks (e.g., malicious model names). Verified by reading the diff.🎯 Correctness — Blocking issues found
Blocking issues found
.gitea/workflows/adversarial-review.yml:55— TheGADFLY_MODELSvalue"m1/qwen3:14b,m5/qwen3.6:35b-mlx"includes provider prefixes (m1/,m5/) in model names, which may be semantically incorrect. The comment explicitly states thatGADFLY_ENDPOINT_M1/_M5are foreman-preset Ollama clients (format:foreman|https://...|<token>), implying models should be specified without provider prefixes. Including provider prefixes in model names may cause Gadfly to misroute model execution, as the provider is already defined via the endpoint secrets. Verify with the Gadfly documentation or foreman configuration to confirm whether provider prefixes are valid in this context.🧯 Error handling & edge cases — Blocking issues found
Blocking issues found
.gitea/workflows/adversarial-review.yml:23 — No error handling for missing/invalid secrets. If
GITEA_TOKENorOLLAMA_CLOUD_API_KEYis missing, the Docker container will fail silently without reporting actionable errors to the user. Verified by reading the workflow's env block — noifguards or fallbacks exist for secret validation..gitea/workflows/adversarial-review.yml:53 — No error recovery for failed model executions. The
GADFLY_MODELSlist includes 10 models (9 cloud + 2 Macs), but if any model fails (e.g., due to offline Macs), the workflow will silently fail without distinguishing between model-specific errors and system-level failures. Verified by checking the Docker step — nocontinue-on-erroror error classification logic is present..gitea/workflows/adversarial-review.yml:66 — No fallback for MCP telemetry failures. If
GADFLY_FINDINGS_URLis unreachable or authentication fails, the workflow will silently lose telemetry data without alerting the user. Verified by examining the env block — no error logging or retry logic is configured for the findings endpoint.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 24m 3s
🪰 Gadfly review —
glm-5.1:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
VERDICT: Minor issues
.gitea/workflows/adversarial-review.yml:38-43—pull_requestevents bypass the actor allowlist; defense-in-depth would extend the check to all event types. Theif:guard only restrictsissue_commentevents to actors steve/fizi/dazed. Forpull_requestandworkflow_dispatchevents,github.event_name != 'issue_comment'is true and the job runs unconditionally. This is a standard CI pattern that relies on the platform (Gitea Actions, like GitHub Actions) withholding secrets from fork-PRpull_requestevents. That platform-level isolation is the primary security boundary, and it is in effect here. However, any repo collaborator — even one not in the allowlist — can open a same-repo PR and trigger the job with full access to all mapped secrets. For repositories where all collaborators are trusted, this is acceptable; if the allowlist is meant to be the authoritative gate, theif:should coverpull_requestandworkflow_dispatchevents too. Impact: Low — relies on Gitea's fork-PR secret isolation (standard behavior) and assumes repo collaborators are implicitly trusted..gitea/workflows/adversarial-review.yml:90— RawCOMMENT_BODYpassed as environment variable without sanitization.${{ github.event.comment.body }}is injected directly into the container environment. If the Gadfly image interprets this value in a shell or template context, it could enable command or template injection. Theissue_commentpath is restricted to the three allowed actors (steve/fizi/dazed), making exploitation require a compromised trusted account. Impact: Low — trusted-actor-only path, but defense-in-depth would sanitize or validate the comment body before passing it to the container.🎯 Correctness — Minor issues
VERDICT: Minor issues
.gitea/workflows/adversarial-review.yml:15—pull_requesttrigger omitssynchronize. Thetypeslist is[opened, reopened, ready_for_review], which does not includesynchronize. This means pushing new commits to an existing PR will not automatically re-trigger the adversarial review. Theissue_commenttrigger andworkflow_dispatchoffer manual workarounds, but the workflow's own documentation and CLAUDE.md both say to "push to a PR" and wait for Gadfly — implying it should auto-run on every push. If automatic re-review on pushes is desired, addsynchronizeto the types list. (Confirmed by reading the file.).gitea/workflows/adversarial-review.yml:68—GADFLY_MODELSformat vs documentedprovider/modelgrammar (unverified). The project's ownREADME.mdandCLAUDE.mddocument the model-spec grammar asprovider/modelwith/as the delimiter (e.g.,ollama-cloud/minimax-m3:cloud), where everything after the first/is the model ID passed verbatim. However, theGADFLY_MODELSvalue lists cloud models without a provider prefix (e.g.,minimax-m3:cloudrather thanollama-cloud/minimax-m3:cloud), while the Mac models do use theprovider/prefix (m1/qwen3:14b,m5/qwen3.6:35b-mlx). Additionally,GADFLY_PROVIDER_CONCURRENCYreferences providerollama-cloud, but no cloud model entry contains that string. The env-var format may differ from the Go API'sParse()grammar, and Gadfly may apply a default provider to bare names — but without access to Gadfly's parser, this cannot be confirmed. Someone with access to the Gadfly image should verify that bare model names likegpt-oss:120b-cloudandqwen3-coder:480b-cloudresolve to theollama-cloudprovider correctly.🧯 Error handling & edge cases — Minor issues
Now let me carefully evaluate each finding:
Finding 1: Missing
synchronizeinpull_requesttypes.The workflow triggers on
[opened, reopened, ready_for_review]but notsynchronize. This is confirmed at line 15. When a PR author pushes new commits, the workflow won't re-trigger. Theissue_commenttrigger allows manual re-review, but there's no automatic re-trigger on new commits. This is a valid concern — Gadfly's review can become stale. However, note that theissue_commenttrigger does allow manual re-triggering. Still, the missingsynchronizemeans automatic re-review on new pushes won't happen.This is a legitimate, confirmed finding.
Finding 2:
PR_BRANCHis empty forissue_commentandworkflow_dispatchtriggers.github.head_refis only populated forpull_requestevents. Forissue_commentandworkflow_dispatch, it will be empty. The comment at line 81-82 says "GADFLY_REPO / GADFLY_PR are derived in-container", which strongly implies Gadfly uses the GITEA_API + GITEA_TOKEN + PR number to fetch branch/draft info from the API server-side. If Gadfly can derive PR_BRANCH internally from the API, then the empty env var is harmless (Gadfly ignores it). But if Gadfly relies on PR_BRANCH being set, it would be broken for non-PR triggers.The comment explicitly says "GADFLY_REPO / GADFLY_PR are derived in-container" — this is a strong signal that Gadfly does API lookups. However, PR_BRANCH is still being passed as an env var, and it will be empty for two of the three triggers. This is a real edge case: the env var is provided but will be empty. Whether Gadfly uses it or derives it from the API is unknown from this repo's perspective, but the workflow is explicitly passing a value that will be empty in 2/3 trigger scenarios. That's a concrete concern worth noting.
However, given the "GADFLY_REPO / GADFLY_PR are derived in-container" comment, it's quite likely Gadfly derives PR_BRANCH from the API as well. The fact that PR_BRANCH is still passed as an env var could just be for the pull_request trigger case. This is speculative — I can't confirm Gadfly's internals.
Let me keep this as a minor note but soften it given the "derived in-container" comment.
Finding 3:
cancel-in-progress: trueis aggressive withissue_commenttrigger.This is about design trade-offs. The concurrency group at line 30 uses the PR number, so any new trigger for the same PR cancels the old one. The
issue_commenttrigger will fire on every comment by an allowed user, and it's not filtered by comment content. However, looking at line 38-43, theif:guard already restrictsissue_commentto specific users. Only steve, fizi, and dazed can trigger it. This is a very small set of trusted users who presumably know the workflow. This is more of a design suggestion than a bug.This is a valid but minor design observation. I'll keep it but note it's a design trade-off, not a bug.
Finding 4:
IS_DRAFTis empty for non-pull_requesttriggers.github.event.pull_request.draftis only available forpull_requestevents. Forissue_commentandworkflow_dispatch, it will be empty/nil. Same reasoning as Finding 2 — if Gadfly derives this from the API, it's fine. The "derived in-container" comment again applies. But IS_DRAFT being empty (notfalse) could be problematic if Gadfly treats empty as truthy.Same analysis as Finding 2. This is a concrete edge case in the workflow file, but likely handled by Gadfly's in-container API logic.
Let me reconsider which findings to keep. Findings 2 and 4 are about env vars being empty for certain triggers, but the workflow explicitly comments that PR info is "derived in-container." This strongly suggests Gadfly doesn't rely solely on these env vars. The findings are speculative about Gadfly's internal behavior which we can't verify from this repo.
Finding 1 (missing
synchronize) is clearly valid and confirmable from the workflow file alone.Finding 3 (cancel-in-progress) is a valid design observation but it's really about the interaction of the workflow configuration choices, not a bug.
I'll keep Findings 1 and 3 as minor issues, and drop Findings 2 and 4 since they depend on Gadfly's internals which the workflow comments suggest handle this via API.
Wait, let me reconsider. Finding 2 is still a real issue even if Gadfly derives the branch from the API — passing an empty env var is confusing at best and could cause subtle bugs. But the draft's claim that "Gadfly receives an empty PR_BRANCH in those cases" is technically true. Whether it matters depends on Gadfly internals. Given the explicit comment that PR info is "derived in-container," this is likely handled. I'll drop this.
Finding 4 same reasoning — IS_DRAFT being empty is a fact, but likely handled in-container. Drop it.
Actually, let me reconsider Finding 3 more carefully. The
cancel-in-progress: truewithissue_commentmeans every comment from an allowed user on the PR cancels the running review. But theif:guard limitsissue_commentto specific users AND requires it to be on a PR (not a regular issue). Still, any comment by those users would cancel. This is a design trade-off noted in the finding. It's valid but minor.Let me finalize:
synchronizemeans the workflow won't auto-re-run on new commits. Confirmed.VERDICT: Minor issues (still minor with the two surviving findings).VERDICT: Minor issues
.gitea/workflows/adversarial-review.yml:15— Missingsynchronizeinpull_requesttypes. The trigger fires on[opened, reopened, ready_for_review]but notsynchronize. When a PR author pushes new commits, Gadfly will not automatically re-review. The only re-trigger paths are a manual comment from an allowed user or re-opening the PR. Gadfly's review can silently become stale on the very PRs it's designed to review. Fix: addsynchronizeto the types list..gitea/workflows/adversarial-review.yml:31—cancel-in-progress: truewith theissue_commenttrigger is aggressive. Any comment by an allowed user on the PR (even "looks good" or "👍") creates a new workflow run that cancels the in-progress one. Combined with the 90-minute timeout and full 11-model fleet, this can waste significant compute and discard partial reviews. The comment body is not filtered before triggering. This is a design trade-off rather than a bug, but worth noting: a casual comment from steve/fizi/dazed kills an in-progress review.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 37s
🪰 Gadfly review —
m5/qwen3.6:35b-mlx(m5)Verdict: Minor issues · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling
🔒 Security — ⚠️ could not complete
⚠️ This reviewer failed to complete: agent: step 4: context deadline exceeded
🎯 Correctness — Minor issues
Review Summary
Verdict: Minor issues
Finding 1 (Unverified): Inconsistent model ID suffix for ollama-cloud entries in
GADFLY_MODELS.gitea/workflows/adversarial-review.yml:68Seven of the nine ollama-cloud model entries use
:<model-id>:cloudformat (e.g.,minimax-m3:cloud,gemma4:cloud). Two entries (gpt-oss:120b-cloud,qwen3-coder:480b-cloud) carry different suffixes (:120b-cloud,:480b-cloud).If Gadfly's internal mapping expects a consistent
:cloudsuffix to identify ollama-cloud provider models, these two entries would either fail to be routed correctly or map to the wrong provider entirely. Conversely, if they are intentionally different model identifiers (and should still resolve to ollama-cloud), then the inconsistency is cosmetic — but it risks confusing future maintainers or automated validation logic within Gadfly.Impact: If these 2 models fail to resolve to
ollama-cloud, they'd silently become unreviewed by any cloud provider, reducing the adversarial review fleet from 9 to 7 ollama-cloud models without error (the model resolution would happen inside the Gadfly container). This is unverified because I don't have access to Gadfly's source code (gitea.stevedudenhoeffer.com/steve/gadfly).Suggested fix: Verify against Gadfly's model parsing logic that
gpt-oss:120b-cloudandqwen3-coder:480b-cloudare indeed valid ollama-cloud model IDs. If they should map via a:cloudsuffix, correct the format (e.g., to match other entries).🧯 Error handling & edge cases — No material issues found
Findings through "Error handling & edge cases" lens
I reviewed the workflow file thoroughly and cross-referenced it against Gitea Actions behavior to validate each concern below.
No material issues found
After careful review of
.gitea/workflows/adversarial-review.ymlthrough the error-handling-and-edge-cases lens, I found no concrete bugs or missing error handlers that would cause failures. Here's why:The
if:guard on line 38–43 correctly handles all trigger types — Forpull_requestandworkflow_dispatch,github.event_name != 'issue_comment'short-circuits to true. Forissue_comment, the actor allow-list is enforced, and non-PR-issue comments (on regular Issues) are properly rejected via thegithub.event.issue.pull_requestnull check. Verified by reading lines 38–43 of the file.IS_DRAFTempty-string edge case is benign — Forissue_commentandworkflow_dispatch,${{ github.event.pull_request.draft }}(line 89) resolves to empty string. Since the downstream consumer only uses this advisory flag (the review never blocks merges), an empty boolean has no observable failure mode. The value is informational only; downstream code either ignores it or treats any non-"true" as false.Missing secrets produce no workflow-level failures — Unset secrets (
OLLAMA_CLOUD_API_KEY,GADFLY_ENDPOINT_M1, etc., lines 53, 61–62, 83–84) are passed as empty strings to the container. The comments explicitly state "failures only log to stderr and never affect the review" (line 80–81), so this is a design choice, not an oversight. The workflow doesn't need to validate secrets — the container handles them.No failure handler on the
docker://step is intentional — Line 49 has noif: failure()orif: always()fallback. Since Gadfly is advisory-only and never blocks merges, silent failure is consistent with the design. There's nothing to roll back, no state to clean up, and no blocking consequence of a crash.Concurrency expression safely handles all three triggers — Line 30's
github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_numbercorrectly resolves in each case: PR events use.pull_request.number, issue comments use.issue.number, andworkflow_dispatchuses.inputs.pr_number. Verified the full expression against the trigger definitions on lines 14–22.No rollback or cleanup needed — This workflow is single-step (docker container), advisory-only, and produces no state changes that require cleanup. No
deferequivalent is needed in CI.All checked by reading
.gitea/workflows/adversarial-review.ymlin full (lines 1–92) and cross-referencing the event trigger definitions against each${{ github.event.* }}reference.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 23m 45s
🪰 Gadfly review —
kimi-k2.7-code:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
Minor issues
.gitea/workflows/adversarial-review.yml:49— The Gadfly image is “pinned” with a mutable tag (sha-d7f364d), not a content-addressable digest. Tags — even ones named after a git SHA — can be repushed if registry immutability is misconfigured or bypassed. Because this container receives repo write token, cloud API keys, foreman endpoint tokens, and a findings token, a tag-swap would let an attacker run arbitrary code with those credentials or post malicious PR comments. Fix: pin withimage@sha256:<digest>(act_runner/GitHub-styledocker://supports this) or document/enforce registry tag immutability for this image..gitea/workflows/adversarial-review.yml:38-43,90— Theissue_commenttrigger gates on actor identity but not on comment content, so any comment fromsteve,fizi, ordazedon any PR feedsCOMMENT_BODYinto a secret-bearing container. This widens the attack surface: a compromised allowed account or a crafted comment body could exploit the container’s parser/LLM invocation. Fix: add a command-prefix check to theifguard (e.g.,&& startsWith(github.event.comment.body, '/gadfly')) so routine comments don’t trigger a privileged run.🎯 Correctness — No material issues found
No material issues found
Reviewed through the Correctness lens.
minimax-m3,glm-5.2,glm-5.1,kimi-k2.7-code,deepseek-v4-pro,nemotron-3-super,gpt-oss:120b,qwen3-coder:480b,gemma4= 9; plusm1/qwen3:14b,m5/qwen3.6:35b-mlx= 11 total. Correct.ollama-cloud=3,m1=1,m5=1; lens concurrency only for cloud; specialists default matches the 3-lens suite.if:guard forissue_commentcorrectly checksgithub.event.issue.pull_requestand allow-listssteve,fizi,dazed; the same list is repeated inGADFLY_ALLOWED_USERS.CLAUDE.mdand the existingci.yaml; the new docs section is consistent with existing style and accurately describes the workflow file.Nothing in this change appears semantically incorrect from a correctness standpoint.
🧯 Error handling & edge cases — Minor issues
Minor issues
.gitea/workflows/adversarial-review.yml:87—PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }}can resolve to empty for any unexpected event shape, passingPR=to the container. Defensive guard (e.g., an early step that fails fast ifPRis blank) would prevent a run that silently targets nothing. Verified by reading the workflow and the threeon:triggers..gitea/workflows/adversarial-review.yml:88—PR_BRANCH: ${{ github.head_ref }}is only defined forpull_requestevents; onissue_commentandworkflow_dispatchit is empty. If the container usesPR_BRANCHfor checkout/branch selection, this could misroute those triggers. Suggest making it conditional or omitting it when not applicable..gitea/workflows/adversarial-review.yml:89—IS_DRAFT: ${{ github.event.pull_request.draft }}is empty for non-pull_requestevents. The container likely treats empty as non-draft, but that is an implicit default. Suggest${{ github.event.pull_request.draft || false }}to make the default explicit..gitea/workflows/adversarial-review.yml:30— concurrency groupgadfly-${{ ... }}falls back togadfly-if none of the PR-number expressions resolve, which could cause cross-run collisions if an edge-case event slips through. Theif:guard mostly prevents this, but a${{ ... || 'unknown' }}fallback is cheap insurance.CLAUDE.md:146-150— Documentsmcp__gadfly__record_finding_grade,list_findings, andscoreboardas the grading interface, but does not say how to install/verify the MCP or what to do if those tools are absent. Adds operational friction if a reader tries to follow the loop without the MCP configured. Suggest a one-line note/link about MCP setup.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 55s
🪰 Gadfly review —
deepseek-v4-pro:cloud(ollama-cloud)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Now let me verify the key claims in each finding against the actual code and repository context.
Finding 1 — The
if:guard on lines 38-43 is:This is factually correct: for
pull_requestandworkflow_dispatch,github.event_name != 'issue_comment'is true, so the job runs unconditionally. However, the finding frames this as a security vulnerability. In Gitea Actions, only users with write access to the repository can push branches and open same-repo PRs — the same set of people who are already trusted with the repo. Fork PRs do not receive secrets. Theissue_commenttrigger is the only vector where an external (non-write-access) user can interact, hence the allowlist there. The workflow's own comment on line 36 explicitly documents this design decision: "pull_request + workflow_dispatch are already trusted." This is a deliberate, documented design choice consistent with the repo's threat model, not an unverified assumption. The existingci.yamlhas the same pattern — it runs on all PRs with no actor check. DROP.Finding 2 —
COMMENT_BODYis gated by theissue_commentallowlist (only three named users). The draft itself concedes the blast radius is limited to those three accounts being compromised, and admits the container behavior "could not be verified." This is speculation about an external Docker image's internals, not a confirmed issue in the code under review. DROP.Finding 3 — Same pattern:
PR_BRANCHfromgithub.head_ref. The branch name is set by the PR author (who has write access to push branches). The concern about shell metacharacters is speculation about the external container's behavior, which the draft itself says "could not be verified." DROP.All three findings fail to confirm against the actual code and repository context.
Verdict: No material issues found
🎯 Correctness — No material issues found
Both findings in the draft are explicitly marked as unverified by the original reviewer, and I cannot positively confirm either against the actual code in this repository:
gpt-oss:120b-cloud: The onlygpt-ossreference in the repo (ollama_test.go:250) uses the regular ollama provider (not ollama-cloud), so it tells us nothing about the correct ollama-cloud tag. No other file in the repo confirms or contradictsgpt-oss:120b-cloud.IS_DRAFT/issue_comment: This is a question about Gitea Actions runtime behavior, not something verifiable from the repo's code. The draft itself concedes it cannot verify Gitea's exact behavior.Both are dropped.
Verdict: No material issues found
🧯 Error handling & edge cases — Blocking issues found
I've read the actual workflow file and searched the entire
.giteadirectory. The finding is confirmed:types: [opened, reopened, ready_for_review]—synchronizeis indeed absent.push, a separate workflow, etc.) covers the re-review-on-push case.ci.yamlusespull_request: { branches: ["*"] }without explicit types, which defaults to includingsynchronize— so the pattern is understood and used elsewhere in the repo.synchronize.The finding survives. It is the only finding, so the verdict remains as-is.
Verdict: Blocking issues found
Missing
synchronizepull_request event type —adversarial-review.yml:15. The workflow triggers on[opened, reopened, ready_for_review]but omitssynchronize, the event fired when new commits are pushed to an existing PR. This means:ready_for_reviewand gets reviewed, but if the author then pushes fixes, no re-review occurs.Fix: add
synchronizeto thetypeslist:Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 55s
🪰 Gadfly review —
nemotron-3-super:cloud(ollama-cloud)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — Blocking issues found
Blocking issues found
🎯 Correctness — No material issues found
No material issues found
The workflow file and documentation changes were reviewed for correctness. The workflow's conditional logic was verified to correctly restrict secret-bearing runs to trusted users for issue_comment events while allowing pull_request and workflow_dispatch events without restriction. All variable references and secrets are properly scoped. The Docker image reference uses correct syntax. The documentation update accurately describes the adversarial review process without introducing inconsistencies. No logical bugs, incorrect behavior, or semantic errors were found in the changes.
No material issues found🧯 Error handling & edge cases — No material issues found
No material issues found
Outside my lens: The workflow_dispatch trigger lacks actor restriction, allowing any user with repository read access to potentially trigger the workflow and expose secrets if repository permissions permit workflow_dispatch by readers.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 6m 40s
🪰 Gadfly review —
gpt-oss:120b-cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
GITEA_TOKEN,OLLAMA_CLOUD_API_KEY,GADFLY_ENDPOINT_*,GADFLY_FINDINGS_*) are passed as environment variables to that container, which is the intended design for the Gadfly reviewer.issue_commentcorrectly limits execution to a hard‑coded allow‑list of trusted actors, preventing unauthorized users from invoking the job and exposing secrets.All security‑related aspects of the change appear sound.
🎯 Correctness — No material issues found
Verdict: No material issues found
if:guard forissue_commentcorrectly limits runs to allowed users on PR comments and allows normal PR events.CLAUDE.mdaccurately describe the new workflow without altering existing behavior.No correctness defects were identified in the added files.
🧯 Error handling & edge cases — No material issues found
Verdict: No material issues found
${{ github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number }}always resolves to a value for the supported events (pull request, issue comment on a PR, or workflow dispatch with a required input), so no empty group name or nil reference can occur.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 44s
🪰 Gadfly review —
qwen3-coder:480b-cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
The changes introduce a new Gitea workflow for adversarial code review using Gadfly and update documentation accordingly. From a security perspective:
• The workflow properly restricts execution via an allow-list of trusted users (
steve,fizi,dazed) for comment-triggered runs• Secrets are handled appropriately through Gitea's secrets mechanism and only exposed explicitly where needed
• The workflow uses a pinned Docker image (
sha-d7f364d) rather than floating tags, which prevents unexpected image updates• Permissions are scoped minimally (read for contents, write for issues/PRs)
• Concurrency is properly managed with cancellation to prevent resource exhaustion
• Telemetry reporting is opt-in via secrets and doesn't block the main review process
No security vulnerabilities or misconfigurations were identified in the added workflow or documentation.
🎯 Correctness — No material issues found
No material issues found.
🧯 Error handling & edge cases — No material issues found
No material issues found.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5s
🪰 Gadfly review —
gemma4:cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
I have reviewed the changes through the Security lens. The primary security surface in this PR is the introduction of a GitHub/Gitea Action that handles secrets and can be triggered by PR comments.
ifguard at the job level (lines 42-47) that restrictsissue_commenttriggers to a specific allow-list of trusted users (steve,fizi,dazed). This prevents untrusted contributors from triggering the workflow via comments to potentially leak secrets or exhaust resources.GITEA_TOKEN,OLLAMA_CLOUD_API_KEY, etc.) are passed as environment variables to the container. This is the standard and secure way to handle secrets in Gitea Actions/GitHub Actions.COMMENT_BODYis passed to the container (line 91). While the workflow itself doesn't process this data, the security of the system depends on thegadflycontainer's internal handling of this input (e.g., preventing prompt injection or command injection). Since this is a standalone external image pinned by SHA (sha-d7f364d), and the workflow is restricted to trusted users, the risk is mitigated.permissionsblock (lines 26-28), requesting only the minimum necessary access (contents: read,issues: write,pull-requests: write).No security vulnerabilities were identified in the workflow configuration.
🎯 Correctness — No material issues found
VERDICT: No material issues found
🧯 Error handling & edge cases — No material issues found
VERDICT: No material issues found
I have reviewed the
.gitea/workflows/adversarial-review.ymlandCLAUDE.mdchanges specifically through the lens of Error handling & edge cases.concurrencygroup and theifguard forissue_commentcorrectly handle the ambiguity of event contexts (PR vs Issue vs Dispatch). The use of||in the concurrency group key ensures that regardless of how the review is triggered, it maps to a stable identifier for that PR, preventing race conditions or leaked runs.timeout-minutes: 90is set, and the container itself is configured withGADFLY_TIMEOUT_SECS: "600"andGADFLY_MAX_STEPS: "14". This provides a layered defense against "hanging" models or infinite agentic loops, preventing the Gitea runner from being exhausted indefinitely.workflow_dispatchinputpr_numberis marked asrequired: true, reducing the risk of null-input panics during manual triggers.contents: read,issues: write,pull-requests: write), limiting the "blast radius" if the agentic tool were to be compromised or malfunction.Nothing in my area.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 18s