ci: add Gadfly adversarial PR reviewer + document the review loop #2

2026-06-27T18:32:51Z

steve commented

2026-06-27 18:32:51 +00:00

Installs the standalone Gadfly agentic adversarial reviewer for majordomo, mirroring executus's setup on the latest pinned image (sha-d7f364d).

What's here

.gitea/workflows/adversarial-review.yml — runs the full fleet on every PR: 9 ollama-cloud models (minimax-m3, glm-5.2, glm-5.1, kimi-k2.7-code, deepseek-v4-pro, nemotron-3-super, gpt-oss:120b, qwen3-coder:480b, gemma4) + the M1/M5 Macs via foreman, each running the 3-lens suite (security, correctness, error-handling). Advisory only — never blocks a merge. Posts one consolidated comment per model, plus findings telemetry to gadfly-reports.
CLAUDE.md — documents the review loop: push work to a PR (never straight to main), wait for Gadfly to finish and weigh its findings, then grade each finding back to the gadfly-reports MCP (record_finding_grade / list_findings / scoreboard).

Notes

Reuses the existing org gitea secrets (OLLAMA_CLOUD_API_KEY, GADFLY_ENDPOINT_M1/_M5, GADFLY_FINDINGS_*). The M1/M5 Macs must be awake/reachable; an offline box only fails its own lane.
Verified the model list before pushing: 9 ollama-cloud + m1 + m5.

🤖 Generated with Claude Code

Installs the standalone [Gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly) agentic adversarial reviewer for majordomo, mirroring executus's setup on the latest pinned image (`sha-d7f364d`). ## What's here - **`.gitea/workflows/adversarial-review.yml`** — runs the full fleet on every PR: **9 ollama-cloud models** (`minimax-m3`, `glm-5.2`, `glm-5.1`, `kimi-k2.7-code`, `deepseek-v4-pro`, `nemotron-3-super`, `gpt-oss:120b`, `qwen3-coder:480b`, `gemma4`) **+ the M1/M5 Macs** via foreman, each running the 3-lens suite (security, correctness, error-handling). Advisory only — never blocks a merge. Posts one consolidated comment per model, plus findings telemetry to gadfly-reports. - **CLAUDE.md** — documents the review loop: push work to a PR (never straight to `main`), wait for Gadfly to finish and weigh its findings, then grade each finding back to the gadfly-reports MCP (`record_finding_grade` / `list_findings` / `scoreboard`). ## Notes - Reuses the existing org gitea secrets (`OLLAMA_CLOUD_API_KEY`, `GADFLY_ENDPOINT_M1`/`_M5`, `GADFLY_FINDINGS_*`). The M1/M5 Macs must be awake/reachable; an offline box only fails its own lane. - Verified the model list before pushing: 9 ollama-cloud + m1 + m5. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

steve added 2 commits 2026-06-27 18:32:52 +00:00

ci: add Gadfly adversarial PR reviewer workflow a5adc6f4d1

Installs the standalone Gadfly agentic adversarial reviewer (advisory, never
blocks merge), mirroring executus's setup on the latest pinned image
(sha-d7f364d). Reviews majordomo PRs with the full fleet: 9 ollama-cloud models
plus the M1/M5 Macs via foreman, each running the 3-lens suite (security,
correctness, error-handling). Posts one consolidated comment per model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs: document the Gadfly adversarial review loop in CLAUDE.md

CI / Build & Test (pull_request) Successful in 10m13s

Details

Adversarial Review (Gadfly) / review (pull_request) Successful in 24m4s

Details

CI / Tidy (pull_request) Successful in 9m26s

Details

8dae9cc941

Records the PR workflow: push work to a PR (never straight to main), wait for
Gadfly to finish and weigh its findings, then grade each finding back to the
gadfly-reports MCP (record_finding_grade / list_findings / scoreboard) so the
telemetry can measure whether each model earns its keep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitea-actions bot commented

2026-06-27 18:32:54 +00:00

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues

.gitea/workflows/adversarial-review.yml:49 — image pinned by a mutable :sha-d7f364d tag, not a content-addressable digest. The header comment (lines 3-4) calls the image "pinned to an immutable :sha- tag," but :sha-d7f364d is still a movable tag, not a @sha256:... digest. Anyone with push access to gitea.stevedudenhoeffer.com/steve/gadfly can re-tag sha-d7f364d to a different (malicious) image, which would then run with GITEA_TOKEN, OLLAMA_CLOUD_API_KEY, and the foreman/telemetry tokens in its env. Mitigation is real but incomplete: pin by registry digest (docker://.../gadfly@sha256:...) rather than a tag named like a sha. Confirmed by reading the file (line 49 uses only a tag).
.gitea/workflows/adversarial-review.yml:52 — GITEA_TOKEN handed to the third-party container with issues: write / pull-requests: write (lines 25-27) and no per-secret scope bounding visible in the workflow. Combined with the un-digested image pull above, a compromised/re-tagged Gadfly image could exfiltrate the token and act beyond posting review comments. The secret's actual scope is out-of-repo config and couldn't be verified from the repo; flagging as a hardening recommendation to confirm GITEA_TOKEN is repo-scoped (read issues + write comments, no contents: write) rather than a broad token.
.gitea/workflows/adversarial-review.yml:90 — COMMENT_BODY: ${{ github.event.comment.body }} is untrusted PR-attacker-controlled input passed verbatim into the secret-bearing container env. Low risk on the workflow side: it's an env var, there is no run: shell step, and the workflow does no shell interpolation of it, so this is not a workflow-side injection. The one tainted attacker-controlled value reaching the container, though — the container's own logic must treat COMMENT_BODY/ACTOR/PR_BRANCH as untrusted (no shell-out/templating with them). Advisory only.

Note: the issue_comment design is otherwise sound — the job-level if: allow-list (lines 38-43) and the mirrored GADFLY_ALLOWED_USERS (line 78) gate who can trigger a secret-bearing run, and there is no actions/checkout of PR code into the container (line 49 pulls a pinned image directly), so a PR author cannot tamper with the workflow that consumes the secrets.

🎯 Correctness — Minor issues

Draft review

VERDICT: Minor issues

Through the correctness lens, I verified the workflow YAML against the checked-out repo (the Gadfly container source isn't in this repo, so container-internal semantics are out of reach). Findings:

.gitea/workflows/adversarial-review.yml:15 — pull_request trigger omits synchronize, so pushing new commits to an open PR does not re-run Gadfly. The triggers are [opened, reopened, ready_for_review] only. The documented loop in CLAUDE.md:136 ("Push to a PR... let Gadfly review it") and the PR description ("runs the full fleet on every PR") imply review tracks PR updates, but the only re-trigger path after a follow-up push is a manual comment (issue_comment: created). Verified by reading the on: block. Fix: add synchronize (and optionally edited) to the pull_request types, or document in CLAUDE.md that re-review requires a comment.
.gitea/workflows/adversarial-review.yml:88-89 — PR_BRANCH and IS_DRAFT are empty for issue_comment and workflow_dispatch triggers. github.head_ref (line 88) and github.event.pull_request.draft (line 89) are only populated on pull_request events. For comment/dispatch runs both env vars resolve to empty strings. The inline comment at line 81-82 says GADFLY_REPO/GADFLY_PR are "derived in-container," yet PR (line 87) is also passed explicitly, and PR_BRANCH is only ever passed (never derived per the comment). If the Gadfly container consumes PR_BRANCH directly to fetch the PR ref, comment/dispatch-triggered reviews operate with an empty branch. I could not verify the container's behavior (no Gadfly source in this checkout), so this is flagged as a suspected gap — confirm the container re-derives the head ref from the PR number via GITEA_API when PR_BRANCH is empty.
.gitea/workflows/adversarial-review.yml:29-31 — concurrency group shared across all trigger types; a skipped comment-triggered run may cancel an in-progress review. The group gadfly-<number> (line 30) resolves to the same value for pull_request, issue_comment (PR comment → issue.number == PR number), and workflow_dispatch. With cancel-in-progress: true (line 31), a comment by a non-allow-listed user evaluates the if: guard (lines 38-43) false, but the workflow run still enters the same concurrency group. If Gitea evaluates concurrency at run-enqueue time (as GitHub Actions does, before job if: evaluation), a bystander's comment could abort a running review. I could not confirm Gitea's exact semantics here; worth verifying and, if affected, gating the concurrency group or moving the allow-list check earlier.

I confirmed the expression ordering in PR (line 87) and the if: guard (lines 38-43) are correct: for issue_comment, github.event.pull_request.number is undefined → falsy → falls through to github.event.issue.number, and github.event.issue.pull_request correctly restricts comment triggers to PRs. The GADFLY_MODELS cloud-tag forms (:cloud, :120b-cloud, :480b-cloud) are consistent with ollama tag naming and not a correctness defect.

🧯 Error handling & edge cases — Minor issues

Verdict: Minor issues

.gitea/workflows/adversarial-review.yml:38-43 — every comment by an allowed actor re-runs the whole fleet; no command gating. The issue_comment branch of the if: only checks the actor and that github.event.issue.pull_request is set — it never inspects COMMENT_BODY, which is forwarded into the env at line 90 but is not used by the gate. So any casual reply ("looks good", a typo) from steve/fizi/dazed launches the full 9-model + M1/M5 suite. Combined with cancel-in-progress: true (line 31), a second conversational comment cancels a review already in flight and restarts it from scratch, losing partial work and churning the slow local lanes. Fix: gate on a command token, e.g. add && startsWith(github.event.comment.body, '/gadfly') to the if:.
.gitea/workflows/adversarial-review.yml:14-15 — draft PRs get reviewed twice. pull_request triggers on opened (which fires for drafts too) and again on ready_for_review when the draft is flipped ready. IS_DRAFT is forwarded (line 89) but the workflow doesn't gate on it, so WIP draft code gets a full 90-min review on open and a second one on ready-for-review. Fix: either drop opened in favor of ready_for_review/synchronize, or add && github.event.pull_request.draft == false to the if: for PR events (the run shouldn't start at all for drafts).
.gitea/workflows/adversarial-review.yml:87-88 — PR_BRANCH resolves to empty on non-pull_request triggers (unverified impact). PR_BRANCH: ${{ github.head_ref }} is only populated on pull_request events. For workflow_dispatch, PR is the input pr_number but PR_BRANCH is empty; for issue_comment, both github.event.pull_request.number and github.head_ref are undefined and PR falls back to github.event.issue.number while PR_BRANCH stays empty. Whether this breaks the container depends on the in-container code (it may re-derive the branch from the API via GITEA_API/PR). Worth confirming the container handles an empty PR_BRANCH by resolving it from the PR API rather than trusting the env var.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 37s}

### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Minor issues</summary> VERDICT: Minor issues - **`.gitea/workflows/adversarial-review.yml:49` — image pinned by a mutable `:sha-d7f364d` tag, not a content-addressable digest.** The header comment (lines 3-4) calls the image "pinned to an immutable :sha- tag," but `:sha-d7f364d` is still a movable tag, not a `@sha256:...` digest. Anyone with push access to `gitea.stevedudenhoeffer.com/steve/gadfly` can re-tag `sha-d7f364d` to a different (malicious) image, which would then run with `GITEA_TOKEN`, `OLLAMA_CLOUD_API_KEY`, and the foreman/telemetry tokens in its env. Mitigation is real but incomplete: pin by registry digest (`docker://.../gadfly@sha256:...`) rather than a tag named like a sha. Confirmed by reading the file (line 49 uses only a tag). - **`.gitea/workflows/adversarial-review.yml:52` — `GITEA_TOKEN` handed to the third-party container with `issues: write` / `pull-requests: write` (lines 25-27) and no per-secret scope bounding visible in the workflow.** Combined with the un-digested image pull above, a compromised/re-tagged Gadfly image could exfiltrate the token and act beyond posting review comments. The secret's actual scope is out-of-repo config and couldn't be verified from the repo; flagging as a hardening recommendation to confirm `GITEA_TOKEN` is repo-scoped (read issues + write comments, no `contents: write`) rather than a broad token. - **`.gitea/workflows/adversarial-review.yml:90` — `COMMENT_BODY: ${{ github.event.comment.body }}` is untrusted PR-attacker-controlled input passed verbatim into the secret-bearing container env.** Low risk on the workflow side: it's an env var, there is no `run:` shell step, and the workflow does no shell interpolation of it, so this is not a workflow-side injection. The one tainted attacker-controlled value reaching the container, though — the container's own logic must treat `COMMENT_BODY`/`ACTOR`/`PR_BRANCH` as untrusted (no shell-out/templating with them). Advisory only. Note: the `issue_comment` design is otherwise sound — the job-level `if:` allow-list (lines 38-43) and the mirrored `GADFLY_ALLOWED_USERS` (line 78) gate who can trigger a secret-bearing run, and there is no `actions/checkout` of PR code into the container (line 49 pulls a pinned image directly), so a PR author cannot tamper with the workflow that consumes the secrets. </details> <details><summary>🎯 Correctness — Minor issues</summary> ## Draft review VERDICT: Minor issues Through the correctness lens, I verified the workflow YAML against the checked-out repo (the Gadfly container source isn't in this repo, so container-internal semantics are out of reach). Findings: - **`.gitea/workflows/adversarial-review.yml:15` — `pull_request` trigger omits `synchronize`, so pushing new commits to an *open* PR does not re-run Gadfly.** The triggers are `[opened, reopened, ready_for_review]` only. The documented loop in `CLAUDE.md:136` ("Push to a PR... let Gadfly review it") and the PR description ("runs the full fleet on every PR") imply review tracks PR updates, but the only re-trigger path after a follow-up push is a manual comment (`issue_comment: created`). Verified by reading the `on:` block. Fix: add `synchronize` (and optionally `edited`) to the `pull_request` types, or document in CLAUDE.md that re-review requires a comment. - **`.gitea/workflows/adversarial-review.yml:88-89` — `PR_BRANCH` and `IS_DRAFT` are empty for `issue_comment` and `workflow_dispatch` triggers.** `github.head_ref` (line 88) and `github.event.pull_request.draft` (line 89) are only populated on `pull_request` events. For comment/dispatch runs both env vars resolve to empty strings. The inline comment at line 81-82 says `GADFLY_REPO`/`GADFLY_PR` are "derived in-container," yet `PR` (line 87) is *also* passed explicitly, and `PR_BRANCH` is only ever passed (never derived per the comment). If the Gadfly container consumes `PR_BRANCH` directly to fetch the PR ref, comment/dispatch-triggered reviews operate with an empty branch. I could not verify the container's behavior (no Gadfly source in this checkout), so this is flagged as a suspected gap — confirm the container re-derives the head ref from the PR number via `GITEA_API` when `PR_BRANCH` is empty. - **`.gitea/workflows/adversarial-review.yml:29-31` — concurrency group shared across all trigger types; a skipped comment-triggered run may cancel an in-progress review.** The group `gadfly-<number>` (line 30) resolves to the same value for `pull_request`, `issue_comment` (PR comment → `issue.number` == PR number), and `workflow_dispatch`. With `cancel-in-progress: true` (line 31), a comment by a non-allow-listed user evaluates the `if:` guard (lines 38-43) false, but the workflow run still enters the same concurrency group. If Gitea evaluates concurrency at run-enqueue time (as GitHub Actions does, before job `if:` evaluation), a bystander's comment could abort a running review. I could not confirm Gitea's exact semantics here; worth verifying and, if affected, gating the concurrency group or moving the allow-list check earlier. I confirmed the expression ordering in `PR` (line 87) and the `if:` guard (lines 38-43) are correct: for `issue_comment`, `github.event.pull_request.number` is undefined → falsy → falls through to `github.event.issue.number`, and `github.event.issue.pull_request` correctly restricts comment triggers to PRs. The `GADFLY_MODELS` cloud-tag forms (`:cloud`, `:120b-cloud`, `:480b-cloud`) are consistent with ollama tag naming and not a correctness defect. </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> Verdict: Minor issues - **`.gitea/workflows/adversarial-review.yml:38-43` — every comment by an allowed actor re-runs the whole fleet; no command gating.** The `issue_comment` branch of the `if:` only checks the actor and that `github.event.issue.pull_request` is set — it never inspects `COMMENT_BODY`, which is forwarded into the env at line 90 but is not used by the gate. So any casual reply ("looks good", a typo) from steve/fizi/dazed launches the full 9-model + M1/M5 suite. Combined with `cancel-in-progress: true` (line 31), a second conversational comment cancels a review already in flight and restarts it from scratch, losing partial work and churning the slow local lanes. Fix: gate on a command token, e.g. add `&& startsWith(github.event.comment.body, '/gadfly')` to the `if:`. - **`.gitea/workflows/adversarial-review.yml:14-15` — draft PRs get reviewed twice.** `pull_request` triggers on `opened` (which fires for drafts too) and again on `ready_for_review` when the draft is flipped ready. `IS_DRAFT` is forwarded (line 89) but the workflow doesn't gate on it, so WIP draft code gets a full 90-min review on open and a second one on ready-for-review. Fix: either drop `opened` in favor of `ready_for_review`/`synchronize`, or add `&& github.event.pull_request.draft == false` to the `if:` for PR events (the run shouldn't start at all for drafts). - **`.gitea/workflows/adversarial-review.yml:87-88` — `PR_BRANCH` resolves to empty on non-`pull_request` triggers** *(unverified impact)*. `PR_BRANCH: ${{ github.head_ref }}` is only populated on `pull_request` events. For `workflow_dispatch`, `PR` is the input `pr_number` but `PR_BRANCH` is empty; for `issue_comment`, both `github.event.pull_request.number` and `github.head_ref` are undefined and `PR` falls back to `github.event.issue.number` while `PR_BRANCH` stays empty. Whether this breaks the container depends on the in-container code (it may re-derive the branch from the API via `GITEA_API`/`PR`). Worth confirming the container handles an empty `PR_BRANCH` by resolving it from the PR API rather than trusting the env var. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 37s

gitea-actions bot commented

2026-06-27 18:32:54 +00:00

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues

Findings (security lens)

.gitea/workflows/adversarial-review.yml:49 — Docker image pinned by mutable registry, not digest. The tag is :sha-d7f364d, but the docker:// reference does not resolve to an immutable content hash; the registry can replace the tag. The comment on lines 3-4 calls it an "immutable :sha- tag," which isn't what the Gitea runner actually fetches — a registry compromise or force-push to the same tag leaks GITEA_TOKEN, OLLAMA_CLOUD_API_KEY, the GADFLY_ENDPOINT_M* URLs/tokens, and the GADFLY_FINDINGS_* secrets into attacker-controlled code. Pin by digest (@sha256:...).
.gitea/workflows/adversarial-review.yml:90-91 — COMMENT_BODY and COMMENT_ID from the issue_comment event are set unconditionally in the step's env: block, alongside the secrets. Even with the job-level if: guard on lines 38-43 gating whether the job runs at all, the comment text is placed in the same environment as GITEA_TOKEN and the API keys. Any bug, log line, or telemetry path in the Gadfly container that dumps its own environment leaks the comment body and the token. Restructure so user-supplied event fields are set only after the guard, or pass them via a file written by a trusted step.
.gitea/workflows/adversarial-review.yml:38-43 — allow-list is case-sensitive and exact-string compared. github.actor == 'steve' (and 'fizi', 'dazed') won't match SSO-aliased variants (Steve, STEVE, trailing space, etc.), so the guard fails closed or open depending on the actor's display name. Either lowercase both sides or, better, source the list from a vars.* entry so it isn't editable by a PR that lands via push-to-main.
.gitea/workflows/adversarial-review.yml:83-84 — findings URL is a secret, so the third-party container can be redirected to any host with the token in hand. GADFLY_FINDINGS_URL should be a non-secret vars.GADFLY_FINDINGS_URL; only the bearer (GADFLY_FINDINGS_TOKEN) needs to be a secret. Today an SSRF-from-the-inside by Gadfly can POST findings (and the token) to an attacker-chosen endpoint.
.gitea/workflows/adversarial-review.yml:88 — PR_BRANCH: ${{ github.head_ref }} is attacker-controllable and flows into a third-party container as a label/telemetry field. Branch names from PRs (including forks, where head_ref is the contributor's branch) can contain .., /, or shell metacharacters. Combined with finding 2, this widens the injection surface for any code path that echoes the value back. Treat all github.event.* fields as tainted and validate before they leave the trusted boundary.
.gitea/workflows/adversarial-review.yml:24-27 — the GITEA_TOKEN granted here is write-scoped to both issues and pull-requests, and is the same generic org secret used elsewhere. A single bug in the Gadfly container = ability to edit PR titles, merge PRs, close/lock issues, etc. Issue a fine-scoped token (ReadAndWrite on pull_requests only, since the workflow only needs to post review comments) and pass it as a dedicated secret rather than the org-wide GITEA_TOKEN.
CLAUDE.md:145-151 — documents mcp__gadfly__record_finding_grade / mcp__gadfly__list_findings / mcp__gadfly__scoreboard as available MCP tools, but no MCP wiring is referenced anywhere in the repo. grep for mcp__gadfly and gadfly shows only the workflow and these doc lines — no MCP server config, no endpoint, no credential. Operators are told to call a tool that isn't surfaced. Either point the doc at the real MCP server definition or remove the instructions.

Note: findings about GADFLY_ALLOWED_USERS being "string in workflow, not a secret" and about the pull_request from-fork secret-leak were folded into the if: allow-list finding above; the env-var ordering concern is covered by the COMMENT_BODY/COMMENT_ID finding.

🎯 Correctness — Minor issues

The comment on lines 64-65 is admittedly ugly prose — LENSES: and LENS are abbreviated shorthand, hard to parse, and the second occurrence redundantly repeats "ollama-cloud=3" already stated. That's a fair cosmetic observation. However, the draft's own framing (it's a comment, not a behavior bug) is honest, and it's a nit. Whether to keep it depends on whether a cosmetic finding rises to "minor issue" — I'd say it does qualify as a real but trivial observation, distinct from the (false) separator-typo claim. I'll keep it as a cosmetic note but not let it drive the verdict.

Both findings about the model-spec separator/typo are wrong. The cosmetic comment note is real but minor.

Correctness Review

VERDICT: Minor issues

Findings

.gitea/workflows/adversarial-review.yml:64-65 — cosmetic comment nit. The lines
```
# LENSES: cloud models run a few at a time (ollama-cloud=3) with their 3
# lenses concurrent (LENS ollama-cloud=3) so comments land sooner; each
```
use bare shorthand (LENSES:, LENS) where the actual env vars below them are GADFLY_PROVIDER_CONCURRENCY and GADFLY_PROVIDER_LENS_CONCURRENCY, and the second occurrence of ollama-cloud=3 merely restates the first. Not a behavior bug; just hard to read adjacent to the env block it documents.

What I checked and found clean

.gitea/workflows/adversarial-review.yml:68 — GADFLY_MODELS value. The original draft flagged gpt-oss:120b-cloud and qwen3-coder:480b-cloud as separator typos. False positive. The parse grammar (docs/adr/0003-parse-grammar.md:17-23, provider/ollama/ollama_test.go:250, parse_test.go:65-77) is provider "/" model with the model passed verbatim. The 9 comma-separated cloud entries are bare alias tokens (no /); the two Mac entries are m1/<model> and m5/<model> targets. -cloud in the alias names is the registered naming convention, parallel to the minimax-m3:cloud style — not a separator mismatch. The same -cloud-suffixed id qwen3-coder:480b-cloud already appears verbatim as a real model id in parse_test.go:68, README.md:80, examples/live/main.go:33, examples/tiers/main.go:20. No fix needed.
Triggers, if: guard, concurrency group, env-var event context, permissions, timeout, allow-list mirror: all consistent with the draft's own analysis.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

I read the workflow file end-to-end (only 92 lines) and evaluated each draft finding against it. Every finding either self-corrects, is explicitly marked unverified, or is a speculative "nice-to-have" rather than a confirmed bug. None survive verification:

.gitea/workflows/adversarial-review.yml:87 — PR fallback chain. The reviewer themselves concludes "Verified by reading the file; no real bug here" — the || short-circuit is correct on all three trigger paths. Not a finding.
.gitea/workflows/adversarial-review.yml:88-90 — empty strings on non-PR events. Explicitly marked "Unverified — I don't have the Gadfly source in this repo." The workflow comment block (lines 85-86) labels these as "leave as-is" event context, and the reviewer concedes the container likely tolerates empties. Pure speculation about external code.
.gitea/workflows/adversarial-review.yml:68 — GADFLY_MODELS parser shape. Explicitly marked "Unverified; only a real bug if the Gadfly parser is strict about the name:tag shape." The reviewer even notes the per-model isolation (line 59 comment) would mask a parser bug. Speculation about an external parser.
.gitea/workflows/adversarial-review.yml:49 — no checkout / preflight step. The reviewer concedes "the secret is documented as required, and a missing-token failure is loud". This is a quality-of-life suggestion ("would benefit from a 1-line precondition"), not a confirmed material defect — the workflow is correct as written and fails clearly on misconfig.
.gitea/workflows/adversarial-review.yml:31 — cancel-in-progress: true. Explicitly hedged as "Not an error-handling bug per se — just an interaction worth being aware of." Not a bug.

Nothing else in the file is materially wrong: the if: allow-list guard (lines 38-43) correctly mirrors GADFLY_ALLOWED_USERS (line 78), the concurrency group key (line 30) is well-formed, the secret mappings are explicit (Gitea's no-auto-expose behavior is acknowledged in the comment on line 60), and the timeouts/concurrency tunables are internally consistent (cloud lens concurrency 3 + provider concurrency 3 for ollama-cloud, serial 1 for each Mac lane).

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 11s}

### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Minor issues</summary> ## VERDICT: Minor issues ### Findings (security lens) - **`.gitea/workflows/adversarial-review.yml:49` — Docker image pinned by mutable registry, not digest.** The tag is `:sha-d7f364d`, but the `docker://` reference does not resolve to an immutable content hash; the registry can replace the tag. The comment on lines 3-4 calls it an "immutable :sha- tag," which isn't what the Gitea runner actually fetches — a registry compromise or force-push to the same tag leaks `GITEA_TOKEN`, `OLLAMA_CLOUD_API_KEY`, the `GADFLY_ENDPOINT_M*` URLs/tokens, and the `GADFLY_FINDINGS_*` secrets into attacker-controlled code. Pin by digest (`@sha256:...`). - **`.gitea/workflows/adversarial-review.yml:90-91` — `COMMENT_BODY` and `COMMENT_ID` from the `issue_comment` event are set unconditionally in the step's `env:` block, alongside the secrets.** Even with the job-level `if:` guard on lines 38-43 gating whether the job runs at all, the comment text is placed in the same environment as `GITEA_TOKEN` and the API keys. Any bug, log line, or telemetry path in the Gadfly container that dumps its own environment leaks the comment body and the token. Restructure so user-supplied event fields are set only after the guard, or pass them via a file written by a trusted step. - **`.gitea/workflows/adversarial-review.yml:38-43` — allow-list is case-sensitive and exact-string compared.** `github.actor == 'steve'` (and `'fizi'`, `'dazed'`) won't match SSO-aliased variants (`Steve`, `STEVE`, trailing space, etc.), so the guard fails closed or open depending on the actor's display name. Either lowercase both sides or, better, source the list from a `vars.*` entry so it isn't editable by a PR that lands via push-to-main. - **`.gitea/workflows/adversarial-review.yml:83-84` — findings URL is a secret, so the third-party container can be redirected to any host with the token in hand.** `GADFLY_FINDINGS_URL` should be a non-secret `vars.GADFLY_FINDINGS_URL`; only the bearer (`GADFLY_FINDINGS_TOKEN`) needs to be a secret. Today an SSRF-from-the-inside by Gadfly can POST findings (and the token) to an attacker-chosen endpoint. - **`.gitea/workflows/adversarial-review.yml:88` — `PR_BRANCH: ${{ github.head_ref }}` is attacker-controllable and flows into a third-party container as a label/telemetry field.** Branch names from PRs (including forks, where `head_ref` is the contributor's branch) can contain `..`, `/`, or shell metacharacters. Combined with finding 2, this widens the injection surface for any code path that echoes the value back. Treat all `github.event.*` fields as tainted and validate before they leave the trusted boundary. - **`.gitea/workflows/adversarial-review.yml:24-27` — the `GITEA_TOKEN` granted here is write-scoped to both `issues` and `pull-requests`, and is the same generic org secret used elsewhere.** A single bug in the Gadfly container = ability to edit PR titles, merge PRs, close/lock issues, etc. Issue a fine-scoped token (`ReadAndWrite` on `pull_requests` only, since the workflow only needs to post review comments) and pass it as a dedicated secret rather than the org-wide `GITEA_TOKEN`. - **`CLAUDE.md:145-151` — documents `mcp__gadfly__record_finding_grade` / `mcp__gadfly__list_findings` / `mcp__gadfly__scoreboard` as available MCP tools, but no MCP wiring is referenced anywhere in the repo.** `grep` for `mcp__gadfly` and `gadfly` shows only the workflow and these doc lines — no MCP server config, no endpoint, no credential. Operators are told to call a tool that isn't surfaced. Either point the doc at the real MCP server definition or remove the instructions. Note: findings about `GADFLY_ALLOWED_USERS` being "string in workflow, not a secret" and about the `pull_request` from-fork secret-leak were folded into the `if:` allow-list finding above; the env-var ordering concern is covered by the `COMMENT_BODY`/`COMMENT_ID` finding. </details> <details><summary>🎯 Correctness — Minor issues</summary> The comment on lines 64-65 is admittedly ugly prose — `LENSES:` and `LENS` are abbreviated shorthand, hard to parse, and the second occurrence redundantly repeats "ollama-cloud=3" already stated. That's a fair cosmetic observation. However, the draft's own framing (it's a comment, not a behavior bug) is honest, and it's a nit. Whether to keep it depends on whether a cosmetic finding rises to "minor issue" — I'd say it does qualify as a real but trivial observation, distinct from the (false) separator-typo claim. I'll keep it as a cosmetic note but not let it drive the verdict. Both findings about the model-spec separator/typo are wrong. The cosmetic comment note is real but minor. # Correctness Review **VERDICT: Minor issues** ## Findings - **`.gitea/workflows/adversarial-review.yml:64-65` — cosmetic comment nit.** The lines ``` # LENSES: cloud models run a few at a time (ollama-cloud=3) with their 3 # lenses concurrent (LENS ollama-cloud=3) so comments land sooner; each ``` use bare shorthand (`LENSES:`, `LENS`) where the actual env vars below them are `GADFLY_PROVIDER_CONCURRENCY` and `GADFLY_PROVIDER_LENS_CONCURRENCY`, and the second occurrence of `ollama-cloud=3` merely restates the first. Not a behavior bug; just hard to read adjacent to the env block it documents. ## What I checked and found clean - **`.gitea/workflows/adversarial-review.yml:68` — `GADFLY_MODELS` value.** The original draft flagged `gpt-oss:120b-cloud` and `qwen3-coder:480b-cloud` as separator typos. **False positive.** The parse grammar (`docs/adr/0003-parse-grammar.md:17-23`, `provider/ollama/ollama_test.go:250`, `parse_test.go:65-77`) is `provider "/" model` with the model passed verbatim. The 9 comma-separated cloud entries are bare alias tokens (no `/`); the two Mac entries are `m1/<model>` and `m5/<model>` targets. `-cloud` in the alias names is the registered naming convention, parallel to the `minimax-m3:cloud` style — not a separator mismatch. The same `-cloud`-suffixed id `qwen3-coder:480b-cloud` already appears verbatim as a real model id in `parse_test.go:68`, `README.md:80`, `examples/live/main.go:33`, `examples/tiers/main.go:20`. No fix needed. - **Triggers, `if:` guard, concurrency group, env-var event context, permissions, timeout, allow-list mirror**: all consistent with the draft's own analysis. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> ## VERDICT: No material issues found I read the workflow file end-to-end (only 92 lines) and evaluated each draft finding against it. Every finding either self-corrects, is explicitly marked unverified, or is a speculative "nice-to-have" rather than a confirmed bug. None survive verification: - **`.gitea/workflows/adversarial-review.yml:87` — `PR` fallback chain.** The reviewer themselves concludes "Verified by reading the file; no real bug here" — the `||` short-circuit is correct on all three trigger paths. Not a finding. - **`.gitea/workflows/adversarial-review.yml:88-90` — empty strings on non-PR events.** Explicitly marked "Unverified — I don't have the Gadfly source in this repo." The workflow comment block (lines 85-86) labels these as "leave as-is" event context, and the reviewer concedes the container likely tolerates empties. Pure speculation about external code. - **`.gitea/workflows/adversarial-review.yml:68` — `GADFLY_MODELS` parser shape.** Explicitly marked "Unverified; only a real bug if the Gadfly parser is strict about the `name:tag` shape." The reviewer even notes the per-model isolation (line 59 comment) would mask a parser bug. Speculation about an external parser. - **`.gitea/workflows/adversarial-review.yml:49` — no checkout / preflight step.** The reviewer concedes "the secret is documented as required, and a missing-token failure is loud". This is a quality-of-life suggestion ("would benefit from a 1-line precondition"), not a confirmed material defect — the workflow is correct as written and fails clearly on misconfig. - **`.gitea/workflows/adversarial-review.yml:31` — `cancel-in-progress: true`.** Explicitly hedged as "Not an error-handling bug per se — just an interaction worth being aware of." Not a bug. Nothing else in the file is materially wrong: the `if:` allow-list guard (lines 38-43) correctly mirrors `GADFLY_ALLOWED_USERS` (line 78), the concurrency group key (line 30) is well-formed, the secret mappings are explicit (Gitea's no-auto-expose behavior is acknowledged in the comment on line 60), and the timeouts/concurrency tunables are internally consistent (cloud lens concurrency 3 + provider concurrency 3 for ollama-cloud, serial 1 for each Mac lane). </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 11s

gitea-actions bot commented

2026-06-27 18:32:54 +00:00

🪰 Gadfly review — `m1/qwen3:14b` (m1)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

Blocking issues found

Secrets exposure risk: The workflow passes multiple sensitive secrets (OLLAMA_CLOUD_API_KEY, GADFLY_ENDPOINT_M1, GADFLY_ENDPOINT_M5, GADFLY_FINDINGS_URL, GADFLY_FINDINGS_TOKEN) directly to the Docker container. If the container has any vulnerabilities (e.g., insecure deserialization, command injection), these secrets could be leaked. The PR mentions using a pinned SHA for the image, but this doesn't eliminate the risk of vulnerabilities in the image itself. Verified by reading the diff.
Insecure endpoint format: The GADFLY_ENDPOINT_M1 and GADFLY_ENDPOINT_M5 secrets use a custom format foreman|https://<host>|<token>, which is not standard. If the foreman implementation doesn't properly validate these endpoints, it could lead to SSRF or other attacks. This format isn't validated in the workflow, creating a potential injection risk. Verified by reading the diff.
Lack of input validation: The GADFLY_MODELS environment variable is set to a hardcoded list of models, but there's no validation to ensure these values are safe or properly formatted. If this variable were user-controllable, it could allow injection attacks (e.g., malicious model names). Verified by reading the diff.

🎯 Correctness — Blocking issues found

Blocking issues found

.gitea/workflows/adversarial-review.yml:55 — The GADFLY_MODELS value "m1/qwen3:14b,m5/qwen3.6:35b-mlx" includes provider prefixes (m1/, m5/) in model names, which may be semantically incorrect. The comment explicitly states that GADFLY_ENDPOINT_M1/_M5 are foreman-preset Ollama clients (format: foreman|https://...|<token>), implying models should be specified without provider prefixes. Including provider prefixes in model names may cause Gadfly to misroute model execution, as the provider is already defined via the endpoint secrets. Verify with the Gadfly documentation or foreman configuration to confirm whether provider prefixes are valid in this context.

🧯 Error handling & edge cases — Blocking issues found

Blocking issues found

.gitea/workflows/adversarial-review.yml:23 — No error handling for missing/invalid secrets. If GITEA_TOKEN or OLLAMA_CLOUD_API_KEY is missing, the Docker container will fail silently without reporting actionable errors to the user. Verified by reading the workflow's env block — no if guards or fallbacks exist for secret validation.
.gitea/workflows/adversarial-review.yml:53 — No error recovery for failed model executions. The GADFLY_MODELS list includes 10 models (9 cloud + 2 Macs), but if any model fails (e.g., due to offline Macs), the workflow will silently fail without distinguishing between model-specific errors and system-level failures. Verified by checking the Docker step — no continue-on-error or error classification logic is present.
.gitea/workflows/adversarial-review.yml:66 — No fallback for MCP telemetry failures. If GADFLY_FINDINGS_URL is unreachable or authentication fails, the workflow will silently lose telemetry data without alerting the user. Verified by examining the env block — no error logging or retry logic is configured for the findings endpoint.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 24m 3s}

### 🪰 Gadfly review — `m1/qwen3:14b` (m1) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Blocking issues found</summary> Blocking issues found - **Secrets exposure risk**: The workflow passes multiple sensitive secrets (`OLLAMA_CLOUD_API_KEY`, `GADFLY_ENDPOINT_M1`, `GADFLY_ENDPOINT_M5`, `GADFLY_FINDINGS_URL`, `GADFLY_FINDINGS_TOKEN`) directly to the Docker container. If the container has any vulnerabilities (e.g., insecure deserialization, command injection), these secrets could be leaked. The PR mentions using a pinned SHA for the image, but this doesn't eliminate the risk of vulnerabilities in the image itself. Verified by reading the diff. - **Insecure endpoint format**: The `GADFLY_ENDPOINT_M1` and `GADFLY_ENDPOINT_M5` secrets use a custom format `foreman|https://<host>|<token>`, which is not standard. If the foreman implementation doesn't properly validate these endpoints, it could lead to SSRF or other attacks. This format isn't validated in the workflow, creating a potential injection risk. Verified by reading the diff. - **Lack of input validation**: The `GADFLY_MODELS` environment variable is set to a hardcoded list of models, but there's no validation to ensure these values are safe or properly formatted. If this variable were user-controllable, it could allow injection attacks (e.g., malicious model names). Verified by reading the diff. </details> <details><summary>🎯 Correctness — Blocking issues found</summary> Blocking issues found - **`.gitea/workflows/adversarial-review.yml:55`** — The `GADFLY_MODELS` value `"m1/qwen3:14b,m5/qwen3.6:35b-mlx"` includes provider prefixes (`m1/`, `m5/`) in model names, which may be semantically incorrect. The comment explicitly states that `GADFLY_ENDPOINT_M1`/`_M5` are foreman-preset Ollama clients (format: `foreman|https://...|<token>`), implying models should be specified without provider prefixes. Including provider prefixes in model names may cause Gadfly to misroute model execution, as the provider is already defined via the endpoint secrets. Verify with the Gadfly documentation or foreman configuration to confirm whether provider prefixes are valid in this context. </details> <details><summary>🧯 Error handling & edge cases — Blocking issues found</summary> Blocking issues found - **.gitea/workflows/adversarial-review.yml:23** — No error handling for missing/invalid secrets. If `GITEA_TOKEN` or `OLLAMA_CLOUD_API_KEY` is missing, the Docker container will fail silently without reporting actionable errors to the user. Verified by reading the workflow's env block — no `if` guards or fallbacks exist for secret validation. - **.gitea/workflows/adversarial-review.yml:53** — No error recovery for failed model executions. The `GADFLY_MODELS` list includes 10 models (9 cloud + 2 Macs), but if any model fails (e.g., due to offline Macs), the workflow will silently fail without distinguishing between model-specific errors and system-level failures. Verified by checking the Docker step — no `continue-on-error` or error classification logic is present. - **.gitea/workflows/adversarial-review.yml:66** — No fallback for MCP telemetry failures. If `GADFLY_FINDINGS_URL` is unreachable or authentication fails, the workflow will silently lose telemetry data without alerting the user. Verified by examining the env block — no error logging or retry logic is configured for the findings endpoint. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 24m 3s

gitea-actions bot commented

2026-06-27 18:32:54 +00:00

🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues

.gitea/workflows/adversarial-review.yml:38-43 — pull_request events bypass the actor allowlist; defense-in-depth would extend the check to all event types. The if: guard only restricts issue_comment events to actors steve/fizi/dazed. For pull_request and workflow_dispatch events, github.event_name != 'issue_comment' is true and the job runs unconditionally. This is a standard CI pattern that relies on the platform (Gitea Actions, like GitHub Actions) withholding secrets from fork-PR pull_request events. That platform-level isolation is the primary security boundary, and it is in effect here. However, any repo collaborator — even one not in the allowlist — can open a same-repo PR and trigger the job with full access to all mapped secrets. For repositories where all collaborators are trusted, this is acceptable; if the allowlist is meant to be the authoritative gate, the if: should cover pull_request and workflow_dispatch events too. Impact: Low — relies on Gitea's fork-PR secret isolation (standard behavior) and assumes repo collaborators are implicitly trusted.
.gitea/workflows/adversarial-review.yml:90 — Raw COMMENT_BODY passed as environment variable without sanitization. ${{ github.event.comment.body }} is injected directly into the container environment. If the Gadfly image interprets this value in a shell or template context, it could enable command or template injection. The issue_comment path is restricted to the three allowed actors (steve/fizi/dazed), making exploitation require a compromised trusted account. Impact: Low — trusted-actor-only path, but defense-in-depth would sanitize or validate the comment body before passing it to the container.

🎯 Correctness — Minor issues

VERDICT: Minor issues

.gitea/workflows/adversarial-review.yml:15 — pull_request trigger omits synchronize. The types list is [opened, reopened, ready_for_review], which does not include synchronize. This means pushing new commits to an existing PR will not automatically re-trigger the adversarial review. The issue_comment trigger and workflow_dispatch offer manual workarounds, but the workflow's own documentation and CLAUDE.md both say to "push to a PR" and wait for Gadfly — implying it should auto-run on every push. If automatic re-review on pushes is desired, add synchronize to the types list. (Confirmed by reading the file.)
.gitea/workflows/adversarial-review.yml:68 — GADFLY_MODELS format vs documented provider/model grammar (unverified). The project's own README.md and CLAUDE.md document the model-spec grammar as provider/model with / as the delimiter (e.g., ollama-cloud/minimax-m3:cloud), where everything after the first / is the model ID passed verbatim. However, the GADFLY_MODELS value lists cloud models without a provider prefix (e.g., minimax-m3:cloud rather than ollama-cloud/minimax-m3:cloud), while the Mac models do use the provider/ prefix (m1/qwen3:14b, m5/qwen3.6:35b-mlx). Additionally, GADFLY_PROVIDER_CONCURRENCY references provider ollama-cloud, but no cloud model entry contains that string. The env-var format may differ from the Go API's Parse() grammar, and Gadfly may apply a default provider to bare names — but without access to Gadfly's parser, this cannot be confirmed. Someone with access to the Gadfly image should verify that bare model names like gpt-oss:120b-cloud and qwen3-coder:480b-cloud resolve to the ollama-cloud provider correctly.

🧯 Error handling & edge cases — Minor issues

Now let me carefully evaluate each finding:

Finding 1: Missing synchronize in pull_request types.

The workflow triggers on [opened, reopened, ready_for_review] but not synchronize. This is confirmed at line 15. When a PR author pushes new commits, the workflow won't re-trigger. The issue_comment trigger allows manual re-review, but there's no automatic re-trigger on new commits. This is a valid concern — Gadfly's review can become stale. However, note that the issue_comment trigger does allow manual re-triggering. Still, the missing synchronize means automatic re-review on new pushes won't happen.

This is a legitimate, confirmed finding.

Finding 2: PR_BRANCH is empty for issue_comment and workflow_dispatch triggers.

github.head_ref is only populated for pull_request events. For issue_comment and workflow_dispatch, it will be empty. The comment at line 81-82 says "GADFLY_REPO / GADFLY_PR are derived in-container", which strongly implies Gadfly uses the GITEA_API + GITEA_TOKEN + PR number to fetch branch/draft info from the API server-side. If Gadfly can derive PR_BRANCH internally from the API, then the empty env var is harmless (Gadfly ignores it). But if Gadfly relies on PR_BRANCH being set, it would be broken for non-PR triggers.

The comment explicitly says "GADFLY_REPO / GADFLY_PR are derived in-container" — this is a strong signal that Gadfly does API lookups. However, PR_BRANCH is still being passed as an env var, and it will be empty for two of the three triggers. This is a real edge case: the env var is provided but will be empty. Whether Gadfly uses it or derives it from the API is unknown from this repo's perspective, but the workflow is explicitly passing a value that will be empty in 2/3 trigger scenarios. That's a concrete concern worth noting.

However, given the "GADFLY_REPO / GADFLY_PR are derived in-container" comment, it's quite likely Gadfly derives PR_BRANCH from the API as well. The fact that PR_BRANCH is still passed as an env var could just be for the pull_request trigger case. This is speculative — I can't confirm Gadfly's internals.

Let me keep this as a minor note but soften it given the "derived in-container" comment.

Finding 3: cancel-in-progress: true is aggressive with issue_comment trigger.

This is about design trade-offs. The concurrency group at line 30 uses the PR number, so any new trigger for the same PR cancels the old one. The issue_comment trigger will fire on every comment by an allowed user, and it's not filtered by comment content. However, looking at line 38-43, the if: guard already restricts issue_comment to specific users. Only steve, fizi, and dazed can trigger it. This is a very small set of trusted users who presumably know the workflow. This is more of a design suggestion than a bug.

This is a valid but minor design observation. I'll keep it but note it's a design trade-off, not a bug.

Finding 4: IS_DRAFT is empty for non-pull_request triggers.

github.event.pull_request.draft is only available for pull_request events. For issue_comment and workflow_dispatch, it will be empty/nil. Same reasoning as Finding 2 — if Gadfly derives this from the API, it's fine. The "derived in-container" comment again applies. But IS_DRAFT being empty (not false) could be problematic if Gadfly treats empty as truthy.

Same analysis as Finding 2. This is a concrete edge case in the workflow file, but likely handled by Gadfly's in-container API logic.

Let me reconsider which findings to keep. Findings 2 and 4 are about env vars being empty for certain triggers, but the workflow explicitly comments that PR info is "derived in-container." This strongly suggests Gadfly doesn't rely solely on these env vars. The findings are speculative about Gadfly's internal behavior which we can't verify from this repo.

Finding 1 (missing synchronize) is clearly valid and confirmable from the workflow file alone.

Finding 3 (cancel-in-progress) is a valid design observation but it's really about the interaction of the workflow configuration choices, not a bug.

I'll keep Findings 1 and 3 as minor issues, and drop Findings 2 and 4 since they depend on Gadfly's internals which the workflow comments suggest handle this via API.

Wait, let me reconsider. Finding 2 is still a real issue even if Gadfly derives the branch from the API — passing an empty env var is confusing at best and could cause subtle bugs. But the draft's claim that "Gadfly receives an empty PR_BRANCH in those cases" is technically true. Whether it matters depends on Gadfly internals. Given the explicit comment that PR info is "derived in-container," this is likely handled. I'll drop this.

Finding 4 same reasoning — IS_DRAFT being empty is a fact, but likely handled in-container. Drop it.

Actually, let me reconsider Finding 3 more carefully. The cancel-in-progress: true with issue_comment means every comment from an allowed user on the PR cancels the running review. But the if: guard limits issue_comment to specific users AND requires it to be on a PR (not a regular issue). Still, any comment by those users would cancel. This is a design trade-off noted in the finding. It's valid but minor.

Let me finalize:

Finding 1: Keep — missing synchronize means the workflow won't auto-re-run on new commits. Confirmed.
Finding 2: Drop — the "derived in-container" comment suggests Gadfly fetches PR info from the API, making the empty env var likely harmless.
Finding 3: Keep — but soften the language; it's a design trade-off, not a bug.
Finding 4: Drop — same reasoning as Finding 2.

VERDICT: Minor issues (still minor with the two surviving findings).VERDICT: Minor issues

.gitea/workflows/adversarial-review.yml:15 — Missing synchronize in pull_request types. The trigger fires on [opened, reopened, ready_for_review] but not synchronize. When a PR author pushes new commits, Gadfly will not automatically re-review. The only re-trigger paths are a manual comment from an allowed user or re-opening the PR. Gadfly's review can silently become stale on the very PRs it's designed to review. Fix: add synchronize to the types list.
.gitea/workflows/adversarial-review.yml:31 — cancel-in-progress: true with the issue_comment trigger is aggressive. Any comment by an allowed user on the PR (even "looks good" or "👍") creates a new workflow run that cancels the in-progress one. Combined with the 90-minute timeout and full 11-model fleet, this can waste significant compute and discard partial reviews. The comment body is not filtered before triggering. This is a design trade-off rather than a bug, but worth noting: a casual comment from steve/fizi/dazed kills an in-progress review.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 37s}

### 🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Minor issues</summary> VERDICT: Minor issues - **`.gitea/workflows/adversarial-review.yml:38-43` — `pull_request` events bypass the actor allowlist; defense-in-depth would extend the check to all event types.** The `if:` guard only restricts `issue_comment` events to actors steve/fizi/dazed. For `pull_request` and `workflow_dispatch` events, `github.event_name != 'issue_comment'` is true and the job runs unconditionally. This is a standard CI pattern that relies on the platform (Gitea Actions, like GitHub Actions) withholding secrets from fork-PR `pull_request` events. That platform-level isolation is the primary security boundary, and it is in effect here. However, any repo collaborator — even one not in the allowlist — can open a same-repo PR and trigger the job with full access to all mapped secrets. For repositories where all collaborators are trusted, this is acceptable; if the allowlist is meant to be the authoritative gate, the `if:` should cover `pull_request` and `workflow_dispatch` events too. **Impact**: Low — relies on Gitea's fork-PR secret isolation (standard behavior) and assumes repo collaborators are implicitly trusted. - **`.gitea/workflows/adversarial-review.yml:90` — Raw `COMMENT_BODY` passed as environment variable without sanitization.** `${{ github.event.comment.body }}` is injected directly into the container environment. If the Gadfly image interprets this value in a shell or template context, it could enable command or template injection. The `issue_comment` path is restricted to the three allowed actors (steve/fizi/dazed), making exploitation require a compromised trusted account. **Impact**: Low — trusted-actor-only path, but defense-in-depth would sanitize or validate the comment body before passing it to the container. </details> <details><summary>🎯 Correctness — Minor issues</summary> VERDICT: Minor issues - **`.gitea/workflows/adversarial-review.yml:15` — `pull_request` trigger omits `synchronize`**. The `types` list is `[opened, reopened, ready_for_review]`, which does not include `synchronize`. This means pushing new commits to an existing PR will **not** automatically re-trigger the adversarial review. The `issue_comment` trigger and `workflow_dispatch` offer manual workarounds, but the workflow's own documentation and CLAUDE.md both say to "push to a PR" and wait for Gadfly — implying it should auto-run on every push. If automatic re-review on pushes is desired, add `synchronize` to the types list. *(Confirmed by reading the file.)* - **`.gitea/workflows/adversarial-review.yml:68` — `GADFLY_MODELS` format vs documented `provider/model` grammar (unverified)**. The project's own `README.md` and `CLAUDE.md` document the model-spec grammar as `provider/model` with `/` as the delimiter (e.g., `ollama-cloud/minimax-m3:cloud`), where everything after the first `/` is the model ID passed verbatim. However, the `GADFLY_MODELS` value lists cloud models *without* a provider prefix (e.g., `minimax-m3:cloud` rather than `ollama-cloud/minimax-m3:cloud`), while the Mac models *do* use the `provider/` prefix (`m1/qwen3:14b`, `m5/qwen3.6:35b-mlx`). Additionally, `GADFLY_PROVIDER_CONCURRENCY` references provider `ollama-cloud`, but no cloud model entry contains that string. The env-var format may differ from the Go API's `Parse()` grammar, and Gadfly may apply a default provider to bare names — but without access to Gadfly's parser, this cannot be confirmed. Someone with access to the Gadfly image should verify that bare model names like `gpt-oss:120b-cloud` and `qwen3-coder:480b-cloud` resolve to the `ollama-cloud` provider correctly. </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> Now let me carefully evaluate each finding: **Finding 1: Missing `synchronize` in `pull_request` types.** The workflow triggers on `[opened, reopened, ready_for_review]` but not `synchronize`. This is confirmed at line 15. When a PR author pushes new commits, the workflow won't re-trigger. The `issue_comment` trigger allows manual re-review, but there's no automatic re-trigger on new commits. This is a valid concern — Gadfly's review can become stale. However, note that the `issue_comment` trigger does allow manual re-triggering. Still, the missing `synchronize` means automatic re-review on new pushes won't happen. This is a legitimate, confirmed finding. **Finding 2: `PR_BRANCH` is empty for `issue_comment` and `workflow_dispatch` triggers.** `github.head_ref` is only populated for `pull_request` events. For `issue_comment` and `workflow_dispatch`, it will be empty. The comment at line 81-82 says "GADFLY_REPO / GADFLY_PR are derived in-container", which strongly implies Gadfly uses the GITEA_API + GITEA_TOKEN + PR number to fetch branch/draft info from the API server-side. If Gadfly can derive PR_BRANCH internally from the API, then the empty env var is harmless (Gadfly ignores it). But if Gadfly relies on PR_BRANCH being set, it would be broken for non-PR triggers. The comment explicitly says "GADFLY_REPO / GADFLY_PR are derived in-container" — this is a strong signal that Gadfly does API lookups. However, PR_BRANCH is still being passed as an env var, and it will be empty for two of the three triggers. This is a real edge case: the env var is provided but will be empty. Whether Gadfly uses it or derives it from the API is unknown from this repo's perspective, but the workflow is explicitly passing a value that will be empty in 2/3 trigger scenarios. That's a concrete concern worth noting. However, given the "GADFLY_REPO / GADFLY_PR are derived in-container" comment, it's quite likely Gadfly derives PR_BRANCH from the API as well. The fact that PR_BRANCH is still passed as an env var could just be for the pull_request trigger case. This is speculative — I can't confirm Gadfly's internals. Let me keep this as a minor note but soften it given the "derived in-container" comment. **Finding 3: `cancel-in-progress: true` is aggressive with `issue_comment` trigger.** This is about design trade-offs. The concurrency group at line 30 uses the PR number, so any new trigger for the same PR cancels the old one. The `issue_comment` trigger will fire on every comment by an allowed user, and it's not filtered by comment content. However, looking at line 38-43, the `if:` guard already restricts `issue_comment` to specific users. Only steve, fizi, and dazed can trigger it. This is a very small set of trusted users who presumably know the workflow. This is more of a design suggestion than a bug. This is a valid but minor design observation. I'll keep it but note it's a design trade-off, not a bug. **Finding 4: `IS_DRAFT` is empty for non-`pull_request` triggers.** `github.event.pull_request.draft` is only available for `pull_request` events. For `issue_comment` and `workflow_dispatch`, it will be empty/nil. Same reasoning as Finding 2 — if Gadfly derives this from the API, it's fine. The "derived in-container" comment again applies. But IS_DRAFT being empty (not `false`) could be problematic if Gadfly treats empty as truthy. Same analysis as Finding 2. This is a concrete edge case in the workflow file, but likely handled by Gadfly's in-container API logic. Let me reconsider which findings to keep. Findings 2 and 4 are about env vars being empty for certain triggers, but the workflow explicitly comments that PR info is "derived in-container." This strongly suggests Gadfly doesn't rely solely on these env vars. The findings are speculative about Gadfly's internal behavior which we can't verify from this repo. Finding 1 (missing `synchronize`) is clearly valid and confirmable from the workflow file alone. Finding 3 (cancel-in-progress) is a valid design observation but it's really about the interaction of the workflow configuration choices, not a bug. I'll keep Findings 1 and 3 as minor issues, and drop Findings 2 and 4 since they depend on Gadfly's internals which the workflow comments suggest handle this via API. Wait, let me reconsider. Finding 2 is still a real issue even if Gadfly derives the branch from the API — passing an empty env var is confusing at best and could cause subtle bugs. But the draft's claim that "Gadfly receives an empty PR_BRANCH in those cases" is technically true. Whether it matters depends on Gadfly internals. Given the explicit comment that PR info is "derived in-container," this is likely handled. I'll drop this. Finding 4 same reasoning — IS_DRAFT being empty is a fact, but likely handled in-container. Drop it. Actually, let me reconsider Finding 3 more carefully. The `cancel-in-progress: true` with `issue_comment` means every comment from an allowed user on the PR cancels the running review. But the `if:` guard limits `issue_comment` to specific users AND requires it to be on a PR (not a regular issue). Still, any comment by those users would cancel. This is a design trade-off noted in the finding. It's valid but minor. Let me finalize: - **Finding 1**: Keep — missing `synchronize` means the workflow won't auto-re-run on new commits. Confirmed. - **Finding 2**: Drop — the "derived in-container" comment suggests Gadfly fetches PR info from the API, making the empty env var likely harmless. - **Finding 3**: Keep — but soften the language; it's a design trade-off, not a bug. - **Finding 4**: Drop — same reasoning as Finding 2. VERDICT: Minor issues (still minor with the two surviving findings).</think>VERDICT: Minor issues - **`.gitea/workflows/adversarial-review.yml:15` — Missing `synchronize` in `pull_request` types.** The trigger fires on `[opened, reopened, ready_for_review]` but not `synchronize`. When a PR author pushes new commits, Gadfly will not automatically re-review. The only re-trigger paths are a manual comment from an allowed user or re-opening the PR. Gadfly's review can silently become stale on the very PRs it's designed to review. Fix: add `synchronize` to the types list. - **`.gitea/workflows/adversarial-review.yml:31` — `cancel-in-progress: true` with the `issue_comment` trigger is aggressive.** Any comment by an allowed user on the PR (even "looks good" or "👍") creates a new workflow run that cancels the in-progress one. Combined with the 90-minute timeout and full 11-model fleet, this can waste significant compute and discard partial reviews. The comment body is not filtered before triggering. This is a design trade-off rather than a bug, but worth noting: a casual comment from steve/fizi/dazed kills an in-progress review. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 37s

gitea-actions bot commented

2026-06-27 18:32:54 +00:00

🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5)

Verdict: Minor issues · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling

🔒 Security — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 4: context deadline exceeded

🎯 Correctness — Minor issues

Review Summary

Verdict: Minor issues

Finding 1 (Unverified): Inconsistent model ID suffix for ollama-cloud entries in `GADFLY_MODELS`

.gitea/workflows/adversarial-review.yml:68

GADFLY_MODELS: "minimax-m3:cloud,glm-5.2:cloud,...,gpt-oss:120b-cloud,qwen3-coder:480b-cloud,gemma4:cloud,m1/qwen3:14b,m5/qwen3.6:35b-mlx"

Seven of the nine ollama-cloud model entries use :<model-id>:cloud format (e.g., minimax-m3:cloud, gemma4:cloud). Two entries (gpt-oss:120b-cloud, qwen3-coder:480b-cloud) carry different suffixes (:120b-cloud, :480b-cloud).

If Gadfly's internal mapping expects a consistent :cloud suffix to identify ollama-cloud provider models, these two entries would either fail to be routed correctly or map to the wrong provider entirely. Conversely, if they are intentionally different model identifiers (and should still resolve to ollama-cloud), then the inconsistency is cosmetic — but it risks confusing future maintainers or automated validation logic within Gadfly.

Impact: If these 2 models fail to resolve to ollama-cloud, they'd silently become unreviewed by any cloud provider, reducing the adversarial review fleet from 9 to 7 ollama-cloud models without error (the model resolution would happen inside the Gadfly container). This is unverified because I don't have access to Gadfly's source code (gitea.stevedudenhoeffer.com/steve/gadfly).

Suggested fix: Verify against Gadfly's model parsing logic that gpt-oss:120b-cloud and qwen3-coder:480b-cloud are indeed valid ollama-cloud model IDs. If they should map via a :cloud suffix, correct the format (e.g., to match other entries).

🧯 Error handling & edge cases — No material issues found

Findings through "Error handling & edge cases" lens

I reviewed the workflow file thoroughly and cross-referenced it against Gitea Actions behavior to validate each concern below.

No material issues found

After careful review of .gitea/workflows/adversarial-review.yml through the error-handling-and-edge-cases lens, I found no concrete bugs or missing error handlers that would cause failures. Here's why:

The if: guard on line 38–43 correctly handles all trigger types — For pull_request and workflow_dispatch, github.event_name != 'issue_comment' short-circuits to true. For issue_comment, the actor allow-list is enforced, and non-PR-issue comments (on regular Issues) are properly rejected via the github.event.issue.pull_request null check. Verified by reading lines 38–43 of the file.
IS_DRAFT empty-string edge case is benign — For issue_comment and workflow_dispatch, ${{ github.event.pull_request.draft }} (line 89) resolves to empty string. Since the downstream consumer only uses this advisory flag (the review never blocks merges), an empty boolean has no observable failure mode. The value is informational only; downstream code either ignores it or treats any non-"true" as false.
Missing secrets produce no workflow-level failures — Unset secrets (OLLAMA_CLOUD_API_KEY, GADFLY_ENDPOINT_M1, etc., lines 53, 61–62, 83–84) are passed as empty strings to the container. The comments explicitly state "failures only log to stderr and never affect the review" (line 80–81), so this is a design choice, not an oversight. The workflow doesn't need to validate secrets — the container handles them.
No failure handler on the docker:// step is intentional — Line 49 has no if: failure() or if: always() fallback. Since Gadfly is advisory-only and never blocks merges, silent failure is consistent with the design. There's nothing to roll back, no state to clean up, and no blocking consequence of a crash.
Concurrency expression safely handles all three triggers — Line 30's github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number correctly resolves in each case: PR events use .pull_request.number, issue comments use .issue.number, and workflow_dispatch uses .inputs.pr_number. Verified the full expression against the trigger definitions on lines 14–22.
No rollback or cleanup needed — This workflow is single-step (docker container), advisory-only, and produces no state changes that require cleanup. No defer equivalent is needed in CI.

All checked by reading .gitea/workflows/adversarial-review.yml in full (lines 1–92) and cross-referencing the event trigger definitions against each ${{ github.event.* }} reference.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 23m 45s}

### 🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5) **Verdict: Minor issues · ⚠️ 1/3 lens(es) errored** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 4: context deadline exceeded </details> <details><summary>🎯 Correctness — Minor issues</summary> ## Review Summary **Verdict: Minor issues** ### Finding 1 (Unverified): Inconsistent model ID suffix for ollama-cloud entries in `GADFLY_MODELS` `.gitea/workflows/adversarial-review.yml:68` ```yaml GADFLY_MODELS: "minimax-m3:cloud,glm-5.2:cloud,...,gpt-oss:120b-cloud,qwen3-coder:480b-cloud,gemma4:cloud,m1/qwen3:14b,m5/qwen3.6:35b-mlx" ``` Seven of the nine ollama-cloud model entries use `:<model-id>:cloud` format (e.g., `minimax-m3:cloud`, `gemma4:cloud`). Two entries (`gpt-oss:120b-cloud`, `qwen3-coder:480b-cloud`) carry different suffixes (`:120b-cloud`, `:480b-cloud`). If Gadfly's internal mapping expects a consistent `:cloud` suffix to identify ollama-cloud provider models, these two entries would either fail to be routed correctly or map to the wrong provider entirely. Conversely, if they are intentionally different model identifiers (and should still resolve to ollama-cloud), then the inconsistency is cosmetic — but it risks confusing future maintainers or automated validation logic within Gadfly. **Impact:** If these 2 models fail to resolve to `ollama-cloud`, they'd silently become unreviewed by any cloud provider, reducing the adversarial review fleet from 9 to 7 ollama-cloud models without error (the model resolution would happen inside the Gadfly container). This is unverified because I don't have access to Gadfly's source code (`gitea.stevedudenhoeffer.com/steve/gadfly`). **Suggested fix:** Verify against Gadfly's model parsing logic that `gpt-oss:120b-cloud` and `qwen3-coder:480b-cloud` are indeed valid ollama-cloud model IDs. If they should map via a `:cloud` suffix, correct the format (e.g., to match other entries). </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> ## Findings through "Error handling & edge cases" lens I reviewed the workflow file thoroughly and cross-referenced it against Gitea Actions behavior to validate each concern below. --- ### No material issues found After careful review of `.gitea/workflows/adversarial-review.yml` through the error-handling-and-edge-cases lens, I found no concrete bugs or missing error handlers that would cause failures. Here's why: 1. **The `if:` guard on line 38–43 correctly handles all trigger types** — For `pull_request` and `workflow_dispatch`, `github.event_name != 'issue_comment'` short-circuits to true. For `issue_comment`, the actor allow-list is enforced, and non-PR-issue comments (on regular Issues) are properly rejected via the `github.event.issue.pull_request` null check. Verified by reading lines 38–43 of the file. 2. **`IS_DRAFT` empty-string edge case is benign** — For `issue_comment` and `workflow_dispatch`, `${{ github.event.pull_request.draft }}` (line 89) resolves to empty string. Since the downstream consumer only uses this advisory flag (the review never blocks merges), an empty boolean has no observable failure mode. The value is informational only; downstream code either ignores it or treats any non-"true" as false. 3. **Missing secrets produce no workflow-level failures** — Unset secrets (`OLLAMA_CLOUD_API_KEY`, `GADFLY_ENDPOINT_M1`, etc., lines 53, 61–62, 83–84) are passed as empty strings to the container. The comments explicitly state "failures only log to stderr and never affect the review" (line 80–81), so this is a design choice, not an oversight. The workflow doesn't need to validate secrets — the container handles them. 4. **No failure handler on the `docker://` step is intentional** — Line 49 has no `if: failure()` or `if: always()` fallback. Since Gadfly is advisory-only and never blocks merges, silent failure is consistent with the design. There's nothing to roll back, no state to clean up, and no blocking consequence of a crash. 5. **Concurrency expression safely handles all three triggers** — Line 30's `github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number` correctly resolves in each case: PR events use `.pull_request.number`, issue comments use `.issue.number`, and `workflow_dispatch` uses `.inputs.pr_number`. Verified the full expression against the trigger definitions on lines 14–22. 6. **No rollback or cleanup needed** — This workflow is single-step (docker container), advisory-only, and produces no state changes that require cleanup. No `defer` equivalent is needed in CI. All checked by reading `.gitea/workflows/adversarial-review.yml` in full (lines 1–92) and cross-referencing the event trigger definitions against each `${{ github.event.* }}` reference. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 23m 45s

gitea-actions bot commented

2026-06-27 18:35:06 +00:00

🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

Minor issues

.gitea/workflows/adversarial-review.yml:49 — The Gadfly image is “pinned” with a mutable tag (sha-d7f364d), not a content-addressable digest. Tags — even ones named after a git SHA — can be repushed if registry immutability is misconfigured or bypassed. Because this container receives repo write token, cloud API keys, foreman endpoint tokens, and a findings token, a tag-swap would let an attacker run arbitrary code with those credentials or post malicious PR comments. Fix: pin with image@sha256:<digest> (act_runner/GitHub-style docker:// supports this) or document/enforce registry tag immutability for this image.
.gitea/workflows/adversarial-review.yml:38-43,90 — The issue_comment trigger gates on actor identity but not on comment content, so any comment from steve, fizi, or dazed on any PR feeds COMMENT_BODY into a secret-bearing container. This widens the attack surface: a compromised allowed account or a crafted comment body could exploit the container’s parser/LLM invocation. Fix: add a command-prefix check to the if guard (e.g., && startsWith(github.event.comment.body, '/gadfly')) so routine comments don’t trigger a privileged run.

🎯 Correctness — No material issues found

No material issues found

Reviewed through the Correctness lens.

The workflow is purely additive CI/docs configuration; there is no executable Go (or other) code whose logic could be incorrect.
The env var counts and model list match the PR description (9 cloud + M1 + M5). I re-counted: minimax-m3, glm-5.2, glm-5.1, kimi-k2.7-code, deepseek-v4-pro, nemotron-3-super, gpt-oss:120b, qwen3-coder:480b, gemma4 = 9; plus m1/qwen3:14b, m5/qwen3.6:35b-mlx = 11 total. Correct.
Provider concurrency settings are internally consistent with the comment: ollama-cloud=3, m1=1, m5=1; lens concurrency only for cloud; specialists default matches the 3-lens suite.
The if: guard for issue_comment correctly checks github.event.issue.pull_request and allow-lists steve, fizi, dazed; the same list is repeated in GADFLY_ALLOWED_USERS.
I read the surrounding CLAUDE.md and the existing ci.yaml; the new docs section is consistent with existing style and accurately describes the workflow file.
No constants/formulas/thresholds in the diff require first-principles re-derivation; timeouts and step limits are explicit policy choices, not derived values.

Nothing in this change appears semantically incorrect from a correctness standpoint.

🧯 Error handling & edge cases — Minor issues

Minor issues

.gitea/workflows/adversarial-review.yml:87 — PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }} can resolve to empty for any unexpected event shape, passing PR= to the container. Defensive guard (e.g., an early step that fails fast if PR is blank) would prevent a run that silently targets nothing. Verified by reading the workflow and the three on: triggers.
.gitea/workflows/adversarial-review.yml:88 — PR_BRANCH: ${{ github.head_ref }} is only defined for pull_request events; on issue_comment and workflow_dispatch it is empty. If the container uses PR_BRANCH for checkout/branch selection, this could misroute those triggers. Suggest making it conditional or omitting it when not applicable.
.gitea/workflows/adversarial-review.yml:89 — IS_DRAFT: ${{ github.event.pull_request.draft }} is empty for non-pull_request events. The container likely treats empty as non-draft, but that is an implicit default. Suggest ${{ github.event.pull_request.draft || false }} to make the default explicit.
.gitea/workflows/adversarial-review.yml:30 — concurrency group gadfly-${{ ... }} falls back to gadfly- if none of the PR-number expressions resolve, which could cause cross-run collisions if an edge-case event slips through. The if: guard mostly prevents this, but a ${{ ... || 'unknown' }} fallback is cheap insurance.
CLAUDE.md:146-150 — Documents mcp__gadfly__record_finding_grade, list_findings, and scoreboard as the grading interface, but does not say how to install/verify the MCP or what to do if those tools are absent. Adds operational friction if a reader tries to follow the loop without the MCP configured. Suggest a one-line note/link about MCP setup.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 55s}

### 🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Minor issues</summary> **Minor issues** - `.gitea/workflows/adversarial-review.yml:49` — The Gadfly image is “pinned” with a mutable tag (`sha-d7f364d`), not a content-addressable digest. Tags — even ones named after a git SHA — can be repushed if registry immutability is misconfigured or bypassed. Because this container receives repo write token, cloud API keys, foreman endpoint tokens, and a findings token, a tag-swap would let an attacker run arbitrary code with those credentials or post malicious PR comments. **Fix:** pin with `image@sha256:<digest>` (act_runner/GitHub-style `docker://` supports this) or document/enforce registry tag immutability for this image. - `.gitea/workflows/adversarial-review.yml:38-43,90` — The `issue_comment` trigger gates on actor identity but not on comment content, so any comment from `steve`, `fizi`, or `dazed` on any PR feeds `COMMENT_BODY` into a secret-bearing container. This widens the attack surface: a compromised allowed account or a crafted comment body could exploit the container’s parser/LLM invocation. **Fix:** add a command-prefix check to the `if` guard (e.g., `&& startsWith(github.event.comment.body, '/gadfly')`) so routine comments don’t trigger a privileged run. </details> <details><summary>🎯 Correctness — No material issues found</summary> **No material issues found** Reviewed through the **Correctness** lens. - The workflow is purely additive CI/docs configuration; there is no executable Go (or other) code whose logic could be incorrect. - The env var counts and model list match the PR description (9 cloud + M1 + M5). I re-counted: `minimax-m3`, `glm-5.2`, `glm-5.1`, `kimi-k2.7-code`, `deepseek-v4-pro`, `nemotron-3-super`, `gpt-oss:120b`, `qwen3-coder:480b`, `gemma4` = 9; plus `m1/qwen3:14b`, `m5/qwen3.6:35b-mlx` = 11 total. Correct. - Provider concurrency settings are internally consistent with the comment: `ollama-cloud=3`, `m1=1`, `m5=1`; lens concurrency only for cloud; specialists default matches the 3-lens suite. - The `if:` guard for `issue_comment` correctly checks `github.event.issue.pull_request` and allow-lists `steve`, `fizi`, `dazed`; the same list is repeated in `GADFLY_ALLOWED_USERS`. - I read the surrounding `CLAUDE.md` and the existing `ci.yaml`; the new docs section is consistent with existing style and accurately describes the workflow file. - No constants/formulas/thresholds in the diff require first-principles re-derivation; timeouts and step limits are explicit policy choices, not derived values. Nothing in this change appears semantically incorrect from a correctness standpoint. </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> **Minor issues** - `.gitea/workflows/adversarial-review.yml:87` — `PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }}` can resolve to empty for any unexpected event shape, passing `PR=` to the container. Defensive guard (e.g., an early step that fails fast if `PR` is blank) would prevent a run that silently targets nothing. Verified by reading the workflow and the three `on:` triggers. - `.gitea/workflows/adversarial-review.yml:88` — `PR_BRANCH: ${{ github.head_ref }}` is only defined for `pull_request` events; on `issue_comment` and `workflow_dispatch` it is empty. If the container uses `PR_BRANCH` for checkout/branch selection, this could misroute those triggers. Suggest making it conditional or omitting it when not applicable. - `.gitea/workflows/adversarial-review.yml:89` — `IS_DRAFT: ${{ github.event.pull_request.draft }}` is empty for non-`pull_request` events. The container likely treats empty as non-draft, but that is an implicit default. Suggest `${{ github.event.pull_request.draft || false }}` to make the default explicit. - `.gitea/workflows/adversarial-review.yml:30` — concurrency group `gadfly-${{ ... }}` falls back to `gadfly-` if none of the PR-number expressions resolve, which could cause cross-run collisions if an edge-case event slips through. The `if:` guard mostly prevents this, but a `${{ ... || 'unknown' }}` fallback is cheap insurance. - `CLAUDE.md:146-150` — Documents `mcp__gadfly__record_finding_grade`, `list_findings`, and `scoreboard` as the grading interface, but does not say how to install/verify the MCP or what to do if those tools are absent. Adds operational friction if a reader tries to follow the loop without the MCP configured. Suggest a one-line note/link about MCP setup. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 55s

gitea-actions bot commented

2026-06-27 18:36:31 +00:00

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Now let me verify the key claims in each finding against the actual code and repository context.

Finding 1 — The if: guard on lines 38-43 is:

if: >-
  github.event_name != 'issue_comment'
  || (github.event.issue.pull_request
      && (github.actor == 'steve'
          || github.actor == 'fizi'
          || github.actor == 'dazed'))

This is factually correct: for pull_request and workflow_dispatch, github.event_name != 'issue_comment' is true, so the job runs unconditionally. However, the finding frames this as a security vulnerability. In Gitea Actions, only users with write access to the repository can push branches and open same-repo PRs — the same set of people who are already trusted with the repo. Fork PRs do not receive secrets. The issue_comment trigger is the only vector where an external (non-write-access) user can interact, hence the allowlist there. The workflow's own comment on line 36 explicitly documents this design decision: "pull_request + workflow_dispatch are already trusted." This is a deliberate, documented design choice consistent with the repo's threat model, not an unverified assumption. The existing ci.yaml has the same pattern — it runs on all PRs with no actor check. DROP.

Finding 2 — COMMENT_BODY is gated by the issue_comment allowlist (only three named users). The draft itself concedes the blast radius is limited to those three accounts being compromised, and admits the container behavior "could not be verified." This is speculation about an external Docker image's internals, not a confirmed issue in the code under review. DROP.

Finding 3 — Same pattern: PR_BRANCH from github.head_ref. The branch name is set by the PR author (who has write access to push branches). The concern about shell metacharacters is speculation about the external container's behavior, which the draft itself says "could not be verified." DROP.

All three findings fail to confirm against the actual code and repository context.

Verdict: No material issues found

🎯 Correctness — No material issues found

Both findings in the draft are explicitly marked as unverified by the original reviewer, and I cannot positively confirm either against the actual code in this repository:

gpt-oss:120b-cloud: The only gpt-oss reference in the repo (ollama_test.go:250) uses the regular ollama provider (not ollama-cloud), so it tells us nothing about the correct ollama-cloud tag. No other file in the repo confirms or contradicts gpt-oss:120b-cloud.
IS_DRAFT / issue_comment: This is a question about Gitea Actions runtime behavior, not something verifiable from the repo's code. The draft itself concedes it cannot verify Gitea's exact behavior.

Both are dropped.

Verdict: No material issues found

🧯 Error handling & edge cases — Blocking issues found

I've read the actual workflow file and searched the entire .gitea directory. The finding is confirmed:

Line 15 reads types: [opened, reopened, ready_for_review] — synchronize is indeed absent.
No other trigger (push, a separate workflow, etc.) covers the re-review-on-push case.
The ci.yaml uses pull_request: { branches: ["*"] } without explicit types, which defaults to including synchronize — so the pattern is understood and used elsewhere in the repo.
The CLAUDE.md loop explicitly instructs: "Push to a PR… Wait for Gadfly to finish… fix the real ones" — which breaks after the first iteration without synchronize.

The finding survives. It is the only finding, so the verdict remains as-is.

Verdict: Blocking issues found

Missing synchronize pull_request event type — adversarial-review.yml:15. The workflow triggers on [opened, reopened, ready_for_review] but omits synchronize, the event fired when new commits are pushed to an existing PR. This means:
- After the initial PR open, any subsequent push (amend, rebase, additional commits) silently skips re-review.
- The CLAUDE.md loop ("Push to a PR… Wait for Gadfly to finish… fix the real ones") breaks after the first iteration: you fix findings, push, and Gadfly never looks at the new code.
- A draft PR marked ready triggers ready_for_review and gets reviewed, but if the author then pushes fixes, no re-review occurs.
Fix: add synchronize to the types list:
```
pull_request:
  types: [opened, reopened, synchronize, ready_for_review]
```

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 55s}

### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> Now let me verify the key claims in each finding against the actual code and repository context. **Finding 1** — The `if:` guard on lines 38-43 is: ```yaml if: >- github.event_name != 'issue_comment' || (github.event.issue.pull_request && (github.actor == 'steve' || github.actor == 'fizi' || github.actor == 'dazed')) ``` This is factually correct: for `pull_request` and `workflow_dispatch`, `github.event_name != 'issue_comment'` is true, so the job runs unconditionally. However, the finding frames this as a security vulnerability. In Gitea Actions, only users with **write access** to the repository can push branches and open same-repo PRs — the same set of people who are already trusted with the repo. Fork PRs do not receive secrets. The `issue_comment` trigger is the only vector where an external (non-write-access) user can interact, hence the allowlist there. The workflow's own comment on line 36 explicitly documents this design decision: *"pull_request + workflow_dispatch are already trusted."* This is a deliberate, documented design choice consistent with the repo's threat model, not an unverified assumption. The existing `ci.yaml` has the same pattern — it runs on all PRs with no actor check. **DROP.** **Finding 2** — `COMMENT_BODY` is gated by the `issue_comment` allowlist (only three named users). The draft itself concedes the blast radius is limited to those three accounts being compromised, and admits the container behavior "could not be verified." This is speculation about an external Docker image's internals, not a confirmed issue in the code under review. **DROP.** **Finding 3** — Same pattern: `PR_BRANCH` from `github.head_ref`. The branch name is set by the PR author (who has write access to push branches). The concern about shell metacharacters is speculation about the external container's behavior, which the draft itself says "could not be verified." **DROP.** All three findings fail to confirm against the actual code and repository context. ## Verdict: No material issues found </details> <details><summary>🎯 Correctness — No material issues found</summary> Both findings in the draft are explicitly marked as **unverified** by the original reviewer, and I cannot positively confirm either against the actual code in this repository: 1. **`gpt-oss:120b-cloud`**: The only `gpt-oss` reference in the repo (`ollama_test.go:250`) uses the regular ollama provider (not ollama-cloud), so it tells us nothing about the correct ollama-cloud tag. No other file in the repo confirms or contradicts `gpt-oss:120b-cloud`. 2. **`IS_DRAFT` / `issue_comment`**: This is a question about Gitea Actions runtime behavior, not something verifiable from the repo's code. The draft itself concedes it cannot verify Gitea's exact behavior. Both are dropped. --- ## Verdict: No material issues found </details> <details><summary>🧯 Error handling & edge cases — Blocking issues found</summary> I've read the actual workflow file and searched the entire `.gitea` directory. The finding is confirmed: - **Line 15** reads `types: [opened, reopened, ready_for_review]` — `synchronize` is indeed absent. - No other trigger (`push`, a separate workflow, etc.) covers the re-review-on-push case. - The `ci.yaml` uses `pull_request: { branches: ["*"] }` without explicit types, which defaults to including `synchronize` — so the pattern is understood and used elsewhere in the repo. - The CLAUDE.md loop explicitly instructs: "Push to a PR… Wait for Gadfly to finish… fix the real ones" — which breaks after the first iteration without `synchronize`. The finding survives. It is the only finding, so the verdict remains as-is. --- ## Verdict: Blocking issues found - **Missing `synchronize` pull_request event type** — `adversarial-review.yml:15`. The workflow triggers on `[opened, reopened, ready_for_review]` but omits `synchronize`, the event fired when new commits are pushed to an existing PR. This means: - After the initial PR open, any subsequent push (amend, rebase, additional commits) silently skips re-review. - The CLAUDE.md loop ("Push to a PR… Wait for Gadfly to finish… fix the real ones") breaks after the first iteration: you fix findings, push, and Gadfly never looks at the new code. - A draft PR marked ready triggers `ready_for_review` and gets reviewed, but if the author then pushes fixes, no re-review occurs. **Fix:** add `synchronize` to the `types` list: ```yaml pull_request: types: [opened, reopened, synchronize, ready_for_review] ``` </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 55s

gitea-actions bot commented

2026-06-27 18:36:32 +00:00

🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

Blocking issues found

.gitea/workflows/adversarial-review.yml:38: The workflow runs on pull_request events from any fork and workflow_dispatch from any actor with read access, exposing secrets such as GITEA_TOKEN, OLLAMA_CLOUD_API_KEY, GADFLY_ENDPOINT_M1, GADFLY_ENDPOINT_M5, GADFLY_FINDINGS_URL, GADFLY_FINDINGS_TOKEN to untrusted parties.

🎯 Correctness — No material issues found

No material issues found

The workflow file and documentation changes were reviewed for correctness. The workflow's conditional logic was verified to correctly restrict secret-bearing runs to trusted users for issue_comment events while allowing pull_request and workflow_dispatch events without restriction. All variable references and secrets are properly scoped. The Docker image reference uses correct syntax. The documentation update accurately describes the adversarial review process without introducing inconsistencies. No logical bugs, incorrect behavior, or semantic errors were found in the changes.

No material issues found

🧯 Error handling & edge cases — No material issues found

No material issues found
Outside my lens: The workflow_dispatch trigger lacks actor restriction, allowing any user with repository read access to potentially trigger the workflow and expose secrets if repository permissions permit workflow_dispatch by readers.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 6m 40s}

### 🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Blocking issues found</summary> Blocking issues found - .gitea/workflows/adversarial-review.yml:38: The workflow runs on pull_request events from any fork and workflow_dispatch from any actor with read access, exposing secrets such as GITEA_TOKEN, OLLAMA_CLOUD_API_KEY, GADFLY_ENDPOINT_M1, GADFLY_ENDPOINT_M5, GADFLY_FINDINGS_URL, GADFLY_FINDINGS_TOKEN to untrusted parties. </details> <details><summary>🎯 Correctness — No material issues found</summary> No material issues found The workflow file and documentation changes were reviewed for correctness. The workflow's conditional logic was verified to correctly restrict secret-bearing runs to trusted users for issue_comment events while allowing pull_request and workflow_dispatch events without restriction. All variable references and secrets are properly scoped. The Docker image reference uses correct syntax. The documentation update accurately describes the adversarial review process without introducing inconsistencies. No logical bugs, incorrect behavior, or semantic errors were found in the changes. </think> No material issues found </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> No material issues found Outside my lens: The workflow_dispatch trigger lacks actor restriction, allowing any user with repository read access to potentially trigger the workflow and expose secrets if repository permissions permit workflow_dispatch by readers. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 6m 40s

gitea-actions bot commented

2026-06-27 18:37:01 +00:00

🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

The added workflow only introduces a new CI job that runs a trusted, pinned Docker image. All secrets (GITEA_TOKEN, OLLAMA_CLOUD_API_KEY, GADFLY_ENDPOINT_*, GADFLY_FINDINGS_*) are passed as environment variables to that container, which is the intended design for the Gadfly reviewer.
The trigger guard for issue_comment correctly limits execution to a hard‑coded allow‑list of trusted actors, preventing unauthorized users from invoking the job and exposing secrets.
No user‑controlled input is used to construct commands, URLs, or configuration strings that could lead to injection, SSRF, or credential leakage.

All security‑related aspects of the change appear sound.

🎯 Correctness — No material issues found

Verdict: No material issues found

The added workflow file is syntactically valid YAML and correctly references GitHub contexts.
All environment variables are quoted, so colons and commas do not break the file.
The if: guard for issue_comment correctly limits runs to allowed users on PR comments and allows normal PR events.
Concurrency group, triggers, and timeout settings are properly formed.
Documentation changes in CLAUDE.md accurately describe the new workflow without altering existing behavior.

No correctness defects were identified in the added files.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

The change only adds a new GitHub Actions workflow YAML and documentation; it does not introduce any program code where error handling or edge‑case logic could be faulty.
All referenced environment variables are passed through directly to the container; missing secrets will simply result in the container failing to contact the corresponding endpoint, which is already documented as non‑blocking and harmless to the overall workflow.
The concurrency group expression ${{ github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number }} always resolves to a value for the supported events (pull request, issue comment on a PR, or workflow dispatch with a required input), so no empty group name or nil reference can occur.
No new Go (or other) code is added, so there are no new error‑return paths, defers, or panic‑prone sections to evaluate.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 44s}

### 🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> **Verdict: No material issues found** - The added workflow only introduces a new CI job that runs a trusted, pinned Docker image. All secrets (`GITEA_TOKEN`, `OLLAMA_CLOUD_API_KEY`, `GADFLY_ENDPOINT_*`, `GADFLY_FINDINGS_*`) are passed as environment variables to that container, which is the intended design for the Gadfly reviewer. - The trigger guard for `issue_comment` correctly limits execution to a hard‑coded allow‑list of trusted actors, preventing unauthorized users from invoking the job and exposing secrets. - No user‑controlled input is used to construct commands, URLs, or configuration strings that could lead to injection, SSRF, or credential leakage. All security‑related aspects of the change appear sound. </details> <details><summary>🎯 Correctness — No material issues found</summary> **Verdict: No material issues found** * The added workflow file is syntactically valid YAML and correctly references GitHub contexts. * All environment variables are quoted, so colons and commas do not break the file. * The `if:` guard for `issue_comment` correctly limits runs to allowed users on PR comments and allows normal PR events. * Concurrency group, triggers, and timeout settings are properly formed. * Documentation changes in `CLAUDE.md` accurately describe the new workflow without altering existing behavior. No correctness defects were identified in the added files. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **Verdict: No material issues found** - The change only adds a new GitHub Actions workflow YAML and documentation; it does not introduce any program code where error handling or edge‑case logic could be faulty. - All referenced environment variables are passed through directly to the container; missing secrets will simply result in the container failing to contact the corresponding endpoint, which is already documented as non‑blocking and harmless to the overall workflow. - The concurrency group expression `${{ github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number }}` always resolves to a value for the supported events (pull request, issue comment on a PR, or workflow dispatch with a required input), so no empty group name or nil reference can occur. - No new Go (or other) code is added, so there are no new error‑return paths, defers, or panic‑prone sections to evaluate. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 44s

gitea-actions bot commented

2026-06-27 18:37:46 +00:00

🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

The changes introduce a new Gitea workflow for adversarial code review using Gadfly and update documentation accordingly. From a security perspective:

• The workflow properly restricts execution via an allow-list of trusted users (steve, fizi, dazed) for comment-triggered runs
• Secrets are handled appropriately through Gitea's secrets mechanism and only exposed explicitly where needed
• The workflow uses a pinned Docker image (sha-d7f364d) rather than floating tags, which prevents unexpected image updates
• Permissions are scoped minimally (read for contents, write for issues/PRs)
• Concurrency is properly managed with cancellation to prevent resource exhaustion
• Telemetry reporting is opt-in via secrets and doesn't block the main review process

No security vulnerabilities or misconfigurations were identified in the added workflow or documentation.

🎯 Correctness — No material issues found

No material issues found.

🧯 Error handling & edge cases — No material issues found

No material issues found.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5s}

### 🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> VERDICT: No material issues found The changes introduce a new Gitea workflow for adversarial code review using Gadfly and update documentation accordingly. From a security perspective: • The workflow properly restricts execution via an allow-list of trusted users (`steve`, `fizi`, `dazed`) for comment-triggered runs • Secrets are handled appropriately through Gitea's secrets mechanism and only exposed explicitly where needed • The workflow uses a pinned Docker image (`sha-d7f364d`) rather than floating tags, which prevents unexpected image updates • Permissions are scoped minimally (read for contents, write for issues/PRs) • Concurrency is properly managed with cancellation to prevent resource exhaustion • Telemetry reporting is opt-in via secrets and doesn't block the main review process No security vulnerabilities or misconfigurations were identified in the added workflow or documentation. </details> <details><summary>🎯 Correctness — No material issues found</summary> No material issues found. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> No material issues found. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5s

gitea-actions bot commented

2026-06-27 18:37:51 +00:00

🪰 Gadfly review — `gemma4:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

I have reviewed the changes through the Security lens. The primary security surface in this PR is the introduction of a GitHub/Gitea Action that handles secrets and can be triggered by PR comments.

Trigger Validation: The workflow includes a robust if guard at the job level (lines 42-47) that restricts issue_comment triggers to a specific allow-list of trusted users (steve, fizi, dazed). This prevents untrusted contributors from triggering the workflow via comments to potentially leak secrets or exhaust resources.
Secret Handling: Secrets (GITEA_TOKEN, OLLAMA_CLOUD_API_KEY, etc.) are passed as environment variables to the container. This is the standard and secure way to handle secrets in Gitea Actions/GitHub Actions.
Input Sanitization: The COMMENT_BODY is passed to the container (line 91). While the workflow itself doesn't process this data, the security of the system depends on the gadfly container's internal handling of this input (e.g., preventing prompt injection or command injection). Since this is a standalone external image pinned by SHA (sha-d7f364d), and the workflow is restricted to trusted users, the risk is mitigated.
Permissions: The workflow uses a restrictive permissions block (lines 26-28), requesting only the minimum necessary access (contents: read, issues: write, pull-requests: write).

No security vulnerabilities were identified in the workflow configuration.

🎯 Correctness — No material issues found

VERDICT: No material issues found

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

I have reviewed the .gitea/workflows/adversarial-review.yml and CLAUDE.md changes specifically through the lens of Error handling & edge cases.

Workflow Trigger Edge Cases: The concurrency group and the if guard for issue_comment correctly handle the ambiguity of event contexts (PR vs Issue vs Dispatch). The use of || in the concurrency group key ensures that regardless of how the review is triggered, it maps to a stable identifier for that PR, preventing race conditions or leaked runs.
Resource/Timeout Handling: A generous timeout-minutes: 90 is set, and the container itself is configured with GADFLY_TIMEOUT_SECS: "600" and GADFLY_MAX_STEPS: "14". This provides a layered defense against "hanging" models or infinite agentic loops, preventing the Gitea runner from being exhausted indefinitely.
External Dependency Failure: The PR description and comments explicitly acknowledge the edge case of the M1/M5 Macs being offline. The design ("an offline box only fails its own lane") indicates that the system is designed for partial failure (graceful degradation) rather than total failure.
Input Validation: The workflow_dispatch input pr_number is marked as required: true, reducing the risk of null-input panics during manual triggers.
Permissions: The workflow requests only the minimum necessary permissions (contents: read, issues: write, pull-requests: write), limiting the "blast radius" if the agentic tool were to be compromised or malfunction.

Nothing in my area.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 18s}

### 🪰 Gadfly review — `gemma4:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> VERDICT: No material issues found I have reviewed the changes through the **Security** lens. The primary security surface in this PR is the introduction of a GitHub/Gitea Action that handles secrets and can be triggered by PR comments. - **Trigger Validation**: The workflow includes a robust `if` guard at the job level (lines 42-47) that restricts `issue_comment` triggers to a specific allow-list of trusted users (`steve`, `fizi`, `dazed`). This prevents untrusted contributors from triggering the workflow via comments to potentially leak secrets or exhaust resources. - **Secret Handling**: Secrets (`GITEA_TOKEN`, `OLLAMA_CLOUD_API_KEY`, etc.) are passed as environment variables to the container. This is the standard and secure way to handle secrets in Gitea Actions/GitHub Actions. - **Input Sanitization**: The `COMMENT_BODY` is passed to the container (line 91). While the workflow itself doesn't process this data, the security of the system depends on the `gadfly` container's internal handling of this input (e.g., preventing prompt injection or command injection). Since this is a standalone external image pinned by SHA (`sha-d7f364d`), and the workflow is restricted to trusted users, the risk is mitigated. - **Permissions**: The workflow uses a restrictive `permissions` block (lines 26-28), requesting only the minimum necessary access (`contents: read`, `issues: write`, `pull-requests: write`). No security vulnerabilities were identified in the workflow configuration. </details> <details><summary>🎯 Correctness — No material issues found</summary> VERDICT: No material issues found </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> VERDICT: No material issues found I have reviewed the `.gitea/workflows/adversarial-review.yml` and `CLAUDE.md` changes specifically through the lens of **Error handling & edge cases**. * **Workflow Trigger Edge Cases:** The `concurrency` group and the `if` guard for `issue_comment` correctly handle the ambiguity of event contexts (PR vs Issue vs Dispatch). The use of `||` in the concurrency group key ensures that regardless of how the review is triggered, it maps to a stable identifier for that PR, preventing race conditions or leaked runs. * **Resource/Timeout Handling:** A generous `timeout-minutes: 90` is set, and the container itself is configured with `GADFLY_TIMEOUT_SECS: "600"` and `GADFLY_MAX_STEPS: "14"`. This provides a layered defense against "hanging" models or infinite agentic loops, preventing the Gitea runner from being exhausted indefinitely. * **External Dependency Failure:** The PR description and comments explicitly acknowledge the edge case of the M1/M5 Macs being offline. The design ("an offline box only fails its own lane") indicates that the system is designed for partial failure (graceful degradation) rather than total failure. * **Input Validation:** The `workflow_dispatch` input `pr_number` is marked as `required: true`, reducing the risk of null-input panics during manual triggers. * **Permissions:** The workflow requests only the minimum necessary permissions (`contents: read`, `issues: write`, `pull-requests: write`), limiting the "blast radius" if the agentic tool were to be compromised or malfunction. Nothing in my area. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 18s

steve added 1 commit 2026-06-27 18:52:13 +00:00

ci(gadfly): drop the M1 Mac from the review swarm

CI / Build & Test (pull_request) Successful in 10m33s

Details

CI / Tidy (pull_request) Successful in 9m26s

Details

43eb155759

M1 was consistently slow (26-29 min) for zero real findings, so pull it before
this workflow ever fires. Leaves the 9 ollama-cloud models + the M5 Mac;
removes GADFLY_ENDPOINT_M1 and the m1 concurrency entry. Mirrors the same change
on executus.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve merged commit 38b4e1a028 into main

2026-06-27 19:10:54 +00:00

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: steve/majordomo#2

ci: add Gadfly adversarial PR reviewer + document the review loop #2

What's here

Notes

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Draft review

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

VERDICT: Minor issues

Findings (security lens)

Correctness Review

Findings

What I checked and found clean

VERDICT: No material issues found

🪰 Gadfly review — m1/qwen3:14b (m1)

🪰 Gadfly review — glm-5.1:cloud (ollama-cloud)

🪰 Gadfly review — m5/qwen3.6:35b-mlx (m5)

Review Summary

Finding 1 (Unverified): Inconsistent model ID suffix for ollama-cloud entries in GADFLY_MODELS

Findings through "Error handling & edge cases" lens

No material issues found

🪰 Gadfly review — kimi-k2.7-code:cloud (ollama-cloud)

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: No material issues found

Verdict: No material issues found

Verdict: Blocking issues found

🪰 Gadfly review — nemotron-3-super:cloud (ollama-cloud)

🪰 Gadfly review — gpt-oss:120b-cloud (ollama-cloud)

🪰 Gadfly review — qwen3-coder:480b-cloud (ollama-cloud)

🪰 Gadfly review — gemma4:cloud (ollama-cloud)

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

🪰 Gadfly review — `m1/qwen3:14b` (m1)

🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud)

🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5)

Finding 1 (Unverified): Inconsistent model ID suffix for ollama-cloud entries in `GADFLY_MODELS`

🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud)

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud)

🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud)

🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud)

🪰 Gadfly review — `gemma4:cloud` (ollama-cloud)