feat: live status-board comment — per-model/per-lens review progress #1

Merged
steve merged 4 commits from feat/status-board into main 2026-06-27 19:00:13 +00:00
Owner

Phase 3 of the gadfly-games build.

When several models (each with several lenses) review a PR, mid-run all you'd see is a row of ⏳ Reviewing… placeholders across N per-model comments. This adds one consolidated, live-updating status-board comment that aggregates every model's per-lens progress (queued → running → finished + verdict), so the swarm's progress is visible at a glance — and a watcher (human or agent) can tell when everything is finished.

What it looks like

## 🪰 Gadfly — live review status
1/3 reviewers finished · updated 2026-06-27 18:14:56Z

#### `glm-5.2:cloud` · ollama-cloud — ⏳ 2/4 lenses
- ✅ security — No material issues found
- 🔄 correctness — running
- ⏸️ performance — queued

How it works

  • cmd/gadfly/status.go — opt-in statusWriter (gated on GADFLY_STATUS_FILE) publishes this model's lenses to a JSON file, written atomically (temp + rename) as runSpecialists transitions each lens. Inert when unset, so plain runs and unit tests are unaffected.
  • scripts/status-board.sh — background renderer: polls the status dir, renders + upserts one marker comment every GADFLY_STATUS_POLL_SECS (default 12s), caching the comment id to PATCH in place rather than re-searching each tick.
  • entrypoint.sh — pre-seeds every model as queued (so the full swarm shows from t=0, even models waiting on a lane cap), launches the board, waits only on the review lanes, then signals .done for one final render and reaps the board.

Invariants

  • Advisory / best-effort. The board never touches the binary's stdout or exit code; a failed render/post is logged and retried next tick. The per-model findings comments are unchanged and entirely separate.
  • Default on, disable with GADFLY_STATUS_BOARD=0.

Docs

README config table + new "Live status board" section, example-stub note, CLAUDE.md architecture map.

Tests

gofmt -l clean · go vet quiet · go build ok · go test -race ./... green (new status_test.go covers the lifecycle, the disabled no-op, and unknown-lens handling). status-board.sh render + upsert verified locally under bash with a stubbed curl.

🤖 Generated with Claude Code

**Phase 3** of the gadfly-games build. When several models (each with several lenses) review a PR, mid-run all you'd see is a row of `⏳ Reviewing…` placeholders across N per-model comments. This adds **one consolidated, live-updating status-board comment** that aggregates every model's per-lens progress (`queued → running → finished + verdict`), so the swarm's progress is visible at a glance — and a watcher (human or agent) can tell when everything is finished. ### What it looks like ``` ## 🪰 Gadfly — live review status 1/3 reviewers finished · updated 2026-06-27 18:14:56Z #### `glm-5.2:cloud` · ollama-cloud — ⏳ 2/4 lenses - ✅ security — No material issues found - 🔄 correctness — running - ⏸️ performance — queued ``` ### How it works - **`cmd/gadfly/status.go`** — opt-in `statusWriter` (gated on `GADFLY_STATUS_FILE`) publishes this model's lenses to a JSON file, written **atomically** (temp + rename) as `runSpecialists` transitions each lens. Inert when unset, so plain runs and unit tests are unaffected. - **`scripts/status-board.sh`** — background renderer: polls the status dir, renders + upserts one marker comment every `GADFLY_STATUS_POLL_SECS` (default 12s), caching the comment id to PATCH in place rather than re-searching each tick. - **`entrypoint.sh`** — pre-seeds every model as `queued` (so the full swarm shows from t=0, even models waiting on a lane cap), launches the board, waits **only** on the review lanes, then signals `.done` for one final render and reaps the board. ### Invariants - **Advisory / best-effort.** The board never touches the binary's stdout or exit code; a failed render/post is logged and retried next tick. The per-model **findings** comments are unchanged and entirely separate. - **Default on**, disable with `GADFLY_STATUS_BOARD=0`. ### Docs README config table + new "Live status board" section, example-stub note, CLAUDE.md architecture map. ### Tests `gofmt -l` clean · `go vet` quiet · `go build` ok · `go test -race ./...` green (new `status_test.go` covers the lifecycle, the disabled no-op, and unknown-lens handling). `status-board.sh` render + upsert verified locally under bash with a stubbed curl. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 1 commit 2026-06-27 18:18:49 +00:00
feat: live status-board comment — per-model/per-lens review progress
Build & push image / build-and-push (pull_request) Successful in 6s
Adversarial Review (Gadfly) / review (pull_request) Successful in 30m1s
1cdda32dbc
Phase 3 of the gadfly-games build. With several models × several lenses
reviewing a PR, all you'd see mid-run is a row of " Reviewing…"
placeholders. Add ONE consolidated, live-updating status-board comment
that aggregates every model's per-lens progress (queued → running →
finished + verdict), so progress is visible at a glance and a watcher
can tell when the whole swarm is done.

- cmd/gadfly: opt-in statusWriter (GADFLY_STATUS_FILE) publishes this
  model's lenses to a JSON file, written atomically (temp+rename) as
  runSpecialists transitions each lens. Inert when unset — plain runs
  and tests are unaffected.
- scripts/status-board.sh: background renderer that polls the status
  dir and upserts one marker comment every GADFLY_STATUS_POLL_SECS
  (default 12s), caching the comment id to PATCH in place. Advisory and
  best-effort; the per-model findings comments are untouched.
- entrypoint.sh: pre-seeds every model as queued, launches the board,
  waits only on the review lanes, then signals .done for a final render.
  Default on; disable with GADFLY_STATUS_BOARD=0.
- Docs: README config table + "Live status board" section, example
  stub note, CLAUDE.md architecture map.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly review — m1/qwen3:14b (m1)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

The implementation introduces a live status board for Gadfly reviews with the following security considerations addressed:

  1. Environment Variable Handling
    The GITEA_TOKEN is used with proper access controls (repo write access), and the script uses curl with the Authorization header securely. No direct exposure of secrets in the code or logs.

  2. Input Sanitization
    The status_file_for function in entrypoint.sh sanitizes model IDs by replacing non-alphanumeric characters with underscores, preventing path traversal or injection attacks.

  3. Atomic Writes
    The statusWriter in status.go uses atomic writes (temp + rename) to ensure file integrity and prevent partial writes that could be exploited.

  4. JSON Handling
    The render_section function in status-board.sh gracefully handles malformed JSON files by skipping them, and the statusWriter ensures only defined lenses are updated.

  5. Access Control
    The status board is advisory and does not interfere with the actual review process. It uses a separate comment from per-model findings, ensuring no impact on the binary's stdout or exit code.

No security vulnerabilities were found in the code changes.

🎯 Correctness — No material issues found

Verdict: No material issues found

The implementation correctly tracks per-model/per-lens progress through atomic JSON file updates and aggregation. Key correctness aspects verified:

  1. Lens state transitions (queued → running → finished) are properly modeled in modelStatus and lensStatus structs, with tests confirming state persistence and mutation rules.

  2. Atomic writes in statusWriter.flush() use temp files + rename to prevent partial writes, ensuring the status board never reads inconsistent data.

  3. Unknown lens handling in set() safely ignores unrecognized lens names without modifying the status file, as validated by TestStatusWriterUnknownLens.

  4. Model pre-seeding in entrypoint.sh initializes all models as queued, ensuring the status board shows full swarm progress from t=0.

  5. Error resilience in flush() correctly swallows I/O errors (as required by the "advisory/best-effort" invariant).

  6. Status aggregation in status-board.sh accurately combines per-model JSON files into a consolidated view, with proper handling of edge cases like zero lenses or incomplete data.

All logic aligns with the PR's stated invariants and requirements.

🧯 Error handling & edge cases — Blocking issues found

Blocking issues found

  • cmd/gadfly/status.go:67flush() swallows errors when creating the temp file directory. If GADFLY_STATUS_FILE points to a non-existent directory, os.CreateTemp() will fail silently, preventing status updates without any feedback. Verified by reading the code.

  • scripts/status-board.sh:73render_section() silently skips malformed JSON files. If a model's status file is invalid (e.g., due to concurrent writes), the board will omit that model's progress without logging, leading to incomplete status updates. Verified by reading the code.

  • scripts/status-board.sh:113upsert() does not check HTTP response codes from the Gitea API. If the comment post/patch fails (e.g., invalid token, rate limit), the script will retry blindly without knowing the root cause. Verified by reading the code.

  • entrypoint.sh:173status_file_for() may generate excessively long filenames if the model name contains many non-alphanumeric characters. While not a critical failure, it could lead to filesystem path length limitations. Verified by reading the code.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 21m 52s

<!-- gadfly-review:ollama:m1/qwen3:14b --> ### 🪰 Gadfly review — `m1/qwen3:14b` (m1) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> No material issues found The implementation introduces a live status board for Gadfly reviews with the following security considerations addressed: 1. **Environment Variable Handling** The `GITEA_TOKEN` is used with proper access controls (repo write access), and the script uses `curl` with the `Authorization` header securely. No direct exposure of secrets in the code or logs. 2. **Input Sanitization** The `status_file_for` function in `entrypoint.sh` sanitizes model IDs by replacing non-alphanumeric characters with underscores, preventing path traversal or injection attacks. 3. **Atomic Writes** The `statusWriter` in `status.go` uses atomic writes (temp + rename) to ensure file integrity and prevent partial writes that could be exploited. 4. **JSON Handling** The `render_section` function in `status-board.sh` gracefully handles malformed JSON files by skipping them, and the `statusWriter` ensures only defined lenses are updated. 5. **Access Control** The status board is advisory and does not interfere with the actual review process. It uses a separate comment from per-model findings, ensuring no impact on the binary's stdout or exit code. No security vulnerabilities were found in the code changes. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> Verdict: No material issues found The implementation correctly tracks per-model/per-lens progress through atomic JSON file updates and aggregation. Key correctness aspects verified: 1. **Lens state transitions** (queued → running → finished) are properly modeled in `modelStatus` and `lensStatus` structs, with tests confirming state persistence and mutation rules. 2. **Atomic writes** in `statusWriter.flush()` use temp files + rename to prevent partial writes, ensuring the status board never reads inconsistent data. 3. **Unknown lens handling** in `set()` safely ignores unrecognized lens names without modifying the status file, as validated by `TestStatusWriterUnknownLens`. 4. **Model pre-seeding** in `entrypoint.sh` initializes all models as queued, ensuring the status board shows full swarm progress from t=0. 5. **Error resilience** in `flush()` correctly swallows I/O errors (as required by the "advisory/best-effort" invariant). 6. **Status aggregation** in `status-board.sh` accurately combines per-model JSON files into a consolidated view, with proper handling of edge cases like zero lenses or incomplete data. All logic aligns with the PR's stated invariants and requirements. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Blocking issues found</summary> Blocking issues found - `cmd/gadfly/status.go:67` — `flush()` swallows errors when creating the temp file directory. If `GADFLY_STATUS_FILE` points to a non-existent directory, `os.CreateTemp()` will fail silently, preventing status updates without any feedback. Verified by reading the code. - `scripts/status-board.sh:73` — `render_section()` silently skips malformed JSON files. If a model's status file is invalid (e.g., due to concurrent writes), the board will omit that model's progress without logging, leading to incomplete status updates. Verified by reading the code. - `scripts/status-board.sh:113` — `upsert()` does not check HTTP response codes from the Gitea API. If the comment post/patch fails (e.g., invalid token, rate limit), the script will retry blindly without knowing the root cause. Verified by reading the code. - `entrypoint.sh:173` — `status_file_for()` may generate excessively long filenames if the model name contains many non-alphanumeric characters. While not a critical failure, it could lead to filesystem path length limitations. Verified by reading the code. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 21m 52s</sub>

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues

  • scripts/status-board.sh:33 / :98-99 — the board comment marker is a static, nonce-less literal, so comment discovery is bound to the PR only, not to the current run. find_existing paginates issues/${PR}/comments and selects the first comment whose body starts with <!-- gadfly-status-board --> (line 33/98-99); the marker contains no run/commit/repo nonce. The upsert path then uses the write-scoped GITEA_TOKEN to PATCH that comment (line 114). Impact: on a PR that previously ran Gadfly, a new run reuses/overwrites the stale board comment rather than creating a fresh one (this is at least intentional for restart recovery per the comment at lines 90-91), and — more relevantly — any comment in that PR whose body begins with that exact public literal would be PATCHed by the bot using its repo-write token. This is a minor authorization over-reach: the token is used to modify a comment the current run did not create, keyed solely on a guessable prefix. By contrast, run.sh:44's per-model marker <!-- gadfly-review:${PROVIDER}:${MODEL} --> at least binds to model identity; the board marker is fully static. Suggested fix: include a per-run nonce (e.g. commit SHA or a generated run id) in MARKER, e.g. <!-- gadfly-status-board:${GADFLY_RUN_ID} -->, and only PATCH comments whose body starts with the run-specific marker. No authn/authz gap in the token usage itself (it is used with the intended repo-write scope for comment CRUD, exactly as run.sh already does); no injection or secret-leak path (URL path components are operator env, the token is sent only as an Authorization header and never embedded in the posted body or logged).
🎯 Correctness — Minor issues

Both findings are confirmed against the actual code.

Finding 1 verified: run.sh has three paths that exit/skip before the binary runs: empty diff (exit 0 at line 103), missing Ollama Cloud key (lines 145-146 set REVIEW and skip), and binary not found (lines 148-150). The entrypoint pre-seeds each model at lines 222-226 as {done:false, lenses:[]}. Only the binary's statusWriter (status.go) ever sets done:true. status-board.sh line 77 increments the "finished" counter only when .done == true, and line 58 renders lenses:[] + done:false as " waiting to start". The final render (lines 127-129, one last upsert after .done sentinel) would perpetually show such a model as waiting and the header N/M would never reach M/M.

Finding 2 verified: entrypoint.sh:162 tr -c '[:alnum:]._-' '_' maps / and : to _, so m1/qwen3:14bm1_qwen3_14b.json, colliding with a model id m1_qwen3_14b. Real but low-probability.

VERDICT: Minor issues

  • entrypoint.sh:222-226 / scripts/run.sh:103,145-146,148-150 — pre-seeded status file is never marked done when a model is skipped before the binary runs. The entrypoint pre-seeds every model as {lenses:[], done:false}. The board's "finished" counter (status-board.sh:77) only increments for files with .done == true, which is only ever set by the binary's statusWriter.set. But run.sh has several paths that never invoke the binary: empty diff (exit 0 at run.sh:103), missing Ollama Cloud key (run.sh:145-146 → REVIEW="⚠️ …" and skips the binary), and binary not found (run.sh:148-150). In all of these the model's pre-seeded file stays {lenses:[], done:false} forever. Consequence: the final board render shows that model perpetually as " waiting to start" (status-board.sh:58) and the header N/M reviewers finished can never reach M/M — directly breaking the PR's stated invariant that "a watcher can tell when everything is finished." Suggested fix: have run.sh (or review_one) write a terminal status (done:true, or a single errored lens) when the binary is skipped, or have the board treat a stale lenses:[] file as finished/skipped on the final render.

  • entrypoint.sh:162status_file_for sanitization can collide for distinct model ids. tr -c '[:alnum:]._-' '_' maps every non-alnum/._- char to _, so e.g. m1/qwen3:14b and a hypothetical m1_qwen3_14b both become m1_qwen3_14b.json, and one model's status file silently overwrites the other's. Unlikely with real model ids, but it's a real correctness hazard; the board would then show only one of the two models. Consider including a short hash of the original id in the filename to guarantee uniqueness.

🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

  • scripts/status-board.sh:31,130 — busy-loop on invalid GADFLY_STATUS_POLL_SECS. POLL="${GADFLY_STATUS_POLL_SECS:-12}" (line 31) is never validated. The script uses set -uo pipefail with no set -e (line 24), so if a user sets GADFLY_STATUS_POLL_SECS="" or a non-numeric value (e.g. "12s"), sleep "$POLL" (line 130) fails and the error is silently swallowed. The while : loop (line 127) then spins with no delay, calling upsert "$(render_body)" (line 128, 2+ curl hits to the Gitea API) as fast as it can — an accidental API-hammering tight loop. Verified: there is no numeric check anywhere between line 31 and the sleep at line 130. Suggested fix: coerce to a positive int, e.g. [[ "$POLL" =~ ^[0-9]+$ ]] && [ "$POLL" -ge 1 ] || POLL=12.

  • entrypoint.sh:162 — status-file name collisions from over-aggressive sanitization. status_file_for() flattens the model id with tr -c '[:alnum:]._-' '_', so two distinct models can map to the same filename (e.g. m1/qwen:14b and m1_qwen:14b both → m1_qwen_14b.json). Two model processes would then overwrite each other's status file, and the board would render a single flickering section instead of two. The JSON carries the real model/provider, but uniqueness is the documented contract (the comment at lines 158-160 states "this just needs to be unique per model"). Low severity for realistic model lists, but the edge case is real on unusual input. Suggested fix: hash the raw id into the filename (e.g. ..._$(printf %s "$1" | md5sum | cut -c1-8).json) while keeping a human-readable prefix.

  • cmd/gadfly/main.go:236-240 — a panicked lens is never marked finished. sw.set(sp.Name, lensRunning, ...) (line 236) is called before reviewWithSpecialist, and sw.set(sp.Name, lensFinished, ...) (line 240) only runs after it returns. The goroutine's defers (lines 234-235) only do wg.Done/sem drain — there is no recover, and grepping cmd/gadfly confirms none exists. If the specialist pass panics, the goroutine crashes the whole binary (pre-existing behavior) and that model's status file is left stuck in the running state — the board's final render shows it as never done. Since the process dies this is largely moot for that run, but it is an unhappy path the diff introduces into the status surface. Low severity; noting for completeness.

No issues found with the atomic-write/cleanup discipline in flush() or the .done final-render sequencing — those unhappy paths are handled correctly.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 6m 21s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> VERDICT: Minor issues - **`scripts/status-board.sh:33` / `:98-99` — the board comment marker is a static, nonce-less literal, so comment discovery is bound to the PR only, not to the current run.** `find_existing` paginates `issues/${PR}/comments` and selects the first comment whose `body` starts with `<!-- gadfly-status-board -->` (line 33/98-99); the marker contains no run/commit/repo nonce. The `upsert` path then uses the write-scoped `GITEA_TOKEN` to PATCH that comment (line 114). Impact: on a PR that previously ran Gadfly, a new run reuses/overwrites the stale board comment rather than creating a fresh one (this is at least intentional for restart recovery per the comment at lines 90-91), and — more relevantly — any comment in that PR whose body begins with that exact public literal would be PATCHed by the bot using its repo-write token. This is a minor authorization over-reach: the token is used to modify a comment the current run did not create, keyed solely on a guessable prefix. By contrast, `run.sh:44`'s per-model marker `<!-- gadfly-review:${PROVIDER}:${MODEL} -->` at least binds to model identity; the board marker is fully static. Suggested fix: include a per-run nonce (e.g. commit SHA or a generated run id) in `MARKER`, e.g. `<!-- gadfly-status-board:${GADFLY_RUN_ID} -->`, and only PATCH comments whose body starts with the run-specific marker. No authn/authz gap in the token usage itself (it is used with the intended repo-write scope for comment CRUD, exactly as `run.sh` already does); no injection or secret-leak path (URL path components are operator env, the token is sent only as an `Authorization` header and never embedded in the posted body or logged). </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> Both findings are confirmed against the actual code. **Finding 1 verified:** `run.sh` has three paths that exit/skip before the binary runs: empty diff (`exit 0` at line 103), missing Ollama Cloud key (lines 145-146 set `REVIEW` and skip), and binary not found (lines 148-150). The entrypoint pre-seeds each model at lines 222-226 as `{done:false, lenses:[]}`. Only the binary's `statusWriter` (status.go) ever sets `done:true`. `status-board.sh` line 77 increments the "finished" counter only when `.done == true`, and line 58 renders `lenses:[]` + `done:false` as "⏳ waiting to start". The final render (lines 127-129, one last upsert after `.done` sentinel) would perpetually show such a model as waiting and the header `N/M` would never reach `M/M`. **Finding 2 verified:** `entrypoint.sh:162` `tr -c '[:alnum:]._-' '_'` maps `/` and `:` to `_`, so `m1/qwen3:14b` → `m1_qwen3_14b.json`, colliding with a model id `m1_qwen3_14b`. Real but low-probability. **VERDICT: Minor issues** - **`entrypoint.sh:222-226` / `scripts/run.sh:103,145-146,148-150` — pre-seeded status file is never marked done when a model is skipped before the binary runs.** The entrypoint pre-seeds every model as `{lenses:[], done:false}`. The board's "finished" counter (`status-board.sh:77`) only increments for files with `.done == true`, which is only ever set by the binary's `statusWriter.set`. But `run.sh` has several paths that never invoke the binary: empty diff (`exit 0` at run.sh:103), missing Ollama Cloud key (run.sh:145-146 → `REVIEW="⚠️ …"` and skips the binary), and binary not found (run.sh:148-150). In all of these the model's pre-seeded file stays `{lenses:[], done:false}` forever. Consequence: the final board render shows that model perpetually as "⏳ waiting to start" (`status-board.sh:58`) and the header `N/M reviewers finished` can never reach `M/M` — directly breaking the PR's stated invariant that "a watcher can tell when everything is finished." Suggested fix: have `run.sh` (or `review_one`) write a terminal status (`done:true`, or a single errored lens) when the binary is skipped, or have the board treat a stale `lenses:[]` file as finished/skipped on the final render. - **`entrypoint.sh:162` — `status_file_for` sanitization can collide for distinct model ids.** `tr -c '[:alnum:]._-' '_'` maps every non-alnum/`._-` char to `_`, so e.g. `m1/qwen3:14b` and a hypothetical `m1_qwen3_14b` both become `m1_qwen3_14b.json`, and one model's status file silently overwrites the other's. Unlikely with real model ids, but it's a real correctness hazard; the board would then show only one of the two models. Consider including a short hash of the original id in the filename to guarantee uniqueness. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> **VERDICT: Minor issues** - **`scripts/status-board.sh:31,130` — busy-loop on invalid `GADFLY_STATUS_POLL_SECS`.** `POLL="${GADFLY_STATUS_POLL_SECS:-12}"` (line 31) is never validated. The script uses `set -uo pipefail` with **no `set -e`** (line 24), so if a user sets `GADFLY_STATUS_POLL_SECS=""` or a non-numeric value (e.g. `"12s"`), `sleep "$POLL"` (line 130) fails and the error is silently swallowed. The `while :` loop (line 127) then spins with no delay, calling `upsert "$(render_body)"` (line 128, 2+ `curl` hits to the Gitea API) as fast as it can — an accidental API-hammering tight loop. Verified: there is no numeric check anywhere between line 31 and the `sleep` at line 130. Suggested fix: coerce to a positive int, e.g. `[[ "$POLL" =~ ^[0-9]+$ ]] && [ "$POLL" -ge 1 ] || POLL=12`. - **`entrypoint.sh:162` — status-file name collisions from over-aggressive sanitization.** `status_file_for()` flattens the model id with `tr -c '[:alnum:]._-' '_'`, so two distinct models can map to the same filename (e.g. `m1/qwen:14b` and `m1_qwen:14b` both → `m1_qwen_14b.json`). Two model processes would then overwrite each other's status file, and the board would render a single flickering section instead of two. The JSON carries the real model/provider, but uniqueness is the documented contract (the comment at lines 158-160 states "this just needs to be unique per model"). Low severity for realistic model lists, but the edge case is real on unusual input. Suggested fix: hash the raw id into the filename (e.g. `..._$(printf %s "$1" | md5sum | cut -c1-8).json`) while keeping a human-readable prefix. - **`cmd/gadfly/main.go:236-240` — a panicked lens is never marked finished.** `sw.set(sp.Name, lensRunning, ...)` (line 236) is called before `reviewWithSpecialist`, and `sw.set(sp.Name, lensFinished, ...)` (line 240) only runs after it returns. The goroutine's defers (lines 234-235) only do `wg.Done`/sem drain — there is no `recover`, and grepping `cmd/gadfly` confirms none exists. If the specialist pass panics, the goroutine crashes the whole binary (pre-existing behavior) and that model's status file is left stuck in the `running` state — the board's final render shows it as never `done`. Since the process dies this is largely moot for that run, but it is an unhappy path the diff introduces into the status surface. Low severity; noting for completeness. No issues found with the atomic-write/cleanup discipline in `flush()` or the `.done` final-render sequencing — those unhappy paths are handled correctly. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 6m 21s</sub>

🪰 Gadfly review — m5/qwen3.6:35b-mlx (m5)

Review incomplete — all lenses errored — 3 reviewers: security, correctness, error-handling

🔒 Security⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 10: context deadline exceeded

🎯 Correctness⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 3: context deadline exceeded

🧯 Error handling & edge cases⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 9: context deadline exceeded

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 30m 0s

<!-- gadfly-review:ollama:m5/qwen3.6:35b-mlx --> ### 🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5) **Review incomplete — all lenses errored** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 10: context deadline exceeded </details> <details><summary><b>🎯 Correctness</b> — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 3: context deadline exceeded </details> <details><summary><b>🧯 Error handling & edge cases</b> — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 9: context deadline exceeded </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 30m 0s</sub>

🪰 Gadfly review — deepseek-v4-flash:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

🔒 Security — Verdict: No material issues found

After tracing the data flow end-to-end, I find no material security vulnerabilities introduced by this change. Here's what I verified:

Status file content is tightly controlled. The verdict strings written to the JSON come from verdict.label() (consolidate.go:18-29), which returns one of four fixed strings ("No material issues found", "Minor issues", "Blocking issues found", "Reviewed"). Lens names come from the configured Specialist.Name list. Model/provider names come from environment variables set by entrypoint.sh. No user-supplied or model-generated content flows into the status file unchecked.

Atomic writes prevent partial-read races. status.go:111-130 uses os.CreateTemp + os.Rename within the same directory, so the board script (status-board.sh) never reads a half-written file.

API calls are injection-safe. The upsert function in status-board.sh:109-124 encodes the comment body through jq -n --arg b "$body" '{body:$b}', which properly JSON-escapes the content before sending it to the Gitea API. The BOARD_ID and PR values used in URL paths come from the API response (numeric ID) and a required env var respectively — no injection vector.

File paths are sanitized. entrypoint.sh:162 uses tr -c '[:alnum:]._-' '_' on the model name before constructing the status file path, preventing any path-traversal attempt through model names.

No new secret exposure. The GITEA_TOKEN is used in curl commands with -H "Authorization: token ${GITEA_TOKEN}", the same pattern already used throughout run.sh and entrypoint.sh. The status file itself contains only metadata (model name, provider, lens names, verdicts) — no credentials or tokens.

The board is properly gated. GADFLY_STATUS_BOARD=0 disables it entirely (entrypoint.sh:217), and the statusWriter is inert when GADFLY_STATUS_FILE is empty (status.go:76-78), so plain runs and unit tests are unaffected.

The only observation worth noting (not a finding) is that the GITEA_TOKEN is visible in the process list via curl command arguments — but this is the same pattern used throughout the existing codebase (run.sh, entrypoint.sh) and is standard practice in single-container environments. No change here.

🎯 Correctness — No material issues found

🎯 Correctness — Verdict: No material issues found

After tracing the full data flow through the Go binary, the shell scripts, and the JSON file format, I find no logic bugs, incorrect constants, or semantic errors introduced by this change.

What I verified:

  • The statusWriter atomic-write pattern (temp file + rename) is correct and safe for concurrent goroutines.
  • The set() method's done computation correctly requires all lenses to be in lensFinished state.
  • The modelStatus JSON field tags (json:"name", json:"state", etc.) match the field names used by the jq filter in render_section().
  • The verdict.label() output (e.g. "No material issues found") is correctly passed to sw.set() as the verdict string.
  • The pre-seed files (with lenses:[], done:false) are properly overwritten by the binary's newStatusWriter, and the board's render_section handles the empty-lenses case with $tot == 0.
  • The provider_of() in entrypoint.sh and modelProvider() in Go produce the same provider string for a given model id, so the JSON provider field is consistent between pre-seed and binary-written files.
  • The GADFLY_STATUS_FILE env var flows correctly: entrypoint.shreview_one()run.sh → binary's os.Getenv("GADFLY_STATUS_FILE").
  • The GADFLY_MODEL env var used by newStatusWriter matches the model id used by status_file_for() to derive the filename, so the binary writes to the same path the board reads from.
  • The render_body printf format string has the correct number of %s/%d placeholders (5) matching the 5 arguments.
  • The render_section jq filter correctly handles errored being null (via // false) when omitempty omits it.

Minor cosmetic note (not a correctness issue):
scripts/status-board.sh:80 — The $'\n\n' separator appended to each section, combined with the \n\n before %s in the printf format string on line 86, produces an extra blank line between model sections in the rendered comment. This is purely cosmetic and does not affect the board's function.

🧯 Error handling & edge cases — No material issues found

🧯 Error handling & edge cases — Verdict: No material issues found

After reading the full source files (status.go, status_test.go, main.go, consolidate.go, specialists.go, model.go) and the shell scripts, the code is well-defended for its advisory/best-effort contract. No blocking or material error-handling issues exist in this lens.

What was checked and why it's clean:

  • Nil/empty receiverset() guards w == nil || w.path == "" at the top; the disabled-writer path is explicitly tested (TestStatusWriterDisabled).
  • Atomic writesflush() uses os.CreateTemp + os.Rename within the same directory, so the board reader never sees a partial file even under concurrent lens goroutines. Temp files are cleaned up on every error path.
  • Mutex protectionset() holds w.mu across the entire read-modify-write of w.st, preventing data races between concurrent lens goroutines.
  • Unknown lens namesset() silently ignores them (tested in TestStatusWriterUnknownLens); no panic, no append, no corruption.
  • Board script resiliencerender_section pipes stderr to /dev/null so a malformed/half-written JSON file (shouldn't happen due to atomic writes, but defensive) simply skips that model for one tick. upsert clears BOARD_ID on PATCH failure so the next tick re-discovers or re-creates the comment.
  • Missing jq — the board script checks for jq and exits 0 immediately; the entrypoint's pre-seed loop uses 2>/dev/null || true, so a missing jq silently disables the board rather than crashing.
  • Empty provider list / no modelsLANE_PIDS is guarded with [ "${#LANE_PIDS[@]}" -gt 0 ] before wait, and the board's render_body handles total == 0 with a fallback message.
  • Temp file leak (minor, acknowledged) — if the Go process crashes between os.CreateTemp and os.Rename/os.Remove, a .status-*.tmp file is left behind. These are in the status directory and have a .tmp extension so the board ignores them, but they could accumulate across crashes. This is consistent with the advisory/best-effort design and is not a functional bug.

Nothing in this lens requires a code change.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 10m 32s

<!-- gadfly-review:ollama:deepseek-v4-flash:cloud --> ### 🪰 Gadfly review — `deepseek-v4-flash:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## 🔒 Security — Verdict: No material issues found After tracing the data flow end-to-end, I find no material security vulnerabilities introduced by this change. Here's what I verified: **Status file content is tightly controlled.** The verdict strings written to the JSON come from `verdict.label()` (`consolidate.go:18-29`), which returns one of four fixed strings (`"No material issues found"`, `"Minor issues"`, `"Blocking issues found"`, `"Reviewed"`). Lens names come from the configured `Specialist.Name` list. Model/provider names come from environment variables set by `entrypoint.sh`. No user-supplied or model-generated content flows into the status file unchecked. **Atomic writes prevent partial-read races.** `status.go:111-130` uses `os.CreateTemp` + `os.Rename` within the same directory, so the board script (`status-board.sh`) never reads a half-written file. **API calls are injection-safe.** The `upsert` function in `status-board.sh:109-124` encodes the comment body through `jq -n --arg b "$body" '{body:$b}'`, which properly JSON-escapes the content before sending it to the Gitea API. The `BOARD_ID` and `PR` values used in URL paths come from the API response (numeric ID) and a required env var respectively — no injection vector. **File paths are sanitized.** `entrypoint.sh:162` uses `tr -c '[:alnum:]._-' '_'` on the model name before constructing the status file path, preventing any path-traversal attempt through model names. **No new secret exposure.** The `GITEA_TOKEN` is used in `curl` commands with `-H "Authorization: token ${GITEA_TOKEN}"`, the same pattern already used throughout `run.sh` and `entrypoint.sh`. The status file itself contains only metadata (model name, provider, lens names, verdicts) — no credentials or tokens. **The board is properly gated.** `GADFLY_STATUS_BOARD=0` disables it entirely (`entrypoint.sh:217`), and the `statusWriter` is inert when `GADFLY_STATUS_FILE` is empty (`status.go:76-78`), so plain runs and unit tests are unaffected. The only observation worth noting (not a finding) is that the `GITEA_TOKEN` is visible in the process list via `curl` command arguments — but this is the same pattern used throughout the existing codebase (`run.sh`, `entrypoint.sh`) and is standard practice in single-container environments. No change here. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> ## 🎯 Correctness — Verdict: No material issues found After tracing the full data flow through the Go binary, the shell scripts, and the JSON file format, I find no logic bugs, incorrect constants, or semantic errors introduced by this change. **What I verified:** - The `statusWriter` atomic-write pattern (temp file + rename) is correct and safe for concurrent goroutines. - The `set()` method's `done` computation correctly requires *all* lenses to be in `lensFinished` state. - The `modelStatus` JSON field tags (`json:"name"`, `json:"state"`, etc.) match the field names used by the `jq` filter in `render_section()`. - The `verdict.label()` output (e.g. `"No material issues found"`) is correctly passed to `sw.set()` as the verdict string. - The pre-seed files (with `lenses:[]`, `done:false`) are properly overwritten by the binary's `newStatusWriter`, and the board's `render_section` handles the empty-lenses case with `$tot == 0`. - The `provider_of()` in `entrypoint.sh` and `modelProvider()` in Go produce the same provider string for a given model id, so the JSON `provider` field is consistent between pre-seed and binary-written files. - The `GADFLY_STATUS_FILE` env var flows correctly: `entrypoint.sh` → `review_one()` → `run.sh` → binary's `os.Getenv("GADFLY_STATUS_FILE")`. - The `GADFLY_MODEL` env var used by `newStatusWriter` matches the model id used by `status_file_for()` to derive the filename, so the binary writes to the same path the board reads from. - The `render_body` `printf` format string has the correct number of `%s`/`%d` placeholders (5) matching the 5 arguments. - The `render_section` jq filter correctly handles `errored` being `null` (via `// false`) when `omitempty` omits it. **Minor cosmetic note (not a correctness issue):** `scripts/status-board.sh:80` — The `$'\n\n'` separator appended to each section, combined with the `\n\n` before `%s` in the `printf` format string on line 86, produces an extra blank line between model sections in the rendered comment. This is purely cosmetic and does not affect the board's function. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> ## 🧯 Error handling & edge cases — Verdict: **No material issues found** After reading the full source files (`status.go`, `status_test.go`, `main.go`, `consolidate.go`, `specialists.go`, `model.go`) and the shell scripts, the code is well-defended for its advisory/best-effort contract. No blocking or material error-handling issues exist in this lens. **What was checked and why it's clean:** - **Nil/empty receiver** — `set()` guards `w == nil || w.path == ""` at the top; the disabled-writer path is explicitly tested (`TestStatusWriterDisabled`). - **Atomic writes** — `flush()` uses `os.CreateTemp` + `os.Rename` within the same directory, so the board reader never sees a partial file even under concurrent lens goroutines. Temp files are cleaned up on every error path. - **Mutex protection** — `set()` holds `w.mu` across the entire read-modify-write of `w.st`, preventing data races between concurrent lens goroutines. - **Unknown lens names** — `set()` silently ignores them (tested in `TestStatusWriterUnknownLens`); no panic, no append, no corruption. - **Board script resilience** — `render_section` pipes stderr to `/dev/null` so a malformed/half-written JSON file (shouldn't happen due to atomic writes, but defensive) simply skips that model for one tick. `upsert` clears `BOARD_ID` on PATCH failure so the next tick re-discovers or re-creates the comment. - **Missing `jq`** — the board script checks for `jq` and exits 0 immediately; the entrypoint's pre-seed loop uses `2>/dev/null || true`, so a missing `jq` silently disables the board rather than crashing. - **Empty provider list / no models** — `LANE_PIDS` is guarded with `[ "${#LANE_PIDS[@]}" -gt 0 ]` before `wait`, and the board's `render_body` handles `total == 0` with a fallback message. - **Temp file leak (minor, acknowledged)** — if the Go process crashes between `os.CreateTemp` and `os.Rename`/`os.Remove`, a `.status-*.tmp` file is left behind. These are in the status directory and have a `.tmp` extension so the board ignores them, but they could accumulate across crashes. This is consistent with the advisory/best-effort design and is not a functional bug. Nothing in this lens requires a code change. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 10m 32s</sub>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

I traced the security-relevant trust boundaries for this PR and verified them against the checked-out code:

  • Authn/authzscripts/status-board.sh:95,114,120 reuses GITEA_TOKEN to POST/PATCH exactly one comment under a hardcoded marker (<!-- gadfly-status-board -->). The token is already required to have repo write access for the existing run.sh flow; the board adds no new privilege. Verified by reading scripts/status-board.sh and scripts/run.sh.
  • Token leakagesay() at scripts/status-board.sh:37 echoes only its own prefix + message to stderr, and the only runtime messages I see ("starting…", "patch of comment ${BOARD_ID} failed", "jq not found") contain no token material. GITEA_TOKEN is never expanded into say's $*. Verified.
  • Path traversal / command injectionentrypoint.sh:162 status_file_for pipes the model id through tr -c '[:alnum:]._-' '_' before concatenation with STATUS_DIR/, so / and : (which appear in real model ids like m1/qwen3:14b) are stripped to _ and the result can never escape the directory. Verified.
  • JSON / shell injection — All dynamic values fed into jq use --arg/-n: entrypoint.sh:223 (--arg model, --arg provider), scripts/status-board.sh:44,77,98,111. Nothing is interpolated into a jq program string. Verified.
  • URL injection${GITEA_API} and ${PR} are operator-controlled (set by the consumer's stub workflow / entrypoint), not user-controlled, so there's no SSRF surface added by this PR. Verified by reading entrypoint.sh:26-29.
  • Atomic write hygienecmd/gadfly/status.go:112-129 uses os.CreateTemp in the same dir + os.Rename, with os.Remove(tmpName) on every error path. No leftover tmp files, no half-written readers. Verified.
  • Insecure defaultsGADFLY_STATUS_BOARD defaults to 1 (on), but the gate at entrypoint.sh:217 and the board script both check it explicitly, and the README documents the opt-out. The advisory nature of the board means even a misrender can't affect review correctness — only PR comment content, which is already trusted to operator-controlled env.

One tiny non-security note I happened to spot while reading the file (and so will mention only as a courtesy, not a lens finding): the tr -c sanitizer is one-way and theoretically allows two distinct model ids (e.g. a/b and a:b) to map to the same STATUS_DIR/*.json path. The JSON inside carries the real model/provider so the comment content stays correct — only the on-disk write races. Not security-relevant.

🎯 Correctness — Minor issues

VERDICT: Minor issues

  • cmd/gadfly/main.go:233-241 — a panicked lens is silently stuck as "🔄 running" forever. The goroutine has defer wg.Done() and defer func(){<-sem}, but no recover. If reviewWithSpecialist panics, the deferred calls run and wg.Wait() returns, but sw.set(...finished...) never executes; the lens stays at state:running in the JSON. Because Done is only recomputed inside set, the model keeps reporting ⏳ 0/N lenses on the board, and the overall "X/Y reviewers finished" counter never increments for it. The binary itself also exits without flipping the lens — the same status file is what the board keeps re-rendering after the entrypoint touches ${STATUS_DIR}/.done. This is the exact failure mode the status board exists to surface, and it is the one it misrepresents most. Fix: defer func(){ if r := recover(); r != nil { sw.set(sp.Name, lensFinished, "panic: "+fmt.Sprint(r), true) } }() after defer wg.Done() (and ideally let the panic still propagate).

  • entrypoint.sh:162 — filename sanitizer can collide for distinct model ids. tr -c '[:alnum:]._-' '_' maps both / and : to _, so m1/qwen3:14b and m1:qwen3/14b (or any pair differing only in those positions) both produce ${STATUS_DIR}/m1_qwen3_14b.json. The second model's binary then overwrites the first's status file mid-run, and the board's "X/Y reviewers finished" counter and per-lens bullets silently drop a reviewer. The very example in the adjacent comment (m1/qwen3:14b) demonstrates the collision. Fix: keep / as a path separator (STATUS_DIR/<provider>/<model>.json) or append a short hash of the original id to disambiguate.

🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

  • cmd/gadfly/main.go:233 — panicking lens goroutine leaves the lens stuck at "running" forever. Inside runSpecialists, the goroutine sets lensRunning before calling reviewWithSpecialist and lensFinished after; there is no defer recover(). A panic in any goroutine terminates the entire Go process, so the binary exits non‑zero, run.sh's || log "model $1 failed (continuing)" swallows the failure, and the status file is left with that lens running and Done=false. The board's final render (after entrypoint.sh touches .done) shows the model as still in progress even though the review crashed. A defer recover() that calls sw.set(sp.Name, lensFinished, "panicked", true) would close the loop. No explicit panic(...) exists in the reviewed code, so this is rare in practice — but agent.Run/majordomo could panic, and the gap is real.

  • entrypoint.sh:162status_file_for sanitization is lossy and can collide. tr -c '[:alnum:]._-' '_' collapses both / and _ (and :, etc.) into _, so distinct model ids like m1/qwen3:14b and m1_qwen3_14b map to the same ${STATUS_DIR}/m1_qwen3_14b.json. The pre-seed loop overwrites the first entry's metadata with the second's, and whichever binary flushes last wins the shared path. The board then silently drops one model. The comment in entrypoint.sh:158-160 claims "just needs to be unique per model" while the transformation isn't injective. Realistic collisions are rare with the default model list, but not impossible (the lane logic supports user-supplied MODELS). Fix: hash the model id (printf '%s' "$1" | sha1sum | cut -c1-16) or use a delimiter-preserving substitution.

  • scripts/status-board.sh:65render_section interpolates .name into Markdown without escaping. Lens names are interpolated verbatim inside **…** (and .model/.provider directly into the header). Built-in lens names are hard-coded and safe, but Specialist.Name is also sourced from .gadfly.yml in the repo under review (cmd/gadfly/specialists.go:104,109) — i.e. PR-controlled. A PR that adds a .gadfly.yml defining a specialist whose name contains **, backticks, or ](http://…) would break the bold span or smuggle a link into the board comment. Same applies to .model/.provider (e.g. m1/qwen3:14b, fine in practice, but the principle holds). jq could pipe these through gsub("[\\\\]|\*|\]\("; " ")` (or similar) before interpolation.

  • scripts/status-board.sh:92find_existing gives up after page 10 (500 comments). Past 500 PR comments, find_existing returns empty, the upsert falls into the POST branch, and a second board comment is created. Within a single invocation BOARD_ID is cached after POST, so further ticks PATCH the new comment — duplicates do not multiply per tick — but on the next container restart (or any path that clears BOARD_ID) the script will POST yet another comment, fragmenting the board across runs. Marker-based dedup is broken past that boundary. Probably fine for real PRs, but a continue past 500 is at odds with "one consolidated comment." Consider raising the cap, paging with a creation-time window, or accepting an explicit BOARD_COMMENT_ID env var.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 14s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## VERDICT: No material issues found I traced the security-relevant trust boundaries for this PR and verified them against the checked-out code: - **Authn/authz** — `scripts/status-board.sh:95,114,120` reuses `GITEA_TOKEN` to POST/PATCH exactly one comment under a hardcoded marker (`<!-- gadfly-status-board -->`). The token is already required to have repo write access for the existing `run.sh` flow; the board adds no new privilege. Verified by reading `scripts/status-board.sh` and `scripts/run.sh`. - **Token leakage** — `say()` at `scripts/status-board.sh:37` echoes only its own prefix + message to stderr, and the only runtime messages I see (`"starting…"`, `"patch of comment ${BOARD_ID} failed"`, `"jq not found"`) contain no token material. `GITEA_TOKEN` is never expanded into `say`'s `$*`. Verified. - **Path traversal / command injection** — `entrypoint.sh:162` `status_file_for` pipes the model id through `tr -c '[:alnum:]._-' '_'` before concatenation with `STATUS_DIR/`, so `/` and `:` (which appear in real model ids like `m1/qwen3:14b`) are stripped to `_` and the result can never escape the directory. Verified. - **JSON / shell injection** — All dynamic values fed into `jq` use `--arg`/`-n`: `entrypoint.sh:223` (`--arg model`, `--arg provider`), `scripts/status-board.sh:44,77,98,111`. Nothing is interpolated into a `jq` program string. Verified. - **URL injection** — `${GITEA_API}` and `${PR}` are operator-controlled (set by the consumer's stub workflow / entrypoint), not user-controlled, so there's no SSRF surface added by this PR. Verified by reading `entrypoint.sh:26-29`. - **Atomic write hygiene** — `cmd/gadfly/status.go:112-129` uses `os.CreateTemp` in the same dir + `os.Rename`, with `os.Remove(tmpName)` on every error path. No leftover tmp files, no half-written readers. Verified. - **Insecure defaults** — `GADFLY_STATUS_BOARD` defaults to `1` (on), but the gate at `entrypoint.sh:217` and the board script both check it explicitly, and the README documents the opt-out. The advisory nature of the board means even a misrender can't affect review correctness — only PR comment content, which is already trusted to operator-controlled env. One tiny non-security note I happened to spot while reading the file (and so will mention only as a courtesy, not a lens finding): the `tr -c` sanitizer is one-way and theoretically allows two distinct model ids (e.g. `a/b` and `a:b`) to map to the same `STATUS_DIR/*.json` path. The JSON inside carries the real model/provider so the comment content stays correct — only the on-disk write races. Not security-relevant. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> ## VERDICT: Minor issues - **`cmd/gadfly/main.go:233-241` — a panicked lens is silently stuck as "🔄 running" forever.** The goroutine has `defer wg.Done()` and `defer func(){<-sem}`, but no `recover`. If `reviewWithSpecialist` panics, the deferred calls run and `wg.Wait()` returns, but `sw.set(...finished...)` never executes; the lens stays at `state:running` in the JSON. Because `Done` is only recomputed inside `set`, the model keeps reporting `⏳ 0/N lenses` on the board, and the overall "X/Y reviewers finished" counter never increments for it. The binary itself also exits without flipping the lens — the same status file is what the board keeps re-rendering after the entrypoint touches `${STATUS_DIR}/.done`. This is the exact failure mode the status board exists to surface, and it is the one it misrepresents most. Fix: `defer func(){ if r := recover(); r != nil { sw.set(sp.Name, lensFinished, "panic: "+fmt.Sprint(r), true) } }()` after `defer wg.Done()` (and ideally let the panic still propagate). - **`entrypoint.sh:162` — filename sanitizer can collide for distinct model ids.** `tr -c '[:alnum:]._-' '_'` maps both `/` and `:` to `_`, so `m1/qwen3:14b` and `m1:qwen3/14b` (or any pair differing only in those positions) both produce `${STATUS_DIR}/m1_qwen3_14b.json`. The second model's binary then overwrites the first's status file mid-run, and the board's "X/Y reviewers finished" counter and per-lens bullets silently drop a reviewer. The very example in the adjacent comment (`m1/qwen3:14b`) demonstrates the collision. Fix: keep `/` as a path separator (`STATUS_DIR/<provider>/<model>.json`) or append a short hash of the original id to disambiguate. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> **VERDICT: Minor issues** - **`cmd/gadfly/main.go:233` — panicking lens goroutine leaves the lens stuck at "running" forever.** Inside `runSpecialists`, the goroutine sets `lensRunning` before calling `reviewWithSpecialist` and `lensFinished` after; there is no `defer recover()`. A panic in any goroutine terminates the entire Go process, so the binary exits non‑zero, `run.sh`'s `|| log "model $1 failed (continuing)"` swallows the failure, and the status file is left with that lens `running` and `Done=false`. The board's final render (after `entrypoint.sh` touches `.done`) shows the model as still in progress even though the review crashed. A `defer recover()` that calls `sw.set(sp.Name, lensFinished, "panicked", true)` would close the loop. No explicit `panic(...)` exists in the reviewed code, so this is rare in practice — but `agent.Run`/majordomo could panic, and the gap is real. - **`entrypoint.sh:162` — `status_file_for` sanitization is lossy and can collide.** `tr -c '[:alnum:]._-' '_'` collapses *both* `/` and `_` (and `:`, etc.) into `_`, so distinct model ids like `m1/qwen3:14b` and `m1_qwen3_14b` map to the same `${STATUS_DIR}/m1_qwen3_14b.json`. The pre-seed loop overwrites the first entry's metadata with the second's, and whichever binary flushes last wins the shared path. The board then silently drops one model. The comment in `entrypoint.sh:158-160` claims "just needs to be unique per model" while the transformation isn't injective. Realistic collisions are rare with the default model list, but not impossible (the lane logic supports user-supplied `MODELS`). Fix: hash the model id (`printf '%s' "$1" | sha1sum | cut -c1-16`) or use a delimiter-preserving substitution. - **`scripts/status-board.sh:65` — `render_section` interpolates `.name` into Markdown without escaping.** Lens names are interpolated verbatim inside `**…**` (and `.model`/`.provider` directly into the header). Built-in lens names are hard-coded and safe, but `Specialist.Name` is also sourced from `.gadfly.yml` in the repo under review (`cmd/gadfly/specialists.go:104,109`) — i.e. PR-controlled. A PR that adds a `.gadfly.yml` defining a specialist whose name contains `**`, backticks, or `](http://…)` would break the bold span or smuggle a link into the board comment. Same applies to `.model`/`.provider` (e.g. `m1/qwen3:14b`, fine in practice, but the principle holds). jq could pipe these through `gsub("[\\\\`]|\\*|\\]\\("; " ")` (or similar) before interpolation. - **`scripts/status-board.sh:92` — `find_existing` gives up after page 10 (500 comments).** Past 500 PR comments, `find_existing` returns empty, the upsert falls into the POST branch, and a second board comment is created. Within a single invocation `BOARD_ID` is cached after POST, so further ticks PATCH the new comment — duplicates do not multiply per tick — but on the next container restart (or any path that clears `BOARD_ID`) the script will POST yet another comment, fragmenting the board across runs. Marker-based dedup is broken past that boundary. Probably fine for real PRs, but a `continue` past 500 is at odds with "one consolidated comment." Consider raising the cap, paging with a creation-time window, or accepting an explicit `BOARD_COMMENT_ID` env var. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 14s</sub>
steve added 1 commit 2026-06-27 18:31:09 +00:00
ci: dogfood the FULL fleet — 9 cloud + 2 Macs, matching mort
Build & push image / build-and-push (pull_request) Successful in 5s
48af34f4ca
The dogfood workflow had a truncated 5-model list (3 cloud + the 2 Macs)
and was missing GADFLY_PROVIDER_LENS_CONCURRENCY. Restore mort's full
fleet so gadfly reviews its own PRs with the same 11 reviewers and the
model-quality scoreboard is comparable across both repos:

  9 cloud: minimax-m3, glm-5.2, glm-5.1, kimi-k2.7-code, deepseek-v4-pro,
           nemotron-3-super, gpt-oss:120b, qwen3-coder:480b, gemma4
  2 local: m1/qwen3:14b, m5/qwen3.6:35b-mlx

GADFLY_MODELS / *_CONCURRENCY / *_LENS_CONCURRENCY now match mort's
adversarial-review.yml verbatim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve added 1 commit 2026-06-27 18:39:34 +00:00
ci: drop M1 from gadfly's dogfood swarm (keep M5)
Build & push image / build-and-push (pull_request) Successful in 4s
0e5e0ff089
The M1 Pro lane is too slow / low-signal for reviewing gadfly's own
PRs, so remove it from the dogfood fleet: out of GADFLY_MODELS, out of
GADFLY_PROVIDER_CONCURRENCY, and its GADFLY_ENDPOINT_M1 mapping dropped.
M5 stays. (mort still runs both Macs.) Fleet is now 9 cloud + M5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve added 1 commit 2026-06-27 18:56:44 +00:00
fix: fold in gadfly's own review findings (3 real bugs)
Build & push image / build-and-push (pull_request) Successful in 9s
a1b0691a1e
The dogfood swarm reviewed PR #1; folding in the warranted findings
(graded via the gadfly MCP — 18 real / 18 false-positive across the 4
completed reviewers):

- entrypoint.sh: finalize a never-written status file when run.sh skips
  the binary (empty diff / no key / missing binary). The pre-seed stayed
  {started:0, done:false}, so the board showed that model "waiting to
  start" forever and the N/N counter never completed — breaking the
  board's own "tell when everything is finished" invariant.
  (glm-5.2, correctness — the strongest finding.)
- main.go: recover() in the per-lens goroutine. A panic previously
  crashed the whole binary (killing every other lens's output) and left
  the lens stuck "running" on the board. Now it's recorded as an errored
  result and the lens is marked finished. (glm-5.2 + minimax-m3.)
- status-board.sh: coerce a non-numeric GADFLY_STATUS_POLL_SECS back to
  12. Under `set -uo pipefail` a bad `sleep "$POLL"` failed silently and
  the loop spun, hammering the Gitea API. (glm-5.2, error-handling.)

The remaining real findings (sanitizer collision, page-10 pagination,
markdown-injection via PR-controlled lens names, cosmetic blank line)
were graded trivial and left as-is — documented in the finding notes.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve merged commit c3d09d3bd4 into main 2026-06-27 19:00:13 +00:00
steve deleted branch feat/status-board 2026-06-27 19:00:13 +00:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/gadfly#1