feat: live status-board comment — per-model/per-lens review progress #1

Merged
steve merged 4 commits from feat/status-board into main 2026-06-27 19:00:13 +00:00

4 Commits

Author SHA1 Message Date
steve a1b0691a1e fix: fold in gadfly's own review findings (3 real bugs)
Build & push image / build-and-push (pull_request) Successful in 9s
The dogfood swarm reviewed PR #1; folding in the warranted findings
(graded via the gadfly MCP — 18 real / 18 false-positive across the 4
completed reviewers):

- entrypoint.sh: finalize a never-written status file when run.sh skips
  the binary (empty diff / no key / missing binary). The pre-seed stayed
  {started:0, done:false}, so the board showed that model "waiting to
  start" forever and the N/N counter never completed — breaking the
  board's own "tell when everything is finished" invariant.
  (glm-5.2, correctness — the strongest finding.)
- main.go: recover() in the per-lens goroutine. A panic previously
  crashed the whole binary (killing every other lens's output) and left
  the lens stuck "running" on the board. Now it's recorded as an errored
  result and the lens is marked finished. (glm-5.2 + minimax-m3.)
- status-board.sh: coerce a non-numeric GADFLY_STATUS_POLL_SECS back to
  12. Under `set -uo pipefail` a bad `sleep "$POLL"` failed silently and
  the loop spun, hammering the Gitea API. (glm-5.2, error-handling.)

The remaining real findings (sanitizer collision, page-10 pagination,
markdown-injection via PR-controlled lens names, cosmetic blank line)
were graded trivial and left as-is — documented in the finding notes.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:56:41 -04:00
steve 0e5e0ff089 ci: drop M1 from gadfly's dogfood swarm (keep M5)
Build & push image / build-and-push (pull_request) Successful in 4s
The M1 Pro lane is too slow / low-signal for reviewing gadfly's own
PRs, so remove it from the dogfood fleet: out of GADFLY_MODELS, out of
GADFLY_PROVIDER_CONCURRENCY, and its GADFLY_ENDPOINT_M1 mapping dropped.
M5 stays. (mort still runs both Macs.) Fleet is now 9 cloud + M5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:39:31 -04:00
steve 48af34f4ca ci: dogfood the FULL fleet — 9 cloud + 2 Macs, matching mort
Build & push image / build-and-push (pull_request) Successful in 5s
The dogfood workflow had a truncated 5-model list (3 cloud + the 2 Macs)
and was missing GADFLY_PROVIDER_LENS_CONCURRENCY. Restore mort's full
fleet so gadfly reviews its own PRs with the same 11 reviewers and the
model-quality scoreboard is comparable across both repos:

  9 cloud: minimax-m3, glm-5.2, glm-5.1, kimi-k2.7-code, deepseek-v4-pro,
           nemotron-3-super, gpt-oss:120b, qwen3-coder:480b, gemma4
  2 local: m1/qwen3:14b, m5/qwen3.6:35b-mlx

GADFLY_MODELS / *_CONCURRENCY / *_LENS_CONCURRENCY now match mort's
adversarial-review.yml verbatim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:31:07 -04:00
steve 1cdda32dbc feat: live status-board comment — per-model/per-lens review progress
Build & push image / build-and-push (pull_request) Successful in 6s
Adversarial Review (Gadfly) / review (pull_request) Successful in 30m1s
Phase 3 of the gadfly-games build. With several models × several lenses
reviewing a PR, all you'd see mid-run is a row of " Reviewing…"
placeholders. Add ONE consolidated, live-updating status-board comment
that aggregates every model's per-lens progress (queued → running →
finished + verdict), so progress is visible at a glance and a watcher
can tell when the whole swarm is done.

- cmd/gadfly: opt-in statusWriter (GADFLY_STATUS_FILE) publishes this
  model's lenses to a JSON file, written atomically (temp+rename) as
  runSpecialists transitions each lens. Inert when unset — plain runs
  and tests are unaffected.
- scripts/status-board.sh: background renderer that polls the status
  dir and upserts one marker comment every GADFLY_STATUS_POLL_SECS
  (default 12s), caching the comment id to PATCH in place. Advisory and
  best-effort; the per-model findings comments are untouched.
- entrypoint.sh: pre-seeds every model as queued, launches the board,
  waits only on the review lanes, then signals .done for a final render.
  Default on; disable with GADFLY_STATUS_BOARD=0.
- Docs: README config table + "Live status board" section, example
  stub note, CLAUDE.md architecture map.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:18:28 -04:00