steve/gadfly

feat: live status-board comment — per-model/per-lens review progress #1

Merged

steve merged 4 commits from feat/status-board into main

2026-06-27 19:00:13 +00:00

Author	SHA1	Message	Date
steve	a1b0691a1e	fix: fold in gadfly's own review findings (3 real bugs) Build & push image / build-and-push (pull_request) Successful in 9s Details The dogfood swarm reviewed PR #1; folding in the warranted findings (graded via the gadfly MCP — 18 real / 18 false-positive across the 4 completed reviewers): - entrypoint.sh: finalize a never-written status file when run.sh skips the binary (empty diff / no key / missing binary). The pre-seed stayed {started:0, done:false}, so the board showed that model "waiting to start" forever and the N/N counter never completed — breaking the board's own "tell when everything is finished" invariant. (glm-5.2, correctness — the strongest finding.) - main.go: recover() in the per-lens goroutine. A panic previously crashed the whole binary (killing every other lens's output) and left the lens stuck "running" on the board. Now it's recorded as an errored result and the lens is marked finished. (glm-5.2 + minimax-m3.) - status-board.sh: coerce a non-numeric GADFLY_STATUS_POLL_SECS back to 12. Under `set -uo pipefail` a bad `sleep "$POLL"` failed silently and the loop spun, hammering the Gitea API. (glm-5.2, error-handling.) The remaining real findings (sanitizer collision, page-10 pagination, markdown-injection via PR-controlled lens names, cosmetic blank line) were graded trivial and left as-is — documented in the finding notes. gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:56:41 -04:00
steve	0e5e0ff089	ci: drop M1 from gadfly's dogfood swarm (keep M5) Build & push image / build-and-push (pull_request) Successful in 4s Details The M1 Pro lane is too slow / low-signal for reviewing gadfly's own PRs, so remove it from the dogfood fleet: out of GADFLY_MODELS, out of GADFLY_PROVIDER_CONCURRENCY, and its GADFLY_ENDPOINT_M1 mapping dropped. M5 stays. (mort still runs both Macs.) Fleet is now 9 cloud + M5. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:39:31 -04:00
steve	48af34f4ca	ci: dogfood the FULL fleet — 9 cloud + 2 Macs, matching mort Build & push image / build-and-push (pull_request) Successful in 5s Details The dogfood workflow had a truncated 5-model list (3 cloud + the 2 Macs) and was missing GADFLY_PROVIDER_LENS_CONCURRENCY. Restore mort's full fleet so gadfly reviews its own PRs with the same 11 reviewers and the model-quality scoreboard is comparable across both repos: 9 cloud: minimax-m3, glm-5.2, glm-5.1, kimi-k2.7-code, deepseek-v4-pro, nemotron-3-super, gpt-oss:120b, qwen3-coder:480b, gemma4 2 local: m1/qwen3:14b, m5/qwen3.6:35b-mlx GADFLY_MODELS / _CONCURRENCY / _LENS_CONCURRENCY now match mort's adversarial-review.yml verbatim. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:31:07 -04:00
steve	1cdda32dbc	feat: live status-board comment — per-model/per-lens review progress Build & push image / build-and-push (pull_request) Successful in 6s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 30m1s Details Phase 3 of the gadfly-games build. With several models × several lenses reviewing a PR, all you'd see mid-run is a row of "⏳ Reviewing…" placeholders. Add ONE consolidated, live-updating status-board comment that aggregates every model's per-lens progress (queued → running → finished + verdict), so progress is visible at a glance and a watcher can tell when the whole swarm is done. - cmd/gadfly: opt-in statusWriter (GADFLY_STATUS_FILE) publishes this model's lenses to a JSON file, written atomically (temp+rename) as runSpecialists transitions each lens. Inert when unset — plain runs and tests are unaffected. - scripts/status-board.sh: background renderer that polls the status dir and upserts one marker comment every GADFLY_STATUS_POLL_SECS (default 12s), caching the comment id to PATCH in place. Advisory and best-effort; the per-model findings comments are untouched. - entrypoint.sh: pre-seeds every model as queued, launches the board, waits only on the review lanes, then signals .done for a final render. Default on; disable with GADFLY_STATUS_BOARD=0. - Docs: README config table + "Live status board" section, example stub note, CLAUDE.md architecture map. gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:18:28 -04:00