Phase 1: a second review engine alongside the majordomo agent loop. For
each lens, shell out to the Claude Code CLI (`claude -p --output-format
json`) inside the checked-out repo so it verifies findings with its own
read tools, then reuse gadfly's verdict-parse + recheck + consolidate +
emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`;
auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY;
read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node +
@anthropic-ai/claude-code. Also bumped the dogfood pin to the status-board
image (PR #2 was the first dogfood with the live board + full fleet).
Folded in the swarm's own review findings: minimal subprocess env (no
GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr),
process-group cleanup on timeout, rune-safe error truncation, and
engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings
graded via the gadfly MCP.
gofmt clean, go vet quiet, go build + go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Phase 3: one consolidated, live-updating PR comment aggregating every
model's per-lens progress (queued -> running -> finished + verdict), so
the swarm's progress is visible at a glance and a watcher can tell when
it's done. Opt-in statusWriter in the binary (atomic writes) + a
background status-board.sh renderer wired through entrypoint.sh; default
on, GADFLY_STATUS_BOARD=0 to disable.
Also restores gadfly's dogfood swarm to the full cloud fleet (9 cloud +
M5; M1 dropped as too slow) matching mort, and folds in the 3 real bugs
the swarm found on its own PR (skip-binary stuck-waiting, panic-stuck
lens, busy-loop on bad poll interval). All 36 findings graded via the
gadfly MCP (18 real / 18 false-positive).
gofmt clean, go vet quiet, go build + go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Five fixes, several surfaced by the live bake-off:
- PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist
(own context), not shared across the suite. A slow model (e.g. a 35B local MLX)
was exhausting the whole 600s budget on lens 1, leaving the rest "step 0:
context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go.
- ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean".
Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses
errored"); the section reads "⚠️ could not complete". consolidate.go.
- PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the
spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default
(was wrongly "ollama-cloud" for local models). scripts/run.sh.
- LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens
stays in its lane, says "nothing in my area" rather than re-reporting another
lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive-
constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go.
- RUN TIMING: run.sh posts a "⏳ Reviewing…" placeholder at model start and updates
it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hardcoded ollama.Cloud binding with majordomo's provider registry,
so Gadfly can target any backend majordomo supports without code changes.
- cmd/gadfly/model.go: resolveModel() — GADFLY_PROVIDER (default ollama-cloud)
prefixes bare model ids; GADFLY_MODEL may be a full provider/model spec, alias,
or failover chain (verbatim). GADFLY_BASE_URL constructs openai/ollama/anthropic/
google directly at a custom endpoint (OpenAI-compatible + local/remote Ollama).
GADFLY_API_KEY else the provider's standard env var. + buildSpec unit tests.
- run.sh: provider-aware key gate (local Ollama needs none); maps OLLAMA_CLOUD_API_KEY
-> OLLAMA_API_KEY; provider/base-url/key inherited by the binary. Gadfly-branded comment.
- entrypoint.sh: GADFLY_MODELS alias for OLLAMA_REVIEW_MODELS; provider passthrough.
- examples + README: Models & providers section. Upfront: only the Ollama paths
(local + OpenAI-compatible-against-Ollama) are tested; OpenAI/Anthropic/Google
are wired via majordomo but UNTESTED (no spend).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in
Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/
find_files/get_diff), verifies findings before reporting, two-pass review +
adversarial recheck, posts one labeled comment per model. Advisory only.
- cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo
- entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML)
- Dockerfile: multi-stage; build-time module token never reaches the final image
- .gitea/workflows/build-image.yml: tag v* → build & push image
- examples/: ~15-line consumer stub
- system prompt genericized + hardened to re-derive constants/formulas (semantic bugs)
Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>