The dogfood swarm reviewed PR #1; folding in the warranted findings
(graded via the gadfly MCP — 18 real / 18 false-positive across the 4
completed reviewers):
- entrypoint.sh: finalize a never-written status file when run.sh skips
the binary (empty diff / no key / missing binary). The pre-seed stayed
{started:0, done:false}, so the board showed that model "waiting to
start" forever and the N/N counter never completed — breaking the
board's own "tell when everything is finished" invariant.
(glm-5.2, correctness — the strongest finding.)
- main.go: recover() in the per-lens goroutine. A panic previously
crashed the whole binary (killing every other lens's output) and left
the lens stuck "running" on the board. Now it's recorded as an errored
result and the lens is marked finished. (glm-5.2 + minimax-m3.)
- status-board.sh: coerce a non-numeric GADFLY_STATUS_POLL_SECS back to
12. Under `set -uo pipefail` a bad `sleep "$POLL"` failed silently and
the loop spun, hammering the Gitea API. (glm-5.2, error-handling.)
The remaining real findings (sanitizer collision, page-10 pagination,
markdown-injection via PR-controlled lens names, cosmetic blank line)
were graded trivial and left as-is — documented in the finding notes.
gofmt clean, go vet quiet, go build + go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The M1 Pro lane is too slow / low-signal for reviewing gadfly's own
PRs, so remove it from the dogfood fleet: out of GADFLY_MODELS, out of
GADFLY_PROVIDER_CONCURRENCY, and its GADFLY_ENDPOINT_M1 mapping dropped.
M5 stays. (mort still runs both Macs.) Fleet is now 9 cloud + M5.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The dogfood workflow had a truncated 5-model list (3 cloud + the 2 Macs)
and was missing GADFLY_PROVIDER_LENS_CONCURRENCY. Restore mort's full
fleet so gadfly reviews its own PRs with the same 11 reviewers and the
model-quality scoreboard is comparable across both repos:
9 cloud: minimax-m3, glm-5.2, glm-5.1, kimi-k2.7-code, deepseek-v4-pro,
nemotron-3-super, gpt-oss:120b, qwen3-coder:480b, gemma4
2 local: m1/qwen3:14b, m5/qwen3.6:35b-mlx
GADFLY_MODELS / *_CONCURRENCY / *_LENS_CONCURRENCY now match mort's
adversarial-review.yml verbatim.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 3 of the gadfly-games build. With several models × several lenses
reviewing a PR, all you'd see mid-run is a row of "⏳ Reviewing…"
placeholders. Add ONE consolidated, live-updating status-board comment
that aggregates every model's per-lens progress (queued → running →
finished + verdict), so progress is visible at a glance and a watcher
can tell when the whole swarm is done.
- cmd/gadfly: opt-in statusWriter (GADFLY_STATUS_FILE) publishes this
model's lenses to a JSON file, written atomically (temp+rename) as
runSpecialists transitions each lens. Inert when unset — plain runs
and tests are unaffected.
- scripts/status-board.sh: background renderer that polls the status
dir and upserts one marker comment every GADFLY_STATUS_POLL_SECS
(default 12s), caching the comment id to PATCH in place. Advisory and
best-effort; the per-model findings comments are untouched.
- entrypoint.sh: pre-seeds every model as queued, launches the board,
waits only on the review lanes, then signals .done for a final render.
Default on; disable with GADFLY_STATUS_BOARD=0.
- Docs: README config table + "Live status board" section, example
stub note, CLAUDE.md architecture map.
gofmt clean, go vet quiet, go build + go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the adversarial-review workflow to gadfly itself (copied from mort: 3 cloud + m1/m5 via foreman, findings telemetry, sha-d7f364d). Future gadfly PRs get reviewed by the swarm.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After each review the binary POSTs the run + its heuristically-extracted findings to GADFLY_FINDINGS_URL (off unless set). Advisory: any error only goes to stderr — never touches stdout, the exit code, or the review. stdlib net/http only (no new deps). entrypoint.sh derives GADFLY_REPO/GADFLY_PR and passes through GADFLY_FINDINGS_URL/GADFLY_FINDINGS_TOKEN. Also renames store references from the old 'docket' name to 'gadfly-reports'.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Specialist lenses ran strictly sequentially within a model. Add a
GADFLY_LENS_CONCURRENCY knob (default 1 = unchanged) that overlaps the
independent per-lens review+recheck passes, so a model posts its
consolidated comment as soon as its lenses finish.
Per-provider configurable, mirroring GADFLY_PROVIDER_CONCURRENCY:
GADFLY_PROVIDER_LENS_CONCURRENCY takes a "provider=N,..." map keyed by
the same provider lanes (modelProvider() mirrors entrypoint's provider_of;
providerOverride() mirrors provider_cap). The override wins for the model's
lane, else the scalar default.
runSpecialists fans out via a bounded worker pool, order-preserving
(results written by index) and keeping each lens's own timeout/recheck.
repoFS is immutable + fresh-toolbox-per-pass, so lenses share no mutable
state (verified under -race). Docs/examples updated; dropped a duplicate
GADFLY_TIMEOUT_SECS README row.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Accept "foreman" in both resolveModel (GADFLY_BASE_URL) and endpointProvider
(GADFLY_ENDPOINT_*) switches, mapping to majordomo's ollama.Foreman() preset
(handles foreman's non-streaming/long-poll quirks). Unlike the HTTPS-only
LLM_* foreman:// DSN, the base URL is verbatim, so a plaintext http:// foreman
queue works. Tests + README provider table + endpoint-aliases example updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per a Gadfly self-review finding (kimi-k2.7-code): an issue_comment can start a
secret-bearing run before the in-container allowed-users check. Add a workflow
if: that only lets trusted actors trigger via comment (PR/dispatch already
trusted); keep GADFLY_ALLOWED_USERS as the belt-and-suspenders layer. README
documents it + the default-branch caveat for comment triggers. (Docs/examples
only — paths-ignored, no image rebuild.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
EOF
A section that led with '**Blocking issues**' (no 'found') fell through to
unknown, so the consolidated header wrongly read 'No material issues found'
(seen live on gpt-oss). Now matches 'blocking issue'/'minor issue'/'no material
issue' and picks the earliest-appearing phrase (the lead verdict). + tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A green check on a section reporting blocking issues was misleading; 🎯 signals
accuracy/on-target. Section verdict text already conveys pass/fail.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
entrypoint.sh groups models by provider into lanes that run in PARALLEL; within
a lane at most `cap` models run at once. cap = GADFLY_PROVIDER_CONCURRENCY map
("ollama-cloud=3,m1pro=1") else GADFLY_CONCURRENCY (default 1). So a single
local box stays serial (1 at a time) while cloud models run several at once and
both lanes progress simultaneously. Portable bash (no associative arrays).
Default cap 1 keeps a single-provider pool sequential as before. Pairs with the
per-lens timeout so a slow lane can't starve others. Docs: README Concurrency
section + config table; CLAUDE.md lessons incl. the docker://:latest cache gotcha.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Five fixes, several surfaced by the live bake-off:
- PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist
(own context), not shared across the suite. A slow model (e.g. a 35B local MLX)
was exhausting the whole 600s budget on lens 1, leaving the rest "step 0:
context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go.
- ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean".
Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses
errored"); the section reads "⚠️ could not complete". consolidate.go.
- PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the
spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default
(was wrongly "ollama-cloud" for local models). scripts/run.sh.
- LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens
stays in its lane, says "nothing in my area" rather than re-reporting another
lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive-
constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go.
- RUN TIMING: run.sh posts a "⏳ Reviewing…" placeholder at model start and updates
it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two Phase-2 swarm upgrades:
- auto.go: GADFLY_SPECIALISTS=auto routes the review — a selector model
(GADFLY_SELECTOR_MODEL, else the review model) reads the changed files + PR
description and picks the smallest relevant lens set from the catalog, and may
propose ad-hoc lenses for gaps (e.g. migrations). Structured output via
majordomo.Generate[T]; capped + de-duped; falls back to the default suite.
- delegate.go: GADFLY_WORKER_MODEL adds a delegate_investigation tool so the
reviewer offloads mechanical legwork (trace callers, gather usages) to a cheap
worker sub-agent that returns an evidence-cited digest — the top model reasons
over summaries, not raw file dumps. Workers get an fs-only toolbox (no
sub-delegation). Unset = off.
resolveSpecialists now also returns the registry + an auto flag. Docs (README
Specialists + config table, CLAUDE.md, main.go header) + tests updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).
- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
majordomo's built-in LLM_* env DSNs are HTTPS-only (DSN.BaseURL forces https),
so they can't express a plaintext local Ollama. Add Gadfly-native env families
that register named providers/aliases with majordomo before resolution:
GADFLY_ENDPOINT_<NAME>="<provider>|<base-url>[|<key>]" # base URL verbatim (http ok)
GADFLY_ALIAS_<NAME>="<majordomo spec>" # plain alias / failover chain
Then reference them as "<name>/<model>" (or the bare alias) in GADFLY_MODEL(S).
<NAME> lowercases to the registry name, matching majordomo's LLM_* convention.
LLM_* DSNs still work (and are documented) for HTTPS endpoints. + unit tests,
README "Endpoint aliases via env vars", stub example.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hardcoded ollama.Cloud binding with majordomo's provider registry,
so Gadfly can target any backend majordomo supports without code changes.
- cmd/gadfly/model.go: resolveModel() — GADFLY_PROVIDER (default ollama-cloud)
prefixes bare model ids; GADFLY_MODEL may be a full provider/model spec, alias,
or failover chain (verbatim). GADFLY_BASE_URL constructs openai/ollama/anthropic/
google directly at a custom endpoint (OpenAI-compatible + local/remote Ollama).
GADFLY_API_KEY else the provider's standard env var. + buildSpec unit tests.
- run.sh: provider-aware key gate (local Ollama needs none); maps OLLAMA_CLOUD_API_KEY
-> OLLAMA_API_KEY; provider/base-url/key inherited by the binary. Gadfly-branded comment.
- entrypoint.sh: GADFLY_MODELS alias for OLLAMA_REVIEW_MODELS; provider passthrough.
- examples + README: Models & providers section. Upfront: only the Ollama paths
(local + OpenAI-compatible-against-Ollama) are tested; OpenAI/Anthropic/Google
are wired via majordomo but UNTESTED (no spend).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror mort-ci.yml's build-and-push: BuildKit secrets (REGISTRY_USER/
REGISTRY_PASSWORD) for private majordomo access instead of build-args, and the
LAN --add-host so the builder can reach the registry. push main -> :latest +
:sha-<short>; tag v* -> :<tag> + :latest; other branches -> :branch-<safe>;
PRs build-only (no push). Optional DISCORD_WEBHOOK_URL notifications.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in
Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/
find_files/get_diff), verifies findings before reporting, two-pass review +
adversarial recheck, posts one labeled comment per model. Advisory only.
- cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo
- entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML)
- Dockerfile: multi-stage; build-time module token never reaches the final image
- .gitea/workflows/build-image.yml: tag v* → build & push image
- examples/: ~15-line consumer stub
- system prompt genericized + hardened to re-derive constants/formulas (semantic bugs)
Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>