PR #13's self-review showed the pinned image (sha-c342bdb) rejects the
'llama-swaps' provider spelling in GADFLY_ENDPOINT (its endpointProvider only
accepts 'llamaswap'; the hyphenated aliases were added to the binary later).
Switch GADFLY_ENDPOINT_RAGNAROS to llamaswap|https://... so the 4090 Ti
endpoint registers and ragnaros/qwen3.6-27b resolves.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a local GPU reviewer to the shared default:
- models += ragnaros/qwen3.6-27b
- GADFLY_ENDPOINT_RAGNAROS=llama-swaps|https://llama-swap.ragnaros.dudenhoeffer.casa
(plain LAN URL, no credential; registers provider "ragnaros")
- provider_concurrency ragnaros=1, provider_lens_concurrency ragnaros=1
(one model, one lens at a time — a single local GPU)
Inherited by all @v1 consumers (mort/executus/majordomo) once v1 moves.
Comments + README + CLAUDE.md updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Self-review on PR #9 flagged two doc-drift spots left over from the
explicit-secret-forwarding switch. Cosmetic.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The first attempt failed at entrypoint.sh:61 'GITEA_TOKEN required' — with
explicit secrets (no `inherit`), secrets.GITEA_TOKEN resolves empty in the
reusable job. github.token comes from the github context (not a forwarded
secret), so it's present regardless. The forwarded provider/findings secrets
arrived correctly; only the auto-token sourcing was wrong.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The swarm (reviewing the mort/executus rollout PRs) correctly flagged that
`secrets: inherit` forwards EVERY caller secret to the reusable review
workflow — registry/deploy/db creds the reviewer never touches. Fix:
- review-reusable.yml: declare workflow_call.secrets (all optional) so a
caller can forward only what the reviewer needs.
- adversarial-review.yml (gadfly's own caller) + examples/reusable.yml:
replace `secrets: inherit` with an explicit forward of just
OLLAMA_CLOUD_API_KEY / CLAUDE_CODE_OAUTH_TOKEN / findings tokens.
GITEA_TOKEN stays automatic.
- Docs (README, examples) updated; also advise pinning consumers to an
immutable @<sha> instead of @main (supply-chain, the other finding).
gadfly's own review on this PR exercises the explicit-secrets path (local
reusable ref) — validating it on the act_runner before mort/executus adopt it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Centralizes the consumer stub into a reusable Gitea workflow
(.gitea/workflows/review-reusable.yml, workflow_call + defaulted inputs +
secrets: inherit); gadfly's own dogfood is now a thin caller of it, which
proved end-to-end that github.event context propagates into the reusable
on this act_runner. Adds the slim examples/reusable.yml stub + docs.
Folded in the swarm's findings: timeout_minutes default 30->45, map
GADFLY_API_KEY, explicit permissions block, drop the dead specialist_suite
input, and harden the example's actor gate. ~70 findings graded.
Completes the gadfly-games build (Phases 1-4 + quality fixes).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Bump majordomo to the latest build and accept every llama-swap spelling
(llama-swap/llama-swaps + un-hyphenated llamaswap/llamaswaps) in gadfly's
endpoint switches; the LLM_* llama-swap(s):// DSN path already worked via
majordomo.Parse. README + error messages + endpointProvider alias tests.
Swarm review: 8/9 clean; qwen3-coder's "Blocking" was a false positive
(claimed llamaswap was untested — it has dedicated test cases). Folded in
its one fair nit (README now lists the un-hyphenated aliases).
gofmt clean, go vet quiet, go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Adds claude-code/opus:max to the dogfood swarm and pins to :sha-c342bdb
(which has the :thinking parse). Claude Code lineup is now sonnet + opus +
opus:max. All three ran end-to-end on this PR's own review; 0 findings
(clean PR + the telemetry fix suppressing phantom clean-verification
findings — working as intended).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Adds claude-code/opus to gadfly's dogfood swarm (both sonnet and opus run
end-to-end), bumps the image pin to :sha-80d8f53 so the clean-lens
telemetry fix is live, and adds engine support for a
"claude-code/<model>:max" extended-thinking spec (MAX_THINKING_TOKENS,
best-effort). Validated: only 13 findings on this clean PR vs 43 on the
comparable #4 — the telemetry fix works.
Folded in the swarm's two real findings: a runPass env-injection test and
keeping MAX_THINKING_TOKENS in claudeEnv. Follow-up enables
claude-code/opus:max once this image builds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
emit() now skips findings extraction for a "No material issues found"
lens (its path:line refs are verification notes, not problems), fixing
the FP inflation that penalized thorough clean-pass reviewers. Also trims
the dogfood swarm to the strong reviewers: drops m5/qwen3.6 (last local
lane), gemma4, gpt-oss:120b, and kimi-k2.7-code — leaving 6 cloud +
claude-code/sonnet.
Fittingly, PR #4's own 11-model review produced 43 findings that were ALL
clean-verification bullets (zero real) — a live demonstration of the bug
this fixes. gofmt clean, go vet quiet, go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Phase 2: bump majordomo to latest and wire its new llamaswap provider
into gadfly's endpoint switches; add claude-code/sonnet to gadfly's own
dogfood swarm (pin :sha-86f12c1, map CLAUDE_CODE_OAUTH_TOKEN) so the
Phase-1 engine runs as a live competitor; document the Ollama-through-CC
ANTHROPIC_BASE_URL proxy path as example-only.
The 11-model swarm (incl. claude-code/sonnet) reviewed it; 52 findings
graded via the MCP. Folded in the two real ones: a llamaswap
endpointProvider test (caught by claude-code/sonnet, citing CLAUDE.md)
and adding "openai-compatible" to the provider error messages (gpt-oss).
gofmt clean, go vet quiet, go build + go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Phase 1: a second review engine alongside the majordomo agent loop. For
each lens, shell out to the Claude Code CLI (`claude -p --output-format
json`) inside the checked-out repo so it verifies findings with its own
read tools, then reuse gadfly's verdict-parse + recheck + consolidate +
emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`;
auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY;
read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node +
@anthropic-ai/claude-code. Also bumped the dogfood pin to the status-board
image (PR #2 was the first dogfood with the live board + full fleet).
Folded in the swarm's own review findings: minimal subprocess env (no
GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr),
process-group cleanup on timeout, rune-safe error truncation, and
engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings
graded via the gadfly MCP.
gofmt clean, go vet quiet, go build + go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Phase 3: one consolidated, live-updating PR comment aggregating every
model's per-lens progress (queued -> running -> finished + verdict), so
the swarm's progress is visible at a glance and a watcher can tell when
it's done. Opt-in statusWriter in the binary (atomic writes) + a
background status-board.sh renderer wired through entrypoint.sh; default
on, GADFLY_STATUS_BOARD=0 to disable.
Also restores gadfly's dogfood swarm to the full cloud fleet (9 cloud +
M5; M1 dropped as too slow) matching mort, and folds in the 3 real bugs
the swarm found on its own PR (skip-binary stuck-waiting, panic-stuck
lens, busy-loop on bad poll interval). All 36 findings graded via the
gadfly MCP (18 real / 18 false-positive).
gofmt clean, go vet quiet, go build + go test -race green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Adds the adversarial-review workflow to gadfly itself (copied from mort: 3 cloud + m1/m5 via foreman, findings telemetry, sha-d7f364d). Future gadfly PRs get reviewed by the swarm.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After each review the binary POSTs the run + its heuristically-extracted findings to GADFLY_FINDINGS_URL (off unless set). Advisory: any error only goes to stderr — never touches stdout, the exit code, or the review. stdlib net/http only (no new deps). entrypoint.sh derives GADFLY_REPO/GADFLY_PR and passes through GADFLY_FINDINGS_URL/GADFLY_FINDINGS_TOKEN. Also renames store references from the old 'docket' name to 'gadfly-reports'.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Specialist lenses ran strictly sequentially within a model. Add a
GADFLY_LENS_CONCURRENCY knob (default 1 = unchanged) that overlaps the
independent per-lens review+recheck passes, so a model posts its
consolidated comment as soon as its lenses finish.
Per-provider configurable, mirroring GADFLY_PROVIDER_CONCURRENCY:
GADFLY_PROVIDER_LENS_CONCURRENCY takes a "provider=N,..." map keyed by
the same provider lanes (modelProvider() mirrors entrypoint's provider_of;
providerOverride() mirrors provider_cap). The override wins for the model's
lane, else the scalar default.
runSpecialists fans out via a bounded worker pool, order-preserving
(results written by index) and keeping each lens's own timeout/recheck.
repoFS is immutable + fresh-toolbox-per-pass, so lenses share no mutable
state (verified under -race). Docs/examples updated; dropped a duplicate
GADFLY_TIMEOUT_SECS README row.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Accept "foreman" in both resolveModel (GADFLY_BASE_URL) and endpointProvider
(GADFLY_ENDPOINT_*) switches, mapping to majordomo's ollama.Foreman() preset
(handles foreman's non-streaming/long-poll quirks). Unlike the HTTPS-only
LLM_* foreman:// DSN, the base URL is verbatim, so a plaintext http:// foreman
queue works. Tests + README provider table + endpoint-aliases example updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per a Gadfly self-review finding (kimi-k2.7-code): an issue_comment can start a
secret-bearing run before the in-container allowed-users check. Add a workflow
if: that only lets trusted actors trigger via comment (PR/dispatch already
trusted); keep GADFLY_ALLOWED_USERS as the belt-and-suspenders layer. README
documents it + the default-branch caveat for comment triggers. (Docs/examples
only — paths-ignored, no image rebuild.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
EOF
A section that led with '**Blocking issues**' (no 'found') fell through to
unknown, so the consolidated header wrongly read 'No material issues found'
(seen live on gpt-oss). Now matches 'blocking issue'/'minor issue'/'no material
issue' and picks the earliest-appearing phrase (the lead verdict). + tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A green check on a section reporting blocking issues was misleading; 🎯 signals
accuracy/on-target. Section verdict text already conveys pass/fail.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
entrypoint.sh groups models by provider into lanes that run in PARALLEL; within
a lane at most `cap` models run at once. cap = GADFLY_PROVIDER_CONCURRENCY map
("ollama-cloud=3,m1pro=1") else GADFLY_CONCURRENCY (default 1). So a single
local box stays serial (1 at a time) while cloud models run several at once and
both lanes progress simultaneously. Portable bash (no associative arrays).
Default cap 1 keeps a single-provider pool sequential as before. Pairs with the
per-lens timeout so a slow lane can't starve others. Docs: README Concurrency
section + config table; CLAUDE.md lessons incl. the docker://:latest cache gotcha.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Five fixes, several surfaced by the live bake-off:
- PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist
(own context), not shared across the suite. A slow model (e.g. a 35B local MLX)
was exhausting the whole 600s budget on lens 1, leaving the rest "step 0:
context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go.
- ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean".
Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses
errored"); the section reads "⚠️ could not complete". consolidate.go.
- PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the
spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default
(was wrongly "ollama-cloud" for local models). scripts/run.sh.
- LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens
stays in its lane, says "nothing in my area" rather than re-reporting another
lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive-
constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go.
- RUN TIMING: run.sh posts a "⏳ Reviewing…" placeholder at model start and updates
it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two Phase-2 swarm upgrades:
- auto.go: GADFLY_SPECIALISTS=auto routes the review — a selector model
(GADFLY_SELECTOR_MODEL, else the review model) reads the changed files + PR
description and picks the smallest relevant lens set from the catalog, and may
propose ad-hoc lenses for gaps (e.g. migrations). Structured output via
majordomo.Generate[T]; capped + de-duped; falls back to the default suite.
- delegate.go: GADFLY_WORKER_MODEL adds a delegate_investigation tool so the
reviewer offloads mechanical legwork (trace callers, gather usages) to a cheap
worker sub-agent that returns an evidence-cited digest — the top model reasons
over summaries, not raw file dumps. Workers get an fs-only toolbox (no
sub-delegation). Unset = off.
resolveSpecialists now also returns the registry + an auto flag. Docs (README
Specialists + config table, CLAUDE.md, main.go header) + tests updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).
- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
majordomo's built-in LLM_* env DSNs are HTTPS-only (DSN.BaseURL forces https),
so they can't express a plaintext local Ollama. Add Gadfly-native env families
that register named providers/aliases with majordomo before resolution:
GADFLY_ENDPOINT_<NAME>="<provider>|<base-url>[|<key>]" # base URL verbatim (http ok)
GADFLY_ALIAS_<NAME>="<majordomo spec>" # plain alias / failover chain
Then reference them as "<name>/<model>" (or the bare alias) in GADFLY_MODEL(S).
<NAME> lowercases to the registry name, matching majordomo's LLM_* convention.
LLM_* DSNs still work (and are documented) for HTTPS endpoints. + unit tests,
README "Endpoint aliases via env vars", stub example.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hardcoded ollama.Cloud binding with majordomo's provider registry,
so Gadfly can target any backend majordomo supports without code changes.
- cmd/gadfly/model.go: resolveModel() — GADFLY_PROVIDER (default ollama-cloud)
prefixes bare model ids; GADFLY_MODEL may be a full provider/model spec, alias,
or failover chain (verbatim). GADFLY_BASE_URL constructs openai/ollama/anthropic/
google directly at a custom endpoint (OpenAI-compatible + local/remote Ollama).
GADFLY_API_KEY else the provider's standard env var. + buildSpec unit tests.
- run.sh: provider-aware key gate (local Ollama needs none); maps OLLAMA_CLOUD_API_KEY
-> OLLAMA_API_KEY; provider/base-url/key inherited by the binary. Gadfly-branded comment.
- entrypoint.sh: GADFLY_MODELS alias for OLLAMA_REVIEW_MODELS; provider passthrough.
- examples + README: Models & providers section. Upfront: only the Ollama paths
(local + OpenAI-compatible-against-Ollama) are tested; OpenAI/Anthropic/Google
are wired via majordomo but UNTESTED (no spend).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror mort-ci.yml's build-and-push: BuildKit secrets (REGISTRY_USER/
REGISTRY_PASSWORD) for private majordomo access instead of build-args, and the
LAN --add-host so the builder can reach the registry. push main -> :latest +
:sha-<short>; tag v* -> :<tag> + :latest; other branches -> :branch-<safe>;
PRs build-only (no push). Optional DISCORD_WEBHOOK_URL notifications.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in
Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/
find_files/get_diff), verifies findings before reporting, two-pass review +
adversarial recheck, posts one labeled comment per model. Advisory only.
- cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo
- entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML)
- Dockerfile: multi-stage; build-time module token never reaches the final image
- .gitea/workflows/build-image.yml: tag v* → build & push image
- examples/: ~15-line consumer stub
- system prompt genericized + hardened to re-derive constants/formulas (semantic bugs)
Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>