gadfly

Author	SHA1	Message	Date
steve	8f5adc91b2	chore(reusable): bump image pin to sha-88f74aa (consensus consolidation live) Phase 2: gadfly's own multi-model reviews now post ONE cross-model consensus comment instead of N per-model comments. External consumers re-pin separately. [skip ci] Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 18:57:38 -04:00
steve	88f74aa768	feat: cross-model consensus consolidation (one ranked comment, not N walls) (#17) Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-28 22:56:15 +00:00
steve	84b891b1ba	chore(reusable): bump image pin to sha-5397160 (structured findings contract) Makes the Phase 1 gadfly-findings contract live for gadfly's own dogfood reviews (the local-ref reusable). External consumers re-pin separately. [skip ci] Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 18:25:16 -04:00
steve	7bc3c982fa	feat(reusable): runtime-variable swarm config (cache-immune, no more re-pinning to retune) (#14)	2026-06-28 06:00:18 +00:00
steve	95a9ec546a	feat(reusable): add the 4090 Ti (qwen3.6-27b via llama-swap) to the default swarm (#13)	2026-06-28 05:01:50 +00:00
steve	8f69e71311	docs: recommend the @v1 release tag for reusable-workflow consumers (#12)	2026-06-28 04:17:19 +00:00
steve	0d80ae73d8	tune(reusable): claude-code=3 models × 5 lenses (claude was the bottleneck) (#11)	2026-06-28 04:02:17 +00:00
steve	b02b11d691	feat(reusable): ship the curated swarm as the default config consumers inherit (#10)	2026-06-28 02:23:40 +00:00
Steve Dudenhoeffer	daff6d08a1	docs: drop stale 'secrets: inherit' mentions (reusable comment + CLAUDE.md) Self-review on PR #9 flagged two doc-drift spots left over from the explicit-secret-forwarding switch. Cosmetic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 21:00:40 -04:00
Steve Dudenhoeffer	18de9b8ebc	fix: source GITEA_TOKEN from github.token (auto) under explicit secret forwarding The first attempt failed at entrypoint.sh:61 'GITEA_TOKEN required': with explicit secrets (no `inherit`), secrets.GITEA_TOKEN resolves empty in the reusable job. github.token comes from the github context (not a forwarded secret), so it's present regardless. The forwarded provider/findings secrets arrived correctly; only the auto-token sourcing was wrong. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 20:53:00 -04:00
Steve Dudenhoeffer	f06fe5ef72	security: scope reusable-workflow secrets (least privilege) over secrets: inherit The swarm (reviewing the mort/executus rollout PRs) correctly flagged that `secrets: inherit` forwards EVERY caller secret to the reusable review workflow, including registry/deploy/db creds the reviewer never touches. Fix: - review-reusable.yml: declare workflow_call.secrets (all optional) so a caller can forward only what the reviewer needs. - adversarial-review.yml (gadfly's own caller) + examples/reusable.yml: replace `secrets: inherit` with an explicit forward of just OLLAMA_CLOUD_API_KEY / CLAUDE_CODE_OAUTH_TOKEN / findings tokens. GITEA_TOKEN stays automatic. - Docs (README, examples) updated; also advise pinning consumers to an immutable @<sha> instead of @main (supply-chain, the other finding). gadfly's own review on this PR exercises the explicit-secrets path (local reusable ref), validating it on the act_runner before mort/executus adopt it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 20:45:18 -04:00
steve	5f86062a5a	feat: Phase 4, reusable "subscribe" workflow (+ dogfood it) (#8) Centralizes the consumer stub into a reusable Gitea workflow (.gitea/workflows/review-reusable.yml, workflow_call + defaulted inputs + secrets: inherit); gadfly's own dogfood is now a thin caller of it, which proved end-to-end that github.event context propagates into the reusable on this act_runner. Adds the slim examples/reusable.yml stub + docs. Folded in the swarm's findings: timeout_minutes default 30->45, map GADFLY_API_KEY, explicit permissions block, drop the dead specialist_suite input, and harden the example's actor gate. ~70 findings graded. Completes the gadfly-games build (Phases 1-4 + quality fixes). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 23:42:01 +00:00
steve	a4cdc905c9	ci: enable claude-code/opus:max (max-thinking) reviewer (#6) Adds claude-code/opus:max to the dogfood swarm and pins to :sha-c342bdb (which has the :thinking parse). Claude Code lineup is now sonnet + opus + opus:max. All three ran end-to-end on this PR's own review; 0 findings (clean PR + the telemetry fix suppressing phantom clean-verification findings, working as intended). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 22:49:49 +00:00
steve	c342bdb905	feat: add claude-code/opus reviewer + max-thinking spec support (#5) Adds claude-code/opus to gadfly's dogfood swarm (both sonnet and opus run end-to-end), bumps the image pin to :sha-80d8f53 so the clean-lens telemetry fix is live, and adds engine support for a "claude-code/<model>:max" extended-thinking spec (MAX_THINKING_TOKENS, best-effort). Validated: only 13 findings on this clean PR vs 43 on the comparable #4, so the telemetry fix works. Folded in the swarm's two real findings: a runPass env-injection test and keeping MAX_THINKING_TOKENS in claudeEnv. Follow-up enables claude-code/opus:max once this image builds. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 22:39:14 +00:00
steve	80d8f53f63	fix: clean-lens findings + trim the dogfood swarm to strong reviewers (#4) emit() now skips findings extraction for a "No material issues found" lens (its path:line refs are verification notes, not problems), fixing the FP inflation that penalized thorough clean-pass reviewers. Also trims the dogfood swarm to the strong reviewers: drops m5/qwen3.6 (last local lane), gemma4, gpt-oss:120b, and kimi-k2.7-code, leaving 6 cloud + claude-code/sonnet. Fittingly, PR #4's own 11-model review produced 43 findings that were ALL clean-verification bullets (zero real): a live demonstration of the bug this fixes. gofmt clean, go vet quiet, go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 22:14:07 +00:00
steve	82f7ef78d5	feat: claude-code backends + llamaswap provider + dogfood the CC engine (#3) Phase 2: bump majordomo to latest and wire its new llamaswap provider into gadfly's endpoint switches; add claude-code/sonnet to gadfly's own dogfood swarm (pin :sha-86f12c1, map CLAUDE_CODE_OAUTH_TOKEN) so the Phase-1 engine runs as a live competitor; document the Ollama-through-CC ANTHROPIC_BASE_URL proxy path as example-only. The 11-model swarm (incl. claude-code/sonnet) reviewed it; 52 findings graded via the MCP. Folded in the two real ones: a llamaswap endpointProvider test (caught by claude-code/sonnet, citing CLAUDE.md) and adding "openai-compatible" to the provider error messages (gpt-oss). gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 21:53:41 +00:00
steve	86f12c126f	feat: claude-code reviewer engine (#2) Phase 1: a second review engine alongside the majordomo agent loop. For each lens, shell out to the Claude Code CLI (`claude -p --output-format json`) inside the checked-out repo so it verifies findings with its own read tools, then reuse gadfly's verdict-parse + recheck + consolidate + emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`; auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY; read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node + @anthropic-ai/claude-code. Also bumped the dogfood pin to the status-board image (PR #2 was the first dogfood with the live board + full fleet). Folded in the swarm's own review findings: minimal subprocess env (no GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr), process-group cleanup on timeout, rune-safe error truncation, and engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings graded via the gadfly MCP. gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 20:40:41 +00:00
steve	c3d09d3bd4	feat: live status-board comment + full-fleet dogfood (#1) Phase 3: one consolidated, live-updating PR comment aggregating every model's per-lens progress (queued -> running -> finished + verdict), so the swarm's progress is visible at a glance and a watcher can tell when it's done. Opt-in statusWriter in the binary (atomic writes) + a background status-board.sh renderer wired through entrypoint.sh; default on, GADFLY_STATUS_BOARD=0 to disable. Also restores gadfly's dogfood swarm to the full cloud fleet (9 cloud + M5; M1 dropped as too slow) matching mort, and folds in the 3 real bugs the swarm found on its own PR (skip-binary stuck-waiting, panic-stuck lens, busy-loop on bad poll interval). All 36 findings graded via the gadfly MCP (18 real / 18 false-positive). gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 19:00:12 +00:00
steve	0ad5b66170	ci: dogfood, gadfly reviews its own PRs (mort's full-fleet setup) Adds the adversarial-review workflow to gadfly itself (copied from mort: 3 cloud + m1/m5 via foreman, findings telemetry, sha-d7f364d). Future gadfly PRs get reviewed by the swarm. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 13:26:37 -04:00
Steve Dudenhoeffer	676c9d4f07	ci: skip image rebuild on docs/example-only changes (paths-ignore) Tag pushes (v*) bypass path filters, so releases always build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 19:10:24 -04:00
Steve Dudenhoeffer	6123604595	ci: auto build & push image on main (:latest) + v* tags Mirror mort-ci.yml's build-and-push: BuildKit secrets (REGISTRY_USER/REGISTRY_PASSWORD) for private majordomo access instead of build-args, and the LAN --add-host so the builder can reach the registry. push main -> :latest + :sha-<short>; tag v* -> :<tag> + :latest; other branches -> :branch-<safe>; PRs build-only (no push). Optional DISCORD_WEBHOOK_URL notifications. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 18:45:48 -04:00
Steve Dudenhoeffer	c0d0152a34	Gadfly: agentic adversarial PR reviewer (initial extraction) Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/find_files/get_diff), verifies findings before reporting, two-pass review + adversarial recheck, posts one labeled comment per model. Advisory only. - cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo - entrypoint.sh: container brains: trigger gating, PR clone, model loop (logic out of YAML) - Dockerfile: multi-stage; build-time module token never reaches the final image - .gitea/workflows/build-image.yml: tag v* → build & push image - examples/: ~15-line consumer stub - system prompt genericized + hardened to re-derive constants/formulas (semantic bugs) Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 18:42:20 -04:00

22 Commits
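
The image-tagging rules stated in commit 6123604595 (main -> :latest + :sha-<short>; v* tags -> :<tag> + :latest; other branches -> :branch-<safe>; PRs build-only) can be sketched as a small Go function. This is an illustration only, not the workflow's actual logic, which lives in YAML that this log does not show. The function name `imageTags`, the event and ref strings, and the sanitization rule behind `<safe>` are all assumptions; the 7-character short SHA matches pins such as sha-88f74aa seen in the log.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// unsafeChars matches runs of characters outside a conservative tag alphabet.
// Assumption: the log does not say how "<safe>" is derived from a branch name.
var unsafeChars = regexp.MustCompile(`[^a-zA-Z0-9_.-]+`)

// imageTags (hypothetical name) returns the tags to push for an event.
// A nil result means "build only, push nothing".
func imageTags(event, ref, sha string) []string {
	if event == "pull_request" {
		return nil // PRs build-only (no push)
	}
	switch {
	case ref == "refs/heads/main":
		short := sha
		if len(short) > 7 {
			short = short[:7] // e.g. 88f74aa768 -> 88f74aa
		}
		return []string{"latest", "sha-" + short}
	case strings.HasPrefix(ref, "refs/tags/v"):
		return []string{strings.TrimPrefix(ref, "refs/tags/"), "latest"}
	case strings.HasPrefix(ref, "refs/heads/"):
		branch := strings.TrimPrefix(ref, "refs/heads/")
		return []string{"branch-" + unsafeChars.ReplaceAllString(branch, "-")}
	}
	return nil
}

func main() {
	fmt.Println(imageTags("push", "refs/heads/main", "88f74aa768"))
	fmt.Println(imageTags("push", "refs/tags/v1", "8f69e71311"))
	fmt.Println(imageTags("push", "refs/heads/feat/consensus", "7bc3c982fa"))
	fmt.Println(imageTags("pull_request", "refs/pull/17/head", "88f74aa768"))
}
```

Keeping the mapping in one pure function makes each rule testable without a runner; the real pipeline may differ in how it sanitizes branch names.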