gadfly

Author	SHA1	Message	Date
steve	27aa92a6e0	fix: fold in PR #8 review findings (reusable workflow). The swarm reviewed PR #8 through the reusable path itself — proving github.event context propagates into a workflow_call reusable workflow on this act_runner (the one part the probes hadn't covered). Folded in the warranted findings: - review-reusable.yml: bump timeout_minutes default 30 -> 45 (a multi-model/slow-lens review can exceed 30); map the generic GADFLY_API_KEY secret (was missing); add an explicit permissions block; drop the dead `specialist_suite` input. - examples/reusable.yml: actor gate now also requires github.event.issue.pull_request (so an issue-comment on a plain issue doesn't waste a runner), and a note to pin @<ref> to a release tag. Graded ~70 findings (heavy clustering): the real ones above + several by-design/documented (inputs replace vars-overrides; only M1/M5 named endpoints mapped) and many false positives (IS_DRAFT pattern, GITEA_TOKEN via inherit, "empty specialists" misread — empty does default). YAML validated; Go unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 19:41:45 -04:00
steve	0a01c3ae91	feat: Phase 4 — reusable workflow ("subscribe") + dogfood it. Centralizes the ~90-line consumer stub into a reusable Gitea workflow so a repo can subscribe to Gadfly with a tiny caller. Feasibility was probe-verified on this act_runner: workflow_call runs, secrets: inherit delivers, and a fully-qualified owner/repo/path@ref resolves. - .gitea/workflows/review-reusable.yml: `on: workflow_call` job holding the image pin + all env plumbing. Inputs (models/specialists/provider/concurrency/timeouts/allowed_users/…) default to "" so an empty value falls back to the image's own default — caller overrides only what it wants. Secrets via `secrets: inherit` (optional ones resolve empty). - adversarial-review.yml: gadfly's own dogfood is now a thin CALLER of the reusable (proves it end-to-end; advisory so safe to dogfood). - examples/reusable.yml: the slim ~8-line consumer stub. - README / examples/README / CLAUDE.md document the subscribe path. Caveat: consumers with arbitrary GADFLY_ENDPOINT_<NAME>s still need the full stub (a reusable workflow can't enumerate dynamic secret names). YAML validated; Go unchanged (build + test green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 19:14:03 -04:00
steve	a4cdc905c9	ci: enable claude-code/opus:max (max-thinking) reviewer (#6). Adds claude-code/opus:max to the dogfood swarm and pins to :sha-c342bdb (which has the :thinking parse). Claude Code lineup is now sonnet + opus + opus:max. All three ran end-to-end on this PR's own review; 0 findings (clean PR + the telemetry fix suppressing phantom clean-verification findings — working as intended). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 22:49:49 +00:00
steve	c342bdb905	feat: add claude-code/opus reviewer + max-thinking spec support (#5). Adds claude-code/opus to gadfly's dogfood swarm (both sonnet and opus run end-to-end), bumps the image pin to :sha-80d8f53 so the clean-lens telemetry fix is live, and adds engine support for a "claude-code/<model>:max" extended-thinking spec (MAX_THINKING_TOKENS, best-effort). Validated: only 13 findings on this clean PR vs 43 on the comparable #4 — the telemetry fix works. Folded in the swarm's two real findings: a runPass env-injection test and keeping MAX_THINKING_TOKENS in claudeEnv. Follow-up enables claude-code/opus:max once this image builds. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 22:39:14 +00:00
steve	80d8f53f63	fix: clean-lens findings + trim the dogfood swarm to strong reviewers (#4). emit() now skips findings extraction for a "No material issues found" lens (its path:line refs are verification notes, not problems), fixing the FP inflation that penalized thorough clean-pass reviewers. Also trims the dogfood swarm to the strong reviewers: drops m5/qwen3.6 (last local lane), gemma4, gpt-oss:120b, and kimi-k2.7-code — leaving 6 cloud + claude-code/sonnet. Fittingly, PR #4's own 11-model review produced 43 findings that were ALL clean-verification bullets (zero real) — a live demonstration of the bug this fixes. gofmt clean, go vet quiet, go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 22:14:07 +00:00
steve	82f7ef78d5	feat: claude-code backends + llamaswap provider + dogfood the CC engine (#3). Phase 2: bump majordomo to latest and wire its new llamaswap provider into gadfly's endpoint switches; add claude-code/sonnet to gadfly's own dogfood swarm (pin :sha-86f12c1, map CLAUDE_CODE_OAUTH_TOKEN) so the Phase-1 engine runs as a live competitor; document the Ollama-through-CC ANTHROPIC_BASE_URL proxy path as example-only. The 11-model swarm (incl. claude-code/sonnet) reviewed it; 52 findings graded via the MCP. Folded in the two real ones: a llamaswap endpointProvider test (caught by claude-code/sonnet, citing CLAUDE.md) and adding "openai-compatible" to the provider error messages (gpt-oss). gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 21:53:41 +00:00
steve	86f12c126f	feat: claude-code reviewer engine (#2). Phase 1: a second review engine alongside the majordomo agent loop. For each lens, shell out to the Claude Code CLI (`claude -p --output-format json`) inside the checked-out repo so it verifies findings with its own read tools, then reuse gadfly's verdict-parse + recheck + consolidate + emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`; auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY; read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node + @anthropic-ai/claude-code. Also bumped the dogfood pin to the status-board image (PR #2 was the first dogfood with the live board + full fleet). Folded in the swarm's own review findings: minimal subprocess env (no GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr), process-group cleanup on timeout, rune-safe error truncation, and engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings graded via the gadfly MCP. gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 20:40:41 +00:00
steve	c3d09d3bd4	feat: live status-board comment + full-fleet dogfood (#1). Phase 3: one consolidated, live-updating PR comment aggregating every model's per-lens progress (queued -> running -> finished + verdict), so the swarm's progress is visible at a glance and a watcher can tell when it's done. Opt-in statusWriter in the binary (atomic writes) + a background status-board.sh renderer wired through entrypoint.sh; default on, GADFLY_STATUS_BOARD=0 to disable. Also restores gadfly's dogfood swarm to the full cloud fleet (9 cloud + M5; M1 dropped as too slow) matching mort, and folds in the 3 real bugs the swarm found on its own PR (skip-binary stuck-waiting, panic-stuck lens, busy-loop on bad poll interval). All 36 findings graded via the gadfly MCP (18 real / 18 false-positive). gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 19:00:12 +00:00
steve	0ad5b66170	ci: dogfood — gadfly reviews its own PRs (mort's full-fleet setup). Adds the adversarial-review workflow to gadfly itself (copied from mort: 3 cloud + m1/m5 via foreman, findings telemetry, sha-d7f364d). Future gadfly PRs get reviewed by the swarm. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 13:26:37 -04:00
Steve Dudenhoeffer	676c9d4f07	ci: skip image rebuild on docs/example-only changes (paths-ignore). Tag pushes (v*) bypass path filters, so releases always build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 19:10:24 -04:00
Steve Dudenhoeffer	6123604595	ci: auto build & push image on main (:latest) + v* tags. Mirror mort-ci.yml's build-and-push: BuildKit secrets (REGISTRY_USER/REGISTRY_PASSWORD) for private majordomo access instead of build-args, and the LAN --add-host so the builder can reach the registry. push main -> :latest + :sha-<short>; tag v* -> :<tag> + :latest; other branches -> :branch-<safe>; PRs build-only (no push). Optional DISCORD_WEBHOOK_URL notifications. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 18:45:48 -04:00
Steve Dudenhoeffer	c0d0152a34	Gadfly: agentic adversarial PR reviewer (initial extraction). Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/find_files/get_diff), verifies findings before reporting, two-pass review + adversarial recheck, posts one labeled comment per model. Advisory only. - cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo - entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML) - Dockerfile: multi-stage; build-time module token never reaches the final image - .gitea/workflows/build-image.yml: tag v* → build & push image - examples/: ~15-line consumer stub - system prompt genericized + hardened to re-derive constants/formulas (semantic bugs). Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 18:42:20 -04:00

12 Commits
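Commits 86f12c126f and c342bdb905 describe selecting the Claude Code engine through GADFLY_MODELS entries of the form `claude-code`, `claude-code/<model>`, or `claude-code/<model>:max`. The Go sketch below shows one way such a spec could be parsed. It is not gadfly's actual parser: `claudeSpec` and `parseClaudeSpec` are hypothetical names, and it assumes each spec arrives as a single, already-split entry. Only the `claude-code` prefix is treated specially, so an Ollama-style name such as `gpt-oss:120b` keeps its colon untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// claudeSpec is a hypothetical parsed form of a "claude-code..." model spec.
type claudeSpec struct {
	Model       string // empty means "let the CLI pick its default model"
	MaxThinking bool   // true for the ":max" extended-thinking suffix
}

// parseClaudeSpec reports whether spec selects the Claude Code engine and,
// if so, which model and thinking mode it asks for.
// strings.CutPrefix/CutSuffix require Go 1.20 or newer.
func parseClaudeSpec(spec string) (claudeSpec, bool) {
	const engine = "claude-code"
	if spec == engine {
		return claudeSpec{}, true
	}
	rest, ok := strings.CutPrefix(spec, engine+"/")
	if !ok {
		// Not a Claude Code spec (e.g. "gpt-oss:120b"): leave it to the
		// other engine and do not interpret any colon in it.
		return claudeSpec{}, false
	}
	var s claudeSpec
	if model, found := strings.CutSuffix(rest, ":max"); found {
		s.MaxThinking = true
		rest = model
	}
	s.Model = rest
	return s, true
}

func main() {
	specs := []string{"claude-code", "claude-code/sonnet", "claude-code/opus:max", "gpt-oss:120b"}
	for _, spec := range specs {
		s, ok := parseClaudeSpec(spec)
		fmt.Printf("%-22s ok=%-5v model=%q max=%v\n", spec, ok, s.Model, s.MaxThinking)
	}
}
```

In a sketch like this, `MaxThinking` would be the signal to set MAX_THINKING_TOKENS for the subprocess; the token budget itself is not stated in the log, so it is left out here.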
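Commit 80d8f53f63 makes emit() skip findings extraction when a lens concludes "No material issues found", because the path:line references in such a lens are verification notes rather than problems. A minimal Go sketch of that guard follows. Two details are assumptions, not taken from gadfly's source: that the verdict appears on the first non-empty line of the lens output, and the shape of the `path:line` regular expression. `isCleanLens` and `extractFindings` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const cleanVerdict = "No material issues found"

// fileLineRef matches references like "cmd/gadfly/main.go:42" (assumed format).
var fileLineRef = regexp.MustCompile(`[\w./-]+\.\w+:\d+`)

// isCleanLens reports whether the first non-empty line carries the clean verdict.
func isCleanLens(out string) bool {
	for _, line := range strings.Split(out, "\n") {
		if t := strings.TrimSpace(line); t != "" {
			return strings.HasPrefix(t, cleanVerdict)
		}
	}
	return false
}

// extractFindings returns path:line references as candidate findings,
// except for a clean lens, whose references only record what was verified.
func extractFindings(out string) []string {
	if isCleanLens(out) {
		return nil
	}
	return fileLineRef.FindAllString(out, -1)
}

func main() {
	clean := "No material issues found.\n- Verified entrypoint.sh:12 quotes its variables"
	dirty := "Bug: cmd/gadfly/main.go:42 ignores the error from os.ReadFile"
	fmt.Println(extractFindings(clean)) // []
	fmt.Println(extractFindings(dirty)) // [cmd/gadfly/main.go:42]
}
```

Without the guard, the clean lens above would contribute `entrypoint.sh:12` as a finding, which is the false-positive inflation the commit describes.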
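Commit 6123604595 spells out the tagging policy of the build-and-push workflow: a push to main yields :latest and :sha-<short>, a v* tag yields :<tag> and :latest, other branches yield :branch-<safe>, and pull requests build without pushing. That mapping is small enough to restate as a Go function for illustration; the workflow itself is YAML, and `imageTags` is hypothetical. Two details are assumptions: the 7-character short SHA (matching pins such as :sha-c342bdb elsewhere in the log) and the branch sanitization rule.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// unsafeTagChars matches runs of characters outside an assumed safe set for
// branch tags: lowercase letters, digits, '.', '_' and '-'.
var unsafeTagChars = regexp.MustCompile(`[^a-z0-9._-]+`)

// imageTags returns the tags to push for an event, or nil when nothing should
// be pushed. event is "push" or "pull_request"; ref is a full Git ref.
func imageTags(event, ref, sha string) []string {
	if event == "pull_request" {
		return nil // PRs are build-only
	}
	switch {
	case strings.HasPrefix(ref, "refs/tags/v"):
		return []string{strings.TrimPrefix(ref, "refs/tags/"), "latest"}
	case ref == "refs/heads/main":
		short := sha
		if len(short) > 7 {
			short = short[:7]
		}
		return []string{"latest", "sha-" + short}
	case strings.HasPrefix(ref, "refs/heads/"):
		branch := strings.ToLower(strings.TrimPrefix(ref, "refs/heads/"))
		safe := strings.Trim(unsafeTagChars.ReplaceAllString(branch, "-"), "-")
		return []string{"branch-" + safe}
	}
	return nil // other refs (e.g. non-v* tags) are outside the stated policy
}

func main() {
	fmt.Println(imageTags("push", "refs/heads/main", "6123604595abcdef"))
	fmt.Println(imageTags("push", "refs/tags/v1.2.0", "6123604595abcdef"))
	fmt.Println(imageTags("push", "refs/heads/feat/Status-Board", "6123604595abcdef"))
	fmt.Println(imageTags("pull_request", "refs/pull/8/head", "27aa92a6e0"))
}
```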