Commit Graph

19 Commits

Author SHA1 Message Date
steve 7bc3c982fa feat(reusable): runtime-variable swarm config (cache-immune, no more re-pinning to retune) (#14)
Build & push image / build-and-push (push) Successful in 5s
2026-06-28 06:00:18 +00:00
steve 95a9ec546a feat(reusable): add the 4090 Ti (qwen3.6-27b via llama-swap) to the default swarm (#13)
Build & push image / build-and-push (push) Successful in 7s
2026-06-28 05:01:50 +00:00
steve 8f69e71311 docs: recommend the @v1 release tag for reusable-workflow consumers (#12)
Build & push image / build-and-push (push) Successful in 6s
2026-06-28 04:17:19 +00:00
steve 0d80ae73d8 tune(reusable): claude-code=3 models × 5 lenses (claude was the bottleneck) (#11)
Build & push image / build-and-push (push) Successful in 8s
2026-06-28 04:02:17 +00:00
steve b02b11d691 feat(reusable): ship the curated swarm as the default config consumers inherit (#10)
Build & push image / build-and-push (push) Successful in 8s
2026-06-28 02:23:40 +00:00
Steve Dudenhoeffer daff6d08a1 docs: drop stale 'secrets: inherit' mentions (reusable comment + CLAUDE.md)
Build & push image / build-and-push (pull_request) Successful in 6s
Self-review on PR #9 flagged two doc-drift spots left over from the
explicit-secret-forwarding switch. Cosmetic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 21:00:40 -04:00
Steve Dudenhoeffer 18de9b8ebc fix: source GITEA_TOKEN from github.token (auto) under explicit secret forwarding
Build & push image / build-and-push (pull_request) Successful in 7s
Adversarial Review (Gadfly) / review (pull_request) Successful in 8m2s
The first attempt failed at entrypoint.sh:61 'GITEA_TOKEN required' — with
explicit secrets (no `inherit`), secrets.GITEA_TOKEN resolves empty in the
reusable job. github.token comes from the github context (not a forwarded
secret), so it's present regardless. The forwarded provider/findings secrets
arrived correctly; only the auto-token sourcing was wrong.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 20:53:00 -04:00
Steve Dudenhoeffer f06fe5ef72 security: scope reusable-workflow secrets (least privilege) over secrets: inherit
Adversarial Review (Gadfly) / review (pull_request) Failing after 2s
Build & push image / build-and-push (pull_request) Successful in 6s
The swarm (reviewing the mort/executus rollout PRs) correctly flagged that
`secrets: inherit` forwards EVERY caller secret to the reusable review
workflow — registry/deploy/db creds the reviewer never touches. Fix:

- review-reusable.yml: declare workflow_call.secrets (all optional) so a
  caller can forward only what the reviewer needs.
- adversarial-review.yml (gadfly's own caller) + examples/reusable.yml:
  replace `secrets: inherit` with an explicit forward of just
  OLLAMA_CLOUD_API_KEY / CLAUDE_CODE_OAUTH_TOKEN / findings tokens.
  GITEA_TOKEN stays automatic.
- Docs (README, examples) updated; also advise pinning consumers to an
  immutable @<sha> instead of @main (supply-chain, the other finding).

gadfly's own review on this PR exercises the explicit-secrets path (local
reusable ref) — validating it on the act_runner before mort/executus adopt it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 20:45:18 -04:00
steve 5f86062a5a feat: Phase 4 — reusable "subscribe" workflow (+ dogfood it) (#8)
Build & push image / build-and-push (push) Successful in 9s
Centralizes the consumer stub into a reusable Gitea workflow
(.gitea/workflows/review-reusable.yml, workflow_call + defaulted inputs +
secrets: inherit); gadfly's own dogfood is now a thin caller of it, which
proved end-to-end that github.event context propagates into the reusable
on this act_runner. Adds the slim examples/reusable.yml stub + docs.

Folded in the swarm's findings: timeout_minutes default 30->45, map
GADFLY_API_KEY, explicit permissions block, drop the dead specialist_suite
input, and harden the example's actor gate. ~70 findings graded.

Completes the gadfly-games build (Phases 1-4 + quality fixes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 23:42:01 +00:00
steve a4cdc905c9 ci: enable claude-code/opus:max (max-thinking) reviewer (#6)
Build & push image / build-and-push (push) Successful in 6s
Adds claude-code/opus:max to the dogfood swarm and pins to :sha-c342bdb
(which has the :thinking parse). Claude Code lineup is now sonnet + opus +
opus:max. All three ran end-to-end on this PR's own review; 0 findings
(clean PR + the telemetry fix suppressing phantom clean-verification
findings — working as intended).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 22:49:49 +00:00
steve c342bdb905 feat: add claude-code/opus reviewer + max-thinking spec support (#5)
Build & push image / build-and-push (push) Successful in 15s
Adds claude-code/opus to gadfly's dogfood swarm (both sonnet and opus run
end-to-end), bumps the image pin to :sha-80d8f53 so the clean-lens
telemetry fix is live, and adds engine support for a
"claude-code/<model>:max" extended-thinking spec (MAX_THINKING_TOKENS,
best-effort). Validated: only 13 findings on this clean PR vs 43 on the
comparable #4 — the telemetry fix works.

Folded in the swarm's two real findings: a runPass env-injection test and
keeping MAX_THINKING_TOKENS in claudeEnv. Follow-up enables
claude-code/opus:max once this image builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 22:39:14 +00:00
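
The "claude-code/<model>:max" spec from c342bdb905 can be illustrated with a small parser. This is a sketch only, not gadfly's actual code: the function name parseClaudeSpec and its return shape are hypothetical, and it assumes that only a trailing ":max" is special while other ":tag" suffixes (as in gpt-oss:120b) belong to the model name.

```go
package main

import (
	"fmt"
	"strings"
)

// parseClaudeSpec (hypothetical name) splits a model spec of the form
// "claude-code", "claude-code/<model>", or "claude-code/<model>:max".
// ok is false for specs that do not select the claude-code engine.
func parseClaudeSpec(spec string) (model string, maxThinking, ok bool) {
	if spec == "claude-code" {
		return "", false, true // bare engine name: no explicit model
	}
	rest, found := strings.CutPrefix(spec, "claude-code/")
	if !found || rest == "" {
		return "", false, false
	}
	// Assumption: only a trailing ":max" is special; any other suffix
	// stays part of the model name.
	model, maxThinking = strings.CutSuffix(rest, ":max")
	if model == "" {
		return "", false, false
	}
	return model, maxThinking, true
}

func main() {
	for _, s := range []string{"claude-code", "claude-code/opus:max", "gpt-oss:120b"} {
		model, maxThinking, ok := parseClaudeSpec(s)
		fmt.Printf("%q -> model=%q max=%v ok=%v\n", s, model, maxThinking, ok)
	}
}
```

A caller could then set MAX_THINKING_TOKENS in the subprocess environment only when maxThinking is true; how gadfly actually wires that up is not shown in the log.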
steve 80d8f53f63 fix: clean-lens findings + trim the dogfood swarm to strong reviewers (#4)
Build & push image / build-and-push (push) Successful in 9s
emit() now skips findings extraction for a "No material issues found"
lens (its path:line refs are verification notes, not problems), fixing
the FP inflation that penalized thorough clean-pass reviewers. Also trims
the dogfood swarm to the strong reviewers: drops m5/qwen3.6 (last local
lane), gemma4, gpt-oss:120b, and kimi-k2.7-code — leaving 6 cloud +
claude-code/sonnet.

Fittingly, PR #4's own 11-model review produced 43 findings that were ALL
clean-verification bullets (zero real) — a live demonstration of the bug
this fixes. gofmt clean, go vet quiet, go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 22:14:07 +00:00
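
The clean-lens fix in 80d8f53f63 comes down to an early return before findings extraction. A minimal sketch, assuming the clean marker is matched as a substring and findings are recognized by a path:line pattern; extractFindings, refPattern and the sample file path are made up for illustration and are not gadfly's real emit() code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// cleanMarker is the phrase the commit message quotes for a clean lens.
const cleanMarker = "No material issues found"

// refPattern is a hypothetical matcher for "path:line" references.
var refPattern = regexp.MustCompile(`[\w./-]+\.\w+:\d+`)

// extractFindings returns the path:line references in a lens report,
// or nil when the lens declared itself clean. In a clean report those
// references are verification notes, not problems.
func extractFindings(report string) []string {
	if strings.Contains(report, cleanMarker) {
		return nil
	}
	return refPattern.FindAllString(report, -1)
}

func main() {
	clean := "No material issues found. Verified cmd/gadfly/main.go:42 handles nil."
	dirty := "Possible nil dereference at cmd/gadfly/main.go:42."
	fmt.Println(len(extractFindings(clean)), len(extractFindings(dirty)))
}
```

Without the early return, the clean report above would yield one phantom finding, which is the false-positive inflation the commit describes.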
steve 82f7ef78d5 feat: claude-code backends + llamaswap provider + dogfood the CC engine (#3)
Build & push image / build-and-push (push) Successful in 10s
Phase 2: bump majordomo to latest and wire its new llamaswap provider
into gadfly's endpoint switches; add claude-code/sonnet to gadfly's own
dogfood swarm (pin :sha-86f12c1, map CLAUDE_CODE_OAUTH_TOKEN) so the
Phase-1 engine runs as a live competitor; document the Ollama-through-CC
ANTHROPIC_BASE_URL proxy path as example-only.

The 11-model swarm (incl. claude-code/sonnet) reviewed it; 52 findings
graded via the MCP. Folded in the two real ones: a llamaswap
endpointProvider test (caught by claude-code/sonnet, citing CLAUDE.md)
and adding "openai-compatible" to the provider error messages (gpt-oss).

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 21:53:41 +00:00
steve 86f12c126f feat: claude-code reviewer engine (#2)
Build & push image / build-and-push (push) Successful in 28s
Phase 1: a second review engine alongside the majordomo agent loop. For
each lens, shell out to the Claude Code CLI (`claude -p --output-format
json`) inside the checked-out repo so it verifies findings with its own
read tools, then reuse gadfly's verdict-parse + recheck + consolidate +
emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`;
auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY;
read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node +
@anthropic-ai/claude-code. Also bumped the dogfood pin to the status-board
image (PR #2 was the first dogfood with the live board + full fleet).

Folded in the swarm's own review findings: minimal subprocess env (no
GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr),
process-group cleanup on timeout, rune-safe error truncation, and
engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings
graded via the gadfly MCP.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 20:40:41 +00:00
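
One of the fixes listed in 86f12c126f is rune-safe error truncation. The sketch below shows the general technique: cut on a rune boundary rather than a byte offset so a multi-byte UTF-8 character is never split. The name truncateRunes and its signature are hypothetical; gadfly's own helper may differ.

```go
package main

import "fmt"

// truncateRunes (hypothetical name) shortens s to at most limit runes.
// It slices at the byte offset where a rune starts, so the result never
// ends in the middle of a multi-byte character.
func truncateRunes(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	n := 0
	for i := range s { // i is the byte offset of each rune start
		if n == limit {
			return s[:i]
		}
		n++
	}
	return s // s has limit runes or fewer
}

func main() {
	msg := "héllo wörld"
	fmt.Println(truncateRunes(msg, 2))
	// A naive byte slice can cut "é" in half and produce invalid UTF-8:
	fmt.Printf("%q\n", msg[:2])
}
```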
steve c3d09d3bd4 feat: live status-board comment + full-fleet dogfood (#1)
Build & push image / build-and-push (push) Successful in 6s
Phase 3: one consolidated, live-updating PR comment aggregating every
model's per-lens progress (queued -> running -> finished + verdict), so
the swarm's progress is visible at a glance and a watcher can tell when
it's done. Opt-in statusWriter in the binary (atomic writes) + a
background status-board.sh renderer wired through entrypoint.sh; default
on, GADFLY_STATUS_BOARD=0 to disable.

Also restores gadfly's dogfood swarm to the full cloud fleet (9 cloud +
M5; M1 dropped as too slow) matching mort, and folds in the 3 real bugs
the swarm found on its own PR (skip-binary stuck-waiting, panic-stuck
lens, busy-loop on bad poll interval). All 36 findings graded via the
gadfly MCP (18 real / 18 false-positive).

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 19:00:12 +00:00
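
The log for c3d09d3bd4 says the statusWriter uses atomic writes but does not say how. A common way to get that property, shown here purely as an assumption, is to write a temporary file in the same directory and rename it over the target, so a reader such as a background renderer never sees a half-written file. writeAtomic, demo and the status.json file name are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic (hypothetical helper) writes data to a temp file next to
// path and then renames it into place. On POSIX filesystems the rename
// replaces the target in a single step.
func writeAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo walks one lens through the states named in the commit message
// and returns what a reader would see at the end.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "status-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "status.json") // file name is made up
	for _, state := range []string{"queued", "running", "finished"} {
		if err := writeAtomic(path, []byte(state)); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	state, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(state)
}
```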
steve 0ad5b66170 ci: dogfood — gadfly reviews its own PRs (mort's full-fleet setup)
Build & push image / build-and-push (push) Successful in 14s
Adds the adversarial-review workflow to gadfly itself (copied from mort: 3 cloud + m1/m5 via foreman, findings telemetry, sha-d7f364d). Future gadfly PRs get reviewed by the swarm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 13:26:37 -04:00
Steve Dudenhoeffer 676c9d4f07 ci: skip image rebuild on docs/example-only changes (paths-ignore)
Build & push image / build-and-push (push) Successful in 5s
Tag pushes (v*) bypass path filters, so releases always build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:10:24 -04:00
Steve Dudenhoeffer 6123604595 ci: auto build & push image on main (:latest) + v* tags
Build & push image / build-and-push (push) Successful in 58s
Mirror mort-ci.yml's build-and-push: BuildKit secrets (REGISTRY_USER/
REGISTRY_PASSWORD) for private majordomo access instead of build-args, and the
LAN --add-host so the builder can reach the registry. push main -> :latest +
:sha-<short>; tag v* -> :<tag> + :latest; other branches -> :branch-<safe>;
PRs build-only (no push). Optional DISCORD_WEBHOOK_URL notifications.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 18:45:48 -04:00
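
The tagging rules in 6123604595 can be restated as a small function. The real logic lives in the workflow, not in Go, so this is only an illustration: imageTags and safeName are hypothetical names, the 7-character short SHA is inferred from pins such as :sha-c342bdb elsewhere in the log, and the branch sanitizing rule (anything outside letters, digits, '.', '_' and '-' becomes '-') is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// safeName replaces characters that are unsafe in an image tag with '-'.
// The exact rule the workflow uses is not in the log; this one is assumed.
func safeName(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9',
			r == '.', r == '_', r == '-':
			b.WriteRune(r)
		default:
			b.WriteByte('-')
		}
	}
	return b.String()
}

// imageTags returns the tags to push for an event, or nil for build-only.
func imageTags(event, ref, sha string) []string {
	if event == "pull_request" {
		return nil // PRs build but do not push
	}
	if tag, ok := strings.CutPrefix(ref, "refs/tags/"); ok && strings.HasPrefix(tag, "v") {
		return []string{tag, "latest"}
	}
	branch, ok := strings.CutPrefix(ref, "refs/heads/")
	if !ok {
		return nil
	}
	if branch == "main" {
		if len(sha) > 7 {
			sha = sha[:7]
		}
		return []string{"latest", "sha-" + sha}
	}
	return []string{"branch-" + safeName(branch)}
}

func main() {
	fmt.Println(imageTags("push", "refs/heads/main", "c342bdb905"))
	fmt.Println(imageTags("push", "refs/tags/v1", "c342bdb905"))
	fmt.Println(imageTags("push", "refs/heads/feat/x", "c342bdb905"))
	fmt.Println(imageTags("pull_request", "refs/heads/feat/x", "c342bdb905"))
}
```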
Steve Dudenhoeffer c0d0152a34 Gadfly: agentic adversarial PR reviewer (initial extraction)
Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in
Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/
find_files/get_diff), verifies findings before reporting, two-pass review +
adversarial recheck, posts one labeled comment per model. Advisory only.

- cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo
- entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML)
- Dockerfile: multi-stage; build-time module token never reaches the final image
- .gitea/workflows/build-image.yml: tag v* → build & push image
- examples/: ~15-line consumer stub
- system prompt genericized + hardened to re-derive constants/formulas (semantic bugs)

Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 18:42:20 -04:00