Commit Graph

11 Commits

Author SHA1 Message Date
Steve Dudenhoeffer 4654036dea docs: reconcile examples/README + CLAUDE.md with the heavier reusable default
Build & push image / build-and-push (pull_request) Successful in 6s
From PR #10's own review (maintainability/perf lenses): examples/README.md
hadn't been updated for the default swarm, and CLAUDE.md's 'keep the default
model count low' cost guidance read as contradicting the new heavy default.
Clarify that the IMAGE default stays minimal while the REUSABLE ships an
opinionated heavier default consumers inherit/override.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 22:18:43 -04:00
Steve Dudenhoeffer f882b006d1 tune(reusable): claude-code=1 model x 5 lenses (cap peak at 5 concurrent)
Build & push image / build-and-push (pull_request) Successful in 7s
Run claude models one at a time (provider_concurrency claude-code=1) but each
with all 5 lenses concurrent (provider_lens_concurrency claude-code=5) — peak 5
concurrent claude -p per pass instead of 15, friendlier to one subscription.
Updated all the 'three claudes at once' wording across the workflow + docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 22:12:17 -04:00
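The arithmetic behind this tune can be sketched in a few lines of Go. `peakConcurrent` is a hypothetical helper, not a function in gadfly; it assumes the peak is simply the number of models running at once multiplied by the number of lenses each model runs concurrently, which matches the 15 and 5 quoted in the commit message.

```go
package main

import "fmt"

// peakConcurrent returns the most `claude -p` processes that can be alive at
// once in a single pass: models running at once times lenses per model.
// Hypothetical helper for illustration only.
func peakConcurrent(modelsAtOnce, lensesAtOnce int) int {
	return modelsAtOnce * lensesAtOnce
}

func main() {
	// Before: claude-code=3, each model fanning out over all 5 lenses.
	fmt.Println("before:", peakConcurrent(3, 5))
	// After: claude-code=1 model at a time, still 5 lenses concurrent.
	fmt.Println("after:", peakConcurrent(1, 5))
}
```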
Steve Dudenhoeffer 79da1bfde3 feat(reusable): ship the curated swarm as the default config consumers inherit
Build & push image / build-and-push (pull_request) Successful in 8s
Adversarial Review (Gadfly) / review (pull_request) Successful in 14m47s
Make the reusable workflow's input defaults BE the standard Gadfly swarm so a
consumer subscribes by just calling it (no `with:` block) and inherits:
- models: 3 strong cloud (minimax-m3, glm-5.2, deepseek-v4-pro) + Claude Code
  (sonnet, opus, opus:max)
- specialists: the 5-lens default suite (security, correctness, maintainability,
  performance, error-handling)
- provider_concurrency: ollama-cloud=3,claude-code=3 (all three claudes at once)
- timeout_minutes default 45 -> 90 (5 lenses x 2 passes over a slow lane)

The default is opinionated (needs OLLAMA_CLOUD_API_KEY + CLAUDE_CODE_OAUTH_TOKEN);
consumers override `models:` for cloud-only / other providers. gadfly's own
caller is slimmed to inherit (only allowed_users remains). examples/reusable.yml
keeps a cloud-only `models:` override so a public copy works with just the
Ollama key. README/CLAUDE.md updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 22:05:31 -04:00
Steve Dudenhoeffer daff6d08a1 docs: drop stale 'secrets: inherit' mentions (reusable comment + CLAUDE.md)
Build & push image / build-and-push (pull_request) Successful in 6s
Self-review on PR #9 flagged two doc-drift spots left over from the
explicit-secret-forwarding switch. Cosmetic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 21:00:40 -04:00
steve 5f86062a5a feat: Phase 4 — reusable "subscribe" workflow (+ dogfood it) (#8)
Build & push image / build-and-push (push) Successful in 9s
Centralizes the consumer stub into a reusable Gitea workflow
(.gitea/workflows/review-reusable.yml, workflow_call + defaulted inputs +
secrets: inherit); gadfly's own dogfood is now a thin caller of it, which
proved end-to-end that github.event context propagates into the reusable
on this act_runner. Adds the slim examples/reusable.yml stub + docs.

Folded in the swarm's findings: timeout_minutes default 30->45, map
GADFLY_API_KEY, explicit permissions block, drop the dead specialist_suite
input, and harden the example's actor gate. ~70 findings graded.

Completes the gadfly-games build (Phases 1-4 + quality fixes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 23:42:01 +00:00
steve 86f12c126f feat: claude-code reviewer engine (#2)
Build & push image / build-and-push (push) Successful in 28s
Phase 1: a second review engine alongside the majordomo agent loop. For
each lens, shell out to the Claude Code CLI (`claude -p --output-format
json`) inside the checked-out repo so it verifies findings with its own
read tools, then reuse gadfly's verdict-parse + recheck + consolidate +
emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`;
auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY;
read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node +
@anthropic-ai/claude-code. Also bumped the dogfood pin to the status-board
image (PR #2 was the first dogfood with the live board + full fleet).

Folded in the swarm's own review findings: minimal subprocess env (no
GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr),
process-group cleanup on timeout, rune-safe error truncation, and
engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings
graded via the gadfly MCP.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 20:40:41 +00:00
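One of the folded-in fixes, rune-safe error truncation, might look roughly like the sketch below. `truncateRunes` is a hypothetical name and the exact limit and any suffix gadfly uses are not stated in the commit; the point is only that cutting on a rune boundary avoids emitting a half-written UTF-8 sequence.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateRunes returns s cut to at most max runes. It slices on a rune
// boundary, so a multi-byte character is never split in half.
// Hypothetical helper; not gadfly's actual function.
func truncateRunes(s string, max int) string {
	if max <= 0 {
		return ""
	}
	if utf8.RuneCountInString(s) <= max {
		return s
	}
	n := 0
	// Ranging over a string yields the byte offset where each rune starts.
	for i := range s {
		if n == max {
			return s[:i]
		}
		n++
	}
	return s
}

func main() {
	fmt.Println(truncateRunes("claude exited: 日本語のエラー", 18))
}
```

A plain byte slice such as `s[:max]` would be shorter to write, but it can cut through the middle of a multi-byte character and leave invalid UTF-8 in the posted comment.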
steve c3d09d3bd4 feat: live status-board comment + full-fleet dogfood (#1)
Build & push image / build-and-push (push) Successful in 6s
Phase 3: one consolidated, live-updating PR comment aggregating every
model's per-lens progress (queued -> running -> finished + verdict), so
the swarm's progress is visible at a glance and a watcher can tell when
it's done. Opt-in statusWriter in the binary (atomic writes) + a
background status-board.sh renderer wired through entrypoint.sh; default
on, GADFLY_STATUS_BOARD=0 to disable.

Also restores gadfly's dogfood swarm to the full cloud fleet (9 cloud +
M5; M1 dropped as too slow) matching mort, and folds in the 3 real bugs
the swarm found on its own PR (skip-binary stuck-waiting, panic-stuck
lens, busy-loop on bad poll interval). All 36 findings graded via the
gadfly MCP (18 real / 18 false-positive).

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 19:00:12 +00:00
Steve Dudenhoeffer 9e582bfaca feat: per-provider concurrency lanes (cloud parallel while local churns)
Build & push image / build-and-push (push) Successful in 7s
entrypoint.sh groups models by provider into lanes that run in PARALLEL; within
a lane at most `cap` models run at once. `cap` comes from the GADFLY_PROVIDER_CONCURRENCY
map ("ollama-cloud=3,m1pro=1"), falling back to GADFLY_CONCURRENCY (default 1). So a single
local box stays serial (1 at a time) while cloud models run several at once and
both lanes progress simultaneously. Portable bash (no associative arrays).
Default cap 1 keeps a single-provider pool sequential as before. Pairs with the
per-lens timeout so a slow lane can't starve others. Docs: README Concurrency
section + config table; CLAUDE.md lessons incl. the docker://:latest cache gotcha.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 20:29:08 -04:00
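The lane model described above lives in portable bash in entrypoint.sh; the Go sketch below only illustrates the same scheduling idea. It assumes models are already grouped by provider, and every function name and model name in it is hypothetical. Lanes run in parallel, and a per-lane semaphore enforces the cap taken from the `provider=N` map, with the global value (default 1) as the fallback.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// parseCaps parses a "provider=N,provider=N" map such as
// "ollama-cloud=3,m1pro=1". Malformed or non-positive entries are skipped.
func parseCaps(spec string) map[string]int {
	caps := make(map[string]int)
	for _, pair := range strings.Split(spec, ",") {
		name, val, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(strings.TrimSpace(val))
		if err != nil || n < 1 {
			continue
		}
		caps[strings.TrimSpace(name)] = n
	}
	return caps
}

// capFor returns the per-provider cap, else the global fallback, else 1.
func capFor(caps map[string]int, provider string, fallback int) int {
	if n, ok := caps[provider]; ok {
		return n
	}
	if fallback < 1 {
		return 1
	}
	return fallback
}

// runLanes starts every lane at once; within a lane a buffered channel acts
// as a semaphore so at most cap models are under review at any moment.
func runLanes(lanes map[string][]string, caps map[string]int, fallback int, review func(provider, model string)) {
	var wg sync.WaitGroup
	for provider, models := range lanes {
		sem := make(chan struct{}, capFor(caps, provider, fallback))
		for _, model := range models {
			wg.Add(1)
			go func(provider, model string) {
				defer wg.Done()
				sem <- struct{}{}        // take a slot in this lane
				defer func() { <-sem }() // release it when the review ends
				review(provider, model)
			}(provider, model)
		}
	}
	wg.Wait()
}

func main() {
	caps := parseCaps("ollama-cloud=3,m1pro=1")
	lanes := map[string][]string{
		"ollama-cloud": {"cloud-a", "cloud-b", "cloud-c", "cloud-d"}, // hypothetical models
		"m1pro":        {"local-a", "local-b"},
	}
	runLanes(lanes, caps, 1, func(provider, model string) {
		fmt.Printf("reviewing with %s on %s\n", model, provider)
	})
}
```

With these caps the local lane stays serial while up to three cloud models run at once, and neither lane waits for the other.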
Steve Dudenhoeffer 4b8f9aa39b feat: dynamic auto specialist selection + worker-tier delegation
Build & push image / build-and-push (push) Successful in 33s
Two Phase-2 swarm upgrades:

- auto.go: GADFLY_SPECIALISTS=auto routes the review — a selector model
  (GADFLY_SELECTOR_MODEL, else the review model) reads the changed files + PR
  description and picks the smallest relevant lens set from the catalog, and may
  propose ad-hoc lenses for gaps (e.g. migrations). Structured output via
  majordomo.Generate[T]; capped + de-duped; falls back to the default suite.
- delegate.go: GADFLY_WORKER_MODEL adds a delegate_investigation tool so the
  reviewer offloads mechanical legwork (trace callers, gather usages) to a cheap
  worker sub-agent that returns an evidence-cited digest — the top model reasons
  over summaries, not raw file dumps. Workers get an fs-only toolbox (no
  sub-delegation). Unset = off.

resolveSpecialists now also returns the registry + an auto flag. Docs (README
Specialists + config table, CLAUDE.md, main.go header) + tests updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:35:59 -04:00
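The "capped + de-duped; falls back to the default suite" step for auto selection could be sketched as below. This is an illustration, not the code in auto.go: `normalizeSelection` is a hypothetical name, the cap value is not given in the commit, and the fallback is assumed to trigger when the selector returns nothing usable.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultSuite is the five-lens default named in the commit log.
var defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}

// normalizeSelection trims and lower-cases the selector's picks, drops blanks
// and duplicates, and keeps at most maxLenses of them (maxLenses < 1 means no
// cap). An empty result falls back to a copy of the default suite.
// Hypothetical helper for illustration only.
func normalizeSelection(picked []string, maxLenses int) []string {
	seen := make(map[string]bool)
	var out []string
	for _, name := range picked {
		name = strings.ToLower(strings.TrimSpace(name))
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, name)
		if len(out) == maxLenses {
			break
		}
	}
	if len(out) == 0 {
		return append([]string(nil), defaultSuite...)
	}
	return out
}

func main() {
	// "migrations" stands in for an ad-hoc lens proposed by the selector.
	fmt.Println(normalizeSelection([]string{"Security", " security ", "migrations", "docs"}, 2))
	fmt.Println(normalizeSelection(nil, 3))
}
```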
Steve Dudenhoeffer 7809d1b93d feat: specialist suite — configurable + custom review lenses (one consolidated comment)
Build & push image / build-and-push (push) Successful in 8s
Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).

- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
  maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
  improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
  GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
  Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
  Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
  focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
  specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
  CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:23:05 -04:00
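The consolidation rule in this commit (the comment is led by the worst verdict, and the optional `improvements` lens never escalates it) can be sketched as follows. The verdict names and types are hypothetical, since the commit does not list gadfly's actual verdict values; only the lens name `improvements` comes from the message.

```go
package main

import "fmt"

// Verdict is ordered from least to most severe. These three names are
// placeholders, not gadfly's real verdict set.
type Verdict int

const (
	Pass Verdict = iota
	Concerns
	Blocking
)

func (v Verdict) String() string {
	return [...]string{"pass", "concerns", "blocking"}[v]
}

// LensResult pairs a specialist lens with the verdict it reached.
type LensResult struct {
	Lens    string
	Verdict Verdict
}

// worstVerdict returns the most severe verdict across all lenses, skipping
// the "improvements" lens so it can never raise the headline verdict.
func worstVerdict(results []LensResult) Verdict {
	worst := Pass
	for _, r := range results {
		if r.Lens == "improvements" {
			continue
		}
		if r.Verdict > worst {
			worst = r.Verdict
		}
	}
	return worst
}

func main() {
	results := []LensResult{
		{"security", Pass},
		{"correctness", Concerns},
		{"improvements", Blocking}, // ignored when picking the headline
	}
	fmt.Println("headline verdict:", worstVerdict(results))
}
```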
Steve Dudenhoeffer 04cd260ff9 docs: add CLAUDE.md + provider example configs
Build & push image / build-and-push (push) Successful in 6s
- CLAUDE.md: project goals (advisory-only, real-bugs-not-nits, easy-to-enable,
  provider-agnostic, portable), architecture map, build/test/release, and
  maintenance rules — incl. "keep README + examples/ current with any env/flag/
  provider/trigger change" and the advisory-only invariant.
- examples/: local-ollama.yml, openai-compatible.yml, endpoint-aliases.yml +
  an examples/README index; README setup step points at them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:06:08 -04:00