Commit Graph

13 Commits

Author SHA1 Message Date
steve b860119332 feat: re-platform agentic review onto executus + large-PR cost controls
Build & push image / build-and-push (pull_request) Successful in 11s
Adversarial Review (Gadfly) / review (pull_request) Successful in 25m56s
Fixes the large-PR token burn: a ~250K-token diff was re-sent every agent
step across models × lenses × passes, draining a metered usage block in
minutes. Small PRs are untouched (every mitigation is size-gated / no-op
under threshold).

- Re-platform the in-process review path onto executus run.Executor: context
  compaction (executus/compact, threshold from the model's real context window
  via executus/model), run-bounding, a per-PR budget gate (Ports.Budget), and
  the wrap-up nudge re-expressed as a run.Critic. Lens fan-out now uses
  executus/fanout. gadfly keeps its own model.go, so GADFLY_ENDPOINT_<NAME>
  aliases and the claude-code engine are unaffected. No majordomo bump; the
  binary stays static (executus core is majordomo+stdlib only).
- Paginate get_diff (per-file `path` + start_line/limit) instead of dumping the
  whole diff; trim the recheck diff embed (60k -> 20k chars).
- entrypoint.sh: downshift the fleet above GADFLY_HUGE_DIFF_BYTES (one cheap
  model, fewer lenses/steps, no recheck) + a swarm-wide GADFLY_PR_BUDGET_SECS
  wall-clock backstop (adds procps for pkill). All advisory; CI never fails.
- README + CLAUDE.md + tests updated.

Note: run.Result exposes no transcript, so the old transcript-based forced-
finalization fallback is dropped; the wrap-up critic nudge is the remaining
"always emit something" mechanism.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 10:43:34 -04:00
steve 7bc3c982fa feat(reusable): runtime-variable swarm config (cache-immune, no more re-pinning to retune) (#14)
Build & push image / build-and-push (push) Successful in 5s
2026-06-28 06:00:18 +00:00
steve 95a9ec546a feat(reusable): add the 4090 Ti (qwen3.6-27b via llama-swap) to the default swarm (#13)
Build & push image / build-and-push (push) Successful in 7s
2026-06-28 05:01:50 +00:00
steve 0d80ae73d8 tune(reusable): claude-code=3 models × 5 lenses (claude was the bottleneck) (#11)
Build & push image / build-and-push (push) Successful in 8s
2026-06-28 04:02:17 +00:00
steve b02b11d691 feat(reusable): ship the curated swarm as the default config consumers inherit (#10)
Build & push image / build-and-push (push) Successful in 8s
2026-06-28 02:23:40 +00:00
Steve Dudenhoeffer daff6d08a1 docs: drop stale 'secrets: inherit' mentions (reusable comment + CLAUDE.md)
Build & push image / build-and-push (pull_request) Successful in 6s
Self-review on PR #9 flagged two doc-drift spots left over from the
explicit-secret-forwarding switch. Cosmetic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 21:00:40 -04:00
steve 5f86062a5a feat: Phase 4 — reusable "subscribe" workflow (+ dogfood it) (#8)
Build & push image / build-and-push (push) Successful in 9s
Centralizes the consumer stub into a reusable Gitea workflow
(.gitea/workflows/review-reusable.yml, workflow_call + defaulted inputs +
secrets: inherit); gadfly's own dogfood is now a thin caller of it, which
proved end-to-end that github.event context propagates into the reusable
on this act_runner. Adds the slim examples/reusable.yml stub + docs.

Folded in the swarm's findings: timeout_minutes default 30->45, map
GADFLY_API_KEY, explicit permissions block, drop the dead specialist_suite
input, and harden the example's actor gate. ~70 findings graded.

Completes the gadfly-games build (Phases 1-4 + quality fixes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 23:42:01 +00:00
steve 86f12c126f feat: claude-code reviewer engine (#2)
Build & push image / build-and-push (push) Successful in 28s
Phase 1: a second review engine alongside the majordomo agent loop. For
each lens, shell out to the Claude Code CLI (`claude -p --output-format
json`) inside the checked-out repo so it verifies findings with its own
read tools, then reuse gadfly's verdict-parse + recheck + consolidate +
emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`;
auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY;
read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node +
@anthropic-ai/claude-code. Also bumped the dogfood pin to the status-board
image (PR #2 was the first dogfood with the live board + full fleet).

Folded in the swarm's own review findings: minimal subprocess env (no
GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr),
process-group cleanup on timeout, rune-safe error truncation, and
engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings
graded via the gadfly MCP.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 20:40:41 +00:00
steve c3d09d3bd4 feat: live status-board comment + full-fleet dogfood (#1)
Build & push image / build-and-push (push) Successful in 6s
Phase 3: one consolidated, live-updating PR comment aggregating every
model's per-lens progress (queued -> running -> finished + verdict), so
the swarm's progress is visible at a glance and a watcher can tell when
it's done. Opt-in statusWriter in the binary (atomic writes) + a
background status-board.sh renderer wired through entrypoint.sh; default
on, GADFLY_STATUS_BOARD=0 to disable.

Also restores gadfly's dogfood swarm to the full cloud fleet (9 cloud +
M5; M1 dropped as too slow) matching mort, and folds in the 3 real bugs
the swarm found on its own PR (skip-binary stuck-waiting, panic-stuck
lens, busy-loop on bad poll interval). All 36 findings graded via the
gadfly MCP (18 real / 18 false-positive).

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 19:00:12 +00:00
Steve Dudenhoeffer 9e582bfaca feat: per-provider concurrency lanes (cloud parallel while local churns)
Build & push image / build-and-push (push) Successful in 7s
entrypoint.sh groups models by provider into lanes that run in PARALLEL; within
a lane at most `cap` models run at once. cap = GADFLY_PROVIDER_CONCURRENCY map
("ollama-cloud=3,m1pro=1") else GADFLY_CONCURRENCY (default 1). So a single
local box stays serial (1 at a time) while cloud models run several at once and
both lanes progress simultaneously. Portable bash (no associative arrays).
Default cap 1 keeps a single-provider pool sequential as before. Pairs with the
per-lens timeout so a slow lane can't starve others. Docs: README Concurrency
section + config table; CLAUDE.md lessons incl. the docker://:latest cache gotcha.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 20:29:08 -04:00
Steve Dudenhoeffer 4b8f9aa39b feat: dynamic auto specialist selection + worker-tier delegation
Build & push image / build-and-push (push) Successful in 33s
Two Phase-2 swarm upgrades:

- auto.go: GADFLY_SPECIALISTS=auto routes the review — a selector model
  (GADFLY_SELECTOR_MODEL, else the review model) reads the changed files + PR
  description and picks the smallest relevant lens set from the catalog, and may
  propose ad-hoc lenses for gaps (e.g. migrations). Structured output via
  majordomo.Generate[T]; capped + de-duped; falls back to the default suite.
- delegate.go: GADFLY_WORKER_MODEL adds a delegate_investigation tool so the
  reviewer offloads mechanical legwork (trace callers, gather usages) to a cheap
  worker sub-agent that returns an evidence-cited digest — the top model reasons
  over summaries, not raw file dumps. Workers get an fs-only toolbox (no
  sub-delegation). Unset = off.

resolveSpecialists now also returns the registry + an auto flag. Docs (README
Specialists + config table, CLAUDE.md, main.go header) + tests updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:35:59 -04:00
Steve Dudenhoeffer 7809d1b93d feat: specialist suite — configurable + custom review lenses (one consolidated comment)
Build & push image / build-and-push (push) Successful in 8s
Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).

- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
  maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
  improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
  GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
  Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
  Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
  focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
  specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
  CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:23:05 -04:00
Steve Dudenhoeffer 04cd260ff9 docs: add CLAUDE.md + provider example configs
Build & push image / build-and-push (push) Successful in 6s
- CLAUDE.md: project goals (advisory-only, real-bugs-not-nits, easy-to-enable,
  provider-agnostic, portable), architecture map, build/test/release, and
  maintenance rules — incl. "keep README + examples/ current with any env/flag/
  provider/trigger change" and the advisory-only invariant.
- examples/: local-ollama.yml, openai-compatible.yml, endpoint-aliases.yml +
  an examples/README index; README setup step points at them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:06:08 -04:00