feat: dynamic auto specialist selection + worker-tier delegation
Build & push image / build-and-push (push) Successful in 33s

Two Phase-2 swarm upgrades:

- auto.go: GADFLY_SPECIALISTS=auto routes the review — a selector model
  (GADFLY_SELECTOR_MODEL, else the review model) reads the changed files + PR
  description and picks the smallest relevant lens set from the catalog, and may
  propose ad-hoc lenses for gaps (e.g. migrations). Structured output via
  majordomo.Generate[T]; capped + de-duped; falls back to the default suite.
- delegate.go: GADFLY_WORKER_MODEL adds a delegate_investigation tool so the
  reviewer offloads mechanical legwork (trace callers, gather usages) to a cheap
  worker sub-agent that returns an evidence-cited digest — the top model reasons
  over summaries, not raw file dumps. Workers get an fs-only toolbox (no
  sub-delegation). Unset = off.

resolveSpecialists now also returns the registry + an auto flag. Docs (README
Specialists + config table, CLAUDE.md, main.go header) + tests updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Steve Dudenhoeffer
2026-06-25 19:35:59 -04:00
parent 7809d1b93d
commit 4b8f9aa39b
9 changed files with 370 additions and 26 deletions
+18 -2
View File
@@ -138,9 +138,21 @@ define:
focus: "Review schema migrations for destructive ops, missing indexes, table locks."
```
**Dynamic selection (`auto`):** set `GADFLY_SPECIALISTS: auto` and a selector model reads the
changed files + PR description and picks only the lenses that materially apply (and may invent
an ad-hoc one — e.g. a "migrations" lens for a schema change). The selector is
`GADFLY_SELECTOR_MODEL` if set (a cheap tier is ideal), else the review model. Capped and
de-duplicated; falls back to the default suite if selection fails.
**Worker-tier delegation:** set `GADFLY_WORKER_MODEL` (a cheap/fast model) to give every
reviewer a `delegate_investigation` tool — it offloads mechanical legwork (trace all callers,
gather every usage, check a pattern across files) to a worker sub-agent that returns a concise,
evidence-cited digest, so the expensive model reasons over summaries instead of raw file dumps.
Unset = no delegation (current behavior).
> **Cost:** each specialist is its own review+recheck, so cost ≈ *specialists × models × 2*.
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, and a future
> `auto` mode will let a cheap model pick only the lenses a given diff actually needs.
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, let `auto` pick
> only what a diff needs, and point heavy legwork at a cheap `GADFLY_WORKER_MODEL`.
### Triggers
@@ -178,6 +190,10 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
| `GADFLY_PROVIDER` | `ollama-cloud` | provider prefix for a bare model id |
| `GADFLY_BASE_URL` | — | override endpoint (OpenAI/Ollama-compatible servers) |
| `GADFLY_API_KEY` | — | provider key; falls back to the provider's standard env |
| `GADFLY_SPECIALISTS` | default suite | csv of lenses, `all`, or `auto` (dynamic selection) |
| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
| `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |
| `GADFLY_WORKER_MAX_STEPS` | 8 | tool-step cap for a delegated worker run |
| `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
| `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
| `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |