feat: dynamic auto specialist selection + worker-tier delegation

Two Phase-2 swarm upgrades: - auto.go: GADFLY_SPECIALISTS=auto routes the review — a selector model (GADFLY_SELECTOR_MODEL, else the review model) reads the changed files + PR description and picks the smallest relevant lens set from the catalog, and may propose ad-hoc lenses for gaps (e.g. migrations). Structured output via majordomo.Generate[T]; capped + de-duped; falls back to the default suite. - delegate.go: GADFLY_WORKER_MODEL adds a delegate_investigation tool so the reviewer offloads mechanical legwork (trace callers, gather usages) to a cheap worker sub-agent that returns an evidence-cited digest — the top model reasons over summaries, not raw file dumps. Workers get an fs-only toolbox (no sub-delegation). Unset = off. resolveSpecialists now also returns the registry + an auto flag. Docs (README Specialists + config table, CLAUDE.md, main.go header) + tests updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:35:59 -04:00
parent 7809d1b93d
commit 4b8f9aa39b
9 changed files with 370 additions and 26 deletions
@@ -138,9 +138,21 @@ define:
    focus: "Review schema migrations for destructive ops, missing indexes, table locks."
 ```

+**Dynamic selection (`auto`):** set `GADFLY_SPECIALISTS: auto` and a selector model reads the
+changed files + PR description and picks only the lenses that materially apply (and may invent
+an ad-hoc one — e.g. a "migrations" lens for a schema change). The selector is
+`GADFLY_SELECTOR_MODEL` if set (a cheap tier is ideal), else the review model. Capped and
+de-duplicated; falls back to the default suite if selection fails.
+
+**Worker-tier delegation:** set `GADFLY_WORKER_MODEL` (a cheap/fast model) to give every
+reviewer a `delegate_investigation` tool — it offloads mechanical legwork (trace all callers,
+gather every usage, check a pattern across files) to a worker sub-agent that returns a concise,
+evidence-cited digest, so the expensive model reasons over summaries instead of raw file dumps.
+Unset = no delegation (current behavior).
+
 > **Cost:** each specialist is its own review+recheck, so cost ≈ *specialists × models × 2*.
-> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, and a future
-> `auto` mode will let a cheap model pick only the lenses a given diff actually needs.
+> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, let `auto` pick
+> only what a diff needs, and point heavy legwork at a cheap `GADFLY_WORKER_MODEL`.

 ### Triggers

@@ -178,6 +190,10 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
 | `GADFLY_PROVIDER` | `ollama-cloud` | provider prefix for a bare model id |
 | `GADFLY_BASE_URL` | — | override endpoint (OpenAI/Ollama-compatible servers) |
 | `GADFLY_API_KEY` | — | provider key; falls back to the provider's standard env |
+| `GADFLY_SPECIALISTS` | default suite | csv of lenses, `all`, or `auto` (dynamic selection) |
+| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
+| `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |
+| `GADFLY_WORKER_MAX_STEPS` | 8 | tool-step cap for a delegated worker run |
 | `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
 | `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
 | `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |