feat: re-platform agentic review onto executus + large-PR cost controls
Build & push image / build-and-push (pull_request) Successful in 11s
Adversarial Review (Gadfly) / review (pull_request) Successful in 25m56s

Fixes the large-PR token burn: a ~250K-token diff was re-sent every agent
step across models × lenses × passes, draining a metered usage block in
minutes. Small PRs are untouched (every mitigation is size-gated / no-op
under threshold).

- Re-platform the in-process review path onto executus run.Executor: context
  compaction (executus/compact, threshold from the model's real context window
  via executus/model), run-bounding, a per-PR budget gate (Ports.Budget), and
  the wrap-up nudge re-expressed as a run.Critic. Lens fan-out now uses
  executus/fanout. gadfly keeps its own model.go, so GADFLY_ENDPOINT_<NAME>
  aliases and the claude-code engine are unaffected. No majordomo bump; the
  binary stays static (executus core is majordomo+stdlib only).
- Paginate get_diff (per-file `path` + start_line/limit) instead of dumping the
  whole diff; trim the recheck diff embed (60k -> 20k chars).
- entrypoint.sh: downshift the fleet above GADFLY_HUGE_DIFF_BYTES (one cheap
  model, fewer lenses/steps, no recheck) + a swarm-wide GADFLY_PR_BUDGET_SECS
  wall-clock backstop (adds procps for pkill). All advisory; CI never fails.
- README + CLAUDE.md + tests updated.

Note: run.Result exposes no transcript, so the old transcript-based forced-
finalization fallback is dropped; the wrap-up critic nudge is the remaining
"always emit something" mechanism.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-30 10:43:34 -04:00
parent 5007597cf9
commit b860119332
19 changed files with 935 additions and 224 deletions
+29 -5
View File
@@ -22,15 +22,21 @@ verifies each one against the actual code, and posts its findings as a comment.
4. **Provider-agnostic.** Powered by [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo),
so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any
OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib + majordomo. Keep
it that way — no heavyweight deps, no coupling to any one consumer repo (e.g. mort).
5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib, majordomo, and
[executus](https://gitea.stevedudenhoeffer.com/steve/executus) (whose *core*`run`/`compact`/
`model`/`fanout`/`tool` — is itself majordomo+stdlib only, so the binary stays static; do NOT
pull executus's `contrib/store` or any battery that drags in a DB driver). No heavyweight deps,
no coupling to any one consumer repo (e.g. mort). Gadfly is executus's canonical *light* consumer.
## Architecture
```
cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout)
main.go orchestration: loop specialists, each a review pass + adversarial recheck
engine.go reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
main.go orchestration: fan specialists out (executus/fanout), each a review pass + recheck
engine.go reviewEngine abstraction: executus run.Executor (majordomo agent loop +
compaction/bounding/budget/critic) vs claude-code CLI shell-out
executus.go executus wiring: tool.Registry over the repo tools, the run.Executor build
(compact + model context-limit threshold + per-PR budget + wrap-up critic)
specialists.go specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
auto.go dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
delegate.go worker-tier delegate_investigation tool (cheap sub-agent does legwork)
@@ -69,7 +75,7 @@ verdict. Verdict is one of: `No material issues found` / `Minor issues` / `Block
## Build / test
```sh
go build ./cmd/gadfly # needs read access to the private majordomo module
go build ./cmd/gadfly # needs read access to the private majordomo + executus modules
go test ./...
gofmt -l cmd/ # must be clean
docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .
@@ -149,3 +155,21 @@ are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstractio
parallel, `cap` (from `GADFLY_PROVIDER_CONCURRENCY` else `GADFLY_CONCURRENCY`, default 1) bounds
models-at-once within a lane. The review timeout (`GADFLY_TIMEOUT_SECS`) is **per-lens**, not
shared across the suite — a slow model can't starve later lenses (the original timeout bug).
- **Large-PR token burn**: the agent loop re-sends the whole transcript every step, so a giant
diff (the old `get_diff` dumped it untruncated, and it was embedded in both the review and
recheck task) was re-transmitted ~steps × lenses × passes × models times — a ~250 K-token PR
could drain a metered usage block in minutes. Fixed in three size-gated layers (small PRs
untouched): paginated `get_diff` + `executus/compact` compaction in the binary; an
`entrypoint.sh` downshift above `GADFLY_HUGE_DIFF_BYTES` (one cheap model, fewer lenses/steps,
no recheck); and a swarm-wide `GADFLY_PR_BUDGET_SECS` wall-clock backstop. Compaction's threshold
is intentionally LOW (`GADFLY_COMPACT_RATIO` 0.45, not executus's 0.7) because the burning
transcript on the embedded path rarely reaches 0.7×context.
- **executus re-platform**: the in-process review path runs through `executus/run`'s `run.Executor`
(compaction, run-bounding, `Ports.Budget`, the wrap-up nudge as `Ports.Critic`), wiring it in
`cmd/gadfly/executus.go`. Gadfly KEEPS its own `model.go` resolution (so `GADFLY_ENDPOINT_<NAME>`
http aliases + the claude-code engine survive) and only hands `run.Executor` the already-resolved
model via a trivial resolver — do NOT route review-model resolution through
`model.ParseModelForContext` (it bypasses gadfly's endpoint aliases). `run.Result` exposes no
transcript, so the old transcript-based forced-finalization fallback is gone; the wrap-up critic
nudge is the remaining "always emit something" mechanism. The claude-code engine still shells out
and is unaffected.