feat: re-platform agentic review onto executus + large-PR cost controls (#20)
Build & push image / build-and-push (push) Successful in 33s

Makes gadfly a consumer of executus (run.Executor compaction/bounding/budget/critic + fanout) and fixes the large-PR token burn in size-gated layers: paginated get_diff, downshift above GADFLY_HUGE_DIFF_BYTES, and a swarm-wide GADFLY_PR_BUDGET_SECS backstop. Small PRs untouched; advisory-only and the static binary preserved. Dogfood swarm reviewed it (6 models, 21 real findings graded + folded in).

Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
This commit was merged in pull request #20.
This commit is contained in:
2026-06-30 15:41:03 +00:00
committed by steve
parent 5007597cf9
commit ac6ce06cdd
19 changed files with 1065 additions and 249 deletions
+29 -5
View File
@@ -22,15 +22,21 @@ verifies each one against the actual code, and posts its findings as a comment.
4. **Provider-agnostic.** Powered by [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo),
so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any
OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib + majordomo. Keep
it that way — no heavyweight deps, no coupling to any one consumer repo (e.g. mort).
5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib, majordomo, and
[executus](https://gitea.stevedudenhoeffer.com/steve/executus) (whose *core*`run`/`compact`/
`model`/`fanout`/`tool` — is itself majordomo+stdlib only, so the binary stays static; do NOT
pull executus's `contrib/store` or any battery that drags in a DB driver). No heavyweight deps,
no coupling to any one consumer repo (e.g. mort). Gadfly is executus's canonical *light* consumer.
## Architecture
```
cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout)
main.go orchestration: loop specialists, each a review pass + adversarial recheck
engine.go reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
main.go orchestration: fan specialists out (executus/fanout), each a review pass + recheck
engine.go reviewEngine abstraction: executus run.Executor (majordomo agent loop +
compaction/bounding/budget/critic) vs claude-code CLI shell-out
executus.go executus wiring: tool.Registry over the repo tools, the run.Executor build
(compact + model context-limit threshold + per-PR budget + wrap-up critic)
specialists.go specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
auto.go dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
delegate.go worker-tier delegate_investigation tool (cheap sub-agent does legwork)
@@ -69,7 +75,7 @@ verdict. Verdict is one of: `No material issues found` / `Minor issues` / `Block
## Build / test
```sh
go build ./cmd/gadfly # needs read access to the private majordomo module
go build ./cmd/gadfly # needs read access to the private majordomo + executus modules
go test ./...
gofmt -l cmd/ # must be clean
docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .
@@ -149,3 +155,21 @@ are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstractio
parallel, `cap` (from `GADFLY_PROVIDER_CONCURRENCY` else `GADFLY_CONCURRENCY`, default 1) bounds
models-at-once within a lane. The review timeout (`GADFLY_TIMEOUT_SECS`) is **per-lens**, not
shared across the suite — a slow model can't starve later lenses (the original timeout bug).
- **Large-PR token burn**: the agent loop re-sends the whole transcript every step, so a giant
diff (the old `get_diff` dumped it untruncated, and it was embedded in both the review and
recheck task) was re-transmitted ~steps × lenses × passes × models times — a ~250 K-token PR
could drain a metered usage block in minutes. Fixed in three size-gated layers (small PRs
untouched): paginated `get_diff` + `executus/compact` compaction in the binary; an
`entrypoint.sh` downshift above `GADFLY_HUGE_DIFF_BYTES` (one cheap model, fewer lenses/steps,
no recheck); and a swarm-wide `GADFLY_PR_BUDGET_SECS` wall-clock backstop. Compaction's threshold
is intentionally LOW (`GADFLY_COMPACT_RATIO` 0.45, not executus's 0.7) because the burning
transcript on the embedded path rarely reaches 0.7×context.
- **executus re-platform**: the in-process review path runs through `executus/run`'s `run.Executor`
(compaction, run-bounding, `Ports.Budget`, the wrap-up nudge as `Ports.Critic`), wiring it in
`cmd/gadfly/executus.go`. Gadfly KEEPS its own `model.go` resolution (so `GADFLY_ENDPOINT_<NAME>`
http aliases + the claude-code engine survive) and only hands `run.Executor` the already-resolved
model via a trivial resolver — do NOT route review-model resolution through
`model.ParseModelForContext` (it bypasses gadfly's endpoint aliases). `run.Result` exposes no
transcript, so the old transcript-based forced-finalization fallback is gone; the wrap-up critic
nudge is the remaining "always emit something" mechanism. The claude-code engine still shells out
and is unaffected.