feat: re-platform agentic review onto executus + large-PR cost controls
Fixes the large-PR token burn: a ~250K-token diff was re-sent every agent step across models × lenses × passes, draining a metered usage block in minutes. Small PRs are untouched (every mitigation is size-gated / no-op under threshold). - Re-platform the in-process review path onto executus run.Executor: context compaction (executus/compact, threshold from the model's real context window via executus/model), run-bounding, a per-PR budget gate (Ports.Budget), and the wrap-up nudge re-expressed as a run.Critic. Lens fan-out now uses executus/fanout. gadfly keeps its own model.go, so GADFLY_ENDPOINT_<NAME> aliases and the claude-code engine are unaffected. No majordomo bump; the binary stays static (executus core is majordomo+stdlib only). - Paginate get_diff (per-file `path` + start_line/limit) instead of dumping the whole diff; trim the recheck diff embed (60k -> 20k chars). - entrypoint.sh: downshift the fleet above GADFLY_HUGE_DIFF_BYTES (one cheap model, fewer lenses/steps, no recheck) + a swarm-wide GADFLY_PR_BUDGET_SECS wall-clock backstop (adds procps for pkill). All advisory; CI never fails. - README + CLAUDE.md + tests updated. Note: run.Result exposes no transcript, so the old transcript-based forced- finalization fallback is dropped; the wrap-up critic nudge is the remaining "always emit something" mechanism. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -22,15 +22,21 @@ verifies each one against the actual code, and posts its findings as a comment.
|
||||
4. **Provider-agnostic.** Powered by [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo),
|
||||
so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any
|
||||
OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
|
||||
5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib + majordomo. Keep
|
||||
it that way — no heavyweight deps, no coupling to any one consumer repo (e.g. mort).
|
||||
5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib, majordomo, and
|
||||
[executus](https://gitea.stevedudenhoeffer.com/steve/executus) (whose *core* — `run`/`compact`/
|
||||
`model`/`fanout`/`tool` — is itself majordomo+stdlib only, so the binary stays static; do NOT
|
||||
pull executus's `contrib/store` or any battery that drags in a DB driver). No heavyweight deps,
|
||||
no coupling to any one consumer repo (e.g. mort). Gadfly is executus's canonical *light* consumer.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout)
|
||||
main.go orchestration: loop specialists, each a review pass + adversarial recheck
|
||||
engine.go reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
|
||||
main.go orchestration: fan specialists out (executus/fanout), each a review pass + recheck
|
||||
engine.go reviewEngine abstraction: executus run.Executor (majordomo agent loop +
|
||||
compaction/bounding/budget/critic) vs claude-code CLI shell-out
|
||||
executus.go executus wiring: tool.Registry over the repo tools, the run.Executor build
|
||||
(compact + model context-limit threshold + per-PR budget + wrap-up critic)
|
||||
specialists.go specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
|
||||
auto.go dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
|
||||
delegate.go worker-tier delegate_investigation tool (cheap sub-agent does legwork)
|
||||
@@ -69,7 +75,7 @@ verdict. Verdict is one of: `No material issues found` / `Minor issues` / `Block
|
||||
## Build / test
|
||||
|
||||
```sh
|
||||
go build ./cmd/gadfly # needs read access to the private majordomo module
|
||||
go build ./cmd/gadfly # needs read access to the private majordomo + executus modules
|
||||
go test ./...
|
||||
gofmt -l cmd/ # must be clean
|
||||
docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .
|
||||
@@ -149,3 +155,21 @@ are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstractio
|
||||
parallel, `cap` (from `GADFLY_PROVIDER_CONCURRENCY` else `GADFLY_CONCURRENCY`, default 1) bounds
|
||||
models-at-once within a lane. The review timeout (`GADFLY_TIMEOUT_SECS`) is **per-lens**, not
|
||||
shared across the suite — a slow model can't starve later lenses (the original timeout bug).
|
||||
- **Large-PR token burn**: the agent loop re-sends the whole transcript every step, so a giant
|
||||
diff (the old `get_diff` dumped it untruncated, and it was embedded in both the review and
|
||||
recheck task) was re-transmitted ~steps × lenses × passes × models times — a ~250 K-token PR
|
||||
could drain a metered usage block in minutes. Fixed in three size-gated layers (small PRs
|
||||
untouched): paginated `get_diff` + `executus/compact` compaction in the binary; an
|
||||
`entrypoint.sh` downshift above `GADFLY_HUGE_DIFF_BYTES` (one cheap model, fewer lenses/steps,
|
||||
no recheck); and a swarm-wide `GADFLY_PR_BUDGET_SECS` wall-clock backstop. Compaction's threshold
|
||||
is intentionally LOW (`GADFLY_COMPACT_RATIO` 0.45, not executus's 0.7) because the burning
|
||||
transcript on the embedded path rarely reaches 0.7×context.
|
||||
- **executus re-platform**: the in-process review path runs through `executus/run`'s `run.Executor`
|
||||
(compaction, run-bounding, `Ports.Budget`, the wrap-up nudge as `Ports.Critic`), wiring it in
|
||||
`cmd/gadfly/executus.go`. Gadfly KEEPS its own `model.go` resolution (so `GADFLY_ENDPOINT_<NAME>`
|
||||
http aliases + the claude-code engine survive) and only hands `run.Executor` the already-resolved
|
||||
model via a trivial resolver — do NOT route review-model resolution through
|
||||
`model.ParseModelForContext` (it bypasses gadfly's endpoint aliases). `run.Result` exposes no
|
||||
transcript, so the old transcript-based forced-finalization fallback is gone; the wrap-up critic
|
||||
nudge is the remaining "always emit something" mechanism. The claude-code engine still shells out
|
||||
and is unaffected.
|
||||
|
||||
Reference in New Issue
Block a user