feat: re-platform agentic review onto executus + large-PR cost controls

Fixes the large-PR token burn: a ~250K-token diff was re-sent every agent step across models × lenses × passes, draining a metered usage block in minutes. Small PRs are untouched (every mitigation is size-gated / no-op under threshold). - Re-platform the in-process review path onto executus run.Executor: context compaction (executus/compact, threshold from the model's real context window via executus/model), run-bounding, a per-PR budget gate (Ports.Budget), and the wrap-up nudge re-expressed as a run.Critic. Lens fan-out now uses executus/fanout. gadfly keeps its own model.go, so GADFLY_ENDPOINT_<NAME> aliases and the claude-code engine are unaffected. No majordomo bump; the binary stays static (executus core is majordomo+stdlib only). - Paginate get_diff (per-file `path` + start_line/limit) instead of dumping the whole diff; trim the recheck diff embed (60k -> 20k chars). - entrypoint.sh: downshift the fleet above GADFLY_HUGE_DIFF_BYTES (one cheap model, fewer lenses/steps, no recheck) + a swarm-wide GADFLY_PR_BUDGET_SECS wall-clock backstop (adds procps for pkill). All advisory; CI never fails. - README + CLAUDE.md + tests updated. Note: run.Result exposes no transcript, so the old transcript-based forced- finalization fallback is dropped; the wrap-up critic nudge is the remaining "always emit something" mechanism. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 10:43:34 -04:00
parent 5007597cf9
commit b860119332
19 changed files with 935 additions and 224 deletions
@@ -22,15 +22,21 @@ verifies each one against the actual code, and posts its findings as a comment.
 4. **Provider-agnostic.** Powered by [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo),
   so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any
   OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
-5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib + majordomo. Keep
-   it that way — no heavyweight deps, no coupling to any one consumer repo (e.g. mort).
+5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib, majordomo, and
+   [executus](https://gitea.stevedudenhoeffer.com/steve/executus) (whose *core* — `run`/`compact`/
+   `model`/`fanout`/`tool` — is itself majordomo+stdlib only, so the binary stays static; do NOT
+   pull executus's `contrib/store` or any battery that drags in a DB driver). No heavyweight deps,
+   no coupling to any one consumer repo (e.g. mort). Gadfly is executus's canonical *light* consumer.

 ## Architecture

 ```
 cmd/gadfly/            the reviewer binary — pure producer of review markdown (stdout)
-  main.go              orchestration: loop specialists, each a review pass + adversarial recheck
-  engine.go            reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
+  main.go              orchestration: fan specialists out (executus/fanout), each a review pass + recheck
+  engine.go            reviewEngine abstraction: executus run.Executor (majordomo agent loop +
+                       compaction/bounding/budget/critic) vs claude-code CLI shell-out
+  executus.go          executus wiring: tool.Registry over the repo tools, the run.Executor build
+                       (compact + model context-limit threshold + per-PR budget + wrap-up critic)
  specialists.go       specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
  auto.go              dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
  delegate.go          worker-tier delegate_investigation tool (cheap sub-agent does legwork)
@@ -69,7 +75,7 @@ verdict. Verdict is one of: `No material issues found` / `Minor issues` / `Block
 ## Build / test

 ```sh
-go build ./cmd/gadfly      # needs read access to the private majordomo module
+go build ./cmd/gadfly      # needs read access to the private majordomo + executus modules
 go test ./...
 gofmt -l cmd/               # must be clean
 docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .
@@ -149,3 +155,21 @@ are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstractio
  parallel, `cap` (from `GADFLY_PROVIDER_CONCURRENCY` else `GADFLY_CONCURRENCY`, default 1) bounds
  models-at-once within a lane. The review timeout (`GADFLY_TIMEOUT_SECS`) is **per-lens**, not
  shared across the suite — a slow model can't starve later lenses (the original timeout bug).
+- **Large-PR token burn**: the agent loop re-sends the whole transcript every step, so a giant
+  diff (the old `get_diff` dumped it untruncated, and it was embedded in both the review and
+  recheck task) was re-transmitted ~steps × lenses × passes × models times — a ~250 K-token PR
+  could drain a metered usage block in minutes. Fixed in three size-gated layers (small PRs
+  untouched): paginated `get_diff` + `executus/compact` compaction in the binary; an
+  `entrypoint.sh` downshift above `GADFLY_HUGE_DIFF_BYTES` (one cheap model, fewer lenses/steps,
+  no recheck); and a swarm-wide `GADFLY_PR_BUDGET_SECS` wall-clock backstop. Compaction's threshold
+  is intentionally LOW (`GADFLY_COMPACT_RATIO` 0.45, not executus's 0.7) because the burning
+  transcript on the embedded path rarely reaches 0.7×context.
+- **executus re-platform**: the in-process review path runs through `executus/run`'s `run.Executor`
+  (compaction, run-bounding, `Ports.Budget`, the wrap-up nudge as `Ports.Critic`), wiring it in
+  `cmd/gadfly/executus.go`. Gadfly KEEPS its own `model.go` resolution (so `GADFLY_ENDPOINT_<NAME>`
+  http aliases + the claude-code engine survive) and only hands `run.Executor` the already-resolved
+  model via a trivial resolver — do NOT route review-model resolution through
+  `model.ParseModelForContext` (it bypasses gadfly's endpoint aliases). `run.Result` exposes no
+  transcript, so the old transcript-based forced-finalization fallback is gone; the wrap-up critic
+  nudge is the remaining "always emit something" mechanism. The claude-code engine still shells out
+  and is unaffected.