feat: re-platform agentic review onto executus + large-PR cost controls (#20)
Build & push image / build-and-push (push) Successful in 33s
Makes gadfly a consumer of executus (run.Executor compaction/bounding/budget/critic + fanout) and fixes the large-PR token burn in size-gated layers: paginated get_diff, downshift above GADFLY_HUGE_DIFF_BYTES, and a swarm-wide GADFLY_PR_BUDGET_SECS backstop. Small PRs untouched; advisory-only and the static binary preserved. Dogfood swarm reviewed it (6 models, 21 real findings graded + folded in).

Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
This commit was merged in pull request #20.
@@ -396,7 +396,16 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
 | `GADFLY_TIMEOUT_SECS` | 300 | deadline **per specialist lens** (review+recheck) |
 | `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
 | `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
-| `GADFLY_MAX_DIFF_CHARS` | 60000 | diff chars embedded in the prompt (full diff via `get_diff`) |
+| `GADFLY_MAX_DIFF_CHARS` | 60000 | diff chars embedded in the **review** prompt (the full diff is reachable via the paginated `get_diff` tool, scoped per file with its `path` arg) |
+| `GADFLY_RECHECK_DIFF_CHARS` | 20000 | diff chars embedded in the **recheck** prompt (smaller — the recheck pages `get_diff` for the hunks it verifies) |
+| `GADFLY_COMPACT` | on | context compaction (via [executus](https://gitea.stevedudenhoeffer.com/steve/executus)): fold the transcript's runaway middle into a summary as it nears the model's context window, so a big diff + accumulating tool output can't balloon every step. `0` disables |
+| `GADFLY_COMPACT_RATIO` | 0.45 | fraction of the model's context window at which compaction fires |
+| `GADFLY_COMPACT_MODEL` | worker, else review model | cheap model the compactor uses to summarize the folded middle |
+| `GADFLY_COMPACT_KEEP_RECENT` | 8 | most-recent messages kept verbatim during compaction |
+| `GADFLY_COMPACT_SUMMARY_WORDS` | 200 | word cap on the compaction summary |
+| `GADFLY_MODEL_CONTEXT_TOKENS` | *(auto)* | override the model's context-window size (tokens) for the compaction threshold; set it for self-hosted endpoints executus can't introspect (Ollama Cloud models resolve automatically) |
+| `GADFLY_PR_TOKEN_BUDGET` | — | per-model token ceiling for this PR; once spent, remaining lenses/passes are skipped (advisory). 0 = off |
+| `GADFLY_PR_TIME_BUDGET_SECS` | — | per-model wall-clock ceiling for this PR (advisory). 0 = off |
 | `GADFLY_STATUS_BOARD` | on | set `0` to disable the live status-board comment |
 | `GADFLY_STATUS_POLL_SECS` | 12 | how often the status board re-renders/upserts |
 | `GADFLY_CONSOLIDATE` | `auto` | cross-model consensus comment: `auto` (on for ≥2 models), `1` (force on), `0` (off — one comment per model) |
@@ -408,6 +417,37 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
 | `GADFLY_REPO` | *(from `GITEA_API`)* | `owner/repo` slug stamped on emitted runs/findings (set by `entrypoint.sh`) |
 | `GADFLY_PR` | *(from event)* | PR number stamped on emitted runs/findings (set by `entrypoint.sh`) |
 
+### Large-PR cost controls
+
+A very large diff is the one thing that can blow the budget: every review step
+re-sends it, multiplied across models × lenses × passes × steps (a single
+~250 K-token PR can otherwise burn a whole metered usage block). Gadfly handles
+big PRs in three layers, all **size-gated so small PRs are untouched**:
+
+1. **Paginated `get_diff` + compaction** (reviewer binary, on by default) —
+`get_diff` returns a paginated, optionally per-file window instead of the whole
+diff, and once a transcript nears the model's context window its middle is
+folded into a summary (powered by [executus](https://gitea.stevedudenhoeffer.com/steve/executus)'s
+`compact`). Tune with the `GADFLY_COMPACT_*` knobs above.
+2. **Downshift** (`entrypoint.sh`) — above `GADFLY_HUGE_DIFF_BYTES` the whole fleet
+collapses to a single cheap model + a focused lens subset, fewer steps, and no
+recheck. A finished shallow review beats a budget-nuking one, and the posted
+comment says so.
+3. **Hard backstop** (`entrypoint.sh`) — `GADFLY_PR_BUDGET_SECS` is a wall-clock
+ceiling across the *entire* fleet; on expiry the review is stopped and whatever
+was found so far is posted. Like everything else, it never fails CI.
+
+| Env | Default | Meaning |
+|-----|---------|---------|
+| `GADFLY_HUGE_DIFF_BYTES` | 600000 | downshift the fleet when the PR diff exceeds this many bytes (0 = never downshift) |
+| `GADFLY_HUGE_DIFF_MODELS` | first model | model(s) to run on a downshifted huge PR |
+| `GADFLY_HUGE_DIFF_SPECIALISTS` | `security,correctness,error-handling` | lenses on a downshifted huge PR |
+| `GADFLY_HUGE_DIFF_MAX_STEPS` | 12 | review step cap on a huge PR |
+| `GADFLY_HUGE_DIFF_RECHECK_MAX_STEPS` | 8 | recheck step cap on a huge PR |
+| `GADFLY_HUGE_DIFF_RECHECK` | 0 | run the recheck pass on a huge PR (off by default) |
+| `GADFLY_HUGE_DIFF_MAX_DIFF_CHARS` | 20000 | embedded review-diff chars on a huge PR |
+| `GADFLY_PR_BUDGET_SECS` | — | swarm-wide wall-clock backstop; stops the whole fleet when reached (0 = off) |
+
 ## Findings telemetry (optional)
 
 Gadfly can record what it found so model quality can be tracked over time. It is
@@ -431,7 +471,7 @@ code.
 ## Building locally
 
 ```sh
-go build ./cmd/gadfly # needs read access to the private majordomo module
+go build ./cmd/gadfly # needs read access to the private majordomo + executus modules
 go test ./...
 ```
 
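
To make the compaction knobs in the first README hunk concrete, here is a minimal sketch of the threshold arithmetic. It assumes the trigger point is simply `GADFLY_COMPACT_RATIO` multiplied by the model's context window in tokens; the diff does not show how executus actually computes it. The helper name `compact_threshold` and the 128000-token fallback window are illustrative assumptions, not part of gadfly or executus.

```sh
#!/bin/sh
# Hypothetical helper (not from gadfly or executus): the token count at which
# compaction would fire, ASSUMING threshold = ratio * context window.
compact_threshold() {
    ratio=${1:-0.45}   # GADFLY_COMPACT_RATIO; 0.45 is the documented default
    window=$2          # context window in tokens (GADFLY_MODEL_CONTEXT_TOKENS)
    # POSIX sh has no floating-point arithmetic, so delegate to awk and
    # round to the nearest whole token.
    awk -v r="$ratio" -v w="$window" 'BEGIN { printf "%d\n", int(r * w + 0.5) }'
}

# The 128000-token window is an illustrative fallback, not a gadfly default.
compact_threshold "${GADFLY_COMPACT_RATIO:-0.45}" "${GADFLY_MODEL_CONTEXT_TOKENS:-128000}"
```

Under these assumptions, a 0.45 ratio on a 128000-token window would start folding the transcript at 57600 tokens, which shows why `GADFLY_MODEL_CONTEXT_TOKENS` matters for endpoints whose window cannot be introspected.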
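
The downshift and hard-backstop layers described in the README hunk can be sketched in shell as follows. This is not the real `entrypoint.sh`, which the diff does not show; the function names `should_downshift`, `apply_downshift`, and `run_with_backstop` are hypothetical, and the sketch only touches environment variables that appear in the tables. It also assumes the coreutils `timeout` command is available in the image.

```sh
#!/bin/sh
# Hypothetical sketch of the size-gated downshift and the wall-clock backstop.
# Function names are illustrative; only the env var names come from the README.

# should_downshift DIFF_BYTES [THRESHOLD]
# Succeeds when the diff is strictly larger than the threshold.
# A threshold of 0 means "never downshift", as the table states.
should_downshift() {
    diff_bytes=$1
    threshold=${2:-600000}
    [ "$threshold" -gt 0 ] && [ "$diff_bytes" -gt "$threshold" ]
}

# apply_downshift: replace the normal settings with the huge-PR ones,
# falling back to the documented defaults.
apply_downshift() {
    GADFLY_RECHECK=${GADFLY_HUGE_DIFF_RECHECK:-0}
    GADFLY_RECHECK_MAX_STEPS=${GADFLY_HUGE_DIFF_RECHECK_MAX_STEPS:-8}
    GADFLY_MAX_DIFF_CHARS=${GADFLY_HUGE_DIFF_MAX_DIFF_CHARS:-20000}
    export GADFLY_RECHECK GADFLY_RECHECK_MAX_STEPS GADFLY_MAX_DIFF_CHARS
}

# run_with_backstop SECS CMD...
# Runs CMD under a wall-clock ceiling when SECS > 0. It always returns 0,
# mirroring the advisory "never fails CI" behaviour.
run_with_backstop() {
    secs=$1
    shift
    if [ "$secs" -gt 0 ]; then
        timeout "$secs" "$@" || true
    else
        "$@" || true
    fi
    return 0
}

# Example: a 700000-byte diff checked against the configured threshold.
if should_downshift 700000 "${GADFLY_HUGE_DIFF_BYTES:-600000}"; then
    apply_downshift
    echo "downshifted: recheck=$GADFLY_RECHECK max_diff_chars=$GADFLY_MAX_DIFF_CHARS"
fi
```

A caller could then wrap the whole review in something like `run_with_backstop "${GADFLY_PR_BUDGET_SECS:-0}" ./review-command`, where `./review-command` is a placeholder for whatever launches the fleet. Keeping the size check separate from the settings swap means small PRs never enter the downshift path at all.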