feat: per-provider concurrency lanes (cloud parallel while local churns)
Build & push image / build-and-push (push) Successful in 7s

entrypoint.sh groups models by provider into lanes that run in PARALLEL; within
a lane at most `cap` models run at once. cap = GADFLY_PROVIDER_CONCURRENCY map
("ollama-cloud=3,m1pro=1") else GADFLY_CONCURRENCY (default 1). So a single
local box stays serial (1 at a time) while cloud models run several at once and
both lanes progress simultaneously. Portable bash (no associative arrays).
Default cap 1 keeps a single-provider pool sequential as before. Pairs with the
per-lens timeout so a slow lane can't starve others. Docs: README Concurrency
section + config table; CLAUDE.md lessons incl. the docker://:latest cache gotcha.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Steve Dudenhoeffer
2026-06-25 20:29:08 -04:00
parent 49f3623204
commit 9e582bfaca
3 changed files with 90 additions and 13 deletions
+21
View File
@@ -154,6 +154,24 @@ Unset = no delegation (current behavior).
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, let `auto` pick
> only what a diff needs, and point heavy legwork at a cheap `GADFLY_WORKER_MODEL`.
### Concurrency (per-provider lanes)
With multiple models, each **provider** is its own lane and lanes run in **parallel**, so a fast
cloud provider isn't stuck behind a slow local box. Within a lane, at most `cap` models run at
once — `cap` comes from `GADFLY_PROVIDER_CONCURRENCY` (a `provider=N` map) else `GADFLY_CONCURRENCY`
(default `1`). The timeout is **per-lens** (`GADFLY_TIMEOUT_SECS`), so a slow model on one lens
can't starve the others.
```yaml
# One local box (serial — it serves one model at a time) + 3 cloud reviews at once,
# both lanes running concurrently:
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,m1pro=1"
GADFLY_MODELS: "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
```
A model's provider is the spec's first segment (`m1pro/…``m1pro`), or `GADFLY_PROVIDER`/
`ollama-cloud` for a bare id. Default (`cap 1`) keeps a single-provider pool fully sequential.
### Triggers
1. A **new/reopened/ready** non-draft PR — automatic.
@@ -194,7 +212,10 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
| `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |
| `GADFLY_WORKER_MAX_STEPS` | 8 | tool-step cap for a delegated worker run |
| `GADFLY_CONCURRENCY` | 1 | default max models run at once **per provider** |
| `GADFLY_PROVIDER_CONCURRENCY` | — | per-provider overrides, e.g. `ollama-cloud=3,m1pro=1` |
| `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
| `GADFLY_TIMEOUT_SECS` | 300 | deadline **per specialist lens** (review+recheck) |
| `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
| `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
| `GADFLY_TIMEOUT_SECS` | 300 | overall deadline (both passes) |