feat: per-provider concurrency lanes (cloud parallel while local churns)

entrypoint.sh groups models by provider into lanes that run in PARALLEL; within a lane at most `cap` models run at once. cap = GADFLY_PROVIDER_CONCURRENCY map ("ollama-cloud=3,m1pro=1") else GADFLY_CONCURRENCY (default 1). So a single local box stays serial (1 at a time) while cloud models run several at once and both lanes progress simultaneously. Portable bash (no associative arrays). Default cap 1 keeps a single-provider pool sequential as before. Pairs with the per-lens timeout so a slow lane can't starve others. Docs: README Concurrency section + config table; CLAUDE.md lessons incl. the docker://:latest cache gotcha. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 20:29:08 -04:00
parent 49f3623204
commit 9e582bfaca
3 changed files with 90 additions and 13 deletions
@@ -154,6 +154,24 @@ Unset = no delegation (current behavior).
 > The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, let `auto` pick
 > only what a diff needs, and point heavy legwork at a cheap `GADFLY_WORKER_MODEL`.

+### Concurrency (per-provider lanes)
+
+With multiple models, each **provider** is its own lane and lanes run in **parallel**, so a fast
+cloud provider isn't stuck behind a slow local box. Within a lane, at most `cap` models run at
+once — `cap` comes from `GADFLY_PROVIDER_CONCURRENCY` (a `provider=N` map) else `GADFLY_CONCURRENCY`
+(default `1`). The timeout is **per-lens** (`GADFLY_TIMEOUT_SECS`), so a slow model on one lens
+can't starve the others.
+
+```yaml
+# One local box (serial — it serves one model at a time) + 3 cloud reviews at once,
+# both lanes running concurrently:
+GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,m1pro=1"
+GADFLY_MODELS: "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
+```
+
+A model's provider is the spec's first segment (`m1pro/…` → `m1pro`), or `GADFLY_PROVIDER`/
+`ollama-cloud` for a bare id. Default (`cap 1`) keeps a single-provider pool fully sequential.
+
 ### Triggers

 1. A **new/reopened/ready** non-draft PR — automatic.
@@ -194,7 +212,10 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
 | `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
 | `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |
 | `GADFLY_WORKER_MAX_STEPS` | 8 | tool-step cap for a delegated worker run |
+| `GADFLY_CONCURRENCY` | 1 | default max models run at once **per provider** |
+| `GADFLY_PROVIDER_CONCURRENCY` | — | per-provider overrides, e.g. `ollama-cloud=3,m1pro=1` |
 | `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
+| `GADFLY_TIMEOUT_SECS` | 300 | deadline **per specialist lens** (review+recheck) |
 | `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
 | `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
 | `GADFLY_TIMEOUT_SECS` | 300 | overall deadline (both passes) |