feat: per-provider concurrency lanes (cloud parallel while local churns)
Build & push image / build-and-push (push) Successful in 7s
Build & push image / build-and-push (push) Successful in 7s
entrypoint.sh groups models by provider into lanes that run in PARALLEL; within
a lane at most `cap` models run at once. cap = GADFLY_PROVIDER_CONCURRENCY map
("ollama-cloud=3,m1pro=1") else GADFLY_CONCURRENCY (default 1). So a single
local box stays serial (1 at a time) while cloud models run several at once and
both lanes progress simultaneously. Portable bash (no associative arrays).
Default cap 1 keeps a single-provider pool sequential as before. Pairs with the
per-lens timeout so a slow lane can't starve others. Docs: README Concurrency
section + config table; CLAUDE.md lessons incl. the docker://:latest cache gotcha.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -127,3 +127,11 @@ are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstractio
|
|||||||
`GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]"` mechanism (see `cmd/gadfly/model.go`).
|
`GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]"` mechanism (see `cmd/gadfly/model.go`).
|
||||||
- Gitea `vars`/`secrets` are **not** auto-exposed as env in a job — the consumer stub must map
|
- Gitea `vars`/`secrets` are **not** auto-exposed as env in a job — the consumer stub must map
|
||||||
each one explicitly in its `env:` block (dynamic alias names can't be auto-enumerated).
|
each one explicitly in its `env:` block (dynamic alias names can't be auto-enumerated).
|
||||||
|
- **`uses: docker://…:latest` is CACHED by act_runner** — a freshly-pushed `:latest` is often
|
||||||
|
NOT re-pulled, so the job silently runs the previous image. For a run that must use a specific
|
||||||
|
build (e.g. validating a just-pushed fix), pin the consumer stub to the immutable
|
||||||
|
`:sha-<short>` tag the build publishes, not `:latest`.
|
||||||
|
- **Concurrency is per-provider** (`entrypoint.sh`): each provider is a lane, lanes run in
|
||||||
|
parallel, `cap` (from `GADFLY_PROVIDER_CONCURRENCY` else `GADFLY_CONCURRENCY`, default 1) bounds
|
||||||
|
models-at-once within a lane. The review timeout (`GADFLY_TIMEOUT_SECS`) is **per-lens**, not
|
||||||
|
shared across the suite — a slow model can't starve later lenses (the original timeout bug).
|
||||||
|
|||||||
@@ -154,6 +154,24 @@ Unset = no delegation (current behavior).
|
|||||||
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, let `auto` pick
|
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, let `auto` pick
|
||||||
> only what a diff needs, and point heavy legwork at a cheap `GADFLY_WORKER_MODEL`.
|
> only what a diff needs, and point heavy legwork at a cheap `GADFLY_WORKER_MODEL`.
|
||||||
|
|
||||||
|
### Concurrency (per-provider lanes)
|
||||||
|
|
||||||
|
With multiple models, each **provider** is its own lane and lanes run in **parallel**, so a fast
|
||||||
|
cloud provider isn't stuck behind a slow local box. Within a lane, at most `cap` models run at
|
||||||
|
once — `cap` comes from `GADFLY_PROVIDER_CONCURRENCY` (a `provider=N` map) else `GADFLY_CONCURRENCY`
|
||||||
|
(default `1`). The timeout is **per-lens** (`GADFLY_TIMEOUT_SECS`), so a slow model on one lens
|
||||||
|
can't starve the others.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# One local box (serial — it serves one model at a time) + 3 cloud reviews at once,
|
||||||
|
# both lanes running concurrently:
|
||||||
|
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,m1pro=1"
|
||||||
|
GADFLY_MODELS: "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
|
||||||
|
```
|
||||||
|
|
||||||
|
A model's provider is the spec's first segment (`m1pro/…` → `m1pro`), or `GADFLY_PROVIDER`/
|
||||||
|
`ollama-cloud` for a bare id. Default (`cap 1`) keeps a single-provider pool fully sequential.
|
||||||
|
|
||||||
### Triggers
|
### Triggers
|
||||||
|
|
||||||
1. A **new/reopened/ready** non-draft PR — automatic.
|
1. A **new/reopened/ready** non-draft PR — automatic.
|
||||||
@@ -194,7 +212,10 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
|
|||||||
| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
|
| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
|
||||||
| `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |
|
| `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |
|
||||||
| `GADFLY_WORKER_MAX_STEPS` | 8 | tool-step cap for a delegated worker run |
|
| `GADFLY_WORKER_MAX_STEPS` | 8 | tool-step cap for a delegated worker run |
|
||||||
|
| `GADFLY_CONCURRENCY` | 1 | default max models run at once **per provider** |
|
||||||
|
| `GADFLY_PROVIDER_CONCURRENCY` | — | per-provider overrides, e.g. `ollama-cloud=3,m1pro=1` |
|
||||||
| `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
|
| `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
|
||||||
|
| `GADFLY_TIMEOUT_SECS` | 300 | deadline **per specialist lens** (review+recheck) |
|
||||||
| `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
|
| `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
|
||||||
| `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
|
| `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
|
||||||
| `GADFLY_TIMEOUT_SECS` | 300 | overall deadline (both passes) |
|
| `GADFLY_TIMEOUT_SECS` | 300 | overall deadline (both passes) |
|
||||||
|
|||||||
+61
-13
@@ -124,21 +124,69 @@ log "cloning ${REPO_PATH} @ ${BRANCH}"
|
|||||||
git clone --depth=1 --branch "$BRANCH" "$CLONE_URL" "$REPO_DIR" 2>/dev/null \
|
git clone --depth=1 --branch "$BRANCH" "$CLONE_URL" "$REPO_DIR" 2>/dev/null \
|
||||||
|| die "clone of ${REPO_PATH}@${BRANCH} failed"
|
|| die "clone of ${REPO_PATH}@${BRANCH} failed"
|
||||||
|
|
||||||
# --- review once per model -------------------------------------------------
|
# --- review once per model, with per-provider concurrency -------------------
|
||||||
# GADFLY_MODELS is the provider-agnostic name; OLLAMA_REVIEW_MODELS is kept as a
|
# GADFLY_MODELS is the provider-agnostic name; OLLAMA_REVIEW_MODELS is a
|
||||||
# back-compat alias. GADFLY_PROVIDER / GADFLY_BASE_URL / GADFLY_API_KEY and any
|
# back-compat alias. GADFLY_PROVIDER / GADFLY_BASE_URL / GADFLY_API_KEY and any
|
||||||
# provider key envs (OPENAI_API_KEY, …) are inherited by run.sh and the binary.
|
# provider key envs (OPENAI_API_KEY, …) are inherited by run.sh and the binary.
|
||||||
|
#
|
||||||
|
# Concurrency: each PROVIDER is its own lane and lanes run in PARALLEL, so a fast
|
||||||
|
# cloud provider isn't stuck behind a slow local box. Within a lane, at most
|
||||||
|
# `cap` models run at once. cap = GADFLY_PROVIDER_CONCURRENCY's "provider=N"
|
||||||
|
# entry, else GADFLY_CONCURRENCY (default 1). A model's provider is the spec's
|
||||||
|
# first path segment ("m1pro/qwen3.6:35b-mlx" -> m1pro), or GADFLY_PROVIDER /
|
||||||
|
# ollama-cloud for a bare id. Default (cap 1) keeps a single-provider pool fully
|
||||||
|
# sequential, exactly as before.
|
||||||
MODELS="${GADFLY_MODELS:-${OLLAMA_REVIEW_MODELS:-$DEFAULT_MODELS}}"
|
MODELS="${GADFLY_MODELS:-${OLLAMA_REVIEW_MODELS:-$DEFAULT_MODELS}}"
|
||||||
log "provider: ${GADFLY_PROVIDER:-ollama-cloud}; models: ${MODELS}"
|
DEFAULT_CONC="${GADFLY_CONCURRENCY:-1}"
|
||||||
IFS=',' read -ra ARR <<< "$MODELS" || true
|
|
||||||
for raw in "${ARR[@]}"; do
|
provider_of() { case "$1" in */*) echo "${1%%/*}";; *) echo "${GADFLY_PROVIDER:-ollama-cloud}";; esac; }
|
||||||
m="$(echo "$raw" | tr -d '[:space:]')"
|
|
||||||
[ -z "$m" ] && continue
|
provider_cap() { # provider -> concurrency (override map "p=N,...", else default)
|
||||||
log "::: reviewing with ${m}"
|
local p="$1" item k v
|
||||||
PROVIDER=ollama \
|
IFS=',' read -ra _caps <<< "${GADFLY_PROVIDER_CONCURRENCY:-}"
|
||||||
MODEL="$m" \
|
for item in "${_caps[@]}"; do
|
||||||
GADFLY_BIN="/usr/local/bin/gadfly" \
|
k="$(echo "${item%%=*}" | tr -d '[:space:]')"
|
||||||
GADFLY_REPO_DIR="$REPO_DIR" \
|
v="$(echo "${item#*=}" | tr -d '[:space:]')"
|
||||||
bash "${SCRIPTS_DIR}/run.sh" || log "model ${m} failed (continuing)"
|
if [ "$k" = "$p" ] && [ -n "$v" ]; then echo "$v"; return; fi
|
||||||
done
|
done
|
||||||
|
echo "$DEFAULT_CONC"
|
||||||
|
}
|
||||||
|
|
||||||
|
review_one() {
|
||||||
|
PROVIDER=ollama MODEL="$1" GADFLY_BIN="/usr/local/bin/gadfly" GADFLY_REPO_DIR="$REPO_DIR" \
|
||||||
|
bash "${SCRIPTS_DIR}/run.sh" || log "model $1 failed (continuing)"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Normalize the model list (trim, drop blanks) into MODEL_LIST.
|
||||||
|
IFS=',' read -ra _raw <<< "$MODELS" || true
|
||||||
|
MODEL_LIST=()
|
||||||
|
for raw in "${_raw[@]}"; do m="$(echo "$raw" | tr -d '[:space:]')"; [ -n "$m" ] && MODEL_LIST+=("$m"); done
|
||||||
|
|
||||||
|
# Distinct providers, in first-seen order (no associative arrays — portable).
|
||||||
|
PROVIDERS=""
|
||||||
|
for m in "${MODEL_LIST[@]}"; do
|
||||||
|
p="$(provider_of "$m")"
|
||||||
|
case " $PROVIDERS " in *" $p "*) ;; *) PROVIDERS="${PROVIDERS}${PROVIDERS:+ }$p" ;; esac
|
||||||
|
done
|
||||||
|
|
||||||
|
run_lane() { # $1=provider: run its models, at most `cap` at a time
|
||||||
|
local p="$1" cap inflight=0 m
|
||||||
|
cap="$(provider_cap "$p")"; [ "$cap" -ge 1 ] 2>/dev/null || cap=1
|
||||||
|
local mine=()
|
||||||
|
for m in "${MODEL_LIST[@]}"; do [ "$(provider_of "$m")" = "$p" ] && mine+=("$m"); done
|
||||||
|
log "lane ${p}: cap ${cap}; models: ${mine[*]}"
|
||||||
|
for m in "${mine[@]}"; do
|
||||||
|
review_one "$m" &
|
||||||
|
inflight=$((inflight+1))
|
||||||
|
if [ "$inflight" -ge "$cap" ]; then wait -n 2>/dev/null || wait; inflight=$((inflight-1)); fi
|
||||||
|
done
|
||||||
|
wait
|
||||||
|
}
|
||||||
|
|
||||||
|
log "providers: ${PROVIDERS:-none}"
|
||||||
|
# Each provider lane runs in parallel; cap is enforced within each lane.
|
||||||
|
for p in $PROVIDERS; do
|
||||||
|
run_lane "$p" &
|
||||||
|
done
|
||||||
|
wait
|
||||||
log "done"
|
log "done"
|
||||||
|
|||||||
Reference in New Issue
Block a user