internal/router/scheduler: add serial scheduler, default on this fork

Add a strict one-model-at-a-time scheduler. Requests run in exact
arrival order and at most one runs at a time. Switching to a different
model first evicts every other running model, so only one model occupies
memory at a time. Unlike fifo, it never reorders or batches same-model
requests, and it ignores group/matrix co-residency entirely. This makes
the single-model guarantee a property of the scheduler rather than of
the config.
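
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the names Request, Serial, NewSerial, Submit, and Events are hypothetical, and loading and eviction are reduced to log entries. A single worker goroutine drains a FIFO channel, so requests run one at a time in arrival order, and a model change evicts the previously loaded model before the next one loads.

```go
package main

import "sync"

// Request is a hypothetical stand-in for one queued request.
type Request struct {
	Model string
	done  chan struct{} // closed once the request has run
}

// Serial is an illustrative one-model-at-a-time scheduler.
// It is an assumption-laden sketch, not the fork's implementation.
type Serial struct {
	queue  chan *Request
	mu     sync.Mutex
	events []string // log of worker actions, kept only for inspection
}

// NewSerial starts the single worker. depth bounds the queue length.
func NewSerial(depth int) *Serial {
	s := &Serial{queue: make(chan *Request, depth)}
	go s.worker()
	return s
}

// Submit enqueues a request and returns a channel that is closed
// after the request has run. Channel order is arrival order.
func (s *Serial) Submit(model string) <-chan struct{} {
	r := &Request{Model: model, done: make(chan struct{})}
	s.queue <- r
	return r.done
}

// worker runs requests strictly one at a time. Because only one model
// is ever loaded in this sketch, "evict every other model" reduces to
// evicting the single model that is currently loaded.
func (s *Serial) worker() {
	loaded := ""
	for r := range s.queue {
		if r.Model != loaded {
			if loaded != "" {
				s.record("evict " + loaded)
			}
			s.record("load " + r.Model)
			loaded = r.Model
		}
		s.record("run " + r.Model)
		close(r.done)
	}
}

func (s *Serial) record(e string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.events = append(s.events, e)
}

// Events returns a copy of the action log.
func (s *Serial) Events() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.events...)
}
```

Submitting requests for models a, a, b, a in that order would, under these assumptions, load a once for the first two requests, then swap to b, then swap back to a. Nothing is reordered to save a swap, which is the trade-off against fifo.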

- new Serial scheduler implementing the Scheduler interface
- register "serial" in scheduler.New; default routing.scheduler.use to
  "serial" at config load (fifo still selectable for upstream behavior)
- update config schema, example config, and config defaults tests
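
As a rough picture of the registration and defaulting steps listed above, the sketch below shows one way a constructor could select a scheduler by name and how an empty setting could fall back to "serial" at config load. Everything here is hypothetical: the Scheduler interface is reduced to a single Name method, and SchedulerConfig, ApplyDefaults, and this New signature are assumptions rather than the fork's real types.

```go
package main

import "fmt"

// Scheduler is a hypothetical, minimal stand-in for the real interface.
type Scheduler interface {
	Name() string
}

type serialScheduler struct{}

func (serialScheduler) Name() string { return "serial" }

type fifoScheduler struct{}

func (fifoScheduler) Name() string { return "fifo" }

// SchedulerConfig is an assumed mirror of routing.scheduler in the config.
type SchedulerConfig struct {
	Use string
}

// ApplyDefaults fills in the fork's default when the field is unset.
func (c *SchedulerConfig) ApplyDefaults() {
	if c.Use == "" {
		c.Use = "serial"
	}
}

// New returns the scheduler named by cfg.Use, or an error for unknown names.
func New(cfg SchedulerConfig) (Scheduler, error) {
	switch cfg.Use {
	case "serial":
		return serialScheduler{}, nil
	case "fifo":
		return fifoScheduler{}, nil
	default:
		return nil, fmt.Errorf("unknown scheduler %q", cfg.Use)
	}
}
```

Applying the default at load time rather than inside New keeps New strict: an unknown or misspelled name is an error instead of silently becoming "serial".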

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 12:17:32 -04:00
parent 0a25b3bd31
commit 542b79dacf
9 changed files with 683 additions and 15 deletions
+13 -3
@@ -556,11 +556,21 @@ routing:
 # expands to: [L]
 full: "L"
-# scheduler: how queued requests are ordered.
-# The default and only valid scheduler is "fifo"
+# scheduler: how queued requests are ordered and run.
+# - optional, default on this fork: "serial"
+# - valid values:
+# - "serial": strict one-model-at-a-time. Requests run in exact arrival
+# order; only one request runs at a time; switching to a different model
+# evicts every other running model first so a single model occupies memory
+# at a time. This ignores group/matrix co-residency entirely. The "fifo"
+# settings below (priority) do not apply.
+# - "fifo": throughput-oriented. Same-model requests are batched to reduce
+# swaps and a model serves up to its concurrencyLimit in parallel; models
+# in non-exclusive groups can run concurrently. Requests may be reordered.
 scheduler:
-use: fifo
+use: serial
 settings:
+# fifo settings only apply when use: fifo
 fifo:
 # priority: a dictionary of model ID -> priority
 # - optional, default: empty dictionary