# Router design A developer tutorial for the `internal/router` package and its `scheduler` sub-package. ## Intro A llama-swap router is the component that sits behind the proxy and answers one question for every incoming request: _can this model serve right now, and if not, what has to happen first?_ Answering it means juggling three concerns that used to live tangled together in one type: 1. **Process machinery** — owning the OS processes, starting and stopping them, running health checks, and shuttling HTTP requests onto the right upstream. 2. **Scheduling strategy** — the queue, in-flight bookkeeping, and the decision tree that turns one request into "serve now", "join an existing swap", "queue", or "start a swap". 3. **Eviction policy** — given a model we want to load, which currently-running models have to be stopped to make room? The design pulls those three apart into separate, independently replaceable pieces: | Concern | Type | Lives in | | ------------------- | ------------------------------ | ------------------------------- | | Process machinery | `baseRouter` | `internal/router/base.go` | | Scheduling strategy | `scheduler.Scheduler` (`FIFO`) | `internal/router/scheduler/` | | Eviction policy | `scheduler.Swapper` | `groupSwapper`, `matrixSwapper` | `baseRouter` keeps the channels, run loop, process lifecycle, and shutdown teardown, and exposes the side-effects a scheduler needs through the `scheduler.Effects` interface. The scheduler owns the queue and decision tree but performs no side-effects directly — it calls back through `Effects`. The `Swapper` is a pure function from "target model + currently running" to "models to evict", and knows nothing about queues, channels, or processes. Because the seams are interfaces, you can replace the scheduling strategy without touching process management, or write a new eviction policy without touching either. `FIFO` is the first and currently only `Scheduler`; `groupSwapper` and `matrixSwapper` are the two `Swapper`s. ## Key concepts ### One run loop, no locks `baseRouter.run()` is a single goroutine selecting over a handful of channels: ```go for { select { case req := <-b.shutdownCh: b.handleShutdown(req); return case req := <-b.handlerCh: b.schedule.OnRequest(req) case req := <-b.unloadCh: b.schedule.OnUnload(req.targets, req.timeout); close(req.respond) case ev := <-b.swapDoneCh: b.schedule.OnSwapDone(ev) case ev := <-b.serveDoneCh: b.schedule.OnServeDone(ev) } } ``` Every `Scheduler` method runs on this one goroutine. That is the single most important fact about the design: **the scheduler never needs a mutex for its own state**. All scheduler state is touched only from these callbacks, which are serialized by the run loop. If you write a new scheduler, you get the same guarantee for free — and you must not break it by spinning up goroutines that mutate scheduler state. ### Events flow in, side-effects flow out The run loop turns external happenings into method calls on the scheduler: - A new HTTP request becomes `OnRequest(HandlerReq)`. - A swap goroutine finishing becomes `OnSwapDone(SwapDone)`. - A tracked request handler returning becomes `OnServeDone(ServeDoneEvent)`. - An admin unload becomes `OnUnload(targets, timeout)`. - Shutdown becomes `OnShutdown(err)`. The scheduler reacts by calling **back out** through `Effects`: inspect a process state, start a swap, grant a response to a caller, or stop processes. It never calls `process.Process` directly and never writes to a channel directly. This keeps the scheduler pure enough to unit-test against a fake `Effects` with no goroutines or real processes involved (see `scheduler/fifo_test.go`). ``` HTTP request admin Unload / Shutdown │ │ ▼ ▼ ServeHTTP ──HandlerReq──▶ baseRouter.run() ◀──unloadCh/shutdownCh │ (single goroutine) ▼ Scheduler.On*(...) │ calls back through ▼ Effects: ModelState / StartSwap / GrantServe / GrantError / StopProcesses │ ▼ baseRouter side-effects: doSwap goroutine, grant() to caller, process.Stop() │ swap completes ──SwapDone──▶ back into run loop ``` ### The swap goroutine Scheduling decisions must be quick and non-blocking, but loading a model is slow. The two are reconciled by doing the slow part on a separate goroutine. When the scheduler decides to start a swap, inside `OnRequest` it: 1. records "a swap for X is in flight" in its own state, then 2. calls `Effects.StartSwap(modelID, evict)`. `StartSwap` does **not** load the model itself — it just launches a detached goroutine (`doSwap`) and returns straight away. `doSwap` is what does the slow work: stop the evicted processes, start the target, wait for it to become ready. Because `StartSwap` returned immediately, `OnRequest` returns too, and the run loop is free to pick up the next event — another request, a serve-done, an unload — while `doSwap` runs in the background. The swap's eventual result comes back as just another event: when `doSwap` finishes it posts a `SwapDone` onto `swapDoneCh`, which the run loop delivers as `OnSwapDone`. So a slow load never blocks the run loop; it brackets it with two quick events (`OnRequest` to start, `OnSwapDone` to finish) and everything in between is handled normally. ### In-flight tracking and `trackedServe` When the scheduler grants a request, the handler it hands back is wrapped by `baseRouter.trackedServe`. The wrapper runs the real `ServeHTTP` and, on return, posts a `ServeDoneEvent` so the run loop can decrement the per-model in-flight count. This is why the scheduler can know whether a process is "busy": it counts grants out and serve-dones in. A swap that would evict a busy process is deferred until that process's in-flight count hits zero (`OnServeDone` then re-drains the queue). The subtle contract here is `GrantServe`'s boolean return. The caller's `Respond` channel is unbuffered, so a successful send proves the HTTP goroutine is alive and took the handler. If the caller already disconnected, the send fails, `trackedServe` never runs, and **no** `ServeDoneEvent` will ever arrive — so the scheduler must only increment `inFlight` when `GrantServe` returns true. Incrementing on a false return would strand the counter above zero and the model could never be evicted again. ## The interfaces All three live in `scheduler/scheduler.go`. ### `Scheduler` ```go type Scheduler interface { OnRequest(req HandlerReq) OnSwapDone(ev SwapDone) OnServeDone(ev ServeDoneEvent) OnUnload(targets []string, timeout time.Duration) OnShutdown(err error) } ``` Owns the queue, in-flight tracking, and the decision tree. All methods run on the run-loop goroutine, so no internal locking is needed. ### `Swapper` ```go type Swapper interface { EvictionFor(target string, running []string) []string OnSwapStart(target string, running []string) } ``` The eviction policy. `EvictionFor` is a **pure decision** — given the target and the complete `running` set, return the running model IDs that must stop. It must not log or mutate anything, and it does **not** inspect process state itself: the scheduler hands it `running` already assembled (every non-stopped process, unioned with the targets of in-flight swaps already committed but not yet visible in process state). That keeps the swapper a pure function of its inputs, with no reference to processes. The reason it must not log is that it is a _speculative_ query — "what would we evict if we started this swap right now?" — called far more often than swaps actually happen. The scheduler calls it once per incoming request, and then **again for every still-queued request on every queue drain** (each `OnSwapDone`, `OnServeDone`, and `OnUnload`). Most of those calls end in "still queued", "collides", or "nothing to evict", not a real swap. Logging there would emit duplicate lines for a request that simply sits in the queue, and lines for decisions that never happen — the log would stop meaning "a swap occurred". `OnSwapStart` is the one place a Swapper may log, because it is called exactly once, at the moment a swap is committed. One log line there equals one real swap, with the evict set that is genuinely being applied — which is why `matrixSwapper` re-solves and logs the full decision (set, DSL, cost) in `OnSwapStart` rather than in `EvictionFor`. ### `Effects` ```go type Effects interface { ModelState(modelID string) (process.ProcessState, bool) RunningModels() map[string]process.ProcessState StartSwap(modelID string, evict []string) GrantError(req HandlerReq, err error) GrantServe(req HandlerReq, modelID string) bool StopProcesses(timeout time.Duration, ids []string) } ``` Implemented by `baseRouter`. This is the scheduler's entire window onto the outside world; everything else about the router is hidden from it. See the deep-dive below. ### `Factory` — wiring it together ```go type Factory func(name string, logger *logmon.Monitor, eff Effects) Scheduler ``` `baseRouter` doesn't know which scheduler or swapper it has — it is handed a `Factory` at construction and calls it once, passing itself as the `Effects`. The concrete router captures its `Swapper` in the closure. From `group.go`: ```go swapper := &groupSwapper{ /* ... */ } base := newBaseRouter("group", conf, processes, proxylog, func(name string, logger *logmon.Monitor, eff scheduler.Effects) scheduler.Scheduler { return scheduler.NewFIFO(name, logger, swapper, eff) }) ``` This closure is the single point where the three pieces meet: it binds a specific `Swapper` (`swapper`) and a specific `Scheduler` (`FIFO`) to the `baseRouter`'s `Effects` (`eff`). **The swapper is a separate type from the concrete router.** There are currently two router implementations router.Group and router.Matrix. Each of these has a custom swapper that implements scheduler.Swapper for custom eviction logic. This decoupling of responsibilities makes it easy to implement custom swapping strategies. ### The events A single goroutine in `baseRouter.run()` owns and serializes all state changes in the router. By processing events one at a time it ensures correctness and eliminates complex mutex lock logic. These are the events the router currently uses: ```go type HandlerReq struct { // one in-flight ServeHTTP awaiting a decision Model string Ctx context.Context Respond chan HandlerResp // UNBUFFERED — see GrantServe contract PositionCh chan int // queue-position updates for the loading UI } type HandlerResp struct { // the decision handed back to the caller HandleFunc http.HandlerFunc // serve with this, or... Err error // ...fail with this } type SwapDone struct{ ModelID string; Err error } // swap goroutine finished type ServeDoneEvent struct{ ModelID string } // tracked handler returned ``` ## Deep-dive: the `Effects` interface and why it exists `Effects` is the inversion-of-control boundary that makes the split possible. The scheduler decides and `baseRouter` _acts_. Pulling the side-effects behind this interface buys three things: 1. **Purity and testability.** The scheduler performs no I/O, starts no goroutines of its own, and touches no real processes. Its tests drive the `On*` methods directly and assert on a `fakeEffects` that just records the calls — synchronous, deterministic, no sleeps. (`scheduler/fifo_test.go`.) 2. **A single, auditable side-effect surface.** Every externally-visible thing a scheduler can do is one of six methods. You can reason about the whole contract by reading one interface. 3. **Decoupling lifetime.** The scheduler never holds a `process.Process`, never sees a channel, and never learns how shutdown teardown works. It only knows model IDs and states. Method by method, as implemented in `base.go`: - **`ModelState(modelID) (state, ok)`** — read-only snapshot of a process's state, and whether this router handles the model at all. The scheduler uses it for the "unknown model" check and the "already ready" fast path. Safe to call any time because the process map is fixed at construction and `State()` is a snapshot. - **`RunningModels()`** — the state of every process that isn't stopped or shut down. The scheduler unions its keys with its own in-flight swap targets to build the `running` set it hands the `Swapper`, so the swapper never has to touch process state itself. - **`StartSwap(modelID, evict)`** — fire-and-forget. `baseRouter` launches the `doSwap` goroutine and returns immediately; the result comes back later as a `SwapDone`. The scheduler records the swap as active _before_ calling this so that requests arriving in the meantime can join it. - **`GrantError(req, err)`** — hand a caller an error response. Used for unknown models, failed swaps, unloads, and shutdown. - **`GrantServe(req, modelID) bool`** — hand a caller the tracked handler for a ready model, returning whether the caller was still there to receive it. The scheduler increments the in-flight count **only on a true return** (see the in-flight contract above). This is the one `Effects` method whose return value carries state-machine significance. - **`StopProcesses(timeout, ids)`** — stop processes in parallel and **block** until all have stopped. Used by `OnUnload` so an admin `Unload` call can guarantee the process is dead by the time it returns. (Note `StartSwap` is async but `StopProcesses` is sync — the difference is deliberate and tied to the caller's expectations.) A useful way to hold it in your head: `Effects` is the scheduler's syscall table. The scheduler is a pure state machine; `Effects` is how it touches the world, and `baseRouter` is the kernel that implements those syscalls with real goroutines, channels, and processes. ## How to implement a new `Swapper` A `Swapper` is a pure decision function plus a logging hook — the easiest of the three pieces to replace. 1. **Write the swapper type** and give it whatever config it needs to make a decision. It does **not** need the process map — the scheduler supplies the running set as an argument. `groupSwapper` holds only its group config; `matrixSwapper` holds only its solver and logger: ```go type mySwapper struct { config config.Config } ``` 2. **Implement `EvictionFor(target, running)`** as a _pure_ decision: - `running` is the complete live set, already assembled for you: every non-stopped process unioned with the targets of in-flight swaps the scheduler has committed to. You don't filter process state or fold in in-flight targets yourself, that's the scheduler's job. Just decide against the slice you're handed. - Return the list of model IDs in `running` that must stop for `target` to run. Return `nil`/empty when nothing needs evicting. - Do **not** mutate state here. - Do **not** log here. It can be called multiple times per request. Since it is pure function have tests verify the expected behaviour. 3. **Implement `OnSwapStart(target, running)`** — called once when a swap actually begins, with the same `running` set `EvictionFor` saw. This is the right place to log: one call equals one real swap. `matrixSwapper` re-solves and logs the chosen set and cost here; `groupSwapper` logs nothing. 4. **Wire it in** by instantiating the swapper in your router's constructor and capturing it in the `Factory` closure passed to `newBaseRouter` — exactly as `NewGroup` and `NewMatrix` do. The router struct itself only ever embeds `*baseRouter`; the swapper reaches the scheduler solely through that closure. Reference implementations: `groupSwapper` (static group config) in `group.go` and `matrixSwapper` (cost-based set solver) in `matrix.go`. ## How to implement a new `Scheduler` Replacing the scheduler means taking over the queue and the entire decision tree. Read `scheduler/fifo.go` end to end first — it is the reference implementation and the rules below are easiest to understand in context. The rules you must honour: - **Single goroutine.** Every method runs on the `baseRouter.run()` goroutine. Keep your state in plain maps/slices and never read or write it from another goroutine. If you need slow work done, hand it to `Effects.StartSwap` and react to the resulting `SwapDone` — do not block a method waiting for it. - **Never block the run loop.** `OnRequest`, `OnSwapDone`, and `OnServeDone` must make a decision and return. The one method allowed to block is `OnUnload`, and only because it must wait on the synchronous `StopProcesses` so the admin caller's guarantee holds. - **Respect the `GrantServe` boolean.** Only count a request as in-flight when `GrantServe` returns true (see the in-flight contract above). A false return means the caller is gone; no `ServeDoneEvent` will ever arrive, so incrementing on false permanently strands the counter. - **Account for in-flight swaps in your running set.** When you call `Swapper.EvictionFor`, the running set you pass must include not just live processes (`Effects.RunningModels`) but also the targets of swaps you've already started that aren't yet visible in process state — otherwise the swapper contradicts decisions already in motion. What each method must do: - **`OnRequest(req)`** — every request must resolve to exactly one of: granted, errored, joined (piggybacks an in-flight swap), queued, or swap-started. No request may be silently dropped. - **`OnSwapDone(ev)`** — deliver the result to every waiter that joined this swap (grant on success, error on `ev.Err`), drop the swap from active tracking, then re-examine anything queued — a finished swap may have unblocked it. - **`OnServeDone(ev)`** — decrement the model's in-flight count; when it hits zero, re-examine the queue. Do **not** clear in-flight counts by hand; the handlers post their own `ServeDoneEvent`s on return. - **`OnUnload(targets, timeout)`** — error out any waiters or queued requests for the unloaded models, call `Effects.StopProcesses` (synchronously — the admin caller relies on the process being dead afterwards), then re-examine the queue. - **`OnShutdown(err)`** — error out every waiter you still hold (active swap waiters and queued requests). Don't touch processes; teardown is `baseRouter`'s job. Expose a constructor matching the `Factory` shape: ```go func NewMyScheduler(name string, logger *logmon.Monitor, swapper Swapper, eff Effects) *MyScheduler { // ... } // in the concrete router: base := newBaseRouter(name, conf, processes, proxylog, func(name string, logger *logmon.Monitor, eff scheduler.Effects) scheduler.Scheduler { return scheduler.NewMyScheduler(name, logger, swapper, eff) }) ``` ## Testing - **Schedulers** are tested as pure state machines in the `scheduler` package: drive the `On*` methods directly against a `fakeEffects` and assert on the recorded grants/starts/stops. No goroutines, no sleeps. See `scheduler/fifo_test.go` as the reference; follow the `TestSchedulerName_` naming convention. - **`baseRouter` mechanism** (run loop, `grant`/`ServeHTTP`, `Unload`, `Shutdown`) is tested in `base_test.go`. The run loop exposes a `testProcessed` channel so tests can wait for an event to be fully processed instead of sleeping. - Run new tests with `go test -v -run TestMyScheduler_... ./internal/router/scheduler/`, then `make test-dev` for a quick `go test` + `staticcheck` pass over `proxy/`.