Files
llama-swap/internal/router/group_test.go
T
Benson Wong 02e015fa49 Introduce new routing backend (#790)
This is a huge backend change that essentially started with rewriting
the concurrency handling for processes and blew up to a refactor of the
entire application. In short these are the improvements:

**Better state and life cycle management:** 

Life cycle management of processes has always been the trickiest part of
the code. Juggling mutex locks between multiple locations to reduce race
conditions was complex. Too complex for my feeble brain to build a
simple mental model around as llama-swap gained more features. All of
that has been refactored. Most of the locks are gone, replaced with a
single run() that owns all state changes. There is one place to start
from now to understand and extend routing logic.

The improved life cycle management makes it easier to implement more
complex swap optimization strategies in the future like #727.

**Collation of requests:**

llama-swap previously handled requests and swapping in the order they
came in. For example requests for models in this order ABCABC would
result in 5 swaps. Now those requests are handled in this order AABBCC.
The result is less time waiting for swap under a high churn request
queue. This fixes #588 #612.

A possible future enhancement is to support a starvation parameter so
swap can be forced when models have been waiting too long.

**Shared base implementation for groups and swap matrix:** 

During the refactor it became clear that much of the swapping logic was
shared between these two implementations. That is not surprising
considering the swap matrix was added many moons after groups. Now they
share a common base and their specific swap strategies are implemented
into the swapPlanner interface.

Requests for bespoke or specific swapping scenarios is a common theme in
the issues. Now users can implement whatever bespoke and weird swapping
strategy they want in their own fork. Just ask your agent of choice to
implement swapPlanner. I'll still remaining more conservative on what
actually lands in core llama-swap and will continue to evaluate PRs if
the changes is good for everyone or just one specific use case.

**AI / Agentic Disclosure:** 

I paid very close attention to the low level swap concurrency design and
implementation. It's important to keep that essential part reliable,
boring and no surprises. Backwards compatibility was also maintained,
even the one way non-exclusive group model loading behaviour that people
have rightly pointed out be a weird design decision.

With the underlying swap core done the web server, api and UI sitting on
top were largely ported over with Claude Code and Opus 4.7 in multiple
phases. If you're curious I kept the changes in docs/newrouter-todo.md.
I did several passes to make sure things weren't left behind.

However, even frontier LLMs at the time of this PR still make small
decisions that don't make a lot of sense. They get shit wrong all the
time, just in small subtle way.

That said, there's likely to be some new bugs introduced with this
massive refactor. I'm fairly confident that there's no major
architectural flaws that would cause goal seeking agents to make dumb,
ugly code decisions.

For a little while the legacy llama-swap will be available under
cmd/legacy/llama-swap. The plan is to eventually delete that entry point
as well as the proxy package.

On a bit of a personal note, this PR is exciting and a bit sad for me. I
hand wrote much of the original code and this PR ultimately replaces
much of it. While the old code served as a good reference for the agent
to implement the new stuff it still a bit sad to eventually delete it
all.
2026-05-28 21:47:01 -07:00

332 lines
9.2 KiB
Go

package router
import (
"io"
"net/http"
"net/http/httptest"
"testing"
"time"
"github.com/mostlygeek/llama-swap/internal/config"
"github.com/mostlygeek/llama-swap/internal/logmon"
"github.com/mostlygeek/llama-swap/internal/process"
)
// newTestGroup builds a Group directly from the supplied processes and config,
// bypassing NewGroup's call to process.New.
func newTestGroup(t *testing.T, conf config.Config, processes map[string]process.Process) *Group {
t.Helper()
modelToGroup := make(map[string]string)
for gid, gcfg := range conf.Groups {
for _, mid := range gcfg.Members {
modelToGroup[mid] = gid
}
}
planner := &groupPlanner{
config: conf,
modelToGroup: modelToGroup,
processes: processes,
}
base := newBaseRouter("group", conf, processes, planner, logmon.NewWriter(io.Discard))
base.testProcessed = make(chan struct{}, 64)
g := &Group{baseRouter: base}
go base.run()
t.Cleanup(func() {
if !g.shuttingDown.Load() {
_ = g.Shutdown(time.Second)
}
})
return g
}
func TestGroup_NewGroup_DuplicateMembership(t *testing.T) {
conf := config.Config{
Groups: map[string]config.GroupConfig{
"g1": {Swap: true, Members: []string{"a"}},
"g2": {Swap: true, Members: []string{"a"}},
},
Models: map[string]config.ModelConfig{
"a": {},
},
}
log := logmon.NewWriter(io.Discard)
if _, err := NewGroup(conf, log, log); err == nil {
t.Fatalf("expected error for duplicate membership")
}
}
func TestGroup_ServeHTTP_SwapStopsPrevious(t *testing.T) {
a := newFakeProcess("a")
a.markReady()
go a.Run(0) // park a Run goroutine so Stop has something to release
b := newFakeProcess("b")
b.autoReady = true
conf := config.Config{
HealthCheckTimeout: 5,
Groups: map[string]config.GroupConfig{
"g": {Swap: true, Exclusive: true, Members: []string{"a", "b"}},
},
}
g := newTestGroup(t, conf, map[string]process.Process{"a": a, "b": b})
w := httptest.NewRecorder()
g.ServeHTTP(w, newRequest("b"))
if w.Code != http.StatusOK {
t.Fatalf("status=%d body=%q", w.Code, w.Body.String())
}
if got := a.stopCalls.Load(); got != 1 {
t.Errorf("a.stopCalls=%d want 1", got)
}
if got := b.runCalls.Load(); got != 1 {
t.Errorf("b.runCalls=%d want 1", got)
}
if got := b.serveCalls.Load(); got != 1 {
t.Errorf("b.serveCalls=%d want 1", got)
}
}
func TestGroup_NonSwapGroup_NoStop(t *testing.T) {
a := newFakeProcess("a")
a.markReady()
b := newFakeProcess("b")
b.autoReady = true
conf := config.Config{
HealthCheckTimeout: 5,
Groups: map[string]config.GroupConfig{
"g": {Swap: false, Exclusive: false, Members: []string{"a", "b"}},
},
}
g := newTestGroup(t, conf, map[string]process.Process{"a": a, "b": b})
w := httptest.NewRecorder()
g.ServeHTTP(w, newRequest("b"))
if w.Code != http.StatusOK {
t.Fatalf("status=%d body=%q", w.Code, w.Body.String())
}
if got := a.stopCalls.Load(); got != 0 {
t.Errorf("a.stopCalls=%d want 0 (swap=false should not stop siblings)", got)
}
if got := b.runCalls.Load(); got != 1 {
t.Errorf("b.runCalls=%d want 1", got)
}
}
func TestGroup_CrossGroupExclusive(t *testing.T) {
a := newFakeProcess("a")
a.markReady()
go a.Run(0)
b := newFakeProcess("b")
b.autoReady = true
conf := config.Config{
HealthCheckTimeout: 5,
Groups: map[string]config.GroupConfig{
"g1": {Swap: true, Exclusive: true, Members: []string{"a"}},
"g2": {Swap: true, Exclusive: true, Members: []string{"b"}},
},
}
g := newTestGroup(t, conf, map[string]process.Process{"a": a, "b": b})
w := httptest.NewRecorder()
g.ServeHTTP(w, newRequest("b"))
if w.Code != http.StatusOK {
t.Fatalf("status=%d body=%q", w.Code, w.Body.String())
}
if got := a.stopCalls.Load(); got != 1 {
t.Errorf("a.stopCalls=%d want 1 (cross-group exclusive must stop)", got)
}
}
// TestGroup_CrossGroupNonExclusiveParallel verifies that two requests for
// models in distinct non-exclusive groups load in parallel rather than
// serializing through the router's run loop.
func TestGroup_CrossGroupNonExclusiveParallel(t *testing.T) {
a := newFakeProcess("a")
pb := newFakeProcess("b")
conf := config.Config{
HealthCheckTimeout: 5,
Groups: map[string]config.GroupConfig{
"g1": {Swap: true, Exclusive: false, Members: []string{"a"}},
"g2": {Swap: true, Exclusive: false, Members: []string{"b"}},
},
}
g := newTestGroup(t, conf, map[string]process.Process{"a": a, "b": pb})
w1 := httptest.NewRecorder()
done1 := make(chan struct{})
go func() {
g.ServeHTTP(w1, newRequest("a"))
close(done1)
}()
waitProcessed(t, g.testProcessed, 1)
w2 := httptest.NewRecorder()
done2 := make(chan struct{})
go func() {
g.ServeHTTP(w2, newRequest("b"))
close(done2)
}()
waitProcessed(t, g.testProcessed, 1)
// Both groups load concurrently — both must reach Run() before either is
// marked ready. If the router still serialised, only one would proceed.
<-a.runStarted
<-pb.runStarted
a.markReady()
pb.markReady()
for i, ch := range []chan struct{}{done1, done2} {
select {
case <-ch:
case <-time.After(time.Second):
t.Fatalf("request %d did not complete", i)
}
}
if got := a.stopCalls.Load(); got != 0 {
t.Errorf("a.stopCalls=%d want 0 (parallel groups don't evict each other)", got)
}
if got := pb.stopCalls.Load(); got != 0 {
t.Errorf("b.stopCalls=%d want 0 (parallel groups don't evict each other)", got)
}
}
// TestGroup_SameGroupSwapSerialises verifies that two same-group requests
// (Swap=true) serialise even when both arrive while neither has reached
// StateStarting yet — the alsoRunning hint to the planner closes that race.
func TestGroup_SameGroupSwapSerialises(t *testing.T) {
a := newFakeProcess("a")
pb := newFakeProcess("b")
conf := config.Config{
HealthCheckTimeout: 5,
Groups: map[string]config.GroupConfig{
"g": {Swap: true, Exclusive: false, Members: []string{"a", "b"}},
},
}
g := newTestGroup(t, conf, map[string]process.Process{"a": a, "b": pb})
w1 := httptest.NewRecorder()
done1 := make(chan struct{})
go func() {
g.ServeHTTP(w1, newRequest("a"))
close(done1)
}()
waitProcessed(t, g.testProcessed, 1)
// Request B arrives before A transitions to StateStarting in the process
// state machine. Without the alsoRunning hint, the planner would not see
// A as running, and B would start in parallel, violating Swap=true.
w2 := httptest.NewRecorder()
done2 := make(chan struct{})
go func() {
g.ServeHTTP(w2, newRequest("b"))
close(done2)
}()
waitProcessed(t, g.testProcessed, 1)
if got := pb.runCalls.Load(); got != 0 {
t.Errorf("b started in parallel: runCalls=%d want 0", got)
}
<-a.runStarted
a.markReady()
waitProcessed(t, g.testProcessed, 1) // swapDone(a) → b promoted
<-pb.runStarted
pb.markReady()
for i, ch := range []chan struct{}{done1, done2} {
select {
case <-ch:
case <-time.After(time.Second):
t.Fatalf("request %d did not complete", i)
}
}
if got := a.stopCalls.Load(); got != 1 {
t.Errorf("a.stopCalls=%d want 1 (b's swap must stop a)", got)
}
}
// TestGroup_PersistentNotEvicted verifies that a group with persistent=true
// is never evicted when another exclusive group starts loading. The running
// model in the persistent group stays alive alongside the new one.
func TestGroup_PersistentNotEvicted(t *testing.T) {
a := newFakeProcess("a")
a.markReady()
go a.Run(0)
b := newFakeProcess("b")
b.autoReady = true
conf := config.Config{
HealthCheckTimeout: 5,
Groups: map[string]config.GroupConfig{
"persist": {Swap: true, Exclusive: false, Persistent: true, Members: []string{"a"}},
"other": {Swap: true, Exclusive: true, Members: []string{"b"}},
},
}
g := newTestGroup(t, conf, map[string]process.Process{"a": a, "b": b})
w := httptest.NewRecorder()
g.ServeHTTP(w, newRequest("b"))
if w.Code != http.StatusOK {
t.Fatalf("status=%d body=%q", w.Code, w.Body.String())
}
if got := a.stopCalls.Load(); got != 0 {
t.Errorf("a.stopCalls=%d want 0 (persistent group must not be evicted)", got)
}
if a.State() != process.StateStarting && a.State() != process.StateReady {
t.Errorf("a state=%s want still running", a.State())
}
if got := b.runCalls.Load(); got != 1 {
t.Errorf("b.runCalls=%d want 1", got)
}
}
// TestGroup_NonExclusiveDoesNotUnloadExclusive pins a backwards-compatible
// gotcha from the original ProcessGroup: when a model in a non-exclusive group
// is loaded, any running exclusive group keeps running. The two coexist.
func TestGroup_NonExclusiveDoesNotUnloadExclusive(t *testing.T) {
a := newFakeProcess("a")
a.markReady()
go a.Run(0)
b := newFakeProcess("b")
b.autoReady = true
conf := config.Config{
HealthCheckTimeout: 5,
Groups: map[string]config.GroupConfig{
"g1": {Swap: true, Exclusive: true, Members: []string{"a"}},
"g2": {Swap: true, Exclusive: false, Members: []string{"b"}},
},
}
g := newTestGroup(t, conf, map[string]process.Process{"a": a, "b": b})
w := httptest.NewRecorder()
g.ServeHTTP(w, newRequest("b"))
if w.Code != http.StatusOK {
t.Fatalf("status=%d body=%q", w.Code, w.Body.String())
}
if got := a.stopCalls.Load(); got != 0 {
t.Errorf("a.stopCalls=%d want 0 (non-exclusive target must not unload exclusive group)", got)
}
if a.State() != process.StateStarting && a.State() != process.StateReady {
t.Errorf("a state=%s want still running", a.State())
}
if got := b.runCalls.Load(); got != 1 {
t.Errorf("b.runCalls=%d want 1", got)
}
}