Add a strict one-model-at-a-time scheduler. Requests run in exact
arrival order; at most one runs at a time; switching to a different
model evicts every other running model first, so only one model occupies
memory at a time. Unlike fifo, it never reorders or batches same-model
requests, and it ignores group/matrix co-residency entirely, making the
single-model guarantee a property of the scheduler rather than the config.
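
A minimal sketch of the ordering and eviction rule described above, not the
code in this commit. The `Loader` interface, `NewSerial`, `Run`, and
`recordingLoader` are hypothetical names chosen for illustration and are not
the project's actual Scheduler interface. It also assumes at most one model
is ever resident, so "evict every other running model" reduces to unloading
the single previously loaded one.

```go
package main

import "sync"

// Loader is a hypothetical hook for bringing models in and out of memory.
type Loader interface {
	Load(model string)
	Unload(model string)
}

// Serial runs requests strictly one at a time, in ticket (arrival) order.
type Serial struct {
	mu      sync.Mutex
	cond    *sync.Cond
	next    uint64 // next ticket to hand out
	serving uint64 // ticket that is currently allowed to run
	loaded  string // model currently in memory, "" if none
	loader  Loader
}

func NewSerial(l Loader) *Serial {
	s := &Serial{loader: l}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Run blocks until every earlier request has finished, swaps models if the
// requested one is not resident, and then calls fn.
func (s *Serial) Run(model string, fn func()) {
	// Take a ticket and wait for our turn. Tickets are handed out under the
	// mutex, so turn order matches the order in which callers arrived here.
	s.mu.Lock()
	ticket := s.next
	s.next++
	for ticket != s.serving {
		s.cond.Wait()
	}
	s.mu.Unlock()

	// Hand the turn to the next ticket even if fn panics.
	defer func() {
		s.mu.Lock()
		s.serving++
		s.mu.Unlock()
		s.cond.Broadcast()
	}()

	// Only the current ticket holder reaches this point, so s.loaded is
	// never accessed concurrently.
	if s.loaded != model {
		if s.loaded != "" {
			s.loader.Unload(s.loaded) // evict before loading the new model
		}
		s.loader.Load(model)
		s.loaded = model
	}
	fn()
}

// recordingLoader is a demo Loader that records each call in order.
type recordingLoader struct{ events []string }

func (r *recordingLoader) Load(m string)   { r.events = append(r.events, "load "+m) }
func (r *recordingLoader) Unload(m string) { r.events = append(r.events, "unload "+m) }
```

In this sketch, two consecutive requests for the same model reuse the
resident model without reloading it, while a request for a different model
unloads the old one before loading the new one. Same-model requests are
still never batched or reordered.
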
- new Serial scheduler implementing the Scheduler interface
- register "serial" in scheduler.New; default routing.scheduler.use to
"serial" at config load (fifo still selectable for upstream behavior)
- update config schema, example config, and config defaults tests
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- add support for HTTP handlers in the request chain to append metadata
to the request
- metrics middleware will include metadata in the activity log
- update Activity UI to support metadata and drag-to-sort columns
- update Activity UI capture dialog to use more screen space
Updates #834
- make concurrency limiting the scheduler.Scheduler's responsibility
- eliminate the separate concurrency limit middleware
- move concurrencyLimit logic into scheduler.FIFO to maintain backwards compatibility
- add HTTPError from #834
Updates #834
- When a model is manually loaded, show a cancel button and a queued
status
- Implement cancellation in scheduler.Scheduler interface and FIFO
scheduler
- Add cache bust query parameter to bypass browser cache
Fixes #844
- introduce internal/router/scheduler to decouple routing, swapping and
queuing into interface contracts.
- introduce a new `routing` configuration section that supersedes
`matrix` and `group` while maintaining backwards compatibility
- add FIFO scheduler with prioritized queuing
- add internal/router/design.md as developer documentation on
implementing new schedulers and routers
Fixes #797