- Make ModelLogsTab fill available vertical space instead of fixed h-80
- Add min-h-0 flex-1 to Logs Tabs.Content so height propagates
- Set closeOnSelect=false on column visibility checkbox items to keep
the dropdown open while toggling multiple columns
- Add ref prop to ExpandableTextarea to expose the underlying textarea
- Track streaming state transitions in ChatInterface and refocus the
input via $effect when isStreaming flips to false
Reorder sidebar menu to Activity, Playground, Models, Logs. Remove the
ll header icon and replace it with the connection status indicator
moved from the footer. Add a Settings page (gear icon) at the bottom
that surfaces the build information that was previously hidden behind
the status indicator's tooltip.
- move ConnectionStatus into Sidebar.Header, drop build-info tooltip
- add Settings.svelte route showing version/commit/build date
- register /settings route and title in App.svelte
- add showPagination to Activity route's ActivityTable
- fix pagination reactivity: reassign pagination object in
onPaginationChange so TanStack's effect.pre detects the change, and
reset to first page only when pageSize changes
- move data-change page reset into untrack to avoid clobbering
navigation
- render Cached/Prompt/Drafted headers with a dotted underline trigger
instead of a separate info icon
Replace the hand-rolled table with a TanStack Table-backed shadcn
data-table using the FlexRender/createSvelteTable helpers, with
DropdownMenu column visibility, Select page-size, and icon-button
pagination. Column visibility and page size persist to localStorage.
Move tooltip usage to the canonical shadcn-svelte pattern by adding a
single root Tooltip.Provider in App.svelte and using Tooltip.Root/
Trigger/Content directly in the activity-table sub-components
(HeaderLabel, MetaCell), dropping the custom Tooltip/MetadataTooltip
wrappers.
- add @tanstack/table-core and shadcn data-table helpers
- split cell/header renderers into activity-table/* sub-components
- switch pagination/visibility to TanStack Table state driven by
table.nextPage/previousPage/setPageIndex/setPageSize and
column.toggleVisibility
- Add /models route (ModelsDash) with unload-all, model list with
start/stop buttons, and show-unlisted toggle
- Make sidebar Models and Playground headings navigate to their pages
while the chevron independently expands/collapses the section
- Extract shared model load/unload orchestration into modelLoad store
- Left-align model names in the ConcurrencyInterface load-test list
- Widen CaptureDialog to 90% with flex-based scroll overflow
- Use sm:max-w-[90%] to override the shadcn dialog's sm:max-w-sm cap
- Add ActivityTable component consolidating column customization,
table rendering, pagination, and capture dialog previously
duplicated between Activity.svelte and ModelDetail.svelte
- Split ModelDetail tabs into ModelActivityTab, ModelLogsTab, and
ModelDetailsTab components under components/model/
- Reduce Activity.svelte and ModelDetail.svelte to thin shells
- ModelDetail tabs now reuse ActivityTable instead of duplicating
column management, formatting, and capture logic
Move Models to the top of the sidebar as a collapsible item with each
model listed as a sub-item.
- add persistent modelsMenuOpen store for expand state
- show status dot per model (grey/yellow/green for stopped/changing/loaded)
- right-aligned load/unload button with Play/PowerOff/Loader2 icon
- button stops propagation so it doesn't trigger navigation
Move Playground tabs into the sidebar as collapsible sub-items and make
the sidebar the sole navigation for playground interfaces.
- add collapsible UI primitive (bits-ui wrapper)
- add playground store with selected tab and menu open state (persistent)
- make Playground menu item collapsible; whole button toggles expand state
- move playground sub-items (Chat/Images/Speech/etc) under Playground
- remove in-page Tabs from Playground.svelte
- update sectionTitle breadcrumb to reflect active sub-item
- remove bg-sidebar panel background so items sit on page background
- remove persistent data-active background tint on menu items
fixes#123
Convert the remaining .btn usages (Concurrency, Performance, CaptureDialog)
to shadcn Button, fix CaptureDialog/PerformanceChart styles to shadcn
tokens, and remove the transitional legacy palette aliases and component
classes from index.css. Drop the now-unused lucide-svelte and shadcn-svelte
dependencies.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
- RerankInterface uses Button/Input/Textarea/ToggleGroup
- normalize legacy color utilities and lucide imports across the
remaining playground interfaces, Performance and CaptureDialog
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
Add AppSidebar built from the shadcn sidebar primitives (collapsible icon
rail, editable title, nav with active states, footer theme toggle and
connection status) and wrap the app in a sidebar provider with an inset
top bar. Preserves the always-mounted Playground pattern.
- add src/components/AppSidebar.svelte
- restructure App.svelte around Sidebar.Provider/Inset
- remove Header.svelte
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
Set up shadcn-svelte components and adopt its design-token system as the
base for modernizing the UI. Switch dark mode from the data-theme attribute
to the .dark class so shadcn primitives theme correctly.
- add components.json, $lib alias (tsconfig + vite), cn() util
- install shadcn primitives under src/lib/components/ui
- rewrite index.css with shadcn tokens (zinc + brand teal accent)
- keep legacy utility/class aliases as a transitional shim
- toggle .dark class from theme store in App.svelte
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
Over time the llama-swap configuration file can get really long and
challenging to work with. The -config-dir flag is used for a directory
of configuration YAML fragments.
These fragments are merged together and into a full configuration and
tested for validity. All previous configuration functionality remains
unchanged.
Add upstream.ignorePaths config to prevent model swaps for static-asset
requests made through the /upstream/<model>/<path> passthrough endpoint.
- add UpstreamConfig with compiled *regexp.Regexp slice; invalid regex
returns an error at load time
- apply a default pattern matching common static-asset suffixes
(.js/.json/.css/.png/.gif/.jpg/.jpeg/.ico/.txt) when unset
- in handleUpstream, return 409 Conflict when a path matches and the
local model is not already loaded; peer and already-loaded models fall
through to normal dispatch
- update config-schema.json and config.example.yaml
Updates discussion: #868
Store a request/response capture for non-200 responses so failed
requests can be inspected in the activity log's Capture dialog, matching
the existing behavior for successful requests.
- extract storeCapture/decodeResponseBody helpers to share capture logic
between the success and non-200 paths
- record non-200 bodies (decompressed) so error details are viewable
- the activity UI already gates the View button on has_capture, so it
now appears for failed requests with no UI changes
- add tests for capturing failed requests and the disabled-captures case
closes#766
Wrap /upstream/{upstreamPath...} in the metrics middleware so activity
log entries are recorded for model-dispatched endpoints accessed through
the upstream passthrough.
- Move findModelInPath to shared.FindModelInPath and reuse it in
handleUpstream, the log monitor lookup, and FetchContext.
- Extend FetchContext to resolve the model from /upstream/<model>/...
paths without consuming the request body.
- Add isMetricsRecordPath to limit recording to the model-dispatched
endpoints that produce token usage/timings.
- Add tests for upstream metrics recording and FetchContext upstream
path resolution.
Fixes#855
Add GPU monitoring support for AMD and Intel GPUs on Windows using
D3DKMT (DirectX) and PDH performance counters.
- Add PDH-based GPU utilization via \GPU Engine(*)\Utilization
Percentage counter, summing all engine types per adapter (3D, Compute,
Copy, Video).
- Add D3DKMT bindings for adapter enumeration, memory segments, and
adapter perf data.
- Use PDH as primary utilization source (works on all vendors), with
D3DKMT RunningTime as fallback for systems without PDH counters.
- Prefer nvidia-smi when available, fall back to D3DKMT + PDH for
AMD/Intel.
- Backend priority: nvidia-smi -> D3DKMT + PDH -> ErrNoGpuTool.
Verified on AMD 7900XTX GPU with llama.cpp Vulkan & ROCm backend: GPU
utilization correctly shows ~99% during inference, ~0-2% when idle.
---
LLM disclosure: GLM 5.1 & Kimi K2.6 have been used extensively during
exploration and coding to the point that the LLM's wrote over 3/4 of the
code, and I have done additional verification myself.
As such, it should be considered experimental.
Additional verification is needed.
I have tested it on my 7900XTX system with Windows 11, and it works
correctly, but as I only have this one rig, I cannot verify it
everywhere.
- add support for http handlers in the request chain to append metadata
to the request
- metrics middleware will include metadata in the activity log
- update Activity UI to support metadata, drag sort columns
- update Activity UI capture dialog to use more screen space
Updates #834
- make concurrency limiting the scheduler.Scheduler's responsibility
- eliminate the separate concurrency limit middleware
- move concurrencyLimit logic into scheduler.FIFO to maintain backwards compatibility
- add HTTPError from #834
Updates #834
- When a model is manually loaded show a cancel buttton and a queued
status
- Implement cancellation in scheduler.Scheduler interface and FIFO
scheduler
- Add cache bust query parameter to bypass browser cache
Fixes#844
internal/config,server: implement model capabilities
- define the capabilities of a model using a simple config block on the
model
- v1/models renders out capabilities to be compatible with openrouter,
huggingface chat, and mistral formats for broader compatibility
- add support for capabilities in UI
Fixes#734
- introduce internal/router/scheduler to decouple routing, swapping and
queuing into interface contracts.
- introduce a new `routing` configuration section that supersedes
`matrix` and `group` while maintaining backwards compatibility
- add FIFO scheduler with prioritized queuing
- add internal/router/design.md as developer documentation on
implementing new schedulers and routers
Fixes#797
In v221 the shutdown behaviour was refactored so shutdown behaviour was
more reliable in stopping a process group. This exposed an existing bug
where the unload API had a timeout of 0 that snuck in during the big
refactor.
- set a default timeout of 10 seconds for unloads called via the API
- add logging around shutdown routine
updates: #807, #808fixes: #827
Replace TARGETARCH build-arg with runtime arch detection via uname -m.
BuildKit's TARGETARCH injection was unreliable for the multi-arch cpu
build, causing the arm64 image variant to download and embed the x86_64
llama-swap binary — resulting in "exec format error" on arm64 hosts.
With QEMU user-space emulation, uname -m correctly returns aarch64
inside an arm64 container build, so the download always fetches the
right binary for the actual target architecture. Also adds --fail to
curl so HTTP 404s produce a build error instead of silently embedding an
HTML error page.
fixes#818
Co-authored-by: Claude <noreply@anthropic.com>
Shift the non-unified container builds about 8 hours after the llama.cpp's projects container publishing window. The llama.cpp containers take a few hours to build and publish and 8 hours is expected to be enough time to remain fresh.
Additionally, add an extra build at 18:00 in case the 12:00 one does not pick things up. The container builds on the llama-swap side are cheap (just injecting llama-swap binary) so it is fine to run them a bit more frequently.
Implement performance monitoring on OSX for Apple Silicon hardware.
The implementation uses a combination of mactop and ioreg. If mactop is
installed (`brew install mactop`) it is used in a headless cli mode to
stream usage metrics. mactop hooks into unpublished(?) C based APIs in
OSX. Rather than introduce a cgo dependency into llama-swap's build
chain only for darwin I opted to go the external process route.
ioreg, which comes bundled with OSX is used as the fallback. It does not
provide temperature and power usage data but is able to show accurate
GPU and memory utilization.
Updates #771, #814
Add Windows specific shutdown code paths so stopping of child processes
is more reliable:
- stopping llama-swap won't leave behind any child processes it created
- uses Job Objects in Windows so the whole llama-swap tree is closed by
the os
- add procCtx to baseRouter. It replaces shutdownCtx as a signal for
managing lifetime state.
- shutdownCtx is only used by the router to stop handling new requests
during shutdown
- improve debug logging to make it easier to trace source of issues
Fixes#804
Updates #807
Note: The original proxy/process_unix.go had a noop for setProcAttributes
so it also did not stop grandchildren processes. This patch adds that capability
and improves reliability.
--
Stop() no longer hangs on a shell wrapper that forks the real binary.
The upstream is built with exec.CommandContext + cmd.Cancel +
cmd.WaitDelay, so cmd.Wait() returns even when a forked grandchild
inherits the stdout/stderr pipes. killProcess sends the stop signal
directly (not by cancelling the context) so cmd.WaitDelay measures from
process exit and never silently caps the caller's graceful timeout.
The upstream is also started in its own process group (Setpgid) on Unix,
so the graceful SIGTERM — and the SIGKILL escalation after the timeout —
are delivered to the whole group via the negative PID. A forked
grandchild is reaped with its parent instead of leaking as an orphan.
The loading-spinner SSE goroutine can no longer panic when it outlives
the request. net/http recycles the response writer via Reset(nil) once
ServeHTTP returns; the orphaned goroutine then flushed against a
nil-backed writer and crashed with a SIGSEGV. A release() fence on
loadingWriter lets any in-flight write finish then short-circuits later
writes/flushes, and all three ServeHTTP select branches run a
finishLoading helper (cancelLoad, waitForCompletion, release) before the
writer is reclaimed.
- internal/process: exec.CommandContext + WaitDelay, Setpgid process
groups, group-wide SIGTERM/SIGKILL teardown
- internal/router: release() fence + finishLoading on loadingWriter
fixes#804
update the concurrency middleware to respond with a JSON payload instead
of plain text when the request limit is reached to be compatible with
openai api standard
---------
Co-authored-by: Ludwik <l.czarnota@samsung.com>