llama-swap

Author	SHA1	Message	Date
Benson Wong	d567fa78cb	npm audit fix	2026-06-28 04:38:45 +00:00
Benson Wong	187f1ae27a	ui: fix logs tab height and column toggle dropdown - Make ModelLogsTab fill available vertical space instead of fixed h-80 - Add min-h-0 flex-1 to Logs Tabs.Content so height propagates - Set closeOnSelect=false on column visibility checkbox items to keep the dropdown open while toggling multiple columns	2026-06-28 04:36:56 +00:00
Benson Wong	0ae56b1eb9	ui: convert chat settings panel to a dialog Replace the inline settings panel with a modal Dialog that pops up over the chat interface, matching the CaptureDialog pattern.	2026-06-28 04:22:01 +00:00
Benson Wong	e46cbeb2bf	ui: refocus message input after chat generation completes - Add ref prop to ExpandableTextarea to expose the underlying textarea - Track streaming state transitions in ChatInterface and refocus the input via $effect when isStreaming flips to false	2026-06-28 04:16:23 +00:00
Benson Wong	a0578f0007	ui: reorganize sidebar and add Settings page Reorder sidebar menu to Activity, Playground, Models, Logs. Remove the ll header icon and replace it with the connection status indicator moved from the footer. Add a Settings page (gear icon) at the bottom that surfaces the build information that was previously hidden behind the status indicator's tooltip. - move ConnectionStatus into Sidebar.Header, drop build-info tooltip - add Settings.svelte route showing version/commit/build date - register /settings route and title in App.svelte	2026-06-28 03:53:14 +00:00
Benson Wong	d207a059a4	ui: enable pagination on Activity page and fix table reactivity - add showPagination to Activity route's ActivityTable - fix pagination reactivity: reassign pagination object in onPaginationChange so TanStack's effect.pre detects the change, and reset to first page only when pageSize changes - move data-change page reset into untrack to avoid clobbering navigation - render Cached/Prompt/Drafted headers with a dotted underline trigger instead of a separate info icon	2026-06-28 03:43:55 +00:00
Benson Wong	040ee1e284	ui: convert ActivityTable to shadcn-svelte data-table Replace the hand-rolled table with a TanStack Table-backed shadcn data-table using the FlexRender/createSvelteTable helpers, with DropdownMenu column visibility, Select page-size, and icon-button pagination. Column visibility and page size persist to localStorage. Move tooltip usage to the canonical shadcn-svelte pattern by adding a single root Tooltip.Provider in App.svelte and using Tooltip.Root/ Trigger/Content directly in the activity-table sub-components (HeaderLabel, MetaCell), dropping the custom Tooltip/MetadataTooltip wrappers. - add @tanstack/table-core and shadcn data-table helpers - split cell/header renderers into activity-table/* sub-components - switch pagination/visibility to TanStack Table state driven by table.nextPage/previousPage/setPageIndex/setPageSize and column.toggleVisibility	2026-06-28 03:26:24 +00:00
Benson Wong	82cad1b84e	ui: add ModelsDash route, clickable sidebar headings, and dialog tweaks - Add /models route (ModelsDash) with unload-all, model list with start/stop buttons, and show-unlisted toggle - Make sidebar Models and Playground headings navigate to their pages while the chevron independently expands/collapses the section - Extract shared model load/unload orchestration into modelLoad store - Left-align model names in the ConcurrencyInterface load-test list - Widen CaptureDialog to 90% with flex-based scroll overflow - Use sm:max-w-[90%] to override the shadcn dialog's sm:max-w-sm cap	2026-06-28 03:04:04 +00:00
Benson Wong	55c3678906	ui: extract shared ActivityTable and split ModelDetail into components - Add ActivityTable component consolidating column customization, table rendering, pagination, and capture dialog previously duplicated between Activity.svelte and ModelDetail.svelte - Split ModelDetail tabs into ModelActivityTab, ModelLogsTab, and ModelDetailsTab components under components/model/ - Reduce Activity.svelte and ModelDetail.svelte to thin shells - ModelDetail tabs now reuse ActivityTable instead of duplicating column management, formatting, and capture logic	2026-06-28 02:27:05 +00:00
Benson Wong	8b5a62d92a	ui-svelte: big convert to shadcn components	2026-06-28 01:53:19 +00:00
Benson Wong	d1e4c8ee77	ui tweaks	2026-06-28 01:21:40 +00:00
Benson Wong	11f8afead8	ui: add collapsible Models section to sidebar Move Models to the top of the sidebar as a collapsible item with each model listed as a sub-item. - add persistent modelsMenuOpen store for expand state - show status dot per model (grey/yellow/green for stopped/changing/loaded) - right-aligned load/unload button with Play/PowerOff/Loader2 icon - button stops propagation so it doesn't trigger navigation	2026-06-27 23:54:18 +00:00
Benson Wong	749819ef47	ui: consolidate playground nav into sidebar Move Playground tabs into the sidebar as collapsible sub-items and make the sidebar the sole navigation for playground interfaces. - add collapsible UI primitive (bits-ui wrapper) - add playground store with selected tab and menu open state (persistent) - make Playground menu item collapsible; whole button toggles expand state - move playground sub-items (Chat/Images/Speech/etc) under Playground - remove in-page Tabs from Playground.svelte - update sectionTitle breadcrumb to reflect active sub-item - remove bg-sidebar panel background so items sit on page background - remove persistent data-active background tint on menu items fixes #123	2026-06-27 16:46:10 +00:00
Claude	0ab9e74333	ui: finish shadcn migration and remove legacy shim Convert the remaining .btn usages (Concurrency, Performance, CaptureDialog) to shadcn Button, fix CaptureDialog/PerformanceChart styles to shadcn tokens, and remove the transitional legacy palette aliases and component classes from index.css. Drop the now-unused lucide-svelte and shadcn-svelte dependencies. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC	2026-06-27 12:10:56 +00:00
Claude	b20be6dcd1	ui: convert Image, Speech, Audio interfaces to shadcn buttons Replace .btn elements and inline SVG icons with shadcn Button and @lucide/svelte icons in the image, speech and audio playground tabs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC	2026-06-27 12:05:19 +00:00
Claude	fc24722258	ui: migrate Rerank and normalize remaining views to shadcn tokens - RerankInterface uses Button/Input/Textarea/ToggleGroup - normalize legacy color utilities and lucide imports across the remaining playground interfaces, Performance and CaptureDialog Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC	2026-06-27 12:01:19 +00:00
Claude	2b087dffb1	ui: migrate ChatMessage to shadcn tokens Use shadcn Button/Textarea, @lucide/svelte icons, and map prose/code-block styles to shadcn CSS variables. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC	2026-06-27 11:58:24 +00:00
Claude	746c083a87	ui: migrate chat playground and stats to shadcn - ChatInterface controls, settings, input use Button/Input/Textarea/Label - ExpandableTextarea and ModelSelector restyled on shadcn tokens - ActivityStats wrapped in Card; Tooltip uses shadcn tooltip Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC	2026-06-27 11:56:31 +00:00
Claude	8dd91e99e8	ui: migrate Activity, Logs views to shadcn - Activity table wrapped in Card with restyled column menu and Button - LogPanel toolbar uses Button/Input with lucide icons - LogViewer source switch uses a ToggleGroup Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC	2026-06-27 11:52:11 +00:00
Claude	136dcdc25f	ui: migrate Models panel and Playground to shadcn - ModelsPanel uses Card, Button, Badge and a dropdown menu for actions - Playground uses shadcn Tabs for the switcher while keeping every interface mounted to preserve state Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC	2026-06-27 11:49:16 +00:00
Claude	767b8015fa	ui: replace top navbar with shadcn sidebar layout Add AppSidebar built from the shadcn sidebar primitives (collapsible icon rail, editable title, nav with active states, footer theme toggle and connection status) and wrap the app in a sidebar provider with an inset top bar. Preserves the always-mounted Playground pattern. - add src/components/AppSidebar.svelte - restructure App.svelte around Sidebar.Provider/Inset - remove Header.svelte Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC	2026-06-27 11:46:30 +00:00
Claude	f0144a2361	ui: add shadcn-svelte foundation and theming Set up shadcn-svelte components and adopt its design-token system as the base for modernizing the UI. Switch dark mode from the data-theme attribute to the .dark class so shadcn primitives theme correctly. - add components.json, $lib alias (tsconfig + vite), cn() util - install shadcn primitives under src/lib/components/ui - rewrite index.css with shadcn tokens (zinc + brand teal accent) - keep legacy utility/class aliases as a transitional shim - toggle .dark class from theme store in App.svelte Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC	2026-06-27 11:42:43 +00:00
Benson Wong	32bc781326	internal/config,watcher: add -config-dir (#873 ) Over time the llama-swap configuration file can get really long and challenging to work with. The -config-dir flag is used for a directory of configuration YAML fragments. These fragments are merged together into a full configuration and tested for validity. All previous configuration functionality remains unchanged. v230	2026-06-24 20:48:51 -07:00
Benson Wong	316ad63f76	config,server: add upstream.ignorePaths (#869 ) Add upstream.ignorePaths config to prevent model swaps for static-asset requests made through the /upstream/<model>/<path> passthrough endpoint. - add UpstreamConfig with compiled *regexp.Regexp slice; invalid regex returns an error at load time - apply a default pattern matching common static-asset suffixes (.js/.json/.css/.png/.gif/.jpg/.jpeg/.ico/.txt) when unset - in handleUpstream, return 409 Conflict when a path matches and the local model is not already loaded; peer and already-loaded models fall through to normal dispatch - update config-schema.json and config.example.yaml Updates discussion: #868 v229	2026-06-21 13:49:53 -07:00
g2mt	e37077a963	feat: hide performance menu item if disabled (#832 ) Hide the Performance UI item in the navigation bar if it's disabled.	2026-06-21 13:38:29 -07:00
Benson Wong	eff9b60434	server: capture failed (non-200) LLM requests (#862 ) Store a request/response capture for non-200 responses so failed requests can be inspected in the activity log's Capture dialog, matching the existing behavior for successful requests. - extract storeCapture/decodeResponseBody helpers to share capture logic between the success and non-200 paths - record non-200 bodies (decompressed) so error details are viewable - the activity UI already gates the View button on has_capture, so it now appears for failed requests with no UI changes - add tests for capturing failed requests and the disabled-captures case closes #766	2026-06-20 11:50:35 -07:00
Wojciech	9bcddad91b	internal/server,ui: add new Activity page column - Drafted (#859 ) Add draft metrics to activity log	2026-06-18 20:55:02 -07:00
Benson Wong	a15e47922c	proxy: meter /upstream requests via metrics middleware (#858 ) Wrap /upstream/{upstreamPath...} in the metrics middleware so activity log entries are recorded for model-dispatched endpoints accessed through the upstream passthrough. - Move findModelInPath to shared.FindModelInPath and reuse it in handleUpstream, the log monitor lookup, and FetchContext. - Extend FetchContext to resolve the model from /upstream/<model>/... paths without consuming the request body. - Add isMetricsRecordPath to limit recording to the model-dispatched endpoints that produce token usage/timings. - Add tests for upstream metrics recording and FetchContext upstream path resolution. Fixes #855 v228	2026-06-17 17:38:52 -07:00
George	0ab214d1c8	perf: add vendor-agnostic GPU monitoring for Windows (experimental) (#779 ) Add GPU monitoring support for AMD and Intel GPUs on Windows using D3DKMT (DirectX) and PDH performance counters. - Add PDH-based GPU utilization via \GPU Engine(*)\Utilization Percentage counter, summing all engine types per adapter (3D, Compute, Copy, Video). - Add D3DKMT bindings for adapter enumeration, memory segments, and adapter perf data. - Use PDH as primary utilization source (works on all vendors), with D3DKMT RunningTime as fallback for systems without PDH counters. - Prefer nvidia-smi when available, fall back to D3DKMT + PDH for AMD/Intel. - Backend priority: nvidia-smi -> D3DKMT + PDH -> ErrNoGpuTool. Verified on AMD 7900XTX GPU with llama.cpp Vulkan & ROCm backend: GPU utilization correctly shows ~99% during inference, ~0-2% when idle. --- LLM disclosure: GLM 5.1 & Kimi K2.6 have been used extensively during exploration and coding to the point that the LLMs wrote over 3/4 of the code, and I have done additional verification myself. As such, it should be considered experimental. Additional verification is needed. I have tested it on my 7900XTX system with Windows 11, and it works correctly, but as I only have this one rig, I cannot verify it everywhere. v227	2026-06-16 21:49:09 -07:00
Benson Wong	d07b063ab6	internal/server,shared: support request metadata (#850 ) - add support for http handlers in the request chain to append metadata to the request - metrics middleware will include metadata in the activity log - update Activity UI to support metadata, drag sort columns - update Activity UI capture dialog to use more screen space Updates #834	2026-06-16 21:44:55 -07:00
Benson Wong	826210dac9	.coderabbit.yaml: disable unit_tests	2026-06-16 10:10:17 -07:00
Benson Wong	6cf1317341	schedule,shared: move concurrency 429 limits into scheduler code (#849 ) - make concurrency limiting the scheduler.Scheduler's responsibility - eliminate the separate concurrency limit middleware - move concurrencyLimit logic into scheduler.FIFO to maintain backwards compatibility - add HTTPError from #834 Updates #834	2026-06-15 22:35:12 -07:00
Wojciech	8e84b2ec4f	README.md: add macports install option to README (#848 )	2026-06-15 15:58:24 -07:00
Benson Wong	ed77385d08	ui: improve manual model load and cancel (#847 ) - When a model is manually loaded show a cancel button and a queued status - Implement cancellation in scheduler.Scheduler interface and FIFO scheduler - Add cache bust query parameter to bypass browser cache Fixes #844 v226	2026-06-14 13:38:10 -07:00
Benson Wong	92b90447e8	Model capabilities 734 (#842 ) internal/config,server: implement model capabilities - define the capabilities of a model using a simple config block on the model - v1/models renders out capabilities to be compatible with openrouter, huggingface chat, and mistral formats for broader compatibility - add support for capabilities in UI Fixes #734 v225	2026-06-13 23:23:19 -07:00
Benson Wong	62aea0e83d	internal/router,server,shared: refactor auth, libs (#839 ) - refactor shared http functionality into internal/shared/http.go - remove stripping of Authorization and x-api-key - add Request Context middleware to internal/server - add /ui and /metrics behind auth middleware, fixes #717 Fix #717 Updates: #834	2026-06-13 10:19:04 -07:00
Benson Wong	8c660dcb90	main: gofmt	2026-06-11 22:16:39 -07:00
Benson Wong	f6877b8175	main: show message when listening on network (#836 ) fixes: #739	2026-06-11 22:15:14 -07:00
Benson Wong	9b3a33d7b9	Implement new scheduler (#823 ) - introduce internal/router/scheduler to decouple routing, swapping and queuing into interface contracts. - introduce a new `routing` configuration section that supersedes `matrix` and `group` while maintaining backwards compatibility - add FIFO scheduler with prioritized queuing - add internal/router/design.md as developer documentation on implementing new schedulers and routers Fixes #797	2026-06-10 20:34:25 -07:00
Benson Wong	0cfe5a6639	Makefile,internal: fix websocket regression and other small things (#830 ) - fix websocket regression and add test to prevent in the future - fix staticcheck errors - remove proxy package remnants from Makefile fix #829 v224	2026-06-09 21:37:53 -07:00
Benson Wong	44e1501e81	internal/process,server: fix unload regression (#828 ) In v221 the shutdown behaviour was refactored so shutdown behaviour was more reliable in stopping a process group. This exposed an existing bug where the unload API had a timeout of 0 that snuck in during the big refactor. - set a default timeout of 10 seconds for unloads called via the API - add logging around shutdown routine updates: #807, #808 fixes: #827	2026-06-09 20:49:58 -07:00
Benson Wong	46cea36bc2	proxy: remove legacy code. Thanks champ 🫡 (#822 ) Fixes #820	2026-06-06 21:00:30 -07:00
Benson Wong	ccfba0df28	docker: fix arm64 cpu image downloading amd64 llama-swap binary (#819 ) Replace TARGETARCH build-arg with runtime arch detection via uname -m. BuildKit's TARGETARCH injection was unreliable for the multi-arch cpu build, causing the arm64 image variant to download and embed the x86_64 llama-swap binary — resulting in "exec format error" on arm64 hosts. With QEMU user-space emulation, uname -m correctly returns aarch64 inside an arm64 container build, so the download always fetches the right binary for the actual target architecture. Also adds --fail to curl so HTTP 404s produce a build error instead of silently embedding an HTML error page. fixes #818 Co-authored-by: Claude <noreply@anthropic.com>	2026-06-04 14:26:21 -07:00
Benson Wong	ddfae90b19	Change cron schedule for container builds Shift the non-unified container builds about 8 hours after the llama.cpp's projects container publishing window. The llama.cpp containers take a few hours to build and publish and 8 hours is expected to be enough time to remain fresh. Additionally, add an extra build at 18:00 in case the 12:00 one does not pick things up. The container builds on the llama-swap side are cheap (just injecting llama-swap binary) so it is fine to run them a bit more frequently.	2026-06-04 11:00:43 -07:00
Benson Wong	29d3d9ba20	perf: add macOS GPU monitoring via mactop and ioreg (#816 ) Implement performance monitoring on OSX for Apple Silicon hardware. The implementation uses a combination of mactop and ioreg. If mactop is installed (`brew install mactop`) it is used in a headless cli mode to stream usage metrics. mactop hooks into unpublished(?) C based APIs in OSX. Rather than introduce a cgo dependency into llama-swap's build chain only for darwin I opted to go the external process route. ioreg, which comes bundled with OSX is used as the fallback. It does not provide temperature and power usage data but is able to show accurate GPU and memory utilization. Updates #771, #814 v223	2026-06-03 21:51:03 -07:00
Benson Wong	9be9a87fa0	internal/process: improve windows shutdown behaviour (#808 ) Add Windows specific shutdown code paths so stopping of child processes is more reliable: - stopping llama-swap won't leave behind any child processes it created - uses Job Objects in Windows so the whole llama-swap tree is closed by the os - add procCtx to baseRouter. It replaces shutdownCtx as a signal for managing lifetime state. - shutdownCtx is only used by the router to stop handling new requests during shutdown - improve debug logging to make it easier to trace source of issues Fixes #804 Updates #807 v222	2026-06-01 00:45:30 -07:00
Benson Wong	6ea551362e	process,router: make model shutdown and load-streaming robust Note: The original proxy/process_unix.go had a noop for setProcAttributes so it also did not stop grandchildren processes. This patch adds that capability and improves reliability. -- Stop() no longer hangs on a shell wrapper that forks the real binary. The upstream is built with exec.CommandContext + cmd.Cancel + cmd.WaitDelay, so cmd.Wait() returns even when a forked grandchild inherits the stdout/stderr pipes. killProcess sends the stop signal directly (not by cancelling the context) so cmd.WaitDelay measures from process exit and never silently caps the caller's graceful timeout. The upstream is also started in its own process group (Setpgid) on Unix, so the graceful SIGTERM — and the SIGKILL escalation after the timeout — are delivered to the whole group via the negative PID. A forked grandchild is reaped with its parent instead of leaking as an orphan. The loading-spinner SSE goroutine can no longer panic when it outlives the request. net/http recycles the response writer via Reset(nil) once ServeHTTP returns; the orphaned goroutine then flushed against a nil-backed writer and crashed with a SIGSEGV. A release() fence on loadingWriter lets any in-flight write finish then short-circuits later writes/flushes, and all three ServeHTTP select branches run a finishLoading helper (cancelLoad, waitForCompletion, release) before the writer is reclaimed. - internal/process: exec.CommandContext + WaitDelay, Setpgid process groups, group-wide SIGTERM/SIGKILL teardown - internal/router: release() fence + finishLoading on loadingWriter fixes #804 v221	2026-05-31 10:11:12 -07:00
Benson Wong	03d58e53fa	Add load testing tool to the UI (#805 ) Wouldn't it be nice to test the performance, swapping and concurrency from the UI? Now we can! This is a port of `cmd/test-concurrency` into the UI Here's a demo of it working with a swap matrix: https://github.com/user-attachments/assets/b6bb12ec-0381-46f1-a6b8-27d1c3c0ddb3 v220	2026-05-30 17:04:30 -07:00
Luiszzzor	c790d0ee03	fix: update the concurrency middleware to respond with a JSON payload (#798 ) update the concurrency middleware to respond with a JSON payload instead of plain text when the request limit is reached to be compatible with openai api standard --------- Co-authored-by: Ludwik <l.czarnota@samsung.com>	2026-05-29 23:59:32 -07:00
Benson Wong	4ca9c478a2	Makefile,internal/server: various release tweaks v219	2026-05-29 15:27:08 -07:00

523 Commits