Commit Graph

523 Commits

Author SHA1 Message Date
Benson Wong d567fa78cb npm audit fix 2026-06-28 04:38:45 +00:00
Benson Wong 187f1ae27a ui: fix logs tab height and column toggle dropdown
- Make ModelLogsTab fill available vertical space instead of fixed h-80
- Add min-h-0 flex-1 to Logs Tabs.Content so height propagates
- Set closeOnSelect=false on column visibility checkbox items to keep
  the dropdown open while toggling multiple columns
2026-06-28 04:36:56 +00:00
Benson Wong 0ae56b1eb9 ui: convert chat settings panel to a dialog
Replace the inline settings panel with a modal Dialog that pops up
over the chat interface, matching the CaptureDialog pattern.
2026-06-28 04:22:01 +00:00
Benson Wong e46cbeb2bf ui: refocus message input after chat generation completes
- Add ref prop to ExpandableTextarea to expose the underlying textarea
- Track streaming state transitions in ChatInterface and refocus the
  input via $effect when isStreaming flips to false
2026-06-28 04:16:23 +00:00
Benson Wong a0578f0007 ui: reorganize sidebar and add Settings page
Reorder sidebar menu to Activity, Playground, Models, Logs. Remove the
ll header icon and replace it with the connection status indicator
moved from the footer. Add a Settings page (gear icon) at the bottom
that surfaces the build information that was previously hidden behind
the status indicator's tooltip.

- move ConnectionStatus into Sidebar.Header, drop build-info tooltip
- add Settings.svelte route showing version/commit/build date
- register /settings route and title in App.svelte
2026-06-28 03:53:14 +00:00
Benson Wong d207a059a4 ui: enable pagination on Activity page and fix table reactivity
- add showPagination to Activity route's ActivityTable
- fix pagination reactivity: reassign pagination object in
  onPaginationChange so TanStack's effect.pre detects the change, and
  reset to first page only when pageSize changes
- move data-change page reset into untrack to avoid clobbering
  navigation
- render Cached/Prompt/Drafted headers with a dotted underline trigger
  instead of a separate info icon
2026-06-28 03:43:55 +00:00
Benson Wong 040ee1e284 ui: convert ActivityTable to shadcn-svelte data-table
Replace the hand-rolled table with a TanStack Table-backed shadcn
data-table using the FlexRender/createSvelteTable helpers, with
DropdownMenu column visibility, Select page-size, and icon-button
pagination. Column visibility and page size persist to localStorage.

Move tooltip usage to the canonical shadcn-svelte pattern by adding a
single root Tooltip.Provider in App.svelte and using Tooltip.Root/
Trigger/Content directly in the activity-table sub-components
(HeaderLabel, MetaCell), dropping the custom Tooltip/MetadataTooltip
wrappers.

- add @tanstack/table-core and shadcn data-table helpers
- split cell/header renderers into activity-table/* sub-components
- switch pagination/visibility to TanStack Table state driven by
  table.nextPage/previousPage/setPageIndex/setPageSize and
  column.toggleVisibility
2026-06-28 03:26:24 +00:00
Benson Wong 82cad1b84e ui: add ModelsDash route, clickable sidebar headings, and dialog tweaks
- Add /models route (ModelsDash) with unload-all, model list with
  start/stop buttons, and show-unlisted toggle
- Make sidebar Models and Playground headings navigate to their pages
  while the chevron independently expands/collapses the section
- Extract shared model load/unload orchestration into modelLoad store
- Left-align model names in the ConcurrencyInterface load-test list
- Widen CaptureDialog to 90% with flex-based scroll overflow

- Use sm:max-w-[90%] to override the shadcn dialog's sm:max-w-sm cap
2026-06-28 03:04:04 +00:00
Benson Wong 55c3678906 ui: extract shared ActivityTable and split ModelDetail into components
- Add ActivityTable component consolidating column customization,
  table rendering, pagination, and capture dialog previously
  duplicated between Activity.svelte and ModelDetail.svelte
- Split ModelDetail tabs into ModelActivityTab, ModelLogsTab, and
  ModelDetailsTab components under components/model/
- Reduce Activity.svelte and ModelDetail.svelte to thin shells

- ModelDetail tabs now reuse ActivityTable instead of duplicating
  column management, formatting, and capture logic
2026-06-28 02:27:05 +00:00
Benson Wong 8b5a62d92a ui-svelte: big convert to shadcn components 2026-06-28 01:53:19 +00:00
Benson Wong d1e4c8ee77 ui tweaks 2026-06-28 01:21:40 +00:00
Benson Wong 11f8afead8 ui: add collapsible Models section to sidebar
Move Models to the top of the sidebar as a collapsible item with each
model listed as a sub-item.

- add persistent modelsMenuOpen store for expand state
- show status dot per model (grey/yellow/green for stopped/changing/loaded)
- right-aligned load/unload button with Play/PowerOff/Loader2 icon
- button stops propagation so it doesn't trigger navigation
2026-06-27 23:54:18 +00:00
Benson Wong 749819ef47 ui: consolidate playground nav into sidebar
Move Playground tabs into the sidebar as collapsible sub-items and make
the sidebar the sole navigation for playground interfaces.

- add collapsible UI primitive (bits-ui wrapper)
- add playground store with selected tab and menu open state (persistent)
- make Playground menu item collapsible; whole button toggles expand state
- move playground sub-items (Chat/Images/Speech/etc) under Playground
- remove in-page Tabs from Playground.svelte
- update sectionTitle breadcrumb to reflect active sub-item
- remove bg-sidebar panel background so items sit on page background
- remove persistent data-active background tint on menu items

fixes #123
2026-06-27 16:46:10 +00:00
Claude 0ab9e74333 ui: finish shadcn migration and remove legacy shim
Convert the remaining .btn usages (Concurrency, Performance, CaptureDialog)
to shadcn Button, fix CaptureDialog/PerformanceChart styles to shadcn
tokens, and remove the transitional legacy palette aliases and component
classes from index.css. Drop the now-unused lucide-svelte and shadcn-svelte
dependencies.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
2026-06-27 12:10:56 +00:00
Claude b20be6dcd1 ui: convert Image, Speech, Audio interfaces to shadcn buttons
Replace .btn elements and inline SVG icons with shadcn Button and
@lucide/svelte icons in the image, speech and audio playground tabs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
2026-06-27 12:05:19 +00:00
Claude fc24722258 ui: migrate Rerank and normalize remaining views to shadcn tokens
- RerankInterface uses Button/Input/Textarea/ToggleGroup
- normalize legacy color utilities and lucide imports across the
  remaining playground interfaces, Performance and CaptureDialog

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
2026-06-27 12:01:19 +00:00
Claude 2b087dffb1 ui: migrate ChatMessage to shadcn tokens
Use shadcn Button/Textarea, @lucide/svelte icons, and map prose/code-block
styles to shadcn CSS variables.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
2026-06-27 11:58:24 +00:00
Claude 746c083a87 ui: migrate chat playground and stats to shadcn
- ChatInterface controls, settings, input use Button/Input/Textarea/Label
- ExpandableTextarea and ModelSelector restyled on shadcn tokens
- ActivityStats wrapped in Card; Tooltip uses shadcn tooltip

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
2026-06-27 11:56:31 +00:00
Claude 8dd91e99e8 ui: migrate Activity, Logs views to shadcn
- Activity table wrapped in Card with restyled column menu and Button
- LogPanel toolbar uses Button/Input with lucide icons
- LogViewer source switch uses a ToggleGroup

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
2026-06-27 11:52:11 +00:00
Claude 136dcdc25f ui: migrate Models panel and Playground to shadcn
- ModelsPanel uses Card, Button, Badge and a dropdown menu for actions
- Playground uses shadcn Tabs for the switcher while keeping every
  interface mounted to preserve state

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
2026-06-27 11:49:16 +00:00
Claude 767b8015fa ui: replace top navbar with shadcn sidebar layout
Add AppSidebar built from the shadcn sidebar primitives (collapsible icon
rail, editable title, nav with active states, footer theme toggle and
connection status) and wrap the app in a sidebar provider with an inset
top bar. Preserves the always-mounted Playground pattern.

- add src/components/AppSidebar.svelte
- restructure App.svelte around Sidebar.Provider/Inset
- remove Header.svelte

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
2026-06-27 11:46:30 +00:00
Claude f0144a2361 ui: add shadcn-svelte foundation and theming
Set up shadcn-svelte components and adopt its design-token system as the
base for modernizing the UI. Switch dark mode from the data-theme attribute
to the .dark class so shadcn primitives theme correctly.

- add components.json, $lib alias (tsconfig + vite), cn() util
- install shadcn primitives under src/lib/components/ui
- rewrite index.css with shadcn tokens (zinc + brand teal accent)
- keep legacy utility/class aliases as a transitional shim
- toggle .dark class from theme store in App.svelte

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UmuGqwNBJNEAMaWsdCDqUC
2026-06-27 11:42:43 +00:00
Benson Wong 32bc781326 internal/config,watcher: add -config-dir (#873)
Over time the llama-swap configuration file can get really long and
challenging to work with. The -config-dir flag is used for a directory
of configuration YAML fragments.

These fragments are merged together and into a full configuration and
tested for validity. All previous configuration functionality remains
unchanged.
v230
2026-06-24 20:48:51 -07:00
Benson Wong 316ad63f76 config,server: add upstream.ignorePaths (#869)
Add upstream.ignorePaths config to prevent model swaps for static-asset
requests made through the /upstream/<model>/<path> passthrough endpoint.

- add UpstreamConfig with compiled *regexp.Regexp slice; invalid regex
returns an error at load time
- apply a default pattern matching common static-asset suffixes
(.js/.json/.css/.png/.gif/.jpg/.jpeg/.ico/.txt) when unset
- in handleUpstream, return 409 Conflict when a path matches and the
local model is not already loaded; peer and already-loaded models fall
through to normal dispatch
- update config-schema.json and config.example.yaml

Updates discussion: #868
v229
2026-06-21 13:49:53 -07:00
g2mt e37077a963 feat: hide performance menu item if disabled (#832)
Hide the Performance UI item of the navigation bar if its disabled.
2026-06-21 13:38:29 -07:00
Benson Wong eff9b60434 server: capture failed (non-200) LLM requests (#862)
Store a request/response capture for non-200 responses so failed
requests can be inspected in the activity log's Capture dialog, matching
the existing behavior for successful requests.

- extract storeCapture/decodeResponseBody helpers to share capture logic
between the success and non-200 paths
- record non-200 bodies (decompressed) so error details are viewable
- the activity UI already gates the View button on has_capture, so it
now appears for failed requests with no UI changes
- add tests for capturing failed requests and the disabled-captures case

closes #766
2026-06-20 11:50:35 -07:00
Wojciech 9bcddad91b internal/server,ui: add new Acitivty page column - Drafted (#859)
Add draft metrics to activity log
2026-06-18 20:55:02 -07:00
Benson Wong a15e47922c proxy: meter /upstream requests via metrics middleware (#858)
Wrap /upstream/{upstreamPath...} in the metrics middleware so activity
log entries are recorded for model-dispatched endpoints accessed through
the upstream passthrough.

- Move findModelInPath to shared.FindModelInPath and reuse it in
handleUpstream, the log monitor lookup, and FetchContext.
- Extend FetchContext to resolve the model from /upstream/<model>/...
paths without consuming the request body.
- Add isMetricsRecordPath to limit recording to the model-dispatched
endpoints that produce token usage/timings.
- Add tests for upstream metrics recording and FetchContext upstream
path resolution.

Fixes #855
v228
2026-06-17 17:38:52 -07:00
George 0ab214d1c8 perf: add vendor-agnostic GPU monitoring for Windows (experimental) (#779)
Add GPU monitoring support for AMD and Intel GPUs on Windows using
D3DKMT (DirectX) and PDH performance counters.

- Add PDH-based GPU utilization via \GPU Engine(*)\Utilization
Percentage counter, summing all engine types per adapter (3D, Compute,
Copy, Video).
- Add D3DKMT bindings for adapter enumeration, memory segments, and
adapter perf data.
- Use PDH as primary utilization source (works on all vendors), with
D3DKMT RunningTime as fallback for systems without PDH counters.
- Prefer nvidia-smi when available, fall back to D3DKMT + PDH for
AMD/Intel.
- Backend priority: nvidia-smi -> D3DKMT + PDH -> ErrNoGpuTool.

Verified on AMD 7900XTX GPU with llama.cpp Vulkan & ROCm backend: GPU
utilization correctly shows ~99% during inference, ~0-2% when idle.

---

LLM disclosure: GLM 5.1 & Kimi K2.6 have been used extensively during
exploration and coding to the point that the LLM's wrote over 3/4 of the
code, and I have done additional verification myself.
As such, it should be considered experimental.
Additional verification is needed.

I have tested it on my 7900XTX system with Windows 11, and it works
correctly, but as I only have this one rig, I cannot verify it
everywhere.
v227
2026-06-16 21:49:09 -07:00
Benson Wong d07b063ab6 internal/server,shared: support request metadata (#850)
- add support for http handlers in the request chain to append metadata
to the request
- metrics middleware will include metadata in the activity log 
- update Activity UI to support metadata, drag sort columns
- update Activity UI capture dialog to use more screen space

Updates #834
2026-06-16 21:44:55 -07:00
Benson Wong 826210dac9 .coderabbit.yaml: disable unit_tests 2026-06-16 10:10:17 -07:00
Benson Wong 6cf1317341 schedule,shared: move concurrency 429 limits into scheduler code (#849)
- make concurrency limiting the scheduler.Scheduler's responsibility
- eliminate the separate concurrency limit middleware 
- move concurrencyLimit logic into scheduler.FIFO to maintain backwards compatibility
- add HTTPError from #834 

Updates #834
2026-06-15 22:35:12 -07:00
Wojciech 8e84b2ec4f README.md: add macports install option to README (#848) 2026-06-15 15:58:24 -07:00
Benson Wong ed77385d08 ui: improve manual model load and cancel (#847)
- When a model is manually loaded show a cancel buttton and a queued
status
- Implement cancellation in scheduler.Scheduler interface and FIFO
scheduler
- Add cache bust query parameter to bypass browser cache

Fixes #844
v226
2026-06-14 13:38:10 -07:00
Benson Wong 92b90447e8 Model capabilities 734 (#842)
internal/config,server: implement model capabilities

- define the capabilities of a model using a simple config block on the
model
- v1/models renders out capabilities to be compatible with openrouter,
huggingface chat, and mistral formats for broader compatibility
- add support for capabilities in UI

Fixes #734
v225
2026-06-13 23:23:19 -07:00
Benson Wong 62aea0e83d internal/router,server,shared: refactor auth, libs (#839)
- refactor shared http functionality into internal/shared/http.go
- remove stripping of Authorization and x-api-key
- add Request Context middleware to internal/server
- add /ui and /metrics behind auth middleware, fixes #717

Fix #717
Updates: #834
2026-06-13 10:19:04 -07:00
Benson Wong 8c660dcb90 main: gofmt 2026-06-11 22:16:39 -07:00
Benson Wong f6877b8175 main: show message when listening on network (#836)
fixes: #739
2026-06-11 22:15:14 -07:00
Benson Wong 9b3a33d7b9 Implement new scheduler (#823)
- introduce internal/router/scheduler to decouple routing, swapping and
queuing into interface contracts.
- introduce a new `routing` configuration section that supersedes
`matrix` and `group` while maintaining backwards compatibility
- add FIFO scheduler with prioritized queuing 
- add internal/router/design.md as developer documentation on
implementing new schedulers and routers

Fixes #797
2026-06-10 20:34:25 -07:00
Benson Wong 0cfe5a6639 Makefile,internal: fix websocket regression and other small things (#830)
- fix websocket regression and add test to prevent in the future
- fix staticheck errors
- remove proxy package remnants from Makefile 

fix #829
v224
2026-06-09 21:37:53 -07:00
Benson Wong 44e1501e81 internal/process,server: fix unload regression (#828)
In v221 the shutdown behaviour was refactored so shutdown behaviour was
more reliable in stopping a process group. This exposed an existing bug
where the unload API had a timeout of 0 that snuck in during the big
refactor.

- set a default timeout of 10 seconds for unloads called via the API
- add logging around shutdown routine

updates: #807, #808
fixes: #827
2026-06-09 20:49:58 -07:00
Benson Wong 46cea36bc2 proxy: remove legacy code. Thanks champ 🫡 (#822)
Fixes #820
2026-06-06 21:00:30 -07:00
Benson Wong ccfba0df28 docker: fix arm64 cpu image downloading amd64 llama-swap binary (#819)
Replace TARGETARCH build-arg with runtime arch detection via uname -m.
BuildKit's TARGETARCH injection was unreliable for the multi-arch cpu
build, causing the arm64 image variant to download and embed the x86_64
llama-swap binary — resulting in "exec format error" on arm64 hosts.

With QEMU user-space emulation, uname -m correctly returns aarch64
inside an arm64 container build, so the download always fetches the
right binary for the actual target architecture. Also adds --fail to
curl so HTTP 404s produce a build error instead of silently embedding an
HTML error page.

fixes #818

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-04 14:26:21 -07:00
Benson Wong ddfae90b19 Change cron schedule for container builds
Shift the non-unified container builds about 8 hours after the llama.cpp's projects container publishing window. The llama.cpp containers take a few hours to build and publish and 8 hours is expected to be enough time to remain fresh. 

Additionally, add an extra build at 18:00 in case the 12:00 one does not pick things up. The container builds on the llama-swap side are cheap (just injecting llama-swap binary) so it is fine to run them a bit more frequently.
2026-06-04 11:00:43 -07:00
Benson Wong 29d3d9ba20 perf: add macOS GPU monitoring via mactop and ioreg (#816)
Implement performance monitoring on OSX for Apple Silicon hardware. 

The implementation uses a combination of mactop and ioreg. If mactop is
installed (`brew install mactop`) it is used in a headless cli mode to
stream usage metrics. mactop hooks into unpublished(?) C based APIs in
OSX. Rather than introduce a cgo dependency into llama-swap's build
chain only for darwin I opted to go the external process route.

ioreg, which comes bundled with OSX is used as the fallback. It does not
provide temperature and power usage data but is able to show accurate
GPU and memory utilization.

Updates #771, #814
v223
2026-06-03 21:51:03 -07:00
Benson Wong 9be9a87fa0 internal/process: improve windows shutdown behaviour (#808)
Add Windows specific shutdown code paths so stopping of child processes
is more reliable:

- stopping llama-swap won't leave behind any child processes it created
- uses Job Objects in Windows so the whole llama-swap tree is closed by
the os
- add procCtx to baseRouter. It replaces shutdownCtx as a signal for
managing lifetime state.
- shutdownCtx is only used by the router to stop handling new requests
during shutdown
- improve debug logging to make it easier to trace source of issues

Fixes #804
Updates #807
v222
2026-06-01 00:45:30 -07:00
Benson Wong 6ea551362e process,router: make model shutdown and load-streaming robust
Note: The original proxy/process_unix.go had a noop for setProcAttributes
so it also did not stop grandchildren processes. This patch adds that capability 
and improves reliability.

--

Stop() no longer hangs on a shell wrapper that forks the real binary.
The upstream is built with exec.CommandContext + cmd.Cancel +
cmd.WaitDelay, so cmd.Wait() returns even when a forked grandchild
inherits the stdout/stderr pipes. killProcess sends the stop signal
directly (not by cancelling the context) so cmd.WaitDelay measures from
process exit and never silently caps the caller's graceful timeout.

The upstream is also started in its own process group (Setpgid) on Unix,
so the graceful SIGTERM — and the SIGKILL escalation after the timeout —
are delivered to the whole group via the negative PID. A forked
grandchild is reaped with its parent instead of leaking as an orphan.

The loading-spinner SSE goroutine can no longer panic when it outlives
the request. net/http recycles the response writer via Reset(nil) once
ServeHTTP returns; the orphaned goroutine then flushed against a
nil-backed writer and crashed with a SIGSEGV. A release() fence on
loadingWriter lets any in-flight write finish then short-circuits later
writes/flushes, and all three ServeHTTP select branches run a
finishLoading helper (cancelLoad, waitForCompletion, release) before the
writer is reclaimed.

- internal/process: exec.CommandContext + WaitDelay, Setpgid process
groups, group-wide SIGTERM/SIGKILL teardown
- internal/router: release() fence + finishLoading on loadingWriter

fixes #804
v221
2026-05-31 10:11:12 -07:00
Benson Wong 03d58e53fa Add load testing tool to the UI (#805)
Wouldn't it be nice to test the performance, swapping and concurrency
from the UI? Now we can! This is a port of `cmd/test-concurrency` into the UI

Here's a demo of it working with a swap matrix: 

https://github.com/user-attachments/assets/b6bb12ec-0381-46f1-a6b8-27d1c3c0ddb3
v220
2026-05-30 17:04:30 -07:00
Luiszzzor c790d0ee03 fix: update the concurrency middleware to respond with a JSON payload (#798)
update the concurrency middleware to respond with a JSON payload instead
of plain text when the request limit is reached to be compatible with
openai api standard

---------

Co-authored-by: Ludwik <l.czarnota@samsung.com>
2026-05-29 23:59:32 -07:00
Benson Wong 4ca9c478a2 Makefile,internal/server: various release tweaks v219 2026-05-29 15:27:08 -07:00