Compare commits

...

97 Commits

Author SHA1 Message Date
Benson Wong cc77139ff8 proxy,proxy/config: add global TTL feature (#554)
Add a new configuration parameter globalTTL that all models will
inherit. The default value is 0 which matches the currently
functionality to never automatically unload a model.

The model.ttl's default has changed to -1, which means use the global
TTL value. Any model.ttl >=0 is now value with 0 meaning never unload.
This allows a model to override a globalTTL > 0 and be configured to
never unload.

Fixes #459
Closes #512
2026-03-01 21:02:12 -08:00
Benson Wong 390a35bf93 ui-svelte: add copy button to markdown code blocks (#537)
Add a copy-to-clipboard button that appears on hover for each code block
rendered in the chat interface assistant messages.

- Svelte action `codeBlockCopy` injects a button into every `<pre>`
element
- MutationObserver reattaches buttons as streaming content arrives
- Button shows a check icon for 2 seconds after a successful copy
- Uses clipboard API with execCommand fallback for non-secure contexts
- CSS hides button by default and reveals it on pre:hover

https://claude.ai/code/session_01PTA5ao5YQuFAS6a9juLeZW

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-01 09:48:56 -08:00
pdscomp 181f71ca11 .github,docker: add cuda13 architecture support (#551)
Add `cuda13` as a supported build architecture, targeting the
`ghcr.io/ggml-org/llama.cpp:server-cuda13` upstream base image.

The `server-cuda13` image ships with CUDA 13 libraries, providing
improved performance on recent NVIDIA hardware compared to the existing
`server-cuda` (CUDA 12) image. Users with newer GPUs (e.g., RTX
50-series) benefit from reduced model load latency and higher token
throughput.

- Add `cuda13` to the allowed architectures list in
`docker/build-container.sh`
- Add `cuda13` to the CI matrix in `.github/workflows/containers.yml` so
the container is built and pushed automatically
2026-03-01 09:37:08 -08:00
Benson Wong 49546e2cf2 ui: fix text size svg 2026-02-27 23:47:52 -08:00
Benson Wong 2c078964f4 Update README with additional images
Added new images for model loading and real-time log streaming sections.
2026-02-27 23:45:40 -08:00
Benson Wong 175bb36fb1 Revise README description for clarity and detail
Updated description to clarify compatibility and usage.
2026-02-27 23:42:40 -08:00
Benson Wong aedb640471 Enhance web UI section in README
Updated README to enhance the description of the web interface and added details about features like token metrics, request inspection, model management, and real-time log streaming.
2026-02-27 23:40:31 -08:00
Benson Wong 2f377f6dc6 ui: add OGG audio format support to transcription playground (#544) 2026-02-26 19:48:19 -08:00
Benson Wong 64e4c79fc3 ui: add Rerank tab to playground (#536)
Add a new Rerank tab to the playground that lets users test /v1/rerank
endpoints. Supports a visual table editor and a JSON editor mode that
stay in sync when toggling between them.

- add rerankApi.ts with typed wrapper for /v1/rerank
- add RerankInterface.svelte with query input, sortable document table,
color-coded scores, auto-add row, cancel/clear, and token usage
- add rerankLoading store to playgroundActivity derived store
- register Rerank tab in Playground.svelte

Updates #481
2026-02-21 21:59:14 -08:00
Benson Wong 19fb5f35e9 proxy: implement setParamsByID filter (#535)
Add setParamsByID filter that applies different request parameters based
on the requested model ID, enabling per-alias behaviour for a single
loaded model.

- add SetParamsByID field to Filters struct and SanitizedSetParamsByID
method
- substitute ${MODEL_ID} and other macros in setParamsByID keys and
values
- validate no unknown macros remain in keys or values after substitution
- apply setParamsByID in proxyInferenceHandler after setParams (can
override it)
- update config-schema.json with setParamsByID definition
- update UI to show aliases and make them selectable in the Playground

closes #534
2026-02-19 22:21:10 -08:00
Benson Wong b45102bde8 ui: smart auto-scroll in LogPanel (#530)
Pause auto-scroll when the user scrolls up to review logs, and resume
when they scroll back to the bottom.

- add `userScrolledUp` state variable
- add `handleScroll` to detect scroll position with 40px threshold
- guard the auto-scroll effect with `!userScrolledUp`

closes #529
2026-02-18 19:47:37 -08:00
Brian Mendonca 1688bdd1e9 proxy, ui: add pending requests count to the main dashboard (#516)
add a real time counter of pending (inflight) requests to the UI.
2026-02-16 09:41:15 -08:00
Benson Wong d33d51fa75 .coderabbit.yaml,AGENTS.md: small tweaks 2026-02-15 21:31:30 -08:00
Benson Wong e3bf065574 ui: persist playground state across route navigation (#525)
- Keep Playground component mounted when navigating away, preserving
streaming/generating state
- Add animated gradient effect on Playground nav link when activity is
in progress
2026-02-15 21:30:52 -08:00
Benson Wong 3e52144058 ui-svelte: incremental rendering of chat messages in the Playground (#520)
add incremental rendering to Playground > Chat
2026-02-15 11:00:44 -08:00
Benson Wong d5e52d7d00 build: disable provenance attestations in container builds (#523)
## Summary
- Add `--provenance=false` to docker build commands in
`build-container.sh`
- BuildKit attestation manifests are stored as untagged images in GHCR,
and the `delete-untagged-containers` cleanup job deletes them, breaking
the manifest list and causing `manifest unknown` errors on pull
- ref: https://github.com/actions/delete-package-versions/issues/162
2026-02-14 10:23:08 -08:00
Benson Wong 17e5263a76 .github/workflows: fix expired token in publishing images (#522)
Fixes: #517
2026-02-14 10:06:05 -08:00
Benson Wong 8d6d949ec3 proxy: support timings for /infill from llama-server (#510)
fixes: #463
2026-02-07 17:16:27 -08:00
Benson Wong b5fde8eb6d proxy,ui-svelte: add request/response capturing (#508)
Add saving request and response headers and bodies that go through
llama-swap in memory.

- captureBuffer added to configuration. Captures are enabled by default.
- 5MB of memory is allocated for req/response captures in a ring buffer.
Setting captureBuffer to 0 will disable captures.
- UI elements to view captured data added to Activity page. Includes
some
QOL features like json formatting and recombining SSE chat streams
- capture saving is done at the byte level and has minimal impact on
llama-swap performance

Fixes #464 
Ref #503
2026-02-07 15:40:01 -08:00
Nuno 7eef5defb8 docs: add stable-diffusion.cpp references (#506)
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2026-02-04 20:20:39 -08:00
Benson Wong bc01e6f539 build: add stable-diffusion server to musa and vulkan container images (#504)
Add sd-server from stable-diffusion.cpp docker image for 
vulkan and musa containers.

closes #450
2026-02-01 16:17:26 -08:00
Benson Wong 0462e3dc3f Reorganize UI controls and improve form interactions (#500)
Reorganizes control placement in the playground interfaces and
improves form interactions for better UX, particularly on mobile
devices.

## Key Changes

- **AudioInterface & ImageInterface**: Moved "Clear" buttons from the
top control bar into the action button group below the form inputs for
better visual hierarchy and logical grouping
- **ImageInterface**: 
- Added prompt clearing to the `clearImage()` function so the input
field is reset when clearing generated images
- Updated Clear button disabled state to also check if prompt is empty,
allowing users to clear an empty prompt
- Added responsive flex styling (`flex-1 md:flex-none`) to the Clear
button for better mobile layout
- **ExpandableTextarea**: 
- Imported `untrack` from Svelte to properly handle reactive
dependencies
- Wrapped `expandedValue.length` in `untrack()` to prevent unnecessary
reactivity when setting cursor position
- Improved button visibility on mobile by changing opacity from
`opacity-0` to `opacity-60` with `md:opacity-0` breakpoint, making the
expand button more discoverable on touch devices

## Implementation Details
The `untrack()` usage in ExpandableTextarea ensures that reading the
text length doesn't create a reactive dependency, preventing potential
infinite loops while still allowing the effect to run when `isExpanded`
changes.
2026-02-01 15:18:22 -08:00
Benson Wong 7b20fc011b Add path filters to CI workflows and create UI test workflow (#501)
* .github/workflows: add UI tests and path-filter Go CI

Add ui-tests.yml workflow to run svelte type checking and vitest
on push/PR to main when ui-svelte/ files change.

- Add path filters to go-ci.yml and go-ci-windows.yml to skip
  Go tests when only non-backend files change
- Filter on **/*.go, go.mod, go.sum, and Makefile

https://claude.ai/code/session_01E6acq54D8JjuE7pczxPGT7

* ui-svelte: remove unused declarations in SpeechInterface

Remove unused `generatedText` state and `clearAudio` function
that caused svelte-check errors.

https://claude.ai/code/session_01E6acq54D8JjuE7pczxPGT7

* .github/workflows: update Node.js to v24

Node 23 is end-of-life; bump to 24 in ui-tests.yml and release.yml.

https://claude.ai/code/session_01E6acq54D8JjuE7pczxPGT7

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-01 15:11:49 -08:00
Benson Wong 20738f3623 proxy,ui-svelte: replace old UI with svelte+playground
Replace the legacy React UI with the new Svelte-based one. Introduce a Playground in the UI to quickly test out text, image, text to speech and speech to text models behind llama-swap. 

Key Changes

New Svelte UI (ui-svelte/)

  - Multi-tab Playground with Chat, Image Generation, Audio Transcription, and Speech interfaces
  - Chat: message editing/regeneration, markdown rendering with LaTeX math support, image attachments, code syntax highlighting
  - Image: size selector, download/fullscreen viewing
  - Audio: transcription with peer support
  - Speech: voice caching with manual refresh, download button
  - Responsive mobile layout with collapsible navigation
  - XSS fixes and accessibility improvements

Proxy Improvements

  - Add gzip/brotli compression for UI static assets (proxy/ui_compress.go)
  - Add GET /v1/audio/voices?model={model} endpoint for voice listing
  - Add peer support for /v1/audio/transcriptions
2026-01-31 22:49:13 -08:00
Benson Wong cdea7d16bd proxy/config: skip env macros in YAML comment lines (#496)
Fix a bug where ${env.macro_not_exist} in comments would trigger a non-substituted macro error. 

fixes #495
2026-01-30 20:10:29 -08:00
Benson Wong 5de387dbf9 ui: fix node-tar vulnerability 2026-01-28 21:40:18 -08:00
Benson Wong 6f8e7ccb57 .github/workflows: switch release.yml to build ui-svelte 2026-01-28 21:39:10 -08:00
Benson Wong 4384315b44 ui-svelte: add Svelte port of React UI (#487)
Trying out svelte for the UI. The port was done by Claude Code on the iOS app w/ Opus 4.5. 

---

* ui: add Svelte port of React UI

Port the React-based UI to Svelte 5 with the following changes:

- Create new ui-svelte directory with complete Svelte 5 implementation
- Use Svelte stores instead of React contexts for state management
- Implement custom ResizablePanels component to replace react-resizable-panels
- Port all pages: LogViewer, Models, Activity
- Port all components: Header, ConnectionStatus, LogPanel, ModelsPanel, etc.
- Use svelte-spa-router for client-side routing
- Same build output directory (proxy/ui_dist) and base path (/ui/)
- Tailwind CSS 4 with same theme configuration

https://claude.ai/code/session_01F3xXLYsd62gePVSFv7aboP

* ui-svelte: simplify state management

- Remove redundant state syncing pattern in LogPanel and ModelsPanel
- Use store values directly with $ syntax instead of manual subscriptions
- Consolidate duplicate title sync logic in App.svelte
- Use existing syncTitleToDocument() from theme.ts

https://claude.ai/code/session_01F3xXLYsd62gePVSFv7aboP

* ui-svelte: use idiomatic Svelte 5 patterns

- Use $effect for document side effects (theme, title) instead of
  store subscriptions
- Use class: directive for active nav links in Header
- Remove SSR guards (unnecessary for client-only SPA)
- Remove leaked subscription in syncThemeToDocument
- Simplify theme.ts by removing sync functions

https://claude.ai/code/session_01F3xXLYsd62gePVSFv7aboP

* ui-svelte: fix build warnings and improve accessibility

Fix Svelte build warnings and add proper accessibility support
to interactive components.

- add aria-labels to buttons for screen readers
- implement keyboard navigation for resizable separator
- suppress intentional state initialization warnings
- update Makefile to use ui-svelte build directory
- add peer:true to package-lock.json dependencies

* ui-svelte: reorganize navigation and add log view toggle

Make Models the default landing page and add view mode toggle
to the Logs page with persistent state.

- set Models as default route at /
- move Logs to /logs route
- reorder navigation: Models, Activity, Logs
- add view toggle with three modes: Panels, Proxy only, Upstream only
- fix horizontal overflow with width constraints
2026-01-28 21:37:29 -08:00
Benson Wong 6439ab1515 ui: add peer:true in package-lock.json 2026-01-22 08:43:36 -08:00
dependabot[bot] f94226122c build(deps-dev): bump tar from 7.5.3 to 7.5.6 in /ui (#477)
Bumps [tar](https://github.com/isaacs/node-tar) from 7.5.3 to 7.5.6.
- [Release notes](https://github.com/isaacs/node-tar/releases)
- [Changelog](https://github.com/isaacs/node-tar/blob/main/CHANGELOG.md)
- [Commits](https://github.com/isaacs/node-tar/compare/v7.5.3...v7.5.6)

---
updated-dependencies:
- dependency-name: tar
  dependency-version: 7.5.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-21 22:55:02 -08:00
Ryan Voots 7493618fdc Add count_tokens api proxying (#476) 2026-01-20 09:34:42 -08:00
Benson Wong 205efd40a1 proxy: extend /running endpoint with additional process data (#474)
Extend the /running endpoint to return more details about running
processes beyond just model and state.

- add cmd field to show the command being executed
- add proxy field to show the proxy URL
- add ttl (UnloadAfter) for automatic unloading configuration
- add name and description for model metadata
- update tests to verify new fields are returned correctly

fixes #471
2026-01-19 17:37:00 -08:00
Benson Wong 14207f8492 ui: npm security update 2026-01-18 21:56:32 -08:00
Benson Wong 4e850c2834 config: refactor macro substitution in configuration (#470)
This commit simplifies substitution of environment variables into the configuration. There was a lot of repetitive code substituting ${env.VAR_NAME} into different fields after the configuration was parsed into a config.Config. This refactor uses a string substitution of env vars into the YAML config before it is fully parsed. This eliminates a lot of logic while maintaining backwards compatibility.
2026-01-18 21:52:34 -08:00
Benson Wong 75fced579e config: support macros in peer apiKey and filters (#469)
* config: support environment variable macros in peer apiKeys

Add ${env.VAR_NAME} substitution for peer apiKey fields, consistent
with existing env macro support for model fields and global apiKeys.

- Add env macro substitution for peers.{name}.apiKey in LoadConfigFromReader
- Add tests for peer apiKey env substitution
- Update config.example.yaml to show env macro usage

* config: support macros in peer apiKey and filters

Extend macro substitution to peer configuration fields:
- peers.{name}.apiKey supports both global macros and env macros
- peers.{name}.filters.stripParams supports both macro types
- peers.{name}.filters.setParams supports both macro types

Also renamed validateMetadataForUnknownMacros to validateNestedForUnknownMacros
for reuse across model metadata and peer filters validation.
2026-01-16 23:10:50 -08:00
Benson Wong b73f367f22 config-schema.json,config.example.yaml: Update examples and schema 2026-01-16 22:43:25 -08:00
Benson Wong 8f2137c72b config: support environment variable macros in apiKeys (#467)
Add substituteEnvMacros support for apiKeys configuration field,
allowing API keys to be loaded from environment variables using
the ${env.VAR_NAME} syntax.

- Apply env macro substitution before validation
- Add tests for env macro substitution in apiKeys
2026-01-16 22:41:14 -08:00
Benson Wong 124007cc98 config: add environment variable macros (#466)
* config: add environment variable macros

Add support for ${env.VAR_NAME} syntax to pull values from system
environment variables during config loading.

- env macros processed before regular macros (allows macros to reference env vars)
- works in cmd, cmdStop, proxy, checkEndpoint, filters.stripParams, metadata
- returns error if env var is not set
- add comprehensive tests

fixes #462

* docs: add env macro example to config.example.yaml
2026-01-16 22:25:20 -08:00
Benson Wong eb5bfff0b0 proxy: unify filtering for local models and peers
This unifies the filtering capabilities for models and peers

- stripParams: removes params in the request
- setParams: sets params in the request

fixes #453
2026-01-15 18:59:43 -08:00
Benson Wong 3edb180c08 ci: free up disk space before ROCm container build (#460) 2026-01-14 22:03:42 -08:00
Benson Wong 66d555e625 Improve container build reliability (#457)
* docker: add .env usage in build-container.sh
* .github,docker: add rocm, improve logging
* .github,CLAUDE.md: fix workflow and update guidelines

Update containers workflow to only push images when triggered
manually or on schedule, not on workflow file changes.

- add push trigger for workflow file changes in containers.yml
- update push condition to skip on regular push events
- update CLAUDE.md commit message guidelines

* docker: remove comma in build-container.sh

* .github,docker: improve container build workflow

Add pagination support for fetching llama.cpp tags and improve debugging.

- add build-container.sh to workflow trigger paths
- implement fetch_llama_tag() with pagination support
- replace .env with local testing instructions
- add DEBUG_ABORT_BUILD flag for testing
2026-01-10 22:14:33 -08:00
Benson Wong 4f863fd9fc CLAUDE.md: tweak instructions 2026-01-09 21:42:06 -08:00
Benson Wong 267c030457 ui: update react-router-dom to 7.12.0 (#456)
Update react-router-dom from 7.6.2 to 7.12.0 to address security vulnerability.

- Updated dependency in package.json
- Regenerated package-lock.json
- Verified build passes successfully
- Confirmed 0 vulnerabilities with npm audit

Co-authored-by: Claude <noreply@anthropic.com>
2026-01-08 16:13:09 -08:00
Benson Wong c19309fe7e CLAUDE.md: small instruction tweaks 2026-01-07 21:34:23 -08:00
Benson Wong 4413881b2d proxy: actually add /v1/responses endpoint (#449)
ref: #448
2026-01-01 13:35:45 -08:00
Benson Wong 8df5e8563b proxy: add /v1/responses and /v1/audio/voices endpoints (#448)
Updates #433
Fixes #442 #226
2026-01-01 12:52:12 -08:00
Benson Wong 7931212d3e proxy: add v1/images/edits API endpoint (#447)
Updates #433
2026-01-01 12:43:06 -08:00
Benson Wong 3dc36032fb proxy: skip very slow tests in -short test mode (#446)
* proxy: skip very slow tests in -short test mode
* CLAUDE.md: update testing instructions
2025-12-31 14:08:56 -08:00
Benson Wong addb98646f proxy: add support for basic authorization (#445)
Fixes #444 where the UI with api keys did not work. The choice to use
http basic authorization is for simple, automatic browser support. No
changes to the UI were necessary. Just use an API key as the password,
no user name is required.
2025-12-31 13:42:35 -08:00
Benson Wong 37d74efc2d proxy: add /v1/images/generations (#443)
Add support for the /v1/images/generations endpoint

Updates #433
Closes #191
2025-12-30 21:04:58 -08:00
Benson Wong 22e098ac8b Add Peer Model Support (#438)
This PR allows a single llama-swap to be the central proxy for models served by other inference servers. The peer servers can be another llama-swap or any API that supports the /v1/* inference endpoint.

Updates: #433, #299
Closes: #296
2025-12-27 20:18:06 -08:00
Benson Wong 9864f9f517 .coderabbit.yaml: disable annoying features 2025-12-23 23:53:06 -08:00
Benson Wong 53b32f3601 proxy: add API key support (#436)
Add configuration support for api keys that are enforced by llama-swap. Keys are stripped before sending them to upstream servers. 

Updates: #433, #50 and #251
2025-12-23 23:39:33 -08:00
Benson Wong 565c44766d config,proxy: add new configuration logToStdout (#432)
The new logToStdout option controls what is logged to stdout. The
default has been changed to just the proxy logs, which contain swap and
http request logs.

There are four supported settings: none, proxy, upstream, both. The
"both" setting is the legacy setting where everything was spewed to
stdout.
2025-12-21 22:23:31 -08:00
Benson Wong e6a9e210ba proxy: fix path bug in /logs/stream/{model_id} (#431)
A {model_id} containing a forward slash trips up gin's path param
parsing. This updates /logs/stream to work like /upstream where the
model_id is built up in parts and searched for in the configuration.

Updates #421
2025-12-21 21:47:14 -08:00
Benson Wong d3f329f924 proxy: Improve logging performance and allow separate log streaming (#421)
Replace container/ring.Ring with a custom circularBuffer that uses a
single contiguous []byte slice. This fixes the original implementation
which created 10,240 ring elements instead of 10KB of storage.

GetHistory is now 139x faster (145μs → 1μs) and uses 117x less memory
(1.2MB → 10KB). Allocations reduced from 2 to 1 per write operation.

Create a LogMonitor per proxy.Process, replacing the usage
of a shared one. The buffer in LogMonitor is lazy allocated on the first
call to Write and freed when the Process is stopped. This reduces
unnecessary memory usage when a model is not active.

The /logs/stream/{model_id} endpoint was added to stream logs from a
specific process.
2025-12-18 21:49:25 -08:00
Benson Wong 98879b38c1 docker: add /app to $PATH (#424)
Make it so llama-server can be called directly instead of with the full
path at /app/llama-server.

Fixes #423
Ref: #233
2025-12-06 22:58:29 -08:00
Benson Wong 7b3b0f5eae move header images around [skip ci] 2025-12-02 19:40:42 -08:00
Benson Wong 021ccceef1 README: update hero image 2025-12-02 19:37:03 -08:00
Benson Wong f03871c50a Update README.md
- add supported anthropic API 
- add example for docker hot reload support
2025-12-02 19:03:01 -08:00
Ryan Steed dc00d17abe docs: add documentation for non-root container images and security considerations (#416)
* docs: add documentation for non-root container images and security considerations
* docs: move container security section to dedicated file and update README links
2025-12-02 08:52:26 -08:00
Benson Wong dea98733c3 proxy: extract metrics for v1/messages (#419) 2025-11-29 23:51:20 -08:00
Benson Wong bccce5fa19 go.mod,ui/package-lock.json: dependency and security updates (#418) 2025-11-29 22:27:22 -08:00
Benson Wong c968da1b73 proxy: add support for anthropic v1/messages api (#417)
* proxy: add support for anthropic v1/messages api
* proxy: restrict loading message to /v1/chat/completions
2025-11-29 22:09:07 -08:00
Ryan Steed a883d68d4f feat: Add support for custom llama.cpp base image and forked llama-swap repositories (#396)
* feat: Add support for custom llama.cpp base image and forked llama-swap repositories

- Introduce BASE_LLAMACPP_IMAGE env var to customize llama.cpp base image
- Introduce LS_REPO env var to customize llama-swap source
- Use GITHUB_REPOSITORY env var to automatically detect forked repos
- Update container tagging to use dynamic repo paths
- Pass build args for BASE_IMAGE and LS_REPO to Containerfile
- Enable flexible release downloads from forked repositories

* chore: quote entire curl options, appease coderabbitai
2025-11-29 20:59:15 -08:00
Ryan Steed b1dec8b735 docker: build both root and non-root container images (#412)
Change the user back to root for containers. Additionally, built a "non-root" labeled container for users who wish to have the additional security of running llama-swap as a lower privileged user.
2025-11-25 10:44:13 -08:00
Nikesh Parajuli 06523d8c1e feat: add platform-specific process attributes support (#411)
Fixes issues on Windows showing new windows for every process llama-swap spawns.
2025-11-24 21:39:56 -08:00
Ryan Steed 86e9b93c37 proxy,ui: add version endpoint and display version info in UI (#395)
- Add /api/version endpoint to ProxyManager that returns build date, commit hash, and version
- Implement SetVersion method to configure version info in ProxyManager
- Add version info fetching to APIProvider and display in ConnectionStatus component
- Include version info in UI context and update dependencies
- Add tests for version endpoint functionality
2025-11-17 10:43:47 -08:00
Ryan Steed 3acace810f proxy: add configurable logging timestamp format (#401)
introduces a new configuration option logTimeFormat that allows customizing the timestamp in log messages using golang's built in time format constants. The default remains no timestamp.
2025-11-16 10:21:59 -08:00
Ryan Steed 554d29e87d feat: enhance model listing to include aliases (#400)
introduce includeAliasesInList as a new configuration setting (default false) that includes aliases in v1/models

Fixes #399
2025-11-15 14:35:26 -08:00
Benson Wong 3567b7df08 Update image in README.md for web UI section 2025-11-08 15:29:37 -08:00
Benson Wong 38738525c9 config.example.yaml: add modeline for schema validation 2025-11-08 15:08:55 -08:00
Benson Wong c0fc858193 Add configuration file JSON schema (#393)
* add json schema for configuration
* add GH action to validate schema
2025-11-08 15:04:14 -08:00
Benson Wong b429349e8a add /ui/ to wol-proxy polling (#388) 2025-11-08 14:16:12 -08:00
Ryan Steed eab2efd7b5 feat: improve llama.cpp base image tag for cpu (#391)
Refactor the container build script to resolve llama.cpp base image for CPU, also tag these builds accordingly.

- For CPU containers, now fetch the latest 'server' tagged llama.cpp image instead of using a generic 'server' tag
- Cleans up the docker build command to use dynamic BASE_TAG variable
- Maintains existing push functionality for built images
2025-11-08 09:56:49 -08:00
Benson Wong 6aedbe121a cmd/wol-proxy: show a loading page for / (#381)
When requesting / wol-proxy will show a loading page that polls /status
every second. When the upstream server is ready the loading page will
refresh causing the actual root page to be displayed
2025-11-03 19:37:06 -08:00
Ryan Steed b24467ab89 fix: update containerfile user/group management commands (#379)
- Replace `addgroup` with `groupadd` for system group creation
- Replace `adduser` with `useradd` for system user creation
- Maintain same functionality while using more standard POSIX commands
2025-11-03 17:17:40 -05:00
Benson Wong 12b69fb718 proxy: recover from panic in Process.statusUpdate (#378)
Process.statusUpdate() panics when it can not write data, usually from a
client disconnect. Since it runs in a goroutine and did not have a
recover() the result was a crash.

ref: https://github.com/mostlygeek/llama-swap/discussions/326#discussioncomment-14856197
2025-11-03 05:30:09 -08:00
Ryan Steed f91a8b2462 refactor: update Containerfile to support non-root user execution and improve security (#368)
Set default container user/group to lower privilege app user 

* refactor: update Containerfile to support non-root user execution and improve security

- Updated LS_VER argument from 89 to 170 to use the latest version
- Added UID/GID arguments with default values of 0 (root) for backward compatibility
- Added USER_HOME environment variable set to /root
- Implemented conditional user/group creation logic that only runs when UID/GID are not 0
- Created necessary directory structure with proper ownership using mkdir and chown commands
- Switched to non-root user execution for improved security posture
- Updated COPY instruction to use --chown flag for proper file ownership

* chore: update containerfile to use non-root user with proper UID/GID

- Changed default UID and GID from 0 (root) to 10001 for security best practices
- Updated USER_HOME from /root to /app to avoid running as root user
2025-10-31 17:01:04 -07:00
Benson Wong a89b803d4a Stream loading state when swapping models (#371)
Swapping models can take a long time and leave a lot of silence while the model is loading. Rather than silently load the model in the background, this PR allows llama-swap to send status updates in the reasoning_content of a streaming chat response.

Fixes: #366
2025-10-29 00:09:39 -07:00
Benson Wong f852689104 proxy: add panic recovery to Process.ProxyRequest (#363)
Switching to use httputil.ReverseProxy in #342 introduced a possible
panic if a client disconnects while streaming the body. Since llama-swap
does not use http.Server the recover() is not automatically there.

- introduce a recover() in Process.ProxyRequest to recover and log the
  event
- add TestProcess_ReverseProxyPanicIsHandled to reproduce and test the
  fix

fixes: #362
2025-10-25 20:40:05 -07:00
Benson Wong e250e71e59 Include metrics from upstream chat requests (#361)
* proxy: refactor metrics recording

- remove metrics_middleware.go as this wrapper is no longer needed. This
  also eliminiates double body parsing for the modelID
- move metrics parsing to be part of MetricsMonitor
- refactor how metrics are recording in ProxyManager
- add MetricsMonitor tests
- improve mem efficiency of processStreamingResponse
- add benchmarks for MetricsMonitor.addMetrics
- proxy: refactor MetricsMonitor to be more safe handling errors
2025-10-25 17:38:18 -07:00
Benson Wong d18dc26d01 cmd/wol-proxy: tweak logs to show what is causing wake ups (#356)
fix the extra wake ups being caused by wol-proxy

* cmd/wol-proxy: tweak logs to show what is causing wake ups
* cmd/wol-proxy: add skip wakeup
* cmd/wol-proxy: replace ticker with SSE connection
* cmd/wol-proxy: increase scanner buffer size
* cmd/wol-proxy: improve failure tracking
2025-10-25 11:04:31 -07:00
Benson Wong 8357714421 ui: fix avg token/sec calculation on models page (#357)
* ui: use percentiles for token stats
* ui: add histogram of metrics
* update vite to remove security warnings

fixes #355
2025-10-23 22:22:24 -07:00
Benson Wong c07179d6e2 cmd/wol-proxy: add wol-proxy (#352)
add a wake-on-lan proxy for llama-swap. When the target llama-swap server is unreachable it will send hold a request, send a WoL packet and proxy the request when llama-swap is available.
2025-10-20 20:55:02 -07:00
Benson Wong 7ff50631e0 Update README for setup instructions clarity [skip ci] 2025-10-19 14:55:23 -07:00
Benson Wong 9fc0431531 Clean up and Documentation (#347) [skip ci]
* cmd,misc: move misc binaries to cmd/
* docs: add docs and move examples/ there
* misc: remove unused misc/assets dir
* docs: add configuration.md
* update README with better structure

Updates: #334
2025-10-19 14:53:13 -07:00
David Wen Riccardi-Zhu 6516532568 Add optional TLS support (#340)
* Add optional TLS support

Introduce HTTPS support with net/http Server.ListenAndServeTLS.

This should enable the option of serving via HTTPS without a reverse
proxy.

Add two flags:
- tls-cert-file (path to the TLS certificate file)
- tls-key-file (path to the TLS private key file)

Both flags must be supplied together; otherwise exit with error.

If both flags are present, call srv.ListenAndServeTLS.
If not, fall back to the existing srv.ListenAndServe (HTTP); no changes
to existing non‑TLS behavior.
2025-10-15 19:29:02 -07:00
David Wen Riccardi-Zhu d58a8b85bf Refactor to use httputil.ReverseProxy (#342)
* Refactor to use httputil.ReverseProxy

Refactor manual HTTP proxying logic in Process.ProxyRequest to use the standard
library's httputil.ReverseProxy.

* Refactor TestProcess_ForceStopWithKill test

Update to handle behavior with httputil.ReverseProxy.

* Fix gin interface conversion panic
2025-10-13 16:47:04 -07:00
Benson Wong caf9e98b1e Fix race conditions in proxy.Process (#349)
- Fix data races found in proxy.Process by go's race detector. 
- Add data race detection to the CI tests. 

Fixes #348
2025-10-13 16:42:49 -07:00
Benson Wong 539278343b ui: tweak vertical space for mobile (#343) 2025-10-10 10:05:36 -07:00
Benson Wong 00b738cd0f Add Macro-In-Macro Support (#337)
Add full macro-in-macro support so any user defined macro can contain another one as long as it was previously declared in the configuration file.

Fixes #336
Supercedes #335
2025-10-06 22:57:15 -07:00
Benson Wong 70930e4e91 proxy: add support for user defined metadata in model configs (#333)
Changes: 

- add Metadata key to ModelConfig
- include metadata in /v1/models under meta.llamaswap key
- add recursive macro substitution into Metadata
- change macros at global and model level to be any scalar type

Note: 

This is the first mostly AI generated change to llama-swap. See #333 for notes about the workflow and approach to AI going forward.
2025-10-04 19:56:41 -07:00
Benson Wong 1f6179110c proxy/config: add model level macros (#330)
* proxy/config: add model level macros

Add macros to model configuration. Model macros override macros that are
defined at the global configuration level. They follow the same naming
and value rules as the global macros.

* proxy/config: fix bug with macro reserved name checking

The PORT reserved name was not properly checked

* proxy/config: add tests around model.filters.stripParams

- add check that model.filters.stripParams has no invalid macros
- renamed strip_params to stripParams for camel case consistency
- add legacy code compatibility so  model.filters.strip_params continues to work

* proxy/config: add duplicate removal to model.filters.stripParams

* clean up some doc nits
2025-09-28 23:32:52 -07:00
Benson Wong 216c40b951 proxy/config: create config package and migrate configuration (#329)
* proxy/config: create config package and migrate configuration

The configuration is become more complex as llama-swap adds more
advanced features. This commit moves config to its own package so it can
be developed independently of the proxy package.

Additionally, enforcing a public API for a configuration will allow
downstream usage to be more decoupled.
2025-09-28 16:50:06 -07:00
Benson Wong 9e3d491c85 proxyToUpstream: add redirect with trailing slash to upstream endpoint (#322)
This adds a redirect to the upstream endpoint so it always ends with a trailing /. 

Fixes #321
2025-09-25 16:43:00 -07:00
Benson Wong 1a84926505 proxy: add unload of single model (#318)
This adds a new API endpoint, /api/models/unload/*model, that unloads a single model. In the UI when a model is in a ReadyState it will have a new button to unload it. 

Fixes #312
2025-09-24 20:53:48 -07:00
159 changed files with 20883 additions and 7227 deletions
+8 -1
View File
@@ -4,12 +4,19 @@ early_access: false
reviews:
profile: "chill"
request_changes_workflow: false
high_level_summary: true
high_level_summary: false
poem: false
review_status: true
collapse_walkthrough: false
sequence_diagrams: false
finishing_touches:
docstrings:
enabled: false
auto_review:
enabled: true
drafts: false
chat:
auto_reply: true
issue_enrichment:
planning:
enabled: false
+41
View File
@@ -0,0 +1,41 @@
name: Validate JSON Schema
on:
pull_request:
paths:
- "config-schema.json"
push:
branches:
- main
paths:
- "config-schema.json"
workflow_dispatch:
jobs:
validate-schema:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Validate JSON Schema
run: |
# Check if the file is valid JSON
if ! jq empty config-schema.json 2>/dev/null; then
echo "Error: config-schema.json is not valid JSON"
exit 1
fi
# Validate that it's a valid JSON Schema
# Check for required $schema field
if ! jq -e '."$schema"' config-schema.json > /dev/null; then
echo "Warning: config-schema.json should have a \$schema field"
fi
# Check that it has either properties or definitions
if ! jq -e '.properties or .definitions or ."$defs"' config-schema.json > /dev/null; then
echo "Warning: JSON Schema should contain properties, definitions, or \$defs"
fi
echo "✓ config-schema.json is valid"
+29 -2
View File
@@ -10,17 +10,44 @@ on:
# Allows manual triggering of the workflow
workflow_dispatch:
# Run on workflow file changes (without pushing)
push:
paths:
- '.github/workflows/containers.yml'
- 'docker/build-container.sh'
- 'docker/*.Containerfile'
# grant permissions on GITHUB_TOKEN to publish packages
# ref: https://docs.github.com/en/packages/managing-github-packages-using-github-actions-workflows/publishing-and-installing-a-package-with-github-actions#publishing-a-package-using-an-action
permissions:
contents: read
packages: write
id-token: write
jobs:
build-and-push:
runs-on: ubuntu-latest
strategy:
matrix:
platform: [intel, cuda, vulkan, cpu, musa]
platform: [intel, cuda, cuda13, vulkan, cpu, musa, rocm]
fail-fast: false
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Free up disk space
if: matrix.platform == 'rocm'
run: |
echo "Before cleanup:"
df -h
sudo rm -rf /usr/share/dotnet
sudo rm -rf /usr/local/lib/android
sudo rm -rf /opt/ghc
sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo docker system prune -af
echo "After cleanup:"
df -h
- name: Log in to GitHub Container Registry
uses: docker/login-action@v2
with:
@@ -31,7 +58,7 @@ jobs:
- name: Run build-container
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: ./docker/build-container.sh ${{ matrix.platform }} true
run: ./docker/build-container.sh ${{ matrix.platform }} ${{ github.event_name != 'push' }}
# note make sure mostlygeek/llama-swap has admin rights to the llama-swap package
# see: https://github.com/actions/delete-package-versions/issues/74
+18 -2
View File
@@ -3,9 +3,25 @@ name: Windows CI
on:
push:
branches: [ "main" ]
# only run when backend source changes
# cmd/ is excluded because it contains utilities without tests
paths:
- '**/*.go'
- '!cmd/**'
- 'go.mod'
- 'go.sum'
- 'Makefile'
- '.github/workflows/go-ci-windows.yml'
pull_request:
branches: [ "main" ]
paths:
- '**/*.go'
- '!cmd/**'
- 'go.mod'
- 'go.sum'
- 'Makefile'
- '.github/workflows/go-ci-windows.yml'
# Allows manual triggering of the workflow
workflow_dispatch:
@@ -28,7 +44,7 @@ jobs:
uses: actions/cache/restore@v4
with:
path: ./build
key: ${{ runner.os }}-simple-responder-${{ hashFiles('misc/simple-responder/simple-responder.go') }}
key: ${{ runner.os }}-simple-responder-${{ hashFiles('cmd/simple-responder/simple-responder.go') }}
# necessary for testing proxy/Process swapping
- name: Create simple-responder
@@ -43,7 +59,7 @@ jobs:
uses: actions/cache/save@v4
with:
path: ./build
key: ${{ runner.os }}-simple-responder-${{ hashFiles('misc/simple-responder/simple-responder.go') }}
key: ${{ runner.os }}-simple-responder-${{ hashFiles('cmd/simple-responder/simple-responder.go') }}
- name: Test all
shell: bash
+16
View File
@@ -3,9 +3,25 @@ name: Linux CI
on:
push:
branches: [ "main" ]
# only run when backend source changes
# cmd/ is excluded because it contains utilities without tests
paths:
- '**/*.go'
- '!cmd/**'
- 'go.mod'
- 'go.sum'
- 'Makefile'
- '.github/workflows/go-ci.yml'
pull_request:
branches: [ "main" ]
paths:
- '**/*.go'
- '!cmd/**'
- 'go.mod'
- 'go.sum'
- 'Makefile'
- '.github/workflows/go-ci.yml'
# Allows manual triggering of the workflow
workflow_dispatch:
+11 -16
View File
@@ -3,13 +3,13 @@ name: goreleaser
on:
push:
tags:
- '*'
- "*"
# Allows manual triggering of the workflow
workflow_dispatch:
inputs:
tag:
description: 'Tag version to release (e.g. v144)'
description: "Tag version to release (e.g. v144)"
required: true
permissions:
@@ -19,35 +19,30 @@ jobs:
goreleaser:
runs-on: ubuntu-latest
steps:
-
name: Checkout
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
ref: ${{ github.event.inputs.tag || github.ref }}
-
name: Set up Go
- name: Set up Go
uses: actions/setup-go@v5
-
name: Set up Node.js
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '23'
-
name: Install dependencies and build UI
node-version: "24"
- name: Install dependencies and build UI
run: |
cd ui
cd ui-svelte
npm ci
npm run build
-
name: Run GoReleaser
- name: Run GoReleaser
uses: goreleaser/goreleaser-action@v6
with:
# either 'goreleaser' (default) or 'goreleaser-pro'
distribution: goreleaser
# 'latest', 'nightly', or a semver
version: '~> v2'
version: "~> v2"
args: release --clean
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -76,4 +71,4 @@ jobs:
"release": {
"tag_name": "${{ steps.tag.outputs.tag }}"
}
}
}
+42
View File
@@ -0,0 +1,42 @@
name: UI Tests
on:
push:
branches: [ "main" ]
paths:
- 'ui-svelte/**'
- '.github/workflows/ui-tests.yml'
pull_request:
branches: [ "main" ]
paths:
- 'ui-svelte/**'
- '.github/workflows/ui-tests.yml'
workflow_dispatch:
jobs:
run-tests:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ui-svelte
steps:
- uses: actions/checkout@v4
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '24'
cache: 'npm'
cache-dependency-path: ui-svelte/package-lock.json
- name: Install dependencies
run: npm ci
- name: Type check
run: npm run check
- name: Run tests
run: npm test
+50
View File
@@ -0,0 +1,50 @@
## Project Description:
llama-swap is a light weight, transparent proxy server that provides automatic model swapping to llama.cpp's server.
## Tech stack
- golang
- typescript, vite and svelt5 for UI (located in ui/)
## Workflow Tasks
- when summarizing changes only include details that require further action
- just say "Done." when there is no further action
- use the github CLI `gh` to create pull requests and work with github
- Rules for creating pull requests:
- keep them short and focused on changes.
- never include a test plan
- write the summary using the same style rules as commit message
## Testing
- Follow test naming conventions like `TestProxyManager_<test name>`, `TestProcessGroup_<test name>`, etc.
- Use `go test -v -run <name pattern for new tests>` to run any new tests you've written.
- Use `make test-dev` after running new tests for a quick over all test run. This runs `go test` and `staticcheck`. Fix any static checking errors. Use this only when changes are made to any code under the `proxy/` directory
- Use `make test-all` before completing work. This includes long running concurrency tests.
### Commit message example format:
```
proxy: add new feature
Add new feature that implements functionality X and Y.
- key change 1
- key change 2
- key change 3
fixes #123
```
## Code Reviews
- use three levels High, Medium, Low severity
- label each discovered issue with a label like H1, M2, L3 respectively
- High severity are must fix issues (security, race conditions, critical bugs)
- Medium severity are recommended improvements (coding style, missing functionality, inconsistencies)
- Low severity are nice to have changes and nits
- Include a suggestion with each discovered item
- Limit your code review to three items with the highest priority first
- Double check your discovered items and recommended remediations
+1
View File
@@ -0,0 +1 @@
@AGENTS.md
+21 -9
View File
@@ -23,18 +23,24 @@ proxy/ui_dist/placeholder.txt:
mkdir -p proxy/ui_dist
touch $@
test: proxy/ui_dist/placeholder.txt
go test -short -v -count=1 ./proxy
# use cached test results while developing
test-dev: proxy/ui_dist/placeholder.txt
go test -short ./proxy/...
staticcheck ./proxy/... || true
test: proxy/ui_dist/placeholder.txt
go test -short -count=1 ./proxy/...
# for CI - full test (takes longer)
test-all: proxy/ui_dist/placeholder.txt
go test -v -count=1 ./proxy
go test -race -count=1 ./proxy/...
ui/node_modules:
cd ui && npm install
cd ui-svelte && npm install
# build react UI
ui: ui/node_modules
cd ui && npm run build
cd ui-svelte && npm run build
# Build OSX binary
mac: ui
@@ -55,12 +61,12 @@ windows: ui
# for testing proxy.Process
simple-responder:
@echo "Building simple responder"
GOOS=darwin GOARCH=arm64 go build -o $(BUILD_DIR)/simple-responder_darwin_arm64 misc/simple-responder/simple-responder.go
GOOS=linux GOARCH=amd64 go build -o $(BUILD_DIR)/simple-responder_linux_amd64 misc/simple-responder/simple-responder.go
GOOS=darwin GOARCH=arm64 go build -o $(BUILD_DIR)/simple-responder_darwin_arm64 cmd/simple-responder/simple-responder.go
GOOS=linux GOARCH=amd64 go build -o $(BUILD_DIR)/simple-responder_linux_amd64 cmd/simple-responder/simple-responder.go
simple-responder-windows:
@echo "Building simple responder for windows"
GOOS=windows GOARCH=amd64 go build -o $(BUILD_DIR)/simple-responder.exe misc/simple-responder/simple-responder.go
GOOS=windows GOARCH=amd64 go build -o $(BUILD_DIR)/simple-responder.exe cmd/simple-responder/simple-responder.go
# Ensure build directory exists
$(BUILD_DIR):
@@ -80,5 +86,11 @@ release:
echo "tagging new version: $$new_tag"; \
git tag "$$new_tag";
GOOS ?= $(shell go env GOOS 2>/dev/null || echo linux)
GOARCH ?= $(shell go env GOARCH 2>/dev/null || echo amd64)
wol-proxy: $(BUILD_DIR)
@echo "Building wol-proxy"
go build -o $(BUILD_DIR)/wol-proxy-$(GOOS)-$(GOARCH)-$(shell date +%Y-%m-%d) cmd/wol-proxy/wol-proxy.go
# Phony targets
.PHONY: all clean ui mac linux windows simple-responder
.PHONY: all clean ui mac linux windows simple-responder simple-responder-windows test test-all test-dev wol-proxy
+185 -161
View File
@@ -1,78 +1,203 @@
![llama-swap header image](header2.png)
![llama-swap header image](docs/assets/hero3.webp)
![GitHub Downloads (all assets, all releases)](https://img.shields.io/github/downloads/mostlygeek/llama-swap/total)
![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/mostlygeek/llama-swap/go-ci.yml)
![GitHub Repo stars](https://img.shields.io/github/stars/mostlygeek/llama-swap)
# llama-swap
llama-swap is a light weight, transparent proxy server that provides automatic model swapping to llama.cpp's server.
Run multiple generative AI models on your machine and hot-swap between them on demand. llama-swap works with any OpenAI and Anthropic API compatible server and is used by thousands of people to power their local AI workflows.
Written in golang, it is very easy to install (single binary with no dependencies) and configure (single yaml file). To get started, download a pre-built binary, a provided docker images or Homebrew.
Built in Go for performance and simplicity, llama-swap has zero dependencies and is incredibly easy to set up. Get started in minutes - just one binary and one configuration file.
## Features:
- ✅ Easy to deploy: single binary with no dependencies
- ✅ Easy to config: single yaml file
- ✅ Easy to deploy and configure: one binary, one configuration file. no external dependencies
- ✅ On-demand model switching
- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, stable-diffusion.cpp, etc.)
- future proof, upgrade your inference servers at any time.
- ✅ OpenAI API supported endpoints:
- `v1/completions`
- `v1/chat/completions`
- `v1/responses`
- `v1/embeddings`
- `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
- `v1/audio/transcriptions` ([docs](https://github.com/mostlygeek/llama-swap/issues/41#issuecomment-2722637867))
- ✅ llama-server (llama.cpp) supported endpoints:
- `v1/audio/voices`
- `v1/images/generations`
- `v1/images/edits`
- ✅ Anthropic API supported endpoints:
- `v1/messages`
- `v1/messages/count_tokens`
- ✅ llama-server (llama.cpp) supported endpoints
- `v1/rerank`, `v1/reranking`, `/rerank`
- `/infill` - for code infilling
- `/completion` - for completion endpoint
- ✅ llama-swap custom API endpoints
- ✅ llama-swap API
- `/ui` - web UI
- `/log` - remote log monitoring
- `/upstream/:model_id` - direct access to upstream HTTP server ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
- `/unload` - manually unload running models ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
- `/upstream/:model_id` - direct access to upstream server ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
- `/models/unload` - manually unload running models ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
- `/running` - list currently running models ([#61](https://github.com/mostlygeek/llama-swap/issues/61))
- `/log` - remote log monitoring
- `/health` - just returns "OK"
-Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
-Automatic unloading of models after timeout by setting a `ttl`
- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
- ✅ Reliable Docker and Podman support using `cmd` and `cmdStop` together
- ✅ Full control over server settings per model
- Preload models on startup with `hooks` ([#235](https://github.com/mostlygeek/llama-swap/pull/235))
-API Key support - define keys to restrict access to API endpoints
-Customizable
- Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
- Automatic unloading of models after timeout by setting a `ttl`
- Reliable Docker and Podman support using `cmd` and `cmdStop` together
- Preload models on startup with `hooks` ([#235](https://github.com/mostlygeek/llama-swap/pull/235))
### Web UI
llama-swap includes a real time web interface with a playground for testing out all sorts of local models:
<img width="1125" height="876" alt="image" src="https://github.com/user-attachments/assets/8ee41947-97af-463d-b0f0-8e9c478fac07" />
View detailed token metrics:
<img width="1111" height="515" alt="image" src="https://github.com/user-attachments/assets/64bfb280-d7a3-4126-971a-a128fd40410c" />
Inspect request and responses:
<img width="1111" height="720" alt="image" src="https://github.com/user-attachments/assets/24fe4aca-1448-4d7c-b9e8-a967589bda6c" />
Manually load and unload models:
<img width="1109" height="719" alt="image" src="https://github.com/user-attachments/assets/02b1e1f2-abd0-4050-84ae-facd66ff01c4" />
Real time log streaming:
<img width="1107" height="559" alt="image" src="https://github.com/user-attachments/assets/39669a10-cff2-409e-836a-5bad8bd0140c" />
## Installation
llama-swap can be installed in multiple ways
1. Docker
2. Homebrew (OSX and Linux)
3. WinGet
4. From release binaries
5. From source
### Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
Nightly container images with llama-swap and llama-server are built for multiple platforms (cuda, vulkan, intel, etc.) including [non-root variants with improved security](docs/container-security.md).
The stable-diffusion.cpp server is also included for the musa and vulkan platforms.
```shell
$ docker pull ghcr.io/mostlygeek/llama-swap:cuda
# run with a custom configuration and models directory
$ docker run -it --rm --runtime nvidia -p 9292:8080 \
-v /path/to/models:/models \
-v /path/to/custom/config.yaml:/app/config.yaml \
ghcr.io/mostlygeek/llama-swap:cuda
# configuration hot reload supported with a
# directory volume mount
$ docker run -it --rm --runtime nvidia -p 9292:8080 \
-v /path/to/models:/models \
-v /path/to/custom/config.yaml:/app/config.yaml \
-v /path/to/config:/config \
ghcr.io/mostlygeek/llama-swap:cuda -config /config/config.yaml -watch-config
```
<details>
<summary>
more examples
</summary>
```shell
# pull latest images per platform
docker pull ghcr.io/mostlygeek/llama-swap:cpu
docker pull ghcr.io/mostlygeek/llama-swap:cuda
docker pull ghcr.io/mostlygeek/llama-swap:vulkan
docker pull ghcr.io/mostlygeek/llama-swap:intel
docker pull ghcr.io/mostlygeek/llama-swap:musa
# tagged llama-swap, platform and llama-server version images
docker pull ghcr.io/mostlygeek/llama-swap:v166-cuda-b6795
# non-root cuda
docker pull ghcr.io/mostlygeek/llama-swap:cuda-non-root
```
</details>
### Homebrew Install (macOS/Linux)
```shell
brew tap mostlygeek/llama-swap
brew install llama-swap
llama-swap --config path/to/config.yaml --listen localhost:8080
```
### WinGet Install (Windows)
> [!NOTE]
> WinGet is maintained by community contributor [Dvd-Znf](https://github.com/Dvd-Znf) ([#327](https://github.com/mostlygeek/llama-swap/issues/327)). It is not an official part of llama-swap.
```shell
# install
C:\> winget install llama-swap
# upgrade
C:\> winget upgrade llama-swap
```
### Pre-built Binaries
Binaries are available on the [release](https://github.com/mostlygeek/llama-swap/releases) page for Linux, Mac, Windows and FreeBSD.
### Building from source
1. Building requires Go and Node.js (for UI).
1. `git clone https://github.com/mostlygeek/llama-swap.git`
1. `make clean all`
1. look in the `build/` subdirectory for the llama-swap binary
## Configuration
```yaml
# minimum viable config.yaml
models:
model1:
cmd: llama-server --port ${PORT} --model /path/to/model.gguf
```
That's all you need to get started:
1. `models` - holds all model configurations
2. `model1` - the ID used in API calls
3. `cmd` - the command to run to start the server.
4. `${PORT}` - an automatically assigned port number
Almost all configuration settings are optional and can be added one step at a time:
- Advanced features
- `groups` to run multiple models at once
- `hooks` to run things on startup
- `macros` reusable snippets
- Model customization
- `ttl` to automatically unload models
- `aliases` to use familiar model names (e.g., "gpt-4o-mini")
- `env` to pass custom environment variables to inference servers
- `cmdStop` gracefully stop Docker/Podman containers
- `useModelName` to override model names sent to upstream servers
- `${PORT}` automatic port variables for dynamic port assignment
- `filters` rewrite parts of requests before sending to the upstream server
See the [configuration documentation](docs/configuration.md) for all options.
## How does llama-swap work?
When a request is made to an OpenAI compatible endpoint, llama-swap will extract the `model` value and load the appropriate server configuration to serve it. If the wrong upstream server is running, it will be replaced with the correct one. This is where the "swap" part comes in. The upstream server is automatically swapped to the correct one to serve the request.
When a request is made to an OpenAI compatible endpoint, llama-swap will extract the `model` value and load the appropriate server configuration to serve it. If the wrong upstream server is running, it will be replaced with the correct one. This is where the "swap" part comes in. The upstream server is automatically swapped to handle the request correctly.
In the most basic configuration llama-swap handles one model at a time. For more advanced use cases, the `groups` feature allows multiple models to be loaded at the same time. You have complete control over how your system resources are used.
## config.yaml
llama-swap is managed entirely through a yaml configuration file.
It can be very minimal to start:
```yaml
models:
"qwen2.5":
cmd: |
/path/to/llama-server
-hf bartowski/Qwen2.5-0.5B-Instruct-GGUF:Q4_K_M
--port ${PORT}
```
However, there are many more capabilities that llama-swap supports:
- `groups` to run multiple models at once
- `ttl` to automatically unload models
- `macros` for reusable snippets
- `aliases` to use familiar model names (e.g., "gpt-4o-mini")
- `env` to pass custom environment variables to inference servers
- `cmdStop` for to gracefully stop Docker/Podman containers
- `useModelName` to override model names sent to upstream servers
- `healthCheckTimeout` to control model startup wait times
- `${PORT}` automatic port variables for dynamic port assignment
See the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration) in the wiki all options and examples.
## Reverse Proxy Configuration (nginx)
If you deploy llama-swap behind nginx, disable response buffering for streaming endpoints. By default, nginx buffers responses which breaks ServerSent Events (SSE) and streaming chat completion. ([#236](https://github.com/mostlygeek/llama-swap/issues/236))
@@ -97,129 +222,28 @@ location /v1/chat/completions {
As a safeguard, llama-swap also sets `X-Accel-Buffering: no` on SSE responses. However, explicitly disabling `proxy_buffering` at your reverse proxy is still recommended for reliable streaming behavior.
## Web UI
## Monitoring Logs on the CLI
llama-swap includes a real time web interface for monitoring logs and models:
<img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/adef4a8e-de0b-49db-885a-8f6dedae6799" />
The Activity Page shows recent requests:
<img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/5f3edee6-d03a-4ae5-ae06-b20ac1f135bd" />
## Installation
llama-swap can be installed in multiple ways
1. Docker
2. Homebrew (OSX and Linux)
3. From release binaries
4. From source
### Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
Docker images with llama-swap and llama-server are built nightly.
```shell
# use CPU inference comes with the example config above
$ docker run -it --rm -p 9292:8080 ghcr.io/mostlygeek/llama-swap:cpu
# qwen2.5 0.5B
$ curl -s http://localhost:9292/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{"model":"qwen2.5","messages": [{"role": "user","content": "tell me a joke"}]}' | \
jq -r '.choices[0].message.content'
# SmolLM2 135M
$ curl -s http://localhost:9292/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{"model":"smollm2","messages": [{"role": "user","content": "tell me a joke"}]}' | \
jq -r '.choices[0].message.content'
```
<details>
<summary>Docker images are built nightly with llama-server for cuda, intel, vulcan and musa.</summary>
They include:
- `ghcr.io/mostlygeek/llama-swap:cpu`
- `ghcr.io/mostlygeek/llama-swap:cuda`
- `ghcr.io/mostlygeek/llama-swap:intel`
- `ghcr.io/mostlygeek/llama-swap:vulkan`
- ROCm disabled until fixed in llama.cpp container
Specific versions are also available and are tagged with the llama-swap, architecture and llama.cpp versions. For example: `ghcr.io/mostlygeek/llama-swap:v89-cuda-b4716`
Beyond the demo you will likely want to run the containers with your downloaded models and custom configuration.
```shell
$ docker run -it --rm --runtime nvidia -p 9292:8080 \
-v /path/to/models:/models \
-v /path/to/custom/config.yaml:/app/config.yaml \
ghcr.io/mostlygeek/llama-swap:cuda
```
</details>
### Homebrew Install (macOS/Linux)
The latest release of `llama-swap` can be installed via [Homebrew](https://brew.sh).
```shell
# Set up tap and install formula
brew tap mostlygeek/llama-swap
brew install llama-swap
# Run llama-swap
llama-swap --config path/to/config.yaml --listen localhost:8080
```
This will install the `llama-swap` binary and make it available in your path. See the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration)
### Pre-built Binaries ([download](https://github.com/mostlygeek/llama-swap/releases))
Binaries are available for Linux, Mac, Windows and FreeBSD. These are automatically published and are likely a few hours ahead of the docker releases. The binary install works with any OpenAI compatible server, not just llama-server.
1. Download a [release](https://github.com/mostlygeek/llama-swap/releases) appropriate for your OS and architecture.
1. Create a configuration file, see the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration).
1. Run the binary with `llama-swap --config path/to/config.yaml --listen localhost:8080`.
Available flags:
- `--config`: Path to the configuration file (default: `config.yaml`).
- `--listen`: Address and port to listen on (default: `:8080`).
- `--version`: Show version information and exit.
- `--watch-config`: Automatically reload the configuration file when it changes. This will wait for in-flight requests to complete then stop all running models (default: `false`).
### Building from source
1. Build requires golang and nodejs for the user interface.
1. `git clone https://github.com/mostlygeek/llama-swap.git`
1. `make clean all`
1. Binaries will be in `build/` subdirectory
## Monitoring Logs
Open the `http://<host>:<port>/` with your browser to get a web interface with streaming logs.
CLI access is also supported:
```shell
```sh
# sends up to the last 10KB of logs
curl http://host/logs'
$ curl http://host/logs
# streams combined logs
curl -Ns 'http://host/logs/stream'
curl -Ns http://host/logs/stream
# just llama-swap's logs
curl -Ns 'http://host/logs/stream/proxy'
# stream llama-swap's proxy status logs
curl -Ns http://host/logs/stream/proxy
# just upstream's logs
curl -Ns 'http://host/logs/stream/upstream'
# stream logs from upstream processes that llama-swap loads
curl -Ns http://host/logs/stream/upstream
# stream logs only from a specific model
curl -Ns http://host/logs/stream/{model_id}
# stream and filter logs with linux pipes
curl -Ns http://host/logs/stream | grep 'eval time'
# skips history and just streams new log entries
# appending ?no-history will disable sending buffered history first
curl -Ns 'http://host/logs/stream?no-history'
```
@@ -227,11 +251,11 @@ curl -Ns 'http://host/logs/stream?no-history'
Any OpenAI compatible server would work. llama-swap was originally designed for llama-server and it is the best supported.
For Python based inference servers like vllm or tabbyAPI it is recommended to run them via podman or docker. This provides clean environment isolation as well as responding correctly to `SIGTERM` signals to shutdown.
For Python based inference servers like vllm or tabbyAPI it is recommended to run them via podman or docker. This provides clean environment isolation as well as responding correctly to `SIGTERM` signals for proper shutdown.
## Star History
> [!NOTE]
> ⭐️ Star this project to help others discover it!
> ⭐️ Star this project to help others discover it!
[![Star History Chart](https://api.star-history.com/svg?repos=mostlygeek/llama-swap&type=Date)](https://www.star-history.com/#mostlygeek/llama-swap&Date)
@@ -0,0 +1,85 @@
# Replace ring.Ring with Efficient Circular Byte Buffer
## Overview
Replace the inefficient `container/ring.Ring` implementation in `logMonitor.go` with a simple circular byte buffer that uses a single contiguous `[]byte` slice. This eliminates per-write allocations, improves cache locality, and correctly implements a 10KB buffer.
## Current Issues
1. `ring.New(10 * 1024)` creates 10,240 ring **elements**, not 10KB of storage
2. Every `Write()` call allocates a new `[]byte` slice inside the lock
3. `GetHistory()` iterates all 10,240 elements and appends repeatedly (geometric reallocs)
4. Linked list structure has poor cache locality and pointer overhead
## Design Requirements
### New CircularBuffer Type
Create a simple circular byte buffer with:
- Single pre-allocated `[]byte` of fixed capacity (10KB)
- `head` and `size` integers to track write position and data length
- No per-write allocations
### API Requirements
The new buffer must support:
1. **Write(p []byte)** - Append bytes, overwriting oldest data when full
2. **GetHistory() []byte** - Return all buffered data in correct order (oldest to newest)
### Implementation Details
```go
type circularBuffer struct {
data []byte // pre-allocated capacity
head int // next write position
size int // current number of bytes stored (0 to cap)
}
```
**Write logic:**
- If `len(p) >= capacity`: just keep the last `capacity` bytes
- Otherwise: write bytes at `head`, wrapping around if needed
- Update `head` and `size` accordingly
- Data is copied into the internal buffer (not stored by reference)
**GetHistory logic:**
- Calculate start position: `(head - size + cap) % cap`
- If not wrapped: single slice copy
- If wrapped: two copies (end of buffer + beginning)
- Returns a **new slice** (copy), not a view into internal buffer
### Immutability Guarantees (must preserve)
Per existing tests:
1. Modifying input `[]byte` after `Write()` must not affect stored data
2. `GetHistory()` returns independent copy - modifications don't affect buffer
## Files to Modify
- `proxy/logMonitor.go` - Replace `buffer *ring.Ring` with new circular buffer
## Testing Plan
Existing tests in `logMonitor_test.go` should continue to pass:
- `TestLogMonitor` - Basic write/read and subscriber notification
- `TestWrite_ImmutableBuffer` - Verify writes don't affect returned history
- `TestWrite_LogTimeFormat` - Timestamp formatting
Add new tests:
- Test buffer wrap-around behavior
- Test large writes that exceed buffer capacity
- Test exact capacity boundary conditions
## Checklist
- [ ] Create `circularBuffer` struct in `logMonitor.go`
- [ ] Implement `Write()` method for circular buffer
- [ ] Implement `GetHistory()` method for circular buffer
- [ ] Update `LogMonitor` struct to use new buffer
- [ ] Update `NewLogMonitorWriter()` to initialize new buffer
- [ ] Update `LogMonitor.Write()` to use new buffer
- [ ] Update `LogMonitor.GetHistory()` to use new buffer
- [ ] Remove `"container/ring"` import
- [ ] Run `make test-dev` to verify existing tests pass
- [ ] Add wrap-around test case
- [ ] Run `make test-all` for final validation
+292
View File
@@ -0,0 +1,292 @@
# Add Model Metadata Support with Typed Macros
## Overview
Implement support for arbitrary metadata on model configurations that can be exposed through the `/v1/models` API endpoint. This feature extends the existing macro system to support scalar types (string, int, float, bool) instead of only strings, enabling type-safe metadata values.
The metadata will be schemaless, allowing users to define any key-value pairs they need. Macro substitution will work within metadata values, preserving types when macros are used directly and converting to strings when macros are interpolated within strings.
## Design Requirements
### 1. Enhanced Macro System
**Current State:**
- Macros are defined as `map[string]string` at both global and model levels
- Only string substitution is supported
- Macros are replaced in: `cmd`, `cmdStop`, `proxy`, `checkEndpoint`, `filters.stripParams`
**Required Changes:**
- Change `MacroList` type from `map[string]string` to `map[string]any`
- Support scalar types: `string`, `int`, `float64`, `bool`
- Implement type-preserving macro substitution:
- Direct macro usage (`key: ${macro}`) preserves the macro's type
- Interpolated usage (`key: "text ${macro}"`) converts to string
- Add validation to ensure macro values are scalar types only
- Update existing macro substitution logic in [proxy/config/config.go](proxy/config/config.go) to handle `any` types
**Implementation Details:**
- Create a generic helper function to perform macro substitution that:
- Takes a value of type `any`
- Recursively processes maps, slices, and scalar values
- Replaces `${macro_name}` patterns with macro values
- Preserves types for direct substitution
- Converts to strings for interpolated substitution
- Update `validateMacro()` function to accept `any` type and validate scalar types
- Maintain backward compatibility with existing string-only macros
### 2. Metadata Field in ModelConfig
**Location:** [proxy/config/model_config.go](proxy/config/model_config.go)
**Required Changes:**
- Add `Metadata map[string]any` field to `ModelConfig` struct
- Support YAML unmarshaling of arbitrary structures (maps, arrays, scalars)
- Apply macro substitution to metadata values during config loading
**Schema Requirements:**
- Metadata is optional (default: empty/nil map)
- Supports nested structures (objects within objects, arrays, etc.)
- All string values within metadata undergo macro substitution
- Type preservation rules apply as described above
### 3. Macro Substitution in Metadata
**Location:** [proxy/config/config.go](proxy/config/config.go) in `LoadConfigFromReader()`
**Process Flow:**
1. After loading YAML configuration
2. After model-level and global macro merging
3. Apply macro substitution to `ModelConfig.Metadata` field
4. Use the same merged macros available to `cmd`, `proxy`, etc.
5. Process recursively through all nested structures
**Substitution Rules:**
- `port: ${PORT}` → keeps integer type from PORT macro
- `temperature: ${temp}` → keeps float type from temp macro
- `note: "Running on ${PORT}"` → converts to string `"Running on 10001"`
- Arrays and nested objects are processed recursively
- Unknown macros should cause configuration load error (consistent with existing behavior)
### 4. API Response Updates
**Location:** [proxy/proxymanager.go:350](proxy/proxymanager.go#L350) `listModelsHandler()`
**Current Behavior:**
- Returns model records with: `id`, `object`, `created`, `owned_by`
- Optionally includes: `name`, `description`
**Required Changes:**
- Add metadata to each model record under the key `llamaswap_meta`
- Only include `llamaswap_meta` if metadata is non-empty
- Preserve all types when marshaling to JSON
- Maintain existing sorting by model ID
**Example Response:**
```json
{
"object": "list",
"data": [
{
"id": "llama",
"object": "model",
"created": 1234567890,
"owned_by": "llama-swap",
"name": "llama 3.1 8B",
"description": "A small but capable model",
"llamaswap_meta": {
"port": 10001,
"temperature": 0.7,
"note": "The llama is running on port 10001 temp=0.7, context=16384",
"a_list": [1, 1.23, "macros are OK in list and dictionary types: llama"],
"an_obj": {
"a": "1",
"b": 2,
"c": [0.7, false, "model: llama"]
}
}
}
]
}
```
### 5. Validation and Error Handling
**Macro Validation:**
- Extend `validateMacro()` to accept values of type `any`
- Verify macro values are scalar types: `string`, `int`, `float64`, `bool`
- Reject complex types (maps, slices, structs) as macro values
- Maintain existing validation for macro names and lengths
**Configuration Loading:**
- Fail fast if unknown macros are found in metadata
- Provide clear error messages indicating which model and field contains errors
- Ensure macros in metadata follow same rules as macros in cmd/proxy fields
## Testing Plan
### Test 1: Model-Level Macros with Different Types
**File:** [proxy/config/model_config_test.go](proxy/config/model_config_test.go)
**Test Cases:**
- Define model with macros of each scalar type
- Verify metadata correctly substitutes and preserves types
- Test direct substitution (`port: ${PORT}`)
- Test string interpolation (`note: "Port is ${PORT}"`)
- Verify nested objects and arrays work correctly
### Test 2: Global and Model Macro Precedence
**File:** [proxy/config/config_test.go](proxy/config/config_test.go)
**Test Cases:**
- Define same macro at global and model level with different types
- Verify model-level macro takes precedence
- Test metadata uses correct macro value
- Verify type is preserved from the winning macro
### Test 3: Macro Validation
**File:** [proxy/config/config_test.go](proxy/config/config_test.go)
**Test Cases:**
- Test that complex types (maps, arrays) are rejected as macro values
- Verify error message includes: macro name and type that was rejected
- Test that scalar types (string, int, float, bool) are accepted
- Each type should load without error
- Test macro name validation still works with `any` types
- Invalid characters, reserved names, length limits should still be enforced
### Test 4: Metadata in API Response
**File:** [proxy/proxymanager_test.go](proxy/proxymanager_test.go)
**Existing Test:** `TestProxyManager_ListModelsHandler`
**Test Cases:**
- Model with metadata → verify `llamaswap_meta` key appears
- Model without metadata → verify `llamaswap_meta` key is absent
- Verify all types are correctly marshaled to JSON
- Verify nested structures are preserved
- Verify macro substitution has occurred before serialization
### Test 5: Unknown Macros in Metadata
**File:** [proxy/config/config_test.go](proxy/config/config_test.go)
**Test Cases:**
- Use undefined macro in metadata
- Verify configuration loading fails with clear error
- Error should indicate model name and that macro is undefined
### Test 6: Recursive Substitution
**File:** [proxy/config/config_test.go](proxy/config/config_test.go)
**Test Cases:**
- Metadata with deeply nested structures
- Arrays containing objects with macros
- Objects containing arrays with macros
- Mixed string interpolation and direct substitution at various nesting levels
## Checklist
### Configuration Schema Changes
- [x] Change `MacroList` type from `map[string]string` to `map[string]any` in [proxy/config/config.go:19](proxy/config/config.go#L19)
- [x] Add `Metadata map[string]any` field to `ModelConfig` struct in [proxy/config/model_config.go:37](proxy/config/model_config.go#L37)
- [x] Update `validateMacro()` function signature to accept `any` type for values
- [x] Add validation logic to ensure macro values are scalar types only
### Macro Substitution Logic
- [x] Create generic recursive function `substituteMetadataMacros()` to handle `any` types
- [x] Implement type-preserving direct substitution logic
- [x] Implement string interpolation with type conversion
- [x] Handle maps: recursively process all values
- [x] Handle slices: recursively process all elements
- [x] Handle scalar types: perform string-based macro substitution if value is string
- [x] Integrate macro substitution into `LoadConfigFromReader()` after existing macro expansion
- [x] Update existing macro substitution calls to use merged macros with correct types
### API Response Changes
- [x] Modify `listModelsHandler()` in [proxy/proxymanager.go:350](proxy/proxymanager.go#L350)
- [x] Add `llamaswap_meta` field to model records when metadata exists
- [x] Ensure empty metadata results in omitted `llamaswap_meta` key
- [x] Verify JSON marshaling preserves all types correctly
### Testing - Config Package
- [x] Add test for string macros in metadata: [proxy/config/config_test.go](proxy/config/config_test.go)
- [x] Add test for int macros in metadata: [proxy/config/config_test.go](proxy/config/config_test.go)
- [x] Add test for float macros in metadata: [proxy/config/config_test.go](proxy/config/config_test.go)
- [x] Add test for bool macros in metadata: [proxy/config/config_test.go](proxy/config/config_test.go)
- [x] Add test for string interpolation in metadata: [proxy/config/config_test.go](proxy/config/config_test.go)
- [x] Add test for model-level macro precedence: [proxy/config/config_test.go](proxy/config/config_test.go)
- [x] Add test for nested structures in metadata: [proxy/config/config_test.go](proxy/config/config_test.go)
- [x] Add test for unknown macro in metadata (should error): [proxy/config/config_test.go](proxy/config/config_test.go)
- [x] Add test for invalid macro type validation: [proxy/config/config_test.go](proxy/config/config_test.go)
### Testing - Model Config Package
- [x] Add test cases to [proxy/config/model_config_test.go](proxy/config/model_config_test.go) for metadata unmarshaling
- [x] Test metadata with various scalar types
- [x] Test metadata with nested objects and arrays
### Testing - Proxy Manager
- [x] Update `TestProxyManager_ListModelsHandler` in [proxy/proxymanager_test.go](proxy/proxymanager_test.go)
- [x] Add test case for model with metadata
- [x] Add test case for model without metadata
- [x] Verify `llamaswap_meta` key presence/absence
- [x] Verify type preservation in JSON output
- [x] Verify macro substitution has occurred
### Documentation
- [x] Verify [config.example.yaml](config.example.yaml) already has complete metadata examples (lines 149-171)
- [x] No additional documentation needed per project instructions
## Known Issues and Considerations
### Inconsistencies
None identified. The plan references the correct existing example in [config.example.yaml:149-171](config.example.yaml#L149-L171).
### Design Decisions
1. **Why `llamaswap_meta` instead of merging into record?**
- Avoids potential collisions with OpenAI API standard fields
- Makes it clear this is llama-swap specific metadata
- Easier for clients to distinguish standard vs. custom fields
2. **Why support nested structures?**
- Provides maximum flexibility for users
- Aligns with the schemaless design principle
- Example config already demonstrates this capability
3. **Why validate macro types?**
- Prevents confusing behavior (e.g., substituting a map)
- Makes configuration errors explicit at load time
- Simpler implementation and testing
+397
View File
@@ -0,0 +1,397 @@
# Improve macro-in-macro support
**Status: COMPLETED ✅**
## Title
Fix macro substitution ordering by preserving definition order using ordered YAML parsing
## Overview
The current macro implementation uses `map[string]any` which does not preserve insertion order. This causes issues when macros reference other macros - if macro `B` contains `${A}` but `B` is processed before `A`, the reference won't be substituted, leading to "unknown macro" errors.
**Goal:** Ensure macros are substituted in definition order (LIFO - last in, first out) to allow macros to reliably reference previously-defined macros.
**Outcomes:**
- Macros can reference other macros defined earlier in the config
- Macro substitution is deterministic and order-dependent
- Single-pass substitution prevents circular dependencies
- Use `yaml.Node` from `gopkg.in/yaml.v3` to preserve macro definition order
- All existing tests pass
- New tests validate substitution order and self-reference detection
## Design Requirements
### 1. YAML Parsing Strategy
- **Continue using:** `gopkg.in/yaml.v3` (current library)
- **Use:** `yaml.Node` for ordered parsing of macros
- **Reason:** `yaml.Node` preserves document structure and order, avoiding need for migration
### 2. Data Structure Changes
#### Current Implementation (config.go:19)
```go
type MacroList map[string]any
```
#### New Implementation
```go
type MacroList []MacroEntry
type MacroEntry struct {
Name string
Value any
}
```
**Implementation Note:** Parse macros using `yaml.Node` to extract key-value pairs in document order, then construct the ordered `MacroList`.
### 3. Macro Substitution Order Rules
The substitution must follow this hierarchy (from most specific to least):
1. **Reserved macros** (last): `PORT`, `MODEL_ID` - substituted last, highest priority
2. **Model-level macros** (middle): Defined in specific model config, overrides global
3. **Global macros** (first): Defined at config root level
Within each level, macros are substituted in **reverse definition order** (LIFO):
- The last macro defined is substituted first
- This allows later macros to reference earlier ones
- Single-pass substitution prevents circular dependencies
### 4. Macro Reference Rules
**Allowed:**
- Macro can reference any macro defined **before** it (earlier in the file)
- Model macros can reference global macros
- Macros can reference reserved macros (`${PORT}`, `${MODEL_ID}`)
**Prohibited:**
- Macro cannot reference itself (e.g., `foo: "value ${foo}"`)
- Macro cannot reference macros defined **after** it
- No circular references (prevented by single-pass, ordered substitution)
### 5. Validation Requirements
Add validation to detect:
- **Self-references:** Macro value contains reference to its own name
- **Unknown macros:** After substitution, any remaining `${...}` references
Error messages should be clear:
```
macro 'foo' contains self-reference
unknown macro '${bar}' in model.cmd
```
### 6. Implementation Changes
#### Files to Modify
1. **[proxy/config/config.go](proxy/config/config.go)**
- Line 19: Change `MacroList` type definition
- Line 69: Update `Macros MacroList` field
- Line 153-157: Update macro validation loop to work with ordered structure
- Line 175-188: Update model-level macro validation
- Line 181-188: **NEW** Implement proper macro merging respecting order
- Line 193-202: **NEW** Implement ordered macro substitution in LIFO order
- Line 389-415: Update `validateMacro` to detect self-references
- Line 420-475: Update `substituteMetadataMacros` to accept ordered MacroList
2. **[proxy/config/model_config.go](proxy/config/model_config.go)**
- Line 33: Update `Macros MacroList` field type
3. **All test files**
- Update test fixtures to use ordered macro definitions
- Ensure tests specify macro order explicitly
#### Core Algorithm
Replace the macro substitution logic in [config.go:181-252](proxy/config/config.go#L181-L252) with:
```go
// Merge global config and model macros. Model macros take precedence
mergedMacros := make(MacroList, 0, len(config.Macros)+len(modelConfig.Macros)+2)
// Add global macros first
for _, entry := range config.Macros {
mergedMacros = append(mergedMacros, entry)
}
// Add model macros (can override global)
for _, entry := range modelConfig.Macros {
// Remove any existing global macro with same name
found := false
for i, existing := range mergedMacros {
if existing.Name == entry.Name {
mergedMacros[i] = entry // Override
found = true
break
}
}
if !found {
mergedMacros = append(mergedMacros, entry)
}
}
// Add reserved MODEL_ID macro at the end
mergedMacros = append(mergedMacros, MacroEntry{Name: "MODEL_ID", Value: modelId})
// Check if PORT macro is needed
if strings.Contains(modelConfig.Cmd, "${PORT}") || strings.Contains(modelConfig.Proxy, "${PORT}") || strings.Contains(modelConfig.CmdStop, "${PORT}") {
// enforce ${PORT} used in both cmd and proxy
if !strings.Contains(modelConfig.Cmd, "${PORT}") && strings.Contains(modelConfig.Proxy, "${PORT}") {
return Config{}, fmt.Errorf("model %s: proxy uses ${PORT} but cmd does not - ${PORT} is only available when used in cmd", modelId)
}
// Add PORT macro to the end (highest priority)
mergedMacros = append(mergedMacros, MacroEntry{Name: "PORT", Value: nextPort})
nextPort++
}
// Single-pass substitution: Substitute all macros in LIFO order (last defined first)
// This allows later macros to reference earlier ones
for i := len(mergedMacros) - 1; i >= 0; i-- {
entry := mergedMacros[i]
macroSlug := fmt.Sprintf("${%s}", entry.Name)
macroStr := fmt.Sprintf("%v", entry.Value)
// Substitute in command fields
modelConfig.Cmd = strings.ReplaceAll(modelConfig.Cmd, macroSlug, macroStr)
modelConfig.CmdStop = strings.ReplaceAll(modelConfig.CmdStop, macroSlug, macroStr)
modelConfig.Proxy = strings.ReplaceAll(modelConfig.Proxy, macroSlug, macroStr)
modelConfig.CheckEndpoint = strings.ReplaceAll(modelConfig.CheckEndpoint, macroSlug, macroStr)
modelConfig.Filters.StripParams = strings.ReplaceAll(modelConfig.Filters.StripParams, macroSlug, macroStr)
// Substitute in metadata (recursive)
if len(modelConfig.Metadata) > 0 {
var err error
modelConfig.Metadata, err = substituteMacroInValue(modelConfig.Metadata, entry.Name, entry.Value)
if err != nil {
return Config{}, fmt.Errorf("model %s metadata: %s", modelId, err.Error())
}
}
}
```
Add this new helper function to replace `substituteMetadataMacros`:
```go
// substituteMacroInValue recursively substitutes a single macro in a value structure
// This is called once per macro, allowing LIFO substitution order
func substituteMacroInValue(value any, macroName string, macroValue any) (any, error) {
macroSlug := fmt.Sprintf("${%s}", macroName)
macroStr := fmt.Sprintf("%v", macroValue)
switch v := value.(type) {
case string:
// Check if this is a direct macro substitution
if v == macroSlug {
return macroValue, nil
}
// Handle string interpolation
if strings.Contains(v, macroSlug) {
return strings.ReplaceAll(v, macroSlug, macroStr), nil
}
return v, nil
case map[string]any:
// Recursively process map values
newMap := make(map[string]any)
for key, val := range v {
newVal, err := substituteMacroInValue(val, macroName, macroValue)
if err != nil {
return nil, err
}
newMap[key] = newVal
}
return newMap, nil
case []any:
// Recursively process slice elements
newSlice := make([]any, len(v))
for i, val := range v {
newVal, err := substituteMacroInValue(val, macroName, macroValue)
if err != nil {
return nil, err
}
newSlice[i] = newVal
}
return newSlice, nil
default:
// Return scalar types as-is
return value, nil
}
}
```
### 7. Self-Reference Detection
Add to `validateMacro` function:
```go
func validateMacro(name string, value any) error {
// ... existing validation ...
// Check for self-reference
if str, ok := value.(string); ok {
macroSlug := fmt.Sprintf("${%s}", name)
if strings.Contains(str, macroSlug) {
return fmt.Errorf("macro '%s' contains self-reference", name)
}
}
return nil
}
```
## Testing Plan
### 1. Migration Tests
- **Test:** All existing macro tests still pass after YAML library migration
- **Files:** All `*_test.go` files with macro tests
### 2. Macro Order Tests
#### Test: Macro-in-macro substitution order
```yaml
macros:
"A": "value-A"
"B": "prefix-${A}-suffix"
models:
test:
cmd: "echo ${B}"
```
**Expected:** `cmd` becomes `"echo prefix-value-A-suffix"`
#### Test: LIFO substitution order
```yaml
macros:
"base": "/models"
"path": "${base}/llama"
"full": "${path}/model.gguf"
models:
test:
cmd: "load ${full}"
```
**Expected:** `cmd` becomes `"load /models/llama/model.gguf"`
#### Test: Model macro overrides global
```yaml
macros:
"tag": "global"
"msg": "value-${tag}"
models:
test:
macros:
"tag": "model-level"
cmd: "echo ${msg}"
```
**Expected:** `cmd` becomes `"echo value-model-level"` (model macro overrides global)
### 3. Reserved Macro Tests
#### Test: MODEL_ID substituted in macro
```yaml
macros:
"podman-llama": "podman run --name ${MODEL_ID} ghcr.io/ggml-org/llama.cpp:server-cuda"
models:
my-model:
cmd: "${podman-llama} -m model.gguf"
```
**Expected:** `cmd` becomes `"podman run --name my-model ghcr.io/ggml-org/llama.cpp:server-cuda -m model.gguf"`
### 4. Error Detection Tests
#### Test: Self-reference detection
```yaml
macros:
"recursive": "value-${recursive}"
```
**Expected:** Error: `macro 'recursive' contains self-reference`
#### Test: Undefined macro reference
```yaml
macros:
"A": "value-${UNDEFINED}"
```
**Expected:** Error: `unknown macro '${UNDEFINED}' found in macros.A` (or similar)
### 5. Regression Tests
- Run all existing macro tests: `TestConfig_MacroReplacement`, `TestConfig_MacroReservedNames`, etc.
- Ensure all pass without modification (except test fixtures if needed)
## Checklist
### Phase 1: Data Structure Changes
- [ ] Implement custom `UnmarshalYAML` method for `MacroList` that uses `yaml.Node`
- [ ] Define new ordered `MacroList` type as `[]MacroEntry`
- [ ] Update `MacroList` type definition in [config.go](proxy/config/config.go#L19)
- [ ] Update `Config.Macros` field type in [config.go](proxy/config/config.go#L69)
- [ ] Update `ModelConfig.Macros` field type in [model_config.go](proxy/config/model_config.go#L33)
- [ ] Implement helper functions:
- [ ] `func (ml MacroList) Get(name string) (any, bool)` - lookup by name
- [ ] `func (ml MacroList) Set(name string, value any) MacroList` - add/override entry
- [ ] `func (ml MacroList) ToMap() map[string]any` - convert to map if needed
### Phase 2: Macro Validation Updates
- [ ] Update macro validation loop at [config.go:153-157](proxy/config/config.go#L153-L157)
- [ ] Update model macro validation at [config.go:175-179](proxy/config/config.go#L175-L179)
- [ ] Add self-reference detection to `validateMacro` function [config.go:389](proxy/config/config.go#L389)
- [ ] Test self-reference detection with new test case
### Phase 3: Macro Substitution Algorithm
- [ ] Implement ordered macro merging (global → model → reserved) at [config.go:181-188](proxy/config/config.go#L181-L188)
- [ ] Implement single-pass LIFO substitution loop (reverse iteration) at [config.go:193-202](proxy/config/config.go#L193-L202)
- [ ] Substitute in all string fields (cmd, cmdStop, proxy, checkEndpoint, stripParams)
- [ ] Substitute in metadata within same loop
- [ ] Ensure `MODEL_ID` is added to merged macros before substitution
- [ ] Ensure `PORT` is added after port assignment (if needed)
- [ ] Replace `substituteMetadataMacros` with new `substituteMacroInValue` function that processes one macro at a time [config.go:420](proxy/config/config.go#L420)
- [ ] Remove old metadata substitution code that was separate from main loop [config.go:245-251](proxy/config/config.go#L245-L251)
### Phase 4: Testing
- [ ] Run `make test-dev` - fix any static checking errors
- [ ] Add test: macro-in-macro basic substitution
- [ ] Add test: LIFO substitution order with 3+ macro levels
- [ ] Add test: MODEL_ID in global macro used by model
- [ ] Add test: PORT in global macro used by model
- [ ] Add test: model macro overrides global macro in substitution
- [ ] Add test: self-reference detection error
- [ ] Add test: undefined macro reference error
- [ ] Verify all existing macro tests pass: `TestConfig_Macro*`
- [ ] Run `make test-all` - ensure all tests including concurrency tests pass
### Phase 5: Documentation
- [ ] Update plan status in this file (mark completed)
- [ ] Update CLAUDE.md if macro behavior needs documentation
- [ ] Verify no new error messages need user documentation
## Bug Example (Original Issue)
```yaml
macros:
"podman-llama": >
podman run --name ${MODEL_ID}
--init --rm -p ${PORT}:8080 -v /home/alex/ai/models:/models:z --gpus=all
ghcr.io/ggml-org/llama.cpp:server-cuda
"standard-options": >
--no-mmap --jinja
"kv8": >
-fa on -ctk q8_0 -ctv q8_0
```
**Current Bug:**
- During macro substitution, if `${MODEL_ID}` is processed before `${podman-llama}`, the `${MODEL_ID}` reference inside `podman-llama` remains unsubstituted
- Results in error: `unknown macro '${MODEL_ID}' found in model.cmd`
**After Fix:**
- Macros substituted in LIFO order: `kv8``standard-options``podman-llama`
- `MODEL_ID` is a reserved macro, substituted last (after all user macros)
- `${MODEL_ID}` inside `podman-llama` is correctly replaced with the model name
@@ -210,6 +210,11 @@ func main() {
})
})
r.GET("/v1/audio/voices", func(c *gin.Context) {
model := c.Query("model")
c.JSON(http.StatusOK, gin.H{"voices": []string{"voice1"}, "model": model})
})
r.GET("/slow-respond", func(c *gin.Context) {
echo := c.Query("echo")
delay := c.Query("delay")
+27
View File
@@ -0,0 +1,27 @@
# wol-proxy
wol-proxy automatically wakes up a suspended llama-swap server using Wake-on-LAN when requests are received.
When a request arrives and llama-swap is unavailable, wol-proxy sends a WOL packet and holds the request until the server becomes available. If the server doesn't respond within the timeout period (default: 60 seconds), the request is dropped.
This utility helps conserve energy by allowing GPU-heavy servers to remain suspended when idle, as they can consume hundreds of watts even when not actively processing requests.
## Usage
```shell
# minimal
$ ./wol-proxy -mac BA:DC:0F:FE:E0:00 -upstream http://192.168.1.13:8080
# everything
$ ./wol-proxy -mac BA:DC:0F:FE:E0:00 -upstream http://192.168.1.13:8080 \
# use debug log level
-log debug \
# altenerative listening port
-listen localhost:9999 \
# seconds to hold requests waiting for upstream to be ready
-timeout 30
```
## API
`GET /status` - that's it. Everything else is proxied to the upstream server.
+64
View File
@@ -0,0 +1,64 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Loading...</title>
<style>
body {
font-family: sans-serif;
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
margin: 0;
background: #f5f5f5;
}
.loader {
text-align: center;
}
.stats {
font-size: 18px;
color: #333;
margin: 20px 0;
}
.stats-label {
color: #666;
font-size: 14px;
}
</style>
</head>
<body>
<div class="loader">
<p>Waking up upstream server...</p>
<div class="stats">
<div><span class="stats-label">Time elapsed:</span> <span id="elapsed">0s</span></div>
<div><span id="attempts">&nbsp;</span></div>
</div>
</div>
<script>
var startTime = Date.now();
var attempts = 0;
setInterval(function() {
var elapsed = (Date.now() - startTime) / 1000;
document.getElementById('elapsed').textContent = elapsed.toFixed(1) + 's';
}, 100);
// Check status every second
setInterval(function() {
attempts++;
var dots = '.'.repeat((attempts % 10) || 10);
document.getElementById('attempts').textContent = dots;
fetch('/status')
.then(function(r) { return r.text(); })
.then(function(t) {
if (t.indexOf('status: ready') !== -1) {
location.reload();
}
})
.catch(function() {});
}, 1000);
</script>
</body>
</html>
+333
View File
@@ -0,0 +1,333 @@
package main
import (
"bufio"
"context"
_ "embed"
"errors"
"flag"
"fmt"
"io"
"log/slog"
"net"
"net/http"
"net/http/httputil"
"net/url"
"os"
"os/signal"
"strings"
"sync"
"time"
)
//go:embed index.html
var loadingPageHTML string
var (
flagMac = flag.String("mac", "", "mac address to send WoL packet to")
flagUpstream = flag.String("upstream", "", "upstream proxy address to send requests to")
flagListen = flag.String("listen", ":8080", "listen address to listen on")
flagLog = flag.String("log", "info", "log level (debug, info, warn, error)")
flagTimeout = flag.Int("timeout", 60, "seconds requests wait for upstream response before failing")
)
func main() {
flag.Parse()
switch *flagLog {
case "debug":
slog.SetLogLoggerLevel(slog.LevelDebug)
case "info":
slog.SetLogLoggerLevel(slog.LevelInfo)
case "warn":
slog.SetLogLoggerLevel(slog.LevelWarn)
case "error":
slog.SetLogLoggerLevel(slog.LevelError)
default:
slog.Error("invalid log level", "logLevel", *flagLog)
return
}
// Validate flags
if *flagListen == "" {
slog.Error("listen address is required")
return
}
if *flagMac == "" {
slog.Error("mac address is required")
return
}
if *flagTimeout < 1 {
slog.Error("timeout must be greater than 0")
return
}
var upstreamURL *url.URL
var err error
// validate mac address
if _, err = net.ParseMAC(*flagMac); err != nil {
slog.Error("invalid mac address", "error", err)
return
}
if *flagUpstream == "" {
slog.Error("upstream proxy address is required")
return
} else {
upstreamURL, err = url.ParseRequestURI(*flagUpstream)
if err != nil {
slog.Error("error parsing upstream url", "error", err)
return
}
}
proxy := newProxy(upstreamURL)
server := &http.Server{
Addr: *flagListen,
Handler: proxy,
}
// start the server
go func() {
slog.Info("server starting on", "address", *flagListen)
if err := server.ListenAndServe(); err != nil {
slog.Error("error starting server", "error", err)
}
}()
// graceful shutdown
ctx, _ := signal.NotifyContext(context.Background(), os.Interrupt)
<-ctx.Done()
server.Close()
}
type upstreamStatus string
const (
notready upstreamStatus = "not ready"
ready upstreamStatus = "ready"
)
type proxyServer struct {
upstreamProxy *httputil.ReverseProxy
failCount int
statusMutex sync.RWMutex
status upstreamStatus
}
func newProxy(url *url.URL) *proxyServer {
p := httputil.NewSingleHostReverseProxy(url)
proxy := &proxyServer{
upstreamProxy: p,
status: notready,
failCount: 0,
}
// start a goroutine to monitor upstream status via SSE
go func() {
eventsUrl := url.Scheme + "://" + url.Host + "/api/events"
client := &http.Client{
Timeout: 0, // No timeout for SSE connection
}
waitDuration := 10 * time.Second
for {
slog.Debug("connecting to SSE endpoint", "url", eventsUrl)
req, err := http.NewRequest("GET", eventsUrl, nil)
if err != nil {
slog.Warn("failed to create SSE request", "error", err)
proxy.setStatus(notready)
proxy.incFail(1)
time.Sleep(waitDuration)
continue
}
req.Header.Set("Accept", "text/event-stream")
req.Header.Set("Cache-Control", "no-cache")
req.Header.Set("Connection", "keep-alive")
resp, err := client.Do(req)
if err != nil {
slog.Error("failed to connect to SSE endpoint", "error", err)
proxy.setStatus(notready)
proxy.incFail(1)
time.Sleep(10 * time.Second)
continue
}
if resp.StatusCode != http.StatusOK {
slog.Warn("SSE endpoint returned non-OK status", "status", resp.StatusCode)
_, _ = io.Copy(io.Discard, resp.Body)
_ = resp.Body.Close()
proxy.setStatus(notready)
proxy.incFail(1)
time.Sleep(10 * time.Second)
continue
}
// Successfully connected to SSE endpoint
slog.Info("connected to SSE endpoint, upstream ready")
proxy.setStatus(ready)
proxy.resetFailures()
// Read from the SSE stream to detect disconnection
scanner := bufio.NewScanner(resp.Body)
// use a fairly large buffer to avoid scanner errors when reading large SSE events
buf := make([]byte, 0, 1024*1024*2)
scanner.Buffer(buf, 1024*1024*2)
events := 0
if slog.Default().Enabled(context.Background(), slog.LevelDebug) {
fmt.Print("Events: ")
}
for scanner.Scan() {
if slog.Default().Enabled(context.Background(), slog.LevelDebug) {
// Just read the events to keep connection alive
// We don't need to process the event data
events++
fmt.Printf("%d, ", events)
}
}
fmt.Println()
if err := scanner.Err(); err != nil {
slog.Error("error reading from SSE stream", "error", err)
}
// Connection closed or error occurred
_ = resp.Body.Close()
slog.Info("SSE connection closed, upstream not ready")
proxy.setStatus(notready)
proxy.incFail(1)
// Wait before reconnecting
time.Sleep(waitDuration)
}
}()
return proxy
}
func (p *proxyServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
if r.Method == "GET" && r.URL.Path == "/status" {
status := string(p.getStatus())
failCount := p.getFailures()
w.Header().Set("Content-Type", "text/plain")
w.WriteHeader(200)
fmt.Fprintf(w, "status: %s\n", status)
fmt.Fprintf(w, "failures: %d\n", failCount)
return
}
if p.getStatus() == notready {
path := r.URL.Path
if strings.HasPrefix(path, "/api/events") {
slog.Debug("Skipping wake up", "req", path)
w.WriteHeader(http.StatusNoContent)
return
}
slog.Info("upstream not ready, sending magic packet", "req", path, "from", r.RemoteAddr)
if err := sendMagicPacket(*flagMac); err != nil {
slog.Warn("failed to send magic WoL packet", "error", err)
}
// For root or UI path requests, return loading page with status polling
// the web page will do the polling and redirect when ready
if path == "/" || strings.HasPrefix(path, "/ui/") {
w.Header().Set("Content-Type", "text/html; charset=utf-8")
w.WriteHeader(http.StatusOK)
fmt.Fprint(w, loadingPageHTML)
return
}
ticker := time.NewTicker(250 * time.Millisecond)
timeout, cancel := context.WithTimeout(context.Background(), time.Duration(*flagTimeout)*time.Second)
defer cancel()
loop:
for {
select {
case <-timeout.Done():
slog.Info("timeout waiting for upstream to be ready")
http.Error(w, "timeout", http.StatusRequestTimeout)
return
case <-ticker.C:
if p.getStatus() == ready {
ticker.Stop()
break loop
}
}
}
}
p.upstreamProxy.ServeHTTP(w, r)
}
func (p *proxyServer) getStatus() upstreamStatus {
p.statusMutex.RLock()
defer p.statusMutex.RUnlock()
return p.status
}
func (p *proxyServer) setStatus(status upstreamStatus) {
p.statusMutex.Lock()
defer p.statusMutex.Unlock()
p.status = status
}
func (p *proxyServer) incFail(num int) {
p.statusMutex.Lock()
defer p.statusMutex.Unlock()
p.failCount += num
}
func (p *proxyServer) getFailures() int {
p.statusMutex.RLock()
defer p.statusMutex.RUnlock()
return p.failCount
}
func (p *proxyServer) resetFailures() {
p.statusMutex.Lock()
defer p.statusMutex.Unlock()
p.failCount = 0
}
func sendMagicPacket(macAddr string) error {
hwAddr, err := net.ParseMAC(macAddr)
if err != nil {
return err
}
if len(hwAddr) != 6 {
return errors.New("invalid MAC address")
}
// Create the magic packet.
packet := make([]byte, 102)
// Add 6 bytes of 0xFF.
for i := 0; i < 6; i++ {
packet[i] = 0xFF
}
// Repeat the MAC address 16 times.
for i := 1; i <= 16; i++ {
copy(packet[i*6:], hwAddr)
}
// Send the packet using UDP.
addr := net.UDPAddr{
IP: net.IPv4bcast,
Port: 9,
}
conn, err := net.DialUDP("udp", nil, &addr)
if err != nil {
return err
}
defer conn.Close()
_, err = conn.Write(packet)
return err
}
+377
View File
@@ -0,0 +1,377 @@
{
"$schema": "https://json-schema.org/draft-07/schema#",
"$id": "llama-swap-config-schema.json",
"title": "llama-swap configuration",
"description": "Configuration file for llama-swap",
"type": "object",
"required": [
"models"
],
"definitions": {
"macros": {
"type": "object",
"additionalProperties": {
"oneOf": [
{
"type": "string",
"minLength": 0,
"maxLength": 1024
},
{
"type": "number"
},
{
"type": "boolean"
}
]
},
"propertyNames": {
"type": "string",
"minLength": 1,
"maxLength": 64,
"pattern": "^[a-zA-Z0-9_-]+$",
"not": {
"enum": [
"PORT",
"MODEL_ID"
]
}
},
"default": {},
"description": "A dictionary of string substitutions. Macros are reusable snippets used in model cmd, cmdStop, proxy, checkEndpoint, filters.stripParams. Macro names must be <64 chars, match ^[a-zA-Z0-9_-]+$, and not be PORT or MODEL_ID. Values can be string, number, or boolean. Macros can reference other macros defined before them."
}
},
"properties": {
"healthCheckTimeout": {
"type": "integer",
"minimum": 15,
"default": 120,
"description": "Number of seconds to wait for a model to be ready to serve requests."
},
"globalTTL": {
"type": "integer",
"minimum": 0,
"default": 0,
"description": "Default TTL for all models in seconds, 0 means no TTL and models will never be automatically unloaded"
},
"logLevel": {
"type": "string",
"enum": [
"debug",
"info",
"warn",
"error"
],
"default": "info",
"description": "Sets the logging value. Valid values: debug, info, warn, error."
},
"logTimeFormat": {
"type": "string",
"enum": [
"",
"ansic",
"unixdate",
"rubydate",
"rfc822",
"rfc822z",
"rfc850",
"rfc1123",
"rfc1123z",
"rfc3339",
"rfc3339nano",
"kitchen",
"stamp",
"stampmilli",
"stampmicro",
"stampnano"
],
"default": "",
"description": "Enables and sets the logging timestamp format. Valid values: \"\", \"ansic\", \"unixdate\", \"rubydate\", \"rfc822\", \"rfc822z\", \"rfc850\", \"rfc1123\", \"rfc1123z\", \"rfc3339\", \"rfc3339nano\", \"kitchen\", \"stamp\", \"stampmilli\", \"stampmicro\", and \"stampnano\". For more info, read: https://pkg.go.dev/time#pkg-constants"
},
"metricsMaxInMemory": {
"type": "integer",
"default": 1000,
"description": "Maximum number of metrics to keep in memory. Controls how many metrics are stored before older ones are discarded."
},
"captureBuffer": {
"type": "integer",
"minimum": 0,
"default": 5,
"description": "Size in megabytes of the buffer for storing request/response captures. Set to 0 to disable captures."
},
"startPort": {
"type": "integer",
"default": 5800,
"description": "Starting port number for the automatic ${PORT} macro. The ${PORT} macro is incremented for every model that uses it."
},
"sendLoadingState": {
"type": "boolean",
"default": false,
"description": "Inject loading status updates into the reasoning field. When true, a stream of loading messages will be sent to the client."
},
"includeAliasesInList": {
"type": "boolean",
"default": false,
"description": "Present aliases within the /v1/models OpenAI API listing. when true, model aliases will be output to the API model listing duplicating all fields except for Id so chat UIs can use the alias equivalent to the original."
},
"macros": {
"$ref": "#/definitions/macros"
},
"models": {
"type": "object",
"description": "A dictionary of model configurations. Each key is a model's ID. Model settings have defaults if not defined. The model's ID is available as ${MODEL_ID}.",
"additionalProperties": {
"type": "object",
"required": [
"cmd"
],
"properties": {
"macros": {
"$ref": "#/definitions/macros"
},
"cmd": {
"type": "string",
"minLength": 1,
"description": "Command to run to start the inference server. Macros can be used. Comments allowed with |."
},
"cmdStop": {
"type": "string",
"default": "",
"description": "Command to run to stop the model gracefully. Uses ${PID} macro for upstream process id. If empty, default shutdown behavior is used."
},
"name": {
"type": "string",
"default": "",
"maxLength": 128,
"description": "Display name for the model. Used in v1/models API response."
},
"description": {
"type": "string",
"default": "",
"maxLength": 1024,
"description": "Description for the model. Used in v1/models API response."
},
"env": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[A-Z_][A-Z0-9_]*=.*$"
},
"default": [],
"description": "Array of environment variables to inject into cmd's environment. Each value is a string in ENV_NAME=value format."
},
"proxy": {
"type": "string",
"default": "http://localhost:${PORT}",
"format": "uri",
"description": "URL where llama-swap routes API requests. If custom port is used in cmd, this must be set."
},
"aliases": {
"type": "array",
"items": {
"type": "string",
"minLength": 1
},
"default": [],
"description": "Alternative model names for this configuration. Must be unique globally."
},
"checkEndpoint": {
"type": "string",
"default": "/health",
"pattern": "^/.*$|^none$",
"description": "URL path to check if the server is ready. Use 'none' to skip health checking."
},
"ttl": {
"type": "integer",
"minimum": -1,
"default": -1,
"description": "Automatically unload the model after ttl seconds. -1 uses the global TTL value, 0 disables unloading. Must be >0 to enable."
},
"useModelName": {
"type": "string",
"default": "",
"description": "Override the model name sent to upstream server. Useful if upstream expects a different name."
},
"filters": {
"type": "object",
"properties": {
"stripParams": {
"type": "string",
"default": "",
"pattern": "^[a-zA-Z0-9_, ]*$",
"description": "Comma separated list of parameters to remove from the request. Used for server-side enforcement of sampling parameters."
},
"setParams": {
"type": "object",
"additionalProperties": true,
"default": {},
"description": "Dictionary of parameters to set/override in requests. Useful for enforcing specific parameter values. Protected params like 'model' cannot be overridden. Values can be strings, numbers, booleans, arrays, or objects."
},
"setParamsByID": {
"type": "object",
"additionalProperties": {
"type": "object",
"additionalProperties": true
},
"default": {},
"description": "Dictionary mapping requested model IDs (or aliases) to parameters to set/override in requests. Applied after setParams and can override those values. Useful with aliases to vary behaviour depending on which alias the client used (e.g. different reasoning_effort per alias). Keys support ${MODEL_ID} macro substitution. Protected params like 'model' cannot be overridden."
}
},
"additionalProperties": false,
"default": {},
"description": "Dictionary of filter settings. Supports stripParams, setParams, and setParamsByID."
},
"metadata": {
"type": "object",
"additionalProperties": true,
"default": {},
"description": "Dictionary of arbitrary values included in /v1/models. Can contain complex types. Only passed through in /v1/models responses."
},
"concurrencyLimit": {
"type": "integer",
"minimum": 0,
"default": 0,
"description": "Overrides allowed number of active parallel requests to a model. 0 uses internal default of 10. >0 overrides default. Requests exceeding limit get HTTP 429."
},
"sendLoadingState": {
"type": "boolean",
"description": "Overrides the global sendLoadingState for this model. Ommitting this property will use the global setting."
},
"unlisted": {
"type": "boolean",
"default": false,
"description": "If true the model will not show up in /v1/models responses. It can still be used as normal in API requests."
}
}
}
},
"groups": {
"type": "object",
"additionalProperties": {
"type": "object",
"required": [
"members"
],
"properties": {
"swap": {
"type": "boolean",
"default": true,
"description": "Controls model swapping behaviour within the group. True: only one model runs at a time. False: all models can run together."
},
"exclusive": {
"type": "boolean",
"default": true,
"description": "Controls how the group affects other groups. True: causes all other groups to unload when this group runs a model. False: does not affect other groups."
},
"persistent": {
"type": "boolean",
"default": false,
"description": "Prevents other groups from unloading the models in this group. Does not affect individual model behaviour."
},
"members": {
"type": "array",
"items": {
"type": "string"
},
"description": "Array of model IDs that are members of this group. Model IDs must be defined in models."
}
}
},
"description": "A dictionary of group settings. Provides advanced controls over model swapping behaviour. Model IDs must be defined in models. A model can only be a member of one group. Behaviour controlled via swap, exclusive, persistent."
},
"hooks": {
"type": "object",
"properties": {
"on_startup": {
"type": "object",
"properties": {
"preload": {
"type": "array",
"items": {
"type": "string"
},
"default": [],
"description": "List of model IDs to load on startup. Model names must match keys in models. When preloading multiple models, define a group to prevent swapping."
}
},
"additionalProperties": false,
"description": "Actions to perform on startup. Only supported action is preload."
}
},
"additionalProperties": false,
"description": "A dictionary of event triggers and actions. Only supported hook is on_startup."
},
"logToStdout": {
"type": "string",
"enum": [
"proxy",
"upstream",
"both",
"none"
],
"default": "proxy",
"description": "Controls what is logged to stdout. 'proxy': logs generated by llama-swap, 'upstream': copy of upstream process stdout logs, 'both': both interleaved together, 'none': no logs written to stdout."
},
"apiKeys": {
"type": "array",
"items": {
"type": "string",
"minLength": 1
},
"default": [],
"description": "Require an API key when making requests to inference endpoints. When empty, authorization will not be checked. Each key is a non-empty string."
},
"peers": {
"type": "object",
"additionalProperties": {
"type": "object",
"required": [
"proxy",
"models"
],
"properties": {
"proxy": {
"type": "string",
"format": "uri",
"description": "A valid base URL to proxy requests to. Requested path to llama-swap will be appended to the end of the proxy value."
},
"apiKey": {
"type": "string",
"default": "",
"description": "A string key to be injected into the request. If blank, no key will be added. Key will be injected into headers: Authorization: Bearer <key> and x-api-key: <key>."
},
"models": {
"type": "array",
"items": {
"type": "string",
"minLength": 1
},
"description": "A list of models served by the peer."
},
"filters": {
"type": "object",
"properties": {
"stripParams": {
"type": "string",
"default": "",
"pattern": "^[a-zA-Z0-9_, ]*$",
"description": "Comma separated list of parameters to remove from the request. Useful for removing parameters that the peer doesn't support."
},
"setParams": {
"type": "object",
"additionalProperties": true,
"default": {},
"description": "Dictionary of parameters to set/override in requests to this peer. Useful for injecting provider-specific settings. Protected params like 'model' cannot be overridden. Values can be strings, numbers, booleans, arrays, or objects."
}
},
"additionalProperties": false,
"default": {},
"description": "Dictionary of filter settings for peer requests. Supports stripParams and setParams."
}
}
},
"default": {},
"description": "A dictionary of remote peers and models they provide. Peers can be another llama-swap or any server that provides the /v1/ generative API endpoints supported by llama-swap."
}
}
}
+225 -26
View File
@@ -1,3 +1,6 @@
# add this modeline for validation in vscode
# yaml-language-server: $schema=https://raw.githubusercontent.com/mostlygeek/llama-swap/refs/heads/main/config-schema.json
#
# llama-swap YAML configuration example
# -------------------------------------
#
@@ -23,28 +26,103 @@ healthCheckTimeout: 500
# - Valid log levels: debug, info, warn, error
logLevel: info
# logTimeFormat: enables and sets the logging timestamp format
# - optional, default (disabled): ""
# - Valid values: "", "ansic", "unixdate", "rubydate", "rfc822", "rfc822z",
# "rfc850", "rfc1123", "rfc1123z", "rfc3339", "rfc3339nano", "kitchen",
# "stamp", "stampmilli", "stampmicro", and "stampnano".
# - For more info, read: https://pkg.go.dev/time#pkg-constants
logTimeFormat: ""
# logToStdout: controls what is logged to stdout
# - optional, default: "proxy"
# - valid values:
# - "proxy": logs generated by llama-swap when swapping models,
# handling requests, etc.
# - "upstream": a copy of an upstream processes stdout logs
# - "both": both the proxy and upstream logs interleaved together
# - "none": no logs are ever written to stdout
logToStdout: "proxy"
# metricsMaxInMemory: maximum number of metrics to keep in memory
# - optional, default: 1000
# - controls how many metrics are stored in memory before older ones are discarded
# - useful for limiting memory usage when processing large volumes of metrics
metricsMaxInMemory: 1000
# captureBuffer: how many MBs to allocate for storing request/response captures
# - optional, default: 10
# - set to 0 to disable
captureBuffer: 15
# startPort: sets the starting port number for the automatic ${PORT} macro.
# - optional, default: 5800
# - the ${PORT} macro can be used in model.cmd and model.proxy settings
# - it is automatically incremented for every model that uses it
startPort: 10001
# sendLoadingState: inject loading status updates into the reasoning (thinking)
# field
# - optional, default: false
# - when true, a stream of loading messages will be sent to the client in the
# reasoning field so chat UIs can show that loading is in progress.
# - see #366 for more details
sendLoadingState: true
# includeAliasesInList: present aliases within the /v1/models OpenAI API listing
# - optional, default: false
# - when true, model aliases will be output to the API model listing duplicating
# all fields except for Id so chat UIs can use the alias equivalent to the original.
includeAliasesInList: false
# globalTTL: the default TTL in seconds before unloading a model
# - optional, default: 0 (never automatically unload)
# - must be >= 0
globalTTL: 0
# macros: a dictionary of string substitutions
# - optional, default: empty dictionary
# - macros are reusable snippets
# - used in a model's cmd, cmdStop, proxy and checkEndpoint
# - used in a model's cmd, cmdStop, proxy, checkEndpoint, filters.stripParams
# - useful for reducing common configuration settings
# - macro names are strings and must be less than 64 characters
# - macro names must match the regex ^[a-zA-Z0-9_-]+$
# - macro names must not be a reserved name: PORT or MODEL_ID
# - macro values can be numbers, bools, or strings
# - macros can contain other macros, but they must be defined before they are used
# - environment variables can be referenced with ${env.VAR_NAME} syntax
# - env macros are substituted first, before regular macros
# - if the env var is not set, config loading will fail with an error
macros:
# Example of a multi-line macro
"latest-llama": >
/path/to/llama-server/llama-server-ec9e0301
--port ${PORT}
"default_ctx": 4096
# Example of macro-in-macro usage. macros can contain other macros
# but they must be previously declared.
"default_args": "--ctx-size ${default_ctx}"
# Example of environment variable macros
# - ${env.VAR_NAME} pulls the value from the system environment
# - useful for paths, secrets, or machine-specific configuration
"models_dir": "${env.HOME}/models"
# apiKeys: require an API key when making requests to inference endpoints
# - optional, default: []
# - when empty (the default) authorization will not be checked as llama-swap is default-allow
# - each key is a non-empty string
apiKeys:
- "sk-hunter2"
# tip, one liner: printf "sk-%s\n" "$(head -c 48 /dev/urandom | base64 )"
- "sk-gyCPiKUcIfPlaM4OSMZekkprgijPx6+OsmQs8Rsg0xZ9qpy6gKWsIKqHOk+cgXVx"
# use environment variable macros to keep secrets out of the config
- "${env.API_KEY_1}"
- "${env.API_KEY_2}"
# models: a dictionary of model configurations
# - required
# - each key is the model's ID, used in API requests
@@ -52,9 +130,16 @@ macros:
# - the model's ID is available in the ${MODEL_ID} macro, also available in macros defined above
# - below are examples of the all the settings a model can have
models:
# keys are the model names used in API requests
"llama":
"gpt-oss-120b":
# macros: a dictionary of string substitutions specific to this model
# - optional, default: empty dictionary
# - macros defined here override macros defined in the global macros section
# - model level macros follow the same rules as global macros
macros:
"default_ctx": 16384
"temp": 0.7
# cmd: the command to run to start the inference server.
# - required
# - it is just a string, similar to what you would run on the CLI
@@ -63,19 +148,21 @@ models:
cmd: |
# ${latest-llama} is a macro that is defined above
${latest-llama}
--model path/to/llama-8B-Q4_K_M.gguf
--model path/to/gpt-oss-120B.gguf
--ctx-size ${default_ctx}
--temperature ${temp}
# name: a display name for the model
# - optional, default: empty string
# - if set, it will be used in the v1/models API response
# - if not set, it will be omitted in the JSON model record
name: "llama 3.1 8B"
name: "gpt-oss 120B"
# description: a description for the model
# - optional, default: empty string
# - if set, it will be used in the v1/models API response
# - if not set, it will be omitted in the JSON model record
description: "A small but capable model used for quick testing"
description: "A thinking model from OpenAI"
# env: define an array of environment variables to inject into cmd's environment
# - optional, default: empty array
@@ -90,14 +177,6 @@ models:
# - if you use a custom port in cmd this *must* be set
proxy: http://127.0.0.1:8999
# aliases: alternative model names that this model configuration is used for
# - optional, default: empty array
# - aliases must be unique globally
# - useful for impersonating a specific model
aliases:
- "gpt-4o-mini"
- "gpt-3.5-turbo"
# checkEndpoint: URL path to check if the server is ready
# - optional, default: /health
# - endpoint is expected to return an HTTP 200 response
@@ -106,8 +185,10 @@ models:
checkEndpoint: /custom-endpoint
# ttl: automatically unload the model after ttl seconds
# - optional, default: 0
# - ttl values must be a value greater than 0
# - optional, default: -1 (use global default)
# - ttl values must be a value greater than or equal to 0
# - a ttl of -1 will use the global TTL value as the default
# - a ttl of 0 will mean never unload
# - a value of 0 disables automatic unloading of the model
ttl: 60
@@ -115,19 +196,80 @@ models:
# - optional, default: ""
# - useful for when the upstream server expects a specific model name that
# is different from the model's ID
useModelName: "qwen:qwq"
useModelName: "openai/gpt-oss-120B"
# filters: a dictionary of filter settings
# - optional, default: empty dictionary
# - only strip_params is currently supported
# - same capabilities as peer filters (stripParams, setParams)
filters:
# strip_params: a comma separated list of parameters to remove from the request
# stripParams: a comma separated list of parameters to remove from the request
# - optional, default: ""
# - useful for server side enforcement of sampling parameters
# - the `model` parameter can never be removed
# - can be any JSON key in the request body
# - recommended to stick to sampling parameters
strip_params: "temperature, top_p, top_k"
stripParams: "temperature, top_p, top_k"
# setParams: a dictionary of parameters to set/override in requests
# - optional, default: empty dictionary
# - useful for enforcing specific parameter values
# - protected params like "model" cannot be overridden
# - values can be strings, numbers, booleans, arrays, or objects
# - always runs for the model
setParams:
# Example: enforce specific sampling parameters
temperature: 0.7
top_p: 0.9
# setParamsByID: a dictionary of parameters to set based the model ID
# - optional, default: empty dictionary
# - combine with aliases to create variant behaviour without reloading the model
# - parameters are set in the request body JSON
# - run after setParams so it will override any settings
# - protected params like "model" cannot be overridden
# - values can be strings, numbers, booleans, arrays, or objects
# - model aliases will be automatically created for each key
setParamsByID:
"${MODEL_ID}":
chat_template_kwargs:
reasoning_effort: medium
"${MODEL_ID}:high":
chat_template_kwargs:
reasoning_effort: high
"${MODEL_ID}:low":
chat_template_kwargs:
reasoning_effort: low
# aliases: alternative model names that this model configuration is used for
# - optional, default: empty array
# - aliases must be unique globally
# - useful for impersonating a specific model
aliases:
- "gpt-4o-mini"
# metadata: a dictionary of arbitrary values that are included in /v1/models
# - optional, default: empty dictionary
# - while metadata can contains complex types it is recommended to keep it simple
# - metadata is only passed through in /v1/models responses
metadata:
# port will remain an integer
port: ${PORT}
# the ${temp} macro will remain a float
temperature: ${temp}
note: "The ${MODEL_ID} is running on port ${PORT} temp=${temp}, context=${default_ctx}"
a_list:
- 1
- 1.23
- "macros are OK in list and dictionary types: ${MODEL_ID}"
an_obj:
a: "1"
b: 2
# objects can contain complex types with macro substitution
# becomes: c: [0.7, false, "model: llama"]
c: ["${temp}", false, "model: ${MODEL_ID}"]
# concurrencyLimit: overrides the allowed number of active parallel requests to a model
# - optional, default: 0
@@ -138,6 +280,10 @@ models:
# - recommended to be omitted and the default used
concurrencyLimit: 0
# sendLoadingState: overrides the global sendLoadingState setting for this model
# - optional, default: undefined (use global setting)
sendLoadingState: false
# Unlisted model example:
"qwen-unlisted":
# unlisted: boolean, true or false
@@ -240,10 +386,63 @@ hooks:
# - optional, default: empty dictionary
# - the only supported action is preload
on_startup:
# preload: a list of model ids to load on startup
# - optional, default: empty list
# - model names must match keys in the models sections
# - when preloading multiple models at once, define a group
# otherwise models will be loaded and swapped out
# preload: a list of model ids to load on startup
# - optional, default: empty list
# - model names must match keys in the models sections
# - when preloading multiple models at once, define a group
# otherwise models will be loaded and swapped out
preload:
- "llama"
- "llama"
# peers: a dictionary of remote peers and models they provide
# - optional, default empty dictionary
# - peers can be another llama-swap
# - peers can be any server that provides the /v1/ generative api endpoints supported by llama-swap
peers:
# keys is the peer'd ID
llama-swap-peer:
# proxy: a valid base URL to proxy requests to
# - required
# - requested path to llama-swap will be appended to the end of the proxy value
proxy: http://192.168.1.23
# models: a list of models served by the peer
# - required
models:
- model_a
- model_b
- embeddings/model_c
openrouter:
proxy: https://openrouter.ai/api
# apiKey: a string key to be injected into the request
# - optional, default: ""
# - if blank, no key will be added to the request
# - key will be injected into headers: Authorization: Bearer <key> and x-api-key: <key>
# - can be a string or a macro
apiKey: ${env.OPENROUTER_API_KEY}
models:
- meta-llama/llama-3.1-8b-instruct
- qwen/qwen3-235b-a22b-2507
- deepseek/deepseek-v3.2
- z-ai/glm-4.7
- moonshotai/kimi-k2-0905
- minimax/minimax-m2.1
# filters: a dictionary of filter settings for peer requests
# - optional, default: empty dictionary
# - same capabilities as model filters (stripParams, setParams)
filters:
# stripParams: a comma separated list of parameters to remove from the request
# - optional, default: ""
# - useful for removing parameters that the peer doesn't support
# - the `model` parameter can never be removed
stripParams: "temperature, top_p"
# setParams: a dictionary of parameters to set/override in requests to this peer
# - optional, default: empty dictionary
# - useful for injecting provider-specific settings like data retention policies
# - protected params like "model" cannot be overridden
# - values can be strings, numbers, booleans, arrays, or objects
setParams:
# Example: enforce zero-data-retention for OpenRouter
provider:
data_collection: "deny"
zdr: true
+139 -30
View File
@@ -1,55 +1,164 @@
#!/bin/bash
set -euo pipefail
cd $(dirname "$0")
# use this to test locally, example:
# GITHUB_TOKEN=$(gh auth token) LOG_DEBUG=1 DEBUG_ABORT_BUILD=1 ./docker/build-container.sh rocm
# you need read:package scope on the token. Generate a personal access token with
# the scopes: gist, read:org, repo, write:packages
# then: gh auth login (and copy/paste the new token)
LOG_DEBUG=${LOG_DEBUG:-0}
DEBUG_ABORT_BUILD=${DEBUG_ABORT_BUILD:-}
log_debug() {
if [ "$LOG_DEBUG" = "1" ]; then
echo "[DEBUG] $*"
fi
}
log_info() {
echo "[INFO] $*"
}
ARCH=$1
PUSH_IMAGES=${2:-false}
# List of allowed architectures
ALLOWED_ARCHS=("intel" "vulkan" "musa" "cuda" "cpu")
ALLOWED_ARCHS=("intel" "vulkan" "musa" "cuda" "cuda13" "cpu" "rocm")
# Check if ARCH is in the allowed list
if [[ ! " ${ALLOWED_ARCHS[@]} " =~ " ${ARCH} " ]]; then
echo "Error: ARCH must be one of the following: ${ALLOWED_ARCHS[@]}"
log_info "Error: ARCH must be one of the following: ${ALLOWED_ARCHS[@]}"
exit 1
fi
# Check if GITHUB_TOKEN is set and not empty
if [[ -z "$GITHUB_TOKEN" ]]; then
echo "Error: GITHUB_TOKEN is not set or is empty."
if [[ -z "${GITHUB_TOKEN:-}" ]]; then
log_info "Error: GITHUB_TOKEN is not set or is empty."
exit 1
fi
# Set llama.cpp base image, customizable using the BASE_LLAMACPP_IMAGE environment
# variable, this permits testing with forked llama.cpp repositories
BASE_IMAGE=${BASE_LLAMACPP_IMAGE:-ghcr.io/ggml-org/llama.cpp}
SD_IMAGE=${BASE_SDCPP_IMAGE:-ghcr.io/leejet/stable-diffusion.cpp}
# Set llama-swap repository, automatically uses GITHUB_REPOSITORY variable
# to enable easy container builds on forked repos
LS_REPO=${GITHUB_REPOSITORY:-mostlygeek/llama-swap}
# the most recent llama-swap tag
# have to strip out the 'v' due to .tar.gz file naming
LS_VER=$(curl -s https://api.github.com/repos/mostlygeek/llama-swap/releases/latest | jq -r .tag_name | sed 's/v//')
LS_VER=$(curl -s https://api.github.com/repos/${LS_REPO}/releases/latest | jq -r .tag_name | sed 's/v//')
# Fetches the most recent llama.cpp tag matching the given prefix
# Handles pagination to search beyond the first 100 results
# $1 - tag_prefix (e.g., "server" or "server-vulkan")
# Returns: the version number extracted from the tag
fetch_llama_tag() {
local tag_prefix=$1
local page=1
local per_page=100
while true; do
log_debug "Fetching page $page for tag prefix: $tag_prefix"
local response=$(curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
"https://api.github.com/users/ggml-org/packages/container/llama.cpp/versions?per_page=${per_page}&page=${page}")
# Check for API errors
if echo "$response" | jq -e '.message' > /dev/null 2>&1; then
local error_msg=$(echo "$response" | jq -r '.message')
log_info "GitHub API error: $error_msg"
return 1
fi
# Check if response is empty array (no more pages)
if [ "$(echo "$response" | jq 'length')" -eq 0 ]; then
log_debug "No more pages (empty response)"
return 1
fi
# Extract matching tag from this page
local found_tag=$(echo "$response" | jq -r \
".[] | select(.metadata.container.tags[]? | startswith(\"$tag_prefix\")) | .metadata.container.tags[] | select(startswith(\"$tag_prefix\"))" \
| sort -r | head -n1)
if [ -n "$found_tag" ]; then
log_debug "Found tag: $found_tag on page $page"
echo "$found_tag" | awk -F '-' '{print $NF}'
return 0
fi
page=$((page + 1))
# Safety limit to prevent infinite loops
if [ $page -gt 50 ]; then
log_info "Reached pagination safety limit (50 pages)"
return 1
fi
done
}
if [ "$ARCH" == "cpu" ]; then
# cpu only containers just use the latest available
CONTAINER_LATEST="ghcr.io/mostlygeek/llama-swap:cpu"
echo "Building ${CONTAINER_LATEST} $LS_VER"
docker build -f llama-swap.Containerfile --build-arg BASE_TAG=server --build-arg LS_VER=${LS_VER} -t ${CONTAINER_LATEST} .
if [ "$PUSH_IMAGES" == "true" ]; then
docker push ${CONTAINER_LATEST}
fi
LCPP_TAG=$(fetch_llama_tag "server")
BASE_TAG=server-${LCPP_TAG}
else
LCPP_TAG=$(curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
"https://api.github.com/users/ggml-org/packages/container/llama.cpp/versions" \
| jq -r --arg arch "$ARCH" '.[] | select(.metadata.container.tags[] | startswith("server-\($arch)")) | .metadata.container.tags[]' \
| sort -r | head -n1 | awk -F '-' '{print $3}')
LCPP_TAG=$(fetch_llama_tag "server-${ARCH}")
BASE_TAG=server-${ARCH}-${LCPP_TAG}
fi
# Abort if LCPP_TAG is empty.
if [[ -z "$LCPP_TAG" ]]; then
echo "Abort: Could not find llama-server container for arch: $ARCH"
exit 1
fi
SD_TAG=master-${ARCH}
CONTAINER_TAG="ghcr.io/mostlygeek/llama-swap:v${LS_VER}-${ARCH}-${LCPP_TAG}"
CONTAINER_LATEST="ghcr.io/mostlygeek/llama-swap:${ARCH}"
echo "Building ${CONTAINER_TAG} $LS_VER"
docker build -f llama-swap.Containerfile --build-arg BASE_TAG=server-${ARCH}-${LCPP_TAG} --build-arg LS_VER=${LS_VER} -t ${CONTAINER_TAG} -t ${CONTAINER_LATEST} .
if [ "$PUSH_IMAGES" == "true" ]; then
docker push ${CONTAINER_TAG}
docker push ${CONTAINER_LATEST}
fi
fi
# Abort if LCPP_TAG is empty.
if [[ -z "$LCPP_TAG" ]]; then
log_info "Abort: Could not find llama-server container for arch: $ARCH"
exit 1
else
log_info "LCPP_TAG: $LCPP_TAG"
fi
if [[ ! -z "$DEBUG_ABORT_BUILD" ]]; then
log_info "Abort: DEBUG_ABORT_BUILD set"
exit 0
fi
for CONTAINER_TYPE in non-root root; do
CONTAINER_TAG="ghcr.io/${LS_REPO}:v${LS_VER}-${ARCH}-${LCPP_TAG}"
CONTAINER_LATEST="ghcr.io/${LS_REPO}:${ARCH}"
USER_UID=0
USER_GID=0
USER_HOME=/root
if [ "$CONTAINER_TYPE" == "non-root" ]; then
CONTAINER_TAG="${CONTAINER_TAG}-non-root"
CONTAINER_LATEST="${CONTAINER_LATEST}-non-root"
USER_UID=10001
USER_GID=10001
USER_HOME=/app
fi
log_info "Building $CONTAINER_TYPE $CONTAINER_TAG $LS_VER"
docker build --provenance=false -f llama-swap.Containerfile --build-arg BASE_TAG=${BASE_TAG} --build-arg LS_VER=${LS_VER} --build-arg UID=${USER_UID} \
--build-arg LS_REPO=${LS_REPO} --build-arg GID=${USER_GID} --build-arg USER_HOME=${USER_HOME} -t ${CONTAINER_TAG} -t ${CONTAINER_LATEST} \
--build-arg BASE_IMAGE=${BASE_IMAGE} .
# For architectures with stable-diffusion.cpp support, layer sd-server on top
case "$ARCH" in
"musa" | "vulkan")
log_info "Adding sd-server to $CONTAINER_TAG"
docker build --provenance=false -f llama-swap-sd.Containerfile \
--build-arg BASE=${CONTAINER_TAG} \
--build-arg SD_IMAGE=${SD_IMAGE} --build-arg SD_TAG=${SD_TAG} \
--build-arg UID=${USER_UID} --build-arg GID=${USER_GID} \
-t ${CONTAINER_TAG} -t ${CONTAINER_LATEST} . ;;
esac
if [ "$PUSH_IMAGES" == "true" ]; then
docker push ${CONTAINER_TAG}
docker push ${CONTAINER_LATEST}
fi
done
+16 -1
View File
@@ -15,4 +15,19 @@ models:
cmd: >
/app/llama-server
-hf bartowski/SmolLM2-135M-Instruct-GGUF:Q4_K_M
--port 9999
--port 9999
z-image:
checkEndpoint: /
cmd: |
/app/sd-server
--listen-port 9999
--diffusion-fa
--diffusion-model /models/z_image_turbo-Q8_0.gguf
--vae /models/ae.safetensors
--llm /models/qwen3-4b-instruct-2507-q8_0.gguf
--offload-to-cpu
--cfg-scale 1.0
--height 512 --width 512
--steps 8
aliases: [gpt-image-1,dall-e-2,dall-e-3,gpt-image-1-mini,gpt-image-1.5]
+11
View File
@@ -0,0 +1,11 @@
ARG SD_IMAGE=ghcr.io/leejet/stable-diffusion.cpp
ARG SD_TAG=master-vulkan
ARG BASE=llama-swap:latest
FROM ${SD_IMAGE}:${SD_TAG} AS sd-source
FROM ${BASE}
ARG UID=10001
ARG GID=10001
COPY --from=sd-source --chown=${UID}:${GID} /sd-server /app/sd-server
+36 -8
View File
@@ -1,16 +1,44 @@
ARG BASE_IMAGE=ghcr.io/ggml-org/llama.cpp
ARG BASE_TAG=server-cuda
FROM ghcr.io/ggml-org/llama.cpp:${BASE_TAG}
FROM ${BASE_IMAGE}:${BASE_TAG}
# has to be after the FROM
ARG LS_VER=89
ARG LS_VER=170
ARG LS_REPO=mostlygeek/llama-swap
# Set default UID/GID arguments
ARG UID=10001
ARG GID=10001
ARG USER_HOME=/app
# Add user/group
ENV HOME=$USER_HOME
RUN if [ $UID -ne 0 ]; then \
if [ $GID -ne 0 ]; then \
groupadd --system --gid $GID app; \
fi; \
useradd --system --uid $UID --gid $GID \
--home $USER_HOME app; \
fi
# Handle paths
RUN mkdir --parents $HOME /app
RUN chown --recursive $UID:$GID $HOME /app
# Switch user
USER $UID:$GID
WORKDIR /app
RUN \
curl -LO https://github.com/mostlygeek/llama-swap/releases/download/v"${LS_VER}"/llama-swap_"${LS_VER}"_linux_amd64.tar.gz && \
tar -zxf llama-swap_"${LS_VER}"_linux_amd64.tar.gz && \
rm llama-swap_"${LS_VER}"_linux_amd64.tar.gz
COPY config.example.yaml /app/config.yaml
# Add /app to PATH
ENV PATH="/app:${PATH}"
RUN \
curl -LO "https://github.com/${LS_REPO}/releases/download/v${LS_VER}/llama-swap_${LS_VER}_linux_amd64.tar.gz" && \
tar -zxf "llama-swap_${LS_VER}_linux_amd64.tar.gz" && \
rm "llama-swap_${LS_VER}_linux_amd64.tar.gz"
COPY --chown=$UID:$GID config.example.yaml /app/config.yaml
HEALTHCHECK CMD curl -f http://localhost:8080/ || exit 1
ENTRYPOINT [ "/app/llama-swap", "-config", "/app/config.yaml" ]
ENTRYPOINT [ "/app/llama-swap", "-config", "/app/config.yaml" ]

Before

Width:  |  Height:  |  Size: 261 KiB

After

Width:  |  Height:  |  Size: 261 KiB

Before

Width:  |  Height:  |  Size: 351 KiB

After

Width:  |  Height:  |  Size: 351 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 198 KiB

+467
View File
@@ -0,0 +1,467 @@
# config.yaml
llama-swap is designed to be very simple: one binary, one configuration file.
## minimal viable config
```yaml
models:
model1:
cmd: llama-server --port ${PORT} --model /path/to/model.gguf
```
This is enough to launch `llama-server` to serve `model1`. Of course, llama-swap is about making it possible to serve many models:
```yaml
models:
model1:
cmd: llama-server --port ${PORT} -m /path/to/model.gguf
model2:
cmd: llama-server --port ${PORT} -m /path/to/another_model.gguf
model3:
cmd: llama-server --port ${PORT} -m /path/to/third_model.gguf
```
With this configuration models will be hot swapped and loaded on demand. The special `${PORT}` macro provides a unique port per model. Useful if you want to run multiple models at the same time with the `groups` feature.
## Advanced control with `cmd`
llama-swap is also about customizability. You can use any CLI flag available:
```yaml
models:
model1:
cmd: | # support for multi-line
llama-server --PORT ${PORT} -m /path/to/model.gguf
--ctx-size 8192
--jinja
--cache-type-k q8_0
--cache-type-v q8_0
```
## Support for any OpenAI API compatible server
llama-swap supports any OpenAI API compatible server. If you can run it on the CLI llama-swap will be able to manage it. Even if it's run in Docker or Podman containers.
```yaml
models:
"Q3-30B-CODER-VLLM":
name: "Qwen3 30B Coder vllm AWQ (Q3-30B-CODER-VLLM)"
# cmdStop provides a reliable way to stop containers
cmdStop: docker stop vllm-coder
cmd: |
docker run --init --rm --name vllm-coder
--runtime=nvidia --gpus '"device=2,3"'
--shm-size=16g
-v /mnt/nvme/vllm-cache:/root/.cache
-v /mnt/ssd-extra/models:/models -p ${PORT}:8000
vllm/vllm-openai:v0.10.0
--model "/models/cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ"
--served-model-name "Q3-30B-CODER-VLLM"
--enable-expert-parallel
--swap-space 16
--max-num-seqs 512
--max-model-len 65536
--max-seq-len-to-capture 65536
--gpu-memory-utilization 0.9
--tensor-parallel-size 2
--trust-remote-code
```
## Many more features..
llama-swap supports many more features to customize how you want to manage your environment.
| Feature | Description |
| --------- | ---------------------------------------------- |
| `ttl` | automatic unloading of models after a timeout |
| `macros` | reusable snippets to use in configurations |
| `groups` | run multiple models at a time |
| `hooks` | event driven functionality |
| `env` | define environment variables per model |
| `aliases` | serve a model with different names |
| `filters` | modify requests before sending to the upstream |
| `...` | And many more tweaks |
## Full Configuration Example
> [!NOTE]
> Always check [config.example.yaml](https://github.com/mostlygeek/llama-swap/blob/main/config.example.yaml) for the most up to date reference for all example configurations.
```yaml
# add this modeline for validation in vscode
# yaml-language-server: $schema=https://raw.githubusercontent.com/mostlygeek/llama-swap/refs/heads/main/config-schema.json
#
# llama-swap YAML configuration example
# -------------------------------------
#
# 💡 Tip - Use an LLM with this file!
# ====================================
# This example configuration is written to be LLM friendly. Try
# copying this file into an LLM and asking it to explain or generate
# sections for you.
# ====================================
# Usage notes:
# - Below are all the available configuration options for llama-swap.
# - Settings noted as "required" must be in your configuration file
# - Settings noted as "optional" can be omitted
# healthCheckTimeout: number of seconds to wait for a model to be ready to serve requests
# - optional, default: 120
# - minimum value is 15 seconds, anything less will be set to this value
healthCheckTimeout: 500
# logLevel: sets the logging value
# - optional, default: info
# - Valid log levels: debug, info, warn, error
logLevel: info
# logTimeFormat: enables and sets the logging timestamp format
# - optional, default (disabled): ""
# - Valid values: "", "ansic", "unixdate", "rubydate", "rfc822", "rfc822z",
# "rfc850", "rfc1123", "rfc1123z", "rfc3339", "rfc3339nano", "kitchen",
# "stamp", "stampmilli", "stampmicro", and "stampnano".
# - For more info, read: https://pkg.go.dev/time#pkg-constants
logTimeFormat: ""
# logToStdout: controls what is logged to stdout
# - optional, default: "proxy"
# - valid values:
# - "proxy": logs generated by llama-swap when swapping models,
# handling requests, etc.
# - "upstream": a copy of an upstream processes stdout logs
# - "both": both the proxy and upstream logs interleaved together
# - "none": no logs are ever written to stdout
logToStdout: "proxy"
# metricsMaxInMemory: maximum number of metrics to keep in memory
# - optional, default: 1000
# - controls how many metrics are stored in memory before older ones are discarded
# - useful for limiting memory usage when processing large volumes of metrics
metricsMaxInMemory: 1000
# startPort: sets the starting port number for the automatic ${PORT} macro.
# - optional, default: 5800
# - the ${PORT} macro can be used in model.cmd and model.proxy settings
# - it is automatically incremented for every model that uses it
startPort: 10001
# sendLoadingState: inject loading status updates into the reasoning (thinking)
# field
# - optional, default: false
# - when true, a stream of loading messages will be sent to the client in the
# reasoning field so chat UIs can show that loading is in progress.
# - see #366 for more details
sendLoadingState: true
# includeAliasesInList: present aliases within the /v1/models OpenAI API listing
# - optional, default: false
# - when true, model aliases will be output to the API model listing duplicating
# all fields except for Id so chat UIs can use the alias equivalent to the original.
includeAliasesInList: false
# apiKeys: require an API key when making requests to inference endpoints
# - optional, default: []
# - when empty (the default) authorization will not be checked as llama-swap is default-allow
# - each key is a non-empty string
apiKeys:
- "sk-hunter2"
# hint, one liner: printf "sk-%s\n" "$(head -c 48 /dev/urandom | base64 )"
- "sk-gyCPiKUcIfPlaM4OSMZekkprgijPx6+OsmQs8Rsg0xZ9qpy6gKWsIKqHOk+cgXVx"
- "sk-+QtIn0Zjj4UHjiaZYiZEnru4mrwKM9RzhmJeK5SobNXLl8QMFXxGz1/2lEuvQpkb"
# macros: a dictionary of string substitutions
# - optional, default: empty dictionary
# - macros are reusable snippets
# - used in a model's cmd, cmdStop, proxy, checkEndpoint, filters.stripParams
# - useful for reducing common configuration settings
# - macro names are strings and must be less than 64 characters
# - macro names must match the regex ^[a-zA-Z0-9_-]+$
# - macro names must not be a reserved name: PORT or MODEL_ID
# - macro values can be numbers, bools, or strings
# - macros can contain other macros, but they must be defined before they are used
macros:
# Example of a multi-line macro
"latest-llama": >
/path/to/llama-server/llama-server-ec9e0301
--port ${PORT}
"default_ctx": 4096
# Example of macro-in-macro usage. macros can contain other macros
# but they must be previously declared.
"default_args": "--ctx-size ${default_ctx}"
# models: a dictionary of model configurations
# - required
# - each key is the model's ID, used in API requests
# - model settings have default values that are used if they are not defined here
# - the model's ID is available in the ${MODEL_ID} macro, also available in macros defined above
# - below are examples of the all the settings a model can have
models:
# keys are the model names used in API requests
"llama":
# macros: a dictionary of string substitutions specific to this model
# - optional, default: empty dictionary
# - macros defined here override macros defined in the global macros section
# - model level macros follow the same rules as global macros
macros:
"default_ctx": 16384
"temp": 0.7
# cmd: the command to run to start the inference server.
# - required
# - it is just a string, similar to what you would run on the CLI
# - using `|` allows for comments in the command, these will be parsed out
# - macros can be used within cmd
cmd: |
# ${latest-llama} is a macro that is defined above
${latest-llama}
--model path/to/llama-8B-Q4_K_M.gguf
--ctx-size ${default_ctx}
--temperature ${temp}
# name: a display name for the model
# - optional, default: empty string
# - if set, it will be used in the v1/models API response
# - if not set, it will be omitted in the JSON model record
name: "llama 3.1 8B"
# description: a description for the model
# - optional, default: empty string
# - if set, it will be used in the v1/models API response
# - if not set, it will be omitted in the JSON model record
description: "A small but capable model used for quick testing"
# env: define an array of environment variables to inject into cmd's environment
# - optional, default: empty array
# - each value is a single string
# - in the format: ENV_NAME=value
env:
- "CUDA_VISIBLE_DEVICES=0,1,2"
# proxy: the URL where llama-swap routes API requests
# - optional, default: http://localhost:${PORT}
# - if you used ${PORT} in cmd this can be omitted
# - if you use a custom port in cmd this *must* be set
proxy: http://127.0.0.1:8999
# aliases: alternative model names that this model configuration is used for
# - optional, default: empty array
# - aliases must be unique globally
# - useful for impersonating a specific model
aliases:
- "gpt-4o-mini"
- "gpt-3.5-turbo"
# checkEndpoint: URL path to check if the server is ready
# - optional, default: /health
# - endpoint is expected to return an HTTP 200 response
# - all requests wait until the endpoint is ready or fails
# - use "none" to skip endpoint health checking
checkEndpoint: /custom-endpoint
# ttl: automatically unload the model after ttl seconds
# - optional, default: 0
# - ttl values must be a value greater than 0
# - a value of 0 disables automatic unloading of the model
ttl: 60
# useModelName: override the model name that is sent to upstream server
# - optional, default: ""
# - useful for when the upstream server expects a specific model name that
# is different from the model's ID
useModelName: "qwen:qwq"
# filters: a dictionary of filter settings
# - optional, default: empty dictionary
# - only stripParams is currently supported
filters:
# stripParams: a comma separated list of parameters to remove from the request
# - optional, default: ""
# - useful for server side enforcement of sampling parameters
# - the `model` parameter can never be removed
# - can be any JSON key in the request body
# - recommended to stick to sampling parameters
stripParams: "temperature, top_p, top_k"
# metadata: a dictionary of arbitrary values that are included in /v1/models
# - optional, default: empty dictionary
# - while metadata can contains complex types it is recommended to keep it simple
# - metadata is only passed through in /v1/models responses
metadata:
# port will remain an integer
port: ${PORT}
# the ${temp} macro will remain a float
temperature: ${temp}
note: "The ${MODEL_ID} is running on port ${PORT} temp=${temp}, context=${default_ctx}"
a_list:
- 1
- 1.23
- "macros are OK in list and dictionary types: ${MODEL_ID}"
an_obj:
a: "1"
b: 2
# objects can contain complex types with macro substitution
# becomes: c: [0.7, false, "model: llama"]
c: ["${temp}", false, "model: ${MODEL_ID}"]
# concurrencyLimit: overrides the allowed number of active parallel requests to a model
# - optional, default: 0
# - useful for limiting the number of active parallel requests a model can process
# - must be set per model
# - any number greater than 0 will override the internal default value of 10
# - any requests that exceeds the limit will receive an HTTP 429 Too Many Requests response
# - recommended to be omitted and the default used
concurrencyLimit: 0
# sendLoadingState: overrides the global sendLoadingState setting for this model
# - optional, default: undefined (use global setting)
sendLoadingState: false
# Unlisted model example:
"qwen-unlisted":
# unlisted: boolean, true or false
# - optional, default: false
# - unlisted models do not show up in /v1/models api requests
# - can be requested as normal through all apis
unlisted: true
cmd: llama-server --port ${PORT} -m Llama-3.2-1B-Instruct-Q4_K_M.gguf -ngl 0
# Docker example:
# container runtimes like Docker and Podman can be used reliably with
# a combination of cmd, cmdStop, and ${MODEL_ID}
"docker-llama":
proxy: "http://127.0.0.1:${PORT}"
cmd: |
docker run --name ${MODEL_ID}
--init --rm -p ${PORT}:8080 -v /mnt/nvme/models:/models
ghcr.io/ggml-org/llama.cpp:server
--model '/models/Qwen2.5-Coder-0.5B-Instruct-Q4_K_M.gguf'
# cmdStop: command to run to stop the model gracefully
# - optional, default: ""
# - useful for stopping commands managed by another system
# - the upstream's process id is available in the ${PID} macro
#
# When empty, llama-swap has this default behaviour:
# - on POSIX systems: a SIGTERM signal is sent
# - on Windows, calls taskkill to stop the process
# - processes have 5 seconds to shutdown until forceful termination is attempted
cmdStop: docker stop ${MODEL_ID}
# groups: a dictionary of group settings
# - optional, default: empty dictionary
# - provides advanced controls over model swapping behaviour
# - using groups some models can be kept loaded indefinitely, while others are swapped out
# - model IDs must be defined in the Models section
# - a model can only be a member of one group
# - group behaviour is controlled via the `swap`, `exclusive` and `persistent` fields
# - see issue #109 for details
#
# NOTE: the example below uses model names that are not defined above for demonstration purposes
groups:
# group1 works the same as the default behaviour of llama-swap where only one model is allowed
# to run a time across the whole llama-swap instance
"group1":
# swap: controls the model swapping behaviour in within the group
# - optional, default: true
# - true : only one model is allowed to run at a time
# - false: all models can run together, no swapping
swap: true
# exclusive: controls how the group affects other groups
# - optional, default: true
# - true: causes all other groups to unload when this group runs a model
# - false: does not affect other groups
exclusive: true
# members references the models defined above
# required
members:
- "llama"
- "qwen-unlisted"
# Example:
# - in group2 all models can run at the same time
# - when a different group is loaded it causes all running models in this group to unload
"group2":
swap: false
# exclusive: false does not unload other groups when a model in group2 is requested
# - the models in group2 will be loaded but will not unload any other groups
exclusive: false
members:
- "docker-llama"
- "modelA"
- "modelB"
# Example:
# - a persistent group, prevents other groups from unloading it
"forever":
# persistent: prevents over groups from unloading the models in this group
# - optional, default: false
# - does not affect individual model behaviour
persistent: true
# set swap/exclusive to false to prevent swapping inside the group
# and the unloading of other groups
swap: false
exclusive: false
members:
- "forever-modelA"
- "forever-modelB"
- "forever-modelc"
# hooks: a dictionary of event triggers and actions
# - optional, default: empty dictionary
# - the only supported hook is on_startup
hooks:
# on_startup: a dictionary of actions to perform on startup
# - optional, default: empty dictionary
# - the only supported action is preload
on_startup:
# preload: a list of model ids to load on startup
# - optional, default: empty list
# - model names must match keys in the models sections
# - when preloading multiple models at once, define a group
# otherwise models will be loaded and swapped out
preload:
- "llama"
# peers: a dictionary of remote peers and models they provide
# - optional, default empty dictionary
# - peers can be another llama-swap
# - peers can be any server that provides the /v1/ generative api endpoints supported by llama-swap
peers:
# keys is the peer'd ID
llama-swap-peer:
# proxy: a valid base URL to proxy requests to
# - required
# - requested path to llama-swap will be appended to the end of the proxy value
proxy: http://192.168.1.23
# models: a list of models served by the peer
# - required
models:
- model_a
- model_b
- embeddings/model_c
openrouter:
proxy: https://openrouter.ai/api
# apiKey: a string key to be injected into the request
# - optional, default: ""
# - if blank, no key will be added to the request
# - key will be injected into headers: Authorization: Bearer <key> and x-api-key: <key>
apiKey: sk-your-openrouter-key
models:
- meta-llama/llama-3.1-8b-instruct
- qwen/qwen3-235b-a22b-2507
- deepseek/deepseek-v3.2
- z-ai/glm-4.7
- moonshotai/kimi-k2-0905
- minimax/minimax-m2.1
```
+9
View File
@@ -0,0 +1,9 @@
## Container Security
For convenience, the default container images use the **root** user within the container. This permits simplified access to host resources including volume mounts and hardware devices under `/dev/dri` (_for Vulkan support_). But this can widen the attack surface to privilege escalation exploits.
Alternative images, tagged as `non-root`, are also available. For example, `llama-swap:cpu-non-root` uses the unprivileged **app** user by default. Depending on deployment requirements, additional configuration may be necessary to ensure that the container retains access to required hosts resources. This might entail customizing host filesystem permissions/ownership appropriately or injecting host group membership into the container.
Docker offers a [system-wide option enabling user namespace remapping](https://docs.docker.com/engine/security/userns-remap/) to accommodate situations were a **root** container user is required but also mentions that _"The best way to prevent privilege-escalation attacks from within a container is to configure your container's applications to run as unprivileged users."_ Podman offers similar capability, per-container, to [set UID/GID mapping in a new user namespace](https://docs.podman.io/en/latest/markdown/podman-run.1.html#set-uid-gid-mapping-in-a-new-user-namespace).
The Large Language Model (_LLM/AI_) ecosystem is rapidly evolving and [serious security vulnerabilities have surfaced in the past](https://huggingface.co/docs/hub/security-pickle). These alternative _non-root_ images could reduce the impact of future unknown problems. However, proper planning and configuration is recommended to utilize them.
+5 -5
View File
@@ -1,6 +1,6 @@
module github.com/mostlygeek/llama-swap
go 1.23.0
go 1.25.4
require (
github.com/billziss-gh/golib v0.2.0
@@ -37,9 +37,9 @@ require (
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
github.com/ugorji/go/codec v1.2.12 // indirect
golang.org/x/arch v0.8.0 // indirect
golang.org/x/crypto v0.36.0 // indirect
golang.org/x/net v0.38.0 // indirect
golang.org/x/sys v0.31.0 // indirect
golang.org/x/text v0.23.0 // indirect
golang.org/x/crypto v0.45.0 // indirect
golang.org/x/net v0.47.0 // indirect
golang.org/x/sys v0.38.0 // indirect
golang.org/x/text v0.31.0 // indirect
google.golang.org/protobuf v1.34.1 // indirect
)
+8 -8
View File
@@ -80,16 +80,16 @@ github.com/ugorji/go/codec v1.2.12/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZ
golang.org/x/arch v0.0.0-20210923205945-b76863e36670/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
golang.org/x/arch v0.8.0 h1:3wRIsP3pM4yUptoR96otTUOXI367OS0+c9eeRi9doIc=
golang.org/x/arch v0.8.0/go.mod h1:FEVrYAQjsQXMVJ1nsMoVVXPZg6p2JE2mx8psSWTDQys=
golang.org/x/crypto v0.36.0 h1:AnAEvhDddvBdpY+uR+MyHmuZzzNqXSe/GvuDeob5L34=
golang.org/x/crypto v0.36.0/go.mod h1:Y4J0ReaxCR1IMaabaSMugxJES1EpwhBHhv2bDHklZvc=
golang.org/x/net v0.38.0 h1:vRMAPTMaeGqVhG5QyLJHqNDwecKTomGeqbnfZyKlBI8=
golang.org/x/net v0.38.0/go.mod h1:ivrbrMbzFq5J41QOQh0siUuly180yBYtLp+CKbEaFx8=
golang.org/x/crypto v0.45.0 h1:jMBrvKuj23MTlT0bQEOBcAE0mjg8mK9RXFhRH6nyF3Q=
golang.org/x/crypto v0.45.0/go.mod h1:XTGrrkGJve7CYK7J8PEww4aY7gM3qMCElcJQ8n8JdX4=
golang.org/x/net v0.47.0 h1:Mx+4dIFzqraBXUugkia1OOvlD6LemFo1ALMHjrXDOhY=
golang.org/x/net v0.47.0/go.mod h1:/jNxtkgq5yWUGYkaZGqo27cfGZ1c5Nen03aYrrKpVRU=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.31.0 h1:ioabZlmFYtWhL+TRYpcnNlLwhyxaM9kWTDEmfnprqik=
golang.org/x/sys v0.31.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
golang.org/x/text v0.23.0 h1:D71I7dUrlY+VX0gQShAThNGHFxZ13dGLBHQLVl1mJlY=
golang.org/x/text v0.23.0/go.mod h1:/BLNzu4aZCJ1+kcD0DNRotWKage4q2rGVAg4o22unh4=
golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc=
golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/text v0.31.0 h1:aC8ghyu4JhP8VojJ2lEHBnochRno1sgL6nEi9WGFGMM=
golang.org/x/text v0.31.0/go.mod h1:tKRAlv61yKIjGGHX/4tP1LTbc13YSec1pxVEWXzfoeM=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543 h1:E7g+9GITq07hpfrRu66IVDexMakfv52eLZ2CXBWiKr4=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
google.golang.org/protobuf v1.34.1 h1:9ddQBjfCyZPOHPUiPxpYESBLc+T8P3E+Vo4IbKZgFWg=
+40 -9
View File
@@ -16,6 +16,7 @@ import (
"github.com/gin-gonic/gin"
"github.com/mostlygeek/llama-swap/event"
"github.com/mostlygeek/llama-swap/proxy"
"github.com/mostlygeek/llama-swap/proxy/config"
)
var (
@@ -27,7 +28,9 @@ var (
func main() {
// Define a command-line flag for the port
configPath := flag.String("config", "config.yaml", "config file name")
listenStr := flag.String("listen", ":8080", "listen ip/port")
listenStr := flag.String("listen", "", "listen ip/port")
certFile := flag.String("tls-cert-file", "", "TLS certificate file")
keyFile := flag.String("tls-key-file", "", "TLS key file")
showVersion := flag.Bool("version", false, "show version of build")
watchConfig := flag.Bool("watch-config", false, "Automatically reload config file on change")
@@ -38,13 +41,13 @@ func main() {
os.Exit(0)
}
config, err := proxy.LoadConfig(*configPath)
conf, err := config.LoadConfig(*configPath)
if err != nil {
fmt.Printf("Error loading config: %v\n", err)
os.Exit(1)
}
if len(config.Profiles) > 0 {
if len(conf.Profiles) > 0 {
fmt.Println("WARNING: Profile functionality has been removed in favor of Groups. See the README for more information.")
}
@@ -54,6 +57,23 @@ func main() {
gin.SetMode(gin.ReleaseMode)
}
// Validate TLS flags.
var useTLS = (*certFile != "" && *keyFile != "")
if (*certFile != "" && *keyFile == "") ||
(*certFile == "" && *keyFile != "") {
fmt.Println("Error: Both --tls-cert-file and --tls-key-file must be provided for TLS.")
os.Exit(1)
}
// Set default ports.
if *listenStr == "" {
defaultPort := ":8080"
if useTLS {
defaultPort = ":8443"
}
listenStr = &defaultPort
}
// Setup channels for server management
exitChan := make(chan struct{})
sigChan := make(chan os.Signal, 1)
@@ -67,7 +87,7 @@ func main() {
// Support for watching config and reloading when it changes
reloadProxyManager := func() {
if currentPM, ok := srv.Handler.(*proxy.ProxyManager); ok {
config, err = proxy.LoadConfig(*configPath)
conf, err = config.LoadConfig(*configPath)
if err != nil {
fmt.Printf("Warning, unable to reload configuration: %v\n", err)
return
@@ -75,7 +95,9 @@ func main() {
fmt.Println("Configuration Changed")
currentPM.Shutdown()
srv.Handler = proxy.New(config)
newPM := proxy.New(conf)
newPM.SetVersion(date, commit, version)
srv.Handler = newPM
fmt.Println("Configuration Reloaded")
// wait a few seconds and tell any UI to reload
@@ -85,12 +107,14 @@ func main() {
})
})
} else {
config, err = proxy.LoadConfig(*configPath)
conf, err = config.LoadConfig(*configPath)
if err != nil {
fmt.Printf("Error, unable to load configuration: %v\n", err)
os.Exit(1)
}
srv.Handler = proxy.New(config)
newPM := proxy.New(conf)
newPM.SetVersion(date, commit, version)
srv.Handler = newPM
}
}
@@ -166,9 +190,16 @@ func main() {
}()
// Start server
fmt.Printf("llama-swap listening on %s\n", *listenStr)
go func() {
if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
var err error
if useTLS {
fmt.Printf("llama-swap listening with TLS on https://%s\n", *listenStr)
err = srv.ListenAndServeTLS(*certFile, *keyFile)
} else {
fmt.Printf("llama-swap listening on http://%s\n", *listenStr)
err = srv.ListenAndServe()
}
if err != nil && err != http.ErrServerClosed {
log.Fatalf("Fatal server error: %v\n", err)
}
}()
Binary file not shown.

Before

Width:  |  Height:  |  Size: 51 KiB

-460
View File
@@ -1,460 +0,0 @@
package proxy
import (
"fmt"
"io"
"os"
"regexp"
"runtime"
"slices"
"sort"
"strconv"
"strings"
"github.com/billziss-gh/golib/shlex"
"gopkg.in/yaml.v3"
)
const DEFAULT_GROUP_ID = "(default)"
type ModelConfig struct {
Cmd string `yaml:"cmd"`
CmdStop string `yaml:"cmdStop"`
Proxy string `yaml:"proxy"`
Aliases []string `yaml:"aliases"`
Env []string `yaml:"env"`
CheckEndpoint string `yaml:"checkEndpoint"`
UnloadAfter int `yaml:"ttl"`
Unlisted bool `yaml:"unlisted"`
UseModelName string `yaml:"useModelName"`
// #179 for /v1/models
Name string `yaml:"name"`
Description string `yaml:"description"`
// Limit concurrency of HTTP requests to process
ConcurrencyLimit int `yaml:"concurrencyLimit"`
// Model filters see issue #174
Filters ModelFilters `yaml:"filters"`
}
func (m *ModelConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
type rawModelConfig ModelConfig
defaults := rawModelConfig{
Cmd: "",
CmdStop: "",
Proxy: "http://localhost:${PORT}",
Aliases: []string{},
Env: []string{},
CheckEndpoint: "/health",
UnloadAfter: 0,
Unlisted: false,
UseModelName: "",
ConcurrencyLimit: 0,
Name: "",
Description: "",
}
// the default cmdStop to taskkill /f /t /pid ${PID}
if runtime.GOOS == "windows" {
defaults.CmdStop = "taskkill /f /t /pid ${PID}"
}
if err := unmarshal(&defaults); err != nil {
return err
}
*m = ModelConfig(defaults)
return nil
}
func (m *ModelConfig) SanitizedCommand() ([]string, error) {
return SanitizeCommand(m.Cmd)
}
// ModelFilters see issue #174
type ModelFilters struct {
StripParams string `yaml:"strip_params"`
}
func (m *ModelFilters) UnmarshalYAML(unmarshal func(interface{}) error) error {
type rawModelFilters ModelFilters
defaults := rawModelFilters{
StripParams: "",
}
if err := unmarshal(&defaults); err != nil {
return err
}
*m = ModelFilters(defaults)
return nil
}
func (f ModelFilters) SanitizedStripParams() ([]string, error) {
if f.StripParams == "" {
return nil, nil
}
params := strings.Split(f.StripParams, ",")
cleaned := make([]string, 0, len(params))
for _, param := range params {
trimmed := strings.TrimSpace(param)
if trimmed == "model" || trimmed == "" {
continue
}
cleaned = append(cleaned, trimmed)
}
// sort cleaned
slices.Sort(cleaned)
return cleaned, nil
}
type GroupConfig struct {
Swap bool `yaml:"swap"`
Exclusive bool `yaml:"exclusive"`
Persistent bool `yaml:"persistent"`
Members []string `yaml:"members"`
}
// set default values for GroupConfig
func (c *GroupConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
type rawGroupConfig GroupConfig
defaults := rawGroupConfig{
Swap: true,
Exclusive: true,
Persistent: false,
Members: []string{},
}
if err := unmarshal(&defaults); err != nil {
return err
}
*c = GroupConfig(defaults)
return nil
}
type HooksConfig struct {
OnStartup HookOnStartup `yaml:"on_startup"`
}
type HookOnStartup struct {
Preload []string `yaml:"preload"`
}
type Config struct {
HealthCheckTimeout int `yaml:"healthCheckTimeout"`
LogRequests bool `yaml:"logRequests"`
LogLevel string `yaml:"logLevel"`
MetricsMaxInMemory int `yaml:"metricsMaxInMemory"`
Models map[string]ModelConfig `yaml:"models"` /* key is model ID */
Profiles map[string][]string `yaml:"profiles"`
Groups map[string]GroupConfig `yaml:"groups"` /* key is group ID */
// for key/value replacements in model's cmd, cmdStop, proxy, checkEndPoint
Macros map[string]string `yaml:"macros"`
// map aliases to actual model IDs
aliases map[string]string
// automatic port assignments
StartPort int `yaml:"startPort"`
// hooks, see: #209
Hooks HooksConfig `yaml:"hooks"`
}
func (c *Config) RealModelName(search string) (string, bool) {
if _, found := c.Models[search]; found {
return search, true
} else if name, found := c.aliases[search]; found {
return name, found
} else {
return "", false
}
}
func (c *Config) FindConfig(modelName string) (ModelConfig, string, bool) {
if realName, found := c.RealModelName(modelName); !found {
return ModelConfig{}, "", false
} else {
return c.Models[realName], realName, true
}
}
func LoadConfig(path string) (Config, error) {
file, err := os.Open(path)
if err != nil {
return Config{}, err
}
defer file.Close()
return LoadConfigFromReader(file)
}
func LoadConfigFromReader(r io.Reader) (Config, error) {
data, err := io.ReadAll(r)
if err != nil {
return Config{}, err
}
// default configuration values
config := Config{
HealthCheckTimeout: 120,
StartPort: 5800,
LogLevel: "info",
MetricsMaxInMemory: 1000,
}
err = yaml.Unmarshal(data, &config)
if err != nil {
return Config{}, err
}
if config.HealthCheckTimeout < 15 {
// set a minimum of 15 seconds
config.HealthCheckTimeout = 15
}
if config.StartPort < 1 {
return Config{}, fmt.Errorf("startPort must be greater than 1")
}
// Populate the aliases map
config.aliases = make(map[string]string)
for modelName, modelConfig := range config.Models {
for _, alias := range modelConfig.Aliases {
if _, found := config.aliases[alias]; found {
return Config{}, fmt.Errorf("duplicate alias %s found in model: %s", alias, modelName)
}
config.aliases[alias] = modelName
}
}
/* check macro constraint rules:
- name must fit the regex ^[a-zA-Z0-9_-]+$
- names must be less than 64 characters (no reason, just cause)
- name can not be any reserved macros: PORT, MODEL_ID
- macro values must be less than 1024 characters
*/
macroNameRegex := regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
for macroName, macroValue := range config.Macros {
if len(macroName) >= 64 {
return Config{}, fmt.Errorf("macro name '%s' exceeds maximum length of 63 characters", macroName)
}
if !macroNameRegex.MatchString(macroName) {
return Config{}, fmt.Errorf("macro name '%s' contains invalid characters, must match pattern ^[a-zA-Z0-9_-]+$", macroName)
}
if len(macroValue) >= 1024 {
return Config{}, fmt.Errorf("macro value for '%s' exceeds maximum length of 1024 characters", macroName)
}
switch macroName {
case "PORT":
case "MODEL_ID":
return Config{}, fmt.Errorf("macro name '%s' is reserved and cannot be used", macroName)
}
}
// Get and sort all model IDs first, makes testing more consistent
modelIds := make([]string, 0, len(config.Models))
for modelId := range config.Models {
modelIds = append(modelIds, modelId)
}
sort.Strings(modelIds) // This guarantees stable iteration order
nextPort := config.StartPort
for _, modelId := range modelIds {
modelConfig := config.Models[modelId]
// Strip comments from command fields before macro expansion
modelConfig.Cmd = StripComments(modelConfig.Cmd)
modelConfig.CmdStop = StripComments(modelConfig.CmdStop)
// go through model config fields: cmd, cmdStop, proxy, checkEndPoint and replace macros with macro values
for macroName, macroValue := range config.Macros {
macroSlug := fmt.Sprintf("${%s}", macroName)
modelConfig.Cmd = strings.ReplaceAll(modelConfig.Cmd, macroSlug, macroValue)
modelConfig.CmdStop = strings.ReplaceAll(modelConfig.CmdStop, macroSlug, macroValue)
modelConfig.Proxy = strings.ReplaceAll(modelConfig.Proxy, macroSlug, macroValue)
modelConfig.CheckEndpoint = strings.ReplaceAll(modelConfig.CheckEndpoint, macroSlug, macroValue)
modelConfig.Filters.StripParams = strings.ReplaceAll(modelConfig.Filters.StripParams, macroSlug, macroValue)
}
// enforce ${PORT} used in both cmd and proxy
if !strings.Contains(modelConfig.Cmd, "${PORT}") && strings.Contains(modelConfig.Proxy, "${PORT}") {
return Config{}, fmt.Errorf("model %s: proxy uses ${PORT} but cmd does not - ${PORT} is only available when used in cmd", modelId)
}
// only iterate over models that use ${PORT} to keep port numbers from increasing unnecessarily
if strings.Contains(modelConfig.Cmd, "${PORT}") || strings.Contains(modelConfig.Proxy, "${PORT}") || strings.Contains(modelConfig.CmdStop, "${PORT}") {
nextPortStr := strconv.Itoa(nextPort)
modelConfig.Cmd = strings.ReplaceAll(modelConfig.Cmd, "${PORT}", nextPortStr)
modelConfig.CmdStop = strings.ReplaceAll(modelConfig.CmdStop, "${PORT}", nextPortStr)
modelConfig.Proxy = strings.ReplaceAll(modelConfig.Proxy, "${PORT}", nextPortStr)
nextPort++
}
if strings.Contains(modelConfig.Cmd, "${MODEL_ID}") || strings.Contains(modelConfig.CmdStop, "${MODEL_ID}") {
modelConfig.Cmd = strings.ReplaceAll(modelConfig.Cmd, "${MODEL_ID}", modelId)
modelConfig.CmdStop = strings.ReplaceAll(modelConfig.CmdStop, "${MODEL_ID}", modelId)
}
// make sure there are no unknown macros that have not been replaced
macroPattern := regexp.MustCompile(`\$\{([a-zA-Z0-9_-]+)\}`)
fieldMap := map[string]string{
"cmd": modelConfig.Cmd,
"cmdStop": modelConfig.CmdStop,
"proxy": modelConfig.Proxy,
"checkEndpoint": modelConfig.CheckEndpoint,
}
for fieldName, fieldValue := range fieldMap {
matches := macroPattern.FindAllStringSubmatch(fieldValue, -1)
for _, match := range matches {
macroName := match[1]
if macroName == "PID" && fieldName == "cmdStop" {
continue // this is ok, has to be replaced by process later
}
if _, exists := config.Macros[macroName]; !exists {
return Config{}, fmt.Errorf("unknown macro '${%s}' found in %s.%s", macroName, modelId, fieldName)
}
}
}
config.Models[modelId] = modelConfig
}
config = AddDefaultGroupToConfig(config)
// check that members are all unique in the groups
memberUsage := make(map[string]string) // maps member to group it appears in
for groupID, groupConfig := range config.Groups {
prevSet := make(map[string]bool)
for _, member := range groupConfig.Members {
// Check for duplicates within this group
if _, found := prevSet[member]; found {
return Config{}, fmt.Errorf("duplicate model member %s found in group: %s", member, groupID)
}
prevSet[member] = true
// Check if member is used in another group
if existingGroup, exists := memberUsage[member]; exists {
return Config{}, fmt.Errorf("model member %s is used in multiple groups: %s and %s", member, existingGroup, groupID)
}
memberUsage[member] = groupID
}
}
// clean up hooks preload
if len(config.Hooks.OnStartup.Preload) > 0 {
var toPreload []string
for _, modelID := range config.Hooks.OnStartup.Preload {
modelID = strings.TrimSpace(modelID)
if modelID == "" {
continue
}
if real, found := config.RealModelName(modelID); found {
toPreload = append(toPreload, real)
}
}
config.Hooks.OnStartup.Preload = toPreload
}
return config, nil
}
// rewrites the yaml to include a default group with any orphaned models
func AddDefaultGroupToConfig(config Config) Config {
if config.Groups == nil {
config.Groups = make(map[string]GroupConfig)
}
defaultGroup := GroupConfig{
Swap: true,
Exclusive: true,
Members: []string{},
}
// if groups is empty, create a default group and put
// all models into it
if len(config.Groups) == 0 {
for modelName := range config.Models {
defaultGroup.Members = append(defaultGroup.Members, modelName)
}
} else {
// iterate over existing group members and add non-grouped models into the default group
for modelName, _ := range config.Models {
foundModel := false
found:
// search for the model in existing groups
for _, groupConfig := range config.Groups {
for _, member := range groupConfig.Members {
if member == modelName {
foundModel = true
break found
}
}
}
if !foundModel {
defaultGroup.Members = append(defaultGroup.Members, modelName)
}
}
}
sort.Strings(defaultGroup.Members) // make consistent ordering for testing
config.Groups[DEFAULT_GROUP_ID] = defaultGroup
return config
}
func SanitizeCommand(cmdStr string) ([]string, error) {
var cleanedLines []string
for _, line := range strings.Split(cmdStr, "\n") {
trimmed := strings.TrimSpace(line)
// Skip comment lines
if strings.HasPrefix(trimmed, "#") {
continue
}
// Handle trailing backslashes by replacing with space
if strings.HasSuffix(trimmed, "\\") {
cleanedLines = append(cleanedLines, strings.TrimSuffix(trimmed, "\\")+" ")
} else {
cleanedLines = append(cleanedLines, line)
}
}
// put it back together
cmdStr = strings.Join(cleanedLines, "\n")
// Split the command into arguments
var args []string
if runtime.GOOS == "windows" {
args = shlex.Windows.Split(cmdStr)
} else {
args = shlex.Posix.Split(cmdStr)
}
// Ensure the command is not empty
if len(args) == 0 {
return nil, fmt.Errorf("empty command")
}
return args, nil
}
func StripComments(cmdStr string) string {
var cleanedLines []string
for _, line := range strings.Split(cmdStr, "\n") {
trimmed := strings.TrimSpace(line)
// Skip comment lines
if strings.HasPrefix(trimmed, "#") {
continue
}
cleanedLines = append(cleanedLines, line)
}
return strings.Join(cleanedLines, "\n")
}
+795
View File
@@ -0,0 +1,795 @@
package config
import (
"fmt"
"io"
"net/url"
"os"
"regexp"
"runtime"
"sort"
"strings"
"github.com/billziss-gh/golib/shlex"
"gopkg.in/yaml.v3"
)
const DEFAULT_GROUP_ID = "(default)"
const (
LogToStdoutProxy = "proxy"
LogToStdoutUpstream = "upstream"
LogToStdoutBoth = "both"
LogToStdoutNone = "none"
)
type MacroEntry struct {
Name string
Value any
}
type MacroList []MacroEntry
// UnmarshalYAML implements custom YAML unmarshaling that preserves macro definition order
func (ml *MacroList) UnmarshalYAML(value *yaml.Node) error {
if value.Kind != yaml.MappingNode {
return fmt.Errorf("macros must be a mapping")
}
// yaml.Node.Content for a mapping contains alternating key/value nodes
entries := make([]MacroEntry, 0, len(value.Content)/2)
for i := 0; i < len(value.Content); i += 2 {
keyNode := value.Content[i]
valueNode := value.Content[i+1]
var name string
if err := keyNode.Decode(&name); err != nil {
return fmt.Errorf("failed to decode macro name: %w", err)
}
var val any
if err := valueNode.Decode(&val); err != nil {
return fmt.Errorf("failed to decode macro value for '%s': %w", name, err)
}
entries = append(entries, MacroEntry{Name: name, Value: val})
}
*ml = entries
return nil
}
// Get retrieves a macro value by name
func (ml MacroList) Get(name string) (any, bool) {
for _, entry := range ml {
if entry.Name == name {
return entry.Value, true
}
}
return nil, false
}
// ToMap converts MacroList to a map (for backward compatibility if needed)
func (ml MacroList) ToMap() map[string]any {
result := make(map[string]any, len(ml))
for _, entry := range ml {
result[entry.Name] = entry.Value
}
return result
}
type GroupConfig struct {
Swap bool `yaml:"swap"`
Exclusive bool `yaml:"exclusive"`
Persistent bool `yaml:"persistent"`
Members []string `yaml:"members"`
}
var (
macroNameRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
macroPatternRegex = regexp.MustCompile(`\$\{([a-zA-Z0-9_-]+)\}`)
envMacroRegex = regexp.MustCompile(`\$\{env\.([a-zA-Z_][a-zA-Z0-9_]*)\}`)
)
// set default values for GroupConfig
func (c *GroupConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
type rawGroupConfig GroupConfig
defaults := rawGroupConfig{
Swap: true,
Exclusive: true,
Persistent: false,
Members: []string{},
}
if err := unmarshal(&defaults); err != nil {
return err
}
*c = GroupConfig(defaults)
return nil
}
type HooksConfig struct {
OnStartup HookOnStartup `yaml:"on_startup"`
}
type HookOnStartup struct {
Preload []string `yaml:"preload"`
}
type Config struct {
HealthCheckTimeout int `yaml:"healthCheckTimeout"`
LogRequests bool `yaml:"logRequests"`
LogLevel string `yaml:"logLevel"`
LogTimeFormat string `yaml:"logTimeFormat"`
LogToStdout string `yaml:"logToStdout"`
MetricsMaxInMemory int `yaml:"metricsMaxInMemory"`
CaptureBuffer int `yaml:"captureBuffer"`
GlobalTTL int `yaml:"globalTTL"`
Models map[string]ModelConfig `yaml:"models"` /* key is model ID */
Profiles map[string][]string `yaml:"profiles"`
Groups map[string]GroupConfig `yaml:"groups"` /* key is group ID */
// for key/value replacements in model's cmd, cmdStop, proxy, checkEndPoint
Macros MacroList `yaml:"macros"`
// map aliases to actual model IDs
aliases map[string]string
// automatic port assignments
StartPort int `yaml:"startPort"`
// hooks, see: #209
Hooks HooksConfig `yaml:"hooks"`
// send loading state in reasoning
SendLoadingState bool `yaml:"sendLoadingState"`
// present aliases to /v1/models OpenAI API listing
IncludeAliasesInList bool `yaml:"includeAliasesInList"`
// support API keys, see issue #433, #50, #251
RequiredAPIKeys []string `yaml:"apiKeys"`
// support remote peers, see issue #433, #296
Peers PeerDictionaryConfig `yaml:"peers"`
}
func (c *Config) RealModelName(search string) (string, bool) {
if _, found := c.Models[search]; found {
return search, true
} else if name, found := c.aliases[search]; found {
return name, found
} else {
return "", false
}
}
func (c *Config) FindConfig(modelName string) (ModelConfig, string, bool) {
if realName, found := c.RealModelName(modelName); !found {
return ModelConfig{}, "", false
} else {
return c.Models[realName], realName, true
}
}
func LoadConfig(path string) (Config, error) {
file, err := os.Open(path)
if err != nil {
return Config{}, err
}
defer file.Close()
return LoadConfigFromReader(file)
}
func LoadConfigFromReader(r io.Reader) (Config, error) {
data, err := io.ReadAll(r)
if err != nil {
return Config{}, err
}
yamlStr := string(data)
// Phase 1: Substitute all ${env.VAR} macros at string level
// This is safe because env values are simple strings without YAML formatting
yamlStr, err = substituteEnvMacros(yamlStr)
if err != nil {
return Config{}, err
}
// Unmarshal into full Config with defaults
config := Config{
HealthCheckTimeout: 120,
StartPort: 5800,
LogLevel: "info",
LogTimeFormat: "",
LogToStdout: LogToStdoutProxy,
MetricsMaxInMemory: 1000,
CaptureBuffer: 5,
GlobalTTL: 0,
}
if err = yaml.Unmarshal([]byte(yamlStr), &config); err != nil {
return Config{}, err
}
if config.HealthCheckTimeout < 15 {
config.HealthCheckTimeout = 15
}
if config.StartPort < 1 {
return Config{}, fmt.Errorf("startPort must be greater than 1")
}
if config.GlobalTTL < 0 {
return Config{}, fmt.Errorf("globalTTL must be >= 0")
}
switch config.LogToStdout {
case LogToStdoutProxy, LogToStdoutUpstream, LogToStdoutBoth, LogToStdoutNone:
default:
return Config{}, fmt.Errorf("logToStdout must be one of: proxy, upstream, both, none")
}
// Populate the aliases map
config.aliases = make(map[string]string)
for modelName, modelConfig := range config.Models {
for _, alias := range modelConfig.Aliases {
if _, found := config.aliases[alias]; found {
return Config{}, fmt.Errorf("duplicate alias %s found in model: %s", alias, modelName)
}
config.aliases[alias] = modelName
}
}
// Validate global macros
for _, macro := range config.Macros {
if err = validateMacro(macro.Name, macro.Value); err != nil {
return Config{}, err
}
}
// Get and sort all model IDs for consistent port assignment
modelIds := make([]string, 0, len(config.Models))
for modelId := range config.Models {
modelIds = append(modelIds, modelId)
}
sort.Strings(modelIds)
nextPort := config.StartPort
for _, modelId := range modelIds {
modelConfig := config.Models[modelId]
// Strip comments from command fields
modelConfig.Cmd = StripComments(modelConfig.Cmd)
modelConfig.CmdStop = StripComments(modelConfig.CmdStop)
// set model TTL to globalTTL it is the default value
if modelConfig.UnloadAfter == MODEL_CONFIG_DEFAULT_TTL {
modelConfig.UnloadAfter = config.GlobalTTL
}
if modelConfig.UnloadAfter < 0 {
return Config{}, fmt.Errorf("model %s: invalid TTL value %d", modelId, modelConfig.UnloadAfter)
}
// Validate model macros
for _, macro := range modelConfig.Macros {
if err = validateMacro(macro.Name, macro.Value); err != nil {
return Config{}, fmt.Errorf("model %s: %s", modelId, err.Error())
}
}
// Build merged macro list: MODEL_ID + global macros + model macros (model overrides global)
mergedMacros := make(MacroList, 0, len(config.Macros)+len(modelConfig.Macros)+1)
mergedMacros = append(mergedMacros, MacroEntry{Name: "MODEL_ID", Value: modelId})
mergedMacros = append(mergedMacros, config.Macros...)
// Add model macros (override globals with same name)
for _, entry := range modelConfig.Macros {
found := false
for i, existing := range mergedMacros {
if existing.Name == entry.Name {
mergedMacros[i] = entry
found = true
break
}
}
if !found {
mergedMacros = append(mergedMacros, entry)
}
}
// Substitute remaining macros in model fields (LIFO order)
for i := len(mergedMacros) - 1; i >= 0; i-- {
entry := mergedMacros[i]
macroSlug := fmt.Sprintf("${%s}", entry.Name)
macroStr := fmt.Sprintf("%v", entry.Value)
modelConfig.Cmd = strings.ReplaceAll(modelConfig.Cmd, macroSlug, macroStr)
modelConfig.CmdStop = strings.ReplaceAll(modelConfig.CmdStop, macroSlug, macroStr)
modelConfig.Proxy = strings.ReplaceAll(modelConfig.Proxy, macroSlug, macroStr)
modelConfig.CheckEndpoint = strings.ReplaceAll(modelConfig.CheckEndpoint, macroSlug, macroStr)
modelConfig.Filters.StripParams = strings.ReplaceAll(modelConfig.Filters.StripParams, macroSlug, macroStr)
// Substitute macros in SetParamsByID keys and values
if len(modelConfig.Filters.SetParamsByID) > 0 {
newSetParamsByID := make(map[string]map[string]any, len(modelConfig.Filters.SetParamsByID))
for key, paramMap := range modelConfig.Filters.SetParamsByID {
newKey := strings.ReplaceAll(key, macroSlug, macroStr)
newValAny, err := substituteMacroInValue(any(paramMap), entry.Name, entry.Value)
if err != nil {
return Config{}, fmt.Errorf("model %s filters.setParamsByID: %s", modelId, err.Error())
}
newParamMap, ok := newValAny.(map[string]any)
if !ok {
return Config{}, fmt.Errorf("model %s filters.setParamsByID: unexpected type after macro substitution", modelId)
}
newSetParamsByID[newKey] = newParamMap
}
modelConfig.Filters.SetParamsByID = newSetParamsByID
}
// Substitute in metadata (type-preserving)
if len(modelConfig.Metadata) > 0 {
result, err := substituteMacroInValue(modelConfig.Metadata, entry.Name, entry.Value)
if err != nil {
return Config{}, fmt.Errorf("model %s metadata: %s", modelId, err.Error())
}
modelConfig.Metadata = result.(map[string]any)
}
}
// Handle PORT macro - only allocate if cmd uses it
cmdHasPort := strings.Contains(modelConfig.Cmd, "${PORT}")
proxyHasPort := strings.Contains(modelConfig.Proxy, "${PORT}")
if cmdHasPort || proxyHasPort {
if !cmdHasPort && proxyHasPort {
return Config{}, fmt.Errorf("model %s: proxy uses ${PORT} but cmd does not - ${PORT} is only available when used in cmd", modelId)
}
macroSlug := "${PORT}"
macroStr := fmt.Sprintf("%v", nextPort)
modelConfig.Cmd = strings.ReplaceAll(modelConfig.Cmd, macroSlug, macroStr)
modelConfig.CmdStop = strings.ReplaceAll(modelConfig.CmdStop, macroSlug, macroStr)
modelConfig.Proxy = strings.ReplaceAll(modelConfig.Proxy, macroSlug, macroStr)
if len(modelConfig.Metadata) > 0 {
result, err := substituteMacroInValue(modelConfig.Metadata, "PORT", nextPort)
if err != nil {
return Config{}, fmt.Errorf("model %s metadata: %s", modelId, err.Error())
}
modelConfig.Metadata = result.(map[string]any)
}
nextPort++
}
// Validate no unknown macros remain
fieldMap := map[string]string{
"cmd": modelConfig.Cmd,
"cmdStop": modelConfig.CmdStop,
"proxy": modelConfig.Proxy,
"checkEndpoint": modelConfig.CheckEndpoint,
"filters.stripParams": modelConfig.Filters.StripParams,
}
for fieldName, fieldValue := range fieldMap {
matches := macroPatternRegex.FindAllStringSubmatch(fieldValue, -1)
for _, match := range matches {
macroName := match[1]
if macroName == "PID" && fieldName == "cmdStop" {
continue // replaced at runtime
}
if macroName == "PORT" || macroName == "MODEL_ID" {
return Config{}, fmt.Errorf("macro '${%s}' should have been substituted in %s.%s", macroName, modelId, fieldName)
}
return Config{}, fmt.Errorf("unknown macro '${%s}' found in %s.%s", macroName, modelId, fieldName)
}
}
if len(modelConfig.Metadata) > 0 {
if err := validateNestedForUnknownMacros(modelConfig.Metadata, fmt.Sprintf("model %s metadata", modelId)); err != nil {
return Config{}, err
}
}
// Validate SetParamsByID keys and values
for key, paramMap := range modelConfig.Filters.SetParamsByID {
if matches := macroPatternRegex.FindAllStringSubmatch(key, -1); len(matches) > 0 {
return Config{}, fmt.Errorf("unknown macro '${%s}' found in model %s filters.setParamsByID key", matches[0][1], modelId)
}
if err := validateNestedForUnknownMacros(any(paramMap), fmt.Sprintf("model %s filters.setParamsByID[%s]", modelId, key)); err != nil {
return Config{}, err
}
}
// Auto-register setParamsByID keys as aliases (skip the model's own ID)
for key := range modelConfig.Filters.SetParamsByID {
if key == modelId {
continue
}
if _, exists := config.Models[key]; exists {
return Config{}, fmt.Errorf("model %s filters.setParamsByID: key '%s' conflicts with an existing model ID", modelId, key)
}
if existingModel, exists := config.aliases[key]; exists {
if existingModel != modelId {
return Config{}, fmt.Errorf("duplicate alias '%s' in model %s filters.setParamsByID, already used by model %s", key, modelId, existingModel)
}
continue // already registered as explicit alias for this model
}
config.aliases[key] = modelId
modelConfig.Aliases = append(modelConfig.Aliases, key)
}
if _, err := url.Parse(modelConfig.Proxy); err != nil {
return Config{}, fmt.Errorf("model %s: invalid proxy URL: %w", modelId, err)
}
if modelConfig.SendLoadingState == nil {
v := config.SendLoadingState
modelConfig.SendLoadingState = &v
}
config.Models[modelId] = modelConfig
}
config = AddDefaultGroupToConfig(config)
// Validate group members
memberUsage := make(map[string]string)
for groupID, groupConfig := range config.Groups {
prevSet := make(map[string]bool)
for _, member := range groupConfig.Members {
if _, found := prevSet[member]; found {
return Config{}, fmt.Errorf("duplicate model member %s found in group: %s", member, groupID)
}
prevSet[member] = true
if existingGroup, exists := memberUsage[member]; exists {
return Config{}, fmt.Errorf("model member %s is used in multiple groups: %s and %s", member, existingGroup, groupID)
}
memberUsage[member] = groupID
}
}
// Clean up hooks preload
if len(config.Hooks.OnStartup.Preload) > 0 {
var toPreload []string
for _, modelID := range config.Hooks.OnStartup.Preload {
modelID = strings.TrimSpace(modelID)
if modelID == "" {
continue
}
if real, found := config.RealModelName(modelID); found {
toPreload = append(toPreload, real)
}
}
config.Hooks.OnStartup.Preload = toPreload
}
// Validate API keys (env macros already substituted at string level)
for i, apikey := range config.RequiredAPIKeys {
if apikey == "" {
return Config{}, fmt.Errorf("empty api key found in apiKeys")
}
if strings.Contains(apikey, " ") {
return Config{}, fmt.Errorf("api key cannot contain spaces: `%s`", apikey)
}
config.RequiredAPIKeys[i] = apikey
}
// Process peers with global macro substitution
for peerName, peerConfig := range config.Peers {
// Substitute global macros (LIFO order)
for i := len(config.Macros) - 1; i >= 0; i-- {
entry := config.Macros[i]
macroSlug := fmt.Sprintf("${%s}", entry.Name)
macroStr := fmt.Sprintf("%v", entry.Value)
peerConfig.ApiKey = strings.ReplaceAll(peerConfig.ApiKey, macroSlug, macroStr)
peerConfig.Filters.StripParams = strings.ReplaceAll(peerConfig.Filters.StripParams, macroSlug, macroStr)
// Substitute in setParams (type-preserving)
if len(peerConfig.Filters.SetParams) > 0 {
result, err := substituteMacroInValue(peerConfig.Filters.SetParams, entry.Name, entry.Value)
if err != nil {
return Config{}, fmt.Errorf("peers.%s.filters.setParams: %w", peerName, err)
}
peerConfig.Filters.SetParams = result.(map[string]any)
}
}
// Validate no unknown macros remain
if matches := macroPatternRegex.FindAllStringSubmatch(peerConfig.ApiKey, -1); len(matches) > 0 {
return Config{}, fmt.Errorf("peers.%s.apiKey: unknown macro '${%s}'", peerName, matches[0][1])
}
if matches := macroPatternRegex.FindAllStringSubmatch(peerConfig.Filters.StripParams, -1); len(matches) > 0 {
return Config{}, fmt.Errorf("peers.%s.filters.stripParams: unknown macro '${%s}'", peerName, matches[0][1])
}
if len(peerConfig.Filters.SetParams) > 0 {
if err := validateNestedForUnknownMacros(peerConfig.Filters.SetParams, fmt.Sprintf("peers.%s.filters.setParams", peerName)); err != nil {
return Config{}, err
}
}
config.Peers[peerName] = peerConfig
}
return config, nil
}
// rewrites the yaml to include a default group with any orphaned models
func AddDefaultGroupToConfig(config Config) Config {
if config.Groups == nil {
config.Groups = make(map[string]GroupConfig)
}
defaultGroup := GroupConfig{
Swap: true,
Exclusive: true,
Members: []string{},
}
// if groups is empty, create a default group and put
// all models into it
if len(config.Groups) == 0 {
for modelName := range config.Models {
defaultGroup.Members = append(defaultGroup.Members, modelName)
}
} else {
// iterate over existing group members and add non-grouped models into the default group
for modelName := range config.Models {
foundModel := false
found:
// search for the model in existing groups
for _, groupConfig := range config.Groups {
for _, member := range groupConfig.Members {
if member == modelName {
foundModel = true
break found
}
}
}
if !foundModel {
defaultGroup.Members = append(defaultGroup.Members, modelName)
}
}
}
sort.Strings(defaultGroup.Members) // make consistent ordering for testing
config.Groups[DEFAULT_GROUP_ID] = defaultGroup
return config
}
func SanitizeCommand(cmdStr string) ([]string, error) {
var cleanedLines []string
for _, line := range strings.Split(cmdStr, "\n") {
trimmed := strings.TrimSpace(line)
// Skip comment lines
if strings.HasPrefix(trimmed, "#") {
continue
}
// Handle trailing backslashes by replacing with space
if strings.HasSuffix(trimmed, "\\") {
cleanedLines = append(cleanedLines, strings.TrimSuffix(trimmed, "\\")+" ")
} else {
cleanedLines = append(cleanedLines, line)
}
}
// put it back together
cmdStr = strings.Join(cleanedLines, "\n")
// Split the command into arguments
var args []string
if runtime.GOOS == "windows" {
args = shlex.Windows.Split(cmdStr)
} else {
args = shlex.Posix.Split(cmdStr)
}
// Ensure the command is not empty
if len(args) == 0 {
return nil, fmt.Errorf("empty command")
}
return args, nil
}
func StripComments(cmdStr string) string {
var cleanedLines []string
for _, line := range strings.Split(cmdStr, "\n") {
trimmed := strings.TrimSpace(line)
// Skip comment lines
if strings.HasPrefix(trimmed, "#") {
continue
}
cleanedLines = append(cleanedLines, line)
}
return strings.Join(cleanedLines, "\n")
}
// validateMacro validates macro name and value constraints
func validateMacro(name string, value any) error {
if len(name) >= 64 {
return fmt.Errorf("macro name '%s' exceeds maximum length of 63 characters", name)
}
if !macroNameRegex.MatchString(name) {
return fmt.Errorf("macro name '%s' contains invalid characters, must match pattern ^[a-zA-Z0-9_-]+$", name)
}
// Validate that value is a scalar type
switch v := value.(type) {
case string:
if len(v) >= 1024 {
return fmt.Errorf("macro value for '%s' exceeds maximum length of 1024 characters", name)
}
// Check for self-reference
macroSlug := fmt.Sprintf("${%s}", name)
if strings.Contains(v, macroSlug) {
return fmt.Errorf("macro '%s' contains self-reference", name)
}
case int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float32, float64, bool:
// These types are allowed
default:
return fmt.Errorf("macro '%s' has invalid type %T, must be a scalar type (string, int, float, or bool)", name, value)
}
switch name {
case "PORT", "MODEL_ID":
return fmt.Errorf("macro name '%s' is reserved", name)
}
return nil
}
// validateNestedForUnknownMacros recursively checks for any remaining macro references in nested structures
func validateNestedForUnknownMacros(value any, context string) error {
switch v := value.(type) {
case string:
matches := macroPatternRegex.FindAllStringSubmatch(v, -1)
for _, match := range matches {
macroName := match[1]
return fmt.Errorf("%s: unknown macro '${%s}'", context, macroName)
}
// Check for unsubstituted env macros
envMatches := envMacroRegex.FindAllStringSubmatch(v, -1)
for _, match := range envMatches {
varName := match[1]
return fmt.Errorf("%s: environment variable '%s' not set", context, varName)
}
return nil
case map[string]any:
for _, val := range v {
if err := validateNestedForUnknownMacros(val, context); err != nil {
return err
}
}
return nil
case []any:
for _, val := range v {
if err := validateNestedForUnknownMacros(val, context); err != nil {
return err
}
}
return nil
default:
// Scalar types don't contain macros
return nil
}
}
// substituteMacroInValue recursively substitutes a single macro in a value structure
// This is called once per macro, allowing LIFO substitution order
func substituteMacroInValue(value any, macroName string, macroValue any) (any, error) {
macroSlug := fmt.Sprintf("${%s}", macroName)
macroStr := fmt.Sprintf("%v", macroValue)
switch v := value.(type) {
case string:
// Check if this is a direct macro substitution
if v == macroSlug {
return macroValue, nil
}
// Handle string interpolation
if strings.Contains(v, macroSlug) {
return strings.ReplaceAll(v, macroSlug, macroStr), nil
}
return v, nil
case map[string]any:
// Recursively process map values
newMap := make(map[string]any)
for key, val := range v {
newVal, err := substituteMacroInValue(val, macroName, macroValue)
if err != nil {
return nil, err
}
newMap[key] = newVal
}
return newMap, nil
case []any:
// Recursively process slice elements
newSlice := make([]any, len(v))
for i, val := range v {
newVal, err := substituteMacroInValue(val, macroName, macroValue)
if err != nil {
return nil, err
}
newSlice[i] = newVal
}
return newSlice, nil
default:
// Return scalar types as-is
return value, nil
}
}
// substituteEnvMacros replaces ${env.VAR_NAME} with environment variable values.
// Returns error if any referenced env var is not set or contains invalid characters.
// Env macros inside YAML comments are ignored by unmarshalling the YAML first
// (which strips comments) and only checking the comment-free version for macros.
func substituteEnvMacros(s string) (string, error) {
// Unmarshal and remarshal to strip YAML comments
var raw any
if err := yaml.Unmarshal([]byte(s), &raw); err != nil {
// If YAML is invalid, fall back to scanning the original string
// so the user gets the env var error rather than a confusing YAML parse error
return substituteEnvMacrosInString(s, s)
}
clean, err := yaml.Marshal(raw)
if err != nil {
return substituteEnvMacrosInString(s, s)
}
return substituteEnvMacrosInString(s, string(clean))
}
// substituteEnvMacrosInString finds ${env.VAR} macros in scanStr and substitutes
// them in target. This separation allows scanning comment-free YAML while
// substituting in the original string.
func substituteEnvMacrosInString(target, scanStr string) (string, error) {
result := target
matches := envMacroRegex.FindAllStringSubmatch(scanStr, -1)
for _, match := range matches {
fullMatch := match[0] // ${env.VAR_NAME}
varName := match[1] // VAR_NAME
value, exists := os.LookupEnv(varName)
if !exists {
return "", fmt.Errorf("environment variable '%s' is not set", varName)
}
// Sanitize the value for safe YAML substitution
value, err := sanitizeEnvValueForYAML(value, varName)
if err != nil {
return "", err
}
result = strings.ReplaceAll(result, fullMatch, value)
}
return result, nil
}
// sanitizeEnvValueForYAML ensures an environment variable value is safe for YAML substitution.
// It rejects values with characters that break YAML structure and escapes quotes/backslashes
// for compatibility with double-quoted YAML strings.
func sanitizeEnvValueForYAML(value, varName string) (string, error) {
// Reject values that would break YAML structure regardless of quoting context
if strings.ContainsAny(value, "\n\r\x00") {
return "", fmt.Errorf("environment variable '%s' contains newlines or null bytes which are not allowed in YAML substitution", varName)
}
// Escape backslashes and double quotes for safe use in double-quoted YAML strings.
// In unquoted contexts, these escapes appear literally (harmless for most use cases).
// In double-quoted contexts, they are interpreted correctly.
value = strings.ReplaceAll(value, `\`, `\\`)
value = strings.ReplaceAll(value, `"`, `\"`)
return value, nil
}
@@ -1,6 +1,6 @@
//go:build !windows
package proxy
package config
import (
"os"
@@ -58,6 +58,7 @@ models:
assert.Equal(t, 120, config.HealthCheckTimeout)
assert.Equal(t, 5800, config.StartPort)
assert.Equal(t, "info", config.LogLevel)
assert.Equal(t, "", config.LogTimeFormat)
// Test default group exists
defaultGroup, exists := config.Groups["(default)"]
@@ -160,51 +161,61 @@ groups:
t.Fatalf("Failed to load config: %v", err)
}
modelLoadingState := false
expected := Config{
LogLevel: "info",
StartPort: 5800,
Macros: map[string]string{
"svr-path": "path/to/server",
LogLevel: "info",
LogTimeFormat: "",
LogToStdout: LogToStdoutProxy,
StartPort: 5800,
Macros: MacroList{
{"svr-path", "path/to/server"},
},
Hooks: HooksConfig{
OnStartup: HookOnStartup{
Preload: []string{"model1", "model2"},
},
},
SendLoadingState: false,
Models: map[string]ModelConfig{
"model1": {
Cmd: "path/to/cmd --arg1 one",
Proxy: "http://localhost:8080",
Aliases: []string{"m1", "model-one"},
Env: []string{"VAR1=value1", "VAR2=value2"},
CheckEndpoint: "/health",
Name: "Model 1",
Description: "This is model 1",
Cmd: "path/to/cmd --arg1 one",
Proxy: "http://localhost:8080",
Aliases: []string{"m1", "model-one"},
Env: []string{"VAR1=value1", "VAR2=value2"},
CheckEndpoint: "/health",
Name: "Model 1",
Description: "This is model 1",
SendLoadingState: &modelLoadingState,
},
"model2": {
Cmd: "path/to/server --arg1 one",
Proxy: "http://localhost:8081",
Aliases: []string{"m2"},
Env: []string{},
CheckEndpoint: "/",
Cmd: "path/to/server --arg1 one",
Proxy: "http://localhost:8081",
Aliases: []string{"m2"},
Env: []string{},
CheckEndpoint: "/",
SendLoadingState: &modelLoadingState,
},
"model3": {
Cmd: "path/to/cmd --arg1 one",
Proxy: "http://localhost:8081",
Aliases: []string{"mthree"},
Env: []string{},
CheckEndpoint: "/",
Cmd: "path/to/cmd --arg1 one",
Proxy: "http://localhost:8081",
Aliases: []string{"mthree"},
Env: []string{},
CheckEndpoint: "/",
SendLoadingState: &modelLoadingState,
},
"model4": {
Cmd: "path/to/cmd --arg1 one",
Proxy: "http://localhost:8082",
CheckEndpoint: "/",
Aliases: []string{},
Env: []string{},
Cmd: "path/to/cmd --arg1 one",
Proxy: "http://localhost:8082",
CheckEndpoint: "/",
Aliases: []string{},
Env: []string{},
SendLoadingState: &modelLoadingState,
},
},
HealthCheckTimeout: 15,
MetricsMaxInMemory: 1000,
CaptureBuffer: 5,
Profiles: map[string][]string{
"test": {"model1", "model2"},
},
File diff suppressed because it is too large Load Diff
@@ -1,6 +1,6 @@
//go:build windows
package proxy
package config
import (
"os"
@@ -55,6 +55,7 @@ models:
assert.Equal(t, 120, config.HealthCheckTimeout)
assert.Equal(t, 5800, config.StartPort)
assert.Equal(t, "info", config.LogLevel)
assert.Equal(t, "", config.LogTimeFormat)
// Test default group exists
defaultGroup, exists := config.Groups["(default)"]
@@ -152,48 +153,58 @@ groups:
t.Fatalf("Failed to load config: %v", err)
}
modelLoadingState := false
expected := Config{
LogLevel: "info",
StartPort: 5800,
Macros: map[string]string{
"svr-path": "path/to/server",
LogLevel: "info",
LogTimeFormat: "",
LogToStdout: LogToStdoutProxy,
StartPort: 5800,
Macros: MacroList{
{"svr-path", "path/to/server"},
},
SendLoadingState: false,
Models: map[string]ModelConfig{
"model1": {
Cmd: "path/to/cmd --arg1 one",
CmdStop: "taskkill /f /t /pid ${PID}",
Proxy: "http://localhost:8080",
Aliases: []string{"m1", "model-one"},
Env: []string{"VAR1=value1", "VAR2=value2"},
CheckEndpoint: "/health",
Cmd: "path/to/cmd --arg1 one",
CmdStop: "taskkill /f /t /pid ${PID}",
Proxy: "http://localhost:8080",
Aliases: []string{"m1", "model-one"},
Env: []string{"VAR1=value1", "VAR2=value2"},
CheckEndpoint: "/health",
SendLoadingState: &modelLoadingState,
},
"model2": {
Cmd: "path/to/server --arg1 one",
CmdStop: "taskkill /f /t /pid ${PID}",
Proxy: "http://localhost:8081",
Aliases: []string{"m2"},
Env: []string{},
CheckEndpoint: "/",
Cmd: "path/to/server --arg1 one",
CmdStop: "taskkill /f /t /pid ${PID}",
Proxy: "http://localhost:8081",
Aliases: []string{"m2"},
Env: []string{},
CheckEndpoint: "/",
SendLoadingState: &modelLoadingState,
},
"model3": {
Cmd: "path/to/cmd --arg1 one",
CmdStop: "taskkill /f /t /pid ${PID}",
Proxy: "http://localhost:8081",
Aliases: []string{"mthree"},
Env: []string{},
CheckEndpoint: "/",
Cmd: "path/to/cmd --arg1 one",
CmdStop: "taskkill /f /t /pid ${PID}",
Proxy: "http://localhost:8081",
Aliases: []string{"mthree"},
Env: []string{},
CheckEndpoint: "/",
SendLoadingState: &modelLoadingState,
},
"model4": {
Cmd: "path/to/cmd --arg1 one",
CmdStop: "taskkill /f /t /pid ${PID}",
Proxy: "http://localhost:8082",
CheckEndpoint: "/",
Aliases: []string{},
Env: []string{},
Cmd: "path/to/cmd --arg1 one",
CmdStop: "taskkill /f /t /pid ${PID}",
Proxy: "http://localhost:8082",
CheckEndpoint: "/",
Aliases: []string{},
Env: []string{},
SendLoadingState: &modelLoadingState,
},
},
HealthCheckTimeout: 15,
MetricsMaxInMemory: 1000,
CaptureBuffer: 5,
Profiles: map[string][]string{
"test": {"model1", "model2"},
},
+114
View File
@@ -0,0 +1,114 @@
package config
import (
"slices"
"sort"
"strings"
)
// ProtectedParams is a list of parameters that cannot be set or stripped via filters
// These are protected to prevent breaking the proxy's ability to route requests correctly
var ProtectedParams = []string{"model"}
// Filters contains filter settings for modifying request parameters
// Used by both models and peers
type Filters struct {
// StripParams is a comma-separated list of parameters to remove from requests
// The "model" parameter can never be removed
StripParams string `yaml:"stripParams"`
// SetParams is a dictionary of parameters to set/override in requests
// Protected params (like "model") cannot be set
SetParams map[string]any `yaml:"setParams"`
// SetParamsByID maps requested model IDs to parameters to set/override in requests.
// Useful with aliases: a single loaded model can behave differently depending on
// which alias the client used. Applied after SetParams, so it can override those values.
// Protected params (like "model") cannot be set.
SetParamsByID map[string]map[string]any `yaml:"setParamsByID"`
}
// SanitizedStripParams returns a sorted list of parameters to strip,
// with duplicates, empty strings, and protected params removed
func (f Filters) SanitizedStripParams() []string {
if f.StripParams == "" {
return nil
}
params := strings.Split(f.StripParams, ",")
cleaned := make([]string, 0, len(params))
seen := make(map[string]bool)
for _, param := range params {
trimmed := strings.TrimSpace(param)
// Skip protected params, empty strings, and duplicates
if slices.Contains(ProtectedParams, trimmed) || trimmed == "" || seen[trimmed] {
continue
}
seen[trimmed] = true
cleaned = append(cleaned, trimmed)
}
if len(cleaned) == 0 {
return nil
}
slices.Sort(cleaned)
return cleaned
}
// SanitizedSetParamsByID returns the params to set for the given requestedModelID,
// with protected params removed and keys sorted for consistent iteration order.
// Returns nil if the ID has no entry or all its params are protected.
func (f Filters) SanitizedSetParamsByID(requestedModelID string) (map[string]any, []string) {
if len(f.SetParamsByID) == 0 {
return nil, nil
}
params, found := f.SetParamsByID[requestedModelID]
if !found || len(params) == 0 {
return nil, nil
}
result := make(map[string]any, len(params))
keys := make([]string, 0, len(params))
for key, value := range params {
if slices.Contains(ProtectedParams, key) {
continue
}
result[key] = value
keys = append(keys, key)
}
sort.Strings(keys)
if len(result) == 0 {
return nil, nil
}
return result, keys
}
// SanitizedSetParams returns a copy of SetParams with protected params removed
// and keys sorted for consistent iteration order
func (f Filters) SanitizedSetParams() (map[string]any, []string) {
if len(f.SetParams) == 0 {
return nil, nil
}
result := make(map[string]any, len(f.SetParams))
keys := make([]string, 0, len(f.SetParams))
for key, value := range f.SetParams {
// Skip protected params
if slices.Contains(ProtectedParams, key) {
continue
}
result[key] = value
keys = append(keys, key)
}
// Sort keys for consistent ordering
sort.Strings(keys)
if len(result) == 0 {
return nil, nil
}
return result, keys
}
+285
View File
@@ -0,0 +1,285 @@
package config
import (
"testing"
"github.com/stretchr/testify/assert"
)
func TestFilters_SanitizedStripParams(t *testing.T) {
tests := []struct {
name string
stripParams string
want []string
}{
{
name: "empty string",
stripParams: "",
want: nil,
},
{
name: "single param",
stripParams: "temperature",
want: []string{"temperature"},
},
{
name: "multiple params",
stripParams: "temperature, top_p, top_k",
want: []string{"temperature", "top_k", "top_p"}, // sorted
},
{
name: "model param filtered",
stripParams: "model, temperature, top_p",
want: []string{"temperature", "top_p"},
},
{
name: "only model param",
stripParams: "model",
want: nil,
},
{
name: "duplicates removed",
stripParams: "temperature, top_p, temperature",
want: []string{"temperature", "top_p"},
},
{
name: "extra whitespace",
stripParams: " temperature , top_p ",
want: []string{"temperature", "top_p"},
},
{
name: "empty values filtered",
stripParams: "temperature,,top_p,",
want: []string{"temperature", "top_p"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
f := Filters{StripParams: tt.stripParams}
got := f.SanitizedStripParams()
assert.Equal(t, tt.want, got)
})
}
}
func TestFilters_SanitizedSetParams(t *testing.T) {
tests := []struct {
name string
setParams map[string]any
wantParams map[string]any
wantKeys []string
}{
{
name: "empty setParams",
setParams: nil,
wantParams: nil,
wantKeys: nil,
},
{
name: "empty map",
setParams: map[string]any{},
wantParams: nil,
wantKeys: nil,
},
{
name: "normal params",
setParams: map[string]any{
"temperature": 0.7,
"top_p": 0.9,
},
wantParams: map[string]any{
"temperature": 0.7,
"top_p": 0.9,
},
wantKeys: []string{"temperature", "top_p"},
},
{
name: "protected model param filtered",
setParams: map[string]any{
"model": "should-be-filtered",
"temperature": 0.7,
},
wantParams: map[string]any{
"temperature": 0.7,
},
wantKeys: []string{"temperature"},
},
{
name: "only protected param",
setParams: map[string]any{
"model": "should-be-filtered",
},
wantParams: nil,
wantKeys: nil,
},
{
name: "complex nested values",
setParams: map[string]any{
"provider": map[string]any{
"data_collection": "deny",
"allow_fallbacks": false,
},
"transforms": []string{"middle-out"},
},
wantParams: map[string]any{
"provider": map[string]any{
"data_collection": "deny",
"allow_fallbacks": false,
},
"transforms": []string{"middle-out"},
},
wantKeys: []string{"provider", "transforms"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
f := Filters{SetParams: tt.setParams}
gotParams, gotKeys := f.SanitizedSetParams()
assert.Equal(t, len(tt.wantKeys), len(gotKeys), "keys length mismatch")
for i, key := range gotKeys {
assert.Equal(t, tt.wantKeys[i], key, "key mismatch at %d", i)
}
if tt.wantParams == nil {
assert.Nil(t, gotParams, "expected nil params")
return
}
assert.Equal(t, len(tt.wantParams), len(gotParams), "params length mismatch")
for key, wantValue := range tt.wantParams {
gotValue, exists := gotParams[key]
assert.True(t, exists, "missing key: %s", key)
// Simple comparison for basic types
switch v := wantValue.(type) {
case string, int, float64, bool:
assert.Equal(t, v, gotValue, "value mismatch for key %s", key)
}
}
})
}
}
func TestFilters_SanitizedSetParamsByID(t *testing.T) {
tests := []struct {
name string
setParamsByID map[string]map[string]any
requestedModelID string
wantParams map[string]any
wantKeys []string
}{
{
name: "empty SetParamsByID returns nil",
setParamsByID: nil,
requestedModelID: "model1",
wantParams: nil,
wantKeys: nil,
},
{
name: "empty map returns nil",
setParamsByID: map[string]map[string]any{},
requestedModelID: "model1",
wantParams: nil,
wantKeys: nil,
},
{
name: "non-matching model ID returns nil",
setParamsByID: map[string]map[string]any{
"model2": {"temperature": 0.9},
},
requestedModelID: "model1",
wantParams: nil,
wantKeys: nil,
},
{
name: "matching model ID returns correct params",
setParamsByID: map[string]map[string]any{
"model1": {"temperature": 0.7, "top_p": 0.9},
"model2": {"temperature": 0.5},
},
requestedModelID: "model1",
wantParams: map[string]any{
"temperature": 0.7,
"top_p": 0.9,
},
wantKeys: []string{"temperature", "top_p"},
},
{
name: "protected param model is filtered out",
setParamsByID: map[string]map[string]any{
"model1": {
"model": "should-be-filtered",
"temperature": 0.7,
},
},
requestedModelID: "model1",
wantParams: map[string]any{
"temperature": 0.7,
},
wantKeys: []string{"temperature"},
},
{
name: "only protected param returns nil",
setParamsByID: map[string]map[string]any{
"model1": {
"model": "should-be-filtered",
},
},
requestedModelID: "model1",
wantParams: nil,
wantKeys: nil,
},
{
name: "keys are sorted",
setParamsByID: map[string]map[string]any{
"model1": {
"z_param": "z",
"a_param": "a",
"m_param": "m",
},
},
requestedModelID: "model1",
wantParams: map[string]any{
"z_param": "z",
"a_param": "a",
"m_param": "m",
},
wantKeys: []string{"a_param", "m_param", "z_param"},
},
{
name: "alias style key lookup",
setParamsByID: map[string]map[string]any{
"model1:high": {"reasoning_effort": "high"},
"model1:low": {"reasoning_effort": "low"},
},
requestedModelID: "model1:high",
wantParams: map[string]any{
"reasoning_effort": "high",
},
wantKeys: []string{"reasoning_effort"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
f := Filters{SetParamsByID: tt.setParamsByID}
gotParams, gotKeys := f.SanitizedSetParamsByID(tt.requestedModelID)
if tt.wantParams == nil {
assert.Nil(t, gotParams)
assert.Nil(t, gotKeys)
return
}
assert.Equal(t, tt.wantKeys, gotKeys)
assert.Equal(t, tt.wantParams, gotParams)
})
}
}
func TestProtectedParams(t *testing.T) {
// Verify that "model" is protected
assert.Contains(t, ProtectedParams, "model")
}
+123
View File
@@ -0,0 +1,123 @@
package config
import (
"strings"
"testing"
"github.com/stretchr/testify/assert"
)
// Test macro-in-macro basic substitution
func TestConfig_MacroInMacroBasic(t *testing.T) {
content := `
startPort: 10000
macros:
"A": "value-A"
"B": "prefix-${A}-suffix"
models:
test:
cmd: echo ${B}
proxy: http://localhost:8080
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
assert.Equal(t, "echo prefix-value-A-suffix", config.Models["test"].Cmd)
}
// Test LIFO substitution order with 3+ macro levels
func TestConfig_MacroInMacroLIFOOrder(t *testing.T) {
content := `
startPort: 10000
macros:
"base": "/models"
"path": "${base}/llama"
"full": "${path}/model.gguf"
models:
test:
cmd: load ${full}
proxy: http://localhost:8080
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
assert.Equal(t, "load /models/llama/model.gguf", config.Models["test"].Cmd)
}
// Test MODEL_ID in global macro used by model
func TestConfig_ModelIdInGlobalMacro(t *testing.T) {
content := `
startPort: 10000
macros:
"podman-llama": "podman run --name ${MODEL_ID} ghcr.io/ggml-org/llama.cpp:server-cuda"
models:
my-model:
cmd: ${podman-llama} -m model.gguf
proxy: http://localhost:8080
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
assert.Equal(t, "podman run --name my-model ghcr.io/ggml-org/llama.cpp:server-cuda -m model.gguf", config.Models["my-model"].Cmd)
}
// Test model macro overrides global macro in substitution
func TestConfig_ModelMacroOverridesGlobal(t *testing.T) {
content := `
startPort: 10000
macros:
"tag": "global"
"msg": "value-${tag}"
models:
test:
macros:
"tag": "model-level"
cmd: echo ${msg}
proxy: http://localhost:8080
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
assert.Equal(t, "echo value-model-level", config.Models["test"].Cmd)
}
// Test self-reference detection error
func TestConfig_SelfReferenceDetection(t *testing.T) {
content := `
startPort: 10000
macros:
"recursive": "value-${recursive}"
models:
test:
cmd: echo ${recursive}
proxy: http://localhost:8080
`
_, err := LoadConfigFromReader(strings.NewReader(content))
assert.Error(t, err)
assert.Contains(t, err.Error(), "recursive")
assert.Contains(t, err.Error(), "self-reference")
}
// Test undefined macro reference error
func TestConfig_UndefinedMacroReference(t *testing.T) {
content := `
startPort: 10000
macros:
"A": "value-${UNDEFINED}"
models:
test:
cmd: echo ${A}
proxy: http://localhost:8080
`
_, err := LoadConfigFromReader(strings.NewReader(content))
assert.Error(t, err)
assert.Contains(t, err.Error(), "UNDEFINED")
}
+112
View File
@@ -0,0 +1,112 @@
package config
import (
"errors"
"runtime"
)
const (
MODEL_CONFIG_DEFAULT_TTL = -1
)
type ModelConfig struct {
Cmd string `yaml:"cmd"`
CmdStop string `yaml:"cmdStop"`
Proxy string `yaml:"proxy"`
Aliases []string `yaml:"aliases"`
Env []string `yaml:"env"`
CheckEndpoint string `yaml:"checkEndpoint"`
UnloadAfter int `yaml:"ttl"`
Unlisted bool `yaml:"unlisted"`
UseModelName string `yaml:"useModelName"`
// #179 for /v1/models
Name string `yaml:"name"`
Description string `yaml:"description"`
// Limit concurrency of HTTP requests to process
ConcurrencyLimit int `yaml:"concurrencyLimit"`
// Model filters see issue #174
Filters ModelFilters `yaml:"filters"`
// Macros: see #264
// Model level macros take precedence over the global macros
Macros MacroList `yaml:"macros"`
// Metadata: see #264
// Arbitrary metadata that can be exposed through the API
Metadata map[string]any `yaml:"metadata"`
// override global setting
SendLoadingState *bool `yaml:"sendLoadingState"`
}
func (m *ModelConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
type rawModelConfig ModelConfig
defaults := rawModelConfig{
Cmd: "",
CmdStop: "",
Proxy: "http://localhost:${PORT}",
Aliases: []string{},
Env: []string{},
CheckEndpoint: "/health",
UnloadAfter: MODEL_CONFIG_DEFAULT_TTL, // use GlobalTTL
Unlisted: false,
UseModelName: "",
ConcurrencyLimit: 0,
Name: "",
Description: "",
}
// the default cmdStop to taskkill /f /t /pid ${PID}
if runtime.GOOS == "windows" {
defaults.CmdStop = "taskkill /f /t /pid ${PID}"
}
if err := unmarshal(&defaults); err != nil {
return err
}
*m = ModelConfig(defaults)
return nil
}
func (m *ModelConfig) SanitizedCommand() ([]string, error) {
return SanitizeCommand(m.Cmd)
}
// ModelFilters embeds Filters and adds legacy support for strip_params field
// See issue #174
type ModelFilters struct {
Filters `yaml:",inline"`
}
func (m *ModelFilters) UnmarshalYAML(unmarshal func(interface{}) error) error {
type rawModelFilters ModelFilters
defaults := rawModelFilters{}
if err := unmarshal(&defaults); err != nil {
return err
}
// Try to unmarshal with the old field name for backwards compatibility
if defaults.StripParams == "" {
var legacy struct {
StripParams string `yaml:"strip_params"`
}
if legacyErr := unmarshal(&legacy); legacyErr != nil {
return errors.New("failed to unmarshal legacy filters.strip_params: " + legacyErr.Error())
}
defaults.StripParams = legacy.StripParams
}
*m = ModelFilters(defaults)
return nil
}
// SanitizedStripParams wraps Filters.SanitizedStripParams for backwards compatibility
// Returns ([]string, error) to match existing API
func (f ModelFilters) SanitizedStripParams() ([]string, error) {
return f.Filters.SanitizedStripParams(), nil
}
+172
View File
@@ -0,0 +1,172 @@
package config
import (
"fmt"
"strings"
"testing"
"github.com/stretchr/testify/assert"
)
func TestConfig_ModelConfigSanitizedCommand(t *testing.T) {
config := &ModelConfig{
Cmd: `python model1.py \
--arg1 value1 \
--arg2 value2`,
}
args, err := config.SanitizedCommand()
assert.NoError(t, err)
assert.Equal(t, []string{"python", "model1.py", "--arg1", "value1", "--arg2", "value2"}, args)
}
func TestConfig_ModelFilters(t *testing.T) {
content := `
macros:
default_strip: "temperature, top_p"
models:
model1:
cmd: path/to/cmd --port ${PORT}
filters:
# macros inserted and list is cleaned of duplicates and empty strings
stripParams: "model, top_k, top_k, temperature, ${default_strip}, , ,"
# check for strip_params (legacy field name) compatibility
legacy:
cmd: path/to/cmd --port ${PORT}
filters:
strip_params: "model, top_k, top_k, temperature, ${default_strip}, , ,"
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
for modelId, modelConfig := range config.Models {
t.Run(fmt.Sprintf("Testing macros in filters for model %s", modelId), func(t *testing.T) {
assert.Equal(t, "model, top_k, top_k, temperature, temperature, top_p, , ,", modelConfig.Filters.StripParams)
sanitized, err := modelConfig.Filters.SanitizedStripParams()
if assert.NoError(t, err) {
// model has been removed
// empty strings have been removed
// duplicates have been removed
assert.Equal(t, []string{"temperature", "top_k", "top_p"}, sanitized)
}
})
}
}
func TestConfig_ModelSendLoadingState(t *testing.T) {
content := `
sendLoadingState: true
models:
model1:
cmd: path/to/cmd --port ${PORT}
sendLoadingState: false
model2:
cmd: path/to/cmd --port ${PORT}
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
assert.True(t, config.SendLoadingState)
if assert.NotNil(t, config.Models["model1"].SendLoadingState) {
assert.False(t, *config.Models["model1"].SendLoadingState)
}
if assert.NotNil(t, config.Models["model2"].SendLoadingState) {
assert.True(t, *config.Models["model2"].SendLoadingState)
}
}
func TestConfig_SetParamsByIDAutoAlias(t *testing.T) {
content := `
models:
model1:
cmd: path/to/cmd --port ${PORT}
filters:
setParamsByID:
"${MODEL_ID}:high":
reasoning_effort: high
"${MODEL_ID}:low":
reasoning_effort: low
`
cfg, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
// Keys (other than the model's own ID) should be registered as aliases
realName, found := cfg.RealModelName("model1:high")
assert.True(t, found, "model1:high should be an auto-registered alias")
assert.Equal(t, "model1", realName)
realName, found = cfg.RealModelName("model1:low")
assert.True(t, found, "model1:low should be an auto-registered alias")
assert.Equal(t, "model1", realName)
// Auto-aliases should also appear in modelConfig.Aliases
aliases := cfg.Models["model1"].Aliases
assert.Contains(t, aliases, "model1:high")
assert.Contains(t, aliases, "model1:low")
}
func TestConfig_SetParamsByIDAutoAliasConflictWithModelID(t *testing.T) {
content := `
models:
model1:
cmd: path/to/cmd --port ${PORT}
filters:
setParamsByID:
model2:
reasoning_effort: high
model2:
cmd: path/to/cmd --port ${PORT}
`
_, err := LoadConfigFromReader(strings.NewReader(content))
assert.ErrorContains(t, err, "conflicts with an existing model ID")
}
func TestConfig_SetParamsByIDAutoAliasConflictWithOtherModel(t *testing.T) {
content := `
models:
model1:
cmd: path/to/cmd --port ${PORT}
filters:
setParamsByID:
"shared-alias":
reasoning_effort: high
model2:
cmd: path/to/cmd --port ${PORT}
filters:
setParamsByID:
"shared-alias":
reasoning_effort: low
`
_, err := LoadConfigFromReader(strings.NewReader(content))
assert.ErrorContains(t, err, "duplicate alias")
}
func TestConfig_ModelFiltersWithSetParams(t *testing.T) {
content := `
models:
model1:
cmd: path/to/cmd --port ${PORT}
filters:
stripParams: "top_k"
setParams:
temperature: 0.7
top_p: 0.9
stop:
- "<|end|>"
- "<|stop|>"
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
modelConfig := config.Models["model1"]
// Check stripParams
stripParams, err := modelConfig.Filters.SanitizedStripParams()
assert.NoError(t, err)
assert.Equal(t, []string{"top_k"}, stripParams)
// Check setParams
setParams, keys := modelConfig.Filters.SanitizedSetParams()
assert.NotNil(t, setParams)
assert.Equal(t, []string{"stop", "temperature", "top_p"}, keys)
assert.Equal(t, 0.7, setParams["temperature"])
assert.Equal(t, 0.9, setParams["top_p"])
}
+49
View File
@@ -0,0 +1,49 @@
package config
import (
"fmt"
"net/url"
)
type PeerDictionaryConfig map[string]PeerConfig
type PeerConfig struct {
Proxy string `yaml:"proxy"`
ProxyURL *url.URL `yaml:"-"`
ApiKey string `yaml:"apiKey"`
Models []string `yaml:"models"`
Filters Filters `yaml:"filters"`
}
func (c *PeerConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
type rawPeerConfig PeerConfig
defaults := rawPeerConfig{
Proxy: "",
ApiKey: "",
Models: []string{},
Filters: Filters{},
}
if err := unmarshal(&defaults); err != nil {
return err
}
// Validate proxy is not empty
if defaults.Proxy == "" {
return fmt.Errorf("proxy is required")
}
// Validate proxy is a valid URL and store the parsed value
parsedURL, err := url.Parse(defaults.Proxy)
if err != nil {
return fmt.Errorf("invalid peer proxy URL (%s): %w", defaults.Proxy, err)
}
defaults.ProxyURL = parsedURL
// Validate models is not empty
if len(defaults.Models) == 0 {
return fmt.Errorf("peer models can not be empty")
}
*c = PeerConfig(defaults)
return nil
}
+209
View File
@@ -0,0 +1,209 @@
package config
import (
"testing"
"gopkg.in/yaml.v3"
)
func TestPeerConfig_UnmarshalYAML(t *testing.T) {
tests := []struct {
name string
yaml string
wantErr string
}{
{
name: "valid config",
yaml: `
proxy: http://192.168.1.23
models:
- model_a
- model_b
`,
wantErr: "",
},
{
name: "valid config with apiKey",
yaml: `
proxy: https://openrouter.ai/api
apiKey: sk-test-key
models:
- meta-llama/llama-3.1-8b-instruct
`,
wantErr: "",
},
{
name: "missing proxy",
yaml: `
models:
- model_a
`,
wantErr: "proxy is required",
},
{
name: "empty proxy",
yaml: `
proxy: ""
models:
- model_a
`,
wantErr: "proxy is required",
},
{
name: "invalid proxy URL",
yaml: `
proxy: "://invalid"
models:
- model_a
`,
wantErr: "invalid peer proxy URL",
},
{
name: "missing models",
yaml: `
proxy: http://localhost:8080
`,
wantErr: "peer models can not be empty",
},
{
name: "empty models",
yaml: `
proxy: http://localhost:8080
models: []
`,
wantErr: "peer models can not be empty",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var config PeerConfig
err := yaml.Unmarshal([]byte(tt.yaml), &config)
if tt.wantErr == "" {
if err != nil {
t.Errorf("unexpected error: %v", err)
}
} else {
if err == nil {
t.Errorf("expected error containing %q, got nil", tt.wantErr)
} else if !contains(err.Error(), tt.wantErr) {
t.Errorf("expected error containing %q, got %q", tt.wantErr, err.Error())
}
}
})
}
}
func TestPeerConfig_ProxyURL(t *testing.T) {
yamlData := `
proxy: http://192.168.1.23:8080/api
apiKey: sk-test
models:
- model_a
`
var config PeerConfig
err := yaml.Unmarshal([]byte(yamlData), &config)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if config.ProxyURL == nil {
t.Fatal("ProxyURL should not be nil")
}
if config.ProxyURL.Host != "192.168.1.23:8080" {
t.Errorf("expected host %q, got %q", "192.168.1.23:8080", config.ProxyURL.Host)
}
if config.ProxyURL.Scheme != "http" {
t.Errorf("expected scheme %q, got %q", "http", config.ProxyURL.Scheme)
}
if config.ProxyURL.Path != "/api" {
t.Errorf("expected path %q, got %q", "/api", config.ProxyURL.Path)
}
}
func contains(s, substr string) bool {
return len(s) >= len(substr) && searchSubstring(s, substr)
}
func searchSubstring(s, substr string) bool {
for i := 0; i <= len(s)-len(substr); i++ {
if s[i:i+len(substr)] == substr {
return true
}
}
return false
}
func TestPeerConfig_WithFilters(t *testing.T) {
yamlData := `
proxy: https://openrouter.ai/api
apiKey: sk-test
models:
- model_a
filters:
setParams:
temperature: 0.7
provider:
data_collection: deny
`
var config PeerConfig
err := yaml.Unmarshal([]byte(yamlData), &config)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if config.Filters.SetParams == nil {
t.Fatal("Filters.SetParams should not be nil")
}
if config.Filters.SetParams["temperature"] != 0.7 {
t.Errorf("expected temperature 0.7, got %v", config.Filters.SetParams["temperature"])
}
provider, ok := config.Filters.SetParams["provider"].(map[string]any)
if !ok {
t.Fatal("provider should be a map")
}
if provider["data_collection"] != "deny" {
t.Errorf("expected data_collection deny, got %v", provider["data_collection"])
}
}
func TestPeerConfig_WithBothFilters(t *testing.T) {
yamlData := `
proxy: https://openrouter.ai/api
apiKey: sk-test
models:
- model_a
filters:
stripParams: "temperature, top_p"
setParams:
max_tokens: 1000
`
var config PeerConfig
err := yaml.Unmarshal([]byte(yamlData), &config)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
// Check stripParams
stripParams := config.Filters.SanitizedStripParams()
if len(stripParams) != 2 {
t.Errorf("expected 2 strip params, got %d", len(stripParams))
}
if stripParams[0] != "temperature" || stripParams[1] != "top_p" {
t.Errorf("unexpected strip params: %v", stripParams)
}
// Check setParams
if config.Filters.SetParams == nil {
t.Fatal("Filters.SetParams should not be nil")
}
if config.Filters.SetParams["max_tokens"] != 1000 {
t.Errorf("expected max_tokens 1000, got %v", config.Filters.SetParams["max_tokens"])
}
}
-483
View File
@@ -1,483 +0,0 @@
package proxy
import (
"slices"
"strings"
"testing"
"github.com/stretchr/testify/assert"
)
func TestConfig_GroupMemberIsUnique(t *testing.T) {
content := `
models:
model1:
cmd: path/to/cmd --arg1 one
proxy: "http://localhost:8080"
model2:
cmd: path/to/cmd --arg1 one
proxy: "http://localhost:8081"
checkEndpoint: "/"
model3:
cmd: path/to/cmd --arg1 one
proxy: "http://localhost:8081"
checkEndpoint: "/"
healthCheckTimeout: 15
groups:
group1:
swap: true
exclusive: false
members: ["model2"]
group2:
swap: true
exclusive: false
members: ["model2"]
`
// Load the config and verify
_, err := LoadConfigFromReader(strings.NewReader(content))
// a Contains as order of the map is not guaranteed
assert.Contains(t, err.Error(), "model member model2 is used in multiple groups:")
}
func TestConfig_ModelAliasesAreUnique(t *testing.T) {
content := `
models:
model1:
cmd: path/to/cmd --arg1 one
proxy: "http://localhost:8080"
aliases:
- m1
model2:
cmd: path/to/cmd --arg1 one
proxy: "http://localhost:8081"
checkEndpoint: "/"
aliases:
- m1
- m2
`
// Load the config and verify
_, err := LoadConfigFromReader(strings.NewReader(content))
// this is a contains because it could be `model1` or `model2` depending on the order
// go decided on the order of the map
assert.Contains(t, err.Error(), "duplicate alias m1 found in model: model")
}
func TestConfig_ModelConfigSanitizedCommand(t *testing.T) {
config := &ModelConfig{
Cmd: `python model1.py \
--arg1 value1 \
--arg2 value2`,
}
args, err := config.SanitizedCommand()
assert.NoError(t, err)
assert.Equal(t, []string{"python", "model1.py", "--arg1", "value1", "--arg2", "value2"}, args)
}
func TestConfig_FindConfig(t *testing.T) {
// TODO?
// make make this shared between the different tests
config := &Config{
Models: map[string]ModelConfig{
"model1": {
Cmd: "python model1.py",
Proxy: "http://localhost:8080",
Aliases: []string{"m1", "model-one"},
Env: []string{"VAR1=value1", "VAR2=value2"},
CheckEndpoint: "/health",
},
"model2": {
Cmd: "python model2.py",
Proxy: "http://localhost:8081",
Aliases: []string{"m2", "model-two"},
Env: []string{"VAR3=value3", "VAR4=value4"},
CheckEndpoint: "/status",
},
},
HealthCheckTimeout: 10,
aliases: map[string]string{
"m1": "model1",
"model-one": "model1",
"m2": "model2",
},
}
// Test finding a model by its name
modelConfig, modelId, found := config.FindConfig("model1")
assert.True(t, found)
assert.Equal(t, "model1", modelId)
assert.Equal(t, config.Models["model1"], modelConfig)
// Test finding a model by its alias
modelConfig, modelId, found = config.FindConfig("m1")
assert.True(t, found)
assert.Equal(t, "model1", modelId)
assert.Equal(t, config.Models["model1"], modelConfig)
// Test finding a model that does not exist
modelConfig, modelId, found = config.FindConfig("model3")
assert.False(t, found)
assert.Equal(t, "", modelId)
assert.Equal(t, ModelConfig{}, modelConfig)
}
func TestConfig_AutomaticPortAssignments(t *testing.T) {
t.Run("Default Port Ranges", func(t *testing.T) {
content := ``
config, err := LoadConfigFromReader(strings.NewReader(content))
if !assert.NoError(t, err) {
t.Fatalf("Failed to load config: %v", err)
}
assert.Equal(t, 5800, config.StartPort)
})
t.Run("User specific port ranges", func(t *testing.T) {
content := `startPort: 1000`
config, err := LoadConfigFromReader(strings.NewReader(content))
if !assert.NoError(t, err) {
t.Fatalf("Failed to load config: %v", err)
}
assert.Equal(t, 1000, config.StartPort)
})
t.Run("Invalid start port", func(t *testing.T) {
content := `startPort: abcd`
_, err := LoadConfigFromReader(strings.NewReader(content))
assert.NotNil(t, err)
})
t.Run("start port must be greater than 1", func(t *testing.T) {
content := `startPort: -99`
_, err := LoadConfigFromReader(strings.NewReader(content))
assert.NotNil(t, err)
})
t.Run("Automatic port assignments", func(t *testing.T) {
content := `
startPort: 5800
models:
model1:
cmd: svr --port ${PORT}
model2:
cmd: svr --port ${PORT}
proxy: "http://172.11.22.33:${PORT}"
model3:
cmd: svr --port 1999
proxy: "http://1.2.3.4:1999"
`
config, err := LoadConfigFromReader(strings.NewReader(content))
if !assert.NoError(t, err) {
t.Fatalf("Failed to load config: %v", err)
}
assert.Equal(t, 5800, config.StartPort)
assert.Equal(t, "svr --port 5800", config.Models["model1"].Cmd)
assert.Equal(t, "http://localhost:5800", config.Models["model1"].Proxy)
assert.Equal(t, "svr --port 5801", config.Models["model2"].Cmd)
assert.Equal(t, "http://172.11.22.33:5801", config.Models["model2"].Proxy)
assert.Equal(t, "svr --port 1999", config.Models["model3"].Cmd)
assert.Equal(t, "http://1.2.3.4:1999", config.Models["model3"].Proxy)
})
t.Run("Proxy value required if no ${PORT} in cmd", func(t *testing.T) {
content := `
models:
model1:
cmd: svr --port 111
`
_, err := LoadConfigFromReader(strings.NewReader(content))
assert.Equal(t, "model model1: proxy uses ${PORT} but cmd does not - ${PORT} is only available when used in cmd", err.Error())
})
}
func TestConfig_MacroReplacement(t *testing.T) {
content := `
startPort: 9990
macros:
svr-path: "path/to/server"
argOne: "--arg1"
argTwo: "--arg2"
autoPort: "--port ${PORT}"
models:
model1:
cmd: |
${svr-path} ${argTwo}
# the automatic ${PORT} is replaced
${autoPort}
${argOne}
--arg3 three
cmdStop: |
/path/to/stop.sh --port ${PORT} ${argTwo}
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
sanitizedCmd, err := SanitizeCommand(config.Models["model1"].Cmd)
assert.NoError(t, err)
assert.Equal(t, "path/to/server --arg2 --port 9990 --arg1 --arg3 three", strings.Join(sanitizedCmd, " "))
sanitizedCmdStop, err := SanitizeCommand(config.Models["model1"].CmdStop)
assert.NoError(t, err)
assert.Equal(t, "/path/to/stop.sh --port 9990 --arg2", strings.Join(sanitizedCmdStop, " "))
}
func TestConfig_MacroErrorOnUnknownMacros(t *testing.T) {
tests := []struct {
name string
field string
content string
}{
{
name: "unknown macro in cmd",
field: "cmd",
content: `
startPort: 9990
macros:
svr-path: "path/to/server"
models:
model1:
cmd: |
${svr-path} --port ${PORT}
${unknownMacro}
`,
},
{
name: "unknown macro in cmdStop",
field: "cmdStop",
content: `
startPort: 9990
macros:
svr-path: "path/to/server"
models:
model1:
cmd: "${svr-path} --port ${PORT}"
cmdStop: "kill ${unknownMacro}"
`,
},
{
name: "unknown macro in proxy",
field: "proxy",
content: `
startPort: 9990
macros:
svr-path: "path/to/server"
models:
model1:
cmd: "${svr-path} --port ${PORT}"
proxy: "http://localhost:${unknownMacro}"
`,
},
{
name: "unknown macro in checkEndpoint",
field: "checkEndpoint",
content: `
startPort: 9990
macros:
svr-path: "path/to/server"
models:
model1:
cmd: "${svr-path} --port ${PORT}"
checkEndpoint: "http://localhost:${unknownMacro}/health"
`,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
_, err := LoadConfigFromReader(strings.NewReader(tt.content))
assert.Error(t, err)
assert.Contains(t, err.Error(), "unknown macro '${unknownMacro}' found in model1."+tt.field)
//t.Log(err)
})
}
}
func TestConfig_ModelFilters(t *testing.T) {
content := `
macros:
default_strip: "temperature, top_p"
models:
model1:
cmd: path/to/cmd --port ${PORT}
filters:
strip_params: "model, top_k, ${default_strip}, , ,"
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
modelConfig, ok := config.Models["model1"]
if !assert.True(t, ok) {
t.FailNow()
}
// make sure `model` and enmpty strings are not in the list
assert.Equal(t, "model, top_k, temperature, top_p, , ,", modelConfig.Filters.StripParams)
sanitized, err := modelConfig.Filters.SanitizedStripParams()
if assert.NoError(t, err) {
assert.Equal(t, []string{"temperature", "top_k", "top_p"}, sanitized)
}
}
func TestStripComments(t *testing.T) {
tests := []struct {
name string
input string
expected string
}{
{
name: "no comments",
input: "echo hello\necho world",
expected: "echo hello\necho world",
},
{
name: "single comment line",
input: "# this is a comment\necho hello",
expected: "echo hello",
},
{
name: "multiple comment lines",
input: "# comment 1\necho hello\n# comment 2\necho world",
expected: "echo hello\necho world",
},
{
name: "comment with spaces",
input: " # indented comment\necho hello",
expected: "echo hello",
},
{
name: "empty lines preserved",
input: "echo hello\n\necho world",
expected: "echo hello\n\necho world",
},
{
name: "only comments",
input: "# comment 1\n# comment 2",
expected: "",
},
{
name: "empty string",
input: "",
expected: "",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := StripComments(tt.input)
if result != tt.expected {
t.Errorf("StripComments() = %q, expected %q", result, tt.expected)
}
})
}
}
func TestConfig_MacroInCommentStrippedBeforeExpansion(t *testing.T) {
// Test case that reproduces the original bug where a macro in a comment
// would get expanded and cause the comment text to be included in the command
content := `
startPort: 9990
macros:
"latest-llama": >
/user/llama.cpp/build/bin/llama-server
--port ${PORT}
models:
"test-model":
cmd: |
# ${latest-llama} is a macro that is defined above
${latest-llama}
--model /path/to/model.gguf
-ngl 99
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
// Get the sanitized command
sanitizedCmd, err := SanitizeCommand(config.Models["test-model"].Cmd)
assert.NoError(t, err)
// Join the command for easier inspection
cmdStr := strings.Join(sanitizedCmd, " ")
// Verify that comment text is NOT present in the final command as separate arguments
commentWords := []string{"is", "macro", "that", "defined", "above"}
for _, word := range commentWords {
found := slices.Contains(sanitizedCmd, word)
assert.False(t, found, "Comment text '%s' should not be present as a separate argument in final command", word)
}
// Verify that the actual command components ARE present
expectedParts := []string{
"/user/llama.cpp/build/bin/llama-server",
"--port",
"9990",
"--model",
"/path/to/model.gguf",
"-ngl",
"99",
}
for _, part := range expectedParts {
assert.Contains(t, cmdStr, part, "Expected command part '%s' not found in final command", part)
}
// Verify the server path appears exactly once (not duplicated due to macro expansion)
serverPath := "/user/llama.cpp/build/bin/llama-server"
count := strings.Count(cmdStr, serverPath)
assert.Equal(t, 1, count, "Expected exactly 1 occurrence of server path, found %d", count)
// Verify the expected final command structure
expectedCmd := "/user/llama.cpp/build/bin/llama-server --port 9990 --model /path/to/model.gguf -ngl 99"
assert.Equal(t, expectedCmd, cmdStr, "Final command does not match expected structure")
}
func TestConfig_MacroModelId(t *testing.T) {
content := `
startPort: 9000
macros:
"docker-llama": docker run --name ${MODEL_ID} -p ${PORT}:8080 docker_img
"docker-stop": docker stop ${MODEL_ID}
models:
model1:
cmd: /path/to/server -p ${PORT} -hf ${MODEL_ID}
model2:
cmd: ${docker-llama}
cmdStop: ${docker-stop}
author/model:F16:
cmd: /path/to/server -p ${PORT} -hf ${MODEL_ID}
cmdStop: stop
`
config, err := LoadConfigFromReader(strings.NewReader(content))
assert.NoError(t, err)
sanitizedCmd, err := SanitizeCommand(config.Models["model1"].Cmd)
assert.NoError(t, err)
assert.Equal(t, "/path/to/server -p 9001 -hf model1", strings.Join(sanitizedCmd, " "))
assert.Equal(t, "docker stop ${MODEL_ID}", config.Macros["docker-stop"])
sanitizedCmd2, err := SanitizeCommand(config.Models["model2"].Cmd)
assert.NoError(t, err)
assert.Equal(t, "docker run --name model2 -p 9002:8080 docker_img", strings.Join(sanitizedCmd2, " "))
sanitizedCmdStop, err := SanitizeCommand(config.Models["model2"].CmdStop)
assert.NoError(t, err)
assert.Equal(t, "docker stop model2", strings.Join(sanitizedCmdStop, " "))
sanitizedCmd3, err := SanitizeCommand(config.Models["author/model:F16"].Cmd)
assert.NoError(t, err)
assert.Equal(t, "/path/to/server -p 9000 -hf author/model:F16", strings.Join(sanitizedCmd3, " "))
}
+9
View File
@@ -8,6 +8,7 @@ const ConfigFileChangedEventID = 0x03
const LogDataEventID = 0x04
const TokenMetricsEventID = 0x05
const ModelPreloadedEventID = 0x06
const InFlightRequestsEventID = 0x07
type ProcessStateChangeEvent struct {
ProcessName string
@@ -58,3 +59,11 @@ type ModelPreloadedEvent struct {
func (e ModelPreloadedEvent) Type() uint32 {
return ModelPreloadedEventID
}
type InFlightRequestsEvent struct {
Total int
}
func (e InFlightRequestsEvent) Type() uint32 {
return InFlightRequestsEventID
}
+9 -4
View File
@@ -9,6 +9,7 @@ import (
"testing"
"github.com/gin-gonic/gin"
"github.com/mostlygeek/llama-swap/proxy/config"
"gopkg.in/yaml.v3"
)
@@ -65,18 +66,22 @@ func getTestPort() int {
return port
}
func getTestSimpleResponderConfig(expectedMessage string) ModelConfig {
func getTestSimpleResponderConfig(expectedMessage string) config.ModelConfig {
return getTestSimpleResponderConfigPort(expectedMessage, getTestPort())
}
func getTestSimpleResponderConfigPort(expectedMessage string, port int) ModelConfig {
func getTestSimpleResponderConfigPort(expectedMessage string, port int) config.ModelConfig {
// Convert path to forward slashes for cross-platform compatibility
// Windows handles forward slashes in paths correctly
cmdPath := filepath.ToSlash(simpleResponderPath)
// Create a YAML string with just the values we want to set
yamlStr := fmt.Sprintf(`
cmd: '%s --port %d --silent --respond %s'
proxy: "http://127.0.0.1:%d"
`, simpleResponderPath, port, expectedMessage, port)
`, cmdPath, port, expectedMessage, port)
var cfg ModelConfig
var cfg config.ModelConfig
if err := yaml.Unmarshal([]byte(yamlStr), &cfg); err != nil {
panic(fmt.Sprintf("failed to unmarshal test config: %v in [%s]", err, yamlStr))
}
+121 -21
View File
@@ -1,16 +1,95 @@
package proxy
import (
"container/ring"
"context"
"fmt"
"io"
"os"
"sync"
"time"
"github.com/mostlygeek/llama-swap/event"
)
// circularBuffer is a fixed-size circular byte buffer that overwrites
// oldest data when full. It provides O(1) writes and O(n) reads.
type circularBuffer struct {
data []byte // pre-allocated capacity
head int // next write position
size int // current number of bytes stored (0 to cap)
}
func newCircularBuffer(capacity int) *circularBuffer {
return &circularBuffer{
data: make([]byte, capacity),
head: 0,
size: 0,
}
}
// Write appends bytes to the buffer, overwriting oldest data when full.
// Data is copied into the internal buffer (not stored by reference).
func (cb *circularBuffer) Write(p []byte) {
if len(p) == 0 {
return
}
cap := len(cb.data)
// If input is larger than capacity, only keep the last cap bytes
if len(p) >= cap {
copy(cb.data, p[len(p)-cap:])
cb.head = 0
cb.size = cap
return
}
// Calculate how much space is available from head to end of buffer
firstPart := cap - cb.head
if firstPart >= len(p) {
// All data fits without wrapping
copy(cb.data[cb.head:], p)
cb.head = (cb.head + len(p)) % cap
} else {
// Data wraps around
copy(cb.data[cb.head:], p[:firstPart])
copy(cb.data[:len(p)-firstPart], p[firstPart:])
cb.head = len(p) - firstPart
}
// Update size
cb.size += len(p)
if cb.size > cap {
cb.size = cap
}
}
// GetHistory returns all buffered data in correct order (oldest to newest).
// Returns a new slice (copy), not a view into internal buffer.
func (cb *circularBuffer) GetHistory() []byte {
if cb.size == 0 {
return nil
}
result := make([]byte, cb.size)
cap := len(cb.data)
// Calculate start position (oldest data)
start := (cb.head - cb.size + cap) % cap
if start+cb.size <= cap {
// Data is contiguous, single copy
copy(result, cb.data[start:start+cb.size])
} else {
// Data wraps around, two copies
firstPart := cap - start
copy(result[:firstPart], cb.data[start:])
copy(result[firstPart:], cb.data[:cb.size-firstPart])
}
return result
}
type LogLevel int
const (
@@ -18,12 +97,14 @@ const (
LevelInfo
LevelWarn
LevelError
LogBufferSize = 100 * 1024
)
type LogMonitor struct {
eventbus *event.Dispatcher
mu sync.RWMutex
buffer *ring.Ring
buffer *circularBuffer
bufferMu sync.RWMutex
// typically this can be os.Stdout
@@ -32,6 +113,9 @@ type LogMonitor struct {
// logging levels
level LogLevel
prefix string
// timestamps
timeFormat string
}
func NewLogMonitor() *LogMonitor {
@@ -40,11 +124,12 @@ func NewLogMonitor() *LogMonitor {
func NewLogMonitorWriter(stdout io.Writer) *LogMonitor {
return &LogMonitor{
eventbus: event.NewDispatcherConfig(1000),
buffer: ring.New(10 * 1024), // keep 10KB of buffered logs
stdout: stdout,
level: LevelInfo,
prefix: "",
eventbus: event.NewDispatcherConfig(1000),
buffer: nil, // lazy initialized on first Write
stdout: stdout,
level: LevelInfo,
prefix: "",
timeFormat: "",
}
}
@@ -59,12 +144,15 @@ func (w *LogMonitor) Write(p []byte) (n int, err error) {
}
w.bufferMu.Lock()
bufferCopy := make([]byte, len(p))
copy(bufferCopy, p)
w.buffer.Value = bufferCopy
w.buffer = w.buffer.Next()
if w.buffer == nil {
w.buffer = newCircularBuffer(LogBufferSize)
}
w.buffer.Write(p)
w.bufferMu.Unlock()
// Make a copy for broadcast to preserve immutability
bufferCopy := make([]byte, len(p))
copy(bufferCopy, p)
w.broadcast(bufferCopy)
return n, nil
}
@@ -72,16 +160,18 @@ func (w *LogMonitor) Write(p []byte) (n int, err error) {
func (w *LogMonitor) GetHistory() []byte {
w.bufferMu.RLock()
defer w.bufferMu.RUnlock()
if w.buffer == nil {
return nil
}
return w.buffer.GetHistory()
}
var history []byte
w.buffer.Do(func(p any) {
if p != nil {
if content, ok := p.([]byte); ok {
history = append(history, content...)
}
}
})
return history
// Clear releases the buffer memory, making it eligible for GC.
// The buffer will be lazily re-allocated on the next Write.
func (w *LogMonitor) Clear() {
w.bufferMu.Lock()
w.buffer = nil
w.bufferMu.Unlock()
}
func (w *LogMonitor) OnLogData(callback func(data []byte)) context.CancelFunc {
@@ -106,12 +196,22 @@ func (w *LogMonitor) SetLogLevel(level LogLevel) {
w.level = level
}
func (w *LogMonitor) SetLogTimeFormat(timeFormat string) {
w.mu.Lock()
defer w.mu.Unlock()
w.timeFormat = timeFormat
}
func (w *LogMonitor) formatMessage(level string, msg string) []byte {
prefix := ""
if w.prefix != "" {
prefix = fmt.Sprintf("[%s] ", w.prefix)
}
return []byte(fmt.Sprintf("%s[%s] %s\n", prefix, level, msg))
timestamp := ""
if w.timeFormat != "" {
timestamp = fmt.Sprintf("%s ", time.Now().Format(w.timeFormat))
}
return []byte(fmt.Sprintf("%s%s[%s] %s\n", timestamp, prefix, level, msg))
}
func (w *LogMonitor) log(level LogLevel, msg string) {
+230
View File
@@ -3,8 +3,10 @@ package proxy
import (
"bytes"
"io"
"strings"
"sync"
"testing"
"time"
)
func TestLogMonitor(t *testing.T) {
@@ -84,3 +86,231 @@ func TestWrite_ImmutableBuffer(t *testing.T) {
t.Errorf("Expected history to be %q, got %q", expected, history)
}
}
func TestWrite_LogTimeFormat(t *testing.T) {
// Create a new LogMonitor instance
lm := NewLogMonitorWriter(io.Discard)
// Enable timestamps
lm.timeFormat = time.RFC3339
// Write the message to the LogMonitor
lm.Info("Hello, World!")
// Get the history from the LogMonitor
history := lm.GetHistory()
timestamp := ""
fields := strings.Fields(string(history))
if len(fields) > 0 {
timestamp = fields[0]
} else {
t.Fatalf("Cannot extract string from history")
}
_, err := time.Parse(time.RFC3339, timestamp)
if err != nil {
t.Fatalf("Cannot find timestamp: %v", err)
}
}
func TestCircularBuffer_WrapAround(t *testing.T) {
// Create a small buffer to test wrap-around
cb := newCircularBuffer(10)
// Write "hello" (5 bytes)
cb.Write([]byte("hello"))
if got := string(cb.GetHistory()); got != "hello" {
t.Errorf("Expected 'hello', got %q", got)
}
// Write "world" (5 bytes) - buffer now full
cb.Write([]byte("world"))
if got := string(cb.GetHistory()); got != "helloworld" {
t.Errorf("Expected 'helloworld', got %q", got)
}
// Write "12345" (5 bytes) - should overwrite "hello"
cb.Write([]byte("12345"))
if got := string(cb.GetHistory()); got != "world12345" {
t.Errorf("Expected 'world12345', got %q", got)
}
// Write data larger than buffer capacity
cb.Write([]byte("abcdefghijklmnop")) // 16 bytes, only last 10 kept
if got := string(cb.GetHistory()); got != "ghijklmnop" {
t.Errorf("Expected 'ghijklmnop', got %q", got)
}
}
func TestCircularBuffer_BoundaryConditions(t *testing.T) {
// Test empty buffer
cb := newCircularBuffer(10)
if got := cb.GetHistory(); got != nil {
t.Errorf("Expected nil for empty buffer, got %q", got)
}
// Test exact capacity
cb.Write([]byte("1234567890"))
if got := string(cb.GetHistory()); got != "1234567890" {
t.Errorf("Expected '1234567890', got %q", got)
}
// Test write exactly at capacity boundary
cb = newCircularBuffer(10)
cb.Write([]byte("12345"))
cb.Write([]byte("67890"))
if got := string(cb.GetHistory()); got != "1234567890" {
t.Errorf("Expected '1234567890', got %q", got)
}
}
func TestLogMonitor_LazyInit(t *testing.T) {
lm := NewLogMonitorWriter(io.Discard)
// Buffer should be nil before any writes
if lm.buffer != nil {
t.Error("Expected buffer to be nil before first write")
}
// GetHistory should return nil when buffer is nil
if got := lm.GetHistory(); got != nil {
t.Errorf("Expected nil history before first write, got %q", got)
}
// Write should lazily initialize the buffer
lm.Write([]byte("test"))
if lm.buffer == nil {
t.Error("Expected buffer to be initialized after write")
}
if got := string(lm.GetHistory()); got != "test" {
t.Errorf("Expected 'test', got %q", got)
}
}
func TestLogMonitor_Clear(t *testing.T) {
lm := NewLogMonitorWriter(io.Discard)
// Write some data
lm.Write([]byte("hello"))
if got := string(lm.GetHistory()); got != "hello" {
t.Errorf("Expected 'hello', got %q", got)
}
// Clear should release the buffer
lm.Clear()
if lm.buffer != nil {
t.Error("Expected buffer to be nil after Clear")
}
if got := lm.GetHistory(); got != nil {
t.Errorf("Expected nil history after Clear, got %q", got)
}
}
func TestLogMonitor_ClearAndReuse(t *testing.T) {
lm := NewLogMonitorWriter(io.Discard)
// Write, clear, then write again
lm.Write([]byte("first"))
lm.Clear()
lm.Write([]byte("second"))
if got := string(lm.GetHistory()); got != "second" {
t.Errorf("Expected 'second' after clear and reuse, got %q", got)
}
}
func BenchmarkLogMonitorWrite(b *testing.B) {
// Test data of varying sizes
smallMsg := []byte("small message\n")
mediumMsg := []byte(strings.Repeat("medium message content ", 10) + "\n")
largeMsg := []byte(strings.Repeat("large message content for benchmarking ", 100) + "\n")
b.Run("SmallWrite", func(b *testing.B) {
lm := NewLogMonitorWriter(io.Discard)
b.ResetTimer()
for i := 0; i < b.N; i++ {
lm.Write(smallMsg)
}
})
b.Run("MediumWrite", func(b *testing.B) {
lm := NewLogMonitorWriter(io.Discard)
b.ResetTimer()
for i := 0; i < b.N; i++ {
lm.Write(mediumMsg)
}
})
b.Run("LargeWrite", func(b *testing.B) {
lm := NewLogMonitorWriter(io.Discard)
b.ResetTimer()
for i := 0; i < b.N; i++ {
lm.Write(largeMsg)
}
})
b.Run("WithSubscribers", func(b *testing.B) {
lm := NewLogMonitorWriter(io.Discard)
// Add some subscribers
for i := 0; i < 5; i++ {
lm.OnLogData(func(data []byte) {})
}
b.ResetTimer()
for i := 0; i < b.N; i++ {
lm.Write(mediumMsg)
}
})
b.Run("GetHistory", func(b *testing.B) {
lm := NewLogMonitorWriter(io.Discard)
// Pre-populate with data
for i := 0; i < 1000; i++ {
lm.Write(mediumMsg)
}
b.ResetTimer()
for i := 0; i < b.N; i++ {
lm.GetHistory()
}
})
}
/*
Benchmark Results - MBP M1 Pro
Before (ring.Ring):
| Benchmark | ns/op | bytes/op | allocs/op |
|---------------------------------|------------|----------|-----------|
| SmallWrite (14B) | 43 ns | 40 B | 2 |
| MediumWrite (241B) | 76 ns | 264 B | 2 |
| LargeWrite (4KB) | 504 ns | 4,120 B | 2 |
| WithSubscribers (5 subs) | 355 ns | 264 B | 2 |
| GetHistory (after 1000 writes) | 145,000 ns | 1.2 MB | 22 |
After (circularBuffer 10KB):
| Benchmark | ns/op | bytes/op | allocs/op |
|---------------------------------|------------|----------|-----------|
| SmallWrite (14B) | 26 ns | 16 B | 1 |
| MediumWrite (241B) | 67 ns | 240 B | 1 |
| LargeWrite (4KB) | 774 ns | 4,096 B | 1 |
| WithSubscribers (5 subs) | 325 ns | 240 B | 1 |
| GetHistory (after 1000 writes) | 1,042 ns | 10,240 B | 1 |
After (circularBuffer 100KB):
| Benchmark | ns/op | bytes/op | allocs/op |
|---------------------------------|------------|-----------|-----------|
| SmallWrite (14B) | 26 ns | 16 B | 1 |
| MediumWrite (241B) | 66 ns | 240 B | 1 |
| LargeWrite (4KB) | 753 ns | 4,096 B | 1 |
| WithSubscribers (5 subs) | 309 ns | 240 B | 1 |
| GetHistory (after 1000 writes) | 7,788 ns | 106,496 B | 1 |
Summary:
- GetHistory: 139x faster (10KB), 18x faster (100KB)
- Allocations: reduced from 2 to 1 across all operations
- Small/medium writes: ~1.1-1.6x faster
*/
-184
View File
@@ -1,184 +0,0 @@
package proxy
import (
"bytes"
"fmt"
"io"
"net/http"
"strings"
"time"
"github.com/gin-gonic/gin"
"github.com/tidwall/gjson"
)
type MetricsRecorder struct {
metricsMonitor *MetricsMonitor
realModelName string
// isStreaming bool
startTime time.Time
}
// MetricsMiddleware sets up the MetricsResponseWriter for capturing upstream requests
func MetricsMiddleware(pm *ProxyManager) gin.HandlerFunc {
return func(c *gin.Context) {
bodyBytes, err := io.ReadAll(c.Request.Body)
if err != nil {
pm.sendErrorResponse(c, http.StatusBadRequest, "could not ready request body")
c.Abort()
return
}
c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
requestedModel := gjson.GetBytes(bodyBytes, "model").String()
if requestedModel == "" {
pm.sendErrorResponse(c, http.StatusBadRequest, "missing or invalid 'model' key")
c.Abort()
return
}
realModelName, found := pm.config.RealModelName(requestedModel)
if !found {
pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("could not find real modelID for %s", requestedModel))
c.Abort()
return
}
writer := &MetricsResponseWriter{
ResponseWriter: c.Writer,
metricsRecorder: &MetricsRecorder{
metricsMonitor: pm.metricsMonitor,
realModelName: realModelName,
startTime: time.Now(),
},
}
c.Writer = writer
c.Next()
// check for streaming response
if strings.Contains(c.Writer.Header().Get("Content-Type"), "text/event-stream") {
writer.metricsRecorder.processStreamingResponse(writer.body)
} else {
writer.metricsRecorder.processNonStreamingResponse(writer.body)
}
}
}
func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
usage := jsonData.Get("usage")
timings := jsonData.Get("timings")
if !usage.Exists() && !timings.Exists() {
return false
}
// default values
cachedTokens := -1 // unknown or missing data
outputTokens := 0
inputTokens := 0
// timings data
tokensPerSecond := -1.0
promptPerSecond := -1.0
durationMs := int(time.Since(rec.startTime).Milliseconds())
if usage.Exists() {
outputTokens = int(jsonData.Get("usage.completion_tokens").Int())
inputTokens = int(jsonData.Get("usage.prompt_tokens").Int())
}
// use llama-server's timing data for tok/sec and duration as it is more accurate
if timings.Exists() {
inputTokens = int(jsonData.Get("timings.prompt_n").Int())
outputTokens = int(jsonData.Get("timings.predicted_n").Int())
promptPerSecond = jsonData.Get("timings.prompt_per_second").Float()
tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float()
durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float())
if cachedValue := jsonData.Get("timings.cache_n"); cachedValue.Exists() {
cachedTokens = int(cachedValue.Int())
}
}
rec.metricsMonitor.addMetrics(TokenMetrics{
Timestamp: time.Now(),
Model: rec.realModelName,
CachedTokens: cachedTokens,
InputTokens: inputTokens,
OutputTokens: outputTokens,
PromptPerSecond: promptPerSecond,
TokensPerSecond: tokensPerSecond,
DurationMs: durationMs,
})
return true
}
func (rec *MetricsRecorder) processStreamingResponse(body []byte) {
// Iterate **backwards** through the lines looking for the data payload with
// usage data
lines := bytes.Split(body, []byte("\n"))
for i := len(lines) - 1; i >= 0; i-- {
line := bytes.TrimSpace(lines[i])
if len(line) == 0 {
continue
}
// SSE payload always follows "data:"
prefix := []byte("data:")
if !bytes.HasPrefix(line, prefix) {
continue
}
data := bytes.TrimSpace(line[len(prefix):])
if len(data) == 0 {
continue
}
if bytes.Equal(data, []byte("[DONE]")) {
// [DONE] line itself contains nothing of interest.
continue
}
if gjson.ValidBytes(data) {
if rec.parseAndRecordMetrics(gjson.ParseBytes(data)) {
return // short circuit if a metric was recorded
}
}
}
}
func (rec *MetricsRecorder) processNonStreamingResponse(body []byte) {
if len(body) == 0 {
return
}
// Parse JSON to extract usage information
if gjson.ValidBytes(body) {
rec.parseAndRecordMetrics(gjson.ParseBytes(body))
}
}
// MetricsResponseWriter captures the entire response for non-streaming
type MetricsResponseWriter struct {
gin.ResponseWriter
body []byte
metricsRecorder *MetricsRecorder
}
func (w *MetricsResponseWriter) Write(b []byte) (int, error) {
n, err := w.ResponseWriter.Write(b)
if err != nil {
return n, err
}
w.body = append(w.body, b...)
return n, nil
}
func (w *MetricsResponseWriter) WriteHeader(statusCode int) {
w.ResponseWriter.WriteHeader(statusCode)
}
func (w *MetricsResponseWriter) Header() http.Header {
return w.ResponseWriter.Header()
}
+450 -18
View File
@@ -1,11 +1,20 @@
package proxy
import (
"bytes"
"compress/flate"
"compress/gzip"
"encoding/json"
"fmt"
"io"
"net/http"
"strings"
"sync"
"time"
"github.com/gin-gonic/gin"
"github.com/mostlygeek/llama-swap/event"
"github.com/tidwall/gjson"
)
// TokenMetrics represents parsed token statistics from llama-server logs
@@ -19,6 +28,28 @@ type TokenMetrics struct {
PromptPerSecond float64 `json:"prompt_per_second"`
TokensPerSecond float64 `json:"tokens_per_second"`
DurationMs int `json:"duration_ms"`
HasCapture bool `json:"has_capture"`
}
type ReqRespCapture struct {
ID int `json:"id"`
ReqPath string `json:"req_path"`
ReqHeaders map[string]string `json:"req_headers"`
ReqBody []byte `json:"req_body"`
RespHeaders map[string]string `json:"resp_headers"`
RespBody []byte `json:"resp_body"`
}
// Size returns the approximate memory usage of this capture in bytes
func (c *ReqRespCapture) Size() int {
size := len(c.ReqPath) + len(c.ReqBody) + len(c.RespBody)
for k, v := range c.ReqHeaders {
size += len(k) + len(v)
}
for k, v := range c.RespHeaders {
size += len(k) + len(v)
}
return size
}
// TokenMetricsEvent represents a token metrics event
@@ -30,29 +61,39 @@ func (e TokenMetricsEvent) Type() uint32 {
return TokenMetricsEventID // defined in events.go
}
// MetricsMonitor parses llama-server output for token statistics
type MetricsMonitor struct {
// metricsMonitor parses llama-server output for token statistics
type metricsMonitor struct {
mu sync.RWMutex
metrics []TokenMetrics
maxMetrics int
nextID int
logger *LogMonitor
// capture fields
enableCaptures bool
captures map[int]ReqRespCapture // map for O(1) lookup by ID
captureOrder []int // track insertion order for FIFO eviction
captureSize int // current total size in bytes
maxCaptureSize int // max bytes for captures
}
func NewMetricsMonitor(config *Config) *MetricsMonitor {
maxMetrics := config.MetricsMaxInMemory
if maxMetrics <= 0 {
maxMetrics = 1000 // Default fallback
// newMetricsMonitor creates a new metricsMonitor. captureBufferMB is the
// capture buffer size in megabytes; 0 disables captures.
func newMetricsMonitor(logger *LogMonitor, maxMetrics int, captureBufferMB int) *metricsMonitor {
return &metricsMonitor{
logger: logger,
maxMetrics: maxMetrics,
enableCaptures: captureBufferMB > 0,
captures: make(map[int]ReqRespCapture),
captureOrder: make([]int, 0),
captureSize: 0,
maxCaptureSize: captureBufferMB * 1024 * 1024,
}
mp := &MetricsMonitor{
maxMetrics: maxMetrics,
}
return mp
}
// addMetrics adds a new metric to the collection and publishes an event
func (mp *MetricsMonitor) addMetrics(metric TokenMetrics) {
// addMetrics adds a new metric to the collection and publishes an event.
// Returns the assigned metric ID.
func (mp *metricsMonitor) addMetrics(metric TokenMetrics) int {
mp.mu.Lock()
defer mp.mu.Unlock()
@@ -63,10 +104,53 @@ func (mp *MetricsMonitor) addMetrics(metric TokenMetrics) {
mp.metrics = mp.metrics[len(mp.metrics)-mp.maxMetrics:]
}
event.Emit(TokenMetricsEvent{Metrics: metric})
return metric.ID
}
// GetMetrics returns a copy of the current metrics
func (mp *MetricsMonitor) GetMetrics() []TokenMetrics {
// addCapture adds a new capture to the buffer with size-based eviction.
// Captures are skipped if enableCaptures is false or if capture exceeds maxCaptureSize.
func (mp *metricsMonitor) addCapture(capture ReqRespCapture) {
if !mp.enableCaptures {
return
}
mp.mu.Lock()
defer mp.mu.Unlock()
captureSize := capture.Size()
if captureSize > mp.maxCaptureSize {
mp.logger.Warnf("capture size %d exceeds max %d, skipping", captureSize, mp.maxCaptureSize)
return
}
// Evict oldest (FIFO) until room available
for mp.captureSize+captureSize > mp.maxCaptureSize && len(mp.captureOrder) > 0 {
oldestID := mp.captureOrder[0]
mp.captureOrder = mp.captureOrder[1:]
if evicted, exists := mp.captures[oldestID]; exists {
mp.captureSize -= evicted.Size()
delete(mp.captures, oldestID)
}
}
mp.captures[capture.ID] = capture
mp.captureOrder = append(mp.captureOrder, capture.ID)
mp.captureSize += captureSize
}
// getCaptureByID returns a capture by its ID, or nil if not found.
func (mp *metricsMonitor) getCaptureByID(id int) *ReqRespCapture {
mp.mu.RLock()
defer mp.mu.RUnlock()
if capture, exists := mp.captures[id]; exists {
return &capture
}
return nil
}
// getMetrics returns a copy of the current metrics
func (mp *metricsMonitor) getMetrics() []TokenMetrics {
mp.mu.RLock()
defer mp.mu.RUnlock()
@@ -75,9 +159,357 @@ func (mp *MetricsMonitor) GetMetrics() []TokenMetrics {
return result
}
// GetMetricsJSON returns metrics as JSON
func (mp *MetricsMonitor) GetMetricsJSON() ([]byte, error) {
// getMetricsJSON returns metrics as JSON
func (mp *metricsMonitor) getMetricsJSON() ([]byte, error) {
mp.mu.RLock()
defer mp.mu.RUnlock()
return json.Marshal(mp.metrics)
}
// wrapHandler wraps the proxy handler to extract token metrics
// if wrapHandler returns an error it is safe to assume that no
// data was sent to the client
func (mp *metricsMonitor) wrapHandler(
modelID string,
writer gin.ResponseWriter,
request *http.Request,
next func(modelID string, w http.ResponseWriter, r *http.Request) error,
) error {
// Capture request body and headers if captures enabled
var reqBody []byte
var reqHeaders map[string]string
if mp.enableCaptures {
if request.Body != nil {
var err error
reqBody, err = io.ReadAll(request.Body)
if err != nil {
return fmt.Errorf("failed to read request body for capture: %w", err)
}
request.Body.Close()
request.Body = io.NopCloser(bytes.NewBuffer(reqBody))
}
reqHeaders = make(map[string]string)
for key, values := range request.Header {
if len(values) > 0 {
reqHeaders[key] = values[0]
}
}
redactHeaders(reqHeaders)
}
recorder := newBodyCopier(writer)
// Filter Accept-Encoding to only include encodings we can decompress for metrics
if ae := request.Header.Get("Accept-Encoding"); ae != "" {
request.Header.Set("Accept-Encoding", filterAcceptEncoding(ae))
}
if err := next(modelID, recorder, request); err != nil {
return err
}
// after this point we have to assume that data was sent to the client
// and we can only log errors but not send them to clients
if recorder.Status() != http.StatusOK {
mp.logger.Warnf("metrics skipped, HTTP status=%d, path=%s", recorder.Status(), request.URL.Path)
return nil
}
// Initialize default metrics - these will always be recorded
tm := TokenMetrics{
Timestamp: time.Now(),
Model: modelID,
DurationMs: int(time.Since(recorder.StartTime()).Milliseconds()),
}
body := recorder.body.Bytes()
if len(body) == 0 {
mp.logger.Warn("metrics: empty body, recording minimal metrics")
mp.addMetrics(tm)
return nil
}
// Decompress if needed
if encoding := recorder.Header().Get("Content-Encoding"); encoding != "" {
var err error
body, err = decompressBody(body, encoding)
if err != nil {
mp.logger.Warnf("metrics: decompression failed: %v, path=%s, recording minimal metrics", err, request.URL.Path)
mp.addMetrics(tm)
return nil
}
}
if strings.Contains(recorder.Header().Get("Content-Type"), "text/event-stream") {
if parsed, err := processStreamingResponse(modelID, recorder.StartTime(), body); err != nil {
mp.logger.Warnf("error processing streaming response: %v, path=%s, recording minimal metrics", err, request.URL.Path)
} else {
tm = parsed
}
} else {
if gjson.ValidBytes(body) {
parsed := gjson.ParseBytes(body)
usage := parsed.Get("usage")
timings := parsed.Get("timings")
// extract timings for infill - response is an array, timings are in the last element
// see #463
if strings.HasPrefix(request.URL.Path, "/infill") {
if arr := parsed.Array(); len(arr) > 0 {
timings = arr[len(arr)-1].Get("timings")
}
}
if usage.Exists() || timings.Exists() {
if parsedMetrics, err := parseMetrics(modelID, recorder.StartTime(), usage, timings); err != nil {
mp.logger.Warnf("error parsing metrics: %v, path=%s, recording minimal metrics", err, request.URL.Path)
} else {
tm = parsedMetrics
}
}
} else {
mp.logger.Warnf("metrics: invalid JSON in response body path=%s, recording minimal metrics", request.URL.Path)
}
}
// Build capture if enabled and determine if it will be stored
var capture *ReqRespCapture
if mp.enableCaptures {
respHeaders := make(map[string]string)
for key, values := range recorder.Header() {
if len(values) > 0 {
respHeaders[key] = values[0]
}
}
redactHeaders(respHeaders)
delete(respHeaders, "Content-Encoding")
capture = &ReqRespCapture{
ReqPath: request.URL.Path,
ReqHeaders: reqHeaders,
ReqBody: reqBody,
RespHeaders: respHeaders,
RespBody: body,
}
// Only set HasCapture if the capture will actually be stored (not too large)
if capture.Size() <= mp.maxCaptureSize {
tm.HasCapture = true
}
}
metricID := mp.addMetrics(tm)
// Store capture if enabled
if capture != nil {
capture.ID = metricID
mp.addCapture(*capture)
}
return nil
}
func processStreamingResponse(modelID string, start time.Time, body []byte) (TokenMetrics, error) {
// Iterate **backwards** through the body looking for the data payload with
// usage data. This avoids allocating a slice of all lines via bytes.Split.
// Start from the end of the body and scan backwards for newlines
pos := len(body)
for pos > 0 {
// Find the previous newline (or start of body)
lineStart := bytes.LastIndexByte(body[:pos], '\n')
if lineStart == -1 {
lineStart = 0
} else {
lineStart++ // Move past the newline
}
line := bytes.TrimSpace(body[lineStart:pos])
pos = lineStart - 1 // Move position before the newline for next iteration
if len(line) == 0 {
continue
}
// SSE payload always follows "data:"
prefix := []byte("data:")
if !bytes.HasPrefix(line, prefix) {
continue
}
data := bytes.TrimSpace(line[len(prefix):])
if len(data) == 0 {
continue
}
if bytes.Equal(data, []byte("[DONE]")) {
// [DONE] line itself contains nothing of interest.
continue
}
if gjson.ValidBytes(data) {
parsed := gjson.ParseBytes(data)
usage := parsed.Get("usage")
timings := parsed.Get("timings")
if usage.Exists() || timings.Exists() {
return parseMetrics(modelID, start, usage, timings)
}
}
}
return TokenMetrics{}, fmt.Errorf("no valid JSON data found in stream")
}
func parseMetrics(modelID string, start time.Time, usage, timings gjson.Result) (TokenMetrics, error) {
// default values
cachedTokens := -1 // unknown or missing data
outputTokens := 0
inputTokens := 0
// timings data
tokensPerSecond := -1.0
promptPerSecond := -1.0
durationMs := int(time.Since(start).Milliseconds())
if usage.Exists() {
if pt := usage.Get("prompt_tokens"); pt.Exists() {
// v1/chat/completions
inputTokens = int(pt.Int())
} else if it := usage.Get("input_tokens"); it.Exists() {
// v1/messages
inputTokens = int(it.Int())
}
if ct := usage.Get("completion_tokens"); ct.Exists() {
// v1/chat/completions
outputTokens = int(ct.Int())
} else if ot := usage.Get("output_tokens"); ot.Exists() {
outputTokens = int(ot.Int())
}
if ct := usage.Get("cache_read_input_tokens"); ct.Exists() {
cachedTokens = int(ct.Int())
}
}
// use llama-server's timing data for tok/sec and duration as it is more accurate
if timings.Exists() {
inputTokens = int(timings.Get("prompt_n").Int())
outputTokens = int(timings.Get("predicted_n").Int())
promptPerSecond = timings.Get("prompt_per_second").Float()
tokensPerSecond = timings.Get("predicted_per_second").Float()
durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())
if cachedValue := timings.Get("cache_n"); cachedValue.Exists() {
cachedTokens = int(cachedValue.Int())
}
}
return TokenMetrics{
Timestamp: time.Now(),
Model: modelID,
CachedTokens: cachedTokens,
InputTokens: inputTokens,
OutputTokens: outputTokens,
PromptPerSecond: promptPerSecond,
TokensPerSecond: tokensPerSecond,
DurationMs: durationMs,
}, nil
}
// decompressBody decompresses the body based on Content-Encoding header
func decompressBody(body []byte, encoding string) ([]byte, error) {
switch strings.ToLower(strings.TrimSpace(encoding)) {
case "gzip":
reader, err := gzip.NewReader(bytes.NewReader(body))
if err != nil {
return nil, err
}
defer reader.Close()
return io.ReadAll(reader)
case "deflate":
reader := flate.NewReader(bytes.NewReader(body))
defer reader.Close()
return io.ReadAll(reader)
default:
return body, nil // Return as-is for unknown/no encoding
}
}
// responseBodyCopier records the response body and writes to the original response writer
// while also capturing it in a buffer for later processing
type responseBodyCopier struct {
gin.ResponseWriter
body *bytes.Buffer
tee io.Writer
start time.Time
}
func newBodyCopier(w gin.ResponseWriter) *responseBodyCopier {
bodyBuffer := &bytes.Buffer{}
return &responseBodyCopier{
ResponseWriter: w,
body: bodyBuffer,
tee: io.MultiWriter(w, bodyBuffer),
}
}
func (w *responseBodyCopier) Write(b []byte) (int, error) {
if w.start.IsZero() {
w.start = time.Now()
}
// Single write operation that writes to both the response and buffer
return w.tee.Write(b)
}
func (w *responseBodyCopier) WriteHeader(statusCode int) {
w.ResponseWriter.WriteHeader(statusCode)
}
func (w *responseBodyCopier) Header() http.Header {
return w.ResponseWriter.Header()
}
func (w *responseBodyCopier) StartTime() time.Time {
return w.start
}
// sensitiveHeaders lists headers that should be redacted in captures
var sensitiveHeaders = map[string]bool{
"authorization": true,
"proxy-authorization": true,
"cookie": true,
"set-cookie": true,
"x-api-key": true,
}
// redactHeaders replaces sensitive header values in-place with "[REDACTED]"
func redactHeaders(headers map[string]string) {
for key := range headers {
if sensitiveHeaders[strings.ToLower(key)] {
headers[key] = "[REDACTED]"
}
}
}
// filterAcceptEncoding filters the Accept-Encoding header to only include
// encodings we can decompress (gzip, deflate). This respects the client's
// preferences while ensuring we can parse response bodies for metrics.
func filterAcceptEncoding(acceptEncoding string) string {
if acceptEncoding == "" {
return ""
}
supported := map[string]bool{"gzip": true, "deflate": true}
var filtered []string
for _, part := range strings.Split(acceptEncoding, ",") {
// Parse encoding and optional quality value (e.g., "gzip;q=1.0")
encoding := strings.TrimSpace(strings.Split(part, ";")[0])
if supported[strings.ToLower(encoding)] {
filtered = append(filtered, strings.TrimSpace(part))
}
}
return strings.Join(filtered, ", ")
}
File diff suppressed because it is too large Load Diff
+141
View File
@@ -0,0 +1,141 @@
package proxy
import (
"fmt"
"net"
"net/http"
"net/http/httputil"
"runtime"
"sort"
"strings"
"time"
"github.com/mostlygeek/llama-swap/proxy/config"
)
type peerProxyMember struct {
peerID string
reverseProxy *httputil.ReverseProxy
apiKey string
}
type PeerProxy struct {
peers config.PeerDictionaryConfig
proxyMap map[string]*peerProxyMember
}
func NewPeerProxy(peers config.PeerDictionaryConfig, proxyLogger *LogMonitor) (*PeerProxy, error) {
proxyMap := make(map[string]*peerProxyMember)
// Sort peer IDs for consistent iteration order
peerIDs := make([]string, 0, len(peers))
for peerID := range peers {
peerIDs = append(peerIDs, peerID)
}
sort.Strings(peerIDs)
// Create a shared transport with reasonable timeouts for peer connections
// these can be tuned with feedback later
peerTransport := &http.Transport{
DialContext: (&net.Dialer{
Timeout: 30 * time.Second, // Connection timeout
KeepAlive: 30 * time.Second,
}).DialContext,
TLSHandshakeTimeout: 10 * time.Second,
ResponseHeaderTimeout: 60 * time.Second, // Time to wait for response headers
ExpectContinueTimeout: 1 * time.Second,
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
}
for _, peerID := range peerIDs {
peer := peers[peerID]
// Create reverse proxy for this peer
reverseProxy := httputil.NewSingleHostReverseProxy(peer.ProxyURL)
reverseProxy.Transport = peerTransport
// Wrap Director to set Host header for remote hosts (not localhost)
originalDirector := reverseProxy.Director
reverseProxy.Director = func(req *http.Request) {
originalDirector(req)
// Ensure Host header matches target URL for remote proxying
req.Host = req.URL.Host
}
reverseProxy.ModifyResponse = func(resp *http.Response) error {
if strings.Contains(strings.ToLower(resp.Header.Get("Content-Type")), "text/event-stream") {
resp.Header.Set("X-Accel-Buffering", "no")
}
return nil
}
reverseProxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
proxyLogger.Warnf("peer %s: proxy error: %v", peerID, err)
errMsg := fmt.Sprintf("peer proxy error: %v", err)
if runtime.GOOS == "darwin" && strings.Contains(err.Error(), "connect: no route to host") {
errMsg += " (hint: on macOS, check System Settings > Privacy & Security > Local Network permissions)"
}
http.Error(w, errMsg, http.StatusBadGateway)
}
pp := &peerProxyMember{
peerID: peerID,
reverseProxy: reverseProxy,
apiKey: peer.ApiKey,
}
// Map each model to this peer's proxy
for _, modelID := range peer.Models {
if _, found := proxyMap[modelID]; found {
proxyLogger.Warnf("peer %s: model %s already mapped to another peer, skipping", peerID, modelID)
continue
}
proxyMap[modelID] = pp
}
}
return &PeerProxy{
peers: peers,
proxyMap: proxyMap,
}, nil
}
func (p *PeerProxy) HasPeerModel(modelID string) bool {
_, found := p.proxyMap[modelID]
return found
}
// GetPeerFilters returns the filters for a peer model, or empty filters if not found
func (p *PeerProxy) GetPeerFilters(modelID string) config.Filters {
pp, found := p.proxyMap[modelID]
if !found {
return config.Filters{}
}
// Get the peer config using the peerID
peer, found := p.peers[pp.peerID]
if !found {
return config.Filters{}
}
return peer.Filters
}
func (p *PeerProxy) ListPeers() config.PeerDictionaryConfig {
return p.peers
}
func (p *PeerProxy) ProxyRequest(model_id string, writer http.ResponseWriter, request *http.Request) error {
pp, found := p.proxyMap[model_id]
if !found {
return fmt.Errorf("no peer proxy found for model %s", model_id)
}
// Inject API key if configured for this peer
if pp.apiKey != "" {
request.Header.Set("Authorization", "Bearer "+pp.apiKey)
request.Header.Set("x-api-key", pp.apiKey)
}
pp.reverseProxy.ServeHTTP(writer, request)
return nil
}
+268
View File
@@ -0,0 +1,268 @@
package proxy
import (
"net/http"
"net/http/httptest"
"net/url"
"strings"
"testing"
"github.com/mostlygeek/llama-swap/proxy/config"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestNewPeerProxy_EmptyPeers(t *testing.T) {
peers := config.PeerDictionaryConfig{}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
assert.NotNil(t, pm)
assert.Empty(t, pm.proxyMap)
}
func TestNewPeerProxy_SinglePeer(t *testing.T) {
proxyURL, _ := url.Parse("http://peer1.example.com:8080")
peers := config.PeerDictionaryConfig{
"peer1": config.PeerConfig{
Proxy: "http://peer1.example.com:8080",
ProxyURL: proxyURL,
ApiKey: "test-key",
Models: []string{"model-a", "model-b"},
},
}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
assert.Len(t, pm.proxyMap, 2)
assert.True(t, pm.HasPeerModel("model-a"))
assert.True(t, pm.HasPeerModel("model-b"))
assert.False(t, pm.HasPeerModel("model-c"))
}
func TestNewPeerProxy_MultiplePeers(t *testing.T) {
proxyURL1, _ := url.Parse("http://peer1.example.com:8080")
proxyURL2, _ := url.Parse("http://peer2.example.com:8080")
peers := config.PeerDictionaryConfig{
"peer1": config.PeerConfig{
Proxy: "http://peer1.example.com:8080",
ProxyURL: proxyURL1,
Models: []string{"model-a", "model-b"},
},
"peer2": config.PeerConfig{
Proxy: "http://peer2.example.com:8080",
ProxyURL: proxyURL2,
Models: []string{"model-c", "model-d"},
},
}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
assert.Len(t, pm.proxyMap, 4)
assert.True(t, pm.HasPeerModel("model-a"))
assert.True(t, pm.HasPeerModel("model-b"))
assert.True(t, pm.HasPeerModel("model-c"))
assert.True(t, pm.HasPeerModel("model-d"))
}
func TestNewPeerProxy_DuplicateModelWarning(t *testing.T) {
// When the same model is in multiple peers, only the first (lexicographically by peer ID)
// should be mapped, and a warning should be logged
proxyURL1, _ := url.Parse("http://peer1.example.com:8080")
proxyURL2, _ := url.Parse("http://peer2.example.com:8080")
peers := config.PeerDictionaryConfig{
"alpha-peer": config.PeerConfig{
Proxy: "http://peer1.example.com:8080",
ProxyURL: proxyURL1,
Models: []string{"duplicate-model"},
},
"beta-peer": config.PeerConfig{
Proxy: "http://peer2.example.com:8080",
ProxyURL: proxyURL2,
Models: []string{"duplicate-model"},
},
}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
// Should only have one entry for the duplicate model
assert.Len(t, pm.proxyMap, 1)
assert.True(t, pm.HasPeerModel("duplicate-model"))
}
func TestHasPeerModel(t *testing.T) {
proxyURL, _ := url.Parse("http://peer1.example.com:8080")
peers := config.PeerDictionaryConfig{
"peer1": config.PeerConfig{
Proxy: "http://peer1.example.com:8080",
ProxyURL: proxyURL,
Models: []string{"existing-model"},
},
}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
assert.True(t, pm.HasPeerModel("existing-model"))
assert.False(t, pm.HasPeerModel("non-existing-model"))
}
func TestProxyRequest_ModelNotFound(t *testing.T) {
peers := config.PeerDictionaryConfig{}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
req := httptest.NewRequest("POST", "/v1/chat/completions", nil)
w := httptest.NewRecorder()
err = pm.ProxyRequest("non-existing-model", w, req)
assert.Error(t, err)
assert.Contains(t, err.Error(), "no peer proxy found for model non-existing-model")
}
func TestProxyRequest_Success(t *testing.T) {
// Create a test server to act as the peer
testServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("response from peer"))
}))
defer testServer.Close()
proxyURL, _ := url.Parse(testServer.URL)
peers := config.PeerDictionaryConfig{
"peer1": config.PeerConfig{
Proxy: testServer.URL,
ProxyURL: proxyURL,
Models: []string{"test-model"},
},
}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
req := httptest.NewRequest("POST", "/v1/chat/completions", nil)
w := httptest.NewRecorder()
err = pm.ProxyRequest("test-model", w, req)
assert.NoError(t, err)
assert.Equal(t, http.StatusOK, w.Code)
assert.Equal(t, "response from peer", w.Body.String())
}
func TestProxyRequest_ApiKeyInjection(t *testing.T) {
// Create a test server that checks for the Authorization header
var receivedAuthHeader string
testServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
receivedAuthHeader = r.Header.Get("Authorization")
w.WriteHeader(http.StatusOK)
}))
defer testServer.Close()
proxyURL, _ := url.Parse(testServer.URL)
peers := config.PeerDictionaryConfig{
"peer1": config.PeerConfig{
Proxy: testServer.URL,
ProxyURL: proxyURL,
ApiKey: "secret-api-key",
Models: []string{"test-model"},
},
}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
req := httptest.NewRequest("POST", "/v1/chat/completions", nil)
w := httptest.NewRecorder()
err = pm.ProxyRequest("test-model", w, req)
assert.NoError(t, err)
assert.Equal(t, "Bearer secret-api-key", receivedAuthHeader)
}
func TestProxyRequest_NoApiKey(t *testing.T) {
// Create a test server that checks for the Authorization header
var receivedAuthHeader string
testServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
receivedAuthHeader = r.Header.Get("Authorization")
w.WriteHeader(http.StatusOK)
}))
defer testServer.Close()
proxyURL, _ := url.Parse(testServer.URL)
peers := config.PeerDictionaryConfig{
"peer1": config.PeerConfig{
Proxy: testServer.URL,
ProxyURL: proxyURL,
ApiKey: "", // No API key
Models: []string{"test-model"},
},
}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
req := httptest.NewRequest("POST", "/v1/chat/completions", nil)
w := httptest.NewRecorder()
err = pm.ProxyRequest("test-model", w, req)
assert.NoError(t, err)
assert.Empty(t, receivedAuthHeader)
}
func TestProxyRequest_HostHeaderSet(t *testing.T) {
// Create a test server that checks the Host header
var receivedHost string
testServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
receivedHost = r.Host
w.WriteHeader(http.StatusOK)
}))
defer testServer.Close()
proxyURL, _ := url.Parse(testServer.URL)
peers := config.PeerDictionaryConfig{
"peer1": config.PeerConfig{
Proxy: testServer.URL,
ProxyURL: proxyURL,
Models: []string{"test-model"},
},
}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
req := httptest.NewRequest("POST", "/v1/chat/completions", nil)
w := httptest.NewRecorder()
err = pm.ProxyRequest("test-model", w, req)
assert.NoError(t, err)
// The Host header should be set to the target URL's host
assert.True(t, strings.HasPrefix(receivedHost, "127.0.0.1:"))
}
func TestProxyRequest_SSEHeaderModification(t *testing.T) {
// Create a test server that returns SSE content type
testServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/event-stream")
w.WriteHeader(http.StatusOK)
}))
defer testServer.Close()
proxyURL, _ := url.Parse(testServer.URL)
peers := config.PeerDictionaryConfig{
"peer1": config.PeerConfig{
Proxy: testServer.URL,
ProxyURL: proxyURL,
Models: []string{"test-model"},
},
}
pm, err := NewPeerProxy(peers, testLogger)
require.NoError(t, err)
req := httptest.NewRequest("POST", "/v1/chat/completions", nil)
w := httptest.NewRecorder()
err = pm.ProxyRequest("test-model", w, req)
assert.NoError(t, err)
// The X-Accel-Buffering header should be set to "no" for SSE
assert.Equal(t, "no", w.Header().Get("X-Accel-Buffering"))
}
+380 -68
View File
@@ -2,20 +2,23 @@ package proxy
import (
"context"
"encoding/json"
"errors"
"fmt"
"io"
"math/rand"
"net"
"net/http"
"net/http/httputil"
"net/url"
"os/exec"
"strconv"
"strings"
"sync"
"sync/atomic"
"syscall"
"time"
"github.com/mostlygeek/llama-swap/event"
"github.com/mostlygeek/llama-swap/proxy/config"
)
type ProcessState string
@@ -38,11 +41,13 @@ const (
)
type Process struct {
ID string
config ModelConfig
cmd *exec.Cmd
ID string
config config.ModelConfig
cmd *exec.Cmd
reverseProxy *httputil.ReverseProxy
// PR #155 called to cancel the upstream process
cmdMutex sync.RWMutex
cancelUpstream context.CancelFunc
// closed when command exits
@@ -54,12 +59,14 @@ type Process struct {
healthCheckTimeout int
healthCheckLoopInterval time.Duration
lastRequestHandled time.Time
lastRequestHandledMutex sync.RWMutex
lastRequestHandled time.Time
stateMutex sync.RWMutex
state ProcessState
inFlightRequests sync.WaitGroup
inFlightRequests sync.WaitGroup
inFlightRequestsCount atomic.Int32
// used to block on multiple start() calls
waitStarting sync.WaitGroup
@@ -74,16 +81,35 @@ type Process struct {
failedStartCount int
}
func NewProcess(ID string, healthCheckTimeout int, config ModelConfig, processLogger *LogMonitor, proxyLogger *LogMonitor) *Process {
func NewProcess(ID string, healthCheckTimeout int, config config.ModelConfig, processLogger *LogMonitor, proxyLogger *LogMonitor) *Process {
concurrentLimit := 10
if config.ConcurrencyLimit > 0 {
concurrentLimit = config.ConcurrencyLimit
}
// Setup the reverse proxy.
proxyURL, err := url.Parse(config.Proxy)
if err != nil {
proxyLogger.Errorf("<%s> invalid proxy URL %q: %v", ID, config.Proxy, err)
}
var reverseProxy *httputil.ReverseProxy
if proxyURL != nil {
reverseProxy = httputil.NewSingleHostReverseProxy(proxyURL)
reverseProxy.ModifyResponse = func(resp *http.Response) error {
// prevent nginx from buffering streaming responses (e.g., SSE)
if strings.Contains(strings.ToLower(resp.Header.Get("Content-Type")), "text/event-stream") {
resp.Header.Set("X-Accel-Buffering", "no")
}
return nil
}
}
return &Process{
ID: ID,
config: config,
cmd: nil,
reverseProxy: reverseProxy,
cancelUpstream: nil,
processLogger: processLogger,
proxyLogger: proxyLogger,
@@ -106,6 +132,20 @@ func (p *Process) LogMonitor() *LogMonitor {
return p.processLogger
}
// setLastRequestHandled sets the last request handled time in a thread-safe manner.
func (p *Process) setLastRequestHandled(t time.Time) {
p.lastRequestHandledMutex.Lock()
defer p.lastRequestHandledMutex.Unlock()
p.lastRequestHandled = t
}
// getLastRequestHandled gets the last request handled time in a thread-safe manner.
func (p *Process) getLastRequestHandled() time.Time {
p.lastRequestHandledMutex.RLock()
defer p.lastRequestHandledMutex.RUnlock()
return p.lastRequestHandled
}
// custom error types for swapping state
var (
ErrExpectedStateMismatch = errors.New("expected state mismatch")
@@ -129,6 +169,13 @@ func (p *Process) swapState(expectedState, newState ProcessState) (ProcessState,
}
p.state = newState
// Atomically increment waitStarting when entering StateStarting
// This ensures any thread that sees StateStarting will also see the WaitGroup counter incremented
if newState == StateStarting {
p.waitStarting.Add(1)
}
p.proxyLogger.Debugf("<%s> swapState() State transitioned from %s to %s", p.ID, expectedState, newState)
event.Emit(ProcessStateChangeEvent{ProcessName: p.ID, NewState: newState, OldState: expectedState})
return p.state, nil
@@ -157,6 +204,15 @@ func (p *Process) CurrentState() ProcessState {
return p.state
}
// forceState forces the process state to the new state with mutex protection.
// This should only be used in exceptional cases where the normal state transition
// validation via swapState() cannot be used.
func (p *Process) forceState(newState ProcessState) {
p.stateMutex.Lock()
defer p.stateMutex.Unlock()
p.state = newState
}
// start starts the upstream command, checks the health endpoint, and sets the state to Ready
// it is a private method because starting is automatic but stopping can be called
// at any time.
@@ -190,7 +246,7 @@ func (p *Process) start() error {
}
}
p.waitStarting.Add(1)
// waitStarting.Add(1) is now called atomically in swapState() when transitioning to StateStarting
defer p.waitStarting.Done()
cmdContext, ctxCancelUpstream := context.WithCancel(context.Background())
@@ -200,8 +256,12 @@ func (p *Process) start() error {
p.cmd.Env = append(p.cmd.Environ(), p.config.Env...)
p.cmd.Cancel = p.cmdStopUpstreamProcess
p.cmd.WaitDelay = p.gracefulStopTimeout
setProcAttributes(p.cmd)
p.cmdMutex.Lock()
p.cancelUpstream = ctxCancelUpstream
p.cmdWaitChan = make(chan struct{})
p.cmdMutex.Unlock()
p.failedStartCount++ // this will be reset to zero when the process has successfully started
@@ -211,7 +271,7 @@ func (p *Process) start() error {
// Set process state to failed
if err != nil {
if curState, swapErr := p.swapState(StateStarting, StateStopped); swapErr != nil {
p.state = StateStopped // force it into a stopped state
p.forceState(StateStopped) // force it into a stopped state
return fmt.Errorf(
"failed to start command '%s' and state swap failed. command error: %v, current state: %v, state swap error: %v",
strings.Join(args, " "), err, curState, swapErr,
@@ -284,10 +344,12 @@ func (p *Process) start() error {
return
}
// wait for all inflight requests to complete and ticker
p.inFlightRequests.Wait()
// skip the TTL check if there are inflight requests
if p.inFlightRequestsCount.Load() != 0 {
continue
}
if time.Since(p.lastRequestHandled) > maxDuration {
if time.Since(p.getLastRequestHandled()) > maxDuration {
p.proxyLogger.Infof("<%s> Unloading model, TTL of %ds reached", p.ID, p.config.UnloadAfter)
p.Stop()
return
@@ -343,7 +405,7 @@ func (p *Process) Shutdown() {
p.stopCommand()
// just force it to this state since there is no recovery from shutdown
p.state = StateShutdown
p.forceState(StateShutdown)
}
// stopCommand will send a SIGTERM to the process and wait for it to exit.
@@ -352,15 +414,23 @@ func (p *Process) stopCommand() {
stopStartTime := time.Now()
defer func() {
p.proxyLogger.Debugf("<%s> stopCommand took %v", p.ID, time.Since(stopStartTime))
// free the buffer in processLogger so the memory can be recovered
p.processLogger.Clear()
}()
if p.cancelUpstream == nil {
p.cmdMutex.RLock()
cancelUpstream := p.cancelUpstream
cmdWaitChan := p.cmdWaitChan
p.cmdMutex.RUnlock()
if cancelUpstream == nil {
p.proxyLogger.Errorf("<%s> stopCommand has a nil p.cancelUpstream()", p.ID)
return
}
p.cancelUpstream()
<-p.cmdWaitChan
cancelUpstream()
<-cmdWaitChan
}
func (p *Process) checkHealthEndpoint(healthURL string) error {
@@ -398,6 +468,12 @@ func (p *Process) checkHealthEndpoint(healthURL string) error {
}
func (p *Process) ProxyRequest(w http.ResponseWriter, r *http.Request) {
if p.reverseProxy == nil {
http.Error(w, fmt.Sprintf("No reverse proxy available for %s", p.ID), http.StatusInternalServerError)
return
}
requestBeginTime := time.Now()
var startDuration time.Duration
@@ -417,72 +493,75 @@ func (p *Process) ProxyRequest(w http.ResponseWriter, r *http.Request) {
}
p.inFlightRequests.Add(1)
p.inFlightRequestsCount.Add(1)
defer func() {
p.lastRequestHandled = time.Now()
p.setLastRequestHandled(time.Now())
p.inFlightRequestsCount.Add(-1)
p.inFlightRequests.Done()
}()
// for #366
// - extract streaming param from request context, should have been set by proxymanager
var srw *statusResponseWriter
swapCtx, cancelLoadCtx := context.WithCancel(r.Context())
// start the process on demand
if p.CurrentState() != StateReady {
// start a goroutine to stream loading status messages into the response writer
// add a sync so the streaming client only runs when the goroutine has exited
isStreaming, _ := r.Context().Value(proxyCtxKey("streaming")).(bool)
// PR #417 (no support for anthropic v1/messages yet)
isChatCompletions := strings.HasPrefix(r.URL.Path, "/v1/chat/completions")
if p.config.SendLoadingState != nil && *p.config.SendLoadingState && isStreaming && isChatCompletions {
srw = newStatusResponseWriter(p, w)
go srw.statusUpdates(swapCtx)
} else {
p.proxyLogger.Debugf("<%s> SendLoadingState is nil or false, not streaming loading state", p.ID)
}
beginStartTime := time.Now()
if err := p.start(); err != nil {
errstr := fmt.Sprintf("unable to start process: %s", err)
http.Error(w, errstr, http.StatusBadGateway)
cancelLoadCtx()
if srw != nil {
srw.sendData(fmt.Sprintf("Unable to swap model err: %s\n", errstr))
// Wait for statusUpdates goroutine to finish writing its deferred "Done!" messages
// before closing the connection. Without this, the connection would close before
// the goroutine can write its cleanup messages, causing incomplete SSE output.
srw.waitForCompletion(100 * time.Millisecond)
} else {
http.Error(w, errstr, http.StatusBadGateway)
}
return
}
startDuration = time.Since(beginStartTime)
}
proxyTo := p.config.Proxy
client := &http.Client{}
req, err := http.NewRequestWithContext(r.Context(), r.Method, proxyTo+r.URL.String(), r.Body)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
req.Header = r.Header.Clone()
// should trigger srw to stop sending loading events ...
cancelLoadCtx()
contentLength, err := strconv.ParseInt(req.Header.Get("content-length"), 10, 64)
if err == nil {
req.ContentLength = contentLength
}
resp, err := client.Do(req)
if err != nil {
http.Error(w, err.Error(), http.StatusBadGateway)
return
}
defer resp.Body.Close()
for k, vv := range resp.Header {
for _, v := range vv {
w.Header().Add(k, v)
}
}
// prevent nginx from buffering streaming responses (e.g., SSE)
if strings.Contains(strings.ToLower(resp.Header.Get("Content-Type")), "text/event-stream") {
w.Header().Set("X-Accel-Buffering", "no")
}
w.WriteHeader(resp.StatusCode)
// faster than io.Copy when streaming
buf := make([]byte, 32*1024)
for {
n, err := resp.Body.Read(buf)
if n > 0 {
if _, writeErr := w.Write(buf[:n]); writeErr != nil {
return
}
if flusher, ok := w.(http.Flusher); ok {
flusher.Flush()
// recover from http.ErrAbortHandler panics that can occur when the client
// disconnects before the response is sent
defer func() {
if r := recover(); r != nil {
if r == http.ErrAbortHandler {
p.proxyLogger.Infof("<%s> recovered from client disconnection during streaming", p.ID)
} else {
p.proxyLogger.Infof("<%s> recovered from panic: %v", p.ID, r)
}
}
if err == io.EOF {
break
}
if err != nil {
http.Error(w, err.Error(), http.StatusBadGateway)
return
}()
if srw != nil {
// Wait for the goroutine to finish writing its final messages
const completionTimeout = 1 * time.Second
if !srw.waitForCompletion(completionTimeout) {
p.proxyLogger.Warnf("<%s> status updates goroutine did not complete within %v, proceeding with proxy request", p.ID, completionTimeout)
}
p.reverseProxy.ServeHTTP(srw, r)
} else {
p.reverseProxy.ServeHTTP(w, r)
}
totalTime := time.Since(requestBeginTime)
@@ -518,13 +597,16 @@ func (p *Process) waitForCmd() {
case StateStopping:
if curState, err := p.swapState(StateStopping, StateStopped); err != nil {
p.proxyLogger.Errorf("<%s> Process exited but could not swap to StateStopped. curState=%s, err: %v", p.ID, curState, err)
p.state = StateStopped
p.forceState(StateStopped)
}
default:
p.proxyLogger.Infof("<%s> process exited but not StateStopping, current state: %s", p.ID, currentState)
p.state = StateStopped // force it to be in this state
p.forceState(StateStopped) // force it to be in this state
}
p.cmdMutex.Lock()
close(p.cmdWaitChan)
p.cmdMutex.Unlock()
}
// cmdStopUpstreamProcess attemps to stop the upstream process gracefully
@@ -539,7 +621,7 @@ func (p *Process) cmdStopUpstreamProcess() error {
if p.config.CmdStop != "" {
// replace ${PID} with the pid of the process
stopArgs, err := SanitizeCommand(strings.ReplaceAll(p.config.CmdStop, "${PID}", fmt.Sprintf("%d", p.cmd.Process.Pid)))
stopArgs, err := config.SanitizeCommand(strings.ReplaceAll(p.config.CmdStop, "${PID}", fmt.Sprintf("%d", p.cmd.Process.Pid)))
if err != nil {
p.proxyLogger.Errorf("<%s> Failed to sanitize stop command: %v", p.ID, err)
return err
@@ -550,6 +632,7 @@ func (p *Process) cmdStopUpstreamProcess() error {
stopCmd := exec.Command(stopArgs[0], stopArgs[1:]...)
stopCmd.Stdout = p.processLogger
stopCmd.Stderr = p.processLogger
setProcAttributes(stopCmd)
stopCmd.Env = p.cmd.Env
if err := stopCmd.Run(); err != nil {
@@ -565,3 +648,232 @@ func (p *Process) cmdStopUpstreamProcess() error {
return nil
}
// Logger returns the logger for this process.
func (p *Process) Logger() *LogMonitor {
return p.processLogger
}
var loadingRemarks = []string{
"Still faster than your last standup meeting...",
"Reticulating splines...",
"Waking up the hamsters...",
"Teaching the model manners...",
"Convincing the GPU to participate...",
"Loading weights (they're heavy)...",
"Herding electrons...",
"Compiling excuses for the delay...",
"Downloading more RAM...",
"Asking the model nicely to boot up...",
"Bribing CUDA with cookies...",
"Still loading (blame VRAM)...",
"The model is fashionably late...",
"Warming up those tensors...",
"Making the neural net do push-ups...",
"Your patience is appreciated (really)...",
"Almost there (probably)...",
"Loading like it's 1999...",
"The model forgot where it put its keys...",
"Quantum tunneling through layers...",
"Negotiating with the PCIe bus...",
"Defrosting frozen parameters...",
"Teaching attention heads to focus...",
"Running the matrix (slowly)...",
"Untangling transformer blocks...",
"Calibrating the flux capacitor...",
"Spinning up the probability wheels...",
"Waiting for the GPU to wake from its nap...",
"Converting caffeine to compute...",
"Allocating virtual patience...",
"Performing arcane CUDA rituals...",
"The model is stuck in traffic...",
"Inflating embeddings...",
"Summoning computational demons...",
"Pleading with the OOM killer...",
"Calculating the meaning of life (still at 42)...",
"Training the training wheels...",
"Optimizing the optimizer...",
"Bootstrapping the bootstrapper...",
"Loading loading screen...",
"Processing processing logs...",
"Buffering buffer overflow jokes...",
"The model hit snooze...",
"Debugging the debugger...",
"Compiling the compiler...",
"Parsing the parser (meta)...",
"Tokenizing tokens...",
"Encoding the encoder...",
"Hashing hash browns...",
"Forking spoons (not forks)...",
"The model is contemplating existence...",
"Transcending dimensional barriers...",
"Invoking elder tensor gods...",
"Unfurling probability clouds...",
"Synchronizing parallel universes...",
"The GPU is having second thoughts...",
"Recalibrating reality matrices...",
"Time is an illusion, loading doubly so...",
"Convincing bits to flip themselves...",
"The model is reading its own documentation...",
}
type statusResponseWriter struct {
hasWritten bool
writer http.ResponseWriter
process *Process
wg sync.WaitGroup // Track goroutine completion
start time.Time
}
func newStatusResponseWriter(p *Process, w http.ResponseWriter) *statusResponseWriter {
s := &statusResponseWriter{
writer: w,
process: p,
start: time.Now(),
}
s.Header().Set("Content-Type", "text/event-stream") // SSE
s.Header().Set("Cache-Control", "no-cache") // no-cache
s.Header().Set("Connection", "keep-alive") // keep-alive
s.WriteHeader(http.StatusOK) // send status code 200
s.sendLine("━━━━━")
s.sendLine(fmt.Sprintf("llama-swap loading model: %s", p.ID))
return s
}
// statusUpdates sends status updates to the client while the model is loading
func (s *statusResponseWriter) statusUpdates(ctx context.Context) {
s.wg.Add(1)
defer s.wg.Done()
// Recover from panics caused by client disconnection
// Note: recover() only works within the same goroutine, so we need it here
defer func() {
if r := recover(); r != nil {
s.process.proxyLogger.Debugf("<%s> statusUpdates recovered from panic (likely client disconnect): %v", s.process.ID, r)
}
}()
defer func() {
duration := time.Since(s.start)
s.sendLine(fmt.Sprintf("\nDone! (%.2fs)", duration.Seconds()))
s.sendLine("━━━━━")
s.sendLine(" ")
}()
// Create a shuffled copy of loadingRemarks
remarks := make([]string, len(loadingRemarks))
copy(remarks, loadingRemarks)
rand.Shuffle(len(remarks), func(i, j int) {
remarks[i], remarks[j] = remarks[j], remarks[i]
})
ri := 0
// Pick a random duration to send a remark
nextRemarkIn := time.Duration(2+rand.Intn(4)) * time.Second
lastRemarkTime := time.Now()
ticker := time.NewTicker(time.Second)
defer ticker.Stop() // Ensure ticker is stopped to prevent resource leak
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
if s.process.CurrentState() == StateReady {
return
}
// Check if it's time for a snarky remark
if time.Since(lastRemarkTime) >= nextRemarkIn {
remark := remarks[ri%len(remarks)]
ri++
s.sendLine(fmt.Sprintf("\n%s", remark))
lastRemarkTime = time.Now()
// Pick a new random duration for the next remark
nextRemarkIn = time.Duration(5+rand.Intn(5)) * time.Second
} else {
s.sendData(".")
}
}
}
}
// waitForCompletion waits for the statusUpdates goroutine to finish
func (s *statusResponseWriter) waitForCompletion(timeout time.Duration) bool {
done := make(chan struct{})
go func() {
s.wg.Wait()
close(done)
}()
select {
case <-done:
return true
case <-time.After(timeout):
return false
}
}
func (s *statusResponseWriter) sendLine(line string) {
s.sendData(line + "\n")
}
func (s *statusResponseWriter) sendData(data string) {
// Create the proper SSE JSON structure
type Delta struct {
ReasoningContent string `json:"reasoning_content"`
}
type Choice struct {
Delta Delta `json:"delta"`
}
type SSEMessage struct {
Choices []Choice `json:"choices"`
}
msg := SSEMessage{
Choices: []Choice{
{
Delta: Delta{
ReasoningContent: data,
},
},
},
}
jsonData, err := json.Marshal(msg)
if err != nil {
s.process.proxyLogger.Errorf("<%s> Failed to marshal SSE message: %v", s.process.ID, err)
return
}
// Write SSE formatted data, panic if not able to write
_, err = fmt.Fprintf(s.writer, "data: %s\n\n", jsonData)
if err != nil {
panic(fmt.Sprintf("<%s> Failed to write SSE data: %v", s.process.ID, err))
}
s.Flush()
}
func (s *statusResponseWriter) Header() http.Header {
return s.writer.Header()
}
func (s *statusResponseWriter) Write(data []byte) (int, error) {
return s.writer.Write(data)
}
func (s *statusResponseWriter) WriteHeader(statusCode int) {
if s.hasWritten {
return
}
s.hasWritten = true
s.writer.WriteHeader(statusCode)
s.Flush()
}
func (s *statusResponseWriter) Flush() {
if flusher, ok := s.writer.(http.Flusher); ok {
flusher.Flush()
}
}
+100 -22
View File
@@ -10,6 +10,7 @@ import (
"testing"
"time"
"github.com/mostlygeek/llama-swap/proxy/config"
"github.com/stretchr/testify/assert"
)
@@ -90,7 +91,7 @@ func TestProcess_WaitOnMultipleStarts(t *testing.T) {
// test that the automatic start returns the expected error type
func TestProcess_BrokenModelConfig(t *testing.T) {
// Create a process configuration
config := ModelConfig{
config := config.ModelConfig{
Cmd: "nonexistent-command",
Proxy: "http://127.0.0.1:9913",
CheckEndpoint: "/health",
@@ -116,12 +117,12 @@ func TestProcess_UnloadAfterTTL(t *testing.T) {
}
expectedMessage := "I_sense_imminent_danger"
config := getTestSimpleResponderConfig(expectedMessage)
assert.Equal(t, 0, config.UnloadAfter)
config.UnloadAfter = 3 // seconds
assert.Equal(t, 3, config.UnloadAfter)
conf := getTestSimpleResponderConfig(expectedMessage)
assert.Equal(t, config.MODEL_CONFIG_DEFAULT_TTL, conf.UnloadAfter)
conf.UnloadAfter = 3 // seconds
assert.Equal(t, 3, conf.UnloadAfter)
process := NewProcess("ttl_test", 2, config, debugLogger, debugLogger)
process := NewProcess("ttl_test", 2, conf, debugLogger, debugLogger)
defer process.Stop()
// this should take 4 seconds
@@ -158,12 +159,12 @@ func TestProcess_LowTTLValue(t *testing.T) {
t.Skip("skipping test, edit process_test.go to run it ")
}
config := getTestSimpleResponderConfig("fast_ttl")
assert.Equal(t, 0, config.UnloadAfter)
config.UnloadAfter = 1 // second
assert.Equal(t, 1, config.UnloadAfter)
conf := getTestSimpleResponderConfig("fast_ttl")
assert.Equal(t, config.MODEL_CONFIG_DEFAULT_TTL, conf.UnloadAfter)
conf.UnloadAfter = 1 // second
assert.Equal(t, 1, conf.UnloadAfter)
process := NewProcess("ttl", 2, config, debugLogger, debugLogger)
process := NewProcess("ttl", 2, conf, debugLogger, debugLogger)
defer process.Stop()
for i := 0; i < 100; i++ {
@@ -325,7 +326,7 @@ func TestProcess_ExitInterruptsHealthCheck(t *testing.T) {
// should run and exit but interrupt the long checkHealthTimeout
checkHealthTimeout := 5
config := ModelConfig{
config := config.ModelConfig{
Cmd: "sleep 1",
Proxy: "http://127.0.0.1:9913",
CheckEndpoint: "/health",
@@ -394,6 +395,10 @@ func TestProcess_StopImmediately(t *testing.T) {
// Test that SIGKILL is sent when gracefulStopTimeout is reached and properly terminates
// the upstream command
func TestProcess_ForceStopWithKill(t *testing.T) {
if testing.Short() {
t.Skip("skipping slow test")
}
if runtime.GOOS == "windows" {
t.Skip("skipping SIGTERM test on Windows ")
}
@@ -402,7 +407,7 @@ func TestProcess_ForceStopWithKill(t *testing.T) {
binaryPath := getSimpleResponderPath()
port := getTestPort()
config := ModelConfig{
conf := config.ModelConfig{
// note --ignore-sig-term which ignores the SIGTERM signal so a SIGKILL must be sent
// to force the process to exit
Cmd: fmt.Sprintf("%s --port %d --respond %s --silent --ignore-sig-term", binaryPath, port, expectedMessage),
@@ -410,7 +415,7 @@ func TestProcess_ForceStopWithKill(t *testing.T) {
CheckEndpoint: "/health",
}
process := NewProcess("stop_immediate", 2, config, debugLogger, debugLogger)
process := NewProcess("stop_immediate", 2, conf, debugLogger, debugLogger)
defer process.Stop()
// reduce to make testing go faster
@@ -435,7 +440,9 @@ func TestProcess_ForceStopWithKill(t *testing.T) {
if runtime.GOOS == "windows" {
assert.Contains(t, w.Body.String(), "wsarecv: An existing connection was forcibly closed by the remote host")
} else {
assert.Contains(t, w.Body.String(), "unexpected EOF")
// Upstream may be killed mid-response.
// Assert an incomplete or partial response.
assert.NotEqual(t, "12345", w.Body.String())
}
close(waitChan)
@@ -450,15 +457,15 @@ func TestProcess_ForceStopWithKill(t *testing.T) {
}
func TestProcess_StopCmd(t *testing.T) {
config := getTestSimpleResponderConfig("test_stop_cmd")
conf := getTestSimpleResponderConfig("test_stop_cmd")
if runtime.GOOS == "windows" {
config.CmdStop = "taskkill /f /t /pid ${PID}"
conf.CmdStop = "taskkill /f /t /pid ${PID}"
} else {
config.CmdStop = "kill -TERM ${PID}"
conf.CmdStop = "kill -TERM ${PID}"
}
process := NewProcess("testStopCmd", 2, config, debugLogger, debugLogger)
process := NewProcess("testStopCmd", 2, conf, debugLogger, debugLogger)
defer process.Stop()
err := process.start()
@@ -470,15 +477,15 @@ func TestProcess_StopCmd(t *testing.T) {
func TestProcess_EnvironmentSetCorrectly(t *testing.T) {
expectedMessage := "test_env_not_emptied"
config := getTestSimpleResponderConfig(expectedMessage)
conf := getTestSimpleResponderConfig(expectedMessage)
// ensure that the the default config does not blank out the inherited environment
configWEnv := config
configWEnv := conf
// ensure the additiona variables are appended to the process' environment
configWEnv.Env = append(configWEnv.Env, "TEST_ENV1=1", "TEST_ENV2=2")
process1 := NewProcess("env_test", 2, config, debugLogger, debugLogger)
process1 := NewProcess("env_test", 2, conf, debugLogger, debugLogger)
process2 := NewProcess("env_test", 2, configWEnv, debugLogger, debugLogger)
process1.start()
@@ -491,3 +498,74 @@ func TestProcess_EnvironmentSetCorrectly(t *testing.T) {
assert.Equal(t, len(process1.cmd.Environ())+2, len(process2.cmd.Environ()), "process2 should have 2 more environment variables than process1")
}
// TestProcess_ReverseProxyPanicIsHandled tests that panics from
// httputil.ReverseProxy in Process.ProxyRequest(w, r) do not bubble up and are
// handled appropriately.
//
// httputil.ReverseProxy will panic with http.ErrAbortHandler when it has sent headers
// can't copy the body. This can be caused by a client disconnecting before the full
// response is sent from some reason.
//
// bug: https://github.com/mostlygeek/llama-swap/issues/362
// see: https://github.com/golang/go/issues/23643 (where panic was added to httputil.ReverseProxy)
func TestProcess_ReverseProxyPanicIsHandled(t *testing.T) {
// Add defer/recover to catch any panics that aren't handled by ProxyRequest
// If this recover() is hit, it means ProxyRequest didn't handle the panic properly
defer func() {
if r := recover(); r != nil {
t.Fatalf("ProxyRequest should handle panics from reverseProxy.ServeHTTP, but panic was not caught: %v", r)
}
}()
expectedMessage := "panic_test"
config := getTestSimpleResponderConfig(expectedMessage)
process := NewProcess("panic-test", 5, config, debugLogger, debugLogger)
defer process.Stop()
// Start the process
err := process.start()
assert.Nil(t, err)
assert.Equal(t, StateReady, process.CurrentState())
// Create a custom ResponseWriter that simulates a client disconnect
// by panicking when Write is called after headers are sent
panicWriter := &panicOnWriteResponseWriter{
ResponseRecorder: httptest.NewRecorder(),
shouldPanic: true,
}
// Make a request that will trigger the panic
req := httptest.NewRequest("GET", "/slow-respond?echo=test&delay=100ms", nil)
// This should panic inside reverseProxy.ServeHTTP when the panicWriter.Write() is called.
// ProxyRequest should catch and handle this panic gracefully.
process.ProxyRequest(panicWriter, req)
// If we get here, the panic was properly recovered in ProxyRequest
// The process should still be in a ready state
assert.Equal(t, StateReady, process.CurrentState())
}
// panicOnWriteResponseWriter is a ResponseWriter that panics on Write
// to simulate a client disconnect after headers are sent
// used by: TestProcess_ReverseProxyPanicIsHandled
type panicOnWriteResponseWriter struct {
*httptest.ResponseRecorder
shouldPanic bool
headerWritten bool
}
func (w *panicOnWriteResponseWriter) WriteHeader(statusCode int) {
w.headerWritten = true
w.ResponseRecorder.WriteHeader(statusCode)
}
func (w *panicOnWriteResponseWriter) Write(b []byte) (int, error) {
if w.shouldPanic && w.headerWritten {
// Simulate the panic that httputil.ReverseProxy throws
panic(http.ErrAbortHandler)
}
return w.ResponseRecorder.Write(b)
}
+12
View File
@@ -0,0 +1,12 @@
//go:build !windows
package proxy
import (
"os/exec"
)
// setProcAttributes sets platform-specific process attributes
func setProcAttributes(cmd *exec.Cmd) {
// No-op on Unix systems
}
+16
View File
@@ -0,0 +1,16 @@
//go:build windows
package proxy
import (
"os/exec"
"syscall"
)
// setProcAttributes sets platform-specific process attributes
func setProcAttributes(cmd *exec.Cmd) {
cmd.SysProcAttr = &syscall.SysProcAttr{
HideWindow: true,
CreationFlags: 0x08000000, // CREATE_NO_WINDOW
}
}
+36 -3
View File
@@ -5,12 +5,14 @@ import (
"net/http"
"slices"
"sync"
"github.com/mostlygeek/llama-swap/proxy/config"
)
type ProcessGroup struct {
sync.Mutex
config Config
config config.Config
id string
swap bool
exclusive bool
@@ -24,7 +26,7 @@ type ProcessGroup struct {
lastUsedProcess string
}
func NewProcessGroup(id string, config Config, proxyLogger *LogMonitor, upstreamLogger *LogMonitor) *ProcessGroup {
func NewProcessGroup(id string, config config.Config, proxyLogger *LogMonitor, upstreamLogger *LogMonitor) *ProcessGroup {
groupConfig, ok := config.Groups[id]
if !ok {
panic("Unable to find configuration for group id: " + id)
@@ -44,7 +46,8 @@ func NewProcessGroup(id string, config Config, proxyLogger *LogMonitor, upstream
// Create a Process for each member in the group
for _, modelID := range groupConfig.Members {
modelConfig, modelID, _ := pg.config.FindConfig(modelID)
process := NewProcess(modelID, pg.config.HealthCheckTimeout, modelConfig, pg.upstreamLogger, pg.proxyLogger)
processLogger := NewLogMonitorWriter(upstreamLogger)
process := NewProcess(modelID, pg.config.HealthCheckTimeout, modelConfig, processLogger, pg.proxyLogger)
pg.processes[modelID] = process
}
@@ -86,6 +89,36 @@ func (pg *ProcessGroup) HasMember(modelName string) bool {
return slices.Contains(pg.config.Groups[pg.id].Members, modelName)
}
func (pg *ProcessGroup) GetMember(modelName string) (*Process, bool) {
if pg.HasMember(modelName) {
return pg.processes[modelName], true
}
return nil, false
}
func (pg *ProcessGroup) StopProcess(modelID string, strategy StopStrategy) error {
pg.Lock()
process, exists := pg.processes[modelID]
if !exists {
pg.Unlock()
return fmt.Errorf("process not found for %s", modelID)
}
if pg.lastUsedProcess == modelID {
pg.lastUsedProcess = ""
}
pg.Unlock()
switch strategy {
case StopImmediately:
process.StopImmediately()
default:
process.Stop()
}
return nil
}
func (pg *ProcessGroup) StopProcesses(strategy StopStrategy) {
pg.Lock()
defer pg.Unlock()
+12 -7
View File
@@ -7,19 +7,20 @@ import (
"sync"
"testing"
"github.com/mostlygeek/llama-swap/proxy/config"
"github.com/stretchr/testify/assert"
)
var processGroupTestConfig = AddDefaultGroupToConfig(Config{
var processGroupTestConfig = config.AddDefaultGroupToConfig(config.Config{
HealthCheckTimeout: 15,
Models: map[string]ModelConfig{
Models: map[string]config.ModelConfig{
"model1": getTestSimpleResponderConfig("model1"),
"model2": getTestSimpleResponderConfig("model2"),
"model3": getTestSimpleResponderConfig("model3"),
"model4": getTestSimpleResponderConfig("model4"),
"model5": getTestSimpleResponderConfig("model5"),
},
Groups: map[string]GroupConfig{
Groups: map[string]config.GroupConfig{
"G1": {
Swap: true,
Exclusive: true,
@@ -34,7 +35,7 @@ var processGroupTestConfig = AddDefaultGroupToConfig(Config{
})
func TestProcessGroup_DefaultHasCorrectModel(t *testing.T) {
pg := NewProcessGroup(DEFAULT_GROUP_ID, processGroupTestConfig, testLogger, testLogger)
pg := NewProcessGroup(config.DEFAULT_GROUP_ID, processGroupTestConfig, testLogger, testLogger)
assert.True(t, pg.HasMember("model5"))
}
@@ -48,9 +49,13 @@ func TestProcessGroup_HasMember(t *testing.T) {
// TestProcessGroup_ProxyRequestSwapIsTrueParallel tests that when swap is true
// and multiple requests are made in parallel, only one process is running at a time.
func TestProcessGroup_ProxyRequestSwapIsTrueParallel(t *testing.T) {
var processGroupTestConfig = AddDefaultGroupToConfig(Config{
if testing.Short() {
t.Skip("skipping slow test")
}
var processGroupTestConfig = config.AddDefaultGroupToConfig(config.Config{
HealthCheckTimeout: 15,
Models: map[string]ModelConfig{
Models: map[string]config.ModelConfig{
// use the same listening so if a model is already running, it will fail
// this is a way to test that swap isolation is working
// properly when there are parallel requests made at the
@@ -61,7 +66,7 @@ func TestProcessGroup_ProxyRequestSwapIsTrueParallel(t *testing.T) {
"model4": getTestSimpleResponderConfigPort("model4", 9832),
"model5": getTestSimpleResponderConfigPort("model5", 9832),
},
Groups: map[string]GroupConfig{
Groups: map[string]config.GroupConfig{
"G1": {
Swap: true,
Members: []string{"model1", "model2", "model3", "model4", "model5"},
+551 -127
View File
@@ -3,6 +3,7 @@ package proxy
import (
"bytes"
"context"
"encoding/base64"
"fmt"
"io"
"mime/multipart"
@@ -16,6 +17,7 @@ import (
"github.com/gin-gonic/gin"
"github.com/mostlygeek/llama-swap/event"
"github.com/mostlygeek/llama-swap/proxy/config"
"github.com/tidwall/gjson"
"github.com/tidwall/sjson"
)
@@ -24,10 +26,46 @@ const (
PROFILE_SPLIT_CHAR = ":"
)
type proxyCtxKey string
type InflightCounter struct {
mu sync.Mutex
total int
}
func newInflightCounter() *InflightCounter {
return &InflightCounter{}
}
func (ic *InflightCounter) Current() int {
ic.mu.Lock()
total := ic.total
ic.mu.Unlock()
return total
}
func (ic *InflightCounter) Increment() int {
ic.mu.Lock()
ic.total++
total := ic.total
ic.mu.Unlock()
return total
}
func (ic *InflightCounter) Decrement() int {
ic.mu.Lock()
if ic.total > 0 {
ic.total--
}
total := ic.total
ic.mu.Unlock()
return total
}
type ProxyManager struct {
sync.Mutex
config Config
config config.Config
ginEngine *gin.Engine
// logging
@@ -35,26 +73,56 @@ type ProxyManager struct {
upstreamLogger *LogMonitor
muxLogger *LogMonitor
metricsMonitor *MetricsMonitor
metricsMonitor *metricsMonitor
processGroups map[string]*ProcessGroup
inFlightCounter *InflightCounter
// shutdown signaling
shutdownCtx context.Context
shutdownCancel context.CancelFunc
// version info
buildDate string
commit string
version string
// peer proxy see: #296, #433
peerProxy *PeerProxy
}
func New(config Config) *ProxyManager {
func New(proxyConfig config.Config) *ProxyManager {
// set up loggers
stdoutLogger := NewLogMonitorWriter(os.Stdout)
upstreamLogger := NewLogMonitorWriter(stdoutLogger)
proxyLogger := NewLogMonitorWriter(stdoutLogger)
if config.LogRequests {
var muxLogger, upstreamLogger, proxyLogger *LogMonitor
switch proxyConfig.LogToStdout {
case config.LogToStdoutNone:
muxLogger = NewLogMonitorWriter(io.Discard)
upstreamLogger = NewLogMonitorWriter(io.Discard)
proxyLogger = NewLogMonitorWriter(io.Discard)
case config.LogToStdoutBoth:
muxLogger = NewLogMonitorWriter(os.Stdout)
upstreamLogger = NewLogMonitorWriter(muxLogger)
proxyLogger = NewLogMonitorWriter(muxLogger)
case config.LogToStdoutUpstream:
muxLogger = NewLogMonitorWriter(os.Stdout)
upstreamLogger = NewLogMonitorWriter(muxLogger)
proxyLogger = NewLogMonitorWriter(io.Discard)
default:
// same as config.LogToStdoutProxy
// helpful because some old tests create a config.Config directly and it
// may not have LogToStdout set explicitly
muxLogger = NewLogMonitorWriter(os.Stdout)
upstreamLogger = NewLogMonitorWriter(io.Discard)
proxyLogger = NewLogMonitorWriter(muxLogger)
}
if proxyConfig.LogRequests {
proxyLogger.Warn("LogRequests configuration is deprecated. Use logLevel instead.")
}
switch strings.ToLower(strings.TrimSpace(config.LogLevel)) {
switch strings.ToLower(strings.TrimSpace(proxyConfig.LogLevel)) {
case "debug":
proxyLogger.SetLogLevel(LevelDebug)
upstreamLogger.SetLogLevel(LevelDebug)
@@ -72,53 +140,105 @@ func New(config Config) *ProxyManager {
upstreamLogger.SetLogLevel(LevelInfo)
}
// see: https://go.dev/src/time/format.go
timeFormats := map[string]string{
"ansic": time.ANSIC,
"unixdate": time.UnixDate,
"rubydate": time.RubyDate,
"rfc822": time.RFC822,
"rfc822z": time.RFC822Z,
"rfc850": time.RFC850,
"rfc1123": time.RFC1123,
"rfc1123z": time.RFC1123Z,
"rfc3339": time.RFC3339,
"rfc3339nano": time.RFC3339Nano,
"kitchen": time.Kitchen,
"stamp": time.Stamp,
"stampmilli": time.StampMilli,
"stampmicro": time.StampMicro,
"stampnano": time.StampNano,
}
if timeFormat, ok := timeFormats[strings.ToLower(strings.TrimSpace(proxyConfig.LogTimeFormat))]; ok {
proxyLogger.SetLogTimeFormat(timeFormat)
upstreamLogger.SetLogTimeFormat(timeFormat)
}
shutdownCtx, shutdownCancel := context.WithCancel(context.Background())
var maxMetrics int
if proxyConfig.MetricsMaxInMemory <= 0 {
maxMetrics = 1000 // Default fallback
} else {
maxMetrics = proxyConfig.MetricsMaxInMemory
}
peerProxy, err := NewPeerProxy(proxyConfig.Peers, proxyLogger)
if err != nil {
proxyLogger.Errorf("Disabling Peering. Failed to create proxy peers: %v", err)
peerProxy = nil
}
pm := &ProxyManager{
config: config,
config: proxyConfig,
ginEngine: gin.New(),
proxyLogger: proxyLogger,
muxLogger: stdoutLogger,
muxLogger: muxLogger,
upstreamLogger: upstreamLogger,
metricsMonitor: NewMetricsMonitor(&config),
metricsMonitor: newMetricsMonitor(proxyLogger, maxMetrics, proxyConfig.CaptureBuffer),
processGroups: make(map[string]*ProcessGroup),
inFlightCounter: newInflightCounter(),
shutdownCtx: shutdownCtx,
shutdownCancel: shutdownCancel,
buildDate: "unknown",
commit: "abcd1234",
version: "0",
peerProxy: peerProxy,
}
// create the process groups
for groupID := range config.Groups {
processGroup := NewProcessGroup(groupID, config, proxyLogger, upstreamLogger)
for groupID := range proxyConfig.Groups {
processGroup := NewProcessGroup(groupID, proxyConfig, proxyLogger, upstreamLogger)
pm.processGroups[groupID] = processGroup
}
pm.setupGinEngine()
// run any startup hooks
if len(config.Hooks.OnStartup.Preload) > 0 {
if len(proxyConfig.Hooks.OnStartup.Preload) > 0 {
// do it in the background, don't block startup -- not sure if good idea yet
go func() {
discardWriter := &DiscardWriter{}
for _, realModelName := range config.Hooks.OnStartup.Preload {
proxyLogger.Infof("Preloading model: %s", realModelName)
processGroup, _, err := pm.swapProcessGroup(realModelName)
for _, preloadModelName := range proxyConfig.Hooks.OnStartup.Preload {
modelID, ok := proxyConfig.RealModelName(preloadModelName)
if !ok {
proxyLogger.Warnf("Preload model %s not found in config", preloadModelName)
continue
}
proxyLogger.Infof("Preloading model: %s", modelID)
processGroup, err := pm.swapProcessGroup(modelID)
if err != nil {
event.Emit(ModelPreloadedEvent{
ModelName: realModelName,
ModelName: modelID,
Success: false,
})
proxyLogger.Errorf("Failed to preload model %s: %v", realModelName, err)
proxyLogger.Errorf("Failed to preload model %s: %v", modelID, err)
continue
} else {
req, _ := http.NewRequest("GET", "/", nil)
processGroup.ProxyRequest(realModelName, discardWriter, req)
processGroup.ProxyRequest(modelID, discardWriter, req)
event.Emit(ModelPreloadedEvent{
ModelName: realModelName,
ModelName: modelID,
Success: true,
})
}
@@ -130,7 +250,15 @@ func New(config Config) *ProxyManager {
}
func (pm *ProxyManager) setupGinEngine() {
pm.ginEngine.Use(func(c *gin.Context) {
// don't log the Wake on Lan proxy health check
if c.Request.URL.Path == "/wol-health" {
c.Next()
return
}
// Start timer
start := time.Now()
@@ -184,38 +312,46 @@ func (pm *ProxyManager) setupGinEngine() {
c.Next()
})
mm := MetricsMiddleware(pm)
// Set up routes using the Gin engine
pm.ginEngine.POST("/v1/chat/completions", mm, pm.proxyOAIHandler)
// Protected routes use pm.apiKeyAuth() middleware
pm.ginEngine.POST("/v1/chat/completions", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
pm.ginEngine.POST("/v1/responses", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
// Support legacy /v1/completions api, see issue #12
pm.ginEngine.POST("/v1/completions", mm, pm.proxyOAIHandler)
pm.ginEngine.POST("/v1/completions", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
// Support anthropic /v1/messages (added https://github.com/ggml-org/llama.cpp/pull/17570)
pm.ginEngine.POST("/v1/messages", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
// Support anthropic count_tokens API (Also added in the above PR)
pm.ginEngine.POST("/v1/messages/count_tokens", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
// Support embeddings and reranking
pm.ginEngine.POST("/v1/embeddings", mm, pm.proxyOAIHandler)
pm.ginEngine.POST("/v1/embeddings", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
// llama-server's /reranking endpoint + aliases
pm.ginEngine.POST("/reranking", mm, pm.proxyOAIHandler)
pm.ginEngine.POST("/rerank", mm, pm.proxyOAIHandler)
pm.ginEngine.POST("/v1/rerank", mm, pm.proxyOAIHandler)
pm.ginEngine.POST("/v1/reranking", mm, pm.proxyOAIHandler)
pm.ginEngine.POST("/reranking", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
pm.ginEngine.POST("/rerank", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
pm.ginEngine.POST("/v1/rerank", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
pm.ginEngine.POST("/v1/reranking", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
// llama-server's /infill endpoint for code infilling
pm.ginEngine.POST("/infill", mm, pm.proxyOAIHandler)
pm.ginEngine.POST("/infill", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
// llama-server's /completion endpoint
pm.ginEngine.POST("/completion", mm, pm.proxyOAIHandler)
pm.ginEngine.POST("/completion", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
// Support audio/speech endpoint
pm.ginEngine.POST("/v1/audio/speech", pm.proxyOAIHandler)
pm.ginEngine.POST("/v1/audio/transcriptions", pm.proxyOAIPostFormHandler)
pm.ginEngine.POST("/v1/audio/speech", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
pm.ginEngine.POST("/v1/audio/voices", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
pm.ginEngine.GET("/v1/audio/voices", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyGETModelHandler)
pm.ginEngine.POST("/v1/audio/transcriptions", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyOAIPostFormHandler)
pm.ginEngine.POST("/v1/images/generations", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
pm.ginEngine.POST("/v1/images/edits", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyOAIPostFormHandler)
pm.ginEngine.GET("/v1/models", pm.listModelsHandler)
pm.ginEngine.GET("/v1/models", pm.apiKeyAuth(), pm.listModelsHandler)
// in proxymanager_loghandlers.go
pm.ginEngine.GET("/logs", pm.sendLogsHandlers)
pm.ginEngine.GET("/logs/stream", pm.streamLogsHandler)
pm.ginEngine.GET("/logs/stream/:logMonitorID", pm.streamLogsHandler)
pm.ginEngine.GET("/logs", pm.apiKeyAuth(), pm.sendLogsHandlers)
pm.ginEngine.GET("/logs/stream", pm.apiKeyAuth(), pm.streamLogsHandler)
pm.ginEngine.GET("/logs/stream/*logMonitorID", pm.apiKeyAuth(), pm.streamLogsHandler)
/**
* User Interface Endpoints
@@ -227,14 +363,18 @@ func (pm *ProxyManager) setupGinEngine() {
pm.ginEngine.GET("/upstream", func(c *gin.Context) {
c.Redirect(http.StatusFound, "/ui/models")
})
pm.ginEngine.Any("/upstream/*upstreamPath", pm.proxyToUpstream)
pm.ginEngine.GET("/unload", pm.unloadAllModelsHandler)
pm.ginEngine.GET("/running", pm.listRunningProcessesHandler)
pm.ginEngine.Any("/upstream/*upstreamPath", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyToUpstream)
pm.ginEngine.GET("/unload", pm.apiKeyAuth(), pm.unloadAllModelsHandler)
pm.ginEngine.GET("/running", pm.apiKeyAuth(), pm.listRunningProcessesHandler)
pm.ginEngine.GET("/health", func(c *gin.Context) {
c.String(http.StatusOK, "OK")
})
// see cmd/wol-proxy/wol-proxy.go, not logged
pm.ginEngine.GET("/wol-health", func(c *gin.Context) {
c.String(http.StatusOK, "OK")
})
pm.ginEngine.GET("/favicon.ico", func(c *gin.Context) {
if data, err := reactStaticFS.ReadFile("ui_dist/favicon.ico"); err == nil {
c.Data(http.StatusOK, "image/x-icon", data)
@@ -247,25 +387,35 @@ func (pm *ProxyManager) setupGinEngine() {
if err != nil {
pm.proxyLogger.Errorf("Failed to load React filesystem: %v", err)
} else {
// Serve files with compression support under /ui/*
// This handler checks for pre-compressed .br and .gz files
pm.ginEngine.GET("/ui/*filepath", func(c *gin.Context) {
filepath := strings.TrimPrefix(c.Param("filepath"), "/")
// Default to index.html for directory-like paths
if filepath == "" {
filepath = "index.html"
}
// serve files that exist under /ui/*
pm.ginEngine.StaticFS("/ui", reactFS)
ServeCompressedFile(reactFS, c.Writer, c.Request, filepath)
})
// server SPA for UI under /ui/*
// Serve SPA for UI under /ui/* - fallback to index.html for client-side routing
pm.ginEngine.NoRoute(func(c *gin.Context) {
if !strings.HasPrefix(c.Request.URL.Path, "/ui") {
c.AbortWithStatus(http.StatusNotFound)
return
}
file, err := reactFS.Open("index.html")
if err != nil {
c.String(http.StatusInternalServerError, err.Error())
// Check if this looks like a file request (has extension)
path := c.Request.URL.Path
if strings.Contains(path, ".") && !strings.HasSuffix(path, "/") {
// This was likely a file request that wasn't found
c.AbortWithStatus(http.StatusNotFound)
return
}
defer file.Close()
http.ServeContent(c.Writer, c.Request, "index.html", time.Now(), file)
// Serve index.html for SPA routing
ServeCompressedFile(reactFS, c.Writer, c.Request, "index.html")
})
}
@@ -277,6 +427,14 @@ func (pm *ProxyManager) setupGinEngine() {
gin.DisableConsoleColor()
}
func (pm *ProxyManager) trackInflight() gin.HandlerFunc {
return func(c *gin.Context) {
event.Emit(InFlightRequestsEvent{Total: pm.inFlightCounter.Increment()})
defer event.Emit(InFlightRequestsEvent{Total: pm.inFlightCounter.Decrement()})
c.Next()
}
}
// ServeHTTP implements http.Handler interface
func (pm *ProxyManager) ServeHTTP(w http.ResponseWriter, r *http.Request) {
pm.ginEngine.ServeHTTP(w, r)
@@ -323,16 +481,10 @@ func (pm *ProxyManager) Shutdown() {
pm.shutdownCancel()
}
func (pm *ProxyManager) swapProcessGroup(requestedModel string) (*ProcessGroup, string, error) {
// de-alias the real model name and get a real one
realModelName, found := pm.config.RealModelName(requestedModel)
if !found {
return nil, realModelName, fmt.Errorf("could not find real modelID for %s", requestedModel)
}
func (pm *ProxyManager) swapProcessGroup(realModelName string) (*ProcessGroup, error) {
processGroup := pm.findGroupByModelName(realModelName)
if processGroup == nil {
return nil, realModelName, fmt.Errorf("could not find process group for model %s", requestedModel)
return nil, fmt.Errorf("could not find process group for model %s", realModelName)
}
if processGroup.exclusive {
@@ -344,20 +496,16 @@ func (pm *ProxyManager) swapProcessGroup(requestedModel string) (*ProcessGroup,
}
}
return processGroup, realModelName, nil
return processGroup, nil
}
func (pm *ProxyManager) listModelsHandler(c *gin.Context) {
data := make([]gin.H, 0, len(pm.config.Models))
createdTime := time.Now().Unix()
for id, modelConfig := range pm.config.Models {
if modelConfig.Unlisted {
continue
}
newRecord := func(modelId string, modelConfig config.ModelConfig) gin.H {
record := gin.H{
"id": id,
"id": modelId,
"object": "model",
"created": createdTime,
"owned_by": "llama-swap",
@@ -370,7 +518,47 @@ func (pm *ProxyManager) listModelsHandler(c *gin.Context) {
record["description"] = desc
}
data = append(data, record)
// Add metadata if present
if len(modelConfig.Metadata) > 0 {
record["meta"] = gin.H{
"llamaswap": modelConfig.Metadata,
}
}
return record
}
for id, modelConfig := range pm.config.Models {
if modelConfig.Unlisted {
continue
}
data = append(data, newRecord(id, modelConfig))
// Include aliases
if pm.config.IncludeAliasesInList {
for _, alias := range modelConfig.Aliases {
if alias := strings.TrimSpace(alias); alias != "" {
data = append(data, newRecord(alias, modelConfig))
}
}
}
}
if pm.peerProxy != nil {
for peerID, peer := range pm.peerProxy.ListPeers() {
// add peer models
for _, modelID := range peer.Models {
// Skip unlisted models if not showing them
record := newRecord(modelID, config.ModelConfig{
Name: fmt.Sprintf("%s: %s", peerID, modelID),
Metadata: map[string]any{
"peerID": peerID,
},
})
data = append(data, record)
}
}
}
// Sort by the "id" key
@@ -392,54 +580,87 @@ func (pm *ProxyManager) listModelsHandler(c *gin.Context) {
})
}
func (pm *ProxyManager) proxyToUpstream(c *gin.Context) {
upstreamPath := c.Param("upstreamPath")
// split the upstream path by / and search for the model name
parts := strings.Split(strings.TrimSpace(upstreamPath), "/")
if len(parts) == 0 {
pm.sendErrorResponse(c, http.StatusBadRequest, "model id required in path")
return
}
modelFound := false
// findModelInPath searches for a valid model name in a path with slashes.
// It iteratively builds up path segments until it finds a matching model.
// Returns: (searchModelName, realModelName, remainingPath, found)
// Example: "/author/model/endpoint" with model "author/model" -> ("author/model", "author/model", "/endpoint", true)
func (pm *ProxyManager) findModelInPath(path string) (searchName string, realName string, remainingPath string, found bool) {
parts := strings.Split(strings.TrimSpace(path), "/")
searchModelName := ""
var modelName, remainingPath string
for i, part := range parts {
if parts[i] == "" {
if part == "" {
continue
}
if searchModelName == "" {
searchModelName = part
} else {
searchModelName = searchModelName + "/" + parts[i]
searchModelName = searchModelName + "/" + part
}
if real, ok := pm.config.RealModelName(searchModelName); ok {
modelName = real
remainingPath = "/" + strings.Join(parts[i+1:], "/")
modelFound = true
break
if modelID, ok := pm.config.RealModelName(searchModelName); ok {
return searchModelName, modelID, "/" + strings.Join(parts[i+1:], "/"), true
}
}
return "", "", "", false
}
func (pm *ProxyManager) proxyToUpstream(c *gin.Context) {
upstreamPath := c.Param("upstreamPath")
searchModelName, modelID, remainingPath, modelFound := pm.findModelInPath(upstreamPath)
if !modelFound {
pm.sendErrorResponse(c, http.StatusBadRequest, "model id required in path")
return
}
processGroup, realModelName, err := pm.swapProcessGroup(modelName)
// Redirect /upstream/modelname to /upstream/modelname/ for URL consistency.
// This ensures relative URLs in upstream responses resolve correctly and
// provides canonical URL form. Uses 308 for POST/PUT/etc to preserve the
// HTTP method (301 would downgrade to GET).
if remainingPath == "/" && !strings.HasSuffix(upstreamPath, "/") {
newPath := "/upstream/" + searchModelName + "/"
if c.Request.URL.RawQuery != "" {
newPath += "?" + c.Request.URL.RawQuery
}
if c.Request.Method == http.MethodGet || c.Request.Method == http.MethodHead {
c.Redirect(http.StatusMovedPermanently, newPath)
} else {
c.Redirect(http.StatusPermanentRedirect, newPath)
}
return
}
processGroup, err := pm.swapProcessGroup(modelID)
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
return
}
// rewrite the path
originalPath := c.Request.URL.Path
c.Request.URL.Path = remainingPath
processGroup.ProxyRequest(realModelName, c.Writer, c.Request)
// attempt to record metrics if it is a POST request
if pm.metricsMonitor != nil && c.Request.Method == "POST" {
if err := pm.metricsMonitor.wrapHandler(modelID, c.Writer, c.Request, processGroup.ProxyRequest); err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error proxying metrics wrapped request: %s", err.Error()))
pm.proxyLogger.Errorf("Error proxying wrapped upstream request for model %s, path=%s", modelID, originalPath)
return
}
} else {
if err := processGroup.ProxyRequest(modelID, c.Writer, c.Request); err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error proxying request: %s", err.Error()))
pm.proxyLogger.Errorf("Error proxying upstream request for model %s, path=%s", modelID, originalPath)
return
}
}
}
func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
func (pm *ProxyManager) proxyInferenceHandler(c *gin.Context) {
bodyBytes, err := io.ReadAll(c.Request.Body)
if err != nil {
pm.sendErrorResponse(c, http.StatusBadRequest, "could not ready request body")
@@ -452,41 +673,101 @@ func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
return
}
realModelName, found := pm.config.RealModelName(requestedModel)
if !found {
pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("could not find real modelID for %s", requestedModel))
return
}
// Look for a matching local model first
var nextHandler func(modelID string, w http.ResponseWriter, r *http.Request) error
processGroup, _, err := pm.swapProcessGroup(realModelName)
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
return
}
// issue #69 allow custom model names to be sent to upstream
useModelName := pm.config.Models[realModelName].UseModelName
if useModelName != "" {
bodyBytes, err = sjson.SetBytes(bodyBytes, "model", useModelName)
modelID, found := pm.config.RealModelName(requestedModel)
if found {
processGroup, err := pm.swapProcessGroup(modelID)
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error rewriting model name in JSON: %s", err.Error()))
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
return
}
}
// issue #174 strip parameters from the JSON body
stripParams, err := pm.config.Models[realModelName].Filters.SanitizedStripParams()
if err != nil { // just log it and continue
pm.proxyLogger.Errorf("Error sanitizing strip params string: %s, %s", pm.config.Models[realModelName].Filters.StripParams, err.Error())
} else {
for _, param := range stripParams {
pm.proxyLogger.Debugf("<%s> stripping param: %s", realModelName, param)
bodyBytes, err = sjson.DeleteBytes(bodyBytes, param)
// issue #69 allow custom model names to be sent to upstream
useModelName := pm.config.Models[modelID].UseModelName
if useModelName != "" {
bodyBytes, err = sjson.SetBytes(bodyBytes, "model", useModelName)
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error deleting parameter %s from request", param))
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error rewriting model name in JSON: %s", err.Error()))
return
}
}
// issue #174 strip parameters from the JSON body
stripParams, err := pm.config.Models[modelID].Filters.SanitizedStripParams()
if err != nil { // just log it and continue
pm.proxyLogger.Errorf("Error sanitizing strip params string: %s, %s", pm.config.Models[modelID].Filters.StripParams, err.Error())
} else {
for _, param := range stripParams {
pm.proxyLogger.Debugf("<%s> stripping param: %s", modelID, param)
bodyBytes, err = sjson.DeleteBytes(bodyBytes, param)
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error deleting parameter %s from request", param))
return
}
}
}
// issue #453 set/override parameters in the JSON body
setParams, setParamKeys := pm.config.Models[modelID].Filters.SanitizedSetParams()
for _, key := range setParamKeys {
pm.proxyLogger.Debugf("<%s> setting param: %s", modelID, key)
bodyBytes, err = sjson.SetBytes(bodyBytes, key, setParams[key])
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error setting parameter %s in request", key))
return
}
}
// setParamsByID: set params based on the requested model ID (runs after setParams, can override it)
setParamsByIDParams, setParamsByIDKeys := pm.config.Models[modelID].Filters.SanitizedSetParamsByID(requestedModel)
for _, key := range setParamsByIDKeys {
pm.proxyLogger.Debugf("<%s> setting param by id: %s", requestedModel, key)
bodyBytes, err = sjson.SetBytes(bodyBytes, key, setParamsByIDParams[key])
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error setting parameter %s in request", key))
return
}
}
pm.proxyLogger.Debugf("ProxyManager using local Process for model: %s", requestedModel)
nextHandler = processGroup.ProxyRequest
} else if pm.peerProxy != nil && pm.peerProxy.HasPeerModel(requestedModel) {
pm.proxyLogger.Debugf("ProxyManager using ProxyPeer for model: %s", requestedModel)
modelID = requestedModel
// issue #453 apply filters for peer requests
peerFilters := pm.peerProxy.GetPeerFilters(requestedModel)
// Apply stripParams - remove specified parameters from request
stripParams := peerFilters.SanitizedStripParams()
for _, param := range stripParams {
pm.proxyLogger.Debugf("<%s> stripping param: %s", requestedModel, param)
bodyBytes, err = sjson.DeleteBytes(bodyBytes, param)
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error stripping parameter %s from request", param))
return
}
}
// Apply setParams - set/override specified parameters in request
setParams, setParamKeys := peerFilters.SanitizedSetParams()
for _, key := range setParamKeys {
pm.proxyLogger.Debugf("<%s> setting param: %s", requestedModel, key)
bodyBytes, err = sjson.SetBytes(bodyBytes, key, setParams[key])
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error setting parameter %s in request", key))
return
}
}
nextHandler = pm.peerProxy.ProxyRequest
}
if nextHandler == nil {
pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("could not find suitable inference handler for %s", requestedModel))
return
}
c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
@@ -496,10 +777,24 @@ func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
c.Request.Header.Set("content-length", strconv.Itoa(len(bodyBytes)))
c.Request.ContentLength = int64(len(bodyBytes))
if err := processGroup.ProxyRequest(realModelName, c.Writer, c.Request); err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error proxying request: %s", err.Error()))
pm.proxyLogger.Errorf("Error Proxying Request for processGroup %s and model %s", processGroup.id, realModelName)
return
// issue #366 extract values that downstream handlers may need
isStreaming := gjson.GetBytes(bodyBytes, "stream").Bool()
ctx := context.WithValue(c.Request.Context(), proxyCtxKey("streaming"), isStreaming)
ctx = context.WithValue(ctx, proxyCtxKey("model"), modelID)
c.Request = c.Request.WithContext(ctx)
if pm.metricsMonitor != nil && c.Request.Method == "POST" {
if err := pm.metricsMonitor.wrapHandler(modelID, c.Writer, c.Request, nextHandler); err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error proxying metrics wrapped request: %s", err.Error()))
pm.proxyLogger.Errorf("Error Proxying Metrics Wrapped Request model %s", modelID)
return
}
} else {
if err := nextHandler(modelID, c.Writer, c.Request); err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error proxying request: %s", err.Error()))
pm.proxyLogger.Errorf("Error Proxying Request for model %s", modelID)
return
}
}
}
@@ -517,9 +812,29 @@ func (pm *ProxyManager) proxyOAIPostFormHandler(c *gin.Context) {
return
}
processGroup, realModelName, err := pm.swapProcessGroup(requestedModel)
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
// Look for a matching local model first, then check peers
var nextHandler func(modelID string, w http.ResponseWriter, r *http.Request) error
var useModelName string
modelID, found := pm.config.RealModelName(requestedModel)
if found {
processGroup, err := pm.swapProcessGroup(modelID)
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
return
}
useModelName = pm.config.Models[modelID].UseModelName
pm.proxyLogger.Debugf("ProxyManager using local Process for model: %s", requestedModel)
nextHandler = processGroup.ProxyRequest
} else if pm.peerProxy != nil && pm.peerProxy.HasPeerModel(requestedModel) {
pm.proxyLogger.Debugf("ProxyManager using ProxyPeer for model: %s", requestedModel)
modelID = requestedModel
nextHandler = pm.peerProxy.ProxyRequest
}
if nextHandler == nil {
pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("could not find suitable handler for %s", requestedModel))
return
}
@@ -535,8 +850,6 @@ func (pm *ProxyManager) proxyOAIPostFormHandler(c *gin.Context) {
// If this is the model field and we have a profile, use just the model name
if key == "model" {
// # issue #69 allow custom model names to be sent to upstream
useModelName := pm.config.Models[realModelName].UseModelName
if useModelName != "" {
fieldValue = useModelName
} else {
@@ -606,9 +919,46 @@ func (pm *ProxyManager) proxyOAIPostFormHandler(c *gin.Context) {
modifiedReq.ContentLength = int64(requestBuffer.Len())
// Use the modified request for proxying
if err := processGroup.ProxyRequest(realModelName, c.Writer, modifiedReq); err != nil {
if err := nextHandler(modelID, c.Writer, modifiedReq); err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error proxying request: %s", err.Error()))
pm.proxyLogger.Errorf("Error Proxying Request for processGroup %s and model %s", processGroup.id, realModelName)
pm.proxyLogger.Errorf("Error Proxying Request for model %s", modelID)
return
}
}
func (pm *ProxyManager) proxyGETModelHandler(c *gin.Context) {
requestedModel := c.Query("model")
if requestedModel == "" {
pm.sendErrorResponse(c, http.StatusBadRequest, "missing required 'model' query parameter")
return
}
var nextHandler func(modelID string, w http.ResponseWriter, r *http.Request) error
var modelID string
if realModelID, found := pm.config.RealModelName(requestedModel); found {
processGroup, err := pm.swapProcessGroup(realModelID)
if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
return
}
modelID = realModelID
pm.proxyLogger.Debugf("ProxyManager using local Process for model: %s", requestedModel)
nextHandler = processGroup.ProxyRequest
} else if pm.peerProxy != nil && pm.peerProxy.HasPeerModel(requestedModel) {
modelID = requestedModel
pm.proxyLogger.Debugf("ProxyManager using ProxyPeer for model: %s", requestedModel)
nextHandler = pm.peerProxy.ProxyRequest
}
if nextHandler == nil {
pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("could not find suitable handler for %s", requestedModel))
return
}
if err := nextHandler(modelID, c.Writer, c.Request); err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error proxying request: %s", err.Error()))
pm.proxyLogger.Errorf("Error Proxying GET Request for model %s", modelID)
return
}
}
@@ -623,6 +973,67 @@ func (pm *ProxyManager) sendErrorResponse(c *gin.Context, statusCode int, messag
}
}
// apiKeyAuth returns a middleware that validates API keys if configured.
// Returns a pass-through handler if no API keys are configured.
func (pm *ProxyManager) apiKeyAuth() gin.HandlerFunc {
if len(pm.config.RequiredAPIKeys) == 0 {
return func(c *gin.Context) { c.Next() }
}
return func(c *gin.Context) {
xApiKey := c.GetHeader("x-api-key")
var bearerKey string
var basicKey string
if auth := c.GetHeader("Authorization"); auth != "" {
if strings.HasPrefix(auth, "Bearer ") {
bearerKey = strings.TrimPrefix(auth, "Bearer ")
} else if strings.HasPrefix(auth, "Basic ") {
// Basic Auth: base64(username:password), password is the API key
encoded := strings.TrimPrefix(auth, "Basic ")
if decoded, err := base64.StdEncoding.DecodeString(encoded); err == nil {
parts := strings.SplitN(string(decoded), ":", 2)
if len(parts) == 2 {
basicKey = parts[1] // password is the API key
}
}
}
}
// Use first key found: Basic, then Bearer, then x-api-key
var providedKey string
if basicKey != "" {
providedKey = basicKey
} else if bearerKey != "" {
providedKey = bearerKey
} else {
providedKey = xApiKey
}
// Validate key
valid := false
for _, key := range pm.config.RequiredAPIKeys {
if providedKey == key {
valid = true
break
}
}
if !valid {
c.Header("WWW-Authenticate", `Basic realm="llama-swap"`)
pm.sendErrorResponse(c, http.StatusUnauthorized, "unauthorized: invalid or missing API key")
c.Abort()
return
}
// Strip auth headers to prevent leakage to upstream
c.Request.Header.Del("Authorization")
c.Request.Header.Del("x-api-key")
c.Next()
}
}
func (pm *ProxyManager) unloadAllModelsHandler(c *gin.Context) {
pm.StopProcesses(StopImmediately)
c.String(http.StatusOK, "OK")
@@ -636,8 +1047,13 @@ func (pm *ProxyManager) listRunningProcessesHandler(context *gin.Context) {
for _, process := range processGroup.processes {
if process.CurrentState() == StateReady {
runningProcesses = append(runningProcesses, gin.H{
"model": process.ID,
"state": process.state,
"model": process.ID,
"state": process.state,
"cmd": process.config.Cmd,
"proxy": process.config.Proxy,
"ttl": process.config.UnloadAfter,
"name": process.config.Name,
"description": process.config.Description,
})
}
}
@@ -659,3 +1075,11 @@ func (pm *ProxyManager) findGroupByModelName(modelName string) *ProcessGroup {
}
return nil
}
func (pm *ProxyManager) SetVersion(buildDate string, commit string, version string) {
pm.Lock()
defer pm.Unlock()
pm.buildDate = buildDate
pm.commit = commit
pm.version = version
}
+98 -8
View File
@@ -3,28 +3,37 @@ package proxy
import (
"context"
"encoding/json"
"fmt"
"net/http"
"sort"
"strconv"
"strings"
"github.com/gin-gonic/gin"
"github.com/mostlygeek/llama-swap/event"
)
type Model struct {
Id string `json:"id"`
Name string `json:"name"`
Description string `json:"description"`
State string `json:"state"`
Unlisted bool `json:"unlisted"`
Id string `json:"id"`
Name string `json:"name"`
Description string `json:"description"`
State string `json:"state"`
Unlisted bool `json:"unlisted"`
PeerID string `json:"peerID"`
Aliases []string `json:"aliases,omitempty"`
}
func addApiHandlers(pm *ProxyManager) {
// Add API endpoints for React to consume
apiGroup := pm.ginEngine.Group("/api")
// Protected with API key authentication
apiGroup := pm.ginEngine.Group("/api", pm.apiKeyAuth())
{
apiGroup.POST("/models/unload", pm.apiUnloadAllModels)
apiGroup.POST("/models/unload/*model", pm.apiUnloadSingleModelHandler)
apiGroup.GET("/events", pm.apiSendEvents)
apiGroup.GET("/metrics", pm.apiGetMetrics)
apiGroup.GET("/version", pm.apiGetVersion)
apiGroup.GET("/captures/:id", pm.apiGetCapture)
}
}
@@ -75,9 +84,22 @@ func (pm *ProxyManager) getModelStatus() []Model {
Description: pm.config.Models[modelID].Description,
State: state,
Unlisted: pm.config.Models[modelID].Unlisted,
Aliases: pm.config.Models[modelID].Aliases,
})
}
// Iterate over the peer models
if pm.peerProxy != nil {
for peerID, peer := range pm.peerProxy.ListPeers() {
for _, modelID := range peer.Models {
models = append(models, Model{
Id: modelID,
PeerID: peerID,
})
}
}
}
return models
}
@@ -87,6 +109,7 @@ const (
msgTypeModelStatus messageType = "modelStatus"
msgTypeLogData messageType = "logData"
msgTypeMetrics messageType = "metrics"
msgTypeInFlight messageType = "inflight"
)
type messageEnvelope struct {
@@ -146,6 +169,18 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
}
}
sendInFlight := func(total int) {
jsonData, err := json.Marshal(gin.H{"total": total})
if err == nil {
select {
case sendBuffer <- messageEnvelope{Type: msgTypeInFlight, Data: string(jsonData)}:
case <-ctx.Done():
return
default:
}
}
}
/**
* Send updated models list
*/
@@ -173,11 +208,19 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
sendMetrics([]TokenMetrics{e.Metrics})
})()
/**
* Send in-flight request stats related to token stats "Waiting: N" count.
*/
defer event.On(func(e InFlightRequestsEvent) {
sendInFlight(e.Total)
})()
// send initial batch of data
sendLogData("proxy", pm.proxyLogger.GetHistory())
sendLogData("upstream", pm.upstreamLogger.GetHistory())
sendModels()
sendMetrics(pm.metricsMonitor.GetMetrics())
sendMetrics(pm.metricsMonitor.getMetrics())
sendInFlight(pm.inFlightCounter.Current())
for {
select {
@@ -195,10 +238,57 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
}
func (pm *ProxyManager) apiGetMetrics(c *gin.Context) {
jsonData, err := pm.metricsMonitor.GetMetricsJSON()
jsonData, err := pm.metricsMonitor.getMetricsJSON()
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to get metrics"})
return
}
c.Data(http.StatusOK, "application/json", jsonData)
}
func (pm *ProxyManager) apiUnloadSingleModelHandler(c *gin.Context) {
requestedModel := strings.TrimPrefix(c.Param("model"), "/")
realModelName, found := pm.config.RealModelName(requestedModel)
if !found {
pm.sendErrorResponse(c, http.StatusNotFound, "Model not found")
return
}
processGroup := pm.findGroupByModelName(realModelName)
if processGroup == nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("process group not found for model %s", requestedModel))
return
}
if err := processGroup.StopProcess(realModelName, StopImmediately); err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error stopping process: %s", err.Error()))
return
} else {
c.String(http.StatusOK, "OK")
}
}
func (pm *ProxyManager) apiGetVersion(c *gin.Context) {
c.JSON(http.StatusOK, map[string]string{
"version": pm.version,
"commit": pm.commit,
"build_date": pm.buildDate,
})
}
func (pm *ProxyManager) apiGetCapture(c *gin.Context) {
idStr := c.Param("id")
id, err := strconv.Atoi(idStr)
if err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid capture ID"})
return
}
capture := pm.metricsMonitor.getCaptureByID(id)
if capture == nil {
c.JSON(http.StatusNotFound, gin.H{"error": "capture not found"})
return
}
c.JSON(http.StatusOK, capture)
}
+20 -13
View File
@@ -31,7 +31,7 @@ func (pm *ProxyManager) streamLogsHandler(c *gin.Context) {
// prevent nginx from buffering streamed logs
c.Header("X-Accel-Buffering", "no")
logMonitorId := c.Param("logMonitorID")
logMonitorId := strings.TrimPrefix(c.Param("logMonitorID"), "/")
logger, err := pm.getLogger(logMonitorId)
if err != nil {
c.String(http.StatusBadRequest, err.Error())
@@ -83,18 +83,25 @@ func (pm *ProxyManager) streamLogsHandler(c *gin.Context) {
// getLogger searches for the appropriate logger based on the logMonitorId
func (pm *ProxyManager) getLogger(logMonitorId string) (*LogMonitor, error) {
var logger *LogMonitor
if logMonitorId == "" {
switch logMonitorId {
case "":
// maintain the default
logger = pm.muxLogger
} else if logMonitorId == "proxy" {
logger = pm.proxyLogger
} else if logMonitorId == "upstream" {
logger = pm.upstreamLogger
} else {
return nil, fmt.Errorf("invalid logger. Use 'proxy' or 'upstream'")
}
return pm.muxLogger, nil
case "proxy":
return pm.proxyLogger, nil
case "upstream":
return pm.upstreamLogger, nil
default:
// search for a models specific logger using findModelInPath
// to handle model names with slashes (e.g., "author/model")
if _, name, _, found := pm.findModelInPath("/" + logMonitorId); found {
for _, group := range pm.processGroups {
if process, found := group.GetMember(name); found {
return process.Logger(), nil
}
}
}
return logger, nil
return nil, fmt.Errorf("invalid logger. Use 'proxy', 'upstream' or a model's ID")
}
}
+830 -149
View File
File diff suppressed because it is too large Load Diff
+81
View File
@@ -0,0 +1,81 @@
package proxy
import (
"net/http"
"strings"
)
// selectEncoding chooses the best encoding based on Accept-Encoding header
// Returns the encoding ("br", "gzip", or "") and the corresponding file extension
func selectEncoding(acceptEncoding string) (encoding, ext string) {
if acceptEncoding == "" {
return "", ""
}
for _, part := range strings.Split(acceptEncoding, ",") {
enc := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
if enc == "br" {
return "br", ".br"
}
}
for _, part := range strings.Split(acceptEncoding, ",") {
enc := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
if enc == "gzip" {
return "gzip", ".gz"
}
}
return "", ""
}
// ServeCompressedFile serves a file with compression support.
// It checks for pre-compressed versions and serves them with proper headers.
func ServeCompressedFile(fs http.FileSystem, w http.ResponseWriter, r *http.Request, name string) {
encoding, ext := selectEncoding(r.Header.Get("Accept-Encoding"))
// Try to serve compressed version if client supports it
if encoding != "" {
if cf, err := fs.Open(name + ext); err == nil {
defer cf.Close()
// Verify it's a regular file (not a directory)
if stat, err := cf.Stat(); err == nil && !stat.IsDir() {
// Set the content encoding header
w.Header().Set("Content-Encoding", encoding)
w.Header().Add("Vary", "Accept-Encoding")
// Get original file info for content type detection
origFile, err := fs.Open(name)
if err == nil {
origFile.Close()
}
// Serve the compressed file
http.ServeContent(w, r, name, stat.ModTime(), cf)
return
}
}
}
// Fall back to serving the uncompressed file
file, err := fs.Open(name)
if err != nil {
http.Error(w, err.Error(), http.StatusNotFound)
return
}
defer file.Close()
stat, err := file.Stat()
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
if stat.IsDir() {
http.Error(w, "is a directory", http.StatusForbidden)
return
}
http.ServeContent(w, r, name, stat.ModTime(), file)
}
+283
View File
@@ -0,0 +1,283 @@
package proxy
import (
"bytes"
"compress/gzip"
"io"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strings"
"testing"
"testing/fstest"
"time"
)
func TestServeCompressedFile_Brotli(t *testing.T) {
// Create test content
content := []byte("This is test content that should be compressed with brotli")
brContent := []byte("fake-brotli-compressed-data")
// Create a test filesystem
mapFS := fstest.MapFS{
"test.js": {Data: content, ModTime: time.Now()},
"test.js.br": {Data: brContent, ModTime: time.Now()},
"test.js.gz": {Data: []byte("fake-gzip-data"), ModTime: time.Now()},
}
fs := http.FS(mapFS)
req := httptest.NewRequest(http.MethodGet, "/test.js", nil)
req.Header.Set("Accept-Encoding", "br, gzip")
w := httptest.NewRecorder()
ServeCompressedFile(fs, w, req, "test.js")
resp := w.Result()
body, _ := io.ReadAll(resp.Body)
if resp.StatusCode != http.StatusOK {
t.Errorf("Expected status 200, got %d", resp.StatusCode)
}
// Check that brotli is used (preferred over gzip)
if encoding := resp.Header.Get("Content-Encoding"); encoding != "br" {
t.Errorf("Expected Content-Encoding 'br', got '%s'", encoding)
}
if vary := resp.Header.Get("Vary"); vary != "Accept-Encoding" {
t.Errorf("Expected Vary 'Accept-Encoding', got '%s'", vary)
}
if !bytes.Equal(body, brContent) {
t.Errorf("Expected brotli content, got %s", string(body))
}
}
func TestServeCompressedFile_Gzip(t *testing.T) {
// Create test content
content := []byte("This is test content that should be compressed with gzip")
gzContent := []byte("fake-gzip-compressed-data")
// Create a test filesystem without brotli
mapFS := fstest.MapFS{
"test.js": {Data: content, ModTime: time.Now()},
"test.js.gz": {Data: gzContent, ModTime: time.Now()},
}
fs := http.FS(mapFS)
req := httptest.NewRequest(http.MethodGet, "/test.js", nil)
req.Header.Set("Accept-Encoding", "gzip")
w := httptest.NewRecorder()
ServeCompressedFile(fs, w, req, "test.js")
resp := w.Result()
body, _ := io.ReadAll(resp.Body)
if resp.StatusCode != http.StatusOK {
t.Errorf("Expected status 200, got %d", resp.StatusCode)
}
if encoding := resp.Header.Get("Content-Encoding"); encoding != "gzip" {
t.Errorf("Expected Content-Encoding 'gzip', got '%s'", encoding)
}
if !bytes.Equal(body, gzContent) {
t.Errorf("Expected gzip content, got %s", string(body))
}
}
func TestServeCompressedFile_UncompressedFallback(t *testing.T) {
// Create test content
content := []byte("This is uncompressed test content")
// Create a test filesystem without compressed versions
mapFS := fstest.MapFS{
"test.js": {Data: content, ModTime: time.Now()},
}
fs := http.FS(mapFS)
req := httptest.NewRequest(http.MethodGet, "/test.js", nil)
req.Header.Set("Accept-Encoding", "br, gzip")
w := httptest.NewRecorder()
ServeCompressedFile(fs, w, req, "test.js")
resp := w.Result()
body, _ := io.ReadAll(resp.Body)
if resp.StatusCode != http.StatusOK {
t.Errorf("Expected status 200, got %d", resp.StatusCode)
}
// Should not have Content-Encoding header since we're serving uncompressed
if encoding := resp.Header.Get("Content-Encoding"); encoding != "" {
t.Errorf("Expected no Content-Encoding, got '%s'", encoding)
}
if !bytes.Equal(body, content) {
t.Errorf("Expected original content, got %s", string(body))
}
}
func TestServeCompressedFile_NoAcceptEncoding(t *testing.T) {
// Create test content
content := []byte("This is test content")
// Create a test filesystem with compressed versions
mapFS := fstest.MapFS{
"test.js": {Data: content, ModTime: time.Now()},
"test.js.br": {Data: []byte("brotli"), ModTime: time.Now()},
"test.js.gz": {Data: []byte("gzip"), ModTime: time.Now()},
}
fs := http.FS(mapFS)
req := httptest.NewRequest(http.MethodGet, "/test.js", nil)
// No Accept-Encoding header
w := httptest.NewRecorder()
ServeCompressedFile(fs, w, req, "test.js")
resp := w.Result()
body, _ := io.ReadAll(resp.Body)
if resp.StatusCode != http.StatusOK {
t.Errorf("Expected status 200, got %d", resp.StatusCode)
}
// Should serve uncompressed content
if encoding := resp.Header.Get("Content-Encoding"); encoding != "" {
t.Errorf("Expected no Content-Encoding, got '%s'", encoding)
}
if !bytes.Equal(body, content) {
t.Errorf("Expected original content, got %s", string(body))
}
}
func TestServeCompressedFile_NotFound(t *testing.T) {
mapFS := fstest.MapFS{}
fs := http.FS(mapFS)
req := httptest.NewRequest(http.MethodGet, "/nonexistent.js", nil)
w := httptest.NewRecorder()
ServeCompressedFile(fs, w, req, "nonexistent.js")
resp := w.Result()
if resp.StatusCode != http.StatusNotFound {
t.Errorf("Expected status 404, got %d", resp.StatusCode)
}
}
func TestSelectEncoding(t *testing.T) {
tests := []struct {
acceptEncoding string
wantEncoding string
wantExt string
}{
{"br, gzip", "br", ".br"},
{"gzip, deflate", "gzip", ".gz"},
{"gzip", "gzip", ".gz"},
{"br", "br", ".br"},
{"", "", ""},
{"deflate", "", ""},
{"br;q=1.0, gzip;q=0.5", "br", ".br"},
{"gzip;q=1.0, br;q=0.5", "br", ".br"},
{"browser", "", ""},
{"compress, deflate", "", ""},
}
for _, tt := range tests {
gotEncoding, gotExt := selectEncoding(tt.acceptEncoding)
if gotEncoding != tt.wantEncoding || gotExt != tt.wantExt {
t.Errorf("selectEncoding(%q) = (%q, %q), want (%q, %q)",
tt.acceptEncoding, gotEncoding, gotExt, tt.wantEncoding, tt.wantExt)
}
}
}
// Test with actual pre-compressed files from ui_dist
func TestServeCompressedFile_RealFiles(t *testing.T) {
// Check if ui_dist exists
if _, err := os.Stat("./ui_dist"); os.IsNotExist(err) {
t.Skip("ui_dist not found, skipping real file test")
}
// Find a .js or .css file that has compressed versions
entries, err := os.ReadDir("./ui_dist/assets")
if err != nil {
t.Skipf("Could not read ui_dist/assets: %v", err)
}
var testFile string
for _, entry := range entries {
name := entry.Name()
if strings.HasSuffix(name, ".js") && !strings.HasSuffix(name, ".js.gz") && !strings.HasSuffix(name, ".js.br") {
// Check if compressed versions exist
base := strings.TrimSuffix(name, ".js")
if _, err := os.Stat(filepath.Join("./ui_dist/assets", base+".js.gz")); err == nil {
testFile = "assets/" + name
break
}
}
}
if testFile == "" {
t.Skip("No suitable test file found with compressed versions")
}
fs := http.FS(os.DirFS("./ui_dist"))
// Test brotli
t.Run("brotli", func(t *testing.T) {
req := httptest.NewRequest(http.MethodGet, "/"+testFile, nil)
req.Header.Set("Accept-Encoding", "br")
w := httptest.NewRecorder()
ServeCompressedFile(fs, w, req, testFile)
resp := w.Result()
if resp.StatusCode != http.StatusOK {
t.Fatalf("Expected status 200, got %d", resp.StatusCode)
}
if encoding := resp.Header.Get("Content-Encoding"); encoding != "br" {
t.Errorf("Expected Content-Encoding 'br', got '%s'", encoding)
}
})
// Test gzip
t.Run("gzip", func(t *testing.T) {
req := httptest.NewRequest(http.MethodGet, "/"+testFile, nil)
req.Header.Set("Accept-Encoding", "gzip")
w := httptest.NewRecorder()
ServeCompressedFile(fs, w, req, testFile)
resp := w.Result()
if resp.StatusCode != http.StatusOK {
t.Fatalf("Expected status 200, got %d", resp.StatusCode)
}
if encoding := resp.Header.Get("Content-Encoding"); encoding != "gzip" {
t.Errorf("Expected Content-Encoding 'gzip', got '%s'", encoding)
}
// Verify it's valid gzip
reader, err := gzip.NewReader(resp.Body)
if err != nil {
t.Errorf("Expected valid gzip content: %v", err)
return
}
defer reader.Close()
// Just read to verify it's valid
_, err = io.Copy(io.Discard, reader)
if err != nil {
t.Errorf("Failed to decompress gzip: %v", err)
}
})
}
+2
View File
@@ -0,0 +1,2 @@
node_modules
.vite
+3 -3
View File
@@ -10,8 +10,8 @@
<link rel="manifest" href="/site.webmanifest" />
<title>llama-swap</title>
</head>
<body >
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
<body>
<div id="app"></div>
<script type="module" src="/src/main.ts"></script>
</body>
</html>
+4119
View File
File diff suppressed because it is too large Load Diff
+42
View File
@@ -0,0 +1,42 @@
{
"name": "ui-svelte",
"private": true,
"version": "0.0.0",
"type": "module",
"scripts": {
"start": "vite",
"build": "vite build --emptyOutDir",
"preview": "vite preview",
"check": "svelte-check --tsconfig ./tsconfig.json",
"test": "vitest run",
"test:watch": "vitest"
},
"devDependencies": {
"@sveltejs/vite-plugin-svelte": "^5.0.3",
"@tailwindcss/vite": "^4.1.8",
"@tsconfig/svelte": "^5.0.4",
"@types/hast": "^3.0.4",
"@types/node": "^25.1.0",
"svelte": "^5.19.0",
"svelte-check": "^4.1.4",
"tailwindcss": "^4.1.8",
"typescript": "~5.8.3",
"vite": "^6.3.5",
"vite-plugin-compression2": "^2.4.0",
"vitest": "^4.0.18"
},
"dependencies": {
"highlight.js": "^11.11.1",
"katex": "^0.16.28",
"lucide-svelte": "^0.563.0",
"rehype-katex": "^7.0.1",
"rehype-stringify": "^10.0.1",
"remark-gfm": "^4.0.1",
"remark-math": "^6.0.0",
"remark-parse": "^11.0.0",
"remark-rehype": "^11.1.2",
"svelte-spa-router": "^4.0.1",
"unified": "^11.0.5",
"unist-util-visit": "^5.1.0"
}
}

Before

Width:  |  Height:  |  Size: 5.9 KiB

After

Width:  |  Height:  |  Size: 5.9 KiB

Before

Width:  |  Height:  |  Size: 2.2 KiB

After

Width:  |  Height:  |  Size: 2.2 KiB

Before

Width:  |  Height:  |  Size: 15 KiB

After

Width:  |  Height:  |  Size: 15 KiB

Before

Width:  |  Height:  |  Size: 38 KiB

After

Width:  |  Height:  |  Size: 38 KiB

Before

Width:  |  Height:  |  Size: 6.5 KiB

After

Width:  |  Height:  |  Size: 6.5 KiB

Before

Width:  |  Height:  |  Size: 28 KiB

After

Width:  |  Height:  |  Size: 28 KiB

+58
View File
@@ -0,0 +1,58 @@
<script lang="ts">
import { onMount } from "svelte";
import Router from "svelte-spa-router";
import Header from "./components/Header.svelte";
import LogViewer from "./routes/LogViewer.svelte";
import Models from "./routes/Models.svelte";
import Activity from "./routes/Activity.svelte";
import Playground from "./routes/Playground.svelte";
import PlaygroundStub from "./routes/PlaygroundStub.svelte";
import { enableAPIEvents } from "./stores/api";
import { initScreenWidth, isDarkMode, appTitle, connectionState } from "./stores/theme";
import { currentRoute } from "./stores/route";
const routes = {
"/": PlaygroundStub,
"/models": Models,
"/logs": LogViewer,
"/activity": Activity,
"*": PlaygroundStub,
};
function handleRouteLoaded(event: { detail: { route: string | RegExp } }) {
const route = event.detail.route;
currentRoute.set(typeof route === "string" ? route : "/");
}
$effect(() => {
document.documentElement.setAttribute("data-theme", $isDarkMode ? "dark" : "light");
});
$effect(() => {
const icon = $connectionState === "connecting" ? "\u{1F7E1}" : $connectionState === "connected" ? "\u{1F7E2}" : "\u{1F534}";
document.title = `${icon} ${$appTitle}`;
});
onMount(() => {
const cleanupScreenWidth = initScreenWidth();
enableAPIEvents(true);
return () => {
cleanupScreenWidth();
enableAPIEvents(false);
};
});
</script>
<div class="flex flex-col h-screen">
<Header />
<main class="flex-1 overflow-auto p-4">
<div class="h-full" class:hidden={$currentRoute !== "/"}>
<Playground />
</div>
<div class="h-full" class:hidden={$currentRoute === "/"}>
<Router {routes} on:routeLoaded={handleRouteLoaded} />
</div>
</main>
</div>

Before

Width:  |  Height:  |  Size: 12 KiB

After

Width:  |  Height:  |  Size: 12 KiB

Before

Width:  |  Height:  |  Size: 4.0 KiB

After

Width:  |  Height:  |  Size: 4.0 KiB

@@ -0,0 +1,452 @@
<script lang="ts">
import type { ReqRespCapture } from "../lib/types";
interface Props {
capture: ReqRespCapture | null;
open: boolean;
onclose: () => void;
}
let { capture, open, onclose }: Props = $props();
let dialogEl: HTMLDialogElement | undefined = $state();
type BodyTab = "raw" | "pretty" | "chat";
let reqBodyTab: BodyTab = $state("pretty");
let respBodyTab: BodyTab = $state("pretty");
let copiedReq = $state(false);
let copiedResp = $state(false);
$effect(() => {
if (open && dialogEl) {
dialogEl.showModal();
} else if (!open && dialogEl) {
dialogEl.close();
}
});
// Reset tabs when capture changes
$effect(() => {
if (capture) {
const reqCt = getContentType(capture.req_headers);
const respCt = getContentType(capture.resp_headers);
reqBodyTab = reqCt.includes("json") ? "pretty" : "raw";
respBodyTab = respCt.includes("text/event-stream")
? "chat"
: respCt.includes("json")
? "pretty"
: "raw";
}
});
function handleDialogClose() {
onclose();
}
function decodeBody(body: string | null | undefined): string {
if (!body) return "";
try {
const binary = atob(body);
const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
return new TextDecoder().decode(bytes);
} catch {
return body;
}
}
function formatJson(str: string): string {
try {
const parsed = JSON.parse(str);
return JSON.stringify(parsed, null, 2);
} catch {
return str;
}
}
function getContentType(
headers: Record<string, string> | null | undefined,
): string {
if (!headers) return "";
const ct = headers["Content-Type"] || headers["content-type"] || "";
return ct.toLowerCase();
}
function isImageContentType(contentType: string): boolean {
return contentType.startsWith("image/");
}
function isTextContentType(contentType: string): boolean {
return (
contentType.startsWith("text/") ||
contentType.includes("application/json") ||
contentType.includes("application/xml") ||
contentType.includes("application/javascript")
);
}
function getImageDataUrl(body: string, contentType: string): string {
const mimeType = contentType.split(";")[0].trim();
return `data:${mimeType};base64,${body}`;
}
interface SSEChat {
reasoning: string;
content: string;
}
function parseSSEChat(text: string): SSEChat {
const result: SSEChat = { reasoning: "", content: "" };
for (const line of text.split("\n")) {
const trimmed = line.trim();
if (!trimmed || !trimmed.startsWith("data: ")) continue;
const data = trimmed.slice(6);
if (data === "[DONE]") continue;
try {
const parsed = JSON.parse(data);
const delta = parsed.choices?.[0]?.delta;
if (delta?.content) result.content += delta.content;
if (delta?.reasoning_content) result.reasoning += delta.reasoning_content;
} catch {
// skip unparseable lines
}
}
return result;
}
async function copyToClipboard(text: string, type: "req" | "resp") {
try {
await navigator.clipboard.writeText(text);
if (type === "req") {
copiedReq = true;
setTimeout(() => (copiedReq = false), 1500);
} else {
copiedResp = true;
setTimeout(() => (copiedResp = false), 1500);
}
} catch {
// ignore
}
}
function getCopyText(): string {
if (respBodyTab === "chat") {
let text = "";
if (sseChat.reasoning) text += sseChat.reasoning + "\n\n";
text += sseChat.content;
return text;
}
return displayedResponseBody;
}
// Request body derivations
let requestContentType = $derived(
capture ? getContentType(capture.req_headers) : "",
);
let isRequestJson = $derived(requestContentType.includes("json"));
let requestBodyRaw = $derived.by(() => {
if (!capture) return "";
return decodeBody(capture.req_body);
});
let requestBodyPretty = $derived.by(() => {
if (!isRequestJson) return requestBodyRaw;
return formatJson(requestBodyRaw);
});
let displayedRequestBody = $derived(
reqBodyTab === "pretty" ? requestBodyPretty : requestBodyRaw,
);
// Response body derivations
let responseContentType = $derived(
capture ? getContentType(capture.resp_headers) : "",
);
let isResponseImage = $derived(isImageContentType(responseContentType));
let isResponseText = $derived(isTextContentType(responseContentType));
let isResponseJson = $derived(responseContentType.includes("json"));
let isSSE = $derived(responseContentType.includes("text/event-stream"));
let responseBodyRaw = $derived.by(() => {
if (!capture) return "";
return decodeBody(capture.resp_body);
});
let responseBodyPretty = $derived.by(() => {
if (!isResponseJson) return responseBodyRaw;
return formatJson(responseBodyRaw);
});
let sseChat = $derived.by(() => {
if (!isSSE || !responseBodyRaw)
return { reasoning: "", content: "" } as SSEChat;
return parseSSEChat(responseBodyRaw);
});
let displayedResponseBody = $derived.by(() => {
if (respBodyTab === "pretty") return responseBodyPretty;
return responseBodyRaw;
});
</script>
<dialog
bind:this={dialogEl}
onclose={handleDialogClose}
class="bg-surface text-txtmain rounded-lg shadow-xl max-w-4xl w-full max-h-[90vh] p-0 backdrop:bg-black/50 m-auto"
>
{#if capture}
<div class="flex flex-col max-h-[90vh]">
<div
class="flex justify-between items-center p-4 border-b border-card-border"
>
<h2 class="text-xl font-bold pb-0">Capture #{capture.id + 1}{#if capture.req_path} <span class="text-base font-mono font-normal text-txtsecondary">{capture.req_path}</span>{/if}</h2>
<button
onclick={() => dialogEl?.close()}
class="text-txtsecondary hover:text-txtmain text-2xl leading-none"
>
&times;
</button>
</div>
<div class="overflow-y-auto flex-1 p-4 space-y-4">
<!-- Request Headers -->
<details class="group" open>
<summary
class="cursor-pointer font-semibold text-sm uppercase tracking-wider text-txtsecondary hover:text-txtmain"
>
Request Headers
</summary>
<div
class="mt-2 bg-background rounded border border-card-border overflow-auto max-h-48"
>
<table class="w-full text-sm">
<tbody>
{#each Object.entries(capture.req_headers || {}) as [key, value]}
<tr class="border-b border-card-border-inner last:border-0">
<td class="px-3 py-1 font-mono text-primary whitespace-nowrap"
>{key}</td
>
<td class="px-3 py-1 font-mono break-all">{value}</td>
</tr>
{/each}
</tbody>
</table>
</div>
</details>
<!-- Request Body -->
<details class="group" open>
<summary
class="cursor-pointer font-semibold text-sm uppercase tracking-wider text-txtsecondary hover:text-txtmain"
>
Request Body
</summary>
{#if requestBodyRaw}
<div class="mt-2 flex items-center justify-between">
<div class="flex gap-1">
{#if isRequestJson}
<button
class="tab-btn"
class:tab-btn-active={reqBodyTab === "pretty"}
onclick={() => (reqBodyTab = "pretty")}>Pretty</button
>
<button
class="tab-btn"
class:tab-btn-active={reqBodyTab === "raw"}
onclick={() => (reqBodyTab = "raw")}>Raw</button
>
{/if}
</div>
<button
class="tab-btn"
onclick={() =>
copyToClipboard(displayedRequestBody, "req")}
>
{#if copiedReq}
Copied!
{:else}
Copy
{/if}
</button>
</div>
<div
class="mt-1 bg-background rounded border border-card-border overflow-auto max-h-96"
>
<pre
class="p-3 text-sm font-mono whitespace-pre-wrap break-all">{displayedRequestBody}</pre>
</div>
{:else}
<div
class="mt-2 bg-background rounded border border-card-border overflow-auto max-h-96"
>
<pre class="p-3 text-sm font-mono whitespace-pre-wrap break-all"
>(empty)</pre
>
</div>
{/if}
</details>
<!-- Response Headers -->
<details class="group" open>
<summary
class="cursor-pointer font-semibold text-sm uppercase tracking-wider text-txtsecondary hover:text-txtmain"
>
Response Headers
</summary>
<div
class="mt-2 bg-background rounded border border-card-border overflow-auto max-h-48"
>
<table class="w-full text-sm">
<tbody>
{#each Object.entries(capture.resp_headers || {}) as [key, value]}
<tr class="border-b border-card-border-inner last:border-0">
<td class="px-3 py-1 font-mono text-primary whitespace-nowrap"
>{key}</td
>
<td class="px-3 py-1 font-mono break-all">{value}</td>
</tr>
{/each}
</tbody>
</table>
</div>
</details>
<!-- Response Body -->
<details class="group" open>
<summary
class="cursor-pointer font-semibold text-sm uppercase tracking-wider text-txtsecondary hover:text-txtmain"
>
Response Body
</summary>
{#if isResponseImage && capture.resp_body}
<div
class="mt-2 bg-background rounded border border-card-border overflow-auto max-h-96"
>
<div class="p-3 flex justify-center">
<img
src={getImageDataUrl(capture.resp_body, responseContentType)}
alt="Response"
class="max-w-full h-auto"
/>
</div>
</div>
{:else if isSSE || isResponseText}
<div class="mt-2 flex items-center justify-between">
<div class="flex gap-1">
{#if isSSE}
<button
class="tab-btn"
class:tab-btn-active={respBodyTab === "chat"}
onclick={() => (respBodyTab = "chat")}>Chat</button
>
{/if}
{#if isResponseJson}
<button
class="tab-btn"
class:tab-btn-active={respBodyTab === "pretty"}
onclick={() => (respBodyTab = "pretty")}>Pretty</button
>
{/if}
{#if isSSE || isResponseJson}
<button
class="tab-btn"
class:tab-btn-active={respBodyTab === "raw"}
onclick={() => (respBodyTab = "raw")}>Raw</button
>
{/if}
</div>
<button
class="tab-btn"
onclick={() => copyToClipboard(getCopyText(), "resp")}
>
{#if copiedResp}
Copied!
{:else}
Copy
{/if}
</button>
</div>
<div
class="mt-1 bg-background rounded border border-card-border overflow-auto max-h-96"
>
{#if respBodyTab === "chat"}
<div class="p-3 text-sm space-y-3">
{#if sseChat.reasoning}
<div>
<div
class="text-xs font-semibold uppercase tracking-wider text-txtsecondary mb-1"
>
Reasoning
</div>
<pre
class="font-mono whitespace-pre-wrap break-all text-txtsecondary">{sseChat.reasoning}</pre>
</div>
{/if}
{#if sseChat.content}
<div>
{#if sseChat.reasoning}
<div
class="text-xs font-semibold uppercase tracking-wider text-txtsecondary mb-1"
>
Response
</div>
{/if}
<pre
class="font-mono whitespace-pre-wrap break-all">{sseChat.content}</pre>
</div>
{/if}
{#if !sseChat.reasoning && !sseChat.content}
<pre class="font-mono">(empty)</pre>
{/if}
</div>
{:else}
<pre
class="p-3 text-sm font-mono whitespace-pre-wrap break-all">{displayedResponseBody || "(empty)"}</pre>
{/if}
</div>
{:else if responseBodyRaw}
<div
class="mt-2 bg-background rounded border border-card-border overflow-auto max-h-96"
>
<div class="p-3 text-sm text-txtsecondary italic">
(binary data - {responseContentType || "unknown content type"})
</div>
</div>
{:else}
<div
class="mt-2 bg-background rounded border border-card-border overflow-auto max-h-96"
>
<pre class="p-3 text-sm font-mono">(empty)</pre>
</div>
{/if}
</details>
</div>
<div class="p-4 border-t border-card-border flex justify-end">
<button onclick={() => dialogEl?.close()} class="btn"> Close </button>
</div>
</div>
{/if}
</dialog>
<style>
.tab-btn {
padding: 2px 10px;
font-size: 0.75rem;
border-radius: 4px;
color: var(--color-txtsecondary);
cursor: pointer;
border: 1px solid transparent;
background: transparent;
transition: all 0.15s;
}
.tab-btn:hover {
color: var(--color-txtmain);
background: var(--color-secondary);
}
.tab-btn-active {
color: var(--color-primary);
background: color-mix(in srgb, var(--color-primary) 12%, transparent);
border-color: color-mix(in srgb, var(--color-primary) 25%, transparent);
}
</style>
@@ -0,0 +1,24 @@
<script lang="ts">
import { connectionState } from "../stores/theme";
import { versionInfo } from "../stores/api";
let eventStatusColor = $derived.by(() => {
switch ($connectionState) {
case "connected":
return "bg-emerald-500";
case "connecting":
return "bg-amber-500";
case "disconnected":
default:
return "bg-red-500";
}
});
let tooltipText = $derived(
`Event Stream: ${$connectionState ?? "unknown"}\nAPI Version: ${$versionInfo?.version ?? "unknown"}\nCommit Hash: ${$versionInfo?.commit?.substring(0, 7) ?? "unknown"}\nBuild Date: ${$versionInfo?.build_date ?? "unknown"}`
);
</script>
<div class="flex items-center" title={tooltipText}>
<span class="inline-block w-3 h-3 rounded-full {eventStatusColor} mr-2"></span>
</div>
+120
View File
@@ -0,0 +1,120 @@
<script lang="ts">
import { link } from "svelte-spa-router";
import { screenWidth, toggleTheme, isDarkMode, appTitle, isNarrow } from "../stores/theme";
import { currentRoute } from "../stores/route";
import { playgroundActivity } from "../stores/playgroundActivity";
import ConnectionStatus from "./ConnectionStatus.svelte";
function handleTitleChange(newTitle: string): void {
const sanitized = newTitle.replace(/\n/g, "").trim().substring(0, 64) || "llama-swap";
appTitle.set(sanitized);
}
function handleKeyDown(e: KeyboardEvent): void {
if (e.key === "Enter") {
e.preventDefault();
const target = e.currentTarget as HTMLElement;
handleTitleChange(target.textContent || "(set title)");
target.blur();
}
}
function handleBlur(e: FocusEvent): void {
const target = e.currentTarget as HTMLElement;
handleTitleChange(target.textContent || "(set title)");
}
function isActive(path: string, current: string): boolean {
return path === "/" ? current === "/" : current.startsWith(path);
}
</script>
<header
class="flex items-center justify-between bg-surface border-b border-border px-4 {$isNarrow
? 'py-1 h-[60px]'
: 'p-2 h-[75px]'}"
>
{#if $screenWidth !== "xs" && $screenWidth !== "sm"}
<h1
contenteditable="true"
class="p-0 outline-none hover:bg-gray-100 dark:hover:bg-gray-700 rounded"
onblur={handleBlur}
onkeydown={handleKeyDown}
>
{$appTitle}
</h1>
{/if}
<menu class="flex items-center gap-4 overflow-x-auto">
<a
href="/"
use:link
class="p-1 whitespace-nowrap {isActive('/', $currentRoute) ? 'font-semibold' : ''} {$playgroundActivity ? 'activity-link' : 'text-gray-600 hover:text-black dark:text-gray-300 dark:hover:text-gray-100'}"
>
Playground
</a>
<a
href="/models"
use:link
class="text-gray-600 hover:text-black dark:text-gray-300 dark:hover:text-gray-100 p-1 whitespace-nowrap"
class:font-semibold={isActive("/models", $currentRoute)}
>
Models
</a>
<a
href="/activity"
use:link
class="text-gray-600 hover:text-black dark:text-gray-300 dark:hover:text-gray-100 p-1 whitespace-nowrap"
class:font-semibold={isActive("/activity", $currentRoute)}
>
Activity
</a>
<a
href="/logs"
use:link
class="text-gray-600 hover:text-black dark:text-gray-300 dark:hover:text-gray-100 p-1 whitespace-nowrap"
class:font-semibold={isActive("/logs", $currentRoute)}
>
Logs
</a>
<button onclick={toggleTheme} title="Toggle theme">
{#if $isDarkMode}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-5 h-5">
<path
fill-rule="evenodd"
d="M9.528 1.718a.75.75 0 0 1 .162.819A8.97 8.97 0 0 0 9 6a9 9 0 0 0 9 9 8.97 8.97 0 0 0 3.463-.69.75.75 0 0 1 .981.98 10.503 10.503 0 0 1-9.694 6.46c-5.799 0-10.5-4.7-10.5-10.5 0-4.368 2.667-8.112 6.46-9.694a.75.75 0 0 1 .818.162Z"
clip-rule="evenodd"
/>
</svg>
{:else}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-5 h-5">
<path
d="M12 2.25a.75.75 0 0 1 .75.75v2.25a.75.75 0 0 1-1.5 0V3a.75.75 0 0 1 .75-.75ZM7.5 12a4.5 4.5 0 1 1 9 0 4.5 4.5 0 0 1-9 0ZM18.894 6.166a.75.75 0 0 0-1.06-1.06l-1.591 1.59a.75.75 0 1 0 1.06 1.061l1.591-1.59ZM21.75 12a.75.75 0 0 1-.75.75h-2.25a.75.75 0 0 1 0-1.5H21a.75.75 0 0 1 .75.75ZM17.834 18.894a.75.75 0 0 0 1.06-1.06l-1.59-1.591a.75.75 0 1 0-1.061 1.06l1.591 1.591ZM12 18a.75.75 0 0 1 .75.75V21a.75.75 0 0 1-1.5 0v-2.25A.75.75 0 0 1 12 18ZM7.758 17.303a.75.75 0 0 0-1.061-1.06l-1.591 1.59a.75.75 0 0 0 1.06 1.061l1.591-1.59ZM6 12a.75.75 0 0 1-.75.75H3a.75.75 0 0 1 0-1.5h2.25A.75.75 0 0 1 6 12ZM6.697 7.757a.75.75 0 0 0 1.06-1.06l-1.59-1.591a.75.75 0 0 0-1.061 1.06l1.59 1.591Z"
/>
</svg>
{/if}
</button>
<ConnectionStatus />
</menu>
</header>
<style>
.activity-link {
background: linear-gradient(90deg, #6366f1, #8b5cf6, #a855f7, #8b5cf6, #6366f1);
background-size: 200% 100%;
-webkit-background-clip: text;
background-clip: text;
-webkit-text-fill-color: transparent;
animation: gradient-shift 2s linear infinite;
}
@keyframes gradient-shift {
0% {
background-position: 0% 50%;
}
100% {
background-position: 200% 50%;
}
}
</style>
+139
View File
@@ -0,0 +1,139 @@
<script lang="ts">
import { persistentStore } from "../stores/persistent";
interface Props {
id: string;
title: string;
logData: string;
}
let { id, title, logData }: Props = $props();
let filterRegex = $state("");
// Create persistent stores for this panel (id is intentionally captured at init time)
// svelte-ignore state_referenced_locally
const fontSizeStore = persistentStore<"xxs" | "xs" | "small" | "normal">(`logPanel-${id}-fontSize`, "normal");
// svelte-ignore state_referenced_locally
const wrapTextStore = persistentStore<boolean>(`logPanel-${id}-wrapText`, false);
// svelte-ignore state_referenced_locally
const showFilterStore = persistentStore<boolean>(`logPanel-${id}-showFilter`, false);
let textWrapClass = $derived($wrapTextStore ? "whitespace-pre-wrap" : "whitespace-pre");
function toggleFontSize(): void {
fontSizeStore.update((prev) => {
switch (prev) {
case "xxs": return "xs";
case "xs": return "small";
case "small": return "normal";
case "normal": return "xxs";
}
});
}
function toggleWrapText(): void {
wrapTextStore.update((prev) => !prev);
}
function toggleFilter(): void {
if ($showFilterStore) {
showFilterStore.set(false);
filterRegex = "";
} else {
showFilterStore.set(true);
}
}
let fontSizeClass = $derived.by(() => {
switch ($fontSizeStore) {
case "xxs": return "text-[0.5rem]";
case "xs": return "text-[0.75rem]";
case "small": return "text-[0.875rem]";
case "normal": return "text-base";
}
});
let filteredLogs = $derived.by(() => {
if (!filterRegex) return logData;
try {
const regex = new RegExp(filterRegex, "i");
return logData.split("\n").filter((line) => regex.test(line)).join("\n");
} catch {
return logData;
}
});
let preElement: HTMLPreElement;
let userScrolledUp = $state(false);
function handleScroll() {
if (!preElement) return;
const { scrollTop, scrollHeight, clientHeight } = preElement;
userScrolledUp = scrollHeight - scrollTop - clientHeight > 40;
}
// Auto scroll to bottom when logs change, unless user has scrolled up
$effect(() => {
if (preElement && filteredLogs && !userScrolledUp) {
preElement.scrollTop = preElement.scrollHeight;
}
});
</script>
<div class="rounded-lg overflow-hidden flex flex-col bg-gray-950/5 dark:bg-white/10 h-full w-full p-1">
<div class="p-4">
<div class="flex items-center justify-between">
<h3 class="m-0 text-lg p-0">{title}</h3>
<div class="flex gap-2 items-center">
<button class="btn border-0" onclick={toggleFontSize} title="Change font size">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-4 h-4">
<path d="M2 4v3h5v12h3V7h5V4H2zm19 5h-9v3h3v7h3v-7h3V9z"/>
</svg>
</button>
<button class="btn border-0" onclick={toggleWrapText} title="Toggle text wrap">
{#if $wrapTextStore}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-4 h-4">
<path fill-rule="evenodd" d="M3 6.75A.75.75 0 0 1 3.75 6h16.5a.75.75 0 0 1 0 1.5H3.75A.75.75 0 0 1 3 6.75ZM3 12a.75.75 0 0 1 .75-.75h16.5a.75.75 0 0 1 0 1.5H3.75A.75.75 0 0 1 3 12Zm0 5.25a.75.75 0 0 1 .75-.75h16.5a.75.75 0 0 1 0 1.5H3.75a.75.75 0 0 1-.75-.75Z" clip-rule="evenodd" />
</svg>
{:else}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-4 h-4">
<path fill-rule="evenodd" d="M3 6.75A.75.75 0 0 1 3.75 6h16.5a.75.75 0 0 1 0 1.5H3.75A.75.75 0 0 1 3 6.75ZM3 12a.75.75 0 0 1 .75-.75h10.5a.75.75 0 0 1 0 1.5H3.75A.75.75 0 0 1 3 12Zm0 5.25a.75.75 0 0 1 .75-.75h16.5a.75.75 0 0 1 0 1.5H3.75a.75.75 0 0 1-.75-.75Z" clip-rule="evenodd" />
</svg>
{/if}
</button>
<button class="btn border-0" onclick={toggleFilter} title="Toggle filter">
{#if $showFilterStore}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-4 h-4">
<path fill-rule="evenodd" d="M10.5 3.75a6.75 6.75 0 1 0 0 13.5 6.75 6.75 0 0 0 0-13.5ZM2.25 10.5a8.25 8.25 0 1 1 14.59 5.28l4.69 4.69a.75.75 0 1 1-1.06 1.06l-4.69-4.69A8.25 8.25 0 0 1 2.25 10.5Z" clip-rule="evenodd" />
</svg>
{:else}
<svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" class="w-4 h-4">
<path stroke-linecap="round" stroke-linejoin="round" d="m21 21-5.197-5.197m0 0A7.5 7.5 0 1 0 5.196 5.196a7.5 7.5 0 0 0 10.607 10.607Z" />
</svg>
{/if}
</button>
</div>
</div>
{#if $showFilterStore}
<div class="mt-2 flex gap-2 items-center w-full">
<input
type="text"
class="w-full text-sm border border-gray-950/10 dark:border-white/5 p-2 rounded outline-none"
placeholder="Filter logs (regex)..."
bind:value={filterRegex}
/>
<button class="pl-2" onclick={() => (filterRegex = "")} aria-label="Clear filter">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-6 h-6">
<path fill-rule="evenodd" d="M12 2.25c-5.385 0-9.75 4.365-9.75 9.75s4.365 9.75 9.75 9.75 9.75-4.365 9.75-9.75S17.385 2.25 12 2.25Zm-1.72 6.97a.75.75 0 1 0-1.06 1.06L10.94 12l-1.72 1.72a.75.75 0 1 0 1.06 1.06L12 13.06l1.72 1.72a.75.75 0 1 0 1.06-1.06L13.06 12l1.72-1.72a.75.75 0 1 0-1.06-1.06L12 10.94l-1.72-1.72Z" clip-rule="evenodd" />
</svg>
</button>
</div>
{/if}
</div>
<div class="rounded-lg bg-background font-mono text-sm flex-1 overflow-hidden">
<pre bind:this={preElement} onscroll={handleScroll} class="{textWrapClass} {fontSizeClass} h-full overflow-auto p-4">{filteredLogs}</pre>
</div>
</div>
+211
View File
@@ -0,0 +1,211 @@
<script lang="ts">
import { models, loadModel, unloadAllModels, unloadSingleModel } from "../stores/api";
import { isNarrow } from "../stores/theme";
import { persistentStore } from "../stores/persistent";
import type { Model } from "../lib/types";
let isUnloading = $state(false);
let menuOpen = $state(false);
const showUnlistedStore = persistentStore<boolean>("showUnlisted", true);
const showIdorNameStore = persistentStore<"id" | "name">("showIdorName", "id");
let filteredModels = $derived.by(() => {
const filtered = $models.filter((model) => $showUnlistedStore || !model.unlisted);
const peerModels = filtered.filter((m) => m.peerID);
// Group peer models by peerID
const grouped = peerModels.reduce(
(acc, model) => {
const peerId = model.peerID || "unknown";
if (!acc[peerId]) acc[peerId] = [];
acc[peerId].push(model);
return acc;
},
{} as Record<string, Model[]>
);
return {
regularModels: filtered.filter((m) => !m.peerID),
peerModelsByPeerId: grouped,
};
});
async function handleUnloadAllModels(): Promise<void> {
isUnloading = true;
try {
await unloadAllModels();
} catch (e) {
console.error(e);
} finally {
setTimeout(() => (isUnloading = false), 1000);
}
}
function toggleIdorName(): void {
showIdorNameStore.update((prev) => (prev === "name" ? "id" : "name"));
}
function toggleShowUnlisted(): void {
showUnlistedStore.update((prev) => !prev);
}
function getModelDisplay(model: Model): string {
return $showIdorNameStore === "id" ? model.id : (model.name || model.id);
}
</script>
<div class="card h-full flex flex-col">
<div class="shrink-0">
<div class="flex justify-between items-baseline">
<h2 class={$isNarrow ? "text-xl" : ""}>Models</h2>
{#if $isNarrow}
<div class="relative">
<button class="btn text-base flex items-center gap-2 py-1" onclick={() => (menuOpen = !menuOpen)} aria-label="Toggle menu">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-5 h-5">
<path fill-rule="evenodd" d="M3 6.75A.75.75 0 0 1 3.75 6h16.5a.75.75 0 0 1 0 1.5H3.75A.75.75 0 0 1 3 6.75ZM3 12a.75.75 0 0 1 .75-.75h16.5a.75.75 0 0 1 0 1.5H3.75A.75.75 0 0 1 3 12Zm0 5.25a.75.75 0 0 1 .75-.75h16.5a.75.75 0 0 1 0 1.5H3.75a.75.75 0 0 1-.75-.75Z" clip-rule="evenodd" />
</svg>
</button>
{#if menuOpen}
<div class="absolute right-0 mt-2 w-48 bg-surface border border-gray-200 dark:border-white/10 rounded shadow-lg z-20">
<button
class="w-full text-left px-4 py-2 hover:bg-secondary-hover flex items-center gap-2"
onclick={() => { toggleIdorName(); menuOpen = false; }}
>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-5 h-5">
<path fill-rule="evenodd" d="M15.97 2.47a.75.75 0 0 1 1.06 0l4.5 4.5a.75.75 0 0 1 0 1.06l-4.5 4.5a.75.75 0 1 1-1.06-1.06l3.22-3.22H7.5a.75.75 0 0 1 0-1.5h11.69l-3.22-3.22a.75.75 0 0 1 0-1.06Zm-7.94 9a.75.75 0 0 1 0 1.06l-3.22 3.22H16.5a.75.75 0 0 1 0 1.5H4.81l3.22 3.22a.75.75 0 1 1-1.06 1.06l-4.5-4.5a.75.75 0 0 1 0-1.06l4.5-4.5a.75.75 0 0 1 1.06 0Z" clip-rule="evenodd" />
</svg>
{$showIdorNameStore === "id" ? "Show Name" : "Show ID"}
</button>
<button
class="w-full text-left px-4 py-2 hover:bg-secondary-hover flex items-center gap-2"
onclick={() => { toggleShowUnlisted(); menuOpen = false; }}
>
{#if $showUnlistedStore}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-5 h-5">
<path d="M3.53 2.47a.75.75 0 0 0-1.06 1.06l18 18a.75.75 0 1 0 1.06-1.06l-18-18ZM22.676 12.553a11.249 11.249 0 0 1-2.631 4.31l-3.099-3.099a5.25 5.25 0 0 0-6.71-6.71L7.759 4.577a11.217 11.217 0 0 1 4.242-.827c4.97 0 9.185 3.223 10.675 7.69.12.362.12.752 0 1.113Z" />
<path d="M15.75 12c0 .18-.013.357-.037.53l-4.244-4.243A3.75 3.75 0 0 1 15.75 12ZM12.53 15.713l-4.243-4.244a3.75 3.75 0 0 0 4.244 4.243Z" />
<path d="M6.75 12c0-.619.107-1.213.304-1.764l-3.1-3.1a11.25 11.25 0 0 0-2.63 4.31c-.12.362-.12.752 0 1.114 1.489 4.467 5.704 7.69 10.675 7.69 1.5 0 2.933-.294 4.242-.827l-2.477-2.477A5.25 5.25 0 0 1 6.75 12Z" />
</svg>
{:else}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-5 h-5">
<path d="M12 15a3 3 0 1 0 0-6 3 3 0 0 0 0 6Z" />
<path fill-rule="evenodd" d="M1.323 11.447C2.811 6.976 7.028 3.75 12.001 3.75c4.97 0 9.185 3.223 10.675 7.69.12.362.12.752 0 1.113-1.487 4.471-5.705 7.697-10.677 7.697-4.97 0-9.186-3.223-10.675-7.69a1.762 1.762 0 0 1 0-1.113ZM17.25 12a5.25 5.25 0 1 1-10.5 0 5.25 5.25 0 0 1 10.5 0Z" clip-rule="evenodd" />
</svg>
{/if}
{$showUnlistedStore ? "Hide Unlisted" : "Show Unlisted"}
</button>
<button
class="w-full text-left px-4 py-2 hover:bg-secondary-hover flex items-center gap-2"
onclick={() => { handleUnloadAllModels(); menuOpen = false; }}
disabled={isUnloading}
>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-6 h-6">
<path fill-rule="evenodd" d="M12 2.25c-5.385 0-9.75 4.365-9.75 9.75s4.365 9.75 9.75 9.75 9.75-4.365 9.75-9.75S17.385 2.25 12 2.25Zm.53 5.47a.75.75 0 0 0-1.06 0l-3 3a.75.75 0 1 0 1.06 1.06l1.72-1.72v5.69a.75.75 0 0 0 1.5 0v-5.69l1.72 1.72a.75.75 0 1 0 1.06-1.06l-3-3Z" clip-rule="evenodd" />
</svg>
{isUnloading ? "Unloading..." : "Unload All"}
</button>
</div>
{/if}
</div>
{/if}
</div>
{#if !$isNarrow}
<div class="flex justify-between">
<div class="flex gap-2">
<button class="btn text-base flex items-center gap-2" onclick={toggleIdorName} style="line-height: 1.2">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-5 h-5">
<path fill-rule="evenodd" d="M15.97 2.47a.75.75 0 0 1 1.06 0l4.5 4.5a.75.75 0 0 1 0 1.06l-4.5 4.5a.75.75 0 1 1-1.06-1.06l3.22-3.22H7.5a.75.75 0 0 1 0-1.5h11.69l-3.22-3.22a.75.75 0 0 1 0-1.06Zm-7.94 9a.75.75 0 0 1 0 1.06l-3.22 3.22H16.5a.75.75 0 0 1 0 1.5H4.81l3.22 3.22a.75.75 0 1 1-1.06 1.06l-4.5-4.5a.75.75 0 0 1 0-1.06l4.5-4.5a.75.75 0 0 1 1.06 0Z" clip-rule="evenodd" />
</svg>
{$showIdorNameStore === "id" ? "ID" : "Name"}
</button>
<button class="btn text-base flex items-center gap-2" onclick={toggleShowUnlisted} style="line-height: 1.2">
{#if $showUnlistedStore}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-5 h-5">
<path d="M12 15a3 3 0 1 0 0-6 3 3 0 0 0 0 6Z" />
<path fill-rule="evenodd" d="M1.323 11.447C2.811 6.976 7.028 3.75 12.001 3.75c4.97 0 9.185 3.223 10.675 7.69.12.362.12.752 0 1.113-1.487 4.471-5.705 7.697-10.677 7.697-4.97 0-9.186-3.223-10.675-7.69a1.762 1.762 0 0 1 0-1.113ZM17.25 12a5.25 5.25 0 1 1-10.5 0 5.25 5.25 0 0 1 10.5 0Z" clip-rule="evenodd" />
</svg>
{:else}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-5 h-5">
<path d="M3.53 2.47a.75.75 0 0 0-1.06 1.06l18 18a.75.75 0 1 0 1.06-1.06l-18-18ZM22.676 12.553a11.249 11.249 0 0 1-2.631 4.31l-3.099-3.099a5.25 5.25 0 0 0-6.71-6.71L7.759 4.577a11.217 11.217 0 0 1 4.242-.827c4.97 0 9.185 3.223 10.675 7.69.12.362.12.752 0 1.113Z" />
<path d="M15.75 12c0 .18-.013.357-.037.53l-4.244-4.243A3.75 3.75 0 0 1 15.75 12ZM12.53 15.713l-4.243-4.244a3.75 3.75 0 0 0 4.244 4.243Z" />
<path d="M6.75 12c0-.619.107-1.213.304-1.764l-3.1-3.1a11.25 11.25 0 0 0-2.63 4.31c-.12.362-.12.752 0 1.114 1.489 4.467 5.704 7.69 10.675 7.69 1.5 0 2.933-.294 4.242-.827l-2.477-2.477A5.25 5.25 0 0 1 6.75 12Z" />
</svg>
{/if}
unlisted
</button>
</div>
<button class="btn text-base flex items-center gap-2" onclick={handleUnloadAllModels} disabled={isUnloading}>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" class="w-6 h-6">
<path fill-rule="evenodd" d="M12 2.25c-5.385 0-9.75 4.365-9.75 9.75s4.365 9.75 9.75 9.75 9.75-4.365 9.75-9.75S17.385 2.25 12 2.25Zm.53 5.47a.75.75 0 0 0-1.06 0l-3 3a.75.75 0 1 0 1.06 1.06l1.72-1.72v5.69a.75.75 0 0 0 1.5 0v-5.69l1.72 1.72a.75.75 0 1 0 1.06-1.06l-3-3Z" clip-rule="evenodd" />
</svg>
{isUnloading ? "Unloading..." : "Unload All"}
</button>
</div>
{/if}
</div>
<div class="flex-1 overflow-y-auto">
<table class="w-full">
<thead class="sticky top-0 bg-card z-10">
<tr class="text-left border-b border-gray-200 dark:border-white/10 bg-surface">
<th>{$showIdorNameStore === "id" ? "Model ID" : "Name"}</th>
<th></th>
<th>State</th>
</tr>
</thead>
<tbody>
{#each filteredModels.regularModels as model (model.id)}
<tr class="border-b hover:bg-secondary-hover border-gray-200">
<td class={model.unlisted ? "text-txtsecondary" : ""}>
<a href="/upstream/{model.id}/" class="font-semibold" target="_blank">
{getModelDisplay(model)}
</a>
{#if model.description}
<p class={model.unlisted ? "text-opacity-70" : ""}><em>{model.description}</em></p>
{/if}
{#if model.aliases && model.aliases.length > 0}
<p class="text-xs text-txtsecondary">Aliases: {model.aliases.join(", ")}</p>
{/if}
</td>
<td class="w-12">
{#if model.state === "stopped"}
<button class="btn btn--sm" onclick={() => loadModel(model.id)}>Load</button>
{:else}
<button class="btn btn--sm" onclick={() => unloadSingleModel(model.id)} disabled={model.state !== "ready"}>Unload</button>
{/if}
</td>
<td class="w-20">
<span class="w-16 text-center status status--{model.state}">{model.state}</span>
</td>
</tr>
{/each}
</tbody>
</table>
{#if Object.keys(filteredModels.peerModelsByPeerId).length > 0}
<h3 class="mt-8 mb-2">Peer Models</h3>
{#each Object.entries(filteredModels.peerModelsByPeerId).sort(([a], [b]) => a.localeCompare(b)) as [peerId, peerModels] (peerId)}
<div class="mb-4">
<table class="w-full">
<thead class="sticky top-0 bg-card z-10">
<tr class="text-left border-b border-gray-200 dark:border-white/10 bg-surface">
<th class="font-semibold">{peerId}</th>
</tr>
</thead>
<tbody>
{#each peerModels as model (model.id)}
<tr class="border-b hover:bg-secondary-hover border-gray-200">
<td class="pl-8 {model.unlisted ? 'text-txtsecondary' : ''}">
<span>{model.id}</span>
</td>
</tr>
{/each}
</tbody>
</table>
</div>
{/each}
{/if}
</div>
</div>
@@ -0,0 +1,152 @@
<script lang="ts">
import type { Snippet } from "svelte";
import { onMount } from "svelte";
interface Props {
direction: "horizontal" | "vertical";
storageKey: string;
leftPanel: Snippet;
rightPanel: Snippet;
defaultSize?: number;
minSize?: number;
}
let { direction, storageKey, leftPanel, rightPanel, defaultSize = 50, minSize = 5 }: Props = $props();
let containerRef: HTMLDivElement;
let isDragging = $state(false);
// svelte-ignore state_referenced_locally
let leftSize = $state(defaultSize);
// Load saved size from localStorage
onMount(() => {
const saved = localStorage.getItem(`panel-size-${storageKey}`);
if (saved) {
const parsed = parseFloat(saved);
if (!isNaN(parsed) && parsed >= minSize && parsed <= 100 - minSize) {
leftSize = parsed;
}
}
});
function saveSize(): void {
localStorage.setItem(`panel-size-${storageKey}`, String(leftSize));
}
function handleMouseDown(e: MouseEvent): void {
e.preventDefault();
isDragging = true;
document.addEventListener("mousemove", handleMouseMove);
document.addEventListener("mouseup", handleMouseUp);
}
function handleTouchStart(_e: TouchEvent): void {
isDragging = true;
document.addEventListener("touchmove", handleTouchMove);
document.addEventListener("touchend", handleTouchEnd);
}
function handleMouseMove(e: MouseEvent): void {
if (!isDragging || !containerRef) return;
updateSize(e.clientX, e.clientY);
}
function handleTouchMove(e: TouchEvent): void {
if (!isDragging || !containerRef || e.touches.length === 0) return;
updateSize(e.touches[0].clientX, e.touches[0].clientY);
}
function updateSize(clientX: number, clientY: number): void {
const rect = containerRef.getBoundingClientRect();
let newSize: number;
if (direction === "horizontal") {
newSize = ((clientX - rect.left) / rect.width) * 100;
} else {
newSize = ((clientY - rect.top) / rect.height) * 100;
}
// Clamp size
newSize = Math.max(minSize, Math.min(100 - minSize, newSize));
leftSize = newSize;
}
function handleMouseUp(): void {
isDragging = false;
saveSize();
document.removeEventListener("mousemove", handleMouseMove);
document.removeEventListener("mouseup", handleMouseUp);
}
function handleTouchEnd(): void {
isDragging = false;
saveSize();
document.removeEventListener("touchmove", handleTouchMove);
document.removeEventListener("touchend", handleTouchEnd);
}
function handleKeyDown(e: KeyboardEvent): void {
const step = 2; // 2% increment for keyboard navigation
const key = e.key;
if (direction === "horizontal" && (key === "ArrowLeft" || key === "ArrowRight")) {
e.preventDefault();
const delta = key === "ArrowLeft" ? -step : step;
const newSize = Math.max(minSize, Math.min(100 - minSize, leftSize + delta));
leftSize = newSize;
saveSize();
} else if (direction === "vertical" && (key === "ArrowUp" || key === "ArrowDown")) {
e.preventDefault();
const delta = key === "ArrowUp" ? -step : step;
const newSize = Math.max(minSize, Math.min(100 - minSize, leftSize + delta));
leftSize = newSize;
saveSize();
}
}
let containerClass = $derived(direction === "horizontal" ? "flex-row" : "flex-col");
let handleClass = $derived(
direction === "horizontal"
? "w-2 h-full cursor-col-resize"
: "w-full h-2 cursor-row-resize"
);
let leftStyle = $derived(
direction === "horizontal"
? `width: ${leftSize}%; min-width: ${minSize}%`
: `height: ${leftSize}%; min-height: ${minSize}%`
);
let rightStyle = $derived(
direction === "horizontal"
? `width: ${100 - leftSize}%; min-width: ${minSize}%`
: `height: ${100 - leftSize}%; min-height: ${minSize}%`
);
</script>
<div bind:this={containerRef} class="flex {containerClass} h-full w-full gap-2">
<div style={leftStyle} class="overflow-hidden">
{@render leftPanel()}
</div>
<!-- svelte-ignore a11y_no_noninteractive_tabindex -->
<!-- svelte-ignore a11y_no_noninteractive_element_interactions -->
<div
role="separator"
tabindex="0"
class="{handleClass} bg-primary hover:bg-success transition-colors rounded flex-shrink-0"
onmousedown={handleMouseDown}
ontouchstart={handleTouchStart}
onkeydown={handleKeyDown}
aria-label="Resize panels"
aria-orientation={direction}
aria-valuenow={Math.round(leftSize)}
aria-valuemin={minSize}
aria-valuemax={100 - minSize}
></div>
<div style={rightStyle} class="overflow-hidden">
{@render rightPanel()}
</div>
</div>

Some files were not shown because too many files have changed in this diff Show More