Commit Graph

76 Commits

Author SHA1 Message Date
Benson Wong addb98646f proxy: add support for basic authorization (#445)
Fixes #444 where the UI with api keys did not work. The choice to use
http basic authorization is for simple, automatic browser support. No
changes to the UI were necessary. Just use an API key as the password,
no user name is required.
2025-12-31 13:42:35 -08:00
Benson Wong 37d74efc2d proxy: add /v1/images/generations (#443)
Add support for the /v1/images/generations endpoint

Updates #433
Closes #191
2025-12-30 21:04:58 -08:00
Benson Wong 22e098ac8b Add Peer Model Support (#438)
This PR allows a single llama-swap to be the central proxy for models served by other inference servers. The peer servers can be another llama-swap or any API that supports the /v1/* inference endpoint.

Updates: #433, #299
Closes: #296
2025-12-27 20:18:06 -08:00
Benson Wong 53b32f3601 proxy: add API key support (#436)
Add configuration support for api keys that are enforced by llama-swap. Keys are stripped before sending them to upstream servers. 

Updates: #433, #50 and #251
2025-12-23 23:39:33 -08:00
Benson Wong 565c44766d config,proxy: add new configuration logToStdout (#432)
The new logToStdout option controls what is logged to stdout. The
default has been changed to just the proxy logs, which contain swap and
http request logs.

There are four supported settings: none, proxy, upstream, both. The
"both" setting is the legacy setting where everything was spewed to
stdout.
2025-12-21 22:23:31 -08:00
Benson Wong e6a9e210ba proxy: fix path bug in /logs/stream/{model_id} (#431)
A {model_id} containing a forward slash trips up gin's path param
parsing. This updates /logs/stream to work like /upstream where the
model_id is built up in parts and searched for in the configuration.

Updates #421
2025-12-21 21:47:14 -08:00
Benson Wong c968da1b73 proxy: add support for anthropic v1/messages api (#417)
* proxy: add support for anthropic v1/messages api
* proxy: restrict loading message to /v1/chat/completions
2025-11-29 22:09:07 -08:00
Ryan Steed 86e9b93c37 proxy,ui: add version endpoint and display version info in UI (#395)
- Add /api/version endpoint to ProxyManager that returns build date, commit hash, and version
- Implement SetVersion method to configure version info in ProxyManager
- Add version info fetching to APIProvider and display in ConnectionStatus component
- Include version info in UI context and update dependencies
- Add tests for version endpoint functionality
2025-11-17 10:43:47 -08:00
Ryan Steed 3acace810f proxy: add configurable logging timestamp format (#401)
introduces a new configuration option logTimeFormat that allows customizing the timestamp in log messages using golang's built in time format constants. The default remains no timestamp.
2025-11-16 10:21:59 -08:00
Ryan Steed 554d29e87d feat: enhance model listing to include aliases (#400)
introduce includeAliasesInList as a new configuration setting (default false) that includes aliases in v1/models

Fixes #399
2025-11-15 14:35:26 -08:00
Benson Wong a89b803d4a Stream loading state when swapping models (#371)
Swapping models can take a long time and leave a lot of silence while the model is loading. Rather than silently load the model in the background, this PR allows llama-swap to send status updates in the reasoning_content of a streaming chat response.

Fixes: #366
2025-10-29 00:09:39 -07:00
Benson Wong e250e71e59 Include metrics from upstream chat requests (#361)
* proxy: refactor metrics recording

- remove metrics_middleware.go as this wrapper is no longer needed. This
  also eliminiates double body parsing for the modelID
- move metrics parsing to be part of MetricsMonitor
- refactor how metrics are recording in ProxyManager
- add MetricsMonitor tests
- improve mem efficiency of processStreamingResponse
- add benchmarks for MetricsMonitor.addMetrics
- proxy: refactor MetricsMonitor to be more safe handling errors
2025-10-25 17:38:18 -07:00
Benson Wong c07179d6e2 cmd/wol-proxy: add wol-proxy (#352)
add a wake-on-lan proxy for llama-swap. When the target llama-swap server is unreachable it will send hold a request, send a WoL packet and proxy the request when llama-swap is available.
2025-10-20 20:55:02 -07:00
Benson Wong 70930e4e91 proxy: add support for user defined metadata in model configs (#333)
Changes: 

- add Metadata key to ModelConfig
- include metadata in /v1/models under meta.llamaswap key
- add recursive macro substitution into Metadata
- change macros at global and model level to be any scalar type

Note: 

This is the first mostly AI generated change to llama-swap. See #333 for notes about the workflow and approach to AI going forward.
2025-10-04 19:56:41 -07:00
Benson Wong 216c40b951 proxy/config: create config package and migrate configuration (#329)
* proxy/config: create config package and migrate configuration

The configuration is become more complex as llama-swap adds more
advanced features. This commit moves config to its own package so it can
be developed independently of the proxy package.

Additionally, enforcing a public API for a configuration will allow
downstream usage to be more decoupled.
2025-09-28 16:50:06 -07:00
Benson Wong 9e3d491c85 proxyToUpstream: add redirect with trailing slash to upstream endpoint (#322)
This adds a redirect to the upstream endpoint so it always ends with a trailing /. 

Fixes #321
2025-09-25 16:43:00 -07:00
Benson Wong 1a84926505 proxy: add unload of single model (#318)
This adds a new API endpoint, /api/models/unload/*model, that unloads a single model. In the UI when a model is in a ReadyState it will have a new button to unload it. 

Fixes #312
2025-09-24 20:53:48 -07:00
Benson Wong c36986fef6 upstream handler support for model names with forward slash (#298)
The upstream handler would break on model IDs that contained a forward
slash. Model IDs like "aaa/bbb" called at upstream/aaa/bbb would result
in an error. This commit adds support for model IDs with a forward slash
by iteratively searching the path for a match.

Fixes: #229
2025-09-13 13:37:03 -07:00
Yandrik 977f1856bb add /completion endpoint (#275)
* feat: add /completion endpoint
* chore: reformat using gofmt
2025-08-28 21:41:02 -07:00
Benson Wong 57803fd3aa Support llama-server's /infill endpoint (#272)
Add support for llama-server's /infill endpoint and metrics gathering on the Activities page.
2025-08-27 08:36:05 -07:00
Benson Wong 5dc6b3e6d9 Add barebones but working implementation of model preload (#209, #235)
Add barebones but working implementation of model preload

* add config test for Preload hook
* improve TestProxyManager_StartupHooks
* docs for new hook configuration
* add a .dev to .gitignore
2025-08-14 10:27:28 -07:00
Benson Wong 10569ed546 Fix model alias usage in upstream path (#230)
Model alias values are not properly resolved and work in upstream/ path.

Related to #229.
2025-08-07 20:16:56 -07:00
Ben Greene 5c63e0066c return models sorted by id in /v1/models (#222) 2025-08-06 10:04:52 -07:00
Benson Wong 0f583163f7 add /health (#211) 2025-07-30 10:37:10 -07:00
Benson Wong fd50932dbc Decouple MetricsMiddleware from downstream handlers (#206)
* Decouple MetricsMiddleware from downstream handlers

Remove ls-real-model-name optimization. Within proxyOAIHandler the
request body's bytes are required for various rewriting features
anyways. This negated any benefits from trying not to parse it twice.
2025-07-27 10:36:06 -07:00
Gaël James 8c693e7fcf Add endpoint aliases for reranking models (#201)
* Add endpoint aliases for reranking models
* Add MetricsMiddleware to the previous reranking endpoint
* Fix the embeddings endpoint not having model set
2025-07-24 08:32:47 -07:00
Benson Wong cce0bc6aa1 add guard to ensure ls-real-model-name is set in context 2025-07-21 22:59:41 -07:00
g2mt 87dce5f8f6 Add metrics logging for chat completion requests (#195)
- Add token and performance metrics  for v1/chat/completions 
- Add Activity Page in UI
- Add /api/metrics endpoint

Contributed by @g2mt
2025-07-21 22:19:55 -07:00
Benson Wong 1921e570d7 Add Event Bus (#184)
Major internal refactor to use an event bus to pass event/messages along. These changes are largely invisible user facing but sets up internal design for real time stats and information.

- `--watch-config` logic refactored for events
- remove multiple SSE api endpoints, replaced with /api/events
- keep all functionality essentially the same
- UI/backend sync is in near real time now
2025-07-01 22:17:35 -07:00
Benson Wong c867a6c9a2 Add name and description to v1/models list (#179)
* Add support for name and description in v1/models list
* add configuration example for name and description
2025-06-30 23:02:44 -07:00
Benson Wong 4236cec03a Add Filters to Model Configuration (#174)
llama-swap can strip specific keys in JSON requests. This is useful for removing the ability for clients to set sampling parameters like temperature, top_k, top_p, etc.
2025-06-23 10:52:29 -07:00
Benson Wong 9a3c656738 New UI (#157, #164)
- Add a react UI to replace the plain HTML one. 
- Serve as a foundation for better GUI interactions
2025-06-16 16:45:19 -07:00
Benson Wong 2dc0ca0663 improve llama-swap upstream process recovery and restarts (#155)
Refactor internal upstream process life cycle management to recover better from unexpected situations. With this change llama-swap should never need to be restarted due to a crashed upstream child process.  The `StateFailed` state was removed in favour of always trying to start/restart a process.
2025-06-05 16:24:55 -07:00
Daniel Hofer a84098d3b4 Add missing object type to /v1/models endpoint (#154) 2025-06-02 09:25:45 -07:00
choyuansu f7e46a359f Add link to unload endpoint in upstream list (#140)
* Add link to open /unload
2025-05-20 08:31:44 -07:00
Benson Wong b83a5fa291 make Failed stated recoverable (#137)
A process in the failed state can transition to stopped either by calling /unload or swapping to another model.
2025-05-16 19:54:44 -07:00
fakezeta 2d00120781 Update proxymanager.go (#135) 2025-05-16 06:45:09 -07:00
Fadenfire f5763a94a0 Fix content length being incorrect when useModelName is used (#131)
* Fix content length being incorrect when useModelName is used
* Update c.Request.ContentLength as well
2025-05-14 19:37:54 -07:00
Benson Wong 519c3a4d22 Change /unload to not wait for inflight requests (#125)
Sometimes upstreams can accept HTTP but never respond causing requests
to build up waiting for a response. This can block Process.Stop() as
that waits for inflight requests to finish. This change refactors the
code to not wait when attempting to shutdown the process.
2025-05-13 11:39:19 -07:00
Sam bc652709a5 Add config hot-reload (#106)
introduce --watch-config command line option to reload ProxyManager when configuration changes.
2025-05-11 17:37:00 -07:00
Benson Wong 21d7973d11 Improve content-length handling (#115)
ref: See #114

* Improve content-length handling
- Content length was not always being sent
- Add tests for content-length
2025-05-05 10:46:26 -07:00
Yi Hong Ang cc450e9c5f fix issue where proxy is still proxying with chunked transfer-encoding (#114) 2025-05-05 10:00:03 -07:00
Benson Wong 27465fe053 bug fix with missing early return statements fix #112 2025-05-05 09:32:44 -07:00
Benson Wong 448ccae959 Introduce Groups Feature (#107)
Groups allows more control over swapping behaviour when a model is requested. The new groups feature provides three ways to control swapping: within the group, swapping out other groups or keep the models in the group loaded persistently (never swapped out). 

Closes #96, #99 and #106.
2025-05-02 22:35:38 -07:00
Benson Wong 5fad24c16f Make checkHealthTimeout Interruptable during startup (#102)
interrupt and exit Process.start() early if the upstream process exits prematurely or unexpectedly.
2025-04-24 14:39:33 -07:00
Benson Wong b8f888f864 Logging Improvements (#88)
This change revamps the internal logging architecture to be more flexible and descriptive. Previously all logs from both llama-swap and upstream services were mixed together. This makes it harder to troubleshoot and identify problems. This PR adds these new endpoints: 

- `/logs/stream/proxy` - just llama-swap's logs
- `/logs/stream/upstream` - stdout output from the upstream server
2025-04-04 21:01:33 -07:00
Benson Wong b7f8cb5094 Limit Access-Control-Allow-Origin to OPTIONS preflight requests #85 2025-04-04 14:44:35 -07:00
Benson Wong a23da6eb57 Sanitize CORS headers (#85)
Add sanitation step for `Access-Control-Allow-Headers` when echoing back user supplied headers
2025-04-01 08:43:53 -07:00
Benson Wong 84e2c07a7e Refactor wildcard out of CORS headers (#81)
Changes to CORS functionality: 

- `Access-Control-Allow-Origin: *` is set for all requests 
- for pre-flight OPTIONS requests
  - specify methods: `Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE, OPTIONS`
  - if the client sent `Access-Control-Request-Headers` then echo back the same value in `Access-Control-Allow-Headers`. If no `Access-Control-Request-Headers` were sent, then send back a default set
  - set `Access-Control-Max-Age: 86400` to that may improve performance 
- Add CORS tests to the proxy-manager
2025-03-25 15:24:43 -07:00
Benson Wong 680af28bcc Allow very permissive CORS headers (#77) 2025-03-20 15:50:21 -07:00