llama-swap

Author	SHA1	Message	Date
Benson Wong	4413881b2d	proxy: actually add /v1/responses endpoint (#449) ref: #448	2026-01-01 13:35:45 -08:00
Benson Wong	8df5e8563b	proxy: add /v1/responses and /v1/audio/voices endpoints (#448) Updates #433 Fixes #442 #226	2026-01-01 12:52:12 -08:00
Benson Wong	7931212d3e	proxy: add v1/images/edits API endpoint (#447) Updates #433	2026-01-01 12:43:06 -08:00
Benson Wong	3dc36032fb	proxy: skip very slow tests in -short test mode (#446) * proxy: skip very slow tests in -short test mode * CLAUDE.md: update testing instructions	2025-12-31 14:08:56 -08:00
Benson Wong	addb98646f	proxy: add support for basic authorization (#445) Fixes #444, where the UI did not work when API keys were enabled. HTTP basic authorization was chosen because browsers support it automatically, so no changes to the UI were necessary. Use an API key as the password; no username is required.	2025-12-31 13:42:35 -08:00
Benson Wong	37d74efc2d	proxy: add /v1/images/generations (#443) Add support for the /v1/images/generations endpoint Updates #433 Closes #191	2025-12-30 21:04:58 -08:00
Benson Wong	22e098ac8b	Add Peer Model Support (#438) This PR allows a single llama-swap to be the central proxy for models served by other inference servers. The peer servers can be another llama-swap or any API that supports the /v1/* inference endpoints. Updates: #433, #299 Closes: #296	2025-12-27 20:18:06 -08:00
Benson Wong	53b32f3601	proxy: add API key support (#436) Add configuration support for API keys that are enforced by llama-swap. Keys are stripped before requests are sent to upstream servers. Updates: #433, #50 and #251	2025-12-23 23:39:33 -08:00
Benson Wong	565c44766d	config,proxy: add new configuration logToStdout (#432) The new logToStdout option controls what is logged to stdout. The default has been changed to just the proxy logs, which contain swap and HTTP request logs. There are four supported settings: none, proxy, upstream, both. The "both" setting is the legacy behavior, where everything was spewed to stdout.	2025-12-21 22:23:31 -08:00
Benson Wong	e6a9e210ba	proxy: fix path bug in /logs/stream/{model_id} (#431) A {model_id} containing a forward slash trips up gin's path param parsing. This updates /logs/stream to work like /upstream, where the model_id is built up in parts and searched for in the configuration. Updates #421	2025-12-21 21:47:14 -08:00
Benson Wong	d3f329f924	proxy: Improve logging performance and allow separate log streaming (#421) Replace container/ring.Ring with a custom circularBuffer that uses a single contiguous []byte slice. This fixes the original implementation, which created 10,240 ring elements instead of 10KB of storage. GetHistory is now 139x faster (145μs → 1μs) and uses 117x less memory (1.2MB → 10KB). Allocations reduced from 2 to 1 per write operation. Create a LogMonitor per proxy.Process, replacing the usage of a shared one. The buffer in LogMonitor is lazily allocated on the first call to Write and freed when the Process is stopped. This reduces unnecessary memory usage when a model is not active. The /logs/stream/{model_id} endpoint was added to stream logs from a specific process.	2025-12-18 21:49:25 -08:00
Benson Wong	dea98733c3	proxy: extract metrics for v1/messages (#419)	2025-11-29 23:51:20 -08:00
Benson Wong	c968da1b73	proxy: add support for anthropic v1/messages api (#417) * proxy: add support for anthropic v1/messages api * proxy: restrict loading message to /v1/chat/completions	2025-11-29 22:09:07 -08:00
Nikesh Parajuli	06523d8c1e	feat: add platform-specific process attributes support (#411) Fixes an issue on Windows where a new window appeared for every process llama-swap spawns.	2025-11-24 21:39:56 -08:00
Ryan Steed	86e9b93c37	proxy,ui: add version endpoint and display version info in UI (#395) - Add /api/version endpoint to ProxyManager that returns build date, commit hash, and version - Implement SetVersion method to configure version info in ProxyManager - Add version info fetching to APIProvider and display in ConnectionStatus component - Include version info in UI context and update dependencies - Add tests for version endpoint functionality	2025-11-17 10:43:47 -08:00
Ryan Steed	3acace810f	proxy: add configurable logging timestamp format (#401) Introduces a new configuration option, logTimeFormat, that allows customizing the timestamp in log messages using Go's built-in time format constants. The default remains no timestamp.	2025-11-16 10:21:59 -08:00
Ryan Steed	554d29e87d	feat: enhance model listing to include aliases (#400) Introduces includeAliasesInList, a new configuration setting (default false) that includes aliases in v1/models. Fixes #399	2025-11-15 14:35:26 -08:00
Benson Wong	12b69fb718	proxy: recover from panic in Process.statusUpdate (#378) Process.statusUpdate() panics when it cannot write data, usually because of a client disconnect. Since it runs in a goroutine and did not have a recover(), the result was a crash. ref: https://github.com/mostlygeek/llama-swap/discussions/326#discussioncomment-14856197	2025-11-03 05:30:09 -08:00
Benson Wong	a89b803d4a	Stream loading state when swapping models (#371) Swapping models can take a long time and leave a lot of silence while the model is loading. Rather than silently load the model in the background, this PR allows llama-swap to send status updates in the reasoning_content of a streaming chat response. Fixes: #366	2025-10-29 00:09:39 -07:00
Benson Wong	f852689104	proxy: add panic recovery to Process.ProxyRequest (#363) Switching to use httputil.ReverseProxy in #342 introduced a possible panic if a client disconnects while streaming the body. Since llama-swap does not use http.Server, the recover() is not automatically there. - introduce a recover() in Process.ProxyRequest to recover and log the event - add TestProcess_ReverseProxyPanicIsHandled to reproduce and test the fix fixes: #362	2025-10-25 20:40:05 -07:00
Benson Wong	e250e71e59	Include metrics from upstream chat requests (#361) * proxy: refactor metrics recording - remove metrics_middleware.go as this wrapper is no longer needed. This also eliminates double body parsing for the modelID - move metrics parsing to be part of MetricsMonitor - refactor how metrics are recorded in ProxyManager - add MetricsMonitor tests - improve mem efficiency of processStreamingResponse - add benchmarks for MetricsMonitor.addMetrics - proxy: refactor MetricsMonitor to handle errors more safely	2025-10-25 17:38:18 -07:00
Benson Wong	c07179d6e2	cmd/wol-proxy: add wol-proxy (#352) Add a wake-on-LAN proxy for llama-swap. When the target llama-swap server is unreachable, it holds the request, sends a WoL packet, and proxies the request once llama-swap is available.	2025-10-20 20:55:02 -07:00
David Wen Riccardi-Zhu	d58a8b85bf	Refactor to use httputil.ReverseProxy (#342) * Refactor to use httputil.ReverseProxy Refactor manual HTTP proxying logic in Process.ProxyRequest to use the standard library's httputil.ReverseProxy. * Refactor TestProcess_ForceStopWithKill test Update to handle behavior with httputil.ReverseProxy. * Fix gin interface conversion panic	2025-10-13 16:47:04 -07:00
Benson Wong	caf9e98b1e	Fix race conditions in proxy.Process (#349) - Fix data races found in proxy.Process by Go's race detector. - Add data race detection to the CI tests. Fixes #348	2025-10-13 16:42:49 -07:00
Benson Wong	00b738cd0f	Add Macro-In-Macro Support (#337) Add full macro-in-macro support so any user-defined macro can contain another one as long as it was previously declared in the configuration file. Fixes #336 Supersedes #335	2025-10-06 22:57:15 -07:00
Benson Wong	70930e4e91	proxy: add support for user defined metadata in model configs (#333) Changes: - add Metadata key to ModelConfig - include metadata in /v1/models under meta.llamaswap key - add recursive macro substitution into Metadata - change macros at global and model level to be any scalar type Note: This is the first mostly AI generated change to llama-swap. See #333 for notes about the workflow and approach to AI going forward.	2025-10-04 19:56:41 -07:00
Benson Wong	1f6179110c	proxy/config: add model level macros (#330) * proxy/config: add model level macros Add macros to model configuration. Model macros override macros that are defined at the global configuration level. They follow the same naming and value rules as the global macros. * proxy/config: fix bug with macro reserved name checking The PORT reserved name was not properly checked * proxy/config: add tests around model.filters.stripParams - add check that model.filters.stripParams has no invalid macros - renamed strip_params to stripParams for camel case consistency - add legacy code compatibility so model.filters.strip_params continues to work * proxy/config: add duplicate removal to model.filters.stripParams * clean up some doc nits	2025-09-28 23:32:52 -07:00
Benson Wong	216c40b951	proxy/config: create config package and migrate configuration (#329) * proxy/config: create config package and migrate configuration The configuration has become more complex as llama-swap adds more advanced features. This commit moves config to its own package so it can be developed independently of the proxy package. Additionally, enforcing a public API for the configuration will allow downstream usage to be more decoupled.	2025-09-28 16:50:06 -07:00
Benson Wong	9e3d491c85	proxyToUpstream: add redirect with trailing slash to upstream endpoint (#322) This adds a redirect to the upstream endpoint so it always ends with a trailing /. Fixes #321	2025-09-25 16:43:00 -07:00
Benson Wong	1a84926505	proxy: add unload of single model (#318) This adds a new API endpoint, /api/models/unload/*model, that unloads a single model. In the UI, a model in a ReadyState has a new button to unload it. Fixes #312	2025-09-24 20:53:48 -07:00
Benson Wong	c36986fef6	upstream handler support for model names with forward slash (#298) The upstream handler would break on model IDs that contained a forward slash. Model IDs like "aaa/bbb" called at upstream/aaa/bbb would result in an error. This commit adds support for model IDs with a forward slash by iteratively searching the path for a match. Fixes: #229	2025-09-13 13:37:03 -07:00
Artur Podsiadły	558801db1a	Fix nginx proxy buffering for streaming endpoints (#295) * Fix nginx proxy buffering for streaming endpoints - Add X-Accel-Buffering: no header to SSE endpoints (/api/events, /logs/stream) - Add X-Accel-Buffering: no header to proxied text/event-stream responses - Add nginx reverse proxy configuration section to README - Add tests for X-Accel-Buffering header on streaming endpoints Fixes #236 * Fix goroutine cleanup in streaming endpoints test Add context cancellation to TestProxyManager_StreamingEndpointsReturnNoBufferingHeader to ensure the goroutine is properly cleaned up when the test completes.	2025-09-09 16:07:46 -07:00
Benson Wong	f58c8c8ec5	Support llama.cpp's cache_n in timings info (#287) Capture prompt cache metrics and surface them on the Activities page in the UI	2025-09-06 13:58:02 -07:00
Brett Profitt	97b17fc47d	Add ${MODEL_ID} macro (#226) The automatic ${MODEL_ID} macro includes the name of the model and can be used in Cmd and CmdStop.	2025-09-01 21:21:37 -07:00
Benson Wong	831a90d3b0	Add different timeout scenarios to Process.checkHealthEndpoint #276 (#278) - add a TCP connection timeout of 500ms - increase HTTP client timeout to 5000ms In this new behaviour, the upstream has 500ms to accept a TCP connection and 5000ms to respond to the HTTP request.	2025-08-28 22:03:14 -07:00
Yandrik	977f1856bb	add /completion endpoint (#275) * feat: add /completion endpoint * chore: reformat using gofmt	2025-08-28 21:41:02 -07:00
Benson Wong	52b329f7bc	Fix #277 race condition in ProcessGroup.ProxyRequest when swap=true	2025-08-28 21:38:40 -07:00
Benson Wong	57803fd3aa	Support llama-server's /infill endpoint (#272) Add support for llama-server's /infill endpoint and metrics gathering on the Activities page.	2025-08-27 08:36:05 -07:00
Benson Wong	04fc67354a	Improve Activity event handling in the UI (#254) Improve Activity event handling in the UI - fixes #252, which found that the Activity page showed activity inconsistent with /api/metrics - Change data structure for event metrics to array. - Add Event stream connections status indicator	2025-08-15 21:44:08 -07:00
Benson Wong	5dc6b3e6d9	Add barebones but working implementation of model preload (#209, #235) Add barebones but working implementation of model preload * add config test for Preload hook * improve TestProxyManager_StartupHooks * docs for new hook configuration * add a .dev to .gitignore	2025-08-14 10:27:28 -07:00
Benson Wong	74c69f39ef	Add prompt processing metrics (#250) - capture prompt processing metrics - display prompt processing metrics on UI Activity page	2025-08-14 10:02:16 -07:00
Benson Wong	10569ed546	Fix model alias usage in upstream path (#230) Model alias values are now properly resolved and work in the upstream/ path. Related to #229.	2025-08-07 20:16:56 -07:00
Ben Greene	5c63e0066c	return models sorted by id in /v1/models (#222)	2025-08-06 10:04:52 -07:00
Benson Wong	0f583163f7	add /health (#211)	2025-07-30 10:37:10 -07:00
Benson Wong	fd50932dbc	Decouple MetricsMiddleware from downstream handlers (#206) * Decouple MetricsMiddleware from downstream handlers Remove ls-real-model-name optimization. Within proxyOAIHandler the request body's bytes are required for various rewriting features anyway. This negated any benefit from trying not to parse it twice.	2025-07-27 10:36:06 -07:00
Gaël James	8c693e7fcf	Add endpoint aliases for reranking models (#201) * Add endpoint aliases for reranking models * Add MetricsMiddleware to the previous reranking endpoint * Fix the embeddings endpoint not having model set	2025-07-24 08:32:47 -07:00
Benson Wong	01d4838fb3	Fix token metrics parsing (#199) Fix #198 - use llama-server's `timings` info if available in response body - send "-1" for token/sec when not able to accurately calculate performance - optimize streaming body search for metrics information	2025-07-22 23:10:14 -07:00
Benson Wong	cce0bc6aa1	add guard to ensure ls-real-model-name is set in context	2025-07-21 22:59:41 -07:00
g2mt	87dce5f8f6	Add metrics logging for chat completion requests (#195) - Add token and performance metrics for v1/chat/completions - Add Activity Page in UI - Add /api/metrics endpoint Contributed by @g2mt	2025-07-21 22:19:55 -07:00
Benson Wong	6299c1b874	Fix High CPU (#189) * vendor in kelindar/event lib and refactor to remove time.Ticker	2025-07-15 18:04:30 -07:00
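Commit #421 describes replacing container/ring.Ring with a circularBuffer backed by one contiguous []byte, which is allocated lazily on the first Write and freed when the process stops. Below is a minimal sketch of that idea under stated assumptions. It is not the llama-swap implementation: the type, its fields, and the History and Reset methods are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// circularBuffer is a hypothetical fixed-capacity byte buffer that keeps only
// the most recent `size` bytes written to it, stored in one contiguous slice.
type circularBuffer struct {
	data []byte // backing storage; nil until the first Write
	size int    // capacity in bytes
	head int    // index where the next byte will be written
	full bool   // true once the buffer has wrapped at least once
}

func newCircularBuffer(size int) *circularBuffer {
	return &circularBuffer{size: size}
}

// Write stores p, overwriting the oldest bytes once capacity is reached.
// Its signature satisfies io.Writer; it never returns an error.
func (c *circularBuffer) Write(p []byte) (int, error) {
	n := len(p)
	if c.size <= 0 || n == 0 {
		return n, nil
	}
	if c.data == nil {
		c.data = make([]byte, c.size) // lazy, single allocation
	}
	if n >= c.size {
		// Only the last c.size bytes of p can be retained.
		copy(c.data, p[n-c.size:])
		c.head, c.full = 0, true
		return n, nil
	}
	// Copy up to the end of the slice, then wrap the remainder to the front.
	k := copy(c.data[c.head:], p)
	copy(c.data, p[k:])
	if c.head+n >= c.size {
		c.full = true
	}
	c.head = (c.head + n) % c.size
	return n, nil
}

// History returns a copy of the buffered bytes, oldest first.
func (c *circularBuffer) History() []byte {
	if c.data == nil {
		return nil
	}
	if !c.full {
		return append([]byte(nil), c.data[:c.head]...)
	}
	out := make([]byte, 0, c.size)
	out = append(out, c.data[c.head:]...)
	return append(out, c.data[:c.head]...)
}

// Reset drops the backing storage, e.g. when the owning process stops.
func (c *circularBuffer) Reset() {
	c.data, c.head, c.full = nil, 0, false
}

func main() {
	buf := newCircularBuffer(8)
	buf.Write([]byte("hello "))
	buf.Write([]byte("world"))
	fmt.Printf("%q\n", buf.History()) // "lo world"
}
```

Because the storage is a single fixed-size slice, each Write performs at most two copy calls and no allocation after the first. Reading the history costs one allocation for the returned slice.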

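Commits #298 and #431 handle model IDs that contain a forward slash by building the ID up from path segments and looking each candidate up in the configuration. The sketch below shows that search in isolation. It is an assumption-laden illustration, not llama-swap's code: findModel is a hypothetical name, and the knownModels map stands in for the real configuration lookup.

```go
package main

import (
	"fmt"
	"strings"
)

// findModel splits a request path such as "/aaa/bbb/v1/chat/completions" into
// a model ID and the path to forward upstream. It grows the candidate ID one
// path segment at a time and returns the first prefix found in knownModels,
// a hypothetical stand-in for the configuration lookup.
func findModel(path string, knownModels map[string]bool) (modelID, rest string, ok bool) {
	parts := strings.Split(strings.TrimPrefix(path, "/"), "/")
	for i := 1; i <= len(parts); i++ {
		candidate := strings.Join(parts[:i], "/")
		if knownModels[candidate] {
			return candidate, "/" + strings.Join(parts[i:], "/"), true
		}
	}
	return "", "", false
}

func main() {
	models := map[string]bool{"aaa/bbb": true, "llama": true}
	id, rest, ok := findModel("/aaa/bbb/v1/chat/completions", models)
	fmt.Println(id, rest, ok) // aaa/bbb /v1/chat/completions true
}
```

This sketch returns the shortest matching prefix. If both "aaa" and "aaa/bbb" were configured, "aaa" would win, so a real implementation has to decide how to handle that ambiguity.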

167 Commits