fix config hot-reload on macOS (#180)

Co-authored-by: srevn <srevn@github>
improve log display and add a small stats table in ui (#178)
2025-06-26 09:20:50 -07:00 · 2025-06-25 12:27:49 -07:00 · 2025-06-24 10:38:28 -07:00 · 2025-06-23 16:17:21 -07:00 · 2025-06-23 10:52:29 -07:00 · 2025-06-19 14:39:07 -07:00
28 changed files with 525 additions and 124 deletions
@@ -17,14 +17,16 @@ builds:
      - goos: windows
        goarch: arm64
 # use zip format for windows
 archives:
  - id: default
-    format: tar.gz
+    formats:
      - tar.gz
    name_template: "{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}"
    builds_info:
      group: root
      owner: root
    format_overrides:
      # use zip format for windows
      - goos: windows
-        format: zip
+        formats:
          - zip
@@ -22,6 +22,7 @@ Written in golang, it is very easy to install (single binary with no dependencie
  - `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
  - `v1/audio/transcriptions` ([docs](https://github.com/mostlygeek/llama-swap/issues/41#issuecomment-2722637867))
 - ✅ llama-swap custom API endpoints
  - `/ui` - web UI
  - `/log` - remote log monitoring
  - `/upstream/:model_id` - direct access to upstream HTTP server ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
  - `/unload` - manually unload running models ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
@@ -67,6 +68,12 @@ However, there are many more capabilities that llama-swap supports:
 See the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration) in the wiki for all options and examples.
 ## Web UI
 llama-swap ships with a web-based interface that makes it easier to monitor logs and check the status of models.
 <img width="1758" alt="image" src="https://github.com/user-attachments/assets/31ae5bcd-5efd-46b0-b64b-6db9e60196d3" />
 ## Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
 Docker is the quickest way to try out llama-swap:
@@ -1,93 +1,191 @@
-# ======
+# llama-swap YAML configuration example
-# For a more detailed configuration example:
+# -------------------------------------
-# https://github.com/mostlygeek/llama-swap/wiki/Configuration
+#
-# ======
+# - Below are all the available configuration options for llama-swap.
 # - Settings with a default value, or noted as optional can be omitted.
 # - Settings that are marked required must be in your configuration file
-# Seconds to wait for llama.cpp to be available to serve requests
+# healthCheckTimeout: number of seconds to wait for a model to be ready to serve requests
-# Default (and minimum): 15 seconds
+# - optional, default: 120
-healthCheckTimeout: 90
+# - minimum value is 15 seconds, anything less will be set to this value
 healthCheckTimeout: 500
-# valid log levels: debug, info (default), warn, error
+# logLevel: sets the logging level
-logLevel: debug
+# - optional, default: info
 # - Valid log levels: debug, info, warn, error
 logLevel: info
-# creating a coding profile with models for code generation and general questions
+# startPort: sets the starting port number for the automatic ${PORT} macro.
-groups:
+# - optional, default: 5800
-  coding:
+# - the ${PORT} macro can be used in model.cmd and model.proxy settings
-    swap: false
+# - it is automatically incremented for every model that uses it
-    members:
+startPort: 10001
      - "qwen"
      - "llama"
 # macros: sets a dictionary of string:string pairs
 # - optional, default: empty dictionary
 # - these are reusable snippets
 # - used in a model's cmd, cmdStop, proxy, checkEndpoint and filters.strip_params
 # - useful for reducing common configuration settings
 macros:
  "latest-llama": >
    /path/to/llama-server/llama-server-ec9e0301
    --port ${PORT}
 # models: a dictionary of model configurations
 # - required
 # - each key is the model's ID, used in API requests
 # - model settings have default values that are used if they are not defined here
 # - below are examples of the various settings a model can have:
 # - available model settings: env, cmd, cmdStop, proxy, aliases, checkEndpoint, ttl, unlisted, useModelName, filters
 models:
  # keys are the model names used in API requests
  "llama":
    # cmd: the command to run to start the inference server.
    # - required
    # - it is just a string, similar to what you would run on the CLI
    # - using `|` allows comments in the command; they are parsed out
    # - macros can be used within cmd
    cmd: |
-      models/llama-server-osx
+      # ${latest-llama} is a macro that is defined above
-      --port ${PORT}
+      ${latest-llama}
-      -m models/Llama-3.2-1B-Instruct-Q4_0.gguf
+      --model path/to/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf
-    # list of model name aliases this llama.cpp instance can serve
+    # env: define an array of environment variables to inject into cmd's environment
    # - optional, default: empty array
    # - each value is a single string
    # - in the format: ENV_NAME=value
    env:
      - "CUDA_VISIBLE_DEVICES=0,1,2"
    # proxy: the URL where llama-swap routes API requests
    # - optional, default: http://localhost:${PORT}
    # - if you used ${PORT} in cmd this can be omitted
    # - if you use a custom port in cmd this *must* be set
    proxy: http://127.0.0.1:8999
    # aliases: alternative model names that this model configuration is used for
    # - optional, default: empty array
    # - aliases must be unique globally
    # - useful for impersonating a specific model
    aliases:
-    - gpt-4o-mini
+      - "gpt-4o-mini"
      - "gpt-3.5-turbo"
-    # check this path for a HTTP 200 response for the server to be ready
+    # checkEndpoint: URL path to check if the server is ready
-    checkEndpoint: /health
+    # - optional, default: /health
    # - use "none" to skip endpoint ready checking
    # - endpoint is expected to return an HTTP 200 response
    # - all requests wait until the endpoint is ready (or fails)
    checkEndpoint: /custom-endpoint
-    # unload model after 5 seconds
+    # ttl: automatically unload the model after this many seconds
-    ttl: 5
+    # - optional, default: 0
    # - ttl must be 0 or greater
    # - a value of 0 disables automatic unloading of the model
    ttl: 60
-  "qwen":
+    # useModelName: overrides the model name that is sent to the upstream server
-    cmd: models/llama-server-osx --port ${PORT} -m models/qwen2.5-0.5b-instruct-q8_0.gguf
+    # - optional, default: ""
-    aliases:
+    # - useful when the upstream server expects a specific model name or format
-      - gpt-3.5-turbo
+    useModelName: "qwen:qwq"
-  # Embedding example with Nomic
+    # filters: a dictionary of filter settings
-  # https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF
+    # - optional, default: empty dictionary
-  "nomic":
+    filters:
-    cmd: |
+      # strip_params: a comma-separated list of parameters to remove from the request
-      models/llama-server-osx --port ${PORT}
+      # - optional, default: ""
-      -m models/nomic-embed-text-v1.5.Q8_0.gguf
+      # - useful for preventing requests from overriding the server's default params
-      --ctx-size 8192
+      # - `model` parameter is never removed
-      --batch-size 8192
+      # - can be any JSON key in the request body
-      --rope-scaling yarn
+      # - recommended to stick to sampling parameters
-      --rope-freq-scale 0.75
+      strip_params: "temperature, top_p, top_k"
      -ngl 99
      --embeddings
-  # Reranking example with bge-reranker
+  # Unlisted model example:
-  # https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF
+  "qwen-unlisted":
-  "bge-reranker":
+    # unlisted: true or false
-    cmd: |
+    # - optional, default: false
-      models/llama-server-osx --port ${PORT}
+    # - unlisted models do not show up in /v1/models or /upstream lists
-      -m models/bge-reranker-v2-m3-Q4_K_M.gguf
+    # - can be requested as normal through all apis
-      --ctx-size 8192
+    unlisted: true
-      --reranking
+    cmd: llama-server --port ${PORT} -m Llama-3.2-1B-Instruct-Q4_K_M.gguf -ngl 0
-  # Docker Support (v26.1.4+ required!)
+  # Docker example:
-  "dockertest":
+  # container runtimes like Docker and Podman can also be used with
  # a combination of cmd and cmdStop.
  "docker-llama":
    proxy: "http://127.0.0.1:${PORT}"
    cmd: |
      docker run --name dockertest
      --init --rm -p ${PORT}:8080 -v /mnt/nvme/models:/models
-      ghcr.io/ggerganov/llama.cpp:server
+      ghcr.io/ggml-org/llama.cpp:server
      --model '/models/Qwen2.5-Coder-0.5B-Instruct-Q4_K_M.gguf'
-  "simple":
+    # cmdStop: command to run to stop the model gracefully
-    # example of setting environment variables
+    # - optional, default: ""
-    env:
+    # - useful for stopping commands managed by another system
-      - CUDA_VISIBLE_DEVICES=0,1
+    # - on POSIX systems: a SIGTERM is sent for graceful shutdown
-      - env1=hello
+    # - on Windows, taskkill is used
-    cmd: build/simple-responder --port ${PORT}
+    # - processes are given 5 seconds to shut down before they are forcefully killed
-    unlisted: true
+    # - the upstream's process id is available in the ${PID} macro
    cmdStop: docker stop dockertest
-    # use "none" to skip check. Caution this may cause some requests to fail
+# groups: a dictionary of group settings
-    # until the upstream server is ready for traffic
+# - optional, default: empty dictionary
-    checkEndpoint: none
+# - provide advanced controls over model swapping behaviour.
 # - Using groups, some models can be kept loaded indefinitely, while others are swapped out.
 # - model ids must be defined in the Models section
 # - a model can only be a member of one group
 # - group behaviour is controlled via the `swap`, `exclusive` and `persistent` fields
 # - see issue #109 for details
 #
 # NOTE: the example below uses model names that are not defined above for demonstration purposes
 groups:
  # group1 is the same as the default behaviour of llama-swap where only one model is allowed
  # to run at a time across the whole llama-swap instance
  "group1":
    # swap: controls the model swapping behaviour within the group
    # - optional, default: true
    # - true : only one model is allowed to run at a time
    # - false: all models can run together, no swapping
    swap: true
-  # don't use these, just for testing if things are broken
+    # exclusive: controls how the group affects other groups
-  "broken":
+    # - optional, default: true
-    cmd: models/llama-server-osx --port 8999 -m models/doesnotexist.gguf
+    # - true: causes all other groups to unload when this group runs a model
-    proxy: http://127.0.0.1:8999
+    # - false: does not affect other groups
-    unlisted: true
+    exclusive: true
-  "broken_timeout":
+
-    cmd: models/llama-server-osx --port 8999 -m models/qwen2.5-0.5b-instruct-q8_0.gguf
+    # members references the models defined above
-    proxy: http://127.0.0.1:9000
+    # - required
-    unlisted: true
+    members:
      - "llama"
      - "qwen-unlisted"
  # Example:
  # - in this group all the models can run at the same time
  # - when a different group loads, all running models in this group are unloaded
  "group2":
    swap: false
    exclusive: false
    members:
      - "docker-llama"
      - "modelA"
      - "modelB"
  # Example:
  # - a persistent group prevents other groups from unloading it
  "forever":
    # persistent: prevents other groups from unloading the models in this group
    # - optional, default: false
    # - does not affect individual model behaviour
    persistent: true
    # set swap/exclusive to false to prevent swapping inside the group
    # and the unloading of other groups
    swap: false
    exclusive: false
    members:
      - "forever-modelA"
      - "forever-modelB"
      - "forever-modelc"
@@ -144,8 +144,8 @@ func watchConfigFileWithReload(configPath string, reloadChan chan<- *proxy.Proxy
 			if !ok {
 				return
 			}
-			// We only care about writes to the specific config file
+			// We only care about writes/creates to the specific config file
-			if event.Name == configPath && event.Has(fsnotify.Write) {
+			if event.Name == configPath && (event.Has(fsnotify.Write) || event.Has(fsnotify.Create)) {
 				// Reset or start the debounce timer
 				if debounceTimer != nil {
 					debounceTimer.Stop()
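The change widens the event filter from `Write` to `Write || Create`. A likely reason this matters on macOS is that many editors save atomically (write a temporary file, then rename it over the original), which can surface as a `Create` on the config path rather than a `Write`. The sketch below isolates the filter and the debounce step using only the standard library. `Op` and its constants are a hypothetical stand-in that mirrors `fsnotify.Op` so the example runs without the dependency, and `debouncer` is an illustrative helper, not the function in the diff.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Op is a hypothetical stand-in for fsnotify.Op: a bitmask of file operations.
type Op uint32

const (
	Create Op = 1 << iota
	Write
	Remove
	Rename
	Chmod
)

// Has reports whether op contains all bits of o.
func (op Op) Has(o Op) bool { return op&o == o }

// shouldReload accepts writes and creates on the exact config path only.
func shouldReload(eventName, configPath string, op Op) bool {
	return eventName == configPath && (op.Has(Write) || op.Has(Create))
}

// debouncer coalesces a burst of triggers into one call after `delay` of quiet.
type debouncer struct {
	mu    sync.Mutex
	timer *time.Timer
	delay time.Duration
	fn    func()
}

func (d *debouncer) trigger() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop() // cancel the pending call; the newest trigger wins
	}
	d.timer = time.AfterFunc(d.delay, d.fn)
}

func main() {
	cfg := "/etc/llama-swap/config.yaml"
	fmt.Println(shouldReload(cfg, cfg, Create))       // true
	fmt.Println(shouldReload(cfg, cfg, Chmod))        // false
	fmt.Println(shouldReload(cfg+".tmp", cfg, Write)) // false

	var mu sync.Mutex
	count := 0
	d := &debouncer{delay: 50 * time.Millisecond, fn: func() { mu.Lock(); count++; mu.Unlock() }}
	for i := 0; i < 5; i++ {
		d.trigger() // e.g. Create + Write events from a single save
	}
	time.Sleep(150 * time.Millisecond)
	mu.Lock()
	fmt.Println("reloads:", count) // reloads: 1
	mu.Unlock()
}
```

Debouncing is what keeps the wider filter cheap: a save that emits both `Create` and `Write` still results in a single reload.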
@@ -42,9 +42,12 @@ func main() {
 			time.Sleep(wait)
 		}
 		bodyBytes, _ := io.ReadAll(c.Request.Body)
 		c.JSON(http.StatusOK, gin.H{
 			"responseMessage":  *responseMessage,
 			"h_content_length": c.Request.Header.Get("Content-Length"),
 			"request_body":     string(bodyBytes),
 		})
 	})
@@ -6,6 +6,7 @@ import (
 	"os"
 	"regexp"
 	"runtime"
 	"slices"
 	"sort"
 	"strconv"
 	"strings"
@@ -29,6 +30,9 @@ type ModelConfig struct {
 	// Limit concurrency of HTTP requests to process
 	ConcurrencyLimit int `yaml:"concurrencyLimit"`
 	// Model filters see issue #174
 	Filters ModelFilters `yaml:"filters"`
 }
 func (m *ModelConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
@@ -63,6 +67,46 @@ func (m *ModelConfig) SanitizedCommand() ([]string, error) {
 	return SanitizeCommand(m.Cmd)
 }
 // ModelFilters see issue #174
 type ModelFilters struct {
 	StripParams string `yaml:"strip_params"`
 }
 func (m *ModelFilters) UnmarshalYAML(unmarshal func(interface{}) error) error {
 	type rawModelFilters ModelFilters
 	defaults := rawModelFilters{
 		StripParams: "",
 	}
 	if err := unmarshal(&defaults); err != nil {
 		return err
 	}
 	*m = ModelFilters(defaults)
 	return nil
 }
 func (f ModelFilters) SanitizedStripParams() ([]string, error) {
 	if f.StripParams == "" {
 		return nil, nil
 	}
 	params := strings.Split(f.StripParams, ",")
 	cleaned := make([]string, 0, len(params))
 	for _, param := range params {
 		trimmed := strings.TrimSpace(param)
 		if trimmed == "model" || trimmed == "" {
 			continue
 		}
 		cleaned = append(cleaned, trimmed)
 	}
 	// sort cleaned
 	slices.Sort(cleaned)
 	return cleaned, nil
 }
 type GroupConfig struct {
 	Swap       bool     `yaml:"swap"`
 	Exclusive  bool     `yaml:"exclusive"`
@@ -212,6 +256,7 @@ func LoadConfigFromReader(r io.Reader) (Config, error) {
 			modelConfig.CmdStop = strings.ReplaceAll(modelConfig.CmdStop, macroSlug, macroValue)
 			modelConfig.Proxy = strings.ReplaceAll(modelConfig.Proxy, macroSlug, macroValue)
 			modelConfig.CheckEndpoint = strings.ReplaceAll(modelConfig.CheckEndpoint, macroSlug, macroValue)
 			modelConfig.Filters.StripParams = strings.ReplaceAll(modelConfig.Filters.StripParams, macroSlug, macroValue)
 		}
 		// enforce ${PORT} used in both cmd and proxy
@@ -83,6 +83,9 @@ models:
 		assert.Equal(t, "", model1.UseModelName)
 		assert.Equal(t, 0, model1.ConcurrencyLimit)
 	}
 	// default empty filter exists
 	assert.Equal(t, "", model1.Filters.StripParams)
 }
 func TestConfig_LoadPosix(t *testing.T) {
@@ -300,3 +300,28 @@ models:
 		})
 	}
 }
 func TestConfig_ModelFilters(t *testing.T) {
 	content := `
 macros:
  default_strip: "temperature, top_p"
 models:
  model1:
    cmd: path/to/cmd --port ${PORT}
    filters:
      strip_params: "model, top_k, ${default_strip}, , ,"
 `
 	config, err := LoadConfigFromReader(strings.NewReader(content))
 	assert.NoError(t, err)
 	modelConfig, ok := config.Models["model1"]
 	if !assert.True(t, ok) {
 		t.FailNow()
 	}
 	// make sure `model` and empty strings are not in the list
 	assert.Equal(t, "model, top_k, temperature, top_p, , ,", modelConfig.Filters.StripParams)
 	sanitized, err := modelConfig.Filters.SanitizedStripParams()
 	if assert.NoError(t, err) {
 		assert.Equal(t, []string{"temperature", "top_k", "top_p"}, sanitized)
 	}
 }
@@ -80,6 +80,9 @@ models:
 		assert.Equal(t, "", model1.UseModelName)
 		assert.Equal(t, 0, model1.ConcurrencyLimit)
 	}
 	// default empty filter exists
 	assert.Equal(t, "", model1.Filters.StripParams)
 }
 func TestConfig_LoadWindows(t *testing.T) {
@@ -189,18 +189,19 @@ func (p *Process) start() error {
 	p.waitStarting.Add(1)
 	defer p.waitStarting.Done()
 	cmdContext, ctxCancelUpstream := context.WithCancel(context.Background())
-	p.proxyLogger.Debugf("<%s> Executing start command: %s", p.ID, strings.Join(args, " "))
+
 	p.cmd = exec.CommandContext(cmdContext, args[0], args[1:]...)
 	p.cmd.Stdout = p.processLogger
 	p.cmd.Stderr = p.processLogger
-	p.cmd.Env = p.config.Env
+	p.cmd.Env = append(p.cmd.Environ(), p.config.Env...)
 	p.cmd.Cancel = p.cmdStopUpstreamProcess
 	p.cmd.WaitDelay = p.gracefulStopTimeout
 	p.cancelUpstream = ctxCancelUpstream
 	p.cmdWaitChan = make(chan struct{})
 	p.failedStartCount++ // this will be reset to zero when the process has successfully started
 	p.proxyLogger.Debugf("<%s> Executing start command: %s, env: %s", p.ID, strings.Join(args, " "), strings.Join(p.config.Env, ", "))
 	err = p.cmd.Start()
 	// Set process state to failed
@@ -531,7 +532,7 @@ func (p *Process) cmdStopUpstreamProcess() error {
 		stopCmd := exec.Command(stopArgs[0], stopArgs[1:]...)
 		stopCmd.Stdout = p.processLogger
 		stopCmd.Stderr = p.processLogger
-		stopCmd.Env = p.config.Env
+		stopCmd.Env = p.cmd.Env
 		if err := stopCmd.Run(); err != nil {
 			p.proxyLogger.Errorf("<%s> Failed to exec stop command: %v", p.ID, err)
@@ -394,6 +394,9 @@ func TestProcess_StopImmediately(t *testing.T) {
 // Test that SIGKILL is sent when gracefulStopTimeout is reached and properly terminates
 // the upstream command
 func TestProcess_ForceStopWithKill(t *testing.T) {
 	if runtime.GOOS == "windows" {
 		t.Skip("skipping SIGTERM test on Windows ")
 	}
 	expectedMessage := "test_sigkill"
 	binaryPath := getSimpleResponderPath()
@@ -405,7 +408,6 @@ func TestProcess_ForceStopWithKill(t *testing.T) {
 		Cmd:           fmt.Sprintf("%s --port %d --respond %s --silent --ignore-sig-term", binaryPath, port, expectedMessage),
 		Proxy:         fmt.Sprintf("http://127.0.0.1:%d", port),
 		CheckEndpoint: "/health",
 		CmdStop:       "taskkill /f /t /pid ${PID}",
 	}
 	process := NewProcess("stop_immediate", 2, config, debugLogger, debugLogger)
@@ -465,3 +467,27 @@ func TestProcess_StopCmd(t *testing.T) {
 	process.StopImmediately()
 	assert.Equal(t, process.CurrentState(), StateStopped)
 }
 func TestProcess_EnvironmentSetCorrectly(t *testing.T) {
 	expectedMessage := "test_env_not_emptied"
 	config := getTestSimpleResponderConfig(expectedMessage)
 	// ensure that the default config does not blank out the inherited environment
 	configWEnv := config
 	// ensure the additional variables are appended to the process' environment
 	configWEnv.Env = append(configWEnv.Env, "TEST_ENV1=1", "TEST_ENV2=2")
 	process1 := NewProcess("env_test", 2, config, debugLogger, debugLogger)
 	process2 := NewProcess("env_test", 2, configWEnv, debugLogger, debugLogger)
 	process1.start()
 	defer process1.Stop()
 	process2.start()
 	defer process2.Stop()
 	assert.NotZero(t, len(process1.cmd.Environ()))
 	assert.NotZero(t, len(process2.cmd.Environ()))
 	assert.Equal(t, len(process1.cmd.Environ())+2, len(process2.cmd.Environ()), "process2 should have 2 more environment variables than process1")
 }
@@ -365,6 +365,21 @@ func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
 		}
 	}
 	// issue #174 strip parameters from the JSON body
 	stripParams, err := pm.config.Models[realModelName].Filters.SanitizedStripParams()
 	if err != nil { // just log it and continue
 		pm.proxyLogger.Errorf("Error sanitizing strip params string: %s, %s", pm.config.Models[realModelName].Filters.StripParams, err.Error())
 	} else {
 		for _, param := range stripParams {
 			pm.proxyLogger.Debugf("<%s> stripping param: %s", realModelName, param)
 			bodyBytes, err = sjson.DeleteBytes(bodyBytes, param)
 			if err != nil {
 				pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error deleting parameter %s from request", param))
 				return
 			}
 		}
 	}
 	c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
 	// dechunk it as we already have all the body bytes see issue #11
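The handler deletes each sanitized key with `sjson.DeleteBytes`, which edits the raw bytes and so preserves key order and spacing; that is why the test can compare against `{"model":"model1", "x_param":"123", "y_param":"abc"}` with its original spaces. For readers without `sjson`, a standard-library approximation is sketched below. `stripParams` is a hypothetical name, it only handles top-level keys, and re-encoding through a map sorts keys and drops whitespace, so its output is semantically, but not byte-for-byte, equal to the handler's.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripParams removes top-level keys from a JSON object body, never "model".
// Stdlib approximation of the sjson loop: output keys end up sorted and compact.
func stripParams(body []byte, params []string) ([]byte, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(body, &obj); err != nil {
		return nil, fmt.Errorf("request body is not a JSON object: %w", err)
	}
	for _, p := range params {
		if p == "model" {
			continue // routing depends on it
		}
		delete(obj, p)
	}
	return json.Marshal(obj)
}

func main() {
	body := []byte(`{"model":"model1", "temperature":0.1, "x_param":"123", "y_param":"abc", "stream":true}`)
	out, err := stripParams(body, []string{"stream", "temperature"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"model":"model1","x_param":"123","y_param":"abc"}
}
```

Either way the body length changes, so the upstream request's `Content-Length` has to be derived from the rewritten bytes, not the client's header.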
@@ -623,3 +623,37 @@ func TestProxyManager_ChatContentLength(t *testing.T) {
 	assert.Equal(t, "81", response["h_content_length"])
 	assert.Equal(t, "model1", response["responseMessage"])
 }
 func TestProxyManager_FiltersStripParams(t *testing.T) {
 	modelConfig := getTestSimpleResponderConfig("model1")
 	modelConfig.Filters = ModelFilters{
 		StripParams: "temperature, model, stream",
 	}
 	config := AddDefaultGroupToConfig(Config{
 		HealthCheckTimeout: 15,
 		LogLevel:           "error",
 		Models: map[string]ModelConfig{
 			"model1": modelConfig,
 		},
 	})
 	proxy := New(config)
 	defer proxy.StopProcesses(StopWaitForInflightRequest)
 	reqBody := `{"model":"model1", "temperature":0.1, "x_param":"123", "y_param":"abc", "stream":true}`
 	req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewBufferString(reqBody))
 	w := httptest.NewRecorder()
 	proxy.ServeHTTP(w, req)
 	assert.Equal(t, http.StatusOK, w.Code)
 	var response map[string]string
 	assert.NoError(t, json.Unmarshal(w.Body.Bytes(), &response))
 	// `temperature` and `stream` are gone but model remains
 	assert.Equal(t, `{"model":"model1", "x_param":"123", "y_param":"abc"}`, response["request_body"])
 	// assert.Nil(t, response["temperature"])
 	// assert.Equal(t, "123", response["x_param"])
 	// assert.Equal(t, "abc", response["y_param"])
 	// t.Logf("%v", response)
 }
@@ -3,7 +3,11 @@
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <link rel="icon" type="image/png" href="/favicon.ico" />
+    <link rel="icon" type="image/png" href="/favicon-96x96.png" sizes="96x96" />
    <link rel="icon" type="image/svg+xml" href="/favicon.svg" />
    <link rel="shortcut icon" href="/favicon.ico" />
    <link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png" />
    <link rel="manifest" href="/site.webmanifest" />
    <title>llama-swap</title>
  </head>
  <body >
@@ -0,0 +1,21 @@
 {
  "name": "llama-swap",
  "short_name": "llama-swap",
  "icons": [
    {
      "src": "/web-app-manifest-192x192.png",
      "sizes": "192x192",
      "type": "image/png",
      "purpose": "maskable"
    },
    {
      "src": "/web-app-manifest-512x512.png",
      "sizes": "512x512",
      "type": "image/png",
      "purpose": "maskable"
    }
  ],
  "theme_color": "#ffffff",
  "background_color": "#ffffff",
  "display": "standalone"
 }
@@ -10,10 +10,10 @@ function App() {
    <Router basename="/ui/">
      <APIProvider>
        <div>
-          <nav className="bg-surface border-b border-border p-4">
+          <nav className="bg-surface border-b border-border p-2 h-[75px]">
-            <div className="flex items-center justify-between mx-auto px-4">
+            <div className="flex items-center justify-between mx-auto px-4 h-full">
-              <h1>llama-swap</h1>
+              <h1 className="flex items-center p-0">llama-swap</h1>
-              <div className="flex space-x-4">
+              <div className="flex items-center space-x-4">
                <NavLink to="/" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
                  Logs
                </NavLink>
@@ -12,6 +12,7 @@ interface APIProviderType {
  models: Model[];
  listModels: () => Promise<Model[]>;
  unloadAllModels: () => Promise<void>;
  loadModel: (model: string) => Promise<void>;
  enableProxyLogs: (enabled: boolean) => void;
  enableUpstreamLogs: (enabled: boolean) => void;
  enableModelUpdates: (enabled: boolean) => void;
@@ -57,9 +58,27 @@ export function APIProvider({ children }: APIProviderProps) {
  const enableProxyLogs = useCallback(
    (enabled: boolean) => {
      if (enabled) {
-        const eventSource = new EventSource("/logs/streamSSE/proxy");
+        let retryCount = 0;
-        eventSource.onmessage = handleProxyMessage;
+        const maxRetries = 3;
-        proxyEventSource.current = eventSource;
+        const initialDelay = 1000; // 1 second
        const connect = () => {
          const eventSource = new EventSource("/logs/streamSSE/proxy");
          eventSource.onmessage = handleProxyMessage;
          eventSource.onerror = () => {
            eventSource.close();
            if (retryCount < maxRetries) {
              retryCount++;
              const delay = initialDelay * Math.pow(2, retryCount - 1);
              setTimeout(connect, delay);
            }
          };
          proxyEventSource.current = eventSource;
        };
        connect();
      } else {
        proxyEventSource.current?.close();
        proxyEventSource.current = null;
@@ -71,15 +90,33 @@ export function APIProvider({ children }: APIProviderProps) {
  const enableUpstreamLogs = useCallback(
    (enabled: boolean) => {
      if (enabled) {
-        const eventSource = new EventSource("/logs/streamSSE/upstream");
+        let retryCount = 0;
-        eventSource.onmessage = handleUpstreamMessage;
+        const maxRetries = 3;
-        upstreamEventSource.current = eventSource;
+        const initialDelay = 1000; // 1 second
        const connect = () => {
          const eventSource = new EventSource("/logs/streamSSE/upstream");
          eventSource.onmessage = handleUpstreamMessage;
          eventSource.onerror = () => {
            eventSource.close();
            if (retryCount < maxRetries) {
              retryCount++;
              const delay = initialDelay * Math.pow(2, retryCount - 1);
              setTimeout(connect, delay);
            }
          };
          upstreamEventSource.current = eventSource;
        };
        connect();
      } else {
        upstreamEventSource.current?.close();
        upstreamEventSource.current = null;
      }
    },
-    [upstreamEventSource, handleUpstreamMessage]
+    [handleUpstreamMessage]
  );
  const enableModelUpdates = useCallback(
@@ -139,11 +176,26 @@ export function APIProvider({ children }: APIProviderProps) {
    }
  }, []);
  const loadModel = useCallback(async (model: string) => {
    try {
      const response = await fetch(`/upstream/${model}/`, {
        method: "GET",
      });
      if (!response.ok) {
        throw new Error(`Failed to load model: ${response.status}`);
      }
    } catch (error) {
      console.error("Failed to load model:", error);
      throw error; // Re-throw to let calling code handle it
    }
  }, []);
  const value = useMemo(
    () => ({
      models,
      listModels,
      unloadAllModels,
      loadModel,
      enableProxyLogs,
      enableUpstreamLogs,
      enableModelUpdates,
@@ -154,6 +206,7 @@ export function APIProvider({ children }: APIProviderProps) {
      models,
      listModels,
      unloadAllModels,
      loadModel,
      enableProxyLogs,
      enableUpstreamLogs,
      enableModelUpdates,
@@ -143,6 +143,10 @@
    @apply bg-surface p-2 px-4 text-sm rounded-full border border-2 transition-colors duration-200 border-btn-border;
  }
  .btn:hover {
    cursor: pointer;
  }
  .btn--sm {
    @apply px-2 py-0.5 text-xs;
  }
@@ -0,0 +1,18 @@
 export function processEvalTimes(text: string) {
  const lines = text.match(/^ *eval time.*$/gm) || [];
  let totalTokens = 0;
  let totalTime = 0;
  lines.forEach((line) => {
    const tokensMatch = line.match(/\/\s*(\d+)\s*tokens/);
    const timeMatch = line.match(/=\s*(\d+\.\d+)\s*ms/);
    if (tokensMatch) totalTokens += parseFloat(tokensMatch[1]);
    if (timeMatch) totalTime += parseFloat(timeMatch[1]);
  });
  const avgTokensPerSecond = totalTime > 0 ? totalTokens / (totalTime / 1000) : 0;
  return [lines.length, totalTokens, Math.round(avgTokensPerSecond * 100) / 100];
 }
@@ -15,7 +15,7 @@ const LogViewer = () => {
  }, []);
  return (
-    <div className="flex flex-col gap-5">
+    <div className="flex flex-col gap-5" style={{ height: "calc(100vh - 125px)" }}>
      <LogPanel id="proxy" title="Proxy Logs" logData={proxyLogs} />
      <LogPanel id="upstream" title="Upstream Logs" logData={upstreamLogs} />
    </div>
@@ -30,11 +30,8 @@ interface LogPanelProps {
 }
 export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
  const [isCollapsed, setIsCollapsed] = usePersistentState(`logPanel-${id}-isCollapsed`, false);
  const [filterRegex, setFilterRegex] = useState("");
  const [panelState, setPanelState] = usePersistentState<"hide" | "small" | "max">(
    `logPanel-${id}-panelState`,
    "small"
  );
  const [fontSize, setFontSize] = usePersistentState<"xxs" | "xs" | "small" | "normal">(
    `logPanel-${id}-fontSize`,
    "normal"
@@ -60,14 +57,6 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
    });
  }, []);
  const togglePanelState = useCallback(() => {
    setPanelState((prev) => {
      if (prev === "small") return "max";
      if (prev === "hide") return "small";
      return "hide";
    });
  }, []);
  const fontSizeClass = useMemo(() => {
    switch (fontSize) {
      case "xxs":
@@ -101,20 +90,21 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
  }, [filteredLogs]);
  return (
-    <div className={`bg-surface border border-border rounded-lg overflow-hidden flex flex-col ${className || ""}`}>
+    <div
      className={`bg-surface border border-border rounded-lg overflow-hidden flex flex-col ${
        !isCollapsed ? "h-full" : ""
      } ${className || ""}`}
    >
      <div className="p-4 border-b border-border bg-secondary">
        <div className="flex flex-col md:flex-row md:items-center md:justify-between gap-4">
          {/* Title - Always full width on mobile, normal on desktop */}
-          <div className="w-full md:w-auto" onClick={togglePanelState}>
+          <div className="w-full md:w-auto" onClick={() => setIsCollapsed(!isCollapsed)}>
            <h3 className="m-0 text-lg">{title}</h3>
          </div>
          <div className="flex flex-col sm:flex-row gap-4 w-full md:w-auto">
            {/* Sizing Buttons - Stacks vertically on mobile */}
            <div className="flex flex-wrap gap-2">
              <button className="btn" onClick={togglePanelState}>
                size: {panelState}
              </button>
              <button className="btn" onClick={toggleFontSize}>
                font: {fontSize}
              </button>
@@ -140,14 +130,11 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
        </div>
      </div>
-      {panelState !== "hide" && (
+      {!isCollapsed && (
-        <div className="flex-1 bg-background font-mono text-sm leading-[1.4] p-3">
+        <div className="flex-1 bg-background font-mono text-sm p-3 overflow-hidden">
          <pre
            ref={preTagRef}
-            className={`flex-1 p-4 overflow-y-auto whitespace-pre min-h-0 ${textWrapClass} ${fontSizeClass}`}
+            className={`h-full p-4 overflow-y-auto whitespace-pre min-h-0 ${textWrapClass} ${fontSizeClass}`}
            style={{
              maxHeight: panelState === "max" ? "1500px" : "500px",
            }}
          >
            {filteredLogs}
          </pre>
@@ -156,5 +143,4 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
    </div>
  );
 };
 export default LogViewer;
@@ -1,9 +1,10 @@
-import { useState, useEffect, useCallback } from "react";
+import { useState, useEffect, useCallback, useMemo } from "react";
 import { useAPI } from "../contexts/APIProvider";
 import { LogPanel } from "./LogViewer";
 import { processEvalTimes } from "../lib/Utils";
 export default function ModelsPage() {
-  const { models, enableModelUpdates, unloadAllModels, upstreamLogs, enableUpstreamLogs } = useAPI();
+  const { models, enableModelUpdates, unloadAllModels, loadModel, upstreamLogs, enableUpstreamLogs } = useAPI();
  const [isUnloading, setIsUnloading] = useState(false);
  useEffect(() => {
@@ -29,8 +30,12 @@ export default function ModelsPage() {
    }
  }, []);
  const [totalLines, totalTokens, avgTokensPerSecond] = useMemo(() => {
    return processEvalTimes(upstreamLogs);
  }, [upstreamLogs]);
  return (
-    <div className="h-screen">
+    <div>
      <div className="flex flex-col md:flex-row gap-4">
        {/* Left Column */}
        <div className="w-full md:w-1/2 flex items-top">
@@ -43,6 +48,7 @@ export default function ModelsPage() {
              <thead>
                <tr className="border-b border-primary">
                  <th className="text-left p-2">Name</th>
                  <th className="text-left p-2"></th>
                  <th className="text-left p-2">State</th>
                </tr>
              </thead>
@@ -50,10 +56,19 @@ export default function ModelsPage() {
                {models.map((model) => (
                  <tr key={model.id} className="border-b hover:bg-secondary-hover border-border">
                    <td className="p-2">
-                      <a href={`/upstream/${model.id}/`} className="underline" target="top">
+                      <a href={`/upstream/${model.id}/`} className="underline" target="_blank">
                        {model.id}
                      </a>
                    </td>
                    <td className="p-2">
                      <button
                        className="btn btn--sm"
                        disabled={model.state !== "stopped"}
                        onClick={() => loadModel(model.id)}
                      >
                        Load
                      </button>
                    </td>
                    <td className="p-2">
                      <span className={`status status--${model.state}`}>{model.state}</span>
                    </td>
@@ -65,8 +80,29 @@ export default function ModelsPage() {
        </div>
        {/* Right Column */}
-        <div className="w-full md:w-1/2  flex items-top">
+        <div className="w-full md:w-1/2 flex flex-col" style={{ height: "calc(100vh - 125px)" }}>
-          <LogPanel id="modelsupstream" title="Upstream Logs" logData={upstreamLogs} className="h-full" />
+          <div className="card mb-4 min-h-[250px]">
            <h2>Log Stats</h2>
            <p className="italic my-2">note: eval logs from llama-server</p>
            <table className="w-full border border-gray-200">
              <tbody>
                <tr className="border-b border-gray-200">
                  <td className="py-2 px-4 font-medium border-r border-gray-200">Requests</td>
                  <td className="py-2 px-4 text-right">{totalLines}</td>
                </tr>
                <tr className="border-b border-gray-200">
                  <td className="py-2 px-4 font-medium border-r border-gray-200">Total Tokens Generated</td>
                  <td className="py-2 px-4 text-right">{totalTokens}</td>
                </tr>
                <tr>
                  <td className="py-2 px-4 font-medium border-r border-gray-200">Average Tokens/Second</td>
                  <td className="py-2 px-4 text-right">{avgTokensPerSecond}</td>
                </tr>
              </tbody>
            </table>
          </div>
          <LogPanel id="modelsupstream" title="Upstream Logs" logData={upstreamLogs} />
        </div>
      </div>
    </div>
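The new "Log Stats" card destructures `[totalLines, totalTokens, avgTokensPerSecond]` from `processEvalTimes(upstreamLogs)`. That helper lives in `../lib/Utils` and its body is not part of this diff. Below is a minimal, hypothetical sketch of how such a helper could work. It assumes llama-server prints generation timings as `eval time = <ms> ms / <n> tokens (...)` and prompt-processing timings as `prompt eval time = ...`. The name `sketchEvalStats` and the exact log format are assumptions, not the project's code.

```typescript
// Hypothetical sketch, NOT the real processEvalTimes from ../lib/Utils.
// ASSUMPTION: llama-server generation timings look like
//   "eval time = 2189.14 ms / 100 tokens ( 21.89 ms per token, 45.68 tokens per second)"
// and prompt-processing lines contain "prompt eval time" (skipped below).
export type EvalStats = [requests: number, totalTokens: number, avgTokensPerSecond: number];

// Captures the elapsed milliseconds and the generated token count.
const EVAL_LINE = /eval time\s*=\s*([\d.]+)\s*ms\s*\/\s*(\d+)\s*(?:tokens|runs)/;

export function sketchEvalStats(logs: string): EvalStats {
  let requests = 0;
  let totalTokens = 0;
  let totalMs = 0;

  for (const line of logs.split("\n")) {
    // Prompt processing is not token generation, so ignore those lines.
    if (line.includes("prompt eval time")) continue;
    const match = EVAL_LINE.exec(line);
    if (!match) continue;
    requests += 1;
    totalMs += parseFloat(match[1]);
    totalTokens += parseInt(match[2], 10);
  }

  // Time-weighted average: total tokens over total generation seconds,
  // rounded to two decimals. Returns 0 when no eval lines matched.
  const avg = totalMs > 0 ? totalTokens / (totalMs / 1000) : 0;
  return [requests, totalTokens, Math.round(avg * 100) / 100];
}
```

In a component it would be memoized the same way the diff does it, e.g. `useMemo(() => sketchEvalStats(upstreamLogs), [upstreamLogs])`, so the log is only re-scanned when it changes. Dividing total tokens by total time weights long generations more heavily than averaging each request's tokens/second would. Which of the two the real helper uses is not shown here.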
Author	SHA1	Message	Date
srevn	10606abf89	fix config hot-reload on macos (#180 ) Co-authored-by: srevn <srevn@github>	2025-06-26 09:20:50 -07:00
Benson Wong	fefd14903d	improve log display and add a small stats table in ui (#178 )	2025-06-25 12:27:49 -07:00
Benson Wong	717d64e336	update GUI image in README [skip ci]	2025-06-24 10:38:28 -07:00
Benson Wong	285191e655	Various UI improvements (#176 ) * add retry/backoff to reconnecting log streams * update favicons	2025-06-23 16:17:21 -07:00
Benson Wong	4236cec03a	Add Filters to Model Configuration (#174 ) llama-swap can strip specific keys in JSON requests. This is useful for removing the ability for clients to set sampling parameters like temperature, top_k, top_p, etc.	2025-06-23 10:52:29 -07:00
Alex O'Connell	756193d0dd	Load models in the UI without navigating the page (#173 ) * Load models in the UI without navigating the page * fix table layout for mobile	2025-06-19 14:39:07 -07:00
Benson Wong	a6b2e930d8	Update README.md [skip ci]	2025-06-18 11:47:08 -07:00
Benson Wong	9e02c22ff8	stopCmd should use same environment as p.cmd.Env (#171 , #172 )	2025-06-18 11:36:59 -07:00
Benson Wong	0bdbf2fdc1	fix more goreleaser deprecation warnings [skip ci]	2025-06-18 11:15:12 -07:00
Benson Wong	49035e2e8e	Append custom env vars instead of replace in Process (#171 ) Append custom env vars instead of replace in Process (#168, #169) PR #162 refactored the default configuration code. This introduced a subtle bug where `env` became `[]string{}` instead of the default of `nil`. In golang, `exec.Cmd.Env == nil` means to use the "current process's environment". By setting it to `[]string{}` as a default the Process's environment was emptied out which caused an array of strange and difficult to troubleshoot behaviour. See issues #168 and #169 This commit changes the behaviour to append model configured environment variables to the default list rather than replace them.	2025-06-18 11:09:13 -07:00
Benson Wong	9963ae18bf	fix? deprecation warning in .goreleaser.yaml [skip-ci]	2025-06-18 07:49:33 -07:00