UI: Allow editing of title (#246)

- make <h1> title contentEditable
- title setting persists across reloads in localStorage
improve example config [skip ci]
2025-08-17 09:42:06 -07:00 · 2025-08-17 09:19:04 -07:00 · 2025-08-15 21:44:08 -07:00 · 2025-08-15 15:38:12 -07:00 · 2025-08-14 10:27:28 -07:00 · 2025-08-14 10:02:16 -07:00
19 changed files with 510 additions and 67 deletions
@@ -2,7 +2,7 @@
 name: Bug Report
 about: I found a defect
 title: ''
-labels: bug
+labels: 'unconfirmed bug'
 assignees: ''
 ---
@@ -4,3 +4,4 @@ build/
 dist/
 .vscode
 .DS_Store
 .dev/
@@ -31,8 +31,9 @@ Written in golang, it is very easy to install (single binary with no dependencie
 - ✅ Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
 - ✅ Automatic unloading of models after timeout by setting a `ttl`
 - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
- ✅ Docker and Podman support
+- ✅ Reliable Docker and Podman support with `cmdStart` and `cmdStop`
 - ✅ Full control over server settings per model
 - ✅ Preload models on startup with `hooks` ([#235](https://github.com/mostlygeek/llama-swap/pull/235))
 ## How does llama-swap work?
@@ -42,9 +43,9 @@ In the most basic configuration llama-swap handles one model at a time. For more
 ## config.yaml
-llama-swap is managed entirely through a yaml configuration file. 
+llama-swap is managed entirely through a yaml configuration file.
-It can be very minimal to start: 
+It can be very minimal to start:
 ```yaml
 models:
@@ -55,7 +56,7 @@ models:
      --port ${PORT}
 ```
-However, there are many more capabilities that llama-swap supports: 
+However, there are many more capabilities that llama-swap supports:
 - `groups` to run multiple models at once
 - `ttl` to automatically unload models
@@ -71,9 +72,13 @@ See the [configuration documentation](https://github.com/mostlygeek/llama-swap/w
 ## Web UI
-llama-swap ships with a real time web interface to monitor logs and status of models:
+llama-swap includes a real-time web interface for monitoring logs and models:
-<img width="1786" height="1334" alt="image" src="https://github.com/user-attachments/assets/d6258cb9-1dad-40db-828f-2be860aec8fe" />
+<img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/adef4a8e-de0b-49db-885a-8f6dedae6799" />
 The Activity Page shows recent requests:
 <img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/5f3edee6-d03a-4ae5-ae06-b20ac1f135bd" />
 ## Installation
@@ -86,7 +91,7 @@ llama-swap can be installed in multiple ways
 ### Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
-Docker images with llama-swap and llama-server are built nightly. 
+Docker images with llama-swap and llama-server are built nightly.
 ```shell
 # use CPU inference comes with the example config above
@@ -133,10 +138,10 @@ $ docker run -it --rm --runtime nvidia -p 9292:8080 \
 ### Homebrew Install (macOS/Linux)
-The latest release of `llama-swap` can be installed via [Homebrew](https://brew.sh). 
+The latest release of `llama-swap` can be installed via [Homebrew](https://brew.sh).
 ```shell
-# Set up tap and install formula 
+# Set up tap and install formula
 brew tap mostlygeek/llama-swap
 brew install llama-swap
 # Run llama-swap
@@ -1,9 +1,17 @@
 # llama-swap YAML configuration example
 # -------------------------------------
 #
 # 💡 Tip - Use an LLM with this file!
 # ====================================
 #  This example configuration is written to be LLM friendly. Try
 #  copying this file into an LLM and asking it to explain or generate
 #  sections for you.
 # ====================================
 # Usage notes:
 # - Below are all the available configuration options for llama-swap.
-# - Settings with a default value, or noted as optional can be omitted.
+# - Settings noted as "required" must be in your configuration file
-# - Settings that are marked required must be in your configuration file
+# - Settings noted as "optional" can be omitted
 # healthCheckTimeout: number of seconds to wait for a model to be ready to serve requests
 # - optional, default: 120
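`healthCheckTimeout` bounds how long llama-swap waits for a model's `checkEndpoint` (described further down) to answer with HTTP 200. The loop below is a minimal sketch of that readiness wait, assuming a fixed poll interval; `waitReady` and `newFakeUpstream` are hypothetical helpers, not llama-swap's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitReady polls url until it returns HTTP 200 or timeout elapses.
// Hypothetical helper: the fixed poll interval is an assumption.
func waitReady(url string, timeout, interval time.Duration) error {
	client := &http.Client{Timeout: time.Second}
	deadline := time.Now().Add(timeout)
	for {
		resp, err := client.Get(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%s not ready after %v", url, timeout)
		}
		time.Sleep(interval)
	}
}

// newFakeUpstream answers 503 for the first `warmup` requests, then 200,
// imitating a server that is still loading its model.
func newFakeUpstream(warmup int32) *httptest.Server {
	var calls int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) <= warmup {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newFakeUpstream(3)
	defer srv.Close()
	// 120 seconds mirrors the documented healthCheckTimeout default.
	err := waitReady(srv.URL+"/health", 120*time.Second, 250*time.Millisecond)
	fmt.Println("ready:", err == nil)
}
```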
@@ -27,9 +35,9 @@ metricsMaxInMemory: 1000
 # - it is automatically incremented for every model that uses it
 startPort: 10001
-# macros: sets a dictionary of string:string pairs
+# macros: a dictionary of string substitutions
 # - optional, default: empty dictionary
-# - these are reusable snippets
+# - macros are reusable snippets
 # - used in a model's cmd, cmdStop, proxy and checkEndpoint
 # - useful for reducing common configuration settings
 macros:
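Macros are plain text substitutions. The function below is a minimal sketch of the idea, assuming each `${name}` placeholder is replaced verbatim with its value and unknown placeholders are left alone; `expandMacros` is hypothetical, and llama-swap's own handling of `${PORT}` and `${PID}` is not modeled.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// expandMacros replaces every ${name} in s with macros[name].
// Placeholders without a matching macro are left untouched.
func expandMacros(s string, macros map[string]string) string {
	// Sort names so the replacer is built in a deterministic order.
	names := make([]string, 0, len(macros))
	for name := range macros {
		names = append(names, name)
	}
	sort.Strings(names)

	pairs := make([]string, 0, 2*len(names))
	for _, name := range names {
		pairs = append(pairs, "${"+name+"}", macros[name])
	}
	return strings.NewReplacer(pairs...).Replace(s)
}

func main() {
	macros := map[string]string{
		"svr-path": "path/to/server",
		"PORT":     "10001",
	}
	fmt.Println(expandMacros("${svr-path} --port ${PORT} -m model.gguf", macros))
	// path/to/server --port 10001 -m model.gguf
}
```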
@@ -92,44 +100,46 @@ models:
    # checkEndpoint: URL path to check if the server is ready
    # - optional, default: /health
    # - use "none" to skip endpoint ready checking
    # - endpoint is expected to return an HTTP 200 response
-    # - all requests wait until the endpoint is ready (or fails)
+    # - all requests wait until the endpoint is ready or fails
    # - use "none" to skip endpoint health checking
    checkEndpoint: /custom-endpoint
-    # ttl: automatically unload the model after this many seconds
+    # ttl: automatically unload the model after ttl seconds
    # - optional, default: 0
    # - ttl values must be 0 or greater
    # - a value of 0 disables automatic unloading of the model
    ttl: 60
-    # useModelName: overrides the model name that is sent to upstream server
+    # useModelName: override the model name that is sent to upstream server
    # - optional, default: ""
-    # - useful when the upstream server expects a specific model name or format
+    # - useful when the upstream server expects a specific model name that
    #   is different from the model's ID
    useModelName: "qwen:qwq"
    # filters: a dictionary of filter settings
    # - optional, default: empty dictionary
    # - only strip_params is currently supported
    filters:
      # strip_params: a comma separated list of parameters to remove from the request
      # - optional, default: ""
-      # - useful for preventing overriding of default server params by requests
+      # - useful for server side enforcement of sampling parameters
-      # - `model` parameter is never removed
+      # - the `model` parameter can never be removed
      # - can be any JSON key in the request body
      # - recommended to stick to sampling parameters
      strip_params: "temperature, top_p, top_k"
  # Unlisted model example:
  "qwen-unlisted":
-    # unlisted: true or false
+    # unlisted: boolean, true or false
    # - optional, default: false
-    # - unlisted models do not show up in /v1/models or /upstream lists
+    # - unlisted models do not show up in /v1/models api requests
    # - can be requested as normal through all apis
    unlisted: true
    cmd: llama-server --port ${PORT} -m Llama-3.2-1B-Instruct-Q4_K_M.gguf -ngl 0
  # Docker example:
-  # container run times like Docker and Podman can also be used with a
+  # container runtimes like Docker and Podman can be used reliably with a
  # combination of cmd and cmdStop.
  "docker-llama":
    proxy: "http://127.0.0.1:${PORT}"
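`strip_params` removes the listed keys from the JSON request body before it is forwarded, but never `model`. A minimal sketch of that filter using `encoding/json`; this is an assumption about the approach (the proxy code in this diff imports `tidwall/sjson`, so the real implementation may edit the body differently), and `stripParams` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stripParams removes the comma separated keys in list from a JSON
// object body. The "model" key is always kept.
func stripParams(body []byte, list string) ([]byte, error) {
	var req map[string]any
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	for _, key := range strings.Split(list, ",") {
		key = strings.TrimSpace(key)
		if key == "" || key == "model" {
			continue
		}
		delete(req, key)
	}
	return json.Marshal(req)
}

func main() {
	body := []byte(`{"model":"qwen","temperature":1.5,"top_p":0.9,"max_tokens":100}`)
	out, err := stripParams(body, "temperature, top_p, top_k, model")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"max_tokens":100,"model":"qwen"}
}
```

Round-tripping through a map re-sorts the keys, which is harmless for JSON but means the forwarded body is not byte-identical to the original.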
@@ -142,24 +152,26 @@ models:
    # cmdStop: command to run to stop the model gracefully
    # - optional, default: ""
    # - useful for stopping commands managed by another system
    # - on POSIX systems: a SIGTERM is sent for graceful shutdown
    # - on Windows, taskkill is used
    # - processes are given 5 seconds to shut down before they are forcefully killed
    # - the upstream's process id is available in the ${PID} macro
    #
    # When empty, llama-swap has this default behaviour:
    # - on POSIX systems: a SIGTERM signal is sent
    # - on Windows, calls taskkill to stop the process
    # - processes have 5 seconds to shut down before forceful termination is attempted
    cmdStop: docker stop dockertest
 # groups: a dictionary of group settings
 # - optional, default: empty dictionary
-# - provide advanced controls over model swapping behaviour.
+# - provides advanced controls over model swapping behaviour
-# - Using groups some models can be kept loaded indefinitely, while others are swapped out.
+# - using groups, some models can be kept loaded indefinitely while others are swapped out
-# - model ids must be defined in the Models section
+# - model IDs must be defined in the Models section
 # - a model can only be a member of one group
 # - group behaviour is controlled via the `swap`, `exclusive` and `persistent` fields
 # - see issue #109 for details
 #
 # NOTE: the example below uses model names that are not defined above for demonstration purposes
 groups:
-  # group1 is same as the default behaviour of llama-swap where only one model is allowed
+  # group1 works the same as the default behaviour of llama-swap where only one model is allowed
  # to run at a time across the whole llama-swap instance
  "group1":
    # swap: controls the model swapping behaviour within the group
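When `cmdStop` is empty, the comments above describe the default on POSIX systems: send SIGTERM, wait up to 5 seconds, then kill the process. A minimal sketch of that sequence with `os/exec`, assuming a POSIX system with a `sleep` binary on the PATH (the Windows `taskkill` path is not shown); `stopGracefully` and `demoStop` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// stopGracefully sends SIGTERM, waits up to grace for the process to exit,
// and falls back to SIGKILL. It reports whether the kill was needed.
func stopGracefully(cmd *exec.Cmd, grace time.Duration) (forced bool, err error) {
	done := make(chan struct{})
	go func() {
		cmd.Wait() // exit status ignored: a SIGTERM exit is expected
		close(done)
	}()

	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil && !errors.Is(err, os.ErrProcessDone) {
		return false, err
	}
	select {
	case <-done:
		return false, nil
	case <-time.After(grace):
		if err := cmd.Process.Kill(); err != nil && !errors.Is(err, os.ErrProcessDone) {
			return true, err
		}
		<-done // wait until the killed process has been reaped
		return true, nil
	}
}

// demoStop starts a long sleep and stops it with the given grace period.
func demoStop(grace time.Duration) (bool, error) {
	cmd := exec.Command("sleep", "30")
	if err := cmd.Start(); err != nil {
		return false, err
	}
	return stopGracefully(cmd, grace)
}

func main() {
	forced, err := demoStop(5 * time.Second)
	fmt.Println("forced:", forced, "err:", err)
}
```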
@@ -181,10 +193,13 @@ groups:
      - "qwen-unlisted"
  # Example:
-  # - in this group all the models can run at the same time
+  # - in group2 all models can run at the same time
-  # - when a different group loads all running models in this group are unloaded
+  # - when a different group is loaded it causes all running models in this group to unload
  "group2":
    swap: false
    # exclusive: false does not unload other groups when a model in group2 is requested
    # - the models in group2 will be loaded but will not unload any other groups
    exclusive: false
    members:
      - "docker-llama"
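The `swap`, `exclusive` and `persistent` fields interact whenever a model is requested. The toy model below sketches those rules as the comments describe them, plus the usual reading of `persistent` (other groups never unload its members), which is an assumption here because that field's comment is outside this diff. It only tracks which model IDs would be running: it starts no processes, applies no defaults, and ignores models outside any group. `Router`, `Group` and the member lists (including `modelX`) are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Group mirrors the swap/exclusive/persistent fields of a group in config.yaml.
type Group struct {
	Swap       bool // true: only one member of the group runs at a time
	Exclusive  bool // true: requesting a member unloads other groups
	Persistent bool // true: other groups never unload this group's members
	Members    []string
}

// Router tracks which model IDs would be running. Hypothetical toy model.
type Router struct {
	groups  map[string]Group
	groupOf map[string]string // model ID -> group name
	running map[string]bool
}

func NewRouter(groups map[string]Group) *Router {
	r := &Router{groups: groups, groupOf: map[string]string{}, running: map[string]bool{}}
	for name, g := range groups {
		for _, m := range g.Members {
			r.groupOf[m] = name
		}
	}
	return r
}

// Request loads model and unloads whatever the group rules require.
func (r *Router) Request(model string) {
	name := r.groupOf[model]
	g := r.groups[name]
	if g.Exclusive {
		for other, og := range r.groups {
			if other == name || og.Persistent {
				continue
			}
			for _, m := range og.Members {
				delete(r.running, m)
			}
		}
	}
	if g.Swap {
		for _, m := range g.Members {
			if m != model {
				delete(r.running, m)
			}
		}
	}
	r.running[model] = true
}

// Running returns the loaded model IDs in sorted order.
func (r *Router) Running() []string {
	out := make([]string, 0, len(r.running))
	for m := range r.running {
		out = append(out, m)
	}
	sort.Strings(out)
	return out
}

func main() {
	r := NewRouter(map[string]Group{
		"group1":  {Swap: true, Exclusive: true, Members: []string{"llama", "qwen-unlisted"}},
		"group2":  {Swap: false, Exclusive: false, Members: []string{"docker-llama", "modelX"}},
		"forever": {Persistent: true, Members: []string{"forever-modelA"}},
	})
	r.Request("forever-modelA")
	r.Request("docker-llama")
	r.Request("modelX")
	fmt.Println(r.Running()) // [docker-llama forever-modelA modelX]
	r.Request("llama")
	fmt.Println(r.Running()) // [forever-modelA llama]
}
```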
@@ -207,3 +222,19 @@ groups:
      - "forever-modelA"
      - "forever-modelB"
      - "forever-modelc"
 # hooks: a dictionary of event triggers and actions
 # - optional, default: empty dictionary
 # - the only supported hook is on_startup
 hooks:
  # on_startup: a dictionary of actions to perform on startup
  # - optional, default: empty dictionary
  # - the only supported action is preload
  on_startup:
    # preload: a list of model ids to load on startup
    # - optional, default: empty list
    # - model names must match keys in the models section
    # - when preloading multiple models at once, define a group,
    #   otherwise models will be loaded and swapped out
    preload:
      - "llama"
@@ -0,0 +1,159 @@
 package main
 // created for issue: #252 https://github.com/mostlygeek/llama-swap/issues/252
 // this simple benchmark tool sends a lot of small chat completion requests to llama-swap
 // to make sure all the requests are accounted for.
 //
 // requests can be sent in parallel, and the tool will report the results.
 // usage: go run main.go -baseurl http://localhost:8080/v1 -model llama3 -requests 1000 -par 5
 import (
 	"bytes"
 	"flag"
 	"fmt"
 	"io"
 	"log"
 	"net/http"
 	"os"
 	"sync"
 	"time"
 )
 func main() {
 	// ----- CLI arguments ----------------------------------------------------
 	var (
 		baseurl         string
 		modelName       string
 		totalRequests   int
 		parallelization int
 	)
 	flag.StringVar(&baseurl, "baseurl", "http://localhost:8080/v1", "Base URL of the API (e.g., https://api.example.com)")
 	flag.StringVar(&modelName, "model", "", "Model name to use")
 	flag.IntVar(&totalRequests, "requests", 1, "Total number of requests to send")
 	flag.IntVar(&parallelization, "par", 1, "Maximum number of concurrent requests")
 	flag.Parse()
 	if baseurl == "" || modelName == "" {
 		fmt.Println("Error: both -baseurl and -model are required.")
 		flag.Usage()
 		os.Exit(1)
 	}
 	if totalRequests <= 0 {
 		fmt.Println("Error: -requests must be greater than 0.")
 		os.Exit(1)
 	}
 	if parallelization <= 0 {
		fmt.Println("Error: -par must be greater than 0.")
 		os.Exit(1)
 	}
 	// ----- HTTP client -------------------------------------------------------
 	client := &http.Client{
 		Timeout: 30 * time.Second,
 	}
 	// ----- Tracking response codes -------------------------------------------
 	statusCounts := make(map[int]int) // map[statusCode]count
 	var mu sync.Mutex                 // protects statusCounts
 	// ----- Request queue (buffered channel) ----------------------------------
 	requests := make(chan int, 10) // Buffered channel with capacity 10
 	// Goroutine to fill the request queue
 	go func() {
 		for i := 0; i < totalRequests; i++ {
 			requests <- i + 1
 		}
 		close(requests)
 	}()
 	// ----- Worker pool -------------------------------------------------------
 	var wg sync.WaitGroup
 	for i := 0; i < parallelization; i++ {
 		wg.Add(1)
 		go func(workerID int) {
 			defer wg.Done()
 			for reqID := range requests {
 				// Build request payload as a single line JSON string
 				payload := `{"model":"` + modelName + `","max_tokens":100,"stream":false,"messages":[{"role":"user","content":"write a snake game in python"}]}`
 				// Send POST request
 				req, err := http.NewRequest(http.MethodPost,
 					fmt.Sprintf("%s/chat/completions", baseurl),
 					bytes.NewReader([]byte(payload)))
 				if err != nil {
 					log.Printf("[worker %d][req %d] request creation error: %v", workerID, reqID, err)
 					mu.Lock()
 					statusCounts[-1]++
 					mu.Unlock()
 					continue
 				}
 				req.Header.Set("Content-Type", "application/json")
 				resp, err := client.Do(req)
 				if err != nil {
 					log.Printf("[worker %d][req %d] HTTP request error: %v", workerID, reqID, err)
 					mu.Lock()
 					statusCounts[-1]++
 					mu.Unlock()
 					continue
 				}
 				io.Copy(io.Discard, resp.Body)
 				resp.Body.Close()
 				// Record status code
 				mu.Lock()
 				statusCounts[resp.StatusCode]++
 				mu.Unlock()
 			}
 		}(i + 1)
 	}
 	// ----- Status ticker (prints every second) -------------------------------
 	done := make(chan struct{})
 	tickerDone := make(chan struct{})
 	go func() {
 		ticker := time.NewTicker(1 * time.Second)
 		startTime := time.Now()
 		for {
 			select {
 			case <-ticker.C:
 				mu.Lock()
 				// Compute how many requests have completed so far
 				completed := 0
 				for _, cnt := range statusCounts {
 					completed += cnt
 				}
 				// Calculate duration and progress
 				duration := time.Since(startTime)
 				progress := completed * 100 / totalRequests
 				fmt.Printf("Duration: %v, Completed: %d%% requests\n", duration, progress)
 				mu.Unlock()
 			case <-done:
 				duration := time.Since(startTime)
 				fmt.Printf("Duration: %v, Completed: %d%% requests\n", duration, 100)
 				close(tickerDone)
 				return
 			}
 		}
 	}()
 	// Wait for all workers to finish
 	wg.Wait()
 	close(done)  // stops the status-update goroutine
 	<-tickerDone // give ticker time to finish / print
 	// ----- Summary ------------------------------------------------------------
 	fmt.Println("\n\n=== HTTP response code summary ===")
 	mu.Lock()
 	for code, cnt := range statusCounts {
 		if code == -1 {
 			fmt.Printf("Client-side errors (no HTTP response): %d\n", cnt)
 		} else {
 			fmt.Printf("%d : %d\n", code, cnt)
 		}
 	}
 	mu.Unlock()
 }
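The point of the tool is that every request ends up in exactly one status bucket. A stripped-down sketch of the same worker-pool pattern, run against a local `httptest` server so the client-side totals can be compared with what the server actually received; it sends plain GET requests instead of chat completions and drops the progress ticker, and `runBenchmark` and `demo` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
)

// runBenchmark sends total GET requests to url from par workers and counts
// responses per HTTP status code; -1 counts requests that got no response.
func runBenchmark(url string, total, par int) map[int]int {
	counts := make(map[int]int)
	var mu sync.Mutex // protects counts

	jobs := make(chan int)
	go func() {
		for i := 0; i < total; i++ {
			jobs <- i
		}
		close(jobs)
	}()

	var wg sync.WaitGroup
	for w := 0; w < par; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				code := -1
				if resp, err := http.Get(url); err == nil {
					io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
					resp.Body.Close()
					code = resp.StatusCode
				}
				mu.Lock()
				counts[code]++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counts
}

// demo benchmarks a local test server and also returns how many
// requests the server itself handled.
func demo(total, par int) (map[int]int, int64) {
	var served int64
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&served, 1)
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()
	counts := runBenchmark(srv.URL, total, par)
	return counts, atomic.LoadInt64(&served)
}

func main() {
	counts, served := demo(100, 5)
	fmt.Println(counts, served)
}
```

On a healthy run both sides should agree: 100 responses with status 200 on the client and 100 requests seen by the server.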
@@ -138,6 +138,14 @@ func (c *GroupConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
 	return nil
 }
 type HooksConfig struct {
 	OnStartup HookOnStartup `yaml:"on_startup"`
 }
 type HookOnStartup struct {
 	Preload []string `yaml:"preload"`
 }
 type Config struct {
 	HealthCheckTimeout int                    `yaml:"healthCheckTimeout"`
 	LogRequests        bool                   `yaml:"logRequests"`
@@ -155,6 +163,9 @@ type Config struct {
 	// automatic port assignments
 	StartPort int `yaml:"startPort"`
 	// hooks, see: #209
 	Hooks HooksConfig `yaml:"hooks"`
 }
 func (c *Config) RealModelName(search string) (string, bool) {
@@ -330,6 +341,22 @@ func LoadConfigFromReader(r io.Reader) (Config, error) {
 		}
 	}
 	// clean up hooks preload
 	if len(config.Hooks.OnStartup.Preload) > 0 {
 		var toPreload []string
 		for _, modelID := range config.Hooks.OnStartup.Preload {
 			modelID = strings.TrimSpace(modelID)
 			if modelID == "" {
 				continue
 			}
 			if real, found := config.RealModelName(modelID); found {
 				toPreload = append(toPreload, real)
 			}
 		}
 		config.Hooks.OnStartup.Preload = toPreload
 	}
 	return config, nil
 }
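The clean-up step above normalizes the `preload` list when the config is loaded. A minimal sketch of the same filtering, assuming a plain alias map in place of `Config.RealModelName` (whose alias handling is not shown in this diff); `resolvePreload` and the alias `gpt-4o-mini` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// resolvePreload trims entries, skips blanks, maps aliases to model IDs and
// drops unknown names. models/aliases stand in for Config.RealModelName.
func resolvePreload(preload []string, models map[string]bool, aliases map[string]string) []string {
	var out []string
	for _, id := range preload {
		id = strings.TrimSpace(id)
		if id == "" {
			continue
		}
		if models[id] {
			out = append(out, id)
		} else if realID, ok := aliases[id]; ok {
			out = append(out, realID)
		}
	}
	return out
}

func main() {
	models := map[string]bool{"llama": true, "qwen": true}
	aliases := map[string]string{"gpt-4o-mini": "llama"}
	fmt.Println(resolvePreload([]string{" qwen ", "", "gpt-4o-mini", "typo"}, models, aliases))
	// [qwen llama]
}
```

As in the diff, an unknown name such as `typo` is dropped without an error or log line, so a misspelled preload entry is silently ignored.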
@@ -100,6 +100,9 @@ func TestConfig_LoadPosix(t *testing.T) {
 	content := `
 macros:
  svr-path: "path/to/server"
 hooks:
  on_startup:
    preload: ["model1", "model2"]
 models:
  model1:
    cmd: path/to/cmd --arg1 one
@@ -163,6 +166,11 @@ groups:
 		Macros: map[string]string{
 			"svr-path": "path/to/server",
 		},
 		Hooks: HooksConfig{
 			OnStartup: HookOnStartup{
 				Preload: []string{"model1", "model2"},
 			},
 		},
 		Models: map[string]ModelConfig{
 			"model1": {
 				Cmd:           "path/to/cmd --arg1 one",
@@ -0,0 +1,27 @@
 package proxy
 import "net/http"
 // DiscardWriter implements http.ResponseWriter but discards everything written to it
 type DiscardWriter struct {
 	header http.Header
 	status int
 }
 func (w *DiscardWriter) Header() http.Header {
 	if w.header == nil {
 		w.header = make(http.Header)
 	}
 	return w.header
 }
 func (w *DiscardWriter) Write(data []byte) (int, error) {
 	return len(data), nil
 }
 func (w *DiscardWriter) WriteHeader(code int) {
 	w.status = code
 }
 // Satisfy the http.Flusher interface for streaming responses
 func (w *DiscardWriter) Flush() {}
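The compile-time assertions below confirm that `DiscardWriter` satisfies both `http.ResponseWriter` and `http.Flusher`, and `serveDiscarded` shows the pattern the preload hook relies on: run a handler purely for its side effects and keep only the status code. The type is repeated so the example compiles on its own; `serveDiscarded` and `demoHandler` are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// DiscardWriter is repeated from the diff so this example compiles alone.
type DiscardWriter struct {
	header http.Header
	status int
}

func (w *DiscardWriter) Header() http.Header {
	if w.header == nil {
		w.header = make(http.Header)
	}
	return w.header
}

func (w *DiscardWriter) Write(data []byte) (int, error) {
	return len(data), nil
}

func (w *DiscardWriter) WriteHeader(code int) {
	w.status = code
}

func (w *DiscardWriter) Flush() {}

// Compile-time interface checks.
var (
	_ http.ResponseWriter = (*DiscardWriter)(nil)
	_ http.Flusher        = (*DiscardWriter)(nil)
)

// serveDiscarded runs h for its side effects and returns only the status it set.
func serveDiscarded(h http.Handler) int {
	w := &DiscardWriter{}
	req, err := http.NewRequest(http.MethodGet, "/", nil)
	if err != nil {
		return -1
	}
	h.ServeHTTP(w, req)
	return w.status
}

// demoHandler sets a header and a status, writes a body, then flushes.
func demoHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		w.WriteHeader(http.StatusAccepted)
		fmt.Fprint(w, "this body is thrown away")
		if f, ok := w.(http.Flusher); ok {
			f.Flush()
		}
	})
}

func main() {
	fmt.Println(serveDiscarded(demoHandler())) // 202
}
```

Unlike a real `http.ResponseWriter`, this writer leaves `status` at 0 when a handler calls `Write` without `WriteHeader`, so 0 means "not set" rather than an implicit 200.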
@@ -7,6 +7,7 @@ const ChatCompletionStatsEventID = 0x02
 const ConfigFileChangedEventID = 0x03
 const LogDataEventID = 0x04
 const TokenMetricsEventID = 0x05
 const ModelPreloadedEventID = 0x06
 type ProcessStateChangeEvent struct {
 	ProcessName string
@@ -48,3 +49,12 @@ type LogDataEvent struct {
 func (e LogDataEvent) Type() uint32 {
 	return LogDataEventID
 }
 type ModelPreloadedEvent struct {
 	ModelName string
 	Success   bool
 }
 func (e ModelPreloadedEvent) Type() uint32 {
 	return ModelPreloadedEventID
 }
@@ -13,9 +13,10 @@ import (
 )
 var (
-	nextTestPort int = 12000
+	nextTestPort        int = 12000
-	portMutex    sync.Mutex
+	portMutex           sync.Mutex
-	testLogger   = NewLogMonitorWriter(os.Stdout)
+	testLogger          = NewLogMonitorWriter(os.Stdout)
 	simpleResponderPath = getSimpleResponderPath()
 )
 // Check if the binary exists
@@ -69,13 +70,11 @@ func getTestSimpleResponderConfig(expectedMessage string) ModelConfig {
 }
 func getTestSimpleResponderConfigPort(expectedMessage string, port int) ModelConfig {
-	binaryPath := getSimpleResponderPath()
 	// Create a YAML string with just the values we want to set
 	yamlStr := fmt.Sprintf(`
 cmd: '%s --port %d --silent --respond %s'
 proxy: "http://127.0.0.1:%d"
-`, binaryPath, port, expectedMessage, port)
+`, simpleResponderPath, port, expectedMessage, port)
 	var cfg ModelConfig
 	if err := yaml.Unmarshal([]byte(yamlStr), &cfg); err != nil {
@@ -79,10 +79,12 @@ func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
 	outputTokens := int(jsonData.Get("usage.completion_tokens").Int())
 	inputTokens := int(jsonData.Get("usage.prompt_tokens").Int())
 	tokensPerSecond := -1.0
 	promptPerSecond := -1.0
 	durationMs := int(time.Since(rec.startTime).Milliseconds())
 	// use llama-server's timing data for tok/sec and duration as it is more accurate
 	if timings := jsonData.Get("timings"); timings.Exists() {
 		promptPerSecond = jsonData.Get("timings.prompt_per_second").Float()
 		tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float()
 		durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float())
 	}
@@ -92,6 +94,7 @@ func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
 		Model:           rec.realModelName,
 		InputTokens:     inputTokens,
 		OutputTokens:    outputTokens,
 		PromptPerSecond: promptPerSecond,
 		TokensPerSecond: tokensPerSecond,
 		DurationMs:      durationMs,
 	})
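The metrics change prefers llama-server's own `timings` over wall-clock time when they are present. A minimal sketch of that fallback using `encoding/json` instead of gjson; the struct shapes and `computeMetrics` are assumptions for illustration, and an explicit `"timings": null` is treated as absent here, which may differ from the gjson `Exists()` check in the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// upstreamResponse holds only the fields read here. "timings" is sent by
// llama-server and is absent from most other OpenAI compatible servers.
type upstreamResponse struct {
	Usage struct {
		PromptTokens     int `json:"prompt_tokens"`
		CompletionTokens int `json:"completion_tokens"`
	} `json:"usage"`
	Timings *struct {
		PromptMS           float64 `json:"prompt_ms"`
		PromptPerSecond    float64 `json:"prompt_per_second"`
		PredictedMS        float64 `json:"predicted_ms"`
		PredictedPerSecond float64 `json:"predicted_per_second"`
	} `json:"timings"`
}

type tokenMetrics struct {
	InputTokens     int
	OutputTokens    int
	PromptPerSecond float64 // -1 when unknown
	TokensPerSecond float64 // -1 when unknown
	DurationMs      int
}

// computeMetrics uses wall-clock elapsed time unless timings are present.
func computeMetrics(body []byte, elapsed time.Duration) (tokenMetrics, error) {
	var r upstreamResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return tokenMetrics{}, err
	}
	m := tokenMetrics{
		InputTokens:     r.Usage.PromptTokens,
		OutputTokens:    r.Usage.CompletionTokens,
		PromptPerSecond: -1,
		TokensPerSecond: -1,
		DurationMs:      int(elapsed.Milliseconds()),
	}
	if t := r.Timings; t != nil {
		m.PromptPerSecond = t.PromptPerSecond
		m.TokensPerSecond = t.PredictedPerSecond
		m.DurationMs = int(t.PromptMS + t.PredictedMS)
	}
	return m, nil
}

func main() {
	body := []byte(`{"usage":{"prompt_tokens":12,"completion_tokens":100},
		"timings":{"prompt_ms":40.5,"prompt_per_second":296.3,"predicted_ms":2000.2,"predicted_per_second":50}}`)
	m, err := computeMetrics(body, 2500*time.Millisecond)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```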
@@ -15,6 +15,7 @@ type TokenMetrics struct {
 	Model           string    `json:"model"`
 	InputTokens     int       `json:"input_tokens"`
 	OutputTokens    int       `json:"output_tokens"`
 	PromptPerSecond float64   `json:"prompt_per_second"`
 	TokensPerSecond float64   `json:"tokens_per_second"`
 	DurationMs      int       `json:"duration_ms"`
 }
@@ -15,6 +15,7 @@ import (
 	"time"
 	"github.com/gin-gonic/gin"
 	"github.com/mostlygeek/llama-swap/event"
 	"github.com/tidwall/gjson"
 	"github.com/tidwall/sjson"
 )
@@ -96,6 +97,35 @@ func New(config Config) *ProxyManager {
 	}
 	pm.setupGinEngine()
 	// run any startup hooks
 	if len(config.Hooks.OnStartup.Preload) > 0 {
 		// do it in the background, don't block startup -- not sure if good idea yet
 		go func() {
 			discardWriter := &DiscardWriter{}
 			for _, realModelName := range config.Hooks.OnStartup.Preload {
 				proxyLogger.Infof("Preloading model: %s", realModelName)
 				processGroup, _, err := pm.swapProcessGroup(realModelName)
 				if err != nil {
 					event.Emit(ModelPreloadedEvent{
 						ModelName: realModelName,
 						Success:   false,
 					})
 					proxyLogger.Errorf("Failed to preload model %s: %v", realModelName, err)
 					continue
 				} else {
 					req, _ := http.NewRequest("GET", "/", nil)
 					processGroup.ProxyRequest(realModelName, discardWriter, req)
 					event.Emit(ModelPreloadedEvent{
 						ModelName: realModelName,
 						Success:   true,
 					})
 				}
 			}
 		}()
 	}
 	return pm
 }
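The startup hook runs preloading in a goroutine so `New` returns immediately, and reports each model through a `ModelPreloadedEvent`. The sketch below reproduces that shape with a channel in place of the event bus, plus a waiter with a timeout similar to the one in `TestProxyManager_StartupHooks`; `demoLoad` stands in for `swapProcessGroup` plus the warm-up request, and all names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// PreloadEvent mirrors ModelPreloadedEvent from the diff.
type PreloadEvent struct {
	ModelName string
	Success   bool
}

// preloadInBackground loads models one by one in a goroutine and reports one
// event per model. load stands in for swapProcessGroup plus a warm-up request.
func preloadInBackground(models []string, load func(string) error) <-chan PreloadEvent {
	events := make(chan PreloadEvent, len(models)) // buffered: never blocks the loader
	go func() {
		defer close(events)
		for _, name := range models {
			err := load(name)
			events <- PreloadEvent{ModelName: name, Success: err == nil}
		}
	}()
	return events
}

// waitForPreload collects events until the channel closes or timeout fires.
func waitForPreload(events <-chan PreloadEvent, timeout time.Duration) ([]PreloadEvent, error) {
	var got []PreloadEvent
	deadline := time.After(timeout)
	for {
		select {
		case e, ok := <-events:
			if !ok {
				return got, nil
			}
			got = append(got, e)
		case <-deadline:
			return got, errors.New("timed out waiting for models to preload")
		}
	}
}

// demoLoad pretends to start a server; "missing" always fails.
func demoLoad(name string) error {
	time.Sleep(10 * time.Millisecond)
	if name == "missing" {
		return errors.New("model not found")
	}
	return nil
}

func main() {
	events := preloadInBackground([]string{"model1", "missing", "model2"}, demoLoad)
	got, err := waitForPreload(events, 5*time.Second)
	fmt.Println(got, err)
}
```

As in the diff, a failed model does not stop the loop: each model gets its own event and preloading continues with the next entry.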
@@ -132,7 +132,7 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
 		}
 	}
-	sendMetrics := func(metrics TokenMetrics) {
+	sendMetrics := func(metrics []TokenMetrics) {
 		jsonData, err := json.Marshal(metrics)
 		if err == nil {
 			select {
@@ -168,16 +168,14 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
 	 * Send Metrics data
 	 */
 	defer event.On(func(e TokenMetricsEvent) {
-		sendMetrics(e.Metrics)
+		sendMetrics([]TokenMetrics{e.Metrics})
 	})()
 	// send initial batch of data
 	sendLogData("proxy", pm.proxyLogger.GetHistory())
 	sendLogData("upstream", pm.upstreamLogger.GetHistory())
 	sendModels()
-	for _, metrics := range pm.metricsMonitor.GetMetrics() {
-		sendMetrics(metrics)
-	}
+	sendMetrics(pm.metricsMonitor.GetMetrics())
 	for {
 		select {
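After this change, the initial history and every live `TokenMetricsEvent` are sent in the same shape: a JSON array. A minimal sketch of that encoding with a trimmed struct whose `id` field is assumed from the UI's `metric.id`; `encodeMetrics` is hypothetical. It maps a nil slice to `[]` because `encoding/json` marshals a nil slice as `null`, which the UI's `[...newMetrics, ...prevMetrics]` spread would not handle if one ever reached it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TokenMetrics is a trimmed copy; the id field is assumed from the UI code.
type TokenMetrics struct {
	ID           int    `json:"id"`
	Model        string `json:"model"`
	OutputTokens int    `json:"output_tokens"`
}

// encodeMetrics always produces a JSON array, even for an empty batch.
func encodeMetrics(batch []TokenMetrics) (string, error) {
	if batch == nil {
		batch = []TokenMetrics{} // a nil slice would marshal as null
	}
	b, err := json.Marshal(batch)
	return string(b), err
}

func main() {
	history := []TokenMetrics{{ID: 0, Model: "llama", OutputTokens: 100}, {ID: 1, Model: "qwen", OutputTokens: 42}}
	initial, _ := encodeMetrics(history)
	live, _ := encodeMetrics([]TokenMetrics{{ID: 2, Model: "llama", OutputTokens: 7}})
	fmt.Println(initial) // [{"id":0,"model":"llama","output_tokens":100},{"id":1,"model":"qwen","output_tokens":42}]
	fmt.Println(live)    // [{"id":2,"model":"llama","output_tokens":7}]
}
```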
@@ -14,6 +14,7 @@ import (
 	"testing"
 	"time"
 	"github.com/mostlygeek/llama-swap/event"
 	"github.com/stretchr/testify/assert"
 	"github.com/tidwall/gjson"
 )
@@ -832,3 +833,62 @@ func TestProxyManager_HealthEndpoint(t *testing.T) {
 	assert.Equal(t, http.StatusOK, rec.Code)
 	assert.Equal(t, "OK", rec.Body.String())
 }
 func TestProxyManager_StartupHooks(t *testing.T) {
	// Use real YAML here: the configuration has gotten more complex and
	// LoadConfigFromReader() now does a lot more than parse YAML.
	// Eventually migrate all tests to this approach.
 	configStr := strings.Replace(`
 logLevel: error
 hooks:
  on_startup:
    preload:
      - model1
      - model2
 groups:
  preloadTestGroup:
    swap: false
    members:
       - model1
       - model2
 models:
  model1:
    cmd: ${simpleresponderpath} --port ${PORT} --silent --respond model1
  model2:
      cmd: ${simpleresponderpath} --port ${PORT} --silent --respond model2
 `, "${simpleresponderpath}", simpleResponderPath, -1)
 	// Create a test model configuration
 	config, err := LoadConfigFromReader(strings.NewReader(configStr))
 	if !assert.NoError(t, err, "Invalid configuration") {
 		return
 	}
 	preloadChan := make(chan ModelPreloadedEvent, 2) // buffer for 2 expected events
 	unsub := event.On(func(e ModelPreloadedEvent) {
 		preloadChan <- e
 	})
 	defer unsub()
 	// Create the proxy which should trigger preloading
 	proxy := New(config)
 	defer proxy.StopProcesses(StopWaitForInflightRequest)
 	for i := 0; i < 2; i++ {
 		select {
 		case <-preloadChan:
 		case <-time.After(5 * time.Second):
 			t.Fatal("timed out waiting for models to preload")
 		}
 	}
 	// make sure they are both loaded
 	_, foundGroup := proxy.processGroups["preloadTestGroup"]
 	if !assert.True(t, foundGroup, "preloadTestGroup should exist") {
 		return
 	}
 	assert.Equal(t, StateReady, proxy.processGroups["preloadTestGroup"].processes["model1"].CurrentState())
 	assert.Equal(t, StateReady, proxy.processGroups["preloadTestGroup"].processes["model2"].CurrentState())
 }
@@ -1,13 +1,29 @@
 import { useEffect, useCallback } from "react";
 import { BrowserRouter as Router, Routes, Route, Navigate, NavLink } from "react-router-dom";
 import { useTheme } from "./contexts/ThemeProvider";
 import { APIProvider } from "./contexts/APIProvider";
 import LogViewerPage from "./pages/LogViewer";
 import ModelPage from "./pages/Models";
 import ActivityPage from "./pages/Activity";
 import ConnectionStatus from "./components/ConnectionStatus";
 import { RiSunFill, RiMoonFill } from "react-icons/ri";
 import { usePersistentState } from "./hooks/usePersistentState";
 function App() {
  const { isNarrow, toggleTheme, isDarkMode } = useTheme();
  const [appTitle, setAppTitle] = usePersistentState("app-title", "llama-swap");
  const handleTitleChange = useCallback(
    (newTitle: string) => {
      setAppTitle(newTitle);
      document.title = newTitle;
    },
    [setAppTitle]
  );
  useEffect(() => {
    document.title = appTitle; // keep the browser tab title in sync with the stored title
  }, [appTitle]);
  return (
    <Router basename="/ui/">
@@ -15,7 +31,28 @@ function App() {
        <div className="flex flex-col h-screen">
          <nav className="bg-surface border-b border-border p-2 h-[75px]">
            <div className="flex items-center justify-between mx-auto px-4 h-full">
-              {!isNarrow && <h1 className="flex items-center p-0">llama-swap</h1>}
+              {!isNarrow && (
                <h1
                  contentEditable
                  suppressContentEditableWarning
                  className="flex items-center p-0 outline-none hover:bg-gray-100 dark:hover:bg-gray-700 rounded px-1"
                  onBlur={(e) =>
                    handleTitleChange(e.currentTarget.textContent?.replace(/\n/g, "").trim().substring(0, 25) || "llama-swap")
                  }
                  onKeyDown={(e) => {
                    if (e.key === "Enter") {
                      e.preventDefault();
                      const sanitizedText =
                        e.currentTarget.textContent?.replace(/\n/g, "").trim().substring(0, 25) || "llama-swap";
                      handleTitleChange(sanitizedText);
                      e.currentTarget.textContent = sanitizedText;
                      e.currentTarget.blur();
                    }
                  }}
                >
                  {appTitle}
                </h1>
              )}
              <div className="flex items-center space-x-4">
                <NavLink to="/" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
                  Logs
@@ -31,6 +68,7 @@ function App() {
                <button className="" onClick={toggleTheme}>
                  {isDarkMode ? <RiMoonFill /> : <RiSunFill />}
                </button>
                <ConnectionStatus />
              </div>
            </div>
          </nav>
@@ -0,0 +1,36 @@
 import { useAPI } from "../contexts/APIProvider";
 import { useEffect, useState, useMemo } from "react";
 type ConnectionStatus = "disconnected" | "connecting" | "connected";
 const ConnectionStatus = () => {
  const { getConnectionStatus } = useAPI();
  const [eventStreamStatus, setEventStreamStatus] = useState<ConnectionStatus>("disconnected");
  useEffect(() => {
    const interval = setInterval(() => {
      setEventStreamStatus(getConnectionStatus());
    }, 1000);
    return () => clearInterval(interval);
  }, [getConnectionStatus]);
  const eventStatusColor = useMemo(() => {
    switch (eventStreamStatus) {
      case "connected":
        return "bg-green-500";
      case "connecting":
        return "bg-yellow-500";
      case "disconnected":
      default:
        return "bg-red-500";
    }
  }, [eventStreamStatus]);
  return (
    <div className="flex items-center" title={`event stream: ${eventStreamStatus}`}>
      <span className={`inline-block w-3 h-3 rounded-full ${eventStatusColor} mr-2`}></span>
    </div>
  );
 };
 export default ConnectionStatus;
@@ -20,6 +20,7 @@ interface APIProviderType {
  proxyLogs: string;
  upstreamLogs: string;
  metrics: Metrics[];
  getConnectionStatus: () => "connected" | "connecting" | "disconnected";
 }
 interface Metrics {
@@ -28,6 +29,7 @@ interface Metrics {
  model: string;
  input_tokens: number;
  output_tokens: number;
  prompt_per_second: number;
  tokens_per_second: number;
  duration_ms: number;
 }
@@ -62,6 +64,16 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
    });
  }, []);
  const getConnectionStatus = useCallback(() => {
    if (apiEventSource.current?.readyState === EventSource.OPEN) {
      return "connected";
    } else if (apiEventSource.current?.readyState === EventSource.CONNECTING) {
      return "connecting";
    } else {
      return "disconnected";
    }
  }, []);
  const enableAPIEvents = useCallback((enabled: boolean) => {
    if (!enabled) {
      apiEventSource.current?.close();
@@ -76,6 +88,14 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
    const connect = () => {
      const eventSource = new EventSource("/api/events");
      eventSource.onopen = () => {
        // clear everything out on connect to keep things in sync
        setProxyLogs("");
        setUpstreamLogs("");
        setMetrics([]); // clear metrics on reconnect
        setModels([]); // clear models on reconnect
      };
      eventSource.onmessage = (e: MessageEvent) => {
        try {
          const message = JSON.parse(e.data) as APIEventEnvelope;
@@ -107,9 +127,9 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
            case "metrics":
              {
-                const newMetric = JSON.parse(message.data) as Metrics;
+                const newMetrics = JSON.parse(message.data) as Metrics[];
                setMetrics((prevMetrics) => {
-                  return [newMetric, ...prevMetrics];
+                  return [...newMetrics, ...prevMetrics];
                });
              }
              break;
@@ -193,6 +213,7 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
      proxyLogs,
      upstreamLogs,
      metrics,
      getConnectionStatus,
    }),
    [models, listModels, unloadAllModels, loadModel, enableAPIEvents, proxyLogs, upstreamLogs, metrics, getConnectionStatus]
  );
@@ -1,4 +1,4 @@
-import { useState, useEffect } from "react";
+import { useMemo } from "react";
 import { useAPI } from "../contexts/APIProvider";
 const formatTimestamp = (timestamp: string): string => {
@@ -15,25 +15,10 @@ const formatDuration = (ms: number): string => {
 const ActivityPage = () => {
  const { metrics } = useAPI();
-  const [error, setError] = useState<string | null>(null);
-
-  useEffect(() => {
-    if (metrics.length > 0) {
-      setError(null);
-    }
+  const sortedMetrics = useMemo(() => {
+    return [...metrics].sort((a, b) => b.id - a.id);
  }, [metrics]);
-  if (error) {
-    return (
-      <div className="p-6">
-        <h1 className="text-2xl font-bold mb-4">Activity</h1>
-        <div className="bg-red-50 border border-red-200 rounded-md p-4">
-          <p className="text-red-800">{error}</p>
-        </div>
-      </div>
-    );
-  }
  return (
    <div className="p-6">
      <h1 className="text-2xl font-bold mb-4">Activity</h1>
@@ -47,21 +32,25 @@ const ActivityPage = () => {
          <table className="min-w-full divide-y">
            <thead>
              <tr>
                <th className="px-4 py-3 text-left text-xs font-medium uppercase tracking-wider">Id</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Timestamp</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Model</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Input Tokens</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Output Tokens</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Prompt Processing</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Generation Speed</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Duration</th>
              </tr>
            </thead>
            <tbody className="divide-y">
-              {metrics.map((metric, index) => (
+              {sortedMetrics.map((metric) => (
-                <tr key={`${metric.id}-${index}`}>
+                <tr key={`metric_${metric.id}`}>
                  <td className="px-4 py-4 whitespace-nowrap text-sm">{metric.id + 1 /* un-zero index */}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatTimestamp(metric.timestamp)}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.model}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.input_tokens.toLocaleString()}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.output_tokens.toLocaleString()}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.prompt_per_second)}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.tokens_per_second)}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatDuration(metric.duration_ms)}</td>
                </tr>
Author	SHA1	Message	Date
Benson Wong	fcc5ad135a	UI: Allow editing of title (#246) - make <h1> title contentEditable - title setting persists across reloads in localStorage	2025-08-17 09:42:06 -07:00
Benson Wong	305e5a0031	improve example config [skip ci]	2025-08-17 09:19:04 -07:00
Benson Wong	04fc67354a	Improve Activity event handling in the UI (#254) Improve Activity event handling in the UI - fixes #252 found that the Activity page showed activity inconsistent with /api/metrics - Change data structure for event metrics to array. - Add Event stream connections status indicator	2025-08-15 21:44:08 -07:00
Benson Wong	4662cf7699	add 'unconfirmed bug' as default label in bug-report.md	2025-08-15 15:38:12 -07:00
Benson Wong	5dc6b3e6d9	Add barebones but working implementation of model preload (#209, #235) Add barebones but working implementation of model preload * add config test for Preload hook * improve TestProxyManager_StartupHooks * docs for new hook configuration * add a .dev to .gitignore	2025-08-14 10:27:28 -07:00
Benson Wong	74c69f39ef	Add prompt processing metrics (#250) - capture prompt processing metrics - display prompt processing metrics on UI Activity page	2025-08-14 10:02:16 -07:00
Benson Wong	a186318892	Update Readme, Add screenshot for Activities page [skip ci]	2025-08-08 13:39:46 -07:00
Benson Wong	c4e4d5e1e9	Update Readme UI Screenshot [skip ci]	2025-08-08 13:33:47 -07:00