add tokens processed to ui models page

Update bug-report.md [skip ci]
Add gofmt linting to ci
2025-08-08 13:28:39 -07:00 · 2025-08-08 09:52:05 -07:00 · 2025-08-07 20:29:18 -07:00 · 2025-08-07 20:16:56 -07:00 · 2025-08-07 11:07:03 -07:00 · 2025-08-06 14:02:22 -07:00
27 changed files with 1069 additions and 233 deletions
@@ -1,11 +1,13 @@
 ---
 name: Bug Report
-about: Something is not working as expected...
+about: I found a defect
 title: ''
 labels: bug
 assignees: ''
 ---
 > [!IMPORTANT]
 > If you have questions about llama-swap please post in the Q&A in Discussions. Use bug reports when you've found a defect and wish to discuss a fix.
 **Describe the bug**
 A clear and concise description of what the bug is.
@@ -22,6 +22,13 @@ jobs:
      with:
        go-version: '1.23'
    # Only run in this linux based runner
    - name: Check Formatting
      run: |
        if [ "$(gofmt -l . | grep -v 'event/.*_test.go' | wc -l)" -gt 0 ]; then
          gofmt -l . | grep -v 'event/.*_test.go'
          exit 1
        fi
    # cache simple-responder to save the build time
    - name: Restore Simple Responder
      id: restore-simple-responder
@@ -7,6 +7,10 @@ on:
  # Allows manual triggering of the workflow
  workflow_dispatch:
    inputs:
      tag:
        description: 'Tag version to release (e.g. v144)'
        required: true
 permissions:
  contents: write
@@ -20,15 +24,15 @@ jobs:
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          ref: ${{ github.event.inputs.tag || github.ref }}
      -
        name: Set up Go
        uses: actions/setup-go@v5
      -
        name: Set up Node.js
        uses: actions/setup-node@v4
        with:
-          node-version: '23'  # or your preferred version
+          node-version: '23'
      -
        name: Install dependencies and build UI
        run: |
@@ -46,4 +50,30 @@ jobs:
          version: '~> v2'
          args: release --clean
        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  trigger-tap-update:
    runs-on: ubuntu-latest
    needs: goreleaser
    steps:
      - name: "Resolve tag to dispatch"
        id: tag
        run: |
          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
            echo "tag=${{ github.event.inputs.tag }}" >> "$GITHUB_OUTPUT"
          else
            echo "tag=${{ github.ref_name }}" >> "$GITHUB_OUTPUT"
          fi
      - name: "Trigger tap repository update"
        uses: peter-evans/repository-dispatch@v2
        with:
          token: ${{ secrets.TAP_REPO_PAT }}
          repository: mostlygeek/homebrew-llama-swap
          event-type: new-release
          client-payload: |
            {
              "release": {
                "tag_name": "${{ steps.tag.outputs.tag }}"
              }
            }
@@ -45,6 +45,7 @@ mac: ui
 linux: ui
 	@echo "Building Linux binary..."
 	GOOS=linux GOARCH=amd64 go build -ldflags="-X main.commit=${GIT_HASH} -X main.version=local_${GIT_HASH} -X main.date=${BUILD_DATE}" -o $(BUILD_DIR)/$(APP_NAME)-linux-amd64
 	GOOS=linux GOARCH=arm64 go build -ldflags="-X main.commit=${GIT_HASH} -X main.version=local_${GIT_HASH} -X main.date=${BUILD_DATE}" -o $(BUILD_DIR)/$(APP_NAME)-linux-arm64
 # Build Windows binary
 windows: ui
@@ -18,7 +18,7 @@ Written in golang, it is very easy to install (single binary with no dependencie
  - `v1/completions`
  - `v1/chat/completions`
  - `v1/embeddings`
-  - `v1/rerank`
+  - `v1/rerank`, `v1/reranking`, `rerank`
  - `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
  - `v1/audio/transcriptions` ([docs](https://github.com/mostlygeek/llama-swap/issues/41#issuecomment-2722637867))
 - ✅ llama-swap custom API endpoints
@@ -27,6 +27,7 @@ Written in golang, it is very easy to install (single binary with no dependencie
  - `/upstream/:model_id` - direct access to upstream HTTP server ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
  - `/unload` - manually unload running models ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
  - `/running` - list currently running models ([#61](https://github.com/mostlygeek/llama-swap/issues/61))
  - `/health` - just returns "OK"
 - ✅ Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
 - ✅ Automatic unloading of models after timeout by setting a `ttl`
 - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
@@ -35,7 +36,7 @@ Written in golang, it is very easy to install (single binary with no dependencie
 ## How does llama-swap work?
-When a request is made to an OpenAI compatible endpoint, lama-swap will extract the `model` value and load the appropriate server configuration to serve it. If the wrong upstream server is running, it will be replaced with the correct one. This is where the "swap" part comes in. The upstream server is automatically swapped to the correct one to serve the request.
+When a request is made to an OpenAI compatible endpoint, llama-swap will extract the `model` value and load the appropriate server configuration to serve it. If the wrong upstream server is running, it will be replaced with the correct one. This is where the "swap" part comes in. The upstream server is automatically swapped to the correct one to serve the request.
 In the most basic configuration llama-swap handles one model at a time. For more advanced use cases, the `groups` feature allows multiple models to be loaded at the same time. You have complete control over how your system resources are used.
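To illustrate the routing step described above, here is a minimal sketch of pulling the `model` value out of an OpenAI-style request body. It uses only the standard library and is not llama-swap's actual implementation (the proxy code in this change uses `gjson`); `extractModel` is a hypothetical helper name chosen for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// extractModel is a hypothetical helper: it decodes only the "model"
// field of a JSON request body and rejects bodies where it is absent.
func extractModel(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	if req.Model == "" {
		return "", errors.New("missing or invalid 'model' key")
	}
	return req.Model, nil
}

func main() {
	model, err := extractModel([]byte(`{"model":"qwen2.5","messages":[]}`))
	fmt.Println(model, err) // prints: qwen2.5 <nil>
}
```

A proxy built this way would then look up the server configuration keyed by the returned name before forwarding the request.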
@@ -70,13 +71,22 @@ See the [configuration documentation](https://github.com/mostlygeek/llama-swap/w
 ## Web UI
-llama-swap ships with a web based interface to make it easier to monitor logs and check the status of models. 
+llama-swap ships with a real time web interface to monitor logs and status of models:
-<img width="1758" alt="image" src="https://github.com/user-attachments/assets/31ae5bcd-5efd-46b0-b64b-6db9e60196d3" />
+<img width="1786" height="1334" alt="image" src="https://github.com/user-attachments/assets/d6258cb9-1dad-40db-828f-2be860aec8fe" />
-## Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
+## Installation
-Docker is the quickest way to try out llama-swap:
+llama-swap can be installed in multiple ways:
 1. Docker
 2. Homebrew (macOS and Linux)
 3. From release binaries
 4. From source
 ### Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
 Docker images with llama-swap and llama-server are built nightly. 
 ```shell
 # use CPU inference; comes with the example config above
@@ -98,7 +108,7 @@ $ curl -s http://localhost:9292/v1/chat/completions \
 ```
 <details>
-<summary>Docker images are built nightly for cuda, intel, vulcan, etc ...</summary>
+<summary>Docker images are built nightly with llama-server for cuda, intel, vulkan and musa.</summary>
 They include:
@@ -121,9 +131,23 @@ $ docker run -it --rm --runtime nvidia -p 9292:8080 \
 </details>
-## Bare metal Install ([download](https://github.com/mostlygeek/llama-swap/releases))
+### Homebrew Install (macOS/Linux)
-Pre-built binaries are available for Linux, Mac, Windows and FreeBSD. These are automatically published and are likely a few hours ahead of the docker releases. The baremetal install works with any OpenAI compatible server, not just llama-server.
+The latest release of `llama-swap` can be installed via [Homebrew](https://brew.sh). 
 ```shell
 # Set up tap and install formula 
 brew tap mostlygeek/llama-swap
 brew install llama-swap
 # Run llama-swap
 llama-swap --config path/to/config.yaml --listen localhost:8080
 ```
 This will install the `llama-swap` binary and make it available in your path. See the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration).
 ### Pre-built Binaries ([download](https://github.com/mostlygeek/llama-swap/releases))
 Binaries are available for Linux, Mac, Windows and FreeBSD. These are automatically published and are likely a few hours ahead of the docker releases. The binary install works with any OpenAI compatible server, not just llama-server.
 1. Download a [release](https://github.com/mostlygeek/llama-swap/releases) appropriate for your OS and architecture.
 1. Create a configuration file, see the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration).
@@ -137,7 +161,7 @@ Pre-built binaries are available for Linux, Mac, Windows and FreeBSD. These are
 ### Building from source
 1. Build requires golang and nodejs for the user interface.
-1. `git clone git@github.com:mostlygeek/llama-swap.git`
+1. `git clone https://github.com/mostlygeek/llama-swap.git`
 1. `make clean all`
 1. Binaries will be in `build/` subdirectory
@@ -15,6 +15,12 @@ healthCheckTimeout: 500
 # - Valid log levels: debug, info, warn, error
 logLevel: info
 # metricsMaxInMemory: maximum number of metrics to keep in memory
 # - optional, default: 1000
 # - controls how many metrics are stored in memory before older ones are discarded
 # - useful for limiting memory usage when processing large volumes of metrics
 metricsMaxInMemory: 1000
 # startPort: sets the starting port number for the automatic ${PORT} macro.
 # - optional, default: 5800
 # - the ${PORT} macro can be used in model.cmd and model.proxy settings
@@ -200,4 +206,4 @@ groups:
    members:
      - "forever-modelA"
      - "forever-modelB"
-      - "forever-modelc"
+      - "forever-modelc"
@@ -1,5 +1,6 @@
 healthCheckTimeout: 300
 logRequests: true
 metricsMaxInMemory: 1000
 models:
  "qwen2.5":
@@ -132,6 +132,11 @@ func main() {
 						event.Emit(proxy.ConfigFileChangedEvent{
 							ReloadingState: proxy.ReloadingStateStart,
 						})
 					} else if changeEvent.Name == filepath.Join(configDir, "..data") && changeEvent.Has(fsnotify.Create) {
 						// the change for k8s configmap
 						event.Emit(proxy.ConfigFileChangedEvent{
 							ReloadingState: proxy.ReloadingStateStart,
 						})
 					}
 				case err := <-watcher.Errors:
@@ -35,20 +35,90 @@ func main() {
 	// Set up the handler function using the provided response message
 	r.POST("/v1/chat/completions", func(c *gin.Context) {
 		c.Header("Content-Type", "application/json")
 		// add a wait to simulate a slow query
 		if wait, err := time.ParseDuration(c.Query("wait")); err == nil {
 			time.Sleep(wait)
 		}
 		bodyBytes, _ := io.ReadAll(c.Request.Body)
-		c.JSON(http.StatusOK, gin.H{
-			"responseMessage":  *responseMessage,
-			"h_content_length": c.Request.Header.Get("Content-Length"),
-			"request_body":     string(bodyBytes),
-		})
+		// Check if streaming is requested
+		// Query is checked instead of JSON body since that event stream conflicts with other tests
+		isStreaming := c.Query("stream") == "true"
+
+		if isStreaming {
 			// Set headers for streaming
 			c.Header("Content-Type", "text/event-stream")
 			c.Header("Cache-Control", "no-cache")
 			c.Header("Connection", "keep-alive")
 			c.Header("Transfer-Encoding", "chunked")
 			// add a wait to simulate a slow query
 			if wait, err := time.ParseDuration(c.Query("wait")); err == nil {
 				time.Sleep(wait)
 			}
 			// Send 10 "asdf" tokens
 			for i := 0; i < 10; i++ {
 				data := gin.H{
 					"created": time.Now().Unix(),
 					"choices": []gin.H{
 						{
 							"index": 0,
 							"delta": gin.H{
 								"content": "asdf",
 							},
 							"finish_reason": nil,
 						},
 					},
 				}
 				c.SSEvent("message", data)
 				c.Writer.Flush()
 			}
 			// Send final data with usage info
 			finalData := gin.H{
 				"usage": gin.H{
 					"completion_tokens": 10,
 					"prompt_tokens":     25,
 					"total_tokens":      35,
 				},
 				// add timings to simulate llama.cpp
 				"timings": gin.H{
 					"prompt_n":             25,
 					"prompt_ms":            13,
 					"predicted_n":          10,
 					"predicted_ms":         17,
 					"predicted_per_second": 10,
 				},
 			}
 			c.SSEvent("message", finalData)
 			c.Writer.Flush()
 			// Send [DONE]
 			c.SSEvent("message", "[DONE]")
 			c.Writer.Flush()
 		} else {
 			c.Header("Content-Type", "application/json")
 			// add a wait to simulate a slow query
 			if wait, err := time.ParseDuration(c.Query("wait")); err == nil {
 				time.Sleep(wait)
 			}
 			c.JSON(http.StatusOK, gin.H{
 				"responseMessage":  *responseMessage,
 				"h_content_length": c.Request.Header.Get("Content-Length"),
 				"request_body":     string(bodyBytes),
 				"usage": gin.H{
 					"completion_tokens": 10,
 					"prompt_tokens":     25,
 					"total_tokens":      35,
 				},
 				"timings": gin.H{
 					"prompt_n":             25,
 					"prompt_ms":            13,
 					"predicted_n":          10,
 					"predicted_ms":         17,
 					"predicted_per_second": 10,
 				},
 			})
 		}
 	})
 	// for issue #62 to check model name strips profile slug
@@ -74,6 +144,11 @@ func main() {
 		c.Header("Content-Type", "application/json")
 		c.JSON(http.StatusOK, gin.H{
 			"responseMessage": *responseMessage,
 			"usage": gin.H{
 				"completion_tokens": 10,
 				"prompt_tokens":     25,
 				"total_tokens":      35,
 			},
 		})
 	})
@@ -142,6 +142,7 @@ type Config struct {
 	HealthCheckTimeout int                    `yaml:"healthCheckTimeout"`
 	LogRequests        bool                   `yaml:"logRequests"`
 	LogLevel           string                 `yaml:"logLevel"`
 	MetricsMaxInMemory int                    `yaml:"metricsMaxInMemory"`
 	Models             map[string]ModelConfig `yaml:"models"` /* key is model ID */
 	Profiles           map[string][]string    `yaml:"profiles"`
 	Groups             map[string]GroupConfig `yaml:"groups"` /* key is group ID */
@@ -194,6 +195,7 @@ func LoadConfigFromReader(r io.Reader) (Config, error) {
 		HealthCheckTimeout: 120,
 		StartPort:          5800,
 		LogLevel:           "info",
 		MetricsMaxInMemory: 1000,
 	}
 	err = yaml.Unmarshal(data, &config)
 	if err != nil {
@@ -196,6 +196,7 @@ groups:
 			},
 		},
 		HealthCheckTimeout: 15,
 		MetricsMaxInMemory: 1000,
 		Profiles: map[string][]string{
 			"test": {"model1", "model2"},
 		},
@@ -193,6 +193,7 @@ groups:
 			},
 		},
 		HealthCheckTimeout: 15,
 		MetricsMaxInMemory: 1000,
 		Profiles: map[string][]string{
 			"test": {"model1", "model2"},
 		},
@@ -6,6 +6,7 @@ const ProcessStateChangeEventID = 0x01
 const ChatCompletionStatsEventID = 0x02
 const ConfigFileChangedEventID = 0x03
 const LogDataEventID = 0x04
 const TokenMetricsEventID = 0x05
 type ProcessStateChangeEvent struct {
 	ProcessName string
@@ -0,0 +1,170 @@
 package proxy
 import (
 	"bytes"
 	"fmt"
 	"io"
 	"net/http"
 	"time"
 	"github.com/gin-gonic/gin"
 	"github.com/tidwall/gjson"
 )
 // MetricsMiddleware sets up the MetricsResponseWriter for capturing upstream requests
 func MetricsMiddleware(pm *ProxyManager) gin.HandlerFunc {
 	return func(c *gin.Context) {
 		bodyBytes, err := io.ReadAll(c.Request.Body)
 		if err != nil {
 			pm.sendErrorResponse(c, http.StatusBadRequest, "could not read request body")
 			c.Abort()
 			return
 		}
 		c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
 		requestedModel := gjson.GetBytes(bodyBytes, "model").String()
 		if requestedModel == "" {
 			pm.sendErrorResponse(c, http.StatusBadRequest, "missing or invalid 'model' key")
 			c.Abort()
 			return
 		}
 		realModelName, found := pm.config.RealModelName(requestedModel)
 		if !found {
 			pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("could not find real modelID for %s", requestedModel))
 			c.Abort()
 			return
 		}
 		writer := &MetricsResponseWriter{
 			ResponseWriter: c.Writer,
 			metricsRecorder: &MetricsRecorder{
 				metricsMonitor: pm.metricsMonitor,
 				realModelName:  realModelName,
 				isStreaming:    gjson.GetBytes(bodyBytes, "stream").Bool(),
 				startTime:      time.Now(),
 			},
 		}
 		c.Writer = writer
 		c.Next()
 		rec := writer.metricsRecorder
 		rec.processBody(writer.body)
 	}
 }
 type MetricsRecorder struct {
 	metricsMonitor *MetricsMonitor
 	realModelName  string
 	isStreaming    bool
 	startTime      time.Time
 }
 // processBody handles response processing after request completes
 func (rec *MetricsRecorder) processBody(body []byte) {
 	if rec.isStreaming {
 		rec.processStreamingResponse(body)
 	} else {
 		rec.processNonStreamingResponse(body)
 	}
 }
 func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
 	usage := jsonData.Get("usage")
 	if !usage.Exists() {
 		return false
 	}
 	// default values
 	outputTokens := int(jsonData.Get("usage.completion_tokens").Int())
 	inputTokens := int(jsonData.Get("usage.prompt_tokens").Int())
 	tokensPerSecond := -1.0
 	durationMs := int(time.Since(rec.startTime).Milliseconds())
 	// use llama-server's timing data for tok/sec and duration as it is more accurate
 	if timings := jsonData.Get("timings"); timings.Exists() {
 		tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float()
 		durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float())
 	}
 	rec.metricsMonitor.addMetrics(TokenMetrics{
 		Timestamp:       time.Now(),
 		Model:           rec.realModelName,
 		InputTokens:     inputTokens,
 		OutputTokens:    outputTokens,
 		TokensPerSecond: tokensPerSecond,
 		DurationMs:      durationMs,
 	})
 	return true
 }
 func (rec *MetricsRecorder) processStreamingResponse(body []byte) {
 	// Iterate **backwards** through the lines looking for the data payload with
 	// usage data
 	lines := bytes.Split(body, []byte("\n"))
 	for i := len(lines) - 1; i >= 0; i-- {
 		line := bytes.TrimSpace(lines[i])
 		if len(line) == 0 {
 			continue
 		}
 		// SSE payload always follows "data:"
 		prefix := []byte("data:")
 		if !bytes.HasPrefix(line, prefix) {
 			continue
 		}
 		data := bytes.TrimSpace(line[len(prefix):])
 		if len(data) == 0 {
 			continue
 		}
 		if bytes.Equal(data, []byte("[DONE]")) {
 			// [DONE] line itself contains nothing of interest.
 			continue
 		}
 		if gjson.ValidBytes(data) {
 			if rec.parseAndRecordMetrics(gjson.ParseBytes(data)) {
 				return // short circuit if a metric was recorded
 			}
 		}
 	}
 }
 func (rec *MetricsRecorder) processNonStreamingResponse(body []byte) {
 	if len(body) == 0 {
 		return
 	}
 	// Parse JSON to extract usage information
 	if gjson.ValidBytes(body) {
 		rec.parseAndRecordMetrics(gjson.ParseBytes(body))
 	}
 }
 // MetricsResponseWriter captures the entire response body (streaming and
 // non-streaming) so usage data can be parsed after the request completes
 type MetricsResponseWriter struct {
 	gin.ResponseWriter
 	body            []byte
 	metricsRecorder *MetricsRecorder
 }
 func (w *MetricsResponseWriter) Write(b []byte) (int, error) {
 	n, err := w.ResponseWriter.Write(b)
 	if err != nil {
 		return n, err
 	}
 	w.body = append(w.body, b...)
 	return n, nil
 }
 func (w *MetricsResponseWriter) WriteHeader(statusCode int) {
 	w.ResponseWriter.WriteHeader(statusCode)
 }
 func (w *MetricsResponseWriter) Header() http.Header {
 	return w.ResponseWriter.Header()
 }
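The backwards scan in `processStreamingResponse` above can be tried in isolation. The following is a minimal sketch of the same idea using only the standard library (`encoding/json` in place of `gjson`); the `usage` type and the `lastUsage` helper are hypothetical names introduced for this example and are not part of the change.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// usage mirrors the two token counters read from the "usage" object.
type usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
}

// lastUsage (hypothetical helper) walks an SSE body from the last line to
// the first and returns the first "data:" payload that carries a usage object.
func lastUsage(body []byte) (usage, bool) {
	prefix := []byte("data:")
	lines := bytes.Split(body, []byte("\n"))
	for i := len(lines) - 1; i >= 0; i-- {
		line := bytes.TrimSpace(lines[i])
		if !bytes.HasPrefix(line, prefix) {
			continue
		}
		data := bytes.TrimSpace(line[len(prefix):])
		if len(data) == 0 || bytes.Equal(data, []byte("[DONE]")) {
			continue
		}
		var payload struct {
			Usage *usage `json:"usage"`
		}
		// Skip payloads that are not JSON objects or have no usage field.
		if err := json.Unmarshal(data, &payload); err != nil || payload.Usage == nil {
			continue
		}
		return *payload.Usage, true
	}
	return usage{}, false
}

func main() {
	body := []byte("data: {\"choices\":[]}\n\n" +
		"data: {\"usage\":{\"prompt_tokens\":25,\"completion_tokens\":10}}\n\n" +
		"data: [DONE]\n")
	u, ok := lastUsage(body)
	fmt.Println(u.PromptTokens, u.CompletionTokens, ok) // prints: 25 10 true
}
```

Scanning from the end is a reasonable choice here because the usage payload is typically the last JSON chunk before `[DONE]`, so the loop usually stops after a few lines.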
@@ -0,0 +1,82 @@
 package proxy
 import (
 	"encoding/json"
 	"sync"
 	"time"
 	"github.com/mostlygeek/llama-swap/event"
 )
 // TokenMetrics represents token statistics parsed from an upstream response
 type TokenMetrics struct {
 	ID              int       `json:"id"`
 	Timestamp       time.Time `json:"timestamp"`
 	Model           string    `json:"model"`
 	InputTokens     int       `json:"input_tokens"`
 	OutputTokens    int       `json:"output_tokens"`
 	TokensPerSecond float64   `json:"tokens_per_second"`
 	DurationMs      int       `json:"duration_ms"`
 }
 // TokenMetricsEvent represents a token metrics event
 type TokenMetricsEvent struct {
 	Metrics TokenMetrics
 }
 func (e TokenMetricsEvent) Type() uint32 {
 	return TokenMetricsEventID // defined in events.go
 }
 // MetricsMonitor keeps a bounded in-memory history of token metrics and
 // emits an event for each new entry
 type MetricsMonitor struct {
 	mu         sync.RWMutex
 	metrics    []TokenMetrics
 	maxMetrics int
 	nextID     int
 }
 func NewMetricsMonitor(config *Config) *MetricsMonitor {
 	maxMetrics := config.MetricsMaxInMemory
 	if maxMetrics <= 0 {
 		maxMetrics = 1000 // Default fallback
 	}
 	mp := &MetricsMonitor{
 		maxMetrics: maxMetrics,
 	}
 	return mp
 }
 // addMetrics adds a new metric to the collection and publishes an event
 func (mp *MetricsMonitor) addMetrics(metric TokenMetrics) {
 	mp.mu.Lock()
 	defer mp.mu.Unlock()
 	metric.ID = mp.nextID
 	mp.nextID++
 	mp.metrics = append(mp.metrics, metric)
 	if len(mp.metrics) > mp.maxMetrics {
 		mp.metrics = mp.metrics[len(mp.metrics)-mp.maxMetrics:]
 	}
 	event.Emit(TokenMetricsEvent{Metrics: metric})
 }
 // GetMetrics returns a copy of the current metrics
 func (mp *MetricsMonitor) GetMetrics() []TokenMetrics {
 	mp.mu.RLock()
 	defer mp.mu.RUnlock()
 	result := make([]TokenMetrics, len(mp.metrics))
 	copy(result, mp.metrics)
 	return result
 }
 // GetMetricsJSON returns metrics as JSON
 func (mp *MetricsMonitor) GetMetricsJSON() ([]byte, error) {
 	mp.mu.RLock()
 	defer mp.mu.RUnlock()
 	return json.Marshal(mp.metrics)
 }
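The trimming behaviour of `addMetrics` (keep only the newest `maxMetrics` entries) can be seen in a stripped-down form. This is a sketch under simplifying assumptions: `boundedLog` is a hypothetical type that stores plain integers instead of `TokenMetrics` and emits no events.

```go
package main

import (
	"fmt"
	"sync"
)

// boundedLog is a hypothetical stand-in for a metrics store that keeps
// at most max entries, discarding the oldest first.
type boundedLog struct {
	mu    sync.Mutex
	items []int
	max   int
}

// add appends v and trims the slice to the newest max entries.
func (b *boundedLog) add(v int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.items = append(b.items, v)
	if len(b.items) > b.max {
		b.items = b.items[len(b.items)-b.max:]
	}
}

// snapshot returns a copy so callers cannot mutate the internal slice.
func (b *boundedLog) snapshot() []int {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]int, len(b.items))
	copy(out, b.items)
	return out
}

func main() {
	b := &boundedLog{max: 3}
	for i := 1; i <= 5; i++ {
		b.add(i)
	}
	fmt.Println(b.snapshot()) // prints: [3 4 5]
}
```

Returning a copy from `snapshot` follows the same reasoning as `GetMetrics`: readers get a stable view while writers continue to append.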
@@ -8,6 +8,7 @@ import (
 	"mime/multipart"
 	"net/http"
 	"os"
 	"sort"
 	"strconv"
 	"strings"
 	"sync"
@@ -33,6 +34,8 @@ type ProxyManager struct {
 	upstreamLogger *LogMonitor
 	muxLogger      *LogMonitor
 	metricsMonitor *MetricsMonitor
 	processGroups map[string]*ProcessGroup
 	// shutdown signaling
@@ -78,6 +81,8 @@ func New(config Config) *ProxyManager {
 		muxLogger:      stdoutLogger,
 		upstreamLogger: upstreamLogger,
 		metricsMonitor: NewMetricsMonitor(&config),
 		processGroups: make(map[string]*ProcessGroup),
 		shutdownCtx:    shutdownCtx,
@@ -149,14 +154,18 @@ func (pm *ProxyManager) setupGinEngine() {
 		c.Next()
 	})
 	mm := MetricsMiddleware(pm)
 	// Set up routes using the Gin engine
-	pm.ginEngine.POST("/v1/chat/completions", pm.proxyOAIHandler)
+	pm.ginEngine.POST("/v1/chat/completions", mm, pm.proxyOAIHandler)
 	// Support legacy /v1/completions api, see issue #12
-	pm.ginEngine.POST("/v1/completions", pm.proxyOAIHandler)
+	pm.ginEngine.POST("/v1/completions", mm, pm.proxyOAIHandler)
 	// Support embeddings
-	pm.ginEngine.POST("/v1/embeddings", pm.proxyOAIHandler)
+	pm.ginEngine.POST("/v1/embeddings", mm, pm.proxyOAIHandler)
-	pm.ginEngine.POST("/v1/rerank", pm.proxyOAIHandler)
+	pm.ginEngine.POST("/v1/rerank", mm, pm.proxyOAIHandler)
 	pm.ginEngine.POST("/v1/reranking", mm, pm.proxyOAIHandler)
 	pm.ginEngine.POST("/rerank", mm, pm.proxyOAIHandler)
 	// Support audio/speech endpoint
 	pm.ginEngine.POST("/v1/audio/speech", pm.proxyOAIHandler)
@@ -183,6 +192,9 @@ func (pm *ProxyManager) setupGinEngine() {
 	pm.ginEngine.GET("/unload", pm.unloadAllModelsHandler)
 	pm.ginEngine.GET("/running", pm.listRunningProcessesHandler)
 	pm.ginEngine.GET("/health", func(c *gin.Context) {
 		c.String(http.StatusOK, "OK")
 	})
 	pm.ginEngine.GET("/favicon.ico", func(c *gin.Context) {
 		if data, err := reactStaticFS.ReadFile("ui_dist/favicon.ico"); err == nil {
@@ -322,6 +334,13 @@ func (pm *ProxyManager) listModelsHandler(c *gin.Context) {
 		data = append(data, record)
 	}
 	// Sort by the "id" key
 	sort.Slice(data, func(i, j int) bool {
 		si, _ := data[i]["id"].(string)
 		sj, _ := data[j]["id"].(string)
 		return si < sj
 	})
 	// Set CORS headers if origin exists
 	if origin := c.GetHeader("Origin"); origin != "" {
 		c.Header("Access-Control-Allow-Origin", origin)
@@ -342,7 +361,7 @@ func (pm *ProxyManager) proxyToUpstream(c *gin.Context) {
 		return
 	}
-	processGroup, _, err := pm.swapProcessGroup(requestedModel)
+	processGroup, realModelName, err := pm.swapProcessGroup(requestedModel)
 	if err != nil {
 		pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
 		return
@@ -350,7 +369,7 @@ func (pm *ProxyManager) proxyToUpstream(c *gin.Context) {
 	// rewrite the path
 	c.Request.URL.Path = c.Param("upstreamPath")
-	processGroup.ProxyRequest(requestedModel, c.Writer, c.Request)
+	processGroup.ProxyRequest(realModelName, c.Writer, c.Request)
 }
 func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
@@ -366,7 +385,13 @@ func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
 		return
 	}
-	processGroup, realModelName, err := pm.swapProcessGroup(requestedModel)
+	realModelName, found := pm.config.RealModelName(requestedModel)
 	if !found {
 		pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("could not find real modelID for %s", requestedModel))
 		return
 	}
 	processGroup, _, err := pm.swapProcessGroup(realModelName)
 	if err != nil {
 		pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
 		return
@@ -24,6 +24,7 @@ func addApiHandlers(pm *ProxyManager) {
 	{
 		apiGroup.POST("/models/unload", pm.apiUnloadAllModels)
 		apiGroup.GET("/events", pm.apiSendEvents)
 		apiGroup.GET("/metrics", pm.apiGetMetrics)
 	}
 }
@@ -85,6 +86,7 @@ type messageType string
 const (
 	msgTypeModelStatus messageType = "modelStatus"
 	msgTypeLogData     messageType = "logData"
 	msgTypeMetrics     messageType = "metrics"
 )
 type messageEnvelope struct {
@@ -130,6 +132,18 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
 		}
 	}
 	sendMetrics := func(metrics TokenMetrics) {
 		jsonData, err := json.Marshal(metrics)
 		if err == nil {
 			select {
 			case sendBuffer <- messageEnvelope{Type: msgTypeMetrics, Data: string(jsonData)}:
 			case <-ctx.Done():
 				return
 			default:
 			}
 		}
 	}
 	/**
 	 * Send updated models list
 	 */
@@ -150,10 +164,20 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
 		sendLogData("upstream", data)
 	})()
 	/**
 	 * Send Metrics data
 	 */
 	defer event.On(func(e TokenMetricsEvent) {
 		sendMetrics(e.Metrics)
 	})()
 	// send initial batch of data
 	sendLogData("proxy", pm.proxyLogger.GetHistory())
 	sendLogData("upstream", pm.upstreamLogger.GetHistory())
 	sendModels()
 	for _, metrics := range pm.metricsMonitor.GetMetrics() {
 		sendMetrics(metrics)
 	}
 	for {
 		select {
@@ -169,3 +193,12 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
 		}
 	}
 }
 func (pm *ProxyManager) apiGetMetrics(c *gin.Context) {
 	jsonData, err := pm.metricsMonitor.GetMetricsJSON()
 	if err != nil {
 		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to get metrics"})
 		return
 	}
 	c.Data(http.StatusOK, "application/json", jsonData)
 }
@@ -9,6 +9,7 @@ import (
 	"net/http"
 	"net/http/httptest"
 	"strconv"
 	"strings"
 	"sync"
 	"testing"
 	"time"
@@ -165,9 +166,11 @@ func TestProxyManager_SwapMultiProcessParallelRequests(t *testing.T) {
 			}
 			mu.Lock()
-			var response map[string]string
+			var response map[string]interface{}
 			assert.NoError(t, json.Unmarshal(w.Body.Bytes(), &response))
-			results[key] = response["responseMessage"]
+			result, ok := response["responseMessage"].(string)
 			assert.True(t, ok)
 			results[key] = result
 			mu.Unlock()
 		}(key)
@@ -277,6 +280,51 @@ func TestProxyManager_ListModelsHandler(t *testing.T) {
 	assert.Empty(t, expectedModels, "not all expected models were returned")
 }
 func TestProxyManager_ListModelsHandler_SortedByID(t *testing.T) {
 	// Intentionally add models in non-sorted order and with an unlisted model
 	config := Config{
 		HealthCheckTimeout: 15,
 		Models: map[string]ModelConfig{
 			"zeta":  getTestSimpleResponderConfig("zeta"),
 			"alpha": getTestSimpleResponderConfig("alpha"),
 			"beta":  getTestSimpleResponderConfig("beta"),
 			"hidden": func() ModelConfig {
 				mc := getTestSimpleResponderConfig("hidden")
 				mc.Unlisted = true
 				return mc
 			}(),
 		},
 		LogLevel: "error",
 	}
 	proxy := New(config)
 	// Request models list
 	req := httptest.NewRequest("GET", "/v1/models", nil)
 	w := httptest.NewRecorder()
 	proxy.ServeHTTP(w, req)
 	assert.Equal(t, http.StatusOK, w.Code)
 	var response struct {
 		Data []map[string]interface{} `json:"data"`
 	}
 	if err := json.Unmarshal(w.Body.Bytes(), &response); err != nil {
 		t.Fatalf("Failed to parse JSON response: %v", err)
 	}
 	// We expect only the listed models in sorted order by id
 	expectedOrder := []string{"alpha", "beta", "zeta"}
 	if assert.Len(t, response.Data, len(expectedOrder), "unexpected number of listed models") {
 		got := make([]string, 0, len(response.Data))
 		for _, m := range response.Data {
 			id, _ := m["id"].(string)
 			got = append(got, id)
 		}
 		assert.Equal(t, expectedOrder, got, "models should be sorted by id ascending")
 	}
 }
 func TestProxyManager_Shutdown(t *testing.T) {
 	// make broken model configurations
 	model1Config := getTestSimpleResponderConfigPort("model1", 9991)
@@ -609,21 +657,34 @@ func TestProxyManager_CORSOptionsHandler(t *testing.T) {
 }
 func TestProxyManager_Upstream(t *testing.T) {
-	config := AddDefaultGroupToConfig(Config{
-		HealthCheckTimeout: 15,
-		Models: map[string]ModelConfig{
-			"model1": getTestSimpleResponderConfig("model1"),
-		},
-		LogLevel: "error",
-	})
+	configStr := fmt.Sprintf(`
+logLevel: error
+models:
+  model1:
+    cmd: %s -port ${PORT} -silent -respond model1
+    aliases: [model-alias]
+`, getSimpleResponderPath())
 	config, err := LoadConfigFromReader(strings.NewReader(configStr))
 	assert.NoError(t, err)
 	proxy := New(config)
 	defer proxy.StopProcesses(StopWaitForInflightRequest)
-	req := httptest.NewRequest("GET", "/upstream/model1/test", nil)
-	rec := httptest.NewRecorder()
-	proxy.ServeHTTP(rec, req)
-	assert.Equal(t, http.StatusOK, rec.Code)
-	assert.Equal(t, "model1", rec.Body.String())
+	t.Run("main model name", func(t *testing.T) {
+		req := httptest.NewRequest("GET", "/upstream/model1/test", nil)
+		rec := httptest.NewRecorder()
+		proxy.ServeHTTP(rec, req)
+		assert.Equal(t, http.StatusOK, rec.Code)
 		assert.Equal(t, "model1", rec.Body.String())
 	})
 	t.Run("model alias", func(t *testing.T) {
 		req := httptest.NewRequest("GET", "/upstream/model-alias/test", nil)
 		rec := httptest.NewRecorder()
 		proxy.ServeHTTP(rec, req)
 		assert.Equal(t, http.StatusOK, rec.Code)
 		assert.Equal(t, "model1", rec.Body.String())
 	})
 }
 func TestProxyManager_ChatContentLength(t *testing.T) {
@@ -644,7 +705,7 @@ func TestProxyManager_ChatContentLength(t *testing.T) {
 	proxy.ServeHTTP(w, req)
 	assert.Equal(t, http.StatusOK, w.Code)
-	var response map[string]string
+	var response map[string]interface{}
 	assert.NoError(t, json.Unmarshal(w.Body.Bytes(), &response))
 	assert.Equal(t, "81", response["h_content_length"])
 	assert.Equal(t, "model1", response["responseMessage"])
@@ -672,7 +733,7 @@ func TestProxyManager_FiltersStripParams(t *testing.T) {
 	proxy.ServeHTTP(w, req)
 	assert.Equal(t, http.StatusOK, w.Code)
-	var response map[string]string
+	var response map[string]interface{}
 	assert.NoError(t, json.Unmarshal(w.Body.Bytes(), &response))
 	// `temperature` and `stream` are gone but model remains
@@ -683,3 +744,91 @@ func TestProxyManager_FiltersStripParams(t *testing.T) {
 	// assert.Equal(t, "abc", response["y_param"])
 	// t.Logf("%v", response)
 }
 func TestProxyManager_MiddlewareWritesMetrics_NonStreaming(t *testing.T) {
 	config := AddDefaultGroupToConfig(Config{
 		HealthCheckTimeout: 15,
 		Models: map[string]ModelConfig{
 			"model1": getTestSimpleResponderConfig("model1"),
 		},
 		LogLevel: "error",
 	})
 	proxy := New(config)
 	defer proxy.StopProcesses(StopWaitForInflightRequest)
 	// Make a non-streaming request
 	reqBody := `{"model":"model1", "stream": false}`
 	req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewBufferString(reqBody))
 	w := httptest.NewRecorder()
 	proxy.ServeHTTP(w, req)
 	assert.Equal(t, http.StatusOK, w.Code)
 	// Check that metrics were recorded
 	metrics := proxy.metricsMonitor.GetMetrics()
 	if !assert.NotEmpty(t, metrics, "metrics should be recorded for non-streaming request") {
 		return
 	}
 	// Verify the last metric has the correct model
 	lastMetric := metrics[len(metrics)-1]
 	assert.Equal(t, "model1", lastMetric.Model)
 	assert.Equal(t, 25, lastMetric.InputTokens, "input tokens should be 25")
 	assert.Equal(t, 10, lastMetric.OutputTokens, "output tokens should be 10")
 	assert.Greater(t, lastMetric.TokensPerSecond, 0.0, "tokens per second should be greater than 0")
 	assert.Greater(t, lastMetric.DurationMs, 0, "duration should be greater than 0")
 }
 func TestProxyManager_MiddlewareWritesMetrics_Streaming(t *testing.T) {
 	config := AddDefaultGroupToConfig(Config{
 		HealthCheckTimeout: 15,
 		Models: map[string]ModelConfig{
 			"model1": getTestSimpleResponderConfig("model1"),
 		},
 		LogLevel: "error",
 	})
 	proxy := New(config)
 	defer proxy.StopProcesses(StopWaitForInflightRequest)
 	// Make a streaming request
 	reqBody := `{"model":"model1", "stream": true}`
 	req := httptest.NewRequest("POST", "/v1/chat/completions?stream=true", bytes.NewBufferString(reqBody))
 	w := httptest.NewRecorder()
 	proxy.ServeHTTP(w, req)
 	assert.Equal(t, http.StatusOK, w.Code)
 	// Check that metrics were recorded
 	metrics := proxy.metricsMonitor.GetMetrics()
 	if !assert.NotEmpty(t, metrics, "metrics should be recorded for streaming request") {
 		return
 	}
 	// Verify the last metric has the correct model
 	lastMetric := metrics[len(metrics)-1]
 	assert.Equal(t, "model1", lastMetric.Model)
 	assert.Equal(t, 25, lastMetric.InputTokens, "input tokens should be 25")
 	assert.Equal(t, 10, lastMetric.OutputTokens, "output tokens should be 10")
 	assert.Greater(t, lastMetric.TokensPerSecond, 0.0, "tokens per second should be greater than 0")
 	assert.Greater(t, lastMetric.DurationMs, 0, "duration should be greater than 0")
 }
 func TestProxyManager_HealthEndpoint(t *testing.T) {
 	config := AddDefaultGroupToConfig(Config{
 		HealthCheckTimeout: 15,
 		Models: map[string]ModelConfig{
 			"model1": getTestSimpleResponderConfig("model1"),
 		},
 		LogLevel: "error",
 	})
 	proxy := New(config)
 	defer proxy.StopProcesses(StopWaitForInflightRequest)
 	req := httptest.NewRequest("GET", "/health", nil)
 	rec := httptest.NewRecorder()
 	proxy.ServeHTTP(rec, req)
 	assert.Equal(t, http.StatusOK, rec.Code)
 	assert.Equal(t, "OK", rec.Body.String())
 }
@@ -12,6 +12,8 @@
        "@tanstack/react-query": "^5.80.6",
        "react": "^19.1.0",
        "react-dom": "^19.1.0",
        "react-icons": "^5.5.0",
        "react-resizable-panels": "^3.0.4",
        "react-router-dom": "^7.6.2",
        "tailwindcss": "^4.1.8"
      },
@@ -3460,6 +3462,15 @@
        "react": "^19.1.0"
      }
    },
    "node_modules/react-icons": {
      "version": "5.5.0",
      "resolved": "https://registry.npmjs.org/react-icons/-/react-icons-5.5.0.tgz",
      "integrity": "sha512-MEFcXdkP3dLo8uumGI5xN3lDFNsRtrjbOEKDLD7yv76v4wpnEq2Lt2qeHaQOr34I/wPN3s3+N08WkQ+CW37Xiw==",
      "license": "MIT",
      "peerDependencies": {
        "react": "*"
      }
    },
    "node_modules/react-refresh": {
      "version": "0.17.0",
      "resolved": "https://registry.npmjs.org/react-refresh/-/react-refresh-0.17.0.tgz",
@@ -3470,6 +3481,16 @@
        "node": ">=0.10.0"
      }
    },
    "node_modules/react-resizable-panels": {
      "version": "3.0.4",
      "resolved": "https://registry.npmjs.org/react-resizable-panels/-/react-resizable-panels-3.0.4.tgz",
      "integrity": "sha512-8Y4KNgV94XhUvI2LeByyPIjoUJb71M/0hyhtzkHaqpVHs+ZQs8b627HmzyhmVYi3C9YP6R+XD1KmG7hHjEZXFQ==",
      "license": "MIT",
      "peerDependencies": {
        "react": "^16.14.0 || ^17.0.0 || ^18.0.0 || ^19.0.0 || ^19.0.0-rc",
        "react-dom": "^16.14.0 || ^17.0.0 || ^18.0.0 || ^19.0.0 || ^19.0.0-rc"
      }
    },
    "node_modules/react-router": {
      "version": "7.6.2",
      "resolved": "https://registry.npmjs.org/react-router/-/react-router-7.6.2.tgz",
@@ -14,6 +14,8 @@
    "@tanstack/react-query": "^5.80.6",
    "react": "^19.1.0",
    "react-dom": "^19.1.0",
    "react-icons": "^5.5.0",
    "react-resizable-panels": "^3.0.4",
    "react-router-dom": "^7.6.2",
    "tailwindcss": "^4.1.8"
  },
@@ -30,4 +32,4 @@
    "typescript-eslint": "^8.30.1",
    "vite": "^6.3.5"
  }
-}
+}
@@ -3,16 +3,19 @@ import { useTheme } from "./contexts/ThemeProvider";
 import { APIProvider } from "./contexts/APIProvider";
 import LogViewerPage from "./pages/LogViewer";
 import ModelPage from "./pages/Models";
 import ActivityPage from "./pages/Activity";
 import { RiSunFill, RiMoonFill } from "react-icons/ri";
 function App() {
-  const theme = useTheme();
+  const { isNarrow, toggleTheme, isDarkMode } = useTheme();
  return (
    <Router basename="/ui/">
      <APIProvider>
-        <div>
+        <div className="flex flex-col h-screen">
          <nav className="bg-surface border-b border-border p-2 h-[75px]">
            <div className="flex items-center justify-between mx-auto px-4 h-full">
-              <h1 className="flex items-center p-0">llama-swap</h1>
+              {!isNarrow && <h1 className="flex items-center p-0">llama-swap</h1>}
              <div className="flex items-center space-x-4">
                <NavLink to="/" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
                  Logs
@@ -21,17 +24,22 @@ function App() {
                <NavLink to="/models" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
                  Models
                </NavLink>
-                <button className="btn btn--sm" onClick={theme.toggleTheme}>
+
-                  {theme.isDarkMode ? "🌙" : "☀️"}
+                <NavLink to="/activity" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
                  Activity
                </NavLink>
                <button className="" onClick={toggleTheme}>
                  {isDarkMode ? <RiMoonFill /> : <RiSunFill />}
                </button>
              </div>
            </div>
          </nav>
-          <main className="mx-auto py-4 px-4">
+          <main className="flex-1 overflow-auto p-4">
            <Routes>
              <Route path="/" element={<LogViewerPage />} />
              <Route path="/models" element={<ModelPage />} />
              <Route path="/activity" element={<ActivityPage />} />
              <Route path="*" element={<Navigate to="/" replace />} />
            </Routes>
          </main>
@@ -19,30 +19,41 @@ interface APIProviderType {
  enableAPIEvents: (enabled: boolean) => void;
  proxyLogs: string;
  upstreamLogs: string;
  metrics: Metrics[];
 }
 interface Metrics {
  id: number;
  timestamp: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
  tokens_per_second: number;
  duration_ms: number;
 }
 interface LogData {
  source: "upstream" | "proxy";
  data: string;
 }
 interface APIEventEnvelope {
-  type: "modelStatus" | "logData";
+  type: "modelStatus" | "logData" | "metrics";
  data: string;
 }
 const APIContext = createContext<APIProviderType | undefined>(undefined);
 type APIProviderProps = {
  children: ReactNode;
  autoStartAPIEvents?: boolean;
 };
-export function APIProvider({ children }: APIProviderProps) {
+export function APIProvider({ children, autoStartAPIEvents = true }: APIProviderProps) {
  const [proxyLogs, setProxyLogs] = useState("");
  const [upstreamLogs, setUpstreamLogs] = useState("");
-  const proxyEventSource = useRef<EventSource | null>(null);
+  const [metrics, setMetrics] = useState<Metrics[]>([]);
  const upstreamEventSource = useRef<EventSource | null>(null);
  const apiEventSource = useRef<EventSource | null>(null);
  const [models, setModels] = useState<Model[]>([]);
  const modelStatusEventSource = useRef<EventSource | null>(null);
  const appendLog = useCallback((newData: string, setter: React.Dispatch<React.SetStateAction<string>>) => {
    setter((prev) => {
@@ -55,6 +66,7 @@ export function APIProvider({ children }: APIProviderProps) {
    if (!enabled) {
      apiEventSource.current?.close();
      apiEventSource.current = null;
      setMetrics([]);
      return;
    }
@@ -71,11 +83,17 @@ export function APIProvider({ children }: APIProviderProps) {
            case "modelStatus":
              {
                const models = JSON.parse(message.data) as Model[];
                // sort models by name and id
                models.sort((a, b) => {
                  return (a.name + a.id).localeCompare(b.name + b.id);
                });
                setModels(models);
              }
              break;
-            case "logData": {
+            case "logData":
              const logData = JSON.parse(message.data) as LogData;
              switch (logData.source) {
                case "proxy":
@@ -85,7 +103,16 @@ export function APIProvider({ children }: APIProviderProps) {
                  appendLog(logData.data, setUpstreamLogs);
                  break;
              }
-            }
+              break;
            case "metrics":
              {
                const newMetric = JSON.parse(message.data) as Metrics;
                setMetrics((prevMetrics) => {
                  return [newMetric, ...prevMetrics];
                });
              }
              break;
          }
        } catch (err) {
          console.error(e.data, err);
@@ -105,12 +132,14 @@ export function APIProvider({ children }: APIProviderProps) {
  }, []);
  useEffect(() => {
    if (autoStartAPIEvents) {
      enableAPIEvents(true);
    }
    return () => {
-      proxyEventSource.current?.close();
+      enableAPIEvents(false);
      upstreamEventSource.current?.close();
      modelStatusEventSource.current?.close();
    };
-  }, []);
+  }, [enableAPIEvents, autoStartAPIEvents]);
  const listModels = useCallback(async (): Promise<Model[]> => {
    try {
@@ -163,8 +192,9 @@ export function APIProvider({ children }: APIProviderProps) {
      enableAPIEvents,
      proxyLogs,
      upstreamLogs,
      metrics,
    }),
-    [models, listModels, unloadAllModels, loadModel, enableAPIEvents, proxyLogs, upstreamLogs]
+    [models, listModels, unloadAllModels, loadModel, enableAPIEvents, proxyLogs, upstreamLogs, metrics]
  );
  return <APIContext.Provider value={value}>{children}</APIContext.Provider>;
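The new "metrics" case above prepends each incoming record, so the newest request ends up first in the list. A minimal sketch of that update as a pure function follows; `applyMetricsEvent` is a hypothetical name used only for illustration and is not part of this change, and the interfaces are copied from the diff.

```typescript
// Shapes copied from the diff above.
interface Metrics {
  id: number;
  timestamp: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
  tokens_per_second: number;
  duration_ms: number;
}

interface APIEventEnvelope {
  type: "modelStatus" | "logData" | "metrics";
  data: string; // JSON-encoded payload
}

// Hypothetical helper (not in the commit): returns the next metrics list
// for a single event envelope. Non-metrics events leave the list untouched.
function applyMetricsEvent(prev: Metrics[], envelope: APIEventEnvelope): Metrics[] {
  if (envelope.type !== "metrics") {
    return prev;
  }
  const newMetric = JSON.parse(envelope.data) as Metrics;
  // Newest entry first, matching `[newMetric, ...prevMetrics]` in the provider.
  return [newMetric, ...prev];
}
```

Keeping the update free of React state makes the ordering easy to check in isolation; in the provider the same logic runs inside the `setMetrics` updater.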
@@ -1,8 +1,11 @@
-import { createContext, useContext, useEffect, type ReactNode } from "react";
+import { createContext, useContext, useEffect, type ReactNode, useMemo, useState } from "react";
 import { usePersistentState } from "../hooks/usePersistentState";
 type ScreenWidth = "xs" | "sm" | "md" | "lg" | "xl" | "2xl";
 type ThemeContextType = {
  isDarkMode: boolean;
  screenWidth: ScreenWidth;
  isNarrow: boolean;
  toggleTheme: () => void;
 };
@@ -14,14 +17,46 @@ type ThemeProviderProps = {
 export function ThemeProvider({ children }: ThemeProviderProps) {
  const [isDarkMode, setIsDarkMode] = usePersistentState<boolean>("theme", false);
  const [screenWidth, setScreenWidth] = useState<ScreenWidth>("md"); // Default to md
  // matches tailwind classes
  // https://tailwindcss.com/docs/responsive-design
  useEffect(() => {
    const checkInnerWidth = () => {
      const innerWidth = window.innerWidth;
      if (innerWidth < 640) {
        setScreenWidth("xs");
      } else if (innerWidth < 768) {
        setScreenWidth("sm");
      } else if (innerWidth < 1024) {
        setScreenWidth("md");
      } else if (innerWidth < 1280) {
        setScreenWidth("lg");
      } else if (innerWidth < 1536) {
        setScreenWidth("xl");
      } else {
        setScreenWidth("2xl");
      }
    };
    checkInnerWidth();
    window.addEventListener("resize", checkInnerWidth);
    return () => window.removeEventListener("resize", checkInnerWidth);
  }, []);
  useEffect(() => {
    document.documentElement.setAttribute("data-theme", isDarkMode ? "dark" : "light");
  }, [isDarkMode]);
  const toggleTheme = () => setIsDarkMode((prev) => !prev);
  const isNarrow = useMemo(() => {
    return screenWidth === "xs" || screenWidth === "sm" || screenWidth === "md";
  }, [screenWidth]);
-  return <ThemeContext.Provider value={{ isDarkMode, toggleTheme }}>{children}</ThemeContext.Provider>;
+  return (
    <ThemeContext.Provider value={{ isDarkMode, toggleTheme, screenWidth, isNarrow }}>{children}</ThemeContext.Provider>
  );
 }
 export function useTheme(): ThemeContextType {
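The breakpoint checks in `ThemeProvider` can be read as a pure mapping from `window.innerWidth` to a width label. The sketch below restates that mapping outside React; `toScreenWidth` and `isNarrowWidth` are hypothetical names chosen for illustration, while the thresholds and the "narrow" rule are taken from the diff.

```typescript
type ScreenWidth = "xs" | "sm" | "md" | "lg" | "xl" | "2xl";

// Hypothetical helper: same thresholds as checkInnerWidth in the diff.
function toScreenWidth(innerWidth: number): ScreenWidth {
  if (innerWidth < 640) return "xs";
  if (innerWidth < 768) return "sm";
  if (innerWidth < 1024) return "md";
  if (innerWidth < 1280) return "lg";
  if (innerWidth < 1536) return "xl";
  return "2xl";
}

// Hypothetical helper: the diff treats xs, sm and md as narrow layouts.
function isNarrowWidth(width: ScreenWidth): boolean {
  return width === "xs" || width === "sm" || width === "md";
}
```

Under these thresholds, any viewport below 1024 pixels wide counts as narrow, which is when the pages switch their panel groups to the vertical direction.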
@@ -1,18 +0,0 @@
 export function processEvalTimes(text: string) {
  const lines = text.match(/^ *eval time.*$/gm) || [];
  let totalTokens = 0;
  let totalTime = 0;
  lines.forEach((line) => {
    const tokensMatch = line.match(/\/\s*(\d+)\s*tokens/);
    const timeMatch = line.match(/=\s*(\d+\.\d+)\s*ms/);
    if (tokensMatch) totalTokens += parseFloat(tokensMatch[1]);
    if (timeMatch) totalTime += parseFloat(timeMatch[1]);
  });
  const avgTokensPerSecond = totalTime > 0 ? totalTokens / (totalTime / 1000) : 0;
  return [lines.length, totalTokens, Math.round(avgTokensPerSecond * 100) / 100];
 }
@@ -0,0 +1,77 @@
 import { useState, useEffect } from "react";
 import { useAPI } from "../contexts/APIProvider";
 const formatTimestamp = (timestamp: string): string => {
  return new Date(timestamp).toLocaleString();
 };
 const formatSpeed = (speed: number): string => {
  return speed < 0 ? "unknown" : speed.toFixed(2) + " t/s";
 };
 const formatDuration = (ms: number): string => {
  return (ms / 1000).toFixed(2) + "s";
 };
 const ActivityPage = () => {
  const { metrics } = useAPI();
  const [error, setError] = useState<string | null>(null);
  useEffect(() => {
    if (metrics.length > 0) {
      setError(null);
    }
  }, [metrics]);
  if (error) {
    return (
      <div className="p-6">
        <h1 className="text-2xl font-bold mb-4">Activity</h1>
        <div className="bg-red-50 border border-red-200 rounded-md p-4">
          <p className="text-red-800">{error}</p>
        </div>
      </div>
    );
  }
  return (
    <div className="p-6">
      <h1 className="text-2xl font-bold mb-4">Activity</h1>
      {metrics.length === 0 ? (
        <div className="text-center py-8">
          <p className="text-gray-600">No metrics data available</p>
        </div>
      ) : (
        <div className="overflow-x-auto">
          <table className="min-w-full divide-y">
            <thead>
              <tr>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Timestamp</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Model</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Input Tokens</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Output Tokens</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Generation Speed</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Duration</th>
              </tr>
            </thead>
            <tbody className="divide-y">
              {metrics.map((metric, index) => (
                <tr key={`${metric.id}-${index}`}>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatTimestamp(metric.timestamp)}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.model}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.input_tokens.toLocaleString()}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.output_tokens.toLocaleString()}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.tokens_per_second)}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatDuration(metric.duration_ms)}</td>
                </tr>
              ))}
            </tbody>
          </table>
        </div>
      )}
    </div>
  );
 };
 export default ActivityPage;
@@ -1,22 +1,38 @@
 import { useState, useEffect, useRef, useMemo, useCallback } from "react";
 import { useAPI } from "../contexts/APIProvider";
 import { usePersistentState } from "../hooks/usePersistentState";
 import { Panel, PanelGroup, PanelResizeHandle } from "react-resizable-panels";
 import {
  RiTextWrap,
  RiAlignJustify,
  RiFontSize,
  RiMenuSearchLine,
  RiMenuSearchFill,
  RiCloseCircleFill,
 } from "react-icons/ri";
 import { useTheme } from "../contexts/ThemeProvider";
 const LogViewer = () => {
-  const { proxyLogs, upstreamLogs, enableAPIEvents } = useAPI();
+  const { proxyLogs, upstreamLogs } = useAPI();
-
+  const { isNarrow } = useTheme();
-  useEffect(() => {
+  const direction = isNarrow ? "vertical" : "horizontal";
    enableAPIEvents(true);
    return () => {
      enableAPIEvents(false);
    };
  }, []);
  return (
-    <div className="flex flex-col gap-5" style={{ height: "calc(100vh - 125px)" }}>
+    <PanelGroup direction={direction} className="gap-2" autoSaveId="logviewer-panel-group">
-      <LogPanel id="proxy" title="Proxy Logs" logData={proxyLogs} />
+      <Panel id="proxy" defaultSize={50} minSize={5} maxSize={100} collapsible={true}>
-      <LogPanel id="upstream" title="Upstream Logs" logData={upstreamLogs} />
+        <LogPanel id="proxy" title="Proxy Logs" logData={proxyLogs} />
-    </div>
+      </Panel>
      <PanelResizeHandle
        className={
          direction === "horizontal"
            ? "w-2 h-full bg-primary hover:bg-success transition-colors rounded"
            : "w-full h-2 bg-primary hover:bg-success transition-colors rounded"
        }
      />
      <Panel id="upstream" defaultSize={50} minSize={5} maxSize={100} collapsible={true}>
        <LogPanel id="upstream" title="Upstream Logs" logData={upstreamLogs} />
      </Panel>
    </PanelGroup>
  );
 };
@@ -24,17 +40,15 @@ interface LogPanelProps {
  id: string;
  title: string;
  logData: string;
  className?: string;
 }
-
+export const LogPanel = ({ id, title, logData }: LogPanelProps) => {
 export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
  const [isCollapsed, setIsCollapsed] = usePersistentState(`logPanel-${id}-isCollapsed`, false);
  const [filterRegex, setFilterRegex] = useState("");
  const [fontSize, setFontSize] = usePersistentState<"xxs" | "xs" | "small" | "normal">(
    `logPanel-${id}-fontSize`,
    "normal"
  );
  const [wrapText, setTextWrap] = usePersistentState(`logPanel-${id}-wrapText`, false);
  const [showFilter, setShowFilter] = usePersistentState(`logPanel-${id}-showFilter`, false);
  const textWrapClass = useMemo(() => {
    return wrapText ? "whitespace-pre-wrap" : "whitespace-pre";
@@ -55,6 +69,19 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
    });
  }, []);
  const toggleWrapText = useCallback(() => {
    setTextWrap((prev) => !prev);
  }, []);
  const toggleFilter = useCallback(() => {
    if (showFilter) {
      setShowFilter(false);
      setFilterRegex(""); // Clear filter when closing
    } else {
      setShowFilter(true);
    }
  }, [filterRegex, setFilterRegex, showFilter]);
  const fontSizeClass = useMemo(() => {
    switch (fontSize) {
      case "xxs":
@@ -88,56 +115,47 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
  }, [filteredLogs]);
  return (
-    <div
+    <div className="bg-surface border border-border rounded-lg overflow-hidden flex flex-col h-full">
      className={`bg-surface border border-border rounded-lg overflow-hidden flex flex-col ${
        !isCollapsed && "h-full"
      } ${className || ""}`}
    >
      <div className="p-4 border-b border-border bg-secondary">
-        <div className="flex flex-col md:flex-row md:items-center md:justify-between gap-4">
+        <div className="flex items-center justify-between">
-          {/* Title - Always full width on mobile, normal on desktop */}
+          <h3 className="m-0 text-lg p-0">{title}</h3>
-          <div className="w-full md:w-auto" onClick={() => setIsCollapsed(!isCollapsed)}>
+
-            <h3 className="m-0 text-lg">{title}</h3>
+          <div className="flex gap-2 items-center">
            <button className="btn" onClick={toggleFontSize}>
              <RiFontSize />
            </button>
            <button className="btn" onClick={toggleWrapText}>
              {wrapText ? <RiTextWrap /> : <RiAlignJustify />}
            </button>
            <button className="btn" onClick={toggleFilter}>
              {showFilter ? <RiMenuSearchFill /> : <RiMenuSearchLine />}
            </button>
          </div>
        </div>
-          <div className="flex flex-col sm:flex-row gap-4 w-full md:w-auto">
+        {/* Filtering Options - Full width on mobile, normal on desktop */}
-            {/* Sizing Buttons - Stacks vertically on mobile */}
+        {showFilter && (
-            <div className="flex flex-wrap gap-2">
+          <div className="mt-2 w-full">
-              <button className="btn" onClick={toggleFontSize}>
+            <div className="flex gap-2 items-center w-full">
                font: {fontSize}
              </button>
              <button className="btn" onClick={() => setTextWrap((prev) => !prev)}>
                {wrapText ? "wrap" : "wrap off"}
              </button>
            </div>
            {/* Filtering Options - Full width on mobile, normal on desktop */}
            <div className="flex flex-1 min-w-0 gap-2">
              <input
                type="text"
-                className="flex-1 min-w-[120px] text-sm border p-2 rounded"
+                className="w-full text-sm border p-2 rounded"
                placeholder="Filter logs..."
                value={filterRegex}
                onChange={(e) => setFilterRegex(e.target.value)}
              />
-              <button className="btn" onClick={() => setFilterRegex("")}>
+              <button className="pl-2" onClick={() => setFilterRegex("")}>
-                Clear
+                <RiCloseCircleFill size="24" />
              </button>
            </div>
          </div>
-        </div>
+        )}
      </div>
      <div className="bg-background font-mono text-sm flex-1 overflow-hidden">
        <pre ref={preTagRef} className={`${textWrapClass} ${fontSizeClass} h-full overflow-auto p-4`}>
          {filteredLogs}
        </pre>
      </div>
      {!isCollapsed && (
        <div className="flex-1 bg-background font-mono text-sm p-3 overflow-hidden">
          <pre
            ref={preTagRef}
            className={`h-full p-4 overflow-y-auto whitespace-pre min-h-0 ${textWrapClass} ${fontSizeClass}`}
          >
            {filteredLogs}
          </pre>
        </div>
      )}
    </div>
  );
 };
@@ -1,25 +1,51 @@
-import { useState, useEffect, useCallback, useMemo } from "react";
+import { useState, useCallback, useMemo } from "react";
 import { useAPI } from "../contexts/APIProvider";
 import { LogPanel } from "./LogViewer";
 import { processEvalTimes } from "../lib/Utils";
 import { usePersistentState } from "../hooks/usePersistentState";
 import { Panel, PanelGroup, PanelResizeHandle } from "react-resizable-panels";
 import { useTheme } from "../contexts/ThemeProvider";
 import { RiEyeFill, RiEyeOffFill, RiStopCircleLine, RiSwapBoxFill } from "react-icons/ri";
 export default function ModelsPage() {
-  const { models, unloadAllModels, loadModel, upstreamLogs, enableAPIEvents } = useAPI();
+  const { isNarrow } = useTheme();
  const direction = isNarrow ? "vertical" : "horizontal";
  const { upstreamLogs } = useAPI();
  return (
    <PanelGroup direction={direction} className="gap-2" autoSaveId={"models-panel-group"}>
      <Panel id="models" defaultSize={50} minSize={isNarrow ? 0 : 25} maxSize={100} collapsible={isNarrow}>
        <ModelsPanel />
      </Panel>
      <PanelResizeHandle
        className={
          direction === "horizontal"
            ? "w-2 h-full bg-primary hover:bg-success transition-colors rounded"
            : "w-full h-2 bg-primary hover:bg-success transition-colors rounded"
        }
      />
      <Panel collapsible={true} defaultSize={50} minSize={0}>
        <div className="flex flex-col h-full space-y-4">
          {direction === "horizontal" && <StatsPanel />}
          <div className="flex-1 min-h-0">
            <LogPanel id="modelsupstream" title="Upstream Logs" logData={upstreamLogs} />
          </div>
        </div>
      </Panel>
    </PanelGroup>
  );
 }
 function ModelsPanel() {
  const { models, loadModel, unloadAllModels } = useAPI();
  const [isUnloading, setIsUnloading] = useState(false);
  const [showUnlisted, setShowUnlisted] = usePersistentState("showUnlisted", true);
  const [showIdorName, setShowIdorName] = usePersistentState<"id" | "name">("showIdorName", "id"); // "id" = show model ID, "name" = show model name
  const filteredModels = useMemo(() => {
    return models.filter((model) => showUnlisted || !model.unlisted);
  }, [models, showUnlisted]);
  useEffect(() => {
    enableAPIEvents(true);
    return () => {
      enableAPIEvents(false);
    };
  }, []);
  const handleUnloadAllModels = useCallback(async () => {
    setIsUnloading(true);
    try {
@@ -27,98 +53,120 @@ export default function ModelsPage() {
    } catch (e) {
      console.error(e);
    } finally {
      // at least give it a second to show the unloading message
      setTimeout(() => {
        setIsUnloading(false);
      }, 1000);
    }
-  }, []);
+  }, [unloadAllModels]);
-  const [totalLines, totalTokens, avgTokensPerSecond] = useMemo(() => {
+  const toggleIdorName = useCallback(() => {
-    return processEvalTimes(upstreamLogs);
+    setShowIdorName((prev) => (prev === "name" ? "id" : "name"));
-  }, [upstreamLogs]);
+  }, [showIdorName]);
  return (
-    <div>
+    <div className="card h-full flex flex-col">
-      <div className="flex flex-col md:flex-row gap-4">
+      <div className="shrink-0">
-        {/* Left Column */}
+        <h2>Models</h2>
-        <div className="w-full md:w-1/2 flex items-top">
+        <div className="flex justify-between">
-          <div className="card w-full">
+          <div className="flex gap-2">
-            <h2 className="">Models</h2>
+            <button className="btn flex items-center gap-2" onClick={toggleIdorName} style={{ lineHeight: "1.2" }}>
-            <div className="flex justify-between">
+              <RiSwapBoxFill /> {showIdorName === "id" ? "ID" : "Name"}
-              <button className="btn" onClick={() => setShowUnlisted(!showUnlisted)} style={{ lineHeight: "1.2" }}>
+            </button>
                {showUnlisted ? "🟢 unlisted" : "⚫️ unlisted"}
              </button>
              <button className="btn" onClick={handleUnloadAllModels} disabled={isUnloading}>
                {isUnloading ? "Stopping ..." : "Stop All"}
              </button>
            </div>
-            <table className="w-full mt-4">
+            <button
-              <thead>
+              className="btn flex items-center gap-2"
-                <tr className="border-b border-primary">
+              onClick={() => setShowUnlisted(!showUnlisted)}
-                  <th className="text-left p-2">Name</th>
+              style={{ lineHeight: "1.2" }}
-                  <th className="text-left p-2"></th>
+            >
-                  <th className="text-left p-2">State</th>
+              {showUnlisted ? <RiEyeFill /> : <RiEyeOffFill />} unlisted
-                </tr>
+            </button>
              </thead>
              <tbody>
                {filteredModels.map((model) => (
                  <tr key={model.id} className="border-b hover:bg-secondary-hover border-border">
                    <td className="p-2">
                      <a href={`/upstream/${model.id}/`} className="underline" target="_blank">
                        {model.name !== "" ? model.name : model.id}
                      </a>
                      {model.description != "" && (
                        <p>
                          <em>{model.description}</em>
                        </p>
                      )}
                    </td>
                    <td className="p-2 w-[50px]">
                      <button
                        className="btn btn--sm"
                        disabled={model.state !== "stopped"}
                        onClick={() => loadModel(model.id)}
                      >
                        Load
                      </button>
                    </td>
                    <td className="p-2 w-[75px]">
                      <span className={`status status--${model.state}`}>{model.state}</span>
                    </td>
                  </tr>
                ))}
              </tbody>
            </table>
          </div>
          <button className="btn flex items-center gap-2" onClick={handleUnloadAllModels} disabled={isUnloading}>
            <RiStopCircleLine size="24" /> {isUnloading ? "Unloading..." : "Unload"}
          </button>
        </div>
      </div>
-        {/* Right Column */}
+      <div className="flex-1 overflow-y-auto">
-        <div className="w-full md:w-1/2 flex flex-col" style={{ height: "calc(100vh - 125px)" }}>
+        <table className="w-full">
-          <div className="card mb-4 min-h-[250px]">
+          <thead className="sticky top-0 bg-card z-10">
-            <h2>Log Stats</h2>
+            <tr className="border-b border-primary bg-surface">
-            <p className="italic my-2">note: eval logs from llama-server</p>
+              <th className="text-left p-2">{showIdorName === "id" ? "Model ID" : "Name"}</th>
-            <table className="w-full border border-gray-200">
+              <th className="text-left p-2"></th>
-              <tbody>
+              <th className="text-left p-2">State</th>
-                <tr className="border-b border-gray-200">
+            </tr>
-                  <td className="py-2 px-4 font-medium border-r border-gray-200">Requests</td>
+          </thead>
-                  <td className="py-2 px-4 text-right">{totalLines}</td>
+          <tbody>
-                </tr>
+            {filteredModels.map((model) => (
-                <tr className="border-b border-gray-200">
+              <tr key={model.id} className="border-b hover:bg-secondary-hover border-border">
-                  <td className="py-2 px-4 font-medium border-r border-gray-200">Total Tokens Generated</td>
+                <td className={`p-2 ${model.unlisted ? "text-txtsecondary" : ""}`}>
-                  <td className="py-2 px-4 text-right">{totalTokens}</td>
+                  <a href={`/upstream/${model.id}/`} className={`underline`} target="_blank">
-                </tr>
+                    {showIdorName === "id" ? model.id : model.name !== "" ? model.name : model.id}
-                <tr>
+                  </a>
-                  <td className="py-2 px-4 font-medium border-r border-gray-200">Average Tokens/Second</td>
+                  {model.description !== "" && (
-                  <td className="py-2 px-4 text-right">{avgTokensPerSecond}</td>
+                    <p className={model.unlisted ? "text-opacity-70" : ""}>
-                </tr>
+                      <em>{model.description}</em>
-              </tbody>
+                    </p>
-            </table>
+                  )}
-          </div>
+                </td>
-
+                <td className="p-2 w-[50px]">
-          <LogPanel id="modelsupstream" title="Upstream Logs" logData={upstreamLogs} />
+                  <button
-        </div>
+                    className="btn btn--sm"
                    disabled={model.state !== "stopped"}
                    onClick={() => loadModel(model.id)}
                  >
                    Load
                  </button>
                </td>
                <td className="p-2 w-[75px]">
                  <span className={`status status--${model.state}`}>{model.state}</span>
                </td>
              </tr>
            ))}
          </tbody>
        </table>
      </div>
    </div>
  );
 }
 function StatsPanel() {
  const { metrics } = useAPI();
  const [totalRequests, totalInputTokens, totalOutputTokens, avgTokensPerSecond] = useMemo(() => {
    const totalRequests = metrics.length;
    if (totalRequests === 0) {
      return [0, 0, 0, 0];
    }
    const totalInputTokens = metrics.reduce((sum, m) => sum + m.input_tokens, 0);
    const totalOutputTokens = metrics.reduce((sum, m) => sum + m.output_tokens, 0);
    const avgTokensPerSecond = (metrics.reduce((sum, m) => sum + m.tokens_per_second, 0) / totalRequests).toFixed(2);
    return [totalRequests, totalInputTokens, totalOutputTokens, avgTokensPerSecond];
  }, [metrics]);
  return (
    <div className="card">
      <div className="rounded-lg overflow-hidden border border-gray-200">
        <table className="w-full">
          <tbody>
            <tr>
              <th className="p-2 font-medium border-b border-gray-200 text-right">Requests</th>
              <th className="p-2 font-medium border-l border-b border-gray-200 text-right">Processed</th>
              <th className="p-2 font-medium border-l border-b border-gray-200 text-right">Generated</th>
              <th className="p-2 font-medium border-l border-b border-gray-200 text-right">Tokens/Sec</th>
            </tr>
            <tr>
              <td className="p-2 text-right border-r border-gray-200">{totalRequests}</td>
              <td className="p-2 text-right border-r border-gray-200">
                {new Intl.NumberFormat().format(totalInputTokens)}
              </td>
              <td className="p-2 text-right border-r border-gray-200">
                {new Intl.NumberFormat().format(totalOutputTokens)}
              </td>
              <td className="p-2 text-right">{avgTokensPerSecond}</td>
            </tr>
          </tbody>
        </table>
      </div>
    </div>
  );
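
The aggregation that `StatsPanel` performs inside `useMemo` can be sketched as a standalone function, which makes the empty-list case easier to check in isolation. This is a minimal sketch, not code from the commit: the `Metric` and `Summary` interfaces and the `summarize` helper are hypothetical names, and `Metric` models only the three fields the diff reads.

```typescript
// Hypothetical shape of one metrics entry; only the fields read by
// StatsPanel in the diff are modelled here.
interface Metric {
  input_tokens: number;
  output_tokens: number;
  tokens_per_second: number;
}

// Hypothetical result type: one field per column of the stats table.
interface Summary {
  totalRequests: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  avgTokensPerSecond: string;
}

// summarize is an illustrative helper, not part of the commit.
function summarize(metrics: Metric[]): Summary {
  const totalRequests = metrics.length;
  if (totalRequests === 0) {
    // Return every field so the caller never sees undefined.
    return { totalRequests: 0, totalInputTokens: 0, totalOutputTokens: 0, avgTokensPerSecond: "0.00" };
  }

  let totalInputTokens = 0;
  let totalOutputTokens = 0;
  let speedSum = 0;
  // Single pass over the list instead of three separate reduce calls.
  for (const m of metrics) {
    totalInputTokens += m.input_tokens;
    totalOutputTokens += m.output_tokens;
    speedSum += m.tokens_per_second;
  }

  return {
    totalRequests,
    totalInputTokens,
    totalOutputTokens,
    // Unweighted mean of the per-request rates, as in the diff.
    avgTokensPerSecond: (speedSum / totalRequests).toFixed(2),
  };
}

const example = summarize([
  { input_tokens: 10, output_tokens: 20, tokens_per_second: 5 },
  { input_tokens: 30, output_tokens: 40, tokens_per_second: 10 },
]);
console.log(example.avgTokensPerSecond); // "7.50"
```

Returning a named object rather than a positional array is a design choice of this sketch: a missing field then becomes a compile-time error instead of a silently `undefined` table cell.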
Author	SHA1	Message	Date
Benson Wong	7985e94ba4	add tokens processed to ui models page	2025-08-08 13:28:39 -07:00
Benson Wong	74556c3a36	Update bug-report.md [skip ci]	2025-08-08 09:52:05 -07:00
Benson Wong	5c381e4b30	Add gofmt linting to ci	2025-08-07 20:29:18 -07:00
Benson Wong	10569ed546	Fix model alias usage in upstream path (#230) Model alias values are not properly resolved and work in upstream/ path. Related to #229.	2025-08-07 20:16:56 -07:00
Benson Wong	5b10b3c23f	UI Tweaks (#228) * sort model names in UI * add toggle to show model id/name on UI model page	2025-08-07 11:07:03 -07:00
Benson Wong	45ea792a3a	Fix UI panel not saving position correctly	2025-08-06 14:02:22 -07:00
Benson Wong	1bc2802353	fix panels not saving sizing state	2025-08-06 14:00:21 -07:00
Benson Wong	701476c0c4	Update README.md - remove contributor block [skip ci] Contributor information available on the Github page's sidebar. Redundant.	2025-08-06 11:11:47 -07:00
Ben Greene	5c63e0066c	return models sorted by id in /v1/models (#222)	2025-08-06 10:04:52 -07:00
Martin Garton	8be5073c51	Fix typo (#223) [skip ci] Fix typo `lama-swap` -> `llama-swap`	2025-08-06 10:02:38 -07:00
Aaron Ang	6307bd3205	Add support for building Linux ARM64 binary in Makefile (#221)	2025-08-05 16:26:06 -07:00
Benson Wong	558a72de17	UI Improvements (#219) - use react-resizable-panels for UI - improve icons for buttons - improve mobile layout with drag/resize panels	2025-08-03 17:49:13 -07:00
Leoyzen	dc42cf366d	Add config monitor support for k8s configmap. (#217)	2025-08-03 08:05:48 -07:00
Ryein Goddard	ba0a81937a	Update README.md (#216) Update git clone protocol to https	2025-08-01 19:48:09 -07:00
Benson Wong	574fdfabb4	UI improvements (#213) * use two column for logs view on wider screens * hide log controls when panel is minimized	2025-07-31 11:59:21 -07:00
Benson Wong	5172cb2e12	Update docs in Readme [skip ci]	2025-07-30 11:51:14 -07:00
Benson Wong	5672cb03fd	Update github actions for notifying homebrew build (#212) Combine homebrew-llama-swap event with the release action	2025-07-30 11:29:03 -07:00
Benson Wong	0f583163f7	add /health (#211)	2025-07-30 10:37:10 -07:00
Benson Wong	7905fa9ea3	Update trigger-homebrew-update.yml [skip ci]	2025-07-30 10:13:49 -07:00
Ian Sebastian Mathew	bbaf172956	add trigger to rebuild homebrew formula (#210)	2025-07-30 10:12:21 -07:00
Benson Wong	fd50932dbc	Decouple MetricsMiddleware from downstream handlers (#206) * Decouple MetricsMiddleware from downstream handlers Remove ls-real-model-name optimization. Within proxyOAIHandler the request body's bytes are required for various rewriting features anyways. This negated any benefits from trying not to parse it twice.	2025-07-27 10:36:06 -07:00
Gaël James	8c693e7fcf	Add endpoint aliases for reranking models (#201) * Add endpoint aliases for reranking models * Add MetricsMiddleware to the previous reranking endpoint * Fix the embeddings endpoint not having model set	2025-07-24 08:32:47 -07:00
Benson Wong	8f2af26a41	fix stats on model page	2025-07-23 13:57:33 -07:00
Benson Wong	01d4838fb3	Fix token metrics parsing (#199) Fix #198 - use llama-server's `timings` info if available in response body - send "-1" for token/sec when not able to accurately calculate performance - optimize streaming body search for metrics information	2025-07-22 23:10:14 -07:00
Benson Wong	accd65294b	add contributors to README [skip ci]	2025-07-21 23:16:48 -07:00
Benson Wong	7472a25864	Update README.md [skip ci] update screenshot for web UI	2025-07-21 23:08:19 -07:00
Benson Wong	cce0bc6aa1	add guard to ensure ls-real-model-name is set in context	2025-07-21 22:59:41 -07:00
Benson Wong	36e25125e8	UI tidy [skip ci]	2025-07-21 22:47:55 -07:00
Benson Wong	9a54273d15	Update UI with new Activity event stream from #195 - use new metrics data instead of log parsing - auto-start events connection to server, improves responsiveness - remove unnecessary libraries and code	2025-07-21 22:42:30 -07:00
g2mt	87dce5f8f6	Add metrics logging for chat completion requests (#195) - Add token and performance metrics for v1/chat/completions - Add Activity Page in UI - Add /api/metrics endpoint Contributed by @g2mt	2025-07-21 22:19:55 -07:00
Benson Wong	307e619521	remove old eventsources from UI	2025-07-19 15:36:40 -07:00