proxy: implement setParamsByID filter (#535)

Add setParamsByID filter that applies different request parameters based on the requested model ID, enabling per-alias behaviour for a single loaded model.

- add SetParamsByID field to Filters struct and SanitizedSetParamsByID method
- substitute ${MODEL_ID} and other macros in setParamsByID keys and values
- validate no unknown macros remain in keys or values after substitution
- auto-register setParamsByID keys (other than the model's own ID) as aliases
- apply setParamsByID in proxyInferenceHandler after setParams (can override it)
- update config-schema.json with setParamsByID definition
- update UI to show aliases and make them selectable in the Playground

closes #534
ui: smart auto-scroll in LogPanel (#530)
2026-02-19 22:21:10 -08:00 · 2026-02-18 19:47:37 -08:00 · 2026-02-16 09:41:15 -08:00 · 2026-02-15 21:31:30 -08:00 · 2026-02-15 21:30:52 -08:00 · 2026-02-15 11:00:44 -08:00
35 changed files with 1257 additions and 134 deletions
@@ -4,7 +4,7 @@ early_access: false
 reviews:
  profile: "chill"
  request_changes_workflow: false
-  high_level_summary: true
+  high_level_summary: false
  poem: false
  review_status: true
  collapse_walkthrough: false
@@ -17,6 +17,13 @@ on:
      - 'docker/build-container.sh'
      - 'docker/*.Containerfile'

+# grant permissions on GITHUB_TOKEN to publish packages
+# ref: https://docs.github.com/en/packages/managing-github-packages-using-github-actions-workflows/publishing-and-installing-a-package-with-github-actions#publishing-a-package-using-an-action
+permissions:
+  contents: read
+  packages: write
+  id-token: write
+
 jobs:
  build-and-push:
    runs-on: ubuntu-latest
@@ -0,0 +1,50 @@
+## Project Description:
+
+llama-swap is a lightweight, transparent proxy server that provides automatic model swapping for llama.cpp's server.
+
+## Tech stack
+
+- golang
+- typescript, vite and svelte 5 for UI (located in ui/)
+
+## Workflow Tasks
+
+- when summarizing changes only include details that require further action
+- just say "Done." when there is no further action
+- use the github CLI `gh` to create pull requests and work with github
+- Rules for creating pull requests:
+  - keep them short and focused on changes.
+  - never include a test plan
+  - write the summary using the same style rules as commit message
+
+## Testing
+
+- Follow test naming conventions like `TestProxyManager_<test name>`, `TestProcessGroup_<test name>`, etc.
+- Use `go test -v -run <name pattern for new tests>` to run any new tests you've written.
+- Use `make test-dev` after running new tests for a quick overall test run. This runs `go test` and `staticcheck`. Fix any static checking errors. Use this only when changes are made to code under the `proxy/` directory.
+- Use `make test-all` before completing work. This includes long running concurrency tests.
+
+### Commit message example format:
+
+```
+proxy: add new feature
+
+Add new feature that implements functionality X and Y.
+
+- key change 1
+- key change 2
+- key change 3
+
+fixes #123
+```
+
+## Code Reviews
+
+- use three severity levels: High, Medium, Low
+- label each discovered issue with a label like H1, M2, L3 respectively
+- High severity issues are must-fix issues (security, race conditions, critical bugs)
+- Medium severity issues are recommended improvements (coding style, missing functionality, inconsistencies)
+- Low severity issues are nice-to-have changes and nits
+- Include a suggestion with each discovered item
+- Limit your code review to three items with the highest priority first
+- Double check your discovered items and recommended remediations
@@ -1,49 +1 @@
-## Project Description:
-
-llama-swap is a light weight, transparent proxy server that provides automatic model swapping to llama.cpp's server.
-
-## Tech stack
-
-- golang
-- typescript, vite and react for UI (located in ui/)
-
-## Workflow Tasks
-
-- when summarizing changes only include details that require further action
-- just say "Done." when there is no further action
-- use `gh` to create PRs and load issues
-- do include Co-Authored-By or created by when committing changes or creating PRs
-- keep PR descriptions short and focused on changes.
-  - never include a test plan
-
-## Testing
-
-- Follow test naming conventions like `TestProxyManager_<test name>`, `TestProcessGroup_<test name>`, etc.
-- Use `go test -v -run <name pattern for new tests>` to run any new tests you've written.
-- Use `make test-dev` after running new tests for a quick over all test run. This runs `go test` and `staticcheck`. Fix any static checking errors. Use this only when changes are made to any code under the `proxy/` directory
-- Use `make test-all` before completing work. This includes long running concurrency tests.
-
-### Commit message example format:
-
-```
-proxy: add new feature
-
-Add new feature that implements functionality X and Y.
-
-- key change 1
-- key change 2
-- key change 3
-
-fixes #123
-```
-
-## Code Reviews
-
-- use three levels High, Medium, Low severity
-- label each discovered issue with a label like H1, M2, L3 respectively
-- High severity are must fix issues (security, race conditions, critical bugs)
-- Medium severity are recommended improvements (coding style, missing functionality, inconsistencies)
-- Low severity are nice to have changes and nits
-- Include a suggestion with each discovered item
-- Limit your code review to three items with the highest priority first
-- Double check your discovered items and recommended remediations
+@AGENTS.md
@@ -200,11 +200,20 @@
                                "additionalProperties": true,
                                "default": {},
                                "description": "Dictionary of parameters to set/override in requests. Useful for enforcing specific parameter values. Protected params like 'model' cannot be overridden. Values can be strings, numbers, booleans, arrays, or objects."
+                            },
+                            "setParamsByID": {
+                                "type": "object",
+                                "additionalProperties": {
+                                    "type": "object",
+                                    "additionalProperties": true
+                                },
+                                "default": {},
+                                "description": "Dictionary mapping requested model IDs (or aliases) to parameters to set/override in requests. Applied after setParams and can override those values. Useful with aliases to vary behaviour depending on which alias the client used (e.g. different reasoning_effort per alias). Keys support ${MODEL_ID} macro substitution. Protected params like 'model' cannot be overridden."
                            }
                        },
                        "additionalProperties": false,
                        "default": {},
-                        "description": "Dictionary of filter settings. Supports stripParams and setParams."
+                        "description": "Dictionary of filter settings. Supports stripParams, setParams, and setParamsByID."
                    },
                    "metadata": {
                        "type": "object",
@@ -126,7 +126,7 @@ apiKeys:
 # - below are examples of all the settings a model can have
 models:
  # keys are the model names used in API requests
-  "llama":
+  "gpt-oss-120b":
    # macros: a dictionary of string substitutions specific to this model
    # - optional, default: empty dictionary
    # - macros defined here override macros defined in the global macros section
@@ -143,7 +143,7 @@ models:
    cmd: |
      # ${latest-llama} is a macro that is defined above
      ${latest-llama}
-      --model path/to/llama-8B-Q4_K_M.gguf
+      --model path/to/gpt-oss-120B.gguf
      --ctx-size ${default_ctx}
      --temperature ${temp}

@@ -151,13 +151,13 @@ models:
    # - optional, default: empty string
    # - if set, it will be used in the v1/models API response
    # - if not set, it will be omitted in the JSON model record
-    name: "llama 3.1 8B"
+    name: "gpt-oss 120B"

    # description: a description for the model
    # - optional, default: empty string
    # - if set, it will be used in the v1/models API response
    # - if not set, it will be omitted in the JSON model record
-    description: "A small but capable model used for quick testing"
+    description: "A thinking model from OpenAI"

    # env: define an array of environment variables to inject into cmd's environment
    # - optional, default: empty array
@@ -172,14 +172,6 @@ models:
    # - if you use a custom port in cmd this *must* be set
    proxy: http://127.0.0.1:8999

-    # aliases: alternative model names that this model configuration is used for
-    # - optional, default: empty array
-    # - aliases must be unique globally
-    # - useful for impersonating a specific model
-    aliases:
-      - "gpt-4o-mini"
-      - "gpt-3.5-turbo"
-
    # checkEndpoint: URL path to check if the server is ready
    # - optional, default: /health
    # - endpoint is expected to return an HTTP 200 response
@@ -197,7 +189,7 @@ models:
    # - optional, default: ""
    # - useful for when the upstream server expects a specific model name that
    #   is different from the model's ID
-    useModelName: "qwen:qwq"
+    useModelName: "openai/gpt-oss-120B"

    # filters: a dictionary of filter settings
    # - optional, default: empty dictionary
@@ -216,11 +208,38 @@ models:
      # - useful for enforcing specific parameter values
      # - protected params like "model" cannot be overridden
      # - values can be strings, numbers, booleans, arrays, or objects
+      # - always runs for the model
      setParams:
        # Example: enforce specific sampling parameters
        temperature: 0.7
        top_p: 0.9

+      # setParamsByID: a dictionary of parameters to set based on the requested model ID
+      # - optional, default: empty dictionary
+      # - combine with aliases to create variant behaviour without reloading the model
+      # - parameters are set in the request body JSON
+      # - runs after setParams, so it overrides any matching settings
+      # - protected params like "model" cannot be overridden
+      # - values can be strings, numbers, booleans, arrays, or objects
+      # - each key other than the model's own ID is automatically registered as an alias
+      setParamsByID:
+        "${MODEL_ID}":
+          chat_template_kwargs:
+            reasoning_effort: medium
+        "${MODEL_ID}:high":
+          chat_template_kwargs:
+            reasoning_effort: high
+        "${MODEL_ID}:low":
+          chat_template_kwargs:
+            reasoning_effort: low
+
+    # aliases: alternative model names that this model configuration is used for
+    # - optional, default: empty array
+    # - aliases must be unique globally
+    # - useful for impersonating a specific model
+    aliases:
+      - "gpt-4o-mini"
+
    # metadata: a dictionary of arbitrary values that are included in /v1/models
    # - optional, default: empty dictionary
    # - while metadata can contain complex types, it is recommended to keep it simple
@@ -142,7 +142,7 @@ for CONTAINER_TYPE in non-root root; do
  fi

  log_info "Building $CONTAINER_TYPE $CONTAINER_TAG $LS_VER"
-  docker build -f llama-swap.Containerfile --build-arg BASE_TAG=${BASE_TAG} --build-arg LS_VER=${LS_VER} --build-arg UID=${USER_UID} \
+  docker build --provenance=false -f llama-swap.Containerfile --build-arg BASE_TAG=${BASE_TAG} --build-arg LS_VER=${LS_VER} --build-arg UID=${USER_UID} \
    --build-arg LS_REPO=${LS_REPO} --build-arg GID=${USER_GID} --build-arg USER_HOME=${USER_HOME} -t ${CONTAINER_TAG} -t ${CONTAINER_LATEST} \
    --build-arg BASE_IMAGE=${BASE_IMAGE} .

@@ -150,7 +150,7 @@ for CONTAINER_TYPE in non-root root; do
  case "$ARCH" in
    "musa" | "vulkan")
      log_info "Adding sd-server to $CONTAINER_TAG"
-      docker build -f llama-swap-sd.Containerfile \
+      docker build --provenance=false -f llama-swap-sd.Containerfile \
        --build-arg BASE=${CONTAINER_TAG} \
        --build-arg SD_IMAGE=${SD_IMAGE} --build-arg SD_TAG=${SD_TAG} \
        --build-arg UID=${USER_UID} --build-arg GID=${USER_GID} \
@@ -294,6 +294,24 @@ func LoadConfigFromReader(r io.Reader) (Config, error) {
 			modelConfig.CheckEndpoint = strings.ReplaceAll(modelConfig.CheckEndpoint, macroSlug, macroStr)
 			modelConfig.Filters.StripParams = strings.ReplaceAll(modelConfig.Filters.StripParams, macroSlug, macroStr)

+			// Substitute macros in SetParamsByID keys and values
+			if len(modelConfig.Filters.SetParamsByID) > 0 {
+				newSetParamsByID := make(map[string]map[string]any, len(modelConfig.Filters.SetParamsByID))
+				for key, paramMap := range modelConfig.Filters.SetParamsByID {
+					newKey := strings.ReplaceAll(key, macroSlug, macroStr)
+					newValAny, err := substituteMacroInValue(any(paramMap), entry.Name, entry.Value)
+					if err != nil {
+						return Config{}, fmt.Errorf("model %s filters.setParamsByID: %s", modelId, err.Error())
+					}
+					newParamMap, ok := newValAny.(map[string]any)
+					if !ok {
+						return Config{}, fmt.Errorf("model %s filters.setParamsByID: unexpected type after macro substitution", modelId)
+					}
+					newSetParamsByID[newKey] = newParamMap
+				}
+				modelConfig.Filters.SetParamsByID = newSetParamsByID
+			}
+
 			// Substitute in metadata (type-preserving)
 			if len(modelConfig.Metadata) > 0 {
 				result, err := substituteMacroInValue(modelConfig.Metadata, entry.Name, entry.Value)
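The loop above rewrites every `setParamsByID` key with `strings.ReplaceAll` and passes each parameter map through `substituteMacroInValue`. Because that helper's body is not part of this diff, the following is only a standalone sketch of the same idea under that assumption: `substituteAll` and `expandSetParamsByID` are hypothetical names, and only string leaves are rewritten.

```go
package main

import (
	"fmt"
	"strings"
)

// substituteAll is a hypothetical helper: it recursively replaces slug with
// value in every string found in v, leaving numbers and booleans untouched.
func substituteAll(v any, slug, value string) any {
	switch t := v.(type) {
	case string:
		return strings.ReplaceAll(t, slug, value)
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, inner := range t {
			out[k] = substituteAll(inner, slug, value)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, inner := range t {
			out[i] = substituteAll(inner, slug, value)
		}
		return out
	default:
		return v
	}
}

// expandSetParamsByID is a hypothetical helper that substitutes one macro in
// both the keys and the nested values of a setParamsByID map.
func expandSetParamsByID(in map[string]map[string]any, slug, value string) map[string]map[string]any {
	out := make(map[string]map[string]any, len(in))
	for key, params := range in {
		newKey := strings.ReplaceAll(key, slug, value)
		out[newKey] = substituteAll(params, slug, value).(map[string]any)
	}
	return out
}

func main() {
	byID := map[string]map[string]any{
		"${MODEL_ID}:high": {"chat_template_kwargs": map[string]any{"reasoning_effort": "high"}},
		"${MODEL_ID}:low":  {"chat_template_kwargs": map[string]any{"reasoning_effort": "low"}},
	}
	expanded := expandSetParamsByID(byID, "${MODEL_ID}", "gpt-oss-120b")
	fmt.Println(expanded["gpt-oss-120b:high"])
}
```

One consequence worth keeping in mind: if two keys expand to the same string, whichever is visited last in map iteration order silently wins, both in this sketch and in the loop above.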
@@ -359,6 +377,34 @@ func LoadConfigFromReader(r io.Reader) (Config, error) {
 			}
 		}

+		// Validate SetParamsByID keys and values
+		for key, paramMap := range modelConfig.Filters.SetParamsByID {
+			if matches := macroPatternRegex.FindAllStringSubmatch(key, -1); len(matches) > 0 {
+				return Config{}, fmt.Errorf("unknown macro '${%s}' found in model %s filters.setParamsByID key", matches[0][1], modelId)
+			}
+			if err := validateNestedForUnknownMacros(any(paramMap), fmt.Sprintf("model %s filters.setParamsByID[%s]", modelId, key)); err != nil {
+				return Config{}, err
+			}
+		}
+
+		// Auto-register setParamsByID keys as aliases (skip the model's own ID)
+		for key := range modelConfig.Filters.SetParamsByID {
+			if key == modelId {
+				continue
+			}
+			if _, exists := config.Models[key]; exists {
+				return Config{}, fmt.Errorf("model %s filters.setParamsByID: key '%s' conflicts with an existing model ID", modelId, key)
+			}
+			if existingModel, exists := config.aliases[key]; exists {
+				if existingModel != modelId {
+					return Config{}, fmt.Errorf("duplicate alias '%s' in model %s filters.setParamsByID, already used by model %s", key, modelId, existingModel)
+				}
+				continue // already registered as explicit alias for this model
+			}
+			config.aliases[key] = modelId
+			modelConfig.Aliases = append(modelConfig.Aliases, key)
+		}
+
 		if _, err := url.Parse(modelConfig.Proxy); err != nil {
 			return Config{}, fmt.Errorf("model %s: invalid proxy URL: %w", modelId, err)
 		}
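The registration loop above skips the model's own ID, rejects keys that collide with another model's ID, rejects aliases already claimed by a different model, and tolerates a key that is already an explicit alias of the same model. A standalone sketch of the same rules, using a hypothetical `registry` type in place of the real `config.Models` and `config.aliases` fields:

```go
package main

import "fmt"

// registry is a hypothetical stand-in for the model and alias maps in Config.
type registry struct {
	models  map[string][]string // model ID -> its aliases
	aliases map[string]string   // alias -> owning model ID
}

// registerByIDKeys adds each setParamsByID key as an alias of modelID,
// following the same conflict rules as the diff above.
func (r *registry) registerByIDKeys(modelID string, keys []string) error {
	for _, key := range keys {
		if key == modelID {
			continue // the model's own ID is not an alias
		}
		if _, exists := r.models[key]; exists {
			return fmt.Errorf("model %s: key '%s' conflicts with an existing model ID", modelID, key)
		}
		if owner, exists := r.aliases[key]; exists {
			if owner != modelID {
				return fmt.Errorf("duplicate alias '%s' in model %s, already used by model %s", key, modelID, owner)
			}
			continue // already an explicit alias of this model
		}
		r.aliases[key] = modelID
		r.models[modelID] = append(r.models[modelID], key)
	}
	return nil
}

func main() {
	r := &registry{
		models:  map[string][]string{"model1": nil, "model2": nil},
		aliases: map[string]string{},
	}
	fmt.Println(r.registerByIDKeys("model1", []string{"model1", "model1:high", "model1:low"}))
	fmt.Println(r.models["model1"])
	fmt.Println(r.registerByIDKeys("model2", []string{"model1:high"}))
}
```

The sketch takes a slice so its output is stable. The real loop ranges over a map, so the order in which auto-aliases are appended to `modelConfig.Aliases` is not deterministic between runs.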
@@ -20,6 +20,12 @@ type Filters struct {
 	// SetParams is a dictionary of parameters to set/override in requests
 	// Protected params (like "model") cannot be set
 	SetParams map[string]any `yaml:"setParams"`
+
+	// SetParamsByID maps requested model IDs to parameters to set/override in requests.
+	// Useful with aliases: a single loaded model can behave differently depending on
+	// which alias the client used. Applied after SetParams, so it can override those values.
+	// Protected params (like "model") cannot be set.
+	SetParamsByID map[string]map[string]any `yaml:"setParamsByID"`
 }

 // SanitizedStripParams returns a sorted list of parameters to strip,
@@ -51,6 +57,33 @@ func (f Filters) SanitizedStripParams() []string {
 	return cleaned
 }

+// SanitizedSetParamsByID returns the params to set for the given requestedModelID,
+// with protected params removed and keys sorted for consistent iteration order.
+// Returns nil if the ID has no entry or all its params are protected.
+func (f Filters) SanitizedSetParamsByID(requestedModelID string) (map[string]any, []string) {
+	if len(f.SetParamsByID) == 0 {
+		return nil, nil
+	}
+	params, found := f.SetParamsByID[requestedModelID]
+	if !found || len(params) == 0 {
+		return nil, nil
+	}
+	result := make(map[string]any, len(params))
+	keys := make([]string, 0, len(params))
+	for key, value := range params {
+		if slices.Contains(ProtectedParams, key) {
+			continue
+		}
+		result[key] = value
+		keys = append(keys, key)
+	}
+	sort.Strings(keys)
+	if len(result) == 0 {
+		return nil, nil
+	}
+	return result, keys
+}
+
 // SanitizedSetParams returns a copy of SetParams with protected params removed
 // and keys sorted for consistent iteration order
 func (f Filters) SanitizedSetParams() (map[string]any, []string) {
@@ -162,6 +162,123 @@ func TestFilters_SanitizedSetParams(t *testing.T) {
 	}
 }

+func TestFilters_SanitizedSetParamsByID(t *testing.T) {
+	tests := []struct {
+		name             string
+		setParamsByID    map[string]map[string]any
+		requestedModelID string
+		wantParams       map[string]any
+		wantKeys         []string
+	}{
+		{
+			name:             "empty SetParamsByID returns nil",
+			setParamsByID:    nil,
+			requestedModelID: "model1",
+			wantParams:       nil,
+			wantKeys:         nil,
+		},
+		{
+			name:             "empty map returns nil",
+			setParamsByID:    map[string]map[string]any{},
+			requestedModelID: "model1",
+			wantParams:       nil,
+			wantKeys:         nil,
+		},
+		{
+			name: "non-matching model ID returns nil",
+			setParamsByID: map[string]map[string]any{
+				"model2": {"temperature": 0.9},
+			},
+			requestedModelID: "model1",
+			wantParams:       nil,
+			wantKeys:         nil,
+		},
+		{
+			name: "matching model ID returns correct params",
+			setParamsByID: map[string]map[string]any{
+				"model1": {"temperature": 0.7, "top_p": 0.9},
+				"model2": {"temperature": 0.5},
+			},
+			requestedModelID: "model1",
+			wantParams: map[string]any{
+				"temperature": 0.7,
+				"top_p":       0.9,
+			},
+			wantKeys: []string{"temperature", "top_p"},
+		},
+		{
+			name: "protected param model is filtered out",
+			setParamsByID: map[string]map[string]any{
+				"model1": {
+					"model":       "should-be-filtered",
+					"temperature": 0.7,
+				},
+			},
+			requestedModelID: "model1",
+			wantParams: map[string]any{
+				"temperature": 0.7,
+			},
+			wantKeys: []string{"temperature"},
+		},
+		{
+			name: "only protected param returns nil",
+			setParamsByID: map[string]map[string]any{
+				"model1": {
+					"model": "should-be-filtered",
+				},
+			},
+			requestedModelID: "model1",
+			wantParams:       nil,
+			wantKeys:         nil,
+		},
+		{
+			name: "keys are sorted",
+			setParamsByID: map[string]map[string]any{
+				"model1": {
+					"z_param": "z",
+					"a_param": "a",
+					"m_param": "m",
+				},
+			},
+			requestedModelID: "model1",
+			wantParams: map[string]any{
+				"z_param": "z",
+				"a_param": "a",
+				"m_param": "m",
+			},
+			wantKeys: []string{"a_param", "m_param", "z_param"},
+		},
+		{
+			name: "alias style key lookup",
+			setParamsByID: map[string]map[string]any{
+				"model1:high": {"reasoning_effort": "high"},
+				"model1:low":  {"reasoning_effort": "low"},
+			},
+			requestedModelID: "model1:high",
+			wantParams: map[string]any{
+				"reasoning_effort": "high",
+			},
+			wantKeys: []string{"reasoning_effort"},
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			f := Filters{SetParamsByID: tt.setParamsByID}
+			gotParams, gotKeys := f.SanitizedSetParamsByID(tt.requestedModelID)
+
+			if tt.wantParams == nil {
+				assert.Nil(t, gotParams)
+				assert.Nil(t, gotKeys)
+				return
+			}
+
+			assert.Equal(t, tt.wantKeys, gotKeys)
+			assert.Equal(t, tt.wantParams, gotParams)
+		})
+	}
+}
+
 func TestProtectedParams(t *testing.T) {
 	// Verify that "model" is protected
 	assert.Contains(t, ProtectedParams, "model")
@@ -73,6 +73,72 @@ models:
 	}
 }

+func TestConfig_SetParamsByIDAutoAlias(t *testing.T) {
+	content := `
+models:
+  model1:
+    cmd: path/to/cmd --port ${PORT}
+    filters:
+      setParamsByID:
+        "${MODEL_ID}:high":
+          reasoning_effort: high
+        "${MODEL_ID}:low":
+          reasoning_effort: low
+`
+	cfg, err := LoadConfigFromReader(strings.NewReader(content))
+	assert.NoError(t, err)
+
+	// Keys (other than the model's own ID) should be registered as aliases
+	realName, found := cfg.RealModelName("model1:high")
+	assert.True(t, found, "model1:high should be an auto-registered alias")
+	assert.Equal(t, "model1", realName)
+
+	realName, found = cfg.RealModelName("model1:low")
+	assert.True(t, found, "model1:low should be an auto-registered alias")
+	assert.Equal(t, "model1", realName)
+
+	// Auto-aliases should also appear in modelConfig.Aliases
+	aliases := cfg.Models["model1"].Aliases
+	assert.Contains(t, aliases, "model1:high")
+	assert.Contains(t, aliases, "model1:low")
+}
+
+func TestConfig_SetParamsByIDAutoAliasConflictWithModelID(t *testing.T) {
+	content := `
+models:
+  model1:
+    cmd: path/to/cmd --port ${PORT}
+    filters:
+      setParamsByID:
+        model2:
+          reasoning_effort: high
+  model2:
+    cmd: path/to/cmd --port ${PORT}
+`
+	_, err := LoadConfigFromReader(strings.NewReader(content))
+	assert.ErrorContains(t, err, "conflicts with an existing model ID")
+}
+
+func TestConfig_SetParamsByIDAutoAliasConflictWithOtherModel(t *testing.T) {
+	content := `
+models:
+  model1:
+    cmd: path/to/cmd --port ${PORT}
+    filters:
+      setParamsByID:
+        "shared-alias":
+          reasoning_effort: high
+  model2:
+    cmd: path/to/cmd --port ${PORT}
+    filters:
+      setParamsByID:
+        "shared-alias":
+          reasoning_effort: low
+`
+	_, err := LoadConfigFromReader(strings.NewReader(content))
+	assert.ErrorContains(t, err, "duplicate alias")
+}
+
 func TestConfig_ModelFiltersWithSetParams(t *testing.T) {
 	content := `
 models:
@@ -8,6 +8,7 @@ const ConfigFileChangedEventID = 0x03
 const LogDataEventID = 0x04
 const TokenMetricsEventID = 0x05
 const ModelPreloadedEventID = 0x06
+const InFlightRequestsEventID = 0x07

 type ProcessStateChangeEvent struct {
 	ProcessName string
@@ -58,3 +59,11 @@ type ModelPreloadedEvent struct {
 func (e ModelPreloadedEvent) Type() uint32 {
 	return ModelPreloadedEventID
 }
+
+type InFlightRequestsEvent struct {
+	Total int
+}
+
+func (e InFlightRequestsEvent) Type() uint32 {
+	return InFlightRequestsEventID
+}
@@ -240,7 +240,6 @@ func (mp *metricsMonitor) wrapHandler(
 			return nil
 		}
 	}
-
 	if strings.Contains(recorder.Header().Get("Content-Type"), "text/event-stream") {
 		if parsed, err := processStreamingResponse(modelID, recorder.StartTime(), body); err != nil {
 			mp.logger.Warnf("error processing streaming response: %v, path=%s, recording minimal metrics", err, request.URL.Path)
@@ -253,6 +252,14 @@ func (mp *metricsMonitor) wrapHandler(
 			usage := parsed.Get("usage")
 			timings := parsed.Get("timings")

+			// extract timings for infill - response is an array, timings are in the last element
+			// see #463
+			if strings.HasPrefix(request.URL.Path, "/infill") {
+				if arr := parsed.Array(); len(arr) > 0 {
+					timings = arr[len(arr)-1].Get("timings")
+				}
+			}
+
 			if usage.Exists() || timings.Exists() {
 				if parsedMetrics, err := parseMetrics(modelID, recorder.StartTime(), usage, timings); err != nil {
 					mp.logger.Warnf("error parsing metrics: %v, path=%s, recording minimal metrics", err, request.URL.Path)
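For `/infill`, the response body is a JSON array and, per the comment above (#463), the timings live in its last element. The diff reads them through its parsed JSON value's `Array()` and `Get("timings")`; the sketch below shows the same extraction with `encoding/json` instead. `infillTimings` and `lastTimings` are hypothetical names, and the struct covers only a few of the fields that appear in the test fixture.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// infillTimings holds a subset of the timing fields used by the metrics test.
type infillTimings struct {
	PromptN     int     `json:"prompt_n"`
	PredictedN  int     `json:"predicted_n"`
	PromptMs    float64 `json:"prompt_ms"`
	PredictedMs float64 `json:"predicted_ms"`
}

// lastTimings returns the timings object of the last array element, with
// ok=false when the array is empty or the last element has no timings.
func lastTimings(body []byte) (infillTimings, bool, error) {
	var chunks []struct {
		Timings *infillTimings `json:"timings"`
	}
	if err := json.Unmarshal(body, &chunks); err != nil {
		return infillTimings{}, false, err
	}
	if len(chunks) == 0 || chunks[len(chunks)-1].Timings == nil {
		return infillTimings{}, false, nil
	}
	return *chunks[len(chunks)-1].Timings, true, nil
}

func main() {
	body := []byte(`[{"content":"a"},{"content":"b","timings":{"prompt_n":150,"predicted_n":75,"prompt_ms":600,"predicted_ms":1800}}]`)
	tm, ok, err := lastTimings(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok, tm.PromptN, tm.PredictedN, tm.PromptMs+tm.PredictedMs) // prints true 150 75 2400
}
```

An empty array falls through to `ok=false`, which matches the test case that expects only minimal metrics to be recorded.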
@@ -384,6 +384,75 @@ data: [DONE]
 		assert.Equal(t, 0, metrics[0].InputTokens)
 		assert.Equal(t, 0, metrics[0].OutputTokens)
 	})
+
+	t.Run("infill request extracts timings from last array element", func(t *testing.T) {
+		mm := newMetricsMonitor(testLogger, 10, 0)
+
+		// Infill response is an array with timings in the last element
+		responseBody := `[
+			{"content": "first chunk"},
+			{"content": "second chunk"},
+			{"content": "final", "timings": {
+				"prompt_n": 150,
+				"predicted_n": 75,
+				"prompt_per_second": 200.5,
+				"predicted_per_second": 35.5,
+				"prompt_ms": 600.0,
+				"predicted_ms": 1800.0,
+				"cache_n": 30
+			}}
+		]`
+
+		nextHandler := func(modelID string, w http.ResponseWriter, r *http.Request) error {
+			w.Header().Set("Content-Type", "application/json")
+			w.WriteHeader(http.StatusOK)
+			w.Write([]byte(responseBody))
+			return nil
+		}
+
+		req := httptest.NewRequest("POST", "/infill", nil)
+		rec := httptest.NewRecorder()
+		ginCtx, _ := gin.CreateTestContext(rec)
+
+		err := mm.wrapHandler("test-model", ginCtx.Writer, req, nextHandler)
+		assert.NoError(t, err)
+
+		metrics := mm.getMetrics()
+		assert.Equal(t, 1, len(metrics))
+		assert.Equal(t, "test-model", metrics[0].Model)
+		assert.Equal(t, 150, metrics[0].InputTokens)
+		assert.Equal(t, 75, metrics[0].OutputTokens)
+		assert.Equal(t, 30, metrics[0].CachedTokens)
+		assert.Equal(t, 200.5, metrics[0].PromptPerSecond)
+		assert.Equal(t, 35.5, metrics[0].TokensPerSecond)
+		assert.Equal(t, 2400, metrics[0].DurationMs) // 600 + 1800
+	})
+
+	t.Run("infill request with empty array records minimal metrics", func(t *testing.T) {
+		mm := newMetricsMonitor(testLogger, 10, 0)
+
+		responseBody := `[]`
+
+		nextHandler := func(modelID string, w http.ResponseWriter, r *http.Request) error {
+			w.Header().Set("Content-Type", "application/json")
+			w.WriteHeader(http.StatusOK)
+			w.Write([]byte(responseBody))
+			return nil
+		}
+
+		req := httptest.NewRequest("POST", "/infill", nil)
+		rec := httptest.NewRecorder()
+		ginCtx, _ := gin.CreateTestContext(rec)
+
+		err := mm.wrapHandler("test-model", ginCtx.Writer, req, nextHandler)
+		assert.NoError(t, err)
+
+		metrics := mm.getMetrics()
+		assert.Equal(t, 1, len(metrics))
+		assert.Equal(t, "test-model", metrics[0].Model)
+		assert.Equal(t, 0, metrics[0].InputTokens)
+		assert.Equal(t, 0, metrics[0].OutputTokens)
+	})
 }

 func TestMetricsMonitor_ResponseBodyCopier(t *testing.T) {
@@ -28,6 +28,40 @@ const (

 type proxyCtxKey string

+type InflightCounter struct {
+	mu    sync.Mutex
+	total int
+}
+
+func newInflightCounter() *InflightCounter {
+	return &InflightCounter{}
+}
+
+func (ic *InflightCounter) Current() int {
+	ic.mu.Lock()
+	total := ic.total
+	ic.mu.Unlock()
+	return total
+}
+
+func (ic *InflightCounter) Increment() int {
+	ic.mu.Lock()
+	ic.total++
+	total := ic.total
+	ic.mu.Unlock()
+	return total
+}
+
+func (ic *InflightCounter) Decrement() int {
+	ic.mu.Lock()
+	if ic.total > 0 {
+		ic.total--
+	}
+	total := ic.total
+	ic.mu.Unlock()
+	return total
+}
+
 type ProxyManager struct {
 	sync.Mutex

@@ -43,6 +77,8 @@ type ProxyManager struct {

 	processGroups map[string]*ProcessGroup

+	inFlightCounter *InflightCounter
+
 	// shutdown signaling
 	shutdownCtx    context.Context
 	shutdownCancel context.CancelFunc
@@ -155,6 +191,8 @@ func New(proxyConfig config.Config) *ProxyManager {

 		processGroups: make(map[string]*ProcessGroup),

+		inFlightCounter: newInflightCounter(),
+
 		shutdownCtx:    shutdownCtx,
 		shutdownCancel: shutdownCancel,

@@ -276,37 +314,37 @@ func (pm *ProxyManager) setupGinEngine() {

 	// Set up routes using the Gin engine
 	// Protected routes use pm.apiKeyAuth() middleware
-	pm.ginEngine.POST("/v1/chat/completions", pm.apiKeyAuth(), pm.proxyInferenceHandler)
-	pm.ginEngine.POST("/v1/responses", pm.apiKeyAuth(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/chat/completions", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/responses", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
 	// Support legacy /v1/completions api, see issue #12
-	pm.ginEngine.POST("/v1/completions", pm.apiKeyAuth(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/completions", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
 	// Support anthropic /v1/messages (added https://github.com/ggml-org/llama.cpp/pull/17570)
-	pm.ginEngine.POST("/v1/messages", pm.apiKeyAuth(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/messages", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
 	// Support anthropic count_tokens API (Also added in the above PR)
-	pm.ginEngine.POST("/v1/messages/count_tokens", pm.apiKeyAuth(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/messages/count_tokens", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)

 	// Support embeddings and reranking
-	pm.ginEngine.POST("/v1/embeddings", pm.apiKeyAuth(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/embeddings", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)

 	// llama-server's /reranking endpoint + aliases
-	pm.ginEngine.POST("/reranking", pm.apiKeyAuth(), pm.proxyInferenceHandler)
-	pm.ginEngine.POST("/rerank", pm.apiKeyAuth(), pm.proxyInferenceHandler)
-	pm.ginEngine.POST("/v1/rerank", pm.apiKeyAuth(), pm.proxyInferenceHandler)
-	pm.ginEngine.POST("/v1/reranking", pm.apiKeyAuth(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/reranking", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/rerank", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/rerank", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/reranking", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)

 	// llama-server's /infill endpoint for code infilling
-	pm.ginEngine.POST("/infill", pm.apiKeyAuth(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/infill", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)

 	// llama-server's /completion endpoint
-	pm.ginEngine.POST("/completion", pm.apiKeyAuth(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/completion", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)

 	// Support audio/speech endpoint
-	pm.ginEngine.POST("/v1/audio/speech", pm.apiKeyAuth(), pm.proxyInferenceHandler)
-	pm.ginEngine.POST("/v1/audio/voices", pm.apiKeyAuth(), pm.proxyInferenceHandler)
-	pm.ginEngine.GET("/v1/audio/voices", pm.apiKeyAuth(), pm.proxyGETModelHandler)
-	pm.ginEngine.POST("/v1/audio/transcriptions", pm.apiKeyAuth(), pm.proxyOAIPostFormHandler)
-	pm.ginEngine.POST("/v1/images/generations", pm.apiKeyAuth(), pm.proxyInferenceHandler)
-	pm.ginEngine.POST("/v1/images/edits", pm.apiKeyAuth(), pm.proxyOAIPostFormHandler)
+	pm.ginEngine.POST("/v1/audio/speech", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/audio/voices", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
+	pm.ginEngine.GET("/v1/audio/voices", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyGETModelHandler)
+	pm.ginEngine.POST("/v1/audio/transcriptions", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyOAIPostFormHandler)
+	pm.ginEngine.POST("/v1/images/generations", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyInferenceHandler)
+	pm.ginEngine.POST("/v1/images/edits", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyOAIPostFormHandler)

 	pm.ginEngine.GET("/v1/models", pm.apiKeyAuth(), pm.listModelsHandler)

@@ -325,7 +363,7 @@ func (pm *ProxyManager) setupGinEngine() {
 	pm.ginEngine.GET("/upstream", func(c *gin.Context) {
 		c.Redirect(http.StatusFound, "/ui/models")
 	})
-	pm.ginEngine.Any("/upstream/*upstreamPath", pm.apiKeyAuth(), pm.proxyToUpstream)
+	pm.ginEngine.Any("/upstream/*upstreamPath", pm.apiKeyAuth(), pm.trackInflight(), pm.proxyToUpstream)
 	pm.ginEngine.GET("/unload", pm.apiKeyAuth(), pm.unloadAllModelsHandler)
 	pm.ginEngine.GET("/running", pm.apiKeyAuth(), pm.listRunningProcessesHandler)
 	pm.ginEngine.GET("/health", func(c *gin.Context) {
@@ -389,6 +427,14 @@ func (pm *ProxyManager) setupGinEngine() {
 	gin.DisableConsoleColor()
 }

+func (pm *ProxyManager) trackInflight() gin.HandlerFunc {
+	return func(c *gin.Context) {
+		event.Emit(InFlightRequestsEvent{Total: pm.inFlightCounter.Increment()})
+		defer func() { event.Emit(InFlightRequestsEvent{Total: pm.inFlightCounter.Decrement()}) }()
+		c.Next()
+	}
+}
+
 // ServeHTTP implements http.Handler interface
 func (pm *ProxyManager) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 	pm.ginEngine.ServeHTTP(w, r)
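In `trackInflight`, the decrement is deferred so that it fires after `c.Next()` returns. Go only postpones the deferred call itself, though: the call's arguments are evaluated as soon as the `defer` statement executes. The standalone sketch below illustrates the difference. The `counter` type is a hypothetical stand-in for `pm.inFlightCounter`, and a plain return value stands in for `c.Next()`. With the argument form, the counter is already decremented while the handler runs. With the closure form, it stays incremented until the handler returns.

```go
package main

import "sync/atomic"

// counter is a hypothetical stand-in for pm.inFlightCounter; Increment and
// Decrement return the new total, as trackInflight assumes.
type counter struct{ n atomic.Int64 }

func (c *counter) Increment() int { return int(c.n.Add(1)) }
func (c *counter) Decrement() int { return int(c.n.Add(-1)) }
func (c *counter) Current() int   { return int(c.n.Load()) }

// eagerDefer mimics `defer emit(c.Decrement())`: the argument, and therefore
// Decrement, is evaluated when the defer statement runs. The return value is
// what the counter reads while the wrapped handler would be running.
func eagerDefer(c *counter, emit func(int)) int {
	emit(c.Increment())
	defer emit(c.Decrement()) // Decrement happens here, not at return
	return c.Current()        // stands in for c.Next()
}

// closureDefer postpones the whole call, so Decrement runs on return.
func closureDefer(c *counter, emit func(int)) int {
	emit(c.Increment())
	defer func() { emit(c.Decrement()) }()
	return c.Current() // stands in for c.Next()
}
```

With the argument form, overlapping requests would never push the live total above 1. An SSE client connecting mid-request would also read 0 from `Current()`.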
@@ -674,6 +720,17 @@ func (pm *ProxyManager) proxyInferenceHandler(c *gin.Context) {
 			}
 		}

+		// setParamsByID: set params based on the requested model ID (runs after setParams, can override it)
+		setParamsByIDParams, setParamsByIDKeys := pm.config.Models[modelID].Filters.SanitizedSetParamsByID(requestedModel)
+		for _, key := range setParamsByIDKeys {
+			pm.proxyLogger.Debugf("<%s> setting param by id: %s", requestedModel, key)
+			bodyBytes, err = sjson.SetBytes(bodyBytes, key, setParamsByIDParams[key])
+			if err != nil {
+				pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error setting parameter %s in request", key))
+				return
+			}
+		}
+
 		pm.proxyLogger.Debugf("ProxyManager using local Process for model: %s", requestedModel)
 		nextHandler = processGroup.ProxyRequest
 	} else if pm.peerProxy != nil && pm.peerProxy.HasPeerModel(requestedModel) {
@@ -14,12 +14,13 @@ import (
 )

 type Model struct {
-	Id          string `json:"id"`
-	Name        string `json:"name"`
-	Description string `json:"description"`
-	State       string `json:"state"`
-	Unlisted    bool   `json:"unlisted"`
-	PeerID      string `json:"peerID"`
+	Id          string   `json:"id"`
+	Name        string   `json:"name"`
+	Description string   `json:"description"`
+	State       string   `json:"state"`
+	Unlisted    bool     `json:"unlisted"`
+	PeerID      string   `json:"peerID"`
+	Aliases     []string `json:"aliases,omitempty"`
 }

 func addApiHandlers(pm *ProxyManager) {
@@ -83,6 +84,7 @@ func (pm *ProxyManager) getModelStatus() []Model {
 			Description: pm.config.Models[modelID].Description,
 			State:       state,
 			Unlisted:    pm.config.Models[modelID].Unlisted,
+			Aliases:     pm.config.Models[modelID].Aliases,
 		})
 	}

@@ -107,6 +109,7 @@ const (
 	msgTypeModelStatus messageType = "modelStatus"
 	msgTypeLogData     messageType = "logData"
 	msgTypeMetrics     messageType = "metrics"
+	msgTypeInFlight    messageType = "inflight"
 )

 type messageEnvelope struct {
@@ -166,6 +169,18 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
 		}
 	}

+	sendInFlight := func(total int) {
+		jsonData, err := json.Marshal(gin.H{"total": total})
+		if err == nil {
+			select {
+			case sendBuffer <- messageEnvelope{Type: msgTypeInFlight, Data: string(jsonData)}:
+			case <-ctx.Done():
+				return
+			default:
+			}
+		}
+	}
+
 	/**
 	 * Send updated models list
 	 */
@@ -193,11 +208,19 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
 		sendMetrics([]TokenMetrics{e.Metrics})
 	})()

+	/**
+	 * Send the in-flight request count (shown as "Waiting: N" in the token stats panel).
+	 */
+	defer event.On(func(e InFlightRequestsEvent) {
+		sendInFlight(e.Total)
+	})()
+
 	// send initial batch of data
 	sendLogData("proxy", pm.proxyLogger.GetHistory())
 	sendLogData("upstream", pm.upstreamLogger.GetHistory())
 	sendModels()
 	sendMetrics(pm.metricsMonitor.getMetrics())
+	sendInFlight(pm.inFlightCounter.Current())

 	for {
 		select {
@@ -1046,6 +1046,61 @@ func TestProxyManager_FiltersStripParams(t *testing.T) {
 	// t.Logf("%v", response)
 }

+func TestProxyManager_FiltersSetParamsByID(t *testing.T) {
+	// no explicit aliases — setParamsByID keys are auto-registered as aliases
+	configStr := strings.Replace(`
+logLevel: error
+models:
+  model1:
+    cmd: 'SRPATH --port ${PORT} --silent --respond model1'
+    proxy: "http://127.0.0.1:${PORT}"
+    filters:
+      setParams:
+        reasoning_effort: medium
+      setParamsByID:
+        "${MODEL_ID}:high":
+          reasoning_effort: high
+        "${MODEL_ID}:low":
+          reasoning_effort: low
+`, "SRPATH", simpleResponderPath, -1)
+
+	cfg, err := config.LoadConfigFromReader(strings.NewReader(configStr))
+	if !assert.NoError(t, err, "invalid test configuration") {
+		return
+	}
+
+	proxy := New(cfg)
+	defer proxy.StopProcesses(StopWaitForInflightRequest)
+
+	tests := []struct {
+		requestedModel string
+		wantEffort     string
+	}{
+		// setParams applies, no setParamsByID match
+		{requestedModel: "model1", wantEffort: "medium"},
+		// setParamsByID overrides setParams
+		{requestedModel: "model1:high", wantEffort: "high"},
+		{requestedModel: "model1:low", wantEffort: "low"},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.requestedModel, func(t *testing.T) {
+			reqBody := fmt.Sprintf(`{"model":%q}`, tt.requestedModel)
+			req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewBufferString(reqBody))
+			w := CreateTestResponseRecorder()
+			proxy.ServeHTTP(w, req)
+			assert.Equal(t, http.StatusOK, w.Code)
+
+			var response map[string]interface{}
+			assert.NoError(t, json.Unmarshal(w.Body.Bytes(), &response))
+
+			requestBody, _ := response["request_body"].(string)
+			gotEffort := gjson.Get(requestBody, "reasoning_effort").String()
+			assert.Equal(t, tt.wantEffort, gotEffort, "reasoning_effort mismatch for model %s", tt.requestedModel)
+		})
+	}
+}
+
 func TestProxyManager_HealthEndpoint(t *testing.T) {
 	config := config.AddDefaultGroupToConfig(config.Config{
 		HealthCheckTimeout: 15,
@@ -6,23 +6,28 @@
  import Models from "./routes/Models.svelte";
  import Activity from "./routes/Activity.svelte";
  import Playground from "./routes/Playground.svelte";
+  import PlaygroundStub from "./routes/PlaygroundStub.svelte";
  import { enableAPIEvents } from "./stores/api";
  import { initScreenWidth, isDarkMode, appTitle, connectionState } from "./stores/theme";
+  import { currentRoute } from "./stores/route";

  const routes = {
-    "/": Playground,
+    "/": PlaygroundStub,
    "/models": Models,
    "/logs": LogViewer,
    "/activity": Activity,
-    "*": Playground,
+    "*": PlaygroundStub,
  };

-  // Sync theme to document attribute
+  function handleRouteLoaded(event: { detail: { route: string | RegExp } }) {
+    const route = event.detail.route;
+    currentRoute.set(typeof route === "string" ? route : "/");
+  }
+
  $effect(() => {
    document.documentElement.setAttribute("data-theme", $isDarkMode ? "dark" : "light");
  });

-  // Sync title to document
  $effect(() => {
    const icon = $connectionState === "connecting" ? "\u{1F7E1}" : $connectionState === "connected" ? "\u{1F7E2}" : "\u{1F534}";
    document.title = `${icon} ${$appTitle}`;
@@ -43,6 +48,11 @@
  <Header />

  <main class="flex-1 overflow-auto p-4">
-    <Router {routes} />
+    <div class="h-full" class:hidden={$currentRoute !== "/"}>
+      <Playground />
+    </div>
+    <div class="h-full" class:hidden={$currentRoute === "/"}>
+      <Router {routes} on:routeLoaded={handleRouteLoaded} />
+    </div>
  </main>
 </div>
@@ -1,6 +1,8 @@
 <script lang="ts">
-  import { link, location } from "svelte-spa-router";
+  import { link } from "svelte-spa-router";
  import { screenWidth, toggleTheme, isDarkMode, appTitle, isNarrow } from "../stores/theme";
+  import { currentRoute } from "../stores/route";
+  import { playgroundActivity } from "../stores/playgroundActivity";
  import ConnectionStatus from "./ConnectionStatus.svelte";

  function handleTitleChange(newTitle: string): void {
@@ -22,9 +24,10 @@
    handleTitleChange(target.textContent || "(set title)");
  }

-  function isActive(path: string, currentLocation: string): boolean {
-    return path === "/" ? currentLocation === "/" : currentLocation.startsWith(path);
+  function isActive(path: string, current: string): boolean {
+    return path === "/" ? current === "/" : current.startsWith(path);
  }
+
 </script>

 <header
@@ -47,8 +50,7 @@
    <a
      href="/"
      use:link
-      class="text-gray-600 hover:text-black dark:text-gray-300 dark:hover:text-gray-100 p-1 whitespace-nowrap"
-      class:font-semibold={isActive("/", $location)}
+      class="p-1 whitespace-nowrap {isActive('/', $currentRoute) ? 'font-semibold' : ''} {$playgroundActivity ? 'activity-link' : 'text-gray-600 hover:text-black dark:text-gray-300 dark:hover:text-gray-100'}"
    >
      Playground
    </a>
@@ -56,7 +58,7 @@
      href="/models"
      use:link
      class="text-gray-600 hover:text-black dark:text-gray-300 dark:hover:text-gray-100 p-1 whitespace-nowrap"
-      class:font-semibold={isActive("/models", $location)}
+      class:font-semibold={isActive("/models", $currentRoute)}
    >
      Models
    </a>
@@ -64,7 +66,7 @@
      href="/activity"
      use:link
      class="text-gray-600 hover:text-black dark:text-gray-300 dark:hover:text-gray-100 p-1 whitespace-nowrap"
-      class:font-semibold={isActive("/activity", $location)}
+      class:font-semibold={isActive("/activity", $currentRoute)}
    >
      Activity
    </a>
@@ -72,7 +74,7 @@
      href="/logs"
      use:link
      class="text-gray-600 hover:text-black dark:text-gray-300 dark:hover:text-gray-100 p-1 whitespace-nowrap"
-      class:font-semibold={isActive("/logs", $location)}
+      class:font-semibold={isActive("/logs", $currentRoute)}
    >
      Logs
    </a>
@@ -96,3 +98,23 @@
    <ConnectionStatus />
  </menu>
 </header>
+
+<style>
+  .activity-link {
+    background: linear-gradient(90deg, #6366f1, #8b5cf6, #a855f7, #8b5cf6, #6366f1);
+    background-size: 200% 100%;
+    -webkit-background-clip: text;
+    background-clip: text;
+    -webkit-text-fill-color: transparent;
+    animation: gradient-shift 2s linear infinite;
+  }
+
+  @keyframes gradient-shift {
+    0% {
+      background-position: 0% 50%;
+    }
+    100% {
+      background-position: 200% 50%;
+    }
+  }
+</style>
@@ -65,10 +65,17 @@
  });

  let preElement: HTMLPreElement;
+  let userScrolledUp = $state(false);

-  // Auto scroll to bottom when logs change
+  function handleScroll() {
+    if (!preElement) return;
+    const { scrollTop, scrollHeight, clientHeight } = preElement;
+    userScrolledUp = scrollHeight - scrollTop - clientHeight > 40;
+  }
+
+  // Auto scroll to bottom when logs change, unless user has scrolled up
  $effect(() => {
-    if (preElement && filteredLogs) {
+    if (preElement && filteredLogs && !userScrolledUp) {
      preElement.scrollTop = preElement.scrollHeight;
    }
  });
@@ -127,6 +134,6 @@
    {/if}
  </div>
  <div class="rounded-lg bg-background font-mono text-sm flex-1 overflow-hidden">
-    <pre bind:this={preElement} class="{textWrapClass} {fontSizeClass} h-full overflow-auto p-4">{filteredLogs}</pre>
+    <pre bind:this={preElement} onscroll={handleScroll} class="{textWrapClass} {fontSizeClass} h-full overflow-auto p-4">{filteredLogs}</pre>
  </div>
 </div>
@@ -165,6 +165,9 @@
              {#if model.description}
                <p class={model.unlisted ? "text-opacity-70" : ""}><em>{model.description}</em></p>
              {/if}
+              {#if model.aliases && model.aliases.length > 0}
+                <p class="text-xs text-txtsecondary">Aliases: {model.aliases.join(", ")}</p>
+              {/if}
            </td>
            <td class="w-12">
              {#if model.state === "stopped"}
@@ -1,5 +1,5 @@
 <script lang="ts">
-  import { metrics } from "../stores/api";
+  import { inFlightRequests, metrics } from "../stores/api";
  import TokenHistogram from "./TokenHistogram.svelte";

  interface HistogramData {
@@ -15,7 +15,14 @@
  let stats = $derived.by(() => {
    const totalRequests = $metrics.length;
    if (totalRequests === 0) {
-      return { totalRequests: 0, totalInputTokens: 0, totalOutputTokens: 0, tokenStats: { p99: "0", p95: "0", p50: "0" }, histogramData: null };
+      return {
+        totalRequests: 0,
+        totalInputTokens: 0,
+        totalOutputTokens: 0,
+        inFlightRequests: $inFlightRequests,
+        tokenStats: { p99: "0", p95: "0", p50: "0" },
+        histogramData: null,
+      };
    }

    const totalInputTokens = $metrics.reduce((sum, m) => sum + m.input_tokens, 0);
@@ -24,7 +31,14 @@
    // Calculate token statistics using output_tokens and duration_ms
    const validMetrics = $metrics.filter((m) => m.duration_ms > 0 && m.output_tokens > 0);
    if (validMetrics.length === 0) {
-      return { totalRequests, totalInputTokens, totalOutputTokens, tokenStats: { p99: "0", p95: "0", p50: "0" }, histogramData: null };
+      return {
+        totalRequests,
+        totalInputTokens,
+        totalOutputTokens,
+        inFlightRequests: $inFlightRequests,
+        tokenStats: { p99: "0", p95: "0", p50: "0" },
+        histogramData: null,
+      };
    }

    // Calculate tokens/second for each valid metric
@@ -63,6 +77,7 @@
      totalRequests,
      totalInputTokens,
      totalOutputTokens,
+      inFlightRequests: $inFlightRequests,
      tokenStats: {
        p99: p99.toFixed(2),
        p95: p95.toFixed(2),
@@ -95,7 +110,12 @@

      <tbody class="bg-surface divide-y divide-card-border-inner">
        <tr class="hover:bg-secondary">
-          <td class="px-4 py-4 text-sm font-semibold text-gray-900 dark:text-white">{stats.totalRequests}</td>
+          <td class="px-4 py-4 text-sm font-semibold text-gray-900 dark:text-white">
+            <div class="flex flex-col gap-1">
+              <span class="text-xs font-medium text-gray-500 dark:text-gray-400">Completed: {nf.format(stats.totalRequests)}</span>
+              <span class="text-xs font-medium text-gray-500 dark:text-gray-400">Waiting: {nf.format(stats.inFlightRequests)}</span>
+            </div>
+          </td>

          <td class="px-4 py-4 text-sm text-gray-700 dark:text-gray-300 border-l border-gray-200 dark:border-white/10">
            <div class="flex items-center gap-2">
@@ -2,6 +2,7 @@
  import { models } from "../../stores/api";
  import { persistentStore } from "../../stores/persistent";
  import { transcribeAudio } from "../../lib/audioApi";
+  import { playgroundStores } from "../../stores/playgroundActivity";
  import ModelSelector from "./ModelSelector.svelte";

  const selectedModelStore = persistentStore<string>("playground-audio-model", "");
@@ -22,6 +23,10 @@

  let canTranscribe = $derived(selectedFile !== null && $selectedModelStore !== "" && !isTranscribing);

+  $effect(() => {
+    playgroundStores.audioTranscribing.set(isTranscribing);
+  });
+
  function validateFile(file: File): { valid: boolean; error?: string } {
    const ext = '.' + file.name.split('.').pop()?.toLowerCase();

@@ -2,6 +2,7 @@
  import { models } from "../../stores/api";
  import { persistentStore } from "../../stores/persistent";
  import { streamChatCompletion } from "../../lib/chatApi";
+  import { playgroundStores } from "../../stores/playgroundActivity";
  import type { ChatMessage, ContentPart } from "../../lib/types";
  import ChatMessageComponent from "./ChatMessage.svelte";
  import ModelSelector from "./ModelSelector.svelte";
@@ -11,7 +12,16 @@
  const systemPromptStore = persistentStore<string>("playground-system-prompt", "");
  const temperatureStore = persistentStore<number>("playground-temperature", 0.7);

-  let messages = $state<ChatMessage[]>([]);
+  function loadMessages(): ChatMessage[] {
+    try {
+      const saved = localStorage.getItem("playground-messages");
+      return saved ? JSON.parse(saved) : [];
+    } catch {
+      return [];
+    }
+  }
+
+  let messages = $state<ChatMessage[]>(loadMessages());
  let userInput = $state("");
  let isStreaming = $state(false);
  let isReasoning = $state(false);
@@ -24,21 +34,52 @@
  let imageError = $state<string | null>(null);

  let hasModels = $derived($models.some((m) => !m.unlisted));
+  let userScrolledUp = $state(false);

-  // Auto-scroll when messages change
  $effect(() => {
-    if (messages.length > 0 && messagesContainer) {
+    playgroundStores.chatStreaming.set(isStreaming);
+  });
+
+  function handleMessagesScroll() {
+    if (!messagesContainer) return;
+    const { scrollTop, scrollHeight, clientHeight } = messagesContainer;
+    // Consider "at bottom" if within 40px of the bottom
+    userScrolledUp = scrollHeight - scrollTop - clientHeight > 40;
+  }
+
+  // Auto-scroll when messages change — skip if user scrolled up
+  $effect(() => {
+    if (messages.length > 0 && messagesContainer && !userScrolledUp) {
      messagesContainer.scrollTo({
        top: messagesContainer.scrollHeight,
-        behavior: "smooth",
+        behavior: isStreaming ? "instant" : "smooth",
      });
    }
  });

+  // Persist messages to localStorage (throttled to once per 2s)
+  let lastSaveTime = 0;
+  $effect(() => {
+    const json = JSON.stringify(messages);
+    const elapsed = Date.now() - lastSaveTime;
+    const save = () => {
+      try { localStorage.setItem("playground-messages", json); } catch {}
+      lastSaveTime = Date.now();
+    };
+    if (elapsed >= 2000) {
+      save();
+      return;
+    }
+    const timer = setTimeout(save, 2000 - elapsed);
+    return () => clearTimeout(timer);
+  });
+
  async function sendMessage() {
    const trimmedInput = userInput.trim();
    if ((!trimmedInput && attachedImages.length === 0) || !$selectedModelStore || isStreaming) return;

+    userScrolledUp = false;
+
    // Build message content (multimodal if images attached)
    let content: string | ContentPart[];
    if (attachedImages.length > 0) {
@@ -321,6 +362,7 @@
    <div
      class="flex-1 overflow-y-auto mb-4 px-2"
      bind:this={messagesContainer}
+      onscroll={handleMessagesScroll}
    >
      {#if messages.length === 0}
        <div class="h-full flex items-center justify-center text-txtsecondary">
@@ -1,5 +1,6 @@
 <script lang="ts">
-  import { renderMarkdown, escapeHtml } from "../../lib/markdown";
+  import { renderMarkdown, escapeHtml, renderStreamingMarkdown, createStreamingCache } from "../../lib/markdown";
+  import type { RenderedBlock } from "../../lib/markdown";
  import { Copy, Check, Pencil, X, Save, RefreshCw, ChevronDown, ChevronRight, Brain, Code } from "lucide-svelte";
  import { getTextContent, getImageUrls } from "../../lib/types";
  import type { ContentPart } from "../../lib/types";
@@ -22,11 +23,17 @@
  let hasImages = $derived(imageUrls.length > 0);
  let canEdit = $derived(onEdit !== undefined && !hasImages);

-  let renderedContent = $derived(
-    role === "assistant" && !isStreaming
-      ? renderMarkdown(textContent)
-      : escapeHtml(textContent).replace(/\n/g, '<br>')
-  );
+  let streamingCache = createStreamingCache();
+  let renderedParts = $derived.by(() => {
+    if (role !== "assistant") {
+      return { blocks: [{ id: -1, html: escapeHtml(textContent).replace(/\n/g, '<br>') }] as RenderedBlock[], pendingHtml: "" };
+    }
+    if (!isStreaming) {
+      streamingCache = createStreamingCache();
+      return { blocks: [{ id: -1, html: renderMarkdown(textContent) }] as RenderedBlock[], pendingHtml: "" };
+    }
+    return renderStreamingMarkdown(textContent, streamingCache);
+  });
  let copied = $state(false);
  let showRaw = $state(false);
  let isEditing = $state(false);
@@ -113,9 +120,9 @@

 <div class="flex {role === 'user' ? 'justify-end' : 'justify-start'} mb-4">
  <div
-    class="relative group max-w-[85%] rounded-lg px-4 py-2 {role === 'user'
-      ? 'bg-primary text-btn-primary-text'
-      : 'bg-surface border border-gray-200 dark:border-white/10'}"
+    class="relative group rounded-lg px-4 py-2 {role === 'user'
+      ? 'max-w-[85%] bg-primary text-btn-primary-text'
+      : 'w-full sm:w-4/5 bg-surface border border-gray-200 dark:border-white/10'}"
  >
    {#if role === "assistant"}
      {#if reasoning_content || isReasoning}
@@ -168,7 +175,10 @@
        <div class="whitespace-pre-wrap font-mono text-sm">{textContent}</div>
      {:else}
        <div class="prose prose-sm dark:prose-invert max-w-none">
-          {@html renderedContent}
+          {#each renderedParts.blocks as block (block.id)}
+            {@html block.html}
+          {/each}
+          {@html renderedParts.pendingHtml}
          {#if isStreaming && !isReasoning}
            <span class="inline-block w-2 h-4 bg-current animate-pulse ml-0.5"></span>
          {/if}
@@ -2,6 +2,7 @@
  import { models } from "../../stores/api";
  import { persistentStore } from "../../stores/persistent";
  import { generateImage } from "../../lib/imageApi";
+  import { playgroundStores } from "../../stores/playgroundActivity";
  import ModelSelector from "./ModelSelector.svelte";
  import ExpandableTextarea from "./ExpandableTextarea.svelte";

@@ -17,6 +18,10 @@

  let hasModels = $derived($models.some((m) => !m.unlisted));

+  $effect(() => {
+    playgroundStores.imageGenerating.set(isGenerating);
+  });
+
  async function generate() {
    const trimmedPrompt = prompt.trim();
    if (!trimmedPrompt || !$selectedModelStore || isGenerating) return;
@@ -25,6 +25,11 @@
      <optgroup label="Local">
        {#each grouped.local as model (model.id)}
          <option value={model.id}>{model.id}</option>
+          {#if model.aliases}
+            {#each model.aliases as alias (alias)}
+              <option value={alias}>&nbsp;&nbsp;↳ {alias}</option>
+            {/each}
+          {/if}
        {/each}
      </optgroup>
    {/if}
@@ -2,6 +2,7 @@
  import { models } from "../../stores/api";
  import { persistentStore } from "../../stores/persistent";
  import { generateSpeech } from "../../lib/speechApi";
+  import { playgroundStores } from "../../stores/playgroundActivity";
  import ModelSelector from "./ModelSelector.svelte";
  import ExpandableTextarea from "./ExpandableTextarea.svelte";

@@ -20,11 +21,9 @@
  let availableVoices = $state<string[]>(["coral", "alloy", "echo", "fable", "onyx", "nova", "shimmer"]);
  let isLoadingVoices = $state(false);

-  // Default voices to fall back to if API call fails
  const defaultVoices = ["coral", "alloy", "echo", "fable", "onyx", "nova", "shimmer"];
  const CACHE_KEY = "playground-speech-voices-cache";

-  // Load voices cache from localStorage
  function getVoicesCache(): Record<string, string[]> {
    if (typeof window === "undefined") return {};
    try {
@@ -35,7 +34,6 @@
    }
  }

-  // Save voices cache to localStorage
  function saveVoicesCache(cache: Record<string, string[]>) {
    if (typeof window === "undefined") return;
    try {
@@ -47,9 +45,12 @@

  let hasModels = $derived($models.some((m) => !m.unlisted));

-  // Track if this is the initial page load to avoid fetching on refresh
  let isInitialLoad = $state(true);

+  $effect(() => {
+    playgroundStores.speechGenerating.set(isGenerating);
+  });
+
  // On page load, restore cached voices for the selected model if available
  $effect(() => {
    const model = $selectedModelStore;
@@ -1,5 +1,5 @@
 import { describe, it, expect } from "vitest";
-import { renderMarkdown, escapeHtml } from "./markdown";
+import { renderMarkdown, escapeHtml, splitCompleteBlocks, closePendingBlock, normalizeLatexDelimiters, renderStreamingMarkdown, createStreamingCache } from "./markdown";

 describe("renderMarkdown", () => {
  describe("basic markdown", () => {
@@ -130,6 +130,35 @@ More text here.
      expect(result).toContain("katex");
      expect(result).toContain("sqrt");
    });
+
+    it("renders \\[...\\] display math", () => {
+      const result = renderMarkdown("\\[\nx^2 + y^2 = z^2\n\\]");
+      expect(result).toContain("katex");
+    });
+
+    it("renders \\(...\\) inline math", () => {
+      const result = renderMarkdown("The equation \\(E = mc^2\\) is famous.");
+      expect(result).toContain("katex");
+    });
+  });
+
+  describe("normalizeLatexDelimiters", () => {
+    it("converts \\[...\\] to $$...$$", () => {
+      expect(normalizeLatexDelimiters("\\[\nx^2\n\\]")).toBe("$$\nx^2\n$$");
+    });
+
+    it("converts \\(...\\) to $...$", () => {
+      expect(normalizeLatexDelimiters("\\(x^2\\)")).toBe("$x^2$");
+    });
+
+    it("leaves $$ and $ delimiters unchanged", () => {
+      const text = "$$x^2$$ and $y$";
+      expect(normalizeLatexDelimiters(text)).toBe(text);
+    });
+
+    it("handles multiple occurrences", () => {
+      expect(normalizeLatexDelimiters("\\(a\\) and \\(b\\)")).toBe("$a$ and $b$");
+    });
  });

  describe("escapeHtml", () => {
@@ -158,3 +187,237 @@ More text here.
    });
  });
 });
+
+describe("splitCompleteBlocks", () => {
+  it("returns everything as pending when no blank line", () => {
+    const result = splitCompleteBlocks("Hello world");
+    expect(result.complete).toBe("");
+    expect(result.pending).toBe("Hello world");
+  });
+
+  it("returns empty for empty input", () => {
+    const result = splitCompleteBlocks("");
+    expect(result.complete).toBe("");
+    expect(result.pending).toBe("");
+  });
+
+  it("splits on blank line between paragraphs", () => {
+    const result = splitCompleteBlocks("First paragraph.\n\nSecond paragraph");
+    expect(result.complete).toBe("First paragraph.\n");
+    expect(result.pending).toBe("Second paragraph");
+  });
+
+  it("splits multiple paragraphs at last blank line", () => {
+    const result = splitCompleteBlocks("Para 1.\n\nPara 2.\n\nPara 3");
+    expect(result.complete).toBe("Para 1.\n\nPara 2.\n");
+    expect(result.pending).toBe("Para 3");
+  });
+
+  it("treats closed code fence as complete boundary", () => {
+    const text = "```js\nconst x = 1;\n```\nMore text";
+    const result = splitCompleteBlocks(text);
+    expect(result.complete).toBe("```js\nconst x = 1;\n```");
+    expect(result.pending).toBe("More text");
+  });
+
+  it("treats unclosed code fence as pending", () => {
+    const text = "Done paragraph.\n\n```js\nconst x = 1;";
+    const result = splitCompleteBlocks(text);
+    expect(result.complete).toBe("Done paragraph.\n");
+    expect(result.pending).toBe("```js\nconst x = 1;");
+  });
+
+  it("does not split on blank lines inside code fences", () => {
+    const text = "```\nline1\n\nline2\n```";
+    const result = splitCompleteBlocks(text);
+    expect(result.complete).toBe("```\nline1\n\nline2\n```");
+    expect(result.pending).toBe("");
+  });
+
+  it("handles tilde fences", () => {
+    const text = "~~~py\nprint('hi')\n~~~\nAfter";
+    const result = splitCompleteBlocks(text);
+    expect(result.complete).toBe("~~~py\nprint('hi')\n~~~");
+    expect(result.pending).toBe("After");
+  });
+
+  it("does not close backtick fence with tilde fence", () => {
+    const text = "```\ncode\n~~~\nstill code";
+    const result = splitCompleteBlocks(text);
+    // The ~~~ should not close a backtick fence, so everything from ``` onward is pending
+    expect(result.complete).toBe("");
+    expect(result.pending).toBe("```\ncode\n~~~\nstill code");
+  });
+
+  it("treats closed math block as complete boundary", () => {
+    const text = "$$\nx^2\n$$\nAfter";
+    const result = splitCompleteBlocks(text);
+    expect(result.complete).toBe("$$\nx^2\n$$");
+    expect(result.pending).toBe("After");
+  });
+
+  it("treats unclosed math block as pending", () => {
+    const text = "Before.\n\n$$\nx^2";
+    const result = splitCompleteBlocks(text);
+    expect(result.complete).toBe("Before.\n");
+    expect(result.pending).toBe("$$\nx^2");
+  });
+
+  it("treats closed \\[...\\] math block as complete boundary", () => {
+    const text = "\\[\nx^2\n\\]\nAfter";
+    const result = splitCompleteBlocks(text);
+    expect(result.complete).toBe("\\[\nx^2\n\\]");
+    expect(result.pending).toBe("After");
+  });
+
+  it("treats unclosed \\[ math block as pending", () => {
+    const text = "Before.\n\n\\[\nx^2";
+    const result = splitCompleteBlocks(text);
+    expect(result.complete).toBe("Before.\n");
+    expect(result.pending).toBe("\\[\nx^2");
+  });
+
+  it("handles trailing blank line making everything complete", () => {
+    const text = "Hello world.\n";
+    const result = splitCompleteBlocks(text);
+    // Last line is empty string after split, which is a blank line
+    expect(result.complete).toBe("Hello world.\n");
+    expect(result.pending).toBe("");
+  });
+});
+
+describe("closePendingBlock", () => {
+  it("returns empty string for empty input", () => {
+    expect(closePendingBlock("")).toBe("");
+  });
+
+  it("returns plain text unchanged", () => {
+    expect(closePendingBlock("Hello world")).toBe("Hello world");
+  });
+
+  it("closes an open backtick code fence", () => {
+    const result = closePendingBlock("```python\nprint('hi')");
+    expect(result).toBe("```python\nprint('hi')\n```");
+  });
+
+  it("closes an open tilde code fence", () => {
+    const result = closePendingBlock("~~~js\nconst x = 1;");
+    expect(result).toBe("~~~js\nconst x = 1;\n~~~");
+  });
+
+  it("does not modify already-closed code fence", () => {
+    const text = "```py\ncode\n```";
+    expect(closePendingBlock(text)).toBe(text);
+  });
+
+  it("closes an open math block", () => {
+    const result = closePendingBlock("$$\nx^2 + y^2");
+    expect(result).toBe("$$\nx^2 + y^2\n$$");
+  });
+
+  it("does not modify already-closed math block", () => {
+    const text = "$$\nx^2\n$$";
+    expect(closePendingBlock(text)).toBe(text);
+  });
+
+  it("closes an open \\[ math block with \\]", () => {
+    const result = closePendingBlock("\\[\nx^2 + y^2");
+    expect(result).toBe("\\[\nx^2 + y^2\n\\]");
+  });
+
+  it("does not modify already-closed \\[...\\] math block", () => {
+    const text = "\\[\nx^2\n\\]";
+    expect(closePendingBlock(text)).toBe(text);
+  });
+
+  it("closes code fence when preceded by regular text", () => {
+    const result = closePendingBlock("Some text\n```\ncode");
+    expect(result).toBe("Some text\n```\ncode\n```");
+  });
+
+  it("leaves headers unchanged", () => {
+    expect(closePendingBlock("## Hello")).toBe("## Hello");
+  });
+
+  it("leaves tables unchanged", () => {
+    const table = "| a | b |\n| --- | --- |\n| 1 | 2 |";
+    expect(closePendingBlock(table)).toBe(table);
+  });
+
+  it("leaves lists unchanged", () => {
+    expect(closePendingBlock("- item 1\n- item 2")).toBe("- item 1\n- item 2");
+  });
+});
+
+describe("renderStreamingMarkdown", () => {
+  it("renders complete blocks and pending as markdown", () => {
+    const cache = createStreamingCache();
+    const text = "# Hello\n\nWorld";
+    const { blocks, pendingHtml } = renderStreamingMarkdown(text, cache);
+    expect(blocks).toHaveLength(1);
+    expect(blocks[0].html).toContain("<h1>Hello</h1>");
+    expect(pendingHtml).toContain("World");
+    expect(pendingHtml).toContain("<p>");
+  });
+
+  it("preserves existing blocks when complete portion is unchanged", () => {
+    const cache = createStreamingCache();
+    renderStreamingMarkdown("# Hello\n\nWor", cache);
+    const firstBlocks = cache.blocks;
+
+    const { blocks } = renderStreamingMarkdown("# Hello\n\nWorld", cache);
+    // Same block array reference — nothing changed in the complete section
+    expect(blocks).toBe(firstBlocks);
+    expect(cache.completeKey).toBe("# Hello\n");
+  });
+
+  it("appends a new block when a new section completes", () => {
+    const cache = createStreamingCache();
+    renderStreamingMarkdown("# Hello\n\nParagraph", cache);
+    expect(cache.blocks).toHaveLength(1);
+    const firstBlock = cache.blocks[0];
+
+    renderStreamingMarkdown("# Hello\n\nParagraph.\n\nMore", cache);
+    expect(cache.blocks).toHaveLength(2);
+    // First block is preserved with the same id and html
+    expect(cache.blocks[0].id).toBe(firstBlock.id);
+    expect(cache.blocks[0].html).toBe(firstBlock.html);
+    // Second block contains the new paragraph
+    expect(cache.blocks[1].html).toContain("Paragraph.");
+  });
+
+  it("assigns unique stable ids to each block", () => {
+    const cache = createStreamingCache();
+    renderStreamingMarkdown("A.\n\nB.\n\nC", cache);
+    expect(cache.blocks).toHaveLength(1);
+    const id0 = cache.blocks[0].id;
+
+    renderStreamingMarkdown("A.\n\nB.\n\nC.\n\nD", cache);
+    expect(cache.blocks).toHaveLength(2);
+    expect(cache.blocks[0].id).toBe(id0);
+    expect(cache.blocks[1].id).toBe(id0 + 1);
+  });
+
+  it("renders pending code block with syntax highlighting", () => {
+    const cache = createStreamingCache();
+    const text = "Done.\n\n```python\nprint('hello')";
+    const { pendingHtml } = renderStreamingMarkdown(text, cache);
+    expect(pendingHtml).toContain("<code");
+    expect(pendingHtml).toContain("hljs");
+  });
+
+  it("renders pending table as markdown", () => {
+    const cache = createStreamingCache();
+    const text = "Done.\n\n| a | b |\n| --- | --- |\n| 1 | 2 |";
+    const { pendingHtml } = renderStreamingMarkdown(text, cache);
+    expect(pendingHtml).toContain("<table>");
+    expect(pendingHtml).toContain("<td>");
+  });
+
+  it("renders pending portion through markdown pipeline", () => {
+    const cache = createStreamingCache();
+    const text = "Done.\n\nSome **bold** text";
+    const { pendingHtml } = renderStreamingMarkdown(text, cache);
+    expect(pendingHtml).toContain("<strong>bold</strong>");
+  });
+});
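These tests pin down the cache contract of `renderStreamingMarkdown`. Text up to the last block boundary is rendered once and kept under a stable `id`. On later chunks, only a newly completed section becomes a new block, and the unfinished tail is re-rendered each time. The sketch below restates that contract in a self-contained form. It is not the implementation further down: the `render` callback is a stub standing in for `renderMarkdown`, and `splitAtLastBlankLine` is a simplified, hypothetical splitter that ignores code fences and math blocks.

```typescript
interface Block {
  id: number;
  html: string;
}

interface Cache {
  blocks: Block[];
  nextId: number;
  completeKey: string;
}

// Simplified splitter: everything up to and including the last blank line is "complete".
// Unlike splitCompleteBlocks, it does not track code fences or math blocks.
function splitAtLastBlankLine(text: string): { complete: string; pending: string } {
  const lines = text.split("\n");
  let boundary = -1;
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === "") boundary = i;
  }
  return {
    complete: lines.slice(0, boundary + 1).join("\n"),
    pending: lines.slice(boundary + 1).join("\n"),
  };
}

function renderIncrementally(
  text: string,
  cache: Cache,
  render: (md: string) => string,
): { blocks: Block[]; pendingHtml: string } {
  const { complete, pending } = splitAtLastBlankLine(text);
  if (complete !== cache.completeKey) {
    if (cache.completeKey !== "" && complete.startsWith(cache.completeKey)) {
      // The complete section grew: render only the new suffix; earlier blocks keep their ids.
      const added = complete.slice(cache.completeKey.length);
      cache.blocks = [...cache.blocks, { id: cache.nextId++, html: render(added) }];
    } else {
      // The text was replaced or shrank: start over with at most one block.
      cache.blocks = complete ? [{ id: cache.nextId++, html: render(complete) }] : [];
    }
    cache.completeKey = complete;
  }
  // The unfinished tail is always re-rendered from scratch.
  return { blocks: cache.blocks, pendingHtml: pending ? render(pending) : "" };
}

// Stub renderer used only for this example.
const render = (md: string): string => `<p>${md.trim()}</p>`;

const cache: Cache = { blocks: [], nextId: 0, completeKey: "" };
const step1 = renderIncrementally("A.\n\nB", cache, render);
const step2 = renderIncrementally("A.\n\nB.\n\nC", cache, render);
// step2.blocks[0] is the same object as step1.blocks[0]; "B." became block 1 and "C" is pending.
```

Because finished blocks keep their object identity and `id`, a keyed list in the UI can leave them untouched and only patch the pending tail on each streamed chunk.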
@@ -69,13 +69,189 @@ const processor = unified()
  .use(rehypeHighlight)
  .use(rehypeStringify, { allowDangerousHtml: true });

+export function splitCompleteBlocks(text: string): { complete: string; pending: string } {
+  if (!text) {
+    return { complete: "", pending: "" };
+  }
+
+  const lines = text.split("\n");
+  let lastCompleteBoundary = -1; // index of last line that ends a complete block
+  let inFence = false;
+  let fenceChar = "";
+  let inMathBlock = false;
+
+  for (let i = 0; i < lines.length; i++) {
+    const trimmed = lines[i].trimEnd();
+
+    if (inFence) {
+      // Check for closing fence: same character, at least 3, no other content
+      if (new RegExp(`^\\s*${fenceChar.replace(/~/g, "\\~")}{3,}\\s*$`).test(trimmed)) {
+        inFence = false;
+        fenceChar = "";
+        lastCompleteBoundary = i;
+      }
+      continue;
+    }
+
+    if (inMathBlock) {
+      if (trimmed === "$$" || trimmed === "\\]") {
+        inMathBlock = false;
+        lastCompleteBoundary = i;
+      }
+      continue;
+    }
+
+    // Check for opening fence
+    const fenceMatch = trimmed.match(/^(\s*)(```|~~~)/);
+    if (fenceMatch) {
+      // Check if it's an opening fence (may have language info after)
+      // A line with just ``` or ~~~ could be opening or closing, but since we're not in a fence it's opening
+      fenceChar = fenceMatch[2][0]; // '`' or '~'
+      inFence = true;
+      continue;
+    }
+
+    // Check for opening math block
+    if (trimmed === "$$" || trimmed === "\\[") {
+      inMathBlock = true;
+      continue;
+    }
+
+    // Outside fences/math: blank line marks a complete boundary
+    if (trimmed === "") {
+      lastCompleteBoundary = i;
+    }
+  }
+
+  if (lastCompleteBoundary < 0) {
+    return { complete: "", pending: text };
+  }
+
+  const completeLines = lines.slice(0, lastCompleteBoundary + 1);
+  const pendingLines = lines.slice(lastCompleteBoundary + 1);
+
+  return {
+    complete: completeLines.join("\n"),
+    pending: pendingLines.join("\n"),
+  };
+}
+
+export function closePendingBlock(pending: string): string {
+  if (!pending) return "";
+
+  const lines = pending.split("\n");
+  let inFence = false;
+  let fenceStr = "";
+  let inMathBlock = false;
+  let mathClose = "";
+
+  for (const line of lines) {
+    const trimmed = line.trimEnd();
+
+    if (inFence) {
+      if (new RegExp(`^\\s*${fenceStr[0] === "~" ? "~~~" : "\\`\\`\\`"}\\s*$`).test(trimmed)) {
+        inFence = false;
+        fenceStr = "";
+      }
+      continue;
+    }
+
+    if (inMathBlock) {
+      if (trimmed === "$$" || trimmed === "\\]") {
+        inMathBlock = false;
+        mathClose = "";
+      }
+      continue;
+    }
+
+    const fenceMatch = trimmed.match(/^(\s*)(```|~~~)/);
+    if (fenceMatch) {
+      fenceStr = fenceMatch[2];
+      inFence = true;
+      continue;
+    }
+
+    if (trimmed === "$$") {
+      inMathBlock = true;
+      mathClose = "$$";
+      continue;
+    }
+
+    if (trimmed === "\\[") {
+      inMathBlock = true;
+      mathClose = "\\]";
+      continue;
+    }
+  }
+
+  if (inFence) return pending + "\n" + fenceStr;
+  if (inMathBlock) return pending + "\n" + mathClose;
+  return pending;
+}
+
+export interface RenderedBlock {
+  id: number;
+  html: string;
+}
+
+export interface StreamingCache {
+  blocks: RenderedBlock[];
+  nextId: number;
+  completeKey: string;
+}
+
+export function createStreamingCache(): StreamingCache {
+  return { blocks: [], nextId: 0, completeKey: "" };
+}
+
+export function renderStreamingMarkdown(
+  text: string,
+  cache: StreamingCache,
+): { blocks: RenderedBlock[]; pendingHtml: string } {
+  const { complete, pending } = splitCompleteBlocks(text);
+
+  if (complete) {
+    if (cache.completeKey !== complete) {
+      if (complete.startsWith(cache.completeKey) && cache.completeKey.length > 0) {
+        // Complete section grew — render only the new part as a new block
+        const newPart = complete.slice(cache.completeKey.length);
+        cache.blocks = [...cache.blocks, { id: cache.nextId++, html: renderMarkdown(newPart) }];
+      } else {
+        // Complete section changed unexpectedly — re-render as single block
+        cache.blocks = [{ id: cache.nextId++, html: renderMarkdown(complete) }];
+      }
+      cache.completeKey = complete;
+    }
+  } else if (cache.blocks.length > 0) {
+    cache.blocks = [];
+    cache.completeKey = "";
+  }
+
+  let pendingHtml = "";
+  if (pending) {
+    const closed = closePendingBlock(pending);
+    pendingHtml = renderMarkdown(closed);
+  }
+
+  return { blocks: cache.blocks, pendingHtml };
+}
+
+// Convert \[...\] to $$...$$ and \(...\) to $...$
+export function normalizeLatexDelimiters(text: string): string {
+  // Display math: \[...\] → $$...$$  (may span multiple lines)
+  text = text.replace(/\\\[([\s\S]*?)\\\]/g, (_match, inner) => `$$${inner}$$`);
+  // Inline math: \(...\) → $...$
+  text = text.replace(/\\\(([\s\S]*?)\\\)/g, (_match, inner) => `$${inner}$`);
+  return text;
+}
+
 export function renderMarkdown(content: string): string {
  if (!content) {
    return "";
  }

  try {
-    const result = processor.processSync(content);
+    const result = processor.processSync(normalizeLatexDelimiters(content));
    return String(result);
  } catch {
    // Fallback to escaped plain text if markdown parsing fails
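`normalizeLatexDelimiters` runs over the whole message before parsing, so delimiter-like text inside fenced code is rewritten too. For example, a regex such as `\(\d+\)` in a code block would come out as `$\d+$`. One possible refinement is to normalize only the prose between fenced code blocks, sketched here under stated assumptions. `normalizeOutsideFences` is a hypothetical name, and inline code spans are not protected. Fence tracking follows the CommonMark rule that a closing fence uses the same character as the opener and is at least as long.

```typescript
// Same two substitutions as normalizeLatexDelimiters: \[...\] -> $$...$$ and \(...\) -> $...$.
function normalizeLatex(text: string): string {
  return text
    .replace(/\\\[([\s\S]*?)\\\]/g, (_m, inner: string) => `$$${inner}$$`)
    .replace(/\\\(([\s\S]*?)\\\)/g, (_m, inner: string) => `$${inner}$`);
}

// Hypothetical variant: rewrite delimiters only outside fenced code blocks.
function normalizeOutsideFences(text: string): string {
  const out: string[] = [];
  let prose: string[] = [];
  let fence: { char: string; len: number } | null = null;

  // Normalize buffered prose lines as one chunk, so multi-line \[...\] still matches.
  const flushProse = (): void => {
    if (prose.length > 0) {
      out.push(normalizeLatex(prose.join("\n")));
      prose = [];
    }
  };

  for (const line of text.split("\n")) {
    if (fence === null) {
      const open = line.match(/^ {0,3}(`{3,}|~{3,})/);
      if (open) {
        flushProse();
        fence = { char: open[1][0], len: open[1].length };
        out.push(line);
      } else {
        prose.push(line);
      }
      continue;
    }
    out.push(line); // lines inside a fence are copied verbatim
    // A closing fence uses the opener's character, is at least as long, and has nothing after it.
    const close = line.match(/^ {0,3}(`{3,}|~{3,})\s*$/);
    if (close && close[1][0] === fence.char && close[1].length >= fence.len) {
      fence = null;
    }
  }
  flushProse();
  return out.join("\n");
}

const sample = "Area: \\(\\pi r^2\\)\n\n~~~js\nconst re = /\\(\\d+\\)/;\n~~~";
const normalized = normalizeOutsideFences(sample);
// normalized: the prose line reads "Area: $\pi r^2$"; the regex line inside ~~~ is unchanged.
```

The same length rule is relevant to `splitCompleteBlocks`. Its closing-fence check accepts any run of three or more of the opening character, so a three-backtick fence nested inside a four-backtick fence would end the outer block early.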
@@ -9,6 +9,7 @@ export interface Model {
  description: string;
  unlisted: boolean;
  peerID: string;
+  aliases?: string[];
 }

 export interface Metrics {
@@ -38,8 +39,12 @@ export interface LogData {
  data: string;
 }

+export interface InFlightStats {
+  total: number;
+}
+
 export interface APIEventEnvelope {
-  type: "modelStatus" | "logData" | "metrics";
+  type: "modelStatus" | "logData" | "metrics" | "inflight";
  data: string;
 }
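`Model` now carries an optional `aliases` list, and the setParamsByID commit message says aliases become selectable in the Playground. The sketch below shows one way a model list could be flattened into selectable IDs. It assumes each model also has an `id` field (not visible in this hunk) and that unlisted models are left out. Both `ModelLike` and `selectableIds` are hypothetical names, not the UI's actual code.

```typescript
// Minimal subset of Model; `id` is assumed to exist above the lines shown in the diff.
interface ModelLike {
  id: string;
  unlisted: boolean;
  aliases?: string[];
}

interface SelectableId {
  value: string; // the name a request would use as its "model" field
  modelId: string; // the configured model that serves it
  isAlias: boolean;
}

// Hypothetical helper: each listed model's own ID, followed by its aliases.
function selectableIds(models: ModelLike[]): SelectableId[] {
  const result: SelectableId[] = [];
  for (const m of models) {
    if (m.unlisted) continue; // assumption: unlisted models are not offered
    result.push({ value: m.id, modelId: m.id, isAlias: false });
    for (const alias of m.aliases ?? []) {
      result.push({ value: alias, modelId: m.id, isAlias: true });
    }
  }
  return result;
}

const options = selectableIds([
  { id: "qwen3", unlisted: false, aliases: ["qwen3-thinking"] },
  { id: "embedder", unlisted: true },
]);
// options: "qwen3" (model) followed by "qwen3-thinking" (alias of qwen3); "embedder" is skipped.
```

Keeping `modelId` next to each alias lets the UI show which model actually serves a request, since every alias of a model resolves to the same configured model. This matches the commit's stated goal of per-alias behaviour for a single loaded model.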

@@ -0,0 +1 @@
+<!-- empty: real Playground is always mounted in App.svelte -->
@@ -1,5 +1,5 @@
 import { writable } from "svelte/store";
-import type { Model, Metrics, VersionInfo, LogData, APIEventEnvelope, ReqRespCapture } from "../lib/types";
+import type { Model, Metrics, VersionInfo, LogData, APIEventEnvelope, ReqRespCapture, InFlightStats } from "../lib/types";
 import { connectionState } from "./theme";

 const LOG_LENGTH_LIMIT = 1024 * 100; /* 100KB of log data */
@@ -9,6 +9,7 @@ export const models = writable<Model[]>([]);
 export const proxyLogs = writable<string>("");
 export const upstreamLogs = writable<string>("");
 export const metrics = writable<Metrics[]>([]);
+export const inFlightRequests = writable<number>(0);
 export const versionInfo = writable<VersionInfo>({
  build_date: "unknown",
  commit: "unknown",
@@ -29,6 +30,7 @@ export function enableAPIEvents(enabled: boolean): void {
    apiEventSource?.close();
    apiEventSource = null;
    metrics.set([]);
+    inFlightRequests.set(0);
    return;
  }

@@ -46,6 +48,7 @@ export function enableAPIEvents(enabled: boolean): void {
      proxyLogs.set("");
      upstreamLogs.set("");
      metrics.set([]);
+      inFlightRequests.set(0);
      models.set([]);
      retryCount = 0;
      connectionState.set("connected");
@@ -83,6 +86,11 @@ export function enableAPIEvents(enabled: boolean): void {
            metrics.update((prevMetrics) => [...newMetrics, ...prevMetrics]);
            break;
          }
+          case "inflight": {
+            const stats = JSON.parse(message.data) as InFlightStats;
+            inFlightRequests.set(stats.total ?? 0);
+            break;
+          }
        }
      } catch (err) {
        console.error(e.data, err);
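The `data` field of an `APIEventEnvelope` is itself a JSON string, which the new `inflight` case parses into `InFlightStats`. `stats.total ?? 0` covers a missing or null `total`, but a string or negative value would still be stored, and malformed JSON is only caught by the outer handler that logs to the console. The following is a hedged sketch of a stricter decoder. `parseInFlightTotal` is a hypothetical helper, and treating anything other than a non-negative integer as 0 is an assumption about what the counter should show.

```typescript
// Hypothetical helper: decode the `data` string of an "inflight" envelope into a count.
// Anything that is not a non-negative integer `total` is treated as 0.
function parseInFlightTotal(data: string): number {
  try {
    const parsed: unknown = JSON.parse(data);
    if (typeof parsed === "object" && parsed !== null) {
      const total = (parsed as { total?: unknown }).total;
      if (typeof total === "number" && isFinite(total) && total >= 0 && Math.floor(total) === total) {
        return total;
      }
    }
  } catch {
    // Malformed JSON falls through to the default below.
  }
  return 0;
}

const counts = [
  parseInFlightTotal('{"total": 3}'),
  parseInFlightTotal('{"total": null}'),
  parseInFlightTotal('{"total": "2"}'),
  parseInFlightTotal("not json"),
];
// counts is [3, 0, 0, 0]
```

Inside the switch, the case body would then reduce to `inFlightRequests.set(parseInFlightTotal(message.data))`.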
@@ -0,0 +1,18 @@
+import { writable, derived } from "svelte/store";
+
+const chatStreaming = writable(false);
+const imageGenerating = writable(false);
+const speechGenerating = writable(false);
+const audioTranscribing = writable(false);
+
+export const playgroundActivity = derived(
+  [chatStreaming, imageGenerating, speechGenerating, audioTranscribing],
+  ([$chat, $image, $speech, $audio]) => $chat || $image || $speech || $audio
+);
+
+export const playgroundStores = {
+  chatStreaming,
+  imageGenerating,
+  speechGenerating,
+  audioTranscribing,
+};
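`playgroundActivity` is true while any of the four flags is set, and the log notes that it drives an animated gradient on the Playground nav link. Each flag therefore has to be cleared on every exit path, including failed or aborted requests, or the indicator never stops. The following sketches a hypothetical `trackActivity` helper that wraps an async task in `try`/`finally`. It only needs a `set(value)` method, which Svelte's `writable` stores provide, so a minimal `FlagStore` interface stands in for the real store to keep the example self-contained.

```typescript
// The only part of a Svelte writable store this helper needs.
interface FlagStore {
  set(value: boolean): void;
}

// Hypothetical helper: raise `flag` while `task` runs and always lower it afterwards,
// whether the task resolves or rejects.
async function trackActivity<T>(flag: FlagStore, task: () => Promise<T>): Promise<T> {
  flag.set(true);
  try {
    return await task();
  } finally {
    flag.set(false);
  }
}

// Example with a recording store in place of playgroundStores.chatStreaming.
const transitions: boolean[] = [];
const recorder: FlagStore = { set: (value) => { transitions.push(value); } };
const failing = trackActivity(recorder, () => Promise.reject(new Error("stream aborted")));
failing.catch(() => {
  // The rejection still reaches the caller, but the flag has been reset: transitions is [true, false].
});
```

With the real stores, a call might look like `trackActivity(playgroundStores.chatStreaming, () => streamChat(request))`, where `streamChat` is a placeholder for whatever request function the chat view uses.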
@@ -0,0 +1,3 @@
+import { writable } from "svelte/store";
+
+export const currentRoute = writable("/");
Author	SHA1	Message	Date
Benson Wong	19fb5f35e9	proxy: implement setParamsByID filter (#535 ) Add setParamsByID filter that applies different request parameters based on the requested model ID, enabling per-alias behaviour for a single loaded model. - add SetParamsByID field to Filters struct and SanitizedSetParamsByID method - substitute ${MODEL_ID} and other macros in setParamsByID keys and values - validate no unknown macros remain in keys or values after substitution - apply setParamsByID in proxyInferenceHandler after setParams (can override it) - update config-schema.json with setParamsByID definition - update UI to show aliases and make them selectable in the Playground closes #534	2026-02-19 22:21:10 -08:00
Benson Wong	b45102bde8	ui: smart auto-scroll in LogPanel (#530 ) Pause auto-scroll when the user scrolls up to review logs, and resume when they scroll back to the bottom. - add `userScrolledUp` state variable - add `handleScroll` to detect scroll position with 40px threshold - guard the auto-scroll effect with `!userScrolledUp` closes #529	2026-02-18 19:47:37 -08:00
Brian Mendonca	1688bdd1e9	proxy, ui: add pending requests count to the main dashboard (#516 ) add a real time counter of pending (inflight) requests to the UI.	2026-02-16 09:41:15 -08:00
Benson Wong	d33d51fa75	.coderabbit.yaml,AGENTS.md: small tweaks	2026-02-15 21:31:30 -08:00
Benson Wong	e3bf065574	ui: persist playground state across route navigation (#525 ) - Keep Playground component mounted when navigating away, preserving streaming/generating state - Add animated gradient effect on Playground nav link when activity is in progress	2026-02-15 21:30:52 -08:00
Benson Wong	3e52144058	ui-svelte: incremental rendering of chat messages in the Playground (#520 ) add incremental rendering to Playground > Chat	2026-02-15 11:00:44 -08:00
Benson Wong	d5e52d7d00	build: disable provenance attestations in container builds (#523 ) ## Summary - Add `--provenance=false` to docker build commands in `build-container.sh` - BuildKit attestation manifests are stored as untagged images in GHCR, and the `delete-untagged-containers` cleanup job deletes them, breaking the manifest list and causing `manifest unknown` errors on pull - ref: https://github.com/actions/delete-package-versions/issues/162	2026-02-14 10:23:08 -08:00
Benson Wong	17e5263a76	.github/workflows: fix expired token in publishing images (#522 ) Fixes: #517	2026-02-14 10:06:05 -08:00
Benson Wong	8d6d949ec3	proxy: support timings for /infill from llama-server (#510 ) fixes: #463	2026-02-07 17:16:27 -08:00
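Commit b45102bde8 pauses LogPanel auto-scroll once the user scrolls up and resumes it when they return to the bottom, using a 40px threshold. The position check reduces to a small pure function. This sketch assumes the threshold is the gap between the bottom of the viewport and the end of the content, inclusive at 40px. `isNearBottom` is a hypothetical name, not necessarily how `handleScroll` is written.

```typescript
// Gap (in px) at or below which the panel still counts as "at the bottom".
const SCROLL_THRESHOLD_PX = 40;

// Hypothetical helper: true when the viewport is within `threshold` px of the end of the content.
function isNearBottom(
  scrollTop: number,
  scrollHeight: number,
  clientHeight: number,
  threshold = SCROLL_THRESHOLD_PX,
): boolean {
  return scrollHeight - scrollTop - clientHeight <= threshold;
}

// A scroll handler would then set, for the log element `el`:
//   userScrolledUp = !isNearBottom(el.scrollTop, el.scrollHeight, el.clientHeight);
// and the auto-scroll effect only runs while userScrolledUp is false.
const atEnd = isNearBottom(600, 1000, 400); // gap 0px: true
const slightlyUp = isNearBottom(570, 1000, 400); // gap 30px: true
const reading = isNearBottom(500, 1000, 400); // gap 100px: false
```

A small tolerance like this keeps auto-scroll engaged when fractional scroll positions or a final layout shift leave the view a few pixels short of the exact bottom.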