Add barebones but working implementation of model preload (#209, #235)

* add config test for Preload hook
* improve TestProxyManager_StartupHooks
* docs for new hook configuration
* add .dev/ to .gitignore

Add prompt processing metrics (#250)
2025-08-14 10:27:28 -07:00 · 2025-08-14 10:02:16 -07:00 · 2025-08-08 13:39:46 -07:00 · 2025-08-08 13:33:47 -07:00 · 2025-08-08 13:28:39 -07:00 · 2025-08-08 09:52:05 -07:00
18 changed files with 341 additions and 98 deletions
@@ -1,11 +1,13 @@
 ---
 name: Bug Report
-about: Something is not working as expected...
+about: I found a defect
 title: ''
 labels: bug
 assignees: ''
 ---
 > [!IMPORTANT]
 > If you have questions about llama-swap please post in the Q&A in Discussions. Use bug reports when you've found a defect and wish to discuss a fix.
 **Describe the bug**
 A clear and concise description of what the bug is.
@@ -22,6 +22,13 @@ jobs:
      with:
        go-version: '1.23'
    # Only run in this linux based runner
    - name: Check Formatting
      run: |
        if [ "$(gofmt -l . | grep -v 'event/.*_test.go' | wc -l)" -gt 0 ]; then
          gofmt -l . | grep -v 'event/.*_test.go'
          exit 1
        fi
    # cache simple-responder to save the build time
    - name: Restore Simple Responder
      id: restore-simple-responder
@@ -4,3 +4,4 @@ build/
 dist/
 .vscode
 .DS_Store
 .dev/
@@ -31,8 +31,9 @@ Written in golang, it is very easy to install (single binary with no dependencie
 - ✅ Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
 - ✅ Automatic unloading of models after timeout by setting a `ttl`
 - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
- ✅ Docker and Podman support
+- ✅ Reliable Docker and Podman support with `cmdStart` and `cmdStop`
 - ✅ Full control over server settings per model
 - ✅ Preload models on startup with `hooks` ([#235](https://github.com/mostlygeek/llama-swap/pull/235))
 ## How does llama-swap work?
@@ -71,9 +72,13 @@ See the [configuration documentation](https://github.com/mostlygeek/llama-swap/w
 ## Web UI
-llama-swap ships with a real time web interface to monitor logs and status of models:
+llama-swap includes a real-time web interface for monitoring logs and models:
-<img width="1786" height="1334" alt="image" src="https://github.com/user-attachments/assets/d6258cb9-1dad-40db-828f-2be860aec8fe" />
+<img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/adef4a8e-de0b-49db-885a-8f6dedae6799" />
 The Activity Page shows recent requests:
 <img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/5f3edee6-d03a-4ae5-ae06-b20ac1f135bd" />
 ## Installation
@@ -1,6 +1,13 @@
 # llama-swap YAML configuration example
 # -------------------------------------
 #
 # 💡 Tip - Use an LLM with this file!
 # ====================================
 #  This example configuration is written to be LLM friendly! Try
 #  copying this file into an LLM and asking it to explain or generate
 #  sections for you.
 # ====================================
 #
 # - Below are all the available configuration options for llama-swap.
 # - Settings with a default value, or noted as optional can be omitted.
 # - Settings that are marked required must be in your configuration file
@@ -207,3 +214,19 @@ groups:
      - "forever-modelA"
      - "forever-modelB"
      - "forever-modelc"
 # hooks: a dictionary of event triggers and actions
 # - optional, default: empty dictionary
 # - the only supported hook is on_startup
 hooks:
  # on_startup: a dictionary of actions to perform on startup
  # - optional, default: empty dictionary
  # - the only supported action is preload
  on_startup:
    # preload: a list of model ids to load on startup
    # - optional, default: empty list
    # - model names must match keys in the models section
    # - when preloading multiple models at once, define a group;
    #   otherwise models will be loaded and swapped out
    preload:
      - "llama"
@@ -138,6 +138,14 @@ func (c *GroupConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
 	return nil
 }
 type HooksConfig struct {
 	OnStartup HookOnStartup `yaml:"on_startup"`
 }
 type HookOnStartup struct {
 	Preload []string `yaml:"preload"`
 }
 type Config struct {
 	HealthCheckTimeout int                    `yaml:"healthCheckTimeout"`
 	LogRequests        bool                   `yaml:"logRequests"`
@@ -155,6 +163,9 @@ type Config struct {
 	// automatic port assignments
 	StartPort int `yaml:"startPort"`
 	// hooks, see: #209
 	Hooks HooksConfig `yaml:"hooks"`
 }
 func (c *Config) RealModelName(search string) (string, bool) {
@@ -330,6 +341,22 @@ func LoadConfigFromReader(r io.Reader) (Config, error) {
 		}
 	}
 	// clean up hooks preload
 	if len(config.Hooks.OnStartup.Preload) > 0 {
 		var toPreload []string
 		for _, modelID := range config.Hooks.OnStartup.Preload {
 			modelID = strings.TrimSpace(modelID)
 			if modelID == "" {
 				continue
 			}
 			if real, found := config.RealModelName(modelID); found {
 				toPreload = append(toPreload, real)
 			}
 		}
 		config.Hooks.OnStartup.Preload = toPreload
 	}
 	return config, nil
 }
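The preload clean-up above trims each entry, skips blanks, resolves the ID through `RealModelName()` and silently drops anything that does not match a model. The sketch below reproduces that loop standalone. `resolveModel` is a hypothetical stand-in for `Config.RealModelName`, assumed here to check model IDs first and aliases second; it is not llama-swap's actual lookup.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveModel is a HYPOTHETICAL stand-in for Config.RealModelName: it maps a
// model ID or alias to the canonical model ID. llama-swap's lookup may differ.
func resolveModel(models map[string]bool, aliases map[string]string, id string) (string, bool) {
	if models[id] {
		return id, true
	}
	if canonical, ok := aliases[id]; ok {
		return canonical, true
	}
	return "", false
}

// normalizePreload trims whitespace, skips empty entries, resolves aliases
// and drops IDs that do not match any configured model.
func normalizePreload(ids []string, models map[string]bool, aliases map[string]string) []string {
	var out []string
	for _, id := range ids {
		id = strings.TrimSpace(id)
		if id == "" {
			continue
		}
		if name, ok := resolveModel(models, aliases, id); ok {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	models := map[string]bool{"model1": true, "model2": true}
	aliases := map[string]string{"model-alias": "model1"}
	// prints [model1 model1]
	fmt.Println(normalizePreload([]string{" model1 ", "", "model-alias", "missing"}, models, aliases))
}
```

Like the loop in the diff, this keeps duplicates (an ID and its alias both resolve to `model1`), and unknown IDs disappear without a log line, so a warning there might save some debugging.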
@@ -100,6 +100,9 @@ func TestConfig_LoadPosix(t *testing.T) {
 	content := `
 macros:
  svr-path: "path/to/server"
 hooks:
  on_startup:
    preload: ["model1", "model2"]
 models:
  model1:
    cmd: path/to/cmd --arg1 one
@@ -163,6 +166,11 @@ groups:
 		Macros: map[string]string{
 			"svr-path": "path/to/server",
 		},
 		Hooks: HooksConfig{
 			OnStartup: HookOnStartup{
 				Preload: []string{"model1", "model2"},
 			},
 		},
 		Models: map[string]ModelConfig{
 			"model1": {
 				Cmd:           "path/to/cmd --arg1 one",
@@ -0,0 +1,27 @@
 package proxy
 import "net/http"
 // DiscardWriter is a custom http.ResponseWriter that discards everything written to it.
 type DiscardWriter struct {
 	header http.Header
 	status int
 }
 func (w *DiscardWriter) Header() http.Header {
 	if w.header == nil {
 		w.header = make(http.Header)
 	}
 	return w.header
 }
 func (w *DiscardWriter) Write(data []byte) (int, error) {
 	return len(data), nil
 }
 func (w *DiscardWriter) WriteHeader(code int) {
 	w.status = code
 }
 // Satisfy the http.Flusher interface for streaming responses
 func (w *DiscardWriter) Flush() {}
@@ -7,6 +7,7 @@ const ChatCompletionStatsEventID = 0x02
 const ConfigFileChangedEventID = 0x03
 const LogDataEventID = 0x04
 const TokenMetricsEventID = 0x05
 const ModelPreloadedEventID = 0x06
 type ProcessStateChangeEvent struct {
 	ProcessName string
@@ -48,3 +49,12 @@ type LogDataEvent struct {
 func (e LogDataEvent) Type() uint32 {
 	return LogDataEventID
 }
 type ModelPreloadedEvent struct {
 	ModelName string
 	Success   bool
 }
 func (e ModelPreloadedEvent) Type() uint32 {
 	return ModelPreloadedEventID
 }
@@ -16,6 +16,7 @@ var (
 	nextTestPort        int = 12000
 	portMutex           sync.Mutex
 	testLogger          = NewLogMonitorWriter(os.Stdout)
 	simpleResponderPath = getSimpleResponderPath()
 )
 // Check if the binary exists
@@ -69,13 +70,11 @@ func getTestSimpleResponderConfig(expectedMessage string) ModelConfig {
 }
 func getTestSimpleResponderConfigPort(expectedMessage string, port int) ModelConfig {
-	binaryPath := getSimpleResponderPath()
 	// Create a YAML string with just the values we want to set
 	yamlStr := fmt.Sprintf(`
 cmd: '%s --port %d --silent --respond %s'
 proxy: "http://127.0.0.1:%d"
-`, binaryPath, port, expectedMessage, port)
+`, simpleResponderPath, port, expectedMessage, port)
 	var cfg ModelConfig
 	if err := yaml.Unmarshal([]byte(yamlStr), &cfg); err != nil {
@@ -79,10 +79,12 @@ func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
 	outputTokens := int(jsonData.Get("usage.completion_tokens").Int())
 	inputTokens := int(jsonData.Get("usage.prompt_tokens").Int())
 	tokensPerSecond := -1.0
 	promptPerSecond := -1.0
 	durationMs := int(time.Since(rec.startTime).Milliseconds())
 	// use llama-server's timing data for tok/sec and duration as it is more accurate
 	if timings := jsonData.Get("timings"); timings.Exists() {
 		promptPerSecond = jsonData.Get("timings.prompt_per_second").Float()
 		tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float()
 		durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float())
 	}
@@ -92,6 +94,7 @@ func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
 		Model:           rec.realModelName,
 		InputTokens:     inputTokens,
 		OutputTokens:    outputTokens,
 		PromptPerSecond: promptPerSecond,
 		TokensPerSecond: tokensPerSecond,
 		DurationMs:      durationMs,
 	})
@@ -15,6 +15,7 @@ type TokenMetrics struct {
 	Model           string    `json:"model"`
 	InputTokens     int       `json:"input_tokens"`
 	OutputTokens    int       `json:"output_tokens"`
 	PromptPerSecond float64   `json:"prompt_per_second"`
 	TokensPerSecond float64   `json:"tokens_per_second"`
 	DurationMs      int       `json:"duration_ms"`
 }
@@ -15,6 +15,7 @@ import (
 	"time"
 	"github.com/gin-gonic/gin"
 	"github.com/mostlygeek/llama-swap/event"
 	"github.com/tidwall/gjson"
 	"github.com/tidwall/sjson"
 )
@@ -96,6 +97,35 @@ func New(config Config) *ProxyManager {
 	}
 	pm.setupGinEngine()
 	// run any startup hooks
 	if len(config.Hooks.OnStartup.Preload) > 0 {
 		// run in the background so startup is not blocked (not sure if this is a good idea yet)
 		go func() {
 			discardWriter := &DiscardWriter{}
 			for _, realModelName := range config.Hooks.OnStartup.Preload {
 				proxyLogger.Infof("Preloading model: %s", realModelName)
 				processGroup, _, err := pm.swapProcessGroup(realModelName)
 				if err != nil {
 					event.Emit(ModelPreloadedEvent{
 						ModelName: realModelName,
 						Success:   false,
 					})
 					proxyLogger.Errorf("Failed to preload model %s: %v", realModelName, err)
 					continue
 				} else {
 					req, _ := http.NewRequest("GET", "/", nil)
 					processGroup.ProxyRequest(realModelName, discardWriter, req)
 					event.Emit(ModelPreloadedEvent{
 						ModelName: realModelName,
 						Success:   true,
 					})
 				}
 			}
 		}()
 	}
 	return pm
 }
@@ -361,7 +391,7 @@ func (pm *ProxyManager) proxyToUpstream(c *gin.Context) {
 		return
 	}
-	processGroup, _, err := pm.swapProcessGroup(requestedModel)
+	processGroup, realModelName, err := pm.swapProcessGroup(requestedModel)
 	if err != nil {
 		pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
 		return
@@ -369,7 +399,7 @@ func (pm *ProxyManager) proxyToUpstream(c *gin.Context) {
 	// rewrite the path
 	c.Request.URL.Path = c.Param("upstreamPath")
-	processGroup.ProxyRequest(requestedModel, c.Writer, c.Request)
+	processGroup.ProxyRequest(realModelName, c.Writer, c.Request)
 }
 func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
@@ -9,10 +9,12 @@ import (
 	"net/http"
 	"net/http/httptest"
 	"strconv"
 	"strings"
 	"sync"
 	"testing"
 	"time"
 	"github.com/mostlygeek/llama-swap/event"
 	"github.com/stretchr/testify/assert"
 	"github.com/tidwall/gjson"
 )
@@ -656,21 +658,34 @@ func TestProxyManager_CORSOptionsHandler(t *testing.T) {
 }
 func TestProxyManager_Upstream(t *testing.T) {
-	config := AddDefaultGroupToConfig(Config{
-		HealthCheckTimeout: 15,
-		Models: map[string]ModelConfig{
-			"model1": getTestSimpleResponderConfig("model1"),
-		},
-		LogLevel: "error",
-	})
+	configStr := fmt.Sprintf(`
+logLevel: error
+models:
+  model1:
+    cmd: %s -port ${PORT} -silent -respond model1
+    aliases: [model-alias]
+`, getSimpleResponderPath())
 	config, err := LoadConfigFromReader(strings.NewReader(configStr))
 	assert.NoError(t, err)
 	proxy := New(config)
 	defer proxy.StopProcesses(StopWaitForInflightRequest)
 	t.Run("main model name", func(t *testing.T) {
 		req := httptest.NewRequest("GET", "/upstream/model1/test", nil)
 		rec := httptest.NewRecorder()
 		proxy.ServeHTTP(rec, req)
 		assert.Equal(t, http.StatusOK, rec.Code)
 		assert.Equal(t, "model1", rec.Body.String())
 	})
 	t.Run("model alias", func(t *testing.T) {
 		req := httptest.NewRequest("GET", "/upstream/model-alias/test", nil)
 		rec := httptest.NewRecorder()
 		proxy.ServeHTTP(rec, req)
 		assert.Equal(t, http.StatusOK, rec.Code)
 		assert.Equal(t, "model1", rec.Body.String())
 	})
 }
 func TestProxyManager_ChatContentLength(t *testing.T) {
@@ -818,3 +833,62 @@ func TestProxyManager_HealthEndpoint(t *testing.T) {
 	assert.Equal(t, http.StatusOK, rec.Code)
 	assert.Equal(t, "OK", rec.Body.String())
 }
 func TestProxyManager_StartupHooks(t *testing.T) {
 	// Use real YAML here: the configuration has grown more complex, and
 	// LoadConfigFromReader() now does a lot more than parse YAML.
 	// Eventually migrate all tests to this approach.
 	configStr := strings.Replace(`
 logLevel: error
 hooks:
  on_startup:
    preload:
      - model1
      - model2
 groups:
  preloadTestGroup:
    swap: false
    members:
       - model1
       - model2
 models:
  model1:
    cmd: ${simpleresponderpath} --port ${PORT} --silent --respond model1
  model2:
      cmd: ${simpleresponderpath} --port ${PORT} --silent --respond model2
 `, "${simpleresponderpath}", simpleResponderPath, -1)
 	// Parse the configuration (LoadConfigFromReader also resolves the preload list)
 	config, err := LoadConfigFromReader(strings.NewReader(configStr))
 	if !assert.NoError(t, err, "Invalid configuration") {
 		return
 	}
 	preloadChan := make(chan ModelPreloadedEvent, 2) // buffer for 2 expected events
 	unsub := event.On(func(e ModelPreloadedEvent) {
 		preloadChan <- e
 	})
 	defer unsub()
 	// Create the proxy which should trigger preloading
 	proxy := New(config)
 	defer proxy.StopProcesses(StopWaitForInflightRequest)
 	for i := 0; i < 2; i++ {
 		select {
 		case <-preloadChan:
 		case <-time.After(5 * time.Second):
 			t.Fatal("timed out waiting for models to preload")
 		}
 	}
 	// make sure they are both loaded
 	_, foundGroup := proxy.processGroups["preloadTestGroup"]
 	if !assert.True(t, foundGroup, "preloadTestGroup should exist") {
 		return
 	}
 	assert.Equal(t, StateReady, proxy.processGroups["preloadTestGroup"].processes["model1"].CurrentState())
 	assert.Equal(t, StateReady, proxy.processGroups["preloadTestGroup"].processes["model2"].CurrentState())
 }
@@ -28,6 +28,7 @@ interface Metrics {
  model: string;
  input_tokens: number;
  output_tokens: number;
  prompt_per_second: number;
  tokens_per_second: number;
  duration_ms: number;
 }
@@ -83,6 +84,12 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
            case "modelStatus":
              {
                const models = JSON.parse(message.data) as Model[];
                // sort models by name and id
                models.sort((a, b) => {
                  return (a.name + a.id).localeCompare(b.name + b.id);
                });
                setModels(models);
              }
              break;
@@ -51,6 +51,7 @@ const ActivityPage = () => {
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Model</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Input Tokens</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Output Tokens</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Prompt Processing</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Generation Speed</th>
                <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Duration</th>
              </tr>
@@ -62,6 +63,7 @@ const ActivityPage = () => {
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.model}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.input_tokens.toLocaleString()}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.output_tokens.toLocaleString()}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.prompt_per_second)}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.tokens_per_second)}</td>
                  <td className="px-6 py-4 whitespace-nowrap text-sm">{formatDuration(metric.duration_ms)}</td>
                </tr>
@@ -4,7 +4,7 @@ import { LogPanel } from "./LogViewer";
 import { usePersistentState } from "../hooks/usePersistentState";
 import { Panel, PanelGroup, PanelResizeHandle } from "react-resizable-panels";
 import { useTheme } from "../contexts/ThemeProvider";
-import { RiEyeFill, RiEyeOffFill, RiStopCircleLine } from "react-icons/ri";
+import { RiEyeFill, RiEyeOffFill, RiStopCircleLine, RiSwapBoxFill } from "react-icons/ri";
 export default function ModelsPage() {
  const { isNarrow } = useTheme();
@@ -40,6 +40,7 @@ function ModelsPanel() {
  const { models, loadModel, unloadAllModels } = useAPI();
  const [isUnloading, setIsUnloading] = useState(false);
  const [showUnlisted, setShowUnlisted] = usePersistentState("showUnlisted", true);
  const [showIdorName, setShowIdorName] = usePersistentState<"id" | "name">("showIdorName", "id"); // "id" = show model ID, "name" = show model name
  const filteredModels = useMemo(() => {
    return models.filter((model) => showUnlisted || !model.unlisted);
@@ -58,11 +59,20 @@ function ModelsPanel() {
    }
  }, [unloadAllModels]);
  const toggleIdorName = useCallback(() => {
    setShowIdorName((prev) => (prev === "name" ? "id" : "name"));
  }, [showIdorName]);
  return (
    <div className="card h-full flex flex-col">
      <div className="shrink-0">
        <h2>Models</h2>
        <div className="flex justify-between">
          <div className="flex gap-2">
            <button className="btn flex items-center gap-2" onClick={toggleIdorName} style={{ lineHeight: "1.2" }}>
              <RiSwapBoxFill /> {showIdorName === "id" ? "ID" : "Name"}
            </button>
            <button
              className="btn flex items-center gap-2"
              onClick={() => setShowUnlisted(!showUnlisted)}
@@ -70,6 +80,7 @@ function ModelsPanel() {
            >
              {showUnlisted ? <RiEyeFill /> : <RiEyeOffFill />} unlisted
            </button>
          </div>
          <button className="btn flex items-center gap-2" onClick={handleUnloadAllModels} disabled={isUnloading}>
            <RiStopCircleLine size="24" /> {isUnloading ? "Unloading..." : "Unload"}
          </button>
@@ -80,7 +91,7 @@ function ModelsPanel() {
        <table className="w-full">
          <thead className="sticky top-0 bg-card z-10">
            <tr className="border-b border-primary bg-surface">
-              <th className="text-left p-2">Name</th>
+              <th className="text-left p-2">{showIdorName === "id" ? "Model ID" : "Name"}</th>
              <th className="text-left p-2"></th>
              <th className="text-left p-2">State</th>
            </tr>
@@ -90,7 +101,7 @@ function ModelsPanel() {
              <tr key={model.id} className="border-b hover:bg-secondary-hover border-border">
                <td className={`p-2 ${model.unlisted ? "text-txtsecondary" : ""}`}>
                  <a href={`/upstream/${model.id}/`} className={`underline`} target="_blank">
-                    {model.name !== "" ? model.name : model.id}
+                    {showIdorName === "id" ? model.id : model.name !== "" ? model.name : model.id}
                  </a>
                  {model.description !== "" && (
                    <p className={model.unlisted ? "text-opacity-70" : ""}>
@@ -122,35 +133,41 @@ function ModelsPanel() {
 function StatsPanel() {
  const { metrics } = useAPI();
-  const [totalRequests, totalTokens, avgTokensPerSecond] = useMemo(() => {
+  const [totalRequests, totalInputTokens, totalOutputTokens, avgTokensPerSecond] = useMemo(() => {
    const totalRequests = metrics.length;
    if (totalRequests === 0) {
      return [0, 0, 0, 0];
    }
-    const totalTokens = metrics.reduce((sum, m) => sum + m.output_tokens, 0);
+    const totalInputTokens = metrics.reduce((sum, m) => sum + m.input_tokens, 0);
    const totalOutputTokens = metrics.reduce((sum, m) => sum + m.output_tokens, 0);
    const avgTokensPerSecond = (metrics.reduce((sum, m) => sum + m.tokens_per_second, 0) / totalRequests).toFixed(2);
-    return [totalRequests, totalTokens, avgTokensPerSecond];
+    return [totalRequests, totalInputTokens, totalOutputTokens, avgTokensPerSecond];
  }, [metrics]);
  return (
    <div className="card">
-      <h2>Chat Activity</h2>
-      <table className="w-full border border-gray-200">
+      <div className="rounded-lg overflow-hidden border border-gray-200">
+        <table className="w-full">
          <tbody>
-          <tr className="border-b border-gray-200">
-            <td className="py-2 px-4 font-medium border-r border-gray-200">Requests</td>
-            <td className="py-2 px-4 text-right">{totalRequests}</td>
-          </tr>
-          <tr className="border-b border-gray-200">
-            <td className="py-2 px-4 font-medium border-r border-gray-200">Total Tokens Generated</td>
-            <td className="py-2 px-4 text-right">{totalTokens}</td>
+            <tr>
+              <th className="p-2 font-medium border-b border-gray-200 text-right">Requests</th>
+              <th className="p-2 font-medium border-l border-b border-gray-200 text-right">Processed</th>
+              <th className="p-2 font-medium border-l border-b border-gray-200 text-right">Generated</th>
+              <th className="p-2 font-medium border-l border-b border-gray-200 text-right">Tokens/Sec</th>
            </tr>
            <tr>
-            <td className="py-2 px-4 font-medium border-r border-gray-200">Average Tokens/Second</td>
-            <td className="py-2 px-4 text-right">{avgTokensPerSecond}</td>
+              <td className="p-2 text-right border-r border-gray-200">{totalRequests}</td>
+              <td className="p-2 text-right border-r border-gray-200">
+                {new Intl.NumberFormat().format(totalInputTokens)}
+              </td>
+              <td className="p-2 text-right border-r border-gray-200">
+                {new Intl.NumberFormat().format(totalOutputTokens)}
+              </td>
+              <td className="p-2 text-right">{avgTokensPerSecond}</td>
            </tr>
          </tbody>
        </table>
      </div>
    </div>
  );
 }
Author	SHA1	Message	Date
Benson Wong	5dc6b3e6d9	Add barebones but working implementation of model preload (#209 , #235 ) Add barebones but working implementation of model preload * add config test for Preload hook * improve TestProxyManager_StartupHooks * docs for new hook configuration * add a .dev to .gitignore	2025-08-14 10:27:28 -07:00
Benson Wong	74c69f39ef	Add prompt processing metrics (#250 ) - capture prompt processing metrics - display prompt processing metrics on UI Activity page	2025-08-14 10:02:16 -07:00
Benson Wong	a186318892	Update Readme, Add screenshot for Activities page [skip ci]	2025-08-08 13:39:46 -07:00
Benson Wong	c4e4d5e1e9	Update Readme UI Screenshot [skip ci]	2025-08-08 13:33:47 -07:00
Benson Wong	7985e94ba4	add tokens processed to ui models page	2025-08-08 13:28:39 -07:00
Benson Wong	74556c3a36	Update bug-report.md [skip ci]	2025-08-08 09:52:05 -07:00
Benson Wong	5c381e4b30	Add gofmt linting to ci	2025-08-07 20:29:18 -07:00
Benson Wong	10569ed546	Fix model alias usage in upstream path (#230 ) Model alias values are not properly resolved and do not work in the upstream/ path. Related to #229.	2025-08-07 20:16:56 -07:00
Benson Wong	5b10b3c23f	UI Tweaks (#228 ) * sort model names in UI * add toggle to show model id/name on UI model page	2025-08-07 11:07:03 -07:00