Compare commits

...

9 Commits

Author SHA1 Message Date
Benson Wong 5dc6b3e6d9 Add barebones but working implementation of model preload (#209, #235)
Add barebones but working implementation of model preload

* add config test for Preload hook
* improve TestProxyManager_StartupHooks
* docs for new hook configuration
* add a .dev to .gitignore
2025-08-14 10:27:28 -07:00
Benson Wong 74c69f39ef Add prompt processing metrics (#250)
- capture prompt processing metrics
- display prompt processing metrics on UI Activity page
2025-08-14 10:02:16 -07:00
Benson Wong a186318892 Update Readme, Add screenshot for Activities page [skip ci] 2025-08-08 13:39:46 -07:00
Benson Wong c4e4d5e1e9 Update Readme UI Screenshot [skip ci] 2025-08-08 13:33:47 -07:00
Benson Wong 7985e94ba4 add tokens processed to ui models page 2025-08-08 13:28:39 -07:00
Benson Wong 74556c3a36 Update bug-report.md [skip ci] 2025-08-08 09:52:05 -07:00
Benson Wong 5c381e4b30 Add gofmt linting to ci 2025-08-07 20:29:18 -07:00
Benson Wong 10569ed546 Fix model alias usage in upstream path (#230)
Model alias values are now properly resolved and work in the upstream/ path.

Related to #229.
2025-08-07 20:16:56 -07:00
Benson Wong 5b10b3c23f UI Tweaks (#228)
* sort model names in UI

* add toggle to show model id/name on UI model page
2025-08-07 11:07:03 -07:00
18 changed files with 341 additions and 98 deletions
+3 -1
@@ -1,11 +1,13 @@
--- ---
name: Bug Report name: Bug Report
about: Something is not working as expected... about: I found a defect
title: '' title: ''
labels: bug labels: bug
assignees: '' assignees: ''
--- ---
> [!IMPORTANT]
> If you have questions about llama-swap please post in the Q&A in Discussions. Use bug reports when you've found a defect and wish to discuss a fix.
**Describe the bug** **Describe the bug**
A clear and concise description of what the bug is. A clear and concise description of what the bug is.
+7
@@ -22,6 +22,13 @@ jobs:
with: with:
go-version: '1.23' go-version: '1.23'
# Only run in this linux based runner
- name: Check Formatting
run: |
if [ "$(gofmt -l . | grep -v 'event/.*_test.go' | wc -l)" -gt 0 ]; then
gofmt -l . | grep -v 'event/.*_test.go'
exit 1
fi
# cache simple-responder to save the build time # cache simple-responder to save the build time
- name: Restore Simple Responder - name: Restore Simple Responder
id: restore-simple-responder id: restore-simple-responder
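The `Check Formatting` step above fails the build whenever `gofmt -l` lists a file. As a rough illustration of the same idea done in-process, the sketch below uses the standard library's `go/format` package. `isFormatted` is a hypothetical helper, not part of llama-swap, and it checks one source buffer instead of walking the repository.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// isFormatted reports whether src is already in canonical gofmt style.
// Source that fails to parse is reported as not formatted.
// (Hypothetical helper for illustration only.)
func isFormatted(src []byte) bool {
	out, err := format.Source(src)
	if err != nil {
		return false
	}
	return bytes.Equal(src, out)
}

func main() {
	clean := []byte("package main\n\nfunc main() {}\n")
	messy := []byte("package main\nfunc  main( ){}\n")

	fmt.Println("clean formatted:", isFormatted(clean))
	fmt.Println("messy formatted:", isFormatted(messy))
}
```

The CI step additionally filters out `event/.*_test.go` with `grep -v`; an in-process version would need an equivalent path filter.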
+1
@@ -4,3 +4,4 @@ build/
dist/ dist/
.vscode .vscode
.DS_Store .DS_Store
.dev/
+8 -3
@@ -31,8 +31,9 @@ Written in golang, it is very easy to install (single binary with no dependencie
- ✅ Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107)) - ✅ Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
- ✅ Automatic unloading of models after timeout by setting a `ttl` - ✅ Automatic unloading of models after timeout by setting a `ttl`
- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc) - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
- ✅ Docker and Podman support - Reliable Docker and Podman support with `cmdStart` and `cmdStop`
- ✅ Full control over server settings per model - ✅ Full control over server settings per model
- ✅ Preload models on startup with `hooks` ([#235](https://github.com/mostlygeek/llama-swap/pull/235))
## How does llama-swap work? ## How does llama-swap work?
@@ -71,9 +72,13 @@ See the [configuration documentation](https://github.com/mostlygeek/llama-swap/w
## Web UI ## Web UI
llama-swap ships with a real time web interface to monitor logs and status of models: llama-swap includes a real time web interface for monitoring logs and models:
<img width="1786" height="1334" alt="image" src="https://github.com/user-attachments/assets/d6258cb9-1dad-40db-828f-2be860aec8fe" /> <img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/adef4a8e-de0b-49db-885a-8f6dedae6799" />
The Activity Page shows recent requests:
<img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/5f3edee6-d03a-4ae5-ae06-b20ac1f135bd" />
## Installation ## Installation
+23
@@ -1,6 +1,13 @@
# llama-swap YAML configuration example # llama-swap YAML configuration example
# ------------------------------------- # -------------------------------------
# #
# 💡 Tip - Use an LLM with this file!
# ====================================
# This example configuration is written to be LLM friendly! Try
# copying this file into an LLM and asking it to explain or generate
# sections for you.
# ====================================
#
# - Below are all the available configuration options for llama-swap. # - Below are all the available configuration options for llama-swap.
# - Settings with a default value, or noted as optional can be omitted. # - Settings with a default value, or noted as optional can be omitted.
# - Settings that are marked required must be in your configuration file # - Settings that are marked required must be in your configuration file
@@ -207,3 +214,19 @@ groups:
- "forever-modelA" - "forever-modelA"
- "forever-modelB" - "forever-modelB"
- "forever-modelc" - "forever-modelc"
# hooks: a dictionary of event triggers and actions
# - optional, default: empty dictionary
# - the only supported hook is on_startup
hooks:
# on_startup: a dictionary of actions to perform on startup
# - optional, default: empty dictionary
# - the only supported action is preload
on_startup:
# preload: a list of model ids to load on startup
# - optional, default: empty list
# - model names must match keys in the models sections
# - when preloading multiple models at once, define a group
# otherwise models will be loaded and swapped out
preload:
- "llama"
+27
@@ -138,6 +138,14 @@ func (c *GroupConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
return nil return nil
} }
type HooksConfig struct {
OnStartup HookOnStartup `yaml:"on_startup"`
}
type HookOnStartup struct {
Preload []string `yaml:"preload"`
}
type Config struct { type Config struct {
HealthCheckTimeout int `yaml:"healthCheckTimeout"` HealthCheckTimeout int `yaml:"healthCheckTimeout"`
LogRequests bool `yaml:"logRequests"` LogRequests bool `yaml:"logRequests"`
@@ -155,6 +163,9 @@ type Config struct {
// automatic port assignments // automatic port assignments
StartPort int `yaml:"startPort"` StartPort int `yaml:"startPort"`
// hooks, see: #209
Hooks HooksConfig `yaml:"hooks"`
} }
func (c *Config) RealModelName(search string) (string, bool) { func (c *Config) RealModelName(search string) (string, bool) {
@@ -330,6 +341,22 @@ func LoadConfigFromReader(r io.Reader) (Config, error) {
} }
} }
// clean up hooks preload
if len(config.Hooks.OnStartup.Preload) > 0 {
var toPreload []string
for _, modelID := range config.Hooks.OnStartup.Preload {
modelID = strings.TrimSpace(modelID)
if modelID == "" {
continue
}
if real, found := config.RealModelName(modelID); found {
toPreload = append(toPreload, real)
}
}
config.Hooks.OnStartup.Preload = toPreload
}
return config, nil return config, nil
} }
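The preload clean-up in `LoadConfigFromReader` trims each entry, skips blanks, and keeps only IDs that `RealModelName` can resolve, so aliases are rewritten to real model IDs and unknown names are dropped without an error. A minimal standalone sketch of that behaviour follows. It assumes a simplified model table (`map[string][]string` of model ID to aliases); `resolve` and `cleanPreload` are hypothetical stand-ins, not the project's functions.

```go
package main

import (
	"fmt"
	"strings"
)

// resolve is a hypothetical stand-in for Config.RealModelName: it maps a
// model ID or one of its aliases to the real model ID.
func resolve(models map[string][]string, search string) (string, bool) {
	if _, ok := models[search]; ok {
		return search, true
	}
	for id, aliases := range models {
		for _, alias := range aliases {
			if alias == search {
				return id, true
			}
		}
	}
	return "", false
}

// cleanPreload trims whitespace, skips empty entries, resolves aliases,
// and silently drops names that do not match any model.
func cleanPreload(models map[string][]string, preload []string) []string {
	var out []string
	for _, id := range preload {
		id = strings.TrimSpace(id)
		if id == "" {
			continue
		}
		if realID, ok := resolve(models, id); ok {
			out = append(out, realID)
		}
	}
	return out
}

func main() {
	models := map[string][]string{
		"llama": nil,
		"qwen":  {"qwen-alias"},
	}
	preload := []string{" llama ", "", "qwen-alias", "missing"}
	fmt.Println(cleanPreload(models, preload))
}
```

Because unresolved names are dropped rather than rejected, a typo in `preload` would not fail configuration loading under this logic; the model would simply not be preloaded.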
+8
@@ -100,6 +100,9 @@ func TestConfig_LoadPosix(t *testing.T) {
content := ` content := `
macros: macros:
svr-path: "path/to/server" svr-path: "path/to/server"
hooks:
on_startup:
preload: ["model1", "model2"]
models: models:
model1: model1:
cmd: path/to/cmd --arg1 one cmd: path/to/cmd --arg1 one
@@ -163,6 +166,11 @@ groups:
Macros: map[string]string{ Macros: map[string]string{
"svr-path": "path/to/server", "svr-path": "path/to/server",
}, },
Hooks: HooksConfig{
OnStartup: HookOnStartup{
Preload: []string{"model1", "model2"},
},
},
Models: map[string]ModelConfig{ Models: map[string]ModelConfig{
"model1": { "model1": {
Cmd: "path/to/cmd --arg1 one", Cmd: "path/to/cmd --arg1 one",
+27
@@ -0,0 +1,27 @@
package proxy
import "net/http"
// Custom discard writer that implements http.ResponseWriter but just discards everything
type DiscardWriter struct {
header http.Header
status int
}
func (w *DiscardWriter) Header() http.Header {
if w.header == nil {
w.header = make(http.Header)
}
return w.header
}
func (w *DiscardWriter) Write(data []byte) (int, error) {
return len(data), nil
}
func (w *DiscardWriter) WriteHeader(code int) {
w.status = code
}
// Satisfy the http.Flusher interface for streaming responses
func (w *DiscardWriter) Flush() {}
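To show how a writer like this can drive a handler without a real client, here is a small self-contained sketch. The type is repeated so the example compiles on its own; `warmUp` and `noContentHandler` are hypothetical names introduced only for illustration. The two blank-identifier declarations are compile-time checks that the type satisfies `http.ResponseWriter` and `http.Flusher`.

```go
package main

import (
	"fmt"
	"net/http"
)

// DiscardWriter mirrors the type in this diff: it records the status
// code and throws away headers and body.
type DiscardWriter struct {
	header http.Header
	status int
}

func (w *DiscardWriter) Header() http.Header {
	if w.header == nil {
		w.header = make(http.Header)
	}
	return w.header
}

func (w *DiscardWriter) Write(data []byte) (int, error) { return len(data), nil }
func (w *DiscardWriter) WriteHeader(code int)           { w.status = code }
func (w *DiscardWriter) Flush()                         {}

// Compile-time interface checks.
var (
	_ http.ResponseWriter = (*DiscardWriter)(nil)
	_ http.Flusher        = (*DiscardWriter)(nil)
)

// noContentHandler is a hypothetical handler used only for this demo.
func noContentHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
		w.Write([]byte("ignored"))
	})
}

// warmUp sends a throwaway GET / through h and returns the recorded status.
func warmUp(h http.Handler) (int, error) {
	req, err := http.NewRequest("GET", "/", nil)
	if err != nil {
		return 0, err
	}
	w := &DiscardWriter{}
	h.ServeHTTP(w, req)
	return w.status, nil
}

func main() {
	status, err := warmUp(noContentHandler())
	fmt.Println(status, err)
}
```

One difference from a real `http.ResponseWriter`: if the handler never calls `WriteHeader`, the recorded status stays 0 instead of defaulting to 200.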
+10
@@ -7,6 +7,7 @@ const ChatCompletionStatsEventID = 0x02
const ConfigFileChangedEventID = 0x03 const ConfigFileChangedEventID = 0x03
const LogDataEventID = 0x04 const LogDataEventID = 0x04
const TokenMetricsEventID = 0x05 const TokenMetricsEventID = 0x05
const ModelPreloadedEventID = 0x06
type ProcessStateChangeEvent struct { type ProcessStateChangeEvent struct {
ProcessName string ProcessName string
@@ -48,3 +49,12 @@ type LogDataEvent struct {
func (e LogDataEvent) Type() uint32 { func (e LogDataEvent) Type() uint32 {
return LogDataEventID return LogDataEventID
} }
type ModelPreloadedEvent struct {
ModelName string
Success bool
}
func (e ModelPreloadedEvent) Type() uint32 {
return ModelPreloadedEventID
}
+5 -6
@@ -13,9 +13,10 @@ import (
) )
var ( var (
nextTestPort int = 12000 nextTestPort int = 12000
portMutex sync.Mutex portMutex sync.Mutex
testLogger = NewLogMonitorWriter(os.Stdout) testLogger = NewLogMonitorWriter(os.Stdout)
simpleResponderPath = getSimpleResponderPath()
) )
// Check if the binary exists // Check if the binary exists
@@ -69,13 +70,11 @@ func getTestSimpleResponderConfig(expectedMessage string) ModelConfig {
} }
func getTestSimpleResponderConfigPort(expectedMessage string, port int) ModelConfig { func getTestSimpleResponderConfigPort(expectedMessage string, port int) ModelConfig {
binaryPath := getSimpleResponderPath()
// Create a YAML string with just the values we want to set // Create a YAML string with just the values we want to set
yamlStr := fmt.Sprintf(` yamlStr := fmt.Sprintf(`
cmd: '%s --port %d --silent --respond %s' cmd: '%s --port %d --silent --respond %s'
proxy: "http://127.0.0.1:%d" proxy: "http://127.0.0.1:%d"
`, binaryPath, port, expectedMessage, port) `, simpleResponderPath, port, expectedMessage, port)
var cfg ModelConfig var cfg ModelConfig
if err := yaml.Unmarshal([]byte(yamlStr), &cfg); err != nil { if err := yaml.Unmarshal([]byte(yamlStr), &cfg); err != nil {
+3
@@ -79,10 +79,12 @@ func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
outputTokens := int(jsonData.Get("usage.completion_tokens").Int()) outputTokens := int(jsonData.Get("usage.completion_tokens").Int())
inputTokens := int(jsonData.Get("usage.prompt_tokens").Int()) inputTokens := int(jsonData.Get("usage.prompt_tokens").Int())
tokensPerSecond := -1.0 tokensPerSecond := -1.0
promptPerSecond := -1.0
durationMs := int(time.Since(rec.startTime).Milliseconds()) durationMs := int(time.Since(rec.startTime).Milliseconds())
// use llama-server's timing data for tok/sec and duration as it is more accurate // use llama-server's timing data for tok/sec and duration as it is more accurate
if timings := jsonData.Get("timings"); timings.Exists() { if timings := jsonData.Get("timings"); timings.Exists() {
promptPerSecond = jsonData.Get("timings.prompt_per_second").Float()
tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float() tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float()
durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float()) durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float())
} }
@@ -92,6 +94,7 @@ func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
Model: rec.realModelName, Model: rec.realModelName,
InputTokens: inputTokens, InputTokens: inputTokens,
OutputTokens: outputTokens, OutputTokens: outputTokens,
PromptPerSecond: promptPerSecond,
TokensPerSecond: tokensPerSecond, TokensPerSecond: tokensPerSecond,
DurationMs: durationMs, DurationMs: durationMs,
}) })
+1
@@ -15,6 +15,7 @@ type TokenMetrics struct {
Model string `json:"model"` Model string `json:"model"`
InputTokens int `json:"input_tokens"` InputTokens int `json:"input_tokens"`
OutputTokens int `json:"output_tokens"` OutputTokens int `json:"output_tokens"`
PromptPerSecond float64 `json:"prompt_per_second"`
TokensPerSecond float64 `json:"tokens_per_second"` TokensPerSecond float64 `json:"tokens_per_second"`
DurationMs int `json:"duration_ms"` DurationMs int `json:"duration_ms"`
} }
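The new `PromptPerSecond` field follows the same convention as `TokensPerSecond`: it starts at `-1.0` and is overwritten only when the response carries a `timings` object. The sketch below reproduces that fallback using the standard library's `encoding/json` instead of `gjson`, which is a deliberate substitution to keep the example dependency-free. The `timings`, `response`, and `speeds` names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// timings holds the two llama-server fields read in this diff.
type timings struct {
	PromptPerSecond    float64 `json:"prompt_per_second"`
	PredictedPerSecond float64 `json:"predicted_per_second"`
}

// response uses a pointer so a missing "timings" object is detectable.
type response struct {
	Timings *timings `json:"timings"`
}

// speeds returns prompt-processing and generation speed in tokens/sec,
// or -1 for both when the response has no "timings" object.
func speeds(body []byte) (prompt, gen float64, err error) {
	prompt, gen = -1.0, -1.0
	var r response
	if err = json.Unmarshal(body, &r); err != nil {
		return prompt, gen, err
	}
	if r.Timings != nil {
		prompt = r.Timings.PromptPerSecond
		gen = r.Timings.PredictedPerSecond
	}
	return prompt, gen, nil
}

func main() {
	withTimings := []byte(`{"timings":{"prompt_per_second":512.5,"predicted_per_second":42}}`)
	withoutTimings := []byte(`{"usage":{"prompt_tokens":3,"completion_tokens":7}}`)

	fmt.Println(speeds(withTimings))
	fmt.Println(speeds(withoutTimings))
}
```

Consumers of the metric, such as a UI formatter, need to treat `-1` as "not reported" and avoid rendering it as a real speed.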
+32 -2
@@ -15,6 +15,7 @@ import (
"time" "time"
"github.com/gin-gonic/gin" "github.com/gin-gonic/gin"
"github.com/mostlygeek/llama-swap/event"
"github.com/tidwall/gjson" "github.com/tidwall/gjson"
"github.com/tidwall/sjson" "github.com/tidwall/sjson"
) )
@@ -96,6 +97,35 @@ func New(config Config) *ProxyManager {
} }
pm.setupGinEngine() pm.setupGinEngine()
// run any startup hooks
if len(config.Hooks.OnStartup.Preload) > 0 {
// do it in the background, don't block startup -- not sure if good idea yet
go func() {
discardWriter := &DiscardWriter{}
for _, realModelName := range config.Hooks.OnStartup.Preload {
proxyLogger.Infof("Preloading model: %s", realModelName)
processGroup, _, err := pm.swapProcessGroup(realModelName)
if err != nil {
event.Emit(ModelPreloadedEvent{
ModelName: realModelName,
Success: false,
})
proxyLogger.Errorf("Failed to preload model %s: %v", realModelName, err)
continue
} else {
req, _ := http.NewRequest("GET", "/", nil)
processGroup.ProxyRequest(realModelName, discardWriter, req)
event.Emit(ModelPreloadedEvent{
ModelName: realModelName,
Success: true,
})
}
}
}()
}
return pm return pm
} }
@@ -361,7 +391,7 @@ func (pm *ProxyManager) proxyToUpstream(c *gin.Context) {
return return
} }
processGroup, _, err := pm.swapProcessGroup(requestedModel) processGroup, realModelName, err := pm.swapProcessGroup(requestedModel)
if err != nil { if err != nil {
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error())) pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error swapping process group: %s", err.Error()))
return return
@@ -369,7 +399,7 @@ func (pm *ProxyManager) proxyToUpstream(c *gin.Context) {
// rewrite the path // rewrite the path
c.Request.URL.Path = c.Param("upstreamPath") c.Request.URL.Path = c.Param("upstreamPath")
processGroup.ProxyRequest(requestedModel, c.Writer, c.Request) processGroup.ProxyRequest(realModelName, c.Writer, c.Request)
} }
func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) { func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
+123 -49
@@ -9,10 +9,12 @@ import (
"net/http" "net/http"
"net/http/httptest" "net/http/httptest"
"strconv" "strconv"
"strings"
"sync" "sync"
"testing" "testing"
"time" "time"
"github.com/mostlygeek/llama-swap/event"
"github.com/stretchr/testify/assert" "github.com/stretchr/testify/assert"
"github.com/tidwall/gjson" "github.com/tidwall/gjson"
) )
@@ -280,48 +282,48 @@ func TestProxyManager_ListModelsHandler(t *testing.T) {
} }
func TestProxyManager_ListModelsHandler_SortedByID(t *testing.T) { func TestProxyManager_ListModelsHandler_SortedByID(t *testing.T) {
// Intentionally add models in non-sorted order and with an unlisted model // Intentionally add models in non-sorted order and with an unlisted model
config := Config{ config := Config{
HealthCheckTimeout: 15, HealthCheckTimeout: 15,
Models: map[string]ModelConfig{ Models: map[string]ModelConfig{
"zeta": getTestSimpleResponderConfig("zeta"), "zeta": getTestSimpleResponderConfig("zeta"),
"alpha": getTestSimpleResponderConfig("alpha"), "alpha": getTestSimpleResponderConfig("alpha"),
"beta": getTestSimpleResponderConfig("beta"), "beta": getTestSimpleResponderConfig("beta"),
"hidden": func() ModelConfig { "hidden": func() ModelConfig {
mc := getTestSimpleResponderConfig("hidden") mc := getTestSimpleResponderConfig("hidden")
mc.Unlisted = true mc.Unlisted = true
return mc return mc
}(), }(),
}, },
LogLevel: "error", LogLevel: "error",
} }
proxy := New(config) proxy := New(config)
// Request models list // Request models list
req := httptest.NewRequest("GET", "/v1/models", nil) req := httptest.NewRequest("GET", "/v1/models", nil)
w := httptest.NewRecorder() w := httptest.NewRecorder()
proxy.ServeHTTP(w, req) proxy.ServeHTTP(w, req)
assert.Equal(t, http.StatusOK, w.Code) assert.Equal(t, http.StatusOK, w.Code)
var response struct { var response struct {
Data []map[string]interface{} `json:"data"` Data []map[string]interface{} `json:"data"`
} }
if err := json.Unmarshal(w.Body.Bytes(), &response); err != nil { if err := json.Unmarshal(w.Body.Bytes(), &response); err != nil {
t.Fatalf("Failed to parse JSON response: %v", err) t.Fatalf("Failed to parse JSON response: %v", err)
} }
// We expect only the listed models in sorted order by id // We expect only the listed models in sorted order by id
expectedOrder := []string{"alpha", "beta", "zeta"} expectedOrder := []string{"alpha", "beta", "zeta"}
if assert.Len(t, response.Data, len(expectedOrder), "unexpected number of listed models") { if assert.Len(t, response.Data, len(expectedOrder), "unexpected number of listed models") {
got := make([]string, 0, len(response.Data)) got := make([]string, 0, len(response.Data))
for _, m := range response.Data { for _, m := range response.Data {
id, _ := m["id"].(string) id, _ := m["id"].(string)
got = append(got, id) got = append(got, id)
} }
assert.Equal(t, expectedOrder, got, "models should be sorted by id ascending") assert.Equal(t, expectedOrder, got, "models should be sorted by id ascending")
} }
} }
func TestProxyManager_Shutdown(t *testing.T) { func TestProxyManager_Shutdown(t *testing.T) {
@@ -656,21 +658,34 @@ func TestProxyManager_CORSOptionsHandler(t *testing.T) {
} }
func TestProxyManager_Upstream(t *testing.T) { func TestProxyManager_Upstream(t *testing.T) {
config := AddDefaultGroupToConfig(Config{ configStr := fmt.Sprintf(`
HealthCheckTimeout: 15, logLevel: error
Models: map[string]ModelConfig{ models:
"model1": getTestSimpleResponderConfig("model1"), model1:
}, cmd: %s -port ${PORT} -silent -respond model1
LogLevel: "error", aliases: [model-alias]
}) `, getSimpleResponderPath())
config, err := LoadConfigFromReader(strings.NewReader(configStr))
assert.NoError(t, err)
proxy := New(config) proxy := New(config)
defer proxy.StopProcesses(StopWaitForInflightRequest) defer proxy.StopProcesses(StopWaitForInflightRequest)
req := httptest.NewRequest("GET", "/upstream/model1/test", nil) t.Run("main model name", func(t *testing.T) {
rec := httptest.NewRecorder() req := httptest.NewRequest("GET", "/upstream/model1/test", nil)
proxy.ServeHTTP(rec, req) rec := httptest.NewRecorder()
assert.Equal(t, http.StatusOK, rec.Code) proxy.ServeHTTP(rec, req)
assert.Equal(t, "model1", rec.Body.String()) assert.Equal(t, http.StatusOK, rec.Code)
assert.Equal(t, "model1", rec.Body.String())
})
t.Run("model alias", func(t *testing.T) {
req := httptest.NewRequest("GET", "/upstream/model-alias/test", nil)
rec := httptest.NewRecorder()
proxy.ServeHTTP(rec, req)
assert.Equal(t, http.StatusOK, rec.Code)
assert.Equal(t, "model1", rec.Body.String())
})
} }
func TestProxyManager_ChatContentLength(t *testing.T) { func TestProxyManager_ChatContentLength(t *testing.T) {
@@ -818,3 +833,62 @@ func TestProxyManager_HealthEndpoint(t *testing.T) {
assert.Equal(t, http.StatusOK, rec.Code) assert.Equal(t, http.StatusOK, rec.Code)
assert.Equal(t, "OK", rec.Body.String()) assert.Equal(t, "OK", rec.Body.String())
} }
func TestProxyManager_StartupHooks(t *testing.T) {
// Use real YAML here: the configuration has grown more complex and
// LoadConfigFromReader() now does much more than parse YAML.
// Eventually all tests should migrate to this approach.
configStr := strings.Replace(`
logLevel: error
hooks:
on_startup:
preload:
- model1
- model2
groups:
preloadTestGroup:
swap: false
members:
- model1
- model2
models:
model1:
cmd: ${simpleresponderpath} --port ${PORT} --silent --respond model1
model2:
cmd: ${simpleresponderpath} --port ${PORT} --silent --respond model2
`, "${simpleresponderpath}", simpleResponderPath, -1)
// Create a test model configuration
config, err := LoadConfigFromReader(strings.NewReader(configStr))
if !assert.NoError(t, err, "Invalid configuration") {
return
}
preloadChan := make(chan ModelPreloadedEvent, 2) // buffer for 2 expected events
unsub := event.On(func(e ModelPreloadedEvent) {
preloadChan <- e
})
defer unsub()
// Create the proxy which should trigger preloading
proxy := New(config)
defer proxy.StopProcesses(StopWaitForInflightRequest)
for i := 0; i < 2; i++ {
select {
case <-preloadChan:
case <-time.After(5 * time.Second):
t.Fatal("timed out waiting for models to preload")
}
}
// make sure they are both loaded
_, foundGroup := proxy.processGroups["preloadTestGroup"]
if !assert.True(t, foundGroup, "preloadTestGroup should exist") {
return
}
assert.Equal(t, StateReady, proxy.processGroups["preloadTestGroup"].processes["model1"].CurrentState())
assert.Equal(t, StateReady, proxy.processGroups["preloadTestGroup"].processes["model2"].CurrentState())
}
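The test above waits for two `ModelPreloadedEvent` values on a buffered channel, with a timeout per event, and discards each value it receives. A generic version of that wait loop, which also returns the events so a caller could inspect `Success`, might look like the sketch below. `preloadEvent`, `waitForEvents`, `demoSuccess`, and `demoTimeout` are hypothetical names; the project's `event` package is not used here.

```go
package main

import (
	"fmt"
	"time"
)

// preloadEvent is a hypothetical stand-in for ModelPreloadedEvent.
type preloadEvent struct {
	ModelName string
	Success   bool
}

// waitForEvents receives n events from ch, allowing up to timeout for
// each one, and returns whatever arrived before a timeout.
func waitForEvents(ch <-chan preloadEvent, n int, timeout time.Duration) ([]preloadEvent, error) {
	got := make([]preloadEvent, 0, n)
	for i := 0; i < n; i++ {
		select {
		case e := <-ch:
			got = append(got, e)
		case <-time.After(timeout):
			return got, fmt.Errorf("timed out after %d of %d events", len(got), n)
		}
	}
	return got, nil
}

// demoSuccess emits two events from a goroutine and waits for both.
func demoSuccess() ([]preloadEvent, error) {
	ch := make(chan preloadEvent, 2) // buffered so the sender never blocks
	go func() {
		ch <- preloadEvent{ModelName: "model1", Success: true}
		ch <- preloadEvent{ModelName: "model2", Success: true}
	}()
	return waitForEvents(ch, 2, time.Second)
}

// demoTimeout waits on a channel that never receives anything.
func demoTimeout() error {
	ch := make(chan preloadEvent)
	_, err := waitForEvents(ch, 1, 10*time.Millisecond)
	return err
}

func main() {
	events, err := demoSuccess()
	fmt.Println(events, err)
	fmt.Println(demoTimeout())
}
```

Sizing the channel buffer to the number of expected events keeps the emitting goroutine from blocking if the receiver gives up early.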
+7
@@ -28,6 +28,7 @@ interface Metrics {
model: string; model: string;
input_tokens: number; input_tokens: number;
output_tokens: number; output_tokens: number;
prompt_per_second: number;
tokens_per_second: number; tokens_per_second: number;
duration_ms: number; duration_ms: number;
} }
@@ -83,6 +84,12 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
case "modelStatus": case "modelStatus":
{ {
const models = JSON.parse(message.data) as Model[]; const models = JSON.parse(message.data) as Model[];
// sort models by name and id
models.sort((a, b) => {
return (a.name + a.id).localeCompare(b.name + b.id);
});
setModels(models); setModels(models);
} }
break; break;
+2
@@ -51,6 +51,7 @@ const ActivityPage = () => {
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Model</th> <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Model</th>
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Input Tokens</th> <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Input Tokens</th>
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Output Tokens</th> <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Output Tokens</th>
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Prompt Processing</th>
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Generation Speed</th> <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Generation Speed</th>
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Duration</th> <th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Duration</th>
</tr> </tr>
@@ -62,6 +63,7 @@ const ActivityPage = () => {
<td className="px-6 py-4 whitespace-nowrap text-sm">{metric.model}</td> <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.model}</td>
<td className="px-6 py-4 whitespace-nowrap text-sm">{metric.input_tokens.toLocaleString()}</td> <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.input_tokens.toLocaleString()}</td>
<td className="px-6 py-4 whitespace-nowrap text-sm">{metric.output_tokens.toLocaleString()}</td> <td className="px-6 py-4 whitespace-nowrap text-sm">{metric.output_tokens.toLocaleString()}</td>
<td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.prompt_per_second)}</td>
<td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.tokens_per_second)}</td> <td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.tokens_per_second)}</td>
<td className="px-6 py-4 whitespace-nowrap text-sm">{formatDuration(metric.duration_ms)}</td> <td className="px-6 py-4 whitespace-nowrap text-sm">{formatDuration(metric.duration_ms)}</td>
</tr> </tr>
+47 -30
@@ -4,7 +4,7 @@ import { LogPanel } from "./LogViewer";
import { usePersistentState } from "../hooks/usePersistentState"; import { usePersistentState } from "../hooks/usePersistentState";
import { Panel, PanelGroup, PanelResizeHandle } from "react-resizable-panels"; import { Panel, PanelGroup, PanelResizeHandle } from "react-resizable-panels";
import { useTheme } from "../contexts/ThemeProvider"; import { useTheme } from "../contexts/ThemeProvider";
import { RiEyeFill, RiEyeOffFill, RiStopCircleLine } from "react-icons/ri"; import { RiEyeFill, RiEyeOffFill, RiStopCircleLine, RiSwapBoxFill } from "react-icons/ri";
export default function ModelsPage() { export default function ModelsPage() {
const { isNarrow } = useTheme(); const { isNarrow } = useTheme();
@@ -40,6 +40,7 @@ function ModelsPanel() {
const { models, loadModel, unloadAllModels } = useAPI(); const { models, loadModel, unloadAllModels } = useAPI();
const [isUnloading, setIsUnloading] = useState(false); const [isUnloading, setIsUnloading] = useState(false);
const [showUnlisted, setShowUnlisted] = usePersistentState("showUnlisted", true); const [showUnlisted, setShowUnlisted] = usePersistentState("showUnlisted", true);
const [showIdorName, setShowIdorName] = usePersistentState<"id" | "name">("showIdorName", "id"); // "id" = show model ID, "name" = show model name
const filteredModels = useMemo(() => { const filteredModels = useMemo(() => {
return models.filter((model) => showUnlisted || !model.unlisted); return models.filter((model) => showUnlisted || !model.unlisted);
@@ -58,18 +59,28 @@ function ModelsPanel() {
} }
}, [unloadAllModels]); }, [unloadAllModels]);
const toggleIdorName = useCallback(() => {
setShowIdorName((prev) => (prev === "name" ? "id" : "name"));
}, [showIdorName]);
return ( return (
<div className="card h-full flex flex-col"> <div className="card h-full flex flex-col">
<div className="shrink-0"> <div className="shrink-0">
<h2>Models</h2> <h2>Models</h2>
<div className="flex justify-between"> <div className="flex justify-between">
<button <div className="flex gap-2">
className="btn flex items-center gap-2" <button className="btn flex items-center gap-2" onClick={toggleIdorName} style={{ lineHeight: "1.2" }}>
onClick={() => setShowUnlisted(!showUnlisted)} <RiSwapBoxFill /> {showIdorName === "id" ? "ID" : "Name"}
style={{ lineHeight: "1.2" }} </button>
>
{showUnlisted ? <RiEyeFill /> : <RiEyeOffFill />} unlisted <button
</button> className="btn flex items-center gap-2"
onClick={() => setShowUnlisted(!showUnlisted)}
style={{ lineHeight: "1.2" }}
>
{showUnlisted ? <RiEyeFill /> : <RiEyeOffFill />} unlisted
</button>
</div>
<button className="btn flex items-center gap-2" onClick={handleUnloadAllModels} disabled={isUnloading}> <button className="btn flex items-center gap-2" onClick={handleUnloadAllModels} disabled={isUnloading}>
<RiStopCircleLine size="24" /> {isUnloading ? "Unloading..." : "Unload"} <RiStopCircleLine size="24" /> {isUnloading ? "Unloading..." : "Unload"}
</button> </button>
@@ -80,7 +91,7 @@ function ModelsPanel() {
<table className="w-full"> <table className="w-full">
<thead className="sticky top-0 bg-card z-10"> <thead className="sticky top-0 bg-card z-10">
<tr className="border-b border-primary bg-surface"> <tr className="border-b border-primary bg-surface">
<th className="text-left p-2">Name</th> <th className="text-left p-2">{showIdorName === "id" ? "Model ID" : "Name"}</th>
<th className="text-left p-2"></th> <th className="text-left p-2"></th>
<th className="text-left p-2">State</th> <th className="text-left p-2">State</th>
</tr> </tr>
@@ -90,7 +101,7 @@ function ModelsPanel() {
<tr key={model.id} className="border-b hover:bg-secondary-hover border-border"> <tr key={model.id} className="border-b hover:bg-secondary-hover border-border">
<td className={`p-2 ${model.unlisted ? "text-txtsecondary" : ""}`}> <td className={`p-2 ${model.unlisted ? "text-txtsecondary" : ""}`}>
<a href={`/upstream/${model.id}/`} className={`underline`} target="_blank"> <a href={`/upstream/${model.id}/`} className={`underline`} target="_blank">
{model.name !== "" ? model.name : model.id} {showIdorName === "id" ? model.id : model.name !== "" ? model.name : model.id}
</a> </a>
{model.description !== "" && ( {model.description !== "" && (
<p className={model.unlisted ? "text-opacity-70" : ""}> <p className={model.unlisted ? "text-opacity-70" : ""}>
@@ -122,35 +133,41 @@ function ModelsPanel() {
function StatsPanel() { function StatsPanel() {
const { metrics } = useAPI(); const { metrics } = useAPI();
const [totalRequests, totalTokens, avgTokensPerSecond] = useMemo(() => { const [totalRequests, totalInputTokens, totalOutputTokens, avgTokensPerSecond] = useMemo(() => {
const totalRequests = metrics.length; const totalRequests = metrics.length;
if (totalRequests === 0) { if (totalRequests === 0) {
return [0, 0, 0]; return [0, 0, 0, 0];
} }
const totalTokens = metrics.reduce((sum, m) => sum + m.output_tokens, 0); const totalInputTokens = metrics.reduce((sum, m) => sum + m.input_tokens, 0);
const totalOutputTokens = metrics.reduce((sum, m) => sum + m.output_tokens, 0);
const avgTokensPerSecond = (metrics.reduce((sum, m) => sum + m.tokens_per_second, 0) / totalRequests).toFixed(2); const avgTokensPerSecond = (metrics.reduce((sum, m) => sum + m.tokens_per_second, 0) / totalRequests).toFixed(2);
return [totalRequests, totalTokens, avgTokensPerSecond]; return [totalRequests, totalInputTokens, totalOutputTokens, avgTokensPerSecond];
}, [metrics]); }, [metrics]);
return ( return (
<div className="card"> <div className="card">
<h2>Chat Activity</h2> <div className="rounded-lg overflow-hidden border border-gray-200">
<table className="w-full border border-gray-200"> <table className="w-full">
<tbody> <tbody>
<tr className="border-b border-gray-200"> <tr>
<td className="py-2 px-4 font-medium border-r border-gray-200">Requests</td> <th className="p-2 font-medium border-b border-gray-200 text-right">Requests</th>
<td className="py-2 px-4 text-right">{totalRequests}</td> <th className="p-2 font-medium border-l border-b border-gray-200 text-right">Processed</th>
</tr> <th className="p-2 font-medium border-l border-b border-gray-200 text-right">Generated</th>
<tr className="border-b border-gray-200"> <th className="p-2 font-medium border-l border-b border-gray-200 text-right">Tokens/Sec</th>
<td className="py-2 px-4 font-medium border-r border-gray-200">Total Tokens Generated</td> </tr>
<td className="py-2 px-4 text-right">{totalTokens}</td> <tr>
</tr> <td className="p-2 text-right border-r border-gray-200">{totalRequests}</td>
<tr> <td className="p-2 text-right border-r border-gray-200">
<td className="py-2 px-4 font-medium border-r border-gray-200">Average Tokens/Second</td> {new Intl.NumberFormat().format(totalInputTokens)}
<td className="py-2 px-4 text-right">{avgTokensPerSecond}</td> </td>
</tr> <td className="p-2 text-right border-r border-gray-200">
</tbody> {new Intl.NumberFormat().format(totalOutputTokens)}
</table> </td>
<td className="p-2 text-right">{avgTokensPerSecond}</td>
</tr>
</tbody>
</table>
</div>
</div> </div>
); );
} }