Compare commits
11 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 57803fd3aa | |||
| c55d0cc842 | |||
| 7acbaf4712 | |||
| fcc5ad135a | |||
| 305e5a0031 | |||
| 04fc67354a | |||
| 4662cf7699 | |||
| 5dc6b3e6d9 | |||
| 74c69f39ef | |||
| a186318892 | |||
| c4e4d5e1e9 |
@@ -2,7 +2,7 @@
|
||||
name: Bug Report
|
||||
about: I found a defect
|
||||
title: ''
|
||||
labels: bug
|
||||
labels: 'unconfirmed bug'
|
||||
assignees: ''
|
||||
|
||||
---
|
||||
|
||||
@@ -4,3 +4,4 @@ build/
|
||||
dist/
|
||||
.vscode
|
||||
.DS_Store
|
||||
.dev/
|
||||
|
||||
@@ -18,9 +18,11 @@ Written in golang, it is very easy to install (single binary with no dependencie
|
||||
- `v1/completions`
|
||||
- `v1/chat/completions`
|
||||
- `v1/embeddings`
|
||||
- `v1/rerank`, `v1/reranking`, `rerank`
|
||||
- `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
|
||||
- `v1/audio/transcriptions` ([docs](https://github.com/mostlygeek/llama-swap/issues/41#issuecomment-2722637867))
|
||||
- ✅ llama-server (llama.cpp) supported endpoints:
|
||||
- `v1/rerank`, `v1/reranking`, `/rerank`
|
||||
- `/infill` - for code infilling
|
||||
- ✅ llama-swap custom API endpoints
|
||||
- `/ui` - web UI
|
||||
- `/log` - remote log monitoring
|
||||
@@ -31,8 +33,9 @@ Written in golang, it is very easy to install (single binary with no dependencie
|
||||
- ✅ Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
|
||||
- ✅ Automatic unloading of models after timeout by setting a `ttl`
|
||||
- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
|
||||
- ✅ Docker and Podman support
|
||||
- ✅ Reliable Docker and Podman support with `cmdStart` and `cmdStop`
|
||||
- ✅ Full control over server settings per model
|
||||
- ✅ Preload models on startup with `hooks` ([#235](https://github.com/mostlygeek/llama-swap/pull/235))
|
||||
|
||||
## How does llama-swap work?
|
||||
|
||||
@@ -71,9 +74,13 @@ See the [configuration documentation](https://github.com/mostlygeek/llama-swap/w
|
||||
|
||||
## Web UI
|
||||
|
||||
llama-swap ships with a real time web interface to monitor logs and status of models:
|
||||
llama-swap includes a real time web interface for monitoring logs and models:
|
||||
|
||||
<img width="1786" height="1334" alt="image" src="https://github.com/user-attachments/assets/d6258cb9-1dad-40db-828f-2be860aec8fe" />
|
||||
<img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/adef4a8e-de0b-49db-885a-8f6dedae6799" />
|
||||
|
||||
The Activity Page shows recent requests:
|
||||
|
||||
<img width="1360" height="963" alt="image" src="https://github.com/user-attachments/assets/5f3edee6-d03a-4ae5-ae06-b20ac1f135bd" />
|
||||
|
||||
## Installation
|
||||
|
||||
|
||||
+63
-23
@@ -1,9 +1,17 @@
|
||||
# llama-swap YAML configuration example
|
||||
# -------------------------------------
|
||||
#
|
||||
# 💡 Tip - Use an LLM with this file!
|
||||
# ====================================
|
||||
# This example configuration is written to be LLM friendly. Try
|
||||
# copying this file into an LLM and asking it to explain or generate
|
||||
# sections for you.
|
||||
# ====================================
|
||||
|
||||
# Usage notes:
|
||||
# - Below are all the available configuration options for llama-swap.
|
||||
# - Settings with a default value, or noted as optional can be omitted.
|
||||
# - Settings that are marked required must be in your configuration file
|
||||
# - Settings noted as "required" must be in your configuration file
|
||||
# - Settings noted as "optional" can be omitted
|
||||
|
||||
# healthCheckTimeout: number of seconds to wait for a model to be ready to serve requests
|
||||
# - optional, default: 120
|
||||
@@ -27,9 +35,9 @@ metricsMaxInMemory: 1000
|
||||
# - it is automatically incremented for every model that uses it
|
||||
startPort: 10001
|
||||
|
||||
# macros: sets a dictionary of string:string pairs
|
||||
# macros: a dictionary of string substitutions
|
||||
# - optional, default: empty dictionary
|
||||
# - these are reusable snippets
|
||||
# - macros are reusable snippets
|
||||
# - used in a model's cmd, cmdStop, proxy and checkEndpoint
|
||||
# - useful for reducing common configuration settings
|
||||
macros:
|
||||
@@ -92,44 +100,55 @@ models:
|
||||
|
||||
# checkEndpoint: URL path to check if the server is ready
|
||||
# - optional, default: /health
|
||||
# - use "none" to skip endpoint ready checking
|
||||
# - endpoint is expected to return an HTTP 200 response
|
||||
# - all requests wait until the endpoint is ready (or fails)
|
||||
# - all requests wait until the endpoint is ready or fails
|
||||
# - use "none" to skip endpoint health checking
|
||||
checkEndpoint: /custom-endpoint
|
||||
|
||||
# ttl: automatically unload the model after this many seconds
|
||||
# ttl: automatically unload the model after ttl seconds
|
||||
# - optional, default: 0
|
||||
# - ttl values must be a value greater than 0
|
||||
# - a value of 0 disables automatic unloading of the model
|
||||
ttl: 60
|
||||
|
||||
# useModelName: overrides the model name that is sent to upstream server
|
||||
# useModelName: override the model name that is sent to upstream server
|
||||
# - optional, default: ""
|
||||
# - useful when the upstream server expects a specific model name or format
|
||||
# - useful for when the upstream server expects a specific model name that
|
||||
# is different from the model's ID
|
||||
useModelName: "qwen:qwq"
|
||||
|
||||
# filters: a dictionary of filter settings
|
||||
# - optional, default: empty dictionary
|
||||
# - only strip_params is currently supported
|
||||
filters:
|
||||
# strip_params: a comma separated list of parameters to remove from the request
|
||||
# - optional, default: ""
|
||||
# - useful for preventing overriding of default server params by requests
|
||||
# - `model` parameter is never removed
|
||||
# - useful for server side enforcement of sampling parameters
|
||||
# - the `model` parameter can never be removed
|
||||
# - can be any JSON key in the request body
|
||||
# - recommended to stick to sampling parameters
|
||||
strip_params: "temperature, top_p, top_k"
|
||||
|
||||
# concurrencyLimit: overrides the allowed number of active parallel requests to a model
|
||||
# - optional, default: 0
|
||||
# - useful for limiting the number of active parallel requests a model can process
|
||||
# - must be set per model
|
||||
# - any number greater than 0 will override the internal default value of 10
|
||||
# - any requests that exceeds the limit will receive an HTTP 429 Too Many Requests response
|
||||
# - recommended to be omitted and the default used
|
||||
concurrencyLimit: 0
|
||||
|
||||
# Unlisted model example:
|
||||
"qwen-unlisted":
|
||||
# unlisted: true or false
|
||||
# unlisted: boolean, true or false
|
||||
# - optional, default: false
|
||||
# - unlisted models do not show up in /v1/models or /upstream lists
|
||||
# - unlisted models do not show up in /v1/models api requests
|
||||
# - can be requested as normal through all apis
|
||||
unlisted: true
|
||||
cmd: llama-server --port ${PORT} -m Llama-3.2-1B-Instruct-Q4_K_M.gguf -ngl 0
|
||||
|
||||
# Docker example:
|
||||
# container run times like Docker and Podman can also be used with a
|
||||
# container run times like Docker and Podman can be used reliably with a
|
||||
# a combination of cmd and cmdStop.
|
||||
"docker-llama":
|
||||
proxy: "http://127.0.0.1:${PORT}"
|
||||
@@ -142,24 +161,26 @@ models:
|
||||
# cmdStop: command to run to stop the model gracefully
|
||||
# - optional, default: ""
|
||||
# - useful for stopping commands managed by another system
|
||||
# - on POSIX systems: a SIGTERM is sent for graceful shutdown
|
||||
# - on Windows, taskkill is used
|
||||
# - processes are given 5 seconds to shutdown until they are forcefully killed
|
||||
# - the upstream's process id is available in the ${PID} macro
|
||||
#
|
||||
# When empty, llama-swap has this default behaviour:
|
||||
# - on POSIX systems: a SIGTERM signal is sent
|
||||
# - on Windows, calls taskkill to stop the process
|
||||
# - processes have 5 seconds to shutdown until forceful termination is attempted
|
||||
cmdStop: docker stop dockertest
|
||||
|
||||
# groups: a dictionary of group settings
|
||||
# - optional, default: empty dictionary
|
||||
# - provide advanced controls over model swapping behaviour.
|
||||
# - Using groups some models can be kept loaded indefinitely, while others are swapped out.
|
||||
# - model ids must be defined in the Models section
|
||||
# - provides advanced controls over model swapping behaviour
|
||||
# - using groups some models can be kept loaded indefinitely, while others are swapped out
|
||||
# - model IDs must be defined in the Models section
|
||||
# - a model can only be a member of one group
|
||||
# - group behaviour is controlled via the `swap`, `exclusive` and `persistent` fields
|
||||
# - see issue #109 for details
|
||||
#
|
||||
# NOTE: the example below uses model names that are not defined above for demonstration purposes
|
||||
groups:
|
||||
# group1 is same as the default behaviour of llama-swap where only one model is allowed
|
||||
# group1 works the same as the default behaviour of llama-swap where only one model is allowed
|
||||
# to run a time across the whole llama-swap instance
|
||||
"group1":
|
||||
# swap: controls the model swapping behaviour in within the group
|
||||
@@ -181,10 +202,13 @@ groups:
|
||||
- "qwen-unlisted"
|
||||
|
||||
# Example:
|
||||
# - in this group all the models can run at the same time
|
||||
# - when a different group loads all running models in this group are unloaded
|
||||
# - in group2 all models can run at the same time
|
||||
# - when a different group is loaded it causes all running models in this group to unload
|
||||
"group2":
|
||||
swap: false
|
||||
|
||||
# exclusive: false does not unload other groups when a model in group2 is requested
|
||||
# - the models in group2 will be loaded but will not unload any other groups
|
||||
exclusive: false
|
||||
members:
|
||||
- "docker-llama"
|
||||
@@ -207,3 +231,19 @@ groups:
|
||||
- "forever-modelA"
|
||||
- "forever-modelB"
|
||||
- "forever-modelc"
|
||||
|
||||
# hooks: a dictionary of event triggers and actions
|
||||
# - optional, default: empty dictionary
|
||||
# - the only supported hook is on_startup
|
||||
hooks:
|
||||
# on_startup: a dictionary of actions to perform on startup
|
||||
# - optional, default: empty dictionary
|
||||
# - the only supported action is preload
|
||||
on_startup:
|
||||
# preload: a list of model ids to load on startup
|
||||
# - optional, default: empty list
|
||||
# - model names must match keys in the models sections
|
||||
# - when preloading multiple models at once, define a group
|
||||
# otherwise models will be loaded and swapped out
|
||||
preload:
|
||||
- "llama"
|
||||
@@ -0,0 +1,159 @@
|
||||
package main
|
||||
|
||||
// created for issue: #252 https://github.com/mostlygeek/llama-swap/issues/252
|
||||
// this simple benchmark tool sends a lot of small chat completion requests to llama-swap
|
||||
// to make sure all the requests are accounted for.
|
||||
//
|
||||
// requests can be sent in parallel, and the tool will report the results.
|
||||
// usage: go run main.go -baseurl http://localhost:8080/v1 -model llama3 -requests 1000 -par 5
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"flag"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"net/http"
|
||||
"os"
|
||||
"sync"
|
||||
"time"
|
||||
)
|
||||
|
||||
func main() {
|
||||
// ----- CLI arguments ----------------------------------------------------
|
||||
var (
|
||||
baseurl string
|
||||
modelName string
|
||||
totalRequests int
|
||||
parallelization int
|
||||
)
|
||||
|
||||
flag.StringVar(&baseurl, "baseurl", "http://localhost:8080/v1", "Base URL of the API (e.g., https://api.example.com)")
|
||||
flag.StringVar(&modelName, "model", "", "Model name to use")
|
||||
flag.IntVar(&totalRequests, "requests", 1, "Total number of requests to send")
|
||||
flag.IntVar(¶llelization, "par", 1, "Maximum number of concurrent requests")
|
||||
flag.Parse()
|
||||
|
||||
if baseurl == "" || modelName == "" {
|
||||
fmt.Println("Error: both -baseurl and -model are required.")
|
||||
flag.Usage()
|
||||
os.Exit(1)
|
||||
}
|
||||
if totalRequests <= 0 {
|
||||
fmt.Println("Error: -requests must be greater than 0.")
|
||||
os.Exit(1)
|
||||
}
|
||||
if parallelization <= 0 {
|
||||
fmt.Println("Error: -parallelization must be greater than 0.")
|
||||
os.Exit(1)
|
||||
}
|
||||
|
||||
// ----- HTTP client -------------------------------------------------------
|
||||
client := &http.Client{
|
||||
Timeout: 30 * time.Second,
|
||||
}
|
||||
|
||||
// ----- Tracking response codes -------------------------------------------
|
||||
statusCounts := make(map[int]int) // map[statusCode]count
|
||||
var mu sync.Mutex // protects statusCounts
|
||||
|
||||
// ----- Request queue (buffered channel) ----------------------------------
|
||||
requests := make(chan int, 10) // Buffered channel with capacity 10
|
||||
|
||||
// Goroutine to fill the request queue
|
||||
go func() {
|
||||
for i := 0; i < totalRequests; i++ {
|
||||
requests <- i + 1
|
||||
}
|
||||
close(requests)
|
||||
}()
|
||||
|
||||
// ----- Worker pool -------------------------------------------------------
|
||||
var wg sync.WaitGroup
|
||||
for i := 0; i < parallelization; i++ {
|
||||
wg.Add(1)
|
||||
go func(workerID int) {
|
||||
defer wg.Done()
|
||||
|
||||
for reqID := range requests {
|
||||
// Build request payload as a single line JSON string
|
||||
payload := `{"model":"` + modelName + `","max_tokens":100,"stream":false,"messages":[{"role":"user","content":"write a snake game in python"}]}`
|
||||
|
||||
// Send POST request
|
||||
req, err := http.NewRequest(http.MethodPost,
|
||||
fmt.Sprintf("%s/chat/completions", baseurl),
|
||||
bytes.NewReader([]byte(payload)))
|
||||
if err != nil {
|
||||
log.Printf("[worker %d][req %d] request creation error: %v", workerID, reqID, err)
|
||||
mu.Lock()
|
||||
statusCounts[-1]++
|
||||
mu.Unlock()
|
||||
continue
|
||||
}
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
|
||||
resp, err := client.Do(req)
|
||||
if err != nil {
|
||||
log.Printf("[worker %d][req %d] HTTP request error: %v", workerID, reqID, err)
|
||||
mu.Lock()
|
||||
statusCounts[-1]++
|
||||
mu.Unlock()
|
||||
continue
|
||||
}
|
||||
io.Copy(io.Discard, resp.Body)
|
||||
resp.Body.Close()
|
||||
|
||||
// Record status code
|
||||
mu.Lock()
|
||||
statusCounts[resp.StatusCode]++
|
||||
mu.Unlock()
|
||||
}
|
||||
}(i + 1)
|
||||
}
|
||||
|
||||
// ----- Status ticker (prints every second) -------------------------------
|
||||
done := make(chan struct{})
|
||||
tickerDone := make(chan struct{})
|
||||
go func() {
|
||||
ticker := time.NewTicker(1 * time.Second)
|
||||
startTime := time.Now()
|
||||
for {
|
||||
select {
|
||||
case <-ticker.C:
|
||||
mu.Lock()
|
||||
// Compute how many requests have completed so far
|
||||
completed := 0
|
||||
for _, cnt := range statusCounts {
|
||||
completed += cnt
|
||||
}
|
||||
// Calculate duration and progress
|
||||
duration := time.Since(startTime)
|
||||
progress := completed * 100 / totalRequests
|
||||
fmt.Printf("Duration: %v, Completed: %d%% requests\n", duration, progress)
|
||||
mu.Unlock()
|
||||
case <-done:
|
||||
duration := time.Since(startTime)
|
||||
fmt.Printf("Duration: %v, Completed: %d%% requests\n", duration, 100)
|
||||
close(tickerDone)
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
// Wait for all workers to finish
|
||||
wg.Wait()
|
||||
close(done) // stops the status-update goroutine
|
||||
<-tickerDone // give ticker time to finish / print
|
||||
|
||||
// ----- Summary ------------------------------------------------------------
|
||||
fmt.Println("\n\n=== HTTP response code summary ===")
|
||||
mu.Lock()
|
||||
for code, cnt := range statusCounts {
|
||||
if code == -1 {
|
||||
fmt.Printf("Client-side errors (no HTTP response): %d\n", cnt)
|
||||
} else {
|
||||
fmt.Printf("%d : %d\n", code, cnt)
|
||||
}
|
||||
}
|
||||
mu.Unlock()
|
||||
}
|
||||
@@ -138,6 +138,14 @@ func (c *GroupConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
|
||||
return nil
|
||||
}
|
||||
|
||||
type HooksConfig struct {
|
||||
OnStartup HookOnStartup `yaml:"on_startup"`
|
||||
}
|
||||
|
||||
type HookOnStartup struct {
|
||||
Preload []string `yaml:"preload"`
|
||||
}
|
||||
|
||||
type Config struct {
|
||||
HealthCheckTimeout int `yaml:"healthCheckTimeout"`
|
||||
LogRequests bool `yaml:"logRequests"`
|
||||
@@ -155,6 +163,9 @@ type Config struct {
|
||||
|
||||
// automatic port assignments
|
||||
StartPort int `yaml:"startPort"`
|
||||
|
||||
// hooks, see: #209
|
||||
Hooks HooksConfig `yaml:"hooks"`
|
||||
}
|
||||
|
||||
func (c *Config) RealModelName(search string) (string, bool) {
|
||||
@@ -330,6 +341,22 @@ func LoadConfigFromReader(r io.Reader) (Config, error) {
|
||||
}
|
||||
}
|
||||
|
||||
// clean up hooks preload
|
||||
if len(config.Hooks.OnStartup.Preload) > 0 {
|
||||
var toPreload []string
|
||||
for _, modelID := range config.Hooks.OnStartup.Preload {
|
||||
modelID = strings.TrimSpace(modelID)
|
||||
if modelID == "" {
|
||||
continue
|
||||
}
|
||||
if real, found := config.RealModelName(modelID); found {
|
||||
toPreload = append(toPreload, real)
|
||||
}
|
||||
}
|
||||
|
||||
config.Hooks.OnStartup.Preload = toPreload
|
||||
}
|
||||
|
||||
return config, nil
|
||||
}
|
||||
|
||||
|
||||
@@ -100,6 +100,9 @@ func TestConfig_LoadPosix(t *testing.T) {
|
||||
content := `
|
||||
macros:
|
||||
svr-path: "path/to/server"
|
||||
hooks:
|
||||
on_startup:
|
||||
preload: ["model1", "model2"]
|
||||
models:
|
||||
model1:
|
||||
cmd: path/to/cmd --arg1 one
|
||||
@@ -163,6 +166,11 @@ groups:
|
||||
Macros: map[string]string{
|
||||
"svr-path": "path/to/server",
|
||||
},
|
||||
Hooks: HooksConfig{
|
||||
OnStartup: HookOnStartup{
|
||||
Preload: []string{"model1", "model2"},
|
||||
},
|
||||
},
|
||||
Models: map[string]ModelConfig{
|
||||
"model1": {
|
||||
Cmd: "path/to/cmd --arg1 one",
|
||||
|
||||
@@ -0,0 +1,27 @@
|
||||
package proxy
|
||||
|
||||
import "net/http"
|
||||
|
||||
// Custom discard writer that implements http.ResponseWriter but just discards everything
|
||||
type DiscardWriter struct {
|
||||
header http.Header
|
||||
status int
|
||||
}
|
||||
|
||||
func (w *DiscardWriter) Header() http.Header {
|
||||
if w.header == nil {
|
||||
w.header = make(http.Header)
|
||||
}
|
||||
return w.header
|
||||
}
|
||||
|
||||
func (w *DiscardWriter) Write(data []byte) (int, error) {
|
||||
return len(data), nil
|
||||
}
|
||||
|
||||
func (w *DiscardWriter) WriteHeader(code int) {
|
||||
w.status = code
|
||||
}
|
||||
|
||||
// Satisfy the http.Flusher interface for streaming responses
|
||||
func (w *DiscardWriter) Flush() {}
|
||||
@@ -7,6 +7,7 @@ const ChatCompletionStatsEventID = 0x02
|
||||
const ConfigFileChangedEventID = 0x03
|
||||
const LogDataEventID = 0x04
|
||||
const TokenMetricsEventID = 0x05
|
||||
const ModelPreloadedEventID = 0x06
|
||||
|
||||
type ProcessStateChangeEvent struct {
|
||||
ProcessName string
|
||||
@@ -48,3 +49,12 @@ type LogDataEvent struct {
|
||||
func (e LogDataEvent) Type() uint32 {
|
||||
return LogDataEventID
|
||||
}
|
||||
|
||||
type ModelPreloadedEvent struct {
|
||||
ModelName string
|
||||
Success bool
|
||||
}
|
||||
|
||||
func (e ModelPreloadedEvent) Type() uint32 {
|
||||
return ModelPreloadedEventID
|
||||
}
|
||||
|
||||
@@ -13,9 +13,10 @@ import (
|
||||
)
|
||||
|
||||
var (
|
||||
nextTestPort int = 12000
|
||||
portMutex sync.Mutex
|
||||
testLogger = NewLogMonitorWriter(os.Stdout)
|
||||
nextTestPort int = 12000
|
||||
portMutex sync.Mutex
|
||||
testLogger = NewLogMonitorWriter(os.Stdout)
|
||||
simpleResponderPath = getSimpleResponderPath()
|
||||
)
|
||||
|
||||
// Check if the binary exists
|
||||
@@ -69,13 +70,11 @@ func getTestSimpleResponderConfig(expectedMessage string) ModelConfig {
|
||||
}
|
||||
|
||||
func getTestSimpleResponderConfigPort(expectedMessage string, port int) ModelConfig {
|
||||
binaryPath := getSimpleResponderPath()
|
||||
|
||||
// Create a YAML string with just the values we want to set
|
||||
yamlStr := fmt.Sprintf(`
|
||||
cmd: '%s --port %d --silent --respond %s'
|
||||
proxy: "http://127.0.0.1:%d"
|
||||
`, binaryPath, port, expectedMessage, port)
|
||||
`, simpleResponderPath, port, expectedMessage, port)
|
||||
|
||||
var cfg ModelConfig
|
||||
if err := yaml.Unmarshal([]byte(yamlStr), &cfg); err != nil {
|
||||
|
||||
+31
-22
@@ -5,12 +5,20 @@ import (
|
||||
"fmt"
|
||||
"io"
|
||||
"net/http"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/tidwall/gjson"
|
||||
)
|
||||
|
||||
type MetricsRecorder struct {
|
||||
metricsMonitor *MetricsMonitor
|
||||
realModelName string
|
||||
// isStreaming bool
|
||||
startTime time.Time
|
||||
}
|
||||
|
||||
// MetricsMiddleware sets up the MetricsResponseWriter for capturing upstream requests
|
||||
func MetricsMiddleware(pm *ProxyManager) gin.HandlerFunc {
|
||||
return func(c *gin.Context) {
|
||||
@@ -41,48 +49,48 @@ func MetricsMiddleware(pm *ProxyManager) gin.HandlerFunc {
|
||||
metricsRecorder: &MetricsRecorder{
|
||||
metricsMonitor: pm.metricsMonitor,
|
||||
realModelName: realModelName,
|
||||
isStreaming: gjson.GetBytes(bodyBytes, "stream").Bool(),
|
||||
startTime: time.Now(),
|
||||
},
|
||||
}
|
||||
c.Writer = writer
|
||||
c.Next()
|
||||
|
||||
rec := writer.metricsRecorder
|
||||
rec.processBody(writer.body)
|
||||
}
|
||||
}
|
||||
// check for streaming response
|
||||
if strings.Contains(c.Writer.Header().Get("Content-Type"), "text/event-stream") {
|
||||
writer.metricsRecorder.processStreamingResponse(writer.body)
|
||||
} else {
|
||||
writer.metricsRecorder.processNonStreamingResponse(writer.body)
|
||||
}
|
||||
|
||||
type MetricsRecorder struct {
|
||||
metricsMonitor *MetricsMonitor
|
||||
realModelName string
|
||||
isStreaming bool
|
||||
startTime time.Time
|
||||
}
|
||||
|
||||
// processBody handles response processing after request completes
|
||||
func (rec *MetricsRecorder) processBody(body []byte) {
|
||||
if rec.isStreaming {
|
||||
rec.processStreamingResponse(body)
|
||||
} else {
|
||||
rec.processNonStreamingResponse(body)
|
||||
}
|
||||
}
|
||||
|
||||
func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
|
||||
usage := jsonData.Get("usage")
|
||||
if !usage.Exists() {
|
||||
timings := jsonData.Get("timings")
|
||||
if !usage.Exists() && !timings.Exists() {
|
||||
return false
|
||||
}
|
||||
|
||||
// default values
|
||||
outputTokens := int(jsonData.Get("usage.completion_tokens").Int())
|
||||
inputTokens := int(jsonData.Get("usage.prompt_tokens").Int())
|
||||
outputTokens := 0
|
||||
inputTokens := 0
|
||||
|
||||
// timings data
|
||||
tokensPerSecond := -1.0
|
||||
promptPerSecond := -1.0
|
||||
durationMs := int(time.Since(rec.startTime).Milliseconds())
|
||||
|
||||
if usage.Exists() {
|
||||
outputTokens = int(jsonData.Get("usage.completion_tokens").Int())
|
||||
inputTokens = int(jsonData.Get("usage.prompt_tokens").Int())
|
||||
}
|
||||
|
||||
// use llama-server's timing data for tok/sec and duration as it is more accurate
|
||||
if timings := jsonData.Get("timings"); timings.Exists() {
|
||||
if timings.Exists() {
|
||||
inputTokens = int(jsonData.Get("timings.prompt_n").Int())
|
||||
outputTokens = int(jsonData.Get("timings.predicted_n").Int())
|
||||
promptPerSecond = jsonData.Get("timings.prompt_per_second").Float()
|
||||
tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float()
|
||||
durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float())
|
||||
}
|
||||
@@ -92,6 +100,7 @@ func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) bool {
|
||||
Model: rec.realModelName,
|
||||
InputTokens: inputTokens,
|
||||
OutputTokens: outputTokens,
|
||||
PromptPerSecond: promptPerSecond,
|
||||
TokensPerSecond: tokensPerSecond,
|
||||
DurationMs: durationMs,
|
||||
})
|
||||
|
||||
@@ -15,6 +15,7 @@ type TokenMetrics struct {
|
||||
Model string `json:"model"`
|
||||
InputTokens int `json:"input_tokens"`
|
||||
OutputTokens int `json:"output_tokens"`
|
||||
PromptPerSecond float64 `json:"prompt_per_second"`
|
||||
TokensPerSecond float64 `json:"tokens_per_second"`
|
||||
DurationMs int `json:"duration_ms"`
|
||||
}
|
||||
|
||||
+38
-2
@@ -15,6 +15,7 @@ import (
|
||||
"time"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/mostlygeek/llama-swap/event"
|
||||
"github.com/tidwall/gjson"
|
||||
"github.com/tidwall/sjson"
|
||||
)
|
||||
@@ -96,6 +97,35 @@ func New(config Config) *ProxyManager {
|
||||
}
|
||||
|
||||
pm.setupGinEngine()
|
||||
|
||||
// run any startup hooks
|
||||
if len(config.Hooks.OnStartup.Preload) > 0 {
|
||||
// do it in the background, don't block startup -- not sure if good idea yet
|
||||
go func() {
|
||||
discardWriter := &DiscardWriter{}
|
||||
for _, realModelName := range config.Hooks.OnStartup.Preload {
|
||||
proxyLogger.Infof("Preloading model: %s", realModelName)
|
||||
processGroup, _, err := pm.swapProcessGroup(realModelName)
|
||||
|
||||
if err != nil {
|
||||
event.Emit(ModelPreloadedEvent{
|
||||
ModelName: realModelName,
|
||||
Success: false,
|
||||
})
|
||||
proxyLogger.Errorf("Failed to preload model %s: %v", realModelName, err)
|
||||
continue
|
||||
} else {
|
||||
req, _ := http.NewRequest("GET", "/", nil)
|
||||
processGroup.ProxyRequest(realModelName, discardWriter, req)
|
||||
event.Emit(ModelPreloadedEvent{
|
||||
ModelName: realModelName,
|
||||
Success: true,
|
||||
})
|
||||
}
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
return pm
|
||||
}
|
||||
|
||||
@@ -161,11 +191,17 @@ func (pm *ProxyManager) setupGinEngine() {
|
||||
// Support legacy /v1/completions api, see issue #12
|
||||
pm.ginEngine.POST("/v1/completions", mm, pm.proxyOAIHandler)
|
||||
|
||||
// Support embeddings
|
||||
// Support embeddings and reranking
|
||||
pm.ginEngine.POST("/v1/embeddings", mm, pm.proxyOAIHandler)
|
||||
|
||||
// llama-server's /reranking endpoint + aliases
|
||||
pm.ginEngine.POST("/reranking", mm, pm.proxyOAIHandler)
|
||||
pm.ginEngine.POST("/rerank", mm, pm.proxyOAIHandler)
|
||||
pm.ginEngine.POST("/v1/rerank", mm, pm.proxyOAIHandler)
|
||||
pm.ginEngine.POST("/v1/reranking", mm, pm.proxyOAIHandler)
|
||||
pm.ginEngine.POST("/rerank", mm, pm.proxyOAIHandler)
|
||||
|
||||
// llama-server's /infill endpoint for code infilling
|
||||
pm.ginEngine.POST("/infill", mm, pm.proxyOAIHandler)
|
||||
|
||||
// Support audio/speech endpoint
|
||||
pm.ginEngine.POST("/v1/audio/speech", pm.proxyOAIHandler)
|
||||
|
||||
@@ -132,7 +132,7 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
|
||||
}
|
||||
}
|
||||
|
||||
sendMetrics := func(metrics TokenMetrics) {
|
||||
sendMetrics := func(metrics []TokenMetrics) {
|
||||
jsonData, err := json.Marshal(metrics)
|
||||
if err == nil {
|
||||
select {
|
||||
@@ -168,16 +168,14 @@ func (pm *ProxyManager) apiSendEvents(c *gin.Context) {
|
||||
* Send Metrics data
|
||||
*/
|
||||
defer event.On(func(e TokenMetricsEvent) {
|
||||
sendMetrics(e.Metrics)
|
||||
sendMetrics([]TokenMetrics{e.Metrics})
|
||||
})()
|
||||
|
||||
// send initial batch of data
|
||||
sendLogData("proxy", pm.proxyLogger.GetHistory())
|
||||
sendLogData("upstream", pm.upstreamLogger.GetHistory())
|
||||
sendModels()
|
||||
for _, metrics := range pm.metricsMonitor.GetMetrics() {
|
||||
sendMetrics(metrics)
|
||||
}
|
||||
sendMetrics(pm.metricsMonitor.GetMetrics())
|
||||
|
||||
for {
|
||||
select {
|
||||
|
||||
@@ -14,6 +14,7 @@ import (
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/mostlygeek/llama-swap/event"
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/tidwall/gjson"
|
||||
)
|
||||
@@ -832,3 +833,62 @@ func TestProxyManager_HealthEndpoint(t *testing.T) {
|
||||
assert.Equal(t, http.StatusOK, rec.Code)
|
||||
assert.Equal(t, "OK", rec.Body.String())
|
||||
}
|
||||
|
||||
func TestProxyManager_StartupHooks(t *testing.T) {
|
||||
|
||||
// using real YAML as the configuration has gotten more complex
|
||||
// is the right approach as LoadConfigFromReader() does a lot more
|
||||
// than parse YAML now. Eventually migrate all tests to use this approach
|
||||
configStr := strings.Replace(`
|
||||
logLevel: error
|
||||
hooks:
|
||||
on_startup:
|
||||
preload:
|
||||
- model1
|
||||
- model2
|
||||
groups:
|
||||
preloadTestGroup:
|
||||
swap: false
|
||||
members:
|
||||
- model1
|
||||
- model2
|
||||
models:
|
||||
model1:
|
||||
cmd: ${simpleresponderpath} --port ${PORT} --silent --respond model1
|
||||
model2:
|
||||
cmd: ${simpleresponderpath} --port ${PORT} --silent --respond model2
|
||||
`, "${simpleresponderpath}", simpleResponderPath, -1)
|
||||
|
||||
// Create a test model configuration
|
||||
config, err := LoadConfigFromReader(strings.NewReader(configStr))
|
||||
if !assert.NoError(t, err, "Invalid configuration") {
|
||||
return
|
||||
}
|
||||
|
||||
preloadChan := make(chan ModelPreloadedEvent, 2) // buffer for 2 expected events
|
||||
|
||||
unsub := event.On(func(e ModelPreloadedEvent) {
|
||||
preloadChan <- e
|
||||
})
|
||||
|
||||
defer unsub()
|
||||
|
||||
// Create the proxy which should trigger preloading
|
||||
proxy := New(config)
|
||||
defer proxy.StopProcesses(StopWaitForInflightRequest)
|
||||
|
||||
for i := 0; i < 2; i++ {
|
||||
select {
|
||||
case <-preloadChan:
|
||||
case <-time.After(5 * time.Second):
|
||||
t.Fatal("timed out waiting for models to preload")
|
||||
}
|
||||
}
|
||||
// make sure they are both loaded
|
||||
_, foundGroup := proxy.processGroups["preloadTestGroup"]
|
||||
if !assert.True(t, foundGroup, "preloadTestGroup should exist") {
|
||||
return
|
||||
}
|
||||
assert.Equal(t, StateReady, proxy.processGroups["preloadTestGroup"].processes["model1"].CurrentState())
|
||||
assert.Equal(t, StateReady, proxy.processGroups["preloadTestGroup"].processes["model2"].CurrentState())
|
||||
}
|
||||
|
||||
+62
-34
@@ -1,50 +1,78 @@
|
||||
import { useEffect, useCallback } from "react";
|
||||
import { BrowserRouter as Router, Routes, Route, Navigate, NavLink } from "react-router-dom";
|
||||
import { useTheme } from "./contexts/ThemeProvider";
|
||||
import { APIProvider } from "./contexts/APIProvider";
|
||||
import { useAPI } from "./contexts/APIProvider";
|
||||
import LogViewerPage from "./pages/LogViewer";
|
||||
import ModelPage from "./pages/Models";
|
||||
import ActivityPage from "./pages/Activity";
|
||||
import ConnectionStatusIcon from "./components/ConnectionStatus";
|
||||
import { RiSunFill, RiMoonFill } from "react-icons/ri";
|
||||
|
||||
function App() {
|
||||
const { isNarrow, toggleTheme, isDarkMode } = useTheme();
|
||||
const { isNarrow, toggleTheme, isDarkMode, appTitle, setAppTitle, setConnectionState } = useTheme();
|
||||
const handleTitleChange = useCallback(
|
||||
(newTitle: string) => {
|
||||
setAppTitle(newTitle.replace(/\n/g, "").trim().substring(0, 64) || "llama-swap");
|
||||
},
|
||||
[setAppTitle]
|
||||
);
|
||||
|
||||
const { connectionStatus } = useAPI();
|
||||
|
||||
// Synchronize the window.title connections state with the actual connection state
|
||||
useEffect(() => {
|
||||
setConnectionState(connectionStatus);
|
||||
}, [connectionStatus]);
|
||||
|
||||
return (
|
||||
<Router basename="/ui/">
|
||||
<APIProvider>
|
||||
<div className="flex flex-col h-screen">
|
||||
<nav className="bg-surface border-b border-border p-2 h-[75px]">
|
||||
<div className="flex items-center justify-between mx-auto px-4 h-full">
|
||||
{!isNarrow && <h1 className="flex items-center p-0">llama-swap</h1>}
|
||||
<div className="flex items-center space-x-4">
|
||||
<NavLink to="/" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
|
||||
Logs
|
||||
</NavLink>
|
||||
|
||||
<NavLink to="/models" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
|
||||
Models
|
||||
</NavLink>
|
||||
|
||||
<NavLink to="/activity" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
|
||||
Activity
|
||||
</NavLink>
|
||||
<button className="" onClick={toggleTheme}>
|
||||
{isDarkMode ? <RiMoonFill /> : <RiSunFill />}
|
||||
</button>
|
||||
</div>
|
||||
<div className="flex flex-col h-screen">
|
||||
<nav className="bg-surface border-b border-border p-2 h-[75px]">
|
||||
<div className="flex items-center justify-between mx-auto px-4 h-full">
|
||||
{!isNarrow && (
|
||||
<h1
|
||||
contentEditable
|
||||
suppressContentEditableWarning
|
||||
className="flex items-center p-0 outline-none hover:bg-gray-100 dark:hover:bg-gray-700 rounded px-1"
|
||||
onBlur={(e) => handleTitleChange(e.currentTarget.textContent || "(set title)")}
|
||||
onKeyDown={(e) => {
|
||||
if (e.key === "Enter") {
|
||||
e.preventDefault();
|
||||
handleTitleChange(e.currentTarget.textContent || "(set title)");
|
||||
e.currentTarget.blur();
|
||||
}
|
||||
}}
|
||||
>
|
||||
{appTitle}
|
||||
</h1>
|
||||
)}
|
||||
<div className="flex items-center space-x-4">
|
||||
<NavLink to="/" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
|
||||
Logs
|
||||
</NavLink>
|
||||
<NavLink to="/models" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
|
||||
Models
|
||||
</NavLink>
|
||||
<NavLink to="/activity" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
|
||||
Activity
|
||||
</NavLink>
|
||||
<button className="" onClick={toggleTheme}>
|
||||
{isDarkMode ? <RiMoonFill /> : <RiSunFill />}
|
||||
</button>
|
||||
<ConnectionStatusIcon />
|
||||
</div>
|
||||
</nav>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<main className="flex-1 overflow-auto p-4">
|
||||
<Routes>
|
||||
<Route path="/" element={<LogViewerPage />} />
|
||||
<Route path="/models" element={<ModelPage />} />
|
||||
<Route path="/activity" element={<ActivityPage />} />
|
||||
<Route path="*" element={<Navigate to="/" replace />} />
|
||||
</Routes>
|
||||
</main>
|
||||
</div>
|
||||
</APIProvider>
|
||||
<main className="flex-1 overflow-auto p-4">
|
||||
<Routes>
|
||||
<Route path="/" element={<LogViewerPage />} />
|
||||
<Route path="/models" element={<ModelPage />} />
|
||||
<Route path="/activity" element={<ActivityPage />} />
|
||||
<Route path="*" element={<Navigate to="/" replace />} />
|
||||
</Routes>
|
||||
</main>
|
||||
</div>
|
||||
</Router>
|
||||
);
|
||||
}
|
||||
|
||||
@@ -0,0 +1,26 @@
|
||||
import { useAPI } from "../contexts/APIProvider";
|
||||
import { useMemo } from "react";
|
||||
|
||||
const ConnectionStatusIcon = () => {
|
||||
const { connectionStatus } = useAPI();
|
||||
|
||||
const eventStatusColor = useMemo(() => {
|
||||
switch (connectionStatus) {
|
||||
case "connected":
|
||||
return "bg-green-500";
|
||||
case "connecting":
|
||||
return "bg-yellow-500";
|
||||
case "disconnected":
|
||||
default:
|
||||
return "bg-red-500";
|
||||
}
|
||||
}, [connectionStatus]);
|
||||
|
||||
return (
|
||||
<div className="flex items-center" title={`event stream: ${connectionStatus}`}>
|
||||
<span className={`inline-block w-3 h-3 rounded-full ${eventStatusColor} mr-2`}></span>
|
||||
</div>
|
||||
);
|
||||
};
|
||||
|
||||
export default ConnectionStatusIcon;
|
||||
@@ -1,4 +1,5 @@
|
||||
import { useRef, createContext, useState, useContext, useEffect, useCallback, useMemo, type ReactNode } from "react";
|
||||
import type { ConnectionState } from "../lib/types";
|
||||
|
||||
type ModelStatus = "ready" | "starting" | "stopping" | "stopped" | "shutdown" | "unknown";
|
||||
const LOG_LENGTH_LIMIT = 1024 * 100; /* 100KB of log data */
|
||||
@@ -20,6 +21,7 @@ interface APIProviderType {
|
||||
proxyLogs: string;
|
||||
upstreamLogs: string;
|
||||
metrics: Metrics[];
|
||||
connectionStatus: ConnectionState;
|
||||
}
|
||||
|
||||
interface Metrics {
|
||||
@@ -28,6 +30,7 @@ interface Metrics {
|
||||
model: string;
|
||||
input_tokens: number;
|
||||
output_tokens: number;
|
||||
prompt_per_second: number;
|
||||
tokens_per_second: number;
|
||||
duration_ms: number;
|
||||
}
|
||||
@@ -51,6 +54,7 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
|
||||
const [proxyLogs, setProxyLogs] = useState("");
|
||||
const [upstreamLogs, setUpstreamLogs] = useState("");
|
||||
const [metrics, setMetrics] = useState<Metrics[]>([]);
|
||||
const [connectionStatus, setConnectionState] = useState<ConnectionState>("disconnected");
|
||||
const apiEventSource = useRef<EventSource | null>(null);
|
||||
|
||||
const [models, setModels] = useState<Model[]>([]);
|
||||
@@ -74,7 +78,20 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
|
||||
const initialDelay = 1000; // 1 second
|
||||
|
||||
const connect = () => {
|
||||
apiEventSource.current = null;
|
||||
const eventSource = new EventSource("/api/events");
|
||||
setConnectionState("connecting");
|
||||
|
||||
eventSource.onopen = () => {
|
||||
// clear everything out on connect to keep things in sync
|
||||
setProxyLogs("");
|
||||
setUpstreamLogs("");
|
||||
setMetrics([]); // clear metrics on reconnect
|
||||
setModels([]); // clear models on reconnect
|
||||
apiEventSource.current = eventSource;
|
||||
retryCount = 0;
|
||||
setConnectionState("connected");
|
||||
};
|
||||
|
||||
eventSource.onmessage = (e: MessageEvent) => {
|
||||
try {
|
||||
@@ -107,9 +124,9 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
|
||||
|
||||
case "metrics":
|
||||
{
|
||||
const newMetric = JSON.parse(message.data) as Metrics;
|
||||
const newMetrics = JSON.parse(message.data) as Metrics[];
|
||||
setMetrics((prevMetrics) => {
|
||||
return [newMetric, ...prevMetrics];
|
||||
return [...newMetrics, ...prevMetrics];
|
||||
});
|
||||
}
|
||||
break;
|
||||
@@ -118,14 +135,14 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
|
||||
console.error(e.data, err);
|
||||
}
|
||||
};
|
||||
|
||||
eventSource.onerror = () => {
|
||||
eventSource.close();
|
||||
retryCount++;
|
||||
const delay = Math.min(initialDelay * Math.pow(2, retryCount - 1), 5000);
|
||||
setConnectionState("disconnected");
|
||||
setTimeout(connect, delay);
|
||||
};
|
||||
|
||||
apiEventSource.current = eventSource;
|
||||
};
|
||||
|
||||
connect();
|
||||
@@ -193,6 +210,7 @@ export function APIProvider({ children, autoStartAPIEvents = true }: APIProvider
|
||||
proxyLogs,
|
||||
upstreamLogs,
|
||||
metrics,
|
||||
connectionStatus,
|
||||
}),
|
||||
[models, listModels, unloadAllModels, loadModel, enableAPIEvents, proxyLogs, upstreamLogs, metrics]
|
||||
);
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
import { createContext, useContext, useEffect, type ReactNode, useMemo, useState } from "react";
|
||||
import { usePersistentState } from "../hooks/usePersistentState";
|
||||
import type { ConnectionState } from "../lib/types";
|
||||
|
||||
type ScreenWidth = "xs" | "sm" | "md" | "lg" | "xl" | "2xl";
|
||||
type ThemeContextType = {
|
||||
@@ -7,6 +8,11 @@ type ThemeContextType = {
|
||||
screenWidth: ScreenWidth;
|
||||
isNarrow: boolean;
|
||||
toggleTheme: () => void;
|
||||
|
||||
// for managing the window title and connection state information
|
||||
appTitle: string;
|
||||
setAppTitle: (title: string) => void;
|
||||
setConnectionState: (state: ConnectionState) => void;
|
||||
};
|
||||
|
||||
const ThemeContext = createContext<ThemeContextType | undefined>(undefined);
|
||||
@@ -16,6 +22,17 @@ type ThemeProviderProps = {
|
||||
};
|
||||
|
||||
export function ThemeProvider({ children }: ThemeProviderProps) {
|
||||
const [appTitle, setAppTitle] = usePersistentState("app-title", "llama-swap");
|
||||
const [connectionState, setConnectionState] = useState<ConnectionState>("disconnected");
|
||||
|
||||
/**
|
||||
* Set the document.title with informative information
|
||||
*/
|
||||
useEffect(() => {
|
||||
const connectionIcon = connectionState === "connecting" ? "🟡" : connectionState === "connected" ? "🟢" : "🔴";
|
||||
document.title = connectionIcon + " " + appTitle; // Set initial title
|
||||
}, [appTitle, connectionState]);
|
||||
|
||||
const [isDarkMode, setIsDarkMode] = usePersistentState<boolean>("theme", false);
|
||||
const [screenWidth, setScreenWidth] = useState<ScreenWidth>("md"); // Default to md
|
||||
|
||||
@@ -55,7 +72,19 @@ export function ThemeProvider({ children }: ThemeProviderProps) {
|
||||
}, [screenWidth]);
|
||||
|
||||
return (
|
||||
<ThemeContext.Provider value={{ isDarkMode, toggleTheme, screenWidth, isNarrow }}>{children}</ThemeContext.Provider>
|
||||
<ThemeContext.Provider
|
||||
value={{
|
||||
isDarkMode,
|
||||
toggleTheme,
|
||||
screenWidth,
|
||||
isNarrow,
|
||||
appTitle,
|
||||
setAppTitle,
|
||||
setConnectionState,
|
||||
}}
|
||||
>
|
||||
{children}
|
||||
</ThemeContext.Provider>
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
@@ -0,0 +1 @@
|
||||
export type ConnectionState = "connected" | "connecting" | "disconnected";
|
||||
+4
-1
@@ -3,11 +3,14 @@ import { createRoot } from "react-dom/client";
|
||||
import "./index.css";
|
||||
import App from "./App.tsx";
|
||||
import { ThemeProvider } from "./contexts/ThemeProvider";
|
||||
import { APIProvider } from "./contexts/APIProvider";
|
||||
|
||||
createRoot(document.getElementById("root")!).render(
|
||||
<StrictMode>
|
||||
<ThemeProvider>
|
||||
<App />
|
||||
<APIProvider>
|
||||
<App />
|
||||
</APIProvider>
|
||||
</ThemeProvider>
|
||||
</StrictMode>
|
||||
);
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
import { useState, useEffect } from "react";
|
||||
import { useMemo } from "react";
|
||||
import { useAPI } from "../contexts/APIProvider";
|
||||
|
||||
const formatTimestamp = (timestamp: string): string => {
|
||||
@@ -15,25 +15,10 @@ const formatDuration = (ms: number): string => {
|
||||
|
||||
const ActivityPage = () => {
|
||||
const { metrics } = useAPI();
|
||||
const [error, setError] = useState<string | null>(null);
|
||||
|
||||
useEffect(() => {
|
||||
if (metrics.length > 0) {
|
||||
setError(null);
|
||||
}
|
||||
const sortedMetrics = useMemo(() => {
|
||||
return [...metrics].sort((a, b) => b.id - a.id);
|
||||
}, [metrics]);
|
||||
|
||||
if (error) {
|
||||
return (
|
||||
<div className="p-6">
|
||||
<h1 className="text-2xl font-bold mb-4">Activity</h1>
|
||||
<div className="bg-red-50 border border-red-200 rounded-md p-4">
|
||||
<p className="text-red-800">{error}</p>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
return (
|
||||
<div className="p-6">
|
||||
<h1 className="text-2xl font-bold mb-4">Activity</h1>
|
||||
@@ -47,21 +32,25 @@ const ActivityPage = () => {
|
||||
<table className="min-w-full divide-y">
|
||||
<thead>
|
||||
<tr>
|
||||
<th className="px-4 py-3 text-left text-xs font-medium uppercase tracking-wider">Id</th>
|
||||
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Timestamp</th>
|
||||
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Model</th>
|
||||
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Input Tokens</th>
|
||||
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Output Tokens</th>
|
||||
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Prompt Processing</th>
|
||||
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Generation Speed</th>
|
||||
<th className="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider">Duration</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody className="divide-y">
|
||||
{metrics.map((metric, index) => (
|
||||
<tr key={`${metric.id}-${index}`}>
|
||||
{sortedMetrics.map((metric) => (
|
||||
<tr key={`metric_${metric.id}`}>
|
||||
<td className="px-4 py-4 whitespace-nowrap text-sm">{metric.id + 1 /* un-zero index */}</td>
|
||||
<td className="px-6 py-4 whitespace-nowrap text-sm">{formatTimestamp(metric.timestamp)}</td>
|
||||
<td className="px-6 py-4 whitespace-nowrap text-sm">{metric.model}</td>
|
||||
<td className="px-6 py-4 whitespace-nowrap text-sm">{metric.input_tokens.toLocaleString()}</td>
|
||||
<td className="px-6 py-4 whitespace-nowrap text-sm">{metric.output_tokens.toLocaleString()}</td>
|
||||
<td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.prompt_per_second)}</td>
|
||||
<td className="px-6 py-4 whitespace-nowrap text-sm">{formatSpeed(metric.tokens_per_second)}</td>
|
||||
<td className="px-6 py-4 whitespace-nowrap text-sm">{formatDuration(metric.duration_ms)}</td>
|
||||
</tr>
|
||||
|
||||
Reference in New Issue
Block a user