Compare commits
11 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 10606abf89 | |
| | fefd14903d | |
| | 717d64e336 | |
| | 285191e655 | |
| | 4236cec03a | |
| | 756193d0dd | |
| | a6b2e930d8 | |
| | 9e02c22ff8 | |
| | 0bdbf2fdc1 | |
| | 49035e2e8e | |
| | 9963ae18bf | |
@@ -17,14 +17,16 @@ builds:
|
|||||||
- goos: windows
|
- goos: windows
|
||||||
goarch: arm64
|
goarch: arm64
|
||||||
|
|
||||||
# use zip format for windows
|
|
||||||
archives:
|
archives:
|
||||||
- id: default
|
- id: default
|
||||||
format: tar.gz
|
formats:
|
||||||
|
- tar.gz
|
||||||
name_template: "{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}"
|
name_template: "{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}"
|
||||||
builds_info:
|
builds_info:
|
||||||
group: root
|
group: root
|
||||||
owner: root
|
owner: root
|
||||||
format_overrides:
|
format_overrides:
|
||||||
|
# use zip format for windows
|
||||||
- goos: windows
|
- goos: windows
|
||||||
format: zip
|
formats:
|
||||||
|
- zip
|
||||||
@@ -22,6 +22,7 @@ Written in golang, it is very easy to install (single binary with no dependencie
|
|||||||
- `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
|
- `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
|
||||||
- `v1/audio/transcriptions` ([docs](https://github.com/mostlygeek/llama-swap/issues/41#issuecomment-2722637867))
|
- `v1/audio/transcriptions` ([docs](https://github.com/mostlygeek/llama-swap/issues/41#issuecomment-2722637867))
|
||||||
- ✅ llama-swap custom API endpoints
|
- ✅ llama-swap custom API endpoints
|
||||||
|
- `/ui` - web UI
|
||||||
- `/log` - remote log monitoring
|
- `/log` - remote log monitoring
|
||||||
- `/upstream/:model_id` - direct access to upstream HTTP server ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
|
- `/upstream/:model_id` - direct access to upstream HTTP server ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
|
||||||
- `/unload` - manually unload running models ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
|
- `/unload` - manually unload running models ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
|
||||||
@@ -67,6 +68,12 @@ However, there are many more capabilities that llama-swap supports:
|
|||||||
|
|
||||||
See the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration) in the wiki for all options and examples.
|
See the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration) in the wiki for all options and examples.
|
||||||
|
|
||||||
|
## Web UI
|
||||||
|
|
||||||
|
llama-swap ships with a web based interface to make it easier to monitor logs and check the status of models.
|
||||||
|
|
||||||
|
<img width="1758" alt="image" src="https://github.com/user-attachments/assets/31ae5bcd-5efd-46b0-b64b-6db9e60196d3" />
|
||||||
|
|
||||||
## Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
|
## Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
|
||||||
|
|
||||||
Docker is the quickest way to try out llama-swap:
|
Docker is the quickest way to try out llama-swap:
|
||||||
|
|||||||
@@ -1,93 +1,191 @@
|
|||||||
# ======
|
# llama-swap YAML configuration example
|
||||||
# For a more detailed configuration example:
|
# -------------------------------------
|
||||||
# https://github.com/mostlygeek/llama-swap/wiki/Configuration
|
#
|
||||||
# ======
|
# - Below are all the available configuration options for llama-swap.
|
||||||
|
# - Settings with a default value, or noted as optional can be omitted.
|
||||||
|
# - Settings that are marked required must be in your configuration file
|
||||||
|
|
||||||
# Seconds to wait for llama.cpp to be available to serve requests
|
# healthCheckTimeout: number of seconds to wait for a model to be ready to serve requests
|
||||||
# Default (and minimum): 15 seconds
|
# - optional, default: 120
|
||||||
healthCheckTimeout: 90
|
# - minimum value is 15 seconds, anything less will be set to this value
|
||||||
|
healthCheckTimeout: 500
|
||||||
|
|
||||||
# valid log levels: debug, info (default), warn, error
|
# logLevel: sets the logging value
|
||||||
logLevel: debug
|
# - optional, default: info
|
||||||
|
# - Valid log levels: debug, info, warn, error
|
||||||
|
logLevel: info
|
||||||
|
|
||||||
# creating a coding profile with models for code generation and general questions
|
# startPort: sets the starting port number for the automatic ${PORT} macro.
|
||||||
groups:
|
# - optional, default: 5800
|
||||||
coding:
|
# - the ${PORT} macro can be used in model.cmd and model.proxy settings
|
||||||
swap: false
|
# - it is automatically incremented for every model that uses it
|
||||||
members:
|
startPort: 10001
|
||||||
- "qwen"
|
|
||||||
- "llama"
|
|
||||||
|
|
||||||
|
# macros: sets a dictionary of string:string pairs
|
||||||
|
# - optional, default: empty dictionary
|
||||||
|
# - these are reusable snippets
|
||||||
|
# - used in a model's cmd, cmdStop, proxy and checkEndpoint
|
||||||
|
# - useful for reducing common configuration settings
|
||||||
|
macros:
|
||||||
|
"latest-llama": >
|
||||||
|
/path/to/llama-server/llama-server-ec9e0301
|
||||||
|
--port ${PORT}
|
||||||
|
|
||||||
|
# models: a dictionary of model configurations
|
||||||
|
# - required
|
||||||
|
# - each key is the model's ID, used in API requests
|
||||||
|
# - model settings have default values that are used if they are not defined here
|
||||||
|
# - below are examples of the various settings a model can have:
|
||||||
|
# - available model settings: env, cmd, cmdStop, proxy, aliases, checkEndpoint, ttl, unlisted
|
||||||
models:
|
models:
|
||||||
|
|
||||||
|
# keys are the model names used in API requests
|
||||||
"llama":
|
"llama":
|
||||||
|
# cmd: the command to run to start the inference server.
|
||||||
|
# - required
|
||||||
|
# - it is just a string, similar to what you would run on the CLI
|
||||||
|
# - using `|` allows for comments in the command, these will be parsed out
|
||||||
|
# - macros can be used within cmd
|
||||||
cmd: |
|
cmd: |
|
||||||
models/llama-server-osx
|
# ${latest-llama} is a macro that is defined above
|
||||||
--port ${PORT}
|
${latest-llama}
|
||||||
-m models/Llama-3.2-1B-Instruct-Q4_0.gguf
|
--model path/to/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf
|
||||||
|
|
||||||
# list of model name aliases this llama.cpp instance can serve
|
# env: define an array of environment variables to inject into cmd's environment
|
||||||
|
# - optional, default: empty array
|
||||||
|
# - each value is a single string
|
||||||
|
# - in the format: ENV_NAME=value
|
||||||
|
env:
|
||||||
|
- "CUDA_VISIBLE_DEVICES=0,1,2"
|
||||||
|
|
||||||
|
# proxy: the URL where llama-swap routes API requests
|
||||||
|
# - optional, default: http://localhost:${PORT}
|
||||||
|
# - if you used ${PORT} in cmd this can be omitted
|
||||||
|
# - if you use a custom port in cmd this *must* be set
|
||||||
|
proxy: http://127.0.0.1:8999
|
||||||
|
|
||||||
|
# aliases: alternative model names that this model configuration is used for
|
||||||
|
# - optional, default: empty array
|
||||||
|
# - aliases must be unique globally
|
||||||
|
# - useful for impersonating a specific model
|
||||||
aliases:
|
aliases:
|
||||||
- gpt-4o-mini
|
- "gpt-4o-mini"
|
||||||
|
- "gpt-3.5-turbo"
|
||||||
|
|
||||||
# check this path for a HTTP 200 response for the server to be ready
|
# checkEndpoint: URL path to check if the server is ready
|
||||||
checkEndpoint: /health
|
# - optional, default: /health
|
||||||
|
# - use "none" to skip endpoint ready checking
|
||||||
|
# - endpoint is expected to return an HTTP 200 response
|
||||||
|
# - all requests wait until the endpoint is ready (or fails)
|
||||||
|
checkEndpoint: /custom-endpoint
|
||||||
|
|
||||||
# unload model after 5 seconds
|
# ttl: automatically unload the model after this many seconds
|
||||||
ttl: 5
|
# - optional, default: 0
|
||||||
|
# - ttl values must be a value greater than 0
|
||||||
|
# - a value of 0 disables automatic unloading of the model
|
||||||
|
ttl: 60
|
||||||
|
|
||||||
"qwen":
|
# useModelName: overrides the model name that is sent to upstream server
|
||||||
cmd: models/llama-server-osx --port ${PORT} -m models/qwen2.5-0.5b-instruct-q8_0.gguf
|
# - optional, default: ""
|
||||||
aliases:
|
# - useful when the upstream server expects a specific model name or format
|
||||||
- gpt-3.5-turbo
|
useModelName: "qwen:qwq"
|
||||||
|
|
||||||
# Embedding example with Nomic
|
# filters: a dictionary of filter settings
|
||||||
# https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF
|
# - optional, default: empty dictionary
|
||||||
"nomic":
|
filters:
|
||||||
cmd: |
|
# strip_params: a comma separated list of parameters to remove from the request
|
||||||
models/llama-server-osx --port ${PORT}
|
# - optional, default: ""
|
||||||
-m models/nomic-embed-text-v1.5.Q8_0.gguf
|
# - useful for preventing overriding of default server params by requests
|
||||||
--ctx-size 8192
|
# - `model` parameter is never removed
|
||||||
--batch-size 8192
|
# - can be any JSON key in the request body
|
||||||
--rope-scaling yarn
|
# - recommended to stick to sampling parameters
|
||||||
--rope-freq-scale 0.75
|
strip_params: "temperature, top_p, top_k"
|
||||||
-ngl 99
|
|
||||||
--embeddings
|
|
||||||
|
|
||||||
# Reranking example with bge-reranker
|
# Unlisted model example:
|
||||||
# https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF
|
"qwen-unlisted":
|
||||||
"bge-reranker":
|
# unlisted: true or false
|
||||||
cmd: |
|
# - optional, default: false
|
||||||
models/llama-server-osx --port ${PORT}
|
# - unlisted models do not show up in /v1/models or /upstream lists
|
||||||
-m models/bge-reranker-v2-m3-Q4_K_M.gguf
|
# - can be requested as normal through all apis
|
||||||
--ctx-size 8192
|
unlisted: true
|
||||||
--reranking
|
cmd: llama-server --port ${PORT} -m Llama-3.2-1B-Instruct-Q4_K_M.gguf -ngl 0
|
||||||
|
|
||||||
# Docker Support (v26.1.4+ required!)
|
# Docker example:
|
||||||
"dockertest":
|
# container runtimes like Docker and Podman can also be used with
|
||||||
|
# a combination of cmd and cmdStop.
|
||||||
|
"docker-llama":
|
||||||
|
proxy: "http://127.0.0.1:${PORT}"
|
||||||
cmd: |
|
cmd: |
|
||||||
docker run --name dockertest
|
docker run --name dockertest
|
||||||
--init --rm -p ${PORT}:8080 -v /mnt/nvme/models:/models
|
--init --rm -p ${PORT}:8080 -v /mnt/nvme/models:/models
|
||||||
ghcr.io/ggerganov/llama.cpp:server
|
ghcr.io/ggml-org/llama.cpp:server
|
||||||
--model '/models/Qwen2.5-Coder-0.5B-Instruct-Q4_K_M.gguf'
|
--model '/models/Qwen2.5-Coder-0.5B-Instruct-Q4_K_M.gguf'
|
||||||
|
|
||||||
"simple":
|
# cmdStop: command to run to stop the model gracefully
|
||||||
# example of setting environment variables
|
# - optional, default: ""
|
||||||
env:
|
# - useful for stopping commands managed by another system
|
||||||
- CUDA_VISIBLE_DEVICES=0,1
|
# - on POSIX systems: a SIGTERM is sent for graceful shutdown
|
||||||
- env1=hello
|
# - on Windows, taskkill is used
|
||||||
cmd: build/simple-responder --port ${PORT}
|
# - processes are given 5 seconds to shutdown until they are forcefully killed
|
||||||
unlisted: true
|
# - the upstream's process id is available in the ${PID} macro
|
||||||
|
cmdStop: docker stop dockertest
|
||||||
|
|
||||||
# use "none" to skip check. Caution this may cause some requests to fail
|
# groups: a dictionary of group settings
|
||||||
# until the upstream server is ready for traffic
|
# - optional, default: empty dictionary
|
||||||
checkEndpoint: none
|
# - provide advanced controls over model swapping behaviour.
|
||||||
|
# - Using groups some models can be kept loaded indefinitely, while others are swapped out.
|
||||||
|
# - model ids must be defined in the Models section
|
||||||
|
# - a model can only be a member of one group
|
||||||
|
# - group behaviour is controlled via the `swap`, `exclusive` and `persistent` fields
|
||||||
|
# - see issue #109 for details
|
||||||
|
#
|
||||||
|
# NOTE: the example below uses model names that are not defined above for demonstration purposes
|
||||||
|
groups:
|
||||||
|
# group1 is same as the default behaviour of llama-swap where only one model is allowed
|
||||||
|
# to run at a time across the whole llama-swap instance
|
||||||
|
"group1":
|
||||||
|
# swap: controls the model swapping behaviour within the group
|
||||||
|
# - optional, default: true
|
||||||
|
# - true : only one model is allowed to run at a time
|
||||||
|
# - false: all models can run together, no swapping
|
||||||
|
swap: true
|
||||||
|
|
||||||
# don't use these, just for testing if things are broken
|
# exclusive: controls how the group affects other groups
|
||||||
"broken":
|
# - optional, default: true
|
||||||
cmd: models/llama-server-osx --port 8999 -m models/doesnotexist.gguf
|
# - true: causes all other groups to unload when this group runs a model
|
||||||
proxy: http://127.0.0.1:8999
|
# - false: does not affect other groups
|
||||||
unlisted: true
|
exclusive: true
|
||||||
"broken_timeout":
|
|
||||||
cmd: models/llama-server-osx --port 8999 -m models/qwen2.5-0.5b-instruct-q8_0.gguf
|
# members references the models defined above
|
||||||
proxy: http://127.0.0.1:9000
|
# required
|
||||||
unlisted: true
|
members:
|
||||||
|
- "llama"
|
||||||
|
- "qwen-unlisted"
|
||||||
|
|
||||||
|
# Example:
|
||||||
|
# - in this group all the models can run at the same time
|
||||||
|
# - when a different group loads all running models in this group are unloaded
|
||||||
|
"group2":
|
||||||
|
swap: false
|
||||||
|
exclusive: false
|
||||||
|
members:
|
||||||
|
- "docker-llama"
|
||||||
|
- "modelA"
|
||||||
|
- "modelB"
|
||||||
|
|
||||||
|
# Example:
|
||||||
|
# - a persistent group, prevents other groups from unloading it
|
||||||
|
"forever":
|
||||||
|
# persistent: prevents other groups from unloading the models in this group
|
||||||
|
# - optional, default: false
|
||||||
|
# - does not affect individual model behaviour
|
||||||
|
persistent: true
|
||||||
|
|
||||||
|
# set swap/exclusive to false to prevent swapping inside the group
|
||||||
|
# and the unloading of other groups
|
||||||
|
swap: false
|
||||||
|
exclusive: false
|
||||||
|
members:
|
||||||
|
- "forever-modelA"
|
||||||
|
- "forever-modelB"
|
||||||
|
- "forever-modelc"
|
||||||
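The `startPort` comments above say that `${PORT}` is incremented automatically for every model that uses it. The following is a minimal sketch of that idea, not llama-swap's actual code: the `model` type, the `assignPorts` function, and the choice to visit model IDs in sorted order are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// model is a hypothetical, trimmed-down stand-in for one entry under `models:`.
type model struct {
	Cmd   string
	Proxy string
}

// assignPorts replaces ${PORT} in Cmd and Proxy, handing out ports from
// startPort upward. Only models whose Cmd mentions ${PORT} consume a port.
// IDs are visited in sorted order so the result is deterministic (assumption).
func assignPorts(models map[string]model, startPort int) map[string]model {
	ids := make([]string, 0, len(models))
	for id := range models {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	out := make(map[string]model, len(models))
	next := startPort
	for _, id := range ids {
		m := models[id]
		if strings.Contains(m.Cmd, "${PORT}") {
			if m.Proxy == "" {
				// default documented in the example config
				m.Proxy = "http://localhost:${PORT}"
			}
			port := fmt.Sprint(next)
			m.Cmd = strings.ReplaceAll(m.Cmd, "${PORT}", port)
			m.Proxy = strings.ReplaceAll(m.Proxy, "${PORT}", port)
			next++
		}
		out[id] = m
	}
	return out
}

func main() {
	models := map[string]model{
		"llama": {Cmd: "llama-server --port ${PORT} -m a.gguf"},
		"qwen":  {Cmd: "llama-server --port ${PORT} -m b.gguf"},
		"fixed": {Cmd: "llama-server --port 8999 -m c.gguf", Proxy: "http://127.0.0.1:8999"},
	}
	for id, m := range assignPorts(models, 10001) {
		fmt.Println(id, "=>", m.Cmd, "|", m.Proxy)
	}
}
```

In this sketch the `fixed` model keeps its hard-coded port and explicit `proxy`, which mirrors the config comment that `proxy` must be set when a custom port is used in `cmd`.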
@@ -144,8 +144,8 @@ func watchConfigFileWithReload(configPath string, reloadChan chan<- *proxy.Proxy
|
|||||||
if !ok {
|
if !ok {
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
// We only care about writes to the specific config file
|
// We only care about writes/creates to the specific config file
|
||||||
if event.Name == configPath && event.Has(fsnotify.Write) {
|
if event.Name == configPath && (event.Has(fsnotify.Write) || event.Has(fsnotify.Create)) {
|
||||||
// Reset or start the debounce timer
|
// Reset or start the debounce timer
|
||||||
if debounceTimer != nil {
|
if debounceTimer != nil {
|
||||||
debounceTimer.Stop()
|
debounceTimer.Stop()
|
||||||
|
|||||||
@@ -42,9 +42,12 @@ func main() {
|
|||||||
time.Sleep(wait)
|
time.Sleep(wait)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
bodyBytes, _ := io.ReadAll(c.Request.Body)
|
||||||
|
|
||||||
c.JSON(http.StatusOK, gin.H{
|
c.JSON(http.StatusOK, gin.H{
|
||||||
"responseMessage": *responseMessage,
|
"responseMessage": *responseMessage,
|
||||||
"h_content_length": c.Request.Header.Get("Content-Length"),
|
"h_content_length": c.Request.Header.Get("Content-Length"),
|
||||||
|
"request_body": string(bodyBytes),
|
||||||
})
|
})
|
||||||
})
|
})
|
||||||
|
|
||||||
|
|||||||
@@ -6,6 +6,7 @@ import (
|
|||||||
"os"
|
"os"
|
||||||
"regexp"
|
"regexp"
|
||||||
"runtime"
|
"runtime"
|
||||||
|
"slices"
|
||||||
"sort"
|
"sort"
|
||||||
"strconv"
|
"strconv"
|
||||||
"strings"
|
"strings"
|
||||||
@@ -29,6 +30,9 @@ type ModelConfig struct {
|
|||||||
|
|
||||||
// Limit concurrency of HTTP requests to process
|
// Limit concurrency of HTTP requests to process
|
||||||
ConcurrencyLimit int `yaml:"concurrencyLimit"`
|
ConcurrencyLimit int `yaml:"concurrencyLimit"`
|
||||||
|
|
||||||
|
// Model filters see issue #174
|
||||||
|
Filters ModelFilters `yaml:"filters"`
|
||||||
}
|
}
|
||||||
|
|
||||||
func (m *ModelConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
|
func (m *ModelConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
|
||||||
@@ -63,6 +67,46 @@ func (m *ModelConfig) SanitizedCommand() ([]string, error) {
|
|||||||
return SanitizeCommand(m.Cmd)
|
return SanitizeCommand(m.Cmd)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ModelFilters see issue #174
|
||||||
|
type ModelFilters struct {
|
||||||
|
StripParams string `yaml:"strip_params"`
|
||||||
|
}
|
||||||
|
|
||||||
|
func (m *ModelFilters) UnmarshalYAML(unmarshal func(interface{}) error) error {
|
||||||
|
type rawModelFilters ModelFilters
|
||||||
|
defaults := rawModelFilters{
|
||||||
|
StripParams: "",
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := unmarshal(&defaults); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
*m = ModelFilters(defaults)
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (f ModelFilters) SanitizedStripParams() ([]string, error) {
|
||||||
|
if f.StripParams == "" {
|
||||||
|
return nil, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
params := strings.Split(f.StripParams, ",")
|
||||||
|
cleaned := make([]string, 0, len(params))
|
||||||
|
|
||||||
|
for _, param := range params {
|
||||||
|
trimmed := strings.TrimSpace(param)
|
||||||
|
if trimmed == "model" || trimmed == "" {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
cleaned = append(cleaned, trimmed)
|
||||||
|
}
|
||||||
|
|
||||||
|
// sort cleaned
|
||||||
|
slices.Sort(cleaned)
|
||||||
|
return cleaned, nil
|
||||||
|
}
|
||||||
|
|
||||||
type GroupConfig struct {
|
type GroupConfig struct {
|
||||||
Swap bool `yaml:"swap"`
|
Swap bool `yaml:"swap"`
|
||||||
Exclusive bool `yaml:"exclusive"`
|
Exclusive bool `yaml:"exclusive"`
|
||||||
@@ -212,6 +256,7 @@ func LoadConfigFromReader(r io.Reader) (Config, error) {
|
|||||||
modelConfig.CmdStop = strings.ReplaceAll(modelConfig.CmdStop, macroSlug, macroValue)
|
modelConfig.CmdStop = strings.ReplaceAll(modelConfig.CmdStop, macroSlug, macroValue)
|
||||||
modelConfig.Proxy = strings.ReplaceAll(modelConfig.Proxy, macroSlug, macroValue)
|
modelConfig.Proxy = strings.ReplaceAll(modelConfig.Proxy, macroSlug, macroValue)
|
||||||
modelConfig.CheckEndpoint = strings.ReplaceAll(modelConfig.CheckEndpoint, macroSlug, macroValue)
|
modelConfig.CheckEndpoint = strings.ReplaceAll(modelConfig.CheckEndpoint, macroSlug, macroValue)
|
||||||
|
modelConfig.Filters.StripParams = strings.ReplaceAll(modelConfig.Filters.StripParams, macroSlug, macroValue)
|
||||||
}
|
}
|
||||||
|
|
||||||
// enforce ${PORT} used in both cmd and proxy
|
// enforce ${PORT} used in both cmd and proxy
|
||||||
|
|||||||
@@ -83,6 +83,9 @@ models:
|
|||||||
assert.Equal(t, "", model1.UseModelName)
|
assert.Equal(t, "", model1.UseModelName)
|
||||||
assert.Equal(t, 0, model1.ConcurrencyLimit)
|
assert.Equal(t, 0, model1.ConcurrencyLimit)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// default empty filter exists
|
||||||
|
assert.Equal(t, "", model1.Filters.StripParams)
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestConfig_LoadPosix(t *testing.T) {
|
func TestConfig_LoadPosix(t *testing.T) {
|
||||||
|
|||||||
@@ -300,3 +300,28 @@ models:
|
|||||||
})
|
})
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func TestConfig_ModelFilters(t *testing.T) {
|
||||||
|
content := `
|
||||||
|
macros:
|
||||||
|
default_strip: "temperature, top_p"
|
||||||
|
models:
|
||||||
|
model1:
|
||||||
|
cmd: path/to/cmd --port ${PORT}
|
||||||
|
filters:
|
||||||
|
strip_params: "model, top_k, ${default_strip}, , ,"
|
||||||
|
`
|
||||||
|
config, err := LoadConfigFromReader(strings.NewReader(content))
|
||||||
|
assert.NoError(t, err)
|
||||||
|
modelConfig, ok := config.Models["model1"]
|
||||||
|
if !assert.True(t, ok) {
|
||||||
|
t.FailNow()
|
||||||
|
}
|
||||||
|
|
||||||
|
// make sure `model` and empty strings are not in the list
|
||||||
|
assert.Equal(t, "model, top_k, temperature, top_p, , ,", modelConfig.Filters.StripParams)
|
||||||
|
sanitized, err := modelConfig.Filters.SanitizedStripParams()
|
||||||
|
if assert.NoError(t, err) {
|
||||||
|
assert.Equal(t, []string{"temperature", "top_k", "top_p"}, sanitized)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
@@ -80,6 +80,9 @@ models:
|
|||||||
assert.Equal(t, "", model1.UseModelName)
|
assert.Equal(t, "", model1.UseModelName)
|
||||||
assert.Equal(t, 0, model1.ConcurrencyLimit)
|
assert.Equal(t, 0, model1.ConcurrencyLimit)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// default empty filter exists
|
||||||
|
assert.Equal(t, "", model1.Filters.StripParams)
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestConfig_LoadWindows(t *testing.T) {
|
func TestConfig_LoadWindows(t *testing.T) {
|
||||||
|
|||||||
@@ -189,18 +189,19 @@ func (p *Process) start() error {
|
|||||||
p.waitStarting.Add(1)
|
p.waitStarting.Add(1)
|
||||||
defer p.waitStarting.Done()
|
defer p.waitStarting.Done()
|
||||||
cmdContext, ctxCancelUpstream := context.WithCancel(context.Background())
|
cmdContext, ctxCancelUpstream := context.WithCancel(context.Background())
|
||||||
p.proxyLogger.Debugf("<%s> Executing start command: %s", p.ID, strings.Join(args, " "))
|
|
||||||
p.cmd = exec.CommandContext(cmdContext, args[0], args[1:]...)
|
p.cmd = exec.CommandContext(cmdContext, args[0], args[1:]...)
|
||||||
p.cmd.Stdout = p.processLogger
|
p.cmd.Stdout = p.processLogger
|
||||||
p.cmd.Stderr = p.processLogger
|
p.cmd.Stderr = p.processLogger
|
||||||
p.cmd.Env = p.config.Env
|
p.cmd.Env = append(p.cmd.Environ(), p.config.Env...)
|
||||||
|
|
||||||
p.cmd.Cancel = p.cmdStopUpstreamProcess
|
p.cmd.Cancel = p.cmdStopUpstreamProcess
|
||||||
p.cmd.WaitDelay = p.gracefulStopTimeout
|
p.cmd.WaitDelay = p.gracefulStopTimeout
|
||||||
p.cancelUpstream = ctxCancelUpstream
|
p.cancelUpstream = ctxCancelUpstream
|
||||||
p.cmdWaitChan = make(chan struct{})
|
p.cmdWaitChan = make(chan struct{})
|
||||||
|
|
||||||
p.failedStartCount++ // this will be reset to zero when the process has successfully started
|
p.failedStartCount++ // this will be reset to zero when the process has successfully started
|
||||||
|
|
||||||
|
p.proxyLogger.Debugf("<%s> Executing start command: %s, env: %s", p.ID, strings.Join(args, " "), strings.Join(p.config.Env, ", "))
|
||||||
err = p.cmd.Start()
|
err = p.cmd.Start()
|
||||||
|
|
||||||
// Set process state to failed
|
// Set process state to failed
|
||||||
@@ -531,7 +532,7 @@ func (p *Process) cmdStopUpstreamProcess() error {
|
|||||||
stopCmd := exec.Command(stopArgs[0], stopArgs[1:]...)
|
stopCmd := exec.Command(stopArgs[0], stopArgs[1:]...)
|
||||||
stopCmd.Stdout = p.processLogger
|
stopCmd.Stdout = p.processLogger
|
||||||
stopCmd.Stderr = p.processLogger
|
stopCmd.Stderr = p.processLogger
|
||||||
stopCmd.Env = p.config.Env
|
stopCmd.Env = p.cmd.Env
|
||||||
|
|
||||||
if err := stopCmd.Run(); err != nil {
|
if err := stopCmd.Run(); err != nil {
|
||||||
p.proxyLogger.Errorf("<%s> Failed to exec stop command: %v", p.ID, err)
|
p.proxyLogger.Errorf("<%s> Failed to exec stop command: %v", p.ID, err)
|
||||||
|
|||||||
@@ -394,6 +394,9 @@ func TestProcess_StopImmediately(t *testing.T) {
|
|||||||
// Test that SIGKILL is sent when gracefulStopTimeout is reached and properly terminates
|
// Test that SIGKILL is sent when gracefulStopTimeout is reached and properly terminates
|
||||||
// the upstream command
|
// the upstream command
|
||||||
func TestProcess_ForceStopWithKill(t *testing.T) {
|
func TestProcess_ForceStopWithKill(t *testing.T) {
|
||||||
|
if runtime.GOOS == "windows" {
|
||||||
|
t.Skip("skipping SIGTERM test on Windows ")
|
||||||
|
}
|
||||||
|
|
||||||
expectedMessage := "test_sigkill"
|
expectedMessage := "test_sigkill"
|
||||||
binaryPath := getSimpleResponderPath()
|
binaryPath := getSimpleResponderPath()
|
||||||
@@ -405,7 +408,6 @@ func TestProcess_ForceStopWithKill(t *testing.T) {
|
|||||||
Cmd: fmt.Sprintf("%s --port %d --respond %s --silent --ignore-sig-term", binaryPath, port, expectedMessage),
|
Cmd: fmt.Sprintf("%s --port %d --respond %s --silent --ignore-sig-term", binaryPath, port, expectedMessage),
|
||||||
Proxy: fmt.Sprintf("http://127.0.0.1:%d", port),
|
Proxy: fmt.Sprintf("http://127.0.0.1:%d", port),
|
||||||
CheckEndpoint: "/health",
|
CheckEndpoint: "/health",
|
||||||
CmdStop: "taskkill /f /t /pid ${PID}",
|
|
||||||
}
|
}
|
||||||
|
|
||||||
process := NewProcess("stop_immediate", 2, config, debugLogger, debugLogger)
|
process := NewProcess("stop_immediate", 2, config, debugLogger, debugLogger)
|
||||||
@@ -465,3 +467,27 @@ func TestProcess_StopCmd(t *testing.T) {
|
|||||||
process.StopImmediately()
|
process.StopImmediately()
|
||||||
assert.Equal(t, process.CurrentState(), StateStopped)
|
assert.Equal(t, process.CurrentState(), StateStopped)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func TestProcess_EnvironmentSetCorrectly(t *testing.T) {
|
||||||
|
expectedMessage := "test_env_not_emptied"
|
||||||
|
config := getTestSimpleResponderConfig(expectedMessage)
|
||||||
|
|
||||||
|
// ensure that the default config does not blank out the inherited environment
|
||||||
|
configWEnv := config
|
||||||
|
|
||||||
|
// ensure the additional variables are appended to the process' environment
|
||||||
|
configWEnv.Env = append(configWEnv.Env, "TEST_ENV1=1", "TEST_ENV2=2")
|
||||||
|
|
||||||
|
process1 := NewProcess("env_test", 2, config, debugLogger, debugLogger)
|
||||||
|
process2 := NewProcess("env_test", 2, configWEnv, debugLogger, debugLogger)
|
||||||
|
|
||||||
|
process1.start()
|
||||||
|
defer process1.Stop()
|
||||||
|
process2.start()
|
||||||
|
defer process2.Stop()
|
||||||
|
|
||||||
|
assert.NotZero(t, len(process1.cmd.Environ()))
|
||||||
|
assert.NotZero(t, len(process2.cmd.Environ()))
|
||||||
|
assert.Equal(t, len(process1.cmd.Environ())+2, len(process2.cmd.Environ()), "process2 should have 2 more environment variables than process1")
|
||||||
|
|
||||||
|
}
|
||||||
|
|||||||
@@ -365,6 +365,21 @@ func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// issue #174 strip parameters from the JSON body
|
||||||
|
stripParams, err := pm.config.Models[realModelName].Filters.SanitizedStripParams()
|
||||||
|
if err != nil { // just log it and continue
|
||||||
|
pm.proxyLogger.Errorf("Error sanitizing strip params string: %s, %s", pm.config.Models[realModelName].Filters.StripParams, err.Error())
|
||||||
|
} else {
|
||||||
|
for _, param := range stripParams {
|
||||||
|
pm.proxyLogger.Debugf("<%s> stripping param: %s", realModelName, param)
|
||||||
|
bodyBytes, err = sjson.DeleteBytes(bodyBytes, param)
|
||||||
|
if err != nil {
|
||||||
|
pm.sendErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("error deleting parameter %s from request", param))
|
||||||
|
return
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
|
c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
|
||||||
|
|
||||||
// dechunk it as we already have all the body bytes see issue #11
|
// dechunk it as we already have all the body bytes see issue #11
|
||||||
|
|||||||
@@ -623,3 +623,37 @@ func TestProxyManager_ChatContentLength(t *testing.T) {
|
|||||||
assert.Equal(t, "81", response["h_content_length"])
|
assert.Equal(t, "81", response["h_content_length"])
|
||||||
assert.Equal(t, "model1", response["responseMessage"])
|
assert.Equal(t, "model1", response["responseMessage"])
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func TestProxyManager_FiltersStripParams(t *testing.T) {
|
||||||
|
modelConfig := getTestSimpleResponderConfig("model1")
|
||||||
|
modelConfig.Filters = ModelFilters{
|
||||||
|
StripParams: "temperature, model, stream",
|
||||||
|
}
|
||||||
|
|
||||||
|
config := AddDefaultGroupToConfig(Config{
|
||||||
|
HealthCheckTimeout: 15,
|
||||||
|
LogLevel: "error",
|
||||||
|
Models: map[string]ModelConfig{
|
||||||
|
"model1": modelConfig,
|
||||||
|
},
|
||||||
|
})
|
||||||
|
|
||||||
|
proxy := New(config)
|
||||||
|
defer proxy.StopProcesses(StopWaitForInflightRequest)
|
||||||
|
reqBody := `{"model":"model1", "temperature":0.1, "x_param":"123", "y_param":"abc", "stream":true}`
|
||||||
|
req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewBufferString(reqBody))
|
||||||
|
w := httptest.NewRecorder()
|
||||||
|
|
||||||
|
proxy.ServeHTTP(w, req)
|
||||||
|
assert.Equal(t, http.StatusOK, w.Code)
|
||||||
|
var response map[string]string
|
||||||
|
assert.NoError(t, json.Unmarshal(w.Body.Bytes(), &response))
|
||||||
|
|
||||||
|
// `temperature` and `stream` are gone but model remains
|
||||||
|
assert.Equal(t, `{"model":"model1", "x_param":"123", "y_param":"abc"}`, response["request_body"])
|
||||||
|
|
||||||
|
// assert.Nil(t, response["temperature"])
|
||||||
|
// assert.Equal(t, "123", response["x_param"])
|
||||||
|
// assert.Equal(t, "abc", response["y_param"])
|
||||||
|
// t.Logf("%v", response)
|
||||||
|
}
|
||||||
|
|||||||
@@ -3,7 +3,11 @@
|
|||||||
<head>
|
<head>
|
||||||
<meta charset="UTF-8" />
|
<meta charset="UTF-8" />
|
||||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||||
<link rel="icon" type="image/png" href="/favicon.ico" />
|
<link rel="icon" type="image/png" href="/favicon-96x96.png" sizes="96x96" />
|
||||||
|
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
|
||||||
|
<link rel="shortcut icon" href="/favicon.ico" />
|
||||||
|
<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png" />
|
||||||
|
<link rel="manifest" href="/site.webmanifest" />
|
||||||
<title>llama-swap</title>
|
<title>llama-swap</title>
|
||||||
</head>
|
</head>
|
||||||
<body >
|
<body >
|
||||||
|
|||||||
|
After Width: | Height: | Size: 5.9 KiB |
|
After Width: | Height: | Size: 2.2 KiB |
|
Before Width: | Height: | Size: 15 KiB After Width: | Height: | Size: 15 KiB |
|
After Width: | Height: | Size: 38 KiB |
@@ -0,0 +1,21 @@
|
|||||||
|
{
|
||||||
|
"name": "llama-swap",
|
||||||
|
"short_name": "llama-swap",
|
||||||
|
"icons": [
|
||||||
|
{
|
||||||
|
"src": "/web-app-manifest-192x192.png",
|
||||||
|
"sizes": "192x192",
|
||||||
|
"type": "image/png",
|
||||||
|
"purpose": "maskable"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"src": "/web-app-manifest-512x512.png",
|
||||||
|
"sizes": "512x512",
|
||||||
|
"type": "image/png",
|
||||||
|
"purpose": "maskable"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"theme_color": "#ffffff",
|
||||||
|
"background_color": "#ffffff",
|
||||||
|
"display": "standalone"
|
||||||
|
}
|
||||||
|
After Width: | Height: | Size: 6.5 KiB |
|
After Width: | Height: | Size: 28 KiB |
@@ -10,10 +10,10 @@ function App() {
|
|||||||
<Router basename="/ui/">
|
<Router basename="/ui/">
|
||||||
<APIProvider>
|
<APIProvider>
|
||||||
<div>
|
<div>
|
||||||
<nav className="bg-surface border-b border-border p-4">
|
<nav className="bg-surface border-b border-border p-2 h-[75px]">
|
||||||
<div className="flex items-center justify-between mx-auto px-4">
|
<div className="flex items-center justify-between mx-auto px-4 h-full">
|
||||||
<h1>llama-swap</h1>
|
<h1 className="flex items-center p-0">llama-swap</h1>
|
||||||
<div className="flex space-x-4">
|
<div className="flex items-center space-x-4">
|
||||||
<NavLink to="/" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
|
<NavLink to="/" className={({ isActive }) => (isActive ? "navlink active" : "navlink")}>
|
||||||
Logs
|
Logs
|
||||||
</NavLink>
|
</NavLink>
|
||||||
|
|||||||
@@ -12,6 +12,7 @@ interface APIProviderType {
|
|||||||
models: Model[];
|
models: Model[];
|
||||||
listModels: () => Promise<Model[]>;
|
listModels: () => Promise<Model[]>;
|
||||||
unloadAllModels: () => Promise<void>;
|
unloadAllModels: () => Promise<void>;
|
||||||
|
loadModel: (model: string) => Promise<void>;
|
||||||
enableProxyLogs: (enabled: boolean) => void;
|
enableProxyLogs: (enabled: boolean) => void;
|
||||||
enableUpstreamLogs: (enabled: boolean) => void;
|
enableUpstreamLogs: (enabled: boolean) => void;
|
||||||
enableModelUpdates: (enabled: boolean) => void;
|
enableModelUpdates: (enabled: boolean) => void;
|
||||||
@@ -57,9 +58,27 @@ export function APIProvider({ children }: APIProviderProps) {
|
|||||||
const enableProxyLogs = useCallback(
|
const enableProxyLogs = useCallback(
|
||||||
(enabled: boolean) => {
|
(enabled: boolean) => {
|
||||||
if (enabled) {
|
if (enabled) {
|
||||||
const eventSource = new EventSource("/logs/streamSSE/proxy");
|
let retryCount = 0;
|
||||||
eventSource.onmessage = handleProxyMessage;
|
const maxRetries = 3;
|
||||||
proxyEventSource.current = eventSource;
|
const initialDelay = 1000; // 1 second
|
||||||
|
|
||||||
|
const connect = () => {
|
||||||
|
const eventSource = new EventSource("/logs/streamSSE/proxy");
|
||||||
|
|
||||||
|
eventSource.onmessage = handleProxyMessage;
|
||||||
|
eventSource.onerror = () => {
|
||||||
|
eventSource.close();
|
||||||
|
if (retryCount < maxRetries) {
|
||||||
|
retryCount++;
|
||||||
|
const delay = initialDelay * Math.pow(2, retryCount - 1);
|
||||||
|
setTimeout(connect, delay);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
proxyEventSource.current = eventSource;
|
||||||
|
};
|
||||||
|
|
||||||
|
connect();
|
||||||
} else {
|
} else {
|
||||||
proxyEventSource.current?.close();
|
proxyEventSource.current?.close();
|
||||||
proxyEventSource.current = null;
|
proxyEventSource.current = null;
|
||||||
@@ -71,15 +90,33 @@ export function APIProvider({ children }: APIProviderProps) {
|
|||||||
const enableUpstreamLogs = useCallback(
|
const enableUpstreamLogs = useCallback(
|
||||||
(enabled: boolean) => {
|
(enabled: boolean) => {
|
||||||
if (enabled) {
|
if (enabled) {
|
||||||
const eventSource = new EventSource("/logs/streamSSE/upstream");
|
let retryCount = 0;
|
||||||
eventSource.onmessage = handleUpstreamMessage;
|
const maxRetries = 3;
|
||||||
upstreamEventSource.current = eventSource;
|
const initialDelay = 1000; // 1 second
|
||||||
|
|
||||||
|
const connect = () => {
|
||||||
|
const eventSource = new EventSource("/logs/streamSSE/upstream");
|
||||||
|
|
||||||
|
eventSource.onmessage = handleUpstreamMessage;
|
||||||
|
eventSource.onerror = () => {
|
||||||
|
eventSource.close();
|
||||||
|
if (retryCount < maxRetries) {
|
||||||
|
retryCount++;
|
||||||
|
const delay = initialDelay * Math.pow(2, retryCount - 1);
|
||||||
|
setTimeout(connect, delay);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
upstreamEventSource.current = eventSource;
|
||||||
|
};
|
||||||
|
|
||||||
|
connect();
|
||||||
} else {
|
} else {
|
||||||
upstreamEventSource.current?.close();
|
upstreamEventSource.current?.close();
|
||||||
upstreamEventSource.current = null;
|
upstreamEventSource.current = null;
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
[upstreamEventSource, handleUpstreamMessage]
|
[handleUpstreamMessage]
|
||||||
);
|
);
|
||||||
|
|
||||||
const enableModelUpdates = useCallback(
|
const enableModelUpdates = useCallback(
|
||||||
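Review note: both `enableProxyLogs` and `enableUpstreamLogs` now reconnect with exponential backoff, using `initialDelay * Math.pow(2, retryCount - 1)` and stopping once `retryCount` reaches `maxRetries`. The sketch below only tabulates the resulting wait times; `backoffDelays` is a hypothetical helper introduced for illustration and is not part of the diff.

```typescript
// Hypothetical helper (not in the diff): lists the wait before each retry
// using the same formula as the onerror handlers in the hunks above.
function backoffDelays(maxRetries: number, initialDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retryCount = 1; retryCount <= maxRetries; retryCount++) {
    // initialDelay * 2^(retryCount - 1)
    delays.push(initialDelayMs * Math.pow(2, retryCount - 1));
  }
  return delays;
}

// With the values from the diff (maxRetries = 3, initialDelay = 1000 ms):
console.log(backoffDelays(3, 1000)); // [1000, 2000, 4000]
```

As written in the hunks, `retryCount` is never reset after a successful reconnect, so a long-lived stream gets three retries in total rather than three per outage. Resetting the counter in an `onopen` handler would be one way to change that, if per-outage retries are the intent.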
@@ -139,11 +176,26 @@ export function APIProvider({ children }: APIProviderProps) {
|
|||||||
}
|
}
|
||||||
}, []);
|
}, []);
|
||||||
|
|
||||||
|
const loadModel = useCallback(async (model: string) => {
|
||||||
|
try {
|
||||||
|
const response = await fetch(`/upstream/${model}/`, {
|
||||||
|
method: "GET",
|
||||||
|
});
|
||||||
|
if (!response.ok) {
|
||||||
|
throw new Error(`Failed to load model: ${response.status}`);
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
console.error("Failed to load model:", error);
|
||||||
|
throw error; // Re-throw to let calling code handle it
|
||||||
|
}
|
||||||
|
}, []);
|
||||||
|
|
||||||
const value = useMemo(
|
const value = useMemo(
|
||||||
() => ({
|
() => ({
|
||||||
models,
|
models,
|
||||||
listModels,
|
listModels,
|
||||||
unloadAllModels,
|
unloadAllModels,
|
||||||
|
loadModel,
|
||||||
enableProxyLogs,
|
enableProxyLogs,
|
||||||
enableUpstreamLogs,
|
enableUpstreamLogs,
|
||||||
enableModelUpdates,
|
enableModelUpdates,
|
||||||
@@ -154,6 +206,7 @@ export function APIProvider({ children }: APIProviderProps) {
|
|||||||
models,
|
models,
|
||||||
listModels,
|
listModels,
|
||||||
unloadAllModels,
|
unloadAllModels,
|
||||||
|
loadModel,
|
||||||
enableProxyLogs,
|
enableProxyLogs,
|
||||||
enableUpstreamLogs,
|
enableUpstreamLogs,
|
||||||
enableModelUpdates,
|
enableModelUpdates,
|
||||||
|
|||||||
@@ -143,6 +143,10 @@
|
|||||||
@apply bg-surface p-2 px-4 text-sm rounded-full border border-2 transition-colors duration-200 border-btn-border;
|
@apply bg-surface p-2 px-4 text-sm rounded-full border border-2 transition-colors duration-200 border-btn-border;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
.btn:hover {
|
||||||
|
cursor: pointer;
|
||||||
|
}
|
||||||
|
|
||||||
.btn--sm {
|
.btn--sm {
|
||||||
@apply px-2 py-0.5 text-xs;
|
@apply px-2 py-0.5 text-xs;
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -0,0 +1,18 @@
export function processEvalTimes(text: string) {
  const lines = text.match(/^ *eval time.*$/gm) || [];

  let totalTokens = 0;
  let totalTime = 0;

  lines.forEach((line) => {
    const tokensMatch = line.match(/\/\s*(\d+)\s*tokens/);
    const timeMatch = line.match(/=\s*(\d+\.\d+)\s*ms/);

    if (tokensMatch) totalTokens += parseFloat(tokensMatch[1]);
    if (timeMatch) totalTime += parseFloat(timeMatch[1]);
  });

  const avgTokensPerSecond = totalTime > 0 ? totalTokens / (totalTime / 1000) : 0;

  return [lines.length, totalTokens, Math.round(avgTokensPerSecond * 100) / 100];
}
|
||||||
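Review note: to make the behaviour of `processEvalTimes` concrete, here is a minimal sketch that restates the same logic with an explicit tuple return type and runs it on a small sample. The name `summarizeEvalTimes` and the sample log lines are assumptions made for illustration; the sample only imitates the general shape of llama-server timing output and is not copied from a real run.

```typescript
// Typed restatement of the logic in the hunk above (name is hypothetical).
function summarizeEvalTimes(text: string): [number, number, number] {
  // Only lines that begin with optional spaces followed by "eval time" count,
  // so "prompt eval time" lines are skipped.
  const lines: string[] = text.match(/^ *eval time.*$/gm) ?? [];

  let totalTokens = 0;
  let totalTimeMs = 0;

  for (const line of lines) {
    const tokensMatch = line.match(/\/\s*(\d+)\s*tokens/);
    const timeMatch = line.match(/=\s*(\d+\.\d+)\s*ms/);
    if (tokensMatch) totalTokens += Number(tokensMatch[1]);
    if (timeMatch) totalTimeMs += Number(timeMatch[1]);
  }

  const avgTokensPerSecond = totalTimeMs > 0 ? totalTokens / (totalTimeMs / 1000) : 0;
  return [lines.length, totalTokens, Math.round(avgTokensPerSecond * 100) / 100];
}

// Illustrative input only; the exact log format is an assumption.
const sampleLog = [
  "prompt eval time =     500.00 ms /    20 tokens (   25.00 ms per token,    40.00 tokens per second)",
  "       eval time =    2000.00 ms /   100 tokens (   20.00 ms per token,    50.00 tokens per second)",
  "       eval time =    1000.00 ms /    50 tokens (   20.00 ms per token,    50.00 tokens per second)",
].join("\n");

console.log(summarizeEvalTimes(sampleLog)); // [2, 150, 50]
```

The average is weighted by time (total tokens over total seconds), not a mean of per-request rates, which is why the "Average Tokens/Second" row in the Log Stats card can differ from the per-line figures llama-server prints.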
@@ -15,7 +15,7 @@ const LogViewer = () => {
|
|||||||
}, []);
|
}, []);
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div className="flex flex-col gap-5">
|
<div className="flex flex-col gap-5" style={{ height: "calc(100vh - 125px)" }}>
|
||||||
<LogPanel id="proxy" title="Proxy Logs" logData={proxyLogs} />
|
<LogPanel id="proxy" title="Proxy Logs" logData={proxyLogs} />
|
||||||
<LogPanel id="upstream" title="Upstream Logs" logData={upstreamLogs} />
|
<LogPanel id="upstream" title="Upstream Logs" logData={upstreamLogs} />
|
||||||
</div>
|
</div>
|
||||||
@@ -30,11 +30,8 @@ interface LogPanelProps {
|
|||||||
}
|
}
|
||||||
|
|
||||||
export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
|
export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
|
||||||
|
const [isCollapsed, setIsCollapsed] = usePersistentState(`logPanel-${id}-isCollapsed`, false);
|
||||||
const [filterRegex, setFilterRegex] = useState("");
|
const [filterRegex, setFilterRegex] = useState("");
|
||||||
const [panelState, setPanelState] = usePersistentState<"hide" | "small" | "max">(
|
|
||||||
`logPanel-${id}-panelState`,
|
|
||||||
"small"
|
|
||||||
);
|
|
||||||
const [fontSize, setFontSize] = usePersistentState<"xxs" | "xs" | "small" | "normal">(
|
const [fontSize, setFontSize] = usePersistentState<"xxs" | "xs" | "small" | "normal">(
|
||||||
`logPanel-${id}-fontSize`,
|
`logPanel-${id}-fontSize`,
|
||||||
"normal"
|
"normal"
|
||||||
@@ -60,14 +57,6 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
|
|||||||
});
|
});
|
||||||
}, []);
|
}, []);
|
||||||
|
|
||||||
const togglePanelState = useCallback(() => {
|
|
||||||
setPanelState((prev) => {
|
|
||||||
if (prev === "small") return "max";
|
|
||||||
if (prev === "hide") return "small";
|
|
||||||
return "hide";
|
|
||||||
});
|
|
||||||
}, []);
|
|
||||||
|
|
||||||
const fontSizeClass = useMemo(() => {
|
const fontSizeClass = useMemo(() => {
|
||||||
switch (fontSize) {
|
switch (fontSize) {
|
||||||
case "xxs":
|
case "xxs":
|
||||||
@@ -101,20 +90,21 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
|
|||||||
}, [filteredLogs]);
|
}, [filteredLogs]);
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div className={`bg-surface border border-border rounded-lg overflow-hidden flex flex-col ${className || ""}`}>
|
<div
|
||||||
|
className={`bg-surface border border-border rounded-lg overflow-hidden flex flex-col ${
|
||||||
|
!isCollapsed && "h-full"
|
||||||
|
} ${className || ""}`}
|
||||||
|
>
|
||||||
<div className="p-4 border-b border-border bg-secondary">
|
<div className="p-4 border-b border-border bg-secondary">
|
||||||
<div className="flex flex-col md:flex-row md:items-center md:justify-between gap-4">
|
<div className="flex flex-col md:flex-row md:items-center md:justify-between gap-4">
|
||||||
{/* Title - Always full width on mobile, normal on desktop */}
|
{/* Title - Always full width on mobile, normal on desktop */}
|
||||||
<div className="w-full md:w-auto" onClick={togglePanelState}>
|
<div className="w-full md:w-auto" onClick={() => setIsCollapsed(!isCollapsed)}>
|
||||||
<h3 className="m-0 text-lg">{title}</h3>
|
<h3 className="m-0 text-lg">{title}</h3>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div className="flex flex-col sm:flex-row gap-4 w-full md:w-auto">
|
<div className="flex flex-col sm:flex-row gap-4 w-full md:w-auto">
|
||||||
{/* Sizing Buttons - Stacks vertically on mobile */}
|
{/* Sizing Buttons - Stacks vertically on mobile */}
|
||||||
<div className="flex flex-wrap gap-2">
|
<div className="flex flex-wrap gap-2">
|
||||||
<button className="btn" onClick={togglePanelState}>
|
|
||||||
size: {panelState}
|
|
||||||
</button>
|
|
||||||
<button className="btn" onClick={toggleFontSize}>
|
<button className="btn" onClick={toggleFontSize}>
|
||||||
font: {fontSize}
|
font: {fontSize}
|
||||||
</button>
|
</button>
|
||||||
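Review note: in the new `className` template literal, `!isCollapsed && "h-full"` evaluates to `false` when the panel is collapsed, and the template literal then renders the literal string `false` as a class name. It is harmless in practice but easy to avoid. A minimal sketch of one alternative follows; `panelClassName` is a hypothetical helper, not something the diff defines.

```typescript
// Hypothetical helper (not in the diff): joins only the class names that apply.
function panelClassName(isCollapsed: boolean, className?: string): string {
  return [
    "bg-surface border border-border rounded-lg overflow-hidden flex flex-col",
    !isCollapsed && "h-full", // boolean false is filtered out below
    className,
  ]
    .filter((part): part is string => typeof part === "string" && part.length > 0)
    .join(" ");
}

console.log(panelClassName(true));
console.log(panelClassName(false, "extra"));
```

A ternary inside the existing template literal (`isCollapsed ? "" : "h-full"`) would achieve the same result with a smaller change.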
@@ -140,14 +130,11 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
|
|||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{panelState !== "hide" && (
|
{!isCollapsed && (
|
||||||
<div className="flex-1 bg-background font-mono text-sm leading-[1.4] p-3">
|
<div className="flex-1 bg-background font-mono text-sm p-3 overflow-hidden">
|
||||||
<pre
|
<pre
|
||||||
ref={preTagRef}
|
ref={preTagRef}
|
||||||
className={`flex-1 p-4 overflow-y-auto whitespace-pre min-h-0 ${textWrapClass} ${fontSizeClass}`}
|
className={`h-full p-4 overflow-y-auto whitespace-pre min-h-0 ${textWrapClass} ${fontSizeClass}`}
|
||||||
style={{
|
|
||||||
maxHeight: panelState === "max" ? "1500px" : "500px",
|
|
||||||
}}
|
|
||||||
>
|
>
|
||||||
{filteredLogs}
|
{filteredLogs}
|
||||||
</pre>
|
</pre>
|
||||||
@@ -156,5 +143,4 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
|
|||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
};
|
};
|
||||||
|
|
||||||
export default LogViewer;
|
export default LogViewer;
|
||||||
|
|||||||
@@ -1,9 +1,10 @@
|
|||||||
import { useState, useEffect, useCallback } from "react";
|
import { useState, useEffect, useCallback, useMemo } from "react";
|
||||||
import { useAPI } from "../contexts/APIProvider";
|
import { useAPI } from "../contexts/APIProvider";
|
||||||
import { LogPanel } from "./LogViewer";
|
import { LogPanel } from "./LogViewer";
|
||||||
|
import { processEvalTimes } from "../lib/Utils";
|
||||||
|
|
||||||
export default function ModelsPage() {
|
export default function ModelsPage() {
|
||||||
const { models, enableModelUpdates, unloadAllModels, upstreamLogs, enableUpstreamLogs } = useAPI();
|
const { models, enableModelUpdates, unloadAllModels, loadModel, upstreamLogs, enableUpstreamLogs } = useAPI();
|
||||||
const [isUnloading, setIsUnloading] = useState(false);
|
const [isUnloading, setIsUnloading] = useState(false);
|
||||||
|
|
||||||
useEffect(() => {
|
useEffect(() => {
|
||||||
@@ -29,8 +30,12 @@ export default function ModelsPage() {
|
|||||||
}
|
}
|
||||||
}, []);
|
}, []);
|
||||||
|
|
||||||
|
const [totalLines, totalTokens, avgTokensPerSecond] = useMemo(() => {
|
||||||
|
return processEvalTimes(upstreamLogs);
|
||||||
|
}, [upstreamLogs]);
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div className="h-screen">
|
<div>
|
||||||
<div className="flex flex-col md:flex-row gap-4">
|
<div className="flex flex-col md:flex-row gap-4">
|
||||||
{/* Left Column */}
|
{/* Left Column */}
|
||||||
<div className="w-full md:w-1/2 flex items-top">
|
<div className="w-full md:w-1/2 flex items-top">
|
||||||
@@ -43,6 +48,7 @@ export default function ModelsPage() {
|
|||||||
<thead>
|
<thead>
|
||||||
<tr className="border-b border-primary">
|
<tr className="border-b border-primary">
|
||||||
<th className="text-left p-2">Name</th>
|
<th className="text-left p-2">Name</th>
|
||||||
|
<th className="text-left p-2"></th>
|
||||||
<th className="text-left p-2">State</th>
|
<th className="text-left p-2">State</th>
|
||||||
</tr>
|
</tr>
|
||||||
</thead>
|
</thead>
|
||||||
@@ -50,10 +56,19 @@ export default function ModelsPage() {
|
|||||||
{models.map((model) => (
|
{models.map((model) => (
|
||||||
<tr key={model.id} className="border-b hover:bg-secondary-hover border-border">
|
<tr key={model.id} className="border-b hover:bg-secondary-hover border-border">
|
||||||
<td className="p-2">
|
<td className="p-2">
|
||||||
<a href={`/upstream/${model.id}/`} className="underline" target="top">
|
<a href={`/upstream/${model.id}/`} className="underline" target="_blank">
|
||||||
{model.id}
|
{model.id}
|
||||||
</a>
|
</a>
|
||||||
</td>
|
</td>
|
||||||
|
<td className="p-2">
|
||||||
|
<button
|
||||||
|
className="btn btn--sm"
|
||||||
|
disabled={model.state !== "stopped"}
|
||||||
|
onClick={() => loadModel(model.id)}
|
||||||
|
>
|
||||||
|
Load
|
||||||
|
</button>
|
||||||
|
</td>
|
||||||
<td className="p-2">
|
<td className="p-2">
|
||||||
<span className={`status status--${model.state}`}>{model.state}</span>
|
<span className={`status status--${model.state}`}>{model.state}</span>
|
||||||
</td>
|
</td>
|
||||||
@@ -65,8 +80,29 @@ export default function ModelsPage() {
|
|||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Right Column */}
|
{/* Right Column */}
|
||||||
<div className="w-full md:w-1/2 flex items-top">
|
<div className="w-full md:w-1/2 flex flex-col" style={{ height: "calc(100vh - 125px)" }}>
|
||||||
<LogPanel id="modelsupstream" title="Upstream Logs" logData={upstreamLogs} className="h-full" />
|
<div className="card mb-4 min-h-[250px]">
|
||||||
|
<h2>Log Stats</h2>
|
||||||
|
<p className="italic my-2">note: eval logs from llama-server</p>
|
||||||
|
<table className="w-full border border-gray-200">
|
||||||
|
<tbody>
|
||||||
|
<tr className="border-b border-gray-200">
|
||||||
|
<td className="py-2 px-4 font-medium border-r border-gray-200">Requests</td>
|
||||||
|
<td className="py-2 px-4 text-right">{totalLines}</td>
|
||||||
|
</tr>
|
||||||
|
<tr className="border-b border-gray-200">
|
||||||
|
<td className="py-2 px-4 font-medium border-r border-gray-200">Total Tokens Generated</td>
|
||||||
|
<td className="py-2 px-4 text-right">{totalTokens}</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td className="py-2 px-4 font-medium border-r border-gray-200">Average Tokens/Second</td>
|
||||||
|
<td className="py-2 px-4 text-right">{avgTokensPerSecond}</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<LogPanel id="modelsupstream" title="Upstream Logs" logData={upstreamLogs} />
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|||||||