24 Commits

Author SHA1 Message Date
Benson Wong 7e3e94a08a proxy,ui: add performance monitoring with Prometheus metrics (#743)
Add a comprehensive performance monitoring system that collects CPU, memory, swap, load average, network IO, and GPU stats. Provides both a REST API for the UI and a Prometheus /metrics endpoint.

Backend changes:
- New internal/perf package with configurable interval-based stats collection
- GPU monitoring via LACT (Unix socket) and nvidia-smi fallback on Linux
- Ring buffer (internal/ring) for time-series stat storage
- Prometheus /metrics endpoint with all system and GPU metrics
- Moved LogMonitor to internal/logmon package
- New PerformanceConfig for hot-reloadable monitoring settings
- REST /api/performance endpoint replacing SSE streaming
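
A ring buffer for time-series samples, as mentioned in the backend list above, might look roughly like the sketch below. The `Ring` type and its `Push` and `Snapshot` methods are hypothetical names chosen for illustration; the actual internal/ring package may be structured differently.

```go
package main

import "fmt"

// Ring is a fixed-capacity buffer that overwrites its oldest entry once full.
// Hypothetical illustration; not the API of the internal/ring package.
type Ring[T any] struct {
	buf   []T
	start int // index of the oldest stored element
	size  int // number of stored elements
}

// NewRing allocates a ring with the given capacity (minimum 1).
func NewRing[T any](capacity int) *Ring[T] {
	if capacity < 1 {
		capacity = 1
	}
	return &Ring[T]{buf: make([]T, capacity)}
}

// Push appends v, dropping the oldest element when the ring is full.
func (r *Ring[T]) Push(v T) {
	if r.size < len(r.buf) {
		r.buf[(r.start+r.size)%len(r.buf)] = v
		r.size++
		return
	}
	r.buf[r.start] = v
	r.start = (r.start + 1) % len(r.buf)
}

// Snapshot returns the stored elements from oldest to newest.
func (r *Ring[T]) Snapshot() []T {
	out := make([]T, 0, r.size)
	for i := 0; i < r.size; i++ {
		out = append(out, r.buf[(r.start+i)%len(r.buf)])
	}
	return out
}

func main() {
	cpu := NewRing[float64](3)
	for _, sample := range []float64{10, 20, 30, 40} {
		cpu.Push(sample)
	}
	fmt.Println(cpu.Snapshot()) // prints [20 30 40]
}
```

A fixed capacity keeps memory use bounded no matter how long the collector runs, which suits interval-based stats.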

UI changes:
- New Performance page with real-time charts for CPU, memory, GPU, and network
- Reusable PerformanceChart component
- LLAMA_SWAP_URL environment variable support
- Improved capture dialog display

Other:
- Example Grafana dashboard for Prometheus metrics
- monitor-test standalone binary
- Config schema and example updates

fixes #596
2026-05-09 13:29:22 -07:00
Benson Wong fd3c28ffc5 Refactor Activity Page (#710)
- inference handlers now store an activity record for all inference endpoints
- add path, status code, and content type to Activities page
- toggle columns on/off on the Activities page
- add configurable capture level for inference endpoints so large binary blobs are not stored in memory
- store captures in compressed binary format
2026-04-28 20:33:03 -07:00
Benson Wong 66639e83f7 proxy: replace fsnotify with stat-poll watcher and add SIGHUP reload (#685)
The fsnotify-based config watcher does not work reliably when the config
file is bind-mounted into a Docker container as an individual file, and
mishandles k8s ConfigMap projections (atomically swapped symlinks).
Replace it with a small os.Stat-polling watcher and add SIGHUP as an
explicit reload signal.

- new proxy/configwatcher package: 2s os.Stat poller, follows symlinks,
  fires on mtime/size change and on missing -> present transitions
- SIGHUP triggers reload unconditionally (works without --watch-config)
  via the same ConfigFileChangedEvent pipeline so the UI sees identical
  state transitions
- watcher goroutine now exits cleanly on shutdown via a context
- drop github.com/fsnotify/fsnotify dependency

fixes #682
2026-04-21 23:21:48 -07:00
Benson Wong 5e3c646829 proxy: compress captures with zstd (#668)
The previous captures were saved uncompressed in memory. In agentic
workflows there can be many turns with each request containing the
previous context in the body with a lot of redundant data. Use zstd to
compress the request and response data before keeping a copy in memory.

Results: 

- Average Percentage Saved: 73.19%
- Average Compression Factor: ~6.77:1
2026-04-17 23:29:37 -07:00
Benson Wong a3725e7d09 Update go.mod to 1.26.1 (#593) 2026-03-20 16:09:58 +09:00
Benson Wong 15bd55d3a9 proxy, ui-svelte: add /sdapi/v1 endpoint support (#587)
Add proxy routes for stable-diffusion.cpp's /sdapi/v1/txt2img,
/sdapi/v1/img2img, and /sdapi/v1/loras endpoints. POST endpoints
use proxyInferenceHandler (model in JSON body), GET /loras uses
proxyGETModelHandler (model in query param).

Update the image playground with a dual-mode UI supporting both
OpenAI and SDAPI backends. In SDAPI mode, loras are fetched first
to prime the server-side cache, and all txt2img parameters are
exposed (negative prompt, steps, cfg_scale, seed, batch_size,
clip_skip, sampler, scheduler, lora selection with multipliers).

- Add 3 sdapi route registrations in proxymanager.go
- Add sdApi.ts client with generateSdImage and fetchSdLoras
- Add SDAPI types (SdApiTxt2ImgRequest, SdApiResponse, etc.)
- Add /sdapi to vite dev proxy config
- Add backend tests for sdapi routing
- Support batch image display in gallery grid

https://claude.ai/code/session_0186MGX6NXdHVBTv2KH45fqn

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-19 22:08:31 +09:00
Benson Wong bccce5fa19 go.mod,ui/package-lock.json: dependency and security updates (#418) 2025-11-29 22:27:22 -08:00
Benson Wong 6299c1b874 Fix High CPU (#189)
* vendor in kelindar/event lib and refactor to remove time.Ticker
2025-07-15 18:04:30 -07:00
Benson Wong 1921e570d7 Add Event Bus (#184)
Major internal refactor to use an event bus to pass events and messages along. These changes are largely invisible to users, but they set up the internal design for real-time stats and information.

- `--watch-config` logic refactored for events
- remove multiple SSE api endpoints, replaced with /api/events
- keep all functionality essentially the same
- UI/backend sync is in near real time now
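
As a rough picture of the pattern, a minimal in-process event bus could look like the sketch below. The `Bus`, `Event`, `Subscribe`, and `Publish` names and the `config.changed` topic are hypothetical and chosen for illustration; the project's actual event types and library are not shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical message passed through the bus.
type Event struct {
	Topic   string
	Payload string
}

// Bus delivers published events to every handler subscribed to the topic.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]func(Event)
}

// NewBus returns an empty bus.
func NewBus() *Bus {
	return &Bus{subs: make(map[string][]func(Event))}
}

// Subscribe registers fn to run for each event published on topic.
func (b *Bus) Subscribe(topic string, fn func(Event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[topic] = append(b.subs[topic], fn)
}

// Publish calls the topic's handlers synchronously, in subscription order.
func (b *Bus) Publish(e Event) {
	b.mu.RLock()
	// Copy the handler list so handlers can subscribe without deadlocking.
	handlers := append([]func(Event){}, b.subs[e.Topic]...)
	b.mu.RUnlock()
	for _, fn := range handlers {
		fn(e)
	}
}

func main() {
	bus := NewBus()
	bus.Subscribe("config.changed", func(e Event) {
		fmt.Println("reload:", e.Payload)
	})
	bus.Publish(Event{Topic: "config.changed", Payload: "config.yaml"}) // prints "reload: config.yaml"
}
```

With this shape, a config watcher and an API endpoint can both react to the same event without knowing about each other.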
2025-07-01 22:17:35 -07:00
Benson Wong d7b390df74 Add GH Action for Testing on Windows (#132)
* Add Windows-specific test changes
* Change the command line parsing library - Possible breaking changes for Windows users!
2025-05-14 21:51:53 -07:00
Sam bc652709a5 Add config hot-reload (#106)
Introduce the --watch-config command-line option to reload ProxyManager when the configuration changes.
2025-05-11 17:37:00 -07:00
Benson Wong 8404244fab Moderate security update for golang/x/net -> v0.38.0 2025-04-24 09:58:40 -07:00
Benson Wong 671c1a5a7b update deps 2025-03-13 14:00:15 -07:00
Benson Wong b3d331da0d Properly strip profile name slug from models fixes (#62)
The profile slug in a model name, `profile:model`, is specific to
llama-swap. This strips `profile:` out of the model name request so
upstreams that expect just `model` work and do not require knowing about
the profile slug.
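
The stripping step described above might be sketched like this. `stripProfile` is a hypothetical helper, not the function in the repository, and it assumes the first colon always separates the profile from the model.

```go
package main

import (
	"fmt"
	"strings"
)

// stripProfile removes a leading "profile:" slug from a model name.
// Assumption: the first colon separates profile from model. This sketch
// cannot tell a profile slug from a colon that belongs to the model name
// itself; real code would likely check against the configured profiles.
func stripProfile(name string) string {
	if _, model, found := strings.Cut(name, ":"); found {
		return model
	}
	return name
}

func main() {
	fmt.Println(stripProfile("coding:qwen")) // prints qwen
	fmt.Println(stripProfile("qwen"))        // prints qwen
}
```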
2025-03-09 12:41:52 -07:00
Benson Wong db6715bec3 update golang.org/x/net -> v0.33.0 for dependabot 2024-12-20 11:28:32 -08:00
Benson Wong d4e22cceaa Fix security vulnerability with golang.org/x/crypto
- does not affect the project as llama-swap does not use the crypto
  libraries
- good practice to keep security deps updated!
2024-12-14 10:20:22 -08:00
Benson Wong 9fc5d5b5eb improve cmd parsing (#22)
Switch from a naive strings.Fields() to shlex.Split() for parsing the model startup command into a []string. This makes parsing much more reliable around newlines, quotes, etc.
2024-12-01 09:02:58 -08:00
Benson Wong c3b4bb1684 use gin for http server 2024-11-18 15:30:16 -08:00
Benson Wong be82d1a6a0 Support multiline cmds in YAML configuration
Add support for multiline `cmd` configurations, allowing for nicer-looking YAML configuration files.
2024-10-19 20:06:59 -07:00
Benson Wong 4fae7cf946 update docs 2024-10-04 21:11:08 -07:00
Benson Wong ef05c05f9c renaming to llama-swap 2024-10-04 20:21:11 -07:00
Benson Wong 844615bfcc rename to llamagate 2024-10-04 11:09:36 -07:00
Benson Wong f44faf5a93 move config to its own package 2024-10-03 21:08:11 -07:00
Benson Wong b63b81b121 first commit 2024-10-03 20:20:01 -07:00