Load models in the UI without navigating the page (#173)

* Load models in the UI without navigating the page
* fix table layout for mobile
8 changed files with 93 additions and 29 deletions
@@ -17,14 +17,16 @@ builds:
      - goos: windows
        goarch: arm64

-# use zip format for windows
 archives:
  - id: default
-    format: tar.gz
+    formats:
+      - tar.gz
    name_template: "{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}"
    builds_info:
      group: root
      owner: root
    format_overrides:
+      # use zip format for windows
      - goos: windows
-        format: zip
+        formats:
+          - zip
@@ -29,9 +29,13 @@ test: proxy/ui_dist/placeholder.txt
 test-all: proxy/ui_dist/placeholder.txt
 	go test -v -count=1 ./proxy

+ui/node_modules:
+	cd ui && npm install
+
 # build react UI
-ui:
+ui: ui/node_modules
 	cd ui && npm run build
+
 # Build OSX binary
 mac: ui
 	@echo "Building Mac binary..."
@@ -22,6 +22,7 @@ Written in golang, it is very easy to install (single binary with no dependencie
  - `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
  - `v1/audio/transcriptions` ([docs](https://github.com/mostlygeek/llama-swap/issues/41#issuecomment-2722637867))
 - ✅ llama-swap custom API endpoints
+  - `/ui` - web UI
  - `/log` - remote log monitoring
  - `/upstream/:model_id` - direct access to upstream HTTP server ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
  - `/unload` - manually unload running models ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
@@ -40,36 +41,40 @@ In the most basic configuration llama-swap handles one model at a time. For more

 ## config.yaml

-llama-swap's configuration is purposefully simple:
+llama-swap is managed entirely through a YAML configuration file.
+
+A minimal configuration is enough to start:

 ```yaml
 models:
  "qwen2.5":
    cmd: |
-      /app/llama-server
+      /path/to/llama-server
      -hf bartowski/Qwen2.5-0.5B-Instruct-GGUF:Q4_K_M
      --port ${PORT}
-
-  "smollm2":
-    cmd: |
-      /app/llama-server
-      -hf bartowski/SmolLM2-135M-Instruct-GGUF:Q4_K_M
-      --port ${PORT}
 ```

-.. but also supports many advanced features:
+However, llama-swap supports many more capabilities:

 - `groups` to run multiple models at once
-- `macros` for reusable snippets
 - `ttl` to automatically unload models
+- `macros` for reusable snippets
 - `aliases` to use familiar model names (e.g., "gpt-4o-mini")
-- `env` variables to pass custom environment to inference servers
+- `env` to pass custom environment variables to inference servers
+- `cmdStop` to gracefully stop Docker/Podman containers
 - `useModelName` to override model names sent to upstream servers
 - `healthCheckTimeout` to control model startup wait times
 - `${PORT}` automatic port variables for dynamic port assignment
-- `cmdStop` for to gracefully stop Docker/Podman containers

-Check the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration) in the wiki for all options.
+See the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration) in the wiki for all options and examples.
+
+## Web UI
+
+llama-swap ships with a web-based interface that makes it easier to monitor logs and check the status of models.
+
+<img width="1854" alt="image" src="https://github.com/user-attachments/assets/ee0025f0-f031-4158-9b5d-cd98b2b9fe4d" />
+
+

 ## Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))

@@ -120,11 +125,11 @@ $ docker run -it --rm --runtime nvidia -p 9292:8080 \

 ## Bare metal Install ([download](https://github.com/mostlygeek/llama-swap/releases))

-Pre-built binaries are available for Linux, FreeBSD and Darwin (OSX). These are automatically published and are likely a few hours ahead of the docker releases. The baremetal install works with any OpenAI compatible server, not just llama-server.
+Pre-built binaries are available for Linux, Mac, Windows and FreeBSD. These are automatically published and are likely a few hours ahead of the Docker releases. The bare metal install works with any OpenAI-compatible server, not just llama-server.

-1. Create a configuration file, see [config.example.yaml](config.example.yaml)
 1. Download a [release](https://github.com/mostlygeek/llama-swap/releases) appropriate for your OS and architecture.
-1. Run the binary with `llama-swap --config path/to/config.yaml`.
+1. Create a configuration file, see the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration).
+1. Run the binary with `llama-swap --config path/to/config.yaml --listen localhost:8080`.
   Available flags:
   - `--config`: Path to the configuration file (default: `config.yaml`).
   - `--listen`: Address and port to listen on (default: `:8080`).
@@ -133,16 +138,16 @@ Pre-built binaries are available for Linux, FreeBSD and Darwin (OSX). These are

 ### Building from source

-1. Install golang for your system
+1. Building requires golang, plus nodejs for the user interface.
 1. `git clone git@github.com:mostlygeek/llama-swap.git`
 1. `make clean all`
 1. Binaries will be in `build/` subdirectory

 ## Monitoring Logs

-Open the `http://<host>/logs` with your browser to get a web interface with streaming logs.
+Open `http://<host>:<port>/` in your browser to get a web interface with streaming logs.

-Of course, CLI access is also supported:
+CLI access is also supported:

 ```shell
 # sends up to the last 10KB of logs
@@ -189,17 +189,19 @@ func (p *Process) start() error {
 	p.waitStarting.Add(1)
 	defer p.waitStarting.Done()
 	cmdContext, ctxCancelUpstream := context.WithCancel(context.Background())
+
 	p.cmd = exec.CommandContext(cmdContext, args[0], args[1:]...)
 	p.cmd.Stdout = p.processLogger
 	p.cmd.Stderr = p.processLogger
-	p.cmd.Env = p.config.Env
-
+	p.cmd.Env = append(p.cmd.Environ(), p.config.Env...)
 	p.cmd.Cancel = p.cmdStopUpstreamProcess
 	p.cmd.WaitDelay = p.gracefulStopTimeout
 	p.cancelUpstream = ctxCancelUpstream
 	p.cmdWaitChan = make(chan struct{})

 	p.failedStartCount++ // this will be reset to zero when the process has successfully started
+
+	p.proxyLogger.Debugf("<%s> Executing start command: %s, env: %s", p.ID, strings.Join(args, " "), strings.Join(p.config.Env, ", "))
 	err = p.cmd.Start()

 	// Set process state to failed
@@ -530,7 +532,7 @@ func (p *Process) cmdStopUpstreamProcess() error {
 		stopCmd := exec.Command(stopArgs[0], stopArgs[1:]...)
 		stopCmd.Stdout = p.processLogger
 		stopCmd.Stderr = p.processLogger
-		stopCmd.Env = p.config.Env
+		stopCmd.Env = p.cmd.Env

 		if err := stopCmd.Run(); err != nil {
 			p.proxyLogger.Errorf("<%s> Failed to exec stop command: %v", p.ID, err)
@@ -394,6 +394,9 @@ func TestProcess_StopImmediately(t *testing.T) {
 // Test that SIGKILL is sent when gracefulStopTimeout is reached and properly terminates
 // the upstream command
 func TestProcess_ForceStopWithKill(t *testing.T) {
+	if runtime.GOOS == "windows" {
+		t.Skip("skipping SIGTERM test on Windows")
+	}

 	expectedMessage := "test_sigkill"
 	binaryPath := getSimpleResponderPath()
@@ -405,7 +408,6 @@ func TestProcess_ForceStopWithKill(t *testing.T) {
 		Cmd:           fmt.Sprintf("%s --port %d --respond %s --silent --ignore-sig-term", binaryPath, port, expectedMessage),
 		Proxy:         fmt.Sprintf("http://127.0.0.1:%d", port),
 		CheckEndpoint: "/health",
-		CmdStop:       "taskkill /f /t /pid ${PID}",
 	}

 	process := NewProcess("stop_immediate", 2, config, debugLogger, debugLogger)
@@ -465,3 +467,27 @@ func TestProcess_StopCmd(t *testing.T) {
 	process.StopImmediately()
 	assert.Equal(t, process.CurrentState(), StateStopped)
 }
+
+func TestProcess_EnvironmentSetCorrectly(t *testing.T) {
+	expectedMessage := "test_env_not_emptied"
+	config := getTestSimpleResponderConfig(expectedMessage)
+
+	// ensure that the default config does not blank out the inherited environment
+	configWEnv := config
+
+	// ensure the additional variables are appended to the process' environment
+	configWEnv.Env = append(configWEnv.Env, "TEST_ENV1=1", "TEST_ENV2=2")
+
+	process1 := NewProcess("env_test", 2, config, debugLogger, debugLogger)
+	process2 := NewProcess("env_test", 2, configWEnv, debugLogger, debugLogger)
+
+	process1.start()
+	defer process1.Stop()
+	process2.start()
+	defer process2.Stop()
+
+	assert.NotZero(t, len(process1.cmd.Environ()))
+	assert.NotZero(t, len(process2.cmd.Environ()))
+	assert.Equal(t, len(process1.cmd.Environ())+2, len(process2.cmd.Environ()), "process2 should have 2 more environment variables than process1")
+
+}
@@ -12,6 +12,7 @@ interface APIProviderType {
  models: Model[];
  listModels: () => Promise<Model[]>;
  unloadAllModels: () => Promise<void>;
+  loadModel: (model: string) => Promise<void>;
  enableProxyLogs: (enabled: boolean) => void;
  enableUpstreamLogs: (enabled: boolean) => void;
  enableModelUpdates: (enabled: boolean) => void;
@@ -139,11 +140,26 @@ export function APIProvider({ children }: APIProviderProps) {
    }
  }, []);

+  const loadModel = useCallback(async (model: string) => {
+    try {
+      const response = await fetch(`/upstream/${model}/`, {
+        method: "GET",
+      });
+      if (!response.ok) {
+        throw new Error(`Failed to load model: ${response.status}`);
+      }
+    } catch (error) {
+      console.error("Failed to load model:", error);
+      throw error; // Re-throw to let calling code handle it
+    }
+  }, []);
+
  const value = useMemo(
    () => ({
      models,
      listModels,
      unloadAllModels,
+      loadModel,
      enableProxyLogs,
      enableUpstreamLogs,
      enableModelUpdates,
@@ -154,6 +170,7 @@ export function APIProvider({ children }: APIProviderProps) {
      models,
      listModels,
      unloadAllModels,
+      loadModel,
      enableProxyLogs,
      enableUpstreamLogs,
      enableModelUpdates,
@@ -143,6 +143,10 @@
    @apply bg-surface p-2 px-4 text-sm rounded-full border border-2 transition-colors duration-200 border-btn-border;
  }

+  .btn:hover {
+    cursor: pointer;
+  }
+
  .btn--sm {
    @apply px-2 py-0.5 text-xs;
  }
@@ -3,7 +3,7 @@ import { useAPI } from "../contexts/APIProvider";
 import { LogPanel } from "./LogViewer";

 export default function ModelsPage() {
-  const { models, enableModelUpdates, unloadAllModels, upstreamLogs, enableUpstreamLogs } = useAPI();
+  const { models, enableModelUpdates, unloadAllModels, loadModel, upstreamLogs, enableUpstreamLogs } = useAPI();
  const [isUnloading, setIsUnloading] = useState(false);

  useEffect(() => {
@@ -43,6 +43,7 @@ export default function ModelsPage() {
              <thead>
                <tr className="border-b border-primary">
                  <th className="text-left p-2">Name</th>
+                  <th className="text-left p-2"></th>
                  <th className="text-left p-2">State</th>
                </tr>
              </thead>
@@ -50,10 +51,13 @@ export default function ModelsPage() {
                {models.map((model) => (
                  <tr key={model.id} className="border-b hover:bg-secondary-hover border-border">
                    <td className="p-2">
-                      <a href={`/upstream/${model.id}/`} className="underline" target="top">
+                      <a href={`/upstream/${model.id}/`} className="underline" target="_blank">
                        {model.id}
                      </a>
                    </td>
+                    <td className="p-2">
+                      <button className="btn btn--sm" disabled={model.state !== "stopped"} onClick={() => loadModel(model.id)}>Load</button>
+                    </td>
                    <td className="p-2">
                      <span className={`status status--${model.state}`}>{model.state}</span>
                    </td>
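
The `loadModel` callback in this diff loads a model by issuing a `GET` to `/upstream/<model>/` and treating any non-2xx status as a failure. The same request can be sketched outside the browser in Go. This is a minimal sketch, not llama-swap's own code: the `loadModel`, `newFakeServer` and `demoLoad` names are hypothetical, and a local `httptest` server stands in for a running llama-swap instance.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// loadModel mirrors the UI callback: GET /upstream/<model>/ and report an
// error unless the response status is in the 2xx range.
// Assumption: the model ID is usable as a URL path segment without escaping.
func loadModel(client *http.Client, baseURL, modelID string) error {
	resp, err := client.Get(baseURL + "/upstream/" + modelID + "/")
	if err != nil {
		return fmt.Errorf("failed to load model: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("failed to load model: %d", resp.StatusCode)
	}
	return nil
}

// newFakeServer is a stand-in for llama-swap (hypothetical): it answers 200
// for one known model and 404 for every other path.
func newFakeServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/upstream/qwen2.5/", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

// demoLoad requests a known and an unknown model and returns both results.
func demoLoad() (knownErr, unknownErr error) {
	srv := newFakeServer()
	defer srv.Close()
	knownErr = loadModel(srv.Client(), srv.URL, "qwen2.5")
	unknownErr = loadModel(srv.Client(), srv.URL, "unknown")
	return knownErr, unknownErr
}

func main() {
	knownErr, unknownErr := demoLoad()
	fmt.Println("known model error:", knownErr)
	fmt.Println("unknown model error:", unknownErr)
}
```

Returning the error to the caller matches the diff's choice to re-throw after logging, so the calling code decides how to surface a failed load.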
Author	SHA1	Message	Date
Alex O'Connell	756193d0dd	Load models in the UI without navigating the page (#173) * Load models in the UI without navigating the page * fix table layout for mobile	2025-06-19 14:39:07 -07:00
Benson Wong	a6b2e930d8	Update README.md [skip ci]	2025-06-18 11:47:08 -07:00
Benson Wong	9e02c22ff8	stopCmd should use same environment as p.cmd.Env (#171, #172)	2025-06-18 11:36:59 -07:00
Benson Wong	0bdbf2fdc1	fix more goreleaser deprecation warnings [skip ci]	2025-06-18 11:15:12 -07:00
Benson Wong	49035e2e8e	Append custom env vars instead of replace in Process (#171) Append custom env vars instead of replace in Process (#168, #169). PR #162 refactored the default configuration code. This introduced a subtle bug where `env` became `[]string{}` instead of the default of `nil`. In golang, `exec.Cmd.Env == nil` means to use the "current process's environment". By setting it to `[]string{}` as a default the Process's environment was emptied out which caused an array of strange and difficult to troubleshoot behaviour. See issues #168 and #169. This commit changes the behaviour to append model configured environment variables to the default list rather than replace them.	2025-06-18 11:09:13 -07:00
Benson Wong	9963ae18bf	fix? deprecation warning in .goreleaser.yaml [skip-ci]	2025-06-18 07:49:33 -07:00
Benson Wong	2ae48c713b	add debug output for start command	2025-06-18 07:43:23 -07:00
Benson Wong	54c519e365	update Makefile to install ui deps	2025-06-17 09:54:01 -07:00
Benson Wong	3fce9ee0e9	Update README.md [skip ci]	2025-06-17 09:53:22 -07:00
Benson Wong	5899ae7966	Update README.md [skip ci]	2025-06-17 09:52:47 -07:00
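
The message for commit 49035e2e8e describes how a `nil` `exec.Cmd.Env` inherits the current process's environment while an empty, non-nil slice does not. The sketch below illustrates that difference and the appending approach from the diff (`append(p.cmd.Environ(), p.config.Env...)`). It is a standalone illustration rather than llama-swap code: `environFor`, `has`, `demo`, the `placeholder` command name and the `PARENT_MARKER` variable are all hypothetical, and no process is actually started. It assumes Go 1.19 or later, where `(*exec.Cmd).Environ` is available.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// environFor reports the environment a command would run with for a given
// Env value. The command is never started; "placeholder" is a made-up name.
func environFor(env []string) []string {
	cmd := exec.Command("placeholder")
	cmd.Env = env
	return cmd.Environ()
}

// has reports whether env contains the exact KEY=VALUE entry kv.
func has(env []string, kv string) bool {
	for _, e := range env {
		if e == kv {
			return true
		}
	}
	return false
}

// demo sets a marker variable in the parent process and checks which
// Env configurations let a child command see it.
func demo() (nilInherits, emptyInherits, mergedHasParent, mergedHasExtra bool) {
	os.Setenv("PARENT_MARKER", "1") // hypothetical variable for the demo
	defer os.Unsetenv("PARENT_MARKER")

	// nil Env: the command would use the current process's environment.
	nilInherits = has(environFor(nil), "PARENT_MARKER=1")

	// Empty, non-nil Env: the inherited environment is dropped.
	emptyInherits = has(environFor([]string{}), "PARENT_MARKER=1")

	// Appending: start from the inherited environment, then add extras.
	merged := environFor(append(environFor(nil), "TEST_ENV1=1"))
	mergedHasParent = has(merged, "PARENT_MARKER=1")
	mergedHasExtra = has(merged, "TEST_ENV1=1")
	return
}

func main() {
	nilInherits, emptyInherits, mergedHasParent, mergedHasExtra := demo()
	fmt.Println("nil Env sees parent variable:", nilInherits)
	fmt.Println("empty Env sees parent variable:", emptyInherits)
	fmt.Println("appended Env sees parent variable:", mergedHasParent)
	fmt.Println("appended Env sees extra variable:", mergedHasExtra)
}
```

Appending to the inherited list keeps variables such as `PATH` available to the upstream server while still letting a model's `env` entries add to them.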