UI improvements (#213)

* use two column for logs view on wider screens
* hide log controls when panel is minimized
Update docs in Readme [skip ci]
6 changed files with 127 additions and 40 deletions
@@ -7,6 +7,10 @@ on:

  # Allows manual triggering of the workflow
  workflow_dispatch:
+    inputs:
+      tag:
+        description: 'Tag version to release (e.g. v144)'
+        required: true

 permissions:
  contents: write
@@ -20,15 +24,15 @@ jobs:
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
+          ref: ${{ github.event.inputs.tag || github.ref }}
      -
        name: Set up Go
        uses: actions/setup-go@v5
-
      -
        name: Set up Node.js
        uses: actions/setup-node@v4
        with:
-          node-version: '23'  # or your preferred version
+          node-version: '23'
      -
        name: Install dependencies and build UI
        run: |
@@ -46,4 +50,30 @@ jobs:
          version: '~> v2'
          args: release --clean
        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
+  trigger-tap-update:
+    runs-on: ubuntu-latest
+    needs: goreleaser
+    steps:
+      - name: "Resolve tag to dispatch"
+        id: tag
+        run: |
+          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
+            echo "tag=${{ github.event.inputs.tag }}" >> "$GITHUB_OUTPUT"
+          else
+            echo "tag=${{ github.ref_name }}" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: "Trigger tap repository update"
+        uses: peter-evans/repository-dispatch@v2
+        with:
+          token: ${{ secrets.TAP_REPO_PAT }}
+          repository: mostlygeek/homebrew-llama-swap
+          event-type: new-release
+          client-payload: |
+            {
+              "release": {
+                "tag_name": "${{ steps.tag.outputs.tag }}"
+              }
+            }
@@ -18,7 +18,7 @@ Written in golang, it is very easy to install (single binary with no dependencie
  - `v1/completions`
  - `v1/chat/completions`
  - `v1/embeddings`
-  - `v1/rerank`
+  - `v1/rerank`, `v1/reranking`, `rerank`
  - `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
  - `v1/audio/transcriptions` ([docs](https://github.com/mostlygeek/llama-swap/issues/41#issuecomment-2722637867))
 - ✅ llama-swap custom API endpoints
@@ -27,6 +27,7 @@ Written in golang, it is very easy to install (single binary with no dependencie
  - `/upstream/:model_id` - direct access to upstream HTTP server ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
  - `/unload` - manually unload running models ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
  - `/running` - list currently running models ([#61](https://github.com/mostlygeek/llama-swap/issues/61))
+  - `/health` - just returns "OK"
 - ✅ Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
 - ✅ Automatic unloading of models after timeout by setting a `ttl`
 - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
@@ -74,10 +75,18 @@ llama-swap ships with a real time web interface to monitor logs and status of mo

 <img width="1786" height="1334" alt="image" src="https://github.com/user-attachments/assets/d6258cb9-1dad-40db-828f-2be860aec8fe" />

+## Installation

-## Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
+llama-swap can be installed in multiple ways:

-Docker is the quickest way to try out llama-swap:
+1. Docker
+2. Homebrew (macOS and Linux)
+3. From release binaries
+4. From source
+
+### Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
+
+Docker images with llama-swap and llama-server are built nightly. 

 ```shell
 # use CPU inference comes with the example config above
@@ -99,7 +108,7 @@ $ curl -s http://localhost:9292/v1/chat/completions \
 ```

 <details>
-<summary>Docker images are built nightly for cuda, intel, vulcan, etc ...</summary>
+<summary>Docker images are built nightly with llama-server for cuda, intel, vulkan and musa.</summary>

 They include:

@@ -122,9 +131,23 @@ $ docker run -it --rm --runtime nvidia -p 9292:8080 \

 </details>

-## Bare metal Install ([download](https://github.com/mostlygeek/llama-swap/releases))
+### Homebrew Install (macOS/Linux)

-Pre-built binaries are available for Linux, Mac, Windows and FreeBSD. These are automatically published and are likely a few hours ahead of the docker releases. The baremetal install works with any OpenAI compatible server, not just llama-server.
+The latest release of `llama-swap` can be installed via [Homebrew](https://brew.sh). 
+
+```shell
+# Set up tap and install formula 
+brew tap mostlygeek/llama-swap
+brew install llama-swap
+# Run llama-swap
+llama-swap --config path/to/config.yaml --listen localhost:8080
+```
+
+This will install the `llama-swap` binary and make it available in your `PATH`. See the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration).
+
+### Pre-built Binaries ([download](https://github.com/mostlygeek/llama-swap/releases))
+
+Binaries are available for Linux, Mac, Windows and FreeBSD. These are automatically published and are likely a few hours ahead of the docker releases. The binary install works with any OpenAI compatible server, not just llama-server.

 1. Download a [release](https://github.com/mostlygeek/llama-swap/releases) appropriate for your OS and architecture.
 1. Create a configuration file, see the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration).
@@ -17,6 +17,7 @@ func MetricsMiddleware(pm *ProxyManager) gin.HandlerFunc {
 		bodyBytes, err := io.ReadAll(c.Request.Body)
 		if err != nil {
 			pm.sendErrorResponse(c, http.StatusBadRequest, "could not read request body")
+			c.Abort()
 			return
 		}
 		c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
@@ -24,15 +25,16 @@ func MetricsMiddleware(pm *ProxyManager) gin.HandlerFunc {
 		requestedModel := gjson.GetBytes(bodyBytes, "model").String()
 		if requestedModel == "" {
 			pm.sendErrorResponse(c, http.StatusBadRequest, "missing or invalid 'model' key")
+			c.Abort()
 			return
 		}

 		realModelName, found := pm.config.RealModelName(requestedModel)
 		if !found {
 			pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("could not find real modelID for %s", requestedModel))
+			c.Abort()
 			return
 		}
-		c.Set("ls-real-model-name", realModelName)

 		writer := &MetricsResponseWriter{
 			ResponseWriter: c.Writer,
@@ -14,6 +14,7 @@ import (
 	"time"

 	"github.com/gin-gonic/gin"
+	"github.com/tidwall/gjson"
 	"github.com/tidwall/sjson"
 )

@@ -160,8 +161,10 @@ func (pm *ProxyManager) setupGinEngine() {
 	pm.ginEngine.POST("/v1/completions", mm, pm.proxyOAIHandler)

 	// Support embeddings
-	pm.ginEngine.POST("/v1/embeddings", pm.proxyOAIHandler)
-	pm.ginEngine.POST("/v1/rerank", pm.proxyOAIHandler)
+	pm.ginEngine.POST("/v1/embeddings", mm, pm.proxyOAIHandler)
+	pm.ginEngine.POST("/v1/rerank", mm, pm.proxyOAIHandler)
+	pm.ginEngine.POST("/v1/reranking", mm, pm.proxyOAIHandler)
+	pm.ginEngine.POST("/rerank", mm, pm.proxyOAIHandler)

 	// Support audio/speech endpoint
 	pm.ginEngine.POST("/v1/audio/speech", pm.proxyOAIHandler)
@@ -188,6 +191,9 @@ func (pm *ProxyManager) setupGinEngine() {

 	pm.ginEngine.GET("/unload", pm.unloadAllModelsHandler)
 	pm.ginEngine.GET("/running", pm.listRunningProcessesHandler)
+	pm.ginEngine.GET("/health", func(c *gin.Context) {
+		c.String(http.StatusOK, "OK")
+	})

 	pm.ginEngine.GET("/favicon.ico", func(c *gin.Context) {
 		if data, err := reactStaticFS.ReadFile("ui_dist/favicon.ico"); err == nil {
@@ -365,9 +371,15 @@ func (pm *ProxyManager) proxyOAIHandler(c *gin.Context) {
 		return
 	}

-	realModelName := c.GetString("ls-real-model-name") // Should be set in MetricsMiddleware
-	if realModelName == "" {
-		pm.sendErrorResponse(c, http.StatusInternalServerError, "ls-real-model-name not set")
+	requestedModel := gjson.GetBytes(bodyBytes, "model").String()
+	if requestedModel == "" {
+		pm.sendErrorResponse(c, http.StatusBadRequest, "missing or invalid 'model' key")
+		return
+	}
+
+	realModelName, found := pm.config.RealModelName(requestedModel)
+	if !found {
+		pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("could not find real modelID for %s", requestedModel))
 		return
 	}

@@ -755,3 +755,21 @@ func TestProxyManager_MiddlewareWritesMetrics_Streaming(t *testing.T) {
 	assert.Greater(t, lastMetric.TokensPerSecond, 0.0, "tokens per second should be greater than 0")
 	assert.Greater(t, lastMetric.DurationMs, 0, "duration should be greater than 0")
 }
+
+func TestProxyManager_HealthEndpoint(t *testing.T) {
+	config := AddDefaultGroupToConfig(Config{
+		HealthCheckTimeout: 15,
+		Models: map[string]ModelConfig{
+			"model1": getTestSimpleResponderConfig("model1"),
+		},
+		LogLevel: "error",
+	})
+
+	proxy := New(config)
+	defer proxy.StopProcesses(StopWaitForInflightRequest)
+	req := httptest.NewRequest("GET", "/health", nil)
+	rec := httptest.NewRecorder()
+	proxy.ServeHTTP(rec, req)
+	assert.Equal(t, http.StatusOK, rec.Code)
+	assert.Equal(t, "OK", rec.Body.String())
+}
@@ -6,7 +6,7 @@ const LogViewer = () => {
  const { proxyLogs, upstreamLogs } = useAPI();

  return (
-    <div className="flex flex-col gap-5" style={{ height: "calc(100vh - 125px)" }}>
+    <div className="flex flex-col lg:flex-row gap-5" style={{ height: "calc(100vh - 125px)" }}>
      <LogPanel id="proxy" title="Proxy Logs" logData={proxyLogs} />
      <LogPanel id="upstream" title="Upstream Logs" logData={upstreamLogs} />
    </div>
@@ -90,34 +90,36 @@ export const LogPanel = ({ id, title, logData, className }: LogPanelProps) => {
        <div className="flex flex-col md:flex-row md:items-center md:justify-between gap-4">
          {/* Title - Always full width on mobile, normal on desktop */}
          <div className="w-full md:w-auto" onClick={() => setIsCollapsed(!isCollapsed)}>
-            <h3 className="m-0 text-lg">{title}</h3>
+            <h3 className="m-0 text-lg p-0">{title}</h3>
          </div>

-          <div className="flex flex-col sm:flex-row gap-4 w-full md:w-auto">
-            {/* Sizing Buttons - Stacks vertically on mobile */}
-            <div className="flex flex-wrap gap-2">
-              <button className="btn" onClick={toggleFontSize}>
-                font: {fontSize}
-              </button>
-              <button className="btn" onClick={() => setTextWrap((prev) => !prev)}>
-                {wrapText ? "wrap" : "wrap off"}
-              </button>
-            </div>
+          {!isCollapsed && (
+            <div className="flex flex-col sm:flex-row gap-4 w-full md:w-auto">
+              {/* Sizing Buttons - Stacks vertically on mobile */}
+              <div className="flex flex-wrap gap-2">
+                <button className="btn" onClick={toggleFontSize}>
+                  font: {fontSize}
+                </button>
+                <button className="btn" onClick={() => setTextWrap((prev) => !prev)}>
+                  {wrapText ? "wrap" : "wrap off"}
+                </button>
+              </div>

-            {/* Filtering Options - Full width on mobile, normal on desktop */}
-            <div className="flex flex-1 min-w-0 gap-2">
-              <input
-                type="text"
-                className="flex-1 min-w-[120px] text-sm border p-2 rounded"
-                placeholder="Filter logs..."
-                value={filterRegex}
-                onChange={(e) => setFilterRegex(e.target.value)}
-              />
-              <button className="btn" onClick={() => setFilterRegex("")}>
-                Clear
-              </button>
+              {/* Filtering Options - Full width on mobile, normal on desktop */}
+              <div className="flex flex-1 min-w-0 gap-2">
+                <input
+                  type="text"
+                  className="flex-1 min-w-[120px] text-sm border p-2 rounded"
+                  placeholder="Filter logs..."
+                  value={filterRegex}
+                  onChange={(e) => setFilterRegex(e.target.value)}
+                />
+                <button className="btn" onClick={() => setFilterRegex("")}>
+                  Clear
+                </button>
+              </div>
            </div>
-          </div>
+          )}
        </div>
      </div>
Author	SHA1	Message	Date
Benson Wong	574fdfabb4	UI improvements (#213) * use two column for logs view on wider screens * hide log controls when panel is minimized	2025-07-31 11:59:21 -07:00
Benson Wong	5172cb2e12	Update docs in Readme [skip ci]	2025-07-30 11:51:14 -07:00
Benson Wong	5672cb03fd	Update github actions for notifying homebrew build (#212) Combine homebrew-llama-swap event with the release action	2025-07-30 11:29:03 -07:00
Benson Wong	0f583163f7	add /health (#211)	2025-07-30 10:37:10 -07:00
Benson Wong	7905fa9ea3	Update trigger-homebrew-update.yml [skip ci]	2025-07-30 10:13:49 -07:00
Ian Sebastian Mathew	bbaf172956	add trigger to rebuild homebrew formula (#210)	2025-07-30 10:12:21 -07:00
Benson Wong	fd50932dbc	Decouple MetricsMiddleware from downstream handlers (#206) * Decouple MetricsMiddleware from downstream handlers Remove ls-real-model-name optimization. Within proxyOAIHandler the request body's bytes are required for various rewriting features anyways. This negated any benefits from trying not to parse it twice.	2025-07-27 10:36:06 -07:00
Gaël James	8c693e7fcf	Add endpoint aliases for reranking models (#201) * Add endpoint aliases for reranking models * Add MetricsMiddleware to the previous reranking endpoint * Fix the embeddings endpoint not having model set	2025-07-24 08:32:47 -07:00