feat(llamaswap): add llama-swap provider + canonical imagegen interface

Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to provider/openai at {base}/v1, so there is no duplicated wire client (ADR-0007). It sends the legacy max_tokens field, sends a placeholder Bearer token for keyless local instances, and uses an HTTP client with no timeout, so cold model swaps are bounded by the caller's context deadline instead. The "tailored" surface is a set of concrete management methods (ListModels / Running / Unload) that don't belong on the canonical llm.Provider interface. The llama-swap:// DSN scheme builds an http base URL (local-first); the built-in llama-swap provider, which has no URL, returns a clear error when used, mirroring foreman.

Add imagegen, a new canonical text-to-image interface separate from llm (Request/Result/Model/Provider; Image = llm.ImagePart, so generated images feed straight back into chat). The first backend is llama-swap via the OpenAI /v1/images/generations endpoint (b64_json, bytes-only). imagegen is re-exported from the root package. v1 supports txt2img only.

Tests: hermetic httptest coverage for chat delegation, management endpoints, image decode, and scheme wiring. Docs: ADR-0015 + ADR-0016, the README support matrix and image-gen section, the CLAUDE.md package map, and progress.md are updated in the same commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 15:01:54 -04:00
parent 1fd7109a42
commit 96c612e707
14 changed files with 994 additions and 7 deletions
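This commit's one change to the DSN rules is that `llama-swap://` builds a plain `http` base URL, while every other scheme builds `https://host[/path]`. The sketch below shows that mapping using only the standard library's `net/url`, under stated assumptions: `baseURLFromDSN` is a hypothetical helper, not majordomo's API, and dropping any path for `llama-swap` is inferred from the README's `http://host[:port]` wording rather than confirmed from the source.

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURLFromDSN is a hypothetical helper (not majordomo's API) illustrating
// the documented mapping from scheme://[token@]host[/path] to a credential
// and a base URL: https://host[/path] for most schemes, and
// http://host[:port] for llama-swap, which is local-first.
func baseURLFromDSN(dsn string) (token, base string, err error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", "", err
	}
	if u.Host == "" {
		return "", "", fmt.Errorf("dsn %q: missing host", dsn)
	}
	if u.User != nil {
		// The credential (bearer token / API key) sits in the userinfo slot.
		token = u.User.Username()
	}
	if u.Scheme == "llama-swap" {
		// Assumption: keep host and optional port only; any path is dropped.
		return token, "http://" + u.Host, nil
	}
	return token, "https://" + u.Host + u.Path, nil
}

func main() {
	token, base, err := baseURLFromDSN("llama-swap://token@box.local:8080")
	if err != nil {
		panic(err)
	}
	// Chat would then target the OpenAI-compatible API under {base}/v1.
	fmt.Println(token, base, base+"/v1")
	// prints: token http://box.local:8080 http://box.local:8080/v1
}
```

Appending `/v1` mirrors the commit message's note that chat delegates to the openai client at `{base}/v1`; the real provider may assemble that URL differently.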
@@ -126,6 +126,7 @@ Chains are health-tracked per target:
 | Ollama Cloud | `ollama-cloud` | `OLLAMA_API_KEY` | https://ollama.com |
 | Ollama (local) | `ollama` | — | `OLLAMA_HOST` or http://localhost:11434 |
 | foreman | `foreman` | — (token via DSN) | requires an LLM_* DSN or `ollama.Foreman(url, token)` |
+| llama-swap | `llama-swap` | — (token via DSN) | requires an LLM_* DSN or `llamaswap.New(...)` |

 OpenAI-compatible / Anthropic-compatible endpoints: construct the provider
 with a name and base URL and register it —
@@ -158,12 +159,24 @@ m, _ := reg.Parse("m5/qwen3:30b,m1/qwen3:30b,thinking")
 ```

 DSN format: `scheme://[token@]host[/path]`, scheme ∈ `foreman`, `ollama`,
-`ollama-cloud`, `openai`, `anthropic`, `google`/`gemini`, or any scheme you
-add with `RegisterScheme`. The token is the credential (bearer token / API
-key); the base URL is always `https://host[/path]`. `New()` loads `LLM_*`
+`ollama-cloud`, `openai`, `anthropic`, `google`/`gemini`, `llama-swap`, or any
+scheme you add with `RegisterScheme`. The token is the credential (bearer
+token / API key). The base URL is `https://host[/path]` for every scheme except
+`llama-swap`, which builds `http://host[:port]` since it's local-first. `New()` loads `LLM_*`
 vars eagerly; unknown provider names also resolve lazily at Parse time
 (`my-prov/x` → `LLM_MY_PROV`).

+```
+LLM_LS=llama-swap://token@box.local:8080   # then "ls/qwen3:14b" parses
+```
+
+[llama-swap](https://github.com/mostlygeek/llama-swap) is a model-swapping proxy
+over llama.cpp. Its chat API is OpenAI-compatible (majordomo reuses the openai
+client), and the `*llamaswap.Provider` adds management methods
+(`ListModels`/`Running`/`Unload`) plus image generation (see below). A cold
+model swap can take many seconds, so bound calls with a context deadline; the
+provider's HTTP client deliberately sets no timeout.
+
 ### Custom providers

 Implement the two-method `Provider` interface and register it:
@@ -191,6 +204,27 @@ resp, err := m.Generate(ctx, majordomo.Request{
 })
 ```

+## Image generation
+
+Text-to-image has its own contract, `imagegen`, separate from chat because it
+shares none of the message/tool/stream machinery. Generated images come back as
+`llm.ImagePart`, so they drop straight back into a chat turn. The first backend
+is llama-swap (OpenAI `/v1/images/generations` → a stable-diffusion.cpp
+upstream).
+
+```go
+ls := llamaswap.New(llamaswap.WithBaseURL("http://box.local:8080"))
+im, _ := ls.ImageModel("sd-xl")
+
+res, err := im.Generate(ctx, imagegen.Request{Prompt: "a red bicycle"},
+    imagegen.WithSize("1024x1024"))
+// res.Images[0] is an llm.ImagePart (bytes + MIME) — feed it back into chat:
+// majordomo.UserParts(majordomo.Text("describe this"), res.Images[0])
+```
+
+`*llamaswap.Provider` also exposes management methods: `ListModels` (what
+llama-swap can serve), `Running` (what's loaded), and `Unload` (free a model).
+
 ## Tool calls

 ```go
@@ -312,12 +346,19 @@ to build one.
 | Ollama Cloud | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | Ollama (local) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | foreman | ✅ | ✅ | ✅¹ | ✅ | ✅ | ✅ | ✅ |
+| llama-swap | ✅ | ✅ | ✅ | ✅² | ✅² | ✅² | ✅ |
 | fake (testing) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |

 ¹ foreman's daemon currently buffers sync chat responses (no token-by-token
 streaming); majordomo's stream API works against it and delivers the
 response as a single delta plus final event.

+² llama-swap's chat is OpenAI-compatible and reuses the openai client, so these
+capabilities are present at the client level; whether a given call succeeds
+depends on the model that llama-swap loads into llama.cpp. llama-swap also provides
+**image generation** (a separate `imagegen` axis, not shown above) and
+management methods on `*llamaswap.Provider`.
+
 Notes: Ollama has no native tool_choice — `"none"` drops the tools;
 `"required"`/named choices are best-effort ignored there. Ollama Cloud
 ignores the `format` field (verified live), so the provider also states