feat(llamaswap): add llama-swap provider + canonical imagegen interface
CI / Tidy (pull_request) Successful in 9m25s
CI / Build & Test (pull_request) Successful in 10m15s

Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping
proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to
provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with
legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and
a timeout-free client so cold model swaps rely on context deadlines. The
"tailored" surface is concrete management methods (ListModels / Running /
Unload) that don't belong on the canonical llm.Provider interface. The
llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL
built-in errors clearly on use, mirroring foreman.

Add imagegen, a new canonical text-to-image interface separate from llm
(Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed
straight back into chat). First backend is llama-swap via OpenAI
/v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is
txt2img only.

Hermetic httptest coverage for chat delegation, management endpoints, image
decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix +
image-gen section, CLAUDE.md package map, and progress.md updated in the same
commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 15:01:54 -04:00
parent 1fd7109a42
commit 96c612e707
14 changed files with 994 additions and 7 deletions
+44 -3
@@ -126,6 +126,7 @@ Chains are health-tracked per target:
| Ollama Cloud | `ollama-cloud` | `OLLAMA_API_KEY` | https://ollama.com |
| Ollama (local) | `ollama` | — | `OLLAMA_HOST` or http://localhost:11434 |
| foreman | `foreman` | — (token via DSN) | requires an LLM_* DSN or `ollama.Foreman(url, token)` |
| llama-swap | `llama-swap` | — (token via DSN) | requires an LLM_* DSN or `llamaswap.New(...)` |
OpenAI-compatible / Anthropic-compatible endpoints: construct the provider
with a name and base URL and register it —
@@ -158,12 +159,24 @@ m, _ := reg.Parse("m5/qwen3:30b,m1/qwen3:30b,thinking")
```
DSN format: `scheme://[token@]host[/path]`, scheme ∈ `foreman`, `ollama`,
`ollama-cloud`, `openai`, `anthropic`, `google`/`gemini`, or any scheme you
add with `RegisterScheme`. The token is the credential (bearer token / API
key); the base URL is always `https://host[/path]`. `New()` loads `LLM_*`
`ollama-cloud`, `openai`, `anthropic`, `google`/`gemini`, `llama-swap`, or any
scheme you add with `RegisterScheme`. The token is the credential (bearer token
/ API key); the base URL is always `https://host[/path]` — except `llama-swap`,
which builds `http://host[:port]` since it's local-first. `New()` loads `LLM_*`
vars eagerly; unknown provider names also resolve lazily at Parse time
(`my-prov/x` → `LLM_MY_PROV`).
```
LLM_LS=llama-swap://token@box.local:8080 # then "ls/qwen3:14b" parses
```
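As a rough illustration of the DSN rule above, the sketch below splits a DSN into a base URL and a token using only the standard library. `baseURLFromDSN` is a hypothetical helper written for this example, not majordomo's actual implementation, and it assumes every scheme other than `llama-swap` maps to `https`.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// baseURLFromDSN is a hypothetical helper, not majordomo's real code.
// It splits scheme://[token@]host[/path] into a base URL and a token,
// choosing http for llama-swap and https for every other scheme.
func baseURLFromDSN(dsn string) (base, token string, err error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", "", err
	}
	if u.Host == "" {
		return "", "", errors.New("dsn has no host")
	}
	httpScheme := "https"
	if u.Scheme == "llama-swap" {
		httpScheme = "http" // local-first: plain HTTP by default
	}
	if u.User != nil {
		token = u.User.Username() // the userinfo part carries the credential
	}
	return httpScheme + "://" + u.Host + u.Path, token, nil
}

func main() {
	base, token, err := baseURLFromDSN("llama-swap://token@box.local:8080")
	if err != nil {
		panic(err)
	}
	fmt.Println(base, token)
}
```

The port stays part of the host, so `box.local:8080` passes through unchanged; only the scheme is rewritten.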
[llama-swap](https://github.com/mostlygeek/llama-swap) is a model-swapping proxy
over llama.cpp. Its chat API is OpenAI-compatible (majordomo reuses the openai
client), and the `*llamaswap.Provider` adds management methods
(`ListModels`/`Running`/`Unload`) plus image generation (see below). A cold
model swap can take many seconds — bound calls with a context deadline, not a
client timeout.
### Custom providers
Implement the two-method `Provider` interface and register it:
@@ -191,6 +204,27 @@ resp, err := m.Generate(ctx, majordomo.Request{
})
```
## Image generation
Text-to-image is a separate contract (`imagegen`) from chat, because it shares
none of the message/tool/stream machinery. Generated images come back as
`llm.ImagePart`, so they drop straight back into a chat turn. The first backend
is llama-swap (OpenAI `/v1/images/generations` → a stable-diffusion.cpp
upstream).
```go
ls := llamaswap.New(llamaswap.WithBaseURL("http://box.local:8080"))
im, _ := ls.ImageModel("sd-xl")
res, err := im.Generate(ctx, imagegen.Request{Prompt: "a red bicycle"},
imagegen.WithSize("1024x1024"))
// res.Images[0] is an llm.ImagePart (bytes + MIME) — feed it back into chat:
// majordomo.UserParts(majordomo.Text("describe this"), res.Images[0])
```
`*llamaswap.Provider` also exposes management methods: `ListModels` (what
llama-swap can serve), `Running` (what's loaded), and `Unload` (free a model).
## Tool calls
```go
@@ -312,12 +346,19 @@ to build one.
| Ollama Cloud | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Ollama (local) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| foreman | ✅ | ✅ | ✅¹ | ✅ | ✅ | ✅ | ✅ |
| llama-swap | ✅ | ✅ | ✅ | ✅² | ✅² | ✅² | ✅ |
| fake (testing) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |
¹ foreman's daemon currently buffers sync chat responses (no token-by-token
streaming); majordomo's stream API works against it and delivers the
response as a single delta plus final event.
² llama-swap's chat is OpenAI-compatible and reuses the openai client, so these
capabilities are present at the client level; whether a given call succeeds
depends on the llama.cpp model llama-swap loads. llama-swap also provides
**image generation** (a separate `imagegen` axis, not shown above) and
management methods on `*llamaswap.Provider`.
Notes: Ollama has no native tool_choice — `"none"` drops the tools;
`"required"`/named choices are best-effort ignored there. Ollama Cloud
ignores the `format` field (verified live), so the provider also states