feat(llamaswap): add llama-swap provider + canonical imagegen interface

Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping
proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to
provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with
legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and
a timeout-free client so cold model swaps rely on context deadlines. The
"tailored" surface is concrete management methods (ListModels / Running /
Unload) that don't belong on the canonical llm.Provider interface. The
llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL
built-in errors clearly on use, mirroring foreman.

Add imagegen, a new canonical text-to-image interface separate from llm
(Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed
straight back into chat). First backend is llama-swap via OpenAI
/v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is
txt2img only.

Hermetic httptest coverage for chat delegation, management endpoints, image
decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix +
image-gen section, CLAUDE.md package map, and progress.md updated in the same
commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
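
The commit message also describes the `llama-swap://` DSN scheme as building an http base URL, while the README diff states that other schemes use https. The rule can be sketched as follows, assuming the DSN shape `scheme://[token@]host[/path]` from the README. `baseURL` is a hypothetical helper built on `net/url`; the real scheme wiring in the repository may differ, for example in how it treats a path.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// baseURL splits a DSN into an HTTP base URL and a token.
// Assumption for illustration: "llama-swap" maps to http, every other
// scheme to https. This is not the library's own parser.
func baseURL(dsn string) (base, token string, err error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", "", err
	}
	if u.Host == "" {
		// Avoid echoing the DSN here, since it may carry a credential.
		return "", "", errors.New("dsn has no host")
	}
	scheme := "https"
	if u.Scheme == "llama-swap" {
		scheme = "http" // local-first, per the commit message
	}
	if u.User != nil {
		token = u.User.Username()
	}
	return scheme + "://" + u.Host + u.Path, token, nil
}

func main() {
	base, token, err := baseURL("llama-swap://token@box.local:8080")
	fmt.Println(base, token, err) // http://box.local:8080 token <nil>

	base, token, err = baseURL("openai://sk@api.example.com/v1")
	fmt.Println(base, token, err) // https://api.example.com/v1 sk <nil>
}
```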
@@ -126,6 +126,7 @@ Chains are health-tracked per target:
 | Ollama Cloud | `ollama-cloud` | `OLLAMA_API_KEY` | https://ollama.com |
 | Ollama (local) | `ollama` | — | `OLLAMA_HOST` or http://localhost:11434 |
 | foreman | `foreman` | — (token via DSN) | requires an LLM_* DSN or `ollama.Foreman(url, token)` |
+| llama-swap | `llama-swap` | — (token via DSN) | requires an LLM_* DSN or `llamaswap.New(...)` |
 
 OpenAI-compatible / Anthropic-compatible endpoints: construct the provider
 with a name and base URL and register it —
@@ -158,12 +159,24 @@ m, _ := reg.Parse("m5/qwen3:30b,m1/qwen3:30b,thinking")
 ```
 
 DSN format: `scheme://[token@]host[/path]`, scheme ∈ `foreman`, `ollama`,
-`ollama-cloud`, `openai`, `anthropic`, `google`/`gemini`, or any scheme you
-add with `RegisterScheme`. The token is the credential (bearer token / API
-key); the base URL is always `https://host[/path]`. `New()` loads `LLM_*`
+`ollama-cloud`, `openai`, `anthropic`, `google`/`gemini`, `llama-swap`, or any
+scheme you add with `RegisterScheme`. The token is the credential (bearer token
+/ API key); the base URL is always `https://host[/path]` — except `llama-swap`,
+which builds `http://host[:port]` since it's local-first. `New()` loads `LLM_*`
 vars eagerly; unknown provider names also resolve lazily at Parse time
 (`my-prov/x` → `LLM_MY_PROV`).
 
+```
+LLM_LS=llama-swap://token@box.local:8080 # then "ls/qwen3:14b" parses
+```
+
+[llama-swap](https://github.com/mostlygeek/llama-swap) is a model-swapping proxy
+over llama.cpp. Its chat API is OpenAI-compatible (majordomo reuses the openai
+client), and the `*llamaswap.Provider` adds management methods
+(`ListModels`/`Running`/`Unload`) plus image generation (see below). A cold
+model swap can take many seconds — bound calls with a context deadline, not a
+client timeout.
+
 ### Custom providers
 
 Implement the two-method `Provider` interface and register it:
@@ -191,6 +204,27 @@ resp, err := m.Generate(ctx, majordomo.Request{
 })
 ```
 
+## Image generation
+
+Text-to-image is a separate contract (`imagegen`) from chat, because it shares
+none of the message/tool/stream machinery. Generated images come back as
+`llm.ImagePart`, so they drop straight back into a chat turn. The first backend
+is llama-swap (OpenAI `/v1/images/generations` → a stable-diffusion.cpp
+upstream).
+
+```go
+ls := llamaswap.New(llamaswap.WithBaseURL("http://box.local:8080"))
+im, _ := ls.ImageModel("sd-xl")
+
+res, err := im.Generate(ctx, imagegen.Request{Prompt: "a red bicycle"},
+	imagegen.WithSize("1024x1024"))
+// res.Images[0] is an llm.ImagePart (bytes + MIME) — feed it back into chat:
+// majordomo.UserParts(majordomo.Text("describe this"), res.Images[0])
+```
+
+`*llamaswap.Provider` also exposes management methods: `ListModels` (what
+llama-swap can serve), `Running` (what's loaded), and `Unload` (free a model).
+
 ## Tool calls
 
 ```go
@@ -312,12 +346,19 @@ to build one.
 | Ollama Cloud | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | Ollama (local) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | foreman | ✅ | ✅ | ✅¹ | ✅ | ✅ | ✅ | ✅ |
+| llama-swap | ✅ | ✅ | ✅ | ✅² | ✅² | ✅² | ✅ |
 | fake (testing) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |
 
 ¹ foreman's daemon currently buffers sync chat responses (no token-by-token
 streaming); majordomo's stream API works against it and delivers the
 response as a single delta plus final event.
 
+² llama-swap's chat is OpenAI-compatible and reuses the openai client, so these
+capabilities are present at the client level; whether a given call succeeds
+depends on the llama.cpp model llama-swap loads. llama-swap also provides
+**image generation** (a separate `imagegen` axis, not shown above) and
+management methods on `*llamaswap.Provider`.
+
 Notes: Ollama has no native tool_choice — `"none"` drops the tools;
 `"required"`/named choices are best-effort ignored there. Ollama Cloud
 ignores the `format` field (verified live), so the provider also states