feat(llamaswap): add llama-swap provider + canonical imagegen interface
Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping
proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to
provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with
legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and
a timeout-free client so cold model swaps rely on context deadlines. The
"tailored" surface is concrete management methods (ListModels / Running /
Unload) that don't belong on the canonical llm.Provider interface. The
llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL
built-in errors clearly on use, mirroring foreman.
Add imagegen, a new canonical text-to-image interface separate from llm
(Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed
straight back into chat). First backend is llama-swap via OpenAI
/v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is
txt2img only.
Hermetic httptest coverage for chat delegation, management endpoints, image
decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix +
image-gen section, CLAUDE.md package map, and progress.md updated in the same
commit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,58 @@
# ADR-0015: llama-swap provider
**Status:** Accepted — 2026-06-27

## Context

llama-swap (https://github.com/mostlygeek/llama-swap) is an on-demand
model-swapping proxy in front of llama.cpp (and stable-diffusion.cpp) servers:
it extracts the `model` from each request, loads or hot-swaps the matching
upstream, and serves it. It is what foreman was reaching for, but more robust
(groups, TTL unload, health checks, a management API). We want it as a
first-class majordomo target (`llama-swap://token@host:port` in the DSN), and
the user explicitly asked for a *tailored* provider, not a bare alias of the
OpenAI client.

The tension: llama-swap's **chat** API is wire-compatible with OpenAI Chat
Completions. A new hand-rolled chat wire client would duplicate
`provider/openai` for zero behavioral gain, which ADR-0007 forbids. But the
"more robust" surface (model discovery, running list, unload) does not fit the
canonical `llm.Provider`/`llm.Model` interface (anti-creep: no provider-specific
features leak into the canonical API).

## Decision

- A dedicated `provider/llamaswap` package, but its chat path **delegates to
  `provider/openai`** pointed at `{baseURL}/v1`, so there is no duplicated wire
  client. `Provider.Model` returns `openai.New(...).Model(id)`.
- Chat construction specifics: `WithLegacyMaxTokens()` (llama.cpp's OpenAI shim
  honors `max_tokens`, not `max_completion_tokens`); a placeholder `Bearer
  no-key` when no token is set (the openai client treats a blank key as a
  synthetic 401, but a local keyless llama-swap ignores a bearer it didn't ask
  for); the injected HTTP client carries **no timeout**: a cold model swap
  blocks up to llama-swap's `healthCheckTimeout` (≥15s), so callers bound work
  with a context deadline, never a client timeout.
- The "tailored" surface lives as **concrete methods** on `*llamaswap.Provider`,
  outside the canonical interface: `ListModels` (GET `/v1/models`), `Running`
  (GET `/running`, returned as raw JSON because its shape is not a stable
  contract), `Unload` (POST `/api/models/unload[/:model]`). A small `doJSON`
  helper shares bearer auth + error mapping; non-2xx → `*llm.APIError` (so
  `llm.Classify` applies), transport errors are wrapped without further
  mapping.
- DSN: the `llama-swap` scheme builds an **http://** base URL from the host
  (llama-swap is local-first), deliberately *not* the DSN's https-always
  `BaseURL()`. A TLS-fronted instance can use the `openai://` scheme for chat.
  A no-DSN built-in `llama-swap` provider registers but errors on use (mirrors
  foreman).
- Image generation is implemented here too, against the new `imagegen`
  interface (see ADR-0016).
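
As a rough illustration of the chat-construction and DSN bullets above, the following Go sketch shows an HTTP client with no `Timeout`, a context deadline as the only bound on a slow swap, the `Bearer no-key` placeholder, and the plain-http base URL. The helper names (`bearer`, `httpBase`, `listModels`) are hypothetical and are not the package's actual API; the real `doJSON` helper and its `*llm.APIError` mapping are not reproduced here.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// bearer returns the Authorization header value. When no token is
// configured it falls back to the "no-key" placeholder (hypothetical helper).
func bearer(token string) string {
	if token == "" {
		token = "no-key"
	}
	return "Bearer " + token
}

// httpBase builds a plain-http base URL from a DSN host:port
// (hypothetical helper; llama-swap is assumed to be local-first).
func httpBase(hostport string) string {
	return "http://" + hostport
}

// listModels issues GET {base}/v1/models. The client carries no Timeout,
// so the caller's context is the only bound on the request.
func listModels(ctx context.Context, c *http.Client, base, token string) (int, []byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, base+"/v1/models", nil)
	if err != nil {
		return 0, nil, err
	}
	req.Header.Set("Authorization", bearer(token))
	resp, err := c.Do(req)
	if err != nil {
		return 0, nil, fmt.Errorf("llamaswap: %w", err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

func main() {
	// A stand-in server that echoes the Authorization header it received.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, `{"auth":%q}`, r.Header.Get("Authorization"))
	}))
	defer srv.Close()

	client := &http.Client{} // zero Timeout: no client-side deadline

	// The caller decides how long a cold swap may take.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	status, body, err := listModels(ctx, client, srv.URL, "")
	fmt.Println(status, string(body), err)
	fmt.Println(httpBase("localhost:8080"))
}
```

Keeping the deadline on the context rather than on the client means each call site can choose a bound that suits its own tolerance for a cold model load.
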

## Consequences

- No new dependency, no duplicated chat client; the chat path inherits every
  openai feature/fix automatically.
- Management methods are reachable only by holding the concrete
  `*llamaswap.Provider` (e.g. in mort), not through `Parse`/`llm.Provider`,
  which is the correct boundary for non-canonical features.
- `Running`'s raw-JSON return is honest about llama-swap not publishing a stable
  schema; a typed shape can be added later without breaking callers that ignore
  it.
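
A caller that does want structure out of the raw `Running` payload can decode only the fields it needs and ignore the rest. The sketch below assumes a payload of the form `{"running":[{"model":"..."}]}`; that shape is an assumption made for illustration, not a contract this ADR defines, and `runningModels` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runningModels extracts model names from a raw /running payload.
// ASSUMPTION: the payload looks like {"running":[{"model":"..."}]}.
// Unknown fields are ignored, so extra keys do not break decoding.
func runningModels(raw json.RawMessage) ([]string, error) {
	var doc struct {
		Running []struct {
			Model string `json:"model"`
		} `json:"running"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, fmt.Errorf("decode running: %w", err)
	}
	names := make([]string, 0, len(doc.Running))
	for _, r := range doc.Running {
		names = append(names, r.Model)
	}
	return names, nil
}

func main() {
	// Example payload with fields the caller does not care about.
	raw := json.RawMessage(`{"running":[{"model":"qwen","state":"ready"}],"extra":1}`)
	names, err := runningModels(raw)
	fmt.Println(names, err)
}
```
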
@@ -0,0 +1,44 @@
# ADR-0016: imagegen — a canonical text-to-image interface
**Status:** Accepted — 2026-06-27

## Context

mort needs to generate images (via llama-swap's stable-diffusion.cpp backend),
and majordomo had no image-generation surface. Image generation does not fit
the chat contract: there are no conversation messages, tools, streaming, or
failover-chain semantics, so forcing it through `llm.Request`/`llm.Response`/
`llm.Model` would overload that contract with mostly-unused fields. The user
asked for "a new ai image interface as opposed to llm".

## Decision

- A new canonical **leaf package `imagegen`**, parallel to `llm`, re-exported
  from the root (`ImageModel`, `ImageProvider`, `ImageRequest`, `ImageResult`,
  `ImageOption`, plus `WithImageCount`/`WithImageSize`). Providers import
  `imagegen`; mort codes to the interface, not to llama-swap.
- Minimal v1 surface (text-to-image only):
  - `Request{ Prompt string; N int; Size string }`: zero values mean provider
    default (N=0 → backend default count; "" Size → backend default).
  - `Result{ Images []Image; Raw any }`.
  - `Model.Generate(ctx, Request, ...Option) (*Result, error)` and
    `Provider.ImageModel(id, ...ModelOption) (Model, error)`.
  - Functional options + `Request.Apply`, mirroring `llm`.
- **`type Image = llm.ImagePart`** (bytes + MIME). Reusing the chat content type
  means a generated image drops straight back into a chat turn
  (`llm.UserParts(res.Images[0])`) with no conversion; this is the key interop
  win.
- Out of scope for v1 (designed-for, deferred): image edits / img2img, the raw
  A1111 SDAPI, masks/seeds/steps, streaming, and registry-level image-model DSN
  resolution (construct the provider directly for now).
- First implementation: `provider/llamaswap`, targeting OpenAI
  `/v1/images/generations` with `response_format: "b64_json"` (bytes inline; we
  never fetch remote URLs, mirroring `ImagePart`'s bytes-only contract).

## Consequences

- Image generation is provider-agnostic from day one; a future OpenAI DALL·E or
  Gemini image backend can implement the same interface.
- The narrow interface keeps the door open for richer requests without breaking
  callers (additive fields/options).
- No health/failover for image models yet; if needed it can be added as a
  separate chain type rather than retrofitting the chat chain.
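
For reference, the bytes-only `b64_json` handling described in the Decision could look something like the sketch below. It decodes the `data[].b64_json` entries of an OpenAI-style images response into raw bytes and never follows a URL. The `image` struct and `decodeImages` are hypothetical names, and sniffing the MIME type with `http.DetectContentType` is an assumption of this sketch, not necessarily what the provider does.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
)

// image is a local stand-in for the bytes + MIME pair (assumed field names).
type image struct {
	Data []byte
	MIME string
}

// decodeImages turns an OpenAI-style images response body into raw bytes.
// Entries without an inline b64_json payload are rejected rather than fetched.
func decodeImages(body []byte) ([]image, error) {
	var resp struct {
		Data []struct {
			B64JSON string `json:"b64_json"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	out := make([]image, 0, len(resp.Data))
	for i, d := range resp.Data {
		if d.B64JSON == "" {
			return nil, fmt.Errorf("image %d: no b64_json payload", i)
		}
		raw, err := base64.StdEncoding.DecodeString(d.B64JSON)
		if err != nil {
			return nil, fmt.Errorf("image %d: %w", i, err)
		}
		// MIME sniffing is this sketch's choice, not a documented behavior.
		out = append(out, image{Data: raw, MIME: http.DetectContentType(raw)})
	}
	return out, nil
}

func main() {
	// The 8-byte PNG signature is enough for content sniffing.
	png := []byte("\x89PNG\r\n\x1a\n")
	body := []byte(`{"data":[{"b64_json":"` + base64.StdEncoding.EncodeToString(png) + `"}]}`)
	imgs, err := decodeImages(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(imgs), imgs[0].MIME)
}
```
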
@@ -18,3 +18,5 @@ One decision per file, append-only; supersede rather than rewrite.
| [0012](0012-agent-loop.md) | Agent run loop | Accepted |
| [0013](0013-skill-model.md) | Skill model — additive instruction+tool bundles | Accepted |
| [0014](0014-conversion-driven-extensions.md) | Conversion-driven extensions (resolvers, typed tools, hooks, ops controls) | Accepted |
| [0015](0015-llama-swap-provider.md) | llama-swap provider — reuse openai for chat, tailored management + image | Accepted |
| [0016](0016-imagegen-interface.md) | imagegen — a canonical text-to-image interface | Accepted |