a213c18263
The OpenAI /v1/images/generations endpoint ignores `seed` on our stable-diffusion.cpp build — every render of a given prompt comes back byte-identical, so a drawbot batch of N collapsed to one image. Switch the image provider to sd-server's A1111 /sdapi/v1/txt2img endpoint, which honors `seed` (verified live: distinct seeds -> distinct images on SDXL and Qwen-Image). Size is split into width/height; llama-swap still routes by the `model` field. Tests + ADR-0016 updated.
# ADR-0016: imagegen — a canonical text-to-image interface

**Status:** Accepted — 2026-06-27

## Context

mort needs to generate images (via llama-swap's stable-diffusion.cpp backend),
and majordomo had no image-generation surface. Image generation does not fit
the chat contract: there are no conversation messages, tools, streaming, or
failover-chain semantics — forcing it through `llm.Request`/`llm.Response`/
`llm.Model` would overload that contract with mostly-unused fields. The user
asked for "a new ai image interface as opposed to llm".

## Decision

- A new canonical **leaf package `imagegen`**, parallel to `llm`, re-exported
  from the root (`ImageModel`, `ImageProvider`, `ImageRequest`, `ImageResult`,
  `ImageOption`, plus `WithImageCount`/`WithImageSize`). Providers import
  `imagegen`; mort codes to the interface, not to llama-swap.
- Minimal v1 surface (text-to-image only):
  - `Request{ Prompt string; N int; Size string }` — zero values mean provider
    default (`N=0` → backend default count; `""` Size → backend default).
  - `Result{ Images []Image; Raw any }`.
  - `Model.Generate(ctx, Request, ...Option) (*Result, error)` and
    `Provider.ImageModel(id, ...ModelOption) (Model, error)`.
  - Functional options + `Request.Apply`, mirroring `llm`.
- **`type Image = llm.ImagePart`** (bytes + MIME). Reusing the chat content type
  means a generated image drops straight back into a chat turn
  (`llm.UserParts(res.Images[0])`) with no conversion — the key interop win.
- Out of scope for v1 (designed-for, deferred): image edits / img2img, the raw
  A1111 SDAPI, masks/seeds/steps, streaming, and registry-level image-model DSN
  resolution (construct the provider directly for now).
- First implementation: `provider/llamaswap`, targeting OpenAI
  `/v1/images/generations` with `response_format: "b64_json"` (bytes inline; we
  never fetch remote URLs — mirrors `ImagePart`'s bytes-only contract).
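
A minimal Go sketch of the v1 surface listed above, written as one self-contained file. Some pieces are assumptions. `ImagePart` is a hypothetical stand-in for `llm.ImagePart`, assumed here to be raw bytes plus a MIME type. `modelConfig` is a hypothetical placeholder for whatever `ModelOption` configures. This `Apply` returns a modified copy, whereas the real `llm`-mirroring method may mutate in place.

```go
package main

import "context"

// ImagePart is a hypothetical stand-in for llm.ImagePart (bytes + MIME).
type ImagePart struct {
	Data     []byte
	MIMEType string
}

// Image is an alias, not a new type, so values cross between APIs unconverted.
type Image = ImagePart

// Request is the v1 text-to-image request; zero values mean "backend default".
type Request struct {
	Prompt string
	N      int    // 0 → backend default count
	Size   string // "" → backend default size
}

// Result holds decoded images plus the provider's raw response.
type Result struct {
	Images []Image
	Raw    any
}

// Option is a functional option that mutates a Request.
type Option func(*Request)

// WithImageCount sets how many images to generate.
func WithImageCount(n int) Option { return func(r *Request) { r.N = n } }

// WithImageSize sets the size string (its format is provider-defined).
func WithImageSize(size string) Option { return func(r *Request) { r.Size = size } }

// Apply returns a copy of r with opts applied; the receiver is left unchanged.
func (r Request) Apply(opts ...Option) Request {
	for _, opt := range opts {
		opt(&r)
	}
	return r
}

// Model generates images with one backend model.
type Model interface {
	Generate(ctx context.Context, req Request, opts ...Option) (*Result, error)
}

// modelConfig is a hypothetical placeholder for per-model settings.
type modelConfig struct{}

// ModelOption configures a Model when it is constructed.
type ModelOption func(*modelConfig)

// Provider hands out Models by id.
type Provider interface {
	ImageModel(id string, opts ...ModelOption) (Model, error)
}
```

Because `Image` is declared with `=`, a value taken from `Result.Images` is already an `ImagePart`. That is the property that lets it drop into a chat turn without conversion.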

## Consequences

- Image generation is provider-agnostic from day one; a future OpenAI DALL·E or
  Gemini image backend would implement the same interface.
- The narrow interface keeps the door open for richer requests without breaking
  callers (additive fields/options).
- No health/failover for image models yet; if needed, it can be added as a
  separate chain type rather than retrofitting the chat chain.

## Update — optional per-request settings

`Request` gained additive optional overrides — `Steps *int`, `CFGScale *float64`,
`NegativePrompt string`, `Sampler string`, `Seed *int64` — with mirror options
(`WithSteps`, …). `nil`/`""` means "leave the backend's per-model default", so
the v1 contract is unchanged for callers that don't set them.
`provider/llamaswap` forwards them to sd-server as
`steps`/`cfg_scale`/`negative_prompt`/`sample_method`/`seed` (omitempty). This
realizes the "seeds/steps … additive fields" note above; img2img/masks/streaming
remain deferred.
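
The pointer-plus-`omitempty` pattern is what lets "unset" mean "backend default" on the wire. The sketch below assumes a hypothetical `wireBody` struct and `toWire` helper; only the JSON field names come from the text above. `WithSeed` is also hypothetical, since the text elides the option names after `WithSteps`. Under `omitempty`, `encoding/json` drops nil pointers and empty strings but still sends a non-nil pointer to zero. A caller can therefore request `seed: 0` explicitly, which a plain `int64` field with `omitempty` could not express.

```go
package main

import "encoding/json"

// Request carries the optional overrides; nil / "" means "backend default".
type Request struct {
	Prompt         string
	Steps          *int
	CFGScale       *float64
	NegativePrompt string
	Sampler        string
	Seed           *int64
}

// Option mutates a Request, as in the v1 surface.
type Option func(*Request)

// WithSteps takes a plain int and stores a pointer to its own copy of it.
func WithSteps(n int) Option { return func(r *Request) { r.Steps = &n } }

// WithSeed is a hypothetical name; the other option names are elided above.
func WithSeed(s int64) Option { return func(r *Request) { r.Seed = &s } }

// wireBody is a hypothetical sd-server body. omitempty drops nil pointers and
// "" but still encodes a non-nil pointer to a zero value.
type wireBody struct {
	Prompt         string   `json:"prompt"`
	Steps          *int     `json:"steps,omitempty"`
	CFGScale       *float64 `json:"cfg_scale,omitempty"`
	NegativePrompt string   `json:"negative_prompt,omitempty"`
	SampleMethod   string   `json:"sample_method,omitempty"`
	Seed           *int64   `json:"seed,omitempty"`
}

// toWire (hypothetical) maps Request fields onto the wire names listed above.
func toWire(r Request) ([]byte, error) {
	return json.Marshal(wireBody{
		Prompt:         r.Prompt,
		Steps:          r.Steps,
		CFGScale:       r.CFGScale,
		NegativePrompt: r.NegativePrompt,
		SampleMethod:   r.Sampler,
		Seed:           r.Seed,
	})
}
```

With nothing set, only `prompt` is sent. Setting `WithSteps(20)` and `WithSeed(0)` yields `{"prompt":"cat","steps":20,"seed":0}`.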

## Update — A1111 txt2img endpoint (seed support)

`provider/llamaswap` now POSTs to sd-server's **`/sdapi/v1/txt2img`** (the
AUTOMATIC1111-style API) instead of the OpenAI `/v1/images/generations`,
superseding the "First implementation" target above. That OpenAI endpoint
**ignores `seed`** on the stable-diffusion.cpp build we run: every render of a
prompt is byte-identical, so a batch of N collapses to one image.
`/sdapi/v1/txt2img` honours `seed`, restoring real per-render variety (verified
live: distinct seeds produce distinct images on SDXL and Qwen-Image).
llama-swap still routes by the `model` field in the body; the single `Size`
string is split into the separate `width`/`height` fields that txt2img takes.