# ADR-0016: imagegen, a canonical text-to-image interface

**Status:** Accepted (2026-06-27)

## Context

mort needs to generate images (via llama-swap's stable-diffusion.cpp backend), and majordomo had no image-generation surface. Image generation does not fit the chat contract: there are no conversation messages, tools, streaming, or failover-chain semantics, so forcing it through `llm.Request`/`llm.Response`/`llm.Model` would overload that contract with mostly-unused fields. The user asked for "a new ai image interface as opposed to llm".

## Decision

- A new canonical **leaf package `imagegen`**, parallel to `llm`, re-exported from the root (`ImageModel`, `ImageProvider`, `ImageRequest`, `ImageResult`, `ImageOption`, plus `WithImageCount`/`WithImageSize`). Providers import `imagegen`; mort codes to the interface, not to llama-swap.
- Minimal v1 surface (text-to-image only):
  - `Request{ Prompt string; N int; Size string }`: zero values mean provider default (N=0 → backend default count; "" Size → backend default).
  - `Result{ Images []Image; Raw any }`.
  - `Model.Generate(ctx, Request, ...Option) (*Result, error)` and `Provider.ImageModel(id, ...ModelOption) (Model, error)`.
  - Functional options + `Request.Apply`, mirroring `llm`.
- **`type Image = llm.ImagePart`** (bytes + MIME). Reusing the chat content type means a generated image drops straight back into a chat turn (`llm.UserParts(res.Images[0])`) with no conversion. This is the key interop win.
- Out of scope for v1 (designed-for, deferred): image edits / img2img, the raw A1111 SDAPI, masks/seeds/steps, streaming, and registry-level image-model DSN resolution (construct the provider directly for now).
- First implementation: `provider/llamaswap`, targeting OpenAI `/v1/images/generations` with `response_format: "b64_json"` (bytes inline; we never fetch remote URLs, which mirrors `ImagePart`'s bytes-only contract).

## Consequences

- Image generation is provider-agnostic from day one; a future OpenAI DALL·E or Gemini image backend implements the same interface.
- The narrow interface keeps the door open for richer requests without breaking callers (additive fields/options).
- No health/failover for image models yet; if needed it can be added as a separate chain type rather than retrofitting the chat chain.
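
To illustrate the "additive fields/options" point, here is a minimal sketch of how a deferred parameter could be added later without touching existing call sites. The `Seed` field and `WithImageSeed` option are hypothetical (seeds are out of scope for v1), and the types are local stand-ins for the `imagegen` ones. The sketch relies on call sites using keyed struct literals; an unkeyed literal would stop compiling when a field is added.

```go
package main

import "fmt"

// Request as in v1, plus a hypothetical Seed field added in a later version.
type Request struct {
	Prompt string
	N      int
	Size   string
	Seed   int64 // hypothetical; 0 is treated as "provider default" here
}

// Option mutates a Request (functional-options pattern).
type Option func(*Request)

func WithImageCount(n int) Option { return func(r *Request) { r.N = n } }

// WithImageSeed is a hypothetical later addition, not part of v1.
func WithImageSeed(s int64) Option { return func(r *Request) { r.Seed = s } }

// Apply returns a copy of r with opts applied in order.
func (r Request) Apply(opts ...Option) Request {
	for _, o := range opts {
		o(&r)
	}
	return r
}

func main() {
	// A v1-era call site: keyed fields and v1 options only. It compiles
	// unchanged and leaves Seed at its zero value.
	old := (Request{Prompt: "a red fox"}).Apply(WithImageCount(2))

	// A newer call site opts in to the extra parameter.
	newer := (Request{Prompt: "a red fox"}).Apply(WithImageCount(2), WithImageSeed(42))

	fmt.Printf("%+v\n%+v\n", old, newer)
}
```

One caveat with this convention: 0 is itself a valid seed for many backends, so a real design might prefer a pointer or a separate "set" flag over the zero-value default.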