ADR-0016: imagegen — a canonical text-to-image interface
Status: Accepted — 2026-06-27
Context
mort needs to generate images (via llama-swap's stable-diffusion.cpp backend),
and majordomo had no image-generation surface. Image generation does not fit
the chat contract: there are no conversation messages, tools, streaming, or
failover-chain semantics — forcing it through llm.Request/llm.Response/
llm.Model would overload that contract with mostly-unused fields. The user
asked for "a new ai image interface as opposed to llm".
Decision
- A new canonical leaf package imagegen, parallel to llm, re-exported from the root (ImageModel, ImageProvider, ImageRequest, ImageResult, ImageOption, plus WithImageCount/WithImageSize). Providers import imagegen; mort codes to the interface, not to llama-swap.
- Minimal v1 surface (text-to-image only):
  - Request{ Prompt string; N int; Size string }: zero values mean provider default (N=0 → backend default count; "" Size → backend default).
  - Result{ Images []Image; Raw any }.
  - Model.Generate(ctx, Request, ...Option) (*Result, error) and Provider.ImageModel(id, ...ModelOption) (Model, error).
  - Functional options + Request.Apply, mirroring llm.
  - type Image = llm.ImagePart (bytes + MIME). Reusing the chat content type means a generated image drops straight back into a chat turn (llm.UserParts(res.Images[0])) with no conversion; this is the key interop win.
- Out of scope for v1 (designed-for, deferred): image edits / img2img, the raw A1111 SDAPI, masks/seeds/steps, streaming, and registry-level image-model DSN resolution (construct the provider directly for now).
- First implementation: provider/llamaswap, targeting the OpenAI /v1/images/generations endpoint with response_format: "b64_json" (bytes inline; we never fetch remote URLs, mirroring ImagePart's bytes-only contract).
Consequences
- Image generation is provider-agnostic from day one; a future OpenAI DALL·E or Gemini image backend implements the same interface.
- The narrow interface keeps the door open for richer requests without breaking callers (additive fields/options).
- No health/failover for image models yet; if needed it can be added as a separate chain type rather than retrofitting the chat chain.
Update — optional per-request settings
Request gained additive optional overrides — Steps *int, CFGScale *float64,
NegativePrompt string, Sampler string, Seed *int64 — with mirror options
(WithSteps, …). nil/"" means "leave the backend's per-model default", so the v1
contract is unchanged for callers that don't set them. provider/llamaswap
forwards them to sd-server as steps/cfg_scale/negative_prompt/sample_method/
seed (omitempty). This realizes the "seeds/steps … additive fields" note above;
img2img/masks/streaming remain deferred.