majordomo/docs/adr/0016-imagegen-interface.md
steve a213c18263
fix(llamaswap): use A1111 /sdapi/v1/txt2img so seed is honored
The OpenAI /v1/images/generations endpoint ignores `seed` on our
stable-diffusion.cpp build — every render of a given prompt comes back
byte-identical, so a drawbot batch of N collapsed to one image. Switch the
image provider to sd-server's A1111 /sdapi/v1/txt2img endpoint, which honors
`seed` (verified live: distinct seeds -> distinct images on SDXL and
Qwen-Image). Size is split into width/height; llama-swap still routes by the
`model` field. Tests + ADR-0016 updated.
2026-06-28 22:56:25 -04:00


ADR-0016: imagegen — a canonical text-to-image interface

Status: Accepted — 2026-06-27

Context

mort needs to generate images (via llama-swap's stable-diffusion.cpp backend), but majordomo had no image-generation surface. Image generation does not fit the chat contract: there are no conversation messages, tools, streaming, or failover-chain semantics, so forcing it through llm.Request/llm.Response/llm.Model would overload that contract with mostly-unused fields. The user asked for "a new ai image interface as opposed to llm".

Decision

  • A new canonical leaf package imagegen, parallel to llm, re-exported from the root (ImageModel, ImageProvider, ImageRequest, ImageResult, ImageOption, plus WithImageCount/WithImageSize). Providers import imagegen; mort codes to the interface, not to llama-swap.
  • Minimal v1 surface (text-to-image only):
    • Request{ Prompt string; N int; Size string } — zero values mean provider default (N=0 → backend default count; "" Size → backend default).
    • Result{ Images []Image; Raw any }.
    • Model.Generate(ctx, Request, ...Option) (*Result, error) and Provider.ImageModel(id, ...ModelOption) (Model, error).
    • Functional options + Request.Apply, mirroring llm.
  • type Image = llm.ImagePart (bytes + MIME). Reusing the chat content type means a generated image drops straight back into a chat turn (llm.UserParts(res.Images[0])) with no conversion — the key interop win.
  • Out of scope for v1 (designed-for, deferred): image edits / img2img, the raw A1111 SDAPI, masks/seeds/steps, streaming, and registry-level image-model DSN resolution (construct the provider directly for now).
  • First implementation: provider/llamaswap, targeting OpenAI /v1/images/generations with response_format: "b64_json" (bytes inline; we never fetch remote URLs — mirrors ImagePart's bytes-only contract).

Consequences

  • Image generation is provider-agnostic from day one; a future OpenAI DALL·E or Gemini image backend implements the same interface.
  • The narrow interface keeps the door open for richer requests without breaking callers (additive fields/options).
  • No health/failover for image models yet; if needed it can be added as a separate chain type rather than retrofitting the chat chain.

Update — optional per-request settings

Request gained additive optional overrides (Steps *int, CFGScale *float64, NegativePrompt string, Sampler string, Seed *int64) with mirror options (WithSteps, …). nil/"" means "leave the backend's per-model default", so the v1 contract is unchanged for callers that don't set them. provider/llamaswap forwards them to sd-server as steps/cfg_scale/negative_prompt/sample_method/seed (omitempty). This realizes the "seeds/steps … additive fields" note above; img2img/masks/streaming remain deferred.
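
As a minimal sketch of why the numeric overrides are pointers: with encoding/json, a nil pointer tagged omitempty is dropped from the body, while a non-nil pointer to zero is still sent. The overrides struct and the ptr/encode helpers below are hypothetical; only the JSON key names come from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// overrides models only the optional per-request settings (hypothetical type).
type overrides struct {
	Steps          *int     `json:"steps,omitempty"`
	CFGScale       *float64 `json:"cfg_scale,omitempty"`
	NegativePrompt string   `json:"negative_prompt,omitempty"`
	Sampler        string   `json:"sample_method,omitempty"`
	Seed           *int64   `json:"seed,omitempty"`
}

// ptr returns a pointer to v, which is handy for optional fields.
func ptr[T any](v T) *T { return &v }

// encode marshals the overrides to a JSON string.
func encode(o overrides) string {
	b, err := json.Marshal(o)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Nothing set: every field is omitted, so the backend keeps its defaults.
	fmt.Println(encode(overrides{})) // prints {}

	// An explicit seed of 0 is still sent because the pointer is non-nil.
	fmt.Println(encode(overrides{Steps: ptr(20), Seed: ptr(int64(0))})) // prints {"steps":20,"seed":0}
}
```

A plain int64 Seed with omitempty could not distinguish "seed 0" from "no seed", which is the case the pointer form avoids.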

Update — A1111 txt2img endpoint (seed support)

provider/llamaswap now POSTs to sd-server's /sdapi/v1/txt2img (A1111) instead of the OpenAI /v1/images/generations. That OpenAI endpoint ignores seed on the stable-diffusion.cpp build we run: every render of a given prompt comes back byte-identical, so a batch of N collapses to one image. /sdapi/v1/txt2img honors seed, restoring real per-render variety. llama-swap still routes by the model field in the body; Size is split into width/height.
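
The Size split could look something like the sketch below. It assumes Size uses the OpenAI-style "WIDTHxHEIGHT" form (for example "1024x768"), which the ADR does not spell out, and splitSize is a hypothetical helper rather than the provider's actual function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitSize parses a "WIDTHxHEIGHT" string into its two integer parts.
// The "WxH" format is an assumption for this sketch.
func splitSize(size string) (w, h int, err error) {
	ws, hs, ok := strings.Cut(size, "x")
	if !ok {
		return 0, 0, fmt.Errorf("size %q: want WIDTHxHEIGHT", size)
	}
	if w, err = strconv.Atoi(ws); err != nil {
		return 0, 0, fmt.Errorf("size %q: bad width: %w", size, err)
	}
	if h, err = strconv.Atoi(hs); err != nil {
		return 0, 0, fmt.Errorf("size %q: bad height: %w", size, err)
	}
	if w <= 0 || h <= 0 {
		return 0, 0, fmt.Errorf("size %q: dimensions must be positive", size)
	}
	return w, h, nil
}

func main() {
	w, h, err := splitSize("1024x768")
	if err != nil {
		panic(err)
	}
	fmt.Println(w, h) // prints "1024 768"
}
```

An empty Size would presumably skip the split entirely so that the backend's default dimensions apply, consistent with the zero-value rule in the v1 contract.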