majordomo/docs/adr/0016-imagegen-interface.md
steve db8d455bd8
feat(imagegen): optional per-request generation settings
Add Steps, CFGScale, NegativePrompt, Sampler, Seed to imagegen.Request
(pointer/empty = leave the backend's per-model default), with mirror
options, and forward them in the llamaswap wire payload as the
stable-diffusion.cpp fields (steps/cfg_scale/negative_prompt/
sample_method/seed). Unset fields are omitted so sd-server keeps its
baked defaults.

Lets callers (e.g. mort drawbots) override only what they explicitly set.
2026-06-28 18:21:32 -04:00


# ADR-0016: imagegen — a canonical text-to-image interface
**Status:** Accepted — 2026-06-27
## Context
mort needs to generate images (via llama-swap's stable-diffusion.cpp backend),
and majordomo had no image-generation surface. Image generation does not fit
the chat contract: there are no conversation messages, tools, streaming, or
failover-chain semantics — forcing it through `llm.Request`/`llm.Response`/
`llm.Model` would overload that contract with mostly-unused fields. The user
asked for "a new ai image interface as opposed to llm".
## Decision
- A new canonical **leaf package `imagegen`**, parallel to `llm`, re-exported
from the root (`ImageModel`, `ImageProvider`, `ImageRequest`, `ImageResult`,
`ImageOption`, plus `WithImageCount`/`WithImageSize`). Providers import
`imagegen`; mort codes to the interface, not to llama-swap.
- Minimal v1 surface (text-to-image only):
  - `Request{ Prompt string; N int; Size string }`: zero values mean provider
    default (N=0 → backend default count; "" Size → backend default).
  - `Result{ Images []Image; Raw any }`.
  - `Model.Generate(ctx, Request, ...Option) (*Result, error)` and
    `Provider.ImageModel(id, ...ModelOption) (Model, error)`.
  - Functional options + `Request.Apply`, mirroring `llm`.
- **`type Image = llm.ImagePart`** (bytes + MIME). Reusing the chat content type
means a generated image drops straight back into a chat turn
(`llm.UserParts(res.Images[0])`) with no conversion — the key interop win.
- Out of scope for v1 (designed-for, deferred): image edits / img2img, the raw
A1111 SDAPI, masks/seeds/steps, streaming, and registry-level image-model DSN
resolution (construct the provider directly for now).
- First implementation: `provider/llamaswap`, targeting the OpenAI-compatible
`/v1/images/generations` endpoint with `response_format: "b64_json"`. Image
bytes arrive inline and remote URLs are never fetched, which mirrors
`ImagePart`'s bytes-only contract.
## Consequences
- Image generation is provider-agnostic from day one; a future OpenAI DALL·E or
Gemini image backend implements the same interface.
- The narrow interface keeps the door open for richer requests without breaking
callers (additive fields/options).
- No health/failover for image models yet; if needed it can be added as a
separate chain type rather than retrofitting the chat chain.
## Update — optional per-request settings
`Request` gained additive optional overrides — `Steps *int`, `CFGScale *float64`,
`NegativePrompt string`, `Sampler string`, `Seed *int64` — with mirror options
(`WithSteps`, …). nil/"" means "leave the backend's per-model default", so the v1
contract is unchanged for callers that don't set them. `provider/llamaswap`
forwards them to sd-server as `steps`/`cfg_scale`/`negative_prompt`/`sample_method`/
`seed` (omitempty). This delivers the seeds and steps deferred under Decision,
using the additive fields/options path noted under Consequences;
img2img/masks/streaming remain deferred.
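
The "nil/empty means leave the default" rule can be illustrated with a small encoding sketch. The JSON field names come from the text above; the `wirePayload` struct, the `encode` helper, and the `ptr` helper are hypothetical and only approximate what `provider/llamaswap` might do. Pointer fields are what allow an explicit zero (for example `Seed` 0) to be sent while an unset value is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request mirrors the optional overrides described in the update.
type Request struct {
	Prompt         string
	Steps          *int
	CFGScale       *float64
	NegativePrompt string
	Sampler        string
	Seed           *int64
}

// wirePayload is a hypothetical wire struct; unset fields are dropped by omitempty.
type wirePayload struct {
	Prompt         string   `json:"prompt"`
	ResponseFormat string   `json:"response_format"`
	Steps          *int     `json:"steps,omitempty"`
	CFGScale       *float64 `json:"cfg_scale,omitempty"`
	NegativePrompt string   `json:"negative_prompt,omitempty"`
	SampleMethod   string   `json:"sample_method,omitempty"`
	Seed           *int64   `json:"seed,omitempty"`
}

// encode maps a Request onto the wire payload and marshals it to JSON.
func encode(r Request) (string, error) {
	b, err := json.Marshal(wirePayload{
		Prompt:         r.Prompt,
		ResponseFormat: "b64_json",
		Steps:          r.Steps,
		CFGScale:       r.CFGScale,
		NegativePrompt: r.NegativePrompt,
		SampleMethod:   r.Sampler,
		Seed:           r.Seed,
	})
	return string(b), err
}

// ptr is a small helper for building pointer-valued overrides.
func ptr[T any](v T) *T { return &v }

func main() {
	// Only steps and seed are set; seed 0 is explicit and therefore sent.
	s, err := encode(Request{Prompt: "a fox", Steps: ptr(20), Seed: ptr(int64(0))})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// prints {"prompt":"a fox","response_format":"b64_json","steps":20,"seed":0}
}
```

With no overrides set, the payload contains only `prompt` and `response_format`, so the backend's per-model defaults apply.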