# ADR-0009: Multimodal strategy - normalize per target, enforce at the provider

**Status:** Accepted, 2026-06-10

## Context

Every provider (and some models) imposes different image rules: max dimensions/bytes, allowed MIME types, max images per request. A caller must be able to attach an image without knowing the eventual target, especially with failover chains, where the serving target isn't known until runtime.

## Decision

Two cooperating layers:

1. **`media.Normalize(req, caps)`** is the transformation point. The chain executor calls it **per target, per attempt**, against the actual target's capabilities, before the provider sees the request:
   - The real format is **sniffed from magic bytes** and wins over the declared MIME (callers lie; JPEG, PNG, GIF, and WebP are recognized).
   - Already-fitting images pass through untouched (fast path: zero copies).
   - Oversize images are downscaled (aspect-preserving) with a hand-rolled box filter. Stdlib has no scaler and `x/image` stays out per ADR-0007; box-average quality is ample for vision input.
   - A disallowed MIME type triggers re-encoding: to the original format if allowed, else JPEG (q85), else PNG, else the first allowed encodable type.
   - Byte budgets are enforced via a quality ladder (JPEG 85→65→45→30), then dimension halving; roughly 6 attempts before giving up.
   - WebP cannot be decoded by stdlib: it passes through when it fits and is allowed; if any transform is needed, normalization fails with a clear error.
   - Anything that cannot be made to fit returns an error **wrapping `llm.ErrUnsupported`**; it is never silently dropped.
2. **Provider backstop**: each provider cheaply enforces its effective capabilities at request time (image count/MIME/bytes, plus tools/structured/streaming support flags) and rejects with `ErrUnsupported`. This keeps providers honest for expert callers who build models directly without the registry.

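Three of the normalization steps listed above (magic-byte sniffing, aspect-preserving sizing, and box-filter downscaling) can be sketched with stdlib only. This is an illustrative sketch, not the actual `media` package: the names `sniffMIME`, `fitWithin`, `atLeast1`, and `boxDownscale` are hypothetical, and the sketch assumes a single max-dimension cap and downscaling only.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
)

// sniffMIME (hypothetical name) maps leading magic bytes to a MIME type.
// An empty string means the format was not recognized.
func sniffMIME(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte("\x89PNG\r\n\x1a\n")):
		return "image/png"
	case bytes.HasPrefix(b, []byte("\xff\xd8\xff")):
		return "image/jpeg"
	case bytes.HasPrefix(b, []byte("GIF87a")), bytes.HasPrefix(b, []byte("GIF89a")):
		return "image/gif"
	case len(b) >= 12 && bytes.Equal(b[:4], []byte("RIFF")) && bytes.Equal(b[8:12], []byte("WEBP")):
		return "image/webp"
	}
	return ""
}

// fitWithin shrinks (w, h) so neither side exceeds maxDim, preserving the
// aspect ratio. Dimensions that already fit are returned unchanged.
func fitWithin(w, h, maxDim int) (int, int) {
	if w <= maxDim && h <= maxDim {
		return w, h
	}
	if w >= h {
		return maxDim, atLeast1(h * maxDim / w)
	}
	return atLeast1(w * maxDim / h), maxDim
}

// atLeast1 keeps very thin images from collapsing to a zero-pixel side.
func atLeast1(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

// boxDownscale averages each block of source pixels into one destination
// pixel. Assumption: dw and dh do not exceed the source size, so every
// destination pixel covers at least one source pixel.
func boxDownscale(src image.Image, dw, dh int) *image.RGBA {
	sb := src.Bounds()
	dst := image.NewRGBA(image.Rect(0, 0, dw, dh))
	for dy := 0; dy < dh; dy++ {
		y0, y1 := sb.Min.Y+dy*sb.Dy()/dh, sb.Min.Y+(dy+1)*sb.Dy()/dh
		for dx := 0; dx < dw; dx++ {
			x0, x1 := sb.Min.X+dx*sb.Dx()/dw, sb.Min.X+(dx+1)*sb.Dx()/dw
			var r, g, b, a, n uint64
			for y := y0; y < y1; y++ {
				for x := x0; x < x1; x++ {
					cr, cg, cb, ca := src.At(x, y).RGBA() // 16-bit, alpha-premultiplied
					r, g, b, a, n = r+uint64(cr), g+uint64(cg), b+uint64(cb), a+uint64(ca), n+1
				}
			}
			// Average the block, then reduce 16-bit channels to 8-bit.
			dst.SetRGBA(dx, dy, color.RGBA{
				R: uint8((r / n) >> 8), G: uint8((g / n) >> 8),
				B: uint8((b / n) >> 8), A: uint8((a / n) >> 8),
			})
		}
	}
	return dst
}

func main() {
	src := image.NewRGBA(image.Rect(0, 0, 100, 50))
	for i := range src.Pix {
		src.Pix[i] = 0xff // opaque white
	}
	w, h := fitWithin(100, 50, 32)
	out := boxDownscale(src, w, h)
	fmt.Println(sniffMIME([]byte("GIF89a")), out.Bounds().Dx(), out.Bounds().Dy())
	// prints: image/gif 32 16
}
```

Integer block boundaries keep the filter free of floating-point work. The re-encode fallbacks and the byte-budget quality ladder are deliberately left out of this sketch.
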
Chain semantics: a normalization failure for one target **advances** to the next element with no health penalty (the target isn't sick, it's just incapable), so `fp/text-only,fp/vision` serves an image request from the vision element automatically.

Canonical image content stays **bytes + MIME** (ADR-0002); no URL fetching.

## Consequences

- A 100×50 PNG sent to a target with a 32px cap arrives as a 32×16 PNG; the same request served by an 8000px target arrives untouched.
- Conditional provider rules (e.g. Anthropic's 2000px cap above 20 images) are approximated by the flat declared caps, which is conservative and simple.

## Alternatives considered

- Normalize once against chain-intersection caps: over-restricts every request for the sake of rarely-used fallbacks. Rejected (ADR-0008).
- `x/image/draw` scalers: a dependency for one function. Rejected.
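
For reference, the chain semantics from the Decision section (a normalization failure advances to the next element with no health penalty) might look roughly like the following. This is a minimal sketch under stated assumptions: `target`, `request`, `normalize`, and `serve` are hypothetical stand-ins rather than the real executor, and `ErrUnsupported` here substitutes for `llm.ErrUnsupported`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnsupported is a local stand-in for llm.ErrUnsupported.
var ErrUnsupported = errors.New("unsupported")

// target is a hypothetical chain element with one capability flag and a
// health counter that only genuine provider failures should increment.
type target struct {
	name     string
	vision   bool
	failures int
}

// request is a hypothetical request that may carry an image.
type request struct{ hasImage bool }

// normalize stands in for media.Normalize: it fails with a wrapped
// ErrUnsupported when the target cannot accept the request's content.
func normalize(req request, t *target) error {
	if req.hasImage && !t.vision {
		return fmt.Errorf("%s: images not accepted: %w", t.name, ErrUnsupported)
	}
	return nil
}

// serve walks the chain in order and returns the name of the first target
// that can take the request.
func serve(req request, chain []*target) (string, error) {
	lastErr := errors.New("chain is empty")
	for _, t := range chain {
		if err := normalize(req, t); err != nil {
			if errors.Is(err, ErrUnsupported) {
				// Incapable, not sick: advance without touching t.failures.
				lastErr = err
				continue
			}
			return "", err
		}
		// A real executor would call the provider here and record health
		// only for provider-side failures.
		return t.name, nil
	}
	return "", fmt.Errorf("no capable target: %w", lastErr)
}

func main() {
	chain := []*target{{name: "fp/text-only"}, {name: "fp/vision", vision: true}}
	name, err := serve(request{hasImage: true}, chain)
	fmt.Println(name, err, chain[0].failures)
	// prints: fp/vision <nil> 0
}
```

Because the error wraps the sentinel, callers can still distinguish "no element could take this request" from a provider outage with `errors.Is`.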