feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

Phase 1 of the majordomo build: - llm/ canonical contract (messages, parts, tools, capabilities, streaming, Model/Provider, error classification) - health/ clock-injected tracker (threshold bench, exponential capped cooldown, reset-on-success) - root Registry + Parse (verbatim model ids, inline recursive alias expansion with cycle detection, chain dedup), LLM_* env-DSN providers (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain executor behind the Model interface - provider/fake scriptable test provider; hermetic test suite incl. the trailing-thinking chain and foreman:// env loading - ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow, docs/phase-1-design.md Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:35:23 +02:00
parent 3025044817
commit dcd004289f
42 changed files with 3863 additions and 0 deletions
@@ -0,0 +1,16 @@
+# majordomo example environment.
+# Copy to .env and fill in real values. .env is gitignored — never commit it.
+
+# Ollama Cloud API key (used by the ollama-cloud provider and live tests).
+OLLAMA_API_KEY=your-ollama-cloud-key-here
+
+# Built-in provider keys (each optional; only needed for the providers you use).
+#OPENAI_API_KEY=sk-...
+#ANTHROPIC_API_KEY=sk-ant-...
+#GOOGLE_API_KEY=...
+
+# LLM_* env-DSN provider definitions (go-llm parity).
+# Format: LLM_<NAME>=scheme://[token@]host[/path]
+# <NAME> becomes the provider's registry name (LLM_M1 -> "m1").
+#LLM_M1=foreman://token@foreman-m1.example.com
+#LLM_M5=foreman://token@foreman-m5.example.com
@@ -0,0 +1,26 @@
+name: CI
+on:
+  push: { branches: ["*"] }
+  pull_request: { branches: ["*"] }
+jobs:
+  build:
+    name: Build & Test
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-go@v5
+        with: { go-version-file: "go.mod" }
+      - run: go mod download
+      - run: go build ./...
+      - run: go vet ./...
+      - run: go test -race -count=1 ./...
+  tidy:
+    name: Tidy
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-go@v5
+        with: { go-version-file: "go.mod" }
+      - run: |
+          go mod tidy
+          git diff --exit-code go.mod go.sum
@@ -25,3 +25,6 @@ go.work.sum
 # env file
 .env

+
+# macOS
+.DS_Store
@@ -0,0 +1,114 @@
+# CLAUDE.md — majordomo operating manual
+
+majordomo is a clean-slate Go substrate for LLM-backed agents:
+target-agnostic model access, a parseable model naming / failover / tiering
+system with health tracking, multimodality, tool calls, structured output,
+and agents composed from model + system prompt + toolboxes + skills.
+
+**North star:** majordomo exists to re-architect mort's agentic layer. mort
+is the first consumer and the design's acceptance test — when a choice is a
+toss-up, pick what makes mort's tiers, failover chains, toolboxes, and
+skills cleanest to express. But majordomo itself stays general-purpose and
+mort-agnostic: no mort types, no Discord, no mort config.
+
+## Module & stack
+
+- Module: `gitea.stevedudenhoeffer.com/steve/majordomo`, Go 1.26.
+- Stdlib-first (ADR-0007): hand-rolled `net/http` clients for
+  OpenAI(+compat), Anthropic(+compat), Ollama (cloud+local), foreman. The
+  one approved dependency is `google.golang.org/genai` (Google provider).
+  Anything else needs an ADR. No `go-llm`, no `go-agentkit` — importing
+  either is an automatic failure.
+
+## Package map (ADR-0001)
+
+```
+majordomo        Registry, Parse, env-DSN loading, chain executor, re-exports
+  llm/           canonical contract: Message/Part/Request/Response/Option,
+                 Tool/Toolbox, Capabilities, Stream, Model, Provider, errors
+  health/        clock-injected health tracker (bench/backoff)
+  media/         image normalization to target capabilities   (Phase 3)
+  provider/fake/ scriptable in-memory provider for hermetic tests
+  provider/{openai,anthropic,ollama,google}/                  (Phases 3-4)
+  agent/         Agent run loop                               (Phase 5)
+  skill/         Skill interface + composition                (Phase 6)
+  examples/      one runnable example per hard requirement    (Phase 7-8)
+```
+
+Canonical types live in leaf package `llm`; the root re-exports them via
+type aliases. Providers import `llm`, never each other, never the root.
+
+## Parse grammar (ADR-0003)
+
+```
+spec    := element ("," element)*       # ordered failover chain
+element := target | alias
+target  := provider "/" model           # model id VERBATIM after first "/"
+alias   := bare token (no slash), expands INLINE, recursively, cycle-checked
+```
+
+- `Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8")`
+  → try head-to-tail. Appending `,thinking` expands the registered alias in
+  place at the tail.
+- Provider resolution: registry (built-ins, RegisterProvider, eager env) →
+  lazy `LLM_{UPPER(name)}` env DSN → error.
+- Single element ≡ chain of one; same Model interface, same semantics.
+- No reasoning suffixes (`:high` etc. are NOT stripped — model ids are
+  verbatim). Reasoning effort becomes a request option (provider phases).
+
+## LLM_* env-DSN providers (ADR-0004, go-llm parity)
+
+`LLM_<NAME>=scheme://[token@]host[/path]` — e.g.
+`LLM_M5=foreman://token@foreman-m5.example` defines provider `m5`; then
+`m5/qwen3:30b` works in Parse, chains, and aliases. Scheme ∈ {foreman,
+ollama, ollama-cloud, openai, anthropic, google, gemini} ∪ RegisterScheme.
+Token = credential; base URL = `https://host` always. `New()` scans the
+process env eagerly; unknown names also resolve lazily at Parse time
+(`my-prov` → `LLM_MY_PROV`). Malformed entries fail on use, not at startup.
+
+## Health & failover (ADR-0006, ADR-0008)
+
+- Transient (408/429/5xx, timeouts, conn refused/reset, DNS, deadline) vs
+  permanent (400/401/403/404/405/422, model-not-found, ctx.Canceled).
+  Unknown → transient. Classifier overridable.
+- One transient error → retry same target (default 1 retry). Every failed
+  attempt counts; at threshold (default 2 consecutive) the target is
+  benched for base 5s × 2^n, capped 5m. Success fully resets. Chains skip
+  benched targets; 404 advances penalty-free; auth/malformed fail fast
+  (configurable); exhaustion returns a joined error naming every target.
+- Tracker is in-memory, process-local, clock-injected. No persistence.
+
+## House conventions (mirror foreman)
+
+- gofmt; check errors immediately and wrap with `fmt.Errorf("%w: ...")`;
+  imports stdlib → third-party → internal; `// Why:` doc comments where
+  rationale isn't obvious.
+- ADRs in `docs/adr/`, one decision each, append-only, indexed in its
+  README. progress.md gets a dated entry per phase.
+- Conventional commits (`feat:`, `test:`, `docs:`, `chore:`, `refactor:`).
+- Tests are hermetic: fake provider + fake clock; provider clients test
+  against `httptest`; **no network or credentials in the default suite**.
+  Live tests sit behind `//go:build live` / `examples/live/` and skip
+  without their env vars.
+- `.env` holds live keys (gitignored, never committed/printed/quoted);
+  `.env.example` carries placeholders.
+
+## Gates (every phase; what CI runs)
+
+```
+go build ./...
+go vet ./...
+go test -race -count=1 ./...
+go mod tidy && git diff --exit-code go.mod go.sum
+```
+
+CI: `.gitea/workflows/ci.yaml` (Gitea Actions, mirrors foreman). README.md
+must match reality in the same commit that changes behavior — no
+aspirational docs; unbuilt features are marked pending in the matrix.
+
+## Out of scope (anti-creep)
+
+No persistent store (health is in-memory behind the registry), no
+observability/metrics stack, no config-file framework beyond LLM_* env
+DSNs, no CLI beyond examples, no provider-specific features leaking into
+the canonical API, nothing mort-specific in the library.
@@ -1,2 +1,216 @@
 # majordomo

+A clean-slate Go library for building LLM-backed agents: one canonical API
+over many model providers, a parseable model naming / failover / tiering
+system with built-in health tracking, capability-aware multimodality, tool
+calls, structured output, and composable agents and skills.
+
+> **Status:** under construction, phase by phase. The
+> [support matrix](#featureprovider-support-matrix) below is kept honest:
+> *pending* means not built yet, and this README is updated in the same
+> commit as the behavior it describes.
+
+## Install
+
+```bash
+go get gitea.stevedudenhoeffer.com/steve/majordomo
+```
+
+Requires Go 1.26+.
+
+## Quickstart
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+
+    "gitea.stevedudenhoeffer.com/steve/majordomo"
+)
+
+func main() {
+    reg := majordomo.New() // built-ins + LLM_* env providers
+
+    m, err := reg.Parse("ollama-cloud/minimax-m3:cloud")
+    if err != nil { panic(err) }
+
+    resp, err := m.Generate(context.Background(), majordomo.Request{
+        Messages: []majordomo.Message{majordomo.UserText("hello!")},
+    })
+    if err != nil { panic(err) }
+    fmt.Println(resp.Text())
+}
+```
+
+`majordomo.Parse(...)` (package level) uses a lazily-built default registry
+if you don't need isolation.
+
+## Model specs: targets, chains, tiers
+
+A model spec is a comma-separated **failover chain**; each element is either
+a `provider/model` target or a registered **alias** (tier):
+
+```go
+// Try minimax-m3 first; on failure kimi-k2.6; finally fall back to opus-4.8.
+m, _ := reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8")
+
+// Identical, with the registered alias "thinking" appended and expanded
+// in place as the tail of the chain:
+m, _ = reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8,thinking")
+```
+
+Everything after the **first `/`** (up to the next comma) is the model id,
+passed to the provider **verbatim** — tags (`:cloud`, `:30b`) and ids with
+extra slashes survive intact. majordomo never validates ids against a
+catalog.
+
+### Custom tiers (aliases)
+
+```go
+reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")
+reg.RegisterAlias("workhorse", "ollama-cloud/minimax-m2.7:cloud,ollama-cloud/qwen3-coder:480b-cloud")
+
+m, _ := reg.Parse("thinking") // a chain, same Model interface as a single target
+```
+
+Aliases may appear anywhere in a chain (head, middle, tail), may reference
+other aliases, and expand inline and recursively; cycles are detected and
+returned as errors.
+
+### Failover & health
+
+Chains are health-tracked per target:
+
+- A **single transient error** (429/5xx, timeout, connection failure) is
+  retried once on the same target.
+- **Repeated transient errors** (default: 2 consecutive failed attempts)
+  bench the target — chains skip it until its cooldown expires (exponential:
+  5s, 10s, 20s, ... capped at 5m). Any success resets it.
+- `model not found` advances down the chain without penalty; auth/malformed
+  errors fail fast (failing over can't fix a bad key). All knobs are
+  configurable via `WithChainConfig` / `WithHealthConfig`.
+- If every element fails, you get one joined error naming each target and
+  why it failed.
+
+## Providers
+
+### Built-in env vars
+
+| Provider | Spec name | Key env var | Default endpoint |
+|----------|-----------|-------------|------------------|
+| OpenAI (+compatible) | `openai` | `OPENAI_API_KEY` | api.openai.com *(pending)* |
+| Anthropic (+compatible) | `anthropic` | `ANTHROPIC_API_KEY` | api.anthropic.com *(pending)* |
+| Google (Gemini) | `google` | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | Gen AI API *(pending)* |
+| Ollama Cloud | `ollama-cloud` | `OLLAMA_API_KEY` | https://ollama.com *(pending)* |
+| Ollama (local) | `ollama` | — | `OLLAMA_HOST` or http://localhost:11434 *(pending)* |
+| foreman | `foreman` | — (token via DSN) | requires DSN/base URL *(pending)* |
+
+### `LLM_*` env-DSN provider definitions
+
+Define named providers entirely from the environment (go-llm parity):
+
+```
+LLM_M1=foreman://test-token-change-me@foreman-m1.orgrimmar.dudenhoeffer.casa
+LLM_M5=foreman://test-token-change-me@foreman-m5.orgrimmar.dudenhoeffer.casa
+```
+
+defines providers `m1` and `m5` (foreman targets — native Ollama wire
+protocol behind a bearer token). They are first-class in `Parse`, chains,
+and aliases:
+
+```go
+m, _ := reg.Parse("m5/qwen3:30b,m1/qwen3:30b,thinking")
+```
+
+DSN format: `scheme://[token@]host[/path]`, scheme ∈ `foreman`, `ollama`,
+`ollama-cloud`, `openai`, `anthropic`, `google`/`gemini`, or any scheme you
+add with `RegisterScheme`. The token is the credential (bearer token / API
+key); the base URL is always `https://host[/path]`. `New()` loads `LLM_*`
+vars eagerly; unknown provider names also resolve lazily at Parse time
+(`my-prov/x` → `LLM_MY_PROV`).
+
+### Custom providers
+
+Implement the two-method `Provider` interface and register it:
+
+```go
+reg.RegisterProvider(myProvider) // now "myprovider/model-x" parses, chains, aliases
+```
+
+## Multimodality *(pending — Phase 3)*
+
+Attach images without knowing the target's limits; majordomo normalizes
+(downscale, re-encode, count/size limits) against the resolved target's
+declared capabilities and rejects clearly what cannot fit.
+
+```go
+resp, err := m.Generate(ctx, majordomo.Request{
+    Messages: []majordomo.Message{
+        majordomo.UserParts(majordomo.Text("what's in this image?"),
+            majordomo.Image("image/png", pngBytes)),
+    },
+})
+```
+
+## Tool calls *(canonical API ready; provider wiring pending — Phase 3)*
+
+```go
+weather := majordomo.Tool{
+    Name:        "get_weather",
+    Description: "Current weather for a city",
+    Parameters:  json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}`),
+    Handler: func(ctx context.Context, args json.RawMessage) (any, error) {
+        var p struct{ City string `json:"city"` }
+        _ = json.Unmarshal(args, &p)
+        return map[string]any{"city": p.City, "temp_c": 21}, nil
+    },
+}
+resp, _ := m.Generate(ctx, req, majordomo.WithTools(weather))
+// resp.ToolCalls → execute → append ToolResultsMessage → continue
+```
+
+## Structured output *(canonical API ready; provider wiring pending — Phase 3)*
+
+```go
+resp, _ := m.Generate(ctx, req, majordomo.WithSchema(schemaJSON, "answer"))
+```
+
+A generic `Generate[T]` helper (schema from your struct, unmarshal into it)
+lands with the agent phase.
+
+## Agents & skills *(pending — Phases 5–6)*
+
+Agents = model + system prompt + toolboxes, running a tool-dispatch loop;
+skills = reusable instruction+tool bundles attachable to any agent.
+
+## Feature/provider support matrix
+
+| Provider | Resolve/Parse | Chat | Streaming | Tools | Structured | Images | Env DSN |
+|----------------------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| OpenAI (+compatible) | ✅ | pending | pending | pending | pending | pending | ✅ |
+| Anthropic (+compat) | ✅ | pending | pending | pending | pending | pending | ✅ |
+| Google (Gemini) | ✅ | pending | pending | pending | pending | pending | ✅ |
+| Ollama Cloud | ✅ | pending | pending | pending | pending | pending | ✅ |
+| Ollama (local) | ✅ | pending | pending | pending | pending | pending | ✅ |
+| foreman | ✅ | pending | pending | pending | pending | pending | ✅ |
+| fake (testing) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |
+
+Cross-cutting: Parse grammar ✅ · aliases/tiers ✅ · failover chains ✅ ·
+health tracking/backoff ✅ · LLM_* env DSNs ✅ · media pipeline pending ·
+agent loop pending · skills pending · `Generate[T]` pending.
+
+## Development
+
+```bash
+go build ./... && go vet ./... && go test -race -count=1 ./...
+```
+
+The default test suite is fully hermetic (no network, no credentials).
+Live integration tests (Phase 8) are gated behind the `live` build tag and
+read `.env` (see `.env.example`; never commit `.env`).
+
+Design decisions are recorded in [docs/adr/](docs/adr/README.md);
+conventions in [CLAUDE.md](CLAUDE.md); build history in
+[progress.md](progress.md).
@@ -0,0 +1,82 @@
+package majordomo
+
+import (
+	"context"
+	"fmt"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+)
+
+// Built-in provider names. Real client implementations land per-phase
+// (see progress.md); until a provider's phase ships, its registration is a
+// stub that resolves (so specs parse and env DSNs load) but errors on use.
+const (
+	ProviderOpenAI      = "openai"
+	ProviderAnthropic   = "anthropic"
+	ProviderGoogle      = "google"
+	ProviderOllama      = "ollama"
+	ProviderOllamaCloud = "ollama-cloud"
+	ProviderForeman     = "foreman"
+)
+
+// registerBuiltins installs the built-in providers and env-DSN scheme
+// factories into a fresh registry.
+func registerBuiltins(r *Registry) {
+	stub := func(kind string) SchemeFactory {
+		return func(name string, dsn DSN) (llm.Provider, error) {
+			return &stubProvider{name: name, kind: kind, baseURL: dsn.BaseURL(), token: dsn.Token}, nil
+		}
+	}
+
+	for _, kind := range []string{
+		ProviderOpenAI, ProviderAnthropic, ProviderGoogle,
+		ProviderOllama, ProviderOllamaCloud, ProviderForeman,
+	} {
+		r.providers[kind] = &stubProvider{name: kind, kind: kind}
+		r.schemes[kind] = stub(kind)
+	}
+	// "gemini" is an alternate scheme for the Google provider.
+	r.schemes["gemini"] = stub(ProviderGoogle)
+}
+
+// stubProvider stands in for a provider implementation that lands in a
+// later phase. It resolves and carries its connection details (so Parse,
+// chains, and env loading are fully functional) but errors on use.
+type stubProvider struct {
+	name    string
+	kind    string
+	baseURL string
+	token   string
+}
+
+func (s *stubProvider) Name() string { return s.name }
+
+func (s *stubProvider) Model(id string, opts ...llm.ModelOption) (llm.Model, error) {
+	cfg := llm.ApplyModelOptions(opts)
+	return &stubModel{provider: s, id: id, cfg: cfg}, nil
+}
+
+type stubModel struct {
+	provider *stubProvider
+	id       string
+	cfg      llm.ModelConfig
+}
+
+func (m *stubModel) err() error {
+	return fmt.Errorf("majordomo: provider %q (%s) is not implemented yet", m.provider.name, m.provider.kind)
+}
+
+func (m *stubModel) Generate(context.Context, llm.Request, ...llm.Option) (*llm.Response, error) {
+	return nil, m.err()
+}
+
+func (m *stubModel) Stream(context.Context, llm.Request, ...llm.Option) (llm.Stream, error) {
+	return nil, m.err()
+}
+
+func (m *stubModel) Capabilities() llm.Capabilities {
+	if m.cfg.Capabilities != nil {
+		return *m.cfg.Capabilities
+	}
+	return llm.Capabilities{}
+}
@@ -0,0 +1,134 @@
+package majordomo
+
+import (
+	"context"
+	"errors"
+	"fmt"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/health"
+	"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+)
+
+// ErrChainExhausted reports that every element of a failover chain failed
+// (or was skipped while backed off). It is always joined with the
+// per-target errors.
+var ErrChainExhausted = errors.New("all chain targets failed")
+
+// chainTarget is one resolved element of a failover chain.
+type chainTarget struct {
+	// key identifies the target for health tracking: "provider/model-id".
+	key   string
+	model llm.Model
+}
+
+// chain implements llm.Model over an ordered list of targets with
+// health-tracked failover. A single-element spec is a chain of one — the
+// behavior (retry-on-transient, backoff bookkeeping) is identical, so
+// callers never branch on what Parse returned.
+//
+// Semantics (ADR-0006, ADR-0008):
+//   - Targets are tried head-to-tail; targets currently backed off are
+//     skipped.
+//   - A transient error is retried on the same target (ChainConfig
+//     TransientRetries, default 1). Every failed attempt counts toward the
+//     target's consecutive-failure threshold; when the tracker benches the
+//     target (default: 2 consecutive transient failures → exponential
+//     capped cooldown) the chain stops retrying it and advances.
+//   - Model-not-found advances without penalizing health. Other permanent
+//     errors fail fast by default (AdvanceOnPermanent flips this).
+//   - Any success resets the target's health.
+//   - When every target fails or is skipped, the returned error joins
+//     ErrChainExhausted with each target's reason.
+type chain struct {
+	targets []chainTarget
+	tracker *health.Tracker
+	cfg     ChainConfig
+}
+
+// Targets returns the resolved "provider/model" keys in chain order
+// (diagnostics and tests).
+func (c *chain) Targets() []string {
+	keys := make([]string, len(c.targets))
+	for i, t := range c.targets {
+		keys[i] = t.key
+	}
+	return keys
+}
+
+// Capabilities reports the head element's capabilities — the chain's
+// preferred target (ADR-0008). Per-attempt media normalization uses the
+// actual target's capabilities, not this value.
+func (c *chain) Capabilities() llm.Capabilities {
+	return c.targets[0].model.Capabilities()
+}
+
+// Generate tries each target per the chain semantics above.
+func (c *chain) Generate(ctx context.Context, req llm.Request, opts ...llm.Option) (*llm.Response, error) {
+	req = req.Apply(opts...)
+	return chainDo(ctx, c, func(ctx context.Context, t chainTarget) (*llm.Response, error) {
+		return t.model.Generate(ctx, req)
+	})
+}
+
+// Stream tries each target per the chain semantics. Failover applies to
+// establishing the stream; once a stream is open, mid-stream errors
+// propagate to the consumer rather than restarting on another target
+// (replaying half-delivered output would duplicate content).
+func (c *chain) Stream(ctx context.Context, req llm.Request, opts ...llm.Option) (llm.Stream, error) {
+	req = req.Apply(opts...)
+	return chainDo(ctx, c, func(ctx context.Context, t chainTarget) (llm.Stream, error) {
+		return t.model.Stream(ctx, req)
+	})
+}
+
+// chainDo runs the head-to-tail failover algorithm around an attempt
+// function, generic over the result type (response vs stream).
+func chainDo[T any](ctx context.Context, c *chain, attempt func(context.Context, chainTarget) (T, error)) (T, error) {
+	var zero T
+	var failures []error
+
+	for _, t := range c.targets {
+		if !c.tracker.Available(t.key) {
+			until := c.tracker.BackedOffUntil(t.key)
+			failures = append(failures, fmt.Errorf("%s: skipped (backed off until %s)", t.key, until.Format("15:04:05.000")))
+			continue
+		}
+
+		retries := c.cfg.retries()
+		for attemptN := 0; ; attemptN++ {
+			if err := ctx.Err(); err != nil {
+				return zero, err
+			}
+			result, err := attempt(ctx, t)
+			if err == nil {
+				c.tracker.ReportSuccess(t.key)
+				return result, nil
+			}
+
+			class := c.cfg.classify(err)
+			if class == llm.ClassPermanent {
+				if errors.Is(err, llm.ErrModelNotFound) || c.cfg.AdvanceOnPermanent {
+					// Not a health problem (or policy says keep going):
+					// advance without penalizing the target.
+					failures = append(failures, fmt.Errorf("%s: %w", t.key, err))
+					break
+				}
+				// Failing over cannot fix a bad request or bad credentials.
+				return zero, fmt.Errorf("%s: %w", t.key, err)
+			}
+
+			// Transient: every failed attempt counts toward the target's
+			// consecutive-failure threshold. Retry the same target while
+			// attempts remain — but advance as soon as the tracker benches
+			// it (a freshly backed-off target is not worth more retries).
+			benched := c.tracker.ReportFailure(t.key)
+			if !benched && attemptN < retries {
+				continue
+			}
+			failures = append(failures, fmt.Errorf("%s: %w", t.key, err))
+			break
+		}
+	}
+
+	return zero, errors.Join(append([]error{ErrChainExhausted}, failures...)...)
+}
@@ -0,0 +1,207 @@
+package majordomo
+
+import (
+	"context"
+	"errors"
+	"io"
+	"net/http"
+	"strings"
+	"testing"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+	"gitea.stevedudenhoeffer.com/steve/majordomo/provider/fake"
+)
+
+func transientErr(model string) error {
+	return &llm.APIError{Provider: "fp", Model: model, Status: http.StatusServiceUnavailable, Message: "overloaded"}
+}
+
+func authErr(model string) error {
+	return &llm.APIError{Provider: "fp", Model: model, Status: http.StatusUnauthorized, Message: "bad key"}
+}
+
+func notFoundErr(model string) error {
+	return &llm.APIError{Provider: "fp", Model: model, Status: http.StatusNotFound, Message: "no such model"}
+}
+
+// TestChainSingleTransientRecoversViaRetry: one blip, same target succeeds
+// on the retry — the request never fails over.
+func TestChainSingleTransientRecoversViaRetry(t *testing.T) {
+	r := newTestRegistry(t)
+	fp := fake.New("fp")
+	r.RegisterProvider(fp)
+	fp.Enqueue("a", fake.Fail(transientErr("a")), fake.Reply("recovered"))
+
+	m, err := r.Parse("fp/a,fp/b")
+	if err != nil {
+		t.Fatalf("Parse: %v", err)
+	}
+	resp, err := m.Generate(context.Background(), Request{Messages: []Message{UserText("hi")}})
+	if err != nil {
+		t.Fatalf("Generate: %v", err)
+	}
+	if resp.Text() != "recovered" {
+		t.Errorf("text = %q, want recovered (same-target retry)", resp.Text())
+	}
+	if got := fp.CallCount("a"); got != 2 {
+		t.Errorf("target a saw %d calls, want 2 (initial + retry)", got)
+	}
+	if got := fp.CallCount("b"); got != 0 {
+		t.Errorf("target b saw %d calls, want 0", got)
+	}
+}
+
+// TestChainRepeatedTransientFailsOver: the head exhausts its retry, gets
+// benched, and the chain advances to the next element.
+func TestChainRepeatedTransientFailsOver(t *testing.T) {
+	r := newTestRegistry(t)
+	fp := fake.New("fp")
+	r.RegisterProvider(fp)
+	fp.Enqueue("a", fake.Fail(transientErr("a")), fake.Fail(transientErr("a")))
+	fp.Enqueue("b", fake.Reply("from-b"), fake.Reply("from-b"))
+
+	m, err := r.Parse("fp/a,fp/b")
+	if err != nil {
+		t.Fatalf("Parse: %v", err)
+	}
+	resp, err := m.Generate(context.Background(), Request{Messages: []Message{UserText("hi")}})
+	if err != nil {
+		t.Fatalf("Generate: %v", err)
+	}
+	if resp.Text() != "from-b" {
+		t.Errorf("text = %q, want from-b", resp.Text())
+	}
+	// Two consecutive transient failures hit the default threshold: the
+	// head is now backed off and skipped on the next request.
+	if r.Health().Available("fp/a") {
+		t.Error("fp/a should be backed off after two consecutive transient failures")
+	}
+	resp2, err := m.Generate(context.Background(), Request{Messages: []Message{UserText("again")}})
+	if err != nil {
+		t.Fatalf("Generate #2: %v", err)
+	}
+	if resp2.Text() != "from-b" {
+		t.Errorf("second response = %q, want from-b (head skipped)", resp2.Text())
+	}
+	if got := fp.CallCount("a"); got != 2 {
+		t.Errorf("backed-off target a saw %d calls, want 2", got)
+	}
+}
+
+// TestChainPermanentAuthFailsFast: failing over cannot fix bad credentials.
+func TestChainPermanentAuthFailsFast(t *testing.T) {
+	r := newTestRegistry(t)
+	fp := fake.New("fp")
+	r.RegisterProvider(fp)
+	fp.Enqueue("a", fake.Fail(authErr("a")))
+
+	m, _ := r.Parse("fp/a,fp/b")
+	_, err := m.Generate(context.Background(), Request{Messages: []Message{UserText("hi")}})
+	if err == nil {
+		t.Fatal("want error")
+	}
+	var apiErr *llm.APIError
+	if !errors.As(err, &apiErr) || apiErr.Status != http.StatusUnauthorized {
+		t.Errorf("error = %v, want the 401 APIError", err)
+	}
+	if got := fp.CallCount("b"); got != 0 {
+		t.Errorf("target b saw %d calls, want 0 (fail-fast)", got)
+	}
+	if !r.Health().Available("fp/a") {
+		t.Error("permanent errors must not penalize health")
+	}
+}
+
+// TestChainModelNotFoundAdvances: 404 advances without a health penalty.
+func TestChainModelNotFoundAdvances(t *testing.T) {
+	r := newTestRegistry(t)
+	fp := fake.New("fp")
+	r.RegisterProvider(fp)
+	fp.Enqueue("a", fake.Fail(notFoundErr("a")))
+	fp.Enqueue("b", fake.Reply("from-b"))
+
+	m, _ := r.Parse("fp/a,fp/b")
+	resp, err := m.Generate(context.Background(), Request{Messages: []Message{UserText("hi")}})
+	if err != nil {
+		t.Fatalf("Generate: %v", err)
+	}
+	if resp.Text() != "from-b" {
+		t.Errorf("text = %q, want from-b", resp.Text())
+	}
+	if !r.Health().Available("fp/a") {
+		t.Error("model-not-found must not penalize health")
+	}
+}
+
+// TestChainExhaustedJoinsErrors: when everything fails the error names what
+// was tried and why each failed.
+func TestChainExhaustedJoinsErrors(t *testing.T) {
+	r := newTestRegistry(t)
+	fp := fake.New("fp")
+	r.RegisterProvider(fp)
+	fp.Enqueue("a", fake.Fail(transientErr("a")), fake.Fail(transientErr("a")))
+	fp.Enqueue("b", fake.Fail(notFoundErr("b")))
+
+	m, _ := r.Parse("fp/a,fp/b")
+	_, err := m.Generate(context.Background(), Request{Messages: []Message{UserText("hi")}})
+	if !errors.Is(err, ErrChainExhausted) {
+		t.Fatalf("error = %v, want ErrChainExhausted", err)
+	}
+	for _, frag := range []string{"fp/a", "fp/b", "overloaded", "no such model"} {
+		if !strings.Contains(err.Error(), frag) {
+			t.Errorf("joined error %q should mention %q", err.Error(), frag)
+		}
+	}
+}
+
+func TestChainStream(t *testing.T) {
+	r := newTestRegistry(t)
+	fp := fake.New("fp")
+	r.RegisterProvider(fp)
+	fp.Enqueue("a", fake.Fail(transientErr("a")), fake.Fail(transientErr("a")))
+	fp.Enqueue("b", fake.Reply("streamed"))
+
+	m, _ := r.Parse("fp/a,fp/b")
+	s, err := m.Stream(context.Background(), Request{Messages: []Message{UserText("hi")}})
+	if err != nil {
+		t.Fatalf("Stream: %v", err)
+	}
+	defer s.Close()
+
+	var text string
+	var final *Response
+	for {
+		ev, err := s.Next()
+		if errors.Is(err, io.EOF) {
+			break
+		}
+		if err != nil {
+			t.Fatalf("Next: %v", err)
+		}
+		text += ev.TextDelta
+		if ev.Response != nil {
+			final = ev.Response
+		}
+	}
+	if text != "streamed" {
+		t.Errorf("streamed text = %q, want streamed", text)
+	}
+	if final == nil {
+		t.Fatal("missing final response event")
+	}
+}
+
+// TestChainContextCancellation: a canceled context aborts immediately.
+func TestChainContextCancellation(t *testing.T) {
+	r := newTestRegistry(t)
+	fp := fake.New("fp")
+	r.RegisterProvider(fp)
+
+	ctx, cancel := context.WithCancel(context.Background())
+	cancel()
+	m, _ := r.Parse("fp/a,fp/b")
+	_, err := m.Generate(ctx, Request{Messages: []Message{UserText("hi")}})
+	if !errors.Is(err, context.Canceled) {
+		t.Errorf("error = %v, want context.Canceled", err)
+	}
+}
@@ -0,0 +1,46 @@
+# ADR-0001: Package layout — canonical types in a leaf `llm` package, root re-exports
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Provider implementations (openai, anthropic, google, ollama/foreman) must share
+the canonical types (Message, Request, Response, Capabilities, Model, Provider).
+If those types lived in the root `majordomo` package, the root could not also
+register built-in providers (root → provider/openai → root is an import cycle).
+go-llm solved this with a `v2/provider` leaf package; the kickoff sketch puts
+the Provider interface in `provider/provider.go` and the message types at root,
+which recreates the cycle.
+
+## Decision
+
+- All canonical contract types live in the leaf package
+  `majordomo/llm` (Message, Part, Request, Response, Option, Tool, Toolbox,
+  Capabilities, Stream, Model, Provider, error classification). It imports
+  nothing else in the module.
+- The root `majordomo` package re-exports every canonical type via type
+  aliases (plus constructor/option wrappers), so consumers write
+  `majordomo.Request`, `majordomo.UserText(...)` and rarely import `llm`.
+- The root owns assembly: Registry, Parse, env-DSN loading, the chain
+  executor, and (from Phase 3) registration of real provider clients.
+- The planned `resolve/` package is folded into the root: the grammar needs
+  registry state (aliases, providers, env fallback) at every expansion step,
+  and a callback interface between two packages bought nothing but
+  indirection.
+- `health/`, `media/`, `provider/<impl>/`, `provider/fake/`, `agent/`, and
+  `skill/` are subpackages importing `llm` (and never each other, except
+  agent → skill).
+
+## Consequences
+
+- No import cycles; new providers are additive subpackages.
+- Consumers get the flat one-import API the kickoff sketches.
+- Type aliases (not wrappers) mean zero conversion cost and full
+  interchangeability between `majordomo.X` and `llm.X`.
+
+## Alternatives considered
+
+- **Everything in root.** No cycles only if providers also live in root —
+  a single giant package. Rejected.
+- **Self-registering providers via package init() side effects.** Hides
+  wiring, breaks multi-registry isolation, surprises tests. Rejected.
@@ -0,0 +1,53 @@
+# ADR-0002: Canonical message/content model
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Every provider has a different wire shape for conversations, content,
+tool calls, and system prompts. majordomo needs one canonical shape that all
+providers translate to/from, expressive enough for multimodality and tool
+loops, small enough to keep providers honest.
+
+## Decision
+
+- `Message{Role, Parts, ToolCalls, ToolResults}` with roles system / user /
+  assistant / tool. `Part` is a **sealed** interface (`TextPart`,
+  `ImagePart`) so providers can switch exhaustively; new media kinds are
+  deliberate API changes, not silent pass-throughs.
+- `ImagePart` is **bytes + MIME only** — no URL form. The media pipeline
+  must inspect/resize/transcode images against target capabilities, which
+  requires bytes; fetching remote URLs is the caller's job, not a hidden
+  network dependency inside a model call.
+- `Request.System` is a dedicated top-level field (maps to Anthropic
+  `system`, Google `SystemInstruction`, an OpenAI/Ollama system message).
+  RoleSystem messages in the history are also accepted and folded by
+  providers. Request also carries Tools, ToolChoice, Schema/SchemaName, and
+  sampling knobs; per-call mutation happens via `Option` funcs applied to a
+  copy, so Request values are reusable.
+- Model ids never carry behavior suffixes: unlike go-llm there is **no
+  `:low/:medium/:high` reasoning-suffix grammar** (it conflicts with
+  verbatim model ids like `minimax-m3:cloud`, see ADR-0003). Reasoning
+  effort will be a request option when providers land.
+- `Response{Parts, ToolCalls, FinishReason, Usage, Model, Raw}` — `Model`
+  names the target that actually served the request (vital with chains);
+  `Raw` is the provider-native escape hatch, never required.
+- Streaming (`Stream.Next() → StreamEvent`): text deltas stream as they
+  arrive; **tool-call arguments are buffered until complete** (consumers
+  never see partial JSON); the final event carries the accumulated
+  `*Response`; `io.EOF` terminates.
+
+## Consequences
+
+- Providers stay translation layers; nothing provider-specific leaks into
+  the canonical API.
+- Callers needing remote images fetch them first — explicit, testable.
+- Partial-tool-call streaming UIs are out of scope (acceptable: arguments
+  are rarely useful before they parse).
+
+## Alternatives considered
+
+- Open `Part` interface — silent content drops on unknown kinds. Rejected.
+- URL image parts with lazy fetch — hidden I/O inside Generate, breaks
+  capability normalization. Rejected.
+- go-llm-style reasoning suffixes — see ADR-0003. Rejected.
@@ -0,0 +1,57 @@
+# ADR-0003: Parse grammar — verbatim model ids, inline alias expansion, chains
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Callers (mort first) address models by string: single targets, tier aliases,
+and comma-separated failover chains, with custom and env-defined providers as
+first-class elements. go-llm's grammar is close but nests alias-chains as
+composite Models and strips `:low/:medium/:high` reasoning suffixes, which
+collides with Ollama-style tags (`minimax-m3:cloud`) and Google-style ids.
+
+## Decision
+
+Grammar (binding, from the kickoff):
+
+```
+spec    := element ("," element)*
+element := target | alias
+target  := provider "/" model      # model = everything after the FIRST "/",
+                                   # up to the next comma, passed VERBATIM
+alias   := bare token, no slash
+```
+
+- Provider resolution order per target: registered providers (built-ins,
+  RegisterProvider, eagerly env-loaded) → lazy `LLM_{UPPER(name)}` env DSN
+  (ADR-0004) → error naming both places checked.
+- Aliases expand **inline** wherever they appear (head/middle/tail),
+  recursively, into the flat element list. Cycles are detected via the
+  expansion stack and return `ErrAliasCycle` — never a hang. Inline (not
+  nested-Model, as in go-llm) expansion keeps one flat chain so health
+  skipping and error reporting see every element uniformly.
+- Duplicate elements after expansion are dropped (first occurrence wins):
+  retrying an already-failed target in the same pass is never useful.
+- A single element and a multi-element chain return the same `Model`
+  (a chain of one) — identical retry/health semantics, callers never branch.
+- **No reasoning-suffix stripping.** mort's `:high` dialect is handled by
+  mort's spec layer during migration; majordomo will expose reasoning effort
+  as an explicit request option instead.
+- The package-level `Default()` registry (lazy, loads process env) backs
+  `majordomo.Parse` for go-llm-style one-call ergonomics; `New()` builds
+  isolated registries for tests/multi-tenant use.
+
+## Consequences
+
+- `m1/richardyoung/qwen3-14b-abliterated:q4_K_M` (a real mort tier value)
+  parses as provider `m1`, model `richardyoung/qwen3-14b-abliterated:q4_K_M`.
+- A bare token that is a provider name yields a targeted error
+  ("use openai/<model-id>").
+- Alias updates after Parse don't affect already-built Models (expansion is
+  at Parse time). mort re-parses per request, so DB-tier edits still apply.
+
+## Alternatives considered
+
+- Nested alias expansion (go-llm): opaque chains inside chains; health
+  skipping can't see the elements. Rejected.
+- Reasoning suffixes in the grammar: breaks verbatim ids. Rejected.
@@ -0,0 +1,60 @@
+# ADR-0004: LLM_* env-DSN provider definitions (go-llm parity, plus eager load)
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Steve's deployments define providers via env vars that must keep working
+unchanged:
+
+```
+LLM_M1=foreman://token@foreman-m1.orgrimmar.dudenhoeffer.casa
+LLM_M5=foreman://token@foreman-m5.orgrimmar.dudenhoeffer.casa
+```
+
+go-llm (v2/parse.go) implements this **lazily only**: `Parse("m5/x")` misses
+the registry, computes `LLM_` + UPPER(name) with `-`→`_`, reads exactly that
+var, parses `scheme://[token@]host[/path]` by plain string splits, requires
+the scheme to be a registered provider, and dials `https://` + host. There is
+no environment scan. The kickoff additionally requires `New()` to load LLM_*
+providers eagerly and a testable `LoadEnv(map)`.
+
+## Decision
+
+Implement **both** paths over one DSN parser (byte-for-byte go-llm
+semantics — `://` split, first-`@` split, trailing-`/` trim, ErrInvalidDSN on
+missing scheme/host, base URL always `https://host[/path]`):
+
+- **Eager:** `New()` scans the process environment for `LLM_<NAME>` and
+  registers each as provider `lower(<NAME>)` (underscores preserved:
+  `LLM_MY_BOX` → `my_box`). `LoadEnv(map[string]string)` is the explicit,
+  testable entry. Malformed entries never fail construction: they are
+  recorded per-name, returned joined from LoadEnv, and surface from Parse
+  only when that name is actually referenced (matching go-llm's
+  fail-on-use behavior).
+- **Lazy (go-llm parity):** an unknown provider name in Parse falls back to
+  `LLM_{UPPER(name, - → _)}`, so hyphenated spec names (`my-prov/x` →
+  `LLM_MY_PROV`) work exactly as in go-llm. Lazily resolved providers are
+  cached in the registry.
+- The DSN **scheme** selects a `SchemeFactory` (foreman, ollama,
+  ollama-cloud, openai, anthropic, google, gemini; extensible via
+  `RegisterScheme`). The factory receives the registry name and the parsed
+  DSN (token = credential, `https://host` = base URL).
+
+## Consequences
+
+- Existing muscle memory carries over: every go-llm-resolvable LLM_* var
+  resolves identically here.
+- Eager loading additionally makes env providers visible to discovery
+  (`Provider(name)`) before first use.
+- An env DSN cannot express plain-http endpoints (https is forced) — same
+  limitation as go-llm, kept deliberately for parity; local Ollama uses the
+  `ollama` provider's own default (`http://localhost:11434`) rather than a
+  DSN.
+
+## Alternatives considered
+
+- `url.Parse`-based DSN parsing: subtly different (percent-decoding,
+  userinfo passwords). Parity wins. Rejected.
+- Failing New() on malformed LLM_* vars: one stray var would break every
+  consumer at startup. Rejected.
@@ -0,0 +1,41 @@
+# ADR-0005: Provider interface and the capabilities model
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Each provider — and some individual models — imposes different limits (image
+dimensions/bytes/MIME/count, tools, structured output, streaming, context
+size). Callers must not need to know them; the library must normalize or
+clearly reject.
+
+## Decision
+
+- `Provider` is minimal: `Name()` and `Model(id, opts...) (Model, error)`.
+  Model ids pass through verbatim; providers never validate ids against a
+  catalog (models churn weekly; catalogs rot).
+- `Capabilities` is a plain struct declared **per provider** with
+  **per-model overrides** via `WithCapabilities` (a `ModelOption`). Zero
+  values mean: `MaxImagesPerReq == 0` → images unsupported;
+  `MaxImageBytes/MaxImageDimension/ContextWindow == 0` → no declared limit;
+  empty `AllowedImageMIME` → any type.
+- Providers construct without error even when credentials are missing; the
+  failure surfaces as an auth error at request time (and a chain can fail
+  over past it). Construction-time validation would make `New()` fragile.
+- Until a provider's implementation phase lands, built-ins register as
+  **stubs**: they resolve in Parse (so chains, aliases, and env DSNs are
+  fully functional) and return a clear "not implemented yet" error on use.
+
+## Consequences
+
+- The media pipeline (Phase 3, ADR to follow) can normalize against any
+  target uniformly.
+- Adding a provider is additive: implement two methods + declare
+  capabilities.
+
+## Alternatives considered
+
+- Capability methods on Model with provider-specific logic — pushes limits
+  knowledge into every caller. Rejected.
+- Model catalogs with validation — stale within weeks, breaks pass-through
+  targets like foreman. Rejected.
@@ -0,0 +1,48 @@
+# ADR-0006: Model health tracking and backoff
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Ollama Cloud models intermittently return "high demand" errors. mort's
+behavior to preserve: one blip should not fail a request (retry); a model
+that keeps failing should be benched so chains skip it, then re-admitted
+after a cooldown. majordomo owns this (the "model health tracker").
+
+## Decision
+
+In-memory, process-local, thread-safe tracker in `health/`, keyed by
+`"provider/model-id"`, with an **injected clock** (`func() time.Time`) so
+every backoff path is unit-testable without sleeping.
+
+- **Classification** (`llm.Classify`, overridable via `ChainConfig.Classify`):
+  transient = HTTP 408/429/5xx, network timeouts, connection refused/reset,
+  DNS failures, `context.DeadlineExceeded`; permanent = HTTP
+  400/401/403/404/405/422, `ErrModelNotFound`, `context.Canceled` (the
+  caller gave up — retrying defies intent). **Unknown errors default to
+  transient**: failing over can only help availability, and a wrongly
+  benched model self-heals via cooldown, while a wrongly fail-fasted request
+  is lost.
+- **Counting:** every failed transient *attempt* increments the target's
+  consecutive-failure count; any success resets count **and** backoff
+  exponent. At threshold (default **2**) the target is benched until
+  `now + cooldown`, with cooldown = base (default **5s**) × multiplier
+  (default **2**) per consecutive backoff round, capped (default **5m**).
+  After the bench triggers, the count resets, so re-benching needs a fresh
+  run of failures — but at the doubled cooldown.
+- All knobs (threshold, base/cap/multiplier, clock, classifier, retry count)
+  are configuration with the above defaults baked in.
+- **No persistence, no interface.** The tracker is a concrete type; health
+  is process-local by design (out-of-scope guardrail). A consumer wanting
+  shared state can wrap the registry; we do not build for it now.
+
+## Consequences
+
+- Deterministic tests via fake clock; no `time.Sleep` anywhere.
+- Two providers addressing the same upstream model (e.g. `m1/x` and `m5/x`)
+  track independently — correct, since the backends are different machines.
+
+## Alternatives considered
+
+- Persistent/pluggable health store — explicitly out of scope. Rejected.
+- Unknown→permanent default — drops availability on novel errors. Rejected.
@@ -0,0 +1,31 @@
+# ADR-0007: Dependency policy — stdlib-first, hand-rolled REST clients
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+go-llm leans on SDKs (openai-go, go-anthropic, genai) and carries their
+transitive weight and churn. The kickoff mandates minimal dependencies with
+full control over multimodal payloads and capability handling.
+
+## Decision
+
+- **Hand-rolled `net/http` JSON clients** for OpenAI(+compatible),
+  Anthropic(+compatible), Ollama (cloud + local), and foreman. Their REST
+  surfaces are small and stable; owning the wire shapes gives exact control
+  over tool calls, structured output, streaming, and image payloads.
+- **One approved third-party dependency:** the official Google Gen AI Go SDK
+  (`google.golang.org/genai`) for the Gemini provider — Google's surface
+  moves too much to hand-roll profitably.
+- Image normalization uses stdlib `image`, `image/jpeg`, `image/png`.
+  `golang.org/x/image` may be added **only** if a needed format demands it,
+  via a new ADR.
+- Any other third-party dependency requires its own ADR justifying it.
+- No persistent store, no metrics stack, no config framework, no CLI beyond
+  `examples/` (out-of-scope guardrails).
+
+## Consequences
+
+- `go.mod` stays near-empty; consumers inherit almost nothing transitively.
+- We own wire-format drift: provider docs are verified against current
+  documentation at implementation time and recorded in the provider ADRs.
@@ -0,0 +1,60 @@
+# ADR-0008: Failover-chain execution semantics
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+A parsed spec is an ordered chain of targets sharing the registry's health
+tracker. The executor must realize the kickoff's failover story (retry one
+blip; bench repeat offenders; skip benched targets; clear exhaustion errors)
+identically for chains of one and many.
+
+## Decision
+
+For each request, iterate elements head-to-tail:
+
+1. **Skip** targets currently benched (recorded in the exhaustion error).
+2. Attempt the target. On success → report success (resets health), return.
+3. On error, classify:
+   - **Permanent + model-not-found** → advance, no health penalty.
+   - **Permanent otherwise** (auth, malformed) → **fail fast** by default —
+     failing over cannot fix a bad request; `ChainConfig.AdvanceOnPermanent`
+     flips this for callers who prefer availability.
+   - **Transient** → report the failed attempt to the tracker; retry the
+     same target while attempts remain (`TransientRetries`, default 1)
+     **unless the tracker just benched it**, in which case advance
+     immediately.
+4. All elements failed/skipped → return `errors.Join(ErrChainExhausted,
+   per-target reasons...)` naming every target and why.
+
+Other decisions:
+
+- **Capabilities() = head element's capabilities.** The head is the
+  preferred target and the honest answer to "what should I prepare for?".
+  Per-attempt media normalization (Phase 3) uses the *actual* target's
+  capabilities, so fallbacks still get correctly-fitted inputs.
+  Intersection semantics were rejected: a rarely-used tail fallback would
+  artificially constrain every request.
+- **Streaming failover applies to stream establishment only.** Once a
+  stream is open, mid-stream errors propagate; silently restarting on
+  another target would re-deliver partial output.
+- `context.Canceled` aborts the chain immediately between and during
+  attempts.
+- Duplicate post-expansion elements were already dropped at Parse
+  (ADR-0003).
+
+## Consequences
+
+- "One transient error is fine" holds: blip → same-target retry succeeds,
+  no failover, one health mark that the success immediately clears... and
+  with default knobs (retries=1, threshold=2) a target whose retry also
+  fails is benched in the same request and the chain advances — exactly the
+  kickoff narrative.
+- Single-target specs get the same retry/backoff behavior for free.
+
+## Alternatives considered
+
+- Per-request (not per-attempt) failure counting — needs two failed
+  *requests* to bench, letting a dead model eat the retry budget twice.
+  Rejected as weaker than the kickoff's story.
+- Intersection capabilities — see above. Rejected.
@@ -0,0 +1,14 @@
+# Architecture Decision Records
+
+One decision per file, append-only; supersede rather than rewrite.
+
+| ADR | Title | Status |
+|-----|-------|--------|
+| [0001](0001-package-layout.md) | Package layout — canonical types in leaf `llm`, root re-exports | Accepted |
+| [0002](0002-canonical-message-model.md) | Canonical message/content model | Accepted |
+| [0003](0003-parse-grammar.md) | Parse grammar — verbatim ids, inline alias expansion, chains | Accepted |
+| [0004](0004-env-dsn-providers.md) | LLM_* env-DSN provider definitions (go-llm parity + eager load) | Accepted |
+| [0005](0005-provider-capabilities.md) | Provider interface and capabilities model | Accepted |
+| [0006](0006-health-and-backoff.md) | Model health tracking and backoff | Accepted |
+| [0007](0007-dependency-policy.md) | Dependency policy — stdlib-first, hand-rolled REST clients | Accepted |
+| [0008](0008-chain-semantics.md) | Failover-chain execution semantics | Accepted |
@@ -0,0 +1,84 @@
+# Phase 1 design summary (for after-the-fact review)
+
+Written at the Phase 1 → 2 boundary of the unattended build run
+(2026-06-10). Captures the public surface and the decisions behind it.
+Authoritative details live in the ADRs; this is the review digest.
+
+## What the library looks like to a consumer
+
+```go
+reg := majordomo.New()                      // built-ins + LLM_* env providers
+reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")
+
+m, err := reg.Parse("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking")
+resp, err := m.Generate(ctx, majordomo.Request{
+    System:   "You are terse.",
+    Messages: []majordomo.Message{majordomo.UserText("hi")},
+}, majordomo.WithMaxTokens(200))
+```
+
+- `Model` = `Generate` / `Stream` / `Capabilities`; a chain and a single
+  target are the same interface.
+- `Provider` = `Name` / `Model(id, opts...)`; ids verbatim, no catalogs.
+- Canonical types live in `majordomo/llm`, re-exported at root via aliases
+  (ADR-0001) — providers import `llm` only.
+
+## Parse grammar (ADR-0003)
+
+`spec := element ("," element)*`; element = `provider/model` (model id =
+everything after the first slash, verbatim) or a bare alias token expanded
+inline + recursively with cycle detection. Both kickoff README examples are
+covered by tests, including the trailing-`thinking` variant and dedup of
+overlapping alias expansions.
+
+**Deviation from go-llm worth reviewing:** no `:low/:medium/:high`
+reasoning-suffix stripping — it conflicts with verbatim ids
+(`minimax-m3:cloud`, `richardyoung/qwen3-14b-abliterated:q4_K_M` in mort's
+tiers). Plan: reasoning effort becomes an explicit request option when
+providers land; mort's wrapper translates its legacy suffix dialect during
+Phase 9. If you want suffix parity instead, it's an additive change behind
+a RegistryOption.
+
+## LLM_* env DSNs (ADR-0004)
+
+Parser is byte-for-byte go-llm (`scheme://[token@]host[/path]`, https
+forced, fail-on-use for malformed values). Two resolution paths:
+eager scan in `New()`/`LoadEnv(map)` (kickoff requirement;
+`LLM_M1` → provider `m1`) **plus** go-llm's lazy `LLM_{UPPER(name)}`
+fallback at Parse time (so hyphenated names keep working). Schemes are
+factories (`RegisterScheme`) — consumers can bind custom provider kinds to
+DSNs.
+
+## Health & chains (ADR-0006, ADR-0008)
+
+Clock-injected in-memory tracker keyed `provider/model`. Transient vs
+permanent via `llm.Classify` (unknown → transient; `context.Canceled` →
+permanent). Defaults: 1 same-target retry; bench after 2 consecutive failed
+attempts; cooldown 5s ×2 capped 5m; success resets everything. Chains skip
+benched targets, advance penalty-free on 404, fail fast on auth/malformed
+(flippable via `AdvanceOnPermanent`), and join per-target reasons on
+exhaustion. Chain `Capabilities()` = head element (per-attempt media
+normalization will use the actual target, Phase 3). Streaming failover
+covers stream establishment only.
+
+## Flagged for reconsideration
+
+1. **Reasoning suffixes** (above) — deliberate deviation, easy to add back.
+2. **Duplicate-element dedup in chains** (first occurrence wins): right for
+   health semantics, but means `a,b,a` won't retry `a` at the tail even
+   after `b` fails. Believed correct (same request, same bench state);
+   flag if "retry head last" matters to you.
+3. **`AdvanceOnPermanent` default = fail-fast** on auth/malformed errors:
+   matches the kickoff; mort's old behavior was closer to
+   advance-on-everything. Phase 9 can set the flag per-registry if mort's
+   UX prefers availability.
+4. **Stub built-ins**: until Phases 3–4, `openai/...` etc. parse fine and
+   error on use with "not implemented yet". Chains mixing stubs and real
+   providers will fail over past stubs naturally (the error classifies
+   transient) — temporary, gone by Phase 4.
+
+## ADR set
+
+0001 package layout · 0002 message model · 0003 parse grammar ·
+0004 env DSNs · 0005 provider/capabilities · 0006 health/backoff ·
+0007 dependency policy · 0008 chain semantics
@@ -0,0 +1,120 @@
+package majordomo
+
+import (
+	"errors"
+	"fmt"
+	"sort"
+	"strings"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+)
+
+// ErrInvalidDSN reports a malformed env-DSN value.
+var ErrInvalidDSN = errors.New("invalid DSN")
+
+// ErrUnknownProvider reports a spec element whose provider could not be
+// resolved through the registry or the LLM_* environment.
+var ErrUnknownProvider = errors.New("unknown provider")
+
+// DSN is a parsed provider Data Source Name, as used in LLM_* env vars.
+//
+// Format (go-llm parity): scheme://[token@]host[/path]
+//
+//	LLM_M1=foreman://test-token@foreman-m1.example.com
+//
+// defines provider "m1": a foreman target at https://foreman-m1.example.com
+// authenticated with the bearer token "test-token".
+type DSN struct {
+	// Scheme selects the provider implementation: "foreman", "ollama",
+	// "ollama-cloud", "openai", "anthropic", "google"/"gemini", or any
+	// custom scheme registered with RegisterScheme.
+	Scheme string
+	// Token is the provider secret (bearer token or API key); empty = none.
+	Token string
+	// Host is hostname[:port][/path] with no scheme prefix and no trailing
+	// slash.
+	Host string
+}
+
+// BaseURL returns the https base URL for the DSN host (go-llm parity:
+// env-defined providers always speak TLS).
+func (d DSN) BaseURL() string { return "https://" + d.Host }
+
+// ParseDSN parses a raw DSN string. The algorithm matches go-llm exactly:
+// split on "://", then an optional "@" separates the token from the host;
+// trailing slashes on the host are trimmed.
+func ParseDSN(raw string) (DSN, error) {
+	scheme, rest, found := strings.Cut(raw, "://")
+	if !found {
+		return DSN{}, fmt.Errorf("%w: missing scheme://: %q", ErrInvalidDSN, raw)
+	}
+
+	var token, host string
+	if before, after, hasAt := strings.Cut(rest, "@"); hasAt {
+		token = before
+		host = after
+	} else {
+		host = rest
+	}
+	host = strings.TrimRight(host, "/")
+	if host == "" {
+		return DSN{}, fmt.Errorf("%w: missing host: %q", ErrInvalidDSN, raw)
+	}
+	return DSN{Scheme: scheme, Token: token, Host: host}, nil
+}
+
+// LoadEnv registers a provider for every LLM_<NAME> entry in env. <NAME> is
+// lowercased to form the registry name (LLM_M1 → "m1"); the value is a DSN
+// whose scheme selects the factory. Entries that fail to parse are recorded
+// and their error is returned (joined) — and also surfaces later if the
+// name is referenced in Parse — but valid entries always register.
+//
+// New() calls this with the process environment; tests call it explicitly.
+func (r *Registry) LoadEnv(env map[string]string) error {
+	// Deterministic order makes error output stable.
+	keys := make([]string, 0, len(env))
+	for k := range env {
+		if strings.HasPrefix(k, "LLM_") && len(k) > len("LLM_") {
+			keys = append(keys, k)
+		}
+	}
+	sort.Strings(keys)
+
+	var errs []error
+	for _, key := range keys {
+		name := strings.ToLower(strings.TrimPrefix(key, "LLM_"))
+		p, err := r.providerFromDSN(name, env[key])
+		if err != nil {
+			err = fmt.Errorf("%s: %w", key, err)
+			errs = append(errs, err)
+			r.mu.Lock()
+			r.envErrs[name] = err
+			r.mu.Unlock()
+			continue
+		}
+		r.mu.Lock()
+		r.providers[name] = p
+		delete(r.envErrs, name)
+		r.mu.Unlock()
+	}
+	return errors.Join(errs...)
+}
+
+// providerFromDSN parses a DSN and builds a provider via its scheme factory.
+func (r *Registry) providerFromDSN(name, raw string) (llm.Provider, error) {
+	dsn, err := ParseDSN(raw)
+	if err != nil {
+		return nil, err
+	}
+	r.mu.RLock()
+	factory, ok := r.schemes[dsn.Scheme]
+	r.mu.RUnlock()
+	if !ok {
+		return nil, fmt.Errorf("%w: DSN scheme %q is not a registered scheme", ErrUnknownProvider, dsn.Scheme)
+	}
+	p, err := factory(name, dsn)
+	if err != nil {
+		return nil, fmt.Errorf("scheme %q: %w", dsn.Scheme, err)
+	}
+	return p, nil
+}
@@ -0,0 +1,195 @@
+package majordomo
+
+import (
+	"errors"
+	"slices"
+	"strings"
+	"testing"
+)
+
+func TestParseDSN(t *testing.T) {
+	tests := []struct {
+		raw     string
+		want    DSN
+		wantErr error
+	}{
+		{
+			raw:  "foreman://test-token-change-me@foreman-m1.orgrimmar.dudenhoeffer.casa",
+			want: DSN{Scheme: "foreman", Token: "test-token-change-me", Host: "foreman-m1.orgrimmar.dudenhoeffer.casa"},
+		},
+		{
+			raw:  "ollama://my-host.example:11434",
+			want: DSN{Scheme: "ollama", Token: "", Host: "my-host.example:11434"},
+		},
+		{
+			raw:  "openai://sk-key@api.example.com/v1/",
+			want: DSN{Scheme: "openai", Token: "sk-key", Host: "api.example.com/v1"},
+		},
+		{raw: "no-scheme-here", wantErr: ErrInvalidDSN},
+		{raw: "foreman://token@", wantErr: ErrInvalidDSN},
+		{raw: "foreman:///", wantErr: ErrInvalidDSN},
+	}
+	for _, tt := range tests {
+		got, err := ParseDSN(tt.raw)
+		if tt.wantErr != nil {
+			if !errors.Is(err, tt.wantErr) {
+				t.Errorf("ParseDSN(%q) error = %v, want %v", tt.raw, err, tt.wantErr)
+			}
+			continue
+		}
+		if err != nil {
+			t.Errorf("ParseDSN(%q): %v", tt.raw, err)
+			continue
+		}
+		if got != tt.want {
+			t.Errorf("ParseDSN(%q) = %+v, want %+v", tt.raw, got, tt.want)
+		}
+	}
+}
+
+func TestDSNBaseURL(t *testing.T) {
+	d := DSN{Scheme: "foreman", Host: "h.example:8443/base"}
+	if got, want := d.BaseURL(), "https://h.example:8443/base"; got != want {
+		t.Errorf("BaseURL = %q, want %q", got, want)
+	}
+}
+
+// TestLoadEnvForeman covers the required behavior: an LLM_* foreman DSN
+// defines a named provider that is first-class in Parse and in chains.
+func TestLoadEnvForeman(t *testing.T) {
+	r := newTestRegistry(t)
+	err := r.LoadEnv(map[string]string{
+		"LLM_M1": "foreman://test-token-change-me@foreman-m1.orgrimmar.dudenhoeffer.casa",
+		"LLM_M5": "foreman://test-token-change-me@foreman-m5.orgrimmar.dudenhoeffer.casa",
+	})
+	if err != nil {
+		t.Fatalf("LoadEnv: %v", err)
+	}
+
+	for _, name := range []string{"m1", "m5"} {
+		p, ok := r.Provider(name)
+		if !ok {
+			t.Fatalf("provider %q not registered", name)
+		}
+		sp, ok := p.(*stubProvider)
+		if !ok {
+			t.Fatalf("provider %q is %T, want *stubProvider (phase 1)", name, p)
+		}
+		if sp.kind != ProviderForeman {
+			t.Errorf("provider %q kind = %q, want foreman", name, sp.kind)
+		}
+		wantURL := "https://foreman-" + name + ".orgrimmar.dudenhoeffer.casa"
+		if sp.baseURL != wantURL {
+			t.Errorf("provider %q baseURL = %q, want %q", name, sp.baseURL, wantURL)
+		}
+		if sp.token != "test-token-change-me" {
+			t.Errorf("provider %q token = %q, want the DSN userinfo", name, sp.token)
+		}
+	}
+
+	// Env-defined providers are first-class chain elements alongside
+	// built-ins and aliases.
+	r.RegisterAlias("thinking", "anthropic/opus-4.8")
+	m, err := r.Parse("m5/qwen3:30b,m1/qwen3:30b,thinking")
+	if err != nil {
+		t.Fatalf("Parse: %v", err)
+	}
+	want := []string{"m5/qwen3:30b", "m1/qwen3:30b", "anthropic/opus-4.8"}
+	if got := targetsOf(t, m); !slices.Equal(got, want) {
+		t.Errorf("targets = %v, want %v", got, want)
+	}
+}
+
+func TestLoadEnvNameNormalization(t *testing.T) {
+	r := newTestRegistry(t)
+	if err := r.LoadEnv(map[string]string{"LLM_MY_BOX": "ollama://my-box.example"}); err != nil {
+		t.Fatalf("LoadEnv: %v", err)
+	}
+	if _, ok := r.Provider("my_box"); !ok {
+		t.Error("LLM_MY_BOX should register provider \"my_box\"")
+	}
+}
+
+func TestLoadEnvIgnoresNonLLMVars(t *testing.T) {
+	r := newTestRegistry(t)
+	if err := r.LoadEnv(map[string]string{
+		"PATH":     "/usr/bin",
+		"LLM_":     "foreman://x@h",
+		"NOT_LLM_": "foreman://x@h",
+	}); err != nil {
+		t.Fatalf("LoadEnv: %v", err)
+	}
+	if _, ok := r.Provider(""); ok {
+		t.Error("empty-suffix LLM_ var must not register a provider")
+	}
+}
+
+func TestLoadEnvInvalidDSN(t *testing.T) {
+	r := newTestRegistry(t)
+	err := r.LoadEnv(map[string]string{
+		"LLM_BAD":  "not-a-dsn",
+		"LLM_GOOD": "foreman://tok@good.example",
+	})
+	if !errors.Is(err, ErrInvalidDSN) {
+		t.Errorf("LoadEnv error = %v, want ErrInvalidDSN", err)
+	}
+	// The valid entry still registered.
+	if _, ok := r.Provider("good"); !ok {
+		t.Error("valid LLM_GOOD entry should register despite LLM_BAD failing")
+	}
+	// The invalid entry's error surfaces when the name is used.
+	_, perr := r.Parse("bad/some-model")
+	if perr == nil || !strings.Contains(perr.Error(), "LLM_BAD") {
+		t.Errorf("Parse(bad/...) error = %v, want recorded LLM_BAD load error", perr)
+	}
+}
+
+func TestLoadEnvUnknownScheme(t *testing.T) {
+	r := newTestRegistry(t)
+	err := r.LoadEnv(map[string]string{"LLM_X": "zorp://tok@host.example"})
+	if !errors.Is(err, ErrUnknownProvider) {
+		t.Errorf("LoadEnv error = %v, want ErrUnknownProvider", err)
+	}
+	if err == nil || !strings.Contains(err.Error(), `"zorp"`) {
+		t.Errorf("error %v should name the unknown scheme", err)
+	}
+}
+
+// TestLazyEnvFallback covers go-llm parity: a provider name that is not
+// registered resolves through LLM_{UPPER(name)} at Parse time.
+func TestLazyEnvFallback(t *testing.T) {
+	env := map[string]string{
+		"LLM_M9":      "foreman://lazy-token@foreman-m9.example",
+		"LLM_MY_PROV": "ollama://my-prov.example",
+	}
+	r := New(
+		WithoutEnvProviders(),
+		WithEnvLookup(func(k string) string { return env[k] }),
+	)
+
+	m, err := r.Parse("m9/qwen3:30b")
+	if err != nil {
+		t.Fatalf("Parse(m9/...): %v", err)
+	}
+	if got := targetsOf(t, m); !slices.Equal(got, []string{"m9/qwen3:30b"}) {
+		t.Errorf("targets = %v", got)
+	}
+	// The lazily-resolved provider is cached.
+	if _, ok := r.Provider("m9"); !ok {
+		t.Error("lazy env provider should be cached in the registry")
+	}
+
+	// Hyphenated names map to underscored env vars (go-llm parity).
+	if _, err := r.Parse("my-prov/llama3"); err != nil {
+		t.Errorf("Parse(my-prov/...): %v", err)
+	}
+}
+
+// TestNewLoadsProcessEnv covers the eager scan in New().
+func TestNewLoadsProcessEnv(t *testing.T) {
+	t.Setenv("LLM_ENVTEST", "foreman://tok@envtest.example")
+	r := New(WithEnvLookup(func(string) string { return "" }))
+	if _, ok := r.Provider("envtest"); !ok {
+		t.Error("New() should eagerly load LLM_ENVTEST from the process environment")
+	}
+}
@@ -0,0 +1,3 @@
+module gitea.stevedudenhoeffer.com/steve/majordomo
+
+go 1.26
@@ -0,0 +1,163 @@
+// Package health tracks per-target model health for failover decisions.
+//
+// Why: a failover chain must skip targets that are repeatedly failing
+// ("backed off") and re-admit them after a cooldown, without any persistent
+// state or background goroutines. The tracker is in-memory, process-local,
+// thread-safe, and clock-injected so backoff is unit-testable.
+//
+// Semantics (see ADR-0006):
+//   - One transient failure increments a consecutive-failure count.
+//   - Reaching the failure threshold (default 2) backs the target off until
+//     now + cooldown. Cooldown grows exponentially per consecutive backoff
+//     (default base 5s, x2 each time, capped at 5m).
+//   - Any success fully resets the target: failure count and backoff
+//     history both clear.
+package health
+
+import (
+	"sync"
+	"time"
+)
+
+// Default configuration values.
+const (
+	DefaultFailureThreshold = 2
+	DefaultBaseCooldown     = 5 * time.Second
+	DefaultMaxCooldown      = 5 * time.Minute
+	DefaultMultiplier       = 2.0
+)
+
+// Clock supplies the current time; injected for tests.
+type Clock func() time.Time
+
+// Config tunes the tracker. Zero values select the defaults above.
+type Config struct {
+	// FailureThreshold is the number of consecutive transient failures that
+	// triggers a backoff.
+	FailureThreshold int
+	// BaseCooldown is the first backoff duration.
+	BaseCooldown time.Duration
+	// MaxCooldown caps the exponential growth.
+	MaxCooldown time.Duration
+	// Multiplier scales the cooldown per consecutive backoff.
+	Multiplier float64
+	// Clock supplies the current time (defaults to time.Now).
+	Clock Clock
+}
+
+func (c Config) withDefaults() Config {
+	if c.FailureThreshold <= 0 {
+		c.FailureThreshold = DefaultFailureThreshold
+	}
+	if c.BaseCooldown <= 0 {
+		c.BaseCooldown = DefaultBaseCooldown
+	}
+	if c.MaxCooldown <= 0 {
+		c.MaxCooldown = DefaultMaxCooldown
+	}
+	if c.Multiplier <= 1 {
+		c.Multiplier = DefaultMultiplier
+	}
+	if c.Clock == nil {
+		c.Clock = time.Now
+	}
+	return c
+}
+
+// Tracker records per-key health. Keys are opaque; majordomo uses
+// "provider/model-id".
+//
+// Tracker is an interface-free concrete type on purpose: consumers that want
+// persistence can wrap it behind their own interface; majordomo itself stays
+// in-memory (ADR-0006).
+type Tracker struct {
+	mu      sync.Mutex
+	cfg     Config
+	entries map[string]*entry
+}
+
+type entry struct {
+	// consecutiveFailures counts transient failures since the last success
+	// or backoff trigger.
+	consecutiveFailures int
+	// backoffs counts consecutive backoff rounds since the last success;
+	// it drives the exponential cooldown.
+	backoffs int
+	// until is the moment the current backoff expires (zero = not backed off).
+	until time.Time
+}
+
+// NewTracker creates a tracker with the given configuration.
+func NewTracker(cfg Config) *Tracker {
+	return &Tracker{cfg: cfg.withDefaults(), entries: make(map[string]*entry)}
+}
+
+// Available reports whether the key is currently usable (not backed off).
+func (t *Tracker) Available(key string) bool {
+	t.mu.Lock()
+	defer t.mu.Unlock()
+	e, ok := t.entries[key]
+	if !ok {
+		return true
+	}
+	return !t.cfg.Clock().Before(e.until)
+}
+
+// ReportSuccess resets the key's failure count and backoff history.
+func (t *Tracker) ReportSuccess(key string) {
+	t.mu.Lock()
+	defer t.mu.Unlock()
+	delete(t.entries, key)
+}
+
+// ReportFailure records a transient failure. When the consecutive-failure
+// count reaches the threshold the key is backed off and the method reports
+// true; the count then resets so re-admission requires a fresh run of
+// failures to trigger the next (longer) backoff.
+func (t *Tracker) ReportFailure(key string) (backedOff bool) {
+	t.mu.Lock()
+	defer t.mu.Unlock()
+	e, ok := t.entries[key]
+	if !ok {
+		e = &entry{}
+		t.entries[key] = e
+	}
+	e.consecutiveFailures++
+	if e.consecutiveFailures < t.cfg.FailureThreshold {
+		return false
+	}
+	cooldown := t.cooldownFor(e.backoffs)
+	e.until = t.cfg.Clock().Add(cooldown)
+	e.backoffs++
+	e.consecutiveFailures = 0
+	return true
+}
+
+// BackedOffUntil returns the end of the key's current backoff window, or the
+// zero time when the key is not backed off. Useful for diagnostics and error
+// messages.
+func (t *Tracker) BackedOffUntil(key string) time.Time {
+	t.mu.Lock()
+	defer t.mu.Unlock()
+	e, ok := t.entries[key]
+	if !ok || !t.cfg.Clock().Before(e.until) {
+		return time.Time{}
+	}
+	return e.until
+}
+
+// cooldownFor computes the cooldown for the n-th consecutive backoff
+// (0-based): base * multiplier^n, capped at MaxCooldown.
+func (t *Tracker) cooldownFor(n int) time.Duration {
+	d := float64(t.cfg.BaseCooldown)
+	for range n {
+		d *= t.cfg.Multiplier
+		if time.Duration(d) >= t.cfg.MaxCooldown {
+			return t.cfg.MaxCooldown
+		}
+	}
+	if time.Duration(d) > t.cfg.MaxCooldown {
+		return t.cfg.MaxCooldown
+	}
+	return time.Duration(d)
+}
@@ -0,0 +1,165 @@
+package health
+
+import (
+	"sync"
+	"testing"
+	"time"
+)
+
+// fakeClock is a manually-advanced clock for deterministic backoff tests.
+type fakeClock struct {
+	mu  sync.Mutex
+	now time.Time
+}
+
+func newFakeClock() *fakeClock {
+	return &fakeClock{now: time.Date(2026, 6, 10, 12, 0, 0, 0, time.UTC)}
+}
+
+func (c *fakeClock) Now() time.Time {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	return c.now
+}
+
+func (c *fakeClock) Advance(d time.Duration) {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	c.now = c.now.Add(d)
+}
+
+func newTestTracker(clock *fakeClock) *Tracker {
+	return NewTracker(Config{
+		FailureThreshold: 2,
+		BaseCooldown:     5 * time.Second,
+		MaxCooldown:      5 * time.Minute,
+		Multiplier:       2,
+		Clock:            clock.Now,
+	})
+}
+
+func TestSingleFailureStaysAvailable(t *testing.T) {
+	clock := newFakeClock()
+	tr := newTestTracker(clock)
+	if backedOff := tr.ReportFailure("k"); backedOff {
+		t.Error("first failure must not back off")
+	}
+	if !tr.Available("k") {
+		t.Error("key should remain available after one failure")
+	}
+}
+
+func TestThresholdTriggersBackoff(t *testing.T) {
+	clock := newFakeClock()
+	tr := newTestTracker(clock)
+	tr.ReportFailure("k")
+	if backedOff := tr.ReportFailure("k"); !backedOff {
+		t.Error("second consecutive failure should back off")
+	}
+	if tr.Available("k") {
+		t.Error("key should be unavailable during backoff")
+	}
+	if until := tr.BackedOffUntil("k"); !until.Equal(clock.Now().Add(5 * time.Second)) {
+		t.Errorf("BackedOffUntil = %v, want now+5s", until)
+	}
+}
+
+func TestCooldownExpiryReadmits(t *testing.T) {
+	clock := newFakeClock()
+	tr := newTestTracker(clock)
+	tr.ReportFailure("k")
+	tr.ReportFailure("k")
+	clock.Advance(5*time.Second - time.Millisecond)
+	if tr.Available("k") {
+		t.Error("still inside cooldown")
+	}
+	clock.Advance(time.Millisecond)
+	if !tr.Available("k") {
+		t.Error("cooldown expiry should re-admit the key")
+	}
+}
+
+func TestExponentialCooldownWithCap(t *testing.T) {
+	clock := newFakeClock()
+	tr := newTestTracker(clock)
+
+	// Consecutive backoffs: 5s, 10s, 20s, ... capped at 5m.
+	wantCooldowns := []time.Duration{
+		5 * time.Second, 10 * time.Second, 20 * time.Second, 40 * time.Second,
+		80 * time.Second, 160 * time.Second, 5 * time.Minute, 5 * time.Minute,
+	}
+	for i, want := range wantCooldowns {
+		tr.ReportFailure("k")
+		tr.ReportFailure("k")
+		until := tr.BackedOffUntil("k")
+		if got := until.Sub(clock.Now()); got != want {
+			t.Fatalf("backoff #%d cooldown = %v, want %v", i+1, got, want)
+		}
+		clock.Advance(want)
+	}
+}
+
+func TestSuccessResetsEverything(t *testing.T) {
+	clock := newFakeClock()
+	tr := newTestTracker(clock)
+
+	// Build up to a long cooldown...
+	for range 3 {
+		tr.ReportFailure("k")
+		tr.ReportFailure("k")
+		clock.Advance(tr.BackedOffUntil("k").Sub(clock.Now()))
+	}
+	// ...then a success resets both the count and the exponent.
+	tr.ReportSuccess("k")
+	tr.ReportFailure("k")
+	if !tr.Available("k") {
+		t.Error("one failure after success must not back off")
+	}
+	tr.ReportFailure("k")
+	if got := tr.BackedOffUntil("k").Sub(clock.Now()); got != 5*time.Second {
+		t.Errorf("post-reset cooldown = %v, want base 5s", got)
+	}
+}
+
+func TestKeysAreIndependent(t *testing.T) {
+	clock := newFakeClock()
+	tr := newTestTracker(clock)
+	tr.ReportFailure("a")
+	tr.ReportFailure("a")
+	if tr.Available("a") {
+		t.Error("a should be backed off")
+	}
+	if !tr.Available("b") {
+		t.Error("b must be unaffected")
+	}
+}
+
+func TestDefaultsApplied(t *testing.T) {
+	tr := NewTracker(Config{})
+	if tr.cfg.FailureThreshold != DefaultFailureThreshold ||
+		tr.cfg.BaseCooldown != DefaultBaseCooldown ||
+		tr.cfg.MaxCooldown != DefaultMaxCooldown ||
+		tr.cfg.Multiplier != DefaultMultiplier ||
+		tr.cfg.Clock == nil {
+		t.Errorf("defaults not applied: %+v", tr.cfg)
+	}
+}
+
+func TestTrackerConcurrency(t *testing.T) {
+	clock := newFakeClock()
+	tr := newTestTracker(clock)
+	var wg sync.WaitGroup
+	for i := range 8 {
+		wg.Add(1)
+		go func(n int) {
+			defer wg.Done()
+			key := []string{"a", "b"}[n%2]
+			for range 200 {
+				tr.ReportFailure(key)
+				tr.Available(key)
+				tr.ReportSuccess(key)
+			}
+		}(i)
+	}
+	wg.Wait()
+}
@@ -0,0 +1,45 @@
+package llm
+
+import "slices"
+
+// Capabilities declares what a model (or provider) supports and the limits
+// it imposes. Providers declare defaults; individual models may override.
+// The media pipeline normalizes image inputs against these values before a
+// request is serialized.
+//
+// Zero-value semantics:
+//   - MaxImagesPerReq == 0 means image input is NOT supported.
+//   - MaxImageBytes / MaxImageDimension / ContextWindow == 0 mean
+//     "no declared limit", not zero.
+//   - AllowedImageMIME empty means any MIME type is acceptable
+//     (only meaningful when images are supported at all).
+type Capabilities struct {
+	// MaxImageBytes is the largest single image payload, in bytes.
+	MaxImageBytes int
+	// MaxImageDimension is the largest allowed width or height, in pixels.
+	MaxImageDimension int
+	// AllowedImageMIME lists acceptable image content types
+	// (e.g. "image/jpeg", "image/png").
+	AllowedImageMIME []string
+	// MaxImagesPerReq is the most images one request may carry; 0 = images
+	// unsupported.
+	MaxImagesPerReq int
+
+	SupportsTools      bool
+	SupportsStructured bool
+	SupportsStreaming  bool
+
+	// ContextWindow is the model's context size in tokens, when known.
+	ContextWindow int
+}
+
+// SupportsImages reports whether the target accepts image input.
+func (c Capabilities) SupportsImages() bool { return c.MaxImagesPerReq > 0 }
+
+// MIMEAllowed reports whether the given image MIME type is acceptable.
+func (c Capabilities) MIMEAllowed(mime string) bool {
+	if len(c.AllowedImageMIME) == 0 {
+		return true
+	}
+	return slices.Contains(c.AllowedImageMIME, mime)
+}
@@ -0,0 +1,39 @@
+package llm
+
+// Part is one piece of message content: text, an image, or future media
+// kinds. The set of implementations is closed (sealed by the unexported
+// method) so providers can switch exhaustively over content kinds.
+//
+// Why: providers need a finite, known content vocabulary to serialize into
+// their wire formats; an open interface would silently drop unknown content.
+type Part interface {
+	isPart()
+}
+
+// TextPart is plain text content.
+type TextPart struct {
+	Text string
+}
+
+func (TextPart) isPart() {}
+
+// ImagePart is image content carried as raw bytes plus a MIME type.
+//
+// Why bytes-only (no URL form): the media pipeline must be able to inspect,
+// downscale, and re-encode every image to fit the target's capabilities, and
+// that requires the bytes. Callers with a URL fetch it themselves; majordomo
+// does not download remote content on a caller's behalf.
+type ImagePart struct {
+	// MIME is the image content type, e.g. "image/png" or "image/jpeg".
+	MIME string
+	// Data is the raw, unencoded image bytes (providers base64 as needed).
+	Data []byte
+}
+
+func (ImagePart) isPart() {}
+
+// Text constructs a text content part.
+func Text(s string) Part { return TextPart{Text: s} }
+
+// Image constructs an image content part from raw bytes.
+func Image(mime string, data []byte) Part { return ImagePart{MIME: mime, Data: data} }
@@ -0,0 +1,119 @@
+package llm
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"net"
+	"net/http"
+	"strings"
+	"syscall"
+)
+
+// ErrorClass buckets errors for retry/failover decisions.
+type ErrorClass int
+
+const (
+	// ClassTransient errors may succeed on retry or on another target:
+	// rate limits, server errors, timeouts, connection failures.
+	ClassTransient ErrorClass = iota
+	// ClassPermanent errors will not improve on retry of the same request:
+	// malformed requests, auth failures, model-not-found.
+	ClassPermanent
+)
+
+// ErrModelNotFound marks a permanent "this target does not know this model"
+// condition. Chains advance past it without penalizing the target's health.
+var ErrModelNotFound = errors.New("model not found")
+
+// APIError is a structured provider error carrying enough context to
+// classify it and to debug it.
+type APIError struct {
+	// Provider and Model identify the target that failed.
+	Provider string
+	Model    string
+
+	// Status is the HTTP status code, or 0 when the failure was not an HTTP
+	// response (connection error, decode error, ...).
+	Status int
+
+	// Code is the provider-specific error code, when one was supplied.
+	Code string
+
+	// Message is the provider's human-readable error message.
+	Message string
+
+	// Err is the wrapped underlying cause, if any.
+	Err error
+}
+
+func (e *APIError) Error() string {
+	var b strings.Builder
+	fmt.Fprintf(&b, "%s/%s", e.Provider, e.Model)
+	if e.Status != 0 {
+		fmt.Fprintf(&b, ": HTTP %d", e.Status)
+	}
+	if e.Code != "" {
+		fmt.Fprintf(&b, " [%s]", e.Code)
+	}
+	if e.Message != "" {
+		fmt.Fprintf(&b, ": %s", e.Message)
+	}
+	if e.Err != nil {
+		fmt.Fprintf(&b, ": %v", e.Err)
+	}
+	return b.String()
+}
+
+func (e *APIError) Unwrap() error {
+	if e.Err != nil {
+		return e.Err
+	}
+	if e.Status == http.StatusNotFound {
+		return ErrModelNotFound
+	}
+	return nil
+}
+
+// Classify buckets an error as transient or permanent.
+//
+// The default policy (overridable via health configuration):
+//   - context.Canceled is permanent — the caller gave up; retrying defies
+//     their intent. context.DeadlineExceeded is transient.
+//   - Network timeouts, refused/reset connections, and DNS failures are
+//     transient ("high demand" conditions).
+//   - HTTP 400/401/403/404/405/422 (and ErrModelNotFound) are permanent;
+//     408/429 and all 5xx are transient.
+//   - Anything unrecognized is transient: when in doubt, failing over to the
+//     next target in a chain can only help availability.
+func Classify(err error) ErrorClass {
+	if err == nil {
+		return ClassTransient
+	}
+	if errors.Is(err, context.Canceled) {
+		return ClassPermanent
+	}
+	if errors.Is(err, context.DeadlineExceeded) {
+		return ClassTransient
+	}
+	if errors.Is(err, ErrModelNotFound) {
+		return ClassPermanent
+	}
+	if errors.Is(err, syscall.ECONNREFUSED) || errors.Is(err, syscall.ECONNRESET) {
+		return ClassTransient
+	}
+	if _, ok := errors.AsType[net.Error](err); ok {
+		return ClassTransient
+	}
+	if apiErr, ok := errors.AsType[*APIError](err); ok && apiErr.Status != 0 {
+		switch {
+		case apiErr.Status == http.StatusRequestTimeout, // 408
+			apiErr.Status == http.StatusTooManyRequests, // 429
+			apiErr.Status >= 500:
+			return ClassTransient
+		case apiErr.Status >= 400:
+			return ClassPermanent
+		}
+	}
+	return ClassTransient
+}
@@ -0,0 +1,84 @@
+package llm
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"net"
+	"strings"
+	"syscall"
+	"testing"
+)
+
+type fakeNetErr struct{ timeout bool }
+
+func (e fakeNetErr) Error() string   { return "fake net error" }
+func (e fakeNetErr) Timeout() bool   { return e.timeout }
+func (e fakeNetErr) Temporary() bool { return true }
+
+var _ net.Error = fakeNetErr{}
+
+func TestClassify(t *testing.T) {
+	tests := []struct {
+		name string
+		err  error
+		want ErrorClass
+	}{
+		{"canceled is permanent", context.Canceled, ClassPermanent},
+		{"deadline is transient", context.DeadlineExceeded, ClassTransient},
+		{"wrapped canceled", fmt.Errorf("call: %w", context.Canceled), ClassPermanent},
+		{"model not found", fmt.Errorf("x: %w", ErrModelNotFound), ClassPermanent},
+		{"conn refused", syscall.ECONNREFUSED, ClassTransient},
+		{"conn reset", fmt.Errorf("write: %w", syscall.ECONNRESET), ClassTransient},
+		{"net timeout", fakeNetErr{timeout: true}, ClassTransient},
+		{"http 429", &APIError{Status: 429}, ClassTransient},
+		{"http 408", &APIError{Status: 408}, ClassTransient},
+		{"http 500", &APIError{Status: 500}, ClassTransient},
+		{"http 503", &APIError{Status: 503}, ClassTransient},
+		{"http 529", &APIError{Status: 529}, ClassTransient},
+		{"http 400", &APIError{Status: 400}, ClassPermanent},
+		{"http 401", &APIError{Status: 401}, ClassPermanent},
+		{"http 403", &APIError{Status: 403}, ClassPermanent},
+		{"http 404", &APIError{Status: 404}, ClassPermanent},
+		{"http 422", &APIError{Status: 422}, ClassPermanent},
+		{"wrapped api error", fmt.Errorf("call: %w", &APIError{Status: 503}), ClassTransient},
+		{"unknown defaults transient", errors.New("mystery"), ClassTransient},
+		{"non-http api error defaults transient", &APIError{Message: "decode failed"}, ClassTransient},
+	}
+	for _, tt := range tests {
+		if got := Classify(tt.err); got != tt.want {
+			t.Errorf("%s: Classify = %v, want %v", tt.name, got, tt.want)
+		}
+	}
+}
+
+func TestAPIError404UnwrapsToModelNotFound(t *testing.T) {
+	err := &APIError{Provider: "openai", Model: "nope", Status: 404}
+	if !errors.Is(err, ErrModelNotFound) {
+		t.Error("404 APIError should match ErrModelNotFound")
+	}
+	if errors.Is(&APIError{Status: 500}, ErrModelNotFound) {
+		t.Error("500 APIError must not match ErrModelNotFound")
+	}
+}
+
+func TestAPIErrorMessage(t *testing.T) {
+	err := &APIError{
+		Provider: "anthropic", Model: "opus-4.8",
+		Status: 429, Code: "rate_limit_error", Message: "slow down",
+	}
+	got := err.Error()
+	for _, frag := range []string{"anthropic/opus-4.8", "429", "rate_limit_error", "slow down"} {
+		if !strings.Contains(got, frag) {
+			t.Errorf("error string %q missing %q", got, frag)
+		}
+	}
+}
+
+func TestAPIErrorUnwrapsCause(t *testing.T) {
+	cause := errors.New("boom")
+	err := &APIError{Provider: "p", Model: "m", Err: cause}
+	if !errors.Is(err, cause) {
+		t.Error("APIError should unwrap to its cause")
+	}
+}
@@ -0,0 +1,12 @@
+// Package llm defines majordomo's canonical, provider-agnostic contract:
+// messages and content parts, requests and responses, tools, capabilities,
+// streaming, and the Model/Provider interfaces every backend implements.
+//
+// Why: provider implementations (openai, anthropic, google, ollama, foreman,
+// and any client-defined backend) must share one vocabulary without importing
+// each other or the root package. This package is the dependency leaf — it
+// imports nothing else in the module, and everything else imports it.
+//
+// Most consumers never import this package directly: the root majordomo
+// package re-exports every type here via type aliases.
+package llm
@@ -0,0 +1,71 @@
+package llm
+
+import "strings"
+
+// Role identifies the author of a message.
+type Role string
+
+const (
+	RoleSystem    Role = "system"
+	RoleUser      Role = "user"
+	RoleAssistant Role = "assistant"
+	RoleTool      Role = "tool"
+)
+
+// Message is one turn in a conversation.
+//
+// Exactly which fields are populated depends on the role: user and system
+// messages carry Parts; assistant messages carry Parts and/or ToolCalls;
+// tool messages carry ToolResults. Providers translate this canonical shape
+// to and from their wire formats.
+type Message struct {
+	Role Role
+
+	// Parts is the message content (text, images, ...).
+	Parts []Part
+
+	// ToolCalls are tool invocations requested by the assistant
+	// (meaningful only when Role == RoleAssistant).
+	ToolCalls []ToolCall
+
+	// ToolResults carry the outcomes of earlier ToolCalls
+	// (meaningful only when Role == RoleTool).
+	ToolResults []ToolResult
+}
+
+// Text returns the concatenation of all text parts in the message.
+func (m Message) Text() string {
+	var b strings.Builder
+	for _, p := range m.Parts {
+		if t, ok := p.(TextPart); ok {
+			b.WriteString(t.Text)
+		}
+	}
+	return b.String()
+}
+
+// SystemText constructs a system message with one text part.
+func SystemText(s string) Message {
+	return Message{Role: RoleSystem, Parts: []Part{Text(s)}}
+}
+
+// UserText constructs a user message with one text part.
+func UserText(s string) Message {
+	return Message{Role: RoleUser, Parts: []Part{Text(s)}}
+}
+
+// UserParts constructs a user message from arbitrary content parts
+// (e.g. text plus images).
+func UserParts(parts ...Part) Message {
+	return Message{Role: RoleUser, Parts: parts}
+}
+
+// AssistantText constructs an assistant message with one text part.
+func AssistantText(s string) Message {
+	return Message{Role: RoleAssistant, Parts: []Part{Text(s)}}
+}
+
+// ToolResultsMessage constructs a tool message carrying one or more results.
+func ToolResultsMessage(results ...ToolResult) Message {
+	return Message{Role: RoleTool, ToolResults: results}
+}
@@ -0,0 +1,62 @@
+package llm
+
+import "testing"
+
+func TestMessageText(t *testing.T) {
+	m := UserParts(Text("a "), Image("image/png", []byte{1}), Text("b"))
+	if got := m.Text(); got != "a b" {
+		t.Errorf("Text = %q, want %q", got, "a b")
+	}
+}
+
+func TestConstructors(t *testing.T) {
+	if m := SystemText("s"); m.Role != RoleSystem || m.Text() != "s" {
+		t.Errorf("SystemText = %+v", m)
+	}
+	if m := UserText("u"); m.Role != RoleUser || m.Text() != "u" {
+		t.Errorf("UserText = %+v", m)
+	}
+	if m := AssistantText("a"); m.Role != RoleAssistant || m.Text() != "a" {
+		t.Errorf("AssistantText = %+v", m)
+	}
+	m := ToolResultsMessage(ToolResult{ID: "1", Content: "ok"})
+	if m.Role != RoleTool || len(m.ToolResults) != 1 {
+		t.Errorf("ToolResultsMessage = %+v", m)
+	}
+}
+
+func TestResponseTextAndMessage(t *testing.T) {
+	r := &Response{
+		Parts:     []Part{Text("hello "), Text("world")},
+		ToolCalls: []ToolCall{{ID: "1", Name: "t"}},
+	}
+	if got := r.Text(); got != "hello world" {
+		t.Errorf("Text = %q", got)
+	}
+	m := r.Message()
+	if m.Role != RoleAssistant || m.Text() != "hello world" || len(m.ToolCalls) != 1 {
+		t.Errorf("Message = %+v", m)
+	}
+}
+
+func TestUsageAccumulation(t *testing.T) {
+	u := Usage{InputTokens: 10, OutputTokens: 5}
+	u.Add(Usage{InputTokens: 1, OutputTokens: 2})
+	if u.InputTokens != 11 || u.OutputTokens != 7 || u.Total() != 18 {
+		t.Errorf("usage = %+v", u)
+	}
+}
+
+func TestCapabilitiesHelpers(t *testing.T) {
+	c := Capabilities{}
+	if c.SupportsImages() {
+		t.Error("zero MaxImagesPerReq must mean images unsupported")
+	}
+	if !c.MIMEAllowed("image/png") {
+		t.Error("empty AllowedImageMIME must allow any type")
+	}
+	c = Capabilities{MaxImagesPerReq: 2, AllowedImageMIME: []string{"image/jpeg"}}
+	if !c.SupportsImages() || c.MIMEAllowed("image/png") || !c.MIMEAllowed("image/jpeg") {
+		t.Errorf("capabilities helpers misbehave: %+v", c)
+	}
+}
@@ -0,0 +1,58 @@
+package llm
+
+import "context"
+
+// Model is the canonical generation interface. A Model may be a single
+// provider-bound target or a failover chain — the two are interchangeable
+// and callers never branch on which they got.
+type Model interface {
+	// Generate performs one request/response round trip.
+	Generate(ctx context.Context, req Request, opts ...Option) (*Response, error)
+
+	// Stream performs one request with incremental delivery.
+	Stream(ctx context.Context, req Request, opts ...Option) (Stream, error)
+
+	// Capabilities reports what this model supports. For chains this is the
+	// head element's capabilities (the preferred target); per-attempt media
+	// normalization always uses the actual target's capabilities.
+	Capabilities() Capabilities
+}
+
+// ModelOption configures a Model at construction time (Provider.Model).
+type ModelOption func(*ModelConfig)
+
+// ModelConfig carries per-model construction settings shared by all
+// providers.
+type ModelConfig struct {
+	// Capabilities, when non-nil, overrides the provider's default
+	// capabilities for this model.
+	Capabilities *Capabilities
+}
+
+// ApplyModelOptions folds options into a config.
+func ApplyModelOptions(opts []ModelOption) ModelConfig {
+	var cfg ModelConfig
+	for _, opt := range opts {
+		opt(&cfg)
+	}
+	return cfg
+}
+
+// WithCapabilities overrides the provider's default capabilities for one
+// model (e.g. a vision-capable tag on an otherwise text-only provider).
+func WithCapabilities(caps Capabilities) ModelOption {
+	return func(cfg *ModelConfig) { cfg.Capabilities = &caps }
+}
+
+// Provider mints Models bound to one backend. Implementations translate the
+// canonical Request/Response to and from their wire format and enforce their
+// declared Capabilities.
+type Provider interface {
+	// Name is the registry identifier used in "provider/model" specs.
+	Name() string
+
+	// Model returns a Model bound to the given id. The id is whatever the
+	// backend accepts — majordomo passes it through verbatim and never
+	// validates it against a catalog.
+	Model(id string, opts ...ModelOption) (Model, error)
+}
@@ -0,0 +1,98 @@
+package llm
+
+import "encoding/json"
+
+// Request is the canonical generation request. Providers translate it to
+// their wire format and enforce their declared Capabilities against it.
+type Request struct {
+	// System is the system prompt. Providers map it to their native system
+	// mechanism (top-level system field, system message, SystemInstruction).
+	// Any RoleSystem messages in Messages are folded in after this field.
+	System string
+
+	// Messages is the conversation so far, oldest first.
+	Messages []Message
+
+	// Tools the model may call.
+	Tools []Tool
+
+	// ToolChoice constrains tool use: "" or "auto" lets the model decide,
+	// "none" forbids tool calls, "required" forces some tool call, and any
+	// other value names the one tool the model must call.
+	ToolChoice string
+
+	// Schema, when non-nil, is a JSON Schema object the response must
+	// conform to (structured output). Providers map it to their native
+	// mechanism. SchemaName names the schema for providers that require one.
+	Schema     json.RawMessage
+	SchemaName string
+
+	// Sampling and limit knobs. Pointer fields distinguish "unset" (provider
+	// default) from an explicit zero.
+	Temperature *float64
+	TopP        *float64
+
+	// MaxTokens caps the response length; 0 means provider default.
+	MaxTokens int
+
+	// StopSequences halt generation when emitted.
+	StopSequences []string
+}
+
+// Option mutates a Request before it is sent. Options passed to Generate or
+// Stream are applied to a copy of the request, so a Request value can be
+// safely reused across calls.
+type Option func(*Request)
+
+// WithSystem sets the system prompt.
+func WithSystem(s string) Option { return func(r *Request) { r.System = s } }
+
+// WithTools appends tools to the request.
+func WithTools(tools ...Tool) Option {
+	return func(r *Request) { r.Tools = append(r.Tools, tools...) }
+}
+
+// WithToolbox appends every tool in the toolbox to the request.
+func WithToolbox(b *Toolbox) Option {
+	return func(r *Request) { r.Tools = append(r.Tools, b.Tools()...) }
+}
+
+// WithToolChoice sets the tool-choice policy ("auto", "none", "required",
+// or a specific tool name).
+func WithToolChoice(choice string) Option {
+	return func(r *Request) { r.ToolChoice = choice }
+}
+
+// WithSchema requests structured output conforming to the given JSON Schema.
+// name is optional; providers that require a schema name fall back to
+// "response" when it is empty.
+func WithSchema(schema json.RawMessage, name string) Option {
+	return func(r *Request) { r.Schema = schema; r.SchemaName = name }
+}
+
+// WithTemperature sets the sampling temperature.
+func WithTemperature(t float64) Option {
+	return func(r *Request) { r.Temperature = &t }
+}
+
+// WithTopP sets nucleus-sampling top-p.
+func WithTopP(p float64) Option {
+	return func(r *Request) { r.TopP = &p }
+}
+
+// WithMaxTokens caps the response length.
+func WithMaxTokens(n int) Option { return func(r *Request) { r.MaxTokens = n } }
+
+// WithStopSequences sets stop sequences.
+func WithStopSequences(stops ...string) Option {
+	return func(r *Request) { r.StopSequences = stops }
+}
+
+// Apply returns a copy of the request with all options applied. Providers
+// and wrappers call this once at the top of Generate/Stream.
+func (r Request) Apply(opts ...Option) Request {
+	for _, opt := range opts {
+		opt(&r)
+	}
+	return r
+}
@@ -0,0 +1,73 @@
+package llm
+
+import "strings"
+
+// FinishReason explains why generation stopped.
+type FinishReason string
+
+const (
+	// FinishStop: the model completed its answer (or hit a stop sequence).
+	FinishStop FinishReason = "stop"
+	// FinishLength: the MaxTokens (or context) limit was hit.
+	FinishLength FinishReason = "length"
+	// FinishToolCalls: the model stopped to request tool invocations.
+	FinishToolCalls FinishReason = "tool_calls"
+	// FinishContentFilter: the provider suppressed content.
+	FinishContentFilter FinishReason = "content_filter"
+	// FinishOther: any provider-specific reason not mapped above.
+	FinishOther FinishReason = "other"
+)
+
+// Usage reports token accounting for one request.
+type Usage struct {
+	InputTokens  int
+	OutputTokens int
+}
+
+// Total returns input plus output tokens.
+func (u Usage) Total() int { return u.InputTokens + u.OutputTokens }
+
+// Add accumulates another usage record (used by agents summing steps).
+func (u *Usage) Add(o Usage) {
+	u.InputTokens += o.InputTokens
+	u.OutputTokens += o.OutputTokens
+}
+
+// Response is the canonical generation result.
+type Response struct {
+	// Parts is the response content (text, and for multimodal-output models,
+	// other media).
+	Parts []Part
+
+	// ToolCalls are the tool invocations the model requested, if any.
+	ToolCalls []ToolCall
+
+	FinishReason FinishReason
+	Usage        Usage
+
+	// Model identifies the resolved target that produced this response as
+	// "provider/model-id". With failover chains this names the element that
+	// actually served the request.
+	Model string
+
+	// Raw is the provider-native response object, an escape hatch for
+	// provider-specific fields. May be nil; never required for normal use.
+	Raw any
+}
+
+// Text returns the concatenation of all text parts in the response.
+func (r *Response) Text() string {
+	var b strings.Builder
+	for _, p := range r.Parts {
+		if t, ok := p.(TextPart); ok {
+			b.WriteString(t.Text)
+		}
+	}
+	return b.String()
+}
+
+// Message converts the response into an assistant message suitable for
+// appending to a conversation history.
+func (r *Response) Message() Message {
+	return Message{Role: RoleAssistant, Parts: r.Parts, ToolCalls: r.ToolCalls}
+}
@@ -0,0 +1,28 @@
+package llm
+
+// StreamEvent is one increment of a streaming response.
+//
+// Exactly one field group is meaningful per event: a text delta, a completed
+// tool call, or the final response. Tool-call arguments are buffered by the
+// provider until complete — consumers never see partial JSON.
+type StreamEvent struct {
+	// TextDelta is a fragment of assistant text.
+	TextDelta string
+
+	// ToolCall, when non-nil, is a fully-assembled tool call.
+	ToolCall *ToolCall
+
+	// Response, when non-nil, is the final accumulated response (content,
+	// tool calls, finish reason, usage). It is always the last event.
+	Response *Response
+}
+
+// Stream delivers a response incrementally.
+//
+// Next returns io.EOF after the final event (the one carrying Response).
+// Close releases the underlying connection and is safe to call at any time,
+// including after io.EOF or concurrently with Next returning.
+type Stream interface {
+	Next() (StreamEvent, error)
+	Close() error
+}
@@ -0,0 +1,165 @@
+package llm
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+)
+
+// Tool is a callable capability exposed to a model: a name, a description,
+// JSON-Schema parameters, and a Go handler. Providers map this one canonical
+// shape onto their native function-calling formats.
+type Tool struct {
+	Name        string
+	Description string
+
+	// Parameters is a JSON Schema object describing the tool's arguments.
+	// nil means the tool takes no arguments.
+	Parameters json.RawMessage
+
+	// Handler executes the tool. args is the raw JSON arguments object the
+	// model supplied. The returned value is JSON-encoded into the ToolResult.
+	Handler func(ctx context.Context, args json.RawMessage) (any, error)
+}
+
+// ToolCall is a model's request to invoke a tool.
+type ToolCall struct {
+	// ID is the provider-assigned call id; majordomo synthesizes one for
+	// providers that do not supply ids. ToolResult.ID must echo it.
+	ID   string
+	Name string
+
+	// Arguments is the raw JSON arguments object.
+	Arguments json.RawMessage
+}
+
+// ToolResult is the outcome of executing a ToolCall, sent back to the model.
+type ToolResult struct {
+	// ID matches the originating ToolCall.ID.
+	ID   string
+	Name string
+
+	// Content is the result serialized as text (JSON for structured values).
+	Content string
+
+	// IsError marks the result as a failure; the content then describes the
+	// error so the model can react (retry, apologize, try another tool).
+	IsError bool
+}
+
+// Toolbox is a named, ordered set of tools.
+//
+// Why: agents compose their available tools from several sources (multiple
+// toolboxes plus skills); a small named container with duplicate detection
+// keeps that merge explicit and debuggable.
+type Toolbox struct {
+	name  string
+	order []string
+	tools map[string]Tool
+}
+
+// NewToolbox creates a toolbox with the given name and initial tools.
+// Duplicate tool names panic: toolboxes are assembled at startup, and a
+// silently shadowed tool is a programming error worth failing loudly on.
+func NewToolbox(name string, tools ...Tool) *Toolbox {
+	b := &Toolbox{name: name, tools: make(map[string]Tool, len(tools))}
+	for _, t := range tools {
+		if err := b.Add(t); err != nil {
+			panic(err)
+		}
+	}
+	return b
+}
+
+// Name returns the toolbox name.
+func (b *Toolbox) Name() string { return b.name }
+
+// Add registers a tool, rejecting empty or duplicate names.
+func (b *Toolbox) Add(t Tool) error {
+	if t.Name == "" {
+		return fmt.Errorf("toolbox %q: tool with empty name", b.name)
+	}
+	if _, exists := b.tools[t.Name]; exists {
+		return fmt.Errorf("toolbox %q: duplicate tool %q", b.name, t.Name)
+	}
+	b.tools[t.Name] = t
+	b.order = append(b.order, t.Name)
+	return nil
+}
+
+// Tools returns the tools in insertion order.
+func (b *Toolbox) Tools() []Tool {
+	out := make([]Tool, 0, len(b.order))
+	for _, name := range b.order {
+		out = append(out, b.tools[name])
+	}
+	return out
+}
+
+// Get returns the named tool.
+func (b *Toolbox) Get(name string) (Tool, bool) {
+	t, ok := b.tools[name]
+	return t, ok
+}
+
+// Execute runs the named tool for the given call and packages the outcome as
+// a ToolResult. It never panics and never returns an error: handler errors
+// and panics become IsError results so an agent loop can always continue.
+func (b *Toolbox) Execute(ctx context.Context, call ToolCall) ToolResult {
+	t, ok := b.tools[call.Name]
+	if !ok {
+		return ToolResult{
+			ID: call.ID, Name: call.Name,
+			Content: fmt.Sprintf("unknown tool %q", call.Name),
+			IsError: true,
+		}
+	}
+	return ExecuteTool(ctx, t, call)
+}
+
+// ExecuteTool runs a single tool for the given call, recovering panics and
+// converting errors into IsError results.
+func ExecuteTool(ctx context.Context, t Tool, call ToolCall) (res ToolResult) {
+	res = ToolResult{ID: call.ID, Name: call.Name}
+	defer func() {
+		if r := recover(); r != nil {
+			res.Content = fmt.Sprintf("tool %q panicked: %v", call.Name, r)
+			res.IsError = true
+		}
+	}()
+
+	if t.Handler == nil {
+		res.Content = fmt.Sprintf("tool %q has no handler", call.Name)
+		res.IsError = true
+		return res
+	}
+
+	args := call.Arguments
+	if len(args) == 0 {
+		args = json.RawMessage("{}")
+	}
+	out, err := t.Handler(ctx, args)
+	if err != nil {
+		res.Content = err.Error()
+		res.IsError = true
+		return res
+	}
+
+	switch v := out.(type) {
+	case nil:
+		res.Content = "null"
+	case string:
+		res.Content = v
+	case json.RawMessage:
+		res.Content = string(v)
+	default:
+		enc, err := json.Marshal(v)
+		if err != nil {
+			res.Content = fmt.Sprintf("tool %q returned unencodable value: %v", call.Name, err)
+			res.IsError = true
+			return res
+		}
+		res.Content = string(enc)
+	}
+	return res
+}
@@ -0,0 +1,98 @@
+package llm
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"strings"
+	"testing"
+)
+
+func TestToolboxAddRejectsDuplicatesAndEmptyNames(t *testing.T) {
+	b := NewToolbox("box")
+	if err := b.Add(Tool{Name: "a"}); err != nil {
+		t.Fatalf("Add: %v", err)
+	}
+	if err := b.Add(Tool{Name: "a"}); err == nil {
+		t.Error("duplicate name should error")
+	}
+	if err := b.Add(Tool{}); err == nil {
+		t.Error("empty name should error")
+	}
+}
+
+func TestToolboxOrderPreserved(t *testing.T) {
+	b := NewToolbox("box", Tool{Name: "z"}, Tool{Name: "a"}, Tool{Name: "m"})
+	var names []string
+	for _, tool := range b.Tools() {
+		names = append(names, tool.Name)
+	}
+	if got, want := strings.Join(names, ","), "z,a,m"; got != want {
+		t.Errorf("order = %s, want %s", got, want)
+	}
+}
+
+func TestExecuteUnknownTool(t *testing.T) {
+	b := NewToolbox("box")
+	res := b.Execute(context.Background(), ToolCall{ID: "1", Name: "missing"})
+	if !res.IsError || !strings.Contains(res.Content, "missing") {
+		t.Errorf("result = %+v, want unknown-tool error", res)
+	}
+}
+
+func TestExecuteHandlerOutcomes(t *testing.T) {
+	echo := func(v any, err error) Tool {
+		return Tool{Name: "t", Handler: func(context.Context, json.RawMessage) (any, error) { return v, err }}
+	}
+
+	tests := []struct {
+		name        string
+		tool        Tool
+		wantContent string
+		wantErr     bool
+	}{
+		{"string passthrough", echo("plain", nil), "plain", false},
+		{"struct json-encoded", echo(struct {
+			N int `json:"n"`
+		}{4}, nil), `{"n":4}`, false},
+		{"raw message passthrough", echo(json.RawMessage(`{"k":1}`), nil), `{"k":1}`, false},
+		{"nil becomes null", echo(nil, nil), "null", false},
+		{"handler error", echo(nil, errors.New("boom")), "boom", true},
+		{"unencodable value", echo(func() {}, nil), "unencodable", true},
+		{"no handler", Tool{Name: "t"}, "no handler", true},
+	}
+	for _, tt := range tests {
+		res := ExecuteTool(context.Background(), tt.tool, ToolCall{ID: "c1", Name: "t"})
+		if res.IsError != tt.wantErr {
+			t.Errorf("%s: IsError = %v, want %v (%+v)", tt.name, res.IsError, tt.wantErr, res)
+		}
+		if !strings.Contains(res.Content, tt.wantContent) {
+			t.Errorf("%s: content = %q, want it to contain %q", tt.name, res.Content, tt.wantContent)
+		}
+		if res.ID != "c1" {
+			t.Errorf("%s: result ID = %q, want c1", tt.name, res.ID)
+		}
+	}
+}
+
+func TestExecuteRecoversPanic(t *testing.T) {
+	tool := Tool{Name: "t", Handler: func(context.Context, json.RawMessage) (any, error) {
+		panic("kaboom")
+	}}
+	res := ExecuteTool(context.Background(), tool, ToolCall{ID: "1", Name: "t"})
+	if !res.IsError || !strings.Contains(res.Content, "kaboom") {
+		t.Errorf("result = %+v, want recovered panic error", res)
+	}
+}
+
+func TestExecuteEmptyArgsBecomeEmptyObject(t *testing.T) {
+	var got json.RawMessage
+	tool := Tool{Name: "t", Handler: func(_ context.Context, args json.RawMessage) (any, error) {
+		got = args
+		return "ok", nil
+	}}
+	ExecuteTool(context.Background(), tool, ToolCall{ID: "1", Name: "t"})
+	if string(got) != "{}" {
+		t.Errorf("args = %q, want {}", got)
+	}
+}
@@ -0,0 +1,139 @@
+// Package majordomo is a clean-slate substrate for building LLM-backed
+// agents: target-agnostic model access, a parseable model naming /
+// failover / tiering system with health tracking, multimodality, tool calls
+// and structured output, and agents composed from a model + system prompt +
+// toolboxes + skills.
+//
+// The one-call entry point is Parse:
+//
+//	reg := majordomo.New()
+//	m, err := reg.Parse("ollama-cloud/minimax-m3:cloud,anthropic/opus-4.8,thinking")
+//	resp, err := m.Generate(ctx, majordomo.Request{
+//	    Messages: []majordomo.Message{majordomo.UserText("hello")},
+//	})
+//
+// A spec is a comma-separated failover chain. Each element is either a
+// "provider/model" target (built-in, client-registered, or defined via an
+// LLM_* env DSN) or a registered alias/tier, which expands inline. See
+// Registry.Parse for the full grammar.
+//
+// The canonical types (Message, Request, Response, Tool, Capabilities, ...)
+// are defined in the llm subpackage and re-exported here, so most consumers
+// only ever import this package (plus agent and skill).
+package majordomo
+
+import (
+	"encoding/json"
+	"sync"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+)
+
+// Re-exported canonical types. See the llm package for documentation.
+type (
+	Model        = llm.Model
+	Provider     = llm.Provider
+	Message      = llm.Message
+	Role         = llm.Role
+	Part         = llm.Part
+	TextPart     = llm.TextPart
+	ImagePart    = llm.ImagePart
+	Request      = llm.Request
+	Response     = llm.Response
+	Option       = llm.Option
+	ModelOption  = llm.ModelOption
+	ModelConfig  = llm.ModelConfig
+	Tool         = llm.Tool
+	ToolCall     = llm.ToolCall
+	ToolResult   = llm.ToolResult
+	Toolbox      = llm.Toolbox
+	Capabilities = llm.Capabilities
+	Stream       = llm.Stream
+	StreamEvent  = llm.StreamEvent
+	Usage        = llm.Usage
+	FinishReason = llm.FinishReason
+	APIError     = llm.APIError
+	ErrorClass   = llm.ErrorClass
+)
+
+// Re-exported role and finish-reason constants.
+const (
+	RoleSystem    = llm.RoleSystem
+	RoleUser      = llm.RoleUser
+	RoleAssistant = llm.RoleAssistant
+	RoleTool      = llm.RoleTool
+
+	FinishStop          = llm.FinishStop
+	FinishLength        = llm.FinishLength
+	FinishToolCalls     = llm.FinishToolCalls
+	FinishContentFilter = llm.FinishContentFilter
+	FinishOther         = llm.FinishOther
+
+	ClassTransient = llm.ClassTransient
+	ClassPermanent = llm.ClassPermanent
+)
+
+// ErrModelNotFound re-exports llm.ErrModelNotFound.
+var ErrModelNotFound = llm.ErrModelNotFound
+
+// Re-exported content and message constructors.
+func Text(s string) Part                              { return llm.Text(s) }
+func Image(mime string, data []byte) Part             { return llm.Image(mime, data) }
+func SystemText(s string) Message                     { return llm.SystemText(s) }
+func UserText(s string) Message                       { return llm.UserText(s) }
+func UserParts(parts ...Part) Message                 { return llm.UserParts(parts...) }
+func AssistantText(s string) Message                  { return llm.AssistantText(s) }
+func ToolResultsMessage(results ...ToolResult) Message { return llm.ToolResultsMessage(results...) }
+func NewToolbox(name string, tools ...Tool) *Toolbox  { return llm.NewToolbox(name, tools...) }
+
+// Re-exported request options.
+func WithSystem(s string) Option                        { return llm.WithSystem(s) }
+func WithTools(tools ...Tool) Option                    { return llm.WithTools(tools...) }
+func WithToolbox(b *Toolbox) Option                     { return llm.WithToolbox(b) }
+func WithToolChoice(choice string) Option               { return llm.WithToolChoice(choice) }
+func WithSchema(schema json.RawMessage, name string) Option { return llm.WithSchema(schema, name) }
+func WithTemperature(t float64) Option                  { return llm.WithTemperature(t) }
+func WithTopP(p float64) Option                         { return llm.WithTopP(p) }
+func WithMaxTokens(n int) Option                        { return llm.WithMaxTokens(n) }
+func WithStopSequences(stops ...string) Option          { return llm.WithStopSequences(stops...) }
+
+// WithModelCapabilities re-exports llm.WithCapabilities for Provider.Model
+// calls made through this package.
+func WithModelCapabilities(caps Capabilities) ModelOption { return llm.WithCapabilities(caps) }
+
+// Classify re-exports llm.Classify.
+func Classify(err error) ErrorClass { return llm.Classify(err) }
+
+// defaultRegistry backs the package-level convenience functions.
+var defaultRegistry = func() func() *Registry {
+	var (
+		once sync.Once
+		reg  *Registry
+	)
+	return func() *Registry {
+		once.Do(func() { reg = New() })
+		return reg
+	}
+}()
+
+// Default returns the lazily-initialized package-level Registry (built-in
+// providers plus LLM_* env providers from the process environment).
+func Default() *Registry { return defaultRegistry() }
+
+// Parse resolves a spec using the Default registry.
+func Parse(spec string) (Model, error) { return Default().Parse(spec) }
+
+// MustParse is Parse that panics on error; for wiring code and examples.
+func MustParse(spec string) Model {
+	m, err := Parse(spec)
+	if err != nil {
+		panic(err)
+	}
+	return m
+}
+
+// RegisterProvider registers a provider in the Default registry.
+func RegisterProvider(p Provider) { Default().RegisterProvider(p) }
+
+// RegisterAlias registers an alias/tier in the Default registry.
+func RegisterAlias(name, spec string) { Default().RegisterAlias(name, spec) }
@@ -0,0 +1,122 @@
+package majordomo
+
+import (
+	"errors"
+	"fmt"
+	"slices"
+	"strings"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+)
+
+// ErrAliasCycle reports a self-referential or looping alias expansion.
+var ErrAliasCycle = errors.New("alias cycle")
+
+// ErrEmptySpec reports a spec with no usable elements.
+var ErrEmptySpec = errors.New("empty model spec")
+
+// element is one resolved chain element: a provider name plus a verbatim
+// model id.
+type element struct {
+	provider string
+	model    string
+}
+
+func (e element) key() string { return e.provider + "/" + e.model }
+
+// Parse resolves a model spec to a Model.
+//
+// Grammar:
+//
+//	spec    := chain
+//	chain   := element ("," element)*
+//	element := target | alias
+//	target  := provider "/" model
+//	alias   := bare token with no slash
+//
+// The provider of a target is the first path segment; everything after the
+// first "/" (up to the next comma) is the model id and is passed to the
+// provider verbatim — "ollama-cloud/minimax-m3:cloud" keeps its tag, and
+// Google-style ids with extra slashes survive intact. Providers resolve
+// through the registry: built-ins, RegisterProvider entries, LLM_* env
+// definitions (eager or lazy), in that order.
+//
+// An alias expands to its registered spec inline, wherever it appears in a
+// chain (head, middle, or tail), recursively, with cycle detection.
+//
+// A single element and a multi-element chain return the same Model
+// interface; callers never branch on which they got. Multi-element chains
+// try elements head-to-tail with health-tracked failover (see ChainConfig
+// and the health package).
+func (r *Registry) Parse(spec string) (llm.Model, error) {
+	elements, err := r.expand(spec, nil)
+	if err != nil {
+		return nil, err
+	}
+	if len(elements) == 0 {
+		return nil, fmt.Errorf("%w: %q", ErrEmptySpec, spec)
+	}
+
+	targets := make([]chainTarget, 0, len(elements))
+	seen := make(map[string]bool, len(elements))
+	for _, el := range elements {
+		// A duplicate element (e.g. via overlapping alias expansions) would
+		// just retry the same backed-off target; keep the first occurrence.
+		if seen[el.key()] {
+			continue
+		}
+		seen[el.key()] = true
+
+		p, err := r.providerFor(el.provider)
+		if err != nil {
+			return nil, fmt.Errorf("spec %q: %w", spec, err)
+		}
+		m, err := p.Model(el.model)
+		if err != nil {
+			return nil, fmt.Errorf("spec %q: provider %q: model %q: %w", spec, el.provider, el.model, err)
+		}
+		targets = append(targets, chainTarget{key: el.key(), model: m})
+	}
+
+	return &chain{targets: targets, tracker: r.tracker, cfg: r.chainCfg}, nil
+}
+
+// expand splits a spec into elements, expanding aliases inline and
+// recursively. visiting holds the alias names currently being expanded, for
+// cycle detection.
+func (r *Registry) expand(spec string, visiting []string) ([]element, error) {
+	var out []element
+	for raw := range strings.SplitSeq(spec, ",") {
+		raw = strings.TrimSpace(raw)
+		if raw == "" {
+			continue
+		}
+
+		if provider, model, hasSlash := strings.Cut(raw, "/"); hasSlash {
+			out = append(out, element{provider: provider, model: model})
+			continue
+		}
+
+		// Bare token: must be a registered alias.
+		r.mu.RLock()
+		target, isAlias := r.aliases[raw]
+		_, isProvider := r.providers[raw]
+		r.mu.RUnlock()
+
+		if !isAlias {
+			if isProvider {
+				return nil, fmt.Errorf("%q is a provider, not an alias — use %q", raw, raw+"/<model-id>")
+			}
+			return nil, fmt.Errorf("%w: %q is not a registered alias and has no provider/ prefix", ErrUnknownProvider, raw)
+		}
+		if slices.Contains(visiting, raw) {
+			return nil, fmt.Errorf("%w: %s", ErrAliasCycle, strings.Join(append(visiting, raw), " -> "))
+		}
+		sub, err := r.expand(target, append(visiting, raw))
+		if err != nil {
+			return nil, fmt.Errorf("alias %q: %w", raw, err)
+		}
+		out = append(out, sub...)
+	}
+	return out, nil
+}
@@ -0,0 +1,221 @@
+package majordomo
+
+import (
+	"context"
+	"errors"
+	"slices"
+	"strings"
+	"testing"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/provider/fake"
+)
+
+// newTestRegistry returns a registry isolated from the process environment.
+func newTestRegistry(t *testing.T, opts ...RegistryOption) *Registry {
+	t.Helper()
+	opts = append([]RegistryOption{
+		WithoutEnvProviders(),
+		WithEnvLookup(func(string) string { return "" }),
+	}, opts...)
+	return New(opts...)
+}
+
+// targetsOf extracts the resolved chain keys from a parsed model.
+func targetsOf(t *testing.T, m Model) []string {
+	t.Helper()
+	c, ok := m.(*chain)
+	if !ok {
+		t.Fatalf("Parse returned %T, want *chain", m)
+	}
+	return c.Targets()
+}
+
+func TestParseSingleTarget(t *testing.T) {
+	r := newTestRegistry(t)
+	r.RegisterProvider(fake.New("fp"))
+
+	m, err := r.Parse("fp/some-model:7b")
+	if err != nil {
+		t.Fatalf("Parse: %v", err)
+	}
+	want := []string{"fp/some-model:7b"}
+	if got := targetsOf(t, m); !slices.Equal(got, want) {
+		t.Errorf("targets = %v, want %v", got, want)
+	}
+
+	resp, err := m.Generate(context.Background(), Request{Messages: []Message{UserText("hi")}})
+	if err != nil {
+		t.Fatalf("Generate: %v", err)
+	}
+	if resp.Text() == "" {
+		t.Error("empty response text")
+	}
+	if resp.Model != "fp/some-model:7b" {
+		t.Errorf("resp.Model = %q, want fp/some-model:7b", resp.Model)
+	}
+}
+
+func TestParseModelIDIsVerbatim(t *testing.T) {
+	r := newTestRegistry(t)
+	r.RegisterProvider(fake.New("google"))
+	r.RegisterProvider(fake.New("ollama-cloud"))
+
+	// Everything after the first slash, up to the next comma, is the model
+	// id: colons and additional slashes pass through untouched.
+	for spec, want := range map[string]string{
+		"ollama-cloud/minimax-m3:cloud":  "ollama-cloud/minimax-m3:cloud",
+		"google/models/gemini-3.0-pro":   "google/models/gemini-3.0-pro",
+		"ollama-cloud/qwen3-coder:480b-cloud": "ollama-cloud/qwen3-coder:480b-cloud",
+	} {
+		m, err := r.Parse(spec)
+		if err != nil {
+			t.Fatalf("Parse(%q): %v", spec, err)
+		}
+		if got := targetsOf(t, m); !slices.Equal(got, []string{want}) {
+			t.Errorf("Parse(%q) targets = %v, want [%s]", spec, got, want)
+		}
+	}
+}
+
+// TestParseTrailingAliasChain covers the README's flagship example: a chain
+// whose tail is a registered alias, expanded inline.
+func TestParseTrailingAliasChain(t *testing.T) {
+	r := newTestRegistry(t)
+	r.RegisterProvider(fake.New("ollama-cloud"))
+	r.RegisterProvider(fake.New("anthropic"))
+	r.RegisterProvider(fake.New("openai"))
+	r.RegisterAlias("thinking", "openai/gpt-5.5,anthropic/opus-4.8")
+
+	m, err := r.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8,thinking")
+	if err != nil {
+		t.Fatalf("Parse: %v", err)
+	}
+	// "thinking" expands inline at the tail; its anthropic/opus-4.8 element
+	// is a duplicate of the explicit one and is kept once (first wins).
+	want := []string{
+		"ollama-cloud/minimax-m3:cloud",
+		"ollama-cloud/kimi-k2.6:cloud",
+		"anthropic/opus-4.8",
+		"openai/gpt-5.5",
+	}
+	if got := targetsOf(t, m); !slices.Equal(got, want) {
+		t.Errorf("targets = %v, want %v", got, want)
+	}
+}
+
+func TestParseAliasPositions(t *testing.T) {
+	r := newTestRegistry(t)
+	r.RegisterProvider(fake.New("fp"))
+	r.RegisterAlias("mid", "fp/m1,fp/m2")
+
+	m, err := r.Parse("fp/head,mid,fp/tail")
+	if err != nil {
+		t.Fatalf("Parse: %v", err)
+	}
+	want := []string{"fp/head", "fp/m1", "fp/m2", "fp/tail"}
+	if got := targetsOf(t, m); !slices.Equal(got, want) {
+		t.Errorf("targets = %v, want %v", got, want)
+	}
+}
+
+func TestParseNestedAlias(t *testing.T) {
+	r := newTestRegistry(t)
+	r.RegisterProvider(fake.New("fp"))
+	r.RegisterAlias("inner", "fp/deep")
+	r.RegisterAlias("outer", "inner,fp/shallow")
+
+	m, err := r.Parse("outer")
+	if err != nil {
+		t.Fatalf("Parse: %v", err)
+	}
+	want := []string{"fp/deep", "fp/shallow"}
+	if got := targetsOf(t, m); !slices.Equal(got, want) {
+		t.Errorf("targets = %v, want %v", got, want)
+	}
+}
+
+func TestParseAliasCycle(t *testing.T) {
+	r := newTestRegistry(t)
+	r.RegisterAlias("a", "b")
+	r.RegisterAlias("b", "a")
+	if _, err := r.Parse("a"); !errors.Is(err, ErrAliasCycle) {
+		t.Errorf("Parse(a) error = %v, want ErrAliasCycle", err)
+	}
+
+	r.RegisterAlias("self", "self")
+	if _, err := r.Parse("self"); !errors.Is(err, ErrAliasCycle) {
+		t.Errorf("Parse(self) error = %v, want ErrAliasCycle", err)
+	}
+}
+
+func TestParseUnknownAlias(t *testing.T) {
+	r := newTestRegistry(t)
+	if _, err := r.Parse("nonesuch"); !errors.Is(err, ErrUnknownProvider) {
+		t.Errorf("error = %v, want ErrUnknownProvider", err)
+	}
+}
+
+func TestParseBareProviderName(t *testing.T) {
+	r := newTestRegistry(t)
+	_, err := r.Parse("openai")
+	if err == nil || !strings.Contains(err.Error(), "openai/<model-id>") {
+		t.Errorf("error = %v, want hint about openai/<model-id>", err)
+	}
+}
+
+func TestParseUnknownProviderMentionsEnvVar(t *testing.T) {
+	r := newTestRegistry(t)
+	_, err := r.Parse("nope/some-model")
+	if !errors.Is(err, ErrUnknownProvider) {
+		t.Fatalf("error = %v, want ErrUnknownProvider", err)
+	}
+	if !strings.Contains(err.Error(), "LLM_NOPE") {
+		t.Errorf("error %q should mention the LLM_NOPE env var", err)
+	}
+}
+
+func TestParseEmptySpecs(t *testing.T) {
+	r := newTestRegistry(t)
+	for _, spec := range []string{"", "   ", ",", " , ,"} {
+		if _, err := r.Parse(spec); !errors.Is(err, ErrEmptySpec) {
+			t.Errorf("Parse(%q) error = %v, want ErrEmptySpec", spec, err)
+		}
+	}
+}
+
+func TestParseTrimsWhitespace(t *testing.T) {
+	r := newTestRegistry(t)
+	r.RegisterProvider(fake.New("fp"))
+	m, err := r.Parse("  fp/a , fp/b  ")
+	if err != nil {
+		t.Fatalf("Parse: %v", err)
+	}
+	want := []string{"fp/a", "fp/b"}
+	if got := targetsOf(t, m); !slices.Equal(got, want) {
+		t.Errorf("targets = %v, want %v", got, want)
+	}
+}
+
+func TestParseDeduplicatesElements(t *testing.T) {
+	r := newTestRegistry(t)
+	r.RegisterProvider(fake.New("fp"))
+	m, err := r.Parse("fp/a,fp/b,fp/a")
+	if err != nil {
+		t.Fatalf("Parse: %v", err)
+	}
+	want := []string{"fp/a", "fp/b"}
+	if got := targetsOf(t, m); !slices.Equal(got, want) {
+		t.Errorf("targets = %v, want %v", got, want)
+	}
+}
+
+func TestBuiltinsResolve(t *testing.T) {
+	r := newTestRegistry(t)
+	// All built-in provider names resolve even before their client
+	// implementations land (stub providers error only on use).
+	for _, name := range []string{"openai", "anthropic", "google", "ollama", "ollama-cloud", "foreman"} {
+		if _, err := r.Parse(name + "/anything"); err != nil {
+			t.Errorf("Parse(%s/anything): %v", name, err)
+		}
+	}
+}
@@ -0,0 +1,36 @@
+# progress
+
+## 2026-06-10 — Phase 1: foundations, ADRs, skeleton, docs
+
+**Landed:**
+- Module scaffold (Go 1.26), `.gitea/workflows/ci.yaml` (foreman-style
+  gates: build, vet, race tests, tidy-diff), `.env.example`.
+- `llm/` canonical contract: Message/Part (sealed; text+image),
+  Request/Options, Response/Usage/FinishReason, Stream/StreamEvent,
+  Tool/Toolbox (panic-safe Execute), Capabilities (zero-value semantics),
+  Model/Provider interfaces, APIError + transient/permanent Classify.
+- `health/`: clock-injected tracker — consecutive-failure threshold,
+  exponential capped cooldown, reset-on-success, thread-safe; full
+  deterministic test suite (fake clock).
+- Root: Registry (providers/aliases/schemes/health), Parse with the binding
+  grammar (verbatim model ids, inline recursive alias expansion, cycle
+  detection, dedup), LLM_* env-DSN loading (go-llm-parity lazy fallback +
+  eager LoadEnv/New scan), chain executor implementing Model
+  (retry-on-transient, bench-on-repeat, skip-benched, 404-advance,
+  fail-fast-on-auth, joined exhaustion errors). Built-ins register as
+  resolvable stubs until their phases land.
+- `provider/fake/`: scriptable provider (per-model outcome queues, request
+  recording, capabilities overrides, streaming) — the hermetic test rig.
+- ADRs 0001–0008 + index; CLAUDE.md; honest README with pending-marked
+  matrix.
+- Tests cover the two required cases: the trailing-`thinking` chain parse
+  and `LLM_M1=foreman://token@host` loading (plus DSN table, lazy fallback,
+  cycle detection, chain failover/backoff/exhaustion, toolbox execution,
+  error classification).
+
+**Notes:** chain executor landed in Phase 1 (design was settled);
+Phase 2 deepens its test matrix (cooldown re-admission via fake clock,
+alias-in-chain failover, permanent-policy override) and wires anything the
+tests flush out.
+
+**Next:** Phase 2 — exhaustive health/chain test matrix.
@@ -0,0 +1,230 @@
+// Package fake provides an in-memory llm.Provider for hermetic tests.
+//
+// Why: the resolver, env-DSN loader, chain executor, health tracker, agent
+// loop, and skill composition must all be testable with no live API calls.
+// The fake provider scripts responses and errors per model id, records every
+// request it receives, and supports tools, structured output, and streaming
+// well enough to drive those layers deterministically.
+package fake
+
+import (
+	"context"
+	"fmt"
+	"io"
+	"sync"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+)
+
+// Step is one scripted outcome: either a response or an error.
+type Step struct {
+	Response *llm.Response
+	Err      error
+}
+
+// Reply scripts a successful text response.
+func Reply(text string) Step {
+	return Step{Response: &llm.Response{
+		Parts:        []llm.Part{llm.Text(text)},
+		FinishReason: llm.FinishStop,
+		Usage:        llm.Usage{InputTokens: 1, OutputTokens: 1},
+	}}
+}
+
+// ReplyWith scripts an arbitrary successful response.
+func ReplyWith(resp llm.Response) Step { return Step{Response: &resp} }
+
+// Fail scripts an error outcome.
+func Fail(err error) Step { return Step{Err: err} }
+
+// Call records one request received by the fake, with the model id it was
+// addressed to.
+type Call struct {
+	ModelID string
+	Request llm.Request
+}
+
+// Provider is a scriptable in-memory llm.Provider.
+//
+// Outcomes are enqueued per model id with Enqueue. A model whose queue is
+// empty falls back to the provider default response (a fixed text reply).
+// All methods are safe for concurrent use.
+type Provider struct {
+	name string
+
+	mu        sync.Mutex
+	caps      llm.Capabilities
+	modelCaps map[string]llm.Capabilities
+	queues    map[string][]Step
+	calls     []Call
+	defaultFn func(modelID string, req llm.Request) Step
+}
+
+// Option configures the fake provider.
+type Option func(*Provider)
+
+// WithCapabilities sets the provider-default capabilities.
+func WithCapabilities(caps llm.Capabilities) Option {
+	return func(p *Provider) { p.caps = caps }
+}
+
+// WithModelCapabilities overrides capabilities for one model id.
+func WithModelCapabilities(modelID string, caps llm.Capabilities) Option {
+	return func(p *Provider) { p.modelCaps[modelID] = caps }
+}
+
+// WithDefault sets the outcome used when a model's queue is empty.
+func WithDefault(fn func(modelID string, req llm.Request) Step) Option {
+	return func(p *Provider) { p.defaultFn = fn }
+}
+
+// New creates a fake provider with the given registry name.
+func New(name string, opts ...Option) *Provider {
+	p := &Provider{
+		name:      name,
+		modelCaps: make(map[string]llm.Capabilities),
+		queues:    make(map[string][]Step),
+		caps: llm.Capabilities{
+			SupportsTools:      true,
+			SupportsStructured: true,
+			SupportsStreaming:  true,
+			MaxImagesPerReq:    4,
+		},
+		defaultFn: func(modelID string, _ llm.Request) Step {
+			return Reply(fmt.Sprintf("fake response from %s", modelID))
+		},
+	}
+	for _, opt := range opts {
+		opt(p)
+	}
+	return p
+}
+
+// Name implements llm.Provider.
+func (p *Provider) Name() string { return p.name }
+
+// Model implements llm.Provider. Any id is accepted.
+func (p *Provider) Model(id string, opts ...llm.ModelOption) (llm.Model, error) {
+	cfg := llm.ApplyModelOptions(opts)
+	return &model{provider: p, id: id, cfg: cfg}, nil
+}
+
+// Enqueue appends scripted outcomes for a model id.
+func (p *Provider) Enqueue(modelID string, steps ...Step) {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	p.queues[modelID] = append(p.queues[modelID], steps...)
+}
+
+// Calls returns a copy of every request received so far.
+func (p *Provider) Calls() []Call {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	out := make([]Call, len(p.calls))
+	copy(out, p.calls)
+	return out
+}
+
+// CallCount returns the number of requests received for one model id.
+func (p *Provider) CallCount(modelID string) int {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	n := 0
+	for _, c := range p.calls {
+		if c.ModelID == modelID {
+			n++
+		}
+	}
+	return n
+}
+
+// next records the call and pops the next scripted outcome.
+func (p *Provider) next(modelID string, req llm.Request) Step {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	p.calls = append(p.calls, Call{ModelID: modelID, Request: req})
+	q := p.queues[modelID]
+	if len(q) == 0 {
+		return p.defaultFn(modelID, req)
+	}
+	step := q[0]
+	p.queues[modelID] = q[1:]
+	return step
+}
+
+func (p *Provider) capsFor(modelID string, cfg llm.ModelConfig) llm.Capabilities {
+	if cfg.Capabilities != nil {
+		return *cfg.Capabilities
+	}
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	if caps, ok := p.modelCaps[modelID]; ok {
+		return caps
+	}
+	return p.caps
+}
+
+type model struct {
+	provider *Provider
+	id       string
+	cfg      llm.ModelConfig
+}
+
+func (m *model) Generate(ctx context.Context, req llm.Request, opts ...llm.Option) (*llm.Response, error) {
+	if err := ctx.Err(); err != nil {
+		return nil, err
+	}
+	req = req.Apply(opts...)
+	step := m.provider.next(m.id, req)
+	if step.Err != nil {
+		return nil, step.Err
+	}
+	resp := *step.Response
+	if resp.Model == "" {
+		resp.Model = m.provider.name + "/" + m.id
+	}
+	return &resp, nil
+}
+
+func (m *model) Stream(ctx context.Context, req llm.Request, opts ...llm.Option) (llm.Stream, error) {
+	resp, err := m.Generate(ctx, req, opts...)
+	if err != nil {
+		return nil, err
+	}
+	// Deliver the response as a small sequence of events: one text delta per
+	// part, one event per tool call, then the final response.
+	var events []llm.StreamEvent
+	for _, part := range resp.Parts {
+		if t, ok := part.(llm.TextPart); ok {
+			events = append(events, llm.StreamEvent{TextDelta: t.Text})
+		}
+	}
+	for i := range resp.ToolCalls {
+		events = append(events, llm.StreamEvent{ToolCall: &resp.ToolCalls[i]})
+	}
+	events = append(events, llm.StreamEvent{Response: resp})
+	return &stream{events: events}, nil
+}
+
+func (m *model) Capabilities() llm.Capabilities {
+	return m.provider.capsFor(m.id, m.cfg)
+}
+
+type stream struct {
+	mu     sync.Mutex
+	events []llm.StreamEvent
+	pos    int
+}
+
+func (s *stream) Next() (llm.StreamEvent, error) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.pos >= len(s.events) {
+		return llm.StreamEvent{}, io.EOF
+	}
+	ev := s.events[s.pos]
+	s.pos++
+	return ev, nil
+}
+
+func (s *stream) Close() error { return nil }
@@ -0,0 +1,227 @@
+package majordomo
+
+import (
+	"fmt"
+	"os"
+	"strings"
+	"sync"
+	"time"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/health"
+	"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+)
+
+// Registry owns providers, aliases/tiers, env-DSN scheme factories, the
+// model health tracker, and Parse. It is safe for concurrent use.
+type Registry struct {
+	mu        sync.RWMutex
+	providers map[string]llm.Provider
+	aliases   map[string]string
+	schemes   map[string]SchemeFactory
+	// envErrs records LLM_* entries that failed to load so the failure
+	// surfaces when (and only when) that provider name is actually used.
+	envErrs map[string]error
+
+	tracker   *health.Tracker
+	chainCfg  ChainConfig
+	envLookup func(string) string
+}
+
+// SchemeFactory builds a provider instance from an env DSN. name is the
+// registry name the provider will be registered under (e.g. "m1" for
+// LLM_M1); dsn carries the scheme, credential, and host.
+type SchemeFactory func(name string, dsn DSN) (llm.Provider, error)
+
+// ChainConfig tunes failover-chain execution.
+type ChainConfig struct {
+	// TransientRetries is the number of immediate same-target retries after
+	// a transient error. 0 selects the default (1); negative disables
+	// retries.
+	TransientRetries int
+
+	// AdvanceOnPermanent, when true, makes the chain advance to the next
+	// element on permanent errors other than model-not-found instead of
+	// returning immediately. Model-not-found always advances (without
+	// penalizing health); auth/malformed errors default to fail-fast
+	// because failing over cannot help a bad request.
+	AdvanceOnPermanent bool
+
+	// Classify overrides the default error classifier (llm.Classify).
+	Classify func(error) llm.ErrorClass
+}
+
+// DefaultTransientRetries is the default number of same-target retries
+// after a single transient error.
+const DefaultTransientRetries = 1
+
+func (c ChainConfig) retries() int {
+	switch {
+	case c.TransientRetries < 0:
+		return 0
+	case c.TransientRetries == 0:
+		return DefaultTransientRetries
+	default:
+		return c.TransientRetries
+	}
+}
+
+func (c ChainConfig) classify(err error) llm.ErrorClass {
+	if c.Classify != nil {
+		return c.Classify(err)
+	}
+	return llm.Classify(err)
+}
+
+type registryConfig struct {
+	health    health.Config
+	chain     ChainConfig
+	envLookup func(string) string
+	environ   func() []string
+	skipEnv   bool
+}
+
+// RegistryOption configures New.
+type RegistryOption func(*registryConfig)
+
+// WithHealthConfig overrides the health tracker configuration
+// (thresholds, cooldowns, clock).
+func WithHealthConfig(cfg health.Config) RegistryOption {
+	return func(rc *registryConfig) { rc.health = cfg }
+}
+
+// WithChainConfig overrides failover-chain behavior (retry count,
+// permanent-error policy, classifier).
+func WithChainConfig(cfg ChainConfig) RegistryOption {
+	return func(rc *registryConfig) { rc.chain = cfg }
+}
+
+// WithClock injects a clock for the health tracker; tests use a fake clock
+// to step through backoff windows deterministically.
+func WithClock(clock func() time.Time) RegistryOption {
+	return func(rc *registryConfig) { rc.health.Clock = clock }
+}
+
+// WithEnvLookup injects the env-var lookup used for lazy LLM_* resolution
+// during Parse (defaults to os.Getenv). Tests use this to avoid touching
+// the process environment.
+func WithEnvLookup(lookup func(string) string) RegistryOption {
+	return func(rc *registryConfig) { rc.envLookup = lookup }
+}
+
+// WithoutEnvProviders disables the eager LLM_* scan at construction time.
+// Lazy per-name resolution during Parse still works (use WithEnvLookup to
+// control it in tests).
+func WithoutEnvProviders() RegistryOption {
+	return func(rc *registryConfig) { rc.skipEnv = true }
+}
+
+// New creates a Registry with all built-in providers and scheme factories
+// registered, then loads LLM_* env-DSN providers from the process
+// environment (unless WithoutEnvProviders is given). Malformed LLM_* entries
+// do not fail construction; the error surfaces if that provider name is
+// referenced in Parse.
+func New(opts ...RegistryOption) *Registry {
+	cfg := registryConfig{
+		envLookup: os.Getenv,
+		environ:   os.Environ,
+	}
+	for _, opt := range opts {
+		opt(&cfg)
+	}
+
+	r := &Registry{
+		providers: make(map[string]llm.Provider),
+		aliases:   make(map[string]string),
+		schemes:   make(map[string]SchemeFactory),
+		envErrs:   make(map[string]error),
+		tracker:   health.NewTracker(cfg.health),
+		chainCfg:  cfg.chain,
+		envLookup: cfg.envLookup,
+	}
+
+	registerBuiltins(r)
+
+	if !cfg.skipEnv {
+		env := make(map[string]string)
+		for _, kv := range cfg.environ() {
+			if k, v, ok := strings.Cut(kv, "="); ok {
+				env[k] = v
+			}
+		}
+		// Errors are recorded per-name and surfaced on use; see envErrs.
+		_ = r.LoadEnv(env)
+	}
+	return r
+}
+
+// RegisterProvider adds or replaces a provider under its Name().
+func (r *Registry) RegisterProvider(p llm.Provider) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	r.providers[p.Name()] = p
+}
+
+// RegisterAlias maps a bare name (no slash) to a spec. The spec may be a
+// single target, another alias, or a comma-separated chain; Parse expands
+// aliases inline and recursively, with cycle detection.
+func (r *Registry) RegisterAlias(name, spec string) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	r.aliases[name] = spec
+}
+
+// RegisterScheme adds or replaces an env-DSN scheme factory, letting
+// consumers wire custom provider kinds into LLM_* definitions.
+func (r *Registry) RegisterScheme(scheme string, factory SchemeFactory) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	r.schemes[scheme] = factory
+}
+
+// Provider returns the registered provider with the given name, if any.
+func (r *Registry) Provider(name string) (llm.Provider, bool) {
+	r.mu.RLock()
+	defer r.mu.RUnlock()
+	p, ok := r.providers[name]
+	return p, ok
+}
+
+// Health exposes the registry's health tracker (read-mostly; useful for
+// diagnostics and tests).
+func (r *Registry) Health() *health.Tracker { return r.tracker }
+
+// providerFor resolves a provider name: registered providers first, then a
+// recorded env-load error for that name, then lazy LLM_* env resolution
+// (go-llm parity: "m5" → env LLM_M5, "my-prov" → LLM_MY_PROV). Providers
+// resolved lazily are cached in the registry.
+func (r *Registry) providerFor(name string) (llm.Provider, error) {
+	r.mu.RLock()
+	p, ok := r.providers[name]
+	envErr := r.envErrs[name]
+	r.mu.RUnlock()
+	if ok {
+		return p, nil
+	}
+	if envErr != nil {
+		return nil, envErr
+	}
+
+	envKey := "LLM_" + strings.ToUpper(strings.ReplaceAll(name, "-", "_"))
+	envVal := r.envLookup(envKey)
+	if envVal == "" {
+		return nil, fmt.Errorf("%w: %q (checked registry and %s env var)", ErrUnknownProvider, name, envKey)
+	}
+	p, err := r.providerFromDSN(name, envVal)
+	if err != nil {
+		return nil, fmt.Errorf("parse %s: %w", envKey, err)
+	}
+
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	// Another goroutine may have raced us here; keep the first registration.
+	if existing, ok := r.providers[name]; ok {
+		return existing, nil
+	}
+	r.providers[name] = p
+	return p, nil
+}