# Phase 1 design summary (for after-the-fact review)

Written at the Phase 1 → 2 boundary of the unattended build run
(2026-06-10). Captures the public surface and the decisions behind it.
Authoritative details live in the ADRs; this is the review digest.

## What the library looks like to a consumer

```go
reg := majordomo.New()                      // built-ins + LLM_* env providers
reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")

m, err := reg.Parse("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking")
resp, err := m.Generate(ctx, majordomo.Request{
    System:   "You are terse.",
    Messages: []majordomo.Message{majordomo.UserText("hi")},
}, majordomo.WithMaxTokens(200))
```

- `Model` = `Generate` / `Stream` / `Capabilities`; a chain and a single
  target are the same interface.
- `Provider` = `Name` / `Model(id, opts...)`; ids verbatim, no catalogs.
- Canonical types live in `majordomo/llm`, re-exported at root via aliases
  (ADR-0001) — providers import `llm` only.

## Parse grammar (ADR-0003)

`spec := element ("," element)*`. An element is either `provider/model`
(the model id is everything after the first slash, verbatim) or a bare alias
token, which is expanded inline and recursively with cycle detection. Both
kickoff README examples are covered by tests, including the
trailing-`thinking` variant and dedup of overlapping alias expansions.
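
The grammar is small enough to sketch in one function. The following is a minimal illustration, not the library's parser: `Target` and `expand` are hypothetical names, and trimming whitespace and rejecting empty elements are assumptions this digest does not confirm. It shows the first-slash split, recursive alias expansion with cycle detection, and first-occurrence-wins dedup.

```go
package main

import (
	"fmt"
	"strings"
)

// Target is a hypothetical resolved provider/model pair.
type Target struct {
	Provider string
	Model    string
}

// expand resolves a comma-separated spec into an ordered, deduplicated
// target list. aliases maps a bare token to another spec.
func expand(spec string, aliases map[string]string) ([]Target, error) {
	var out []Target
	seen := map[Target]bool{}   // dedup: first occurrence wins
	active := map[string]bool{} // aliases on the current expansion path

	var walk func(spec string) error
	walk = func(spec string) error {
		for _, raw := range strings.Split(spec, ",") {
			el := strings.TrimSpace(raw) // assumption: surrounding whitespace is ignored
			if el == "" {
				return fmt.Errorf("empty element in %q", spec)
			}
			// Everything after the first slash is the model id, verbatim.
			if provider, model, ok := strings.Cut(el, "/"); ok {
				if provider == "" || model == "" {
					return fmt.Errorf("malformed element %q", el)
				}
				t := Target{Provider: provider, Model: model}
				if !seen[t] {
					seen[t] = true
					out = append(out, t)
				}
				continue
			}
			// No slash: treat the token as an alias.
			body, ok := aliases[el]
			if !ok {
				return fmt.Errorf("unknown alias %q", el)
			}
			if active[el] {
				return fmt.Errorf("alias cycle at %q", el)
			}
			active[el] = true
			err := walk(body)
			delete(active, el)
			if err != nil {
				return err
			}
		}
		return nil
	}

	if err := walk(spec); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	aliases := map[string]string{
		"thinking": "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud",
	}
	targets, err := expand("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking", aliases)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range targets {
		fmt.Printf("%s/%s\n", t.Provider, t.Model)
	}
}
```

Tracking only the aliases on the current expansion path (rather than every alias ever seen) lets the same alias appear twice in one spec without being reported as a cycle; the dedup set then drops the repeated targets.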

**Deviation from go-llm worth reviewing:** no `:low/:medium/:high`
reasoning-suffix stripping — it conflicts with verbatim ids
(`minimax-m3:cloud`, `richardyoung/qwen3-14b-abliterated:q4_K_M` in mort's
tiers). Plan: reasoning effort becomes an explicit request option when
providers land; mort's wrapper translates its legacy suffix dialect during
Phase 9. If you want suffix parity instead, it's an additive change behind
a RegistryOption.

## LLM_* env DSNs (ADR-0004)

The parser matches go-llm byte for byte (`scheme://[token@]host[/path]`,
https forced, fail-on-use for malformed values). There are two resolution
paths: an eager scan in `New()`/`LoadEnv(map)` (kickoff requirement;
`LLM_M1` → provider `m1`) **plus** go-llm's lazy `LLM_{UPPER(name)}`
fallback at Parse time (so hyphenated names keep working). Schemes are
factories (`RegisterScheme`), so consumers can bind custom provider kinds to
DSNs.
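
As a rough illustration of the eager path, the sketch below scans an environment map for `LLM_*` keys and parses each value as a DSN. It is not the go-llm parser: it leans on `net/url`, and `Endpoint`, `parseDSN`, `loadEnv`, the `ollama` scheme, and the host names are all hypothetical. The lazy `LLM_{UPPER(name)}` fallback is not modeled.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// Endpoint is a hypothetical parsed DSN.
type Endpoint struct {
	Scheme  string // provider kind, used to pick a factory
	Token   string // optional credential from the userinfo part
	BaseURL string // always https, regardless of the DSN scheme
	Err     error  // non-nil for malformed values; surfaced on use, not at load
}

// parseDSN parses scheme://[token@]host[/path]. Malformed input is not
// rejected here; the error is stored so that it fails on use.
func parseDSN(dsn string) Endpoint {
	u, err := url.Parse(dsn)
	if err != nil {
		return Endpoint{Err: fmt.Errorf("malformed DSN: %w", err)}
	}
	if u.Scheme == "" || u.Host == "" {
		return Endpoint{Err: errors.New("malformed DSN: want scheme://[token@]host[/path]")}
	}
	base := url.URL{Scheme: "https", Host: u.Host, Path: u.Path}
	return Endpoint{
		Scheme:  u.Scheme,
		Token:   u.User.Username(), // safe when no userinfo is present
		BaseURL: base.String(),
	}
}

// loadEnv performs the eager scan: LLM_M1 becomes provider "m1".
func loadEnv(env map[string]string) map[string]Endpoint {
	out := map[string]Endpoint{}
	for key, value := range env {
		if !strings.HasPrefix(key, "LLM_") {
			continue
		}
		name := strings.ToLower(strings.TrimPrefix(key, "LLM_"))
		if name == "" {
			continue
		}
		out[name] = parseDSN(value)
	}
	return out
}

func main() {
	providers := loadEnv(map[string]string{
		"LLM_M1":  "ollama://secret@m1.example.internal/api", // hypothetical DSN
		"LLM_BAD": "not a dsn",
		"HOME":    "/home/build",
	})
	m1 := providers["m1"]
	fmt.Println(m1.Scheme, m1.BaseURL, m1.Err)
	fmt.Println(providers["bad"].Err)
}
```

Storing the parse error on the entry instead of returning it from the scan is what gives fail-on-use behavior: a broken variable only hurts callers that actually select that provider.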

## Health & chains (ADR-0006, ADR-0008)

Clock-injected in-memory tracker keyed by `provider/model`. Errors are split
into transient and permanent by `llm.Classify` (unknown → transient;
`context.Canceled` → permanent). Defaults: 1 same-target retry; bench after
2 consecutive failed attempts; cooldown starts at 5s and doubles up to a 5m
cap; success resets everything. Chains skip benched targets, advance
penalty-free on 404, fail fast on auth/malformed errors (flippable via
`AdvanceOnPermanent`), and join per-target reasons on exhaustion. Chain
`Capabilities()` reports the head element (per-attempt media normalization
will use the actual target, Phase 3). Streaming failover covers stream
establishment only; errors after a stream has started do not trigger
failover.

## Flagged for reconsideration

1. **Reasoning suffixes** (above) — deliberate deviation, easy to add back.
2. **Duplicate-element dedup in chains** (first occurrence wins): right for
   health semantics, but means `a,b,a` won't retry `a` at the tail even
   after `b` fails. Believed correct (same request, same bench state);
   flag if "retry head last" matters to you.
3. **`AdvanceOnPermanent` default = fail-fast** on auth/malformed errors:
   matches the kickoff; mort's old behavior was closer to
   advance-on-everything. Phase 9 can set the flag per-registry if mort's
   UX prefers availability.
4. **Stub built-ins**: until Phases 3–4, `openai/...` etc. parse fine and
   error on use with "not implemented yet". Chains mixing stubs and real
   providers will fail over past stubs naturally, because the error
   classifies as transient. This is temporary and gone by Phase 4.

## ADR set

0001 package layout · 0002 message model · 0003 parse grammar ·
0004 env DSNs · 0005 provider/capabilities · 0006 health/backoff ·
0007 dependency policy · 0008 chain semantics