majordomo/docs/phase-1-design.md
steve dcd004289f feat: foundations — canonical types, Parse grammar, env DSNs, health, chains
Phase 1 of the majordomo build:
- llm/ canonical contract (messages, parts, tools, capabilities, streaming,
  Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped
  cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias
  expansion with cycle detection, chain dedup), LLM_* env-DSN providers
  (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain
  executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the
  trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow,
  docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:35:34 +02:00

# Phase 1 design summary (for after-the-fact review)
Written at the Phase 1 → 2 boundary of the unattended build run
(2026-06-10). Captures the public surface and the decisions behind it.
Authoritative details live in the ADRs; this is the review digest.
## What the library looks like to a consumer
```go
reg := majordomo.New() // built-ins + LLM_* env providers
reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")
m, err := reg.Parse("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking")
resp, err := m.Generate(ctx, majordomo.Request{
	System:   "You are terse.",
	Messages: []majordomo.Message{majordomo.UserText("hi")},
}, majordomo.WithMaxTokens(200))
```
- `Model` = `Generate` / `Stream` / `Capabilities`; a chain and a single
target are the same interface.
- `Provider` = `Name` / `Model(id, opts...)`; ids verbatim, no catalogs.
- Canonical types live in `majordomo/llm`, re-exported at root via aliases
(ADR-0001) — providers import `llm` only.
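
To make the "same interface" point concrete, here is a minimal sketch. It is not the library's code: `Model` is reduced to a single `Generate` method over plain strings, and `target`, `chain`, `echo`, `broken`, and `ask` are hypothetical names introduced only for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Model is a reduced, hypothetical stand-in for the real interface: only
// Generate is shown, and request/response are plain strings.
type Model interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// target is a single provider/model pair backed by a function (hypothetical).
type target struct {
	name string
	call func(ctx context.Context, prompt string) (string, error)
}

func (t target) Generate(ctx context.Context, prompt string) (string, error) {
	out, err := t.call(ctx, prompt)
	if err != nil {
		return "", fmt.Errorf("%s: %w", t.name, err)
	}
	return out, nil
}

// chain tries each member in order and satisfies Model itself, so callers
// cannot tell a chain from a single target.
type chain []Model

func (c chain) Generate(ctx context.Context, prompt string) (string, error) {
	if len(c) == 0 {
		return "", errors.New("empty chain")
	}
	var reasons []error
	for _, m := range c {
		out, err := m.Generate(ctx, prompt)
		if err == nil {
			return out, nil
		}
		reasons = append(reasons, err)
	}
	return "", errors.Join(reasons...) // every member failed
}

// echo builds a hypothetical target that always succeeds.
func echo(name string) Model {
	return target{name, func(_ context.Context, p string) (string, error) {
		return name + " says: " + p, nil
	}}
}

// broken builds a hypothetical target that always fails.
func broken(name string) Model {
	return target{name, func(context.Context, string) (string, error) {
		return "", errors.New("unavailable")
	}}
}

// ask accepts any Model, whether a single target or a chain.
func ask(m Model, prompt string) (string, error) {
	return m.Generate(context.Background(), prompt)
}

func main() {
	fmt.Println(ask(echo("a/x"), "hi"))
	fmt.Println(ask(chain{broken("a/x"), echo("b/y")}, "hi"))
}
```

The sketch leaves out health tracking and error classification; it only shows why a consumer holding a `Model` never needs to know whether `Parse` returned one target or several.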
## Parse grammar (ADR-0003)
`spec := element ("," element)*`. An element is either `provider/model`
(the model id is everything after the first slash, verbatim) or a bare alias
token, expanded inline and recursively with cycle detection. Both kickoff README examples are
covered by tests, including the trailing-`thinking` variant and dedup of
overlapping alias expansions.
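
The grammar can be sketched as a small recursive expander. This is an illustration under stated assumptions, not the code behind `Parse`: the `expand` function, its error messages, and the whitespace trimming are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// expand resolves a comma-separated spec into an ordered, de-duplicated list
// of "provider/model" targets. aliases maps a bare token to another spec.
func expand(spec string, aliases map[string]string) ([]string, error) {
	var out []string
	seen := map[string]bool{}   // targets already emitted (first occurrence wins)
	active := map[string]bool{} // aliases on the current expansion path

	var walk func(spec string) error
	walk = func(spec string) error {
		for _, el := range strings.Split(spec, ",") {
			el = strings.TrimSpace(el) // assumption: surrounding whitespace is ignored
			if el == "" {
				return fmt.Errorf("empty element in spec %q", spec)
			}
			// Split at the first slash only; the model id stays verbatim,
			// including any later slashes or colons.
			if provider, model, ok := strings.Cut(el, "/"); ok {
				if provider == "" || model == "" {
					return fmt.Errorf("malformed element %q", el)
				}
				if !seen[el] {
					seen[el] = true
					out = append(out, el)
				}
				continue
			}
			body, ok := aliases[el]
			if !ok {
				return fmt.Errorf("unknown alias %q", el)
			}
			if active[el] {
				return fmt.Errorf("alias cycle through %q", el)
			}
			active[el] = true
			if err := walk(body); err != nil {
				return err
			}
			delete(active, el) // leaving this alias; siblings may reuse it
		}
		return nil
	}
	if err := walk(spec); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	aliases := map[string]string{
		"thinking": "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud",
	}
	targets, err := expand("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking", aliases)
	fmt.Println(targets, err)
}
```

Tracking only the aliases on the current path (rather than every alias ever visited) lets the same alias appear twice in one spec without being reported as a cycle; the dedup map then drops the repeated targets.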
**Deviation from go-llm worth reviewing:** no `:low/:medium/:high`
reasoning-suffix stripping — it conflicts with verbatim ids
(`minimax-m3:cloud`, `richardyoung/qwen3-14b-abliterated:q4_K_M` in mort's
tiers). Plan: reasoning effort becomes an explicit request option when
providers land; mort's wrapper translates its legacy suffix dialect during
Phase 9. If you want suffix parity instead, it's an additive change behind
a RegistryOption.
## LLM_* env DSNs (ADR-0004)
The parser matches go-llm byte for byte (`scheme://[token@]host[/path]`, https
forced, fail-on-use for malformed values). Two resolution paths:
eager scan in `New()`/`LoadEnv(map)` (kickoff requirement;
`LLM_M1` → provider `m1`) **plus** go-llm's lazy `LLM_{UPPER(name)}`
fallback at Parse time (so hyphenated names keep working). Schemes are
factories (`RegisterScheme`) — consumers can bind custom provider kinds to
DSNs.
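
A rough sketch of the two resolution paths follows. It is not the go-llm parser: it leans on `net/url` instead of reproducing go-llm byte for byte, it reads "https forced" as "the base URL is always built with `https://`", and `dsn`, `entry`, `parseDSN`, `loadEnv`, `resolve`, and the host names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// dsn is the parsed form of scheme://[token@]host[/path].
type dsn struct {
	Scheme  string // selects the provider factory
	Token   string // optional credential
	BaseURL string // always https in this sketch
}

func parseDSN(raw string) (dsn, error) {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme == "" || u.Host == "" {
		// The raw value is left out of the message because it may hold a token.
		return dsn{}, errors.New("malformed DSN: want scheme://[token@]host[/path]")
	}
	return dsn{
		Scheme:  u.Scheme,
		Token:   u.User.Username(), // nil-safe: "" when no token is present
		BaseURL: "https://" + u.Host + u.Path,
	}, nil
}

// entry keeps a parse error so it can surface on use rather than at load.
type entry struct {
	d   dsn
	err error
}

// loadEnv is the eager path: LLM_M1 becomes provider "m1".
func loadEnv(env map[string]string) map[string]entry {
	out := map[string]entry{}
	for k, v := range env {
		if !strings.HasPrefix(k, "LLM_") || k == "LLM_" {
			continue
		}
		d, err := parseDSN(v)
		out[strings.ToLower(strings.TrimPrefix(k, "LLM_"))] = entry{d, err}
	}
	return out
}

// resolve checks the eager table first, then falls back lazily to
// LLM_{UPPER(name)} in env.
func resolve(name string, eager map[string]entry, env map[string]string) (dsn, error) {
	e, ok := eager[name]
	if !ok {
		raw, found := env["LLM_"+strings.ToUpper(name)]
		if !found {
			return dsn{}, fmt.Errorf("unknown provider %q", name)
		}
		d, err := parseDSN(raw)
		e = entry{d, err}
	}
	if e.err != nil {
		return dsn{}, fmt.Errorf("provider %q: %w", name, e.err)
	}
	return e.d, nil
}

func main() {
	eager := loadEnv(map[string]string{"LLM_M1": "foreman://tok@host.example/v1"})
	fmt.Println(resolve("m1", eager, nil))
}
```

Storing the parse error in the table is what gives fail-on-use: a bad `LLM_*` value does not break construction, only the request that names that provider.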
## Health & chains (ADR-0006, ADR-0008)
Clock-injected in-memory tracker keyed `provider/model`. Transient vs
permanent via `llm.Classify` (unknown → transient; `context.Canceled`
permanent). Defaults: 1 same-target retry; bench after 2 consecutive failed
attempts; cooldown starts at 5s and doubles up to a 5m cap; success resets everything. Chains skip
benched targets, advance penalty-free on 404, fail fast on auth/malformed
(flippable via `AdvanceOnPermanent`), and join per-target reasons on
exhaustion. Chain `Capabilities()` = head element (per-attempt media
normalization will use the actual target, Phase 3). Streaming failover
covers stream establishment only.
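
The bench and cooldown defaults can be sketched with an injected clock. This is an illustration, not the `health/` package: `Tracker`, `fakeClock`, and the method names are hypothetical, locking is omitted, and the sketch assumes that once a target has been benched, each further failure without an intervening success re-benches it with a doubled cooldown.

```go
package main

import (
	"fmt"
	"time"
)

// fakeClock is a hand-advanced clock for deterministic tests.
type fakeClock struct{ t time.Time }

func newFakeClock() *fakeClock        { return &fakeClock{t: time.Unix(0, 0)} }
func (c *fakeClock) Now() time.Time   { return c.t }
func (c *fakeClock) Advance(secs int) { c.t = c.t.Add(time.Duration(secs) * time.Second) }

type state struct {
	failures int           // consecutive failed attempts
	cooldown time.Duration // most recent cooldown applied
	until    time.Time     // benched until this instant
}

// Tracker is keyed by "provider/model". Not safe for concurrent use.
type Tracker struct {
	now       func() time.Time // injected clock
	threshold int
	base, max time.Duration
	targets   map[string]*state
}

func NewTracker(now func() time.Time) *Tracker {
	return &Tracker{
		now:       now,
		threshold: 2,               // bench after 2 consecutive failures
		base:      5 * time.Second, // first cooldown
		max:       5 * time.Minute, // cooldown cap
		targets:   map[string]*state{},
	}
}

// Available reports whether the target is currently outside a cooldown.
func (t *Tracker) Available(key string) bool {
	s, ok := t.targets[key]
	return !ok || !t.now().Before(s.until)
}

// Failure records a failed attempt and benches once the threshold is reached.
func (t *Tracker) Failure(key string) {
	s, ok := t.targets[key]
	if !ok {
		s = &state{}
		t.targets[key] = s
	}
	s.failures++
	if s.failures < t.threshold {
		return
	}
	switch {
	case s.cooldown == 0:
		s.cooldown = t.base
	case s.cooldown*2 > t.max:
		s.cooldown = t.max
	default:
		s.cooldown *= 2
	}
	s.until = t.now().Add(s.cooldown)
}

// Success resets everything for the target.
func (t *Tracker) Success(key string) { delete(t.targets, key) }

func main() {
	clk := newFakeClock()
	tr := NewTracker(clk.Now)
	key := "m5/qwen3:30b"
	tr.Failure(key)
	tr.Failure(key)
	fmt.Println("available right after bench:", tr.Available(key))
	clk.Advance(5)
	fmt.Println("available after 5s:", tr.Available(key))
}
```

Passing `now` as a function is what keeps the test suite hermetic: tests advance a fake clock instead of sleeping through real cooldowns.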
## Flagged for reconsideration
1. **Reasoning suffixes** (above) — deliberate deviation, easy to add back.
2. **Duplicate-element dedup in chains** (first occurrence wins): right for
health semantics, but means `a,b,a` won't retry `a` at the tail even
after `b` fails. Believed correct (same request, same bench state);
flag if "retry head last" matters to you.
3. **`AdvanceOnPermanent` default = fail-fast** on auth/malformed errors:
matches the kickoff; mort's old behavior was closer to
advance-on-everything. Phase 9 can set the flag per-registry if mort's
UX prefers availability.
4. **Stub built-ins**: until Phases 3-4, `openai/...` etc. parse fine and
error on use with "not implemented yet". Chains mixing stubs and real
providers will fail over past stubs naturally (the error classifies
transient) — temporary, gone by Phase 4.
## ADR set
0001 package layout · 0002 message model · 0003 parse grammar ·
0004 env DSNs · 0005 provider/capabilities · 0006 health/backoff ·
0007 dependency policy · 0008 chain semantics