dcd004289f
Phase 1 of the majordomo build:

- llm/ canonical contract (messages, parts, tools, capabilities, streaming, Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias expansion with cycle detection, chain dedup), LLM_* env-DSN providers (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow, docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
85 lines
3.9 KiB
Markdown
# Phase 1 design summary (for after-the-fact review)

Written at the Phase 1 → 2 boundary of the unattended build run
(2026-06-10). Captures the public surface and the decisions behind it.
Authoritative details live in the ADRs; this is the review digest.

## What the library looks like to a consumer

```go
reg := majordomo.New() // built-ins + LLM_* env providers
reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")

m, err := reg.Parse("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking")
resp, err := m.Generate(ctx, majordomo.Request{
	System:   "You are terse.",
	Messages: []majordomo.Message{majordomo.UserText("hi")},
}, majordomo.WithMaxTokens(200))
```

- `Model` = `Generate` / `Stream` / `Capabilities`; a chain and a single
  target are the same interface.
- `Provider` = `Name` / `Model(id, opts...)`; ids verbatim, no catalogs.
- Canonical types live in `majordomo/llm`, re-exported at root via aliases
  (ADR-0001); providers import `llm` only.

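To make the "a chain and a single target are the same interface" point concrete, here is a minimal sketch. Every type and signature in it (`Request`, `Response`, `Capabilities`, `target`, `chain`) is a hypothetical stand-in rather than majordomo's real surface, and `Stream` is left out for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the canonical types; not the real majordomo API.
type Request struct{ System, Prompt string }
type Response struct{ Text string }
type Capabilities struct{ Streaming bool }

// Model is the one interface both shapes satisfy (Stream omitted here).
type Model interface {
	Generate(ctx context.Context, req Request) (Response, error)
	Capabilities() Capabilities
}

// target is a single provider/model pair; fail simulates an outage.
type target struct {
	id   string
	caps Capabilities
	fail bool
}

func (t target) Generate(ctx context.Context, req Request) (Response, error) {
	if t.fail {
		return Response{}, fmt.Errorf("%s: unavailable", t.id)
	}
	return Response{Text: t.id + " answered"}, nil
}

func (t target) Capabilities() Capabilities { return t.caps }

// chain is an ordered list of Models and is itself a Model.
type chain []Model

func (c chain) Generate(ctx context.Context, req Request) (Response, error) {
	if len(c) == 0 {
		return Response{}, errors.New("empty chain")
	}
	var errs []error
	for _, m := range c {
		resp, err := m.Generate(ctx, req)
		if err == nil {
			return resp, nil
		}
		errs = append(errs, err)
	}
	return Response{}, errors.Join(errs...)
}

// Capabilities reports the head element, as the digest describes for chains.
func (c chain) Capabilities() Capabilities {
	if len(c) == 0 {
		return Capabilities{}
	}
	return c[0].Capabilities()
}

// Compile-time checks that both shapes satisfy Model.
var (
	_ Model = target{}
	_ Model = chain{}
)

// demo runs a two-element chain whose head is down.
func demo() (string, error) {
	var m Model = chain{
		target{id: "a/x", fail: true},
		target{id: "b/y"},
	}
	resp, err := m.Generate(context.Background(), Request{Prompt: "hi"})
	return resp.Text, err
}

func main() {
	fmt.Println(demo())
}
```

Callers hold a `Model` either way, so swapping a single target for a chain needs no change at the call site.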
## Parse grammar (ADR-0003)

`spec := element ("," element)*`; element = `provider/model` (model id =
everything after the first slash, verbatim) or a bare alias token expanded
inline + recursively with cycle detection. Both kickoff README examples are
covered by tests, including the trailing-`thinking` variant and dedup of
overlapping alias expansions.

**Deviation from go-llm worth reviewing:** no `:low/:medium/:high`
reasoning-suffix stripping, because it conflicts with verbatim ids
(`minimax-m3:cloud`, `richardyoung/qwen3-14b-abliterated:q4_K_M` in mort's
tiers). Plan: reasoning effort becomes an explicit request option when
providers land; mort's wrapper translates its legacy suffix dialect during
Phase 9. If you want suffix parity instead, it's an additive change behind
a RegistryOption.

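The grammar above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in the repository: `Target` and `expand` are hypothetical names, and rejecting empty elements and unknown bare tokens is an assumed policy that the digest does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// Target is one resolved chain element (hypothetical type for this sketch).
type Target struct{ Provider, Model string }

// expand resolves a comma-separated spec into an ordered, deduplicated chain.
// Aliases expand inline and recursively; a cycle is reported as an error.
func expand(spec string, aliases map[string]string) ([]Target, error) {
	var out []Target
	seen := map[Target]bool{}   // dedup: first occurrence wins
	active := map[string]bool{} // aliases on the current expansion path

	var walk func(s string) error
	walk = func(s string) error {
		for _, el := range strings.Split(s, ",") {
			if el == "" {
				return fmt.Errorf("empty element in %q", s) // assumed policy
			}
			// provider/model: split on the first slash only, so the model id
			// keeps any later slashes and colons verbatim.
			if provider, model, ok := strings.Cut(el, "/"); ok {
				if provider == "" || model == "" {
					return fmt.Errorf("malformed element %q", el)
				}
				t := Target{Provider: provider, Model: model}
				if !seen[t] {
					seen[t] = true
					out = append(out, t)
				}
				continue
			}
			// Bare token: must be a registered alias.
			body, ok := aliases[el]
			if !ok {
				return fmt.Errorf("unknown alias %q", el)
			}
			if active[el] {
				return fmt.Errorf("alias cycle at %q", el)
			}
			active[el] = true
			if err := walk(body); err != nil {
				return err
			}
			delete(active, el)
		}
		return nil
	}

	if err := walk(spec); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	aliases := map[string]string{
		"thinking": "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud",
	}
	chain, err := expand("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking", aliases)
	fmt.Println(chain, err)
}
```

Because ids pass through untouched, an id such as `minimax-m3:cloud` survives intact, which is the behavior a suffix-stripping step would break.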
## LLM_* env DSNs (ADR-0004)

Parser is byte-for-byte go-llm (`scheme://[token@]host[/path]`, https
forced, fail-on-use for malformed values). Two resolution paths:
eager scan in `New()`/`LoadEnv(map)` (kickoff requirement;
`LLM_M1` → provider `m1`) **plus** go-llm's lazy `LLM_{UPPER(name)}`
fallback at Parse time (so hyphenated names keep working). Schemes are
factories (`RegisterScheme`) — consumers can bind custom provider kinds to
DSNs.

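As a rough illustration of the DSN shape and the eager scan, the sketch below parses `scheme://[token@]host[/path]` with the standard `net/url` package and stores parse errors so that they surface on use. It is not the byte-for-byte go-llm parser: `DSN`, `parseDSN`, `loadEnv`, and the example hosts are hypothetical, and the lazy `LLM_{UPPER(name)}` fallback is omitted because the digest does not give its exact name mapping.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// DSN is a hypothetical parsed form of scheme://[token@]host[/path].
type DSN struct {
	Scheme  string // selects the provider factory
	Token   string // optional credential from the userinfo part
	BaseURL string // always https, whatever the DSN scheme was
}

// parseDSN is a sketch built on net/url, not the real parser.
func parseDSN(raw string) (DSN, error) {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme == "" || u.Host == "" {
		// Keep the raw value out of the error: it may carry a token.
		return DSN{}, errors.New("malformed DSN")
	}
	d := DSN{Scheme: u.Scheme, BaseURL: "https://" + u.Host + u.Path}
	if u.User != nil {
		d.Token = u.User.Username()
	}
	return d, nil
}

// entry keeps the parse error so a bad value fails on use, not at load time.
type entry struct {
	dsn DSN
	err error
}

// loadEnv is the eager scan: LLM_M1 becomes provider "m1".
func loadEnv(env map[string]string) map[string]entry {
	out := map[string]entry{}
	for k, v := range env {
		if !strings.HasPrefix(k, "LLM_") || len(k) == len("LLM_") {
			continue
		}
		name := strings.ToLower(strings.TrimPrefix(k, "LLM_"))
		d, err := parseDSN(v)
		out[name] = entry{dsn: d, err: err}
	}
	return out
}

func main() {
	providers := loadEnv(map[string]string{
		"LLM_M1": "foreman://secret@m1.example/v1", // hypothetical host
		"HOME":   "/root",                          // ignored: no LLM_ prefix
	})
	p := providers["m1"]
	fmt.Println(p.dsn.Scheme, p.dsn.BaseURL, p.err)
}
```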
## Health & chains (ADR-0006, ADR-0008)

Clock-injected in-memory tracker keyed `provider/model`. Transient vs
permanent via `llm.Classify` (unknown → transient; `context.Canceled` →
permanent). Defaults: 1 same-target retry; bench after 2 consecutive failed
attempts; cooldown 5s ×2 capped 5m; success resets everything. Chains skip
benched targets, advance penalty-free on 404, fail fast on auth/malformed
(flippable via `AdvanceOnPermanent`), and join per-target reasons on
exhaustion. Chain `Capabilities()` = head element (per-attempt media
normalization will use the actual target, Phase 3). Streaming failover
covers stream establishment only.

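The bench and cooldown defaults can be sketched with an injected clock, which is what keeps such a tracker testable without sleeping. `Tracker`, `fakeClock`, and the method names are hypothetical. One detail is an assumption the digest does not settle: here the failure counter restarts after each bench, so two further failures are needed before the doubled cooldown applies.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is the per-target health state.
type entry struct {
	failures int       // consecutive failed attempts since the last bench
	benches  int       // benches since the last success
	until    time.Time // benched until this instant
}

// Tracker is a hypothetical clock-injected tracker keyed "provider/model".
type Tracker struct {
	mu        sync.Mutex
	now       func() time.Time
	threshold int
	base, max time.Duration
	state     map[string]*entry
}

// NewTracker uses the defaults from the digest: bench after 2 failures,
// cooldown 5s doubling per bench, capped at 5m.
func NewTracker(now func() time.Time) *Tracker {
	return &Tracker{now: now, threshold: 2, base: 5 * time.Second,
		max: 5 * time.Minute, state: map[string]*entry{}}
}

// Benched reports whether the target is still cooling down.
func (t *Tracker) Benched(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.state[key]
	return e != nil && t.now().Before(e.until)
}

// Failure records one failed attempt and benches at the threshold.
func (t *Tracker) Failure(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.state[key]
	if e == nil {
		e = &entry{}
		t.state[key] = e
	}
	e.failures++
	if e.failures < t.threshold {
		return
	}
	cooldown := t.base
	for i := 0; i < e.benches && cooldown < t.max; i++ {
		cooldown *= 2
	}
	if cooldown > t.max {
		cooldown = t.max
	}
	e.benches++
	e.failures = 0 // assumption: the counter restarts after a bench
	e.until = t.now().Add(cooldown)
}

// Success resets everything for the target.
func (t *Tracker) Success(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.state, key)
}

// fakeClock is a manually advanced clock for tests.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time      { return c.t }
func (c *fakeClock) Advance(seconds int) { c.t = c.t.Add(time.Duration(seconds) * time.Second) }

func main() {
	clk := &fakeClock{}
	tr := NewTracker(clk.Now)
	tr.Failure("m5/qwen3:30b")
	tr.Failure("m5/qwen3:30b")
	fmt.Println(tr.Benched("m5/qwen3:30b")) // true: benched for 5s
	clk.Advance(5)
	fmt.Println(tr.Benched("m5/qwen3:30b")) // false: cooldown elapsed
}
```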
## Flagged for reconsideration

1. **Reasoning suffixes** (above): deliberate deviation, easy to add back.
2. **Duplicate-element dedup in chains** (first occurrence wins): right for
   health semantics, but means `a,b,a` won't retry `a` at the tail even
   after `b` fails. Believed correct (same request, same bench state);
   flag if "retry head last" matters to you.
3. **`AdvanceOnPermanent` default = fail-fast** on auth/malformed errors:
   matches the kickoff; mort's old behavior was closer to
   advance-on-everything. Phase 9 can set the flag per-registry if mort's
   UX prefers availability.
4. **Stub built-ins**: until Phases 3–4, `openai/...` etc. parse fine and
   error on use with "not implemented yet". Chains mixing stubs and real
   providers will fail over past stubs naturally (the error classifies as
   transient). This is temporary and gone by Phase 4.

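Items 3 and 4 both come down to how the chain reacts to each error class. The sketch below shows one way that policy could look; `Kind`, `attemptErr`, `Chain.Run`, and the helper constructors are hypothetical, and the same-target retry and health reporting are left out to keep it short.

```go
package main

import (
	"errors"
	"fmt"
)

// Kind is a hypothetical error class; the real one comes from llm.Classify.
type Kind int

const (
	Transient Kind = iota
	Permanent
)

type attemptErr struct {
	kind Kind
	msg  string
}

func (e attemptErr) Error() string { return e.msg }

// classify treats unknown errors as transient, as the digest describes.
func classify(err error) Kind {
	var ae attemptErr
	if errors.As(err, &ae) {
		return ae.kind
	}
	return Transient
}

// Target is one chain element with a scripted call.
type Target struct {
	ID   string
	Call func() (string, error)
}

// Chain walks its targets in order.
type Chain struct {
	Targets            []Target
	AdvanceOnPermanent bool                 // false = fail fast (the default)
	Benched            func(id string) bool // optional health lookup
}

// Run returns the first success, or the joined per-target reasons.
func (c Chain) Run() (string, error) {
	var reasons []error
	for _, t := range c.Targets {
		if c.Benched != nil && c.Benched(t.ID) {
			reasons = append(reasons, fmt.Errorf("%s: benched", t.ID))
			continue
		}
		out, err := t.Call()
		if err == nil {
			return out, nil
		}
		reasons = append(reasons, fmt.Errorf("%s: %w", t.ID, err))
		if classify(err) == Permanent && !c.AdvanceOnPermanent {
			break // fail fast on auth/malformed errors
		}
	}
	if len(reasons) == 0 {
		return "", errors.New("empty chain")
	}
	return "", errors.Join(reasons...)
}

// failing and succeeding build scripted targets for the demo.
func failing(id string, kind Kind, msg string) Target {
	return Target{ID: id, Call: func() (string, error) { return "", attemptErr{kind, msg} }}
}

func succeeding(id, out string) Target {
	return Target{ID: id, Call: func() (string, error) { return out, nil }}
}

func main() {
	stub := failing("openai/x", Transient, "not implemented yet")
	auth := failing("a/y", Permanent, "401 unauthorized")
	real := succeeding("b/z", "ok")

	fmt.Println(Chain{Targets: []Target{stub, real}}.Run())
	fmt.Println(Chain{Targets: []Target{auth, real}}.Run())
	fmt.Println(Chain{Targets: []Target{auth, real}, AdvanceOnPermanent: true}.Run())
}
```

With the default, a bad credential on the head surfaces immediately instead of being masked by a later target; setting the flag trades that visibility for availability.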
## ADR set

0001 package layout · 0002 message model · 0003 parse grammar ·
0004 env DSNs · 0005 provider/capabilities · 0006 health/backoff ·
0007 dependency policy · 0008 chain semantics