majordomo/docs/phase-1-design.md
steve dcd004289f feat: foundations — canonical types, Parse grammar, env DSNs, health, chains
Phase 1 of the majordomo build:
- llm/ canonical contract (messages, parts, tools, capabilities, streaming,
  Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped
  cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias
  expansion with cycle detection, chain dedup), LLM_* env-DSN providers
  (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain
  executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the
  trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow,
  docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:35:34 +02:00


Phase 1 design summary (for after-the-fact review)

Written at the Phase 1 → 2 boundary of the unattended build run (2026-06-10). Captures the public surface and the decisions behind it. Authoritative details live in the ADRs; this is the review digest.

What the library looks like to a consumer

reg := majordomo.New() // built-ins + LLM_* env providers
reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")

// Parse reports spec errors (for example an alias cycle).
m, err := reg.Parse("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking")
if err != nil {
    return err
}
resp, err := m.Generate(ctx, majordomo.Request{
    System:   "You are terse.",
    Messages: []majordomo.Message{majordomo.UserText("hi")},
}, majordomo.WithMaxTokens(200))
if err != nil {
    return err
}
_ = resp // use the response here
  • Model = Generate / Stream / Capabilities; a chain and a single target are the same interface.
  • Provider = Name / Model(id, opts...); ids verbatim, no catalogs.
  • Canonical types live in majordomo/llm, re-exported at root via aliases (ADR-0001) — providers import llm only.
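To make the "same interface" point concrete, here is a minimal sketch of a single target and a chain both satisfying one interface. Every type below (Request, Response, Model, target, chain, ask) is a simplified stand-in written for this digest, not majordomo's real declarations; Stream, Capabilities, health tracking and options are left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Request and Response are simplified stand-ins for the canonical llm types.
type Request struct{ System, Prompt string }
type Response struct{ Text string }

// Model is an assumed minimal shape; Stream and Capabilities are omitted.
type Model interface {
	Generate(ctx context.Context, req Request) (Response, error)
}

// target is one provider/model pair; the id is carried verbatim.
type target struct {
	provider, id string
	down         bool // demo hook: simulate an outage
}

func (t target) Generate(_ context.Context, req Request) (Response, error) {
	if t.down {
		return Response{}, fmt.Errorf("%s/%s: unavailable", t.provider, t.id)
	}
	return Response{Text: t.provider + "/" + t.id + " says: " + req.Prompt}, nil
}

// chain tries its elements in order and is itself a Model.
type chain []Model

func (c chain) Generate(ctx context.Context, req Request) (Response, error) {
	if len(c) == 0 {
		return Response{}, errors.New("empty chain")
	}
	var reasons []error
	for _, m := range c {
		resp, err := m.Generate(ctx, req)
		if err == nil {
			return resp, nil
		}
		reasons = append(reasons, err)
	}
	return Response{}, errors.Join(reasons...)
}

// ask accepts any Model, so callers never care which kind they hold.
func ask(m Model, prompt string) (string, error) {
	resp, err := m.Generate(context.Background(), Request{Prompt: prompt})
	return resp.Text, err
}

func main() {
	single := target{provider: "ollama-cloud", id: "kimi-k2.6:cloud"}
	failover := chain{target{provider: "m5", id: "qwen3:30b", down: true}, single}
	for _, m := range []Model{single, failover} {
		text, err := ask(m, "hi")
		fmt.Println(text, err)
	}
}
```

Because chain is a slice of Model, a chain can also contain other chains, which is one plausible reason to keep the interface this small.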

Parse grammar (ADR-0003)

spec = element ("," element)*. An element is either provider/model, where the model id is everything after the first slash, taken verbatim, or a bare alias token, which is expanded inline and recursively with cycle detection. Both kickoff README examples are covered by tests, including the trailing-thinking variant and the dedup of overlapping alias expansions.
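The grammar is small enough to sketch in full. The following is an illustrative re-implementation, not the code behind ADR-0003: Target, Parse and expand are hypothetical names, aliases are passed as a plain map, and trimming whitespace around elements is an assumption the source does not state.

```go
package main

import (
	"fmt"
	"strings"
)

// Target is one resolved chain element. Model keeps everything after the
// first slash verbatim, including further slashes and colons.
type Target struct{ Provider, Model string }

// expand walks one spec. active holds the aliases currently being expanded
// (cycle detection); seen holds targets already emitted (first occurrence wins).
func expand(spec string, aliases map[string]string, active map[string]bool,
	seen map[Target]bool, out *[]Target) error {
	for _, el := range strings.Split(spec, ",") {
		el = strings.TrimSpace(el) // assumption: surrounding whitespace is ignored
		if el == "" {
			return fmt.Errorf("empty element in spec %q", spec)
		}
		if provider, model, ok := strings.Cut(el, "/"); ok {
			if provider == "" || model == "" {
				return fmt.Errorf("malformed element %q", el)
			}
			t := Target{Provider: provider, Model: model}
			if !seen[t] {
				seen[t] = true
				*out = append(*out, t)
			}
			continue
		}
		body, ok := aliases[el]
		if !ok {
			return fmt.Errorf("unknown alias %q", el)
		}
		if active[el] {
			return fmt.Errorf("alias cycle at %q", el)
		}
		active[el] = true
		err := expand(body, aliases, active, seen, out)
		delete(active, el) // the same alias may legally appear again later
		if err != nil {
			return err
		}
	}
	return nil
}

// Parse resolves a spec into an ordered, deduplicated list of targets.
func Parse(spec string, aliases map[string]string) ([]Target, error) {
	var out []Target
	err := expand(spec, aliases, map[string]bool{}, map[Target]bool{}, &out)
	return out, err
}

func main() {
	aliases := map[string]string{
		"thinking": "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud",
	}
	targets, err := Parse("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking", aliases)
	fmt.Println(targets, err)
}
```

Removing an alias from active once its expansion returns is what separates a true cycle (an error) from an alias that is merely mentioned twice (allowed, and collapsed by the dedup).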

Deviation from go-llm worth reviewing: no :low/:medium/:high reasoning-suffix stripping — it conflicts with verbatim ids (minimax-m3:cloud, richardyoung/qwen3-14b-abliterated:q4_K_M in mort's tiers). Plan: reasoning effort becomes an explicit request option when providers land; mort's wrapper translates its legacy suffix dialect during Phase 9. If you want suffix parity instead, it's an additive change behind a RegistryOption.
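As an illustration of the wrapper-side translation mentioned above, the sketch below peels a trailing :low/:medium/:high off a legacy element and returns the effort separately. splitLegacyEffort is a hypothetical helper; the actual shape of mort's suffix dialect and of the future request option is not specified here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLegacyEffort is a hypothetical wrapper-side helper. It recognizes only
// a trailing :low, :medium or :high; any other suffix (for example :cloud or
// :q4_K_M) is treated as part of the verbatim model id.
func splitLegacyEffort(element string) (id, effort string) {
	i := strings.LastIndex(element, ":")
	if i < 0 {
		return element, ""
	}
	switch s := element[i+1:]; s {
	case "low", "medium", "high":
		return element[:i], s
	}
	return element, ""
}

func main() {
	for _, el := range []string{
		"anthropic/opus-4.8:high",
		"ollama-cloud/minimax-m3:cloud",
		"m5/richardyoung/qwen3-14b-abliterated:q4_K_M",
	} {
		id, effort := splitLegacyEffort(el)
		fmt.Printf("%q -> id=%q effort=%q\n", el, id, effort)
	}
}
```

A model whose real tag happens to be :high would be misread by this helper, which is the ambiguity the verbatim-id rule avoids inside the library; keeping the translation in the wrapper confines that risk to callers who opt into the legacy dialect.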

LLM_* env DSNs (ADR-0004)

Parser is byte-for-byte go-llm (scheme://[token@]host[/path], https forced, fail-on-use for malformed values). Two resolution paths: eager scan in New()/LoadEnv(map) (kickoff requirement; LLM_M1 → provider m1) plus go-llm's lazy LLM_{UPPER(name)} fallback at Parse time (so hyphenated names keep working). Schemes are factories (RegisterScheme) — consumers can bind custom provider kinds to DSNs.
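A rough sketch of the DSN shape and the eager scan, under stated assumptions: DSN, entry, parseDSN and loadEnv are hypothetical names, "https forced" is read as "the resulting base URL always uses https, whatever the scheme", and the host names are made up. The lazy LLM_{UPPER(name)} lookup and RegisterScheme are not modeled.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// DSN is a hypothetical parsed form of an LLM_* value.
type DSN struct {
	Scheme  string // selects the provider factory, e.g. "foreman"
	Token   string // optional credential from the userinfo part
	BaseURL string // always https in this sketch
}

// parseDSN accepts scheme://[token@]host[/path]. Error messages deliberately
// omit the raw value so a token never ends up in logs.
func parseDSN(raw string) (DSN, error) {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return DSN{}, fmt.Errorf("malformed DSN: want scheme://[token@]host[/path]")
	}
	token := ""
	if u.User != nil {
		token = u.User.Username()
	}
	return DSN{Scheme: u.Scheme, Token: token, BaseURL: "https://" + u.Host + u.Path}, nil
}

// entry keeps the parse error next to the DSN so a malformed value fails on
// use rather than at load time.
type entry struct {
	DSN DSN
	Err error
}

// loadEnv is the eager scan: LLM_M1 becomes provider "m1".
func loadEnv(env map[string]string) map[string]entry {
	out := map[string]entry{}
	for k, v := range env {
		name, ok := strings.CutPrefix(k, "LLM_")
		if !ok || name == "" {
			continue
		}
		dsn, err := parseDSN(v)
		out[strings.ToLower(name)] = entry{DSN: dsn, Err: err}
	}
	return out
}

func main() {
	providers := loadEnv(map[string]string{
		"LLM_M1":  "foreman://s3cret@llm.example.internal/v1", // hypothetical host
		"LLM_BAD": "no-scheme-here",
		"HOME":    "/home/steve",
	})
	fmt.Printf("%+v\n", providers["m1"])
	fmt.Printf("%+v\n", providers["bad"])
}
```

Storing the error instead of rejecting the variable is one simple way to get fail-on-use behavior: the registry still knows the provider name, and the failure surfaces only when a spec actually routes to it.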

Health & chains (ADR-0006, ADR-0008)

The tracker is clock-injected, in-memory, and keyed by provider/model. Errors are split into transient and permanent via llm.Classify (unknown errors count as transient; context.Canceled counts as permanent). Defaults: one same-target retry; bench after 2 consecutive failed attempts; cooldown starts at 5s and doubles up to a 5m cap; a success resets everything. Chains skip benched targets, advance penalty-free on 404, fail fast on auth/malformed errors (flippable via AdvanceOnPermanent), and join the per-target reasons on exhaustion. A chain's Capabilities() reports the head element's capabilities (per-attempt media normalization will use the actual target in Phase 3). Streaming failover covers stream establishment only.
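The bench and cooldown defaults can be sketched with an injected clock as follows. Tracker, state and fakeClock are hypothetical names; resetting the failure counter when a target is benched is an assumption, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// state is the per-target record, keyed by "provider/model".
type state struct {
	failures     int // consecutive failed attempts since the last bench
	benches      int // benches since the last success
	benchedUntil time.Time
}

// Tracker is a single-goroutine sketch with an injected clock.
type Tracker struct {
	now       func() time.Time
	threshold int
	base, max time.Duration
	targets   map[string]*state
}

func NewTracker(now func() time.Time) *Tracker {
	return &Tracker{now: now, threshold: 2, base: 5 * time.Second,
		max: 5 * time.Minute, targets: map[string]*state{}}
}

// Benched reports whether the target is still cooling down.
func (t *Tracker) Benched(key string) bool {
	s, ok := t.targets[key]
	return ok && t.now().Before(s.benchedUntil)
}

// Failure records a failed attempt and returns the cooldown applied, or 0 if
// the bench threshold has not been reached yet.
func (t *Tracker) Failure(key string) time.Duration {
	s, ok := t.targets[key]
	if !ok {
		s = &state{}
		t.targets[key] = s
	}
	s.failures++
	if s.failures < t.threshold {
		return 0
	}
	cooldown := t.base
	for i := 0; i < s.benches && cooldown < t.max; i++ {
		cooldown *= 2 // exponential growth per successive bench
	}
	if cooldown > t.max {
		cooldown = t.max
	}
	s.benches++
	s.failures = 0 // assumption: counting restarts after each bench
	s.benchedUntil = t.now().Add(cooldown)
	return cooldown
}

// Success resets everything for the target.
func (t *Tracker) Success(key string) { delete(t.targets, key) }

// fakeClock is the test-side clock; no sleeping required.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time       { return c.t }
func (c *fakeClock) AdvanceSeconds(n int) { c.t = c.t.Add(time.Duration(n) * time.Second) }

func main() {
	clk := &fakeClock{}
	tr := NewTracker(clk.Now)
	key := "m5/qwen3:30b"
	for i := 0; i < 6; i++ {
		fmt.Println("cooldown:", tr.Failure(key), "benched:", tr.Benched(key))
	}
	clk.AdvanceSeconds(20)
	fmt.Println("benched after 20s:", tr.Benched(key))
}
```

Passing the clock as a function is what keeps the suite hermetic: tests move time forward explicitly instead of waiting for real cooldowns.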

Flagged for reconsideration

  1. Reasoning suffixes (above) — deliberate deviation, easy to add back.
  2. Duplicate-element dedup in chains (first occurrence wins): right for health semantics, but means a,b,a won't retry a at the tail even after b fails. Believed correct (same request, same bench state); flag if "retry head last" matters to you.
  3. AdvanceOnPermanent default = fail-fast on auth/malformed errors: matches the kickoff; mort's old behavior was closer to advance-on-everything. Phase 9 can set the flag per-registry if mort's UX prefers availability.
  4. Stub built-ins: until Phases 3–4, openai/... etc. parse fine and error on use with "not implemented yet". Chains mixing stubs and real providers will fail over past stubs naturally (the error classifies as transient). This is temporary and gone by Phase 4.
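Items 3 and 4 both come down to how the chain loop reacts to each error class. A minimal sketch of that decision, with hypothetical names throughout (Class, classified, classify, attempt, runChain), and with same-target retries and health recording left out:

```go
package main

import (
	"errors"
	"fmt"
)

// Class is a simplified stand-in for the result of error classification.
type Class int

const (
	Transient Class = iota // unknown errors land here
	NotFound               // e.g. a 404: the model is absent on this target
	Permanent              // auth failures, malformed requests
)

// classified is an error that carries its class.
type classified struct {
	class Class
	msg   string
}

func (e classified) Error() string { return e.msg }

// classify treats anything it does not recognize as transient.
func classify(err error) Class {
	var c classified
	if errors.As(err, &c) {
		return c.class
	}
	return Transient
}

// attempt is one chain element reduced to what the loop needs.
type attempt struct {
	name    string
	benched bool
	call    func() (string, error)
}

// runChain skips benched targets, advances on transient and not-found errors,
// and stops at the first permanent error unless advanceOnPermanent is set.
// On failure the per-target reasons are joined into one error.
func runChain(chain []attempt, advanceOnPermanent bool) (string, error) {
	var reasons []error
	for _, a := range chain {
		if a.benched {
			reasons = append(reasons, fmt.Errorf("%s: benched", a.name))
			continue
		}
		out, err := a.call()
		if err == nil {
			return out, nil
		}
		reasons = append(reasons, fmt.Errorf("%s: %w", a.name, err))
		if classify(err) == Permanent && !advanceOnPermanent {
			break // fail fast
		}
	}
	if len(reasons) == 0 {
		return "", errors.New("empty chain")
	}
	return "", errors.Join(reasons...)
}

func main() {
	stub := attempt{name: "openai/some-model", call: func() (string, error) {
		return "", errors.New("not implemented yet") // unclassified, so transient
	}}
	auth := attempt{name: "a/x", call: func() (string, error) {
		return "", classified{Permanent, "401 unauthorized"}
	}}
	ok := attempt{name: "b/y", call: func() (string, error) { return "hello", nil }}

	fmt.Println(runChain([]attempt{stub, ok}, false)) // fails over past the stub
	fmt.Println(runChain([]attempt{auth, ok}, false)) // stops at the auth error
	fmt.Println(runChain([]attempt{auth, ok}, true))  // availability-first setting
}
```

The third call shows what flipping the flag per registry would buy an availability-first consumer: the auth failure is still reported in the joined reasons if everything fails, but it no longer blocks the rest of the chain.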

ADR set

0001 package layout · 0002 message model · 0003 parse grammar · 0004 env DSNs · 0005 provider/capabilities · 0006 health/backoff · 0007 dependency policy · 0008 chain semantics