feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

Phase 1 of the majordomo build:

- llm/ canonical contract (messages, parts, tools, capabilities, streaming, Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias expansion with cycle detection, chain dedup), LLM_* env-DSN providers (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow, docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:35:23 +02:00
parent 3025044817
commit dcd004289f
42 changed files with 3863 additions and 0 deletions
@@ -0,0 +1,46 @@
+# ADR-0001: Package layout — canonical types in a leaf `llm` package, root re-exports
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Provider implementations (openai, anthropic, google, ollama/foreman) must share
+the canonical types (Message, Request, Response, Capabilities, Model, Provider).
+If those types lived in the root `majordomo` package, the root could not also
+register built-in providers (root → provider/openai → root is an import cycle).
+go-llm solved this with a `v2/provider` leaf package; the kickoff sketch puts
+the Provider interface in `provider/provider.go` and the message types at root,
+which recreates the cycle.
+
+## Decision
+
+- All canonical contract types live in the leaf package
+  `majordomo/llm` (Message, Part, Request, Response, Option, Tool, Toolbox,
+  Capabilities, Stream, Model, Provider, error classification). It imports
+  nothing else in the module.
+- The root `majordomo` package re-exports every canonical type via type
+  aliases (plus constructor/option wrappers), so consumers write
+  `majordomo.Request`, `majordomo.UserText(...)` and rarely import `llm`.
+- The root owns assembly: Registry, Parse, env-DSN loading, the chain
+  executor, and (from Phase 3) registration of real provider clients.
+- The planned `resolve/` package is folded into the root: the grammar needs
+  registry state (aliases, providers, env fallback) at every expansion step,
+  and a callback interface between two packages bought nothing but
+  indirection.
+- `health/`, `media/`, `provider/<impl>/`, `provider/fake/`, `agent/`, and
+  `skill/` are subpackages importing `llm` (and never each other, except
+  agent → skill).
+
+## Consequences
+
+- No import cycles; new providers are additive subpackages.
+- Consumers get the flat one-import API the kickoff sketches.
+- Type aliases (not wrappers) mean zero conversion cost and full
+  interchangeability between `majordomo.X` and `llm.X`.
+
+## Alternatives considered
+
+- **Everything in root.** No cycles only if providers also live in root —
+  a single giant package. Rejected.
+- **Self-registering providers via package init() side effects.** Hides
+  wiring, breaks multi-registry isolation, surprises tests. Rejected.
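A minimal single-file sketch of the re-export idea in ADR-0001 above. Go cannot show two packages in one file, so `llmMessage` and `llmRequest` stand in for `llm.Message` and `llm.Request`; their fields and the `describe` helper are hypothetical, chosen only to illustrate that an alias is the same type under a second name.

```go
package main

import "fmt"

// llmMessage and llmRequest stand in for llm.Message and llm.Request.
// The field sets are hypothetical and trimmed for illustration.
type llmMessage struct {
	Role string
	Text string
}

type llmRequest struct {
	System   string
	Messages []llmMessage
}

// In the root package these would read `type Request = llm.Request`.
// The `=` makes them aliases, not new named types.
type (
	Message = llmMessage
	Request = llmRequest
)

// UserText mirrors the kind of constructor wrapper the root re-exports.
func UserText(s string) Message { return Message{Role: "user", Text: s} }

// describe is written against the leaf type, as a provider would be.
func describe(r llmRequest) string {
	return fmt.Sprintf("%d message(s), system=%q", len(r.Messages), r.System)
}

func main() {
	req := Request{System: "You are terse.", Messages: []Message{UserText("hi")}}
	// No conversion is needed: Request and llmRequest are identical types.
	fmt.Println(describe(req)) // 1 message(s), system="You are terse."
}
```

Had `Request` been declared without `=`, the call to `describe` would need an explicit conversion, which is the cost the ADR avoids.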
@@ -0,0 +1,53 @@
+# ADR-0002: Canonical message/content model
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Every provider has a different wire shape for conversations, content,
+tool calls, and system prompts. majordomo needs one canonical shape that all
+providers translate to/from, expressive enough for multimodality and tool
+loops, small enough to keep providers honest.
+
+## Decision
+
+- `Message{Role, Parts, ToolCalls, ToolResults}` with roles system / user /
+  assistant / tool. `Part` is a **sealed** interface (`TextPart`,
+  `ImagePart`) so providers can switch exhaustively; new media kinds are
+  deliberate API changes, not silent pass-throughs.
+- `ImagePart` is **bytes + MIME only** — no URL form. The media pipeline
+  must inspect/resize/transcode images against target capabilities, which
+  requires bytes; fetching remote URLs is the caller's job, not a hidden
+  network dependency inside a model call.
+- `Request.System` is a dedicated top-level field (maps to Anthropic
+  `system`, Google `SystemInstruction`, an OpenAI/Ollama system message).
+  RoleSystem messages in the history are also accepted and folded by
+  providers. Request also carries Tools, ToolChoice, Schema/SchemaName, and
+  sampling knobs; per-call mutation happens via `Option` funcs applied to a
+  copy, so Request values are reusable.
+- Model ids never carry behavior suffixes: unlike go-llm there is **no
+  `:low/:medium/:high` reasoning-suffix grammar** (it conflicts with
+  verbatim model ids like `minimax-m3:cloud`, see ADR-0003). Reasoning
+  effort will be a request option when providers land.
+- `Response{Parts, ToolCalls, FinishReason, Usage, Model, Raw}` — `Model`
+  names the target that actually served the request (vital with chains);
+  `Raw` is the provider-native escape hatch, never required.
+- Streaming (`Stream.Next() → StreamEvent`): text deltas stream as they
+  arrive; **tool-call arguments are buffered until complete** (consumers
+  never see partial JSON); the final event carries the accumulated
+  `*Response`; `io.EOF` terminates.
+
+## Consequences
+
+- Providers stay translation layers; nothing provider-specific leaks into
+  the canonical API.
+- Callers needing remote images fetch them first — explicit, testable.
+- Partial-tool-call streaming UIs are out of scope (acceptable: arguments
+  are rarely useful before they parse).
+
+## Alternatives considered
+
+- Open `Part` interface — silent content drops on unknown kinds. Rejected.
+- URL image parts with lazy fetch — hidden I/O inside Generate, breaks
+  capability normalization. Rejected.
+- go-llm-style reasoning suffixes — see ADR-0003. Rejected.
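Two mechanisms from ADR-0002 above, sketched under assumptions: a sealed `Part` interface and `Option` funcs applied to a copy. The unexported marker method `isPart`, the `Apply` and `Describe` helpers, and the trimmed `Request` fields are hypothetical; only the type names and `WithMaxTokens` come from the commit.

```go
package main

import "fmt"

// Part is sealed: the unexported marker method means only the declaring
// package can add new kinds (assumed mechanism, not confirmed by the ADR).
type Part interface{ isPart() }

type TextPart struct{ Text string }

type ImagePart struct {
	Data []byte // raw bytes only; there is no URL form
	MIME string
}

func (TextPart) isPart()  {}
func (ImagePart) isPart() {}

// Request is trimmed to two scalar fields for this sketch.
type Request struct {
	System    string
	MaxTokens int
}

// Option mutates a request; Apply always hands it a copy.
type Option func(*Request)

func WithMaxTokens(n int) Option { return func(r *Request) { r.MaxTokens = n } }

// Apply receives req by value, so the caller's Request is never modified.
// (A real Request with slice fields would need care: this copy is shallow.)
func Apply(req Request, opts ...Option) Request {
	for _, o := range opts {
		o(&req)
	}
	return req
}

// Describe switches over every known kind, as a provider translator would.
func Describe(p Part) string {
	switch v := p.(type) {
	case TextPart:
		return "text: " + v.Text
	case ImagePart:
		return fmt.Sprintf("image: %s, %d bytes", v.MIME, len(v.Data))
	default:
		return "unknown part" // unreachable while the interface stays sealed
	}
}

func main() {
	base := Request{System: "You are terse."}
	tuned := Apply(base, WithMaxTokens(200))
	fmt.Println(base.MaxTokens, tuned.MaxTokens) // 0 200
	fmt.Println(Describe(TextPart{Text: "hi"}))  // text: hi
}
```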
@@ -0,0 +1,57 @@
+# ADR-0003: Parse grammar — verbatim model ids, inline alias expansion, chains
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Callers (mort first) address models by string: single targets, tier aliases,
+and comma-separated failover chains, with custom and env-defined providers as
+first-class elements. go-llm's grammar is close but nests alias-chains as
+composite Models and strips `:low/:medium/:high` reasoning suffixes, which
+collides with Ollama-style tags (`minimax-m3:cloud`) and Google-style ids.
+
+## Decision
+
+Grammar (binding, from the kickoff):
+
+```
+spec    := element ("," element)*
+element := target | alias
+target  := provider "/" model      # model = everything after the FIRST "/",
+                                   # up to the next comma, passed VERBATIM
+alias   := bare token, no slash
+```
+
+- Provider resolution order per target: registered providers (built-ins,
+  RegisterProvider, eagerly env-loaded) → lazy `LLM_{UPPER(name)}` env DSN
+  (ADR-0004) → error naming both places checked.
+- Aliases expand **inline** wherever they appear (head/middle/tail),
+  recursively, into the flat element list. Cycles are detected via the
+  expansion stack and return `ErrAliasCycle` — never a hang. Inline (not
+  nested-Model, as in go-llm) expansion keeps one flat chain so health
+  skipping and error reporting see every element uniformly.
+- Duplicate elements after expansion are dropped (first occurrence wins):
+  retrying an already-failed target in the same pass is never useful.
+- A single element and a multi-element chain return the same `Model`
+  (a chain of one) — identical retry/health semantics, callers never branch.
+- **No reasoning-suffix stripping.** mort's `:high` dialect is handled by
+  mort's spec layer during migration; majordomo will expose reasoning effort
+  as an explicit request option instead.
+- The package-level `Default()` registry (lazy, loads process env) backs
+  `majordomo.Parse` for go-llm-style one-call ergonomics; `New()` builds
+  isolated registries for tests/multi-tenant use.
+
+## Consequences
+
+- `m1/richardyoung/qwen3-14b-abliterated:q4_K_M` (a real mort tier value)
+  parses as provider `m1`, model `richardyoung/qwen3-14b-abliterated:q4_K_M`.
+- A bare token that is a provider name yields a targeted error
+  ("use openai/<model-id>").
+- Alias updates after Parse don't affect already-built Models (expansion is
+  at Parse time). mort re-parses per request, so DB-tier edits still apply.
+
+## Alternatives considered
+
+- Nested alias expansion (go-llm): opaque chains inside chains; health
+  skipping can't see the elements. Rejected.
+- Reasoning suffixes in the grammar: breaks verbatim ids. Rejected.
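The grammar in ADR-0003 above can be sketched as a small recursive expander. This is an illustration, not the commit's `Parse`: `Expand` and `SplitTarget` are hypothetical names, aliases are a plain map, provider lookup is omitted, and rejecting empty elements is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var ErrAliasCycle = errors.New("alias cycle")

// Expand flattens spec into an ordered list of provider/model targets.
// Aliases expand inline and recursively; duplicates keep the first occurrence.
func Expand(spec string, aliases map[string]string) ([]string, error) {
	var out []string
	seen := map[string]bool{}    // targets already emitted (dedup)
	onStack := map[string]bool{} // aliases currently being expanded
	var walk func(s string) error
	walk = func(s string) error {
		for _, el := range strings.Split(s, ",") {
			if el == "" {
				return fmt.Errorf("empty element in %q", s) // assumed behavior
			}
			if strings.Contains(el, "/") { // target: provider "/" model
				if !seen[el] {
					seen[el] = true
					out = append(out, el)
				}
				continue
			}
			body, ok := aliases[el] // bare token: must be an alias
			if !ok {
				return fmt.Errorf("unknown alias %q", el)
			}
			if onStack[el] {
				return fmt.Errorf("%w: %s", ErrAliasCycle, el)
			}
			onStack[el] = true
			if err := walk(body); err != nil {
				return err
			}
			delete(onStack, el)
		}
		return nil
	}
	if err := walk(spec); err != nil {
		return nil, err
	}
	return out, nil
}

// SplitTarget cuts at the FIRST slash only; the remainder is the verbatim id.
func SplitTarget(t string) (provider, model string) {
	provider, model, _ = strings.Cut(t, "/")
	return provider, model
}

func main() {
	aliases := map[string]string{
		"thinking": "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud",
	}
	chain, err := Expand("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking", aliases)
	fmt.Println(len(chain), err) // 4 <nil>
	p, m := SplitTarget("m1/richardyoung/qwen3-14b-abliterated:q4_K_M")
	fmt.Println(p, m)
}
```

Tracking the expansion stack separately from the dedup set matters: an alias may legitimately appear twice in a spec (its targets are simply deduplicated), while an alias that reappears during its own expansion is a cycle.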
@@ -0,0 +1,60 @@
+# ADR-0004: LLM_* env-DSN provider definitions (go-llm parity, plus eager load)
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Steve's deployments define providers via env vars that must keep working
+unchanged:
+
+```
+LLM_M1=foreman://token@foreman-m1.orgrimmar.dudenhoeffer.casa
+LLM_M5=foreman://token@foreman-m5.orgrimmar.dudenhoeffer.casa
+```
+
+go-llm (v2/parse.go) implements this **lazily only**: `Parse("m5/x")` misses
+the registry, computes `LLM_` + UPPER(name) with `-`→`_`, reads exactly that
+var, parses `scheme://[token@]host[/path]` by plain string splits, requires
+the scheme to be a registered provider, and dials `https://` + host. There is
+no environment scan. The kickoff additionally requires `New()` to load LLM_*
+providers eagerly and a testable `LoadEnv(map)`.
+
+## Decision
+
+Implement **both** paths over one DSN parser (byte-for-byte go-llm
+semantics — `://` split, first-`@` split, trailing-`/` trim, ErrInvalidDSN on
+missing scheme/host, base URL always `https://host[/path]`):
+
+- **Eager:** `New()` scans the process environment for `LLM_<NAME>` and
+  registers each as provider `lower(<NAME>)` (underscores preserved:
+  `LLM_MY_BOX` → `my_box`). `LoadEnv(map[string]string)` is the explicit,
+  testable entry. Malformed entries never fail construction: they are
+  recorded per-name, returned joined from LoadEnv, and surface from Parse
+  only when that name is actually referenced (matching go-llm's
+  fail-on-use behavior).
+- **Lazy (go-llm parity):** an unknown provider name in Parse falls back to
+  `LLM_{UPPER(name, - → _)}`, so hyphenated spec names (`my-prov/x` →
+  `LLM_MY_PROV`) work exactly as in go-llm. Lazily resolved providers are
+  cached in the registry.
+- The DSN **scheme** selects a `SchemeFactory` (foreman, ollama,
+  ollama-cloud, openai, anthropic, google, gemini; extensible via
+  `RegisterScheme`). The factory receives the registry name and the parsed
+  DSN (token = credential, `https://host` = base URL).
+
+## Consequences
+
+- Existing muscle memory carries over: every go-llm-resolvable LLM_* var
+  resolves identically here.
+- Eager loading additionally makes env providers visible to discovery
+  (`Provider(name)`) before first use.
+- An env DSN cannot express plain-http endpoints (https is forced) — same
+  limitation as go-llm, kept deliberately for parity; local Ollama uses the
+  `ollama` provider's own default (`http://localhost:11434`) rather than a
+  DSN.
+
+## Alternatives considered
+
+- `url.Parse`-based DSN parsing: subtly different (percent-decoding,
+  userinfo passwords). Parity wins. Rejected.
+- Failing New() on malformed LLM_* vars: one stray var would break every
+  consumer at startup. Rejected.
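A rough sketch of the DSN handling ADR-0004 above describes, using plain string splits. It approximates the stated rules (`://` split, first-`@` split, trailing-`/` trim, forced https) and does not claim byte-for-byte parity with go-llm. `ParseDSN`, `EnvVarName`, `ProviderName`, and the `DSN` struct are hypothetical names; only `ErrInvalidDSN` appears in the commit.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var ErrInvalidDSN = errors.New("invalid DSN")

// DSN holds the pieces a scheme factory would receive (assumed shape).
type DSN struct {
	Scheme  string
	Token   string
	BaseURL string
}

// ParseDSN splits scheme://[token@]host[/path] without net/url, so nothing
// is percent-decoded. Errors never echo the raw value: it may hold a token.
func ParseDSN(raw string) (DSN, error) {
	scheme, rest, ok := strings.Cut(raw, "://")
	if !ok || scheme == "" {
		return DSN{}, fmt.Errorf("%w: missing scheme", ErrInvalidDSN)
	}
	var token string
	if t, h, found := strings.Cut(rest, "@"); found { // first "@" only
		token, rest = t, h
	}
	rest = strings.TrimRight(rest, "/")
	if rest == "" || strings.HasPrefix(rest, "/") {
		return DSN{}, fmt.Errorf("%w: missing host", ErrInvalidDSN)
	}
	return DSN{Scheme: scheme, Token: token, BaseURL: "https://" + rest}, nil
}

// EnvVarName is the lazy path: spec name to variable name, "-" becomes "_".
func EnvVarName(name string) string {
	return "LLM_" + strings.ToUpper(strings.ReplaceAll(name, "-", "_"))
}

// ProviderName is the eager path: variable name to provider name,
// lowercased with underscores preserved.
func ProviderName(key string) (string, bool) {
	const prefix = "LLM_"
	if !strings.HasPrefix(key, prefix) || len(key) == len(prefix) {
		return "", false
	}
	return strings.ToLower(key[len(prefix):]), true
}

func main() {
	d, err := ParseDSN("foreman://token@foreman.example.test/")
	fmt.Println(d.Scheme, d.BaseURL, err) // foreman https://foreman.example.test <nil>
	fmt.Println(EnvVarName("my-prov"))    // LLM_MY_PROV
	name, _ := ProviderName("LLM_MY_BOX")
	fmt.Println(name) // my_box
}
```

The two name mappings are not inverses: `my-prov` resolves lazily through `LLM_MY_PROV`, while the eager scan registers that same variable as `my_prov`.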
@@ -0,0 +1,41 @@
+# ADR-0005: Provider interface and the capabilities model
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Each provider — and some individual models — imposes different limits (image
+dimensions/bytes/MIME/count, tools, structured output, streaming, context
+size). Callers must not need to know them; the library must normalize or
+clearly reject.
+
+## Decision
+
+- `Provider` is minimal: `Name()` and `Model(id, opts...) (Model, error)`.
+  Model ids pass through verbatim; providers never validate ids against a
+  catalog (models churn weekly; catalogs rot).
+- `Capabilities` is a plain struct declared **per provider** with
+  **per-model overrides** via `WithCapabilities` (a `ModelOption`). Zero
+  values mean: `MaxImagesPerReq == 0` → images unsupported;
+  `MaxImageBytes/MaxImageDimension/ContextWindow == 0` → no declared limit;
+  empty `AllowedImageMIME` → any type.
+- Providers construct without error even when credentials are missing; the
+  failure surfaces as an auth error at request time (and a chain can fail
+  over past it). Construction-time validation would make `New()` fragile.
+- Until a provider's implementation phase lands, built-ins register as
+  **stubs**: they resolve in Parse (so chains, aliases, and env DSNs are
+  fully functional) and return a clear "not implemented yet" error on use.
+
+## Consequences
+
+- The media pipeline (Phase 3, ADR to follow) can normalize against any
+  target uniformly.
+- Adding a provider is additive: implement two methods + declare
+  capabilities.
+
+## Alternatives considered
+
+- Capability methods on Model with provider-specific logic — pushes limits
+  knowledge into every caller. Rejected.
+- Model catalogs with validation — stale within weeks, breaks pass-through
+  targets like foreman. Rejected.
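The zero-value rules in ADR-0005 above are easy to get backwards, so here is a small sketch of how a check might read them. The field names come from the ADR; the `int` and `[]string` field types and the `SupportsImages`, `AllowsMIME`, and `CheckImage` helpers are assumptions for illustration.

```go
package main

import "fmt"

// Capabilities mirrors the fields named in the ADR; types are assumed.
type Capabilities struct {
	MaxImagesPerReq   int      // 0 means images are unsupported
	MaxImageBytes     int      // 0 means no declared limit
	MaxImageDimension int      // 0 means no declared limit
	ContextWindow     int      // 0 means no declared limit
	AllowedImageMIME  []string // empty means any type
}

func (c Capabilities) SupportsImages() bool { return c.MaxImagesPerReq > 0 }

func (c Capabilities) AllowsMIME(mime string) bool {
	if len(c.AllowedImageMIME) == 0 {
		return true
	}
	for _, m := range c.AllowedImageMIME {
		if m == mime {
			return true
		}
	}
	return false
}

// within treats a zero limit as "unlimited".
func within(limit, v int) bool { return limit == 0 || v <= limit }

// CheckImage reports why a single image would be rejected, or nil.
func (c Capabilities) CheckImage(mime string, size, dim int) error {
	switch {
	case !c.SupportsImages():
		return fmt.Errorf("images unsupported")
	case !c.AllowsMIME(mime):
		return fmt.Errorf("MIME type %s not allowed", mime)
	case !within(c.MaxImageBytes, size):
		return fmt.Errorf("%d bytes exceeds limit %d", size, c.MaxImageBytes)
	case !within(c.MaxImageDimension, dim):
		return fmt.Errorf("dimension %d exceeds limit %d", dim, c.MaxImageDimension)
	}
	return nil
}

func main() {
	textOnly := Capabilities{}
	vision := Capabilities{MaxImagesPerReq: 4, MaxImageBytes: 1 << 20}
	fmt.Println(textOnly.CheckImage("image/png", 10, 10))
	fmt.Println(vision.CheckImage("image/webp", 10, 4096)) // <nil>
}
```

Note the asymmetry: zero on the count field disables images entirely, while zero on the size fields removes the limit.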
@@ -0,0 +1,48 @@
+# ADR-0006: Model health tracking and backoff
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+Ollama Cloud models intermittently return "high demand" errors. mort's
+behavior to preserve: one blip should not fail a request (retry); a model
+that keeps failing should be benched so chains skip it, then re-admitted
+after a cooldown. majordomo owns this (the "model health tracker").
+
+## Decision
+
+In-memory, process-local, thread-safe tracker in `health/`, keyed by
+`"provider/model-id"`, with an **injected clock** (`func() time.Time`) so
+every backoff path is unit-testable without sleeping.
+
+- **Classification** (`llm.Classify`, overridable via `ChainConfig.Classify`):
+  transient = HTTP 408/429/5xx, network timeouts, connection refused/reset,
+  DNS failures, `context.DeadlineExceeded`; permanent = HTTP
+  400/401/403/404/405/422, `ErrModelNotFound`, `context.Canceled` (the
+  caller gave up — retrying defies intent). **Unknown errors default to
+  transient**: failing over can only help availability, and a wrongly
+  benched model self-heals via cooldown, while a wrongly fail-fasted request
+  is lost.
+- **Counting:** every failed transient *attempt* increments the target's
+  consecutive-failure count; any success resets both the count **and** the
+  backoff exponent. At threshold (default **2**) the target is benched until
+  `now + cooldown`. Cooldown grows exponentially: base (default **5s**) times
+  multiplier (default **2**) per consecutive backoff round, capped (default
+  **5m**). Benching resets the failure count, so re-benching needs a fresh
+  run of failures, but the next bench uses the multiplied cooldown.
+- All knobs (threshold, base/cap/multiplier, clock, classifier, retry count)
+  are configuration with the above defaults baked in.
+- **No persistence, no interface.** The tracker is a concrete type; health
+  is process-local by design (out-of-scope guardrail). A consumer wanting
+  shared state can wrap the registry; we do not build for it now.
+
+## Consequences
+
+- Deterministic tests via fake clock; no `time.Sleep` anywhere.
+- Two providers addressing the same upstream model (e.g. `m1/x` and `m5/x`)
+  track independently — correct, since the backends are different machines.
+
+## Alternatives considered
+
+- Persistent/pluggable health store — explicitly out of scope. Rejected.
+- Unknown→permanent default — drops availability on novel errors. Rejected.
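A compact sketch of the clock-injected tracker ADR-0006 above describes, using the stated defaults. It assumes the first bench lasts exactly `base` and each later consecutive bench multiplies the previous cooldown; the ADR does not spell out that starting point. `Tracker`, `FakeClock`, and the method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// FakeClock is a manually advanced clock, so no test ever sleeps.
type FakeClock struct{ T time.Time }

func (c *FakeClock) Now() time.Time  { return c.T }
func (c *FakeClock) Advance(sec int) { c.T = c.T.Add(time.Duration(sec) * time.Second) }

type entry struct {
	failures int       // consecutive failed transient attempts
	rounds   int       // benches since the last success (backoff exponent)
	until    time.Time // benched while now is before this instant
}

// Tracker is keyed by "provider/model-id".
type Tracker struct {
	mu        sync.Mutex
	now       func() time.Time
	threshold int
	base, max time.Duration
	mult      time.Duration
	state     map[string]*entry
}

// NewTracker applies the ADR defaults: threshold 2, 5s base, x2, 5m cap.
func NewTracker(now func() time.Time) *Tracker {
	return &Tracker{now: now, threshold: 2, base: 5 * time.Second,
		max: 5 * time.Minute, mult: 2, state: map[string]*entry{}}
}

func (t *Tracker) Benched(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.state[key]
	return e != nil && t.now().Before(e.until)
}

// Failure records one failed transient attempt and reports whether this
// attempt benched the target.
func (t *Tracker) Failure(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.state[key]
	if e == nil {
		e = &entry{}
		t.state[key] = e
	}
	e.failures++
	if e.failures < t.threshold {
		return false
	}
	cooldown := t.base // assumption: the first bench lasts exactly base
	for i := 0; i < e.rounds && cooldown < t.max; i++ {
		cooldown *= t.mult
	}
	if cooldown > t.max {
		cooldown = t.max
	}
	e.until = t.now().Add(cooldown)
	e.rounds++
	e.failures = 0 // re-benching needs a fresh run of failures
	return true
}

// Success clears both the failure count and the backoff exponent.
func (t *Tracker) Success(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.state, key)
}

func main() {
	clk := &FakeClock{}
	tr := NewTracker(clk.Now)
	tr.Failure("m1/x")
	fmt.Println(tr.Failure("m1/x"), tr.Benched("m1/x")) // true true
	clk.Advance(5)
	fmt.Println(tr.Benched("m1/x")) // false
}
```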
@@ -0,0 +1,31 @@
+# ADR-0007: Dependency policy — stdlib-first, hand-rolled REST clients
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+go-llm leans on SDKs (openai-go, go-anthropic, genai) and carries their
+transitive weight and churn. The kickoff mandates minimal dependencies with
+full control over multimodal payloads and capability handling.
+
+## Decision
+
+- **Hand-rolled `net/http` JSON clients** for OpenAI(+compatible),
+  Anthropic(+compatible), Ollama (cloud + local), and foreman. Their REST
+  surfaces are small and stable; owning the wire shapes gives exact control
+  over tool calls, structured output, streaming, and image payloads.
+- **One approved third-party dependency:** the official Google Gen AI Go SDK
+  (`google.golang.org/genai`) for the Gemini provider — Google's surface
+  moves too much to hand-roll profitably.
+- Image normalization uses stdlib `image`, `image/jpeg`, `image/png`.
+  `golang.org/x/image` may be added **only** if a needed format demands it,
+  via a new ADR.
+- Any other third-party dependency requires its own ADR justifying it.
+- No persistent store, no metrics stack, no config framework, no CLI beyond
+  `examples/` (out-of-scope guardrails).
+
+## Consequences
+
+- `go.mod` stays near-empty; consumers inherit almost nothing transitively.
+- We own wire-format drift: wire shapes are verified against current provider
+  documentation at implementation time and recorded in the provider ADRs.
@@ -0,0 +1,60 @@
+# ADR-0008: Failover-chain execution semantics
+
+**Status:** Accepted — 2026-06-10
+
+## Context
+
+A parsed spec is an ordered chain of targets sharing the registry's health
+tracker. The executor must realize the kickoff's failover story (retry one
+blip; bench repeat offenders; skip benched targets; clear exhaustion errors)
+identically for chains of one and many.
+
+## Decision
+
+For each request, iterate elements head-to-tail:
+
+1. **Skip** targets currently benched (recorded in the exhaustion error).
+2. Attempt the target. On success → report success (resets health), return.
+3. On error, classify:
+   - **Permanent + model-not-found** → advance, no health penalty.
+   - **Permanent otherwise** (auth, malformed) → **fail fast** by default —
+     failing over cannot fix a bad request; `ChainConfig.AdvanceOnPermanent`
+     flips this for callers who prefer availability.
+   - **Transient** → report the failed attempt to the tracker; retry the
+     same target while attempts remain (`TransientRetries`, default 1)
+     **unless the tracker just benched it**, in which case advance
+     immediately.
+4. All elements failed/skipped → return `errors.Join(ErrChainExhausted,
+   per-target reasons...)` naming every target and why.
+
+Other decisions:
+
+- **Capabilities() = head element's capabilities.** The head is the
+  preferred target and the honest answer to "what should I prepare for?".
+  Per-attempt media normalization (Phase 3) uses the *actual* target's
+  capabilities, so fallbacks still get correctly-fitted inputs.
+  Intersection semantics were rejected: a rarely-used tail fallback would
+  artificially constrain every request.
+- **Streaming failover applies to stream establishment only.** Once a
+  stream is open, mid-stream errors propagate; silently restarting on
+  another target would re-deliver partial output.
+- `context.Canceled` aborts the chain immediately, whether it arrives
+  between attempts or during one.
+- Duplicate post-expansion elements were already dropped at Parse
+  (ADR-0003).
+
+## Consequences
+
+- "One transient error is fine" holds: blip → same-target retry succeeds,
+  no failover, and the single health mark is cleared by that success.
+- With default knobs (retries=1, threshold=2), a target whose retry also
+  fails is benched in the same request and the chain advances, exactly as
+  in the kickoff narrative.
+- Single-target specs get the same retry/backoff behavior for free.
+
+## Alternatives considered
+
+- Per-request (not per-attempt) failure counting — needs two failed
+  *requests* to bench, letting a dead model eat the retry budget twice.
+  Rejected as weaker than the kickoff's story.
+- Intersection capabilities — see above. Rejected.
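The numbered steps in ADR-0008 above can be sketched as one loop. This is a simplified illustration, not the commit's executor: `Target`, `Health`, `Run`, `ErrPermanent`, `ErrBusy`, and `simpleHealth` are hypothetical; context handling, streaming, and `AdvanceOnPermanent` are left out; and any error that is not recognized is treated as transient.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrChainExhausted = errors.New("chain exhausted")
	ErrModelNotFound  = errors.New("model not found")
	ErrPermanent      = errors.New("permanent error") // stand-in for auth/malformed
	ErrBusy           = errors.New("high demand")     // sample transient error
)

// Target is one chain element; Call stands in for Generate.
type Target struct {
	Name string
	Call func() (string, error)
}

// Health is the slice of tracker behavior the executor needs.
type Health interface {
	Benched(name string) bool
	Failure(name string) (benchedNow bool)
	Success(name string)
}

// Run walks the chain head-to-tail with `retries` same-target retries.
func Run(chain []Target, h Health, retries int) (string, error) {
	reasons := []error{ErrChainExhausted}
	for _, t := range chain {
		if h.Benched(t.Name) { // step 1: skip, but record why
			reasons = append(reasons, fmt.Errorf("%s: skipped (benched)", t.Name))
			continue
		}
		for attempt := 0; attempt <= retries; attempt++ {
			out, err := t.Call()
			if err == nil { // step 2: success resets health
				h.Success(t.Name)
				return out, nil
			}
			if errors.Is(err, ErrModelNotFound) { // advance, no health penalty
				reasons = append(reasons, fmt.Errorf("%s: %w", t.Name, err))
				break
			}
			if errors.Is(err, ErrPermanent) { // fail fast
				return "", fmt.Errorf("%s: %w", t.Name, err)
			}
			// Transient: penalize, then retry unless benched or out of attempts.
			if h.Failure(t.Name) || attempt == retries {
				reasons = append(reasons, fmt.Errorf("%s: %w", t.Name, err))
				break
			}
		}
	}
	return "", errors.Join(reasons...) // step 4: name every target and reason
}

// simpleHealth benches permanently after 2 consecutive failures (no clock).
type simpleHealth struct {
	fails   map[string]int
	benched map[string]bool
}

func newHealth() *simpleHealth {
	return &simpleHealth{fails: map[string]int{}, benched: map[string]bool{}}
}
func (s *simpleHealth) Benched(n string) bool { return s.benched[n] }
func (s *simpleHealth) Success(n string)      { s.fails[n] = 0 }
func (s *simpleHealth) Failure(n string) bool {
	s.fails[n]++
	if s.fails[n] < 2 {
		return false
	}
	s.fails[n], s.benched[n] = 0, true
	return true
}

func main() {
	dead := Target{Name: "a/dead", Call: func() (string, error) { return "", ErrBusy }}
	good := Target{Name: "b/ok", Call: func() (string, error) { return "hello", nil }}
	fmt.Println(Run([]Target{dead, good}, newHealth(), 1)) // hello <nil>
}
```

With the defaults, the dead head costs two attempts in the first request and is skipped outright by later requests until its cooldown ends.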
@@ -0,0 +1,14 @@
+# Architecture Decision Records
+
+One decision per file, append-only; supersede rather than rewrite.
+
+| ADR | Title | Status |
+|-----|-------|--------|
+| [0001](0001-package-layout.md) | Package layout — canonical types in leaf `llm`, root re-exports | Accepted |
+| [0002](0002-canonical-message-model.md) | Canonical message/content model | Accepted |
+| [0003](0003-parse-grammar.md) | Parse grammar — verbatim ids, inline alias expansion, chains | Accepted |
+| [0004](0004-env-dsn-providers.md) | LLM_* env-DSN provider definitions (go-llm parity + eager load) | Accepted |
+| [0005](0005-provider-capabilities.md) | Provider interface and capabilities model | Accepted |
+| [0006](0006-health-and-backoff.md) | Model health tracking and backoff | Accepted |
+| [0007](0007-dependency-policy.md) | Dependency policy — stdlib-first, hand-rolled REST clients | Accepted |
+| [0008](0008-chain-semantics.md) | Failover-chain execution semantics | Accepted |
@@ -0,0 +1,84 @@
+# Phase 1 design summary (for after-the-fact review)
+
+Written at the Phase 1 → 2 boundary of the unattended build run
+(2026-06-10). Captures the public surface and the decisions behind it.
+Authoritative details live in the ADRs; this is the review digest.
+
+## What the library looks like to a consumer
+
+```go
+reg := majordomo.New()                      // built-ins + LLM_* env providers
+reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")
+
+m, err := reg.Parse("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking")
+resp, err := m.Generate(ctx, majordomo.Request{
+    System:   "You are terse.",
+    Messages: []majordomo.Message{majordomo.UserText("hi")},
+}, majordomo.WithMaxTokens(200))
+```
+
+- `Model` = `Generate` / `Stream` / `Capabilities`; a chain and a single
+  target are the same interface.
+- `Provider` = `Name` / `Model(id, opts...)`; ids verbatim, no catalogs.
+- Canonical types live in `majordomo/llm`, re-exported at root via aliases
+  (ADR-0001) — providers import `llm` only.
+
+## Parse grammar (ADR-0003)
+
+`spec := element ("," element)*`; element = `provider/model` (model id =
+everything after the first slash, verbatim) or a bare alias token expanded
+inline + recursively with cycle detection. Both kickoff README examples are
+covered by tests, including the trailing-`thinking` variant and dedup of
+overlapping alias expansions.
+
+**Deviation from go-llm worth reviewing:** no `:low/:medium/:high`
+reasoning-suffix stripping — it conflicts with verbatim ids
+(`minimax-m3:cloud`, `richardyoung/qwen3-14b-abliterated:q4_K_M` in mort's
+tiers). Plan: reasoning effort becomes an explicit request option when
+providers land; mort's wrapper translates its legacy suffix dialect during
+Phase 9. If you want suffix parity instead, it's an additive change behind
+a RegistryOption.
+
+## LLM_* env DSNs (ADR-0004)
+
+Parser is byte-for-byte go-llm (`scheme://[token@]host[/path]`, https
+forced, fail-on-use for malformed values). Two resolution paths:
+eager scan in `New()`/`LoadEnv(map)` (kickoff requirement;
+`LLM_M1` → provider `m1`) **plus** go-llm's lazy `LLM_{UPPER(name)}`
+fallback at Parse time (so hyphenated names keep working). Schemes are
+factories (`RegisterScheme`) — consumers can bind custom provider kinds to
+DSNs.
+
+## Health & chains (ADR-0006, ADR-0008)
+
+Clock-injected in-memory tracker keyed `provider/model`. Transient vs
+permanent via `llm.Classify` (unknown → transient; `context.Canceled` →
+permanent). Defaults: 1 same-target retry; bench after 2 consecutive failed
+attempts; cooldown 5s ×2 capped 5m; success resets everything. Chains skip
+benched targets, advance penalty-free on model-not-found, fail fast on
+auth/malformed (flippable via `AdvanceOnPermanent`), and join per-target
+reasons on exhaustion. Chain `Capabilities()` = head element (per-attempt media
+normalization will use the actual target, Phase 3). Streaming failover
+covers stream establishment only.
+
+## Flagged for reconsideration
+
+1. **Reasoning suffixes** (above) — deliberate deviation, easy to add back.
+2. **Duplicate-element dedup in chains** (first occurrence wins): right for
+   health semantics, but means `a,b,a` won't retry `a` at the tail even
+   after `b` fails. Believed correct (same request, same bench state);
+   flag if "retry head last" matters to you.
+3. **`AdvanceOnPermanent` default = fail-fast** on auth/malformed errors:
+   matches the kickoff; mort's old behavior was closer to
+   advance-on-everything. Phase 9 can set the flag per-registry if mort's
+   UX prefers availability.
+4. **Stub built-ins**: until Phases 3–4, `openai/...` etc. parse fine and
+   error on use with "not implemented yet". Chains mixing stubs and real
+   providers will fail over past stubs naturally (the error classifies
+   transient) — temporary, gone by Phase 4.
+
+## ADR set
+
+0001 package layout · 0002 message model · 0003 parse grammar ·
+0004 env DSNs · 0005 provider/capabilities · 0006 health/backoff ·
+0007 dependency policy · 0008 chain semantics