feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

Phase 1 of the majordomo build:
- llm/ canonical contract (messages, parts, tools, capabilities, streaming,
  Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped
  cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias
  expansion with cycle detection, chain dedup), LLM_* env-DSN providers
  (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain
  executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the
  trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow,
  docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:35:23 +02:00
parent 3025044817
commit dcd004289f
42 changed files with 3863 additions and 0 deletions
@@ -0,0 +1,46 @@
# ADR-0001: Package layout — canonical types in a leaf `llm` package, root re-exports
**Status:** Accepted — 2026-06-10
## Context
Provider implementations (openai, anthropic, google, ollama/foreman) must share
the canonical types (Message, Request, Response, Capabilities, Model, Provider).
If those types lived in the root `majordomo` package, the root could not also
register built-in providers (root → provider/openai → root is an import cycle).
go-llm solved this with a `v2/provider` leaf package; the kickoff sketch puts
the Provider interface in `provider/provider.go` and the message types at root,
which recreates the cycle.
## Decision
- All canonical contract types live in the leaf package
`majordomo/llm` (Message, Part, Request, Response, Option, Tool, Toolbox,
Capabilities, Stream, Model, Provider, error classification). It imports
nothing else in the module.
- The root `majordomo` package re-exports every canonical type via type
aliases (plus constructor/option wrappers), so consumers write
`majordomo.Request`, `majordomo.UserText(...)` and rarely import `llm`.
- The root owns assembly: Registry, Parse, env-DSN loading, the chain
executor, and (from Phase 3) registration of real provider clients.
- The planned `resolve/` package is folded into the root: the grammar needs
registry state (aliases, providers, env fallback) at every expansion step,
and a callback interface between two packages bought nothing but
indirection.
- `health/`, `media/`, `provider/<impl>/`, `provider/fake/`, `agent/`, and
`skill/` are subpackages importing `llm` (and never each other, except
agent → skill).
## Consequences
- No import cycles; new providers are additive subpackages.
- Consumers get the flat one-import API the kickoff sketches.
- Type aliases (not wrappers) mean zero conversion cost and full
interchangeability between `majordomo.X` and `llm.X`.
## Alternatives considered
- **Everything in root.** No cycles only if providers also live in root —
a single giant package. Rejected.
- **Self-registering providers via package init() side effects.** Hides
wiring, breaks multi-registry isolation, surprises tests. Rejected.
@@ -0,0 +1,53 @@
# ADR-0002: Canonical message/content model
**Status:** Accepted — 2026-06-10
## Context
Every provider has a different wire shape for conversations, content,
tool calls, and system prompts. majordomo needs one canonical shape that all
providers translate to/from, expressive enough for multimodality and tool
loops, small enough to keep providers honest.
## Decision
- `Message{Role, Parts, ToolCalls, ToolResults}` with roles system / user /
assistant / tool. `Part` is a **sealed** interface (`TextPart`,
`ImagePart`) so providers can switch exhaustively; new media kinds are
deliberate API changes, not silent pass-throughs.
- `ImagePart` is **bytes + MIME only** — no URL form. The media pipeline
must inspect/resize/transcode images against target capabilities, which
requires bytes; fetching remote URLs is the caller's job, not a hidden
network dependency inside a model call.
- `Request.System` is a dedicated top-level field (maps to Anthropic
`system`, Google `SystemInstruction`, an OpenAI/Ollama system message).
RoleSystem messages in the history are also accepted and folded by
providers. Request also carries Tools, ToolChoice, Schema/SchemaName, and
sampling knobs; per-call mutation happens via `Option` funcs applied to a
copy, so Request values are reusable.
- Model ids never carry behavior suffixes: unlike go-llm there is **no
`:low/:medium/:high` reasoning-suffix grammar** (it conflicts with
verbatim model ids like `minimax-m3:cloud`, see ADR-0003). Reasoning
effort will be a request option when providers land.
- `Response{Parts, ToolCalls, FinishReason, Usage, Model, Raw}`: `Model`
names the target that actually served the request (vital with chains);
`Raw` is the provider-native escape hatch, never required.
- Streaming (`Stream.Next() → StreamEvent`): text deltas stream as they
arrive; **tool-call arguments are buffered until complete** (consumers
never see partial JSON); the final event carries the accumulated
`*Response`; `io.EOF` terminates.
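
A minimal sketch of the sealed `Part` idea, assuming the usual Go idiom of an unexported marker method. The field names (`Text`, `Data`, `MIME`) and the `describe` helper are illustrative assumptions, not the actual `llm` definitions.

```go
package main

import "fmt"

// Part is sealed by an unexported method: only types declared in the same
// package can implement it, so a provider's type switch can be exhaustive.
type Part interface{ isPart() }

// TextPart and ImagePart are the two kinds named in this ADR.
// Field names are assumptions made for this sketch.
type TextPart struct{ Text string }

// ImagePart carries bytes plus a MIME type only; there is no URL form.
type ImagePart struct {
	Data []byte
	MIME string
}

func (TextPart) isPart()  {}
func (ImagePart) isPart() {}

// describe is a hypothetical provider-side translation step that must
// handle every Part kind explicitly.
func describe(p Part) string {
	switch v := p.(type) {
	case TextPart:
		return "text: " + v.Text
	case ImagePart:
		return fmt.Sprintf("image: %s, %d bytes", v.MIME, len(v.Data))
	default:
		// Reached only if a new kind is added without updating this switch.
		return "unknown part"
	}
}

func main() {
	parts := []Part{
		TextPart{Text: "what is in this picture?"},
		ImagePart{Data: []byte{0x89, 0x50, 0x4e, 0x47}, MIME: "image/png"},
	}
	for _, p := range parts {
		fmt.Println(describe(p))
	}
}
```

In a single file the seal is only nominal; across packages the unexported `isPart` method is what prevents outside implementations.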
## Consequences
- Providers stay translation layers; nothing provider-specific leaks into
the canonical API.
- Callers needing remote images fetch them first — explicit, testable.
- Partial-tool-call streaming UIs are out of scope (acceptable: arguments
are rarely useful before they parse).
## Alternatives considered
- Open `Part` interface — silent content drops on unknown kinds. Rejected.
- URL image parts with lazy fetch — hidden I/O inside Generate, breaks
capability normalization. Rejected.
- go-llm-style reasoning suffixes — see ADR-0003. Rejected.
@@ -0,0 +1,57 @@
# ADR-0003: Parse grammar — verbatim model ids, inline alias expansion, chains
**Status:** Accepted — 2026-06-10
## Context
Callers (mort first) address models by string: single targets, tier aliases,
and comma-separated failover chains, with custom and env-defined providers as
first-class elements. go-llm's grammar is close but nests alias-chains as
composite Models and strips `:low/:medium/:high` reasoning suffixes, which
collides with Ollama-style tags (`minimax-m3:cloud`) and Google-style ids.
## Decision
Grammar (binding, from the kickoff):
```
spec := element ("," element)*
element := target | alias
target := provider "/" model # model = everything after the FIRST "/",
# up to the next comma, passed VERBATIM
alias := bare token, no slash
```
- Provider resolution order per target: registered providers (built-ins,
RegisterProvider, eagerly env-loaded) → lazy `LLM_{UPPER(name)}` env DSN
(ADR-0004) → error naming both places checked.
- Aliases expand **inline** wherever they appear (head/middle/tail),
recursively, into the flat element list. Cycles are detected via the
expansion stack and return `ErrAliasCycle` — never a hang. Inline (not
nested-Model, as in go-llm) expansion keeps one flat chain so health
skipping and error reporting see every element uniformly.
- Duplicate elements after expansion are dropped (first occurrence wins):
retrying an already-failed target in the same pass is never useful.
- A single element and a multi-element chain return the same `Model`
(a chain of one) — identical retry/health semantics, callers never branch.
- **No reasoning-suffix stripping.** mort's `:high` dialect is handled by
mort's spec layer during migration; majordomo will expose reasoning effort
as an explicit request option instead.
- The package-level `Default()` registry (lazy, loads process env) backs
`majordomo.Parse` for go-llm-style one-call ergonomics; `New()` builds
isolated registries for tests/multi-tenant use.
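
The expansion rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the registry's real code: `Target`, `expand`, and `expandInto` are hypothetical names, aliases are modeled as a plain `map[string]string` from alias name to spec, and provider lookup (registered or env-defined) is left out entirely.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrAliasCycle mirrors the error name used in this ADR.
var ErrAliasCycle = errors.New("alias cycle")

// Target is a hypothetical flat chain element.
type Target struct {
	Provider string
	Model    string // everything after the first "/", verbatim
}

// expand flattens a spec into a deduplicated list of targets.
func expand(spec string, aliases map[string]string) ([]Target, error) {
	var out []Target
	seen := map[Target]bool{}
	if err := expandInto(spec, aliases, nil, seen, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func expandInto(spec string, aliases map[string]string, stack []string,
	seen map[Target]bool, out *[]Target) error {
	for _, el := range strings.Split(spec, ",") {
		// A target splits on the FIRST "/" only; later slashes and colons
		// stay inside the model id.
		if provider, model, ok := strings.Cut(el, "/"); ok {
			t := Target{Provider: provider, Model: model}
			if !seen[t] { // first occurrence wins
				seen[t] = true
				*out = append(*out, t)
			}
			continue
		}
		// Bare token: an alias. An alias already on the expansion stack
		// means a cycle.
		for _, open := range stack {
			if open == el {
				return fmt.Errorf("%w: %s -> %s", ErrAliasCycle,
					strings.Join(stack, " -> "), el)
			}
		}
		sub, ok := aliases[el]
		if !ok {
			return fmt.Errorf("unknown alias %q", el)
		}
		next := append(append([]string{}, stack...), el)
		if err := expandInto(sub, aliases, next, seen, out); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	aliases := map[string]string{"tier": "m1/x,backup", "backup": "m5/x"}
	targets, err := expand("tier,m5/x,m1/richardyoung/qwen3-14b-abliterated:q4_K_M", aliases)
	fmt.Println(targets, err)

	_, err = expand("a", map[string]string{"a": "b", "b": "a"})
	fmt.Println(errors.Is(err, ErrAliasCycle))
}
```

The sketch does not trim whitespace around elements; whether the real grammar tolerates spaces after commas is not stated here.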
## Consequences
- `m1/richardyoung/qwen3-14b-abliterated:q4_K_M` (a real mort tier value)
parses as provider `m1`, model `richardyoung/qwen3-14b-abliterated:q4_K_M`.
- A bare token that is a provider name yields a targeted error
("use openai/<model-id>").
- Alias updates after Parse don't affect already-built Models (expansion is
at Parse time). mort re-parses per request, so DB-tier edits still apply.
## Alternatives considered
- Nested alias expansion (go-llm): opaque chains inside chains; health
skipping can't see the elements. Rejected.
- Reasoning suffixes in the grammar: breaks verbatim ids. Rejected.
@@ -0,0 +1,60 @@
# ADR-0004: LLM_* env-DSN provider definitions (go-llm parity, plus eager load)
**Status:** Accepted — 2026-06-10
## Context
Steve's deployments define providers via env vars that must keep working
unchanged:
```
LLM_M1=foreman://token@foreman-m1.orgrimmar.dudenhoeffer.casa
LLM_M5=foreman://token@foreman-m5.orgrimmar.dudenhoeffer.casa
```
go-llm (v2/parse.go) implements this **lazily only**: `Parse("m5/x")` misses
the registry, computes `LLM_` + UPPER(name) with `-` → `_`, reads exactly that
var, parses `scheme://[token@]host[/path]` by plain string splits, requires
the scheme to be a registered provider, and dials `https://` + host. There is
no environment scan. The kickoff additionally requires `New()` to load LLM_*
providers eagerly and a testable `LoadEnv(map)`.
## Decision
Implement **both** paths over one DSN parser (byte-for-byte go-llm
semantics — `://` split, first-`@` split, trailing-`/` trim, ErrInvalidDSN on
missing scheme/host, base URL always `https://host[/path]`):
- **Eager:** `New()` scans the process environment for `LLM_<NAME>` and
registers each as provider `lower(<NAME>)` (underscores preserved:
`LLM_MY_BOX` → `my_box`). `LoadEnv(map[string]string)` is the explicit,
testable entry. Malformed entries never fail construction: they are
recorded per-name, returned joined from LoadEnv, and surface from Parse
only when that name is actually referenced (matching go-llm's
fail-on-use behavior).
- **Lazy (go-llm parity):** an unknown provider name in Parse falls back to
`LLM_{UPPER(name, - → _)}`, so hyphenated spec names (`my-prov/x` →
`LLM_MY_PROV`) work exactly as in go-llm. Lazily resolved providers are
cached in the registry.
- The DSN **scheme** selects a `SchemeFactory` (foreman, ollama,
ollama-cloud, openai, anthropic, google, gemini; extensible via
`RegisterScheme`). The factory receives the registry name and the parsed
DSN (token = credential, `https://host` = base URL).
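
A rough sketch of the split-based DSN parsing and the lazy env-var name described above. It is not go-llm's or majordomo's actual code: the `DSN` struct, `parseDSN`, and `envVarName` are hypothetical names, and trimming every trailing `/` (rather than exactly one) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrInvalidDSN mirrors the error name used in this ADR.
var ErrInvalidDSN = errors.New("invalid DSN")

// DSN is a hypothetical parsed form of scheme://[token@]host[/path].
type DSN struct {
	Scheme  string
	Token   string
	BaseURL string
}

// envVarName maps a spec provider name to its LLM_* variable,
// e.g. "my-prov" to "LLM_MY_PROV".
func envVarName(provider string) string {
	return "LLM_" + strings.ToUpper(strings.ReplaceAll(provider, "-", "_"))
}

// parseDSN uses plain string splits instead of url.Parse, so nothing is
// percent-decoded and the token is taken as-is.
func parseDSN(raw string) (DSN, error) {
	scheme, rest, ok := strings.Cut(raw, "://")
	if !ok || scheme == "" {
		return DSN{}, fmt.Errorf("%w: missing scheme", ErrInvalidDSN)
	}
	token := ""
	if before, after, found := strings.Cut(rest, "@"); found { // first "@"
		token, rest = before, after
	}
	rest = strings.TrimRight(rest, "/") // assumption: trim all trailing slashes
	if host, _, _ := strings.Cut(rest, "/"); host == "" {
		return DSN{}, fmt.Errorf("%w: missing host", ErrInvalidDSN)
	}
	// The base URL is always https, whatever the DSN scheme is.
	return DSN{Scheme: scheme, Token: token, BaseURL: "https://" + rest}, nil
}

func main() {
	dsn, err := parseDSN("foreman://token@foreman-m5.orgrimmar.dudenhoeffer.casa/")
	fmt.Println(envVarName("my-prov"), dsn.Scheme, dsn.BaseURL, err)
}
```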
## Consequences
- Existing muscle memory carries over: every go-llm-resolvable LLM_* var
resolves identically here.
- Eager loading additionally makes env providers visible to discovery
(`Provider(name)`) before first use.
- An env DSN cannot express plain-http endpoints (https is forced) — same
limitation as go-llm, kept deliberately for parity; local Ollama uses the
`ollama` provider's own default (`http://localhost:11434`) rather than a
DSN.
## Alternatives considered
- `url.Parse`-based DSN parsing: subtly different (percent-decoding,
userinfo passwords). Parity wins. Rejected.
- Failing New() on malformed LLM_* vars: one stray var would break every
consumer at startup. Rejected.
@@ -0,0 +1,41 @@
# ADR-0005: Provider interface and the capabilities model
**Status:** Accepted — 2026-06-10
## Context
Each provider — and some individual models — imposes different limits (image
dimensions/bytes/MIME/count, tools, structured output, streaming, context
size). Callers must not need to know them; the library must normalize or
clearly reject.
## Decision
- `Provider` is minimal: `Name()` and `Model(id, opts...) (Model, error)`.
Model ids pass through verbatim; providers never validate ids against a
catalog (models churn weekly; catalogs rot).
- `Capabilities` is a plain struct declared **per provider** with
**per-model overrides** via `WithCapabilities` (a `ModelOption`). Zero
values mean: `MaxImagesPerReq == 0` → images unsupported;
`MaxImageBytes/MaxImageDimension/ContextWindow == 0` → no declared limit;
empty `AllowedImageMIME` → any type.
- Providers construct without error even when credentials are missing; the
failure surfaces as an auth error at request time (and a chain can fail
over past it). Construction-time validation would make `New()` fragile.
- Until a provider's implementation phase lands, built-ins register as
**stubs**: they resolve in Parse (so chains, aliases, and env DSNs are
fully functional) and return a clear "not implemented yet" error on use.
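
To make the zero-value rules concrete, here is a small sketch. The field names come from this ADR; the field types and the helper methods (`SupportsImages`, `AllowsMIME`, `FitsBytes`) are assumptions added for illustration.

```go
package main

import "fmt"

// Capabilities uses the field names from this ADR; the types are assumed.
type Capabilities struct {
	MaxImagesPerReq   int      // 0 means images are unsupported
	MaxImageBytes     int64    // 0 means no declared limit
	MaxImageDimension int      // 0 means no declared limit
	ContextWindow     int      // 0 means no declared limit
	AllowedImageMIME  []string // empty means any type
}

// SupportsImages reports whether the target accepts images at all.
func (c Capabilities) SupportsImages() bool { return c.MaxImagesPerReq > 0 }

// AllowsMIME treats an empty allow-list as "any type".
func (c Capabilities) AllowsMIME(mime string) bool {
	if len(c.AllowedImageMIME) == 0 {
		return true
	}
	for _, m := range c.AllowedImageMIME {
		if m == mime {
			return true
		}
	}
	return false
}

// FitsBytes treats a zero MaxImageBytes as "no declared limit".
func (c Capabilities) FitsBytes(n int64) bool {
	return c.MaxImageBytes == 0 || n <= c.MaxImageBytes
}

func main() {
	var zero Capabilities
	limited := Capabilities{
		MaxImagesPerReq:  4,
		MaxImageBytes:    5 << 20,
		AllowedImageMIME: []string{"image/png", "image/jpeg"},
	}
	fmt.Println(zero.SupportsImages(), zero.AllowsMIME("image/gif"), zero.FitsBytes(1<<30))
	fmt.Println(limited.SupportsImages(), limited.AllowsMIME("image/gif"), limited.FitsBytes(1<<30))
}
```

Note the asymmetry this sketch preserves: a zero image count disables images, while the other zero limits mean "unbounded".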
## Consequences
- The media pipeline (Phase 3, ADR to follow) can normalize against any
target uniformly.
- Adding a provider is additive: implement two methods + declare
capabilities.
## Alternatives considered
- Capability methods on Model with provider-specific logic — pushes limits
knowledge into every caller. Rejected.
- Model catalogs with validation — stale within weeks, breaks pass-through
targets like foreman. Rejected.
@@ -0,0 +1,48 @@
# ADR-0006: Model health tracking and backoff
**Status:** Accepted — 2026-06-10
## Context
Ollama Cloud models intermittently return "high demand" errors. mort's
behavior to preserve: one blip should not fail a request (retry); a model
that keeps failing should be benched so chains skip it, then re-admitted
after a cooldown. majordomo owns this (the "model health tracker").
## Decision
In-memory, process-local, thread-safe tracker in `health/`, keyed by
`"provider/model-id"`, with an **injected clock** (`func() time.Time`) so
every backoff path is unit-testable without sleeping.
- **Classification** (`llm.Classify`, overridable via `ChainConfig.Classify`):
transient = HTTP 408/429/5xx, network timeouts, connection refused/reset,
DNS failures, `context.DeadlineExceeded`; permanent = HTTP
400/401/403/404/405/422, `ErrModelNotFound`, `context.Canceled` (the
caller gave up — retrying defies intent). **Unknown errors default to
transient**: failing over can only help availability, and a wrongly
benched model self-heals via cooldown, while a wrongly fail-fasted request
is lost.
- **Counting:** every failed transient *attempt* increments the target's
consecutive-failure count; any success resets count **and** backoff
exponent. At threshold (default **2**) the target is benched until
`now + cooldown`, with cooldown = base (default **5s**) × multiplier
(default **2**) per consecutive backoff round, capped (default **5m**).
After the bench triggers, the count resets, so re-benching needs a fresh
run of failures — but at the doubled cooldown.
- All knobs (threshold, base/cap/multiplier, clock, classifier, retry count)
are configuration with the above defaults baked in.
- **No persistence, no interface.** The tracker is a concrete type; health
is process-local by design (out-of-scope guardrail). A consumer wanting
shared state can wrap the registry; we do not build for it now.
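
The counting and cooldown rules can be sketched as below. This is a simplified illustration, not the `health/` package itself: all type and method names are hypothetical, the multiplier is an integer, and the sketch assumes the first bench lasts exactly `Base` with each later consecutive round multiplied once more.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Config holds the knobs named in this ADR; field names are assumptions.
type Config struct {
	Threshold  int
	Base, Cap  time.Duration
	Multiplier int
	Now        func() time.Time // injected clock
}

// defaults returns the default values stated above.
func defaults(now func() time.Time) Config {
	return Config{Threshold: 2, Base: 5 * time.Second, Cap: 5 * time.Minute, Multiplier: 2, Now: now}
}

type entry struct {
	failures     int // consecutive failed transient attempts
	rounds       int // consecutive bench rounds (the backoff exponent)
	benchedUntil time.Time
}

// Tracker is a process-local, mutex-guarded health tracker sketch.
type Tracker struct {
	mu      sync.Mutex
	cfg     Config
	entries map[string]*entry // keyed by "provider/model-id"
}

func NewTracker(cfg Config) *Tracker {
	return &Tracker{cfg: cfg, entries: map[string]*entry{}}
}

// Benched reports whether key is still inside its cooldown window.
func (t *Tracker) Benched(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.entries[key]
	return e != nil && t.cfg.Now().Before(e.benchedUntil)
}

// ReportFailure records one failed attempt and returns true if this
// failure benched the target.
func (t *Tracker) ReportFailure(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.entries[key]
	if e == nil {
		e = &entry{}
		t.entries[key] = e
	}
	e.failures++
	if e.failures < t.cfg.Threshold {
		return false
	}
	cooldown := t.cfg.Base
	for i := 0; i < e.rounds && cooldown < t.cfg.Cap; i++ {
		cooldown *= time.Duration(t.cfg.Multiplier)
	}
	if cooldown > t.cfg.Cap {
		cooldown = t.cfg.Cap
	}
	e.benchedUntil = t.cfg.Now().Add(cooldown)
	e.rounds++
	e.failures = 0 // re-benching needs a fresh run of failures
	return true
}

// ReportSuccess resets both the failure count and the backoff exponent.
func (t *Tracker) ReportSuccess(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.entries, key)
}

// fakeClock is a test clock advanced by hand; no sleeping required.
type fakeClock struct{ now time.Time }

func (c *fakeClock) Now() time.Time          { return c.now }
func (c *fakeClock) Advance(d time.Duration) { c.now = c.now.Add(d) }

func main() {
	clock := &fakeClock{now: time.Unix(0, 0)}
	tr := NewTracker(defaults(clock.Now))
	tr.ReportFailure("m1/x")
	tr.ReportFailure("m1/x")
	fmt.Println("benched:", tr.Benched("m1/x"))
	clock.Advance(6 * time.Second)
	fmt.Println("benched after cooldown:", tr.Benched("m1/x"))
}
```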
## Consequences
- Deterministic tests via fake clock; no `time.Sleep` anywhere.
- Two providers addressing the same upstream model (e.g. `m1/x` and `m5/x`)
track independently — correct, since the backends are different machines.
## Alternatives considered
- Persistent/pluggable health store — explicitly out of scope. Rejected.
- Unknown→permanent default — drops availability on novel errors. Rejected.
@@ -0,0 +1,31 @@
# ADR-0007: Dependency policy — stdlib-first, hand-rolled REST clients
**Status:** Accepted — 2026-06-10
## Context
go-llm leans on SDKs (openai-go, go-anthropic, genai) and carries their
transitive weight and churn. The kickoff mandates minimal dependencies with
full control over multimodal payloads and capability handling.
## Decision
- **Hand-rolled `net/http` JSON clients** for OpenAI(+compatible),
Anthropic(+compatible), Ollama (cloud + local), and foreman. Their REST
surfaces are small and stable; owning the wire shapes gives exact control
over tool calls, structured output, streaming, and image payloads.
- **One approved third-party dependency:** the official Google Gen AI Go SDK
(`google.golang.org/genai`) for the Gemini provider — Google's surface
moves too much to hand-roll profitably.
- Image normalization uses stdlib `image`, `image/jpeg`, `image/png`.
`golang.org/x/image` may be added **only** if a needed format demands it,
via a new ADR.
- Any other third-party dependency requires its own ADR justifying it.
- No persistent store, no metrics stack, no config framework, no CLI beyond
`examples/` (out-of-scope guardrails).
## Consequences
- `go.mod` stays near-empty; consumers inherit almost nothing transitively.
- We own wire-format drift: provider docs are verified against current
documentation at implementation time and recorded in the provider ADRs.
@@ -0,0 +1,60 @@
# ADR-0008: Failover-chain execution semantics
**Status:** Accepted — 2026-06-10
## Context
A parsed spec is an ordered chain of targets sharing the registry's health
tracker. The executor must realize the kickoff's failover story (retry one
blip; bench repeat offenders; skip benched targets; clear exhaustion errors)
identically for chains of one and many.
## Decision
For each request, iterate elements head-to-tail:
1. **Skip** targets currently benched (recorded in the exhaustion error).
2. Attempt the target. On success → report success (resets health), return.
3. On error, classify:
   - **Permanent + model-not-found** → advance, no health penalty.
   - **Permanent otherwise** (auth, malformed) → **fail fast** by default,
     since failing over cannot fix a bad request; `ChainConfig.AdvanceOnPermanent`
     flips this for callers who prefer availability.
   - **Transient** → report the failed attempt to the tracker; retry the
     same target while attempts remain (`TransientRetries`, default 1)
     **unless the tracker just benched it**, in which case advance
     immediately.
4. All elements failed/skipped → return `errors.Join(ErrChainExhausted,
per-target reasons...)` naming every target and why.
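
The four steps above can be sketched as a loop. This is an illustration only: `target`, `run`, and the tiny `health` stand-in are hypothetical, classification is reduced to sentinel errors, and context cancellation and cooldown expiry are left out.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrChainExhausted = errors.New("chain exhausted")
	errNotFound       = errors.New("model not found")  // permanent, advance
	errPermanent      = errors.New("permanent failure") // permanent, fail fast
	errHighDemand     = errors.New("high demand")       // treated as transient
)

// health is a stand-in for the real tracker: it benches at a threshold
// and never re-admits.
type health struct {
	threshold int
	fails     map[string]int
	out       map[string]bool
}

func newHealth(threshold int) *health {
	return &health{threshold: threshold, fails: map[string]int{}, out: map[string]bool{}}
}

func (h *health) benched(key string) bool { return h.out[key] }
func (h *health) success(key string)      { h.fails[key] = 0 }

// failure records one failed attempt; true means the target was just benched.
func (h *health) failure(key string) bool {
	h.fails[key]++
	if h.fails[key] < h.threshold {
		return false
	}
	h.fails[key] = 0
	h.out[key] = true
	return true
}

type target struct {
	key  string
	call func() error
}

// run returns the key of the target that served the request.
func run(chain []target, h *health, retries int, advanceOnPermanent bool) (string, error) {
	reasons := []error{ErrChainExhausted}
	for _, t := range chain {
		if h.benched(t.key) { // step 1: skip benched targets
			reasons = append(reasons, fmt.Errorf("%s: skipped, benched", t.key))
			continue
		}
	attempts:
		for attempt := 0; attempt <= retries; attempt++ {
			err := t.call() // step 2: attempt the target
			if err == nil {
				h.success(t.key)
				return t.key, nil
			}
			reasons = append(reasons, fmt.Errorf("%s: %w", t.key, err))
			switch { // step 3: classify
			case errors.Is(err, errNotFound):
				break attempts // advance, no health penalty
			case errors.Is(err, errPermanent):
				if !advanceOnPermanent {
					return "", fmt.Errorf("%s: %w", t.key, err)
				}
				break attempts
			default: // unknown errors count as transient
				if h.failure(t.key) {
					break attempts // just benched: advance immediately
				}
			}
		}
	}
	return "", errors.Join(reasons...) // step 4: exhaustion
}

func main() {
	h := newHealth(2)
	chain := []target{
		{key: "m1/x", call: func() error { return errHighDemand }},
		{key: "m5/x", call: func() error { return nil }},
	}
	served, err := run(chain, h, 1, false)
	fmt.Println(served, err, h.benched("m1/x"))
}
```

With one retry and a threshold of two, the failing head is attempted twice, benched within the same request, and skipped on the next one.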
Other decisions:
- **Capabilities() = head element's capabilities.** The head is the
preferred target and the honest answer to "what should I prepare for?".
Per-attempt media normalization (Phase 3) uses the *actual* target's
capabilities, so fallbacks still get correctly-fitted inputs.
Intersection semantics were rejected: a rarely-used tail fallback would
artificially constrain every request.
- **Streaming failover applies to stream establishment only.** Once a
stream is open, mid-stream errors propagate; silently restarting on
another target would re-deliver partial output.
- `context.Canceled` aborts the chain immediately between and during
attempts.
- Duplicate post-expansion elements were already dropped at Parse
(ADR-0003).
## Consequences
- "One transient error is fine" holds: blip → same-target retry succeeds,
no failover, and the success immediately clears the single health mark.
With default knobs (retries=1, threshold=2), a target whose retry also
fails is benched in the same request and the chain advances, exactly as
in the kickoff narrative.
- Single-target specs get the same retry/backoff behavior for free.
## Alternatives considered
- Per-request (not per-attempt) failure counting — needs two failed
*requests* to bench, letting a dead model eat the retry budget twice.
Rejected as weaker than the kickoff's story.
- Intersection capabilities — see above. Rejected.
@@ -0,0 +1,14 @@
# Architecture Decision Records
One decision per file, append-only; supersede rather than rewrite.
| ADR | Title | Status |
|-----|-------|--------|
| [0001](0001-package-layout.md) | Package layout — canonical types in leaf `llm`, root re-exports | Accepted |
| [0002](0002-canonical-message-model.md) | Canonical message/content model | Accepted |
| [0003](0003-parse-grammar.md) | Parse grammar — verbatim ids, inline alias expansion, chains | Accepted |
| [0004](0004-env-dsn-providers.md) | LLM_* env-DSN provider definitions (go-llm parity + eager load) | Accepted |
| [0005](0005-provider-capabilities.md) | Provider interface and capabilities model | Accepted |
| [0006](0006-health-and-backoff.md) | Model health tracking and backoff | Accepted |
| [0007](0007-dependency-policy.md) | Dependency policy — stdlib-first, hand-rolled REST clients | Accepted |
| [0008](0008-chain-semantics.md) | Failover-chain execution semantics | Accepted |