feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

Phase 1 of the majordomo build:

- llm/ canonical contract (messages, parts, tools, capabilities, streaming, Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias expansion with cycle detection, chain dedup), LLM_* env-DSN providers (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow, docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@@ -0,0 +1,46 @@
# ADR-0001: Package layout — canonical types in a leaf `llm` package, root re-exports

**Status:** Accepted — 2026-06-10

## Context

Provider implementations (openai, anthropic, google, ollama/foreman) must share
the canonical types (Message, Request, Response, Capabilities, Model, Provider).
If those types lived in the root `majordomo` package, the root could not also
register built-in providers (root → provider/openai → root is an import cycle).
go-llm solved this with a `v2/provider` leaf package; the kickoff sketch puts
the Provider interface in `provider/provider.go` and the message types at root,
which recreates the cycle.

## Decision

- All canonical contract types live in the leaf package
  `majordomo/llm` (Message, Part, Request, Response, Option, Tool, Toolbox,
  Capabilities, Stream, Model, Provider, error classification). It imports
  nothing else in the module.
- The root `majordomo` package re-exports every canonical type via type
  aliases (plus constructor/option wrappers), so consumers write
  `majordomo.Request`, `majordomo.UserText(...)` and rarely import `llm`.
- The root owns assembly: Registry, Parse, env-DSN loading, the chain
  executor, and (from Phase 3) registration of real provider clients.
- The planned `resolve/` package is folded into the root: the grammar needs
  registry state (aliases, providers, env fallback) at every expansion step,
  and a callback interface between two packages bought nothing but
  indirection.
- `health/`, `media/`, `provider/<impl>/`, `provider/fake/`, `agent/`, and
  `skill/` are subpackages importing `llm` (and never each other, except
  agent → skill).

## Consequences

- No import cycles; new providers are additive subpackages.
- Consumers get the flat one-import API the kickoff sketches.
- Type aliases (not wrappers) mean zero conversion cost and full
  interchangeability between `majordomo.X` and `llm.X`.
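
The interchangeability claim can be illustrated in a single file. This is only a sketch: `llmRequest` stands in for `llm.Request` (the real types live in separate packages), `describe` is a hypothetical consumer, and the alias line mirrors what the root package would declare as `type Request = llm.Request`.

```go
package main

import "fmt"

// llmRequest stands in for llm.Request in the leaf package.
// Only the System field is taken from the ADRs; the rest is omitted.
type llmRequest struct {
	System string
}

// Request is a type alias, not a new type: both names denote the same type.
type Request = llmRequest

// describe is a hypothetical function written against the leaf type.
func describe(r llmRequest) string {
	return "system=" + r.System
}

func main() {
	// A value built through the root alias is accepted without conversion.
	r := Request{System: "You are terse."}
	fmt.Println(describe(r))
}
```

Had `Request` been declared as `type Request llmRequest` (a defined type), the call would need an explicit conversion, which is the cost the alias approach avoids.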

## Alternatives considered

- **Everything in root.** No cycles only if providers also live in root —
  a single giant package. Rejected.
- **Self-registering providers via package init() side effects.** Hides
  wiring, breaks multi-registry isolation, surprises tests. Rejected.
@@ -0,0 +1,53 @@
# ADR-0002: Canonical message/content model

**Status:** Accepted — 2026-06-10

## Context

Every provider has a different wire shape for conversations, content,
tool calls, and system prompts. majordomo needs one canonical shape that all
providers translate to/from, expressive enough for multimodality and tool
loops, small enough to keep providers honest.

## Decision

- `Message{Role, Parts, ToolCalls, ToolResults}` with roles system / user /
  assistant / tool. `Part` is a **sealed** interface (`TextPart`,
  `ImagePart`) so providers can switch exhaustively; new media kinds are
  deliberate API changes, not silent pass-throughs.
- `ImagePart` is **bytes + MIME only** — no URL form. The media pipeline
  must inspect/resize/transcode images against target capabilities, which
  requires bytes; fetching remote URLs is the caller's job, not a hidden
  network dependency inside a model call.
- `Request.System` is a dedicated top-level field (maps to Anthropic
  `system`, Google `SystemInstruction`, an OpenAI/Ollama system message).
  RoleSystem messages in the history are also accepted and folded by
  providers. Request also carries Tools, ToolChoice, Schema/SchemaName, and
  sampling knobs; per-call mutation happens via `Option` funcs applied to a
  copy, so Request values are reusable.
- Model ids never carry behavior suffixes: unlike go-llm there is **no
  `:low/:medium/:high` reasoning-suffix grammar** (it conflicts with
  verbatim model ids like `minimax-m3:cloud`, see ADR-0003). Reasoning
  effort will be a request option when providers land.
- `Response{Parts, ToolCalls, FinishReason, Usage, Model, Raw}` — `Model`
  names the target that actually served the request (vital with chains);
  `Raw` is the provider-native escape hatch, never required.
- Streaming (`Stream.Next() → StreamEvent`): text deltas stream as they
  arrive; **tool-call arguments are buffered until complete** (consumers
  never see partial JSON); the final event carries the accumulated
  `*Response`; `io.EOF` terminates.
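
The sealed `Part` interface from the list above can be sketched as follows. This is an illustration under assumptions, not the committed code: the unexported marker method `isPart`, the field names `Text`, `Data`, and `MIME`, and the `describe` helper are all hypothetical. The technique itself (an unexported method that only the declaring package can implement) is standard Go.

```go
package main

import "fmt"

// Part is sealed: isPart is unexported, so only this package can add kinds.
type Part interface{ isPart() }

// TextPart carries plain text (field name assumed).
type TextPart struct{ Text string }

// ImagePart carries raw bytes plus a MIME type, with no URL form.
type ImagePart struct {
	Data []byte
	MIME string
}

func (TextPart) isPart()  {}
func (ImagePart) isPart() {}

// describe shows the exhaustive type switch a provider translation
// layer could use; the default arm only fires if a new kind is added.
func describe(p Part) string {
	switch v := p.(type) {
	case TextPart:
		return "text:" + v.Text
	case ImagePart:
		return fmt.Sprintf("image:%s:%d bytes", v.MIME, len(v.Data))
	default:
		return "unsupported"
	}
}

func main() {
	parts := []Part{
		TextPart{Text: "hi"},
		ImagePart{Data: []byte{1, 2, 3}, MIME: "image/png"},
	}
	for _, p := range parts {
		fmt.Println(describe(p))
	}
}
```

Because outside packages cannot satisfy `Part`, a provider that handles every case in the switch cannot be handed an unknown kind at runtime, which is the property the decision relies on.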

## Consequences

- Providers stay translation layers; nothing provider-specific leaks into
  the canonical API.
- Callers needing remote images fetch them first — explicit, testable.
- Partial-tool-call streaming UIs are out of scope (acceptable: arguments
  are rarely useful before they parse).

## Alternatives considered

- Open `Part` interface — silent content drops on unknown kinds. Rejected.
- URL image parts with lazy fetch — hidden I/O inside Generate, breaks
  capability normalization. Rejected.
- go-llm-style reasoning suffixes — see ADR-0003. Rejected.
@@ -0,0 +1,57 @@
# ADR-0003: Parse grammar — verbatim model ids, inline alias expansion, chains

**Status:** Accepted — 2026-06-10

## Context

Callers (mort first) address models by string: single targets, tier aliases,
and comma-separated failover chains, with custom and env-defined providers as
first-class elements. go-llm's grammar is close but nests alias-chains as
composite Models and strips `:low/:medium/:high` reasoning suffixes, which
collides with Ollama-style tags (`minimax-m3:cloud`) and Google-style ids.

## Decision

Grammar (binding, from the kickoff):

```
spec    := element ("," element)*
element := target | alias
target  := provider "/" model     # model = everything after the FIRST "/",
                                  # up to the next comma, passed VERBATIM
alias   := bare token, no slash
```

- Provider resolution order per target: registered providers (built-ins,
  RegisterProvider, eagerly env-loaded) → lazy `LLM_{UPPER(name)}` env DSN
  (ADR-0004) → error naming both places checked.
- Aliases expand **inline** wherever they appear (head/middle/tail),
  recursively, into the flat element list. Cycles are detected via the
  expansion stack and return `ErrAliasCycle` — never a hang. Inline (not
  nested-Model, as in go-llm) expansion keeps one flat chain so health
  skipping and error reporting see every element uniformly.
- Duplicate elements after expansion are dropped (first occurrence wins):
  retrying an already-failed target in the same pass is never useful.
- A single element and a multi-element chain return the same `Model`
  (a chain of one) — identical retry/health semantics, callers never branch.
- **No reasoning-suffix stripping.** mort's `:high` dialect is handled by
  mort's spec layer during migration; majordomo will expose reasoning effort
  as an explicit request option instead.
- The package-level `Default()` registry (lazy, loads process env) backs
  `majordomo.Parse` for go-llm-style one-call ergonomics; `New()` builds
  isolated registries for tests/multi-tenant use.
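
The grammar and the inline-expansion rules above can be sketched in a few dozen lines. This is a minimal illustration, not the registry's actual code: `Target`, `expand`, and `isCycle` are hypothetical names, provider resolution is left out entirely, and rejecting empty elements is an assumption the ADR does not state.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrAliasCycle mirrors the sentinel named in the ADR.
var ErrAliasCycle = errors.New("alias cycle")

// Target is one flat chain element (hypothetical shape).
type Target struct{ Provider, Model string }

// expand flattens spec into deduplicated targets, expanding aliases inline.
func expand(spec string, aliases map[string]string) ([]Target, error) {
	var out []Target
	seen := map[Target]bool{}

	var walk func(s string, stack []string) error
	walk = func(s string, stack []string) error {
		for _, el := range strings.Split(s, ",") {
			if el == "" {
				return fmt.Errorf("empty element in %q", s)
			}
			// target: split on the FIRST slash only; the rest stays verbatim.
			if provider, model, ok := strings.Cut(el, "/"); ok {
				t := Target{Provider: provider, Model: model}
				if !seen[t] { // first occurrence wins
					seen[t] = true
					out = append(out, t)
				}
				continue
			}
			// alias: refuse to re-enter anything already being expanded.
			for _, open := range stack {
				if open == el {
					return fmt.Errorf("%w: %s -> %s",
						ErrAliasCycle, strings.Join(stack, " -> "), el)
				}
			}
			body, ok := aliases[el]
			if !ok {
				return fmt.Errorf("unknown alias %q", el)
			}
			next := append(append([]string(nil), stack...), el)
			if err := walk(body, next); err != nil {
				return err
			}
		}
		return nil
	}

	if err := walk(spec, nil); err != nil {
		return nil, err
	}
	return out, nil
}

// isCycle reports whether err wraps ErrAliasCycle.
func isCycle(err error) bool { return errors.Is(err, ErrAliasCycle) }

func main() {
	aliases := map[string]string{
		"thinking": "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud",
	}
	chain, err := expand("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking", aliases)
	fmt.Println(chain, err)

	_, err = expand("a", map[string]string{"a": "b", "b": "a"})
	fmt.Println(isCycle(err))
}
```

Tracking the expansion stack rather than a global visited set matters here: the same alias may legitimately appear twice in one spec (its targets are simply deduplicated), and only re-entry during its own expansion is a cycle.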

## Consequences

- `m1/richardyoung/qwen3-14b-abliterated:q4_K_M` (a real mort tier value)
  parses as provider `m1`, model `richardyoung/qwen3-14b-abliterated:q4_K_M`.
- A bare token that is a provider name yields a targeted error
  ("use openai/<model-id>").
- Alias updates after Parse don't affect already-built Models (expansion is
  at Parse time). mort re-parses per request, so DB-tier edits still apply.

## Alternatives considered

- Nested alias expansion (go-llm): opaque chains inside chains; health
  skipping can't see the elements. Rejected.
- Reasoning suffixes in the grammar: breaks verbatim ids. Rejected.
@@ -0,0 +1,60 @@
# ADR-0004: LLM_* env-DSN provider definitions (go-llm parity, plus eager load)

**Status:** Accepted — 2026-06-10

## Context

Steve's deployments define providers via env vars that must keep working
unchanged:

```
LLM_M1=foreman://token@foreman-m1.orgrimmar.dudenhoeffer.casa
LLM_M5=foreman://token@foreman-m5.orgrimmar.dudenhoeffer.casa
```

go-llm (v2/parse.go) implements this **lazily only**: `Parse("m5/x")` misses
the registry, computes `LLM_` + UPPER(name) with `-`→`_`, reads exactly that
var, parses `scheme://[token@]host[/path]` by plain string splits, requires
the scheme to be a registered provider, and dials `https://` + host. There is
no environment scan. The kickoff additionally requires `New()` to load LLM_*
providers eagerly and a testable `LoadEnv(map)`.

## Decision

Implement **both** paths over one DSN parser (byte-for-byte go-llm
semantics — `://` split, first-`@` split, trailing-`/` trim, ErrInvalidDSN on
missing scheme/host, base URL always `https://host[/path]`):

- **Eager:** `New()` scans the process environment for `LLM_<NAME>` and
  registers each as provider `lower(<NAME>)` (underscores preserved:
  `LLM_MY_BOX` → `my_box`). `LoadEnv(map[string]string)` is the explicit,
  testable entry. Malformed entries never fail construction: they are
  recorded per-name, returned joined from LoadEnv, and surface from Parse
  only when that name is actually referenced (matching go-llm's
  fail-on-use behavior).
- **Lazy (go-llm parity):** an unknown provider name in Parse falls back to
  `LLM_{UPPER(name, - → _)}`, so hyphenated spec names (`my-prov/x` →
  `LLM_MY_PROV`) work exactly as in go-llm. Lazily resolved providers are
  cached in the registry.
- The DSN **scheme** selects a `SchemeFactory` (foreman, ollama,
  ollama-cloud, openai, anthropic, google, gemini; extensible via
  `RegisterScheme`). The factory receives the registry name and the parsed
  DSN (token = credential, `https://host` = base URL).
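
The split-based DSN rules listed above can be sketched like this. It follows the steps the ADR names (`://` split, first-`@` split, trailing-`/` trim, forced `https://`), but it is a reconstruction and has not been checked against go-llm byte for byte. The `DSN` struct, `parseDSN`, `envVarName`, and the `example.test` hostname are hypothetical, and whether one or all trailing slashes are trimmed is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrInvalidDSN mirrors the sentinel named in the ADR.
var ErrInvalidDSN = errors.New("invalid DSN")

// DSN is a hypothetical parsed form: scheme, credential, forced-https base URL.
type DSN struct {
	Scheme, Token, BaseURL string
}

// envVarName computes the lazy-fallback variable for a provider name.
func envVarName(provider string) string {
	return "LLM_" + strings.ToUpper(strings.ReplaceAll(provider, "-", "_"))
}

// parseDSN parses scheme://[token@]host[/path] with plain string splits.
// Error messages deliberately omit the raw value, which may hold a token.
func parseDSN(raw string) (DSN, error) {
	scheme, rest, ok := strings.Cut(raw, "://")
	if !ok || scheme == "" {
		return DSN{}, fmt.Errorf("%w: missing scheme", ErrInvalidDSN)
	}
	var token string
	if t, hostPath, found := strings.Cut(rest, "@"); found { // first "@" only
		token, rest = t, hostPath
	}
	rest = strings.TrimRight(rest, "/") // assumption: trims every trailing slash
	if rest == "" || strings.HasPrefix(rest, "/") {
		return DSN{}, fmt.Errorf("%w: missing host", ErrInvalidDSN)
	}
	return DSN{Scheme: scheme, Token: token, BaseURL: "https://" + rest}, nil
}

func main() {
	fmt.Println(envVarName("my-prov"))
	d, err := parseDSN("foreman://token@foreman-m1.example.test/")
	fmt.Println(d.Scheme, d.BaseURL, err)
}
```

Unlike `url.Parse`, this never percent-decodes the token or splits it into user and password, which is the behavioral difference the ADR cites for preferring plain splits.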

## Consequences

- Existing muscle memory carries over: every go-llm-resolvable LLM_* var
  resolves identically here.
- Eager loading additionally makes env providers visible to discovery
  (`Provider(name)`) before first use.
- An env DSN cannot express plain-http endpoints (https is forced) — same
  limitation as go-llm, kept deliberately for parity; local Ollama uses the
  `ollama` provider's own default (`http://localhost:11434`) rather than a
  DSN.

## Alternatives considered

- `url.Parse`-based DSN parsing: subtly different (percent-decoding,
  userinfo passwords). Parity wins. Rejected.
- Failing New() on malformed LLM_* vars: one stray var would break every
  consumer at startup. Rejected.
@@ -0,0 +1,41 @@
# ADR-0005: Provider interface and the capabilities model

**Status:** Accepted — 2026-06-10

## Context

Each provider — and some individual models — imposes different limits (image
dimensions/bytes/MIME/count, tools, structured output, streaming, context
size). Callers must not need to know them; the library must normalize or
clearly reject.

## Decision

- `Provider` is minimal: `Name()` and `Model(id, opts...) (Model, error)`.
  Model ids pass through verbatim; providers never validate ids against a
  catalog (models churn weekly; catalogs rot).
- `Capabilities` is a plain struct declared **per provider** with
  **per-model overrides** via `WithCapabilities` (a `ModelOption`). Zero
  values mean: `MaxImagesPerReq == 0` → images unsupported;
  `MaxImageBytes/MaxImageDimension/ContextWindow == 0` → no declared limit;
  empty `AllowedImageMIME` → any type.
- Providers construct without error even when credentials are missing; the
  failure surfaces as an auth error at request time (and a chain can fail
  over past it). Construction-time validation would make `New()` fragile.
- Until a provider's implementation phase lands, built-ins register as
  **stubs**: they resolve in Parse (so chains, aliases, and env DSNs are
  fully functional) and return a clear "not implemented yet" error on use.
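
The zero-value rules for `Capabilities` are easy to misread, so here is a small sketch of how a check against them might look. The field names come from the decision above; their types, the `CheckImage` method, and its error messages are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Capabilities uses the field names from the ADR; the types are assumed.
type Capabilities struct {
	MaxImagesPerReq   int
	MaxImageBytes     int
	MaxImageDimension int
	ContextWindow     int
	AllowedImageMIME  []string
}

// CheckImage is a hypothetical helper applying the zero-value semantics:
// zero MaxImagesPerReq rejects images, other zero limits mean "no limit",
// and an empty MIME list accepts any type.
func (c Capabilities) CheckImage(mime string, sizeBytes, countInRequest int) error {
	if c.MaxImagesPerReq == 0 {
		return errors.New("images unsupported")
	}
	if countInRequest > c.MaxImagesPerReq {
		return fmt.Errorf("too many images: %d > %d", countInRequest, c.MaxImagesPerReq)
	}
	if c.MaxImageBytes != 0 && sizeBytes > c.MaxImageBytes {
		return fmt.Errorf("image too large: %d > %d bytes", sizeBytes, c.MaxImageBytes)
	}
	if len(c.AllowedImageMIME) == 0 {
		return nil // empty list: any MIME type
	}
	for _, allowed := range c.AllowedImageMIME {
		if allowed == mime {
			return nil
		}
	}
	return fmt.Errorf("MIME type %q not allowed", mime)
}

func main() {
	textOnly := Capabilities{}
	fmt.Println(textOnly.CheckImage("image/png", 10, 1))

	vision := Capabilities{MaxImagesPerReq: 4, AllowedImageMIME: []string{"image/png", "image/jpeg"}}
	fmt.Println(vision.CheckImage("image/png", 1<<30, 1))
	fmt.Println(vision.CheckImage("image/gif", 10, 1))
}
```

The asymmetry is deliberate in the ADR: a zero-valued struct describes a text-only model with no declared size limits, so a provider that declares nothing is treated conservatively for images rather than permissively.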

## Consequences

- The media pipeline (Phase 3, ADR to follow) can normalize against any
  target uniformly.
- Adding a provider is additive: implement two methods + declare
  capabilities.

## Alternatives considered

- Capability methods on Model with provider-specific logic — pushes limits
  knowledge into every caller. Rejected.
- Model catalogs with validation — stale within weeks, breaks pass-through
  targets like foreman. Rejected.
@@ -0,0 +1,48 @@
# ADR-0006: Model health tracking and backoff

**Status:** Accepted — 2026-06-10

## Context

Ollama Cloud models intermittently return "high demand" errors. mort's
behavior to preserve: one blip should not fail a request (retry); a model
that keeps failing should be benched so chains skip it, then re-admitted
after a cooldown. majordomo owns this (the "model health tracker").

## Decision

In-memory, process-local, thread-safe tracker in `health/`, keyed by
`"provider/model-id"`, with an **injected clock** (`func() time.Time`) so
every backoff path is unit-testable without sleeping.

- **Classification** (`llm.Classify`, overridable via `ChainConfig.Classify`):
  transient = HTTP 408/429/5xx, network timeouts, connection refused/reset,
  DNS failures, `context.DeadlineExceeded`; permanent = HTTP
  400/401/403/404/405/422, `ErrModelNotFound`, `context.Canceled` (the
  caller gave up — retrying defies intent). **Unknown errors default to
  transient**: failing over can only help availability, and a wrongly
  benched model self-heals via cooldown, while a wrongly fail-fasted request
  is lost.
- **Counting:** every failed transient *attempt* increments the target's
  consecutive-failure count; any success resets count **and** backoff
  exponent. At threshold (default **2**) the target is benched until
  `now + cooldown`, with cooldown = base (default **5s**) × multiplier
  (default **2**) per consecutive backoff round, capped (default **5m**).
  After the bench triggers, the count resets, so re-benching needs a fresh
  run of failures — but at the doubled cooldown.
- All knobs (threshold, base/cap/multiplier, clock, classifier, retry count)
  are configuration with the above defaults baked in.
- **No persistence, no interface.** The tracker is a concrete type; health
  is process-local by design (out-of-scope guardrail). A consumer wanting
  shared state can wrap the registry; we do not build for it now.
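
The counting rules above can be sketched as a small tracker with an injected clock. It is an illustration only: `Config`, `Tracker`, the method names, and `fakeClock` are hypothetical, and it reads "per consecutive backoff round" as cooldown = min(base × multiplier^rounds, cap), where rounds counts benches since the last success (so the first bench lasts the base 5s).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Config holds the knobs named in the ADR (shape assumed).
type Config struct {
	Threshold  int
	Base, Cap  time.Duration
	Multiplier int
	Now        func() time.Time // injected clock
}

func defaults(now func() time.Time) Config {
	return Config{Threshold: 2, Base: 5 * time.Second, Cap: 5 * time.Minute, Multiplier: 2, Now: now}
}

type state struct {
	failures     int // consecutive failed transient attempts
	rounds       int // benches since the last success
	benchedUntil time.Time
}

// Tracker is keyed by "provider/model-id".
type Tracker struct {
	mu      sync.Mutex
	cfg     Config
	targets map[string]*state
}

func NewTracker(cfg Config) *Tracker {
	return &Tracker{cfg: cfg, targets: map[string]*state{}}
}

// cooldown returns min(Base * Multiplier^rounds, Cap).
func (t *Tracker) cooldown(rounds int) time.Duration {
	d := t.cfg.Base
	for i := 0; i < rounds && d < t.cfg.Cap; i++ {
		d *= time.Duration(t.cfg.Multiplier)
	}
	if d > t.cfg.Cap {
		d = t.cfg.Cap
	}
	return d
}

// Failure records one failed attempt and reports whether it benched the target.
func (t *Tracker) Failure(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	s := t.targets[key]
	if s == nil {
		s = &state{}
		t.targets[key] = s
	}
	s.failures++
	if s.failures < t.cfg.Threshold {
		return false
	}
	s.benchedUntil = t.cfg.Now().Add(t.cooldown(s.rounds))
	s.failures = 0 // re-benching needs a fresh run of failures
	s.rounds++     // ...but at the next, longer cooldown
	return true
}

// Success resets both the failure count and the backoff exponent.
func (t *Tracker) Success(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.targets, key)
}

// Benched reports whether the target is still inside its cooldown.
func (t *Tracker) Benched(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	s := t.targets[key]
	return s != nil && t.cfg.Now().Before(s.benchedUntil)
}

// fakeClock is a test clock; it is not safe for concurrent use.
type fakeClock struct{ now time.Time }

func (c *fakeClock) Now() time.Time { return c.now }
func (c *fakeClock) Advance(seconds int) {
	c.now = c.now.Add(time.Duration(seconds) * time.Second)
}

func main() {
	clk := &fakeClock{}
	tr := NewTracker(defaults(clk.Now))
	tr.Failure("m1/x")
	fmt.Println(tr.Failure("m1/x"), tr.Benched("m1/x"))
	clk.Advance(5)
	fmt.Println(tr.Benched("m1/x"))
}
```

With the fake clock, a test can step through bench and re-admission by calling `Advance` instead of sleeping, which is the point of injecting `func() time.Time`.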

## Consequences

- Deterministic tests via fake clock; no `time.Sleep` anywhere.
- Two providers addressing the same upstream model (e.g. `m1/x` and `m5/x`)
  track independently — correct, since the backends are different machines.

## Alternatives considered

- Persistent/pluggable health store — explicitly out of scope. Rejected.
- Unknown→permanent default — drops availability on novel errors. Rejected.
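
The classification table from the Decision section, including the unknown-defaults-to-transient rule just discussed, might look roughly like this. It is a sketch under assumptions: `HTTPError` is a hypothetical error type carrying a status code, `Class` and its constants are invented names, and the real `llm.Classify` signature is not shown in this commit page.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Class is a hypothetical two-valued classification.
type Class int

const (
	Transient Class = iota
	Permanent
)

// ErrModelNotFound mirrors the sentinel named in the ADR.
var ErrModelNotFound = errors.New("model not found")

// HTTPError is a hypothetical error type carrying the response status.
type HTTPError struct{ Status int }

func (e *HTTPError) Error() string { return fmt.Sprintf("http status %d", e.Status) }

// Classify applies the ADR's table. Anything it does not recognize,
// including network-level errors, falls through to Transient.
func Classify(err error) Class {
	switch {
	case errors.Is(err, context.Canceled), errors.Is(err, ErrModelNotFound):
		return Permanent
	case errors.Is(err, context.DeadlineExceeded):
		return Transient
	}
	var he *HTTPError
	if errors.As(err, &he) {
		switch he.Status {
		case 400, 401, 403, 404, 405, 422:
			return Permanent
		}
		// 408, 429, 5xx and any unlisted status are treated as transient.
	}
	return Transient
}

func main() {
	fmt.Println(Classify(&HTTPError{Status: 401}) == Permanent)
	fmt.Println(Classify(&HTTPError{Status: 503}) == Transient)
	fmt.Println(Classify(errors.New("something novel")) == Transient)
}
```

Only the permanent cases need to be enumerated; because the default arm is transient, timeouts, connection resets, and DNS failures need no explicit branch in this sketch.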
@@ -0,0 +1,31 @@
# ADR-0007: Dependency policy — stdlib-first, hand-rolled REST clients

**Status:** Accepted — 2026-06-10

## Context

go-llm leans on SDKs (openai-go, go-anthropic, genai) and carries their
transitive weight and churn. The kickoff mandates minimal dependencies with
full control over multimodal payloads and capability handling.

## Decision

- **Hand-rolled `net/http` JSON clients** for OpenAI(+compatible),
  Anthropic(+compatible), Ollama (cloud + local), and foreman. Their REST
  surfaces are small and stable; owning the wire shapes gives exact control
  over tool calls, structured output, streaming, and image payloads.
- **One approved third-party dependency:** the official Google Gen AI Go SDK
  (`google.golang.org/genai`) for the Gemini provider — Google's surface
  moves too much to hand-roll profitably.
- Image normalization uses stdlib `image`, `image/jpeg`, `image/png`.
  `golang.org/x/image` may be added **only** if a needed format demands it,
  via a new ADR.
- Any other third-party dependency requires its own ADR justifying it.
- No persistent store, no metrics stack, no config framework, no CLI beyond
  `examples/` (out-of-scope guardrails).

## Consequences

- `go.mod` stays near-empty; consumers inherit almost nothing transitively.
- We own wire-format drift: provider docs are verified against current
  documentation at implementation time and recorded in the provider ADRs.
@@ -0,0 +1,60 @@
# ADR-0008: Failover-chain execution semantics

**Status:** Accepted — 2026-06-10

## Context

A parsed spec is an ordered chain of targets sharing the registry's health
tracker. The executor must realize the kickoff's failover story (retry one
blip; bench repeat offenders; skip benched targets; clear exhaustion errors)
identically for chains of one and many.

## Decision

For each request, iterate elements head-to-tail:

1. **Skip** targets currently benched (recorded in the exhaustion error).
2. Attempt the target. On success → report success (resets health), return.
3. On error, classify:
   - **Permanent + model-not-found** → advance, no health penalty.
   - **Permanent otherwise** (auth, malformed) → **fail fast** by default —
     failing over cannot fix a bad request; `ChainConfig.AdvanceOnPermanent`
     flips this for callers who prefer availability.
   - **Transient** → report the failed attempt to the tracker; retry the
     same target while attempts remain (`TransientRetries`, default 1)
     **unless the tracker just benched it**, in which case advance
     immediately.
4. All elements failed/skipped → return `errors.Join(ErrChainExhausted,
   per-target reasons...)` naming every target and why.

Other decisions:

- **Capabilities() = head element's capabilities.** The head is the
  preferred target and the honest answer to "what should I prepare for?".
  Per-attempt media normalization (Phase 3) uses the *actual* target's
  capabilities, so fallbacks still get correctly-fitted inputs.
  Intersection semantics were rejected: a rarely-used tail fallback would
  artificially constrain every request.
- **Streaming failover applies to stream establishment only.** Once a
  stream is open, mid-stream errors propagate; silently restarting on
  another target would re-deliver partial output.
- `context.Canceled` aborts the chain immediately between and during
  attempts.
- Duplicate post-expansion elements were already dropped at Parse
  (ADR-0003).
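
The four numbered steps can be sketched as a loop over targets. This is a deliberately reduced model, not the executor in this commit: the `health` type benches forever (no cooldown or clock), context cancellation and streaming are left out, `errPermanent` and `errBusy` are hypothetical markers standing in for the classifier's output, and `Target`, `Config`, `run`, and `flaky` are invented names.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrChainExhausted = errors.New("chain exhausted")
	ErrModelNotFound  = errors.New("model not found")
	errPermanent      = errors.New("permanent")   // hypothetical: auth or malformed request
	errBusy           = errors.New("high demand") // hypothetical: any transient error
)

// Target pairs a "provider/model" key with the call that attempts it.
type Target struct {
	Key  string
	Call func() (string, error)
}

// health is a cooldown-free stand-in for the tracker.
type health struct {
	threshold int
	failures  map[string]int
	benched   map[string]bool
}

func newHealth(threshold int) *health {
	return &health{threshold: threshold, failures: map[string]int{}, benched: map[string]bool{}}
}

// failure records one failed attempt and reports whether it benched the target.
func (h *health) failure(key string) bool {
	h.failures[key]++
	if h.failures[key] < h.threshold {
		return false
	}
	h.failures[key] = 0
	h.benched[key] = true
	return true
}

func (h *health) success(key string) { delete(h.failures, key) }

type Config struct {
	TransientRetries   int
	AdvanceOnPermanent bool
}

// run walks the chain head-to-tail following steps 1-4.
func run(chain []Target, h *health, cfg Config) (string, error) {
	reasons := []error{ErrChainExhausted}
	for _, t := range chain {
		if h.benched[t.Key] { // step 1: skip, but record why
			reasons = append(reasons, fmt.Errorf("%s: skipped, benched", t.Key))
			continue
		}
	attempts:
		for try := 0; try <= cfg.TransientRetries; try++ {
			out, err := t.Call() // step 2
			if err == nil {
				h.success(t.Key)
				return out, nil
			}
			reasons = append(reasons, fmt.Errorf("%s: %w", t.Key, err))
			switch { // step 3
			case errors.Is(err, ErrModelNotFound):
				break attempts // advance, no health penalty
			case errors.Is(err, errPermanent):
				if !cfg.AdvanceOnPermanent {
					return "", fmt.Errorf("%s: %w", t.Key, err) // fail fast
				}
				break attempts
			default: // transient
				if h.failure(t.Key) {
					break attempts // just benched: advance immediately
				}
			}
		}
	}
	return "", errors.Join(reasons...) // step 4
}

// flaky returns a call that fails n times with err, then succeeds.
func flaky(name string, n int, err error) func() (string, error) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= n {
			return "", err
		}
		return name, nil
	}
}

func isExhausted(err error) bool { return errors.Is(err, ErrChainExhausted) }

func main() {
	h := newHealth(2)
	chain := []Target{
		{Key: "a/x", Call: flaky("a/x", 99, errBusy)},
		{Key: "b/y", Call: flaky("b/y", 0, nil)},
	}
	out, err := run(chain, h, Config{TransientRetries: 1})
	fmt.Println(out, err, h.benched["a/x"])
}
```

With one retry and a threshold of two, the head target in `main` fails twice, is benched inside the same request, and the chain advances to the second target.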

## Consequences

- "One transient error is fine" holds: after a blip the same-target retry
  succeeds, so there is no failover and the single health mark is cleared by
  that success. With default knobs (retries=1, threshold=2), a target whose
  retry also fails is benched within the same request and the chain
  advances, exactly as in the kickoff narrative.
- Single-target specs get the same retry/backoff behavior for free.

## Alternatives considered

- Per-request (not per-attempt) failure counting — needs two failed
  *requests* to bench, letting a dead model eat the retry budget twice.
  Rejected as weaker than the kickoff's story.
- Intersection capabilities — see above. Rejected.
@@ -0,0 +1,14 @@
# Architecture Decision Records

One decision per file, append-only; supersede rather than rewrite.

| ADR | Title | Status |
|-----|-------|--------|
| [0001](0001-package-layout.md) | Package layout — canonical types in leaf `llm`, root re-exports | Accepted |
| [0002](0002-canonical-message-model.md) | Canonical message/content model | Accepted |
| [0003](0003-parse-grammar.md) | Parse grammar — verbatim ids, inline alias expansion, chains | Accepted |
| [0004](0004-env-dsn-providers.md) | LLM_* env-DSN provider definitions (go-llm parity + eager load) | Accepted |
| [0005](0005-provider-capabilities.md) | Provider interface and capabilities model | Accepted |
| [0006](0006-health-and-backoff.md) | Model health tracking and backoff | Accepted |
| [0007](0007-dependency-policy.md) | Dependency policy — stdlib-first, hand-rolled REST clients | Accepted |
| [0008](0008-chain-semantics.md) | Failover-chain execution semantics | Accepted |
@@ -0,0 +1,84 @@
# Phase 1 design summary (for after-the-fact review)

Written at the Phase 1 → 2 boundary of the unattended build run
(2026-06-10). Captures the public surface and the decisions behind it.
Authoritative details live in the ADRs; this is the review digest.

## What the library looks like to a consumer

```go
reg := majordomo.New() // built-ins + LLM_* env providers
reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")

m, err := reg.Parse("m5/qwen3:30b,ollama-cloud/kimi-k2.6:cloud,thinking")
resp, err := m.Generate(ctx, majordomo.Request{
	System:   "You are terse.",
	Messages: []majordomo.Message{majordomo.UserText("hi")},
}, majordomo.WithMaxTokens(200))
```

- `Model` = `Generate` / `Stream` / `Capabilities`; a chain and a single
  target are the same interface.
- `Provider` = `Name` / `Model(id, opts...)`; ids verbatim, no catalogs.
- Canonical types live in `majordomo/llm`, re-exported at root via aliases
  (ADR-0001) — providers import `llm` only.

## Parse grammar (ADR-0003)

`spec := element ("," element)*`; element = `provider/model` (model id =
everything after the first slash, verbatim) or a bare alias token expanded
inline + recursively with cycle detection. Both kickoff README examples are
covered by tests, including the trailing-`thinking` variant and dedup of
overlapping alias expansions.

**Deviation from go-llm worth reviewing:** no `:low/:medium/:high`
reasoning-suffix stripping — it conflicts with verbatim ids
(`minimax-m3:cloud`, `richardyoung/qwen3-14b-abliterated:q4_K_M` in mort's
tiers). Plan: reasoning effort becomes an explicit request option when
providers land; mort's wrapper translates its legacy suffix dialect during
Phase 9. If you want suffix parity instead, it's an additive change behind
a RegistryOption.

## LLM_* env DSNs (ADR-0004)

Parser is byte-for-byte go-llm (`scheme://[token@]host[/path]`, https
forced, fail-on-use for malformed values). Two resolution paths:
eager scan in `New()`/`LoadEnv(map)` (kickoff requirement;
`LLM_M1` → provider `m1`) **plus** go-llm's lazy `LLM_{UPPER(name)}`
fallback at Parse time (so hyphenated names keep working). Schemes are
factories (`RegisterScheme`) — consumers can bind custom provider kinds to
DSNs.

## Health & chains (ADR-0006, ADR-0008)

Clock-injected in-memory tracker keyed `provider/model`. Transient vs
permanent via `llm.Classify` (unknown → transient; `context.Canceled` →
permanent). Defaults: 1 same-target retry; bench after 2 consecutive failed
attempts; cooldown 5s ×2 capped 5m; success resets everything. Chains skip
benched targets, advance penalty-free on 404, fail fast on auth/malformed
(flippable via `AdvanceOnPermanent`), and join per-target reasons on
exhaustion. Chain `Capabilities()` = head element (per-attempt media
normalization will use the actual target, Phase 3). Streaming failover
covers stream establishment only.

## Flagged for reconsideration

1. **Reasoning suffixes** (above) — deliberate deviation, easy to add back.
2. **Duplicate-element dedup in chains** (first occurrence wins): right for
   health semantics, but means `a,b,a` won't retry `a` at the tail even
   after `b` fails. Believed correct (same request, same bench state);
   flag if "retry head last" matters to you.
3. **`AdvanceOnPermanent` default = fail-fast** on auth/malformed errors:
   matches the kickoff; mort's old behavior was closer to
   advance-on-everything. Phase 9 can set the flag per-registry if mort's
   UX prefers availability.
4. **Stub built-ins**: until Phases 3–4, `openai/...` etc. parse fine and
   error on use with "not implemented yet". Chains mixing stubs and real
   providers will fail over past stubs naturally (the error classifies
   transient) — temporary, gone by Phase 4.

## ADR set

0001 package layout · 0002 message model · 0003 parse grammar ·
0004 env DSNs · 0005 provider/capabilities · 0006 health/backoff ·
0007 dependency policy · 0008 chain semantics