feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

Phase 1 of the majordomo build:

- llm/ canonical contract (messages, parts, tools, capabilities, streaming,
  Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped
  cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias
  expansion with cycle detection, chain dedup), LLM_* env-DSN providers
  (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain
  executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the
  trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow,
  docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@@ -0,0 +1,60 @@
# ADR-0008: Failover-chain execution semantics

**Status:** Accepted (2026-06-10)

## Context

A parsed spec is an ordered chain of targets sharing the registry's health
tracker. The executor must realize the kickoff's failover story (retry one
blip; bench repeat offenders; skip benched targets; report clear exhaustion
errors) identically for chains of one and many.

## Decision

For each request, iterate elements head-to-tail:

1. **Skip** targets currently benched (recorded in the exhaustion error).
2. Attempt the target. On success → report success (resets health), return.
3. On error, classify:
   - **Permanent + model-not-found** → advance, no health penalty.
   - **Permanent otherwise** (auth, malformed) → **fail fast** by default:
     failing over cannot fix a bad request; `ChainConfig.AdvanceOnPermanent`
     flips this for callers who prefer availability.
   - **Transient** → report the failed attempt to the tracker; retry the
     same target while attempts remain (`TransientRetries`, default 1)
     **unless the tracker just benched it**, in which case advance
     immediately.
4. All elements failed/skipped → return `errors.Join(ErrChainExhausted,
   per-target reasons...)` naming every target and why.

Other decisions:

- **Capabilities() = head element's capabilities.** The head is the
  preferred target and the honest answer to "what should I prepare for?".
  Per-attempt media normalization (Phase 3) uses the *actual* target's
  capabilities, so fallbacks still get correctly-fitted inputs.
  Intersection semantics were rejected: a rarely-used tail fallback would
  artificially constrain every request.
- **Streaming failover applies to stream establishment only.** Once a
  stream is open, mid-stream errors propagate; silently restarting on
  another target would re-deliver partial output.
- `context.Canceled` aborts the chain immediately between and during
  attempts.
- Duplicate post-expansion elements were already dropped at Parse
  (ADR-0003).

## Consequences

- "One transient error is fine" holds: a blip followed by a successful
  same-target retry causes no failover and leaves one health mark that the
  success immediately clears. With default knobs (retries=1, threshold=2),
  a target whose retry also fails is benched in the same request and the
  chain advances, exactly the kickoff narrative.
- Single-target specs get the same retry/backoff behavior for free.

## Alternatives considered

- Per-request (not per-attempt) failure counting: needs two failed
  *requests* to bench, letting a dead model eat the retry budget twice.
  Rejected as weaker than the kickoff's story.
- Intersection capabilities: see above. Rejected.
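
To make the cost of the rejected per-request counting concrete, the following sketch counts how many requests and calls a permanently failing target absorbs before it is benched under each scheme. `deadTargetCost` is a hypothetical helper written for this comparison; it models only the counting rule, not the real tracker.

```go
package main

import "fmt"

// deadTargetCost simulates a target whose every call fails and returns how
// many requests and calls are spent before it accumulates threshold marks.
// perAttempt=true marks every failed attempt; false marks once per request.
func deadTargetCost(perAttempt bool, retries, threshold int) (requests, calls int) {
	marks := 0
	for marks < threshold {
		requests++
		// Each request makes up to retries+1 attempts, stopping early if
		// the target gets benched mid-request.
		for attempt := 0; attempt <= retries && marks < threshold; attempt++ {
			calls++
			if perAttempt {
				marks++
			}
		}
		if !perAttempt {
			marks++
		}
	}
	return requests, calls
}

func main() {
	fmt.Println(deadTargetCost(true, 1, 2))  // 1 2
	fmt.Println(deadTargetCost(false, 1, 2)) // 2 4
}
```

With the default knobs, per-attempt counting benches the dead target after one request and two calls, while per-request counting needs two requests and four calls.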