# ADR-0006: Model health tracking and backoff

**Status:** Accepted (2026-06-10)

## Context

Ollama Cloud models intermittently return "high demand" errors. mort's behavior to preserve: one blip should not fail a request (retry); a model that keeps failing should be benched so chains skip it, then re-admitted after a cooldown. majordomo owns this (the "model health tracker").

## Decision

In-memory, process-local, thread-safe tracker in `health/`, keyed by `"provider/model-id"`, with an **injected clock** (`func() time.Time`) so every backoff path is unit-testable without sleeping.

- **Classification** (`llm.Classify`, overridable via `ChainConfig.Classify`): transient = HTTP 408/429/5xx, network timeouts, connection refused/reset, DNS failures, `context.DeadlineExceeded`; permanent = HTTP 400/401/403/404/405/422, `ErrModelNotFound`, `context.Canceled` (the caller gave up, so retrying defies intent). **Unknown errors default to transient**: failing over can only help availability, and a wrongly benched model self-heals via cooldown, while a wrongly fail-fasted request is lost.
- **Counting:** every failed transient *attempt* increments the target's consecutive-failure count; any success resets the count **and** the backoff exponent. At threshold (default **2**) the target is benched until `now + cooldown`, where the cooldown starts from base (default **5s**) and grows by the multiplier (default **2**) for each consecutive backoff round, up to a cap (default **5m**). After the bench triggers, the count resets, so re-benching needs a fresh run of failures, but at the doubled cooldown.
- All knobs (threshold, base/cap/multiplier, clock, classifier, retry count) are configuration with the above defaults baked in.
- **No persistence, no interface.** The tracker is a concrete type; health is process-local by design (out-of-scope guardrail). A consumer wanting shared state can wrap the registry; we do not build for it now.

## Consequences

- Deterministic tests via fake clock; no `time.Sleep` anywhere.
- Two providers addressing the same upstream model (e.g. `m1/x` and `m5/x`) track independently, which is correct since the backends are different machines.

## Alternatives considered

- Persistent/pluggable health store: explicitly out of scope. Rejected.
- Unknown→permanent default: drops availability on novel errors. Rejected.
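
The counting and backoff rules in the Decision section can be illustrated with a minimal sketch. It is not the code in `health/`: the names `Config`, `Tracker`, `RecordFailure`, `RecordSuccess`, `Available`, and `fakeClock` are hypothetical, the multiplier is simplified to an integer, and the sketch assumes that the first bench uses the base cooldown unmultiplied and that a target is re-admitted at exactly `now + cooldown`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Config mirrors the knobs named in the ADR; the field names are assumptions.
type Config struct {
	Threshold  int           // consecutive transient failures before benching
	Base       time.Duration // cooldown for the first bench (assumed unmultiplied)
	Multiplier int           // growth per consecutive backoff round (int for brevity)
	Cap        time.Duration // upper bound on any cooldown
}

// DefaultConfig returns the defaults stated in the ADR.
func DefaultConfig() Config {
	return Config{Threshold: 2, Base: 5 * time.Second, Multiplier: 2, Cap: 5 * time.Minute}
}

type entry struct {
	failures     int // consecutive transient failures since the last reset
	round        int // backoff exponent: benches since the last success
	benchedUntil time.Time
}

// Tracker is a hypothetical, simplified stand-in for the real tracker.
type Tracker struct {
	mu      sync.Mutex
	cfg     Config
	now     func() time.Time // injected clock
	entries map[string]*entry
}

func NewTracker(cfg Config, now func() time.Time) *Tracker {
	return &Tracker{cfg: cfg, now: now, entries: make(map[string]*entry)}
}

// RecordFailure counts one failed transient attempt for key. It returns the
// cooldown applied if this attempt benched the target, or 0 otherwise.
func (t *Tracker) RecordFailure(key string) time.Duration {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.entries[key]
	if e == nil {
		e = &entry{}
		t.entries[key] = e
	}
	e.failures++
	if e.failures < t.cfg.Threshold {
		return 0
	}
	cooldown := t.cfg.Base
	for i := 0; i < e.round && cooldown < t.cfg.Cap; i++ {
		cooldown *= time.Duration(t.cfg.Multiplier)
	}
	if cooldown > t.cfg.Cap {
		cooldown = t.cfg.Cap
	}
	e.benchedUntil = t.now().Add(cooldown)
	e.failures = 0 // re-benching needs a fresh run of failures
	e.round++      // ...but the next cooldown is longer
	return cooldown
}

// RecordSuccess resets both the failure count and the backoff exponent.
func (t *Tracker) RecordSuccess(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if e := t.entries[key]; e != nil {
		e.failures, e.round = 0, 0
	}
}

// Available reports whether key is not currently benched.
func (t *Tracker) Available(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.entries[key]
	return e == nil || !t.now().Before(e.benchedUntil)
}

// fakeClock is a manually advanced clock for tests; no sleeping required.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clock := &fakeClock{t: time.Unix(0, 0)}
	tr := NewTracker(DefaultConfig(), clock.Now)
	const key = "m1/x"
	for round := 1; round <= 3; round++ {
		tr.RecordFailure(key)             // first blip: counted, not benched
		cooldown := tr.RecordFailure(key) // second failure hits the threshold
		fmt.Printf("round %d: cooldown %v, available=%v\n", round, cooldown, tr.Available(key))
		clock.Advance(cooldown) // step the fake clock past the bench
	}
}
```

Under these assumptions, three consecutive benches without an intervening success report cooldowns of 5s, 10s, and 20s, and the fake clock steps past each one without any real waiting. Passing `clock.Now` as a plain `func() time.Time` keeps the tracker a concrete type while still letting tests control time.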