# ADR-0006: Model health tracking and backoff

**Status:** Accepted — 2026-06-10

## Context

Ollama Cloud models intermittently return "high demand" errors. We need to
preserve mort's existing behavior: a single blip should not fail a request
(retry), while a model that keeps failing should be benched so chains skip it,
then re-admitted after a cooldown. majordomo owns this logic (the "model
health tracker").

## Decision

In-memory, process-local, thread-safe tracker in `health/`, keyed by
`"provider/model-id"`, with an **injected clock** (`func() time.Time`) so
every backoff path is unit-testable without sleeping.

- **Classification** (`llm.Classify`, overridable via `ChainConfig.Classify`):
  transient = HTTP 408/429/5xx, network timeouts, connection refused/reset,
  DNS failures, `context.DeadlineExceeded`; permanent = HTTP
  400/401/403/404/405/422, `ErrModelNotFound`, `context.Canceled` (the
  caller gave up — retrying defies intent). **Unknown errors default to
  transient**: failing over can only help availability, and a wrongly
  benched model self-heals via cooldown, while a wrongly fail-fasted request
  is lost.
- **Counting:** every failed transient *attempt* increments the target's
  consecutive-failure count; any success resets both the count **and** the
  backoff exponent. When the count reaches the threshold (default **2**), the
  target is benched until `now + cooldown`, where cooldown = base (default
  **5s**) × multiplier (default **2**) raised to the number of consecutive
  backoff rounds since the last success, capped at a maximum (default **5m**).
  With the defaults, successive bench rounds last 5s, 10s, 20s, … up to 5m.
  After the bench triggers, the count resets, so re-benching needs a fresh run
  of failures, but the next bench uses the multiplied cooldown.
- All knobs (threshold, base/cap/multiplier, clock, classifier, retry count)
  are configuration with the above defaults baked in.
- **No persistence, no interface.** The tracker is a concrete type; health
  is process-local by design (out-of-scope guardrail). A consumer wanting
  shared state can wrap the registry; we do not build for it now.

## Consequences

- Deterministic tests via fake clock; no `time.Sleep` anywhere.
- Two providers addressing the same upstream model (e.g. `m1/x` and `m5/x`)
  track independently — correct, since the backends are different machines.

## Alternatives considered

- Persistent/pluggable health store — explicitly out of scope. Rejected.
- Unknown→permanent default — drops availability on novel errors. Rejected.