67c3ebe067
CI / Build, Test & Lint (push) Successful in 10m50s
Ollama Cloud returns HTTP 503 when the model is temporarily overloaded, 429 on rate limit, and 502 on upstream failures. These are transient conditions that resolve on retry. Previously they bubbled up as hard errors, forcing callers to implement their own retry logic. The retry is implemented at the HTTP transport level in doChatRequest, so both Complete and Stream benefit transparently. Strategy: up to 3 retries with exponential backoff (1s, 2s, 4s), Retry-After header respected for 429, context cancellation checked between retries. Non-transient errors (400, 401, 403, 404, 500) are never retried. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>