67c3ebe06767976c7c96827aa2b3b79a200faa91
Ollama Cloud returns HTTP 503 when the model is temporarily overloaded, 429 on rate limit, and 502 on upstream failures. These are transient conditions that resolve on retry. Previously they bubbled up as hard errors, forcing callers to implement their own retry logic.

The retry is implemented at the HTTP transport level in doChatRequest, so both Complete and Stream benefit transparently.

Strategy: up to 3 retries with exponential backoff (1s, 2s, 4s), Retry-After header respected for 429, context cancellation checked between retries. Non-transient errors (400, 401, 403, 404, 500) are never retried.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description
Abstraction layer interface for various similar LLM services
Languages
Go
100%