ADR-0007: Model inventory polling and discovery
Status: Accepted — 2026-05-23
Context
foreman needs a "relatively in-sync" view of which models are installed on its target so it can (a) advertise them to callers, (b) reject jobs for missing models early instead of failing mid-execution, and (c) know what is currently resident to inform scheduling (ADR-0009).
Decision
A background poller queries the target on a configurable interval (default ~30s):
- GET /api/tags → the installed-model inventory. Cached in memory; this cache backs foreman's own /api/tags passthrough (ADR-0003) and /v1/models if the OpenAI-compat surface is enabled.
- GET /api/ps → which model(s) are currently loaded, their VRAM usage and where they are resident, and the unload timer. Used by the scheduler to decide whether the next job requires a swap.
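A minimal sketch of one poll cycle, assuming the target is a stock Ollama server whose GET /api/tags and GET /api/ps both return a JSON object with a "models" list whose entries carry a "name" field. InventorySnapshot, model_names, poll_once, needs_swap, run_poller and the injectable fetch parameter are hypothetical names for illustration, not foreman's actual code; the 30 s default only mirrors the "~30s" above.

```python
from __future__ import annotations

import json
import threading
import time
import urllib.request
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch, not foreman's code. Default mirrors the ADR's "~30s".
DEFAULT_INTERVAL_S = 30.0

Fetch = Callable[[str, str], dict]


@dataclass
class InventorySnapshot:
    installed: set[str] = field(default_factory=set)  # names from GET /api/tags
    loaded: set[str] = field(default_factory=set)     # names from GET /api/ps
    polled_at: float = 0.0                            # Unix time of this poll


def model_names(body: dict) -> set[str]:
    # Both endpoints answer {"models": [{"name": "llama3:latest", ...}, ...]}.
    return {m["name"] for m in body.get("models", [])}


def fetch_json(base_url: str, path: str, timeout: float = 5.0) -> dict:
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def poll_once(base_url: str, fetch: Fetch = fetch_json) -> InventorySnapshot:
    installed = model_names(fetch(base_url, "/api/tags"))
    loaded = model_names(fetch(base_url, "/api/ps"))
    return InventorySnapshot(installed, loaded, time.time())


def needs_swap(snapshot: InventorySnapshot, model: str) -> bool:
    # Scheduler question: would running `model` require loading it first?
    return model not in snapshot.loaded


def run_poller(base_url: str, on_snapshot: Callable[[InventorySnapshot], None],
               stop: threading.Event, interval_s: float = DEFAULT_INTERVAL_S,
               fetch: Fetch = fetch_json) -> None:
    while not stop.is_set():
        try:
            snapshot = poll_once(base_url, fetch)
        except (OSError, ValueError):
            pass  # unreachable target or bad JSON: caller keeps its last snapshot
        else:
            on_snapshot(snapshot)
        stop.wait(interval_s)  # returns early if stop is set
```

Injecting fetch keeps the parsing testable without a live target. A fuller poller would likely also keep the size_vram and expires_at fields from /api/ps for the scheduler rather than only the model names.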
Behavior
- Early validation: a job naming a model absent from the cached inventory is rejected at submit time with a clear error. Because the inventory is refreshed every poll interval, the check is reliable for async jobs as well. A small grace path covers models installed between polls: on a cache miss, foreman re-checks the target once before rejecting the job.
- Degraded mode: if the target is unreachable, the last-known inventory is retained and foreman marks itself degraded (surfaced on a health endpoint). Jobs are not rejected wholesale on a single failed poll — the target is a laptop that may briefly sleep (ADR-0002). Execution-time unreachability is handled by job retry (ADR-0004).
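Both behaviors can be sketched together in a few lines. This assumes a hypothetical InventoryCache whose refresh callable performs one GET /api/tags and returns the installed model names, raising OSError (as urllib's URLError does) when the target is unreachable; the class, method, and exception names are illustrative only.

```python
from __future__ import annotations

import time
from typing import Callable


class ModelNotInstalled(Exception):
    """Raised at submit time for a model the target does not have."""


class InventoryCache:
    # Hypothetical: refresh() performs one GET /api/tags and returns model names,
    # raising OSError (e.g. urllib's URLError) if the target is unreachable.
    def __init__(self, refresh: Callable[[], set[str]]) -> None:
        self._refresh = refresh
        self.installed: set[str] = set()
        self.degraded = False
        self.last_ok_poll: float | None = None

    def poll(self) -> None:
        try:
            names = self._refresh()
        except OSError:
            # Target asleep or unreachable: keep the last-known inventory
            # and flag degraded instead of clearing the cache.
            self.degraded = True
            return
        self.installed = names
        self.degraded = False
        self.last_ok_poll = time.time()

    def validate(self, model: str) -> None:
        if model in self.installed:
            return
        # Grace path: the model may have been pulled since the last poll,
        # so re-check the target exactly once before rejecting.
        self.poll()
        if model not in self.installed:
            raise ModelNotInstalled(f"model {model!r} is not installed on the target")
```

In this sketch a failed re-check leaves the last-known inventory in place and still rejects the job, since the model was not in that inventory; failures at execution time are left to job retry (ADR-0004).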
Consequences
- Callers can discover available models through the normal Ollama/OpenAI endpoints; no foreman-specific discovery API needed.
- Bad-model jobs fail fast and cheaply.
- A health/status endpoint exposing degraded state and last-poll time is required.
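The required health/status endpoint could be as small as the following sketch, which uses only Python's standard-library http.server. The /healthz path, the JSON field names, and the choice to answer 200 even when degraded (so that a sleeping target does not make foreman itself look down) are assumptions, not something this ADR specifies.

```python
from __future__ import annotations

import json
import time
from http.server import BaseHTTPRequestHandler
from typing import Callable


def health_payload(degraded: bool, last_poll_at: float | None,
                   now: float | None = None) -> dict:
    # Hypothetical field names; the ADR only requires degraded state + last-poll time.
    now = time.time() if now is None else now
    return {
        "status": "degraded" if degraded else "ok",
        "last_successful_poll": last_poll_at,  # Unix seconds, None if never polled
        "seconds_since_poll": (None if last_poll_at is None
                               else round(now - last_poll_at, 1)),
    }


def make_handler(get_payload: Callable[[], dict]) -> type[BaseHTTPRequestHandler]:
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/healthz":  # hypothetical path
                self.send_error(404)
                return
            body = json.dumps(get_payload()).encode()
            # Assumption: 200 even when degraded; the status lives in the body.
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler
```

Serving it is then a matter of passing make_handler(...) to http.server.HTTPServer and calling serve_forever(), with get_payload reading the poller's current degraded flag and last successful poll time.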
Alternatives considered
- No caching; proxy /api/tags live per request. Simpler, but couples every discovery call to target availability and adds latency. Rejected: the poller also feeds the scheduler, so the cache is needed regardless.
- Push/event-based inventory. Ollama offers no such mechanism, so polling is the only option.