initial commit
# ADR-0007: Model inventory polling and discovery
**Status:** Accepted — 2026-05-23
## Context

foreman needs a "relatively in-sync" view of which models are installed on its
target so it can (a) advertise them to callers, (b) reject jobs for missing
models early instead of failing mid-execution, and (c) know what is currently
resident to inform scheduling (ADR-0009).
## Decision

A background poller queries the target on a configurable interval (default ~30s):

- `GET /api/tags` → the installed-model inventory. Cached in memory; this cache
  backs foreman's own `/api/tags` passthrough (ADR-0003) and `/v1/models` if the
  OpenAI-compat surface is enabled.
- `GET /api/ps` → which model(s) are currently loaded, their VRAM usage and
  where they are resident, and the unload timer. Used by the scheduler to decide
  whether the next job requires a swap.
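
The polling described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not foreman's actual code: `InventoryCache`, `poll_once`, `needs_swap`, and `run_poller` are hypothetical names, and the sketch assumes both endpoints return the usual Ollama shape, a top-level `models` list whose entries carry a `name`.

```python
import json
import threading
import time
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Optional


def http_get_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch a URL and decode its JSON body using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


@dataclass
class InventoryCache:
    """In-memory snapshot of the target's models (hypothetical structure)."""
    base_url: str
    fetch: Callable[[str], dict] = http_get_json  # injectable for tests
    installed: set = field(default_factory=set)   # names from /api/tags
    loaded: dict = field(default_factory=dict)    # name -> /api/ps entry
    last_success: Optional[float] = None
    degraded: bool = False

    def poll_once(self) -> bool:
        """Refresh both views; on failure keep the last-known data."""
        try:
            tags = self.fetch(self.base_url + "/api/tags")
            ps = self.fetch(self.base_url + "/api/ps")
        except (OSError, ValueError):
            # URLError and socket timeouts are OSError subclasses;
            # JSON decode errors are ValueError subclasses.
            self.degraded = True
            return False
        self.installed = {m["name"] for m in tags.get("models", [])}
        self.loaded = {m["name"]: m for m in ps.get("models", [])}
        self.last_success = time.time()
        self.degraded = False
        return True

    def needs_swap(self, model: str) -> bool:
        """True if the model is not currently resident on the target."""
        return model not in self.loaded


def run_poller(cache: InventoryCache, interval_s: float = 30.0,
               stop: Optional[threading.Event] = None) -> None:
    """Poll until `stop` is set; Event.wait doubles as an interruptible sleep."""
    stop = stop or threading.Event()
    while not stop.is_set():
        cache.poll_once()
        stop.wait(interval_s)
```

In this sketch the loop would typically run in a daemon thread, for example `threading.Thread(target=run_poller, args=(cache,), daemon=True).start()`, while request handlers read `cache.installed` and the scheduler calls `cache.needs_swap(...)`.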
### Behavior

- **Early validation:** a job naming a model absent from the cached inventory is
  rejected at submit time with a clear error. For async jobs the check is still
  reliable, since the cached inventory is normally no older than one poll
  interval. As a grace path, a miss triggers one re-check against the target, so
  a job for a model that appeared between polls is still accepted.
- **Degraded mode:** if the target is unreachable, the last-known inventory is
  retained and foreman marks itself degraded (surfaced on a health endpoint).
  Jobs are not rejected wholesale on a single failed poll, because the target is
  a laptop that may briefly sleep (ADR-0002). Execution-time unreachability is
  handled by job retry (ADR-0004).
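
As an illustration of the submit-time check and its grace path, the following sketch validates a model name against the cached inventory and re-checks the target exactly once on a miss. The names `validate_model`, `UnknownModelError`, and the `refresh` callback are hypothetical; `refresh` stands in for whatever performs a fresh `GET /api/tags` and returns the installed model names.

```python
from typing import Callable, Iterable, Set


class UnknownModelError(ValueError):
    """Raised when a job names a model that is not installed on the target."""


def validate_model(model: str, cached: Set[str],
                   refresh: Callable[[], Iterable[str]]) -> Set[str]:
    """Return the inventory to keep caching, or raise if the model is absent.

    On a cache miss the target is re-checked once, so a model installed
    between polls is accepted instead of being rejected.
    """
    if model in cached:
        return cached  # fast path: no extra call to the target
    try:
        fresh = set(refresh())  # grace path: a single re-check
    except OSError:
        fresh = cached  # target unreachable: fall back to last-known data
    if model not in fresh:
        raise UnknownModelError(f"model {model!r} is not installed on the target")
    return fresh
```

Returning the refreshed set lets the caller update its cache as a side benefit of the re-check. When the target is unreachable, only jobs for models missing from the last-known inventory are rejected, which matches the degraded-mode behavior described above.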
## Consequences

- Callers can discover available models through the normal Ollama/OpenAI
  endpoints; no foreman-specific discovery API is needed.
- Bad-model jobs fail fast and cheaply.
- A health/status endpoint exposing degraded state and last-poll time is required.
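
One possible shape for the health payload is sketched below. The function name and the field names (`status`, `last_poll_unix`, `last_poll_age_s`) are assumptions made for illustration only; the ADR requires just that degraded state and last-poll time be exposed.

```python
import time
from typing import Optional


def health_status(degraded: bool, last_poll_ok: Optional[float],
                  now: Optional[float] = None) -> dict:
    """Build a JSON-serializable health payload (hypothetical field names)."""
    now = time.time() if now is None else now
    # Age is None until the first successful poll has happened.
    age = None if last_poll_ok is None else round(now - last_poll_ok, 1)
    return {
        "status": "degraded" if degraded else "ok",
        "last_poll_unix": last_poll_ok,
        "last_poll_age_s": age,
    }
```

Reporting the age alongside the raw timestamp lets a caller judge staleness without relying on its own clock being in sync with foreman's.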
## Alternatives considered

- **No caching; proxy `/api/tags` live per request.** Simpler but couples every
  discovery call to target availability and adds latency. Rejected; the poller
  also feeds the scheduler, so the cache is needed regardless.
- **Push/event-based inventory.** Ollama offers no such mechanism; polling is the
  only option.