# ADR-0001: One daemon per Ollama target

**Status:** Accepted (2026-05-23)

## Context

`peon-overseer` ballooned because it coordinated *multiple* workers from a
central service: pull-based dispatch, claim leases, weighted fair queueing,
capacity budgets, eligibility gates. All of that complexity existed solely to
arbitrate shared workers. We want none of it back.

The system being built fronts inference hardware (initially the M1 Pro running
Ollama) and exposes it as a managed job endpoint.

## Decision

Each `foreman` process is bound to **exactly one** Ollama target, configured by a
single base URL. One target = one daemon = one queue. There is no cross-daemon
awareness and no shared state between daemons.

If a second worker is added later (the 4090 box, the M5 Max), it gets its own
`foreman` instance. Any fan-out across workers is the concern of a *separate*
higher-level router that talks to multiple `foreman` instances. That router is
explicitly out of scope here and is not to be anticipated in this codebase.
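
To make the "one target = one daemon = one queue" rule concrete, here is a minimal Python sketch. It is not the actual `foreman` implementation: the names `ForemanConfig`, `Foreman`, `submit`, `close`, and the injected `run_job` callable are all hypothetical, and the real call to the Ollama target is replaced by a stand-in function so the example runs without a network.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlparse


@dataclass(frozen=True)
class ForemanConfig:
    """Hypothetical config: exactly one target, given as a single base URL."""
    base_url: str

    def __post_init__(self) -> None:
        parsed = urlparse(self.base_url)
        # Reject comma-separated lists of targets and malformed URLs.
        if "," in self.base_url or parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"expected exactly one base URL, got {self.base_url!r}")


class Foreman:
    """One daemon = one target = one queue, drained by a single worker thread."""

    def __init__(self, config: ForemanConfig, run_job: Callable[[str, str], str]) -> None:
        self.config = config
        self._run_job = run_job  # stand-in for the real request to the target
        self._jobs = queue.Queue()
        self.results: List[str] = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, prompt: str) -> None:
        self._jobs.put(prompt)

    def _drain(self) -> None:
        # A single consumer means jobs run strictly one at a time, in FIFO order.
        while True:
            prompt = self._jobs.get()
            if prompt is None:
                break
            self.results.append(self._run_job(self.config.base_url, prompt))

    def close(self) -> None:
        self._jobs.put(None)  # sentinel: stop once already-queued jobs finish
        self._worker.join()
```

Under this sketch, adding a second worker means constructing a second, fully independent `Foreman` with its own `ForemanConfig`; nothing in the class knows that other instances exist, and there are no leases, weights, or budgets to configure.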

## Consequences

- The daemon is radically simple: one target, one serialized work stream.
- Horizontal scale is "run another daemon," an operational act, not a code change.
- No lease/fairness/budget machinery is permitted in this repo. If a change
  starts to require it, that is the signal that the multi-worker router (a
  different project) is what's actually needed.

## Alternatives considered

- **One daemon managing many targets.** Rejected: reintroduces the scheduling and
  arbitration complexity that sank the predecessor.