initial commit
@@ -0,0 +1,37 @@
# ADR-0001: One daemon per Ollama target
**Status:** Accepted — 2026-05-23
## Context
`peon-overseer` ballooned because it coordinated *multiple* workers from a
central service: pull-based dispatch, claim leases, weighted fair queueing,
capacity budgets, eligibility gates. All of that complexity existed solely to
arbitrate shared workers. We want none of it back.

The system being built fronts inference hardware (initially the M1 Pro running
Ollama) and exposes it as a managed job endpoint.

## Decision
Each `foreman` process is bound to **exactly one** Ollama target, configured by a
single base URL. One target = one daemon = one queue. There is no cross-daemon
awareness and no shared state between daemons.

If a second worker is added later (the 4090 box, the M5 Max), it gets its own
`foreman` instance. Any fan-out across workers is the concern of a *separate*
higher-level router that talks to multiple `foreman` instances. That router is
explicitly out of scope here and must not be anticipated in this codebase.

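
The "single base URL" rule can be enforced at startup. The following is a minimal sketch, not the project's actual configuration code: `TargetConfig`, `load_target`, and the `FOREMAN_OLLAMA_URL` variable name are all hypothetical, chosen only for illustration. It rejects anything that looks like a list of targets, so a multi-target setup fails fast instead of being half-supported.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class TargetConfig:
    """Hypothetical config holder: exactly one Ollama base URL per daemon."""

    base_url: str


def load_target(env: dict[str, str]) -> TargetConfig:
    """Read the single target from a mapping such as os.environ.

    FOREMAN_OLLAMA_URL is an assumed variable name, not taken from the project.
    """
    raw = env.get("FOREMAN_OLLAMA_URL", "").strip()
    if not raw:
        raise ValueError("FOREMAN_OLLAMA_URL must be set to one base URL")
    # Refuse anything that looks like a list of targets: multi-target
    # configuration is exactly what this ADR rules out.
    if "," in raw or any(ch.isspace() for ch in raw):
        raise ValueError("exactly one target per daemon; run another daemon instead")
    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a valid base URL: {raw!r}")
    # Normalize the trailing slash so request paths can be appended uniformly.
    return TargetConfig(base_url=raw.rstrip("/"))
```

In a real entry point this would presumably be called once as `load_target(os.environ)`. Making the config immutable (`frozen=True`) is a design choice in the sketch: the binding between daemon and target never changes after startup.
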
## Consequences
- The daemon is radically simple: one target, one serialized work stream.
- Horizontal scale is "run another daemon," an operational act, not a code change.
- No lease/fairness/budget machinery is permitted in this repo. If a change
  starts to require it, that is the signal that the multi-worker router (a
  different project) is what's actually needed.

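
To make "one target, one serialized work stream" concrete, here is a rough sketch of what the daemon core could look like. The `Foreman` class, its methods, and `echo_job` are illustrative assumptions rather than the real implementation; the callable passed as `run_job` stands in for whatever actually talks to the Ollama target. Each instance owns one FIFO queue and one worker thread, and nothing is shared between instances.

```python
import queue
import threading
from typing import Callable


def echo_job(base_url: str, prompt: str) -> str:
    """Stand-in for a real inference call; it only records where the job went."""
    return f"{base_url} <- {prompt}"


class Foreman:
    """Hypothetical daemon core: one target, one FIFO queue, one worker thread."""

    def __init__(self, base_url: str, run_job: Callable[[str, str], str]) -> None:
        self.base_url = base_url
        self._run_job = run_job
        self._jobs: queue.Queue = queue.Queue()
        self.results: list[str] = []
        # A single worker thread is what serializes the work stream.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, prompt: str) -> None:
        """Enqueue a job; jobs run strictly in submission order."""
        self._jobs.put(prompt)

    def wait_idle(self) -> None:
        """Block until every submitted job has been processed."""
        self._jobs.join()

    def _drain(self) -> None:
        while True:
            prompt = self._jobs.get()
            try:
                self.results.append(self._run_job(self.base_url, prompt))
            except Exception as exc:
                # Keep the stream alive if one job fails.
                self.results.append(f"error: {exc}")
            finally:
                self._jobs.task_done()
```

Under these assumptions, adding a second worker means constructing (in practice, launching) a second `Foreman` with a different base URL. The two instances have separate queues and threads and no knowledge of each other, so there is nothing to lease, budget, or arbitrate.
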
## Alternatives considered
- **One daemon managing many targets.** Rejected: reintroduces the scheduling and
  arbitration complexity that sank the predecessor.