ADR-0001: One daemon per Ollama target

Status: Accepted — 2026-05-23

Context

peon-overseer ballooned because it coordinated multiple workers from a central service: pull-based dispatch, claim leases, weighted fair queueing, capacity budgets, eligibility gates. All of that complexity existed solely to arbitrate shared workers. We want none of it back.

The system being built (foreman) fronts inference hardware (initially the M1 Pro running Ollama) and exposes it as a managed job endpoint.

Decision

Each foreman process is bound to exactly one Ollama target, configured by a single base URL. One target = one daemon = one queue. There is no cross-daemon awareness and no shared state between daemons.
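As an illustration only, the shape of "one target = one daemon = one queue" might look like the sketch below. The class name Foreman, the run_job callable, and the results list are hypothetical and not taken from the codebase. run_job stands in for whatever actually calls the Ollama target, so the sketch makes no claims about the real HTTP client.

```python
import queue
import threading
from urllib.parse import urlparse

_STOP = object()  # sentinel that tells the worker thread to exit


class Foreman:
    """Hypothetical sketch: bound to exactly one target, one FIFO queue."""

    def __init__(self, base_url, run_job):
        parsed = urlparse(base_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a usable base URL: {base_url!r}")
        self.base_url = base_url          # the single Ollama target
        self._run_job = run_job           # assumed callable: (base_url, payload) -> result
        self._jobs = queue.Queue()        # the one and only queue
        self.results = []
        # A single worker thread means jobs run strictly one at a time.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, payload):
        self._jobs.put(payload)

    def _drain(self):
        while True:
            payload = self._jobs.get()
            if payload is _STOP:
                return
            self.results.append(self._run_job(self.base_url, payload))

    def close(self):
        # Jobs queued before close() are still processed, in order.
        self._jobs.put(_STOP)
        self._worker.join()
```

Nothing in the sketch knows about any other instance: no registry, no lease, no shared lock. A second target would simply be a second, unrelated Foreman object in a second process.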

If a second worker is added later (the 4090 box, the M5 Max), it gets its own foreman instance. Any fan-out across workers is the concern of a separate higher-level router that talks to multiple foreman instances — explicitly out of scope here and not to be anticipated in this codebase.

Consequences

The daemon is radically simple: one target, one serialized work stream.
Horizontal scale is "run another daemon," an operational act, not a code change.
No lease/fairness/budget machinery is permitted in this repo. If a change starts to require it, that is the signal that the multi-worker router (a different project) is what's actually needed.
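One way to keep the "exactly one target" rule enforceable at startup is to reject any configuration that names more than one URL. The following is a minimal sketch under stated assumptions: the environment variable name FOREMAN_OLLAMA_URL and the function load_target are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlparse

ENV_VAR = "FOREMAN_OLLAMA_URL"  # hypothetical variable name, not from the repo


def load_target(env):
    """Return the single configured base URL, or raise ValueError."""
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        raise ValueError(f"{ENV_VAR} must be set to one base URL")
    # A comma- or whitespace-separated list means someone wants multiple
    # targets; the answer is to run another daemon, not to accept a list.
    if "," in raw or len(raw.split()) > 1:
        raise ValueError(f"{ENV_VAR} must name exactly one target")
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{ENV_VAR} is not a usable base URL: {raw!r}")
    return raw.rstrip("/")
```

Under this sketch, scaling out means starting a second process with a different value for the variable; the code path is identical for every instance.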

Alternatives considered

One daemon managing many targets. Rejected: reintroduces the scheduling and arbitration complexity that sank the predecessor.
