initial commit

2026-05-23 16:41:20 -04:00
commit 8fde024281
15 changed files with 803 additions and 0 deletions
@@ -0,0 +1,37 @@
+# ADR-0001: One daemon per Ollama target
+
+**Status:** Accepted — 2026-05-23
+
+## Context
+
+`peon-overseer` ballooned because it coordinated *multiple* workers from a
+central service: pull-based dispatch, claim leases, weighted fair queueing,
+capacity budgets, eligibility gates. All of that complexity existed solely to
+arbitrate shared workers. We want none of it back.
+
+The system being built fronts inference hardware (initially the M1 Pro running
+Ollama) and exposes it as a managed job endpoint.
+
+## Decision
+
+Each `foreman` process is bound to **exactly one** Ollama target, configured by a
+single base URL. One target = one daemon = one queue. There is no cross-daemon
+awareness and no shared state between daemons.
+
+If another worker is added later (for example the 4090 box or the M5 Max), it
+gets its own `foreman` instance. Any fan-out across workers is the concern of a
+*separate* higher-level router that talks to multiple `foreman` instances. That
+router is out of scope here and must not be anticipated in this codebase.
+
+## Consequences
+
+- The daemon is radically simple: one target, one serialized work stream.
+- Horizontal scale is "run another daemon," an operational act, not a code change.
+- No lease/fairness/budget machinery is permitted in this repo. If a change
+  starts to require it, that is the signal that the multi-worker router (a
+  different project) is what's actually needed.
+
+## Alternatives considered
+
+- **One daemon managing many targets.** Rejected: reintroduces the scheduling and
+  arbitration complexity that sank the predecessor, `peon-overseer`.
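
The "one target = one daemon = one queue" rule from the Decision section can be sketched as follows. This is a minimal illustration, not code from this commit: the `Foreman` class, its `submit`/`drain` methods, the injected `run_job` callable, and the host names are all hypothetical, and no real Ollama call is made. Port 11434 is Ollama's default; everything else is a placeholder.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List
from urllib.parse import urlparse


@dataclass
class Foreman:
    """Hypothetical sketch: a daemon bound to exactly one Ollama base URL."""

    base_url: str
    # Assumed job runner: (base_url, prompt) -> result. Injected so the sketch
    # runs without a live Ollama server.
    run_job: Callable[[str, str], str]
    # One private FIFO queue per instance; nothing is shared between daemons.
    _queue: Deque[str] = field(default_factory=deque, init=False)

    def __post_init__(self) -> None:
        # Accept a single http(s) URL only; reject anything that looks like a list.
        parsed = urlparse(self.base_url)
        looks_like_list = any(ch in self.base_url for ch in ", ;")
        if looks_like_list or parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"expected exactly one base URL, got {self.base_url!r}")

    def submit(self, prompt: str) -> None:
        """Enqueue a job for this daemon's single target."""
        self._queue.append(prompt)

    def drain(self) -> List[str]:
        """Run queued jobs one at a time, in submission order."""
        results = []
        while self._queue:
            results.append(self.run_job(self.base_url, self._queue.popleft()))
        return results


def fake_runner(base_url: str, prompt: str) -> str:
    # Stand-in for a real inference call; only tags the prompt with its target.
    return f"{base_url} <- {prompt}"


# Scaling out means running another daemon, each with its own URL and queue.
m1 = Foreman("http://m1-pro.example:11434", fake_runner)
gpu = Foreman("http://rtx4090.example:11434", fake_runner)
m1.submit("summarize")
gpu.submit("translate")
m1_results = m1.drain()
gpu_results = gpu.drain()
```

Note what the sketch leaves out: there are no leases, weights, or capacity budgets, and neither instance knows the other exists. Anything that routes work between `m1` and `gpu` would live in the separate router project the ADR describes.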