docs: MIT license + public-readiness framing

Add MIT LICENSE (matches gadfly/majordomo, same author). README + CLAUDE.md:
note this is a public, vibe-coded project; clarify the `go-llm` referenced in
the docs is now majordomo, and link it + gadfly as the downstream consumers
(foreman is a drop-in native-Ollama target via majordomo's ollama.Foreman
preset). CLAUDE.md gains a Build / test / run section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 20:30:52 -04:00
parent 7cd7eaff8b
commit 823c0b4ca8
3 changed files with 101 additions and 26 deletions
+63 -23
@@ -4,7 +4,16 @@ A small, always-on daemon that fronts **one** Ollama target. It turns a single
Ollama instance into a queued, observable job endpoint: it polls the target's
installed models, serializes work through the target (managing model swaps),
assigns every job an ID, and reports progress + artifacts via webhooks. On the
-wire it speaks **native Ollama**, so it doubles as a drop-in `go-llm` target.
+wire it speaks **native Ollama**, so it doubles as a drop-in client target — for
+any Ollama client, and specifically for
+[majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo) (the `go-llm`
+library referenced throughout these docs is now majordomo) and the
+[gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly) reviewer built on it.
> This is a public, **vibe-coded** project (built largely by an AI agent). Keep
> that framing honest in the README; don't oversell it. Homelab specifics below
> (orgrimmar, the Macs, Komodo, Tailscale) are the author's deployment and are
> illustrative — the daemon itself is generic.
foreman is the deliberately pared-down successor to `peon-overseer`. One daemon,
one target, one queue. The complexity that sank the predecessor — distributed
@@ -13,6 +22,26 @@ gates — existed to coordinate *multiple* workers and is **out of scope**.
Resisting that creep is a first-class design goal. See `docs/adr/` for the
decisions; this file summarizes them.
## Build / test / run
```sh
go build ./cmd/foreman # the daemon binary
go test ./... # client/ + internal/* unit tests
go vet ./... && gofmt -l . # must be quiet / clean before committing
```
Run it locally against a real Ollama target (only `FOREMAN_OLLAMA_URL` is
required; full env reference in `.env.example` and the README table):
```sh
FOREMAN_OLLAMA_URL=http://mac.tail:11434 go run ./cmd/foreman serve
curl -s localhost:8080/healthz # {"status":"ok","degraded":false}
scripts/pull-models.sh # pull the recommended roster on the target
```
Pure-Go only (`modernc.org/sqlite`, no CGO) so Docker/Komodo builds stay trivial
— keep it that way. The worker loop must never panic: log, mark the job, continue.
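One way to honor the never-panic rule is to run each job behind a `recover`. The sketch below is illustrative only, not foreman's actual code: `Job`, `runSafely`, and `drain` are hypothetical names, and a real loop would pull from the SQLite queue rather than a slice.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Job is a hypothetical stand-in; foreman's real job type may differ.
type Job struct {
	ID     string
	Status string // "queued", "done", or "failed"
	Err    string
}

// runSafely executes one job and converts a panic into an ordinary error,
// so a single bad job cannot take down the worker loop.
func runSafely(job *Job, exec func(*Job) error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("job %s panicked: %v", job.ID, r)
		}
	}()
	return exec(job)
}

// drain processes every job in order: log, mark the job, continue.
func drain(jobs []*Job, exec func(*Job) error) {
	for _, job := range jobs {
		if err := runSafely(job, exec); err != nil {
			log.Printf("job %s failed: %v", job.ID, err)
			job.Status, job.Err = "failed", err.Error()
			continue
		}
		job.Status = "done"
	}
}

func main() {
	jobs := []*Job{
		{ID: "a", Status: "queued"},
		{ID: "b", Status: "queued"},
		{ID: "c", Status: "queued"},
	}
	drain(jobs, func(j *Job) error {
		switch j.ID {
		case "a":
			panic("boom") // simulated bug inside a job
		case "b":
			return errors.New("target unreachable")
		}
		return nil
	})
	for _, j := range jobs {
		fmt.Println(j.ID, j.Status)
	}
}
```

The named return value is what lets the deferred function overwrite `err` after a panic; without it the recovered value would be lost.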
## Topology (ADR-0001, ADR-0002)
``` ```
@@ -33,12 +62,16 @@ M1 Pro Mac: Ollama only (models on disk, no foreman logic)
1. **Primary — transparent native Ollama passthrough:** `/api/chat`, `/api/tags`,
`/api/ps`. foreman looks exactly like an Ollama server. Synchronous: calls are
-queued internally but the HTTP response blocks until completion. SSE streaming
-supported (ADR-0012). This is the `go-llm` target path.
+queued internally but the HTTP response blocks until completion. NDJSON
+streaming supported (`application/x-ndjson` — Ollama's native wire format, not
+SSE; ADR-0012). This is the `go-llm` target path.
+2. **Embeddings (bypass the queue) — `/api/embed`, `/api/embeddings`:** proxied
+directly and concurrently to the always-resident embedder; never touch the
+queue or worker loop (ADR-0013).
-2. **Async jobs — `POST /jobs`, `GET /jobs/{id}`:** body is a native-chat payload
+3. **Async jobs — `POST /jobs`, `GET /jobs/{id}`:** body is a native-chat payload
plus optional `state_webhook_url`. Returns `202` + `{ "job_id": "<ulid>" }`
immediately. For fire-and-forget orchestration callers.
-3. **Optional OpenAI-compat `/v1/chat/completions` + `/v1/models`:** deferred;
+4. **Optional OpenAI-compat `/v1/chat/completions` + `/v1/models`:** deferred;
added only if a non-go-llm caller needs it.
Job lifecycle: `queued → loading → working → done` (+ terminal `failed`). A
@@ -65,15 +98,18 @@ guard poison jobs). IDs are ULIDs (sortable, timestamped).
miss). Target unreachable → retain last-known list, mark degraded on a health
endpoint; do not reject wholesale on a single failed poll.
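The retain-and-degrade behavior can be sketched as a small cache that only overwrites its model list on a successful poll. This is an assumption-laden illustration, not the daemon's code: `modelCache`, `poll`, `snapshot`, and `errUnreachable` are hypothetical names, and the fetch function stands in for a call to the target's `/api/tags`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnreachable simulates a failed poll of the target.
var errUnreachable = errors.New("target unreachable")

// modelCache is a hypothetical holder for the poller's view of the target.
type modelCache struct {
	mu       sync.Mutex
	models   []string
	degraded bool
}

// poll calls fetch once. On success it replaces the list and clears the
// degraded flag; on failure it keeps the last-known list and sets the flag.
func (c *modelCache) poll(fetch func() ([]string, error)) {
	models, err := fetch()
	c.mu.Lock()
	defer c.mu.Unlock()
	if err != nil {
		c.degraded = true
		return
	}
	c.models, c.degraded = models, false
}

// snapshot returns a copy of the model list plus the degraded flag,
// which is what a health endpoint would report.
func (c *modelCache) snapshot() ([]string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.models...), c.degraded
}

func main() {
	c := &modelCache{}
	c.poll(func() ([]string, error) { return []string{"qwen3:14b", "qwen3:30b"}, nil })
	c.poll(func() ([]string, error) { return nil, errUnreachable })
	models, degraded := c.snapshot()
	fmt.Println(models, degraded)
}
```

Requests for a model in the retained list can still be accepted while degraded; whether they then succeed depends on the target coming back before the job runs.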
-## Execution (ADR-0009)
+## Execution (ADR-0009, ADR-0013)
-- **Concurrency against the target is 1.** A single worker loop pulls a job,
-ensures the right model is resident, executes, records the result.
-- **Drain-by-model:** finish every queued job for the currently-resident model
-before paying a swap (`ORDER BY (model != current), created_at`). A heuristic,
-not a scheduler. No priorities, fairness, or budgets.
-- Pin residency with Ollama `keep_alive`; target runs `OLLAMA_MAX_LOADED_MODELS=1`
-and `OLLAMA_CONTEXT_LENGTH=8192`+.
+- **Worker-model concurrency against the target is 1.** A single worker loop pulls
+a job, ensures the right worker model is resident, executes, records the result.
+Embeddings are not jobs and bypass this loop entirely (ADR-0013).
+- **Drain-by-model:** finish every queued job for the currently-resident worker
+model before paying a swap (`ORDER BY (model != current), created_at`). A
+heuristic, not a scheduler. No priorities, fairness, or budgets.
+- **Two resident slots:** target runs `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 is the
+always-resident embedder (`FOREMAN_EMBED_MODEL`, pinned `keep_alive: -1`,
+warmed on startup/reconnect); slot 2 is the rotating worker model. Pin the
+worker with `keep_alive`; set `OLLAMA_CONTEXT_LENGTH=8192`+.
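The drain-by-model ordering is easiest to see on a small queue. The sketch below reproduces `ORDER BY (model != current), created_at` in memory; it is an illustration under assumptions, with `queuedJob`, `at`, and `drainOrder` as hypothetical names and a slice standing in for the SQLite table.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// queuedJob is a hypothetical in-memory stand-in for a row in the job queue.
type queuedJob struct {
	ID        string
	Model     string
	CreatedAt time.Time
}

// at builds a timestamp from Unix seconds; it only keeps the example short.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// drainOrder returns job IDs in the order a worker would take them for a
// fixed resident model: jobs for that model first, then everything else,
// oldest first within each group.
func drainOrder(queue []queuedJob, current string) []string {
	sorted := append([]queuedJob(nil), queue...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		swapI, swapJ := sorted[i].Model != current, sorted[j].Model != current
		if swapI != swapJ {
			return !swapI // "no swap needed" sorts before "swap needed"
		}
		return sorted[i].CreatedAt.Before(sorted[j].CreatedAt)
	})
	ids := make([]string, 0, len(sorted))
	for _, job := range sorted {
		ids = append(ids, job.ID)
	}
	return ids
}

func main() {
	queue := []queuedJob{
		{ID: "j1", Model: "qwen3:14b", CreatedAt: at(1)},
		{ID: "j2", Model: "qwen3:30b", CreatedAt: at(2)},
		{ID: "j3", Model: "qwen3:14b", CreatedAt: at(3)},
		{ID: "j4", Model: "qwen3:30b", CreatedAt: at(4)},
	}
	fmt.Println(drainOrder(queue, "qwen3:30b"))
}
```

With `qwen3:30b` resident, both `qwen3:30b` jobs run before the older `qwen3:14b` job, which is the trade the heuristic makes: one swap instead of three, at the cost of strict FIFO. A real worker would re-run the query after each job, since the resident model changes once a swap happens.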
## Persistence (ADR-0008)
@@ -85,13 +121,17 @@ guard poison jobs). IDs are ULIDs (sortable, timestamped).
foreman serves **any installed model** named in a request; it does not own a
role→model mapping (the caller picks the model, e.g. go-llm `.Model(...)`).
-Recommended roster to pull on the Mac (32GB, ~26-28GB usable, single-resident
-swap):
+Recommended roster to pull on the Mac (32GB; the embedder stays resident in slot
+1, one worker model rotates through slot 2 — ADR-0013):
+- **embedder (always resident)** — `nomic-embed-text` (~0.3GB) or
+`qwen3-embedding:0.6b`; selected via `FOREMAN_EMBED_MODEL`.
- **parse / data** — `qwen3:14b` (~9GB, structured/JSON output).
-- **agent + code** — `qwen3.6:35b` (MoE, ~3B active, ~20GB, fast tool-calling).
-- Split a dedicated dense coder (`qwen3.6:27b`) off later only if `35b`'s code
-quality disappoints; it's bandwidth-bound and slow on this Mac.
+- **agent + code** — `qwen3:30b` (Qwen3-30B-A3B MoE, ~3B active, ~19GB, fast
+tool-calling). This is the default worker model.
+- Add a dedicated dense coder only if `qwen3:30b`'s code quality disappoints:
+`gpt-oss:20b` (~13GB, faster) or `qwen2.5-coder:32b` (~20GB, higher quality but
+bandwidth-bound and slow on this Mac).
- Verify exact tags against the Ollama library before pulling; the registry moves.
## go-llm integration (ADR-0011)
@@ -100,7 +140,7 @@ Verified: `llm.OllamaCloud(key, WithBaseURL(...))` already targets a private
authenticated native-Ollama endpoint — which foreman is. Integration is a thin
constructor, no new provider:
-- **Level 0 (now):** `llm.Foreman(baseURL, token).Model("qwen3.6:35b")` — delegates
+- **Level 0 (now):** `llm.Foreman(baseURL, token).Model("qwen3:30b")` — delegates
to the ollama provider; transparent, synchronous, full tool/think/stream.
- **Level 1 (later):** a `foreman` client package — synchronous facade over the
async `/jobs` surface (manages a webhook receiver, blocks to done).
@@ -117,7 +157,7 @@ constructor, no new provider:
## Stack & conventions
-- Go, stdlib `net/http`, minimal deps. SQLite via `modernc.org/sqlite`.
+- Go 1.26, stdlib `net/http`, minimal deps. SQLite via `modernc.org/sqlite`.
- No UI. HTTP API + small CLI only.
- Match go-llm house style: standard Go tabs; `camelCase`/`PascalCase`; check
errors immediately and wrap with `fmt.Errorf("%w: ...", err)`; imports stdlib →
@@ -137,8 +177,8 @@ so a future second backend is additive — but do not build for it now.
- **M0** — native `/api/chat` passthrough + SQLite queue + single-worker loop, one
model end to end, synchronous.
-- **M1** — model poller + `/api/tags`/`/api/ps`, drain-by-model, async `/jobs` +
-`state_webhook_url` + artifacts + retry-on-unreachable, the CLI, and the
-`llm.Foreman()` constructor in go-llm.
+- **M1** — model poller + `/api/tags`/`/api/ps`, drain-by-model, embedding bypass,
+async `/jobs` + `state_webhook_url` + artifacts + retry-on-unreachable, the CLI,
+and the `llm.Foreman()` constructor in go-llm.
- **M2 (later)** — optional OpenAI-compat `/v1`, Level-1 client / dedicated
provider, metrics.
+21
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Steve Dudenhoeffer
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+17 -3
@@ -1,12 +1,22 @@
# foreman
-A small, always-on Go daemon that fronts **one** Ollama target. It turns a
+🪓 A small, always-on Go daemon that fronts **one** Ollama target. It turns a
single Ollama instance into a queued, observable job endpoint: it polls the
target's installed models, serializes work through the target (managing model
swaps), assigns every job an ID, and reports progress via webhooks.
-On the wire it speaks **native Ollama**, so it doubles as a drop-in `go-llm`
-target.
+On the wire it speaks **native Ollama**, so it doubles as a drop-in target for
+any Ollama client — including [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo)
+(via its `ollama.Foreman(url, token)` preset) and, through that,
+[gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly). Point a client at the
+foreman URL instead of the raw Ollama and you get queuing + model-swap
+serialization for free.
> **This is a public, vibe-coded project** (built largely by an AI agent). It runs
> the author's homelab but is intentionally generic — one daemon, one target, one
> queue. Treat the homelab specifics in the docs as illustrative, and don't
> oversell it: it's a deliberately small queue in front of Ollama, not a
> distributed scheduler.
## Quickstart
@@ -61,3 +71,7 @@ See [`docs/adr/`](docs/adr/) for design decisions. Key points:
- Single worker loop with drain-by-model scheduling (ADR-0009)
- Native Ollama passthrough + async `/jobs` surface (ADR-0003, ADR-0004)
- Embeddings bypass the queue entirely (ADR-0013)
## License
[MIT](LICENSE) © 2026 Steve Dudenhoeffer.