docs: MIT license + public-readiness framing
CI / Tidy (push) Successful in 9m42s
CI / Build & Test (push) Successful in 10m32s
CI / Publish Docker Image (push) Successful in 1m16s

Add MIT LICENSE (matches gadfly/majordomo, same author). README + CLAUDE.md:
note that this is a public, vibe-coded project; clarify that the `go-llm`
referenced in the docs is now majordomo; and link majordomo and gadfly as the
downstream consumers (foreman is a drop-in native-Ollama target via
majordomo's `ollama.Foreman` preset). CLAUDE.md gains a Build / test / run
section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 20:30:52 -04:00
parent 7cd7eaff8b
commit 823c0b4ca8
3 changed files with 101 additions and 26 deletions
CLAUDE.md +63 -23
@@ -4,7 +4,16 @@ A small, always-on daemon that fronts **one** Ollama target. It turns a single
Ollama instance into a queued, observable job endpoint: it polls the target's
installed models, serializes work through the target (managing model swaps),
assigns every job an ID, and reports progress + artifacts via webhooks. On the
wire it speaks **native Ollama**, so it doubles as a drop-in `go-llm` target.
wire it speaks **native Ollama**, so it doubles as a drop-in client target — for
any Ollama client, and specifically for
[majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo) (the `go-llm`
library referenced throughout these docs is now majordomo) and the
[gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly) reviewer built on it.
> This is a public, **vibe-coded** project (built largely by an AI agent). Keep
> that framing honest in the README; don't oversell it. Homelab specifics below
> (orgrimmar, the Macs, Komodo, Tailscale) are the author's deployment and are
> illustrative — the daemon itself is generic.
foreman is the deliberately pared-down successor to `peon-overseer`. One daemon,
one target, one queue. The complexity that sank the predecessor — distributed
@@ -13,6 +22,26 @@ gates — existed to coordinate *multiple* workers and is **out of scope**.
Resisting that creep is a first-class design goal. See `docs/adr/` for the
decisions; this file summarizes them.
## Build / test / run
```sh
go build ./cmd/foreman # the daemon binary
go test ./... # client/ + internal/* unit tests
go vet ./... && gofmt -l . # must be quiet / clean before committing
```
Run it locally against a real Ollama target (only `FOREMAN_OLLAMA_URL` is
required; full env reference in `.env.example` and the README table):
```sh
FOREMAN_OLLAMA_URL=http://mac.tail:11434 go run ./cmd/foreman serve
curl -s localhost:8080/healthz # {"status":"ok","degraded":false}
scripts/pull-models.sh # pull the recommended roster on the target
```
Pure-Go only (`modernc.org/sqlite`, no CGO) so Docker/Komodo builds stay trivial
— keep it that way. The worker loop must never panic: log, mark the job, continue.
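The never-panic rule for the worker loop can be sketched in Go with `recover`. This is a minimal illustration under stated assumptions: `Job`, `safeRun`, and `drain` are hypothetical names, the job record is cut down to a few fields, and status updates happen in memory rather than in foreman's SQLite queue.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Job is a hypothetical, stripped-down job record; foreman's real schema
// lives in its SQLite queue and is not shown in these docs.
type Job struct {
	ID     string
	Model  string
	Status string // "done" or "failed" after drain
	Err    string
}

// safeRun executes one job and turns a panic into an ordinary error, so a
// single poison job cannot kill the worker loop.
func safeRun(job *Job, exec func(*Job) error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic in job %s: %v", job.ID, r)
		}
	}()
	return exec(job)
}

// drain runs jobs one at a time (concurrency 1): log, mark the job, continue.
func drain(jobs []*Job, exec func(*Job) error) {
	for _, job := range jobs {
		if err := safeRun(job, exec); err != nil {
			log.Printf("job %s failed: %v", job.ID, err)
			job.Status, job.Err = "failed", err.Error()
			continue
		}
		job.Status = "done"
	}
}

func main() {
	jobs := []*Job{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	drain(jobs, func(j *Job) error {
		switch j.ID {
		case "b":
			panic("nil map write") // simulated bug inside one job
		case "c":
			return errors.New("model not installed")
		}
		return nil
	})
	for _, j := range jobs {
		fmt.Println(j.ID, j.Status)
	}
	// prints "a done", "b failed", "c failed" on separate lines
}
```

The loop keeps going after both the panic and the plain error; only the affected job is marked.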
## Topology (ADR-0001, ADR-0002)
```
@@ -33,12 +62,16 @@ M1 Pro Mac: Ollama only (models on disk, no foreman logic)
1. **Primary — transparent native Ollama passthrough:** `/api/chat`, `/api/tags`,
`/api/ps`. foreman looks exactly like an Ollama server. Synchronous: calls are
queued internally but the HTTP response blocks until completion. SSE streaming
supported (ADR-0012). This is the `go-llm` target path.
2. **Async jobs — `POST /jobs`, `GET /jobs/{id}`:** body is a native-chat payload
queued internally but the HTTP response blocks until completion. NDJSON
streaming supported (`application/x-ndjson` — Ollama's native wire format, not
SSE; ADR-0012). This is the `go-llm` target path.
2. **Embeddings (bypass the queue) — `/api/embed`, `/api/embeddings`:** proxied
directly and concurrently to the always-resident embedder; never touch the
queue or worker loop (ADR-0013).
3. **Async jobs — `POST /jobs`, `GET /jobs/{id}`:** body is a native-chat payload
plus optional `state_webhook_url`. Returns `202` + `{ "job_id": "<ulid>" }`
immediately. For fire-and-forget orchestration callers.
3. **Optional OpenAI-compat `/v1/chat/completions` + `/v1/models`:** deferred;
4. **Optional OpenAI-compat `/v1/chat/completions` + `/v1/models`:** deferred;
added only if a non-go-llm caller needs it.
Job lifecycle: `queued → loading → working → done` (+ terminal `failed`). A
@@ -65,15 +98,18 @@ guard poison jobs). IDs are ULIDs (sortable, timestamped).
miss). Target unreachable → retain last-known list, mark degraded on a health
endpoint; do not reject wholesale on a single failed poll.
## Execution (ADR-0009)
## Execution (ADR-0009, ADR-0013)
- **Concurrency against the target is 1.** A single worker loop pulls a job,
ensures the right model is resident, executes, records the result.
- **Drain-by-model:** finish every queued job for the currently-resident model
before paying a swap (`ORDER BY (model != current), created_at`). A heuristic,
not a scheduler. No priorities, fairness, or budgets.
- Pin residency with Ollama `keep_alive`; target runs `OLLAMA_MAX_LOADED_MODELS=1`
and `OLLAMA_CONTEXT_LENGTH=8192`+.
- **Worker-model concurrency against the target is 1.** A single worker loop pulls
a job, ensures the right worker model is resident, executes, records the result.
Embeddings are not jobs and bypass this loop entirely (ADR-0013).
- **Drain-by-model:** finish every queued job for the currently-resident worker
model before paying a swap (`ORDER BY (model != current), created_at`). A
heuristic, not a scheduler. No priorities, fairness, or budgets.
- **Two resident slots:** target runs `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 is the
always-resident embedder (`FOREMAN_EMBED_MODEL`, pinned `keep_alive: -1`,
warmed on startup/reconnect); slot 2 is the rotating worker model. Pin the
worker with `keep_alive`; set `OLLAMA_CONTEXT_LENGTH=8192`+.
## Persistence (ADR-0008)
@@ -85,13 +121,17 @@ guard poison jobs). IDs are ULIDs (sortable, timestamped).
foreman serves **any installed model** named in a request; it does not own a
role→model mapping (the caller picks the model, e.g. go-llm `.Model(...)`).
Recommended roster to pull on the Mac (32GB, ~26-28GB usable, single-resident
swap):
Recommended roster to pull on the Mac (32GB; the embedder stays resident in slot
1, one worker model rotates through slot 2 — ADR-0013):
- **embedder (always resident)** — `nomic-embed-text` (~0.3GB) or
`qwen3-embedding:0.6b`; selected via `FOREMAN_EMBED_MODEL`.
- **parse / data** — `qwen3:14b` (~9GB, structured/JSON output).
- **agent + code** — `qwen3.6:35b` (MoE, ~3B active, ~20GB, fast tool-calling).
- Split a dedicated dense coder (`qwen3.6:27b`) off later only if `35b`'s code
quality disappoints; it's bandwidth-bound and slow on this Mac.
- **agent + code** — `qwen3:30b` (Qwen3-30B-A3B MoE, ~3B active, ~19GB, fast
tool-calling). This is the default worker model.
- Add a dedicated dense coder only if `qwen3:30b`'s code quality disappoints:
`gpt-oss:20b` (~13GB, faster) or `qwen2.5-coder:32b` (~20GB, higher quality but
bandwidth-bound and slow on this Mac).
- Verify exact tags against the Ollama library before pulling; the registry moves.
## go-llm integration (ADR-0011)
@@ -100,7 +140,7 @@ Verified: `llm.OllamaCloud(key, WithBaseURL(...))` already targets a private
authenticated native-Ollama endpoint — which foreman is. Integration is a thin
constructor, no new provider:
- **Level 0 (now):** `llm.Foreman(baseURL, token).Model("qwen3.6:35b")` — delegates
- **Level 0 (now):** `llm.Foreman(baseURL, token).Model("qwen3:30b")` — delegates
to the ollama provider; transparent, synchronous, full tool/think/stream.
- **Level 1 (later):** a `foreman` client package — synchronous facade over the
async `/jobs` surface (manages a webhook receiver, blocks to done).
@@ -117,7 +157,7 @@ constructor, no new provider:
## Stack & conventions
- Go, stdlib `net/http`, minimal deps. SQLite via `modernc.org/sqlite`.
- Go 1.26, stdlib `net/http`, minimal deps. SQLite via `modernc.org/sqlite`.
- No UI. HTTP API + small CLI only.
- Match go-llm house style: standard Go tabs; `camelCase`/`PascalCase`; check
errors immediately and wrap with `fmt.Errorf("%w: ...", err)`; imports stdlib →
@@ -137,8 +177,8 @@ so a future second backend is additive — but do not build for it now.
- **M0** — native `/api/chat` passthrough + SQLite queue + single-worker loop, one
model end to end, synchronous.
- **M1** — model poller + `/api/tags`/`/api/ps`, drain-by-model, async `/jobs` +
`state_webhook_url` + artifacts + retry-on-unreachable, the CLI, and the
`llm.Foreman()` constructor in go-llm.
- **M1** — model poller + `/api/tags`/`/api/ps`, drain-by-model, embedding bypass,
async `/jobs` + `state_webhook_url` + artifacts + retry-on-unreachable, the CLI,
and the `llm.Foreman()` constructor in go-llm.
- **M2 (later)** — optional OpenAI-compat `/v1`, Level-1 client / dedicated
provider, metrics.
LICENSE +21
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Steve Dudenhoeffer
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md +17 -3
@@ -1,12 +1,22 @@
# foreman
A small, always-on Go daemon that fronts **one** Ollama target. It turns a
🪓 A small, always-on Go daemon that fronts **one** Ollama target. It turns a
single Ollama instance into a queued, observable job endpoint: it polls the
target's installed models, serializes work through the target (managing model
swaps), assigns every job an ID, and reports progress via webhooks.
On the wire it speaks **native Ollama**, so it doubles as a drop-in `go-llm`
target.
On the wire it speaks **native Ollama**, so it doubles as a drop-in target for
any Ollama client — including [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo)
(via its `ollama.Foreman(url, token)` preset) and, through that,
[gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly). Point a client at the
foreman URL instead of the raw Ollama and you get queuing + model-swap
serialization for free.
> **This is a public, vibe-coded project** (built largely by an AI agent). It runs
> the author's homelab but is intentionally generic — one daemon, one target, one
> queue. Treat the homelab specifics in the docs as illustrative, and don't
> oversell it: it's a deliberately small queue in front of Ollama, not a
> distributed scheduler.
## Quickstart
@@ -61,3 +71,7 @@ See [`docs/adr/`](docs/adr/) for design decisions. Key points:
- Single worker loop with drain-by-model scheduling (ADR-0009)
- Native Ollama passthrough + async `/jobs` surface (ADR-0003, ADR-0004)
- Embeddings bypass the queue entirely (ADR-0013)
## License
[MIT](LICENSE) © 2026 Steve Dudenhoeffer.