docs: MIT license + public-readiness framing

Add MIT LICENSE (matches gadfly/majordomo, same author).

README + CLAUDE.md: note this is a public, vibe-coded project; clarify the `go-llm` referenced in the docs is now majordomo, and link it + gadfly as the downstream consumers (foreman is a drop-in native-Ollama target via majordomo's ollama.Foreman preset). CLAUDE.md gains a Build / test / run section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 20:30:52 -04:00
parent 7cd7eaff8b
commit 823c0b4ca8
3 changed files with 101 additions and 26 deletions
@@ -4,7 +4,16 @@ A small, always-on daemon that fronts **one** Ollama target. It turns a single
 Ollama instance into a queued, observable job endpoint: it polls the target's
 installed models, serializes work through the target (managing model swaps),
 assigns every job an ID, and reports progress + artifacts via webhooks. On the
-wire it speaks **native Ollama**, so it doubles as a drop-in `go-llm` target.
+wire it speaks **native Ollama**, so it doubles as a drop-in client target — for
+any Ollama client, and specifically for
+[majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo) (the `go-llm`
+library referenced throughout these docs is now majordomo) and the
+[gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly) reviewer built on it.
+> This is a public, **vibe-coded** project (built largely by an AI agent). Keep
+> that framing honest in the README; don't oversell it. Homelab specifics below
+> (orgrimmar, the Macs, Komodo, Tailscale) are the author's deployment and are
+> illustrative — the daemon itself is generic.
 foreman is the deliberately pared-down successor to `peon-overseer`. One daemon,
 one target, one queue. The complexity that sank the predecessor — distributed
@@ -13,6 +22,26 @@ gates — existed to coordinate *multiple* workers and is **out of scope**.
 Resisting that creep is a first-class design goal. See `docs/adr/` for the
 decisions; this file summarizes them.
+## Build / test / run
+```sh
+go build ./cmd/foreman        # the daemon binary
+go test ./...                 # client/ + internal/* unit tests
+go vet ./... && gofmt -l .    # must be quiet / clean before committing
+```
+Run it locally against a real Ollama target (only `FOREMAN_OLLAMA_URL` is
+required; full env reference in `.env.example` and the README table):
+```sh
+FOREMAN_OLLAMA_URL=http://mac.tail:11434 go run ./cmd/foreman serve
+curl -s localhost:8080/healthz          # {"status":"ok","degraded":false}
+scripts/pull-models.sh                  # pull the recommended roster on the target
+```
+Pure-Go only (`modernc.org/sqlite`, no CGO) so Docker/Komodo builds stay trivial
+— keep it that way. The worker loop must never panic: log, mark the job, continue.
 ## Topology (ADR-0001, ADR-0002)
 ```
@@ -33,12 +62,16 @@ M1 Pro Mac:  Ollama only  (models on disk, no foreman logic)
 1. **Primary — transparent native Ollama passthrough:** `/api/chat`, `/api/tags`,
   `/api/ps`. foreman looks exactly like an Ollama server. Synchronous: calls are
-   queued internally but the HTTP response blocks until completion. SSE streaming
-   supported (ADR-0012). This is the `go-llm` target path.
-2. **Async jobs — `POST /jobs`, `GET /jobs/{id}`:** body is a native-chat payload
+   queued internally but the HTTP response blocks until completion. NDJSON
+   streaming supported (`application/x-ndjson` — Ollama's native wire format, not
+   SSE; ADR-0012). This is the `go-llm` target path.
+2. **Embeddings (bypass the queue) — `/api/embed`, `/api/embeddings`:** proxied
+   directly and concurrently to the always-resident embedder; never touch the
+   queue or worker loop (ADR-0013).
+3. **Async jobs — `POST /jobs`, `GET /jobs/{id}`:** body is a native-chat payload
   plus optional `state_webhook_url`. Returns `202` + `{ "job_id": "<ulid>" }`
   immediately. For fire-and-forget orchestration callers.
-3. **Optional OpenAI-compat `/v1/chat/completions` + `/v1/models`:** deferred;
+4. **Optional OpenAI-compat `/v1/chat/completions` + `/v1/models`:** deferred;
   added only if a non-go-llm caller needs it.
 Job lifecycle: `queued → loading → working → done` (+ terminal `failed`). A
@@ -65,15 +98,18 @@ guard poison jobs). IDs are ULIDs (sortable, timestamped).
  miss). Target unreachable → retain last-known list, mark degraded on a health
  endpoint; do not reject wholesale on a single failed poll.
-## Execution (ADR-0009)
+## Execution (ADR-0009, ADR-0013)
-- **Concurrency against the target is 1.** A single worker loop pulls a job,
-  ensures the right model is resident, executes, records the result.
-- **Drain-by-model:** finish every queued job for the currently-resident model
-  before paying a swap (`ORDER BY (model != current), created_at`). A heuristic,
-  not a scheduler. No priorities, fairness, or budgets.
-- Pin residency with Ollama `keep_alive`; target runs `OLLAMA_MAX_LOADED_MODELS=1`
-  and `OLLAMA_CONTEXT_LENGTH=8192`+.
+- **Worker-model concurrency against the target is 1.** A single worker loop pulls
+  a job, ensures the right worker model is resident, executes, records the result.
+  Embeddings are not jobs and bypass this loop entirely (ADR-0013).
+- **Drain-by-model:** finish every queued job for the currently-resident worker
+  model before paying a swap (`ORDER BY (model != current), created_at`). A
+  heuristic, not a scheduler. No priorities, fairness, or budgets.
+- **Two resident slots:** target runs `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 is the
+  always-resident embedder (`FOREMAN_EMBED_MODEL`, pinned `keep_alive: -1`,
+  warmed on startup/reconnect); slot 2 is the rotating worker model. Pin the
+  worker with `keep_alive`; set `OLLAMA_CONTEXT_LENGTH=8192`+.
 ## Persistence (ADR-0008)
@@ -85,13 +121,17 @@ guard poison jobs). IDs are ULIDs (sortable, timestamped).
 foreman serves **any installed model** named in a request; it does not own a
 role→model mapping (the caller picks the model, e.g. go-llm `.Model(...)`).
-Recommended roster to pull on the Mac (32GB, ~26-28GB usable, single-resident
-swap):
+Recommended roster to pull on the Mac (32GB; the embedder stays resident in slot
+1, one worker model rotates through slot 2 — ADR-0013):
+- **embedder (always resident)** — `nomic-embed-text` (~0.3GB) or
+  `qwen3-embedding:0.6b`; selected via `FOREMAN_EMBED_MODEL`.
 - **parse / data** — `qwen3:14b` (~9GB, structured/JSON output).
-- **agent + code** — `qwen3.6:35b` (MoE, ~3B active, ~20GB, fast tool-calling).
-- Split a dedicated dense coder (`qwen3.6:27b`) off later only if `35b`'s code
-  quality disappoints; it's bandwidth-bound and slow on this Mac.
+- **agent + code** — `qwen3:30b` (Qwen3-30B-A3B MoE, ~3B active, ~19GB, fast
+  tool-calling). This is the default worker model.
+- Add a dedicated dense coder only if `qwen3:30b`'s code quality disappoints:
+  `gpt-oss:20b` (~13GB, faster) or `qwen2.5-coder:32b` (~20GB, higher quality but
+  bandwidth-bound and slow on this Mac).
 - Verify exact tags against the Ollama library before pulling; the registry moves.
 ## go-llm integration (ADR-0011)
@@ -100,7 +140,7 @@ Verified: `llm.OllamaCloud(key, WithBaseURL(...))` already targets a private
 authenticated native-Ollama endpoint — which foreman is. Integration is a thin
 constructor, no new provider:
-- **Level 0 (now):** `llm.Foreman(baseURL, token).Model("qwen3.6:35b")` — delegates
+- **Level 0 (now):** `llm.Foreman(baseURL, token).Model("qwen3:30b")` — delegates
  to the ollama provider; transparent, synchronous, full tool/think/stream.
 - **Level 1 (later):** a `foreman` client package — synchronous facade over the
  async `/jobs` surface (manages a webhook receiver, blocks to done).
@@ -117,7 +157,7 @@ constructor, no new provider:
 ## Stack & conventions
-- Go, stdlib `net/http`, minimal deps. SQLite via `modernc.org/sqlite`.
+- Go 1.26, stdlib `net/http`, minimal deps. SQLite via `modernc.org/sqlite`.
 - No UI. HTTP API + small CLI only.
 - Match go-llm house style: standard Go tabs; `camelCase`/`PascalCase`; check
  errors immediately and wrap with `fmt.Errorf("%w: ...", err)`; imports stdlib →
@@ -137,8 +177,8 @@ so a future second backend is additive — but do not build for it now.
 - **M0** — native `/api/chat` passthrough + SQLite queue + single-worker loop, one
  model end to end, synchronous.
-- **M1** — model poller + `/api/tags`/`/api/ps`, drain-by-model, async `/jobs` +
-  `state_webhook_url` + artifacts + retry-on-unreachable, the CLI, and the
-  `llm.Foreman()` constructor in go-llm.
+- **M1** — model poller + `/api/tags`/`/api/ps`, drain-by-model, embedding bypass,
+  async `/jobs` + `state_webhook_url` + artifacts + retry-on-unreachable, the CLI,
+  and the `llm.Foreman()` constructor in go-llm.
 - **M2 (later)** — optional OpenAI-compat `/v1`, Level-1 client / dedicated
  provider, metrics.
@@ -0,0 +1,21 @@
 MIT License
 Copyright (c) 2026 Steve Dudenhoeffer
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
@@ -1,12 +1,22 @@
 # foreman
-A small, always-on Go daemon that fronts **one** Ollama target. It turns a
+🪓 A small, always-on Go daemon that fronts **one** Ollama target. It turns a
 single Ollama instance into a queued, observable job endpoint: it polls the
 target's installed models, serializes work through the target (managing model
 swaps), assigns every job an ID, and reports progress via webhooks.
-On the wire it speaks **native Ollama**, so it doubles as a drop-in `go-llm`
-target.
+On the wire it speaks **native Ollama**, so it doubles as a drop-in target for
+any Ollama client — including [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo)
+(via its `ollama.Foreman(url, token)` preset) and, through that,
+[gadfly](https://gitea.stevedudenhoeffer.com/steve/gadfly). Point a client at the
+foreman URL instead of the raw Ollama and you get queuing + model-swap
+serialization for free.
+> **This is a public, vibe-coded project** (built largely by an AI agent). It runs
+> the author's homelab but is intentionally generic — one daemon, one target, one
+> queue. Treat the homelab specifics in the docs as illustrative, and don't
+> oversell it: it's a deliberately small queue in front of Ollama, not a
+> distributed scheduler.
 ## Quickstart
@@ -61,3 +71,7 @@ See [`docs/adr/`](docs/adr/) for design decisions. Key points:
 - Single worker loop with drain-by-model scheduling (ADR-0009)
 - Native Ollama passthrough + async `/jobs` surface (ADR-0003, ADR-0004)
 - Embeddings bypass the queue entirely (ADR-0013)
+## License
+[MIT](LICENSE) © 2026 Steve Dudenhoeffer.