# phase-0-kickoff.md: foreman autonomous build

You are building **foreman** end to end, in **one autonomous run**. Execute all six phases in order (1 → 2 → 3 → 4 → 5 → 6) and do not stop between them. The run ends when foreman is a working, deployable deliverable. Do not wait for my approval at phase boundaries; keep going until done or genuinely blocked.

foreman is a Go daemon that fronts one Ollama target and turns it into a queued, observable, Ollama-compatible job endpoint. It is a deliberately pared-down restart of a system (`peon-overseer`) that died of scope creep. Restraint is a feature: if a task seems to need distributed dispatch, leases, fair queueing, capacity budgets, an auth framework/SSO, a GUI, or multi-target support, stop, because that means the design is being violated.

## Read these first (authoritative, in order)

1. `CLAUDE.md`: the operating manual and source of truth.
2. `docs/adr/README.md`, then every `docs/adr/00NN-*.md` (0001–0013). The ADRs are the *why*. Do not relitigate them.
3. Via the **gitea MCP**, `steve/go-llm`: `v2/provider/provider.go` (the `Provider` interface), `v2/ollama/ollama.go` + `v2/ollama/native.go` + `v2/constructors.go` (native `/api/chat` + Bearer + base URL), `v2/CLAUDE.md` (DD#8: native API, not OpenAI-compat).
4. Via the gitea MCP, `steve/steveternet`: `kalimdor/orgrimmar/warhol-queue/`, `kalimdor/orgrimmar/ratchet/`, `kalimdor/orgrimmar/mort/`, and `kalimdor/orgrimmar/traefik/` (incl. `custom/`) for compose/Traefik/network conventions. foreman lives at `kalimdor/orgrimmar/foreman/`. Mirror these exactly; do not invent label syntax.

## The phases

Each `prompts/phase-N.md` is the detailed spec for that phase. For each phase, in order: read `phase-N.md`, plan it internally, implement it, make the gates pass, record progress, commit, then immediately continue to the next phase.

**Override:** the phase files open with "Plan, get approval, implement"; that was written for a paste-one-at-a-time workflow.
In *this* autonomous run, treat it as "plan internally and proceed." Do not pause for approval at any phase boundary.

1. Scaffold, config, SQLite store, health, CI, Dockerfile.
2. Ollama target client + model poller + native passthrough + embedding bypass.
3. Durable queue + single worker + drain-by-model (replaces phase-2's chat gate).
4. Async `/jobs` + job IDs + state webhooks + artifacts.
5. Go client package (sync facade) + `llm.Foreman()` in go-llm.
6. Deploy: steveternet compose + Traefik, `.env.example`, deploy docs, model script.

## Per-phase loop (do this every phase, automatically)

- Implement to the phase spec and the ADRs.
- Run the gates; **all** must pass before moving on: `go build ./...`, `go vet ./...`, `go test -race -count=1 ./...`, and `go mod tidy` followed by `git diff --exit-code go.mod go.sum`.
- Append a dated entry to `progress.md` (what landed, what's next).
- Commit to the **foreman** repo with conventional-commit messages (`feat:`, `test:`, `chore:`, `docs:`). Committing to foreman's main is fine.
- Continue to the next phase without pausing.

## Invariants to honor throughout (from the ADRs)

- **Two-slot runtime (ADR-0013):** the target runs `OLLAMA_MAX_LOADED_MODELS=2`: an always-resident embedder (`FOREMAN_EMBED_MODEL`) plus one rotating worker model. `/api/embed` (+ `/api/embeddings`) bypass the queue and run concurrently; only `/api/chat` and `POST /jobs` are serialized through the single worker. Worker-model concurrency is exactly 1 (ADR-0009).
- **NDJSON, not SSE (ADR-0012):** stream `/api/chat` as `application/x-ndjson`.
- **Env namespacing:** every config key is `FOREMAN_*` (incl. `FOREMAN_OLLAMA_URL`, `FOREMAN_OLLAMA_TOKEN`). No bare `OLLAMA_*`.
- **Go 1.26** in `go.mod`, Dockerfile, and CI.
- Unreachable target = transient/recoverable, never fatal (ADR-0002).

## Cross-repo changes (phases 5 and 6)

The `llm.Foreman()` constructor (go-llm) and the steveternet `docker-compose.yml` touch repos other than foreman.
For those, **open a branch and a PR for my review; do NOT commit to their main.** Report the branch names and PR links in the final summary.

## When to stop vs. keep going

- Keep going through routine ambiguity. If you hit a decision not covered by `CLAUDE.md` or an ADR, make the smallest reasonable choice, **record it as a new ADR** (append-only, next number after 0013, update the index), and continue.
- **Only stop** for a true blocker: a gate you cannot make green after honest effort, a repo/tool you cannot reach, or a required choice that would contradict an accepted ADR or a scope guardrail. If you stop, say exactly why and what you need.

## Definition of done (whole run)

- foreman fronts one configurable Ollama target; transparently proxies native `/api/chat`, `/api/tags`, `/api/ps` (NDJSON streaming) so go-llm uses it as a target with no provider changes; `/api/embed` bypasses the queue concurrently.
- Durable SQLite queue, single worker, drain-by-model; survives restart and target sleep.
- `POST /jobs` returns a ULID job id; `queued→loading→working→done|failed` state webhooks (at-least-once, optional HMAC); artifacts inline/fetch.
- A Go client package (sync facade over `/jobs`); `llm.Foreman()` branch/PR on go-llm.
- CI green; container builds; steveternet compose + Traefik branch/PR.

## Start now

Read the sources, then begin Phase 1 and run straight through to a finished deliverable. When done, give me: a summary of what was built per phase, the go-llm and steveternet PR links, any ADRs you added, and a copy-pasteable end-to-end smoke-test checklist (pull models on the Mac → set `OLLAMA_MAX_LOADED_MODELS=2` → deploy foreman → go-llm chat → concurrent `/api/embed` → `POST /jobs` with a webhook).
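
For reference while implementing and smoke-testing `POST /jobs`, the job lifecycle named in the definition of done (`queued→loading→working→done|failed`) could be modeled roughly as below. This is a minimal sketch, not foreman's actual code: the `State` type, the `next` table, and the `CanTransition` and `Terminal` helpers are hypothetical names, and it assumes a job may fail from any non-terminal state, which the phase specs and ADRs may define differently.

```go
package main

import "fmt"

// State is a job lifecycle state. The state names come from the definition
// of done; the type and helpers here are hypothetical illustrations.
type State string

const (
	Queued  State = "queued"
	Loading State = "loading"
	Working State = "working"
	Done    State = "done"
	Failed  State = "failed"
)

// next lists the allowed transitions out of each non-terminal state.
// Assumption: "failed" is reachable from every non-terminal state.
var next = map[State][]State{
	Queued:  {Loading, Failed},
	Loading: {Working, Failed},
	Working: {Done, Failed},
}

// CanTransition reports whether a job may move directly from one state to another.
func CanTransition(from, to State) bool {
	for _, s := range next[from] {
		if s == to {
			return true
		}
	}
	return false
}

// Terminal reports whether a state has no outgoing transitions in the table.
func Terminal(s State) bool {
	return len(next[s]) == 0
}

func main() {
	fmt.Println(CanTransition(Queued, Loading)) // true
	fmt.Println(CanTransition(Queued, Done))    // false: states cannot be skipped
	fmt.Println(Terminal(Failed))               // true
}
```

A table like this keeps the webhook emitter honest: each state change is checked against `next` before it is persisted and announced, so a webhook receiver never sees a skipped or reversed state.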