# phase-0-kickoff.md — foreman autonomous build

You are building **foreman** end to end, in **one autonomous run**. Execute all
six phases in order (1 → 2 → 3 → 4 → 5 → 6) and do not stop between them. The run
ends when foreman is a working, deployable deliverable. Do not wait for my
approval at phase boundaries — keep going until done or genuinely blocked.

foreman is a Go daemon that fronts one Ollama target and turns it into a queued,
observable, Ollama-compatible job endpoint. It is a deliberately pared-down
restart of a system (`peon-overseer`) that died of scope creep. Restraint is a
feature: if a task seems to need distributed dispatch, leases, fair queueing,
capacity budgets, an auth framework/SSO, a GUI, or multi-target support — stop,
because that means the design is being violated.

## Read these first (authoritative, in order)

1. `CLAUDE.md` — the operating manual and source of truth.
2. `docs/adr/README.md`, then every `docs/adr/00NN-*.md` (0001–0013). The ADRs are
   the *why*. Do not relitigate them.
3. Via the **gitea MCP**, `steve/go-llm`: `v2/provider/provider.go` (the
   `Provider` interface), `v2/ollama/ollama.go` + `v2/ollama/native.go` +
   `v2/constructors.go` (native `/api/chat` + Bearer + base URL), `v2/CLAUDE.md`
   (DD#8: native API, not OpenAI-compat).
4. Via the gitea MCP, `steve/steveternet`: `kalimdor/orgrimmar/warhol-queue/`,
   `kalimdor/orgrimmar/ratchet/`, `kalimdor/orgrimmar/mort/`, and
   `kalimdor/orgrimmar/traefik/` (incl. `custom/`) for compose/Traefik/network
   conventions. foreman lives at `kalimdor/orgrimmar/foreman/`. Mirror these
   exactly; do not invent label syntax.

## The phases

Each `prompts/phase-N.md` is the detailed spec for that phase. For each phase, in
order: read `phase-N.md`, plan it internally, implement it, make the gates pass,
record progress, commit, then immediately continue to the next phase.

**Override:** the phase files open with "Plan, get approval, implement" — that was
written for a paste-one-at-a-time workflow. In *this* autonomous run, treat it as
"plan internally and proceed." Do not pause for approval at any phase boundary.

1. Scaffold, config, SQLite store, health, CI, Dockerfile.
2. Ollama target client + model poller + native passthrough + embedding bypass.
3. Durable queue + single worker + drain-by-model (replaces phase-2's chat gate).
4. Async `/jobs` + job IDs + state webhooks + artifacts.
5. Go client package (sync facade) + `llm.Foreman()` in go-llm.
6. Deploy: steveternet compose + Traefik, `.env.example`, deploy docs, model script.

## Per-phase loop (do this every phase, automatically)

- Implement to the phase spec and the ADRs.
- Run the gates; **all** must pass before moving on:
  `go build ./...`, `go vet ./...`, `go test -race -count=1 ./...`, and
  `go mod tidy` followed by `git diff --exit-code go.mod go.sum`.
- Append a dated entry to `progress.md` (what landed, what's next).
- Commit to the **foreman** repo with conventional-commit messages
  (`feat:`, `test:`, `chore:`, `docs:`). Committing to foreman's main is fine.
- Continue to the next phase without pausing.
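
The gate sequence above can be sketched as a small Go runner. This is an illustration under assumptions, not part of foreman: the `gate` type and the `phaseGates`, `runGates`, and `execGate` names are hypothetical, and only the commands themselves come from the list above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// gate is one command that must exit 0 before a phase may be committed.
// (Hypothetical type, introduced only for this sketch.)
type gate struct {
	name string
	args []string
}

// phaseGates returns the gates in the order given in the per-phase loop.
// `go mod tidy` may rewrite go.mod/go.sum; the final git diff fails if it did.
func phaseGates() []gate {
	return []gate{
		{"go", []string{"build", "./..."}},
		{"go", []string{"vet", "./..."}},
		{"go", []string{"test", "-race", "-count=1", "./..."}},
		{"go", []string{"mod", "tidy"}},
		{"git", []string{"diff", "--exit-code", "go.mod", "go.sum"}},
	}
}

// runGates calls run for each gate in order and stops at the first failure.
func runGates(gates []gate, run func(gate) error) error {
	for _, g := range gates {
		if err := run(g); err != nil {
			return fmt.Errorf("gate %s %v failed: %w", g.name, g.args, err)
		}
	}
	return nil
}

// execGate runs a gate as a subprocess in the current directory,
// passing its output through to the terminal.
func execGate(g gate) error {
	cmd := exec.Command(g.name, g.args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	if err := runGates(phaseGates(), execGate); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Passing the runner in as a function keeps the ordering and stop-on-first-failure logic testable without invoking the Go toolchain.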

## Invariants to honor throughout (from the ADRs)

- **Two-slot runtime (ADR-0013):** the target runs `OLLAMA_MAX_LOADED_MODELS=2` —
  an always-resident embedder (`FOREMAN_EMBED_MODEL`) plus one rotating worker
  model. `/api/embed` (+ `/api/embeddings`) bypass the queue and run
  concurrently; only `/api/chat` and `POST /jobs` are serialized through the
  single worker. Worker-model concurrency is exactly 1 (ADR-0009).
- **NDJSON, not SSE (ADR-0012):** stream `/api/chat` as `application/x-ndjson`.
- **Env namespacing:** every config key is `FOREMAN_*` (incl.
  `FOREMAN_OLLAMA_URL`, `FOREMAN_OLLAMA_TOKEN`). No bare `OLLAMA_*`.
- **Go version:** Go 1.26 in `go.mod`, the Dockerfile, and CI.
- **Unreachable target (ADR-0002):** treat it as transient and recoverable, never
  fatal.
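
The two-slot split above amounts to a routing decision per request. A minimal sketch, assuming a hypothetical `lane` type and `classify` function (neither name comes from the ADRs): embedding paths bypass the queue, `/api/chat` and `POST /jobs` go through the single worker, and every other path is labeled `other` because the invariant does not say how it is handled.

```go
package main

import (
	"fmt"
	"net/http"
)

// lane says how a request is scheduled. (Hypothetical type for this sketch.)
type lane int

const (
	laneOther  lane = iota // not covered by the two-slot invariant
	laneBypass             // embeddings: skip the queue, run concurrently
	laneWorker             // serialized through the single worker
)

// String returns a short label for printing.
func (l lane) String() string {
	switch l {
	case laneBypass:
		return "bypass"
	case laneWorker:
		return "worker"
	default:
		return "other"
	}
}

// classify maps a request to a lane following the two-slot invariant.
// Assumption: /api/chat is matched on path alone, /jobs only for POST.
func classify(method, path string) lane {
	switch {
	case path == "/api/embed" || path == "/api/embeddings":
		return laneBypass
	case path == "/api/chat":
		return laneWorker
	case method == http.MethodPost && path == "/jobs":
		return laneWorker
	default:
		return laneOther
	}
}

func main() {
	requests := [][2]string{
		{http.MethodPost, "/api/embed"},
		{http.MethodPost, "/api/chat"},
		{http.MethodPost, "/jobs"},
		{http.MethodGet, "/api/tags"},
	}
	for _, r := range requests {
		fmt.Printf("%s %s -> %s\n", r[0], r[1], classify(r[0], r[1]))
	}
}
```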

## Cross-repo changes (phases 5 and 6)

The `llm.Foreman()` constructor (go-llm) and the steveternet `docker-compose.yml`
touch repos other than foreman. For those, **open a branch and a PR for my
review — do NOT commit to their main.** Report the branch names and PR links in
the final summary.

## When to stop vs. keep going

- Keep going through routine ambiguity. If you hit a decision not covered by
  `CLAUDE.md` or an ADR, make the smallest reasonable choice, **record it as a new
  ADR** (append-only, next number after 0013, update the index), and continue.
- **Only stop** for a true blocker: a gate you cannot make green after honest
  effort, a repo/tool you cannot reach, or a required choice that would
  contradict an accepted ADR or a scope guardrail. If you stop, say exactly why
  and what you need.

## Definition of done (whole run)

- foreman fronts one configurable Ollama target; transparently proxies native
  `/api/chat`, `/api/tags`, `/api/ps` (NDJSON streaming) so go-llm uses it as a
  target with no provider changes; `/api/embed` bypasses the queue and runs
  concurrently.
- Durable SQLite queue, single worker, drain-by-model; survives restart and
  target sleep.
- `POST /jobs` returns a ULID job id; `queued→loading→working→done|failed` state
  webhooks (at-least-once delivery, optional HMAC signature); artifacts returned
  inline or fetched separately.
- A Go client package (sync facade over `/jobs`); `llm.Foreman()` branch/PR on
  go-llm.
- CI green; container builds; steveternet compose + Traefik branch/PR.
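
The job lifecycle `queued→loading→working→done|failed` can be sketched as a transition table. This is a sketch under a stated assumption: `failed` is treated as reachable from every non-terminal state, which the line above does not spell out. The `state` type and the `canTransition` and `isTerminal` helpers are hypothetical names.

```go
package main

import "fmt"

// state is a job lifecycle state. (Hypothetical type for this sketch.)
type state string

const (
	queued  state = "queued"
	loading state = "loading"
	working state = "working"
	done    state = "done"
	failed  state = "failed"
)

// next lists the states each non-terminal state may move to.
// Assumption: a job may fail from queued, loading, or working.
var next = map[state][]state{
	queued:  {loading, failed},
	loading: {working, failed},
	working: {done, failed},
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to state) bool {
	for _, s := range next[from] {
		if s == to {
			return true
		}
	}
	return false
}

// isTerminal reports whether a job in state s has finished.
func isTerminal(s state) bool {
	return s == done || s == failed
}

func main() {
	path := []state{queued, loading, working, done}
	for i := 0; i+1 < len(path); i++ {
		fmt.Printf("%s -> %s allowed: %v\n",
			path[i], path[i+1], canTransition(path[i], path[i+1]))
	}
	fmt.Println("done -> working allowed:", canTransition(done, working))
}
```

Because webhook delivery is at-least-once, a receiver may see the same state twice; a table like this lets it ignore repeats and reject out-of-order moves.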

## Start now

Read the sources, then begin Phase 1 and run straight through to a finished
deliverable. When done, give me: a summary of what was built per phase, the
go-llm and steveternet PR links, any ADRs you added, and a copy-pasteable
end-to-end smoke-test checklist (pull models on the Mac → set
`OLLAMA_MAX_LOADED_MODELS=2` → deploy foreman → go-llm chat → concurrent
`/api/embed` → `POST /jobs` with a webhook).