Commit pre-existing uncommitted working-tree changes that predate the license/public-readiness work — NOT authored in this session, just flushed so they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013 (embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and the phase-0..3 prompt revisions + prompts/README.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
phase-0-kickoff.md — foreman autonomous build
You are building foreman end to end, in one autonomous run. Execute all six phases in order (1 → 2 → 3 → 4 → 5 → 6) and do not stop between them. The run ends when foreman is a working, deployable deliverable. Do not wait for my approval at phase boundaries — keep going until done or genuinely blocked.
foreman is a Go daemon that fronts one Ollama target and turns it into a queued,
observable, Ollama-compatible job endpoint. It is a deliberately pared-down
restart of a system (peon-overseer) that died of scope creep. Restraint is a
feature: if a task seems to need distributed dispatch, leases, fair queueing,
capacity budgets, an auth framework/SSO, a GUI, or multi-target support — stop,
because that means the design is being violated.
Read these first (authoritative, in order)
- `CLAUDE.md`: the operating manual and source of truth.
- `docs/adr/README.md`, then every `docs/adr/00NN-*.md` (0001–0013). The ADRs are the why. Do not relitigate them.
- Via the gitea MCP, `steve/go-llm`: `v2/provider/provider.go` (the `Provider` interface), `v2/ollama/ollama.go` + `v2/ollama/native.go` + `v2/constructors.go` (native `/api/chat` + Bearer + base URL), and `v2/CLAUDE.md` (DD#8: native API, not OpenAI-compat).
- Via the gitea MCP, `steve/steveternet`: `kalimdor/orgrimmar/warhol-queue/`, `kalimdor/orgrimmar/ratchet/`, `kalimdor/orgrimmar/mort/`, and `kalimdor/orgrimmar/traefik/` (incl. `custom/`) for compose/Traefik/network conventions. foreman lives at `kalimdor/orgrimmar/foreman/`. Mirror these conventions exactly; do not invent label syntax.
The phases
Each `prompts/phase-N.md` is the detailed spec for that phase. For each phase, in
order: read `phase-N.md`, plan it internally, implement it, make the gates pass,
record progress, commit, then immediately continue to the next phase.
Override: the phase files open with "Plan, get approval, implement" — that was written for a paste-one-at-a-time workflow. In this autonomous run, treat it as "plan internally and proceed." Do not pause for approval at any phase boundary.
- Phase 1: Scaffold, config, SQLite store, health, CI, Dockerfile.
- Phase 2: Ollama target client + model poller + native passthrough + embedding bypass.
- Phase 3: Durable queue + single worker + drain-by-model (replaces phase 2's chat gate).
- Phase 4: Async `/jobs` + job IDs + state webhooks + artifacts.
- Phase 5: Go client package (sync facade) + `llm.Foreman()` in go-llm.
- Phase 6: Deploy (steveternet compose + Traefik, `.env.example`, deploy docs, model script).
Per-phase loop (do this every phase, automatically)
- Implement to the phase spec and the ADRs.
- Run the gates. All must pass before moving on: `go build ./...`, `go vet ./...`, `go test -race -count=1 ./...`, and `go mod tidy` followed by `git diff --exit-code go.mod go.sum`.
- Append a dated entry to `progress.md` (what landed, what's next).
- Commit to the foreman repo with conventional-commit messages (`feat:`, `test:`, `chore:`, `docs:`). Committing to foreman's main is fine.
- Continue to the next phase without pausing.
Invariants to honor throughout (from the ADRs)
- Two-slot runtime (ADR-0013): the target runs with `OLLAMA_MAX_LOADED_MODELS=2`, holding an always-resident embedder (`FOREMAN_EMBED_MODEL`) plus one rotating worker model. `/api/embed` (and `/api/embeddings`) bypass the queue and run concurrently; only `/api/chat` and `POST /jobs` are serialized through the single worker. Worker-model concurrency is exactly 1 (ADR-0009).
- NDJSON, not SSE (ADR-0012): stream `/api/chat` as `application/x-ndjson`.
- Env namespacing: every foreman config key is `FOREMAN_*` (incl. `FOREMAN_OLLAMA_URL`, `FOREMAN_OLLAMA_TOKEN`). No bare `OLLAMA_*` keys in foreman's config; `OLLAMA_MAX_LOADED_MODELS` is set on the Ollama target, not on foreman.
- Go 1.26 in `go.mod`, the Dockerfile, and CI.
- An unreachable target is transient and recoverable, never fatal (ADR-0002).
Cross-repo changes (phases 5 and 6)
The `llm.Foreman()` constructor (go-llm) and the steveternet `docker-compose.yml`
live in repos other than foreman. For those, open a branch and a PR for my
review; do NOT commit to their main. Report the branch names and PR links in
the final summary.
When to stop vs. keep going
- Keep going through routine ambiguity. If you hit a decision not covered by `CLAUDE.md` or an ADR, make the smallest reasonable choice, record it as a new ADR (append-only, next number after 0013, update the index), and continue.
- Only stop for a true blocker: a gate you cannot make green after honest effort, a repo/tool you cannot reach, or a required choice that would contradict an accepted ADR or a scope guardrail. If you stop, say exactly why and what you need.
Definition of done (whole run)
- foreman fronts one configurable Ollama target and transparently proxies native `/api/chat`, `/api/tags`, and `/api/ps` (with `/api/chat` streamed as NDJSON), so go-llm uses it as a target with no provider changes; `/api/embed` bypasses the queue and runs concurrently.
- Durable SQLite queue, single worker, drain-by-model; survives restart and target sleep.
- `POST /jobs` returns a ULID job id; `queued` → `loading` → `working` → `done`/`failed` state webhooks (at-least-once, optional HMAC); artifacts available inline or by fetch.
- A Go client package (sync facade over `/jobs`); `llm.Foreman()` branch/PR on go-llm.
- CI green; container builds; steveternet compose + Traefik branch/PR.
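The job lifecycle and optional webhook signing above can be sketched in Go. This is a hypothetical illustration under stated assumptions, not the spec. `State`, `transitions`, `canTransition`, `sign`, and `verify` are invented names. Letting a job fail from `queued` or `loading` is an assumption. The spec only says "optional HMAC", so HMAC-SHA256 with hex encoding is also an assumption.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// State is a job's lifecycle state; each transition fires a webhook.
type State string

const (
	Queued  State = "queued"
	Loading State = "loading"
	Working State = "working"
	Done    State = "done"
	Failed  State = "failed"
)

// transitions lists the allowed forward moves in queued → loading → working →
// done/failed. Failing from queued or loading is an assumption of this sketch.
// Done and Failed are terminal, so they have no entries.
var transitions = map[State][]State{
	Queued:  {Loading, Failed},
	Loading: {Working, Failed},
	Working: {Done, Failed},
}

// canTransition reports whether from → to is an allowed move.
func canTransition(from, to State) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

// sign returns a hex-encoded HMAC-SHA256 of a webhook body (assumed scheme).
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify checks a hex signature in constant time. With at-least-once delivery,
// a receiver should also tolerate duplicate deliveries of the same event.
func verify(secret, body []byte, sigHex string) bool {
	want, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	fmt.Println(canTransition(Queued, Loading), canTransition(Done, Working))
	body := []byte(`{"state":"done"}`)
	sig := sign([]byte("secret"), body)
	fmt.Println(verify([]byte("secret"), body, sig))
}
```

Keeping the transition table explicit makes an illegal move (for example reopening a `done` job) easy to reject before any webhook is sent.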
Start now
Read the sources, then begin Phase 1 and run straight through to a finished
deliverable. When done, give me: a summary of what was built per phase, the
go-llm and steveternet PR links, any ADRs you added, and a copy-pasteable
end-to-end smoke-test checklist (pull models on the Mac → set
`OLLAMA_MAX_LOADED_MODELS=2` → deploy foreman → go-llm chat → concurrent
`/api/embed` → `POST /jobs` with a webhook).