foreman/prompts/phase-0-kickoff.md
steve 0526bada90 docs: land prior ADR + prompt updates
Commit pre-existing uncommitted working-tree changes that predate the
license/public-readiness work — NOT authored in this session, just flushed so
they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013
(embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and
the phase-0..3 prompt revisions + prompts/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 20:33:39 -04:00

# phase-0-kickoff.md — foreman autonomous build
You are building **foreman** end to end, in **one autonomous run**. Execute all
six phases in order (1 → 2 → 3 → 4 → 5 → 6) and do not stop between them. The run
ends when foreman is a working, deployable deliverable. Do not wait for my
approval at phase boundaries — keep going until done or genuinely blocked.
foreman is a Go daemon that fronts one Ollama target and turns it into a queued,
observable, Ollama-compatible job endpoint. It is a deliberately pared-down
restart of a system (`peon-overseer`) that died of scope creep. Restraint is a
feature: if a task seems to need distributed dispatch, leases, fair queueing,
capacity budgets, an auth framework/SSO, a GUI, or multi-target support — stop,
because that means the design is being violated.
## Read these first (authoritative, in order)
1. `CLAUDE.md` — the operating manual and source of truth.
2. `docs/adr/README.md`, then every `docs/adr/00NN-*.md` (0001–0013). The ADRs are
the *why*. Do not relitigate them.
3. Via the **gitea MCP**, `steve/go-llm`: `v2/provider/provider.go` (the
`Provider` interface), `v2/ollama/ollama.go` + `v2/ollama/native.go` +
`v2/constructors.go` (native `/api/chat` + Bearer + base URL), `v2/CLAUDE.md`
(DD#8: native API, not OpenAI-compat).
4. Via the gitea MCP, `steve/steveternet`: `kalimdor/orgrimmar/warhol-queue/`,
`kalimdor/orgrimmar/ratchet/`, `kalimdor/orgrimmar/mort/`, and
`kalimdor/orgrimmar/traefik/` (incl. `custom/`) for compose/Traefik/network
conventions. foreman lives at `kalimdor/orgrimmar/foreman/`. Mirror these
exactly; do not invent label syntax.
## The phases
Each `prompts/phase-N.md` is the detailed spec for that phase. For each phase, in
order: read `phase-N.md`, plan it internally, implement it, make the gates pass,
record progress, commit, then immediately continue to the next phase.
**Override:** the phase files open with "Plan, get approval, implement" — that was
written for a paste-one-at-a-time workflow. In *this* autonomous run, treat it as
"plan internally and proceed." Do not pause for approval at any phase boundary.
1. Scaffold, config, SQLite store, health, CI, Dockerfile.
2. Ollama target client + model poller + native passthrough + embedding bypass.
3. Durable queue + single worker + drain-by-model (replaces phase-2's chat gate).
4. Async `/jobs` + job IDs + state webhooks + artifacts.
5. Go client package (sync facade) + `llm.Foreman()` in go-llm.
6. Deploy: steveternet compose + Traefik, `.env.example`, deploy docs, model script.
## Per-phase loop (do this every phase, automatically)
- Implement to the phase spec and the ADRs.
- Run the gates; **all** must pass before moving on:
`go build ./...`, `go vet ./...`, `go test -race -count=1 ./...`, and
`go mod tidy` followed by `git diff --exit-code go.mod go.sum`.
- Append a dated entry to `progress.md` (what landed, what's next).
- Commit to the **foreman** repo with conventional-commit messages
(`feat:`, `test:`, `chore:`, `docs:`). Committing to foreman's main is fine.
- Continue to the next phase without pausing.
## Invariants to honor throughout (from the ADRs)
- **Two-slot runtime (ADR-0013):** the target runs `OLLAMA_MAX_LOADED_MODELS=2`:
an always-resident embedder (`FOREMAN_EMBED_MODEL`) plus one rotating worker
model. `/api/embed` (+ `/api/embeddings`) bypass the queue and run
concurrently; only `/api/chat` and `POST /jobs` are serialized through the
single worker. Worker-model concurrency is exactly 1 (ADR-0009).
- **NDJSON, not SSE (ADR-0012):** stream `/api/chat` as `application/x-ndjson`.
- **Env namespacing:** every config key is `FOREMAN_*` (incl.
`FOREMAN_OLLAMA_URL`, `FOREMAN_OLLAMA_TOKEN`). No bare `OLLAMA_*`.
- **Go 1.26** in `go.mod`, Dockerfile, and CI.
- Unreachable target = transient/recoverable, never fatal (ADR-0002).
## Cross-repo changes (phases 5 and 6)
The `llm.Foreman()` constructor (go-llm) and the steveternet `docker-compose.yml`
touch repos other than foreman. For those, **open a branch and a PR for my
review — do NOT commit to their main.** Report the branch names and PR links in
the final summary.
## When to stop vs. keep going
- Keep going through routine ambiguity. If you hit a decision not covered by
`CLAUDE.md` or an ADR, make the smallest reasonable choice, **record it as a new
ADR** (append-only, next number after 0013, update the index), and continue.
- **Only stop** for a true blocker: a gate you cannot make green after honest
effort, a repo/tool you cannot reach, or a required choice that would
contradict an accepted ADR or a scope guardrail. If you stop, say exactly why
and what you need.
## Definition of done (whole run)
- foreman fronts one configurable Ollama target; transparently proxies native
`/api/chat`, `/api/tags`, `/api/ps` (NDJSON streaming) so go-llm uses it as a
target with no provider changes; `/api/embed` bypasses the queue concurrently.
- Durable SQLite queue, single worker, drain-by-model; survives restart and
target sleep.
- `POST /jobs` returns a ULID job id; `queued→loading→working→done|failed` state
webhooks (at-least-once, optional HMAC); artifacts inline/fetch.
- A Go client package (sync facade over `/jobs`); `llm.Foreman()` branch/PR on
go-llm.
- CI green; container builds; steveternet compose + Traefik branch/PR.
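
The job lifecycle and optional webhook HMAC above can be sketched as follows. Everything here is an assumption made for illustration: the type and function names are hypothetical, the transition table follows the `queued→loading→working→done|failed` chain literally (whether `failed` is reachable from earlier states is left to the phase spec and ADRs), and HMAC-SHA256 with hex encoding is one plausible choice rather than a confirmed one.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// State is a job lifecycle state as named in the definition of done.
type State string

const (
	Queued  State = "queued"
	Loading State = "loading"
	Working State = "working"
	Done    State = "done"
	Failed  State = "failed"
)

// allowed encodes the chain literally; done and failed are terminal.
var allowed = map[State][]State{
	Queued:  {Loading},
	Loading: {Working},
	Working: {Done, Failed},
}

// canTransition reports whether a job may move from one state to another.
func canTransition(from, to State) bool {
	for _, s := range allowed[from] {
		if s == to {
			return true
		}
	}
	return false
}

// sign returns a hex HMAC-SHA256 of the webhook body (assumed scheme).
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret, body []byte, sig string) bool {
	got, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), got)
}

func main() {
	// The payload shape is hypothetical.
	body := []byte(`{"job_id":"example","state":"working"}`)
	secret := []byte("example-secret")
	sig := sign(secret, body)

	fmt.Println(canTransition(Queued, Loading), canTransition(Queued, Done))
	fmt.Println(verify(secret, body, sig), verify([]byte("other"), body, sig))
}
```

Because delivery is at-least-once, a receiver may see the same state event more than once; one reasonable approach is to treat the job id plus state as an idempotency key and ignore repeats.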
## Start now
Read the sources, then begin Phase 1 and run straight through to a finished
deliverable. When done, give me: a summary of what was built per phase, the
go-llm and steveternet PR links, any ADRs you added, and a copy-pasteable
end-to-end smoke-test checklist (pull models on the Mac → set
`OLLAMA_MAX_LOADED_MODELS=2` → deploy foreman → go-llm chat → concurrent
`/api/embed` → `POST /jobs` with a webhook).