steve 0526bada90 docs: land prior ADR + prompt updates

Commit pre-existing uncommitted working-tree changes that predate the
license/public-readiness work — NOT authored in this session, just flushed so
they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013
(embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and
the phase-0..3 prompt revisions + prompts/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-26 20:33:39 -04:00

5.7 KiB


phase-0-kickoff.md — foreman autonomous build

You are building foreman end to end, in one autonomous run. Execute all six phases in order (1 → 2 → 3 → 4 → 5 → 6) and do not stop between them. The run ends when foreman is a working, deployable deliverable. Do not wait for my approval at phase boundaries — keep going until done or genuinely blocked.

foreman is a Go daemon that fronts one Ollama target and turns it into a queued, observable, Ollama-compatible job endpoint. It is a deliberately pared-down restart of a system (peon-overseer) that died of scope creep. Restraint is a feature: if a task seems to need distributed dispatch, leases, fair queueing, capacity budgets, an auth framework/SSO, a GUI, or multi-target support — stop, because that means the design is being violated.

Read these first (authoritative, in order)

1. CLAUDE.md: the operating manual and source of truth.
2. docs/adr/README.md, then every docs/adr/00NN-*.md (0001–0013). The ADRs are the why. Do not relitigate them.
3. Via the gitea MCP, steve/go-llm: v2/provider/provider.go (the Provider interface), v2/ollama/ollama.go + v2/ollama/native.go + v2/constructors.go (native /api/chat + Bearer + base URL), v2/CLAUDE.md (DD#8: native API, not OpenAI-compat).
4. Via the gitea MCP, steve/steveternet: kalimdor/orgrimmar/warhol-queue/, kalimdor/orgrimmar/ratchet/, kalimdor/orgrimmar/mort/, and kalimdor/orgrimmar/traefik/ (incl. custom/) for compose/Traefik/network conventions. foreman lives at kalimdor/orgrimmar/foreman/. Mirror these exactly; do not invent label syntax.

The phases

Each prompts/phase-N.md is the detailed spec for that phase. For each phase, in order: read phase-N.md, plan it internally, implement it, make the gates pass, record progress, commit, then immediately continue to the next phase.

Override: the phase files open with "Plan, get approval, implement" — that was written for a paste-one-at-a-time workflow. In this autonomous run, treat it as "plan internally and proceed." Do not pause for approval at any phase boundary.

1. Scaffold, config, SQLite store, health, CI, Dockerfile.
2. Ollama target client + model poller + native passthrough + embedding bypass.
3. Durable queue + single worker + drain-by-model (replaces phase-2's chat gate).
4. Async /jobs + job IDs + state webhooks + artifacts.
5. Go client package (sync facade) + llm.Foreman() in go-llm.
6. Deploy: steveternet compose + Traefik, .env.example, deploy docs, model script.
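
As an illustration of phase 2's native passthrough, here is a minimal sketch of NDJSON streaming in Go: one JSON object per line, flushed as it is written, with the application/x-ndjson content type that ADR-0012 requires. The chunk struct is a reduced stand-in for Ollama's streamed /api/chat objects, and streamChat, writeChunk, and render are hypothetical names, not foreman's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// chunk is a reduced stand-in for one streamed /api/chat object (assumption:
// only a few of the real fields are modelled here).
type chunk struct {
	Model   string  `json:"model"`
	Message message `json:"message"`
	Done    bool    `json:"done"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// writeChunk writes v as one JSON object followed by a newline, then flushes
// if the writer supports it so the client sees each chunk as it is produced.
func writeChunk(w io.Writer, v any) error {
	if err := json.NewEncoder(w).Encode(v); err != nil { // Encode appends '\n'
		return err
	}
	if f, ok := w.(http.Flusher); ok {
		f.Flush()
	}
	return nil
}

// streamChat is a hypothetical handler body: set the NDJSON content type,
// then emit the chunks one per line.
func streamChat(w http.ResponseWriter, chunks []chunk) {
	w.Header().Set("Content-Type", "application/x-ndjson")
	for _, c := range chunks {
		if err := writeChunk(w, c); err != nil {
			return // the client went away; nothing more to send
		}
	}
}

// render runs writeChunk against an in-memory buffer so the wire format can
// be inspected without a server.
func render(chunks []chunk) string {
	var b strings.Builder
	for _, c := range chunks {
		_ = writeChunk(&b, c)
	}
	return b.String()
}

func main() {
	fmt.Print(render([]chunk{
		{Model: "example", Message: message{Role: "assistant", Content: "Hi"}},
		{Model: "example", Done: true},
	}))
}
```

A real passthrough would copy the upstream response body line by line rather than re-encode it; the sketch only shows the framing and the flush.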

Per-phase loop (do this every phase, automatically)

1. Implement to the phase spec and the ADRs.
2. Run the gates; all must pass before moving on: go build ./..., go vet ./..., go test -race -count=1 ./..., and go mod tidy followed by git diff --exit-code go.mod go.sum.
3. Append a dated entry to progress.md (what landed, what's next).
4. Commit to the foreman repo with conventional-commit messages (feat:, test:, chore:, docs:). Committing to foreman's main is fine.
5. Continue to the next phase without pausing.
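
The gate sequence in the loop above can be sketched as a small Go runner that executes the commands in order and stops at the first failure. This is only an illustration: gates, runGates, and execGate are hypothetical helpers, and the program defaults to a dry run that prints the commands (pass run as the first argument to execute them in the current directory).

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// gates lists the commands from the per-phase loop, in the order given there.
func gates() [][]string {
	return [][]string{
		{"go", "build", "./..."},
		{"go", "vet", "./..."},
		{"go", "test", "-race", "-count=1", "./..."},
		{"go", "mod", "tidy"},
		{"git", "diff", "--exit-code", "go.mod", "go.sum"},
	}
}

// runGates runs each gate through run and stops at the first failure.
func runGates(run func(argv []string) error) error {
	for _, argv := range gates() {
		if err := run(argv); err != nil {
			return fmt.Errorf("gate %q failed: %w", strings.Join(argv, " "), err)
		}
	}
	return nil
}

// execGate runs one command in the current directory, streaming its output.
func execGate(argv []string) error {
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	// Dry run by default; "run" executes the gates for real.
	if len(os.Args) > 1 && os.Args[1] == "run" {
		if err := runGates(execGate); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		return
	}
	for _, g := range gates() {
		fmt.Println(strings.Join(g, " "))
	}
}
```

Running git diff --exit-code after go mod tidy is what turns tidy into a gate: tidy itself succeeds even when it rewrites go.mod or go.sum, while the diff exits non-zero if it changed anything.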

Invariants to honor throughout (from the ADRs)

Two-slot runtime (ADR-0013): the target runs OLLAMA_MAX_LOADED_MODELS=2 — an always-resident embedder (FOREMAN_EMBED_MODEL) plus one rotating worker model. /api/embed (+ /api/embeddings) bypass the queue and run concurrently; only /api/chat and POST /jobs are serialized through the single worker. Worker-model concurrency is exactly 1 (ADR-0009).
NDJSON, not SSE (ADR-0012): stream /api/chat as application/x-ndjson.
Env namespacing: every config key is FOREMAN_* (incl. FOREMAN_OLLAMA_URL, FOREMAN_OLLAMA_TOKEN). No bare OLLAMA_*.
Go 1.26 in go.mod, Dockerfile, and CI.
Unreachable target = transient/recoverable, never fatal (ADR-0002).
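
To make the two-slot invariant concrete, here is a minimal sketch of the routing split it implies: embedding endpoints skip the queue, /api/chat and POST /jobs go through a single worker slot, and everything else is handled without that slot. The lane names, classify, and workerSlot are assumptions for illustration, and the one-token semaphore only stands in for the durable queue the ADRs describe.

```go
package main

import (
	"fmt"
	"net/http"
)

// lane says how a request is scheduled. The names are illustrative only.
type lane int

const (
	laneOther  lane = iota // handled without the worker slot, e.g. /api/tags
	laneBypass             // embeddings: skip the queue, run concurrently
	laneWorker             // serialized through the single worker
)

// classify maps a request to a lane, following the ADR-0013 invariant as
// stated in this document.
func classify(method, path string) lane {
	switch {
	case path == "/api/embed" || path == "/api/embeddings":
		return laneBypass
	case path == "/api/chat":
		return laneWorker
	case method == http.MethodPost && path == "/jobs":
		return laneWorker
	default:
		return laneOther
	}
}

// workerSlot is a one-token semaphore: worker-model concurrency is exactly 1.
type workerSlot chan struct{}

func newWorkerSlot() workerSlot { return make(workerSlot, 1) }

// tryAcquire reports whether the slot was free and is now held by the caller.
func (s workerSlot) tryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees the slot; call it only after a successful tryAcquire.
func (s workerSlot) release() { <-s }

func main() {
	slot := newWorkerSlot()
	fmt.Println(classify(http.MethodPost, "/api/embed") == laneBypass)
	fmt.Println(classify(http.MethodPost, "/api/chat") == laneWorker)
	fmt.Println(slot.tryAcquire(), slot.tryAcquire()) // second attempt is refused
	slot.release()
}
```

In the real design a refused request would wait in the queue rather than be rejected; the sketch only shows that at most one worker-lane request holds the slot at a time while bypass-lane requests never touch it.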

Cross-repo changes (phases 5 and 6)

The llm.Foreman() constructor (go-llm) and the steveternet docker-compose.yml touch repos other than foreman. For those, open a branch and a PR for my review — do NOT commit to their main. Report the branch names and PR links in the final summary.

When to stop vs. keep going

Keep going through routine ambiguity. If you hit a decision not covered by CLAUDE.md or an ADR, make the smallest reasonable choice, record it as a new ADR (append-only, next number after 0013, update the index), and continue.
Only stop for a true blocker: a gate you cannot make green after honest effort, a repo/tool you cannot reach, or a required choice that would contradict an accepted ADR or a scope guardrail. If you stop, say exactly why and what you need.

Definition of done (whole run)

foreman fronts one configurable Ollama target; transparently proxies native /api/chat, /api/tags, /api/ps (NDJSON streaming) so go-llm uses it as a target with no provider changes; /api/embed bypasses the queue concurrently.
Durable SQLite queue, single worker, drain-by-model; survives restart and target sleep.
POST /jobs returns a ULID job id; queued→loading→working→done|failed state webhooks (at-least-once, optional HMAC); artifacts inline/fetch.
A Go client package (sync facade over /jobs); llm.Foreman() branch/PR on go-llm.
CI green; container builds; steveternet compose + Traefik branch/PR.
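
The job-state chain and the optional HMAC on webhooks can be sketched as follows. This is a sketch under stated assumptions: the document fixes only the state names and their order, so the event payload fields, the choice of HMAC-SHA256 with hex encoding, and the helper names are all hypothetical. The transition table encodes only the chain as written; whether a job may fail directly from queued or loading is not specified here.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type state string

const (
	queued  state = "queued"
	loading state = "loading"
	working state = "working"
	done    state = "done"
	failed  state = "failed"
)

// next encodes only the chain as written: queued→loading→working→done|failed.
var next = map[state][]state{
	queued:  {loading},
	loading: {working},
	working: {done, failed},
}

// canTransition reports whether from→to is a step in the chain.
func canTransition(from, to state) bool {
	for _, s := range next[from] {
		if s == to {
			return true
		}
	}
	return false
}

// event is a hypothetical webhook payload; the field names are assumptions.
type event struct {
	JobID string `json:"job_id"`
	State state  `json:"state"`
}

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares it in constant time.
func verify(secret, body []byte, sig string) bool {
	want, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	body, _ := json.Marshal(event{JobID: "example-job-id", State: working})
	sig := sign([]byte("example-secret"), body)
	fmt.Println(string(body))
	fmt.Println(verify([]byte("example-secret"), body, sig))
}
```

Because delivery is at-least-once, a receiver may see the same state event more than once and has to tolerate duplicates, for example by ignoring an event whose state it has already recorded for that job id.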

Start now

Read the sources, then begin Phase 1 and run straight through to a finished deliverable. When done, give me: a summary of what was built per phase, the go-llm and steveternet PR links, any ADRs you added, and a copy-pasteable end-to-end smoke-test checklist (pull models on the Mac → set OLLAMA_MAX_LOADED_MODELS=2 → deploy foreman → go-llm chat → concurrent /api/embed → POST /jobs with a webhook).
