docs: land prior ADR + prompt updates
Commit pre-existing uncommitted working-tree changes that predate the license/public-readiness work — NOT authored in this session, just flushed so they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013 (embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and the phase-0..3 prompt revisions + prompts/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
+94
-61
@@ -1,75 +1,108 @@
# phase-0-kickoff.md — foreman build kickoff
# phase-0-kickoff.md — foreman autonomous build

You are building **foreman**, a Go daemon that fronts one Ollama target and turns
it into a queued, observable, OpenAI/Ollama-compatible job endpoint. This is a
deliberately pared-down restart of a system (`peon-overseer`) that died of scope
creep. Restraint is a feature, not a limitation.
You are building **foreman** end to end, in **one autonomous run**. Execute all
six phases in order (1 → 2 → 3 → 4 → 5 → 6) and do not stop between them. The run
ends when foreman is a working, deployable deliverable. Do not wait for my
approval at phase boundaries — keep going until done or genuinely blocked.

foreman is a Go daemon that fronts one Ollama target and turns it into a queued,
observable, Ollama-compatible job endpoint. It is a deliberately pared-down
restart of a system (`peon-overseer`) that died of scope creep. Restraint is a
feature: if a task seems to need distributed dispatch, leases, fair queueing,
capacity budgets, an auth framework/SSO, a GUI, or multi-target support — stop,
because that means the design is being violated.

## Read these first (authoritative, in order)

1. `CLAUDE.md` in this repo — the operating manual. It is the source of truth for
architecture, stack, conventions, and the **out-of-scope guardrails**.
2. `docs/adr/README.md` then every `docs/adr/00NN-*.md`. The ADRs are the *why*.
Do not relitigate them; if you believe one is wrong, say so and propose a new
superseding ADR rather than silently diverging.
3. Via the **gitea MCP**, read the integration target — `steve/go-llm`:
`v2/provider/provider.go` (the `Provider` interface you must stay compatible
with), `v2/ollama/ollama.go` and `v2/constructors.go` (how `Ollama` /
`OllamaCloud` construct over native `/api/chat` + Bearer), and `v2/CLAUDE.md`
1. `CLAUDE.md` — the operating manual and source of truth.
2. `docs/adr/README.md`, then every `docs/adr/00NN-*.md` (0001–0013). The ADRs are
the *why*. Do not relitigate them.
3. Via the **gitea MCP**, `steve/go-llm`: `v2/provider/provider.go` (the
`Provider` interface), `v2/ollama/ollama.go` + `v2/ollama/native.go` +
`v2/constructors.go` (native `/api/chat` + Bearer + base URL), `v2/CLAUDE.md`
(DD#8: native API, not OpenAI-compat).
4. Via the gitea MCP, study deployment conventions in `steve/steveternet`:
`kalimdor/orgrimmar/warhol-queue/`, `kalimdor/orgrimmar/ratchet/`, and
`kalimdor/orgrimmar/mort/` for `docker-compose.yml` + `.env.example` patterns,
and `kalimdor/orgrimmar/traefik/` (incl. `custom/`) for the Traefik network
name, entrypoint, certresolver, and router/label conventions. foreman will
live at `kalimdor/orgrimmar/foreman/`. **Mirror these exactly; do not invent
label syntax.**
4. Via the gitea MCP, `steve/steveternet`: `kalimdor/orgrimmar/warhol-queue/`,
`kalimdor/orgrimmar/ratchet/`, `kalimdor/orgrimmar/mort/`, and
`kalimdor/orgrimmar/traefik/` (incl. `custom/`) for compose/Traefik/network
conventions. foreman lives at `kalimdor/orgrimmar/foreman/`. Mirror these
exactly; do not invent label syntax.

## Working agreement (opusplan)
## The phases

- **Plan before code.** For each phase, produce a plan and wait for my approval
before implementing. Do not run ahead to later phases.
- **One phase at a time**, in order. Each phase is its own prompt I will paste.
- After every phase: `go build ./...`, `go vet ./...`, `go test -race -count=1 ./...`
must all pass. Append a dated entry to `progress.md`. Commit on a phase branch
with conventional-commit messages (`feat:`, `chore:`, `test:`, `docs:`).
- **Ask before assuming.** If a detail is ambiguous and not settled by CLAUDE.md
or an ADR, ask me — don't guess.
- **Propose an ADR** (append-only, next number) for any architectural decision
not already covered. Keep `docs/adr/README.md`'s index current.
- Keep dependencies minimal; match `go-llm` house style (tabs; wrap errors with
`fmt.Errorf("%w: ...", err)`; imports stdlib → third-party → internal). SQLite
via `modernc.org/sqlite` (pure-Go, `CGO_ENABLED=0`). No UI.
- **Refuse scope creep.** No distributed dispatch, leases, fair queueing,
capacity budgets, auth framework/SSO, GUI, or multi-target support. If a task
seems to need them, stop and flag it — that means the design is being violated.
Each `prompts/phase-N.md` is the detailed spec for that phase. For each phase, in
order: read `phase-N.md`, plan it internally, implement it, make the gates pass,
record progress, commit, then immediately continue to the next phase.

## Definition of done (whole project)

A deployable daemon that:
- fronts one configurable Ollama target and transparently proxies native
`/api/chat`, `/api/tags`, `/api/ps` (so `go-llm` uses the Mac as a target with
no provider changes), including streaming;
- runs a durable SQLite-backed queue with a single worker and drain-by-model
scheduling, surviving restarts and target sleep;
- exposes an async `POST /jobs` surface returning a job ID, with
`queued→loading→working→done/failed` state webhooks and artifact delivery;
- ships a Go client package (synchronous facade over the async surface);
- passes CI on Gitea, builds as a container, and deploys via a steveternet
`docker-compose.yml` behind Traefik.

## Phase map
**Override:** the phase files open with "Plan, get approval, implement" — that was
written for a paste-one-at-a-time workflow. In *this* autonomous run, treat it as
"plan internally and proceed." Do not pause for approval at any phase boundary.

1. Scaffold, config, SQLite store, health, CI, Dockerfile.
2. Ollama target client + model poller + native passthrough (the go-llm target).
3. Durable queue + single worker + drain-by-model.
2. Ollama target client + model poller + native passthrough + embedding bypass.
3. Durable queue + single worker + drain-by-model (replaces phase-2's chat gate).
4. Async `/jobs` + job IDs + state webhooks + artifacts.
5. Go client package (sync facade) + `llm.Foreman()` in go-llm.
6. Deploy: steveternet compose + Traefik, `.env.example`, deploy docs, model-pull script.
6. Deploy: steveternet compose + Traefik, `.env.example`, deploy docs, model script.
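
Phase 3 names "drain-by-model" without defining it here; the ADRs are the authority. As a rough illustration only, the sketch below assumes it means "finish every queued job for the model that is already loaded before paying for a model swap". `Job` and `nextJob` are hypothetical names, and the queue is an in-memory slice rather than the SQLite store.

```go
package main

import "fmt"

// Job is a hypothetical queued request with only the fields the
// scheduler needs.
type Job struct {
	ID    string
	Model string
}

// nextJob returns the index of the job to run next from a FIFO-ordered
// queue, or -1 if the queue is empty.
// Assumption: jobs for the currently loaded model are drained first.
func nextJob(queue []Job, loaded string) int {
	if len(queue) == 0 {
		return -1
	}
	for i, j := range queue {
		if j.Model == loaded {
			return i // oldest job that avoids a model swap
		}
	}
	return 0 // nothing matches the loaded model: take the oldest job
}

func main() {
	queue := []Job{
		{ID: "a", Model: "llama"},
		{ID: "b", Model: "qwen"},
		{ID: "c", Model: "llama"},
		{ID: "d", Model: "qwen"},
	}
	loaded := "qwen"
	for len(queue) > 0 {
		i := nextJob(queue, loaded)
		j := queue[i]
		loaded = j.Model // running a job leaves its model loaded
		fmt.Println(j.ID, j.Model)
		queue = append(queue[:i], queue[i+1:]...)
	}
}
```

With `qwen` loaded at the start, the jobs run in the order b, d, a, c: one swap instead of the three that strict FIFO would cost. A production scheduler would likely also need a starvation bound, which this sketch leaves out.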

## Your task right now
## Per-phase loop (do this every phase, automatically)

Confirm you've read the sources above, briefly restate the architecture in your
own words (so I can check your understanding), flag anything in the ADRs you'd
push back on, then produce a **detailed plan for Phase 1 only**. Do not write code
yet. Stop for my approval.
- Implement to the phase spec and the ADRs.
- Run the gates; **all** must pass before moving on:
`go build ./...`, `go vet ./...`, `go test -race -count=1 ./...`, and
`go mod tidy` followed by `git diff --exit-code go.mod go.sum`.
- Append a dated entry to `progress.md` (what landed, what's next).
- Commit to the **foreman** repo with conventional-commit messages
(`feat:`, `test:`, `chore:`, `docs:`). Committing to foreman's main is fine.
- Continue to the next phase without pausing.

## Invariants to honor throughout (from the ADRs)

- **Two-slot runtime (ADR-0013):** the target runs `OLLAMA_MAX_LOADED_MODELS=2` —
an always-resident embedder (`FOREMAN_EMBED_MODEL`) plus one rotating worker
model. `/api/embed` (+ `/api/embeddings`) bypass the queue and run
concurrently; only `/api/chat` and `POST /jobs` are serialized through the
single worker. Worker-model concurrency is exactly 1 (ADR-0009).
- **NDJSON, not SSE (ADR-0012):** stream `/api/chat` as `application/x-ndjson`.
- **Env namespacing:** every config key is `FOREMAN_*` (incl.
`FOREMAN_OLLAMA_URL`, `FOREMAN_OLLAMA_TOKEN`). No bare `OLLAMA_*`.
- **Go 1.26** in `go.mod`, Dockerfile, and CI.
- Unreachable target = transient/recoverable, never fatal (ADR-0002).

## Cross-repo changes (phases 5 and 6)

The `llm.Foreman()` constructor (go-llm) and the steveternet `docker-compose.yml`
touch repos other than foreman. For those, **open a branch and a PR for my
review — do NOT commit to their main.** Report the branch names and PR links in
the final summary.

## When to stop vs. keep going

- Keep going through routine ambiguity. If you hit a decision not covered by
`CLAUDE.md` or an ADR, make the smallest reasonable choice, **record it as a new
ADR** (append-only, next number after 0013, update the index), and continue.
- **Only stop** for a true blocker: a gate you cannot make green after honest
effort, a repo/tool you cannot reach, or a required choice that would
contradict an accepted ADR or a scope guardrail. If you stop, say exactly why
and what you need.

## Definition of done (whole run)

- foreman fronts one configurable Ollama target; transparently proxies native
`/api/chat`, `/api/tags`, `/api/ps` (NDJSON streaming) so go-llm uses it as a
target with no provider changes; `/api/embed` bypasses the queue concurrently.
- Durable SQLite queue, single worker, drain-by-model; survives restart and
target sleep.
- `POST /jobs` returns a ULID job id; `queued→loading→working→done|failed` state
webhooks (at-least-once, optional HMAC); artifacts inline/fetch.
- A Go client package (sync facade over `/jobs`); `llm.Foreman()` branch/PR on
go-llm.
- CI green; container builds; steveternet compose + Traefik branch/PR.
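
The job lifecycle and the optional webhook HMAC can be sketched together. Everything beyond the five state names is an assumption made for illustration: that `failed` is reachable from any non-terminal state, that the signature is hex-encoded HMAC-SHA256 over the raw body, and the helper names `canTransition`, `sign`, and `verify`. The phase-4 spec and ADRs decide the real algorithm, header, and payload.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// State is one step in the queued→loading→working→done|failed lifecycle.
type State string

const (
	Queued  State = "queued"
	Loading State = "loading"
	Working State = "working"
	Done    State = "done"
	Failed  State = "failed"
)

// allowed lists the legal next states. Assumption: a job may fail from any
// non-terminal state; done and failed are terminal.
var allowed = map[State][]State{
	Queued:  {Loading, Failed},
	Loading: {Working, Failed},
	Working: {Done, Failed},
}

// canTransition reports whether a job may move from one state to another.
func canTransition(from, to State) bool {
	for _, s := range allowed[from] {
		if s == to {
			return true
		}
	}
	return false
}

// sign returns a hex HMAC-SHA256 of body (assumed scheme, see lead-in).
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret, body []byte, sig string) bool {
	got, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), got)
}

func main() {
	secret := []byte("example-secret")                 // illustrative only
	body := []byte(`{"id":"job-1","state":"working"}`) // hypothetical payload
	sig := sign(secret, body)
	fmt.Println(canTransition(Queued, Loading), canTransition(Done, Working))
	fmt.Println(verify(secret, body, sig), verify(secret, []byte("tampered"), sig))
}
```

Because delivery is at-least-once, a receiver may see the same state webhook more than once, so it should treat repeated deliveries for the same job and state as no-ops.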

## Start now

Read the sources, then begin Phase 1 and run straight through to a finished
deliverable. When done, give me: a summary of what was built per phase, the
go-llm and steveternet PR links, any ADRs you added, and a copy-pasteable
end-to-end smoke-test checklist (pull models on the Mac → set
`OLLAMA_MAX_LOADED_MODELS=2` → deploy foreman → go-llm chat → concurrent
`/api/embed` → `POST /jobs` with a webhook).