diff --git a/prompts/phase-0-kickoff.md b/prompts/phase-0-kickoff.md
new file mode 100644
index 0000000..25b1a5a
--- /dev/null
+++ b/prompts/phase-0-kickoff.md
@@ -0,0 +1,75 @@
+# phase-0-kickoff.md — foreman build kickoff
+
+You are building **foreman**, a Go daemon that fronts one Ollama target and turns
+it into a queued, observable, OpenAI/Ollama-compatible job endpoint. This is a
+deliberately pared-down restart of a system (`peon-overseer`) that died of scope
+creep. Restraint is a feature, not a limitation.
+
+## Read these first (authoritative, in order)
+
+1. `CLAUDE.md` in this repo — the operating manual. It is the source of truth for
+   architecture, stack, conventions, and the **out-of-scope guardrails**.
+2. `docs/adr/README.md` then every `docs/adr/00NN-*.md`. The ADRs are the *why*.
+   Do not relitigate them; if you believe one is wrong, say so and propose a new
+   superseding ADR rather than silently diverging.
+3. Via the **gitea MCP**, read the integration target — `steve/go-llm`:
+   `v2/provider/provider.go` (the `Provider` interface you must stay compatible
+   with), `v2/ollama/ollama.go` and `v2/constructors.go` (how `Ollama` /
+   `OllamaCloud` construct over native `/api/chat` + Bearer), and `v2/CLAUDE.md`
+   (DD#8: native API, not OpenAI-compat).
+4. Via the gitea MCP, study deployment conventions in `steve/steveternet`:
+   `kalimdor/orgrimmar/warhol-queue/`, `kalimdor/orgrimmar/ratchet/`, and
+   `kalimdor/orgrimmar/mort/` for `docker-compose.yml` + `.env.example` patterns,
+   and `kalimdor/orgrimmar/traefik/` (incl. `custom/`) for the Traefik network
+   name, entrypoint, certresolver, and router/label conventions. foreman will
+   live at `kalimdor/orgrimmar/foreman/`. **Mirror these exactly; do not invent
+   label syntax.**
+
+## Working agreement (opusplan)
+
+- **Plan before code.** For each phase, produce a plan and wait for my approval
+  before implementing. Do not run ahead to later phases.
+- **One phase at a time**, in order. Each phase is its own prompt I will paste.
+- After every phase: `go build ./...`, `go vet ./...`, `go test -race -count=1 ./...`
+  must all pass. Append a dated entry to `progress.md`. Commit on a phase branch
+  with conventional-commit messages (`feat:`, `chore:`, `test:`, `docs:`).
+- **Ask before assuming.** If a detail is ambiguous and not settled by CLAUDE.md
+  or an ADR, ask me — don't guess.
+- **Propose an ADR** (append-only, next number) for any architectural decision
+  not already covered. Keep `docs/adr/README.md`'s index current.
+- Keep dependencies minimal; match `go-llm` house style (tabs; wrap errors with
+  `fmt.Errorf("%w: ...", err)`; imports stdlib → third-party → internal). SQLite
+  via `modernc.org/sqlite` (pure-Go, `CGO_ENABLED=0`). No UI.
+- **Refuse scope creep.** No distributed dispatch, leases, fair queueing,
+  capacity budgets, auth framework/SSO, GUI, or multi-target support. If a task
+  seems to need them, stop and flag it — that means the design is being violated.
+
+## Definition of done (whole project)
+
+A deployable daemon that:
+- fronts one configurable Ollama target and transparently proxies native
+  `/api/chat`, `/api/tags`, `/api/ps` (so `go-llm` uses the Mac as a target with
+  no provider changes), including streaming;
+- runs a durable SQLite-backed queue with a single worker and drain-by-model
+  scheduling, surviving restarts and target sleep;
+- exposes an async `POST /jobs` surface returning a job ID, with
+  `queued→loading→working→done/failed` state webhooks and artifact delivery;
+- ships a Go client package (synchronous facade over the async surface);
+- passes CI on Gitea, builds as a container, and deploys via a steveternet
+  `docker-compose.yml` behind Traefik.
+
+## Phase map
+
+1. Scaffold, config, SQLite store, health, CI, Dockerfile.
+2. Ollama target client + model poller + native passthrough (the go-llm target).
+3. Durable queue + single worker + drain-by-model.
+4. Async `/jobs` + job IDs + state webhooks + artifacts.
+5. Go client package (sync facade) + `llm.Foreman()` in go-llm.
+6. Deploy: steveternet compose + Traefik, `.env.example`, deploy docs, model-pull script.
+
+## Your task right now
+
+Confirm you've read the sources above, briefly restate the architecture in your
+own words (so I can check your understanding), flag anything in the ADRs you'd
+push back on, then produce a **detailed plan for Phase 1 only**. Do not write code
+yet. Stop for my approval.
diff --git a/prompts/phase-1.md b/prompts/phase-1.md
new file mode 100644
index 0000000..b0ee4dd
--- /dev/null
+++ b/prompts/phase-1.md
@@ -0,0 +1,86 @@
+# phase-1.md — Scaffold, config, store, health, CI, Dockerfile
+
+Re-ground: skim `CLAUDE.md`, `docs/adr/` (esp. 0002 placement, 0008 SQLite,
+0010 security), and `progress.md`. Plan, get my approval, then implement.
+
+## Objective
+
+A buildable, testable, containerized skeleton with the persistence layer and a
+health endpoint — no Ollama logic yet.
+
+## Tasks
+
+- Initialize the module: `gitea.stevedudenhoeffer.com/steve/foreman`, Go 1.26.
+- Layout per CLAUDE.md: `cmd/foreman/main.go` (single binary, subcommand
+  skeleton: `serve`, plus stubs for `submit`, `jobs`, `ps`), `internal/config`,
+  `internal/store`, `internal/server`. Don't create empty packages for later
+  phases.
+- `internal/config`: load from env into a struct — `FOREMAN_ADDR` (listen addr,
+  default `:8080`), `FOREMAN_OLLAMA_URL` (target, required), `FOREMAN_TOKEN`
+  (optional inbound bearer), `FOREMAN_DB_PATH`, `FOREMAN_POLL_INTERVAL`. Provide
+  a `.env.example` documenting every key.
+- `internal/store`: SQLite via `modernc.org/sqlite`, WAL mode, with an embedded
+  migration for the `jobs` and `artifacts` tables (schema sketch in ADR-0008 /
+  ADR-0006). Include open/close, migrate-on-start, and basic CRUD with tests
+  (use a temp-file DB per test).
+- `internal/server`: `net/http` server with `GET /healthz` (returns ok + a
+  placeholder degraded flag for later) and optional bearer-token middleware
+  (validate `Authorization: Bearer` only when `FOREMAN_TOKEN` is set).
+- `.gitea/workflows/ci.yaml` — mirror `go-llm`'s, single-module:
+
+```yaml
+name: CI
+on:
+  push: { branches: ["*"] }
+  pull_request: { branches: ["*"] }
+jobs:
+  build:
+    name: Build & Test
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-go@v5
+        with: { go-version: "1.26" }
+      - run: go mod download
+      - run: go build ./...
+      - run: go vet ./...
+      - run: go test -race -count=1 ./...
+  tidy:
+    name: Tidy
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-go@v5
+        with: { go-version: "1.26" }
+      - run: |
+          go mod tidy
+          git diff --exit-code go.mod go.sum
+```
+
+- `Dockerfile` — multi-stage, pure-Go static build (works because modernc is
+  CGO-free):
+
+```dockerfile
+FROM golang:1.26 AS build
+WORKDIR /src
+COPY go.mod go.sum ./
+RUN go mod download
+COPY . .
+RUN CGO_ENABLED=0 go build -o /out/foreman ./cmd/foreman
+FROM gcr.io/distroless/static-debian12
+COPY --from=build /out/foreman /foreman
+EXPOSE 8080
+ENTRYPOINT ["/foreman", "serve"]
+```
+
+- `.gitignore` (use the project's), `README.md` (what + quickstart), and seed
+  `progress.md` with the M0 entry.
+
+## Definition of done
+
+- `go build/vet/test -race` all green; `store` has passing tests.
+- `docker build .` succeeds; running the image serves `GET /healthz`.
+- `cmd/foreman serve` boots, reads config from env, opens/migrates the DB.
+
+Wrap up: append to `progress.md`, commit on `phase-1-scaffold`, summarize what's
+done and what Phase 2 will need.
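One way the `internal/config` task in phase 1 might look, offered as a minimal sketch rather than the required implementation. The env keys, the `:8080` default, and the required target URL come from the task list, and the 30s poll default is the one Phase 2 states. The `Config` field names, the `Load(getenv)` signature, and the `foreman.db` default path are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config holds the settings named in the Phase 1 task list.
// Field names are illustrative, not taken from CLAUDE.md or an ADR.
type Config struct {
	Addr         string        // FOREMAN_ADDR, listen address
	OllamaURL    string        // FOREMAN_OLLAMA_URL, required
	Token        string        // FOREMAN_TOKEN, optional inbound bearer
	DBPath       string        // FOREMAN_DB_PATH
	PollInterval time.Duration // FOREMAN_POLL_INTERVAL
}

// ErrMissingOllamaURL is returned when the required target URL is unset.
var ErrMissingOllamaURL = errors.New("FOREMAN_OLLAMA_URL is required")

// orDefault returns value, or fallback when value is empty.
func orDefault(value, fallback string) string {
	if value == "" {
		return fallback
	}
	return value
}

// Load builds a Config from getenv. Passing the lookup in (os.Getenv in
// production, a map in tests) keeps the loader free of global state.
func Load(getenv func(string) string) (Config, error) {
	cfg := Config{
		Addr:         orDefault(getenv("FOREMAN_ADDR"), ":8080"),
		OllamaURL:    getenv("FOREMAN_OLLAMA_URL"),
		Token:        getenv("FOREMAN_TOKEN"),
		DBPath:       orDefault(getenv("FOREMAN_DB_PATH"), "foreman.db"), // assumed default
		PollInterval: 30 * time.Second,
	}
	if cfg.OllamaURL == "" {
		return Config{}, ErrMissingOllamaURL
	}
	if raw := getenv("FOREMAN_POLL_INTERVAL"); raw != "" {
		d, err := time.ParseDuration(raw)
		if err != nil {
			return Config{}, fmt.Errorf("%w: parsing FOREMAN_POLL_INTERVAL %q", err, raw)
		}
		if d <= 0 {
			return Config{}, fmt.Errorf("FOREMAN_POLL_INTERVAL must be positive, got %s", d)
		}
		cfg.PollInterval = d
	}
	return cfg, nil
}

func main() {
	// A map stands in for the process environment so the example is hermetic.
	env := map[string]string{
		"FOREMAN_OLLAMA_URL":    "http://127.0.0.1:11434",
		"FOREMAN_POLL_INTERVAL": "10s",
	}
	cfg, err := Load(func(key string) string { return env[key] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("addr=%s target=%s db=%s poll=%s\n", cfg.Addr, cfg.OllamaURL, cfg.DBPath, cfg.PollInterval)
}
```

In `cmd/foreman serve` the call would presumably be `Load(os.Getenv)`; taking the lookup as a parameter is only a testing convenience and can be dropped if the house style prefers reading the environment directly.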
diff --git a/prompts/phase-2.md b/prompts/phase-2.md
new file mode 100644
index 0000000..d1dd4af
--- /dev/null
+++ b/prompts/phase-2.md
@@ -0,0 +1,42 @@
+# phase-2.md — Ollama target client, model poller, native passthrough
+
+Re-ground: `CLAUDE.md` + ADR-0003 (API surface), 0007 (model polling), 0012
+(streaming), 0002 (unreachable = transient). Plan, get approval, implement.
+
+## Objective
+
+Make foreman a working transparent front for its Ollama target — enough that
+`go-llm` can use the Mac as a target *today*, before any queue exists. (Phase 3
+will move this through the queue; here it can proxy directly.)
+
+## Tasks
+
+- `internal/ollama`: a small client to the target (`FOREMAN_OLLAMA_URL`) behind
+  an interface, covering `POST /api/chat` (streaming and non-streaming),
+  `GET /api/tags`, `GET /api/ps`. Attach the outbound bearer if configured. Wrap
+  errors; classify connection failures distinctly (Phase 3 needs that signal).
+- Model poller (goroutine): poll `/api/tags` every `FOREMAN_POLL_INTERVAL`
+  (default 30s) into an in-memory inventory with a mutex; track last-poll time
+  and a degraded flag. On target unreachable, retain last-known inventory and set
+  degraded — do not clear it. Wire degraded state into `/healthz`.
+- Passthrough handlers in `internal/server`:
+  - `GET /api/tags` and `GET /api/ps` served from the poller/target.
+  - `POST /api/chat`: validate the requested model against the inventory (one
+    re-poll on miss, then 4xx if still absent); proxy to the target. Support
+    streaming faithfully (stream the target's chunks straight through; set the
+    right content type). For now this may call the target directly — no queue.
+- Tests: a stub HTTP server standing in for Ollama; assert tags/ps proxy,
+  model validation rejects unknown models, streaming passes chunks through, and
+  the poller flips degraded on target failure and recovers.
+
+## Definition of done
+
+- `go build/vet/test -race` green.
+- Against a real or stubbed Ollama: `curl .../api/tags` returns the inventory;
+  a non-streaming and a streaming `/api/chat` both work end-to-end.
+- Acceptance: from a scratch Go program, `llm.Ollama(llm.WithBaseURL("http://:8080"))`
+  (or `llm.OllamaCloud(token, WithBaseURL(...))` if a token is set) completes a
+  chat through foreman. Note this in `progress.md`.
+
+Wrap up: `progress.md`, commit on `phase-2-passthrough`, note what Phase 3 changes
+(routing this through the queue).
diff --git a/prompts/phase-3.md b/prompts/phase-3.md
new file mode 100644
index 0000000..1a95f51
--- /dev/null
+++ b/prompts/phase-3.md
@@ -0,0 +1,40 @@
+# phase-3.md — Durable queue, single worker, drain-by-model
+
+Re-ground: `CLAUDE.md` + ADR-0009 (single worker / drain-by-model), 0008 (queue),
+0004 (lifecycle/retry). Plan, get approval, implement.
+
+## Objective
+
+Route execution through the SQLite queue with exactly one worker and
+drain-by-model scheduling. The synchronous passthrough from Phase 2 now enqueues
+and blocks on completion instead of calling the target directly.
+
+## Tasks
+
+- Promote chat requests to persisted jobs: every `/api/chat` call creates a `jobs`
+  row (state `queued`), and the handler blocks until that job reaches a terminal
+  state, then writes the response. Assign a **ULID** as the job id now (used
+  everywhere in Phase 4).
+- `internal/worker`: a single worker loop (concurrency 1). Select the next job
+  with `ORDER BY (model != :current_resident), created_at` so all jobs for the
+  currently-resident model (from `/api/ps`) drain before a swap. Transition
+  `queued→loading→working→done`. Pin residency with Ollama `keep_alive`.
+- Retry semantics (ADR-0004): a connection failure to the target re-queues the
+  job with backoff and increments `attempt`; exceeding a bounded max moves it to
+  `failed` with the last error stored. Never auto-fail on a single transient
+  error. Jobs survive process restart (resume `queued`/in-flight on boot).
+- Tests: against the stub Ollama — jobs persist and execute serially; a sequence
+  mixing two models drains by model (assert the swap happens once, not per job);
+  a flapping target causes retry-then-success without data loss; restart mid-queue
+  resumes cleanly.
+
+## Definition of done
+
+- `go build/vet/test -race` green.
+- The Phase 2 acceptance (go-llm completes a chat) still passes, now served
+  through the queue.
+- Demonstrable: enqueue several jobs across two models and observe drain-by-model
+  ordering in logs; kill and restart foreman mid-queue and watch it resume.
+
+Wrap up: `progress.md`, commit on `phase-3-queue`. M0 is effectively complete here
+— note that. Phase 4 adds the async surface on top of this same engine.
diff --git a/prompts/phase-4.md b/prompts/phase-4.md
new file mode 100644
index 0000000..f9d83e3
--- /dev/null
+++ b/prompts/phase-4.md
@@ -0,0 +1,42 @@
+# phase-4.md — Async /jobs, job IDs, state webhooks, artifacts
+
+Re-ground: `CLAUDE.md` + ADR-0004 (async surface), 0005 (webhook protocol), 0006
+(artifacts). Plan, get approval, implement. **This phase delivers the headline
+"queue & webhooks" capability — get it genuinely working end-to-end.**
+
+## Objective
+
+A fire-and-forget async surface over the Phase 3 engine: submit a job, get an ID
+immediately, receive state updates and the final result/artifacts by webhook.
+
+## Tasks
+
+- `POST /jobs`: body is a native-chat payload plus optional `state_webhook_url`
+  (and optional HMAC secret usage per config). Enqueue (reusing the Phase 3
+  engine), return `202` with `{ "job_id": "" }` immediately.
+- `GET /jobs/{id}`: current state, result, error, and artifact metadata (the
+  recovery/poll path for missed webhooks).
+- `internal/webhook`: on each state transition (`queued→loading→working→done`,
+  plus `failed`), POST the event JSON from ADR-0005 to `state_webhook_url`.
+  At-least-once with bounded retry + backoff; never let a flaky receiver block or
+  fail the job. If a webhook secret is configured, sign the body with
+  HMAC-SHA256 in `X-Foreman-Signature`.
+- Artifacts (ADR-0006): persist the completion as artifact `completion`; deliver
+  artifacts inline in the `done` event under a size threshold (default ~256KB),
+  otherwise send metadata + a URL and serve bytes at
+  `GET /jobs/{id}/artifacts/{name}`. Add a TTL prune sweep for jobs+artifacts.
+- Decide and record (new ADR if needed) whether `state_webhook_url` is also
+  honored when present on the sync `/api/chat` path, or only on `/jobs`.
+- Tests: spin up an in-test webhook receiver; assert the full lifecycle fires in
+  order with correct payloads, idempotency keys (`job_id`+`state`) are present,
+  a 500-ing receiver triggers retries without affecting job state, large
+  artifacts go by URL and small ones inline, and HMAC signatures verify.
+
+## Definition of done
+
+- `go build/vet/test -race` green.
+- End-to-end demo: `POST /jobs` with a `state_webhook_url` pointed at a tiny local
+  listener; observe `queued→loading→working→done` plus the completion artifact;
+  confirm `GET /jobs/{id}` reconciles after a deliberately dropped webhook.
+
+Wrap up: `progress.md`, commit on `phase-4-async-webhooks`. M1 core is complete.
diff --git a/prompts/phase-5.md b/prompts/phase-5.md
new file mode 100644
index 0000000..ac71fd1
--- /dev/null
+++ b/prompts/phase-5.md
@@ -0,0 +1,41 @@
+# phase-5.md — Go client package + llm.Foreman() in go-llm
+
+Re-ground: `CLAUDE.md` + ADR-0011 (integration levels). Plan, get approval,
+implement. Note this phase touches **two repos**.
+
+## Objective
+
+Make foreman ergonomic from Go: a synchronous client over the async surface
+(Level 1), and the trivial `go-llm` target constructor (Level 0).
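Phase 4 signs webhook bodies with HMAC-SHA256 in `X-Foreman-Signature`, and the synchronous client has to verify that signature when a secret is set. A minimal sketch of both halves follows, assuming a hex-encoded digest computed over the raw request body. The real encoding and event JSON are whatever ADR-0005 specifies, which is not reproduced in these prompts, so `Sign`, `Verify`, and the sample payload are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Hex encoding is an assumption; ADR-0005 may specify another form.
func Sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify reports whether signature is a valid hex HMAC-SHA256 of body.
// hmac.Equal compares in constant time, which avoids leaking how many
// leading bytes of a forged signature were correct.
func Verify(secret, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	// Hypothetical event body; only job_id and state are named in the prompts.
	body := []byte(`{"job_id":"example","state":"done"}`)

	sig := Sign(secret, body)
	fmt.Println("X-Foreman-Signature:", sig)
	fmt.Println("valid:", Verify(secret, body, sig))
	fmt.Println("tampered:", Verify(secret, []byte(`{"job_id":"example","state":"failed"}`), sig))
}
```

The receiver should verify against the exact bytes it read from the request, before any JSON decoding, since re-encoding the payload can change whitespace or key order and invalidate the signature.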
+
+## Tasks — foreman repo
+
+- `client/` (the one public package): a `foreman.Client` with a synchronous
+  `Complete(ctx, req) (Result, error)` that submits to `POST /jobs` with a
+  `state_webhook_url` pointed at an ephemeral receiver it stands up, blocks until
+  `done`, and returns the result + artifacts. Requirements:
+  - Fall back to polling `GET /jobs/{id}` if it can't bind/receive callbacks
+    (configurable; needed when foreman can't reach the caller).
+  - Respect `ctx` cancellation/deadline; clean up the receiver and (optionally)
+    best-effort cancel the job.
+  - Constructor takes base URL + optional token; verify HMAC if a secret is set.
+- Tests against a stubbed foreman (or the real handlers): happy path returns the
+  artifact; polling-fallback path works; context timeout is honored.
+
+## Tasks — go-llm repo (via gitea MCP; open a branch/PR, do not push to main)
+
+- Add the `Foreman` constructor to `v2/constructors.go`, delegating to the ollama
+  provider (exact form is specified in ADR-0011). Update `v2/CLAUDE.md` with the
+  DD#9 entry. Keep it a thin pass-through (Level 0); do **not** build a dedicated
+  provider yet.
+- Confirm it compiles and `go vet`/tests pass in the `v2` module.
+
+## Definition of done
+
+- foreman: `go build/vet/test -race` green; `client.Complete()` returns a real
+  result against a running foreman.
+- go-llm: `llm.Foreman("http://:8080", token).Model("qwen3:30b")`
+  completes a chat. Changes are on a branch/PR for review, not committed to main.
+
+Wrap up: `progress.md`, commit `client/` on `phase-5-client`; report the go-llm
+branch name and PR link for me to review/merge.
diff --git a/prompts/phase-6.md b/prompts/phase-6.md
new file mode 100644
index 0000000..db33e69
--- /dev/null
+++ b/prompts/phase-6.md
@@ -0,0 +1,57 @@
+# phase-6.md — Deploy: steveternet compose + Traefik, env, docs, model script
+
+Re-ground: `CLAUDE.md` + ADR-0002 (placement), 0010 (security). Plan, get
+approval, implement. This phase touches **two repos** and must mirror existing
+steveternet conventions — read them, don't invent.
+
+## Objective
+
+Make foreman deployable on orgrimmar via Komodo, exposed through Traefik, with
+its model roster and operational notes documented.
+
+## Tasks — read first (gitea MCP, steve/steveternet)
+
+Study these for the exact conventions (network name, entrypoint, certresolver,
+router/service label format, restart policy, `.env` usage):
+`kalimdor/orgrimmar/warhol-queue/{docker-compose.yml,.env.example}`,
+`kalimdor/orgrimmar/ratchet/docker-compose.yml`,
+`kalimdor/orgrimmar/mort/docker-compose.yml`, and
+`kalimdor/orgrimmar/traefik/` (incl. `custom/`).
+
+## Tasks — foreman repo
+
+- Finalize the `Dockerfile` from Phase 1 (label image, pin base digests if that's
+  the house style).
+- `.env.example`: every config key with safe placeholder values, including
+  `FOREMAN_OLLAMA_URL` (the Mac's Tailscale address) and `FOREMAN_TOKEN`.
+- `scripts/pull-models.sh`: the roster pulls (`qwen3:14b`, `qwen3:30b`,
+  `nomic-embed-text`, with the optional ones commented) plus the Mac-side
+  `launchctl setenv OLLAMA_MAX_LOADED_MODELS 2 / OLLAMA_KEEP_ALIVE -1 /
+  OLLAMA_CONTEXT_LENGTH 8192` lines as comments.
+- `docs/deploy.md`: how it deploys (Komodo + compose), the security model
+  (Traefik internal-only or Tailscale; **not** a public entrypoint; Ollama target
+  firewalled to foreman), and the Mac prerequisites (Ollama bound to the tailnet,
+  `caffeinate`/`pmset`).
+
+## Tasks — steveternet repo (gitea MCP; branch/PR, not main)
+
+- Create `kalimdor/orgrimmar/foreman/docker-compose.yml` mirroring the analogs:
+  pull the foreman image from the gitea registry, the standard Traefik network +
+  router/service labels, `restart` policy, env from `.env`, and a named volume
+  for the SQLite DB. Decide (and note) whether the router is internal-only.
+- Add `kalimdor/orgrimmar/foreman/.env.example`.
+- If host-level routing belongs in `traefik/custom/` (as some services do), add
+  the file there instead/as-well, following those examples.
+
+## Definition of done
+
+- `docker build .` clean; compose validates (`docker compose config`).
+- Labels/network/entrypoint match a sibling service exactly (diff against
+  `ratchet`/`warhol-queue` and confirm).
+- `docs/deploy.md` is enough for a cold deploy. steveternet changes are on a
+  branch/PR for review.
+
+Wrap up: `progress.md` (mark the project deployable), commit foreman docs/scripts
+on `phase-6-deploy`; report the steveternet branch/PR. Then give me a short
+end-to-end smoke-test checklist (pull models on the Mac → deploy foreman → go-llm
+chat → `POST /jobs` with a webhook).
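The last step of that checklist, `POST /jobs` with a webhook, could be exercised with something like the sketch below. It assumes the native-chat payload is Ollama's `/api/chat` shape (`model`, `messages`) with `state_webhook_url` added at the top level, and that the `202` body carries `job_id` as Phase 4 describes. `SubmitJob`, the struct names, the listener URL, and the in-process stub server standing in for a deployed foreman are illustrative only.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// chatMessage and jobRequest follow Ollama's native /api/chat body, plus the
// state_webhook_url field from Phase 4. The Go names are hypothetical.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type jobRequest struct {
	Model           string        `json:"model"`
	Messages        []chatMessage `json:"messages"`
	StateWebhookURL string        `json:"state_webhook_url,omitempty"`
}

// SubmitJob posts req to baseURL/jobs and returns the job ID from the 202 body.
func SubmitJob(ctx context.Context, baseURL, token string, req jobRequest) (string, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return "", fmt.Errorf("%w: encoding job request", err)
	}
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/jobs", bytes.NewReader(body))
	if err != nil {
		return "", fmt.Errorf("%w: building request", err)
	}
	httpReq.Header.Set("Content-Type", "application/json")
	if token != "" {
		httpReq.Header.Set("Authorization", "Bearer "+token)
	}
	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		return "", fmt.Errorf("%w: posting job", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusAccepted {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	var out struct {
		JobID string `json:"job_id"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", fmt.Errorf("%w: decoding response", err)
	}
	return out.JobID, nil
}

// smokeTest runs SubmitJob against an in-process stub that answers like the
// POST /jobs contract. Against a real deployment, pass its base URL instead.
func smokeTest() (string, error) {
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/jobs" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusAccepted)
		fmt.Fprint(w, `{"job_id":"example-job"}`)
	}))
	defer stub.Close()

	return SubmitJob(context.Background(), stub.URL, "", jobRequest{
		Model:           "qwen3:30b",
		Messages:        []chatMessage{{Role: "user", Content: "ping"}},
		StateWebhookURL: "http://127.0.0.1:9000/hook", // hypothetical listener
	})
}

func main() {
	id, err := smokeTest()
	if err != nil {
		fmt.Println("smoke test failed:", err)
		return
	}
	fmt.Println("job accepted:", id)
}
```

This covers only submission. Watching the `queued→loading→working→done` events arrive, and reconciling through `GET /jobs/{id}`, still needs a running listener and a real foreman.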