add initial prompts

2026-05-23 16:51:19 -04:00
parent 8fde024281
commit d5702f7a75
7 changed files with 383 additions and 0 deletions
@@ -0,0 +1,75 @@
+# phase-0-kickoff.md — foreman build kickoff
+
+You are building **foreman**, a Go daemon that fronts one Ollama target and turns
+it into a queued, observable, OpenAI/Ollama-compatible job endpoint. This is a
+deliberately pared-down restart of a system (`peon-overseer`) that died of scope
+creep. Restraint is a feature, not a limitation.
+
+## Read these first (authoritative, in order)
+
+1. `CLAUDE.md` in this repo — the operating manual. It is the source of truth for
+   architecture, stack, conventions, and the **out-of-scope guardrails**.
+2. `docs/adr/README.md` then every `docs/adr/00NN-*.md`. The ADRs are the *why*.
+   Do not relitigate them; if you believe one is wrong, say so and propose a new
+   superseding ADR rather than silently diverging.
+3. Via the **gitea MCP**, read the integration target — `steve/go-llm`:
+   `v2/provider/provider.go` (the `Provider` interface you must stay compatible
+   with), `v2/ollama/ollama.go` and `v2/constructors.go` (how `Ollama` /
+   `OllamaCloud` construct over native `/api/chat` + Bearer), and `v2/CLAUDE.md`
+   (DD#8: native API, not OpenAI-compat).
+4. Via the gitea MCP, study deployment conventions in `steve/steveternet`:
+   `kalimdor/orgrimmar/warhol-queue/`, `kalimdor/orgrimmar/ratchet/`, and
+   `kalimdor/orgrimmar/mort/` for `docker-compose.yml` + `.env.example` patterns,
+   and `kalimdor/orgrimmar/traefik/` (incl. `custom/`) for the Traefik network
+   name, entrypoint, certresolver, and router/label conventions. foreman will
+   live at `kalimdor/orgrimmar/foreman/`. **Mirror these exactly; do not invent
+   label syntax.**
+
+## Working agreement (opusplan)
+
+- **Plan before code.** For each phase, produce a plan and wait for my approval
+  before implementing. Do not run ahead to later phases.
+- **One phase at a time**, in order. Each phase is its own prompt I will paste.
+- After every phase: `go build ./...`, `go vet ./...`, `go test -race -count=1 ./...`
+  must all pass. Append a dated entry to `progress.md`. Commit on a phase branch
+  with conventional-commit messages (`feat:`, `chore:`, `test:`, `docs:`).
+- **Ask before assuming.** If a detail is ambiguous and not settled by CLAUDE.md
+  or an ADR, ask me — don't guess.
+- **Propose an ADR** (append-only, next number) for any architectural decision
+  not already covered. Keep `docs/adr/README.md`'s index current.
+- Keep dependencies minimal; match `go-llm` house style (tabs; wrap errors with
+  `fmt.Errorf("%w: ...", err)`; imports stdlib → third-party → internal). SQLite
+  via `modernc.org/sqlite` (pure-Go, `CGO_ENABLED=0`). No UI.
+- **Refuse scope creep.** No distributed dispatch, leases, fair queueing,
+  capacity budgets, auth framework/SSO, GUI, or multi-target support. If a task
+  seems to need them, stop and flag it — that means the design is being violated.
+
+## Definition of done (whole project)
+
+A deployable daemon that:
+- fronts one configurable Ollama target and transparently proxies native
+  `/api/chat`, `/api/tags`, `/api/ps`, including streaming (so `go-llm` can use
+  the Mac that hosts Ollama as a target with no provider changes);
+- runs a durable SQLite-backed queue with a single worker and drain-by-model
+  scheduling, surviving restarts and target sleep;
+- exposes an async `POST /jobs` surface returning a job ID, with
+  `queued→loading→working→done/failed` state webhooks and artifact delivery;
+- ships a Go client package (synchronous facade over the async surface);
+- passes CI on Gitea, builds as a container, and deploys via a steveternet
+  `docker-compose.yml` behind Traefik.
+
+## Phase map
+
+1. Scaffold, config, SQLite store, health, CI, Dockerfile.
+2. Ollama target client + model poller + native passthrough (the go-llm target).
+3. Durable queue + single worker + drain-by-model.
+4. Async `/jobs` + job IDs + state webhooks + artifacts.
+5. Go client package (sync facade) + `llm.Foreman()` in go-llm.
+6. Deploy: steveternet compose + Traefik, `.env.example`, deploy docs, model-pull script.
+
+## Your task right now
+
+Confirm you've read the sources above, briefly restate the architecture in your
+own words (so I can check your understanding), flag anything in the ADRs you'd
+push back on, then produce a **detailed plan for Phase 1 only**. Do not write code
+yet. Stop for my approval.
@@ -0,0 +1,86 @@
+# phase-1.md — Scaffold, config, store, health, CI, Dockerfile
+
+Re-ground: skim `CLAUDE.md`, `docs/adr/` (esp. 0002 placement, 0008 SQLite,
+0010 security), and `progress.md`. Plan, get my approval, then implement.
+
+## Objective
+
+A buildable, testable, containerized skeleton with the persistence layer and a
+health endpoint — no Ollama logic yet.
+
+## Tasks
+
+- Initialize the module: `gitea.stevedudenhoeffer.com/steve/foreman`, Go 1.26.
+- Layout per CLAUDE.md: `cmd/foreman/main.go` (single binary, subcommand
+  skeleton: `serve`, plus stubs for `submit`, `jobs`, `ps`), `internal/config`,
+  `internal/store`, `internal/server`. Don't create empty packages for later
+  phases.
+- `internal/config`: load from env into a struct — `FOREMAN_ADDR` (listen addr,
+  default `:8080`), `FOREMAN_OLLAMA_URL` (target, required), `FOREMAN_TOKEN`
+  (optional inbound bearer), `FOREMAN_DB_PATH`, `FOREMAN_POLL_INTERVAL`. Provide
+  a `.env.example` documenting every key.
+- `internal/store`: SQLite via `modernc.org/sqlite`, WAL mode, with an embedded
+  migration for the `jobs` and `artifacts` tables (schema sketch in ADR-0008 /
+  ADR-0006). Include open/close, migrate-on-start, and basic CRUD with tests
+  (use a temp-file DB per test).
+- `internal/server`: `net/http` server with `GET /healthz` (returns ok + a
+  placeholder degraded flag for later) and optional bearer-token middleware
+  (validate `Authorization: Bearer` only when `FOREMAN_TOKEN` is set).
+- `.gitea/workflows/ci.yaml` — mirror `go-llm`'s, single-module:
+
+```yaml
+name: CI
+on:
+  push: { branches: ["*"] }
+  pull_request: { branches: ["*"] }
+jobs:
+  build:
+    name: Build & Test
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-go@v5
+        with: { go-version: "1.26" }
+      - run: go mod download
+      - run: go build ./...
+      - run: go vet ./...
+      - run: go test -race -count=1 ./...
+  tidy:
+    name: Tidy
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-go@v5
+        with: { go-version: "1.26" }
+      - run: |
+          go mod tidy
+          git diff --exit-code go.mod go.sum
+```
+
+- `Dockerfile` — multi-stage, pure-Go static build (works because modernc is
+  CGO-free):
+
+```dockerfile
+FROM golang:1.26 AS build
+WORKDIR /src
+COPY go.mod go.sum ./
+RUN go mod download
+COPY . .
+RUN CGO_ENABLED=0 go build -o /out/foreman ./cmd/foreman
+FROM gcr.io/distroless/static-debian12
+COPY --from=build /out/foreman /foreman
+EXPOSE 8080
+ENTRYPOINT ["/foreman", "serve"]
+```
+
+- `.gitignore` (use the project's), `README.md` (what + quickstart), and seed
+  `progress.md` with the M0 entry.
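A minimal sketch of the `internal/config` loader described in the tasks above, assuming a lookup-function design so tests never touch the real environment. The `Config` field names, the `ErrInvalidConfig` sentinel, the `fromMap` helper, and the `foreman.db` default are hypothetical; the `:8080` and `30s` defaults come from the phase prompts.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrInvalidConfig is a hypothetical sentinel, wrapped in the "%w: ..." style.
var ErrInvalidConfig = errors.New("invalid config")

// Config mirrors the FOREMAN_* keys listed in the Phase 1 tasks.
type Config struct {
	Addr         string
	OllamaURL    string
	Token        string
	DBPath       string
	PollInterval time.Duration
}

// Load reads keys through lookup so tests can supply a map instead of
// mutating the process environment; production code would pass os.LookupEnv.
func Load(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		Addr:      get("FOREMAN_ADDR", ":8080"),
		OllamaURL: get("FOREMAN_OLLAMA_URL", ""),
		Token:     get("FOREMAN_TOKEN", ""),
		DBPath:    get("FOREMAN_DB_PATH", "foreman.db"), // default is an assumption
	}
	if cfg.OllamaURL == "" {
		return Config{}, fmt.Errorf("%w: FOREMAN_OLLAMA_URL is required", ErrInvalidConfig)
	}
	raw := get("FOREMAN_POLL_INTERVAL", "30s")
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return Config{}, fmt.Errorf("%w: FOREMAN_POLL_INTERVAL %q", ErrInvalidConfig, raw)
	}
	cfg.PollInterval = d
	return cfg, nil
}

// fromMap adapts a map to the lookup signature (useful in tests).
func fromMap(m map[string]string) func(string) (string, bool) {
	return func(k string) (string, bool) {
		v, ok := m[k]
		return v, ok
	}
}

func main() {
	cfg, err := Load(fromMap(map[string]string{
		"FOREMAN_OLLAMA_URL": "http://ollama.example:11434", // placeholder address
	}))
	fmt.Println(cfg.Addr, cfg.PollInterval, err)
}
```

Taking the lookup as a parameter keeps the loader pure, which suits the "temp-file DB per test" spirit of the store tests: nothing global is shared between test cases.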
+
+## Definition of done
+
+- `go build/vet/test -race` all green; `store` has passing tests.
+- `docker build .` succeeds; running the image serves `GET /healthz`.
+- `foreman serve` (built from `cmd/foreman`) boots, reads config from env, opens/migrates the DB.
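One possible shape for the `GET /healthz` handler and the optional bearer middleware, sketched with the standard library only. The function names (`bearer`, `healthz`, `statusFor`) and the JSON field names are hypothetical, and whether `/healthz` should be exempt from the token check is left open here; this sketch protects every route.

```go
package main

import (
	"crypto/subtle"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bearer wraps next with a token check. An empty token disables the check,
// matching "validate only when FOREMAN_TOKEN is set".
func bearer(token string, next http.Handler) http.Handler {
	if token == "" {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := r.Header.Get("Authorization")
		got := strings.TrimPrefix(h, "Bearer ")
		// Constant-time comparison avoids leaking the token through timing.
		if !strings.HasPrefix(h, "Bearer ") ||
			subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// healthz reports ok plus a degraded flag. The callback is a placeholder
// that a later phase could wire to the model poller.
func healthz(degraded func() bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string]bool{"ok": true, "degraded": degraded()})
	})
}

// statusFor sends GET /healthz with the given Authorization header to a
// server configured with token and returns the resulting HTTP status.
func statusFor(token, authHeader string) int {
	mux := http.NewServeMux()
	mux.Handle("/healthz", healthz(func() bool { return false }))
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	bearer(token, mux).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("", ""), statusFor("s3cret", ""), statusFor("s3cret", "Bearer s3cret"))
}
```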
+
+Wrap up: append to `progress.md`, commit on `phase-1-scaffold`, summarize what's
+done and what Phase 2 will need.
@@ -0,0 +1,42 @@
+# phase-2.md — Ollama target client, model poller, native passthrough
+
+Re-ground: `CLAUDE.md` + ADR-0003 (API surface), 0007 (model polling), 0012
+(streaming), 0002 (unreachable = transient). Plan, get approval, implement.
+
+## Objective
+
+Make foreman a working transparent front for its Ollama target — enough that
+`go-llm` can use the Mac as a target *today*, before any queue exists. (Phase 3
+will move this through the queue; here it can proxy directly.)
+
+## Tasks
+
+- `internal/ollama`: a small client to the target (`FOREMAN_OLLAMA_URL`) behind
+  an interface, covering `POST /api/chat` (streaming and non-streaming),
+  `GET /api/tags`, `GET /api/ps`. Attach the outbound bearer if configured. Wrap
+  errors; classify connection failures distinctly (Phase 3 needs that signal).
+- Model poller (goroutine): poll `/api/tags` every `FOREMAN_POLL_INTERVAL`
+  (default 30s) into an in-memory inventory with a mutex; track last-poll time
+  and a degraded flag. On target unreachable, retain last-known inventory and set
+  degraded — do not clear it. Wire degraded state into `/healthz`.
+- Passthrough handlers in `internal/server`:
+  - `GET /api/tags` and `GET /api/ps` served from the poller/target.
+  - `POST /api/chat`: validate the requested model against the inventory (one
+    re-poll on miss, then 4xx if still absent); proxy to the target. Support
+    streaming faithfully (stream the target's chunks straight through; set the
+    right content type). For now this may call the target directly — no queue.
+- Tests: a stub HTTP server standing in for Ollama; assert tags/ps proxy,
+  model validation rejects unknown models, streaming passes chunks through, and
+  the poller flips degraded on target failure and recovers.
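The poller's "retain last-known inventory, set degraded" rule can be sketched as a small mutex-guarded type. All names here (`Inventory`, `Update`, `Run`, `errUnreachable`) are hypothetical, the `fetch` callback stands in for the real `/api/tags` call, and treating `lastPoll` as the time of the last successful poll is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// errUnreachable stands in for a classified connection failure.
var errUnreachable = errors.New("target unreachable")

// Inventory is the in-memory model list maintained by the poller.
type Inventory struct {
	mu       sync.RWMutex
	models   map[string]struct{}
	lastPoll time.Time // last successful poll (an assumption)
	degraded bool
}

// Update applies one poll result. On error the previous model set is kept
// and only the degraded flag changes; on success the set is replaced.
func (inv *Inventory) Update(models []string, err error) {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if err != nil {
		inv.degraded = true
		return
	}
	set := make(map[string]struct{}, len(models))
	for _, m := range models {
		set[m] = struct{}{}
	}
	inv.models, inv.lastPoll, inv.degraded = set, time.Now(), false
}

// Has reports whether model was present in the last successful poll.
func (inv *Inventory) Has(model string) bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	_, ok := inv.models[model]
	return ok
}

// Degraded reports whether the most recent poll failed.
func (inv *Inventory) Degraded() bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	return inv.degraded
}

// Run polls once immediately, then on every tick until ctx is cancelled.
func (inv *Inventory) Run(ctx context.Context, fetch func(context.Context) ([]string, error), every time.Duration) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		models, err := fetch(ctx)
		inv.Update(models, err)
		select {
		case <-ctx.Done():
			return
		case <-t.C:
		}
	}
}

func main() {
	inv := &Inventory{}
	inv.Update([]string{"qwen3:14b"}, nil)
	inv.Update(nil, errUnreachable) // target went away: keep the old list
	fmt.Println(inv.Has("qwen3:14b"), inv.Degraded())
	inv.Update([]string{"qwen3:14b", "qwen3:30b"}, nil) // target is back
	fmt.Println(inv.Has("qwen3:30b"), inv.Degraded())
}
```

Separating `Update` from `Run` lets the degraded and recovery transitions be tested without timers or a stub server.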
+
+## Definition of done
+
+- `go build/vet/test -race` green.
+- Against a real or stubbed Ollama: `curl .../api/tags` returns the inventory;
+  a non-streaming and a streaming `/api/chat` both work end-to-end.
+- Acceptance: from a scratch Go program, `llm.Ollama(llm.WithBaseURL("http://<foreman>:8080"))`
+  (or `llm.OllamaCloud(token, llm.WithBaseURL(...))` if a token is set) completes a
+  chat through foreman. Note this in `progress.md`.
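For the streaming half of the passthrough, a hedged sketch of copying the target's response to the client and flushing after every read so chunks are not held back by buffering. `proxyStream` and `demo` are hypothetical names, and the `application/x-ndjson` content type is simply what this fake upstream response carries; the real handler should forward whatever the target sends.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// proxyStream forwards status, content type, and body from the target's
// response, flushing after each chunk when the writer supports it.
func proxyStream(w http.ResponseWriter, resp *http.Response) error {
	if ct := resp.Header.Get("Content-Type"); ct != "" {
		w.Header().Set("Content-Type", ct)
	}
	w.WriteHeader(resp.StatusCode)
	flusher, _ := w.(http.Flusher) // nil if the writer cannot flush
	buf := make([]byte, 32*1024)
	for {
		n, err := resp.Body.Read(buf)
		if n > 0 {
			if _, werr := w.Write(buf[:n]); werr != nil {
				return werr // client went away
			}
			if flusher != nil {
				flusher.Flush()
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// demo feeds a fake two-chunk upstream response through proxyStream and
// returns the content type, the body seen by the client, and whether the
// copy succeeded with at least one flush.
func demo() (string, string, bool) {
	upstream := &http.Response{
		StatusCode: http.StatusOK,
		Header:     http.Header{"Content-Type": []string{"application/x-ndjson"}},
		Body:       io.NopCloser(strings.NewReader("{\"done\":false}\n{\"done\":true}\n")),
	}
	rec := httptest.NewRecorder()
	err := proxyStream(rec, upstream)
	return rec.Header().Get("Content-Type"), rec.Body.String(), err == nil && rec.Flushed
}

func main() {
	ct, body, ok := demo()
	fmt.Printf("%s ok=%v\n%s", ct, ok, body)
}
```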
+
+Wrap up: `progress.md`, commit on `phase-2-passthrough`, note what Phase 3 changes
+(routing this through the queue).
@@ -0,0 +1,40 @@
+# phase-3.md — Durable queue, single worker, drain-by-model
+
+Re-ground: `CLAUDE.md` + ADR-0009 (single worker / drain-by-model), 0008 (queue),
+0004 (lifecycle/retry). Plan, get approval, implement.
+
+## Objective
+
+Route execution through the SQLite queue with exactly one worker and
+drain-by-model scheduling. The synchronous passthrough from Phase 2 now enqueues
+and blocks on completion instead of calling the target directly.
+
+## Tasks
+
+- Promote chat requests to persisted jobs: every `/api/chat` call creates a `jobs`
+  row (state `queued`), and the handler blocks until that job reaches a terminal
+  state, then writes the response. Assign a **ULID** as the job id now (used
+  everywhere in Phase 4).
+- `internal/worker`: a single worker loop (concurrency 1). Select the next job
+  with `ORDER BY (model != :current_resident), created_at` so all jobs for the
+  currently-resident model (from `/api/ps`) drain before a swap. Transition
+  `queued→loading→working→done`. Pin residency with Ollama `keep_alive`.
+- Retry semantics (ADR-0004): a connection failure to the target re-queues the
+  job with backoff and increments `attempt`; exceeding a bounded max moves it to
+  `failed` with the last error stored. Never auto-fail on a single transient
+  error. Jobs survive process restart (resume `queued`/in-flight on boot).
+- Tests: against the stub Ollama — jobs persist and execute serially; a sequence
+  mixing two models drains by model (assert the swap happens once, not per job);
+  a flapping target causes retry-then-success without data loss; restart mid-queue
+  resumes cleanly.
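To make the drain-by-model test concrete, the `ORDER BY (model != :current_resident), created_at` selection can be mirrored in memory. This is an illustrative model of the ordering, not the store query itself; `Job`, `drainOrder`, and `run` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Job carries only the fields the ordering needs.
type Job struct {
	ID        string
	Model     string
	CreatedAt int64
}

// drainOrder mirrors ORDER BY (model != :current_resident), created_at:
// jobs for the resident model sort first, then everything by age.
func drainOrder(jobs []Job, resident string) []Job {
	out := append([]Job(nil), jobs...)
	sort.SliceStable(out, func(i, j int) bool {
		oi, oj := out[i].Model != resident, out[j].Model != resident
		if oi != oj {
			return !oi // false (resident model) sorts before true
		}
		return out[i].CreatedAt < out[j].CreatedAt
	})
	return out
}

// run simulates the single worker: pick the head of the ordering, execute
// it, and treat its model as resident for the next selection. It returns
// the execution order and the number of model swaps.
func run(jobs []Job, resident string) (order []string, swaps int) {
	pending := append([]Job(nil), jobs...)
	for len(pending) > 0 {
		next := drainOrder(pending, resident)[0]
		if next.Model != resident {
			swaps++
			resident = next.Model
		}
		order = append(order, next.ID)
		for i := range pending {
			if pending[i].ID == next.ID {
				pending = append(pending[:i], pending[i+1:]...)
				break
			}
		}
	}
	return order, swaps
}

func main() {
	jobs := []Job{
		{"a1", "qwen3:14b", 1}, {"b1", "qwen3:30b", 2},
		{"a2", "qwen3:14b", 3}, {"b2", "qwen3:30b", 4},
	}
	order, swaps := run(jobs, "qwen3:14b")
	fmt.Println(order, swaps)
}
```

With `qwen3:14b` resident, the interleaved queue runs as `a1, a2, b1, b2` with a single swap, which is the property the "swap happens once, not per job" assertion checks.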
+
+## Definition of done
+
+- `go build/vet/test -race` green.
+- The Phase 2 acceptance (go-llm completes a chat) still passes, now served
+  through the queue.
+- Demonstrable: enqueue several jobs across two models and observe drain-by-model
+  ordering in logs; kill and restart foreman mid-queue and watch it resume.
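The retry rule (re-queue with backoff on a connection failure, fail only after a bounded number of attempts) might look like the sketch below. The limits (`maxAttempts = 5`, 1s base, 1m cap), the doubling schedule, the field names, and the choice to fail immediately on non-connection errors are all assumptions; ADR-0004 is the authority for the real values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Illustrative limits only.
const (
	maxAttempts = 5
	baseDelay   = time.Second
	maxDelay    = time.Minute
)

// ErrTargetUnreachable marks a connection failure, the transient class.
var ErrTargetUnreachable = errors.New("target unreachable")

// Job holds just the retry-related fields.
type Job struct {
	State     string
	Attempt   int
	RetryIn   time.Duration
	LastError string
}

// backoff doubles the delay per attempt, capped at maxDelay.
func backoff(attempt int) time.Duration {
	d := baseDelay
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// afterFailure decides the next state for a job whose execution failed.
func afterFailure(j Job, err error) Job {
	j.LastError = err.Error()
	j.RetryIn = 0
	if !errors.Is(err, ErrTargetUnreachable) {
		j.State = "failed" // assumption: other errors are not retried
		return j
	}
	j.Attempt++
	if j.Attempt > maxAttempts {
		j.State = "failed"
		return j
	}
	j.State = "queued"
	j.RetryIn = backoff(j.Attempt)
	return j
}

func main() {
	j := Job{State: "working"}
	for j.State != "failed" {
		j = afterFailure(j, fmt.Errorf("%w: connection refused", ErrTargetUnreachable))
		fmt.Println(j.State, j.Attempt, j.RetryIn)
	}
}
```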
+
+Wrap up: `progress.md`, commit on `phase-3-queue`. M0 is effectively complete here
+— note that. Phase 4 adds the async surface on top of this same engine.
@@ -0,0 +1,42 @@
+# phase-4.md — Async /jobs, job IDs, state webhooks, artifacts
+
+Re-ground: `CLAUDE.md` + ADR-0004 (async surface), 0005 (webhook protocol), 0006
+(artifacts). Plan, get approval, implement. **This phase delivers the headline
+"queue & webhooks" capability — get it genuinely working end-to-end.**
+
+## Objective
+
+A fire-and-forget async surface over the Phase 3 engine: submit a job, get an ID
+immediately, receive state updates and the final result/artifacts by webhook.
+
+## Tasks
+
+- `POST /jobs`: body is a native-chat payload plus an optional `state_webhook_url`
+  (events are HMAC-signed when a webhook secret is configured). Enqueue (reusing
+  the Phase 3 engine), return `202` with `{ "job_id": "<ulid>" }` immediately.
+- `GET /jobs/{id}`: current state, result, error, and artifact metadata (the
+  recovery/poll path for missed webhooks).
+- `internal/webhook`: on each state transition (`queued→loading→working→done`,
+  plus `failed`), POST the event JSON from ADR-0005 to `state_webhook_url`.
+  At-least-once with bounded retry + backoff; never let a flaky receiver block or
+  fail the job. If a webhook secret is configured, sign the body with
+  HMAC-SHA256 in `X-Foreman-Signature`.
+- Artifacts (ADR-0006): persist the completion as artifact `completion`; deliver
+  artifacts inline in the `done` event under a size threshold (default ~256KB),
+  otherwise send metadata + a URL and serve bytes at
+  `GET /jobs/{id}/artifacts/{name}`. Add a TTL prune sweep for jobs+artifacts.
+- Decide and record (new ADR if needed) whether `state_webhook_url` is also
+  honored when present on the sync `/api/chat` path, or only on `/jobs`.
+- Tests: spin up an in-test webhook receiver; assert the full lifecycle fires in
+  order with correct payloads, idempotency keys (`job_id`+`state`) are present,
+  a 500-ing receiver triggers retries without affecting job state, large
+  artifacts go by URL and small ones inline, and HMAC signatures verify.
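A short sketch of the HMAC-SHA256 signing and verification that the webhook sender and the test receiver would share. Hex encoding of the digest is an assumption (ADR-0005 may specify a prefix or a different encoding), and `sign` and `verify` are hypothetical helper names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret. The
// sender would place this value in the X-Foreman-Signature header.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC over the exact bytes received and compares in
// constant time. Malformed hex is rejected.
func verify(secret, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("webhook-secret")
	body := []byte(`{"job_id":"example","state":"done"}`)
	sig := sign(secret, body)
	fmt.Println(verify(secret, body, sig))                   // matching body
	fmt.Println(verify(secret, []byte(`{"state":"x"}`), sig)) // tampered body
}
```

The receiver must verify against the raw request bytes, before any JSON decoding, since re-encoding can change the byte sequence and invalidate the signature.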
+
+## Definition of done
+
+- `go build/vet/test -race` green.
+- End-to-end demo: `POST /jobs` with a `state_webhook_url` pointed at a tiny local
+  listener; observe `queued→loading→working→done` plus the completion artifact;
+  confirm `GET /jobs/{id}` reconciles after a deliberately dropped webhook.
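The inline-versus-URL artifact rule reduces to a size check. In this sketch the `Delivery` struct and its JSON field names are hypothetical (ADR-0006 defines the real event shape), the threshold uses the ~256KB default from the task list, and inlining the bytes as a string assumes text artifacts such as `completion`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// inlineThreshold is the default cutoff named in the tasks (~256KB).
const inlineThreshold = 256 * 1024

// Delivery is a hypothetical artifact entry inside the done event.
type Delivery struct {
	Name   string `json:"name"`
	Size   int    `json:"size"`
	Inline string `json:"inline,omitempty"`
	URL    string `json:"url,omitempty"`
}

// deliver inlines artifacts under the threshold and otherwise points at
// the GET /jobs/{id}/artifacts/{name} route for retrieval.
func deliver(jobID, name string, data []byte) Delivery {
	d := Delivery{Name: name, Size: len(data)}
	if len(data) < inlineThreshold {
		d.Inline = string(data)
		return d
	}
	d.URL = "/jobs/" + url.PathEscape(jobID) + "/artifacts/" + url.PathEscape(name)
	return d
}

func main() {
	small, _ := json.Marshal(deliver("job-1", "completion", []byte("hi")))
	fmt.Println(string(small))
	big := deliver("job-1", "completion", make([]byte, inlineThreshold))
	fmt.Println(big.URL, big.Size)
}
```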
+
+Wrap up: `progress.md`, commit on `phase-4-async-webhooks`. M1 core is complete.
@@ -0,0 +1,41 @@
+# phase-5.md — Go client package + llm.Foreman() in go-llm
+
+Re-ground: `CLAUDE.md` + ADR-0011 (integration levels). Plan, get approval,
+implement. Note this phase touches **two repos**.
+
+## Objective
+
+Make foreman ergonomic from Go: a synchronous client over the async surface
+(Level 1), and the trivial `go-llm` target constructor (Level 0).
+
+## Tasks — foreman repo
+
+- `client/` (the one public package): a `foreman.Client` with a synchronous
+  `Complete(ctx, req) (Result, error)` that submits to `POST /jobs` with a
+  `state_webhook_url` pointed at an ephemeral receiver it stands up, blocks until
+  `done`, and returns the result + artifacts. Requirements:
+  - Fall back to polling `GET /jobs/{id}` if it can't bind/receive callbacks
+    (configurable; needed when foreman can't reach the caller).
+  - Respect `ctx` cancellation/deadline; clean up the receiver and (optionally)
+    best-effort cancel the job.
+  - Constructor takes base URL + optional token; verify HMAC if a secret is set.
+- Tests against a stubbed foreman (or the real handlers): happy path returns the
+  artifact; polling-fallback path works; context timeout is honored.
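The polling fallback inside `Complete` could be structured as a loop that stops on a terminal state or on context cancellation. `Status`, `waitByPolling`, `ErrJobFailed`, and `demo` are hypothetical names, the `get` callback stands in for `GET /jobs/{id}`, and returning on the first fetch error (rather than retrying) is a simplification.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrJobFailed is a hypothetical sentinel for jobs that end in "failed".
var ErrJobFailed = errors.New("job failed")

// Status is the subset of a job record this loop needs.
type Status struct {
	State  string
	Result string
	Error  string
}

// waitByPolling calls get until the job is done or failed, or until ctx is
// cancelled or its deadline passes.
func waitByPolling(ctx context.Context, get func(context.Context) (Status, error), every time.Duration) (Status, error) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		st, err := get(ctx)
		if err != nil {
			return Status{}, err // simplification: no retry on fetch errors
		}
		switch st.State {
		case "done":
			return st, nil
		case "failed":
			return st, fmt.Errorf("%w: %s", ErrJobFailed, st.Error)
		}
		select {
		case <-ctx.Done():
			return Status{}, ctx.Err()
		case <-t.C:
		}
	}
}

// demo polls a fake job that walks through states and then stays on the
// last one, under a short deadline. It returns the final state seen.
func demo(states []string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	i := 0
	get := func(context.Context) (Status, error) {
		st := Status{State: states[i]}
		if i < len(states)-1 {
			i++
		}
		return st, nil
	}
	st, err := waitByPolling(ctx, get, time.Millisecond)
	return st.State, err
}

func main() {
	fmt.Println(demo([]string{"queued", "loading", "working", "done"}))
	fmt.Println(demo([]string{"queued"})) // never finishes: deadline applies
}
```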
+
+## Tasks — go-llm repo (via gitea MCP; open a branch/PR, do not push to main)
+
+- Add the `Foreman` constructor to `v2/constructors.go`, delegating to the ollama
+  provider (exact form is specified in ADR-0011). Update `v2/CLAUDE.md` with the
+  DD#9 entry. Keep it a thin pass-through (Level 0); do **not** build a dedicated
+  provider yet.
+- Confirm it compiles and `go vet`/tests pass in the `v2` module.
+
+## Definition of done
+
+- foreman: `go build/vet/test -race` green; `client.Complete()` returns a real
+  result against a running foreman.
+- go-llm: `llm.Foreman("http://<foreman>:8080", token).Model("qwen3:30b")`
+  completes a chat. Changes are on a branch/PR for review, not committed to main.
+
+Wrap up: `progress.md`, commit `client/` on `phase-5-client`; report the go-llm
+branch name and PR link for me to review/merge.
@@ -0,0 +1,57 @@
+# phase-6.md — Deploy: steveternet compose + Traefik, env, docs, model script
+
+Re-ground: `CLAUDE.md` + ADR-0002 (placement), 0010 (security). Plan, get
+approval, implement. This phase touches **two repos** and must mirror existing
+steveternet conventions — read them, don't invent.
+
+## Objective
+
+Make foreman deployable on orgrimmar via Komodo, exposed through Traefik, with
+its model roster and operational notes documented.
+
+## Tasks — read first (gitea MCP, steve/steveternet)
+
+Study these for the exact conventions (network name, entrypoint, certresolver,
+router/service label format, restart policy, `.env` usage):
+`kalimdor/orgrimmar/warhol-queue/{docker-compose.yml,.env.example}`,
+`kalimdor/orgrimmar/ratchet/docker-compose.yml`,
+`kalimdor/orgrimmar/mort/docker-compose.yml`, and
+`kalimdor/orgrimmar/traefik/` (incl. `custom/`).
+
+## Tasks — foreman repo
+
+- Finalize the `Dockerfile` from Phase 1 (label image, pin base digests if that's
+  the house style).
+- `.env.example`: every config key with safe placeholder values, including
+  `FOREMAN_OLLAMA_URL` (the Mac's Tailscale address) and `FOREMAN_TOKEN`.
+- `scripts/pull-models.sh`: the roster pulls (`qwen3:14b`, `qwen3:30b`,
+  `nomic-embed-text`, with the optional ones commented out) plus, as comments,
+  one Mac-side `launchctl setenv` line each for `OLLAMA_MAX_LOADED_MODELS 2`,
+  `OLLAMA_KEEP_ALIVE -1`, and `OLLAMA_CONTEXT_LENGTH 8192`.
+- `docs/deploy.md`: how it deploys (Komodo + compose), the security model
+  (Traefik internal-only or Tailscale; **not** a public entrypoint; Ollama target
+  firewalled to foreman), and the Mac prerequisites (Ollama bound to the tailnet,
+  `caffeinate`/`pmset`).
+
+## Tasks — steveternet repo (gitea MCP; branch/PR, not main)
+
+- Create `kalimdor/orgrimmar/foreman/docker-compose.yml` mirroring the analogs:
+  pull the foreman image from the gitea registry, the standard Traefik network +
+  router/service labels, `restart` policy, env from `.env`, and a named volume
+  for the SQLite DB. Decide (and note) whether the router is internal-only.
+- Add `kalimdor/orgrimmar/foreman/.env.example`.
+- If host-level routing belongs in `traefik/custom/` (as some services do), add
+  the file there instead or as well, following those examples.
+
+## Definition of done
+
+- `docker build .` clean; compose validates (`docker compose config`).
+- Labels/network/entrypoint match a sibling service exactly (diff against
+  `ratchet`/`warhol-queue` and confirm).
+- `docs/deploy.md` is enough for a cold deploy. steveternet changes are on a
+  branch/PR for review.
+
+Wrap up: `progress.md` (mark the project deployable), commit foreman docs/scripts
+on `phase-6-deploy`; report the steveternet branch/PR. Then give me a short
+end-to-end smoke-test checklist (pull models on the Mac → deploy foreman → go-llm
+chat → `POST /jobs` with a webhook).