add initial prompts

2026-05-23 16:51:19 -04:00
parent 8fde024281
commit d5702f7a75
7 changed files with 383 additions and 0 deletions
+75
@@ -0,0 +1,75 @@
# phase-0-kickoff.md — foreman build kickoff
You are building **foreman**, a Go daemon that fronts one Ollama target and turns
it into a queued, observable, OpenAI/Ollama-compatible job endpoint. This is a
deliberately pared-down restart of a system (`peon-overseer`) that died of scope
creep. Restraint is a feature, not a limitation.
## Read these first (authoritative, in order)
1. `CLAUDE.md` in this repo — the operating manual. It is the source of truth for
architecture, stack, conventions, and the **out-of-scope guardrails**.
2. `docs/adr/README.md` then every `docs/adr/00NN-*.md`. The ADRs are the *why*.
Do not relitigate them; if you believe one is wrong, say so and propose a new
superseding ADR rather than silently diverging.
3. Via the **gitea MCP**, read the integration target — `steve/go-llm`:
`v2/provider/provider.go` (the `Provider` interface you must stay compatible
with), `v2/ollama/ollama.go` and `v2/constructors.go` (how `Ollama` /
`OllamaCloud` construct over native `/api/chat` + Bearer), and `v2/CLAUDE.md`
(DD#8: native API, not OpenAI-compat).
4. Via the gitea MCP, study deployment conventions in `steve/steveternet`:
`kalimdor/orgrimmar/warhol-queue/`, `kalimdor/orgrimmar/ratchet/`, and
`kalimdor/orgrimmar/mort/` for `docker-compose.yml` + `.env.example` patterns,
and `kalimdor/orgrimmar/traefik/` (incl. `custom/`) for the Traefik network
name, entrypoint, certresolver, and router/label conventions. foreman will
live at `kalimdor/orgrimmar/foreman/`. **Mirror these exactly; do not invent
label syntax.**
## Working agreement (opusplan)
- **Plan before code.** For each phase, produce a plan and wait for my approval
before implementing. Do not run ahead to later phases.
- **One phase at a time**, in order. Each phase is its own prompt I will paste.
- After every phase: `go build ./...`, `go vet ./...`, `go test -race -count=1 ./...`
must all pass. Append a dated entry to `progress.md`. Commit on a phase branch
with conventional-commit messages (`feat:`, `chore:`, `test:`, `docs:`).
- **Ask before assuming.** If a detail is ambiguous and not settled by CLAUDE.md
or an ADR, ask me — don't guess.
- **Propose an ADR** (append-only, next number) for any architectural decision
not already covered. Keep `docs/adr/README.md`'s index current.
- Keep dependencies minimal; match `go-llm` house style (tabs; wrap errors with
`fmt.Errorf("%w: ...", err)`; imports stdlib → third-party → internal). SQLite
via `modernc.org/sqlite` (pure-Go, `CGO_ENABLED=0`). No UI.
- **Refuse scope creep.** No distributed dispatch, leases, fair queueing,
capacity budgets, auth framework/SSO, GUI, or multi-target support. If a task
seems to need them, stop and flag it — that means the design is being violated.
## Definition of done (whole project)
A deployable daemon that:
- fronts one configurable Ollama target and transparently proxies native
`/api/chat`, `/api/tags`, `/api/ps` (so `go-llm` uses the Mac as a target with
no provider changes), including streaming;
- runs a durable SQLite-backed queue with a single worker and drain-by-model
scheduling, surviving restarts and target sleep;
- exposes an async `POST /jobs` surface returning a job ID, with
`queued→loading→working→done/failed` state webhooks and artifact delivery;
- ships a Go client package (synchronous facade over the async surface);
- passes CI on Gitea, builds as a container, and deploys via a steveternet
`docker-compose.yml` behind Traefik.
## Phase map
1. Scaffold, config, SQLite store, health, CI, Dockerfile.
2. Ollama target client + model poller + native passthrough (the go-llm target).
3. Durable queue + single worker + drain-by-model.
4. Async `/jobs` + job IDs + state webhooks + artifacts.
5. Go client package (sync facade) + `llm.Foreman()` in go-llm.
6. Deploy: steveternet compose + Traefik, `.env.example`, deploy docs, model-pull script.
## Your task right now
Confirm you've read the sources above, briefly restate the architecture in your
own words (so I can check your understanding), flag anything in the ADRs you'd
push back on, then produce a **detailed plan for Phase 1 only**. Do not write code
yet. Stop for my approval.
+86
@@ -0,0 +1,86 @@
# phase-1.md — Scaffold, config, store, health, CI, Dockerfile
Re-ground: skim `CLAUDE.md`, `docs/adr/` (esp. 0002 placement, 0008 SQLite,
0010 security), and `progress.md`. Plan, get my approval, then implement.
## Objective
A buildable, testable, containerized skeleton with the persistence layer and a
health endpoint — no Ollama logic yet.
## Tasks
- Initialize the module: `gitea.stevedudenhoeffer.com/steve/foreman`, Go 1.26.
- Layout per CLAUDE.md: `cmd/foreman/main.go` (single binary, subcommand
skeleton: `serve`, plus stubs for `submit`, `jobs`, `ps`), `internal/config`,
`internal/store`, `internal/server`. Don't create empty packages for later
phases.
- `internal/config`: load from env into a struct — `FOREMAN_ADDR` (listen addr,
default `:8080`), `FOREMAN_OLLAMA_URL` (target, required), `FOREMAN_TOKEN`
(optional inbound bearer), `FOREMAN_DB_PATH`, `FOREMAN_POLL_INTERVAL`. Provide
a `.env.example` documenting every key.
- `internal/store`: SQLite via `modernc.org/sqlite`, WAL mode, with an embedded
migration for the `jobs` and `artifacts` tables (schema sketch in ADR-0008 /
ADR-0006). Include open/close, migrate-on-start, and basic CRUD with tests
(use a temp-file DB per test).
- `internal/server`: `net/http` server with `GET /healthz` (returns ok + a
placeholder degraded flag for later) and optional bearer-token middleware
(validate `Authorization: Bearer` only when `FOREMAN_TOKEN` is set).
- `.gitea/workflows/ci.yaml` — mirror `go-llm`'s, single-module:
```yaml
name: CI
on:
  push: { branches: ["*"] }
  pull_request: { branches: ["*"] }
jobs:
  build:
    name: Build & Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with: { go-version: "1.26" }
      - run: go mod download
      - run: go build ./...
      - run: go vet ./...
      - run: go test -race -count=1 ./...
  tidy:
    name: Tidy
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with: { go-version: "1.26" }
      - run: |
          go mod tidy
          git diff --exit-code go.mod go.sum
```
- `Dockerfile` — multi-stage, pure-Go static build (works because modernc is
CGO-free):
```dockerfile
FROM golang:1.26 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/foreman ./cmd/foreman
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/foreman /foreman
EXPOSE 8080
ENTRYPOINT ["/foreman", "serve"]
```
- `.gitignore` (use the project's), `README.md` (what + quickstart), and seed
`progress.md` with the M0 entry.
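As a rough illustration of the `internal/config` task above, env loading could look something like the sketch below. The `Load` signature, the `ErrInvalidConfig` sentinel, and the `foreman.db` default are assumptions made for the example; only the `FOREMAN_*` keys, the `:8080` default, and the 30s poll default (stated in Phase 2) come from these prompts.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// Config mirrors the environment keys listed in the Phase 1 tasks.
type Config struct {
	Addr         string        // FOREMAN_ADDR, default ":8080"
	OllamaURL    string        // FOREMAN_OLLAMA_URL, required
	Token        string        // FOREMAN_TOKEN, optional inbound bearer
	DBPath       string        // FOREMAN_DB_PATH
	PollInterval time.Duration // FOREMAN_POLL_INTERVAL
}

// ErrInvalidConfig is a hypothetical sentinel; it is not named in the prompts.
var ErrInvalidConfig = errors.New("invalid config")

// Load reads configuration through getenv so tests can pass a fake lookup.
func Load(getenv func(string) string) (Config, error) {
	cfg := Config{
		Addr:         orDefault(getenv("FOREMAN_ADDR"), ":8080"),
		OllamaURL:    getenv("FOREMAN_OLLAMA_URL"),
		Token:        getenv("FOREMAN_TOKEN"),
		DBPath:       orDefault(getenv("FOREMAN_DB_PATH"), "foreman.db"), // assumed default
		PollInterval: 30 * time.Second,
	}
	if cfg.OllamaURL == "" {
		return Config{}, fmt.Errorf("%w: FOREMAN_OLLAMA_URL is required", ErrInvalidConfig)
	}
	if raw := getenv("FOREMAN_POLL_INTERVAL"); raw != "" {
		d, err := time.ParseDuration(raw)
		if err != nil {
			return Config{}, fmt.Errorf("%w: parse FOREMAN_POLL_INTERVAL %q", err, raw)
		}
		if d <= 0 {
			return Config{}, fmt.Errorf("%w: FOREMAN_POLL_INTERVAL must be positive, got %q", ErrInvalidConfig, raw)
		}
		cfg.PollInterval = d
	}
	return cfg, nil
}

// orDefault returns def when v is empty.
func orDefault(v, def string) string {
	if v == "" {
		return def
	}
	return v
}

func main() {
	cfg, err := Load(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Taking the lookup function as a parameter keeps the loader testable without mutating the process environment.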
## Definition of done
- `go build/vet/test -race` all green; `store` has passing tests.
- `docker build .` succeeds; running the image serves `GET /healthz`.
- `cmd/foreman serve` boots, reads config from env, opens/migrates the DB.
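For orientation, the health endpoint and the optional bearer check could be sketched roughly as follows. The JSON field names, the helper names, and the choice to wrap `/healthz` itself in the middleware are assumptions for the example, not decisions taken from CLAUDE.md or the ADRs.

```go
package main

import (
	"crypto/subtle"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthz reports liveness plus a degraded flag. The flag is a placeholder
// until the Phase 2 poller exists; the field names are assumptions.
func healthz(degraded func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]any{"status": "ok", "degraded": degraded()})
	}
}

// bearer enforces "Authorization: Bearer <token>" only when token is set.
// With an empty token the handler is returned unwrapped.
func bearer(token string, next http.Handler) http.Handler {
	if token == "" {
		return next
	}
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		// Constant-time comparison avoids leaking the token through timing.
		if subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// status issues one in-memory request and returns the response code.
func status(h http.Handler, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("GET /healthz", healthz(func() bool { return false }))
	fmt.Println("no token configured:", status(bearer("", mux), ""))
	fmt.Println("token set, no header:", status(bearer("s3cret", mux), ""))
	fmt.Println("token set, good header:", status(bearer("s3cret", mux), "Bearer s3cret"))
}
```

Whether `/healthz` should sit behind the token at all is a question for the plan; the sketch only shows the mechanics.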
Wrap up: append to `progress.md`, commit on `phase-1-scaffold`, summarize what's
done and what Phase 2 will need.
+42
@@ -0,0 +1,42 @@
# phase-2.md — Ollama target client, model poller, native passthrough
Re-ground: `CLAUDE.md` + ADR-0003 (API surface), 0007 (model polling), 0012
(streaming), 0002 (unreachable = transient). Plan, get approval, implement.
## Objective
Make foreman a working transparent front for its Ollama target — enough that
`go-llm` can use the Mac as a target *today*, before any queue exists. (Phase 3
will move this through the queue; here it can proxy directly.)
## Tasks
- `internal/ollama`: a small client to the target (`FOREMAN_OLLAMA_URL`) behind
an interface, covering `POST /api/chat` (streaming and non-streaming),
`GET /api/tags`, `GET /api/ps`. Attach the outbound bearer if configured. Wrap
errors; classify connection failures distinctly (Phase 3 needs that signal).
- Model poller (goroutine): poll `/api/tags` every `FOREMAN_POLL_INTERVAL`
(default 30s) into an in-memory inventory with a mutex; track last-poll time
and a degraded flag. On target unreachable, retain last-known inventory and set
degraded — do not clear it. Wire degraded state into `/healthz`.
- Passthrough handlers in `internal/server`:
  - `GET /api/tags` and `GET /api/ps` served from the poller/target.
  - `POST /api/chat`: validate the requested model against the inventory (one
    re-poll on miss, then 4xx if still absent); proxy to the target. Support
    streaming faithfully (stream the target's chunks straight through; set the
    right content type). For now this may call the target directly; no queue.
- Tests: a stub HTTP server standing in for Ollama; assert tags/ps proxy,
model validation rejects unknown models, streaming passes chunks through, and
the poller flips degraded on target failure and recovers.
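The poller behavior described above (keep the last-known inventory on failure, flip a degraded flag, recover on the next good poll) might be sketched like this. The type and method names are hypothetical, and recording the time of every attempt rather than only successful ones is an assumption the plan should settle.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// errUnreachable stands in for a classified connection failure.
var errUnreachable = errors.New("target unreachable")

// Inventory is the poller's in-memory view of the target's models.
type Inventory struct {
	mu       sync.Mutex
	models   []string
	lastPoll time.Time // time of the last attempt (assumption: not last success)
	degraded bool
}

// record applies one poll result. On error the previous model list is
// retained and only the degraded flag changes.
func (inv *Inventory) record(models []string, err error) {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	inv.lastPoll = time.Now()
	if err != nil {
		inv.degraded = true
		return
	}
	inv.models = append([]string(nil), models...)
	inv.degraded = false
}

// Snapshot returns a copy of the inventory and the current degraded flag.
func (inv *Inventory) Snapshot() (models []string, degraded bool) {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	return append([]string(nil), inv.models...), inv.degraded
}

// Run polls once immediately, then on every tick, until ctx is cancelled.
// fetch is a stand-in for the real GET /api/tags call.
func (inv *Inventory) Run(ctx context.Context, every time.Duration, fetch func(context.Context) ([]string, error)) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		models, err := fetch(ctx)
		inv.record(models, err)
		select {
		case <-ctx.Done():
			return
		case <-t.C:
		}
	}
}

func main() {
	inv := &Inventory{}
	inv.record([]string{"qwen3:14b", "qwen3:30b"}, nil)
	inv.record(nil, errUnreachable) // target went to sleep
	models, degraded := inv.Snapshot()
	fmt.Println(models, degraded)
}
```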
## Definition of done
- `go build/vet/test -race` green.
- Against a real or stubbed Ollama: `curl .../api/tags` returns the inventory;
a non-streaming and a streaming `/api/chat` both work end-to-end.
- Acceptance: from a scratch Go program, `llm.Ollama(llm.WithBaseURL("http://<foreman>:8080"))`
(or `llm.OllamaCloud(token, WithBaseURL(...))` if a token is set) completes a
chat through foreman. Note this in `progress.md`.
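A minimal sketch of what "stream the target's chunks straight through" can mean in `net/http` terms: copy each read to the client and flush immediately. The `proxyChat` name, the stub transport, and the `application/x-ndjson` content type returned by the stub are assumptions for the example; the sketch mirrors whatever content type the target sends rather than hard-coding one.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// proxyChat forwards the request body to the target's /api/chat and streams
// the response back, flushing after every read so chunks are not buffered.
func proxyChat(client *http.Client, targetURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		req, err := http.NewRequestWithContext(r.Context(), http.MethodPost, targetURL+"/api/chat", r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := client.Do(req)
		if err != nil {
			http.Error(w, "target unreachable", http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		// Mirror the target's content type and status.
		w.Header().Set("Content-Type", resp.Header.Get("Content-Type"))
		w.WriteHeader(resp.StatusCode)
		rc := http.NewResponseController(w)
		buf := make([]byte, 32*1024)
		for {
			n, rerr := resp.Body.Read(buf)
			if n > 0 {
				if _, werr := w.Write(buf[:n]); werr != nil {
					return // the client went away
				}
				_ = rc.Flush() // push the chunk out now
			}
			if rerr != nil {
				return // io.EOF or a broken upstream; the stream is over either way
			}
		}
	}
}

// stubTarget stands in for Ollama: it answers every request with two
// newline-delimited JSON chunks.
type stubTarget struct{}

func (stubTarget) RoundTrip(*http.Request) (*http.Response, error) {
	body := "{\"done\":false}\n{\"done\":true}\n"
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     http.Header{"Content-Type": []string{"application/x-ndjson"}},
		Body:       io.NopCloser(strings.NewReader(body)),
	}, nil
}

// demo runs one request through the proxy entirely in memory.
func demo() (contentType, body string) {
	h := proxyChat(&http.Client{Transport: stubTarget{}}, "http://target.invalid")
	req := httptest.NewRequest(http.MethodPost, "/api/chat", strings.NewReader(`{"model":"qwen3:14b","stream":true}`))
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	ct, body := demo()
	fmt.Println(ct)
	fmt.Print(body)
}
```

The sketch leaves out model validation and the outbound bearer so the streaming loop stays visible.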
Wrap up: `progress.md`, commit on `phase-2-passthrough`, note what Phase 3 changes
(routing this through the queue).
+40
@@ -0,0 +1,40 @@
# phase-3.md — Durable queue, single worker, drain-by-model
Re-ground: `CLAUDE.md` + ADR-0009 (single worker / drain-by-model), 0008 (queue),
0004 (lifecycle/retry). Plan, get approval, implement.
## Objective
Route execution through the SQLite queue with exactly one worker and
drain-by-model scheduling. The synchronous passthrough from Phase 2 now enqueues
and blocks on completion instead of calling the target directly.
## Tasks
- Promote chat requests to persisted jobs: every `/api/chat` call creates a `jobs`
row (state `queued`), and the handler blocks until that job reaches a terminal
state, then writes the response. Assign a **ULID** as the job id now (used
everywhere in Phase 4).
- `internal/worker`: a single worker loop (concurrency 1). Select the next job
with `ORDER BY (model != :current_resident), created_at` so all jobs for the
currently-resident model (from `/api/ps`) drain before a swap. Transition
`queued→loading→working→done`. Pin residency with Ollama `keep_alive`.
- Retry semantics (ADR-0004): a connection failure to the target re-queues the
job with backoff and increments `attempt`; exceeding a bounded max moves it to
`failed` with the last error stored. Never auto-fail on a single transient
error. Jobs survive process restart (resume `queued`/in-flight on boot).
- Tests: against the stub Ollama — jobs persist and execute serially; a sequence
mixing two models drains by model (assert the swap happens once, not per job);
a flapping target causes retry-then-success without data loss; restart mid-queue
resumes cleanly.
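To make the drain-by-model ordering concrete, here is an in-memory model of `ORDER BY (model != :current_resident), created_at`. It is a sketch of the scheduling rule only; the real selection would be a SQL query against the `jobs` table, and the `Job` fields and helper names here are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Job carries only the columns the ordering depends on.
type Job struct {
	ID        string
	Model     string
	CreatedAt int64 // stand-in for the created_at column
}

// next picks the job the worker should run, mirroring
// ORDER BY (model != :current_resident), created_at.
func next(queue []Job, resident string) (Job, bool) {
	if len(queue) == 0 {
		return Job{}, false
	}
	sorted := append([]Job(nil), queue...)
	sort.SliceStable(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if (a.Model != resident) != (b.Model != resident) {
			return a.Model == resident // resident-model jobs sort first
		}
		return a.CreatedAt < b.CreatedAt
	})
	return sorted[0], true
}

// without returns a copy of queue with the given job removed.
func without(queue []Job, id string) []Job {
	out := make([]Job, 0, len(queue))
	for _, j := range queue {
		if j.ID != id {
			out = append(out, j)
		}
	}
	return out
}

// drain simulates the single worker: pick, run, repeat, counting model swaps.
func drain(queue []Job, resident string) (order []string, swaps int) {
	for {
		job, ok := next(queue, resident)
		if !ok {
			return
		}
		if job.Model != resident {
			resident = job.Model
			swaps++
		}
		order = append(order, job.ID)
		queue = without(queue, job.ID)
	}
}

func main() {
	queue := []Job{
		{"a", "qwen3:14b", 1},
		{"b", "qwen3:30b", 2},
		{"c", "qwen3:14b", 3},
		{"d", "qwen3:30b", 4},
	}
	order, swaps := drain(queue, "qwen3:14b")
	fmt.Println(order, "swaps:", swaps)
}
```

With `qwen3:14b` resident, the interleaved queue runs both 14b jobs first and swaps once, which is the property the Phase 3 test should assert.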
## Definition of done
- `go build/vet/test -race` green.
- The Phase 2 acceptance (go-llm completes a chat) still passes, now served
through the queue.
- Demonstrable: enqueue several jobs across two models and observe drain-by-model
ordering in logs; kill and restart foreman mid-queue and watch it resume.
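The retry rule from the tasks above (re-queue on connection failure with backoff, fail only after a bounded number of attempts) could be reduced to a small decision function along these lines. The constants, the doubling schedule, and the choice to treat non-connection errors as terminal are all assumptions; ADR-0004 is the authority on the real values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrTargetUnreachable is a hypothetical sentinel for classified
// connection failures from the Ollama client.
var ErrTargetUnreachable = errors.New("target unreachable")

// Assumed limits; the real ones belong in config or ADR-0004.
const (
	maxAttempts = 5
	baseBackoff = 2 * time.Second
	maxBackoff  = 2 * time.Minute
)

// backoff returns the delay before the retry that follows the given
// 1-based attempt, doubling each time up to maxBackoff.
func backoff(attempt int) time.Duration {
	d := baseBackoff
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= maxBackoff {
			return maxBackoff
		}
	}
	return d
}

// afterFailure decides the next state of a job whose attempt just failed.
// Connection failures are re-queued until the attempt budget is spent.
func afterFailure(attempt int, err error) (state string, retryIn time.Duration) {
	if errors.Is(err, ErrTargetUnreachable) && attempt < maxAttempts {
		return "queued", backoff(attempt)
	}
	return "failed", 0
}

func main() {
	err := fmt.Errorf("%w: dial tcp: connection refused", ErrTargetUnreachable)
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		state, wait := afterFailure(attempt, err)
		fmt.Println(attempt, state, wait)
	}
}
```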
Wrap up: `progress.md`, commit on `phase-3-queue`. M0 is effectively complete here
— note that. Phase 4 adds the async surface on top of this same engine.
+42
@@ -0,0 +1,42 @@
# phase-4.md — Async /jobs, job IDs, state webhooks, artifacts
Re-ground: `CLAUDE.md` + ADR-0004 (async surface), 0005 (webhook protocol), 0006
(artifacts). Plan, get approval, implement. **This phase delivers the headline
"queue & webhooks" capability — get it genuinely working end-to-end.**
## Objective
A fire-and-forget async surface over the Phase 3 engine: submit a job, get an ID
immediately, receive state updates and the final result/artifacts by webhook.
## Tasks
- `POST /jobs`: body is a native-chat payload plus an optional `state_webhook_url`
(webhooks are HMAC-signed when a secret is configured). Enqueue (reusing the
Phase 3 engine) and return `202` with `{ "job_id": "<ulid>" }` immediately.
- `GET /jobs/{id}`: current state, result, error, and artifact metadata (the
recovery/poll path for missed webhooks).
- `internal/webhook`: on each state transition (`queued→loading→working→done`,
plus `failed`), POST the event JSON from ADR-0005 to `state_webhook_url`.
At-least-once with bounded retry + backoff; never let a flaky receiver block or
fail the job. If a webhook secret is configured, sign the body with
HMAC-SHA256 in `X-Foreman-Signature`.
- Artifacts (ADR-0006): persist the completion as artifact `completion`; deliver
artifacts inline in the `done` event under a size threshold (default ~256KB),
otherwise send metadata + a URL and serve bytes at
`GET /jobs/{id}/artifacts/{name}`. Add a TTL prune sweep for jobs+artifacts.
- Decide and record (new ADR if needed) whether `state_webhook_url` is also
honored when present on the sync `/api/chat` path, or only on `/jobs`.
- Tests: spin up an in-test webhook receiver; assert the full lifecycle fires in
order with correct payloads, idempotency keys (`job_id`+`state`) are present,
a 500-ing receiver triggers retries without affecting job state, large
artifacts go by URL and small ones inline, and HMAC signatures verify.
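For the signing requirement above, a sketch of HMAC-SHA256 over the raw webhook body might look like this. Hex encoding of the `X-Foreman-Signature` value is an assumption; ADR-0005 defines the actual wire format, and the function names are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Hex is an assumed encoding for the X-Foreman-Signature header.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC over the exact bytes received and compares
// in constant time.
func verify(secret, body []byte, header string) bool {
	got, err := hex.DecodeString(header)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"job_id":"example","state":"done"}`)
	sig := sign(secret, body)
	fmt.Println("X-Foreman-Signature:", sig)
	fmt.Println("verifies:", verify(secret, body, sig))
}
```

Receivers need to verify against the raw request bytes, before any JSON decoding or re-encoding, or the signature will not match.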
## Definition of done
- `go build/vet/test -race` green.
- End-to-end demo: `POST /jobs` with a `state_webhook_url` pointed at a tiny local
listener; observe `queued→loading→working→done` plus the completion artifact;
confirm `GET /jobs/{id}` reconciles after a deliberately dropped webhook.
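The "tiny local listener" in the demo above could be as small as the sketch below. Because delivery is at-least-once, it deduplicates on `job_id` plus `state`. The JSON field names are assumed to match the idempotency key named in the tests; the full event shape lives in ADR-0005 and is not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// event holds only the two fields the idempotency key is built from.
type event struct {
	JobID string `json:"job_id"`
	State string `json:"state"`
}

// receiver records each (job_id, state) pair once, in arrival order.
type receiver struct {
	mu   sync.Mutex
	seen map[string]bool
	log  []string
}

func newReceiver() *receiver {
	return &receiver{seen: make(map[string]bool)}
}

func (rc *receiver) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	var ev event
	if err := json.NewDecoder(r.Body).Decode(&ev); err != nil || ev.JobID == "" || ev.State == "" {
		http.Error(w, "bad event", http.StatusBadRequest)
		return
	}
	key := ev.JobID + "/" + ev.State
	rc.mu.Lock()
	if !rc.seen[key] {
		rc.seen[key] = true
		rc.log = append(rc.log, key)
	}
	rc.mu.Unlock()
	// Acknowledge duplicates too, so the sender stops retrying.
	w.WriteHeader(http.StatusNoContent)
}

// deliver posts one body to the handler in memory and returns the status.
func deliver(h http.Handler, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/hook", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	rc := newReceiver()
	for _, state := range []string{"queued", "loading", "loading", "working", "done"} {
		deliver(rc, `{"job_id":"example","state":"`+state+`"}`)
	}
	fmt.Println(rc.log)
}
```

To run it as a real listener, pass the receiver to `http.ListenAndServe` on a local port and use that address as `state_webhook_url`.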
Wrap up: `progress.md`, commit on `phase-4-async-webhooks`. M1 core is complete.
+41
@@ -0,0 +1,41 @@
# phase-5.md — Go client package + llm.Foreman() in go-llm
Re-ground: `CLAUDE.md` + ADR-0011 (integration levels). Plan, get approval,
implement. Note this phase touches **two repos**.
## Objective
Make foreman ergonomic from Go: a synchronous client over the async surface
(Level 1), and the trivial `go-llm` target constructor (Level 0).
## Tasks — foreman repo
- `client/` (the one public package): a `foreman.Client` with a synchronous
`Complete(ctx, req) (Result, error)` that submits to `POST /jobs` with a
`state_webhook_url` pointed at an ephemeral receiver it stands up, blocks until
`done`, and returns the result + artifacts. Requirements:
  - Fall back to polling `GET /jobs/{id}` if it can't bind/receive callbacks
    (configurable; needed when foreman can't reach the caller).
  - Respect `ctx` cancellation/deadline; clean up the receiver and (optionally)
    best-effort cancel the job.
  - Constructor takes base URL + optional token; verify HMAC if a secret is set.
- Tests against a stubbed foreman (or the real handlers): happy path returns the
artifact; polling-fallback path works; context timeout is honored.
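The polling fallback and the `ctx` requirement can be sketched together as one loop. `pollUntilTerminal`, the `fetch` callback (standing in for `GET /jobs/{id}`), and the decision to abort on the first fetch error are assumptions for illustration; a real client may prefer to tolerate transient errors.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// tick is the polling interval used by the demo helpers below.
const tick = time.Millisecond

// pollUntilTerminal calls fetch immediately and then once per interval until
// the job reaches done or failed, or until ctx is cancelled or times out.
func pollUntilTerminal(ctx context.Context, every time.Duration, fetch func(context.Context) (string, error)) (string, error) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		state, err := fetch(ctx)
		if err != nil {
			return "", fmt.Errorf("%w: poll job", err)
		}
		if state == "done" || state == "failed" {
			return state, nil
		}
		select {
		case <-ctx.Done():
			return state, ctx.Err() // report the last state seen
		case <-t.C:
		}
	}
}

// runScripted polls a fake job that steps through states and then stays on
// the last one. budgetTicks bounds the context deadline in units of tick.
func runScripted(states []string, budgetTicks int) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(budgetTicks)*tick)
	defer cancel()
	i := 0
	fetch := func(context.Context) (string, error) {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return s, nil
	}
	return pollUntilTerminal(ctx, tick, fetch)
}

// isTimeout reports whether err is a context deadline error.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	state, err := runScripted([]string{"queued", "loading", "working", "done"}, 1000)
	fmt.Println(state, err)
	state, err = runScripted([]string{"queued"}, 20)
	fmt.Println(state, isTimeout(err))
}
```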
## Tasks — go-llm repo (via gitea MCP; open a branch/PR, do not push to main)
- Add the `Foreman` constructor to `v2/constructors.go`, delegating to the ollama
provider (exact form is specified in ADR-0011). Update `v2/CLAUDE.md` with the
DD#9 entry. Keep it a thin pass-through (Level 0); do **not** build a dedicated
provider yet.
- Confirm it compiles and `go vet`/tests pass in the `v2` module.
## Definition of done
- foreman: `go build/vet/test -race` green; `client.Complete()` returns a real
result against a running foreman.
- go-llm: `llm.Foreman("http://<foreman>:8080", token).Model("qwen3:30b")`
completes a chat. Changes are on a branch/PR for review, not committed to main.
Wrap up: `progress.md`, commit `client/` on `phase-5-client`; report the go-llm
branch name and PR link for me to review/merge.
+57
@@ -0,0 +1,57 @@
# phase-6.md — Deploy: steveternet compose + Traefik, env, docs, model script
Re-ground: `CLAUDE.md` + ADR-0002 (placement), 0010 (security). Plan, get
approval, implement. This phase touches **two repos** and must mirror existing
steveternet conventions — read them, don't invent.
## Objective
Make foreman deployable on orgrimmar via Komodo, exposed through Traefik, with
its model roster and operational notes documented.
## Tasks — read first (gitea MCP, steve/steveternet)
Study these for the exact conventions (network name, entrypoint, certresolver,
router/service label format, restart policy, `.env` usage):
`kalimdor/orgrimmar/warhol-queue/{docker-compose.yml,.env.example}`,
`kalimdor/orgrimmar/ratchet/docker-compose.yml`,
`kalimdor/orgrimmar/mort/docker-compose.yml`, and
`kalimdor/orgrimmar/traefik/` (incl. `custom/`).
## Tasks — foreman repo
- Finalize the `Dockerfile` from Phase 1 (label image, pin base digests if that's
the house style).
- `.env.example`: every config key with safe placeholder values, including
`FOREMAN_OLLAMA_URL` (the Mac's Tailscale address) and `FOREMAN_TOKEN`.
- `scripts/pull-models.sh`: the roster pulls (`qwen3:14b`, `qwen3:30b`,
`nomic-embed-text`, with the optional ones commented) plus the Mac-side
`launchctl setenv OLLAMA_MAX_LOADED_MODELS 2 / OLLAMA_KEEP_ALIVE -1 /
OLLAMA_CONTEXT_LENGTH 8192` lines as comments.
- `docs/deploy.md`: how it deploys (Komodo + compose), the security model
(Traefik internal-only or Tailscale; **not** a public entrypoint; Ollama target
firewalled to foreman), and the Mac prerequisites (Ollama bound to the tailnet,
`caffeinate`/`pmset`).
## Tasks — steveternet repo (gitea MCP; branch/PR, not main)
- Create `kalimdor/orgrimmar/foreman/docker-compose.yml` mirroring the analogs:
pull the foreman image from the gitea registry, the standard Traefik network +
router/service labels, `restart` policy, env from `.env`, and a named volume
for the SQLite DB. Decide (and note) whether the router is internal-only.
- Add `kalimdor/orgrimmar/foreman/.env.example`.
- If host-level routing belongs in `traefik/custom/` (as some services do), add
the file there instead of, or in addition to, the compose labels, following
those examples.
## Definition of done
- `docker build .` clean; compose validates (`docker compose config`).
- Labels/network/entrypoint match a sibling service exactly (diff against
`ratchet`/`warhol-queue` and confirm).
- `docs/deploy.md` is enough for a cold deploy. steveternet changes are on a
branch/PR for review.
Wrap up: `progress.md` (mark the project deployable), commit foreman docs/scripts
on `phase-6-deploy`; report the steveternet branch/PR. Then give me a short
end-to-end smoke-test checklist (pull models on the Mac → deploy foreman → go-llm
chat → `POST /jobs` with a webhook).