From e119ed325b1419c17617725cf8fd629820aacb26 Mon Sep 17 00:00:00 2001
From: Steve Dudenhoeffer
Date: Sat, 23 May 2026 18:43:10 -0400
Subject: [PATCH] chore: add deployment docs, model script, and finalize env config

Phase 6 deployment infrastructure: finalize Dockerfile with OCI labels,
improve .env.example with grouped config keys, add scripts/pull-models.sh
for Mac-side model setup, and add docs/deploy.md covering the full
deployment topology, prerequisites, security model, and troubleshooting.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .env.example           |  39 +++++---
 Dockerfile             |   2 +
 docs/deploy.md         | 214 +++++++++++++++++++++++++++++++++++++++++
 progress.md            |  23 +++++
 scripts/pull-models.sh |  34 +++++++
 5 files changed, 297 insertions(+), 15 deletions(-)
 create mode 100644 docs/deploy.md
 create mode 100755 scripts/pull-models.sh

diff --git a/.env.example b/.env.example
index 37c7752..c2016b9 100644
--- a/.env.example
+++ b/.env.example
@@ -1,32 +1,41 @@
-# foreman configuration — all env vars are FOREMAN_* namespaced.
-# Copy to .env and fill in values for local development.
+# === foreman daemon configuration ===
 
-# Listen address for the HTTP server (default: :8080)
+# Listen address (default: :8080)
 FOREMAN_ADDR=:8080
 
-# Base URL of the Ollama target (required)
-FOREMAN_OLLAMA_URL=http://mac.tail:11434
+# Ollama target URL (required — the Mac's Tailscale address)
+FOREMAN_OLLAMA_URL=http://100.x.x.x:11434
 
-# Optional bearer token foreman sends to the Ollama target
+# Outbound bearer token for Ollama target (optional)
 FOREMAN_OLLAMA_TOKEN=
 
-# Optional bearer token callers must present to foreman
-FOREMAN_TOKEN=
+# Inbound bearer token foreman requires of its callers (optional)
+FOREMAN_TOKEN=change-me-to-a-secret
 
-# Always-resident embedder model (e.g. nomic-embed-text, qwen3-embedding:0.6b)
+# === Model configuration ===
+
+# Always-resident embedding model (pinned in slot 1)
 FOREMAN_EMBED_MODEL=nomic-embed-text
 
-# Path to the SQLite database file (default: foreman.db)
-FOREMAN_DB_PATH=foreman.db
+# === Persistence ===
 
-# How often to poll the target's /api/tags (default: 30s)
+# SQLite database path (default: foreman.db)
+FOREMAN_DB_PATH=/data/foreman.db
+
+# === Polling ===
+
+# Model polling interval (default: 30s)
 FOREMAN_POLL_INTERVAL=30s
 
-# Optional HMAC key for signing webhook payloads (ADR-0005)
+# === Webhooks ===
+
+# Webhook HMAC signing secret (optional — signs X-Foreman-Signature header)
 FOREMAN_WEBHOOK_SECRET=
 
-# Maximum retry attempts for a job before marking as failed (default: 3)
+# === Job lifecycle ===
+
+# Max retry attempts for failed jobs (default: 3)
 FOREMAN_MAX_ATTEMPTS=3
 
-# How long to retain completed/failed jobs before pruning (default: 24h)
+# TTL for completed/failed jobs before pruning (default: 24h)
 FOREMAN_JOB_TTL=24h
diff --git a/Dockerfile b/Dockerfile
index 0835faa..cce07df 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -6,6 +6,8 @@ COPY . .
 RUN CGO_ENABLED=0 go build -o /out/foreman ./cmd/foreman
 
 FROM gcr.io/distroless/static-debian12
+LABEL org.opencontainers.image.source="https://gitea.stevedudenhoeffer.com/steve/foreman"
+LABEL org.opencontainers.image.description="Queued Ollama proxy daemon"
 COPY --from=build /out/foreman /foreman
 EXPOSE 8080
 ENTRYPOINT ["/foreman", "serve"]
diff --git a/docs/deploy.md b/docs/deploy.md
new file mode 100644
index 0000000..974fcd9
--- /dev/null
+++ b/docs/deploy.md
@@ -0,0 +1,214 @@
+# foreman deployment guide
+
+## Overview
+
+foreman runs on **orgrimmar** (homelab server), containerized via Komodo/docker-compose,
+reaching the Mac's Ollama instance over the trusted VLAN or Tailscale. The Mac is a
+dumb appliance running only Ollama; foreman handles queuing, model inventory, and
+job lifecycle.
+
+```
+orgrimmar (docker)        Tailscale / VLAN        M1 Pro Mac
+┌──────────────┐          ┌─────────────┐         ┌──────────┐
+│ foreman      │──HTTP───▶│ 100.x.x.x   │────────▶│ Ollama   │
+│ :8080        │          │ :11434      │         │ :11434   │
+└──────────────┘          └─────────────┘         └──────────┘
+```
+
+## Prerequisites on the Mac
+
+### 1. Install and configure Ollama
+
+Ollama must be installed and listening on a network-accessible address (not just
+localhost). Either bind to `0.0.0.0` or the Tailscale IP:
+
+```bash
+launchctl setenv OLLAMA_HOST 0.0.0.0:11434
+```
+
+### 2. Set environment variables
+
+```bash
+launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
+launchctl setenv OLLAMA_CONTEXT_LENGTH 8192
+```
+
+Then restart the Ollama application for changes to take effect.
+
+- `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 for the always-resident embedder, slot 2
+  for the rotating worker model.
+- `OLLAMA_CONTEXT_LENGTH=8192` — minimum recommended context window.
+
+### 3. Pull models
+
+Run the helper script from the foreman repo on the Mac:
+
+```bash
+OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh
+```
+
+This pulls the recommended roster:
+- **nomic-embed-text** — embedder (always resident, slot 1, ~0.3 GB)
+- **qwen3:14b** — parse/data tasks (~9 GB)
+- **qwen3:30b** — agent + code, default worker (~19 GB)
+
+### 4. Prevent sleep during jobs
+
+Use `caffeinate` or `pmset` to prevent the Mac from sleeping while foreman may
+dispatch work:
+
+```bash
+caffeinate -s &
+# Or permanently via System Settings > Energy Saver
+```
+
+### 5. Firewall
+
+Ollama's `:11434` should be accessible only from foreman's IP (the orgrimmar host).
+Use either:
+
+- **Tailscale ACLs** — restrict `:11434` to orgrimmar's Tailscale IP.
+- **macOS firewall** — allow inbound on `:11434` only from orgrimmar.
+- **pf rules** — for more granular control.
+
+## foreman deployment on orgrimmar
+
+### Image
+
+The container image is built by gitea CI (`.gitea/workflows/ci.yaml`) and pushed
+to the registry:
+
+```
+gitea.stevedudenhoeffer.com/steve/foreman:latest
+```
+
+### Komodo deployment
+
+Komodo reads the docker-compose.yml from the steveternet repo at
+`azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml`.
+
+1. Copy `.env.example` to `.env` and fill in values (see below).
+2. Deploy via Komodo's stack sync.
+
+### Configuration
+
+Create `.env` from `.env.example` in the same directory as the compose file:
+
+```bash
+# Required — the Mac's Tailscale or LAN address
+FOREMAN_OLLAMA_URL=http://100.x.x.x:11434
+
+# Optional — bearer token foreman sends to Ollama target
+FOREMAN_OLLAMA_TOKEN=
+
+# Optional — bearer token callers must present to foreman
+FOREMAN_TOKEN=your-secret-here
+
+# Embedder model (must be pulled on Mac)
+FOREMAN_EMBED_MODEL=nomic-embed-text
+
+# Other settings have sensible defaults; see .env.example for the full list.
+```
+
+### Persistence
+
+SQLite is persisted in a named Docker volume `foreman_data` mounted at `/data`.
+The database file is `/data/foreman.db` (WAL mode, pure-Go driver, no CGO).
+
+## Security model
+
+foreman is **not** exposed on a public Traefik entrypoint:
+
+- It gets Traefik labels for **internal hostname routing** only:
+  `foreman.orgrimmar.dudenhoeffer.casa` resolves internally on the LAN/Tailscale.
+- It is **not** in any public DNS.
+- Accessible via LAN and Tailscale only.
+
+### Authentication
+
+- **Inbound (callers to foreman):** optional static bearer token via
+  `FOREMAN_TOKEN`. When set, callers must send `Authorization: Bearer <token>`.
+  The `/healthz` endpoint is always unauthenticated.
+- **Outbound (foreman to Ollama):** optional bearer token via `FOREMAN_OLLAMA_TOKEN`,
+  forwarded to the target on every request.
+- **Webhooks:** optional HMAC-SHA256 signing via `FOREMAN_WEBHOOK_SECRET`. When
+  set, foreman adds `X-Foreman-Signature: sha256=<signature>` to webhook POSTs.
+
+## go-llm usage
+
+foreman is a drop-in Ollama-compatible target for go-llm:
+
+```go
+import "gitea.stevedudenhoeffer.com/steve/go-llm/v2"
+
+model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")
+```
+
+This uses the synchronous `/api/chat` passthrough. Streaming, tool calling, and
+thinking tokens all work transparently.
+
+For async job submission, use the client package:
+
+```go
+import "gitea.stevedudenhoeffer.com/steve/foreman/client"
+
+c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
+	client.WithToken("your-token"),
+)
+result, err := c.Submit(ctx, client.SubmitRequest{
+	Model:    "qwen3:30b",
+	Messages: messages,
+})
+```
+
+## Troubleshooting
+
+### Target unreachable
+
+**Symptom:** `/healthz` returns `{"status":"ok","degraded":true}`, jobs fail with
+connection errors.
+
+**Cause:** The Mac is asleep, Ollama is not running, or the network path is broken.
+
+**Fix:**
+1. Wake the Mac / start Ollama.
+2. Verify connectivity: `curl http://100.x.x.x:11434/api/tags` from orgrimmar.
+3. Check Tailscale status: `tailscale status` on both machines.
+4. Jobs will automatically retry (up to `FOREMAN_MAX_ATTEMPTS`). The poller
+   recovers automatically when the target comes back.
+
+### Model not found (404)
+
+**Symptom:** `/api/chat` or `POST /jobs` returns 404 for a model name.
+
+**Fix:**
+1. Verify the model is pulled on the Mac: `ollama list`.
+2. Check the exact tag — Ollama tags change between versions.
+3. foreman re-polls on a miss; if the model was just pulled, retry after
+   `FOREMAN_POLL_INTERVAL` (default 30s).
+
+### HMAC signature mismatch
+
+**Symptom:** Webhook receiver rejects events with signature errors.
+
+**Fix:**
+1. Verify `FOREMAN_WEBHOOK_SECRET` matches between foreman and the receiver.
+2. The signature covers the raw JSON body; verify the receiver reads the body
+   before parsing.
+
+### Job stuck in loading/working
+
+**Symptom:** A job stays in a non-terminal state indefinitely.
+
+**Cause:** foreman crashed or restarted mid-job.
+
+**Fix:** foreman resets interrupted jobs (loading/working) to queued on startup.
+Restart foreman to recover. Jobs are retried up to `FOREMAN_MAX_ATTEMPTS`.
+
+### SQLite busy/locked errors
+
+**Symptom:** HTTP handlers return 500 with "database is locked".
+
+**Fix:** The SQLite DSN includes `busy_timeout=5000` (5 seconds). If this is
+insufficient under load, increase it. WAL mode ensures readers do not block
+the single writer.
diff --git a/progress.md b/progress.md
index 16f4686..aed667f 100644
--- a/progress.md
+++ b/progress.md
@@ -203,3 +203,26 @@ with the real SQLite-backed job queue and single worker loop.
 - Delegates to existing `ollamaProvider.New()` — zero new code paths.
 - DD#9 added to `v2/CLAUDE.md`.
 - PR: https://gitea.stevedudenhoeffer.com/steve/go-llm/pulls/4
+
+## Phase 6: Deployment infrastructure — 2026-05-23
+
+**Project is deployable.** All deployment artifacts are in place.
+
+- `Dockerfile` — finalized with OCI labels (`image.source`, `image.description`).
+  Multi-stage distroless build, CGO_ENABLED=0, `foreman serve` entrypoint.
+- `.env.example` — finalized with all 10 config keys from `internal/config/`,
+  grouped by function (daemon, model, persistence, polling, webhooks, job lifecycle)
+  with clear comments and example values.
+- `scripts/pull-models.sh` — executable helper to pull the recommended model roster
+  on the Mac (nomic-embed-text, qwen3:14b, qwen3:30b). Prints Mac-side Ollama
+  environment setup instructions.
+- `docs/deploy.md` — full deployment guide covering: topology overview, Mac
+  prerequisites (Ollama config, env vars, model pull, sleep prevention, firewall),
+  orgrimmar deployment (image registry, Komodo, config, persistence), security
+  model (internal-only, no public DNS, bearer tokens, HMAC), go-llm usage
+  (sync + async), and troubleshooting (5 common scenarios).
+- steveternet compose stack — PR to `steve/steveternet` adding
+  `azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml` and `.env.example`.
+  Follows sibling conventions: `web` network (external), `unless-stopped`,
+  gitea registry image, Traefik labels for internal routing, named volume
+  for SQLite persistence, all config via `${VAR}` interpolation.
diff --git a/scripts/pull-models.sh b/scripts/pull-models.sh
new file mode 100755
index 0000000..cc1b58d
--- /dev/null
+++ b/scripts/pull-models.sh
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+# Pull the recommended model roster on the Mac.
+# Run this ON the Mac where Ollama is installed.
+set -euo pipefail
+
+OLLAMA_HOST="${OLLAMA_HOST:-http://localhost:11434}"
+
+echo "=== Pulling models to ${OLLAMA_HOST} ==="
+
+# Embedder (always resident, slot 1)
+echo "--- Embedder: nomic-embed-text ---"
+curl -s "${OLLAMA_HOST}/api/pull" -d '{"name":"nomic-embed-text"}' | jq -r '.status // empty'
+
+# Worker models (rotate through slot 2)
+echo "--- Worker: qwen3:14b (parse/data) ---"
+curl -s "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen3:14b"}' | jq -r '.status // empty'
+
+echo "--- Worker: qwen3:30b (agent+code, default) ---"
+curl -s "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen3:30b"}' | jq -r '.status // empty'
+
+# Optional — uncomment if needed:
+# echo "--- Worker: gpt-oss:20b (fast coder) ---"
+# curl -s "${OLLAMA_HOST}/api/pull" -d '{"name":"gpt-oss:20b"}' | jq -r '.status // empty'
+# echo "--- Worker: qwen2.5-coder:32b (quality coder, slow) ---"
+# curl -s "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen2.5-coder:32b"}' | jq -r '.status // empty'
+
+echo ""
+echo "=== Mac-side Ollama environment (set via launchctl or .zshrc) ==="
+echo " OLLAMA_MAX_LOADED_MODELS=2"
+echo " OLLAMA_KEEP_ALIVE=-1 # for the embedder slot"
+echo " OLLAMA_CONTEXT_LENGTH=8192 # minimum recommended"
+echo ""
+echo " Example: launchctl setenv OLLAMA_MAX_LOADED_MODELS 2"
+echo " Then restart Ollama."
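
The "HMAC signature mismatch" entry in docs/deploy.md says the signature covers the raw JSON body, so a receiver can recompute it from the unparsed bytes and compare it with the `X-Foreman-Signature` header. The following is a minimal shell sketch of that check, not foreman's own code. It assumes the value after `sha256=` is a lowercase hex digest (the guide does not state the encoding) and that the `openssl` CLI is installed; `foreman_sig` and `verify_foreman_sig` are hypothetical helper names.

```shell
# Hypothetical helpers for a webhook receiver; not part of foreman.

# Print the hex HMAC-SHA256 of a body ($2) under a shared secret ($1).
# Assumes the openssl CLI is available.
foreman_sig() {
  # openssl prints "<label>= <hex>"; keep only the last field (the digest).
  printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Succeed (exit 0) when the header value ($3) matches the signature of the
# raw body ($2) under the secret ($1).
# Assumption: the header value has the form "sha256=<hex digest>".
verify_foreman_sig() {
  [ "$3" = "sha256=$(foreman_sig "$1" "$2")" ]
}
```

A receiver would call `verify_foreman_sig "$FOREMAN_WEBHOOK_SECRET" "$raw_body" "$header_value"` before parsing the JSON. Shell command substitution strips trailing newlines, so the body has to be captured byte for byte, and a production receiver would likely prefer a constant-time comparison in its own language.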