chore: add deployment docs, model script, and finalize env config

Phase 6 deployment infrastructure: finalize Dockerfile with OCI labels, improve .env.example with grouped config keys, add scripts/pull-models.sh for Mac-side model setup, and add docs/deploy.md covering the full deployment topology, prerequisites, security model, and troubleshooting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:43:10 -04:00
parent 4759a06d1b
commit e119ed325b
5 changed files with 297 additions and 15 deletions
@@ -1,32 +1,41 @@
-# foreman configuration — all env vars are FOREMAN_* namespaced.
+# === foreman daemon configuration ===
 # Copy to .env and fill in values for local development.
-# Listen address for the HTTP server (default: :8080)
+# Listen address (default: :8080)
 FOREMAN_ADDR=:8080
-# Base URL of the Ollama target (required)
+# Ollama target URL (required — the Mac's Tailscale address)
-FOREMAN_OLLAMA_URL=http://mac.tail:11434
+FOREMAN_OLLAMA_URL=http://100.x.x.x:11434
-# Optional bearer token foreman sends to the Ollama target
+# Outbound bearer token for Ollama target (optional)
 FOREMAN_OLLAMA_TOKEN=
-# Optional bearer token callers must present to foreman
+# Inbound bearer token foreman requires of its callers (optional)
-FOREMAN_TOKEN=
+FOREMAN_TOKEN=change-me-to-a-secret
-# Always-resident embedder model (e.g. nomic-embed-text, qwen3-embedding:0.6b)
+# === Model configuration ===
 # Always-resident embedding model (pinned in slot 1)
 FOREMAN_EMBED_MODEL=nomic-embed-text
-# Path to the SQLite database file (default: foreman.db)
+# === Persistence ===
 FOREMAN_DB_PATH=foreman.db
-# How often to poll the target's /api/tags (default: 30s)
+# SQLite database path (default: foreman.db)
 FOREMAN_DB_PATH=/data/foreman.db
 # === Polling ===
 # Model polling interval (default: 30s)
 FOREMAN_POLL_INTERVAL=30s
-# Optional HMAC key for signing webhook payloads (ADR-0005)
+# === Webhooks ===
 # Webhook HMAC signing secret (optional — signs X-Foreman-Signature header)
 FOREMAN_WEBHOOK_SECRET=
-# Maximum retry attempts for a job before marking as failed (default: 3)
+# === Job lifecycle ===
 # Max retry attempts for failed jobs (default: 3)
 FOREMAN_MAX_ATTEMPTS=3
-# How long to retain completed/failed jobs before pruning (default: 24h)
+# TTL for completed/failed jobs before pruning (default: 24h)
 FOREMAN_JOB_TTL=24h
@@ -6,6 +6,8 @@ COPY . .
 RUN CGO_ENABLED=0 go build -o /out/foreman ./cmd/foreman
 FROM gcr.io/distroless/static-debian12
 LABEL org.opencontainers.image.source="https://gitea.stevedudenhoeffer.com/steve/foreman"
 LABEL org.opencontainers.image.description="Queued Ollama proxy daemon"
 COPY --from=build /out/foreman /foreman
 EXPOSE 8080
 ENTRYPOINT ["/foreman", "serve"]
@@ -0,0 +1,214 @@
 # foreman deployment guide
 ## Overview
 foreman runs on **orgrimmar** (homelab server), containerized via Komodo/docker-compose,
 reaching the Mac's Ollama instance over the trusted VLAN or Tailscale. The Mac is a
 dumb appliance running only Ollama; foreman handles queuing, model inventory, and
 job lifecycle.
 ```
 orgrimmar (docker)         Tailscale / VLAN        M1 Pro Mac
 ┌──────────────┐          ┌─────────────┐         ┌──────────┐
 │   foreman    │──HTTP───▶│  100.x.x.x  │────────▶│  Ollama  │
 │  :8080       │          │  :11434     │         │  :11434  │
 └──────────────┘          └─────────────┘         └──────────┘
 ```
 ## Prerequisites on the Mac
 ### 1. Install and configure Ollama
 Ollama must be installed and listening on a network-accessible address (not just
 localhost). Either bind to `0.0.0.0` or the Tailscale IP:
 ```bash
 launchctl setenv OLLAMA_HOST 0.0.0.0:11434
 ```
 ### 2. Set environment variables
 ```bash
 launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
 launchctl setenv OLLAMA_CONTEXT_LENGTH 8192
 ```
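 Then quit and restart the Ollama app for the changes to take effect. Values set with
 `launchctl setenv` do not persist across reboots, so re-run these commands after a restart.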
 - `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 for the always-resident embedder, slot 2
  for the rotating worker model.
 - `OLLAMA_CONTEXT_LENGTH=8192` — minimum recommended context window.
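The slot split can be pictured as a tiny state machine. The Go sketch below is a hypothetical illustration, not foreman's scheduler: `residency` and `request` are invented names. It assumes that a request for a non-resident worker displaces whichever worker holds slot 2, and that the embedder is never touched.

```go
package main

import "fmt"

// residency is a hypothetical model of the two-slot layout:
// slot 1 always holds the embedder, slot 2 holds the most recent worker.
type residency struct {
	embedder string // pinned in slot 1, never evicted
	worker   string // slot 2; empty until the first worker is loaded
}

// request records that model is needed and returns the worker that has to be
// unloaded to make room for it ("" when no swap is required).
func (r *residency) request(model string) string {
	if model == r.embedder || model == r.worker {
		return "" // already resident, nothing to unload
	}
	evicted := r.worker // "" if slot 2 was still empty
	r.worker = model
	return evicted
}

func main() {
	r := &residency{embedder: "nomic-embed-text"}
	for _, m := range []string{"qwen3:30b", "nomic-embed-text", "qwen3:14b"} {
		fmt.Printf("request %s -> evict %q\n", m, r.request(m))
	}
}
```

Ollama's own scheduler decides what it actually unloads. The sketch only captures the intended layout.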
 ### 3. Pull models
 Run the helper script from the foreman repo on the Mac (it needs `curl` and `jq`):
 ```bash
 OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh
 ```
 This pulls the recommended roster:
 - **nomic-embed-text** — embedder (always resident, slot 1, ~0.3 GB)
 - **qwen3:14b** — parse/data tasks (~9 GB)
 - **qwen3:30b** — agent + code, default worker (~19 GB)
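Only one worker is resident next to the embedder, so peak model memory is roughly the embedder plus the largest worker. The sketch below does that arithmetic with the approximate sizes listed above. `peak` is a hypothetical helper. Actual usage runs higher once the KV cache for the context window is allocated.

```go
package main

import "fmt"

// peak returns the largest worker and the approximate combined size (GB) of
// the embedder plus that worker. Sizes are approximate download sizes.
func peak(embedderGB float64, workers map[string]float64) (string, float64) {
	largest, largestGB := "", 0.0
	for name, gb := range workers {
		if gb > largestGB {
			largest, largestGB = name, gb
		}
	}
	return largest, embedderGB + largestGB
}

func main() {
	workers := map[string]float64{"qwen3:14b": 9, "qwen3:30b": 19}
	name, total := peak(0.3, workers)
	fmt.Printf("peak: nomic-embed-text + %s ~ %.1f GB\n", name, total)
}
```

For this roster that comes to about 19.3 GB (nomic-embed-text plus qwen3:30b).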
 ### 4. Prevent sleep during jobs
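 Use `caffeinate` or `pmset` to keep the Mac awake whenever foreman might
 dispatch work:
 ```bash
 caffeinate -s &          # prevents system sleep while on AC power, until killed
 # Or permanently for the charger profile (also in System Settings > Energy Saver):
 sudo pmset -c sleep 0
 ```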
 ### 5. Firewall
 Ollama's `:11434` should be accessible only from orgrimmar, the host running foreman.
 Use one of:
 - **Tailscale ACLs** — restrict `:11434` to orgrimmar's Tailscale IP.
 - **pf rules** — allow inbound `:11434` only from orgrimmar's address.
 The built-in macOS application firewall is not a substitute on its own: it allows or
 blocks incoming connections per app, not per source address.
 ## foreman deployment on orgrimmar
 ### Image
 The container image is built by gitea CI (`.gitea/workflows/ci.yaml`) and pushed
 to the registry:
 ```
 gitea.stevedudenhoeffer.com/steve/foreman:latest
 ```
 ### Komodo deployment
 Komodo reads the docker-compose.yml from the steveternet repo at
 `azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml`.
 1. Copy `.env.example` to `.env` and fill in values (see below).
 2. Deploy via Komodo's stack sync.
 ### Configuration
 Create `.env` from `.env.example` in the same directory as the compose file:
 ```bash
 # Required — the Mac's Tailscale or LAN address
 FOREMAN_OLLAMA_URL=http://100.x.x.x:11434
 # Optional — bearer token foreman sends to Ollama target
 FOREMAN_OLLAMA_TOKEN=
 # Optional — bearer token callers must present to foreman
 FOREMAN_TOKEN=your-secret-here
 # Embedder model (must be pulled on Mac)
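 FOREMAN_EMBED_MODEL=nomic-embed-text
 # SQLite path; must live inside the foreman_data volume (see Persistence)
 FOREMAN_DB_PATH=/data/foreman.db
 # Other settings have sensible defaults; see .env.example for the full list.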
 ```
 ### Persistence
 SQLite is persisted in a named Docker volume `foreman_data` mounted at `/data`.
 The database file is `/data/foreman.db` (WAL mode, pure-Go driver, no CGO).
 ## Security model
 foreman is **not** exposed on a public Traefik entrypoint:
 - It gets Traefik labels for **internal hostname routing** only:
  `foreman.orgrimmar.dudenhoeffer.casa` resolves internally on the LAN/Tailscale.
 - It is **not** in any public DNS.
 - Accessible via LAN and Tailscale only.
 ### Authentication
 - **Inbound (callers to foreman):** optional static bearer token via
  `FOREMAN_TOKEN`. When set, callers must send `Authorization: Bearer <token>`.
  The `/healthz` endpoint is always unauthenticated.
 - **Outbound (foreman to Ollama):** optional bearer token via `FOREMAN_OLLAMA_TOKEN`,
  sent as `Authorization: Bearer <token>` on every request foreman makes to the target.
 - **Webhooks:** optional HMAC-SHA256 signing via `FOREMAN_WEBHOOK_SECRET`. When
  set, foreman adds `X-Foreman-Signature: sha256=<hex>` to webhook POSTs.
 ## go-llm usage
 foreman is a drop-in Ollama-compatible target for go-llm:
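 ```go
 import (
 	"os"
 	llm "gitea.stevedudenhoeffer.com/steve/go-llm/v2"
 )
 // token is the bearer token foreman expects (its FOREMAN_TOKEN value).
 token := os.Getenv("FOREMAN_TOKEN")
 model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")
 ```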
 This uses the synchronous `/api/chat` passthrough. Streaming, tool calling, and
 thinking tokens all work transparently.
 For async job submission, use the client package:
 ```go
 import "gitea.stevedudenhoeffer.com/steve/foreman/client"
 c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
    client.WithToken("your-token"),
 )
 result, err := c.Submit(ctx, client.SubmitRequest{
    Model:    "qwen3:30b",
    Messages: messages,
 })
 ```
 ## Troubleshooting
 ### Target unreachable
 **Symptom:** `/healthz` returns `{"status":"ok","degraded":true}`, jobs fail with
 connection errors.
 **Cause:** The Mac is asleep, Ollama is not running, or the network path is broken.
 **Fix:**
 1. Wake the Mac / start Ollama.
 2. Verify connectivity: `curl http://100.x.x.x:11434/api/tags` from orgrimmar.
 3. Check Tailscale status: `tailscale status` on both machines.
 4. Jobs will automatically retry (up to `FOREMAN_MAX_ATTEMPTS`). The poller
   recovers automatically when the target comes back.
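A probe can tell "foreman down" apart from "foreman up, target unreachable" by reading the `degraded` flag. The following is a minimal sketch. It assumes the `/healthz` body has the shape shown above and ignores any extra fields. `healthz` and `parseHealth` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthz mirrors the documented /healthz response shape.
type healthz struct {
	Status   string `json:"status"`
	Degraded bool   `json:"degraded"`
}

// parseHealth classifies a /healthz body as "ok" or "degraded" (foreman is up
// but the Ollama target is unreachable); anything else is an error.
func parseHealth(body []byte) (string, error) {
	var h healthz
	if err := json.Unmarshal(body, &h); err != nil {
		return "", fmt.Errorf("decode /healthz: %w", err)
	}
	if h.Status != "ok" {
		return "", fmt.Errorf("unexpected status %q", h.Status)
	}
	if h.Degraded {
		return "degraded", nil
	}
	return "ok", nil
}

func main() {
	state, err := parseHealth([]byte(`{"status":"ok","degraded":true}`))
	fmt.Println(state, err)
}
```

`/healthz` is always unauthenticated, so such a probe needs no token.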
 ### Model not found (404)
 **Symptom:** `/api/chat` or `POST /jobs` returns 404 for a model name.
 **Fix:**
 1. Verify the model is pulled on the Mac: `ollama list`.
 2. Check the exact tag: a bare name such as `qwen3` resolves to `qwen3:latest`, and tags
   in the Ollama library can change between releases.
 3. foreman re-polls on a miss; if the model was just pulled, retry after
   `FOREMAN_POLL_INTERVAL` (default 30s).
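To compare a requested name with what the target reports, check the output of `curl http://100.x.x.x:11434/api/tags` directly. Below is a hypothetical helper for this (`hasModel` and `normalize` are invented here). It assumes the standard `{"models":[{"name":...}]}` response and treats a bare name as `:latest`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tagsResponse is the subset of Ollama's /api/tags response used here.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// normalize appends ":latest" when the final path segment has no tag, so a
// registry host with a port (host:5000/model) is not mistaken for a tag.
func normalize(name string) string {
	base := name[strings.LastIndex(name, "/")+1:]
	if !strings.Contains(base, ":") {
		return name + ":latest"
	}
	return name
}

// hasModel reports whether model appears in a raw /api/tags body.
func hasModel(tagsJSON []byte, model string) (bool, error) {
	var tr tagsResponse
	if err := json.Unmarshal(tagsJSON, &tr); err != nil {
		return false, err
	}
	want := normalize(model)
	for _, m := range tr.Models {
		if normalize(m.Name) == want {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	body := []byte(`{"models":[{"name":"nomic-embed-text:latest"},{"name":"qwen3:30b"}]}`)
	for _, m := range []string{"nomic-embed-text", "qwen3:30b", "qwen3:32b"} {
		ok, err := hasModel(body, m)
		fmt.Println(m, ok, err)
	}
}
```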
 ### HMAC signature mismatch
 **Symptom:** Webhook receiver rejects events with signature errors.
 **Fix:**
 1. Verify `FOREMAN_WEBHOOK_SECRET` matches between foreman and the receiver.
 2. The signature covers the raw JSON body; make sure the receiver computes the HMAC
   over the exact bytes it received, before any JSON parsing or re-serialization.
 ### Job stuck in loading/working
 **Symptom:** A job stays in a non-terminal state indefinitely.
 **Cause:** foreman crashed or restarted mid-job.
 **Fix:** foreman resets interrupted jobs (loading/working) to queued on startup.
 Restart foreman to recover. Jobs are retried up to `FOREMAN_MAX_ATTEMPTS`.
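The startup recovery amounts to one pass over non-terminal jobs. The following is a hypothetical sketch. The `job` type, the `completed`/`failed` state names, and `recoverInterrupted` are illustrative assumptions, not foreman's schema. The sketch leaves the attempt counter untouched, so the `FOREMAN_MAX_ATTEMPTS` cap still applies when the job is retried.

```go
package main

import "fmt"

type state string

const (
	queued    state = "queued"
	loading   state = "loading"
	working   state = "working"
	completed state = "completed"
	failed    state = "failed"
)

type job struct {
	ID       string
	State    state
	Attempts int
}

// recoverInterrupted runs once at startup: any job left in loading or working
// by a crash goes back to queued. Queued and terminal jobs are untouched, and
// Attempts is preserved. It returns how many jobs were requeued.
func recoverInterrupted(jobs []job) int {
	n := 0
	for i := range jobs {
		if jobs[i].State == loading || jobs[i].State == working {
			jobs[i].State = queued
			n++
		}
	}
	return n
}

func main() {
	jobs := []job{
		{ID: "a", State: working, Attempts: 1},
		{ID: "b", State: completed, Attempts: 1},
		{ID: "c", State: loading, Attempts: 2},
	}
	fmt.Println("requeued:", recoverInterrupted(jobs))
	fmt.Printf("%+v\n", jobs)
}
```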
 ### SQLite busy/locked errors
 **Symptom:** HTTP handlers return 500 with "database is locked".
 **Fix:** The SQLite DSN includes `busy_timeout=5000` (5 seconds); if that is
 insufficient under load, increase it. WAL mode lets readers run concurrently with
 the single writer, so persistent lock errors usually point to contention between writes.
@@ -203,3 +203,26 @@ with the real SQLite-backed job queue and single worker loop.
  - Delegates to existing `ollamaProvider.New()` — zero new code paths.
  - DD#9 added to `v2/CLAUDE.md`.
  - PR: https://gitea.stevedudenhoeffer.com/steve/go-llm/pulls/4
 ## Phase 6: Deployment infrastructure — 2026-05-23
 **Project is deployable.** All deployment artifacts are in place.
 - `Dockerfile` — finalized with OCI labels (`image.source`, `image.description`).
  Multi-stage distroless build, CGO_ENABLED=0, `foreman serve` entrypoint.
 - `.env.example` — finalized with all 10 config keys from `internal/config/`,
  grouped by function (daemon, model, persistence, polling, webhooks, job lifecycle)
  with clear comments and example values.
 - `scripts/pull-models.sh` — executable helper to pull the recommended model roster
  on the Mac (nomic-embed-text, qwen3:14b, qwen3:30b). Prints Mac-side Ollama
  environment setup instructions.
 - `docs/deploy.md` — full deployment guide covering: topology overview, Mac
  prerequisites (Ollama config, env vars, model pull, sleep prevention, firewall),
  orgrimmar deployment (image registry, Komodo, config, persistence), security
  model (internal-only, no public DNS, bearer tokens, HMAC), go-llm usage
  (sync + async), and troubleshooting (5 common scenarios).
 - steveternet compose stack — PR to `steve/steveternet` adding
  `azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml` and `.env.example`.
  Follows sibling conventions: `web` network (external), `unless-stopped`,
  gitea registry image, Traefik labels for internal routing, named volume
  for SQLite persistence, all config via `${VAR}` interpolation.
@@ -0,0 +1,34 @@
 #!/usr/bin/env bash
 # Pull the recommended model roster on the Mac.
 # Run this ON the Mac where Ollama is installed. Requires curl and jq.
 set -euo pipefail
 OLLAMA_HOST="${OLLAMA_HOST:-http://localhost:11434}"
 command -v jq >/dev/null || { echo "error: jq is required" >&2; exit 1; }
 # pull MODEL: stream /api/pull progress and fail if the pull fails. Errors can
 # arrive as {"error": "..."} objects, which '.status // empty' alone would hide.
 pull() {
   curl -sSf "${OLLAMA_HOST}/api/pull" -d "{\"name\":\"$1\"}" \
     | jq -r 'if .error then error(.error) else (.status // empty) end'
 }
 echo "=== Pulling models to ${OLLAMA_HOST} ==="
 # Embedder (always resident, slot 1)
 echo "--- Embedder: nomic-embed-text ---"
 pull nomic-embed-text
 # Worker models (rotate through slot 2)
 echo "--- Worker: qwen3:14b (parse/data) ---"
 pull qwen3:14b
 echo "--- Worker: qwen3:30b (agent+code, default) ---"
 pull qwen3:30b
 # Optional, uncomment if needed:
 # echo "--- Worker: gpt-oss:20b (fast coder) ---"
 # pull gpt-oss:20b
 # echo "--- Worker: qwen2.5-coder:32b (quality coder, slow) ---"
 # pull qwen2.5-coder:32b
 echo ""
 echo "=== Mac-side Ollama environment (set via launchctl or .zshrc) ==="
 echo "  OLLAMA_MAX_LOADED_MODELS=2"
 echo "  OLLAMA_KEEP_ALIVE=-1        # for the embedder slot"
 echo "  OLLAMA_CONTEXT_LENGTH=8192  # minimum recommended"
 echo ""
 echo "  Example: launchctl setenv OLLAMA_MAX_LOADED_MODELS 2"
 echo "  Then restart Ollama."