chore: add deployment docs, model script, and finalize env config

Phase 6 deployment infrastructure: finalize Dockerfile with OCI labels, improve .env.example with grouped config keys, add scripts/pull-models.sh for Mac-side model setup, and add docs/deploy.md covering the full deployment topology, prerequisites, security model, and troubleshooting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:43:10 -04:00
parent 4759a06d1b
commit e119ed325b
5 changed files with 297 additions and 15 deletions
@@ -1,32 +1,41 @@
-# foreman configuration — all env vars are FOREMAN_* namespaced.
-# Copy to .env and fill in values for local development.
+# === foreman daemon configuration ===

-# Listen address for the HTTP server (default: :8080)
+# Listen address (default: :8080)
 FOREMAN_ADDR=:8080

-# Base URL of the Ollama target (required)
-FOREMAN_OLLAMA_URL=http://mac.tail:11434
+# Ollama target URL (required — the Mac's Tailscale address)
+FOREMAN_OLLAMA_URL=http://100.x.x.x:11434

-# Optional bearer token foreman sends to the Ollama target
+# Outbound bearer token for Ollama target (optional)
 FOREMAN_OLLAMA_TOKEN=

-# Optional bearer token callers must present to foreman
-FOREMAN_TOKEN=
+# Inbound bearer token foreman requires of its callers (optional)
+FOREMAN_TOKEN=change-me-to-a-secret

-# Always-resident embedder model (e.g. nomic-embed-text, qwen3-embedding:0.6b)
+# === Model configuration ===
+
+# Always-resident embedding model (pinned in slot 1)
 FOREMAN_EMBED_MODEL=nomic-embed-text

-# Path to the SQLite database file (default: foreman.db)
-FOREMAN_DB_PATH=foreman.db
+# === Persistence ===

-# How often to poll the target's /api/tags (default: 30s)
+# SQLite database path (default: foreman.db)
+FOREMAN_DB_PATH=/data/foreman.db
+
+# === Polling ===
+
+# Model polling interval (default: 30s)
 FOREMAN_POLL_INTERVAL=30s

-# Optional HMAC key for signing webhook payloads (ADR-0005)
+# === Webhooks ===
+
+# Webhook HMAC signing secret (optional — signs X-Foreman-Signature header)
 FOREMAN_WEBHOOK_SECRET=

-# Maximum retry attempts for a job before marking as failed (default: 3)
+# === Job lifecycle ===
+
+# Max retry attempts for failed jobs (default: 3)
 FOREMAN_MAX_ATTEMPTS=3

-# How long to retain completed/failed jobs before pruning (default: 24h)
+# TTL for completed/failed jobs before pruning (default: 24h)
 FOREMAN_JOB_TTL=24h
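
The defaults documented in the comments above (`:8080`, `foreman.db`, `30s`, `3`, `24h`) and the one required key (`FOREMAN_OLLAMA_URL`) could be loaded roughly as follows. This is a minimal sketch, not foreman's actual `internal/config` code: the `Config` struct, its field names, and the `Load` function are hypothetical names introduced here for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"time"
)

// Config mirrors the FOREMAN_* keys in .env.example.
// Hypothetical type: the real internal/config package may look different.
type Config struct {
	Addr          string
	OllamaURL     string
	OllamaToken   string
	Token         string
	EmbedModel    string
	DBPath        string
	PollInterval  time.Duration
	WebhookSecret string
	MaxAttempts   int
	JobTTL        time.Duration
}

// Load reads configuration through getenv (pass os.Getenv in a real daemon).
// Injecting the lookup function keeps the loader testable.
func Load(getenv func(string) string) (Config, error) {
	// or returns the value of key, or def when the key is unset or empty.
	or := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}

	cfg := Config{
		Addr:          or("FOREMAN_ADDR", ":8080"),
		OllamaURL:     getenv("FOREMAN_OLLAMA_URL"),
		OllamaToken:   getenv("FOREMAN_OLLAMA_TOKEN"),
		Token:         getenv("FOREMAN_TOKEN"),
		EmbedModel:    getenv("FOREMAN_EMBED_MODEL"),
		DBPath:        or("FOREMAN_DB_PATH", "foreman.db"),
		WebhookSecret: getenv("FOREMAN_WEBHOOK_SECRET"),
	}
	if cfg.OllamaURL == "" {
		return Config{}, errors.New("FOREMAN_OLLAMA_URL is required")
	}

	var err error
	if cfg.PollInterval, err = time.ParseDuration(or("FOREMAN_POLL_INTERVAL", "30s")); err != nil {
		return Config{}, fmt.Errorf("FOREMAN_POLL_INTERVAL: %w", err)
	}
	if cfg.JobTTL, err = time.ParseDuration(or("FOREMAN_JOB_TTL", "24h")); err != nil {
		return Config{}, fmt.Errorf("FOREMAN_JOB_TTL: %w", err)
	}
	if cfg.MaxAttempts, err = strconv.Atoi(or("FOREMAN_MAX_ATTEMPTS", "3")); err != nil {
		return Config{}, fmt.Errorf("FOREMAN_MAX_ATTEMPTS: %w", err)
	}
	return cfg, nil
}

func main() {
	// Simulated environment with only the required key set.
	env := map[string]string{"FOREMAN_OLLAMA_URL": "http://100.x.x.x:11434"}
	cfg, err := Load(func(k string) string { return env[k] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(cfg.Addr, cfg.PollInterval, cfg.MaxAttempts, cfg.JobTTL)
}
```

Failing fast on a missing `FOREMAN_OLLAMA_URL` or an unparsable duration surfaces misconfiguration at startup rather than on the first job.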
@@ -6,6 +6,8 @@ COPY . .
 RUN CGO_ENABLED=0 go build -o /out/foreman ./cmd/foreman

 FROM gcr.io/distroless/static-debian12
+LABEL org.opencontainers.image.source="https://gitea.stevedudenhoeffer.com/steve/foreman"
+LABEL org.opencontainers.image.description="Queued Ollama proxy daemon"
 COPY --from=build /out/foreman /foreman
 EXPOSE 8080
 ENTRYPOINT ["/foreman", "serve"]
@@ -0,0 +1,214 @@
+# foreman deployment guide
+
+## Overview
+
+foreman runs on **orgrimmar** (homelab server), containerized via Komodo/docker-compose,
+reaching the Mac's Ollama instance over the trusted VLAN or Tailscale. The Mac is a
+dumb appliance running only Ollama; foreman handles queuing, model inventory, and
+job lifecycle.
+
+```
+orgrimmar (docker)         Tailscale / VLAN        M1 Pro Mac
+┌──────────────┐          ┌─────────────┐         ┌──────────┐
+│   foreman    │──HTTP───▶│  100.x.x.x  │────────▶│  Ollama  │
+│  :8080       │          │  :11434     │         │  :11434  │
+└──────────────┘          └─────────────┘         └──────────┘
+```
+
+## Prerequisites on the Mac
+
+### 1. Install and configure Ollama
+
+Ollama must be installed and listening on a network-accessible address (not just
+localhost). Bind it either to `0.0.0.0` or to the Mac's Tailscale IP:
+
+```bash
+launchctl setenv OLLAMA_HOST 0.0.0.0:11434
+```
+
+### 2. Set environment variables
+
+```bash
+launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
+launchctl setenv OLLAMA_CONTEXT_LENGTH 8192
+```
+
+Then restart the Ollama application for changes to take effect.
+
+- `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 for the always-resident embedder, slot 2
+  for the rotating worker model.
+- `OLLAMA_CONTEXT_LENGTH=8192` — minimum recommended context window.
+
+### 3. Pull models
+
+Run the helper script from the foreman repo on the Mac:
+
+```bash
+OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh
+```
+
+This pulls the recommended roster:
+- **nomic-embed-text** — embedder (always resident, slot 1, ~0.3 GB)
+- **qwen3:14b** — parse/data tasks (~9 GB)
+- **qwen3:30b** — agent + code, default worker (~19 GB)
+
+### 4. Prevent sleep during jobs
+
+Use `caffeinate` or `pmset` to prevent the Mac from sleeping while foreman may
+dispatch work:
+
+```bash
+caffeinate -s &
+# Or permanently via System Settings > Energy Saver
+```
+
+### 5. Firewall
+
+Ollama's `:11434` should be accessible only from foreman's IP (the orgrimmar host).
+Use one of:
+
+- **Tailscale ACLs** — restrict `:11434` to orgrimmar's Tailscale IP.
+- **macOS firewall** — allow inbound on `:11434` only from orgrimmar.
+- **pf rules** — for more granular control.
+
+## foreman deployment on orgrimmar
+
+### Image
+
+The container image is built by gitea CI (`.gitea/workflows/ci.yaml`) and pushed
+to the registry:
+
+```
+gitea.stevedudenhoeffer.com/steve/foreman:latest
+```
+
+### Komodo deployment
+
+Komodo reads the docker-compose.yml from the steveternet repo at
+`azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml`.
+
+1. Copy `.env.example` to `.env` and fill in values (see below).
+2. Deploy via Komodo's stack sync.
+
+### Configuration
+
+Create `.env` from `.env.example` in the same directory as the compose file:
+
+```bash
+# Required — the Mac's Tailscale or LAN address
+FOREMAN_OLLAMA_URL=http://100.x.x.x:11434
+
+# Optional — bearer token foreman sends to Ollama target
+FOREMAN_OLLAMA_TOKEN=
+
+# Optional — bearer token callers must present to foreman
+FOREMAN_TOKEN=your-secret-here
+
+# Embedder model (must be pulled on Mac)
+FOREMAN_EMBED_MODEL=nomic-embed-text
+
+# Other settings have sensible defaults; see .env.example for the full list.
+```
+
+### Persistence
+
+SQLite is persisted in a named Docker volume `foreman_data` mounted at `/data`.
+The database file is `/data/foreman.db` (WAL mode, pure-Go driver, no CGO).
+
+## Security model
+
+foreman is **not** exposed on a public Traefik entrypoint:
+
+- It gets Traefik labels for **internal hostname routing** only:
+  `foreman.orgrimmar.dudenhoeffer.casa` resolves internally on the LAN/Tailscale.
+- It is **not** in any public DNS.
+- Accessible via LAN and Tailscale only.
+
+### Authentication
+
+- **Inbound (callers to foreman):** optional static bearer token via
+  `FOREMAN_TOKEN`. When set, callers must send `Authorization: Bearer <token>`.
+  The `/healthz` endpoint is always unauthenticated.
+- **Outbound (foreman to Ollama):** optional bearer token via `FOREMAN_OLLAMA_TOKEN`,
+  forwarded to the target on every request.
+- **Webhooks:** optional HMAC-SHA256 signing via `FOREMAN_WEBHOOK_SECRET`. When
+  set, foreman adds `X-Foreman-Signature: sha256=<hex>` to webhook POSTs.
+
+## go-llm usage
+
+foreman is a drop-in Ollama-compatible target for go-llm:
+
+```go
+import "gitea.stevedudenhoeffer.com/steve/go-llm/v2"
+
+model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")
+```
+
+This uses the synchronous `/api/chat` passthrough. Streaming, tool calling, and
+thinking tokens all work transparently.
+
+For async job submission, use the client package:
+
+```go
+import "gitea.stevedudenhoeffer.com/steve/foreman/client"
+
+c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
+    client.WithToken("your-token"),
+)
+result, err := c.Submit(ctx, client.SubmitRequest{
+    Model:    "qwen3:30b",
+    Messages: messages,
+})
+```
+
+## Troubleshooting
+
+### Target unreachable
+
+**Symptom:** `/healthz` returns `{"status":"ok","degraded":true}`, jobs fail with
+connection errors.
+
+**Cause:** The Mac is asleep, Ollama is not running, or the network path is broken.
+
+**Fix:**
+1. Wake the Mac / start Ollama.
+2. Verify connectivity: `curl http://100.x.x.x:11434/api/tags` from orgrimmar.
+3. Check Tailscale status: `tailscale status` on both machines.
+4. Jobs will automatically retry (up to `FOREMAN_MAX_ATTEMPTS`). The poller
+   recovers automatically when the target comes back.
+
+### Model not found (404)
+
+**Symptom:** `/api/chat` or `POST /jobs` returns 404 for a model name.
+
+**Fix:**
+1. Verify the model is pulled on the Mac: `ollama list`.
+2. Check the exact tag — Ollama tags change between versions.
+3. foreman re-polls on a miss; if the model was just pulled, retry after
+   `FOREMAN_POLL_INTERVAL` (default 30s).
+
+### HMAC signature mismatch
+
+**Symptom:** Webhook receiver rejects events with signature errors.
+
+**Fix:**
+1. Verify `FOREMAN_WEBHOOK_SECRET` matches between foreman and the receiver.
+2. The signature covers the raw JSON body; verify that the receiver computes the
+   HMAC over the bytes exactly as received, before any parsing or re-serialization.
+
+### Job stuck in loading/working
+
+**Symptom:** A job stays in a non-terminal state indefinitely.
+
+**Cause:** foreman crashed or restarted mid-job.
+
+**Fix:** foreman resets interrupted jobs (loading/working) to queued on startup.
+Restart foreman to recover. Jobs are retried up to `FOREMAN_MAX_ATTEMPTS`.
+
+### SQLite busy/locked errors
+
+**Symptom:** HTTP handlers return 500 with "database is locked".
+
+**Fix:** The SQLite DSN includes `busy_timeout=5000` (5 seconds). If this is
+insufficient under load, increase it. WAL mode ensures readers do not block
+the single writer.
@@ -203,3 +203,26 @@ with the real SQLite-backed job queue and single worker loop.
  - Delegates to existing `ollamaProvider.New()` — zero new code paths.
  - DD#9 added to `v2/CLAUDE.md`.
  - PR: https://gitea.stevedudenhoeffer.com/steve/go-llm/pulls/4
+
+## Phase 6: Deployment infrastructure — 2026-05-23
+
+**Project is deployable.** All deployment artifacts are in place.
+
+- `Dockerfile` — finalized with OCI labels (`image.source`, `image.description`).
+  Multi-stage distroless build, CGO_ENABLED=0, `foreman serve` entrypoint.
+- `.env.example` — finalized with all 10 config keys from `internal/config/`,
+  grouped by function (daemon, model, persistence, polling, webhooks, job lifecycle)
+  with clear comments and example values.
+- `scripts/pull-models.sh` — executable helper to pull the recommended model roster
+  on the Mac (nomic-embed-text, qwen3:14b, qwen3:30b). Prints Mac-side Ollama
+  environment setup instructions.
+- `docs/deploy.md` — full deployment guide covering: topology overview, Mac
+  prerequisites (Ollama config, env vars, model pull, sleep prevention, firewall),
+  orgrimmar deployment (image registry, Komodo, config, persistence), security
+  model (internal-only, no public DNS, bearer tokens, HMAC), go-llm usage
+  (sync + async), and troubleshooting (5 common scenarios).
+- steveternet compose stack — PR to `steve/steveternet` adding
+  `azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml` and `.env.example`.
+  Follows sibling conventions: `web` network (external), `unless-stopped`,
+  gitea registry image, Traefik labels for internal routing, named volume
+  for SQLite persistence, all config via `${VAR}` interpolation.
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+# Pull the recommended model roster on the Mac.
+# Run this ON the Mac where Ollama is installed.
+set -euo pipefail
+
+OLLAMA_HOST="${OLLAMA_HOST:-http://localhost:11434}"
+
+echo "=== Pulling models to ${OLLAMA_HOST} ==="
+
+# Embedder (always resident, slot 1)
+echo "--- Embedder: nomic-embed-text ---"
+curl -sS "${OLLAMA_HOST}/api/pull" -d '{"name":"nomic-embed-text"}' | jq -r '.error // .status // empty'
+
+# Worker models (rotate through slot 2)
+echo "--- Worker: qwen3:14b (parse/data) ---"
+curl -sS "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen3:14b"}' | jq -r '.error // .status // empty'
+
+echo "--- Worker: qwen3:30b (agent+code, default) ---"
+curl -sS "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen3:30b"}' | jq -r '.error // .status // empty'
+
+# Optional — uncomment if needed:
+# echo "--- Worker: gpt-oss:20b (fast coder) ---"
+# curl -sS "${OLLAMA_HOST}/api/pull" -d '{"name":"gpt-oss:20b"}' | jq -r '.error // .status // empty'
+# echo "--- Worker: qwen2.5-coder:32b (quality coder, slow) ---"
+# curl -sS "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen2.5-coder:32b"}' | jq -r '.error // .status // empty'
+
+echo ""
+echo "=== Mac-side Ollama environment (set via launchctl or .zshrc) ==="
+echo "  OLLAMA_MAX_LOADED_MODELS=2"
+echo "  OLLAMA_KEEP_ALIVE=-1        # for the embedder slot"
+echo "  OLLAMA_CONTEXT_LENGTH=8192  # minimum recommended"
+echo ""
+echo "  Example: launchctl setenv OLLAMA_MAX_LOADED_MODELS 2"
+echo "  Then restart Ollama."
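
After the script above has run, the roster can be confirmed against the target's `/api/tags` response, the same endpoint foreman polls. The sketch below parses a response body and reports which wanted models are absent. It assumes the response has the shape `{"models":[{"name":"..."}]}` and that a name without a tag refers to `:latest`; `MissingModels` and `normalize` are hypothetical helpers, and the sample JSON is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tagsResponse holds the only field of the /api/tags body this sketch needs.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// normalize appends ":latest" to names that carry no explicit tag.
// Simplification: names containing a registry host with a port are not handled.
func normalize(name string) string {
	if !strings.Contains(name, ":") {
		return name + ":latest"
	}
	return name
}

// MissingModels returns the entries of want that do not appear in tagsJSON.
func MissingModels(tagsJSON []byte, want []string) ([]string, error) {
	var resp tagsResponse
	if err := json.Unmarshal(tagsJSON, &resp); err != nil {
		return nil, fmt.Errorf("decode /api/tags response: %w", err)
	}
	have := make(map[string]bool, len(resp.Models))
	for _, m := range resp.Models {
		have[normalize(m.Name)] = true
	}
	var missing []string
	for _, w := range want {
		if !have[normalize(w)] {
			missing = append(missing, w)
		}
	}
	return missing, nil
}

func main() {
	// Made-up response: the 30b worker has not been pulled yet.
	sample := []byte(`{"models":[{"name":"nomic-embed-text:latest"},{"name":"qwen3:14b"}]}`)
	roster := []string{"nomic-embed-text", "qwen3:14b", "qwen3:30b"}

	missing, err := MissingModels(sample, roster)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("missing:", missing) // prints "missing: [qwen3:30b]"
}
```

In practice the JSON would come from an HTTP GET against `${OLLAMA_HOST}/api/tags`; keeping the parsing separate from the request makes the check easy to test offline.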