chore: add deployment docs, model script, and finalize env config

Phase 6 deployment infrastructure: finalize Dockerfile with OCI labels, improve .env.example with grouped config keys, add scripts/pull-models.sh for Mac-side model setup, and add docs/deploy.md covering the full deployment topology, prerequisites, security model, and troubleshooting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:43:10 -04:00
parent 4759a06d1b
commit e119ed325b
5 changed files with 297 additions and 15 deletions
@@ -0,0 +1,214 @@
+# foreman deployment guide
+
+## Overview
+
+foreman runs on **orgrimmar** (homelab server) as a container managed by
+Komodo/docker-compose. It reaches the Mac's Ollama instance over the trusted VLAN
+or Tailscale. The Mac is a dumb appliance running only Ollama; foreman handles
+queuing, model inventory, and job lifecycle.
+
+```
+orgrimmar (docker)         Tailscale / VLAN        M1 Pro Mac
+┌──────────────┐          ┌─────────────┐         ┌──────────┐
+│   foreman    │──HTTP───▶│  100.x.x.x  │────────▶│  Ollama  │
+│  :8080       │          │  :11434     │         │  :11434  │
+└──────────────┘          └─────────────┘         └──────────┘
+```
+
+## Prerequisites on the Mac
+
+### 1. Install and configure Ollama
+
+Ollama must be installed and listening on a network-accessible address (not just
+localhost). Bind it either to `0.0.0.0` (all interfaces) or to the Mac's Tailscale IP:
+
+```bash
+launchctl setenv OLLAMA_HOST 0.0.0.0:11434
+```
+
+### 2. Set environment variables
+
+```bash
+launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
+launchctl setenv OLLAMA_CONTEXT_LENGTH 8192
+```
+
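+Then restart the Ollama application for the changes to take effect. Values set
+with `launchctl setenv` do not persist across a reboot, so reapply them after
+restarting the Mac.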
+
+- `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 for the always-resident embedder, slot 2
+  for the rotating worker model.
+- `OLLAMA_CONTEXT_LENGTH=8192` — minimum recommended context window.
+
+### 3. Pull models
+
+Run the helper script from the foreman repo on the Mac:
+
+```bash
+OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh
+```
+
+This pulls the recommended roster:
+- **nomic-embed-text** — embedder (always resident, slot 1, ~0.3 GB)
+- **qwen3:14b** — parse/data tasks (~9 GB)
+- **qwen3:30b** — agent + code, default worker (~19 GB)
+
+### 4. Prevent sleep during jobs
+
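+Use `caffeinate` or `pmset` to prevent the Mac from sleeping while foreman may
+dispatch work. `caffeinate -s` holds its assertion only while the Mac is on AC
+power, and only for as long as the process keeps running: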
+
+```bash
+caffeinate -s &
+# Or permanently via System Settings > Energy Saver
+```
+
+### 5. Firewall
+
+Ollama's `:11434` should be accessible only from foreman's IP (the orgrimmar host).
+Use one or more of:
+
+- **Tailscale ACLs** — restrict `:11434` to orgrimmar's Tailscale IP.
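+- **macOS firewall** — limit which applications accept inbound connections. It
+  filters per application, not per source address, so combine it with one of the
+  other options.
+- **pf rules** — allow inbound on `:11434` only from orgrimmar's address.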
+
+## foreman deployment on orgrimmar
+
+### Image
+
+The container image is built by gitea CI (`.gitea/workflows/ci.yaml`) and pushed
+to the registry:
+
+```
+gitea.stevedudenhoeffer.com/steve/foreman:latest
+```
+
+### Komodo deployment
+
+Komodo reads the docker-compose.yml from the steveternet repo at
+`azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml`.
+
+1. Copy `.env.example` to `.env` and fill in values (see below).
+2. Deploy via Komodo's stack sync.
+
+### Configuration
+
+Create `.env` from `.env.example` in the same directory as the compose file:
+
+```bash
+# Required — the Mac's Tailscale or LAN address
+FOREMAN_OLLAMA_URL=http://100.x.x.x:11434
+
+# Optional — bearer token foreman sends to Ollama target
+FOREMAN_OLLAMA_TOKEN=
+
+# Optional — bearer token callers must present to foreman
+FOREMAN_TOKEN=your-secret-here
+
+# Embedder model (must be pulled on Mac)
+FOREMAN_EMBED_MODEL=nomic-embed-text
+
+# Other settings have sensible defaults; see .env.example for the full list.
+```
+
+### Persistence
+
+SQLite is persisted in a named Docker volume `foreman_data` mounted at `/data`.
+The database file is `/data/foreman.db` (WAL mode, pure-Go driver, no CGO).
+
+## Security model
+
+foreman is **not** exposed on a public Traefik entrypoint:
+
+- It gets Traefik labels for **internal hostname routing** only:
+  `foreman.orgrimmar.dudenhoeffer.casa` resolves internally on the LAN/Tailscale.
+- It is **not** in any public DNS.
+- Accessible via LAN and Tailscale only.
+
+### Authentication
+
+- **Inbound (callers to foreman):** optional static bearer token via
+  `FOREMAN_TOKEN`. When set, callers must send `Authorization: Bearer <token>`.
+  The `/healthz` endpoint is always unauthenticated.
+- **Outbound (foreman to Ollama):** optional bearer token via `FOREMAN_OLLAMA_TOKEN`,
+  forwarded to the target on every request.
+- **Webhooks:** optional HMAC-SHA256 signing via `FOREMAN_WEBHOOK_SECRET`. When
+  set, foreman adds `X-Foreman-Signature: sha256=<hex>` to webhook POSTs.
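+
+The webhook signing scheme above can be checked on the receiving side along these
+lines. This is a minimal sketch, assuming the hex digest is the HMAC-SHA256 of the
+raw request body keyed with `FOREMAN_WEBHOOK_SECRET`. The function names, the
+example secret, and the `job.done` payload are hypothetical and not part of foreman.
+
+```go
+package main
+
+import (
+	"crypto/hmac"
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+	"strings"
+)
+
+// verifySignature reports whether header (the X-Foreman-Signature value,
+// "sha256=<hex>") matches the HMAC-SHA256 of the raw body under secret.
+func verifySignature(secret, body []byte, header string) bool {
+	const prefix = "sha256="
+	if !strings.HasPrefix(header, prefix) {
+		return false
+	}
+	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
+	if err != nil {
+		return false
+	}
+	mac := hmac.New(sha256.New, secret)
+	mac.Write(body)
+	// hmac.Equal compares in constant time.
+	return hmac.Equal(got, mac.Sum(nil))
+}
+
+// sign builds a header value in the documented format. It exists here only
+// to produce a test vector; it is not foreman's own code.
+func sign(secret, body []byte) string {
+	mac := hmac.New(sha256.New, secret)
+	mac.Write(body)
+	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
+}
+
+func main() {
+	secret := []byte("example-secret")       // hypothetical secret
+	body := []byte(`{"event":"job.done"}`)   // hypothetical payload
+	header := sign(secret, body)
+
+	// Same bytes verify; a re-serialized body (extra space) does not.
+	fmt.Println(verifySignature(secret, body, header))
+	fmt.Println(verifySignature(secret, []byte(`{"event": "job.done"}`), header))
+}
+```
+
+In an HTTP handler, read the body once with `io.ReadAll`, verify those bytes, and
+only then pass them to `json.Unmarshal`.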
+
+## go-llm usage
+
+foreman is a drop-in Ollama-compatible target for go-llm:
+
+```go
+import llm "gitea.stevedudenhoeffer.com/steve/go-llm/v2"
+
+model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")
+```
+
+This uses the synchronous `/api/chat` passthrough. Streaming, tool calling, and
+thinking tokens all work transparently.
+
+For async job submission, use the client package:
+
+```go
+import "gitea.stevedudenhoeffer.com/steve/foreman/client"
+
+c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
+    client.WithToken("your-token"),
+)
+result, err := c.Submit(ctx, client.SubmitRequest{
+    Model:    "qwen3:30b",
+    Messages: messages,
+})
+```
+
+## Troubleshooting
+
+### Target unreachable
+
+**Symptom:** `/healthz` returns `{"status":"ok","degraded":true}`, jobs fail with
+connection errors.
+
+**Cause:** The Mac is asleep, Ollama is not running, or the network path is broken.
+
+**Fix:**
+1. Wake the Mac / start Ollama.
+2. Verify connectivity: `curl http://100.x.x.x:11434/api/tags` from orgrimmar.
+3. Check Tailscale status: `tailscale status` on both machines.
+4. Jobs will automatically retry (up to `FOREMAN_MAX_ATTEMPTS`). The poller
+   recovers automatically when the target comes back.
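+
+A monitoring script can tell "foreman is up but the target is down" apart from a
+healthy state by decoding the `/healthz` body. The sketch below assumes only the
+two fields shown in the symptom above, and it assumes that a healthy response
+either omits `degraded` or sets it to `false`. The type and function names are
+hypothetical.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// health mirrors the two fields shown in the /healthz symptom; any other
+// fields the real endpoint returns are ignored.
+type health struct {
+	Status   string `json:"status"`
+	Degraded bool   `json:"degraded"`
+}
+
+// targetReachable interprets a /healthz body. It returns true only when
+// status is "ok" and degraded is false or absent (an assumption).
+func targetReachable(body []byte) (bool, error) {
+	var h health
+	if err := json.Unmarshal(body, &h); err != nil {
+		return false, fmt.Errorf("decode healthz: %w", err)
+	}
+	return h.Status == "ok" && !h.Degraded, nil
+}
+
+func main() {
+	// The body from the symptom above: foreman answers, target is unreachable.
+	ok, err := targetReachable([]byte(`{"status":"ok","degraded":true}`))
+	fmt.Println(ok, err)
+}
+```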
+
+### Model not found (404)
+
+**Symptom:** `/api/chat` or `POST /jobs` returns 404 for a model name.
+
+**Fix:**
+1. Verify the model is pulled on the Mac: `ollama list`.
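+2. Check the exact tag: the requested name must match an `ollama list` entry
+   exactly (for example `qwen3:30b`, not `qwen3`), and the tags published for a
+   model can change over time.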
+3. foreman re-polls on a miss; if the model was just pulled, retry after
+   `FOREMAN_POLL_INTERVAL` (default 30s).
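+
+The tag check in step 2 can be automated against the JSON that Ollama's
+`GET /api/tags` returns. This is a rough sketch, not foreman's inventory code:
+`hasModel` and `normalize` are hypothetical helpers, and the sample body is made
+up. It relies on Ollama listing untagged pulls with a `:latest` suffix, and it
+does not handle names that include a registry host with a port.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"strings"
+)
+
+// tagsResponse holds the part of Ollama's GET /api/tags reply needed here.
+type tagsResponse struct {
+	Models []struct {
+		Name string `json:"name"`
+	} `json:"models"`
+}
+
+// normalize appends ":latest" when a model name carries no tag, matching
+// how Ollama lists such models.
+func normalize(name string) string {
+	if strings.Contains(name, ":") {
+		return name
+	}
+	return name + ":latest"
+}
+
+// hasModel reports whether want appears in the /api/tags body, comparing
+// full name:tag strings.
+func hasModel(tagsJSON []byte, want string) (bool, error) {
+	var tr tagsResponse
+	if err := json.Unmarshal(tagsJSON, &tr); err != nil {
+		return false, fmt.Errorf("decode tags: %w", err)
+	}
+	for _, m := range tr.Models {
+		if normalize(m.Name) == normalize(want) {
+			return true, nil
+		}
+	}
+	return false, nil
+}
+
+func main() {
+	// Made-up sample of an /api/tags body with two pulled models.
+	body := []byte(`{"models":[{"name":"nomic-embed-text:latest"},{"name":"qwen3:30b"}]}`)
+	for _, want := range []string{"nomic-embed-text", "qwen3:30b", "qwen3:14b"} {
+		found, err := hasModel(body, want)
+		fmt.Println(want, found, err)
+	}
+}
+```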
+
+### HMAC signature mismatch
+
+**Symptom:** Webhook receiver rejects events with signature errors.
+
+**Fix:**
+1. Verify `FOREMAN_WEBHOOK_SECRET` matches between foreman and the receiver.
+2. The signature covers the raw JSON body; verify that the receiver computes the
+   HMAC over the request bytes exactly as received, before any parsing or
+   re-serialization.
+
+### Job stuck in loading/working
+
+**Symptom:** A job stays in a non-terminal state indefinitely.
+
+**Cause:** foreman crashed or restarted mid-job.
+
+**Fix:** foreman resets interrupted jobs (loading/working) to queued on startup.
+Restart foreman to recover. Jobs are retried up to `FOREMAN_MAX_ATTEMPTS`.
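+
+The startup recovery described above amounts to a small state reset. The sketch
+below illustrates it on an in-memory slice. It is not foreman's implementation:
+the `Job` type, the `failed` state, and the rule that a job which has already
+used `maxAttempts` is failed rather than requeued are all assumptions made for
+the example.
+
+```go
+package main
+
+import "fmt"
+
+// Job states named in the text, plus "failed" as a hypothetical terminal state.
+const (
+	Queued  = "queued"
+	Loading = "loading"
+	Working = "working"
+	Failed  = "failed"
+)
+
+// Job is a hypothetical, minimal job record.
+type Job struct {
+	ID       int
+	State    string
+	Attempts int
+}
+
+// recoverInterrupted requeues jobs left in loading/working by a crash and
+// returns how many were requeued. Jobs that already used maxAttempts are
+// marked failed instead (an assumption, see the lead-in).
+func recoverInterrupted(jobs []Job, maxAttempts int) (requeued int) {
+	for i := range jobs {
+		j := &jobs[i]
+		if j.State != Loading && j.State != Working {
+			continue // queued and terminal jobs are left alone
+		}
+		if j.Attempts >= maxAttempts {
+			j.State = Failed
+			continue
+		}
+		j.State = Queued
+		requeued++
+	}
+	return requeued
+}
+
+func main() {
+	jobs := []Job{
+		{ID: 1, State: Working, Attempts: 1},
+		{ID: 2, State: Queued, Attempts: 0},
+		{ID: 3, State: Loading, Attempts: 3},
+	}
+	n := recoverInterrupted(jobs, 3)
+	fmt.Println(n, jobs)
+}
+```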
+
+### SQLite busy/locked errors
+
+**Symptom:** HTTP handlers return 500 with "database is locked".
+
+**Fix:** The SQLite DSN includes `busy_timeout=5000` (5 seconds). If this is
+insufficient under load, increase it. In WAL mode readers and the single writer
+do not block each other; concurrent writes still serialize, which is the wait
+the busy timeout covers.