# foreman deployment guide

## Overview

foreman runs on **orgrimmar** (homelab server), containerized via Komodo/docker-compose, and reaches the Mac's Ollama instance over the trusted VLAN or Tailscale. The Mac is a dumb appliance running only Ollama; foreman handles queuing, model inventory, and job lifecycle.

```
orgrimmar (docker)        Tailscale / VLAN        M1 Pro Mac
┌──────────────┐          ┌─────────────┐         ┌──────────┐
│   foreman    │──HTTP───▶│  100.x.x.x  │────────▶│  Ollama  │
│   :8080      │          │   :11434    │         │  :11434  │
└──────────────┘          └─────────────┘         └──────────┘
```

## Prerequisites on the Mac

### 1. Install and configure Ollama

Ollama must be installed and listening on a network-accessible address, not just localhost. Bind it to either `0.0.0.0` (all interfaces) or the Mac's Tailscale IP:

```bash
launchctl setenv OLLAMA_HOST 0.0.0.0:11434
```

### 2. Set environment variables

```bash
launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
launchctl setenv OLLAMA_CONTEXT_LENGTH 8192
```

Then restart the Ollama application so that these variables, and `OLLAMA_HOST` from step 1, take effect. Values set with `launchctl setenv` do not survive a reboot, so reapply them after restarting the Mac.

- `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 for the always-resident embedder, slot 2 for the rotating worker model.
- `OLLAMA_CONTEXT_LENGTH=8192` — minimum recommended context window.

### 3. Pull models

Run the helper script from the foreman repo on the Mac:

```bash
OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh
```

This pulls the recommended roster:

- **nomic-embed-text** — embedder (always resident, slot 1, ~0.3 GB)
- **qwen3:14b** — parse/data tasks (~9 GB)
- **qwen3:30b** — agent + code, default worker (~19 GB)

### 4. Prevent sleep during jobs

Use `caffeinate` or `pmset` to keep the Mac from sleeping whenever foreman may dispatch work:

```bash
# Prevent system sleep while this runs (-s applies only on AC power)
caffeinate -s &
# Or permanently, via pmset or System Settings > Energy Saver
sudo pmset -a sleep 0
```

### 5. Firewall

Ollama's `:11434` should be reachable only from foreman's IP (the orgrimmar host). Use one of:

- **Tailscale ACLs** — restrict `:11434` to orgrimmar's Tailscale IP.
- **macOS firewall** — allow incoming connections for Ollama. The built-in application firewall filters per application, not per source IP, so pair it with one of the other options to limit access to orgrimmar.
- **pf rules** — for more granular control.

## foreman deployment on orgrimmar

### Image

The container image is built by gitea CI (`.gitea/workflows/ci.yaml`) and pushed to the registry:

```
gitea.stevedudenhoeffer.com/steve/foreman:latest
```

### Komodo deployment

Komodo reads `docker-compose.yml` from the steveternet repo at `azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml`.

1. Copy `.env.example` to `.env` and fill in the values (see below).
2. Deploy via Komodo's stack sync.

### Configuration

Create `.env` from `.env.example` in the same directory as the compose file:

```bash
# Required — the Mac's Tailscale or LAN address
FOREMAN_OLLAMA_URL=http://100.x.x.x:11434

# Optional — bearer token foreman sends to Ollama target
FOREMAN_OLLAMA_TOKEN=

# Optional — bearer token callers must present to foreman
FOREMAN_TOKEN=your-secret-here

# Embedder model (must be pulled on Mac)
FOREMAN_EMBED_MODEL=nomic-embed-text

# Other settings have sensible defaults; see .env.example for the full list.
```

### Persistence

SQLite is persisted in a named Docker volume, `foreman_data`, mounted at `/data`. The database file is `/data/foreman.db` (WAL mode, pure-Go driver, no CGO).

## Security model

foreman is **not** exposed on a public Traefik entrypoint:

- It gets Traefik labels for **internal hostname routing** only: `foreman.orgrimmar.dudenhoeffer.casa` resolves internally on the LAN/Tailscale.
- It is **not** in any public DNS.
- It is accessible via LAN and Tailscale only.

### Authentication

- **Inbound (callers to foreman):** optional static bearer token via `FOREMAN_TOKEN`. When it is set, callers must send an `Authorization: Bearer` header carrying that token. The `/healthz` endpoint is always unauthenticated.
- **Outbound (foreman to Ollama):** optional bearer token via `FOREMAN_OLLAMA_TOKEN`, forwarded to the target on every request.
- **Webhooks:** optional HMAC-SHA256 signing via `FOREMAN_WEBHOOK_SECRET`. When it is set, foreman adds an `X-Foreman-Signature` header to webhook POSTs. Its value is `sha256=` followed by the HMAC of the raw JSON body.
## go-llm usage

foreman is a drop-in Ollama-compatible target for go-llm:

```go
import llm "gitea.stevedudenhoeffer.com/steve/go-llm/v2"

// token is the FOREMAN_TOKEN value (empty if inbound auth is disabled).
model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")
```

This uses the synchronous `/api/chat` passthrough. Streaming, tool calling, and thinking tokens all work transparently.

For async job submission, use the client package:

```go
import "gitea.stevedudenhoeffer.com/steve/foreman/client"

c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
	client.WithToken("your-token"),
)

// ctx is the caller's context.Context; messages is the chat history to send.
result, err := c.Submit(ctx, client.SubmitRequest{
	Model:    "qwen3:30b",
	Messages: messages,
})
if err != nil {
	return err
}
```

## Troubleshooting

### Target unreachable

**Symptom:** `/healthz` returns `{"status":"ok","degraded":true}`, and jobs fail with connection errors.

**Cause:** The Mac is asleep, Ollama is not running, or the network path is broken.

**Fix:**

1. Wake the Mac / start Ollama.
2. Verify connectivity from orgrimmar: `curl http://100.x.x.x:11434/api/tags`.
3. Check Tailscale status: run `tailscale status` on both machines.
4. Jobs retry automatically (up to `FOREMAN_MAX_ATTEMPTS`), and the poller recovers on its own once the target comes back.

### Model not found (404)

**Symptom:** `/api/chat` or `POST /jobs` returns 404 for a model name.

**Fix:**

1. Verify the model is pulled on the Mac: `ollama list`.
2. Check the exact tag — Ollama tags change between versions.
3. foreman re-polls on a miss; if the model was just pulled, retry after `FOREMAN_POLL_INTERVAL` (default 30s).

### HMAC signature mismatch

**Symptom:** The webhook receiver rejects events with signature errors.

**Fix:**

1. Verify that `FOREMAN_WEBHOOK_SECRET` matches between foreman and the receiver.
2. The signature covers the raw JSON body. Make sure the receiver computes the HMAC over the bytes exactly as received, before any parsing or re-serialization.

### Job stuck in loading/working

**Symptom:** A job stays in a non-terminal state indefinitely.

**Cause:** foreman crashed or restarted mid-job.
**Fix:** foreman resets interrupted jobs (loading/working) to queued on startup, so restarting foreman recovers them. Jobs are retried up to `FOREMAN_MAX_ATTEMPTS`.

### SQLite busy/locked errors

**Symptom:** HTTP handlers return 500 with "database is locked".

**Fix:** The SQLite DSN includes `busy_timeout=5000` (5 seconds). If this is insufficient under load, increase it. In WAL mode, readers and the single writer do not block each other, so these errors usually point to contention between concurrent writes.