Phase 6 deployment infrastructure: finalize Dockerfile with OCI labels, improve .env.example with grouped config keys, add scripts/pull-models.sh for Mac-side model setup, and add docs/deploy.md covering the full deployment topology, prerequisites, security model, and troubleshooting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
foreman deployment guide
Overview
foreman runs on orgrimmar (homelab server), containerized via Komodo/docker-compose, reaching the Mac's Ollama instance over the trusted VLAN or Tailscale. The Mac is a dumb appliance running only Ollama; foreman handles queuing, model inventory, and job lifecycle.
orgrimmar (docker)        Tailscale / VLAN        M1 Pro Mac
┌──────────────┐          ┌─────────────┐         ┌──────────┐
│   foreman    │──HTTP───▶│  100.x.x.x  │────────▶│  Ollama  │
│    :8080     │          │   :11434    │         │  :11434  │
└──────────────┘          └─────────────┘         └──────────┘
Prerequisites on the Mac
1. Install and configure Ollama
Ollama must be installed and listening on a network-accessible address (not just
localhost). Bind it either to 0.0.0.0 or to the Mac's Tailscale IP:
launchctl setenv OLLAMA_HOST 0.0.0.0:11434
2. Set environment variables
launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
launchctl setenv OLLAMA_CONTEXT_LENGTH 8192
Then restart the Ollama application for changes to take effect.
- OLLAMA_MAX_LOADED_MODELS=2: slot 1 for the always-resident embedder, slot 2 for the rotating worker model.
- OLLAMA_CONTEXT_LENGTH=8192: minimum recommended context window.
3. Pull models
Run the helper script from the foreman repo on the Mac:
OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh
This pulls the recommended roster:
- nomic-embed-text — embedder (always resident, slot 1, ~0.3 GB)
- qwen3:14b — parse/data tasks (~9 GB)
- qwen3:30b — agent + code, default worker (~19 GB)
4. Prevent sleep during jobs
Use caffeinate or pmset to prevent the Mac from sleeping while foreman may
dispatch work:
caffeinate -s &
# Or permanently via System Settings > Energy Saver
5. Firewall
Ollama's :11434 should be accessible only from foreman's IP (the orgrimmar host).
Use either:
- Tailscale ACLs: restrict :11434 to orgrimmar's Tailscale IP.
- macOS firewall: allow inbound on :11434 only from orgrimmar.
- pf rules: for more granular control.
foreman deployment on orgrimmar
Image
The container image is built by gitea CI (.gitea/workflows/ci.yaml) and pushed
to the registry:
gitea.stevedudenhoeffer.com/steve/foreman:latest
Komodo deployment
Komodo reads the docker-compose.yml from the steveternet repo at
azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml.
- Copy .env.example to .env and fill in values (see below).
- Deploy via Komodo's stack sync.
Configuration
Create .env from .env.example in the same directory as the compose file:
# Required — the Mac's Tailscale or LAN address
FOREMAN_OLLAMA_URL=http://100.x.x.x:11434
# Optional — bearer token foreman sends to Ollama target
FOREMAN_OLLAMA_TOKEN=
# Optional — bearer token callers must present to foreman
FOREMAN_TOKEN=your-secret-here
# Embedder model (must be pulled on Mac)
FOREMAN_EMBED_MODEL=nomic-embed-text
# Other settings have sensible defaults; see .env.example for the full list.
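As a rough illustration of how these keys might be validated at startup, here is a minimal Go sketch. The `config` struct, the `loadConfig` function, and the fallback to `nomic-embed-text` are assumptions made for this example, not foreman's actual loader; only the environment variable names come from the listing above.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// config holds the subset of settings shown above. The struct and its field
// names are hypothetical; foreman's real config type may differ.
type config struct {
	OllamaURL   string // FOREMAN_OLLAMA_URL (required)
	OllamaToken string // FOREMAN_OLLAMA_TOKEN (optional)
	Token       string // FOREMAN_TOKEN (optional)
	EmbedModel  string // FOREMAN_EMBED_MODEL
}

// loadConfig reads settings through getenv so it can be exercised without
// touching the real process environment. In a real program, pass os.Getenv.
func loadConfig(getenv func(string) string) (config, error) {
	c := config{
		OllamaURL:   getenv("FOREMAN_OLLAMA_URL"),
		OllamaToken: getenv("FOREMAN_OLLAMA_TOKEN"),
		Token:       getenv("FOREMAN_TOKEN"),
		EmbedModel:  getenv("FOREMAN_EMBED_MODEL"),
	}
	if c.OllamaURL == "" {
		return config{}, errors.New("FOREMAN_OLLAMA_URL is required")
	}
	u, err := url.Parse(c.OllamaURL)
	if err != nil || u.Host == "" || (u.Scheme != "http" && u.Scheme != "https") {
		return config{}, fmt.Errorf("FOREMAN_OLLAMA_URL %q is not an http(s) URL", c.OllamaURL)
	}
	if c.EmbedModel == "" {
		c.EmbedModel = "nomic-embed-text" // assumed default, matching the example above
	}
	return c, nil
}

func main() {
	// Sample values standing in for a real .env file.
	env := map[string]string{"FOREMAN_OLLAMA_URL": "http://100.64.0.1:11434"}
	c, err := loadConfig(func(k string) string { return env[k] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("target=%s embedder=%s inbound-auth=%t\n", c.OllamaURL, c.EmbedModel, c.Token != "")
}
```

Failing fast on a missing or malformed target URL surfaces a bad .env at deploy time rather than on the first job.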
Persistence
SQLite is persisted in a named Docker volume foreman_data mounted at /data.
The database file is /data/foreman.db (WAL mode, pure-Go driver, no CGO).
Security model
foreman is not exposed on a public Traefik entrypoint:
- It gets Traefik labels for internal hostname routing only: foreman.orgrimmar.dudenhoeffer.casa resolves internally on the LAN/Tailscale.
- It is not in any public DNS.
- Accessible via LAN and Tailscale only.
Authentication
- Inbound (callers to foreman): optional static bearer token via FOREMAN_TOKEN. When set, callers must send Authorization: Bearer <token>. The /healthz endpoint is always unauthenticated.
- Outbound (foreman to Ollama): optional bearer token via FOREMAN_OLLAMA_TOKEN, forwarded to the target on every request.
- Webhooks: optional HMAC-SHA256 signing via FOREMAN_WEBHOOK_SECRET. When set, foreman adds X-Foreman-Signature: sha256=<hex> to webhook POSTs.
go-llm usage
foreman is a drop-in Ollama-compatible target for go-llm:
import "gitea.stevedudenhoeffer.com/steve/go-llm/v2"
model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")
This uses the synchronous /api/chat passthrough. Streaming, tool calling, and
thinking tokens all work transparently.
For async job submission, use the client package:
import "gitea.stevedudenhoeffer.com/steve/foreman/client"
c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
	client.WithToken("your-token"),
)
result, err := c.Submit(ctx, client.SubmitRequest{
	Model:    "qwen3:30b",
	Messages: messages,
})
Troubleshooting
Target unreachable
Symptom: /healthz returns {"status":"ok","degraded":true}, jobs fail with
connection errors.
Cause: The Mac is asleep, Ollama is not running, or the network path is broken.
Fix:
- Wake the Mac / start Ollama.
- Verify connectivity: curl http://100.x.x.x:11434/api/tags from orgrimmar.
- Check Tailscale status: tailscale status on both machines.
- Jobs will automatically retry (up to FOREMAN_MAX_ATTEMPTS). The poller recovers automatically when the target comes back.
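A monitoring script could detect the degraded state by decoding the /healthz body. The following Go sketch uses only the two fields shown in the symptom above; the `health` type and `targetReachable` function are illustrative names, and treating a missing `degraded` field as healthy is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// health mirrors the two fields shown in the symptom above.
type health struct {
	Status   string `json:"status"`
	Degraded bool   `json:"degraded"`
}

// targetReachable decodes a /healthz body and reports whether foreman
// considers its Ollama target healthy (status "ok" and not degraded).
func targetReachable(body []byte) (bool, error) {
	var h health
	if err := json.Unmarshal(body, &h); err != nil {
		return false, fmt.Errorf("decode /healthz: %w", err)
	}
	return h.Status == "ok" && !h.Degraded, nil
}

func main() {
	for _, body := range []string{
		`{"status":"ok","degraded":true}`,
		`{"status":"ok"}`, // assumed shape of a healthy response
	} {
		ok, err := targetReachable([]byte(body))
		fmt.Println(body, "reachable:", ok, "err:", err)
	}
}
```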
Model not found (404)
Symptom: /api/chat or POST /jobs returns 404 for a model name.
Fix:
- Verify the model is pulled on the Mac: ollama list.
- Check the exact tag; Ollama tags change between versions.
- foreman re-polls on a miss; if the model was just pulled, retry after FOREMAN_POLL_INTERVAL (default 30s).
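Tag mismatches are easy to miss because Ollama lists an untagged pull such as nomic-embed-text as nomic-embed-text:latest. The Go sketch below checks a requested name against the `models[].name` entries of an Ollama /api/tags response. `hasModel` and `normalizeTag` are hypothetical helpers for illustration, not part of foreman, and the sample body is trimmed to the one field used.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tagsResponse covers only the part of Ollama's GET /api/tags body used here.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// normalizeTag appends ":latest" when the final path segment carries no tag,
// which is how Ollama resolves an untagged model name.
func normalizeTag(name string) string {
	last := name[strings.LastIndex(name, "/")+1:]
	if strings.Contains(last, ":") {
		return name
	}
	return name + ":latest"
}

// hasModel reports whether want appears in a /api/tags response body.
func hasModel(tagsJSON []byte, want string) (bool, error) {
	var resp tagsResponse
	if err := json.Unmarshal(tagsJSON, &resp); err != nil {
		return false, fmt.Errorf("decode /api/tags: %w", err)
	}
	want = normalizeTag(want)
	for _, m := range resp.Models {
		if normalizeTag(m.Name) == want {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Trimmed sample body; a real response carries more fields per model.
	tags := []byte(`{"models":[{"name":"nomic-embed-text:latest"},{"name":"qwen3:30b"}]}`)
	for _, name := range []string{"nomic-embed-text", "qwen3:30b", "qwen3:14b"} {
		ok, err := hasModel(tags, name)
		fmt.Println(name, "present:", ok, "err:", err)
	}
}
```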
HMAC signature mismatch
Symptom: Webhook receiver rejects events with signature errors.
Fix:
- Verify FOREMAN_WEBHOOK_SECRET matches between foreman and the receiver.
- The signature covers the raw JSON body; verify that the receiver computes the HMAC over the bytes exactly as received, before parsing or re-serializing the JSON.
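A receiver-side check might look like the Go sketch below, using the `sha256=<hex>` header format described in this guide. The `sign` and `verify` function names and the sample payload are assumptions for illustration; foreman's actual webhook body is not specified here.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign returns a header value in the documented format: "sha256=" followed
// by the hex HMAC-SHA256 of the raw body.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify checks an X-Foreman-Signature header value against the raw body.
// It must be given the bytes exactly as received, because re-serialized JSON
// rarely matches the original byte for byte.
func verify(secret, rawBody []byte, header string) bool {
	if !strings.HasPrefix(header, "sha256=") {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, "sha256="))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(rawBody)
	return hmac.Equal(got, mac.Sum(nil)) // constant-time comparison
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"event":"job.done"}`) // hypothetical payload
	header := sign(secret, body)

	fmt.Println("raw body verifies:        ", verify(secret, body, header))
	// The same JSON with different whitespace no longer matches.
	fmt.Println("reformatted body verifies:", verify(secret, []byte(`{ "event": "job.done" }`), header))
}
```

In an http.Handler, the raw bytes would come from io.ReadAll(r.Body), with json.Unmarshal applied to that same slice only after verification succeeds.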
Job stuck in loading/working
Symptom: A job stays in a non-terminal state indefinitely.
Cause: foreman crashed or restarted mid-job.
Fix: foreman resets interrupted jobs (loading/working) to queued on startup.
Restart foreman to recover. Jobs are retried up to FOREMAN_MAX_ATTEMPTS.
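The startup recovery described above amounts to a single pass over non-terminal jobs. This Go sketch models it on an in-memory slice; the `job` type, the `recoverInterrupted` function, and the "done" terminal state are hypothetical stand-ins, since foreman keeps jobs in SQLite and its schema is not shown in this guide.

```go
package main

import "fmt"

// job is a hypothetical in-memory stand-in for a row in foreman's job table.
type job struct {
	ID    string
	State string // "queued", "loading", "working", or a terminal state
}

// recoverInterrupted resets jobs caught mid-flight by a crash back to
// "queued" and returns how many were reset. Queued and terminal jobs are
// left untouched.
func recoverInterrupted(jobs []job) int {
	n := 0
	for i := range jobs {
		switch jobs[i].State {
		case "loading", "working":
			jobs[i].State = "queued"
			n++
		}
	}
	return n
}

func main() {
	jobs := []job{
		{ID: "a", State: "working"},
		{ID: "b", State: "done"}, // "done" is an assumed terminal state name
		{ID: "c", State: "loading"},
	}
	fmt.Println("reset:", recoverInterrupted(jobs))
	for _, j := range jobs {
		fmt.Println(j.ID, j.State)
	}
}
```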
SQLite busy/locked errors
Symptom: HTTP handlers return 500 with "database is locked".
Fix: The SQLite DSN includes busy_timeout=5000 (5 seconds). If this is
insufficient under load, increase it. WAL mode ensures readers do not block
the single writer.