steve e119ed325b
chore: add deployment docs, model script, and finalize env config
Phase 6 deployment infrastructure: finalize Dockerfile with OCI labels,
improve .env.example with grouped config keys, add scripts/pull-models.sh
for Mac-side model setup, and add docs/deploy.md covering the full
deployment topology, prerequisites, security model, and troubleshooting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:43:10 -04:00

foreman deployment guide

Overview

foreman runs on orgrimmar (homelab server), containerized via Komodo/docker-compose, reaching the Mac's Ollama instance over the trusted VLAN or Tailscale. The Mac is a dumb appliance running only Ollama; foreman handles queuing, model inventory, and job lifecycle.

orgrimmar (docker)         Tailscale / VLAN        M1 Pro Mac
┌──────────────┐          ┌─────────────┐         ┌──────────┐
│   foreman    │──HTTP───▶│  100.x.x.x  │────────▶│  Ollama  │
│  :8080       │          │  :11434     │         │  :11434  │
└──────────────┘          └─────────────┘         └──────────┘

Prerequisites on the Mac

1. Install and configure Ollama

Ollama must be installed and listening on a network-accessible address (not just localhost). Bind it either to 0.0.0.0 or to the Mac's Tailscale IP:

launchctl setenv OLLAMA_HOST 0.0.0.0:11434

2. Set environment variables

launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
launchctl setenv OLLAMA_CONTEXT_LENGTH 8192

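Then restart the Ollama application for the changes to take effect; this applies to OLLAMA_HOST above as well. Values set with launchctl setenv do not survive a reboot, so reapply them after restarting the Mac.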

  • OLLAMA_MAX_LOADED_MODELS=2 — slot 1 for the always-resident embedder, slot 2 for the rotating worker model.
  • OLLAMA_CONTEXT_LENGTH=8192 — minimum recommended context window.

3. Pull models

Run the helper script from the foreman repo on the Mac:

OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh

This pulls the recommended roster:

  • nomic-embed-text — embedder (always resident, slot 1, ~0.3 GB)
  • qwen3:14b — parse/data tasks (~9 GB)
  • qwen3:30b — agent + code, default worker (~19 GB)
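
As a rough sanity check on the two-slot layout, the sketch below adds the embedder's footprint to each candidate worker, using the approximate sizes listed above. The `model` type, the `residentGB` helper, and the 32 GB memory figure are illustrative assumptions, and the estimate ignores context (KV cache) overhead, so treat the totals as a lower bound.

```go
package main

import "fmt"

// model pairs a tag with its approximate size from the roster above.
type model struct {
	name   string
	sizeGB float64
}

// residentGB estimates the memory held when the embedder and one worker are
// both loaded (OLLAMA_MAX_LOADED_MODELS=2). KV-cache overhead is not included.
func residentGB(embedder, worker model) float64 {
	return embedder.sizeGB + worker.sizeGB
}

func main() {
	const macMemoryGB = 32.0 // assumption: set this to the Mac's unified memory

	embedder := model{"nomic-embed-text", 0.3}
	workers := []model{{"qwen3:14b", 9}, {"qwen3:30b", 19}}

	for _, w := range workers {
		total := residentGB(embedder, w)
		fmt.Printf("%s + %s: ~%.1f GB of %.0f GB (fits: %t)\n",
			embedder.name, w.name, total, macMemoryGB, total < macMemoryGB)
	}
}
```

If the larger pairing does not leave comfortable headroom on a given machine, the smaller worker is the safer default.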

4. Prevent sleep during jobs

Use caffeinate or pmset to prevent the Mac from sleeping while foreman may dispatch work:

# Keep the system awake while this runs (only effective on AC power)
caffeinate -s &
# Or persistently: sudo pmset -c sleep 0, or via System Settings > Energy Saver

5. Firewall

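Ollama's :11434 should be reachable only from foreman's host (orgrimmar). Options:

  • Tailscale ACLs: restrict :11434 to orgrimmar's Tailscale IP.
  • macOS firewall: the built-in application firewall filters inbound connections per application, not per source address, so on its own it cannot limit :11434 to orgrimmar.
  • pf rules: allow inbound on :11434 only from orgrimmar's address, for more granular control.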

foreman deployment on orgrimmar

Image

The container image is built by Gitea CI (.gitea/workflows/ci.yaml) and pushed to the registry:

gitea.stevedudenhoeffer.com/steve/foreman:latest

Komodo deployment

Komodo reads the docker-compose.yml from the steveternet repo at azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml.

  1. Copy .env.example to .env and fill in values (see below).
  2. Deploy via Komodo's stack sync.

Configuration

Create .env from .env.example in the same directory as the compose file:

# Required — the Mac's Tailscale or LAN address
FOREMAN_OLLAMA_URL=http://100.x.x.x:11434

# Optional — bearer token foreman sends to Ollama target
FOREMAN_OLLAMA_TOKEN=

# Optional — bearer token callers must present to foreman
FOREMAN_TOKEN=your-secret-here

# Embedder model (must be pulled on Mac)
FOREMAN_EMBED_MODEL=nomic-embed-text

# Other settings have sensible defaults; see .env.example for the full list.

Persistence

SQLite is persisted in a named Docker volume foreman_data mounted at /data. The database file is /data/foreman.db (WAL mode, pure-Go driver, no CGO).

Security model

foreman is not exposed on a public Traefik entrypoint:

  • It gets Traefik labels for internal hostname routing only: foreman.orgrimmar.dudenhoeffer.casa resolves internally on the LAN/Tailscale.
  • It is not in any public DNS.
  • Accessible via LAN and Tailscale only.

Authentication

  • Inbound (callers to foreman): optional static bearer token via FOREMAN_TOKEN. When set, callers must send Authorization: Bearer <token>. The /healthz endpoint is always unauthenticated.
  • Outbound (foreman to Ollama): optional bearer token via FOREMAN_OLLAMA_TOKEN, forwarded to the target on every request.
  • Webhooks: optional HMAC-SHA256 signing via FOREMAN_WEBHOOK_SECRET. When set, foreman adds X-Foreman-Signature: sha256=<hex> to webhook POSTs.
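
For callers that do not use the Go client, the inbound rule above amounts to setting one header. The following is a minimal sketch using only the standard library; `newForemanRequest` is a hypothetical helper, and the request body is omitted because the job payload format is not described here.

```go
package main

import (
	"fmt"
	"net/http"
)

// newForemanRequest builds a request to a foreman endpoint and, when a token
// is configured, attaches it as "Authorization: Bearer <token>".
func newForemanRequest(method, baseURL, path, token string) (*http.Request, error) {
	req, err := http.NewRequest(method, baseURL+path, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := newForemanRequest(http.MethodPost,
		"http://foreman.orgrimmar.dudenhoeffer.casa", "/jobs", "your-secret-here")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	// Print what would be sent; no network call is made here.
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Authorization"))
}
```

Leaving the token empty skips the header, which matches a deployment where FOREMAN_TOKEN is unset.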

go-llm usage

foreman is a drop-in Ollama-compatible target for go-llm:

import "gitea.stevedudenhoeffer.com/steve/go-llm/v2"

model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")

This uses the synchronous /api/chat passthrough. Streaming, tool calling, and thinking tokens all work transparently.

For async job submission, use the client package:

import "gitea.stevedudenhoeffer.com/steve/foreman/client"

c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
    client.WithToken("your-token"),
)
result, err := c.Submit(ctx, client.SubmitRequest{
    Model:    "qwen3:30b",
    Messages: messages,
})

Troubleshooting

Target unreachable

Symptom: /healthz returns {"status":"ok","degraded":true}, jobs fail with connection errors.

Cause: The Mac is asleep, Ollama is not running, or the network path is broken.

Fix:

  1. Wake the Mac / start Ollama.
  2. Verify connectivity: curl http://100.x.x.x:11434/api/tags from orgrimmar.
  3. Check Tailscale status: tailscale status on both machines.
  4. Jobs will automatically retry (up to FOREMAN_MAX_ATTEMPTS). The poller recovers automatically when the target comes back.
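
A monitoring check can tell "foreman is up" apart from "foreman is up but degraded" by decoding the /healthz body shown in the symptom. This is a sketch under the assumption that the response carries only the two fields quoted above; the `health` type and both helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// health mirrors the two fields shown in the /healthz symptom above.
type health struct {
	Status   string `json:"status"`
	Degraded bool   `json:"degraded"`
}

// parseHealth decodes a /healthz response body.
func parseHealth(body []byte) (health, error) {
	var h health
	err := json.Unmarshal(body, &h)
	return h, err
}

// fetchHealth queries a live foreman instance with a short timeout.
func fetchHealth(baseURL string) (health, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/healthz")
	if err != nil {
		return health{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return health{}, err
	}
	return parseHealth(body)
}

func main() {
	// Offline demonstration using the body from the symptom above.
	h, err := parseHealth([]byte(`{"status":"ok","degraded":true}`))
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	if h.Degraded {
		fmt.Println("foreman is up, but the Ollama target may be unreachable")
	}
}
```

Against a running deployment, call `fetchHealth` with foreman's base URL instead of the hard-coded sample.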

Model not found (404)

Symptom: /api/chat or POST /jobs returns 404 for a model name.

Fix:

  1. Verify the model is pulled on the Mac: ollama list.
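  2. Check the exact tag against the ollama list output. Available tags can change between model releases, and a model pulled without a tag is listed as name:latest.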
  3. foreman re-polls on a miss; if the model was just pulled, retry after FOREMAN_POLL_INTERVAL (default 30s).
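
The tag check in the steps above can also be scripted against Ollama's /api/tags response, which lists installed models under a `models` array with a `name` field. This sketch only decodes a sample body; `hasModel` is a hypothetical helper, and appending `:latest` to untagged names is an assumption about how you want to compare, not a statement about foreman's own matching.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tagsResponse holds the subset of Ollama's /api/tags response used here.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// hasModel reports whether want appears in an /api/tags response body.
// A name without a tag is compared as "<name>:latest".
func hasModel(tagsBody []byte, want string) (bool, error) {
	var tr tagsResponse
	if err := json.Unmarshal(tagsBody, &tr); err != nil {
		return false, err
	}
	if !strings.Contains(want, ":") {
		want += ":latest"
	}
	for _, m := range tr.Models {
		if m.Name == want {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Sample body; fetch the real one with the curl command shown earlier.
	body := []byte(`{"models":[{"name":"nomic-embed-text:latest"},{"name":"qwen3:30b"}]}`)
	for _, name := range []string{"nomic-embed-text", "qwen3:30b", "qwen3:14b"} {
		ok, err := hasModel(body, name)
		if err != nil {
			fmt.Println("decode:", err)
			return
		}
		fmt.Printf("%s installed: %t\n", name, ok)
	}
}
```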

HMAC signature mismatch

Symptom: Webhook receiver rejects events with signature errors.

Fix:

  1. Verify FOREMAN_WEBHOOK_SECRET matches between foreman and the receiver.
  2. The signature covers the raw JSON body, so the receiver must compute the HMAC over the exact bytes it received, before any JSON parsing or re-serialization.
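
A receiver-side check might look like the sketch below. It assumes the signature is HMAC-SHA256 keyed with FOREMAN_WEBHOOK_SECRET over the raw request body and hex-encoded after the `sha256=` prefix, as described in the Authentication section; the function names and the sample payload are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign produces a header value in the "sha256=<hex>" form for the given body.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify checks an X-Foreman-Signature header value against the raw body.
// It compares in constant time via hmac.Equal.
func verify(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")                 // stands in for FOREMAN_WEBHOOK_SECRET
	body := []byte(`{"job_id":"123","status":"done"}`) // hypothetical payload
	header := sign(secret, body)

	fmt.Println("valid:", verify(secret, body, header))
	fmt.Println("tampered:", verify(secret, []byte(`{"job_id":"124"}`), header))
}
```

In an HTTP handler, read the body once with io.ReadAll, pass those bytes to `verify`, and only then unmarshal them.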

Job stuck in loading/working

Symptom: A job stays in a non-terminal state indefinitely.

Cause: foreman crashed or restarted mid-job.

Fix: foreman resets interrupted jobs (loading/working) to queued on startup. Restart foreman to recover. Jobs are retried up to FOREMAN_MAX_ATTEMPTS.
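
The recovery rule can be pictured as a simple status rewrite at startup. The sketch below is not foreman's code: the `job` type, the `resetInterrupted` function, and the `done` status are hypothetical, and only the loading/working/queued names come from the text above.

```go
package main

import "fmt"

// job is a hypothetical, minimal stand-in for a stored job record.
type job struct {
	ID     int
	Status string
}

// resetInterrupted moves jobs caught mid-flight ("loading" or "working")
// back to "queued" and returns how many were changed.
func resetInterrupted(jobs []job) int {
	n := 0
	for i := range jobs {
		switch jobs[i].Status {
		case "loading", "working":
			jobs[i].Status = "queued"
			n++
		}
	}
	return n
}

func main() {
	jobs := []job{{1, "working"}, {2, "done"}, {3, "loading"}, {4, "queued"}}
	n := resetInterrupted(jobs)
	fmt.Printf("reset %d interrupted jobs\n", n)
	for _, j := range jobs {
		fmt.Printf("job %d: %s\n", j.ID, j.Status)
	}
}
```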

SQLite busy/locked errors

Symptom: HTTP handlers return 500 with "database is locked".

Fix: The SQLite DSN sets busy_timeout=5000 (5 seconds). If that is insufficient under load, increase it. In WAL mode readers and the single writer do not block each other, so lingering lock errors typically come from competing writes.