steve e119ed325b

chore: add deployment docs, model script, and finalize env config

Phase 6 deployment infrastructure: finalize Dockerfile with OCI labels,
improve .env.example with grouped config keys, add scripts/pull-models.sh
for Mac-side model setup, and add docs/deploy.md covering the full
deployment topology, prerequisites, security model, and troubleshooting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 18:43:10 -04:00

foreman deployment guide

Overview

foreman runs on orgrimmar (homelab server), containerized via Komodo/docker-compose, reaching the Mac's Ollama instance over the trusted VLAN or Tailscale. The Mac is a dumb appliance running only Ollama; foreman handles queuing, model inventory, and job lifecycle.

orgrimmar (docker)         Tailscale / VLAN        M1 Pro Mac
┌──────────────┐          ┌─────────────┐         ┌──────────┐
│   foreman    │──HTTP───▶│  100.x.x.x  │────────▶│  Ollama  │
│  :8080       │          │  :11434     │         │  :11434  │
└──────────────┘          └─────────────┘         └──────────┘

Prerequisites on the Mac

1. Install and configure Ollama

Ollama must be installed and listening on a network-accessible address, not just localhost. Bind it either to 0.0.0.0 or to the Mac's Tailscale IP:

launchctl setenv OLLAMA_HOST 0.0.0.0:11434

2. Set environment variables

launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
launchctl setenv OLLAMA_CONTEXT_LENGTH 8192

Then restart the Ollama application for changes to take effect.

OLLAMA_MAX_LOADED_MODELS=2 — slot 1 for the always-resident embedder, slot 2 for the rotating worker model.
OLLAMA_CONTEXT_LENGTH=8192 — minimum recommended context window.

3. Pull models

Run the helper script from the foreman repo on the Mac:

OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh

This pulls the recommended roster:

nomic-embed-text — embedder (always resident, slot 1, ~0.3 GB)
qwen3:14b — parse/data tasks (~9 GB)
qwen3:30b — agent + code, default worker (~19 GB)

4. Prevent sleep during jobs

Use caffeinate or pmset to keep the Mac awake while foreman may dispatch work. Note that caffeinate -s only holds off system sleep while the Mac is on AC power:

caffeinate -s &
# Or permanently via System Settings > Energy Saver

5. Firewall

Ollama's :11434 should be accessible only from foreman's IP (the orgrimmar host). Use either:

Tailscale ACLs — restrict :11434 to orgrimmar's Tailscale IP.
macOS firewall — allow inbound on :11434 only from orgrimmar.
pf rules — for more granular control.

foreman deployment on orgrimmar

Image

The container image is built by Gitea CI (.gitea/workflows/ci.yaml) and pushed to the registry:

gitea.stevedudenhoeffer.com/steve/foreman:latest

Komodo deployment

Komodo reads the docker-compose.yml from the steveternet repo at azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml.

Copy .env.example to .env and fill in values (see below).
Deploy via Komodo's stack sync.

Configuration

Create .env from .env.example in the same directory as the compose file:

# Required — the Mac's Tailscale or LAN address
FOREMAN_OLLAMA_URL=http://100.x.x.x:11434

# Optional — bearer token foreman sends to Ollama target
FOREMAN_OLLAMA_TOKEN=

# Optional — bearer token callers must present to foreman
FOREMAN_TOKEN=your-secret-here

# Embedder model (must be pulled on Mac)
FOREMAN_EMBED_MODEL=nomic-embed-text

# Other settings have sensible defaults; see .env.example for the full list.
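A startup check can catch a missing or malformed FOREMAN_OLLAMA_URL before any job is dispatched. The sketch below is only an illustration: the Config struct and the loadConfig helper are hypothetical names, not foreman's actual loader, and the validation rules (http or https scheme, non-empty host) are assumptions about what a usable target URL looks like.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
)

// Config holds the keys shown in the .env example above.
// The struct itself is hypothetical; foreman's real config type may differ.
type Config struct {
	OllamaURL   string // FOREMAN_OLLAMA_URL (required)
	OllamaToken string // FOREMAN_OLLAMA_TOKEN (optional)
	Token       string // FOREMAN_TOKEN (optional)
	EmbedModel  string // FOREMAN_EMBED_MODEL
}

// loadConfig reads the keys through getenv so it can be tested without
// touching the real process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		OllamaURL:   getenv("FOREMAN_OLLAMA_URL"),
		OllamaToken: getenv("FOREMAN_OLLAMA_TOKEN"),
		Token:       getenv("FOREMAN_TOKEN"),
		EmbedModel:  getenv("FOREMAN_EMBED_MODEL"),
	}
	if cfg.OllamaURL == "" {
		return cfg, errors.New("FOREMAN_OLLAMA_URL is required")
	}
	u, err := url.Parse(cfg.OllamaURL)
	if err != nil {
		return cfg, fmt.Errorf("FOREMAN_OLLAMA_URL: %w", err)
	}
	// Assumed rule: the target must be an http(s) URL with a host part.
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return cfg, fmt.Errorf("FOREMAN_OLLAMA_URL must look like http://host:port, got %q", cfg.OllamaURL)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("target:", cfg.OllamaURL, "embedder:", cfg.EmbedModel)
}
```

Passing getenv as a function keeps the check easy to exercise in a unit test with a plain map.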

Persistence

SQLite is persisted in a named Docker volume foreman_data mounted at /data. The database file is /data/foreman.db (WAL mode, pure-Go driver, no CGO).

Security model

foreman is not exposed on a public Traefik entrypoint:

It gets Traefik labels for internal hostname routing only: foreman.orgrimmar.dudenhoeffer.casa resolves internally on the LAN/Tailscale.
It is not in any public DNS.
Accessible via LAN and Tailscale only.

Authentication

Inbound (callers to foreman): optional static bearer token via FOREMAN_TOKEN. When set, callers must send Authorization: Bearer <token>. The /healthz endpoint is always unauthenticated.
Outbound (foreman to Ollama): optional bearer token via FOREMAN_OLLAMA_TOKEN, forwarded to the target on every request.
Webhooks: optional HMAC-SHA256 signing via FOREMAN_WEBHOOK_SECRET. When set, foreman adds X-Foreman-Signature: sha256=<hex> to webhook POSTs.

go-llm usage

foreman is a drop-in Ollama-compatible target for go-llm:

import "gitea.stevedudenhoeffer.com/steve/go-llm/v2"

model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")

This uses the synchronous /api/chat passthrough. Streaming, tool calling, and thinking tokens all work transparently.

For async job submission, use the client package:

import "gitea.stevedudenhoeffer.com/steve/foreman/client"

c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
    client.WithToken("your-token"),
)
result, err := c.Submit(ctx, client.SubmitRequest{
    Model:    "qwen3:30b",
    Messages: messages,
})

Troubleshooting

Target unreachable

Symptom: /healthz returns {"status":"ok","degraded":true}, jobs fail with connection errors.

Cause: The Mac is asleep, Ollama is not running, or the network path is broken.

Fix:

Wake the Mac / start Ollama.
Verify connectivity: curl http://100.x.x.x:11434/api/tags from orgrimmar.
Check Tailscale status: tailscale status on both machines.
Jobs are retried automatically (up to FOREMAN_MAX_ATTEMPTS), and the poller recovers on its own once the target is reachable again.
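A monitoring script can tell "foreman is down" apart from "foreman is up but degraded" by decoding the /healthz body. The sketch below assumes the response has only the two fields shown in the symptom above; the health type and the parseHealth and fetchHealth helpers are hypothetical names.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// health mirrors the /healthz body shown above. Any further fields the
// real endpoint may return are ignored by encoding/json.
type health struct {
	Status   string `json:"status"`
	Degraded bool   `json:"degraded"`
}

// parseHealth decodes a /healthz response body.
func parseHealth(body []byte) (health, error) {
	var h health
	err := json.Unmarshal(body, &h)
	return h, err
}

// fetchHealth queries baseURL + "/healthz" with a 5 second timeout.
func fetchHealth(ctx context.Context, baseURL string) (health, error) {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/healthz", nil)
	if err != nil {
		return health{}, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return health{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return health{}, err
	}
	return parseHealth(body)
}

func main() {
	// Decode the sample body from this section instead of hitting the network.
	h, err := parseHealth([]byte(`{"status":"ok","degraded":true}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	if h.Degraded {
		fmt.Println("foreman is up, but reports a degraded target")
	}
}
```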

Model not found (404)

Symptom: /api/chat or POST /jobs returns 404 for a model name.

Fix:

Verify the model is pulled on the Mac: ollama list.
Check the exact tag. Ollama model tags can change between releases, so the requested name must match the ollama list output exactly.
foreman re-polls on a miss; if the model was just pulled, retry after FOREMAN_POLL_INTERVAL (default 30s).

HMAC signature mismatch

Symptom: Webhook receiver rejects events with signature errors.

Fix:

Verify FOREMAN_WEBHOOK_SECRET matches between foreman and the receiver.
The signature covers the raw JSON body, so the receiver must compute the HMAC over the exact bytes it received, before any JSON parsing or re-serialization.

Job stuck in loading/working

Symptom: A job stays in a non-terminal state indefinitely.

Cause: foreman crashed or restarted mid-job.

Fix: foreman resets interrupted jobs (loading/working) to queued on startup. Restart foreman to recover. Jobs are retried up to FOREMAN_MAX_ATTEMPTS.

SQLite busy/locked errors

Symptom: HTTP handlers return 500 with "database is locked".

Fix: The SQLite DSN includes busy_timeout=5000 (5 seconds). If this is insufficient under load, increase it. WAL mode ensures readers do not block the single writer.
