chore: add deployment docs, model script, and finalize env config
Phase 6 deployment infrastructure: finalize Dockerfile with OCI labels, improve .env.example with grouped config keys, add scripts/pull-models.sh for Mac-side model setup, and add docs/deploy.md covering the full deployment topology, prerequisites, security model, and troubleshooting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+24
-15
@@ -1,32 +1,41 @@
|
|||||||
# foreman configuration — all env vars are FOREMAN_* namespaced.
|
# === foreman daemon configuration ===
|
||||||
# Copy to .env and fill in values for local development.
|
|
||||||
|
|
||||||
# Listen address for the HTTP server (default: :8080)
|
# Listen address (default: :8080)
|
||||||
FOREMAN_ADDR=:8080
|
FOREMAN_ADDR=:8080
|
||||||
|
|
||||||
# Base URL of the Ollama target (required)
|
# Ollama target URL (required — the Mac's Tailscale address)
|
||||||
FOREMAN_OLLAMA_URL=http://mac.tail:11434
|
FOREMAN_OLLAMA_URL=http://100.x.x.x:11434
|
||||||
|
|
||||||
# Optional bearer token foreman sends to the Ollama target
|
# Outbound bearer token for Ollama target (optional)
|
||||||
FOREMAN_OLLAMA_TOKEN=
|
FOREMAN_OLLAMA_TOKEN=
|
||||||
|
|
||||||
# Optional bearer token callers must present to foreman
|
# Inbound bearer token foreman requires of its callers (optional)
|
||||||
FOREMAN_TOKEN=
|
FOREMAN_TOKEN=change-me-to-a-secret
|
||||||
|
|
||||||
# Always-resident embedder model (e.g. nomic-embed-text, qwen3-embedding:0.6b)
|
# === Model configuration ===
|
||||||
|
|
||||||
|
# Always-resident embedding model (pinned in slot 1)
|
||||||
FOREMAN_EMBED_MODEL=nomic-embed-text
|
FOREMAN_EMBED_MODEL=nomic-embed-text
|
||||||
|
|
||||||
# Path to the SQLite database file (default: foreman.db)
|
# === Persistence ===
|
||||||
FOREMAN_DB_PATH=foreman.db
|
|
||||||
|
|
||||||
# How often to poll the target's /api/tags (default: 30s)
|
# SQLite database path (default: foreman.db)
|
||||||
|
FOREMAN_DB_PATH=/data/foreman.db
|
||||||
|
|
||||||
|
# === Polling ===
|
||||||
|
|
||||||
|
# Model polling interval (default: 30s)
|
||||||
FOREMAN_POLL_INTERVAL=30s
|
FOREMAN_POLL_INTERVAL=30s
|
||||||
|
|
||||||
# Optional HMAC key for signing webhook payloads (ADR-0005)
|
# === Webhooks ===
|
||||||
|
|
||||||
|
# Webhook HMAC signing secret (optional — signs X-Foreman-Signature header)
|
||||||
FOREMAN_WEBHOOK_SECRET=
|
FOREMAN_WEBHOOK_SECRET=
|
||||||
|
|
||||||
# Maximum retry attempts for a job before marking as failed (default: 3)
|
# === Job lifecycle ===
|
||||||
|
|
||||||
|
# Max retry attempts for failed jobs (default: 3)
|
||||||
FOREMAN_MAX_ATTEMPTS=3
|
FOREMAN_MAX_ATTEMPTS=3
|
||||||
|
|
||||||
# How long to retain completed/failed jobs before pruning (default: 24h)
|
# TTL for completed/failed jobs before pruning (default: 24h)
|
||||||
FOREMAN_JOB_TTL=24h
|
FOREMAN_JOB_TTL=24h
|
||||||
|
|||||||
@@ -6,6 +6,8 @@ COPY . .
|
|||||||
RUN CGO_ENABLED=0 go build -o /out/foreman ./cmd/foreman
|
RUN CGO_ENABLED=0 go build -o /out/foreman ./cmd/foreman
|
||||||
|
|
||||||
FROM gcr.io/distroless/static-debian12
|
FROM gcr.io/distroless/static-debian12
|
||||||
|
LABEL org.opencontainers.image.source="https://gitea.stevedudenhoeffer.com/steve/foreman"
|
||||||
|
LABEL org.opencontainers.image.description="Queued Ollama proxy daemon"
|
||||||
COPY --from=build /out/foreman /foreman
|
COPY --from=build /out/foreman /foreman
|
||||||
EXPOSE 8080
|
EXPOSE 8080
|
||||||
ENTRYPOINT ["/foreman", "serve"]
|
ENTRYPOINT ["/foreman", "serve"]
|
||||||
|
|||||||
+214
@@ -0,0 +1,214 @@
|
|||||||
|
# foreman deployment guide
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
foreman runs on **orgrimmar** (homelab server), containerized via Komodo/docker-compose,
|
||||||
|
reaching the Mac's Ollama instance over the trusted VLAN or Tailscale. The Mac is a
|
||||||
|
dumb appliance running only Ollama; foreman handles queuing, model inventory, and
|
||||||
|
job lifecycle.
|
||||||
|
|
||||||
|
```
|
||||||
|
orgrimmar (docker) Tailscale / VLAN M1 Pro Mac
|
||||||
|
┌──────────────┐ ┌─────────────┐ ┌──────────┐
|
||||||
|
│ foreman │──HTTP───▶│ 100.x.x.x │────────▶│ Ollama │
|
||||||
|
│ :8080 │ │ :11434 │ │ :11434 │
|
||||||
|
└──────────────┘ └─────────────┘ └──────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
## Prerequisites on the Mac
|
||||||
|
|
||||||
|
### 1. Install and configure Ollama
|
||||||
|
|
||||||
|
Ollama must be installed and listening on a network-accessible address (not just
|
||||||
|
localhost). Either bind to `0.0.0.0` or the Tailscale IP:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
launchctl setenv OLLAMA_HOST 0.0.0.0:11434
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Set environment variables
|
||||||
|
|
||||||
|
```bash
|
||||||
|
launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
|
||||||
|
launchctl setenv OLLAMA_CONTEXT_LENGTH 8192
|
||||||
|
```
|
||||||
|
|
||||||
|
Then restart the Ollama application for changes to take effect.
|
||||||
|
|
||||||
|
- `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 for the always-resident embedder, slot 2
|
||||||
|
for the rotating worker model.
|
||||||
|
- `OLLAMA_CONTEXT_LENGTH=8192` — minimum recommended context window.
|
||||||
|
|
||||||
|
### 3. Pull models
|
||||||
|
|
||||||
|
Run the helper script from the foreman repo on the Mac:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
This pulls the recommended roster:
|
||||||
|
- **nomic-embed-text** — embedder (always resident, slot 1, ~0.3 GB)
|
||||||
|
- **qwen3:14b** — parse/data tasks (~9 GB)
|
||||||
|
- **qwen3:30b** — agent + code, default worker (~19 GB)
|
||||||
|
|
||||||
|
### 4. Prevent sleep during jobs
|
||||||
|
|
||||||
|
Use `caffeinate` or `pmset` to prevent the Mac from sleeping while foreman may
|
||||||
|
dispatch work:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
caffeinate -s &
|
||||||
|
# Or permanently via System Settings > Energy Saver
|
||||||
|
```
|
||||||
|
|
||||||
|
### 5. Firewall
|
||||||
|
|
||||||
|
Ollama's `:11434` should be accessible only from foreman's IP (the orgrimmar host).
|
||||||
|
Use either:
|
||||||
|
|
||||||
|
- **Tailscale ACLs** — restrict `:11434` to orgrimmar's Tailscale IP.
|
||||||
|
- **macOS firewall** — allow inbound on `:11434` only from orgrimmar.
|
||||||
|
- **pf rules** — for more granular control.
|
||||||
|
|
||||||
|
## foreman deployment on orgrimmar
|
||||||
|
|
||||||
|
### Image
|
||||||
|
|
||||||
|
The container image is built by gitea CI (`.gitea/workflows/ci.yaml`) and pushed
|
||||||
|
to the registry:
|
||||||
|
|
||||||
|
```
|
||||||
|
gitea.stevedudenhoeffer.com/steve/foreman:latest
|
||||||
|
```
|
||||||
|
|
||||||
|
### Komodo deployment
|
||||||
|
|
||||||
|
Komodo reads the docker-compose.yml from the steveternet repo at
|
||||||
|
`azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml`.
|
||||||
|
|
||||||
|
1. Copy `.env.example` to `.env` and fill in values (see below).
|
||||||
|
2. Deploy via Komodo's stack sync.
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
Create `.env` from `.env.example` in the same directory as the compose file:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Required — the Mac's Tailscale or LAN address
|
||||||
|
FOREMAN_OLLAMA_URL=http://100.x.x.x:11434
|
||||||
|
|
||||||
|
# Optional — bearer token foreman sends to Ollama target
|
||||||
|
FOREMAN_OLLAMA_TOKEN=
|
||||||
|
|
||||||
|
# Optional — bearer token callers must present to foreman
|
||||||
|
FOREMAN_TOKEN=your-secret-here
|
||||||
|
|
||||||
|
# Embedder model (must be pulled on Mac)
|
||||||
|
FOREMAN_EMBED_MODEL=nomic-embed-text
|
||||||
|
|
||||||
|
# Other settings have sensible defaults; see .env.example for the full list.
|
||||||
|
```
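A quick pre-deploy check can catch an empty required key before Komodo syncs the stack. The sketch below is a hypothetical helper (`missing_env_keys` is not part of the foreman repo); it assumes a plain `KEY=value` env file and treats a key as set only when it has a non-empty value. The guide marks only `FOREMAN_OLLAMA_URL` as required, so which keys to pass is the caller's choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with foreman: print every key from the
# argument list that is missing or empty in the given env file.
missing_env_keys() {
  local file="$1"; shift
  local key
  for key in "$@"; do
    # A key counts as set only when the line reads KEY=<non-empty value>.
    if ! grep -Eq "^${key}=.+" "$file"; then
      printf '%s\n' "$key"
    fi
  done
}

# Example call (prints nothing when both keys are set):
#   missing_env_keys .env FOREMAN_OLLAMA_URL FOREMAN_EMBED_MODEL
```

An empty result means every listed key has a value; anything printed needs to be filled in before deploying.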
|
||||||
|
|
||||||
|
### Persistence
|
||||||
|
|
||||||
|
SQLite is persisted in a named Docker volume `foreman_data` mounted at `/data`.
|
||||||
|
The database file is `/data/foreman.db` (WAL mode, pure-Go driver, no CGO).
|
||||||
|
|
||||||
|
## Security model
|
||||||
|
|
||||||
|
foreman is **not** exposed on a public Traefik entrypoint:
|
||||||
|
|
||||||
|
- It gets Traefik labels for **internal hostname routing** only:
|
||||||
|
`foreman.orgrimmar.dudenhoeffer.casa` resolves internally on the LAN/Tailscale.
|
||||||
|
- It is **not** in any public DNS.
|
||||||
|
- Accessible via LAN and Tailscale only.
|
||||||
|
|
||||||
|
### Authentication
|
||||||
|
|
||||||
|
- **Inbound (callers to foreman):** optional static bearer token via
|
||||||
|
`FOREMAN_TOKEN`. When set, callers must send `Authorization: Bearer <token>`.
|
||||||
|
The `/healthz` endpoint is always unauthenticated.
|
||||||
|
- **Outbound (foreman to Ollama):** optional bearer token via `FOREMAN_OLLAMA_TOKEN`,
|
||||||
|
forwarded to the target on every request.
|
||||||
|
- **Webhooks:** optional HMAC-SHA256 signing via `FOREMAN_WEBHOOK_SECRET`. When
|
||||||
|
set, foreman adds `X-Foreman-Signature: sha256=<hex>` to webhook POSTs.
|
||||||
|
|
||||||
|
## go-llm usage
|
||||||
|
|
||||||
|
foreman is a drop-in Ollama-compatible target for go-llm:
|
||||||
|
|
||||||
|
```go
|
||||||
|
import "gitea.stevedudenhoeffer.com/steve/go-llm/v2"
|
||||||
|
|
||||||
|
model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")
|
||||||
|
```
|
||||||
|
|
||||||
|
This uses the synchronous `/api/chat` passthrough. Streaming, tool calling, and
|
||||||
|
thinking tokens all work transparently.
|
||||||
|
|
||||||
|
For async job submission, use the client package:
|
||||||
|
|
||||||
|
```go
|
||||||
|
import "gitea.stevedudenhoeffer.com/steve/foreman/client"
|
||||||
|
|
||||||
|
c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
|
||||||
|
client.WithToken("your-token"),
|
||||||
|
)
|
||||||
|
result, err := c.Submit(ctx, client.SubmitRequest{
|
||||||
|
Model: "qwen3:30b",
|
||||||
|
Messages: messages,
|
||||||
|
})
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Target unreachable
|
||||||
|
|
||||||
|
**Symptom:** `/healthz` returns `{"status":"ok","degraded":true}`, jobs fail with
|
||||||
|
connection errors.
|
||||||
|
|
||||||
|
**Cause:** The Mac is asleep, Ollama is not running, or the network path is broken.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
1. Wake the Mac / start Ollama.
|
||||||
|
2. Verify connectivity: `curl http://100.x.x.x:11434/api/tags` from orgrimmar.
|
||||||
|
3. Check Tailscale status: `tailscale status` on both machines.
|
||||||
|
4. Jobs will automatically retry (up to `FOREMAN_MAX_ATTEMPTS`). The poller
|
||||||
|
recovers automatically when the target comes back.
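Because `/healthz` still reports `"status":"ok"` while the target is down, a monitor has to look at the `degraded` field rather than the HTTP status alone. A minimal sketch, assuming the response body has the shape shown in the symptom above; `is_degraded` is a hypothetical name, and the pattern match stands in for a real JSON parser:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when a /healthz response body
# contains "degraded": true, tolerating optional whitespace around the colon.
is_degraded() {
  printf '%s' "$1" | grep -Eq '"degraded"[[:space:]]*:[[:space:]]*true'
}

# Example call against the internal hostname from this guide (not run here):
#   body=$(curl -fsS http://foreman.orgrimmar.dudenhoeffer.casa/healthz)
#   if is_degraded "$body"; then echo "Ollama target unreachable"; fi
```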
|
||||||
|
|
||||||
|
### Model not found (404)
|
||||||
|
|
||||||
|
**Symptom:** `/api/chat` or `POST /jobs` returns 404 for a model name.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
1. Verify the model is pulled on the Mac: `ollama list`.
|
||||||
|
2. Check the exact tag: the requested name must match the `ollama list` output, including the tag suffix. Tags can change between model releases.
|
||||||
|
3. foreman re-polls on a miss; if the model was just pulled, retry after
|
||||||
|
`FOREMAN_POLL_INTERVAL` (default 30s).
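The exact-tag check in step 2 can be scripted against the target's `/api/tags` output. This sketch assumes the response lists each model under a `"name"` field, as in `{"models":[{"name":"qwen3:30b"}]}`; `has_model` is a hypothetical helper and uses a fixed-string match instead of a JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the /api/tags JSON lists exactly the
# given model name. Whitespace is stripped first so both `"name":"x"` and
# `"name": "x"` match; -F keeps ':' and '.' in tags literal.
has_model() {
  local tags_json="$1" model="$2"
  printf '%s' "$tags_json" | tr -d '[:space:]' | grep -Fq "\"name\":\"${model}\""
}

# Example call from orgrimmar (not run here):
#   has_model "$(curl -fsS http://100.x.x.x:11434/api/tags)" "qwen3:30b"
```

The match is deliberately strict: a lookup for `nomic-embed-text` does not match an entry listed as `nomic-embed-text:latest`, which is the kind of mismatch step 2 is meant to surface.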
|
||||||
|
|
||||||
|
### HMAC signature mismatch
|
||||||
|
|
||||||
|
**Symptom:** Webhook receiver rejects events with signature errors.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
1. Verify `FOREMAN_WEBHOOK_SECRET` matches between foreman and the receiver.
|
||||||
|
2. The signature covers the raw JSON body; verify the receiver computes the HMAC
|
||||||
|
over the exact bytes it received, before parsing or re-serializing the JSON.
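To debug a mismatch, it can help to recompute the signature by hand from a captured request body. The sketch below assumes the header format described under Authentication (`sha256=<hex>`, HMAC-SHA256 over the raw body) and that `openssl` is installed; `sign_body` and `verify_signature` are hypothetical names, not part of foreman.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute "sha256=<hex>" for the raw bytes in a file.
# `openssl dgst` prints a label before the digest, so keep the last field.
sign_body() {
  local secret="$1" body_file="$2" hex
  hex=$(openssl dgst -sha256 -hmac "$secret" < "$body_file" | awk '{print $NF}')
  printf 'sha256=%s\n' "$hex"
}

# Hypothetical helper: succeed when the header value matches the body.
# Plain string comparison is fine for debugging but is not constant-time.
verify_signature() {
  local secret="$1" body_file="$2" header="$3"
  [ "$(sign_body "$secret" "$body_file")" = "$header" ]
}

# Example call with a captured body and header value (not run here):
#   verify_signature "$FOREMAN_WEBHOOK_SECRET" body.json "$received_header"
```

Save the body exactly as received (no pretty-printing, no added trailing newline), since any byte difference changes the digest.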
|
||||||
|
|
||||||
|
### Job stuck in loading/working
|
||||||
|
|
||||||
|
**Symptom:** A job stays in a non-terminal state indefinitely.
|
||||||
|
|
||||||
|
**Cause:** foreman crashed or restarted mid-job.
|
||||||
|
|
||||||
|
**Fix:** foreman resets interrupted jobs (loading/working) to queued on startup.
|
||||||
|
Restart foreman to recover. Jobs are retried up to `FOREMAN_MAX_ATTEMPTS`.
|
||||||
|
|
||||||
|
### SQLite busy/locked errors
|
||||||
|
|
||||||
|
**Symptom:** HTTP handlers return 500 with "database is locked".
|
||||||
|
|
||||||
|
**Fix:** The SQLite DSN includes `busy_timeout=5000` (5 seconds). If this is
|
||||||
|
insufficient under load, increase it. WAL mode ensures readers do not block
|
||||||
|
the single writer.
|
||||||
+23
@@ -203,3 +203,26 @@ with the real SQLite-backed job queue and single worker loop.
|
|||||||
- Delegates to existing `ollamaProvider.New()` — zero new code paths.
|
- Delegates to existing `ollamaProvider.New()` — zero new code paths.
|
||||||
- DD#9 added to `v2/CLAUDE.md`.
|
- DD#9 added to `v2/CLAUDE.md`.
|
||||||
- PR: https://gitea.stevedudenhoeffer.com/steve/go-llm/pulls/4
|
- PR: https://gitea.stevedudenhoeffer.com/steve/go-llm/pulls/4
|
||||||
|
|
||||||
|
## Phase 6: Deployment infrastructure — 2026-05-23
|
||||||
|
|
||||||
|
**Project is deployable.** All deployment artifacts are in place.
|
||||||
|
|
||||||
|
- `Dockerfile` — finalized with OCI labels (`image.source`, `image.description`).
|
||||||
|
Multi-stage distroless build, CGO_ENABLED=0, `foreman serve` entrypoint.
|
||||||
|
- `.env.example` — finalized with all 10 config keys from `internal/config/`,
|
||||||
|
grouped by function (daemon, model, persistence, polling, webhooks, job lifecycle)
|
||||||
|
with clear comments and example values.
|
||||||
|
- `scripts/pull-models.sh` — executable helper to pull the recommended model roster
|
||||||
|
on the Mac (nomic-embed-text, qwen3:14b, qwen3:30b). Prints Mac-side Ollama
|
||||||
|
environment setup instructions.
|
||||||
|
- `docs/deploy.md` — full deployment guide covering: topology overview, Mac
|
||||||
|
prerequisites (Ollama config, env vars, model pull, sleep prevention, firewall),
|
||||||
|
orgrimmar deployment (image registry, Komodo, config, persistence), security
|
||||||
|
model (internal-only, no public DNS, bearer tokens, HMAC), go-llm usage
|
||||||
|
(sync + async), and troubleshooting (5 common scenarios).
|
||||||
|
- steveternet compose stack — PR to `steve/steveternet` adding
|
||||||
|
`azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml` and `.env.example`.
|
||||||
|
Follows sibling conventions: `web` network (external), `unless-stopped`,
|
||||||
|
gitea registry image, Traefik labels for internal routing, named volume
|
||||||
|
for SQLite persistence, all config via `${VAR}` interpolation.
|
||||||
|
|||||||
Executable
+34
@@ -0,0 +1,34 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
# Pull the recommended model roster on the Mac.
|
||||||
|
# Run this ON the Mac where Ollama is installed.
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
OLLAMA_HOST="${OLLAMA_HOST:-http://localhost:11434}"
|
||||||
|
|
||||||
|
echo "=== Pulling models to ${OLLAMA_HOST} ==="
|
||||||
|
|
||||||
|
# Embedder (always resident, slot 1)
|
||||||
|
echo "--- Embedder: nomic-embed-text ---"
|
||||||
|
curl -fsS "${OLLAMA_HOST}/api/pull" -d '{"name":"nomic-embed-text"}' | jq -r '.status // empty'
|
||||||
|
|
||||||
|
# Worker models (rotate through slot 2)
|
||||||
|
echo "--- Worker: qwen3:14b (parse/data) ---"
|
||||||
|
curl -fsS "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen3:14b"}' | jq -r '.status // empty'
|
||||||
|
|
||||||
|
echo "--- Worker: qwen3:30b (agent+code, default) ---"
|
||||||
|
curl -fsS "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen3:30b"}' | jq -r '.status // empty'
|
||||||
|
|
||||||
|
# Optional — uncomment if needed:
|
||||||
|
# echo "--- Worker: gpt-oss:20b (fast coder) ---"
|
||||||
|
# curl -s "${OLLAMA_HOST}/api/pull" -d '{"name":"gpt-oss:20b"}' | jq -r '.status // empty'
|
||||||
|
# echo "--- Worker: qwen2.5-coder:32b (quality coder, slow) ---"
|
||||||
|
# curl -s "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen2.5-coder:32b"}' | jq -r '.status // empty'
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
echo "=== Mac-side Ollama environment (set via launchctl or .zshrc) ==="
|
||||||
|
echo " OLLAMA_MAX_LOADED_MODELS=2"
|
||||||
|
echo " OLLAMA_KEEP_ALIVE=-1 # for the embedder slot"
|
||||||
|
echo " OLLAMA_CONTEXT_LENGTH=8192 # minimum recommended"
|
||||||
|
echo ""
|
||||||
|
echo " Example: launchctl setenv OLLAMA_MAX_LOADED_MODELS 2"
|
||||||
|
echo " Then restart Ollama."
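One gap in the pull commands above is that `jq -r '.status // empty'` prints nothing for a line that carries an error instead of a status, so a failed pull can look like a quiet success. A minimal sketch of a stricter filter, assuming Ollama reports pull failures as a JSON line containing an `"error"` field; `check_pull_stream` is a hypothetical helper and is not part of this commit:

```shell
#!/usr/bin/env bash
# Hypothetical filter: read newline-delimited pull progress on stdin, echo
# normal lines, and return non-zero if any line carries an "error" field.
check_pull_stream() {
  local line rc=0
  # The `|| [ -n "$line" ]` clause keeps a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *'"error"'*) printf 'pull failed: %s\n' "$line" >&2; rc=1 ;;
      *) printf '%s\n' "$line" ;;
    esac
  done
  return "$rc"
}

# Example call in place of the jq filter (not run here):
#   curl -fsS "${OLLAMA_HOST}/api/pull" -d '{"name":"qwen3:14b"}' | check_pull_stream
```

With `set -euo pipefail` already active in the script, a non-zero return from the filter would stop the run at the failing model.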
|
||||||