
# foreman deployment guide

## Overview

foreman runs on **orgrimmar** (homelab server), containerized via Komodo/docker-compose,
reaching the Mac's Ollama instance over the trusted VLAN or Tailscale. The Mac is a
dumb appliance running only Ollama; foreman handles queuing, model inventory, and
job lifecycle.

```
orgrimmar (docker)         Tailscale / VLAN        M1 Pro Mac
┌──────────────┐          ┌─────────────┐         ┌──────────┐
│   foreman    │──HTTP───▶│  100.x.x.x  │────────▶│  Ollama  │
│  :8080       │          │  :11434     │         │  :11434  │
└──────────────┘          └─────────────┘         └──────────┘
```

## Prerequisites on the Mac

### 1. Install and configure Ollama

Ollama must be installed and listening on a network-accessible address (not just
localhost). Either bind to `0.0.0.0` or the Tailscale IP:

```bash
launchctl setenv OLLAMA_HOST 0.0.0.0:11434
```

### 2. Set environment variables

```bash
launchctl setenv OLLAMA_MAX_LOADED_MODELS 2
launchctl setenv OLLAMA_CONTEXT_LENGTH 8192
```

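Then quit and restart the Ollama application so it picks up these variables and
`OLLAMA_HOST` from step 1. Values set with `launchctl setenv` do not survive a
reboot, so re-run these commands after restarting the Mac.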

- `OLLAMA_MAX_LOADED_MODELS=2` — slot 1 for the always-resident embedder, slot 2
  for the rotating worker model.
- `OLLAMA_CONTEXT_LENGTH=8192` — minimum recommended context window.

### 3. Pull models

Run the helper script from the foreman repo on the Mac:

```bash
OLLAMA_HOST=http://localhost:11434 ./scripts/pull-models.sh
```

This pulls the recommended roster:
- **nomic-embed-text** — embedder (always resident, slot 1, ~0.3 GB)
- **qwen3:14b** — parse/data tasks (~9 GB)
- **qwen3:30b** — agent + code, default worker (~19 GB)
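To confirm from orgrimmar (or the Mac) that the roster is actually present, the sketch below queries Ollama's `GET /api/tags` endpoint, which lists locally pulled models, and reports any roster entry that is missing. It is a standalone helper under stated assumptions, not part of foreman or `pull-models.sh`; the names `missingModels`, `pulledModels`, and `withTag` are hypothetical. It relies on Ollama reporting untagged pulls as `<name>:latest`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// roster is the recommended model set from this guide.
var roster = []string{"nomic-embed-text", "qwen3:14b", "qwen3:30b"}

// tagsResponse is the subset of Ollama's GET /api/tags response used here.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// withTag appends ":latest" to a name with no tag, matching how Ollama
// names models that were pulled without an explicit tag.
func withTag(name string) string {
	if strings.Contains(name, ":") {
		return name
	}
	return name + ":latest"
}

// missingModels returns the entries of want that are absent from pulled.
func missingModels(pulled, want []string) []string {
	have := make(map[string]bool, len(pulled))
	for _, p := range pulled {
		have[withTag(p)] = true
	}
	var missing []string
	for _, w := range want {
		if !have[withTag(w)] {
			missing = append(missing, w)
		}
	}
	return missing
}

// pulledModels fetches the model names reported by an Ollama instance.
func pulledModels(baseURL string) ([]string, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(baseURL + "/api/tags")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET /api/tags: %s", resp.Status)
	}
	var tags tagsResponse
	if err := json.NewDecoder(resp.Body).Decode(&tags); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(tags.Models))
	for _, m := range tags.Models {
		names = append(names, m.Name)
	}
	return names, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: rostercheck http://localhost:11434")
		return
	}
	pulled, err := pulledModels(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	if missing := missingModels(pulled, roster); len(missing) > 0 {
		fmt.Println("missing:", strings.Join(missing, ", "))
		os.Exit(1)
	}
	fmt.Println("roster complete")
}
```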

### 4. Prevent sleep during jobs

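Keep the Mac from sleeping whenever foreman may dispatch work, either for the
lifetime of a process with `caffeinate` or persistently with `pmset` (or the
equivalent option in System Settings):

```bash
caffeinate -s &          # no system sleep while this runs; AC power only
sudo pmset -c sleep 0    # persistent: never sleep while on the power adapter
```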

### 5. Firewall

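Ollama's `:11434` should be reachable only from foreman's host (orgrimmar). Use
one of:

- **Tailscale ACLs**: restrict `:11434` to orgrimmar's Tailscale IP. ACLs only
  govern traffic arriving over the tailnet, so on their own they are sufficient
  only when Ollama is bound to the Tailscale IP rather than `0.0.0.0`.
- **pf rules**: allow inbound `:11434` only from orgrimmar's LAN and Tailscale
  addresses and block everything else, for more granular control.
- **macOS Application Firewall**: filters per application, not per port or
  source address, so it can allow or block Ollama entirely but cannot restrict
  it to orgrimmar. Use it alongside one of the options above, not instead.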

## foreman deployment on orgrimmar

### Image

The container image is built by Gitea Actions (`.gitea/workflows/ci.yaml`) and pushed
to the registry:

```
gitea.stevedudenhoeffer.com/steve/foreman:latest
```

### Komodo deployment

Komodo reads the docker-compose.yml from the steveternet repo at
`azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml`.

1. Copy `.env.example` to `.env` and fill in values (see below).
2. Deploy via Komodo's stack sync.

### Configuration

Place `.env` in the same directory as the compose file. The main settings:

```bash
# Required — the Mac's Tailscale or LAN address
FOREMAN_OLLAMA_URL=http://100.x.x.x:11434

# Optional — bearer token foreman sends to Ollama target
FOREMAN_OLLAMA_TOKEN=

# Optional — bearer token callers must present to foreman
FOREMAN_TOKEN=your-secret-here

# Embedder model (must be pulled on Mac)
FOREMAN_EMBED_MODEL=nomic-embed-text

# Other settings have sensible defaults; see .env.example for the full list.
```

### Persistence

SQLite is persisted in a named Docker volume `foreman_data` mounted at `/data`.
The database file is `/data/foreman.db` (WAL mode, pure-Go driver, no CGO).

## Security model

foreman is **not** exposed on a public Traefik entrypoint:

- It gets Traefik labels for **internal hostname routing** only:
  `foreman.orgrimmar.dudenhoeffer.casa` resolves internally on the LAN/Tailscale.
- It is **not** in any public DNS.
- Accessible via LAN and Tailscale only.

### Authentication

- **Inbound (callers to foreman):** optional static bearer token via
  `FOREMAN_TOKEN`. When set, callers must send `Authorization: Bearer <token>`.
  The `/healthz` endpoint is always unauthenticated.
- **Outbound (foreman to Ollama):** optional bearer token via `FOREMAN_OLLAMA_TOKEN`,
  forwarded to the target on every request.
- **Webhooks:** optional HMAC-SHA256 signing via `FOREMAN_WEBHOOK_SECRET`. When
  set, foreman adds `X-Foreman-Signature: sha256=<hex>` to webhook POSTs.
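The inbound rule above (no token configured means open access, `/healthz` always open, otherwise an exact `Bearer <token>` match) can be expressed as ordinary `net/http` middleware. This is a sketch of the rule, not foreman's actual handler code; `authorized` and `requireBearer` are hypothetical names, and the constant-time comparison is a choice made here. It needs Go 1.20+ for `strings.CutPrefix`.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"strings"
)

// authorized applies the inbound rule to one request.
func authorized(path, header, token string) bool {
	if token == "" || path == "/healthz" {
		return true
	}
	got, ok := strings.CutPrefix(header, "Bearer ")
	if !ok {
		return false
	}
	// Constant-time comparison avoids leaking matching prefixes via timing.
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// requireBearer wraps next and rejects requests that fail the rule.
func requireBearer(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.URL.Path, r.Header.Get("Authorization"), token) {
			w.Header().Set("WWW-Authenticate", "Bearer")
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	fmt.Println(authorized("/jobs", "Bearer s3cret", "s3cret")) // true
	fmt.Println(authorized("/jobs", "", "s3cret"))              // false
	fmt.Println(authorized("/healthz", "", "s3cret"))           // true
}
```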

## go-llm usage

foreman is a drop-in Ollama-compatible target for go-llm:

```go
import "gitea.stevedudenhoeffer.com/steve/go-llm/v2"

// token must match FOREMAN_TOKEN configured on foreman.
model := llm.Foreman("http://foreman.orgrimmar.dudenhoeffer.casa", token).Model("qwen3:30b")
```

This uses the synchronous `/api/chat` passthrough. Streaming, tool calling, and
thinking tokens all work transparently.

For async job submission, use the client package:

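```go
import (
	"context"
	"log"

	"gitea.stevedudenhoeffer.com/steve/foreman/client"
)

ctx := context.Background()
c := client.New("http://foreman.orgrimmar.dudenhoeffer.casa",
    client.WithToken("your-token"), // must match FOREMAN_TOKEN on foreman
)
// messages holds the conversation to send, in the client package's message type.
result, err := c.Submit(ctx, client.SubmitRequest{
    Model:    "qwen3:30b",
    Messages: messages,
})
if err != nil {
    log.Fatal(err)
}
log.Printf("submitted: %+v", result)
```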

## Troubleshooting

### Target unreachable

**Symptom:** `/healthz` returns `{"status":"ok","degraded":true}`, jobs fail with
connection errors.

**Cause:** The Mac is asleep, Ollama is not running, or the network path is broken.

**Fix:**
1. Wake the Mac / start Ollama.
2. Verify connectivity: `curl http://100.x.x.x:11434/api/tags` from orgrimmar.
3. Check Tailscale status: `tailscale status` on both machines.
4. Jobs will automatically retry (up to `FOREMAN_MAX_ATTEMPTS`). The poller
   recovers automatically when the target comes back.
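For scripted monitoring, the `degraded` flag in the `/healthz` body shown above distinguishes "foreman down" from "Ollama target unreachable". A small sketch, assuming the body carries the `status` and `degraded` fields from the symptom above and treating an absent `degraded` as false; `checkHealth` and `parseHealth` are hypothetical helpers. No token is needed because `/healthz` is always unauthenticated.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// health matches the /healthz body, e.g. {"status":"ok","degraded":true}.
type health struct {
	Status   string `json:"status"`
	Degraded bool   `json:"degraded"` // false when the field is absent
}

// parseHealth decodes a /healthz response body.
func parseHealth(body []byte) (health, error) {
	var h health
	err := json.Unmarshal(body, &h)
	return h, err
}

// checkHealth fetches and decodes foreman's /healthz endpoint.
func checkHealth(baseURL string) (health, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/healthz")
	if err != nil {
		return health{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return health{}, err
	}
	return parseHealth(body)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: healthcheck http://foreman.orgrimmar.dudenhoeffer.casa")
		return
	}
	h, err := checkHealth(os.Args[1])
	switch {
	case err != nil:
		fmt.Println("foreman unreachable:", err)
		os.Exit(2)
	case h.Degraded:
		fmt.Println("foreman up, Ollama target unreachable (degraded)")
		os.Exit(1)
	default:
		fmt.Println("healthy")
	}
}
```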

### Model not found (404)

**Symptom:** `/api/chat` or `POST /jobs` returns 404 for a model name.

**Fix:**
1. Verify the model is pulled on the Mac: `ollama list`.
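2. Check the exact name and tag; tags can change between model releases. Ollama
   treats a name without a tag as `:latest`, so `qwen3` refers to `qwen3:latest`,
   not a pulled `qwen3:30b`.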
3. foreman re-polls on a miss; if the model was just pulled, retry after
   `FOREMAN_POLL_INTERVAL` (default 30s).

### HMAC signature mismatch

**Symptom:** Webhook receiver rejects events with signature errors.

**Fix:**
1. Verify `FOREMAN_WEBHOOK_SECRET` matches between foreman and the receiver.
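2. The signature is computed over the exact raw request body. Compute the HMAC
   over those bytes before parsing; re-serialized JSON can differ in whitespace
   or key order and will not match. Compare signatures in constant time.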

### Job stuck in loading/working

**Symptom:** A job stays in a non-terminal state indefinitely.

**Cause:** foreman crashed or restarted mid-job.

**Fix:** foreman resets interrupted jobs (loading/working) to queued on startup.
Restart foreman to recover. Jobs are retried up to `FOREMAN_MAX_ATTEMPTS`.
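The startup recovery amounts to one state transition over the non-terminal states named above. A sketch over in-memory jobs, assuming `done` and `failed` as terminal states and assuming the `FOREMAN_MAX_ATTEMPTS` cap is checked at recovery time; `Job` and `recoverInterrupted` are hypothetical, and foreman itself would apply the equivalent change to its SQLite table.

```go
package main

import "fmt"

// State values: queued/loading/working come from this guide;
// done/failed are assumed terminal states.
type State string

const (
	Queued  State = "queued"
	Loading State = "loading"
	Working State = "working"
	Done    State = "done"
	Failed  State = "failed"
)

// Job is a hypothetical, minimal job record.
type Job struct {
	ID       string
	State    State
	Attempts int
}

// recoverInterrupted runs once at startup: jobs left in loading or working
// by a crash go back to queued, unless they have already used maxAttempts,
// in which case they are marked failed. It returns the number requeued.
func recoverInterrupted(jobs []Job, maxAttempts int) (requeued int) {
	for i := range jobs {
		j := &jobs[i]
		if j.State != Loading && j.State != Working {
			continue
		}
		if j.Attempts >= maxAttempts {
			j.State = Failed
			continue
		}
		j.State = Queued
		requeued++
	}
	return requeued
}

func main() {
	jobs := []Job{
		{ID: "a", State: Working, Attempts: 1},
		{ID: "b", State: Loading, Attempts: 3},
		{ID: "c", State: Done, Attempts: 1},
	}
	n := recoverInterrupted(jobs, 3)
	fmt.Println(n, jobs[0].State, jobs[1].State, jobs[2].State) // prints: 1 queued failed done
}
```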

### SQLite busy/locked errors

**Symptom:** HTTP handlers return 500 with "database is locked".

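**Fix:** The SQLite DSN sets `busy_timeout=5000`, so a writer waits up to 5 seconds
for the lock before failing. WAL mode lets readers run concurrently with the single
writer, so this error usually means several writes contended for longer than the
timeout. If that happens under normal load, increase the timeout.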