steve 6fd050855a feat: add durable queue, single worker, and drain-by-model scheduling
Replace the Phase 2 in-flight chat gate (buffered channel) with a real
SQLite-backed job queue and single worker loop. Every /api/chat request
now creates a job row, blocks until the worker completes it, and returns
the result transparently.

Key changes:
- internal/store: NextJob (drain-by-model ordering), IncrementAttempt,
  ResetInterruptedJobs, DeleteTerminalJobsBefore; busy_timeout pragma
- internal/worker: single-threaded worker loop with Notifier for sync
  handler completion signaling; retry on ConnectionError, terminal fail
  on HTTPError; crash recovery resets interrupted jobs on startup
- internal/webhook: dispatcher infrastructure for async webhook delivery
- internal/server: chat handler rewritten to enqueue+wait; old chatGate
  removed; embeddings remain direct concurrent proxies (ADR-0013)
- internal/config: FOREMAN_MAX_ATTEMPTS, FOREMAN_JOB_TTL

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:29:32 -04:00

foreman

A small, always-on Go daemon that fronts one Ollama target. It turns a single Ollama instance into a queued, observable job endpoint: it polls the target's installed models, serializes work through the target (managing model swaps), assigns every job an ID, and reports progress via webhooks.

On the wire it speaks native Ollama, so it doubles as a drop-in go-llm target.
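Because foreman speaks native Ollama, a client can talk to it the same way it would talk to Ollama itself. The sketch below is illustrative rather than taken from this repository: it posts a non-streaming request to /api/chat on the default listen address. The model name `llama3.2`, the helper `newChatRequest`, and sending FOREMAN_TOKEN as a bearer header are assumptions; substitute a model your target actually has installed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// chatMessage and chatRequest mirror the body of Ollama's native POST /api/chat.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// newChatRequest (hypothetical helper) builds a non-streaming chat request.
// A non-empty token is sent as a bearer token.
func newChatRequest(baseURL, token, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   false,
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/api/chat"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// "llama3.2" is a placeholder model name, not something foreman requires.
	req, err := newChatRequest("http://localhost:8080", os.Getenv("FOREMAN_TOKEN"), "llama3.2", "Say hello.")
	if err != nil {
		fmt.Fprintln(os.Stderr, "build request:", err)
		return
	}
	// The request may sit in the queue behind other jobs, so allow a generous timeout.
	client := &http.Client{Timeout: 5 * time.Minute}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	defer resp.Body.Close()

	// A non-streaming Ollama chat response carries the reply under "message".
	var out struct {
		Message chatMessage `json:"message"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Fprintln(os.Stderr, "decode response:", err)
		return
	}
	fmt.Println(out.Message.Content)
}
```

The long client timeout is deliberate: a chat request blocks until the single worker reaches it, so latency includes queue time and any model swap.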

Quickstart

# Set the required Ollama target URL
export FOREMAN_OLLAMA_URL=http://mac.tail:11434

# Run directly
go run ./cmd/foreman serve

# Or build and run
go build -o foreman ./cmd/foreman
./foreman serve

Docker

docker build -t foreman .
docker run -e FOREMAN_OLLAMA_URL=http://mac.tail:11434 -p 8080:8080 foreman

Configuration

All configuration is via environment variables, namespaced under FOREMAN_*. See .env.example for the full list.

Variable                Default      Description
FOREMAN_ADDR            :8080        Listen address
FOREMAN_OLLAMA_URL      (required)   Ollama target base URL
FOREMAN_OLLAMA_TOKEN    (empty)      Bearer token sent to the target
FOREMAN_TOKEN           (empty)      Bearer token callers must present
FOREMAN_EMBED_MODEL     (empty)      Always-resident embedder model
FOREMAN_DB_PATH         foreman.db   SQLite database path
FOREMAN_POLL_INTERVAL   30s          Target model poll interval
FOREMAN_WEBHOOK_SECRET  (empty)      HMAC key for webhook signing

Health check

curl http://localhost:8080/healthz
# {"status":"ok","degraded":false}
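
The same check can be scripted from Go, for example as a readiness probe in a supervisor. This is a sketch under assumptions: the response shape is taken from the curl output above, while the `health` type, `parseHealth`, and the choice to treat `degraded: true` as "not ready" are illustrative and not defined by foreman.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// health mirrors the JSON body shown in the curl example.
type health struct {
	Status   string `json:"status"`
	Degraded bool   `json:"degraded"`
}

// parseHealth decodes a /healthz response body.
func parseHealth(body []byte) (health, error) {
	var h health
	err := json.Unmarshal(body, &h)
	return h, err
}

// ready reports whether the daemon looks fully healthy.
// Treating a degraded daemon as not ready is an assumption of this sketch.
func (h health) ready() bool {
	return h.Status == "ok" && !h.Degraded
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://localhost:8080/healthz")
	if err != nil {
		fmt.Fprintln(os.Stderr, "healthz unreachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read body:", err)
		return
	}
	h, err := parseHealth(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "decode body:", err)
		return
	}
	fmt.Printf("status=%s degraded=%t ready=%t\n", h.Status, h.Degraded, h.ready())
}
```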

Architecture

See docs/adr/ for design decisions. Key points:

  • One daemon per Ollama target (ADR-0001)
  • SQLite-backed durable job queue in WAL mode (ADR-0008)
  • Single worker loop with drain-by-model scheduling (ADR-0009)
  • Native Ollama passthrough + async /jobs surface (ADR-0003, ADR-0004)
  • Embeddings bypass the queue entirely (ADR-0013)
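
As a rough illustration of drain-by-model scheduling, the sketch below uses an in-memory queue instead of SQLite. It shows one plausible reading of the idea: keep running the oldest job for the model that is already loaded, and only swap models once no such job remains. The `job` type and `nextJob` function are hypothetical; ADR-0009 and internal/store hold the actual ordering rules.

```go
package main

import "fmt"

// job is a hypothetical, stripped-down queue entry.
type job struct {
	ID    int
	Model string
}

// nextJob returns the index of the job to run next from a queue ordered
// oldest-first, or -1 if the queue is empty. It prefers the oldest job for
// the currently loaded model and falls back to the oldest job overall.
func nextJob(queue []job, loaded string) int {
	for i, j := range queue {
		if j.Model == loaded {
			return i
		}
	}
	if len(queue) == 0 {
		return -1
	}
	return 0
}

func main() {
	// The model names are placeholders.
	queue := []job{{1, "llama"}, {2, "qwen"}, {3, "llama"}, {4, "qwen"}}
	loaded := ""
	for len(queue) > 0 {
		i := nextJob(queue, loaded)
		j := queue[i]
		queue = append(queue[:i], queue[i+1:]...)
		if j.Model != loaded {
			fmt.Printf("swap to %s\n", j.Model)
			loaded = j.Model
		}
		fmt.Printf("run job %d (%s)\n", j.ID, j.Model)
	}
}
```

With this queue the jobs run in the order 1, 3, 2, 4 with two model swaps, where strict first-in-first-out order would need four. A real scheduler would also need some guard against a busy model starving the others, which this sketch leaves out.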
Description
🪓 Small always-on Go daemon that fronts one Ollama target — turns it into a queued, observable job endpoint (model-swap serialization, job IDs, progress webhooks). Speaks native Ollama on the wire, so it's a drop-in target for any Ollama client.
Readme MIT 244 KiB
Languages
Go 99.1%
Shell 0.7%
Dockerfile 0.2%