# foreman: progress

## Phase 1: Scaffold (2026-05-23)

- Go module initialized (`gitea.stevedudenhoeffer.com/steve/foreman`)
- Project layout: `cmd/foreman/`, `internal/config/`, `internal/store/`, `internal/server/`
- `internal/config`: loads all `FOREMAN_*` env vars with defaults and validation
- `internal/store`: SQLite-backed durable queue (WAL mode, `modernc.org/sqlite`)
  - `jobs` table: ULID PK, model, payload, state machine, retry tracking, timestamps
  - `artifacts` table: named typed blobs per job, unique on (job_id, name)
  - Full CRUD: CreateJob, GetJob, UpdateJobState, ListJobs, CreateArtifact, GetArtifact, GetArtifactsByJob
- `internal/server`: stdlib `net/http` server
  - `GET /healthz` returning `{"status":"ok","degraded":false}`
  - Optional bearer-token auth middleware (skips /healthz)
- `cmd/foreman/main.go`: subcommand dispatch (serve + stubs for submit, jobs, ps)
- CI: `.gitea/workflows/ci.yaml` (build, vet, test -race, tidy check)
- Dockerfile: multi-stage distroless build
- Config files: `.env.example`, `.gitignore`
- Tests: config validation, store CRUD + edge cases, server health + auth middleware

## Phase 2: Ollama target client, model poller, native passthrough (2026-05-23)

- `internal/ollama/` (target client package):
  - Wire types (`types.go`): ChatRequest/Response, EmbedRequest/Response, TagsResponse, PsResponse, ModelInfo, RunningModel, matching Ollama's native JSON API exactly. Polymorphic fields (think, keep_alive, tools, options) use `json.RawMessage` for transparent passthrough fidelity.
  - `Client` interface (`client.go`): Chat (stream/non-stream), Embed, Tags, Ps, RawChat, RawEmbed. RawChat/RawEmbed return `*http.Response` for zero-copy streaming passthrough.
  - `httpClient` implementation: auth token injection, NDJSON streaming via `bufio.Scanner` with 4 MB buffer, connection vs HTTP error classification.
  - Custom error types (`errors.go`): `*ConnectionError` for network failures (retry-eligible), `*HTTPError` for non-2xx responses.
`errors.Is`/`errors.As` compatible.
  - `ModelInventory` (`inventory.go`): mutex-protected in-memory cache of installed and running models. Methods: Models(), HasModel(), ResidentModels(), LastPoll(), Degraded(), Refresh(). Background `Start()` goroutine polls at `FOREMAN_POLL_INTERVAL` (default 30s). On target unreachable: retains last-known inventory, sets `degraded=true`. Clears degraded on recovery.
- `internal/server/` (new Ollama passthrough routes):
  - `GET /api/tags`: serves poller's cached model list
  - `GET /api/ps`: serves poller's cached running models
  - `POST /api/embed`, `POST /api/embeddings`: direct concurrent proxy to target, bypasses the chat gate entirely (ADR-0013)
  - `POST /api/chat` (critical path): validates model (re-poll on miss, 404 if still absent), serializes through a capacity-1 channel gate, proxies to target with NDJSON streaming (`application/x-ndjson`, flushed per chunk) or non-streaming JSON passthrough
  - `GET /healthz`: now wired to `inventory.Degraded()` for real target status
- `cmd/foreman/main.go` (full serve wiring):
  - Creates Ollama client, starts model poller goroutine, warms embedder (`keep_alive: -1`), creates server with all dependencies, signal-based graceful shutdown via `signal.NotifyContext`
- Tests (all passing with `-race`):
  - Client: tags/ps parsing, chat streaming + non-streaming, embed, auth token forwarding, `*ConnectionError` on unreachable target, `*HTTPError` on non-2xx
  - Inventory: refresh populates models, degraded on failure, model retention, recovery from degraded, Start/cancel lifecycle
  - Server: tags/ps passthrough, model validation (404 on unknown), non-streaming chat proxy, NDJSON streaming passthrough with correct Content-Type, chat serialization (gate holds concurrent requests to max 1 in-flight), concurrent embed bypass (multiple requests run in parallel), degraded health endpoint, embeddings alias path

The Mac is now usable as a go-llm target through foreman: `llm.OllamaCloud(token,
WithBaseURL("http://foreman:8080"))` works transparently for chat (streaming + non-streaming), tags, ps, and embeddings.

## Phase 3: Durable queue, single worker, drain-by-model (2026-05-23)

**M0 complete.** The Phase 2 in-flight chat gate (buffered channel) is replaced with the real SQLite-backed job queue and single worker loop.

- `internal/store/` (new store methods):
  - `NextJob(currentModel)`: drain-by-model ordering; prefers jobs matching the currently-resident model to minimize swap cost, then FIFO by created_at.
  - `IncrementAttempt(id)`: bumps attempt counter and re-queues for retry.
  - `ResetInterruptedJobs()`: resets loading/working jobs to queued on startup (crash recovery).
  - `DeleteTerminalJobsBefore(cutoff)`: TTL pruner for old done/failed jobs.
  - SQLite DSN now includes `_pragma=busy_timeout(5000)` for reliable concurrent access from HTTP handlers + worker.
- `internal/worker/` (single worker loop, `worker.go`):
  - `Worker.Run(ctx)`: main goroutine loop. Resets interrupted jobs on startup, then continuously picks the next job using drain-by-model ordering, executes via the Ollama client, stores result + completion artifact, notifies waiters.
  - `Worker.Wake()`: non-blocking signal for new job availability.
  - `Notifier`: sync.Map-based completion notification. HTTP handlers register a channel per job ID, the worker closes it on completion. Supports `Register()`, `Complete()`, `Result()`.
  - Retry semantics: `*ollama.ConnectionError` causes re-queue with incremented attempt; `*ollama.HTTPError` is a terminal failure (no retry). Max attempts configurable via `FOREMAN_MAX_ATTEMPTS` (default 3).
  - The worker loop never panics; all errors are logged, jobs are marked, loop continues.
- `internal/server/` (chat handler rewrite):
  - `POST /api/chat` now creates a job row (state `queued`), registers a completion waiter, wakes the worker, and blocks until the job reaches a terminal state. Returns the Ollama response on success, 502 on failure.
  - ULID job IDs generated at submission time (`github.com/oklog/ulid/v2`).
  - The old `chatGate` (buffered channel) is removed entirely.
  - `/api/embed` and `/api/embeddings` remain direct concurrent proxies (unchanged from Phase 2, per ADR-0013).
- `internal/config/` (new config fields):
  - `FOREMAN_MAX_ATTEMPTS` (int, default 3)
  - `FOREMAN_JOB_TTL` (duration, default 24h)
- Tests (all passing with `-race`):
  - Worker: single job execution, serial enforcement, drain-by-model ordering, retry on connection error, max attempts exhaustion, HTTP error terminal failure, interrupted job reset on startup, wake signal, notifier lifecycle.
  - Store: NextJob drain-by-model, empty queue, IncrementAttempt, ResetInterrupted, DeleteTerminalJobsBefore.
  - Server: chat model validation (404), non-streaming chat through queue, serialization (max 1 concurrent), context cancellation, embed bypass unchanged.

## Phase 4: Async /jobs surface, webhooks, artifacts (2026-05-23)

**M1 core complete** (minus CLI and go-llm constructor, which are separate work).

- `internal/webhook/` (webhook dispatcher):
  - `Dispatcher.Fire(url, event)`: non-blocking goroutine delivery with exponential backoff retry (1s, 2s, 4s, 8s, 16s; max 5 attempts).
  - Optional HMAC-SHA256 signing via `FOREMAN_WEBHOOK_SECRET`; sets `X-Foreman-Signature: sha256=` header.
  - `VerifySignature()`: exported for webhook receivers.
  - `FormatArtifacts()`: inline (data field) for artifacts <= 256KB, URL reference for larger ones.
  - Webhook failures are logged and dropped; they never block or fail the job (ADR-0005).
- `internal/server/` (new routes):
  - `POST /jobs`: validates model, creates job row with optional `state_webhook_url`, returns `202 Accepted` with `{"job_id":""}`. Fires initial "queued" webhook. Wakes worker.
  - `GET /jobs/{id}`: returns full job state, result, error, and artifact metadata. 404 for unknown IDs. Artifacts under 256KB are inlined; larger ones get a URL reference.
  - `GET /jobs/{id}/artifacts/{name}`: serves raw artifact data with stored content type. 404 for unknown job/artifact.
- `docs/adr/0014-no-webhooks-on-sync-chat.md`:
  - `state_webhook_url` is only honored on `POST /jobs`. Sync `/api/chat` does not fire webhooks (ADR-0014). Rationale: the caller already holds a blocking HTTP connection.
- `cmd/foreman/main.go` (full serve wiring):
  - Creates webhook dispatcher, notifier, worker.
  - Starts worker loop goroutine and TTL pruner goroutine.
  - TTL pruner runs every `jobTTL/4` (min 1 minute), deletes terminal jobs older than `FOREMAN_JOB_TTL` (default 24h).
  - Server constructor now receives notifier, worker, and dispatcher.
- Tests (all passing with `-race`):
  - Jobs API: 202 on submit, ULID format, 404 for unknown model, 400 for missing model, 404 for unknown job, job state after completion, artifact retrieval, artifact 404.
  - Webhooks: full lifecycle events (queued->working->done), 500-returning receiver does not affect job state, HMAC signature verification.
  - Webhook dispatcher: delivery, retry on 500, non-blocking Fire, HMAC signing, no HMAC when no secret, signature format validation.
  - Artifacts: small inline, large by URL, empty returns nil.
  - TTL pruner: deletes old terminal jobs.
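
For webhook receivers, the Phase 4 HMAC-SHA256 signing can be illustrated with a minimal sketch. It is not foreman's actual code: `signBody` and `verifyBody` are hypothetical names, the real `VerifySignature()` signature is not recorded in this log, and the hex encoding of the digest after the `sha256=` prefix is an assumption. The secret and payload are made-up sample values.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sigPrefix is the header value prefix recorded in the progress log.
const sigPrefix = "sha256="

// signBody (hypothetical helper) computes HMAC-SHA256 over the raw request
// body and returns a value suitable for an X-Foreman-Signature header.
// Assumption: the digest is hex-encoded after the prefix.
func signBody(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return sigPrefix + hex.EncodeToString(mac.Sum(nil))
}

// verifyBody (hypothetical helper) recomputes the MAC and compares it with
// the received header value using a constant-time comparison.
func verifyBody(secret, body []byte, header string) bool {
	if !strings.HasPrefix(header, sigPrefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, sigPrefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")                    // sample value only
	body := []byte(`{"job_id":"example","state":"done"}`) // sample payload only

	header := signBody(secret, body)
	fmt.Println(verifyBody(secret, body, header))               // true
	fmt.Println(verifyBody(secret, []byte("tampered"), header)) // false
}
```

A receiver would read the raw request body before any JSON decoding and pass those exact bytes to the verification step, since re-serialized JSON may not match the signed bytes.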