foreman/progress.md

# foreman — progress

## Phase 1: Scaffold — 2026-05-23

- Go module initialized (`gitea.stevedudenhoeffer.com/steve/foreman`)
- Project layout: `cmd/foreman/`, `internal/config/`, `internal/store/`, `internal/server/`
- `internal/config`: loads all `FOREMAN_*` env vars with defaults and validation
- `internal/store`: SQLite-backed durable queue (WAL mode, `modernc.org/sqlite`)
  - `jobs` table: ULID PK, model, payload, state machine, retry tracking, timestamps
  - `artifacts` table: named typed blobs per job, unique on (job_id, name)
  - Full CRUD: CreateJob, GetJob, UpdateJobState, ListJobs, CreateArtifact, GetArtifact, GetArtifactsByJob
- `internal/server`: stdlib `net/http` server
  - `GET /healthz` returning `{"status":"ok","degraded":false}`
  - Optional bearer-token auth middleware (skips /healthz)
- `cmd/foreman/main.go`: subcommand dispatch (serve + stubs for submit, jobs, ps)
- CI: `.gitea/workflows/ci.yaml` (build, vet, test -race, tidy check)
- Dockerfile: multi-stage distroless build
- Config files: `.env.example`, `.gitignore`
- Tests: config validation, store CRUD + edge cases, server health + auth middleware
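
The optional bearer-token middleware described above could look roughly like the following. This is a minimal sketch, not foreman's actual code: the names `authorized` and `bearerAuth` are hypothetical, and the constant-time comparison is an assumption about how the token check might be done.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// authorized reports whether a request for path carrying the given
// Authorization header may proceed. An empty token disables auth, and
// /healthz is always allowed. (Hypothetical helper, for illustration.)
func authorized(path, header, token string) bool {
	if token == "" || path == "/healthz" {
		return true
	}
	want := "Bearer " + token
	return subtle.ConstantTimeCompare([]byte(header), []byte(want)) == 1
}

// bearerAuth wraps next and rejects unauthorized requests with 401.
func bearerAuth(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.URL.Path, r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	h := bearerAuth("secret", ok)
	// Neither request carries a token; only /healthz should pass.
	for _, path := range []string{"/healthz", "/api/tags"} {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
		fmt.Println(path, rec.Code)
	}
}
```

Keeping the decision in a pure function makes the skip rule for `/healthz` easy to unit-test without spinning up a server.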

## Phase 2: Ollama target client, model poller, native passthrough — 2026-05-23

- `internal/ollama/` — target client package:
  - Wire types (`types.go`): ChatRequest/Response, EmbedRequest/Response, TagsResponse,
    PsResponse, ModelInfo, RunningModel — matching Ollama's native JSON API exactly.
    Polymorphic fields (think, keep_alive, tools, options) use `json.RawMessage`
    so they are forwarded to the target without being reinterpreted.
  - `Client` interface (`client.go`): Chat (stream/non-stream), Embed, Tags, Ps,
    RawChat, RawEmbed. RawChat/RawEmbed return the raw `*http.Response` so handlers
    can stream the body through without decoding it.
  - `httpClient` implementation: auth token injection, NDJSON streaming via
    `bufio.Scanner` with 4 MB buffer, connection vs HTTP error classification.
  - Custom error types (`errors.go`): `*ConnectionError` for network failures
    (retry-eligible), `*HTTPError` for non-2xx responses. `errors.Is`/`errors.As`
    compatible.
  - `ModelInventory` (`inventory.go`): mutex-protected in-memory cache of installed
    and running models. Methods: Models(), HasModel(), ResidentModels(), LastPoll(),
    Degraded(), Refresh(). Background `Start()` goroutine polls at
    `FOREMAN_POLL_INTERVAL` (default 30s). When the target is unreachable, the
    inventory keeps its last-known state and sets `degraded=true`; the flag clears
    on recovery.
- `internal/server/` — new Ollama passthrough routes:
  - `GET /api/tags` — serves poller's cached model list
  - `GET /api/ps` — serves poller's cached running models
  - `POST /api/embed`, `POST /api/embeddings` — direct concurrent proxy to target,
    bypasses the chat gate entirely (ADR-0013)
  - `POST /api/chat` — critical path: validates model (re-poll on miss, 404 if
    still absent), serializes through a capacity-1 channel gate, proxies to target
    with NDJSON streaming (`application/x-ndjson`, flushed per chunk) or
    non-streaming JSON passthrough
  - `GET /healthz` — now wired to `inventory.Degraded()` for real target status
- `cmd/foreman/main.go` — full serve wiring:
  - Creates Ollama client, starts model poller goroutine, warms embedder
    (`keep_alive: -1`), creates server with all dependencies, signal-based
    graceful shutdown via `context.NotifyContext`
- Tests (all passing with `-race`):
  - Client: tags/ps parsing, chat streaming + non-streaming, embed, auth token
    forwarding, `*ConnectionError` on unreachable target, `*HTTPError` on non-2xx
  - Inventory: refresh populates models, degraded on failure, model retention,
    recovery from degraded, Start/cancel lifecycle
  - Server: tags/ps passthrough, model validation (404 on unknown), non-streaming
    chat proxy, NDJSON streaming passthrough with correct Content-Type, chat
    serialization (gate holds concurrent requests to max 1 in-flight), concurrent
    embed bypass (multiple requests run in parallel), degraded health endpoint,
    embeddings alias path
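
The split between `*ConnectionError` (retry-eligible) and `*HTTPError` can be illustrated with a short sketch. The type names come from the notes above; the field names and the `retryable` helper are assumptions made for this example, not the package's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// ConnectionError wraps a network-level failure such as an unreachable target.
// The Err field is an assumed shape, chosen so Unwrap works with errors.Is/As.
type ConnectionError struct{ Err error }

func (e *ConnectionError) Error() string { return "connection error: " + e.Err.Error() }
func (e *ConnectionError) Unwrap() error { return e.Err }

// HTTPError records a non-2xx response from the target (assumed fields).
type HTTPError struct {
	StatusCode int
	Body       string
}

func (e *HTTPError) Error() string {
	return fmt.Sprintf("http %d: %s", e.StatusCode, e.Body)
}

// retryable reports whether err is worth retrying. Only connection failures
// qualify; errors.As looks through any wrapping added by callers.
func retryable(err error) bool {
	var ce *ConnectionError
	return errors.As(err, &ce)
}

func main() {
	down := fmt.Errorf("chat: %w", &ConnectionError{Err: errors.New("dial tcp: connection refused")})
	bad := fmt.Errorf("chat: %w", &HTTPError{StatusCode: 400, Body: "bad request"})
	fmt.Println(retryable(down), retryable(bad)) // true false
}
```

Because the check uses `errors.As`, callers can wrap these errors with `%w` for context without losing the classification.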

The Mac is now usable as a go-llm target through foreman:
`llm.OllamaCloud(token, WithBaseURL("http://foreman:8080"))` works transparently
for chat (streaming + non-streaming), tags, ps, and embeddings.
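
Streaming chat depends on reading the target's NDJSON output one line at a time. The sketch below shows one way to do that with `bufio.Scanner` and a 4 MB line limit, as mentioned in the client notes. The `chunk` type is a deliberately trimmed, hypothetical view of a streamed chat line, and `readNDJSON` is an illustrative name rather than foreman's own function.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk is a hypothetical, trimmed-down view of one streamed chat line.
type chunk struct {
	Message struct {
		Content string `json:"content"`
	} `json:"message"`
	Done bool `json:"done"`
}

// readNDJSON decodes one JSON object per line and concatenates the content
// fields until a chunk with done=true arrives or the stream ends.
func readNDJSON(r io.Reader) (string, error) {
	sc := bufio.NewScanner(r)
	// Raise the per-line limit from bufio's 64 KB default to 4 MB so a
	// single large chunk does not fail with bufio.ErrTooLong.
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	var sb strings.Builder
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 {
			continue // tolerate blank lines between objects
		}
		var c chunk
		if err := json.Unmarshal(line, &c); err != nil {
			return "", fmt.Errorf("decode chunk: %w", err)
		}
		sb.WriteString(c.Message.Content)
		if c.Done {
			break
		}
	}
	return sb.String(), sc.Err()
}

// readNDJSONString is a convenience wrapper for the demo below.
func readNDJSONString(s string) (string, error) {
	return readNDJSON(strings.NewReader(s))
}

func main() {
	stream := `{"message":{"content":"Hel"},"done":false}
{"message":{"content":"lo"},"done":true}
`
	out, err := readNDJSONString(stream)
	fmt.Println(out, err)
}
```

A passthrough proxy would forward each line and flush instead of accumulating it; the accumulation here only keeps the example observable.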

## Phase 3: Durable queue, single worker, drain-by-model — 2026-05-23

**M0 complete.** The Phase 2 in-flight chat gate (buffered channel) is replaced
with the real SQLite-backed job queue and single worker loop.
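
For reference, the capacity-1 channel gate that this phase retires can be sketched as follows. The `gate` type and its methods are hypothetical names, and the context-aware `acquire` is an assumption about how a waiting request might give up; `maxInFlight` exists only to demonstrate the serialization property.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// gate admits one holder at a time: a buffered channel of capacity 1.
type gate chan struct{}

func newGate() gate { return make(gate, 1) }

// acquire blocks until the slot is free or ctx is cancelled.
func (g gate) acquire(ctx context.Context) error {
	select {
	case g <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// release frees the slot for the next waiter.
func (g gate) release() { <-g }

// maxInFlight pushes n goroutines through one gate and returns the highest
// number that were inside the critical section at the same time.
func maxInFlight(n int) int32 {
	g := newGate()
	var cur, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := g.acquire(context.Background()); err != nil {
				return
			}
			defer g.release()
			c := atomic.AddInt32(&cur, 1)
			// Record a new peak if this goroutine observed one.
			for {
				p := atomic.LoadInt32(&peak)
				if c <= p || atomic.CompareAndSwapInt32(&peak, p, c) {
					break
				}
			}
			atomic.AddInt32(&cur, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(maxInFlight(8)) // 1
}
```

A gate like this serializes requests but keeps no record of them, so waiting work is lost on restart. That limitation is what the durable queue addresses.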

- `internal/store/` — new store methods:
  - `NextJob(currentModel)`: drain-by-model ordering — prefers jobs matching the
    currently-resident model to minimize swap cost, then FIFO by created_at.
  - `IncrementAttempt(id)`: bumps attempt counter and re-queues for retry.
  - `ResetInterruptedJobs()`: resets loading/working jobs to queued on startup
    (crash recovery).
  - `DeleteTerminalJobsBefore(cutoff)`: TTL pruner for old done/failed jobs.
  - SQLite DSN now includes `_pragma=busy_timeout(5000)` for reliable concurrent
    access from HTTP handlers + worker.

- `internal/worker/` — single worker loop (`worker.go`):
  - `Worker.Run(ctx)`: main goroutine loop — resets interrupted jobs on startup,
    then continuously picks the next job using drain-by-model ordering, executes
    via the Ollama client, stores result + completion artifact, notifies waiters.
  - `Worker.Wake()`: non-blocking signal for new job availability.
  - `Notifier`: sync.Map-based completion notification — HTTP handlers register
    a channel per job ID, the worker closes it on completion. Supports
    `Register()`, `Complete()`, `Result()`.
  - Retry semantics: `*ollama.ConnectionError` causes re-queue with incremented
    attempt; `*ollama.HTTPError` is a terminal failure (no retry). Max attempts
    configurable via `FOREMAN_MAX_ATTEMPTS` (default 3).
  - The worker loop is written not to panic: every error is logged, the affected
    job's state is updated, and the loop continues.

- `internal/server/` — chat handler rewrite:
  - `POST /api/chat` now creates a job row (state `queued`), registers a
    completion waiter, wakes the worker, and blocks until the job reaches a
    terminal state. Returns the Ollama response on success, 502 on failure.
  - ULID job IDs generated at submission time (`github.com/oklog/ulid/v2`).
  - The old `chatGate` (buffered channel) is removed entirely.
  - `/api/embed` and `/api/embeddings` remain direct concurrent proxies (unchanged
    from Phase 2, per ADR-0013).

- `internal/config/` — new config fields:
  - `FOREMAN_MAX_ATTEMPTS` (int, default 3)
  - `FOREMAN_JOB_TTL` (duration, default 24h)

- Tests (all passing with `-race`):
  - Worker: single job execution, serial enforcement, drain-by-model ordering,
    retry on connection error, max attempts exhaustion, HTTP error terminal
    failure, interrupted job reset on startup, wake signal, notifier lifecycle.
  - Store: NextJob drain-by-model, empty queue, IncrementAttempt, ResetInterrupted,
    DeleteTerminalJobsBefore.
  - Server: chat model validation (404), non-streaming chat through queue,
    serialization (max 1 concurrent), context cancellation, embed bypass unchanged.
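
The drain-by-model ordering used by `NextJob` can be illustrated in memory. This is a sketch under stated assumptions: the real store presumably expresses the same preference as a SQLite query, and the `job` struct, `nextJob` function, and model names here are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// job is a hypothetical, minimal view of a queued row.
type job struct {
	ID        string
	Model     string
	CreatedAt time.Time
}

// nextJob picks the job to run next: jobs whose model matches currentModel
// come first (no model swap needed), then the oldest by CreatedAt.
// It returns false when the queue is empty.
func nextJob(queued []job, currentModel string) (job, bool) {
	if len(queued) == 0 {
		return job{}, false
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]job(nil), queued...)
	sort.SliceStable(sorted, func(i, j int) bool {
		mi := sorted[i].Model == currentModel
		mj := sorted[j].Model == currentModel
		if mi != mj {
			return mi // resident-model jobs sort ahead of the rest
		}
		return sorted[i].CreatedAt.Before(sorted[j].CreatedAt)
	})
	return sorted[0], true
}

func main() {
	t0 := time.Date(2026, 5, 23, 12, 0, 0, 0, time.UTC)
	queued := []job{
		{ID: "a", Model: "llama", CreatedAt: t0},
		{ID: "b", Model: "qwen", CreatedAt: t0.Add(time.Second)},
		{ID: "c", Model: "qwen", CreatedAt: t0.Add(2 * time.Second)},
	}
	j, _ := nextJob(queued, "qwen")
	fmt.Println(j.ID) // b: oldest job for the resident model
	j, _ = nextJob(queued, "")
	fmt.Println(j.ID) // a: plain FIFO when no model is resident
}
```

One trade-off worth noting: a steady stream of jobs for the resident model can delay jobs for other models indefinitely unless some fairness bound is added.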