foreman/prompts/phase-2.md

# phase-2.md — Ollama target client, model poller, native passthrough

Re-ground: `CLAUDE.md` + ADR-0003 (API surface), 0007 (model polling), 0012
(streaming), 0002 (unreachable = transient). Plan, get approval, implement.

## Objective

Make foreman a working transparent front for its Ollama target — enough that
`go-llm` can use the Mac as a target *today*, before any queue exists. (Phase 3
will move this through the queue; here it can proxy directly.)

## Tasks

- `internal/ollama`: a small client to the target (`FOREMAN_OLLAMA_URL`) behind
  an interface, covering `POST /api/chat` (streaming and non-streaming),
  `GET /api/tags`, `GET /api/ps`. Attach the outbound bearer if configured. Wrap
  errors; classify connection failures distinctly (Phase 3 needs that signal).
- Model poller (goroutine): poll `/api/tags` every `FOREMAN_POLL_INTERVAL`
  (default 30s) into an in-memory inventory with a mutex; track last-poll time
  and a degraded flag. If the target is unreachable, keep the last-known
  inventory (do not clear it) and set the degraded flag. Wire the degraded state
  into `/healthz`.
- Passthrough handlers in `internal/server`:
  - `GET /api/tags` and `GET /api/ps` served from the poller/target.
  - `POST /api/chat`: validate the requested model against the inventory (one
    re-poll on miss, then 4xx if still absent); proxy to the target. Support
    streaming faithfully (stream the target's chunks straight through; set the
    right content type). For now this may call the target directly — no queue.
- Tests: a stub HTTP server standing in for Ollama; assert tags/ps proxy,
  model validation rejects unknown models, streaming passes chunks through, and
  the poller flips degraded on target failure and recovers.

## Definition of done

- `go build`, `go vet`, and `go test -race` all pass.
- Against a real or stubbed Ollama: `curl .../api/tags` returns the inventory;
  a non-streaming and a streaming `/api/chat` both work end-to-end.
- Acceptance: from a scratch Go program, `llm.Ollama(llm.WithBaseURL("http://<foreman>:8080"))`
  (or `llm.OllamaCloud(token, llm.WithBaseURL(...))` if a token is set) completes a
  chat through foreman. Record the result in `progress.md`.

Wrap up: `progress.md`, commit on `phase-2-passthrough`, note what Phase 3 changes
(routing this through the queue).