foreman/prompts/phase-2.md
2026-05-23 16:51:19 -04:00

phase-2.md — Ollama target client, model poller, native passthrough

Re-ground: CLAUDE.md + ADR-0003 (API surface), 0007 (model polling), 0012 (streaming), 0002 (unreachable = transient). Plan, get approval, implement.

Objective

Make foreman a working transparent front for its Ollama target — enough that go-llm can use the Mac as a target today, before any queue exists. (Phase 3 will move this through the queue; here it can proxy directly.)

Tasks

  • internal/ollama: a small client to the target (FOREMAN_OLLAMA_URL) behind an interface, covering POST /api/chat (streaming and non-streaming), GET /api/tags, GET /api/ps. Attach the outbound bearer if configured. Wrap errors; classify connection failures distinctly (Phase 3 needs that signal).
  • Model poller (goroutine): poll /api/tags every FOREMAN_POLL_INTERVAL (default 30s) into a mutex-guarded in-memory inventory; track last-poll time and a degraded flag. When the target is unreachable, keep the last-known inventory (do not clear it) and set degraded. Wire degraded state into /healthz.
  • Passthrough handlers in internal/server:
    • GET /api/tags and GET /api/ps served from the poller/target.
    • POST /api/chat: validate the requested model against the inventory (one re-poll on miss, then 4xx if still absent); proxy to the target. Support streaming faithfully (stream the target's chunks straight through; set the right content type). For now this may call the target directly — no queue.
  • Tests: a stub HTTP server standing in for Ollama; assert tags/ps proxy, model validation rejects unknown models, streaming passes chunks through, and the poller flips degraded on target failure and recovers.
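
The poller rule above (keep the last-known inventory on failure, set degraded, recover on the next good poll) could be sketched roughly as follows. This is a minimal illustration, not the required design: the `Inventory` type, its method names, and the `errUnreachable` sentinel are all hypothetical, and it assumes "last-poll time" means the time of the last successful poll.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errUnreachable is a hypothetical sentinel standing in for whatever
// classified connection failure the target client returns.
var errUnreachable = errors.New("target unreachable")

// Inventory is a hypothetical mutex-guarded model inventory.
type Inventory struct {
	mu          sync.RWMutex
	models      map[string]struct{}
	lastSuccess time.Time // assumption: "last-poll time" = last successful poll
	degraded    bool
}

// Update records the outcome of one poll of /api/tags.
// On failure it keeps the previous models and only sets the degraded flag.
func (inv *Inventory) Update(models []string, err error) {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if err != nil {
		inv.degraded = true
		return
	}
	next := make(map[string]struct{}, len(models))
	for _, m := range models {
		next[m] = struct{}{}
	}
	inv.models = next
	inv.lastSuccess = time.Now()
	inv.degraded = false
}

// Has reports whether the model was present in the last successful poll.
func (inv *Inventory) Has(model string) bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	_, ok := inv.models[model]
	return ok
}

// Degraded reports whether the most recent poll failed.
func (inv *Inventory) Degraded() bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	return inv.degraded
}

// LastSuccess returns the time of the last successful poll.
func (inv *Inventory) LastSuccess() time.Time {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	return inv.lastSuccess
}

func main() {
	inv := &Inventory{}
	inv.Update([]string{"llama3"}, nil)
	inv.Update(nil, errUnreachable) // target went away
	fmt.Println(inv.Has("llama3"), inv.Degraded())
	inv.Update([]string{"llama3", "qwen2"}, nil) // target came back
	fmt.Println(inv.Has("qwen2"), inv.Degraded())
}
```

A /healthz handler could read `Degraded()` and `LastSuccess()` under the same lock discipline. The real poller goroutine would call `Update` from a ticker loop with the result of the client's tags call.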

Definition of done

  • go build, go vet, and go test -race all pass.
  • Against a real or stubbed Ollama: curl .../api/tags returns the inventory; a non-streaming and a streaming /api/chat both work end-to-end.
  • Acceptance: from a scratch Go program, llm.Ollama(llm.WithBaseURL("http://<foreman>:8080")) (or llm.OllamaCloud(token, llm.WithBaseURL(...)) if a token is set) completes a chat through foreman. Note this in progress.md.
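
For the streaming check against a stubbed Ollama, one possible shape is sketched below using only the standard library. It is an illustration under assumptions: `chatProxy`, `stubOllama`, and `roundTrip` are hypothetical names, the stub's `application/x-ndjson` content type and newline-delimited chunks are choices made for the sketch, and model validation and the outbound bearer are left out.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// chatProxy (hypothetical) forwards the request body to targetURL+"/api/chat"
// and copies the response through, flushing after every read so chunks
// are not held back in a buffer.
func chatProxy(targetURL string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		req, err := http.NewRequestWithContext(r.Context(), http.MethodPost, targetURL+"/api/chat", r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			http.Error(w, "target unreachable", http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		// Mirror the target's content type and status.
		if ct := resp.Header.Get("Content-Type"); ct != "" {
			w.Header().Set("Content-Type", ct)
		}
		w.WriteHeader(resp.StatusCode)
		flusher, _ := w.(http.Flusher)
		buf := make([]byte, 4096)
		for {
			n, rerr := resp.Body.Read(buf)
			if n > 0 {
				if _, werr := w.Write(buf[:n]); werr != nil {
					return // client went away
				}
				if flusher != nil {
					flusher.Flush()
				}
			}
			if rerr != nil {
				return // io.EOF or an upstream read error ends the stream
			}
		}
	})
}

// stubOllama (hypothetical) stands in for the target and emits one line per chunk.
func stubOllama(chunks []string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/x-ndjson")
		for _, c := range chunks {
			io.WriteString(w, c+"\n")
			if f, ok := w.(http.Flusher); ok {
				f.Flush()
			}
		}
	}))
}

// roundTrip posts a chat request through the proxy and returns the body
// and content type the client observed.
func roundTrip(chunks []string) (string, string, error) {
	target := stubOllama(chunks)
	defer target.Close()
	front := httptest.NewServer(chatProxy(target.URL))
	defer front.Close()
	resp, err := http.Post(front.URL+"/api/chat", "application/json",
		strings.NewReader(`{"model":"m","stream":true}`))
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), resp.Header.Get("Content-Type"), err
}

func main() {
	body, ct, err := roundTrip([]string{`{"done":false}`, `{"done":true}`})
	fmt.Println(ct, err)
	fmt.Print(body)
}
```

This sketch only checks that the bytes and content type arrive intact. Asserting that chunks arrive incrementally, rather than all at once, would need the stub to pause between writes while the test reads line by line.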

Wrap up: progress.md, commit on phase-2-passthrough, note what Phase 3 changes (routing this through the queue).