add initial prompts

2026-05-23 16:51:19 -04:00
parent 8fde024281
commit d5702f7a75
7 changed files with 383 additions and 0 deletions
@@ -0,0 +1,42 @@
+# phase-2.md — Ollama target client, model poller, native passthrough
+
+Re-ground: `CLAUDE.md` plus ADR-0003 (API surface), ADR-0007 (model polling),
+ADR-0012 (streaming), ADR-0002 (unreachable = transient). Plan, get approval, implement.
+
+## Objective
+
+Make foreman a working transparent front for its Ollama target — enough that
+`go-llm` can use the Mac as a target *today*, before any queue exists. (Phase 3
+will move this through the queue; here it can proxy directly.)
+
+## Tasks
+
+- `internal/ollama`: a small client to the target (`FOREMAN_OLLAMA_URL`) behind
+  an interface, covering `POST /api/chat` (streaming and non-streaming),
+  `GET /api/tags`, `GET /api/ps`. Attach the outbound bearer if configured. Wrap
+  errors; classify connection failures distinctly (Phase 3 needs that signal).
+- Model poller (goroutine): poll `/api/tags` every `FOREMAN_POLL_INTERVAL`
+  (default 30s) into an in-memory inventory with a mutex; track last-poll time
+  and a degraded flag. If the target is unreachable, keep the last-known inventory
+  (do not clear it) and set the degraded flag. Surface that state in `/healthz`.
+- Passthrough handlers in `internal/server`:
+  - `GET /api/tags` and `GET /api/ps` served from the poller/target.
+  - `POST /api/chat`: validate the requested model against the inventory (one
+    re-poll on miss, then 4xx if still absent); proxy to the target. Support
+    streaming faithfully (stream the target's chunks straight through; set the
+    right content type). For now this may call the target directly — no queue.
+- Tests: a stub HTTP server standing in for Ollama; assert tags/ps proxy,
+  model validation rejects unknown models, streaming passes chunks through, and
+  the poller flips degraded on target failure and recovers.
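
The poller task above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `Inventory`, `TagsFetcher`, `PollOnce`, `Run`, and `errUnreachable` are hypothetical names, the fetcher stands in for the real `GET /api/tags` call, and recording the poll time on every attempt (rather than only on success) is one possible reading of "last-poll time".

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// errUnreachable is a hypothetical sentinel for "could not connect to the target".
var errUnreachable = errors.New("target unreachable")

// TagsFetcher is a hypothetical seam over the target's GET /api/tags call.
type TagsFetcher func() ([]string, error)

// Inventory holds the last-known model list plus poll health.
type Inventory struct {
	mu       sync.RWMutex
	models   []string
	lastPoll time.Time // time of the most recent attempt (assumption)
	degraded bool
}

// PollOnce refreshes the inventory. On failure it keeps the previous
// model list untouched and only raises the degraded flag.
func (inv *Inventory) PollOnce(fetch TagsFetcher) error {
	models, err := fetch()
	inv.mu.Lock()
	defer inv.mu.Unlock()
	inv.lastPoll = time.Now()
	if err != nil {
		inv.degraded = true
		return err
	}
	inv.models = models
	inv.degraded = false
	return nil
}

// Has reports whether the named model is in the last-known inventory.
func (inv *Inventory) Has(model string) bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	for _, m := range inv.models {
		if m == model {
			return true
		}
	}
	return false
}

// Degraded reports whether the most recent poll failed.
func (inv *Inventory) Degraded() bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	return inv.degraded
}

// Run polls immediately, then once per interval, until ctx is cancelled.
func (inv *Inventory) Run(ctx context.Context, every time.Duration, fetch TagsFetcher) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		_ = inv.PollOnce(fetch) // the error is already reflected in the degraded flag
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	inv := &Inventory{}
	up := func() ([]string, error) { return []string{"llama3"}, nil }
	down := func() ([]string, error) { return nil, errUnreachable }
	// Healthy poll, failed poll, recovery.
	for _, fetch := range []TagsFetcher{up, down, up} {
		err := inv.PollOnce(fetch)
		fmt.Println(inv.Has("llama3"), inv.Degraded(), err)
	}
}
```

Keeping `PollOnce` separate from the `Run` loop means the "one re-poll on miss" step in the chat handler and the tests can both call it directly, without waiting on a ticker.
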
+
+## Definition of done
+
+- `go build`, `go vet`, and `go test -race` are all green.
+- Against a real or stubbed Ollama: `curl .../api/tags` returns the inventory;
+  a non-streaming and a streaming `/api/chat` both work end-to-end.
+- Acceptance: from a scratch Go program, `llm.Ollama(llm.WithBaseURL("http://<foreman>:8080"))`
+  (or `llm.OllamaCloud(token, llm.WithBaseURL(...))` if a token is set) completes a
+  chat through foreman. Record the result in `progress.md`.
+
+Wrap up: update `progress.md`, commit on `phase-2-passthrough`, and note what
+Phase 3 changes (routing this through the queue).
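
The streaming passthrough described in the tasks could look something like the sketch below, which also doubles as the "stub HTTP server standing in for Ollama" test shape. Everything here is an assumption for illustration: `chatProxy`, `stubOllama`, and `roundTrip` are hypothetical names, the stub's JSON chunks are simplified, and `application/x-ndjson` is used as the stub's content type on the understanding that Ollama streams newline-delimited JSON. The proxy itself just mirrors whatever content type the target sends. Model validation and the outbound bearer are left out.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// chatProxy forwards POST /api/chat to targetURL and streams the
// target's response back to the caller chunk by chunk.
func chatProxy(targetURL string, client *http.Client) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		req, err := http.NewRequestWithContext(r.Context(), http.MethodPost, targetURL+"/api/chat", r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		req.Header.Set("Content-Type", r.Header.Get("Content-Type"))
		resp, err := client.Do(req)
		if err != nil {
			// Connection-level failure: the target could not be reached.
			http.Error(w, "target unreachable: "+err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		// Mirror the target's content type and status before streaming.
		if ct := resp.Header.Get("Content-Type"); ct != "" {
			w.Header().Set("Content-Type", ct)
		}
		w.WriteHeader(resp.StatusCode)
		flusher, _ := w.(http.Flusher)
		buf := make([]byte, 4096)
		for {
			n, readErr := resp.Body.Read(buf)
			if n > 0 {
				if _, err := w.Write(buf[:n]); err != nil {
					return // the client went away
				}
				if flusher != nil {
					flusher.Flush() // push each chunk out immediately
				}
			}
			if readErr != nil {
				return // io.EOF or an upstream error; the stream is over either way
			}
		}
	})
}

// stubOllama stands in for the target and emits two simplified NDJSON chunks.
func stubOllama() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/x-ndjson")
		fmt.Fprintln(w, `{"message":{"content":"Hel"},"done":false}`)
		w.(http.Flusher).Flush()
		fmt.Fprintln(w, `{"message":{"content":"lo"},"done":true}`)
	}))
}

// roundTrip sends one chat request through the proxy and returns
// the content type and body the client observed.
func roundTrip() (contentType, body string, err error) {
	target := stubOllama()
	defer target.Close()
	front := httptest.NewServer(chatProxy(target.URL, target.Client()))
	defer front.Close()

	resp, err := http.Post(front.URL+"/api/chat", "application/json",
		strings.NewReader(`{"model":"m","stream":true}`))
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return resp.Header.Get("Content-Type"), string(b), err
}

func main() {
	ct, body, err := roundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Println(ct)
	fmt.Print(body)
}
```

Flushing after every write is what keeps the stream faithful: without it, the server may buffer several chunks and deliver them together, which still passes a body-equality test but changes what an interactive client sees.
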