# phase-2.md — Ollama target client, model poller, native passthrough

Re-ground in `CLAUDE.md` and ADR-0003 (API surface), ADR-0007 (model polling),
ADR-0012 (streaming), and ADR-0002 (unreachable = transient). Plan, get approval,
then implement.

## Objective

Make foreman a working, transparent front for its Ollama target: enough that
`go-llm` can use the Mac as a target *today*, before any queue exists. (Phase 3
will route this through the queue; here, foreman may proxy directly.)

## Tasks

- `internal/ollama`: a small client for the target (`FOREMAN_OLLAMA_URL`) behind
  an interface, covering `POST /api/chat` (streaming and non-streaming),
  `GET /api/tags`, and `GET /api/ps`. Attach the outbound bearer token if one is
  configured. Wrap errors, and classify connection failures distinctly from
  error responses (Phase 3 needs that signal).
- Model poller (goroutine): poll `/api/tags` every `FOREMAN_POLL_INTERVAL`
  (default 30s) into a mutex-guarded in-memory inventory; track the last-poll
  time and a degraded flag. If the target is unreachable, keep the last-known
  inventory and set degraded; do not clear the inventory. Wire the degraded
  state into `/healthz`.
- Passthrough handlers in `internal/server`:
  - `GET /api/tags` and `GET /api/ps`, served from the poller or the target.
  - `POST /api/chat`: validate the requested model against the inventory (on a
    miss, re-poll once, then return 4xx if the model is still absent), and proxy
    to the target. Support streaming faithfully: pass the target's chunks
    straight through and mirror its content type (`application/x-ndjson` for
    Ollama streams). For now this may call the target directly, with no queue.
- Tests: a stub HTTP server standing in for Ollama; assert that tags/ps are
  proxied, that model validation rejects unknown models, that streaming passes
  chunks through, and that the poller flips to degraded on target failure and
  recovers.

## Definition of done

- `go build`, `go vet`, and `go test -race` all green.
- Against a real or stubbed Ollama: `curl .../api/tags` returns the inventory,
  and both a non-streaming and a streaming `/api/chat` work end-to-end.
- Acceptance: from a scratch Go program, `llm.Ollama(llm.WithBaseURL("http://<foreman>:8080"))`
  (or `llm.OllamaCloud(token, llm.WithBaseURL(...))` if a token is set) completes a
  chat through foreman. Note this in `progress.md`.

Wrap up: update `progress.md`, commit on `phase-2-passthrough`, and note what
Phase 3 changes (routing this through the queue).