feat: add Ollama target client, model poller, and native passthrough
Phase 2 of foreman: the daemon now acts as a transparent Ollama proxy.

- internal/ollama: Client interface and HTTP implementation for chat (streaming + non-streaming), embed, tags, and ps, with auth forwarding, NDJSON streaming via bufio.Scanner, and connection vs HTTP error classification via custom error types.
- internal/ollama: ModelInventory with a background poller for /api/tags and /api/ps, degraded mode when the target is unreachable (last-known models retained), and automatic recovery on reconnect.
- internal/server: Passthrough routes (/api/chat, /api/tags, /api/ps, /api/embed, /api/embeddings) with model validation, a chat serialization gate (capacity-1 channel), concurrent embedding bypass (ADR-0013), NDJSON streaming with per-chunk flush, and degraded health reporting.
- cmd/foreman: Full serve wiring with the Ollama client, poller goroutine, embedder warmup (keep_alive: -1), and signal-based shutdown.

The Mac is now usable as a go-llm target through foreman.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Dockerfile: multi-stage distroless build
- Config files: `.env.example`, `.gitignore`
- Tests: config validation, store CRUD + edge cases, server health + auth middleware

## Phase 2: Ollama target client, model poller, native passthrough — 2026-05-23
- `internal/ollama/`: target client package.
  - Wire types (`types.go`): ChatRequest/Response, EmbedRequest/Response, TagsResponse,
    PsResponse, ModelInfo, RunningModel, matching Ollama's native JSON API exactly.
    Polymorphic fields (think, keep_alive, tools, options) use `json.RawMessage`
    for transparent passthrough fidelity.
  - `Client` interface (`client.go`): Chat (stream/non-stream), Embed, Tags, Ps,
    RawChat, RawEmbed. RawChat/RawEmbed return `*http.Response` for zero-copy
    streaming passthrough.
  - `httpClient` implementation: auth token injection, NDJSON streaming via
    `bufio.Scanner` with a 4 MB buffer, connection vs HTTP error classification.
  - Custom error types (`errors.go`): `*ConnectionError` for network failures
    (retry-eligible), `*HTTPError` for non-2xx responses. Compatible with
    `errors.Is`/`errors.As`.
  - `ModelInventory` (`inventory.go`): mutex-protected in-memory cache of installed
    and running models. Methods: Models(), HasModel(), ResidentModels(), LastPoll(),
    Degraded(), Refresh(). A background `Start()` goroutine polls at
    `FOREMAN_POLL_INTERVAL` (default 30s). When the target is unreachable, it retains
    the last-known inventory and sets `degraded=true`; it clears the flag on recovery.
- `internal/server/`: new Ollama passthrough routes.
  - `GET /api/tags`: serves the poller's cached model list
  - `GET /api/ps`: serves the poller's cached running models
  - `POST /api/embed`, `POST /api/embeddings`: direct concurrent proxy to the target,
    bypassing the chat gate entirely (ADR-0013)
  - `POST /api/chat`: the critical path. Validates the model (re-polls on a miss,
    returns 404 if still absent), serializes through a capacity-1 channel gate, and
    proxies to the target with NDJSON streaming (`application/x-ndjson`, flushed per
    chunk) or non-streaming JSON passthrough
  - `GET /healthz`: now wired to `inventory.Degraded()` for real target status
- `cmd/foreman/main.go`: full serve wiring.
  - Creates the Ollama client, starts the model poller goroutine, warms the embedder
    (`keep_alive: -1`), creates the server with all dependencies, and shuts down
    gracefully on signals via `signal.NotifyContext`
- Tests (all passing with `-race`):
  - Client: tags/ps parsing, chat streaming + non-streaming, embed, auth token
    forwarding, `*ConnectionError` on unreachable target, `*HTTPError` on non-2xx
  - Inventory: refresh populates models, degraded on failure, model retention,
    recovery from degraded, Start/cancel lifecycle
  - Server: tags/ps passthrough, model validation (404 on unknown), non-streaming
    chat proxy, NDJSON streaming passthrough with correct Content-Type, chat
    serialization (the gate keeps at most one request in flight), concurrent
    embed bypass (multiple requests run in parallel), degraded health endpoint,
    embeddings alias path
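
The wire-types bullet relies on `json.RawMessage` for polymorphic fields. The sketch below shows why this matters for a proxy. It is a minimal sketch that assumes a trimmed-down, hypothetical `ChatRequest`, not foreman's actual definition. `keep_alive` can be a number (`-1`, as used by the embedder warmup) or a duration string (`"5m"`), and both survive a decode/encode round trip unchanged. A typed Go field would force one representation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChatRequest is a hypothetical, trimmed-down wire type. Polymorphic
// fields stay as json.RawMessage so the caller's exact JSON is forwarded.
type ChatRequest struct {
	Model     string          `json:"model"`
	Stream    *bool           `json:"stream,omitempty"` // pointer keeps "absent" distinct from false
	Think     json.RawMessage `json:"think,omitempty"`
	KeepAlive json.RawMessage `json:"keep_alive,omitempty"`
	Options   json.RawMessage `json:"options,omitempty"`
}

// roundTrip decodes a request body and re-encodes it, as a proxy would
// before forwarding it to the target.
func roundTrip(body string) (string, error) {
	var req ChatRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return "", err
	}
	out, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	for _, body := range []string{
		`{"model":"llama3","keep_alive":-1}`,
		`{"model":"llama3","keep_alive":"5m","think":true}`,
	} {
		out, err := roundTrip(body)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

Re-encoding follows struct field order, so `think` moves ahead of `keep_alive`. Each field's value is preserved byte-for-byte.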

The Mac is now usable as a go-llm target through foreman:
`llm.OllamaCloud(token, WithBaseURL("http://foreman:8080"))` works transparently
for chat (streaming + non-streaming), tags, ps, and embeddings.