

steve 27f196d333 feat: add Ollama target client, model poller, and native passthrough

Phase 2 of foreman: the daemon now acts as a transparent Ollama proxy.

- internal/ollama: Client interface and HTTP implementation for chat
  (streaming + non-streaming), embed, tags, ps with auth forwarding,
  NDJSON streaming via bufio.Scanner, and connection vs HTTP error
  classification via custom error types.
- internal/ollama: ModelInventory with background poller for /api/tags
  and /api/ps, degraded mode on target unreachable with model retention,
  automatic recovery on reconnect.
- internal/server: Passthrough routes (/api/chat, /api/tags, /api/ps,
  /api/embed, /api/embeddings) with model validation, chat serialization
  gate (capacity-1 channel), concurrent embedding bypass (ADR-0013),
  NDJSON streaming with per-chunk flush, and degraded health reporting.
- cmd/foreman: Full serve wiring with Ollama client, poller goroutine,
  embedder warmup (keep_alive:-1), and signal-based shutdown.

The Mac is now usable as a go-llm target through foreman.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 18:07:33 -04:00



foreman — progress

Phase 1: Scaffold — 2026-05-23

Go module initialized (gitea.stevedudenhoeffer.com/steve/foreman)
Project layout: cmd/foreman/, internal/config/, internal/store/, internal/server/
internal/config: loads all FOREMAN_* env vars with defaults and validation
internal/store: SQLite-backed durable queue (WAL mode, modernc.org/sqlite)
- jobs table: ULID PK, model, payload, state machine, retry tracking, timestamps
- artifacts table: named typed blobs per job, unique on (job_id, name)
- Full CRUD: CreateJob, GetJob, UpdateJobState, ListJobs, CreateArtifact, GetArtifact, GetArtifactsByJob
internal/server: stdlib net/http server
- GET /healthz returning {"status":"ok","degraded":false}
- Optional bearer-token auth middleware (skips /healthz)
cmd/foreman/main.go: subcommand dispatch (serve + stubs for submit, jobs, ps)
CI: .gitea/workflows/ci.yaml (build, vet, test -race, tidy check)
Dockerfile: multi-stage distroless build
Config files: .env.example, .gitignore
Tests: config validation, store CRUD + edge cases, server health + auth middleware
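The config loader is only described in outline. A minimal Go sketch of one env var with a default and validation might look like the following. The `Config` shape, the `lookupFunc` type, and the positive-duration rule are assumptions; only `FOREMAN_POLL_INTERVAL` and its 30s default come from these notes.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Config is a hypothetical one-field subset of foreman's settings.
type Config struct {
	PollInterval time.Duration
}

// lookupFunc matches os.LookupEnv so tests can inject a fake environment.
type lookupFunc func(key string) (string, bool)

// durationFromEnv returns def when key is unset or empty; otherwise it parses
// the value and rejects non-positive durations (an assumed validation rule).
func durationFromEnv(lookup lookupFunc, key string, def time.Duration) (time.Duration, error) {
	v, ok := lookup(key)
	if !ok || v == "" {
		return def, nil
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("%s: must be positive, got %s", key, d)
	}
	return d, nil
}

// Load builds a Config; FOREMAN_POLL_INTERVAL defaults to 30s per the notes.
func Load(lookup lookupFunc) (Config, error) {
	poll, err := durationFromEnv(lookup, "FOREMAN_POLL_INTERVAL", 30*time.Second)
	if err != nil {
		return Config{}, err
	}
	return Config{PollInterval: poll}, nil
}

func main() {
	cfg, err := Load(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config:", err)
		os.Exit(1)
	}
	fmt.Println("poll interval:", cfg.PollInterval)
}
```

Taking the lookup function as a parameter, rather than calling `os.Getenv` directly, keeps validation testable without mutating the process environment.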
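The jobs table is said to carry a state machine with retry tracking, but the states are not named here. As a hypothetical illustration, assuming four states (`queued`, `running`, `succeeded`, `failed`) and a per-job retry budget, a transition guard that could sit in front of a write such as `UpdateJobState` might look like this:

```go
package main

import "fmt"

// State is a hypothetical job state; the real state names are not listed.
type State string

const (
	Queued    State = "queued"
	Running   State = "running"
	Succeeded State = "succeeded"
	Failed    State = "failed"
)

// allowed lists legal transitions; terminal states have no outgoing edges.
var allowed = map[State][]State{
	Queued:  {Running},
	Running: {Succeeded, Failed, Queued}, // Running -> Queued models a retry
}

// Job carries only the fields this sketch needs.
type Job struct {
	State    State
	Attempts int
	MaxTries int
}

// Transition moves j to next if the edge is legal. Entering Running counts an
// attempt; a retry (Running -> Queued) is refused once the budget is spent.
func (j *Job) Transition(next State) error {
	for _, s := range allowed[j.State] {
		if s != next {
			continue
		}
		if j.State == Running && next == Queued && j.Attempts >= j.MaxTries {
			return fmt.Errorf("retry budget exhausted after %d attempts", j.Attempts)
		}
		if next == Running {
			j.Attempts++
		}
		j.State = next
		return nil
	}
	return fmt.Errorf("illegal transition %s -> %s", j.State, next)
}

func main() {
	j := &Job{State: Queued, MaxTries: 2}
	_ = j.Transition(Running) // attempt 1
	_ = j.Transition(Queued)  // retry
	_ = j.Transition(Running) // attempt 2
	fmt.Println(j.Transition(Queued)) // retry budget exhausted after 2 attempts
}
```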
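The optional bearer-token middleware that skips /healthz could be sketched with only the standard library as follows. The helper names, the `Bearer <token>` header format, and the rule that an empty token disables auth are assumptions about how "optional" is implemented:

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireBearer wraps next so every path except /healthz needs
// "Authorization: Bearer <token>". An empty token disables the check.
func requireBearer(token string, next http.Handler) http.Handler {
	if token == "" {
		return next
	}
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/healthz" {
			next.ServeHTTP(w, r) // health checks stay unauthenticated
			return
		}
		got := []byte(r.Header.Get("Authorization"))
		// Constant-time comparison avoids leaking the token through timing.
		if subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler always answers 200; it stands in for the real routes.
func okHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
}

// status sends one GET through h and returns the response code.
func status(h http.Handler, path, auth string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if auth != "" {
		req.Header.Set("Authorization", auth)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireBearer("s3cret", okHandler())
	fmt.Println(status(h, "/healthz", ""))               // 200
	fmt.Println(status(h, "/api/tags", ""))              // 401
	fmt.Println(status(h, "/api/tags", "Bearer s3cret")) // 200
}
```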

Phase 2: Ollama target client, model poller, native passthrough — 2026-05-23

internal/ollama/ — target client package:
- Wire types (types.go): ChatRequest/Response, EmbedRequest/Response, TagsResponse, PsResponse, ModelInfo, RunningModel — matching Ollama's native JSON API exactly. Polymorphic fields (think, keep_alive, tools, options) use json.RawMessage for transparent passthrough fidelity.
- Client interface (client.go): Chat (stream/non-stream), Embed, Tags, Ps, RawChat, RawEmbed. RawChat/RawEmbed return the raw *http.Response so the server can stream the body straight through without decoding and re-encoding it.
- httpClient implementation: auth token injection, NDJSON streaming via bufio.Scanner with 4 MB buffer, connection vs HTTP error classification.
- Custom error types (errors.go): *ConnectionError for network failures (retry-eligible), *HTTPError for non-2xx responses. errors.Is/errors.As compatible.
- ModelInventory (inventory.go): mutex-protected in-memory cache of installed and running models. Methods: Models(), HasModel(), ResidentModels(), LastPoll(), Degraded(), Refresh(). A background Start() goroutine polls at FOREMAN_POLL_INTERVAL (default 30s). When the target is unreachable, the inventory retains the last-known models and sets degraded=true; degraded is cleared once the target recovers.
internal/server/ — new Ollama passthrough routes:
- GET /api/tags — serves poller's cached model list
- GET /api/ps — serves poller's cached running models
- POST /api/embed, POST /api/embeddings — direct concurrent proxy to target, bypasses the chat gate entirely (ADR-0013)
- POST /api/chat — critical path: validates the model (re-polls on a cache miss and returns 404 if it is still absent), serializes requests through a capacity-1 channel gate, and proxies to the target with either NDJSON streaming (application/x-ndjson, flushed per chunk) or non-streaming JSON passthrough
- GET /healthz — now wired to inventory.Degraded() for real target status
cmd/foreman/main.go — full serve wiring:
- Creates the Ollama client, starts the model poller goroutine, warms the embedder (keep_alive: -1), creates the server with all dependencies, and shuts down gracefully on signal via signal.NotifyContext (os/signal)
Tests (all passing with -race):
- Client: tags/ps parsing, chat streaming + non-streaming, embed, auth token forwarding, *ConnectionError on unreachable target, *HTTPError on non-2xx
- Inventory: refresh populates models, degraded on failure, model retention, recovery from degraded, Start/cancel lifecycle
- Server: tags/ps passthrough, model validation (404 on unknown), non-streaming chat proxy, NDJSON streaming passthrough with correct Content-Type, chat serialization (gate holds concurrent requests to max 1 in-flight), concurrent embed bypass (multiple requests run in parallel), degraded health endpoint, embeddings alias path
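Keeping polymorphic fields as `json.RawMessage` means a value such as `keep_alive` (a number like `-1` or a duration string like `"5m"`) or `think` (a boolean or, for some models, a level string) survives a decode/encode round trip unchanged. A trimmed, hypothetical `ChatRequest` shows the effect; the real struct in types.go has more fields:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChatRequest is a trimmed, hypothetical subset of the wire type. Polymorphic
// fields stay as json.RawMessage, so whatever JSON the caller sent is re-emitted.
type ChatRequest struct {
	Model     string          `json:"model"`
	Messages  json.RawMessage `json:"messages"`
	Stream    *bool           `json:"stream,omitempty"` // nil means "not sent"
	Think     json.RawMessage `json:"think,omitempty"`
	KeepAlive json.RawMessage `json:"keep_alive,omitempty"`
	Options   json.RawMessage `json:"options,omitempty"`
}

// roundTrip decodes a request body and encodes it again.
func roundTrip(in string) (string, error) {
	var req ChatRequest
	if err := json.Unmarshal([]byte(in), &req); err != nil {
		return "", err
	}
	out, err := json.Marshal(req)
	return string(out), err
}

func main() {
	out, err := roundTrip(`{"model":"m","messages":[],"keep_alive":-1,"think":"high"}`)
	fmt.Println(out, err)
}
```

A typed field (say `KeepAlive int`) would reject `"5m"` outright, and a `bool` would reject `"high"`; raw messages sidestep choosing one shape.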
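Reading an Ollama NDJSON stream with `bufio.Scanner` needs the buffer raised explicitly, because the scanner's default maximum token size is 64 KiB (`bufio.MaxScanTokenSize`). A minimal sketch using the 4 MB limit from the notes; `ChatChunk`, `readChat`, `collect`, and treating a stream that ends without a `done: true` line as `io.ErrUnexpectedEOF` are assumptions:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// ChatChunk is a hypothetical minimal view of one streamed chat line.
type ChatChunk struct {
	Message struct {
		Content string `json:"content"`
	} `json:"message"`
	Done bool `json:"done"`
}

// maxLine mirrors the 4 MB scanner buffer mentioned in the notes.
const maxLine = 4 << 20

// readChat decodes one JSON object per line and calls fn for each, stopping
// at the first chunk with done=true.
func readChat(r io.Reader, fn func(ChatChunk)) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), maxLine)
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 {
			continue // tolerate blank lines between objects
		}
		var c ChatChunk
		if err := json.Unmarshal(line, &c); err != nil {
			return fmt.Errorf("decode chunk: %w", err)
		}
		fn(c)
		if c.Done {
			return nil
		}
	}
	if err := sc.Err(); err != nil {
		return err // includes bufio.ErrTooLong for lines over maxLine
	}
	return io.ErrUnexpectedEOF // stream ended without a done chunk
}

// collect concatenates the content of every chunk in stream.
func collect(stream string) (string, error) {
	var b strings.Builder
	err := readChat(strings.NewReader(stream), func(c ChatChunk) {
		b.WriteString(c.Message.Content)
	})
	return b.String(), err
}

func main() {
	text, err := collect(`{"message":{"content":"Hel"},"done":false}` + "\n" +
		`{"message":{"content":"lo"},"done":true}` + "\n")
	fmt.Println(text, err) // Hello <nil>
}
```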
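The split between `*ConnectionError` and `*HTTPError` lets callers decide on retries with `errors.As`, even through wrapped errors. In this sketch the field names and the `retryable` helper are hypothetical, and the rule that only 5xx `HTTPError`s are retryable is an assumption (the notes only say connection errors are retry-eligible):

```go
package main

import (
	"errors"
	"fmt"
)

// ConnectionError marks a failure to reach the target at all.
type ConnectionError struct {
	URL string
	Err error
}

func (e *ConnectionError) Error() string {
	return fmt.Sprintf("ollama: connect %s: %v", e.URL, e.Err)
}

// Unwrap exposes the cause so errors.Is can match sentinels beneath it.
func (e *ConnectionError) Unwrap() error { return e.Err }

// HTTPError marks a non-2xx response from a reachable target.
type HTTPError struct {
	StatusCode int
	Body       string
}

func (e *HTTPError) Error() string {
	return fmt.Sprintf("ollama: HTTP %d: %s", e.StatusCode, e.Body)
}

// retryable: connection failures always retry; HTTP errors only on 5xx.
func retryable(err error) bool {
	var ce *ConnectionError
	if errors.As(err, &ce) {
		return true
	}
	var he *HTTPError
	if errors.As(err, &he) {
		return he.StatusCode >= 500
	}
	return false
}

func main() {
	wrapped := fmt.Errorf("chat: %w", &ConnectionError{URL: "http://localhost:11434", Err: errors.New("connection refused")})
	fmt.Println(retryable(wrapped))                                               // true
	fmt.Println(retryable(&HTTPError{StatusCode: 404, Body: "model not found"})) // false
}
```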
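The inventory's degraded-mode behaviour, together with the chat route's re-poll-on-miss validation, might be sketched as below. This is a simplified, hypothetical `Inventory` that tracks installed models only, with `fetch` standing in for a call to the target's `/api/tags`; it is not the actual `ModelInventory`:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// errUnreachable stands in for a *ConnectionError from the real client.
var errUnreachable = errors.New("target unreachable")

// Inventory caches installed model names plus a degraded flag.
type Inventory struct {
	fetch func() ([]string, error) // hypothetical stand-in for a Tags() call

	mu       sync.RWMutex
	models   map[string]bool
	degraded bool
}

func NewInventory(fetch func() ([]string, error)) *Inventory {
	return &Inventory{fetch: fetch, models: map[string]bool{}}
}

// Refresh polls once. On failure the last-known models are kept and the
// inventory is marked degraded; a successful poll replaces them and clears it.
func (inv *Inventory) Refresh() error {
	names, err := inv.fetch() // no lock held, so readers never wait on the network
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if err != nil {
		inv.degraded = true
		return err
	}
	next := make(map[string]bool, len(names))
	for _, n := range names {
		next[n] = true
	}
	inv.models, inv.degraded = next, false
	return nil
}

func (inv *Inventory) HasModel(name string) bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	return inv.models[name]
}

func (inv *Inventory) Degraded() bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	return inv.degraded
}

// Ensure returns true on a cache hit; on a miss it re-polls once and checks
// again (a false result is where the chat route would answer 404).
func (inv *Inventory) Ensure(name string) bool {
	if inv.HasModel(name) {
		return true
	}
	_ = inv.Refresh()
	return inv.HasModel(name)
}

// Start polls immediately, then every interval until ctx is cancelled.
func (inv *Inventory) Start(ctx context.Context, interval time.Duration) {
	_ = inv.Refresh()
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			_ = inv.Refresh() // failures surface through Degraded()
		}
	}
}

func main() {
	calls := 0
	inv := NewInventory(func() ([]string, error) {
		calls++
		if calls == 2 {
			return nil, errUnreachable // simulate one failed poll
		}
		return []string{"chat-model"}, nil
	})
	ctx, cancel := context.WithTimeout(context.Background(), 250*time.Millisecond)
	defer cancel()
	inv.Start(ctx, 100*time.Millisecond) // blocks until ctx expires
	fmt.Println(inv.HasModel("chat-model"), inv.Degraded())
}
```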
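A buffered channel of capacity 1 works as a one-slot semaphore: sending acquires the slot and receiving releases it. A hedged sketch of such a chat gate follows, with hypothetical `gate`, `acquire`, and `release` names; honoring `ctx.Done()` while waiting, so a queued request can give up if its client disconnects, is an assumption about desired behaviour. Embedding handlers would simply never call `acquire`, which matches the ADR-0013 bypass.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// gate admits one chat at a time.
type gate chan struct{}

func newGate() gate { return make(gate, 1) }

// acquire blocks until the single slot is free or ctx is done.
func (g gate) acquire(ctx context.Context) error {
	select {
	case g <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// release frees the slot for the next waiting chat.
func (g gate) release() { <-g }

// maxInFlight runs n fake chats through g and reports peak concurrency.
func maxInFlight(g gate, n int) int64 {
	var cur, peak atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := g.acquire(context.Background()); err != nil {
				return
			}
			defer g.release()
			c := cur.Add(1)
			for { // record the highest concurrency seen so far
				p := peak.Load()
				if c <= p || peak.CompareAndSwap(p, c) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // stand-in for proxying to the target
			cur.Add(-1)
		}()
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	fmt.Println(maxInFlight(newGate(), 8)) // 1
}
```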
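Streaming passthrough with per-chunk flushing can be expressed as a line-by-line relay that flushes through `http.Flusher` after each NDJSON line. `relayNDJSON` and `relayToRecorder` are hypothetical helpers, and "chunk" is taken here to mean one NDJSON line:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// relayNDJSON copies an upstream NDJSON body to the client one line at a
// time, flushing after each line so tokens reach the caller as they arrive.
func relayNDJSON(w http.ResponseWriter, upstream io.Reader) error {
	w.Header().Set("Content-Type", "application/x-ndjson")
	w.WriteHeader(http.StatusOK)
	flusher, _ := w.(http.Flusher) // nil if this writer cannot flush
	sc := bufio.NewScanner(upstream)
	sc.Buffer(make([]byte, 0, 64*1024), 4<<20)
	for sc.Scan() {
		if _, err := w.Write(sc.Bytes()); err != nil {
			return err // client went away
		}
		if _, err := w.Write([]byte{'\n'}); err != nil {
			return err
		}
		if flusher != nil {
			flusher.Flush()
		}
	}
	return sc.Err()
}

// relayToRecorder runs relayNDJSON against an in-memory response recorder.
func relayToRecorder(upstream string) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	_ = relayNDJSON(rec, strings.NewReader(upstream))
	return rec
}

func main() {
	rec := relayToRecorder("{\"done\":false}\n{\"done\":true}\n")
	fmt.Println(rec.Header().Get("Content-Type"), rec.Flushed)
	fmt.Print(rec.Body.String())
}
```

In a real handler, `upstream` would be the body of the response returned by a call like `RawChat`, closed by the caller when the relay returns.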
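Warming the embedder with `keep_alive: -1` amounts to one request that asks Ollama to keep the model loaded indefinitely. A sketch that only builds the request; the model name, the `"warmup"` input text, and the `Authorization: Bearer` header used for token forwarding are assumptions:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// embedWarmup carries the /api/embed fields this sketch sends.
// keep_alive: -1 asks Ollama to keep the model resident indefinitely.
type embedWarmup struct {
	Model     string `json:"model"`
	Input     string `json:"input"`
	KeepAlive int    `json:"keep_alive"`
}

// warmupBody encodes the hypothetical warmup payload.
func warmupBody(model string) ([]byte, error) {
	return json.Marshal(embedWarmup{Model: model, Input: "warmup", KeepAlive: -1})
}

// newWarmupRequest builds the POST; a non-empty token is forwarded as a
// bearer token, mirroring the client's auth injection.
func newWarmupRequest(baseURL, model, token string) (*http.Request, error) {
	body, err := warmupBody(model)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/embed", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := newWarmupRequest("http://localhost:11434", "embed-model", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL) // POST http://localhost:11434/api/embed
	// http.DefaultClient.Do(req) would send it; omitted so the sketch runs offline.
}
```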
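Graceful shutdown with `signal.NotifyContext` (package `os/signal`) could be wired roughly as follows. `runUntil`, `quickCycle`, the 10-second grace period, the choice of SIGINT/SIGTERM, and the `:8080` listen address are assumptions; the port is borrowed from the go-llm example in these notes.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// runUntil serves h on ln until ctx is done, then calls Shutdown, which stops
// accepting connections and waits up to grace for in-flight requests.
func runUntil(ctx context.Context, ln net.Listener, h http.Handler, grace time.Duration) error {
	srv := &http.Server{Handler: h}
	errc := make(chan error, 1)
	go func() { errc <- srv.Serve(ln) }()

	select {
	case err := <-errc:
		return err // Serve failed before shutdown was requested
	case <-ctx.Done():
	}

	sctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(sctx); err != nil {
		return err
	}
	// After Shutdown, Serve returns http.ErrServerClosed.
	if err := <-errc; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// quickCycle starts a server on a random loopback port, cancels it almost at
// once, and reports whether shutdown was clean (a smoke test for runUntil).
func quickCycle() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	return runUntil(ctx, ln, http.NewServeMux(), time.Second)
}

func main() {
	// ctx is cancelled on the first SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	// A real serve command would also start the model poller with this ctx.
	if err := runUntil(ctx, ln, http.NewServeMux(), 10*time.Second); err != nil {
		log.Fatal(err)
	}
}
```

Sharing one cancellable context between the server and the poller means a single signal stops both, while `Shutdown` gives a long streaming chat a bounded window to finish.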

The Mac is now usable as a go-llm target through foreman: llm.OllamaCloud(token, WithBaseURL("http://foreman:8080")) works transparently for chat (streaming + non-streaming), tags, ps, and embeddings.
