27f196d333
Phase 2 of foreman: the daemon now acts as a transparent Ollama proxy.

- internal/ollama: Client interface and HTTP implementation for chat (streaming + non-streaming), embed, tags, and ps, with auth forwarding, NDJSON streaming via bufio.Scanner, and connection vs HTTP error classification via custom error types.
- internal/ollama: ModelInventory with a background poller for /api/tags and /api/ps, a degraded mode that retains the last-known models while the target is unreachable, and automatic recovery on reconnect.
- internal/server: Passthrough routes (/api/chat, /api/tags, /api/ps, /api/embed, /api/embeddings) with model validation, a chat serialization gate (capacity-1 channel), concurrent embedding bypass (ADR-0013), NDJSON streaming with per-chunk flush, and degraded health reporting.
- cmd/foreman: Full serve wiring with the Ollama client, poller goroutine, embedder warmup (keep_alive:-1), and signal-based shutdown.

The Mac is now usable as a go-llm target through foreman.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
foreman — progress
Phase 1: Scaffold — 2026-05-23
- Go module initialized (`gitea.stevedudenhoeffer.com/steve/foreman`)
- Project layout: `cmd/foreman/`, `internal/config/`, `internal/store/`, `internal/server/`
- `internal/config`: loads all `FOREMAN_*` env vars with defaults and validation
- `internal/store`: SQLite-backed durable queue (WAL mode, `modernc.org/sqlite`)
  - `jobs` table: ULID PK, model, payload, state machine, retry tracking, timestamps
  - `artifacts` table: named typed blobs per job, unique on `(job_id, name)`
  - Full CRUD: `CreateJob`, `GetJob`, `UpdateJobState`, `ListJobs`, `CreateArtifact`, `GetArtifact`, `GetArtifactsByJob`
- `internal/server`: stdlib `net/http` server
  - `GET /healthz` returning `{"status":"ok","degraded":false}`
  - Optional bearer-token auth middleware (skips `/healthz`)
- `cmd/foreman/main.go`: subcommand dispatch (`serve` plus stubs for `submit`, `jobs`, `ps`)
- CI: `.gitea/workflows/ci.yaml` (build, vet, `test -race`, tidy check)
- Dockerfile: multi-stage distroless build
- Config files: `.env.example`, `.gitignore`
- Tests: config validation, store CRUD + edge cases, server health + auth middleware
Phase 2: Ollama target client, model poller, native passthrough — 2026-05-23
- `internal/ollama/`: target client package
  - Wire types (`types.go`): ChatRequest/Response, EmbedRequest/Response, `TagsResponse`, `PsResponse`, `ModelInfo`, `RunningModel`, matching Ollama's native JSON API exactly. Polymorphic fields (`think`, `keep_alive`, `tools`, `options`) use `json.RawMessage` for transparent passthrough fidelity.
  - `Client` interface (`client.go`): `Chat` (stream/non-stream), `Embed`, `Tags`, `Ps`, `RawChat`, `RawEmbed`. `RawChat`/`RawEmbed` return `*http.Response` for zero-copy streaming passthrough.
  - `httpClient` implementation: auth token injection, NDJSON streaming via `bufio.Scanner` with a 4 MB buffer, connection vs HTTP error classification.
  - Custom error types (`errors.go`): `*ConnectionError` for network failures (retry-eligible), `*HTTPError` for non-2xx responses. Both are compatible with `errors.Is`/`errors.As`.
  - `ModelInventory` (`inventory.go`): mutex-protected in-memory cache of installed and running models.
    - Methods: `Models()`, `HasModel()`, `ResidentModels()`, `LastPoll()`, `Degraded()`, `Refresh()`.
    - A background `Start()` goroutine polls at `FOREMAN_POLL_INTERVAL` (default 30s).
    - When the target is unreachable, it retains the last-known inventory and sets `degraded=true`. It clears `degraded` on recovery.
- `internal/server/`: new Ollama passthrough routes
  - `GET /api/tags`: serves the poller's cached model list
  - `GET /api/ps`: serves the poller's cached running models
  - `POST /api/embed`, `POST /api/embeddings`: direct concurrent proxy to the target, bypassing the chat gate entirely (ADR-0013)
  - `POST /api/chat` (critical path): validates the model (re-polls on a miss, 404 if still absent), serializes through a capacity-1 channel gate, and proxies to the target, either as an NDJSON stream (`application/x-ndjson`, flushed per chunk) or as non-streaming JSON passthrough
  - `GET /healthz`: now wired to `inventory.Degraded()` for real target status
- `cmd/foreman/main.go`: full serve wiring
  - Creates the Ollama client, starts the model poller goroutine, warms the embedder (`keep_alive: -1`), creates the server with all dependencies, and shuts down gracefully on signals via `signal.NotifyContext`
- Tests (all passing with `-race`):
  - Client: tags/ps parsing, chat streaming + non-streaming, embed, auth token forwarding, `*ConnectionError` on unreachable target, `*HTTPError` on non-2xx
  - Inventory: refresh populates models, degraded on failure, model retention, recovery from degraded, `Start`/cancel lifecycle
  - Server: tags/ps passthrough, model validation (404 on unknown), non-streaming chat proxy, NDJSON streaming passthrough with correct Content-Type, chat serialization (the gate limits concurrent requests to at most one in flight), concurrent embed bypass (multiple requests run in parallel), degraded health endpoint, embeddings alias path
The Mac is now usable as a go-llm target through foreman:
`llm.OllamaCloud(token, WithBaseURL("http://foreman:8080"))` works transparently for chat (streaming + non-streaming), tags, ps, and embeddings.