Adds client/ -- a public Go package providing a synchronous facade over
foreman's async POST /jobs API (Level 1 integration per ADR-0011).
Two delivery modes:
- Webhook receiver (preferred): ephemeral HTTP server on random port,
pushes results immediately, verifies HMAC when configured
- Polling fallback: polls GET /jobs/{id} at configurable interval
Also includes Tags() and Embed() helpers, bearer auth support, and
comprehensive integration tests against the real foreman HTTP handlers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
foreman — progress
Phase 1: Scaffold — 2026-05-23
- Go module initialized (gitea.stevedudenhoeffer.com/steve/foreman)
- Project layout: cmd/foreman/, internal/config/, internal/store/, internal/server/
- internal/config: loads all FOREMAN_* env vars with defaults and validation
- internal/store: SQLite-backed durable queue (WAL mode, modernc.org/sqlite)
  - jobs table: ULID PK, model, payload, state machine, retry tracking, timestamps
  - artifacts table: named typed blobs per job, unique on (job_id, name)
  - Full CRUD: CreateJob, GetJob, UpdateJobState, ListJobs, CreateArtifact, GetArtifact, GetArtifactsByJob
- internal/server: stdlib net/http server
  - GET /healthz returning {"status":"ok","degraded":false}
  - Optional bearer-token auth middleware (skips /healthz)
- cmd/foreman/main.go: subcommand dispatch (serve + stubs for submit, jobs, ps)
- CI: .gitea/workflows/ci.yaml (build, vet, test -race, tidy check)
- Dockerfile: multi-stage distroless build
- Config files: .env.example, .gitignore
- Tests: config validation, store CRUD + edge cases, server health + auth middleware
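
The optional bearer-token middleware that skips /healthz could look roughly like the sketch below. It is not foreman's actual code: the names bearerAuth, newMux, and statusFor, the /private route, and the choice of a constant-time comparison are all assumptions made for illustration.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bearerAuth (hypothetical name) rejects requests that lack
// "Authorization: Bearer <token>". An empty token disables auth,
// and /healthz is always let through so probes work unauthenticated.
func bearerAuth(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if token == "" || r.URL.Path == "/healthz" {
			next.ServeHTTP(w, r)
			return
		}
		want := "Bearer " + token
		got := r.Header.Get("Authorization")
		// Constant-time comparison avoids leaking the token via timing.
		if subtle.ConstantTimeCompare([]byte(got), []byte(want)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newMux builds a tiny router: the health endpoint from the log above
// plus /private, a made-up route standing in for any protected handler.
func newMux() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"ok","degraded":false}`)
	})
	mux.HandleFunc("/private", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	return mux
}

// statusFor issues an in-memory GET against h and returns the status code.
func statusFor(h http.Handler, path, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := bearerAuth("s3cret", newMux())
	fmt.Println(statusFor(h, "/healthz", ""))               // 200: health check skips auth
	fmt.Println(statusFor(h, "/private", ""))               // 401: no token supplied
	fmt.Println(statusFor(h, "/private", "Bearer s3cret")) // 200: correct token
}
```

Wrapping the whole mux, rather than individual handlers, keeps the skip rule for /healthz in one place.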
Phase 2: Ollama target client, model poller, native passthrough — 2026-05-23
- internal/ollama/: target client package
  - Wire types (types.go): ChatRequest/Response, EmbedRequest/Response, TagsResponse, PsResponse, ModelInfo, RunningModel, matching Ollama's native JSON API exactly. Polymorphic fields (think, keep_alive, tools, options) use json.RawMessage for transparent passthrough fidelity.
  - Client interface (client.go): Chat (stream/non-stream), Embed, Tags, Ps, RawChat, RawEmbed. RawChat/RawEmbed return *http.Response for zero-copy streaming passthrough.
  - httpClient implementation: auth token injection, NDJSON streaming via bufio.Scanner with 4 MB buffer, connection vs HTTP error classification.
  - Custom error types (errors.go): *ConnectionError for network failures (retry-eligible), *HTTPError for non-2xx responses. errors.Is/errors.As compatible.
  - ModelInventory (inventory.go): mutex-protected in-memory cache of installed and running models. Methods: Models(), HasModel(), ResidentModels(), LastPoll(), Degraded(), Refresh(). Background Start() goroutine polls at FOREMAN_POLL_INTERVAL (default 30s). On target unreachable: retains last-known inventory, sets degraded=true. Clears degraded on recovery.
- internal/server/: new Ollama passthrough routes
  - GET /api/tags: serves the poller's cached model list
  - GET /api/ps: serves the poller's cached running models
  - POST /api/embed, POST /api/embeddings: direct concurrent proxy to target, bypasses the chat gate entirely (ADR-0013)
  - POST /api/chat: critical path. Validates model (re-poll on miss, 404 if still absent), serializes through a capacity-1 channel gate, proxies to target with NDJSON streaming (application/x-ndjson, flushed per chunk) or non-streaming JSON passthrough
  - GET /healthz: now wired to inventory.Degraded() for real target status
- cmd/foreman/main.go: full serve wiring
  - Creates Ollama client, starts model poller goroutine, warms embedder (keep_alive: -1), creates server with all dependencies, signal-based graceful shutdown via signal.NotifyContext
- Tests (all passing with -race):
  - Client: tags/ps parsing, chat streaming + non-streaming, embed, auth token forwarding, *ConnectionError on unreachable target, *HTTPError on non-2xx
  - Inventory: refresh populates models, degraded on failure, model retention, recovery from degraded, Start/cancel lifecycle
  - Server: tags/ps passthrough, model validation (404 on unknown), non-streaming chat proxy, NDJSON streaming passthrough with correct Content-Type, chat serialization (gate holds concurrent requests to max 1 in-flight), concurrent embed bypass (multiple requests run in parallel), degraded health endpoint, embeddings alias path
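
The ModelInventory behaviour described above (keep the last-known models when the target is unreachable, flag degraded, clear the flag on recovery) can be illustrated with a minimal sketch. The inventory and flakyTarget types below are hypothetical stand-ins, not the real inventory.go; only the method names Refresh, HasModel, and Degraded are taken from the log.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// flakyTarget is a made-up stand-in for the Ollama target.
// When down is true, list fails the way an unreachable host would.
type flakyTarget struct {
	down   bool
	models []string
}

func (f *flakyTarget) list() ([]string, error) {
	if f.down {
		return nil, errors.New("target unreachable")
	}
	return f.models, nil
}

// inventory is a simplified, assumed shape for a mutex-protected model cache.
type inventory struct {
	mu       sync.Mutex
	fetch    func() ([]string, error)
	models   map[string]bool
	degraded bool
}

func newInventory(fetch func() ([]string, error)) *inventory {
	return &inventory{fetch: fetch, models: map[string]bool{}}
}

// Refresh polls the target once. On failure it leaves the cached models
// untouched and sets degraded; on success it swaps in the fresh list
// and clears degraded.
func (inv *inventory) Refresh() error {
	names, err := inv.fetch() // the slow call happens outside the lock
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if err != nil {
		inv.degraded = true
		return err
	}
	fresh := make(map[string]bool, len(names))
	for _, n := range names {
		fresh[n] = true
	}
	inv.models = fresh
	inv.degraded = false
	return nil
}

func (inv *inventory) HasModel(name string) bool {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	return inv.models[name]
}

func (inv *inventory) Degraded() bool {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	return inv.degraded
}

func main() {
	target := &flakyTarget{models: []string{"llama3"}}
	inv := newInventory(target.list)

	_ = inv.Refresh()
	fmt.Println(inv.HasModel("llama3"), inv.Degraded()) // true false

	target.down = true
	_ = inv.Refresh()
	fmt.Println(inv.HasModel("llama3"), inv.Degraded()) // true true: inventory retained

	target.down = false
	_ = inv.Refresh()
	fmt.Println(inv.HasModel("llama3"), inv.Degraded()) // true false: recovered
}
```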
The Mac is now usable as a go-llm target through foreman:
llm.OllamaCloud(token, WithBaseURL("http://foreman:8080")) works transparently
for chat (streaming + non-streaming), tags, ps, and embeddings.
Phase 3: Durable queue, single worker, drain-by-model — 2026-05-23
M0 complete. The Phase 2 in-flight chat gate (buffered channel) is replaced with the real SQLite-backed job queue and single worker loop.
- internal/store/: new store methods
  - NextJob(currentModel): drain-by-model ordering. Prefers jobs matching the currently-resident model to minimize swap cost, then FIFO by created_at.
  - IncrementAttempt(id): bumps attempt counter and re-queues for retry.
  - ResetInterruptedJobs(): resets loading/working jobs to queued on startup (crash recovery).
  - DeleteTerminalJobsBefore(cutoff): TTL pruner for old done/failed jobs.
  - SQLite DSN now includes _pragma=busy_timeout(5000) for reliable concurrent access from HTTP handlers + worker.
- internal/worker/: single worker loop (worker.go)
  - Worker.Run(ctx): main goroutine loop. Resets interrupted jobs on startup, then continuously picks the next job using drain-by-model ordering, executes via the Ollama client, stores result + completion artifact, notifies waiters.
  - Worker.Wake(): non-blocking signal for new job availability.
  - Notifier: sync.Map-based completion notification. HTTP handlers register a channel per job ID, the worker closes it on completion. Supports Register(), Complete(), Result().
  - Retry semantics: *ollama.ConnectionError causes re-queue with incremented attempt; *ollama.HTTPError is a terminal failure (no retry). Max attempts configurable via FOREMAN_MAX_ATTEMPTS (default 3).
  - The worker loop never panics: all errors are logged, jobs are marked, loop continues.
- internal/server/: chat handler rewrite
  - POST /api/chat now creates a job row (state queued), registers a completion waiter, wakes the worker, and blocks until the job reaches a terminal state. Returns the Ollama response on success, 502 on failure.
  - ULID job IDs generated at submission time (github.com/oklog/ulid/v2).
  - The old chatGate (buffered channel) is removed entirely.
  - /api/embed and /api/embeddings remain direct concurrent proxies (unchanged from Phase 2, per ADR-0013).
- internal/config/: new config fields
  - FOREMAN_MAX_ATTEMPTS (int, default 3)
  - FOREMAN_JOB_TTL (duration, default 24h)
- Tests (all passing with -race):
  - Worker: single job execution, serial enforcement, drain-by-model ordering, retry on connection error, max attempts exhaustion, HTTP error terminal failure, interrupted job reset on startup, wake signal, notifier lifecycle.
  - Store: NextJob drain-by-model, empty queue, IncrementAttempt, ResetInterrupted, DeleteTerminalJobsBefore.
  - Server: chat model validation (404), non-streaming chat through queue, serialization (max 1 concurrent), context cancellation, embed bypass unchanged.
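
The drain-by-model ordering behind NextJob(currentModel) can be shown with an in-memory sketch. The real store runs against SQLite, presumably as an ORDER BY in the query; the version below only models the ordering rule (resident-model jobs first, then oldest first). The job struct, nextJob, and demoQueue are illustrative names, not the store's API.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// job is a pared-down, assumed view of a queued row.
type job struct {
	ID        string
	Model     string
	CreatedAt time.Time
}

// nextJob picks the job to run next: jobs whose model is already resident
// come first (no model swap needed), and ties are broken FIFO by CreatedAt.
// ok is false when the queue is empty.
func nextJob(queued []job, currentModel string) (next job, ok bool) {
	if len(queued) == 0 {
		return job{}, false
	}
	sorted := append([]job(nil), queued...) // do not reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		mi := sorted[i].Model == currentModel
		mj := sorted[j].Model == currentModel
		if mi != mj {
			return mi // a resident-model job sorts ahead of any other
		}
		return sorted[i].CreatedAt.Before(sorted[j].CreatedAt)
	})
	return sorted[0], true
}

// demoQueue returns three jobs: the oldest targets llama3, the next two qwen.
func demoQueue() []job {
	t0 := time.Date(2026, 5, 23, 12, 0, 0, 0, time.UTC)
	return []job{
		{ID: "a", Model: "llama3", CreatedAt: t0},
		{ID: "b", Model: "qwen", CreatedAt: t0.Add(1 * time.Second)},
		{ID: "c", Model: "qwen", CreatedAt: t0.Add(2 * time.Second)},
	}
}

func main() {
	q := demoQueue()

	j, _ := nextJob(q, "qwen")
	fmt.Println(j.ID) // b: oldest job for the resident model

	j, _ = nextJob(q, "")
	fmt.Println(j.ID) // a: nothing resident, plain FIFO
}
```

Note that strict preference for the resident model can delay jobs for other models while matching work keeps arriving; the log does not say whether foreman bounds that delay.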
Phase 4: Async /jobs surface, webhooks, artifacts — 2026-05-23
M1 core complete (minus CLI and go-llm constructor, which are separate work).
- internal/webhook/: webhook dispatcher
  - Dispatcher.Fire(url, event): non-blocking goroutine delivery with exponential backoff retry (1s, 2s, 4s, 8s, 16s; max 5 attempts).
  - Optional HMAC-SHA256 signing via FOREMAN_WEBHOOK_SECRET: sets X-Foreman-Signature: sha256=<hex> header.
  - VerifySignature(): exported for webhook receivers.
  - FormatArtifacts(): inline (data field) for artifacts <= 256KB, URL reference for larger ones.
  - Webhook failures are logged and dropped; they never block or fail the job (ADR-0005).
- internal/server/: new routes
  - POST /jobs: validates model, creates job row with optional state_webhook_url, returns 202 Accepted with {"job_id":"<ulid>"}. Fires initial "queued" webhook. Wakes worker.
  - GET /jobs/{id}: returns full job state, result, error, and artifact metadata. 404 for unknown IDs. Artifacts up to 256KB are inlined; larger ones get a URL reference.
  - GET /jobs/{id}/artifacts/{name}: serves raw artifact data with stored content type. 404 for unknown job/artifact.
- docs/adr/0014-no-webhooks-on-sync-chat.md:
  - state_webhook_url is only honored on POST /jobs. Sync /api/chat does not fire webhooks (ADR-0014). Rationale: the caller already holds a blocking HTTP connection.
- cmd/foreman/main.go: full serve wiring
  - Creates webhook dispatcher, notifier, worker.
  - Starts worker loop goroutine and TTL pruner goroutine.
  - TTL pruner runs every jobTTL/4 (min 1 minute), deletes terminal jobs older than FOREMAN_JOB_TTL (default 24h).
  - Server constructor now receives notifier, worker, and dispatcher.
- Tests (all passing with -race):
  - Jobs API: 202 on submit, ULID format, 404 for unknown model, 400 for missing model, 404 for unknown job, job state after completion, artifact retrieval, artifact 404.
  - Webhooks: full lifecycle events (queued->working->done), 500-returning receiver does not affect job state, HMAC signature verification.
  - Webhook dispatcher: delivery, retry on 500, non-blocking Fire, HMAC signing, no HMAC when no secret, signature format validation.
  - Artifacts: small inline, large by URL, empty returns nil.
  - TTL pruner: deletes old terminal jobs.
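
The X-Foreman-Signature: sha256=<hex> scheme can be sketched with the standard library alone. This is an illustration under assumptions, not the dispatcher's code: it assumes the HMAC is computed over the raw request body, and sign and verify are hypothetical names (the log only says a VerifySignature() function is exported, without giving its signature).

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const sigPrefix = "sha256="

// sign returns the header value for body: "sha256=" followed by the
// hex-encoded HMAC-SHA256 of the body under secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return sigPrefix + hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the HMAC and compares it with the header value.
// hmac.Equal is constant-time, which avoids leaking how many leading
// bytes of a forged signature were correct.
func verify(secret, body []byte, header string) bool {
	if !strings.HasPrefix(header, sigPrefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, sigPrefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("webhook-secret") // would come from FOREMAN_WEBHOOK_SECRET
	body := []byte(`{"job_id":"example","state":"done"}`)

	header := sign(secret, body)
	fmt.Println(verify(secret, body, header))                   // true
	fmt.Println(verify([]byte("other-secret"), body, header))   // false: wrong secret
	fmt.Println(verify(secret, []byte(`{"tampered":1}`), header)) // false: body changed
}
```

A receiver has to verify against the exact bytes it read from the request, before any JSON decoding and re-encoding, or the digest will not match.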
Phase 5: Go client package + go-llm Foreman() constructor — 2026-05-23
Level 0 + Level 1 integration complete (ADR-0011).
- client/: public Go client package (sync facade over async /jobs API)
  - client.New(baseURL, opts...): configurable client with bearer auth, webhook secret, custom HTTP client, poll interval.
  - client.Submit(ctx, SubmitRequest) (*Result, error): synchronous submission. Blocks until the job reaches a terminal state (done/failed).
  - Two delivery modes:
    - Webhook receiver (preferred): starts an ephemeral HTTP server on a random port, sets state_webhook_url, waits for the done/failed webhook event. Verifies HMAC signature when WithWebhookSecret is set. Falls back to polling automatically if the listener fails to bind.
    - Polling fallback: polls GET /jobs/{id} at pollInterval (default 2s) until terminal state. Forced via WithPollingMode().
  - client.Tags(ctx): fetches installed models via GET /api/tags.
  - client.Embed(ctx, EmbedRequest): sends embedding requests via POST /api/embed (bypasses queue, ADR-0013).
  - Both modes respect context cancellation/deadline and clean up resources.
- Tests (all passing with -race):
  - Happy path (polling): submit, poll, verify completed result + artifacts.
  - Happy path (webhook): submit with webhook receiver, verify push delivery.
  - Failed job: returns Result with state=failed and error message.
  - Context timeout: returns error on deadline exceeded.
  - Auth: bearer token sent when configured; 401 without it.
  - HMAC webhook verification: signed webhooks verified correctly.
  - Tags and Embed endpoints: round-trip through the client.
  - Missing model validation: returns error before network call.
- go-llm integration (Level 0):
  - llm.Foreman(baseURL, apiKey, opts...) constructor added to v2/constructors.go on branch feat/foreman-constructor.
  - Delegates to existing ollamaProvider.New(); zero new code paths.
  - DD#9 added to v2/CLAUDE.md.
  - PR: steve/go-llm#4
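
The polling fallback used by client.Submit (poll GET /jobs/{id} at a fixed interval until done or failed, honoring the caller's context) can be sketched as below. This is not the client package's code: fetchState stands in for the HTTP call, and pollUntilTerminal and runScript are hypothetical helpers for the demonstration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchState is a made-up stand-in for one GET /jobs/{id} round trip
// that returns the job's current state.
type fetchState func(ctx context.Context) (string, error)

func isTerminal(state string) bool { return state == "done" || state == "failed" }

// pollUntilTerminal fetches immediately, then once per interval, and returns
// the first terminal state. It gives up with ctx.Err() when the context is
// cancelled or its deadline passes.
func pollUntilTerminal(ctx context.Context, interval time.Duration, fetch fetchState) (string, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop() // release the ticker on every exit path
	for {
		state, err := fetch(ctx)
		if err != nil {
			return "", err
		}
		if isTerminal(state) {
			return state, nil
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-ticker.C:
		}
	}
}

// runScript polls a scripted sequence of states (the last one repeats) and
// reports the final state, the number of fetches, and whether the deadline hit.
func runScript(states []string, timeoutMs int) (string, int, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	calls := 0
	fetch := func(context.Context) (string, error) {
		i := calls
		if i >= len(states) {
			i = len(states) - 1
		}
		calls++
		return states[i], nil
	}
	state, err := pollUntilTerminal(ctx, 5*time.Millisecond, fetch)
	return state, calls, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	state, calls, timedOut := runScript([]string{"queued", "working", "done"}, 1000)
	fmt.Println(state, calls, timedOut) // done 3 false

	_, _, timedOut = runScript([]string{"queued"}, 30)
	fmt.Println(timedOut) // true: the job never finished before the deadline
}
```

The sketch uses a 5ms interval so it finishes quickly; the log gives 2s as the client's default pollInterval.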