steve 4759a06d1b feat: add Go client package with sync facade over async /jobs
Adds client/ -- a public Go package providing a synchronous facade over
foreman's async POST /jobs API (Level 1 integration per ADR-0011).

Two delivery modes:
- Webhook receiver (preferred): ephemeral HTTP server on random port,
  pushes results immediately, verifies HMAC when configured
- Polling fallback: polls GET /jobs/{id} at configurable interval

Also includes Tags() and Embed() helpers, bearer auth support, and
comprehensive integration tests against the real foreman HTTP handlers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:38:16 -04:00

foreman — progress

Phase 1: Scaffold — 2026-05-23

  • Go module initialized (gitea.stevedudenhoeffer.com/steve/foreman)
  • Project layout: cmd/foreman/, internal/config/, internal/store/, internal/server/
  • internal/config: loads all FOREMAN_* env vars with defaults and validation
  • internal/store: SQLite-backed durable queue (WAL mode, modernc.org/sqlite)
    • jobs table: ULID PK, model, payload, state machine, retry tracking, timestamps
    • artifacts table: named typed blobs per job, unique on (job_id, name)
    • Full CRUD: CreateJob, GetJob, UpdateJobState, ListJobs, CreateArtifact, GetArtifact, GetArtifactsByJob
  • internal/server: stdlib net/http server
    • GET /healthz returning {"status":"ok","degraded":false}
    • Optional bearer-token auth middleware (skips /healthz)
  • cmd/foreman/main.go: subcommand dispatch (serve + stubs for submit, jobs, ps)
  • CI: .gitea/workflows/ci.yaml (build, vet, test -race, tidy check)
  • Dockerfile: multi-stage distroless build
  • Config files: .env.example, .gitignore
  • Tests: config validation, store CRUD + edge cases, server health + auth middleware
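A minimal sketch of the optional bearer-token middleware described above, assuming a plain `Authorization: Bearer <token>` header. The names `bearerAuth`, `newMux`, and `status` are hypothetical and not taken from `internal/server`; the constant-time comparison is a choice made for this sketch.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bearerAuth wraps next with an optional bearer-token check.
// An empty token disables auth entirely; /healthz is always exempt.
func bearerAuth(token string, next http.Handler) http.Handler {
	if token == "" {
		return next
	}
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/healthz" {
			next.ServeHTTP(w, r)
			return
		}
		got := []byte(r.Header.Get("Authorization"))
		if subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newMux builds a tiny mux with the health route and one protected route.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"ok","degraded":false}`)
	})
	mux.HandleFunc("/api/tags", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"models":[]}`)
	})
	return mux
}

// status issues an in-memory request and returns the response code.
func status(h http.Handler, path, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := bearerAuth("s3cret", newMux())
	fmt.Println(status(h, "/healthz", ""))               // 200: health is exempt
	fmt.Println(status(h, "/api/tags", ""))              // 401: missing token
	fmt.Println(status(h, "/api/tags", "Bearer s3cret")) // 200: valid token
}
```

Keeping `/healthz` outside the check lets an unauthenticated probe still reach it when a token is configured.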

Phase 2: Ollama target client, model poller, native passthrough — 2026-05-23

  • internal/ollama/ — target client package:
    • Wire types (types.go): ChatRequest/Response, EmbedRequest/Response, TagsResponse, PsResponse, ModelInfo, RunningModel — matching Ollama's native JSON API exactly. Polymorphic fields (think, keep_alive, tools, options) use json.RawMessage for transparent passthrough fidelity.
    • Client interface (client.go): Chat (stream/non-stream), Embed, Tags, Ps, RawChat, RawEmbed. RawChat/RawEmbed return *http.Response for zero-copy streaming passthrough.
    • httpClient implementation: auth token injection, NDJSON streaming via bufio.Scanner with 4 MB buffer, connection vs HTTP error classification.
    • Custom error types (errors.go): *ConnectionError for network failures (retry-eligible), *HTTPError for non-2xx responses. errors.Is/errors.As compatible.
    • ModelInventory (inventory.go): mutex-protected in-memory cache of installed and running models. Methods: Models(), HasModel(), ResidentModels(), LastPoll(), Degraded(), Refresh(). Background Start() goroutine polls at FOREMAN_POLL_INTERVAL (default 30s). On target unreachable: retains last-known inventory, sets degraded=true. Clears degraded on recovery.
  • internal/server/ — new Ollama passthrough routes:
    • GET /api/tags — serves poller's cached model list
    • GET /api/ps — serves poller's cached running models
    • POST /api/embed, POST /api/embeddings — direct concurrent proxy to target, bypasses the chat gate entirely (ADR-0013)
    • POST /api/chat — critical path: validates model (re-poll on miss, 404 if still absent), serializes through a capacity-1 channel gate, proxies to target with NDJSON streaming (application/x-ndjson, flushed per chunk) or non-streaming JSON passthrough
    • GET /healthz — now wired to inventory.Degraded() for real target status
  • cmd/foreman/main.go — full serve wiring:
    • Creates Ollama client, starts model poller goroutine, warms embedder (keep_alive: -1), creates server with all dependencies, signal-based graceful shutdown via context.NotifyContext
  • Tests (all passing with -race):
    • Client: tags/ps parsing, chat streaming + non-streaming, embed, auth token forwarding, *ConnectionError on unreachable target, *HTTPError on non-2xx
    • Inventory: refresh populates models, degraded on failure, model retention, recovery from degraded, Start/cancel lifecycle
    • Server: tags/ps passthrough, model validation (404 on unknown), non-streaming chat proxy, NDJSON streaming passthrough with correct Content-Type, chat serialization (gate holds concurrent requests to max 1 in-flight), concurrent embed bypass (multiple requests run in parallel), degraded health endpoint, embeddings alias path

The Mac is now usable as a go-llm target through foreman: llm.OllamaCloud(token, WithBaseURL("http://foreman:8080")) works transparently for chat (streaming + non-streaming), tags, ps, and embeddings.

Phase 3: Durable queue, single worker, drain-by-model — 2026-05-23

M0 complete. The Phase 2 in-flight chat gate (buffered channel) is replaced with the real SQLite-backed job queue and single worker loop.

  • internal/store/ — new store methods:

    • NextJob(currentModel): drain-by-model ordering — prefers jobs matching the currently-resident model to minimize swap cost, then FIFO by created_at.
    • IncrementAttempt(id): bumps attempt counter and re-queues for retry.
    • ResetInterruptedJobs(): resets loading/working jobs to queued on startup (crash recovery).
    • DeleteTerminalJobsBefore(cutoff): TTL pruner for old done/failed jobs.
    • SQLite DSN now includes _pragma=busy_timeout(5000) for reliable concurrent access from HTTP handlers + worker.
  • internal/worker/ — single worker loop (worker.go):

    • Worker.Run(ctx): main goroutine loop — resets interrupted jobs on startup, then continuously picks the next job using drain-by-model ordering, executes via the Ollama client, stores result + completion artifact, notifies waiters.
    • Worker.Wake(): non-blocking signal for new job availability.
    • Notifier: sync.Map-based completion notification — HTTP handlers register a channel per job ID, the worker closes it on completion. Supports Register(), Complete(), Result().
    • Retry semantics: *ollama.ConnectionError causes re-queue with incremented attempt; *ollama.HTTPError is a terminal failure (no retry). Max attempts configurable via FOREMAN_MAX_ATTEMPTS (default 3).
    • The worker loop never panics — all errors are logged, jobs are marked, loop continues.
  • internal/server/ — chat handler rewrite:

    • POST /api/chat now creates a job row (state queued), registers a completion waiter, wakes the worker, and blocks until the job reaches a terminal state. Returns the Ollama response on success, 502 on failure.
    • ULID job IDs generated at submission time (github.com/oklog/ulid/v2).
    • The old chatGate (buffered channel) is removed entirely.
    • /api/embed and /api/embeddings remain direct concurrent proxies (unchanged from Phase 2, per ADR-0013).
  • internal/config/ — new config fields:

    • FOREMAN_MAX_ATTEMPTS (int, default 3)
    • FOREMAN_JOB_TTL (duration, default 24h)
  • Tests (all passing with -race):

    • Worker: single job execution, serial enforcement, drain-by-model ordering, retry on connection error, max attempts exhaustion, HTTP error terminal failure, interrupted job reset on startup, wake signal, notifier lifecycle.
    • Store: NextJob drain-by-model, empty queue, IncrementAttempt, ResetInterrupted, DeleteTerminalJobsBefore.
    • Server: chat model validation (404), non-streaming chat through queue, serialization (max 1 concurrent), context cancellation, embed bypass unchanged.

Phase 4: Async /jobs surface, webhooks, artifacts — 2026-05-23

M1 core complete (minus CLI and go-llm constructor, which are separate work).

  • internal/webhook/ — webhook dispatcher:

    • Dispatcher.Fire(url, event): non-blocking goroutine delivery with exponential backoff retry (1s, 2s, 4s, 8s, 16s — max 5 attempts).
    • Optional HMAC-SHA256 signing via FOREMAN_WEBHOOK_SECRET — sets X-Foreman-Signature: sha256=<hex> header.
    • VerifySignature(): exported for webhook receivers.
    • FormatArtifacts(): inline (data field) for artifacts <= 256KB, URL reference for larger ones.
    • Webhook failures are logged and dropped — never block or fail the job (ADR-0005).
  • internal/server/ — new routes:

    • POST /jobs: validates model, creates job row with optional state_webhook_url, returns 202 Accepted with {"job_id":"<ulid>"}. Fires initial "queued" webhook. Wakes worker.
    • GET /jobs/{id}: returns full job state, result, error, and artifact metadata. 404 for unknown IDs. Artifacts of 256KB or less are inlined; larger ones get a URL reference.
    • GET /jobs/{id}/artifacts/{name}: serves raw artifact data with stored content type. 404 for unknown job/artifact.
  • docs/adr/0014-no-webhooks-on-sync-chat.md:

    • state_webhook_url is only honored on POST /jobs. Sync /api/chat does not fire webhooks (ADR-0014). Rationale: the caller already holds a blocking HTTP connection.
  • cmd/foreman/main.go — full serve wiring:

    • Creates webhook dispatcher, notifier, worker.
    • Starts worker loop goroutine and TTL pruner goroutine.
    • TTL pruner runs every jobTTL/4 (min 1 minute), deletes terminal jobs older than FOREMAN_JOB_TTL (default 24h).
    • Server constructor now receives notifier, worker, and dispatcher.
  • Tests (all passing with -race):

    • Jobs API: 202 on submit, ULID format, 404 for unknown model, 400 for missing model, 404 for unknown job, job state after completion, artifact retrieval, artifact 404.
    • Webhooks: full lifecycle events (queued->working->done), 500-returning receiver does not affect job state, HMAC signature verification.
    • Webhook dispatcher: delivery, retry on 500, non-blocking Fire, HMAC signing, no HMAC when no secret, signature format validation.
    • Artifacts: small inline, large by URL, empty returns nil.
    • TTL pruner: deletes old terminal jobs.

Phase 5: Go client package + go-llm Foreman() constructor — 2026-05-23

Level 0 + Level 1 integration complete (ADR-0011).

  • client/ — public Go client package (sync facade over async /jobs API):

    • client.New(baseURL, opts...): configurable client with bearer auth, webhook secret, custom HTTP client, poll interval.
    • client.Submit(ctx, SubmitRequest) (*Result, error): synchronous submission — blocks until the job reaches a terminal state (done/failed).
    • Two delivery modes:
      • Webhook receiver (preferred): starts an ephemeral HTTP server on a random port, sets state_webhook_url, waits for the done/failed webhook event. Verifies HMAC signature when WithWebhookSecret is set. Falls back to polling automatically if the listener fails to bind.
      • Polling fallback: polls GET /jobs/{id} at pollInterval (default 2s) until terminal state. Forced via WithPollingMode().
    • client.Tags(ctx): fetches installed models via GET /api/tags.
    • client.Embed(ctx, EmbedRequest): sends embedding requests via POST /api/embed (bypasses queue, ADR-0013).
    • Both delivery modes (webhook and polling) respect context cancellation and deadlines, and clean up their resources on exit.
  • Tests (all passing with -race):

    • Happy path (polling): submit, poll, verify completed result + artifacts.
    • Happy path (webhook): submit with webhook receiver, verify push delivery.
    • Failed job: returns Result with state=failed and error message.
    • Context timeout: returns error on deadline exceeded.
    • Auth: bearer token sent when configured; 401 without it.
    • HMAC webhook verification: signed webhooks verified correctly.
    • Tags and Embed endpoints: round-trip through the client.
    • Missing model validation: returns error before network call.
  • go-llm integration (Level 0):

    • llm.Foreman(baseURL, apiKey, opts...) constructor added to v2/constructors.go on branch feat/foreman-constructor.
    • Delegates to existing ollamaProvider.New() — zero new code paths.
    • DD#9 added to v2/CLAUDE.md.
    • PR: steve/go-llm#4