# foreman — progress

## Phase 1: Scaffold — 2026-05-23

- Go module initialized (`gitea.stevedudenhoeffer.com/steve/foreman`)
- Project layout: `cmd/foreman/`, `internal/config/`, `internal/store/`, `internal/server/`
- `internal/config`: loads all `FOREMAN_*` env vars with defaults and validation
- `internal/store`: SQLite-backed durable queue (WAL mode, `modernc.org/sqlite`)
  - `jobs` table: ULID PK, model, payload, state machine, retry tracking, timestamps
  - `artifacts` table: named typed blobs per job, unique on (job_id, name)
  - Full CRUD: CreateJob, GetJob, UpdateJobState, ListJobs, CreateArtifact, GetArtifact, GetArtifactsByJob
- `internal/server`: stdlib `net/http` server
  - `GET /healthz` returning `{"status":"ok","degraded":false}`
  - Optional bearer-token auth middleware (skips /healthz)
- `cmd/foreman/main.go`: subcommand dispatch (serve + stubs for submit, jobs, ps)
- CI: `.gitea/workflows/ci.yaml` (build, vet, test -race, tidy check)
- Dockerfile: multi-stage distroless build
- Config files: `.env.example`, `.gitignore`
- Tests: config validation, store CRUD + edge cases, server health + auth middleware

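The optional bearer-token middleware from Phase 1 can be pictured roughly as follows. This is a minimal sketch, not the code in `internal/server`: the names `bearerAuth`, `okHandler`, and `statusFor` are hypothetical, and the constant-time comparison is an assumption about how the check might reasonably be done.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bearerAuth wraps next with an optional bearer-token check (hypothetical name).
// An empty token disables auth; /healthz is always let through.
func bearerAuth(token string, next http.Handler) http.Handler {
	if token == "" {
		return next
	}
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/healthz" {
			next.ServeHTTP(w, r)
			return
		}
		got := []byte(r.Header.Get("Authorization"))
		if subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler is a demo handler that always answers 200.
func okHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
}

// statusFor issues a GET against h and returns the response status code.
func statusFor(h http.Handler, path, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := bearerAuth("s3cret", okHandler())
	fmt.Println(statusFor(h, "/healthz", ""))
	fmt.Println(statusFor(h, "/api/tags", ""))
	fmt.Println(statusFor(h, "/api/tags", "Bearer s3cret"))
}
```
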
## Phase 2: Ollama target client, model poller, native passthrough — 2026-05-23

- `internal/ollama/` — target client package:
  - Wire types (`types.go`): ChatRequest/Response, EmbedRequest/Response, TagsResponse,
    PsResponse, ModelInfo, RunningModel — matching Ollama's native JSON API exactly.
    Polymorphic fields (think, keep_alive, tools, options) use `json.RawMessage`
    for transparent passthrough fidelity.
  - `Client` interface (`client.go`): Chat (stream/non-stream), Embed, Tags, Ps,
    RawChat, RawEmbed. RawChat/RawEmbed return `*http.Response` for zero-copy
    streaming passthrough.
  - `httpClient` implementation: auth token injection, NDJSON streaming via
    `bufio.Scanner` with 4 MB buffer, connection vs HTTP error classification.
  - Custom error types (`errors.go`): `*ConnectionError` for network failures
    (retry-eligible), `*HTTPError` for non-2xx responses. `errors.Is`/`errors.As`
    compatible.
  - `ModelInventory` (`inventory.go`): mutex-protected in-memory cache of installed
    and running models. Methods: Models(), HasModel(), ResidentModels(), LastPoll(),
    Degraded(), Refresh(). Background `Start()` goroutine polls at
    `FOREMAN_POLL_INTERVAL` (default 30s). On target unreachable: retains last-known
    inventory, sets `degraded=true`. Clears degraded on recovery.
- `internal/server/` — new Ollama passthrough routes:
  - `GET /api/tags` — serves poller's cached model list
  - `GET /api/ps` — serves poller's cached running models
  - `POST /api/embed`, `POST /api/embeddings` — direct concurrent proxy to target,
    bypasses the chat gate entirely (ADR-0013)
  - `POST /api/chat` — critical path: validates model (re-poll on miss, 404 if
    still absent), serializes through a capacity-1 channel gate, proxies to target
    with NDJSON streaming (`application/x-ndjson`, flushed per chunk) or
    non-streaming JSON passthrough
  - `GET /healthz` — now wired to `inventory.Degraded()` for real target status
- `cmd/foreman/main.go` — full serve wiring:
  - Creates Ollama client, starts model poller goroutine, warms embedder
    (`keep_alive: -1`), creates server with all dependencies, signal-based
    graceful shutdown via `context.NotifyContext`
- Tests (all passing with `-race`):
  - Client: tags/ps parsing, chat streaming + non-streaming, embed, auth token
    forwarding, `*ConnectionError` on unreachable target, `*HTTPError` on non-2xx
  - Inventory: refresh populates models, degraded on failure, model retention,
    recovery from degraded, Start/cancel lifecycle
  - Server: tags/ps passthrough, model validation (404 on unknown), non-streaming
    chat proxy, NDJSON streaming passthrough with correct Content-Type, chat
    serialization (gate holds concurrent requests to max 1 in-flight), concurrent
    embed bypass (multiple requests run in parallel), degraded health endpoint,
    embeddings alias path

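The NDJSON handling mentioned for `httpClient` relies on raising `bufio.Scanner`'s line limit, since the default cap is 64 KB per token. The sketch below shows the idea under stated assumptions: `chatChunk` is a hypothetical, trimmed-down stand-in for the real wire type, and `readNDJSON`, `countChunks`, and `paddedLine` are illustrative names only.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chatChunk is a hypothetical, reduced view of one streamed chat line.
type chatChunk struct {
	Model string `json:"model"`
	Done  bool   `json:"done"`
}

const maxLine = 4 * 1024 * 1024 // allow NDJSON lines up to 4 MB

// readNDJSON decodes one JSON object per line and calls fn for each chunk.
func readNDJSON(r io.Reader, fn func(chatChunk) error) error {
	sc := bufio.NewScanner(r)
	// Start with a small buffer and let it grow up to maxLine.
	sc.Buffer(make([]byte, 0, 64*1024), maxLine)
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 {
			continue // tolerate blank lines between chunks
		}
		var c chatChunk
		if err := json.Unmarshal(line, &c); err != nil {
			return fmt.Errorf("decode chunk: %w", err)
		}
		if err := fn(c); err != nil {
			return err
		}
	}
	return sc.Err()
}

// countChunks reports how many chunks body holds and the last chunk's done flag.
func countChunks(body string) (n int, done bool, err error) {
	err = readNDJSON(strings.NewReader(body), func(c chatChunk) error {
		n++
		done = c.Done
		return nil
	})
	return n, done, err
}

// paddedLine builds one NDJSON line whose model field is n bytes long (demo helper).
func paddedLine(n int) string {
	return `{"model":"` + strings.Repeat("x", n) + `","done":true}` + "\n"
}

func main() {
	n, done, err := countChunks(`{"model":"m","done":false}` + "\n" + paddedLine(200*1024))
	fmt.Println(n, done, err)
}
```

Without the `sc.Buffer` call, the 200 KB line in `main` would fail with `bufio.ErrTooLong`.
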
The Mac is now usable as a go-llm target through foreman:
`llm.OllamaCloud(token, WithBaseURL("http://foreman:8080"))` works transparently
for chat (streaming + non-streaming), tags, ps, and embeddings.

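The degraded-mode behaviour of the Phase 2 model poller (keep the last-known inventory, flag `degraded`, clear it on recovery) might look roughly like this. It is a simplified sketch: `inventory`, `refresh`, `okFetch`, and `failFetch` are hypothetical stand-ins, and the fetch callback takes the place of the real Tags call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// inventory is a simplified stand-in for the ModelInventory cache.
type inventory struct {
	mu       sync.RWMutex
	models   []string
	degraded bool
}

// refresh calls fetch and updates the cache. On failure the previous
// model list is kept and the degraded flag is set.
func (inv *inventory) refresh(fetch func() ([]string, error)) error {
	models, err := fetch()
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if err != nil {
		inv.degraded = true
		return err
	}
	inv.models = models
	inv.degraded = false
	return nil
}

// hasModel reports whether name is in the cached list.
func (inv *inventory) hasModel(name string) bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	for _, m := range inv.models {
		if m == name {
			return true
		}
	}
	return false
}

// isDegraded reports whether the last refresh failed.
func (inv *inventory) isDegraded() bool {
	inv.mu.RLock()
	defer inv.mu.RUnlock()
	return inv.degraded
}

// okFetch and failFetch are demo fetchers standing in for the target.
func okFetch(models ...string) func() ([]string, error) {
	return func() ([]string, error) { return models, nil }
}

func failFetch() func() ([]string, error) {
	return func() ([]string, error) { return nil, errors.New("target unreachable") }
}

func main() {
	inv := &inventory{}
	_ = inv.refresh(okFetch("qwen3:14b"))
	_ = inv.refresh(failFetch())
	fmt.Println(inv.isDegraded(), inv.hasModel("qwen3:14b"))
}
```
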
## Phase 3: Durable queue, single worker, drain-by-model — 2026-05-23

**M0 complete.** The Phase 2 in-flight chat gate (buffered channel) is replaced
with the real SQLite-backed job queue and single worker loop.

- `internal/store/` — new store methods:
  - `NextJob(currentModel)`: drain-by-model ordering — prefers jobs matching the
    currently-resident model to minimize swap cost, then FIFO by created_at.
  - `IncrementAttempt(id)`: bumps attempt counter and re-queues for retry.
  - `ResetInterruptedJobs()`: resets loading/working jobs to queued on startup
    (crash recovery).
  - `DeleteTerminalJobsBefore(cutoff)`: TTL pruner for old done/failed jobs.
  - SQLite DSN now includes `_pragma=busy_timeout(5000)` for reliable concurrent
    access from HTTP handlers + worker.

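The drain-by-model ordering used by `NextJob` can be illustrated in memory. This is only a sketch of the ordering rule, not the store's SQL: the `job` struct, `nextJob`, and `sampleQueue` are hypothetical, and the real method works against the `jobs` table.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// job is a reduced, hypothetical view of a queued row.
type job struct {
	ID        string
	Model     string
	CreatedAt time.Time
}

// nextJob picks the next queued job: jobs for currentModel come first,
// and within each group the oldest job wins (FIFO by CreatedAt).
// It returns false when the queue is empty.
func nextJob(queued []job, currentModel string) (job, bool) {
	if len(queued) == 0 {
		return job{}, false
	}
	sorted := make([]job, len(queued))
	copy(sorted, queued)
	sort.SliceStable(sorted, func(i, j int) bool {
		mi := sorted[i].Model == currentModel
		mj := sorted[j].Model == currentModel
		if mi != mj {
			return mi // matching jobs sort ahead of non-matching ones
		}
		return sorted[i].CreatedAt.Before(sorted[j].CreatedAt)
	})
	return sorted[0], true
}

// sampleQueue returns three queued jobs, one second apart (demo data).
func sampleQueue() []job {
	t0 := time.Date(2026, 5, 23, 12, 0, 0, 0, time.UTC)
	return []job{
		{ID: "a", Model: "qwen3:14b", CreatedAt: t0},
		{ID: "b", Model: "qwen3:30b", CreatedAt: t0.Add(time.Second)},
		{ID: "c", Model: "qwen3:14b", CreatedAt: t0.Add(2 * time.Second)},
	}
}

func main() {
	j, _ := nextJob(sampleQueue(), "qwen3:30b")
	fmt.Println(j.ID) // the resident model's job jumps the queue
}
```

In SQL, one plausible way to express the same rule is `ORDER BY (model = ?) DESC, created_at ASC LIMIT 1`; whether the store uses that exact form is an assumption.
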
- `internal/worker/` — single worker loop (`worker.go`):
  - `Worker.Run(ctx)`: main goroutine loop — resets interrupted jobs on startup,
    then continuously picks the next job using drain-by-model ordering, executes
    via the Ollama client, stores result + completion artifact, notifies waiters.
  - `Worker.Wake()`: non-blocking signal for new job availability.
  - `Notifier`: sync.Map-based completion notification — HTTP handlers register
    a channel per job ID, the worker closes it on completion. Supports
    `Register()`, `Complete()`, `Result()`.
  - Retry semantics: `*ollama.ConnectionError` causes re-queue with incremented
    attempt; `*ollama.HTTPError` is a terminal failure (no retry). Max attempts
    configurable via `FOREMAN_MAX_ATTEMPTS` (default 3).
  - The worker loop never panics — all errors are logged, jobs are marked, loop
    continues.

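The retry semantics above boil down to a small decision: connection errors re-queue until the attempt budget runs out, everything else fails the job. A hedged sketch follows; `connectionError` and `httpError` are local stand-ins for `*ollama.ConnectionError` and `*ollama.HTTPError`, and treating `attempt` as "attempts made so far, including the one that just failed" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// connectionError stands in for *ollama.ConnectionError (network failure).
type connectionError struct{ Err error }

func (e *connectionError) Error() string { return "connection: " + e.Err.Error() }
func (e *connectionError) Unwrap() error { return e.Err }

// httpError stands in for *ollama.HTTPError (non-2xx response).
type httpError struct{ Status int }

func (e *httpError) Error() string { return fmt.Sprintf("http status %d", e.Status) }

// nextState decides the job state after a failed attempt.
// attempt counts attempts made so far, including the failed one (assumption).
func nextState(err error, attempt, maxAttempts int) string {
	var ce *connectionError
	if errors.As(err, &ce) && attempt < maxAttempts {
		return "queued" // retry-eligible and budget remains
	}
	return "failed" // HTTP errors, unknown errors, or budget exhausted
}

// wrapConn and newHTTP build demo errors; wrapConn adds a wrapping layer
// to show that errors.As still finds the underlying type.
func wrapConn(msg string) error {
	return fmt.Errorf("chat: %w", &connectionError{Err: errors.New(msg)})
}

func newHTTP(status int) error { return &httpError{Status: status} }

func main() {
	fmt.Println(nextState(wrapConn("connection refused"), 1, 3))
	fmt.Println(nextState(wrapConn("connection refused"), 3, 3))
	fmt.Println(nextState(newHTTP(500), 1, 3))
}
```
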
- `internal/server/` — chat handler rewrite:
  - `POST /api/chat` now creates a job row (state `queued`), registers a
    completion waiter, wakes the worker, and blocks until the job reaches a
    terminal state. Returns the Ollama response on success, 502 on failure.
  - ULID job IDs generated at submission time (`github.com/oklog/ulid/v2`).
  - The old `chatGate` (buffered channel) is removed entirely.
  - `/api/embed` and `/api/embeddings` remain direct concurrent proxies (unchanged
    from Phase 2, per ADR-0013).

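The register-then-block pattern the chat handler uses against the notifier could be sketched like this. It is an illustration under assumptions, not the `internal/worker` code: `notifier`, `register`, `complete`, `wait`, and `waitMillis` are hypothetical names, and result storage (`Result()`) is left out.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// notifier is a simplified stand-in for the sync.Map-based Notifier.
type notifier struct {
	waiters sync.Map // job ID -> chan struct{}
}

// register returns a channel that is closed when the job completes.
func (n *notifier) register(jobID string) <-chan struct{} {
	actual, _ := n.waiters.LoadOrStore(jobID, make(chan struct{}))
	return actual.(chan struct{})
}

// complete closes and removes the job's channel; unknown IDs are ignored.
func (n *notifier) complete(jobID string) {
	if ch, ok := n.waiters.LoadAndDelete(jobID); ok {
		close(ch.(chan struct{}))
	}
}

// wait blocks until done is closed or ctx ends (client disconnect, timeout).
func wait(ctx context.Context, done <-chan struct{}) error {
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// waitMillis is a demo helper that waits with a timeout in milliseconds.
func waitMillis(done <-chan struct{}, ms int) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()
	return wait(ctx, done)
}

func main() {
	n := &notifier{}
	done := n.register("job-1") // handler side: register before waking the worker
	go n.complete("job-1")      // worker side: signal completion
	fmt.Println(waitMillis(done, 1000))
}
```

Registering before waking the worker avoids a race where the job finishes before anyone is listening.
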
- `internal/config/` — new config fields:
  - `FOREMAN_MAX_ATTEMPTS` (int, default 3)
  - `FOREMAN_JOB_TTL` (duration, default 24h)

- Tests (all passing with `-race`):
  - Worker: single job execution, serial enforcement, drain-by-model ordering,
    retry on connection error, max attempts exhaustion, HTTP error terminal
    failure, interrupted job reset on startup, wake signal, notifier lifecycle.
  - Store: NextJob drain-by-model, empty queue, IncrementAttempt, ResetInterrupted,
    DeleteTerminalJobsBefore.
  - Server: chat model validation (404), non-streaming chat through queue,
    serialization (max 1 concurrent), context cancellation, embed bypass unchanged.

## Phase 4: Async /jobs surface, webhooks, artifacts — 2026-05-23

**M1 core complete** (minus CLI and go-llm constructor, which are separate work).

- `internal/webhook/` — webhook dispatcher:
  - `Dispatcher.Fire(url, event)`: non-blocking goroutine delivery with
    exponential backoff retry (1s, 2s, 4s, 8s, 16s — max 5 attempts).
  - Optional HMAC-SHA256 signing via `FOREMAN_WEBHOOK_SECRET` — sets
    `X-Foreman-Signature: sha256=<hex>` header.
  - `VerifySignature()`: exported for webhook receivers.
  - `FormatArtifacts()`: inline (data field) for artifacts <= 256KB, URL reference
    for larger ones.
  - Webhook failures are logged and dropped — never block or fail the job
    (ADR-0005).

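For webhook receivers, the `X-Foreman-Signature: sha256=<hex>` header can be produced and checked along these lines. This is a sketch under assumptions: it signs the raw request body, and `sign` and `verify` are hypothetical names, not the signatures of the dispatcher or `VerifySignature()`.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const sigPrefix = "sha256="

// sign returns the header value for body: "sha256=" + hex(HMAC-SHA256(secret, body)).
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return sigPrefix + hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether header is a valid signature of body under secret.
// hmac.Equal compares in constant time.
func verify(secret, body []byte, header string) bool {
	if !strings.HasPrefix(header, sigPrefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, sigPrefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"job_id":"example","state":"done"}`)
	header := sign(secret, body)
	fmt.Println(verify(secret, body, header))
	fmt.Println(verify(secret, []byte("tampered"), header))
}
```
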
- `internal/server/` — new routes:
  - `POST /jobs`: validates model, creates job row with optional
    `state_webhook_url`, returns `202 Accepted` with `{"job_id":"<ulid>"}`.
    Fires initial "queued" webhook. Wakes worker.
  - `GET /jobs/{id}`: returns full job state, result, error, and artifact
    metadata. 404 for unknown IDs. Artifacts up to 256KB are inlined; larger
    ones get a URL reference.
  - `GET /jobs/{id}/artifacts/{name}`: serves raw artifact data with stored
    content type. 404 for unknown job/artifact.

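The inline-versus-URL rule for artifacts might be sketched as follows. Several details here are assumptions rather than facts from the codebase: 256KB is read as 256 × 1024 bytes, inline data is shown base64-encoded, the URL is a relative path mirroring the artifact route, and `artifact`, `artifactView`, and `formatArtifact` are hypothetical names.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// inlineLimit is the size up to which artifact data is inlined (assumed bytes).
const inlineLimit = 256 * 1024

// artifact is a reduced, hypothetical view of a stored artifact.
type artifact struct {
	Name        string
	ContentType string
	Data        []byte
}

// artifactView is what a job response might carry per artifact:
// exactly one of Data or URL is set for non-empty artifacts.
type artifactView struct {
	Name        string
	ContentType string
	Data        string // base64 payload when inlined (assumed encoding)
	URL         string // reference for large artifacts
}

// formatArtifact inlines small artifacts and references large ones by URL.
func formatArtifact(jobID string, a artifact) artifactView {
	v := artifactView{Name: a.Name, ContentType: a.ContentType}
	if len(a.Data) <= inlineLimit {
		v.Data = base64.StdEncoding.EncodeToString(a.Data)
	} else {
		v.URL = "/jobs/" + jobID + "/artifacts/" + a.Name
	}
	return v
}

func main() {
	small := formatArtifact("example-job", artifact{Name: "completion", Data: []byte("hi")})
	large := formatArtifact("example-job", artifact{Name: "dump", Data: make([]byte, inlineLimit+1)})
	fmt.Println(small.Data != "", small.URL == "")
	fmt.Println(large.Data == "", large.URL)
}
```
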
- `docs/adr/0014-no-webhooks-on-sync-chat.md`:
  - `state_webhook_url` is only honored on `POST /jobs`. Sync `/api/chat` does
    not fire webhooks (ADR-0014). Rationale: the caller already holds a blocking
    HTTP connection.

- `cmd/foreman/main.go` — full serve wiring:
  - Creates webhook dispatcher, notifier, worker.
  - Starts worker loop goroutine and TTL pruner goroutine.
  - TTL pruner runs every `jobTTL/4` (min 1 minute), deletes terminal jobs
    older than `FOREMAN_JOB_TTL` (default 24h).
  - Server constructor now receives notifier, worker, and dispatcher.

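The pruner's schedule arithmetic is small enough to show directly: the interval is a quarter of the TTL, floored at one minute, and the deletion cutoff is "now minus TTL". A minimal sketch, with `pruneInterval`, `cutoff`, and `intervalSeconds` as hypothetical helper names:

```go
package main

import (
	"fmt"
	"time"
)

// pruneInterval returns jobTTL/4, but never less than one minute.
func pruneInterval(jobTTL time.Duration) time.Duration {
	interval := jobTTL / 4
	if interval < time.Minute {
		interval = time.Minute
	}
	return interval
}

// cutoff returns the time before which terminal jobs are eligible for deletion.
func cutoff(now time.Time, jobTTL time.Duration) time.Time {
	return now.Add(-jobTTL)
}

// intervalSeconds is a demo helper working in whole seconds.
func intervalSeconds(ttlSeconds int) int {
	ttl := time.Duration(ttlSeconds) * time.Second
	return int(pruneInterval(ttl) / time.Second)
}

func main() {
	ttl := 24 * time.Hour // the documented FOREMAN_JOB_TTL default
	now := time.Date(2026, 5, 23, 12, 0, 0, 0, time.UTC)
	fmt.Println(pruneInterval(ttl))
	fmt.Println(cutoff(now, ttl).Format(time.RFC3339))
}
```

With the default 24h TTL this gives a 6h interval, so a terminal job is removed somewhere between 24h and 30h after it finished.
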
- Tests (all passing with `-race`):
  - Jobs API: 202 on submit, ULID format, 404 for unknown model, 400 for
    missing model, 404 for unknown job, job state after completion, artifact
    retrieval, artifact 404.
  - Webhooks: full lifecycle events (queued->working->done), 500-returning
    receiver does not affect job state, HMAC signature verification.
  - Webhook dispatcher: delivery, retry on 500, non-blocking Fire, HMAC signing,
    no HMAC when no secret, signature format validation.
  - Artifacts: small inline, large by URL, empty returns nil.
  - TTL pruner: deletes old terminal jobs.

## Phase 5: Go client package + go-llm Foreman() constructor — 2026-05-23

**Level 0 + Level 1 integration complete** (ADR-0011).

- `client/` — public Go client package (sync facade over async `/jobs` API):
  - `client.New(baseURL, opts...)`: configurable client with bearer auth,
    webhook secret, custom HTTP client, poll interval.
  - `client.Submit(ctx, SubmitRequest) (*Result, error)`: synchronous
    submission — blocks until the job reaches a terminal state (`done`/`failed`).
  - **Two delivery modes:**
    - **Webhook receiver (preferred):** starts an ephemeral HTTP server on a
      random port, sets `state_webhook_url`, waits for the `done`/`failed`
      webhook event. Verifies HMAC signature when `WithWebhookSecret` is set.
      Falls back to polling automatically if the listener fails to bind.
    - **Polling fallback:** polls `GET /jobs/{id}` at `pollInterval` (default
      2s) until terminal state. Forced via `WithPollingMode()`.
  - `client.Tags(ctx)`: fetches installed models via `GET /api/tags`.
  - `client.Embed(ctx, EmbedRequest)`: sends embedding requests via
    `POST /api/embed` (bypasses queue, ADR-0013).
  - Both modes respect context cancellation/deadline and clean up resources.

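The polling fallback described for `client.Submit` follows a familiar shape: fetch, check for a terminal state, otherwise wait for the next tick or for the context to end. The sketch below abstracts the HTTP call behind a callback; `result`, `pollUntilTerminal`, `scriptedFetch`, and `pollMillis` are hypothetical names and not the `client` package's API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// result is a reduced, hypothetical view of a job's status.
type result struct {
	State string
	Error string
}

// terminal reports whether a job state ends the wait.
func terminal(state string) bool { return state == "done" || state == "failed" }

// pollUntilTerminal calls fetch immediately and then once per interval
// until the job is terminal, fetch fails, or ctx ends.
func pollUntilTerminal(ctx context.Context, interval time.Duration,
	fetch func(context.Context) (result, error)) (result, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		res, err := fetch(ctx)
		if err != nil {
			return result{}, err
		}
		if terminal(res.State) {
			return res, nil
		}
		select {
		case <-ctx.Done():
			return result{}, ctx.Err()
		case <-ticker.C:
		}
	}
}

// scriptedFetch is a demo fetcher that walks through the given states
// and then keeps returning the last one. It needs at least one state.
func scriptedFetch(states ...string) func(context.Context) (result, error) {
	i := 0
	return func(context.Context) (result, error) {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return result{State: s}, nil
	}
}

// pollMillis is a demo helper taking the interval and timeout in milliseconds.
func pollMillis(intervalMs, timeoutMs int, fetch func(context.Context) (result, error)) (result, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	return pollUntilTerminal(ctx, time.Duration(intervalMs)*time.Millisecond, fetch)
}

func main() {
	res, err := pollMillis(1, 1000, scriptedFetch("queued", "working", "done"))
	fmt.Println(res.State, err)
}
```

A failed job is returned as a result rather than an error here, matching the "Failed job: returns Result with state=failed" behaviour listed in the tests.
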
- Tests (all passing with `-race`):
  - Happy path (polling): submit, poll, verify completed result + artifacts.
  - Happy path (webhook): submit with webhook receiver, verify push delivery.
  - Failed job: returns Result with state=failed and error message.
  - Context timeout: returns error on deadline exceeded.
  - Auth: bearer token sent when configured; 401 without it.
  - HMAC webhook verification: signed webhooks verified correctly.
  - Tags and Embed endpoints: round-trip through the client.
  - Missing model validation: returns error before network call.

- go-llm integration (Level 0):
  - `llm.Foreman(baseURL, apiKey, opts...)` constructor added to
    `v2/constructors.go` on branch `feat/foreman-constructor`.
  - Delegates to existing `ollamaProvider.New()` — zero new code paths.
  - DD#9 added to `v2/CLAUDE.md`.
  - PR: https://gitea.stevedudenhoeffer.com/steve/go-llm/pulls/4

## Phase 6: Deployment infrastructure — 2026-05-23

**Project is deployable.** All deployment artifacts are in place.

- `Dockerfile` — finalized with OCI labels (`image.source`, `image.description`).
  Multi-stage distroless build, CGO_ENABLED=0, `foreman serve` entrypoint.
- `.env.example` — finalized with all 10 config keys from `internal/config/`,
  grouped by function (daemon, model, persistence, polling, webhooks, job lifecycle)
  with clear comments and example values.
- `scripts/pull-models.sh` — executable helper to pull the recommended model roster
  on the Mac (nomic-embed-text, qwen3:14b, qwen3:30b). Prints Mac-side Ollama
  environment setup instructions.
- `docs/deploy.md` — full deployment guide covering: topology overview, Mac
  prerequisites (Ollama config, env vars, model pull, sleep prevention, firewall),
  orgrimmar deployment (image registry, Komodo, config, persistence), security
  model (internal-only, no public DNS, bearer tokens, HMAC), go-llm usage
  (sync + async), and troubleshooting (6 common scenarios).
- steveternet compose stack — PR to `steve/steveternet` adding
  `azeroth/kalimdor/orgrimmar/foreman/docker-compose.yml` and `.env.example`.
  Follows sibling conventions: `web` network (external), `unless-stopped`,
  gitea registry image, Traefik labels for internal routing, named volume
  for SQLite persistence, all config via `${VAR}` interpolation.