feat: add durable queue, single worker, and drain-by-model scheduling

Replace the Phase 2 in-flight chat gate (buffered channel) with a real
SQLite-backed job queue and single worker loop. Every /api/chat request now
creates a job row, blocks until the worker completes it, and returns the
result transparently.

Key changes:
- internal/store: NextJob (drain-by-model ordering), IncrementAttempt,
  ResetInterruptedJobs, DeleteTerminalJobsBefore; busy_timeout pragma
- internal/worker: single-threaded worker loop with Notifier for sync
  handler completion signaling; retry on ConnectionError, terminal fail on
  HTTPError; crash recovery resets interrupted jobs on startup
- internal/webhook: dispatcher infrastructure for async webhook delivery
- internal/server: chat handler rewritten to enqueue+wait; old chatGate
  removed; embeddings remain direct concurrent proxies (ADR-0013)
- internal/config: FOREMAN_MAX_ATTEMPTS, FOREMAN_JOB_TTL

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -66,3 +66,54 @@
The Mac is now usable as a go-llm target through foreman:
`llm.OllamaCloud(token, WithBaseURL("http://foreman:8080"))` works transparently
for chat (streaming + non-streaming), tags, ps, and embeddings.

## Phase 3: Durable queue, single worker, drain-by-model — 2026-05-23

**M0 complete.** The Phase 2 in-flight chat gate (buffered channel) is replaced
with the real SQLite-backed job queue and single worker loop.

- `internal/store/` — new store methods:
  - `NextJob(currentModel)`: drain-by-model ordering. It prefers jobs matching
    the currently resident model to minimize swap cost, then falls back to FIFO
    by `created_at`.
  - `IncrementAttempt(id)`: bumps the attempt counter and re-queues the job for retry.
  - `ResetInterruptedJobs()`: on startup, resets jobs left in `loading` or
    `working` back to `queued` (crash recovery).
  - `DeleteTerminalJobsBefore(cutoff)`: TTL pruner that deletes old `done`/`failed`
    jobs from before `cutoff`.
  - The SQLite DSN now includes `_pragma=busy_timeout(5000)`, so when HTTP
    handlers and the worker contend for a lock, the waiting side retries for up
    to 5 s instead of failing immediately with `SQLITE_BUSY`.

- `internal/worker/` — single worker loop (`worker.go`):
  - `Worker.Run(ctx)`: the main worker goroutine loop. On startup it resets
    interrupted jobs. It then repeatedly picks the next job using drain-by-model
    ordering, executes it via the Ollama client, stores the result and completion
    artifact, and notifies waiters.
  - `Worker.Wake()`: non-blocking signal that a new job is available.
  - `Notifier`: `sync.Map`-based completion notification. HTTP handlers register
    a channel per job ID, and the worker closes it on completion. Supports
    `Register()`, `Complete()`, and `Result()`.
  - Retry semantics: a `*ollama.ConnectionError` re-queues the job with an
    incremented attempt count; a `*ollama.HTTPError` is a terminal failure (no
    retry). Max attempts is configurable via `FOREMAN_MAX_ATTEMPTS` (default 3).
  - The worker loop never panics. Every error is logged, the affected job is
    re-queued or marked failed, and the loop continues.

- `internal/server/` — chat handler rewrite:
  - `POST /api/chat` now creates a job row (state `queued`), registers a
    completion waiter, wakes the worker, and blocks until the job reaches a
    terminal state. It returns the Ollama response on success and 502 on failure.
  - ULID job IDs are generated at submission time (`github.com/oklog/ulid/v2`).
  - The old `chatGate` (buffered channel) is removed entirely.
  - `/api/embed` and `/api/embeddings` remain direct concurrent proxies
    (unchanged from Phase 2, per ADR-0013).

- `internal/config/` — new config fields:
  - `FOREMAN_MAX_ATTEMPTS` (int, default 3)
  - `FOREMAN_JOB_TTL` (duration, default 24h)

- Tests (all passing with `-race`):
  - Worker: single job execution, serial enforcement, drain-by-model ordering,
    retry on connection error, max attempts exhaustion, HTTP error terminal
    failure, interrupted job reset on startup, wake signal, notifier lifecycle.
  - Store: `NextJob` drain-by-model, empty queue, `IncrementAttempt`,
    `ResetInterruptedJobs`, `DeleteTerminalJobsBefore`.
  - Server: chat model validation (404), non-streaming chat through the queue,
    serialization (max 1 concurrent), context cancellation, embed bypass unchanged.
