0526bada90
Commit pre-existing uncommitted working-tree changes that predate the license/public-readiness work — NOT authored in this session, just flushed so they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013 (embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and the phase-0..3 prompt revisions + prompts/README.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# phase-3.md — Durable queue, single worker, drain-by-model

Re-ground: `CLAUDE.md` + ADR-0009 (single worker / drain-by-model), ADR-0013
(embeddings bypass: embeddings must NOT be touched here), ADR-0008 (queue), and
ADR-0004 (lifecycle/retry). Plan, get approval, implement.

## Objective

Replace Phase 2's interim single-flight chat gate with the real SQLite queue and
one worker, with drain-by-model scheduling. The synchronous passthrough now
enqueues and blocks on completion instead of holding a direct gate.
`/api/embed` stays exactly as Phase 2 built it: direct, concurrent, never
queued (ADR-0013). Do not route embeddings through any of this.
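The enqueue-and-block shape described above might look roughly like the sketch below. It is an in-memory stand-in only: the real queue is SQLite-backed, and the `job`, `queue`, `newQueue`, `chat`, and `embed` names are hypothetical, not taken from the codebase.

```go
package main

import "fmt"

// job is a hypothetical in-memory stand-in for a persisted `jobs` row.
type job struct {
	model  string
	prompt string
	reply  chan string // receives the result once the job is terminal
}

// queue serializes chat work through a single worker goroutine.
type queue struct {
	jobs chan *job
}

// newQueue starts the one worker. run is a placeholder for the call to Ollama.
func newQueue(run func(model, prompt string) string) *queue {
	q := &queue{jobs: make(chan *job, 16)}
	go func() {
		for j := range q.jobs { // concurrency 1: one job at a time
			j.reply <- run(j.model, j.prompt)
		}
	}()
	return q
}

// chat is what the synchronous /api/chat handler would do: enqueue, then
// block until the worker has finished the job.
func (q *queue) chat(model, prompt string) string {
	j := &job{model: model, prompt: prompt, reply: make(chan string, 1)}
	q.jobs <- j
	return <-j.reply
}

// embed never touches the queue (ADR-0013); it calls the target directly.
func embed(call func(input string) string, input string) string {
	return call(input)
}

func main() {
	q := newQueue(func(model, prompt string) string {
		return model + " answered: " + prompt
	})
	fmt.Println(q.chat("model-a", "hello"))
	fmt.Println(embed(func(in string) string { return "vector for " + in }, "hello"))
}
```

The caller sees the same synchronous behaviour as in Phase 2; only the mechanism behind the block changes from a gate to a queued job.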

## Tasks

- Promote chat requests to persisted jobs: every `/api/chat` call creates a `jobs`
  row (state `queued`), and the handler blocks until that job reaches a terminal
  state, then writes the response. Assign a **ULID** as the job id now (used
  everywhere in Phase 4).
- `internal/worker`: a single worker loop (concurrency 1). Select the next job
  with `ORDER BY (model != :current_resident), created_at` so all jobs for the
  currently-resident model (from `/api/ps`) drain before a swap. Transition
  `queued→loading→working→done`. Pin residency with Ollama `keep_alive`.
- Retry semantics (ADR-0004): a connection failure to the target re-queues the
  job with backoff and increments `attempt`; exceeding a bounded max moves it to
  `failed` with the last error stored. Never auto-fail on a single transient
  error. Jobs survive process restart (resume `queued`/in-flight on boot).
- Tests: against the stub Ollama, jobs persist and execute serially; a sequence
  mixing two models drains by model (assert the swap happens once, not per job);
  a flapping target causes retry-then-success without data loss; restart mid-queue
  resumes cleanly.
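To make the drain-by-model ordering concrete, here is a minimal sketch that mirrors `ORDER BY (model != :current_resident), created_at` in plain Go, with no database involved. The `Job` fields, `nextJob`, and `drain` are illustrative assumptions, and `CreatedAt` is assumed to be a sortable integer timestamp.

```go
package main

import (
	"fmt"
	"sort"
)

// Job is a hypothetical view of a queued `jobs` row.
type Job struct {
	ID        string
	Model     string
	CreatedAt int64 // assumed: Unix milliseconds
}

// nextJob mirrors the ORDER BY clause: jobs for the resident model sort
// first (the comparison is false, i.e. 0), then oldest first.
func nextJob(queued []Job, resident string) (Job, bool) {
	if len(queued) == 0 {
		return Job{}, false
	}
	sorted := append([]Job(nil), queued...)
	sort.SliceStable(sorted, func(i, j int) bool {
		iOther := sorted[i].Model != resident
		jOther := sorted[j].Model != resident
		if iOther != jOther {
			return !iOther
		}
		return sorted[i].CreatedAt < sorted[j].CreatedAt
	})
	return sorted[0], true
}

// drain simulates the single worker loop and counts model swaps.
func drain(queued []Job, resident string) (order []string, swaps int) {
	pending := append([]Job(nil), queued...)
	for {
		j, ok := nextJob(pending, resident)
		if !ok {
			return order, swaps
		}
		if j.Model != resident {
			swaps++
			resident = j.Model
		}
		order = append(order, j.ID)
		for i := range pending {
			if pending[i].ID == j.ID {
				pending = append(pending[:i], pending[i+1:]...)
				break
			}
		}
	}
}

func main() {
	jobs := []Job{
		{ID: "a", Model: "llama", CreatedAt: 1},
		{ID: "b", Model: "qwen", CreatedAt: 2},
		{ID: "c", Model: "llama", CreatedAt: 3},
		{ID: "d", Model: "qwen", CreatedAt: 4},
	}
	order, swaps := drain(jobs, "llama")
	fmt.Println(order, swaps) // prints "[a c b d] 1"
}
```

With interleaved arrivals, both `llama` jobs run before the single swap to `qwen`, which is the property the mixed-model test is meant to assert.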
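The retry rule in the task list could be expressed as a small pure function. This is a sketch under assumptions: the bound of 5 attempts, the exponential schedule, the 1s/1min delay limits, and the `RetryAfter` field are all illustrative, since ADR-0004 is not quoted here.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical limits; the actual values belong to ADR-0004.
const (
	maxAttempts = 5
	baseDelay   = time.Second
	maxDelay    = time.Minute
)

// Job holds only the fields the retry decision needs (names assumed).
type Job struct {
	State      string
	Attempt    int
	LastError  string
	RetryAfter time.Duration
}

// backoff doubles the delay per attempt, capped at maxDelay.
func backoff(attempt int) time.Duration {
	d := baseDelay
	for i := 1; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// onConnFailure re-queues the job with backoff, or marks it failed once
// the attempt count exceeds the bound. The last error is always stored.
func onConnFailure(j Job, cause string) Job {
	j.Attempt++
	j.LastError = cause
	if j.Attempt > maxAttempts {
		j.State = "failed"
		j.RetryAfter = 0
		return j
	}
	j.State = "queued"
	j.RetryAfter = backoff(j.Attempt)
	return j
}

func main() {
	j := Job{State: "working"}
	for j.State != "failed" {
		j = onConnFailure(j, "dial tcp: connection refused")
		fmt.Println(j.Attempt, j.State, j.RetryAfter)
	}
}
```

A single transient error therefore only re-queues the job; it takes more than `maxAttempts` consecutive failures to reach `failed`.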
## Definition of done

- `go build/vet/test -race` green.
- The Phase 2 acceptance (go-llm completes a chat) still passes, now served
  through the queue.
- Demonstrable: enqueue several jobs across two models and observe drain-by-model
  ordering in logs; kill and restart foreman mid-queue and watch it resume.
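For the kill-and-restart check, one plausible boot-time recovery step is to put interrupted jobs back into `queued`. The sketch below assumes that policy; the prompt only says in-flight jobs resume, so `resumeOnBoot` and its reset rule are hypothetical, as is the choice not to count a restart as a retry attempt.

```go
package main

import "fmt"

// Job is a hypothetical view of a persisted `jobs` row.
type Job struct {
	ID    string
	State string
}

// resumeOnBoot returns a copy in which jobs interrupted mid-flight
// ("loading" or "working") are back in "queued". Terminal and already
// queued jobs are left untouched. In SQL this would be a single UPDATE
// over the in-flight states (column names assumed).
func resumeOnBoot(jobs []Job) []Job {
	out := make([]Job, len(jobs))
	copy(out, jobs)
	for i := range out {
		switch out[i].State {
		case "loading", "working":
			out[i].State = "queued"
		}
	}
	return out
}

func main() {
	before := []Job{
		{ID: "a", State: "done"},
		{ID: "b", State: "working"},
		{ID: "c", State: "queued"},
	}
	for _, j := range resumeOnBoot(before) {
		fmt.Println(j.ID, j.State)
	}
}
```

After this step the worker loop needs no special restart path: it simply selects from `queued` as usual.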

Wrap up: `progress.md`, commit on `phase-3-queue`. M0 is effectively complete here;
note that. Phase 4 adds the async surface on top of this same engine.