foreman/prompts/phase-3.md
steve 0526bada90 docs: land prior ADR + prompt updates
Commit pre-existing uncommitted working-tree changes that predate the
license/public-readiness work — NOT authored in this session, just flushed so
they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013
(embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and
the phase-0..3 prompt revisions + prompts/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 20:33:39 -04:00


phase-3.md — Durable queue, single worker, drain-by-model

Re-ground: CLAUDE.md, ADR-0009 (single worker / drain-by-model), ADR-0013 (embeddings bypass; embeddings must NOT be touched here), ADR-0008 (queue), and ADR-0004 (lifecycle/retry). Plan, get approval, then implement.

Objective

Replace Phase 2's interim single-flight chat gate with the real SQLite queue and one worker, with drain-by-model scheduling. The synchronous passthrough now enqueues and blocks on completion instead of holding a direct gate. /api/embed stays exactly as Phase 2 built it — direct, concurrent, never queued (ADR-0013). Do not route embeddings through any of this.
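
The enqueue-and-block behaviour described above can be sketched in memory as follows. This is only an illustration under stated assumptions, not the project's implementation: `Job`, `Queue`, `EnqueueAndWait`, `RunWorker`, and `runDemo` are hypothetical names, and a buffered channel stands in for the SQLite-backed queue.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Job is a hypothetical in-memory stand-in for a persisted jobs row.
type Job struct {
	ID     string
	Model  string
	result string
	err    error
	done   chan struct{} // closed once the job reaches a terminal state
}

// Queue is a hypothetical stand-in for the durable queue.
type Queue struct {
	jobs chan *Job
}

func NewQueue(size int) *Queue { return &Queue{jobs: make(chan *Job, size)} }

// EnqueueAndWait submits a job, then blocks until the worker finishes it
// or the caller's context is cancelled.
func (q *Queue) EnqueueAndWait(ctx context.Context, id, model string) (string, error) {
	j := &Job{ID: id, Model: model, done: make(chan struct{})}
	select {
	case q.jobs <- j:
	case <-ctx.Done():
		return "", ctx.Err()
	}
	select {
	case <-j.done:
		return j.result, j.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// RunWorker is a single worker loop (concurrency 1): jobs run one at a time.
func (q *Queue) RunWorker(ctx context.Context, run func(*Job) (string, error)) {
	for {
		select {
		case <-ctx.Done():
			return
		case j := <-q.jobs:
			j.result, j.err = run(j)
			close(j.done) // publishes result and err to the waiting handler
		}
	}
}

// runDemo wires one worker to one blocking caller.
func runDemo() (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	q := NewQueue(8)
	go q.RunWorker(ctx, func(j *Job) (string, error) {
		if j.Model == "" {
			return "", errors.New("job has no model")
		}
		return "reply from " + j.Model, nil
	})
	return q.EnqueueAndWait(ctx, "job-1", "model-a")
}

func main() {
	out, err := runDemo()
	fmt.Println(out, err)
}
```

Closing the per-job `done` channel keeps the handler's wait independent of how the worker picks jobs, which is why the same shape could sit on top of a polling SQLite worker.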

Tasks

  • Promote chat requests to persisted jobs: every /api/chat call creates a jobs row (state queued), and the handler blocks until that job reaches a terminal state, then writes the response. Assign a ULID as the job id now (used everywhere in Phase 4).
  • internal/worker: a single worker loop (concurrency 1). Select the next job with ORDER BY (model != :current_resident), created_at. The comparison evaluates to 0 for the currently-resident model (from /api/ps) and 1 for any other, so all jobs for the resident model sort first and drain before a swap. Transition queued→loading→working→done. Pin residency with Ollama keep_alive.
  • Retry semantics (ADR-0004): a connection failure to the target re-queues the job with backoff and increments attempt; exceeding a bounded max moves it to failed with the last error stored. Never auto-fail on a single transient error. Jobs survive process restart (resume queued/in-flight on boot).
  • Tests: against the stub Ollama — jobs persist and execute serially; a sequence mixing two models drains by model (assert the swap happens once, not per job); a flapping target causes retry-then-success without data loss; restart mid-queue resumes cleanly.
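
The retry semantics in the tasks above might look roughly like the sketch below. The concrete numbers (`maxAttempts`, `baseBackoff`, `maxBackoff`), the exponential backoff shape, and the `NotBefore` field are assumptions made for illustration; the source only requires backoff, an incremented attempt counter, a bounded maximum, and the last error being stored.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type State string

const (
	StateQueued State = "queued"
	StateFailed State = "failed"
)

// Job is a hypothetical subset of a jobs row relevant to retries.
type Job struct {
	ID        string
	State     State
	Attempt   int
	LastError string
	NotBefore time.Time // assumed column: earliest time the job may run again
}

// Assumed policy values; the real bounds would come from ADR-0004 or config.
const (
	maxAttempts = 5
	baseBackoff = 500 * time.Millisecond
	maxBackoff  = 30 * time.Second
)

// errRefused is a hypothetical stand-in for a connection failure.
var errRefused = errors.New("connection refused")

// backoff doubles the delay for each attempt (1-based), capped at maxBackoff.
func backoff(attempt int) time.Duration {
	d := baseBackoff
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= maxBackoff {
			return maxBackoff
		}
	}
	return d
}

// onConnectionFailure re-queues the job with backoff, or marks it failed
// once the attempt counter exceeds the bounded maximum.
func onConnectionFailure(j *Job, err error, now time.Time) {
	j.Attempt++
	j.LastError = err.Error()
	if j.Attempt > maxAttempts {
		j.State = StateFailed
		return
	}
	j.State = StateQueued
	j.NotBefore = now.Add(backoff(j.Attempt))
}

func main() {
	j := &Job{ID: "job-1", State: StateQueued}
	for j.State != StateFailed {
		onConnectionFailure(j, errRefused, time.Now())
		fmt.Printf("attempt=%d state=%s\n", j.Attempt, j.State)
	}
	fmt.Println("last error:", j.LastError)
}
```

A single transient error only ever re-queues the job here; the `failed` state is reached solely by exhausting the bound.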

Definition of done

  • go build, go vet, and go test -race are all green.
  • The Phase 2 acceptance (go-llm completes a chat) still passes, now served through the queue.
  • Demonstrable: enqueue several jobs across two models and observe drain-by-model ordering in logs; kill and restart foreman mid-queue and watch it resume.
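
The drain-by-model ordering to look for in the logs can be simulated in memory. The sketch below mirrors `ORDER BY (model != :current_resident), created_at` with a plain comparison; `Job`, `pickNext`, and `drain` are hypothetical names, and updating the resident model as soon as a job for another model is picked is an assumption about how the swap is tracked.

```go
package main

import "fmt"

// Job is a hypothetical subset of a jobs row.
type Job struct {
	ID        string
	Model     string
	CreatedAt int64 // enqueue order; a timestamp in a real row
}

// pickNext returns the index of the job that would sort first under
// ORDER BY (model != resident), created_at, or -1 if the queue is empty.
func pickNext(queued []Job, resident string) int {
	best := -1
	for i, j := range queued {
		if best == -1 {
			best = i
			continue
		}
		b := queued[best]
		jOther, bOther := j.Model != resident, b.Model != resident
		if jOther != bOther {
			if !jOther { // resident-model jobs sort before all others
				best = i
			}
			continue
		}
		if j.CreatedAt < b.CreatedAt { // oldest first within a group
			best = i
		}
	}
	return best
}

// drain runs jobs serially and reports execution order and model swaps.
func drain(queued []Job, resident string) (order []string, swaps int) {
	queued = append([]Job(nil), queued...) // work on a copy
	for len(queued) > 0 {
		i := pickNext(queued, resident)
		j := queued[i]
		queued = append(queued[:i], queued[i+1:]...)
		if j.Model != resident {
			swaps++
			resident = j.Model
		}
		order = append(order, j.ID)
	}
	return order, swaps
}

func main() {
	jobs := []Job{
		{ID: "a1", Model: "model-a", CreatedAt: 1},
		{ID: "b1", Model: "model-b", CreatedAt: 2},
		{ID: "a2", Model: "model-a", CreatedAt: 3},
		{ID: "b2", Model: "model-b", CreatedAt: 4},
	}
	order, swaps := drain(jobs, "model-a")
	fmt.Println(order, swaps) // prints: [a1 a2 b1 b2] 1
}
```

With interleaved enqueue order, a plain FIFO would swap three times; the resident-first ordering swaps once.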

Wrap up: update progress.md and commit on phase-3-queue. M0 is effectively complete here; note that. Phase 4 adds the async surface on top of this same engine.