docs: land prior ADR + prompt updates

Commit pre-existing uncommitted working-tree changes that predate the
license/public-readiness work — NOT authored in this session, just flushed so
they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013
(embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and
the phase-0..3 prompt revisions + prompts/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 20:33:39 -04:00
parent 823c0b4ca8
commit 0526bada90
10 changed files with 276 additions and 98 deletions
+7 -5
@@ -13,11 +13,13 @@ different granularity than token streaming.
## Decision
- **Sync passthrough: support streaming.** When a `/api/chat` request sets
-`stream: true`, foreman streams the target's token deltas back to the caller
-(SSE/chunked, matching Ollama's native streaming). A streamed job still moves
-through the queue; streaming begins once the job reaches `working`, so a job
-waiting behind the drain-by-model queue (ADR-0009) simply starts streaming when
-its turn comes. go-llm's `Stream()` works against foreman unchanged.
+`stream: true`, foreman streams the target's token deltas back to the caller as
+**NDJSON** (`application/x-ndjson`, newline-delimited JSON chunks — Ollama's
+native streaming wire format, which go-llm reads with a `bufio.Scanner`). This
+is *not* SSE/`text/event-stream`. A streamed job still moves through the queue;
+streaming begins once the job reaches `working`, so a job waiting behind the
+drain-by-model queue (ADR-0009) simply starts streaming when its turn comes.
+go-llm's `Stream()` works against foreman unchanged.
- **Async `/jobs` surface: no token streaming in v1.** Webhooks carry coarse state
transitions (ADR-0005) and the final result/artifacts, not per-token deltas.
Token-level streaming over a fire-and-forget webhook job is deliberately