docs: land prior ADR + prompt updates
Commit pre-existing uncommitted working-tree changes that predate the license/public-readiness work — NOT authored in this session, just flushed so they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013 (embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and the phase-0..3 prompt revisions + prompts/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -13,11 +13,13 @@ different granularity than token streaming.
 ## Decision
 
 - **Sync passthrough: support streaming.** When a `/api/chat` request sets
-`stream: true`, foreman streams the target's token deltas back to the caller
-(SSE/chunked, matching Ollama's native streaming). A streamed job still moves
-through the queue; streaming begins once the job reaches `working`, so a job
-waiting behind the drain-by-model queue (ADR-0009) simply starts streaming when
-its turn comes. go-llm's `Stream()` works against foreman unchanged.
+`stream: true`, foreman streams the target's token deltas back to the caller as
+**NDJSON** (`application/x-ndjson`, newline-delimited JSON chunks — Ollama's
+native streaming wire format, which go-llm reads with a `bufio.Scanner`). This
+is *not* SSE/`text/event-stream`. A streamed job still moves through the queue;
+streaming begins once the job reaches `working`, so a job waiting behind the
+drain-by-model queue (ADR-0009) simply starts streaming when its turn comes.
+go-llm's `Stream()` works against foreman unchanged.
 - **Async `/jobs` surface: no token streaming in v1.** Webhooks carry coarse state
 transitions (ADR-0005) and the final result/artifacts, not per-token deltas.
 Token-level streaming over a fire-and-forget webhook job is deliberately