0526bada90
Commit pre-existing uncommitted working-tree changes that predate the license/public-readiness work — NOT authored in this session, just flushed so they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013 (embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and the phase-0..3 prompt revisions + prompts/README.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
44 lines
2.1 KiB
Markdown
# ADR-0012: Streaming support

**Status:** Accepted, 2026-05-23

## Context

`go-llm`'s provider interface has a `Stream()` method, and Ollama's native
`/api/chat` streams token-by-token by default. The synchronous passthrough
(ADR-0003) must not break streaming clients. Separately, the async `/jobs`
surface (ADR-0004) reports progress via discrete state webhooks, which is a
different granularity than token streaming.

## Decision

- **Sync passthrough: support streaming.** When a `/api/chat` request sets
  `stream: true`, foreman streams the target's token deltas back to the caller as
  **NDJSON** (`application/x-ndjson`: newline-delimited JSON chunks, Ollama's
  native streaming wire format, which go-llm reads with a `bufio.Scanner`). This
  is *not* SSE/`text/event-stream`. A streamed job still moves through the queue;
  streaming begins once the job reaches `working`, so a job waiting behind the
  drain-by-model queue (ADR-0009) simply starts streaming when its turn comes.
  go-llm's `Stream()` works against foreman unchanged.
- **Async `/jobs` surface: no token streaming in v1.** Webhooks carry coarse state
  transitions (ADR-0005) and the final result/artifacts, not per-token deltas.
  Token-level streaming over a fire-and-forget webhook job is deliberately
  deferred: it adds a transport (persistent connection or chunked webhook) whose
  complexity isn't justified yet.

## Consequences

- Interactive go-llm usage gets real streaming through the transparent surface.
- Orchestration callers get state + final artifacts, which is what they need;
  they can use the sync streaming surface directly if they want tokens.
- The job state machine and webhook protocol stay simple (no streaming transport
  to design or operate).

## Alternatives considered

- **Stream tokens over the async surface too.** Deferred: requires either a
  long-lived connection (defeats the point of async) or chunked-delta webhooks
  (complex, rarely needed). Revisit only on a concrete need.
- **No streaming at all.** Would break go-llm's `Stream()` and interactive use on
  the very path that is the primary goal. Rejected.