0526bada90
Commit pre-existing uncommitted working-tree changes that predate the license/public-readiness work — NOT authored in this session, just flushed so they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013 (embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and the phase-0..3 prompt revisions + prompts/README.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
52 lines
2.3 KiB
Markdown
52 lines
2.3 KiB
Markdown
# ADR-0003: API surface — native Ollama passthrough vs OpenAI-compat
|
|
|
|
**Status:** Accepted — 2026-05-23 (resolved in favor of native Ollama)
|
|
|
|
## Context
|
|
|
|
Two goals were in mild tension: the original phrasing asked for an
|
|
"OpenAI-compatible API," while the stated ultimate goal is to use the M1 Pro
|
|
**simply as a target for `go-llm`**.
|
|
|
|
`go-llm`'s `v2/CLAUDE.md` Key Design Decision #8 is explicit: its Ollama provider
|
|
deliberately uses native `/api/chat`, *not* OpenAI-compat `/v1`, for `think:false`
|
|
support, more reliable tool calling, and ~15-20% lower latency.
|
|
|
|
**Verified in code (`v2/constructors.go`).** `llm.OllamaCloud(apiKey, opts...)`
|
|
sends the key as `Authorization: Bearer <key>` over native `/api/chat`, and its
|
|
doc comment says to "use `WithBaseURL` to point at a private Ollama deployment
|
|
that requires auth." So go-llm *already* has a first-class path for a private,
|
|
authenticated, native-Ollama endpoint — exactly what foreman is on the wire.
|
|
Choosing OpenAI-compat would push go-llm onto a path its own author rejected, for
|
|
no benefit to the primary caller.
|
|
|
|
## Decision
|
|
|
|
Native Ollama is **the** surface for v1. foreman speaks native `/api/chat`,
|
|
`/api/tags`, and `/api/ps`, optionally behind a Bearer token (ADR-0010). To
|
|
go-llm and any Ollama client it is indistinguishable from a private Ollama
|
|
deployment.
|
|
|
|
The synchronous passthrough is transparent: calls are queued internally
|
|
(ADR-0009) but the HTTP response blocks until the job completes. Async features
|
|
(job IDs, `state_webhook_url`, artifacts) live on a separate `/jobs` surface
|
|
(ADR-0004), not bolted onto the passthrough.
|
|
|
|
OpenAI-compat `/v1/chat/completions` is **deferred**, added in a later milestone
|
|
only if a non-go-llm caller needs it.
|
|
|
|
## Consequences
|
|
|
|
- "Set up the Mac as a go-llm target" needs zero provider changes — a thin
|
|
constructor only (ADR-0011).
|
|
- Preserves `think:false`, reliable tool calls, and lower latency.
|
|
- foreman must faithfully proxy native `/api/chat` semantics, including NDJSON
|
|
streaming (`application/x-ndjson`, not SSE; ADR-0012).
|
|
|
|
## Alternatives considered
|
|
|
|
- **OpenAI-compat as primary/only surface.** Matches the original phrasing but
|
|
contradicts go-llm DD#8 and adds nothing for the primary caller. Rejected.
|
|
- **Native-only, never add OpenAI-compat.** Fully serves the goal; the secondary
|
|
surface is kept as an option, not a commitment.
|