ADR-0003: API surface — native Ollama passthrough vs OpenAI-compat
Status: Accepted — 2026-05-23 (resolved in favor of native Ollama)
Context
Two goals were in mild tension: the original phrasing asked for an
"OpenAI-compatible API," while the stated ultimate goal is to use the M1 Pro
simply as a target for go-llm.
go-llm's v2/CLAUDE.md Key Design Decision #8 is explicit: its Ollama provider
deliberately uses native /api/chat, not OpenAI-compat /v1, for think:false
support, more reliable tool calling, and ~15-20% lower latency.
Verified in code (v2/constructors.go). llm.OllamaCloud(apiKey, opts...)
sends the key as Authorization: Bearer <key> over native /api/chat, and its
doc comment says to "use WithBaseURL to point at a private Ollama deployment
that requires auth." So go-llm already has a first-class path for a private,
authenticated, native-Ollama endpoint — exactly what foreman is on the wire.
Choosing OpenAI-compat would push go-llm onto a path its own author rejected, for
no benefit to the primary caller.
Decision
Native Ollama is the surface for v1. foreman speaks native /api/chat,
/api/tags, and /api/ps, optionally behind a Bearer token (ADR-0010). To
go-llm and any Ollama client it is indistinguishable from a private Ollama
deployment.
The synchronous passthrough is transparent: calls are queued internally
(ADR-0009) but the HTTP response blocks until the job completes. Async features
(job IDs, state_webhook_url, artifacts) live on a separate /jobs surface
(ADR-0004), not bolted onto the passthrough.
OpenAI-compat /v1/chat/completions is deferred: it will be added in a later
milestone only if a non-go-llm caller needs it.
Consequences
- "Set up the Mac as a go-llm target" needs zero provider changes — a thin constructor only (ADR-0011).
- Preserves think:false, reliable tool calls, and lower latency.
- foreman must faithfully proxy native /api/chat semantics, including NDJSON streaming (application/x-ndjson, not SSE; ADR-0012).
Alternatives considered
- OpenAI-compat as primary/only surface. Matches the original phrasing but contradicts go-llm Key Design Decision #8 and adds nothing for the primary caller. Rejected.
- Native-only, never add OpenAI-compat. Would fully serve the goal, but rules out the secondary surface, which this ADR keeps as an option rather than a commitment.