initial commit

2026-05-23 16:41:20 -04:00
commit 8fde024281
15 changed files with 803 additions and 0 deletions
@@ -0,0 +1,51 @@
+# ADR-0003: API surface — native Ollama passthrough vs OpenAI-compat
+
+**Status:** Accepted — 2026-05-23 (resolved in favor of native Ollama)
+
+## Context
+
+Two goals were in mild tension: the original phrasing asked for an
+"OpenAI-compatible API," while the stated ultimate goal is to use the M1 Pro
+**simply as a target for `go-llm`**.
+
+`go-llm`'s `v2/CLAUDE.md` Key Design Decision #8 is explicit: its Ollama provider
+deliberately uses native `/api/chat`, *not* OpenAI-compat `/v1`, for `think:false`
+support, more reliable tool calling, and ~15-20% lower latency.
+
+**Verified in code (`v2/constructors.go`).** `llm.OllamaCloud(apiKey, opts...)`
+sends the key as `Authorization: Bearer <key>` over native `/api/chat`, and its
+doc comment says to "use `WithBaseURL` to point at a private Ollama deployment
+that requires auth." So go-llm *already* has a first-class path for a private,
+authenticated, native-Ollama endpoint — exactly what foreman is on the wire.
+Choosing OpenAI-compat would push go-llm onto a path its own author rejected, for
+no benefit to the primary caller.
+
+## Decision
+
+Native Ollama is **the** surface for v1. foreman speaks native `/api/chat`,
+`/api/tags`, and `/api/ps`, optionally behind a Bearer token (ADR-0010). To
+go-llm and any Ollama client it is indistinguishable from a private Ollama
+deployment.
+
+The synchronous passthrough is transparent: calls are queued internally
+(ADR-0009) but the HTTP response blocks until the job completes. Async features
+(job IDs, `state_webhook_url`, artifacts) live on a separate `/jobs` surface
+(ADR-0004), not bolted onto the passthrough.
+
+OpenAI-compat `/v1/chat/completions` is **deferred**: it will be added in a
+later milestone only if a non-go-llm caller needs it.
+
+## Consequences
+
+- "Set up the Mac as a go-llm target" needs zero provider changes — a thin
+  constructor only (ADR-0011).
+- Preserves `think:false`, reliable tool calls, and lower latency.
+- foreman must faithfully proxy native `/api/chat` semantics, including streamed
+  responses (newline-delimited JSON on the native API, not SSE; see ADR-0012).
+
+## Alternatives considered
+
+- **OpenAI-compat as primary/only surface.** Matches the original phrasing but
+  contradicts go-llm DD#8 and adds nothing for the primary caller. Rejected.
+- **Native-only, never add OpenAI-compat.** Fully serves the goal, but not
+  adopted: the secondary surface is kept as an option, not a commitment.