steve 0526bada90 docs: land prior ADR + prompt updates

Commit pre-existing uncommitted working-tree changes that predate the
license/public-readiness work — NOT authored in this session, just flushed so
they're not lost: ADR-0003/0005/0009/0012 edits, the new ADR-0013
(embeddings-bypass + two-slot residency, already referenced by CLAUDE.md), and
the phase-0..3 prompt revisions + prompts/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-26 20:33:39 -04:00

2.3 KiB

ADR-0003: API surface — native Ollama passthrough vs OpenAI-compat

Status: Accepted — 2026-05-23 (resolved in favor of native Ollama)

Context

Two goals were in mild tension: the original phrasing asked for an "OpenAI-compatible API," while the stated ultimate goal is to use the M1 Pro simply as a target for go-llm.

go-llm's v2/CLAUDE.md Key Design Decision #8 is explicit: its Ollama provider deliberately uses native /api/chat, not OpenAI-compat /v1, for think:false support, more reliable tool calling, and ~15-20% lower latency.

Verified in code (v2/constructors.go). llm.OllamaCloud(apiKey, opts...) sends the key as Authorization: Bearer <key> over native /api/chat, and its doc comment says to "use WithBaseURL to point at a private Ollama deployment that requires auth." So go-llm already has a first-class path for a private, authenticated, native-Ollama endpoint — exactly what foreman is on the wire. Choosing OpenAI-compat would push go-llm onto a path its own author rejected, for no benefit to the primary caller.

Decision

Native Ollama is the surface for v1. foreman speaks native /api/chat, /api/tags, and /api/ps, optionally behind a Bearer token (ADR-0010). To go-llm and any Ollama client it is indistinguishable from a private Ollama deployment.
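As a rough illustration of what a caller puts on the wire, the sketch below builds a native /api/chat request with an optional Bearer token, using only the Go standard library. It is not go-llm's or foreman's code: the helper name newChatRequest, the struct names, and any host or model name passed in are hypothetical, and only a small subset of the /api/chat body is modeled.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// chatMessage and chatRequest model a small subset of the native
// /api/chat request body. The type names are illustrative only.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
	Think    bool          `json:"think"` // serialized as think:false when unset
}

// newChatRequest (hypothetical helper) builds a POST to <baseURL>/api/chat.
// An empty token means the endpoint is assumed to be unauthenticated, so no
// Authorization header is sent.
func newChatRequest(baseURL, token, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   false,
		Think:    false,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/chat", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}
```

The resulting request can be sent with http.DefaultClient.Do(req). Because the path and header shape match a private Ollama deployment, nothing in the request itself identifies foreman as the server.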

The synchronous passthrough is transparent: calls are queued internally (ADR-0009) but the HTTP response blocks until the job completes. Async features (job IDs, state_webhook_url, artifacts) live on a separate /jobs surface (ADR-0004), not bolted onto the passthrough.
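The "queued internally, but the response blocks" behaviour can be sketched with a channel-backed queue. This is a minimal illustration under assumptions, not the design recorded in ADR-0009: the job type, worker, and submitSync are all hypothetical names, and a real implementation would also need cancellation, timeouts, and error handling.

```go
package main

// job is a hypothetical unit of queued work. The done channel carries the
// result back to the caller that is waiting on it.
type job struct {
	prompt string
	done   chan string
}

// worker drains the queue one job at a time, so requests are serialized
// even when many callers arrive concurrently. It returns when the queue
// is closed.
func worker(queue <-chan job, run func(string) string) {
	for j := range queue {
		j.done <- run(j.prompt)
	}
}

// submitSync enqueues a job and blocks until the worker has finished it.
// An HTTP handler calling this would hold its response open for the same
// duration, which is what makes the queue invisible to the client.
func submitSync(queue chan<- job, prompt string) string {
	j := job{prompt: prompt, done: make(chan string, 1)}
	queue <- j
	return <-j.done
}
```

From the caller's side submitSync looks like an ordinary blocking call; the queue only changes when the work starts, not how the result is delivered.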

OpenAI-compat /v1/chat/completions is deferred. It will be added in a later milestone only if a non-go-llm caller needs it.

Consequences

"Set up the Mac as a go-llm target" needs zero provider changes — a thin constructor only (ADR-0011).
Preserves think:false, reliable tool calls, and lower latency.
foreman must faithfully proxy native /api/chat semantics, including NDJSON streaming (application/x-ndjson, not SSE; ADR-0012).
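To make the NDJSON point concrete: each line of a streamed /api/chat response is one complete JSON object, with no SSE "data:" framing. The sketch below reads such a stream and concatenates the message content until a chunk reports done. It models only the message and done fields; the function and type names are hypothetical, and this is not foreman's proxy code.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"strings"
)

// chatChunk models the subset of a streamed /api/chat line used here.
type chatChunk struct {
	Message struct {
		Role    string `json:"role"`
		Content string `json:"content"`
	} `json:"message"`
	Done bool `json:"done"`
}

// readNDJSON (hypothetical helper) decodes one JSON object per line and
// returns the concatenated message content. It stops at the first chunk
// with done set to true, or at end of input.
func readNDJSON(r io.Reader) (string, error) {
	var sb strings.Builder
	sc := bufio.NewScanner(r)
	// Raise the per-line limit above the 64 KiB default for large chunks.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 {
			continue // tolerate blank lines between objects
		}
		var c chatChunk
		if err := json.Unmarshal(line, &c); err != nil {
			return sb.String(), err
		}
		sb.WriteString(c.Message.Content)
		if c.Done {
			break
		}
	}
	return sb.String(), sc.Err()
}

// readNDJSONString is a convenience wrapper for in-memory input.
func readNDJSONString(s string) (string, error) {
	return readNDJSON(strings.NewReader(s))
}
```

A proxy would typically forward each line to the client as it arrives rather than buffering the whole reply; the line-at-a-time loop above is the part that carries over.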

Alternatives considered

OpenAI-compat as primary/only surface. Matches the original phrasing but contradicts go-llm DD#8 and adds nothing for the primary caller. Rejected.
Native-only, never add OpenAI-compat. Would fully serve the goal, but was not adopted as stated: the secondary surface is kept as an option, not a commitment.
