progress
2026-06-10 — Phase 3: REST providers (OpenAI, Anthropic, Ollama×3) + media
Landed:
- provider/openai: Chat Completions client for OpenAI and every OpenAI-compatible endpoint (tools with string-arguments mapping, strict SSE streaming incl. by-index tool-call assembly and the empty-choices usage chunk, response_format json_schema, max_completion_tokens with a WithLegacyMaxTokens compat option, reasoning_effort).
- provider/anthropic: Messages API client (anthropic-version 2023-06-01, required-max_tokens defaulting, tool_use/tool_result blocks with native is_error, GA structured output via output_config.format, full SSE event parser with input_json_delta buffering, 529-overloaded classified transient, usage sums cache tokens).
- provider/ollama: ONE native /api/chat client serving ollama (local, OLLAMA_HOST normalization), ollama-cloud (https://ollama.com + bearer OLLAMA_API_KEY), and foreman (base URL + bearer; tolerates its buffered-single-object "streaming"). Object tool arguments, tool_name results, format-schema structured output, think-level mapping, NDJSON streaming with 16 MB lines.
- media/: normalization pipeline per ADR-0009 (magic-byte sniffing, box-filter downscale, transcode preference ladder, byte-budget quality ladder, webp passthrough-or-reject, copy-on-write, everything-unfittable wraps ErrUnsupported).
- Chain executor now normalizes media PER TARGET before each attempt and advances penalty-free past targets that can't take the request (proven: text-only head + vision fallback; per-target downscale assertions).
- Registry: real providers + scheme factories wired for openai, anthropic, ollama, ollama-cloud, foreman (google still stubbed, Phase 4); WithHTTPClient registry option; required env-foreman TLS chat round-trip test (LLM_FM=foreman://token@host → Parse("fm/qwen3:30b") → bearer arrives, chat answers).
- ADR-0009 (multimodal), ADR-0010 (tools/structured mapping); README matrix flipped to ✅ for the four landed provider families; ~70 new hermetic tests across the three provider packages + media.
- Run note: openai/anthropic/media were built by three parallel subagents against the frozen llm contract; ollama/foreman, chain wiring, and registry integration done in the main line. All gates green.
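For readers unfamiliar with the "by-index tool-call assembly" mentioned for provider/openai: streamed Chat Completions chunks carry tool calls as fragments tagged with an index, and the argument JSON arrives in pieces that must be concatenated per index. The sketch below illustrates that idea only. The types `toolCallDelta`, `toolCall`, and `assembler` are hypothetical names made up for this example and are not the package's actual API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// toolCallDelta is a hypothetical, trimmed-down view of one tool-call
// fragment from a streamed chunk. ID and Name typically appear only on
// the first fragment for a given Index.
type toolCallDelta struct {
	Index     int
	ID        string
	Name      string
	Arguments string // fragment of the JSON arguments text
}

// toolCall is the fully assembled call.
type toolCall struct {
	ID, Name, Arguments string
}

// partial accumulates the fragments seen so far for one index.
type partial struct {
	id, name string
	args     strings.Builder
}

// assembler groups fragments by index and concatenates argument text
// in arrival order.
type assembler struct {
	parts map[int]*partial
}

func newAssembler() *assembler {
	return &assembler{parts: map[int]*partial{}}
}

// add merges one fragment into the partial call for its index.
func (a *assembler) add(d toolCallDelta) {
	p, ok := a.parts[d.Index]
	if !ok {
		p = &partial{}
		a.parts[d.Index] = p
	}
	if d.ID != "" {
		p.id = d.ID
	}
	if d.Name != "" {
		p.name = d.Name
	}
	p.args.WriteString(d.Arguments)
}

// finish returns the assembled calls ordered by index.
func (a *assembler) finish() []toolCall {
	idx := make([]int, 0, len(a.parts))
	for i := range a.parts {
		idx = append(idx, i)
	}
	sort.Ints(idx)
	out := make([]toolCall, 0, len(idx))
	for _, i := range idx {
		p := a.parts[i]
		out = append(out, toolCall{ID: p.id, Name: p.name, Arguments: p.args.String()})
	}
	return out
}

func main() {
	a := newAssembler()
	// Fragments for two calls, interleaved the way a stream might deliver them.
	a.add(toolCallDelta{Index: 0, ID: "call_a", Name: "weather", Arguments: `{"city":`})
	a.add(toolCallDelta{Index: 1, ID: "call_b", Name: "time", Arguments: `{}`})
	a.add(toolCallDelta{Index: 0, Arguments: `"Oslo"}`})
	for _, c := range a.finish() {
		fmt.Printf("%s %s %s\n", c.ID, c.Name, c.Arguments)
	}
}
```

Keying on the index rather than the ID matters because later fragments usually omit the ID; the arguments string is only valid JSON once the stream has finished.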
Next: Phase 4 — Google provider on google.golang.org/genai.
2026-06-10 — Phase 2: health + failover chain, proven
Landed: the full deterministic failover test matrix over the fake
provider + fake clock (no sleeps, no network): single-transient recovery
via same-target retry; repeated transients bench + advance; cooldown expiry
re-admits and success resets; backoff doubling across bench rounds;
mixed chain with an inline-expanded alias element failing over through the
expanded targets; permanent-policy default (fail-fast on auth) and
AdvanceOnPermanent override; TransientRetries disabled/custom; retry
loop stops early when the tracker benches mid-request; exhaustion error
lists skipped-while-benched targets; custom classifier override; chain-of-one
gets identical semantics; HTTP 529 fails over. Implementation needed no
changes — Phase 1's executor held up.
Next: Phase 3 — OpenAI/Anthropic/Ollama/foreman REST clients + media pipeline.
2026-06-10 — Phase 1: foundations, ADRs, skeleton, docs
Landed:
- Module scaffold (Go 1.26), .gitea/workflows/ci.yaml (foreman-style gates: build, vet, race tests, tidy-diff), .env.example.
- llm/ canonical contract: Message/Part (sealed; text+image), Request/Options, Response/Usage/FinishReason, Stream/StreamEvent, Tool/Toolbox (panic-safe Execute), Capabilities (zero-value semantics), Model/Provider interfaces, APIError + transient/permanent Classify.
- health/: clock-injected tracker (consecutive-failure threshold, exponential capped cooldown, reset-on-success, thread-safe); full deterministic test suite (fake clock).
- Root: Registry (providers/aliases/schemes/health), Parse with the binding grammar (verbatim model ids, inline recursive alias expansion, cycle detection, dedup), LLM_* env-DSN loading (go-llm-parity lazy fallback + eager LoadEnv/New scan), chain executor implementing Model (retry-on-transient, bench-on-repeat, skip-benched, 404-advance, fail-fast-on-auth, joined exhaustion errors). Built-ins register as resolvable stubs until their phases land.
- provider/fake/: scriptable provider (per-model outcome queues, request recording, capabilities overrides, streaming), the hermetic test rig.
- ADRs 0001–0008 + index; CLAUDE.md; honest README with pending-marked matrix.
- Tests cover the two required cases: the trailing-thinking chain parse and LLM_M1=foreman://token@host loading (plus DSN table, lazy fallback, cycle detection, chain failover/backoff/exhaustion, toolbox execution, error classification).
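The "inline recursive alias expansion, cycle detection, dedup" step in Parse can be pictured as a depth-first walk over the chain. The sketch below assumes a simplified model in which a chain is a slice of elements and an alias is any element found in a map; the function `expand`, the alias table, and the target names are all hypothetical, and the real binding grammar is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errCycle = errors.New("alias cycle")

// expand flattens a chain into concrete targets. Elements present in
// aliases are expanded in place (recursively); anything else is kept
// verbatim. Repeated targets are dropped, keeping the first position.
func expand(chain []string, aliases map[string][]string) ([]string, error) {
	var out []string
	seen := map[string]bool{}    // targets already emitted
	onStack := map[string]bool{} // aliases on the current expansion path

	var walk func(elems, path []string) error
	walk = func(elems, path []string) error {
		for _, e := range elems {
			sub, isAlias := aliases[e]
			if !isAlias {
				if !seen[e] {
					seen[e] = true
					out = append(out, e)
				}
				continue
			}
			if onStack[e] {
				return fmt.Errorf("%w: %s", errCycle, strings.Join(append(path, e), " -> "))
			}
			onStack[e] = true
			if err := walk(sub, append(path, e)); err != nil {
				return err
			}
			// Leaving the alias: it may legally appear again elsewhere.
			delete(onStack, e)
		}
		return nil
	}

	if err := walk(chain, nil); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	aliases := map[string][]string{
		"fast": {"p/m1", "p/m2"},
		"all":  {"fast", "p/m2", "p/m3"},
		"x":    {"y"},
		"y":    {"x"},
	}
	fmt.Println(expand([]string{"all", "p/m1"}, aliases))
	_, err := expand([]string{"x"}, aliases)
	fmt.Println(err)
}
```

Tracking only the aliases on the current path, rather than every alias ever visited, lets the same alias appear twice in a chain without being reported as a cycle; the dedup set then removes the repeated targets.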
Notes: chain executor landed in Phase 1 (design was settled); Phase 2 deepens its test matrix (cooldown re-admission via fake clock, alias-in-chain failover, permanent-policy override) and wires anything the tests flush out.
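The executor policies named in this entry (retry-on-transient, 404-advance, fail-fast-on-auth) all hinge on classifying an error first. As a rough illustration, a status-code classifier might look like the sketch below. The `class` type and `classify` function are hypothetical, and the exact status sets are assumptions for this example: the log itself only confirms that auth failures are permanent, that 404 advances, and that 529 is transient.

```go
package main

import "fmt"

// class is a hypothetical error category used by this sketch.
type class int

const (
	transient class = iota // worth retrying, then benching on repeat
	permanent              // fail fast by default
	notFound               // advance to the next target
)

func (c class) String() string {
	switch c {
	case transient:
		return "transient"
	case permanent:
		return "permanent"
	default:
		return "notFound"
	}
}

// classify maps an HTTP status to a category. Treating 408, 429, and
// the whole 5xx range as transient is an assumption of this sketch.
func classify(status int) class {
	switch {
	case status == 401 || status == 403:
		return permanent
	case status == 404:
		return notFound
	case status == 408 || status == 429:
		return transient
	case status >= 500 && status <= 599: // includes 529 (overloaded)
		return transient
	default:
		return permanent
	}
}

func main() {
	for _, s := range []int{401, 404, 429, 503, 529} {
		fmt.Printf("%d -> %s\n", s, classify(s))
	}
}
```

Keeping classification separate from the executor is what makes a custom classifier override possible: the chain logic only needs the category, not the transport details.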
Next: Phase 2 — exhaustive health/chain test matrix.