feat: OpenAI, Anthropic, and native-Ollama providers + media pipeline

Phase 3:
- provider/openai: Chat Completions for OpenAI + compat endpoints (SSE
  streaming with by-index tool-call assembly, response_format json_schema,
  legacy max_tokens option, reasoning_effort)
- provider/anthropic: Messages API (tool_use/tool_result, GA structured
  output via output_config.format, full SSE event parser, 529 transient)
- provider/ollama: one native /api/chat client behind the ollama,
  ollama-cloud, and foreman built-ins (presets; NDJSON streaming tolerant
  of foreman's buffered single-object responses; object tool arguments;
  format-schema structured output; think mapping)
- media/: capability normalization (sniff, downscale, transcode, byte
  ladder, ErrUnsupported), wired into the chain executor per target with
  penalty-free advance past incapable elements
- registry: real provider + scheme wiring, WithHTTPClient option, required
  env-foreman TLS chat round-trip test
- ADR-0009 multimodal strategy, ADR-0010 tools/structured mapping; README
  matrix + CLAUDE.md synced

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:58:08 +02:00
parent 323558ed72
commit 043249e0e1
31 changed files with 6194 additions and 74 deletions
@@ -0,0 +1,57 @@
# ADR-0009: Multimodal strategy — normalize per target, enforce at the provider
**Status:** Accepted — 2026-06-10
## Context
Every provider (and some models) imposes different image rules: max
dimensions/bytes, allowed MIME types, max images per request. A caller must
be able to attach an image without knowing the eventual target — especially
with failover chains, where the serving target isn't known until runtime.
## Decision
Two cooperating layers:
1. **`media.Normalize(req, caps)`** — the transformation point. The chain
executor calls it **per target, per attempt**, against the actual
target's capabilities, before the provider sees the request:
- The real format is **sniffed from magic bytes** and wins over the
declared MIME (callers lie; jpeg/png/gif/webp recognized).
- Already-fitting images pass through untouched (fast path: zero copies).
- Oversize dimensions downscale (aspect-preserving) with a hand-rolled
box-filter — stdlib has no scaler and `x/image` stays out per
ADR-0007; box-average quality is ample for vision input.
- A disallowed MIME type is re-encoded: original format if allowed, else
JPEG (q85), else PNG, else the first allowed encodable type.
- Byte budgets are enforced via a JPEG quality ladder (85→65→45→30), then
by dimension halving; about 6 attempts are made before giving up.
- WebP cannot be decoded by stdlib: it passes through when it fits and
is allowed; any needed transform is a clear error.
- Anything that cannot be made to fit returns an error **wrapping
`llm.ErrUnsupported`**; it is never silently dropped.
2. **Provider backstop** — each provider cheaply enforces its effective
capabilities at request time (image count/MIME/bytes, plus
tools/structured/streaming support flags) and rejects with
`ErrUnsupported`. This keeps providers honest for expert callers who
build models directly without the registry.
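The sniffing and re-encode rules in layer 1 can be sketched roughly as follows. This is an illustration only, not the code in `media/`: `sniff` and `pickEncoding` are hypothetical names, and the sketch assumes the encodable set is exactly what the Go standard library can write (JPEG, PNG, GIF).

```go
package main

import (
	"bytes"
	"fmt"
)

// sniff returns the MIME type implied by the leading magic bytes, or "" when
// the data is none of the four formats the ADR lists. (Hypothetical helper.)
func sniff(data []byte) string {
	switch {
	case bytes.HasPrefix(data, []byte{0xFF, 0xD8, 0xFF}):
		return "image/jpeg"
	case bytes.HasPrefix(data, []byte("\x89PNG\r\n\x1a\n")):
		return "image/png"
	case bytes.HasPrefix(data, []byte("GIF87a")), bytes.HasPrefix(data, []byte("GIF89a")):
		return "image/gif"
	case len(data) >= 12 && bytes.Equal(data[:4], []byte("RIFF")) && bytes.Equal(data[8:12], []byte("WEBP")):
		return "image/webp"
	}
	return ""
}

// pickEncoding chooses the output MIME for an image whose sniffed type is
// actual. ok is false when nothing in allowed can be produced.
func pickEncoding(actual string, allowed []string) (mime string, ok bool) {
	has := func(m string) bool {
		for _, a := range allowed {
			if a == m {
				return true
			}
		}
		return false
	}
	if has(actual) {
		return actual, true // original format is allowed: keep it
	}
	if actual == "image/webp" {
		return "", false // no stdlib WebP decoder, so no transform is possible
	}
	// Assumption: JPEG, then PNG, then GIF covers "the first allowed
	// encodable type", since those are the stdlib encoders.
	for _, m := range []string{"image/jpeg", "image/png", "image/gif"} {
		if has(m) {
			return m, true
		}
	}
	return "", false
}

func main() {
	declared := "image/png"           // what the caller claimed
	data := []byte("GIF89a\x01\x00") // what the bytes actually are
	actual := sniff(data)
	mime, ok := pickEncoding(actual, []string{"image/png", "image/jpeg"})
	fmt.Println(declared, actual, mime, ok)
}
```

In the real pipeline a `false` result would presumably surface as an error wrapping `llm.ErrUnsupported`, as the last bullet of layer 1 describes.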
Chain semantics: a normalization failure for one target **advances** to the
next element with no health penalty (the target isn't sick, it's just
incapable) — so `fp/text-only,fp/vision` serves an image request from the
vision element automatically.
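A minimal sketch of that advance rule, under stated assumptions: `target`, `normalize`, and `serve` are hypothetical stand-ins for the chain executor, `ErrUnsupported` stands in for `llm.ErrUnsupported`, and a plain failure counter stands in for the health tracking of ADR-0006.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnsupported stands in for llm.ErrUnsupported in this sketch.
var ErrUnsupported = errors.New("unsupported")

// target is a hypothetical chain element with a single capability flag.
type target struct {
	name     string
	vision   bool
	failures int // stand-in for health state
}

// normalize fails when the request carries an image the target cannot take.
func normalize(t *target, hasImage bool) error {
	if hasImage && !t.vision {
		return fmt.Errorf("%s: image input: %w", t.name, ErrUnsupported)
	}
	return nil
}

// serve walks the chain in order. A normalization failure advances without
// touching health; a provider failure records a penalty, then advances.
func serve(chain []*target, hasImage bool, call func(*target) error) (string, error) {
	lastErr := errors.New("empty chain")
	for _, t := range chain {
		if err := normalize(t, hasImage); err != nil {
			lastErr = err // incapable, not sick: no penalty
			continue
		}
		if err := call(t); err != nil {
			t.failures++ // assumption: provider errors do count against health
			lastErr = err
			continue
		}
		return t.name, nil
	}
	return "", lastErr
}

func main() {
	chain := []*target{{name: "fp/text-only"}, {name: "fp/vision", vision: true}}
	served, err := serve(chain, true, func(*target) error { return nil })
	fmt.Println(served, err, chain[0].failures)
}
```

Because normalization runs inside the loop, each element is judged against its own capabilities rather than against an intersection computed up front.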
Canonical image content stays **bytes + MIME** (ADR-0002); no URL fetching.
## Consequences
- A 100×50 PNG sent at a 32px-cap target arrives as a 32×16 PNG; the same
request served by an 8000px target arrives untouched.
- Conditional provider rules (e.g. Anthropic's 2000px cap above 20 images)
are approximated by the flat declared caps — conservative and simple.
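The 100×50 to 32×16 case can be reproduced with a small aspect-preserving fit plus a box-average downscale. This is a sketch under assumptions, not the project's scaler: `fitDims`, `boxDownscale`, and `solidImage` are hypothetical names, and the rounding rule (round to nearest, minimum 1px) is a guess.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// fitDims returns dimensions that fit inside limit×limit while preserving
// aspect ratio. Images that already fit are returned unchanged.
func fitDims(w, h, limit int) (int, int) {
	if w <= limit && h <= limit {
		return w, h
	}
	if w >= h {
		return limit, atLeastOne((h*limit + w/2) / w)
	}
	return atLeastOne((w*limit + h/2) / h), limit
}

func atLeastOne(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

// boxDownscale averages each destination pixel over the block of source
// pixels it covers. It is meant for downscaling only.
func boxDownscale(src image.Image, dw, dh int) *image.RGBA {
	sb := src.Bounds()
	sw, sh := sb.Dx(), sb.Dy()
	dst := image.NewRGBA(image.Rect(0, 0, dw, dh))
	for y := 0; y < dh; y++ {
		y0, y1 := y*sh/dh, (y+1)*sh/dh
		if y1 <= y0 {
			y1 = y0 + 1
		}
		for x := 0; x < dw; x++ {
			x0, x1 := x*sw/dw, (x+1)*sw/dw
			if x1 <= x0 {
				x1 = x0 + 1
			}
			var r, g, b, a, n uint64
			for sy := y0; sy < y1; sy++ {
				for sx := x0; sx < x1; sx++ {
					// RGBA returns 16-bit alpha-premultiplied channels.
					cr, cg, cb, ca := src.At(sb.Min.X+sx, sb.Min.Y+sy).RGBA()
					r, g, b, a = r+uint64(cr), g+uint64(cg), b+uint64(cb), a+uint64(ca)
					n++
				}
			}
			dst.SetRGBA(x, y, color.RGBA{
				R: uint8((r / n) >> 8), G: uint8((g / n) >> 8),
				B: uint8((b / n) >> 8), A: uint8((a / n) >> 8),
			})
		}
	}
	return dst
}

// solidImage builds an opaque single-color test image.
func solidImage(w, h int, r, g, b uint8) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetRGBA(x, y, color.RGBA{R: r, G: g, B: b, A: 255})
		}
	}
	return img
}

func main() {
	w, h := fitDims(100, 50, 32)
	out := boxDownscale(solidImage(100, 50, 200, 100, 50), w, h)
	fmt.Println(out.Bounds().Dx(), out.Bounds().Dy()) // prints "32 16"
}
```

With an 8000px cap, `fitDims(100, 50, 8000)` returns the input dimensions, which corresponds to the untouched fast path.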
## Alternatives considered
- Normalize once against chain-intersection caps: over-restricts every
request for the sake of rarely-used fallbacks. Rejected (ADR-0008).
- `x/image/draw` scalers: a dependency for one function. Rejected.
@@ -0,0 +1,49 @@
# ADR-0010: Tools and structured output — one canonical shape, native mappings
**Status:** Accepted — 2026-06-10
## Context
Tool calling and schema-constrained output exist on every target but with
different wire shapes (verified against current docs, June 2026; shapes
recorded in each provider's package doc). The canonical API must hide all
of it.
## Decision
Canonical: `Tool{Name, Description, Parameters (JSON Schema), Handler}`;
`Response.ToolCalls[]{ID, Name, Arguments json.RawMessage}`; results return
as `ToolResultsMessage(ToolResult{ID, Name, Content, IsError})`. Structured
output via `WithSchema(schema, name)`. Per-provider mapping:
| Concern | OpenAI(+compat) | Anthropic(+compat) | Ollama/foreman | Google (Phase 4) |
|---|---|---|---|---|
| Tool def | `tools[].function{name,description,parameters}` | `tools[]{name,description,input_schema}` | `tools[].function` | `FunctionDeclaration.ParametersJsonSchema` |
| Call args | JSON **string** → RawMessage | `tool_use.input` object | `arguments` **object** | `FunctionCall.Args` map |
| Results | one `role:tool` msg per result (`tool_call_id`) | one **user** msg, `tool_result` blocks (`is_error` native) | `role:tool` + `tool_name` | `FunctionResponse` parts |
| IsError | `"ERROR: "` content prefix | `is_error: true` | `"ERROR: "` prefix | response payload field |
| Forced choice | `tool_choice` string / named object | `{"type":"any"/"tool"/"none"}` | none → drop tools; others best-effort ignored | `FunctionCallingConfig` modes |
| Structured | `response_format json_schema` (no strict flag) | `output_config.format json_schema` (GA mechanism) | `format: <schema>` | `ResponseJsonSchema` + JSON MIME |
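To make the "Call args" row concrete, here is a sketch of how both wire shapes could land in the same canonical `Arguments json.RawMessage`. The wire structs are trimmed to the fields shown in the table; `fromOpenAI` and `fromOllama` are hypothetical names, and treating an empty argument string as `{}` is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall mirrors the canonical shape named in the Decision section.
type ToolCall struct {
	ID        string
	Name      string
	Arguments json.RawMessage
}

// openAICall: arguments arrive as a JSON-encoded string.
type openAICall struct {
	ID       string `json:"id"`
	Function struct {
		Name      string `json:"name"`
		Arguments string `json:"arguments"`
	} `json:"function"`
}

// ollamaCall: arguments arrive as a JSON object and there is no id.
type ollamaCall struct {
	Function struct {
		Name      string          `json:"name"`
		Arguments json.RawMessage `json:"arguments"`
	} `json:"function"`
}

func fromOpenAI(wire []byte) (ToolCall, error) {
	var c openAICall
	if err := json.Unmarshal(wire, &c); err != nil {
		return ToolCall{}, err
	}
	args := c.Function.Arguments
	if args == "" {
		args = "{}" // assumption: a call with no arguments becomes an empty object
	}
	if !json.Valid([]byte(args)) {
		return ToolCall{}, fmt.Errorf("tool %q: arguments are not valid JSON", c.Function.Name)
	}
	return ToolCall{ID: c.ID, Name: c.Function.Name, Arguments: json.RawMessage(args)}, nil
}

func fromOllama(wire []byte) (ToolCall, error) {
	var c ollamaCall
	if err := json.Unmarshal(wire, &c); err != nil {
		return ToolCall{}, err
	}
	// The object is kept as raw bytes; the id is left empty for the caller.
	return ToolCall{Name: c.Function.Name, Arguments: c.Function.Arguments}, nil
}

func main() {
	a, errA := fromOpenAI([]byte(`{"id":"call_abc","function":{"name":"get_weather","arguments":"{\"city\":\"Oslo\"}"}}`))
	b, errB := fromOllama([]byte(`{"function":{"name":"get_weather","arguments":{"city":"Oslo"}}}`))
	fmt.Println(string(a.Arguments), errA)
	fmt.Println(string(b.Arguments), errB)
}
```

Both calls end up with the same raw argument bytes, so a tool handler never needs to know which provider produced the call.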
Cross-cutting decisions:
- **Missing call ids are synthesized** (`call_<n>`) — Ollama and some
compat servers omit them; the agent loop needs ids to match results.
- **Streaming buffers tool-call arguments to completion** (ADR-0002):
OpenAI fragments accumulate by index, Anthropic `input_json_delta`
fragments accumulate per block; consumers only ever see parseable calls.
- **No strict-mode flag is sent** to OpenAI: strict mode imposes schema
constraints (every property required, additionalProperties:false) that
caller-supplied schemas may not satisfy. The `Generate[T]` reflector
(Phase 5) emits strict-compatible schemas anyway.
- `SchemaName` feeds providers that need a name (OpenAI; default
"response"); others ignore it.
- Tool handlers never panic the loop: `Toolbox.Execute`/`ExecuteTool`
recover panics and JSON-encode results (the agent-loop ADR is deferred to
Phase 5).
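The first two decisions (synthesized ids, buffering by index) might look roughly like this. `fragment` and `assemble` are hypothetical; the fragment fields loosely follow OpenAI's streamed tool-call deltas, and numbering synthesized ids by position in the response is an assumption, since the ADR only gives the `call_<n>` pattern.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// fragment is a hypothetical, simplified streamed tool-call delta.
type fragment struct {
	Index int
	ID    string
	Name  string
	Args  string // partial JSON text
}

// ToolCall mirrors the canonical shape named in the Decision section.
type ToolCall struct {
	ID        string
	Name      string
	Arguments json.RawMessage
}

// assemble buffers argument fragments per index and emits calls only once
// every argument buffer parses as JSON.
func assemble(frags []fragment) ([]ToolCall, error) {
	type partial struct {
		id, name string
		args     strings.Builder
	}
	byIndex := map[int]*partial{}
	var order []int
	for _, f := range frags {
		p, ok := byIndex[f.Index]
		if !ok {
			p = &partial{}
			byIndex[f.Index] = p
			order = append(order, f.Index)
		}
		if f.ID != "" {
			p.id = f.ID
		}
		if f.Name != "" {
			p.name = f.Name
		}
		p.args.WriteString(f.Args)
	}
	sort.Ints(order)

	calls := make([]ToolCall, 0, len(order))
	for n, idx := range order {
		p := byIndex[idx]
		raw := p.args.String()
		if raw == "" {
			raw = "{}" // assumption: no fragments means no arguments
		}
		if !json.Valid([]byte(raw)) {
			return nil, fmt.Errorf("tool call %d (%s): incomplete arguments", idx, p.name)
		}
		id := p.id
		if id == "" {
			id = fmt.Sprintf("call_%d", n) // synthesized id; numbering is an assumption
		}
		calls = append(calls, ToolCall{ID: id, Name: p.name, Arguments: json.RawMessage(raw)})
	}
	return calls, nil
}

func main() {
	calls, err := assemble([]fragment{
		{Index: 0, Name: "get_weather", Args: `{"ci`},
		{Index: 1, ID: "call_xyz", Name: "get_time", Args: `{}`},
		{Index: 0, Args: `ty":"Oslo"}`},
	})
	for _, c := range calls {
		fmt.Println(c.ID, c.Name, string(c.Arguments))
	}
	fmt.Println(err)
}
```

Validating at the end rather than per fragment matches the stated goal that consumers only ever see parseable calls.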
## Consequences
- One test matrix per provider asserts the exact wire JSON both directions;
drift is caught by httptest fixtures, not in production.
- Ollama's missing tool_choice means "required" cannot be enforced there —
documented in the README matrix rather than emulated.
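A wire-level fixture of the kind the first consequence describes could be sketched as below. Everything here is illustrative: `chatRequest` is a hypothetical, trimmed-down body using the `format: <schema>` shape from the mapping table, and `sendChat`, `sameJSON`, and `wireMatches` are invented helpers, not the project's test code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"reflect"
)

// chatRequest is a hypothetical, minimal request body.
type chatRequest struct {
	Model  string          `json:"model"`
	Format json.RawMessage `json:"format,omitempty"`
}

// sendChat posts the request the way a hand-rolled REST client might.
func sendChat(baseURL string, req chatRequest) error {
	body, err := json.Marshal(req)
	if err != nil {
		return err
	}
	resp, err := http.Post(baseURL+"/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

// sameJSON compares two documents structurally, ignoring key order.
func sameJSON(a, b []byte) bool {
	var x, y any
	if json.Unmarshal(a, &x) != nil || json.Unmarshal(b, &y) != nil {
		return false
	}
	return reflect.DeepEqual(x, y)
}

// wireMatches captures what the client actually sent to a local test server
// and compares it with the expected wire JSON.
func wireMatches(req chatRequest, want string) (bool, error) {
	captured := make(chan []byte, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		captured <- b
		io.WriteString(w, `{}`)
	}))
	defer srv.Close()
	if err := sendChat(srv.URL, req); err != nil {
		return false, err
	}
	return sameJSON(<-captured, []byte(want)), nil
}

func main() {
	req := chatRequest{Model: "m", Format: json.RawMessage(`{"type":"object"}`)}
	ok, err := wireMatches(req, `{"format":{"type":"object"},"model":"m"}`)
	fmt.Println(ok, err)
}
```

Comparing decoded values rather than raw bytes keeps the fixture stable against key reordering while still catching a renamed or missing field.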
@@ -12,3 +12,5 @@ One decision per file, append-only; supersede rather than rewrite.
| [0006](0006-health-and-backoff.md) | Model health tracking and backoff | Accepted |
| [0007](0007-dependency-policy.md) | Dependency policy — stdlib-first, hand-rolled REST clients | Accepted |
| [0008](0008-chain-semantics.md) | Failover-chain execution semantics | Accepted |
| [0009](0009-multimodal-strategy.md) | Multimodal strategy — normalize per target, enforce at provider | Accepted |
| [0010](0010-tools-structured-output-mapping.md) | Tools and structured output — canonical shape, native mappings | Accepted |