# majordomo

A clean-slate Go library for building LLM-backed agents: one canonical API
over many model providers, a parseable model naming / failover / tiering
system with built-in health tracking, capability-aware multimodality, tool
calls, structured output, and composable agents and skills.

> **Status:** under construction, phase by phase. The
> [support matrix](#featureprovider-support-matrix) below is kept honest:
> *pending* means not built yet, and this README is updated in the same
> commit as the behavior it describes.

## Install

```bash
go get gitea.stevedudenhoeffer.com/steve/majordomo
```

Requires Go 1.26+.

## Quickstart

```go
package main

import (
	"context"
	"fmt"

	"gitea.stevedudenhoeffer.com/steve/majordomo"
)

func main() {
	reg := majordomo.New() // built-ins + LLM_* env providers

	m, err := reg.Parse("ollama-cloud/minimax-m3:cloud")
	if err != nil {
		panic(err)
	}

	resp, err := m.Generate(context.Background(), majordomo.Request{
		Messages: []majordomo.Message{majordomo.UserText("hello!")},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Text())
}
```

`majordomo.Parse(...)` (package level) uses a lazily-built default registry
if you don't need isolation.

## Model specs: targets, chains, tiers

A model spec is a comma-separated **failover chain**; each element is either
a `provider/model` target or a registered **alias** (tier):

```go
// Try minimax-m3 first; on failure kimi-k2.6; finally fall back to opus-4.8.
m, _ := reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8")

// The same chain with the registered alias "thinking" appended; the alias
// is expanded in place as the tail of the chain:
m, _ = reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8,thinking")
```

Everything after the **first `/`** (up to the next comma) is the model id,
passed to the provider **verbatim**: tags (`:cloud`, `:30b`) and ids with
extra slashes survive intact. majordomo never validates ids against a
catalog.

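
The splitting rule above can be sketched with the standard library alone. This is an illustration of the grammar as described, not majordomo's parser: the `element` type and `splitSpec` function are hypothetical names, and trimming whitespace around elements is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// element is a hypothetical illustration type, not part of majordomo.
type element struct {
	Provider string // text before the first "/"
	Model    string // everything after the first "/", untouched
	Alias    string // set instead when the element contains no "/"
}

// splitSpec breaks a comma-separated spec into elements and cuts each
// target at its FIRST "/", so tags and extra slashes stay in the model id.
func splitSpec(spec string) []element {
	var out []element
	for _, el := range strings.Split(spec, ",") {
		el = strings.TrimSpace(el) // assumption: surrounding spaces are ignored
		if el == "" {
			continue
		}
		provider, model, found := strings.Cut(el, "/")
		if !found {
			out = append(out, element{Alias: el})
			continue
		}
		out = append(out, element{Provider: provider, Model: model})
	}
	return out
}

func main() {
	for _, el := range splitSpec("ollama-cloud/minimax-m3:cloud,openai/org/custom-model,thinking") {
		fmt.Printf("%+v\n", el)
	}
}
```

Cutting at the first `/` only is what lets an id such as `org/custom-model` reach the provider unchanged.
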
### Custom tiers (aliases)

```go
reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")
reg.RegisterAlias("workhorse", "ollama-cloud/minimax-m2.7:cloud,ollama-cloud/qwen3-coder:480b-cloud")

m, _ := reg.Parse("thinking") // a chain, same Model interface as a single target
```

Aliases may appear anywhere in a chain (head, middle, tail), may reference
other aliases, and expand inline and recursively; cycles are detected and
returned as errors.

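
Inline, recursive expansion with cycle detection can be sketched as a depth-first walk that tracks which aliases are currently being expanded. The `expand` function and the plain `map[string]string` alias table below are hypothetical stand-ins; how the registry actually stores aliases is not shown in this README.

```go
package main

import (
	"fmt"
	"strings"
)

// expand flattens spec into provider/model targets, replacing each alias
// with its definition in place. active holds the aliases currently being
// expanded; meeting one of them again means the definitions form a cycle.
func expand(spec string, aliases map[string]string, active map[string]bool) ([]string, error) {
	var out []string
	for _, el := range strings.Split(spec, ",") {
		el = strings.TrimSpace(el)
		if el == "" {
			continue
		}
		if strings.Contains(el, "/") {
			out = append(out, el) // a concrete target, kept as-is
			continue
		}
		def, ok := aliases[el]
		if !ok {
			return nil, fmt.Errorf("unknown alias %q", el)
		}
		if active[el] {
			return nil, fmt.Errorf("alias cycle through %q", el)
		}
		active[el] = true
		sub, err := expand(def, aliases, active)
		delete(active, el) // the same alias may legally appear twice in a chain
		if err != nil {
			return nil, err
		}
		out = append(out, sub...)
	}
	return out, nil
}

func main() {
	aliases := map[string]string{
		"thinking": "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud",
		"smart":    "m5/qwen3:30b,thinking",
		"a":        "b",
		"b":        "a",
	}
	chain, err := expand("smart", aliases, map[string]bool{})
	fmt.Println(chain, err)

	_, err = expand("a", aliases, map[string]bool{})
	fmt.Println(err)
}
```

Removing an alias from `active` once its expansion finishes means only true cycles are rejected, not repeated use of the same alias.
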
### Failover & health

Chains are health-tracked per target:

- A **single transient error** (429/5xx, timeout, connection failure) is
  retried once on the same target.
- **Repeated transient errors** (default: 2 consecutive failed attempts)
  bench the target: chains skip it until its cooldown expires (exponential:
  5s, 10s, 20s, ... capped at 5m). Any success resets it.
- `model not found` advances down the chain without penalty; auth/malformed
  errors fail fast (failing over can't fix a bad key).
- If every element fails, you get one joined error naming each target and
  why it failed.

All knobs are configurable via `WithChainConfig` / `WithHealthConfig`.

## Providers

### Built-in env vars

| Provider | Spec name | Key env var | Default endpoint |
|----------|-----------|-------------|------------------|
| OpenAI (+compatible) | `openai` | `OPENAI_API_KEY` | https://api.openai.com/v1 |
| Anthropic (+compatible) | `anthropic` | `ANTHROPIC_API_KEY` | https://api.anthropic.com |
| Google (Gemini) | `google` | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | Gen AI API *(pending)* |
| Ollama Cloud | `ollama-cloud` | `OLLAMA_API_KEY` | https://ollama.com |
| Ollama (local) | `ollama` | none | `OLLAMA_HOST` or http://localhost:11434 |
| foreman | `foreman` | none (token via DSN) | requires an `LLM_*` DSN or `ollama.Foreman(url, token)` |

For OpenAI-compatible or Anthropic-compatible endpoints, construct the
provider with a name and base URL and register it:

```go
reg.RegisterProvider(openai.New(
	openai.WithName("groq"),
	openai.WithBaseURL("https://api.groq.com/openai/v1"),
	openai.WithAPIKey(key),
	// openai.WithLegacyMaxTokens(), // for servers that only honor max_tokens
))
// now "groq/llama-3.3-70b" works in Parse, chains, and aliases
```

### `LLM_*` env-DSN provider definitions

Define named providers entirely from the environment (go-llm parity):

```
LLM_M1=foreman://test-token-change-me@foreman-m1.orgrimmar.dudenhoeffer.casa
LLM_M5=foreman://test-token-change-me@foreman-m5.orgrimmar.dudenhoeffer.casa
```

This defines providers `m1` and `m5` (foreman targets: the native Ollama
wire protocol behind a bearer token). They are first-class in `Parse`,
chains, and aliases:

```go
m, _ := reg.Parse("m5/qwen3:30b,m1/qwen3:30b,thinking")
```

DSN format: `scheme://[token@]host[/path]`, where `scheme` is one of
`foreman`, `ollama`, `ollama-cloud`, `openai`, `anthropic`,
`google`/`gemini`, or any scheme you add with `RegisterScheme`. The token is
the credential (bearer token / API key); the base URL is always
`https://host[/path]`. `New()` loads `LLM_*` vars eagerly; unknown provider
names also resolve lazily at Parse time (`my-prov/x` → `LLM_MY_PROV`).

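
The DSN shape and the name-to-variable mapping can be sketched with `net/url`. Everything named here (`dsn`, `parseDSN`, `envName`) is hypothetical and only mirrors the rules stated above; the host in the example is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// envName maps a provider name to its variable, following the
// `my-prov/x` → `LLM_MY_PROV` example: upper-case, "-" becomes "_".
func envName(provider string) string {
	return "LLM_" + strings.ToUpper(strings.ReplaceAll(provider, "-", "_"))
}

// dsn is a hypothetical holder for the three parts the format describes.
type dsn struct {
	Scheme  string // selects the provider implementation
	Token   string // bearer token / API key, may be empty
	BaseURL string // always https://host[/path]
}

// parseDSN splits scheme://[token@]host[/path] into its parts.
func parseDSN(raw string) (dsn, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return dsn{}, errors.New("dsn: not a valid URL") // avoid echoing the token
	}
	if u.Scheme == "" || u.Host == "" {
		return dsn{}, errors.New("dsn: want scheme://[token@]host[/path]")
	}
	return dsn{
		Scheme:  u.Scheme,
		Token:   u.User.Username(), // nil-safe: returns "" when no token is present
		BaseURL: "https://" + u.Host + u.Path,
	}, nil
}

func main() {
	fmt.Println(envName("my-prov"))
	d, err := parseDSN("foreman://test-token@foreman.example.com")
	fmt.Printf("%+v %v\n", d, err)
}
```

The error messages deliberately omit the raw DSN, since it carries the credential.
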
### Custom providers

Implement the two-method `Provider` interface and register it:

```go
reg.RegisterProvider(myProvider) // now "myprovider/model-x" parses, chains, aliases
```

## Multimodality

Attach images without knowing the target's limits. Before each attempt the
request is normalized against the **actual serving target's** declared
capabilities: the real format is sniffed from the bytes, oversize images
are downscaled (aspect preserved), disallowed formats are re-encoded, and
byte budgets are enforced by a quality ladder. What cannot be made to fit
is rejected with a clear `ErrUnsupported` error; in a chain, the request
simply advances to the next (e.g. vision-capable) element.

```go
resp, err := m.Generate(ctx, majordomo.Request{
	Messages: []majordomo.Message{
		majordomo.UserParts(majordomo.Text("what's in this image?"),
			majordomo.Image("image/png", pngBytes)),
	},
})
```

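
Two of the normalization steps described above, sniffing the real format and computing aspect-preserving dimensions, can be sketched with the standard library. This is not the `media/` package's code: `sniff`, `fit`, and the single `maxSide` limit are assumptions made for the illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff reports the MIME type implied by the bytes themselves, ignoring
// whatever type the caller declared for the image.
func sniff(data []byte) string {
	return http.DetectContentType(data)
}

// fit scales (w, h) down so the longer side equals maxSide, preserving the
// aspect ratio. Images that already fit are returned unchanged.
func fit(w, h, maxSide int) (int, int) {
	if w <= maxSide && h <= maxSide {
		return w, h
	}
	if w >= h {
		nh := h * maxSide / w
		if nh < 1 {
			nh = 1 // never collapse a very wide image to zero height
		}
		return maxSide, nh
	}
	nw := w * maxSide / h
	if nw < 1 {
		nw = 1
	}
	return nw, maxSide
}

func main() {
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(sniff(pngHeader))
	fmt.Println(fit(4000, 3000, 2048))
}
```

Sniffing first matters because a caller may label JPEG bytes as `image/png`; the decision to re-encode should rest on what the bytes actually are.
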
## Tool calls

```go
weather := majordomo.Tool{
	Name:        "get_weather",
	Description: "Current weather for a city",
	Parameters:  json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}`),
	Handler: func(ctx context.Context, args json.RawMessage) (any, error) {
		var p struct{ City string `json:"city"` }
		// Surface malformed arguments instead of silently using a zero value.
		if err := json.Unmarshal(args, &p); err != nil {
			return nil, err
		}
		return map[string]any{"city": p.City, "temp_c": 21}, nil
	},
}
resp, _ := m.Generate(ctx, req, majordomo.WithTools(weather))
// resp.ToolCalls → execute → append ToolResultsMessage → continue
```

Each provider maps this one shape to its native function-calling format
(OpenAI tools/tool_calls, Anthropic tool_use/tool_result, Ollama tools with
object arguments). Tool-call ids are synthesized when a backend omits them;
streaming buffers tool-call arguments until they parse.

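
Buffering streamed tool-call arguments until they parse can be sketched as a per-index accumulator that only releases a call once its buffered text is valid JSON. The `fragment`, `call`, and `assembler` types are hypothetical; the provider packages' real event types are not shown in this README.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// fragment is a hypothetical streamed piece of one tool call.
type fragment struct {
	Index int    // which tool call this piece belongs to
	Name  string // typically present on the first piece only
	Args  string // partial JSON text
}

// call is a fully assembled tool call.
type call struct {
	Name string
	Args json.RawMessage
}

type assembler struct {
	names map[int]string
	args  map[int]*strings.Builder
}

func newAssembler() *assembler {
	return &assembler{names: map[int]string{}, args: map[int]*strings.Builder{}}
}

// add appends one fragment to the buffer for its index.
func (a *assembler) add(f fragment) {
	if f.Name != "" {
		a.names[f.Index] = f.Name
	}
	b, ok := a.args[f.Index]
	if !ok {
		b = &strings.Builder{}
		a.args[f.Index] = b
	}
	b.WriteString(f.Args)
}

// ready returns, in index order, the calls whose buffered arguments
// currently form a complete JSON document.
func (a *assembler) ready() []call {
	idx := make([]int, 0, len(a.args))
	for i := range a.args {
		idx = append(idx, i)
	}
	sort.Ints(idx)
	var out []call
	for _, i := range idx {
		raw := a.args[i].String()
		if json.Valid([]byte(raw)) {
			out = append(out, call{Name: a.names[i], Args: json.RawMessage(raw)})
		}
	}
	return out
}

func main() {
	a := newAssembler()
	a.add(fragment{Index: 0, Name: "get_weather", Args: `{"city":"Par`})
	fmt.Println(len(a.ready())) // arguments are still incomplete
	a.add(fragment{Index: 0, Args: `is"}`})
	for _, c := range a.ready() {
		fmt.Println(c.Name, string(c.Args))
	}
}
```

Because tool arguments are JSON objects, a strict prefix of the text is never itself valid JSON, so the validity check does not release a call early.
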
## Structured output

```go
resp, _ := m.Generate(ctx, req, majordomo.WithSchema(schemaJSON, "answer"))
```

Maps to OpenAI `response_format: json_schema`, Anthropic
`output_config.format`, and Ollama `format`. A generic `Generate[T]` helper
(schema from your struct, unmarshal into it) lands with the agent phase.

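
Until that helper exists, the JSON text of a schema-constrained reply can be decoded by hand. The sketch below is not the planned `Generate[T]`; `decode` and `answer` are hypothetical, and it assumes the response text is exactly one JSON document matching the schema you sent.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// answer is a hypothetical struct mirroring the schema given to the model.
type answer struct {
	City  string  `json:"city"`
	TempC float64 `json:"temp_c"`
}

// decode unmarshals structured-output text into T and rejects fields the
// struct does not declare, so schema drift fails loudly.
func decode[T any](text string) (T, error) {
	var v T
	dec := json.NewDecoder(strings.NewReader(text))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&v); err != nil {
		return v, fmt.Errorf("decode structured output: %w", err)
	}
	return v, nil
}

func main() {
	// In real use the text would come from the model response.
	a, err := decode[answer](`{"city":"Paris","temp_c":21}`)
	fmt.Printf("%+v %v\n", a, err)
}
```
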
## Agents & skills *(pending: Phases 5–6)*

An agent is a model, a system prompt, and toolboxes, running a tool-dispatch
loop; a skill is a reusable instruction-plus-tool bundle attachable to any
agent.

## Feature/provider support matrix

| Provider             | Resolve/Parse | Chat | Streaming | Tools | Structured | Images | Env DSN |
|----------------------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| OpenAI (+compatible) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic (+compatible) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google (Gemini)      | ✅ | pending | pending | pending | pending | pending | ✅ |
| Ollama Cloud         | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Ollama (local)       | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| foreman              | ✅ | ✅ | ✅¹ | ✅ | ✅ | ✅ | ✅ |
| fake (testing)       | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | n/a |

¹ foreman's daemon currently buffers sync chat responses (no token-by-token
streaming); majordomo's stream API works against it and delivers the
response as a single delta plus final event.

Note: Ollama has no native `tool_choice`. `"none"` drops the tools;
`"required"` and named choices are ignored there on a best-effort basis.

Cross-cutting: Parse grammar ✅ · aliases/tiers ✅ · failover chains ✅ ·
health tracking/backoff ✅ · `LLM_*` env DSNs ✅ · media pipeline ✅
(per-target normalization in chains) · agent loop pending · skills pending
· `Generate[T]` pending.

## Development

```bash
go build ./... && go vet ./... && go test -race -count=1 ./...
```

The default test suite is fully hermetic (no network, no credentials).
Live integration tests (Phase 8) are gated behind the `live` build tag and
read `.env` (see `.env.example`; never commit `.env`).

Design decisions are recorded in [docs/adr/](docs/adr/README.md);
conventions in [CLAUDE.md](CLAUDE.md); build history in
[progress.md](progress.md).