96c612e707
Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping
proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to
provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with
legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and
a timeout-free client so cold model swaps rely on context deadlines. The
"tailored" surface is concrete management methods (ListModels / Running /
Unload) that don't belong on the canonical llm.Provider interface. The
llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL
built-in errors clearly on use, mirroring foreman.
Add imagegen, a new canonical text-to-image interface separate from llm
(Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed
straight back into chat). First backend is llama-swap via OpenAI
/v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is
txt2img only.
Hermetic httptest coverage for chat delegation, management endpoints, image
decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix +
image-gen section, CLAUDE.md package map, and progress.md updated in the same
commit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
388 lines
15 KiB
Markdown
# majordomo

A clean-slate Go library for building LLM-backed agents: one canonical API
over many model providers, a parseable model naming / failover / tiering
system with built-in health tracking, capability-aware multimodality, tool
calls, structured output, and composable agents and skills.

> ### 🤖 Heads up: this is a vibe-coded project
> majordomo was built almost entirely by an AI agent (Claude Code): design,
> code, and docs. It is reasonably well-tested (a fully hermetic suite plus
> gated live integration tests) and is used in earnest, but treat it
> accordingly: read the code before depending on it, expect the occasional
> AI-flavored rough edge, and please open issues. No warranty implied.

> The [support matrix](#featureprovider-support-matrix) below is kept
> honest: *pending* means not built, and this README is updated in the
> same commit as the behavior it describes. Runnable programs for every
> feature live in [examples/](examples/README.md).

## Install

```bash
go get gitea.stevedudenhoeffer.com/steve/majordomo
```

Requires Go 1.26+.

## Quickstart

```go
package main

import (
	"context"
	"fmt"

	"gitea.stevedudenhoeffer.com/steve/majordomo"
)

func main() {
	reg := majordomo.New() // built-ins + LLM_* env providers

	m, err := reg.Parse("ollama-cloud/minimax-m3:cloud")
	if err != nil { panic(err) }

	resp, err := m.Generate(context.Background(), majordomo.Request{
		Messages: []majordomo.Message{majordomo.UserText("hello!")},
	})
	if err != nil { panic(err) }
	fmt.Println(resp.Text())
}
```

`majordomo.Parse(...)` (package level) uses a lazily-built default registry
if you don't need isolation.

## Model specs: targets, chains, tiers

A model spec is a comma-separated **failover chain**; each element is either
a `provider/model` target or a registered **alias** (tier):

```go
// Try minimax-m3 first; on failure kimi-k2.6; finally fall back to opus-4.8.
m, _ := reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8")

// Same chain, with the registered alias "thinking" appended and expanded
// in place as the tail of the chain:
m, _ = reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8,thinking")
```

Everything after the **first `/`** (up to the next comma) is the model id,
passed to the provider **verbatim**: tags (`:cloud`, `:30b`) and ids with
extra slashes survive intact. majordomo never validates ids against a
catalog.

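The splitting rule above can be sketched in a few lines. This is an illustration only, not majordomo's parser: the `Element` type and `splitSpec` function are hypothetical names, and trimming whitespace around elements is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Element is one entry of a failover chain (hypothetical type for this sketch):
// either a provider/model target or a bare alias name.
type Element struct {
	Provider string // empty for an alias
	Model    string // verbatim model id, or the alias name
}

// splitSpec breaks a spec into chain elements. Only the first "/" of each
// element separates provider from model id; the remainder is kept untouched.
func splitSpec(spec string) []Element {
	var out []Element
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part) // assumption: surrounding spaces are ignored
		if part == "" {
			continue
		}
		provider, model, found := strings.Cut(part, "/")
		if !found {
			out = append(out, Element{Model: part}) // no slash: treat as alias
			continue
		}
		out = append(out, Element{Provider: provider, Model: model})
	}
	return out
}

func main() {
	for _, e := range splitSpec("ollama-cloud/minimax-m3:cloud,openai/org/model-x,thinking") {
		fmt.Printf("provider=%q model=%q\n", e.Provider, e.Model)
	}
}
```

Using `strings.Cut` rather than `strings.Split` on `/` is what keeps tags and extra slashes inside the model id.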
### Custom tiers (aliases)

```go
reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")
reg.RegisterAlias("workhorse", "ollama-cloud/minimax-m2.7:cloud,ollama-cloud/qwen3-coder:480b-cloud")

m, _ := reg.Parse("thinking") // a chain, same Model interface as a single target
```

Aliases may appear anywhere in a chain (head, middle, tail), may reference
other aliases, and expand inline and recursively; cycles are detected and
returned as errors.

For tiers that live in a database or config system, register a **dynamic
resolver**. It is consulted after static aliases, and its output is expanded
with the same recursion and cycle guards:

```go
reg.RegisterResolver(majordomo.ResolverFunc(func(name string) (string, bool) {
	return myConfigStore.LookupTier(name) // e.g. "agent-thinking" → a chain
}))
```

### Failover & health

Chains are health-tracked per target:

- A **single transient error** (429/5xx, timeout, connection failure) is
  retried once on the same target.
- **Repeated transient errors** (default: 2 consecutive failed attempts)
  bench the target: chains skip it until its cooldown expires (exponential:
  5s, 10s, 20s, ... capped at 5m). Any success resets it.
- `model not found` advances down the chain without penalty; auth/malformed
  errors fail fast (failing over can't fix a bad key). All knobs are
  configurable via `WithChainConfig` / `WithHealthConfig`.
- If every element fails, you get one joined error naming each target and
  why it failed.
- Ops surfaces: `reg.Health()` exposes `Bench`/`Unbench`/`Snapshot` for
  manual control and dashboards; `ChainConfig.Observer` receives one event
  per failover decision (failed attempt, bench, benched-skip) for logging.

## Providers

### Built-in env vars

| Provider | Spec name | Key env var | Default endpoint |
|----------|-----------|-------------|------------------|
| OpenAI (+compatible) | `openai` | `OPENAI_API_KEY` | https://api.openai.com/v1 |
| Anthropic (+compatible) | `anthropic` | `ANTHROPIC_API_KEY` | https://api.anthropic.com |
| Google (Gemini) | `google` | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | Gemini API (official SDK) |
| Ollama Cloud | `ollama-cloud` | `OLLAMA_API_KEY` | https://ollama.com |
| Ollama (local) | `ollama` | — | `OLLAMA_HOST` or http://localhost:11434 |
| foreman | `foreman` | — (token via DSN) | requires an LLM_* DSN or `ollama.Foreman(url, token)` |
| llama-swap | `llama-swap` | — (token via DSN) | requires an LLM_* DSN or `llamaswap.New(...)` |

OpenAI-compatible / Anthropic-compatible endpoints: construct the provider
with a name and base URL and register it:

```go
reg.RegisterProvider(openai.New(
	openai.WithName("groq"),
	openai.WithBaseURL("https://api.groq.com/openai/v1"),
	openai.WithAPIKey(key),
	// openai.WithLegacyMaxTokens(), // for servers that only honor max_tokens
))
// now "groq/llama-3.3-70b" works in Parse, chains, and aliases
```

### `LLM_*` env-DSN provider definitions

Define named providers entirely from the environment (go-llm parity):

```
LLM_M1=foreman://test-token-change-me@foreman-m1.example.com
LLM_M5=foreman://test-token-change-me@foreman-m5.example.com
```

defines providers `m1` and `m5` (foreman targets: native Ollama wire
protocol behind a bearer token). They are first-class in `Parse`, chains,
and aliases:

```go
m, _ := reg.Parse("m5/qwen3:30b,m1/qwen3:30b,thinking")
```

DSN format: `scheme://[token@]host[:port][/path]`, scheme ∈ `foreman`, `ollama`,
`ollama-cloud`, `openai`, `anthropic`, `google`/`gemini`, `llama-swap`, or any
scheme you add with `RegisterScheme`. The token is the credential (bearer token
/ API key); the base URL is always `https://host[/path]`, except for `llama-swap`,
which builds `http://host[:port]` since it's local-first. `New()` loads `LLM_*`
vars eagerly; unknown provider names also resolve lazily at Parse time
(`my-prov/x` → `LLM_MY_PROV`).

```
LLM_LS=llama-swap://token@box.local:8080 # then "ls/qwen3:14b" parses
```

[llama-swap](https://github.com/mostlygeek/llama-swap) is a model-swapping proxy
over llama.cpp. Its chat API is OpenAI-compatible (majordomo reuses the openai
client), and the `*llamaswap.Provider` adds management methods
(`ListModels`/`Running`/`Unload`) plus image generation (see below). A cold
model swap can take many seconds, so bound calls with a context deadline, not a
client timeout.

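To make the DSN rules above concrete, here is a minimal sketch built on the standard `net/url` package. It is not majordomo's loader: `parseDSN` and `envVarFor` are hypothetical helpers, and details such as how paths or empty tokens are treated are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseDSN splits scheme://[token@]host[/path] and builds the base URL:
// https by default, http for llama-swap (local-first).
func parseDSN(dsn string) (scheme, token, baseURL string, err error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", "", "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", "", "", fmt.Errorf("dsn %q: want scheme://[token@]host[/path]", dsn)
	}
	if u.User != nil {
		token = u.User.Username() // the credential sits in the userinfo slot
	}
	if u.Scheme == "llama-swap" {
		return u.Scheme, token, "http://" + u.Host, nil
	}
	return u.Scheme, token, "https://" + u.Host + u.Path, nil
}

// envVarFor maps a provider name to its variable name: my-prov -> LLM_MY_PROV.
func envVarFor(name string) string {
	return "LLM_" + strings.ToUpper(strings.ReplaceAll(name, "-", "_"))
}

func main() {
	for _, dsn := range []string{
		"foreman://test-token-change-me@foreman-m1.example.com",
		"llama-swap://token@box.local:8080",
	} {
		scheme, token, base, err := parseDSN(dsn)
		fmt.Println(scheme, token, base, err)
	}
	fmt.Println(envVarFor("my-prov"))
}
```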
### Custom providers

Implement the two-method `Provider` interface and register it:

```go
reg.RegisterProvider(myProvider) // now "myprovider/model-x" parses, chains, aliases
```

## Multimodality

Attach images without knowing the target's limits. Before each attempt the
request is normalized against the **actual serving target's** declared
capabilities: the real format is sniffed from the bytes, oversize images
are downscaled (aspect preserved), disallowed formats are re-encoded, and
byte budgets are enforced by a quality ladder. What cannot be made to fit
is rejected with a clear `ErrUnsupported` error; in a chain, the
request simply advances to the next (e.g. vision-capable) element.

```go
resp, err := m.Generate(ctx, majordomo.Request{
	Messages: []majordomo.Message{
		majordomo.UserParts(majordomo.Text("what's in this image?"),
			majordomo.Image("image/png", pngBytes)),
	},
})
```

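One step of that normalization, downscaling with the aspect ratio preserved, comes down to a small calculation. The sketch below shows only the arithmetic: `fitWithin` is a hypothetical helper, the 2048-pixel limit is an invented example, and the real pipeline's limits and rounding are not documented here.

```go
package main

import "fmt"

// fitWithin scales (w, h) down to fit inside (maxW, maxH) while preserving
// the aspect ratio. Images that already fit are returned unchanged.
func fitWithin(w, h, maxW, maxH int) (int, int) {
	if w <= maxW && h <= maxH {
		return w, h
	}
	// Compare w/maxW with h/maxH by cross-multiplying, which avoids floats.
	if w*maxH >= h*maxW {
		// Width is the binding constraint.
		nh := h * maxW / w
		if nh < 1 {
			nh = 1
		}
		return maxW, nh
	}
	// Height is the binding constraint.
	nw := w * maxH / h
	if nw < 1 {
		nw = 1
	}
	return nw, maxH
}

func main() {
	fmt.Println(fitWithin(4000, 3000, 2048, 2048)) // landscape, too large
	fmt.Println(fitWithin(1000, 4000, 2048, 2048)) // portrait, too tall
	fmt.Println(fitWithin(800, 600, 2048, 2048))   // already fits
}
```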
## Image generation

Text-to-image is a separate contract (`imagegen`) from chat, because it shares
none of the message/tool/stream machinery. Generated images come back as
`llm.ImagePart`, so they drop straight back into a chat turn. The first backend
is llama-swap (OpenAI `/v1/images/generations` → a stable-diffusion.cpp
upstream).

```go
ls := llamaswap.New(llamaswap.WithBaseURL("http://box.local:8080"))
im, _ := ls.ImageModel("sd-xl")

res, err := im.Generate(ctx, imagegen.Request{Prompt: "a red bicycle"},
	imagegen.WithSize("1024x1024"))
// res.Images[0] is an llm.ImagePart (bytes + MIME); feed it back into chat:
// majordomo.UserParts(majordomo.Text("describe this"), res.Images[0])
```

`*llamaswap.Provider` also exposes management methods: `ListModels` (what
llama-swap can serve), `Running` (what's loaded), and `Unload` (free a model).

## Tool calls

```go
weather := majordomo.Tool{
	Name:        "get_weather",
	Description: "Current weather for a city",
	Parameters:  json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}`),
	Handler: func(ctx context.Context, args json.RawMessage) (any, error) {
		var p struct{ City string `json:"city"` }
		if err := json.Unmarshal(args, &p); err != nil {
			return nil, err // malformed arguments become a tool error
		}
		return map[string]any{"city": p.City, "temp_c": 21}, nil
	},
}
resp, _ := m.Generate(ctx, req, majordomo.WithTools(weather))
// resp.ToolCalls → execute → append ToolResultsMessage → continue
```

Or typed, with the schema derived from your argument struct:

```go
weather := majordomo.DefineTool("get_weather", "Current weather for a city",
	func(ctx context.Context, args struct {
		City string `json:"city" description:"city name"`
	}) (any, error) {
		return lookup(args.City)
	})
```

Each provider maps this one shape to its native function-calling format
(OpenAI tools/tool_calls, Anthropic tool_use/tool_result, Ollama tools with
object arguments). Tool-call ids are synthesized when a backend omits them;
streaming buffers tool-call arguments until they parse.

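The last point, buffering streamed tool-call arguments until they parse, can be pictured with a small accumulator. This is a sketch of the idea only; `argBuffer` is a hypothetical type, and the providers' actual streaming code is not shown in this README.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// argBuffer accumulates streamed argument fragments for one tool call.
type argBuffer struct {
	b strings.Builder
}

// Add appends a fragment and returns the complete arguments once the
// accumulated text is valid JSON; until then it reports false.
func (a *argBuffer) Add(fragment string) (json.RawMessage, bool) {
	a.b.WriteString(fragment)
	if s := a.b.String(); json.Valid([]byte(s)) {
		return json.RawMessage(s), true
	}
	return nil, false
}

func main() {
	var buf argBuffer
	for _, frag := range []string{`{"ci`, `ty":"Par`, `is"}`} {
		if args, ok := buf.Add(frag); ok {
			fmt.Printf("complete: %s\n", args)
		} else {
			fmt.Println("buffering")
		}
	}
}
```

Because tool arguments are JSON objects, a prefix of the stream cannot be valid JSON before the closing brace arrives, which is what makes the `json.Valid` check a usable completion signal here.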
## Structured output

```go
resp, _ := m.Generate(ctx, req, majordomo.WithSchema(schemaJSON, "answer"))
```

Maps to OpenAI `response_format: json_schema`, Anthropic
`output_config.format`, Ollama `format`, and Google `responseJsonSchema`.

The typed helper derives the schema from your struct (all fields required,
`additionalProperties:false`, pointers nullable; `description:"..."` and
`enum:"a,b,c"` tags supported) and unmarshals the result:

```go
type Verdict struct {
	Guilty bool   `json:"guilty"`
	Why    string `json:"why" description:"one-sentence rationale"`
}
v, err := majordomo.Generate[Verdict](ctx, m, req)
```

## Agents

An agent is a model + system prompt + toolboxes, run as a tool-dispatch
loop until the model answers (or `MaxSteps`):

```go
import "gitea.stevedudenhoeffer.com/steve/majordomo/agent"

a := agent.New(m, "You are a research assistant.",
	agent.WithToolbox(searchTools),
	agent.WithMaxSteps(8),
	agent.WithStepObserver(func(s agent.Step) { log.Printf("step %d", s.Index) }),
)
res, err := a.Run(ctx, "What changed in Go 1.26?")
// res.Output, res.Steps, res.Usage; res.Messages round-trips via
// agent.WithHistory for conversation continuation.
```

The loop never panics: tool handler errors and panics become error results
the model can react to; unknown tools likewise; duplicate tool names across
toolboxes fail loudly. On `agent.ErrMaxSteps` (and on model errors) the
partial result with the full transcript is still returned.

Supervision hooks for orchestrators: `WithMaxStepsFunc` (dynamic step
budget), `WithSteer` (inject messages into a running agent),
`WithCompactor` (transform the outbound transcript when context grows;
the canonical `Result.Messages` stays complete), and `WithToolErrorLimits`
(circuit breakers for all-error steps and identical repeated calls,
surfacing `agent.ErrToolLoop`).

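The "never panics" guarantee described above usually rests on a `recover` wrapper around each tool call. The sketch below shows that pattern in isolation; `Handler`, `safeCall`, `dispatch`, and `demoTools` are hypothetical names, and the real agent loop's types and error text will differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Handler is a stand-in for a tool handler in this sketch.
type Handler func(args string) (any, error)

// demoTools is sample data: one working tool, one failing, one panicking.
var demoTools = map[string]Handler{
	"ok":   func(args string) (any, error) { return "echo " + args, nil },
	"fail": func(args string) (any, error) { return nil, errors.New("nope") },
	"boom": func(args string) (any, error) { panic("kaboom") },
}

// safeCall runs a handler and converts a panic into an ordinary error.
func safeCall(h Handler, args string) (result any, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("tool panicked: %v", r)
		}
	}()
	return h(args)
}

// dispatch returns text for the model: the tool's result, or an error
// description it can react to. Unknown tools are reported the same way.
func dispatch(tools map[string]Handler, name, args string) string {
	h, ok := tools[name]
	if !ok {
		return fmt.Sprintf("error: unknown tool %q", name)
	}
	res, err := safeCall(h, args)
	if err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprint(res)
}

func main() {
	for _, name := range []string{"ok", "fail", "boom", "missing"} {
		fmt.Println(dispatch(demoTools, name, "x"))
	}
}
```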
## Skills

Skills are reusable instruction+tool bundles attachable to **any** agent,
at construction or on demand. Instructions extend the system prompt;
tools extend the toolset, additively and in attachment order.

```go
import (
	"gitea.stevedudenhoeffer.com/steve/majordomo/skill"
	"gitea.stevedudenhoeffer.com/steve/majordomo/skill/calc"
	"gitea.stevedudenhoeffer.com/steve/majordomo/skill/clock"
)

research := skill.New("research",
	skill.WithInstructions("Cite a source for every claim."),
	skill.WithTools(searchTool, fetchTool),
)

a := agent.New(m, "You are helpful.", agent.WithSkill(research))
a.AddSkill(clock.New()) // ready-made: time awareness
a.AddSkill(calc.New())  // ready-made: exact arithmetic
```

Anything implementing the three-method `agent.Skill` interface (Name /
Instructions / Tools) is a skill; `skill.New` is just the convenient way
to build one.

## Feature/provider support matrix

| Provider             | Resolve/Parse | Chat | Streaming | Tools | Structured | Images | Env DSN |
|----------------------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| OpenAI (+compatible) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic (+compat)  | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google (Gemini)      | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Ollama Cloud         | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Ollama (local)       | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| foreman              | ✅ | ✅ | ✅¹ | ✅ | ✅ | ✅ | ✅ |
| llama-swap           | ✅ | ✅ | ✅ | ✅² | ✅² | ✅² | ✅ |
| fake (testing)       | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |

¹ foreman's daemon currently buffers sync chat responses (no token-by-token
streaming); majordomo's stream API works against it and delivers the
response as a single delta plus final event.

² llama-swap's chat is OpenAI-compatible and reuses the openai client, so these
capabilities are present at the client level; whether a given call succeeds
depends on the llama.cpp model llama-swap loads. llama-swap also provides
**image generation** (a separate `imagegen` axis, not shown above) and
management methods on `*llamaswap.Provider`.

Notes: Ollama has no native tool_choice: `"none"` drops the tools;
`"required"`/named choices are best-effort ignored there. Ollama Cloud
ignores the `format` field (verified live), so the provider also states
the schema as an explicit system instruction: constrained decoding on
local Ollama, instruction-guided JSON on cloud, one canonical API either
way.

Cross-cutting: Parse grammar ✅ · aliases/tiers ✅ · failover chains ✅ ·
health tracking/backoff ✅ · LLM_* env DSNs ✅ · media pipeline ✅
(per-target normalization in chains) · agent loop ✅ · `Generate[T]` +
schema derivation ✅ · skills ✅ (with clock + calc examples).

## Development

```bash
go build ./... && go vet ./... && go test -race -count=1 ./...
```

The default test suite is fully hermetic (no network, no credentials).
Live integration tests (Phase 8) are gated behind the `live` build tag and
read `.env` (see `.env.example`; never commit `.env`).

Design decisions are recorded in [docs/adr/](docs/adr/README.md);
conventions in [CLAUDE.md](CLAUDE.md); build history in
[progress.md](progress.md); the mort conversion plan in
[docs/mort-migration.md](docs/mort-migration.md).