The agent loop took the final answer only from the terminal (no-tool-call)
turn. Models that "front-load" their answer into an earlier turn that also
calls a tool — then close with a trivial pointer like "(Already answered
above.)" — had their real answer discarded and the pointer delivered. This
recurs across several open-weight models (glm-5.2, etc.); well-behaved models
(Claude/GPT) defer their answer to the terminal turn and are unaffected.
finalOutput() now falls back to the last substantive assistant content in the
transcript when the terminal text is weak (empty, or a short back-reference).
The predicate is narrow and back-reference-gated so short-but-correct answers
("42", "It's down, restarting now.") are never overridden; recovery only picks
a prior turn that reads like a real answer, not a preamble. Zero extra model
calls. Terminal-answer behavior for normal runs is unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
majordomo
A clean-slate Go library for building LLM-backed agents: one canonical API over many model providers, a parseable model naming / failover / tiering system with built-in health tracking, capability-aware multimodality, tool calls, structured output, and composable agents and skills.
🤖 Heads up: this is a vibe-coded project
majordomo was built almost entirely by an AI agent (Claude Code) — design, code, and docs. It is reasonably well-tested (a fully hermetic suite plus gated live integration tests) and is used in earnest, but treat it accordingly: read the code before depending on it, expect the occasional AI-flavored rough edge, and please open issues. No warranty implied.
The support matrix below is kept honest: pending means not built, and this README is updated in the same commit as the behavior it describes. Runnable programs for every feature live in examples/.
Install
go get gitea.stevedudenhoeffer.com/steve/majordomo
Requires Go 1.26+.
Quickstart
package main

import (
	"context"
	"fmt"

	"gitea.stevedudenhoeffer.com/steve/majordomo"
)

func main() {
	reg := majordomo.New() // built-ins + LLM_* env providers
	m, err := reg.Parse("ollama-cloud/minimax-m3:cloud")
	if err != nil {
		panic(err)
	}
	resp, err := m.Generate(context.Background(), majordomo.Request{
		Messages: []majordomo.Message{majordomo.UserText("hello!")},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Text())
}
If you don't need an isolated registry, the package-level
majordomo.Parse(...) uses a lazily built default registry.
Model specs: targets, chains, tiers
A model spec is a comma-separated failover chain; each element is either
a provider/model target or a registered alias (tier):
// Try minimax-m3 first; on failure kimi-k2.6; finally fall back to opus-4.8.
m, _ := reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8")
// The same chain, with the registered alias "thinking" appended and expanded
// in place as the tail of the chain:
m, _ = reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8,thinking")
Everything after the first / (up to the next comma) is the model id,
passed to the provider verbatim — tags (:cloud, :30b) and ids with
extra slashes survive intact. majordomo never validates ids against a
catalog.
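The splitting rule above can be sketched in a few lines of Go. This is an illustration only, not majordomo's parser: the `element` type, the `splitSpec` helper, the whitespace trimming, and the sample id `openai/org/custom/model` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// element is a hypothetical parsed chain element, not majordomo's real type.
type element struct {
	Provider string // text before the first "/"
	Model    string // everything after the first "/", kept verbatim
	Alias    string // set instead when the element contains no "/"
}

// splitSpec splits a spec on commas, then splits each element on its
// FIRST "/" only, so tags and extra slashes stay inside Model.
func splitSpec(spec string) []element {
	var out []element
	for _, raw := range strings.Split(spec, ",") {
		raw = strings.TrimSpace(raw) // assumption: surrounding spaces are ignored
		if raw == "" {
			continue
		}
		provider, model, found := strings.Cut(raw, "/")
		if !found {
			out = append(out, element{Alias: raw})
			continue
		}
		out = append(out, element{Provider: provider, Model: model})
	}
	return out
}

func main() {
	spec := "ollama-cloud/kimi-k2.6:cloud,openai/org/custom/model,thinking"
	for _, e := range splitSpec(spec) {
		fmt.Printf("provider=%q model=%q alias=%q\n", e.Provider, e.Model, e.Alias)
	}
}
```

Cutting on the first slash only is what lets a model id carry its own slashes and tags through to the provider unchanged.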
Custom tiers (aliases)
reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")
reg.RegisterAlias("workhorse", "ollama-cloud/minimax-m2.7:cloud,ollama-cloud/qwen3-coder:480b-cloud")
m, _ := reg.Parse("thinking") // a chain, same Model interface as a single target
Aliases may appear anywhere in a chain (head, middle, tail), may reference other aliases, and expand inline and recursively; cycles are detected and returned as errors.
For tiers that live in a database or config system, register a dynamic resolver — consulted after static aliases, output expanded with the same recursion and cycle guards:
reg.RegisterResolver(majordomo.ResolverFunc(func(name string) (string, bool) {
return myConfigStore.LookupTier(name) // e.g. "agent-thinking" → a chain
}))
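For intuition, inline recursive expansion with a cycle guard might look roughly like the sketch below. It is not the library's implementation: the `expand` function, the plain `map[string]string` alias table, and the error wording are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// expand resolves alias names in a comma-separated chain, inline and
// recursively. Elements containing "/" are treated as provider/model
// targets and passed through. visiting holds the aliases on the current
// expansion path, so a cycle is reported instead of recursing forever.
func expand(spec string, aliases map[string]string, visiting map[string]bool) ([]string, error) {
	var out []string
	for _, elem := range strings.Split(spec, ",") {
		elem = strings.TrimSpace(elem)
		if elem == "" {
			continue
		}
		if strings.Contains(elem, "/") {
			out = append(out, elem)
			continue
		}
		body, ok := aliases[elem]
		if !ok {
			return nil, fmt.Errorf("unknown alias %q", elem)
		}
		if visiting[elem] {
			return nil, fmt.Errorf("alias cycle through %q", elem)
		}
		visiting[elem] = true
		sub, err := expand(body, aliases, visiting)
		delete(visiting, elem) // off the path again; the alias may recur elsewhere
		if err != nil {
			return nil, err
		}
		out = append(out, sub...)
	}
	return out, nil
}

func main() {
	aliases := map[string]string{
		"thinking":  "anthropic/opus-4.8,workhorse",
		"workhorse": "ollama-cloud/minimax-m2.7:cloud",
	}
	chain, err := expand("m5/qwen3:30b,thinking,m1/qwen3:30b", aliases, map[string]bool{})
	fmt.Println(chain, err)

	_, err = expand("a", map[string]string{"a": "b", "b": "a"}, map[string]bool{})
	fmt.Println(err)
}
```

Tracking only the current path (rather than every alias ever seen) allows the same alias to appear twice in one chain while still rejecting true cycles.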
Failover & health
Chains are health-tracked per target:
- A single transient error (429/5xx, timeout, connection failure) is retried once on the same target.
- Repeated transient errors (default: 2 consecutive failed attempts) bench the target — chains skip it until its cooldown expires (exponential: 5s, 10s, 20s, ... capped at 5m). Any success resets it.
- model not found advances down the chain without penalty; auth/malformed errors fail fast (failing over can't fix a bad key). All knobs are configurable via WithChainConfig/WithHealthConfig.
- If every element fails, you get one joined error naming each target and why it failed.
- Ops surfaces: reg.Health() exposes Bench/Unbench/Snapshot for manual control and dashboards; ChainConfig.Observer receives one event per failover decision (failed attempt, bench, benched-skip) for logging.
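The cooldown schedule quoted above (5s, 10s, 20s, ... capped at 5m) is a plain doubling backoff. A minimal sketch, assuming the duration is keyed on how many times in a row a target has been benched; the `cooldown` function and its constants are hypothetical names, and the real values come from WithHealthConfig:

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseCooldown = 5 * time.Second // first bench duration (default quoted above)
	maxCooldown  = 5 * time.Minute // cap (default quoted above)
)

// cooldown returns the bench duration for the nth consecutive bench
// (n = 1 for the first), doubling each time until it reaches the cap.
func cooldown(n int) time.Duration {
	if n < 1 {
		n = 1
	}
	d := baseCooldown
	for i := 1; i < n; i++ {
		d *= 2
		if d >= maxCooldown {
			return maxCooldown
		}
	}
	return d
}

func main() {
	for n := 1; n <= 8; n++ {
		fmt.Printf("bench #%d -> %v\n", n, cooldown(n))
	}
}
```

Checking the cap inside the loop keeps the doubling from overflowing for large n.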
Providers
Built-in env vars
| Provider | Spec name | Key env var | Default endpoint |
|---|---|---|---|
| OpenAI (+compatible) | openai | OPENAI_API_KEY | https://api.openai.com/v1 |
| Anthropic (+compatible) | anthropic | ANTHROPIC_API_KEY | https://api.anthropic.com |
| Google (Gemini) | google | GOOGLE_API_KEY / GEMINI_API_KEY | Gemini API (official SDK) |
| Ollama Cloud | ollama-cloud | OLLAMA_API_KEY | https://ollama.com |
| Ollama (local) | ollama | — | OLLAMA_HOST or http://localhost:11434 |
| foreman | foreman | — (token via DSN) | requires an LLM_* DSN or ollama.Foreman(url, token) |
OpenAI-compatible / Anthropic-compatible endpoints: construct the provider with a name and base URL and register it —
reg.RegisterProvider(openai.New(
openai.WithName("groq"),
openai.WithBaseURL("https://api.groq.com/openai/v1"),
openai.WithAPIKey(key),
// openai.WithLegacyMaxTokens(), // for servers that only honor max_tokens
))
// now "groq/llama-3.3-70b" works in Parse, chains, and aliases
LLM_* env-DSN provider definitions
Define named providers entirely from the environment (go-llm parity):
LLM_M1=foreman://test-token-change-me@foreman-m1.example.com
LLM_M5=foreman://test-token-change-me@foreman-m5.example.com
defines providers m1 and m5 (foreman targets — native Ollama wire
protocol behind a bearer token). They are first-class in Parse, chains,
and aliases:
m, _ := reg.Parse("m5/qwen3:30b,m1/qwen3:30b,thinking")
DSN format: scheme://[token@]host[/path], scheme ∈ foreman, ollama,
ollama-cloud, openai, anthropic, google/gemini, or any scheme you
add with RegisterScheme. The token is the credential (bearer token / API
key); the base URL is always https://host[/path]. New() loads LLM_*
vars eagerly; unknown provider names also resolve lazily at Parse time
(my-prov/x → LLM_MY_PROV).
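To make the DSN shape concrete, here is a rough sketch built on the standard net/url package. It is not majordomo's loader: `parseDSN` and `envVarFor` are hypothetical helpers, and the sketch assumes the token contains no characters that need URL escaping.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// envVarFor maps a provider name to its LLM_* variable name:
// upper-cased, with "-" replaced by "_" (my-prov -> LLM_MY_PROV).
func envVarFor(provider string) string {
	return "LLM_" + strings.ToUpper(strings.ReplaceAll(provider, "-", "_"))
}

// parseDSN splits scheme://[token@]host[/path] into scheme, token, and
// an https base URL, following the format described above.
func parseDSN(dsn string) (scheme, token, baseURL string, err error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", "", "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", "", "", fmt.Errorf("dsn %q: want scheme://[token@]host[/path]", dsn)
	}
	if u.User != nil {
		token = u.User.Username() // the userinfo part is the credential
	}
	// The base URL is always https, whatever the DSN scheme is.
	return u.Scheme, token, "https://" + u.Host + u.Path, nil
}

func main() {
	fmt.Println(envVarFor("my-prov"))
	scheme, token, base, err := parseDSN("foreman://test-token-change-me@foreman-m1.example.com")
	fmt.Println(scheme, token, base, err)
}
```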
Custom providers
Implement the two-method Provider interface and register it:
reg.RegisterProvider(myProvider) // now "myprovider/model-x" parses, chains, aliases
Multimodality
Attach images without knowing the target's limits. Before each attempt the
request is normalized against the actual serving target's declared
capabilities: the real format is sniffed from the bytes, oversize images
are downscaled (aspect preserved), disallowed formats are re-encoded, and
byte budgets are enforced by a quality ladder. What cannot be made to fit
is rejected with a clear ErrUnsupported error — and in a chain, the
request simply advances to the next (e.g. vision-capable) element.
resp, err := m.Generate(ctx, majordomo.Request{
Messages: []majordomo.Message{
majordomo.UserParts(majordomo.Text("what's in this image?"),
majordomo.Image("image/png", pngBytes)),
},
})
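Two of the normalization steps described above are easy to illustrate in isolation: sniffing the real format from the bytes, and computing aspect-preserving target dimensions. The sketch below uses only the standard library and hypothetical helper names (`sniffImageType`, `fitWithin`); it does not show majordomo's pipeline, and the pixel limits are made up.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniffImageType reports the MIME type implied by the bytes themselves,
// ignoring whatever type the caller declared.
func sniffImageType(data []byte) string {
	return http.DetectContentType(data)
}

// fitWithin scales (w, h) down to fit inside (maxW, maxH) while
// preserving aspect ratio. Images that already fit are returned as-is.
func fitWithin(w, h, maxW, maxH int) (int, int) {
	if w <= maxW && h <= maxH {
		return w, h
	}
	// Compare w/maxW with h/maxH by cross-multiplying, to stay in integers.
	if w*maxH >= h*maxW {
		// Width is the binding constraint.
		nh := h * maxW / w
		if nh < 1 {
			nh = 1
		}
		return maxW, nh
	}
	// Height is the binding constraint.
	nw := w * maxH / h
	if nw < 1 {
		nw = 1
	}
	return nw, maxH
}

func main() {
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(sniffImageType(pngHeader))
	fmt.Println(fitWithin(4000, 3000, 2000, 2000))
}
```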
Tool calls
weather := majordomo.Tool{
	Name:        "get_weather",
	Description: "Current weather for a city",
	Parameters:  json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}`),
	Handler: func(ctx context.Context, args json.RawMessage) (any, error) {
		var p struct{ City string `json:"city"` }
		// Report malformed arguments instead of silently using a zero value.
		if err := json.Unmarshal(args, &p); err != nil {
			return nil, err
		}
		return map[string]any{"city": p.City, "temp_c": 21}, nil
	},
}
resp, _ := m.Generate(ctx, req, majordomo.WithTools(weather))
// resp.ToolCalls → execute → append ToolResultsMessage → continue
Or typed, with the schema derived from your argument struct:
weather := majordomo.DefineTool("get_weather", "Current weather for a city",
func(ctx context.Context, args struct {
City string `json:"city" description:"city name"`
}) (any, error) {
return lookup(args.City)
})
Each provider maps this one shape to its native function-calling format (OpenAI tools/tool_calls, Anthropic tool_use/tool_result, Ollama tools with object arguments). Tool-call ids are synthesized when a backend omits them; streaming buffers tool-call arguments until they parse.
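"Buffers tool-call arguments until they parse" can be pictured as accumulating fragments and testing the buffer with `json.Valid`. This is a sketch of the general idea, not the library's streaming code: the `argBuffer` type is hypothetical, and it assumes arguments are always a JSON object, so a partial prefix is never valid on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// argBuffer accumulates streamed argument fragments for one tool call.
type argBuffer struct{ buf []byte }

// Add appends a fragment and reports whether the buffered bytes now form
// complete, valid JSON. Until then the caller should keep streaming.
func (b *argBuffer) Add(fragment string) (json.RawMessage, bool) {
	b.buf = append(b.buf, fragment...)
	if !json.Valid(b.buf) {
		return nil, false
	}
	return json.RawMessage(b.buf), true
}

func main() {
	var b argBuffer
	for _, frag := range []string{`{"ci`, `ty":"Par`, `is"}`} {
		if args, ok := b.Add(frag); ok {
			fmt.Println("complete:", string(args))
		} else {
			fmt.Println("buffering")
		}
	}
}
```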
Structured output
resp, _ := m.Generate(ctx, req, majordomo.WithSchema(schemaJSON, "answer"))
Maps to OpenAI response_format: json_schema, Anthropic
output_config.format, Ollama format, and Google responseJsonSchema.
The typed helper derives the schema from your struct (all fields required,
additionalProperties:false, pointers nullable; description:"..." and
enum:"a,b,c" tags supported) and unmarshals the result:
type Verdict struct {
Guilty bool `json:"guilty"`
Why string `json:"why" description:"one-sentence rationale"`
}
v, err := majordomo.Generate[Verdict](ctx, m, req)
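The derivation rules listed above (all fields required, additionalProperties:false, pointers nullable, description and enum tags) can be approximated with reflection. The following is a simplified sketch for flat structs only, with hypothetical names (`schemaFor`, `deriveSchema`, `jsonType`); the library's actual derivation may differ in detail.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

type Verdict struct {
	Guilty bool    `json:"guilty"`
	Why    string  `json:"why" description:"one-sentence rationale"`
	Plea   *string `json:"plea" enum:"guilty,not_guilty"`
}

// schemaFor derives a JSON Schema object from a flat struct value.
func schemaFor(v any) map[string]any { return deriveSchema(reflect.TypeOf(v)) }

func deriveSchema(t reflect.Type) map[string]any {
	props := map[string]any{}
	required := []string{}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		name := strings.Split(f.Tag.Get("json"), ",")[0]
		if name == "" {
			name = f.Name
		}
		prop := map[string]any{"type": jsonType(f.Type)}
		if d := f.Tag.Get("description"); d != "" {
			prop["description"] = d
		}
		if e := f.Tag.Get("enum"); e != "" {
			prop["enum"] = strings.Split(e, ",")
		}
		props[name] = prop
		required = append(required, name) // every field is required
	}
	return map[string]any{
		"type":                 "object",
		"properties":           props,
		"required":             required,
		"additionalProperties": false,
	}
}

// jsonType maps a Go type to a JSON Schema type; pointers become nullable.
func jsonType(t reflect.Type) any {
	switch t.Kind() {
	case reflect.Pointer:
		return []any{jsonType(t.Elem()), "null"}
	case reflect.Bool:
		return "boolean"
	case reflect.String:
		return "string"
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return "integer"
	case reflect.Float32, reflect.Float64:
		return "number"
	case reflect.Slice, reflect.Array:
		return "array"
	default:
		return "object"
	}
}

func main() {
	out, _ := json.MarshalIndent(schemaFor(Verdict{}), "", "  ")
	fmt.Println(string(out))
}
```

The Plea field is not in the README's Verdict; it is added here only to exercise the pointer and enum rules.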
Agents
An agent is a model + system prompt + toolboxes, run as a tool-dispatch
loop until the model answers (or MaxSteps):
import "gitea.stevedudenhoeffer.com/steve/majordomo/agent"
a := agent.New(m, "You are a research assistant.",
agent.WithToolbox(searchTools),
agent.WithMaxSteps(8),
agent.WithStepObserver(func(s agent.Step) { log.Printf("step %d", s.Index) }),
)
res, err := a.Run(ctx, "What changed in Go 1.26?")
// res.Output, res.Steps, res.Usage; res.Messages round-trips via
// agent.WithHistory for conversation continuation.
The loop never panics: tool handler errors and panics become error results
the model can react to; unknown tools likewise; duplicate tool names across
toolboxes fail loudly. On agent.ErrMaxSteps (and on model errors) the
partial result with the full transcript is still returned.
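The "never panics" guarantee is the usual recover-in-defer pattern applied at the tool boundary. A minimal sketch, assuming a handler type shaped like the Tool.Handler shown earlier; `safeCall`, `dispatch`, and the error wording are hypothetical and not taken from the agent package:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// handler mirrors the Tool.Handler signature shown earlier in this README.
type handler func(ctx context.Context, args json.RawMessage) (any, error)

// safeCall runs h and converts a panic into an ordinary error, so the
// loop can hand it back to the model as an error tool result.
func safeCall(ctx context.Context, h handler, args json.RawMessage) (result any, err error) {
	defer func() {
		if r := recover(); r != nil {
			result = nil
			err = fmt.Errorf("tool panicked: %v", r)
		}
	}()
	return h(ctx, args)
}

// dispatch looks a tool up by name; an unknown name is also just an error.
func dispatch(ctx context.Context, tools map[string]handler, name string, args json.RawMessage) (any, error) {
	h, ok := tools[name]
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", name)
	}
	return safeCall(ctx, h, args)
}

// weather is a demo handler that panics on an empty city.
func weather(ctx context.Context, args json.RawMessage) (any, error) {
	var p struct {
		City string `json:"city"`
	}
	if err := json.Unmarshal(args, &p); err != nil {
		return nil, err
	}
	if p.City == "" {
		panic("empty city")
	}
	return map[string]any{"city": p.City, "temp_c": 21}, nil
}

func main() {
	ctx := context.Background()
	tools := map[string]handler{"get_weather": weather}
	fmt.Println(dispatch(ctx, tools, "get_weather", json.RawMessage(`{"city":"Paris"}`)))
	fmt.Println(dispatch(ctx, tools, "get_weather", json.RawMessage(`{"city":""}`)))
	fmt.Println(dispatch(ctx, tools, "get_time", nil))
}
```

Using named return values is what lets the deferred function overwrite the result after a panic has unwound the handler.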
Supervision hooks for orchestrators: WithMaxStepsFunc (dynamic step
budget), WithSteer (inject messages into a running agent),
WithCompactor (transform the outbound transcript when context grows —
the canonical Result.Messages stays complete), and WithToolErrorLimits
(circuit breakers for all-error steps and identical repeated calls,
surfacing agent.ErrToolLoop).
Skills
Skills are reusable instruction+tool bundles attachable to any agent, at construction or on demand. Instructions extend the system prompt; tools extend the toolset — additively, in attachment order.
import (
"gitea.stevedudenhoeffer.com/steve/majordomo/skill"
"gitea.stevedudenhoeffer.com/steve/majordomo/skill/calc"
"gitea.stevedudenhoeffer.com/steve/majordomo/skill/clock"
)
research := skill.New("research",
skill.WithInstructions("Cite a source for every claim."),
skill.WithTools(searchTool, fetchTool),
)
a := agent.New(m, "You are helpful.", agent.WithSkill(research))
a.AddSkill(clock.New()) // ready-made: time awareness
a.AddSkill(calc.New()) // ready-made: exact arithmetic
Anything implementing the three-method agent.Skill interface (Name /
Instructions / Tools) is a skill — skill.New is just the convenient way
to build one.
Feature/provider support matrix
| Provider | Resolve/Parse | Chat | Streaming | Tools | Structured | Images | Env DSN |
|---|---|---|---|---|---|---|---|
| OpenAI (+compatible) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic (+compat) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google (Gemini) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Ollama Cloud | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Ollama (local) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| foreman | ✅ | ✅ | ✅¹ | ✅ | ✅ | ✅ | ✅ |
| fake (testing) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |
¹ foreman's daemon currently buffers sync chat responses (no token-by-token streaming); majordomo's stream API works against it and delivers the response as a single delta plus final event.
Notes: Ollama has no native tool_choice — "none" drops the tools;
"required"/named choices are best-effort ignored there. Ollama Cloud
ignores the format field (verified live), so the provider also states
the schema as an explicit system instruction — constrained decoding on
local Ollama, instruction-guided JSON on cloud, one canonical API either
way.
Cross-cutting: Parse grammar ✅ · aliases/tiers ✅ · failover chains ✅ ·
health tracking/backoff ✅ · LLM_* env DSNs ✅ · media pipeline ✅
(per-target normalization in chains) · agent loop ✅ · Generate[T] +
schema derivation ✅ · skills ✅ (with clock + calc examples).
Development
go build ./... && go vet ./... && go test -race -count=1 ./...
The default test suite is fully hermetic (no network, no credentials).
Live integration tests (Phase 8) are gated behind the live build tag and
read .env (see .env.example; never commit .env).
Design decisions are recorded in docs/adr/; conventions in CLAUDE.md; build history in progress.md; the mort conversion plan in docs/mort-migration.md.