steve/majordomo

steve 1fd7109a42

CI / Tidy (pull_request) Successful in 9m31s

CI / Build & Test (pull_request) Successful in 10m14s

CI / Tidy (push) Successful in 9m26s

CI / Build & Test (push) Successful in 10m19s

fix(agent): recover front-loaded answer when terminal turn is degenerate

The agent loop took the final answer only from the terminal (no-tool-call)
turn. Models that "front-load" their answer into an earlier turn that also
calls a tool — then close with a trivial pointer like "(Already answered
above.)" — had their real answer discarded and the pointer delivered. This
recurs across several open-weight models (glm-5.2, etc.); well-behaved models
(Claude/GPT) defer their answer to the terminal turn and are unaffected.

finalOutput() now falls back to the last substantive assistant content in the
transcript when the terminal text is weak (empty, or a short back-reference).
The predicate is narrow and back-reference-gated so short-but-correct answers
("42", "It's down, restarting now.") are never overridden; recovery only picks
a prior turn that reads like a real answer, not a preamble. Zero extra model
calls. Terminal-answer behavior for normal runs is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-26 18:37:38 -04:00
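
The fallback described in this commit message can be sketched roughly as follows. This is an illustration only, not the code in the agent package: the names isWeak and pickOutput, the back-reference phrase list, and both length cutoffs are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// backRefs is a hypothetical list of phrases that mark a terminal turn as a
// pointer to earlier text rather than an answer.
var backRefs = []string{"already answered", "answered above", "see above", "as above"}

// isWeak reports whether terminal text is empty or a short back-reference.
// Short answers without a back-reference phrase ("42") are never weak.
func isWeak(text string) bool {
	t := strings.ToLower(strings.TrimSpace(text))
	if t == "" {
		return true
	}
	if len(t) > 60 { // assumed cutoff for "short"
		return false
	}
	for _, p := range backRefs {
		if strings.Contains(t, p) {
			return true
		}
	}
	return false
}

// pickOutput returns the terminal turn unless it is weak, in which case it
// walks backwards for the last earlier turn long enough to read like an answer.
func pickOutput(assistantTurns []string) string {
	if len(assistantTurns) == 0 {
		return ""
	}
	last := assistantTurns[len(assistantTurns)-1]
	if !isWeak(last) {
		return last
	}
	for i := len(assistantTurns) - 2; i >= 0; i-- {
		c := strings.TrimSpace(assistantTurns[i])
		if len(c) >= 40 && !isWeak(c) { // assumed: preambles are shorter than answers
			return c
		}
	}
	return last // nothing better found: keep the terminal text
}

func main() {
	turns := []string{
		"Let me check.",
		"The deploy failed because the migration lock was never released.",
		"(Already answered above.)",
	}
	fmt.Println(pickOutput(turns))
	fmt.Println(pickOutput([]string{"Let me check.", "42"}))
}
```

Gating on a back-reference phrase, rather than on length alone, is what keeps a short but correct terminal answer from being overridden.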

.gitea/workflows

feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

2026-06-10 12:35:34 +02:00

agent

fix(agent): recover front-loaded answer when terminal turn is degenerate

2026-06-26 18:37:38 -04:00

docs

docs: public-readiness — vibe-coded disclosure + genericize internal hosts

2026-06-25 19:25:58 -04:00

examples

docs: public-readiness — vibe-coded disclosure + genericize internal hosts

2026-06-25 19:25:58 -04:00

health

feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls

2026-06-10 13:30:06 +02:00

llm

feat(chain): fail over on empty/degenerate responses

2026-06-26 10:35:07 -04:00

media

feat: OpenAI, Anthropic, and native-Ollama providers + media pipeline

2026-06-10 12:58:08 +02:00

provider

feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls

2026-06-10 13:30:06 +02:00

skill

feat: skills — additive instruction+tool bundles, clock + calc examples

2026-06-10 13:13:07 +02:00

.env.example

feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

2026-06-10 12:35:34 +02:00

.gitignore

feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

2026-06-10 12:35:34 +02:00

builtin.go

feat: Google (Gemini) provider on the official Gen AI SDK

2026-06-10 13:04:28 +02:00

chain_test.go

feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

2026-06-10 12:35:34 +02:00

chain.go

feat(chain): fail over on empty/degenerate responses

2026-06-26 10:35:07 -04:00

CLAUDE.md

feat(chain): fail over on empty/degenerate responses

2026-06-26 10:35:07 -04:00

env_test.go

docs: public-readiness — vibe-coded disclosure + genericize internal hosts

2026-06-25 19:25:58 -04:00

env.go

feat: foundations — canonical types, Parse grammar, env DSNs, health, chains

2026-06-10 12:35:34 +02:00

extensions_test.go

feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls

2026-06-10 13:30:06 +02:00

failover_empty_test.go

feat(chain): fail over on empty/degenerate responses

2026-06-26 10:35:07 -04:00

failover_test.go

feat: OpenAI, Anthropic, and native-Ollama providers + media pipeline

2026-06-10 12:58:08 +02:00

generate_test.go

feat: agent run loop, Generate[T], reflect-derived schemas

2026-06-10 13:10:18 +02:00

generate.go

feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls

2026-06-10 13:30:06 +02:00

go.mod

feat: Google (Gemini) provider on the official Gen AI SDK

2026-06-10 13:04:28 +02:00

go.sum

feat: Google (Gemini) provider on the official Gen AI SDK

2026-06-10 13:04:28 +02:00

majordomo.go

feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls

2026-06-10 13:30:06 +02:00

parse_test.go

feat: OpenAI, Anthropic, and native-Ollama providers + media pipeline

2026-06-10 12:58:08 +02:00

parse.go

feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls

2026-06-10 13:30:06 +02:00

progress.md

docs: record Phase 9 completion — mort conversion PR open

2026-06-10 18:08:53 +02:00

README.md

docs: public-readiness — vibe-coded disclosure + genericize internal hosts

2026-06-25 19:25:58 -04:00

registry.go

feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls

2026-06-10 13:30:06 +02:00

README.md

majordomo

A clean-slate Go library for building LLM-backed agents: one canonical API over many model providers, a parseable model naming / failover / tiering system with built-in health tracking, capability-aware multimodality, tool calls, structured output, and composable agents and skills.

🤖 Heads up: this is a vibe-coded project

majordomo was built almost entirely by an AI agent (Claude Code) — design, code, and docs. It is reasonably well-tested (a fully hermetic suite plus gated live integration tests) and is used in earnest, but treat it accordingly: read the code before depending on it, expect the occasional AI-flavored rough edge, and please open issues. No warranty implied.

The support matrix below is kept honest: pending means not built, and this README is updated in the same commit as the behavior it describes. Runnable programs for every feature live in examples/.

Install

go get gitea.stevedudenhoeffer.com/steve/majordomo

Requires Go 1.26+.

Quickstart

package main

import (
    "context"
    "fmt"

    "gitea.stevedudenhoeffer.com/steve/majordomo"
)

func main() {
    reg := majordomo.New() // built-ins + LLM_* env providers

    m, err := reg.Parse("ollama-cloud/minimax-m3:cloud")
    if err != nil { panic(err) }

    resp, err := m.Generate(context.Background(), majordomo.Request{
        Messages: []majordomo.Message{majordomo.UserText("hello!")},
    })
    if err != nil { panic(err) }
    fmt.Println(resp.Text())
}

majordomo.Parse(...) (package level) uses a lazily-built default registry if you don't need isolation.

Model specs: targets, chains, tiers

A model spec is a comma-separated failover chain; each element is either a provider/model target or a registered alias (tier):

// Try minimax-m3 first; on failure kimi-k2.6; finally fall back to opus-4.8.
m, _ := reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8")

// The same chain with the registered alias "thinking" appended; the alias
// expands in place as the tail of the chain:
m, _ = reg.Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8,thinking")

Everything after the first / (up to the next comma) is the model id, passed to the provider verbatim — tags (:cloud, :30b) and ids with extra slashes survive intact. majordomo never validates ids against a catalog.
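
The splitting rule above can be illustrated with a small standalone sketch. It is not majordomo's parser: the element type and splitSpec function are hypothetical, and trimming whitespace around elements is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// element is one entry of a failover chain: a provider/model target or a bare alias.
type element struct {
	Provider string // empty when the element is an alias
	Model    string // model id, kept verbatim (tags and extra slashes included)
	Alias    string // set when the element contains no "/"
}

// splitSpec splits a spec on commas, then splits each element at its first "/".
func splitSpec(spec string) []element {
	var out []element
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part) // assumption: surrounding whitespace is ignored
		if part == "" {
			continue
		}
		provider, model, ok := strings.Cut(part, "/")
		if !ok {
			out = append(out, element{Alias: part})
			continue
		}
		out = append(out, element{Provider: provider, Model: model})
	}
	return out
}

func main() {
	for _, e := range splitSpec("ollama-cloud/minimax-m3:cloud,openai/org/model-x,thinking") {
		fmt.Printf("%+v\n", e)
	}
}
```

Because only the first slash is significant, an id such as org/model-x reaches the provider unchanged.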

Custom tiers (aliases)

reg.RegisterAlias("thinking", "anthropic/opus-4.8,ollama-cloud/minimax-m3:cloud")
reg.RegisterAlias("workhorse", "ollama-cloud/minimax-m2.7:cloud,ollama-cloud/qwen3-coder:480b-cloud")

m, _ := reg.Parse("thinking") // a chain, same Model interface as a single target

Aliases may appear anywhere in a chain (head, middle, tail), may reference other aliases, and expand inline and recursively; cycles are detected and returned as errors.
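
Inline, recursive expansion with a cycle guard might look something like the sketch below. The expand function and its alias map are hypothetical stand-ins for the registry; treating any element that contains a slash as a target is an assumption of the example.

```go
package main

import (
	"fmt"
	"strings"
)

// expand replaces alias names in a comma-separated chain with their
// definitions, recursively. stack holds the aliases currently being expanded,
// so meeting one of them again means the definitions form a cycle.
func expand(spec string, aliases map[string]string, stack map[string]bool) ([]string, error) {
	var out []string
	for _, el := range strings.Split(spec, ",") {
		el = strings.TrimSpace(el)
		if el == "" {
			continue
		}
		if strings.Contains(el, "/") { // a provider/model target: keep as-is
			out = append(out, el)
			continue
		}
		def, ok := aliases[el]
		if !ok {
			return nil, fmt.Errorf("unknown alias %q", el)
		}
		if stack[el] {
			return nil, fmt.Errorf("alias cycle through %q", el)
		}
		stack[el] = true
		sub, err := expand(def, aliases, stack)
		delete(stack, el) // the same alias may legally appear again elsewhere
		if err != nil {
			return nil, err
		}
		out = append(out, sub...)
	}
	return out, nil
}

func main() {
	aliases := map[string]string{
		"thinking":  "anthropic/opus-4.8,workhorse",
		"workhorse": "ollama-cloud/minimax-m2.7:cloud",
	}
	chain, err := expand("m5/qwen3:30b,thinking", aliases, map[string]bool{})
	fmt.Println(chain, err)
}
```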

For tiers that live in a database or config system, register a dynamic resolver — consulted after static aliases, output expanded with the same recursion and cycle guards:

reg.RegisterResolver(majordomo.ResolverFunc(func(name string) (string, bool) {
    return myConfigStore.LookupTier(name) // e.g. "agent-thinking" → a chain
}))

Failover & health

Chains are health-tracked per target:

A single transient error (429/5xx, timeout, connection failure) is retried once on the same target.
Repeated transient errors (default: 2 consecutive failed attempts) bench the target — chains skip it until its cooldown expires (exponential: 5s, 10s, 20s, ... capped at 5m). Any success resets it.
model not found advances down the chain without penalty; auth/malformed errors fail fast (failing over can't fix a bad key). All knobs are configurable via WithChainConfig / WithHealthConfig.
If every element fails, you get one joined error naming each target and why it failed.
Ops surfaces: reg.Health() exposes Bench/Unbench/Snapshot for manual control and dashboards; ChainConfig.Observer receives one event per failover decision (failed attempt, bench, benched-skip) for logging.

Providers

Built-in env vars

Provider	Spec name	Key env var	Default endpoint
OpenAI (+compatible)	`openai`	`OPENAI_API_KEY`	https://api.openai.com/v1
Anthropic (+compatible)	`anthropic`	`ANTHROPIC_API_KEY`	https://api.anthropic.com
Google (Gemini)	`google`	`GOOGLE_API_KEY` / `GEMINI_API_KEY`	Gemini API (official SDK)
Ollama Cloud	`ollama-cloud`	`OLLAMA_API_KEY`	https://ollama.com
Ollama (local)	`ollama`	—	`OLLAMA_HOST` or http://localhost:11434
foreman	`foreman`	— (token via DSN)	requires an LLM_* DSN or `ollama.Foreman(url, token)`

OpenAI-compatible / Anthropic-compatible endpoints: construct the provider with a name and base URL and register it —

reg.RegisterProvider(openai.New(
    openai.WithName("groq"),
    openai.WithBaseURL("https://api.groq.com/openai/v1"),
    openai.WithAPIKey(key),
    // openai.WithLegacyMaxTokens(), // for servers that only honor max_tokens
))
// now "groq/llama-3.3-70b" works in Parse, chains, and aliases

`LLM_*` env-DSN provider definitions

Define named providers entirely from the environment (go-llm parity):

LLM_M1=foreman://test-token-change-me@foreman-m1.example.com
LLM_M5=foreman://test-token-change-me@foreman-m5.example.com

These two variables define providers m1 and m5 (foreman targets: the native Ollama wire protocol behind a bearer token). They are first-class in Parse, chains, and aliases:

m, _ := reg.Parse("m5/qwen3:30b,m1/qwen3:30b,thinking")

DSN format: scheme://[token@]host[/path], scheme ∈ foreman, ollama, ollama-cloud, openai, anthropic, google/gemini, or any scheme you add with RegisterScheme. The token is the credential (bearer token / API key); the base URL is always https://host[/path]. New() loads LLM_* vars eagerly; unknown provider names also resolve lazily at Parse time (my-prov/x → LLM_MY_PROV).
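
To make the DSN shape concrete, the sketch below pulls one apart with the standard net/url package and shows the provider-name to variable-name mapping. parseDSN, the dsn struct, and envVarFor are hypothetical helpers written for this example; the mapping rule (uppercase, dashes to underscores) is inferred from the single my-prov → LLM_MY_PROV case above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dsn holds the pieces of scheme://[token@]host[/path].
type dsn struct {
	Scheme  string // selects the provider implementation
	Token   string // bearer token / API key, may be empty
	BaseURL string // always https://host[/path]
}

// parseDSN splits a DSN using net/url; the userinfo part is the token.
func parseDSN(raw string) (dsn, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return dsn{}, err
	}
	if u.Scheme == "" || u.Host == "" {
		return dsn{}, fmt.Errorf("dsn: want scheme://[token@]host[/path]")
	}
	token := ""
	if u.User != nil {
		token = u.User.Username()
	}
	return dsn{Scheme: u.Scheme, Token: token, BaseURL: "https://" + u.Host + u.Path}, nil
}

// envVarFor maps a provider name to its variable name, e.g. "my-prov" to "LLM_MY_PROV".
func envVarFor(provider string) string {
	return "LLM_" + strings.ToUpper(strings.ReplaceAll(provider, "-", "_"))
}

func main() {
	d, err := parseDSN("foreman://test-token-change-me@foreman-m1.example.com")
	fmt.Printf("%+v %v\n", d, err)
	fmt.Println(envVarFor("my-prov"))
}
```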

Custom providers

Implement the two-method Provider interface and register it:

reg.RegisterProvider(myProvider) // now "myprovider/model-x" parses, chains, aliases

Multimodality

Attach images without knowing the target's limits. Before each attempt the request is normalized against the actual serving target's declared capabilities: the real format is sniffed from the bytes, oversize images are downscaled (aspect preserved), disallowed formats are re-encoded, and byte budgets are enforced by a quality ladder. What cannot be made to fit is rejected with a clear ErrUnsupported error — and in a chain, the request simply advances to the next (e.g. vision-capable) element.

resp, err := m.Generate(ctx, majordomo.Request{
    Messages: []majordomo.Message{
        majordomo.UserParts(majordomo.Text("what's in this image?"),
            majordomo.Image("image/png", pngBytes)),
    },
})
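
Two of the normalization steps described above, sniffing the real format and computing an aspect-preserving target size, can be sketched with the standard library alone. This is not majordomo's media pipeline: sniff and fitWithin are hypothetical helpers, and expressing the limit as a single longest-edge value is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff returns the content type detected from the bytes themselves and
// whether it differs from the type the caller declared.
func sniff(declared string, data []byte) (actual string, mismatch bool) {
	actual = http.DetectContentType(data)
	return actual, actual != declared
}

// fitWithin scales (w, h) down so the longer edge is at most maxEdge while
// preserving the aspect ratio. Images already within the limit are unchanged.
func fitWithin(w, h, maxEdge int) (int, int) {
	if w <= maxEdge && h <= maxEdge {
		return w, h
	}
	if w >= h {
		nh := h * maxEdge / w
		if nh < 1 {
			nh = 1
		}
		return maxEdge, nh
	}
	nw := w * maxEdge / h
	if nw < 1 {
		nw = 1
	}
	return nw, maxEdge
}

func main() {
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(sniff("image/jpeg", pngHeader))
	fmt.Println(fitWithin(4000, 2000, 1024))
}
```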

Tool calls

weather := majordomo.Tool{
    Name:        "get_weather",
    Description: "Current weather for a city",
    Parameters:  json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}`),
    Handler: func(ctx context.Context, args json.RawMessage) (any, error) {
        var p struct{ City string `json:"city"` }
        if err := json.Unmarshal(args, &p); err != nil {
            return nil, err // reject malformed arguments instead of using an empty city
        }
        return map[string]any{"city": p.City, "temp_c": 21}, nil
    },
}
resp, _ := m.Generate(ctx, req, majordomo.WithTools(weather))
// resp.ToolCalls → execute → append ToolResultsMessage → continue

Or typed, with the schema derived from your argument struct:

weather := majordomo.DefineTool("get_weather", "Current weather for a city",
    func(ctx context.Context, args struct {
        City string `json:"city" description:"city name"`
    }) (any, error) {
        return lookup(args.City)
    })

Each provider maps this one shape to its native function-calling format (OpenAI tools/tool_calls, Anthropic tool_use/tool_result, Ollama tools with object arguments). Tool-call ids are synthesized when a backend omits them; streaming buffers tool-call arguments until they parse.
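
Buffering streamed tool-call arguments until they parse could be done along these lines. The argBuffer type is hypothetical and only illustrates the idea; it assumes arguments are always a JSON object, so a partial fragment can never be valid JSON by accident.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// argBuffer accumulates streamed argument fragments for one tool call.
type argBuffer struct {
	buf []byte
}

// Add appends a fragment and reports whether the accumulated bytes now form
// complete, valid JSON. The returned message is a copy of the buffer.
func (b *argBuffer) Add(fragment string) (json.RawMessage, bool) {
	b.buf = append(b.buf, fragment...)
	if !json.Valid(b.buf) {
		return nil, false
	}
	out := make(json.RawMessage, len(b.buf))
	copy(out, b.buf)
	return out, true
}

func main() {
	var b argBuffer
	for _, frag := range []string{`{"ci`, `ty":"Par`, `is"}`} {
		args, done := b.Add(frag)
		fmt.Println(done, string(args))
	}
}
```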

Structured output

resp, _ := m.Generate(ctx, req, majordomo.WithSchema(schemaJSON, "answer"))

Maps to OpenAI response_format: json_schema, Anthropic output_config.format, Ollama format, and Google responseJsonSchema.

The typed helper derives the schema from your struct (all fields required, additionalProperties:false, pointers nullable; description:"..." and enum:"a,b,c" tags supported) and unmarshals the result:

type Verdict struct {
    Guilty bool   `json:"guilty"`
    Why    string `json:"why" description:"one-sentence rationale"`
}
v, err := majordomo.Generate[Verdict](ctx, m, req)

Agents

An agent is a model + system prompt + toolboxes, run as a tool-dispatch loop until the model answers (or MaxSteps):

import "gitea.stevedudenhoeffer.com/steve/majordomo/agent"

a := agent.New(m, "You are a research assistant.",
    agent.WithToolbox(searchTools),
    agent.WithMaxSteps(8),
    agent.WithStepObserver(func(s agent.Step) { log.Printf("step %d", s.Index) }),
)
res, err := a.Run(ctx, "What changed in Go 1.26?")
// res.Output, res.Steps, res.Usage; res.Messages round-trips via
// agent.WithHistory for conversation continuation.

The loop never panics: tool handler errors and panics become error results the model can react to; unknown tools likewise; duplicate tool names across toolboxes fail loudly. On agent.ErrMaxSteps (and on model errors) the partial result with the full transcript is still returned.

Supervision hooks for orchestrators: WithMaxStepsFunc (dynamic step budget), WithSteer (inject messages into a running agent), WithCompactor (transform the outbound transcript when context grows — the canonical Result.Messages stays complete), and WithToolErrorLimits (circuit breakers for all-error steps and identical repeated calls, surfacing agent.ErrToolLoop).

Skills

Skills are reusable instruction+tool bundles attachable to any agent, at construction or on demand. Instructions extend the system prompt; tools extend the toolset — additively, in attachment order.

import (
    "gitea.stevedudenhoeffer.com/steve/majordomo/skill"
    "gitea.stevedudenhoeffer.com/steve/majordomo/skill/calc"
    "gitea.stevedudenhoeffer.com/steve/majordomo/skill/clock"
)

research := skill.New("research",
    skill.WithInstructions("Cite a source for every claim."),
    skill.WithTools(searchTool, fetchTool),
)

a := agent.New(m, "You are helpful.", agent.WithSkill(research))
a.AddSkill(clock.New()) // ready-made: time awareness
a.AddSkill(calc.New())  // ready-made: exact arithmetic

Anything implementing the three-method agent.Skill interface (Name / Instructions / Tools) is a skill — skill.New is just the convenient way to build one.
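
Additive composition in attachment order can be pictured with the sketch below. The Skill interface here only mirrors the three method names given above; the method signatures, the use of plain tool names instead of tool values, and the duplicate-name check are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Skill mirrors the Name / Instructions / Tools shape; signatures are assumed.
type Skill interface {
	Name() string
	Instructions() string
	Tools() []string // tool names stand in for real tool values
}

// basic is a minimal Skill implementation for the example.
type basic struct {
	name         string
	instructions string
	tools        []string
}

func (b basic) Name() string         { return b.name }
func (b basic) Instructions() string { return b.instructions }
func (b basic) Tools() []string      { return b.tools }

// compose appends each skill's instructions to the system prompt and its
// tools to the toolset, in attachment order.
func compose(system string, skills ...Skill) (string, []string, error) {
	prompt := []string{system}
	var tools []string
	seen := map[string]bool{}
	for _, s := range skills {
		if ins := s.Instructions(); ins != "" {
			prompt = append(prompt, ins)
		}
		for _, t := range s.Tools() {
			if seen[t] {
				return "", nil, fmt.Errorf("duplicate tool %q (skill %q)", t, s.Name())
			}
			seen[t] = true
			tools = append(tools, t)
		}
	}
	return strings.Join(prompt, "\n\n"), tools, nil
}

func main() {
	research := basic{name: "research", instructions: "Cite a source for every claim.", tools: []string{"search", "fetch"}}
	clock := basic{name: "clock", tools: []string{"now"}}
	prompt, tools, err := compose("You are helpful.", research, clock)
	fmt.Println(prompt, tools, err)
}
```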

Feature/provider support matrix

Provider	Resolve/Parse	Chat	Streaming	Tools	Structured	Images	Env DSN
OpenAI (+compatible)	✅	✅	✅	✅	✅	✅	✅
Anthropic (+compat)	✅	✅	✅	✅	✅	✅	✅
Google (Gemini)	✅	✅	✅	✅	✅	✅	✅
Ollama Cloud	✅	✅	✅	✅	✅	✅	✅
Ollama (local)	✅	✅	✅	✅	✅	✅	✅
foreman	✅	✅	✅¹	✅	✅	✅	✅
fake (testing)	✅	✅	✅	✅	✅	✅	—

¹ foreman's daemon currently buffers sync chat responses (no token-by-token streaming); majordomo's stream API works against it and delivers the response as a single delta plus final event.

Notes: Ollama has no native tool_choice: "none" drops the tools, while "required" and named choices cannot be enforced there and are ignored. Ollama Cloud ignores the format field (verified live), so the provider also states the schema as an explicit system instruction. The result is constrained decoding on local Ollama and instruction-guided JSON on cloud, with one canonical API either way.

Cross-cutting: Parse grammar ✅ · aliases/tiers ✅ · failover chains ✅ · health tracking/backoff ✅ · LLM_* env DSNs ✅ · media pipeline ✅ (per-target normalization in chains) · agent loop ✅ · Generate[T] + schema derivation ✅ · skills ✅ (with clock + calc examples).

Development

go build ./... && go vet ./... && go test -race -count=1 ./...

The default test suite is fully hermetic (no network, no credentials). Live integration tests (Phase 8) are gated behind the live build tag and read .env (see .env.example; never commit .env).

Design decisions are recorded in docs/adr/; conventions in CLAUDE.md; build history in progress.md; the mort conversion plan in docs/mort-migration.md.
