# CLAUDE.md — majordomo operating manual

majordomo is a clean-slate Go substrate for LLM-backed agents:
target-agnostic model access, a parseable model naming / failover / tiering
system with health tracking, multimodality, tool calls, structured output,
and agents composed from model + system prompt + toolboxes + skills.

> **Public, vibe-coded project.** This is built almost entirely by an AI agent
> (Claude Code) and is public. Keep that framing honest in the README — don't
> oversell it — and keep the README/support-matrix/examples updated in the same
> commit as the behavior they describe (that in-sync promise is part of the
> project's credibility).

**North star:** majordomo exists to re-architect mort's agentic layer. mort
is the first consumer and the design's acceptance test — when a choice is a
toss-up, pick what makes mort's tiers, failover chains, toolboxes, and
skills cleanest to express. But majordomo itself stays general-purpose and
mort-agnostic: no mort types, no Discord, no mort config.

## Module & stack

- Module: `gitea.stevedudenhoeffer.com/steve/majordomo`, Go 1.26.
- Stdlib-first (ADR-0007): hand-rolled `net/http` clients for
  OpenAI(+compat), Anthropic(+compat), Ollama (cloud+local), foreman. The
  one approved dependency is `google.golang.org/genai` (Google provider).
  Anything else needs an ADR. No `go-llm`, no `go-agentkit` — importing
  either is an automatic failure.

## Package map (ADR-0001)

```
majordomo            Registry, Parse, env-DSN loading, chain executor, re-exports
llm/                 canonical contract: Message/Part/Request/Response/Option,
                     Tool/Toolbox, Capabilities, Stream, Model, Provider, errors
health/              clock-injected health tracker (bench/backoff)
media/               image normalization to target capabilities (sniff real
                     format, downscale, transcode, byte ladder; ErrUnsupported
                     for what can't fit) — chains normalize PER TARGET
provider/fake/       scriptable in-memory provider for hermetic tests
provider/openai/     Chat Completions client (+ all OpenAI-compat targets)
provider/anthropic/  Messages API client (+ Anthropic-compat targets)
provider/ollama/     one native /api/chat client serving the ollama,
                     ollama-cloud, and foreman built-ins via presets
provider/google/     Gemini on google.golang.org/genai (the one approved
                     dependency; lazy client, raw-JSON-schema tools,
                     ThinkingLevel reasoning, iter.Pull2 streaming)
agent/               Agent run loop (Phase 5)
skill/               Skill interface + composition (Phase 6)
examples/            one runnable example per hard requirement (Phase 7-8)
```

Canonical types live in leaf package `llm`; the root re-exports them via
type aliases. Providers import `llm`, never each other, never the root.

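The alias-based re-export can be illustrated in a single file. This is only a sketch: `llmMessage` is a hypothetical stand-in for a type that would live in the `llm` package, and its fields are invented for the example. The point is the `=` in the declaration, which makes the root name the same type as the leaf one, not a new type.

```go
package main

import "fmt"

// llmMessage stands in for a canonical type from the leaf package
// (hypothetical fields, chosen only for this illustration).
type llmMessage struct {
	Role string
	Text string
}

// Message is a type ALIAS (note the "="): it is the same type as
// llmMessage, so no conversion is needed in either direction.
type Message = llmMessage

// describe accepts the canonical type, as a provider would.
func describe(m llmMessage) string {
	return m.Role + ": " + m.Text
}

func main() {
	// Built through the alias, passed where the canonical type is expected.
	m := Message{Role: "user", Text: "hi"}
	fmt.Println(describe(m))
}
```

A defined type (`type Message llmMessage`, without `=`) would need an explicit conversion at every package boundary, which is presumably why aliases are used here.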
## Parse grammar (ADR-0003)

```
spec    := element ("," element)*   # ordered failover chain
element := target | alias
target  := provider "/" model       # model id VERBATIM after first "/"
alias   := bare token (no slash), expands INLINE, recursively, cycle-checked
```

- `Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8")`
  → try head-to-tail. Appending `,thinking` expands the registered alias in
  place at the tail.
- Provider resolution: registry (built-ins, RegisterProvider, eager env) →
  lazy `LLM_{UPPER(name)}` env DSN → error.
- Single element ≡ chain of one; same Model interface, same semantics.
- No reasoning suffixes (`:high` etc. are NOT stripped — model ids are
  verbatim). Reasoning effort becomes a request option (provider phases).

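The grammar above can be sketched as a small recursive expander. This is an illustration under assumptions, not majordomo's actual `Parse`: the names `Target` and `expand`, the alias map, and the whitespace trimming are all hypothetical. It shows the split at the first `/`, inline alias expansion, and the cycle check.

```go
package main

import (
	"fmt"
	"strings"
)

// Target is one provider/model pair in an ordered failover chain
// (hypothetical type for this sketch).
type Target struct {
	Provider string
	Model    string
}

// expand resolves a spec into targets. aliases maps an alias name to the
// spec it stands for; seen tracks the aliases on the current expansion path.
func expand(spec string, aliases map[string]string, seen map[string]bool) ([]Target, error) {
	var out []Target
	for _, el := range strings.Split(spec, ",") {
		el = strings.TrimSpace(el) // assumption: surrounding spaces are ignored
		if el == "" {
			return nil, fmt.Errorf("empty element in spec %q", spec)
		}
		// target: split at the FIRST "/", keep the model id verbatim.
		if provider, model, ok := strings.Cut(el, "/"); ok {
			if provider == "" || model == "" {
				return nil, fmt.Errorf("malformed target %q", el)
			}
			out = append(out, Target{Provider: provider, Model: model})
			continue
		}
		// alias: bare token, expanded inline and recursively.
		if seen[el] {
			return nil, fmt.Errorf("alias cycle at %q", el)
		}
		body, ok := aliases[el]
		if !ok {
			return nil, fmt.Errorf("unknown alias %q", el)
		}
		seen[el] = true
		sub, err := expand(body, aliases, seen)
		delete(seen, el) // only the current path counts as a cycle
		if err != nil {
			return nil, err
		}
		out = append(out, sub...)
	}
	return out, nil
}

func main() {
	aliases := map[string]string{"thinking": "anthropic/opus-4.8"}
	chain, err := expand("ollama-cloud/minimax-m3:cloud,thinking", aliases, map[string]bool{})
	fmt.Printf("%+v %v\n", chain, err)
}
```

Because the split happens at the first slash only, suffixes such as `:cloud` or `:high` and any further slashes stay part of the model id.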
## LLM_* env-DSN providers (ADR-0004, go-llm parity)

`LLM_<NAME>=scheme://[token@]host[/path]` — e.g.
`LLM_M5=foreman://token@foreman-m5.example` defines provider `m5`; then
`m5/qwen3:30b` works in Parse, chains, and aliases. Scheme ∈ {foreman,
ollama, ollama-cloud, openai, anthropic, google, gemini} ∪ RegisterScheme.
Token = credential; base URL = `https://host` always. `New()` scans the
process env eagerly; unknown names also resolve lazily at Parse time
(`my-prov` → `LLM_MY_PROV`). Malformed entries fail on use, not at startup.

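A minimal sketch of the DSN shape, using only `net/url` and `strings`. The names `DSN`, `envName`, and `parseDSN` are hypothetical, and how the optional path is applied is an assumption: the sketch keeps it separate from the base URL rather than guessing.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// DSN holds the pieces of scheme://[token@]host[/path]
// (hypothetical struct for this sketch).
type DSN struct {
	Scheme  string
	Token   string
	BaseURL string
	Path    string // kept separately; its real use is not specified here
}

// envName maps a provider name to its variable: "my-prov" -> "LLM_MY_PROV".
func envName(provider string) string {
	return "LLM_" + strings.ToUpper(strings.ReplaceAll(provider, "-", "_"))
}

// parseDSN splits a DSN value. Errors surface here, at use time.
func parseDSN(raw string) (DSN, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return DSN{}, fmt.Errorf("parse DSN: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return DSN{}, fmt.Errorf("DSN %q: want scheme://[token@]host[/path]", raw)
	}
	d := DSN{
		Scheme:  u.Scheme,
		BaseURL: "https://" + u.Host, // base URL is always https://host
		Path:    u.Path,
	}
	if u.User != nil {
		d.Token = u.User.Username() // the part before "@" is the credential
	}
	return d, nil
}

func main() {
	d, err := parseDSN("foreman://token@foreman-m5.example")
	fmt.Printf("%s %+v %v\n", envName("m5"), d, err)
}
```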
## Health & failover (ADR-0006, ADR-0008)

- Transient (408/429/5xx, timeouts, conn refused/reset, DNS, deadline) vs
  permanent (400/401/403/404/405/422, model-not-found, ctx.Canceled).
  Unknown → transient. Classifier overridable.
- One transient error → retry same target (default 1 retry). Every failed
  attempt counts; at threshold (default 2 consecutive) the target is
  benched for base 5s × 2^n, capped 5m. Success fully resets. Chains skip
  benched targets; 404 advances penalty-free; auth/malformed fail fast
  (configurable); exhaustion returns a joined error naming every target.
- **Empty response = failover.** A target that returns *without error* but
  with no usable output — no content parts and no tool calls (`Response.IsEmpty`;
  a media/image part counts as content) — is treated as a per-target failure
  (`llm.ErrEmptyResponse`, classified transient). Unlike an ordinary
  transient it is **not** retried on the same target (the model just did
  this; the call is expensive): the chain penalizes health and advances
  immediately. If every target comes back empty the call fails with
  `ErrChainExhausted` rather than a hollow "successful" empty completion, so
  a single flaky model can't silently end an agent run with nothing.
- Tracker is in-memory, process-local, clock-injected. No persistence.

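The status classification and bench arithmetic from the first two bullets can be sketched as follows. This is a sketch under assumptions: the function names are hypothetical, only HTTP status codes are covered (not timeouts, DNS, or `ctx.Canceled`), and `n` is assumed to count earlier benches starting at zero.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

const (
	benchBase = 5 * time.Second // default base from the rules above
	benchCap  = 5 * time.Minute // default cap from the rules above
)

// transientStatus reports whether an HTTP status counts as transient.
// The listed 4xx codes are permanent; everything else, including
// statuses this sketch does not know about, is treated as transient.
func transientStatus(code int) bool {
	switch code {
	case http.StatusBadRequest, http.StatusUnauthorized, http.StatusForbidden,
		http.StatusNotFound, http.StatusMethodNotAllowed,
		http.StatusUnprocessableEntity:
		return false
	}
	return true
}

// benchDuration returns base * 2^n, capped. Doubling in a loop with an
// early return avoids overflowing time.Duration for large n.
func benchDuration(n int) time.Duration {
	d := benchBase
	for i := 0; i < n; i++ {
		d *= 2
		if d >= benchCap {
			return benchCap
		}
	}
	return d
}

func main() {
	for n := 0; n <= 7; n++ {
		fmt.Printf("bench %d: %v\n", n, benchDuration(n))
	}
	fmt.Println("429 transient:", transientStatus(http.StatusTooManyRequests))
	fmt.Println("404 transient:", transientStatus(http.StatusNotFound))
}
```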
## House conventions (mirror foreman)

- gofmt; check errors immediately and wrap with `fmt.Errorf("%w: ...")`;
  imports stdlib → third-party → internal; `// Why:` doc comments where
  rationale isn't obvious.
- ADRs in `docs/adr/`, one decision each, append-only, indexed in its
  README. progress.md gets a dated entry per phase.
- Conventional commits (`feat:`, `test:`, `docs:`, `chore:`, `refactor:`).
- Tests are hermetic: fake provider + fake clock; provider clients test
  against `httptest`; **no network or credentials in the default suite**.
  Live tests sit behind `//go:build live` / `examples/live/` and skip
  without their env vars.
- `.env` holds live keys (gitignored, never committed/printed/quoted);
  `.env.example` carries placeholders.

## Gates (every phase; what CI runs)

```
go build ./...
go vet ./...
go test -race -count=1 ./...
go mod tidy && git diff --exit-code go.mod go.sum
```

CI: `.gitea/workflows/ci.yaml` (Gitea Actions, mirrors foreman). README.md
must match reality in the same commit that changes behavior — no
aspirational docs; unbuilt features are marked pending in the matrix.

## Out of scope (anti-creep)

No persistent store (health is in-memory behind the registry), no
observability/metrics stack, no config-file framework beyond LLM_* env
DSNs, no CLI beyond examples, no provider-specific features leaking into
the canonical API, nothing mort-specific in the library.