majordomo/CLAUDE.md
2026-06-27 22:56:59 +00:00

# CLAUDE.md — majordomo operating manual
majordomo is a clean-slate Go substrate for LLM-backed agents:
target-agnostic model access, a parseable model naming / failover / tiering
system with health tracking, multimodality, tool calls, structured output,
and agents composed from model + system prompt + toolboxes + skills.
> **Public, vibe-coded project.** This is built almost entirely by an AI agent
> (Claude Code) and is public. Keep that framing honest in the README — don't
> oversell it — and keep the README/support-matrix/examples updated in the same
> commit as the behavior they describe (that in-sync promise is part of the
> project's credibility).

**North star:** majordomo exists to re-architect mort's agentic layer. mort
is the first consumer and the design's acceptance test — when a choice is a
toss-up, pick what makes mort's tiers, failover chains, toolboxes, and
skills cleanest to express. But majordomo itself stays general-purpose and
mort-agnostic: no mort types, no Discord, no mort config.
## Module & stack
- Module: `gitea.stevedudenhoeffer.com/steve/majordomo`, Go 1.26.
- Stdlib-first (ADR-0007): hand-rolled `net/http` clients for
OpenAI(+compat), Anthropic(+compat), Ollama (cloud+local), foreman. The
one approved dependency is `google.golang.org/genai` (Google provider).
Anything else needs an ADR. No `go-llm`, no `go-agentkit` — importing
either is an automatic failure.
## Package map (ADR-0001)
```
majordomo Registry, Parse, env-DSN loading, chain executor, re-exports
llm/ canonical contract: Message/Part/Request/Response/Option,
Tool/Toolbox, Capabilities, Stream, Model, Provider, errors
imagegen/ canonical text-to-image contract: Request/Result/Model/
Provider (separate from llm; Image = llm.ImagePart) (ADR-0016)
health/ clock-injected health tracker (bench/backoff)
media/ image normalization to target capabilities (sniff real
format, downscale, transcode, byte ladder; ErrUnsupported
for what can't fit) — chains normalize PER TARGET
provider/fake/ scriptable in-memory provider for hermetic tests
provider/openai/ Chat Completions client (+ all OpenAI-compat targets)
provider/anthropic/ Messages API client (+ Anthropic-compat targets)
provider/ollama/ one native /api/chat client serving the ollama,
ollama-cloud, and foreman built-ins via presets
provider/llamaswap/ llama-swap proxy: chat delegates to provider/openai,
plus management methods + imagegen image client (ADR-0015)
provider/google/ Gemini on google.golang.org/genai (the one approved
dependency; lazy client, raw-JSON-schema tools,
ThinkingLevel reasoning, iter.Pull2 streaming)
agent/ Agent run loop (Phase 5)
skill/ Skill interface + composition (Phase 6)
examples/ one runnable example per hard requirement (Phase 7-8)
```
Canonical types live in leaf package `llm`; the root re-exports them via
type aliases. Providers import `llm`, never each other, never the root.
## Parse grammar (ADR-0003)
```
spec := element ("," element)* # ordered failover chain
element := target | alias
target := provider "/" model # model id VERBATIM after first "/"
alias := bare token (no slash), expands INLINE, recursively, cycle-checked
```
- `Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8")`
→ try head-to-tail. Appending `,thinking` expands the registered alias in
place at the tail.
- Provider resolution: registry (built-ins, RegisterProvider, eager env) →
lazy `LLM_{UPPER(name)}` env DSN → error.
- Single element ≡ chain of one; same Model interface, same semantics.
- No reasoning suffixes (`:high` etc. are NOT stripped — model ids are
verbatim). Reasoning effort becomes a request option (provider phases).
## LLM_* env-DSN providers (ADR-0004, go-llm parity)
`LLM_<NAME>=scheme://[token@]host[/path]` — e.g.
`LLM_M5=foreman://token@foreman-m5.example` defines provider `m5`; then
`m5/qwen3:30b` works in Parse, chains, and aliases. Scheme ∈ {foreman,
ollama, ollama-cloud, openai, anthropic, google, gemini, llama-swap,
llama-swaps} ∪ any scheme added via `RegisterScheme`. Token = credential;
base URL = `https://host` always, **except `llama-swap`, which builds
`http://host` (local-first); `llama-swaps` is its TLS twin (`https://host`),
mirroring redis/rediss (ADR-0015).** `New()` scans the process env eagerly;
unknown names also resolve lazily at Parse time (`my-prov` → `LLM_MY_PROV`).
Malformed entries fail on use, not at startup.
## Health & failover (ADR-0006, ADR-0008)
- Transient (408/429/5xx, timeouts, conn refused/reset, DNS, deadline) vs
permanent (400/401/403/404/405/422, model-not-found, ctx.Canceled).
Unknown → transient. Classifier overridable.
- One transient error → retry same target (default 1 retry). Every failed
attempt counts; at threshold (default 2 consecutive) the target is
benched for base 5s × 2^n, capped 5m. Success fully resets. Chains skip
benched targets; 404 advances penalty-free; auth/malformed fail fast
(configurable); exhaustion returns a joined error naming every target.
- **Empty response = failover.** A target that returns *without error* but
with no usable output — no content parts and no tool calls (`Response.IsEmpty`;
a media/image part counts as content) — is treated as a per-target failure
(`llm.ErrEmptyResponse`, classified transient). Unlike an ordinary
transient error, it is **not** retried on the same target (the model just did
this; the call is expensive): the chain penalizes health and advances
immediately. If every target comes back empty the call fails with
`ErrChainExhausted` rather than a hollow "successful" empty completion, so
a single flaky model can't silently end an agent run with nothing.
- Tracker is in-memory, process-local, clock-injected. No persistence.
## House conventions (mirror foreman)
- gofmt; check errors immediately and wrap with `fmt.Errorf("%w: ...")`;
imports stdlib → third-party → internal; `// Why:` doc comments where
rationale isn't obvious.
- ADRs in `docs/adr/`, one decision each, append-only, indexed in that
directory's README. `progress.md` gets a dated entry per phase.
- Conventional commits (`feat:`, `test:`, `docs:`, `chore:`, `refactor:`).
- Tests are hermetic: fake provider + fake clock; provider clients test
against `httptest`; **no network or credentials in the default suite**.
Live tests sit behind `//go:build live` / `examples/live/` and skip
without their env vars.
- `.env` holds live keys (gitignored, never committed/printed/quoted);
`.env.example` carries placeholders.
## Gates (every phase; what CI runs)
```
go build ./...
go vet ./...
go test -race -count=1 ./...
go mod tidy && git diff --exit-code go.mod go.sum
```
CI: `.gitea/workflows/ci.yaml` (Gitea Actions, mirrors foreman). README.md
must match reality in the same commit that changes behavior — no
aspirational docs; unbuilt features are marked pending in the matrix.
## Adversarial review loop (Gadfly)
Ship work through PRs and let Gadfly review it before merge:
- **Push to a PR, never straight to `main`.** Branch, push, open a PR.
`.gitea/workflows/adversarial-review.yml` runs Gadfly (the standalone
agentic adversarial reviewer) — a fleet of 6 ollama-cloud models, each
running the 3-lens suite (security, correctness, error-handling). Advisory
only; it never blocks the merge.
- **Wait for Gadfly to finish, then read its output.** Don't merge while the
review is still running. Each model posts one consolidated comment; weigh
every finding on its merits and fix the real ones (Gadfly is a simple
system — findings are advisory, so confirm before acting).
- **Grade the findings back to the Gadfly MCP.** For each finding, call
`mcp__gadfly__record_finding_grade`: `is_real=true` + a `severity`
(trivial|small|medium|high|critical) for a genuine problem, or
`is_real=false` for a false positive; add `notes`/`usefulness` when
useful. Use `mcp__gadfly__list_findings` (`only_ungraded=true`) to find
what still needs grading and `mcp__gadfly__scoreboard` for the per-model
rollup. This telemetry is how we measure whether each model earns its keep.
## Out of scope (anti-creep)
No persistent store (health is in-memory behind the registry), no
observability/metrics stack, no config-file framework beyond LLM_* env
DSNs, no CLI beyond examples, no provider-specific features leaking into
the canonical API, nothing mort-specific in the library.