96c612e707
Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping
proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to
provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with
legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and
a timeout-free client so cold model swaps rely on context deadlines. The
"tailored" surface is concrete management methods (ListModels / Running /
Unload) that don't belong on the canonical llm.Provider interface. The
llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL
built-in errors clearly on use, mirroring foreman.
Add imagegen, a new canonical text-to-image interface separate from llm
(Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed
straight back into chat). First backend is llama-swap via OpenAI
/v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is
txt2img only.
Hermetic httptest coverage for chat delegation, management endpoints, image
decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix +
image-gen section, CLAUDE.md package map, and progress.md updated in the same
commit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
144 lines
7.3 KiB
Markdown
# CLAUDE.md — majordomo operating manual
majordomo is a clean-slate Go substrate for LLM-backed agents:
target-agnostic model access, a parseable model naming / failover / tiering
system with health tracking, multimodality, tool calls, structured output,
and agents composed from model + system prompt + toolboxes + skills.

> **Public, vibe-coded project.** This is built almost entirely by an AI agent
> (Claude Code) and is public. Keep that framing honest in the README (don't
> oversell it) and keep the README/support-matrix/examples updated in the same
> commit as the behavior they describe (that in-sync promise is part of the
> project's credibility).

**North star:** majordomo exists to re-architect mort's agentic layer. mort
is the first consumer and the design's acceptance test: when a choice is a
toss-up, pick what makes mort's tiers, failover chains, toolboxes, and
skills cleanest to express. But majordomo itself stays general-purpose and
mort-agnostic: no mort types, no Discord, no mort config.
## Module & stack

- Module: `gitea.stevedudenhoeffer.com/steve/majordomo`, Go 1.26.
- Stdlib-first (ADR-0007): hand-rolled `net/http` clients for
  OpenAI(+compat), Anthropic(+compat), Ollama (cloud+local), foreman. The
  one approved dependency is `google.golang.org/genai` (Google provider).
  Anything else needs an ADR. No `go-llm`, no `go-agentkit`; importing
  either is an automatic failure.
## Package map (ADR-0001)

```
majordomo            Registry, Parse, env-DSN loading, chain executor, re-exports
llm/                 canonical contract: Message/Part/Request/Response/Option,
                     Tool/Toolbox, Capabilities, Stream, Model, Provider, errors
imagegen/            canonical text-to-image contract: Request/Result/Model/
                     Provider (separate from llm; Image = llm.ImagePart) (ADR-0016)
health/              clock-injected health tracker (bench/backoff)
media/               image normalization to target capabilities (sniff real
                     format, downscale, transcode, byte ladder; ErrUnsupported
                     for what can't fit); chains normalize PER TARGET
provider/fake/       scriptable in-memory provider for hermetic tests
provider/openai/     Chat Completions client (+ all OpenAI-compat targets)
provider/anthropic/  Messages API client (+ Anthropic-compat targets)
provider/ollama/     one native /api/chat client serving the ollama,
                     ollama-cloud, and foreman built-ins via presets
provider/llamaswap/  llama-swap proxy: chat delegates to provider/openai,
                     plus management methods + imagegen image client (ADR-0015)
provider/google/     Gemini on google.golang.org/genai (the one approved
                     dependency; lazy client, raw-JSON-schema tools,
                     ThinkingLevel reasoning, iter.Pull2 streaming)
agent/               Agent run loop (Phase 5)
skill/               Skill interface + composition (Phase 6)
examples/            one runnable example per hard requirement (Phase 7-8)
```
Canonical types live in leaf package `llm`; the root re-exports them via
type aliases. Providers import `llm`, never the root and never each other
(the documented exception: `provider/llamaswap` delegates chat to
`provider/openai`, ADR-0015).
## Parse grammar (ADR-0003)

```
spec    := element ("," element)*   # ordered failover chain
element := target | alias
target  := provider "/" model       # model id VERBATIM after first "/"
alias   := bare token (no slash), expands INLINE, recursively, cycle-checked
```

- `Parse("ollama-cloud/minimax-m3:cloud,ollama-cloud/kimi-k2.6:cloud,anthropic/opus-4.8")`
  → try head-to-tail. Appending `,thinking` expands the registered alias in
  place at the tail.
- Provider resolution: registry (built-ins, RegisterProvider, eager env) →
  lazy `LLM_{UPPER(name)}` env DSN → error.
- Single element ≡ chain of one; same Model interface, same semantics.
- No reasoning suffixes (`:high` etc. are NOT stripped; model ids are
  verbatim). Reasoning effort becomes a request option (provider phases).
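The grammar above can be sketched as a small recursive expander. This is an illustration only, not majordomo's actual `Parse`: the `Target` type, the `expand` function, the alias table, and the error wording are all hypothetical, and the sketch does no whitespace handling or provider resolution.

```go
package main

import (
	"fmt"
	"strings"
)

// Target is one provider/model pair in a failover chain (hypothetical type).
type Target struct {
	Provider string
	Model    string
}

// expand turns a spec into an ordered chain. Elements containing a slash are
// targets; bare tokens are looked up in aliases and expanded inline.
// active tracks aliases currently being expanded so cycles are rejected.
func expand(spec string, aliases map[string]string, active map[string]bool) ([]Target, error) {
	var chain []Target
	for _, el := range strings.Split(spec, ",") {
		if el == "" {
			return nil, fmt.Errorf("empty element in spec %q", spec)
		}
		// Cut at the FIRST slash only: the rest is the model id, verbatim.
		if provider, model, ok := strings.Cut(el, "/"); ok {
			if provider == "" || model == "" {
				return nil, fmt.Errorf("malformed target %q", el)
			}
			chain = append(chain, Target{Provider: provider, Model: model})
			continue
		}
		sub, ok := aliases[el]
		if !ok {
			return nil, fmt.Errorf("unknown alias %q", el)
		}
		if active[el] {
			return nil, fmt.Errorf("alias cycle through %q", el)
		}
		active[el] = true
		expanded, err := expand(sub, aliases, active)
		delete(active, el)
		if err != nil {
			return nil, err
		}
		chain = append(chain, expanded...)
	}
	return chain, nil
}

func main() {
	// The alias body here is made up for the example.
	aliases := map[string]string{"thinking": "anthropic/opus-4.8"}
	chain, err := expand("ollama-cloud/minimax-m3:cloud,thinking", aliases, map[string]bool{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range chain {
		fmt.Printf("provider=%s model=%s\n", t.Provider, t.Model)
	}
}
```

Cutting at the first slash is what keeps ids such as `b/c:high` intact: nothing after the provider name is interpreted or stripped.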
## LLM_* env-DSN providers (ADR-0004, go-llm parity)

`LLM_<NAME>=scheme://[token@]host[/path]`, e.g.
`LLM_M5=foreman://token@foreman-m5.example` defines provider `m5`; then
`m5/qwen3:30b` works in Parse, chains, and aliases. Scheme ∈ {foreman,
ollama, ollama-cloud, openai, anthropic, google, gemini, llama-swap} ∪
RegisterScheme. Token = credential; base URL = `https://host` for every
scheme **except `llama-swap`, which builds `http://host` (local-first;
ADR-0015).**
`New()` scans the process env eagerly; unknown names also resolve lazily at
Parse time (`my-prov` → `LLM_MY_PROV`). Malformed entries fail on use, not at
startup.
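As a rough illustration of the DSN shape described above, the following sketch derives the env key from a provider name and splits a DSN into scheme, token, and base URL using `net/url`. The `dsn` struct, `envKey`, and `parseDSN` are hypothetical names, not the library's API; appending the DSN path to the base URL is an assumption, and scheme validation against the registered set is omitted.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dsn is a hypothetical parsed form of an LLM_<NAME> value.
type dsn struct {
	Scheme  string
	Token   string
	BaseURL string
}

// envKey maps a provider name to its env var, e.g. "my-prov" -> "LLM_MY_PROV".
func envKey(name string) string {
	return "LLM_" + strings.ToUpper(strings.ReplaceAll(name, "-", "_"))
}

// parseDSN splits scheme://[token@]host[/path] into its parts.
func parseDSN(raw string) (dsn, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return dsn{}, fmt.Errorf("parse DSN: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return dsn{}, fmt.Errorf("malformed DSN %q: need scheme://host", raw)
	}
	// https everywhere, except llama-swap which is local-first (http).
	proto := "https"
	if u.Scheme == "llama-swap" {
		proto = "http"
	}
	token := ""
	if u.User != nil {
		token = u.User.Username()
	}
	// Assumption: any DSN path is carried over onto the base URL.
	base := proto + "://" + u.Host + strings.TrimSuffix(u.Path, "/")
	return dsn{Scheme: u.Scheme, Token: token, BaseURL: base}, nil
}

func main() {
	fmt.Println(envKey("my-prov"))
	for _, raw := range []string{
		"foreman://token@foreman-m5.example",
		"llama-swap://localhost:8080",
	} {
		d, err := parseDSN(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("scheme=%s base=%s hasToken=%t\n", d.Scheme, d.BaseURL, d.Token != "")
	}
}
```

The example prints only whether a token is present, in line with the rule that credentials are never printed.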
## Health & failover (ADR-0006, ADR-0008)

- Transient (408/429/5xx, timeouts, conn refused/reset, DNS, deadline) vs
  permanent (400/401/403/404/405/422, model-not-found, ctx.Canceled).
  Unknown → transient. Classifier overridable.
- One transient error → retry same target (default 1 retry). Every failed
  attempt counts; at threshold (default 2 consecutive) the target is
  benched for base 5s × 2^n, capped 5m. Success fully resets. Chains skip
  benched targets; 404 advances penalty-free; auth/malformed fail fast
  (configurable); exhaustion returns a joined error naming every target.
- **Empty response = failover.** A target that returns *without error* but
  with no usable output, meaning no content parts and no tool calls
  (`Response.IsEmpty`; a media/image part counts as content), is treated as a
  per-target failure (`llm.ErrEmptyResponse`, classified transient). Unlike an
  ordinary transient it is **not** retried on the same target (the model just
  did this; the call is expensive): the chain penalizes health and advances
  immediately. If every target comes back empty the call fails with
  `ErrChainExhausted` rather than a hollow "successful" empty completion, so
  a single flaky model can't silently end an agent run with nothing.
- Tracker is in-memory, process-local, clock-injected. No persistence.
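To make the numbers above concrete, here is a minimal sketch of the bench backoff and the HTTP-status half of the classifier. It is not the `health` package's code: `benchDuration` and `transientStatus` are hypothetical names, and the sketch assumes `n` counts benches since the last success, starting at 0, which the text above does not spell out.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseBench = 5 * time.Second // default base from the text above
	maxBench  = 5 * time.Minute // default cap from the text above
)

// benchDuration returns base × 2^n, capped at maxBench.
// Assumption: n is the number of earlier benches since the last success.
func benchDuration(n int) time.Duration {
	d := baseBench
	// Doubling in a loop (instead of shifting) avoids overflow for large n.
	for i := 0; i < n && d < maxBench; i++ {
		d *= 2
	}
	if d > maxBench {
		d = maxBench
	}
	return d
}

// transientStatus reports whether an HTTP status should be treated as
// transient. Listed permanent codes return false; everything else,
// including unknown codes, is transient.
func transientStatus(code int) bool {
	switch code {
	case 400, 401, 403, 404, 405, 422:
		return false
	}
	return true
}

func main() {
	for n := 0; n <= 7; n++ {
		fmt.Printf("n=%d bench=%s\n", n, benchDuration(n))
	}
	for _, code := range []int{404, 408, 429, 503} {
		fmt.Printf("status=%d transient=%t\n", code, transientStatus(code))
	}
}
```

Under these assumptions the cap is reached at n=6, since 5s × 2^6 = 320s exceeds 5m.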
## House conventions (mirror foreman)

- gofmt; check errors immediately and wrap with `fmt.Errorf("%w: ...")`;
  imports stdlib → third-party → internal; `// Why:` doc comments where
  rationale isn't obvious.
- ADRs in `docs/adr/`, one decision each, append-only, indexed in its
  README. progress.md gets a dated entry per phase.
- Conventional commits (`feat:`, `test:`, `docs:`, `chore:`, `refactor:`).
- Tests are hermetic: fake provider + fake clock; provider clients test
  against `httptest`; **no network or credentials in the default suite**.
  Live tests sit behind `//go:build live` / `examples/live/` and skip
  without their env vars.
- `.env` holds live keys (gitignored, never committed/printed/quoted);
  `.env.example` carries placeholders.
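The hermetic-test convention above (provider clients exercised against `httptest`, no real network) might look like the following sketch. Everything here is illustrative: `fetchModelIDs` and `newFakeServer` are hypothetical helpers, and the `/v1/models` response shape is an assumed OpenAI-style payload rather than anything taken from this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newFakeServer starts a loopback server that stands in for a provider.
// No external network or credentials are involved.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/models" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"data":[{"id":"qwen3:30b"},{"id":"sd-turbo"}]}`)
	}))
}

// fetchModelIDs is a hypothetical client call: GET {base}/v1/models and
// return the ids from the assumed {"data":[{"id":...}]} payload.
func fetchModelIDs(client *http.Client, baseURL string) ([]string, error) {
	resp, err := client.Get(baseURL + "/v1/models")
	if err != nil {
		return nil, fmt.Errorf("list models: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("list models: unexpected status %d", resp.StatusCode)
	}
	var payload struct {
		Data []struct {
			ID string `json:"id"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&payload); err != nil {
		return nil, fmt.Errorf("decode models: %w", err)
	}
	ids := make([]string, 0, len(payload.Data))
	for _, m := range payload.Data {
		ids = append(ids, m.ID)
	}
	return ids, nil
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	ids, err := fetchModelIDs(srv.Client(), srv.URL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ids)
}
```

In a real suite the same server setup would sit inside a `func TestXxx(t *testing.T)` with `t.Cleanup(srv.Close)`; it is shown as a `main` program here only to keep the sketch self-contained.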
## Gates (every phase; what CI runs)

```
go build ./...
go vet ./...
go test -race -count=1 ./...
go mod tidy && git diff --exit-code go.mod go.sum
```

CI: `.gitea/workflows/ci.yaml` (Gitea Actions, mirrors foreman). README.md
must match reality in the same commit that changes behavior: no
aspirational docs; unbuilt features are marked pending in the matrix.
## Out of scope (anti-creep)

No persistent store (health is in-memory behind the registry), no
observability/metrics stack, no config-file framework beyond LLM_* env
DSNs, no CLI beyond examples, no provider-specific features leaking into
the canonical API, nothing mort-specific in the library.