majordomo/progress.md
steve 96c612e707
feat(llamaswap): add llama-swap provider + canonical imagegen interface
Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping
proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to
provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with
legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and
a timeout-free client so cold model swaps rely on context deadlines. The
"tailored" surface is concrete management methods (ListModels / Running /
Unload) that don't belong on the canonical llm.Provider interface. The
llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL
built-in errors clearly on use, mirroring foreman.

Add imagegen, a new canonical text-to-image interface separate from llm
(Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed
straight back into chat). First backend is llama-swap via OpenAI
/v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is
txt2img only.

Hermetic httptest coverage for chat delegation, management endpoints, image
decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix +
image-gen section, CLAUDE.md package map, and progress.md updated in the same
commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 15:01:54 -04:00

# progress
## 2026-06-27 — llama-swap provider + canonical image-gen interface
**Landed (ADR-0015, ADR-0016).** New `provider/llamaswap`: chat **delegates to
`provider/openai`** at `{base}/v1` (no duplicated wire client per ADR-0007), with
legacy `max_tokens`, a `Bearer no-key` placeholder for keyless local instances,
and a timeout-free client (cold model swaps can be slow, so requests rely on context deadlines). Tailored
management methods on the concrete type — `ListModels`, `Running` (raw JSON),
`Unload`. DSN scheme `llama-swap://token@host:port` builds an **http** base URL
(local-first), registered in `builtin.go` alongside a no-URL built-in that errors
on use (mirrors foreman).
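
The DSN-to-base-URL mapping described above can be sketched roughly as follows. This is a minimal illustration, not the package's code: `parseLlamaSwapDSN` and its return shape are hypothetical, and falling back to the `no-key` placeholder when the DSN carries no token is an assumption drawn from the note about keyless local instances.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseLlamaSwapDSN is a hypothetical helper: it turns
// llama-swap://token@host:port into an http base URL plus a bearer token.
func parseLlamaSwapDSN(dsn string) (baseURL, token string, err error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "llama-swap" {
		return "", "", fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return "", "", errors.New("llama-swap DSN has no host")
	}
	// Assumption: keyless local instances still get a placeholder bearer value.
	token = "no-key"
	if u.User != nil && u.User.Username() != "" {
		token = u.User.Username()
	}
	// Local-first: plain http rather than https.
	return "http://" + u.Host, token, nil
}

func main() {
	base, token, err := parseLlamaSwapDSN("llama-swap://secret@localhost:8080")
	fmt.Println(base, token, err)
}
```

Under this sketch the chat path would then talk to `base + "/v1"`, matching the `{base}/v1` delegation noted above.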
New canonical `imagegen` package (text-to-image), separate from `llm`:
`Request`/`Result`/`Model`/`Provider`, `Image = llm.ImagePart` so generated
images feed back into chat. First backend is llama-swap via OpenAI
`/v1/images/generations` (`b64_json`, bytes-only). Re-exported from root
(`ImageModel`, `ImageRequest`, `WithImageSize`, ...). v1 is txt2img only;
edits/img2img and registry image-DSN resolution are deferred.
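
As a rough illustration of the bytes-only `b64_json` path, the sketch below decodes an OpenAI-style images response body into raw image bytes. The `imagesResponse` type and `decodeImages` function are hypothetical names for this example; only the `data[].b64_json` field comes from the OpenAI response format.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// imagesResponse models only the part of the response this sketch needs.
type imagesResponse struct {
	Data []struct {
		B64JSON string `json:"b64_json"`
	} `json:"data"`
}

// decodeImages (hypothetical) returns the decoded bytes of every image
// in the response, failing on an empty result or invalid base64.
func decodeImages(body []byte) ([][]byte, error) {
	var resp imagesResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode images response: %w", err)
	}
	if len(resp.Data) == 0 {
		return nil, errors.New("images response contained no data")
	}
	out := make([][]byte, 0, len(resp.Data))
	for i, d := range resp.Data {
		raw, err := base64.StdEncoding.DecodeString(d.B64JSON)
		if err != nil {
			return nil, fmt.Errorf("image %d: %w", i, err)
		}
		out = append(out, raw)
	}
	return out, nil
}

func main() {
	// "aGVsbG8=" is the base64 form of "hello", standing in for image bytes.
	imgs, err := decodeImages([]byte(`{"data":[{"b64_json":"aGVsbG8="}]}`))
	fmt.Println(len(imgs), err)
}
```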
Hermetic `httptest` tests for chat delegation, management endpoints, image
decode, and scheme wiring. Gates green. README support matrix + image-gen
section, CLAUDE.md package map, and ADR index updated in the same change.
## 2026-06-10 — Phase 9b: mort converted, PR open
**Done.** mort fully re-based on majordomo on branch
`majordomo-conversion`: 230 files (+8726/-6211), go-llm/v2 and
go-agentkit removed from go.mod/go.sum with a clean repo-wide grep,
`go build`/`go vet` clean, full test suite green (80 packages ok, 0
failures). Highlights: pkg/logic/llms rebuilt as the choke point
(registry, lane decorators, convar tier resolver, failover wiring via
llms.Wire); skillexec/agentexec on majordomo agent loops (critic budget
via WithMaxStepsFunc, steer, compactor, tool-error guards); runDirect
special case deleted; scaddy critic redesigned as one-shot multimodal
Generate; agentkit httpapi replaced by a mort-side server; ~96 tools on
DefineTool. PR (open, not merged):
https://gitea.stevedudenhoeffer.com/steve/mort/pulls/1274
Run note: executed by an 8-agent staged workflow; one mid-run deadlock
(a cluster agent polling a long-tail package) was broken by converting
tasks/recipe/summary/cookbook in the main line; one full workflow restart
after a network outage.
## 2026-06-10 — Phase 9a: conversion-driven library extensions
**Landed (ADR-0014):** RegisterResolver (dynamic DB-backed tiers, static
aliases win, recursive + cycle-guarded), DefineTool[Args] (typed tools
over SchemaFor), Usage cache/reasoning detail fields populated by
anthropic/openai/google, WithPromptCaching (Anthropic top-level
cache_control), agent hooks (WithMaxStepsFunc, WithSteer, WithCompactor —
non-fatal on error, canonical transcript stays uncompacted —
WithToolErrorLimits with ErrToolLoop), health Bench/Unbench/Snapshot,
ChainConfig.Observer failover events (attempt/bench/skip). Full hermetic
coverage for each.
**Next:** Phase 9b — the mort conversion branch.
## 2026-06-10 — Phase 8: live validation against real Ollama Cloud
**All six checks PASS** (examples/live harness, OLLAMA_API_KEY from .env):
1. Tier aliases (`thinking` = minimax-m3:cloud→kimi-k2.6:cloud,
`workhorse` = minimax-m2.7:cloud→qwen3-coder:480b-cloud) resolve via
Parse, incl. as a trailing chain element.
2. Plain chat served by ollama-cloud/minimax-m3:cloud (189 in/48 out).
3. Live tool call: the workhorse agent actually invoked get_launch_code
and answered from its result in 2 steps.
4. Structured Generate[T] decoded {City:Tokyo Country:Japan
Population:14000000 Latitude:35.6762}.
5. Forced failover: an unreachable head (connection refused = transient)
was retried, benched, and fell through to a live cloud tail; the second
request skipped the benched head without dialing it.
6. Agent with the calc skill attached invoked calculate and answered
56161.
**Discovery + fix:** Ollama Cloud ignores the `format` field entirely
(verified with raw curl — markdown came back despite a schema). The
ollama provider now also states the schema as an explicit system
instruction (local stays constrained-decoded; cloud becomes
instruction-guided); hermetic test added. The `:cloud`-suffixed model
names work verbatim against ollama.com — mort's tier strings carry over
unchanged.
**Next:** Phase 9 — convert mort onto majordomo, open the PR.
## 2026-06-10 — Phase 7: examples, migration blueprint, README finalization
**Landed:** `examples/` — nine runnable programs, one per hard requirement
(parse, failover incl. trailing-alias chains, custom tiers, LLM_* env
providers + foreman, multimodal, raw tool loop, structured Generate[T],
agent with toolbox, skills) + examples/README index; all built by the
hermetic gate suite. `docs/mort-migration.md` — the full conversion
blueprint: layering (what stays mort-side), the symbol-level core
mappings table, seven planned additive library extensions (dynamic
resolvers, DefineTool[Args], usage detail fields, prompt caching, agent
loop hooks, manual bench controls, failover observer), the Phase 9
execution order, and the behavioral deltas to verify (failover knob
mapping, AdvanceOnPermanent for go-llm's ErrRequestSpecific behavior,
bytes-only images). README final pass with the complete feature/provider
matrix.
**Next:** Phase 8 — live validation against real Ollama Cloud.
## 2026-06-10 — Phase 6: skills
**Landed:** `skill/` (ADR-0013): the agent.Skill contract satisfied by a
buildable skill.New(name, WithInstructions/WithTools/WithToolbox);
instruction-only skills legal; same-instance reuse across agents; additive
ordered composition proven (prompt appending + toolset merge + loud
duplicate policy). Example skills: `skill/clock` (time_now/time_convert,
injectable clock) and `skill/calc` (calculate over a hand-rolled
recursive-descent evaluator: + - * / % ^, parens, unary minus, scientific
notation; division-by-zero and non-finite results rejected). Tests cover
the evaluator table, tool execution through ExecuteTool, and a full
agent-loop run answering from the calculate result.
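
For readers unfamiliar with the technique, here is a minimal recursive-descent evaluator in the same spirit. It is a sketch, not the `skill/calc` code: it covers only `+ - * /`, parentheses, and unary minus (no `%`, `^`, or scientific notation), and all names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

type parser struct {
	src string
	pos int // index of the next unread byte
}

// peek skips spaces and returns the next byte, or 0 at end of input.
func (p *parser) peek() byte {
	for ; p.pos < len(p.src); p.pos++ {
		if p.src[p.pos] != ' ' {
			return p.src[p.pos]
		}
	}
	return 0
}

// expr := term (('+' | '-') term)*
func (p *parser) expr() (float64, error) {
	left, err := p.term()
	for err == nil && (p.peek() == '+' || p.peek() == '-') {
		op := p.src[p.pos]
		p.pos++
		var right float64
		if right, err = p.term(); op == '-' {
			right = -right
		}
		left += right
	}
	return left, err
}

// term := factor (('*' | '/') factor)*
func (p *parser) term() (float64, error) {
	left, err := p.factor()
	for err == nil && (p.peek() == '*' || p.peek() == '/') {
		op := p.src[p.pos]
		p.pos++
		var right float64
		right, err = p.factor()
		switch {
		case err != nil: // the loop condition stops on the error
		case op == '*':
			left *= right
		case right == 0:
			err = errors.New("division by zero")
		default:
			left /= right
		}
	}
	return left, err
}

// factor := '-' factor | '(' expr ')' | number
func (p *parser) factor() (float64, error) {
	switch c := p.peek(); {
	case c == '-':
		p.pos++
		v, err := p.factor()
		return -v, err
	case c == '(':
		p.pos++
		v, err := p.expr()
		if err == nil && p.peek() != ')' {
			return 0, errors.New("missing closing parenthesis")
		}
		p.pos++ // consume ')'; on the error path parsing stops anyway
		return v, err
	}
	start := p.pos
	for p.pos < len(p.src) && (p.src[p.pos] == '.' || (p.src[p.pos] >= '0' && p.src[p.pos] <= '9')) {
		p.pos++
	}
	if start == p.pos {
		return 0, fmt.Errorf("expected a number at offset %d", start)
	}
	return strconv.ParseFloat(p.src[start:p.pos], 64)
}

// evaluate parses the whole string and rejects trailing input.
func evaluate(src string) (float64, error) {
	p := &parser{src: src}
	v, err := p.expr()
	if err == nil && p.peek() != 0 {
		err = fmt.Errorf("unexpected input at offset %d", p.pos)
	}
	return v, err
}

func main() {
	fmt.Println(evaluate("-(4 - 6) / 2 + 3 * 2"))
}
```

Each grammar rule maps to one method, so operator precedence falls out of the call structure rather than a table.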
**Next:** Phase 7 — examples/, mort migration map, README finalization.
## 2026-06-10 — Phase 5: agent loop, Generate[T], schema derivation
**Landed:** `agent/` (ADR-0012): New(model, system, opts) with toolboxes,
max steps (default 10), per-step request options, agent-level observers +
per-run OnStep, WithHistory continuation (Result.Messages round-trips),
sequential tool dispatch through panic-recovering ExecuteTool, unknown
tools → IsError results, duplicate tool names fail loudly, partial Result
preserved on ErrMaxSteps/model errors/cancellation. The agent.Skill
interface ships here (instructions + tools composition is tested with a
stub); the skill package with real implementations is Phase 6.
`llm.SchemaFor[T]` reflect-derived strict-compatible JSON schemas
(pointers→nullable anyOf, description/enum tags, maps/slices/time/RawMessage,
recursion rejected) and root `majordomo.Generate[T]` (schema injection,
fence-stripping decode, model-naming errors). 15 agent tests + schema +
Generate suites, all hermetic.
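
The fence-stripping decode mentioned above can be illustrated with a small generic helper. This is a sketch under assumptions, not the `majordomo.Generate[T]` implementation: `stripFence`, `decodeReply`, and the `city` type are hypothetical, and the real function may handle more reply shapes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is the Markdown code-fence marker (three backticks).
var fence = strings.Repeat("`", 3)

// stripFence removes one surrounding code fence, including an optional
// language tag on the opening line, and returns the inner payload.
func stripFence(s string) string {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, fence) {
		return s
	}
	nl := strings.IndexByte(s, '\n')
	if nl < 0 {
		return s // a lone fence line has no payload to recover
	}
	s = strings.TrimSpace(s[nl+1:])
	return strings.TrimSpace(strings.TrimSuffix(s, fence))
}

// decodeReply (hypothetical) decodes a model reply into T, tolerating
// replies that wrap the JSON in a Markdown fence.
func decodeReply[T any](reply string) (T, error) {
	var out T
	if err := json.Unmarshal([]byte(stripFence(reply)), &out); err != nil {
		return out, fmt.Errorf("decode structured reply: %w", err)
	}
	return out, nil
}

type city struct {
	City    string `json:"city"`
	Country string `json:"country"`
}

func main() {
	reply := fence + "json\n{\"city\":\"Tokyo\",\"country\":\"Japan\"}\n" + fence
	c, err := decodeReply[city](reply)
	fmt.Println(c.City, c.Country, err)
}
```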
**Next:** Phase 6 — skill package + two example skills.
## 2026-06-10 — Phase 4: Google provider (official genai SDK)
**Landed:** `provider/google` on google.golang.org/genai v1.59.0 (ADR-0011):
lazy cached client (construction never fails; missing key = synthetic 401
so chains fail over), assistant→model role mapping, FunctionResponse tool
results with output/error payloads, ParametersJsonSchema raw-schema tools,
ResponseJsonSchema structured output, ToolChoice→FunctionCallingConfig,
ReasoningEffort→ThinkingConfig.ThinkingLevel, usage includes thought
tokens, iter.Pull2-adapted streaming, genai.APIError→llm.APIError mapping.
Hermetic tests via HTTPOptions.BaseURL + httptest (SSE fixtures for
streaming). Registry: google + gemini schemes wired to the real provider;
the last stub machinery deleted — all six built-ins are now real clients.
README matrix: Google row fully ✅.
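
As a sketch of the `iter.Pull2` adaptation, the example below turns a push-style `iter.Seq2[string, error]` into a pull-style `Next`/`Close` stream. The `pullStream` type and the `chunks` stand-in sequence are hypothetical; the provider's real stream carries richer events than plain strings.

```go
package main

import (
	"fmt"
	"io"
	"iter"
)

// chunks is a stand-in for an SDK's push-style stream of (chunk, error) pairs.
func chunks(parts ...string) iter.Seq2[string, error] {
	return func(yield func(string, error) bool) {
		for _, p := range parts {
			if !yield(p, nil) {
				return
			}
		}
	}
}

// pullStream (hypothetical) exposes a push iterator through Next and Close.
type pullStream struct {
	next func() (string, error, bool)
	stop func()
}

func newPullStream(seq iter.Seq2[string, error]) *pullStream {
	next, stop := iter.Pull2(seq)
	return &pullStream{next: next, stop: stop}
}

// Next returns the next chunk, the sequence's own error, or io.EOF
// once the sequence is exhausted.
func (s *pullStream) Next() (string, error) {
	chunk, err, ok := s.next()
	if !ok {
		return "", io.EOF
	}
	return chunk, err
}

// Close releases the underlying iterator; it is safe to call after exhaustion.
func (s *pullStream) Close() { s.stop() }

func main() {
	s := newPullStream(chunks("Hel", "lo"))
	defer s.Close()
	for {
		chunk, err := s.Next()
		if err != nil {
			break // io.EOF once the sequence is exhausted
		}
		fmt.Print(chunk)
	}
	fmt.Println()
}
```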
**Next:** Phase 5 — Agent run loop, Toolbox ergonomics, Generate[T].
## 2026-06-10 — Phase 3: REST providers (OpenAI, Anthropic, Ollama×3) + media
**Landed:**
- `provider/openai`: Chat Completions client for OpenAI and every
OpenAI-compatible endpoint (tools with string-arguments mapping, strict
SSE streaming incl. by-index tool-call assembly and the empty-choices
usage chunk, response_format json_schema, max_completion_tokens with a
WithLegacyMaxTokens compat option, reasoning_effort).
- `provider/anthropic`: Messages API client (anthropic-version 2023-06-01,
required-max_tokens defaulting, tool_use/tool_result blocks with native
is_error, GA structured output via output_config.format, full SSE event
parser with input_json_delta buffering, 529-overloaded classified
transient, usage sums cache tokens).
- `provider/ollama`: ONE native /api/chat client serving ollama (local,
OLLAMA_HOST normalization), ollama-cloud (https://ollama.com + bearer
OLLAMA_API_KEY), and foreman (base URL + bearer; tolerates its
buffered-single-object "streaming"). Object tool arguments, tool_name
results, format-schema structured output, think-level mapping, NDJSON
streaming with 16MB lines.
- `media/`: normalization pipeline per ADR-0009 (magic-byte sniffing,
box-filter downscale, transcode preference ladder, byte-budget quality
ladder, webp passthrough-or-reject, copy-on-write, everything-unfittable
wraps ErrUnsupported).
- Chain executor now normalizes media PER TARGET before each attempt and
advances penalty-free past targets that can't take the request (proven:
text-only head + vision fallback; per-target downscale assertions).
- Registry: real providers + scheme factories wired for openai, anthropic,
ollama, ollama-cloud, foreman (google still stubbed, Phase 4);
WithHTTPClient registry option; required env-foreman TLS chat round-trip
test (LLM_FM=foreman://token@host → Parse("fm/qwen3:30b") → bearer
arrives, chat answers).
- ADR-0009 (multimodal), ADR-0010 (tools/structured mapping); README
matrix flipped to ✅ for the four landed provider families; ~70 new
hermetic tests across the three provider packages + media.
- Run note: openai/anthropic/media were built by three parallel
subagents against the frozen llm contract; ollama/foreman, chain wiring,
and registry integration done in the main line. All gates green.
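
To illustrate the NDJSON streaming with large lines noted in the ollama item above, here is a minimal sketch using `bufio.Scanner` with a raised line limit. The `chunk` shape is a simplified, hypothetical stand-in for the real `/api/chat` payload, and `readNDJSON` and `collect` are illustrative names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// maxLineBytes raises the scanner's per-line limit to 16 MiB;
// the default limit is 64 KiB.
const maxLineBytes = 16 << 20

// chunk is a simplified, hypothetical stream element.
type chunk struct {
	Content string `json:"content"`
	Done    bool   `json:"done"`
}

// readNDJSON decodes one JSON object per line and stops at the first
// chunk marked done, at a decode error, or at end of input.
func readNDJSON(r io.Reader, handle func(chunk)) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64<<10), maxLineBytes)
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 {
			continue // tolerate blank keep-alive lines
		}
		var c chunk
		if err := json.Unmarshal(line, &c); err != nil {
			return fmt.Errorf("decode NDJSON line: %w", err)
		}
		handle(c)
		if c.Done {
			return nil
		}
	}
	return sc.Err()
}

// collect concatenates the content of every chunk in the stream.
func collect(stream string) (string, error) {
	var sb strings.Builder
	err := readNDJSON(strings.NewReader(stream), func(c chunk) { sb.WriteString(c.Content) })
	return sb.String(), err
}

func main() {
	text, err := collect("{\"content\":\"Hel\"}\n{\"content\":\"lo\",\"done\":true}\n")
	fmt.Println(text, err)
}
```

A single buffered JSON object followed by end of input also passes through this loop, which is one way a client could tolerate a backend that does not truly stream.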
**Next:** Phase 4 — Google provider on google.golang.org/genai.
## 2026-06-10 — Phase 2: health + failover chain, proven
**Landed:** the full deterministic failover test matrix over the fake
provider + fake clock (no sleeps, no network): single-transient recovery
via same-target retry; repeated transients bench + advance; cooldown expiry
re-admits and success resets; backoff doubling across bench rounds;
mixed chain with an inline-expanded alias element failing over through the
expanded targets; permanent-policy default (fail-fast on auth) and
`AdvanceOnPermanent` override; `TransientRetries` disabled/custom; retry
loop stops early when the tracker benches mid-request; exhaustion error
lists skipped-while-benched targets; custom classifier override;
chain-of-one gets identical semantics; HTTP 529 fails over. Implementation needed no
changes — Phase 1's executor held up.
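
The bench, cooldown, and backoff-doubling behavior exercised by this matrix can be sketched with a clock-injected tracker. This is an illustration only: the type, its fields, and the exact reset rules are assumptions, and it omits the locking that the thread-safe `health/` tracker would need.

```go
package main

import (
	"fmt"
	"time"
)

// fakeClock lets tests move time forward without sleeping.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

// tracker (hypothetical) benches a target after threshold consecutive
// failures, doubling the cooldown each round up to max.
type tracker struct {
	now          func() time.Time
	threshold    int
	base, max    time.Duration
	failures     int
	rounds       uint
	benchedUntil time.Time
}

func newTracker(now func() time.Time, threshold int, base, max time.Duration) *tracker {
	return &tracker{now: now, threshold: threshold, base: base, max: max}
}

// Available reports whether the cooldown, if any, has expired.
func (t *tracker) Available() bool { return !t.now().Before(t.benchedUntil) }

// Failure records one failure and benches the target at the threshold.
func (t *tracker) Failure() {
	t.failures++
	if t.failures < t.threshold {
		return
	}
	cooldown := t.base << t.rounds // doubles with every bench round
	if cooldown <= 0 || cooldown >= t.max {
		cooldown = t.max // capped; rounds stops growing so the shift cannot overflow
	} else {
		t.rounds++
	}
	t.benchedUntil = t.now().Add(cooldown)
	t.failures = 0
}

// Success clears the failure count, the backoff rounds, and any bench.
func (t *tracker) Success() {
	t.failures, t.rounds = 0, 0
	t.benchedUntil = time.Time{}
}

func main() {
	clk := &fakeClock{}
	tr := newTracker(clk.Now, 2, 10*time.Second, 40*time.Second)
	tr.Failure()
	tr.Failure()
	fmt.Println("available right after bench:", tr.Available())
	clk.Advance(10 * time.Second)
	fmt.Println("available after cooldown:", tr.Available())
}
```

Injecting `now` is what makes a no-sleep, deterministic test matrix possible: the test advances the fake clock instead of waiting.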
**Next:** Phase 3 — OpenAI/Anthropic/Ollama/foreman REST clients + media
pipeline.
## 2026-06-10 — Phase 1: foundations, ADRs, skeleton, docs
**Landed:**
- Module scaffold (Go 1.26), `.gitea/workflows/ci.yaml` (foreman-style
gates: build, vet, race tests, tidy-diff), `.env.example`.
- `llm/` canonical contract: Message/Part (sealed; text+image),
Request/Options, Response/Usage/FinishReason, Stream/StreamEvent,
Tool/Toolbox (panic-safe Execute), Capabilities (zero-value semantics),
Model/Provider interfaces, APIError + transient/permanent Classify.
- `health/`: clock-injected tracker — consecutive-failure threshold,
exponential capped cooldown, reset-on-success, thread-safe; full
deterministic test suite (fake clock).
- Root: Registry (providers/aliases/schemes/health), Parse with the binding
grammar (verbatim model ids, inline recursive alias expansion, cycle
detection, dedup), LLM_* env-DSN loading (go-llm-parity lazy fallback +
eager LoadEnv/New scan), chain executor implementing Model
(retry-on-transient, bench-on-repeat, skip-benched, 404-advance,
fail-fast-on-auth, joined exhaustion errors). Built-ins register as
resolvable stubs until their phases land.
- `provider/fake/`: scriptable provider (per-model outcome queues, request
recording, capabilities overrides, streaming) — the hermetic test rig.
- ADRs 0001–0008 + index; CLAUDE.md; honest README with pending-marked
matrix.
- Tests cover the two required cases: the trailing-`thinking` chain parse
and `LLM_M1=foreman://token@host` loading (plus DSN table, lazy fallback,
cycle detection, chain failover/backoff/exhaustion, toolbox execution,
error classification).
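
The inline recursive alias expansion with cycle detection and dedup can be illustrated with the sketch below. It is not the registry's `Parse`: the comma separator, the `expand` function, and the alias-map shape are all assumptions made for illustration, since the binding grammar itself is not reproduced in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// expand (hypothetical) flattens a chain spec into concrete targets.
// Alias elements are expanded in place, repeated targets are dropped,
// and an alias that reaches itself is reported as a cycle.
// Assumption: chain elements are comma-separated.
func expand(spec string, aliases map[string]string) ([]string, error) {
	var out []string
	seen := map[string]bool{}   // targets already emitted (dedup)
	active := map[string]bool{} // aliases on the current expansion path

	var walk func(spec string) error
	walk = func(spec string) error {
		for _, el := range strings.Split(spec, ",") {
			el = strings.TrimSpace(el)
			if el == "" {
				continue
			}
			body, isAlias := aliases[el]
			if !isAlias {
				if !seen[el] {
					seen[el] = true
					out = append(out, el)
				}
				continue
			}
			if active[el] {
				return fmt.Errorf("alias cycle through %q", el)
			}
			active[el] = true
			if err := walk(body); err != nil {
				return err
			}
			delete(active, el) // the same alias may legally appear again elsewhere
		}
		return nil
	}

	if err := walk(spec); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	aliases := map[string]string{"thinking": "cloud/model-a,cloud/model-b"}
	targets, err := expand("local/model-x,thinking", aliases)
	fmt.Println(targets, err)
}
```

Tracking only the aliases on the current path, rather than every alias ever visited, is what lets an alias appear twice in a chain without being mistaken for a cycle.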
**Notes:** chain executor landed in Phase 1 (design was settled);
Phase 2 deepens its test matrix (cooldown re-admission via fake clock,
alias-in-chain failover, permanent-policy override) and wires anything the
tests flush out.
**Next:** Phase 2 — exhaustive health/chain test matrix.