# progress

## 2026-06-10 — Phase 6: skills

**Landed:** `skill/` (ADR-0013): the agent.Skill contract satisfied by a
buildable skill.New(name, WithInstructions/WithTools/WithToolbox);
instruction-only skills legal; same-instance reuse across agents; additive
ordered composition proven (prompt appending + toolset merge + loud
duplicate policy). Example skills: `skill/clock` (time_now/time_convert,
injectable clock) and `skill/calc` (calculate over a hand-rolled
recursive-descent evaluator: + - * / % ^, parens, unary minus, scientific
notation; division-by-zero and non-finite results rejected). Tests cover
the evaluator table, tool execution through ExecuteTool, and a full
agent-loop run answering from the calculate result.
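For orientation, a minimal sketch of a recursive-descent evaluator over the operator set listed above. It is not the `skill/calc` source: the `parser` type, the `Eval` name, the grammar split, and the precedence choice (unary minus binding looser than `^`) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

type parser struct {
	s   string
	pos int
}

// accept skips spaces, then consumes and returns the next byte if it is in ops.
func (p *parser) accept(ops string) byte {
	p.pos = len(p.s) - len(strings.TrimLeft(p.s[p.pos:], " "))
	if p.pos < len(p.s) && strings.IndexByte(ops, p.s[p.pos]) >= 0 {
		p.pos++
		return p.s[p.pos-1]
	}
	return 0
}

// binary parses next (op next)* left-associatively for the operators in ops.
func (p *parser) binary(ops string, next func() (float64, error)) (float64, error) {
	v, err := next()
	for op := p.accept(ops); err == nil && op != 0; op = p.accept(ops) {
		var r float64
		if r, err = next(); err != nil {
			break
		}
		switch {
		case op == '+':
			v += r
		case op == '-':
			v -= r
		case op == '*':
			v *= r
		case r == 0: // only '/' and '%' reach this point
			return 0, fmt.Errorf("division by zero")
		case op == '/':
			v /= r
		default: // '%'
			v = math.Mod(v, r)
		}
	}
	return v, err
}

// expr := term (('+'|'-') term)*    term := unary (('*'|'/'|'%') unary)*
func (p *parser) expr() (float64, error) { return p.binary("+-", p.term) }
func (p *parser) term() (float64, error) { return p.binary("*/%", p.unary) }

// unary := '-' unary | atom ('^' unary)?  so -2^2 is -(2^2); ^ is right-associative.
func (p *parser) unary() (float64, error) {
	if p.accept("-") != 0 {
		v, err := p.unary()
		return -v, err
	}
	base, err := p.atom()
	if err != nil || p.accept("^") == 0 {
		return base, err
	}
	exp, err := p.unary()
	return math.Pow(base, exp), err
}

// atom := '(' expr ')' | number, where number may use scientific notation.
func (p *parser) atom() (float64, error) {
	if p.accept("(") != 0 {
		v, err := p.expr()
		if err == nil && p.accept(")") == 0 {
			err = fmt.Errorf("missing ')' at offset %d", p.pos)
		}
		return v, err
	}
	start := p.pos
	for p.pos < len(p.s) && strings.IndexByte("0123456789.eE", p.s[p.pos]) >= 0 {
		if strings.IndexByte("eE", p.s[p.pos]) >= 0 && p.pos+1 < len(p.s) && strings.IndexByte("+-", p.s[p.pos+1]) >= 0 {
			p.pos++ // keep the exponent sign attached to the number
		}
		p.pos++
	}
	return strconv.ParseFloat(p.s[start:p.pos], 64)
}

// Eval parses the whole input and rejects leftovers and non-finite results.
func Eval(s string) (float64, error) {
	p := &parser{s: s}
	v, err := p.expr()
	if p.accept(""); err == nil && (p.pos < len(p.s) || math.IsNaN(v) || math.IsInf(v, 0)) {
		err = fmt.Errorf("trailing input or non-finite result (offset %d)", p.pos)
	}
	return v, err
}

func main() { fmt.Println(Eval("-2^2 + (1.5e2 - 50) / 4")) }
```

The explicit zero-divisor check matters even with the non-finite guard in `Eval`: an inner `1/0` can be absorbed by a later operation (for example `1/(1/0)` evaluates to 0), so checking only the final value would let it through.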

**Next:** Phase 7 — examples/, mort migration map, README finalization.

## 2026-06-10 — Phase 5: agent loop, Generate[T], schema derivation

**Landed:** `agent/` (ADR-0012): New(model, system, opts) with toolboxes,
max steps (default 10), per-step request options, agent-level observers +
per-run OnStep, WithHistory continuation (Result.Messages round-trips),
sequential tool dispatch through panic-recovering ExecuteTool, unknown
tools → IsError results, duplicate tool names fail loudly, partial Result
preserved on ErrMaxSteps/model errors/cancellation. The agent.Skill
interface ships here (instructions + tools composition is tested with a
stub); the skill package with real implementations is Phase 6.
Also landed: `llm.SchemaFor[T]`, which derives strict-compatible JSON schemas
by reflection (pointers→nullable anyOf, description/enum tags,
maps/slices/time/RawMessage, recursion rejected), and the root
`majordomo.Generate[T]` (schema injection, fence-stripping decode,
model-naming errors). 15 agent tests + schema + Generate suites, all
hermetic.
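The "fence-stripping decode" step can be pictured roughly as below. This is a sketch under assumptions, not the `majordomo.Generate[T]` code: `stripFence`, `decode`, and the `forecast` type are hypothetical names, and only a single fence wrapping the whole reply is handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is the Markdown code-fence marker, built without a literal triple
// backtick so this example can itself sit inside a fenced block.
var fence = strings.Repeat("`", 3)

// stripFence removes one surrounding code fence (and its info-string line,
// such as "json") when the whole reply is wrapped in one; otherwise it
// returns the trimmed input unchanged.
func stripFence(s string) string {
	s = strings.TrimSpace(s)
	if len(s) < 2*len(fence) || !strings.HasPrefix(s, fence) || !strings.HasSuffix(s, fence) {
		return s
	}
	body := s[len(fence) : len(s)-len(fence)]
	if i := strings.IndexByte(body, '\n'); i >= 0 {
		body = body[i+1:] // drop the info-string line
	}
	return strings.TrimSpace(body)
}

// decode is a hypothetical stand-in for the decode step: strip, then unmarshal.
func decode[T any](reply string) (T, error) {
	var out T
	if err := json.Unmarshal([]byte(stripFence(reply)), &out); err != nil {
		return out, fmt.Errorf("decode %T: %w", out, err)
	}
	return out, nil
}

// forecast is an illustrative target type.
type forecast struct {
	City string `json:"city"`
	High int    `json:"high"`
}

func main() {
	reply := fence + "json\n{\"city\": \"Oslo\", \"high\": 21}\n" + fence
	f, err := decode[forecast](reply)
	fmt.Println(f, err)
}
```

Replies that are already bare JSON pass through `stripFence` untouched, so the same path serves models that do and do not wrap their output.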

**Next:** Phase 6 — skill package + two example skills.

## 2026-06-10 — Phase 4: Google provider (official genai SDK)

**Landed:** `provider/google` on google.golang.org/genai v1.59.0 (ADR-0011):
lazy cached client (construction never fails; missing key = synthetic 401
so chains fail over), assistant→model role mapping, FunctionResponse tool
results with output/error payloads, ParametersJsonSchema raw-schema tools,
ResponseJsonSchema structured output, ToolChoice→FunctionCallingConfig,
ReasoningEffort→ThinkingConfig.ThinkingLevel, usage includes thought
tokens, iter.Pull2-adapted streaming, genai.APIError→llm.APIError mapping.
Hermetic tests via HTTPOptions.BaseURL + httptest (SSE fixtures for
streaming). Registry: google + gemini schemes wired to the real provider;
the last stub machinery deleted — all six built-ins are now real clients.
README matrix: Google row fully ✅.
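A rough sketch of the "lazy cached client, construction never fails" idea. It does not use the genai SDK: `sdkClient` is a placeholder, `APIError` stands in for `llm.APIError` with assumed fields, and `New`/`get` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// APIError stands in for llm.APIError; the fields here are assumptions.
type APIError struct {
	Status  int
	Message string
}

func (e *APIError) Error() string { return fmt.Sprintf("api error %d: %s", e.Status, e.Message) }

// sdkClient is a placeholder for whatever client type the SDK returns.
type sdkClient struct{ apiKey string }

// Provider builds its client on first use and caches the outcome.
type Provider struct {
	apiKey string
	once   sync.Once
	client *sdkClient
	err    error
}

// New never fails; a missing key only surfaces when a request is attempted.
func New(apiKey string) *Provider { return &Provider{apiKey: apiKey} }

// get returns the cached client, running construction exactly once.
func (p *Provider) get() (*sdkClient, error) {
	p.once.Do(func() {
		if p.apiKey == "" {
			// Synthetic 401: the caller sees an ordinary API error, so a
			// chain executor can classify it like a real response.
			p.err = &APIError{Status: 401, Message: "no API key configured"}
			return
		}
		p.client = &sdkClient{apiKey: p.apiKey}
	})
	return p.client, p.err
}

func main() {
	_, err := New("").get()
	var apiErr *APIError
	if errors.As(err, &apiErr) {
		fmt.Println("status:", apiErr.Status)
	}
}
```

Deferring the failure to first use keeps registry setup infallible while still giving each request a typed error to classify.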

**Next:** Phase 5 — Agent run loop, Toolbox ergonomics, Generate[T].

## 2026-06-10 — Phase 3: REST providers (OpenAI, Anthropic, Ollama×3) + media

**Landed:**
- `provider/openai`: Chat Completions client for OpenAI and every
  OpenAI-compatible endpoint (tools with string-arguments mapping, strict
  SSE streaming incl. by-index tool-call assembly and the empty-choices
  usage chunk, response_format json_schema, max_completion_tokens with a
  WithLegacyMaxTokens compat option, reasoning_effort).
- `provider/anthropic`: Messages API client (anthropic-version 2023-06-01,
  required-max_tokens defaulting, tool_use/tool_result blocks with native
  is_error, GA structured output via output_config.format, full SSE event
  parser with input_json_delta buffering, 529-overloaded classified
  transient, usage sums cache tokens).
- `provider/ollama`: ONE native /api/chat client serving ollama (local,
  OLLAMA_HOST normalization), ollama-cloud (https://ollama.com + bearer
  OLLAMA_API_KEY), and foreman (base URL + bearer; tolerates its
  buffered-single-object "streaming"). Object tool arguments, tool_name
  results, format-schema structured output, think-level mapping, NDJSON
  streaming with 16MB lines.
- `media/`: normalization pipeline per ADR-0009 (magic-byte sniffing,
  box-filter downscale, transcode preference ladder, byte-budget quality
  ladder, webp passthrough-or-reject, copy-on-write, everything-unfittable
  wraps ErrUnsupported).
- Chain executor now normalizes media PER TARGET before each attempt and
  advances penalty-free past targets that can't take the request (proven:
  text-only head + vision fallback; per-target downscale assertions).
- Registry: real providers + scheme factories wired for openai, anthropic,
  ollama, ollama-cloud, foreman (google still stubbed, Phase 4);
  WithHTTPClient registry option; required env-foreman TLS chat round-trip
  test (LLM_FM=foreman://token@host → Parse("fm/qwen3:30b") → bearer
  arrives, chat answers).
- ADR-0009 (multimodal), ADR-0010 (tools/structured mapping); README
  matrix flipped to ✅ for the four landed provider families; ~70 new
  hermetic tests across the three provider packages + media.
- Run note: openai/anthropic/media were built by three parallel
  subagents against the frozen llm contract; ollama/foreman, chain wiring,
  and registry integration were done in the main line. All gates green.
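As an illustration of the "NDJSON streaming with 16MB lines" item above, a minimal sketch built on `bufio.Scanner`. The `chunk` shape is a simplified assumption about the streamed objects, and `readNDJSON`/`collect` are hypothetical helpers, not the `provider/ollama` code.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// maxLine mirrors the 16MB line limit mentioned above.
const maxLine = 16 << 20

// chunk is a simplified, assumed shape for one streamed chat object.
type chunk struct {
	Message struct {
		Content string `json:"content"`
	} `json:"message"`
	Done bool `json:"done"`
}

// readNDJSON decodes one JSON object per line and passes each to emit,
// stopping at the first chunk marked done.
func readNDJSON(r io.Reader, emit func(chunk)) error {
	sc := bufio.NewScanner(r)
	// Scanner's default token limit is 64KB; raise it so one large line
	// does not fail with bufio.ErrTooLong.
	sc.Buffer(make([]byte, 0, 64<<10), maxLine)
	for sc.Scan() {
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) == 0 {
			continue // tolerate blank separator lines
		}
		var c chunk
		if err := json.Unmarshal(line, &c); err != nil {
			return fmt.Errorf("decode stream line: %w", err)
		}
		emit(c)
		if c.Done {
			return nil
		}
	}
	return sc.Err()
}

// collect concatenates the streamed content; a convenience for the demo.
func collect(body string) (string, error) {
	var sb strings.Builder
	err := readNDJSON(strings.NewReader(body), func(c chunk) { sb.WriteString(c.Message.Content) })
	return sb.String(), err
}

func main() {
	body := `{"message":{"content":"Hel"}}` + "\n" + `{"message":{"content":"lo"},"done":true}` + "\n"
	fmt.Println(collect(body))
}
```

A body consisting of a single object with `done` set is just the one-line case of the same loop, which is one way a buffered single-object "stream" could be tolerated.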

**Next:** Phase 4 — Google provider on google.golang.org/genai.

## 2026-06-10 — Phase 2: health + failover chain, proven

**Landed:** the full deterministic failover test matrix over the fake
provider + fake clock (no sleeps, no network): single-transient recovery
via same-target retry; repeated transients bench + advance; cooldown expiry
re-admits and success resets; backoff doubling across bench rounds;
mixed chain with an inline-expanded alias element failing over through the
expanded targets; permanent-policy default (fail-fast on auth) and
`AdvanceOnPermanent` override; `TransientRetries` disabled/custom; retry
loop stops early when the tracker benches mid-request; exhaustion error
lists skipped-while-benched targets; custom classifier override; chain-of-
one gets identical semantics; HTTP 529 fails over. Implementation needed no
changes — Phase 1's executor held up.
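To make the retry-then-advance behaviour concrete, a stripped-down sketch of a chain walk over scripted fakes. It leaves out benching, cooldowns, and classifier overrides; `scripted`, `run`, `ErrTransient`, and `ErrAuth` are illustrative names rather than the executor's or fake provider's API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTransient and ErrAuth are illustrative sentinels, not the library's own.
var (
	ErrTransient = errors.New("transient failure")
	ErrAuth      = errors.New("auth rejected")
)

// scripted is a fake target that replays a queue of outcomes, then succeeds.
type scripted struct {
	name     string
	outcomes []error
	calls    int
}

// call pops the next scripted outcome; a nil outcome or empty queue succeeds.
func (s *scripted) call() (string, error) {
	s.calls++
	if len(s.outcomes) == 0 {
		return s.name, nil
	}
	err := s.outcomes[0]
	s.outcomes = s.outcomes[1:]
	if err != nil {
		return "", err
	}
	return s.name, nil
}

// run walks the chain in order. A transient failure is retried on the same
// target up to retries times before advancing; any other error stops the
// chain at once. All attempt errors are joined into the returned error.
func run(chain []*scripted, retries int) (string, error) {
	var errs []error
	for _, t := range chain {
		for attempt := 0; attempt <= retries; attempt++ {
			out, err := t.call()
			if err == nil {
				return out, nil
			}
			errs = append(errs, fmt.Errorf("%s: %w", t.name, err))
			if !errors.Is(err, ErrTransient) {
				return "", errors.Join(errs...) // permanent: fail fast
			}
		}
	}
	return "", errors.Join(errs...) // every target exhausted
}

func main() {
	flaky := &scripted{name: "primary", outcomes: []error{ErrTransient, ErrTransient}}
	backup := &scripted{name: "backup"}
	fmt.Println(run([]*scripted{flaky, backup}, 1))
}
```

Because the fakes record call counts and need no sleeps or network, each row of a matrix like the one above reduces to a script plus a few assertions.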

**Next:** Phase 3 — OpenAI/Anthropic/Ollama/foreman REST clients + media
pipeline.

## 2026-06-10 — Phase 1: foundations, ADRs, skeleton, docs

**Landed:**
- Module scaffold (Go 1.26), `.gitea/workflows/ci.yaml` (foreman-style
  gates: build, vet, race tests, tidy-diff), `.env.example`.
- `llm/` canonical contract: Message/Part (sealed; text+image),
  Request/Options, Response/Usage/FinishReason, Stream/StreamEvent,
  Tool/Toolbox (panic-safe Execute), Capabilities (zero-value semantics),
  Model/Provider interfaces, APIError + transient/permanent Classify.
- `health/`: clock-injected tracker — consecutive-failure threshold,
  exponential capped cooldown, reset-on-success, thread-safe; full
  deterministic test suite (fake clock).
- Root: Registry (providers/aliases/schemes/health), Parse with the binding
  grammar (verbatim model ids, inline recursive alias expansion, cycle
  detection, dedup), LLM_* env-DSN loading (go-llm-parity lazy fallback +
  eager LoadEnv/New scan), chain executor implementing Model
  (retry-on-transient, bench-on-repeat, skip-benched, 404-advance,
  fail-fast-on-auth, joined exhaustion errors). Built-ins register as
  resolvable stubs until their phases land.
- `provider/fake/`: scriptable provider (per-model outcome queues, request
  recording, capabilities overrides, streaming) — the hermetic test rig.
- ADRs 0001–0008 + index; CLAUDE.md; honest README with pending-marked
  matrix.
- Tests cover the two required cases: the trailing-`thinking` chain parse
  and `LLM_M1=foreman://token@host` loading (plus DSN table, lazy fallback,
  cycle detection, chain failover/backoff/exhaustion, toolbox execution,
  error classification).
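The `health/` bullet above (consecutive-failure threshold, exponential capped cooldown, reset-on-success, injected clock) could look roughly like the following. This is a sketch under assumptions: the `Tracker` API, the `fakeClock`, and the demo parameters (threshold 2, base 1s, cap 4s) are invented for illustration and are not taken from the package.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is the per-target health record.
type entry struct {
	failures int       // consecutive failures since the last success or bench
	rounds   int       // bench rounds so far; drives the backoff exponent
	until    time.Time // benched until this instant
}

// Tracker is a clock-injected health tracker; all names here are assumed.
type Tracker struct {
	mu        sync.Mutex
	now       func() time.Time
	threshold int
	base, max time.Duration
	targets   map[string]*entry
}

func NewTracker(now func() time.Time, threshold int, base, max time.Duration) *Tracker {
	return &Tracker{now: now, threshold: threshold, base: base, max: max, targets: map[string]*entry{}}
}

// Failure records one failure and benches the target once the threshold of
// consecutive failures is reached, doubling the cooldown each round up to max.
func (t *Tracker) Failure(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.targets[key]
	if e == nil {
		e = &entry{}
		t.targets[key] = e
	}
	if e.failures++; e.failures < t.threshold {
		return
	}
	cool := t.base
	for i := 0; i < e.rounds && cool < t.max; i++ {
		cool *= 2
	}
	if cool > t.max {
		cool = t.max
	}
	e.until = t.now().Add(cool)
	e.rounds++
	e.failures = 0
}

// Success clears all state for the target, which also resets the backoff.
func (t *Tracker) Success(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.targets, key)
}

// Available reports whether the target is not currently benched.
func (t *Tracker) Available(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	e := t.targets[key]
	return e == nil || !t.now().Before(e.until)
}

// fakeClock is a manually advanced clock for deterministic tests.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time { return c.t }

// Advance moves the fake clock forward by whole seconds.
func (c *fakeClock) Advance(seconds int) { c.t = c.t.Add(time.Duration(seconds) * time.Second) }

// newDemo wires a tracker to a fake clock: threshold 2, base 1s, cap 4s.
func newDemo() (*Tracker, *fakeClock) {
	clk := &fakeClock{t: time.Unix(0, 0)}
	return NewTracker(clk.Now, 2, time.Second, 4*time.Second), clk
}

func main() {
	tr, clk := newDemo()
	tr.Failure("a")
	tr.Failure("a")
	fmt.Println("benched:", !tr.Available("a"))
	clk.Advance(1)
	fmt.Println("re-admitted:", tr.Available("a"))
}
```

Injecting `now` as a function is what lets cooldown expiry be tested by advancing a fake clock instead of sleeping.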

**Notes:** chain executor landed in Phase 1 (design was settled);
Phase 2 deepens its test matrix (cooldown re-admission via fake clock,
alias-in-chain failover, permanent-policy override) and wires anything the
tests flush out.

**Next:** Phase 2 — exhaustive health/chain test matrix.