steve 76ecf0e49e feat: skills — additive instruction+tool bundles, clock + calc examples
Phase 6: skill.New constructor satisfying the agent.Skill contract;
instruction-only skills; ordered additive composition; skill/clock
(injectable-clock time tools) and skill/calc (recursive-descent arithmetic
evaluator) as ready-made examples with full test suites incl. an
agent-loop round trip. ADR-0013; README skills section + matrix synced.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:13:07 +02:00

# progress
## 2026-06-10 — Phase 6: skills
**Landed:** `skill/` (ADR-0013): the agent.Skill contract satisfied by a
buildable skill.New(name, WithInstructions/WithTools/WithToolbox);
instruction-only skills legal; same-instance reuse across agents; additive
ordered composition proven (prompt appending + toolset merge + loud
duplicate policy). Example skills: `skill/clock` (time_now/time_convert,
injectable clock) and `skill/calc` (calculate over a hand-rolled
recursive-descent evaluator: + - * / % ^, parens, unary minus, scientific
notation; division-by-zero and non-finite results rejected). Tests cover
the evaluator table, tool execution through ExecuteTool, and a full
agent-loop run answering from the calculate result.
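
The additive, ordered composition described above can be sketched roughly as follows. This is a minimal illustration, not the `skill` package itself: the `Skill` struct, the `Compose` function, and the use of plain tool names instead of real tool values are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Skill is a hypothetical stand-in for the agent.Skill contract: a name,
// optional instructions, and optional tool names.
type Skill struct {
	Name         string
	Instructions string
	Tools        []string
}

// Compose appends each skill's instructions to the base prompt in order and
// merges the tool names, failing loudly on the first duplicate.
func Compose(base string, skills ...Skill) (string, []string, error) {
	prompt := []string{base}
	owner := map[string]string{} // tool name -> skill that registered it
	var tools []string
	for _, s := range skills {
		if s.Instructions != "" {
			prompt = append(prompt, s.Instructions)
		}
		for _, t := range s.Tools {
			if prev, dup := owner[t]; dup {
				return "", nil, fmt.Errorf("duplicate tool %q: skills %q and %q", t, prev, s.Name)
			}
			owner[t] = s.Name
			tools = append(tools, t)
		}
	}
	return strings.Join(prompt, "\n\n"), tools, nil
}

func main() {
	clock := Skill{Name: "clock", Instructions: "Use time_now for the current time.", Tools: []string{"time_now", "time_convert"}}
	calc := Skill{Name: "calc", Tools: []string{"calculate"}}
	style := Skill{Name: "style", Instructions: "Answer tersely."} // instruction-only skill

	prompt, tools, err := Compose("You are a helpful agent.", clock, calc, style)
	fmt.Printf("%q %v %v\n", prompt, tools, err)

	_, _, err = Compose("base", calc, calc) // the same tool name twice is rejected
	fmt.Println(err)
}
```

Returning an error on the first duplicate, rather than letting a later skill silently override an earlier one, is one way to read the "loud duplicate policy"; the real package may report it differently.
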
**Next:** Phase 7 — examples/, mort migration map, README finalization.
## 2026-06-10 — Phase 5: agent loop, Generate[T], schema derivation
**Landed:** `agent/` (ADR-0012): New(model, system, opts) with toolboxes,
max steps (default 10), per-step request options, agent-level observers +
per-run OnStep, WithHistory continuation (Result.Messages round-trips),
sequential tool dispatch through panic-recovering ExecuteTool, unknown
tools → IsError results, duplicate tool names fail loudly, partial Result
preserved on ErrMaxSteps/model errors/cancellation. The agent.Skill
interface ships here (instructions + tools composition is tested with a
stub); the skill package with real implementations is Phase 6.
`llm.SchemaFor[T]` reflect-derived strict-compatible JSON schemas
(pointers→nullable anyOf, description/enum tags, maps/slices/time/RawMessage,
recursion rejected) and root `majordomo.Generate[T]` (schema injection,
fence-stripping decode, model-naming errors). 15 agent tests + schema +
Generate suites, all hermetic.
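
As an illustration of panic-recovering tool dispatch with unknown tools reported as error results, here is a small sketch. The `ToolResult` and `ToolFunc` types and this `ExecuteTool` signature are hypothetical simplifications and are not taken from the `llm` or `agent` packages.

```go
package main

import "fmt"

// ToolResult is a hypothetical result shape; IsError marks a failed call.
type ToolResult struct {
	Output  string
	IsError bool
}

// ToolFunc is a hypothetical tool signature taking raw arguments.
type ToolFunc func(args string) (string, error)

// ExecuteTool runs the named tool and converts unknown names, returned
// errors, and panics into IsError results instead of aborting the run.
func ExecuteTool(tools map[string]ToolFunc, name, args string) (res ToolResult) {
	defer func() {
		if r := recover(); r != nil {
			res = ToolResult{Output: fmt.Sprintf("tool %q panicked: %v", name, r), IsError: true}
		}
	}()
	fn, ok := tools[name]
	if !ok {
		return ToolResult{Output: fmt.Sprintf("unknown tool %q", name), IsError: true}
	}
	out, err := fn(args)
	if err != nil {
		return ToolResult{Output: err.Error(), IsError: true}
	}
	return ToolResult{Output: out}
}

func main() {
	tools := map[string]ToolFunc{
		"echo": func(args string) (string, error) { return args, nil },
		"boom": func(string) (string, error) { panic("kaboom") },
	}
	for _, name := range []string{"echo", "boom", "missing"} {
		fmt.Printf("%s: %+v\n", name, ExecuteTool(tools, name, "hi"))
	}
}
```

Feeding failures back as results lets the model see what went wrong and try a different call, which is presumably why unknown tools are not treated as fatal.
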
**Next:** Phase 6 — skill package + two example skills.
## 2026-06-10 — Phase 4: Google provider (official genai SDK)
**Landed:** `provider/google` on google.golang.org/genai v1.59.0 (ADR-0011):
lazy cached client (construction never fails; missing key = synthetic 401
so chains fail over), assistant→model role mapping, FunctionResponse tool
results with output/error payloads, ParametersJsonSchema raw-schema tools,
ResponseJsonSchema structured output, ToolChoice→FunctionCallingConfig,
ReasoningEffort→ThinkingConfig.ThinkingLevel, usage includes thought
tokens, iter.Pull2-adapted streaming, genai.APIError→llm.APIError mapping.
Hermetic tests via HTTPOptions.BaseURL + httptest (SSE fixtures for
streaming). Registry: google + gemini schemes wired to the real provider;
the last stub machinery deleted — all six built-ins are now real clients.
README matrix: Google row fully ✅.
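
The "construction never fails; missing key = synthetic 401" idea can be sketched with `sync.Once`. Everything named here is an assumption for illustration: `APIError` stands in for `llm.APIError`, `sdkClient` stands in for the genai client, and `GOOGLE_API_KEY` is a guessed variable name.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync"
)

// APIError is a hypothetical stand-in for llm.APIError.
type APIError struct {
	Status  int
	Message string
}

func (e *APIError) Error() string { return fmt.Sprintf("%d: %s", e.Status, e.Message) }

// sdkClient stands in for the real SDK client.
type sdkClient struct{ key string }

// Provider defers client construction to first use, so New cannot fail.
type Provider struct {
	getenv func(string) string
	once   sync.Once
	client *sdkClient
	err    error
}

// New only records configuration; it performs no validation.
func New(getenv func(string) string) *Provider { return &Provider{getenv: getenv} }

// get builds the client on the first call and caches the outcome, error included.
func (p *Provider) get() (*sdkClient, error) {
	p.once.Do(func() {
		key := p.getenv("GOOGLE_API_KEY") // variable name is an assumption
		if key == "" {
			// Synthetic 401: looks like an auth failure to the chain executor.
			p.err = &APIError{Status: http.StatusUnauthorized, Message: "missing API key"}
			return
		}
		p.client = &sdkClient{key: key}
	})
	return p.client, p.err
}

func main() {
	p := New(func(string) string { return "" }) // no key configured
	_, err := p.get()
	var apiErr *APIError
	fmt.Println(errors.As(err, &apiErr), apiErr.Status) // prints "true 401"
}
```

Surfacing the missing key as an ordinary API error at request time means a chain containing this provider can still be built, and the existing error classification decides what happens next.
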
**Next:** Phase 5 — Agent run loop, Toolbox ergonomics, Generate[T].
## 2026-06-10 — Phase 3: REST providers (OpenAI, Anthropic, Ollama×3) + media
**Landed:**
- `provider/openai`: Chat Completions client for OpenAI and every
OpenAI-compatible endpoint (tools with string-arguments mapping, strict
SSE streaming incl. by-index tool-call assembly and the empty-choices
usage chunk, response_format json_schema, max_completion_tokens with a
WithLegacyMaxTokens compat option, reasoning_effort).
- `provider/anthropic`: Messages API client (anthropic-version 2023-06-01,
required-max_tokens defaulting, tool_use/tool_result blocks with native
is_error, GA structured output via output_config.format, full SSE event
parser with input_json_delta buffering, 529-overloaded classified
transient, usage sums cache tokens).
- `provider/ollama`: ONE native /api/chat client serving ollama (local,
OLLAMA_HOST normalization), ollama-cloud (https://ollama.com + bearer
OLLAMA_API_KEY), and foreman (base URL + bearer; tolerates its
buffered-single-object "streaming"). Object tool arguments, tool_name
results, format-schema structured output, think-level mapping, NDJSON
streaming with lines of up to 16 MB.
- `media/`: normalization pipeline per ADR-0009 (magic-byte sniffing,
box-filter downscale, transcode preference ladder, byte-budget quality
ladder, webp passthrough-or-reject, copy-on-write, everything-unfittable
wraps ErrUnsupported).
- Chain executor now normalizes media PER TARGET before each attempt and
advances penalty-free past targets that can't take the request (proven:
text-only head + vision fallback; per-target downscale assertions).
- Registry: real providers + scheme factories wired for openai, anthropic,
ollama, ollama-cloud, foreman (google still stubbed, Phase 4);
WithHTTPClient registry option; required env-foreman TLS chat round-trip
test (LLM_FM=foreman://token@host → Parse("fm/qwen3:30b") → bearer
arrives, chat answers).
- ADR-0009 (multimodal), ADR-0010 (tools/structured mapping); README
matrix flipped to ✅ for the four landed provider families; ~70 new
hermetic tests across the three provider packages + media.
- Run note: openai/anthropic/media were built by three parallel
subagents against the frozen llm contract; ollama/foreman, chain wiring,
and registry integration done in the main line. All gates green.
**Next:** Phase 4 — Google provider on google.golang.org/genai.
## 2026-06-10 — Phase 2: health + failover chain, proven
**Landed:** the full deterministic failover test matrix over the fake
provider + fake clock (no sleeps, no network): single-transient recovery
via same-target retry; repeated transients bench + advance; cooldown expiry
re-admits and success resets; backoff doubling across bench rounds;
mixed chain with an inline-expanded alias element failing over through the
expanded targets; permanent-policy default (fail-fast on auth) and
`AdvanceOnPermanent` override; `TransientRetries` disabled/custom; retry
loop stops early when the tracker benches mid-request; exhaustion error
lists skipped-while-benched targets; custom classifier override;
chain-of-one gets identical semantics; HTTP 529 fails over. Implementation needed no
changes — Phase 1's executor held up.
**Next:** Phase 3 — OpenAI/Anthropic/Ollama/foreman REST clients + media
pipeline.
## 2026-06-10 — Phase 1: foundations, ADRs, skeleton, docs
**Landed:**
- Module scaffold (Go 1.26), `.gitea/workflows/ci.yaml` (foreman-style
gates: build, vet, race tests, tidy-diff), `.env.example`.
- `llm/` canonical contract: Message/Part (sealed; text+image),
Request/Options, Response/Usage/FinishReason, Stream/StreamEvent,
Tool/Toolbox (panic-safe Execute), Capabilities (zero-value semantics),
Model/Provider interfaces, APIError + transient/permanent Classify.
- `health/`: clock-injected tracker — consecutive-failure threshold,
exponential capped cooldown, reset-on-success, thread-safe; full
deterministic test suite (fake clock).
- Root: Registry (providers/aliases/schemes/health), Parse with the binding
grammar (verbatim model ids, inline recursive alias expansion, cycle
detection, dedup), LLM_* env-DSN loading (go-llm-parity lazy fallback +
eager LoadEnv/New scan), chain executor implementing Model
(retry-on-transient, bench-on-repeat, skip-benched, 404-advance,
fail-fast-on-auth, joined exhaustion errors). Built-ins register as
resolvable stubs until their phases land.
- `provider/fake/`: scriptable provider (per-model outcome queues, request
recording, capabilities overrides, streaming) — the hermetic test rig.
- ADRs 0001-0008 + index; CLAUDE.md; honest README with pending-marked
matrix.
- Tests cover the two required cases: the trailing-`thinking` chain parse
and `LLM_M1=foreman://token@host` loading (plus DSN table, lazy fallback,
cycle detection, chain failover/backoff/exhaustion, toolbox execution,
error classification).
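
The inline recursive alias expansion with cycle detection and dedup listed under the Registry item can be sketched as below. This is only an illustration under stated assumptions: the real binding grammar is not reproduced, `Expand` is a hypothetical function, and the alias and target names are made up.

```go
package main

import "fmt"

// Expand flattens a chain whose elements are either alias names or
// verbatim targets. Aliases expand in place and recursively; repeated
// targets keep only their first position; alias cycles are errors.
func Expand(aliases map[string][]string, chain []string) ([]string, error) {
	var out []string
	seen := map[string]bool{}   // targets already emitted
	active := map[string]bool{} // aliases on the current expansion path

	var walk func(elems []string) error
	walk = func(elems []string) error {
		for _, e := range elems {
			sub, isAlias := aliases[e]
			if !isAlias {
				if !seen[e] {
					seen[e] = true
					out = append(out, e)
				}
				continue
			}
			if active[e] {
				return fmt.Errorf("alias cycle through %q", e)
			}
			active[e] = true
			if err := walk(sub); err != nil {
				return err
			}
			delete(active, e) // reusing an alias later is fine; only cycles fail
		}
		return nil
	}

	err := walk(chain)
	return out, err
}

func main() {
	// Hypothetical aliases and targets.
	aliases := map[string][]string{
		"fast":  {"p1/model-a", "local"},
		"local": {"p2/model-b"},
	}
	fmt.Println(Expand(aliases, []string{"fast", "p3/model-c", "local"}))

	aliases["local"] = []string{"fast"} // introduce a cycle
	fmt.Println(Expand(aliases, []string{"fast"}))
}
```

Tracking only the aliases on the current path, rather than every alias ever visited, lets the same alias appear twice in a chain without being mistaken for a cycle; dedup then drops the repeated targets.
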
**Notes:** chain executor landed in Phase 1 (design was settled);
Phase 2 deepens its test matrix (cooldown re-admission via fake clock,
alias-in-chain failover, permanent-policy override) and wires anything the
tests flush out.
**Next:** Phase 2 — exhaustive health/chain test matrix.