# progress

## 2026-06-10 — Phase 9b: mort converted, PR open

**Done.** mort fully re-based on majordomo on branch
`majordomo-conversion`: 230 files (+8726/−6211), go-llm/v2 and
go-agentkit removed from go.mod/go.sum with a clean repo-wide grep,
`go build`/`go vet` clean, full test suite green (80 packages ok, 0
failures). Highlights: pkg/logic/llms rebuilt as the choke point
(registry, lane decorators, convar tier resolver, failover wiring via
llms.Wire); skillexec/agentexec on majordomo agent loops (critic budget
via WithMaxStepsFunc, steer, compactor, tool-error guards); runDirect
special case deleted; scaddy critic redesigned as one-shot multimodal
Generate; agentkit httpapi replaced by a mort-side server; ~96 tools on
DefineTool. PR (open, not merged):
https://gitea.stevedudenhoeffer.com/steve/mort/pulls/1274

Run note: executed by an 8-agent staged workflow; one mid-run deadlock
(a cluster agent polling a long-tail package) was broken by converting
tasks/recipe/summary/cookbook in the main line; one full workflow restart
after a network outage.

## 2026-06-10 — Phase 9a: conversion-driven library extensions

**Landed (ADR-0014):** RegisterResolver (dynamic DB-backed tiers, static
aliases win, recursive + cycle-guarded), DefineTool[Args] (typed tools
over SchemaFor), Usage cache/reasoning detail fields populated by
anthropic/openai/google, WithPromptCaching (Anthropic top-level
cache_control), agent hooks (WithMaxStepsFunc; WithSteer; WithCompactor,
which is non-fatal on error and leaves the canonical transcript
uncompacted; WithToolErrorLimits with ErrToolLoop), health
Bench/Unbench/Snapshot, ChainConfig.Observer failover events
(attempt/bench/skip). Full hermetic coverage for each.

**Next:** Phase 9b — the mort conversion branch.

## 2026-06-10 — Phase 8: live validation against real Ollama Cloud

**All six checks PASS** (examples/live harness, OLLAMA_API_KEY from .env):
1. Tier aliases (`thinking` = minimax-m3:cloud→kimi-k2.6:cloud,
   `workhorse` = minimax-m2.7:cloud→qwen3-coder:480b-cloud) resolve via
   Parse, incl. as a trailing chain element.
2. Plain chat served by ollama-cloud/minimax-m3:cloud (189 in/48 out).
3. Live tool call: the workhorse agent actually invoked get_launch_code
   and answered from its result in 2 steps.
4. Structured Generate[T] decoded {City:Tokyo Country:Japan
   Population:14000000 Latitude:35.6762}.
5. Forced failover: an unreachable head (connection refused = transient)
   was retried, benched, and fell through to a live cloud tail; the second
   request skipped the benched head without dialing it.
6. Agent with the calc skill attached invoked calculate and answered
   56161.

**Discovery + fix:** Ollama Cloud ignores the `format` field entirely
(verified with raw curl — markdown came back despite a schema). The
ollama provider now also states the schema as an explicit system
instruction (local stays constrained-decoded; cloud becomes
instruction-guided); hermetic test added. The `:cloud`-suffixed model
names work verbatim against ollama.com — mort's tier strings carry over
unchanged.

**Next:** Phase 9 — convert mort onto majordomo, open the PR.

## 2026-06-10 — Phase 7: examples, migration blueprint, README finalization

**Landed:** `examples/` — nine runnable programs, one per hard requirement
(parse, failover incl. trailing-alias chains, custom tiers, LLM_* env
providers + foreman, multimodal, raw tool loop, structured Generate[T],
agent with toolbox, skills) + examples/README index; all built by the
hermetic gate suite. `docs/mort-migration.md` — the full conversion
blueprint: layering (what stays mort-side), the symbol-level core
mappings table, seven planned additive library extensions (dynamic
resolvers, DefineTool[Args], usage detail fields, prompt caching, agent
loop hooks, manual bench controls, failover observer), the Phase 9
execution order, and the behavioral deltas to verify (failover knob
mapping, AdvanceOnPermanent for go-llm's ErrRequestSpecific behavior,
bytes-only images). README final pass with the complete feature/provider
matrix.

**Next:** Phase 8 — live validation against real Ollama Cloud.

## 2026-06-10 — Phase 6: skills

**Landed:** `skill/` (ADR-0013): the agent.Skill contract satisfied by a
buildable skill.New(name, WithInstructions/WithTools/WithToolbox);
instruction-only skills legal; same-instance reuse across agents; additive
ordered composition proven (prompt appending + toolset merge + loud
duplicate policy). Example skills: `skill/clock` (time_now/time_convert,
injectable clock) and `skill/calc` (calculate over a hand-rolled
recursive-descent evaluator: + - * / % ^, parens, unary minus, scientific
notation; division-by-zero and non-finite results rejected). Tests cover
the evaluator table, tool execution through ExecuteTool, and a full
agent-loop run answering from the calculate result.

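The additive ordered composition above (prompt appending, toolset merge, loud duplicate policy) can be sketched like this. The `skill` struct, the `compose` function, and the instruction strings are hypothetical reductions for illustration; the real skill package works with tool values rather than bare names.

```go
package main

import (
	"fmt"
	"strings"
)

// skill is a hypothetical reduction of a skill: instructions plus tool names.
type skill struct {
	name         string
	instructions string
	tools        []string
}

// compose appends skill instructions to the base prompt in order and merges
// tools, failing loudly on a duplicate tool name instead of letting one skill
// silently shadow another.
func compose(base string, skills ...skill) (string, []string, error) {
	prompt := []string{base}
	owner := map[string]string{} // tool name -> skill that registered it
	var tools []string
	for _, s := range skills {
		if s.instructions != "" {
			prompt = append(prompt, s.instructions)
		}
		for _, t := range s.tools {
			if prev, dup := owner[t]; dup {
				return "", nil, fmt.Errorf("tool %q provided by both %q and %q", t, prev, s.name)
			}
			owner[t] = s.name
			tools = append(tools, t)
		}
	}
	return strings.Join(prompt, "\n\n"), tools, nil
}

func main() {
	clock := skill{"clock", "Use time_now for the current time.", []string{"time_now", "time_convert"}}
	calc := skill{"calc", "Use calculate for arithmetic.", []string{"calculate"}}
	prompt, tools, err := compose("You are a helpful assistant.", clock, calc)
	fmt.Printf("%q %v %v\n", prompt, tools, err)
}
```

An instruction-only skill simply passes an empty tool list, which matches the "instruction-only skills legal" rule in the entry.
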
**Next:** Phase 7 — examples/, mort migration map, README finalization.

## 2026-06-10 — Phase 5: agent loop, Generate[T], schema derivation

**Landed:** `agent/` (ADR-0012): New(model, system, opts) with toolboxes,
max steps (default 10), per-step request options, agent-level observers +
per-run OnStep, WithHistory continuation (Result.Messages round-trips),
sequential tool dispatch through panic-recovering ExecuteTool, unknown
tools → IsError results, duplicate tool names fail loudly, partial Result
preserved on ErrMaxSteps/model errors/cancellation. The agent.Skill
interface ships here (instructions + tools composition is tested with a
stub); the skill package with real implementations is Phase 6.
`llm.SchemaFor[T]` reflect-derived strict-compatible JSON schemas
(pointers→nullable anyOf, description/enum tags, maps/slices/time/RawMessage,
recursion rejected) and root `majordomo.Generate[T]` (schema injection,
fence-stripping decode, model-naming errors). 15 agent tests + schema +
Generate suites, all hermetic.

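The fence-stripping decode step of Generate[T] can be sketched as below. `stripFence`, `decode`, and the `city` struct are hypothetical names (the struct mirrors the value printed in the Phase 8 entry), and the sketch assumes the opening fence sits on its own line.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is three backticks, written with escapes so this listing stays fence-safe.
const fence = "\x60\x60\x60"

// stripFence removes one surrounding Markdown code fence, if present.
func stripFence(s string) string {
	s = strings.TrimSpace(s)
	if len(s) < 2*len(fence) || !strings.HasPrefix(s, fence) || !strings.HasSuffix(s, fence) {
		return s
	}
	s = s[len(fence) : len(s)-len(fence)]
	// Drop the rest of the opening line, which may carry an info string such as "json".
	if i := strings.IndexByte(s, '\n'); i >= 0 {
		s = s[i+1:]
	}
	return strings.TrimSpace(s)
}

// decode unmarshals a model reply into T after stripping any fence.
func decode[T any](reply string) (T, error) {
	var v T
	if err := json.Unmarshal([]byte(stripFence(reply)), &v); err != nil {
		return v, fmt.Errorf("decode %T: %w", v, err)
	}
	return v, nil
}

// city is an illustrative target type.
type city struct {
	City       string
	Country    string
	Population int
	Latitude   float64
}

func main() {
	reply := fence + "json\n" +
		`{"City":"Tokyo","Country":"Japan","Population":14000000,"Latitude":35.6762}` +
		"\n" + fence
	c, err := decode[city](reply)
	fmt.Printf("%+v %v\n", c, err)
}
```

Replies without a fence pass through `stripFence` untouched, so the same decode path serves providers that honor the schema natively.
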
**Next:** Phase 6 — skill package + two example skills.

## 2026-06-10 — Phase 4: Google provider (official genai SDK)

**Landed:** `provider/google` on google.golang.org/genai v1.59.0 (ADR-0011):
lazy cached client (construction never fails; missing key = synthetic 401
so chains fail over), assistant→model role mapping, FunctionResponse tool
results with output/error payloads, ParametersJsonSchema raw-schema tools,
ResponseJsonSchema structured output, ToolChoice→FunctionCallingConfig,
ReasoningEffort→ThinkingConfig.ThinkingLevel, usage includes thought
tokens, iter.Pull2-adapted streaming, genai.APIError→llm.APIError mapping.
Hermetic tests via HTTPOptions.BaseURL + httptest (SSE fixtures for
streaming). Registry: google + gemini schemes wired to the real provider;
the last stub machinery deleted — all six built-ins are now real clients.
README matrix: Google row fully ✅.

**Next:** Phase 5 — Agent run loop, Toolbox ergonomics, Generate[T].

## 2026-06-10 — Phase 3: REST providers (OpenAI, Anthropic, Ollama×3) + media

**Landed:**
- `provider/openai`: Chat Completions client for OpenAI and every
  OpenAI-compatible endpoint (tools with string-arguments mapping, strict
  SSE streaming incl. by-index tool-call assembly and the empty-choices
  usage chunk, response_format json_schema, max_completion_tokens with a
  WithLegacyMaxTokens compat option, reasoning_effort).
- `provider/anthropic`: Messages API client (anthropic-version 2023-06-01,
  required-max_tokens defaulting, tool_use/tool_result blocks with native
  is_error, GA structured output via output_config.format, full SSE event
  parser with input_json_delta buffering, 529-overloaded classified
  transient, usage sums cache tokens).
- `provider/ollama`: ONE native /api/chat client serving ollama (local,
  OLLAMA_HOST normalization), ollama-cloud (https://ollama.com + bearer
  OLLAMA_API_KEY), and foreman (base URL + bearer; tolerates its
  buffered-single-object "streaming"). Object tool arguments, tool_name
  results, format-schema structured output, think-level mapping, NDJSON
  streaming with 16MB lines.
- `media/`: normalization pipeline per ADR-0009 (magic-byte sniffing,
  box-filter downscale, transcode preference ladder, byte-budget quality
  ladder, webp passthrough-or-reject, copy-on-write, everything-unfittable
  wraps ErrUnsupported).
- Chain executor now normalizes media PER TARGET before each attempt and
  advances penalty-free past targets that can't take the request (proven:
  text-only head + vision fallback; per-target downscale assertions).
- Registry: real providers + scheme factories wired for openai, anthropic,
  ollama, ollama-cloud, foreman (google still stubbed, Phase 4);
  WithHTTPClient registry option; required env-foreman TLS chat round-trip
  test (LLM_FM=foreman://token@host → Parse("fm/qwen3:30b") → bearer
  arrives, chat answers).
- ADR-0009 (multimodal), ADR-0010 (tools/structured mapping); README
  matrix flipped to ✅ for the four landed provider families; ~70 new
  hermetic tests across the three provider packages + media.
- Run note: openai/anthropic/media were built by three parallel
  subagents against the frozen llm contract; ollama/foreman, chain wiring,
  and registry integration done in the main line. All gates green.

**Next:** Phase 4 — Google provider on google.golang.org/genai.

## 2026-06-10 — Phase 2: health + failover chain, proven

**Landed:** the full deterministic failover test matrix over the fake
provider + fake clock (no sleeps, no network): single-transient recovery
via same-target retry; repeated transients bench + advance; cooldown expiry
re-admits and success resets; backoff doubling across bench rounds;
mixed chain with an inline-expanded alias element failing over through the
expanded targets; permanent-policy default (fail-fast on auth) and
`AdvanceOnPermanent` override; `TransientRetries` disabled/custom; retry
loop stops early when the tracker benches mid-request; exhaustion error
lists skipped-while-benched targets; custom classifier override;
chain-of-one gets identical semantics; HTTP 529 fails over. Implementation
needed no changes — Phase 1's executor held up.

**Next:** Phase 3 — OpenAI/Anthropic/Ollama/foreman REST clients + media
pipeline.

## 2026-06-10 — Phase 1: foundations, ADRs, skeleton, docs

**Landed:**
- Module scaffold (Go 1.26), `.gitea/workflows/ci.yaml` (foreman-style
  gates: build, vet, race tests, tidy-diff), `.env.example`.
- `llm/` canonical contract: Message/Part (sealed; text+image),
  Request/Options, Response/Usage/FinishReason, Stream/StreamEvent,
  Tool/Toolbox (panic-safe Execute), Capabilities (zero-value semantics),
  Model/Provider interfaces, APIError + transient/permanent Classify.
- `health/`: clock-injected tracker — consecutive-failure threshold,
  exponential capped cooldown, reset-on-success, thread-safe; full
  deterministic test suite (fake clock).
- Root: Registry (providers/aliases/schemes/health), Parse with the binding
  grammar (verbatim model ids, inline recursive alias expansion, cycle
  detection, dedup), LLM_* env-DSN loading (go-llm-parity lazy fallback +
  eager LoadEnv/New scan), chain executor implementing Model
  (retry-on-transient, bench-on-repeat, skip-benched, 404-advance,
  fail-fast-on-auth, joined exhaustion errors). Built-ins register as
  resolvable stubs until their phases land.
- `provider/fake/`: scriptable provider (per-model outcome queues, request
  recording, capabilities overrides, streaming) — the hermetic test rig.
- ADRs 0001–0008 + index; CLAUDE.md; honest README with pending-marked
  matrix.
- Tests cover the two required cases: the trailing-`thinking` chain parse
  and `LLM_M1=foreman://token@host` loading (plus DSN table, lazy fallback,
  cycle detection, chain failover/backoff/exhaustion, toolbox execution,
  error classification).

**Notes:** chain executor landed in Phase 1 (design was settled);
Phase 2 deepens its test matrix (cooldown re-admission via fake clock,
alias-in-chain failover, permanent-policy override) and wires anything the
tests flush out.

**Next:** Phase 2 — exhaustive health/chain test matrix.