steve 04b21fdad2 feat: live-validated against Ollama Cloud; schema instruction fallback for cloud
Phase 8: all six live checks pass (tier aliases, thinking-tier chat, real
tool invocation, structured Generate[T], forced failover with bench+skip,
skill agent). Discovery: ollama.com ignores the format field — the
provider now also states the schema as a system instruction (constrained
decoding locally, instruction-guided JSON on cloud), with hermetic test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:22:54 +02:00


progress

2026-06-10 — Phase 8: live validation against real Ollama Cloud

All six checks PASS (examples/live harness, OLLAMA_API_KEY from .env):

  1. Tier aliases (thinking = minimax-m3:cloud→kimi-k2.6:cloud, workhorse = minimax-m2.7:cloud→qwen3-coder:480b-cloud) resolve via Parse, incl. as a trailing chain element.
  2. Plain chat served by ollama-cloud/minimax-m3:cloud (189 in/48 out).
  3. Live tool call: the workhorse agent actually invoked get_launch_code and answered from its result in 2 steps.
  4. Structured Generate[T] decoded {City:Tokyo Country:Japan Population:14000000 Latitude:35.6762}.
  5. Forced failover: an unreachable head (connection refused = transient) was retried, benched, and fell through to a live cloud tail; the second request skipped the benched head without dialing it.
  6. Agent with the calc skill attached invoked calculate and answered 56161.

Discovery + fix: Ollama Cloud ignores the format field entirely (verified with raw curl — markdown came back despite a schema). The ollama provider now also states the schema as an explicit system instruction (local stays constrained-decoded; cloud becomes instruction-guided); hermetic test added. The :cloud-suffixed model names work verbatim against ollama.com — mort's tier strings carry over unchanged.

Next: Phase 9 — convert mort onto majordomo, open the PR.

2026-06-10 — Phase 7: examples, migration blueprint, README finalization

Landed:

  • examples/: nine runnable programs, one per hard requirement (parse, failover incl. trailing-alias chains, custom tiers, LLM_* env providers + foreman, multimodal, raw tool loop, structured Generate[T], agent with toolbox, skills), plus an examples/README index; all are built by the hermetic gate suite.
  • docs/mort-migration.md: the full conversion blueprint, covering layering (what stays mort-side), the symbol-level core mappings table, seven planned additive library extensions (dynamic resolvers, DefineTool[Args], usage detail fields, prompt caching, agent loop hooks, manual bench controls, failover observer), the Phase 9 execution order, and the behavioral deltas to verify (failover knob mapping, AdvanceOnPermanent for go-llm's ErrRequestSpecific behavior, bytes-only images).
  • README: final pass with the complete feature/provider matrix.

Next: Phase 8 — live validation against real Ollama Cloud.

2026-06-10 — Phase 6: skills

Landed: skill/ (ADR-0013): the agent.Skill contract satisfied by a buildable skill.New(name, WithInstructions/WithTools/WithToolbox); instruction-only skills legal; same-instance reuse across agents; additive ordered composition proven (prompt appending + toolset merge + loud duplicate policy). Example skills: skill/clock (time_now/time_convert, injectable clock) and skill/calc (calculate over a hand-rolled recursive-descent evaluator: + - * / % ^, parens, unary minus, scientific notation; division-by-zero and non-finite results rejected). Tests cover the evaluator table, tool execution through ExecuteTool, and a full agent-loop run answering from the calculate result.

Next: Phase 7 — examples/, mort migration map, README finalization.

2026-06-10 — Phase 5: agent loop, Generate[T], schema derivation

Landed:

  • agent/ (ADR-0012): New(model, system, opts) with toolboxes, max steps (default 10), per-step request options, agent-level observers + per-run OnStep, WithHistory continuation (Result.Messages round-trips), sequential tool dispatch through the panic-recovering ExecuteTool, unknown tools → IsError results, loud failure on duplicate tool names, and a partial Result preserved on ErrMaxSteps, model errors, and cancellation.
  • The agent.Skill interface ships here (instructions + tools composition is tested with a stub); the skill package with real implementations is Phase 6.
  • llm.SchemaFor[T]: reflection-derived, strict-compatible JSON schemas (pointers → nullable anyOf, description/enum tags, maps/slices/time/RawMessage; recursive types rejected).
  • Root majordomo.Generate[T]: schema injection, fence-stripping decode, and errors that name the model.
  • 15 agent tests plus the schema and Generate suites, all hermetic.

Next: Phase 6 — skill package + two example skills.

2026-06-10 — Phase 4: Google provider (official genai SDK)

Landed: provider/google on google.golang.org/genai v1.59.0 (ADR-0011): lazy cached client (construction never fails; missing key = synthetic 401 so chains fail over), assistant→model role mapping, FunctionResponse tool results with output/error payloads, ParametersJsonSchema raw-schema tools, ResponseJsonSchema structured output, ToolChoice→FunctionCallingConfig, ReasoningEffort→ThinkingConfig.ThinkingLevel, usage includes thought tokens, iter.Pull2-adapted streaming, genai.APIError→llm.APIError mapping. Hermetic tests via HTTPOptions.BaseURL + httptest (SSE fixtures for streaming). Registry: google + gemini schemes wired to the real provider; the last stub machinery deleted, so all six built-ins are now real clients. README matrix: Google row fully marked as supported.

Next: Phase 5 — Agent run loop, Toolbox ergonomics, Generate[T].

2026-06-10 — Phase 3: REST providers (OpenAI, Anthropic, Ollama×3) + media

Landed:

  • provider/openai: Chat Completions client for OpenAI and every OpenAI-compatible endpoint (tools with string-arguments mapping, strict SSE streaming incl. by-index tool-call assembly and the empty-choices usage chunk, response_format json_schema, max_completion_tokens with a WithLegacyMaxTokens compat option, reasoning_effort).
  • provider/anthropic: Messages API client (anthropic-version 2023-06-01, required-max_tokens defaulting, tool_use/tool_result blocks with native is_error, GA structured output via output_config.format, full SSE event parser with input_json_delta buffering, 529-overloaded classified transient, usage sums cache tokens).
  • provider/ollama: ONE native /api/chat client serving ollama (local, OLLAMA_HOST normalization), ollama-cloud (https://ollama.com + bearer OLLAMA_API_KEY), and foreman (base URL + bearer; tolerates its buffered-single-object "streaming"). Object tool arguments, tool_name results, format-schema structured output, think-level mapping, NDJSON streaming with 16MB lines.
  • media/: normalization pipeline per ADR-0009 (magic-byte sniffing, box-filter downscale, transcode preference ladder, byte-budget quality ladder, webp passthrough-or-reject, copy-on-write, everything-unfittable wraps ErrUnsupported).
  • Chain executor now normalizes media PER TARGET before each attempt and advances penalty-free past targets that can't take the request (proven: text-only head + vision fallback; per-target downscale assertions).
  • Registry: real providers + scheme factories wired for openai, anthropic, ollama, ollama-cloud, foreman (google still stubbed, Phase 4); WithHTTPClient registry option; required env-foreman TLS chat round-trip test (LLM_FM=foreman://token@host → Parse("fm/qwen3:30b") → bearer arrives, chat answers).
  • ADR-0009 (multimodal), ADR-0010 (tools/structured mapping); README matrix flipped to supported for the four landed provider families; ~70 new hermetic tests across the three provider packages + media.
  • Run note: openai/anthropic/media were built by three parallel subagents against the frozen llm contract; ollama/foreman, chain wiring, and registry integration done in the main line. All gates green.

Next: Phase 4 — Google provider on google.golang.org/genai.

2026-06-10 — Phase 2: health + failover chain, proven

Landed: the full deterministic failover test matrix over the fake provider + fake clock (no sleeps, no network): single-transient recovery via same-target retry; repeated transients bench + advance; cooldown expiry re-admits and success resets; backoff doubling across bench rounds; mixed chain with an inline-expanded alias element failing over through the expanded targets; permanent-policy default (fail-fast on auth) and AdvanceOnPermanent override; TransientRetries disabled/custom; retry loop stops early when the tracker benches mid-request; exhaustion error lists skipped-while-benched targets; custom classifier override; chain-of-one gets identical semantics; HTTP 529 fails over. Implementation needed no changes: Phase 1's executor held up.

Next: Phase 3 — OpenAI/Anthropic/Ollama/foreman REST clients + media pipeline.

2026-06-10 — Phase 1: foundations, ADRs, skeleton, docs

Landed:

  • Module scaffold (Go 1.26), .gitea/workflows/ci.yaml (foreman-style gates: build, vet, race tests, tidy-diff), .env.example.
  • llm/ canonical contract: Message/Part (sealed; text+image), Request/Options, Response/Usage/FinishReason, Stream/StreamEvent, Tool/Toolbox (panic-safe Execute), Capabilities (zero-value semantics), Model/Provider interfaces, APIError + transient/permanent Classify.
  • health/: clock-injected tracker — consecutive-failure threshold, exponential capped cooldown, reset-on-success, thread-safe; full deterministic test suite (fake clock).
  • Root: Registry (providers/aliases/schemes/health), Parse with the binding grammar (verbatim model ids, inline recursive alias expansion, cycle detection, dedup), LLM_* env-DSN loading (go-llm-parity lazy fallback + eager LoadEnv/New scan), chain executor implementing Model (retry-on-transient, bench-on-repeat, skip-benched, 404-advance, fail-fast-on-auth, joined exhaustion errors). Built-ins register as resolvable stubs until their phases land.
  • provider/fake/: scriptable provider (per-model outcome queues, request recording, capabilities overrides, streaming) — the hermetic test rig.
  • ADRs 0001-0008 + index; CLAUDE.md; honest README with pending-marked matrix.
  • Tests cover the two required cases: the trailing-thinking chain parse and LLM_M1=foreman://token@host loading (plus DSN table, lazy fallback, cycle detection, chain failover/backoff/exhaustion, toolbox execution, error classification).

Notes: chain executor landed in Phase 1 (design was settled); Phase 2 deepens its test matrix (cooldown re-admission via fake clock, alias-in-chain failover, permanent-policy override) and wires anything the tests flush out.
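The policy knobs exercised here reduce to a small decision function taken after each failed attempt:

- same-target retry on transient errors;
- 404-advance;
- fail-fast on auth;
- the AdvanceOnPermanent override.

The sketch assumes an HTTP-status classifier in which status 0 marks a transport error such as "connection refused". `Classify`, `Decide`, `Policy` and the exact transient status set (0, 408, 429, 5xx) are illustrative, not majordomo's API. Benching is left to the health tracker.

```go
package main

import "fmt"

// Class is a coarse error class (hypothetical names).
type Class int

const (
	Transient Class = iota // transport errors, 408, 429, 5xx including 529 "overloaded"
	NotFound               // 404: this target cannot serve the model, try the next one
	Permanent              // auth (401/403) and other request errors
)

// Classify maps an HTTP status to a Class; status 0 stands for a transport
// error. The exact status set is an assumption.
func Classify(status int) Class {
	switch {
	case status == 0 || status == 408 || status == 429 || status >= 500:
		return Transient
	case status == 404:
		return NotFound
	default:
		return Permanent
	}
}

type Action int

const (
	Retry   Action = iota // same target again
	Advance               // next target in the chain
	Fail                  // stop and return the error
)

func (a Action) String() string { return [...]string{"retry", "advance", "fail"}[a] }

// Policy holds the two knobs discussed in the log.
type Policy struct {
	TransientRetries   int  // same-target retries allowed on transient errors (0 disables)
	AdvanceOnPermanent bool // advance past permanent errors instead of failing fast
}

// Decide chooses the next step after a failed attempt; retriesUsed counts
// same-target retries already spent on this target.
func (p Policy) Decide(c Class, retriesUsed int) Action {
	switch {
	case c == Transient && retriesUsed < p.TransientRetries:
		return Retry
	case c == Transient, c == NotFound, p.AdvanceOnPermanent:
		return Advance
	default:
		return Fail
	}
}

func main() {
	p := Policy{TransientRetries: 1}
	for _, status := range []int{0, 529, 404, 401} {
		fmt.Println(status, p.Decide(Classify(status), 0), p.Decide(Classify(status), 1))
	}
}
```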

Next: Phase 2 — exhaustive health/chain test matrix.