feat: live-validated against Ollama Cloud; schema instruction fallback for cloud

Phase 8: all six live checks pass (tier aliases, thinking-tier chat, real
tool invocation, structured Generate[T], forced failover with bench+skip,
skill agent). Discovery: ollama.com ignores the format field — the
provider now also states the schema as a system instruction (constrained
decoding locally, instruction-guided JSON on cloud), with hermetic test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
2026-06-10 13:22:54 +02:00
parent 97513141dc
commit 04b21fdad2
5 changed files with 299 additions and 1 deletions
+27
View File
@@ -1,5 +1,32 @@
# progress
## 2026-06-10 — Phase 8: live validation against real Ollama Cloud
**All six checks PASS** (examples/live harness, OLLAMA_API_KEY from .env):
1. Tier aliases (`thinking` = minimax-m3:cloud→kimi-k2.6:cloud,
`workhorse` = minimax-m2.7:cloud→qwen3-coder:480b-cloud) resolve via
Parse, incl. as a trailing chain element.
2. Plain chat served by ollama-cloud/minimax-m3:cloud (189 in/48 out).
3. Live tool call: the workhorse agent actually invoked get_launch_code
and answered from its result in 2 steps.
4. Structured Generate[T] decoded {City:Tokyo Country:Japan
Population:14000000 Latitude:35.6762}.
5. Forced failover: an unreachable head (connection refused = transient)
was retried, benched, and fell through to a live cloud tail; the second
request skipped the benched head without dialing it.
6. Agent with the calc skill attached invoked calculate and answered
56161.
**Discovery + fix:** Ollama Cloud ignores the `format` field entirely
(verified with raw curl — markdown came back despite a schema). The
ollama provider now also states the schema as an explicit system
instruction (local stays constrained-decoded; cloud becomes
instruction-guided); hermetic test added. The `:cloud`-suffixed model
names work verbatim against ollama.com — mort's tier strings carry over
unchanged.
**Next:** Phase 9 — convert mort onto majordomo, open the PR.
## 2026-06-10 — Phase 7: examples, migration blueprint, README finalization
**Landed:** `examples/` — nine runnable programs, one per hard requirement