The agent loop took the final answer only from the terminal (no-tool-call)
turn. Models that "front-load" their answer into an earlier turn that also
calls a tool — then close with a trivial pointer like "(Already answered
above.)" — had their real answer discarded and the pointer delivered. This
recurs across several open-weight models (glm-5.2, etc.); well-behaved models
(Claude/GPT) defer their answer to the terminal turn and are unaffected.
finalOutput() now falls back to the last substantive assistant content in the
transcript when the terminal text is weak (empty, or a short back-reference).
The predicate is narrow and back-reference-gated so short-but-correct answers
("42", "It's down, restarting now.") are never overridden; recovery only picks
a prior turn that reads like a real answer, not a preamble. Zero extra model
calls. Terminal-answer behavior for normal runs is unchanged.
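The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the actual finalOutput() code: the Turn type, the back-reference phrase list, the 80-character cutoff, and the looksLikeAnswer heuristic are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Turn is a hypothetical, simplified assistant turn in the transcript.
type Turn struct {
	Text      string // assistant text content of the turn
	ToolCalls int    // number of tool calls issued in the turn
}

// backRefs are illustrative phrases that mark a terminal text as a pointer
// to an earlier answer. The real predicate may use a different list.
var backRefs = []string{"already answered", "see above", "as stated above"}

// isWeakTerminal reports whether the terminal text is empty or a short
// back-reference. A short text without a back-reference is NOT weak.
func isWeakTerminal(text string) bool {
	t := strings.ToLower(strings.TrimSpace(text))
	if t == "" {
		return true
	}
	if len(t) > 80 { // assumed cutoff for "short"
		return false
	}
	for _, p := range backRefs {
		if strings.Contains(t, p) {
			return true
		}
	}
	return false
}

// looksLikeAnswer is a crude stand-in for "reads like a real answer, not a
// preamble": long enough and not ending in a colon (assumed heuristic).
func looksLikeAnswer(text string) bool {
	t := strings.TrimSpace(text)
	return len(t) >= 40 && !strings.HasSuffix(t, ":")
}

// finalOutput returns the terminal text unless it is weak, in which case it
// walks backwards for the last earlier turn that looks like an answer.
func finalOutput(turns []Turn) string {
	if len(turns) == 0 {
		return ""
	}
	last := turns[len(turns)-1].Text
	if !isWeakTerminal(last) {
		return last
	}
	for i := len(turns) - 2; i >= 0; i-- {
		if looksLikeAnswer(turns[i].Text) {
			return turns[i].Text
		}
	}
	return last // nothing better found; keep the terminal text
}

func main() {
	turns := []Turn{
		{Text: "The disk on db-1 is full; I cleared /tmp and restarted the service.", ToolCalls: 1},
		{Text: "(Already answered above.)"},
	}
	fmt.Println(finalOutput(turns))
}
```

Because the fallback only inspects the transcript already in memory, it needs no further model calls, and a terminal text such as "42" passes through untouched since it contains no back-reference.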
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A failover chain previously treated a successful-but-empty completion (no
content parts and no tool calls — a "stop with nothing") as a valid result
and returned it. The agent loop then ended the run with empty output, and
the configured backup models were never tried because no error was raised.
This let a single flaky model silently terminate an agent/skill run with
no answer (observed in the wild with ollama-cloud/glm-5.2 returning empty
completions right after a large tool/think turn).
- Add llm.ErrEmptyResponse (classified transient) and Response.IsEmpty():
true only when there are no tool calls and no meaningful content (no
parts, or whitespace-only text). A media/image part counts as content,
so image-only responses are NOT empty.
- chain.Generate converts an empty completion into ErrEmptyResponse so the
chain fails over to the next target. Unlike an ordinary transient error, it
is NOT retried on the same target (the model just produced it, and these
calls are expensive). Instead, the chain penalizes the target's health (so a
persistently empty target is benched) and advances immediately.
- When every target returns empty, the call fails with ErrChainExhausted
joined to ErrEmptyResponse: a visible error instead of a hollow success.
Single-element chains therefore also surface empties as errors.
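The emptiness rule from the first bullet can be sketched as below. The Part and Response types here are simplified stand-ins invented for the example (the real llm types are not shown in this message); only the rule itself, no tool calls and no meaningful content, comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Part is a hypothetical content part: either text or media bytes.
type Part struct {
	Text  string
	Media []byte
}

// Response is a hypothetical, simplified completion result.
type Response struct {
	Parts     []Part
	ToolCalls []string
}

// IsEmpty reports true only when there are no tool calls and no meaningful
// content: no parts at all, or only whitespace-only text parts. A media
// part counts as content, so an image-only response is not empty.
func (r Response) IsEmpty() bool {
	if len(r.ToolCalls) > 0 {
		return false
	}
	for _, p := range r.Parts {
		if len(p.Media) > 0 {
			return false
		}
		if strings.TrimSpace(p.Text) != "" {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(Response{}.IsEmpty())
	fmt.Println(Response{Parts: []Part{{Text: " \n\t"}}}.IsEmpty())
	fmt.Println(Response{Parts: []Part{{Media: []byte{0x89, 0x50}}}}.IsEmpty())
	fmt.Println(Response{ToolCalls: []string{"search"}}.IsEmpty())
}
```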
The stream path is unchanged (content cannot be inspected before the consumer
reads it). Tests: Response.IsEmpty table; chain fails over past an empty head;
all-empty chain returns ErrChainExhausted/ErrEmptyResponse; repeated
empties bench the target across requests. Full suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- README + CLAUDE.md: upfront "this is a vibe-coded project" disclosure for
going public.
- Replace internal LAN hostnames (*.orgrimmar.dudenhoeffer.casa) with
example.com across README, ADR-0004, the envproviders example, and env_test.go
(assertions updated together; suite still green). Token was already a
"change-me" placeholder, not a real secret.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 8: all six live checks pass (tier aliases, thinking-tier chat, real
tool invocation, structured Generate[T], forced failover with bench+skip,
skill agent). Discovery: ollama.com ignores the format field, so the
provider now also states the schema as a system instruction (constrained
decoding locally, instruction-guided JSON on cloud), covered by a hermetic
test.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Groundwork for the provider phase: reasoning levels map to native knobs
(OpenAI reasoning_effort, Ollama think); ErrUnsupported marks declared
capability mismatches that chains advance past without health penalty.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>