fix(agent): recover front-loaded answer when terminal turn is degenerate
CI / Tidy (pull_request) Successful in 9m31s
CI / Build & Test (pull_request) Successful in 10m14s
CI / Tidy (push) Successful in 9m26s
CI / Build & Test (push) Successful in 10m19s

The agent loop took the final answer only from the terminal (no-tool-call)
turn. Models that "front-load" their answer into an earlier turn that also
calls a tool — then close with a trivial pointer like "(Already answered
above.)" — had their real answer discarded and the pointer delivered. This
recurs across several open-weight models (glm-5.2, etc.); well-behaved models
(Claude/GPT) defer their answer to the terminal turn and are unaffected.

finalOutput() now falls back to the last substantive assistant content in the
transcript when the terminal text is weak (empty, or a short back-reference).
The predicate is narrow and back-reference-gated so short-but-correct answers
("42", "It's down, restarting now.") are never overridden; recovery only picks
a prior turn that reads like a real answer, not a preamble. Zero extra model
calls. Terminal-answer behavior for normal runs is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit was merged in pull request #1.
This commit is contained in:
2026-06-26 18:37:38 -04:00
parent 74474c6da0
commit 1fd7109a42
3 changed files with 283 additions and 2 deletions
+4 -2
View File
@@ -313,8 +313,10 @@ func (a *Agent) Run(ctx context.Context, input string, opts ...RunOption) (*Resu
step := Step{Index: stepIdx, Response: resp}
if len(resp.ToolCalls) == 0 {
// Final answer.
result.Output = resp.Text()
// Final answer. Usually this terminal turn's text; but if the model
// front-loaded its answer into an earlier tool-call turn and closed
// with a trivial pointer, recover that earlier content instead.
result.Output = finalOutput(msgs, resp.Text())
result.Steps = append(result.Steps, step)
result.Messages = msgs
a.notify(rc, step)