fix(agent): recover front-loaded answer when terminal turn is degenerate #1

Merged
steve merged 1 commits from fix/agent-front-loaded-final-output into main 2026-06-26 22:48:10 +00:00
Owner

Problem

The agent loop took the final answer only from the terminal (no-tool-call) turn (agent.go: result.Output = resp.Text()). Models that "front-load" their full answer into an earlier turn that also calls a tool — then close with a trivial pointer like "(Already answered above.)" — had their real answer discarded and the pointer delivered.

This is a recurring class of bug across open-weight models (observed with glm-5.2:cloud in the downstream mort bot: the model produced its complete answer alongside a cite call, then emitted "(Already answered above.)" as the terminal turn, and that pointer is what the user saw). Well-behaved models (Claude/GPT) defer their answer to the terminal turn and are unaffected.

Fix

New agent/finalize.go: finalOutput(msgs, terminal) returns the terminal text when it's substantive, else falls back to the last substantive assistant content already in the transcript. Guardrails keep it safe:

  • Weak-terminal predicate is narrow and back-reference-gated: empty/whitespace, or len ≤ 120 and matches a back-reference regex (already answered, see above, as I said, …). Short-but-correct answers ("42", "Yes.", "It's down, restarting now.") are never treated as weak.
  • Recovery only picks a prior turn that reads like a real answer (≥ 200 chars, or ≥ 80 and ≥ 3× the terminal and not a preamble like "Let me check…").
  • Zero extra model calls; normal terminal-answer behavior is unchanged.

Tests

agent/finalize_test.go — predicate + recovery table tests (including false-positive guards) and two end-to-end loop tests via the fake provider: the front-loaded-answer shape is recovered with no extra model call, and a healthy deferred answer is delivered verbatim. Full suite green (go test ./...).

🤖 Generated with Claude Code

## Problem The agent loop took the final answer **only** from the terminal (no-tool-call) turn (`agent.go`: `result.Output = resp.Text()`). Models that "front-load" their full answer into an *earlier* turn that also calls a tool — then close with a trivial pointer like `"(Already answered above.)"` — had their real answer discarded and the pointer delivered. This is a recurring class of bug across open-weight models (observed with `glm-5.2:cloud` in the downstream mort bot: the model produced its complete answer alongside a `cite` call, then emitted `"(Already answered above.)"` as the terminal turn, and that pointer is what the user saw). Well-behaved models (Claude/GPT) defer their answer to the terminal turn and are unaffected. ## Fix New `agent/finalize.go`: `finalOutput(msgs, terminal)` returns the terminal text when it's substantive, else falls back to the **last substantive assistant content** already in the transcript. Guardrails keep it safe: - **Weak-terminal predicate** is narrow and back-reference-gated: empty/whitespace, or `len ≤ 120` **and** matches a back-reference regex (`already answered`, `see above`, `as I said`, …). Short-but-correct answers (`"42"`, `"Yes."`, `"It's down, restarting now."`) are never treated as weak. - **Recovery** only picks a prior turn that reads like a real answer (`≥ 200` chars, or `≥ 80` and `≥ 3×` the terminal and not a preamble like `"Let me check…"`). - Zero extra model calls; normal terminal-answer behavior is unchanged. ## Tests `agent/finalize_test.go` — predicate + recovery table tests (including false-positive guards) and two end-to-end loop tests via the fake provider: the front-loaded-answer shape is recovered with no extra model call, and a healthy deferred answer is delivered verbatim. Full suite green (`go test ./...`). 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 1 commit 2026-06-26 22:37:59 +00:00
fix(agent): recover front-loaded answer when terminal turn is degenerate
CI / Tidy (pull_request) Successful in 9m31s
CI / Build & Test (pull_request) Successful in 10m14s
CI / Tidy (push) Successful in 9m26s
CI / Build & Test (push) Successful in 10m19s
1fd7109a42
The agent loop took the final answer only from the terminal (no-tool-call)
turn. Models that "front-load" their answer into an earlier turn that also
calls a tool — then close with a trivial pointer like "(Already answered
above.)" — had their real answer discarded and the pointer delivered. This
recurs across several open-weight models (glm-5.2, etc.); well-behaved models
(Claude/GPT) defer their answer to the terminal turn and are unaffected.

finalOutput() now falls back to the last substantive assistant content in the
transcript when the terminal text is weak (empty, or a short back-reference).
The predicate is narrow and back-reference-gated so short-but-correct answers
("42", "It's down, restarting now.") are never overridden; recovery only picks
a prior turn that reads like a real answer, not a preamble. Zero extra model
calls. Terminal-answer behavior for normal runs is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve merged commit 1fd7109a42 into main 2026-06-26 22:48:10 +00:00
steve deleted branch fix/agent-front-loaded-final-output 2026-06-26 22:48:10 +00:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/majordomo#1