fix: address verified gadfly P2 findings (9 real of 18)
Independently verified all 18 gadfly findings against the code (18-agent
fan-out). Fixed the 9 real ones; the other 9 were false-positive /
hallucinated / valid-tradeoff (no change).
High:
- F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the
agent loop and nil-panicked. Now a clean error (Run never panics). +test.
- F9 compactor data-leak: renderTranscript sent tool-call args verbatim to
the summarizer (a possibly-different provider/tier); secret-bearing tool
args (mcp_call/email_send/http_*/webhook_*) are now redacted, with a doc
note that result bodies still flow (summary needs them).
Medium/minor:
- F2 compactor error path returned the folded slice, not the original msgs
(contradicting the documented non-fatal contract) -> return msgs.
- F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) /
cancelled (Canceled) via statusFor. +test.
- F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs
min(calls,results) only.
- F5 SetIteration was never called -> RunState.Iteration always 0; the step
observer now updates it each loop.
- F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue).
- F7 estimateTokens had no default arm (future Part kinds counted as 0);
unknown parts now counted conservatively.
- F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now
a clear "response exceeded N bytes" via readCapped.
- F12 step observer captured the caller ctx; now the merged runCtx.
- F13 compaction onFire was nil (doc claimed it logged); now wired to
audit LogEvent("compaction_fired").
- F11 (no pre-dispatch hook in majordomo) documented honestly as a known
limitation; F18 UsageSink doc clarified cache tokens are subsets of input.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
+24
-12
@@ -1,11 +1,10 @@
|
||||
package run
|
||||
|
||||
// steps.go — the per-run step emitter and the tool→step presentation
|
||||
// mapping. This is the single place that turns the executor's two loop
|
||||
// chokepoints (the pre-dispatch tool hook + the post-step observer in
|
||||
// executor.go) into ordered tool.Step records: one per tool call,
|
||||
// each with a stable id, an open-vocabulary kind, and a human
|
||||
// present-tense summary that flips running→complete/error.
|
||||
// mapping. This is the single place that turns the executor's post-step
|
||||
// observer into ordered tool.Step records: one per tool call, each with a
|
||||
// stable id, an open-vocabulary kind, and a human present-tense summary
|
||||
// that flips running→complete/error.
|
||||
//
|
||||
// One source feeds two consumers (mirroring the OnEvent/OnToolEvent/
|
||||
// PostRunResult pattern): the live tool.Invocation.OnStep callback
|
||||
@@ -13,6 +12,16 @@ package run
|
||||
// Because the Result accumulation does not depend on OnStep being set,
|
||||
// every surface — chat (JSON + SSE), Discord, cron, sub-agents — carries
|
||||
// steps; OnStep is needed only for live streaming.
|
||||
//
|
||||
// LIMITATION (current): majordomo exposes only a POST-step observer, so
|
||||
// the executor calls toolStart+toolEnd back-to-back after each tool has
|
||||
// already run. Steps are therefore recorded faithfully, but step.StartedAt
|
||||
// ≈ EndedAt and the intermediate "running" phase is never observable to a
|
||||
// live OnStep consumer. A pre-dispatch hook (wrapping each tool's handler
|
||||
// to emit toolStart before execution, like mort's state-react decorator)
|
||||
// is a follow-up that would restore real start timing + the running phase.
|
||||
// The emitter already supports that two-call shape — toolStart and toolEnd
|
||||
// are separate methods — so wiring it later is additive.
|
||||
|
||||
import (
|
||||
"context"
|
||||
@@ -35,8 +44,8 @@ const stepSummaryMaxLen = 200
|
||||
// stepEmitter accumulates ordered steps for one run and fires the live
|
||||
// OnStep callback.
|
||||
//
|
||||
// Concurrency: touched ONLY from the agent-loop goroutine. Both call
|
||||
// sites (the hookToolbox `before` closure and the stepObserver) run
|
||||
// Concurrency: touched ONLY from the agent-loop goroutine — the executor's
|
||||
// stepObserver (and, once a pre-dispatch hook is wired, that hook) run
|
||||
// there; majordomo executes a step's tool calls sequentially, and
|
||||
// sub-agents build their own Invocation so they never reach this
|
||||
// emitter. Same single-goroutine contract as the audit Writer and the
|
||||
@@ -46,7 +55,7 @@ type stepEmitter struct {
|
||||
now func() time.Time
|
||||
|
||||
seq int
|
||||
steps []tool.Step // ordered; the snapshot for Result.Steps
|
||||
steps []tool.Step // ordered; the snapshot for Result.Steps
|
||||
byID map[string]int // step id -> index into steps
|
||||
pending map[string][]string // correlation key -> queued running ids (FIFO)
|
||||
}
|
||||
@@ -126,9 +135,12 @@ func (e *stepEmitter) newStep(name string, args json.RawMessage) tool.Step {
|
||||
return step
|
||||
}
|
||||
|
||||
// matchPending pops the oldest running step id for (name, args). Falls
|
||||
// back to the most recent still-running step of the same tool name when
|
||||
// the args don't byte-match between start and end. Returns "" on no match.
|
||||
// matchPending pops the oldest running step id for (name, args). Falls back to
|
||||
// the OLDEST still-running step of the same tool name when the args don't
|
||||
// byte-match between start and end (e.g. JSON key reordering). FIFO on the
|
||||
// fallback too, consistent with the per-key queue pop above — closing the
|
||||
// oldest avoids mis-correlating concurrent same-named calls. Returns "" on no
|
||||
// match.
|
||||
func (e *stepEmitter) matchPending(name string, args json.RawMessage) string {
|
||||
key := corrKey(name, args)
|
||||
if q := e.pending[key]; len(q) > 0 {
|
||||
@@ -140,7 +152,7 @@ func (e *stepEmitter) matchPending(name string, args json.RawMessage) string {
|
||||
}
|
||||
return id
|
||||
}
|
||||
for i := len(e.steps) - 1; i >= 0; i-- {
|
||||
for i := 0; i < len(e.steps); i++ {
|
||||
if e.steps[i].Title == name && e.steps[i].Status == "running" {
|
||||
return e.steps[i].ID
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user