run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status) #11
Reference in New Issue
Block a user
Delete Branch "critic-parity-recordstep-kill"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Completes the run-critic seam so a host adapter (mort's
agentcritic) has full fidelity — closing the two limitations gadfly surfaced on mort #1334.Changes
RecordStep(iter int, resp *llm.Response)— the completed step's model response is now passed to the critic (was index-only), so a host that keeps a trace (mort'sProgressRecorder) can show what the agent produced, not just an iteration count. The executor forwardss.Response; the battery ignores it (itsProgressis count-based).CriticHandle.KillCause() error+run.ErrCriticKill— the executor now distinguishes an explicit critic KILL from a natural backstop timeout.runCtxuses a cause-carrying cancel (WithCancelCause+ aMaxRuntimetimer cancelling withDeadlineExceeded); the deadline-watch cancels withErrCriticKillwhenKillCause()!=nil, elseDeadlineExceeded.statusForreadscontext.Cause→ killed / timeout / cancelled are now distinct (were all collapsed tocancelled). The battery setskillCausefromDecision.KillReason.Tests
statusFor"killed" case (cause=ErrCriticKill, err=Canceled); fake handle + batteryRecordStep/KillCausesignatures updated. Core stays battery-free; full suite green.Pairs with a mort adapter PR that uses the richer interface (feed the recorder the full step; return the kill error from
KillCause). Shipping in executus v0.1.0.🤖 Generated with Claude Code
🪰 Gadfly review —
m5/qwen3.6:35b-mlx(m5)Verdict: No material issues found · ⚠️ 2/3 lens(es) errored — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found
Security lens review:
The changes are primarily structural (interface evolution + context cancellation semantics). I verified the following:
KillReasonfrom untrusted host input into error messages:critic/critic.go:283—h.killCause = errors.New(reason)usesd.KillReasonverbatim, with no validation, sanitization, or length limit. This string originates from the host-providedOnSoftTimeoutcallback, which could be attacker-controlled (misconfigured host) or contain sensitive system details. If error messages are logged, this creates a log injection / information disclosure vector. A plausible mitigation is truncating or at least validatingKillReasonbefore wrapping in an error (e.g., strip newlines, cap length).LLM-sourced
resp *llm.Responsethrough theRecordStepinterface: The response is silently ignored (_), which is safe for now. However, the comment explicitly notes "a richer Escalator could record it" — any future host implementation that stores or logs this payload from untrusted model output without redaction would inherit an information disclosure risk. This isn't a bug in this PR (the nil-ignoring path is correct), but the interface design invites misuse without guardrails.All other changes —
RecordStepsignature,KillCausemethod,context.WithCancelCausemigration,statusFordistinction logic — are sound within this lens:statusForcorrectly useserrors.Is(context.Cause(runCtx), ErrCriticKill)to check the sentinel, not equality (preventing spoofing).MergeCancellation+cancelCauseinteraction is consistent; no new attack surface.The
KillReasoninjection point (#1 above) is the only concrete security finding, and it's a low-severity pattern common in Go error wrapping — worth sanitizing but not a blocking issue.🎯 Correctness — ⚠️ could not complete
⚠️ This reviewer failed to complete: agent: step 6: context deadline exceeded
🧯 Error handling & edge cases — ⚠️ could not complete
⚠️ This reviewer failed to complete: agent: step 7: context deadline exceeded
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 22m 26s
🪰 Gadfly review —
minimax-m3:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
🔒 Security Review
VERDICT: No material issues found
Verified findings:
run/ports.go:17—ErrCriticKillis a package-private sentinel constructed viaerrors.Newwith a static string; no untrusted input flows into it. Safe.run/critic.go:67— The kill-reason string from the critic'sKillCause()is wrapped viafmt.Errorf("%w: %s", ErrCriticKill, cause.Error()).causeoriginates from the in-process critic (critic/critic.go:283, which buildserrors.New(reason)fromd.KillReasonset by a hostEscalator). It's host-controlled but stays in-process (no SQL/command/path/shell sinks, no template rendering, no SSRF target). The%sverb with a non-error value is fine here sincecause.Error()returns astring. No injection vector.critic/critic.go:283—killCause = errors.New(reason)wherereasonis aDecision.KillReasonstring from the host'sEscalator. Host-controlled but trusted (same trust domain as the rest of the critic battery); no untrusted-data sinks. Safe.run/executor.go:306-308—statusForreadscontext.Cause(runCtx).runCtxiscontext.WithoutCancel(ctx)wrapped inWithCancelCausethenMergeCancellation.MergeCancellation(run/runengine.go:69) propagatescontext.Cause(secondary)— i.e. the caller's cancellation cause. A caller stampingcontext.Canceledas the cause is correctly reported as "cancelled"; a caller stamping something exotic flows intoerrors.Ischecks againstErrCriticKill/DeadlineExceeded/Canceled. No bypass of the cause classification — only the three intended sentinels map to terminal statuses, anything else falls through to "error". Safe.run/executor.go:205—time.AfterFunc(maxRuntime, ...)cancels with the staticcontext.DeadlineExceededsentinel. No untrusted input. Safe.CriticHandle.RecordStepnow carries*llm.Response. The critic battery (critic/critic.go:172) discards it (_ *llm.Response); the executor forwards it from majordomo's step observer (run/executor.go:230).*llm.Responseis the same struct already passed torec.OnStep(audit) without sanitisation — no new taint surface introduced. A host critic now sees the model's response, but trust is unchanged (the host already trusted the model layer to invokeRecordStep). No new attack surface.🎯 Correctness — No material issues found
VERDICT: No material issues found
I traced the full correctness path through the change (context plumbing, cause propagation, ordering of
statusForcases, defer ordering) and could not surface a material bug.Verified by reading:
run/executor.go:203-215,302-315—WithCancelCause+time.AfterFunc(maxRuntime, …), defer order (stopCritic → mergeCancel → runTimer.Stop → cancelCause(nil)),statusForordering (ErrCriticKill → DeadlineExceeded → Canceled → error).run/critic.go:34-86— watch goroutine uses the mergedrunCtx;cancelCauseis the WithCancelCause cancel; cause is set toErrCriticKill(wrapped) orcontext.DeadlineExceededbased onKillCause().critic/critic.go:211-215,277-286—KillCause()is mutex-guarded;killCauseis set fromDecision.KillReason(default"critic killed the run").run/runengine.go:55-73—MergeCancellationcascades cause from parent (WithCancelCause) into the merged child via context-tree propagation, which is whatstatusForrelies on.run/executor_test.go:148-174— every test case is consistent: for an un-cancelledbg,context.Cause(bg)==nil; forkillCtxcancelled withfmt.Errorf("%w: hung", ErrCriticKill),errors.Is(context.Cause(killCtx), ErrCriticKill)is true.Notes (verified, no defect):
context.Causereflects the first cancellation in its ancestry viaCancelCauseFunc. WhencancelCause(ErrCriticKill)is called, the merged child reportsErrCriticKill; whencancelCause(DeadlineExceeded)is called, it reportsDeadlineExceeded. Both land on the rightstatusForarm.KillCause()!=nilis sticky (killedflag in the critic handle prevents later un-kill), so the watch goroutine won't observe a "kill" then a "natural expiry" — kill wins, andcancelCause(ErrCriticKill)is idempotent against any laterMaxRuntimetimer firing.defer cancelCause(nil)runs afterstatusFor, so it can't influence the recorded status; clearing the cause on the way out is purely hygiene.MaxRuntime→AfterFuncreplacement preserves the priorWithTimeout → "timeout"semantics for the common case (no critic kill).No constants, formulas, units, or thresholds introduced by this change are arithmetically suspect. The
criticDeadlineCheck = 1sandMaxRuntime = 60sdefaults were pre-existing.🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
run/critic.go:48-75— panic-recover in the deadline-watch silently abandons monitoring. Thedefer func() { _ = recover() }()swallows any panic fromh.Deadline()/h.KillCause()and the goroutine then returns without logging.stopCritic's deferredh.Stop()still fires (no leak), but the hard-deadline backstop is silently lost for the rest of the run — onlyrunTimer(MaxRuntime) or caller-cancel will trip. Worth at minimum anslogwarning so a misbehaving hostCriticHandledoesn't degrade the run invisibly; contrastcritic/critic.go:228-236which does log on panic. Confirmed by reading the goroutine body and thewatchanalogue incritic/critic.go.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 38s
🪰 Gadfly review —
glm-5.1:cloud(ollama-cloud)Verdict: Minor issues · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
Reviewing through the security lens (authn/authz, injection, SSRF, deserialization, credential leakage, input validation, unsafe handling of untrusted data, insecure defaults):
Decision.KillReason→errors.New(reason)→KillCause()→fmt.Errorf("%w: %s", ErrCriticKill, …)→statusFor/stats.Error: The string originates from the host-providedEscalatorimplementation (a controlled, trusted seam — mort plugs its own critic-agent here). It is only ever stringified into error messages and audit fields; it is never interpolated into commands, queries, paths, or templates. No injection vector.ErrCriticKillexported sentinel: Identity-check viaerrors.Isworks correctly (errors.Newproduces a unique pointer). NoUnwraporIsmethod is needed or expected. Not a security concern.RecordStep(iter int, resp *llm.Response): The battery implementation discards the response (_ *llm.Response). No untrusted data is persisted or executed from this parameter. The*llm.Responseis produced by the executor's own model call, not by the caller — and even if it were attacker-influenced, the battery ignores it.Race between deadline-watch goroutine and
statusFor:KillCause()is mutex-protected; the cause is read beforecancelCauseis called, andstatusForreadscontext.Cause(runCtx)only afterag.Runreturns (meaningrunCtxis already done and the cause is settled). No TOCTOU issue.MergeCancellationpropagatescontext.Cause(secondary): When the caller cancels, the original context's cause (which could be nil) flows intorunCtx.statusForhandles this correctly — it falls through to thecontext.Canceledcheck. No status-confusion vulnerability.No credential/secret leakage, no SSRF, no path traversal, no unsafe deserialization: None of the changed code handles URLs, file paths, secrets, or deserialization.
The change is clean from a security perspective.
🎯 Correctness — ⚠️ could not complete
⚠️ This reviewer failed to complete: agent: step 12: all chain targets failed
ollama-cloud/glm-5.1☁️ ollama ollama-cloud/glm-5.1☁️ do request: Post "https://ollama.com/api/chat": context deadline exceeded
🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
statusFordoesn't readcontext.Causeon the correct context whenMergeCancellationre-wraps it (verified). Atrun/executor.go:206,runCtxis reassigned to the merged context returned byMergeCancellation. However,context.Cause(mergedCtx)returns the cause set on the merged context itself, not its parent. When therunTimeror the critic deadline-watch goroutine callscancelCause(…)on the outer context (line 203), the cancellation propagates down to the merged child, and Go'scancelCtx.canceldoes propagate the parent's cause to children. So this works correctly today — but it relies on an internal implementation detail ofcontext.cancelCtx(cause propagation from parent to child). IfMergeCancellationis ever changed to use a different context type (e.g.,valueCtxwrapping),context.Causewould stop seeing the timer/critic causes. This is fragile but not currently broken. No fix needed now, but a comment at theMergeCancellationcall site noting thatcontext.Causeon the merged context depends on parent-to-child cause propagation would guard against future regressions.Missing test: MaxRuntime timeout via
context.Causepath (verified). ThestatusFortest atrun/executor_test.go:152coversErrCriticKillcause and directcontext.DeadlineExceedederror, but it does not test the actual MaxRuntime timeout path whererunErriscontext.Canceled(because the context was cancelled, not timed out directly) andcontext.Cause(runCtx)iscontext.DeadlineExceeded. This is the most common "timeout" scenario in production. A test like:would close the gap. Low severity — the logic is correct, but the primary production path for timeouts has no direct unit test.
No issue with
KillCauserace (verified).KillCause()andcancelCausein the deadline-watch goroutine are called sequentially in the same goroutine (run/critic.go:66-71), so there's no TOCTOU gap between readingKillCause()and callingcancelCause.No issue with
runErr == nilvs cause (verified). WhenrunErris nil butcontext.Cause(runCtx)isErrCriticKill,statusForcorrectly returns "ok" (the nil-error check is first). A completed run can't be "killed" — the kill must manifest as a non-nil error fromag.Run.handle.Stop()doesn't setkillCause(verified, not a bug).Stop()just closes the stop channel; it doesn't touchkillCause. SincekillCauseis only set intick()whend.Killis true, andStop()is called at run end, there's no path whereStop()should need to set it.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 10m 0s
🪰 Gadfly review —
glm-5.2:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
Escalator'sDecision.KillReason(critic/critic.go:279) — host-controlled, not attacker/untrusted input. It is formatted viafmt.Errorf("%w: %s", ErrCriticKill, cause.Error())(run/critic.go:67) using%sinterpolation, so no format-string injection, and the nil guard (cause != nil) precedes.Error(), so no nil-deref. Verified by reading run/critic.go:60-72 and critic/critic.go:277-285.RecordStep(iter, resp *llm.Response)forwards the completed step's model response (potentially sensitive content) to the host'sCriticHandle. This is the intended inversion seam (Ports.Criticis a host-supplied port), not a cross-trust-boundary leak — the host already owns the agent/model/output. Verified against run/ports.go:130-148 (interface) and run/executor.go:230 (call site);respis nil-safe at both the battery (ignored) and the binding (b != nilguard).statusForreadscontext.Cause(runCtx)for control-flow only (errors.Is), and the cause/kill reason is not written into audit output (finishAuditrecordsrunErr.Error(), not the cause — run/executor.go:329-330). So the kill reason does not surface in audit logs as unattributed/uncontrolled text. Verified.ErrCriticKillsentinel are internal status plumbing with host-controlled inputs.🎯 Correctness — Minor issues
VERDICT: Minor issues
Through the correctness lens, I verified the status-cause plumbing end to end against the actual code. The core logic is sound, but there is one real race that can defeat the PR's central distinction (kill vs timeout), plus a narrow behavior change in
statusFor.run/critic.go:55-74+run/executor.go:203-206— critic-kill can be misclassified as "timeout" when it coincides withMaxRuntime. The critic's deadline-watch polls everycriticDeadlineCheck(1s) and only cancels withErrCriticKillfrom inside the<-t.Cbranch (after re-readingh.KillCause()). When a kill is decided (critic/critic.go:277-285:deadline = now,killCauseset underh.mu), the watch won't notice until its next tick — up to ~1s later. MeanwhilerunTimer(executor.go:205) fires atmaxRuntimeand callscancelCause(context.DeadlineExceeded).context.WithCancelCausehonors only the first cancel call, so ifmaxRuntimeelapses inside that polling window,runTimerwins,context.Cause(runCtx)becomesDeadlineExceeded, andstatusFor(executor.go:308) returns"timeout"even though the critic explicitly killed the run. The watch also has a<-runCtx.Done(): returnarm (critic.go:59), so oncerunTimerfires the watch simply exits without ever stampingErrCriticKill. The window is narrow (~1s relative to a typically largemaxRuntime), but it is a genuine gap in the kill-vs-timeout guarantee the PR introduces.Suggested fix: when the watch detects
KillCause() != nil, cancel withErrCriticKillimmediately on kill detection independent of the polling tick (e.g., haveticksignal a dedicated channel the watch selects on), or haverunTimerconsultKillCause()before choosing its cause. At minimum, narrowingcriticDeadlineCheckreduces but does not eliminate the window.run/executor.go:308— a non-timeout generic error can be relabeled "timeout" if the context was independently cancelled.statusFornow returns"timeout"whenerrors.Is(runErr, context.DeadlineExceeded) || errors.Is(context.Cause(runCtx), context.DeadlineExceeded)regardless ofrunErr's actual value. If the agent returns a genuine non-context error (e.g., a model/transport error) in the same window thatrunTimer(orMergeCancellationpropagating a callerDeadlineExceeded—runengine.go:69) fires, the cause wins and the real error is masked as a timeout. The oldstatusFor(runErr)reported"error"in that case. This is a narrow edge (normally a cancelled ctx makesag.Runreturnctx.Err(), not a generic error), so likely not material in practice, but it is a behavior change worth noting.Other correctness aspects checked and clean:
statusForcase ordering (ok→killed→timeout→cancelled→error) is correct givenCauseis a single value;context.Causereturnsnilfor an uncanceled context so successful runs still map to"ok";MergeCancellationpropagates the caller's cause viacancel(context.Cause(secondary))(runengine.go:69), consistent with prior behavior (callerCanceled→"cancelled", callerDeadlineExceeded→"timeout");defer cancelCause(nil)(executor.go:204) runs only at function return, afterstatusForis computed atexecutor.go:281, so it cannot overwrite the status; the battery'skillCauseis set underh.muintick(critic/critic.go:272-285) and read underh.muinKillCause()(critic/critic.go:208-213);RecordStep(iter, _ *llm.Response)is a nil-safe signature widening with all call sites updated (run/critic.go:84,run/executor.go:230,critic/critic_test.go:54).🧯 Error handling & edge cases — Minor issues
Verified review
VERDICT: Minor issues
Both findings confirmed against the actual code.
run/executor.go:302-309— caller-deadline cancellation is now misclassified as "timeout" instead of "cancelled". The newstatusForaddserrors.Is(context.Cause(runCtx), context.DeadlineExceeded). Confirmed atrun/runengine.go:62-73:MergeCancellation(runCtx, ctx)spawns a goroutine that, onctx.Done(), callscancel(context.Cause(ctx))against theWithCancelCause-derived merged context. When the caller'sctxis acontext.WithTimeoutthat expires,context.Cause(ctx)iscontext.DeadlineExceeded, so the mergedrunCtx's cause becomesDeadlineExceededeven though the run's ownMaxRuntimebackstop (runTimer,run/executor.go:205) did not fire. The agent returns an error wrappingcontext.Canceled(aWithCancelCausecancel reportsCanceledviactx.Err()), so the oldstatusFor(which only inspectedrunErr) mapped this to"cancelled"; the new one inspects the cause and maps it to"timeout". This is a real, undocumented behavior change for the caller-deadline-expiry case, distinct from the PR's intended run-backstop timeout. Suggested fix: stamp the run's own backstop with a distinct sentinel (e.g.ErrRunTimeout) and check only that instatusFor, leaving caller-propagatedDeadlineExceededto fall through to"cancelled".run/critic.go:53-71vsrun/executor.go:205— race between critic kill and theMaxRuntimetimer can lose the "killed" status. Confirmed: on a kill,critic/critic.go:277-285setsh.deadline = h.now()andh.killCause, but the deadline-watch goroutine (run/critic.go:53-72) polls only everycriticDeadlineCheck(1s,run/critic.go:14) before callingcancelCause(ErrCriticKill...). TherunTimer(run/executor.go:205) fires independently withcancelCause(context.DeadlineExceeded)and is only stopped by the deferredrunTimer.Stop()at the end ofRun— the critic kill path does not stop it.context.CancelCauseFuncis first-write-wins, so ifmaxRuntimeexpires inside the ~1s window between the kill decision and the next poll, the cause becomesDeadlineExceededand the critic'sErrCriticKillis dropped — status reports"timeout"for a run the critic explicitly killed, undermining the PR's core distinction. Narrow window, but real. Suggested fix: have the kill path cancelrunCtxdirectly (or race the watch against a kill-signal channel / stoprunTimeron kill) instead of relying on the polled deadline to relay it.The happy paths and nil-safety (
RecordStepignoring an unused*llm.Response, nil-safecriticBindingmethods, recover in the watch goroutine atrun/critic.go:52) are handled correctly.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 0s
🪰 Gadfly review —
kimi-k2.7-code:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
Minor issues
Unvalidated host/model kill reason flows into errors, logs, and audit traces
critic/critic.go:279-283storesDecision.KillReasondirectly into anerrorviaerrors.New(reason), defaulting to"critic killed the run". The hostEscalator(e.g., mort's LLM critic-agent) can supply arbitrary text into this field with no length limit or character sanitization.run/critic.go:66-67then wraps that cause withErrCriticKilland stamps it as the run's cancellation cause, so the raw reason ends up incontext.Cause(runCtx)and is used bystatusForto report"killed". Depending on how the agent wraps the cancellation error, it may also reachResult.Err, auditRunStats.Error, andDeliverError.Impact: log/audit pollution, potential PII leakage, and a trivial DoS vector from a multi-megabyte kill reason.
Fix: cap
KillReasonlength and sanitize control characters incritic/critic.gobefore storing it.Same issue at the public host seam
run/ports.go:149-154documentsCriticHandle.KillCause()as returning a reason, but the interface places no contract or validation on its content. A malicious or buggy host adapter can return an oversized or malformed string, andrun/critic.go:66-67will faithfully adopt it as part of the cancellation cause.Fix: add a defensive length cap and normalization when wrapping
KillCause()inrun/critic.go:66-67.No material authn/authz, command/SQL/template injection, SSRF, deserialization, or credential-leak issues were found in this diff.
🎯 Correctness — No material issues found
Verdict: No material issues found
runCtxis the merged context fromMergeCancellation, whose parent is the cause-carrying context. When the critic watch calls the parent'sCancelCauseFuncwithErrCriticKillorDeadlineExceeded,context.Cause(runCtx)resolves to that cause (Cause walks ancestor contexts).statusForchecksCausebeforectx.Err()-derived values, so critic kill is correctly distinguished from caller cancel/timeout.MergeCancellationbehavior: when the secondary (caller) context cancels, it stamps the merged context withcontext.Cause(secondary), preserving caller-cancel classification; when the parent cancels first with a critic cause, the merged context inherits it.handle.killCause/handle.deadlineupdate incritic/critic.go:277-285is done under a single mutex critical section, so thestartCriticpoll cannot observe a past deadline without a matching non-nilKillCauseonce a kill decision has been applied.CriticHandleimplementations exist in the repo (critic/critic.goandrun/critic_test.go); both signatures were updated forRecordStep(iter int, resp *llm.Response)andKillCause() error.WithTimeoutsemantics by firingcancelCause(context.DeadlineExceeded)from atime.AfterFunc.🧯 Error handling & edge cases — No material issues found
Verdict: No material issues found
Findings:
run/executor.go:203—runCtxis created withcontext.WithCancelCause(context.WithoutCancel(ctx))and then immediately wrapped byMergeCancellationatrun/executor.go:207.MergeCancellationmay callcancel(context.Cause(secondary))if the caller contextctxis cancelled, which would overwrite any cancellation cause previously set onrunCtx(e.g.,ErrCriticKillfromstartCriticorcontext.DeadlineExceededfromrunTimer). SincestatusForreadscontext.Cause(runCtx), a caller-cancellation racing with a critic kill or timeout could overwrite the original cause and causestatusForto report"cancelled"instead of"killed"or"timeout". However, this ordering is exactly the same as the prior code's behavior withcontext.WithTimeout(context.WithoutCancel(ctx), maxRuntime)followed byMergeCancellation; the only new risk is the loss of the reason when a critic kill races with caller cancellation, and there is no existing test or usage path in the diff or repo that demonstrates this race causing a user-visible misclassification. Treat as a latent edge case rather than a blocking bug.run/executor_test.go:157— The test createskillCtxbut does not cancel the background contextbgorkillCtxvia defer. This is a minor test-only resource leak and not a production issue.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 2s
🪰 Gadfly review —
deepseek-v4-pro:cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
I traced every tainted-data path through the diff and verified the surrounding code. Here's what I checked:
KillReason→ error chain →context.Cause: The reason string flows from the hostEscalator(viaDecision.KillReason) →errors.New(reason)→KillCause()→fmt.Errorf("%w: %s", ErrCriticKill, cause.Error())→cancelCause(...). It is consumed only bystatusFor'serrors.Is(context.Cause(runCtx), ErrCriticKill)boolean check. The reason string is never surfaced to users, logs, or audit output —statusForreturns the hardcoded"killed", andres.Errremainscontext.Canceled(the agent loop seesctx.Err(), not the cause). Verified by readingexecutor.go:279-287andports.go:63-74.*llm.ResponseinRecordStep: The model response is forwarded to the critic handle. The battery (critic/critic.go:172) discards it (_). A host adapter that does consume it is responsible for its own handling of model output (PII, logging, etc.) — the kernel merely passes data that was already available elsewhere in the same goroutine. No new exposure.fmt.Errorf("%w: %s", ...)with attacker-controlledcause.Error(): The%sverb prints the string literally; format specifiers inside the argument are not interpreted. No format-string vulnerability.No new network calls, no deserialization, no authz changes, no file paths constructed from untrusted input.
The changes are mechanically sound from a security standpoint.
🎯 Correctness — No material issues found
Verdict: No material issues found
I traced every logic path through the cause-carrying cancellation, the
statusFordispatch, theAfterFunc/critic-watch interaction, and theMergeCancellationpropagation. The semantics are sound:statusForpriority order is correct: "killed" (viacontext.Cause→ErrCriticKill) is checked before "timeout" (viacontext.Cause→DeadlineExceededorrunErr), which is checked before "cancelled" (viarunErr→Canceled). A critic kill that wrapsErrCriticKillwith%wis correctly unwrapped byerrors.Is.cancelCausemeans theAfterFunctimer and the critic watch goroutine can't clobber each other; whichever fires first sets the cause permanently.MergeCancellationchild propagation: the merged context (child of the pre-mergerunCtx) correctly inherits the parent's cancellation cause, sostatusFor(runCtx, runErr)whererunCtxis the merged context still sees the critic-kill or deadline-exceeded cause set on the pre-merge parent.statusForis called before any of them fire.KillCause()is properly mutex-guarded and returns nil untilticksetskillCause; the executor'sKillCause() != nilgate correctly branches betweenErrCriticKillandDeadlineExceeded.One minor observation (not a bug): the
RunStats.Statuscomment onrun/ports.go:66listsok | error | timeout | budget_exceeded | cancelled | dry_runbut doesn't mention the new"killed"status. This is a documentation drift, not a logic defect, and outside my lens.🧯 Error handling & edge cases — No material issues found
No material issues found.
I traced every cancellation path, defer ordering, nil-safety, and race condition across the four changed files. The error handling is solid:
MergeCancellation, caller cancel) are safe:WithCancelCausemakes subsequentcancelCausecalls no-ops, and the first cause sticks.defer cancelCause(nil)atexecutor.go:204runs afterstatusFor/finishAudit/deliver, so it cannot corrupt the status. The deferredrunTimer.Stop()runs before it, preventing most late-timer firings.context.Cause(runCtx)instatusForis safe when the context isn't cancelled (returns nil). TherunErr == nilshort-circuit prevents a successful run from being misclassified even if a late cancellation lands.*llm.ResponseinRecordStepis explicitly documented as nil-safe; both the battery (critic/critic.go:172) and the fake handle ignore it.KillCause()returning nil is the default; the deadline-watch only takes the kill path when it's non-nil (critic.go:66).fmt.Errorf("%w: %s", ErrCriticKill, cause.Error())—causeis guarded by the!= nilcheck, so.Error()won't panic.errors.Iscorrectly unwraps throughfmt.Errorfto findErrCriticKill.tick+KillCause: theh.killedcheck atcritic.go:274prevents a double-kill, andkillCauseis set under the same lock askilled.The one minor observation (not a code defect):
executor_test.gotests the "timeout" path only viarunErr == DeadlineExceeded, not viarunErr == Canceled+Cause == DeadlineExceeded(the timer-firing scenario). The code handles it correctly; the test just doesn't cover that specific combination.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9m 49s
🪰 Gadfly review —
nemotron-3-super:cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found
Explanation
After reviewing the changes through the security lens (focusing on authn/authz, injection, SSRF, secret leakage, input validation, and insecure defaults), no security vulnerabilities were identified.
Key points verified:
KillReason(which could be influenced by LLM output) is only used to wrap an error for cancellation cause. ThestatusForfunction checks only the error type (errors.Is(context.Cause(runCtx), ErrCriticKill)), not the error string itself. While the error string could theoretically be logged unsafely elsewhere, this change does not introduce new logging or insecure usage of the string.*llm.Responseparameter is passed through toRecordStepbut is ignored in the battery implementation (as noted in the comment) and in test fakes. No security-sensitive operations (e.g., command injection, SQL injection) are performed with this data in the changed code.errors.Isandfmt.Errorfare safe and follow idiomatic Go patterns.KillCause()method properly locks the mutex before accessingh.killCause.All changes are consistent with the stated goal of passing the LLM response to the critic for tracking and distinguishing kill causes via context cancellation. No insecure defaults or missing validations were identified.
Verification steps included:
critic/critic.go,run/critic.go,run/executor.go) to confirm safe usage of new parameters and error handling.ErrCriticKillandcontext.Causeto confirm usage is limited to equality checks.llm.Responseis not dereferenced or used in unsafe ways in the changed code.🎯 Correctness — No material issues found
No material issues found.
🧯 Error handling & edge cases — No material issues found
No material issues found
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 5s
🪰 Gadfly review —
gpt-oss:120b-cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
CriticHandleAPI. No new authentication/authorization checks are needed, and the added error handling does not expose unvalidated input or secrets.d.KillReason). It is stored in anerrorand later wrapped withfmt.Errorf("%w: %s", ErrCriticKill, cause.Error()). This does not lead to command/SQL/template injection, nor is it logged or returned to external callers in a way that would leak sensitive data.No security‑related concerns identified.
🎯 Correctness — Minor issues
Verdict: Minor issues
run/critic_test.gouses an uninitializedkillCtx– the test creates a context withkillCtx, killCancel := context.WithCancelCause(context.Background())but never declareskillCtx. This variable is referenced later, causing a compilation error.Impact: Compile‑time failure.
run/executor.goimportserrorsbut does not use it after the changes; the import remains unused.Impact: Linter/compile warning.
🧯 Error handling & edge cases — No material issues found
Verdict: No material issues found
CancelCausecalls are guarded by thecontextAPI, and the deferredcancelCause(nil)does not overwrite an already‑set cause.killCause,iterations, and other mutable state prevents data races. TheStopmethod safely guards against double‑close of its channel.maxRuntimeorCriticSoftTimeoutare handled by fall‑backs to sensible defaults.No error‑handling or edge‑case problems detected in the changes.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 13s
🪰 Gadfly review —
qwen3-coder:480b-cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found.
The changes introduce a more nuanced status reporting mechanism for run execution by distinguishing between critic kills, timeouts, and cancellations. The security implications have been carefully considered:
KillCause()mechanism properly encapsulates error information without exposing internal statefmt.Errorfwith%w)The implementation maintains secure error handling practices while providing better observability into run termination causes.
🎯 Correctness — No material issues found
VERDICT: No material issues found
The changes correctly implement the critic parity features:
RecordStep now carries the full response - The signature change from
RecordStep(iter int)toRecordStep(iter int, resp *llm.Response)is consistently applied across the critic handle, binding, and executor. The battery implementation correctly ignores the response payload as noted in its comment, while allowing richer escalators to use it.Cause-carrying cancellation - The executor now uses
context.WithCancelCauseand properly distinguishes between critic kills (viaErrCriticKill) and natural timeouts (viacontext.DeadlineExceeded). ThestatusForfunction correctly checkscontext.Cause(runCtx)to determine the final status.Kill cause propagation - The critic handle's
KillCause()method properly surfaces the kill reason, which is then used by the deadline watcher to cancel the context with the appropriate cause.All interface changes are consistently implemented across the codebase, and the test cases have been updated to match the new signatures and behavior. The logic for distinguishing kill causes appears correct and handles the various cancellation scenarios appropriately.
🧯 Error handling & edge cases — No material issues found
No material issues found
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 29s
🪰 Gadfly review —
gemma4:cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found.
I have reviewed the changes through the Security lens. The modifications focus on enhancing the fidelity of run status reporting (distinguishing between "killed" by a critic, "timeout" by a backstop, and "cancelled" by the user) and enriching the data passed to the critic (
llm.Response).llm.Responseis now passed toRecordStep. In the provided implementation (critic/critic.go), this parameter is ignored. Since the critic handle is an interface used by host adapters, any security implications of processingllm.Response(e.g., logging untrusted model output) would reside in the adapter implementation, not the executor's orchestration.context.WithTimeouttocontext.WithCancelCausecombined withtime.AfterFuncmaintains the same hard-deadline safety. The use ofdefer runTimer.Stop()ensures that timers are cleaned up regardless of how the run terminates, preventing timer leaks.KillCauseis wrapped inErrCriticKilland used to determine the run status. The cause is derived fromDecision.KillReason. Given that this is internal state for a run's audit trail/status, there is no exposure of sensitive system secrets to external untrusted parties.handleincritic/critic.gocorrectly uses a mutex (h.mu) to protectkilledand the newkillCausefield, preventing race conditions during the check instartCritic.The changes are localized to control flow and metadata tracking and do not introduce new attack vectors.
🎯 Correctness — No material issues found
VERDICT: No material issues found
I have reviewed the changes through the Correctness lens.
context.WithTimeouttocontext.WithCancelCausecombined with a manualtime.AfterFunccorrectly separates the three termination states (User Cancel, Backstop Timeout, and Critic Kill).run/critic.go:63-67ensures that ifKillCause()is set at the moment of expiration, the cause isErrCriticKill, otherwise it iscontext.DeadlineExceeded.statusForlogic inrun/executor.go:297-302correctly prioritizes the cause from the context over the returned error, ensuring "killed" takes precedence when the context was cancelled by the critic.RecordStepinrun/ports.go:133and its propagation throughrun/executor.go:226andrun/critic.go:77correctly forwards thellm.Responseto the critic, satisfying the requirement for fuller fidelity in trace recording.critic.handleincritic/critic.gocorrectly uses its mutex to protect bothkilledand the newkillCausefield, preventing races when the executor pollsKillCause()while the critic'stickgoroutine is updating it.time.Now().Before(d)inrun/critic.go:62is correct for checking if a deadline has passed.All logic appears sound and consistent with the stated goals.
🧯 Error handling & edge cases — No material issues found
VERDICT: No material issues found
I have reviewed the changes through the lens of error handling and edge cases.
context.CancelCauseFunc: The transition fromWithTimeouttoWithCancelCauseis handled correctly. ThemaxRuntimeis now enforced by a manualtime.AfterFuncthat triggerscancelCause(context.DeadlineExceeded), which preserves the original timeout behavior.statusForLogic: The new priority for determining status (ErrCriticKill->DeadlineExceeded->Canceled) correctly distinguishes between a critic's explicit kill, a backstop timeout, and a general cancellation. The use ofcontext.Cause(runCtx)ensures that the specific reason for cancellation is captured even whenrunErris a genericcontext.Canceled.KillCauseThread Safety: Incritic/critic.go,KillCause()correctly uses the handle's mutex to protect access toh.killCause, preventing data races with thetickgoroutine that updates it.RecordStepupdate maintains nil-safety for thellm.Responsepayload, andcriticBinding.recordStepremains nil-safe for the binding itself.ticklogic incritic.gohandles emptyKillReasonstrings by providing a default message, avoiding empty error messages.The implementation is robust and correctly handles the unhappy paths introduced by the new "killed" status.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 20s
Pull request closed