C0b: wire Critic + Delivery into run.Executor #9
Reference in New Issue
Block a user
Delete Branch "phase-c0b-critic-delivery"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Continues finishing the executor's
run.Portswiring (after C0's Palette). With this, the only un-wired Port isCheckpointer(deferred — needs a majordomo hook to snapshot the running message history).Critic (
run/critic.go)When
Ports.Criticis set and the agent enables it (RunnableAgent.Critic.Enabled):Monitorat run start;RecordStep/RecordToolStartfed from the step observer;Steermessages drained into the loop viaagent.WithSteer;Deadlinevia a watch goroutine — a healthy-but-slow run gets room while a hung one is killed;Stop()on run end. Soft timeout fromDefaults.CriticSoftTimeout(default 90s).Delivery (
run/executor.go)After the run, when
Ports.Deliveryis set andinv.DeliveryIDis non-empty, the executor postsResult.Output(orDeliverErroron failure) to a host-interpreteddeliver.Target{inv.DeliveryKind, inv.DeliveryID}. Empty target = caller readsResult.Outputitself (the synchronous default; the.agent runcanary works this way). Best-effort + detached.tool.InvocationgainsDeliveryKind/DeliveryID(host-set egress target).Tests
Critic monitored/fed/steered/stopped when enabled, untouched when disabled; delivery posts on a target, skips without one, doesn't fire on an early resolve error.
Next: C1 — mort-side
skilltools.Tool → tool.Toolbridge + route.agent runthroughexecutushostbehind a convar flag.🤖 Generated with Claude Code
Continues finishing the executor's run.Ports wiring (after C0's Palette). Critic (run/critic.go): when Ports.Critic is set and the agent enables it, the executor calls Monitor at run start, feeds RecordStep/RecordToolStart from the step observer, drains the critic's Steer messages into the loop via agent.WithSteer, and binds the run's hard cancellation to the critic's (extendable) Deadline through a watch goroutine — a healthy-but-slow run gets room while a hung one is killed. Stop() on run end. Soft timeout from Defaults.CriticSoftTimeout (default 90s). nil-safe: no critic / not-enabled = no-op. Delivery (run/executor.go deliver): after the run, when Ports.Delivery is set and inv.DeliveryID is non-empty, the executor posts Result.Output (or DeliverError on failure) to a host-interpreted deliver.Target {inv.DeliveryKind, inv.DeliveryID}. Empty target = caller reads Result.Output itself (the synchronous default; the `.agent run` canary). Best-effort + detached. tool.Invocation gains DeliveryKind/DeliveryID (host-set egress target). Tests: critic monitored/fed/steered/stopped when enabled, untouched when not; delivery posts on a target, skips without one. Deferred: Checkpointer (needs a majordomo hook to snapshot the running message history). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>🪰 Gadfly review —
minimax-m3:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
From the security lens, this PR is the run kernel wiring a host-supplied watchdog (Critic) and a host-supplied egress channel (Delivery). The kernel intentionally performs no validation of
inv.DeliveryKind/inv.DeliveryID— both fields are documented as host-interpreted, andtool/registry.go:176-182says so. The actual trust boundary is the host'srun.Ports.Deliveryimplementation, which is not in this diff. Same story forCritic: only the host'sEscalatordecides whatSteermessages get injected.What I verified by reading the code:
run/executor.go:338-349—deliver()only fires when bothPorts.Deliveryis set andinv.DeliveryID != "". The empty-ID gate is the correct safe-default, matching the PR's "synchronous default" claim. Errors fromDeliver/DeliverErrorare deliberately dropped (best-effort side channel) — that's a deliberate trade, not a leak vector.run/executor.go:343— delivery runs ondetach(ctx), so a cancelled caller can't kill the post, andCleanupContextTimeout(10s,run/runengine.go:43) bounds it. No DoS amplification.run/critic.go:29-66— the watch goroutine'scancel()is the timeout-boundedcontext.WithTimeoutcancel fromrunCtx. Thedonechannel is closed beforeh.Stop()in the returned stop closure, so the goroutine always exits deterministically. No leak path.run/critic.go:86—agent.WithSteer(b.h.Steer)pushes the host's critic-emitted messages into the LLM loop. This is a real trust expansion (a malicious or buggyEscalatorcan inject text into the agent's context), but theEscalatoris host-trusted by design; the kernel is the right layer to put no further filter.deliver/deliver.go— theDeliveryinterface never parsesTarget.Kind/ID; the harness deliberately treats them as opaque. No template/SQL/path-injection surface is introduced here.run/executor.go:275—e.deliver(ctx, inv, res, runErr)is reached only after the model resolves and the loop runs; theTestDeliveryErrorPathtest'sd.delivers == 0assertion is correct because the model-resolvereturnatrun/executor.go:137happens beforee.deliver. No bypass.No authn/authz gap is created inside the kernel (authorization to post to a given
DeliveryIDis the host's job, set when the Invocation is built). No untrusted-string-into-shell, no template, no SQL, no path concat, no SSRF, no deserialization — the diff is interface-only. Nothing material in this lens.🎯 Correctness — Minor issues
VERDICT: Minor issues
Findings
run/executor.go:129, 137, 174, 182 — Delivery is silently skipped for early errors (budget reject, model resolve, toolbox build, delegation-tools build).
e.deliver(...)is called only once atrun/executor.go:275, afterag.Runreturns. The four early-return paths (executor.go:129budget,executor.go:137model resolve,executor.go:174Registry.Buildfailure,executor.go:182addDelegationToolsfailure) all bypass it. They setres.Errand (for the latter two) callfinishAudit, but never invokeDeliverErroreven wheninv.DeliveryIDis set. The asymmetry with the post-ag.Runerror path (line 275) is real: a caller who setDeliveryIDexpecting an error post for any run failure will get nothing for these four classes of error. TheTestDeliveryErrorPathtest (run/delivery_test.go:70-88) only countsdelivers(the success path) and asserts it stays 0, with a misleading comment that "DeliverError on in-loop errors is exercised by the wiring" — there is no test that exercisesDeliverErrorat all, and the early-return errors that should reach it don't. Whether this is intended is a domain call, but the test naming and the comment overstate coverage.run/critic.go:34-37 —
soft <= 0guard is unreachable.Defaults.withFallbacks()atrun/executor.go:53-55coercesCriticSoftTimeoutto90 * time.Secondwhenever<= 0, andNew(run/executor.go:92) applieswithFallbacksonce at construction. By the timestartCriticis reached inRun,e.cfg.Defaults.CriticSoftTimeoutis guaranteed positive, so theif soft <= 0 { return nil, noop }branch atcritic.go:35-37is dead. Harmless but redundant; could be deleted or replaced with a panic to surface a future regression where someone bypasseswithFallbacks.run/critic.go:55 — Deadline comparison uses
!time.Now().Before(d). This is correct (kills at-or-past deadline) but cancel fires up to onecriticDeadlineCheck(1s) after the actual deadline. With the defaultbackstopMul=3andCriticSoftTimeout=90s(critic/critic.go:111-112,run/executor.go:54), the hard backstop is ~270s, so 1s of jitter is well below theExtendBymargin a typical Escalator would grant. Worth flagging only if a future Escalator chooses very smallExtendByvalues (e.g. seconds).run/executor.go:226 —
string(call.Arguments)is a lossy cast for non-UTF-8 JSON.llm.ToolCall.Argumentsisjson.RawMessage([]byte) — confirmed atrun/palette_test.go:46(Arguments: json.RawMessage(...)). TheCriticHandle.RecordToolStart(name, args string)seam atrun/ports.go:120forces astringcast, and the only production implcritic/critic.go:171discards the args entirely (RecordToolStart(name, _ string)), so the corruption is never actually surfaced. The mismatch is real but currently benign; flagging as a semantic seam to watch if a future Critic impl ever inspectsargs.Outside my lens
None.
🧯 Error handling & edge cases — Minor issues
All findings confirmed against the actual code. Let me now produce the corrected review. Most findings hold up; I'll keep them all. Note that the reviewer's verdict of "Minor issues" remains appropriate since none of these are blocking compilation or correctness violations — they're concerns/observations.
VERDICT: Minor issues
run/critic.go:29-66— Confirmed.startCritic's watch goroutine has norecover, yetrun/executor.go:107documentsRunas never propagating a panic. A panic inh.Deadline()(or anything inside the goroutine on the first tick) escapes the run and kills the host. Wrap the goroutine body and/orMonitorin a recover, or document thatCriticHandleimplementations must not panic.run/critic.go:34-37— Confirmed. AfterNewinvokesDefaults.withFallbacks()(run/executor.go:92),CriticSoftTimeout <= 0cannot occur; the check is effectively dead. It also silently diverges from the fallback contract: a host that somehow passes a negative value gets "no critic" instead of the intended 90s default, becausewithFallbackspatches<= 0to 90s, butstartCriticshort-circuits on<= 0to noop. Pick one layer to handle this.run/executor.go:215, 226— Confirmed.critic.recordStep/critic.recordToolStartare called insidestepObserverwith no recover. The methods are nil-safe, but a panic inside the underlying handle methods would surface from inside the agent loop, abort the run, and could leak the goroutine / skip the deferredstopCritic/ audit close. Same recover concern as above.run/executor.go:275— Confirmed.e.deliver(ctx, inv, res, runErr)runs synchronously afterBudget.Commitand beforereturn res, on adetach(ctx)so cancellation is dropped — correct, but a slowDeliverextends the wall-clock to the caller even thoughResultis already known. The comment atrun/executor.go:336says "detached" which reads ambiguously (decoupled-from-cancel vs fire-and-forget). Worth a small clarify or a goroutine+WaitGroup.run/executor.go:342-348— Confirmed. The error path indelivercallsDeliverErrorand discardsres.Output. WithResult.Outputpopulated butErr != nil, the host must reason about both. Not a bug per the PR's "DeliverError on failure" contract, but worth flagging.run/critic.go:42-65— Confirmed.h.Deadline()is read from the watch goroutine concurrently withh.Stop()called from the deferredstopCritic. TheCriticHandleinterface (run/ports.go:115-129) does not document thread-safety, and the test fixture'sfakeCriticHandle.Deadline(no mutex) only works because it returns a constant. A real implementation that mutates internal state inStopraces. Document the concurrency contract on the interface or guard the watch goroutine.run/critic_test.go— Confirmed. No test exercises the deadline-watch firing path (the fake'sDeadline()always returns zero), theMonitorreturning nil edge (run/critic.go:39-41), or the panic recovery noted above. These are the unhappy paths most worth covering.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 26s
🪰 Gadfly review —
deepseek-v4-flash:cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
🔒 Security — Verdict: No material issues found
After reviewing the diff and verifying the surrounding code (ports, interfaces, delivery package, invocation struct, context management), I find no material security vulnerabilities introduced by this change.
Why nothing rises to a finding:
Critic wiring — The critic is a host-provided, trusted component (
Ports.Critic). ItsSteer()messages are injected into the agent loop, and itsDeadline()drives a context cancellation goroutine. Both are standard, safe patterns. The deadline-watch goroutine is properly cleaned up via adonechannel and deferredstopCritic(). Double-callingcancel()is safe in Go (subsequent calls are no-ops).Delivery path —
DeliveryKind/DeliveryIDare explicitly host-set fields onInvocation(per the PR description and the doc comments). The executor trusts the host that built the invocation — this is the correct trust boundary. The delivery is best-effort and detached (failures silently ignored), so it cannot crash or hang the run. No injection vector exists because thedeliver.Targetis host-interpreted; the host'sDeliveryimplementation is responsible for validating whatever it receives.No new untrusted input paths — The PR adds no new user-facing input parsing, no SQL/command/path construction from untrusted data, no deserialization, and no credential handling.
No SSRF — The delivery target is a pair of strings passed to a host-provided interface; no network calls are made by the executor itself.
The only defense-in-depth observation (not a finding) is that the executor does not validate
DeliveryKind/DeliveryIDbefore forwarding them to the delivery system. This is by design — the host sets these values and the host'sDeliveryimplementation is the appropriate place for any validation. A separate reviewer covering the correctness or architecture lens may wish to confirm that no caller path exists where user-controlled data flows into these fields without host mediation.🎯 Correctness — No material issues found
🎯 Correctness — Verdict: No material issues found
After reading the full implementation files, the interface definitions in
run/ports.go, thedeliverpackage, and the test files, I find no logic bugs, semantic errors, or incorrect behavior introduced by this change.What I verified
Critic deadline-watch goroutine (
critic.go:43-61): Thecancelcaptured in the closure is the timeout context's cancel function (line 190). When the critic fires, it cancels the timeout context, which propagates throughMergeCancellationto the mergedrunCtx. Double-cancel is safe. Thedonechannel correctly signals the goroutine to stop. ✅Nil-safety (
critic.go:68-86):recordStep,recordToolStart, andsteerOptionsall guard onb != nil/b == nil. ThestartCriticfunction returns(nil, noop)for every early-exit path. ✅steerOptionspassed as variadic (executor.go:261):critic.steerOptions()returnsnilwhen there's no critic;nil...in a variadic call is a no-op in Go. ✅Delivery uses detached context (
executor.go:338-349):detach(ctx)strips cancellation viacontext.WithoutCanceland appliesCleanupContextTimeout, so a cancelled caller or run context doesn't prevent the best-effort egress. ✅Delivery gating (
executor.go:339): Checksinv.DeliveryID == ""(not justDeliveryKind), which matches theTargetstruct semantics — an ID-less target is meaningless. ✅Early-return paths skip delivery (
executor.go:128-131, 136-138, 142-144, 172-174, 180-182): All return beforedeliveris called on line 275. TheTestDeliveryErrorPathtest confirms this. ✅CriticSoftTimeoutdefault (executor.go:53-55): 90s, applied when<= 0. ThestartCriticfunction also guards onsoft <= 0, so an explicit zero disables the critic (consistent with the zero-value semantics). ✅Test assertions (
critic_test.go,delivery_test.go): All tests correctly assert the expected call counts and target values. ThefakeCriticHandle.Deadline()returns zero time, so the deadline-watch goroutine never fires spuriously in tests. ✅No material correctness issues found in this change.
🧯 Error handling & edge cases — No material issues found
🧯 Error handling & edge cases — Verdict: No material issues found
After reading the full source of
run/critic.go,run/executor.go,run/ports.go,run/agent.go,deliver/deliver.go, and the test files, I find no material error-handling or edge-case bugs in this diff.What was verified:
Goroutine lifecycle (critic.go:43–61): The deadline-watch goroutine exits cleanly on all three paths (
donechannel closed,runCtx.Done(), or deadline-passed →cancel()). Thedonechannel is only closed by the stop function; the goroutine never closes it, so there is no double-close or send-on-closed risk. Thecancel()call is idempotent (Go'scontext.WithTimeoutcancel is safe to call multiple times), so the deferredcancel()at executor.go:191 and the goroutine'scancel()at critic.go:56 do not conflict.Nil-safety throughout:
startCriticreturns(nil, noop)for every "not configured" case (nilPorts.Critic,Critic.Enabled == false,soft <= 0, nil handle fromMonitor). AllcriticBindingmethods (recordStep,recordToolStart,steerOptions) checkb != nilbefore dereferencing. The executor calls them unconditionally — safe.Delivery best-effort + detached:
deliver(executor.go:338–349) usesdetach(ctx)so post-run writes survive caller cancellation. BothDeliverandDeliverErrorreturn values are discarded (_ =,_, _ =), which is the documented contract ("must not change the run's outcome").Early-return paths skip delivery: Model-resolve failure (line 136), toolbox build failure (line 172), and delegation-tool build failure (line 180) all return before
deliveris reached. TheTestDeliveryErrorPathtest confirms this. This is intentional — delivery only fires after a full run attempt.Step observer edge cases (executor.go:207–233): When
s.Responseis nil,callsis nil (len 0), so the pairwise min(len(calls),len(s.Results)) correctly yields zero iterations — no out-of-bounds access. Thecritic.recordStep/recordToolStartcalls are nil-safe.withFallbacksguarantees:CriticSoftTimeoutdefaults to 90s inwithFallbacks()(executor.go:53–55), which runs inNew(). By the timestartCriticreads it, it is always positive when a critic is configured. Thesoft <= 0guard instartCriticis a safety net that will never trigger in practice.One design observation (not a bug): When
runErr != nil,delivercalls onlyDeliverErrorand discards any partialres.Output. The delivery target gets the error but not the partial work. This is the documented contract ("postsResult.OutputorDeliverErroron failure"), so it is intentional — but callers who want partial output on failure must readResult.Outputfrom the return value directly.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 7s
🪰 Gadfly review —
glm-5.2:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
Through the security lens (authn/authz, injection, SSRF, secret leakage, input validation, unsafe untrusted-data handling), I verified the following and found nothing materially wrong:
Delivery target provenance (
run/executor.go:339-348,tool/registry.go:181-182):DeliveryKind/DeliveryIDare documented as host-set and are not parsed/validated by the harness — thedeliver.Targetis explicitly host-interpreted (deliver/deliver.go:18-23). I grepped the whole repo for setters of these fields: the only writes are in tests; no LLM/tool-arg-reachable path currently populates them. So there is no tainted-input-to-egress-target flow today. The seam is open and relies on the host never letting untrusted input stamp it — worth keeping in mind for C1 (when.agent runrouting wires the host side), but not a present vulnerability. Unverified as a future risk only.DeliverErrorforwards the rawrunErr(run/executor.go:344-346): internal wrapped errors (e.g.fmt.Errorf("resolve model %q: %w", tier, err)atrun/executor.go:136) are handed to the egress seam. Whether internal details (tier names, provider errors) leak to a public channel is entirely a function of the host'sDelivery.DeliverErrorrendering — the harness passes the error to an interface explicitly designed to receive it. Not a harness-side disclosure bug; host implementations should redact. Minor observation, not material.Critic receives raw tool args (
run/critic.go:74-77, fed atrun/executor.go:226):recordToolStart(call.Name, string(call.Arguments))forwards tool call arguments (which may contain user data / secrets) toPorts.Critic. The Critic is a trusted host Port (run/ports.go:111), not an untrusted sink, so this is consistent with the existing audit/emitter wiring that already forwards the same data. No new exposure.Deadline-watch goroutine (
run/critic.go:42-61): only callscancel()(the run's own context cancel) on a host-providedDeadline(); no external side effects, no injection surface. Safe.detach(ctx)for delivery (run/executor.go:343): runs post-run egress on a context decoupled from caller cancellation, bounded byCleanupContextTimeout. This is the same detached-cleanup pattern already used for audit/Budget commits (run/executor.go:273,311) — no new authz or escape concern.No blocking or material security issues in this change.
🎯 Correctness — Minor issues
Both findings survive verification. Outputting the corrected review.
Verdict: Minor issues
run/executor.go:224-226—recordToolStartis fired after the tool has already executed, not at tool start. Confirmed in the checked-out code: thestepObservercallback receivesagent.Stepwiths.Resultsalready populated, andcritic.recordToolStart(call.Name, string(call.Arguments))(executor.go:226) runs inside the pairwise loop over completed calls/results — i.e. the iteration's tool calls have finished by the time it fires. The inline comment at executor.go:215 ("keep the critic's activity clock fresh") and theCriticHandlecontract inrun/ports.go:117-120describeRecordStep/RecordToolStartas keeping the activity clock fresh, but this delivery is post-completion. Impact: during a genuinely long-running tool call the critic's activity clock is not refreshed, so the stall scenario the critic exists to detect (a hung tool) can look idle and trigger a premature hard-kill. This mirrors the existingemitter.toolStartpattern (also post-completion, executor.go:227), so it is not a regression introduced by this PR — but a host implementingCriticHandle.RecordToolStartagainst the documented contract gets a misleading signal. Suggested fix: rename/document to reflect post-iteration delivery, or wire a true pre-tool hook if majordomo exposes one.run/critic.go:34-37— redundant/deadsoft <= 0guard. Confirmed:e.cfg.Defaults.withFallbacks()(run inNew) setsCriticSoftTimeoutto 90s whenever it is<= 0(executor.go:53-55), so by the timestartCriticreadssoft := e.cfg.Defaults.CriticSoftTimeout,softis always positive and theif soft <= 0 { return nil, noop }branch is unreachable. Harmless functionally, but it silently diverges from the documented "nil-safe no-op" intent: a caller who explicitly setCriticSoftTimeout: -1expecting to disable the critic gets the 90s fallback instead. Suggested fix: drop the guard, or decide on one documented disable mechanism.🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
run/executor.go:126-183(early-return paths skip delivery entirely). The newe.deliver(ctx, inv, res, runErr)is only called at the end of the happy-path-into-loop flow (:275). The early-return error branches — budget gate rejection (:129), model resolve error (:137), nil model (:143), toolbox build failure (:174), delegation-tool build failure (:182) — allreturn reswithout callingdeliver. So when a host setsinv.DeliveryIDand the run fails before the agent loop starts, the target gets nothing: noResult.Outputand noDeliverError. The PR description says "DeliverErroron failure," but that only holds for in-loop failures.TestDeliveryErrorPathexplicitly asserts the early-return path produces zero deliveries and frames it as intentional, but from an error-handling-completeness standpoint a configured egress target silently swallowing a hard run failure is a real gap. If early failures should be surfaced to the target, route those returns throughe.deliver(or at leastDeliverError); if silence is truly intended, document it ondeliver/DeliveryIDso callers know egress is best-effort-only-after-the-loop-starts.run/critic.go:34-36(explicitCriticSoftTimeout <= 0silently disables an enabled critic).startCriticreturns(nil, noop)whensoft <= 0.withFallbacks(executor.go:53-55) only fills a missing value (<= 0→ 90s), so this branch is only reachable if a caller somehow bypasseswithFallbacks. SinceNewcallswithFallbacks(executor.go:92), this is effectively dead in normal construction — but it meansRunnableAgent.Critic.Enabled=truecombined with a hand-constructedExecutor(noNew) andCriticSoftTimeout=0silently disables the critic with no error. Low impact given theNewguarantee; flagging as a sharp edge.run/executor.go:338-348deliverswallows bothDeliver/DeliverErrorerrors via_, _ =/_ =. This is explicitly documented as "best-effort + detached: a delivery failure must not change the run's outcome," so it's by design. No action needed; confirming it's intentional rather than an accidental swallow.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 34s
Pull request closed