C0b: wire Critic + Delivery into run.Executor #9

Closed
steve wants to merge 0 commits from phase-c0b-critic-delivery into main
Owner

Continues finishing the executor's run.Ports wiring (after C0's Palette). With this, the only un-wired Port is Checkpointer (deferred — needs a majordomo hook to snapshot the running message history).

Critic (run/critic.go)

When Ports.Critic is set and the agent enables it (RunnableAgent.Critic.Enabled):

  • Monitor at run start; RecordStep/RecordToolStart fed from the step observer;
  • the critic's Steer messages drained into the loop via agent.WithSteer;
  • the run's hard cancellation bound to the critic's extendable Deadline via a watch goroutine — a healthy-but-slow run gets room while a hung one is killed;
  • Stop() on run end. Soft timeout from Defaults.CriticSoftTimeout (default 90s).
  • nil-safe: no critic / not-enabled = no-op.

Delivery (run/executor.go)

After the run, when Ports.Delivery is set and inv.DeliveryID is non-empty, the executor posts Result.Output (or DeliverError on failure) to a host-interpreted deliver.Target{inv.DeliveryKind, inv.DeliveryID}. Empty target = caller reads Result.Output itself (the synchronous default; the .agent run canary works this way). Best-effort + detached.

  • tool.Invocation gains DeliveryKind/DeliveryID (host-set egress target).

Tests

Critic monitored/fed/steered/stopped when enabled, untouched when disabled; delivery posts on a target, skips without one, doesn't fire on an early resolve error.

Next: C1 — mort-side skilltools.Tool → tool.Tool bridge + route .agent run through executushost behind a convar flag.

🤖 Generated with Claude Code

Continues finishing the executor's `run.Ports` wiring (after C0's Palette). With this, the only un-wired Port is `Checkpointer` (deferred — needs a majordomo hook to snapshot the running message history). ## Critic (`run/critic.go`) When `Ports.Critic` is set **and** the agent enables it (`RunnableAgent.Critic.Enabled`): - `Monitor` at run start; `RecordStep`/`RecordToolStart` fed from the step observer; - the critic's `Steer` messages drained into the loop via `agent.WithSteer`; - the run's hard cancellation **bound to the critic's extendable `Deadline`** via a watch goroutine — a healthy-but-slow run gets room while a hung one is killed; - `Stop()` on run end. Soft timeout from `Defaults.CriticSoftTimeout` (default 90s). - **nil-safe**: no critic / not-enabled = no-op. ## Delivery (`run/executor.go`) After the run, when `Ports.Delivery` is set and `inv.DeliveryID` is non-empty, the executor posts `Result.Output` (or `DeliverError` on failure) to a host-interpreted `deliver.Target{inv.DeliveryKind, inv.DeliveryID}`. Empty target = caller reads `Result.Output` itself (the synchronous default; the `.agent run` canary works this way). Best-effort + detached. - `tool.Invocation` gains `DeliveryKind`/`DeliveryID` (host-set egress target). ## Tests Critic monitored/fed/steered/stopped when enabled, untouched when disabled; delivery posts on a target, skips without one, doesn't fire on an early resolve error. Next: **C1** — mort-side `skilltools.Tool → tool.Tool` bridge + route `.agent run` through `executushost` behind a convar flag. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 1 commit 2026-06-27 14:00:24 +00:00
C0b: wire Critic + Delivery into run.Executor
executus CI / test (pull_request) Failing after 1m0s
Adversarial Review (Gadfly) / review (pull_request) Successful in 5m9s
43b2471737
Continues finishing the executor's run.Ports wiring (after C0's Palette).

Critic (run/critic.go): when Ports.Critic is set and the agent enables it, the
executor calls Monitor at run start, feeds RecordStep/RecordToolStart from the
step observer, drains the critic's Steer messages into the loop via
agent.WithSteer, and binds the run's hard cancellation to the critic's
(extendable) Deadline through a watch goroutine — a healthy-but-slow run gets
room while a hung one is killed. Stop() on run end. Soft timeout from
Defaults.CriticSoftTimeout (default 90s). nil-safe: no critic / not-enabled =
no-op.

Delivery (run/executor.go deliver): after the run, when Ports.Delivery is set
and inv.DeliveryID is non-empty, the executor posts Result.Output (or
DeliverError on failure) to a host-interpreted deliver.Target
{inv.DeliveryKind, inv.DeliveryID}. Empty target = caller reads Result.Output
itself (the synchronous default; the `.agent run` canary). Best-effort +
detached.

tool.Invocation gains DeliveryKind/DeliveryID (host-set egress target).

Tests: critic monitored/fed/steered/stopped when enabled, untouched when not;
delivery posts on a target, skips without one. Deferred: Checkpointer (needs a
majordomo hook to snapshot the running message history).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

From the security lens, this PR is the run kernel wiring a host-supplied watchdog (Critic) and a host-supplied egress channel (Delivery). The kernel intentionally performs no validation of inv.DeliveryKind / inv.DeliveryID — both fields are documented as host-interpreted, and tool/registry.go:176-182 says so. The actual trust boundary is the host's run.Ports.Delivery implementation, which is not in this diff. Same story for Critic: only the host's Escalator decides what Steer messages get injected.

What I verified by reading the code:

  • run/executor.go:338-349deliver() only fires when both Ports.Delivery is set and inv.DeliveryID != "". The empty-ID gate is the correct safe-default, matching the PR's "synchronous default" claim. Errors from Deliver/DeliverError are deliberately dropped (best-effort side channel) — that's a deliberate trade, not a leak vector.
  • run/executor.go:343 — delivery runs on detach(ctx), so a cancelled caller can't kill the post, and CleanupContextTimeout (10s, run/runengine.go:43) bounds it. No DoS amplification.
  • run/critic.go:29-66 — the watch goroutine's cancel() is the timeout-bounded context.WithTimeout cancel from runCtx. The done channel is closed before h.Stop() in the returned stop closure, so the goroutine always exits deterministically. No leak path.
  • run/critic.go:86agent.WithSteer(b.h.Steer) pushes the host's critic-emitted messages into the LLM loop. This is a real trust expansion (a malicious or buggy Escalator can inject text into the agent's context), but the Escalator is host-trusted by design; the kernel is the right layer to put no further filter.
  • deliver/deliver.go — the Delivery interface never parses Target.Kind/ID; the harness deliberately treats them as opaque. No template/SQL/path-injection surface is introduced here.
  • run/executor.go:275e.deliver(ctx, inv, res, runErr) is reached only after the model resolves and the loop runs; the TestDeliveryErrorPath test's d.delivers == 0 assertion is correct because the model-resolve return at run/executor.go:137 happens before e.deliver. No bypass.

No authn/authz gap is created inside the kernel (authorization to post to a given DeliveryID is the host's job, set when the Invocation is built). No untrusted-string-into-shell, no template, no SQL, no path concat, no SSRF, no deserialization — the diff is interface-only. Nothing material in this lens.

🎯 Correctness — Minor issues

VERDICT: Minor issues

Findings

  • run/executor.go:129, 137, 174, 182 — Delivery is silently skipped for early errors (budget reject, model resolve, toolbox build, delegation-tools build). e.deliver(...) is called only once at run/executor.go:275, after ag.Run returns. The four early-return paths (executor.go:129 budget, executor.go:137 model resolve, executor.go:174 Registry.Build failure, executor.go:182 addDelegationTools failure) all bypass it. They set res.Err and (for the latter two) call finishAudit, but never invoke DeliverError even when inv.DeliveryID is set. The asymmetry with the post-ag.Run error path (line 275) is real: a caller who set DeliveryID expecting an error post for any run failure will get nothing for these four classes of error. The TestDeliveryErrorPath test (run/delivery_test.go:70-88) only counts delivers (the success path) and asserts it stays 0, with a misleading comment that "DeliverError on in-loop errors is exercised by the wiring" — there is no test that exercises DeliverError at all, and the early-return errors that should reach it don't. Whether this is intended is a domain call, but the test naming and the comment overstate coverage.

  • run/critic.go:34-37 — soft <= 0 guard is unreachable. Defaults.withFallbacks() at run/executor.go:53-55 coerces CriticSoftTimeout to 90 * time.Second whenever <= 0, and New (run/executor.go:92) applies withFallbacks once at construction. By the time startCritic is reached in Run, e.cfg.Defaults.CriticSoftTimeout is guaranteed positive, so the if soft <= 0 { return nil, noop } branch at critic.go:35-37 is dead. Harmless but redundant; could be deleted or replaced with a panic to surface a future regression where someone bypasses withFallbacks.

  • run/critic.go:55 — Deadline comparison uses !time.Now().Before(d). This is correct (kills at-or-past deadline) but cancel fires up to one criticDeadlineCheck (1s) after the actual deadline. With the default backstopMul=3 and CriticSoftTimeout=90s (critic/critic.go:111-112, run/executor.go:54), the hard backstop is ~270s, so 1s of jitter is well below the ExtendBy margin a typical Escalator would grant. Worth flagging only if a future Escalator chooses very small ExtendBy values (e.g. seconds).

  • run/executor.go:226 — string(call.Arguments) is a lossy cast for non-UTF-8 JSON. llm.ToolCall.Arguments is json.RawMessage ([]byte) — confirmed at run/palette_test.go:46 (Arguments: json.RawMessage(...)). The CriticHandle.RecordToolStart(name, args string) seam at run/ports.go:120 forces a string cast, and the only production impl critic/critic.go:171 discards the args entirely (RecordToolStart(name, _ string)), so the corruption is never actually surfaced. The mismatch is real but currently benign; flagging as a semantic seam to watch if a future Critic impl ever inspects args.

Outside my lens

None.

🧯 Error handling & edge cases — Minor issues

All findings confirmed against the actual code. Let me now produce the corrected review. Most findings hold up; I'll keep them all. Note that the reviewer's verdict of "Minor issues" remains appropriate since none of these are blocking compilation or correctness violations — they're concerns/observations.

VERDICT: Minor issues

  • run/critic.go:29-66 — Confirmed. startCritic's watch goroutine has no recover, yet run/executor.go:107 documents Run as never propagating a panic. A panic in h.Deadline() (or anything inside the goroutine on the first tick) escapes the run and kills the host. Wrap the goroutine body and/or Monitor in a recover, or document that CriticHandle implementations must not panic.

  • run/critic.go:34-37 — Confirmed. After New invokes Defaults.withFallbacks() (run/executor.go:92), CriticSoftTimeout <= 0 cannot occur; the check is effectively dead. It also silently diverges from the fallback contract: a host that somehow passes a negative value gets "no critic" instead of the intended 90s default, because withFallbacks patches <= 0 to 90s, but startCritic short-circuits on <= 0 to noop. Pick one layer to handle this.

  • run/executor.go:215, 226 — Confirmed. critic.recordStep / critic.recordToolStart are called inside stepObserver with no recover. The methods are nil-safe, but a panic inside the underlying handle methods would surface from inside the agent loop, abort the run, and could leak the goroutine / skip the deferred stopCritic / audit close. Same recover concern as above.

  • run/executor.go:275 — Confirmed. e.deliver(ctx, inv, res, runErr) runs synchronously after Budget.Commit and before return res, on a detach(ctx) so cancellation is dropped — correct, but a slow Deliver extends the wall-clock to the caller even though Result is already known. The comment at run/executor.go:336 says "detached" which reads ambiguously (decoupled-from-cancel vs fire-and-forget). Worth a small clarify or a goroutine+WaitGroup.

  • run/executor.go:342-348 — Confirmed. The error path in deliver calls DeliverError and discards res.Output. With Result.Output populated but Err != nil, the host must reason about both. Not a bug per the PR's "DeliverError on failure" contract, but worth flagging.

  • run/critic.go:42-65 — Confirmed. h.Deadline() is read from the watch goroutine concurrently with h.Stop() called from the deferred stopCritic. The CriticHandle interface (run/ports.go:115-129) does not document thread-safety, and the test fixture's fakeCriticHandle.Deadline (no mutex) only works because it returns a constant. A real implementation that mutates internal state in Stop races. Document the concurrency contract on the interface or guard the watch goroutine.

  • run/critic_test.go — Confirmed. No test exercises the deadline-watch firing path (the fake's Deadline() always returns zero), the Monitor returning nil edge (run/critic.go:39-41), or the panic recovery noted above. These are the unhappy paths most worth covering.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 26s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **VERDICT: No material issues found** From the security lens, this PR is the run kernel wiring a host-supplied watchdog (Critic) and a host-supplied egress channel (Delivery). The kernel intentionally performs no validation of `inv.DeliveryKind` / `inv.DeliveryID` — both fields are documented as host-interpreted, and `tool/registry.go:176-182` says so. The actual trust boundary is the host's `run.Ports.Delivery` implementation, which is not in this diff. Same story for `Critic`: only the host's `Escalator` decides what `Steer` messages get injected. What I verified by reading the code: - `run/executor.go:338-349` — `deliver()` only fires when **both** `Ports.Delivery` is set **and** `inv.DeliveryID != ""`. The empty-ID gate is the correct safe-default, matching the PR's "synchronous default" claim. Errors from `Deliver`/`DeliverError` are deliberately dropped (best-effort side channel) — that's a deliberate trade, not a leak vector. - `run/executor.go:343` — delivery runs on `detach(ctx)`, so a cancelled caller can't kill the post, and `CleanupContextTimeout` (10s, `run/runengine.go:43`) bounds it. No DoS amplification. - `run/critic.go:29-66` — the watch goroutine's `cancel()` is the timeout-bounded `context.WithTimeout` cancel from `runCtx`. The `done` channel is closed before `h.Stop()` in the returned stop closure, so the goroutine always exits deterministically. No leak path. - `run/critic.go:86` — `agent.WithSteer(b.h.Steer)` pushes the host's critic-emitted messages into the LLM loop. This is a real trust expansion (a malicious or buggy `Escalator` can inject text into the agent's context), but the `Escalator` is host-trusted by design; the kernel is the right layer to put no further filter. - `deliver/deliver.go` — the `Delivery` interface never parses `Target.Kind`/`ID`; the harness deliberately treats them as opaque. No template/SQL/path-injection surface is introduced here. - `run/executor.go:275` — `e.deliver(ctx, inv, res, runErr)` is reached only after the model resolves and the loop runs; the `TestDeliveryErrorPath` test's `d.delivers == 0` assertion is correct because the model-resolve `return` at `run/executor.go:137` happens before `e.deliver`. No bypass. No authn/authz gap is created inside the kernel (authorization to post to a given `DeliveryID` is the host's job, set when the Invocation is built). No untrusted-string-into-shell, no template, no SQL, no path concat, no SSRF, no deserialization — the diff is interface-only. Nothing material in this lens. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> ## VERDICT: Minor issues ### Findings - **run/executor.go:129, 137, 174, 182 — Delivery is silently skipped for early errors (budget reject, model resolve, toolbox build, delegation-tools build).** `e.deliver(...)` is called only once at `run/executor.go:275`, after `ag.Run` returns. The four early-return paths (`executor.go:129` budget, `executor.go:137` model resolve, `executor.go:174` `Registry.Build` failure, `executor.go:182` `addDelegationTools` failure) all bypass it. They set `res.Err` and (for the latter two) call `finishAudit`, but never invoke `DeliverError` even when `inv.DeliveryID` is set. The asymmetry with the post-`ag.Run` error path (line 275) is real: a caller who set `DeliveryID` expecting an error post for *any* run failure will get nothing for these four classes of error. The `TestDeliveryErrorPath` test (`run/delivery_test.go:70-88`) only counts `delivers` (the success path) and asserts it stays 0, with a misleading comment that "DeliverError on in-loop errors is exercised by the wiring" — there is no test that exercises `DeliverError` at all, and the early-return errors that *should* reach it don't. Whether this is intended is a domain call, but the test naming and the comment overstate coverage. - **run/critic.go:34-37 — `soft <= 0` guard is unreachable.** `Defaults.withFallbacks()` at `run/executor.go:53-55` coerces `CriticSoftTimeout` to `90 * time.Second` whenever `<= 0`, and `New` (`run/executor.go:92`) applies `withFallbacks` once at construction. By the time `startCritic` is reached in `Run`, `e.cfg.Defaults.CriticSoftTimeout` is guaranteed positive, so the `if soft <= 0 { return nil, noop }` branch at `critic.go:35-37` is dead. Harmless but redundant; could be deleted or replaced with a panic to surface a future regression where someone bypasses `withFallbacks`. - **run/critic.go:55 — Deadline comparison uses `!time.Now().Before(d)`.** This is correct (kills at-or-past deadline) but cancel fires up to one `criticDeadlineCheck` (1s) after the actual deadline. With the default `backstopMul=3` and `CriticSoftTimeout=90s` (`critic/critic.go:111-112`, `run/executor.go:54`), the hard backstop is ~270s, so 1s of jitter is well below the `ExtendBy` margin a typical Escalator would grant. Worth flagging only if a future Escalator chooses very small `ExtendBy` values (e.g. seconds). - **run/executor.go:226 — `string(call.Arguments)` is a lossy cast for non-UTF-8 JSON.** `llm.ToolCall.Arguments` is `json.RawMessage` (`[]byte`) — confirmed at `run/palette_test.go:46` (`Arguments: json.RawMessage(...)`). The `CriticHandle.RecordToolStart(name, args string)` seam at `run/ports.go:120` forces a `string` cast, and the only production impl `critic/critic.go:171` discards the args entirely (`RecordToolStart(name, _ string)`), so the corruption is never actually surfaced. The mismatch is real but currently benign; flagging as a semantic seam to watch if a future Critic impl ever inspects `args`. ### Outside my lens None. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> All findings confirmed against the actual code. Let me now produce the corrected review. Most findings hold up; I'll keep them all. Note that the reviewer's verdict of "Minor issues" remains appropriate since none of these are blocking compilation or correctness violations — they're concerns/observations. **VERDICT: Minor issues** - `run/critic.go:29-66` — Confirmed. `startCritic`'s watch goroutine has no `recover`, yet `run/executor.go:107` documents `Run` as never propagating a panic. A panic in `h.Deadline()` (or anything inside the goroutine on the first tick) escapes the run and kills the host. Wrap the goroutine body and/or `Monitor` in a recover, or document that `CriticHandle` implementations must not panic. - `run/critic.go:34-37` — Confirmed. After `New` invokes `Defaults.withFallbacks()` (`run/executor.go:92`), `CriticSoftTimeout <= 0` cannot occur; the check is effectively dead. It also silently diverges from the fallback contract: a host that somehow passes a negative value gets "no critic" instead of the intended 90s default, because `withFallbacks` patches `<= 0` to 90s, but `startCritic` short-circuits on `<= 0` to noop. Pick one layer to handle this. - `run/executor.go:215, 226` — Confirmed. `critic.recordStep` / `critic.recordToolStart` are called inside `stepObserver` with no recover. The methods are nil-safe, but a panic inside the underlying handle methods would surface from inside the agent loop, abort the run, and could leak the goroutine / skip the deferred `stopCritic` / audit close. Same recover concern as above. - `run/executor.go:275` — Confirmed. `e.deliver(ctx, inv, res, runErr)` runs synchronously after `Budget.Commit` and before `return res`, on a `detach(ctx)` so cancellation is dropped — correct, but a slow `Deliver` extends the wall-clock to the caller even though `Result` is already known. The comment at `run/executor.go:336` says "detached" which reads ambiguously (decoupled-from-cancel vs fire-and-forget). Worth a small clarify or a goroutine+WaitGroup. - `run/executor.go:342-348` — Confirmed. The error path in `deliver` calls `DeliverError` and discards `res.Output`. With `Result.Output` populated but `Err != nil`, the host must reason about both. Not a bug per the PR's "DeliverError on failure" contract, but worth flagging. - `run/critic.go:42-65` — Confirmed. `h.Deadline()` is read from the watch goroutine concurrently with `h.Stop()` called from the deferred `stopCritic`. The `CriticHandle` interface (`run/ports.go:115-129`) does not document thread-safety, and the test fixture's `fakeCriticHandle.Deadline` (no mutex) only works because it returns a constant. A real implementation that mutates internal state in `Stop` races. Document the concurrency contract on the interface or guard the watch goroutine. - `run/critic_test.go` — Confirmed. No test exercises the deadline-watch firing path (the fake's `Deadline()` always returns zero), the `Monitor` returning nil edge (`run/critic.go:39-41`), or the panic recovery noted above. These are the unhappy paths most worth covering. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 26s</sub>

🪰 Gadfly review — deepseek-v4-flash:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

🔒 Security — Verdict: No material issues found

After reviewing the diff and verifying the surrounding code (ports, interfaces, delivery package, invocation struct, context management), I find no material security vulnerabilities introduced by this change.

Why nothing rises to a finding:

  • Critic wiring — The critic is a host-provided, trusted component (Ports.Critic). Its Steer() messages are injected into the agent loop, and its Deadline() drives a context cancellation goroutine. Both are standard, safe patterns. The deadline-watch goroutine is properly cleaned up via a done channel and deferred stopCritic(). Double-calling cancel() is safe in Go (subsequent calls are no-ops).

  • Delivery pathDeliveryKind/DeliveryID are explicitly host-set fields on Invocation (per the PR description and the doc comments). The executor trusts the host that built the invocation — this is the correct trust boundary. The delivery is best-effort and detached (failures silently ignored), so it cannot crash or hang the run. No injection vector exists because the deliver.Target is host-interpreted; the host's Delivery implementation is responsible for validating whatever it receives.

  • No new untrusted input paths — The PR adds no new user-facing input parsing, no SQL/command/path construction from untrusted data, no deserialization, and no credential handling.

  • No SSRF — The delivery target is a pair of strings passed to a host-provided interface; no network calls are made by the executor itself.

The only defense-in-depth observation (not a finding) is that the executor does not validate DeliveryKind/DeliveryID before forwarding them to the delivery system. This is by design — the host sets these values and the host's Delivery implementation is the appropriate place for any validation. A separate reviewer covering the correctness or architecture lens may wish to confirm that no caller path exists where user-controlled data flows into these fields without host mediation.

🎯 Correctness — No material issues found

🎯 Correctness — Verdict: No material issues found

After reading the full implementation files, the interface definitions in run/ports.go, the deliver package, and the test files, I find no logic bugs, semantic errors, or incorrect behavior introduced by this change.

What I verified

  • Critic deadline-watch goroutine (critic.go:43-61): The cancel captured in the closure is the timeout context's cancel function (line 190). When the critic fires, it cancels the timeout context, which propagates through MergeCancellation to the merged runCtx. Double-cancel is safe. The done channel correctly signals the goroutine to stop.

  • Nil-safety (critic.go:68-86): recordStep, recordToolStart, and steerOptions all guard on b != nil / b == nil. The startCritic function returns (nil, noop) for every early-exit path.

  • steerOptions passed as variadic (executor.go:261): critic.steerOptions() returns nil when there's no critic; nil... in a variadic call is a no-op in Go.

  • Delivery uses detached context (executor.go:338-349): detach(ctx) strips cancellation via context.WithoutCancel and applies CleanupContextTimeout, so a cancelled caller or run context doesn't prevent the best-effort egress.

  • Delivery gating (executor.go:339): Checks inv.DeliveryID == "" (not just DeliveryKind), which matches the Target struct semantics — an ID-less target is meaningless.

  • Early-return paths skip delivery (executor.go:128-131, 136-138, 142-144, 172-174, 180-182): All return before deliver is called on line 275. The TestDeliveryErrorPath test confirms this.

  • CriticSoftTimeout default (executor.go:53-55): 90s, applied when <= 0. The startCritic function also guards on soft <= 0, so an explicit zero disables the critic (consistent with the zero-value semantics).

  • Test assertions (critic_test.go, delivery_test.go): All tests correctly assert the expected call counts and target values. The fakeCriticHandle.Deadline() returns zero time, so the deadline-watch goroutine never fires spuriously in tests.

No material correctness issues found in this change.

🧯 Error handling & edge cases — No material issues found

🧯 Error handling & edge cases — Verdict: No material issues found

After reading the full source of run/critic.go, run/executor.go, run/ports.go, run/agent.go, deliver/deliver.go, and the test files, I find no material error-handling or edge-case bugs in this diff.

What was verified:

  • Goroutine lifecycle (critic.go:43–61): The deadline-watch goroutine exits cleanly on all three paths (done channel closed, runCtx.Done(), or deadline-passed → cancel()). The done channel is only closed by the stop function; the goroutine never closes it, so there is no double-close or send-on-closed risk. The cancel() call is idempotent (Go's context.WithTimeout cancel is safe to call multiple times), so the deferred cancel() at executor.go:191 and the goroutine's cancel() at critic.go:56 do not conflict.

  • Nil-safety throughout: startCritic returns (nil, noop) for every "not configured" case (nil Ports.Critic, Critic.Enabled == false, soft <= 0, nil handle from Monitor). All criticBinding methods (recordStep, recordToolStart, steerOptions) check b != nil before dereferencing. The executor calls them unconditionally — safe.

  • Delivery best-effort + detached: deliver (executor.go:338–349) uses detach(ctx) so post-run writes survive caller cancellation. Both Deliver and DeliverError return values are discarded (_ =, _, _ =), which is the documented contract ("must not change the run's outcome").

  • Early-return paths skip delivery: Model-resolve failure (line 136), toolbox build failure (line 172), and delegation-tool build failure (line 180) all return before deliver is reached. The TestDeliveryErrorPath test confirms this. This is intentional — delivery only fires after a full run attempt.

  • Step observer edge cases (executor.go:207–233): When s.Response is nil, calls is nil (len 0), so the pairwise min(len(calls), len(s.Results)) correctly yields zero iterations — no out-of-bounds access. The critic.recordStep/recordToolStart calls are nil-safe.

  • withFallbacks guarantees: CriticSoftTimeout defaults to 90s in withFallbacks() (executor.go:53–55), which runs in New(). By the time startCritic reads it, it is always positive when a critic is configured. The soft <= 0 guard in startCritic is a safety net that will never trigger in practice.

One design observation (not a bug): When runErr != nil, deliver calls only DeliverError and discards any partial res.Output. The delivery target gets the error but not the partial work. This is the documented contract ("posts Result.Output or DeliverError on failure"), so it is intentional — but callers who want partial output on failure must read Result.Output from the return value directly.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 7s

<!-- gadfly-review:ollama:deepseek-v4-flash:cloud --> ### 🪰 Gadfly review — `deepseek-v4-flash:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## 🔒 Security — Verdict: No material issues found After reviewing the diff and verifying the surrounding code (ports, interfaces, delivery package, invocation struct, context management), I find no material security vulnerabilities introduced by this change. **Why nothing rises to a finding:** - **Critic wiring** — The critic is a host-provided, trusted component (`Ports.Critic`). Its `Steer()` messages are injected into the agent loop, and its `Deadline()` drives a context cancellation goroutine. Both are standard, safe patterns. The deadline-watch goroutine is properly cleaned up via a `done` channel and deferred `stopCritic()`. Double-calling `cancel()` is safe in Go (subsequent calls are no-ops). - **Delivery path** — `DeliveryKind`/`DeliveryID` are explicitly **host-set** fields on `Invocation` (per the PR description and the doc comments). The executor trusts the host that built the invocation — this is the correct trust boundary. The delivery is best-effort and detached (failures silently ignored), so it cannot crash or hang the run. No injection vector exists because the `deliver.Target` is host-interpreted; the host's `Delivery` implementation is responsible for validating whatever it receives. - **No new untrusted input paths** — The PR adds no new user-facing input parsing, no SQL/command/path construction from untrusted data, no deserialization, and no credential handling. - **No SSRF** — The delivery target is a pair of strings passed to a host-provided interface; no network calls are made by the executor itself. The only defense-in-depth observation (not a finding) is that the executor does not validate `DeliveryKind`/`DeliveryID` before forwarding them to the delivery system. This is by design — the host sets these values and the host's `Delivery` implementation is the appropriate place for any validation. A separate reviewer covering the **correctness** or **architecture** lens may wish to confirm that no caller path exists where user-controlled data flows into these fields without host mediation. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> ## 🎯 Correctness — Verdict: No material issues found After reading the full implementation files, the interface definitions in `run/ports.go`, the `deliver` package, and the test files, I find no logic bugs, semantic errors, or incorrect behavior introduced by this change. ### What I verified - **Critic deadline-watch goroutine** (`critic.go:43-61`): The `cancel` captured in the closure is the timeout context's cancel function (line 190). When the critic fires, it cancels the timeout context, which propagates through `MergeCancellation` to the merged `runCtx`. Double-cancel is safe. The `done` channel correctly signals the goroutine to stop. ✅ - **Nil-safety** (`critic.go:68-86`): `recordStep`, `recordToolStart`, and `steerOptions` all guard on `b != nil` / `b == nil`. The `startCritic` function returns `(nil, noop)` for every early-exit path. ✅ - **`steerOptions` passed as variadic** (`executor.go:261`): `critic.steerOptions()` returns `nil` when there's no critic; `nil...` in a variadic call is a no-op in Go. ✅ - **Delivery uses detached context** (`executor.go:338-349`): `detach(ctx)` strips cancellation via `context.WithoutCancel` and applies `CleanupContextTimeout`, so a cancelled caller or run context doesn't prevent the best-effort egress. ✅ - **Delivery gating** (`executor.go:339`): Checks `inv.DeliveryID == ""` (not just `DeliveryKind`), which matches the `Target` struct semantics — an ID-less target is meaningless. ✅ - **Early-return paths skip delivery** (`executor.go:128-131, 136-138, 142-144, 172-174, 180-182`): All return before `deliver` is called on line 275. The `TestDeliveryErrorPath` test confirms this. ✅ - **`CriticSoftTimeout` default** (`executor.go:53-55`): 90s, applied when `<= 0`. The `startCritic` function also guards on `soft <= 0`, so an explicit zero disables the critic (consistent with the zero-value semantics). ✅ - **Test assertions** (`critic_test.go`, `delivery_test.go`): All tests correctly assert the expected call counts and target values. The `fakeCriticHandle.Deadline()` returns zero time, so the deadline-watch goroutine never fires spuriously in tests. ✅ No material correctness issues found in this change. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> ## 🧯 Error handling & edge cases — Verdict: No material issues found After reading the full source of `run/critic.go`, `run/executor.go`, `run/ports.go`, `run/agent.go`, `deliver/deliver.go`, and the test files, I find no material error-handling or edge-case bugs in this diff. **What was verified:** - **Goroutine lifecycle (critic.go:43–61):** The deadline-watch goroutine exits cleanly on all three paths (`done` channel closed, `runCtx.Done()`, or deadline-passed → `cancel()`). The `done` channel is only closed by the stop function; the goroutine never closes it, so there is no double-close or send-on-closed risk. The `cancel()` call is idempotent (Go's `context.WithTimeout` cancel is safe to call multiple times), so the deferred `cancel()` at executor.go:191 and the goroutine's `cancel()` at critic.go:56 do not conflict. - **Nil-safety throughout:** `startCritic` returns `(nil, noop)` for every "not configured" case (nil `Ports.Critic`, `Critic.Enabled == false`, `soft <= 0`, nil handle from `Monitor`). All `criticBinding` methods (`recordStep`, `recordToolStart`, `steerOptions`) check `b != nil` before dereferencing. The executor calls them unconditionally — safe. - **Delivery best-effort + detached:** `deliver` (executor.go:338–349) uses `detach(ctx)` so post-run writes survive caller cancellation. Both `Deliver` and `DeliverError` return values are discarded (`_ =`, `_, _ =`), which is the documented contract ("must not change the run's outcome"). - **Early-return paths skip delivery:** Model-resolve failure (line 136), toolbox build failure (line 172), and delegation-tool build failure (line 180) all return before `deliver` is reached. The `TestDeliveryErrorPath` test confirms this. This is intentional — delivery only fires after a full run attempt. - **Step observer edge cases (executor.go:207–233):** When `s.Response` is nil, `calls` is nil (len 0), so the pairwise min(`len(calls)`, `len(s.Results)`) correctly yields zero iterations — no out-of-bounds access. The `critic.recordStep`/`recordToolStart` calls are nil-safe. - **`withFallbacks` guarantees:** `CriticSoftTimeout` defaults to 90s in `withFallbacks()` (executor.go:53–55), which runs in `New()`. By the time `startCritic` reads it, it is always positive when a critic is configured. The `soft <= 0` guard in `startCritic` is a safety net that will never trigger in practice. **One design observation (not a bug):** When `runErr != nil`, `deliver` calls only `DeliverError` and discards any partial `res.Output`. The delivery target gets the error but not the partial work. This is the documented contract ("posts `Result.Output` *or* `DeliverError` on failure"), so it is intentional — but callers who want partial output on failure must read `Result.Output` from the return value directly. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 7s</sub>

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Through the security lens (authn/authz, injection, SSRF, secret leakage, input validation, unsafe untrusted-data handling), I verified the following and found nothing materially wrong:

  • Delivery target provenance (run/executor.go:339-348, tool/registry.go:181-182): DeliveryKind/DeliveryID are documented as host-set and are not parsed/validated by the harness — the deliver.Target is explicitly host-interpreted (deliver/deliver.go:18-23). I grepped the whole repo for setters of these fields: the only writes are in tests; no LLM/tool-arg-reachable path currently populates them. So there is no tainted-input-to-egress-target flow today. The seam is open and relies on the host never letting untrusted input stamp it — worth keeping in mind for C1 (when .agent run routing wires the host side), but not a present vulnerability. Unverified as a future risk only.

  • DeliverError forwards the raw runErr (run/executor.go:344-346): internal wrapped errors (e.g. fmt.Errorf("resolve model %q: %w", tier, err) at run/executor.go:136) are handed to the egress seam. Whether internal details (tier names, provider errors) leak to a public channel is entirely a function of the host's Delivery.DeliverError rendering — the harness passes the error to an interface explicitly designed to receive it. Not a harness-side disclosure bug; host implementations should redact. Minor observation, not material.

  • Critic receives raw tool args (run/critic.go:74-77, fed at run/executor.go:226): recordToolStart(call.Name, string(call.Arguments)) forwards tool call arguments (which may contain user data / secrets) to Ports.Critic. The Critic is a trusted host Port (run/ports.go:111), not an untrusted sink, so this is consistent with the existing audit/emitter wiring that already forwards the same data. No new exposure.

  • Deadline-watch goroutine (run/critic.go:42-61): only calls cancel() (the run's own context cancel) on a host-provided Deadline(); no external side effects, no injection surface. Safe.

  • detach(ctx) for delivery (run/executor.go:343): runs post-run egress on a context decoupled from caller cancellation, bounded by CleanupContextTimeout. This is the same detached-cleanup pattern already used for audit/Budget commits (run/executor.go:273,311) — no new authz or escape concern.

No blocking or material security issues in this change.

🎯 Correctness — Minor issues

Both findings survive verification. Outputting the corrected review.

Verdict: Minor issues

  • run/executor.go:224-226recordToolStart is fired after the tool has already executed, not at tool start. Confirmed in the checked-out code: the stepObserver callback receives agent.Step with s.Results already populated, and critic.recordToolStart(call.Name, string(call.Arguments)) (executor.go:226) runs inside the pairwise loop over completed calls/results — i.e. the iteration's tool calls have finished by the time it fires. The inline comment at executor.go:215 ("keep the critic's activity clock fresh") and the CriticHandle contract in run/ports.go:117-120 describe RecordStep/RecordToolStart as keeping the activity clock fresh, but this delivery is post-completion. Impact: during a genuinely long-running tool call the critic's activity clock is not refreshed, so the stall scenario the critic exists to detect (a hung tool) can look idle and trigger a premature hard-kill. This mirrors the existing emitter.toolStart pattern (also post-completion, executor.go:227), so it is not a regression introduced by this PR — but a host implementing CriticHandle.RecordToolStart against the documented contract gets a misleading signal. Suggested fix: rename/document to reflect post-iteration delivery, or wire a true pre-tool hook if majordomo exposes one.

  • run/critic.go:34-37 — redundant/dead soft <= 0 guard. Confirmed: e.cfg.Defaults.withFallbacks() (run in New) sets CriticSoftTimeout to 90s whenever it is <= 0 (executor.go:53-55), so by the time startCritic reads soft := e.cfg.Defaults.CriticSoftTimeout, soft is always positive and the if soft <= 0 { return nil, noop } branch is unreachable. Harmless functionally, but it silently diverges from the documented "nil-safe no-op" intent: a caller who explicitly set CriticSoftTimeout: -1 expecting to disable the critic gets the 90s fallback instead. Suggested fix: drop the guard, or decide on one documented disable mechanism.

🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

  • run/executor.go:126-183 (early-return paths skip delivery entirely). The new e.deliver(ctx, inv, res, runErr) is only called at the end of the happy-path-into-loop flow (:275). The early-return error branches — budget gate rejection (:129), model resolve error (:137), nil model (:143), toolbox build failure (:174), delegation-tool build failure (:182) — all return res without calling deliver. So when a host sets inv.DeliveryID and the run fails before the agent loop starts, the target gets nothing: no Result.Output and no DeliverError. The PR description says "DeliverError on failure," but that only holds for in-loop failures. TestDeliveryErrorPath explicitly asserts the early-return path produces zero deliveries and frames it as intentional, but from an error-handling-completeness standpoint a configured egress target silently swallowing a hard run failure is a real gap. If early failures should be surfaced to the target, route those returns through e.deliver (or at least DeliverError); if silence is truly intended, document it on deliver/DeliveryID so callers know egress is best-effort-only-after-the-loop-starts.

  • run/critic.go:34-36 (explicit CriticSoftTimeout <= 0 silently disables an enabled critic). startCritic returns (nil, noop) when soft <= 0. withFallbacks (executor.go:53-55) only fills a missing value (<= 0 → 90s), so this branch is only reachable if a caller somehow bypasses withFallbacks. Since New calls withFallbacks (executor.go:92), this is effectively dead in normal construction — but it means RunnableAgent.Critic.Enabled=true combined with a hand-constructed Executor (no New) and CriticSoftTimeout=0 silently disables the critic with no error. Low impact given the New guarantee; flagging as a sharp edge.

  • run/executor.go:338-348 deliver swallows both Deliver/DeliverError errors via _, _ = / _ =. This is explicitly documented as "best-effort + detached: a delivery failure must not change the run's outcome," so it's by design. No action needed; confirming it's intentional rather than an accidental swallow.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 34s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **VERDICT: No material issues found** Through the security lens (authn/authz, injection, SSRF, secret leakage, input validation, unsafe untrusted-data handling), I verified the following and found nothing materially wrong: - **Delivery target provenance** (`run/executor.go:339-348`, `tool/registry.go:181-182`): `DeliveryKind`/`DeliveryID` are documented as host-set and are not parsed/validated by the harness — the `deliver.Target` is explicitly host-interpreted (`deliver/deliver.go:18-23`). I grepped the whole repo for setters of these fields: the only writes are in tests; no LLM/tool-arg-reachable path currently populates them. So there is no tainted-input-to-egress-target flow today. The seam is open and relies on the host never letting untrusted input stamp it — worth keeping in mind for C1 (when `.agent run` routing wires the host side), but not a present vulnerability. *Unverified as a future risk only.* - **`DeliverError` forwards the raw `runErr`** (`run/executor.go:344-346`): internal wrapped errors (e.g. `fmt.Errorf("resolve model %q: %w", tier, err)` at `run/executor.go:136`) are handed to the egress seam. Whether internal details (tier names, provider errors) leak to a public channel is entirely a function of the host's `Delivery.DeliverError` rendering — the harness passes the error to an interface explicitly designed to receive it. Not a harness-side disclosure bug; host implementations should redact. Minor observation, not material. - **Critic receives raw tool args** (`run/critic.go:74-77`, fed at `run/executor.go:226`): `recordToolStart(call.Name, string(call.Arguments))` forwards tool call arguments (which may contain user data / secrets) to `Ports.Critic`. The Critic is a trusted host Port (`run/ports.go:111`), not an untrusted sink, so this is consistent with the existing audit/emitter wiring that already forwards the same data. No new exposure. - **Deadline-watch goroutine** (`run/critic.go:42-61`): only calls `cancel()` (the run's own context cancel) on a host-provided `Deadline()`; no external side effects, no injection surface. Safe. - **`detach(ctx)` for delivery** (`run/executor.go:343`): runs post-run egress on a context decoupled from caller cancellation, bounded by `CleanupContextTimeout`. This is the same detached-cleanup pattern already used for audit/Budget commits (`run/executor.go:273,311`) — no new authz or escape concern. No blocking or material security issues in this change. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> Both findings survive verification. Outputting the corrected review. **Verdict: Minor issues** - **`run/executor.go:224-226` — `recordToolStart` is fired *after* the tool has already executed, not at tool start.** Confirmed in the checked-out code: the `stepObserver` callback receives `agent.Step` with `s.Results` already populated, and `critic.recordToolStart(call.Name, string(call.Arguments))` (executor.go:226) runs inside the pairwise loop over completed calls/results — i.e. the iteration's tool calls have finished by the time it fires. The inline comment at executor.go:215 ("keep the critic's activity clock fresh") and the `CriticHandle` contract in `run/ports.go:117-120` describe `RecordStep`/`RecordToolStart` as keeping the activity clock fresh, but this delivery is post-completion. Impact: during a genuinely long-running tool call the critic's activity clock is *not* refreshed, so the stall scenario the critic exists to detect (a hung tool) can look idle and trigger a premature hard-kill. This mirrors the existing `emitter.toolStart` pattern (also post-completion, executor.go:227), so it is not a regression introduced by this PR — but a host implementing `CriticHandle.RecordToolStart` against the documented contract gets a misleading signal. Suggested fix: rename/document to reflect post-iteration delivery, or wire a true pre-tool hook if majordomo exposes one. - **`run/critic.go:34-37` — redundant/dead `soft <= 0` guard.** Confirmed: `e.cfg.Defaults.withFallbacks()` (run in `New`) sets `CriticSoftTimeout` to 90s whenever it is `<= 0` (executor.go:53-55), so by the time `startCritic` reads `soft := e.cfg.Defaults.CriticSoftTimeout`, `soft` is always positive and the `if soft <= 0 { return nil, noop }` branch is unreachable. Harmless functionally, but it silently diverges from the documented "nil-safe no-op" intent: a caller who explicitly set `CriticSoftTimeout: -1` expecting to disable the critic gets the 90s fallback instead. Suggested fix: drop the guard, or decide on one documented disable mechanism. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> ## VERDICT: Minor issues - **`run/executor.go:126-183` (early-return paths skip delivery entirely).** The new `e.deliver(ctx, inv, res, runErr)` is only called at the end of the happy-path-into-loop flow (`:275`). The early-return error branches — budget gate rejection (`:129`), model resolve error (`:137`), nil model (`:143`), toolbox build failure (`:174`), delegation-tool build failure (`:182`) — all `return res` without calling `deliver`. So when a host sets `inv.DeliveryID` and the run fails before the agent loop starts, the target gets *nothing*: no `Result.Output` and no `DeliverError`. The PR description says "`DeliverError` on failure," but that only holds for in-loop failures. `TestDeliveryErrorPath` explicitly asserts the early-return path produces zero deliveries and frames it as intentional, but from an error-handling-completeness standpoint a configured egress target silently swallowing a hard run failure is a real gap. If early failures should be surfaced to the target, route those returns through `e.deliver` (or at least `DeliverError`); if silence is truly intended, document it on `deliver`/`DeliveryID` so callers know egress is best-effort-only-after-the-loop-starts. - **`run/critic.go:34-36` (explicit `CriticSoftTimeout <= 0` silently disables an enabled critic).** `startCritic` returns `(nil, noop)` when `soft <= 0`. `withFallbacks` (`executor.go:53-55`) only fills a *missing* value (`<= 0` → 90s), so this branch is only reachable if a caller somehow bypasses `withFallbacks`. Since `New` calls `withFallbacks` (`executor.go:92`), this is effectively dead in normal construction — but it means `RunnableAgent.Critic.Enabled=true` combined with a hand-constructed `Executor` (no `New`) and `CriticSoftTimeout=0` silently disables the critic with no error. Low impact given the `New` guarantee; flagging as a sharp edge. - **`run/executor.go:338-348` `deliver` swallows both `Deliver`/`DeliverError` errors via `_, _ =` / `_ =`.** This is explicitly documented as "best-effort + detached: a delivery failure must not change the run's outcome," so it's by design. No action needed; confirming it's intentional rather than an accidental swallow. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 34s</sub>
steve added 1 commit 2026-06-27 14:09:24 +00:00
C0b: address verified gadfly findings (panic-safety + test honesty)
executus CI / test (pull_request) Failing after 58s
4aa06f652e
From PR #9 (minimax + deepseek):
- Run now has a top-level recover() — the "never propagates a panic" promise was
  unenforced; a panicking host Port (Critic/Audit/Palette) on the run goroutine
  now becomes Result.Err instead of unwinding into the caller.
- The critic deadline-watch goroutine recovers panics from a host Deadline()
  (it's a separate goroutine, so Run's recover can't catch it) — a buggy
  CriticHandle can't crash the process.
- CriticHandle interface documents its concurrency contract (Record*/Steer on the
  run goroutine vs Deadline()/Stop() from the watch goroutine — impls must be
  concurrent-safe; the critic battery already is).
- startCritic's dead `soft <= 0 -> noop` guard (withFallbacks already coerces to
  90s) replaced with a defensive inline 90s default, so a bypass of withFallbacks
  still gets a working critic instead of silently none.
- Delivery tests made honest: the old "error path" test only checked the
  early-return (no delivery); added TestDeliverErrorOnRunFailure (in-loop model
  error -> DeliverError to the target) + renamed the early-return test.

Graded all #9 findings in the gadfly MCP.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve added 1 commit 2026-06-27 14:10:57 +00:00
C0b: document recordToolStart post-iteration timing (gadfly glm finding)
executus CI / test (pull_request) Failing after 59s
executus CI / test (push) Failing after 1m1s
97154395e6
majordomo's step observer fires post-iteration, so the critic's activity clock
refreshes per-iteration, not mid-tool — a single long tool call won't refresh it
until it returns. Documented + the host-progress-bridge mitigation (mort's
pattern). A true pre-dispatch hook needs majordomo support (follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve closed this pull request 2026-06-27 14:27:19 +00:00
steve deleted branch phase-c0b-critic-delivery 2026-06-27 14:27:19 +00:00
Some checks are pending
executus CI / test (pull_request) Failing after 59s
executus CI / test (push) Failing after 1m1s

Pull request closed

Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/executus#9