feat(run): critic owns the deadline — MaxRuntime becomes the soft trigger #21

2026-06-30T15:04:16Z

steve commented

2026-06-30 15:04:16 +00:00

Problem

The executus migration ported mort's run-critic faithfully, but the kernel defeats it. run/executor.go applied context.WithTimeout(ctx, MaxRuntime) unconditionally — even for critic_enabled agents — and started the critic on a context already hard-capped at MaxRuntime. So a critic-enabled agent (mort's general, MaxRuntime=30m) was hard-killed at exactly 30 min; the critic's extendable backstop (MaxRuntime × multiplier, clamped to 6h) was structurally unreachable. The soft trigger also used a global Defaults.CriticSoftTimeout (90s) instead of the agent's MaxRuntime.

Live symptom in mort: a general @mention that delegated to the long-running animate skill (a legit 30-min render) was killed at MaxRuntime (agent: step 3: context deadline exceeded), cascading to kill the in-flight render.

Fix — restore the old agentexec two-tier semantics

run/executor.go: the WithTimeout(MaxRuntime) is now conditional on criticOwnsDeadline = Ports.Critic != nil && ra.Critic.Enabled.
- No critic (default): MaxRuntime stays a literal hard kill (→ timeout). Unchanged.
- Critic-owned: MaxRuntime becomes the SOFT trigger; the run is wrapped in a generous WithTimeout at the new Defaults.CriticAbsoluteMax (default 6h) as a failsafe ceiling only — it never fires before the critic's backstop, and guarantees a broken/nil host handle can't run unbounded. The critic's extendable deadline (watched in startCritic, cancels via cancelCause) is the real deadline.
run/critic.go: startCritic takes the resolved MaxRuntime as the soft trigger (falls back to Defaults.CriticSoftTimeout, then 90s).
Defaults.CriticAbsoluteMax added (withFallbacks default 6h).

Tests (`run/critic_deadline_test.go`)

TestNoCritic_MaxRuntimeIsHardCap — legacy contract preserved: a non-critic run whose work (200ms tool) outlasts MaxRuntime (20ms) ends in error.
TestCriticOwnsDeadline_SurvivesPastMaxRuntime — the fix: an identical critic-owned run completes.
TestCriticSoftTriggerIsMaxRuntime — the soft trigger handed to Monitor is the agent's MaxRuntime, not Defaults.CriticSoftTimeout.

Full go test ./... green. The mort host adapter (executushost/critic_adapter.go) already clamps the per-run backstop to its convar-driven AbsoluteMax, so the kernel ceiling is a redundant failsafe.

🤖 Generated with Claude Code

## Problem The executus migration ported mort's run-critic faithfully, but the kernel defeats it. `run/executor.go` applied `context.WithTimeout(ctx, MaxRuntime)` **unconditionally** — even for `critic_enabled` agents — and started the critic on a context already hard-capped at `MaxRuntime`. So a critic-enabled agent (mort's `general`, `MaxRuntime=30m`) was hard-killed at exactly 30 min; the critic's extendable backstop (`MaxRuntime × multiplier`, clamped to 6h) was structurally unreachable. The soft trigger also used a global `Defaults.CriticSoftTimeout` (90s) instead of the agent's `MaxRuntime`. Live symptom in mort: a `general` @mention that delegated to the long-running `animate` skill (a legit 30-min render) was killed at `MaxRuntime` (`agent: step 3: context deadline exceeded`), cascading to kill the in-flight render. ## Fix — restore the old agentexec two-tier semantics - **`run/executor.go`**: the `WithTimeout(MaxRuntime)` is now conditional on `criticOwnsDeadline = Ports.Critic != nil && ra.Critic.Enabled`. - **No critic (default):** `MaxRuntime` stays a literal hard kill (→ `timeout`). Unchanged. - **Critic-owned:** `MaxRuntime` becomes the SOFT trigger; the run is wrapped in a generous `WithTimeout` at the new `Defaults.CriticAbsoluteMax` (default 6h) as a **failsafe ceiling only** — it never fires before the critic's backstop, and guarantees a broken/nil host handle can't run unbounded. The critic's extendable deadline (watched in `startCritic`, cancels via `cancelCause`) is the real deadline. - **`run/critic.go`**: `startCritic` takes the resolved `MaxRuntime` as the soft trigger (falls back to `Defaults.CriticSoftTimeout`, then 90s). - **`Defaults.CriticAbsoluteMax`** added (withFallbacks default 6h). ## Tests (`run/critic_deadline_test.go`) - `TestNoCritic_MaxRuntimeIsHardCap` — legacy contract preserved: a non-critic run whose work (200ms tool) outlasts `MaxRuntime` (20ms) ends in error. - `TestCriticOwnsDeadline_SurvivesPastMaxRuntime` — the fix: an identical critic-owned run **completes**. - `TestCriticSoftTriggerIsMaxRuntime` — the soft trigger handed to `Monitor` is the agent's `MaxRuntime`, not `Defaults.CriticSoftTimeout`. Full `go test ./...` green. The mort host adapter (`executushost/critic_adapter.go`) already clamps the per-run backstop to its convar-driven `AbsoluteMax`, so the kernel ceiling is a redundant failsafe. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

steve added 1 commit 2026-06-30 15:04:17 +00:00

feat(run): critic owns the deadline — MaxRuntime becomes the soft trigger

executus CI / test (pull_request) Successful in 47s

Details

Adversarial Review (Gadfly) / review (pull_request) Successful in 23m4s

Details

5b5ee4148e

When a run enables the critic (Ports.Critic set + RunnableAgent.Critic.Enabled),
the kernel no longer hard-caps it at MaxRuntime. MaxRuntime becomes the SOFT
trigger (passed to startCritic, used by the host critic as its wake + the base
for its extendable backstop); the critic's deadline-watch is the real hard
cancel. This restores mort's old agentexec two-tier timeout semantics — a
slow-but-progressing run (e.g. a parent agent blocked on a 30-min animate render)
is given room up to the critic's backstop instead of being killed at the nominal
MaxRuntime.

Specifics:
- run/executor.go: the WithTimeout(MaxRuntime) is now conditional. Non-critic
  runs keep the literal MaxRuntime kill (→ "timeout"). Critic-owned runs get a
  GENEROUS WithTimeout at the new Defaults.CriticAbsoluteMax (default 6h) as a
  failsafe ceiling only — it never fires before the critic's backstop, and it
  guarantees a broken/nil host handle can't run unbounded.
- run/critic.go: startCritic takes the resolved MaxRuntime as the soft trigger
  (falling back to Defaults.CriticSoftTimeout, then 90s), instead of always using
  the global CriticSoftTimeout.
- Defaults.CriticAbsoluteMax added (withFallbacks default 6h).
- Tests: non-critic dies at MaxRuntime; critic-owned survives past it; soft
  trigger == MaxRuntime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X

gitea-actions bot commented

2026-06-30 15:04:18 +00:00

🪰 Gadfly — live review status

6/6 reviewers finished · updated 2026-06-30 15:49:58Z

`claude-code/opus` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — Minor issues
✅ maintainability — No material issues found
✅ performance — No material issues found
✅ error-handling — No material issues found

`claude-code/opus:max` · claude-code — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — Minor issues
✅ performance — No material issues found
✅ error-handling — No material issues found

`deepseek-v4-pro:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — No material issues found
✅ performance — No material issues found
✅ error-handling — No material issues found

`glm-5.2:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
✅ correctness — No material issues found
✅ maintainability — No material issues found
✅ performance — No material issues found
✅ error-handling — No material issues found

`kimi-k2.6:cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
⚠️ correctness — could not complete
✅ maintainability — No material issues found
✅ performance — No material issues found
✅ error-handling — No material issues found

`qwen3.5:397b-cloud` · ollama-cloud — ✅ done

✅ security — No material issues found
⚠️ correctness — could not complete
✅ maintainability — Minor issues
✅ performance — Minor issues
✅ error-handling — No material issues found

_{Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.}

## 🪰 Gadfly — live review status 6/6 reviewers finished · updated 2026-06-30 15:49:58Z #### `claude-code/opus` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — No material issues found - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `claude-code/opus:max` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `deepseek-v4-pro:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — No material issues found - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `glm-5.2:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — No material issues found - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `kimi-k2.6:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ⚠️ **correctness** — could not complete - ✅ **maintainability** — No material issues found - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `qwen3.5:397b-cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ⚠️ **correctness** — could not complete - ✅ **maintainability** — Minor issues - ✅ **performance** — Minor issues - ✅ **error-handling** — No material issues found Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.

gitea-actions bot commented

2026-06-30 15:27:21 +00:00

🪰 Gadfly review — consensus across 6 models

Verdict: Minor issues · 5 findings (0 with multi-model agreement)

5 single-model findings (lower confidence)

	Finding	Where	Model	Lens
🟠	Nested 24h timeout wastes timer resources in unsupervised-run failsafe path	`run/executor.go:315`	qwen3.5:397b-cloud	performance
🟡	softTrigger=MaxRuntime is overloaded as the default battery's idle-stall window (critic.go:262), delaying hang detection from ~90s to a full MaxRuntime; comment 'reviews once the run exceeds its nominal budget' misdescribes the idle-based default behavior	`run/critic.go:44`	claude-code/opus	correctness
🟡	Unusual defer-in-conditional pattern in unsupervised failsafe	`run/executor.go:336`	qwen3.5:397b-cloud	maintainability
⚪	Unreachable 90s defensive fallback reintroduces a bare magic literal that was previously a named default (CriticSoftTimeout)	`run/critic.go:56`	claude-code/opus:max	maintainability
⚪	Comment references external system convar value	`run/executor.go:37`	qwen3.5:397b-cloud	maintainability

Per-model detail

claude-code/opus (claude-code) — Minor issues

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

I reviewed this change strictly through the Security lens (authz, injection, SSRF, deserialization, secret leakage, input validation, untrusted-data handling, insecure defaults, resource/DoS bounding). This is a timeout-semantics change in run/executor.go + run/critic.go; it touches no parsing, no authz, no external I/O, and no secret handling. The one security-relevant axis is resource bounding / availability, which I traced carefully:

Critic-owned runs no longer hard-capped at MaxRuntime. Verified executor.go:304-340: for a critic-owned run the hard ceiling becomes Defaults.CriticAbsoluteMax (default 24h, executor.go:65-67). The effective bound is normally the critic's extendable backstop, enforced by the deadline-watch in critic.go:62-90. I confirmed the shipped default handle (critic/critic.go:143) always initializes deadline = now + softTimeout × backstopMul — i.e. never zero — so a healthy supervised run is always bounded well below the 24h ceiling. The 24h ceiling is a failsafe, not the normal bound.
Unsupervised failsafe. executor.go:336-340 re-tightens to MaxRuntime when the host Monitor returns a nil handle, preventing an enabled-but-unarmed run from holding a worker slot up to 24h. Verified the criticOwns && critic == nil predicate matches startCritic's nil-handle return (critic.go:51-61), so the two can't drift (shared criticOwnsDeadline). The test TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime exercises exactly this.
Caller cancellation still propagates. executor.go:319 re-merges the caller ctx via MergeCancellation after the WithoutCancel detach, so a cancelled/disconnected caller still tears the run down — no detach regression that would let an abandoned run linger.
Insecure-default check on the new constant. 24h CriticAbsoluteMax is a deliberately-loose failsafe ceiling, not a per-run budget; the host (mort) clamps its own backstop to 6h and the default battery's backstop is far smaller. I re-derived that the ceiling only bites for a misbehaving custom host handle that returns a permanently zero Deadline() — which contradicts the shipped default's behavior and is explicitly called out as defence-in-depth. Not a material insecure default.

One non-blocking observation (defense-in-depth only, not a finding): the unsupervised failsafe keys on critic == nil, so it does not cover a custom host handle that arms but leaves Deadline() zero forever — such a run would be bound only by the 24h ceiling. The shipped default never produces that state, so this is hypothetical and arguably a correctness/robustness concern rather than security; I'm not raising it as a finding.

Nothing in my lane is materially wrong.

🎯 Correctness — Minor issues

I've verified the finding against the actual source. Every cited line checks out:

critic/critic.go:262 — the default battery's tick gates escalation on idle := now - lastActivity; if idle < h.softTimeout { skip }. Confirmed it is idleness-based, not elapsed-runtime-based.
critic/critic.go:143 — deadline = now.Add(softTimeout × backstopMul), confirming softTimeout doubles as the backstop basis.
run/critic.go:43-45 — the doc comment does assert "the critic first reviews once the run exceeds its nominal budget," and the sole caller passes maxRuntime as softTrigger, so the single value now feeds both the idle-stall window and the backstop. Confirmed.
The removal of CriticSoftTimeout (was default 90s, doc "idle window before the critic wakes") is confirmed from the diff; the idle window is now forced to equal MaxRuntime.

The semantics/documentation concern is real and confirmable, so it survives.

VERDICT: Minor issues

The two-tier rewrite in run/executor.go is logically sound and I could not find a hard logic bug. The context layering is correct: in the critic-owned path the run is bounded by CriticAbsoluteMax (runaway guard) + the critic's cancelCause-driven deadline-watch; in the unsupervised-handle path it's correctly tightened back to maxRuntime. The removal of CriticSoftTimeout is clean — nothing fails to compile.

One substantive correctness/semantics concern:

run/critic.go:43-45 (and the new executor.go call passing maxRuntime as softTrigger) — the soft-trigger value is overloaded, and the comment misdescribes the shipped default's behavior. The doc says "the critic first reviews once the run exceeds its nominal budget." But the shipped default critic battery does not review on elapsed-runtime; it escalates on idleness: critic/critic.go:262 gates on idle := now - lastActivity; if idle < h.softTimeout { skip }. So the same value now plays two roles for the default battery: (a) the stall-detection idle window, and (b) the backstop basis (deadline = now + softTimeout × backstopMul, critic.go:143). Before this PR the idle window was the independent CriticSoftTimeout (default 90s, doc: "idle window before the critic wakes"); this PR forces it to equal MaxRuntime. Concrete effect for a default-battery host with a 30-min agent: a genuinely hung run is no longer flagged after ~90s of inactivity but only after a full 30 minutes of inactivity — the watchdog's primary stall-detection purpose is materially weakened, and a host can no longer express "review after 90s idle but allow up to MaxRuntime total" because the knob that separated those two meanings was deleted. This may be a deliberate trade-off (it also prevents false-killing a legit long single tool call, which is the bug the PR targets), but the comment asserting "reviews once the run exceeds its nominal budget" is inaccurate for the default battery. Suggest either keeping a distinct idle-window knob or correcting the comment to describe the actual idle-based semantics. Verified by reading critic/critic.go:125-149 (Monitor) and :251-268 (tick/idle gate).

Non-blocking note (not a code defect): the PR description says CriticAbsoluteMax defaults to 6h, but the code defaults to 24h (executor.go withFallbacks, and the struct comment consistently says 24h). The code is internally consistent; only the PR prose is wrong.

🧹 Code cleanliness & maintainability — No material issues found

I've now verified everything. The commit message is decisive on the draft's only finding.

The draft's single finding rests on "concrete evidence this already drifted: the PR description repeatedly says the default is 6h, while the code... say 24h." That evidence is false. The commit body shows the PR deliberately changed CriticAbsoluteMax from 6h to 24h ("CriticAbsoluteMax was 6h... Now CriticAbsoluteMax is a 24h RUNAWAY guard"), precisely because 6h was too low. The 6h that remains in the comments is mort's host convar (mort's 6h convar), a deliberately distinct value from the 24h executus default — the comments correctly distinguish the two. There is no drift; the PR documents an intentional value change.

Stripped of that false evidence, the finding reduces to "the constant 24h is also mentioned in a third comment" (lines 37, 66, 294) — a trivial, speculative doc nit the draft itself concedes is partly house pattern. It cannot be confirmed as a material issue, so it drops. The draft's positive prose (rename complete — no CriticSoftTimeout refs remain; single caller at executor.go:327 passing maxRuntime; maxRuntime>0 guaranteed by withFallbacks) all checks out, but those are not findings.

With the only finding dropped, the verdict resets.

VERDICT: No material issues found

This PR is small, focused, and well-documented. The CriticSoftTimeout → CriticAbsoluteMax rename is complete (no stale references remain in any .go, README.md, CLAUDE.md, or examples/). startCritic has a single caller (run/executor.go:327) that passes the resolved maxRuntime, which withFallbacks (executor.go:50) guarantees is >0. The criticOwnsDeadline extraction (critic.go:29) is a clean dedup shared by Run and startCritic. The new test helpers are well-factored and the capturingCritic interface-typed handle correctly exercises the nil-handle path.

The "default 24h vs 6h" concern in the draft does not hold: the comments deliberately distinguish the executus default ceiling (24h) from mort's host backstop clamp (6h), and the PR intentionally moved the default from 6h to 24h — there is no drift.

⚡ Performance — No material issues found

I have enough to assess the performance lens. Let me confirm one thing about the deadline-watch cadence relative to the new much-longer-lived runs.

The relevant facts I verified by reading the code:

run/critic.go:14 — criticDeadlineCheck = time.Second; the watch goroutine (startCritic, lines 63–90) ticks once/sec doing a cheap h.Deadline() compare. This is pre-existing and unchanged by the PR.
run/executor.go:315 — the new context.WithTimeout(..., hardCap) and the failsafe re-wrap at :338 are O(1) per run, with their timers stopped via defer cancelTimeout() / defer cancelUnsupervised(). No timer/goroutine leak.

VERDICT: No material issues found

Reviewing strictly through the ⚡ Performance lens:

The change is timeout-ownership plumbing. Every construct it adds — criticOwnsDeadline (a two-field predicate), one extra context.WithTimeout at hardCap, and the conditional unsupervised re-wrap — executes once per Run, not inside the agent loop or any tool/step hot path. No new per-iteration work, allocation, or copy in stepObserver (executor.go:353).
The new 24h CriticAbsoluteMax WithTimeout does not create a lingering timer: Go schedules it via time.AfterFunc and defer cancelTimeout() stops it when Run returns, so a fast-completing run pays nothing measurable. Same for the failsafe's cancelUnsupervised.
The 1-second deadline-watch ticker is the only recurring cost, and the PR neither adds it nor changes its cadence. It is one cheap goroutine per critic-owned run (a small population — mort's general, etc.), so the fact that such runs may now legitimately live to the backstop (minutes/hours) instead of being killed at MaxRuntime adds no per-tick cost beyond an already-trivial Deadline() comparison. Not a regression I can justify flagging.
No N+1, no unbounded growth, no blocking call introduced on a hot path. The longer-lived runs are the intended behavior and are explicitly bounded (critic backstop, or maxRuntime failsafe when unsupervised), so there's no unbounded-resource-hold concern within this lens.

Nothing in the performance lane is materially wrong here.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

I reviewed this PR strictly through the error-handling & edge-cases lens, verifying the diff against the checked-out code rather than the hunks alone.

What I verified:

Nil-handle / unsupervised path is correctly handled. startCritic returns (nil, noop) both when !criticOwnsDeadline and when Monitor returns a nil handle (run/critic.go:51, :59). The re-wrap guard at run/executor.go:336 uses the same criticOwnsDeadline predicate, so the "critic enabled but no handle armed" case correctly tightens the ceiling back from 24h to maxRuntime. TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime exercises exactly this (nil interface handle → falls back). No drift between the two decision sites.
Degenerate MaxRuntime > CriticAbsoluteMax is guarded (run/executor.go:311-313): hardCap is clamped up to maxRuntime, so the ceiling can never be shorter than the nominal budget. No off-by-direction bug.
Non-positive softTrigger has the 90s floor (run/critic.go:55); withFallbacks guarantees maxRuntime > 0 (:50-52) so it's defensive-only, as commented — not dead-but-wrong.
Panic isolation intact. The deadline-watch goroutine recovers from a host Deadline()/KillCause() panic on its own goroutine (run/critic.go:67); a panic in the deferred stopCritic→h.Stop() unwinds into the top-level recover() at executor.go:139 (registered first, runs last), so it still becomes Result.Err rather than crashing.
No cleanup/leak introduced. Every new context layer (cancelTimeout, cancelCause, mergeCancel, cancelUnsupervised) is defer-cancelled. checkpointCause is correctly set after the unsupervised re-wrap (executor.go:344) so it reads the context the loop actually runs on.
No dangling references. The renamed CriticSoftTimeout field has zero remaining references in .go files; the only startCritic call site was updated to the new 5-arg signature. The diff compiles cleanly across files.

Non-blocking observation (not a finding): the unsupervised-failsafe only tightens the ceiling when the handle is nil. A handle that is non-nil but never advances Deadline() (a "live but lazy/broken" host critic) is bounded only by CriticAbsoluteMax (kernel default 24h). This is the documented runaway-guard behavior and is exercised by TestCriticOwnsDeadline_SurvivesPastMaxRuntime, so it's intentional, not a bug — I note it only because the asymmetry (nil → maxRuntime; live-but-zero-deadline → 24h) is a real edge case a host should be aware of.

Outside my lens: the PR description says CriticAbsoluteMax defaults to 6h, but the code defaults to 24h (executor.go:66) — a description/doc mismatch worth a correctness/docs pass.

claude-code/opus:max (claude-code) — Minor issues

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

This change is purely about timeout/deadline ownership semantics for critic-owned runs. I've verified the key paths (run/critic.go:29-95, run/executor.go:304-344).

Through the security lens specifically:

No authn/authz surface touched — this is internal run-lifecycle wiring, no caller-identity or permission decision changes.
No injection / untrusted-data sink introduced. The only place untrusted-ish data flows is critic.recordToolStart(call.Name, string(call.Arguments)) (executor.go:371) and the KillCause().Error() string wrapped into ErrCriticKill (critic.go:82) — both are forwarded to host-side handles / error chains, not into a shell, SQL, path, or template. Pre-existing behavior, unchanged by this diff.
No deserialization, SSRF, or secret handling in scope.
Insecure-default consideration (resource bounding): the diff replaces the unconditional MaxRuntime hard cap with a CriticAbsoluteMax (default 24h) ceiling for critic-owned runs. This loosens the worst-case resource-holding bound, but it is deliberate and still bounded: the nil-handle failsafe (executor.go:336-340) re-tightens to MaxRuntime when no supervisor arms, the ceiling is never shorter than MaxRuntime, and a healthy host clamps its own backstop well below the ceiling. This is an availability/DoS-resilience tuning question (24h is a long slot to hold), but it does not constitute a security regression in my lane — it's a resource-management/correctness concern that the correctness/availability lens owns.

Nothing in the security lens is materially wrong.

VERDICT: No material issues found

🎯 Correctness — No material issues found

I've verified the full two-tier timeout flow against the actual code (run/executor.go, run/critic.go, the critic battery in critic/critic.go, and the Critic port + CriticConfig contracts). Here is my correctness review.

VERDICT: No material issues found

The two-tier timeout logic is correct and, importantly, consistent with the documented design intent — which is what makes this a genuine fix rather than a behavior change.

What I checked and confirmed sound:

criticOwnsDeadline gating is consistent across both call sites. executor.go:304 (choosing hardCap) and critic.go:51 (the arm gate) use the same predicate, so the "did we widen the cap?" decision and the "did we arm the watch?" decision can't disagree. (Read both.)
Unsupervised-run failsafe is correct. criticOwns && critic == nil (executor.go:336) is reachable only when Monitor returned a nil handle (the other nil-return path in startCritic requires !criticOwns). Re-wrapping runCtx at maxRuntime correctly tightens an armed-but-unsupervised run back to the nominal budget. The checkpointCause closure is set after this re-wrap (executor.go:344), so it reads the context the loop actually runs on. Verified against the test TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime.
hardCap < maxRuntime clamp (executor.go:311) correctly handles the degenerate MaxRuntime > CriticAbsoluteMax config.
Soft trigger = MaxRuntime is the correct value, not a regression. I was suspicious that flipping soft from the dedicated CriticSoftTimeout (90s) to MaxRuntime would mis-drive the battery, since critic/critic.go uses softTimeout for three things: the backstop base (deadline = now + soft×mul, line 143), the idle-escalation threshold (line 262), and the poll interval (soft/2, line 132). But the pre-existing contract already nails the intended meaning: CriticConfig.BackstopMultiplier is documented (run/agent.go:83) as "a multiple of the soft trigger (MaxRuntime)". So the old code passing CriticSoftTimeout was the value that contradicted the design; this PR aligns it. The backstop now correctly resolves to MaxRuntime × multiplier, matching the PR's stated semantics.
Field removal is clean. grep for CriticSoftTimeout across the whole module returns zero hits — no dangling references, no compile break from dropping the field.

Two non-blocking observations (neither is a code defect):

PR-description / code default mismatch. The PR description repeatedly says CriticAbsoluteMax defaults to 6h, but the code (executor.go:66) and its own doc comment say 24h. The code is internally consistent and is the correct choice: the runaway ceiling must sit strictly above the host's own backstop clamp (mort's 6h), so 24h > 6h is right and a 6h default would be wrong (it could pre-empt a healthy supervised run). Only the PR prose is stale — worth fixing the description so a future reader doesn't "correct" the code down to 6h.
Doc nuance in critic.go:44. The comment says the critic "first reviews once the run exceeds its nominal budget," but the battery actually escalates on idle time (idle >= softTimeout), not total elapsed — a progressing run whose activity clock keeps refreshing may never be reviewed before the backstop. This is the battery's unchanged behavior, not something this PR alters; the wording just slightly oversells when a review fires. Cosmetic.

I did not find any introduced logic bug in my lens.

🧹 Code cleanliness & maintainability — Minor issues

All claims verified against the code:

run/critic.go:54-57: the soft <= 0 → 90s branch confirmed — soft = softTrigger, and the sole caller (executor.go:327) passes maxRuntime, which executor.go:158-160 resolves to either ra.MaxRuntime or Defaults.MaxRuntime. New runs withFallbacks() (executor.go:104) guaranteeing Defaults.MaxRuntime >= 60s (executor.go:50-51). So softTrigger is always > 0 and the branch is unreachable. The bare 90 * time.Second literal replaces the former named CriticSoftTimeout field. Confirmed; only one caller exists.
No stale CriticSoftTimeout references remain (grep clean).
executor.go:66 defaults CriticAbsoluteMax to 24h; the field doc (executor.go:38) cites mort's 6h host clamp. Internally consistent, so the "6h" PR-description note holds.

The finding survives. Outputting the corrected review.

VERDICT: Minor issues

Through the cleanliness/maintainability lens this is a careful, well-structured change. The new criticOwnsDeadline helper (run/critic.go:29) sensibly factors out the predicate so Run and startCritic can't drift, names are clear (hardCap, criticOwns, softTrigger), and the removed CriticSoftTimeout field is gone cleanly — no stale references remain in .go, .md, or examples/ (verified via grep). Tests are descriptively named and commented. Only small notes:

run/critic.go:54-57 — the soft <= 0 → 90s fallback is now unreachable dead code with a bare magic literal. Verified: the sole caller (run/executor.go:327) always passes maxRuntime, which is resolved at executor.go:158-160 from ra.MaxRuntime or e.cfg.Defaults.MaxRuntime, and cfg.Defaults = cfg.Defaults.withFallbacks() (executor.go:104) guarantees MaxRuntime >= 60s (executor.go:50-51). So softTrigger is always > 0 and the branch never fires. Previously this 90s lived as a named, fallback-defaulted field (CriticSoftTimeout); collapsing it into an inline literal inside an unreachable branch is a minor regression in maintainability. The code even documents it as dead twice (the doc comment at critic.go:47 and the inline comment at critic.go:56). Low-churn fix: either drop the branch and assert/document the precondition, or keep it but lift 90 * time.Second to a named const so the magic number isn't bare. Trivial; the current form is harmless and clearly commented.
PR-description vs. code drift (not a code defect): the description says CriticAbsoluteMax defaults to "6h", but the code defaults to 24h (executor.go:66, and the field doc at executor.go:38 and inline comment at executor.go:294 both say 24h — internally consistent). The "6h" in the description is mort's host clamp, not the kernel default. Worth correcting the PR text so future readers aren't misled; nothing to change in code.

No other cleanliness problems: the triple documentation of the two-tier semantics (field doc, the Run block comment at executor.go:279-303, and the startCritic doc) is verbose but each block targets a distinct reader and matches this repo's documented // Why: house style, so I'm not flagging it as a defect.

⚡ Performance — No material issues found

I've verified the relevant code. The polling goroutine (criticDeadlineCheck = time.Second, run/critic.go:14) is pre-existing and untouched by this diff — the soft-trigger change (CriticSoftTimeout → MaxRuntime) doesn't alter its cadence. The unsupervised failsafe adds one trivial extra context.WithTimeout layer. No new allocations in hot loops, no N+1, no quadratic behavior.

The one performance-adjacent effect is that critic-owned runs now hold their lane/worker slot up to the extendable backstop (host-clamped 6h) instead of being hard-killed at MaxRuntime (e.g. 30m) — and the per-run 1s-ticker goroutine therefore lives proportionally longer (~21k ticks over 6h vs ~1.8k over 30m). But (a) that longer hold is the deliberate, documented intent of the change, not an accidental regression; (b) the per-tick cost is a single Deadline()/KillCause() read — negligible even across many concurrent runs; and (c) the unbounded-hold risk is explicitly bounded by the CriticAbsoluteMax ceiling and the unsupervised-failsafe re-tighten. None of this rises to a material efficiency problem in my lane.

VERDICT: No material issues found

No N+1, no work added inside the agent loop, no unbounded growth introduced, no new blocking calls on the hot path. The deadline-watch ticker is unchanged by this diff and its cost is immaterial. The longer slot-hold for critic-owned runs is intended behavior, bounded by the runaway ceiling and the nil-handle failsafe.

🧯 Error handling & edge cases — No material issues found

I've verified the change against the checked-out code. Summary of what I checked through the error handling & edge cases lens:

cfg.Defaults is normalized at construction (run.New calls withFallbacks(), executor.go:104), so CriticAbsoluteMax is guaranteed >0 (24h) before it's ever used as a WithTimeout duration at executor.go:315. No zero-duration / instantly-expired context risk.
maxRuntime is guaranteed >0 (executor.go:158-161 → Defaults.MaxRuntime floor of 60s), so the soft trigger passed to startCritic and the hardCap < maxRuntime clamp (executor.go:311) are well-defined. The soft <= 0 → 90s guard in startCritic (critic.go:55) is genuinely just defensive.
The unsupervised-run failsafe (executor.go:336-340) correctly re-evaluates criticOwns so it only fires on the nil-handle case (not the no-critic case), and tightens the cap back to maxRuntime. cancelUnsupervised is deferred — no context leak. Verified the nil-handle path in capturingCritic.Monitor returns a nil interface, so the test exercises it.
checkpointCause reordering (moved to executor.go:344, after the re-wrap) reads the actual loop context; the top-level defer is nil-safe (checkpointCause != nil, executor.go:144), so the early-return/panic window before assignment is handled.
All criticBinding methods are nil-receiver-safe (critic.go:97-139), so the critic == nil failsafe path doesn't panic on recordStep/maxStepsOption/drainSteer.
No stale references to the removed CriticSoftTimeout field anywhere in the repo (grep clean); status classification (statusFor) is untouched and still nil-safe.
Deadline-watch goroutine has a recover() against a host Deadline() panic (critic.go:67), and exits on both done and runCtx.Done() — no goroutine leak.

The unhappy paths this diff introduces (zero/negative durations, nil critic handle, degenerate MaxRuntime > AbsoluteMax, cleanup on every exit) are each handled.

One pre-existing item I noticed but is outside this diff and outside my lens (correctness): a natural backstop expiry cancels via cancelCause(context.DeadlineExceeded) (critic.go:84), but statusFor only consults context.Cause for ErrCriticKill (executor.go:529) — so a natural backstop expiry likely classifies as cancelled, not the timeout the critic.go:39 comment promises. Not introduced here; flagging only as it brushes the deadline logic.

VERDICT: No material issues found (error handling & edge cases lens)

The two-tier timeout change is defensively solid: durations are floored before use, the nil-handle and no-critic paths are both bounded, contexts are cancelled on every exit path, and the new test file exercises the hard-cap, soft-trigger, survival, and nil-handle-fallback edge cases.

deepseek-v4-pro:cloud (ollama-cloud) — No material issues found

Verdict: No material issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

I reviewed this change through the security lens: authn/authz, injection, SSRF, unsafe deserialization, secret leakage, input validation of untrusted data, and insecure defaults. The diff is purely about context-timeout management — converting MaxRuntime from a hard WithTimeout kill to a soft trigger when a critic owns the deadline, with a generous CriticAbsoluteMax runaway ceiling as defense-in-depth.

No injection vectors: no SQL, command, path, or template construction is introduced. The only "input" flowing through new paths is maxRuntime (a time.Duration), which is passed to the host Critic.Monitor interface — the host is responsible for its own validation.
No authn/authz gaps: the criticOwnsDeadline predicate gates the two-tier path on Ports.Critic != nil && ra.Critic.Enabled, both controlled by the host/agent config, not by untrusted user input.
No secret/credential leakage: no secrets are handled, logged, or transmitted.
No SSRF or unsafe deserialization: no network calls or deserialization are added.
Insecure defaults: the CriticAbsoluteMax default is 24h (not 6h as the PR description states — the code comment explains this is intentional: the host clamps to a smaller value, and the kernel ceiling must sit well above it). A 24h runaway ceiling for a broken critic is generous but is a deliberate defense-in-depth choice, not a vulnerability — the host's own clamp (e.g., mort's 6h convar) is the real bound. I verified the code at run/executor.go:65-66 and the comment at lines 32-42.
Unsupervised-run failsafe (executor.go:336-339): when the critic is enabled but Monitor returns nil, the run context is re-wrapped with the nominal MaxRuntime timeout. This correctly prevents an unsupervised run from consuming resources up to the 24h ceiling. I verified that startCritic launches no goroutine in this path (critic.go:58-60), so no dangling watcher exists on the old context.
TOCTOU: the criticOwnsDeadline predicate is called once at executor.go:304 and again inside startCritic at critic.go:51 — the same predicate gates both the ceiling choice and the arm/no-op decision, so they cannot drift.

No material security issues found.

🎯 Correctness — No material issues found

Verdict: No material issues found

I traced the full context chain, the two-tier timeout logic, the unsupervised failsafe, the checkpointCause timing, the statusFor classification, and the deadline-watch goroutine interaction. The logic is sound:

criticOwnsDeadline is the single predicate used in both Run (hard cap selection) and startCritic (arm/no-op gate) — they cannot drift.
Hard cap selection: non-critic → maxRuntime; critic-owned → max(CriticAbsoluteMax, maxRuntime). The floor guard handles degenerate configs correctly.
startCritic soft trigger: now receives the resolved maxRuntime (always > 0 via withFallbacks), with a 90s defensive floor that is genuinely unreachable from the sole caller.
Unsupervised failsafe: when criticOwns && critic == nil, runCtx is re-wrapped with WithTimeout(runCtx, maxRuntime). The child deadline is tighter than the parent's CriticAbsoluteMax, so it fires first — correct. The cancelCause function (which references the pre-wrap parent) is never called with a cause in this path because the deadline-watch goroutine was never started.
checkpointCause closure is set after the unsupervised re-wrap, capturing the context the loop actually runs on. No early returns exist between the old and new positions.
statusFor correctly classifies the unsupervised timeout as "timeout" via errors.Is(runErr, context.DeadlineExceeded).
Defaults.CriticSoftTimeout removal is clean — zero remaining references confirmed by grep.

One minor documentation discrepancy (not a code bug): the PR description says CriticAbsoluteMax default is 6h, but the code has 24h. The code is internally consistent (the 24h comment explains it must be "FAR beyond any realistic backstop" since the host clamps to 6h), so this is a PR-description error, not a logic defect.

🧹 Code cleanliness & maintainability — No material issues found

Verdict: No material issues found

The change is clean and well-structured through the maintainability lens:

criticOwnsDeadline is a well-extracted single predicate, used in both Run and startCritic, preventing the two decision points from drifting — exactly the right pattern.
CriticSoftTimeout removal is complete: zero remaining references across the repo (confirmed via grep). No dead config field left behind.
Defaults struct comment for CriticAbsoluteMax is thorough and explains the "why" (runaway guard, host clamps lower, never pre-empts a healthy run). The Run method comment repeats some of this, but that's appropriate for a caller-side explanation vs. a field-level contract.
checkpointCause reassignment after the unsupervised-failsafe re-wrap is correctly ordered and explicitly commented, so a future reader won't mistakenly move it back above the re-wrap.
Test file (critic_deadline_test.go) is well-organized: helpers (slowToolInvocation, slowModel, capturingCritic) are scoped to the file, tests are named clearly, and the capturingCritic type is a purposeful variant of the existing fakeCritic (captures soft trigger vs. ignores it) — not copy-paste.
The 90s defensive floor in startCritic is explicitly documented as unreachable by the sole caller; it's dead code by construction but the comment makes the intent unambiguous, so it won't confuse a future maintainer.

No duplicated logic, no leaky abstractions, no confusing names, no deeply nested or overly long functions introduced by this diff.

⚡ Performance — No material issues found

I've verified the full code in run/executor.go and run/critic.go, confirmed there are no other callers of startCritic, and confirmed CriticSoftTimeout is fully removed (no stale references). Let me now assess through the performance lens.

The changes are all per-run (not per-step or per-tool):

criticOwnsDeadline: two field reads, called once per Run(). Zero cost.
hardCap computation: a conditional + comparison, once per Run(). Zero cost.
startCritic signature change: passes a value already computed (maxRuntime). No new work.
Unsupervised failsafe WithTimeout: one extra context layer, only in the edge case where Monitor returns nil. Single allocation, negligible.
Context chain depth: unchanged in the normal path; +1 layer only in the unsupervised edge case. Not material.
Goroutine in startCritic: unchanged polling frequency (criticDeadlineCheck = 1s). No regression.
Defaults.CriticAbsoluteMax (24h) vs old CriticSoftTimeout (90s): both are just timer durations in the runtime heap — a 24h timer costs the same as a 90s timer.
No new allocations in hot paths (step observer, tool dispatch, critic binding methods are untouched).
No N+1, no unbounded growth, no blocking calls on hot paths, no quadratic behavior.

Verdict: No material issues found

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

I reviewed the diff through the 🧯 Error handling & edge cases lens. The change is defensive and well-considered:

startCritic soft-trigger fallback: The if soft <= 0 { soft = 90 * time.Second } guard at critic.go:55 is genuinely defensive — the sole caller (executor.go:327) passes maxRuntime, which is always > 0 after withFallbacks() (verified at executor.go:158-161 and executor.go:50-52). No reachable zero path.
Unsupervised-run failsafe (executor.go:336-340): When criticOwns && critic == nil, the code wraps runCtx with a tighter WithTimeout(maxRuntime). The cancelCause function remains tied to the parent WithCancelCause context, but since critic == nil means no goroutine was launched to call cancelCause, the only cancellation source is the new WithTimeout — context.Cause correctly returns DeadlineExceeded. The checkpointCause closure is set after this re-wrap (executor.go:344), so it reads the final context. Verified by tracing the full context chain at executor.go:315-344.
criticOwnsDeadline predicate (critic.go:29-31): Single source of truth used by both the hard-cap decision and the arm/no-op gate — they cannot drift. The nil-check on e.cfg.Ports.Critic is correct (interface nil check).
Nil-handle path in startCritic (critic.go:58-60): Monitor returning nil is handled cleanly — returns nil, noop, no goroutine launched, and the caller's unsupervised failsafe catches it.
Goroutine panic recovery (critic.go:67): The recover() in the deadline-watch goroutine is pre-existing, not introduced by this diff. Silent swallow is intentional per the comment.
Deferred cleanup ordering: cancelUnsupervised → stopCritic → mergeCancel → cancelCause(nil) → cancelTimeout — each cancels its own resource in correct reverse order. The conditional defer cancelUnsupervised() inside the if block is correct Go.
hardCap floor (executor.go:311-313): The if hardCap < maxRuntime { hardCap = maxRuntime } guard handles the degenerate config case cleanly.

No ignored errors, missing cleanup, nil-pointer risks, or unhandled edge cases found in the new code paths.

glm-5.2:cloud (ollama-cloud) — No material issues found

Verdict: No material issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Reviewed through the 🔒 Security lens. This is timeout-plumbing: no new untrusted-input paths, no authn/authz surfaces, no injection/SSRF/deserialization/secret-handling changes. criticOwnsDeadline (run/critic.go:29) gates only on operator-controlled config (Ports.Critic != nil && ra.Critic.Enabled); the softTrigger/maxRuntime values are agent/host config, not user-tainted.

Security-relevant behavior I verified:

Unsupervised-run failsafe (run/executor.go:336-340): when a critic-enabled run's Monitor returns a nil handle, the base context is re-tightened from the 24h runaway ceiling back to MaxRuntime via context.WithTimeout(runCtx, maxRuntime). Verified critic == nil is the exact nil-handle return path from startCritic (run/critic.go:59-61). This closes the obvious "critic enabled but never armed → run free for 24h" hole. Good defense-in-depth.
Ultimate runaway ceiling: even with a broken (non-nil) handle whose Deadline() always returns zero, the WithTimeout(..., CriticAbsoluteMax) at executor.go:315 still bounds the run. So there is no unbounded-execution path.

Minor (non-blocking) note, not a security vulnerability:

The PR description repeatedly states CriticAbsoluteMax defaults to 6h, but the code (run/executor.go:65-67) and its own comment default to 24h. From a threat-model standpoint an operator relying on the description would underestimate the worst-case runaway window (resource exhaustion / cost exposure for a broken-critic run) by 4×. The code is self-consistent; only the description is wrong. Suggest correcting the description to 24h so operators size the failsafe accurately. (Verified: withFallbacks sets 24 * time.Hour.)

No findings rise to a security defect; the timeout layering is bounded at every path.

🎯 Correctness — No material issues found

VerDICT: No material issues found

I traced the two-tier timeout logic end-to-end against the actual checked-out code and verified the context-layering semantics are correct.

Verified findings (all confirmed by reading the code):

No-critic path (run/executor.go:304-315): criticOwns is false → hardCap = maxRuntime → WithTimeout(..., maxRuntime). Legacy hard-cap preserved. ✓
Critic-owned, handle armed (run/critic.go:49-95, executor.go:317-327): base context gets CriticAbsoluteMax (24h) ceiling; critic goroutine calls cancelCause(ErrCriticKill|DeadlineExceeded) on the WithCancelCause layer. Cause propagates: critic-kill → Cause(runCtx) = ErrCriticKill → "killed"; backstop expiry → DeadlineExceeded → "timeout"; caller cancel flows through MergeCancellation. All three classification paths correct. ✓
Critic-owned, nil handle failsafe (executor.go:336-340): critic == nil → no deadline-watch goroutine was started (startCritic returns nil, noop), so the WithTimeout(runCtx, maxRuntime) re-wrap is the only sub-24h deadline. It fires at ~maxRuntime from now, Cause = DeadlineExceeded → "timeout". checkpointCause is set after the re-wrap (line 344) so it reads the context the loop runs on. Correct. ✓
Defers LIFO: cancelUnsupervised (innermost) runs before stopCritic (noop here) and the outer cancels — no leak. ✓
CriticSoftTimeout fully removed: grep across run/ shows zero remaining references; the withFallbacks now seeds CriticAbsoluteMax instead. Clean. ✓
criticOwnsDeadline single predicate used by both Run (ceiling choice) and startCritic (arm/no-op gate) — they cannot drift. ✓

One trivial, non-blocking note (doc vs code, not a runtime bug):

run/executor.go:66 sets the CriticAbsoluteMax default to 24h, and the code comment at line 294/308 says "default 24h", but the PR description repeatedly claims 6h ("withFallbacks default 6h", "clamped to 6h"). The code is internally consistent and 24h is strictly safer (more conservative ceiling; the host adapter clamps its own backstop regardless), so there is no behavioral defect — but the description would mislead an operator reading only the PR text. Suggest updating the description to 24h.

🧹 Code cleanliness & maintainability — No material issues found

Verdict: No material issues found

Through the code-cleanliness & maintainability lens, this change is well-structured:

Good refactor — single predicate. criticOwnsDeadline (run/critic.go:29) is the one source of truth shared by both Run (ceiling choice) and startCritic (arm/no-op gate), with a clear doc note that they share it "so the two can never drift." This eliminates the old duplicated e.cfg.Ports.Critic == nil || !ra.Critic.Enabled check and keeps the two-tier decision consistent.
Clean field swap. CriticSoftTimeout is fully removed — I grepped the whole repo and found zero remaining references. CriticAbsoluteMax replaces it with a thorough doc comment. No dangling/stale references.
Test helpers shared, not duplicated. The new critic_deadline_test.go is in run_test and reuses fakeCritic/fakeCriticHandle from critic_test.go rather than redefining them. slowToolInvocation/slowModel are factored out and reused across the new tests.
Ordering is deliberate and documented. The checkpointCause closure is set after the unsupervised-failsafe re-wrap, with a comment explaining why ("reads the context the loop runs on") — avoiding a subtle stale-closure bug a future maintainer could easily introduce.

Minor/trivial (not material):

run/critic_deadline_test.go:94 capturingCritic partially re-implements fakeCritic.Monitor (just adds a soft-capture); it could embed fakeCritic to avoid the near-duplicate, but it's a 4-line test helper, so churn isn't worth it.
The CriticAbsoluteMax field doc and the Run block comment are quite long (~11 lines / ~20 lines). They're accurate and genuinely help a future maintainer avoid re-breaking the two-tier semantics, so this is a style preference, not a defect.

⚡ Performance — No material issues found

VERDICT: No material issues found

run/executor.go:310 — e.cfg.Defaults.CriticAbsoluteMax is read from the withFallbacks()-resolved copy (applied in New at line 104), so it is the defaulted value (24h when unset). Correct, not a bug.
run/executor.go:336-339 — the unsupervised-failsafe WithTimeout re-wraps runCtx only on the rare nil-handle path (host Monitor returns nil). One extra timer allocation per such run; immaterial on the hot path and the checkpointCause closure is set afterward (line 344) so it reads the re-wrapped context. Correct.
run/critic.go:68 — the deadline-watch ticker interval (criticDeadlineCheck = time.Second, line 14) is unchanged by this diff; the soft trigger value does not affect the poll cadence. No regression; negligible per-in-flight-run cost. Correct.
No N+1, unbounded growth, new hot-loop work, or missing limits introduced. The only new per-run allocations are the existing context.WithTimeout plus, in the rare nil-handle case, one extra WithTimeout and a deferred cancel. criticOwnsDeadline is two field reads. No quadratic or blocking behavior on a hot path.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

I reviewed the diff through the error-handling & edge-cases lens, verifying against the checked-out code.

nil-handle failsafe (run/executor.go:336-340): when criticOwns && critic == nil, the run is re-wrapped with context.WithTimeout(runCtx, maxRuntime). I confirmed maxRuntime is always > 0 (resolved at executor.go:158-161 with a Defaults.MaxRuntime fallback that withFallbacks guarantees > 0), so this never creates a zero/burst timeout. The defer cancelUnsupervised() is registered in correct LIFO order after cancelCause/mergeCancel, so cleanup is complete on every exit path. Verified.
cause propagation for the failsafe ceiling (run/executor.go:315, statusFor at :520-538): the 24h WithTimeout parent → WithCancelCause child chain means a ceiling expiry surfaces as context.DeadlineExceeded via context.Cause(runCtx) → "timeout"; a critic kill surfaces as ErrCriticKill → "killed". The checkpointCause closure is deliberately assigned after the unsupervised re-wrap (:344) so it reads the context the loop actually runs on. Verified correct.
startCritic soft-trigger floor (run/critic.go:54-57): soft <= 0 falls back to 90s. The sole caller passes the resolved maxRuntime (always > 0), so the floor is pure defence — but it correctly handles a hypothetical non-positive caller, no panic/zero-duration Monitor call. Verified.
critic goroutine panic recovery (run/critic.go:67): defer func() { _ = recover() }() guards a panicking host Deadline()/KillCause() on its own goroutine (the main recover can't catch it). Verified present and appropriate.
re-wrap vs. goroutine reference aliasing: the unsupervised WithTimeout reassigns runCtx, but the critic deadline-watch goroutine only exists when critic != nil, and the re-wrap branch is gated on critic == nil — so the goroutine never observes a stale runCtx. No leak or missed cancellation. Verified by reading both branches.
MergeCancellation interaction (run/runengine.go:62-73): the merge goroutine cancels the merged context with the caller's cause on caller cancel; this composes correctly with both the criticOwns and non-critic paths. No swallowed error.

No ignored errors, missing defers, nil-deref, zero/negative-duration, or boundary issues found in this lens.

kimi-k2.6:cloud (ollama-cloud) — No material issues found

Verdict: No material issues found · ⚠️ 1/5 lens(es) errored — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

From a security lens, the change correctly restores the two-tier timeout semantics without introducing new trust-boundary gaps, injection vectors, or unsafe defaults.

The criticOwnsDeadline predicate is used consistently as the single gate for both the run context ceiling selection and the critic arm decision, preventing drift.
The CriticAbsoluteMax (default 24h) is explicitly documented as a generous runaway guard for a broken/nil host handle, not the normal path. The host is expected to clamp its own backstop to a much smaller value (e.g., mort's 6h convar). An operator can lower it via Defaults.CriticAbsoluteMax.
The unsupervised-run failsafe (criticOwns && critic == nil) correctly tightens the context back to MaxRuntime when the host Monitor returns no handle, preventing a run from escaping to the 24h ceiling if the critic fails to arm.
Input to startCritic's softTrigger is the resolved MaxRuntime, which withFallbacks guarantees is > 0; the <= 0 branch is purely defensive and unreachable from the actual caller.
statusFor preserves the kill-vs-timeout-vs-cancel distinction via context.Cause, with ErrCriticKill wrapped via %w so errors.Is matches correctly.

No authn/authz gaps, injection, SSRF, deserialization, secret leakage, or insecure-default issues were identified.

🎯 Correctness — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 10: context deadline exceeded

🧹 Code cleanliness & maintainability — No material issues found

No material issues found.

The change is structurally clean through the maintainability lens:

criticOwnsDeadline is a well-chosen extraction that centralises the two-tier-timeout predicate and prevents Run and startCritic from drifting out of sync.
CriticSoftTimeout is cleanly removed from Defaults; no stale references remain.
Test helpers in the new file are appropriately scoped and factored (slowToolInvocation, slowModel, capturingCritic).
Comments in executor.go and critic.go are verbose but justified by the semantic complexity of the two-tier timeout model.

⚡ Performance — No material issues found

No material issues found

Reviewed through the Performance lens. The change restores two-tier deadline semantics: critic-owned runs get a generous runaway ceiling (CriticAbsoluteMax, default 24 h) instead of MaxRuntime, while non-critic runs remain unchanged.

What I checked:

run/executor.go:304-344 — the new context chain for critic-owned runs adds at most one extra WithTimeout wrap (the unsupervised nil-handle fallback). This is per-run, not per-step or per-iteration, so it does not affect hot-path latency.
run/critic.go:62-95 — the deadline-watch goroutine and its 1-second ticker are unchanged.
MergeCancellation goroutine behavior is unchanged.
The 24-hour default timer is created once per run and immediately stopped via deferred cancelTimeout() on every exit path (including panic/recover), so there is no timer leak or heap pressure from long-duration timers.

No N+1 queries, no new allocations inside loops, no unbounded growth, no blocking calls on hot paths, and no quadratic behavior introduced. The diff is clean from a performance standpoint.

🧯 Error handling & edge cases — No material issues found

No material issues found.

The two-tier timeout logic is carefully structured with proper nil-safety, defer ordering, and edge-case handling:

The unsupervised-run fallback (criticOwns && critic == nil) correctly re-wraps runCtx with maxRuntime so a nil Monitor handle cannot accidentally run up to the 24h ceiling.
checkpointCause is assigned after the unsupervised re-wrap, so it reads the correct context the loop actually executes on.
All criticBinding methods are nil-safe, so the executor can call critic.recordStep, critic.drainSteer(), etc. unconditionally.
startCritic's deadline-watch goroutine defends against panics in h.Deadline() via recover().
withFallbacks guarantees CriticAbsoluteMax > 0, and the hardCap < maxRuntime clamp ensures the ceiling never undershoots the nominal budget even for degenerate configs.

No ignored errors, missing cleanups, nil panics, or unhandled boundary conditions were introduced.

qwen3.5:397b-cloud (ollama-cloud) — Minor issues

Verdict: Minor issues · ⚠️ 1/5 lens(es) errored — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

After reviewing the changes through the security lens and verifying against the checked-out code:

What I verified:

Read run/executor.go (lines 1-100, 260-360) to trace the timeout context chain
Read run/critic.go (lines 1-100) to verify the deadline-watch goroutine logic
Confirmed CriticSoftTimeout is no longer referenced (grep returned no results)
Verified the checkpointCause closure is set AFTER the unsupervised failsafe re-wrap (line 344)

Security analysis:

Timeout escape paths: The unsupervised failsafe (lines 336-340) correctly tightens the context back to maxRuntime when criticOwns && critic == nil. The context chain is: timeoutCtx (24h ceiling) → runCtx (cancelCause) → new runCtx (maxRuntime timeout). When maxRuntime expires, cancellation propagates correctly.
Critic handle with no deadline: A non-nil critic handle whose Deadline() always returns zero would leave the run bounded only by CriticAbsoluteMax (24h default). This is intentional design - confirmed by TestCriticOwnsDeadline_SurvivesPastMaxRuntime which explicitly tests this scenario with fakeCriticHandle{} (Deadline returns zero). The 24h ceiling is the documented runaway guard.
Input validation: The softTrigger parameter comes from maxRuntime, which withFallbacks() guarantees is >0. The defensive check at critic.go:55-56 is a harmless backup.
No injection/leakage/authz issues: The changes are purely timeout semantics - no untrusted input handling, no secrets, no external calls modified.

The 24h default for CriticAbsoluteMax could be considered generous (mort uses 6h per the PR description), but this is a configuration choice, not a code vulnerability. The mechanism correctly implements the two-tier timeout semantics.

🎯 Correctness — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: max steps reached without a final answer (max 14)

🧹 Code cleanliness & maintainability — Minor issues

VERDICT: Minor issues found

Findings:

run/executor.go:336-340 — Unusual defer-in-conditional pattern: The unsupervised-run failsafe wraps runCtx with a new timeout and defers cancellation inside the if block. While this works in Go (the defer is registered conditionally and captures the variable), this pattern is less common and slightly harder to read than the alternative of declaring the variable outside and using a named cleanup function. Suggested fix: Extract to a small helper or use a more conventional pattern:
```
var cancelUnsupervised context.CancelFunc
if criticOwns && critic == nil {
    runCtx, cancelUnsupervised = context.WithTimeout(runCtx, maxRuntime)
}
defer func() { if cancelUnsupervised != nil { cancelUnsupervised() } }()
```
run/executor.go:37-39 — External system reference in comment: The CriticAbsoluteMax field comment references mort's agents.critic.absolute_max_seconds = 6h — an external convar from a different codebase. While contextually helpful, embedding external system configuration details in kernel comments creates a maintenance coupling. Suggested fix: Rephrase to describe the relationship without hardcoding external values (e.g., "the host clamps its own backstop to a smaller absolute max").

⚡ Performance — Minor issues

The finding is verified. The code does create a nested timeout structure where:

Line 315: timeoutCtx is created with hardCap (24h when critic owns)
Lines 336-340: When criticOwns && critic == nil, a new timeout wraps runCtx with maxRuntime

The outer 24h timer remains scheduled until the function exits and the defer at line 316 fires. This is technically wasteful, though it's a rare edge case (defensive path).

The finding is accurate and survives verification. The verdict of "Minor issues" is appropriate since this is a resource efficiency concern in a rare edge case, not a correctness bug.

VERDICT: Minor issues

Findings:

run/executor.go:315-340 — Inefficient nested timeout in unsupervised-run failsafe. When criticOwns && critic == nil (critic configured/enabled but host returned nil handle), the code first creates timeoutCtx with the generous 24h CriticAbsoluteMax (line 315), then wraps it with a nested WithTimeout(runCtx, maxRuntime) (line 338). The outer 24h timer remains scheduled even though it will never fire (the inner maxRuntime timeout fires first). In a high-throughput system where critic handles occasionally fail to arm, this accumulates dormant 24h timers that waste resources. Fix: Check the critic handle validity before creating the generous timeout, or cancel/replace the outer timeout when tightening: cancelTimeout(); timeoutCtx, cancelTimeout = context.WithTimeout(context.WithoutCancel(ctx), maxRuntime). (Verified by reading run/executor.go:315-340 and run/executor.go:65-66 confirming 24h default)

🧯 Error handling & edge cases — No material issues found

All findings in the draft review are either style suggestions or based on misunderstandings. The code is well-documented, tested, and works as designed:

The context.Cause usage is standard Go 1.20+ behavior and is already well-documented in the code comments.
The silent return when Monitor returns nil is intentional defensive behavior, as documented in the code comment ("mort's adapter always arms when the flag is set, so this is pure defence in depth").
The CriticAbsoluteMax default of 24h vs the host's 6h clamp is intentional design - the 24h is the kernel ceiling while 6h is the host clamp, as clearly documented in the struct comment at lines 32-42.

No material issues found.

Verdict: No material issues found

After reviewing the changes through the error handling & edge cases lens and verifying against the checked-out code, all findings in the draft review are either style suggestions or based on misunderstandings of the intentional design:

The unsupervised failsafe at executor.go:336-340 correctly re-wraps the context and sets checkpointCause after the re-wrap, ensuring proper cancellation cause propagation. The behavior is tested by TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime.
The silent return when Monitor returns nil at critic.go:58-60 is intentional defensive behavior, as documented in the code comment at executor.go:330-335.
The CriticAbsoluteMax default of 24h at executor.go:66 is intentionally larger than the host's 6h clamp, as explained in the struct comment at lines 32-42.

_{Automated adversarial review by Gadfly — consensus across the model swarm. Advisory only — does not block merge.}

## 🪰 Gadfly review — consensus across 6 models **Verdict: Minor issues** · 5 findings (0 with multi-model agreement) <details><summary>5 single-model findings (lower confidence)</summary> | | Finding | Where | Model | Lens | |--|--|--|--|--| | 🟠 | Nested 24h timeout wastes timer resources in unsupervised-run failsafe path | `run/executor.go:315` | qwen3.5:397b-cloud | performance | | 🟡 | softTrigger=MaxRuntime is overloaded as the default battery's idle-stall window (critic.go:262), delaying hang detection from ~90s to a full MaxRuntime; comment 'reviews once the run exceeds its nominal budget' misdescribes the idle-based default behavior | `run/critic.go:44` | claude-code/opus | correctness | | 🟡 | Unusual defer-in-conditional pattern in unsupervised failsafe | `run/executor.go:336` | qwen3.5:397b-cloud | maintainability | | ⚪ | Unreachable 90s defensive fallback reintroduces a bare magic literal that was previously a named default (CriticSoftTimeout) | `run/critic.go:56` | claude-code/opus:max | maintainability | | ⚪ | Comment references external system convar value | `run/executor.go:37` | qwen3.5:397b-cloud | maintainability | </details> <details><summary>Per-model detail</summary> <details><summary>claude-code/opus (claude-code) — Minor issues</summary> **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> ## VERDICT: No material issues found I reviewed this change strictly through the **Security** lens (authz, injection, SSRF, deserialization, secret leakage, input validation, untrusted-data handling, insecure defaults, resource/DoS bounding). This is a timeout-semantics change in `run/executor.go` + `run/critic.go`; it touches no parsing, no authz, no external I/O, and no secret handling. The one security-relevant axis is **resource bounding / availability**, which I traced carefully: - **Critic-owned runs no longer hard-capped at `MaxRuntime`.** Verified `executor.go:304-340`: for a critic-owned run the hard ceiling becomes `Defaults.CriticAbsoluteMax` (default 24h, `executor.go:65-67`). The *effective* bound is normally the critic's extendable backstop, enforced by the deadline-watch in `critic.go:62-90`. I confirmed the shipped default handle (`critic/critic.go:143`) always initializes `deadline = now + softTimeout × backstopMul` — i.e. **never zero** — so a healthy supervised run is always bounded well below the 24h ceiling. The 24h ceiling is a failsafe, not the normal bound. - **Unsupervised failsafe.** `executor.go:336-340` re-tightens to `MaxRuntime` when the host `Monitor` returns a nil handle, preventing an enabled-but-unarmed run from holding a worker slot up to 24h. Verified the `criticOwns && critic == nil` predicate matches `startCritic`'s nil-handle return (`critic.go:51-61`), so the two can't drift (shared `criticOwnsDeadline`). The test `TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime` exercises exactly this. - **Caller cancellation still propagates.** `executor.go:319` re-merges the caller ctx via `MergeCancellation` after the `WithoutCancel` detach, so a cancelled/disconnected caller still tears the run down — no detach regression that would let an abandoned run linger. - **Insecure-default check on the new constant.** 24h `CriticAbsoluteMax` is a deliberately-loose *failsafe* ceiling, not a per-run budget; the host (mort) clamps its own backstop to 6h and the default battery's backstop is far smaller. I re-derived that the ceiling only bites for a misbehaving custom host handle that returns a *permanently* zero `Deadline()` — which contradicts the shipped default's behavior and is explicitly called out as defence-in-depth. Not a material insecure default. One non-blocking observation (defense-in-depth only, not a finding): the unsupervised failsafe keys on `critic == nil`, so it does **not** cover a custom host handle that arms but leaves `Deadline()` zero forever — such a run would be bound only by the 24h ceiling. The shipped default never produces that state, so this is hypothetical and arguably a correctness/robustness concern rather than security; I'm not raising it as a finding. Nothing in my lane is materially wrong. </details> <details><summary>🎯 Correctness — Minor issues</summary> I've verified the finding against the actual source. Every cited line checks out: - `critic/critic.go:262` — the default battery's `tick` gates escalation on `idle := now - lastActivity; if idle < h.softTimeout { skip }`. Confirmed it is idleness-based, not elapsed-runtime-based. - `critic/critic.go:143` — `deadline = now.Add(softTimeout × backstopMul)`, confirming `softTimeout` doubles as the backstop basis. - `run/critic.go:43-45` — the doc comment does assert "the critic first reviews once the run exceeds its nominal budget," and the sole caller passes `maxRuntime` as `softTrigger`, so the single value now feeds both the idle-stall window and the backstop. Confirmed. - The removal of `CriticSoftTimeout` (was default 90s, doc "idle window before the critic wakes") is confirmed from the diff; the idle window is now forced to equal `MaxRuntime`. The semantics/documentation concern is real and confirmable, so it survives. --- ## VERDICT: Minor issues The two-tier rewrite in `run/executor.go` is logically sound and I could not find a hard logic bug. The context layering is correct: in the critic-owned path the run is bounded by `CriticAbsoluteMax` (runaway guard) + the critic's `cancelCause`-driven deadline-watch; in the unsupervised-handle path it's correctly tightened back to `maxRuntime`. The removal of `CriticSoftTimeout` is clean — nothing fails to compile. One substantive correctness/semantics concern: - **`run/critic.go:43-45` (and the new `executor.go` call passing `maxRuntime` as `softTrigger`) — the soft-trigger value is overloaded, and the comment misdescribes the shipped default's behavior.** The doc says *"the critic first reviews once the run exceeds its nominal budget."* But the shipped default `critic` battery does **not** review on elapsed-runtime; it escalates on **idleness**: `critic/critic.go:262` gates on `idle := now - lastActivity; if idle < h.softTimeout { skip }`. So the same value now plays two roles for the default battery: (a) the stall-detection idle window, and (b) the backstop basis (`deadline = now + softTimeout × backstopMul`, `critic.go:143`). Before this PR the idle window was the independent `CriticSoftTimeout` (default 90s, doc: *"idle window before the critic wakes"*); this PR forces it to equal `MaxRuntime`. Concrete effect for a default-battery host with a 30-min agent: a genuinely hung run is no longer flagged after ~90s of inactivity but only after a **full 30 minutes** of inactivity — the watchdog's primary stall-detection purpose is materially weakened, and a host can no longer express "review after 90s idle but allow up to `MaxRuntime` total" because the knob that separated those two meanings was deleted. This may be a deliberate trade-off (it also prevents false-killing a legit long single tool call, which is the bug the PR targets), but the comment asserting "reviews once the run exceeds its nominal budget" is inaccurate for the default battery. Suggest either keeping a distinct idle-window knob or correcting the comment to describe the actual idle-based semantics. *Verified by reading `critic/critic.go:125-149` (Monitor) and `:251-268` (tick/idle gate).* Non-blocking note (not a code defect): the PR description says `CriticAbsoluteMax` defaults to **6h**, but the code defaults to **24h** (`executor.go` `withFallbacks`, and the struct comment consistently says 24h). The code is internally consistent; only the PR prose is wrong. </details> <details><summary>🧹 Code cleanliness & maintainability — No material issues found</summary> I've now verified everything. The commit message is decisive on the draft's only finding. The draft's single finding rests on "concrete evidence this already drifted: the PR description repeatedly says the default is **6h**, while the code... say **24h**." That evidence is false. The commit body shows the PR *deliberately* changed `CriticAbsoluteMax` from 6h to 24h ("CriticAbsoluteMax was 6h... Now CriticAbsoluteMax is a 24h RUNAWAY guard"), precisely because 6h was too low. The 6h that remains in the comments is mort's *host* convar (`mort's 6h convar`), a deliberately distinct value from the 24h executus default — the comments correctly distinguish the two. There is no drift; the PR documents an intentional value change. Stripped of that false evidence, the finding reduces to "the constant 24h is also mentioned in a third comment" (lines 37, 66, 294) — a trivial, speculative doc nit the draft itself concedes is partly house pattern. It cannot be confirmed as a material issue, so it drops. The draft's positive prose (rename complete — no `CriticSoftTimeout` refs remain; single caller at `executor.go:327` passing `maxRuntime`; `maxRuntime>0` guaranteed by `withFallbacks`) all checks out, but those are not findings. With the only finding dropped, the verdict resets. --- VERDICT: **No material issues found** This PR is small, focused, and well-documented. The `CriticSoftTimeout` → `CriticAbsoluteMax` rename is complete (no stale references remain in any `.go`, `README.md`, `CLAUDE.md`, or `examples/`). `startCritic` has a single caller (`run/executor.go:327`) that passes the resolved `maxRuntime`, which `withFallbacks` (`executor.go:50`) guarantees is `>0`. The `criticOwnsDeadline` extraction (`critic.go:29`) is a clean dedup shared by `Run` and `startCritic`. The new test helpers are well-factored and the `capturingCritic` interface-typed handle correctly exercises the `nil`-handle path. The "default 24h vs 6h" concern in the draft does not hold: the comments deliberately distinguish the executus default ceiling (24h) from mort's host backstop clamp (6h), and the PR intentionally moved the default from 6h to 24h — there is no drift. </details> <details><summary>⚡ Performance — No material issues found</summary> I have enough to assess the performance lens. Let me confirm one thing about the deadline-watch cadence relative to the new much-longer-lived runs. The relevant facts I verified by reading the code: - `run/critic.go:14` — `criticDeadlineCheck = time.Second`; the watch goroutine (`startCritic`, lines 63–90) ticks once/sec doing a cheap `h.Deadline()` compare. This is **pre-existing** and unchanged by the PR. - `run/executor.go:315` — the new `context.WithTimeout(..., hardCap)` and the failsafe re-wrap at `:338` are O(1) per run, with their timers stopped via `defer cancelTimeout()` / `defer cancelUnsupervised()`. No timer/goroutine leak. ## VERDICT: No material issues found Reviewing strictly through the **⚡ Performance** lens: - The change is timeout-ownership plumbing. Every construct it adds — `criticOwnsDeadline` (a two-field predicate), one extra `context.WithTimeout` at `hardCap`, and the conditional unsupervised re-wrap — executes **once per `Run`**, not inside the agent loop or any tool/step hot path. No new per-iteration work, allocation, or copy in `stepObserver` (`executor.go:353`). - The new 24h `CriticAbsoluteMax` `WithTimeout` does not create a lingering timer: Go schedules it via `time.AfterFunc` and `defer cancelTimeout()` stops it when `Run` returns, so a fast-completing run pays nothing measurable. Same for the failsafe's `cancelUnsupervised`. - The 1-second deadline-watch ticker is the only recurring cost, and the PR neither adds it nor changes its cadence. It is one cheap goroutine per *critic-owned* run (a small population — mort's `general`, etc.), so the fact that such runs may now legitimately live to the backstop (minutes/hours) instead of being killed at `MaxRuntime` adds no per-tick cost beyond an already-trivial `Deadline()` comparison. Not a regression I can justify flagging. - No N+1, no unbounded growth, no blocking call introduced on a hot path. The longer-lived runs are the *intended* behavior and are explicitly bounded (critic backstop, or `maxRuntime` failsafe when unsupervised), so there's no unbounded-resource-hold concern within this lens. Nothing in the performance lane is materially wrong here. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> ## VERDICT: No material issues found I reviewed this PR strictly through the **error-handling & edge-cases** lens, verifying the diff against the checked-out code rather than the hunks alone. **What I verified:** - **Nil-handle / unsupervised path is correctly handled.** `startCritic` returns `(nil, noop)` both when `!criticOwnsDeadline` and when `Monitor` returns a nil handle (`run/critic.go:51`, `:59`). The re-wrap guard at `run/executor.go:336` uses the *same* `criticOwnsDeadline` predicate, so the "critic enabled but no handle armed" case correctly tightens the ceiling back from 24h to `maxRuntime`. `TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime` exercises exactly this (nil interface handle → falls back). No drift between the two decision sites. - **Degenerate `MaxRuntime > CriticAbsoluteMax` is guarded** (`run/executor.go:311-313`): `hardCap` is clamped *up* to `maxRuntime`, so the ceiling can never be shorter than the nominal budget. No off-by-direction bug. - **Non-positive `softTrigger`** has the 90s floor (`run/critic.go:55`); `withFallbacks` guarantees `maxRuntime > 0` (`:50-52`) so it's defensive-only, as commented — not dead-but-wrong. - **Panic isolation intact.** The deadline-watch goroutine recovers from a host `Deadline()`/`KillCause()` panic on its own goroutine (`run/critic.go:67`); a panic in the deferred `stopCritic`→`h.Stop()` unwinds into the top-level `recover()` at `executor.go:139` (registered first, runs last), so it still becomes `Result.Err` rather than crashing. - **No cleanup/leak introduced.** Every new context layer (`cancelTimeout`, `cancelCause`, `mergeCancel`, `cancelUnsupervised`) is `defer`-cancelled. `checkpointCause` is correctly set *after* the unsupervised re-wrap (`executor.go:344`) so it reads the context the loop actually runs on. - **No dangling references.** The renamed `CriticSoftTimeout` field has zero remaining references in `.go` files; the only `startCritic` call site was updated to the new 5-arg signature. The diff compiles cleanly across files. **Non-blocking observation (not a finding):** the unsupervised-failsafe only tightens the ceiling when the handle is `nil`. A handle that is non-nil but never advances `Deadline()` (a "live but lazy/broken" host critic) is bounded only by `CriticAbsoluteMax` (kernel default 24h). This is the documented runaway-guard behavior and is exercised by `TestCriticOwnsDeadline_SurvivesPastMaxRuntime`, so it's intentional, not a bug — I note it only because the asymmetry (nil → `maxRuntime`; live-but-zero-deadline → 24h) is a real edge case a host should be aware of. Outside my lens: the PR description says `CriticAbsoluteMax` defaults to 6h, but the code defaults to 24h (`executor.go:66`) — a description/doc mismatch worth a correctness/docs pass. </details> </details> <details><summary>claude-code/opus:max (claude-code) — Minor issues</summary> **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> This change is purely about timeout/deadline ownership semantics for critic-owned runs. I've verified the key paths (`run/critic.go:29-95`, `run/executor.go:304-344`). Through the **security lens** specifically: - **No authn/authz surface** touched — this is internal run-lifecycle wiring, no caller-identity or permission decision changes. - **No injection / untrusted-data sink introduced.** The only place untrusted-ish data flows is `critic.recordToolStart(call.Name, string(call.Arguments))` (`executor.go:371`) and the `KillCause().Error()` string wrapped into `ErrCriticKill` (`critic.go:82`) — both are forwarded to host-side handles / error chains, not into a shell, SQL, path, or template. Pre-existing behavior, unchanged by this diff. - **No deserialization, SSRF, or secret handling** in scope. - **Insecure-default consideration (resource bounding):** the diff replaces the unconditional `MaxRuntime` hard cap with a `CriticAbsoluteMax` (default 24h) ceiling for critic-owned runs. This loosens the worst-case resource-holding bound, but it is *deliberate and still bounded*: the nil-handle failsafe (`executor.go:336-340`) re-tightens to `MaxRuntime` when no supervisor arms, the ceiling is never shorter than `MaxRuntime`, and a healthy host clamps its own backstop well below the ceiling. This is an availability/DoS-resilience tuning question (24h is a long slot to hold), but it does not constitute a security regression in my lane — it's a resource-management/correctness concern that the correctness/availability lens owns. Nothing in the security lens is materially wrong. VERDICT: No material issues found </details> <details><summary>🎯 Correctness — No material issues found</summary> I've verified the full two-tier timeout flow against the actual code (`run/executor.go`, `run/critic.go`, the critic battery in `critic/critic.go`, and the `Critic` port + `CriticConfig` contracts). Here is my correctness review. ## VERDICT: No material issues found The two-tier timeout logic is correct and, importantly, consistent with the *documented* design intent — which is what makes this a genuine fix rather than a behavior change. What I checked and confirmed sound: - **`criticOwnsDeadline` gating is consistent across both call sites.** `executor.go:304` (choosing `hardCap`) and `critic.go:51` (the arm gate) use the same predicate, so the "did we widen the cap?" decision and the "did we arm the watch?" decision can't disagree. (Read both.) - **Unsupervised-run failsafe is correct.** `criticOwns && critic == nil` (`executor.go:336`) is reachable *only* when `Monitor` returned a nil handle (the other nil-return path in `startCritic` requires `!criticOwns`). Re-wrapping `runCtx` at `maxRuntime` correctly tightens an armed-but-unsupervised run back to the nominal budget. The `checkpointCause` closure is set *after* this re-wrap (`executor.go:344`), so it reads the context the loop actually runs on. Verified against the test `TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime`. - **`hardCap < maxRuntime` clamp** (`executor.go:311`) correctly handles the degenerate `MaxRuntime > CriticAbsoluteMax` config. - **Soft trigger = `MaxRuntime` is the correct value, not a regression.** I was suspicious that flipping `soft` from the dedicated `CriticSoftTimeout` (90s) to `MaxRuntime` would mis-drive the battery, since `critic/critic.go` uses `softTimeout` for *three* things: the backstop base (`deadline = now + soft×mul`, line 143), the idle-escalation threshold (line 262), and the poll interval (`soft/2`, line 132). But the pre-existing contract already nails the intended meaning: `CriticConfig.BackstopMultiplier` is documented (`run/agent.go:83`) as "a multiple of the soft trigger **(MaxRuntime)**". So the *old* code passing `CriticSoftTimeout` was the value that contradicted the design; this PR aligns it. The backstop now correctly resolves to `MaxRuntime × multiplier`, matching the PR's stated semantics. - **Field removal is clean.** `grep` for `CriticSoftTimeout` across the whole module returns zero hits — no dangling references, no compile break from dropping the field. Two non-blocking observations (neither is a code defect): - **PR-description / code default mismatch.** The PR description repeatedly says `CriticAbsoluteMax` defaults to **6h**, but the code (`executor.go:66`) and its own doc comment say **24h**. The *code* is internally consistent and is the *correct* choice: the runaway ceiling must sit strictly above the host's own backstop clamp (mort's 6h), so 24h > 6h is right and a 6h default would be wrong (it could pre-empt a healthy supervised run). Only the PR prose is stale — worth fixing the description so a future reader doesn't "correct" the code down to 6h. - **Doc nuance in `critic.go:44`.** The comment says the critic "first reviews once the run exceeds its nominal budget," but the battery actually escalates on *idle* time (`idle >= softTimeout`), not total elapsed — a progressing run whose activity clock keeps refreshing may never be reviewed before the backstop. This is the battery's unchanged behavior, not something this PR alters; the wording just slightly oversells when a review fires. Cosmetic. I did not find any introduced logic bug in my lens. </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> All claims verified against the code: - `run/critic.go:54-57`: the `soft <= 0 → 90s` branch confirmed — `soft = softTrigger`, and the sole caller (`executor.go:327`) passes `maxRuntime`, which `executor.go:158-160` resolves to either `ra.MaxRuntime` or `Defaults.MaxRuntime`. `New` runs `withFallbacks()` (`executor.go:104`) guaranteeing `Defaults.MaxRuntime >= 60s` (`executor.go:50-51`). So `softTrigger` is always `> 0` and the branch is unreachable. The bare `90 * time.Second` literal replaces the former named `CriticSoftTimeout` field. Confirmed; only one caller exists. - No stale `CriticSoftTimeout` references remain (grep clean). - `executor.go:66` defaults `CriticAbsoluteMax` to 24h; the field doc (`executor.go:38`) cites mort's 6h host clamp. Internally consistent, so the "6h" PR-description note holds. The finding survives. Outputting the corrected review. --- ## VERDICT: Minor issues Through the cleanliness/maintainability lens this is a careful, well-structured change. The new `criticOwnsDeadline` helper (`run/critic.go:29`) sensibly factors out the predicate so `Run` and `startCritic` can't drift, names are clear (`hardCap`, `criticOwns`, `softTrigger`), and the removed `CriticSoftTimeout` field is gone cleanly — no stale references remain in `.go`, `.md`, or `examples/` (verified via grep). Tests are descriptively named and commented. Only small notes: - **`run/critic.go:54-57` — the `soft <= 0 → 90s` fallback is now unreachable dead code with a bare magic literal.** Verified: the sole caller (`run/executor.go:327`) always passes `maxRuntime`, which is resolved at `executor.go:158-160` from `ra.MaxRuntime` or `e.cfg.Defaults.MaxRuntime`, and `cfg.Defaults = cfg.Defaults.withFallbacks()` (`executor.go:104`) guarantees `MaxRuntime >= 60s` (`executor.go:50-51`). So `softTrigger` is always `> 0` and the branch never fires. Previously this 90s lived as a named, fallback-defaulted field (`CriticSoftTimeout`); collapsing it into an inline literal inside an unreachable branch is a minor regression in maintainability. The code even documents it as dead twice (the doc comment at `critic.go:47` and the inline comment at `critic.go:56`). Low-churn fix: either drop the branch and assert/document the precondition, or keep it but lift `90 * time.Second` to a named const so the magic number isn't bare. Trivial; the current form is harmless and clearly commented. - **PR-description vs. code drift (not a code defect):** the description says `CriticAbsoluteMax` defaults to "6h", but the code defaults to **24h** (`executor.go:66`, and the field doc at `executor.go:38` and inline comment at `executor.go:294` both say 24h — internally consistent). The "6h" in the description is mort's *host* clamp, not the kernel default. Worth correcting the PR text so future readers aren't misled; nothing to change in code. No other cleanliness problems: the triple documentation of the two-tier semantics (field doc, the `Run` block comment at `executor.go:279-303`, and the `startCritic` doc) is verbose but each block targets a distinct reader and matches this repo's documented `// Why:` house style, so I'm not flagging it as a defect. </details> <details><summary>⚡ Performance — No material issues found</summary> I've verified the relevant code. The polling goroutine (`criticDeadlineCheck = time.Second`, `run/critic.go:14`) is pre-existing and untouched by this diff — the soft-trigger change (`CriticSoftTimeout` → `MaxRuntime`) doesn't alter its cadence. The unsupervised failsafe adds one trivial extra `context.WithTimeout` layer. No new allocations in hot loops, no N+1, no quadratic behavior. The one performance-adjacent effect is that critic-owned runs now hold their lane/worker slot up to the extendable backstop (host-clamped 6h) instead of being hard-killed at `MaxRuntime` (e.g. 30m) — and the per-run 1s-ticker goroutine therefore lives proportionally longer (~21k ticks over 6h vs ~1.8k over 30m). But (a) that longer hold is the deliberate, documented intent of the change, not an accidental regression; (b) the per-tick cost is a single `Deadline()`/`KillCause()` read — negligible even across many concurrent runs; and (c) the unbounded-hold risk is explicitly bounded by the `CriticAbsoluteMax` ceiling and the unsupervised-failsafe re-tighten. None of this rises to a material efficiency problem in my lane. VERDICT: No material issues found - No N+1, no work added inside the agent loop, no unbounded growth introduced, no new blocking calls on the hot path. The deadline-watch ticker is unchanged by this diff and its cost is immaterial. The longer slot-hold for critic-owned runs is intended behavior, bounded by the runaway ceiling and the nil-handle failsafe. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> I've verified the change against the checked-out code. Summary of what I checked through the **error handling & edge cases** lens: - **`cfg.Defaults` is normalized at construction** (`run.New` calls `withFallbacks()`, executor.go:104), so `CriticAbsoluteMax` is guaranteed `>0` (24h) before it's ever used as a `WithTimeout` duration at executor.go:315. No zero-duration / instantly-expired context risk. - **`maxRuntime` is guaranteed `>0`** (executor.go:158-161 → `Defaults.MaxRuntime` floor of 60s), so the soft trigger passed to `startCritic` and the `hardCap < maxRuntime` clamp (executor.go:311) are well-defined. The `soft <= 0 → 90s` guard in `startCritic` (critic.go:55) is genuinely just defensive. - **The unsupervised-run failsafe** (executor.go:336-340) correctly re-evaluates `criticOwns` so it only fires on the nil-handle case (not the no-critic case), and tightens the cap back to `maxRuntime`. `cancelUnsupervised` is deferred — no context leak. Verified the nil-handle path in `capturingCritic.Monitor` returns a nil interface, so the test exercises it. - **`checkpointCause` reordering** (moved to executor.go:344, after the re-wrap) reads the actual loop context; the top-level defer is nil-safe (`checkpointCause != nil`, executor.go:144), so the early-return/panic window before assignment is handled. - **All `criticBinding` methods are nil-receiver-safe** (critic.go:97-139), so the `critic == nil` failsafe path doesn't panic on `recordStep`/`maxStepsOption`/`drainSteer`. - **No stale references** to the removed `CriticSoftTimeout` field anywhere in the repo (grep clean); status classification (`statusFor`) is untouched and still nil-safe. - **Deadline-watch goroutine** has a `recover()` against a host `Deadline()` panic (critic.go:67), and exits on both `done` and `runCtx.Done()` — no goroutine leak. The unhappy paths this diff introduces (zero/negative durations, nil critic handle, degenerate `MaxRuntime > AbsoluteMax`, cleanup on every exit) are each handled. One pre-existing item I noticed but is **outside this diff and outside my lens** (correctness): a *natural* backstop expiry cancels via `cancelCause(context.DeadlineExceeded)` (critic.go:84), but `statusFor` only consults `context.Cause` for `ErrCriticKill` (executor.go:529) — so a natural backstop expiry likely classifies as `cancelled`, not the `timeout` the critic.go:39 comment promises. Not introduced here; flagging only as it brushes the deadline logic. --- **VERDICT: No material issues found** (error handling & edge cases lens) The two-tier timeout change is defensively solid: durations are floored before use, the nil-handle and no-critic paths are both bounded, contexts are cancelled on every exit path, and the new test file exercises the hard-cap, soft-trigger, survival, and nil-handle-fallback edge cases. </details> </details> <details><summary>deepseek-v4-pro:cloud (ollama-cloud) — No material issues found</summary> **Verdict: No material issues found** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> ## Verdict: No material issues found I reviewed this change through the **security** lens: authn/authz, injection, SSRF, unsafe deserialization, secret leakage, input validation of untrusted data, and insecure defaults. The diff is purely about context-timeout management — converting `MaxRuntime` from a hard `WithTimeout` kill to a soft trigger when a critic owns the deadline, with a generous `CriticAbsoluteMax` runaway ceiling as defense-in-depth. - **No injection vectors**: no SQL, command, path, or template construction is introduced. The only "input" flowing through new paths is `maxRuntime` (a `time.Duration`), which is passed to the host `Critic.Monitor` interface — the host is responsible for its own validation. - **No authn/authz gaps**: the `criticOwnsDeadline` predicate gates the two-tier path on `Ports.Critic != nil && ra.Critic.Enabled`, both controlled by the host/agent config, not by untrusted user input. - **No secret/credential leakage**: no secrets are handled, logged, or transmitted. - **No SSRF or unsafe deserialization**: no network calls or deserialization are added. - **Insecure defaults**: the `CriticAbsoluteMax` default is 24h (not 6h as the PR description states — the code comment explains this is intentional: the host clamps to a smaller value, and the kernel ceiling must sit well above it). A 24h runaway ceiling for a broken critic is generous but is a deliberate defense-in-depth choice, not a vulnerability — the host's own clamp (e.g., mort's 6h convar) is the real bound. I verified the code at `run/executor.go:65-66` and the comment at lines 32-42. - **Unsupervised-run failsafe** (`executor.go:336-339`): when the critic is enabled but `Monitor` returns nil, the run context is re-wrapped with the nominal `MaxRuntime` timeout. This correctly prevents an unsupervised run from consuming resources up to the 24h ceiling. I verified that `startCritic` launches no goroutine in this path (`critic.go:58-60`), so no dangling watcher exists on the old context. - **TOCTOU**: the `criticOwnsDeadline` predicate is called once at `executor.go:304` and again inside `startCritic` at `critic.go:51` — the same predicate gates both the ceiling choice and the arm/no-op decision, so they cannot drift. No material security issues found. </details> <details><summary>🎯 Correctness — No material issues found</summary> ## Verdict: No material issues found I traced the full context chain, the two-tier timeout logic, the unsupervised failsafe, the `checkpointCause` timing, the `statusFor` classification, and the deadline-watch goroutine interaction. The logic is sound: - **`criticOwnsDeadline`** is the single predicate used in both `Run` (hard cap selection) and `startCritic` (arm/no-op gate) — they cannot drift. - **Hard cap selection**: non-critic → `maxRuntime`; critic-owned → `max(CriticAbsoluteMax, maxRuntime)`. The floor guard handles degenerate configs correctly. - **`startCritic` soft trigger**: now receives the resolved `maxRuntime` (always > 0 via `withFallbacks`), with a 90s defensive floor that is genuinely unreachable from the sole caller. - **Unsupervised failsafe**: when `criticOwns && critic == nil`, `runCtx` is re-wrapped with `WithTimeout(runCtx, maxRuntime)`. The child deadline is tighter than the parent's `CriticAbsoluteMax`, so it fires first — correct. The `cancelCause` function (which references the pre-wrap parent) is never called with a cause in this path because the deadline-watch goroutine was never started. - **`checkpointCause`** closure is set after the unsupervised re-wrap, capturing the context the loop actually runs on. No early returns exist between the old and new positions. - **`statusFor`** correctly classifies the unsupervised timeout as `"timeout"` via `errors.Is(runErr, context.DeadlineExceeded)`. - **`Defaults.CriticSoftTimeout`** removal is clean — zero remaining references confirmed by grep. One minor documentation discrepancy (not a code bug): the PR description says `CriticAbsoluteMax` default is 6h, but the code has 24h. The code is internally consistent (the 24h comment explains it must be "FAR beyond any realistic backstop" since the host clamps to 6h), so this is a PR-description error, not a logic defect. </details> <details><summary>🧹 Code cleanliness & maintainability — No material issues found</summary> **Verdict: No material issues found** The change is clean and well-structured through the maintainability lens: - **`criticOwnsDeadline`** is a well-extracted single predicate, used in both `Run` and `startCritic`, preventing the two decision points from drifting — exactly the right pattern. - **`CriticSoftTimeout` removal** is complete: zero remaining references across the repo (confirmed via grep). No dead config field left behind. - **`Defaults` struct comment** for `CriticAbsoluteMax` is thorough and explains the "why" (runaway guard, host clamps lower, never pre-empts a healthy run). The `Run` method comment repeats some of this, but that's appropriate for a caller-side explanation vs. a field-level contract. - **`checkpointCause` reassignment** after the unsupervised-failsafe re-wrap is correctly ordered and explicitly commented, so a future reader won't mistakenly move it back above the re-wrap. - **Test file** (`critic_deadline_test.go`) is well-organized: helpers (`slowToolInvocation`, `slowModel`, `capturingCritic`) are scoped to the file, tests are named clearly, and the `capturingCritic` type is a purposeful variant of the existing `fakeCritic` (captures soft trigger vs. ignores it) — not copy-paste. - **The 90s defensive floor** in `startCritic` is explicitly documented as unreachable by the sole caller; it's dead code by construction but the comment makes the intent unambiguous, so it won't confuse a future maintainer. No duplicated logic, no leaky abstractions, no confusing names, no deeply nested or overly long functions introduced by this diff. </details> <details><summary>⚡ Performance — No material issues found</summary> I've verified the full code in `run/executor.go` and `run/critic.go`, confirmed there are no other callers of `startCritic`, and confirmed `CriticSoftTimeout` is fully removed (no stale references). Let me now assess through the performance lens. The changes are all per-run (not per-step or per-tool): - **`criticOwnsDeadline`**: two field reads, called once per `Run()`. Zero cost. - **`hardCap` computation**: a conditional + comparison, once per `Run()`. Zero cost. - **`startCritic` signature change**: passes a value already computed (`maxRuntime`). No new work. - **Unsupervised failsafe `WithTimeout`**: one extra context layer, only in the edge case where `Monitor` returns nil. Single allocation, negligible. - **Context chain depth**: unchanged in the normal path; +1 layer only in the unsupervised edge case. Not material. - **Goroutine in `startCritic`**: unchanged polling frequency (`criticDeadlineCheck = 1s`). No regression. - **`Defaults.CriticAbsoluteMax` (24h) vs old `CriticSoftTimeout` (90s)**: both are just timer durations in the runtime heap — a 24h timer costs the same as a 90s timer. - **No new allocations in hot paths** (step observer, tool dispatch, critic binding methods are untouched). - **No N+1, no unbounded growth, no blocking calls on hot paths, no quadratic behavior.** ## Verdict: No material issues found </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> ## Verdict: No material issues found I reviewed the diff through the 🧯 Error handling & edge cases lens. The change is defensive and well-considered: - **`startCritic` soft-trigger fallback**: The `if soft <= 0 { soft = 90 * time.Second }` guard at `critic.go:55` is genuinely defensive — the sole caller (`executor.go:327`) passes `maxRuntime`, which is always `> 0` after `withFallbacks()` (verified at `executor.go:158-161` and `executor.go:50-52`). No reachable zero path. - **Unsupervised-run failsafe** (`executor.go:336-340`): When `criticOwns && critic == nil`, the code wraps `runCtx` with a tighter `WithTimeout(maxRuntime)`. The `cancelCause` function remains tied to the parent `WithCancelCause` context, but since `critic == nil` means no goroutine was launched to call `cancelCause`, the only cancellation source is the new `WithTimeout` — `context.Cause` correctly returns `DeadlineExceeded`. The `checkpointCause` closure is set *after* this re-wrap (`executor.go:344`), so it reads the final context. Verified by tracing the full context chain at `executor.go:315-344`. - **`criticOwnsDeadline` predicate** (`critic.go:29-31`): Single source of truth used by both the hard-cap decision and the arm/no-op gate — they cannot drift. The nil-check on `e.cfg.Ports.Critic` is correct (interface nil check). - **Nil-handle path in `startCritic`** (`critic.go:58-60`): `Monitor` returning nil is handled cleanly — returns `nil, noop`, no goroutine launched, and the caller's unsupervised failsafe catches it. - **Goroutine panic recovery** (`critic.go:67`): The `recover()` in the deadline-watch goroutine is pre-existing, not introduced by this diff. Silent swallow is intentional per the comment. - **Deferred cleanup ordering**: `cancelUnsupervised` → `stopCritic` → `mergeCancel` → `cancelCause(nil)` → `cancelTimeout` — each cancels its own resource in correct reverse order. The conditional `defer cancelUnsupervised()` inside the `if` block is correct Go. - **`hardCap` floor** (`executor.go:311-313`): The `if hardCap < maxRuntime { hardCap = maxRuntime }` guard handles the degenerate config case cleanly. No ignored errors, missing cleanup, nil-pointer risks, or unhandled edge cases found in the new code paths. </details> </details> <details><summary>glm-5.2:cloud (ollama-cloud) — No material issues found</summary> **Verdict: No material issues found** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> **VERDICT: No material issues found** Reviewed through the 🔒 Security lens. This is timeout-plumbing: no new untrusted-input paths, no authn/authz surfaces, no injection/SSRF/deserialization/secret-handling changes. `criticOwnsDeadline` (run/critic.go:29) gates only on operator-controlled config (`Ports.Critic != nil && ra.Critic.Enabled`); the `softTrigger`/`maxRuntime` values are agent/host config, not user-tainted. Security-relevant behavior I verified: - **Unsupervised-run failsafe** (run/executor.go:336-340): when a critic-enabled run's `Monitor` returns a nil handle, the base context is re-tightened from the 24h runaway ceiling back to `MaxRuntime` via `context.WithTimeout(runCtx, maxRuntime)`. Verified `critic == nil` is the exact nil-handle return path from `startCritic` (run/critic.go:59-61). This closes the obvious "critic enabled but never armed → run free for 24h" hole. Good defense-in-depth. - **Ultimate runaway ceiling**: even with a *broken* (non-nil) handle whose `Deadline()` always returns zero, the `WithTimeout(..., CriticAbsoluteMax)` at executor.go:315 still bounds the run. So there is no unbounded-execution path. Minor (non-blocking) note, not a security vulnerability: - The PR description repeatedly states `CriticAbsoluteMax` defaults to **6h**, but the code (`run/executor.go:65-67`) and its own comment default to **24h**. From a threat-model standpoint an operator relying on the description would underestimate the worst-case runaway window (resource exhaustion / cost exposure for a broken-critic run) by 4×. The code is self-consistent; only the description is wrong. Suggest correcting the description to 24h so operators size the failsafe accurately. (Verified: `withFallbacks` sets `24 * time.Hour`.) No findings rise to a security defect; the timeout layering is bounded at every path. </details> <details><summary>🎯 Correctness — No material issues found</summary> **VerDICT: No material issues found** I traced the two-tier timeout logic end-to-end against the actual checked-out code and verified the context-layering semantics are correct. Verified findings (all confirmed by reading the code): - **No-critic path** (`run/executor.go:304-315`): `criticOwns` is false → `hardCap = maxRuntime` → `WithTimeout(..., maxRuntime)`. Legacy hard-cap preserved. ✓ - **Critic-owned, handle armed** (`run/critic.go:49-95`, `executor.go:317-327`): base context gets `CriticAbsoluteMax` (24h) ceiling; critic goroutine calls `cancelCause(ErrCriticKill|DeadlineExceeded)` on the `WithCancelCause` layer. Cause propagates: critic-kill → `Cause(runCtx)` = `ErrCriticKill` → "killed"; backstop expiry → `DeadlineExceeded` → "timeout"; caller cancel flows through `MergeCancellation`. All three classification paths correct. ✓ - **Critic-owned, nil handle failsafe** (`executor.go:336-340`): `critic == nil` → no deadline-watch goroutine was started (`startCritic` returns `nil, noop`), so the `WithTimeout(runCtx, maxRuntime)` re-wrap is the *only* sub-24h deadline. It fires at ~`maxRuntime` from now, `Cause` = `DeadlineExceeded` → "timeout". `checkpointCause` is set *after* the re-wrap (line 344) so it reads the context the loop runs on. Correct. ✓ - **Defers LIFO**: `cancelUnsupervised` (innermost) runs before `stopCritic` (noop here) and the outer cancels — no leak. ✓ - **`CriticSoftTimeout` fully removed**: grep across `run/` shows zero remaining references; the `withFallbacks` now seeds `CriticAbsoluteMax` instead. Clean. ✓ - **`criticOwnsDeadline` single predicate** used by both `Run` (ceiling choice) and `startCritic` (arm/no-op gate) — they cannot drift. ✓ One trivial, non-blocking note (doc vs code, not a runtime bug): - `run/executor.go:66` sets the `CriticAbsoluteMax` default to **24h**, and the code comment at line 294/308 says "default 24h", but the PR description repeatedly claims **6h** ("withFallbacks default 6h", "clamped to 6h"). The code is internally consistent and 24h is strictly safer (more conservative ceiling; the host adapter clamps its own backstop regardless), so there is no behavioral defect — but the description would mislead an operator reading only the PR text. Suggest updating the description to 24h. </details> <details><summary>🧹 Code cleanliness & maintainability — No material issues found</summary> **Verdict: No material issues found** Through the code-cleanliness & maintainability lens, this change is well-structured: - **Good refactor — single predicate.** `criticOwnsDeadline` (`run/critic.go:29`) is the one source of truth shared by both `Run` (ceiling choice) and `startCritic` (arm/no-op gate), with a clear doc note that they share it "so the two can never drift." This eliminates the old duplicated `e.cfg.Ports.Critic == nil || !ra.Critic.Enabled` check and keeps the two-tier decision consistent. - **Clean field swap.** `CriticSoftTimeout` is fully removed — I grepped the whole repo and found zero remaining references. `CriticAbsoluteMax` replaces it with a thorough doc comment. No dangling/stale references. - **Test helpers shared, not duplicated.** The new `critic_deadline_test.go` is in `run_test` and reuses `fakeCritic`/`fakeCriticHandle` from `critic_test.go` rather than redefining them. `slowToolInvocation`/`slowModel` are factored out and reused across the new tests. - **Ordering is deliberate and documented.** The `checkpointCause` closure is set *after* the unsupervised-failsafe re-wrap, with a comment explaining why ("reads the context the loop runs on") — avoiding a subtle stale-closure bug a future maintainer could easily introduce. Minor/trivial (not material): - `run/critic_deadline_test.go:94` `capturingCritic` partially re-implements `fakeCritic.Monitor` (just adds a soft-capture); it could embed `fakeCritic` to avoid the near-duplicate, but it's a 4-line test helper, so churn isn't worth it. - The `CriticAbsoluteMax` field doc and the `Run` block comment are quite long (~11 lines / ~20 lines). They're accurate and genuinely help a future maintainer avoid re-breaking the two-tier semantics, so this is a style preference, not a defect. </details> <details><summary>⚡ Performance — No material issues found</summary> **VERDICT: No material issues found** - `run/executor.go:310` — `e.cfg.Defaults.CriticAbsoluteMax` is read from the `withFallbacks()`-resolved copy (applied in `New` at line 104), so it is the defaulted value (24h when unset). Correct, not a bug. - `run/executor.go:336-339` — the unsupervised-failsafe `WithTimeout` re-wraps `runCtx` only on the rare nil-handle path (host `Monitor` returns nil). One extra timer allocation per such run; immaterial on the hot path and the `checkpointCause` closure is set afterward (line 344) so it reads the re-wrapped context. Correct. - `run/critic.go:68` — the deadline-watch ticker interval (`criticDeadlineCheck = time.Second`, line 14) is unchanged by this diff; the soft trigger value does not affect the poll cadence. No regression; negligible per-in-flight-run cost. Correct. - No N+1, unbounded growth, new hot-loop work, or missing limits introduced. The only new per-run allocations are the existing `context.WithTimeout` plus, in the rare nil-handle case, one extra `WithTimeout` and a deferred cancel. `criticOwnsDeadline` is two field reads. No quadratic or blocking behavior on a hot path. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **VERDICT: No material issues found** I reviewed the diff through the error-handling & edge-cases lens, verifying against the checked-out code. - **nil-handle failsafe** (`run/executor.go:336-340`): when `criticOwns && critic == nil`, the run is re-wrapped with `context.WithTimeout(runCtx, maxRuntime)`. I confirmed `maxRuntime` is always > 0 (resolved at `executor.go:158-161` with a `Defaults.MaxRuntime` fallback that `withFallbacks` guarantees > 0), so this never creates a zero/burst timeout. The `defer cancelUnsupervised()` is registered in correct LIFO order after `cancelCause`/`mergeCancel`, so cleanup is complete on every exit path. Verified. - **cause propagation for the failsafe ceiling** (`run/executor.go:315`, `statusFor` at `:520-538`): the 24h `WithTimeout` parent → `WithCancelCause` child chain means a ceiling expiry surfaces as `context.DeadlineExceeded` via `context.Cause(runCtx)` → "timeout"; a critic kill surfaces as `ErrCriticKill` → "killed". The `checkpointCause` closure is deliberately assigned *after* the unsupervised re-wrap (`:344`) so it reads the context the loop actually runs on. Verified correct. - **`startCritic` soft-trigger floor** (`run/critic.go:54-57`): `soft <= 0` falls back to 90s. The sole caller passes the resolved `maxRuntime` (always > 0), so the floor is pure defence — but it correctly handles a hypothetical non-positive caller, no panic/zero-duration `Monitor` call. Verified. - **critic goroutine panic recovery** (`run/critic.go:67`): `defer func() { _ = recover() }()` guards a panicking host `Deadline()`/`KillCause()` on its own goroutine (the main `recover` can't catch it). Verified present and appropriate. - **re-wrap vs. goroutine reference aliasing**: the unsupervised `WithTimeout` reassigns `runCtx`, but the critic deadline-watch goroutine only exists when `critic != nil`, and the re-wrap branch is gated on `critic == nil` — so the goroutine never observes a stale `runCtx`. No leak or missed cancellation. Verified by reading both branches. - **MergeCancellation interaction** (`run/runengine.go:62-73`): the merge goroutine cancels the merged context with the caller's cause on caller cancel; this composes correctly with both the `criticOwns` and non-critic paths. No swallowed error. No ignored errors, missing defers, nil-deref, zero/negative-duration, or boundary issues found in this lens. </details> </details> <details><summary>kimi-k2.6:cloud (ollama-cloud) — No material issues found</summary> **Verdict: No material issues found · ⚠️ 1/5 lens(es) errored** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> **Verdict: No material issues found** From a security lens, the change correctly restores the two-tier timeout semantics without introducing new trust-boundary gaps, injection vectors, or unsafe defaults. - The `criticOwnsDeadline` predicate is used consistently as the single gate for both the run context ceiling selection and the critic arm decision, preventing drift. - The `CriticAbsoluteMax` (default 24h) is explicitly documented as a generous runaway guard for a broken/nil host handle, not the normal path. The host is expected to clamp its own backstop to a much smaller value (e.g., mort's 6h convar). An operator can lower it via `Defaults.CriticAbsoluteMax`. - The unsupervised-run failsafe (`criticOwns && critic == nil`) correctly tightens the context back to `MaxRuntime` when the host Monitor returns no handle, preventing a run from escaping to the 24h ceiling if the critic fails to arm. - Input to `startCritic`'s `softTrigger` is the resolved `MaxRuntime`, which `withFallbacks` guarantees is `> 0`; the `<= 0` branch is purely defensive and unreachable from the actual caller. - `statusFor` preserves the kill-vs-timeout-vs-cancel distinction via `context.Cause`, with `ErrCriticKill` wrapped via `%w` so `errors.Is` matches correctly. No authn/authz gaps, injection, SSRF, deserialization, secret leakage, or insecure-default issues were identified. </details> <details><summary>🎯 Correctness — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 10: context deadline exceeded </details> <details><summary>🧹 Code cleanliness & maintainability — No material issues found</summary> No material issues found. The change is structurally clean through the maintainability lens: - `criticOwnsDeadline` is a well-chosen extraction that centralises the two-tier-timeout predicate and prevents `Run` and `startCritic` from drifting out of sync. - `CriticSoftTimeout` is cleanly removed from `Defaults`; no stale references remain. - Test helpers in the new file are appropriately scoped and factored (`slowToolInvocation`, `slowModel`, `capturingCritic`). - Comments in `executor.go` and `critic.go` are verbose but justified by the semantic complexity of the two-tier timeout model. </details> <details><summary>⚡ Performance — No material issues found</summary> **No material issues found** Reviewed through the **Performance** lens. The change restores two-tier deadline semantics: critic-owned runs get a generous runaway ceiling (`CriticAbsoluteMax`, default 24 h) instead of `MaxRuntime`, while non-critic runs remain unchanged. What I checked: - `run/executor.go:304-344` — the new context chain for critic-owned runs adds at most one extra `WithTimeout` wrap (the unsupervised nil-handle fallback). This is per-run, not per-step or per-iteration, so it does not affect hot-path latency. - `run/critic.go:62-95` — the deadline-watch goroutine and its 1-second ticker are unchanged. - `MergeCancellation` goroutine behavior is unchanged. - The 24-hour default timer is created once per run and immediately stopped via deferred `cancelTimeout()` on every exit path (including panic/recover), so there is no timer leak or heap pressure from long-duration timers. No N+1 queries, no new allocations inside loops, no unbounded growth, no blocking calls on hot paths, and no quadratic behavior introduced. The diff is clean from a performance standpoint. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> No material issues found. The two-tier timeout logic is carefully structured with proper nil-safety, defer ordering, and edge-case handling: - The unsupervised-run fallback (`criticOwns && critic == nil`) correctly re-wraps `runCtx` with `maxRuntime` so a nil `Monitor` handle cannot accidentally run up to the 24h ceiling. - `checkpointCause` is assigned **after** the unsupervised re-wrap, so it reads the correct context the loop actually executes on. - All `criticBinding` methods are nil-safe, so the executor can call `critic.recordStep`, `critic.drainSteer()`, etc. unconditionally. - `startCritic`'s deadline-watch goroutine defends against panics in `h.Deadline()` via `recover()`. - `withFallbacks` guarantees `CriticAbsoluteMax > 0`, and the `hardCap < maxRuntime` clamp ensures the ceiling never undershoots the nominal budget even for degenerate configs. No ignored errors, missing cleanups, nil panics, or unhandled boundary conditions were introduced. </details> </details> <details><summary>qwen3.5:397b-cloud (ollama-cloud) — Minor issues</summary> **Verdict: Minor issues · ⚠️ 1/5 lens(es) errored** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary>🔒 Security — No material issues found</summary> ## Verdict: No material issues found After reviewing the changes through the security lens and verifying against the checked-out code: **What I verified:** - Read `run/executor.go` (lines 1-100, 260-360) to trace the timeout context chain - Read `run/critic.go` (lines 1-100) to verify the deadline-watch goroutine logic - Confirmed `CriticSoftTimeout` is no longer referenced (grep returned no results) - Verified the `checkpointCause` closure is set AFTER the unsupervised failsafe re-wrap (line 344) **Security analysis:** 1. **Timeout escape paths**: The unsupervised failsafe (lines 336-340) correctly tightens the context back to `maxRuntime` when `criticOwns && critic == nil`. The context chain is: `timeoutCtx` (24h ceiling) → `runCtx` (cancelCause) → new `runCtx` (maxRuntime timeout). When maxRuntime expires, cancellation propagates correctly. 2. **Critic handle with no deadline**: A non-nil critic handle whose `Deadline()` always returns zero would leave the run bounded only by `CriticAbsoluteMax` (24h default). This is **intentional design** - confirmed by `TestCriticOwnsDeadline_SurvivesPastMaxRuntime` which explicitly tests this scenario with `fakeCriticHandle{}` (Deadline returns zero). The 24h ceiling is the documented runaway guard. 3. **Input validation**: The `softTrigger` parameter comes from `maxRuntime`, which `withFallbacks()` guarantees is >0. The defensive check at `critic.go:55-56` is a harmless backup. 4. **No injection/leakage/authz issues**: The changes are purely timeout semantics - no untrusted input handling, no secrets, no external calls modified. The 24h default for `CriticAbsoluteMax` could be considered generous (mort uses 6h per the PR description), but this is a configuration choice, not a code vulnerability. The mechanism correctly implements the two-tier timeout semantics. </details> <details><summary>🎯 Correctness — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: max steps reached without a final answer (max 14) </details> <details><summary>🧹 Code cleanliness & maintainability — Minor issues</summary> ## VERDICT: Minor issues found **Findings:** - **`run/executor.go:336-340` — Unusual defer-in-conditional pattern**: The unsupervised-run failsafe wraps `runCtx` with a new timeout and defers cancellation inside the `if` block. While this works in Go (the defer is registered conditionally and captures the variable), this pattern is less common and slightly harder to read than the alternative of declaring the variable outside and using a named cleanup function. Suggested fix: Extract to a small helper or use a more conventional pattern: ```go var cancelUnsupervised context.CancelFunc if criticOwns && critic == nil { runCtx, cancelUnsupervised = context.WithTimeout(runCtx, maxRuntime) } defer func() { if cancelUnsupervised != nil { cancelUnsupervised() } }() ``` - **`run/executor.go:37-39` — External system reference in comment**: The `CriticAbsoluteMax` field comment references `mort's agents.critic.absolute_max_seconds = 6h` — an external convar from a different codebase. While contextually helpful, embedding external system configuration details in kernel comments creates a maintenance coupling. Suggested fix: Rephrase to describe the relationship without hardcoding external values (e.g., "the host clamps its own backstop to a smaller absolute max"). </details> <details><summary>⚡ Performance — Minor issues</summary> The finding is verified. The code does create a nested timeout structure where: 1. Line 315: `timeoutCtx` is created with `hardCap` (24h when critic owns) 2. Lines 336-340: When `criticOwns && critic == nil`, a new timeout wraps `runCtx` with `maxRuntime` The outer 24h timer remains scheduled until the function exits and the defer at line 316 fires. This is technically wasteful, though it's a rare edge case (defensive path). The finding is accurate and survives verification. The verdict of "Minor issues" is appropriate since this is a resource efficiency concern in a rare edge case, not a correctness bug. **VERDICT:** Minor issues **Findings:** - **run/executor.go:315-340** — *Inefficient nested timeout in unsupervised-run failsafe.* When `criticOwns && critic == nil` (critic configured/enabled but host returned nil handle), the code first creates `timeoutCtx` with the generous 24h `CriticAbsoluteMax` (line 315), then wraps it with a nested `WithTimeout(runCtx, maxRuntime)` (line 338). The outer 24h timer remains scheduled even though it will never fire (the inner `maxRuntime` timeout fires first). In a high-throughput system where critic handles occasionally fail to arm, this accumulates dormant 24h timers that waste resources. **Fix:** Check the critic handle validity *before* creating the generous timeout, or cancel/replace the outer timeout when tightening: `cancelTimeout(); timeoutCtx, cancelTimeout = context.WithTimeout(context.WithoutCancel(ctx), maxRuntime)`. *(Verified by reading run/executor.go:315-340 and run/executor.go:65-66 confirming 24h default)* </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> All findings in the draft review are either style suggestions or based on misunderstandings. The code is well-documented, tested, and works as designed: 1. The `context.Cause` usage is standard Go 1.20+ behavior and is already well-documented in the code comments. 2. The silent return when `Monitor` returns `nil` is intentional defensive behavior, as documented in the code comment ("mort's adapter always arms when the flag is set, so this is pure defence in depth"). 3. The `CriticAbsoluteMax` default of 24h vs the host's 6h clamp is intentional design - the 24h is the kernel ceiling while 6h is the host clamp, as clearly documented in the struct comment at lines 32-42. No material issues found. ## Verdict: No material issues found After reviewing the changes through the error handling & edge cases lens and verifying against the checked-out code, all findings in the draft review are either style suggestions or based on misunderstandings of the intentional design: - The unsupervised failsafe at `executor.go:336-340` correctly re-wraps the context and sets `checkpointCause` after the re-wrap, ensuring proper cancellation cause propagation. The behavior is tested by `TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime`. - The silent return when `Monitor` returns `nil` at `critic.go:58-60` is intentional defensive behavior, as documented in the code comment at `executor.go:330-335`. - The `CriticAbsoluteMax` default of 24h at `executor.go:66` is intentionally larger than the host's 6h clamp, as explained in the struct comment at lines 32-42. </details> </details> </details> Automated adversarial review by Gadfly — consensus across the model swarm. Advisory only — does not block merge.

steve added 1 commit 2026-06-30 15:32:49 +00:00

fix(run): address gadfly review of the critic-deadline PR

executus CI / test (pull_request) Successful in 1m45s

Details

cb4c612461

All 11 findings were real (3 clusters):

- Failsafe ceiling could pre-empt the critic's backstop (e9c9483f, 9109317b,
  d5a9bf0d, 76ad171e): CriticAbsoluteMax was 6h, but the host's backstop
  (MaxRuntime × multiplier, or its own absolute max) can reach 6h+, so the
  ceiling fired first and reintroduced a premature hard cap. Now CriticAbsoluteMax
  is a 24h RUNAWAY guard set far beyond any realistic backstop (the host clamps
  its own backstop to a much smaller absolute max, e.g. mort's 6h convar), so it
  never pre-empts a healthy supervised run. Comments corrected.

- nil Monitor handle lost the MaxRuntime cap (df016a6f, 9dd42827): a critic-enabled
  run whose host Monitor returned no handle had no deadline-watch and was bounded
  only by the generous ceiling. Added an unsupervised-run failsafe that re-wraps
  runCtx to the nominal MaxRuntime when the critic is enabled but didn't arm.
  New test TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime.

- CriticSoftTimeout vestigial / dead fallback (f7764919, 9805bebe, 6864086f,
  b2b11721): the soft trigger is now always the resolved MaxRuntime (> 0), so the
  CriticSoftTimeout field + its startCritic fallback were unreachable. Removed the
  field entirely; the remaining 90s floor is documented as defensive-only.

- DRY (f30ce827): extracted e.criticOwnsDeadline(ra), now the single predicate used
  by both Run and startCritic so they can't drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X

gitea-actions bot reviewed 2026-06-30 15:49:58 +00:00

gitea-actions bot left a comment

🪰 Gadfly consensus review — 5 inline findings on changed lines. See the consensus comment for the full ranked summary.

_{Advisory only — does not block merge.}

🪰 **Gadfly consensus review** — 5 inline findings on changed lines. See the consensus comment for the full ranked summary. Advisory only — does not block merge.

run/critic.go

						
				@@ -34,1 +42,3 @@

				func (e *Executor) startCritic(runCtx context.Context, cancelCause context.CancelCauseFunc, ra RunnableAgent, info RunInfo) (*criticBinding, func()) {

				//

				// softTrigger is the run's resolved MaxRuntime: for a critic-owned run MaxRuntime

				// is the soft wake (mort's two-tier semantics — the critic first reviews once the

gitea-actions bot commented

2026-06-30 15:49:58 +00:00

🟡 softTrigger=MaxRuntime is overloaded as the default battery's idle-stall window (critic.go:262), delaying hang detection from ~90s to a full MaxRuntime; comment 'reviews once the run exceeds its nominal budget' misdescribes the idle-based default behavior

correctness · flagged by 1 model

_{🪰 Gadfly · advisory}

🟡 **softTrigger=MaxRuntime is overloaded as the default battery's idle-stall window (critic.go:262), delaying hang detection from ~90s to a full MaxRuntime; comment 'reviews once the run exceeds its nominal budget' misdescribes the idle-based default behavior** _correctness · flagged by 1 model_ 🪰 Gadfly · advisory

run/critic.go Outdated

						
				@@ -40,2 +54,3 @@

					soft := softTrigger

					if soft <= 0 {

						soft = 90 * time.Second // defensive: withFallbacks normally guarantees >0

						soft = 90 * time.Second // defensive only; the sole caller passes MaxRuntime (>0)

gitea-actions bot commented

2026-06-30 15:49:58 +00:00

⚪ Unreachable 90s defensive fallback reintroduces a bare magic literal that was previously a named default (CriticSoftTimeout)

maintainability · flagged by 1 model

_{🪰 Gadfly · advisory}

⚪ **Unreachable 90s defensive fallback reintroduces a bare magic literal that was previously a named default (CriticSoftTimeout)** _maintainability · flagged by 1 model_ 🪰 Gadfly · advisory

run/executor.go

						
				@@ -33,0 +34,4 @@

					// not a hard cap, and the critic's own extendable backstop is the normal

					// deadline. This ceiling exists ONLY to stop a critic that never advances its

					// deadline (a broken host handle) from running forever, so it is deliberately

					// set FAR beyond any realistic backstop (default 24h): the host clamps its own

gitea-actions bot commented

2026-06-30 15:49:58 +00:00

⚪ Comment references external system convar value

maintainability · flagged by 1 model

run/executor.go:37-39 — External system reference in comment: The CriticAbsoluteMax field comment references mort's agents.critic.absolute_max_seconds = 6h — an external convar from a different codebase. While contextually helpful, embedding external system configuration details in kernel comments creates a maintenance coupling. Suggested fix: Rephrase to describe the relationship without hardcoding external values (e.g., "the host clamps its own backstop to a smaller absolute max").

_{🪰 Gadfly · advisory}

⚪ **Comment references external system convar value** _maintainability · flagged by 1 model_ - **`run/executor.go:37-39` — External system reference in comment**: The `CriticAbsoluteMax` field comment references `mort's agents.critic.absolute_max_seconds = 6h` — an external convar from a different codebase. While contextually helpful, embedding external system configuration details in kernel comments creates a maintenance coupling. Suggested fix: Rephrase to describe the relationship without hardcoding external values (e.g., "the host clamps its own backstop to a smaller absolute max"). 🪰 Gadfly · advisory

run/executor.go Outdated

						
				@@ -280,0 +312,4 @@

							hardCap = maxRuntime

						}

					}

					timeoutCtx, cancelTimeout := context.WithTimeout(context.WithoutCancel(ctx), hardCap)

gitea-actions bot commented

2026-06-30 15:49:58 +00:00

🟠 Nested 24h timeout wastes timer resources in unsupervised-run failsafe path

performance · flagged by 1 model

run/executor.go:315-340 — Inefficient nested timeout in unsupervised-run failsafe. When criticOwns && critic == nil (critic configured/enabled but host returned nil handle), the code first creates timeoutCtx with the generous 24h CriticAbsoluteMax (line 315), then wraps it with a nested WithTimeout(runCtx, maxRuntime) (line 338). The outer 24h timer remains scheduled even though it will never fire (the inner maxRuntime timeout fires first). In a high-throughput system where cri…

_{🪰 Gadfly · advisory}

🟠 **Nested 24h timeout wastes timer resources in unsupervised-run failsafe path** _performance · flagged by 1 model_ - **run/executor.go:315-340** — *Inefficient nested timeout in unsupervised-run failsafe.* When `criticOwns && critic == nil` (critic configured/enabled but host returned nil handle), the code first creates `timeoutCtx` with the generous 24h `CriticAbsoluteMax` (line 315), then wraps it with a nested `WithTimeout(runCtx, maxRuntime)` (line 338). The outer 24h timer remains scheduled even though it will never fire (the inner `maxRuntime` timeout fires first). In a high-throughput system where cri… 🪰 Gadfly · advisory

run/executor.go

						
				@@ -295,0 +333,4 @@

					// bounded only by the 24h ceiling. Tighten it back to the nominal MaxRuntime so

					// an unsupervised run can't hold its slot far past budget. mort's adapter always

					// arms when the flag is set, so this is pure defence in depth.

					if criticOwns && critic == nil {

gitea-actions bot commented

2026-06-30 15:49:58 +00:00

🟡 Unusual defer-in-conditional pattern in unsupervised failsafe

maintainability · flagged by 1 model

run/executor.go:336-340 — Unusual defer-in-conditional pattern: The unsupervised-run failsafe wraps runCtx with a new timeout and defers cancellation inside the if block. While this works in Go (the defer is registered conditionally and captures the variable), this pattern is less common and slightly harder to read than the alternative of declaring the variable outside and using a named cleanup function. Suggested fix: Extract to a small helper or use a more conventional pattern: ```…

_{🪰 Gadfly · advisory}

🟡 **Unusual defer-in-conditional pattern in unsupervised failsafe** _maintainability · flagged by 1 model_ - **`run/executor.go:336-340` — Unusual defer-in-conditional pattern**: The unsupervised-run failsafe wraps `runCtx` with a new timeout and defers cancellation inside the `if` block. While this works in Go (the defer is registered conditionally and captures the variable), this pattern is less common and slightly harder to read than the alternative of declaring the variable outside and using a named cleanup function. Suggested fix: Extract to a small helper or use a more conventional pattern: ```… 🪰 Gadfly · advisory

steve added 1 commit 2026-06-30 15:54:39 +00:00

fix(run): address round-2 gadfly nits (max(), drop dead soft fallback, decouple doc)

executus CI / test (pull_request) Successful in 49s

Details

79ce833dd7

All low-severity follow-ups on the critic-deadline change:
- hardCap uses max(CriticAbsoluteMax, maxRuntime) instead of a nested if (723193a7).
- Drop the now-dead 90s soft-trigger fallback + its bare literal: the sole caller
  passes the resolved MaxRuntime (>0), and Run's unsupervised-run failsafe bounds
  even an impossible 0 (8d377051, 2f86bf58).
- Decouple the kernel doc from a named downstream convar ("a 6h host convar")
  (730c67fc).

Graded false-positive: agent.go BackstopMultiplier validation (handled in the host;
not in this diff), the 24h default "magic number" (matches every withFallbacks
default), and the defer-in-conditional pattern (idiomatic). Kept: the thorough
two-tier comment (this logic regressed once) and the rare-path nested timer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X

steve merged commit 2a43210f38 into main

2026-06-30 15:56:31 +00:00

steve deleted branch feat/critic-owns-deadline

2026-06-30 15:56:31 +00:00

steve referenced this issue from a commit

2026-06-30 15:58:01 +00:00

feat(critic): restore the supervised two-tier timeout + extend it to skills

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: steve/executus#21

feat(run): critic owns the deadline — MaxRuntime becomes the soft trigger #21

Problem

Fix — restore the old agentexec two-tier semantics

Tests (run/critic_deadline_test.go)

🪰 Gadfly — live review status

claude-code/opus · claude-code — ✅ done

claude-code/opus:max · claude-code — ✅ done

deepseek-v4-pro:cloud · ollama-cloud — ✅ done

glm-5.2:cloud · ollama-cloud — ✅ done

kimi-k2.6:cloud · ollama-cloud — ✅ done

qwen3.5:397b-cloud · ollama-cloud — ✅ done

🪰 Gadfly review — consensus across 6 models

VERDICT: No material issues found

VERDICT: Minor issues

VERDICT: No material issues found

VERDICT: No material issues found

VERDICT: No material issues found

VERDICT: Minor issues

Verdict: No material issues found

Verdict: No material issues found

Verdict: No material issues found

Verdict: No material issues found

Verdict: No material issues found

VERDICT: Minor issues found

Verdict: No material issues found

Tests (`run/critic_deadline_test.go`)

`claude-code/opus` · claude-code — ✅ done

`claude-code/opus:max` · claude-code — ✅ done

`deepseek-v4-pro:cloud` · ollama-cloud — ✅ done

`glm-5.2:cloud` · ollama-cloud — ✅ done

`kimi-k2.6:cloud` · ollama-cloud — ✅ done

`qwen3.5:397b-cloud` · ollama-cloud — ✅ done