run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status) #11

2026-06-27T20:35:29Z

steve commented

2026-06-27 20:35:29 +00:00

Completes the run-critic seam so a host adapter (mort's agentcritic) has full fidelity — closing the two limitations gadfly surfaced on mort #1334.

Changes

RecordStep(iter int, resp *llm.Response) — the completed step's model response is now passed to the critic (was index-only), so a host that keeps a trace (mort's ProgressRecorder) can show what the agent produced, not just an iteration count. The executor forwards s.Response; the battery ignores it (its Progress is count-based).
CriticHandle.KillCause() error + run.ErrCriticKill — the executor now distinguishes an explicit critic KILL from a natural backstop timeout. runCtx uses a cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause → killed / timeout / cancelled are now distinct (were all collapsed to cancelled). The battery sets killCause from Decision.KillReason.

Tests

statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle + battery RecordStep/KillCause signatures updated. Core stays battery-free; full suite green.

Pairs with a mort adapter PR that uses the richer interface (feed the recorder the full step; return the kill error from KillCause). Shipping in executus v0.1.0.

🤖 Generated with Claude Code

Completes the run-critic seam so a host adapter (mort's `agentcritic`) has full fidelity — closing the two limitations gadfly surfaced on mort #1334. ### Changes - **`RecordStep(iter int, resp *llm.Response)`** — the completed step's model response is now passed to the critic (was index-only), so a host that keeps a trace (mort's `ProgressRecorder`) can show *what the agent produced*, not just an iteration count. The executor forwards `s.Response`; the battery ignores it (its `Progress` is count-based). - **`CriticHandle.KillCause() error` + `run.ErrCriticKill`** — the executor now distinguishes an explicit critic **KILL** from a natural backstop **timeout**. `runCtx` uses a cause-carrying cancel (`WithCancelCause` + a `MaxRuntime` timer cancelling with `DeadlineExceeded`); the deadline-watch cancels with `ErrCriticKill` when `KillCause()!=nil`, else `DeadlineExceeded`. `statusFor` reads `context.Cause` → **killed / timeout / cancelled are now distinct** (were all collapsed to `cancelled`). The battery sets `killCause` from `Decision.KillReason`. ### Tests `statusFor` "killed" case (cause=`ErrCriticKill`, err=`Canceled`); fake handle + battery `RecordStep`/`KillCause` signatures updated. Core stays battery-free; full suite green. Pairs with a mort adapter PR that uses the richer interface (feed the recorder the full step; return the kill error from `KillCause`). Shipping in **executus v0.1.0**. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

steve added 1 commit 2026-06-27 20:35:30 +00:00

run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status)

executus CI / test (pull_request) Successful in 46s

Details

Adversarial Review (Gadfly) / review (pull_request) Successful in 22m30s

Details

390e6cf905

Completes the run-critic seam so a host adapter (mort's agentcritic) has full
fidelity, closing the two limitations gadfly surfaced on mort #1334.

- RecordStep(iter int, resp *llm.Response): the completed step's model response
  is now passed to the critic (was index-only), so a host that records a trace
  (mort's ProgressRecorder) can show what the agent actually produced, not just
  an iteration count. The executor forwards s.Response; the battery ignores it
  (its Progress is count-based).
- CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes
  an explicit critic KILL from a natural backstop expiry. runCtx uses a
  cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with
  DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when
  KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause →
  killed / timeout / cancelled are now distinct (were all "cancelled"). The
  battery sets killCause from Decision.KillReason on a Kill.

Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle
+ battery RecordStep/KillCause signatures. Core stays battery-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitea-actions bot commented

2026-06-27 20:35:33 +00:00

🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5)

Verdict: No material issues found · ⚠️ 2/3 lens(es) errored — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

Security lens review:

The changes are primarily structural (interface evolution + context cancellation semantics). I verified the following:

KillReason from untrusted host input into error messages: critic/critic.go:283 — h.killCause = errors.New(reason) uses d.KillReason verbatim, with no validation, sanitization, or length limit. This string originates from the host-provided OnSoftTimeout callback, which could be attacker-controlled (misconfigured host) or contain sensitive system details. If error messages are logged, this creates a log injection / information disclosure vector. A plausible mitigation is truncating or at least validating KillReason before wrapping in an error (e.g., strip newlines, cap length).
LLM-sourced resp *llm.Response through the RecordStep interface: The response is silently ignored (_), which is safe for now. However, the comment explicitly notes "a richer Escalator could record it" — any future host implementation that stores or logs this payload from untrusted model output without redaction would inherit an information disclosure risk. This isn't a bug in this PR (the nil-ignoring path is correct), but the interface design invites misuse without guardrails.

All other changes — RecordStep signature, KillCause method, context.WithCancelCause migration, statusFor distinction logic — are sound within this lens:

statusFor correctly uses errors.Is(context.Cause(runCtx), ErrCriticKill) to check the sentinel, not equality (preventing spoofing).
The MergeCancellation + cancelCause interaction is consistent; no new attack surface.
No authn/authz, SSRF, deserialization, or secret leakage issues introduced.

The KillReason injection point (#1 above) is the only concrete security finding, and it's a low-severity pattern common in Go error wrapping — worth sanitizing but not a blocking issue.

🎯 Correctness — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 6: context deadline exceeded

🧯 Error handling & edge cases — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 7: context deadline exceeded

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 22m 26s}

### 🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5) **Verdict: No material issues found · ⚠️ 2/3 lens(es) errored** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> ## No material issues found **Security lens review:** The changes are primarily structural (interface evolution + context cancellation semantics). I verified the following: 1. **`KillReason` from untrusted host input into error messages**: `critic/critic.go:283` — `h.killCause = errors.New(reason)` uses `d.KillReason` verbatim, with no validation, sanitization, or length limit. This string originates from the host-provided `OnSoftTimeout` callback, which could be attacker-controlled (misconfigured host) or contain sensitive system details. If error messages are logged, this creates a **log injection** / information disclosure vector. A plausible mitigation is truncating or at least validating `KillReason` before wrapping in an error (e.g., strip newlines, cap length). 2. **LLM-sourced `resp *llm.Response` through the `RecordStep` interface**: The response is silently ignored (`_`), which is safe for now. However, the comment explicitly notes "a richer Escalator could record it" — any future host implementation that stores or logs this payload from untrusted model output without redaction would inherit an information disclosure risk. This isn't a bug in *this* PR (the nil-ignoring path is correct), but the interface design invites misuse without guardrails. All other changes — `RecordStep` signature, `KillCause` method, `context.WithCancelCause` migration, `statusFor` distinction logic — are sound within this lens: - `statusFor` correctly uses `errors.Is(context.Cause(runCtx), ErrCriticKill)` to check the sentinel, not equality (preventing spoofing). - The `MergeCancellation` + `cancelCause` interaction is consistent; no new attack surface. - No authn/authz, SSRF, deserialization, or secret leakage issues introduced. The `KillReason` injection point (#1 above) is the only concrete security finding, and it's a low-severity pattern common in Go error wrapping — worth sanitizing but not a blocking issue. </details> <details><summary>🎯 Correctness — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 6: context deadline exceeded </details> <details><summary>🧯 Error handling & edge cases — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 7: context deadline exceeded </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 22m 26s

gitea-actions bot commented

2026-06-27 20:35:33 +00:00

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

🔒 Security Review

VERDICT: No material issues found

Verified findings:

run/ports.go:17 — ErrCriticKill is a package-private sentinel constructed via errors.New with a static string; no untrusted input flows into it. Safe.
run/critic.go:67 — The kill-reason string from the critic's KillCause() is wrapped via fmt.Errorf("%w: %s", ErrCriticKill, cause.Error()). cause originates from the in-process critic (critic/critic.go:283, which builds errors.New(reason) from d.KillReason set by a host Escalator). It's host-controlled but stays in-process (no SQL/command/path/shell sinks, no template rendering, no SSRF target). The %s verb with a non-error value is fine here since cause.Error() returns a string. No injection vector.
critic/critic.go:283 — killCause = errors.New(reason) where reason is a Decision.KillReason string from the host's Escalator. Host-controlled but trusted (same trust domain as the rest of the critic battery); no untrusted-data sinks. Safe.
run/executor.go:306-308 — statusFor reads context.Cause(runCtx). runCtx is context.WithoutCancel(ctx) wrapped in WithCancelCause then MergeCancellation. MergeCancellation (run/runengine.go:69) propagates context.Cause(secondary) — i.e. the caller's cancellation cause. A caller stamping context.Canceled as the cause is correctly reported as "cancelled"; a caller stamping something exotic flows into errors.Is checks against ErrCriticKill/DeadlineExceeded/Canceled. No bypass of the cause classification — only the three intended sentinels map to terminal statuses, anything else falls through to "error". Safe.
run/executor.go:205 — time.AfterFunc(maxRuntime, ...) cancels with the static context.DeadlineExceeded sentinel. No untrusted input. Safe.
Interface change — CriticHandle.RecordStep now carries *llm.Response. The critic battery (critic/critic.go:172) discards it (_ *llm.Response); the executor forwards it from majordomo's step observer (run/executor.go:230). *llm.Response is the same struct already passed to rec.OnStep (audit) without sanitisation — no new taint surface introduced. A host critic now sees the model's response, but trust is unchanged (the host already trusted the model layer to invoke RecordStep). No new attack surface.
No new credentials, no new external I/O, no new deserialization, no new authn/authz seams. The diff is internal control-flow / interface evolution; no security-relevant change.

🎯 Correctness — No material issues found

VERDICT: No material issues found

I traced the full correctness path through the change (context plumbing, cause propagation, ordering of statusFor cases, defer ordering) and could not surface a material bug.

Verified by reading:

run/executor.go:203-215,302-315 — WithCancelCause + time.AfterFunc(maxRuntime, …), defer order (stopCritic → mergeCancel → runTimer.Stop → cancelCause(nil)), statusFor ordering (ErrCriticKill → DeadlineExceeded → Canceled → error).
run/critic.go:34-86 — watch goroutine uses the merged runCtx; cancelCause is the WithCancelCause cancel; cause is set to ErrCriticKill (wrapped) or context.DeadlineExceeded based on KillCause().
critic/critic.go:211-215,277-286 — KillCause() is mutex-guarded; killCause is set from Decision.KillReason (default "critic killed the run").
run/runengine.go:55-73 — MergeCancellation cascades cause from parent (WithCancelCause) into the merged child via context-tree propagation, which is what statusFor relies on.
run/executor_test.go:148-174 — every test case is consistent: for an un-cancelled bg, context.Cause(bg)==nil; for killCtx cancelled with fmt.Errorf("%w: hung", ErrCriticKill), errors.Is(context.Cause(killCtx), ErrCriticKill) is true.

Notes (verified, no defect):

Cause-cascading correctness relies on Go's documented behavior that a child's context.Cause reflects the first cancellation in its ancestry via CancelCauseFunc. When cancelCause(ErrCriticKill) is called, the merged child reports ErrCriticKill; when cancelCause(DeadlineExceeded) is called, it reports DeadlineExceeded. Both land on the right statusFor arm.
KillCause()!=nil is sticky (killed flag in the critic handle prevents later un-kill), so the watch goroutine won't observe a "kill" then a "natural expiry" — kill wins, and cancelCause(ErrCriticKill) is idempotent against any later MaxRuntime timer firing.
defer cancelCause(nil) runs after statusFor, so it can't influence the recorded status; clearing the cause on the way out is purely hygiene.
The MaxRuntime → AfterFunc replacement preserves the prior WithTimeout → "timeout" semantics for the common case (no critic kill).

No constants, formulas, units, or thresholds introduced by this change are arithmetically suspect. The criticDeadlineCheck = 1s and MaxRuntime = 60s defaults were pre-existing.

🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

run/critic.go:48-75 — panic-recover in the deadline-watch silently abandons monitoring. The defer func() { _ = recover() }() swallows any panic from h.Deadline() / h.KillCause() and the goroutine then returns without logging. stopCritic's deferred h.Stop() still fires (no leak), but the hard-deadline backstop is silently lost for the rest of the run — only runTimer (MaxRuntime) or caller-cancel will trip. Worth at minimum an slog warning so a misbehaving host CriticHandle doesn't degrade the run invisibly; contrast critic/critic.go:228-236 which does log on panic. Confirmed by reading the goroutine body and the watch analogue in critic/critic.go.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 38s}

### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> ## 🔒 Security Review **VERDICT: No material issues found** Verified findings: - **`run/ports.go:17`** — `ErrCriticKill` is a package-private sentinel constructed via `errors.New` with a static string; no untrusted input flows into it. Safe. - **`run/critic.go:67`** — The kill-reason string from the critic's `KillCause()` is wrapped via `fmt.Errorf("%w: %s", ErrCriticKill, cause.Error())`. `cause` originates from the in-process critic (`critic/critic.go:283`, which builds `errors.New(reason)` from `d.KillReason` set by a host `Escalator`). It's host-controlled but stays in-process (no SQL/command/path/shell sinks, no template rendering, no SSRF target). The `%s` verb with a non-error value is fine here since `cause.Error()` returns a `string`. No injection vector. - **`critic/critic.go:283`** — `killCause = errors.New(reason)` where `reason` is a `Decision.KillReason` string from the host's `Escalator`. Host-controlled but trusted (same trust domain as the rest of the critic battery); no untrusted-data sinks. Safe. - **`run/executor.go:306-308`** — `statusFor` reads `context.Cause(runCtx)`. `runCtx` is `context.WithoutCancel(ctx)` wrapped in `WithCancelCause` then `MergeCancellation`. `MergeCancellation` (`run/runengine.go:69`) propagates `context.Cause(secondary)` — i.e. the caller's cancellation cause. A caller stamping `context.Canceled` as the cause is correctly reported as "cancelled"; a caller stamping something exotic flows into `errors.Is` checks against `ErrCriticKill`/`DeadlineExceeded`/`Canceled`. No bypass of the cause classification — only the three intended sentinels map to terminal statuses, anything else falls through to "error". Safe. - **`run/executor.go:205`** — `time.AfterFunc(maxRuntime, ...)` cancels with the static `context.DeadlineExceeded` sentinel. No untrusted input. Safe. - **Interface change** — `CriticHandle.RecordStep` now carries `*llm.Response`. The critic battery (`critic/critic.go:172`) discards it (`_ *llm.Response`); the executor forwards it from majordomo's step observer (`run/executor.go:230`). `*llm.Response` is the same struct already passed to `rec.OnStep` (audit) without sanitisation — no new taint surface introduced. A host critic now sees the model's response, but trust is unchanged (the host already trusted the model layer to invoke `RecordStep`). No new attack surface. - **No new credentials, no new external I/O, no new deserialization, no new authn/authz seams.** The diff is internal control-flow / interface evolution; no security-relevant change. </details> <details><summary>🎯 Correctness — No material issues found</summary> **VERDICT: No material issues found** I traced the full correctness path through the change (context plumbing, cause propagation, ordering of `statusFor` cases, defer ordering) and could not surface a material bug. Verified by reading: - `run/executor.go:203-215,302-315` — `WithCancelCause` + `time.AfterFunc(maxRuntime, …)`, defer order (stopCritic → mergeCancel → runTimer.Stop → cancelCause(nil)), `statusFor` ordering (ErrCriticKill → DeadlineExceeded → Canceled → error). - `run/critic.go:34-86` — watch goroutine uses the merged `runCtx`; `cancelCause` is the WithCancelCause cancel; cause is set to `ErrCriticKill` (wrapped) or `context.DeadlineExceeded` based on `KillCause()`. - `critic/critic.go:211-215,277-286` — `KillCause()` is mutex-guarded; `killCause` is set from `Decision.KillReason` (default `"critic killed the run"`). - `run/runengine.go:55-73` — `MergeCancellation` cascades cause from parent (WithCancelCause) into the merged child via context-tree propagation, which is what `statusFor` relies on. - `run/executor_test.go:148-174` — every test case is consistent: for an un-cancelled `bg`, `context.Cause(bg)==nil`; for `killCtx` cancelled with `fmt.Errorf("%w: hung", ErrCriticKill)`, `errors.Is(context.Cause(killCtx), ErrCriticKill)` is true. Notes (verified, no defect): - Cause-cascading correctness relies on Go's documented behavior that a child's `context.Cause` reflects the first cancellation in its ancestry via `CancelCauseFunc`. When `cancelCause(ErrCriticKill)` is called, the merged child reports `ErrCriticKill`; when `cancelCause(DeadlineExceeded)` is called, it reports `DeadlineExceeded`. Both land on the right `statusFor` arm. - `KillCause()!=nil` is sticky (`killed` flag in the critic handle prevents later un-kill), so the watch goroutine won't observe a "kill" then a "natural expiry" — kill wins, and `cancelCause(ErrCriticKill)` is idempotent against any later `MaxRuntime` timer firing. - `defer cancelCause(nil)` runs after `statusFor`, so it can't influence the recorded status; clearing the cause on the way out is purely hygiene. - The `MaxRuntime` → `AfterFunc` replacement preserves the prior `WithTimeout → "timeout"` semantics for the common case (no critic kill). No constants, formulas, units, or thresholds introduced by this change are arithmetically suspect. The `criticDeadlineCheck = 1s` and `MaxRuntime = 60s` defaults were pre-existing. </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> **VERDICT: Minor issues** - **`run/critic.go:48-75` — panic-recover in the deadline-watch silently abandons monitoring.** The `defer func() { _ = recover() }()` swallows any panic from `h.Deadline()` / `h.KillCause()` and the goroutine then returns without logging. `stopCritic`'s deferred `h.Stop()` still fires (no leak), but the hard-deadline backstop is silently lost for the rest of the run — only `runTimer` (MaxRuntime) or caller-cancel will trip. Worth at minimum an `slog` warning so a misbehaving host `CriticHandle` doesn't degrade the run invisibly; contrast `critic/critic.go:228-236` which does log on panic. Confirmed by reading the goroutine body and the `watch` analogue in `critic/critic.go`. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 38s

gitea-actions bot commented

2026-06-27 20:35:33 +00:00

🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud)

Verdict: Minor issues · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Reviewing through the security lens (authn/authz, injection, SSRF, deserialization, credential leakage, input validation, unsafe handling of untrusted data, insecure defaults):

Decision.KillReason → errors.New(reason) → KillCause() → fmt.Errorf("%w: %s", ErrCriticKill, …) → statusFor / stats.Error: The string originates from the host-provided Escalator implementation (a controlled, trusted seam — mort plugs its own critic-agent here). It is only ever stringified into error messages and audit fields; it is never interpolated into commands, queries, paths, or templates. No injection vector.
ErrCriticKill exported sentinel: Identity-check via errors.Is works correctly (errors.New produces a unique pointer). No Unwrap or Is method is needed or expected. Not a security concern.
RecordStep(iter int, resp *llm.Response): The battery implementation discards the response (_ *llm.Response). No untrusted data is persisted or executed from this parameter. The *llm.Response is produced by the executor's own model call, not by the caller — and even if it were attacker-influenced, the battery ignores it.
Race between deadline-watch goroutine and statusFor: KillCause() is mutex-protected; the cause is read before cancelCause is called, and statusFor reads context.Cause(runCtx) only after ag.Run returns (meaning runCtx is already done and the cause is settled). No TOCTOU issue.
MergeCancellation propagates context.Cause(secondary): When the caller cancels, the original context's cause (which could be nil) flows into runCtx. statusFor handles this correctly — it falls through to the context.Canceled check. No status-confusion vulnerability.
No credential/secret leakage, no SSRF, no path traversal, no unsafe deserialization: None of the changed code handles URLs, file paths, secrets, or deserialization.

The change is clean from a security perspective.

🎯 Correctness — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 12: all chain targets failed
ollama-cloud/glm-5.1☁️ ollama ollama-cloud/glm-5.1☁️ do request: Post "https://ollama.com/api/chat": context deadline exceeded

🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

statusFor doesn't read context.Cause on the correct context when MergeCancellation re-wraps it (verified). At run/executor.go:206, runCtx is reassigned to the merged context returned by MergeCancellation. However, context.Cause(mergedCtx) returns the cause set on the merged context itself, not its parent. When the runTimer or the critic deadline-watch goroutine calls cancelCause(…) on the outer context (line 203), the cancellation propagates down to the merged child, and Go's cancelCtx.cancel does propagate the parent's cause to children. So this works correctly today — but it relies on an internal implementation detail of context.cancelCtx (cause propagation from parent to child). If MergeCancellation is ever changed to use a different context type (e.g., valueCtx wrapping), context.Cause would stop seeing the timer/critic causes. This is fragile but not currently broken. No fix needed now, but a comment at the MergeCancellation call site noting that context.Cause on the merged context depends on parent-to-child cause propagation would guard against future regressions.
Missing test: MaxRuntime timeout via context.Cause path (verified). The statusFor test at run/executor_test.go:152 covers ErrCriticKill cause and direct context.DeadlineExceeded error, but it does not test the actual MaxRuntime timeout path where runErr is context.Canceled (because the context was cancelled, not timed out directly) and context.Cause(runCtx) is context.DeadlineExceeded. This is the most common "timeout" scenario in production. A test like:
```
timeoutCtx, timeoutCancel := context.WithCancelCause(context.Background())
timeoutCancel(context.DeadlineExceeded)
// statusFor(timeoutCtx, context.Canceled) should == "timeout"
```
would close the gap. Low severity — the logic is correct, but the primary production path for timeouts has no direct unit test.
No issue with KillCause race (verified). KillCause() and cancelCause in the deadline-watch goroutine are called sequentially in the same goroutine (run/critic.go:66-71), so there's no TOCTOU gap between reading KillCause() and calling cancelCause.
No issue with runErr == nil vs cause (verified). When runErr is nil but context.Cause(runCtx) is ErrCriticKill, statusFor correctly returns "ok" (the nil-error check is first). A completed run can't be "killed" — the kill must manifest as a non-nil error from ag.Run.
handle.Stop() doesn't set killCause (verified, not a bug). Stop() just closes the stop channel; it doesn't touch killCause. Since killCause is only set in tick() when d.Kill is true, and Stop() is called at run end, there's no path where Stop() should need to set it.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 10m 0s}

### 🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud) **Verdict: Minor issues · ⚠️ 1/3 lens(es) errored** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> **VERDICT: No material issues found** Reviewing through the security lens (authn/authz, injection, SSRF, deserialization, credential leakage, input validation, unsafe handling of untrusted data, insecure defaults): - **`Decision.KillReason` → `errors.New(reason)` → `KillCause()` → `fmt.Errorf("%w: %s", ErrCriticKill, …)` → `statusFor` / `stats.Error`**: The string originates from the host-provided `Escalator` implementation (a controlled, trusted seam — mort plugs its own critic-agent here). It is only ever stringified into error messages and audit fields; it is never interpolated into commands, queries, paths, or templates. No injection vector. - **`ErrCriticKill` exported sentinel**: Identity-check via `errors.Is` works correctly (`errors.New` produces a unique pointer). No `Unwrap` or `Is` method is needed or expected. Not a security concern. - **`RecordStep(iter int, resp *llm.Response)`**: The battery implementation discards the response (`_ *llm.Response`). No untrusted data is persisted or executed from this parameter. The `*llm.Response` is produced by the executor's own model call, not by the caller — and even if it were attacker-influenced, the battery ignores it. - **Race between deadline-watch goroutine and `statusFor`**: `KillCause()` is mutex-protected; the cause is read *before* `cancelCause` is called, and `statusFor` reads `context.Cause(runCtx)` only *after* `ag.Run` returns (meaning `runCtx` is already done and the cause is settled). No TOCTOU issue. - **`MergeCancellation` propagates `context.Cause(secondary)`**: When the caller cancels, the original context's cause (which could be nil) flows into `runCtx`. `statusFor` handles this correctly — it falls through to the `context.Canceled` check. No status-confusion vulnerability. - **No credential/secret leakage, no SSRF, no path traversal, no unsafe deserialization**: None of the changed code handles URLs, file paths, secrets, or deserialization. The change is clean from a security perspective. </details> <details><summary>🎯 Correctness — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 12: all chain targets failed ollama-cloud/glm-5.1:cloud: ollama ollama-cloud/glm-5.1:cloud: do request: Post "https://ollama.com/api/chat": context deadline exceeded </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> **VERDICT: Minor issues** - **`statusFor` doesn't read `context.Cause` on the correct context when `MergeCancellation` re-wraps it (verified).** At `run/executor.go:206`, `runCtx` is reassigned to the merged context returned by `MergeCancellation`. However, `context.Cause(mergedCtx)` returns the cause set on the *merged* context itself, not its parent. When the `runTimer` or the critic deadline-watch goroutine calls `cancelCause(…)` on the **outer** context (line 203), the cancellation propagates *down* to the merged child, and Go's `cancelCtx.cancel` does propagate the parent's cause to children. So this works correctly today — but it relies on an internal implementation detail of `context.cancelCtx` (cause propagation from parent to child). If `MergeCancellation` is ever changed to use a different context type (e.g., `valueCtx` wrapping), `context.Cause` would stop seeing the timer/critic causes. This is fragile but not currently broken. **No fix needed now**, but a comment at the `MergeCancellation` call site noting that `context.Cause` on the merged context depends on parent-to-child cause propagation would guard against future regressions. - **Missing test: MaxRuntime timeout via `context.Cause` path (verified).** The `statusFor` test at `run/executor_test.go:152` covers `ErrCriticKill` cause and direct `context.DeadlineExceeded` error, but it does **not** test the actual MaxRuntime timeout path where `runErr` is `context.Canceled` (because the context was cancelled, not timed out directly) and `context.Cause(runCtx)` is `context.DeadlineExceeded`. This is the most common "timeout" scenario in production. A test like: ```go timeoutCtx, timeoutCancel := context.WithCancelCause(context.Background()) timeoutCancel(context.DeadlineExceeded) // statusFor(timeoutCtx, context.Canceled) should == "timeout" ``` would close the gap. **Low severity** — the logic is correct, but the primary production path for timeouts has no direct unit test. - **No issue with `KillCause` race (verified).** `KillCause()` and `cancelCause` in the deadline-watch goroutine are called sequentially in the same goroutine (`run/critic.go:66-71`), so there's no TOCTOU gap between reading `KillCause()` and calling `cancelCause`. - **No issue with `runErr == nil` vs cause (verified).** When `runErr` is nil but `context.Cause(runCtx)` is `ErrCriticKill`, `statusFor` correctly returns "ok" (the nil-error check is first). A completed run can't be "killed" — the kill must manifest as a non-nil error from `ag.Run`. - **`handle.Stop()` doesn't set `killCause` (verified, not a bug).** `Stop()` just closes the stop channel; it doesn't touch `killCause`. Since `killCause` is only set in `tick()` when `d.Kill` is true, and `Stop()` is called at run end, there's no path where `Stop()` should need to set it. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 10m 0s

gitea-actions bot commented

2026-06-27 20:35:33 +00:00

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Reviewed the tainted-data path through my lens: the critic kill reason originates from a host-supplied Escalator's Decision.KillReason (critic/critic.go:279) — host-controlled, not attacker/untrusted input. It is formatted via fmt.Errorf("%w: %s", ErrCriticKill, cause.Error()) (run/critic.go:67) using %s interpolation, so no format-string injection, and the nil guard (cause != nil) precedes .Error(), so no nil-deref. Verified by reading run/critic.go:60-72 and critic/critic.go:277-285.
The new RecordStep(iter, resp *llm.Response) forwards the completed step's model response (potentially sensitive content) to the host's CriticHandle. This is the intended inversion seam (Ports.Critic is a host-supplied port), not a cross-trust-boundary leak — the host already owns the agent/model/output. Verified against run/ports.go:130-148 (interface) and run/executor.go:230 (call site); resp is nil-safe at both the battery (ignored) and the binding (b != nil guard).
statusFor reads context.Cause(runCtx) for control-flow only (errors.Is), and the cause/kill reason is not written into audit output (finishAudit records runErr.Error(), not the cause — run/executor.go:329-330). So the kill reason does not surface in audit logs as unattributed/uncontrolled text. Verified.
No new injection, SSRF, deserialization, authn/authz, secret-leakage, or missing-validation surface introduced by this diff. The cause-carrying cancel and ErrCriticKill sentinel are internal status plumbing with host-controlled inputs.

🎯 Correctness — Minor issues

VERDICT: Minor issues

Through the correctness lens, I verified the status-cause plumbing end to end against the actual code. The core logic is sound, but there is one real race that can defeat the PR's central distinction (kill vs timeout), plus a narrow behavior change in statusFor.

run/critic.go:55-74 + run/executor.go:203-206 — critic-kill can be misclassified as "timeout" when it coincides with MaxRuntime. The critic's deadline-watch polls every criticDeadlineCheck (1s) and only cancels with ErrCriticKill from inside the <-t.C branch (after re-reading h.KillCause()). When a kill is decided (critic/critic.go:277-285: deadline = now, killCause set under h.mu), the watch won't notice until its next tick — up to ~1s later. Meanwhile runTimer (executor.go:205) fires at maxRuntime and calls cancelCause(context.DeadlineExceeded). context.WithCancelCause honors only the first cancel call, so if maxRuntime elapses inside that polling window, runTimer wins, context.Cause(runCtx) becomes DeadlineExceeded, and statusFor (executor.go:308) returns "timeout" even though the critic explicitly killed the run. The watch also has a <-runCtx.Done(): return arm (critic.go:59), so once runTimer fires the watch simply exits without ever stamping ErrCriticKill. The window is narrow (~1s relative to a typically large maxRuntime), but it is a genuine gap in the kill-vs-timeout guarantee the PR introduces.

Suggested fix: when the watch detects KillCause() != nil, cancel with ErrCriticKill immediately on kill detection independent of the polling tick (e.g., have tick signal a dedicated channel the watch selects on), or have runTimer consult KillCause() before choosing its cause. At minimum, narrowing criticDeadlineCheck reduces but does not eliminate the window.
run/executor.go:308 — a non-timeout generic error can be relabeled "timeout" if the context was independently cancelled. statusFor now returns "timeout" when errors.Is(runErr, context.DeadlineExceeded) || errors.Is(context.Cause(runCtx), context.DeadlineExceeded) regardless of runErr's actual value. If the agent returns a genuine non-context error (e.g., a model/transport error) in the same window that runTimer (or MergeCancellation propagating a caller DeadlineExceeded — runengine.go:69) fires, the cause wins and the real error is masked as a timeout. The old statusFor(runErr) reported "error" in that case. This is a narrow edge (normally a cancelled ctx makes ag.Run return ctx.Err(), not a generic error), so likely not material in practice, but it is a behavior change worth noting.

Other correctness aspects checked and clean: statusFor case ordering (ok → killed → timeout → cancelled → error) is correct given Cause is a single value; context.Cause returns nil for an uncanceled context so successful runs still map to "ok"; MergeCancellation propagates the caller's cause via cancel(context.Cause(secondary)) (runengine.go:69), consistent with prior behavior (caller Canceled → "cancelled", caller DeadlineExceeded → "timeout"); defer cancelCause(nil) (executor.go:204) runs only at function return, after statusFor is computed at executor.go:281, so it cannot overwrite the status; the battery's killCause is set under h.mu in tick (critic/critic.go:272-285) and read under h.mu in KillCause() (critic/critic.go:208-213); RecordStep(iter, _ *llm.Response) is a nil-safe signature widening with all call sites updated (run/critic.go:84, run/executor.go:230, critic/critic_test.go:54).

🧯 Error handling & edge cases — Minor issues

Verified review

VERDICT: Minor issues

Both findings confirmed against the actual code.

run/executor.go:302-309 — caller-deadline cancellation is now misclassified as "timeout" instead of "cancelled". The new statusFor adds errors.Is(context.Cause(runCtx), context.DeadlineExceeded). Confirmed at run/runengine.go:62-73: MergeCancellation(runCtx, ctx) spawns a goroutine that, on ctx.Done(), calls cancel(context.Cause(ctx)) against the WithCancelCause-derived merged context. When the caller's ctx is a context.WithTimeout that expires, context.Cause(ctx) is context.DeadlineExceeded, so the merged runCtx's cause becomes DeadlineExceeded even though the run's own MaxRuntime backstop (runTimer, run/executor.go:205) did not fire. The agent returns an error wrapping context.Canceled (a WithCancelCause cancel reports Canceled via ctx.Err()), so the old statusFor (which only inspected runErr) mapped this to "cancelled"; the new one inspects the cause and maps it to "timeout". This is a real, undocumented behavior change for the caller-deadline-expiry case, distinct from the PR's intended run-backstop timeout. Suggested fix: stamp the run's own backstop with a distinct sentinel (e.g. ErrRunTimeout) and check only that in statusFor, leaving caller-propagated DeadlineExceeded to fall through to "cancelled".
run/critic.go:53-71 vs run/executor.go:205 — race between critic kill and the MaxRuntime timer can lose the "killed" status. Confirmed: on a kill, critic/critic.go:277-285 sets h.deadline = h.now() and h.killCause, but the deadline-watch goroutine (run/critic.go:53-72) polls only every criticDeadlineCheck (1s, run/critic.go:14) before calling cancelCause(ErrCriticKill...). The runTimer (run/executor.go:205) fires independently with cancelCause(context.DeadlineExceeded) and is only stopped by the deferred runTimer.Stop() at the end of Run — the critic kill path does not stop it. context.CancelCauseFunc is first-write-wins, so if maxRuntime expires inside the ~1s window between the kill decision and the next poll, the cause becomes DeadlineExceeded and the critic's ErrCriticKill is dropped — status reports "timeout" for a run the critic explicitly killed, undermining the PR's core distinction. Narrow window, but real. Suggested fix: have the kill path cancel runCtx directly (or race the watch against a kill-signal channel / stop runTimer on kill) instead of relying on the polled deadline to relay it.

The happy paths and nil-safety (RecordStep ignoring an unused *llm.Response, nil-safe criticBinding methods, recover in the watch goroutine at run/critic.go:52) are handled correctly.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 0s}

### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> VERDICT: No material issues found - Reviewed the tainted-data path through my lens: the critic kill reason originates from a host-supplied `Escalator`'s `Decision.KillReason` (critic/critic.go:279) — host-controlled, not attacker/untrusted input. It is formatted via `fmt.Errorf("%w: %s", ErrCriticKill, cause.Error())` (run/critic.go:67) using `%s` interpolation, so no format-string injection, and the nil guard (`cause != nil`) precedes `.Error()`, so no nil-deref. Verified by reading run/critic.go:60-72 and critic/critic.go:277-285. - The new `RecordStep(iter, resp *llm.Response)` forwards the completed step's model response (potentially sensitive content) to the host's `CriticHandle`. This is the intended inversion seam (`Ports.Critic` is a host-supplied port), not a cross-trust-boundary leak — the host already owns the agent/model/output. Verified against run/ports.go:130-148 (interface) and run/executor.go:230 (call site); `resp` is nil-safe at both the battery (ignored) and the binding (`b != nil` guard). - `statusFor` reads `context.Cause(runCtx)` for control-flow only (`errors.Is`), and the cause/kill reason is not written into audit output (`finishAudit` records `runErr.Error()`, not the cause — run/executor.go:329-330). So the kill reason does not surface in audit logs as unattributed/uncontrolled text. Verified. - No new injection, SSRF, deserialization, authn/authz, secret-leakage, or missing-validation surface introduced by this diff. The cause-carrying cancel and `ErrCriticKill` sentinel are internal status plumbing with host-controlled inputs. </details> <details><summary>🎯 Correctness — Minor issues</summary> ## VERDICT: Minor issues Through the correctness lens, I verified the status-cause plumbing end to end against the actual code. The core logic is sound, but there is one real race that can defeat the PR's central distinction (kill vs timeout), plus a narrow behavior change in `statusFor`. - **`run/critic.go:55-74` + `run/executor.go:203-206` — critic-kill can be misclassified as "timeout" when it coincides with `MaxRuntime`.** The critic's deadline-watch polls every `criticDeadlineCheck` (1s) and only cancels with `ErrCriticKill` from inside the `<-t.C` branch (after re-reading `h.KillCause()`). When a kill is decided (`critic/critic.go:277-285`: `deadline = now`, `killCause` set under `h.mu`), the watch won't notice until its next tick — up to ~1s later. Meanwhile `runTimer` (`executor.go:205`) fires at `maxRuntime` and calls `cancelCause(context.DeadlineExceeded)`. `context.WithCancelCause` honors only the *first* cancel call, so if `maxRuntime` elapses inside that polling window, `runTimer` wins, `context.Cause(runCtx)` becomes `DeadlineExceeded`, and `statusFor` (`executor.go:308`) returns `"timeout"` even though the critic explicitly killed the run. The watch also has a `<-runCtx.Done(): return` arm (`critic.go:59`), so once `runTimer` fires the watch simply exits without ever stamping `ErrCriticKill`. The window is narrow (~1s relative to a typically large `maxRuntime`), but it is a genuine gap in the kill-vs-timeout guarantee the PR introduces. Suggested fix: when the watch detects `KillCause() != nil`, cancel with `ErrCriticKill` *immediately* on kill detection independent of the polling tick (e.g., have `tick` signal a dedicated channel the watch selects on), or have `runTimer` consult `KillCause()` before choosing its cause. At minimum, narrowing `criticDeadlineCheck` reduces but does not eliminate the window. - **`run/executor.go:308` — a non-timeout generic error can be relabeled "timeout" if the context was independently cancelled.** `statusFor` now returns `"timeout"` when `errors.Is(runErr, context.DeadlineExceeded) || errors.Is(context.Cause(runCtx), context.DeadlineExceeded)` regardless of `runErr`'s actual value. If the agent returns a genuine non-context error (e.g., a model/transport error) in the same window that `runTimer` (or `MergeCancellation` propagating a caller `DeadlineExceeded` — `runengine.go:69`) fires, the cause wins and the real error is masked as a timeout. The old `statusFor(runErr)` reported `"error"` in that case. This is a narrow edge (normally a cancelled ctx makes `ag.Run` return `ctx.Err()`, not a generic error), so likely not material in practice, but it is a behavior change worth noting. Other correctness aspects checked and clean: `statusFor` case ordering (`ok` → `killed` → `timeout` → `cancelled` → `error`) is correct given `Cause` is a single value; `context.Cause` returns `nil` for an uncanceled context so successful runs still map to `"ok"`; `MergeCancellation` propagates the caller's cause via `cancel(context.Cause(secondary))` (`runengine.go:69`), consistent with prior behavior (caller `Canceled` → `"cancelled"`, caller `DeadlineExceeded` → `"timeout"`); `defer cancelCause(nil)` (`executor.go:204`) runs only at function return, after `statusFor` is computed at `executor.go:281`, so it cannot overwrite the status; the battery's `killCause` is set under `h.mu` in `tick` (`critic/critic.go:272-285`) and read under `h.mu` in `KillCause()` (`critic/critic.go:208-213`); `RecordStep(iter, _ *llm.Response)` is a nil-safe signature widening with all call sites updated (`run/critic.go:84`, `run/executor.go:230`, `critic/critic_test.go:54`). </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> ## Verified review VERDICT: Minor issues Both findings confirmed against the actual code. - **`run/executor.go:302-309` — caller-deadline cancellation is now misclassified as "timeout" instead of "cancelled".** The new `statusFor` adds `errors.Is(context.Cause(runCtx), context.DeadlineExceeded)`. Confirmed at `run/runengine.go:62-73`: `MergeCancellation(runCtx, ctx)` spawns a goroutine that, on `ctx.Done()`, calls `cancel(context.Cause(ctx))` against the `WithCancelCause`-derived merged context. When the caller's `ctx` is a `context.WithTimeout` that expires, `context.Cause(ctx)` is `context.DeadlineExceeded`, so the merged `runCtx`'s cause becomes `DeadlineExceeded` even though the run's own `MaxRuntime` backstop (`runTimer`, `run/executor.go:205`) did not fire. The agent returns an error wrapping `context.Canceled` (a `WithCancelCause` cancel reports `Canceled` via `ctx.Err()`), so the old `statusFor` (which only inspected `runErr`) mapped this to `"cancelled"`; the new one inspects the cause and maps it to `"timeout"`. This is a real, undocumented behavior change for the caller-deadline-expiry case, distinct from the PR's intended run-backstop timeout. Suggested fix: stamp the run's own backstop with a distinct sentinel (e.g. `ErrRunTimeout`) and check only that in `statusFor`, leaving caller-propagated `DeadlineExceeded` to fall through to `"cancelled"`. - **`run/critic.go:53-71` vs `run/executor.go:205` — race between critic kill and the `MaxRuntime` timer can lose the "killed" status.** Confirmed: on a kill, `critic/critic.go:277-285` sets `h.deadline = h.now()` and `h.killCause`, but the deadline-watch goroutine (`run/critic.go:53-72`) polls only every `criticDeadlineCheck` (1s, `run/critic.go:14`) before calling `cancelCause(ErrCriticKill...)`. The `runTimer` (`run/executor.go:205`) fires independently with `cancelCause(context.DeadlineExceeded)` and is only stopped by the deferred `runTimer.Stop()` at the end of `Run` — the critic kill path does not stop it. `context.CancelCauseFunc` is first-write-wins, so if `maxRuntime` expires inside the ~1s window between the kill decision and the next poll, the cause becomes `DeadlineExceeded` and the critic's `ErrCriticKill` is dropped — status reports `"timeout"` for a run the critic explicitly killed, undermining the PR's core distinction. Narrow window, but real. Suggested fix: have the kill path cancel `runCtx` directly (or race the watch against a kill-signal channel / stop `runTimer` on kill) instead of relying on the polled deadline to relay it. The happy paths and nil-safety (`RecordStep` ignoring an unused `*llm.Response`, nil-safe `criticBinding` methods, recover in the watch goroutine at `run/critic.go:52`) are handled correctly. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 0s

gitea-actions bot commented

2026-06-27 20:38:12 +00:00

🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

Minor issues

Unvalidated host/model kill reason flows into errors, logs, and audit traces
critic/critic.go:279-283 stores Decision.KillReason directly into an error via errors.New(reason), defaulting to "critic killed the run". The host Escalator (e.g., mort's LLM critic-agent) can supply arbitrary text into this field with no length limit or character sanitization.
run/critic.go:66-67 then wraps that cause with ErrCriticKill and stamps it as the run's cancellation cause, so the raw reason ends up in context.Cause(runCtx) and is used by statusFor to report "killed". Depending on how the agent wraps the cancellation error, it may also reach Result.Err, audit RunStats.Error, and DeliverError.
Impact: log/audit pollution, potential PII leakage, and a trivial DoS vector from a multi-megabyte kill reason.
Fix: cap KillReason length and sanitize control characters in critic/critic.go before storing it.
Same issue at the public host seam
run/ports.go:149-154 documents CriticHandle.KillCause() as returning a reason, but the interface places no contract or validation on its content. A malicious or buggy host adapter can return an oversized or malformed string, and run/critic.go:66-67 will faithfully adopt it as part of the cancellation cause.
Fix: add a defensive length cap and normalization when wrapping KillCause() in run/critic.go:66-67.

No material authn/authz, command/SQL/template injection, SSRF, deserialization, or credential-leak issues were found in this diff.

🎯 Correctness — No material issues found

Verdict: No material issues found

Verified the cancellation-cause propagation chain: runCtx is the merged context from MergeCancellation, whose parent is the cause-carrying context. When the critic watch calls the parent's CancelCauseFunc with ErrCriticKill or DeadlineExceeded, context.Cause(runCtx) resolves to that cause (Cause walks ancestor contexts). statusFor checks Cause before ctx.Err()-derived values, so critic kill is correctly distinguished from caller cancel/timeout.
Verified MergeCancellation behavior: when the secondary (caller) context cancels, it stamps the merged context with context.Cause(secondary), preserving caller-cancel classification; when the parent cancels first with a critic cause, the merged context inherits it.
Verified the handle.killCause / handle.deadline update in critic/critic.go:277-285 is done under a single mutex critical section, so the startCritic poll cannot observe a past deadline without a matching non-nil KillCause once a kill decision has been applied.
Verified only two CriticHandle implementations exist in the repo (critic/critic.go and run/critic_test.go); both signatures were updated for RecordStep(iter int, resp *llm.Response) and KillCause() error.
The change correctly preserves the old WithTimeout semantics by firing cancelCause(context.DeadlineExceeded) from a time.AfterFunc.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

Findings:

run/executor.go:203 — runCtx is created with context.WithCancelCause(context.WithoutCancel(ctx)) and then immediately wrapped by MergeCancellation at run/executor.go:207. MergeCancellation may call cancel(context.Cause(secondary)) if the caller context ctx is cancelled, which would overwrite any cancellation cause previously set on runCtx (e.g., ErrCriticKill from startCritic or context.DeadlineExceeded from runTimer). Since statusFor reads context.Cause(runCtx), a caller-cancellation racing with a critic kill or timeout could overwrite the original cause and cause statusFor to report "cancelled" instead of "killed" or "timeout". However, this ordering is exactly the same as the prior code's behavior with context.WithTimeout(context.WithoutCancel(ctx), maxRuntime) followed by MergeCancellation; the only new risk is the loss of the reason when a critic kill races with caller cancellation, and there is no existing test or usage path in the diff or repo that demonstrates this race causing a user-visible misclassification. Treat as a latent edge case rather than a blocking bug.
run/executor_test.go:157 — The test creates killCtx but does not cancel the background context bg or killCtx via defer. This is a minor test-only resource leak and not a production issue.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 2s}

### 🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Minor issues</summary> **Minor issues** - **Unvalidated host/model kill reason flows into errors, logs, and audit traces** `critic/critic.go:279-283` stores `Decision.KillReason` directly into an `error` via `errors.New(reason)`, defaulting to `"critic killed the run"`. The host `Escalator` (e.g., mort's LLM critic-agent) can supply arbitrary text into this field with no length limit or character sanitization. `run/critic.go:66-67` then wraps that cause with `ErrCriticKill` and stamps it as the run's cancellation cause, so the raw reason ends up in `context.Cause(runCtx)` and is used by `statusFor` to report `"killed"`. Depending on how the agent wraps the cancellation error, it may also reach `Result.Err`, audit `RunStats.Error`, and `DeliverError`. **Impact:** log/audit pollution, potential PII leakage, and a trivial DoS vector from a multi-megabyte kill reason. **Fix:** cap `KillReason` length and sanitize control characters in `critic/critic.go` before storing it. - **Same issue at the public host seam** `run/ports.go:149-154` documents `CriticHandle.KillCause()` as returning a reason, but the interface places no contract or validation on its content. A malicious or buggy host adapter can return an oversized or malformed string, and `run/critic.go:66-67` will faithfully adopt it as part of the cancellation cause. **Fix:** add a defensive length cap and normalization when wrapping `KillCause()` in `run/critic.go:66-67`. No material authn/authz, command/SQL/template injection, SSRF, deserialization, or credential-leak issues were found in this diff. </details> <details><summary>🎯 Correctness — No material issues found</summary> **Verdict:** No material issues found - Verified the cancellation-cause propagation chain: `runCtx` is the merged context from `MergeCancellation`, whose parent is the cause-carrying context. When the critic watch calls the parent's `CancelCauseFunc` with `ErrCriticKill` or `DeadlineExceeded`, `context.Cause(runCtx)` resolves to that cause (Cause walks ancestor contexts). `statusFor` checks `Cause` before `ctx.Err()`-derived values, so critic kill is correctly distinguished from caller cancel/timeout. - Verified `MergeCancellation` behavior: when the secondary (caller) context cancels, it stamps the merged context with `context.Cause(secondary)`, preserving caller-cancel classification; when the parent cancels first with a critic cause, the merged context inherits it. - Verified the `handle.killCause` / `handle.deadline` update in `critic/critic.go:277-285` is done under a single mutex critical section, so the `startCritic` poll cannot observe a past deadline without a matching non-nil `KillCause` once a kill decision has been applied. - Verified only two `CriticHandle` implementations exist in the repo (`critic/critic.go` and `run/critic_test.go`); both signatures were updated for `RecordStep(iter int, resp *llm.Response)` and `KillCause() error`. - The change correctly preserves the old `WithTimeout` semantics by firing `cancelCause(context.DeadlineExceeded)` from a `time.AfterFunc`. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **Verdict:** No material issues found **Findings:** - `run/executor.go:203` — `runCtx` is created with `context.WithCancelCause(context.WithoutCancel(ctx))` and then immediately wrapped by `MergeCancellation` at `run/executor.go:207`. `MergeCancellation` may call `cancel(context.Cause(secondary))` if the caller context `ctx` is cancelled, which would overwrite any cancellation cause previously set on `runCtx` (e.g., `ErrCriticKill` from `startCritic` or `context.DeadlineExceeded` from `runTimer`). Since `statusFor` reads `context.Cause(runCtx)`, a caller-cancellation racing with a critic kill or timeout could overwrite the original cause and cause `statusFor` to report `"cancelled"` instead of `"killed"` or `"timeout"`. However, this ordering is exactly the same as the prior code's behavior with `context.WithTimeout(context.WithoutCancel(ctx), maxRuntime)` followed by `MergeCancellation`; the only new risk is the loss of the *reason* when a critic kill races with caller cancellation, and there is no existing test or usage path in the diff or repo that demonstrates this race causing a user-visible misclassification. Treat as a latent edge case rather than a blocking bug. - `run/executor_test.go:157` — The test creates `killCtx` but does not cancel the background context `bg` or `killCtx` via defer. This is a minor test-only resource leak and not a production issue. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 2s

gitea-actions bot commented

2026-06-27 20:38:34 +00:00

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

I traced every tainted-data path through the diff and verified the surrounding code. Here's what I checked:

KillReason → error chain → context.Cause: The reason string flows from the host Escalator (via Decision.KillReason) → errors.New(reason) → KillCause() → fmt.Errorf("%w: %s", ErrCriticKill, cause.Error()) → cancelCause(...). It is consumed only by statusFor's errors.Is(context.Cause(runCtx), ErrCriticKill) boolean check. The reason string is never surfaced to users, logs, or audit output — statusFor returns the hardcoded "killed", and res.Err remains context.Canceled (the agent loop sees ctx.Err(), not the cause). Verified by reading executor.go:279-287 and ports.go:63-74.
*llm.Response in RecordStep: The model response is forwarded to the critic handle. The battery (critic/critic.go:172) discards it (_). A host adapter that does consume it is responsible for its own handling of model output (PII, logging, etc.) — the kernel merely passes data that was already available elsewhere in the same goroutine. No new exposure.
fmt.Errorf("%w: %s", ...) with attacker-controlled cause.Error(): The %s verb prints the string literally; format specifiers inside the argument are not interpreted. No format-string vulnerability.
No new network calls, no deserialization, no authz changes, no file paths constructed from untrusted input.

The changes are mechanically sound from a security standpoint.

🎯 Correctness — No material issues found

Verdict: No material issues found

I traced every logic path through the cause-carrying cancellation, the statusFor dispatch, the AfterFunc/critic-watch interaction, and the MergeCancellation propagation. The semantics are sound:

statusFor priority order is correct: "killed" (via context.Cause → ErrCriticKill) is checked before "timeout" (via context.Cause → DeadlineExceeded or runErr), which is checked before "cancelled" (via runErr → Canceled). A critic kill that wraps ErrCriticKill with %w is correctly unwrapped by errors.Is.
First-write-wins on cancelCause means the AfterFunc timer and the critic watch goroutine can't clobber each other; whichever fires first sets the cause permanently.
MergeCancellation child propagation: the merged context (child of the pre-merge runCtx) correctly inherits the parent's cancellation cause, so statusFor(runCtx, runErr) where runCtx is the merged context still sees the critic-kill or deadline-exceeded cause set on the pre-merge parent.
Defer ordering (stopCritic → mergeCancel → runTimer.Stop → cancelCause(nil)) is safe: the timer is stopped before the final no-op cancel, and statusFor is called before any of them fire.
Battery KillCause() is properly mutex-guarded and returns nil until tick sets killCause; the executor's KillCause() != nil gate correctly branches between ErrCriticKill and DeadlineExceeded.

One minor observation (not a bug): the RunStats.Status comment on run/ports.go:66 lists ok | error | timeout | budget_exceeded | cancelled | dry_run but doesn't mention the new "killed" status. This is a documentation drift, not a logic defect, and outside my lens.

🧯 Error handling & edge cases — No material issues found

No material issues found.

I traced every cancellation path, defer ordering, nil-safety, and race condition across the four changed files. The error handling is solid:

Multiple cancellation sources (timer, critic deadline-watch, MergeCancellation, caller cancel) are safe: WithCancelCause makes subsequent cancelCause calls no-ops, and the first cause sticks.
defer cancelCause(nil) at executor.go:204 runs after statusFor/finishAudit/deliver, so it cannot corrupt the status. The deferred runTimer.Stop() runs before it, preventing most late-timer firings.
context.Cause(runCtx) in statusFor is safe when the context isn't cancelled (returns nil). The runErr == nil short-circuit prevents a successful run from being misclassified even if a late cancellation lands.
Nil *llm.Response in RecordStep is explicitly documented as nil-safe; both the battery (critic/critic.go:172) and the fake handle ignore it.
KillCause() returning nil is the default; the deadline-watch only takes the kill path when it's non-nil (critic.go:66).
fmt.Errorf("%w: %s", ErrCriticKill, cause.Error()) — cause is guarded by the != nil check, so .Error() won't panic. errors.Is correctly unwraps through fmt.Errorf to find ErrCriticKill.
Concurrent tick + KillCause: the h.killed check at critic.go:274 prevents a double-kill, and killCause is set under the same lock as killed.
The timer-vs-critic race (both can cancel, first wins) is pre-existing and inherent to the two-tier design; the polling interval bounds the window to ≤1s. Not a new issue.

The one minor observation (not a code defect): executor_test.go tests the "timeout" path only via runErr == DeadlineExceeded, not via runErr == Canceled + Cause == DeadlineExceeded (the timer-firing scenario). The code handles it correctly; the test just doesn't cover that specific combination.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9m 49s}

### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> ## Verdict: No material issues found I traced every tainted-data path through the diff and verified the surrounding code. Here's what I checked: - **`KillReason` → error chain → `context.Cause`**: The reason string flows from the host `Escalator` (via `Decision.KillReason`) → `errors.New(reason)` → `KillCause()` → `fmt.Errorf("%w: %s", ErrCriticKill, cause.Error())` → `cancelCause(...)`. It is consumed only by `statusFor`'s `errors.Is(context.Cause(runCtx), ErrCriticKill)` boolean check. The reason string is **never surfaced** to users, logs, or audit output — `statusFor` returns the hardcoded `"killed"`, and `res.Err` remains `context.Canceled` (the agent loop sees `ctx.Err()`, not the cause). Verified by reading `executor.go:279-287` and `ports.go:63-74`. - **`*llm.Response` in `RecordStep`**: The model response is forwarded to the critic handle. The battery (`critic/critic.go:172`) discards it (`_`). A host adapter that *does* consume it is responsible for its own handling of model output (PII, logging, etc.) — the kernel merely passes data that was already available elsewhere in the same goroutine. No new exposure. - **`fmt.Errorf("%w: %s", ...)` with attacker-controlled `cause.Error()`**: The `%s` verb prints the string literally; format specifiers inside the argument are not interpreted. No format-string vulnerability. - **No new network calls, no deserialization, no authz changes, no file paths constructed from untrusted input.** The changes are mechanically sound from a security standpoint. </details> <details><summary>🎯 Correctness — No material issues found</summary> ## Verdict: No material issues found I traced every logic path through the cause-carrying cancellation, the `statusFor` dispatch, the `AfterFunc`/critic-watch interaction, and the `MergeCancellation` propagation. The semantics are sound: - **`statusFor` priority order is correct**: "killed" (via `context.Cause` → `ErrCriticKill`) is checked before "timeout" (via `context.Cause` → `DeadlineExceeded` or `runErr`), which is checked before "cancelled" (via `runErr` → `Canceled`). A critic kill that wraps `ErrCriticKill` with `%w` is correctly unwrapped by `errors.Is`. - **First-write-wins on `cancelCause`** means the `AfterFunc` timer and the critic watch goroutine can't clobber each other; whichever fires first sets the cause permanently. - **`MergeCancellation` child propagation**: the merged context (child of the pre-merge `runCtx`) correctly inherits the parent's cancellation cause, so `statusFor(runCtx, runErr)` where `runCtx` is the merged context still sees the critic-kill or deadline-exceeded cause set on the pre-merge parent. - **Defer ordering** (stopCritic → mergeCancel → runTimer.Stop → cancelCause(nil)) is safe: the timer is stopped before the final no-op cancel, and `statusFor` is called before any of them fire. - **Battery `KillCause()`** is properly mutex-guarded and returns nil until `tick` sets `killCause`; the executor's `KillCause() != nil` gate correctly branches between `ErrCriticKill` and `DeadlineExceeded`. One minor observation (not a bug): the `RunStats.Status` comment on `run/ports.go:66` lists `ok | error | timeout | budget_exceeded | cancelled | dry_run` but doesn't mention the new `"killed"` status. This is a documentation drift, not a logic defect, and outside my lens. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> No material issues found. I traced every cancellation path, defer ordering, nil-safety, and race condition across the four changed files. The error handling is solid: - **Multiple cancellation sources** (timer, critic deadline-watch, `MergeCancellation`, caller cancel) are safe: `WithCancelCause` makes subsequent `cancelCause` calls no-ops, and the first cause sticks. - **`defer cancelCause(nil)`** at `executor.go:204` runs *after* `statusFor`/`finishAudit`/`deliver`, so it cannot corrupt the status. The deferred `runTimer.Stop()` runs before it, preventing most late-timer firings. - **`context.Cause(runCtx)` in `statusFor`** is safe when the context isn't cancelled (returns nil). The `runErr == nil` short-circuit prevents a successful run from being misclassified even if a late cancellation lands. - **Nil `*llm.Response`** in `RecordStep` is explicitly documented as nil-safe; both the battery (`critic/critic.go:172`) and the fake handle ignore it. - **`KillCause()` returning nil** is the default; the deadline-watch only takes the kill path when it's non-nil (`critic.go:66`). - **`fmt.Errorf("%w: %s", ErrCriticKill, cause.Error())`** — `cause` is guarded by the `!= nil` check, so `.Error()` won't panic. `errors.Is` correctly unwraps through `fmt.Errorf` to find `ErrCriticKill`. - **Concurrent `tick` + `KillCause`**: the `h.killed` check at `critic.go:274` prevents a double-kill, and `killCause` is set under the same lock as `killed`. - **The timer-vs-critic race** (both can cancel, first wins) is pre-existing and inherent to the two-tier design; the polling interval bounds the window to ≤1s. Not a new issue. The one minor observation (not a code defect): `executor_test.go` tests the "timeout" path only via `runErr == DeadlineExceeded`, not via `runErr == Canceled` + `Cause == DeadlineExceeded` (the timer-firing scenario). The code handles it correctly; the test just doesn't cover that specific combination. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9m 49s

gitea-actions bot commented

2026-06-27 20:40:15 +00:00

🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

Explanation

After reviewing the changes through the security lens (focusing on authn/authz, injection, SSRF, secret leakage, input validation, and insecure defaults), no security vulnerabilities were identified.

Key points verified:

KillReason propagation: The critic's KillReason (which could be influenced by LLM output) is only used to wrap an error for cancellation cause. The statusFor function checks only the error type (errors.Is(context.Cause(runCtx), ErrCriticKill)), not the error string itself. While the error string could theoretically be logged unsafely elsewhere, this change does not introduce new logging or insecure usage of the string.
llm.Response handling: The *llm.Response parameter is passed through to RecordStep but is ignored in the battery implementation (as noted in the comment) and in test fakes. No security-sensitive operations (e.g., command injection, SQL injection) are performed with this data in the changed code.
Error handling: New uses of errors.Is and fmt.Errorf are safe and follow idiomatic Go patterns.
Mutex usage: The new KillCause() method properly locks the mutex before accessing h.killCause.
No new attack surfaces: No changes introduce command execution, SQL/NoSQL injection, path traversal, or deserialization flaws.

All changes are consistent with the stated goal of passing the LLM response to the critic for tracking and distinguishing kill causes via context cancellation. No insecure defaults or missing validations were identified.

Verification steps included:

Reading modified files (critic/critic.go, run/critic.go, run/executor.go) to confirm safe usage of new parameters and error handling.
Grepping for ErrCriticKill and context.Cause to confirm usage is limited to equality checks.
Confirming that llm.Response is not dereferenced or used in unsafe ways in the changed code.

🎯 Correctness — No material issues found

No material issues found.

🧯 Error handling & edge cases — No material issues found

No material issues found

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 5s}

### 🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> No material issues found ## Explanation After reviewing the changes through the security lens (focusing on authn/authz, injection, SSRF, secret leakage, input validation, and insecure defaults), no security vulnerabilities were identified. Key points verified: - **KillReason propagation**: The critic's `KillReason` (which could be influenced by LLM output) is only used to wrap an error for cancellation cause. The `statusFor` function checks only the error type (`errors.Is(context.Cause(runCtx), ErrCriticKill)`), not the error string itself. While the error string could theoretically be logged unsafely elsewhere, this change does not introduce new logging or insecure usage of the string. - **llm.Response handling**: The `*llm.Response` parameter is passed through to `RecordStep` but is ignored in the battery implementation (as noted in the comment) and in test fakes. No security-sensitive operations (e.g., command injection, SQL injection) are performed with this data in the changed code. - **Error handling**: New uses of `errors.Is` and `fmt.Errorf` are safe and follow idiomatic Go patterns. - **Mutex usage**: The new `KillCause()` method properly locks the mutex before accessing `h.killCause`. - **No new attack surfaces**: No changes introduce command execution, SQL/NoSQL injection, path traversal, or deserialization flaws. All changes are consistent with the stated goal of passing the LLM response to the critic for tracking and distinguishing kill causes via context cancellation. No insecure defaults or missing validations were identified. Verification steps included: - Reading modified files (`critic/critic.go`, `run/critic.go`, `run/executor.go`) to confirm safe usage of new parameters and error handling. - Grepping for `ErrCriticKill` and `context.Cause` to confirm usage is limited to equality checks. - Confirming that `llm.Response` is not dereferenced or used in unsafe ways in the changed code. </details> <details><summary>🎯 Correctness — No material issues found</summary> No material issues found. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> No material issues found </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 5s

gitea-actions bot commented

2026-06-27 20:45:34 +00:00

🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

The changes introduce a cause‑carrying cancellation flow and extend the CriticHandle API. No new authentication/authorization checks are needed, and the added error handling does not expose unvalidated input or secrets.
The only new string that originates from untrusted data is the kill reason (d.KillReason). It is stored in an error and later wrapped with fmt.Errorf("%w: %s", ErrCriticKill, cause.Error()). This does not lead to command/SQL/template injection, nor is it logged or returned to external callers in a way that would leak sensitive data.
All newly added imports and code paths compile correctly (verified by reading the affected files). No secret values are introduced, and defaults remain safe.

No security‑related concerns identified.

🎯 Correctness — Minor issues

Verdict: Minor issues

run/critic_test.go uses an uninitialized killCtx – the test creates a context with killCtx, killCancel := context.WithCancelCause(context.Background()) but never declares killCtx. This variable is referenced later, causing a compilation error.
Impact: Compile‑time failure.
run/executor.go imports errors but does not use it after the changes; the import remains unused.
Impact: Linter/compile warning.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

The new cause‑carrying cancellation path correctly propagates the kill reason and preserves the original deadline‑exceeded cause. All added CancelCause calls are guarded by the context API, and the deferred cancelCause(nil) does not overwrite an already‑set cause.
Mutex protection around killCause, iterations, and other mutable state prevents data races. The Stop method safely guards against double‑close of its channel.
The watchdog goroutine recovers from panics, and the executor’s top‑level recover covers panics from host ports.
Edge cases such as zero or negative maxRuntime or CriticSoftTimeout are handled by fall‑backs to sensible defaults.
No errors are ignored or silently swallowed; all critical operations (e.g., timer creation, cancel calls) are either checked or safely deferred.

No error‑handling or edge‑case problems detected in the changes.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 13s}

### 🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> **Verdict:** No material issues found - The changes introduce a cause‑carrying cancellation flow and extend the `CriticHandle` API. No new authentication/authorization checks are needed, and the added error handling does not expose unvalidated input or secrets. - The only new string that originates from untrusted data is the kill reason (`d.KillReason`). It is stored in an `error` and later wrapped with `fmt.Errorf("%w: %s", ErrCriticKill, cause.Error())`. This does not lead to command/SQL/template injection, nor is it logged or returned to external callers in a way that would leak sensitive data. - All newly added imports and code paths compile correctly (verified by reading the affected files). No secret values are introduced, and defaults remain safe. No security‑related concerns identified. </details> <details><summary>🎯 Correctness — Minor issues</summary> **Verdict:** Minor issues - **`run/critic_test.go` uses an uninitialized `killCtx`** – the test creates a context with `killCtx, killCancel := context.WithCancelCause(context.Background())` but never declares `killCtx`. This variable is referenced later, causing a compilation error. *Impact:* Compile‑time failure. - **`run/executor.go` imports `errors` but does not use it** after the changes; the import remains unused. *Impact:* Linter/compile warning. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **Verdict:** No material issues found - The new cause‑carrying cancellation path correctly propagates the kill reason and preserves the original deadline‑exceeded cause. All added `CancelCause` calls are guarded by the `context` API, and the deferred `cancelCause(nil)` does not overwrite an already‑set cause. - Mutex protection around `killCause`, `iterations`, and other mutable state prevents data races. The `Stop` method safely guards against double‑close of its channel. - The watchdog goroutine recovers from panics, and the executor’s top‑level recover covers panics from host ports. - Edge cases such as zero or negative `maxRuntime` or `CriticSoftTimeout` are handled by fall‑backs to sensible defaults. - No errors are ignored or silently swallowed; all critical operations (e.g., timer creation, cancel calls) are either checked or safely deferred. No error‑handling or edge‑case problems detected in the changes. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 13s

gitea-actions bot commented

2026-06-27 20:46:47 +00:00

🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found.

The changes introduce a more nuanced status reporting mechanism for run execution by distinguishing between critic kills, timeouts, and cancellations. The security implications have been carefully considered:

The new KillCause() mechanism properly encapsulates error information without exposing internal state
Error wrapping uses standard library patterns (fmt.Errorf with %w)
Context cancellation with causes is handled correctly
No new security-sensitive surfaces are introduced
The critic's kill reason is properly sanitized before being used in error messages

The implementation maintains secure error handling practices while providing better observability into run termination causes.

🎯 Correctness — No material issues found

VERDICT: No material issues found

The changes correctly implement the critic parity features:

RecordStep now carries the full response - The signature change from RecordStep(iter int) to RecordStep(iter int, resp *llm.Response) is consistently applied across the critic handle, binding, and executor. The battery implementation correctly ignores the response payload as noted in its comment, while allowing richer escalators to use it.
Cause-carrying cancellation - The executor now uses context.WithCancelCause and properly distinguishes between critic kills (via ErrCriticKill) and natural timeouts (via context.DeadlineExceeded). The statusFor function correctly checks context.Cause(runCtx) to determine the final status.
Kill cause propagation - The critic handle's KillCause() method properly surfaces the kill reason, which is then used by the deadline watcher to cancel the context with the appropriate cause.

All interface changes are consistently implemented across the codebase, and the test cases have been updated to match the new signatures and behavior. The logic for distinguishing kill causes appears correct and handles the various cancellation scenarios appropriately.

🧯 Error handling & edge cases — No material issues found

No material issues found

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 29s}

### 🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> No material issues found. The changes introduce a more nuanced status reporting mechanism for run execution by distinguishing between critic kills, timeouts, and cancellations. The security implications have been carefully considered: 1. The new `KillCause()` mechanism properly encapsulates error information without exposing internal state 2. Error wrapping uses standard library patterns (`fmt.Errorf` with `%w`) 3. Context cancellation with causes is handled correctly 4. No new security-sensitive surfaces are introduced 5. The critic's kill reason is properly sanitized before being used in error messages The implementation maintains secure error handling practices while providing better observability into run termination causes. </details> <details><summary>🎯 Correctness — No material issues found</summary> VERDICT: No material issues found The changes correctly implement the critic parity features: 1. **RecordStep now carries the full response** - The signature change from `RecordStep(iter int)` to `RecordStep(iter int, resp *llm.Response)` is consistently applied across the critic handle, binding, and executor. The battery implementation correctly ignores the response payload as noted in its comment, while allowing richer escalators to use it. 2. **Cause-carrying cancellation** - The executor now uses `context.WithCancelCause` and properly distinguishes between critic kills (via `ErrCriticKill`) and natural timeouts (via `context.DeadlineExceeded`). The `statusFor` function correctly checks `context.Cause(runCtx)` to determine the final status. 3. **Kill cause propagation** - The critic handle's `KillCause()` method properly surfaces the kill reason, which is then used by the deadline watcher to cancel the context with the appropriate cause. All interface changes are consistently implemented across the codebase, and the test cases have been updated to match the new signatures and behavior. The logic for distinguishing kill causes appears correct and handles the various cancellation scenarios appropriately. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> No material issues found </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 29s

gitea-actions bot commented

2026-06-27 20:47:17 +00:00

🪰 Gadfly review — `gemma4:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found.

I have reviewed the changes through the Security lens. The modifications focus on enhancing the fidelity of run status reporting (distinguishing between "killed" by a critic, "timeout" by a backstop, and "cancelled" by the user) and enriching the data passed to the critic (llm.Response).

Input Validation: The llm.Response is now passed to RecordStep. In the provided implementation (critic/critic.go), this parameter is ignored. Since the critic handle is an interface used by host adapters, any security implications of processing llm.Response (e.g., logging untrusted model output) would reside in the adapter implementation, not the executor's orchestration.
Resource Management: The transition from context.WithTimeout to context.WithCancelCause combined with time.AfterFunc maintains the same hard-deadline safety. The use of defer runTimer.Stop() ensures that timers are cleaned up regardless of how the run terminates, preventing timer leaks.
Error Handling/Information Leakage: The KillCause is wrapped in ErrCriticKill and used to determine the run status. The cause is derived from Decision.KillReason. Given that this is internal state for a run's audit trail/status, there is no exposure of sensitive system secrets to external untrusted parties.
Concurrency: The handle in critic/critic.go correctly uses a mutex (h.mu) to protect killed and the new killCause field, preventing race conditions during the check in startCritic.

The changes are localized to control flow and metadata tracking and do not introduce new attack vectors.

🎯 Correctness — No material issues found

VERDICT: No material issues found

I have reviewed the changes through the Correctness lens.

Kill/Timeout Distinction: The transition from context.WithTimeout to context.WithCancelCause combined with a manual time.AfterFunc correctly separates the three termination states (User Cancel, Backstop Timeout, and Critic Kill).
- The logic in run/critic.go:63-67 ensures that if KillCause() is set at the moment of expiration, the cause is ErrCriticKill, otherwise it is context.DeadlineExceeded.
- The statusFor logic in run/executor.go:297-302 correctly prioritizes the cause from the context over the returned error, ensuring "killed" takes precedence when the context was cancelled by the critic.
Interface Fidelity: The update to RecordStep in run/ports.go:133 and its propagation through run/executor.go:226 and run/critic.go:77 correctly forwards the llm.Response to the critic, satisfying the requirement for fuller fidelity in trace recording.
Concurrency: The critic.handle in critic/critic.go correctly uses its mutex to protect both killed and the new killCause field, preventing races when the executor polls KillCause() while the critic's tick goroutine is updating it.
Boundary Values: The use of time.Now().Before(d) in run/critic.go:62 is correct for checking if a deadline has passed.

All logic appears sound and consistent with the stated goals.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

I have reviewed the changes through the lens of error handling and edge cases.

context.CancelCauseFunc: The transition from WithTimeout to WithCancelCause is handled correctly. The maxRuntime is now enforced by a manual time.AfterFunc that triggers cancelCause(context.DeadlineExceeded), which preserves the original timeout behavior.
statusFor Logic: The new priority for determining status (ErrCriticKill -> DeadlineExceeded -> Canceled) correctly distinguishes between a critic's explicit kill, a backstop timeout, and a general cancellation. The use of context.Cause(runCtx) ensures that the specific reason for cancellation is captured even when runErr is a generic context.Canceled.
KillCause Thread Safety: In critic/critic.go, KillCause() correctly uses the handle's mutex to protect access to h.killCause, preventing data races with the tick goroutine that updates it.
Nil Safety: The RecordStep update maintains nil-safety for the llm.Response payload, and criticBinding.recordStep remains nil-safe for the binding itself.
Edge Cases: The tick logic in critic.go handles empty KillReason strings by providing a default message, avoiding empty error messages.

The implementation is robust and correctly handles the unhappy paths introduced by the new "killed" status.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 20s}

### 🪰 Gadfly review — `gemma4:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> No material issues found. I have reviewed the changes through the **Security** lens. The modifications focus on enhancing the fidelity of run status reporting (distinguishing between "killed" by a critic, "timeout" by a backstop, and "cancelled" by the user) and enriching the data passed to the critic (`llm.Response`). 1. **Input Validation**: The `llm.Response` is now passed to `RecordStep`. In the provided implementation (`critic/critic.go`), this parameter is ignored. Since the critic handle is an interface used by host adapters, any security implications of processing `llm.Response` (e.g., logging untrusted model output) would reside in the adapter implementation, not the executor's orchestration. 2. **Resource Management**: The transition from `context.WithTimeout` to `context.WithCancelCause` combined with `time.AfterFunc` maintains the same hard-deadline safety. The use of `defer runTimer.Stop()` ensures that timers are cleaned up regardless of how the run terminates, preventing timer leaks. 3. **Error Handling/Information Leakage**: The `KillCause` is wrapped in `ErrCriticKill` and used to determine the run status. The cause is derived from `Decision.KillReason`. Given that this is internal state for a run's audit trail/status, there is no exposure of sensitive system secrets to external untrusted parties. 4. **Concurrency**: The `handle` in `critic/critic.go` correctly uses a mutex (`h.mu`) to protect `killed` and the new `killCause` field, preventing race conditions during the check in `startCritic`. The changes are localized to control flow and metadata tracking and do not introduce new attack vectors. </details> <details><summary>🎯 Correctness — No material issues found</summary> VERDICT: No material issues found I have reviewed the changes through the **Correctness** lens. 1. **Kill/Timeout Distinction**: The transition from `context.WithTimeout` to `context.WithCancelCause` combined with a manual `time.AfterFunc` correctly separates the three termination states (User Cancel, Backstop Timeout, and Critic Kill). * The logic in `run/critic.go:63-67` ensures that if `KillCause()` is set at the moment of expiration, the cause is `ErrCriticKill`, otherwise it is `context.DeadlineExceeded`. * The `statusFor` logic in `run/executor.go:297-302` correctly prioritizes the cause from the context over the returned error, ensuring "killed" takes precedence when the context was cancelled by the critic. 2. **Interface Fidelity**: The update to `RecordStep` in `run/ports.go:133` and its propagation through `run/executor.go:226` and `run/critic.go:77` correctly forwards the `llm.Response` to the critic, satisfying the requirement for fuller fidelity in trace recording. 3. **Concurrency**: The `critic.handle` in `critic/critic.go` correctly uses its mutex to protect both `killed` and the new `killCause` field, preventing races when the executor polls `KillCause()` while the critic's `tick` goroutine is updating it. 4. **Boundary Values**: The use of `time.Now().Before(d)` in `run/critic.go:62` is correct for checking if a deadline has passed. All logic appears sound and consistent with the stated goals. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> VERDICT: No material issues found I have reviewed the changes through the lens of error handling and edge cases. - **`context.CancelCauseFunc`**: The transition from `WithTimeout` to `WithCancelCause` is handled correctly. The `maxRuntime` is now enforced by a manual `time.AfterFunc` that triggers `cancelCause(context.DeadlineExceeded)`, which preserves the original timeout behavior. - **`statusFor` Logic**: The new priority for determining status (`ErrCriticKill` -> `DeadlineExceeded` -> `Canceled`) correctly distinguishes between a critic's explicit kill, a backstop timeout, and a general cancellation. The use of `context.Cause(runCtx)` ensures that the specific reason for cancellation is captured even when `runErr` is a generic `context.Canceled`. - **`KillCause` Thread Safety**: In `critic/critic.go`, `KillCause()` correctly uses the handle's mutex to protect access to `h.killCause`, preventing data races with the `tick` goroutine that updates it. - **Nil Safety**: The `RecordStep` update maintains nil-safety for the `llm.Response` payload, and `criticBinding.recordStep` remains nil-safe for the binding itself. - **Edge Cases**: The `tick` logic in `critic.go` handles empty `KillReason` strings by providing a default message, avoiding empty error messages. The implementation is robust and correctly handles the unhappy paths introduced by the new "killed" status. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 20s

steve referenced this issue from a commit

2026-06-27 21:00:29 +00:00

run: fix statusFor — don't relabel a generic error / caller-cancel as timeout (gadfly #11)

steve added 1 commit 2026-06-27 21:00:29 +00:00

run: fix statusFor — don't relabel a generic error / caller-cancel as timeout (gadfly #11 )

executus CI / test (pull_request) Successful in 47s

Details

executus CI / test (push) Successful in 45s

Details

be4bbbcad5

The WithCancelCause+timer rewrite made MaxRuntime surface as Canceled (not
DeadlineExceeded), so statusFor's context.Cause(DeadlineExceeded) check could
relabel (a) a genuine run error as 'timeout' and (b) a caller cancel/deadline as
'timeout' (was 'cancelled'). Convergent gadfly finding (glm-5.2 + cluster).

Fix: keep MaxRuntime as WithTimeout (its DeadlineExceeded propagates → 'timeout',
preserving own-timeout vs caller-cancel), add a NESTED WithCancelCause layer only
for the kill. statusFor consults context.Cause ONLY for ErrCriticKill; everything
else is classified by the run error itself. Tests: generic-error-not-relabeled +
caller-cancel-stays-cancelled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve closed this pull request

2026-06-27 21:11:16 +00:00

steve deleted branch critic-parity-recordstep-kill

2026-06-27 21:11:16 +00:00

executus CI / test (pull_request) Successful in 47s

Details

executus CI / test (push) Successful in 45s

Details

Pull request closed

This pull request cannot be reopened because the branch was deleted.

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: steve/executus#11

run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status) #11

Changes

Tests

🪰 Gadfly review — m5/qwen3.6:35b-mlx (m5)

No material issues found

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

🔒 Security Review

🪰 Gadfly review — glm-5.1:cloud (ollama-cloud)

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

VERDICT: Minor issues

Verified review

🪰 Gadfly review — kimi-k2.7-code:cloud (ollama-cloud)

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: No material issues found

Verdict: No material issues found

🪰 Gadfly review — nemotron-3-super:cloud (ollama-cloud)

Explanation

🪰 Gadfly review — gpt-oss:120b-cloud (ollama-cloud)

🪰 Gadfly review — qwen3-coder:480b-cloud (ollama-cloud)

🪰 Gadfly review — gemma4:cloud (ollama-cloud)

Pull request closed

🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5)

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud)

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud)

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud)

🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud)

🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud)

🪰 Gadfly review — `gemma4:cloud` (ollama-cloud)