feat(run): durable checkpoint + resume (wire Ports.Checkpointer) #20

Merged
steve merged 2 commits from feat/kernel-checkpoint into main 2026-06-29 20:44:17 +00:00
Owner

The kernel defined run.Ports.Checkpointer + the checkpoint battery but never drove them (the documented "P2 follow-up"). This wires durable recovery into the run loop so a run interrupted by shutdown resumes on the next boot instead of being lost — the executus-side half of mort's durable-agent-recovery parity (mort #1355; durable recovery is not host-fixable because the resume state needs the loop's internal []llm.Message transcript).

Kernel (run/)

  • Ports.Checkpointer is now a CheckpointerFactory (Begin(ctx, RunInfo) → (Checkpointer, error), or nil for a non-durable run). A single per-instance Checkpointer (Save/Complete/Fail, no run id) can't distinguish concurrent runs; a factory mints one per run, matching mort's agentexec.CheckpointerFactory. The port was never consumed before, so this is not a breaking change for any caller.
  • RunInfo gains GuildID + ModelTier (so the factory can build resume meta); RunCheckpointState gains CompletedPhases + ActivePhase (+ PhaseOutput).
  • run/checkpoint.go: ResumeState + WithResumeState / WithExistingCheckpointer context carriers; classifyCheckpointOutcome (success → Complete, shutdown → leave for boot recovery, else → Fail — keyed on run.ErrShutdown); finalizeCheckpoint.
  • run/executor.go: resolve the per-run checkpointer (existing-from-ctx on a recovery re-run, else factory.Begin); the single-loop path wraps the step observer to accumulate the transcript + Save each step (the host throttles), and a recovered run seeds the saved transcript via WithHistory and continues with no new input (completed tool calls are not re-run); finalize on exit.
  • run/phases.go: phase-boundary checkpointing — record completed phases after each phase; a resumed run skips already-completed phases. The interrupted phase re-runs from its start (boundary-granular, documented) — only the single-loop path resumes mid-loop.

Battery (checkpoint/)

NewFactory wires the battery into the factory port (per-run handle, meta derived from RunInfo); RunCheckpoint + handle.Save now carry the phase fields.

Note for the host (mort)

classifyCheckpointOutcome keys shutdown on run.ErrShutdown. mort currently stamps its own runengine.ErrShutdown (a distinct value) on the base context — the mort wiring PR will alias them so errors.Is matches.

Tests (run/checkpoint_test.go)

finalize decision matrix; single-loop Save+Complete; terminal-error Fail; resume seeds history; phase-boundary Saves completed phases; resume skips completed phases. Full go test ./... green.

🤖 Generated with Claude Code

The kernel defined `run.Ports.Checkpointer` + the `checkpoint` battery but **never drove them** (the documented "P2 follow-up"). This wires durable recovery into the run loop so a run interrupted by shutdown resumes on the next boot instead of being lost — the executus-side half of mort's durable-agent-recovery parity (mort #1355; durable recovery is not host-fixable because the resume state needs the loop's internal `[]llm.Message` transcript). ## Kernel (`run/`) - **`Ports.Checkpointer` is now a `CheckpointerFactory`** (`Begin(ctx, RunInfo) → (Checkpointer, error)`, or `nil` for a non-durable run). A single per-instance `Checkpointer` (Save/Complete/Fail, no run id) can't distinguish concurrent runs; a factory mints one per run, matching mort's `agentexec.CheckpointerFactory`. The port was never consumed before, so this is not a breaking change for any caller. - `RunInfo` gains `GuildID` + `ModelTier` (so the factory can build resume meta); `RunCheckpointState` gains `CompletedPhases` + `ActivePhase` (+ `PhaseOutput`). - **`run/checkpoint.go`**: `ResumeState` + `WithResumeState` / `WithExistingCheckpointer` context carriers; `classifyCheckpointOutcome` (success → Complete, shutdown → leave for boot recovery, else → Fail — keyed on `run.ErrShutdown`); `finalizeCheckpoint`. - **`run/executor.go`**: resolve the per-run checkpointer (existing-from-ctx on a recovery re-run, else `factory.Begin`); the single-loop path wraps the step observer to accumulate the transcript + `Save` each step (the host throttles), and a recovered run seeds the saved transcript via `WithHistory` and continues with no new input (completed tool calls are not re-run); finalize on exit. - **`run/phases.go`**: phase-boundary checkpointing — record completed phases after each phase; a resumed run skips already-completed phases. The interrupted phase re-runs from its start (boundary-granular, documented) — only the single-loop path resumes mid-loop. ## Battery (`checkpoint/`) `NewFactory` wires the battery into the factory port (per-run handle, meta derived from `RunInfo`); `RunCheckpoint` + `handle.Save` now carry the phase fields. ## Note for the host (mort) `classifyCheckpointOutcome` keys shutdown on `run.ErrShutdown`. mort currently stamps its own `runengine.ErrShutdown` (a distinct value) on the base context — the mort wiring PR will alias them so `errors.Is` matches. ## Tests (`run/checkpoint_test.go`) finalize decision matrix; single-loop Save+Complete; terminal-error Fail; resume seeds history; phase-boundary Saves completed phases; resume skips completed phases. Full `go test ./...` green. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 1 commit 2026-06-29 20:04:28 +00:00
feat(run): durable checkpoint + resume (wire Ports.Checkpointer)
executus CI / test (pull_request) Successful in 46s
Adversarial Review (Gadfly) / review (pull_request) Successful in 17m25s
899059a791
The kernel defined run.Ports.Checkpointer + the checkpoint battery but never
drove them (the documented "P2 follow-up"). This wires durable recovery into
the run loop so a run interrupted by shutdown can resume on the next boot
instead of being lost — the executus-side half of mort's durable-agent-recovery
parity (mort #1355).

Kernel (run/):
- Ports.Checkpointer is now a CheckpointerFactory (Begin per run → a per-run
  Checkpointer, or nil for a non-durable run). The single per-instance
  Checkpointer couldn't distinguish runs; a factory mints one per run, matching
  mort's agentexec.CheckpointerFactory.
- RunInfo gains GuildID + ModelTier (so the factory can build resume meta);
  RunCheckpointState gains CompletedPhases + ActivePhase (+ PhaseOutput).
- run/checkpoint.go: ResumeState + WithResumeState / WithExistingCheckpointer
  context carriers, classifyCheckpointOutcome (success→Complete, shutdown→leave
  for boot recovery, else→Fail using run.ErrShutdown), and finalizeCheckpoint.
- run/executor.go: resolve the per-run checkpointer (existing-from-ctx on a
  recovery re-run, else factory.Begin); single-loop wraps the step observer to
  accumulate the transcript + Save each step (host throttles), and a recovered
  run seeds the saved transcript via WithHistory and continues with no new
  input; finalize on exit.
- run/phases.go: phase-boundary checkpointing — record completed phases after
  each phase; a resumed run skips already-completed phases (the interrupted
  phase re-runs from its start — boundary-granular, documented; only the
  single-loop path resumes mid-loop).

Battery (checkpoint/): NewFactory wires the battery into the factory port
(per-run handle, meta derived from RunInfo); RunCheckpoint + handle.Save carry
the phase fields.

Tests (run/checkpoint_test.go): the finalize decision matrix; single-loop
Save+Complete; terminal-error Fail; resume seeds history; phase-boundary Saves
completed phases; resume skips completed phases. Full ./... green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly — live review status

7/7 reviewers finished · updated 2026-06-29 20:21:51Z

claude-code/opus · claude-code — done

  • security — No material issues found
  • correctness — Minor issues
  • maintainability — Minor issues
  • performance — Minor issues
  • error-handling — Minor issues

claude-code/opus:max · claude-code — done

  • security — Minor issues
  • correctness — No material issues found
  • maintainability — Minor issues
  • performance — No material issues found
  • error-handling — Minor issues

deepseek-v4-pro:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — Minor issues
  • maintainability — Minor issues
  • performance — No material issues found
  • error-handling — Minor issues

devstral-2:123b-cloud · ollama-cloud — done

  • security — Minor issues
  • correctness — Blocking issues found
  • maintainability — Minor issues
  • performance — Minor issues
  • error-handling — Minor issues

glm-5.2:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — Minor issues
  • maintainability — Minor issues
  • performance — No material issues found
  • error-handling — Minor issues

kimi-k2.6:cloud · ollama-cloud — done

  • security — No material issues found
  • ⚠️ correctness — could not complete
  • maintainability — No material issues found
  • performance — No material issues found
  • error-handling — Blocking issues found

qwen3.5:397b-cloud · ollama-cloud — done

  • security — Minor issues
  • correctness — Minor issues
  • maintainability — Minor issues
  • performance — No material issues found
  • error-handling — Blocking issues found

Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.

<!-- gadfly-status-board --> ## 🪰 Gadfly — live review status 7/7 reviewers finished · updated 2026-06-29 20:21:51Z #### `claude-code/opus` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — Minor issues - ✅ **performance** — Minor issues - ✅ **error-handling** — Minor issues #### `claude-code/opus:max` · claude-code — ✅ done - ✅ **security** — Minor issues - ✅ **correctness** — No material issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — Minor issues #### `deepseek-v4-pro:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — Minor issues #### `devstral-2:123b-cloud` · ollama-cloud — ✅ done - ✅ **security** — Minor issues - ✅ **correctness** — Blocking issues found - ✅ **maintainability** — Minor issues - ✅ **performance** — Minor issues - ✅ **error-handling** — Minor issues #### `glm-5.2:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — Minor issues #### `kimi-k2.6:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ⚠️ **correctness** — could not complete - ✅ **maintainability** — No material issues found - ✅ **performance** — No material issues found - ✅ **error-handling** — Blocking issues found #### `qwen3.5:397b-cloud` · ollama-cloud — ✅ done - ✅ **security** — Minor issues - ✅ **correctness** — Minor issues - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — Blocking issues found <sub>Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.</sub>
gitea-actions bot reviewed 2026-06-29 20:21:52 +00:00
gitea-actions bot left a comment

🪰 Gadfly consensus review — 20 inline findings on changed lines. See the consensus comment for the full ranked summary.

Advisory only — does not block merge.

<!-- gadfly-inline-review --> 🪰 **Gadfly consensus review** — 20 inline findings on changed lines. See the consensus comment for the full ranked summary. <sub>Advisory only — does not block merge.</sub>
@@ -84,0 +87,4 @@
// factory is a run.CheckpointerFactory that mints a per-run handle over store,
// deriving the per-run meta from the kernel's RunInfo. It is the battery's glue
// for the Ports.Checkpointer (factory) seam: every run becomes durable (the
// store persists snapshots; a host wanting lazy/short-run skipping uses its own

factory.now field is never set by NewFactory (always nil); dead/untestable field

maintainability · flagged by 1 model

🪰 Gadfly · advisory

⚪ **factory.now field is never set by NewFactory (always nil); dead/untestable field** _maintainability · flagged by 1 model_ <sub>🪰 Gadfly · advisory</sub>
@@ -84,0 +92,4 @@
type factory struct {
store CheckpointStore
throttle time.Duration
now func() time.Time

factory.now field is never assigned (NewFactory omits it) — dead/always-nil clock field

maintainability · flagged by 1 model

  • checkpoint/handle.go:95factory.now is dead. The factory struct carries a now func() time.Time field, but NewFactory never sets it (handle.go:103-105), so f.now is always nil and Begin always passes nil to New (handle.go:122), which then defaults to time.Now. Unlike handle, there's no constructor/option that injects a clock, so the field can never be anything but nil. Either drop it, or add the injection path it implies (mirroring handle's testable clock).

🪰 Gadfly · advisory

⚪ **factory.now field is never assigned (NewFactory omits it) — dead/always-nil clock field** _maintainability · flagged by 1 model_ - **`checkpoint/handle.go:95` — `factory.now` is dead.** The `factory` struct carries a `now func() time.Time` field, but `NewFactory` never sets it (handle.go:103-105), so `f.now` is always nil and `Begin` always passes nil to `New` (handle.go:122), which then defaults to `time.Now`. Unlike `handle`, there's no constructor/option that injects a clock, so the field can never be anything but nil. Either drop it, or add the injection path it implies (mirroring `handle`'s testable clock). <sub>🪰 Gadfly · advisory</sub>
@@ -84,0 +100,4 @@
// NewFactory returns a run.CheckpointerFactory backed by store: each run gets a
// per-run Checkpointer (throttled to at most once per throttle). A nil store
// yields factory.Begin returning a no-op Checkpointer.
func NewFactory(store CheckpointStore, throttle time.Duration) run.CheckpointerFactory {

🟡 NewFactory never initializes factory.now, leaving a configurable-looking but always-nil field

maintainability · flagged by 1 model

  1. NewFactory never initializes factory.now — confirmed at checkpoint/handle.go:103-104; f.now stays nil, passed to New which falls back to time.Now. Confirmed.

🪰 Gadfly · advisory

🟡 **NewFactory never initializes factory.now, leaving a configurable-looking but always-nil field** _maintainability · flagged by 1 model_ 4. **NewFactory never initializes factory.now** — confirmed at checkpoint/handle.go:103-104; `f.now` stays nil, passed to `New` which falls back to `time.Now`. Confirmed. <sub>🪰 Gadfly · advisory</sub>
@@ -84,0 +104,4 @@
return &factory{store: store, throttle: throttle}
}
// Begin mints the per-run Checkpointer. The prompt is read from

🟡 Potential information leakage in checkpoint metadata

maintainability, security · flagged by 2 models

  • Potential information leakage in checkpoint metadata (checkpoint/handle.go:109-121): The Begin method extracts the prompt from info.Inputs["prompt"] without validation or sanitization. If the prompt contains sensitive information (e.g., API keys, PII), it will be stored in the checkpoint metadata. This could lead to unintended exposure if the checkpoint store is compromised or accessed by unauthorized parties. Consider redacting sensitive data or providing an allowlist/denylist mecha…

🪰 Gadfly · advisory

🟡 **Potential information leakage in checkpoint metadata** _maintainability, security · flagged by 2 models_ - **Potential information leakage in checkpoint metadata** (`checkpoint/handle.go:109-121`): The `Begin` method extracts the prompt from `info.Inputs["prompt"]` without validation or sanitization. If the prompt contains sensitive information (e.g., API keys, PII), it will be stored in the checkpoint metadata. This could lead to unintended exposure if the checkpoint store is compromised or accessed by unauthorized parties. Consider redacting sensitive data or providing an allowlist/denylist mecha… <sub>🪰 Gadfly · advisory</sub>
@@ -0,0 +20,4 @@
// WithResumeState; the executor reads it (single-loop seeds the saved transcript
// as history; multi-phase skips completed phases and seeds the active phase).
type ResumeState struct {
History []llm.Message // single-loop transcript OR active-phase transcript

🟠 ActivePhase field plumbed through 3 structs but never read/set; doc comments claim active-phase seeding that isn't implemented

correctness, maintainability · flagged by 5 models

  • ActivePhase is dead weight, and its docs overpromise — run/checkpoint.go:25, run/ports.go:210, checkpoint/checkpoint.go:41. The field is plumbed through three structs (ResumeState, RunCheckpointState, RunCheckpoint) but the kernel never reads it and never writes a non-empty value to it. Every write site is either the literal ActivePhase: "" (run/phases.go:186) or a pass-through copy (checkpoint/handle.go:61), and there is no read site anywhere — the single-loop Save (…

🪰 Gadfly · advisory

🟠 **ActivePhase field plumbed through 3 structs but never read/set; doc comments claim active-phase seeding that isn't implemented** _correctness, maintainability · flagged by 5 models_ - **`ActivePhase` is dead weight, and its docs overpromise — `run/checkpoint.go:25`, `run/ports.go:210`, `checkpoint/checkpoint.go:41`.** The field is plumbed through three structs (`ResumeState`, `RunCheckpointState`, `RunCheckpoint`) but the kernel never reads it and never writes a non-empty value to it. Every write site is either the literal `ActivePhase: ""` (`run/phases.go:186`) or a pass-through copy (`checkpoint/handle.go:61`), and there is no read site anywhere — the single-loop `Save` (… <sub>🪰 Gadfly · advisory</sub>
@@ -0,0 +32,4 @@
return context.WithValue(ctx, resumeStateKey{}, rs)
}
func resumeStateFromContext(ctx context.Context) *ResumeState {

🟡 Context value type safety

security · flagged by 1 model

  • Context value type safety (run/checkpoint.go:35,49): The resumeStateFromContext and existingCheckpointerFromContext functions use type assertions without error handling. If the context contains a value with the wrong type for these keys, the functions will silently return nil, which could lead to unexpected behavior. While this is not a direct security vulnerability, it could mask configuration errors that might have security implications.

🪰 Gadfly · advisory

🟡 **Context value type safety** _security · flagged by 1 model_ - **Context value type safety** (`run/checkpoint.go:35,49`): The `resumeStateFromContext` and `existingCheckpointerFromContext` functions use type assertions without error handling. If the context contains a value with the wrong type for these keys, the functions will silently return `nil`, which could lead to unexpected behavior. While this is not a direct security vulnerability, it could mask configuration errors that might have security implications. <sub>🪰 Gadfly · advisory</sub>
@@ -0,0 +51,4 @@
return cp
}
// checkpointOutcome is the finalize decision for a durable run.

🟡 checkpointOutcome type and constants are not exported, limiting usability outside the package

maintainability · flagged by 1 model

  • run/checkpoint.go:54-61: The checkpointOutcome type and its constants are not exported. This limits their usability outside the package and could be confusing for external consumers who might need to interact with these outcomes. Consider exporting them if they are part of the public API. - run/phases.go:67-74: The phaseDeps struct includes a checkpointer field and a resume field, but the comments and logic around these fields are not entirely clear. The comments could be i…

🪰 Gadfly · advisory

🟡 **checkpointOutcome type and constants are not exported, limiting usability outside the package** _maintainability · flagged by 1 model_ - **`run/checkpoint.go:54-61`**: The `checkpointOutcome` type and its constants are not exported. This limits their usability outside the package and could be confusing for external consumers who might need to interact with these outcomes. Consider exporting them if they are part of the public API. - **`run/phases.go:67-74`**: The `phaseDeps` struct includes a `checkpointer` field and a `resume` field, but the comments and logic around these fields are not entirely clear. The comments could be i… <sub>🪰 Gadfly · advisory</sub>
@@ -0,0 +70,4 @@
switch {
case runErr == nil:
return checkpointComplete
case errors.Is(cause, ErrShutdown):

🔴 ErrShutdown sentinel mismatch: shutdowns classified as Fail until mort wiring PR lands

correctness, error-handling · flagged by 2 models

  • run/checkpoint.go:73classifyCheckpointOutcome keys shutdown detection on run.ErrShutdown, but mort currently stamps its own runengine.ErrShutdown (a distinct sentinel) on the base context. Until the mort wiring PR aliases them, errors.Is(cause, ErrShutdown) will be false for every mort shutdown, causing checkpointFail (deletes the checkpoint) instead of checkpointLeaveRunning. The PR description acknowledges this gap; from the error-handling lens, it means the shutdown→r…

🪰 Gadfly · advisory

🔴 **ErrShutdown sentinel mismatch: shutdowns classified as Fail until mort wiring PR lands** _correctness, error-handling · flagged by 2 models_ - **`run/checkpoint.go:73`** — `classifyCheckpointOutcome` keys shutdown detection on `run.ErrShutdown`, but mort currently stamps its own `runengine.ErrShutdown` (a distinct sentinel) on the base context. Until the mort wiring PR aliases them, `errors.Is(cause, ErrShutdown)` will be `false` for every mort shutdown, causing `checkpointFail` (deletes the checkpoint) instead of `checkpointLeaveRunning`. The PR description acknowledges this gap; from the error-handling lens, it means the shutdown→r… <sub>🪰 Gadfly · advisory</sub>
@@ -0,0 +83,4 @@
if cp == nil {
return
}
switch classifyCheckpointOutcome(runErr, cause) {

🔴 finalizeCheckpoint silently ignores Complete/Fail errors leaving stale recovery records

error-handling, security · flagged by 4 models

  • run/checkpoint.go:88-90finalizeCheckpoint swallows errors from cp.Complete and cp.Fail. If the store fails to delete a finished run's checkpoint (network blip, timeout), the record survives and the next boot will incorrectly attempt to resume a run that already terminated. This creates false recovery candidates.

🪰 Gadfly · advisory

🔴 **finalizeCheckpoint silently ignores Complete/Fail errors leaving stale recovery records** _error-handling, security · flagged by 4 models_ - **`run/checkpoint.go:88-90`** — `finalizeCheckpoint` swallows errors from `cp.Complete` and `cp.Fail`. If the store fails to delete a finished run's checkpoint (network blip, timeout), the record survives and the next boot will incorrectly attempt to resume a run that already terminated. This creates false recovery candidates. <sub>🪰 Gadfly · advisory</sub>
run/executor.go Outdated
@@ -183,0 +185,4 @@
// Durable recovery (optional): a recovered run carries a ResumeState (prior
// transcript / completed phases) + an existing Checkpointer in ctx so it
// continues on the SAME durable record; a fresh run mints a per-run
// Checkpointer via the factory (which decides durability — nil = non-durable).

🔴 CheckpointerFactory.Begin error silently swallowed with no logging

error-handling · flagged by 1 model

  • run/executor.go:188CheckpointerFactory.Begin error is silently discarded. If the factory returns a storage error (e.g., DB unavailable), the executor proceeds as if the run is non-durable, with no log or metric. Operators have no way to detect a broken checkpoint store.

🪰 Gadfly · advisory

🔴 **CheckpointerFactory.Begin error silently swallowed with no logging** _error-handling · flagged by 1 model_ - **`run/executor.go:188`** — `CheckpointerFactory.Begin` error is silently discarded. If the factory returns a storage error (e.g., DB unavailable), the executor proceeds as if the run is non-durable, with no log or metric. Operators have no way to detect a broken checkpoint store. <sub>🪰 Gadfly · advisory</sub>
run/executor.go Outdated
@@ -183,0 +190,4 @@
resume := resumeStateFromContext(ctx)
ckpt := existingCheckpointerFromContext(ctx)
if ckpt == nil && e.cfg.Ports.Checkpointer != nil {
if c, cerr := e.cfg.Ports.Checkpointer.Begin(ctx, info); cerr == nil {

🟠 CheckpointerFactory.Begin error silently swallowed with no log

correctness, error-handling · flagged by 4 models

  • run/executor.go:193-195 — Swallowed Begin error with no logging. If CheckpointerFactory.Begin returns a non-nil error (e.g., a transient store issue the factory didn't self-log), the executor silently degrades to non-durable with zero indication. The interface doc says the factory should log and return (nil, nil), but the executor cannot enforce that. At minimum, log a warning so operators know durability is broken for this run. *(Verified by reading executor.go:192-196 — the `ce…

🪰 Gadfly · advisory

🟠 **CheckpointerFactory.Begin error silently swallowed with no log** _correctness, error-handling · flagged by 4 models_ - **`run/executor.go:193-195`** — Swallowed `Begin` error with no logging. If `CheckpointerFactory.Begin` returns a non-nil error (e.g., a transient store issue the factory didn't self-log), the executor silently degrades to non-durable with zero indication. The interface doc says the factory should log and return `(nil, nil)`, but the executor cannot enforce that. At minimum, log a warning so operators know durability is broken for this run. *(Verified by reading `executor.go:192-196` — the `ce… <sub>🪰 Gadfly · advisory</sub>
run/executor.go Outdated
@@ -333,0 +351,4 @@
// transcript as history and continues with no new input. acc starts from the
// resume history (or the opening user message) and grows as steps complete.
obs := stepObserver
if ckpt != nil {

🔴 Resume logic incorrectly overwrites initial user message

correctness, maintainability, performance · flagged by 5 models

  • run/executor.go:356-367: The checkpointing logic for single-loop runs incorrectly initializes the transcript accumulator (acc). It starts with multimodalUserMessage(input, inv.Images) and then overwrites it with resume.History if a resume state exists. This means the initial user message is lost during resume, which violates the semantic correctness of resuming a run. The resumed run should continue from the saved transcript, but the initial user message should still be part of…

🪰 Gadfly · advisory

🔴 **Resume logic incorrectly overwrites initial user message** _correctness, maintainability, performance · flagged by 5 models_ - **`run/executor.go:356-367`**: The checkpointing logic for single-loop runs incorrectly initializes the transcript accumulator (`acc`). It starts with `multimodalUserMessage(input, inv.Images)` and then **overwrites** it with `resume.History` if a resume state exists. This means the initial user message is lost during resume, which violates the semantic correctness of resuming a run. The resumed run should continue from the saved transcript, but the initial user message should still be part of… <sub>🪰 Gadfly · advisory</sub>
run/executor.go Outdated
@@ -333,0 +364,4 @@
if len(s.Results) > 0 {
acc = append(acc, llm.ToolResultsMessage(s.Results...))
}
_ = ckpt.Save(runCtx, RunCheckpointState{Messages: acc, Iteration: s.Index + 1})

🔴 Per-step checkpoint Save error silently swallowed

correctness, error-handling, performance, security · flagged by 6 models

  • run/executor.go:367 — Per-step checkpoint Save error silently swallowed: The per-step save in the single-loop path discards the error with _ =. Same impact as above — checkpoint persistence failures go unnoticed, risking stale recovery state. Verified by reading run/executor.go:359-368. Suggested fix: Log the error; at minimum ensure the failure is recorded in the audit trail.

🪰 Gadfly · advisory

🔴 **Per-step checkpoint Save error silently swallowed** _correctness, error-handling, performance, security · flagged by 6 models_ - **`run/executor.go:367` — Per-step checkpoint `Save` error silently swallowed**: The per-step save in the single-loop path discards the error with `_ =`. Same impact as above — checkpoint persistence failures go unnoticed, risking stale recovery state. **Verified** by reading `run/executor.go:359-368`. *Suggested fix*: Log the error; at minimum ensure the failure is recorded in the audit trail. <sub>🪰 Gadfly · advisory</sub>
run/executor.go Outdated
@@ -355,1 +401,4 @@
// Finalize durable recovery: clear the checkpoint on success/terminal failure,
// or leave it for boot recovery when the run was interrupted by shutdown.
finalizeCheckpoint(ctx, ckpt, runErr, context.Cause(runCtx))

🟠 finalizeCheckpoint not deferred: a panic (or early setup-error return) bypasses it, leaving an orphaned recoverable checkpoint that should be Fail'd → boot recovery loop

error-handling · flagged by 1 model

  • run/executor.go:404finalizeCheckpoint is a plain statement, not deferred, so panics and early-error returns bypass it; a panicked run leaves an orphaned recoverable checkpoint. The checkpointer is begun at :193, the wrapped step observer Saves a snapshot each completed step (:367), but finalizeCheckpoint(...) only runs at :404 after the dispatch. The top-level recover() defer (:119-123) converts a panic anywhere in the loop into res.Err and returns without finali…

🪰 Gadfly · advisory

🟠 **finalizeCheckpoint not deferred: a panic (or early setup-error return) bypasses it, leaving an orphaned recoverable checkpoint that should be Fail'd → boot recovery loop** _error-handling · flagged by 1 model_ - **`run/executor.go:404` — `finalizeCheckpoint` is a plain statement, not deferred, so panics and early-error returns bypass it; a panicked run leaves an orphaned recoverable checkpoint.** The checkpointer is begun at `:193`, the wrapped step observer `Save`s a snapshot each completed step (`:367`), but `finalizeCheckpoint(...)` only runs at `:404` after the dispatch. The top-level `recover()` defer (`:119-123`) converts a panic anywhere in the loop into `res.Err` and returns **without** finali… <sub>🪰 Gadfly · advisory</sub>
run/phases.go Outdated
@@ -64,6 +64,13 @@ type phaseDeps struct {
stepObserver func(agent.Step)
steer func() []llm.Message
rec RunRecorder
// checkpointer records phase-boundary progress (completed phases) for durable

🟡 Comments and logic around phaseDeps fields could be improved for clarity

maintainability · flagged by 1 model

  • run/checkpoint.go:54-61: The checkpointOutcome type and its constants are not exported. This limits their usability outside the package and could be confusing for external consumers who might need to interact with these outcomes. Consider exporting them if they are part of the public API. - run/phases.go:67-74: The phaseDeps struct includes a checkpointer field and a resume field, but the comments and logic around these fields are not entirely clear. The comments could be i…

🪰 Gadfly · advisory

🟡 **Comments and logic around phaseDeps fields could be improved for clarity** _maintainability · flagged by 1 model_ - **`run/checkpoint.go:54-61`**: The `checkpointOutcome` type and its constants are not exported. This limits their usability outside the package and could be confusing for external consumers who might need to interact with these outcomes. Consider exporting them if they are part of the public API. - **`run/phases.go:67-74`**: The `phaseDeps` struct includes a `checkpointer` field and a `resume` field, but the comments and logic around these fields are not entirely clear. The comments could be i… <sub>🪰 Gadfly · advisory</sub>
run/phases.go Outdated
@@ -73,10 +80,22 @@ type phaseDeps struct {
// deadline/critic-kill — returns the error.
func (e *Executor) runPhases(runCtx context.Context, ra RunnableAgent, deps phaseDeps, query string, images []llm.ImagePart) (*agent.Result, error) {
outputs := make(map[string]string, len(ra.Phases))
var completed []PhaseOutput

🟡 Potential Unbounded Growth in completed Slice in runPhases

performance · flagged by 1 model

  • Potential Unbounded Growth in completed Slice in runPhases In run/phases.go:83, the completed slice is appended to in each phase iteration without any bounds checking. This could lead to unbounded growth in memory usage for long-running or frequently resumed multi-phase runs. Consider adding a limit or periodic cleanup mechanism to prevent excessive memory consumption.

🪰 Gadfly · advisory

🟡 **Potential Unbounded Growth in `completed` Slice in `runPhases`** _performance · flagged by 1 model_ - **Potential Unbounded Growth in `completed` Slice in `runPhases`** In `run/phases.go:83`, the `completed` slice is appended to in each phase iteration without any bounds checking. This could lead to unbounded growth in memory usage for long-running or frequently resumed multi-phase runs. Consider adding a limit or periodic cleanup mechanism to prevent excessive memory consumption. <sub>🪰 Gadfly · advisory</sub>
run/phases.go Outdated
@@ -92,1 +111,4 @@
for i, phase := range ra.Phases {
// Skip phases already completed on a resumed run (key presence, not output
// emptiness — a legitimately-empty phase output still counts as done).
if _, done := outputs[phase.Name]; done {

🟠 Resume phase-skipping trusts checkpoint integrity without validation

error-handling, maintainability, security · flagged by 3 models

  • run/phases.go:114-116 — Resume phase-skipping trusts checkpoint integrity (medium): On resume, phases are skipped based solely on key presence in outputs map (line 114: if _, done := outputs[phase.Name]; done), populated from deps.resume.CompletedPhases (lines 91-97). If an attacker compromises the checkpoint store, they can inject arbitrary PhaseOutput entries to skip security-critical phases. Verified: Read runPhases resume logic — no integrity validation of `deps.resume.…

🪰 Gadfly · advisory

🟠 **Resume phase-skipping trusts checkpoint integrity without validation** _error-handling, maintainability, security · flagged by 3 models_ - **`run/phases.go:114-116` — Resume phase-skipping trusts checkpoint integrity (medium)**: On resume, phases are skipped based solely on key presence in `outputs` map (line 114: `if _, done := outputs[phase.Name]; done`), populated from `deps.resume.CompletedPhases` (lines 91-97). If an attacker compromises the checkpoint store, they can inject arbitrary `PhaseOutput` entries to skip security-critical phases. **Verified**: Read `runPhases` resume logic — no integrity validation of `deps.resume.… <sub>🪰 Gadfly · advisory</sub>
@@ -154,0 +181,4 @@
// hold/serialize it asynchronously.)
completed = append(completed, PhaseOutput{Name: phase.Name, Output: output})
if deps.checkpointer != nil {
_ = deps.checkpointer.Save(runCtx, RunCheckpointState{

🔴 Phase-boundary checkpoint Save error silently swallowed

correctness, error-handling, maintainability, performance · flagged by 5 models

  • run/phases.go:184 — Phase-boundary checkpoint Save error silently swallowed: The phase-boundary save discards the error with _ =. If the checkpoint store fails mid-run (disk full, DB locked, network blip), the run continues but checkpoints aren't persisted. On shutdown/recovery, the host will resume from a stale checkpoint (or none), potentially re-executing work already done or losing progress. Verified by reading run/phases.go:183-188. Suggested fix: Log the error (at minim…

🪰 Gadfly · advisory

🔴 **Phase-boundary checkpoint Save error silently swallowed** _correctness, error-handling, maintainability, performance · flagged by 5 models_ - **`run/phases.go:184` — Phase-boundary checkpoint `Save` error silently swallowed**: The phase-boundary save discards the error with `_ =`. If the checkpoint store fails mid-run (disk full, DB locked, network blip), the run continues but checkpoints aren't persisted. On shutdown/recovery, the host will resume from a stale checkpoint (or none), potentially re-executing work already done or losing progress. **Verified** by reading `run/phases.go:183-188`. *Suggested fix*: Log the error (at minim… <sub>🪰 Gadfly · advisory</sub>
run/ports.go Outdated
@@ -188,2 +200,3 @@
// minimal here; the executor extends what it records during the merge.
// RunCheckpointState is the resumable snapshot a Checkpointer persists.
type RunCheckpointState struct {
// Messages is the running transcript (single-loop run) OR the active phase's

🟡 Documentation claims multi-phase transcript resume not implemented

correctness · flagged by 1 model

  1. run/ports.go:202-203 — Documentation claims unsupported functionality - The comment states Messages is "the running transcript (single-loop run) OR the active phase's transcript (multi-phase run)". However, the multi-phase checkpoint save at run/phases.go:184-187 never sets Messages — it only saves CompletedPhases and ActivePhase. This means mid-phase transcript recovery for multi-phase runs is NOT supported (only boundary-granular). - Impact: Documentation promises fu…

🪰 Gadfly · advisory

🟡 **Documentation claims multi-phase transcript resume not implemented** _correctness · flagged by 1 model_ 2. **`run/ports.go:202-203` — Documentation claims unsupported functionality** - The comment states `Messages` is "the running transcript (single-loop run) OR the active phase's transcript (multi-phase run)". However, the multi-phase checkpoint save at `run/phases.go:184-187` **never sets `Messages`** — it only saves `CompletedPhases` and `ActivePhase`. This means mid-phase transcript recovery for multi-phase runs is NOT supported (only boundary-granular). - **Impact**: Documentation promises fu… <sub>🪰 Gadfly · advisory</sub>
@@ -192,0 +207,4 @@
// already finished, in phase order. nil for single-loop runs.
CompletedPhases []PhaseOutput
// ActivePhase is the name of the in-progress phase (multi-phase only).
ActivePhase string

🟡 RunCheckpointState.ActivePhase is always empty string — field exists but is never populated with meaningful data

maintainability · flagged by 1 model

  1. run/ports.go:210 / run/phases.go:186RunCheckpointState.ActivePhase is defined and persisted, but the only write in the entire codebase is ActivePhase: "" at run/phases.go:186. It is never set to a non-empty value.

🪰 Gadfly · advisory

🟡 **RunCheckpointState.ActivePhase is always empty string — field exists but is never populated with meaningful data** _maintainability · flagged by 1 model_ 2. **`run/ports.go:210` / `run/phases.go:186`** — `RunCheckpointState.ActivePhase` is defined and persisted, but the only write in the entire codebase is `ActivePhase: ""` at `run/phases.go:186`. It is never set to a non-empty value. <sub>🪰 Gadfly · advisory</sub>

🪰 Gadfly review — consensus across 7 models

Verdict: Blocking issues found · 23 findings (9 with multi-model agreement)

Finding Where Models Lens
🔴 Per-step checkpoint Save error silently swallowed run/executor.go:367 6/7 correctness, error-handling, performance, security
🔴 Resume logic incorrectly overwrites initial user message run/executor.go:354 5/7 correctness, maintainability, performance
🔴 Phase-boundary checkpoint Save error silently swallowed run/phases.go:184 5/7 correctness, error-handling, maintainability, performance
🟠 ActivePhase field plumbed through 3 structs but never read/set; doc comments claim active-phase seeding that isn't implemented run/checkpoint.go:23 5/7 correctness, maintainability
🔴 finalizeCheckpoint silently ignores Complete/Fail errors leaving stale recovery records run/checkpoint.go:86 4/7 error-handling, security
🟠 CheckpointerFactory.Begin error silently swallowed with no log run/executor.go:193 4/7 correctness, error-handling
🟠 Resume phase-skipping trusts checkpoint integrity without validation run/phases.go:114 3/7 error-handling, maintainability, security
🔴 ErrShutdown sentinel mismatch: shutdowns classified as Fail until mort wiring PR lands run/checkpoint.go:73 2/7 correctness, error-handling
🟡 Potential information leakage in checkpoint metadata checkpoint/handle.go:107 2/7 maintainability, security
14 single-model findings (lower confidence)
Finding Where Model Lens
🔴 CheckpointerFactory.Begin error silently swallowed with no logging run/executor.go:188 kimi-k2.6:cloud error-handling
🟠 Pre-loop early-return error paths skip finalizeCheckpoint; a recovered run's existing durable record is left unfinalized, causing indefinite boot-recovery retry on persistent build errors run/executor.go:207 glm-5.2:cloud error-handling
🟠 finalizeCheckpoint not deferred: a panic (or early setup-error return) bypasses it, leaving an orphaned recoverable checkpoint that should be Fail'd → boot recovery loop run/executor.go:404 claude-code/opus error-handling
🟠 Template injection via persisted phase outputs in expandPhaseTemplate run/phases.go:177 qwen3.5:397b-cloud security
🟡 NewFactory never initializes factory.now, leaving a configurable-looking but always-nil field checkpoint/handle.go:103 glm-5.2:cloud maintainability
🟡 Context value type safety run/checkpoint.go:35 devstral-2:123b-cloud security
🟡 checkpointOutcome type and constants are not exported, limiting usability outside the package run/checkpoint.go:54 devstral-2:123b-cloud maintainability
🟡 Stale phaseDeps doc still lists 'step observer' in sharedOpts after it was moved out per-path run/phases.go:56 claude-code/opus correctness, maintainability
🟡 Comments and logic around phaseDeps fields could be improved for clarity run/phases.go:67 devstral-2:123b-cloud maintainability
🟡 Potential Unbounded Growth in 'completed' Slice in 'runPhases' run/phases.go:83 devstral-2:123b-cloud performance
🟡 Documentation claims multi-phase transcript resume not implemented run/ports.go:202 qwen3.5:397b-cloud correctness
🟡 RunCheckpointState.ActivePhase is always empty string — field exists but is never populated with meaningful data run/ports.go:210 deepseek-v4-pro:cloud maintainability
factory.now field is never set by NewFactory (always nil); dead/untestable field checkpoint/handle.go:90 claude-code/opus:max maintainability
factory.now field is never assigned (NewFactory omits it) — dead/always-nil clock field checkpoint/handle.go:95 claude-code/opus maintainability
Per-model detail
claude-code/opus (claude-code) — Minor issues

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

This PR is internal kernel plumbing for durable checkpoint/resume. I've traced the tainted-input and trust-boundary paths relevant to the security lens.

VERDICT: No material issues found

I reviewed this through the security lens (authz, injection, untrusted-data handling, secret leakage) and verified the key paths against the checked-out code:

  • Resume data trust boundary (run/executor.go:185-196, run/checkpoint.go:18-52): ResumeState and the existing Checkpointer arrive via in-process context values set by the host's boot-recovery path, and the saved transcript originates from the host-controlled CheckpointStore (durable infra). This is trusted host infrastructure, not an external/untrusted ingress point, so seeding agent.WithHistory(resume.History) does not cross a new trust boundary.
  • No template injection on phase resume: I checked run/phases.go — completed-phase outputs are threaded into the next phase's prompt via {{.<PhaseName>}} substitution. Go's text/template substitutes a .Field value as data; it does not re-parse interpolated values, so a phase output containing {{...}} cannot inject template directives. No injection vector here.
  • Authz/budget not bypassed on recovery: The pre-run budget gate (e.cfg.Ports.Budget.Check(ctx, inv.CallerID), run/executor.go:139-143) runs before any resume seeding, so a recovered run is still gated. Authorization at re-dispatch is the host's responsibility, unchanged by this PR.
  • No secret leakage introduced: factory.Begin (checkpoint/handle.go:107-123) copies Prompt/CallerID/GuildID into RunCheckpointMeta for persistence — that is inherent to checkpointing and goes only to the host-supplied store, not to any external sink. The info.Inputs["prompt"].(string) assertion is a safe comma-ok and is not a security concern.

No authn/authz gap, injection, SSRF, unsafe deserialization, or insecure default is introduced by this change.

One adjacent observation, explicitly outside my lens (not a security finding): the documented run.ErrShutdown vs mort's runengine.ErrShutdown mismatch means a shutdown would currently be classified as checkpointFail and delete the recovery record — that's an availability/correctness concern for the correctness lens, not security.

🎯 Correctness — Minor issues

Both findings are confirmed against the actual code:

Finding 1 — Verified. resume.History is read only in the single-loop path (executor.go:356,376); the multi-phase path reads only CompletedPhases (phases.go:91-97). ActivePhase has no reader anywhere — it is written as the literal "" at phases.go:186 and merely copied through handle.go:61. Meanwhile ResumeState (checkpoint.go:21,25) and RunCheckpointState (ports.go:209-210) document active-phase seeding that never occurs. Confirmed dead field + over-promising docs.

Finding 2 — Verified. phaseDeps comment at phases.go:56-57 still says the shared options are "(tool-error limits + step observer + compactor)", but executor.go:310-312 now builds sharedOpts with only tool-error limits + compactor; the step observer is wired per-path (executor.go:373, phases.go:236). Observer is added exactly once per path — no double-count. Pure doc drift.

Both hold. Verdict unchanged.


I've traced the durable-recovery logic against the actual checked-out code. The core classification, cause-propagation, resume, and phase-skip logic all hold up. Findings below.

VERDICT: Minor issues

Findings

  • Multi-phase active-phase resume is documented but unimplemented (write-only/dead fields). run/checkpoint.go:21,25 and run/ports.go:209-210 advertise that a resumed run "seeds the active phase" via ResumeState.ActivePhase / ResumeState.History, and RunCheckpointState.ActivePhase carries "the name of the in-progress phase (multi-phase only)." Verified by grep + reading runPhases: the phase path only ever reads deps.resume.CompletedPhases (run/phases.go:91-97); it never reads resume.ActivePhase or resume.History (those are read only by the single-loop path at executor.go:356,376), and the phase-boundary Save hardcodes ActivePhase: "" (run/phases.go:186). So for any multi-phase run the persisted ActivePhase is always empty, and a host that populates ResumeState.ActivePhase+History (per the doc) gets a silent full re-run of the interrupted phase instead of the mid-phase resume the field names imply. The boundary-granular behavior is correct and intentional (it's documented in the phaseDeps block at phases.go:67-71), but the field-level docs over-promise and the ActivePhase field is effectively dead. Impact: misleading persisted state + a latent host-integration footgun, not a crash. Fix: either drop ActivePhase from RunCheckpointState/ResumeState (and the "seeds the active phase" wording), or actually record/consume it. (low severity.)

  • Stale comment: phaseDeps still lists the step observer as part of sharedOpts. run/phases.go:56-57 describes "the shared agent options (tool-error limits + step observer + compactor)". This PR moved WithStepObserver out of sharedOpts (run/executor.go:306-312) and now wires it per-path (executor.go:373, phases.go:236). The comment now contradicts the code. No behavioral effect — both paths add the observer exactly once, so there's no double-count or audit regression — purely doc drift. (trivial.)

Verified clean (within the correctness lens)

  • classifyCheckpointOutcome keying on errors.Is(cause, ErrShutdown) is sound: finalizeCheckpoint is called at executor.go:404 with context.Cause(runCtx), before the deferred cancellations run, so the cause reflects the true outcome. Critic-kill/deadline/caller-cancel → Fail, shutdown → leave-running, success → Complete all match the test matrix.
  • Single-loop resume (executor.go:376-382) seeds WithHistory + empty input so completed tool calls aren't re-run; the acc reconstruction appends consistently at step granularity.
  • Factory meta mapping (checkpoint/handle.go) is correct; nil now handled by New; RunCheckpointMeta has the GuildID/ModelTier fields, so it compiles.
  • Phase skip uses key-presence not output-emptiness (phases.go:114), isLast indexing stays correct across skipped phases, and a fully-resumed run returns the saved final output without a model call.
🧹 Code cleanliness & maintainability — Minor issues

All four findings are confirmed against the code:

  1. phases.go sharedOpts docsharedOpts (executor.go:310-312) now holds only tool-error limits (+ optional compactor); the phaseDeps parenthetical still lists "step observer". The actual stale text is on line 56 (line 57 separately/correctly documents the distinct stepObserver field). Confirmed.
  2. ResumeState.ActivePhase — grep shows zero reads anywhere; runPhases consumes only CompletedPhases. The "seeds the active phase" doc is unimplemented. Confirmed.
  3. RunCheckpointState.ActivePhase — only ever written "" (phases.go:186; single-loop Save omits it); never read. Confirmed dead plumbing.
  4. factory.nowNewFactory (handle.go:103) never sets it; f.now is always nil. Confirmed dead field.

All survive. Corrected review below (finding 1's line reference adjusted to where the stale text actually sits).

Review

VERDICT: Minor issues

Through the code cleanliness & maintainability lens, the change is generally well-structured (good use of the shared multimodalUserMessage helper, the per-path observer wrap reads cleanly, and the classifyCheckpointOutcome/finalizeCheckpoint split is tidy). A few stale-doc and speculative-field smells:

  • run/phases.go:56 — stale struct doc. The phaseDeps comment still describes the shared agent options as "(tool-error limits + step observer + compactor)". This PR deliberately moved the step observer out of sharedOpts (executor.go:310-312 now builds it with only tool-error limits + optional compactor; WithStepObserver only appears at the per-path sites, executor.go:373 and phases.go:236). The comment now describes the pre-PR layout and contradicts the runOnePhase comment you did update (phases.go:229-232). Fix: drop "step observer" from the sharedOpts parenthetical here.

  • run/checkpoint.go:25 (+ doc lines 20-21) — ResumeState.ActivePhase is unused; docs describe unimplemented behavior. Grep confirms ResumeState.ActivePhase is never read anywhere, and in the multi-phase path (runPhases) only CompletedPhases is consumed (phases.go:91-97). Yet the struct doc claims the executor "seeds the active phase". The PR is explicitly boundary-granular (the active phase re-runs from scratch — see phaseDeps doc, phases.go:67-71), so this is a speculative exported field whose documentation promises behavior that doesn't exist. Either drop ActivePhase or clearly mark it "reserved; not yet consumed" so a future reader doesn't assume mid-phase resume works.

  • run/ports.go:210 + run/phases.go:186RunCheckpointState.ActivePhase is write-only/always empty. The only writer (phases.go:186) sets ActivePhase: "" explicitly, and the single-loop path (executor.go:367) never sets it. So the persisted field is always empty in practice — dead plumbing paired with the unused ResumeState.ActivePhase above. The explicit ActivePhase: "" (zero value) is also redundant.

  • checkpoint/handle.go:95factory.now is dead. The factory struct carries a now func() time.Time field, but NewFactory never sets it (handle.go:103-105), so f.now is always nil and Begin always passes nil to New (handle.go:122), which then defaults to time.Now. Unlike handle, there's no constructor/option that injects a clock, so the field can never be anything but nil. Either drop it, or add the injection path it implies (mirroring handle's testable clock).

Nothing here is blocking; these are low-churn doc/dead-field cleanups.

Performance — Minor issues

I've independently verified both findings against the actual code.

Finding 1 (executor.go:359 — accumulator bypasses compaction): Confirmed. At executor.go:354-368 the checkpoint observer builds acc as a private append-only slice (acc = append(acc, s.Response.Message()) + llm.ToolResultsMessage(s.Results...) every step), entirely independent of the agent's internal history. The compactor is attached only via sharedOpts (executor.go:328), which folds into the agent's options at executor.go:374 — so compaction shrinks the agent's live working set but never touches acc. The growth/serialization-cost framing holds.

Finding 2 (executor.go:367 + handle.go — full-transcript Save every step, no throttle floor): Confirmed. ckpt.Save(runCtx, RunCheckpointState{Messages: acc, ...}) fires synchronously in the step loop; handle.Save (handle.go:42-72) passes the whole st.Messages to store.Save with no delta. The throttle at handle.go:45 is the only guard, and New's contract (handle.go:30-31) plus NewFactory (handle.go:103-105) confirm throttle <= 0 = save every call with no enforced floor. Config-dependent O(n²) write volume for durable backings is real.

Both findings accurately describe the code and stay within the Performance lens. Verdict stands.

VERDICT: Minor issues

The wiring is reasonable and the throttle bounds persistence frequency. Two related performance concerns, both within my lane:

  • run/executor.go:355-368 — the checkpoint accumulator bypasses compaction, so it grows unbounded on exactly the long runs durable recovery targets. The single-loop observer builds acc by appending s.Response.Message() + llm.ToolResultsMessage(s.Results...) on every step and never drops anything. But the agent itself runs with a compactor (sharedOpts includes agent.WithCompactor(...) at executor.go:328, and single-loop opts folds sharedOpts in at executor.go:374). When compaction fires, the agent's live working set shrinks; acc does not. Result: on a long run the checkpoint transcript is strictly larger than the live conversation and grows without the bound compaction is there to provide — both as retained memory (a second, ever-growing copy of every message) and, more importantly, as the payload serialized on each persisted Save. Suggested fix: feed the checkpoint from the agent's current (post-compaction) message history rather than a private append-only accumulator, or cap/compact acc in step with the agent.

  • run/executor.go:367 + checkpoint/handle.go:42-72 — each persisted Save serializes the entire growing transcript, and the kernel relies entirely on the host throttle with no floor. ckpt.Save(runCtx, RunCheckpointState{Messages: acc, ...}) fires synchronously in the agent step loop on every step. When the throttle window has passed, handle.Save calls store.Save with the full acc — a blocking store write on the hot path that re-serializes the whole history every time (no delta/incremental write). The throttle (handle.go:45) is the only mitigation, and New/NewFactory explicitly support throttle <= 0 = "save every call" (handle.go:30-31). With that documented config, a single-loop run does a full-transcript store write on every step → O(n²) cumulative write volume plus blocking I/O between every agent step. This is config-dependent (the in-memory default makes each write cheap), so it's a real edge for the durable (mort durable-job table) backing, not a default-path regression. Suggested fix: enforce a minimum throttle in NewFactory, or document that durable stores must pass a non-zero throttle; ideally combine with the fix above so the serialized payload is bounded.

Everything else I checked is fine: the throttled in-window path is cheap (lock + time.Now + compare); the phase-boundary Save copies only the small []PhaseOutput slice (phases.go:185-189); acc append is amortized O(1) and passing it by reference to Save is not a per-step copy; the resume seed (executor.go:357, WithHistory) is a one-time copy. No N+1, no quadratic per-step copying.

🧯 Error handling & edge cases — Minor issues

Both findings are confirmed against the actual source:

  • Finding 1 (executor.go:404): Verified — finalizeCheckpoint(...) is a plain statement at line 404, executed only after the dispatch returns. The top-level recover() defer (lines 119–123) sets res.Err and returns, bypassing line 404 entirely on a panic. The wrapped step observer (lines 359–368) Saves each step via the live ckpt, so a panic after ≥1 Save leaves a persisted record that is neither Completed nor Failed — which classifyCheckpointOutcome reserves for ErrShutdown (leave-for-recovery), not a terminal panic. The setup-error early returns after Begin (lines 209/217/228/241) likewise skip finalize. Confirmed.
  • Finding 2 (executor.go:193): Verified — cerr is dropped with no log. The CheckpointerFactory doc (ports.go:178–183) does place the "log + degrade to (nil,nil)" duty on the factory, while phaseModel (phases.go:270) emits slog.Warn on the analogous resolve failure. The inconsistency the finding draws is accurate.

Both survive. The verdict is unchanged.


VERDICT: Minor issues

My lens is error handling & edge cases. The happy paths and the documented degraded paths (optional/budget-exhausted phases, nil-safe ports, throttle-skip) are handled cleanly. Two unhappy-path gaps the diff introduces:

  • run/executor.go:404finalizeCheckpoint is a plain statement, not deferred, so panics and early-error returns bypass it; a panicked run leaves an orphaned recoverable checkpoint. The checkpointer is begun at :193, the wrapped step observer Saves a snapshot each completed step (:367), but finalizeCheckpoint(...) only runs at :404 after the dispatch. The top-level recover() defer (:119-123) converts a panic anywhere in the loop into res.Err and returns without finalizing. So if a panic occurs after one or more Saves, the persisted checkpoint is neither Completed nor Failed — by omission it gets the "leave for boot recovery" treatment that classifyCheckpointOutcome reserves for ErrShutdown. A panicking run is terminal and should Fail (delete) the record; instead it becomes a recovery candidate that will be re-dispatched on the next boot and can re-panic (a recovery loop). This is reachable with the shipped battery (whose Save actually persists), so it's store-independent. Suggested fix: make finalize a defer that captures runErr/context.Cause(runCtx) (or have the recover defer Fail the checkpointer).

    Same root cause, lower-impact corollary: the setup-error early returns after Beginbuild toolbox (:209), delegation tools (:217), extra tools (:228), session tools (:241) — also skip finalize. Harmless for the shipped battery (no Save has happened yet), but a factory whose Begin marks the run durable/recoverable (the PR explicitly models this on mort's agentexec) would leak an un-Failed record on any of these.

  • run/executor.go:193Begin error is silently swallowed. if c, cerr := ...Begin(ctx, info); cerr == nil { ckpt = c } drops a non-nil cerr with no log; the run silently continues non-durable. The CheckpointerFactory doc (ports.go:178-183) does place the "log + degrade to (nil,nil)" duty on the factory, so this is defensible — but it diverges from the codebase's own convention for analogous resolution failures (phaseModel at phases.go:270 emits slog.Warn on a model-resolve failure). A misbehaving factory that returns an error here loses durability with zero signal. Suggested fix: slog.Warn the dropped cerr.

Things I checked and cleared (not findings): the shutdown-cause classification reaching context.Cause(runCtx) is sound — MergeCancellation propagates the secondary ctx's cause, so a host-set ErrShutdown flows through. detach(ctx) correctly bases cleanup writes off the parent (non-cancelled) ctx. The phase-skip uses key-presence not output-emptiness (phases.go:114), correctly handling a legitimately-empty phase output. The live acc slice handed to Save at :367 is not an aliasing/mutation bug — append only writes at/after the handed-off slice's len, so the store's prefix view is never mutated.

claude-code/opus:max (claude-code) — Minor issues

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — Minor issues

I've verified the finding against the actual source. The factual claims hold:

  • checkpoint/handle.go:76Fail does discard the passed error (_ error) and only deletes, so runErr is never persisted. Confirmed.
  • run/executor.go:359-368 — the wrapped step observer accumulates s.Response.Message() and llm.ToolResultsMessage(s.Results...) into acc, then calls ckpt.Save each step. Confirmed.
  • The checkpoint executor wiring was a pending P2 follow-up (per CLAUDE.md: "critic/checkpoint executor wiring … is a P2 follow-up"), so this PR is indeed the first to drive Save and persist the transcript. Confirmed.
  • No redaction hook exists on the Save/Checkpointer path. Confirmed.

The finding is accurate and security-relevant; it survives.

VERDICT: Minor issues

This change is internal plumbing — the data crossing the new boundary (transcripts, phase outputs, run meta) originates from the agent's own run, and the only trust boundary is the checkpoint store, which is host-controlled infrastructure. I found no injection, SSRF, deserialization, or authz defect introduced in this diff. One security-relevant property change is worth flagging:

  • run/executor.go:359-368 / checkpoint/handle.go:56 — newly-activated unredacted persistence of full transcripts at rest. The checkpointer port was never consumed before this PR (the executor wiring was a pending P2 follow-up), and the new step-observer accumulates s.Response.Message() and llm.ToolResultsMessage(s.Results...) into acc, then calls ckpt.Save every step. Tool results routinely contain secrets/credentials/PII the agent fetched (config reads, API responses with tokens, etc.). Previously these lived only in process memory for the run's lifetime; this PR is the first to write the complete []llm.Message transcript to durable storage (mort's job table / any host store) with no redaction hook. Impact: it widens data-at-rest exposure and makes the store's encryption/access-control load-bearing for secret confidentiality. This is largely inherent to durable resume (you can't continue without the transcript), so it's a design tradeoff rather than a code defect — but it deserves an explicit note, and ideally a redaction/opt-out seam, since the in-tree Memory() default is in-process only and hides the at-rest cost from anyone reading the core. Severity: small.

Two things I checked and found not to be problems (so I'm not reporting them):

  • No secret leakage via error paths: handle.Fail discards the passed error and only Deletes, so runErr is never persisted.
  • Authz is preserved on a fresh run: the Budget.Check(inv.CallerID) gate and the rebuilt toolbox/permissions still run; CallerID/GuildID are persisted into meta so recovery can restore attribution. The actual boot re-dispatch + its authz live host-side (mort), explicitly out of this diff, so there's no in-diff authz gap to report.

Outside my lens: classifyCheckpointOutcome keys shutdown on run.ErrShutdown while mort stamps a distinct runengine.ErrShutdown (per the PR note, to be aliased later) — until that alias lands, a shutdown-interrupted run would be classified checkpointFail and its recovery record deleted, defeating durable recovery. That's a correctness/availability issue for the other lens, not security.

🎯 Correctness — No material issues found

I've verified the change thoroughly against the checked-out code. Build/vet require approval in this sandbox, but I confirmed every cross-file symbol the diff depends on (MergeCancellation cause-propagation, ErrShutdown/ErrCriticKill, PhaseOutput, phaseProvider, multimodalUserMessage, handle.New nil-now default).

VERDICT: No material issues found

I checked the correctness-sensitive seams and they hold up:

  • classifyCheckpointOutcome decision matrix (run/checkpoint.go:67) — verified the cause actually reaches it correctly. finalizeCheckpoint reads context.Cause(runCtx), and MergeCancellation (run/runengine.go:62) calls cancel(context.Cause(secondary)), so a host-stamped ErrShutdown on the base ctx propagates to runCtx as its cause → checkpointLeaveRunning. A critic kill stamps ErrCriticKill on the inner cancelCause layer, which context.Cause surfaces through the merged child → checkpointFail. The runErr == nil check precedes the cause check, so a run that completes despite a concurrent shutdown still Completes (clears). All consistent with the test matrix.
  • Single-loop transcript accumulator (run/executor.go:355-368) — the reconstructed acc is always balanced: each step appends the assistant message and (when present) the paired tool-results message before Save, so a saved transcript never ends on a dangling tool call. On resume it's correctly re-seeded from resume.History (the multimodalUserMessage seed is overwritten), and ag.Run(runCtx, "", WithHistory(...)) continues without re-running completed tool calls — matching the PR claim. System prompt is re-applied via agent.New, correctly not stored in history. Iteration s.Index+1 is the right 1-based completed-iteration count.
  • Phase-boundary resume (run/phases.go:91-117,177-188) — skip uses key-presence (_, done := outputs[...]), so a legitimately-empty phase output still counts as done; the completed slice is copied per Save and ordering is preserved across resumed + fresh phases. The interrupted phase correctly re-runs from scratch (boundary-granular, as documented), and the all-phases-completed edge returns the last saved output without a model call.
  • Battery factory (checkpoint/handle.go:109-122) — nil now is defaulted inside New, so the unset factory.now is harmless; RunCheckpointMeta carries the GuildID/ModelTier fields Begin populates. No import cycle (battery→core only).

One non-blocking observation (not a correctness defect, so excluded from the findings block): RunCheckpointState.ActivePhase is plumbed through handle.Save but never written with a non-empty value — the single-loop path leaves it zero and the phase-boundary Save explicitly sets ActivePhase: "" (run/phases.go:186). Resume doesn't consume it (the active phase re-runs from CompletedPhases), so behavior is correct today; the field is just latent/dead. Worth a comment so a future host doesn't rely on it to identify the in-flight phase.

🧹 Code cleanliness & maintainability — Minor issues

All four findings are confirmed against the actual code:

  1. ActivePhase — grep shows every occurrence is either a field declaration, the literal ActivePhase: "" (run/phases.go:186), or the pass-through copy (checkpoint/handle.go:61). No read site exists; the single-loop Save (run/executor.go:367) never sets it, so st.ActivePhase is always "". The doc comments at run/checkpoint.go:21,25 do claim "seeds the active phase" / "the phase that was in flight." Confirmed.
  2. Redundant ActivePhase: "" at run/phases.go:186 — confirmed zero-value with no reader.
  3. Duplicated resume guardresume != nil && len(resume.History) > 0 at run/executor.go:356 and :376. Confirmed.
  4. factory.now never setNewFactory returns &factory{store, throttle} (handle.go:104), leaving now nil; only safe via New's nil→time.Now default. No assignment anywhere in the package. Confirmed.

All findings survive.

VERDICT: Minor issues

  • ActivePhase is dead weight, and its docs overpromise — run/checkpoint.go:25, run/ports.go:210, checkpoint/checkpoint.go:41. The field is plumbed through three structs (ResumeState, RunCheckpointState, RunCheckpoint) but the kernel never reads it and never writes a non-empty value to it. Every write site is either the literal ActivePhase: "" (run/phases.go:186) or a pass-through copy (checkpoint/handle.go:61), and there is no read site anywhere — the single-loop Save (run/executor.go:367) doesn't set it, and the phase resume loop (run/phases.go:91-97) only reads CompletedPhases. The design is boundary-granular — the interrupted phase re-runs from its start — so ActivePhase carries no information the resume path uses. Meanwhile the doc comments claim otherwise: run/checkpoint.go:21 says multi-phase resume "skips completed phases and seeds the active phase", and the field comment (:25) calls it "the phase that was in flight." No active-phase seeding exists. This will mislead a future maintainer into thinking mid-phase resume is implemented. Fix: either drop the ActivePhase field from all three structs (and the misleading clauses in the comments), or, if it's a deliberate forward hook for the Phases follow-up, mark it clearly as reserved/unused and correct the comments so they don't describe behavior that isn't there.

  • Redundant explicit zero-value with no payoff — run/phases.go:186. The phase-boundary Save sets ActivePhase: "" explicitly. Since it's always the zero value and nothing reads it, the line is noise that hints at intent the code doesn't deliver. Drop it (folds into the finding above).

  • Duplicated resume guard — run/executor.go:356 and :376. resume != nil && len(resume.History) > 0 is computed twice within the single-loop branch (once to seed acc, once to choose the ag.Run-with-history vs runAgent path). Low-churn fix: hoist to a local resuming := resume != nil && len(resume.History) > 0 and reuse it, so the two sites can't drift.

  • (trivial) Unused now field on factorycheckpoint/handle.go (factory struct + NewFactory). NewFactory never sets now, so factory.now is always nil; it's only safe because New defaults nil→time.Now (handle.go:36-38). It's a dead field (no same-package assignment, no test sets it). Either drop it or give NewFactory a way to inject it, to match the testability New already offers.

No structural problems with the executor refactor itself: moving WithStepObserver out of sharedOpts and wiring it per-path (single-loop wraps for checkpointing; phases wire it in runOnePhase) is a clean, well-commented split, and the updated comments at run/executor.go:306-309 and run/phases.go:229-232 accurately describe the new arrangement.

Performance — No material issues found

I've verified the mechanism against the actual code. My findings:

  • executor.go:367 does call ckpt.Save(runCtx, ...) synchronously inside the wrapped step observer obs, wired via agent.WithStepObserver(obs) — confirmed.
  • handle.Save (handle.go:42–72) is synchronous and throttle-gated; the throttle short-circuits before the store write — confirmed.
  • throttle <= 0 "saves every call" is documented (handle.go:30–31) and the factory makes every run durable by default (handle.go:88–91) — confirmed.

However, the performance framing does not hold up. The step observer fires once per agent step — i.e., once per model round-trip (see run/steps.go and the agent.Step-keyed observer). It is not a tight inner loop. A synchronous local checkpoint write at LLM-round-trip cadence (seconds) is negligible against the model latency that dominates each step, even at throttle=0. The "quadratic-ish" transcript re-serialization is true in big-O but trivial in absolute terms because the number of writes is bounded by the number of model calls. The author themselves rate this severity small / confidence low and concede it "doesn't bite in the intended (throttled) configuration." As a performance finding there is no material cost to keep.

Dropping it leaves nothing.

VERDICT: No material issues found

The checkpoint/resume wiring is performance-conscious: the store write is throttle-gated and short-circuits before serialization, the single-loop acc accumulation is append-only / amortized O(1) per step, and phase-boundary checkpointing copies a small slice bounded by phase count. The one per-step ckpt.Save runs synchronously in the step observer, but that observer fires at model-round-trip cadence, so a local durable write there is not a meaningful hot-path cost.

🧯 Error handling & edge cases — Minor issues

Both findings verified against the actual code:

Finding 1 (phases.go:114-117) — Confirmed. The skip guard if _, done := outputs[phase.Name]; done is newly added in this diff, and the same outputs map is populated at line 177 as each phase completes. Phase.Name (run/agent.go:62) is a free-form string with no uniqueness validation anywhere in run/ (grep confirms only uniqueName for input files and palette dedup — nothing for phase names). So a fresh pipeline with two same-named phases now skips the second. Real new edge case.

Finding 2 (executor.go:193) — Confirmed at the code level: cerr is dropped with no log. The port contract (run/ports.go) does delegate logging to the factory, so this is a minor defensive gap, but the observation is accurate and consistent with the nearby slog.Warn degrade convention.

Both survive.

VERDICT: Minor issues

Reviewed strictly through the error-handling & edge-cases lens. I traced the shutdown-cause propagation (ctxWithoutCancel+WithTimeoutWithCancelCauseMergeCancellation) and confirmed context.Cause(runCtx) carries ErrShutdown into classifyCheckpointOutcome (run/checkpoint.go, run/executor.go:259-264), so the finalize/classify matrix is sound. The accumulator never persists a transcript ending on a dangling tool call (obs appends results before Save), and nil/empty inputs are handled (info.Inputs["prompt"] on a nil map, f.now nil → time.Now). Two real but low-severity issues:

  • run/phases.go:114-117 — repeated phase name now silently skips the second phase (new edge case, not resume-only). The new skip guard keys on outputs[phase.Name] presence, but that same map is populated as each phase completes (line 177). Phase.Name is a free-form string with no uniqueness validation (run/agent.go:62-63; no validator anywhere in run/). So in a fresh (non-resume) pipeline where two phases share a name, the first runs and sets outputs["x"]; when the loop reaches the second "x", the guard fires and it is skipped entirely. Before this diff every phase ran unconditionally. The failure is silent: the skipped phase never executes, lastOutput is the earlier phase's text, and the run is then checkpoint-Completed as if finished. Impact: a misconfigured-but-previously-working pipeline (e.g. a duplicated "synthesize" name) returns the wrong output with no error. Trigger likelihood is low (requires duplicate names), hence small severity. Fix: either validate phase-name uniqueness up front, or track skip-set membership in a dedicated set seeded only from resume.CompletedPhases rather than reusing the live outputs map.

  • run/executor.go:193Begin error silently swallowed, no log. if c, cerr := …Begin(ctx, info); cerr == nil { ckpt = c } drops a non-nil cerr with no slog. The port contract (run/ports.go) says a storage error "should be logged and degraded to (nil, nil)" — i.e. logging is the factory's job — so degrading to non-durable is defensible, but a contract-violating factory that returns an error becomes invisibly non-durable. This is inconsistent with the codebase's degrade-with-slog.Warn convention used nearby (phase model resolve, template parse, unknown tool). Fix: slog.Warn("run: checkpointer Begin failed; running non-durable", "run_id", info.RunID, "error", cerr) in the error branch.

deepseek-v4-pro:cloud (ollama-cloud) — Minor issues

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

After tracing tainted input through the full checkpoint/resume flow and examining the new code for authn/authz gaps, injection, SSRF, unsafe deserialization, credential leakage, and missing validation, I find no material security issues in this PR.

Here's what I verified:

  • Context value carriers (WithResumeState, WithExistingCheckpointer): use unexported key types (resumeStateKey{}, existingCheckpointerKey{}), the standard Go pattern preventing cross-package spoofing. Verified at run/checkpoint.go:28-51.

  • Template injection in phase prompts (expandPhaseTemplate at run/phases.go:343): uses text/template with missingkey=zero. The template source is the agent author's phase.SystemPrompt; user-supplied values (query, prior phase outputs) are substituted as plain strings — Go's text/template does not recursively re-parse substituted values. A user typing {{.Query}} produces literal text, not expansion.

  • info.Inputs["prompt"].(string) in checkpoint/handle.go:110: safe comma-ok type assertion. A non-string or nil Inputs map yields "" — no panic, no injection. The prompt is metadata for re-dispatch, not used for any security decision.

  • detach for finalization (run/executor.go:526-530): uses context.WithoutCancel + WithTimeout — the correct pattern for cleanup that must not be aborted by caller cancellation. No context-value leakage beyond what the caller already put there.

  • Checkpoint data at rest: RunCheckpoint carries []llm.Message (full transcript) and []run.PhaseOutput. This is inherent to durable resume — you can't resume without saving state. The CheckpointStore interface leaves encryption/access control to the implementation (mort's durable-job table, SQLite, etc.). The battery itself adds no new exposure.

  • No authz bypass: the Checkpointer handle is per-run, minted by the factory and scoped to a single RunID. Complete/Fail delete by that run ID only. No cross-run access vector exists.

  • No SSRF, path traversal, or unsafe deserialization: the code touches no URLs, file paths, or dynamic interface{} deserialization. All persisted types are concrete Go structs with string/int/slice fields.

  • Error suppression in finalize (run/checkpoint.go:88-90): Complete/Fail errors are silently discarded (_ = cp.Complete(...)). While this is a reliability concern (a failed Complete leaves a stale checkpoint), it is security-positive: store error details are never surfaced to callers.

🎯 Correctness — Minor issues

Verdict: Minor issues

  • run/phases.go:184-187ActivePhase is always hardcoded to "" in the phase-boundary checkpoint save, and the single-loop path (run/executor.go:367) never sets it either. The field exists in RunCheckpointState, RunCheckpoint, and ResumeState but the executor never writes a non-empty value. A host loading a checkpoint to build a ResumeState will always see ActivePhase == "", making the field dead data in the kernel. Either the field should be populated (e.g., set to the next phase name at each boundary, or to "" / the current phase name mid-loop for single-loop runs) or removed from the kernel types to avoid misleading consumers. (Verified by reading run/phases.go:183-188 and run/executor.go:354-368ActivePhase is never assigned a non-zero value in any save path.)

  • run/executor.go:192-195CheckpointerFactory.Begin errors are silently swallowed. The interface doc says the factory should log and degrade, but if an implementation returns an error without logging (or the error is unexpected), the executor proceeds with no checkpointing and no diagnostic. A slog.Warn here would be defense-in-depth consistent with how the executor logs other non-fatal port failures. (Verified by reading run/executor.go:192-195 — the cerr variable is checked but never logged.)

  • run/executor.go:367 vs run/phases.go:185 — Inconsistent defensive copying: the phase path copies completed via append([]PhaseOutput(nil), completed...) before passing to Save, but the single-loop path passes the live acc slice directly (Messages: acc). If any CheckpointStore implementation retains the RunCheckpoint struct (or its Messages field) beyond the synchronous Save return — e.g., for async serialization — the single-loop path's checkpoint data would be mutated by subsequent observer calls. The single-loop path should copy acc the same way the phase path copies completed. (Verified by comparing run/executor.go:367 with run/phases.go:185.)

🧹 Code cleanliness & maintainability — Minor issues

All three findings are confirmed against the actual code. Let me verify the exact line numbers for the findings:

  1. run/checkpoint.go:25ResumeState.ActivePhase is defined but never read. Grep confirms zero reads of resume.ActivePhase anywhere; the only .ActivePhase reads are on RunCheckpointState (in checkpoint/handle.go:61), never on ResumeState. The phase runner uses CompletedPhases but never consults ActivePhase.

  2. run/ports.go:210 / run/phases.go:186RunCheckpointState.ActivePhase is defined and persisted, but the only write in the entire codebase is ActivePhase: "" at run/phases.go:186. It is never set to a non-empty value.

  3. run/executor.go:355-358 — On the resume path, acc is allocated as []llm.Message{multimodalUserMessage(input, inv.Images)} at line 355 and then immediately overwritten with resume.History at line 357. The initial allocation is dead on the resume path.

All three survive verification.

Verdict: Minor issues

  • run/checkpoint.go:25 — Dead field ResumeState.ActivePhase (verified). The field is defined, documented ("the phase that was in flight"), and carried through the context plumbing, but it is never read anywhere in the codebase. I grepped for resume.ActivePhase, \.ActivePhase across the entire repo — the only .ActivePhase reads are on RunCheckpointState (in checkpoint/handle.go:61), never on ResumeState. The phase runner (run/phases.go:91-97) uses deps.resume.CompletedPhases to skip finished phases but never consults ActivePhase. This is a dead field that will confuse future readers who expect it to drive behavior. Either wire it up (e.g., to validate that the active phase matches expectations on resume) or remove it.

  • run/ports.go:210 / run/phases.go:186RunCheckpointState.ActivePhase is always "" (verified). The field is defined, documented, and persisted to the store, but the only write in the codebase is ActivePhase: "" at run/phases.go:186. It is never set to a non-empty value. A reader seeing this field on the struct would reasonably expect it to carry the name of the in-progress phase during a multi-phase run, but it never does. Either populate it (set it to phase.Name before the phase loop body at run/phases.go:111) or remove it from the struct until it's actually used.

  • run/executor.go:355-358 — Wasted allocation in acc initialization (verified). When resuming, acc is first allocated as []llm.Message{multimodalUserMessage(input, inv.Images)} and then immediately overwritten with resume.History on the next line. The initial multimodalUserMessage allocation is dead on the resume path. A cleaner structure avoids the throwaway allocation:

    var acc []llm.Message
    if resume != nil && len(resume.History) > 0 {
        acc = append([]llm.Message(nil), resume.History...)
    } else {
        acc = []llm.Message{multimodalUserMessage(input, inv.Images)}
    }
    
Performance — No material issues found

After a thorough review of the hot paths, allocation patterns, and synchronization overhead introduced by this PR, I find no material performance regressions. Here's what I verified:

  • Single-loop checkpoint observer (run/executor.go:359-368): The closure captures acc by reference and calls ckpt.Save on every step. However, handle.Save (checkpoint/handle.go:42-72) throttles aggressively — after the first persist, subsequent calls within the throttle window do only a mutex lock/check/unlock (uncontended, single-goroutine). The RunCheckpointState struct is stack-allocated and the Messages slice header copy is O(1). No deep copy occurs on the hot path.

  • Phase-boundary checkpointing (run/phases.go:182-188): append([]PhaseOutput(nil), completed...) allocates a copy of the completed-phases slice at each boundary. For N phases this is O(N²) total allocation, but N is small (typically 2–5), making this negligible.

  • Resume path (run/executor.go:355-357): The initial acc allocation ([]llm.Message{multimodalUserMessage(...)}) is discarded when resume history is present. This is one wasted allocation per resumed run — truly trivial.

  • No unbounded growth, missing pagination, or N+1 patterns were introduced. The transcript grows with the conversation, but that's inherent to durable recovery and bounded by the existing compaction system.

  • No blocking calls on hot paths beyond the throttled Save. The mutex in handle.Save is uncontended (one goroutine per run), and time.Now() (if used) is vDSO-optimized.

Verdict: No material issues found

🧯 Error handling & edge cases — Minor issues

Verdict: Minor issues

  • run/executor.go:193-195 — Swallowed Begin error with no logging. If CheckpointerFactory.Begin returns a non-nil error (e.g., a transient store issue the factory didn't self-log), the executor silently degrades to non-durable with zero indication. The interface doc says the factory should log and return (nil, nil), but the executor cannot enforce that. At minimum, log a warning so operators know durability is broken for this run. (Verified by reading executor.go:192-196 — the cerr is checked only for nil-ness, never logged.)

  • run/executor.go:367ckpt.Save error silently discarded. If the checkpoint store is down mid-run, every step's save fails silently; the run completes, finalizeCheckpoint calls Complete (which may also fail silently), and there is no durable record — but the caller sees a successful run with no indication durability was lost. Same pattern at run/phases.go:184 for phase-boundary saves. (Verified by reading both call sites — both use _ = to discard the error.)

  • run/checkpoint.go:88-89cp.Complete / cp.Fail errors silently discarded in finalizeCheckpoint. If the store is unavailable at finalize time, a successful run's checkpoint is never cleaned up (orphaned record) or a failed run's checkpoint persists (waste). Neither case is logged. (Verified by reading checkpoint.go:86-93.)

  • run/executor.go:367acc slice passed to Save without copying, inconsistent with the defensive copy in phases.go:185 (append([]PhaseOutput(nil), completed...)). The phases.go comment explicitly notes the checkpointer "may hold/serialize it asynchronously." In the single-loop path, acc continues to grow via append in subsequent observer calls after Save returns. If any Checkpointer implementation holds the Messages slice reference past Save's return, the transcript it serializes will be corrupted by later appends. In practice handle.Save is synchronous, but the inconsistency is a latent race. (Verified by comparing executor.go:367 with phases.go:184-185.)

  • run/checkpoint.go:73classifyCheckpointOutcome keys shutdown detection on run.ErrShutdown, but mort currently stamps its own runengine.ErrShutdown (a distinct sentinel) on the base context. Until the mort wiring PR aliases them, errors.Is(cause, ErrShutdown) will be false for every mort shutdown, causing checkpointFail (deletes the checkpoint) instead of checkpointLeaveRunning. The PR description acknowledges this gap; from the error-handling lens, it means the shutdown→recovery path is non-functional with current mort. (Verified by reading runengine.go:34 and checkpoint.go:73 — two distinct errors.New sentinels.)

devstral-2:123b-cloud (ollama-cloud) — Blocking issues found

Verdict: Blocking issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — Minor issues

Verdict

Minor issues

Findings

  • Potential information leakage in checkpoint metadata (checkpoint/handle.go:109-121): The Begin method extracts the prompt from info.Inputs["prompt"] without validation or sanitization. If the prompt contains sensitive information (e.g., API keys, PII), it will be stored in the checkpoint metadata. This could lead to unintended exposure if the checkpoint store is compromised or accessed by unauthorized parties. Consider redacting sensitive data or providing an allowlist/denylist mechanism for what can be stored in checkpoints.

  • Context value type safety (run/checkpoint.go:35,49): The resumeStateFromContext and existingCheckpointerFromContext functions use type assertions without error handling. If the context contains a value with the wrong type for these keys, the functions will silently return nil, which could lead to unexpected behavior. While this is not a direct security vulnerability, it could mask configuration errors that might have security implications.

gadfly-findings

[
  {"file": "checkpoint/handle.go", "line": 109, "severity": "medium", "confidence": "high", "title": "Potential information leakage in checkpoint metadata"},
  {"file": "run/checkpoint.go", "line": 35, "severity": "small", "confidence": "medium", "title": "Context value type safety issue"}
]
🎯 Correctness — Blocking issues found

Verdict

Blocking issues found

Findings

  • run/checkpoint.go:73: The logic in classifyCheckpointOutcome is incorrect. The function checks errors.Is(cause, ErrShutdown), but ErrShutdown is defined in runengine.go as a distinct error value (errors.New("mort: shutting down")). However, the PR description explicitly states that mort currently stamps its own runengine.ErrShutdown (a distinct value) on the base context, and the mort wiring PR will alias them so errors.Is matches. This means the current implementation will not correctly classify shutdowns until the mort wiring PR is merged, which could lead to incorrect checkpoint finalization (e.g., treating shutdowns as failures instead of leaving them for recovery). This is a blocking correctness issue.

  • run/executor.go:356-367: The checkpointing logic for single-loop runs incorrectly initializes the transcript accumulator (acc). It starts with multimodalUserMessage(input, inv.Images) and then overwrites it with resume.History if a resume state exists. This means the initial user message is lost during resume, which violates the semantic correctness of resuming a run. The resumed run should continue from the saved transcript, but the initial user message should still be part of the history. This is a blocking correctness issue.

gadfly-findings

🧹 Code cleanliness & maintainability — Minor issues

Based on the verification of the actual code, here is the corrected review:

Verdict

Minor issues

Findings

  • run/checkpoint.go:54-61: The checkpointOutcome type and its constants are not exported. This limits their usability outside the package and could be confusing for external consumers who might need to interact with these outcomes. Consider exporting them if they are part of the public API.
  • run/phases.go:67-74: The phaseDeps struct includes a checkpointer field and a resume field, but the comments and logic around these fields are not entirely clear. The comments could be improved to better explain the purpose and usage of these fields, especially in the context of durable recovery.
Performance — Minor issues

Corrected Review

Verdict

Minor issues

Findings

  • Potential Unbounded Growth in completed Slice in runPhases
    In run/phases.go:83, the completed slice is appended to in each phase iteration without any bounds checking. This could lead to unbounded growth in memory usage for long-running or frequently resumed multi-phase runs. Consider adding a limit or periodic cleanup mechanism to prevent excessive memory consumption.

  • Unverified: Potential Inefficiency in Slice Copying
    In run/phases.go:185, the completed slice is copied using append([]PhaseOutput(nil), completed...). While this ensures isolation, it could introduce unnecessary allocations if the slice is large. This is a minor concern but worth noting for optimization in high-throughput scenarios.

gadfly-findings

[
  {"file": "run/phases.go", "line": 83, "severity": "medium", "confidence": "high", "title": "Unbounded growth in `completed` slice could lead to excessive memory usage"},
  {"file": "run/phases.go", "line": 185, "severity": "small", "confidence": "medium", "title": "Potential inefficiency in slice copying during checkpointing"}
]
🧯 Error handling & edge cases — Minor issues

All three findings are confirmed in the actual code:

  1. run/checkpoint.go:90: The error returned by cp.Fail is ignored.
  2. run/executor.go:367: The error returned by ckpt.Save is ignored.
  3. run/phases.go:184: The error returned by deps.checkpointer.Save is ignored.

The draft review is accurate, and all findings are valid. The verdict remains "Minor issues."

glm-5.2:cloud (ollama-cloud) — Minor issues

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

I traced every tainted-data path this PR opens, through my assigned security lens:

  • Resume transcript → model history: executor.go:379 seeds resume.History via agent.WithHistory. This is the same trust boundary as the existing image-folding path (runAgent at executor.go:547 already builds WithHistory from input/images). Resume data originates from the host's own persisted checkpoint, not a new external input.
  • Recovered phase outputs → template expansion: phases.go:92-96 pre-populates outputs from resume.CompletedPhases, and expandPhaseTemplate (phases.go:343) feeds them into text/template. I verified the template is parsed from the static phase.SystemPrompt and the recovered outputs are substituted as data values, not parsed as template text — so there's no template-injection sink. This is the same flow as in-run phase outputs, which were already attacker-reachable via model output.
  • Authorization on resume: a recovered run re-enters the full Run entry point, so Budget.Check (executor.go:139-143), model resolution, and the toolbox build all re-run. ResumeState carries only transcript/phase data — no agent/tier/invocation override — so there's no authz bypass.
  • Context-carrier seams (WithResumeState/WithExistingCheckpointer): host-trusted recovery plumbing, not untrusted input surfaces.
  • factory.Begin prompt read (handle.go:110): a safe type-assertion from info.Inputs, already part of the trusted run input; not a new injection vector.
  • Shutdown classification (classifyCheckpointOutcome): keyed on errors.Is(cause, ErrShutdown) from context.Cause(runCtx) — only the host stamps the cause; an external caller can't forge it to force a leave-for-recovery outcome.

Verdict: No material issues found.

🎯 Correctness — Minor issues

I now have enough to verify both findings.

Finding 1 claims that when compaction fires, the majordomo agent's internal message history "is replaced by a summary" while acc keeps the raw pre-compaction messages, causing resume to replay the full transcript. But the compact/compactor.go documentation in this repo explicitly states (lines 36-38):

"The compactor is per-run STATEFUL: majordomo does not replace the loop's internal transcript with the compacted slice (the hook shapes only what is SENT)"

So the agent's internal transcript is not replaced by compaction — the compactor only shapes what's sent to the model per-call, and the loop keeps the raw messages. That means acc (the accumulated raw transcript) is the actual conversation state, and replaying it on resume is correct. Finding 1's central premise is contradicted by the in-repo compactor docs, so it must be dropped. The reviewer even flagged the agent internals as "unverified" — verification against the available compactor documentation overturns it.

Finding 2ResumeState (run/checkpoint.go:22-26) carries only History, CompletedPhases, ActivePhase; there is no iteration offset. On the single-loop resume path (run/executor.go:357), acc is seeded from resume.History, and each Save records Iteration: s.Index + 1 (run/executor.go:367) where s.Index is the new agent loop's step index starting at 0. So a run saved at iteration 5, resumed for one more step, checkpoints Iteration: 1. The prior iteration count is genuinely lost in the durable metadata. This is a real but metadata-only issue (it doesn't break resume continuation). Finding 2 is confirmed and kept.

VERDICT: Minor issues

  • run/executor.go:367 — On a single-loop resume, each checkpoint Save records Iteration: s.Index + 1 where s.Index is the new agent loop's step index (starting at 0), and ResumeState (run/checkpoint.go:22-26) carries no iteration offset. A run previously saved at iteration 5 that resumes for one more step therefore checkpoints Iteration: 1, losing the prior iteration count in the durable record. This corrupts the durable progress metadata that a host may use for display, throttling, or step-budget decisions on a subsequent recovery. Resume continuation itself is unaffected. Minor, metadata-only. Fix: carry the prior iteration count in ResumeState and add s.Index to it when saving.
🧹 Code cleanliness & maintainability — Minor issues

All four findings verified against the actual code:

  1. ActivePhaseResumeState.ActivePhase (run/checkpoint.go:25), RunCheckpointState.ActivePhase (run/ports.go:210), RunCheckpoint.ActivePhase (checkpoint/checkpoint.go:41) are all declared but never read; RunCheckpointState.ActivePhase is never set to anything but "" (run/phases.go:186). Confirmed dead.

  2. Checkpoint observer duplicate transcript assembly — confirmed at run/executor.go:359-367, and phases.go:236 wires plain deps.stepObserver with no checkpointing, so the "differs" rationale only applies to single-loop. Confirmed.

  3. Redundant lastOutput — pre-population loop (run/phases.go:95) sets lastOutput = pc.Output; skip branch (line 115) reassigns lastOutput = outputs[phase.Name] (same value). Confirmed no-op.

  4. NewFactory never initializes factory.now — confirmed at checkpoint/handle.go:103-104; f.now stays nil, passed to New which falls back to time.Now. Confirmed.

VERDICT: Minor issues

  • run/checkpoint.go:25ResumeState.ActivePhase is dead. The field is declared and documented ("the phase that was in flight"), but no code reads it: grep for .ActivePhase only returns write sites (checkpoint/handle.go:61 copies st.ActivePhase into the persisted struct; run/phases.go:186 always saves ActivePhase: ""), and runPhases only consumes ResumeState.CompletedPhases (skipping whole phases, not seeding an in-flight one). RunCheckpointState.ActivePhase / RunCheckpoint.ActivePhase are likewise never set to a non-empty value, and the persisted RunCheckpoint is never loaded back to populate ResumeState.ActivePhase (no caller reads RunCheckpoint.ActivePhase from a Load). The comment on RunCheckpointState.Messages ("OR the active phase's transcript", run/ports.go:202-203) is therefore also misleading — the multi-phase path never records an active-phase transcript. Either wire active-phase resume or drop the fields/comments to match what's shipped (boundary-granular only).

  • run/executor.go:354-369 — checkpoint observer duplicates transcript assembly that agent already owns. The wrapped obs manually reconstructs the conversation (s.Response.Message() + llm.ToolResultsMessage(s.Results...)) into acc, mirroring exactly what the agent loop builds internally. The comment at run/executor.go:307-309 cites that the observer "differs single-loop vs per-phase" as the reason it moved out of sharedOpts, but the per-phase path (run/phases.go:236) just wires the plain deps.stepObserver with no checkpointing — so the "differs" rationale only applies to single-loop. A transcript() accessor on agent.Step/agent.Result (or reusing runRes.Messages at finalize) would remove this hand-rolled zip, which is exactly the pairwise-call/result shape the existing stepObserver at run/executor.go:280-303 already exercises. Worth a follow-up to share the accumulation rather than duplicate it.

  • run/phases.go:114-117 — redundant lastOutput reassignment on the resume skip path. lastOutput was already set to pc.Output for every resumed phase during the pre-population loop at run/phases.go:91-97. Re-assigning it inside the skip branch (lastOutput = outputs[phase.Name]) is a no-op for resumed phases and dead-confusing next to the identical-looking line at 178. The skip branch can be a bare continue.

  • checkpoint/handle.go:103NewFactory silently drops now. factory.now is read by BeginNew (passed as f.now), but NewFactory never initializes it, so it's always nil and New falls back to time.Now (handle.go:36-38). That's functionally fine, but it leaves a struct field that looks configurable and isn't — either accept now in NewFactory (parity with New) or drop factory.now and pass nil explicitly so the field doesn't read as a forgotten injection point. The omission stands out against the existing New(store, meta, throttle, now) signature.

Performance — No material issues found

Verdict: No material issues found

Through the performance lens, the durable-checkpoint wiring is reasonably behaved:

  • Single-loop per-step Save is throttled. handle.Save (checkpoint/handle.go:42) takes the throttle window under a brief mutex, releases it before the store I/O, and skips the persist inside the window — so a hot agent loop doesn't block on store writes every step. Per-step work that always runs (the acc append, llm.ToolResultsMessage, the RunCheckpointState struct construction) is O(1) amortized / cheap, so even unthrottled the per-step overhead is minimal.
  • No N+1 / quadratic blowup introduced. The transcript acc grows O(steps) and each persisted save is O(transcript), but frequency is bounded by the throttle, so total cost is O(saves × transcript), not O(steps²). Phase-boundary saves copy the completed-phases list (append([]PhaseOutput(nil), completed...), run/phases.go:185) — O(phases), and phase counts are small. Both match mort's established pattern; not a regression.
  • No unbounded growth in the real wiring. Complete/Fail delete the record (checkpoint/handle.go:74-76); only shutdown-interrupted runs survive, which is the intended recovery set. The in-process Memory store's ListInterrupted copies the whole map (checkpoint/memory.go:47-54), but that store is test-only and explicitly non-durable per its own doc.

Minor (trivial) nits, not material:

  • run/executor.go:355-358: on a resume, multimodalUserMessage(input, inv.Images) is built and immediately discarded (overwritten by the resume-history copy). One wasted message construction per run — negligible.
  • run/executor.go:359-368: the wrapped observer constructs llm.ToolResultsMessage(s.Results...) and a RunCheckpointState every step even when handle.Save will be throttled away. Negligible allocation on the hot path; could short-circuit, but not worth it.
  • A host passing throttle <= 0 to NewFactory makes every step block synchronously on store.Save — but this is documented as host-controlled ("the host throttles") and mirrors mort, so it's a configuration footgun, not a defect.
🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

  • run/executor.go:207 (and the other pre-loop early returns at ~215, ~228) skip finalizeCheckpoint. Confirmed: finalizeCheckpoint has a single call site at executor.go:404, after the agent loop; the checkpointer is acquired earlier at executor.go:190-196before the toolbox/delegation/extra-tool builds. The build-error early returns (return res at lines 207, 215, 228, and the session-tool one at ~242) all run before finalizeCheckpoint, leaving ckpt unfinalized. For a fresh run this is harmless — the handle writes nothing to the store until the first Save, which only happens inside the loop, so ListInterrupted finds no record. But for a recovered run (WithExistingCheckpointer, executor.go:191), the durable record already exists from the prior interrupted run; a persistent pre-loop error (e.g., a misconfigured tool failing Registry.Build) takes the early return res without Complete/Fail, so the record survives → boot recovery re-dispatches the same run → same build error → indefinite retry loop. Suggested fix: route the build-error early returns through finalizeCheckpoint(ctx, ckpt, res.Err, context.Cause(runCtx)) so a terminal build error Fails an existing record instead of leaving it recoverable (or defer it guarded by whether the run actually started).

  • run/executor.go:192-196 silently swallows the CheckpointerFactory.Begin error. Confirmed against run/ports.go:181-182, whose doc comment on CheckpointerFactory explicitly states: "A storage error should be logged and degraded to (nil, nil)". The executor does if c, cerr := e.cfg.Ports.Checkpointer.Begin(ctx, info); cerr == nil { ckpt = c } and discards cerr with no log, so a failing/misconfigured checkpoint store makes every run silently non-durable with zero operator signal. Minor, but it contradicts the stated invariant. Suggested fix: emit an slog.Warn on the cerr != nil branch.

The classifyCheckpointOutcome decision matrix is sound: ErrShutdown/ErrCriticKill are defined at run/ports.go-adjacent code, the runErr == nil short-circuit correctly precedes the shutdown check so a clean success is always Complete even if shutdown landed late, and the test matrix in run/checkpoint_test.go covers the cases.

kimi-k2.6:cloud (ollama-cloud) — Blocking issues found

Verdict: Blocking issues found · ⚠️ 1/5 lens(es) errored — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

No material issues found

I scrutinized the durable-checkpoint wiring for authentication, injection, unsafe deserialization, secret leakage, input validation, and insecure defaults. The changes are clean for their intended threat model:

  • run/checkpoint.go: WithResumeState / WithExistingCheckpointer are standard context carriers; the unexported key types mean only package-level code can read them. classifyCheckpointOutcome correctly uses errors.Is(cause, ErrShutdown) to avoid exact-equality fragility. finalizeCheckpoint runs on a detached context so cancellation doesn't prevent terminal-state recording. No injection or deserialization surface here.

  • run/executor.go: The resume path seeds resume.History via agent.WithHistory and passes empty input. The transcript accumulator starts from the resume history (or the opening user message) and appends step responses + tool results. No command/template/SQL injection vectors introduced — the only template is expandPhaseTemplate, which uses the agent-defined SystemPrompt (trusted) with missingkey=zero against user data.

  • checkpoint/handle.go: NewFactory.Begin reads info.Inputs["prompt"] with a safe type assertion to string (defaults to empty). The factory mints a per-run handle keyed by RunID, so concurrent runs don't collide. Error from Begin is silently swallowed on the executor side, but the interface contract explicitly expects degradation to non-durable rather than failing the run.

  • run/phases.go: Phase-boundary checkpointing copies the completed slice before saving (defensive against async serialization). Resumed runs skip phases by key presence in outputs, which is populated from the trusted ResumeState.

  • run/ports.go: GuildID and ModelTier are added to RunInfo and checkpoint meta for resume re-dispatch attribution. This is needed metadata, not a leakage bug.

No authn/authz gaps, injection points, SSRF, unsafe deserialization, credential leakage, or missing input validation were introduced by this diff.

🎯 Correctness⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 10: context deadline exceeded

🧹 Code cleanliness & maintainability — No material issues found

No material issues found.

Through the cleanliness & maintainability lens, the PR is well-structured:

  • Context key types follow the existing repo pattern (progressSinkKey).
  • The step-observer wrapping for checkpointing in executor.go is localized and clearly commented.
  • The factory pattern matches mort's existing agentexec.CheckpointerFactory, keeping cross-repo consistency.
  • Tests reuse shared helpers (phaseProvider) rather than duplicating setup.
  • Error handling degrades gracefully (failed Begin → nil checkpointer, nil-safe throughout).

No dead code, confusing names, or leaky abstractions are introduced.

Performance — No material issues found

No material issues found

Through the performance lens, the checkpoint wiring is clean:

  • The single-loop path accumulates transcript messages in a slice and calls ckpt.Save each step, but the battery-level handle.Save throttles these calls by time window (skipping eagerly inside the throttle), and the host controls the throttle duration. The work inside the observer (appending to acc, calling Save) is O(1) when throttled and O(n) in total transcript length only when a save actually fires — which is unavoidable since the full transcript is required for mid-loop resume.
  • Phase-boundary checkpointing only fires once per completed phase (not per step), copies the completed-phases slice defensively, and stores only phase outputs — both bounded and minimal.
  • Resume seeding via WithHistory is a one-time cost at loop start.
  • No N+1 queries, unbounded growth, or quadratic behavior introduced.
🧯 Error handling & edge cases — Blocking issues found

Blocking issues found

  • run/executor.go:188CheckpointerFactory.Begin error is silently discarded. If the factory returns a storage error (e.g., DB unavailable), the executor proceeds as if the run is non-durable, with no log or metric. Operators have no way to detect a broken checkpoint store.

  • run/checkpoint.go:88-90finalizeCheckpoint swallows errors from cp.Complete and cp.Fail. If the store fails to delete a finished run's checkpoint (network blip, timeout), the record survives and the next boot will incorrectly attempt to resume a run that already terminated. This creates false recovery candidates.

qwen3.5:397b-cloud (ollama-cloud) — Blocking issues found

Verdict: Blocking issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues found

Security lens findings:

  • run/phases.go:177-188 + run/phases.go:343-359 — Template injection via persisted phase outputs (medium): Phase outputs (derived from LLM responses) are stored in checkpoints at line 182-187 (completed = append(completed, PhaseOutput{Name: phase.Name, Output: output})) and later used as template data in expandPhaseTemplate on resume. On resume, outputs map is populated from deps.resume.CompletedPhases (lines 91-97), then passed directly to expandPhaseTemplate at line 123. The expandPhaseTemplate function uses text/template with no sanitization of priorOutputs — it directly inserts phase outputs into the template data map (lines 349-352). An attacker controlling LLM output via prompt injection could inject Go template syntax ({{...}}) that gets evaluated when the checkpoint resumes. Verified: Read expandPhaseTemplate — uses template.New("phase")...Parse(tmpl) and t.Execute(&buf, data) where data includes unsanitized priorOutputs. Fix: Sanitize phase outputs before template insertion, or use a non-interpolating format for phase threading.

  • checkpoint/handle.go:110-122 — Prompt stored in checkpoint metadata without size validation (small): The user prompt is extracted from info.Inputs["prompt"] at line 110 via direct type assertion (prompt, _ := info.Inputs["prompt"].(string)) with no length limit or validation before being stored in RunCheckpointMeta.Prompt at line 118. A host using a bounded store (e.g., fixed-size job table) could face DoS via oversized prompts. Verified: Read Begin() — direct type assertion with no bounds check. Fix: Truncate or reject prompts exceeding a reasonable threshold before storing in meta.

  • run/checkpoint.go:86-93 — Checkpoint finalize swallows errors silently (small): finalizeCheckpoint discards all errors from Complete()/Fail() with _ = at lines 88 and 90. If the checkpoint store fails during terminal state recording, the run completes but the checkpoint may remain in an inconsistent state (e.g., marked as "interrupted" when it actually finished). Verified: Read finalizeCheckpoint — errors explicitly ignored. Fix: At minimum log the error; consider surfacing it to audit/recorder.

  • run/phases.go:114-116 — Resume phase-skipping trusts checkpoint integrity (medium): On resume, phases are skipped based solely on key presence in outputs map (line 114: if _, done := outputs[phase.Name]; done), populated from deps.resume.CompletedPhases (lines 91-97). If an attacker compromises the checkpoint store, they can inject arbitrary PhaseOutput entries to skip security-critical phases. Verified: Read runPhases resume logic — no integrity validation of deps.resume.CompletedPhases before using them to skip phases. Fix: Add checkpoint integrity verification (e.g., HMAC) if store is untrusted; document trust boundary.

🎯 Correctness — Minor issues

VERDICT: Minor issues found

Findings:

  1. run/phases.go:186ActivePhase field unused in resume logic (dead code)

    • The ResumeState.ActivePhase field is populated from checkpoints but never read during resume. The multi-phase resume logic (lines 91-97, 114-117) determines which phase to run next solely by checking CompletedPhases — it runs the first phase whose name is NOT in the completed set. The ActivePhase field has no effect on behavior.
    • Impact: Dead code that suggests functionality (mid-phase resumption) that doesn't exist. Confusing for future maintainers.
    • Fix: Either remove ActivePhase from ResumeState and RunCheckpointState, or document it as metadata-only (not used for resume decisions). Verified by reading run/phases.go:81-117 and run/checkpoint.go:22-26.
  2. run/ports.go:202-203 — Documentation claims unsupported functionality

    • The comment states Messages is "the running transcript (single-loop run) OR the active phase's transcript (multi-phase run)". However, the multi-phase checkpoint save at run/phases.go:184-187 never sets Messages — it only saves CompletedPhases and ActivePhase. This means mid-phase transcript recovery for multi-phase runs is NOT supported (only boundary-granular).
    • Impact: Documentation promises functionality that the code doesn't deliver. A reader expecting mid-phase transcript resume for multi-phase runs will be misled.
    • Fix: Update the comment to clarify that Messages is only populated for single-loop runs; multi-phase checkpoints do not preserve the active phase's transcript. Verified by reading run/phases.go:179-188 and run/ports.go:200-211.
  3. run/checkpoint.go:25ActivePhase field semantically inconsistent

    • ResumeState.ActivePhase is documented as "the phase that was in flight", but the checkpoint save at phase boundaries (after a phase completes) sets ActivePhase: "" (line 186). The field would only be meaningful if saved mid-phase, but no mid-phase save exists for multi-phase runs.
    • Impact: Field is always empty when read on resume, making it useless for its stated purpose.
    • Fix: Remove the field or repurpose it to track "next phase to run" if that becomes useful. Verified by reading run/phases.go:179-188.
🧹 Code cleanliness & maintainability — Minor issues

VERDICT: Minor issues found

Findings:

  • run/checkpoint.go:23 — The ResumeState.History field comment claims it holds "single-loop transcript OR active-phase transcript", but the multi-phase path in run/phases.go:88-97 never reads deps.resume.History — it only uses CompletedPhases to skip finished phases. The active phase re-runs from scratch with the original query. This is documented as intentional boundary-granular recovery in the code comments, but the struct comment is misleading. Fix: Update the comment to clarify History is only used for single-loop runs; multi-phase recovery does not resume mid-phase transcripts.

  • run/checkpoint.go:25 — The ResumeState.ActivePhase field is defined but never read by the executor. Grep confirms it is only written (in checkpoint/handle.go:61, run/phases.go:186) but never read. The multi-phase path determines which phase to run by checking key presence in outputs map (run/phases.go:114-117), not by consulting ActivePhase. This field adds cognitive load without functional purpose. Fix: Either remove the field or add a comment explaining it's metadata only (e.g., for host debugging/logging) and not used by the kernel.

  • checkpoint/handle.go:107-108 — The factory.Begin method extracts prompt from info.Inputs["prompt"] with a comment saying this is "so a recovered run can re-dispatch". However, the kernel's multi-phase recovery path (run/phases.go:81-191) does not use any recovered prompt — it skips completed phases and re-runs the active phase with the original query passed to runPhases. The RunInfo.Prompt field (populated from this extraction) is also never read anywhere in the executor or phases logic. The prompt extraction appears to be dead code for the multi-phase case. Fix: Clarify the comment to state this is only relevant for single-loop resume, or remove if unused.

Performance — No material issues found

No material issues found

After reviewing the checkpoint/resume wiring through the performance lens, the implementation handles the key efficiency concerns appropriately:

Verified findings:

  1. Throttling is correctly implemented (checkpoint/handle.go:45-48): The Save method checks the throttle window before persisting, preventing excessive I/O on every step. Confirmed by reading handle.go.

  2. No unnecessary slice copies in hot path (run/executor.go:359-367): The single-loop observer accumulates acc in-place and passes the slice header to Save. The slice grows naturally via append, which is amortized O(1). Confirmed by reading executor.go.

  3. Multi-phase checkpoint copy is intentional (run/phases.go:185): append([]PhaseOutput(nil), completed...) creates a snapshot copy at phase boundaries to prevent aliasing—this is correct since the checkpointer may hold the slice asynchronously. Phase boundaries are infrequent (not per-step), so this copy is negligible. Confirmed by reading phases.go.

  4. Resume path does one-time O(n) copy (run/executor.go:356-357): append([]llm.Message(nil), resume.History...) copies the saved transcript once at startup to prevent aliasing between the resumed state and new messages. This is a necessary correctness cost, not a hot-loop regression. Confirmed by reading executor.go.

  5. No N+1 or unbounded growth issues: Checkpoints are per-run (factory pattern), the Memory store holds one checkpoint per run ID (overwrites on save), and completed runs delete their checkpoints immediately via Complete. Confirmed by reading memory.go:23-28 and handle.go:74-76.

The throttle prevents hot-path I/O, slice handling avoids unnecessary copies while preserving correctness, and there are no quadratic behaviors or blocking calls in performance-critical sections.

🧯 Error handling & edge cases — Blocking issues found

Blocking issues found

  • run/phases.go:184 — Phase-boundary checkpoint Save error silently swallowed: The phase-boundary save discards the error with _ =. If the checkpoint store fails mid-run (disk full, DB locked, network blip), the run continues but checkpoints aren't persisted. On shutdown/recovery, the host will resume from a stale checkpoint (or none), potentially re-executing work already done or losing progress. Verified by reading run/phases.go:183-188. Suggested fix: Log the error (at minimum) and consider surfacing it via the RunRecorder audit trail so the host knows durability failed.

  • run/executor.go:367 — Per-step checkpoint Save error silently swallowed: The per-step save in the single-loop path discards the error with _ =. Same impact as above — checkpoint persistence failures go unnoticed, risking stale recovery state. Verified by reading run/executor.go:359-368. Suggested fix: Log the error; at minimum ensure the failure is recorded in the audit trail.

  • run/checkpoint.go:90Fail error silently swallowed in finalizeCheckpoint: When a run terminates with a non-shutdown error, cp.Fail(detach(ctx), runErr) is called but its error is discarded. If Fail itself fails (e.g., store unreachable during cleanup), the checkpoint record may remain in an ambiguous "running" state instead of being marked failed. On boot recovery, the host might incorrectly attempt to resume a run that should be terminal. Verified by reading run/checkpoint.go:80-94. Suggested fix: Log the error; at minimum ensure the failure is recorded in the audit trail.

Automated adversarial review by Gadfly — consensus across the model swarm. Advisory only — does not block merge.

<!-- gadfly-consensus --> ## 🪰 Gadfly review — consensus across 7 models **Verdict: Blocking issues found** · 23 findings (9 with multi-model agreement) | | Finding | Where | Models | Lens | |--|--|--|--|--| | 🔴 | Per-step checkpoint Save error silently swallowed | `run/executor.go:367` | 6/7 | correctness, error-handling, performance, security | | 🔴 | Resume logic incorrectly overwrites initial user message | `run/executor.go:354` | 5/7 | correctness, maintainability, performance | | 🔴 | Phase-boundary checkpoint Save error silently swallowed | `run/phases.go:184` | 5/7 | correctness, error-handling, maintainability, performance | | 🟠 | ActivePhase field plumbed through 3 structs but never read/set; doc comments claim active-phase seeding that isn't implemented | `run/checkpoint.go:23` | 5/7 | correctness, maintainability | | 🔴 | finalizeCheckpoint silently ignores Complete/Fail errors leaving stale recovery records | `run/checkpoint.go:86` | 4/7 | error-handling, security | | 🟠 | CheckpointerFactory.Begin error silently swallowed with no log | `run/executor.go:193` | 4/7 | correctness, error-handling | | 🟠 | Resume phase-skipping trusts checkpoint integrity without validation | `run/phases.go:114` | 3/7 | error-handling, maintainability, security | | 🔴 | ErrShutdown sentinel mismatch: shutdowns classified as Fail until mort wiring PR lands | `run/checkpoint.go:73` | 2/7 | correctness, error-handling | | 🟡 | Potential information leakage in checkpoint metadata | `checkpoint/handle.go:107` | 2/7 | maintainability, security | <details><summary>14 single-model findings (lower confidence)</summary> | | Finding | Where | Model | Lens | |--|--|--|--|--| | 🔴 | CheckpointerFactory.Begin error silently swallowed with no logging | `run/executor.go:188` | kimi-k2.6:cloud | error-handling | | 🟠 | Pre-loop early-return error paths skip finalizeCheckpoint; a recovered run's existing durable record is left unfinalized, causing indefinite boot-recovery retry on persistent build errors | `run/executor.go:207` | glm-5.2:cloud | error-handling | | 🟠 | finalizeCheckpoint not deferred: a panic (or early setup-error return) bypasses it, leaving an orphaned recoverable checkpoint that should be Fail'd → boot recovery loop | `run/executor.go:404` | claude-code/opus | error-handling | | 🟠 | Template injection via persisted phase outputs in expandPhaseTemplate | `run/phases.go:177` | qwen3.5:397b-cloud | security | | 🟡 | NewFactory never initializes factory.now, leaving a configurable-looking but always-nil field | `checkpoint/handle.go:103` | glm-5.2:cloud | maintainability | | 🟡 | Context value type safety | `run/checkpoint.go:35` | devstral-2:123b-cloud | security | | 🟡 | checkpointOutcome type and constants are not exported, limiting usability outside the package | `run/checkpoint.go:54` | devstral-2:123b-cloud | maintainability | | 🟡 | Stale phaseDeps doc still lists 'step observer' in sharedOpts after it was moved out per-path | `run/phases.go:56` | claude-code/opus | correctness, maintainability | | 🟡 | Comments and logic around phaseDeps fields could be improved for clarity | `run/phases.go:67` | devstral-2:123b-cloud | maintainability | | 🟡 | Potential Unbounded Growth in 'completed' Slice in 'runPhases' | `run/phases.go:83` | devstral-2:123b-cloud | performance | | 🟡 | Documentation claims multi-phase transcript resume not implemented | `run/ports.go:202` | qwen3.5:397b-cloud | correctness | | 🟡 | RunCheckpointState.ActivePhase is always empty string — field exists but is never populated with meaningful data | `run/ports.go:210` | deepseek-v4-pro:cloud | maintainability | | ⚪ | factory.now field is never set by NewFactory (always nil); dead/untestable field | `checkpoint/handle.go:90` | claude-code/opus:max | maintainability | | ⚪ | factory.now field is never assigned (NewFactory omits it) — dead/always-nil clock field | `checkpoint/handle.go:95` | claude-code/opus | maintainability | </details> <details><summary>Per-model detail</summary> <details><summary><b>claude-code/opus</b> (claude-code) — Minor issues</summary> **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> This PR is internal kernel plumbing for durable checkpoint/resume. I've traced the tainted-input and trust-boundary paths relevant to the security lens. ## VERDICT: No material issues found I reviewed this through the security lens (authz, injection, untrusted-data handling, secret leakage) and verified the key paths against the checked-out code: - **Resume data trust boundary** (`run/executor.go:185-196`, `run/checkpoint.go:18-52`): `ResumeState` and the existing `Checkpointer` arrive via in-process `context` values set by the host's boot-recovery path, and the saved transcript originates from the host-controlled `CheckpointStore` (durable infra). This is trusted host infrastructure, not an external/untrusted ingress point, so seeding `agent.WithHistory(resume.History)` does not cross a new trust boundary. - **No template injection on phase resume**: I checked `run/phases.go` — completed-phase outputs are threaded into the next phase's prompt via `{{.<PhaseName>}}` substitution. Go's `text/template` substitutes a `.Field` value as *data*; it does not re-parse interpolated values, so a phase output containing `{{...}}` cannot inject template directives. No injection vector here. - **Authz/budget not bypassed on recovery**: The pre-run budget gate (`e.cfg.Ports.Budget.Check(ctx, inv.CallerID)`, `run/executor.go:139-143`) runs before any resume seeding, so a recovered run is still gated. Authorization at re-dispatch is the host's responsibility, unchanged by this PR. - **No secret leakage introduced**: `factory.Begin` (`checkpoint/handle.go:107-123`) copies `Prompt`/`CallerID`/`GuildID` into `RunCheckpointMeta` for persistence — that is inherent to checkpointing and goes only to the host-supplied store, not to any external sink. The `info.Inputs["prompt"].(string)` assertion is a safe comma-ok and is not a security concern. No authn/authz gap, injection, SSRF, unsafe deserialization, or insecure default is introduced by this change. One adjacent observation, explicitly outside my lens (not a security finding): the documented `run.ErrShutdown` vs mort's `runengine.ErrShutdown` mismatch means a shutdown would currently be classified as `checkpointFail` and *delete* the recovery record — that's an availability/correctness concern for the correctness lens, not security. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> Both findings are confirmed against the actual code: **Finding 1** — Verified. `resume.History` is read only in the single-loop path (`executor.go:356,376`); the multi-phase path reads only `CompletedPhases` (`phases.go:91-97`). `ActivePhase` has no reader anywhere — it is written as the literal `""` at `phases.go:186` and merely copied through `handle.go:61`. Meanwhile `ResumeState` (`checkpoint.go:21,25`) and `RunCheckpointState` (`ports.go:209-210`) document active-phase seeding that never occurs. Confirmed dead field + over-promising docs. **Finding 2** — Verified. `phaseDeps` comment at `phases.go:56-57` still says the shared options are "(tool-error limits + step observer + compactor)", but `executor.go:310-312` now builds `sharedOpts` with only tool-error limits + compactor; the step observer is wired per-path (`executor.go:373`, `phases.go:236`). Observer is added exactly once per path — no double-count. Pure doc drift. Both hold. Verdict unchanged. --- I've traced the durable-recovery logic against the actual checked-out code. The core classification, cause-propagation, resume, and phase-skip logic all hold up. Findings below. ## VERDICT: Minor issues ### Findings - **Multi-phase active-phase resume is documented but unimplemented (write-only/dead fields).** `run/checkpoint.go:21,25` and `run/ports.go:209-210` advertise that a resumed run "seeds the active phase" via `ResumeState.ActivePhase` / `ResumeState.History`, and `RunCheckpointState.ActivePhase` carries "the name of the in-progress phase (multi-phase only)." Verified by grep + reading `runPhases`: the phase path **only ever reads `deps.resume.CompletedPhases`** (`run/phases.go:91-97`); it never reads `resume.ActivePhase` or `resume.History` (those are read only by the single-loop path at `executor.go:356,376`), and the phase-boundary `Save` **hardcodes `ActivePhase: ""`** (`run/phases.go:186`). So for any multi-phase run the persisted `ActivePhase` is always empty, and a host that populates `ResumeState.ActivePhase`+`History` (per the doc) gets a *silent full re-run of the interrupted phase* instead of the mid-phase resume the field names imply. The boundary-granular behavior is correct and intentional (it's documented in the `phaseDeps` block at `phases.go:67-71`), but the field-level docs over-promise and the `ActivePhase` field is effectively dead. Impact: misleading persisted state + a latent host-integration footgun, not a crash. Fix: either drop `ActivePhase` from `RunCheckpointState`/`ResumeState` (and the "seeds the active phase" wording), or actually record/consume it. *(low severity.)* - **Stale comment: `phaseDeps` still lists the step observer as part of `sharedOpts`.** `run/phases.go:56-57` describes "the shared agent options (tool-error limits + step observer + compactor)". This PR moved `WithStepObserver` out of `sharedOpts` (`run/executor.go:306-312`) and now wires it per-path (`executor.go:373`, `phases.go:236`). The comment now contradicts the code. No behavioral effect — both paths add the observer exactly once, so there's no double-count or audit regression — purely doc drift. *(trivial.)* ### Verified clean (within the correctness lens) - `classifyCheckpointOutcome` keying on `errors.Is(cause, ErrShutdown)` is sound: `finalizeCheckpoint` is called at `executor.go:404` with `context.Cause(runCtx)`, *before* the deferred cancellations run, so the cause reflects the true outcome. Critic-kill/deadline/caller-cancel → `Fail`, shutdown → leave-running, success → `Complete` all match the test matrix. - Single-loop resume (`executor.go:376-382`) seeds `WithHistory` + empty input so completed tool calls aren't re-run; the `acc` reconstruction appends consistently at step granularity. - Factory meta mapping (`checkpoint/handle.go`) is correct; nil `now` handled by `New`; `RunCheckpointMeta` has the `GuildID`/`ModelTier` fields, so it compiles. - Phase skip uses key-presence not output-emptiness (`phases.go:114`), `isLast` indexing stays correct across skipped phases, and a fully-resumed run returns the saved final output without a model call. </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> All four findings are confirmed against the code: 1. **phases.go sharedOpts doc** — `sharedOpts` (executor.go:310-312) now holds only tool-error limits (+ optional compactor); the `phaseDeps` parenthetical still lists "step observer". The actual stale text is on line 56 (line 57 separately/correctly documents the distinct `stepObserver` field). Confirmed. 2. **ResumeState.ActivePhase** — grep shows zero reads anywhere; `runPhases` consumes only `CompletedPhases`. The "seeds the active phase" doc is unimplemented. Confirmed. 3. **RunCheckpointState.ActivePhase** — only ever written `""` (phases.go:186; single-loop Save omits it); never read. Confirmed dead plumbing. 4. **factory.now** — `NewFactory` (handle.go:103) never sets it; `f.now` is always nil. Confirmed dead field. All survive. Corrected review below (finding 1's line reference adjusted to where the stale text actually sits). ## Review **VERDICT: Minor issues** Through the code cleanliness & maintainability lens, the change is generally well-structured (good use of the shared `multimodalUserMessage` helper, the per-path observer wrap reads cleanly, and the `classifyCheckpointOutcome`/`finalizeCheckpoint` split is tidy). A few stale-doc and speculative-field smells: - **`run/phases.go:56` — stale struct doc.** The `phaseDeps` comment still describes the shared agent options as *"(tool-error limits + step observer + compactor)"*. This PR deliberately moved the step observer **out** of `sharedOpts` (executor.go:310-312 now builds it with only tool-error limits + optional compactor; `WithStepObserver` only appears at the per-path sites, executor.go:373 and phases.go:236). The comment now describes the pre-PR layout and contradicts the `runOnePhase` comment you *did* update (phases.go:229-232). Fix: drop "step observer" from the `sharedOpts` parenthetical here. - **`run/checkpoint.go:25` (+ doc lines 20-21) — `ResumeState.ActivePhase` is unused; docs describe unimplemented behavior.** Grep confirms `ResumeState.ActivePhase` is never read anywhere, and in the multi-phase path (`runPhases`) only `CompletedPhases` is consumed (phases.go:91-97). Yet the struct doc claims the executor "seeds the active phase". The PR is explicitly boundary-granular (the active phase re-runs from scratch — see phaseDeps doc, phases.go:67-71), so this is a speculative exported field whose documentation promises behavior that doesn't exist. Either drop `ActivePhase` or clearly mark it "reserved; not yet consumed" so a future reader doesn't assume mid-phase resume works. - **`run/ports.go:210` + `run/phases.go:186` — `RunCheckpointState.ActivePhase` is write-only/always empty.** The only writer (phases.go:186) sets `ActivePhase: ""` explicitly, and the single-loop path (executor.go:367) never sets it. So the persisted field is always empty in practice — dead plumbing paired with the unused `ResumeState.ActivePhase` above. The explicit `ActivePhase: ""` (zero value) is also redundant. - **`checkpoint/handle.go:95` — `factory.now` is dead.** The `factory` struct carries a `now func() time.Time` field, but `NewFactory` never sets it (handle.go:103-105), so `f.now` is always nil and `Begin` always passes nil to `New` (handle.go:122), which then defaults to `time.Now`. Unlike `handle`, there's no constructor/option that injects a clock, so the field can never be anything but nil. Either drop it, or add the injection path it implies (mirroring `handle`'s testable clock). Nothing here is blocking; these are low-churn doc/dead-field cleanups. </details> <details><summary><b>⚡ Performance</b> — Minor issues</summary> I've independently verified both findings against the actual code. **Finding 1 (executor.go:359 — accumulator bypasses compaction):** Confirmed. At `executor.go:354-368` the checkpoint observer builds `acc` as a private append-only slice (`acc = append(acc, s.Response.Message())` + `llm.ToolResultsMessage(s.Results...)` every step), entirely independent of the agent's internal history. The compactor is attached only via `sharedOpts` (`executor.go:328`), which folds into the agent's options at `executor.go:374` — so compaction shrinks the agent's live working set but never touches `acc`. The growth/serialization-cost framing holds. **Finding 2 (executor.go:367 + handle.go — full-transcript Save every step, no throttle floor):** Confirmed. `ckpt.Save(runCtx, RunCheckpointState{Messages: acc, ...})` fires synchronously in the step loop; `handle.Save` (handle.go:42-72) passes the whole `st.Messages` to `store.Save` with no delta. The throttle at handle.go:45 is the only guard, and `New`'s contract (handle.go:30-31) plus `NewFactory` (handle.go:103-105) confirm `throttle <= 0` = save every call with no enforced floor. Config-dependent O(n²) write volume for durable backings is real. Both findings accurately describe the code and stay within the Performance lens. Verdict stands. ## VERDICT: Minor issues The wiring is reasonable and the throttle bounds persistence frequency. Two related performance concerns, both within my lane: - **`run/executor.go:355-368` — the checkpoint accumulator bypasses compaction, so it grows unbounded on exactly the long runs durable recovery targets.** The single-loop observer builds `acc` by appending `s.Response.Message()` + `llm.ToolResultsMessage(s.Results...)` on *every* step and never drops anything. But the agent itself runs with a compactor (`sharedOpts` includes `agent.WithCompactor(...)` at `executor.go:328`, and single-loop `opts` folds `sharedOpts` in at `executor.go:374`). When compaction fires, the agent's live working set shrinks; `acc` does not. Result: on a long run the checkpoint transcript is strictly larger than the live conversation and grows without the bound compaction is there to provide — both as retained memory (a second, ever-growing copy of every message) and, more importantly, as the payload serialized on each persisted `Save`. Suggested fix: feed the checkpoint from the agent's current (post-compaction) message history rather than a private append-only accumulator, or cap/compact `acc` in step with the agent. - **`run/executor.go:367` + `checkpoint/handle.go:42-72` — each persisted `Save` serializes the *entire* growing transcript, and the kernel relies entirely on the host throttle with no floor.** `ckpt.Save(runCtx, RunCheckpointState{Messages: acc, ...})` fires synchronously in the agent step loop on every step. When the throttle window has passed, `handle.Save` calls `store.Save` with the full `acc` — a blocking store write on the hot path that re-serializes the whole history every time (no delta/incremental write). The throttle (`handle.go:45`) is the only mitigation, and `New`/`NewFactory` explicitly support `throttle <= 0` = "save every call" (`handle.go:30-31`). With that documented config, a single-loop run does a full-transcript store write on *every* step → O(n²) cumulative write volume plus blocking I/O between every agent step. This is config-dependent (the in-memory default makes each write cheap), so it's a real edge for the durable (mort durable-job table) backing, not a default-path regression. Suggested fix: enforce a minimum throttle in `NewFactory`, or document that durable stores must pass a non-zero throttle; ideally combine with the fix above so the serialized payload is bounded. Everything else I checked is fine: the throttled in-window path is cheap (lock + `time.Now` + compare); the phase-boundary `Save` copies only the small `[]PhaseOutput` slice (`phases.go:185-189`); `acc` append is amortized O(1) and passing it by reference to `Save` is not a per-step copy; the resume seed (`executor.go:357`, `WithHistory`) is a one-time copy. No N+1, no quadratic per-step copying. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> Both findings are confirmed against the actual source: - **Finding 1** (`executor.go:404`): Verified — `finalizeCheckpoint(...)` is a plain statement at line 404, executed only after the dispatch returns. The top-level `recover()` defer (lines 119–123) sets `res.Err` and returns, bypassing line 404 entirely on a panic. The wrapped step observer (lines 359–368) `Save`s each step via the live `ckpt`, so a panic after ≥1 Save leaves a persisted record that is neither `Complete`d nor `Fail`ed — which `classifyCheckpointOutcome` reserves for `ErrShutdown` (leave-for-recovery), not a terminal panic. The setup-error early returns after `Begin` (lines 209/217/228/241) likewise skip finalize. Confirmed. - **Finding 2** (`executor.go:193`): Verified — `cerr` is dropped with no log. The `CheckpointerFactory` doc (ports.go:178–183) does place the "log + degrade to (nil,nil)" duty on the factory, while `phaseModel` (phases.go:270) emits `slog.Warn` on the analogous resolve failure. The inconsistency the finding draws is accurate. Both survive. The verdict is unchanged. --- ## VERDICT: Minor issues My lens is error handling & edge cases. The happy paths and the documented degraded paths (optional/budget-exhausted phases, nil-safe ports, throttle-skip) are handled cleanly. Two unhappy-path gaps the diff introduces: - **`run/executor.go:404` — `finalizeCheckpoint` is a plain statement, not deferred, so panics and early-error returns bypass it; a panicked run leaves an orphaned recoverable checkpoint.** The checkpointer is begun at `:193`, the wrapped step observer `Save`s a snapshot each completed step (`:367`), but `finalizeCheckpoint(...)` only runs at `:404` after the dispatch. The top-level `recover()` defer (`:119-123`) converts a panic anywhere in the loop into `res.Err` and returns **without** finalizing. So if a panic occurs after one or more `Save`s, the persisted checkpoint is neither `Complete`d nor `Fail`ed — by omission it gets the "leave for boot recovery" treatment that `classifyCheckpointOutcome` reserves for `ErrShutdown`. A panicking run is terminal and should `Fail` (delete) the record; instead it becomes a recovery candidate that will be re-dispatched on the next boot and can re-panic (a recovery loop). This is reachable with the **shipped battery** (whose `Save` actually persists), so it's store-independent. Suggested fix: make finalize a `defer` that captures `runErr`/`context.Cause(runCtx)` (or have the `recover` defer `Fail` the checkpointer). Same root cause, lower-impact corollary: the setup-error early returns after `Begin` — `build toolbox` (`:209`), delegation tools (`:217`), extra tools (`:228`), session tools (`:241`) — also skip finalize. Harmless for the shipped battery (no `Save` has happened yet), but a factory whose `Begin` marks the run durable/recoverable (the PR explicitly models this on mort's `agentexec`) would leak an un-`Fail`ed record on any of these. - **`run/executor.go:193` — `Begin` error is silently swallowed.** `if c, cerr := ...Begin(ctx, info); cerr == nil { ckpt = c }` drops a non-nil `cerr` with no log; the run silently continues non-durable. The `CheckpointerFactory` doc (`ports.go:178-183`) does place the "log + degrade to (nil,nil)" duty on the factory, so this is defensible — but it diverges from the codebase's own convention for analogous resolution failures (`phaseModel` at `phases.go:270` emits `slog.Warn` on a model-resolve failure). A misbehaving factory that returns an error here loses durability with zero signal. Suggested fix: `slog.Warn` the dropped `cerr`. Things I checked and cleared (not findings): the shutdown-cause classification reaching `context.Cause(runCtx)` is sound — `MergeCancellation` propagates the secondary ctx's cause, so a host-set `ErrShutdown` flows through. `detach(ctx)` correctly bases cleanup writes off the parent (non-cancelled) ctx. The phase-skip uses key-presence not output-emptiness (`phases.go:114`), correctly handling a legitimately-empty phase output. The live `acc` slice handed to `Save` at `:367` is not an aliasing/mutation bug — `append` only writes at/after the handed-off slice's `len`, so the store's prefix view is never mutated. </details> </details> <details><summary><b>claude-code/opus:max</b> (claude-code) — Minor issues</summary> **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> I've verified the finding against the actual source. The factual claims hold: - `checkpoint/handle.go:76` — `Fail` does discard the passed error (`_ error`) and only deletes, so `runErr` is never persisted. Confirmed. - `run/executor.go:359-368` — the wrapped step observer accumulates `s.Response.Message()` **and** `llm.ToolResultsMessage(s.Results...)` into `acc`, then calls `ckpt.Save` each step. Confirmed. - The checkpoint executor wiring was a pending P2 follow-up (per CLAUDE.md: "critic/checkpoint executor wiring … is a P2 follow-up"), so this PR is indeed the first to drive `Save` and persist the transcript. Confirmed. - No redaction hook exists on the `Save`/`Checkpointer` path. Confirmed. The finding is accurate and security-relevant; it survives. ## VERDICT: Minor issues This change is internal plumbing — the data crossing the new boundary (transcripts, phase outputs, run meta) originates from the agent's own run, and the only trust boundary is the checkpoint store, which is host-controlled infrastructure. I found no injection, SSRF, deserialization, or authz defect introduced in this diff. One security-relevant property change is worth flagging: - **`run/executor.go:359-368` / `checkpoint/handle.go:56` — newly-activated unredacted persistence of full transcripts at rest.** The checkpointer port was never consumed before this PR (the executor wiring was a pending P2 follow-up), and the new step-observer accumulates `s.Response.Message()` **and** `llm.ToolResultsMessage(s.Results...)` into `acc`, then calls `ckpt.Save` every step. Tool results routinely contain secrets/credentials/PII the agent fetched (config reads, API responses with tokens, etc.). Previously these lived only in process memory for the run's lifetime; this PR is the first to write the complete `[]llm.Message` transcript to durable storage (mort's job table / any host store) with **no redaction hook**. Impact: it widens data-at-rest exposure and makes the store's encryption/access-control load-bearing for secret confidentiality. This is largely inherent to durable resume (you can't continue without the transcript), so it's a design tradeoff rather than a code defect — but it deserves an explicit note, and ideally a redaction/opt-out seam, since the in-tree `Memory()` default is in-process only and hides the at-rest cost from anyone reading the core. Severity: small. Two things I checked and found **not** to be problems (so I'm not reporting them): - No secret leakage via error paths: `handle.Fail` discards the passed `error` and only `Delete`s, so `runErr` is never persisted. - Authz is preserved on a fresh run: the `Budget.Check(inv.CallerID)` gate and the rebuilt toolbox/permissions still run; `CallerID`/`GuildID` are persisted into meta so recovery can restore attribution. The actual boot re-dispatch + its authz live host-side (mort), explicitly out of this diff, so there's no in-diff authz gap to report. Outside my lens: `classifyCheckpointOutcome` keys shutdown on `run.ErrShutdown` while mort stamps a distinct `runengine.ErrShutdown` (per the PR note, to be aliased later) — until that alias lands, a shutdown-interrupted run would be classified `checkpointFail` and its recovery record **deleted**, defeating durable recovery. That's a correctness/availability issue for the other lens, not security. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> I've verified the change thoroughly against the checked-out code. Build/vet require approval in this sandbox, but I confirmed every cross-file symbol the diff depends on (`MergeCancellation` cause-propagation, `ErrShutdown`/`ErrCriticKill`, `PhaseOutput`, `phaseProvider`, `multimodalUserMessage`, `handle.New` nil-`now` default). ## VERDICT: No material issues found I checked the correctness-sensitive seams and they hold up: - **`classifyCheckpointOutcome` decision matrix** (`run/checkpoint.go:67`) — verified the cause actually reaches it correctly. `finalizeCheckpoint` reads `context.Cause(runCtx)`, and `MergeCancellation` (`run/runengine.go:62`) calls `cancel(context.Cause(secondary))`, so a host-stamped `ErrShutdown` on the base ctx propagates to `runCtx` as its cause → `checkpointLeaveRunning`. A critic kill stamps `ErrCriticKill` on the inner `cancelCause` layer, which `context.Cause` surfaces through the merged child → `checkpointFail`. The `runErr == nil` check precedes the cause check, so a run that completes despite a concurrent shutdown still `Complete`s (clears). All consistent with the test matrix. - **Single-loop transcript accumulator** (`run/executor.go:355-368`) — the reconstructed `acc` is always balanced: each step appends the assistant message and (when present) the paired tool-results message before `Save`, so a saved transcript never ends on a dangling tool call. On resume it's correctly re-seeded from `resume.History` (the `multimodalUserMessage` seed is overwritten), and `ag.Run(runCtx, "", WithHistory(...))` continues without re-running completed tool calls — matching the PR claim. System prompt is re-applied via `agent.New`, correctly not stored in history. Iteration `s.Index+1` is the right 1-based completed-iteration count. - **Phase-boundary resume** (`run/phases.go:91-117,177-188`) — skip uses key-presence (`_, done := outputs[...]`), so a legitimately-empty phase output still counts as done; the `completed` slice is copied per `Save` and ordering is preserved across resumed + fresh phases. The interrupted phase correctly re-runs from scratch (boundary-granular, as documented), and the all-phases-completed edge returns the last saved output without a model call. - **Battery factory** (`checkpoint/handle.go:109-122`) — nil `now` is defaulted inside `New`, so the unset `factory.now` is harmless; `RunCheckpointMeta` carries the `GuildID`/`ModelTier` fields `Begin` populates. No import cycle (battery→core only). One non-blocking observation (not a correctness defect, so excluded from the findings block): `RunCheckpointState.ActivePhase` is plumbed through `handle.Save` but **never written with a non-empty value** — the single-loop path leaves it zero and the phase-boundary `Save` explicitly sets `ActivePhase: ""` (`run/phases.go:186`). Resume doesn't consume it (the active phase re-runs from `CompletedPhases`), so behavior is correct today; the field is just latent/dead. Worth a comment so a future host doesn't rely on it to identify the in-flight phase. </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> All four findings are confirmed against the actual code: 1. **`ActivePhase`** — grep shows every occurrence is either a field declaration, the literal `ActivePhase: ""` (`run/phases.go:186`), or the pass-through copy (`checkpoint/handle.go:61`). No read site exists; the single-loop `Save` (`run/executor.go:367`) never sets it, so `st.ActivePhase` is always `""`. The doc comments at `run/checkpoint.go:21,25` do claim "seeds the active phase" / "the phase that was in flight." Confirmed. 2. **Redundant `ActivePhase: ""`** at `run/phases.go:186` — confirmed zero-value with no reader. 3. **Duplicated resume guard** — `resume != nil && len(resume.History) > 0` at `run/executor.go:356` and `:376`. Confirmed. 4. **`factory.now` never set** — `NewFactory` returns `&factory{store, throttle}` (handle.go:104), leaving `now` nil; only safe via `New`'s nil→`time.Now` default. No assignment anywhere in the package. Confirmed. All findings survive. ## VERDICT: Minor issues - **`ActivePhase` is dead weight, and its docs overpromise — `run/checkpoint.go:25`, `run/ports.go:210`, `checkpoint/checkpoint.go:41`.** The field is plumbed through three structs (`ResumeState`, `RunCheckpointState`, `RunCheckpoint`) but the kernel never reads it and never writes a non-empty value to it. Every write site is either the literal `ActivePhase: ""` (`run/phases.go:186`) or a pass-through copy (`checkpoint/handle.go:61`), and there is no read site anywhere — the single-loop `Save` (`run/executor.go:367`) doesn't set it, and the phase resume loop (`run/phases.go:91-97`) only reads `CompletedPhases`. The design is boundary-granular — the interrupted phase re-runs from its start — so `ActivePhase` carries no information the resume path uses. Meanwhile the doc comments claim otherwise: `run/checkpoint.go:21` says multi-phase resume "skips completed phases **and seeds the active phase**", and the field comment (`:25`) calls it "the phase that was in flight." No active-phase seeding exists. This will mislead a future maintainer into thinking mid-phase resume is implemented. Fix: either drop the `ActivePhase` field from all three structs (and the misleading clauses in the comments), or, if it's a deliberate forward hook for the Phases follow-up, mark it clearly as reserved/unused and correct the comments so they don't describe behavior that isn't there. - **Redundant explicit zero-value with no payoff — `run/phases.go:186`.** The phase-boundary `Save` sets `ActivePhase: ""` explicitly. Since it's always the zero value and nothing reads it, the line is noise that hints at intent the code doesn't deliver. Drop it (folds into the finding above). - **Duplicated resume guard — `run/executor.go:356` and `:376`.** `resume != nil && len(resume.History) > 0` is computed twice within the single-loop branch (once to seed `acc`, once to choose the `ag.Run`-with-history vs `runAgent` path). Low-churn fix: hoist to a local `resuming := resume != nil && len(resume.History) > 0` and reuse it, so the two sites can't drift. - **(trivial) Unused `now` field on `factory` — `checkpoint/handle.go` (factory struct + `NewFactory`).** `NewFactory` never sets `now`, so `factory.now` is always nil; it's only safe because `New` defaults nil→`time.Now` (`handle.go:36-38`). It's a dead field (no same-package assignment, no test sets it). Either drop it or give `NewFactory` a way to inject it, to match the testability `New` already offers. No structural problems with the executor refactor itself: moving `WithStepObserver` out of `sharedOpts` and wiring it per-path (single-loop wraps for checkpointing; phases wire it in `runOnePhase`) is a clean, well-commented split, and the updated comments at `run/executor.go:306-309` and `run/phases.go:229-232` accurately describe the new arrangement. </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> I've verified the mechanism against the actual code. My findings: - `executor.go:367` does call `ckpt.Save(runCtx, ...)` synchronously inside the wrapped step observer `obs`, wired via `agent.WithStepObserver(obs)` — confirmed. - `handle.Save` (handle.go:42–72) is synchronous and throttle-gated; the throttle short-circuits before the store write — confirmed. - `throttle <= 0` "saves every call" is documented (handle.go:30–31) and the factory makes every run durable by default (handle.go:88–91) — confirmed. However, the **performance framing does not hold up**. The step observer fires once per agent *step* — i.e., once per model round-trip (see `run/steps.go` and the `agent.Step`-keyed observer). It is not a tight inner loop. A synchronous local checkpoint write at LLM-round-trip cadence (seconds) is negligible against the model latency that dominates each step, even at `throttle=0`. The "quadratic-ish" transcript re-serialization is true in big-O but trivial in absolute terms because the number of writes is bounded by the number of model calls. The author themselves rate this severity `small` / confidence `low` and concede it "doesn't bite in the intended (throttled) configuration." As a performance finding there is no material cost to keep. Dropping it leaves nothing. ## VERDICT: No material issues found The checkpoint/resume wiring is performance-conscious: the store write is throttle-gated and short-circuits before serialization, the single-loop `acc` accumulation is append-only / amortized O(1) per step, and phase-boundary checkpointing copies a small slice bounded by phase count. The one per-step `ckpt.Save` runs synchronously in the step observer, but that observer fires at model-round-trip cadence, so a local durable write there is not a meaningful hot-path cost. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> Both findings verified against the actual code: **Finding 1 (phases.go:114-117)** — Confirmed. The skip guard `if _, done := outputs[phase.Name]; done` is newly added in this diff, and the same `outputs` map is populated at line 177 as each phase completes. `Phase.Name` (run/agent.go:62) is a free-form string with no uniqueness validation anywhere in `run/` (grep confirms only `uniqueName` for input files and palette dedup — nothing for phase names). So a fresh pipeline with two same-named phases now skips the second. Real new edge case. **Finding 2 (executor.go:193)** — Confirmed at the code level: `cerr` is dropped with no log. The port contract (run/ports.go) does delegate logging to the factory, so this is a minor defensive gap, but the observation is accurate and consistent with the nearby `slog.Warn` degrade convention. Both survive. ## VERDICT: Minor issues Reviewed strictly through the error-handling & edge-cases lens. I traced the shutdown-cause propagation (`ctx` → `WithoutCancel`+`WithTimeout` → `WithCancelCause` → `MergeCancellation`) and confirmed `context.Cause(runCtx)` carries `ErrShutdown` into `classifyCheckpointOutcome` (run/checkpoint.go, run/executor.go:259-264), so the finalize/classify matrix is sound. The accumulator never persists a transcript ending on a dangling tool call (obs appends results before `Save`), and nil/empty inputs are handled (`info.Inputs["prompt"]` on a nil map, `f.now` nil → `time.Now`). Two real but low-severity issues: - **`run/phases.go:114-117` — repeated phase name now silently skips the second phase (new edge case, not resume-only).** The new skip guard keys on `outputs[phase.Name]` *presence*, but that same map is populated as each phase completes (line 177). `Phase.Name` is a free-form string with **no uniqueness validation** (run/agent.go:62-63; no validator anywhere in `run/`). So in a *fresh* (non-resume) pipeline where two phases share a name, the first runs and sets `outputs["x"]`; when the loop reaches the second `"x"`, the guard fires and it is skipped entirely. Before this diff every phase ran unconditionally. The failure is silent: the skipped phase never executes, `lastOutput` is the earlier phase's text, and the run is then checkpoint-`Complete`d as if finished. Impact: a misconfigured-but-previously-working pipeline (e.g. a duplicated "synthesize" name) returns the wrong output with no error. Trigger likelihood is low (requires duplicate names), hence small severity. Fix: either validate phase-name uniqueness up front, or track skip-set membership in a dedicated set seeded *only* from `resume.CompletedPhases` rather than reusing the live `outputs` map. - **`run/executor.go:193` — `Begin` error silently swallowed, no log.** `if c, cerr := …Begin(ctx, info); cerr == nil { ckpt = c }` drops a non-nil `cerr` with no `slog`. The port contract (run/ports.go) says a storage error "should be logged and degraded to (nil, nil)" — i.e. logging is the factory's job — so degrading to non-durable is defensible, but a *contract-violating* factory that returns an error becomes invisibly non-durable. This is inconsistent with the codebase's degrade-with-`slog.Warn` convention used nearby (phase model resolve, template parse, unknown tool). Fix: `slog.Warn("run: checkpointer Begin failed; running non-durable", "run_id", info.RunID, "error", cerr)` in the error branch. </details> </details> <details><summary><b>deepseek-v4-pro:cloud</b> (ollama-cloud) — Minor issues</summary> **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## Verdict: No material issues found After tracing tainted input through the full checkpoint/resume flow and examining the new code for authn/authz gaps, injection, SSRF, unsafe deserialization, credential leakage, and missing validation, I find no material security issues in this PR. Here's what I verified: - **Context value carriers** (`WithResumeState`, `WithExistingCheckpointer`): use unexported key types (`resumeStateKey{}`, `existingCheckpointerKey{}`), the standard Go pattern preventing cross-package spoofing. Verified at `run/checkpoint.go:28-51`. - **Template injection in phase prompts** (`expandPhaseTemplate` at `run/phases.go:343`): uses `text/template` with `missingkey=zero`. The template source is the agent author's `phase.SystemPrompt`; user-supplied values (`query`, prior phase outputs) are substituted as plain strings — Go's `text/template` does not recursively re-parse substituted values. A user typing `{{.Query}}` produces literal text, not expansion. - **`info.Inputs["prompt"].(string)`** in `checkpoint/handle.go:110`: safe comma-ok type assertion. A non-string or nil `Inputs` map yields `""` — no panic, no injection. The prompt is metadata for re-dispatch, not used for any security decision. - **`detach` for finalization** (`run/executor.go:526-530`): uses `context.WithoutCancel` + `WithTimeout` — the correct pattern for cleanup that must not be aborted by caller cancellation. No context-value leakage beyond what the caller already put there. - **Checkpoint data at rest**: `RunCheckpoint` carries `[]llm.Message` (full transcript) and `[]run.PhaseOutput`. This is inherent to durable resume — you can't resume without saving state. The `CheckpointStore` interface leaves encryption/access control to the implementation (mort's durable-job table, SQLite, etc.). The battery itself adds no new exposure. - **No authz bypass**: the `Checkpointer` handle is per-run, minted by the factory and scoped to a single `RunID`. `Complete`/`Fail` delete by that run ID only. No cross-run access vector exists. - **No SSRF, path traversal, or unsafe deserialization**: the code touches no URLs, file paths, or dynamic `interface{}` deserialization. All persisted types are concrete Go structs with string/int/slice fields. - **Error suppression in finalize** (`run/checkpoint.go:88-90`): `Complete`/`Fail` errors are silently discarded (`_ = cp.Complete(...)`). While this is a reliability concern (a failed `Complete` leaves a stale checkpoint), it is security-positive: store error details are never surfaced to callers. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> ## Verdict: Minor issues - **`run/phases.go:184-187`** — `ActivePhase` is always hardcoded to `""` in the phase-boundary checkpoint save, and the single-loop path (`run/executor.go:367`) never sets it either. The field exists in `RunCheckpointState`, `RunCheckpoint`, and `ResumeState` but the executor never writes a non-empty value. A host loading a checkpoint to build a `ResumeState` will always see `ActivePhase == ""`, making the field dead data in the kernel. Either the field should be populated (e.g., set to the *next* phase name at each boundary, or to `""` / the current phase name mid-loop for single-loop runs) or removed from the kernel types to avoid misleading consumers. *(Verified by reading `run/phases.go:183-188` and `run/executor.go:354-368` — `ActivePhase` is never assigned a non-zero value in any save path.)* - **`run/executor.go:192-195`** — `CheckpointerFactory.Begin` errors are silently swallowed. The interface doc says the factory should log and degrade, but if an implementation returns an error without logging (or the error is unexpected), the executor proceeds with no checkpointing and no diagnostic. A `slog.Warn` here would be defense-in-depth consistent with how the executor logs other non-fatal port failures. *(Verified by reading `run/executor.go:192-195` — the `cerr` variable is checked but never logged.)* - **`run/executor.go:367` vs `run/phases.go:185`** — Inconsistent defensive copying: the phase path copies `completed` via `append([]PhaseOutput(nil), completed...)` before passing to `Save`, but the single-loop path passes the live `acc` slice directly (`Messages: acc`). If any `CheckpointStore` implementation retains the `RunCheckpoint` struct (or its `Messages` field) beyond the synchronous `Save` return — e.g., for async serialization — the single-loop path's checkpoint data would be mutated by subsequent observer calls. The single-loop path should copy `acc` the same way the phase path copies `completed`. *(Verified by comparing `run/executor.go:367` with `run/phases.go:185`.)* </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> All three findings are confirmed against the actual code. Let me verify the exact line numbers for the findings: 1. **`run/checkpoint.go:25`** — `ResumeState.ActivePhase` is defined but never read. Grep confirms zero reads of `resume.ActivePhase` anywhere; the only `.ActivePhase` reads are on `RunCheckpointState` (in `checkpoint/handle.go:61`), never on `ResumeState`. The phase runner uses `CompletedPhases` but never consults `ActivePhase`. 2. **`run/ports.go:210` / `run/phases.go:186`** — `RunCheckpointState.ActivePhase` is defined and persisted, but the only write in the entire codebase is `ActivePhase: ""` at `run/phases.go:186`. It is never set to a non-empty value. 3. **`run/executor.go:355-358`** — On the resume path, `acc` is allocated as `[]llm.Message{multimodalUserMessage(input, inv.Images)}` at line 355 and then immediately overwritten with `resume.History` at line 357. The initial allocation is dead on the resume path. All three survive verification. ## Verdict: Minor issues - **`run/checkpoint.go:25` — Dead field `ResumeState.ActivePhase`** (verified). The field is defined, documented ("the phase that was in flight"), and carried through the context plumbing, but it is **never read anywhere in the codebase**. I grepped for `resume.ActivePhase`, `\.ActivePhase` across the entire repo — the only `.ActivePhase` reads are on `RunCheckpointState` (in `checkpoint/handle.go:61`), never on `ResumeState`. The phase runner (`run/phases.go:91-97`) uses `deps.resume.CompletedPhases` to skip finished phases but never consults `ActivePhase`. This is a dead field that will confuse future readers who expect it to drive behavior. Either wire it up (e.g., to validate that the active phase matches expectations on resume) or remove it. - **`run/ports.go:210` / `run/phases.go:186` — `RunCheckpointState.ActivePhase` is always `""`** (verified). The field is defined, documented, and persisted to the store, but the only write in the codebase is `ActivePhase: ""` at `run/phases.go:186`. It is never set to a non-empty value. A reader seeing this field on the struct would reasonably expect it to carry the name of the in-progress phase during a multi-phase run, but it never does. Either populate it (set it to `phase.Name` before the phase loop body at `run/phases.go:111`) or remove it from the struct until it's actually used. - **`run/executor.go:355-358` — Wasted allocation in `acc` initialization** (verified). When resuming, `acc` is first allocated as `[]llm.Message{multimodalUserMessage(input, inv.Images)}` and then immediately overwritten with `resume.History` on the next line. The initial `multimodalUserMessage` allocation is dead on the resume path. A cleaner structure avoids the throwaway allocation: ```go var acc []llm.Message if resume != nil && len(resume.History) > 0 { acc = append([]llm.Message(nil), resume.History...) } else { acc = []llm.Message{multimodalUserMessage(input, inv.Images)} } ``` </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> After a thorough review of the hot paths, allocation patterns, and synchronization overhead introduced by this PR, I find no material performance regressions. Here's what I verified: - **Single-loop checkpoint observer** (`run/executor.go:359-368`): The closure captures `acc` by reference and calls `ckpt.Save` on every step. However, `handle.Save` (`checkpoint/handle.go:42-72`) throttles aggressively — after the first persist, subsequent calls within the throttle window do only a mutex lock/check/unlock (uncontended, single-goroutine). The `RunCheckpointState` struct is stack-allocated and the `Messages` slice header copy is O(1). No deep copy occurs on the hot path. - **Phase-boundary checkpointing** (`run/phases.go:182-188`): `append([]PhaseOutput(nil), completed...)` allocates a copy of the completed-phases slice at each boundary. For N phases this is O(N²) total allocation, but N is small (typically 2–5), making this negligible. - **Resume path** (`run/executor.go:355-357`): The initial `acc` allocation (`[]llm.Message{multimodalUserMessage(...)}`) is discarded when resume history is present. This is one wasted allocation per resumed run — truly trivial. - **No unbounded growth, missing pagination, or N+1 patterns** were introduced. The transcript grows with the conversation, but that's inherent to durable recovery and bounded by the existing compaction system. - **No blocking calls on hot paths beyond the throttled Save**. The mutex in `handle.Save` is uncontended (one goroutine per run), and `time.Now()` (if used) is vDSO-optimized. ## Verdict: No material issues found </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> ## Verdict: Minor issues - **`run/executor.go:193-195`** — Swallowed `Begin` error with no logging. If `CheckpointerFactory.Begin` returns a non-nil error (e.g., a transient store issue the factory didn't self-log), the executor silently degrades to non-durable with zero indication. The interface doc says the factory should log and return `(nil, nil)`, but the executor cannot enforce that. At minimum, log a warning so operators know durability is broken for this run. *(Verified by reading `executor.go:192-196` — the `cerr` is checked only for nil-ness, never logged.)* - **`run/executor.go:367`** — `ckpt.Save` error silently discarded. If the checkpoint store is down mid-run, every step's save fails silently; the run completes, `finalizeCheckpoint` calls `Complete` (which may also fail silently), and there is no durable record — but the caller sees a successful run with no indication durability was lost. Same pattern at **`run/phases.go:184`** for phase-boundary saves. *(Verified by reading both call sites — both use `_ =` to discard the error.)* - **`run/checkpoint.go:88-89`** — `cp.Complete` / `cp.Fail` errors silently discarded in `finalizeCheckpoint`. If the store is unavailable at finalize time, a successful run's checkpoint is never cleaned up (orphaned record) or a failed run's checkpoint persists (waste). Neither case is logged. *(Verified by reading `checkpoint.go:86-93`.)* - **`run/executor.go:367`** — `acc` slice passed to `Save` without copying, inconsistent with the defensive copy in `phases.go:185` (`append([]PhaseOutput(nil), completed...)`). The `phases.go` comment explicitly notes the checkpointer "may hold/serialize it asynchronously." In the single-loop path, `acc` continues to grow via `append` in subsequent observer calls after `Save` returns. If any `Checkpointer` implementation holds the `Messages` slice reference past `Save`'s return, the transcript it serializes will be corrupted by later appends. In practice `handle.Save` is synchronous, but the inconsistency is a latent race. *(Verified by comparing `executor.go:367` with `phases.go:184-185`.)* - **`run/checkpoint.go:73`** — `classifyCheckpointOutcome` keys shutdown detection on `run.ErrShutdown`, but mort currently stamps its own `runengine.ErrShutdown` (a distinct sentinel) on the base context. Until the mort wiring PR aliases them, `errors.Is(cause, ErrShutdown)` will be `false` for every mort shutdown, causing `checkpointFail` (deletes the checkpoint) instead of `checkpointLeaveRunning`. The PR description acknowledges this gap; from the error-handling lens, it means the shutdown→recovery path is non-functional with current mort. *(Verified by reading `runengine.go:34` and `checkpoint.go:73` — two distinct `errors.New` sentinels.)* </details> </details> <details><summary><b>devstral-2:123b-cloud</b> (ollama-cloud) — Blocking issues found</summary> **Verdict: Blocking issues found** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> ## Verdict Minor issues ## Findings - **Potential information leakage in checkpoint metadata** (`checkpoint/handle.go:109-121`): The `Begin` method extracts the prompt from `info.Inputs["prompt"]` without validation or sanitization. If the prompt contains sensitive information (e.g., API keys, PII), it will be stored in the checkpoint metadata. This could lead to unintended exposure if the checkpoint store is compromised or accessed by unauthorized parties. Consider redacting sensitive data or providing an allowlist/denylist mechanism for what can be stored in checkpoints. - **Context value type safety** (`run/checkpoint.go:35,49`): The `resumeStateFromContext` and `existingCheckpointerFromContext` functions use type assertions without error handling. If the context contains a value with the wrong type for these keys, the functions will silently return `nil`, which could lead to unexpected behavior. While this is not a direct security vulnerability, it could mask configuration errors that might have security implications. ## gadfly-findings ```json [ {"file": "checkpoint/handle.go", "line": 109, "severity": "medium", "confidence": "high", "title": "Potential information leakage in checkpoint metadata"}, {"file": "run/checkpoint.go", "line": 35, "severity": "small", "confidence": "medium", "title": "Context value type safety issue"} ] ``` </details> <details><summary><b>🎯 Correctness</b> — Blocking issues found</summary> ## Verdict **Blocking issues found** ## Findings - **`run/checkpoint.go:73`**: The logic in `classifyCheckpointOutcome` is incorrect. The function checks `errors.Is(cause, ErrShutdown)`, but `ErrShutdown` is defined in `runengine.go` as a distinct error value (`errors.New("mort: shutting down")`). However, the PR description explicitly states that `mort` currently stamps its own `runengine.ErrShutdown` (a distinct value) on the base context, and the mort wiring PR will alias them so `errors.Is` matches. This means the current implementation will **not** correctly classify shutdowns until the mort wiring PR is merged, which could lead to incorrect checkpoint finalization (e.g., treating shutdowns as failures instead of leaving them for recovery). This is a **blocking correctness issue**. - **`run/executor.go:356-367`**: The checkpointing logic for single-loop runs incorrectly initializes the transcript accumulator (`acc`). It starts with `multimodalUserMessage(input, inv.Images)` and then **overwrites** it with `resume.History` if a resume state exists. This means the initial user message is lost during resume, which violates the semantic correctness of resuming a run. The resumed run should continue from the saved transcript, but the initial user message should still be part of the history. This is a **blocking correctness issue**. ## gadfly-findings </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> Based on the verification of the actual code, here is the corrected review: ## Verdict Minor issues ## Findings - **`run/checkpoint.go:54-61`**: The `checkpointOutcome` type and its constants are not exported. This limits their usability outside the package and could be confusing for external consumers who might need to interact with these outcomes. Consider exporting them if they are part of the public API. - **`run/phases.go:67-74`**: The `phaseDeps` struct includes a `checkpointer` field and a `resume` field, but the comments and logic around these fields are not entirely clear. The comments could be improved to better explain the purpose and usage of these fields, especially in the context of durable recovery. </details> <details><summary><b>⚡ Performance</b> — Minor issues</summary> ## Corrected Review ## Verdict Minor issues ## Findings - **Potential Unbounded Growth in `completed` Slice in `runPhases`** In `run/phases.go:83`, the `completed` slice is appended to in each phase iteration without any bounds checking. This could lead to unbounded growth in memory usage for long-running or frequently resumed multi-phase runs. Consider adding a limit or periodic cleanup mechanism to prevent excessive memory consumption. - **Unverified: Potential Inefficiency in Slice Copying** In `run/phases.go:185`, the `completed` slice is copied using `append([]PhaseOutput(nil), completed...)`. While this ensures isolation, it could introduce unnecessary allocations if the slice is large. This is a minor concern but worth noting for optimization in high-throughput scenarios. ## gadfly-findings ```json [ {"file": "run/phases.go", "line": 83, "severity": "medium", "confidence": "high", "title": "Unbounded growth in `completed` slice could lead to excessive memory usage"}, {"file": "run/phases.go", "line": 185, "severity": "small", "confidence": "medium", "title": "Potential inefficiency in slice copying during checkpointing"} ] ``` </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> All three findings are confirmed in the actual code: 1. **run/checkpoint.go:90**: The error returned by `cp.Fail` is ignored. 2. **run/executor.go:367**: The error returned by `ckpt.Save` is ignored. 3. **run/phases.go:184**: The error returned by `deps.checkpointer.Save` is ignored. The draft review is accurate, and all findings are valid. The verdict remains "Minor issues." </details> </details> <details><summary><b>glm-5.2:cloud</b> (ollama-cloud) — Minor issues</summary> **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> I traced every tainted-data path this PR opens, through my assigned security lens: - **Resume transcript → model history**: `executor.go:379` seeds `resume.History` via `agent.WithHistory`. This is the same trust boundary as the existing image-folding path (`runAgent` at `executor.go:547` already builds `WithHistory` from `input`/`images`). Resume data originates from the host's own persisted checkpoint, not a new external input. - **Recovered phase outputs → template expansion**: `phases.go:92-96` pre-populates `outputs` from `resume.CompletedPhases`, and `expandPhaseTemplate` (`phases.go:343`) feeds them into `text/template`. I verified the template is parsed from the static `phase.SystemPrompt` and the recovered outputs are substituted as **data values**, not parsed as template text — so there's no template-injection sink. This is the same flow as in-run phase outputs, which were already attacker-reachable via model output. - **Authorization on resume**: a recovered run re-enters the full `Run` entry point, so `Budget.Check` (`executor.go:139-143`), model resolution, and the toolbox build all re-run. `ResumeState` carries only transcript/phase data — no agent/tier/invocation override — so there's no authz bypass. - **Context-carrier seams** (`WithResumeState`/`WithExistingCheckpointer`): host-trusted recovery plumbing, not untrusted input surfaces. - **`factory.Begin` prompt read** (`handle.go:110`): a safe type-assertion from `info.Inputs`, already part of the trusted run input; not a new injection vector. - **Shutdown classification** (`classifyCheckpointOutcome`): keyed on `errors.Is(cause, ErrShutdown)` from `context.Cause(runCtx)` — only the host stamps the cause; an external caller can't forge it to force a leave-for-recovery outcome. Verdict: No material issues found. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> I now have enough to verify both findings. **Finding 1** claims that when compaction fires, the majordomo agent's internal message history "is replaced by a summary" while `acc` keeps the raw pre-compaction messages, causing resume to replay the full transcript. But the `compact/compactor.go` documentation in *this* repo explicitly states (lines 36-38): > "The compactor is per-run STATEFUL: **majordomo does not replace the loop's internal transcript with the compacted slice (the hook shapes only what is SENT)**" So the agent's internal transcript is *not* replaced by compaction — the compactor only shapes what's sent to the model per-call, and the loop keeps the raw messages. That means `acc` (the accumulated raw transcript) *is* the actual conversation state, and replaying it on resume is correct. Finding 1's central premise is contradicted by the in-repo compactor docs, so it must be dropped. The reviewer even flagged the agent internals as "unverified" — verification against the available compactor documentation overturns it. **Finding 2** — `ResumeState` (run/checkpoint.go:22-26) carries only `History`, `CompletedPhases`, `ActivePhase`; there is no iteration offset. On the single-loop resume path (run/executor.go:357), `acc` is seeded from `resume.History`, and each Save records `Iteration: s.Index + 1` (run/executor.go:367) where `s.Index` is the *new* agent loop's step index starting at 0. So a run saved at iteration 5, resumed for one more step, checkpoints `Iteration: 1`. The prior iteration count is genuinely lost in the durable metadata. This is a real but metadata-only issue (it doesn't break resume continuation). Finding 2 is confirmed and kept. ## VERDICT: Minor issues - `run/executor.go:367` — On a single-loop resume, each checkpoint Save records `Iteration: s.Index + 1` where `s.Index` is the *new* agent loop's step index (starting at 0), and `ResumeState` (run/checkpoint.go:22-26) carries no iteration offset. A run previously saved at iteration 5 that resumes for one more step therefore checkpoints `Iteration: 1`, losing the prior iteration count in the durable record. This corrupts the durable progress metadata that a host may use for display, throttling, or step-budget decisions on a subsequent recovery. Resume continuation itself is unaffected. Minor, metadata-only. Fix: carry the prior iteration count in `ResumeState` and add `s.Index` to it when saving. </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> All four findings verified against the actual code: 1. **ActivePhase** — `ResumeState.ActivePhase` (run/checkpoint.go:25), `RunCheckpointState.ActivePhase` (run/ports.go:210), `RunCheckpoint.ActivePhase` (checkpoint/checkpoint.go:41) are all declared but never read; `RunCheckpointState.ActivePhase` is never set to anything but `""` (run/phases.go:186). Confirmed dead. 2. **Checkpoint observer duplicate transcript assembly** — confirmed at run/executor.go:359-367, and phases.go:236 wires plain `deps.stepObserver` with no checkpointing, so the "differs" rationale only applies to single-loop. Confirmed. 3. **Redundant lastOutput** — pre-population loop (run/phases.go:95) sets `lastOutput = pc.Output`; skip branch (line 115) reassigns `lastOutput = outputs[phase.Name]` (same value). Confirmed no-op. 4. **NewFactory never initializes factory.now** — confirmed at checkpoint/handle.go:103-104; `f.now` stays nil, passed to `New` which falls back to `time.Now`. Confirmed. ## VERDICT: Minor issues - **`run/checkpoint.go:25` — `ResumeState.ActivePhase` is dead.** The field is declared and documented ("the phase that was in flight"), but no code reads it: grep for `.ActivePhase` only returns write sites (`checkpoint/handle.go:61` copies `st.ActivePhase` into the persisted struct; `run/phases.go:186` always saves `ActivePhase: ""`), and `runPhases` only consumes `ResumeState.CompletedPhases` (skipping whole phases, not seeding an in-flight one). `RunCheckpointState.ActivePhase` / `RunCheckpoint.ActivePhase` are likewise never set to a non-empty value, and the persisted `RunCheckpoint` is never loaded back to populate `ResumeState.ActivePhase` (no caller reads `RunCheckpoint.ActivePhase` from a `Load`). The comment on `RunCheckpointState.Messages` ("OR the active phase's transcript", run/ports.go:202-203) is therefore also misleading — the multi-phase path never records an active-phase transcript. Either wire active-phase resume or drop the fields/comments to match what's shipped (boundary-granular only). - **`run/executor.go:354-369` — checkpoint observer duplicates transcript assembly that `agent` already owns.** The wrapped `obs` manually reconstructs the conversation (`s.Response.Message()` + `llm.ToolResultsMessage(s.Results...)`) into `acc`, mirroring exactly what the agent loop builds internally. The comment at `run/executor.go:307-309` cites that the observer "differs single-loop vs per-phase" as the reason it moved out of `sharedOpts`, but the per-phase path (`run/phases.go:236`) just wires the *plain* `deps.stepObserver` with no checkpointing — so the "differs" rationale only applies to single-loop. A `transcript()` accessor on `agent.Step`/`agent.Result` (or reusing `runRes.Messages` at finalize) would remove this hand-rolled zip, which is exactly the pairwise-call/result shape the existing `stepObserver` at `run/executor.go:280-303` already exercises. Worth a follow-up to share the accumulation rather than duplicate it. - **`run/phases.go:114-117` — redundant `lastOutput` reassignment on the resume skip path.** `lastOutput` was already set to `pc.Output` for every resumed phase during the pre-population loop at `run/phases.go:91-97`. Re-assigning it inside the skip branch (`lastOutput = outputs[phase.Name]`) is a no-op for resumed phases and dead-confusing next to the identical-looking line at `178`. The skip branch can be a bare `continue`. - **`checkpoint/handle.go:103` — `NewFactory` silently drops `now`.** `factory.now` is read by `Begin` → `New` (passed as `f.now`), but `NewFactory` never initializes it, so it's always `nil` and `New` falls back to `time.Now` (handle.go:36-38). That's functionally fine, but it leaves a struct field that looks configurable and isn't — either accept `now` in `NewFactory` (parity with `New`) or drop `factory.now` and pass `nil` explicitly so the field doesn't read as a forgotten injection point. The omission stands out against the existing `New(store, meta, throttle, now)` signature. </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> ## Verdict: No material issues found Through the performance lens, the durable-checkpoint wiring is reasonably behaved: - **Single-loop per-step `Save` is throttled.** `handle.Save` (`checkpoint/handle.go:42`) takes the throttle window under a brief mutex, releases it *before* the store I/O, and skips the persist inside the window — so a hot agent loop doesn't block on store writes every step. Per-step work that always runs (the `acc` append, `llm.ToolResultsMessage`, the `RunCheckpointState` struct construction) is O(1) amortized / cheap, so even unthrottled the per-step overhead is minimal. - **No N+1 / quadratic blowup introduced.** The transcript `acc` grows O(steps) and each *persisted* save is O(transcript), but frequency is bounded by the throttle, so total cost is O(saves × transcript), not O(steps²). Phase-boundary saves copy the completed-phases list (`append([]PhaseOutput(nil), completed...)`, `run/phases.go:185`) — O(phases), and phase counts are small. Both match mort's established pattern; not a regression. - **No unbounded growth in the real wiring.** `Complete`/`Fail` delete the record (`checkpoint/handle.go:74-76`); only shutdown-interrupted runs survive, which is the intended recovery set. The in-process `Memory` store's `ListInterrupted` copies the whole map (`checkpoint/memory.go:47-54`), but that store is test-only and explicitly non-durable per its own doc. Minor (trivial) nits, not material: - `run/executor.go:355-358`: on a resume, `multimodalUserMessage(input, inv.Images)` is built and immediately discarded (overwritten by the resume-history copy). One wasted message construction per run — negligible. - `run/executor.go:359-368`: the wrapped observer constructs `llm.ToolResultsMessage(s.Results...)` and a `RunCheckpointState` every step even when `handle.Save` will be throttled away. Negligible allocation on the hot path; could short-circuit, but not worth it. - A host passing `throttle <= 0` to `NewFactory` makes every step block synchronously on `store.Save` — but this is documented as host-controlled ("the host throttles") and mirrors mort, so it's a configuration footgun, not a defect. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> ## VERDICT: Minor issues - **`run/executor.go:207` (and the other pre-loop early returns at ~215, ~228) skip `finalizeCheckpoint`.** Confirmed: `finalizeCheckpoint` has a single call site at `executor.go:404`, after the agent loop; the checkpointer is acquired earlier at `executor.go:190-196` — *before* the toolbox/delegation/extra-tool builds. The build-error early returns (`return res` at lines 207, 215, 228, and the session-tool one at ~242) all run before `finalizeCheckpoint`, leaving `ckpt` unfinalized. For a **fresh** run this is harmless — the handle writes nothing to the store until the first `Save`, which only happens inside the loop, so `ListInterrupted` finds no record. But for a **recovered** run (`WithExistingCheckpointer`, `executor.go:191`), the durable record already exists from the prior interrupted run; a persistent pre-loop error (e.g., a misconfigured tool failing `Registry.Build`) takes the early `return res` without `Complete`/`Fail`, so the record survives → boot recovery re-dispatches the same run → same build error → indefinite retry loop. Suggested fix: route the build-error early returns through `finalizeCheckpoint(ctx, ckpt, res.Err, context.Cause(runCtx))` so a terminal build error `Fail`s an existing record instead of leaving it recoverable (or `defer` it guarded by whether the run actually started). - **`run/executor.go:192-196` silently swallows the `CheckpointerFactory.Begin` error.** Confirmed against `run/ports.go:181-182`, whose doc comment on `CheckpointerFactory` explicitly states: "A storage error should be logged and degraded to (nil, nil)". The executor does `if c, cerr := e.cfg.Ports.Checkpointer.Begin(ctx, info); cerr == nil { ckpt = c }` and discards `cerr` with no log, so a failing/misconfigured checkpoint store makes every run silently non-durable with zero operator signal. Minor, but it contradicts the stated invariant. Suggested fix: emit an `slog.Warn` on the `cerr != nil` branch. The `classifyCheckpointOutcome` decision matrix is sound: `ErrShutdown`/`ErrCriticKill` are defined at `run/ports.go`-adjacent code, the `runErr == nil` short-circuit correctly precedes the shutdown check so a clean success is always `Complete` even if shutdown landed late, and the test matrix in `run/checkpoint_test.go` covers the cases. </details> </details> <details><summary><b>kimi-k2.6:cloud</b> (ollama-cloud) — Blocking issues found</summary> **Verdict: Blocking issues found · ⚠️ 1/5 lens(es) errored** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **No material issues found** I scrutinized the durable-checkpoint wiring for authentication, injection, unsafe deserialization, secret leakage, input validation, and insecure defaults. The changes are clean for their intended threat model: - **`run/checkpoint.go`**: `WithResumeState` / `WithExistingCheckpointer` are standard context carriers; the unexported key types mean only package-level code can read them. `classifyCheckpointOutcome` correctly uses `errors.Is(cause, ErrShutdown)` to avoid exact-equality fragility. `finalizeCheckpoint` runs on a detached context so cancellation doesn't prevent terminal-state recording. No injection or deserialization surface here. - **`run/executor.go`**: The resume path seeds `resume.History` via `agent.WithHistory` and passes empty input. The transcript accumulator starts from the resume history (or the opening user message) and appends step responses + tool results. No command/template/SQL injection vectors introduced — the only template is `expandPhaseTemplate`, which uses the agent-defined `SystemPrompt` (trusted) with `missingkey=zero` against user data. - **`checkpoint/handle.go`**: `NewFactory.Begin` reads `info.Inputs["prompt"]` with a safe type assertion to string (defaults to empty). The factory mints a per-run `handle` keyed by `RunID`, so concurrent runs don't collide. Error from `Begin` is silently swallowed on the executor side, but the interface contract explicitly expects degradation to non-durable rather than failing the run. - **`run/phases.go`**: Phase-boundary checkpointing copies the `completed` slice before saving (defensive against async serialization). Resumed runs skip phases by key presence in `outputs`, which is populated from the trusted `ResumeState`. - **`run/ports.go`**: `GuildID` and `ModelTier` are added to `RunInfo` and checkpoint meta for resume re-dispatch attribution. This is needed metadata, not a leakage bug. No authn/authz gaps, injection points, SSRF, unsafe deserialization, credential leakage, or missing input validation were introduced by this diff. </details> <details><summary><b>🎯 Correctness</b> — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 10: context deadline exceeded </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — No material issues found</summary> No material issues found. Through the cleanliness & maintainability lens, the PR is well-structured: - Context key types follow the existing repo pattern (`progressSinkKey`). - The step-observer wrapping for checkpointing in `executor.go` is localized and clearly commented. - The factory pattern matches mort's existing `agentexec.CheckpointerFactory`, keeping cross-repo consistency. - Tests reuse shared helpers (`phaseProvider`) rather than duplicating setup. - Error handling degrades gracefully (failed `Begin` → nil checkpointer, nil-safe throughout). No dead code, confusing names, or leaky abstractions are introduced. </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> No material issues found Through the performance lens, the checkpoint wiring is clean: - The single-loop path accumulates transcript messages in a slice and calls `ckpt.Save` each step, but the battery-level `handle.Save` throttles these calls by time window (skipping eagerly inside the throttle), and the host controls the throttle duration. The work inside the observer (appending to `acc`, calling `Save`) is O(1) when throttled and O(n) in total transcript length only when a save actually fires — which is unavoidable since the full transcript is required for mid-loop resume. - Phase-boundary checkpointing only fires once per completed phase (not per step), copies the completed-phases slice defensively, and stores only phase outputs — both bounded and minimal. - Resume seeding via `WithHistory` is a one-time cost at loop start. - No N+1 queries, unbounded growth, or quadratic behavior introduced. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Blocking issues found</summary> **Blocking issues found** - **`run/executor.go:188`** — `CheckpointerFactory.Begin` error is silently discarded. If the factory returns a storage error (e.g., DB unavailable), the executor proceeds as if the run is non-durable, with no log or metric. Operators have no way to detect a broken checkpoint store. - **`run/checkpoint.go:88-90`** — `finalizeCheckpoint` swallows errors from `cp.Complete` and `cp.Fail`. If the store fails to delete a finished run's checkpoint (network blip, timeout), the record survives and the next boot will incorrectly attempt to resume a run that already terminated. This creates false recovery candidates. </details> </details> <details><summary><b>qwen3.5:397b-cloud</b> (ollama-cloud) — Blocking issues found</summary> **Verdict: Blocking issues found** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> ## VERDICT: Minor issues found **Security lens findings:** - **`run/phases.go:177-188` + `run/phases.go:343-359` — Template injection via persisted phase outputs (medium)**: Phase outputs (derived from LLM responses) are stored in checkpoints at line 182-187 (`completed = append(completed, PhaseOutput{Name: phase.Name, Output: output})`) and later used as template data in `expandPhaseTemplate` on resume. On resume, `outputs` map is populated from `deps.resume.CompletedPhases` (lines 91-97), then passed directly to `expandPhaseTemplate` at line 123. The `expandPhaseTemplate` function uses `text/template` with no sanitization of `priorOutputs` — it directly inserts phase outputs into the template data map (lines 349-352). An attacker controlling LLM output via prompt injection could inject Go template syntax (`{{...}}`) that gets evaluated when the checkpoint resumes. **Verified**: Read `expandPhaseTemplate` — uses `template.New("phase")...Parse(tmpl)` and `t.Execute(&buf, data)` where `data` includes unsanitized `priorOutputs`. **Fix**: Sanitize phase outputs before template insertion, or use a non-interpolating format for phase threading. - **`checkpoint/handle.go:110-122` — Prompt stored in checkpoint metadata without size validation (small)**: The user prompt is extracted from `info.Inputs["prompt"]` at line 110 via direct type assertion (`prompt, _ := info.Inputs["prompt"].(string)`) with no length limit or validation before being stored in `RunCheckpointMeta.Prompt` at line 118. A host using a bounded store (e.g., fixed-size job table) could face DoS via oversized prompts. **Verified**: Read `Begin()` — direct type assertion with no bounds check. **Fix**: Truncate or reject prompts exceeding a reasonable threshold before storing in meta. - **`run/checkpoint.go:86-93` — Checkpoint finalize swallows errors silently (small)**: `finalizeCheckpoint` discards all errors from `Complete()`/`Fail()` with `_ =` at lines 88 and 90. If the checkpoint store fails during terminal state recording, the run completes but the checkpoint may remain in an inconsistent state (e.g., marked as "interrupted" when it actually finished). **Verified**: Read `finalizeCheckpoint` — errors explicitly ignored. **Fix**: At minimum log the error; consider surfacing it to audit/recorder. - **`run/phases.go:114-116` — Resume phase-skipping trusts checkpoint integrity (medium)**: On resume, phases are skipped based solely on key presence in `outputs` map (line 114: `if _, done := outputs[phase.Name]; done`), populated from `deps.resume.CompletedPhases` (lines 91-97). If an attacker compromises the checkpoint store, they can inject arbitrary `PhaseOutput` entries to skip security-critical phases. **Verified**: Read `runPhases` resume logic — no integrity validation of `deps.resume.CompletedPhases` before using them to skip phases. **Fix**: Add checkpoint integrity verification (e.g., HMAC) if store is untrusted; document trust boundary. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> **VERDICT: Minor issues found** **Findings:** 1. **`run/phases.go:186` — `ActivePhase` field unused in resume logic (dead code)** - The `ResumeState.ActivePhase` field is populated from checkpoints but **never read** during resume. The multi-phase resume logic (lines 91-97, 114-117) determines which phase to run next solely by checking `CompletedPhases` — it runs the first phase whose name is NOT in the completed set. The `ActivePhase` field has no effect on behavior. - **Impact**: Dead code that suggests functionality (mid-phase resumption) that doesn't exist. Confusing for future maintainers. - **Fix**: Either remove `ActivePhase` from `ResumeState` and `RunCheckpointState`, or document it as metadata-only (not used for resume decisions). *Verified by reading `run/phases.go:81-117` and `run/checkpoint.go:22-26`.* 2. **`run/ports.go:202-203` — Documentation claims unsupported functionality** - The comment states `Messages` is "the running transcript (single-loop run) OR the active phase's transcript (multi-phase run)". However, the multi-phase checkpoint save at `run/phases.go:184-187` **never sets `Messages`** — it only saves `CompletedPhases` and `ActivePhase`. This means mid-phase transcript recovery for multi-phase runs is NOT supported (only boundary-granular). - **Impact**: Documentation promises functionality that the code doesn't deliver. A reader expecting mid-phase transcript resume for multi-phase runs will be misled. - **Fix**: Update the comment to clarify that `Messages` is only populated for single-loop runs; multi-phase checkpoints do not preserve the active phase's transcript. *Verified by reading `run/phases.go:179-188` and `run/ports.go:200-211`.* 3. **`run/checkpoint.go:25` — `ActivePhase` field semantically inconsistent** - `ResumeState.ActivePhase` is documented as "the phase that was in flight", but the checkpoint save at phase boundaries (after a phase completes) sets `ActivePhase: ""` (line 186). The field would only be meaningful if saved mid-phase, but no mid-phase save exists for multi-phase runs. - **Impact**: Field is always empty when read on resume, making it useless for its stated purpose. - **Fix**: Remove the field or repurpose it to track "next phase to run" if that becomes useful. *Verified by reading `run/phases.go:179-188`.* </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> **VERDICT:** Minor issues found **Findings:** - **`run/checkpoint.go:23`** — The `ResumeState.History` field comment claims it holds "single-loop transcript OR active-phase transcript", but the multi-phase path in `run/phases.go:88-97` never reads `deps.resume.History` — it only uses `CompletedPhases` to skip finished phases. The active phase re-runs from scratch with the original query. This is documented as intentional boundary-granular recovery in the code comments, but the struct comment is misleading. **Fix:** Update the comment to clarify `History` is only used for single-loop runs; multi-phase recovery does not resume mid-phase transcripts. - **`run/checkpoint.go:25`** — The `ResumeState.ActivePhase` field is defined but never read by the executor. Grep confirms it is only written (in `checkpoint/handle.go:61`, `run/phases.go:186`) but never read. The multi-phase path determines which phase to run by checking key presence in `outputs` map (`run/phases.go:114-117`), not by consulting `ActivePhase`. This field adds cognitive load without functional purpose. **Fix:** Either remove the field or add a comment explaining it's metadata only (e.g., for host debugging/logging) and not used by the kernel. - **`checkpoint/handle.go:107-108`** — The `factory.Begin` method extracts `prompt` from `info.Inputs["prompt"]` with a comment saying this is "so a recovered run can re-dispatch". However, the kernel's multi-phase recovery path (`run/phases.go:81-191`) does not use any recovered prompt — it skips completed phases and re-runs the active phase with the original `query` passed to `runPhases`. The `RunInfo.Prompt` field (populated from this extraction) is also never read anywhere in the executor or phases logic. The prompt extraction appears to be dead code for the multi-phase case. **Fix:** Clarify the comment to state this is only relevant for single-loop resume, or remove if unused. </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> **No material issues found** After reviewing the checkpoint/resume wiring through the performance lens, the implementation handles the key efficiency concerns appropriately: **Verified findings:** 1. **Throttling is correctly implemented** (`checkpoint/handle.go:45-48`): The `Save` method checks the throttle window before persisting, preventing excessive I/O on every step. Confirmed by reading `handle.go`. 2. **No unnecessary slice copies in hot path** (`run/executor.go:359-367`): The single-loop observer accumulates `acc` in-place and passes the slice header to `Save`. The slice grows naturally via `append`, which is amortized O(1). Confirmed by reading `executor.go`. 3. **Multi-phase checkpoint copy is intentional** (`run/phases.go:185`): `append([]PhaseOutput(nil), completed...)` creates a snapshot copy at phase boundaries to prevent aliasing—this is correct since the checkpointer may hold the slice asynchronously. Phase boundaries are infrequent (not per-step), so this copy is negligible. Confirmed by reading `phases.go`. 4. **Resume path does one-time O(n) copy** (`run/executor.go:356-357`): `append([]llm.Message(nil), resume.History...)` copies the saved transcript once at startup to prevent aliasing between the resumed state and new messages. This is a necessary correctness cost, not a hot-loop regression. Confirmed by reading `executor.go`. 5. **No N+1 or unbounded growth issues**: Checkpoints are per-run (factory pattern), the `Memory` store holds one checkpoint per run ID (overwrites on save), and completed runs delete their checkpoints immediately via `Complete`. Confirmed by reading `memory.go:23-28` and `handle.go:74-76`. The throttle prevents hot-path I/O, slice handling avoids unnecessary copies while preserving correctness, and there are no quadratic behaviors or blocking calls in performance-critical sections. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Blocking issues found</summary> **Blocking issues found** - **`run/phases.go:184` — Phase-boundary checkpoint `Save` error silently swallowed**: The phase-boundary save discards the error with `_ =`. If the checkpoint store fails mid-run (disk full, DB locked, network blip), the run continues but checkpoints aren't persisted. On shutdown/recovery, the host will resume from a stale checkpoint (or none), potentially re-executing work already done or losing progress. **Verified** by reading `run/phases.go:183-188`. *Suggested fix*: Log the error (at minimum) and consider surfacing it via the `RunRecorder` audit trail so the host knows durability failed. - **`run/executor.go:367` — Per-step checkpoint `Save` error silently swallowed**: The per-step save in the single-loop path discards the error with `_ =`. Same impact as above — checkpoint persistence failures go unnoticed, risking stale recovery state. **Verified** by reading `run/executor.go:359-368`. *Suggested fix*: Log the error; at minimum ensure the failure is recorded in the audit trail. - **`run/checkpoint.go:90` — `Fail` error silently swallowed in `finalizeCheckpoint`**: When a run terminates with a non-shutdown error, `cp.Fail(detach(ctx), runErr)` is called but its error is discarded. If `Fail` itself fails (e.g., store unreachable during cleanup), the checkpoint record may remain in an ambiguous "running" state instead of being marked failed. On boot recovery, the host might incorrectly attempt to resume a run that should be terminal. **Verified** by reading `run/checkpoint.go:80-94`. *Suggested fix*: Log the error; at minimum ensure the failure is recorded in the audit trail. </details> </details> </details> <sub>Automated adversarial review by Gadfly — consensus across the model swarm. Advisory only — does not block merge.</sub>
steve added 1 commit 2026-06-29 20:34:44 +00:00
fix(run): address gadfly review of the checkpoint PR
executus CI / test (pull_request) Successful in 45s
38d656ec71
Real findings from the consensus review (44 raw; heavy devstral noise):

- finalizeCheckpoint is now fired from the top-of-Run defer, so it runs on
  EVERY exit: a panic, an early build-error return (before the run loop), AND
  normal completion. Previously an early return on a recovered run left its
  durable record unfinalized → boot recovery would retry it forever on a
  persistent build error. (opus + glm)
- Removed the dead ActivePhase field from run.RunCheckpointState +
  run.ResumeState (and the battery RunCheckpoint) — phase recovery is
  boundary-granular (skip completed phases; the interrupted phase re-runs from
  its start), so ActivePhase was never written nor read. Docs across
  ports/checkpoint/phases now state this plainly (5-model consensus that the
  field + docs over-promised mid-phase resume).
- CheckpointerFactory.Begin error is now logged (WARN) before degrading to
  non-durable, per the documented contract (was silently swallowed). (4 models)
- finalizeCheckpoint logs Complete/Fail errors (was silent).
- Resume phase-skip now keys off a SEPARATE resumeSkip set, not the live
  outputs map — a fresh run with two same-named phases no longer skips the
  second (the outputs map fills as phases run). (opus:max) + regression test.
- Removed the dead checkpoint.factory.now field (never set). (opus + glm)
- Fixed the stale phaseDeps doc (the step observer moved out of sharedOpts to
  per-path). Hoisted the resume guard to a local; dropped the wasted acc
  allocation on the resume path; documented that Save throttling is the
  Checkpointer's responsibility and the accumulated transcript is pre-compaction
  (host size-caps it).

Note (carried from the PR): classifyCheckpointOutcome keys shutdown on
run.ErrShutdown; mort stamps its own runengine.ErrShutdown — the mort wiring PR
aliases them so errors.Is matches.

New test: duplicate phase names both run on a fresh run. Full ./... green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve merged commit 31f9078915 into main 2026-06-29 20:44:17 +00:00
steve deleted branch feat/kernel-checkpoint 2026-06-29 20:44:17 +00:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/executus#20