Real findings from the consensus review (44 raw; heavy devstral noise):
- finalizeCheckpoint is now fired from the top-of-Run defer, so it runs on
EVERY exit: a panic, an early build-error return (before the run loop), AND
normal completion. Previously an early return on a recovered run left its
durable record unfinalized → boot recovery would retry it forever on a
persistent build error. (opus + glm)
- Removed the dead ActivePhase field from run.RunCheckpointState +
run.ResumeState (and the battery RunCheckpoint) — phase recovery is
boundary-granular (skip completed phases; the interrupted phase re-runs from
its start), so ActivePhase was never written nor read. Docs across
ports/checkpoint/phases now state this plainly (5-model consensus that the
field + docs over-promised mid-phase resume).
- CheckpointerFactory.Begin error is now logged (WARN) before degrading to
non-durable, per the documented contract (was silently swallowed). (4 models)
- finalizeCheckpoint logs Complete/Fail errors (was silent).
- Resume phase-skip now keys off a SEPARATE resumeSkip set, not the live
outputs map — a fresh run with two same-named phases no longer skips the
second (the outputs map fills as phases run). (opus:max) + regression test.
- Removed the dead checkpoint.factory.now field (never set). (opus + glm)
- Fixed the stale phaseDeps doc (the step observer moved out of sharedOpts to
per-path). Hoisted the resume guard to a local; dropped the wasted acc
allocation on the resume path; documented that Save throttling is the
Checkpointer's responsibility and the accumulated transcript is pre-compaction
(host size-caps it).
Note (carried from the PR): classifyCheckpointOutcome keys shutdown on
run.ErrShutdown; mort stamps its own runengine.ErrShutdown — the mort wiring PR
aliases them so errors.Is matches.
New test: duplicate phase names both run on a fresh run. Full ./... green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The kernel defined run.Ports.Checkpointer + the checkpoint battery but never
drove them (the documented "P2 follow-up"). This wires durable recovery into
the run loop so a run interrupted by shutdown can resume on the next boot
instead of being lost — the executus-side half of mort's durable-agent-recovery
parity (mort #1355).
Kernel (run/):
- Ports.Checkpointer is now a CheckpointerFactory (Begin per run → a per-run
Checkpointer, or nil for a non-durable run). The single per-instance
Checkpointer couldn't distinguish runs; a factory mints one per run, matching
mort's agentexec.CheckpointerFactory.
- RunInfo gains GuildID + ModelTier (so the factory can build resume meta);
RunCheckpointState gains CompletedPhases + ActivePhase (+ PhaseOutput).
- run/checkpoint.go: ResumeState + WithResumeState / WithExistingCheckpointer
context carriers, classifyCheckpointOutcome (success→Complete, shutdown→leave
for boot recovery, else→Fail using run.ErrShutdown), and finalizeCheckpoint.
- run/executor.go: resolve the per-run checkpointer (existing-from-ctx on a
recovery re-run, else factory.Begin); single-loop wraps the step observer to
accumulate the transcript + Save each step (host throttles), and a recovered
run seeds the saved transcript via WithHistory and continues with no new
input; finalize on exit.
- run/phases.go: phase-boundary checkpointing — record completed phases after
each phase; a resumed run skips already-completed phases (the interrupted
phase re-runs from its start — boundary-granular, documented; only the
single-loop path resumes mid-loop).
Battery (checkpoint/): NewFactory wires the battery into the factory port
(per-run handle, meta derived from RunInfo); RunCheckpoint + handle.Save carry
the phase fields.
Tests (run/checkpoint_test.go): the finalize decision matrix; single-loop
Save+Complete; terminal-error Fail; resume seeds history; phase-boundary Saves
completed phases; resume skips completed phases. Full ./... green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
executus's tool.Invocation already carried InputFiles (audio/PDF/binary), but the
executor never staged them — only Images were folded into the run. This adds the
host seam mort's chat/chatbot surfaces need for audio-input parity with agentexec.
- run.Ports gains InputFiles InputFileStager (nil-safe; nil = input files silently
ignored, run still proceeds text-only). The interface mirrors mort's skill
FileStorage: StageInputFile(ctx, runID, agentID, name, mime, content) → file_id.
- run/input_files.go (ported from mort agentexec/input_files.go): stageInputFiles
persists each file under run scope and appends an [ATTACHED FILES] descriptor
block to the prompt so the agent can reach them by file_id (e.g. code_exec
files_in → /workspace/<name>). Bytes are NEVER inlined into model context.
Best-effort: empty/oversized(>50MB)/save-error files are skipped; colliding
base names are disambiguated (name-2, name-3) so they don't clobber at
/workspace/<name>.
- Executor.Run calls it after the model/toolbox build, before the loop, so the
descriptor rides the first user turn (alongside the existing Images folding).
Tests: stages + builds the block; nil stager / no files leave the prompt intact;
dedup; empty/save-error skipping. Full suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the run-critic seam so a host adapter (mort's agentcritic) has full
fidelity, closing the two limitations gadfly surfaced on mort #1334.
- RecordStep(iter int, resp *llm.Response): the completed step's model response
is now passed to the critic (was index-only), so a host that records a trace
(mort's ProgressRecorder) can show what the agent actually produced, not just
an iteration count. The executor forwards s.Response; the battery ignores it
(its Progress is count-based).
- CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes
an explicit critic KILL from a natural backstop expiry. runCtx uses a
cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with
DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when
KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause →
killed / timeout / cancelled are now distinct (were all "cancelled"). The
battery sets killCause from Decision.KillReason on a Kill.
Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle
+ battery RecordStep/KillCause signatures. Core stays battery-free.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prerequisite for a full-fidelity mort agentcritic adapter (which adjusts a
healthy-but-long run's iteration budget, not just its deadline). executus's
CriticHandle was deadline+steer only; this adds the dynamic step ceiling above
an unchanged majordomo (which already exposes WithMaxStepsFunc).
- run.RunInfo += MaxIterations (the run's base ceiling, so a critic can raise it
relative to the baseline).
- run.CriticHandle += MaxSteps() int — polled by the executor each step via
agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses
WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else WithMaxSteps.
- critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) +
MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling
alongside ExtendBy. ExtendOnce default is unchanged (time-only).
Test: a critic returning MaxSteps=5 lets a base-MaxIterations=1 run complete two
tool-dispatch steps past the base ceiling. Core stays battery-free (run doesn't
import critic).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From PR #9 (minimax + deepseek):
- Run now has a top-level recover() — the "never propagates a panic" promise was
unenforced; a panicking host Port (Critic/Audit/Palette) on the run goroutine
now becomes Result.Err instead of unwinding into the caller.
- The critic deadline-watch goroutine recovers panics from a host Deadline()
(it's a separate goroutine, so Run's recover can't catch it) — a buggy
CriticHandle can't crash the process.
- CriticHandle interface documents its concurrency contract (Record*/Steer on the
run goroutine vs Deadline()/Stop() from the watch goroutine — impls must be
concurrent-safe; the critic battery already is).
- startCritic's dead `soft <= 0 -> noop` guard (withFallbacks already coerces to
90s) replaced with a defensive inline 90s default, so a bypass of withFallbacks
still gets a working critic instead of silently none.
- Delivery tests made honest: the old "error path" test only checked the
early-return (no delivery); added TestDeliverErrorOnRunFailure (in-loop model
error -> DeliverError to the target) + renamed the early-return test.
Graded all #9 findings in the gadfly MCP.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add run/ports.go: the host seams the executor will consume, every one
nil-safe so a light host runs with the zero Ports (no persistence/audit/
budget/critic/delegation/delivery) and a heavy host wires each to a battery.
Ports mirror mort's existing interfaces so the batteries implement them
directly:
- Audit + RunRecorder (mort skillaudit.Storage/Writer): StartRun -> per-run
recorder (OnStep/OnTool/LogEvent/Close), recorder satisfies RunTally.
- Budget (mort skillexec.BudgetTracker): Check / Commit.
- Critic + CriticHandle (mort agentcritic): Monitor -> handle with
RecordStep/RecordToolStart/Steer/Deadline/Stop (the loop wiring finalizes
with the executor merge).
- Checkpointer (mort agentexec.RunCheckpointer): Save/Complete/Fail.
- PaletteSource (mort SkillInvokerForPalette + AgentInvokerForPalette):
Resolve/Invoke skill + agent delegation.
Plus host-neutral RunInfo / RunStats.
This completes the P2 inversion DESIGN; the agentexec+skillexec ->
run.Executor merge that consumes these Ports is the remaining P2 work.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>