executus

Author	SHA1	Message	Date
steve	38d656ec71	fix(run): address gadfly review of the checkpoint PR executus CI / test (pull_request) Successful in 45s Details Real findings from the consensus review (44 raw; heavy devstral noise): - finalizeCheckpoint is now fired from the top-of-Run defer, so it runs on EVERY exit: a panic, an early build-error return (before the run loop), AND normal completion. Previously an early return on a recovered run left its durable record unfinalized → boot recovery would retry it forever on a persistent build error. (opus + glm) - Removed the dead ActivePhase field from run.RunCheckpointState + run.ResumeState (and the battery RunCheckpoint) — phase recovery is boundary-granular (skip completed phases; the interrupted phase re-runs from its start), so ActivePhase was never written nor read. Docs across ports/checkpoint/phases now state this plainly (5-model consensus that the field + docs over-promised mid-phase resume). - CheckpointerFactory.Begin error is now logged (WARN) before degrading to non-durable, per the documented contract (was silently swallowed). (4 models) - finalizeCheckpoint logs Complete/Fail errors (was silent). - Resume phase-skip now keys off a SEPARATE resumeSkip set, not the live outputs map — a fresh run with two same-named phases no longer skips the second (the outputs map fills as phases run). (opus:max) + regression test. - Removed the dead checkpoint.factory.now field (never set). (opus + glm) - Fixed the stale phaseDeps doc (the step observer moved out of sharedOpts to per-path). Hoisted the resume guard to a local; dropped the wasted acc allocation on the resume path; documented that Save throttling is the Checkpointer's responsibility and the accumulated transcript is pre-compaction (host size-caps it). Note (carried from the PR): classifyCheckpointOutcome keys shutdown on run.ErrShutdown; mort stamps its own runengine.ErrShutdown — the mort wiring PR aliases them so errors.Is matches. New test: duplicate phase names both run on a fresh run. Full ./... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 16:34:42 -04:00
steve	899059a791	feat(run): durable checkpoint + resume (wire Ports.Checkpointer) executus CI / test (pull_request) Successful in 46s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 17m25s Details The kernel defined run.Ports.Checkpointer + the checkpoint battery but never drove them (the documented "P2 follow-up"). This wires durable recovery into the run loop so a run interrupted by shutdown can resume on the next boot instead of being lost — the executus-side half of mort's durable-agent-recovery parity (mort #1355). Kernel (run/): - Ports.Checkpointer is now a CheckpointerFactory (Begin per run → a per-run Checkpointer, or nil for a non-durable run). The single per-instance Checkpointer couldn't distinguish runs; a factory mints one per run, matching mort's agentexec.CheckpointerFactory. - RunInfo gains GuildID + ModelTier (so the factory can build resume meta); RunCheckpointState gains CompletedPhases + ActivePhase (+ PhaseOutput). - run/checkpoint.go: ResumeState + WithResumeState / WithExistingCheckpointer context carriers, classifyCheckpointOutcome (success→Complete, shutdown→leave for boot recovery, else→Fail using run.ErrShutdown), and finalizeCheckpoint. - run/executor.go: resolve the per-run checkpointer (existing-from-ctx on a recovery re-run, else factory.Begin); single-loop wraps the step observer to accumulate the transcript + Save each step (host throttles), and a recovered run seeds the saved transcript via WithHistory and continues with no new input; finalize on exit. - run/phases.go: phase-boundary checkpointing — record completed phases after each phase; a resumed run skips already-completed phases (the interrupted phase re-runs from its start — boundary-granular, documented; only the single-loop path resumes mid-loop). Battery (checkpoint/): NewFactory wires the battery into the factory port (per-run handle, meta derived from RunInfo); RunCheckpoint + handle.Save carry the phase fields. Tests (run/checkpoint_test.go): the finalize decision matrix; single-loop Save+Complete; terminal-error Fail; resume seeds history; phase-boundary Saves completed phases; resume skips completed phases. Full ./... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 16:04:06 -04:00
steve	2ef88f2a73	feat(run): InputFileStager seam — stage non-image attachments into the prompt Adversarial Review (Gadfly) / review (pull_request) Has been cancelled Details executus CI / test (pull_request) Successful in 2m21s Details executus's tool.Invocation already carried InputFiles (audio/PDF/binary), but the executor never staged them — only Images were folded into the run. This adds the host seam mort's chat/chatbot surfaces need for audio-input parity with agentexec. - run.Ports gains InputFiles InputFileStager (nil-safe; nil = input files silently ignored, run still proceeds text-only). The interface mirrors mort's skill FileStorage: StageInputFile(ctx, runID, agentID, name, mime, content) → file_id. - run/input_files.go (ported from mort agentexec/input_files.go): stageInputFiles persists each file under run scope and appends an [ATTACHED FILES] descriptor block to the prompt so the agent can reach them by file_id (e.g. code_exec files_in → /workspace/<name>). Bytes are NEVER inlined into model context. Best-effort: empty/oversized(>50MB)/save-error files are skipped; colliding base names are disambiguated (name-2, name-3) so they don't clobber at /workspace/<name>. - Executor.Run calls it after the model/toolbox build, before the loop, so the descriptor rides the first user turn (alongside the existing Images folding). Tests: stages + builds the block; nil stager / no files leave the prompt intact; dedup; empty/save-error skipping. Full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 13:02:55 -04:00
steve	390e6cf905	run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status) executus CI / test (pull_request) Successful in 46s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 22m30s Details Completes the run-critic seam so a host adapter (mort's agentcritic) has full fidelity, closing the two limitations gadfly surfaced on mort #1334. - RecordStep(iter int, resp *llm.Response): the completed step's model response is now passed to the critic (was index-only), so a host that records a trace (mort's ProgressRecorder) can show what the agent actually produced, not just an iteration count. The executor forwards s.Response; the battery ignores it (its Progress is count-based). - CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes an explicit critic KILL from a natural backstop expiry. runCtx uses a cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause → killed / timeout / cancelled are now distinct (were all "cancelled"). The battery sets killCause from Decision.KillReason on a Kill. Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle + battery RecordStep/KillCause signatures. Core stays battery-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 16:35:13 -04:00
steve	4ba83ab905	run: critic can raise a run's step ceiling mid-flight (CriticHandle.MaxSteps) executus CI / test (pull_request) Failing after 1m1s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 21m8s Details Prerequisite for a full-fidelity mort agentcritic adapter (which adjusts a healthy-but-long run's iteration budget, not just its deadline). executus's CriticHandle was deadline+steer only; this adds the dynamic step ceiling above an unchanged majordomo (which already exposes WithMaxStepsFunc). - run.RunInfo += MaxIterations (the run's base ceiling, so a critic can raise it relative to the baseline). - run.CriticHandle += MaxSteps() int — polled by the executor each step via agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else WithMaxSteps. - critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) + MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling alongside ExtendBy. ExtendOnce default is unchanged (time-only). Test: a critic returning MaxSteps=5 lets a base-MaxIterations=1 run complete two tool-dispatch steps past the base ceiling. Core stays battery-free (run doesn't import critic). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:16:03 -04:00
steve	4aa06f652e	C0b: address verified gadfly findings (panic-safety + test honesty) executus CI / test (pull_request) Failing after 58s Details From PR #9 (minimax + deepseek): - Run now has a top-level recover() — the "never propagates a panic" promise was unenforced; a panicking host Port (Critic/Audit/Palette) on the run goroutine now becomes Result.Err instead of unwinding into the caller. - The critic deadline-watch goroutine recovers panics from a host Deadline() (it's a separate goroutine, so Run's recover can't catch it) — a buggy CriticHandle can't crash the process. - CriticHandle interface documents its concurrency contract (Record*/Steer on the run goroutine vs Deadline()/Stop() from the watch goroutine — impls must be concurrent-safe; the critic battery already is). - startCritic's dead `soft <= 0 -> noop` guard (withFallbacks already coerces to 90s) replaced with a defensive inline 90s default, so a bypass of withFallbacks still gets a working critic instead of silently none. - Delivery tests made honest: the old "error path" test only checked the early-return (no delivery); added TestDeliverErrorOnRunFailure (in-loop model error -> DeliverError to the target) + renamed the early-return test. Graded all #9 findings in the gadfly MCP. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 10:09:22 -04:00
steve	c732a677df	P2: define nil-safe run.Ports (the inversion spine) Add run/ports.go: the host seams the executor will consume, every one nil-safe so a light host runs with the zero Ports (no persistence/audit/ budget/critic/delegation/delivery) and a heavy host wires each to a battery. Ports mirror mort's existing interfaces so the batteries implement them directly: - Audit + RunRecorder (mort skillaudit.Storage/Writer): StartRun -> per-run recorder (OnStep/OnTool/LogEvent/Close), recorder satisfies RunTally. - Budget (mort skillexec.BudgetTracker): Check / Commit. - Critic + CriticHandle (mort agentcritic): Monitor -> handle with RecordStep/RecordToolStart/Steer/Deadline/Stop (the loop wiring finalizes with the executor merge). - Checkpointer (mort agentexec.RunCheckpointer): Save/Complete/Fail. - PaletteSource (mort SkillInvokerForPalette + AgentInvokerForPalette): Resolve/Invoke skill + agent delegation. Plus host-neutral RunInfo / RunStats. This completes the P2 inversion DESIGN; the agentexec+skillexec -> run.Executor merge that consumes these Ports is the remaining P2 work. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 02:02:21 +00:00

7 Commits