executus

Author	SHA1	Message	Date
steve	cb4c612461	fix(run): address gadfly review of the critic-deadline PR executus CI / test (pull_request) Successful in 1m45s Details All 11 findings were real (3 clusters): - Failsafe ceiling could pre-empt the critic's backstop (e9c9483f, 9109317b, d5a9bf0d, 76ad171e): CriticAbsoluteMax was 6h, but the host's backstop (MaxRuntime × multiplier, or its own absolute max) can reach 6h+, so the ceiling fired first and reintroduced a premature hard cap. Now CriticAbsoluteMax is a 24h RUNAWAY guard set far beyond any realistic backstop (the host clamps its own backstop to a much smaller absolute max, e.g. mort's 6h convar), so it never pre-empts a healthy supervised run. Comments corrected. - nil Monitor handle lost the MaxRuntime cap (df016a6f, 9dd42827): a critic-enabled run whose host Monitor returned no handle had no deadline-watch and was bounded only by the generous ceiling. Added an unsupervised-run failsafe that re-wraps runCtx to the nominal MaxRuntime when the critic is enabled but didn't arm. New test TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime. - CriticSoftTimeout vestigial / dead fallback (f7764919, 9805bebe, 6864086f, b2b11721): the soft trigger is now always the resolved MaxRuntime (> 0), so the CriticSoftTimeout field + its startCritic fallback were unreachable. Removed the field entirely; the remaining 90s floor is documented as defensive-only. - DRY (f30ce827): extracted e.criticOwnsDeadline(ra), now the single predicate used by both Run and startCritic so they can't drift. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X	2026-06-30 11:32:46 -04:00
steve	5b5ee4148e	feat(run): critic owns the deadline — MaxRuntime becomes the soft trigger executus CI / test (pull_request) Successful in 47s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 23m4s Details When a run enables the critic (Ports.Critic set + RunnableAgent.Critic.Enabled), the kernel no longer hard-caps it at MaxRuntime. MaxRuntime becomes the SOFT trigger (passed to startCritic, used by the host critic as its wake + the base for its extendable backstop); the critic's deadline-watch is the real hard cancel. This restores mort's old agentexec two-tier timeout semantics — a slow-but-progressing run (e.g. a parent agent blocked on a 30-min animate render) is given room up to the critic's backstop instead of being killed at the nominal MaxRuntime. Specifics: - run/executor.go: the WithTimeout(MaxRuntime) is now conditional. Non-critic runs keep the literal MaxRuntime kill (→ "timeout"). Critic-owned runs get a GENEROUS WithTimeout at the new Defaults.CriticAbsoluteMax (default 6h) as a failsafe ceiling only — it never fires before the critic's backstop, and it guarantees a broken/nil host handle can't run unbounded. - run/critic.go: startCritic takes the resolved MaxRuntime as the soft trigger (falling back to Defaults.CriticSoftTimeout, then 90s), instead of always using the global CriticSoftTimeout. - Defaults.CriticAbsoluteMax added (withFallbacks default 6h). - Tests: non-critic dies at MaxRuntime; critic-owned survives past it; soft trigger == MaxRuntime. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X	2026-06-30 11:03:40 -04:00
steve	4e179259de	run: wire SessionToolFactory + PostRun artifacts + AttachImages executus CI / test (pull_request) Successful in 1m49s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 5m19s Details The session-tool TYPES already lived in tool/ (P4 move) but the executor never used them. This wires them, unblocking artifact-producing host surfaces (mort's chat API / chatbot / .skill / scaddy) to run on executus: - run/session.go: steerMailbox (thread-safe message queue) + runSession (tool.AgentSession over it: AttachImages → a user-role multimodal message injected before the agent's next step) + runPostRun (panic-isolated hook call). - executor: create the mailbox + set inv.AttachImages BEFORE the toolbox build; add inv.ExtraTools + a SessionToolFactory's per-run Tools to the toolbox; defer its Cleanup; merge the session mailbox with the critic's nudges into ONE WithSteer; after the run, call PostRun with the full transcript (runRes.Messages) → Result.PostRunResult (best-effort, never fails the run). - run.Result += PostRunResult *tool.PostRunResult. - dropped the now-dead criticBinding.steerOptions (superseded by drainSteer). Tests: a factory whose PostRun emits an artifact from the output+transcript + Cleanup lands on Result.PostRunResult; a factory-added tool is callable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 18:13:16 -04:00
steve	390e6cf905	run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status) executus CI / test (pull_request) Successful in 46s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 22m30s Details Completes the run-critic seam so a host adapter (mort's agentcritic) has full fidelity, closing the two limitations gadfly surfaced on mort #1334. - RecordStep(iter int, resp *llm.Response): the completed step's model response is now passed to the critic (was index-only), so a host that records a trace (mort's ProgressRecorder) can show what the agent actually produced, not just an iteration count. The executor forwards s.Response; the battery ignores it (its Progress is count-based). - CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes an explicit critic KILL from a natural backstop expiry. runCtx uses a cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause → killed / timeout / cancelled are now distinct (were all "cancelled"). The battery sets killCause from Decision.KillReason on a Kill. Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle + battery RecordStep/KillCause signatures. Core stays battery-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 16:35:13 -04:00
steve	4ba83ab905	run: critic can raise a run's step ceiling mid-flight (CriticHandle.MaxSteps) executus CI / test (pull_request) Failing after 1m1s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 21m8s Details Prerequisite for a full-fidelity mort agentcritic adapter (which adjusts a healthy-but-long run's iteration budget, not just its deadline). executus's CriticHandle was deadline+steer only; this adds the dynamic step ceiling above an unchanged majordomo (which already exposes WithMaxStepsFunc). - run.RunInfo += MaxIterations (the run's base ceiling, so a critic can raise it relative to the baseline). - run.CriticHandle += MaxSteps() int — polled by the executor each step via agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else WithMaxSteps. - critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) + MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling alongside ExtendBy. ExtendOnce default is unchanged (time-only). Test: a critic returning MaxSteps=5 lets a base-MaxIterations=1 run complete two tool-dispatch steps past the base ceiling. Core stays battery-free (run doesn't import critic). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:16:03 -04:00
steve	97154395e6	C0b: document recordToolStart post-iteration timing (gadfly glm finding) executus CI / test (pull_request) Failing after 59s Details executus CI / test (push) Failing after 1m1s Details majordomo's step observer fires post-iteration, so the critic's activity clock refreshes per-iteration, not mid-tool — a single long tool call won't refresh it until it returns. Documented + the host-progress-bridge mitigation (mort's pattern). A true pre-dispatch hook needs majordomo support (follow-up). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 10:10:56 -04:00
steve	4aa06f652e	C0b: address verified gadfly findings (panic-safety + test honesty) executus CI / test (pull_request) Failing after 58s Details From PR #9 (minimax + deepseek): - Run now has a top-level recover() — the "never propagates a panic" promise was unenforced; a panicking host Port (Critic/Audit/Palette) on the run goroutine now becomes Result.Err instead of unwinding into the caller. - The critic deadline-watch goroutine recovers panics from a host Deadline() (it's a separate goroutine, so Run's recover can't catch it) — a buggy CriticHandle can't crash the process. - CriticHandle interface documents its concurrency contract (Record*/Steer on the run goroutine vs Deadline()/Stop() from the watch goroutine — impls must be concurrent-safe; the critic battery already is). - startCritic's dead `soft <= 0 -> noop` guard (withFallbacks already coerces to 90s) replaced with a defensive inline 90s default, so a bypass of withFallbacks still gets a working critic instead of silently none. - Delivery tests made honest: the old "error path" test only checked the early-return (no delivery); added TestDeliverErrorOnRunFailure (in-loop model error -> DeliverError to the target) + renamed the early-return test. Graded all #9 findings in the gadfly MCP. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 10:09:22 -04:00
steve	43b2471737	C0b: wire Critic + Delivery into run.Executor executus CI / test (pull_request) Failing after 1m0s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 5m9s Details Continues finishing the executor's run.Ports wiring (after C0's Palette). Critic (run/critic.go): when Ports.Critic is set and the agent enables it, the executor calls Monitor at run start, feeds RecordStep/RecordToolStart from the step observer, drains the critic's Steer messages into the loop via agent.WithSteer, and binds the run's hard cancellation to the critic's (extendable) Deadline through a watch goroutine — a healthy-but-slow run gets room while a hung one is killed. Stop() on run end. Soft timeout from Defaults.CriticSoftTimeout (default 90s). nil-safe: no critic / not-enabled = no-op. Delivery (run/executor.go deliver): after the run, when Ports.Delivery is set and inv.DeliveryID is non-empty, the executor posts Result.Output (or DeliverError on failure) to a host-interpreted deliver.Target {inv.DeliveryKind, inv.DeliveryID}. Empty target = caller reads Result.Output itself (the synchronous default; the `.agent run` canary). Best-effort + detached. tool.Invocation gains DeliveryKind/DeliveryID (host-set egress target). Tests: critic monitored/fed/steered/stopped when enabled, untouched when not; delivery posts on a target, skips without one. Deferred: Checkpointer (needs a majordomo hook to snapshot the running message history). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 10:00:05 -04:00

8 Commits