Commit Graph

7 Commits

Author SHA1 Message Date
steve 5b5ee4148e feat(run): critic owns the deadline — MaxRuntime becomes the soft trigger
executus CI / test (pull_request) Successful in 47s
Adversarial Review (Gadfly) / review (pull_request) Successful in 23m4s
When a run enables the critic (Ports.Critic set + RunnableAgent.Critic.Enabled),
the kernel no longer hard-caps it at MaxRuntime. MaxRuntime becomes the SOFT
trigger (passed to startCritic, used by the host critic as its wake + the base
for its extendable backstop); the critic's deadline-watch is the real hard
cancel. This restores mort's old agentexec two-tier timeout semantics — a
slow-but-progressing run (e.g. a parent agent blocked on a 30-min animate render)
is given room up to the critic's backstop instead of being killed at the nominal
MaxRuntime.

Specifics:
- run/executor.go: the WithTimeout(MaxRuntime) is now conditional. Non-critic
  runs keep the literal MaxRuntime kill (→ "timeout"). Critic-owned runs get a
  GENEROUS WithTimeout at the new Defaults.CriticAbsoluteMax (default 6h) as a
  failsafe ceiling only — it never fires before the critic's backstop, and it
  guarantees a broken/nil host handle can't run unbounded.
- run/critic.go: startCritic takes the resolved MaxRuntime as the soft trigger
  (falling back to Defaults.CriticSoftTimeout, then 90s), instead of always using
  the global CriticSoftTimeout.
- Defaults.CriticAbsoluteMax added (withFallbacks default 6h).
- Tests: non-critic dies at MaxRuntime; critic-owned survives past it; soft
  trigger == MaxRuntime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X
2026-06-30 11:03:40 -04:00
steve 4e179259de run: wire SessionToolFactory + PostRun artifacts + AttachImages
executus CI / test (pull_request) Successful in 1m49s
Adversarial Review (Gadfly) / review (pull_request) Successful in 5m19s
The session-tool TYPES already lived in tool/ (P4 move) but the executor never
used them. This wires them, unblocking artifact-producing host surfaces (mort's
chat API / chatbot / .skill / scaddy) to run on executus:

- run/session.go: steerMailbox (thread-safe message queue) + runSession
  (tool.AgentSession over it: AttachImages → a user-role multimodal message
  injected before the agent's next step) + runPostRun (panic-isolated hook call).
- executor: create the mailbox + set inv.AttachImages BEFORE the toolbox build;
  add inv.ExtraTools + a SessionToolFactory's per-run Tools to the toolbox; defer
  its Cleanup; merge the session mailbox with the critic's nudges into ONE
  WithSteer; after the run, call PostRun with the full transcript
  (runRes.Messages) → Result.PostRunResult (best-effort, never fails the run).
- run.Result += PostRunResult *tool.PostRunResult.
- dropped the now-dead criticBinding.steerOptions (superseded by drainSteer).

Tests: a factory whose PostRun emits an artifact from the output+transcript +
Cleanup lands on Result.PostRunResult; a factory-added tool is callable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 18:13:16 -04:00
steve 390e6cf905 run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status)
executus CI / test (pull_request) Successful in 46s
Adversarial Review (Gadfly) / review (pull_request) Successful in 22m30s
Completes the run-critic seam so a host adapter (mort's agentcritic) has full
fidelity, closing the two limitations gadfly surfaced on mort #1334.

- RecordStep(iter int, resp *llm.Response): the completed step's model response
  is now passed to the critic (was index-only), so a host that records a trace
  (mort's ProgressRecorder) can show what the agent actually produced, not just
  an iteration count. The executor forwards s.Response; the battery ignores it
  (its Progress is count-based).
- CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes
  an explicit critic KILL from a natural backstop expiry. runCtx uses a
  cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with
  DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when
  KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause →
  killed / timeout / cancelled are now distinct (were all "cancelled"). The
  battery sets killCause from Decision.KillReason on a Kill.

Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle
+ battery RecordStep/KillCause signatures. Core stays battery-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 16:35:13 -04:00
steve 4ba83ab905 run: critic can raise a run's step ceiling mid-flight (CriticHandle.MaxSteps)
executus CI / test (pull_request) Failing after 1m1s
Adversarial Review (Gadfly) / review (pull_request) Successful in 21m8s
Prerequisite for a full-fidelity mort agentcritic adapter (which adjusts a
healthy-but-long run's iteration budget, not just its deadline). executus's
CriticHandle was deadline+steer only; this adds the dynamic step ceiling above
an unchanged majordomo (which already exposes WithMaxStepsFunc).

- run.RunInfo += MaxIterations (the run's base ceiling, so a critic can raise it
  relative to the baseline).
- run.CriticHandle += MaxSteps() int — polled by the executor each step via
  agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses
  WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else WithMaxSteps.
- critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) +
  MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling
  alongside ExtendBy. ExtendOnce default is unchanged (time-only).

Test: a critic returning MaxSteps=5 lets a base-MaxIterations=1 run complete two
tool-dispatch steps past the base ceiling. Core stays battery-free (run doesn't
import critic).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:16:03 -04:00
steve 97154395e6 C0b: document recordToolStart post-iteration timing (gadfly glm finding)
executus CI / test (pull_request) Failing after 59s
executus CI / test (push) Failing after 1m1s
majordomo's step observer fires post-iteration, so the critic's activity clock
refreshes per-iteration, not mid-tool — a single long tool call won't refresh it
until it returns. Documented + the host-progress-bridge mitigation (mort's
pattern). A true pre-dispatch hook needs majordomo support (follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 10:10:56 -04:00
steve 4aa06f652e C0b: address verified gadfly findings (panic-safety + test honesty)
executus CI / test (pull_request) Failing after 58s
From PR #9 (minimax + deepseek):
- Run now has a top-level recover() — the "never propagates a panic" promise was
  unenforced; a panicking host Port (Critic/Audit/Palette) on the run goroutine
  now becomes Result.Err instead of unwinding into the caller.
- The critic deadline-watch goroutine recovers panics from a host Deadline()
  (it's a separate goroutine, so Run's recover can't catch it) — a buggy
  CriticHandle can't crash the process.
- CriticHandle interface documents its concurrency contract (Record*/Steer on the
  run goroutine vs Deadline()/Stop() from the watch goroutine — impls must be
  concurrent-safe; the critic battery already is).
- startCritic's dead `soft <= 0 -> noop` guard (withFallbacks already coerces to
  90s) replaced with a defensive inline 90s default, so a bypass of withFallbacks
  still gets a working critic instead of silently none.
- Delivery tests made honest: the old "error path" test only checked the
  early-return (no delivery); added TestDeliverErrorOnRunFailure (in-loop model
  error -> DeliverError to the target) + renamed the early-return test.

Graded all #9 findings in the gadfly MCP.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 10:09:22 -04:00
steve 43b2471737 C0b: wire Critic + Delivery into run.Executor
executus CI / test (pull_request) Failing after 1m0s
Adversarial Review (Gadfly) / review (pull_request) Successful in 5m9s
Continues finishing the executor's run.Ports wiring (after C0's Palette).

Critic (run/critic.go): when Ports.Critic is set and the agent enables it, the
executor calls Monitor at run start, feeds RecordStep/RecordToolStart from the
step observer, drains the critic's Steer messages into the loop via
agent.WithSteer, and binds the run's hard cancellation to the critic's
(extendable) Deadline through a watch goroutine — a healthy-but-slow run gets
room while a hung one is killed. Stop() on run end. Soft timeout from
Defaults.CriticSoftTimeout (default 90s). nil-safe: no critic / not-enabled =
no-op.

Delivery (run/executor.go deliver): after the run, when Ports.Delivery is set
and inv.DeliveryID is non-empty, the executor posts Result.Output (or
DeliverError on failure) to a host-interpreted deliver.Target
{inv.DeliveryKind, inv.DeliveryID}. Empty target = caller reads Result.Output
itself (the synchronous default; the `.agent run` canary). Best-effort +
detached.

tool.Invocation gains DeliveryKind/DeliveryID (host-set egress target).

Tests: critic monitored/fed/steered/stopped when enabled, untouched when not;
delivery posts on a target, skips without one. Deferred: Checkpointer (needs a
majordomo hook to snapshot the running message history).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 10:00:05 -04:00