feat(run): critic owns the deadline — MaxRuntime becomes the soft trigger
When a run enables the critic (Ports.Critic set + RunnableAgent.Critic.Enabled), the kernel no longer hard-caps it at MaxRuntime. MaxRuntime becomes the SOFT trigger: it is passed to startCritic, and the host critic uses it as its wake-up point and as the base for its extendable backstop. The critic's deadline-watch is the real hard cancel.

This restores mort's old agentexec two-tier timeout semantics: a slow-but-progressing run (e.g. a parent agent blocked on a 30-min animate render) is given room up to the critic's backstop instead of being killed at the nominal MaxRuntime.

Specifics:
- run/executor.go: the WithTimeout(MaxRuntime) is now conditional. Non-critic runs keep the literal MaxRuntime kill (→ "timeout"). Critic-owned runs get a GENEROUS WithTimeout at the new Defaults.CriticAbsoluteMax (default 6h) as a failsafe ceiling only; it never fires before the critic's backstop, and it guarantees a broken/nil host handle can't run unbounded.
- run/critic.go: startCritic takes the resolved MaxRuntime as the soft trigger (falling back to Defaults.CriticSoftTimeout, then 90s) instead of always using the global CriticSoftTimeout.
- Defaults.CriticAbsoluteMax added (withFallbacks default 6h).
- Tests: non-critic dies at MaxRuntime; critic-owned survives past it; soft trigger == MaxRuntime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X
+45
-15
@@ -30,6 +30,13 @@ type Defaults struct {
 	MaxSameToolCallRepeats int // retry-storm guard; default 3
 	CompactionThresholdRatio float64 // fraction of model context to compact at; default 0.7
 	CriticSoftTimeout time.Duration // idle window before the critic wakes; default 90s
+	// CriticAbsoluteMax is the failsafe wall-clock ceiling for a critic-OWNED run
+	// (Ports.Critic set AND the agent enables it). For such a run MaxRuntime is the
+	// SOFT trigger, not a hard cap, and the critic's extendable backstop is the
+	// normal deadline — so this ceiling only fires if the critic never acts (a
+	// broken/nil host handle). Default 6h; never shorter than the run's MaxRuntime.
+	// Non-critic runs ignore it (they keep the literal MaxRuntime kill).
+	CriticAbsoluteMax time.Duration
 }
 
 func (d Defaults) withFallbacks() Defaults {
@@ -54,6 +61,9 @@ func (d Defaults) withFallbacks() Defaults {
 	if d.CriticSoftTimeout <= 0 {
 		d.CriticSoftTimeout = 90 * time.Second
 	}
+	if d.CriticAbsoluteMax <= 0 {
+		d.CriticAbsoluteMax = 6 * time.Hour
+	}
 	return d
 }
 
@@ -265,18 +275,36 @@ func (e *Executor) Run(ctx context.Context, ra RunnableAgent, inv tool.Invocatio
 		postRun = st.PostRun
 	}
 
-	// Run context: bound by MaxRuntime, detached from the caller's deadline so a
-	// lane/queue wait doesn't eat the run budget (mort's V10 lesson). Caller
-	// cancellation still propagates via MergeCancellation. Created BEFORE the
-	// step observer so the observer forwards the merged run context (not a
-	// possibly-cancelled caller ctx) to OnStep consumers.
-	// MaxRuntime stays a WithTimeout so its DeadlineExceeded propagates through the
-	// child chain (→ "timeout"), preserving the run's-own-timeout vs caller-cancel
-	// distinction. A NESTED cause-carrying layer lets a critic kill surface as a
-	// distinct "killed" without disturbing that: only an ErrCriticKill cause is
-	// consulted in statusFor; a generic run error or a caller cancel is classified
-	// by the run error itself.
-	timeoutCtx, cancelTimeout := context.WithTimeout(context.WithoutCancel(ctx), maxRuntime)
+	// Run context: detached from the caller's deadline so a lane/queue wait doesn't
+	// eat the run budget (mort's V10 lesson). Caller cancellation still propagates
+	// via MergeCancellation. Created BEFORE the step observer so the observer
+	// forwards the merged run context (not a possibly-cancelled caller ctx) to
+	// OnStep consumers.
+	//
+	// Two-tier timeout: who owns the hard deadline depends on the critic.
+	// - NO critic (the default): MaxRuntime is a literal WithTimeout. Its
+	// DeadlineExceeded propagates through the child chain (→ "timeout"),
+	// preserving the run's-own-timeout vs caller-cancel distinction.
+	// - critic OWNS the deadline (Ports.Critic set + ra.Critic.Enabled):
+	// MaxRuntime becomes the SOFT trigger (passed to startCritic), and the
+	// critic's extendable backstop — watched in startCritic, which cancels via
+	// cancelCause — is the real deadline. A slow-but-progressing run is given
+	// room up to the backstop; only a stalled one is killed. We still wrap a
+	// GENEROUS WithTimeout at CriticAbsoluteMax so a broken/nil critic handle
+	// can't run unbounded; that ceiling never fires before the critic's backstop.
+	// A NESTED cause-carrying layer (cancelCause) lets a critic kill surface as a
+	// distinct "killed": only an ErrCriticKill cause is consulted in statusFor; a
+	// generic run error, a backstop expiry, or a caller cancel is classified by the
+	// run error itself.
+	criticOwnsDeadline := e.cfg.Ports.Critic != nil && ra.Critic.Enabled
+	hardCap := maxRuntime
+	if criticOwnsDeadline {
+		hardCap = e.cfg.Defaults.CriticAbsoluteMax
+		if hardCap < maxRuntime {
+			hardCap = maxRuntime // the failsafe ceiling is never shorter than the nominal budget
+		}
+	}
+	timeoutCtx, cancelTimeout := context.WithTimeout(context.WithoutCancel(ctx), hardCap)
 	defer cancelTimeout()
 	runCtx, cancelCause := context.WithCancelCause(timeoutCtx)
 	defer cancelCause(nil)
@@ -287,9 +315,11 @@ func (e *Executor) Run(ctx context.Context, ra RunnableAgent, inv tool.Invocatio
 	checkpointCause = func() error { return context.Cause(runCtx) }
 
 	// Critic (optional): monitors the run for a stall, can nudge/extend/kill via
-	// its host Escalator. Its hard deadline is bound to runCtx (cancel on pass).
-	// nil-safe: no-op when no critic is configured or the agent doesn't enable it.
-	critic, stopCritic := e.startCritic(runCtx, cancelCause, ra, info)
+	// its host Escalator. When it owns the deadline, MaxRuntime is its soft trigger
+	// (so a slow-but-progressing run survives past it); its extendable backstop is
+	// bound to runCtx (cancel on pass). nil-safe: no-op when no critic is configured
+	// or the agent doesn't enable it.
+	critic, stopCritic := e.startCritic(runCtx, cancelCause, ra, info, maxRuntime)
 	defer stopCritic()
 
 	// Step instrumentation: accumulate Result.Steps + fire inv.OnStep, feed the