fix(run): address gadfly review of the critic-deadline PR
executus CI / test (pull_request) Successful in 1m45s
All 11 findings were real (3 clusters):

- Failsafe ceiling could pre-empt the critic's backstop (e9c9483f, 9109317b,
  d5a9bf0d, 76ad171e): CriticAbsoluteMax was 6h, but the host's backstop
  (MaxRuntime × multiplier, or its own absolute max) can reach 6h+, so the
  ceiling fired first and reintroduced a premature hard cap. Now
  CriticAbsoluteMax is a 24h RUNAWAY guard set far beyond any realistic
  backstop (the host clamps its own backstop to a much smaller absolute max,
  e.g. mort's 6h convar), so it never pre-empts a healthy supervised run.
  Comments corrected.
- nil Monitor handle lost the MaxRuntime cap (df016a6f, 9dd42827): a
  critic-enabled run whose host Monitor returned no handle had no
  deadline-watch and was bounded only by the generous ceiling. Added an
  unsupervised-run failsafe that re-wraps runCtx to the nominal MaxRuntime
  when the critic is enabled but didn't arm. New test
  TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime.
- CriticSoftTimeout vestigial / dead fallback (f7764919, 9805bebe, 6864086f,
  b2b11721): the soft trigger is now always the resolved MaxRuntime (> 0), so
  the CriticSoftTimeout field + its startCritic fallback were unreachable.
  Removed the field entirely; the remaining 90s floor is documented as
  defensive-only.
- DRY (f30ce827): extracted e.criticOwnsDeadline(ra), now the single predicate
  used by both Run and startCritic so they can't drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X
+41
-20
@@ -29,13 +29,17 @@ type Defaults struct {
 	MaxConsecutiveToolErrors int // loop guard; default 3
 	MaxSameToolCallRepeats int // retry-storm guard; default 3
 	CompactionThresholdRatio float64 // fraction of model context to compact at; default 0.7
-	CriticSoftTimeout time.Duration // idle window before the critic wakes; default 90s
-	// CriticAbsoluteMax is the failsafe wall-clock ceiling for a critic-OWNED run
-	// (Ports.Critic set AND the agent enables it). For such a run MaxRuntime is the
-	// SOFT trigger, not a hard cap, and the critic's extendable backstop is the
-	// normal deadline — so this ceiling only fires if the critic never acts (a
-	// broken/nil host handle). Default 6h; never shorter than the run's MaxRuntime.
-	// Non-critic runs ignore it (they keep the literal MaxRuntime kill).
+	// CriticAbsoluteMax is the RUNAWAY ceiling for a critic-OWNED run (Ports.Critic
+	// set AND the agent enables it). For such a run MaxRuntime is the SOFT trigger,
+	// not a hard cap, and the critic's own extendable backstop is the normal
+	// deadline. This ceiling exists ONLY to stop a critic that never advances its
+	// deadline (a broken host handle) from running forever, so it is deliberately
+	// set FAR beyond any realistic backstop (default 24h): the host clamps its own
+	// backstop to a much smaller absolute max (e.g. mort's agents.critic.
+	// absolute_max_seconds = 6h), so the ceiling never pre-empts a healthy
+	// supervised run. Keep it well above the host's absolute max. Never shorter than
+	// the run's MaxRuntime. Non-critic runs ignore it (they keep the literal
+	// MaxRuntime kill).
 	CriticAbsoluteMax time.Duration
 }
@@ -58,11 +62,8 @@ func (d Defaults) withFallbacks() Defaults {
 	if d.CompactionThresholdRatio <= 0 {
 		d.CompactionThresholdRatio = 0.7
 	}
-	if d.CriticSoftTimeout <= 0 {
-		d.CriticSoftTimeout = 90 * time.Second
-	}
 	if d.CriticAbsoluteMax <= 0 {
-		d.CriticAbsoluteMax = 6 * time.Hour
+		d.CriticAbsoluteMax = 24 * time.Hour
 	}
 	return d
 }
@@ -289,19 +290,26 @@ func (e *Executor) Run(ctx context.Context, ra RunnableAgent, inv tool.Invocatio
 	// MaxRuntime becomes the SOFT trigger (passed to startCritic), and the
 	// critic's extendable backstop — watched in startCritic, which cancels via
 	// cancelCause — is the real deadline. A slow-but-progressing run is given
-	// room up to the backstop; only a stalled one is killed. We still wrap a
-	// GENEROUS WithTimeout at CriticAbsoluteMax so a broken/nil critic handle
-	// can't run unbounded; that ceiling never fires before the critic's backstop.
+	// room up to that backstop; only a stalled one is killed. The base context
+	// gets a WithTimeout at CriticAbsoluteMax (default 24h) purely as a RUNAWAY
+	// guard for a critic that never advances its deadline: it is set FAR beyond
+	// any realistic backstop (the host clamps its own backstop to a much smaller
+	// absolute max, e.g. mort's 6h convar), so it does NOT pre-empt a healthy
+	// supervised run. If the host critic fails to ARM (nil handle), the run is
+	// unsupervised and we tighten the cap back down to MaxRuntime below.
 	// A NESTED cause-carrying layer (cancelCause) lets a critic kill surface as a
 	// distinct "killed": only an ErrCriticKill cause is consulted in statusFor; a
 	// generic run error, a backstop expiry, or a caller cancel is classified by the
 	// run error itself.
-	criticOwnsDeadline := e.cfg.Ports.Critic != nil && ra.Critic.Enabled
+	criticOwns := e.criticOwnsDeadline(ra)
 	hardCap := maxRuntime
-	if criticOwnsDeadline {
+	if criticOwns {
+		// Runaway guard only — the critic's own (extendable) deadline-watch is the
+		// normal cap. Never shorter than the nominal budget, in case an operator
+		// sets MaxRuntime above the runaway ceiling (a degenerate config).
 		hardCap = e.cfg.Defaults.CriticAbsoluteMax
 		if hardCap < maxRuntime {
-			hardCap = maxRuntime // the failsafe ceiling is never shorter than the nominal budget
+			hardCap = maxRuntime
 		}
 	}
 	timeoutCtx, cancelTimeout := context.WithTimeout(context.WithoutCancel(ctx), hardCap)
@@ -310,9 +318,6 @@ func (e *Executor) Run(ctx context.Context, ra RunnableAgent, inv tool.Invocatio
 	defer cancelCause(nil)
 	runCtx, mergeCancel := MergeCancellation(runCtx, ctx)
 	defer mergeCancel()
-	// The finalize defer (top of Run) now has a run context to read the
-	// cancellation cause from (shutdown vs critic-kill vs deadline vs cancel).
-	checkpointCause = func() error { return context.Cause(runCtx) }

 	// Critic (optional): monitors the run for a stall, can nudge/extend/kill via
 	// its host Escalator. When it owns the deadline, MaxRuntime is its soft trigger
@@ -322,6 +327,22 @@ func (e *Executor) Run(ctx context.Context, ra RunnableAgent, inv tool.Invocatio
 	critic, stopCritic := e.startCritic(runCtx, cancelCause, ra, info, maxRuntime)
 	defer stopCritic()

+	// Unsupervised-run failsafe: the agent enabled the critic (so the base context
+	// got the generous runaway ceiling instead of MaxRuntime), but the host Monitor
+	// returned no handle — there is no deadline-watch. Without this the run would be
+	// bounded only by the 24h ceiling. Tighten it back to the nominal MaxRuntime so
+	// an unsupervised run can't hold its slot far past budget. mort's adapter always
+	// arms when the flag is set, so this is pure defence in depth.
+	if criticOwns && critic == nil {
+		var cancelUnsupervised context.CancelFunc
+		runCtx, cancelUnsupervised = context.WithTimeout(runCtx, maxRuntime)
+		defer cancelUnsupervised()
+	}
+	// The finalize defer (top of Run) now has a run context to read the
+	// cancellation cause from (shutdown vs critic-kill vs deadline vs cancel). Set
+	// AFTER the unsupervised-failsafe re-wrap so it reads the context the loop runs on.
+	checkpointCause = func() error { return context.Cause(runCtx) }
+
 	// Step instrumentation: accumulate Result.Steps + fire inv.OnStep, feed the
 	// audit recorder, and keep the live iteration counter fresh. majordomo's
 	// step observer hands us each completed iteration; we zip the model's tool