fix(run): address gadfly review of the critic-deadline PR
executus CI / test (pull_request) Successful in 1m45s

All 11 findings were real (3 clusters):

- Failsafe ceiling could pre-empt the critic's backstop (e9c9483f, 9109317b,
  d5a9bf0d, 76ad171e): CriticAbsoluteMax was 6h, but the host's backstop
  (MaxRuntime × multiplier, or its own absolute max) can reach 6h+, so the
  ceiling fired first and reintroduced a premature hard cap. Now CriticAbsoluteMax
  is a 24h RUNAWAY guard set far beyond any realistic backstop (the host clamps
  its own backstop to a much smaller absolute max, e.g. mort's 6h convar), so it
  never pre-empts a healthy supervised run. Comments corrected.

- nil Monitor handle lost the MaxRuntime cap (df016a6f, 9dd42827): a critic-enabled
  run whose host Monitor returned no handle had no deadline-watch and was bounded
  only by the generous ceiling. Added an unsupervised-run failsafe that re-wraps
  runCtx to the nominal MaxRuntime when the critic is enabled but didn't arm.
  New test TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime.

- CriticSoftTimeout vestigial / dead fallback (f7764919, 9805bebe, 6864086f,
  b2b11721): the soft trigger is now always the resolved MaxRuntime (> 0), so the
  CriticSoftTimeout field + its startCritic fallback were unreachable. Removed the
  field entirely; the remaining 90s floor is documented as defensive-only.

- DRY (f30ce827): extracted e.criticOwnsDeadline(ra), now the single predicate used
  by both Run and startCritic so they can't drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X
This commit is contained in:
2026-06-30 11:32:46 -04:00
parent 5b5ee4148e
commit cb4c612461
4 changed files with 81 additions and 31 deletions
+2 -2
View File
@@ -61,8 +61,8 @@ func TestCriticRaisesStepCeiling(t *testing.T) {
Registry: tool.NewRegistry(),
Models: func(ctx context.Context, _ string) (context.Context, llm.Model, error) { return ctx, m, nil },
Ports: run.Ports{Critic: &fakeCritic{h: h}},
// large soft timeout so the deadline-watch never interferes in the test
Defaults: run.Defaults{CriticSoftTimeout: time.Hour},
// The fake handle's Deadline() is zero (no hard deadline), so the
// deadline-watch never interferes regardless of the soft trigger.
})
res := ex.Run(context.Background(),
run.RunnableAgent{Name: "x", ModelTier: "m", MaxIterations: 1, Critic: run.CriticConfig{Enabled: true}},