fix: address verified gadfly P4c findings (3-cloud fleet)
executus CI / test (pull_request) Successful in 1m39s
critic (all 3 models, HIGH):
- ExtendOnce was a single global one-shot shared across every run a System monitors, so only the FIRST run to stall got its extension and all others were killed by the backstop. Key the fired-state per run (RunInfo.RunID).
- Kill is now sticky: a `killed` flag short-circuits later ticks so a wavering Escalator returning ExtendBy after a Kill can't un-collapse the deadline; a Kill paired with Nudge/ExtendBy ignores the latter.
- watch() recovers panics from a misbehaving Escalator (logs; the run falls back to its existing deadline) instead of silently killing the watch goroutine.

checkpoint (deepseek, HIGH): handle.Save advanced the throttle clock BEFORE the store write, so a failed save was silently throttled away (caller believes it persisted). Advance lastSave only after a successful persist.

schedule (all 3):
- compute Next BEFORE Run: a permanently-unparseable cron now skips the job entirely instead of re-running it every tick forever;
- nil required callbacks return a validate() error instead of a first-tick nil panic;
- Loop recovers tick panics;
- the Mark-failure => possible-re-run trade-off is documented (Run must be idempotent).

+ tests for each.

Triaged-but-kept: critic backstopMul<=1 floor (it's a total-runtime multiple, so a floor >1 is intentional, not the reported footgun); checkpoint Load (nil,nil) on miss (documented convention).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -77,13 +77,18 @@ func TestKillCollapsesDeadline(t *testing.T) {
 func TestExtendOnceOnlyFiresOnce(t *testing.T) {
 	e := &ExtendOnce{By: time.Minute}
-	d1 := e.OnSoftTimeout(context.Background(), run.RunInfo{}, Progress{})
-	d2 := e.OnSoftTimeout(context.Background(), run.RunInfo{}, Progress{})
+	// Same run id: only the first call extends.
+	d1 := e.OnSoftTimeout(context.Background(), run.RunInfo{RunID: "r1"}, Progress{})
+	d2 := e.OnSoftTimeout(context.Background(), run.RunInfo{RunID: "r1"}, Progress{})
 	if d1.ExtendBy != time.Minute {
 		t.Errorf("first decision should extend, got %+v", d1)
 	}
 	if d2.ExtendBy != 0 || d2.Kill {
-		t.Errorf("second decision should be a no-op, got %+v", d2)
+		t.Errorf("second call for the same run should be a no-op, got %+v", d2)
 	}
+	// A DIFFERENT run still gets its own one extension (per-run, not global).
+	if d3 := e.OnSoftTimeout(context.Background(), run.RunInfo{RunID: "r2"}, Progress{}); d3.ExtendBy != time.Minute {
+		t.Errorf("a different run should get its own extension, got %+v", d3)
+	}
 }