P4c: remaining batteries — checkpoint + schedule + critic #6

Closed
steve wants to merge 5 commits from phase-4c-batteries into phase-4b-skill
Owner

Completes the P4 battery set (stacked on #5). Three batteries that plug into the remaining run.Ports seams. NOTE: the executor's call sites for Critic/Checkpointer are a P2 follow-up — these provide the seams + impls + defaults ahead of that wiring, each nil-safe and core-import-clean (verified core → battery = 0).

Batteries

  • checkpoint/run.Checkpointer durable-resume: CheckpointStore seam + a per-run handle (throttled Save, Complete/Fail delete) + Memory default (with the honest caveat that in-memory doesn't survive the restart it recovers from). ListInterrupted is the boot-recovery query.
  • schedule/ — a generic cron Runner (Tick = one pass, Loop = run on an interval). Every dependency is wired by the host (Due/Run/Mark/Next), so the battery owns no cron grammar and never duplicates the parser. A bad job is logged + left due (retries); only a failing Due lister is pass-fatal.
  • critic/ — a two-tier timeout watchdog (run.Critic). executus owns the deterministic mechanics (activity tracking, fire-once-per-idle-period on soft timeout, extendable deadline, hard-kill backstop); the policy is the Escalator seam (nudge/extend/kill). Mort plugs its LLM critic-agent in here; ExtendOnce is the default. Race-tested.

Also in this PR

  • gadfly config update (ci(gadfly)): image → sha-d0de034, and ollama-cloud concurrency moved from the model axis to the lens axis (PROVIDER_CONCURRENCY=1, PROVIDER_LENS_CONCURRENCY=3) — so this PR's own review runs one model at a time with its 3 lenses concurrent (first comment sooner; the other two models in series).

Verification

  • go build/vet/test -race ./... green; core go.sum still free of host/DB deps.

P4 status

Battery set complete: audit, budget, persona, skill (nouns), checkpoint, schedule, critic + contrib/store SQLite for budget/persona/skill/audit. Next: P5 (gadfly-on-core canary) and P6 (mort rewrite).

🤖 Generated with Claude Code

Completes the P4 battery set (stacked on #5). Three batteries that plug into the remaining `run.Ports` seams. NOTE: the executor's call sites for Critic/Checkpointer are a P2 follow-up — these provide the seams + impls + defaults ahead of that wiring, each nil-safe and **core-import-clean** (verified `core → battery` = 0). ## Batteries - **`checkpoint/`** — `run.Checkpointer` durable-resume: `CheckpointStore` seam + a per-run `handle` (throttled `Save`, `Complete`/`Fail` delete) + `Memory` default (with the honest caveat that in-memory doesn't survive the restart it recovers from). `ListInterrupted` is the boot-recovery query. - **`schedule/`** — a generic cron `Runner` (`Tick` = one pass, `Loop` = run on an interval). Every dependency is wired by the host (`Due`/`Run`/`Mark`/`Next`), so the battery owns **no cron grammar** and never duplicates the parser. A bad job is logged + left due (retries); only a failing `Due` lister is pass-fatal. - **`critic/`** — a two-tier timeout watchdog (`run.Critic`). executus owns the deterministic mechanics (activity tracking, fire-once-per-idle-period on soft timeout, extendable deadline, hard-kill backstop); the **policy** is the `Escalator` seam (nudge/extend/kill). Mort plugs its LLM critic-agent in here; `ExtendOnce` is the default. Race-tested. ## Also in this PR - **gadfly config update** (`ci(gadfly)`): image → `sha-d0de034`, and ollama-cloud concurrency moved from the model axis to the **lens** axis (`PROVIDER_CONCURRENCY=1`, `PROVIDER_LENS_CONCURRENCY=3`) — so this PR's own review runs one model at a time with its 3 lenses concurrent (first comment sooner; the other two models in series). ## Verification - `go build/vet/test -race ./...` green; core go.sum still free of host/DB deps. ## P4 status Battery set complete: **audit, budget, persona, skill (nouns), checkpoint, schedule, critic** + `contrib/store` SQLite for budget/persona/skill/audit. Next: P5 (gadfly-on-core canary) and P6 (mort rewrite). 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 4 commits 2026-06-27 03:04:57 +00:00
Bump the gadfly image to sha-d0de034 (adds GADFLY_PROVIDER_LENS_CONCURRENCY)
and move ollama-cloud's concurrency from the MODEL axis to the LENS axis:
- GADFLY_PROVIDER_CONCURRENCY: ollama-cloud=1 (one model at a time)
- GADFLY_PROVIDER_LENS_CONCURRENCY: ollama-cloud=3 (its 3 lenses concurrent)

Net: still 3 models, but reviewed serially — the first model's consolidated
comment lands sooner and each model finishes faster, while the other two
models' comments arrive in series after it (instead of all 3 in parallel).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Plugs into run.Ports.Checkpointer (the executor's call site is a P2 follow-up;
this provides the seam + impls ahead of it):
- checkpoint.go: CheckpointStore seam + RunCheckpoint{Meta, Messages, Iteration,
  ActivePhase} + RunCheckpointMeta (mirrors mort's agentexec types).
- handle.go: New(store, meta, throttle, now) -> run.Checkpointer. Save writes a
  throttled snapshot; Complete/Fail delete it (a cleanly finished or terminally
  failed run is NOT a recovery candidate; a shutdown-interrupted run never calls
  them, so its checkpoint survives ListInterrupted at boot). nil store -> no-op.
- memory.go: NewMemory() default (with the honest caveat that in-memory does
  not survive the restart it exists to recover from — a durable store is mort's).

Tests: save+complete clears the recovery candidate; throttle skips in-window
saves; nil-store is a clean no-op. Core imports ZERO from checkpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A host-agnostic ticker (Tick = one pass; Loop = run on an interval until ctx
done) that fires due jobs. Every dependency is wired by the host:
- Due lists due jobs (skill.ListDueScheduled / persona.ListScheduledAgents),
- Run executes one (run.Executor),
- Mark stamps the next fire (store.MarkScheduledRun),
- Next computes the cron next-fire (a cron lib / skill's parser).

The battery owns NO cron grammar, so it never duplicates the parser. A job whose
Run or Next errors is logged and left un-stamped (stays due, retries next tick)
— one bad job can't stall the others; only a failing Due lister is pass-fatal.

Tests: due jobs run + stamped, bad-cron job runs but isn't stamped, a failing
Run doesn't stamp or stall siblings, Due error surfaces. Core imports ZERO.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
P4: critic battery — two-tier timeout watchdog + Escalator seam
executus CI / test (pull_request) Successful in 1m35s
Adversarial Review (Gadfly) / review (pull_request) Successful in 9m21s
e5cab5525e
The last Tier-2 battery, plugging into run.Ports.Critic (executor call site is a
P2 follow-up). Clean split of concerns:
- executus owns the deterministic MECHANICS: System.Monitor returns a
  run.CriticHandle that tracks activity (RecordStep/RecordToolStart), and a
  watcher goroutine fires once per idle period a run crosses its soft timeout,
  applies the decision (queue Steer nudges / extend the Deadline / collapse it
  to now on Kill), and enforces a hard-kill backstop (softTimeout * mul).
- the POLICY is the Escalator seam (nudge/extend/kill/escalate). Mort plugs its
  LLM critic-agent in here; ExtendOnce is the zero-dependency default (extend
  once, then let the backstop kill a truly hung run).

Race-tested: escalate-once-per-idle-period with re-arm on fresh activity, Kill
collapses the deadline, ExtendOnce fires once, zero soft-timeout => nil handle.
Core imports ZERO from critic.

This completes the P4 battery set: audit, budget, persona, skill, checkpoint,
schedule, critic — each nil-safe, each with a default, each core-import-clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No os/exec, no SQL, no HTTP, no templates in the three new packages — all DB-touching code lives in contrib/store (out of scope for this diff, and that's host-supplied SQL anyway).

There's one minor subtlety worth noting: schedule.Runner.Run and schedule.Runner.Mark are called sequentially on the host-supplied goroutine, and the same Runner.Due/Mark callbacks will eventually call into whatever storage mort plugs in — but that wiring is host-side, not in this diff, and the same trust boundary already exists in the rest of the codebase. Nothing in this PR creates a new injection sink.

The CI workflow change (gadfly:sha-d0de034, concurrency shifted to lens axis) is pure config — no secrets expanded into the diff beyond secrets.GITEA_TOKEN which was already there. No new secret leakage introduced.

One small CI note worth flagging: the new workflow pins a specific image sha (sha-d0de034) — pinning by sha is good practice (avoids tag-mutation supply-chain risk), and no integrity verification is added beyond the registry's own. That's consistent with the prior pin (sha-6e3a83c) and not a regression.

VERDICT: No material issues found

Through the security lens:

  • Injection sinks: I checked schedule/runner.go, checkpoint/{handle,memory,checkpoint}.go, and critic/critic.go for any os/exec, database/sql, net/http, or text/template usage. None present in this diff (all SQL/DB code lives in contrib/store, not touched here). The only string-handling that touches host-supplied data (j.ID, j.Cron) is passing them back to host-supplied Run/Mark/Next callbacks via structured slog fields — no format-string eval, no shell.
  • Tainted-input tracing: Messages []llm.Message in RunCheckpointState flows into CheckpointStore.Save and only the in-memory map implementation is shipped here — no JSON marshaling, no SQL interpolation, no template rendering. RunInfo fields flow into Escalator.OnSoftTimeout, but the Escalator is host-supplied and runs in-process on the kernel goroutine; no untrusted-code boundary is crossed.
  • Concurrency / TOCTOU: handle.Save correctly uses a mutex around lastSave; Memory uses sync.RWMutex; critic.handle uses sync.Mutex for all shared state. No obvious data races.
  • Authn/authz, secrets, SSRF, unsafe deserialization, insecure defaults: not applicable to these three packages — they are orchestration primitives that delegate persistence/IO to host-supplied implementations behind interfaces. Verified by grepping the new packages for http, exec, sql, template and finding none.
  • CI config: the workflow change pins the gadfly image by content hash (sha-d0de034) and shifts concurrency config — supply-chain posture is unchanged (sha pinning is good; same as the prior pin), and the only secret referenced is the pre-existing secrets.GITEA_TOKEN.

Verified by reading schedule/runner.go, checkpoint/handle.go, checkpoint/memory.go, critic/critic.go, run/ports.go, and grepping the diff paths for injection sinks.

🎯 Correctness — Blocking issues found

VERDICT: Blocking issues found

Findings

  • critic/critic.go:54-74ExtendOnce is per-Escalator, but the contract is per-run (real bug, verified).

    • The doc on lines 50-53 says "the first time a run stalls it extends the deadline by By ... then takes no further action" — per-run language.
    • The implementation stores fired on the ExtendOnce struct itself (line 58), behind a mu. System (line 78) holds one esc Escalator (line 79); New (line 89-97) constructs a single &ExtendOnce{} when esc is nil; Monitor (line 102-125) reuses s.esc for every handle it creates. The tick (line 215) calls h.sys.esc.OnSoftTimeout(...) on the shared esc.
    • Net effect: the second run whose soft-timeout elapses sees e.fired == true and gets Decision{} — no extension, no nudge. The per-handle escalatedAt de-duplication (line 211) does NOT prevent this because each handle has its own escalatedAt; it only re-arms within a single handle, not across runs sharing one esc.
    • The existing test (critic_test.go:79-89, TestExtendOnceOnlyFiresOnce) calls OnSoftTimeout twice on the same ExtendOnce{} instance with different RunInfo values, but it never wires two Monitors through one System with the default esc, so the bug is not exercised.
    • Impact: a host using the documented default behavior (nil esc ⇒ ExtendOnce) silently loses the "one extension per stall" guarantee for every run after the first. Under fanout load all but the first stalling run are killed prematurely; single-run deployments only appear safe because there is no second run.
    • Suggested fix: key the fired-state by run.RunInfo.RunID (e.g. map[string]struct{} protected by mu) so each run gets its own one-shot.
  • critic/critic.go:202-227Decision.Kill is not sticky across re-arms (real but narrow bug, verified).

    • On a Kill tick, deadline is collapsed to h.now() (line 226). The per-handle escalatedAt is set to lastActivity at the time of that tick (line 211). If the executor hasn't yet observed the collapsed deadline and calls RecordStep before its next cancellation check (line 145-150), lastActivity advances, so a subsequent tick sees h.escalatedAt != h.lastActivity and the gate idle < softTimeout || escalatedAt.Equal(lastActivity) (line 207) lets it through once a fresh idle period exceeds softTimeout. The Escalator is called again; if it now returns ExtendBy > 0, line 222-224 does h.deadline = h.deadline.Add(d.ExtendBy) — pushing the just-collapsed deadline back into the future.
    • ExtendOnce is immune (its fired flag sticks at the Escalator level), but any Escalator that returns a mix of Kill and ExtendBy across calls — e.g. a wavering LLM critic that decides to "give it one more try" after first issuing a kill — is susceptible. Also susceptible is any Escalator that returns Kill once and then Decision{} (which would leave the deadline collapsed but lose the kill intent if a subsequent ExtendBy ever arrives).
    • Suggested fix: track killed bool on the handle; once set, tick() short-circuits regardless of Escalator output. Or, equivalently, set escalatedAt to a sentinel that no RecordStep can produce after Kill.
  • checkpoint/handle.go:42-58Save writes via h.store.Save outside the throttle mutex (verified clean, no bug).

    • Confirmed at lines 42-58. The throttle check, the lastSave = now update, and the h.mu.Unlock() happen under the mutex; h.store.Save runs after the unlock. The gate is monotonic (now.Sub(h.lastSave) < h.throttle) and lastSave is updated before the write, so a concurrent caller cannot undercount saves. No correctness issue here.
  • schedule/runner.go:60-81 — failed Mark is logged but the job is still effectively considered "ran" for this tick (verified, design choice, not a bug).

    • Confirmed at lines 76-78. A Mark failure after a successful Run leaves next_run_at unchanged, so the same job will be picked up next tick and re-run. The doc-comment on lines 56-59 already frames this as the safety/correctness trade-off (skipping Mark is better than skipping the work). Not material.
  • checkpoint/memory.go:30-38Load returns nil, nil for missing entries (verified, intentional).

    • Confirmed at line 35. Callers must distinguish "no checkpoint" from "error". A doc note on CheckpointStore.Load in checkpoint.go:47 ("returns (nil, nil) if no checkpoint exists") would prevent future misreading but is not blocking.

Summary

The ExtendOnce per-run/per-Escalator mismatch is a real blocking correctness bug that contradicts the documented contract and will manifest under any multi-run workload that uses the default Escalator. The Kill-stickiness issue is real but narrower and only bites Escalators that issue inconsistent decisions across idle periods. Everything else I traced checks out.

🧯 Error handling & edge cases — Minor issues

Error-handling & edge-case review — 🧯

VERDICT: Minor issues

Findings

  • critic/critic.go:202-215 — escalatedAt de-dup is racy across the OnSoftTimeout call. Between h.mu.Unlock() at L213 and the re-lock at L217, a concurrent RecordStep (L145-150) can advance lastActivity. The next tick will see escalatedAt != lastActivity and re-escalate against what is effectively the same idle period (the post-OnSoftTimeout lock at L217-227 doesn't re-check the dedup condition). A fast step-then-stall sequence can double-fire the Escalator. Fix: snapshot the decision inputs (lastActivity, escalatedAt) and re-check escalatedAt.Equal(lastActivity) inside the second locked section before applying the Decision, or hold the mu across OnSoftTimeout (the Escalator is documented as policy and must be cheap; if it isn't, document the expectation).

  • checkpoint/memory.go:30-38 — Load returns (nil, nil) for a missing key. Conventional Go returns (nil, ErrNotFound) (or an explicit sentinel) so callers can't conflate "no checkpoint because the run finished cleanly" with "Save never landed". Verified by reading the interface (checkpoint.go:45-49) and the Memory impl. Impact: a future durable CheckpointStore (mort's durable-job table, contrib/store SQLite) has to invent its own convention and Memory/SQLite will diverge silently. Fix: define var ErrNoCheckpoint = errors.New("no checkpoint") in checkpoint.go and return it from Memory.Load; have callers (and the package docs) say "check errors.Is(err, ErrNoCheckpoint)".

  • schedule/runner.go:67-78 — Tick retries failing jobs forever with no backoff. Verified against the docstring: a Run failure (L67-70) or a Mark failure (L76-78) logs and continues without stamping, so the job stays due and re-runs next tick indefinitely. This conflates transient and permanent failures and amplifies load against a flapping downstream (especially Mark — if the store is down, every due job retries every Interval). Fix: add a per-job retry budget (e.g., Runner.MaxRetries or a circuit-breaker map keyed by job id), or at minimum a linear/exponential backoff stamp on failure instead of leaving nextAt unchanged.

  • schedule/runner.go:62-78 — nil Due/Run/Mark/Next panic with nil deref. Verified: none of r.Due (L62), r.Run (L67), r.Next (L71), r.Mark (L76) is guarded. Docstring says "Every func field is required except Now and Logger" but a partial Runner compiles silently and crashes at the first tick. Fix: add a Validate() method that returns an error if any required func is nil, and have Loop call it once at start; or have each call site return a wrapped error from Tick so the host learns about misconfiguration instead of taking a nil deref.

  • schedule/runner.go:86-103 — Loop has no panic recovery; a nil-func panic in Tick kills the scheduler goroutine silently. This is distinct from the validation finding above: even after the immediate nil-deref is fixed, any future panic from a Due/Run/Next/Mark impl would unwind through Tick, out of Loop's select, and terminate the scheduler with no log line. Loop currently only catches errors returned from Tick (L98-100), not panics. Fix: wrap the Tick call in defer recover() that logs and continues, so a single bad tick doesn't silently kill the cron runner for the process lifetime.

  • critic/critic.go:187-227 — watch/tick do not recover panics from OnSoftTimeout (or from the post-decision apply). A misbehaving Escalator that panics takes down the watch goroutine; the run then has no critic (no steer, no deadline extension, no kill) for the rest of its lifetime, with no log. Verified: watch (L187-200) has no defer recover(), and tick (L202-228) has none either. Fix: add defer recover() in watch (logs and returns — the goroutine exits, but at least it's logged), or in tick (logs, treats the panic as a Decision{}, and continues). Watch-level recovery is sufficient and simpler.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 19s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> No `os/exec`, no SQL, no HTTP, no templates in the three new packages — all DB-touching code lives in `contrib/store` (out of scope for this diff, and that's host-supplied SQL anyway). There's one minor subtlety worth noting: `schedule.Runner.Run` and `schedule.Runner.Mark` are called sequentially on the host-supplied goroutine, and the same `Runner.Due/Mark` callbacks will eventually call into whatever storage mort plugs in — but that wiring is host-side, not in this diff, and the same trust boundary already exists in the rest of the codebase. Nothing in this PR creates a new injection sink. The CI workflow change (`gadfly:sha-d0de034`, concurrency shifted to lens axis) is pure config — no secrets expanded into the diff beyond `secrets.GITEA_TOKEN` which was already there. No new secret leakage introduced. One small CI note worth flagging: the new workflow pins a specific image sha (`sha-d0de034`) — pinning by sha is good practice (avoids tag-mutation supply-chain risk), and no integrity verification is added beyond the registry's own. That's consistent with the prior pin (`sha-6e3a83c`) and not a regression. VERDICT: No material issues found Through the security lens: - **Injection sinks**: I checked `schedule/runner.go`, `checkpoint/{handle,memory,checkpoint}.go`, and `critic/critic.go` for any `os/exec`, `database/sql`, `net/http`, or `text/template` usage. None present in this diff (all SQL/DB code lives in `contrib/store`, not touched here). The only string-handling that touches host-supplied data (`j.ID`, `j.Cron`) is passing them back to host-supplied `Run`/`Mark`/`Next` callbacks via structured `slog` fields — no format-string eval, no shell. - **Tainted-input tracing**: `Messages []llm.Message` in `RunCheckpointState` flows into `CheckpointStore.Save` and only the in-memory map implementation is shipped here — no JSON marshaling, no SQL interpolation, no template rendering. `RunInfo` fields flow into `Escalator.OnSoftTimeout`, but the Escalator is host-supplied and runs in-process on the kernel goroutine; no untrusted-code boundary is crossed. - **Concurrency / TOCTOU**: `handle.Save` correctly uses a mutex around `lastSave`; `Memory` uses `sync.RWMutex`; `critic.handle` uses `sync.Mutex` for all shared state. No obvious data races. - **Authn/authz, secrets, SSRF, unsafe deserialization, insecure defaults**: not applicable to these three packages — they are orchestration primitives that delegate persistence/IO to host-supplied implementations behind interfaces. Verified by grepping the new packages for `http`, `exec`, `sql`, `template` and finding none. - **CI config**: the workflow change pins the gadfly image by content hash (`sha-d0de034`) and shifts concurrency config — supply-chain posture is unchanged (sha pinning is good; same as the prior pin), and the only secret referenced is the pre-existing `secrets.GITEA_TOKEN`. Verified by reading `schedule/runner.go`, `checkpoint/handle.go`, `checkpoint/memory.go`, `critic/critic.go`, `run/ports.go`, and grepping the diff paths for injection sinks. </details> <details><summary><b>🎯 Correctness</b> — Blocking issues found</summary> # VERDICT: Blocking issues found ## Findings - **`critic/critic.go:54-74` — `ExtendOnce` is per-Escalator, but the contract is per-run (real bug, verified).** - The doc on lines 50-53 says "*the first time a run stalls it extends the deadline by `By` ... then takes no further action*" — per-run language. - The implementation stores `fired` on the `ExtendOnce` struct itself (line 58), behind a `mu`. `System` (line 78) holds one `esc Escalator` (line 79); `New` (line 89-97) constructs a single `&ExtendOnce{}` when esc is nil; `Monitor` (line 102-125) reuses `s.esc` for every handle it creates. The `tick` (line 215) calls `h.sys.esc.OnSoftTimeout(...)` on the shared esc. - Net effect: the second run whose soft-timeout elapses sees `e.fired == true` and gets `Decision{}` — no extension, no nudge. The per-handle `escalatedAt` de-duplication (line 211) does NOT prevent this because each handle has its own `escalatedAt`; it only re-arms within a single handle, not across runs sharing one esc. - The existing test (`critic_test.go:79-89`, `TestExtendOnceOnlyFiresOnce`) calls `OnSoftTimeout` twice on the same `ExtendOnce{}` instance with different `RunInfo` values, but it never wires two `Monitor`s through one `System` with the default esc, so the bug is not exercised. - Impact: a host using the documented default behavior (nil esc ⇒ `ExtendOnce`) silently loses the "one extension per stall" guarantee for every run after the first. Under fanout load all but the first stalling run are killed prematurely; single-run deployments only appear safe because there is no second run. - Suggested fix: key the fired-state by `run.RunInfo.RunID` (e.g. `map[string]struct{}` protected by `mu`) so each run gets its own one-shot. - **`critic/critic.go:202-227` — `Decision.Kill` is not sticky across re-arms (real but narrow bug, verified).** - On a `Kill` tick, `deadline` is collapsed to `h.now()` (line 226). The per-handle `escalatedAt` is set to `lastActivity` at the time of that tick (line 211). If the executor hasn't yet observed the collapsed deadline and calls `RecordStep` before its next cancellation check (line 145-150), `lastActivity` advances, so a subsequent tick sees `h.escalatedAt != h.lastActivity` and the gate `idle < softTimeout || escalatedAt.Equal(lastActivity)` (line 207) lets it through once a fresh idle period exceeds softTimeout. The Escalator is called again; if it now returns `ExtendBy > 0`, line 222-224 does `h.deadline = h.deadline.Add(d.ExtendBy)` — pushing the just-collapsed deadline back into the future. - `ExtendOnce` is immune (its `fired` flag sticks at the Escalator level), but any Escalator that returns a mix of `Kill` and `ExtendBy` across calls — e.g. a wavering LLM critic that decides to "give it one more try" after first issuing a kill — is susceptible. Also susceptible is any Escalator that returns `Kill` once and then `Decision{}` (which would leave the deadline collapsed but lose the kill intent if a subsequent `ExtendBy` ever arrives). - Suggested fix: track `killed bool` on the handle; once set, `tick()` short-circuits regardless of Escalator output. Or, equivalently, set `escalatedAt` to a sentinel that no `RecordStep` can produce after Kill. - **`checkpoint/handle.go:42-58` — `Save` writes via `h.store.Save` outside the throttle mutex (verified clean, no bug).** - Confirmed at lines 42-58. The throttle check, the `lastSave = now` update, and the `h.mu.Unlock()` happen under the mutex; `h.store.Save` runs after the unlock. The gate is monotonic (`now.Sub(h.lastSave) < h.throttle`) and `lastSave` is updated before the write, so a concurrent caller cannot undercount saves. No correctness issue here. - **`schedule/runner.go:60-81` — failed `Mark` is logged but the job is still effectively considered "ran" for this tick (verified, design choice, not a bug).** - Confirmed at lines 76-78. A `Mark` failure after a successful `Run` leaves `next_run_at` unchanged, so the same job will be picked up next tick and re-run. The doc-comment on lines 56-59 already frames this as the safety/correctness trade-off (skipping `Mark` is better than skipping the work). Not material. - **`checkpoint/memory.go:30-38` — `Load` returns `nil, nil` for missing entries (verified, intentional).** - Confirmed at line 35. Callers must distinguish "no checkpoint" from "error". A doc note on `CheckpointStore.Load` in `checkpoint.go:47` ("returns (nil, nil) if no checkpoint exists") would prevent future misreading but is not blocking. ## Summary The `ExtendOnce` per-run/per-Escalator mismatch is a real blocking correctness bug that contradicts the documented contract and will manifest under any multi-run workload that uses the default Escalator. The Kill-stickiness issue is real but narrower and only bites Escalators that issue inconsistent decisions across idle periods. Everything else I traced checks out. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> ## Error-handling & edge-case review — 🧯 VERDICT: Minor issues ### Findings - **critic/critic.go:202-215 — `escalatedAt` de-dup is racy across the `OnSoftTimeout` call.** Between `h.mu.Unlock()` at L213 and the re-lock at L217, a concurrent `RecordStep` (L145-150) can advance `lastActivity`. The next tick will see `escalatedAt != lastActivity` and re-escalate against what is effectively the same idle period (the post-`OnSoftTimeout` lock at L217-227 doesn't re-check the dedup condition). A fast step-then-stall sequence can double-fire the Escalator. Fix: snapshot the decision inputs (`lastActivity`, `escalatedAt`) and re-check `escalatedAt.Equal(lastActivity)` inside the second locked section before applying the Decision, or hold the mu across `OnSoftTimeout` (the Escalator is documented as policy and must be cheap; if it isn't, document the expectation). - **checkpoint/memory.go:30-38 — `Load` returns `(nil, nil)` for a missing key.** Conventional Go returns `(nil, ErrNotFound)` (or an explicit sentinel) so callers can't conflate "no checkpoint because the run finished cleanly" with "Save never landed". Verified by reading the interface (`checkpoint.go:45-49`) and the `Memory` impl. Impact: a future durable `CheckpointStore` (mort's durable-job table, contrib/store SQLite) has to invent its own convention and `Memory`/`SQLite` will diverge silently. Fix: define `var ErrNoCheckpoint = errors.New("no checkpoint")` in `checkpoint.go` and return it from `Memory.Load`; have callers (and the package docs) say "check `errors.Is(err, ErrNoCheckpoint)`". - **schedule/runner.go:67-78 — `Tick` retries failing jobs forever with no backoff.** Verified against the docstring: a `Run` failure (L67-70) or a `Mark` failure (L76-78) logs and `continue`s without stamping, so the job stays due and re-runs next tick indefinitely. This conflates transient and permanent failures and amplifies load against a flapping downstream (especially `Mark` — if the store is down, every due job retries every `Interval`). Fix: add a per-job retry budget (e.g., `Runner.MaxRetries` or a circuit-breaker map keyed by job id), or at minimum a linear/exponential backoff stamp on failure instead of leaving `nextAt` unchanged. - **schedule/runner.go:62-78 — nil `Due`/`Run`/`Mark`/`Next` panic with nil deref.** Verified: none of `r.Due` (L62), `r.Run` (L67), `r.Next` (L71), `r.Mark` (L76) is guarded. Docstring says "Every func field is required except Now and Logger" but a partial `Runner` compiles silently and crashes at the first tick. Fix: add a `Validate()` method that returns an error if any required func is nil, and have `Loop` call it once at start; or have each call site return a wrapped error from `Tick` so the host learns about misconfiguration instead of taking a nil deref. - **schedule/runner.go:86-103 — `Loop` has no panic recovery; a nil-func panic in `Tick` kills the scheduler goroutine silently.** This is distinct from the validation finding above: even after the immediate nil-deref is fixed, any future panic from a `Due`/`Run`/`Next`/`Mark` impl would unwind through `Tick`, out of `Loop`'s `select`, and terminate the scheduler with no log line. `Loop` currently only catches errors returned from `Tick` (L98-100), not panics. Fix: wrap the `Tick` call in `defer recover()` that logs and continues, so a single bad tick doesn't silently kill the cron runner for the process lifetime. - **critic/critic.go:187-227 — `watch`/`tick` do not recover panics from `OnSoftTimeout` (or from the post-decision apply).** A misbehaving Escalator that panics takes down the watch goroutine; the run then has no critic (no steer, no deadline extension, no kill) for the rest of its lifetime, with no log. Verified: `watch` (L187-200) has no `defer recover()`, and `tick` (L202-228) has none either. Fix: add `defer recover()` in `watch` (logs and returns — the goroutine exits, but at least it's logged), or in `tick` (logs, treats the panic as a `Decision{}`, and continues). Watch-level recovery is sufficient and simpler. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 19s</sub>

🪰 Gadfly review — deepseek-v4-flash:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

I reviewed the three new battery packages (checkpoint/, critic/, schedule/) and the CI/CLAUDE.md changes through the security lens. No material security issues were found.

Analysis summary:

  • No injection vectors. All three batteries operate through abstracted interfaces/seams (CheckpointStore, Escalator, Runner function fields) that the host wires. The batteries never directly handle SQL, shell commands, or template execution. Data flows from host-provided callbacks through the battery and back to host-provided callbacks — the battery is pure orchestration.

  • No untrusted input paths. The schedule.Runner passes Due.ID and Due.Cron strings from the host's Due callback to the host's Run/Next callbacks. The critic package passes run.RunInfo and Progress to the host's Escalator. The checkpoint package stores []llm.Message through the host's CheckpointStore. In every case the host controls both the producer and consumer of the data; the battery introduces no new trust boundary.

  • No secret/credential leakage. The CI workflow change passes GITEA_TOKEN as an env var to the Docker container, but this is a pre-existing pattern unchanged by this PR (only the image tag and concurrency settings were modified).

  • No unsafe defaults. ExtendOnce (default Escalator) fires at most once. The noop checkpointer is a safe nil-store default. The schedule.Runner defaults to time.Now and slog.Default() — no dangerous fallbacks.

  • No SSRF, deserialization, or authn/authz gaps. These batteries don't make network calls, deserialize untrusted data, or implement access control — those concerns live in the host layer.

The code is clean from a security perspective.

🎯 Correctness — Minor issues

🎯 Correctness — Verdict: Minor issues

Findings

1. checkpoint/handle.go:50-56lastSave updated before the store write, so a failed save is invisible to the throttle

h.lastSave = now
h.mu.Unlock()
return h.store.Save(ctx, RunCheckpoint{...})

If h.store.Save returns an error, lastSave has already been bumped. Every subsequent Save call within the throttle window will be silently skipped (the throttle check sees now.Sub(lastSave) < throttle and returns nil). The checkpoint is therefore lost — the caller believes it was persisted but it wasn't, and the throttle prevents retries until the window expires.

Fix: Move the h.lastSave = now assignment after the successful store write, or use a two-phase approach (save first, then update the timestamp under the lock).


2. schedule/runner.go:72-80Mark failure after a successful Run causes duplicate execution

if err := r.Run(ctx, j.ID); err != nil {
    r.log().Warn(...)
    continue
}
next, err := r.Next(j.Cron, now)
if err != nil {
    r.log().Warn(...)
    continue
}
if err := r.Mark(ctx, j.ID, now, next); err != nil {
    r.log().Warn("failed to stamp scheduled job's next run", ...)
}

When Run succeeds but Mark fails, the job was already executed but its next-run time was never updated. On the next Tick, the job is still due and runs again. For idempotent jobs this is merely wasteful; for non-idempotent ones (e.g. sending a notification, charging a credit) it is a correctness bug — the same work is performed twice.

The same problem applies when Run succeeds but Next fails (unparseable cron): the job runs every tick forever.

Fix: Either (a) make Mark failure fatal to the job (skip Run on the next tick by recording a temporary "don't retry" marker), or (b) document this as an intentional design choice and add a note that callers must make their Run handler idempotent. The current code silently assumes idempotency without enforcing or documenting it.


3. schedule/runner.go:33-40 — Required function fields (Due, Run, Mark, Next) have no nil guard

The doc comment says "Every func field is required except Now and Logger", but Tick calls r.Due(ctx, now), r.Run(ctx, j.ID), r.Next(j.Cron, now), and r.Mark(ctx, j.ID, now, next) without nil-checking any of them. A zero-value Runner{} panics on the first Tick.

Fix: Add nil checks in Tick (or a constructor) that return a clear error instead of panicking, or make the fields safe by providing no-op defaults (like Now and Logger already have).


4. critic/critic.go:93-97backstopMul <= 1 silently overrides to 3, but backstopMul = 1 is a valid (if tight) choice

if backstopMul <= 1 {
    backstopMul = 3
}

A host that explicitly passes backstopMul = 1 (hard deadline == soft timeout, meaning the run is killed the instant it idles past the soft timeout) has its choice silently discarded. While backstopMul = 1 is unusual, silently overriding it violates the principle of least surprise. The condition should be backstopMul < 1 (treat only strictly-less-than-1 as "unset"), or the doc should state that <= 1 is treated as "use default 3".

Fix: Change <= 1 to < 1 so that backstopMul = 1 is accepted as an intentional (if tight) configuration.


5. critic/critic.go:93New does not set checkInterval, so it is always zero and recomputed per Monitor call

This is not a bug per se, but it means every Monitor call re-derives the same interval from softTimeout / 2. If a System monitors runs with different soft timeouts, each run gets a different check frequency. If that's intentional, fine — but the field is exported (checkInterval is lowercase, so it's unexported — actually it's lowercase, so it's package-private). The test sets it directly (s.checkInterval = 5 * time.Millisecond), which works because they're in the same package. No action needed, just noting the design.


No material issues found in the checkpoint/memory.go store, the checkpoint interface definitions, the critic ExtendOnce logic, the critic de-duplication by escalatedAt.Equal(lastActivity), or the schedule Loop/Tick control flow — those are all correct after verification against run/ports.go.

🧯 Error handling & edge cases — Blocking issues found

All three findings are confirmed against the actual code. No findings need to be dropped.

🧯 Error handling & edge cases — Verdict: Blocking issues found

Blocking

  • checkpoint/handle.go:49-52 — Throttle timer advanced before store save, causing silent data loss on transient errors.
    h.lastSave = now is set before calling h.store.Save. If the underlying store save fails (e.g. network hiccup, disk full), the caller receives the error — but the throttle window has already been consumed. Any subsequent Save call within the throttle period is silently dropped (returns nil), even though the previous snapshot was never persisted. The caller believes the retry succeeded when in fact the data is lost.
    Fix: Move h.lastSave = now to after a successful h.store.Save(...):

    h.mu.Unlock()
    err := h.store.Save(ctx, RunCheckpoint{...})
    if err != nil {
        return err
    }
    h.mu.Lock()
    h.lastSave = now
    h.mu.Unlock()
    return nil
    

Minor

  • schedule/runner.go:60-81 — Nil func fields cause panic instead of a clear error.
    Tick calls r.Due, r.Run, r.Next, and r.Mark without nil-guards. The doc says "Every func field is required" but a host that forgets to wire one gets a confusing runtime panic at the first tick, not a clear error at construction or on first call. No constructor exists (the struct is built directly).
    Fix: Add a constructor or validate at the top of Tick/Loop:

    if r.Due == nil || r.Run == nil || r.Mark == nil || r.Next == nil {
        panic("schedule.Runner: Due, Run, Mark, and Next are required")
    }
    
  • schedule/runner.go:67-74 — Successful Run + failed Next causes duplicate execution.
    When Run succeeds but Next fails (unparseable cron), the job's next-run time is not updated. The job stays due and is re-fetched on every tick, causing re-execution of an already-successful job. The code logs "not rescheduling" but doesn't acknowledge that the job will be re-run. For non-idempotent jobs this is a correctness bug.
    Fix: Either stamp a sentinel value (e.g. far-future time) to prevent re-fetch, or document the re-execution risk explicitly so hosts can make their jobs idempotent.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 51s

<!-- gadfly-review:ollama:deepseek-v4-flash:cloud --> ### 🪰 Gadfly review — `deepseek-v4-flash:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **Verdict: No material issues found** I reviewed the three new battery packages (`checkpoint/`, `critic/`, `schedule/`) and the CI/CLAUDE.md changes through the security lens. No material security issues were found. **Analysis summary:** - **No injection vectors.** All three batteries operate through abstracted interfaces/seams (`CheckpointStore`, `Escalator`, `Runner` function fields) that the host wires. The batteries never directly handle SQL, shell commands, or template execution. Data flows from host-provided callbacks through the battery and back to host-provided callbacks — the battery is pure orchestration. - **No untrusted input paths.** The `schedule.Runner` passes `Due.ID` and `Due.Cron` strings from the host's `Due` callback to the host's `Run`/`Next` callbacks. The `critic` package passes `run.RunInfo` and `Progress` to the host's `Escalator`. The `checkpoint` package stores `[]llm.Message` through the host's `CheckpointStore`. In every case the host controls both the producer and consumer of the data; the battery introduces no new trust boundary. - **No secret/credential leakage.** The CI workflow change passes `GITEA_TOKEN` as an env var to the Docker container, but this is a pre-existing pattern unchanged by this PR (only the image tag and concurrency settings were modified). - **No unsafe defaults.** `ExtendOnce` (default `Escalator`) fires at most once. The `noop` checkpointer is a safe nil-store default. The `schedule.Runner` defaults to `time.Now` and `slog.Default()` — no dangerous fallbacks. - **No SSRF, deserialization, or authn/authz gaps.** These batteries don't make network calls, deserialize untrusted data, or implement access control — those concerns live in the host layer. The code is clean from a security perspective. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> ## 🎯 Correctness — Verdict: Minor issues ### Findings **1. `checkpoint/handle.go:50-56` — `lastSave` updated before the store write, so a failed save is invisible to the throttle** ```go h.lastSave = now h.mu.Unlock() return h.store.Save(ctx, RunCheckpoint{...}) ``` If `h.store.Save` returns an error, `lastSave` has already been bumped. Every subsequent `Save` call within the throttle window will be silently skipped (the throttle check sees `now.Sub(lastSave) < throttle` and returns nil). The checkpoint is therefore *lost* — the caller believes it was persisted but it wasn't, and the throttle prevents retries until the window expires. **Fix:** Move the `h.lastSave = now` assignment *after* the successful store write, or use a two-phase approach (save first, then update the timestamp under the lock). --- **2. `schedule/runner.go:72-80` — `Mark` failure after a successful `Run` causes duplicate execution** ```go if err := r.Run(ctx, j.ID); err != nil { r.log().Warn(...) continue } next, err := r.Next(j.Cron, now) if err != nil { r.log().Warn(...) continue } if err := r.Mark(ctx, j.ID, now, next); err != nil { r.log().Warn("failed to stamp scheduled job's next run", ...) } ``` When `Run` succeeds but `Mark` fails, the job was already executed but its next-run time was never updated. On the next `Tick`, the job is still due and runs *again*. For idempotent jobs this is merely wasteful; for non-idempotent ones (e.g. sending a notification, charging a credit) it is a correctness bug — the same work is performed twice. The same problem applies when `Run` succeeds but `Next` fails (unparseable cron): the job runs every tick forever. **Fix:** Either (a) make `Mark` failure fatal to the job (skip `Run` on the next tick by recording a temporary "don't retry" marker), or (b) document this as an intentional design choice and add a note that callers must make their `Run` handler idempotent. The current code silently assumes idempotency without enforcing or documenting it. --- **3. `schedule/runner.go:33-40` — Required function fields (`Due`, `Run`, `Mark`, `Next`) have no nil guard** The doc comment says "Every func field is required except `Now` and `Logger`", but `Tick` calls `r.Due(ctx, now)`, `r.Run(ctx, j.ID)`, `r.Next(j.Cron, now)`, and `r.Mark(ctx, j.ID, now, next)` without nil-checking any of them. A zero-value `Runner{}` panics on the first `Tick`. **Fix:** Add nil checks in `Tick` (or a constructor) that return a clear error instead of panicking, or make the fields safe by providing no-op defaults (like `Now` and `Logger` already have). --- **4. `critic/critic.go:93-97` — `backstopMul <= 1` silently overrides to 3, but `backstopMul = 1` is a valid (if tight) choice** ```go if backstopMul <= 1 { backstopMul = 3 } ``` A host that explicitly passes `backstopMul = 1` (hard deadline == soft timeout, meaning the run is killed the instant it idles past the soft timeout) has its choice silently discarded. While `backstopMul = 1` is unusual, silently overriding it violates the principle of least surprise. The condition should be `backstopMul < 1` (treat only strictly-less-than-1 as "unset"), or the doc should state that `<= 1` is treated as "use default 3". **Fix:** Change `<= 1` to `< 1` so that `backstopMul = 1` is accepted as an intentional (if tight) configuration. --- **5. `critic/critic.go:93` — `New` does not set `checkInterval`, so it is always zero and recomputed per `Monitor` call** This is not a bug per se, but it means every `Monitor` call re-derives the same interval from `softTimeout / 2`. If a `System` monitors runs with different soft timeouts, each run gets a different check frequency. If that's intentional, fine — but the field is exported (`checkInterval` is lowercase, so it's unexported — actually it's lowercase, so it's package-private). The test sets it directly (`s.checkInterval = 5 * time.Millisecond`), which works because they're in the same package. No action needed, just noting the design. --- **No material issues found** in the `checkpoint/memory.go` store, the `checkpoint` interface definitions, the `critic` `ExtendOnce` logic, the `critic` de-duplication by `escalatedAt.Equal(lastActivity)`, or the `schedule` `Loop`/`Tick` control flow — those are all correct after verification against `run/ports.go`. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Blocking issues found</summary> All three findings are confirmed against the actual code. No findings need to be dropped. ## 🧯 Error handling & edge cases — Verdict: **Blocking issues found** ### Blocking - **`checkpoint/handle.go:49-52` — Throttle timer advanced before store save, causing silent data loss on transient errors.** `h.lastSave = now` is set *before* calling `h.store.Save`. If the underlying store save fails (e.g. network hiccup, disk full), the caller receives the error — but the throttle window has already been consumed. Any subsequent `Save` call within the throttle period is silently dropped (returns nil), even though the previous snapshot was never persisted. The caller believes the retry succeeded when in fact the data is lost. **Fix:** Move `h.lastSave = now` to *after* a successful `h.store.Save(...)`: ```go h.mu.Unlock() err := h.store.Save(ctx, RunCheckpoint{...}) if err != nil { return err } h.mu.Lock() h.lastSave = now h.mu.Unlock() return nil ``` ### Minor - **`schedule/runner.go:60-81` — Nil func fields cause panic instead of a clear error.** `Tick` calls `r.Due`, `r.Run`, `r.Next`, and `r.Mark` without nil-guards. The doc says "Every func field is required" but a host that forgets to wire one gets a confusing runtime panic at the first tick, not a clear error at construction or on first call. No constructor exists (the struct is built directly). **Fix:** Add a constructor or validate at the top of `Tick`/`Loop`: ```go if r.Due == nil || r.Run == nil || r.Mark == nil || r.Next == nil { panic("schedule.Runner: Due, Run, Mark, and Next are required") } ``` - **`schedule/runner.go:67-74` — Successful `Run` + failed `Next` causes duplicate execution.** When `Run` succeeds but `Next` fails (unparseable cron), the job's next-run time is not updated. The job stays due and is re-fetched on every tick, causing **re-execution** of an already-successful job. The code logs "not rescheduling" but doesn't acknowledge that the job will be re-run. For non-idempotent jobs this is a correctness bug. **Fix:** Either stamp a sentinel value (e.g. far-future time) to prevent re-fetch, or document the re-execution risk explicitly so hosts can make their jobs idempotent. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 51s</sub>

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

Through the security lens, this PR adds three host-wired batteries (checkpoint, schedule, critic) plus a workflow image bump. I traced the tainted-input paths and found nothing material:

  • schedule/runner.go: All potentially-untrusted values (Due.ID, Due.Cron, errors) come exclusively from host-supplied funcs (Due/Run/Mark/Next). They flow only into (a) the Run(id) callback (host's own dispatch, its problem) and (b) slog.Warn log lines. No SQL/command/path interpolation in this package; the only log of raw untrusted content is j.ID/j.Cron/err at runner.go:68,73,77,99, which are job ids and cron expressions — not credentials or user PII. No injection surface owned by this code.
  • checkpoint/handle.go / memory.go: meta.RunID is used as a map key in Memory (parameterization is the host's concern when backing with SQL — no query construction here). The stored RunCheckpoint carries Prompt and Messages (conversation content), but that's inherent to a resume store and not newly leaked anywhere — ListInterrupted just returns them to the boot-recovery caller; nothing logs them. No secret exposure introduced.
  • critic/critic.go: The Escalator.OnSoftTimeout returns Decision.Nudge ([]llm.Message) that the executor injects as a steer. The escalator is host-wired (Mort's LLM critic), so prompt-injection-via-nudge is host-policy territory, not a seam this PR leaves insecure. Progress.LastTool (a tool name) is passed to the escalator, but again the consumer is host-trusted. The Kill path only mutates an in-memory deadline; no privileged side effect.
  • Workflow change (.gitea/workflows/adversarial-review.yml): swaps a pinned image digest (sha-d0de034) and tunes provider concurrency env vars. No secrets are added or echoed; GITEA_TOKEN remains sourced from secrets. and not logged. Nothing to flag.

I verified the seam types against run/ports.go:42-150 (RunInfo, Critic/CriticHandle, Checkpointer/RunCheckpointState) and confirmed the implementations satisfy them; no undefined/unsafe symbol usage. The contrib/store SQLite layer touched by grepping uses parameterized database/sql accessors (no raw query building in the diff scope).

Nothing in the security lane rises to material. Stopping here.

🎯 Correctness — Blocking issues found

Both findings verified against the actual code.

Finding 1 (ExtendOnce): Confirmed at critic/critic.go:89-96 (single &ExtendOnce{} installed in System.esc when esc == nil), :62-73 (fired is a plain bool set once, never reset), and :176-183 (every handle calls the shared h.sys.esc.OnSoftTimeout). The package doc at :77 and type doc at :76-77 state "one System monitors many runs concurrently," while ExtendOnce's own doc at :50-53 reads per-run. Real design defect.

Finding 2 (schedule hot-loop): Confirmed at schedule/runner.go:71-74 — on Next error the loop continues without Marking, leaving next_run unchanged, so a Due query of "next_run <= now" returns it again. With Run succeeding but Next permanently failing, the job's Run executes every Interval indefinitely. Real, lower severity.

Verdict: Blocking issues found

Findings:

  • critic/critic.go:89-96 + critic/critic.go:50-73ExtendOnce is shared across all runs, breaking the default Escalator under the documented multi-run usage.
    New(nil, …) installs a single &ExtendOnce{} into System.esc (:90-92). System.Monitor (:102) creates a fresh per-run handle, but every handle calls the same h.sys.esc.OnSoftTimeout(...) inside tick (:184), and ExtendOnce.fired is a plain bool set to true on the first stall and never reset (:65-68). The package/type doc (:76-77) explicitly states "one System monitors many runs concurrently," and ExtendOnce's own doc (:50-53) reads as per-run ("the first time a run stalls it extends… then takes no further action — so a genuinely hung run is later killed by the hard backstop"). Under the documented usage the consequence is: the first run to stall gets its one extension; every other run that later stalls receives Decision{} (no nudge, no extend) and is instead killed outright by the hard backstop — i.e. the "extend once" guarantee silently applies once globally, not once per run. The existing tests don't catch this: TestMonitorEscalatesOncePerIdlePeriodAndExtends uses a custom escFunc, and TestExtendOnceOnlyFiresOnce exercises a single ExtendOnce in isolation. Suggested fix: key the fired state per run (e.g. a map[string]struct{} of run ids that have fired, keyed off info.RunID in OnSoftTimeout), or document that the default ExtendOnce is only valid for single-run Systems and have New construct a fresh ExtendOnce per Monitor call (per-run, not per-System).

  • schedule/runner.go:66-78 — a permanently-unparseable cron expression causes the job to execute every Interval forever, not "retry later."
    When Next returns an error, the job is left unstamped (:72-74, no Mark call). Combined with the typical Due query ("next_run <= now"), the job remains due and Run is re-invoked on every tick — so a job with a bad cron runs at full Interval cadence indefinitely rather than backing off. The comment ("will retry next tick"/"left due (retries)") implies an eventual-success retry, but for a permanent parse failure this is an infinite hot-loop of actual Run executions. Lower severity (matches stated intent for transient failures), but worth either skipping jobs whose Next error is non-transient, or marking/stamping them to defer the next attempt.

No issues found in the checkpoint battery: the throttle logic (atomic check-and-set under h.mu at handle.go:34-48), nil-store no-op (handle.go:25-27), and Memory Save/Load/Delete/ListInterrupted (memory.go) are correct and match the run.Checkpointer/CheckpointStore interfaces.

🧯 Error handling & edge cases — Minor issues

Minor issues

  • schedule/runner.go:76-78Mark failure silently causes duplicate job execution. A job whose Run succeeds and whose Next parses has already executed; if Mark then errors, only a Warn is logged and the loop moves on. Because the next-fire time was never stamped, Due returns the same job on the next tick and Run executes it again — duplicate side effects, surfaced only by the one-time Warn. The Tick docstring (lines 55-59) only promises the "stays due and retries" behavior for Run/Next failures, not for Mark; the Mark error path (lines 76-78) has no continue and no stamp recovery. Suggested fix: either document that Mark failure also implies a duplicate next tick, or treat a Mark failure as pass-fatal (return it) so the host learns the scheduler lost track of the job.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 7s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **Verdict: No material issues found** Through the security lens, this PR adds three host-wired batteries (checkpoint, schedule, critic) plus a workflow image bump. I traced the tainted-input paths and found nothing material: - **`schedule/runner.go`**: All potentially-untrusted values (`Due.ID`, `Due.Cron`, errors) come exclusively from host-supplied funcs (`Due`/`Run`/`Mark`/`Next`). They flow only into (a) the `Run(id)` callback (host's own dispatch, its problem) and (b) `slog.Warn` log lines. No SQL/command/path interpolation in this package; the only log of raw untrusted content is `j.ID`/`j.Cron`/`err` at `runner.go:68,73,77,99`, which are job ids and cron expressions — not credentials or user PII. No injection surface owned by this code. - **`checkpoint/handle.go` / `memory.go`**: `meta.RunID` is used as a map key in `Memory` (parameterization is the host's concern when backing with SQL — no query construction here). The stored `RunCheckpoint` carries `Prompt` and `Messages` (conversation content), but that's inherent to a resume store and not newly leaked anywhere — `ListInterrupted` just returns them to the boot-recovery caller; nothing logs them. No secret exposure introduced. - **`critic/critic.go`**: The `Escalator.OnSoftTimeout` returns `Decision.Nudge` (`[]llm.Message`) that the executor injects as a steer. The escalator is host-wired (Mort's LLM critic), so prompt-injection-via-nudge is host-policy territory, not a seam this PR leaves insecure. `Progress.LastTool` (a tool name) is passed to the escalator, but again the consumer is host-trusted. The `Kill` path only mutates an in-memory `deadline`; no privileged side effect. - **Workflow change** (`.gitea/workflows/adversarial-review.yml`): swaps a pinned image digest (`sha-d0de034`) and tunes provider concurrency env vars. No secrets are added or echoed; `GITEA_TOKEN` remains sourced from `secrets.` and not logged. Nothing to flag. I verified the seam types against `run/ports.go:42-150` (`RunInfo`, `Critic`/`CriticHandle`, `Checkpointer`/`RunCheckpointState`) and confirmed the implementations satisfy them; no undefined/unsafe symbol usage. The `contrib/store` SQLite layer touched by grepping uses parameterized `database/sql` accessors (no raw query building in the diff scope). Nothing in the security lane rises to material. Stopping here. </details> <details><summary><b>🎯 Correctness</b> — Blocking issues found</summary> Both findings verified against the actual code. **Finding 1 (ExtendOnce):** Confirmed at `critic/critic.go:89-96` (single `&ExtendOnce{}` installed in `System.esc` when `esc == nil`), `:62-73` (`fired` is a plain bool set once, never reset), and `:176-183` (every handle calls the shared `h.sys.esc.OnSoftTimeout`). The package doc at `:77` and type doc at `:76-77` state "one System monitors many runs concurrently," while `ExtendOnce`'s own doc at `:50-53` reads per-run. Real design defect. **Finding 2 (schedule hot-loop):** Confirmed at `schedule/runner.go:71-74` — on `Next` error the loop `continue`s without `Mark`ing, leaving `next_run` unchanged, so a `Due` query of "`next_run <= now`" returns it again. With `Run` succeeding but `Next` permanently failing, the job's `Run` executes every `Interval` indefinitely. Real, lower severity. Verdict: Blocking issues found Findings: - **`critic/critic.go:89-96` + `critic/critic.go:50-73` — `ExtendOnce` is shared across all runs, breaking the default Escalator under the documented multi-run usage.** `New(nil, …)` installs a single `&ExtendOnce{}` into `System.esc` (`:90-92`). `System.Monitor` (`:102`) creates a fresh per-run `handle`, but every handle calls the *same* `h.sys.esc.OnSoftTimeout(...)` inside `tick` (`:184`), and `ExtendOnce.fired` is a plain bool set to `true` on the first stall and never reset (`:65-68`). The package/type doc (`:76-77`) explicitly states "one `System` monitors many runs concurrently," and `ExtendOnce`'s own doc (`:50-53`) reads as per-run ("the first time *a run* stalls it extends… then takes no further action — so a genuinely hung run is later killed by the hard backstop"). Under the documented usage the consequence is: the first run to stall gets its one extension; every *other* run that later stalls receives `Decision{}` (no nudge, no extend) and is instead killed outright by the hard backstop — i.e. the "extend once" guarantee silently applies once *globally*, not once per run. The existing tests don't catch this: `TestMonitorEscalatesOncePerIdlePeriodAndExtends` uses a custom `escFunc`, and `TestExtendOnceOnlyFiresOnce` exercises a single `ExtendOnce` in isolation. Suggested fix: key the `fired` state per run (e.g. a `map[string]struct{}` of run ids that have fired, keyed off `info.RunID` in `OnSoftTimeout`), or document that the default `ExtendOnce` is only valid for single-run `System`s and have `New` construct a fresh `ExtendOnce` per `Monitor` call (per-run, not per-System). - **`schedule/runner.go:66-78` — a permanently-unparseable cron expression causes the job to execute every `Interval` forever, not "retry later."** When `Next` returns an error, the job is left unstamped (`:72-74`, no `Mark` call). Combined with the typical `Due` query ("`next_run <= now`"), the job remains due and `Run` is re-invoked on *every* tick — so a job with a bad cron runs at full `Interval` cadence indefinitely rather than backing off. The comment ("will retry next tick"/"left due (retries)") implies an eventual-success retry, but for a permanent parse failure this is an infinite hot-loop of actual `Run` executions. Lower severity (matches stated intent for transient failures), but worth either skipping jobs whose `Next` error is non-transient, or marking/stamping them to defer the next attempt. No issues found in the `checkpoint` battery: the throttle logic (atomic check-and-set under `h.mu` at `handle.go:34-48`), nil-store no-op (`handle.go:25-27`), and `Memory` `Save`/`Load`/`Delete`/`ListInterrupted` (`memory.go`) are correct and match the `run.Checkpointer`/`CheckpointStore` interfaces. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> Minor issues - **`schedule/runner.go:76-78` — `Mark` failure silently causes duplicate job execution.** A job whose `Run` succeeds and whose `Next` parses has *already executed*; if `Mark` then errors, only a `Warn` is logged and the loop moves on. Because the next-fire time was never stamped, `Due` returns the same job on the next tick and `Run` executes it again — duplicate side effects, surfaced only by the one-time Warn. The `Tick` docstring (lines 55-59) only promises the "stays due and retries" behavior for `Run`/`Next` failures, not for `Mark`; the `Mark` error path (lines 76-78) has no `continue` and no stamp recovery. Suggested fix: either document that `Mark` failure also implies a duplicate next tick, or treat a `Mark` failure as pass-fatal (return it) so the host learns the scheduler lost track of the job. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 7s</sub>
steve added 1 commit 2026-06-27 03:32:29 +00:00
fix: address verified gadfly P4c findings (3-cloud fleet)
executus CI / test (pull_request) Successful in 1m39s
eea84e6e2c
critic (all 3 models — HIGH):
- ExtendOnce was a single global one-shot shared across every run a System
  monitors, so only the FIRST run to stall got its extension and all others
  were killed by the backstop. Key the fired-state per run (RunInfo.RunID).
- Kill is now sticky: a `killed` flag short-circuits later ticks so a wavering
  Escalator returning ExtendBy after a Kill can't un-collapse the deadline; a
  Kill paired with Nudge/ExtendBy ignores the latter.
- watch() recovers panics from a misbehaving Escalator (logs; the run falls
  back to its existing deadline) instead of silently killing the watch goroutine.

checkpoint (deepseek — HIGH): handle.Save advanced the throttle clock BEFORE
the store write, so a failed save was silently throttled away (caller believes
it persisted). Advance lastSave only after a successful persist.

schedule (all 3): compute Next BEFORE Run — a permanently-unparseable cron now
skips the job entirely instead of re-running it every tick forever; nil required
callbacks return a validate() error instead of a first-tick nil panic; Loop
recovers tick panics; the Mark-failure => possible-re-run trade-off is documented
(Run must be idempotent). + tests for each.

Triaged-but-kept: critic backstopMul<=1 floor (it's a total-runtime multiple, so
a floor >1 is intentional, not the reported footgun); checkpoint Load (nil,nil)
on miss (documented convention).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve closed this pull request 2026-06-27 04:16:08 +00:00
steve deleted branch phase-4c-batteries 2026-06-27 04:16:08 +00:00
executus CI / test (pull_request) Successful in 1m39s

Pull request closed

Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/executus#6