P4: checkpoint battery — durable-resume seam + run.Checkpointer handle

Plugs into run.Ports.Checkpointer (the executor's call site is a P2 follow-up;
this provides the seam + impls ahead of it):
- checkpoint.go: CheckpointStore seam + RunCheckpoint{Meta, Messages, Iteration,
  ActivePhase} + RunCheckpointMeta (mirrors mort's agentexec types).
- handle.go: New(store, meta, throttle, now) -> run.Checkpointer. Save writes a
  throttled snapshot; Complete/Fail delete it (a cleanly finished or terminally
  failed run is NOT a recovery candidate; a shutdown-interrupted run never calls
  them, so its checkpoint survives ListInterrupted at boot). nil store -> no-op.
- memory.go: NewMemory() default (with the honest caveat that in-memory does
  not survive the restart it exists to recover from — a durable store is mort's).

Tests: save+complete clears the recovery candidate; throttle skips in-window
saves; nil-store is a clean no-op. Core imports ZERO from checkpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-26 23:00:02 -04:00
parent 5b5e130cee
commit e856dacc12
5 changed files with 242 additions and 1 deletions
+50
View File
@@ -0,0 +1,50 @@
// Package checkpoint is the durable-resume battery: it persists a run's
// resumable progress so a run interrupted by a shutdown can be recovered and
// continued on the next boot, rather than silently lost. It plugs into
// run.Ports.Checkpointer.
//
// Mort backs CheckpointStore with its durable-job table; Memory() is the
// zero-dependency default; contrib/store can add a SQLite one. NOTE: the
// executor's call into run.Ports.Checkpointer is a P2 follow-up — this battery
// provides the seam + impls ahead of that wiring.
package checkpoint
import (
"context"
"time"
"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
)
// RunCheckpointMeta is the run attribution needed to resume a run from scratch
// (mirrors mort's agentexec.RunCheckpointMeta).
type RunCheckpointMeta struct {
RunID string
AgentID string
AgentName string
CallerID string
ChannelID string
GuildID string
Prompt string
ModelTier string
ParentRunID string
}
// RunCheckpoint is one persisted snapshot of a run's resumable progress.
type RunCheckpoint struct {
Meta RunCheckpointMeta
Messages []llm.Message // conversation so far
Iteration int // completed agent-loop iterations
ActivePhase string // current phase name (multi-phase agents); "" otherwise
UpdatedAt time.Time
}
// CheckpointStore persists run checkpoints keyed by run id. A live checkpoint
// means "this run was in flight and not cleanly finished"; Complete/Fail delete
// it. ListInterrupted returns every surviving checkpoint at boot for recovery.
type CheckpointStore interface {
Save(ctx context.Context, cp RunCheckpoint) error
Load(ctx context.Context, runID string) (*RunCheckpoint, error)
Delete(ctx context.Context, runID string) error
ListInterrupted(ctx context.Context) ([]RunCheckpoint, error)
}