e856dacc12
Plugs into run.Ports.Checkpointer (the executor's call site is a P2 follow-up;
this provides the seam + impls ahead of it):
- checkpoint.go: CheckpointStore seam + RunCheckpoint{Meta, Messages, Iteration,
ActivePhase} + RunCheckpointMeta (mirrors mort's agentexec types).
- handle.go: New(store, meta, throttle, now) -> run.Checkpointer. Save writes a
throttled snapshot; Complete/Fail delete it (a cleanly finished or terminally
failed run is NOT a recovery candidate; a shutdown-interrupted run never calls
them, so its checkpoint survives ListInterrupted at boot). nil store -> no-op.
- memory.go: NewMemory() default (with the honest caveat that an in-memory store
does not survive the restart it exists to recover from; the durable store is
mort's responsibility).
Tests: save+complete clears the recovery candidate; throttle skips in-window
saves; nil-store is a clean no-op. Core imports ZERO from checkpoint.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
51 lines
1.8 KiB
Go
// Package checkpoint is the durable-resume battery: it persists a run's
// resumable progress so a run interrupted by a shutdown can be recovered and
// continued on the next boot, rather than silently lost. It plugs into
// run.Ports.Checkpointer.
//
// Mort backs CheckpointStore with its durable-job table; NewMemory() is the
// zero-dependency default; contrib/store can add a SQLite one. NOTE: the
// executor's call into run.Ports.Checkpointer is a P2 follow-up; this battery
// provides the seam + impls ahead of that wiring.
package checkpoint

import (
	"context"
	"time"

	"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
)

// RunCheckpointMeta is the run attribution needed to resume a run from scratch
// (mirrors mort's agentexec.RunCheckpointMeta).
type RunCheckpointMeta struct {
	RunID       string
	AgentID     string
	AgentName   string
	CallerID    string
	ChannelID   string
	GuildID     string
	Prompt      string
	ModelTier   string
	ParentRunID string
}

// RunCheckpoint is one persisted snapshot of a run's resumable progress.
type RunCheckpoint struct {
	Meta        RunCheckpointMeta
	Messages    []llm.Message // conversation so far
	Iteration   int           // completed agent-loop iterations
	ActivePhase string        // current phase name (multi-phase agents); "" otherwise
	UpdatedAt   time.Time
}

// CheckpointStore persists run checkpoints keyed by run id. A live checkpoint
// means "this run was in flight and not cleanly finished"; Complete/Fail delete
// it. ListInterrupted returns every surviving checkpoint at boot for recovery.
type CheckpointStore interface {
	Save(ctx context.Context, cp RunCheckpoint) error
	Load(ctx context.Context, runID string) (*RunCheckpoint, error)
	Delete(ctx context.Context, runID string) error
	ListInterrupted(ctx context.Context) ([]RunCheckpoint, error)
}