run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status)

Completes the run-critic seam so a host adapter (mort's agentcritic) has full
fidelity, closing the two limitations gadfly surfaced on mort #1334.

- RecordStep(iter int, resp *llm.Response): the completed step's model response
  is now passed to the critic (was index-only), so a host that records a trace
  (mort's ProgressRecorder) can show what the agent actually produced, not just
  an iteration count. The executor forwards s.Response; the battery ignores it
  (its Progress is count-based).
- CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes
  an explicit critic KILL from a natural backstop expiry. runCtx uses a
  cause-carrying cancel (context.WithCancelCause plus a MaxRuntime timer that
  cancels with DeadlineExceeded); the deadline-watch cancels with ErrCriticKill
  when KillCause() != nil, else with DeadlineExceeded. statusFor reads
  context.Cause, so killed / timeout / cancelled are now distinct statuses (all
  three were previously "cancelled"). The battery sets killCause from
  Decision.KillReason on a Kill.

Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle
+ battery RecordStep/KillCause signatures. Core stays battery-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 16:35:13 -04:00
parent 1a1d5e417b
commit 390e6cf905
7 changed files with 98 additions and 36 deletions
+17 -2
@@ -2,6 +2,7 @@ package run
import (
"context"
+"errors"
"time"
"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
@@ -9,6 +10,12 @@ import (
"gitea.stevedudenhoeffer.com/steve/executus/deliver"
)
+// ErrCriticKill is the cancellation cause the executor stamps on a run the
+// critic kills, so a critic kill surfaces as a distinct "killed" status (vs a
+// backstop "timeout" or a caller "cancelled"). A host CriticHandle signals a
+// kill via KillCause(); the executor wraps that reason with this sentinel.
+var ErrCriticKill = errors.New("run: critic killed the run")
// Ports are the host seams the run executor consumes. Every field is nil-safe:
// a light host passes the zero Ports and gets a bounded, in-memory run with no
// persistence, audit, budget, critic, delegation, or delivery — which is
@@ -123,8 +130,10 @@ type Critic interface {
// methods (the critic battery's handle guards its state with a mutex).
type CriticHandle interface {
// RecordStep / RecordToolStart keep the critic's activity clock fresh so a
-// healthy-but-slow run is not mistaken for a hang.
-RecordStep(iter int)
+// healthy-but-slow run is not mistaken for a hang. RecordStep also carries the
+// completed step's model response (nil-safe) so the critic's Trace can show
+// what the agent actually produced, not just an iteration count.
+RecordStep(iter int, resp *llm.Response)
RecordToolStart(name, args string)
// Steer returns any messages the critic wants injected into the loop (a
// nudge), drained before each step — matches majordomo agent.WithSteer.
@@ -137,6 +146,12 @@ type CriticHandle interface {
// healthy-but-long run's iteration budget mid-flight. Return <= 0 to defer to
// the run's base MaxIterations.
MaxSteps() int
+// KillCause returns a non-nil reason iff the critic has decided to KILL this
+// run (as opposed to letting the hard-deadline backstop expire). The executor
+// reads it when the deadline passes: non-nil → cancel the run with
+// ErrCriticKill (status "killed"); nil → the backstop expired naturally
+// (status "timeout"). Hosts that never distinguish the two may return nil.
+KillCause() error
// Stop ends monitoring when the run finishes.
Stop()
}
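
For hosts adapting to the two changed methods, a fake handle along these lines may be a useful starting point. It is a sketch under stated assumptions: Response is a local stand-in for majordomo's llm.Response (its Text field is invented here), stepRecorder is a hypothetical subset of CriticHandle covering only RecordStep and KillCause, and fakeHandle with its Kill method is illustrative rather than the battery's real handle.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Response is a stand-in for llm.Response; the Text field is an assumption.
type Response struct{ Text string }

// stepRecorder is a hypothetical subset of CriticHandle: only the two methods
// this commit changes or adds.
type stepRecorder interface {
	RecordStep(iter int, resp *Response)
	KillCause() error
}

// fakeHandle guards its state with a mutex, since the executor may call the
// handle's methods from more than one goroutine.
type fakeHandle struct {
	mu        sync.Mutex
	steps     int
	trace     []string
	killCause error
}

// Compile-time check that fakeHandle satisfies the reduced interface.
var _ stepRecorder = (*fakeHandle)(nil)

// RecordStep always counts the step, and records the response only when one
// was supplied, so a nil resp is safe.
func (h *fakeHandle) RecordStep(iter int, resp *Response) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.steps++
	if resp == nil {
		return
	}
	h.trace = append(h.trace, fmt.Sprintf("step %d: %s", iter, resp.Text))
}

// Kill (illustrative) records why the critic decided to kill the run.
func (h *fakeHandle) Kill(reason string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.killCause = errors.New(reason)
}

// KillCause is nil until Kill has been called.
func (h *fakeHandle) KillCause() error {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.killCause
}

func main() {
	h := &fakeHandle{}
	h.RecordStep(0, &Response{Text: "listing files"})
	h.RecordStep(1, nil) // nil response: counted, not traced
	fmt.Println(h.steps, len(h.trace))
	fmt.Println(h.KillCause() == nil)

	h.Kill("looping on the same tool call")
	fmt.Println(h.KillCause())
}
```

A count-based implementation, as the commit message describes for the battery, would simply ignore resp.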