Real findings from the consensus review (37 raw; many devstral dups/noise):
- Optional/budget-salvage branches no longer swallow a context
cancellation / deadline / critic-kill: such errors return immediately so
the run is classified cancelled/timeout/killed, not "ok" with a fallback.
(the most serious finding — an Optional final phase could mask a killed run)
- IsRunFunc bare phase now feeds the SHARED step observer (not just the
audit recorder), so the critic's activity clock + Result.Steps see it —
a long synthesize phase no longer looks idle to the critic.
- phaseModel returns the resolver's enriched (usage-attribution) context and
the phase's calls use it, mirroring the single-loop path (non-base-tier
phases were mis-attributed).
- salvagePhaseTranscript trims the tail on a rune boundary (was a raw byte
slice that could split a UTF-8 rune); maxSalvage is now a named const with
rationale.
- expandPhaseTemplate logs a WARN on parse/execute failure instead of
silently returning the unexpanded template; documented the phase-name
identifier requirement + the "Query" shadow.
- removed the dead phaseDeps.baseTier field.
- extracted multimodalUserMessage, shared by runAgent + the phase runner
(was duplicated image-folding).
- aggregated phase usage is stamped onto the result even on a hard-error
return; TrimSpace computed once; filterToolbox returns the base toolbox
as-is for the empty-names (full-palette) case instead of copying;
phaseModel WARN no longer prints error=<nil>.
New test: Optional phase does not swallow a cancellation. Full ./... green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The kernel carried RunnableAgent.Phases as a DTO but never executed it —
Run always ran a single agent loop with ra.SystemPrompt, so a phased agent
(mort's deepresearch/research) silently ran one loop with the base prompt
instead of its pipeline. This implements the phase loop, ported from mort's
agentexec pipeline but reusing the kernel's own machinery.
- run/phases.go: runPhases / runOnePhase. Phases run sequentially; each is a
fresh agent loop (or a bare LLM call for IsRunFunc phases) with its own
template-expanded system prompt ({{.Query}} + {{.<PhaseName>}}), model
tier, step cap, and tool subset. Outputs thread into later phases; the
final phase's output is the run output. Optional phases swallow errors and
substitute FallbackMessage; a non-optional phase that merely exhausts its
step/tool budget salvages its partial transcript and continues (a hard
error still aborts); per-phase tier-resolve failures fall back with a WARN.
- run/agent.go: Phase gains IsRunFunc + FallbackMessage (the kernel Phase
struct previously omitted them).
- run/executor.go: Run factors the shared agent options (tool-error limits,
step observer, compactor) and branches — single loop (critic's dynamic
step ceiling) vs the phase runner (fixed per-phase caps; the run-level
critic's steer + hard deadline still apply across phases). systemPrompt
now delegates to systemPromptWithBody so each phase keeps the platform
header. The same step observer feeds audit/steps/critic across all phases.
Tests (run/phases_test.go): sequential output threading + template
expansion, Optional-failure → FallbackMessage continues, hard-error abort,
IsRunFunc bare call, per-phase SystemHeader, filterToolbox subset, template
expansion. Full ./... suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
executus's tool.Invocation already carried InputFiles (audio/PDF/binary), but the
executor never staged them — only Images were folded into the run. This adds the
host seam mort's chat/chatbot surfaces need for audio-input parity with agentexec.
- run.Ports gains InputFiles InputFileStager (nil-safe; nil = input files silently
ignored, run still proceeds text-only). The interface mirrors mort's skill
FileStorage: StageInputFile(ctx, runID, agentID, name, mime, content) → file_id.
- run/input_files.go (ported from mort agentexec/input_files.go): stageInputFiles
persists each file under run scope and appends an [ATTACHED FILES] descriptor
block to the prompt so the agent can reach them by file_id (e.g. code_exec
files_in → /workspace/<name>). Bytes are NEVER inlined into model context.
Best-effort: empty/oversized(>50MB)/save-error files are skipped; colliding
base names are disambiguated (name-2, name-3) so they don't clobber at
/workspace/<name>.
- Executor.Run calls it after the model/toolbox build, before the loop, so the
descriptor rides the first user turn (alongside the existing Images folding).
Tests: stages + builds the block; nil stager / no files leave the prompt intact;
dedup; empty/save-error skipping. Full suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Every reviewer flagged that runAgent appended llm.Text(input) unconditionally, so
an image-only run (blank prompt) emitted an empty TextPart — inconsistent with the
sibling runSession.AttachImages which guards it. Mirror that guard
(strings.TrimSpace(input) != ""). Also:
- copy opts before appending (variadic backing array can have spare capacity; avoid
aliasing a caller's slice).
- reword the doc comment to drop the mort-agentexec reference (executus is a
standalone lib; a consumer name doesn't belong in its godoc).
Tests: image+text are co-located in ONE user message; an image-only run emits no
blank TextPart.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The executor passed only the text `input` to majordomo's agent.Run, silently
dropping inv.Images — so a multimodal run (vision: chatbot @mention, chat API)
lost its images on the executus path. majordomo's Run input arg is text-only, so
fold the images into the first user message (text + image parts) via WithHistory
and call Run with empty input, mirroring mort agentexec's multimodal seeding. The
image-less path is unchanged (prompt passes straight through).
Tests: a run with Images carries the image bytes + prompt into the first model
request; the text-only path still reaches the model.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two convergent gadfly refinements on the PostRun wiring:
- PostRun now runs on detach(ctx), not the caller's ctx — a finished/cancelled
caller no longer aborts artifact production (3-model: glm-5.2/minimax/deepseek).
- Cleanup is panic-isolated via safeCleanup (recover+log), matching runPostRun, so
a misbehaving teardown can't clobber an otherwise-successful run (deepseek).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The session-tool TYPES already lived in tool/ (P4 move) but the executor never
used them. This wires them, unblocking artifact-producing host surfaces (mort's
chat API / chatbot / .skill / scaddy) to run on executus:
- run/session.go: steerMailbox (thread-safe message queue) + runSession
(tool.AgentSession over it: AttachImages → a user-role multimodal message
injected before the agent's next step) + runPostRun (panic-isolated hook call).
- executor: create the mailbox + set inv.AttachImages BEFORE the toolbox build;
add inv.ExtraTools + a SessionToolFactory's per-run Tools to the toolbox; defer
its Cleanup; merge the session mailbox with the critic's nudges into ONE
WithSteer; after the run, call PostRun with the full transcript
(runRes.Messages) → Result.PostRunResult (best-effort, never fails the run).
- run.Result += PostRunResult *tool.PostRunResult.
- dropped the now-dead criticBinding.steerOptions (superseded by drainSteer).
Tests: a factory whose PostRun emits an artifact from the output+transcript +
Cleanup lands on Result.PostRunResult; a factory-added tool is callable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The WithCancelCause+timer rewrite made MaxRuntime surface as Canceled (not
DeadlineExceeded), so statusFor's context.Cause(DeadlineExceeded) check could
relabel (a) a genuine run error as 'timeout' and (b) a caller cancel/deadline as
'timeout' (was 'cancelled'). Convergent gadfly finding (glm-5.2 + cluster).
Fix: keep MaxRuntime as WithTimeout (its DeadlineExceeded propagates → 'timeout',
preserving own-timeout vs caller-cancel), add a NESTED WithCancelCause layer only
for the kill. statusFor consults context.Cause ONLY for ErrCriticKill; everything
else is classified by the run error itself. Tests: generic-error-not-relabeled +
caller-cancel-stays-cancelled.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the run-critic seam so a host adapter (mort's agentcritic) has full
fidelity, closing the two limitations gadfly surfaced on mort #1334.
- RecordStep(iter int, resp *llm.Response): the completed step's model response
is now passed to the critic (was index-only), so a host that records a trace
(mort's ProgressRecorder) can show what the agent actually produced, not just
an iteration count. The executor forwards s.Response; the battery ignores it
(its Progress is count-based).
- CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes
an explicit critic KILL from a natural backstop expiry. runCtx uses a
cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with
DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when
KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause →
killed / timeout / cancelled are now distinct (were all "cancelled"). The
battery sets killCause from Decision.KillReason on a Kill.
Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle
+ battery RecordStep/KillCause signatures. Core stays battery-free.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prerequisite for a full-fidelity mort agentcritic adapter (which adjusts a
healthy-but-long run's iteration budget, not just its deadline). executus's
CriticHandle was deadline+steer only; this adds the dynamic step ceiling above
an unchanged majordomo (which already exposes WithMaxStepsFunc).
- run.RunInfo += MaxIterations (the run's base ceiling, so a critic can raise it
relative to the baseline).
- run.CriticHandle += MaxSteps() int — polled by the executor each step via
agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses
WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else WithMaxSteps.
- critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) +
MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling
alongside ExtendBy. ExtendOnce default is unchanged (time-only).
Test: a critic returning MaxSteps=5 lets a base-MaxIterations=1 run complete two
tool-dispatch steps past the base ceiling. Core stays battery-free (run doesn't
import critic).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From PR #9 (minimax + deepseek):
- Run now has a top-level recover() — the "never propagates a panic" promise was
unenforced; a panicking host Port (Critic/Audit/Palette) on the run goroutine
now becomes Result.Err instead of unwinding into the caller.
- The critic deadline-watch goroutine recovers panics from a host Deadline()
(it's a separate goroutine, so Run's recover can't catch it) — a buggy
CriticHandle can't crash the process.
- CriticHandle interface documents its concurrency contract (Record*/Steer on the
run goroutine vs Deadline()/Stop() from the watch goroutine — impls must be
concurrent-safe; the critic battery already is).
- startCritic's dead `soft <= 0 -> noop` guard (withFallbacks already coerces to
90s) replaced with a defensive inline 90s default, so a bypass of withFallbacks
still gets a working critic instead of silently none.
- Delivery tests made honest: the old "error path" test only checked the
early-return (no delivery); added TestDeliverErrorOnRunFailure (in-loop model
error -> DeliverError to the target) + renamed the early-return test.
Graded all #9 findings in the gadfly MCP.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continues finishing the executor's run.Ports wiring (after C0's Palette).
Critic (run/critic.go): when Ports.Critic is set and the agent enables it, the
executor calls Monitor at run start, feeds RecordStep/RecordToolStart from the
step observer, drains the critic's Steer messages into the loop via
agent.WithSteer, and binds the run's hard cancellation to the critic's
(extendable) Deadline through a watch goroutine — a healthy-but-slow run gets
room while a hung one is killed. Stop() on run end. Soft timeout from
Defaults.CriticSoftTimeout (default 90s). nil-safe: no critic / not-enabled =
no-op.
Delivery (run/executor.go deliver): after the run, when Ports.Delivery is set
and inv.DeliveryID is non-empty, the executor posts Result.Output (or
DeliverError on failure) to a host-interpreted deliver.Target
{inv.DeliveryKind, inv.DeliveryID}. Empty target = caller reads Result.Output
itself (the synchronous default; the `.agent run` canary). Best-effort +
detached.
tool.Invocation gains DeliveryKind/DeliveryID (host-set egress target).
Tests: critic monitored/fed/steered/stopped when enabled, untouched when not;
delivery posts on a target, skips without one. Deferred: Checkpointer (needs a
majordomo hook to snapshot the running message history).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The first cutover prerequisite: the executor now turns an agent's SkillPalette /
SubAgentPalette into delegation tools so a mort agent that delegates works
through run.Executor (the piece the `.agent run` canary needs beyond the
already-wired audit/budget).
- run/palette.go: addDelegationTools builds a skill__<name> tool (structured
inputs) per SkillPalette entry and an agent__<name> tool (prompt) per
SubAgentPalette entry, each invoking run.Ports.Palette as a CHILD of the
current run (parentRunID = inv.RunID, inheriting caller + channel). A non-ok
child status is surfaced to the parent with the partial output. nil-safe: no
PaletteSource or empty palette → no delegation tools (unchanged behavior).
- executor.go: call it right after building the low-level toolbox.
Tests: the model calls skill__helper → routed through Palette with the right
name/caller/inputs/parent; nil palette → run still works.
Deferred to C0b (the remaining run.Ports executor wiring): Critic (soft-timeout
monitor + deadline binding + steer), Delivery (output egress for surfaces that
need executor-side delivery), Checkpointer (needs a majordomo message-history
hook to snapshot resumable state). The `.agent run` canary delivers its returned
Result.Output itself, so these aren't on its critical path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Independently verified all 18 gadfly findings against the code (18-agent
fan-out). Fixed the 9 real ones; the other 9 were false-positive /
hallucinated / valid-tradeoff (no change).
High:
- F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the
agent loop and nil-panicked. Now a clean error (Run never panics). +test.
- F9 compactor data-leak: renderTranscript sent tool-call args verbatim to
the summarizer (a possibly-different provider/tier); secret-bearing tool
args (mcp_call/email_send/http_*/webhook_*) are now redacted, with a doc
note that result bodies still flow (summary needs them).
Medium/minor:
- F2 compactor error path returned the folded slice, not the original msgs
(contradicting the documented non-fatal contract) -> return msgs.
- F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) /
cancelled (Canceled) via statusFor. +test.
- F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs
min(calls,results) only.
- F5 SetIteration was never called -> RunState.Iteration always 0; the step
observer now updates it each loop.
- F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue).
- F7 estimateTokens had no default arm (future Part kinds counted as 0);
unknown parts now counted conservatively.
- F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now
a clear "response exceeded N bytes" via readCapped.
- F12 step observer captured the caller ctx; now the merged runCtx.
- F13 compaction onFire was nil (doc claimed it logged); now wired to
audit LogEvent("compaction_fired").
- F11 (no pre-dispatch hook in majordomo) documented honestly as a known
limitation; F18 UsageSink doc clarified cache tokens are subsets of input.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The capstone of the run kernel: run.Executor.Run(ctx, RunnableAgent, inv)
ties model resolution + the tool registry + majordomo's agent loop +
context compaction + run-bounding + step/audit instrumentation into one
path, with every host concern behind the nil-safe run.Ports.
- run/executor.go: New(Config{Registry, Models, Defaults, Ports, Compactor,
ContextTokens, SystemHeader}) + Run -> Result{RunID, Output, Steps, Usage,
Err}. Budget gate (pre-run), model resolve, Audit StartRun/recorder
(satisfies RunTally, stamped on inv.RunState), toolbox build, step observer
(zips tool calls/results -> emitter + recorder.OnStep/OnTool), V10
detached-MaxRuntime context with caller-cancel merged back, compaction wired
from ContextTokens×ratio, audit Close + Budget Commit on a detached cleanup
ctx. Zero Ports = a bounded in-memory run (gadfly's case).
- run/executor_test.go: hermetic end-to-end run against majordomo's fake
provider (hello-world), Budget-rejection (no model call), Audit-port wiring
(StartRun + Close with terminal status/output). All green under -race.
- examples/minimal upgraded to the real "hello, agentic world" (~15 lines:
Configure tiers -> run.New -> Run -> print). README/CLAUDE.md updated.
Remaining P2 follow-ups (incremental): wire Critic/Checkpointer/PaletteSource/
Delivery into the loop, multi-phase Pipelines, and the no-tools direct path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>