executus

Author	SHA1	Message	Date
steve	0dd2ced717	fix(run): address gadfly review of the phases PR executus CI / test (pull_request) Successful in 48s Details Real findings from the consensus review (37 raw; many devstral dups/noise): - Optional/budget-salvage branches no longer swallow a context cancellation / deadline / critic-kill: such errors return immediately so the run is classified cancelled/timeout/killed, not "ok" with a fallback. (the most serious finding — an Optional final phase could mask a killed run) - IsRunFunc bare phase now feeds the SHARED step observer (not just the audit recorder), so the critic's activity clock + Result.Steps see it — a long synthesize phase no longer looks idle to the critic. - phaseModel returns the resolver's enriched (usage-attribution) context and the phase's calls use it, mirroring the single-loop path (non-base-tier phases were mis-attributed). - salvagePhaseTranscript trims the tail on a rune boundary (was a raw byte slice that could split a UTF-8 rune); maxSalvage is now a named const with rationale. - expandPhaseTemplate logs a WARN on parse/execute failure instead of silently returning the unexpanded template; documented the phase-name identifier requirement + the "Query" shadow. - removed the dead phaseDeps.baseTier field. - extracted multimodalUserMessage, shared by runAgent + the phase runner (was duplicated image-folding). - aggregated phase usage is stamped onto the result even on a hard-error return; TrimSpace computed once; filterToolbox returns the base toolbox as-is for the empty-names (full-palette) case instead of copying; phaseModel WARN no longer prints error=<nil>. New test: Optional phase does not swallow a cancellation. Full ./... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 15:44:04 -04:00
steve	30b79a330f	feat(run): execute multi-phase pipelines (RunnableAgent.Phases) executus CI / test (pull_request) Successful in 1m49s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 13m59s Details The kernel carried RunnableAgent.Phases as a DTO but never executed it — Run always ran a single agent loop with ra.SystemPrompt, so a phased agent (mort's deepresearch/research) silently ran one loop with the base prompt instead of its pipeline. This implements the phase loop, ported from mort's agentexec pipeline but reusing the kernel's own machinery. - run/phases.go: runPhases / runOnePhase. Phases run sequentially; each is a fresh agent loop (or a bare LLM call for IsRunFunc phases) with its own template-expanded system prompt ({{.Query}} + {{.<PhaseName>}}), model tier, step cap, and tool subset. Outputs thread into later phases; the final phase's output is the run output. Optional phases swallow errors and substitute FallbackMessage; a non-optional phase that merely exhausts its step/tool budget salvages its partial transcript and continues (a hard error still aborts); per-phase tier-resolve failures fall back with a WARN. - run/agent.go: Phase gains IsRunFunc + FallbackMessage (the kernel Phase struct previously omitted them). - run/executor.go: Run factors the shared agent options (tool-error limits, step observer, compactor) and branches — single loop (critic's dynamic step ceiling) vs the phase runner (fixed per-phase caps; the run-level critic's steer + hard deadline still apply across phases). systemPrompt now delegates to systemPromptWithBody so each phase keeps the platform header. The same step observer feeds audit/steps/critic across all phases. Tests (run/phases_test.go): sequential output threading + template expansion, Optional-failure → FallbackMessage continues, hard-error abort, IsRunFunc bare call, per-phase SystemHeader, filterToolbox subset, template expansion. Full ./... suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 15:14:45 -04:00
steve	2ef88f2a73	feat(run): InputFileStager seam — stage non-image attachments into the prompt Adversarial Review (Gadfly) / review (pull_request) Has been cancelled Details executus CI / test (pull_request) Successful in 2m21s Details executus's tool.Invocation already carried InputFiles (audio/PDF/binary), but the executor never staged them — only Images were folded into the run. This adds the host seam mort's chat/chatbot surfaces need for audio-input parity with agentexec. - run.Ports gains InputFiles InputFileStager (nil-safe; nil = input files silently ignored, run still proceeds text-only). The interface mirrors mort's skill FileStorage: StageInputFile(ctx, runID, agentID, name, mime, content) → file_id. - run/input_files.go (ported from mort agentexec/input_files.go): stageInputFiles persists each file under run scope and appends an [ATTACHED FILES] descriptor block to the prompt so the agent can reach them by file_id (e.g. code_exec files_in → /workspace/<name>). Bytes are NEVER inlined into model context. Best-effort: empty/oversized(>50MB)/save-error files are skipped; colliding base names are disambiguated (name-2, name-3) so they don't clobber at /workspace/<name>. - Executor.Run calls it after the model/toolbox build, before the loop, so the descriptor rides the first user turn (alongside the existing Images folding). Tests: stages + builds the block; nil stager / no files leave the prompt intact; dedup; empty/save-error skipping. Full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 13:02:55 -04:00
steve	0acaa8c9a5	run: guard empty text part in runAgent + drop cross-repo doc ref (gadfly #16 ) executus CI / test (pull_request) Successful in 1m46s Details Every reviewer flagged that runAgent appended llm.Text(input) unconditionally, so an image-only run (blank prompt) emitted an empty TextPart — inconsistent with the sibling runSession.AttachImages which guards it. Mirror that guard (strings.TrimSpace(input) != ""). Also: - copy opts before appending (variadic backing array can have spare capacity; avoid aliasing a caller's slice). - reword the doc comment to drop the mort-agentexec reference (executus is a standalone lib; a consumer name doesn't belong in its godoc). Tests: image+text are co-located in ONE user message; an image-only run emits no blank TextPart. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 01:11:15 -04:00
steve	a35c176b42	run: fold inv.Images into the initial user message (multimodal opening turn) executus CI / test (pull_request) Successful in 46s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 6m5s Details The executor passed only the text `input` to majordomo's agent.Run, silently dropping inv.Images — so a multimodal run (vision: chatbot @mention, chat API) lost its images on the executus path. majordomo's Run input arg is text-only, so fold the images into the first user message (text + image parts) via WithHistory and call Run with empty input, mirroring mort agentexec's multimodal seeding. The image-less path is unchanged (prompt passes straight through). Tests: a run with Images carries the image bytes + prompt into the first model request; the text-only path still reaches the model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 00:37:53 -04:00
steve	784d5d7ce4	run: PostRun detached ctx + panic-isolated Cleanup (gadfly #12 ) executus CI / test (pull_request) Successful in 45s Details executus CI / test (push) Successful in 1m47s Details Two convergent gadfly refinements on the PostRun wiring: - PostRun now runs on detach(ctx), not the caller's ctx — a finished/cancelled caller no longer aborts artifact production (3-model: glm-5.2/minimax/deepseek). - Cleanup is panic-isolated via safeCleanup (recover+log), matching runPostRun, so a misbehaving teardown can't clobber an otherwise-successful run (deepseek). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 18:33:41 -04:00
steve	4e179259de	run: wire SessionToolFactory + PostRun artifacts + AttachImages executus CI / test (pull_request) Successful in 1m49s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 5m19s Details The session-tool TYPES already lived in tool/ (P4 move) but the executor never used them. This wires them, unblocking artifact-producing host surfaces (mort's chat API / chatbot / .skill / scaddy) to run on executus: - run/session.go: steerMailbox (thread-safe message queue) + runSession (tool.AgentSession over it: AttachImages → a user-role multimodal message injected before the agent's next step) + runPostRun (panic-isolated hook call). - executor: create the mailbox + set inv.AttachImages BEFORE the toolbox build; add inv.ExtraTools + a SessionToolFactory's per-run Tools to the toolbox; defer its Cleanup; merge the session mailbox with the critic's nudges into ONE WithSteer; after the run, call PostRun with the full transcript (runRes.Messages) → Result.PostRunResult (best-effort, never fails the run). - run.Result += PostRunResult *tool.PostRunResult. - dropped the now-dead criticBinding.steerOptions (superseded by drainSteer). Tests: a factory whose PostRun emits an artifact from the output+transcript + Cleanup lands on Result.PostRunResult; a factory-added tool is callable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 18:13:16 -04:00
steve	be4bbbcad5	run: fix statusFor — don't relabel a generic error / caller-cancel as timeout (gadfly #11 ) executus CI / test (pull_request) Successful in 47s Details executus CI / test (push) Successful in 45s Details The WithCancelCause+timer rewrite made MaxRuntime surface as Canceled (not DeadlineExceeded), so statusFor's context.Cause(DeadlineExceeded) check could relabel (a) a genuine run error as 'timeout' and (b) a caller cancel/deadline as 'timeout' (was 'cancelled'). Convergent gadfly finding (glm-5.2 + cluster). Fix: keep MaxRuntime as WithTimeout (its DeadlineExceeded propagates → 'timeout', preserving own-timeout vs caller-cancel), add a NESTED WithCancelCause layer only for the kill. statusFor consults context.Cause ONLY for ErrCriticKill; everything else is classified by the run error itself. Tests: generic-error-not-relabeled + caller-cancel-stays-cancelled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 17:00:26 -04:00
steve	390e6cf905	run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status) executus CI / test (pull_request) Successful in 46s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 22m30s Details Completes the run-critic seam so a host adapter (mort's agentcritic) has full fidelity, closing the two limitations gadfly surfaced on mort #1334. - RecordStep(iter int, resp *llm.Response): the completed step's model response is now passed to the critic (was index-only), so a host that records a trace (mort's ProgressRecorder) can show what the agent actually produced, not just an iteration count. The executor forwards s.Response; the battery ignores it (its Progress is count-based). - CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes an explicit critic KILL from a natural backstop expiry. runCtx uses a cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause → killed / timeout / cancelled are now distinct (were all "cancelled"). The battery sets killCause from Decision.KillReason on a Kill. Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle + battery RecordStep/KillCause signatures. Core stays battery-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 16:35:13 -04:00
steve	4ba83ab905	run: critic can raise a run's step ceiling mid-flight (CriticHandle.MaxSteps) executus CI / test (pull_request) Failing after 1m1s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 21m8s Details Prerequisite for a full-fidelity mort agentcritic adapter (which adjusts a healthy-but-long run's iteration budget, not just its deadline). executus's CriticHandle was deadline+steer only; this adds the dynamic step ceiling above an unchanged majordomo (which already exposes WithMaxStepsFunc). - run.RunInfo += MaxIterations (the run's base ceiling, so a critic can raise it relative to the baseline). - run.CriticHandle += MaxSteps() int — polled by the executor each step via agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else WithMaxSteps. - critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) + MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling alongside ExtendBy. ExtendOnce default is unchanged (time-only). Test: a critic returning MaxSteps=5 lets a base-MaxIterations=1 run complete two tool-dispatch steps past the base ceiling. Core stays battery-free (run doesn't import critic). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:16:03 -04:00
steve	4aa06f652e	C0b: address verified gadfly findings (panic-safety + test honesty) executus CI / test (pull_request) Failing after 58s Details From PR #9 (minimax + deepseek): - Run now has a top-level recover() — the "never propagates a panic" promise was unenforced; a panicking host Port (Critic/Audit/Palette) on the run goroutine now becomes Result.Err instead of unwinding into the caller. - The critic deadline-watch goroutine recovers panics from a host Deadline() (it's a separate goroutine, so Run's recover can't catch it) — a buggy CriticHandle can't crash the process. - CriticHandle interface documents its concurrency contract (Record*/Steer on the run goroutine vs Deadline()/Stop() from the watch goroutine — impls must be concurrent-safe; the critic battery already is). - startCritic's dead `soft <= 0 -> noop` guard (withFallbacks already coerces to 90s) replaced with a defensive inline 90s default, so a bypass of withFallbacks still gets a working critic instead of silently none. - Delivery tests made honest: the old "error path" test only checked the early-return (no delivery); added TestDeliverErrorOnRunFailure (in-loop model error -> DeliverError to the target) + renamed the early-return test. Graded all #9 findings in the gadfly MCP. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 10:09:22 -04:00
steve	43b2471737	C0b: wire Critic + Delivery into run.Executor executus CI / test (pull_request) Failing after 1m0s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 5m9s Details Continues finishing the executor's run.Ports wiring (after C0's Palette). Critic (run/critic.go): when Ports.Critic is set and the agent enables it, the executor calls Monitor at run start, feeds RecordStep/RecordToolStart from the step observer, drains the critic's Steer messages into the loop via agent.WithSteer, and binds the run's hard cancellation to the critic's (extendable) Deadline through a watch goroutine — a healthy-but-slow run gets room while a hung one is killed. Stop() on run end. Soft timeout from Defaults.CriticSoftTimeout (default 90s). nil-safe: no critic / not-enabled = no-op. Delivery (run/executor.go deliver): after the run, when Ports.Delivery is set and inv.DeliveryID is non-empty, the executor posts Result.Output (or DeliverError on failure) to a host-interpreted deliver.Target {inv.DeliveryKind, inv.DeliveryID}. Empty target = caller reads Result.Output itself (the synchronous default; the `.agent run` canary). Best-effort + detached. tool.Invocation gains DeliveryKind/DeliveryID (host-set egress target). Tests: critic monitored/fed/steered/stopped when enabled, untouched when not; delivery posts on a target, skips without one. Deferred: Checkpointer (needs a majordomo hook to snapshot the running message history). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 10:00:05 -04:00
steve	9d41987b0e	C0: wire Palette delegation into run.Executor (skill__/agent__ tools) executus CI / test (pull_request) Failing after 1m2s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 3m47s Details The first cutover prerequisite: the executor now turns an agent's SkillPalette / SubAgentPalette into delegation tools so a mort agent that delegates works through run.Executor (the piece the `.agent run` canary needs beyond the already-wired audit/budget). - run/palette.go: addDelegationTools builds a skill__<name> tool (structured inputs) per SkillPalette entry and an agent__<name> tool (prompt) per SubAgentPalette entry, each invoking run.Ports.Palette as a CHILD of the current run (parentRunID = inv.RunID, inheriting caller + channel). A non-ok child status is surfaced to the parent with the partial output. nil-safe: no PaletteSource or empty palette → no delegation tools (unchanged behavior). - executor.go: call it right after building the low-level toolbox. Tests: the model calls skill__helper → routed through Palette with the right name/caller/inputs/parent; nil palette → run still works. Deferred to C0b (the remaining run.Ports executor wiring): Critic (soft-timeout monitor + deadline binding + steer), Delivery (output egress for surfaces that need executor-side delivery), Checkpointer (needs a majordomo message-history hook to snapshot resumable state). The `.agent run` canary delivers its returned Result.Output itself, so these aren't on its critical path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 09:28:01 -04:00
steve	7b3da87c08	fix: address verified gadfly P2 findings (9 real of 18) Independently verified all 18 gadfly findings against the code (18-agent fan-out). Fixed the 9 real ones; the other 9 were false-positive / hallucinated / valid-tradeoff (no change). High: - F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the agent loop and nil-panicked. Now a clean error (Run never panics). +test. - F9 compactor data-leak: renderTranscript sent tool-call args verbatim to the summarizer (a possibly-different provider/tier); secret-bearing tool args (mcp_call/email_send/http_/webhook_) are now redacted, with a doc note that result bodies still flow (summary needs them). Medium/minor: - F2 compactor error path returned the folded slice, not the original msgs (contradicting the documented non-fatal contract) -> return msgs. - F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) / cancelled (Canceled) via statusFor. +test. - F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs min(calls,results) only. - F5 SetIteration was never called -> RunState.Iteration always 0; the step observer now updates it each loop. - F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue). - F7 estimateTokens had no default arm (future Part kinds counted as 0); unknown parts now counted conservatively. - F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now a clear "response exceeded N bytes" via readCapped. - F12 step observer captured the caller ctx; now the merged runCtx. - F13 compaction onFire was nil (doc claimed it logged); now wired to audit LogEvent("compaction_fired"). - F11 (no pre-dispatch hook in majordomo) documented honestly as a known limitation; F18 UsageSink doc clarified cache tokens are subsets of input. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 02:02:21 +00:00
steve	dfbc5a42b9	P2: run.Executor — executus is runnable The capstone of the run kernel: run.Executor.Run(ctx, RunnableAgent, inv) ties model resolution + the tool registry + majordomo's agent loop + context compaction + run-bounding + step/audit instrumentation into one path, with every host concern behind the nil-safe run.Ports. - run/executor.go: New(Config{Registry, Models, Defaults, Ports, Compactor, ContextTokens, SystemHeader}) + Run -> Result{RunID, Output, Steps, Usage, Err}. Budget gate (pre-run), model resolve, Audit StartRun/recorder (satisfies RunTally, stamped on inv.RunState), toolbox build, step observer (zips tool calls/results -> emitter + recorder.OnStep/OnTool), V10 detached-MaxRuntime context with caller-cancel merged back, compaction wired from ContextTokens×ratio, audit Close + Budget Commit on a detached cleanup ctx. Zero Ports = a bounded in-memory run (gadfly's case). - run/executor_test.go: hermetic end-to-end run against majordomo's fake provider (hello-world), Budget-rejection (no model call), Audit-port wiring (StartRun + Close with terminal status/output). All green under -race. - examples/minimal upgraded to the real "hello, agentic world" (~15 lines: Configure tiers -> run.New -> Run -> print). README/CLAUDE.md updated. Remaining P2 follow-ups (incremental): wire Critic/Checkpointer/PaletteSource/ Delivery into the loop, multi-phase Pipelines, and the no-tools direct path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 02:02:21 +00:00

15 Commits