cb4c612461c394d327927d2b874372e39e0f4b46
27 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
cb4c612461 |
fix(run): address gadfly review of the critic-deadline PR
executus CI / test (pull_request) Successful in 1m45s
All 11 findings were real (3 clusters): - Failsafe ceiling could pre-empt the critic's backstop (e9c9483f, 9109317b, d5a9bf0d, 76ad171e): CriticAbsoluteMax was 6h, but the host's backstop (MaxRuntime × multiplier, or its own absolute max) can reach 6h+, so the ceiling fired first and reintroduced a premature hard cap. Now CriticAbsoluteMax is a 24h RUNAWAY guard set far beyond any realistic backstop (the host clamps its own backstop to a much smaller absolute max, e.g. mort's 6h convar), so it never pre-empts a healthy supervised run. Comments corrected. - nil Monitor handle lost the MaxRuntime cap (df016a6f, 9dd42827): a critic-enabled run whose host Monitor returned no handle had no deadline-watch and was bounded only by the generous ceiling. Added an unsupervised-run failsafe that re-wraps runCtx to the nominal MaxRuntime when the critic is enabled but didn't arm. New test TestCriticOwnsDeadline_NilHandleFallsBackToMaxRuntime. - CriticSoftTimeout vestigial / dead fallback (f7764919, 9805bebe, 6864086f, b2b11721): the soft trigger is now always the resolved MaxRuntime (> 0), so the CriticSoftTimeout field + its startCritic fallback were unreachable. Removed the field entirely; the remaining 90s floor is documented as defensive-only. - DRY (f30ce827): extracted e.criticOwnsDeadline(ra), now the single predicate used by both Run and startCritic so they can't drift. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X |
||
|
|
5b5ee4148e |
feat(run): critic owns the deadline — MaxRuntime becomes the soft trigger
When a run enables the critic (Ports.Critic set + RunnableAgent.Critic.Enabled), the kernel no longer hard-caps it at MaxRuntime. MaxRuntime becomes the SOFT trigger (passed to startCritic, used by the host critic as its wake + the base for its extendable backstop); the critic's deadline-watch is the real hard cancel. This restores mort's old agentexec two-tier timeout semantics — a slow-but-progressing run (e.g. a parent agent blocked on a 30-min animate render) is given room up to the critic's backstop instead of being killed at the nominal MaxRuntime. Specifics: - run/executor.go: the WithTimeout(MaxRuntime) is now conditional. Non-critic runs keep the literal MaxRuntime kill (→ "timeout"). Critic-owned runs get a GENEROUS WithTimeout at the new Defaults.CriticAbsoluteMax (default 6h) as a failsafe ceiling only — it never fires before the critic's backstop, and it guarantees a broken/nil host handle can't run unbounded. - run/critic.go: startCritic takes the resolved MaxRuntime as the soft trigger (falling back to Defaults.CriticSoftTimeout, then 90s), instead of always using the global CriticSoftTimeout. - Defaults.CriticAbsoluteMax added (withFallbacks default 6h). - Tests: non-critic dies at MaxRuntime; critic-owned survives past it; soft trigger == MaxRuntime. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Jo75sqmeVPgFUWZQBn179X |
||
|
|
38d656ec71 |
fix(run): address gadfly review of the checkpoint PR
executus CI / test (pull_request) Successful in 45s
Real findings from the consensus review (44 raw; heavy devstral noise): - finalizeCheckpoint is now fired from the top-of-Run defer, so it runs on EVERY exit: a panic, an early build-error return (before the run loop), AND normal completion. Previously an early return on a recovered run left its durable record unfinalized → boot recovery would retry it forever on a persistent build error. (opus + glm) - Removed the dead ActivePhase field from run.RunCheckpointState + run.ResumeState (and the battery RunCheckpoint) — phase recovery is boundary-granular (skip completed phases; the interrupted phase re-runs from its start), so ActivePhase was never written nor read. Docs across ports/checkpoint/phases now state this plainly (5-model consensus that the field + docs over-promised mid-phase resume). - CheckpointerFactory.Begin error is now logged (WARN) before degrading to non-durable, per the documented contract (was silently swallowed). (4 models) - finalizeCheckpoint logs Complete/Fail errors (was silent). - Resume phase-skip now keys off a SEPARATE resumeSkip set, not the live outputs map — a fresh run with two same-named phases no longer skips the second (the outputs map fills as phases run). (opus:max) + regression test. - Removed the dead checkpoint.factory.now field (never set). (opus + glm) - Fixed the stale phaseDeps doc (the step observer moved out of sharedOpts to per-path). Hoisted the resume guard to a local; dropped the wasted acc allocation on the resume path; documented that Save throttling is the Checkpointer's responsibility and the accumulated transcript is pre-compaction (host size-caps it). Note (carried from the PR): classifyCheckpointOutcome keys shutdown on run.ErrShutdown; mort stamps its own runengine.ErrShutdown — the mort wiring PR aliases them so errors.Is matches. New test: duplicate phase names both run on a fresh run. Full ./... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
899059a791 |
feat(run): durable checkpoint + resume (wire Ports.Checkpointer)
The kernel defined run.Ports.Checkpointer + the checkpoint battery but never drove them (the documented "P2 follow-up"). This wires durable recovery into the run loop so a run interrupted by shutdown can resume on the next boot instead of being lost — the executus-side half of mort's durable-agent-recovery parity (mort #1355). Kernel (run/): - Ports.Checkpointer is now a CheckpointerFactory (Begin per run → a per-run Checkpointer, or nil for a non-durable run). The single per-instance Checkpointer couldn't distinguish runs; a factory mints one per run, matching mort's agentexec.CheckpointerFactory. - RunInfo gains GuildID + ModelTier (so the factory can build resume meta); RunCheckpointState gains CompletedPhases + ActivePhase (+ PhaseOutput). - run/checkpoint.go: ResumeState + WithResumeState / WithExistingCheckpointer context carriers, classifyCheckpointOutcome (success→Complete, shutdown→leave for boot recovery, else→Fail using run.ErrShutdown), and finalizeCheckpoint. - run/executor.go: resolve the per-run checkpointer (existing-from-ctx on a recovery re-run, else factory.Begin); single-loop wraps the step observer to accumulate the transcript + Save each step (host throttles), and a recovered run seeds the saved transcript via WithHistory and continues with no new input; finalize on exit. - run/phases.go: phase-boundary checkpointing — record completed phases after each phase; a resumed run skips already-completed phases (the interrupted phase re-runs from its start — boundary-granular, documented; only the single-loop path resumes mid-loop). Battery (checkpoint/): NewFactory wires the battery into the factory port (per-run handle, meta derived from RunInfo); RunCheckpoint + handle.Save carry the phase fields. Tests (run/checkpoint_test.go): the finalize decision matrix; single-loop Save+Complete; terminal-error Fail; resume seeds history; phase-boundary Saves completed phases; resume skips completed phases. Full ./... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
0dd2ced717 |
fix(run): address gadfly review of the phases PR
executus CI / test (pull_request) Successful in 48s
Real findings from the consensus review (37 raw; many devstral dups/noise): - Optional/budget-salvage branches no longer swallow a context cancellation / deadline / critic-kill: such errors return immediately so the run is classified cancelled/timeout/killed, not "ok" with a fallback. (the most serious finding — an Optional final phase could mask a killed run) - IsRunFunc bare phase now feeds the SHARED step observer (not just the audit recorder), so the critic's activity clock + Result.Steps see it — a long synthesize phase no longer looks idle to the critic. - phaseModel returns the resolver's enriched (usage-attribution) context and the phase's calls use it, mirroring the single-loop path (non-base-tier phases were mis-attributed). - salvagePhaseTranscript trims the tail on a rune boundary (was a raw byte slice that could split a UTF-8 rune); maxSalvage is now a named const with rationale. - expandPhaseTemplate logs a WARN on parse/execute failure instead of silently returning the unexpanded template; documented the phase-name identifier requirement + the "Query" shadow. - removed the dead phaseDeps.baseTier field. - extracted multimodalUserMessage, shared by runAgent + the phase runner (was duplicated image-folding). - aggregated phase usage is stamped onto the result even on a hard-error return; TrimSpace computed once; filterToolbox returns the base toolbox as-is for the empty-names (full-palette) case instead of copying; phaseModel WARN no longer prints error=<nil>. New test: Optional phase does not swallow a cancellation. Full ./... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
30b79a330f |
feat(run): execute multi-phase pipelines (RunnableAgent.Phases)
The kernel carried RunnableAgent.Phases as a DTO but never executed it —
Run always ran a single agent loop with ra.SystemPrompt, so a phased agent
(mort's deepresearch/research) silently ran one loop with the base prompt
instead of its pipeline. This implements the phase loop, ported from mort's
agentexec pipeline but reusing the kernel's own machinery.
- run/phases.go: runPhases / runOnePhase. Phases run sequentially; each is a
fresh agent loop (or a bare LLM call for IsRunFunc phases) with its own
template-expanded system prompt ({{.Query}} + {{.<PhaseName>}}), model
tier, step cap, and tool subset. Outputs thread into later phases; the
final phase's output is the run output. Optional phases swallow errors and
substitute FallbackMessage; a non-optional phase that merely exhausts its
step/tool budget salvages its partial transcript and continues (a hard
error still aborts); per-phase tier-resolve failures fall back with a WARN.
- run/agent.go: Phase gains IsRunFunc + FallbackMessage (the kernel Phase
struct previously omitted them).
- run/executor.go: Run factors the shared agent options (tool-error limits,
step observer, compactor) and branches — single loop (critic's dynamic
step ceiling) vs the phase runner (fixed per-phase caps; the run-level
critic's steer + hard deadline still apply across phases). systemPrompt
now delegates to systemPromptWithBody so each phase keeps the platform
header. The same step observer feeds audit/steps/critic across all phases.
Tests (run/phases_test.go): sequential output threading + template
expansion, Optional-failure → FallbackMessage continues, hard-error abort,
IsRunFunc bare call, per-phase SystemHeader, filterToolbox subset, template
expansion. Full ./... suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
df4033f42e |
fix(run): harden input-file staging per gadfly #18 validation pass
executus CI / test (pull_request) Successful in 48s
Second-pass findings on the security fix: - Mime sanitized ONCE and passed to BOTH StageInputFile and the descriptor (was passing raw f.MimeType to the host store while only the descriptor sanitized) — 3 models. - sanitizeField now also strips Unicode format chars (category Cf, incl. the bidi overrides U+202A–U+202E that can reorder how the descriptor renders); IsControl already covers \n\r\t so the explicit checks are dropped. - fileID is sanitized before inlining + an empty file_id drops the file (defense vs a misbehaving stager). - humanizeBytes clamps the prefix index so an absurd size (≥1024^6) can't index past "KMGTPE" and panic — a no-panic guarantee independent of the per-file cap. - Docs sync: README Ports list gains InputFiles; tool.InputFile.Name doc now says the executor reduces an untrusted name to a safe base name (was claiming the field is already safe). Tests: bidi/control stripping; mime sanitized in staged value + descriptor; empty file_id drop; humanizeBytes no-panic across sizes up to 1<<62. Suite green (-race). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
1e65f4b6e5 |
fix(run): sanitize input-file names — path-traversal + prompt-injection hardening (gadfly #18)
executus CI / test (pull_request) Successful in 48s
The full swarm (5-6 models) flagged that stageInputFiles passed the untrusted attachment filename straight to StageInputFile and inlined it into the [ATTACHED FILES]/`/workspace/<name>` descriptor with no sanitization — a path the byte-cap already treats as a trust boundary. A name like ../../etc/passwd or an absolute/drive path could escape the host store or the sandbox workspace, and newlines in the name/mime could inject text into the prompt block. - sanitizeName: strips control chars/newlines, then reduces to a base name (path.Base after backslash-normalization) so ../, nested dirs, and absolute / drive paths all collapse to their last element; "attachment" fallback for empty/"."/"..". Applied BEFORE staging AND inlining. - sanitizeField: strips control chars from MimeType (also inlined verbatim). - maxInputFiles (32) count cap — defense-in-depth vs a flood of tiny files, independent of the per-file byte cap. Tests: sanitizeName table (traversal/absolute/backslash/control/fallback, + no-separator invariant); traversal staged+described under the base name only; oversize skip; count-cap truncation. Full suite green (-race). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
2ef88f2a73 |
feat(run): InputFileStager seam — stage non-image attachments into the prompt
executus's tool.Invocation already carried InputFiles (audio/PDF/binary), but the executor never staged them — only Images were folded into the run. This adds the host seam mort's chat/chatbot surfaces need for audio-input parity with agentexec. - run.Ports gains InputFiles InputFileStager (nil-safe; nil = input files silently ignored, run still proceeds text-only). The interface mirrors mort's skill FileStorage: StageInputFile(ctx, runID, agentID, name, mime, content) → file_id. - run/input_files.go (ported from mort agentexec/input_files.go): stageInputFiles persists each file under run scope and appends an [ATTACHED FILES] descriptor block to the prompt so the agent can reach them by file_id (e.g. code_exec files_in → /workspace/<name>). Bytes are NEVER inlined into model context. Best-effort: empty/oversized(>50MB)/save-error files are skipped; colliding base names are disambiguated (name-2, name-3) so they don't clobber at /workspace/<name>. - Executor.Run calls it after the model/toolbox build, before the loop, so the descriptor rides the first user turn (alongside the existing Images folding). Tests: stages + builds the block; nil stager / no files leave the prompt intact; dedup; empty/save-error skipping. Full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
0acaa8c9a5 |
run: guard empty text part in runAgent + drop cross-repo doc ref (gadfly #16)
executus CI / test (pull_request) Successful in 1m46s
Every reviewer flagged that runAgent appended llm.Text(input) unconditionally, so an image-only run (blank prompt) emitted an empty TextPart — inconsistent with the sibling runSession.AttachImages which guards it. Mirror that guard (strings.TrimSpace(input) != ""). Also: - copy opts before appending (variadic backing array can have spare capacity; avoid aliasing a caller's slice). - reword the doc comment to drop the mort-agentexec reference (executus is a standalone lib; a consumer name doesn't belong in its godoc). Tests: image+text are co-located in ONE user message; an image-only run emits no blank TextPart. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
a35c176b42 |
run: fold inv.Images into the initial user message (multimodal opening turn)
The executor passed only the text `input` to majordomo's agent.Run, silently dropping inv.Images — so a multimodal run (vision: chatbot @mention, chat API) lost its images on the executus path. majordomo's Run input arg is text-only, so fold the images into the first user message (text + image parts) via WithHistory and call Run with empty input, mirroring mort agentexec's multimodal seeding. The image-less path is unchanged (prompt passes straight through). Tests: a run with Images carries the image bytes + prompt into the first model request; the text-only path still reaches the model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
784d5d7ce4 |
run: PostRun detached ctx + panic-isolated Cleanup (gadfly #12)
Two convergent gadfly refinements on the PostRun wiring: - PostRun now runs on detach(ctx), not the caller's ctx — a finished/cancelled caller no longer aborts artifact production (3-model: glm-5.2/minimax/deepseek). - Cleanup is panic-isolated via safeCleanup (recover+log), matching runPostRun, so a misbehaving teardown can't clobber an otherwise-successful run (deepseek). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
4e179259de |
run: wire SessionToolFactory + PostRun artifacts + AttachImages
The session-tool TYPES already lived in tool/ (P4 move) but the executor never used them. This wires them, unblocking artifact-producing host surfaces (mort's chat API / chatbot / .skill / scaddy) to run on executus: - run/session.go: steerMailbox (thread-safe message queue) + runSession (tool.AgentSession over it: AttachImages → a user-role multimodal message injected before the agent's next step) + runPostRun (panic-isolated hook call). - executor: create the mailbox + set inv.AttachImages BEFORE the toolbox build; add inv.ExtraTools + a SessionToolFactory's per-run Tools to the toolbox; defer its Cleanup; merge the session mailbox with the critic's nudges into ONE WithSteer; after the run, call PostRun with the full transcript (runRes.Messages) → Result.PostRunResult (best-effort, never fails the run). - run.Result += PostRunResult *tool.PostRunResult. - dropped the now-dead criticBinding.steerOptions (superseded by drainSteer). Tests: a factory whose PostRun emits an artifact from the output+transcript + Cleanup lands on Result.PostRunResult; a factory-added tool is callable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
be4bbbcad5 |
run: fix statusFor — don't relabel a generic error / caller-cancel as timeout (gadfly #11)
The WithCancelCause+timer rewrite made MaxRuntime surface as Canceled (not DeadlineExceeded), so statusFor's context.Cause(DeadlineExceeded) check could relabel (a) a genuine run error as 'timeout' and (b) a caller cancel/deadline as 'timeout' (was 'cancelled'). Convergent gadfly finding (glm-5.2 + cluster). Fix: keep MaxRuntime as WithTimeout (its DeadlineExceeded propagates → 'timeout', preserving own-timeout vs caller-cancel), add a NESTED WithCancelCause layer only for the kill. statusFor consults context.Cause ONLY for ErrCriticKill; everything else is classified by the run error itself. Tests: generic-error-not-relabeled + caller-cancel-stays-cancelled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
390e6cf905 |
run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status)
Completes the run-critic seam so a host adapter (mort's agentcritic) has full fidelity, closing the two limitations gadfly surfaced on mort #1334. - RecordStep(iter int, resp *llm.Response): the completed step's model response is now passed to the critic (was index-only), so a host that records a trace (mort's ProgressRecorder) can show what the agent actually produced, not just an iteration count. The executor forwards s.Response; the battery ignores it (its Progress is count-based). - CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes an explicit critic KILL from a natural backstop expiry. runCtx uses a cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause → killed / timeout / cancelled are now distinct (were all "cancelled"). The battery sets killCause from Decision.KillReason on a Kill. Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle + battery RecordStep/KillCause signatures. Core stays battery-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
4ba83ab905 |
run: critic can raise a run's step ceiling mid-flight (CriticHandle.MaxSteps)
Prerequisite for a full-fidelity mort agentcritic adapter (which adjusts a healthy-but-long run's iteration budget, not just its deadline). executus's CriticHandle was deadline+steer only; this adds the dynamic step ceiling above an unchanged majordomo (which already exposes WithMaxStepsFunc). - run.RunInfo += MaxIterations (the run's base ceiling, so a critic can raise it relative to the baseline). - run.CriticHandle += MaxSteps() int — polled by the executor each step via agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else WithMaxSteps. - critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) + MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling alongside ExtendBy. ExtendOnce default is unchanged (time-only). Test: a critic returning MaxSteps=5 lets a base-MaxIterations=1 run complete two tool-dispatch steps past the base ceiling. Core stays battery-free (run doesn't import critic). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
97154395e6 |
C0b: document recordToolStart post-iteration timing (gadfly glm finding)
majordomo's step observer fires post-iteration, so the critic's activity clock refreshes per-iteration, not mid-tool — a single long tool call won't refresh it until it returns. Documented + the host-progress-bridge mitigation (mort's pattern). A true pre-dispatch hook needs majordomo support (follow-up). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
4aa06f652e |
C0b: address verified gadfly findings (panic-safety + test honesty)
executus CI / test (pull_request) Failing after 58s
From PR #9 (minimax + deepseek): - Run now has a top-level recover() — the "never propagates a panic" promise was unenforced; a panicking host Port (Critic/Audit/Palette) on the run goroutine now becomes Result.Err instead of unwinding into the caller. - The critic deadline-watch goroutine recovers panics from a host Deadline() (it's a separate goroutine, so Run's recover can't catch it) — a buggy CriticHandle can't crash the process. - CriticHandle interface documents its concurrency contract (Record*/Steer on the run goroutine vs Deadline()/Stop() from the watch goroutine — impls must be concurrent-safe; the critic battery already is). - startCritic's dead `soft <= 0 -> noop` guard (withFallbacks already coerces to 90s) replaced with a defensive inline 90s default, so a bypass of withFallbacks still gets a working critic instead of silently none. - Delivery tests made honest: the old "error path" test only checked the early-return (no delivery); added TestDeliverErrorOnRunFailure (in-loop model error -> DeliverError to the target) + renamed the early-return test. Graded all #9 findings in the gadfly MCP. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
43b2471737 |
C0b: wire Critic + Delivery into run.Executor
Continues finishing the executor's run.Ports wiring (after C0's Palette).
Critic (run/critic.go): when Ports.Critic is set and the agent enables it, the
executor calls Monitor at run start, feeds RecordStep/RecordToolStart from the
step observer, drains the critic's Steer messages into the loop via
agent.WithSteer, and binds the run's hard cancellation to the critic's
(extendable) Deadline through a watch goroutine — a healthy-but-slow run gets
room while a hung one is killed. Stop() on run end. Soft timeout from
Defaults.CriticSoftTimeout (default 90s). nil-safe: no critic / not-enabled =
no-op.
Delivery (run/executor.go deliver): after the run, when Ports.Delivery is set
and inv.DeliveryID is non-empty, the executor posts Result.Output (or
DeliverError on failure) to a host-interpreted deliver.Target
{inv.DeliveryKind, inv.DeliveryID}. Empty target = caller reads Result.Output
itself (the synchronous default; the `.agent run` canary). Best-effort +
detached.
tool.Invocation gains DeliveryKind/DeliveryID (host-set egress target).
Tests: critic monitored/fed/steered/stopped when enabled, untouched when not;
delivery posts on a target, skips without one. Deferred: Checkpointer (needs a
majordomo hook to snapshot the running message history).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
0c80679719 |
C0: address verified gadfly findings (trivial fixes)
From the PR #8 review (all graded in the gadfly MCP): - skip empty palette names + dedupe by final tool name, instead of producing a "skill__" tool or an opaque box.Add duplicate error. - delegationResult: no trailing blank line when a non-ok child produced no output. - delegationErr: fold a child's partial output into the hard-failure error so it isn't silently dropped. Deferred to C0b (design-level, not trivial): route delegation through the tool.Registry gate/audit wrappers; expose the skill's real input schema to the LLM instead of a generic inputs map. typed-nil PaletteSource is left as a caller contract (the == nil guard catches the untyped-nil interface). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
9d41987b0e |
C0: wire Palette delegation into run.Executor (skill__/agent__ tools)
The first cutover prerequisite: the executor now turns an agent's SkillPalette / SubAgentPalette into delegation tools so a mort agent that delegates works through run.Executor (the piece the `.agent run` canary needs beyond the already-wired audit/budget). - run/palette.go: addDelegationTools builds a skill__<name> tool (structured inputs) per SkillPalette entry and an agent__<name> tool (prompt) per SubAgentPalette entry, each invoking run.Ports.Palette as a CHILD of the current run (parentRunID = inv.RunID, inheriting caller + channel). A non-ok child status is surfaced to the parent with the partial output. nil-safe: no PaletteSource or empty palette → no delegation tools (unchanged behavior). - executor.go: call it right after building the low-level toolbox. Tests: the model calls skill__helper → routed through Palette with the right name/caller/inputs/parent; nil palette → run still works. Deferred to C0b (the remaining run.Ports executor wiring): Critic (soft-timeout monitor + deadline binding + steer), Delivery (output egress for surfaces that need executor-side delivery), Checkpointer (needs a majordomo message-history hook to snapshot resumable state). The `.agent run` canary delivers its returned Result.Output itself, so these aren't on its critical path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
7b3da87c08 |
fix: address verified gadfly P2 findings (9 real of 18)
Independently verified all 18 gadfly findings against the code (18-agent
fan-out). Fixed the 9 real ones; the other 9 were false-positive /
hallucinated / valid-tradeoff (no change).
High:
- F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the
agent loop and nil-panicked. Now a clean error (Run never panics). +test.
- F9 compactor data-leak: renderTranscript sent tool-call args verbatim to
the summarizer (a possibly-different provider/tier); secret-bearing tool
args (mcp_call/email_send/http_*/webhook_*) are now redacted, with a doc
note that result bodies still flow (summary needs them).
Medium/minor:
- F2 compactor error path returned the folded slice, not the original msgs
(contradicting the documented non-fatal contract) -> return msgs.
- F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) /
cancelled (Canceled) via statusFor. +test.
- F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs
min(calls,results) only.
- F5 SetIteration was never called -> RunState.Iteration always 0; the step
observer now updates it each loop.
- F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue).
- F7 estimateTokens had no default arm (future Part kinds counted as 0);
unknown parts now counted conservatively.
- F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now
a clear "response exceeded N bytes" via readCapped.
- F12 step observer captured the caller ctx; now the merged runCtx.
- F13 compaction onFire was nil (doc claimed it logged); now wired to
audit LogEvent("compaction_fired").
- F11 (no pre-dispatch hook in majordomo) documented honestly as a known
limitation; F18 UsageSink doc clarified cache tokens are subsets of input.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
dfbc5a42b9 |
P2: run.Executor — executus is runnable
The capstone of the run kernel: run.Executor.Run(ctx, RunnableAgent, inv)
ties model resolution + the tool registry + majordomo's agent loop +
context compaction + run-bounding + step/audit instrumentation into one
path, with every host concern behind the nil-safe run.Ports.
- run/executor.go: New(Config{Registry, Models, Defaults, Ports, Compactor,
ContextTokens, SystemHeader}) + Run -> Result{RunID, Output, Steps, Usage,
Err}. Budget gate (pre-run), model resolve, Audit StartRun/recorder
(satisfies RunTally, stamped on inv.RunState), toolbox build, step observer
(zips tool calls/results -> emitter + recorder.OnStep/OnTool), V10
detached-MaxRuntime context with caller-cancel merged back, compaction wired
from ContextTokens×ratio, audit Close + Budget Commit on a detached cleanup
ctx. Zero Ports = a bounded in-memory run (gadfly's case).
- run/executor_test.go: hermetic end-to-end run against majordomo's fake
provider (hello-world), Budget-rejection (no model call), Audit-port wiring
(StartRun + Close with terminal status/output). All green under -race.
- examples/minimal upgraded to the real "hello, agentic world" (~15 lines:
Configure tiers -> run.New -> Run -> print). README/CLAUDE.md updated.
Remaining P2 follow-ups (incremental): wire Critic/Checkpointer/PaletteSource/
Delivery into the loop, multi-phase Pipelines, and the no-tools direct path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
130c2bdfab |
P2: move compactor -> compact/ + step instrumentation -> run/steps.go
- compact/compactor.go: the per-run stateful context compactor (token-threshold gate, fast-tier middle summarisation, fold memory) lifted from mort's skillexec/compactor.go. Self-contained; its only dependency is a ModelResolver func (model.ParseModelForContext satisfies it) + a token threshold. - run/steps.go: the step-emission/instrumentation (stepEmitter, tool->kind/ summary mapping with redaction, Result.Steps accumulation) from agentexec, repointed onto executus/tool. Both build green. With the run-loop mechanics, RunnableAgent DTO, run.Ports, compactor, and step instrumentation now all in place, the remaining P2 work is the run.Executor itself (wiring these + majordomo's agent loop), which makes executus runnable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
c732a677df |
P2: define nil-safe run.Ports (the inversion spine)
Add run/ports.go: the host seams the executor will consume, every one nil-safe so a light host runs with the zero Ports (no persistence/audit/ budget/critic/delegation/delivery) and a heavy host wires each to a battery. Ports mirror mort's existing interfaces so the batteries implement them directly: - Audit + RunRecorder (mort skillaudit.Storage/Writer): StartRun -> per-run recorder (OnStep/OnTool/LogEvent/Close), recorder satisfies RunTally. - Budget (mort skillexec.BudgetTracker): Check / Commit. - Critic + CriticHandle (mort agentcritic): Monitor -> handle with RecordStep/RecordToolStart/Steer/Deadline/Stop (the loop wiring finalizes with the executor merge). - Checkpointer (mort agentexec.RunCheckpointer): Save/Complete/Fail. - PaletteSource (mort SkillInvokerForPalette + AgentInvokerForPalette): Resolve/Invoke skill + agent delegation. Plus host-neutral RunInfo / RunStats. This completes the P2 inversion DESIGN; the agentexec+skillexec -> run.Executor merge that consumes these Ports is the remaining P2 work. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
fa644f1826 |
P2 (foundation): run-loop mechanics + RunnableAgent DTO
Stand up the executus/run kernel foundation, decoupled from mort: - runengine.go: the shared run-loop scaffolding (MergeCancellation, CleanupContextTimeout, RunFinalizer/FireFinalizers, RunStateAccessor) moved from mort. The accessor's *skillaudit.Writer dependency is inverted to a narrow run.RunTally interface (TokenStats + ToolCallsCount) — the kernel reads live tallies without importing the audit battery. - submit.go: the legacy submit-capture compat tool (stdlib + majordomo/llm). - agent.go: RunnableAgent DTO — the kernel's view of "a thing to run" (tier, prompt, caps, palette, phases, critic config). The persona Agent and saved Skill will LOWER into this DTO so the kernel never imports a noun battery. This is the spine of the agentexec.Run(*agents.Agent) inversion. run/ builds with only majordomo + executus/tool. The executor merge (agentexec+skillexec -> run.Executor) and the nil-safe run.Ports (Audit/Critic/Budget/Checkpointer/PaletteSource) are the next P2 block. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
ca243a2d50 |
P0: stand up executus harness module above majordomo
executus CI / test (push) Failing after 24s
Batteries-included agent-harness base, extracted from mort's agent layer. This first cut establishes the module + the zero-coupling core primitives: - lane, dispatchguard, pendingattach, run/progress.go: moved verbatim from mort - config: host config Source seam + env-var default (nil-safe helpers) - deliver: output-egress seam + Discard/Stdout defaults - identity: AdminPolicy + MemberResolver seams (nil-safe) - fanout: programmatic N×M swarm (bounded global + per-key concurrency) - README/CLAUDE.md with the vibe-coded banner; CI with Go gates + the "core stays majordomo+stdlib only" invariant Core builds with stdlib only today; majordomo enters at P1 (model/structured). go build/vet/test -race all green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |