executus

steve/executus

Fork 0

Commit Graph

Author	SHA1	Message	Date
steve	38d656ec71	fix(run): address gadfly review of the checkpoint PR executus CI / test (pull_request) Successful in 45s Details Real findings from the consensus review (44 raw; heavy devstral noise): - finalizeCheckpoint is now fired from the top-of-Run defer, so it runs on EVERY exit: a panic, an early build-error return (before the run loop), AND normal completion. Previously an early return on a recovered run left its durable record unfinalized → boot recovery would retry it forever on a persistent build error. (opus + glm) - Removed the dead ActivePhase field from run.RunCheckpointState + run.ResumeState (and the battery RunCheckpoint) — phase recovery is boundary-granular (skip completed phases; the interrupted phase re-runs from its start), so ActivePhase was never written nor read. Docs across ports/checkpoint/phases now state this plainly (5-model consensus that the field + docs over-promised mid-phase resume). - CheckpointerFactory.Begin error is now logged (WARN) before degrading to non-durable, per the documented contract (was silently swallowed). (4 models) - finalizeCheckpoint logs Complete/Fail errors (was silent). - Resume phase-skip now keys off a SEPARATE resumeSkip set, not the live outputs map — a fresh run with two same-named phases no longer skips the second (the outputs map fills as phases run). (opus:max) + regression test. - Removed the dead checkpoint.factory.now field (never set). (opus + glm) - Fixed the stale phaseDeps doc (the step observer moved out of sharedOpts to per-path). Hoisted the resume guard to a local; dropped the wasted acc allocation on the resume path; documented that Save throttling is the Checkpointer's responsibility and the accumulated transcript is pre-compaction (host size-caps it). Note (carried from the PR): classifyCheckpointOutcome keys shutdown on run.ErrShutdown; mort stamps its own runengine.ErrShutdown — the mort wiring PR aliases them so errors.Is matches. New test: duplicate phase names both run on a fresh run. Full ./... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 16:34:42 -04:00
steve	0dd2ced717	fix(run): address gadfly review of the phases PR executus CI / test (pull_request) Successful in 48s Details Real findings from the consensus review (37 raw; many devstral dups/noise): - Optional/budget-salvage branches no longer swallow a context cancellation / deadline / critic-kill: such errors return immediately so the run is classified cancelled/timeout/killed, not "ok" with a fallback. (the most serious finding — an Optional final phase could mask a killed run) - IsRunFunc bare phase now feeds the SHARED step observer (not just the audit recorder), so the critic's activity clock + Result.Steps see it — a long synthesize phase no longer looks idle to the critic. - phaseModel returns the resolver's enriched (usage-attribution) context and the phase's calls use it, mirroring the single-loop path (non-base-tier phases were mis-attributed). - salvagePhaseTranscript trims the tail on a rune boundary (was a raw byte slice that could split a UTF-8 rune); maxSalvage is now a named const with rationale. - expandPhaseTemplate logs a WARN on parse/execute failure instead of silently returning the unexpanded template; documented the phase-name identifier requirement + the "Query" shadow. - removed the dead phaseDeps.baseTier field. - extracted multimodalUserMessage, shared by runAgent + the phase runner (was duplicated image-folding). - aggregated phase usage is stamped onto the result even on a hard-error return; TrimSpace computed once; filterToolbox returns the base toolbox as-is for the empty-names (full-palette) case instead of copying; phaseModel WARN no longer prints error=<nil>. New test: Optional phase does not swallow a cancellation. Full ./... green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 15:44:04 -04:00
steve	30b79a330f	feat(run): execute multi-phase pipelines (RunnableAgent.Phases) executus CI / test (pull_request) Successful in 1m49s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 13m59s Details The kernel carried RunnableAgent.Phases as a DTO but never executed it — Run always ran a single agent loop with ra.SystemPrompt, so a phased agent (mort's deepresearch/research) silently ran one loop with the base prompt instead of its pipeline. This implements the phase loop, ported from mort's agentexec pipeline but reusing the kernel's own machinery. - run/phases.go: runPhases / runOnePhase. Phases run sequentially; each is a fresh agent loop (or a bare LLM call for IsRunFunc phases) with its own template-expanded system prompt ({{.Query}} + {{.<PhaseName>}}), model tier, step cap, and tool subset. Outputs thread into later phases; the final phase's output is the run output. Optional phases swallow errors and substitute FallbackMessage; a non-optional phase that merely exhausts its step/tool budget salvages its partial transcript and continues (a hard error still aborts); per-phase tier-resolve failures fall back with a WARN. - run/agent.go: Phase gains IsRunFunc + FallbackMessage (the kernel Phase struct previously omitted them). - run/executor.go: Run factors the shared agent options (tool-error limits, step observer, compactor) and branches — single loop (critic's dynamic step ceiling) vs the phase runner (fixed per-phase caps; the run-level critic's steer + hard deadline still apply across phases). systemPrompt now delegates to systemPromptWithBody so each phase keeps the platform header. The same step observer feeds audit/steps/critic across all phases. Tests (run/phases_test.go): sequential output threading + template expansion, Optional-failure → FallbackMessage continues, hard-error abort, IsRunFunc bare call, per-phase SystemHeader, filterToolbox subset, template expansion. Full ./... suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 15:14:45 -04:00

Author

SHA1

Message

Date

steve

38d656ec71

fix(run): address gadfly review of the checkpoint PR

executus CI / test (pull_request) Successful in 45s

Details

Real findings from the consensus review (44 raw; heavy devstral noise):

- finalizeCheckpoint is now fired from the top-of-Run defer, so it runs on
  EVERY exit: a panic, an early build-error return (before the run loop), AND
  normal completion. Previously an early return on a recovered run left its
  durable record unfinalized → boot recovery would retry it forever on a
  persistent build error. (opus + glm)
- Removed the dead ActivePhase field from run.RunCheckpointState +
  run.ResumeState (and the battery RunCheckpoint) — phase recovery is
  boundary-granular (skip completed phases; the interrupted phase re-runs from
  its start), so ActivePhase was never written nor read. Docs across
  ports/checkpoint/phases now state this plainly (5-model consensus that the
  field + docs over-promised mid-phase resume).
- CheckpointerFactory.Begin error is now logged (WARN) before degrading to
  non-durable, per the documented contract (was silently swallowed). (4 models)
- finalizeCheckpoint logs Complete/Fail errors (was silent).
- Resume phase-skip now keys off a SEPARATE resumeSkip set, not the live
  outputs map — a fresh run with two same-named phases no longer skips the
  second (the outputs map fills as phases run). (opus:max) + regression test.
- Removed the dead checkpoint.factory.now field (never set). (opus + glm)
- Fixed the stale phaseDeps doc (the step observer moved out of sharedOpts to
  per-path). Hoisted the resume guard to a local; dropped the wasted acc
  allocation on the resume path; documented that Save throttling is the
  Checkpointer's responsibility and the accumulated transcript is pre-compaction
  (host size-caps it).

Note (carried from the PR): classifyCheckpointOutcome keys shutdown on
run.ErrShutdown; mort stamps its own runengine.ErrShutdown — the mort wiring PR
aliases them so errors.Is matches.

New test: duplicate phase names both run on a fresh run. Full ./... green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-29 16:34:42 -04:00

steve

0dd2ced717

fix(run): address gadfly review of the phases PR

executus CI / test (pull_request) Successful in 48s

Details

Real findings from the consensus review (37 raw; many devstral dups/noise):

- Optional/budget-salvage branches no longer swallow a context
  cancellation / deadline / critic-kill: such errors return immediately so
  the run is classified cancelled/timeout/killed, not "ok" with a fallback.
  (the most serious finding — an Optional final phase could mask a killed run)
- IsRunFunc bare phase now feeds the SHARED step observer (not just the
  audit recorder), so the critic's activity clock + Result.Steps see it —
  a long synthesize phase no longer looks idle to the critic.
- phaseModel returns the resolver's enriched (usage-attribution) context and
  the phase's calls use it, mirroring the single-loop path (non-base-tier
  phases were mis-attributed).
- salvagePhaseTranscript trims the tail on a rune boundary (was a raw byte
  slice that could split a UTF-8 rune); maxSalvage is now a named const with
  rationale.
- expandPhaseTemplate logs a WARN on parse/execute failure instead of
  silently returning the unexpanded template; documented the phase-name
  identifier requirement + the "Query" shadow.
- removed the dead phaseDeps.baseTier field.
- extracted multimodalUserMessage, shared by runAgent + the phase runner
  (was duplicated image-folding).
- aggregated phase usage is stamped onto the result even on a hard-error
  return; TrimSpace computed once; filterToolbox returns the base toolbox
  as-is for the empty-names (full-palette) case instead of copying;
  phaseModel WARN no longer prints error=<nil>.

New test: Optional phase does not swallow a cancellation. Full ./... green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-29 15:44:04 -04:00

steve

30b79a330f

feat(run): execute multi-phase pipelines (RunnableAgent.Phases)

executus CI / test (pull_request) Successful in 1m49s

Details

Adversarial Review (Gadfly) / review (pull_request) Successful in 13m59s

Details

The kernel carried RunnableAgent.Phases as a DTO but never executed it —
Run always ran a single agent loop with ra.SystemPrompt, so a phased agent
(mort's deepresearch/research) silently ran one loop with the base prompt
instead of its pipeline. This implements the phase loop, ported from mort's
agentexec pipeline but reusing the kernel's own machinery.

- run/phases.go: runPhases / runOnePhase. Phases run sequentially; each is a
  fresh agent loop (or a bare LLM call for IsRunFunc phases) with its own
  template-expanded system prompt ({{.Query}} + {{.<PhaseName>}}), model
  tier, step cap, and tool subset. Outputs thread into later phases; the
  final phase's output is the run output. Optional phases swallow errors and
  substitute FallbackMessage; a non-optional phase that merely exhausts its
  step/tool budget salvages its partial transcript and continues (a hard
  error still aborts); per-phase tier-resolve failures fall back with a WARN.
- run/agent.go: Phase gains IsRunFunc + FallbackMessage (the kernel Phase
  struct previously omitted them).
- run/executor.go: Run factors the shared agent options (tool-error limits,
  step observer, compactor) and branches — single loop (critic's dynamic
  step ceiling) vs the phase runner (fixed per-phase caps; the run-level
  critic's steer + hard deadline still apply across phases). systemPrompt
  now delegates to systemPromptWithBody so each phase keeps the platform
  header. The same step observer feeds audit/steps/critic across all phases.

Tests (run/phases_test.go): sequential output threading + template
expansion, Optional-failure → FallbackMessage continues, hard-error abort,
IsRunFunc bare call, per-phase SystemHeader, filterToolbox subset, template
expansion. Full ./... suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-29 15:14:45 -04:00

3 Commits