feat(run): execute multi-phase pipelines (RunnableAgent.Phases) #19
Reference in New Issue
Block a user
Delete Branch "feat/kernel-phases"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The kernel carried
RunnableAgent.Phasesas a DTO but never executed it —run.Executor.Runalways ran a single agent loop withra.SystemPrompt. A phased agent (mort'sdeepresearch/research) therefore silently ran one loop with the base prompt instead of its pipeline. This is the host-side companion to mort's executus migration (mort issue #1355): mort's agent path is now sole-sourced on this kernel, so the phase feature must actually run.What
run/phases.go—runPhases/runOnePhase. Phases run sequentially; each is a fresh agent loop (or a bare LLM call forIsRunFuncphases) with its own template-expanded system prompt ({{.Query}}+{{.<PhaseName>}}), model tier, step cap, and tool subset. Outputs thread into later phases; the final phase's output is the run output.Optionalphases swallow errors →FallbackMessage; a non-optional phase that merely exhausts its step/tool budget salvages its partial transcript and continues (a hard error still aborts); per-phase tier-resolve failures fall back to the base model with a WARN. Ported from mort'sagentexecpipeline but reusing the kernel's machinery.run/agent.go—PhasegainsIsRunFunc+FallbackMessage(the struct previously omitted them).run/executor.go—Runfactors the shared agent options (tool-error limits, step observer, compactor) and branches: single loop (critic's dynamic step ceiling) vs the phase runner (fixed per-phase caps; the run-level critic's steer + hard deadline still apply across phases). The same step observer feeds audit/steps/critic across all phases, so token/tool/step accounting accumulates.systemPromptdelegates to a newsystemPromptWithBodyso each phase keeps the platform header.Not carried over (kernel is leaner than mort's legacy pipeline)
submitcapture tool (the kernel relies on majordomo's no-tool-call-is-final-answer termination, as its single-loop path already does);Tests
run/phases_test.go: sequential output threading + template expansion, Optional-failure→FallbackMessage continues, hard-error abort, IsRunFunc bare call, per-phase SystemHeader,filterToolboxsubset,expandPhaseTemplate. Fullgo test ./...green.🤖 Generated with Claude Code
The kernel carried RunnableAgent.Phases as a DTO but never executed it — Run always ran a single agent loop with ra.SystemPrompt, so a phased agent (mort's deepresearch/research) silently ran one loop with the base prompt instead of its pipeline. This implements the phase loop, ported from mort's agentexec pipeline but reusing the kernel's own machinery. - run/phases.go: runPhases / runOnePhase. Phases run sequentially; each is a fresh agent loop (or a bare LLM call for IsRunFunc phases) with its own template-expanded system prompt ({{.Query}} + {{.<PhaseName>}}), model tier, step cap, and tool subset. Outputs thread into later phases; the final phase's output is the run output. Optional phases swallow errors and substitute FallbackMessage; a non-optional phase that merely exhausts its step/tool budget salvages its partial transcript and continues (a hard error still aborts); per-phase tier-resolve failures fall back with a WARN. - run/agent.go: Phase gains IsRunFunc + FallbackMessage (the kernel Phase struct previously omitted them). - run/executor.go: Run factors the shared agent options (tool-error limits, step observer, compactor) and branches — single loop (critic's dynamic step ceiling) vs the phase runner (fixed per-phase caps; the run-level critic's steer + hard deadline still apply across phases). systemPrompt now delegates to systemPromptWithBody so each phase keeps the platform header. The same step observer feeds audit/steps/critic across all phases. Tests (run/phases_test.go): sequential output threading + template expansion, Optional-failure → FallbackMessage continues, hard-error abort, IsRunFunc bare call, per-phase SystemHeader, filterToolbox subset, template expansion. Full ./... suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>🪰 Gadfly — live review status
7/7 reviewers finished · updated 2026-06-29 19:29:03Z
claude-code/opus· claude-code — ✅ doneclaude-code/opus:max· claude-code — ✅ donedeepseek-v4-pro:cloud· ollama-cloud — ✅ donedevstral-2:123b-cloud· ollama-cloud — ✅ doneglm-5.2:cloud· ollama-cloud — ✅ donekimi-k2.6:cloud· ollama-cloud — ✅ doneqwen3.5:397b-cloud· ollama-cloud — ✅ doneLive status board. Findings are posted in each model's own comment. Advisory only — does not block merge.
🪰 Gadfly consensus review — 19 inline findings on changed lines. See the consensus comment for the full ranked summary.
Advisory only — does not block merge.
@@ -0,0 +43,4 @@// critic's dynamic iteration ceiling (per-phase caps are fixed at phase start —// the run-level critic's steer + hard deadline still apply across phases).// phaseDeps carries the per-run state the phase runner shares with Run: the base🟡 Redundant comments in phaseDeps
maintainability · flagged by 1 model
run/phases.go:46-58- Issue: ThephaseDepsstruct has a comment that explains its purpose, but the comment is somewhat redundant with the struct's name and fields. The comment could be more concise. - Suggestion: Shorten the comment to focus on the unique aspects ofphaseDepsthat aren't immediately obvious from the struct definition.🪰 Gadfly · advisory
@@ -0,0 +49,4 @@// the critic/session steer, and the audit recorder (phase events).type phaseDeps struct {baseModel llm.ModelbaseTier string🟠 Dead field baseTier in phaseDeps struct — populated but never read
maintainability · flagged by 4 models
phaseDeps.baseTieris dead code (run/phases.go:52). The field is populated inexecutor.go:348(baseTier: tier) but never read anywhere in the package.grepconfirmsdeps.baseTierhas zero usages. This is a dead struct field that will confuse future readers. Remove it from the struct and the construction site.🪰 Gadfly · advisory
@@ -0,0 +61,4 @@// whose Output is the final phase's output, with Usage aggregated across phases// and Messages set to the last phase's transcript (for the PostRun hook). A hard// (non-optional, non-budget) phase failure returns the error.func (e *Executor) runPhases(runCtx context.Context, ra RunnableAgent, deps phaseDeps, query string, images []llm.ImagePart) (*agent.Result, error) {🟡 Long function runPhases
maintainability · flagged by 1 model
runPhases- Location:run/phases.go:64-140- Issue: TherunPhasesfunction is quite long and handles multiple responsibilities, including iterating over phases, handling errors, and managing outputs. This makes it harder to read and maintain. - Suggestion: Break this function into smaller, focused functions. For example, separate the error handling logic into a dedicated function.🪰 Gadfly · advisory
@@ -0,0 +73,4 @@return lastResult, err}instructions := expandPhaseTemplate(phase.SystemPrompt, query, outputs)🟠 Untrusted model outputs (prior phase results) are injected into later phases' system prompts via {{.}}, creating a prompt-injection escalation path where adversarial content gains system-prompt privilege.
security · flagged by 1 model
run/phases.go:76,run/phases.go:122,run/phases.go:144,run/phases.go:262-275) —expandPhaseTemplatesubstitutes prior-phase outputs (model-generated, influenced by adversarial query/tool content) via{{.<PhaseName>}}; the expandedinstructionsbecome the next phase'sSystem:prompt (systemPromptWithBody→agent.New(model, system, …)/model.Generate{System: system}). In the single-loop path model…🪰 Gadfly · advisory
@@ -0,0 +86,4 @@if err != nil {isLast := i == len(ra.Phases)-1switch {case phase.Optional:🔴 Optional-phase branch swallows context cancellation/deadline/critic-kill; a killed/timed-out run with an Optional final phase reports status "ok" with a fallback message
correctness, error-handling · flagged by 3 models
1.
Optionalphase handling swallows context cancellation / deadline / critic-kill — a killed or timed-out run whose final phase isOptionalis reported as a successful run —run/phases.go:89🪰 Gadfly · advisory
@@ -0,0 +97,4 @@deps.rec.LogEvent("phase_failed_optional", map[string]any{"phase": phase.Name, "error": err.Error()})}case isPhaseBudgetExhaustion(err) && (!isLast || strings.TrimSpace(output) != ""):🟠 Last non-optional phase exhausting its budget with empty salvage aborts the run with a hard error and empty output, discarding all prior phases' completed work — contradicting the documented 'NOT fatal / never discards earlier work' semantics
correctness, error-handling, maintainability, performance · flagged by 3 models
run/phases.go:100-118— a final non-optional phase that exhausts its step/tool budget with no salvageable narrated text aborts the whole run with a hard error and empty output, discarding all earlier phases' completed work — contradicting the file's own documented "NOT fatal" guarantee. Confirmed by tracing the switch at lines 88–119.runOnePhase(lines 178–190) only setsoutputto a non-empty value whensalvagePhaseTranscript(res)finds assistant prose; otherwiseoutputstays `"…🪰 Gadfly · advisory
@@ -0,0 +115,4 @@}default:return lastResult, fmt.Errorf("pipeline phase %q: %w", phase.Name, err)🟡 totalUsage not applied to lastResult on hard-error return; prior phases' token usage lost from run result
correctness · flagged by 1 model
run/phases.go:118—totalUsagenot applied tolastResulton hard-error return (verified by tracing lines 82-84, 118, andexecutor.go:358-360). When a non-optional phase fails hard andrunOnePhasereturns a non-nil partialres, line 83 overwriteslastResult = resand line 84 accumulatestotalUsagecorrectly. But the error return on line 118 sendslastResultas-is — its.Usagefield still holds only the failed phase's partial usage, not the accumulatedtotalUsagethat i…🪰 Gadfly · advisory
@@ -0,0 +125,4 @@// Synthesize the run result: the final phase's output, usage aggregated over// all phases, and the last phase's transcript for the PostRun hook.if lastResult == nil {🟡 When the last phase fails as Optional, synthesized lastResult carries a prior phase's transcript for the PostRun hook
error-handling · flagged by 1 model
run/phases.go:128runPhases— final synthesizedlastResultmay carry a prior phase's transcript when the last phase fails optionally. When the last phase isOptionaland fails with a nilres,lastResultis not updated and retains the previous phase'sagent.Result; the code then overwritesOutput/Usagewith the fallback/accumulated usage, butMessagesremains the prior phase's transcript (passed to the PostRun hook). The doc says "last phase's transcript for the PostRu…🪰 Gadfly · advisory
@@ -0,0 +140,4 @@func (e *Executor) runOnePhase(runCtx context.Context, ra RunnableAgent, deps phaseDeps, phase Phase, instructions, query string, images []llm.ImagePart) (string, *agent.Result, error) {model := e.phaseModel(runCtx, deps, ra, phase)// The phase's expanded instructions are the system prompt (with the platform// header so tools keep their run ids); the original query is the user message.🟡 Inconsistent error handling in runOnePhase
maintainability · flagged by 1 model
runOnePhase- Location:run/phases.go:143-200- Issue: TherunOnePhasefunction handles errors in a nested manner, which can make it difficult to follow the flow of execution. The error handling logic is intertwined with the main logic. - Suggestion: Extract error handling into separate functions or use a more structured approach to improve readability.🪰 Gadfly · advisory
@@ -0,0 +152,4 @@return "", nil, fmt.Errorf("phase %q model call: %w", phase.Name, err)}if deps.rec != nil {deps.rec.OnStep(1, resp)🟠 IsRunFunc phases bypass shared stepObserver; critic clock not refreshed, steps missing from emitter
correctness · flagged by 2 models
🪰 Gadfly · advisory
@@ -0,0 +183,4 @@// Budget/guard exhaustion leaves a usable partial transcript but an empty// final answer; salvage the narrated work so the pipeline can carry it forward.if runErr != nil && isPhaseBudgetExhaustion(runErr) {if salvaged := salvagePhaseTranscript(res); salvaged != "" {🟠 Missing nil check for res.Steps
error-handling · flagged by 1 model
run/phases.go:186: The draft claims thatsalvagePhaseTranscriptassumesres.Stepsis non-nil without an explicit check. Looking at the code, the functionsalvagePhaseTranscriptdoes iterate overres.Stepswithout a nil check. This is a valid concern, and the finding should be kept.🪰 Gadfly · advisory
@@ -0,0 +196,4 @@if phase.ModelTier == "" {return deps.baseModel}_, m, err := e.cfg.Models(ctx, phase.ModelTier)🟠 phaseModel discards the resolver's enriched usage-attribution context, so non-base-tier phases can be mis-attributed when resp.Model is empty (single-loop path adopts it via ctx = modelCtx)
correctness, error-handling · flagged by 2 models
2. Per-phase model resolution discards the enriched usage-attribution context —
run/phases.go:199(phaseModel:_, m, err := e.cfg.Models(ctx, phase.ModelTier))🪰 Gadfly · advisory
@@ -0,0 +209,4 @@// loop hit its step cap or tripped a tool-error guard) — which leaves a usable// partial transcript — as opposed to a hard error (cancellation, model failure).func isPhaseBudgetExhaustion(err error) bool {return errors.Is(err, agent.ErrMaxSteps) || errors.Is(err, agent.ErrToolLoop)🟡
run/phases.go:212correctness · flagged by 1 model
run/phases.go:212: TheisPhaseBudgetExhaustionfunction checks foragent.ErrMaxStepsandagent.ErrToolLoop, but these errors are not defined in the repository. This could lead to incorrect behavior if the errors are not properly defined or imported. Verified by searching for the error definitions in the repository and finding no matches.🪰 Gadfly · advisory
@@ -0,0 +232,4 @@}}out := strings.TrimSpace(b.String())const maxSalvage = 8000🔴 salvagePhaseTranscript truncates by bytes, corrupting multi-byte UTF-8 runes
correctness, error-handling, maintainability, performance · flagged by 7 models
run/phases.go:236-238—salvagePhaseTranscripttruncates by byte length (len(out) > maxSalvageandout[len(out)-maxSalvage:]). For any multi-byte UTF-8 text this slices through runes, producing invalid UTF-8 and garbled output fed into downstream phases. The limit should count runes (utf8.RuneCountInString) and slice on a rune boundary.🪰 Gadfly · advisory
@@ -0,0 +241,4 @@// phaseUserMessage builds a phase's user message: the original query text plus// any inline images. Mirrors the single-loop multimodal seeding in runAgent.func phaseUserMessage(query string, images []llm.ImagePart) llm.Message {🟠 phaseUserMessage duplicates runAgent's multimodal message construction — extract shared helper
maintainability · flagged by 5 models
phaseUserMessageduplicatesrunAgent's multimodal message logic (run/phases.go:244–256vsrun/executor.go:491–506). Both functions build allm.Messagefromquery + imageswith the same pattern: no images → simple path, otherwise build[]llm.Partwith optional text + image parts. The comment onphaseUserMessageeven acknowledges it "mirrors"runAgent. Extract a shared helper (e.g.,func userMessage(query string, images []llm.ImagePart) llm.Message) so the two call sites…🪰 Gadfly · advisory
@@ -0,0 +257,4 @@// expandPhaseTemplate applies Go text/template substitution to a phase prompt,// replacing {{.Query}} with the original query and {{.<PhaseName>}} with a prior// phase's output. Returns the original string unchanged if parsing or execution🔴 expandPhaseTemplate silently fails for phase names with hyphens or non-identifier characters
correctness, error-handling, maintainability, performance, security · flagged by 6 models
run/phases.go:262-275—expandPhaseTemplatebuildstext/templatevariables as{{.<PhaseName>}}(line 269:data[k] = v). Gotext/templateidentifiers may contain only letters, digits, and underscores. Phase names with hyphens (deep-research), digits at the start, spaces, etc. parse as invalid expressions or fail execution silently, causing the template to return unexpanded — downstream phases receive literal{{.deep-research}}instead of the actual prior output. This con…🪰 Gadfly · advisory
@@ -0,0 +264,4 @@if err != nil {return tmpl}data := map[string]string{"Query": query}🟡 Unsanitized user input in template data could lead to XSS or injection
security · flagged by 1 model
Finding 2: Unsanitized User Input in Template Data (
run/phases.go:267-270) - Verification: ThequeryandpriorOutputsvalues are directly passed into the template data map without sanitization. - Impact: Confirmed. This could lead to XSS or other injection issues if the output is rendered in a web context. - Conclusion: This finding is valid.🪰 Gadfly · advisory
@@ -0,0 +280,4 @@// a WARN — a typo'd phase tool list should not abort a run mid-pipeline.func filterToolbox(box *llm.Toolbox, names []string) *llm.Toolbox {out := llm.NewToolbox(box.Name())if len(names) == 0 {🟠 filterToolbox copies entire toolbox when no filtering is requested
correctness, performance · flagged by 3 models
filterToolboxcopies entire toolbox when no filtering is requested —run/phases.go:283-287When a phase does not specify a tool subset (len(names) == 0),filterToolboxstill allocates a newllm.Toolboxand copies every tool into it. If the base toolbox is large and the pipeline has many phases, this is unnecessary allocation andAddoverhead on every phase of every run. The single-loop path already passes the originaltoolboxpointer directly without copying. Fix: Return…🪰 Gadfly · advisory
@@ -0,0 +301,4 @@// addUsage sums two llm.Usage tallies field-by-field so a phased run reports the// total tokens across all phases.func addUsage(a, b llm.Usage) llm.Usage {🟠 addUsage manually sums llm.Usage fields, brittle to upstream changes
maintainability · flagged by 2 models
run/phases.go:304-311—addUsagemanually accumulates every field ofllm.Usage. If the upstream struct gains a new field (e.g. a future cost or token category), this function silently under-counts. Prefer anAddmethod onllm.Usage(if majordomo doesn't provide one, add it there) or generate this with a simple reflection helper; at minimum add a compile-time assertion that verifies field count.🪰 Gadfly · advisory
🪰 Gadfly review — consensus across 7 models
Verdict: Blocking issues found · 21 findings (10 with multi-model agreement)
run/phases.go:235run/phases.go:260run/phases.go:244run/phases.go:52run/phases.go:89run/phases.go:100run/phases.go:283run/phases.go:155run/phases.go:199run/phases.go:30411 single-model findings (lower confidence)
run/phases.go:76run/phases.go:186run/agent.go:64run/phases.go:46run/phases.go:64run/phases.go:118run/phases.go:128run/phases.go:143run/phases.go:212run/phases.go:267run/executor.go:267Per-model detail
claude-code/opus (claude-code) — Blocking issues found
Verdict: Blocking issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
I've verified the key security-relevant mechanics. Let me confirm one detail about the data-vs-template-text boundary, since it's the crux of the injection question.
The findings:
expandPhaseTemplate(run/phases.go:262) usestext/templatewhere the template text isphase.SystemPrompt(host/agent config — trusted) and the data is{Query, prior-phase-outputs}(untrusted). Go'stext/templateinserts data values as literals; it does not re-parse them as templates. So a query containing{{.Secret}}is rendered as literal text, not evaluated — no server-side template injection / RCE. Verified against the package usage (run/phases.go:10, onlytext/template, neverhtml/template, and no other call site feeds untrusted strings as template text).filterToolbox(run/phases.go:289) copies fromdeps.baseToolbox, which is the already-permission-decorated toolbox built once viaRegistry.Build(..., inv, Visibility("private"), ...)(run/executor.go:191) plus delegation/extra/session tools. Phases can only narrow this set (box.Get(name)over the existing palette) — they cannot add a tool that wasn't already authorized for this invocation, and an unknown name is skipped, not synthesized. No authz escalation across phases. Verified.phaseModel(run/phases.go:218) resolvesphase.ModelTierfrom config, falling back to the base model. Tier strings are host config, not request-tainted. No concern.LogEvent/slog.Warnonly (phase name +err.Error()), not phase outputs/secrets.One thing genuinely in-lane worth flagging, though it is a design property rather than a coding defect:
runOnePhase(run/phases.go:160-165), the expanded instructions — which for any phase whoseSystemPromptcontains{{.Query}}or{{.<priorPhase>}}now embed the raw user query and prior (LLM-generated, therefore untrusted) phase outputs — become the system message (system := e.systemPromptWithBody(instructions)), while the single-loop path keeps user input confined to the user message. This widens the prompt-injection surface: untrusted text gains system-role authority across phases. It's opt-in (the template author must write{{.Query}}) and ported intentionally from mort, so it's not a regression I'd block on, but hosts running phased agents on untrusted input should be aware that phase-to-phase output threading carries any injected instructions forward at system authority. Mitigation if desired: keep{{.Query}}substitution in the user message and reserve the system body for static instructions.Net: the injection/authz/secret surfaces are clean; the only security-relevant note is the inherent prompt-injection amplification of threading untrusted content through system prompts, which is by design.
VERDICT: No material issues found
🎯 Correctness — Blocking issues found
I've verified all three findings against the actual code. Let me confirm my analysis:
Finding 1 — Confirmed. In
runPhases, the loop-top guard (phases.go:72) only protects the start of the next phase. Thecase phase.Optional:(phases.go:89) is checked first and swallows any error, includingcontext.Canceled/DeadlineExceeded/critic-kill.isPhaseBudgetExhaustion(phases.go:212) only matchesErrMaxSteps/ErrToolLoop, so a ctx error falls into the Optional branch. When the Optional phase is the last one, no subsequent iteration's top-guard catches it;runPhasesreturnsnilerror.statusFor(executor.go:391) checksrunErr == nilfirst → returns"ok"even whencontext.Cause(runCtx)isErrCriticKill. Real bug.Finding 2 — Confirmed as a deviation.
phaseModel(phases.go:199) discards the resolver's enriched ctx with_; the single-loop path adopts it (executor.go:159). Impact is narrower than it could be:recordUsage/resolvedModelName(call.go:235-268) attribute primarily byresp.Model, falling back to ctx attribution only whenresp.Modelis empty — so mis-attribution only manifests in that fallback path. The draft's low confidence is appropriate; the deviation itself is real.Finding 3 — Confirmed.
SetIteration(runengine.go:142) is an atomic store; each phase is a fresh loop with a sharedstepObserver, so the live counter regresses at phase boundaries.RunInfo.MaxIterations= basemaxIter(executor.go:172). Token accounting unaffected —Writer.OnStep(writer.go:83) sumsresp.Usageregardless ofiter. Observability only.All three survive. (Fixing the stale line number for finding 2 in the JSON block: the discard is at
phases.go:199, not 178.)VERDICT: Blocking issues found
Findings
1.
Optionalphase handling swallows context cancellation / deadline / critic-kill — a killed or timed-out run whose final phase isOptionalis reported as a successful run —run/phases.go:89The error switch checks
case phase.Optional:first and unconditionally substitutesFallbackMessagefor any error, includingcontext.Canceled,context.DeadlineExceeded, andErrCriticKill. The only cancellation guard is at the top of the loop (run/phases.go:72), which protects the start of the next phase, not a cancellation that occurs during a phase.Trace the case where the final phase is
Optionaland the run is killed/timed-out mid-phase:runOnePhase→runAgentreturns(res, ctxErr)whereerrors.Is(ctxErr, context.Canceled)(orDeadlineExceeded).isPhaseBudgetExhaustion(ctxErr)is false (phases.go:212— only matchesErrMaxSteps/ErrToolLoop), so cancellation never hits the budget branch.case phase.Optionalmatches → error swallowed,output = FallbackMessage, loop ends (it's the last phase).runPhasesreturns(synthesized result, nil).Back in
Run(run/executor.go:357),statusFor(runCtx, nil)returns"ok"— it checksrunErr == nilbefore consulting the cancellation cause, so even anErrCriticKillcause is ignored — andres.Outputis the fallback string. A run the critic killed (or that blew itsMaxRuntime) is recorded as a successful run delivering a canned "phase encountered an error" message, instead of statuskilled/timeout. The IsRunFunc path has the same exposure (its error wraps the ctx error atrun/phases.go:152).Fix: before the
Optional/budget branches, bail on a genuine context error, e.g.so cancellation/deadline/kill always aborts and surfaces with the correct status.
2. Per-phase model resolution discards the enriched usage-attribution context —
run/phases.go:199(phaseModel:_, m, err := e.cfg.Models(ctx, phase.ModelTier))The single-loop path adopts the resolver's returned context (
ctx = modelCtx,run/executor.go:159) — its documented purpose is "enriches ctx for usage attribution" (executor.go:19-21).phaseModelresolves a phase's tier but throws the enriched ctx away with_, so a phase running on a non-base tier executes under the base tier's attribution context while using the other tier's model.Confidence is low — usage attribution (
model/call.go:235-268) keys primarily offresp.Model(the serving model name), and only falls back to the context attribution set byParseModelForContextwhenresp.Modelis empty. So mis-attribution only manifests for responses that don't name their serving model. It is nonetheless a clear, unintended deviation from the established single-loop pattern introduced by this change.3. (Minor) Live iteration counter regresses across phases —
run/executor.go:267(stateAcc.SetIteration(s.Index))Each phase is a fresh agent loop whose
s.Indexrestarts, andSetIterationdoes an atomic store, not a max/accumulate (RunStateAccessor,run/runengine.go:142). A self-status/progress consumer watching a phased run sees the iteration count jump backwards at each phase boundary, andRunInfo.MaxIterations(set to the basemaxIter,executor.go:172) understates a phased run whose phases can each consume up tomaxItersteps. Token accounting is not affected — the auditWritersumsresp.UsageperOnStepregardless ofiter(audit/writer.go:83-110), so duplicate indices across phases don't corrupt totals. Observability only.🧹 Code cleanliness & maintainability — Minor issues
Both findings verified against the actual source. Outputting the corrected review.
VERDICT: Minor issues
This is a clean, well-commented port. The structure (one
runPhasesorchestrator +runOnePhaseworker + small pure helpers) reads well, the doc comments are thorough and honest about what was/wasn't carried over, and thesharedOptsrefactor inexecutor.gocleanly factors the common agent options without duplicating them per path. Two minor maintainability nits:run/phases.go:52— dead struct fieldbaseTier.phaseDeps.baseTieris populated (executor.go:348baseTier: tier) and documented ("the base model + tier"), but no code in the package ever reads it —phaseModelfalls back todeps.baseModel, never a tier. Verified bygrep -rn baseTier run/: only the declaration and the assignment, no read. A field that looks load-bearing but isn't will mislead the next maintainer (e.g. someone wiring per-phase compaction thresholds will assume the base tier is already threaded here). Drop the field and thebaseTier: tierline, or actually use it (the compaction threshold is computed once from the basetierand shared across phases — if that's a known limitation,baseTierwould be the natural hook, but as written it's just dead).run/phases.go:244-256vsrun/executor.go:495-501— duplicated multimodal part-building.phaseUserMessageand the image-folding block insiderunAgentare byte-for-byte the same logic (make([]llm.Part, 0, len(images)+1), trim-guard the text part, append each image). The comment onphaseUserMessageeven says "Mirrors the single-loop multimodal seeding in runAgent" — an explicit acknowledgment of the copy-paste. Extract a smallfunc userParts(text string, images []llm.ImagePart) []llm.Partand have both call sites use it, so a future change to multimodal seeding (e.g. a new part type) only has to land in one place. Low-churn, removes the mirror-by-hand maintenance hazard.Nothing else in my lane: naming is consistent with the surrounding code, the
switchinrunPhasesis readable, helper functions are appropriately small and single-purpose, and the new test file follows the existing table/fake-provider patterns.⚡ Performance — No material issues found
I have enough to assess the Performance lens. The phase runner is LLM-round-trip-bound, and I verified the key allocation/loop sites against the actual code in
run/phases.goandrun/executor.go.VERDICT: No material issues found
Reviewed strictly through the ⚡ Performance lens. The new phase runner is dominated by sequential model round-trips; the surrounding bookkeeping is all bounded by the (small, author-defined) phase count, so there's nothing I can justify as a real efficiency regression.
What I checked and cleared:
Registry.Build(permissions/SSRF/delegation decoration) runs once inRun(executor.go:191-232); each phase only callsfilterToolbox(phases.go:281), which copies already-decorated tool references into a fresh*llm.Toolbox. Cost is O(phases × tools) reference copies — negligible vs. the model calls. Good reuse, not a regression.phases.go:149,176-178); the prior phase's full transcript is not carried forward into the next loop's context. So context size doesn't compound phase-to-phase. Theoutputsmap (phases.go:65) grows by exactly one entry per phase, and salvaged transcripts are tail-bounded to 8000 chars (phases.go:235-238).stepObserver/emitter andaddUsagetally across phases (executor.go:265-290,phases.go:84); total steps ≤ Σ per-phase caps, total usage is a fixed-field sum. No quadratic accumulation.expandPhaseTemplateparses one small template per phase (phases.go:262-276) andaddUsageis constant-time (phases.go:304-311) — both off the hot path relative to a model call.phaseModelre-resolves the tier only for phases that set a non-emptyModelTier(phases.go:195-206); that's required for per-phase tiers, not redundant work.Below-threshold, deliberately not flagged as a finding (micro-optimization): when
phase.Toolsis empty,filterToolboxstill allocates a full copy of the base palette per phase (phases.go:283-288) rather than returningdeps.baseToolboxdirectly — but it's a handful of pointer copies per phase, immaterial against the LLM latency.🧯 Error handling & edge cases — Minor issues
I've read the full implementation. Let me verify each finding against the actual code.
Finding 1 —
expandPhaseTemplate(lines 262-276): Confirmed on all three sub-claims by reading the function:tmplverbatim, no log. A phase referenced as{{.deep-research}}(hyphen/space/leading-digit name) is an invalid Go template field selector → parse error → entire prompt returned unchanged with the literal{{...}}passed through.data["Query"], then lines 268-270 overwrite frompriorOutputs, so a phase namedQueryshadows the original input.return tmplwith noslog.Warn, unlikephaseModel(201) andfilterToolbox(292, 296) which do warn. Confirmed.Finding 2 —
salvagePhaseTranscript(lines 236-238): Confirmed.out[len(out)-maxSalvage:]is a byte-offset slice on a string; a multibyte rune straddling the boundary is split. Real (cosmetic) issue.Both findings independently reproduce against the real source and rest only on in-repo logic. Both survive.
VERDICT: Minor issues
🧯 Error handling & edge cases
run/phases.go:262-276—expandPhaseTemplateswallows template errors silently, and several validPhase.Names break threading invisibly. Three coupled edge cases, all silent:Nameis not a valid Go template identifier — hyphens, spaces, leading digits (deep-research,step 1,2nd) — makes any downstream{{.deep-research}}a parse error, so the whole prompt is returned unchanged and the literal{{.deep-research}}is sent to the model. The kernel's ownRunnableAgent.PhasesDTO places no constraint onName, so this is reachable from valid config.Querysilently shadows the original input:datais seeded with"Query": query(line 267) and thenfor k,v := range priorOutputs { data[k]=v }(268-270) overwrites it, so later phases'{{.Query}}resolves to that phase's output, not the run input.slog.Warn— unlike every other fallback in this file (phaseModel,filterToolboxall WARN). A misconfigured pipeline therefore degrades to "model got raw{{...}}text" with zero diagnostics.Impact: misconfiguration produces silently-wrong prompts, very hard to debug. Suggested fix: log a WARN on parse/execute failure, and at minimum document/validate that
Namemust be a bare template identifier (and reserveQuery).run/phases.go:236-238—salvagePhaseTranscripttail-trim can split a UTF-8 rune.out[len(out)-maxSalvage:]slices at a byte offset, so a multi-byte rune straddling the 8000-byte boundary becomes an invalid leading fragment (a replacement char) in the salvaged text. Cosmetic, but trivial to fix: advance to a rune boundary, e.g. trim onutf8.RuneStartor slice on a[]rune.Everything else in this lane checked out: the
runCtx.Err()guard between phases (72-74), thenil-result handling for failed bare calls (82-85, 128-130), the budget-vs-hard-error classification (isPhaseBudgetExhaustion, 211-213), the last-phase-with-empty-salvage falling through to a hard error (100, 117-118), andfilterToolboxskipping unknown/duplicate tools without aborting (289-298) are all sound.claude-code/opus:max (claude-code) — Blocking issues found
Verdict: Blocking issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
I've verified the security-relevant surfaces of this change. Findings:
expandPhaseTemplate(run/phases.go:266) parsesphase.SystemPrompt(agent-definition config) as the template source, and injects the userqueryand prior phase outputs only as data values in themap[string]string. A query containing{{...}}is never re-parsed, so untrusted input can't inject template directives. It usestext/template(correct — prompt text, no HTML/XSS context). Even a fully attacker-controlled phase template can only reachQuery/prior outputs and call methods onstringvalues —text/templateis not a sandbox escape and exposes no fs/env, so no info disclosure beyond the run's own data.toolboxis built viae.cfg.Registry.Build(... Visibility("private") ...)(run/executor.go:191), i.e. the decorated/permissioned palette.filterToolbox(run/phases.go:289) only re-adds the same tool objects returned bybox.Get(name)and can only ever produce a subset — a phase cannot acquire a tool absent from the agent's palette, nor strip the permission decoration. No privilege escalation. (EmptyTools→ full palette is documented, agent-author-controlled behavior, not a vuln.)run/phases.go:154) passes no tools at all — strictly more restrictive than the loop path; no new sink.No material security issue introduced by this change.
VERDICT: No material issues found
🎯 Correctness — Minor issues
I've confirmed both findings against the actual source.
Finding 1 (IsRunFunc bypasses critic clock): Confirmed.
critic.recordStepis called only atrun/executor.go:272, insidestepObserver(defined at line 265). The IsRunFunc branch inrun/phases.gocallsdeps.rec.OnStep(1, resp)directly and never routes throughstepObserver, so the critic's activity clock is not refreshed by a bare phase.critic.recordStep→h.RecordStep(run/critic.go:82-86) is the sole refresh path, and the executor's deadline goroutine (run/critic.go:48-75) cancelsrunCtxonce the (unrefreshed) hard deadline passes — a real false-kill vector when a stall-based critic is enabled. The executor comment atexecutor.go:343and the phases.go header both assert the step observer feeds the critic "across every phase," which IsRunFunc phases violate. Kept.Finding 2 (silent template swallow): Confirmed.
expandPhaseTemplatereturns the prompt unchanged on bothParseandExecuteerrors with no log, while sibling helpersfilterToolboxandphaseModelbothslog.Warnon their failure paths. A phase name containing-/space can't be referenced via{{.Name}}(parse error → whole prompt left unexpanded). Kept as a minor observability concern.Both findings survive; verdict unchanged.
VERDICT: Minor issues
The core mechanics are sound. Verified against the actual code:
addUsage(run/phases.go:304) sums exactly the fivellm.Usagefields the rest of the repo references (InputTokens,OutputTokens,CacheReadTokens,CacheWriteTokens,ReasoningTokens); no field is dropped. Audit token accounting (rec.TokenStats()viaOnStep) and the returnedResult.Usage(viatotalUsage) are independent sinks, so there's no double-count (executor.go:426).runPhasesare logically consistent with their tests, and the%wwrapping in the default branch preserveserrors.Is(matchesTestPhases_HardErrorAborts).model.Generatereturns*llm.Response, sorec.OnStep(1, resp)andresp.Message()are correctly typed; theIsRunFunctest exercises this path.Findings:
run/phases.go:154-160—IsRunFuncphases never refresh the critic's activity clock (possible false stall-kill). The bare-LLM branch callsdeps.rec.OnStep(1, resp)directly but bypassesstepObserver, which is the only caller ofcritic.recordStep(run/executor.go:272,run/critic.go:82-86). So for the entire duration of a bare phase, the run-critic's last-activity timestamp stays frozen at the prior phase's value. A slowIsRunFuncphase — exactly the plan/synthesize transforms this feature targets, which can run long on a large threaded context — can be misjudged as stalled and nudged or killed. This contradicts the PR's stated invariant ("the same step observer feeds audit/steps/critic across all phases"). Impact is gated on the host enabling a stall-based critic, hence not blocking. Fix: callcritic.recordStep(1, resp)alongside therec.OnStepin theIsRunFuncbranch. (Real-world kill depends on critic config — medium confidence.)run/phases.go:283-296—expandPhaseTemplatesilently swallows parse/exec failures, masking broken threading. On anytemplate.Parse/Executeerror it returns the prompt unchanged, so the literal{{.x}}text leaks into the system prompt with no diagnostic. Two concrete ways to hit this: (a) a typo'd phase variable, and (b) a phase whoseNameisn't a valid Go-template identifier (contains-, space, etc.) — then{{.<PhaseName>}}is a parse error and the whole template silently fails to expand, breaking threading for that phase. Given threading is the headline feature, a silent degradation is a correctness smell. Fix:slog.Warnon the parse/exec error (mirroring the WARN discipline used infilterToolboxandphaseModel) so a misconfigured pipeline is visible. (Medium confidence.)Items checked and judged not bugs (documented design choices): per-phase fixed step caps vs. the critic's dynamic ceiling; last-phase budget-exhaustion with non-empty output reporting
okrather thanerror;missingkey=zeroforward-references resolving to"". The 8000-byte tail-trim insalvagePhaseTranscriptcan split a UTF-8 rune but that's trivial.🧹 Code cleanliness & maintainability — Minor issues
Both findings are confirmed against the actual code:
baseTieris declared atrun/phases.go:52and assigned atrun/executor.go:348(baseTier: tier), with no other matches anywhere inrun/— a genuine dead field. ThephaseDepsdoc comment does advertise "the base model + tier," so it's also misleading.phaseUserMessage(run/phases.go:244-256) andrunAgent(run/executor.go:495-501) build[]llm.Partwith a line-for-line identical block — samemake([]llm.Part, 0, len(images)+1), samestrings.TrimSpace(...) != ""guard on the text part, same image loop — differing only in the final wrap.VERDICT: Minor issues
Through the code cleanliness & maintainability lens, the new
run/phases.gois well-organized, thoroughly commented, and follows the surrounding executor conventions (nil-saferec,slog.Warnon degrade,fmt.Errorf("…: %w")wrapping). Two genuine maintainability nits:phaseDeps.baseTieris a dead field —run/phases.go:52, set atrun/executor.go:348(baseTier: tier) but never read anywhere in the package (grep -rn "baseTier" run/matches only the declaration and the assignment). ThephaseDepsdoc comment even advertises "the base model + tier," so the comment misleads too.phaseModelresolves tiers offphase.ModelTier/deps.baseModeland never needs the base tier string. Fix: drop the field and thebaseTier:line (and trim "+ tier" from the comment), or actually use it (e.g. in the resolve-failure WARN).phaseUserMessageduplicatesrunAgent's image-folding —run/phases.go:244-256vsrun/executor.go:495-501. The block that builds[]llm.Partfrom a (possibly-blank) query plus inline images is line-for-line identical; the only difference is the final wrap (llm.UserParts(parts...)returned directly vs handed toagent.WithHistory). The new helper's own comment ("Mirrors the single-loop multimodal seeding in runAgent") flags the copy-paste. Fix: extract a smallmultimodalParts(text string, images []llm.ImagePart) []llm.Partand call it from bothrunAgentandphaseUserMessage, so the "omit empty text part" rule lives in one place.Nothing blocking in this lane. The phase/loop branching in
Run, thesystemPromptWithBodyextraction, and the error-classificationswitchinrunPhasesare all clean and readable.⚡ Performance — No material issues found
VERDICT: No material issues found
I reviewed
run/phases.goand theexecutor.gobranch through the performance lens, verifying the actual checked-out code (read both files in full around the changed regions). The phase runner is well-bounded and I found no efficiency regression worth flagging:outputsmap,totalUsage, and the accumulatedemitter/critic step state are all bounded by the phase count and the fixed per-phaseMaxStepscaps.salvagePhaseTranscriptis explicitly tail-capped atmaxSalvage = 8000(run/phases.go:235-238), and the tail slice is concatenated into a fresh string so it doesn't retain the larger backing array.expandPhaseTemplaterebuilds itsdatamap per phase by copying prior outputs (run/phases.go:267-270), which is O(phases²) in map entries — but phase counts are single-digit and string values are copied by reference, so this is immaterial. No per-item DB/model fan-out inside any loop.sharedOpts(step observer, tool-error limits, compactor) is built once inRunand reused across phases (executor.go:295-355), so the expensive setup isn't repeated per phase. The per-phaseappend([]agent.Option{...}, deps.sharedOpts...)allocates a small new slice per phase — trivial.Two things I considered and am deliberately not filing as findings, because they're inherent to the phased design rather than avoidable code inefficiencies:
runOnePhaseseedsimagesinto each loop viarunAgent, andphaseUserMessagere-attaches them forIsRunFuncphases —run/phases.go:149,178,244-256). For an image-bearing multi-phase run this multiplies image token/upload cost by the phase count vs. a single-loop run. But each phase is a fresh context with no shared message history, so re-seeding the query+images is structurally required; there's no cheaper rewrite without changing the design. Whether later phases need the images at all is a correctness/design question, not my lane.filterToolboxclones the full toolbox even whenphase.Toolsis empty (run/phases.go:283-288), where the single-loop path uses the toolbox directly. This is O(tools) allocations per loop phase. With single-digit phases and a modest palette it's negligible; I could not verifyToolbox.Add's per-call cost (majordomo lives outside the readable working dir), so I won't assert a real impact.Nothing here rises to a material performance problem.
🧯 Error handling & edge cases — Blocking issues found
I've verified both findings against the actual source in
run/phases.goandrun/executor.go.Finding 1 is confirmed. In
runPhasestheswitchatrun/phases.go:88evaluatescase phase.Optional:(line 89) before any context-abort check. For a non-final phase a context cancellation is re-caught by the next iteration'srunCtx.Err()guard (line 72), but the final phase has no next iteration: theOptionalbranch substitutesFallbackMessage, the loop ends, andrunPhasesreturns(lastResult, nil). Back inRun,statusFor(runCtx, nil)hits its first caserunErr == niland returns"ok"(line 393) — itsErrCriticKillrelabel at line 400 is never reached. The in-repo comment at line 395-399 confirms a critic kill surfaces as a plaincontext.Canceledrun error, corroborating that the abort flows throughrunOnePhaseas anerr. Real.Finding 2 is confirmed:
out[len(out)-maxSalvage:]at line 237 slices a string on a byte index, which can split a multi-byte rune. Trivial/cosmetic.Both survive.
VERDICT: Blocking issues found
🧯 Findings
1. (high) Optional last phase swallows context-cancellation / deadline / critic-kill → run misreported as
"ok"—run/phases.go:88-98The
switchinrunPhasesmatchescase phase.Optional:first and swallowserrunconditionally, without distinguishing the phase's own intrinsic failure from a run-level abort (caller cancel,MaxRuntimedeadline, or a critic kill).For every phase except the last this is masked: the next iteration's top-of-loop
runCtx.Err()check (run/phases.go:72) re-detects the cancellation and aborts. But when the final phase isOptional, there is no next iteration:runOnePhase→runAgentreturnscontext.Canceled(the critic kills viacancelCause(ErrCriticKill), which cancelsrunCtx; a deadline / caller-cancel likewise),Optionalbranch substitutesFallbackMessageand falls through,runPhasesreturns(lastResult, nil).Back in
Run(run/executor.go:357),runErr == nil, sostatusFor(run/executor.go:391-394) returns"ok"andres.Err == nil.statusForkeys entirely offrunErr; its onlyrunCtxconsult is theErrCriticKillrelabel (line 400), which is never reached because therunErr == nilcase wins first.Concrete impact: a timed-out / caller-cancelled / critic-killed run whose trailing phase is
Optionalis reported as a successful run — audit status"ok", theFallbackMessagedelivered as if it were a real answer, and a critic's kill decision silently overridden. This defeats thekilled/timeout/cancelleddistinction the surrounding code goes to lengths to preserve. Plausible in practice: a best-effort synthesis/enrichment final phase markedOptional.Fix: don't let
Optional(or the budget branch) swallow a run-context abort. Add a guard before theOptionalcase, e.g.:(
errorsis already imported and used atrun/phases.go:212.)2. (trivial)
salvagePhaseTranscripttail-trims on a byte boundary —run/phases.go:236-237out[len(out)-maxSalvage:]can split a multi-byte UTF-8 rune, so the salvaged text can begin with an invalid partial rune (right after the prepended"…trimmed…\n"). Cosmetic only — it's fed into a downstream prompt, not parsed — but a rune-safe trim (advance to the next rune start, or trim on a line boundary) is cheap.deepseek-v4-pro:cloud (ollama-cloud) — Minor issues
Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
I traced the flow of untrusted data (user
queryand prior-phase LLM outputs) through the new phase pipeline and found no material security issues.What I verified:
Template injection is not possible (
phases.go:262-276):expandPhaseTemplateuses Go'stext/templatewith the template text sourced fromphase.SystemPrompt(host-configured, not user-controlled). The user'squeryand prior phase outputs are passed as data values tot.Execute(), not as template text. In Go's template engine, data values are rendered as literal strings — they cannot inject template directives. A prior phase's LLM output containing{{...}}would appear literally in the expanded prompt, not be interpreted. This is structurally safe.No new injection surfaces: The expanded prompt and user message flow into
llm.Request.Systemandllm.Request.Messagesrespectively, which are consumed by the LLM provider. There is no SQL, shell, path, or HTML context where these strings could be interpreted as code. The LLM is the intended consumer.Safe error defaults:
filterToolboxskips unknown tool names with a WARN rather than aborting (phases.go:291-293).phaseModelfalls back to the base model on resolution failure (phases.go:199-204).expandPhaseTemplatereturns the raw template unchanged on parse/execute failure (phases.go:264-265, 272-273). None of these degrade security — they're graceful degradation, not privilege escalation.No new authn/authz gaps: Phases run within the same
runCtxand inherit the same budget gating, critic, and audit recorder as the single-loop path. No phase can escalate privileges or bypass the existing run-level controls.No SSRF, unsafe deserialization, or credential handling: The phase runner makes no new network calls beyond the existing model/tool dispatch. It deserializes nothing. It handles no secrets.
Prompt injection via prior phase outputs is inherent to the design: An earlier phase's LLM output is intentionally threaded into later phase prompts via
{{.<PhaseName>}}. This is the documented feature, not a bug — the PR description states "Outputs thread into later phases." The kernel cannot sanitize LLM outputs for prompt injection without defeating the purpose of the pipeline.🎯 Correctness — Minor issues
All three findings are confirmed against the actual code. Here is the corrected review:
Verdict: Minor issues
Findings
run/phases.go:146-161— IsRunFunc phases bypass the sharedstepObserver(verified by readingexecutor.go:265-289andphases.go:146-161). ThestepObserverclosure — which feedsstateAcc.SetIteration,critic.recordStep, and the step emitter — is passed viasharedOptsto every agent-loop phase but is never called forIsRunFuncphases. Onlydeps.rec.OnStep(1, resp)is called directly. Impact: the critic's activity clock is not refreshed during an IsRunFunc phase, so a slow bare-LLM call could trigger a false stall detection; steps from IsRunFunc phases are absent fromResult.Steps(the emitter never sees them); the live iteration counter (stateAcc) stalls. Fix: call the sharedstepObserver(or at minimumcritic.recordStep+ the emitter) for IsRunFunc phases too, or document that these phases are invisible to the critic/emitter by design.run/phases.go:118—totalUsagenot applied tolastResulton hard-error return (verified by tracing lines 82-84, 118, andexecutor.go:358-360). When a non-optional phase fails hard andrunOnePhasereturns a non-nil partialres, line 83 overwriteslastResult = resand line 84 accumulatestotalUsagecorrectly. But the error return on line 118 sendslastResultas-is — its.Usagefield still holds only the failed phase's partial usage, not the accumulatedtotalUsagethat includes prior successful phases. The caller (executor.go:360) copiesrunRes.Usageintores.Usage, so the run result under-reports token consumption. Fix: setlastResult.Usage = totalUsagebefore thereturn lastResult, ...on line 118.run/phases.go:100— budget exhaustion on the last phase with empty output is fatal, contradicting the documented contract (verified by reading lines 100-118 and the doc comment on lines 34-36). The doc states "a non-optional phase that merely exhausts its step/tool budget is NOT fatal," but the condition!isLast || strings.TrimSpace(output) != ""means the last phase exhausting its budget with no salvageable output falls through to thedefaulterror case and aborts the run. This is arguably reasonable (there's nothing to return), but it contradicts the blanket guarantee in the doc comment. Fix: either update the doc to note the last-phase exception, or treat it as a soft stop with a generated placeholder (consistent with the non-last case on line 107).🧹 Code cleanliness & maintainability — Minor issues
All four findings are confirmed against the actual code. Let me verify each one:
baseTierdead field: Grep confirmsbaseTierappears only at the struct definition (phases.go:52) and the construction site (executor.go:348).deps.baseTierhas zero reads anywhere in the package. Confirmed.phaseUserMessageduplicatesrunAgent's multimodal logic: Both functions (phases.go:244-256 and executor.go:491-506) build the same[]llm.Partfrom text+images with identical branching (len(images)==0→ simple path, else build parts slice). The comment onphaseUserMessageexplicitly says "Mirrors the single-loop multimodal seeding in runAgent." Confirmed.expandPhaseTemplatesilently swallows errors: Lines 263-265 and 272-274 returntmplunchanged on parse/execution failure with noslog.Warnor any diagnostic. The doc comment even says "best-effort, not fatal." Confirmed.Magic number 8000: Line 235 is
const maxSalvage = 8000with no explanatory comment. Confirmed.All four findings survive. The verdict remains "Minor issues."
Verdict: Minor issues
phaseDeps.baseTieris dead code (run/phases.go:52). The field is populated inexecutor.go:348(baseTier: tier) but never read anywhere in the package.grepconfirmsdeps.baseTierhas zero usages. This is a dead struct field that will confuse future readers. Remove it from the struct and the construction site.phaseUserMessageduplicatesrunAgent's multimodal message logic (run/phases.go:244–256vsrun/executor.go:491–506). Both functions build allm.Messagefromquery + imageswith the same pattern: no images → simple path, otherwise build[]llm.Partwith optional text + image parts. The comment onphaseUserMessageeven acknowledges it "mirrors"runAgent. Extract a shared helper (e.g.,func userMessage(query string, images []llm.ImagePart) llm.Message) so the two call sites don't drift.expandPhaseTemplatesilently swallows parse/execution errors (run/phases.go:262–276). When a template is malformed or execution fails, the function returns the raw template string unchanged — meaning the LLM sees literal{{.Query}}/{{.PhaseName}}in its system prompt. There is no log, no warning, no audit event. A misconfigured phase prompt would produce subtly wrong behavior with zero diagnostic signal. At minimum, log aslog.Warnon failure so operators can detect the problem.Magic number
maxSalvage = 8000insalvagePhaseTranscript(run/phases.go:235). The constant has no explanation of why 8000 characters was chosen. A named constant with a comment (e.g., "cap salvage to roughly 2K tokens to avoid bloating downstream phase prompts") would make the intent clear and easier to tune later.⚡ Performance — No material issues found
Verdict: No material issues found
The phase runner is linear in both phases and steps, with no N+1 queries, quadratic loops, or unbounded growth. LLM call latency dominates by orders of magnitude; the CPU/memory overhead of template parsing, toolbox copying, and string building is negligible in that context.
A few micro-observations that don't rise to the level of material findings:
filterToolboxcopies the full toolbox even whennamesis empty (phases.go:283-287). When a phase doesn't restrict tools (the common case), the function could return the originalboxdirectly instead of allocating a copy and re-adding every tool. With typical toolbox sizes (~10–50 tools) and phase counts (2–5), the wasted allocation is trivial relative to LLM call latency.expandPhaseTemplatere-parses the template string on every run (phases.go:263). The phaseSystemPromptis static per agent definition; caching parsed templates would avoid repeated lexing/parsing. Again, microsecond-level savings against second-to-minute LLM calls.salvagePhaseTranscriptbyte-index slices a UTF-8 string (phases.go:237).out[len(out)-maxSalvage:]can split a multi-byte rune, producing invalid UTF-8. This is primarily a correctness concern (outside my lens), but could cause wasted work if a downstream LLM API rejects the malformed text.None of these are material enough to flag as performance regressions.
🧯 Error handling & edge cases — Minor issues
Verdict: Minor issues
run/phases.go:237—salvagePhaseTranscripttruncates on byte boundary, can split UTF-8 (medium, high confidence). The lineout[len(out)-maxSalvage:]slices at a byte offset, which can break a multi-byte rune in half, producing invalid UTF-8 in the salvaged output that gets threaded into later phases. Use[]runeconversion or iterate withutf8.DecodeLastRuneInStringto find a safe cut point. Verified by reading the function;len(out)on astringreturns byte count, and the subsequent slice operates on byte indices.run/phases.go:201—phaseModellogserror=<nil>when resolver returns nil model without error (small, high confidence). The conditionerr != nil || m == nilon line 200 triggers the warn log for both cases, but whenm == nil && err == nilthe log field"error", erremitserror=<nil>, making the message "model resolve failed" misleading — there was no error, just a nil model. Distinguishing the two cases (or logging"model", "nil"instead) would make the warning actionable. Verified by reading lines 199-205.devstral-2:123b-cloud (ollama-cloud) — Minor issues
Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — Minor issues
Now, let's verify the findings in the draft review:
Finding 1: Template Injection Risk in
expandPhaseTemplate(run/phases.go:262)expandPhaseTemplatefunction usestext/templatewithOption("missingkey=zero"). This does not prevent template injection attacks if an attacker controls theSystemPromptfield.SystemPrompt, they could inject malicious template directives.Finding 2: Unsanitized User Input in Template Data (
run/phases.go:267-270)queryandpriorOutputsvalues are directly passed into the template data map without sanitization.Finding 3: Lack of Input Validation for
SystemPrompt(run/agent.go:64)SystemPromptfield in thePhasestruct is a free-form string with no validation.All findings in the draft review are confirmed. The verdict remains "Minor issues."
🎯 Correctness — Minor issues
Verdict
Minor issues
Findings
run/phases.go:212: TheisPhaseBudgetExhaustionfunction checks foragent.ErrMaxStepsandagent.ErrToolLoop, but these errors are not defined in the repository. This could lead to incorrect behavior if the errors are not properly defined or imported. Verified by searching for the error definitions in the repository and finding no matches.gadfly-findings
🧹 Code cleanliness & maintainability — Minor issues
Now, let's verify each finding in the draft review:
Inconsistent Naming in
PhaseStruct:IsRunFuncis inconsistently named compared to other fields likeMaxIterationsandModelTier. However,IsRunFuncis a boolean flag that indicates whether a phase is a single bare LLM call, which is distinct from the other fields. The naming is appropriate for its purpose, and there is no inconsistency.Long Function in
runPhases:runPhasesfunction is indeed long and handles multiple responsibilities. This is a valid observation.Magic Number in
salvagePhaseTranscript:maxSalvage = 8000is defined within the function and lacks context. This is a valid observation.Redundant Comments:
phaseDepsstruct is descriptive but could be more concise. This is a valid observation.Inconsistent Error Handling in
runOnePhase:runOnePhaseis nested and intertwined with the main logic. This is a valid observation.Unused Variable in
runPhases:lastOutputis used in the final synthesis of the result, so it is not unused. This finding is incorrect.Based on this verification, the findings that are confirmed are:
runPhasessalvagePhaseTranscriptphaseDepsrunOnePhaseThe corrected review is as follows:
Corrected Review
Verdict
Minor issues
Findings
Long Function in
runPhasesrun/phases.go:64-140runPhasesfunction is quite long and handles multiple responsibilities, including iterating over phases, handling errors, and managing outputs. This makes it harder to read and maintain.Magic Number in
salvagePhaseTranscriptrun/phases.go:260maxSalvage = 8000is a magic number. While it is defined within the function, it lacks context or explanation.Redundant Comments
run/phases.go:46-58phaseDepsstruct has a comment that explains its purpose, but the comment is somewhat redundant with the struct's name and fields. The comment could be more concise.phaseDepsthat aren't immediately obvious from the struct definition.Inconsistent Error Handling in
runOnePhaserun/phases.go:143-200runOnePhasefunction handles errors in a nested manner, which can make it difficult to follow the flow of execution. The error handling logic is intertwined with the main logic.⚡ Performance — Minor issues
Verdict
Minor issues
Findings
salvagePhaseTranscript: The functionsalvagePhaseTranscriptinrun/phases.go(line 218) reconstructs the phase output by iterating over all steps and concatenating their responses. This operation could be inefficient for phases with a large number of steps, as it involves multiple string concatenations and trimming operations. Consider using a more efficient approach, such as preallocating a buffer or using a more optimized string-building method.gadfly-findings
🧯 Error handling & edge cases — Minor issues
Now let's verify the findings:
Finding at
run/phases.go:106: The draft claims thatstrings.TrimSpace(output) == ""is checked redundantly at lines 106 and 108. However, the code shows that the check at line 106 is part of anifcondition, while line 108 is part of anelseblock. These are not redundant checks but rather mutually exclusive conditions. Therefore, this finding is incorrect and should be dropped.Finding at
run/phases.go:186: The draft claims thatsalvagePhaseTranscriptassumesres.Stepsis non-nil without an explicit check. Looking at the code, the functionsalvagePhaseTranscriptdoes iterate overres.Stepswithout a nil check. This is a valid concern, and the finding should be kept.Finding at
run/phases.go:236: The draft claims that themaxSalvageconstant is hardcoded to8000without validation. This is confirmed in the code, and there is no additional validation or explanation for this value. This finding should be kept.Corrected Review
VERDICT
Minor issues
Findings
run/phases.go:186: ThesalvagePhaseTranscriptfunction assumesres.Stepsis non-nil, but there is no explicit check for this. Ifres.Stepsisnil, the loop will panic. Add a nil check before iterating overres.Steps.run/phases.go:236: ThemaxSalvageconstant is hardcoded to8000, but there is no validation to ensure this value is reasonable or safe for all use cases. Consider making this configurable or adding a comment explaining the rationale for this value.glm-5.2:cloud (ollama-cloud) — Minor issues
Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — Minor issues
Both findings are confirmed against the actual source.
expandPhaseTemplate(lines 262-275) substitutes prior-phase outputs (outputs[phase.Name], populated at line 122) into a later phase's prompt template; the result (instructions) becomes the phase's system prompt at line 144 (system := e.systemPromptWithBody(instructions)), passed toagent.New(model, system, ...)(line 176) andmodel.Generate(..., System: system)(line 150). The injection-into-system-prompt claim holds. (Note: line 107-108 cited in the draft is the budget-exhaustion note strings, not the injection site; the real injection sites are lines 76/122/144/150/176.)expandPhaseTemplatereturns the rawtmplon bothParse(264-266) andExecute(272-274) error with no WARN logged. Confirmed.Corrected review
Verdict: Minor issues
Untrusted model outputs are injected into later phases' system prompts (
run/phases.go:76,run/phases.go:122,run/phases.go:144,run/phases.go:262-275) —expandPhaseTemplatesubstitutes prior-phase outputs (model-generated, influenced by adversarial query/tool content) via{{.<PhaseName>}}; the expandedinstructionsbecome the next phase'sSystem:prompt (systemPromptWithBody→agent.New(model, system, …)/model.Generate{System: system}). In the single-loop path model content only ever returns as conversational user/assistant messages; here it is elevated to system-prompt status, which models weight more authoritatively. This is a prompt-injection escalation path: a malicious query or tool result that drives phase A's output can embed instructions that phase B treats as system-level directives. The system-prompt placement is the kernel's choice and materially raises the injection surface. Suggested mitigation: render prior-phase outputs into the later phase's user message (or a fenced, clearly-delimited context block) rather than theSystem:field; at minimum document the trust boundary for phase-template authors.Template parse/execution failures silently fall back to the raw prompt (
run/phases.go:263-266,run/phases.go:272-274) —expandPhaseTemplatereturns the originaltmplon eitherParseorExecuteerror. For a malformed operator template this means unexpanded{{...}}tokens (which may contain prior-phase output names or.Query) reach the model verbatim — not a direct injection, but an operator typo silently leaks template scaffolding into the prompt with no error surfaced. Low severity; consider logging a WARN on the parse/execute-error path so misconfigured phase prompts aren't silently swallowed.No authn/authz, command/SQL injection, SSRF, deserialization, or secret-leakage issues found in the touched code. Tool/model-tier inputs are operator-config-sourced and validated/resolved through existing resolvers;
filterToolboxskips unknown tool names rather than widening access;phaseModelfalls back to the base model on resolution failure rather than failing open to an arbitrary tier.🎯 Correctness — Minor issues
VERDICT: Minor issues
Verified against the actual code in
run/phases.go(lines 64–134, 140–191),run/executor.go(the single-loop/phase branch at 328–355,statusForat 391,runAgentat 491), and a repo-wide grep forReasoningTokens/ErrMaxSteps/ErrToolLoop.run/phases.go:100-118— a final non-optional phase that exhausts its step/tool budget with no salvageable narrated text aborts the whole run with a hard error and empty output, discarding all earlier phases' completed work — contradicting the file's own documented "NOT fatal" guarantee. Confirmed by tracing the switch at lines 88–119.runOnePhase(lines 178–190) only setsoutputto a non-empty value whensalvagePhaseTranscript(res)finds assistant prose; otherwiseoutputstays"". Back inrunPhases, the budget-exhaustion case is gated byisPhaseBudgetExhaustion(err) && (!isLast || strings.TrimSpace(output) != "")(line 100). For the last phase with empty salvage that condition istrue && (false || false)=false, so execution falls through todefault(line 117):return lastResult, fmt.Errorf("pipeline phase %q: %w", phase.Name, err).lastResultat that point is the last phase'sres(it is reassigned every iteration at line 83), so its.Outputis empty and carries none of the earlier phases' outputs — those live only in the localoutputsmap (line 65) and the transiently-overwrittenlastResult, both dropped on the early return. The caller (executor.go:357-363) then classifies this as a failed run (statusFordefault branch) withres.Output = "". This is precisely the "long verify phase" the doc comment (lines 34–37) promises will "never [discard] every earlier phase's work": a verify/consolidation phase is typically last in a research pipeline, and when the model burns its whole step budget on tool calls without writing a final prose answer, salvage returns""and the run is reported as a total failure. The fix is to handle the empty-last-phase-budget case explicitly: synthesize a degraded output from the prioroutputsmap (or at minimum a "final phase did not consolidate" note) and returnnilerror instead of falling todefault, so earlier phase work survives. (Confidence: medium — the code path is unambiguous; one can argue a hard abort is defensible when there is genuinely nothing to return, but that is not what the documented contract states.)The
addUsagefield concern was already self-dropped by the draft and is not a finding: a repo-wide grep confirmsllm.Usage.ReasoningTokensis referenced inaudit/writer.go:109,139-140, so the field exists andaddUsagecompiles. The input-file staging ordering (stageInputFilesatexecutor.go:323precedes the single-loop/phase branch) and the{{.Query}}/{{.<PhaseName>}}template threading throughexpandPhaseTemplate(missingkey=zeroon amap[string]string) are correct as the draft noted.🧹 Code cleanliness & maintainability — Minor issues
All three findings are confirmed against the actual source code. The duplication between
phaseUserMessage(phases.go:244-256) andrunAgent(executor.go:491-505) is real, the convoluted guard at line 100 is as described, andaddUsagedoes sum field-by-field against an externalllm.Usagestruct.Verdict: Minor issues
run/phases.go:244-256—phaseUserMessageduplicatesrunAgent's multimodal seeding. The comment admits it "mirrors the single-loop multimodal seeding in runAgent," but the code is copy-pasted rather than shared.runAgent(executor.go:491-505) already encapsulates the "user text + inline images →llm.UserParts" logic with the same trimming/append pattern. The current duplication means a future change to one (e.g., how empty query text is handled, image part ordering) silently diverges between single-loop and phase paths. Suggested fix: extract a shareduserMessageWithImages(query string, images []llm.ImagePart) llm.Messagehelper used by bothrunAgentandphaseUserMessage.run/phases.go:100— convoluted guard condition hurts readability.case isPhaseBudgetExhaustion(err) && (!isLast || strings.TrimSpace(output) != "")couples the "is this a soft stop" check with a last-phase non-empty-output check in one boolean expression, with the real intent (a final phase that exhausts budget with no salvageable output is treated as a hard failure) only visible by reading the body. A two-line comment or anisLast && strings.TrimSpace(output) == ""early-fallthrough todefault:would make the salvage-vs-abort boundary legible without a comment archaeology dig.run/phases.go:304-311—addUsagefield-by-field sumsllm.Usage. This is the standard Go approach and works today, but it's a maintainability hazard ifllm.Usage(defined in the externalmajordomo/llmpackage) ever gains a field — a phased run would silently under-report that field. Thellmpackage is an external module dependency (not vendored in this checkout), so the struct definition could not be inspected directly; this is a low-severity "fragile if upstream changes" note, not a current bug.⚡ Performance — No material issues found
VERDICT: No material issues found
Reviewed
run/phases.go,run/executor.go, andrun/phases_test.gothrough the performance lens.outputs(map[string]stringsized tolen(ra.Phases)),filterToolbox(one new toolbox per phase, O(tools) copy),expandPhaseTemplate(onetext/templateparse per phase), and the per-phaseoptsslice copy ofsharedOptsare all O(phases) and cheap relative to the LLM calls they bracket. None of these sit in a hot loop beyond the phase loop itself.salvagePhaseTranscriptis capped atmaxSalvage = 8000via a tail-trim, and it only iteratesres.Stepsonce with a singlestrings.Builder. No quadratic/unbounded growth.addUsage) is a fixed 5-field additive fold over phases — O(phases), no extra allocation beyond the value copy oftotalUsage.stepObserver/steer/compactoroptions are built once inRunand reused by reference (sharedOptsappended, not rebuilt) across phases.Two things I deliberately did not flag (verified they are outside the performance lane / inherent to the feature):
phaseUserMessage), so each phase re-sends all images to the model. This is an inherent cost of "each phase is a fresh loop seeded with the original query," mirroring the single-loop multimodal path — not a regression, and removing it would break phase isolation.WithCompactorthreshold is computed once for the basetierand applied to all phase agents regardless of their per-phaseModelTier; a phase on a larger-context model could compaction-fire earlier than necessary. That's a correctness/threshold-fidelity concern, not a performance regression, and it depends on tier configs I cannot inspect.🧯 Error handling & edge cases — Minor issues
All three findings are confirmed against the actual code. Line numbers in the draft's JSON need correcting to match the actual source.
out[len(out)-maxSalvage:]slices by byte offset.VERDICT: Minor issues
I verified the error-handling paths against the checked-out code. The phase runner's unhappy paths are mostly sound: cancellation is checked between phases (
runCtx.Err()), hard errors abort, budget exhaustion salvages partial transcripts, andagent.ErrMaxSteps/agent.ErrToolLoopare real symbols. A few edge-case nits remain:run/phases.go:237salvagePhaseTranscript— byte-truncation can split a multibyte UTF-8 rune. The boundout[len(out)-maxSalvage:]slices by byte offset; if the cut lands inside a multibyte character the salvaged string begins with invalid UTF-8, which then becomes the phase output (and may be threaded into the next phase's template or returned as the run output). Low impact (only triggers on >8KB salvage), but a real boundary bug. Fix: use a rune-aware truncate.run/phases.go:89runPhasesswitch ordering — anOptionalphase that hits budget exhaustion is swallowed withFallbackMessage, discarding the salvaged partial transcript.case phase.Optional:precedes theisPhaseBudgetExhaustioncase, so a long-running optional verify phase that ran out of steps loses its (real, salvageable) partial work and is replaced by a static fallback note. This appears intentional ("Optional swallows a phase's error and substitutes FallbackMessage"), but it is a silent data-loss edge case worth confirming: an optional phase that did substantial work before hitting the step cap yields less useful output than a non-optional one in the same situation.run/phases.go:128runPhases— final synthesizedlastResultmay carry a prior phase's transcript when the last phase fails optionally. When the last phase isOptionaland fails with a nilres,lastResultis not updated and retains the previous phase'sagent.Result; the code then overwritesOutput/Usagewith the fallback/accumulated usage, butMessagesremains the prior phase's transcript (passed to the PostRun hook). The doc says "last phase's transcript for the PostRun hook," which is inaccurate in this path.No swallowed errors or missing cleanup found: the
defer/context handling is consistent with the single-loop path, andphaseModel's resolve-failure fallback is logged at WARN rather than dropped silently.kimi-k2.6:cloud (ollama-cloud) — Blocking issues found
Verdict: Blocking issues found — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
No material issues found.
Through the Security lens I examined:
expandPhaseTemplate, run/phases.go:262–276): Phase prompts are parsed withtext/templateagainst amap[string]stringof prior outputs and the original query. The data surface is limited to strings (no functions or structs are exposed), and no custom func-map is registered. The key risk — template injection — would only materialize if an untrusted party could author thePhase.SystemPrompttemplate itself; per the PR,RunnableAgent(including itsPhases) is a host-controlled DTO, so the template source is trusted.missingkey=zerois a safe default.err.Error()is logged into the internalRunRecorderaudit stream on optional-phase failure and budget exhaustion. This is consistent with existing audit logging in the executor and does not expose data to a new, less-trusted consumer.phaseModel, run/phases.go:195–206): Tier strings come from the host-providedPhaseconfig, not from user input, so thee.cfg.Models(ctx, tier)call does not introduce a new SSRF vector.filterToolbox, run/phases.go:281–300): Skips unknown tools with a warning; no authority bypass.runPhases, run/phases.go:64–134): The final result is synthesized from phase outputs that were already produced by the agent/LLM stack; no unsafe deserialization or credential leakage was introduced.🎯 Correctness — Blocking issues found
Blocking issues found
run/phases.go:236-238—salvagePhaseTranscripttruncates by byte length (len(out) > maxSalvageandout[len(out)-maxSalvage:]). For any multi-byte UTF-8 text this slices through runes, producing invalid UTF-8 and garbled output fed into downstream phases. The limit should count runes (utf8.RuneCountInString) and slice on a rune boundary.run/phases.go:262-275—expandPhaseTemplatebuildstext/templatevariables as{{.<PhaseName>}}(line 269:data[k] = v). Gotext/templateidentifiers may contain only letters, digits, and underscores. Phase names with hyphens (deep-research), digits at the start, spaces, etc. parse as invalid expressions or fail execution silently, causing the template to return unexpanded — downstream phases receive literal{{.deep-research}}instead of the actual prior output. This contradicts the documented threading semantics and is a silent data-corruption bug. Useindexsyntax ({{index . "Phase-Name"}}) or validate/sanitize phase names.🧹 Code cleanliness & maintainability — Minor issues
Minor issues
run/phases.go:244-255—phaseUserMessageduplicates the multimodalllm.Partassembly already present inrunAgent(executor.go:491-505). Both functions build a[]llm.Partfrom an optional text part + image parts with identicalstrings.TrimSpace(query) != ""guards. Extract a shared helper likebuildMultimodalUserMessage(query string, images []llm.ImagePart) llm.Messageand call it from both sites.run/phases.go:304-311—addUsagemanually accumulates every field ofllm.Usage. If the upstream struct gains a new field (e.g. a future cost or token category), this function silently under-counts. Prefer anAddmethod onllm.Usage(if majordomo doesn't provide one, add it there) or generate this with a simple reflection helper; at minimum add a compile-time assertion that verifies field count.run/phases.go:262-275—expandPhaseTemplateusestext/templatefor simple key-value substitution and silently returns the original string on parse or execution failure. Phase names containing hyphens, spaces, or other non-identifier characters (e.g.deep-research) produce invalid template syntax ({{.deep-research}}is parsed as a subtraction expression), causing silent fallback to the unexpanded template. This is a debugging trap for pipeline authors. Consider a simplerstrings.ReplaceAll-style expander or document the identifier restriction and log/return the error at WARN level so authors know why substitution failed.run/phases.go:235— Magic constantconst maxSalvage = 8000with no explanation of where the value comes from. Is it bytes, runes, characters? Why 8000? Add a comment describing the rationale (e.g. "tail of ~2k tokens at ~4 bytes/token") or make it a configurable default.⚡ Performance — Blocking issues found
Blocking issues found
Template re-parsed on every phase execution instead of being pre-parsed or cached —
run/phases.go:262-263expandPhaseTemplatecallstemplate.New("phase").Option("missingkey=zero").Parse(tmpl)once per phase, every run. Sincetmplis the staticPhase.SystemPromptfrom configuration, the same AST is rebuilt repeatedly. For a multi-phase agent under load this is pure lexer/parser overhead that scales with request rate.Fix: Pre-parse
Phase.SystemPromptinto a*template.Templatewhen theRunnableAgentis loaded/constructed (or cache parsed templates in async.Mapkeyed by prompt string) and execute with the per-run data map.filterToolboxcopies entire toolbox when no filtering is requested —run/phases.go:283-287When a phase does not specify a tool subset (
len(names) == 0),filterToolboxstill allocates a newllm.Toolboxand copies every tool into it. If the base toolbox is large and the pipeline has many phases, this is unnecessary allocation andAddoverhead on every phase of every run. The single-loop path already passes the originaltoolboxpointer directly without copying.Fix: Return
boxdirectly whenlen(names) == 0:Double
strings.TrimSpacein budget-exhaustion path —run/phases.go:100and:106The condition on line 100 evaluates
strings.TrimSpace(output) != "", and line 106 evaluatesstrings.TrimSpace(output) == ""for the same string. This redundant scan is minor but easily avoided by binding the trimmed result once.Fix: Hoist
trimmed := strings.TrimSpace(output)before the switch.🧯 Error handling & edge cases — Minor issues
Minor issues found.
run/phases.go:100— The budget-exhaustion soft-stop conditionisPhaseBudgetExhaustion(err) && (!isLast || strings.TrimSpace(output) != "")makes the final phase fatal when it exhausts its budget and produces no salvageable transcript. This contradicts the stated semantics that non-optional budget exhaustion is always non-fatal. Earlier phases are protected, but a last-phase soft-stop still aborts the whole run. Suggested fix: drop the!isLast ||guard so the last phase degrades with a generated placeholder message (like the non-last phases already do). Verified by reading therunPhaseserror switch.run/phases.go:236–237—salvagePhaseTranscripttruncates by raw byte count (len(out) > maxSalvagethenout[len(out)-maxSalvage:]), which can split a multi-byte UTF-8 rune and emit invalid UTF-8 into the next phase’s template variable. Suggested fix: truncate by rune count or walk back to the nearest valid UTF-8 boundary. Verified by reading the function body.qwen3.5:397b-cloud (ollama-cloud) — Minor issues
Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
The finding is confirmed. I've verified:
persona/agent.go:159-191—AgentPhasestruct has bothFallbackMessage(lines 183-185) andIsRunFunc(lines 187-190) fields.run/agent.go:62-79—run.Phasestruct has bothFallbackMessage(lines 72-74) andIsRunFunc(lines 75-78) fields.persona/runnable.go:26-35— The conversion loop copiesName,SystemPrompt,ModelTier,MaxIterations,Tools, andOptional, but omitsFallbackMessageandIsRunFunc.This is a genuine bug: personas configured with
IsRunFunc: true(intended as safe, tool-less transform steps) will silently run as full agent loops with tool access, and customFallbackMessagevalues are lost.No material issues found — the single finding in the draft is valid and survives verification.
🎯 Correctness — No material issues found
VERDICT: No material issues found
After thorough review through the Correctness lens, I verified the following:
Template expansion (
expandPhaseTemplate, phases.go:262-276): Correctly expands{{.Query}}and{{.<PhaseName>}}using Go'stext/templatewithmissingkey=zero. Returns original on parse/execute failure (best-effort). Verified against testTestExpandPhaseTemplate.Output threading (phases.go:122): Phase outputs stored in
outputsmap and threaded to subsequent phases via template expansion. Verified by testTestPhases_SequentialThreadsOutputsshowing phase b receives "out-a" and phase c receives "out-b".Optional phase error handling (phases.go:89-98): Sets
output = phase.FallbackMessage(or generated note) while allowing pipeline to continue.lastResultfrom the failed phase is retained for Messages/Usage. Verified by testTestPhases_OptionalFailureSubstitutesFallback.Budget exhaustion salvage (phases.go:100-115, 185-188): Non-optional phases that exhaust step/tool budget salvage partial transcript via
salvagePhaseTranscriptand continue (unless last phase with empty output).salvagePhaseTranscript(phases.go:218-239) safely handles nil result.Usage aggregation (
addUsage, phases.go:304-310): Correctly sums all fivellm.Usagefields (InputTokens, OutputTokens, CacheReadTokens, CacheWriteTokens, ReasoningTokens).IsRunFunc bare call (phases.go:146-162): Single LLM call without tool loop, properly constructs
agent.Resultwith Output, Usage, and Messages.SystemHeader composition (
systemPromptWithBody, executor.go:437-445): Correctly prepends platform header to each phase's expanded instructions. Verified by testTestPhases_SystemHeaderAppliedPerPhase.Toolbox filtering (
filterToolbox, phases.go:281-299): Preserves palette order, skips unknown tools with WARN. Verified by testTestFilterToolbox.Context cancellation (phases.go:72-74): Checks
runCtx.Err()before each phase to abort on timeout/cancellation.The implementation correctly ports mort's pipeline semantics while reusing kernel machinery. Tests cover sequential threading, optional failures, hard errors, IsRunFunc, SystemHeader, toolbox filtering, and template expansion.
🧹 Code cleanliness & maintainability — Minor issues
The finding is confirmed. I've verified:
baseTieris declared atrun/phases.go:52in thephaseDepsstructrun/executor.go:348(baseTier: tier)baseTierreturns only those two occurrences — there is no read access to this field anywhere in the codebaseThis is dead code: the field is written but never read.