run: wire SessionToolFactory + PostRun artifacts + AttachImages #12

Closed
steve wants to merge 0 commits from run-postrun-sessiontools into main
Owner

The gateway for migrating mort's artifact-producing surfaces (chat API, chatbot, .skill, scaddy) onto executus. The session-tool types already lived in tool/ (the P4 move), but the executor never used them — this wires them.

Changes

  • run/session.gosteerMailbox (thread-safe message queue) + runSession (tool.AgentSession over it: AttachImages → a user-role multimodal message injected before the agent's next step) + runPostRun (panic-isolated hook call).
  • executor — create the mailbox + set inv.AttachImages before the toolbox build (so tool handlers capture the live seam); add inv.ExtraTools + a SessionToolFactory's per-run Tools to the toolbox; defer its Cleanup; merge the session mailbox with the critic's nudges into one WithSteer; after the run, call PostRun with the full transcript (runRes.Messages) → Result.PostRunResult (best-effort — a PostRun panic never fails the run).
  • run.Result += PostRunResult *tool.PostRunResult.
  • dropped the now-dead criticBinding.steerOptions (superseded by drainSteer).

Tests

A factory whose PostRun emits an artifact from the output+transcript + Cleanup lands on Result.PostRunResult; a factory-added tool is callable by the model.

Next: a mort PR maps run.Result.PostRunResult through ExecutusRunner so a flagged artifact run delivers its gif/image — then the chat-API/chatbot surface migrations.

🤖 Generated with Claude Code

The **gateway** for migrating mort's artifact-producing surfaces (chat API, chatbot, `.skill`, scaddy) onto executus. The session-tool *types* already lived in `tool/` (the P4 move), but the executor never used them — this wires them. ### Changes - **`run/session.go`** — `steerMailbox` (thread-safe message queue) + `runSession` (`tool.AgentSession` over it: `AttachImages` → a user-role multimodal message injected before the agent's next step) + `runPostRun` (panic-isolated hook call). - **executor** — create the mailbox + set `inv.AttachImages` **before** the toolbox build (so tool handlers capture the live seam); add `inv.ExtraTools` + a `SessionToolFactory`'s per-run `Tools` to the toolbox; defer its `Cleanup`; merge the session mailbox with the critic's nudges into **one** `WithSteer`; after the run, call `PostRun` with the full transcript (`runRes.Messages`) → `Result.PostRunResult` (best-effort — a PostRun panic never fails the run). - **`run.Result` += `PostRunResult *tool.PostRunResult`**. - dropped the now-dead `criticBinding.steerOptions` (superseded by `drainSteer`). ### Tests A factory whose `PostRun` emits an artifact from the output+transcript + `Cleanup` lands on `Result.PostRunResult`; a factory-added tool is callable by the model. Next: a mort PR maps `run.Result.PostRunResult` through `ExecutusRunner` so a flagged artifact run delivers its gif/image — then the chat-API/chatbot surface migrations. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 1 commit 2026-06-27 22:13:28 +00:00
run: wire SessionToolFactory + PostRun artifacts + AttachImages
executus CI / test (pull_request) Successful in 1m49s
Adversarial Review (Gadfly) / review (pull_request) Successful in 5m19s
4e179259de
The session-tool TYPES already lived in tool/ (P4 move) but the executor never
used them. This wires them, unblocking artifact-producing host surfaces (mort's
chat API / chatbot / .skill / scaddy) to run on executus:

- run/session.go: steerMailbox (thread-safe message queue) + runSession
  (tool.AgentSession over it: AttachImages → a user-role multimodal message
  injected before the agent's next step) + runPostRun (panic-isolated hook call).
- executor: create the mailbox + set inv.AttachImages BEFORE the toolbox build;
  add inv.ExtraTools + a SessionToolFactory's per-run Tools to the toolbox; defer
  its Cleanup; merge the session mailbox with the critic's nudges into ONE
  WithSteer; after the run, call PostRun with the full transcript
  (runRes.Messages) → Result.PostRunResult (best-effort, never fails the run).
- run.Result += PostRunResult *tool.PostRunResult.
- dropped the now-dead criticBinding.steerOptions (superseded by drainSteer).

Tests: a factory whose PostRun emits an artifact from the output+transcript +
Cleanup lands on Result.PostRunResult; a factory-added tool is callable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly review — glm-5.1:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Reviewing through the 🔒 Security lens:

  • inv.AttachImages / steerMailbox.push — The new AttachImages seam queues arbitrary llm.Messages (user-role, constructed from text + images) into the agent's conversation loop. Only trusted, server-side code can reach this: tool handlers via inv, and SessionToolFactory closures via AgentSession. No end-user input flows directly into push without passing through a trusted tool handler first. The same trust model already applies to tool return values the model sees.

  • SessionToolFactory / ExtraTools injection — Both allow injecting arbitrary tools and code (including PostRun/Cleanup closures) into a run. This is by design — they're set by the trusted dispatch path (mort's code constructing Invocation), not from raw user input. Invocation is passed by value to Run, so post-construction tampering by the caller isn't possible.

  • PostRun panic isolationrunPostRun correctly uses defer/recover to isolate panics, setting the result to nil rather than propagating. This prevents a malicious or buggy PostRun from crashing the process or failing an otherwise-successful run. ✓

  • Cleanup is deferred — Ensures resource teardown runs even on early returns or panics in the run path. ✓

  • steerMailbox synchronizationpush acquires mu, appends, releases; drain acquires mu, swaps to nil, releases. No TOCTOU, no data race on the slice after drain (the caller owns the returned slice). ✓

  • No secret/credential leakagePostRunResult carries Artifacts (name+MIME+bytes) and Metadata (arbitrary map). These are produced by trusted PostRun hooks and returned in the Result struct. No credentials, tokens, or secrets are in scope.

No security-relevant findings after verification.

🎯 Correctness — No material issues found

VERDICT: No material issues found

After thorough verification:

  • runSessiontool.AgentSession conformance: The AgentSession interface (in tool/session_tools.go:23-30) requires AttachImages(text string, images ...llm.ImagePart). The concrete runSession in session.go:64 has exactly this signature. ✓

  • Two runSession instances sharing one mailbox: Lines 186–187 and 219 both create &runSession{mailbox: mailbox} pointing to the same steerMailbox. This is intentional — inv.AttachImages (for ordinary tools) and the factory's AgentSession.AttachImages must both inject into the same agent loop. ✓

  • Nil-safety on critic.drainSteer(): drainSteer() returns nil when b == nil (critic.go:119–123). The merged steer closure append(mailbox.drain(), critic.drainSteer()...) appends to a nil slice when the critic is absent, which is valid Go. ✓

  • runPostRun panic recovery: Uses a named return prr correctly; the defer/recover sets prr = nil on panic, which overwrites any partial assignment before the panic. ✓

  • Cleanup deferral: defer st.Cleanup() (executor.go:221) is registered before the tool-adding loop, so it runs even if toolbox.Add fails at line 224. Early returns before the factory block (lines 192, 199) correctly skip Cleanup since no factory was created yet. ✓

  • ctx vs runCtx in PostRun: runPostRun receives ctx (the parent context, line 341), not the potentially-cancelled runCtx. This is correct — PostRun should still be able to produce artifacts after a timeout. ✓

  • steerMailbox.drain() thread safety: drain() locks, swaps m.msgs with nil, and returns the old slice. Each call yields a fresh slice. The append(mailbox.drain(), critic.drainSteer()...) closure in the steer callback cannot corrupt the mailbox state. ✓

No logic bugs, semantic errors, or domain-correctness issues found.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

Reviewing through the error-handling & edge-case lens:

  • runPostRun panic isolation is correct: named return prr is assigned nil inside the defer/recover, so a panicking PostRun hook safely produces a nil PostRunResult without propagating the panic. ✓
  • steerMailbox.drain() returns the internal slice then sets m.msgs = nil. Subsequent pushes allocate a new backing array, so the returned slice is not aliased. ✓
  • Nil-safety on criticBinding.drainSteer(): returns nil when b == nil (no critic configured). Combined with mailbox.drain() (returns nil when empty), the merged steer closure func() []llm.Message { return append(mailbox.drain(), critic.drainSteer()...) } is nil-safe. ✓
  • AttachImages no-op guards: early return when all parts are empty (whitespace-only text + no images). ✓
  • st.Cleanup defer ordering: defer st.Cleanup() is registered after the top-level recover defer, so it executes first (LIFO). If Cleanup panics, the top-level recover catches it — acceptable best-effort cleanup semantics. ✓
  • Early-return paths still run Cleanup: When toolbox.Add fails for an extra/session tool, the function returns with res.Err set, but st.Cleanup was already deferred and will still execute. ✓
  • Nil transcript on partial results: When runRes is nil, transcript stays the zero-value nil, which the PostRun hook receives as a valid empty slice. By-design best-effort. ✓
  • Two runSession instances share one mailbox (line 187 vs 219): both wrap the same mailbox pointer, so inv.AttachImages and the factory's session both push to the same queue. Correct. ✓

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 23s

<!-- gadfly-review:ollama:glm-5.1:cloud --> ### 🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> VERDICT: No material issues found Reviewing through the 🔒 Security lens: - **`inv.AttachImages` / `steerMailbox.push`** — The new `AttachImages` seam queues arbitrary `llm.Message`s (user-role, constructed from `text + images`) into the agent's conversation loop. Only trusted, server-side code can reach this: tool handlers via `inv`, and `SessionToolFactory` closures via `AgentSession`. No end-user input flows directly into `push` without passing through a trusted tool handler first. The same trust model already applies to tool return values the model sees. - **`SessionToolFactory` / `ExtraTools` injection** — Both allow injecting arbitrary tools and code (including `PostRun`/`Cleanup` closures) into a run. This is by design — they're set by the trusted dispatch path (mort's code constructing `Invocation`), not from raw user input. `Invocation` is passed by value to `Run`, so post-construction tampering by the caller isn't possible. - **PostRun panic isolation** — `runPostRun` correctly uses `defer`/`recover` to isolate panics, setting the result to `nil` rather than propagating. This prevents a malicious or buggy `PostRun` from crashing the process or failing an otherwise-successful run. ✓ - **`Cleanup` is `defer`red** — Ensures resource teardown runs even on early returns or panics in the run path. ✓ - **`steerMailbox` synchronization** — `push` acquires `mu`, appends, releases; `drain` acquires `mu`, swaps to `nil`, releases. No TOCTOU, no data race on the slice after drain (the caller owns the returned slice). ✓ - **No secret/credential leakage** — `PostRunResult` carries `Artifacts` (name+MIME+bytes) and `Metadata` (arbitrary map). These are produced by trusted PostRun hooks and returned in the `Result` struct. No credentials, tokens, or secrets are in scope. No security-relevant findings after verification. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **VERDICT: No material issues found** After thorough verification: - **`runSession` ↔ `tool.AgentSession` conformance**: The `AgentSession` interface (in `tool/session_tools.go:23-30`) requires `AttachImages(text string, images ...llm.ImagePart)`. The concrete `runSession` in `session.go:64` has exactly this signature. ✓ - **Two `runSession` instances sharing one mailbox**: Lines 186–187 and 219 both create `&runSession{mailbox: mailbox}` pointing to the same `steerMailbox`. This is intentional — `inv.AttachImages` (for ordinary tools) and the factory's `AgentSession.AttachImages` must both inject into the same agent loop. ✓ - **Nil-safety on `critic.drainSteer()`**: `drainSteer()` returns `nil` when `b == nil` (critic.go:119–123). The merged steer closure `append(mailbox.drain(), critic.drainSteer()...)` appends to a nil slice when the critic is absent, which is valid Go. ✓ - **`runPostRun` panic recovery**: Uses a named return `prr` correctly; the `defer`/`recover` sets `prr = nil` on panic, which overwrites any partial assignment before the panic. ✓ - **`Cleanup` deferral**: `defer st.Cleanup()` (executor.go:221) is registered *before* the tool-adding loop, so it runs even if `toolbox.Add` fails at line 224. Early returns *before* the factory block (lines 192, 199) correctly skip Cleanup since no factory was created yet. ✓ - **`ctx` vs `runCtx` in PostRun**: `runPostRun` receives `ctx` (the parent context, line 341), not the potentially-cancelled `runCtx`. This is correct — PostRun should still be able to produce artifacts after a timeout. ✓ - **`steerMailbox.drain()` thread safety**: `drain()` locks, swaps `m.msgs` with nil, and returns the old slice. Each call yields a fresh slice. The `append(mailbox.drain(), critic.drainSteer()...)` closure in the steer callback cannot corrupt the mailbox state. ✓ No logic bugs, semantic errors, or domain-correctness issues found. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> **VERDICT: No material issues found** Reviewing through the error-handling & edge-case lens: - **`runPostRun` panic isolation** is correct: named return `prr` is assigned `nil` inside the `defer`/`recover`, so a panicking PostRun hook safely produces a nil `PostRunResult` without propagating the panic. ✓ - **`steerMailbox.drain()`** returns the internal slice then sets `m.msgs = nil`. Subsequent `push`es allocate a new backing array, so the returned slice is not aliased. ✓ - **Nil-safety on `criticBinding.drainSteer()`**: returns `nil` when `b == nil` (no critic configured). Combined with `mailbox.drain()` (returns `nil` when empty), the merged `steer` closure `func() []llm.Message { return append(mailbox.drain(), critic.drainSteer()...) }` is nil-safe. ✓ - **`AttachImages` no-op guards**: early return when all parts are empty (whitespace-only text + no images). ✓ - **`st.Cleanup` defer ordering**: `defer st.Cleanup()` is registered after the top-level `recover` defer, so it executes first (LIFO). If `Cleanup` panics, the top-level recover catches it — acceptable best-effort cleanup semantics. ✓ - **Early-return paths still run `Cleanup`**: When `toolbox.Add` fails for an extra/session tool, the function returns with `res.Err` set, but `st.Cleanup` was already deferred and will still execute. ✓ - **Nil `transcript` on partial results**: When `runRes` is `nil`, `transcript` stays the zero-value `nil`, which the PostRun hook receives as a valid empty slice. By-design best-effort. ✓ - **Two `runSession` instances share one mailbox** (line 187 vs 219): both wrap the same `mailbox` pointer, so `inv.AttachImages` and the factory's session both push to the same queue. Correct. ✓ </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 23s</sub>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

🔒 Security Review

VERDICT: No material issues found

Through the security lens I checked:

  • run/session.go runSession.AttachImages — Builds a []llm.Part from caller-supplied text and images. Both go through typed constructors (llm.Text, ImagePart) into llm.UserParts(...). No string interpolation into SQL/shell/path/templates; no deserialization; no SSRF surface (no URL fetch). The parts slice is purely model-facing. No injection vector.
  • steerMailbox — Thread-safe queue protected by sync.Mutex. Only push/drain paths, no external exposure. No untrusted-data parsing.
  • runPostRun — Wraps the host-supplied hook in recover(), logs the panic, returns nil. Panic isolation is correct (named return prr is reset to nil in the recover). No swallowed-error leak: the panic is logged via slog.Error with the panic value.
  • executor.go mailbox + inv.AttachImages assignment — Occurs BEFORE toolbox build so registered tool handlers close over the live seam. The closure is set on a local inv (toolInv is what tools actually receive via the registry), but Registry.Build copies inv to a per-tool toolInv and the executor then mutates inv.AttachImages afterwards — verified at tool/registry.go:693 (toolInv := inv). Wait: Build runs at run/executor.go:190, AFTER the inv.AttachImages assignment at line 187, so the copy picks up the populated AttachImages. Confirmed correct ordering — no stale-nil seam handed to tool handlers. (Verified by reading tool/registry.go:678-707 and run/executor.go:183-195.)
  • Tool injection via ExtraTools / SessionToolFactory.Tools — These come from the invoker (already trusted: the same caller that built tool.Invocation). No external/user input flows into the toolbox unchecked; authn/authz gating happens via existing CheckGate(inv) per-tool. No privilege escalation vector beyond what the invoker already has.
  • Cleanup deferred — Runs at executor return, after finishAudit + budget commit; isolated from hook panic via runPostRun's recover. No resource-leak or post-panic-corruption issue observed.
  • toolbox.Add for session tools — Returns error on collision; duplicates from ExtraTools + factory + registry are surfaced as run errors, not silently overwriting. (Verified run/executor.go:211, 224; run/palette.go:45, 67 use the same pattern.)

No authn/authz gaps, no injection, no SSRF, no unsafe deserialization, no secret leakage, and no insecure defaults introduced by this change.

Outside my lens: none worth raising.

🎯 Correctness — Minor issues

VERDICT: Minor issues

  • run/executor.go:221defer st.Cleanup() is registered BEFORE the run-context defers at run/executor.go:245-255 (defer cancelTimeout(), defer cancelCause(nil), defer mergeCancel(), defer stopCritic()), so by Go's LIFO defer order st.Cleanup fires LAST — after stopCritic() has called h.Stop() on the critic and after runCtx has been cancelled. The doc at tool/session_tools.go:63-67 only promises "deferred by the executor immediately after the factory returns"; it doesn't tell implementers their Cleanup runs on a fully-torn-down run context. A Cleanup that reads from the critic handle, calls h.Record*, or touches any resource the run context supervised will observe a cancelled/teardown world. Concrete fix: move defer st.Cleanup() to AFTER the runCtx/critic defers (e.g., right before agent.New) so it runs while the run context is still observable; alternatively document the teardown-after order so factory authors don't reach for live state.
🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

  • run/session.go + run/executor.go:341 — PostRun runs on the caller's ctx, not a detached one. Verified. The executor's own comment says artifacts are "best-effort" and the PostRun is invoked with the run's caller ctx (runPostRun(ctx, ...) at line 341). Compare to finishAudit (line 344) which routes through e.finishAudit and that helper explicitly does detach(ctx) so a cancelled run still records (per the comment at run/executor.go:377-378). If the caller cancels between ag.Run returning and runPostRun being called — e.g. Discord request deadline, gateway timeout — PostRun's ctx is already Canceled and the hook will fail or short-circuit before producing artifacts. The panic isolation in runPostRun does not help with a cancelled ctx, only with a panic. Fix: call runPostRun(detach(ctx), …) (or accept a detached context at the top of Run) for symmetry with the audit path.

  • run/executor.go:336-342 — no timeout on PostRun. Verified by reading runPostRun in run/session.go:18-28: it has panic isolation but no deadline. A hung artifact hook (e.g. a renderer that wedges) blocks Executor.Run indefinitely after the agent loop has finished. The "best-effort + panic-isolated" framing in the comment understates the liveness risk — panic isolation only catches panics, not infinite loops. Consider wrapping with a short context.WithTimeout (e.g. 10s) using detach(ctx) as parent so a stuck hook can't pin the executor.

  • run/executor.go:186-187inv.AttachImages is stamped on the local inv only. Verified. The comment promises the seam is captured "live" by tool handlers built via e.cfg.Registry.Build(... inv ...). That part is fine because Build takes inv by value but tool handlers close over it. However, if Build (or any later step) copies/aliases inv into a sub-context that outlives this stack frame (e.g. into a delegation sub-run), the local mutation may not propagate. Worth confirming with the registry author; flagging as a potential edge case rather than a confirmed bug.

No swallowed errors found — every error path in the new code (toolbox.Add, SessionToolFactory invocation path) either returns with res.Err set + finishAudit, or panic-isolates via runPostRun. steerMailbox.drain correctly handles the empty case (returns nil, not a zero-length slice). runSession.AttachImages correctly no-ops on empty text + images. The mailbox mutex usage is correct (push and drain both lock). The defer order for st.Cleanup is correct (deferred before the tool Add loop, so even a partial-Add failure still releases resources).

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 54s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## 🔒 Security Review **VERDICT: No material issues found** Through the security lens I checked: - **`run/session.go` `runSession.AttachImages`** — Builds a `[]llm.Part` from caller-supplied `text` and `images`. Both go through typed constructors (`llm.Text`, `ImagePart`) into `llm.UserParts(...)`. No string interpolation into SQL/shell/path/templates; no deserialization; no SSRF surface (no URL fetch). The `parts` slice is purely model-facing. No injection vector. - **`steerMailbox`** — Thread-safe queue protected by `sync.Mutex`. Only `push`/`drain` paths, no external exposure. No untrusted-data parsing. - **`runPostRun`** — Wraps the host-supplied `hook` in `recover()`, logs the panic, returns `nil`. Panic isolation is correct (named return `prr` is reset to nil in the recover). No swallowed-error leak: the panic is logged via `slog.Error` with the panic value. - **`executor.go` mailbox + `inv.AttachImages` assignment** — Occurs BEFORE toolbox build so registered tool handlers close over the live seam. The closure is set on a local `inv` (`toolInv` is what tools actually receive via the registry), but `Registry.Build` copies `inv` to a per-tool `toolInv` and the executor then mutates `inv.AttachImages` afterwards — verified at `tool/registry.go:693` (`toolInv := inv`). **Wait:** `Build` runs at `run/executor.go:190`, AFTER the `inv.AttachImages` assignment at line 187, so the copy picks up the populated `AttachImages`. Confirmed correct ordering — no stale-nil seam handed to tool handlers. (Verified by reading `tool/registry.go:678-707` and `run/executor.go:183-195`.) - **Tool injection via `ExtraTools` / `SessionToolFactory.Tools`** — These come from the invoker (already trusted: the same caller that built `tool.Invocation`). No external/user input flows into the toolbox unchecked; authn/authz gating happens via existing `CheckGate(inv)` per-tool. No privilege escalation vector beyond what the invoker already has. - **`Cleanup` deferred** — Runs at executor return, after `finishAudit` + budget commit; isolated from hook panic via `runPostRun`'s recover. No resource-leak or post-panic-corruption issue observed. - **`toolbox.Add` for session tools** — Returns error on collision; duplicates from `ExtraTools` + factory + registry are surfaced as run errors, not silently overwriting. (Verified `run/executor.go:211, 224`; `run/palette.go:45, 67` use the same pattern.) No authn/authz gaps, no injection, no SSRF, no unsafe deserialization, no secret leakage, and no insecure defaults introduced by this change. **Outside my lens:** none worth raising. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> ## VERDICT: Minor issues - **`run/executor.go:221` — `defer st.Cleanup()` is registered BEFORE the run-context defers at `run/executor.go:245-255` (`defer cancelTimeout()`, `defer cancelCause(nil)`, `defer mergeCancel()`, `defer stopCritic()`), so by Go's LIFO defer order `st.Cleanup` fires LAST — after `stopCritic()` has called `h.Stop()` on the critic and after `runCtx` has been cancelled.** The doc at `tool/session_tools.go:63-67` only promises "deferred by the executor immediately after the factory returns"; it doesn't tell implementers their `Cleanup` runs on a fully-torn-down run context. A `Cleanup` that reads from the critic handle, calls `h.Record*`, or touches any resource the run context supervised will observe a cancelled/teardown world. Concrete fix: move `defer st.Cleanup()` to AFTER the `runCtx`/`critic` defers (e.g., right before `agent.New`) so it runs while the run context is still observable; alternatively document the teardown-after order so factory authors don't reach for live state. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> VERDICT: **Minor issues** - **`run/session.go` + `run/executor.go:341` — PostRun runs on the caller's `ctx`, not a detached one.** Verified. The executor's own comment says artifacts are "best-effort" and the PostRun is invoked with the run's caller `ctx` (`runPostRun(ctx, ...)` at line 341). Compare to `finishAudit` (line 344) which routes through `e.finishAudit` and that helper explicitly does `detach(ctx)` so a cancelled run still records (per the comment at `run/executor.go:377-378`). If the caller cancels between `ag.Run` returning and `runPostRun` being called — e.g. Discord request deadline, gateway timeout — PostRun's `ctx` is already `Canceled` and the hook will fail or short-circuit before producing artifacts. The panic isolation in `runPostRun` does not help with a cancelled ctx, only with a panic. Fix: call `runPostRun(detach(ctx), …)` (or accept a detached context at the top of `Run`) for symmetry with the audit path. - **`run/executor.go:336-342` — no timeout on PostRun.** Verified by reading `runPostRun` in `run/session.go:18-28`: it has panic isolation but no deadline. A hung artifact hook (e.g. a renderer that wedges) blocks `Executor.Run` indefinitely after the agent loop has finished. The "best-effort + panic-isolated" framing in the comment understates the liveness risk — panic isolation only catches panics, not infinite loops. Consider wrapping with a short `context.WithTimeout` (e.g. 10s) using `detach(ctx)` as parent so a stuck hook can't pin the executor. - **`run/executor.go:186-187` — `inv.AttachImages` is stamped on the local `inv` only.** Verified. The comment promises the seam is captured "live" by tool handlers built via `e.cfg.Registry.Build(... inv ...)`. That part is fine because `Build` takes `inv` by value but tool handlers close over it. However, if `Build` (or any later step) copies/aliases `inv` into a sub-context that outlives this stack frame (e.g. into a delegation sub-run), the local mutation may not propagate. Worth confirming with the registry author; flagging as a potential edge case rather than a confirmed bug. No swallowed errors found — every error path in the new code (`toolbox.Add`, `SessionToolFactory` invocation path) either returns with `res.Err` set + `finishAudit`, or panic-isolates via `runPostRun`. `steerMailbox.drain` correctly handles the empty case (returns nil, not a zero-length slice). `runSession.AttachImages` correctly no-ops on empty `text + images`. The mailbox mutex usage is correct (push and drain both lock). The defer order for `st.Cleanup` is correct (deferred before the tool Add loop, so even a partial-Add failure still releases resources). </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 54s</sub>

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

  • Verified run/executor.go:186-187 sets inv.AttachImages before the toolbox build, and tool/registry.go:678-707 copies inv by value into each tool (toolInv := inv), so a tool handler cannot reassign the executor's seam — no tampering surface.
  • inv.ExtraTools and inv.SessionToolFactory are host-constructed fields on tool.Invocation (confirmed at tool/registry.go:427-453); they are not deserialized from untrusted/user input in this repo (grep found only test/executor call-sites). The added toolbox.Add(t) loop introduces no new injection or authz bypass — it reuses the same trust model as registry-built tools. No new untrusted-input surface.
  • runPostRun (run/session.go:18-28) is panic-isolated and passes the caller ctx (not the cancelled runCtx); a PostRun panic is recovered and logged, never fails the run. No unsafe deserialization or execution of hook-supplied data.
  • steerMailbox (run/session.go:35-56) is unbounded, so a runaway tool could queue many messages — but tools are trusted, host-provided code, not an untrusted-input vector; not a material security finding.

No authn/authz gaps, injection, SSRF, deserialization, or secret-leakage issues introduced by this change.

🎯 Correctness — No material issues found

VERDICT: No material issues found

  • Verified run/executor.go:322 steer := func() []llm.Message { return append(mailbox.drain(), critic.drainSteer()...) } — matches the shape agent.WithSteer previously consumed (b.h.Steer is func() []llm.Message, per run/ports.go:140 and critic/critic.go:188). Both drain() and critic.drainSteer() return fresh slices each call (mailbox resets to nil; Steer() drains), so there is no cross-step aliasing or double-injection. Merge order (session images before critic nudges) is benign.
  • Verified run/session.go:19 runPostRun panic isolation: named return prr is correctly reset to nil in the defer recover(), so a panicking hook cannot leak a partial/zero-value *PostRunResult.
  • Verified the SessionToolFactory wiring (run/executor.go:217-231): defer st.Cleanup() is registered before the tool-adding loop, so the early-return error path inside the loop still runs Cleanup. The factory's &runSession{mailbox: mailbox} shares the same mailbox pointer as inv.AttachImages, so the two distinct runSession instances are correctly coupled.
  • Verified PostRun transcript plumbing (run/executor.go:336-342): runRes.Messages is read only when runRes != nil, and res.Output is only populated when runRes != nil (run/executor.go:326-329), so on a failed run with no result the hook gets transcript=nil, output="" — consistent with the documented "populated even on partial results" contract (transcript is nil only when there's no result at all).
  • No magic constants, conversions, or thresholds introduced in this diff; the mailbox/drain logic is pure append-and-swap.

(I could not resolve agent.WithSteer's exact signature in-repo — it lives in the external majordomo/agent module — but the call-site type matches the prior critic.steerOptions() usage that compiled, so this is not flagged.)

🧯 Error handling & edge cases — Minor issues

All three findings are verified against the actual code. Let me confirm the details of each:

  1. PostRun ctx (finding 1): Confirmed — executor.go:341 passes ctx (the caller's context), not runCtx. The runCtx is built with MaxRuntime timeout at line 244-248. The doc at tool/session_tools.go:46-56 says PostRun should produce artifacts on partial-success (timeout) runs. If runCtx were passed, it would be Done() on a timed-out run, preventing artifact production. So passing ctx is deliberate and defensible. Caller cancellation can drop artifacts — a narrow race.

  2. Panic skips PostRun (finding 2): Confirmed — top-level recover at executor.go:119-123 catches panics into res.Err. PostRun is called inline at line 336-342, not deferred. A panic inside ag.Run (line 323) would unwind through the deferred recover, set res.Err, but skip the postRun call at 336. Cleanup is deferred (line 221) so it still runs. The doc says PostRun is "called even if the run fails" — a panic is a failure mode. Minor, as agent panics aren't expected.

  3. drain returns internal slice (finding 3): Confirmed — session.go:53-55 returns m.msgs directly, then sets m.msgs = nil. The backing array is orphaned so current usage at executor.go:322 (append(mailbox.drain(), critic.drainSteer()...)) is safe. But the aliasing is non-obvious; a future caller mutating the returned slice would mutate former mailbox state.

All three are correctly characterized as minor/defensible, not blocking.

Verdict: Minor issues found

  • run/executor.go:341PostRun receives the caller's ctx, not runCtx. This is a defensible choice, not a bug: runCtx carries the run's own MaxRuntime deadline (executor.go:244-248), so on a timed-out (partial-success) run runCtx would already be Done() and a context-respecting hook could not produce artifacts — the exact partial-success case the SessionTools.PostRun doc calls out (tool/session_tools.go:46-56). Using the original ctx preserves artifact production across run timeouts. The tradeoff is that a caller cancellation (client disconnect) will cancel PostRun and silently drop artifacts even on an otherwise-completed run — a narrow race (caller must cancel in the window between ag.Run returning and runPostRun finishing), arguably acceptable under "best-effort," but worth a one-line comment noting the deliberate ctx choice so a future reader doesn't "fix" it to runCtx and break the timeout/partial-success path.

  • run/executor.go:323-342 — a panic inside ag.Run skips PostRun entirely. The top-level recover at executor.go:119-123 converts any panic (including one from the agent run) into res.Err, but because PostRun is a synchronous call after ag.Run (not deferred), a panicking agent run means postRun is never invoked, so no artifacts are produced for that run. Cleanup is still honored (deferred at executor.go:221). The SessionTools doc says PostRun is called "even if the run fails," and a panic is a failure mode. In practice majordomo's agent.Run isn't expected to panic (errors surface via runErr), so impact is minimal — but if strict "PostRun always runs" coverage matters, the hook would need to be invoked from a deferred closure rather than inline. Flagging as minor; no change required if panics are considered out-of-contract for the agent loop.

  • run/session.go:47-56drain returns the internal slice (not a copy). This is safe today: the mailbox sets m.msgs = nil before returning, so the backing array is orphaned and the caller's append(mailbox.drain(), critic.drainSteer()...) at executor.go:322 cannot corrupt the mailbox. Not a bug, but the aliasing is non-obvious; a future caller that holds the returned slice and mutates it would be mutating state that was the mailbox's. A one-line comment or a copy would harden it — low priority.

No issues found with: AttachImages empty-text/no-image early return (session.go:64-74), runPostRun named-return panic recovery (session.go:18-28), nil-guard on Cleanup (executor.go:220-222), or the nil-safe drainSteer/critic == nil path (critic.go:119-123, executor.go:322).

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 34s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **VERDICT: No material issues found** - Verified `run/executor.go:186-187` sets `inv.AttachImages` before the toolbox build, and `tool/registry.go:678-707` copies `inv` by value into each tool (`toolInv := inv`), so a tool handler cannot reassign the executor's seam — no tampering surface. - `inv.ExtraTools` and `inv.SessionToolFactory` are host-constructed fields on `tool.Invocation` (confirmed at `tool/registry.go:427-453`); they are not deserialized from untrusted/user input in this repo (grep found only test/executor call-sites). The added `toolbox.Add(t)` loop introduces no new injection or authz bypass — it reuses the same trust model as registry-built tools. No new untrusted-input surface. - `runPostRun` (`run/session.go:18-28`) is panic-isolated and passes the caller `ctx` (not the cancelled `runCtx`); a PostRun panic is recovered and logged, never fails the run. No unsafe deserialization or execution of hook-supplied data. - `steerMailbox` (`run/session.go:35-56`) is unbounded, so a runaway tool could queue many messages — but tools are trusted, host-provided code, not an untrusted-input vector; not a material security finding. No authn/authz gaps, injection, SSRF, deserialization, or secret-leakage issues introduced by this change. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **VERDICT: No material issues found** - Verified `run/executor.go:322` `steer := func() []llm.Message { return append(mailbox.drain(), critic.drainSteer()...) }` — matches the shape `agent.WithSteer` previously consumed (`b.h.Steer` is `func() []llm.Message`, per `run/ports.go:140` and `critic/critic.go:188`). Both `drain()` and `critic.drainSteer()` return fresh slices each call (mailbox resets to nil; `Steer()` drains), so there is no cross-step aliasing or double-injection. Merge order (session images before critic nudges) is benign. - Verified `run/session.go:19` `runPostRun` panic isolation: named return `prr` is correctly reset to `nil` in the `defer recover()`, so a panicking hook cannot leak a partial/zero-value `*PostRunResult`. - Verified the `SessionToolFactory` wiring (`run/executor.go:217-231`): `defer st.Cleanup()` is registered before the tool-adding loop, so the early-return error path inside the loop still runs Cleanup. The factory's `&runSession{mailbox: mailbox}` shares the same mailbox pointer as `inv.AttachImages`, so the two distinct `runSession` instances are correctly coupled. - Verified `PostRun` transcript plumbing (`run/executor.go:336-342`): `runRes.Messages` is read only when `runRes != nil`, and `res.Output` is only populated when `runRes != nil` (`run/executor.go:326-329`), so on a failed run with no result the hook gets `transcript=nil, output=""` — consistent with the documented "populated even on partial results" contract (transcript is nil only when there's no result at all). - No magic constants, conversions, or thresholds introduced in this diff; the mailbox/drain logic is pure append-and-swap. (I could not resolve `agent.WithSteer`'s exact signature in-repo — it lives in the external `majordomo/agent` module — but the call-site type matches the prior `critic.steerOptions()` usage that compiled, so this is not flagged.) </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> All three findings are verified against the actual code. Let me confirm the details of each: 1. **PostRun ctx (finding 1):** Confirmed — `executor.go:341` passes `ctx` (the caller's context), not `runCtx`. The `runCtx` is built with `MaxRuntime` timeout at line 244-248. The doc at `tool/session_tools.go:46-56` says PostRun should produce artifacts on partial-success (timeout) runs. If `runCtx` were passed, it would be `Done()` on a timed-out run, preventing artifact production. So passing `ctx` is deliberate and defensible. Caller cancellation can drop artifacts — a narrow race. 2. **Panic skips PostRun (finding 2):** Confirmed — top-level recover at `executor.go:119-123` catches panics into `res.Err`. `PostRun` is called inline at line 336-342, not deferred. A panic inside `ag.Run` (line 323) would unwind through the deferred recover, set `res.Err`, but skip the `postRun` call at 336. `Cleanup` is deferred (line 221) so it still runs. The doc says PostRun is "called even if the run fails" — a panic is a failure mode. Minor, as agent panics aren't expected. 3. **drain returns internal slice (finding 3):** Confirmed — `session.go:53-55` returns `m.msgs` directly, then sets `m.msgs = nil`. The backing array is orphaned so current usage at `executor.go:322` (`append(mailbox.drain(), critic.drainSteer()...)`) is safe. But the aliasing is non-obvious; a future caller mutating the returned slice would mutate former mailbox state. All three are correctly characterized as minor/defensible, not blocking. Verdict: Minor issues found - **`run/executor.go:341` — `PostRun` receives the caller's `ctx`, not `runCtx`.** This is a defensible choice, not a bug: `runCtx` carries the run's own `MaxRuntime` deadline (`executor.go:244-248`), so on a timed-out (partial-success) run `runCtx` would already be `Done()` and a context-respecting hook could not produce artifacts — the exact partial-success case the `SessionTools.PostRun` doc calls out (`tool/session_tools.go:46-56`). Using the original `ctx` preserves artifact production across run timeouts. The tradeoff is that a *caller* cancellation (client disconnect) will cancel `PostRun` and silently drop artifacts even on an otherwise-completed run — a narrow race (caller must cancel in the window between `ag.Run` returning and `runPostRun` finishing), arguably acceptable under "best-effort," but worth a one-line comment noting the deliberate ctx choice so a future reader doesn't "fix" it to `runCtx` and break the timeout/partial-success path. - **`run/executor.go:323-342` — a panic inside `ag.Run` skips `PostRun` entirely.** The top-level recover at `executor.go:119-123` converts any panic (including one from the agent run) into `res.Err`, but because `PostRun` is a synchronous call *after* `ag.Run` (not deferred), a panicking agent run means `postRun` is never invoked, so no artifacts are produced for that run. `Cleanup` is still honored (deferred at `executor.go:221`). The `SessionTools` doc says `PostRun` is called "even if the run fails," and a panic is a failure mode. In practice majordomo's `agent.Run` isn't expected to panic (errors surface via `runErr`), so impact is minimal — but if strict "PostRun always runs" coverage matters, the hook would need to be invoked from a deferred closure rather than inline. Flagging as minor; no change required if panics are considered out-of-contract for the agent loop. - **`run/session.go:47-56` — `drain` returns the internal slice (not a copy).** This is safe today: the mailbox sets `m.msgs = nil` before returning, so the backing array is orphaned and the caller's `append(mailbox.drain(), critic.drainSteer()...)` at `executor.go:322` cannot corrupt the mailbox. Not a bug, but the aliasing is non-obvious; a future caller that holds the returned slice and mutates it would be mutating state that *was* the mailbox's. A one-line comment or a copy would harden it — low priority. No issues found with: `AttachImages` empty-text/no-image early return (`session.go:64-74`), `runPostRun` named-return panic recovery (`session.go:18-28`), nil-guard on `Cleanup` (`executor.go:220-222`), or the nil-safe `drainSteer`/`critic == nil` path (`critic.go:119-123`, `executor.go:322`). </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 34s</sub>

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

The PR introduces a steer-mailbox mechanism (steerMailbox / runSession.AttachImages) and a PostRun artifact hook — both are framework-level plumbing that pass data between trusted components (the executor, host-provided tools, and host-provided factory hooks). From a security perspective:

  • AttachImages text injection: Tools can inject arbitrary user-role messages into the model's conversation context. This is a prompt-injection surface, but it's the tool's design intent (showing rendered previews to the model), not a framework bug. The executor correctly provides the mechanism; the trust boundary is at the tool author, who controls what text/images they feed in. No executor-level validation is missing that would change this.

  • PostRun artifact data: PostRunResult.Artifacts[].Data is raw []byte produced by the host's hook. The executor neither interprets, stores, nor serves these bytes — it just attaches them to Result. No injection or exposure risk at this layer.

  • steerMailbox concurrency: The mutex-guarded push/drain is correctly implemented — no TOCTOU or data-race risk. Messages are consumed (cleared on drain), not duplicated.

  • runPostRun panic isolation: The defer/recover correctly prevents a hook panic from failing the run. The slog.Error call logs the panic value; if a hook panics with sensitive data, that data hits the log — but the hook is trusted host code, and Go panics rarely carry secrets.

  • No untrusted input flows into executor-controlled paths: ExtraTools, SessionToolFactory, and PostRun are all supplied by the host (trusted). The executor does not deserialize, parse, or execute anything from an external/untrusted source.

No injection, authn/authz, SSRF, deserialization, credential-leakage, or insecure-default issues were found in the changed code.

🎯 Correctness — Minor issues

Verdict: Minor issues

  • defer st.Cleanup() is not panic-isolated, unlike runPostRun (run/executor.go:220–222). The PR explicitly wraps PostRun in runPostRun with recover so a panic there never fails the run. But st.Cleanup() is deferred raw — if it panics, the top-level recover at line 119 catches it and sets res.Err, turning an otherwise-successful run into an error. If the intent is that Cleanup should also be best-effort (consistent with PostRun), it should be wrapped in its own recover. If the intent is that a Cleanup panic should fail the run, that's a defensible choice but should be documented — right now the asymmetry is silent and surprising. (Confirmed: executor.go:119–123 top-level recover catches all panics in Run; executor.go:220–222 defers st.Cleanup() with no local recover; session.go:21–26 shows runPostRun has its own recover.)

  • runPostRun receives the raw caller ctx rather than a detached context (run/executor.go:341 vs 344,348). Both finishAudit and deliver internally call detach(ctx) to insulate their I/O from caller cancellation, but runPostRun passes ctx straight through to the hook. If the caller's context is cancelled (deadline, client disconnect, etc.), the PostRun hook's I/O may fail silently and artifacts won't be produced. This is inconsistent with every other post-run operation in the same function. (Confirmed: executor.go:341 passes raw ctx; executor.go:393 uses detach(ctx) inside finishAudit; executor.go:425 uses detach(ctx) inside deliver; session.go:18–28 shows runPostRun does not detach.)

🧯 Error handling & edge cases — Minor issues

Both findings are factually confirmed against the actual code. Let me verify the exact line numbers match:

  • Finding 1: Line 341 indeed calls runPostRun(ctx, ...) with raw ctx, while finishAudit (line 393: rec.Close(detach(ctx), stats)) and deliver (line 425: dctx := detach(ctx)) both detach internally. The runPostRun function at session.go:18-28 passes ctx straight through to the hook with no detachment. Confirmed.

  • Finding 2: Lines 218-221 show st := inv.SessionToolFactory(...) followed by defer st.Cleanup(). If the factory panics, st is never assigned, the defer never registers, and only the top-level recover at line 119-123 catches it — any resources the factory allocated before panicking leak. Confirmed.

Both findings are accurate and properly scoped as minor. The verdict stands.


Verdict: Minor issues

  • run/executor.go:341 — PostRun receives the caller's (potentially cancelled) ctx, not a detached context. Every other post-run operation in this function — finishAudit (line 344 → internally calls detach(ctx) at line 393) and deliver (line 348 → internally calls detach(ctx) at line 425) — detaches from the caller's cancellation so a timed-out HTTP request or cancelled gRPC stream doesn't prevent audit writes or delivery. runPostRun passes the raw ctx directly to the hook. If the caller cancelled, the hook receives a cancelled context and any I/O it attempts (file writes, uploads) will fail with context.Canceled. The hook can work around this with context.WithoutCancel, but the asymmetry with every other post-run path is a trap. Consider runPostRun(ctx, ...)runPostRun(detach(ctx), ...) or document that the hook must self-detach. (Verified by reading finishAudit at executor.go:379-394 and deliver at executor.go:420-431 — both use detach(ctx); runPostRun at session.go:18-28 does not.)

  • run/executor.go:219-221 — If inv.SessionToolFactory itself panics, st.Cleanup never runs. The defer st.Cleanup() is registered after the factory call returns. If the factory allocates resources (temp dirs, file handles) and then panics before returning, those resources leak — there is no st to call Cleanup on. The executor's top-level recover (line 119) converts the panic to res.Err, but the leak stands. This is inherent to the shape (you can't defer Cleanup before you have st), but it's worth noting that a factory must not allocate before it's ready to return successfully, or it must do its own internal defer/recover. (Verified by reading executor.go:119-123 and 218-231.)

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 53s

<!-- gadfly-review:ollama:deepseek-v4-pro:cloud --> ### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## Verdict: No material issues found The PR introduces a steer-mailbox mechanism (`steerMailbox` / `runSession.AttachImages`) and a `PostRun` artifact hook — both are framework-level plumbing that pass data between trusted components (the executor, host-provided tools, and host-provided factory hooks). From a security perspective: - **`AttachImages` text injection**: Tools can inject arbitrary user-role messages into the model's conversation context. This *is* a prompt-injection surface, but it's the tool's design intent (showing rendered previews to the model), not a framework bug. The executor correctly provides the mechanism; the trust boundary is at the tool author, who controls what text/images they feed in. No executor-level validation is missing that would change this. - **`PostRun` artifact data**: `PostRunResult.Artifacts[].Data` is raw `[]byte` produced by the host's hook. The executor neither interprets, stores, nor serves these bytes — it just attaches them to `Result`. No injection or exposure risk at this layer. - **`steerMailbox` concurrency**: The mutex-guarded `push`/`drain` is correctly implemented — no TOCTOU or data-race risk. Messages are consumed (cleared on drain), not duplicated. - **`runPostRun` panic isolation**: The `defer/recover` correctly prevents a hook panic from failing the run. The `slog.Error` call logs the panic value; if a hook panics with sensitive data, that data hits the log — but the hook is trusted host code, and Go panics rarely carry secrets. - **No untrusted input flows into executor-controlled paths**: `ExtraTools`, `SessionToolFactory`, and `PostRun` are all supplied by the host (trusted). The executor does not deserialize, parse, or execute anything from an external/untrusted source. No injection, authn/authz, SSRF, deserialization, credential-leakage, or insecure-default issues were found in the changed code. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> ## Verdict: Minor issues - **`defer st.Cleanup()` is not panic-isolated, unlike `runPostRun`** (`run/executor.go:220–222`). The PR explicitly wraps `PostRun` in `runPostRun` with `recover` so a panic there never fails the run. But `st.Cleanup()` is deferred raw — if it panics, the top-level `recover` at line 119 catches it and sets `res.Err`, turning an otherwise-successful run into an error. If the intent is that Cleanup should also be best-effort (consistent with PostRun), it should be wrapped in its own recover. If the intent is that a Cleanup panic *should* fail the run, that's a defensible choice but should be documented — right now the asymmetry is silent and surprising. *(Confirmed: `executor.go:119–123` top-level recover catches all panics in `Run`; `executor.go:220–222` defers `st.Cleanup()` with no local recover; `session.go:21–26` shows `runPostRun` has its own recover.)* - **`runPostRun` receives the raw caller `ctx` rather than a detached context** (`run/executor.go:341` vs `344,348`). Both `finishAudit` and `deliver` internally call `detach(ctx)` to insulate their I/O from caller cancellation, but `runPostRun` passes `ctx` straight through to the hook. If the caller's context is cancelled (deadline, client disconnect, etc.), the PostRun hook's I/O may fail silently and artifacts won't be produced. This is inconsistent with every other post-run operation in the same function. *(Confirmed: `executor.go:341` passes raw `ctx`; `executor.go:393` uses `detach(ctx)` inside `finishAudit`; `executor.go:425` uses `detach(ctx)` inside `deliver`; `session.go:18–28` shows `runPostRun` does not detach.)* </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> Both findings are factually confirmed against the actual code. Let me verify the exact line numbers match: - **Finding 1**: Line 341 indeed calls `runPostRun(ctx, ...)` with raw `ctx`, while `finishAudit` (line 393: `rec.Close(detach(ctx), stats)`) and `deliver` (line 425: `dctx := detach(ctx)`) both detach internally. The `runPostRun` function at `session.go:18-28` passes `ctx` straight through to the hook with no detachment. Confirmed. - **Finding 2**: Lines 218-221 show `st := inv.SessionToolFactory(...)` followed by `defer st.Cleanup()`. If the factory panics, `st` is never assigned, the defer never registers, and only the top-level recover at line 119-123 catches it — any resources the factory allocated before panicking leak. Confirmed. Both findings are accurate and properly scoped as minor. The verdict stands. --- ## Verdict: Minor issues - **`run/executor.go:341` — PostRun receives the caller's (potentially cancelled) `ctx`, not a detached context.** Every other post-run operation in this function — `finishAudit` (line 344 → internally calls `detach(ctx)` at line 393) and `deliver` (line 348 → internally calls `detach(ctx)` at line 425) — detaches from the caller's cancellation so a timed-out HTTP request or cancelled gRPC stream doesn't prevent audit writes or delivery. `runPostRun` passes the raw `ctx` directly to the hook. If the caller cancelled, the hook receives a cancelled context and any I/O it attempts (file writes, uploads) will fail with `context.Canceled`. The hook *can* work around this with `context.WithoutCancel`, but the asymmetry with every other post-run path is a trap. Consider `runPostRun(ctx, ...)` → `runPostRun(detach(ctx), ...)` or document that the hook must self-detach. *(Verified by reading `finishAudit` at executor.go:379-394 and `deliver` at executor.go:420-431 — both use `detach(ctx)`; `runPostRun` at session.go:18-28 does not.)* - **`run/executor.go:219-221` — If `inv.SessionToolFactory` itself panics, `st.Cleanup` never runs.** The `defer st.Cleanup()` is registered *after* the factory call returns. If the factory allocates resources (temp dirs, file handles) and then panics before returning, those resources leak — there is no `st` to call `Cleanup` on. The executor's top-level recover (line 119) converts the panic to `res.Err`, but the leak stands. This is inherent to the shape (you can't defer Cleanup before you have `st`), but it's worth noting that a factory must not allocate before it's ready to return successfully, or it must do its own internal defer/recover. *(Verified by reading executor.go:119-123 and 218-231.)* </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 53s</sub>

🪰 Gadfly review — nemotron-3-super:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

The changes primarily wire session tooling and attach images functionality. From a security perspective:

  • No new authentication/authorization gaps introduced
  • No injection vectors (SQL/command/path/template) created - the AttachImages method safely constructs LLM messages without interpreting user input as code
  • No SSRF risks - no external network calls added
  • No unsafe deserialization - only standard Go types used
  • No secret/credential leakage - no handling of sensitive data
  • Input validation remains unchanged - AttachImages trims whitespace but doesn't validate or sanitize beyond normal LLM message construction
  • No insecure defaults - thread-safe mailbox with proper mutex usage

The SessionToolFactory hook execution is panic-isolated and best-effort, but factory trust boundary remains unchanged from existing tool invocation patterns. All new code follows existing security patterns in the codebase.

🎯 Correctness — No material issues found

No material issues found

🧯 Error handling & edge cases — No material issues found

No material issues found

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 20s

<!-- gadfly-review:ollama:nemotron-3-super:cloud --> ### 🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> No material issues found The changes primarily wire session tooling and attach images functionality. From a security perspective: - No new authentication/authorization gaps introduced - No injection vectors (SQL/command/path/template) created - the AttachImages method safely constructs LLM messages without interpreting user input as code - No SSRF risks - no external network calls added - No unsafe deserialization - only standard Go types used - No secret/credential leakage - no handling of sensitive data - Input validation remains unchanged - AttachImages trims whitespace but doesn't validate or sanitize beyond normal LLM message construction - No insecure defaults - thread-safe mailbox with proper mutex usage The SessionToolFactory hook execution is panic-isolated and best-effort, but factory trust boundary remains unchanged from existing tool invocation patterns. All new code follows existing security patterns in the codebase. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> No material issues found </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> No material issues found </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 20s</sub>

🪰 Gadfly review — qwen3-coder:480b-cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

After reviewing the changes through a security lens, I found no material security issues. The diff introduces new functionality for session tools, artifact production, and image attachment but does so without introducing apparent security vulnerabilities.

Key security-relevant aspects I examined:

  • No evidence of injection vulnerabilities (SQL, command, path, template)
  • No SSRF opportunities introduced
  • No unsafe deserialization
  • No secret/credential leakage
  • Input validation appears consistent with existing patterns
  • No obvious authn/authz gaps in the new interfaces

The changes appear to be well-contained and follow secure coding practices. The deferment of cleanup functions and panic isolation for PostRun hooks are implemented safely. The new steer mailbox mechanism for injecting messages into the agent loop is thread-safe.

🎯 Correctness — No material issues found

VERDICT: No material issues found

The changes correctly implement the wiring of SessionToolFactory, PostRun artifacts, and AttachImages functionality. The logic for merging steer messages from both the session mailbox and critic nudges is sound. The PostRun hook execution is properly isolated and best-effort as intended. All findings from the diff have been verified through code inspection and no correctness issues were identified.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

The changes introduce new functionality for session tools, including AttachImages, PostRun hooks, and a steerMailbox for managing messages. Error handling and edge cases have been carefully considered:

  1. The runPostRun function properly isolates panics from the PostRun hook, ensuring that artifact generation failures don't affect the main run result.
  2. The steerMailbox implementation is thread-safe with proper locking around push and drain operations.
  3. The AttachImages method correctly handles empty inputs by returning early.
  4. Cleanup functions from session tool factories are properly deferred.
  5. The merging of steer messages from both the session mailbox and critic is handled correctly.

All new code paths appear to handle errors appropriately, and the tests cover the key scenarios including panic isolation and proper cleanup. No blocking issues or significant edge case oversights were identified in this review.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 24s

<!-- gadfly-review:ollama:qwen3-coder:480b-cloud --> ### 🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> VERDICT: No material issues found After reviewing the changes through a security lens, I found no material security issues. The diff introduces new functionality for session tools, artifact production, and image attachment but does so without introducing apparent security vulnerabilities. Key security-relevant aspects I examined: - No evidence of injection vulnerabilities (SQL, command, path, template) - No SSRF opportunities introduced - No unsafe deserialization - No secret/credential leakage - Input validation appears consistent with existing patterns - No obvious authn/authz gaps in the new interfaces The changes appear to be well-contained and follow secure coding practices. The deferment of cleanup functions and panic isolation for PostRun hooks are implemented safely. The new steer mailbox mechanism for injecting messages into the agent loop is thread-safe. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> VERDICT: No material issues found The changes correctly implement the wiring of SessionToolFactory, PostRun artifacts, and AttachImages functionality. The logic for merging steer messages from both the session mailbox and critic nudges is sound. The PostRun hook execution is properly isolated and best-effort as intended. All findings from the diff have been verified through code inspection and no correctness issues were identified. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> VERDICT: No material issues found The changes introduce new functionality for session tools, including `AttachImages`, `PostRun` hooks, and a `steerMailbox` for managing messages. Error handling and edge cases have been carefully considered: 1. The `runPostRun` function properly isolates panics from the `PostRun` hook, ensuring that artifact generation failures don't affect the main run result. 2. The `steerMailbox` implementation is thread-safe with proper locking around push and drain operations. 3. The `AttachImages` method correctly handles empty inputs by returning early. 4. Cleanup functions from session tool factories are properly deferred. 5. The merging of steer messages from both the session mailbox and critic is handled correctly. All new code paths appear to handle errors appropriately, and the tests cover the key scenarios including panic isolation and proper cleanup. No blocking issues or significant edge case oversights were identified in this review. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 24s</sub>
steve added 1 commit 2026-06-27 22:33:44 +00:00
run: PostRun detached ctx + panic-isolated Cleanup (gadfly #12)
executus CI / test (pull_request) Successful in 45s
executus CI / test (push) Successful in 1m47s
784d5d7ce4
Two convergent gadfly refinements on the PostRun wiring:
- PostRun now runs on detach(ctx), not the caller's ctx — a finished/cancelled
  caller no longer aborts artifact production (3-model: glm-5.2/minimax/deepseek).
- Cleanup is panic-isolated via safeCleanup (recover+log), matching runPostRun, so
  a misbehaving teardown can't clobber an otherwise-successful run (deepseek).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve closed this pull request 2026-06-27 22:40:37 +00:00
steve deleted branch run-postrun-sessiontools 2026-06-27 22:40:37 +00:00
Some checks are pending
executus CI / test (pull_request) Successful in 45s
executus CI / test (push) Successful in 1m47s

Pull request closed

Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/executus#12