run: fold inv.Images into the initial user message (multimodal opening turn) #16

Merged
steve merged 2 commits from feat/initial-images into main 2026-06-28 05:16:46 +00:00
Owner

The executor passed only the text input to majordomo's agent.Run, silently dropping inv.Images. majordomo's Run input arg is text-only — so any multimodal run (vision: a host's chatbot @mention / chat API) lost its images on the executus path even though tool.Invocation carries an Images field.

Fix

Fold the images into the first user message (text + image parts) via agent.WithHistory and call Run with an empty input — the model then sees a multimodal opening turn. Extracted as a small runAgent helper; the image-less path is unchanged (the prompt passes straight through as the text input). Mirrors mort agentexec's multimodal seeding pattern.

Tests

  • TestExecutorFoldsInitialImages — a run with Images carries the image bytes and the prompt text into the first recorded model request.
  • TestExecutorTextOnlyUnchanged — the text-only path still reaches the model (regression guard).

-race green; full suite passes. Unblocks the multimodal host surfaces (mort's chatbot @mention + chat API migrations).

🤖 Generated with Claude Code

The executor passed only the text `input` to majordomo's `agent.Run`, **silently dropping `inv.Images`**. majordomo's `Run` input arg is text-only — so any multimodal run (vision: a host's chatbot @mention / chat API) lost its images on the executus path even though `tool.Invocation` carries an `Images` field. ### Fix Fold the images into the **first user message** (text + image parts) via `agent.WithHistory` and call `Run` with an empty input — the model then sees a multimodal opening turn. Extracted as a small `runAgent` helper; the image-less path is unchanged (the prompt passes straight through as the text input). Mirrors mort agentexec's multimodal seeding pattern. ### Tests - `TestExecutorFoldsInitialImages` — a run with `Images` carries the image bytes **and** the prompt text into the first recorded model request. - `TestExecutorTextOnlyUnchanged` — the text-only path still reaches the model (regression guard). `-race` green; full suite passes. Unblocks the multimodal host surfaces (mort's chatbot @mention + chat API migrations). 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 1 commit 2026-06-28 04:38:07 +00:00
run: fold inv.Images into the initial user message (multimodal opening turn)
executus CI / test (pull_request) Successful in 46s
Adversarial Review (Gadfly) / review (pull_request) Successful in 6m5s
a35c176b42
The executor passed only the text `input` to majordomo's agent.Run, silently
dropping inv.Images — so a multimodal run (vision: chatbot @mention, chat API)
lost its images on the executus path. majordomo's Run input arg is text-only, so
fold the images into the first user message (text + image parts) via WithHistory
and call Run with empty input, mirroring mort agentexec's multimodal seeding. The
image-less path is unchanged (prompt passes straight through).

Tests: a run with Images carries the image bytes + prompt into the first model
request; the text-only path still reaches the model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly — live review status

6/6 reviewers finished · updated 2026-06-28 04:44:13Z

claude-code/opus · claude-code — done

  • security — No material issues found
  • correctness — Minor issues
  • maintainability — Minor issues
  • performance — No material issues found
  • error-handling — Minor issues

claude-code/opus:max · claude-code — done

  • security — No material issues found
  • correctness — Minor issues
  • maintainability — Minor issues
  • performance — No material issues found
  • error-handling — No material issues found

claude-code/sonnet · claude-code — done

  • security — No material issues found
  • correctness — Minor issues
  • maintainability — Minor issues
  • performance — No material issues found
  • error-handling — Minor issues

deepseek-v4-pro:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — Minor issues
  • maintainability — Minor issues
  • performance — No material issues found
  • error-handling — No material issues found

glm-5.2:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — Minor issues
  • maintainability — No material issues found
  • performance — No material issues found
  • error-handling — Minor issues

minimax-m3:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • maintainability — No material issues found
  • performance — No material issues found
  • error-handling — Minor issues

Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.

<!-- gadfly-status-board --> ## 🪰 Gadfly — live review status 6/6 reviewers finished · updated 2026-06-28 04:44:13Z #### `claude-code/opus` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — Minor issues #### `claude-code/opus:max` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `claude-code/sonnet` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — Minor issues #### `deepseek-v4-pro:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — Minor issues - ✅ **performance** — No material issues found - ✅ **error-handling** — No material issues found #### `glm-5.2:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — Minor issues - ✅ **maintainability** — No material issues found - ✅ **performance** — No material issues found - ✅ **error-handling** — Minor issues #### `minimax-m3:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **maintainability** — No material issues found - ✅ **performance** — No material issues found - ✅ **error-handling** — Minor issues <sub>Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.</sub>

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

The change is a straightforward repackaging of already-in-memory data (inv.Images + input) into a multimodal first user message via agent.WithHistory. Through the security lens:

  • No new input sources: inv.Images was already populated upstream (Discord attachments, URL downloads) before this PR; the function merely stops silently dropping it. The ingestion/validation boundary is unchanged.
  • No injection: The function constructs llm.Text(input) and llm.ImagePart structs — no string concatenation, template expansion, SQL, command execution, or path construction.
  • No authn/authz change: The function doesn't touch authorization, caller identity, or access control.
  • No credential leakage: No logging, storage, or transmission of secrets.
  • No SSRF or deserialization: The function makes no network calls and deserializes nothing.
  • Role integrity: llm.UserParts produces a user-role message, so the history injection cannot escalate to system-role or override the system prompt set at agent construction time.

The image data itself (size, MIME, content) is a model-provider concern, not something this pass-through helper can or should gate — and that boundary existed before this change.

🎯 Correctness — Minor issues

Both findings are confirmed against the actual code. The draft review is accurate as written.

Verdict: Minor issues

  • Missing empty-text guard in runAgent (inconsistent with runSession.AttachImages)run/executor.go:454. When input is "" (or whitespace-only) but images is non-empty, runAgent unconditionally prepends llm.Text(input), producing a TextPart with empty content. The sibling function runSession.AttachImages (run/session.go:66) explicitly guards against this with if strings.TrimSpace(text) != "" before adding the text part. Many LLM APIs reject multimodal user messages that contain an empty/whitespace-only text part alongside images. While the current call site (executor.go:323) passes the user's prompt as input (unlikely to be empty in practice), the function should match the established guard pattern for correctness. Note: strings is not currently imported in executor.go and would need to be added.

    Suggested fix: Add the same guard:

    parts := make([]llm.Part, 0, len(images)+1)
    if strings.TrimSpace(input) != "" {
        parts = append(parts, llm.Text(input))
    }
    for _, img := range images {
        parts = append(parts, img)
    }
    
  • Test TestExecutorFoldsInitialImages doesn't verify co-location of text and image in the same messagerun/images_test.go:40-54. The test iterates over all messages and all parts, setting sawImage and sawText independently. It would pass even if the image and text landed in separate messages, which would defeat the purpose of folding them into a single multimodal opening turn. This is a test-correctness gap: the test asserts the wrong property.

🧹 Code cleanliness & maintainability — Minor issues

Verdict: Minor issues

  • Duplicated multimodal message construction (run/executor.go:453–458 and run/session.go:65–75): both sites build a []llm.Part from text + images and wrap it in llm.UserParts. The logic is similar (4–5 lines each), though not identical — session.go conditionally skips empty text and has an early-return guard, while executor.go always appends the text part. If the Part-construction pattern ever changes (new part type, ordering convention, etc.), both sites need updating. A small shared helper would eliminate the duplication.

  • Test package inconsistency (run/images_test.go:1 vs run/executor_test.go:1): the new tests use package run_test (black-box) while the existing tests in the same directory use package run (white-box). This forces the new tests to duplicate setup boilerplate (fake.New, fp.Model, run.New) that executor_test.go already provides via the unexported fakeModels helper. Either switching to package run to share the helper, or extracting fakeModels into an exported test utility, would reduce duplication.

  • Cross-repo reference in doc comment (run/executor.go:448): "Mirrors mort agentexec's multimodal seeding" references a pattern in another repository. The comment would be more self-contained and maintainable without the cross-repo aside, which may drift or become meaningless to future readers of this codebase alone.

Performance — No material issues found

Verdict: No material issues found

The change is clean from a performance standpoint:

  • Image-less path (the common case): len(images) == 0 returns ag.Run(ctx, input, opts...) directly — zero overhead added, no extra allocation, no indirection cost.
  • Image path: The parts slice is pre-sized exactly (len(images)+1), so the append loop never reallocates. The opts = append(opts, ...) may allocate a new 2-element backing array (since the caller passes a single variadic agent.WithSteer(steer), giving cap=1), but that's a one-time, tiny allocation per run — negligible.
  • Image data: for _, img := range images copies llm.ImagePart by value. Since ImagePart.Data is a []byte (slice header, not the backing array), this is O(number of images), not O(total bytes). No deep copy of image payloads.
  • No hot-loop concerns: runAgent is called exactly once per Executor.Run, not inside any iteration. No N+1 queries, no unbounded growth, no quadratic behavior, no blocking calls introduced.

Nothing material to flag.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

The runAgent helper is straightforward and handles its primary edge case — len(images) == 0 (both nil and empty slices) — correctly by falling through to the existing ag.Run(ctx, input, opts...) path. No new panic paths are introduced, no resources are allocated that could leak, and the existing defer recover() in Executor.Run (line 119–123) already provides a safety net for any unexpected panic from the majordomo agent layer.

A few observations that fall within this lens but don't rise to the level of a material issue:

  • Empty input with non-empty images: llm.Text("") creates a TextPart with empty content as the first part of the multimodal user message. Some model providers reject messages with an empty text part alongside images. However, this is not a regression — the prior code would have passed "" directly to ag.Run, which would have produced a text-only empty user message (equally problematic). The caller controls input, and sending images with no prompt text is an unlikely caller mistake.

  • opts slice mutation (executor.go:458): opts = append(opts, agent.WithHistory(...)) mutates the variadic parameter slice. In Go, when the caller passes a single argument (agent.WithSteer(steer)), the compiler allocates a fresh backing array; append may or may not grow it, but the caller holds no reference to the slice after runAgent returns, so there is no aliasing hazard. Verified by reading the single call site at line 323.

  • ag nil: The caller (Executor.Run, line 152–157) already guards model == nil and returns an error before constructing ag. A nil ag reaching runAgent would require a bug in majordomo's agent.New, which is outside this diff's responsibility.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 5s

<!-- gadfly-review:ollama:deepseek-v4-pro:cloud --> ### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## Verdict: No material issues found The change is a straightforward repackaging of already-in-memory data (`inv.Images` + `input`) into a multimodal first user message via `agent.WithHistory`. Through the security lens: - **No new input sources**: `inv.Images` was already populated upstream (Discord attachments, URL downloads) before this PR; the function merely stops silently dropping it. The ingestion/validation boundary is unchanged. - **No injection**: The function constructs `llm.Text(input)` and `llm.ImagePart` structs — no string concatenation, template expansion, SQL, command execution, or path construction. - **No authn/authz change**: The function doesn't touch authorization, caller identity, or access control. - **No credential leakage**: No logging, storage, or transmission of secrets. - **No SSRF or deserialization**: The function makes no network calls and deserializes nothing. - **Role integrity**: `llm.UserParts` produces a user-role message, so the history injection cannot escalate to system-role or override the system prompt set at agent construction time. The image data itself (size, MIME, content) is a model-provider concern, not something this pass-through helper can or should gate — and that boundary existed before this change. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> Both findings are confirmed against the actual code. The draft review is accurate as written. ## Verdict: Minor issues - **Missing empty-text guard in `runAgent` (inconsistent with `runSession.AttachImages`)** — `run/executor.go:454`. When `input` is `""` (or whitespace-only) but `images` is non-empty, `runAgent` unconditionally prepends `llm.Text(input)`, producing a `TextPart` with empty content. The sibling function `runSession.AttachImages` (`run/session.go:66`) explicitly guards against this with `if strings.TrimSpace(text) != ""` before adding the text part. Many LLM APIs reject multimodal user messages that contain an empty/whitespace-only text part alongside images. While the current call site (`executor.go:323`) passes the user's prompt as `input` (unlikely to be empty in practice), the function should match the established guard pattern for correctness. Note: `strings` is not currently imported in `executor.go` and would need to be added. **Suggested fix:** Add the same guard: ```go parts := make([]llm.Part, 0, len(images)+1) if strings.TrimSpace(input) != "" { parts = append(parts, llm.Text(input)) } for _, img := range images { parts = append(parts, img) } ``` - **Test `TestExecutorFoldsInitialImages` doesn't verify co-location of text and image in the same message** — `run/images_test.go:40-54`. The test iterates over *all* messages and *all* parts, setting `sawImage` and `sawText` independently. It would pass even if the image and text landed in separate messages, which would defeat the purpose of folding them into a single multimodal opening turn. This is a test-correctness gap: the test asserts the wrong property. </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> ## Verdict: Minor issues - **Duplicated multimodal message construction** (`run/executor.go:453–458` and `run/session.go:65–75`): both sites build a `[]llm.Part` from text + images and wrap it in `llm.UserParts`. The logic is similar (4–5 lines each), though not identical — `session.go` conditionally skips empty text and has an early-return guard, while `executor.go` always appends the text part. If the Part-construction pattern ever changes (new part type, ordering convention, etc.), both sites need updating. A small shared helper would eliminate the duplication. - **Test package inconsistency** (`run/images_test.go:1` vs `run/executor_test.go:1`): the new tests use `package run_test` (black-box) while the existing tests in the same directory use `package run` (white-box). This forces the new tests to duplicate setup boilerplate (`fake.New`, `fp.Model`, `run.New`) that `executor_test.go` already provides via the unexported `fakeModels` helper. Either switching to `package run` to share the helper, or extracting `fakeModels` into an exported test utility, would reduce duplication. - **Cross-repo reference in doc comment** (`run/executor.go:448`): "Mirrors mort agentexec's multimodal seeding" references a pattern in another repository. The comment would be more self-contained and maintainable without the cross-repo aside, which may drift or become meaningless to future readers of this codebase alone. </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> ## Verdict: No material issues found The change is clean from a performance standpoint: - **Image-less path (the common case):** `len(images) == 0` returns `ag.Run(ctx, input, opts...)` directly — zero overhead added, no extra allocation, no indirection cost. - **Image path:** The `parts` slice is pre-sized exactly (`len(images)+1`), so the `append` loop never reallocates. The `opts = append(opts, ...)` may allocate a new 2-element backing array (since the caller passes a single variadic `agent.WithSteer(steer)`, giving cap=1), but that's a one-time, tiny allocation per run — negligible. - **Image data:** `for _, img := range images` copies `llm.ImagePart` by value. Since `ImagePart.Data` is a `[]byte` (slice header, not the backing array), this is O(number of images), not O(total bytes). No deep copy of image payloads. - **No hot-loop concerns:** `runAgent` is called exactly once per `Executor.Run`, not inside any iteration. No N+1 queries, no unbounded growth, no quadratic behavior, no blocking calls introduced. Nothing material to flag. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> ## Verdict: No material issues found The `runAgent` helper is straightforward and handles its primary edge case — `len(images) == 0` (both nil and empty slices) — correctly by falling through to the existing `ag.Run(ctx, input, opts...)` path. No new panic paths are introduced, no resources are allocated that could leak, and the existing `defer recover()` in `Executor.Run` (line 119–123) already provides a safety net for any unexpected panic from the majordomo agent layer. A few observations that fall within this lens but don't rise to the level of a material issue: - **Empty `input` with non-empty `images`**: `llm.Text("")` creates a `TextPart` with empty content as the first part of the multimodal user message. Some model providers reject messages with an empty text part alongside images. However, this is not a regression — the prior code would have passed `""` directly to `ag.Run`, which would have produced a text-only empty user message (equally problematic). The caller controls `input`, and sending images with no prompt text is an unlikely caller mistake. - **`opts` slice mutation** (`executor.go:458`): `opts = append(opts, agent.WithHistory(...))` mutates the variadic parameter slice. In Go, when the caller passes a single argument (`agent.WithSteer(steer)`), the compiler allocates a fresh backing array; `append` may or may not grow it, but the caller holds no reference to the slice after `runAgent` returns, so there is no aliasing hazard. Verified by reading the single call site at line 323. - **`ag` nil**: The caller (`Executor.Run`, line 152–157) already guards `model == nil` and returns an error before constructing `ag`. A nil `ag` reaching `runAgent` would require a bug in majordomo's `agent.New`, which is outside this diff's responsibility. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 5s</sub>

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

  • Verified run/executor.go:449-460 (runAgent) and the call site at run/executor.go:323. The image path folds inv.Images (already-typed []llm.ImagePart from tool.Invocation) into a single llm.UserParts message via agent.WithHistory, then calls ag.Run(ctx, "", opts...). No string interpolation, path construction, command assembly, or templating happens with untrusted input — image bytes/MIME are passed through as structured llm.Part values, not concatenated into text. The text-only path passes input straight to ag.Run, unchanged.
  • The images originate from the existing tool.Invocation.Images field (tool/registry.go:416, verified) — this PR doesn't change where they come from or how they're trusted; it only stops silently dropping them. No new taint source or injection surface is introduced.
  • The empty input string passed to Run in the image branch carries no security risk (it's a literal "", not user-controlled concatenation).
  • No secret/credential material, SSRF, deserialization, or authz boundary is touched by this change.

Nothing in the security lens is materially wrong.

🎯 Correctness — Minor issues

VERDICT: Minor issues

  • run/executor.go:454 — empty-text TextPart not guarded (minor, inconsistent with the established in-package pattern). runAgent unconditionally prepends llm.Text(input) to the parts, even when input is the empty string. The sibling helper in the same package, run/session.go:64-68 (AttachImages), deliberately guards if strings.TrimSpace(text) != "" before appending llm.Text(text), and the documented shape in tool/registry.go:410-416 is "text + image parts." runAgent skips that guard, so a multimodal invocation with a blank prompt yields a first user message containing an empty TextPart followed by images — diverging from the convention the PR claims to mirror. Most providers tolerate an empty text part, but some reject it. Suggested fix: mirror AttachImages — only append llm.Text(input) when strings.TrimSpace(input) != "".

No other correctness issues found in this lane. The image-present fold seeds WithHistory([]llm.Message{llm.UserParts(text + images...)}) and calls Run(ctx, "", opts...), so the model sees the multimodal opening turn as the first user message and Run's text input contributes nothing extra; the image-less path is untouched via ag.Run(ctx, input, opts...). This matches the documented intent.

🧹 Code cleanliness & maintainability — No material issues found

Verdict: No material issues found

  • run/executor.go:449-460runAgent is a clean, well-documented helper that closely mirrors the existing multimodal-parts construction already in run/session.go:64-76 (AttachImages). Naming, structure, and the parts := make([]llm.Part, 0, len(images)+1) capacity hint all follow the established local pattern. No dead code, no leaky abstraction, no unwarranted duplication.
  • Minor (non-blocking) consistency note: run/session.go:66 guards the text part with strings.TrimSpace(text) != "" before appending, whereas runAgent (executor.go:454) appends llm.Text(input) unconditionally. On the image path input is the prompt and is expected non-empty, so this is not a bug — but aligning the two "text + images → UserParts" builders (e.g. extracting a tiny shared userImageParts(text, images) helper) would remove the only structural drift between them. Low value given how small each site is; flagging only for completeness.
  • run/images_test.go — new tests are readable, well-named, and scoped to the behavior under review; they use the existing fake provider harness consistently with the rest of the suite. No maintainability concern.
Performance — No material issues found

Verdict: No material issues found

Through the performance lens, this change is benign:

  • runAgent runs once per Executor.Run call (setup, not inside the agent/tool loop). The work it adds — building a []llm.Part slice sized len(images)+1 and appending one RunOption — is O(images) with a single properly-preallocated slice and no copying of image byte payloads (the llm.ImagePart struct copy shares the underlying Data slice). Negligible.
  • The image-less path is an early return ag.Run(ctx, input, opts...) with zero added allocation — no regression for the common case. Verified at run/executor.go:450-451.
  • No new per-step, per-tool-call, or per-message work is introduced on the hot path; the steer mailbox / critic drains (the actual per-step cost) are untouched. Verified the call site at run/executor.go:322-323 and the surrounding loop at run/executor.go:270-289.
  • opts = append(opts, agent.WithHistory(...)) at run/executor.go:458 appends to the variadic slice exactly once per run; one potential backing-array growth, immaterial.

Nothing in this lane is materially wrong.

🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

  • run/executor.go:454runAgent unconditionally appends llm.Text(input) to the parts slice, even when input is the empty string. The only guard is len(images) == 0, so a multimodal invocation with an empty/whitespace prompt produces a degenerate empty TextPart as the first part of the opening user message. The sibling code path in the same package (run/session.go:66) explicitly guards this case with if strings.TrimSpace(text) != "" before appending the text part, so the new helper diverges from the established in-repo pattern. Impact: some model providers choke on or drop empty text parts, and at minimum it seeds the conversation with a meaningless empty part. Suggested fix: mirror session.go — only append llm.Text(input) when strings.TrimSpace(input) != ""; if that leaves only image parts, still push them (an image-only opening turn is valid). (Verified by reading run/session.go:64-75 and run/executor.go:449-459.)

  • run/executor.go:459 — when images are present, ag.Run(ctx, "", opts...) is invoked with an empty input while the real prompt is seeded via WithHistory. majordomo's Run/WithHistory source is not vendored in this repo (only go.mod pins the module, and agent.WithHistory/agent.Run are only referenced, not defined, here), so I could not confirm whether an empty input combined with a WithHistory opening user message is handled cleanly by the agent loop — e.g., that it doesn't append a second empty user turn on top of the history or treat empty input as "no prompt" and short-circuit. Unverified; flagged because it's the unhappy path this diff introduces and worth a concrete check against majordomo's Run contract.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 34s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> VERDICT: No material issues found - Verified `run/executor.go:449-460` (`runAgent`) and the call site at `run/executor.go:323`. The image path folds `inv.Images` (already-typed `[]llm.ImagePart` from `tool.Invocation`) into a single `llm.UserParts` message via `agent.WithHistory`, then calls `ag.Run(ctx, "", opts...)`. No string interpolation, path construction, command assembly, or templating happens with untrusted input — image bytes/MIME are passed through as structured `llm.Part` values, not concatenated into text. The text-only path passes `input` straight to `ag.Run`, unchanged. - The images originate from the existing `tool.Invocation.Images` field (`tool/registry.go:416`, verified) — this PR doesn't change where they come from or how they're trusted; it only stops silently dropping them. No new taint source or injection surface is introduced. - The empty `input` string passed to `Run` in the image branch carries no security risk (it's a literal `""`, not user-controlled concatenation). - No secret/credential material, SSRF, deserialization, or authz boundary is touched by this change. Nothing in the security lens is materially wrong. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> **VERDICT: Minor issues** - **`run/executor.go:454` — empty-text `TextPart` not guarded (minor, inconsistent with the established in-package pattern).** `runAgent` unconditionally prepends `llm.Text(input)` to the parts, even when `input` is the empty string. The sibling helper in the same package, `run/session.go:64-68` (`AttachImages`), deliberately guards `if strings.TrimSpace(text) != ""` before appending `llm.Text(text)`, and the documented shape in `tool/registry.go:410-416` is "text + image parts." `runAgent` skips that guard, so a multimodal invocation with a blank prompt yields a first user message containing an empty `TextPart` followed by images — diverging from the convention the PR claims to mirror. Most providers tolerate an empty text part, but some reject it. Suggested fix: mirror `AttachImages` — only append `llm.Text(input)` when `strings.TrimSpace(input) != ""`. No other correctness issues found in this lane. The image-present fold seeds `WithHistory([]llm.Message{llm.UserParts(text + images...)})` and calls `Run(ctx, "", opts...)`, so the model sees the multimodal opening turn as the first user message and `Run`'s text input contributes nothing extra; the image-less path is untouched via `ag.Run(ctx, input, opts...)`. This matches the documented intent. </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — No material issues found</summary> **Verdict: No material issues found** - `run/executor.go:449-460` — `runAgent` is a clean, well-documented helper that closely mirrors the existing multimodal-parts construction already in `run/session.go:64-76` (`AttachImages`). Naming, structure, and the `parts := make([]llm.Part, 0, len(images)+1)` capacity hint all follow the established local pattern. No dead code, no leaky abstraction, no unwarranted duplication. - Minor (non-blocking) consistency note: `run/session.go:66` guards the text part with `strings.TrimSpace(text) != ""` before appending, whereas `runAgent` (executor.go:454) appends `llm.Text(input)` unconditionally. On the image path `input` is the prompt and is expected non-empty, so this is not a bug — but aligning the two "text + images → UserParts" builders (e.g. extracting a tiny shared `userImageParts(text, images)` helper) would remove the only structural drift between them. Low value given how small each site is; flagging only for completeness. - `run/images_test.go` — new tests are readable, well-named, and scoped to the behavior under review; they use the existing `fake` provider harness consistently with the rest of the suite. No maintainability concern. </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> **Verdict: No material issues found** Through the performance lens, this change is benign: - `runAgent` runs **once per `Executor.Run` call** (setup, not inside the agent/tool loop). The work it adds — building a `[]llm.Part` slice sized `len(images)+1` and appending one `RunOption` — is O(images) with a single properly-preallocated slice and no copying of image byte payloads (the `llm.ImagePart` struct copy shares the underlying `Data` slice). Negligible. - The image-less path is an early `return ag.Run(ctx, input, opts...)` with **zero added allocation** — no regression for the common case. Verified at `run/executor.go:450-451`. - No new per-step, per-tool-call, or per-message work is introduced on the hot path; the steer mailbox / critic drains (the actual per-step cost) are untouched. Verified the call site at `run/executor.go:322-323` and the surrounding loop at `run/executor.go:270-289`. - `opts = append(opts, agent.WithHistory(...))` at `run/executor.go:458` appends to the variadic slice exactly once per run; one potential backing-array growth, immaterial. Nothing in this lane is materially wrong. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> VERDICT: Minor issues - `run/executor.go:454` — `runAgent` unconditionally appends `llm.Text(input)` to the parts slice, even when `input` is the empty string. The only guard is `len(images) == 0`, so a multimodal invocation with an empty/whitespace prompt produces a degenerate empty `TextPart` as the first part of the opening user message. The sibling code path in the same package (`run/session.go:66`) explicitly guards this case with `if strings.TrimSpace(text) != ""` before appending the text part, so the new helper diverges from the established in-repo pattern. Impact: some model providers choke on or drop empty text parts, and at minimum it seeds the conversation with a meaningless empty part. Suggested fix: mirror `session.go` — only append `llm.Text(input)` when `strings.TrimSpace(input) != ""`; if that leaves only image parts, still push them (an image-only opening turn is valid). (Verified by reading `run/session.go:64-75` and `run/executor.go:449-459`.) - `run/executor.go:459` — when images are present, `ag.Run(ctx, "", opts...)` is invoked with an empty input while the real prompt is seeded via `WithHistory`. majordomo's `Run`/`WithHistory` source is not vendored in this repo (only `go.mod` pins the module, and `agent.WithHistory`/`agent.Run` are only referenced, not defined, here), so I could not confirm whether an empty `input` combined with a `WithHistory` opening user message is handled cleanly by the agent loop — e.g., that it doesn't append a second empty user turn on top of the history or treat empty input as "no prompt" and short-circuit. Unverified; flagged because it's the unhappy path this diff introduces and worth a concrete check against majordomo's `Run` contract. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 34s</sub>

🪰 Gadfly review — claude-code/opus (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

I have what I need. The diff routes already-fetched image bytes (llm.ImagePart is bytes-only per tool/registry.go:410-416 — "pkg/imageutil downloads the bytes … majordomo image parts are bytes-only") into the first user message. No network fetch, string interpolation, or auth decision is introduced by this change.

VERDICT: No material issues found

Security-lens assessment (verified by reading run/executor.go:319-459 and tool/registry.go:405-425):

  • No SSRF. runAgent consumes inv.Images as []llm.ImagePart, which carry raw bytes + MIME (registry.go:416), not URLs. The download/URL-resolution happens upstream in the host (pkg/imageutil), outside this diff. Nothing here fetches a remote resource, so the fold introduces no SSRF surface.
  • No injection. The image parts and prompt text are placed into a structured llm.UserParts(...) message, not concatenated into any string, query, path, or template. input was already passed to the model before this change; routing it through llm.Text(input) in a parts slice changes the wire shape, not the trust handling.
  • No authn/authz change. The change is purely message-assembly; it touches no permission, caller-identity, or gating logic (Budget/critic/audit paths at executor.go:346-350 are untouched).
  • No secret/credential handling and no unsafe deserialization in the new code.
  • Input validation: the images are untrusted (host @mention / chat API), but this diff neither adds nor removes validation — MIME/size validation, if desired, belongs in the upstream imageutil extractor, not in this routing helper. Not a regression introduced here. (Any concern about image count/size as a resource-exhaustion vector is a performance/DoS-lens question, not security, and is not introduced by this change.)

The image bytes were previously dropped; now they reach the model as intended. That is a strict, faithful addition with no new tainted-data flow that wasn't already authorized by the Images field's contract. Nothing material in the security lens.

🎯 Correctness — Minor issues

VERDICT: Minor issues

  • Minor — run/executor.go:453-458: unconditional empty text part diverges from the guarded sibling pattern. runAgent always does parts = append(parts, llm.Text(input)), even when input is empty/whitespace. The directly-analogous helper AttachImages (run/session.go:64-76) deliberately guards: it skips llm.Text(text) when strings.TrimSpace(text) == "" (and bails entirely if no parts remain). The image-only case is realistic for the surfaces this PR unblocks (a chatbot @mention with an image and no caption → input == ""), so this path emits a UserParts whose first element is an empty TextPart. Best case a harmless noop; the very reason AttachImages guards is that some providers reject/penalize empty text parts in a multimodal message. Suggest mirroring that guard:
    parts := make([]llm.Part, 0, len(images)+1)
    if strings.TrimSpace(input) != "" {
        parts = append(parts, llm.Text(input))
    }
    for _, img := range images { parts = append(parts, img) }
    

The fold is otherwise wired correctly: inv.Images is the right source (tool/registry.go:416), input is the raw Run parameter with no prior history seeding so there's no duplication, and the image-less branch is byte-for-byte the old behavior. The text-only regression guard is legitimate.

🧹 Code cleanliness & maintainability — Minor issues

Both functions confirmed against the actual source. The duplication is real: run/session.go:64-76 (runSession.AttachImages) and run/executor.go:453-458 (runAgent) both build a multimodal user message with the identical make([]llm.Part, 0, len(images)+1) / llm.Text / image loop / llm.UserParts(...) construction. The noted divergence — AttachImages guards with strings.TrimSpace(text) != "" while runAgent appends llm.Text(input) unconditionally — is also accurate.

VERDICT: Minor issues

Finding — duplicated multimodal message assembly (run/executor.go:453-458 vs run/session.go:64-76)

The new runAgent builds a llm.UserParts(text + image parts) message that is the same construction already implemented in runSession.AttachImages — same make([]llm.Part, 0, len(images)+1), same llm.Text, same image-append loop, same llm.UserParts(...). Both live in package run and both seed a user-role multimodal turn.

  • Impact (maintainability): Two copies of the same assembly will drift. They already differ — AttachImages guards empty text with strings.TrimSpace(text) != "" before appending llm.Text, while runAgent appends llm.Text(input) unconditionally — exactly the kind of divergence a shared helper would prevent.
  • Suggested fix: Extract a small package-private helper, e.g. func userParts(text string, images []llm.ImagePart) llm.Message, and call it from both runAgent and runSession.AttachImages. Low-churn, removes the second copy, and forces one decision about the empty-text guard.
Performance — No material issues found

I've verified the change at run/executor.go:444-460 against the checked-out code. Through the Performance lens there is nothing material to flag.

VERDICT: No material issues found

  • run/executor.go:449-452 — Image-less path is genuinely unchanged: it forwards straight to ag.Run with no extra allocation or wrapping. The common (text-only) hot path takes zero new work. Verified by reading the helper.
  • run/executor.go:453 — The parts slice is preallocated with exact capacity (len(images)+1), so the append loop never re-grows/re-copies the backing array. Correctly sized.
  • run/executor.go:455-457 — The loop is bounded by the number of images on the invocation (not an unbounded or repeated source), and append(parts, img) copies only the llm.ImagePart struct value (a slice header + MIME), not the underlying image bytes — so no large-payload copy is introduced. No N+1, no per-step or hot-loop work (this runs once at run start, outside the agent step loop), no quadratic behavior.

No efficiency regression in my lane. The remaining concerns (e.g. whether WithHistory + empty-input semantics are correct, or whether inv.Images aliasing matters) belong to the correctness lens, not performance.

🧯 Error handling & edge cases — Minor issues

Both core claims confirmed against the actual source:

  • run/executor.go:454 unconditionally appends llm.Text(input) before image parts, with no empty/whitespace guard.
  • run/session.go:64-76 (AttachImages) does the identical fold but guards both the text append (if strings.TrimSpace(text) != "") and the empty-parts case (if len(parts) == 0 { return }).

The divergence between the two in-repo multimodal-seeding paths is real. The draft is honest about the one thing it couldn't verify (whether majordomo's provider layer rejects an empty text part, since that source is outside the working tree). The finding stands.

VERDICT: Minor issues

🧯 Error handling & edge cases

  • run/executor.go:454 — image-only run produces a blank text part (unhandled empty-input edge case). runAgent unconditionally prepends llm.Text(input) before the image parts. When input is empty or whitespace-only — a realistic case, since a host can fold inv.Images with no caption (a chatbot @mention that is just an image, no prompt) — the opening user turn becomes [TextPart{""}, ImagePart]. The codebase's own sibling implementation, runSession.AttachImages (run/session.go:64-76), does the identical fold but guards it: if strings.TrimSpace(text) != "" { parts = append(parts, llm.Text(text)) } and also early-returns when len(parts) == 0. The new helper omits both guards, so the two multimodal-seeding paths now diverge on the empty-text edge.
    • Impact: an empty text content block is rejected by several provider APIs, which would turn a valid image-only run into a hard request error on the unhappy path. Verified: the unconditional append (executor.go:454) and the contrasting guard in AttachImages (session.go:66, 72). Unverified: whether majordomo's provider layer actually rejects the empty part — that source is outside the working tree — so the severity hinges on that. The inconsistency with the established in-repo pattern is concrete regardless.
    • Fix: mirror AttachImagesif strings.TrimSpace(input) != "" { parts = append(parts, llm.Text(input)) } so an image-only invocation seeds a clean single-image turn.
    • Note: the two new tests only exercise the non-empty prompt + image case ("describe this"), so this empty-caption edge is untested.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 59s

<!-- gadfly-review:ollama:claude-code/opus --> ### 🪰 Gadfly review — `claude-code/opus` (claude-code) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> I have what I need. The diff routes already-fetched image **bytes** (`llm.ImagePart` is bytes-only per `tool/registry.go:410-416` — "pkg/imageutil downloads the bytes … majordomo image parts are bytes-only") into the first user message. No network fetch, string interpolation, or auth decision is introduced by this change. ## VERDICT: No material issues found Security-lens assessment (verified by reading `run/executor.go:319-459` and `tool/registry.go:405-425`): - **No SSRF.** `runAgent` consumes `inv.Images` as `[]llm.ImagePart`, which carry raw bytes + MIME (`registry.go:416`), not URLs. The download/URL-resolution happens upstream in the host (`pkg/imageutil`), outside this diff. Nothing here fetches a remote resource, so the fold introduces no SSRF surface. - **No injection.** The image parts and prompt text are placed into a structured `llm.UserParts(...)` message, not concatenated into any string, query, path, or template. `input` was already passed to the model before this change; routing it through `llm.Text(input)` in a parts slice changes the wire shape, not the trust handling. - **No authn/authz change.** The change is purely message-assembly; it touches no permission, caller-identity, or gating logic (Budget/critic/audit paths at `executor.go:346-350` are untouched). - **No secret/credential handling** and **no unsafe deserialization** in the new code. - **Input validation:** the images are untrusted (host @mention / chat API), but this diff neither adds nor removes validation — MIME/size validation, if desired, belongs in the upstream `imageutil` extractor, not in this routing helper. Not a regression introduced here. (Any concern about image count/size as a resource-exhaustion vector is a performance/DoS-lens question, not security, and is not introduced by this change.) The image bytes were previously *dropped*; now they reach the model as intended. That is a strict, faithful addition with no new tainted-data flow that wasn't already authorized by the `Images` field's contract. Nothing material in the security lens. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> ## VERDICT: Minor issues - **Minor — `run/executor.go:453-458`: unconditional empty text part diverges from the guarded sibling pattern.** `runAgent` always does `parts = append(parts, llm.Text(input))`, even when `input` is empty/whitespace. The directly-analogous helper `AttachImages` (`run/session.go:64-76`) deliberately guards: it skips `llm.Text(text)` when `strings.TrimSpace(text) == ""` (and bails entirely if no parts remain). The image-only case is realistic for the surfaces this PR unblocks (a chatbot @mention with an image and no caption → `input == ""`), so this path emits a `UserParts` whose first element is an empty `TextPart`. Best case a harmless noop; the very reason `AttachImages` guards is that some providers reject/penalize empty text parts in a multimodal message. Suggest mirroring that guard: ```go parts := make([]llm.Part, 0, len(images)+1) if strings.TrimSpace(input) != "" { parts = append(parts, llm.Text(input)) } for _, img := range images { parts = append(parts, img) } ``` The fold is otherwise wired correctly: `inv.Images` is the right source (`tool/registry.go:416`), `input` is the raw `Run` parameter with no prior history seeding so there's no duplication, and the image-less branch is byte-for-byte the old behavior. The text-only regression guard is legitimate. </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> Both functions confirmed against the actual source. The duplication is real: `run/session.go:64-76` (`runSession.AttachImages`) and `run/executor.go:453-458` (`runAgent`) both build a multimodal user message with the identical `make([]llm.Part, 0, len(images)+1)` / `llm.Text` / image loop / `llm.UserParts(...)` construction. The noted divergence — `AttachImages` guards with `strings.TrimSpace(text) != ""` while `runAgent` appends `llm.Text(input)` unconditionally — is also accurate. ## VERDICT: Minor issues **Finding — duplicated multimodal message assembly (`run/executor.go:453-458` vs `run/session.go:64-76`)** The new `runAgent` builds a `llm.UserParts(text + image parts)` message that is the same construction already implemented in `runSession.AttachImages` — same `make([]llm.Part, 0, len(images)+1)`, same `llm.Text`, same image-append loop, same `llm.UserParts(...)`. Both live in package `run` and both seed a user-role multimodal turn. - **Impact (maintainability):** Two copies of the same assembly will drift. They already differ — `AttachImages` guards empty text with `strings.TrimSpace(text) != ""` before appending `llm.Text`, while `runAgent` appends `llm.Text(input)` unconditionally — exactly the kind of divergence a shared helper would prevent. - **Suggested fix:** Extract a small package-private helper, e.g. `func userParts(text string, images []llm.ImagePart) llm.Message`, and call it from both `runAgent` and `runSession.AttachImages`. Low-churn, removes the second copy, and forces one decision about the empty-text guard. </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> I've verified the change at `run/executor.go:444-460` against the checked-out code. Through the Performance lens there is nothing material to flag. ## VERDICT: No material issues found - `run/executor.go:449-452` — Image-less path is genuinely unchanged: it forwards straight to `ag.Run` with no extra allocation or wrapping. The common (text-only) hot path takes zero new work. Verified by reading the helper. - `run/executor.go:453` — The parts slice is preallocated with exact capacity (`len(images)+1`), so the append loop never re-grows/re-copies the backing array. Correctly sized. - `run/executor.go:455-457` — The loop is bounded by the number of images on the invocation (not an unbounded or repeated source), and `append(parts, img)` copies only the `llm.ImagePart` struct value (a slice header + MIME), **not** the underlying image bytes — so no large-payload copy is introduced. No N+1, no per-step or hot-loop work (this runs once at run start, outside the agent step loop), no quadratic behavior. No efficiency regression in my lane. The remaining concerns (e.g. whether `WithHistory` + empty-input semantics are correct, or whether `inv.Images` aliasing matters) belong to the correctness lens, not performance. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> Both core claims confirmed against the actual source: - `run/executor.go:454` unconditionally appends `llm.Text(input)` before image parts, with no empty/whitespace guard. - `run/session.go:64-76` (`AttachImages`) does the identical fold but guards both the text append (`if strings.TrimSpace(text) != ""`) and the empty-parts case (`if len(parts) == 0 { return }`). The divergence between the two in-repo multimodal-seeding paths is real. The draft is honest about the one thing it couldn't verify (whether majordomo's provider layer rejects an empty text part, since that source is outside the working tree). The finding stands. ## VERDICT: Minor issues **🧯 Error handling & edge cases** - **`run/executor.go:454` — image-only run produces a blank text part (unhandled empty-input edge case).** `runAgent` unconditionally prepends `llm.Text(input)` before the image parts. When `input` is empty or whitespace-only — a realistic case, since a host can fold `inv.Images` with no caption (a chatbot @mention that is *just* an image, no prompt) — the opening user turn becomes `[TextPart{""}, ImagePart]`. The codebase's own sibling implementation, `runSession.AttachImages` (`run/session.go:64-76`), does the identical fold but **guards it**: `if strings.TrimSpace(text) != "" { parts = append(parts, llm.Text(text)) }` and also early-returns when `len(parts) == 0`. The new helper omits both guards, so the two multimodal-seeding paths now diverge on the empty-text edge. - *Impact:* an empty text content block is rejected by several provider APIs, which would turn a valid image-only run into a hard request error on the unhappy path. **Verified:** the unconditional append (executor.go:454) and the contrasting guard in AttachImages (session.go:66, 72). **Unverified:** whether majordomo's provider layer actually rejects the empty part — that source is outside the working tree — so the severity hinges on that. The inconsistency with the established in-repo pattern is concrete regardless. - *Fix:* mirror `AttachImages` — `if strings.TrimSpace(input) != "" { parts = append(parts, llm.Text(input)) }` so an image-only invocation seeds a clean single-image turn. - *Note:* the two new tests only exercise the non-empty prompt + image case (`"describe this"`), so this empty-caption edge is untested. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 59s</sub>

🪰 Gadfly review — claude-code/sonnet (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

I have enough information to give a definitive verdict.


No material issues found

The runAgent helper takes llm.ImagePart values (bytes-only; pkg/imageutil is the upstream downloader documented in the Images field comment at tool/registry.go:413-415) and folds them into the initial history message via agent.WithHistory. From the security lens:

  • No new attack surface: inv.Images was already being carried in tool.Invocation before this PR; the bytes just weren't reaching the model. The trust boundary and any MIME-type/size validation live upstream in the callers that populate Images — this PR doesn't move or widen that boundary.

  • opts slice safety: opts = append(opts, ...) inside runAgent operates on the variadic copy, not the caller's backing array. No aliasing risk.

  • No authz bypass via WithHistory: The pre-seeded history message is treated as context, not a new turn, so it doesn't bypass dispatch-guard caps or budget gates wired into the agent at construction time (agent.New(model, e.systemPrompt(ra), opts...)).

  • No injection vectors: No SQL, shell, path traversal, or template expansion touches the image bytes or MIME strings in this change. MIME strings are passed to the majordomo provider layer; I couldn't read the majordomo source from this repo's context, but the concern (provider sending a MIME string verbatim in HTTP headers) is pre-existing and not introduced here.

  • No credential leakage: No new logging of image bytes or MIME types.

🎯 Correctness — Minor issues

Both findings verified against the code. Finding 1 is confirmed. Finding 2 requires majordomo source which is inaccessible from the allowed working directory. The draft's own reasoning for Finding 2 acknowledges it couldn't verify the majordomo contract and frames it as "unverified" — this is correctly a speculative finding, not a confirmed code defect.

Verdict: Minor issues


Finding 1 — llm.Text(input) always emitted, even when input == ""

runAgent (executor.go:453–454) unconditionally prepends llm.Text(input) to the parts slice whenever images are present:

parts = append(parts, llm.Text(input))
for _, img := range images {
    parts = append(parts, img)
}

The analogous multimodal helper in the same package — runSession.AttachImages (session.go:66–68) — guards against this:

if strings.TrimSpace(text) != "" {
    parts = append(parts, llm.Text(text))
}

If a caller supplies Images but leaves the input string empty, runAgent builds a UserParts(llm.Text(""), image…) message with a spurious empty text part. Several providers (e.g. Anthropic) reject or warn on empty text blocks inside a content array. The fix is to apply the same guard used in session.go.

🧹 Code cleanliness & maintainability — Minor issues

Both call sites and the comment are confirmed in the actual source. The only runAgent call is at line 323 (bare value, not a slice expansion — the draft's "Verified" note cites 319 by mistake, but that line is the agent.New call; the factual safety claim holds). The comment at lines 444–448 is exactly as described. Both findings survive.


Minor issues

The extraction is clean overall — runAgent is a sensible factoring, the tests follow established patterns in the package, and the image-less path is unchanged. Two items worth flagging:


  • opts mutation footgunrun/executor.go:458

    opts = append(opts, agent.WithHistory([]llm.Message{llm.UserParts(parts...)}))
    

    opts is a variadic parameter. In Go, when the caller expands a slice with slice..., the variadic parameter shares the caller's backing array. If there is spare capacity, append here writes into that array and silently corrupts the caller's slice. The current sole call site is safe — it passes a single element literal, so Go allocates a fresh 1-element slice and append triggers a reallocation — but the function's signature makes runAgent(ctx, ag, input, images, myOpts...) look entirely natural, and any future caller with a pre-allocated-or-resliced opts slice would be silently bitten.

    Fix: cap-limit the slice before appending to guarantee a new allocation regardless of call site:

    opts = append(opts[:len(opts):len(opts)], agent.WithHistory([]llm.Message{llm.UserParts(parts...)}))
    

  • Comment over-explains the implementationrun/executor.go:444–448

    The five-line block comment restates the mechanics the code itself shows clearly (build parts, seed via WithHistory, pass empty string to Run). The non-obvious WHY — that majordomo's Run is text-only — warrants a comment, but the rest is paraphrase. Per the repo's house style (CLAUDE.md: "Only add [a comment] when the WHY is non-obvious"), the comment could be trimmed to a single line capturing just the constraint:

    // majordomo.Run is text-only; inject images as a synthetic history message.
    

    The "Mirrors mort agentexec" note has value as an architectural cross-reference but would fit better in the PR description than in a package-level comment (the agentexec package is not visible from this module).

Performance — No material issues found

Now I have enough to render a verdict.


No material issues found

  • Fast path is zero-overhead. The len(images) == 0 guard at executor.go:450 short-circuits immediately for text-only runs (the common case), so the existing hot path is untouched.

  • parts slice is correctly pre-sized. make([]llm.Part, 0, len(images)+1) at line 453 allocates exactly the right capacity; no growth reallocations occur inside the loop.

  • opts append is safe and bounded. The call site (executor.go:323) passes a single variadic arg, so Go allocates a fresh []agent.RunOption with len == cap == 1. The append at line 458 therefore always allocates a new backing array of size 2 — one small allocation per multimodal run, not per step or per tool call. Given this wraps an LLM network round-trip, it is inconsequential.

  • llm.ImagePart loop copies only slice headers. Data []byte is a slice header (pointer + len + cap); the underlying pixel bytes are not copied, so the O(n-images) loop is as cheap as possible.

  • No unbounded growth, no N+1 patterns, no work added inside the per-step loop.

🧯 Error handling & edge cases — Minor issues

The finding is confirmed. Line 454 unconditionally appends llm.Text(input) even when input == "", while session.go:66-68 guards with strings.TrimSpace(text) != "".


Verdict: Minor issues

Finding: Empty input with images injects a spurious llm.Text("") part

run/executor.go:454 — when input == "" (caller provides images but no prompt text), runAgent unconditionally prepends llm.Text(input) to the parts slice, producing a multimodal history message whose first element is an empty-string text part:

parts = append(parts, llm.Text(input))  // always, even when input == ""

The sibling path in run/session.go:66-68 — which performs the same image-folding job for mid-run AttachImages calls — explicitly guards against this:

if strings.TrimSpace(text) != "" {
    parts = append(parts, llm.Text(text))
}

An empty TextPart in a multimodal message is not universally tolerated: several LLM providers (Anthropic included) reject messages where a text block has an empty string. This would cause a runtime error only on the path len(images) > 0 && input == "", which is a valid call (images-only invocation with a system-prompt–only agent), but it's a latent edge-case failure.

Suggested fix — mirror the AttachImages guard:

if strings.TrimSpace(input) != "" {
    parts = append(parts, llm.Text(input))
}

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 56s

<!-- gadfly-review:ollama:claude-code/sonnet --> ### 🪰 Gadfly review — `claude-code/sonnet` (claude-code) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> I have enough information to give a definitive verdict. --- **No material issues found** The `runAgent` helper takes `llm.ImagePart` values (bytes-only; `pkg/imageutil` is the upstream downloader documented in the `Images` field comment at `tool/registry.go:413-415`) and folds them into the initial history message via `agent.WithHistory`. From the security lens: - **No new attack surface**: `inv.Images` was already being carried in `tool.Invocation` before this PR; the bytes just weren't reaching the model. The trust boundary and any MIME-type/size validation live upstream in the callers that populate `Images` — this PR doesn't move or widen that boundary. - **`opts` slice safety**: `opts = append(opts, ...)` inside `runAgent` operates on the variadic copy, not the caller's backing array. No aliasing risk. - **No authz bypass via `WithHistory`**: The pre-seeded history message is treated as context, not a new turn, so it doesn't bypass dispatch-guard caps or budget gates wired into the agent at construction time (`agent.New(model, e.systemPrompt(ra), opts...)`). - **No injection vectors**: No SQL, shell, path traversal, or template expansion touches the image bytes or MIME strings in this change. MIME strings are passed to the majordomo provider layer; I couldn't read the majordomo source from this repo's context, but the concern (provider sending a MIME string verbatim in HTTP headers) is pre-existing and not introduced here. - **No credential leakage**: No new logging of image bytes or MIME types. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> Both findings verified against the code. Finding 1 is confirmed. Finding 2 requires majordomo source which is inaccessible from the allowed working directory. The draft's own reasoning for Finding 2 acknowledges it couldn't verify the majordomo contract and frames it as "unverified" — this is correctly a speculative finding, not a confirmed code defect. **Verdict: Minor issues** --- ### Finding 1 — `llm.Text(input)` always emitted, even when `input == ""` `runAgent` (`executor.go:453–454`) unconditionally prepends `llm.Text(input)` to the parts slice whenever images are present: ```go parts = append(parts, llm.Text(input)) for _, img := range images { parts = append(parts, img) } ``` The analogous multimodal helper in the same package — `runSession.AttachImages` (`session.go:66–68`) — guards against this: ```go if strings.TrimSpace(text) != "" { parts = append(parts, llm.Text(text)) } ``` If a caller supplies `Images` but leaves the `input` string empty, `runAgent` builds a `UserParts(llm.Text(""), image…)` message with a spurious empty text part. Several providers (e.g. Anthropic) reject or warn on empty text blocks inside a `content` array. The fix is to apply the same guard used in `session.go`. </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> Both call sites and the comment are confirmed in the actual source. The only `runAgent` call is at line 323 (bare value, not a slice expansion — the draft's "Verified" note cites 319 by mistake, but that line is the `agent.New` call; the factual safety claim holds). The comment at lines 444–448 is exactly as described. Both findings survive. --- **Minor issues** The extraction is clean overall — `runAgent` is a sensible factoring, the tests follow established patterns in the package, and the image-less path is unchanged. Two items worth flagging: --- - **`opts` mutation footgun** — `run/executor.go:458` ```go opts = append(opts, agent.WithHistory([]llm.Message{llm.UserParts(parts...)})) ``` `opts` is a variadic parameter. In Go, when the caller expands a slice with `slice...`, the variadic parameter shares the caller's backing array. If there is spare capacity, `append` here writes into that array and silently corrupts the caller's slice. The current sole call site is safe — it passes a single element literal, so Go allocates a fresh 1-element slice and `append` triggers a reallocation — but the function's signature makes `runAgent(ctx, ag, input, images, myOpts...)` look entirely natural, and any future caller with a pre-allocated-or-resliced opts slice would be silently bitten. Fix: cap-limit the slice before appending to guarantee a new allocation regardless of call site: ```go opts = append(opts[:len(opts):len(opts)], agent.WithHistory([]llm.Message{llm.UserParts(parts...)})) ``` --- - **Comment over-explains the implementation** — `run/executor.go:444–448` The five-line block comment restates the mechanics the code itself shows clearly (build `parts`, seed via `WithHistory`, pass empty string to `Run`). The non-obvious WHY — that majordomo's `Run` is text-only — warrants a comment, but the rest is paraphrase. Per the repo's house style (CLAUDE.md: "Only add [a comment] when the WHY is non-obvious"), the comment could be trimmed to a single line capturing just the constraint: ```go // majordomo.Run is text-only; inject images as a synthetic history message. ``` The "Mirrors mort agentexec" note has value as an architectural cross-reference but would fit better in the PR description than in a package-level comment (the `agentexec` package is not visible from this module). </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> Now I have enough to render a verdict. --- **No material issues found** - **Fast path is zero-overhead.** The `len(images) == 0` guard at `executor.go:450` short-circuits immediately for text-only runs (the common case), so the existing hot path is untouched. - **`parts` slice is correctly pre-sized.** `make([]llm.Part, 0, len(images)+1)` at line 453 allocates exactly the right capacity; no growth reallocations occur inside the loop. - **`opts` append is safe and bounded.** The call site (`executor.go:323`) passes a single variadic arg, so Go allocates a fresh `[]agent.RunOption` with `len == cap == 1`. The `append` at line 458 therefore always allocates a new backing array of size 2 — one small allocation per multimodal run, not per step or per tool call. Given this wraps an LLM network round-trip, it is inconsequential. - **`llm.ImagePart` loop copies only slice headers.** `Data []byte` is a slice header (pointer + len + cap); the underlying pixel bytes are not copied, so the O(n-images) loop is as cheap as possible. - No unbounded growth, no N+1 patterns, no work added inside the per-step loop. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> The finding is confirmed. Line 454 unconditionally appends `llm.Text(input)` even when `input == ""`, while `session.go:66-68` guards with `strings.TrimSpace(text) != ""`. --- **Verdict: Minor issues** **Finding:** Empty `input` with images injects a spurious `llm.Text("")` part `run/executor.go:454` — when `input == ""` (caller provides images but no prompt text), `runAgent` unconditionally prepends `llm.Text(input)` to the parts slice, producing a multimodal history message whose first element is an empty-string text part: ```go parts = append(parts, llm.Text(input)) // always, even when input == "" ``` The sibling path in `run/session.go:66-68` — which performs the same image-folding job for mid-run `AttachImages` calls — explicitly guards against this: ```go if strings.TrimSpace(text) != "" { parts = append(parts, llm.Text(text)) } ``` An empty `TextPart` in a multimodal message is not universally tolerated: several LLM providers (Anthropic included) reject messages where a text block has an empty string. This would cause a runtime error only on the path `len(images) > 0 && input == ""`, which is a valid call (images-only invocation with a system-prompt–only agent), but it's a latent edge-case failure. **Suggested fix** — mirror the `AttachImages` guard: ```go if strings.TrimSpace(input) != "" { parts = append(parts, llm.Text(input)) } ``` </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 56s</sub>

🪰 Gadfly review — claude-code/opus:max (claude-code)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

I've verified the key security-relevant facts against the checked-out code. My findings through the Security lens:

VERDICT: No material issues found

The change is a minimal, faithful passthrough of already-materialized image bytes into the model's opening turn. Nothing in the Security lane is materially wrong.

What I checked and why it's clean:

  • No SSRF introduced. tool.Invocation.Images is []llm.ImagePart, and the field comment (tool/registry.go:410-416) plus the test (llm.ImagePart{MIME, Data: []byte(...)}) confirm majordomo image parts are bytes-only — no URL field. Any network fetch happens host-side in pkg/imageutil before the invocation reaches runAgent; this PR never dereferences a caller-supplied URL. Verified by reading run/executor.go:449-459 and the ImagePart usage sites.

  • No injection (SQL/command/path/template). The image parts are placed into an llm.Message struct via llm.UserParts(...) (run/executor.go:453-458) — never concatenated into a query, shell command, filesystem path, or template. Same handling pattern already used by the vetted runSession.AttachImages (run/session.go:64-76).

  • No secret/credential leakage via audit. marshalMessages (model/call.go:329-348) records only ImageCount, not the raw image bytes, so seeding images into history doesn't dump untrusted/PII blob data into the run trace.

  • No authz/guardrail bypass between the two paths. Both branches run the same ag (built with the same systemPrompt and tool/step-limit options at run/executor.go:291-319) and the same WithSteer. The image path only swaps the text input arg for a WithHistory-seeded user message; it does not skip any auth, permission, or tool-gating step. Verified by reading the surrounding Run body.

I considered the prompt-injection-via-image angle (untrusted image could embed instructions), but inlining the bytes is the only way to do multimodal and is a model-provider concern, not a defect this code introduces or could mitigate — flagging it would be noise.

Outside my lens: there's no bound on the number/size of inv.Images folded into the turn, but that's a resource/DoS concern for the performance lens, not security.

🎯 Correctness — Minor issues

I've confirmed the code-level facts I can reach. Finding 1's divergence is directly verifiable in the source; Finding 2 depends on majordomo's internal Run/WithHistory behavior, which is outside the working dir and unreachable (module cache and go build/go doc are blocked here) — and the draft itself admits it could not be verified.


VERDICT: Minor issues

Findings (Correctness lens)

  • run/executor.go:454 — an empty text part is emitted when input == "", diverging from the sibling helper that deliberately guards against it. runAgent unconditionally does parts = append(parts, llm.Text(input)) before appending images. The sibling runSession.AttachImages (run/session.go:64-76) performs the identical construction but guards it (if strings.TrimSpace(text) != "" { parts = append(parts, llm.Text(text)) }) and additionally bails when len(parts) == 0. So for an image-only invocation (inv.Images set, empty prompt — a legitimate multimodal case the guarded sibling was written for), this path seeds the opening user message with an empty llm.Text("") block, which some providers reject.
    • Verified: by reading both functions and contrasting them directly (run/executor.go:449-460 vs run/session.go:64-76).
    • Fix: mirror the guard — only prepend the text part when strings.TrimSpace(input) != "".
🧹 Code cleanliness & maintainability — Minor issues

Both sites confirmed. The duplication finding is verified: run/executor.go:453-458 and run/session.go:65-75 are the only two make([]llm.Part sites in the repo, both in package run, and no shared helper exists. The empty-text drift is also real — session.go:66 guards strings.TrimSpace(text) != "" while runAgent unconditionally appends llm.Text(input) at executor.go:454.

VERDICT: Minor issues

  • Duplicated multimodal-message construction — should be a shared helper (run/executor.go:453-458 vs run/session.go:65-75). runAgent builds a []llm.Part from text + images and wraps it in llm.UserParts(...) with the exact shape already implemented in runSession.AttachImages: make([]llm.Part, 0, len(images)+1) → append llm.Text(...) → loop-append images → llm.UserParts(parts...). These are the only two make([]llm.Part sites in the repo and both live in package run, so a small unexported helper (e.g. func userMessage(text string, images []llm.ImagePart) llm.Message) would dedupe both with zero new surface. Impact: two copies of the same construction will drift.

    The drift has already started: AttachImages guards an empty/whitespace text part (if strings.TrimSpace(text) != "" { ... }, session.go:66) whereas runAgent unconditionally appends llm.Text(input) (executor.go:454) — so the images path emits an empty leading text part when input == "". Two sibling builders of "the initial user message" now disagree on the empty-text case. Folding both onto one helper fixes the inconsistency and is the low-churn maintainability fix.

Everything else in this lane is clean:

  • runAgent is well-named, single-purpose, and the early-return-on-empty-images shape reads clearly; the doc comment is accurate and matches the house // Why: style.
  • Re-assigning the variadic opts via append before the second Run call is idiomatic Go, not a smell.
  • The two tests are near-parallel in setup, but that's ordinary table-style test boilerplate and not worth churn to factor out.
Performance — No material issues found

I've verified the change. The call site (executor.go:323) runs once per run, not in any loop, and the helper itself is trivially bounded.

VERDICT: No material issues found

Performance assessment of runAgent (run/executor.go:449-459):

  • Allocation is correctly bounded and pre-sized. make([]llm.Part, 0, len(images)+1) (line 453) pre-allocates exact capacity, so the append loop never reallocates. Verified.
  • No deep copy of image payloads. The loop appends img (an llm.ImagePart struct) by value — only the struct header (incl. the Data []byte slice header) is copied, not the underlying bytes. So large image buffers are not duplicated. Verified by reading the helper.
  • Not a hot path. The whole helper executes once per Executor.Run (line 323), i.e. once at the start of an agent run, gated behind a model round-trip that dwarfs any of this setup cost. The per-run cost is O(number of images), which is the input itself.
  • Image-less path is zero-overhead. len(images) == 0 short-circuits straight to ag.Run(ctx, input, opts...) with no extra allocation — the text-only common case is unchanged. Verified.
  • opts = append(...) (line 458) appends to the locally-received variadic slice, not a shared one; no aliasing growth concern.

No N+1, no work inside a hot loop, no unbounded growth, no blocking call introduced, no avoidable copy. The change is clean through the performance lens.

🧯 Error handling & edge cases — No material issues found

Both draft findings rest entirely on majordomo-internal behavior the draft author explicitly could not verify, and which I also cannot read (the module cache lives outside the sandboxed working directory and there is no vendor/ tree). I confirmed the executus-side code matches the diff exactly (run/executor.go:444-460), that inv.Images is []llm.ImagePart (tool/registry.go:416), and that the call site is run/executor.go:323. But neither finding can be positively confirmed:

  • Finding 1 (empty text part): The code does unconditionally append llm.Text(input), but the claimed harm depends on (a) a host actually invoking Executor.Run with an empty input plus images — no caller in the repo does this; the input is always a non-empty prompt — and (b) majordomo's provider layer rejecting/not-stripping empty text blocks, which the draft itself marks unverified. Unconfirmable defect.

  • Finding 2 (duplicate opening turn): Explicitly self-labeled "(Unverified)" and hinges wholly on how majordomo's agent.Run treats an empty input arg. Unreadable, unconfirmable.

Neither survives the gate.

VERDICT: No material issues found

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 23s

<!-- gadfly-review:ollama:claude-code/opus:max --> ### 🪰 Gadfly review — `claude-code/opus:max` (claude-code) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> I've verified the key security-relevant facts against the checked-out code. My findings through the Security lens: ## VERDICT: No material issues found The change is a minimal, faithful passthrough of already-materialized image bytes into the model's opening turn. Nothing in the Security lane is materially wrong. What I checked and why it's clean: - **No SSRF introduced.** `tool.Invocation.Images` is `[]llm.ImagePart`, and the field comment (`tool/registry.go:410-416`) plus the test (`llm.ImagePart{MIME, Data: []byte(...)}`) confirm majordomo image parts are **bytes-only** — no URL field. Any network fetch happens host-side in `pkg/imageutil` *before* the invocation reaches `runAgent`; this PR never dereferences a caller-supplied URL. Verified by reading `run/executor.go:449-459` and the `ImagePart` usage sites. - **No injection (SQL/command/path/template).** The image parts are placed into an `llm.Message` struct via `llm.UserParts(...)` (`run/executor.go:453-458`) — never concatenated into a query, shell command, filesystem path, or template. Same handling pattern already used by the vetted `runSession.AttachImages` (`run/session.go:64-76`). - **No secret/credential leakage via audit.** `marshalMessages` (`model/call.go:329-348`) records only `ImageCount`, not the raw image bytes, so seeding images into history doesn't dump untrusted/PII blob data into the run trace. - **No authz/guardrail bypass between the two paths.** Both branches run the *same* `ag` (built with the same `systemPrompt` and tool/step-limit options at `run/executor.go:291-319`) and the same `WithSteer`. The image path only swaps the text input arg for a `WithHistory`-seeded user message; it does not skip any auth, permission, or tool-gating step. Verified by reading the surrounding `Run` body. I considered the prompt-injection-via-image angle (untrusted image could embed instructions), but inlining the bytes is the only way to do multimodal and is a model-provider concern, not a defect this code introduces or could mitigate — flagging it would be noise. Outside my lens: there's no bound on the number/size of `inv.Images` folded into the turn, but that's a resource/DoS concern for the performance lens, not security. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> I've confirmed the code-level facts I can reach. Finding 1's divergence is directly verifiable in the source; Finding 2 depends on majordomo's internal `Run`/`WithHistory` behavior, which is outside the working dir and unreachable (module cache and `go build`/`go doc` are blocked here) — and the draft itself admits it could not be verified. --- **VERDICT: Minor issues** ### Findings (Correctness lens) - **`run/executor.go:454` — an empty text part is emitted when `input == ""`, diverging from the sibling helper that deliberately guards against it.** `runAgent` unconditionally does `parts = append(parts, llm.Text(input))` before appending images. The sibling `runSession.AttachImages` (`run/session.go:64-76`) performs the identical construction but guards it (`if strings.TrimSpace(text) != "" { parts = append(parts, llm.Text(text)) }`) and additionally bails when `len(parts) == 0`. So for an image-only invocation (`inv.Images` set, empty prompt — a legitimate multimodal case the guarded sibling was written for), this path seeds the opening user message with an empty `llm.Text("")` block, which some providers reject. - *Verified:* by reading both functions and contrasting them directly (`run/executor.go:449-460` vs `run/session.go:64-76`). - *Fix:* mirror the guard — only prepend the text part when `strings.TrimSpace(input) != ""`. </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — Minor issues</summary> Both sites confirmed. The duplication finding is verified: `run/executor.go:453-458` and `run/session.go:65-75` are the only two `make([]llm.Part` sites in the repo, both in package `run`, and no shared helper exists. The empty-text drift is also real — `session.go:66` guards `strings.TrimSpace(text) != ""` while `runAgent` unconditionally appends `llm.Text(input)` at `executor.go:454`. **VERDICT: Minor issues** - **Duplicated multimodal-message construction — should be a shared helper** (`run/executor.go:453-458` vs `run/session.go:65-75`). `runAgent` builds a `[]llm.Part` from `text + images` and wraps it in `llm.UserParts(...)` with the exact shape already implemented in `runSession.AttachImages`: `make([]llm.Part, 0, len(images)+1)` → append `llm.Text(...)` → loop-append images → `llm.UserParts(parts...)`. These are the only two `make([]llm.Part` sites in the repo and both live in package `run`, so a small unexported helper (e.g. `func userMessage(text string, images []llm.ImagePart) llm.Message`) would dedupe both with zero new surface. Impact: two copies of the same construction will drift. The drift has already started: `AttachImages` guards an empty/whitespace text part (`if strings.TrimSpace(text) != "" { ... }`, session.go:66) whereas `runAgent` unconditionally appends `llm.Text(input)` (executor.go:454) — so the images path emits an empty leading text part when `input == ""`. Two sibling builders of "the initial user message" now disagree on the empty-text case. Folding both onto one helper fixes the inconsistency and is the low-churn maintainability fix. Everything else in this lane is clean: - `runAgent` is well-named, single-purpose, and the early-return-on-empty-images shape reads clearly; the doc comment is accurate and matches the house `// Why:` style. - Re-assigning the variadic `opts` via `append` before the second `Run` call is idiomatic Go, not a smell. - The two tests are near-parallel in setup, but that's ordinary table-style test boilerplate and not worth churn to factor out. </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> I've verified the change. The call site (`executor.go:323`) runs **once per run**, not in any loop, and the helper itself is trivially bounded. ## VERDICT: No material issues found Performance assessment of `runAgent` (`run/executor.go:449-459`): - **Allocation is correctly bounded and pre-sized.** `make([]llm.Part, 0, len(images)+1)` (line 453) pre-allocates exact capacity, so the append loop never reallocates. Verified. - **No deep copy of image payloads.** The loop appends `img` (an `llm.ImagePart` struct) by value — only the struct header (incl. the `Data []byte` slice header) is copied, not the underlying bytes. So large image buffers are not duplicated. Verified by reading the helper. - **Not a hot path.** The whole helper executes once per `Executor.Run` (line 323), i.e. once at the start of an agent run, gated behind a model round-trip that dwarfs any of this setup cost. The per-run cost is O(number of images), which is the input itself. - **Image-less path is zero-overhead.** `len(images) == 0` short-circuits straight to `ag.Run(ctx, input, opts...)` with no extra allocation — the text-only common case is unchanged. Verified. - **`opts = append(...)`** (line 458) appends to the locally-received variadic slice, not a shared one; no aliasing growth concern. No N+1, no work inside a hot loop, no unbounded growth, no blocking call introduced, no avoidable copy. The change is clean through the performance lens. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> Both draft findings rest entirely on majordomo-internal behavior the draft author explicitly could not verify, and which I also cannot read (the module cache lives outside the sandboxed working directory and there is no `vendor/` tree). I confirmed the executus-side code matches the diff exactly (`run/executor.go:444-460`), that `inv.Images` is `[]llm.ImagePart` (`tool/registry.go:416`), and that the call site is `run/executor.go:323`. But neither finding can be positively confirmed: - **Finding 1 (empty text part)**: The code does unconditionally append `llm.Text(input)`, but the claimed harm depends on (a) a host actually invoking `Executor.Run` with an empty `input` plus images — no caller in the repo does this; the input is always a non-empty prompt — and (b) majordomo's provider layer rejecting/not-stripping empty text blocks, which the draft itself marks unverified. Unconfirmable defect. - **Finding 2 (duplicate opening turn)**: Explicitly self-labeled "(Unverified)" and hinges wholly on how majordomo's `agent.Run` treats an empty `input` arg. Unreadable, unconfirmable. Neither survives the gate. ## VERDICT: No material issues found </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 23s</sub>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling

🔒 Security — No material issues found

🔒 Security review

VERDICT: No material issues found

Nothing in this PR opens a new security surface through the security lens. Verified by reading run/executor.go:449-460, run/session.go:64-76, tool/registry.go:410-416, and run/executor.go:183-323:

  • Tainted data flow is unchanged in shape. inv.Images (tool.Invocation.Images) was already a caller-supplied []llm.ImagePart reaching the model via runSession.AttachImages and the steer mailbox. This PR moves the same bytes into the initial turn via agent.WithHistory + llm.UserParts — the destination (model input) and trust boundary (caller-supplied, already in-scope) are identical. No new entry point for attacker-controlled data.
  • No new injection / SSRF / deserialization surface. No URL fetching, no path parsing, no shell, no template, no SQL. Image bytes are passed through opaquely to the LLM provider, as before.
  • No authn/authz gap introduced. CallerIsAdmin gating is untouched. The text-less Run call with images still hits the same ag.Run code path that already enforces upstream permissions.
  • Option-order safety. opts = append(opts, agent.WithHistory(...)) happens after the caller-supplied agent.WithSteer(steer) is captured, so WithSteer cannot be silently shadowed by history seeding. (Verified at run/executor.go:323 vs run/executor.go:458.)
  • Untrusted-image-content → prompt-injection is a pre-existing property of the multimodal pipeline (also reachable via the existing AttachImages/steerMailbox path) and is not introduced or worsened by this PR; flagging it here would duplicate the existing accepted risk, not surface a new one.

Outside my lens: nothing severe enough to escalate.

🎯 Correctness — No material issues found

VERDICT: No material issues found

Findings

None within the Correctness lens.

Verification summary

  • run/executor.go:449-459 (runAgent helper). Reads correctly: when len(images) == 0 the prompt passes through unchanged as ag.Run(ctx, input, opts...) (text-only regression guard preserved). When images are present, the helper builds []llm.Part of length len(images)+1, prepends llm.Text(input), appends each llm.ImagePart, seeds history via agent.WithHistory([]llm.Message{llm.UserParts(parts...)}), and calls ag.Run(ctx, "", opts...). The empty-string input alongside WithHistory is the canonical majordomo multimodal-opening pattern and is consistent with run/session.go:64-76 (runSession.AttachImages), which already builds []llm.Part{llm.Text(...), images...} and pushes via llm.UserParts(...).
  • API surface. agent.WithHistory, agent.RunOption, llm.UserParts, llm.Text, llm.Part, llm.ImagePart are all already used elsewhere in this repo (verified via grep on compact/, model/, run/session.go, tool/registry.go, examples/), so the helper's calls match the established Majordomo API contract.
  • Domain semantics — multimodal opening turn. Re-derived from the codebase: tool.Invocation.Images is documented (tool/registry.go:410-416) as "multi-modal image content for the initial user message" with llm.UserParts(text + image parts). The compactor's image handling (compact/compactor.go:289) and the model call's image serialization (model/call.go:337) both treat llm.ImagePart inside m.Parts as the on-the-wire image payload, which is exactly what UserParts(text, image...) produces and what the test asserts for calls[0].Request.Messages. The "1 image ≈ 1K tokens" estimate in compact/compactor.go:279-298 is unaffected by where the image appears in the history, so the token-budget accounting remains correct.
  • Order — text first, then images. Text preceding images matches the convention in runSession.AttachImages (run/session.go:65-71) and the documentation comment in tool/registry.go:412-413. Putting text first also matches typical provider expectations for OpenAI-/Anthropic-style multimodal turns.
  • Edge cases. Empty input with images still produces a valid user message (a Text("") part is harmless alongside images). Empty images slice short-circuits to the legacy path. A nil inv.Images is len(...) == 0 and takes the legacy path — no nil dereference.
  • run/images_test.go. Test assertions inspect calls[0].Request.Messages[*].Parts, type-switching on llm.ImagePart and llm.TextPart — both are existing types in the repo (compact/compactor.go:289-297, model/call.go:337, run/session.go:64). fp.Calls() shape is consistent with the existing fake provider usage in the suite.

Outside my lens: nothing to add — no other lens observations in this review.

🧹 Code cleanliness & maintainability — No material issues found

VERDICT: No material issues found

  • Pattern consistency with existing code (verified at run/session.go:64-76): the new runAgent mirrors runSession.AttachImages closely — same []llm.Part of llm.Text + image parts fed into llm.UserParts. The reuse of the established idiom is exactly the right call for cleanliness.
  • Helper extraction is appropriate (verified at run/executor.go:323): the new runAgent is small, single-purpose, named after its dispatch role, and the doc comment explains the why (majordomo's text-only Run input) rather than restating the what. The image-less fast path keeps opts... propagation clean via the early return at line 451.
  • Capacity hint is correct: make([]llm.Part, 0, len(images)+1) matches the known composition (one text + N images).
  • Test file (run/images_test.go): no copy-paste beyond what's needed for two distinct cases; test names encode intent (FoldsInitialImages / TextOnlyUnchanged); uses the existing fake provider the suite already standardizes on.
  • One unverified nit (do not block): runSession.AttachImages guards with strings.TrimSpace(text) != "" before adding the text part; runAgent does not. With an empty input and non-empty images you'd push a llm.Text("") part. This is inconsistent with the established pattern in session.go but unlikely to cause real breakage (providers tolerate empty text parts). If you want strict parity, drop the llm.Text(input) append when strings.TrimSpace(input) == "".

Nothing else in the cleanliness/maintainability lane is materially wrong: no dead code, no duplicated logic that should be shared (the helper itself is the de-duplication), no confusing names, no leaky abstractions.

Performance — No material issues found

VERDICT: No material issues found

Through the performance lens:

  • No hot-loop work. runAgent is invoked exactly once per Executor.Run (run/executor.go:323), before the agent loop starts. No per-step overhead added.
  • No unnecessary allocations. The image path sizes the parts slice exactly (make([]llm.Part, 0, len(images)+1) at run/executor.go:453); image bytes are appended by value-copy of the llm.ImagePart struct (containing the byte-slice header), no deep copy of image data.
  • Short-circuit preserved. The image-less path (run/executor.go:450–452) returns immediately with no allocation and the original ag.Run(ctx, input, opts...) call site — no regression for the text-only common case.
  • No quadratic behavior, no blocking I/O, no unbounded growth. Image bytes flow into a single first-turn user message via WithHistory; subsequent turns remain on the existing WithSteer mailbox mechanism.
  • opts aliasing is safe at the call site. opts = append(opts, agent.WithHistory(...)) at run/executor.go:458 mutates the underlying slice, but the caller (run/executor.go:323) passes a freshly-built single-element slice (agent.WithSteer(steer)), so there is no caller-visible aliasing risk.
🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

  • run/executor.go:449-460 — empty-input + images produces an empty text part. Confirmed against the code: parts := make([]llm.Part, 0, len(images)+1); parts = append(parts, llm.Text(input)) unconditionally appends llm.Text(input) even when input == "", so an empty prompt + non-empty inv.Images produces a UserParts(llm.Text(""), img1, img2, ...) opening turn. This is inconsistent with the existing pattern in run/session.go:64-76, where runSession.AttachImages explicitly guards with if strings.TrimSpace(text) != "" before adding the text part. The provider-tolerance impact remains unverified (no provider exercised here), but the inconsistency with the established helper in the same package is a real, low-risk code smell. Suggested fix: mirror the session.go guard — if strings.TrimSpace(input) != "" { parts = append(parts, llm.Text(input)) }.

  • run/executor.go:454-457 — no validation of llm.ImagePart.MIME / Data before they reach the model. Confirmed against the code: the for _, img := range images { parts = append(parts, img) } loop copies every entry through without checking MIME != "" or Data != nil. A zero-value ImagePart on tool.Invocation.Images will now flow into the first model request (where previously images were dropped on the executor path). The new TestExecutorFoldsInitialImages only exercises a well-formed image/png + PNGDATA payload, so this case is unguarded. Provider-tolerance impact unverified, but the helper is the new wire egress point and a one-line zero-value guard (if img.MIME == "" || len(img.Data) == 0 { continue } or similar) would be cheap insurance.

Verified (no action)

  • run/executor.go:323 — call site change matches the diff; agent.WithSteer(steer) is forwarded via opts... in both branches of runAgent.
  • run/executor.go:450len(images) == 0 correctly handles both nil and []ImagePart{}; the text-only branch is unchanged.
  • run/images_test.go — both new tests fail loudly (t.Fatal) when fp.Calls() is empty, and the regression-guard assertion structure looks sound.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 24s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Minor issues** — 5 reviewers: security, correctness, maintainability, performance, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## 🔒 Security review **VERDICT: No material issues found** Nothing in this PR opens a new security surface through the security lens. Verified by reading `run/executor.go:449-460`, `run/session.go:64-76`, `tool/registry.go:410-416`, and `run/executor.go:183-323`: - **Tainted data flow is unchanged in shape.** `inv.Images` (`tool.Invocation.Images`) was already a caller-supplied `[]llm.ImagePart` reaching the model via `runSession.AttachImages` and the steer mailbox. This PR moves the same bytes into the initial turn via `agent.WithHistory` + `llm.UserParts` — the *destination* (model input) and *trust boundary* (caller-supplied, already in-scope) are identical. No new entry point for attacker-controlled data. - **No new injection / SSRF / deserialization surface.** No URL fetching, no path parsing, no shell, no template, no SQL. Image bytes are passed through opaquely to the LLM provider, as before. - **No authn/authz gap introduced.** `CallerIsAdmin` gating is untouched. The text-less `Run` call with images still hits the same `ag.Run` code path that already enforces upstream permissions. - **Option-order safety.** `opts = append(opts, agent.WithHistory(...))` happens *after* the caller-supplied `agent.WithSteer(steer)` is captured, so `WithSteer` cannot be silently shadowed by history seeding. (Verified at `run/executor.go:323` vs `run/executor.go:458`.) - **Untrusted-image-content → prompt-injection** is a *pre-existing* property of the multimodal pipeline (also reachable via the existing `AttachImages`/`steerMailbox` path) and is not introduced or worsened by this PR; flagging it here would duplicate the existing accepted risk, not surface a new one. Outside my lens: nothing severe enough to escalate. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> ## VERDICT: No material issues found ### Findings None within the Correctness lens. ### Verification summary - **`run/executor.go:449-459` (`runAgent` helper).** Reads correctly: when `len(images) == 0` the prompt passes through unchanged as `ag.Run(ctx, input, opts...)` (text-only regression guard preserved). When images are present, the helper builds `[]llm.Part` of length `len(images)+1`, prepends `llm.Text(input)`, appends each `llm.ImagePart`, seeds history via `agent.WithHistory([]llm.Message{llm.UserParts(parts...)})`, and calls `ag.Run(ctx, "", opts...)`. The empty-string input alongside `WithHistory` is the canonical majordomo multimodal-opening pattern and is consistent with `run/session.go:64-76` (`runSession.AttachImages`), which already builds `[]llm.Part{llm.Text(...), images...}` and pushes via `llm.UserParts(...)`. - **API surface.** `agent.WithHistory`, `agent.RunOption`, `llm.UserParts`, `llm.Text`, `llm.Part`, `llm.ImagePart` are all already used elsewhere in this repo (verified via grep on `compact/`, `model/`, `run/session.go`, `tool/registry.go`, `examples/`), so the helper's calls match the established Majordomo API contract. - **Domain semantics — multimodal opening turn.** Re-derived from the codebase: `tool.Invocation.Images` is documented (`tool/registry.go:410-416`) as "multi-modal image content for the initial user message" with `llm.UserParts(text + image parts)`. The compactor's image handling (`compact/compactor.go:289`) and the model call's image serialization (`model/call.go:337`) both treat `llm.ImagePart` inside `m.Parts` as the on-the-wire image payload, which is exactly what `UserParts(text, image...)` produces and what the test asserts for `calls[0].Request.Messages`. The "1 image ≈ 1K tokens" estimate in `compact/compactor.go:279-298` is unaffected by where the image appears in the history, so the token-budget accounting remains correct. - **Order — text first, then images.** Text preceding images matches the convention in `runSession.AttachImages` (`run/session.go:65-71`) and the documentation comment in `tool/registry.go:412-413`. Putting text first also matches typical provider expectations for OpenAI-/Anthropic-style multimodal turns. - **Edge cases.** Empty `input` with images still produces a valid user message (a `Text("")` part is harmless alongside images). Empty `images` slice short-circuits to the legacy path. A `nil` `inv.Images` is `len(...) == 0` and takes the legacy path — no nil dereference. - **`run/images_test.go`.** Test assertions inspect `calls[0].Request.Messages[*].Parts`, type-switching on `llm.ImagePart` and `llm.TextPart` — both are existing types in the repo (`compact/compactor.go:289-297`, `model/call.go:337`, `run/session.go:64`). `fp.Calls()` shape is consistent with the existing `fake` provider usage in the suite. Outside my lens: nothing to add — no other lens observations in this review. </details> <details><summary><b>🧹 Code cleanliness & maintainability</b> — No material issues found</summary> **VERDICT: No material issues found** - **Pattern consistency with existing code** (verified at `run/session.go:64-76`): the new `runAgent` mirrors `runSession.AttachImages` closely — same `[]llm.Part` of `llm.Text` + image parts fed into `llm.UserParts`. The reuse of the established idiom is exactly the right call for cleanliness. - **Helper extraction is appropriate** (verified at `run/executor.go:323`): the new `runAgent` is small, single-purpose, named after its dispatch role, and the doc comment explains the *why* (majordomo's text-only `Run` input) rather than restating the *what*. The image-less fast path keeps `opts...` propagation clean via the early return at line 451. - **Capacity hint is correct**: `make([]llm.Part, 0, len(images)+1)` matches the known composition (one text + N images). - **Test file** (`run/images_test.go`): no copy-paste beyond what's needed for two distinct cases; test names encode intent (`FoldsInitialImages` / `TextOnlyUnchanged`); uses the existing `fake` provider the suite already standardizes on. - **One unverified nit (do not block)**: `runSession.AttachImages` guards with `strings.TrimSpace(text) != ""` before adding the text part; `runAgent` does not. With an empty `input` and non-empty `images` you'd push a `llm.Text("")` part. This is inconsistent with the established pattern in `session.go` but unlikely to cause real breakage (providers tolerate empty text parts). If you want strict parity, drop the `llm.Text(input)` append when `strings.TrimSpace(input) == ""`. Nothing else in the cleanliness/maintainability lane is materially wrong: no dead code, no duplicated logic that should be shared (the helper itself *is* the de-duplication), no confusing names, no leaky abstractions. </details> <details><summary><b>⚡ Performance</b> — No material issues found</summary> **VERDICT: No material issues found** Through the performance lens: - **No hot-loop work.** `runAgent` is invoked exactly once per `Executor.Run` (run/executor.go:323), before the agent loop starts. No per-step overhead added. - **No unnecessary allocations.** The image path sizes the parts slice exactly (`make([]llm.Part, 0, len(images)+1)` at run/executor.go:453); image bytes are appended by value-copy of the `llm.ImagePart` struct (containing the byte-slice header), no deep copy of image data. - **Short-circuit preserved.** The image-less path (run/executor.go:450–452) returns immediately with no allocation and the original `ag.Run(ctx, input, opts...)` call site — no regression for the text-only common case. - **No quadratic behavior, no blocking I/O, no unbounded growth.** Image bytes flow into a single first-turn user message via `WithHistory`; subsequent turns remain on the existing `WithSteer` mailbox mechanism. - **opts aliasing is safe at the call site.** `opts = append(opts, agent.WithHistory(...))` at run/executor.go:458 mutates the underlying slice, but the caller (run/executor.go:323) passes a freshly-built single-element slice (`agent.WithSteer(steer)`), so there is no caller-visible aliasing risk. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> **VERDICT:** Minor issues - **`run/executor.go:449-460` — empty-input + images produces an empty text part.** Confirmed against the code: `parts := make([]llm.Part, 0, len(images)+1); parts = append(parts, llm.Text(input))` unconditionally appends `llm.Text(input)` even when `input == ""`, so an empty prompt + non-empty `inv.Images` produces a `UserParts(llm.Text(""), img1, img2, ...)` opening turn. This is inconsistent with the existing pattern in `run/session.go:64-76`, where `runSession.AttachImages` explicitly guards with `if strings.TrimSpace(text) != ""` before adding the text part. The provider-tolerance impact remains unverified (no provider exercised here), but the inconsistency with the established helper in the same package is a real, low-risk code smell. Suggested fix: mirror the `session.go` guard — `if strings.TrimSpace(input) != "" { parts = append(parts, llm.Text(input)) }`. - **`run/executor.go:454-457` — no validation of `llm.ImagePart.MIME` / `Data` before they reach the model.** Confirmed against the code: the `for _, img := range images { parts = append(parts, img) }` loop copies every entry through without checking `MIME != ""` or `Data != nil`. A zero-value `ImagePart` on `tool.Invocation.Images` will now flow into the first model request (where previously images were dropped on the executor path). The new `TestExecutorFoldsInitialImages` only exercises a well-formed `image/png` + `PNGDATA` payload, so this case is unguarded. Provider-tolerance impact unverified, but the helper is the new wire egress point and a one-line zero-value guard (`if img.MIME == "" || len(img.Data) == 0 { continue }` or similar) would be cheap insurance. ### Verified (no action) - `run/executor.go:323` — call site change matches the diff; `agent.WithSteer(steer)` is forwarded via `opts...` in both branches of `runAgent`. - `run/executor.go:450` — `len(images) == 0` correctly handles both `nil` and `[]ImagePart{}`; the text-only branch is unchanged. - `run/images_test.go` — both new tests fail loudly (`t.Fatal`) when `fp.Calls()` is empty, and the regression-guard assertion structure looks sound. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 24s</sub>
steve added 1 commit 2026-06-28 05:11:17 +00:00
Every reviewer flagged that runAgent appended llm.Text(input) unconditionally, so
an image-only run (blank prompt) emitted an empty TextPart — inconsistent with the
sibling runSession.AttachImages which guards it. Mirror that guard
(strings.TrimSpace(input) != ""). Also:
- copy opts before appending (variadic backing array can have spare capacity; avoid
  aliasing a caller's slice).
- reword the doc comment to drop the mort-agentexec reference (executus is a
  standalone lib; a consumer name doesn't belong in its godoc).

Tests: image+text are co-located in ONE user message; an image-only run emits no
blank TextPart.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve merged commit f367796244 into main 2026-06-28 05:16:46 +00:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/executus#16