run: fold inv.Images into the initial user message (multimodal opening turn) #16
Reference in New Issue
Block a user
Delete Branch "feat/initial-images"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The executor passed only the text
inputto majordomo'sagent.Run, silently droppinginv.Images. majordomo'sRuninput arg is text-only — so any multimodal run (vision: a host's chatbot @mention / chat API) lost its images on the executus path even thoughtool.Invocationcarries anImagesfield.Fix
Fold the images into the first user message (text + image parts) via
agent.WithHistoryand callRunwith an empty input — the model then sees a multimodal opening turn. Extracted as a smallrunAgenthelper; the image-less path is unchanged (the prompt passes straight through as the text input). Mirrors mort agentexec's multimodal seeding pattern.Tests
TestExecutorFoldsInitialImages— a run withImagescarries the image bytes and the prompt text into the first recorded model request.TestExecutorTextOnlyUnchanged— the text-only path still reaches the model (regression guard).-racegreen; full suite passes. Unblocks the multimodal host surfaces (mort's chatbot @mention + chat API migrations).🤖 Generated with Claude Code
🪰 Gadfly — live review status
6/6 reviewers finished · updated 2026-06-28 04:44:13Z
claude-code/opus· claude-code — ✅ doneclaude-code/opus:max· claude-code — ✅ doneclaude-code/sonnet· claude-code — ✅ donedeepseek-v4-pro:cloud· ollama-cloud — ✅ doneglm-5.2:cloud· ollama-cloud — ✅ doneminimax-m3:cloud· ollama-cloud — ✅ doneLive status board. Findings are posted in each model's own comment. Advisory only — does not block merge.
🪰 Gadfly review —
deepseek-v4-pro:cloud(ollama-cloud)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
The change is a straightforward repackaging of already-in-memory data (
inv.Images+input) into a multimodal first user message viaagent.WithHistory. Through the security lens:inv.Imageswas already populated upstream (Discord attachments, URL downloads) before this PR; the function merely stops silently dropping it. The ingestion/validation boundary is unchanged.llm.Text(input)andllm.ImagePartstructs — no string concatenation, template expansion, SQL, command execution, or path construction.llm.UserPartsproduces a user-role message, so the history injection cannot escalate to system-role or override the system prompt set at agent construction time.The image data itself (size, MIME, content) is a model-provider concern, not something this pass-through helper can or should gate — and that boundary existed before this change.
🎯 Correctness — Minor issues
Both findings are confirmed against the actual code. The draft review is accurate as written.
Verdict: Minor issues
Missing empty-text guard in
runAgent(inconsistent withrunSession.AttachImages) —run/executor.go:454. Wheninputis""(or whitespace-only) butimagesis non-empty,runAgentunconditionally prependsllm.Text(input), producing aTextPartwith empty content. The sibling functionrunSession.AttachImages(run/session.go:66) explicitly guards against this withif strings.TrimSpace(text) != ""before adding the text part. Many LLM APIs reject multimodal user messages that contain an empty/whitespace-only text part alongside images. While the current call site (executor.go:323) passes the user's prompt asinput(unlikely to be empty in practice), the function should match the established guard pattern for correctness. Note:stringsis not currently imported inexecutor.goand would need to be added.Suggested fix: Add the same guard:
Test
TestExecutorFoldsInitialImagesdoesn't verify co-location of text and image in the same message —run/images_test.go:40-54. The test iterates over all messages and all parts, settingsawImageandsawTextindependently. It would pass even if the image and text landed in separate messages, which would defeat the purpose of folding them into a single multimodal opening turn. This is a test-correctness gap: the test asserts the wrong property.🧹 Code cleanliness & maintainability — Minor issues
Verdict: Minor issues
Duplicated multimodal message construction (
run/executor.go:453–458andrun/session.go:65–75): both sites build a[]llm.Partfrom text + images and wrap it inllm.UserParts. The logic is similar (4–5 lines each), though not identical —session.goconditionally skips empty text and has an early-return guard, whileexecutor.goalways appends the text part. If the Part-construction pattern ever changes (new part type, ordering convention, etc.), both sites need updating. A small shared helper would eliminate the duplication.Test package inconsistency (
run/images_test.go:1vsrun/executor_test.go:1): the new tests usepackage run_test(black-box) while the existing tests in the same directory usepackage run(white-box). This forces the new tests to duplicate setup boilerplate (fake.New,fp.Model,run.New) thatexecutor_test.goalready provides via the unexportedfakeModelshelper. Either switching topackage runto share the helper, or extractingfakeModelsinto an exported test utility, would reduce duplication.Cross-repo reference in doc comment (
run/executor.go:448): "Mirrors mort agentexec's multimodal seeding" references a pattern in another repository. The comment would be more self-contained and maintainable without the cross-repo aside, which may drift or become meaningless to future readers of this codebase alone.⚡ Performance — No material issues found
Verdict: No material issues found
The change is clean from a performance standpoint:
len(images) == 0returnsag.Run(ctx, input, opts...)directly — zero overhead added, no extra allocation, no indirection cost.partsslice is pre-sized exactly (len(images)+1), so theappendloop never reallocates. Theopts = append(opts, ...)may allocate a new 2-element backing array (since the caller passes a single variadicagent.WithSteer(steer), giving cap=1), but that's a one-time, tiny allocation per run — negligible.for _, img := range imagescopiesllm.ImagePartby value. SinceImagePart.Datais a[]byte(slice header, not the backing array), this is O(number of images), not O(total bytes). No deep copy of image payloads.runAgentis called exactly once perExecutor.Run, not inside any iteration. No N+1 queries, no unbounded growth, no quadratic behavior, no blocking calls introduced.Nothing material to flag.
🧯 Error handling & edge cases — No material issues found
Verdict: No material issues found
The
runAgenthelper is straightforward and handles its primary edge case —len(images) == 0(both nil and empty slices) — correctly by falling through to the existingag.Run(ctx, input, opts...)path. No new panic paths are introduced, no resources are allocated that could leak, and the existingdefer recover()inExecutor.Run(line 119–123) already provides a safety net for any unexpected panic from the majordomo agent layer.A few observations that fall within this lens but don't rise to the level of a material issue:
Empty
inputwith non-emptyimages:llm.Text("")creates aTextPartwith empty content as the first part of the multimodal user message. Some model providers reject messages with an empty text part alongside images. However, this is not a regression — the prior code would have passed""directly toag.Run, which would have produced a text-only empty user message (equally problematic). The caller controlsinput, and sending images with no prompt text is an unlikely caller mistake.optsslice mutation (executor.go:458):opts = append(opts, agent.WithHistory(...))mutates the variadic parameter slice. In Go, when the caller passes a single argument (agent.WithSteer(steer)), the compiler allocates a fresh backing array;appendmay or may not grow it, but the caller holds no reference to the slice afterrunAgentreturns, so there is no aliasing hazard. Verified by reading the single call site at line 323.agnil: The caller (Executor.Run, line 152–157) already guardsmodel == niland returns an error before constructingag. A nilagreachingrunAgentwould require a bug in majordomo'sagent.New, which is outside this diff's responsibility.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 5s
🪰 Gadfly review —
glm-5.2:cloud(ollama-cloud)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
run/executor.go:449-460(runAgent) and the call site atrun/executor.go:323. The image path foldsinv.Images(already-typed[]llm.ImagePartfromtool.Invocation) into a singlellm.UserPartsmessage viaagent.WithHistory, then callsag.Run(ctx, "", opts...). No string interpolation, path construction, command assembly, or templating happens with untrusted input — image bytes/MIME are passed through as structuredllm.Partvalues, not concatenated into text. The text-only path passesinputstraight toag.Run, unchanged.tool.Invocation.Imagesfield (tool/registry.go:416, verified) — this PR doesn't change where they come from or how they're trusted; it only stops silently dropping them. No new taint source or injection surface is introduced.inputstring passed toRunin the image branch carries no security risk (it's a literal"", not user-controlled concatenation).Nothing in the security lens is materially wrong.
🎯 Correctness — Minor issues
VERDICT: Minor issues
run/executor.go:454— empty-textTextPartnot guarded (minor, inconsistent with the established in-package pattern).runAgentunconditionally prependsllm.Text(input)to the parts, even wheninputis the empty string. The sibling helper in the same package,run/session.go:64-68(AttachImages), deliberately guardsif strings.TrimSpace(text) != ""before appendingllm.Text(text), and the documented shape intool/registry.go:410-416is "text + image parts."runAgentskips that guard, so a multimodal invocation with a blank prompt yields a first user message containing an emptyTextPartfollowed by images — diverging from the convention the PR claims to mirror. Most providers tolerate an empty text part, but some reject it. Suggested fix: mirrorAttachImages— only appendllm.Text(input)whenstrings.TrimSpace(input) != "".No other correctness issues found in this lane. The image-present fold seeds
WithHistory([]llm.Message{llm.UserParts(text + images...)})and callsRun(ctx, "", opts...), so the model sees the multimodal opening turn as the first user message andRun's text input contributes nothing extra; the image-less path is untouched viaag.Run(ctx, input, opts...). This matches the documented intent.🧹 Code cleanliness & maintainability — No material issues found
Verdict: No material issues found
run/executor.go:449-460—runAgentis a clean, well-documented helper that closely mirrors the existing multimodal-parts construction already inrun/session.go:64-76(AttachImages). Naming, structure, and theparts := make([]llm.Part, 0, len(images)+1)capacity hint all follow the established local pattern. No dead code, no leaky abstraction, no unwarranted duplication.run/session.go:66guards the text part withstrings.TrimSpace(text) != ""before appending, whereasrunAgent(executor.go:454) appendsllm.Text(input)unconditionally. On the image pathinputis the prompt and is expected non-empty, so this is not a bug — but aligning the two "text + images → UserParts" builders (e.g. extracting a tiny shareduserImageParts(text, images)helper) would remove the only structural drift between them. Low value given how small each site is; flagging only for completeness.run/images_test.go— new tests are readable, well-named, and scoped to the behavior under review; they use the existingfakeprovider harness consistently with the rest of the suite. No maintainability concern.⚡ Performance — No material issues found
Verdict: No material issues found
Through the performance lens, this change is benign:
runAgentruns once perExecutor.Runcall (setup, not inside the agent/tool loop). The work it adds — building a[]llm.Partslice sizedlen(images)+1and appending oneRunOption— is O(images) with a single properly-preallocated slice and no copying of image byte payloads (thellm.ImagePartstruct copy shares the underlyingDataslice). Negligible.return ag.Run(ctx, input, opts...)with zero added allocation — no regression for the common case. Verified atrun/executor.go:450-451.run/executor.go:322-323and the surrounding loop atrun/executor.go:270-289.opts = append(opts, agent.WithHistory(...))atrun/executor.go:458appends to the variadic slice exactly once per run; one potential backing-array growth, immaterial.Nothing in this lane is materially wrong.
🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
run/executor.go:454—runAgentunconditionally appendsllm.Text(input)to the parts slice, even wheninputis the empty string. The only guard islen(images) == 0, so a multimodal invocation with an empty/whitespace prompt produces a degenerate emptyTextPartas the first part of the opening user message. The sibling code path in the same package (run/session.go:66) explicitly guards this case withif strings.TrimSpace(text) != ""before appending the text part, so the new helper diverges from the established in-repo pattern. Impact: some model providers choke on or drop empty text parts, and at minimum it seeds the conversation with a meaningless empty part. Suggested fix: mirrorsession.go— only appendllm.Text(input)whenstrings.TrimSpace(input) != ""; if that leaves only image parts, still push them (an image-only opening turn is valid). (Verified by readingrun/session.go:64-75andrun/executor.go:449-459.)run/executor.go:459— when images are present,ag.Run(ctx, "", opts...)is invoked with an empty input while the real prompt is seeded viaWithHistory. majordomo'sRun/WithHistorysource is not vendored in this repo (onlygo.modpins the module, andagent.WithHistory/agent.Runare only referenced, not defined, here), so I could not confirm whether an emptyinputcombined with aWithHistoryopening user message is handled cleanly by the agent loop — e.g., that it doesn't append a second empty user turn on top of the history or treat empty input as "no prompt" and short-circuit. Unverified; flagged because it's the unhappy path this diff introduces and worth a concrete check against majordomo'sRuncontract.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 34s
🪰 Gadfly review —
claude-code/opus(claude-code)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
I have what I need. The diff routes already-fetched image bytes (
llm.ImagePartis bytes-only pertool/registry.go:410-416— "pkg/imageutil downloads the bytes … majordomo image parts are bytes-only") into the first user message. No network fetch, string interpolation, or auth decision is introduced by this change.VERDICT: No material issues found
Security-lens assessment (verified by reading
run/executor.go:319-459andtool/registry.go:405-425):runAgentconsumesinv.Imagesas[]llm.ImagePart, which carry raw bytes + MIME (registry.go:416), not URLs. The download/URL-resolution happens upstream in the host (pkg/imageutil), outside this diff. Nothing here fetches a remote resource, so the fold introduces no SSRF surface.llm.UserParts(...)message, not concatenated into any string, query, path, or template.inputwas already passed to the model before this change; routing it throughllm.Text(input)in a parts slice changes the wire shape, not the trust handling.executor.go:346-350are untouched).imageutilextractor, not in this routing helper. Not a regression introduced here. (Any concern about image count/size as a resource-exhaustion vector is a performance/DoS-lens question, not security, and is not introduced by this change.)The image bytes were previously dropped; now they reach the model as intended. That is a strict, faithful addition with no new tainted-data flow that wasn't already authorized by the
Imagesfield's contract. Nothing material in the security lens.🎯 Correctness — Minor issues
VERDICT: Minor issues
run/executor.go:453-458: unconditional empty text part diverges from the guarded sibling pattern.runAgentalways doesparts = append(parts, llm.Text(input)), even wheninputis empty/whitespace. The directly-analogous helperAttachImages(run/session.go:64-76) deliberately guards: it skipsllm.Text(text)whenstrings.TrimSpace(text) == ""(and bails entirely if no parts remain). The image-only case is realistic for the surfaces this PR unblocks (a chatbot @mention with an image and no caption →input == ""), so this path emits aUserPartswhose first element is an emptyTextPart. Best case a harmless noop; the very reasonAttachImagesguards is that some providers reject/penalize empty text parts in a multimodal message. Suggest mirroring that guard:The fold is otherwise wired correctly:
inv.Imagesis the right source (tool/registry.go:416),inputis the rawRunparameter with no prior history seeding so there's no duplication, and the image-less branch is byte-for-byte the old behavior. The text-only regression guard is legitimate.🧹 Code cleanliness & maintainability — Minor issues
Both functions confirmed against the actual source. The duplication is real:
run/session.go:64-76(runSession.AttachImages) andrun/executor.go:453-458(runAgent) both build a multimodal user message with the identicalmake([]llm.Part, 0, len(images)+1)/llm.Text/ image loop /llm.UserParts(...)construction. The noted divergence —AttachImagesguards withstrings.TrimSpace(text) != ""whilerunAgentappendsllm.Text(input)unconditionally — is also accurate.VERDICT: Minor issues
Finding — duplicated multimodal message assembly (
run/executor.go:453-458vsrun/session.go:64-76)The new
runAgentbuilds allm.UserParts(text + image parts)message that is the same construction already implemented inrunSession.AttachImages— samemake([]llm.Part, 0, len(images)+1), samellm.Text, same image-append loop, samellm.UserParts(...). Both live in packagerunand both seed a user-role multimodal turn.AttachImagesguards empty text withstrings.TrimSpace(text) != ""before appendingllm.Text, whilerunAgentappendsllm.Text(input)unconditionally — exactly the kind of divergence a shared helper would prevent.func userParts(text string, images []llm.ImagePart) llm.Message, and call it from bothrunAgentandrunSession.AttachImages. Low-churn, removes the second copy, and forces one decision about the empty-text guard.⚡ Performance — No material issues found
I've verified the change at
run/executor.go:444-460against the checked-out code. Through the Performance lens there is nothing material to flag.VERDICT: No material issues found
run/executor.go:449-452— Image-less path is genuinely unchanged: it forwards straight toag.Runwith no extra allocation or wrapping. The common (text-only) hot path takes zero new work. Verified by reading the helper.run/executor.go:453— The parts slice is preallocated with exact capacity (len(images)+1), so the append loop never re-grows/re-copies the backing array. Correctly sized.run/executor.go:455-457— The loop is bounded by the number of images on the invocation (not an unbounded or repeated source), andappend(parts, img)copies only thellm.ImagePartstruct value (a slice header + MIME), not the underlying image bytes — so no large-payload copy is introduced. No N+1, no per-step or hot-loop work (this runs once at run start, outside the agent step loop), no quadratic behavior.No efficiency regression in my lane. The remaining concerns (e.g. whether
WithHistory+ empty-input semantics are correct, or whetherinv.Imagesaliasing matters) belong to the correctness lens, not performance.🧯 Error handling & edge cases — Minor issues
Both core claims confirmed against the actual source:
run/executor.go:454unconditionally appendsllm.Text(input)before image parts, with no empty/whitespace guard.run/session.go:64-76(AttachImages) does the identical fold but guards both the text append (if strings.TrimSpace(text) != "") and the empty-parts case (if len(parts) == 0 { return }).The divergence between the two in-repo multimodal-seeding paths is real. The draft is honest about the one thing it couldn't verify (whether majordomo's provider layer rejects an empty text part, since that source is outside the working tree). The finding stands.
VERDICT: Minor issues
🧯 Error handling & edge cases
run/executor.go:454— image-only run produces a blank text part (unhandled empty-input edge case).runAgentunconditionally prependsllm.Text(input)before the image parts. Wheninputis empty or whitespace-only — a realistic case, since a host can foldinv.Imageswith no caption (a chatbot @mention that is just an image, no prompt) — the opening user turn becomes[TextPart{""}, ImagePart]. The codebase's own sibling implementation,runSession.AttachImages(run/session.go:64-76), does the identical fold but guards it:if strings.TrimSpace(text) != "" { parts = append(parts, llm.Text(text)) }and also early-returns whenlen(parts) == 0. The new helper omits both guards, so the two multimodal-seeding paths now diverge on the empty-text edge.AttachImages—if strings.TrimSpace(input) != "" { parts = append(parts, llm.Text(input)) }so an image-only invocation seeds a clean single-image turn."describe this"), so this empty-caption edge is untested.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 59s
🪰 Gadfly review —
claude-code/sonnet(claude-code)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
I have enough information to give a definitive verdict.
No material issues found
The
runAgenthelper takesllm.ImagePartvalues (bytes-only;pkg/imageutilis the upstream downloader documented in theImagesfield comment attool/registry.go:413-415) and folds them into the initial history message viaagent.WithHistory. From the security lens:No new attack surface:
inv.Imageswas already being carried intool.Invocationbefore this PR; the bytes just weren't reaching the model. The trust boundary and any MIME-type/size validation live upstream in the callers that populateImages— this PR doesn't move or widen that boundary.optsslice safety:opts = append(opts, ...)insiderunAgentoperates on the variadic copy, not the caller's backing array. No aliasing risk.No authz bypass via
WithHistory: The pre-seeded history message is treated as context, not a new turn, so it doesn't bypass dispatch-guard caps or budget gates wired into the agent at construction time (agent.New(model, e.systemPrompt(ra), opts...)).No injection vectors: No SQL, shell, path traversal, or template expansion touches the image bytes or MIME strings in this change. MIME strings are passed to the majordomo provider layer; I couldn't read the majordomo source from this repo's context, but the concern (provider sending a MIME string verbatim in HTTP headers) is pre-existing and not introduced here.
No credential leakage: No new logging of image bytes or MIME types.
🎯 Correctness — Minor issues
Both findings verified against the code. Finding 1 is confirmed. Finding 2 requires majordomo source which is inaccessible from the allowed working directory. The draft's own reasoning for Finding 2 acknowledges it couldn't verify the majordomo contract and frames it as "unverified" — this is correctly a speculative finding, not a confirmed code defect.
Verdict: Minor issues
Finding 1 —
llm.Text(input)always emitted, even wheninput == ""runAgent(executor.go:453–454) unconditionally prependsllm.Text(input)to the parts slice whenever images are present:The analogous multimodal helper in the same package —
runSession.AttachImages(session.go:66–68) — guards against this:If a caller supplies
Imagesbut leaves theinputstring empty,runAgentbuilds aUserParts(llm.Text(""), image…)message with a spurious empty text part. Several providers (e.g. Anthropic) reject or warn on empty text blocks inside acontentarray. The fix is to apply the same guard used insession.go.🧹 Code cleanliness & maintainability — Minor issues
Both call sites and the comment are confirmed in the actual source. The only
runAgentcall is at line 323 (bare value, not a slice expansion — the draft's "Verified" note cites 319 by mistake, but that line is theagent.Newcall; the factual safety claim holds). The comment at lines 444–448 is exactly as described. Both findings survive.Minor issues
The extraction is clean overall —
runAgentis a sensible factoring, the tests follow established patterns in the package, and the image-less path is unchanged. Two items worth flagging:optsmutation footgun —run/executor.go:458optsis a variadic parameter. In Go, when the caller expands a slice withslice..., the variadic parameter shares the caller's backing array. If there is spare capacity,appendhere writes into that array and silently corrupts the caller's slice. The current sole call site is safe — it passes a single element literal, so Go allocates a fresh 1-element slice andappendtriggers a reallocation — but the function's signature makesrunAgent(ctx, ag, input, images, myOpts...)look entirely natural, and any future caller with a pre-allocated-or-resliced opts slice would be silently bitten.Fix: cap-limit the slice before appending to guarantee a new allocation regardless of call site:
Comment over-explains the implementation —
run/executor.go:444–448The five-line block comment restates the mechanics the code itself shows clearly (build
parts, seed viaWithHistory, pass empty string toRun). The non-obvious WHY — that majordomo'sRunis text-only — warrants a comment, but the rest is paraphrase. Per the repo's house style (CLAUDE.md: "Only add [a comment] when the WHY is non-obvious"), the comment could be trimmed to a single line capturing just the constraint:The "Mirrors mort agentexec" note has value as an architectural cross-reference but would fit better in the PR description than in a package-level comment (the
agentexecpackage is not visible from this module).⚡ Performance — No material issues found
Now I have enough to render a verdict.
No material issues found
Fast path is zero-overhead. The
len(images) == 0guard atexecutor.go:450short-circuits immediately for text-only runs (the common case), so the existing hot path is untouched.partsslice is correctly pre-sized.make([]llm.Part, 0, len(images)+1)at line 453 allocates exactly the right capacity; no growth reallocations occur inside the loop.optsappend is safe and bounded. The call site (executor.go:323) passes a single variadic arg, so Go allocates a fresh[]agent.RunOptionwithlen == cap == 1. Theappendat line 458 therefore always allocates a new backing array of size 2 — one small allocation per multimodal run, not per step or per tool call. Given this wraps an LLM network round-trip, it is inconsequential.llm.ImagePartloop copies only slice headers.Data []byteis a slice header (pointer + len + cap); the underlying pixel bytes are not copied, so the O(n-images) loop is as cheap as possible.No unbounded growth, no N+1 patterns, no work added inside the per-step loop.
🧯 Error handling & edge cases — Minor issues
The finding is confirmed. Line 454 unconditionally appends
llm.Text(input)even wheninput == "", whilesession.go:66-68guards withstrings.TrimSpace(text) != "".Verdict: Minor issues
Finding: Empty
inputwith images injects a spuriousllm.Text("")partrun/executor.go:454— wheninput == ""(caller provides images but no prompt text),runAgentunconditionally prependsllm.Text(input)to the parts slice, producing a multimodal history message whose first element is an empty-string text part:The sibling path in
run/session.go:66-68— which performs the same image-folding job for mid-runAttachImagescalls — explicitly guards against this:An empty
TextPartin a multimodal message is not universally tolerated: several LLM providers (Anthropic included) reject messages where a text block has an empty string. This would cause a runtime error only on the pathlen(images) > 0 && input == "", which is a valid call (images-only invocation with a system-prompt–only agent), but it's a latent edge-case failure.Suggested fix — mirror the
AttachImagesguard:Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 56s
🪰 Gadfly review —
claude-code/opus:max(claude-code)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
I've verified the key security-relevant facts against the checked-out code. My findings through the Security lens:
VERDICT: No material issues found
The change is a minimal, faithful passthrough of already-materialized image bytes into the model's opening turn. Nothing in the Security lane is materially wrong.
What I checked and why it's clean:
No SSRF introduced.
tool.Invocation.Imagesis[]llm.ImagePart, and the field comment (tool/registry.go:410-416) plus the test (llm.ImagePart{MIME, Data: []byte(...)}) confirm majordomo image parts are bytes-only — no URL field. Any network fetch happens host-side inpkg/imageutilbefore the invocation reachesrunAgent; this PR never dereferences a caller-supplied URL. Verified by readingrun/executor.go:449-459and theImagePartusage sites.No injection (SQL/command/path/template). The image parts are placed into an
llm.Messagestruct viallm.UserParts(...)(run/executor.go:453-458) — never concatenated into a query, shell command, filesystem path, or template. Same handling pattern already used by the vettedrunSession.AttachImages(run/session.go:64-76).No secret/credential leakage via audit.
marshalMessages(model/call.go:329-348) records onlyImageCount, not the raw image bytes, so seeding images into history doesn't dump untrusted/PII blob data into the run trace.No authz/guardrail bypass between the two paths. Both branches run the same
ag(built with the samesystemPromptand tool/step-limit options atrun/executor.go:291-319) and the sameWithSteer. The image path only swaps the text input arg for aWithHistory-seeded user message; it does not skip any auth, permission, or tool-gating step. Verified by reading the surroundingRunbody.I considered the prompt-injection-via-image angle (untrusted image could embed instructions), but inlining the bytes is the only way to do multimodal and is a model-provider concern, not a defect this code introduces or could mitigate — flagging it would be noise.
Outside my lens: there's no bound on the number/size of
inv.Imagesfolded into the turn, but that's a resource/DoS concern for the performance lens, not security.🎯 Correctness — Minor issues
I've confirmed the code-level facts I can reach. Finding 1's divergence is directly verifiable in the source; Finding 2 depends on majordomo's internal
Run/WithHistorybehavior, which is outside the working dir and unreachable (module cache andgo build/go docare blocked here) — and the draft itself admits it could not be verified.VERDICT: Minor issues
Findings (Correctness lens)
run/executor.go:454— an empty text part is emitted wheninput == "", diverging from the sibling helper that deliberately guards against it.runAgentunconditionally doesparts = append(parts, llm.Text(input))before appending images. The siblingrunSession.AttachImages(run/session.go:64-76) performs the identical construction but guards it (if strings.TrimSpace(text) != "" { parts = append(parts, llm.Text(text)) }) and additionally bails whenlen(parts) == 0. So for an image-only invocation (inv.Imagesset, empty prompt — a legitimate multimodal case the guarded sibling was written for), this path seeds the opening user message with an emptyllm.Text("")block, which some providers reject.run/executor.go:449-460vsrun/session.go:64-76).strings.TrimSpace(input) != "".🧹 Code cleanliness & maintainability — Minor issues
Both sites confirmed. The duplication finding is verified:
run/executor.go:453-458andrun/session.go:65-75are the only twomake([]llm.Partsites in the repo, both in packagerun, and no shared helper exists. The empty-text drift is also real —session.go:66guardsstrings.TrimSpace(text) != ""whilerunAgentunconditionally appendsllm.Text(input)atexecutor.go:454.VERDICT: Minor issues
Duplicated multimodal-message construction — should be a shared helper (
run/executor.go:453-458vsrun/session.go:65-75).runAgentbuilds a[]llm.Partfromtext + imagesand wraps it inllm.UserParts(...)with the exact shape already implemented inrunSession.AttachImages:make([]llm.Part, 0, len(images)+1)→ appendllm.Text(...)→ loop-append images →llm.UserParts(parts...). These are the only twomake([]llm.Partsites in the repo and both live in packagerun, so a small unexported helper (e.g.func userMessage(text string, images []llm.ImagePart) llm.Message) would dedupe both with zero new surface. Impact: two copies of the same construction will drift.The drift has already started:
AttachImagesguards an empty/whitespace text part (if strings.TrimSpace(text) != "" { ... }, session.go:66) whereasrunAgentunconditionally appendsllm.Text(input)(executor.go:454) — so the images path emits an empty leading text part wheninput == "". Two sibling builders of "the initial user message" now disagree on the empty-text case. Folding both onto one helper fixes the inconsistency and is the low-churn maintainability fix.Everything else in this lane is clean:
runAgentis well-named, single-purpose, and the early-return-on-empty-images shape reads clearly; the doc comment is accurate and matches the house// Why:style.optsviaappendbefore the secondRuncall is idiomatic Go, not a smell.⚡ Performance — No material issues found
I've verified the change. The call site (
executor.go:323) runs once per run, not in any loop, and the helper itself is trivially bounded.VERDICT: No material issues found
Performance assessment of
runAgent(run/executor.go:449-459):make([]llm.Part, 0, len(images)+1)(line 453) pre-allocates exact capacity, so the append loop never reallocates. Verified.img(anllm.ImagePartstruct) by value — only the struct header (incl. theData []byteslice header) is copied, not the underlying bytes. So large image buffers are not duplicated. Verified by reading the helper.Executor.Run(line 323), i.e. once at the start of an agent run, gated behind a model round-trip that dwarfs any of this setup cost. The per-run cost is O(number of images), which is the input itself.len(images) == 0short-circuits straight toag.Run(ctx, input, opts...)with no extra allocation — the text-only common case is unchanged. Verified.opts = append(...)(line 458) appends to the locally-received variadic slice, not a shared one; no aliasing growth concern.No N+1, no work inside a hot loop, no unbounded growth, no blocking call introduced, no avoidable copy. The change is clean through the performance lens.
🧯 Error handling & edge cases — No material issues found
Both draft findings rest entirely on majordomo-internal behavior the draft author explicitly could not verify, and which I also cannot read (the module cache lives outside the sandboxed working directory and there is no
vendor/tree). I confirmed the executus-side code matches the diff exactly (run/executor.go:444-460), thatinv.Imagesis[]llm.ImagePart(tool/registry.go:416), and that the call site isrun/executor.go:323. But neither finding can be positively confirmed:Finding 1 (empty text part): The code does unconditionally append
llm.Text(input), but the claimed harm depends on (a) a host actually invokingExecutor.Runwith an emptyinputplus images — no caller in the repo does this; the input is always a non-empty prompt — and (b) majordomo's provider layer rejecting/not-stripping empty text blocks, which the draft itself marks unverified. Unconfirmable defect.Finding 2 (duplicate opening turn): Explicitly self-labeled "(Unverified)" and hinges wholly on how majordomo's
agent.Runtreats an emptyinputarg. Unreadable, unconfirmable.Neither survives the gate.
VERDICT: No material issues found
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 23s
🪰 Gadfly review —
minimax-m3:cloud(ollama-cloud)Verdict: Minor issues — 5 reviewers: security, correctness, maintainability, performance, error-handling
🔒 Security — No material issues found
🔒 Security review
VERDICT: No material issues found
Nothing in this PR opens a new security surface through the security lens. Verified by reading
run/executor.go:449-460,run/session.go:64-76,tool/registry.go:410-416, andrun/executor.go:183-323:inv.Images(tool.Invocation.Images) was already a caller-supplied[]llm.ImagePartreaching the model viarunSession.AttachImagesand the steer mailbox. This PR moves the same bytes into the initial turn viaagent.WithHistory+llm.UserParts— the destination (model input) and trust boundary (caller-supplied, already in-scope) are identical. No new entry point for attacker-controlled data.CallerIsAdmingating is untouched. The text-lessRuncall with images still hits the sameag.Runcode path that already enforces upstream permissions.opts = append(opts, agent.WithHistory(...))happens after the caller-suppliedagent.WithSteer(steer)is captured, soWithSteercannot be silently shadowed by history seeding. (Verified atrun/executor.go:323vsrun/executor.go:458.)AttachImages/steerMailboxpath) and is not introduced or worsened by this PR; flagging it here would duplicate the existing accepted risk, not surface a new one.Outside my lens: nothing severe enough to escalate.
🎯 Correctness — No material issues found
VERDICT: No material issues found
Findings
None within the Correctness lens.
Verification summary
run/executor.go:449-459(runAgenthelper). Reads correctly: whenlen(images) == 0the prompt passes through unchanged asag.Run(ctx, input, opts...)(text-only regression guard preserved). When images are present, the helper builds[]llm.Partof lengthlen(images)+1, prependsllm.Text(input), appends eachllm.ImagePart, seeds history viaagent.WithHistory([]llm.Message{llm.UserParts(parts...)}), and callsag.Run(ctx, "", opts...). The empty-string input alongsideWithHistoryis the canonical majordomo multimodal-opening pattern and is consistent withrun/session.go:64-76(runSession.AttachImages), which already builds[]llm.Part{llm.Text(...), images...}and pushes viallm.UserParts(...).agent.WithHistory,agent.RunOption,llm.UserParts,llm.Text,llm.Part,llm.ImagePartare all already used elsewhere in this repo (verified via grep oncompact/,model/,run/session.go,tool/registry.go,examples/), so the helper's calls match the established Majordomo API contract.tool.Invocation.Imagesis documented (tool/registry.go:410-416) as "multi-modal image content for the initial user message" withllm.UserParts(text + image parts). The compactor's image handling (compact/compactor.go:289) and the model call's image serialization (model/call.go:337) both treatllm.ImagePartinsidem.Partsas the on-the-wire image payload, which is exactly whatUserParts(text, image...)produces and what the test asserts forcalls[0].Request.Messages. The "1 image ≈ 1K tokens" estimate incompact/compactor.go:279-298is unaffected by where the image appears in the history, so the token-budget accounting remains correct.runSession.AttachImages(run/session.go:65-71) and the documentation comment intool/registry.go:412-413. Putting text first also matches typical provider expectations for OpenAI-/Anthropic-style multimodal turns.inputwith images still produces a valid user message (aText("")part is harmless alongside images). Emptyimagesslice short-circuits to the legacy path. Anilinv.Imagesislen(...) == 0and takes the legacy path — no nil dereference.run/images_test.go. Test assertions inspectcalls[0].Request.Messages[*].Parts, type-switching onllm.ImagePartandllm.TextPart— both are existing types in the repo (compact/compactor.go:289-297,model/call.go:337,run/session.go:64).fp.Calls()shape is consistent with the existingfakeprovider usage in the suite.Outside my lens: nothing to add — no other lens observations in this review.
🧹 Code cleanliness & maintainability — No material issues found
VERDICT: No material issues found
run/session.go:64-76): the newrunAgentmirrorsrunSession.AttachImagesclosely — same[]llm.Partofllm.Text+ image parts fed intollm.UserParts. The reuse of the established idiom is exactly the right call for cleanliness.run/executor.go:323): the newrunAgentis small, single-purpose, named after its dispatch role, and the doc comment explains the why (majordomo's text-onlyRuninput) rather than restating the what. The image-less fast path keepsopts...propagation clean via the early return at line 451.make([]llm.Part, 0, len(images)+1)matches the known composition (one text + N images).run/images_test.go): no copy-paste beyond what's needed for two distinct cases; test names encode intent (FoldsInitialImages/TextOnlyUnchanged); uses the existingfakeprovider the suite already standardizes on.runSession.AttachImagesguards withstrings.TrimSpace(text) != ""before adding the text part;runAgentdoes not. With an emptyinputand non-emptyimagesyou'd push allm.Text("")part. This is inconsistent with the established pattern insession.gobut unlikely to cause real breakage (providers tolerate empty text parts). If you want strict parity, drop thellm.Text(input)append whenstrings.TrimSpace(input) == "".Nothing else in the cleanliness/maintainability lane is materially wrong: no dead code, no duplicated logic that should be shared (the helper itself is the de-duplication), no confusing names, no leaky abstractions.
⚡ Performance — No material issues found
VERDICT: No material issues found
Through the performance lens:
runAgentis invoked exactly once perExecutor.Run(run/executor.go:323), before the agent loop starts. No per-step overhead added.make([]llm.Part, 0, len(images)+1)at run/executor.go:453); image bytes are appended by value-copy of thellm.ImagePartstruct (containing the byte-slice header), no deep copy of image data.ag.Run(ctx, input, opts...)call site — no regression for the text-only common case.WithHistory; subsequent turns remain on the existingWithSteermailbox mechanism.opts = append(opts, agent.WithHistory(...))at run/executor.go:458 mutates the underlying slice, but the caller (run/executor.go:323) passes a freshly-built single-element slice (agent.WithSteer(steer)), so there is no caller-visible aliasing risk.🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
run/executor.go:449-460— empty-input + images produces an empty text part. Confirmed against the code:parts := make([]llm.Part, 0, len(images)+1); parts = append(parts, llm.Text(input))unconditionally appendsllm.Text(input)even wheninput == "", so an empty prompt + non-emptyinv.Imagesproduces aUserParts(llm.Text(""), img1, img2, ...)opening turn. This is inconsistent with the existing pattern inrun/session.go:64-76, whererunSession.AttachImagesexplicitly guards withif strings.TrimSpace(text) != ""before adding the text part. The provider-tolerance impact remains unverified (no provider exercised here), but the inconsistency with the established helper in the same package is a real, low-risk code smell. Suggested fix: mirror thesession.goguard —if strings.TrimSpace(input) != "" { parts = append(parts, llm.Text(input)) }.run/executor.go:454-457— no validation ofllm.ImagePart.MIME/Databefore they reach the model. Confirmed against the code: thefor _, img := range images { parts = append(parts, img) }loop copies every entry through without checkingMIME != ""orData != nil. A zero-valueImagePartontool.Invocation.Imageswill now flow into the first model request (where previously images were dropped on the executor path). The newTestExecutorFoldsInitialImagesonly exercises a well-formedimage/png+PNGDATApayload, so this case is unguarded. Provider-tolerance impact unverified, but the helper is the new wire egress point and a one-line zero-value guard (if img.MIME == "" || len(img.Data) == 0 { continue }or similar) would be cheap insurance.Verified (no action)
run/executor.go:323— call site change matches the diff;agent.WithSteer(steer)is forwarded viaopts...in both branches ofrunAgent.run/executor.go:450—len(images) == 0correctly handles bothniland[]ImagePart{}; the text-only branch is unchanged.run/images_test.go— both new tests fail loudly (t.Fatal) whenfp.Calls()is empty, and the regression-guard assertion structure looks sound.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 24s