15 Commits

Author SHA1 Message Date
Steve Dudenhoeffer b25a13ed4f chore: repin gadfly reusable to @5007597 (structured findings + consensus + inline review)
Adopts gadfly's review-representation overhaul: one ranked consensus comment
across the swarm + an advisory COMMENT-state inline PR review, on image
sha-3095ebf. Swarm config still rides the owner variables.

[skip ci]
2026-06-28 22:13:24 -04:00
steve add8f847a4 Merge pull request 'feat(run): InputFileStager seam — stage non-image attachments into the prompt' (#18) from feat/input-file-stager into main
executus CI / test (push) Successful in 1m51s
2026-06-28 18:19:28 +00:00
steve df4033f42e fix(run): harden input-file staging per gadfly #18 validation pass
executus CI / test (pull_request) Successful in 48s
Second-pass findings on the security fix:

- Mime sanitized ONCE and passed to BOTH StageInputFile and the descriptor (was
  passing raw f.MimeType to the host store while only the descriptor sanitized) —
  3 models.
- sanitizeField now also strips Unicode format chars (category Cf, incl. the bidi
  overrides U+202A–U+202E that can reorder how the descriptor renders); IsControl
  already covers \n\r\t so the explicit checks are dropped.
- fileID is sanitized before inlining + an empty file_id drops the file (defense
  vs a misbehaving stager).
- humanizeBytes clamps the prefix index so an absurd size (≥1024^6) can't index
  past "KMGTPE" and panic — a no-panic guarantee independent of the per-file cap.
- Docs sync: README Ports list gains InputFiles; tool.InputFile.Name doc now says
  the executor reduces an untrusted name to a safe base name (was claiming the
  field is already safe).

Tests: bidi/control stripping; mime sanitized in staged value + descriptor; empty
file_id drop; humanizeBytes no-panic across sizes up to 1<<62. Suite green (-race).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 14:08:57 -04:00
steve 1e65f4b6e5 fix(run): sanitize input-file names — path-traversal + prompt-injection hardening (gadfly #18)
executus CI / test (pull_request) Successful in 48s
The full swarm (5-6 models) flagged that stageInputFiles passed the untrusted
attachment filename straight to StageInputFile and inlined it into the
[ATTACHED FILES]/`/workspace/<name>` descriptor with no sanitization — a path
the byte-cap already treats as a trust boundary. A name like ../../etc/passwd or
an absolute/drive path could escape the host store or the sandbox workspace, and
newlines in the name/mime could inject text into the prompt block.

- sanitizeName: strips control chars/newlines, then reduces to a base name
  (path.Base after backslash-normalization) so ../, nested dirs, and absolute /
  drive paths all collapse to their last element; "attachment" fallback for
  empty/"."/"..". Applied BEFORE staging AND inlining.
- sanitizeField: strips control chars from MimeType (also inlined verbatim).
- maxInputFiles (32) count cap — defense-in-depth vs a flood of tiny files,
  independent of the per-file byte cap.

Tests: sanitizeName table (traversal/absolute/backslash/control/fallback, +
no-separator invariant); traversal staged+described under the base name only;
oversize skip; count-cap truncation. Full suite green (-race).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 13:29:45 -04:00
steve 2ef88f2a73 feat(run): InputFileStager seam — stage non-image attachments into the prompt
Adversarial Review (Gadfly) / review (pull_request) Has been cancelled
executus CI / test (pull_request) Successful in 2m21s
executus's tool.Invocation already carried InputFiles (audio/PDF/binary), but the
executor never staged them — only Images were folded into the run. This adds the
host seam mort's chat/chatbot surfaces need for audio-input parity with agentexec.

- run.Ports gains InputFiles InputFileStager (nil-safe; nil = input files silently
  ignored, run still proceeds text-only). The interface mirrors mort's skill
  FileStorage: StageInputFile(ctx, runID, agentID, name, mime, content) → file_id.
- run/input_files.go (ported from mort agentexec/input_files.go): stageInputFiles
  persists each file under run scope and appends an [ATTACHED FILES] descriptor
  block to the prompt so the agent can reach them by file_id (e.g. code_exec
  files_in → /workspace/<name>). Bytes are NEVER inlined into model context.
  Best-effort: empty/oversized(>50MB)/save-error files are skipped; colliding
  base names are disambiguated (name-2, name-3) so they don't clobber at
  /workspace/<name>.
- Executor.Run calls it after the model/toolbox build, before the loop, so the
  descriptor rides the first user turn (alongside the existing Images folding).

Tests: stages + builds the block; nil stager / no files leave the prompt intact;
dedup; empty/save-error skipping. Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 13:02:55 -04:00
Steve Dudenhoeffer 7a5eebc468 fix(ci): restore valid adversarial-review.yml + pin gadfly reusable @7bc3c98 [skip ci]
The reusable now reads swarm config from user-scope vars (GADFLY_DEFAULT_* +
GADFLY_ENDPOINT_*); this immutable @sha bumps past the long-lived-runner ref
cache so the vars-config reusable is adopted. Direct to main + [skip ci] to
avoid triggering the review swarm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 02:05:28 -04:00
steve 7211ce227c ci: pin gadfly reusable workflow to immutable sha (cache-bust @v1)
executus CI / test (push) Successful in 48s
Long-lived act_runners cache the reusable-workflow ref, so a moved @v1 tag
keeps resolving to a stale cached copy and a newly-added reviewer never runs.
Pinning to a unique immutable sha forces a cache miss → fresh fetch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 05:44:52 +00:00
steve f367796244 Merge pull request 'run: fold inv.Images into the initial user message (multimodal opening turn)' (#16) from feat/initial-images into main
executus CI / test (push) Failing after 18m3s
Adversarial Review (Gadfly) / review (pull_request) Successful in 2s
2026-06-28 05:16:46 +00:00
steve 0acaa8c9a5 run: guard empty text part in runAgent + drop cross-repo doc ref (gadfly #16)
executus CI / test (pull_request) Successful in 1m46s
Every reviewer flagged that runAgent appended llm.Text(input) unconditionally, so
an image-only run (blank prompt) emitted an empty TextPart — inconsistent with the
sibling runSession.AttachImages which guards it. Mirror that guard
(strings.TrimSpace(input) != ""). Also:
- copy opts before appending (variadic backing array can have spare capacity; avoid
  aliasing a caller's slice).
- reword the doc comment to drop the mort-agentexec reference (executus is a
  standalone lib; a consumer name doesn't belong in its godoc).

Tests: image+text are co-located in ONE user message; an image-only run emits no
blank TextPart.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 01:11:15 -04:00
steve a35c176b42 run: fold inv.Images into the initial user message (multimodal opening turn)
executus CI / test (pull_request) Successful in 46s
Adversarial Review (Gadfly) / review (pull_request) Successful in 6m5s
The executor passed only the text `input` to majordomo's agent.Run, silently
dropping inv.Images — so a multimodal run (vision: chatbot @mention, chat API)
lost its images on the executus path. majordomo's Run input arg is text-only, so
fold the images into the first user message (text + image parts) via WithHistory
and call Run with empty input, mirroring mort agentexec's multimodal seeding. The
image-less path is unchanged (prompt passes straight through).

Tests: a run with Images carries the image bytes + prompt into the first model
request; the text-only path still reaches the model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 00:37:53 -04:00
steve 1cf46c9954 ci: track gadfly's v1 release tag instead of a pinned sha (#15)
executus CI / test (push) Successful in 52s
2026-06-28 04:08:30 +00:00
steve 56baac758d ci: inherit gadfly's default swarm (slim caller, re-pin @b02b11d) (#14)
executus CI / test (push) Successful in 50s
2026-06-28 02:48:25 +00:00
steve 5779035722 Merge pull request 'ci: subscribe to gadfly's reusable review workflow (cloud + Claude Code, no local)' (#13) from ci/gadfly-reusable into main
executus CI / test (push) Successful in 3m56s
2026-06-28 01:43:42 +00:00
Steve Dudenhoeffer 1a2a2364ec security: scope forwarded secrets + pin gadfly reusable to an immutable sha
executus CI / test (pull_request) Successful in 2m13s
Adversarial Review (Gadfly) / review (pull_request) Successful in 10m31s
Address the swarm's findings on this rollout:
- Replace `secrets: inherit` (which forwarded ALL repo secrets — registry/
  Komodo/Discord/DB creds the reviewer never uses) with explicit forwarding of
  only OLLAMA_CLOUD_API_KEY / CLAUDE_CODE_OAUTH_TOKEN / findings tokens.
  GITEA_TOKEN is the automatic job token (github.token in the reusable).
- Pin uses: ...@main -> @20a5c43 (immutable) so a push to gadfly can't change
  the code that runs with our forwarded secrets.

Requires gadfly's review-reusable.yml secrets contract (steve/gadfly#9, merged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 21:18:59 -04:00
Steve Dudenhoeffer c08ce47fa6 ci: subscribe to gadfly's reusable review workflow (cloud + Claude Code, no local)
executus CI / test (pull_request) Successful in 47s
Adversarial Review (Gadfly) / review (pull_request) Successful in 12m29s
Replace the full self-contained stub with a thin caller of steve/gadfly's
reusable workflow, using gadfly's own dogfood config: 6 cloud models +
the Claude Code engine (sonnet, opus, opus:max). No local Macs / foreman.
Advisory only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 20:03:00 -04:00
8 changed files with 618 additions and 63 deletions
+23 -58
View File
@@ -1,11 +1,8 @@
# Gadfly — agentic adversarial PR reviewer (https://gitea.stevedudenhoeffer.com/steve/gadfly).
#
# Runs the published Gadfly image (pinned to an immutable :sha- tag — act_runner
# caches :latest, and this build is what carries foreman provider-type support)
# as a specialist swarm and posts
# ONE consolidated review comment as gitea-actions. Advisory only — never blocks a
# merge. This reviews executus PRs with 3 ollama-cloud models (3-lens suite). Gadfly
# is a simple system — findings are advisory; always double-check before acting.
# Gadfly adversarial review — subscribes to steve/gadfly's reusable workflow and
# INHERITS its default swarm. This stub holds only the triggers, the actor gate,
# secret forwarding, and the allow-list; the swarm config (models, lenses,
# concurrency, timeouts) lives centrally in gadfly's review-reusable.yml so it is
# tuned in ONE place. Advisory only — never blocks a merge.
name: Adversarial Review (Gadfly)
@@ -32,59 +29,27 @@ concurrency:
jobs:
review:
# Security: only trusted users may trigger a secret-bearing run via a PR
# comment (pull_request + workflow_dispatch are already trusted). Mirrors
# GADFLY_ALLOWED_USERS, the in-container belt-and-suspenders check.
# comment (pull_request + workflow_dispatch are already trusted). Mirrors the
# allowed_users input below (the in-container belt-and-suspenders check) — both
# lists must stay in sync; a workflow if: can't read a workflow_call input.
if: >-
github.event_name != 'issue_comment'
|| (github.event.issue.pull_request
&& (github.actor == 'steve'
|| github.actor == 'fizi'
|| github.actor == 'dazed'))
runs-on: ubuntu-latest
# Full fleet: 3 cloud (lens fan-out) + M1/M5 Macs via foreman. The slow local
# lanes dominate wall time, so allow plenty of headroom.
timeout-minutes: 90
steps:
- uses: docker://gitea.stevedudenhoeffer.com/steve/gadfly:sha-d7f364d
env:
GITEA_API: ${{ github.server_url }}/api/v1/repos/${{ github.repository }}
GITEA_TOKEN: ${{ secrets.GITEA_TOKEN }}
OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }}
# Local Macs, reached through their foreman queues (native Ollama on the
# each a foreman-preset Ollama client at the secret's URL, of the form:
# foreman|https://<foreman-host>|<token>
# Needs an image with foreman provider-type support (this one). If a Mac
# is offline that model's comment shows an error and the others still post.
# (Gitea secrets aren't auto-exposed — map each explicitly.)
# Full fleet: 3 cloud + M1 Pro + M5 Max. The Macs are back so the
# gadfly-reports scoreboard can quantify whether they earn their keep
# (they previously took 2629 min for ZERO real findings — now measured).
# Cloud concurrency lives in the LENSES: one cloud model at a time
# (ollama-cloud=1) with its 3 lenses concurrent (LENS ollama-cloud=3) so
# its comment lands sooner; each Mac runs one model, lenses serial (its
# foreman queue serializes anyway). All three provider lanes run parallel.
GADFLY_MODELS: "minimax-m3:cloud,glm-5.2:cloud,glm-5.1:cloud,deepseek-v4-pro:cloud,nemotron-3-super:cloud,qwen3-coder:480b-cloud"
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3"
GADFLY_PROVIDER_LENS_CONCURRENCY: "ollama-cloud=3"
# Default => the 3-lens suite (security, correctness, error-handling).
# Set the repo var GADFLY_SPECIALISTS to override (csv / "all" / "auto").
GADFLY_SPECIALISTS: ${{ vars.GADFLY_SPECIALISTS || 'security,correctness,error-handling' }}
# Per-lens deadline + bounded steps so the slow local models stay sane.
GADFLY_TIMEOUT_SECS: "600"
GADFLY_MAX_STEPS: "14"
# Allow-list for the comment trigger (mirrors the job-level if: guard).
GADFLY_ALLOWED_USERS: "steve,fizi,dazed"
# --- findings telemetry: POST runs + findings to the gadfly-reports store ---
# Advisory & off unless GADFLY_FINDINGS_URL is set; failures only log to
# stderr and never affect the review. GADFLY_REPO / GADFLY_PR are derived
# in-container; the URL + token are user-scope secrets.
GADFLY_FINDINGS_URL: ${{ secrets.GADFLY_FINDINGS_URL }}
GADFLY_FINDINGS_TOKEN: ${{ secrets.GADFLY_FINDINGS_TOKEN }}
# --- event context (leave as-is) ---
EVENT_NAME: ${{ github.event_name }}
PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }}
PR_BRANCH: ${{ github.head_ref }}
IS_DRAFT: ${{ github.event.pull_request.draft }}
COMMENT_BODY: ${{ github.event.comment.body }}
COMMENT_ID: ${{ github.event.comment.id }}
ACTOR: ${{ github.actor }}
# Pinned to an immutable gadfly commit (not @v1): our act_runners are long-lived
# and cache the reusable-workflow ref, so a moved v1 tag keeps resolving to the
# stale cached copy. A unique sha forces a cache miss → fresh fetch. Bump this
# sha to adopt central swarm changes.
uses: steve/gadfly/.gitea/workflows/review-reusable.yml@5007597cf921dc3f0a83c708878facfe65fd8e8b
# Least privilege: forward only the review secrets (not `secrets: inherit`,
# which would expose every repo secret). GITEA_TOKEN is the automatic token.
secrets:
OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }}
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
GADFLY_FINDINGS_URL: ${{ secrets.GADFLY_FINDINGS_URL }}
GADFLY_FINDINGS_TOKEN: ${{ secrets.GADFLY_FINDINGS_TOKEN }}
with:
# Consumer-specific allow-list; everything else is inherited.
allowed_users: "steve,fizi,dazed"
+1 -1
View File
@@ -37,7 +37,7 @@ bot) — mort and gadfly are the first two consumers (heavy and light). See
tool registry, majordomo's agent loop, context compaction, run-bounding, and
step/audit instrumentation into one `Run(ctx, RunnableAgent, inv) Result`, with
every host concern behind a nil-safe `run.Ports` (Audit/Budget/Critic/
Checkpointer/PaletteSource/Delivery). See `examples/minimal`.
Checkpointer/PaletteSource/Delivery/InputFiles). See `examples/minimal`.
- `model/` — config-driven tier resolution + failover over majordomo, with
pluggable `UsageSink`/`TraceSink` and `GenerateWith[T]` structured output.
- `tool/` — the tool registry + 3-stage permission model + SSRF guard.
+33 -1
View File
@@ -4,6 +4,7 @@ import (
"context"
"errors"
"fmt"
"strings"
"time"
"gitea.stevedudenhoeffer.com/steve/majordomo/agent"
@@ -317,10 +318,16 @@ func (e *Executor) Run(ctx context.Context, ra RunnableAgent, inv tool.Invocatio
}
ag := agent.New(model, e.systemPrompt(ra), opts...)
// Stage non-image input attachments (audio/PDF/binary) into the host file
// store and fold an [ATTACHED FILES] descriptor into the prompt so the agent
// can reach them by file_id. No-op when Ports.InputFiles is nil or there are
// no files. Done after the model/toolbox build but before the loop, so the
// descriptor rides the very first user turn.
input = e.stageInputFiles(runCtx, inv.RunID, ra.ID, inv.InputFiles, input)
// One WithSteer drains BOTH the session mailbox (a tool's AttachImages) and
// the critic's nudges before each step.
steer := func() []llm.Message { return append(mailbox.drain(), critic.drainSteer()...) }
runRes, runErr := ag.Run(runCtx, input, agent.WithSteer(steer))
runRes, runErr := runAgent(runCtx, ag, input, inv.Images, agent.WithSteer(steer))
status := statusFor(runCtx, runErr)
if runRes != nil {
@@ -440,3 +447,28 @@ func detach(ctx context.Context) context.Context {
_ = cancel // bounded by the timeout; nothing to cancel early
return c
}
// runAgent dispatches the majordomo agent loop. majordomo's Run takes a text-only
// input arg, so when the invocation carries images they're folded into the first
// user message (text + image parts) via WithHistory and Run is called with an
// empty input — the model then sees a multimodal opening turn. The image-less path
// passes the prompt straight through.
//
// The text part is omitted when input is blank (image-only run), matching
// runSession.AttachImages so no empty TextPart is sent.
func runAgent(ctx context.Context, ag *agent.Agent, input string, images []llm.ImagePart, opts ...agent.RunOption) (*agent.Result, error) {
if len(images) == 0 {
return ag.Run(ctx, input, opts...)
}
parts := make([]llm.Part, 0, len(images)+1)
if strings.TrimSpace(input) != "" {
parts = append(parts, llm.Text(input))
}
for _, img := range images {
parts = append(parts, img)
}
// Copy opts before appending so a caller-supplied backing array is never
// mutated/aliased (the variadic slice can have spare capacity).
opts = append(opts[:len(opts):len(opts)], agent.WithHistory([]llm.Message{llm.UserParts(parts...)}))
return ag.Run(ctx, "", opts...)
}
+121
View File
@@ -0,0 +1,121 @@
package run_test
import (
"context"
"strings"
"testing"
"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
"gitea.stevedudenhoeffer.com/steve/majordomo/provider/fake"
"gitea.stevedudenhoeffer.com/steve/executus/run"
"gitea.stevedudenhoeffer.com/steve/executus/tool"
)
// TestExecutorFoldsInitialImages: when the invocation carries Images, they're
// folded into the first user message (alongside the prompt text) instead of being
// dropped — majordomo's Run input arg is text-only, so the executor seeds the
// multimodal opening turn via history.
func TestExecutorFoldsInitialImages(t *testing.T) {
fp := fake.New("fake")
fp.Enqueue("m", fake.Reply("saw the image"))
m, _ := fp.Model("m")
img := llm.ImagePart{MIME: "image/png", Data: []byte("PNGDATA")}
inv := tool.Invocation{RunID: "r1", Images: []llm.ImagePart{img}}
ex := run.New(run.Config{
Registry: tool.NewRegistry(),
Models: func(ctx context.Context, _ string) (context.Context, llm.Model, error) { return ctx, m, nil },
})
res := ex.Run(context.Background(), run.RunnableAgent{ModelTier: "m"}, inv, "describe this")
if res.Err != nil {
t.Fatalf("run error: %v", res.Err)
}
calls := fp.Calls()
if len(calls) == 0 {
t.Fatal("no model calls recorded")
}
// The text + image must be CO-LOCATED in a single user message (not split
// across two), so the model reads them as one multimodal turn.
coLocated := false
for _, msg := range calls[0].Request.Messages {
sawImage, sawText := false, false
for _, p := range msg.Parts {
switch pp := p.(type) {
case llm.ImagePart:
if string(pp.Data) == "PNGDATA" {
sawImage = true
}
case llm.TextPart:
if strings.Contains(pp.Text, "describe this") {
sawText = true
}
}
}
if sawImage && sawText {
coLocated = true
}
}
if !coLocated {
t.Error("image + prompt text were not folded into the SAME user message")
}
}
// TestExecutorImageOnlyNoBlankText: an image-only run (blank prompt) must NOT emit
// an empty TextPart — the message carries just the image, matching
// runSession.AttachImages's guard.
func TestExecutorImageOnlyNoBlankText(t *testing.T) {
fp := fake.New("fake")
fp.Enqueue("m", fake.Reply("saw it"))
m, _ := fp.Model("m")
inv := tool.Invocation{RunID: "r3", Images: []llm.ImagePart{{MIME: "image/png", Data: []byte("IMG")}}}
ex := run.New(run.Config{
Registry: tool.NewRegistry(),
Models: func(ctx context.Context, _ string) (context.Context, llm.Model, error) { return ctx, m, nil },
})
res := ex.Run(context.Background(), run.RunnableAgent{ModelTier: "m"}, inv, " ")
if res.Err != nil {
t.Fatalf("run error: %v", res.Err)
}
for _, msg := range fp.Calls()[0].Request.Messages {
for _, p := range msg.Parts {
if tp, ok := p.(llm.TextPart); ok && strings.TrimSpace(tp.Text) == "" {
t.Error("image-only run emitted a blank TextPart")
}
}
}
}
// TestExecutorTextOnlyUnchanged: with no Images, the prompt flows through as the
// text input (regression guard that the fold path didn't break the common case).
func TestExecutorTextOnlyUnchanged(t *testing.T) {
fp := fake.New("fake")
fp.Enqueue("m", fake.Reply("ok"))
m, _ := fp.Model("m")
ex := run.New(run.Config{
Registry: tool.NewRegistry(),
Models: func(ctx context.Context, _ string) (context.Context, llm.Model, error) { return ctx, m, nil },
})
res := ex.Run(context.Background(), run.RunnableAgent{ModelTier: "m"}, tool.Invocation{RunID: "r2"}, "plain prompt")
if res.Err != nil {
t.Fatalf("run error: %v", res.Err)
}
calls := fp.Calls()
if len(calls) == 0 {
t.Fatal("no model calls recorded")
}
sawText := false
for _, msg := range calls[0].Request.Messages {
for _, p := range msg.Parts {
if tp, ok := p.(llm.TextPart); ok && strings.Contains(tp.Text, "plain prompt") {
sawText = true
}
}
}
if !sawText {
t.Error("text-only prompt did not reach the model")
}
}
+179
View File
@@ -0,0 +1,179 @@
package run
import (
"context"
"fmt"
"log/slog"
"path"
"strings"
"unicode"
"gitea.stevedudenhoeffer.com/steve/executus/tool"
)
// maxInputFileBytes is a defense-in-depth cap at the staging boundary. A host's
// extraction path may already cap downloads, but stageInputFiles is the trust
// boundary for the InputFiles seam: a call site or bug that populates InputFiles
// directly must not write an unbounded blob to the host file store.
const maxInputFileBytes = 50_000_000
// maxInputFiles bounds how many attachments a single run stages, independent of
// the per-file byte cap — defense-in-depth against a flood of tiny files.
const maxInputFiles = 32
// stageInputFiles persists each non-image input attachment into the host file
// store (Ports.InputFiles) under run scope and appends a descriptor block to the
// prompt so the agent knows the file_ids it can pass to a worker tool. The bytes
// are NOT inlined into the model context — the LLM can't read raw audio/binary —
// so the agent reaches them via a file_id-aware tool (e.g. code_exec files_in,
// which writes the file to /workspace/<name>).
//
// Best-effort: a nil stager, no files, or a per-file save error degrades to
// "skip that file" — the run still proceeds. Returns the (possibly augmented)
// prompt.
func (e *Executor) stageInputFiles(ctx context.Context, runID, agentID string, files []tool.InputFile, prompt string) string {
if e.cfg.Ports.InputFiles == nil || len(files) == 0 {
return prompt
}
// Count cap: bound how many attachments one run can stage, independent of the
// per-file byte cap (defense-in-depth against a flood of tiny files).
if len(files) > maxInputFiles {
slog.Warn("run: too many input files, truncating",
"agent", agentID, "run_id", runID, "count", len(files), "cap", maxInputFiles)
files = files[:maxInputFiles]
}
type stagedFile struct {
name, mime, fileID string
size int
}
var staged []stagedFile
seenNames := make(map[string]int, len(files))
for _, f := range files {
if len(f.Data) == 0 {
slog.Warn("run: skipping empty input file",
"agent", agentID, "run_id", runID, "name", f.Name)
continue
}
if len(f.Data) > maxInputFileBytes {
slog.Warn("run: skipping oversized input file",
"agent", agentID, "run_id", runID, "name", f.Name,
"size", len(f.Data), "cap", maxInputFileBytes)
continue
}
// Reduce the untrusted filename to a safe base name BEFORE staging or
// inlining: strips ../ and absolute-path components (so it can't escape
// the host store or /workspace/<name>) and drops control chars/newlines
// (so a crafted name can't inject text into the descriptor block below).
// Then disambiguate colliding base names so two attachments don't both map
// to /workspace/<name> (the second would clobber the first).
name := uniqueName(sanitizeName(f.Name), seenNames)
// Sanitize the mime ONCE and pass the clean value to both the host store
// and the descriptor (don't hand the raw value to StageInputFile).
mime := sanitizeField(f.MimeType)
fileID, err := e.cfg.Ports.InputFiles.StageInputFile(ctx, runID, agentID, name, mime, f.Data)
if err != nil {
slog.Warn("run: failed to stage input file",
"agent", agentID, "run_id", runID, "name", name, "error", err)
continue
}
if fileID == "" {
slog.Warn("run: stager returned empty file_id, skipping",
"agent", agentID, "run_id", runID, "name", name)
continue
}
// fileID is host-generated, but sanitize it too before inlining — the
// descriptor must never carry control chars no matter the stager impl.
staged = append(staged, stagedFile{name: name, mime: mime, fileID: sanitizeField(fileID), size: len(f.Data)})
}
if len(staged) == 0 {
return prompt
}
var b strings.Builder
b.WriteString("[ATTACHED FILES]\n")
b.WriteString("The user attached the following file(s). Their contents are NOT included in this prompt and you cannot read them directly. ")
b.WriteString("To work with one, call the code_exec tool with a files_in entry — e.g. ")
b.WriteString(`files_in: [{"name": "<name>", "file_id": "<file_id>"}]`)
b.WriteString(" — which writes it to /workspace/<name> inside the Python sandbox. You may also pass a file_id to any other tool that accepts one.\n")
for _, s := range staged {
fmt.Fprintf(&b, "- %s (%s, %s) → file_id: %s\n", s.name, s.mime, humanizeBytes(s.size), s.fileID)
}
if strings.TrimSpace(prompt) == "" {
return b.String()
}
return prompt + "\n\n" + b.String()
}
// sanitizeName reduces an untrusted attachment filename to a safe base name. It
// drops control characters / newlines (which would otherwise let a crafted name
// inject text into the [ATTACHED FILES] descriptor) and strips every directory
// component — defeating ../ traversal, nested dirs, and absolute / drive paths
// both in the host file store and at /workspace/<name>. Returns "attachment"
// when nothing usable remains (empty, ".", "..").
func sanitizeName(name string) string {
name = sanitizeField(name)
// Normalize backslashes so a Windows-style path also reduces to its base.
base := path.Base(strings.ReplaceAll(name, `\`, "/"))
base = strings.TrimSpace(base)
if base == "" || base == "." || base == ".." {
return "attachment"
}
return base
}
// sanitizeField strips characters that could let a value inlined verbatim into
// the prompt descriptor break out of its line or visually mislead: control
// characters (IsControl covers newlines/tabs) AND Unicode format characters
// (category Cf — e.g. the bidi overrides U+202AU+202E, which can reorder how
// the descriptor renders).
func sanitizeField(s string) string {
return strings.Map(func(r rune) rune {
if unicode.IsControl(r) || unicode.Is(unicode.Cf, r) {
return -1
}
return r
}, s)
}
// uniqueName returns name unchanged the first time it's seen, then name-2,
// name-3, … (suffix inserted before the extension) on repeats, recording each
// result in seen so later collisions keep counting up.
func uniqueName(name string, seen map[string]int) string {
if seen[name] == 0 {
seen[name]++
return name
}
ext := path.Ext(name)
base := strings.TrimSuffix(name, ext)
for {
seen[name]++
candidate := fmt.Sprintf("%s-%d%s", base, seen[name], ext)
if seen[candidate] == 0 {
seen[candidate]++
return candidate
}
}
}
// humanizeBytes renders a byte count as a short human-readable string (e.g.
// "2.1 MB") for the attached-files descriptor block.
func humanizeBytes(n int) string {
if n < 0 {
n = 0
}
const unit = 1024
if n < unit {
return fmt.Sprintf("%d B", n)
}
const prefixes = "KMGTPE"
div, exp := int64(unit), 0
// Clamp exp to the last prefix so an absurd size (≥1024^7) can't index past
// "KMGTPE" and panic — a no-panic guarantee independent of the per-file cap.
for v := int64(n) / unit; v >= unit && exp < len(prefixes)-1; v /= unit {
div *= unit
exp++
}
return fmt.Sprintf("%.1f %cB", float64(n)/float64(div), prefixes[exp])
}
+243
View File
@@ -0,0 +1,243 @@
package run
import (
"context"
"errors"
"strings"
"testing"
"gitea.stevedudenhoeffer.com/steve/majordomo/llm"
"gitea.stevedudenhoeffer.com/steve/executus/tool"
)
// stagerFunc is a test InputFileStager: it records each staged file and returns
// a deterministic file_id ("file_<name>"), or an error if err is set.
type stagerFunc struct {
staged []stagedRec
err error
}
type stagedRec struct {
runID, agentID, name, mime string
size int
}
func (s *stagerFunc) StageInputFile(_ context.Context, runID, agentID, name, mime string, content []byte) (string, error) {
if s.err != nil {
return "", s.err
}
s.staged = append(s.staged, stagedRec{runID, agentID, name, mime, len(content)})
return "file_" + name, nil
}
func newStagerExecutor(s InputFileStager) *Executor {
return New(Config{
Registry: tool.NewRegistry(),
Models: func(ctx context.Context, _ string) (context.Context, llm.Model, error) { return ctx, nil, nil },
Ports: Ports{InputFiles: s},
})
}
// TestStageInputFiles: files are staged via the port and an [ATTACHED FILES]
// descriptor (with each file_id) is appended to the prompt.
func TestStageInputFiles(t *testing.T) {
st := &stagerFunc{}
ex := newStagerExecutor(st)
out := ex.stageInputFiles(context.Background(), "run-1", "agent-1",
[]tool.InputFile{{Name: "clip.mp3", MimeType: "audio/mpeg", Data: []byte("abcd")}},
"transcribe this")
if len(st.staged) != 1 || st.staged[0].name != "clip.mp3" {
t.Fatalf("staged = %+v, want one clip.mp3", st.staged)
}
if st.staged[0].runID != "run-1" || st.staged[0].agentID != "agent-1" {
t.Errorf("stager got runID/agentID = %q/%q, want run-1/agent-1", st.staged[0].runID, st.staged[0].agentID)
}
for _, want := range []string{"transcribe this", "[ATTACHED FILES]", "clip.mp3", "file_clip.mp3", "audio/mpeg"} {
if !strings.Contains(out, want) {
t.Errorf("output missing %q:\n%s", want, out)
}
}
}
// TestStageInputFilesNoStager: a nil port leaves the prompt untouched and never
// drops the run.
func TestStageInputFilesNoStager(t *testing.T) {
ex := newStagerExecutor(nil) // Ports.InputFiles == nil
out := ex.stageInputFiles(context.Background(), "r", "a",
[]tool.InputFile{{Name: "x.bin", Data: []byte("z")}}, "prompt")
if out != "prompt" {
t.Errorf("nil stager changed the prompt: %q", out)
}
}
// TestStageInputFilesNoFiles: no attachments leaves the prompt untouched.
func TestStageInputFilesNoFiles(t *testing.T) {
ex := newStagerExecutor(&stagerFunc{})
out := ex.stageInputFiles(context.Background(), "r", "a", nil, "prompt")
if out != "prompt" {
t.Errorf("no files changed the prompt: %q", out)
}
}
// TestStageInputFilesDedup: colliding base names are disambiguated so they don't
// clobber each other at /workspace/<name>.
func TestStageInputFilesDedup(t *testing.T) {
st := &stagerFunc{}
ex := newStagerExecutor(st)
out := ex.stageInputFiles(context.Background(), "r", "a", []tool.InputFile{
{Name: "a.wav", MimeType: "audio/wav", Data: []byte("1")},
{Name: "a.wav", MimeType: "audio/wav", Data: []byte("2")},
}, "go")
if len(st.staged) != 2 {
t.Fatalf("staged %d files, want 2", len(st.staged))
}
if st.staged[0].name != "a.wav" || st.staged[1].name != "a-2.wav" {
t.Errorf("dedup names = %q, %q; want a.wav, a-2.wav", st.staged[0].name, st.staged[1].name)
}
if !strings.Contains(out, "a-2.wav") {
t.Errorf("output missing disambiguated name:\n%s", out)
}
}
// TestStageInputFilesSkipsBad: empty + oversized files are skipped; a save error
// drops only that file. With nothing staged, the prompt is unchanged.
func TestStageInputFilesSkipsBad(t *testing.T) {
// Empty data → skipped; with no good files the prompt is returned as-is.
ex := newStagerExecutor(&stagerFunc{})
if out := ex.stageInputFiles(context.Background(), "r", "a",
[]tool.InputFile{{Name: "empty.bin", Data: nil}}, "p"); out != "p" {
t.Errorf("empty file should be skipped, got %q", out)
}
// A stager error → that file is dropped; nothing staged → prompt unchanged.
exErr := newStagerExecutor(&stagerFunc{err: errors.New("disk full")})
if out := exErr.stageInputFiles(context.Background(), "r", "a",
[]tool.InputFile{{Name: "x.bin", Data: []byte("z")}}, "p"); out != "p" {
t.Errorf("save error should drop the file and leave the prompt, got %q", out)
}
}
// TestStageInputFilesOversize: a file past the byte cap is skipped (prompt
// unchanged), exercising the size guard directly.
func TestStageInputFilesOversize(t *testing.T) {
st := &stagerFunc{}
ex := newStagerExecutor(st)
big := make([]byte, maxInputFileBytes+1)
out := ex.stageInputFiles(context.Background(), "r", "a",
[]tool.InputFile{{Name: "huge.bin", Data: big}}, "p")
if out != "p" || len(st.staged) != 0 {
t.Errorf("oversized file should be skipped: out=%q staged=%d", out, len(st.staged))
}
}
// TestStageInputFilesCountCap: more than maxInputFiles attachments are truncated
// to the cap.
func TestStageInputFilesCountCap(t *testing.T) {
st := &stagerFunc{}
ex := newStagerExecutor(st)
files := make([]tool.InputFile, maxInputFiles+5)
for i := range files {
files[i] = tool.InputFile{Name: "f.bin", Data: []byte("x")}
}
ex.stageInputFiles(context.Background(), "r", "a", files, "p")
if len(st.staged) != maxInputFiles {
t.Errorf("count cap: staged %d, want %d", len(st.staged), maxInputFiles)
}
}
// TestSanitizeName: traversal + absolute + control-char filenames are reduced to
// a safe base name (no path separators, no newlines), with a fallback.
func TestSanitizeName(t *testing.T) {
cases := map[string]string{
"../../etc/passwd": "passwd",
"/etc/cron.d/x": "x",
`..\..\windows\sys`: "sys",
"clip.mp3": "clip.mp3",
"": "attachment",
"..": "attachment",
".": "attachment",
"evil\n- injected": "evil- injected",
"a/b/c.wav": "c.wav",
}
for in, want := range cases {
if got := sanitizeName(in); got != want {
t.Errorf("sanitizeName(%q) = %q, want %q", in, got, want)
}
// A sanitized name must never carry a path separator or newline.
got := sanitizeName(in)
if strings.ContainsAny(got, "/\\\n\r") {
t.Errorf("sanitizeName(%q) = %q still contains a separator/newline", in, got)
}
}
}
// TestStageInputFilesSanitizesTraversal: a traversal filename is staged AND
// described under its safe base name only.
func TestStageInputFilesSanitizesTraversal(t *testing.T) {
st := &stagerFunc{}
ex := newStagerExecutor(st)
out := ex.stageInputFiles(context.Background(), "r", "a",
[]tool.InputFile{{Name: "../../../etc/passwd", MimeType: "text/plain", Data: []byte("x")}}, "go")
if len(st.staged) != 1 || st.staged[0].name != "passwd" {
t.Fatalf("staged name = %+v, want passwd", st.staged)
}
if strings.Contains(out, "..") || strings.Contains(out, "/etc/") {
t.Errorf("descriptor leaked the traversal path:\n%s", out)
}
}
// TestSanitizeFieldStripsBidiAndControl: control chars AND Unicode format/bidi
// overrides are removed from inlined values.
func TestSanitizeFieldStripsBidiAndControl(t *testing.T) {
in := "audio/mpg\n; rm -rf" // bidi override + newline
got := sanitizeField(in)
if strings.ContainsAny(got, "\n\r\t") || strings.ContainsRune(got, '') {
t.Errorf("sanitizeField left control/bidi chars: %q", got)
}
}
// TestStageInputFilesSanitizesMime: a mime with a control char is cleaned in BOTH
// the staged value and the descriptor.
func TestStageInputFilesSanitizesMime(t *testing.T) {
st := &stagerFunc{}
ex := newStagerExecutor(st)
out := ex.stageInputFiles(context.Background(), "r", "a",
[]tool.InputFile{{Name: "c.wav", MimeType: "audio/wav\ninjected", Data: []byte("x")}}, "go")
if len(st.staged) != 1 || strings.ContainsAny(st.staged[0].mime, "\n\r") {
t.Errorf("mime not sanitized before staging: %+v", st.staged)
}
if strings.Contains(out, "\ninjected") {
t.Errorf("descriptor carried an unsanitized mime newline:\n%s", out)
}
}
// TestStageInputFilesEmptyFileID: a stager returning an empty file_id drops the
// file (no blank file_id in the descriptor).
func TestStageInputFilesEmptyFileID(t *testing.T) {
ex := newStagerExecutor(emptyIDStager{})
out := ex.stageInputFiles(context.Background(), "r", "a",
[]tool.InputFile{{Name: "x.bin", Data: []byte("z")}}, "p")
if out != "p" {
t.Errorf("empty file_id should drop the file, got %q", out)
}
}
type emptyIDStager struct{}
func (emptyIDStager) StageInputFile(context.Context, string, string, string, string, []byte) (string, error) {
return "", nil
}
// TestHumanizeBytesNoPanic: an absurd size clamps to the last prefix instead of
// indexing past "KMGTPE".
func TestHumanizeBytesNoPanic(t *testing.T) {
defer func() {
if r := recover(); r != nil {
t.Fatalf("humanizeBytes panicked: %v", r)
}
}()
for _, n := range []int{0, 512, 2048, 5_000_000, 1 << 62} {
_ = humanizeBytes(n)
}
}
+14
View File
@@ -42,6 +42,20 @@ type Ports struct {
// Delivery is where the run's output + artifacts go. nil = the caller
// reads the Result in-process (the light-host default).
Delivery deliver.Delivery
// InputFiles persists non-image input attachments (audio, PDF, binary)
// carried on Invocation.InputFiles into a host file store under run scope,
// returning file_ids the agent can hand to a worker tool. nil = input files
// are silently ignored (the run still proceeds, text-only). The bytes are
// never inlined into the model context — the LLM can't read raw audio/binary.
InputFiles InputFileStager
}
// InputFileStager persists a single non-image input attachment into a host file
// store under run scope and returns a file_id the run can reference. It is the
// seam mort's skill FileStorage (and any host blob store) implements so the
// kernel can stage Invocation.InputFiles without importing a storage layer.
type InputFileStager interface {
StageInputFile(ctx context.Context, runID, agentID, name, mime string, content []byte) (fileID string, err error)
}
// RunInfo describes a run at start time — the attribution a recorder/critic
+4 -3
View File
@@ -154,9 +154,10 @@ type ContinuationContext struct {
// InputFile is a non-image file the user supplied with a run (audio,
// etc.). The executor stages it into the file store under run scope and
// surfaces its file_id to the agent. Name is a safe base name (no path
// separators) suitable for /workspace/<name>; MimeType is the resolved
// content type; Data is the raw bytes.
// surfaces its file_id to the agent. Name may be an untrusted attachment
// filename — the executor reduces it to a safe base name (stripping path
// separators + control chars) before staging or exposing it as
// /workspace/<name>; MimeType is the resolved content type; Data is the raw bytes.
type InputFile struct {
Name string
MimeType string