executus/tools/file_search.go
steve ee6e9ef9f8
fix: address verified gadfly P3 review (3-cloud fleet)
All 3 cloud models converged on a real access-control bug; fixed it + the
other genuine findings (the false-positives were dropped):

Security (HIGH — all 3 models):
- create_file_url skipped ValidateScope: a same-skill caller could mint a
  PUBLIC url for a file scoped to another user/run. Now runs ValidateScope
  (admin-aware), skipped only for the descendant-grant case — mirroring the
  read tools.

Other real fixes:
- ValidateScope hard-coded `false` at every call site (admin branch dead) ->
  pass inv.CallerIsAdmin (the executor sets it via the host AdminPolicy; still
  false/fail-closed when no admin). Stale "no admin flag" comment corrected.
- create_file_url: ExpiresInSeconds clamped BEFORE the *time.Second multiply
  (huge values overflowed to a negative duration that slipped under the cap,
  minting already-expired tokens); swallowed json.Marshal error now returned.
- RegisterMeta: build the default budget WITH the configured MaxPerRun (was
  NewInMemorySearchBudget(nil) -> hardcoded 10, ignoring MetaDeps.MaxPerRun).
- classify: all-zero scores no longer return a false-positive top-1 winner;
  coerceClassifyScore uses strconv.ParseFloat (rejects trailing garbage like
  "50extra" that fmt.Sscanf silently accepted).
- file_delete: honor the descendant grant (parent can clean up a worker's
  artifacts) — was the lone cross-skill-reject-outright file tool.
- meta tools: input caps truncate at a UTF-8 rune boundary (truncateUTF8), not
  mid-rune.
- think: removed the dead `var _ = fmt.Errorf` import-keeper; file_save default
  aligned to 16 MiB (matched RegisterStore).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:31:59 -04:00


// file_search runs a token-AND search over the per-skill (or, for
// admin authors, cross-skill) file index. Returns up to N matches with
// {file_id, name, snippet, score}.
//
// Why admin-authoring only: a public skill could otherwise probe
// other skills' file content via cross-skill search. Restricting the
// tool's authoring requirement to admins keeps file_search out of the
// allowed-tool catalog at save time for skills authored by non-admins,
// so their shared/public skills cannot depend on it at all. Scope is
// also enforced per call: the handler always pins skill_id to
// inv.SkillID, no matter what the LLM-supplied scope arg says, so a
// non-admin caller invoking an admin-authored public skill cannot
// escape the skill's own bucket.
//
// Why use Storage's SearchFiles directly: token logic + scoring lives
// in the skills package. The handler is a thin transcoder.
package tools

import (
	"context"
	"encoding/json"
	"fmt"

	"gitea.stevedudenhoeffer.com/steve/executus/tool"
)

// FileSearcher is the narrow surface the file_search tool needs.
// Production wiring (mort.go) bridges *skills.System.Storage().
// nil-safe: a nil FileSearcher surfaces "not configured" at the first
// call.
type FileSearcher interface {
	SearchFiles(ctx context.Context, skillID, scope, query string, limit int) ([]FileSearchDomainHit, error)
}

// FileSearchDomainHit mirrors skills.FileSearchHit (cycle-break domain
// shape). The production adapter is a struct copy.
type FileSearchDomainHit struct {
	FileID   string
	SkillID  string
	Scope    string
	Name     string
	MimeType string
	Snippet  string
	Score    int
}

type fileSearchArgs struct {
	Query string `json:"query" description:"Free-text search query. Tokenised, lowercased, ANDed."`
	Scope string `json:"scope,omitempty" description:"Optional storage scope to restrict the search ('skill', 'user:<your_id>', 'run:<run_id>'). Empty = all scopes within this skill."`
	Limit int    `json:"limit,omitempty" description:"Optional max hits to return (default 25, max 100)."`
}

type fileSearchHit struct {
	FileID  string `json:"file_id"`
	Name    string `json:"name"`
	Mime    string `json:"mime,omitempty"`
	Snippet string `json:"snippet,omitempty"`
	Score   int    `json:"score"`
}

// NewFileSearch constructs the file_search tool. The authoring
// requirement is admin, so non-admins can't include this tool in
// shared/public skills (the share-safety check rejects
// share+admin-only as private-only).
//
// Because the tool is both admin-authoring and share-safe, an admin
// can author a public skill that uses it. That is the desired flow:
// the admin curates the skill, and the privacy property still holds
// because the handler pins skill_id to inv.SkillID. A non-admin caller
// of the public skill can only search files within that skill's
// bucket, never cross-skill.
//
// Setting SafeForShare=false would force this tool to be private-only,
// which is needlessly restrictive. The privacy property comes from the
// per-call skill_id pin, not from share-time gating.
func NewFileSearch(searcher FileSearcher) tool.Tool {
	return tool.NewGatedTool[fileSearchArgs](
		"file_search",
		"Full-text search over this skill's saved files. Returns array of {file_id, name, snippet, score} ordered by score desc. Tokens are lowercased + ANDed. Admin-authored only — non-admin callers of an admin-authored public skill still see only that skill's files.",
		tool.Permission{
			AuthoringRequirement: tool.RequirementAdmin,
			OperatesOn:           tool.ScopeCaller,
			SafeForShare:         true,
			Categories:           []string{"storage", "read"},
		},
		func(ctx context.Context, inv tool.Invocation, args fileSearchArgs) (string, error) {
			if searcher == nil {
				return "", fmt.Errorf("file_search: not configured")
			}
			if args.Query == "" {
				return "", fmt.Errorf("file_search: query required")
			}
			limit := args.Limit
			if limit <= 0 {
				limit = 25
			}
			if limit > 100 {
				limit = 100
			}
			scope := args.Scope
			if scope != "" {
				if err := ValidateScope(inv, scope, inv.CallerIsAdmin); err != nil {
					return "", fmt.Errorf("file_search: %w", err)
				}
			}
			// Pin skill_id to the invoking skill: even if the LLM
			// supplies a different value somewhere, the handler always
			// scopes to inv.SkillID. This is the privacy guarantee
			// referenced in the package doc.
			rows, err := searcher.SearchFiles(ctx, inv.SkillID, scope, args.Query, limit)
			if err != nil {
				return "", fmt.Errorf("file_search: %w", err)
			}
			out := make([]fileSearchHit, 0, len(rows))
			for _, r := range rows {
				out = append(out, fileSearchHit{
					FileID:  r.FileID,
					Name:    r.Name,
					Mime:    r.MimeType,
					Snippet: r.Snippet,
					Score:   r.Score,
				})
			}
			b, err := json.Marshal(out)
			if err != nil {
				return "", fmt.Errorf("file_search: marshal: %w", err)
			}
			return string(b), nil
		},
	)
}