P3: generic tool library (think/now/cite + meta + store groups) #3

Closed
steve wants to merge 4 commits from phase-3-tools into main
Owner

Phase 3: stand up executus/tools — the generic, host-agnostic tool library a host registers against a tool.Registry, then names in RunnableAgent.LowLevelTools.

What's here

  • Register(reg) — always-available, zero-config tools: think, now (UTC unless a CurrentTimeProvider is wired), cite (inert unless a CitationStorage is wired). All nil-safe.
  • RegisterMeta(reg, MetaDeps) — LLM-backed: classify, extract_entities, summarize over the llmmeta helper. Budget defaults to a shipped in-memory per-run cap.
  • RegisterStore(reg, StoreDeps) — agent memory: kv_get/set/list/delete + file_save/get/get_text/get_metadata/list/delete (+ file_search, create_file_url when wired). Quota defaults to a generous static cap; kv and file groups register independently.
  • Host-agnostic seams moved interface-only (no host coupling): research_providers.go, file_storage.go, kv_storage.go, quota_provider.go, file_descendant_grant.go, plus the in-memory budget default.

Verification

  • go build/vet/test -race ./... green; core go.sum still free of gorm/redis/discordgo/sqlite/wolfram.
  • End-to-end test: the executor runs an agent that calls a registered tool (fake model emits a think tool call → dispatched through the registry → finalised; step instrumentation captures it).

Notes

calculate was deferred (drags go-wolfram + a module-path replace — not worth it in the lean core for one tool). Remaining P3: web/net/compose groups + their default backends (DDG searcher, stdlib readability, SSRF http). The default in-memory KV/File store backends arrive with contrib/store at P4.

Stacked on the now-merged P2. Gadfly review (3 cloud models) is advisory — findings will be independently verified.

🤖 Generated with Claude Code

Phase 3: stand up `executus/tools` — the generic, host-agnostic tool library a host registers against a `tool.Registry`, then names in `RunnableAgent.LowLevelTools`. ## What's here - **`Register(reg)`** — always-available, zero-config tools: `think`, `now` (UTC unless a `CurrentTimeProvider` is wired), `cite` (inert unless a `CitationStorage` is wired). All nil-safe. - **`RegisterMeta(reg, MetaDeps)`** — LLM-backed: `classify`, `extract_entities`, `summarize` over the `llmmeta` helper. Budget defaults to a shipped in-memory per-run cap. - **`RegisterStore(reg, StoreDeps)`** — agent memory: `kv_get/set/list/delete` + `file_save/get/get_text/get_metadata/list/delete` (+ `file_search`, `create_file_url` when wired). Quota defaults to a generous static cap; kv and file groups register independently. - Host-agnostic seams moved interface-only (no host coupling): `research_providers.go`, `file_storage.go`, `kv_storage.go`, `quota_provider.go`, `file_descendant_grant.go`, plus the in-memory budget default. ## Verification - `go build/vet/test -race ./...` green; core `go.sum` still free of gorm/redis/discordgo/sqlite/wolfram. - **End-to-end test**: the executor runs an agent that *calls* a registered tool (fake model emits a `think` tool call → dispatched through the registry → finalised; step instrumentation captures it). ## Notes `calculate` was deferred (drags `go-wolfram` + a module-path replace — not worth it in the lean core for one tool). Remaining P3: web/net/compose groups + their default backends (DDG searcher, stdlib readability, SSRF http). The default in-memory KV/File store backends arrive with `contrib/store` at P4. Stacked on the now-merged P2. Gadfly review (3 cloud models) is advisory — findings will be independently verified. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 3 commits 2026-06-27 02:07:10 +00:00
Stand up executus/tools — the generic, host-agnostic tool library — and prove
the full pattern end to end:

- tools/tools.go: Register(reg) adds the always-available zero-dependency tools
  (currently `think`). A light host calls it and is immediately useful; backed
  tools (web/store/meta groups) will register via grouped registrars with
  nil-safe Deps as they land.
- tools/think.go: the `think` tool moved from mort (imports only executus/tool).
- tools/integration_test.go: end-to-end proof that the executor runs an agent
  which CALLS a registered tool — the fake model emits a `think` tool call, the
  executor dispatches it through the registry, the model finalises, and the step
  instrumentation captures the `think` step. Exercises the full tool-dispatch
  loop through run.Executor.

Stacked on phase-2-run-kernel (P3 needs run.Executor). Remaining P3: the
meta/web/net/store/compose groups + their Deps + default backends (splitting
mort's default.go grab-bag).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Grow executus/tools into a real generic tool library:

- Register(reg): the always-available, zero-config tools — think, now (UTC
  unless a CurrentTimeProvider is wired), cite (inert unless a CitationStorage
  is wired). All nil-safe; a light host calls Register and is useful.
- RegisterMeta(reg, MetaDeps): the LLM-backed meta tools — classify,
  extract_entities, summarize — over the llmmeta helper. Budget defaults to the
  shipped in-memory per-run cap; Files optional; caps default.
- Seams moved (interface/type-only, no host coupling): research_providers.go
  (CurrentTimeProvider/CitationStorage/SearchBudget/PageExtractor/PDFFetcher/…)
  and file_storage.go (FileStorage + FileDomainMeta). Plus the in-memory budget
  default (research_defaults.go) and scope_validate.go.

calculate deferred (drags github.com/Krognol/go-wolfram + a module-path replace
— not worth it in the lean core for one tool). Core go.sum still free of
gorm/redis/discordgo/sqlite/wolfram.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
P3: store group — kv_* + file_* tools (agent memory)
executus CI / test (pull_request) Successful in 58s
Adversarial Review (Gadfly) / review (pull_request) Successful in 10m10s
ac961e1539
RegisterStore(reg, StoreDeps) registers the persistent-memory tools over the
host's KV and/or File backends:
- kv_get/set/list/delete (KVStorage seam)
- file_save/get/get_text/get_metadata/list/delete (FileStorage seam), plus
  file_search (FileSearcher) and create_file_url (FileTokenMinter) when wired.

Near-zero-config: Quota defaults to a generous static cap (staticQuota), the
per-value/per-file caps default, and the kv vs file groups register
independently (a host can take just one). Seams moved clean (interface-only):
kv_storage.go, quota_provider.go, file_descendant_grant.go. The default
in-memory KV/File backends come with contrib/store at P4.

Core go.sum still free of gorm/redis/discordgo/sqlite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Findings through the 🔒 security lens (host-agnostic generic tool library):

  • tools/file_search.go:73 — file_search is admin-only + share-safe. The handler at tools/file_search.go:110 pins skillID to inv.SkillID, so a non-admin caller invoking an admin-authored public skill can only search that skill's bucket. The doc-comment claim matches the implementation. Fail-closed type-assertion in tools/file_descendant_grant.go:48 keeps the cross-skill escape hatch off for fakes that don't implement DescendantRunChecker. Verified by reading both files.

  • tools/scope_validate.go — strict scope authorisation. Every code path I traced (kv_get, kv_set, kv_list, kv_delete, file_save, file_get, file_get_text, file_get_metadata, file_list, file_delete, create_file_url, summarize) reaches ValidateScope(inv, scope, false) before any storage call, and isAdmin=false is constant — no path can be coerced via an Invocation field into another user's bucket. user:, run:, root_run: cases I read in tools/scope_validate.go:48-103 reject mismatch; parent-run and continuation allowances use exact-string equality, not prefix match.

  • tools/create_file_url.go — token randomness + cross-skill guard. tools/create_file_url.go:170 reads 32 bytes from crypto/rand (~190 bits) and base64url-encodes; tools/create_file_url.go:118 does the cross-skill rejection + descendant grant before persisting. The default + max caps (DefaultFileURLExpiry, MaxFileURLExpiry, MaxFileURLViews) are applied before any DB write. Verified.

  • tools/kv_set.go — quota delta math + scope partition. tools/kv_set.go:107 subtracts prior value length on UPDATE only inside the per-skill (non-root_run) branch; the root_run partition is excluded because per-skill attribution is meaningless there. The type-assertion for KVHistoryRecorder is gated on storage succeeding — a failed history write cannot shadow the kv_set return. Verified.

  • tools/now.go — time.LoadLocation on untrusted string. tools/now.go:79 calls time.LoadLocation(tzName) with a caller-supplied IANA name. Go's stdlib does tzdata lookups only (no arbitrary file IO), and unknown zones fall back to UTC with a warning — not a security issue.

  • tools/classify.go / tools/extract_entities.go — prompt composition with caller text. Text, Categories, and field Description are concatenated into the user prompt verbatim (tools/classify.go:180-186, tools/extract_entities.go:259-267). This is the standard extraction-tool shape and the budgets gate cost; no system-prompt injection vector (system prompt is fixed; user text is on the user side). The coerced output is returned to the agent's context — downstream prompt-injection surface, inherent to LLM extract/classification, not introduced by this PR.

  • tools/file_save.go — base64 decode + size cap. tools/file_save.go:107 uses base64.StdEncoding.DecodeString and rejects on parse error; size cap is checked AFTER decode so an oversized-but-well-formed payload is caught. Per-file cap is the only line of defence when quota is nil, which is the documented test/admin opt-out path.

  • tools/cite.go — url_not_in_run_history rejection. tools/cite.go:97 reads the per-run touched-URL set via CitationStorage.GetTouchedURLs and rejects any URL not in it, preventing fabricated citations. The no_run_context branch (tools/cite.go:82) bails loudly when inv.RunID == "", so an empty run context cannot smuggle a cite through.

  • tools/think.go — empty-thought rejection. tools/think.go:60 returns a structured {"ok":false,"error":"empty_thought"} rather than silently accepting, closing off the "free spam" foot-gun noted in the doc-comment.

  • tools/scope_validate.go:60 — empty user:/run:/root_run: rejection. tools/scope_validate.go:54, 64, 83 each reject the bare-prefix shape with a descriptive error, closing off trivial "user:" matches.

No authn/authz gap, no SQL/command/path injection, no SSRF surface (network tools are deferred to a follow-up PR per the description), no unsafe deserialization, no secret/credential handling, and no insecure default that I can confirm by reading the code. Outside my lens I noticed nothing severe enough to flag.

🎯 Correctness — No material issues found

Review — Correctness Lens

VERDICT: No material issues found

Verification results:

  • Finding 1 (tools/kv_set.go:115, partition mismatch) — DROPPED. Read of tools/kv_set.go lines 85–115 confirms the gate if quota != nil && partition == inv.SkillID (line 93) is the only place the inner block runs; inside that branch partition == inv.SkillID is a tautology, so the KVGet(..., inv.SkillID, ...) call on line 107 is equivalent to passing partition. The draft itself concedes "the call is internally consistent" and frames the issue as a future-refactor hazard ("if a future refactor loosens the gating"). No current correctness bug — not a blocker.

  • Finding 2 (tools/file_save.go:127, dedup ignored) — DROPPED. Read of tools/file_save.go lines 24–27 (the in-file doc comment) explicitly states: "we do NOT subtract a 'prior' value here the way kv_set does: file_save always inserts a new file row (content-addressable dedup is at the blob layer, not the row layer), so every save is additive to FileUsageBytes." That is, FileUsageBytes measures ROWS (not dedup'd blob bytes), and each file_save does insert a new row, so the per-row quota charge is correct as written. The draft's premise ("as the comment elsewhere implies" FileUsageBytes returns dedup'd blob-layer total) is not supported — I grepped the whole repo for any such comment and found none; the in-file comment states the opposite. The agent-facing description ("Content is dedup'd by hash — multiple file_save calls with identical bytes share storage") reflects the blob-layer dedup, which is independent of the per-row quota count. No current correctness bug — not a blocker.

Neither finding survives verification.

🧯 Error handling & edge cases — Blocking issues found

VERDICT: Blocking issues found

Error-handling & edge-case review of the P3 tools package. Several material gaps; one cross-file inconsistency to flag.

  • tools/summarize.go:198-203 — missing descendantFileGrant escape hatch on file_id input. loadSummarizeInput rejects cross-skill file_ids outright, but file_get.go:74-79, file_get_text.go, and file_get_metadata.go all allow the parent → spawned-worker artifact path via descendantFileGrant. A parent orchestrator (the agent_spawn use case file_descendant_grant.go was added for) can read_page/file_save a worker's chart but cannot summarize it via file_id — the LLM is forced to re-read bytes inline. The package's own doc-comment for file_descendant_grant.go says "you may touch the artifacts of workers you (transitively) dispatched" but summarize silently breaks that contract. Fix: type-assert the storage dep to DescendantRunChecker the same way file_get does and fall through when the owner is a descendant.

  • tools/kv_set.go:93-114 + tools/file_save.go:118-130 — TOCTOU race in quota enforcement. Both gates do used := storage.UsageBytes(...) → arithmetic check → write, with no locking. Two concurrent calls in the same run can both observe used < kvMax and both commit, pushing total storage past the cap. The PR description even names this "per-skill byte quota" as a hard gate, but it isn't atomic. Suggested fix: serialize the check+write under a per-skill mutex in the production adapter (e.g., gormStorage.QuotaLock(skillID)), or convert to an atomic SQL UPDATE with a WHERE-clause guard and retry on conflict.

  • tools/file_delete.go:52-54 — inconsistent cross-skill handling (no descendant grant, unlike file_get/file_get_text/file_get_metadata). Probably defensible (delete is destructive; explicit non-inheritance is the safer default), but the asymmetry with the read tools should be a deliberate choice, not an accident — and the package doc on file_descendant_grant.go doesn't carve out a delete exception. Either add a code comment marking the asymmetry intentional, or wire the descendant check (matching the other tools).

  • tools/research_defaults.go:63 — comment references a non-existent constant. NewInMemorySearchBudget's comment says "kinds missing from the map fall back to maxPerKindDefault" but no such constant exists in the package; CheckAndIncrement hardcodes max = 10 (line 64). Either rename the inline 10 to a named constant matching the comment, or fix the comment. Verified by grep — no maxPerKindDefault symbol exists.

  • tools/think.go:71var _ = fmt.Errorf is a "keep the import alive" hack with no actual use. The only fmt reference is the unused-import keeper itself. The empty-thought branch returns a hand-rolled JSON literal, never formats anything. Remove the fmt import and the _ = line. Verified by reading lines 1-72 — fmt.Errorf is not called anywhere in the file.

  • tools/kv_set.go:81-83 — silent accept of null value. json.Unmarshal([]byte("null"), &probe) succeeds with probe == nil. Combined with the kv_get path returning the raw Value bytes verbatim, an agent that stores null reads back "null" (a 4-byte JSON literal string, not a JSON null). It's coherent at the wire level, but the schema comment ("object, array, string, number, bool, or null") advertises a value type the storage layer doesn't round-trip — kv_get of a null row returns the string "null", not the JSON null. Either reject null at kv_set time, or have kv_get return it as the literal null (unquoted). Minor, but the asymmetry between null and other JSON values is a sharp edge.

  • tools/summarize.go:208-214 + tools/classify.go:108-109 — byte-truncation can split a UTF-8 multibyte rune. text[:summarizeMaxInputBytes] and the matching classify slice happen at byte boundaries, not rune boundaries. A 32 KiB Chinese / emoji string truncates into invalid UTF-8 that the LLM then sees as replacement chars. Fix: use utf8.DecodeLastRune to walk back to a valid boundary, or convert via []rune first.

  • tools/cite.go:87-95 — storage errors swallowed into the response Reason field but not surfaced as error. A failed GetTouchedURLs returns reason: "touched_lookup_failed: <err>" rather than the standard error field. The agent's reaction logic must special-case the reason string. Minor consistency issue: every other cite failure mode uses Reason; a transport error is structurally indistinguishable from a validation failure, which is fine, but if RecordCitation ever returns a non-recoverable error (e.g., DB down), the agent silently keeps retrying. Consider at minimum returning ("", error) for GetTouchedURLs/RecordCitation failures so the executor surfaces a real error and can decide whether to retry or abort.

  • tools/store.go:46-49RegisterStore silently installs staticQuota when d.Quota == nil despite the QuotaProvider interface comment saying nil is "do NOT enforce". staticQuota.EffectiveQuota returns 64 << 20 / 1 << 30 regardless of caller. A host that reads quota_provider.go's "nil-safe: tools constructed against a nil QuotaProvider do NOT enforce" and passes StoreDeps{Quota: nil} expecting disabled enforcement actually gets a 64 MiB KV / 1 GiB file cap on every skill. The doc on StoreDeps itself says "defaults to a generous static cap" (correct), but the QuotaProvider interface comment (line 22 of quota_provider.go) contradicts that. Fix: align the QuotaProvider interface comment to the actual wrap-time default, or rename staticQuotadefaultStaticQuota and document it as "always-on, not skipped".

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 34s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **VERDICT:** No material issues found Findings through the 🔒 security lens (host-agnostic generic tool library): - **tools/file_search.go:73 — `file_search` is admin-only + share-safe.** The handler at tools/file_search.go:110 pins `skillID` to `inv.SkillID`, so a non-admin caller invoking an admin-authored public skill can only search that skill's bucket. The doc-comment claim matches the implementation. Fail-closed type-assertion in tools/file_descendant_grant.go:48 keeps the cross-skill escape hatch off for fakes that don't implement `DescendantRunChecker`. Verified by reading both files. - **tools/scope_validate.go — strict scope authorisation.** Every code path I traced (`kv_get`, `kv_set`, `kv_list`, `kv_delete`, `file_save`, `file_get`, `file_get_text`, `file_get_metadata`, `file_list`, `file_delete`, `create_file_url`, `summarize`) reaches `ValidateScope(inv, scope, false)` before any storage call, and `isAdmin=false` is constant — no path can be coerced via an `Invocation` field into another user's bucket. `user:`, `run:`, `root_run:` cases I read in tools/scope_validate.go:48-103 reject mismatch; parent-run and continuation allowances use exact-string equality, not prefix match. - **tools/create_file_url.go — token randomness + cross-skill guard.** tools/create_file_url.go:170 reads 32 bytes from `crypto/rand` (~190 bits) and base64url-encodes; tools/create_file_url.go:118 does the cross-skill rejection + descendant grant before persisting. The default + max caps (`DefaultFileURLExpiry`, `MaxFileURLExpiry`, `MaxFileURLViews`) are applied before any DB write. Verified. - **tools/kv_set.go — quota delta math + scope partition.** tools/kv_set.go:107 subtracts prior value length on UPDATE only inside the per-skill (non-root_run) branch; the root_run partition is excluded because per-skill attribution is meaningless there. The type-assertion for `KVHistoryRecorder` is gated on storage succeeding — a failed history write cannot shadow the kv_set return. Verified. - **tools/now.go — `time.LoadLocation` on untrusted string.** tools/now.go:79 calls `time.LoadLocation(tzName)` with a caller-supplied IANA name. Go's stdlib does tzdata lookups only (no arbitrary file IO), and unknown zones fall back to UTC with a `warning` — not a security issue. - **tools/classify.go / tools/extract_entities.go — prompt composition with caller text.** `Text`, `Categories`, and field `Description` are concatenated into the user prompt verbatim (tools/classify.go:180-186, tools/extract_entities.go:259-267). This is the standard extraction-tool shape and the budgets gate cost; no system-prompt injection vector (system prompt is fixed; user text is on the user side). The coerced output is returned to the agent's context — downstream prompt-injection surface, inherent to LLM extract/classification, not introduced by this PR. - **tools/file_save.go — base64 decode + size cap.** tools/file_save.go:107 uses `base64.StdEncoding.DecodeString` and rejects on parse error; size cap is checked AFTER decode so an oversized-but-well-formed payload is caught. Per-file cap is the only line of defence when `quota` is nil, which is the documented test/admin opt-out path. - **tools/cite.go — `url_not_in_run_history` rejection.** tools/cite.go:97 reads the per-run touched-URL set via `CitationStorage.GetTouchedURLs` and rejects any URL not in it, preventing fabricated citations. The `no_run_context` branch (tools/cite.go:82) bails loudly when `inv.RunID == ""`, so an empty run context cannot smuggle a cite through. - **tools/think.go — empty-thought rejection.** tools/think.go:60 returns a structured `{"ok":false,"error":"empty_thought"}` rather than silently accepting, closing off the "free spam" foot-gun noted in the doc-comment. - **tools/scope_validate.go:60 — empty `user:`/`run:`/`root_run:` rejection.** tools/scope_validate.go:54, 64, 83 each reject the bare-prefix shape with a descriptive error, closing off trivial `"user:"` matches. No authn/authz gap, no SQL/command/path injection, no SSRF surface (network tools are deferred to a follow-up PR per the description), no unsafe deserialization, no secret/credential handling, and no insecure default that I can confirm by reading the code. Outside my lens I noticed nothing severe enough to flag. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> ## Review — Correctness Lens **VERDICT: No material issues found** Verification results: - **Finding 1 (tools/kv_set.go:115, partition mismatch) — DROPPED.** Read of `tools/kv_set.go` lines 85–115 confirms the gate `if quota != nil && partition == inv.SkillID` (line 93) is the only place the inner block runs; inside that branch `partition == inv.SkillID` is a tautology, so the `KVGet(..., inv.SkillID, ...)` call on line 107 is equivalent to passing `partition`. The draft itself concedes "the call is internally consistent" and frames the issue as a future-refactor hazard ("if a future refactor loosens the gating"). No current correctness bug — not a blocker. - **Finding 2 (tools/file_save.go:127, dedup ignored) — DROPPED.** Read of `tools/file_save.go` lines 24–27 (the in-file doc comment) explicitly states: "we do NOT subtract a 'prior' value here the way kv_set does: file_save always inserts a new file row (content-addressable dedup is at the blob layer, not the row layer), so every save is additive to FileUsageBytes." That is, `FileUsageBytes` measures ROWS (not dedup'd blob bytes), and each `file_save` does insert a new row, so the per-row quota charge is correct as written. The draft's premise ("as the comment elsewhere implies" `FileUsageBytes` returns dedup'd blob-layer total) is not supported — I grepped the whole repo for any such comment and found none; the in-file comment states the opposite. The agent-facing description ("Content is dedup'd by hash — multiple file_save calls with identical bytes share storage") reflects the blob-layer dedup, which is independent of the per-row quota count. No current correctness bug — not a blocker. Neither finding survives verification. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Blocking issues found</summary> **VERDICT: Blocking issues found** Error-handling & edge-case review of the P3 tools package. Several material gaps; one cross-file inconsistency to flag. - **`tools/summarize.go:198-203` — missing `descendantFileGrant` escape hatch on file_id input.** `loadSummarizeInput` rejects cross-skill file_ids outright, but `file_get.go:74-79`, `file_get_text.go`, and `file_get_metadata.go` all allow the parent → spawned-worker artifact path via `descendantFileGrant`. A parent orchestrator (the agent_spawn use case `file_descendant_grant.go` was added for) can `read_page`/`file_save` a worker's chart but cannot summarize it via `file_id` — the LLM is forced to re-read bytes inline. The package's own doc-comment for `file_descendant_grant.go` says "you may touch the artifacts of workers you (transitively) dispatched" but summarize silently breaks that contract. Fix: type-assert the storage dep to `DescendantRunChecker` the same way `file_get` does and fall through when the owner is a descendant. - **`tools/kv_set.go:93-114` + `tools/file_save.go:118-130` — TOCTOU race in quota enforcement.** Both gates do `used := storage.UsageBytes(...)` → arithmetic check → write, with no locking. Two concurrent calls in the same run can both observe `used < kvMax` and both commit, pushing total storage past the cap. The PR description even names this "per-skill byte quota" as a hard gate, but it isn't atomic. Suggested fix: serialize the check+write under a per-skill mutex in the production adapter (e.g., `gormStorage.QuotaLock(skillID)`), or convert to an atomic SQL UPDATE with a WHERE-clause guard and retry on conflict. - **`tools/file_delete.go:52-54` — inconsistent cross-skill handling (no descendant grant, unlike file_get/file_get_text/file_get_metadata).** Probably defensible (delete is destructive; explicit non-inheritance is the safer default), but the asymmetry with the read tools should be a deliberate choice, not an accident — and the package doc on `file_descendant_grant.go` doesn't carve out a delete exception. Either add a code comment marking the asymmetry intentional, or wire the descendant check (matching the other tools). - **`tools/research_defaults.go:63` — comment references a non-existent constant.** `NewInMemorySearchBudget`'s comment says "kinds missing from the map fall back to maxPerKindDefault" but no such constant exists in the package; `CheckAndIncrement` hardcodes `max = 10` (line 64). Either rename the inline `10` to a named constant matching the comment, or fix the comment. Verified by grep — no `maxPerKindDefault` symbol exists. - **`tools/think.go:71` — `var _ = fmt.Errorf` is a "keep the import alive" hack with no actual use.** The only `fmt` reference is the unused-import keeper itself. The empty-thought branch returns a hand-rolled JSON literal, never formats anything. Remove the `fmt` import and the `_ =` line. Verified by reading lines 1-72 — `fmt.Errorf` is not called anywhere in the file. - **`tools/kv_set.go:81-83` — silent accept of `null` value.** `json.Unmarshal([]byte("null"), &probe)` succeeds with `probe == nil`. Combined with the `kv_get` path returning the raw `Value` bytes verbatim, an agent that stores `null` reads back `"null"` (a 4-byte JSON literal string, not a JSON null). It's coherent at the wire level, but the schema comment ("object, array, string, number, bool, or null") advertises a value type the storage layer doesn't round-trip — `kv_get` of a `null` row returns the string `"null"`, not the JSON null. Either reject `null` at `kv_set` time, or have `kv_get` return it as the literal `null` (unquoted). Minor, but the asymmetry between `null` and other JSON values is a sharp edge. - **`tools/summarize.go:208-214` + `tools/classify.go:108-109` — byte-truncation can split a UTF-8 multibyte rune.** `text[:summarizeMaxInputBytes]` and the matching classify slice happen at byte boundaries, not rune boundaries. A 32 KiB Chinese / emoji string truncates into invalid UTF-8 that the LLM then sees as replacement chars. Fix: use `utf8.DecodeLastRune` to walk back to a valid boundary, or convert via `[]rune` first. - **`tools/cite.go:87-95` — storage errors swallowed into the response `Reason` field but not surfaced as `error`.** A failed `GetTouchedURLs` returns `reason: "touched_lookup_failed: <err>"` rather than the standard `error` field. The agent's reaction logic must special-case the `reason` string. Minor consistency issue: every other cite failure mode uses `Reason`; a transport error is structurally indistinguishable from a validation failure, which is fine, but if `RecordCitation` ever returns a non-recoverable error (e.g., DB down), the agent silently keeps retrying. Consider at minimum returning `("", error)` for `GetTouchedURLs`/`RecordCitation` failures so the executor surfaces a real error and can decide whether to retry or abort. - **`tools/store.go:46-49` — `RegisterStore` silently installs `staticQuota` when `d.Quota == nil` despite the `QuotaProvider` interface comment saying nil is "do NOT enforce".** `staticQuota.EffectiveQuota` returns `64 << 20` / `1 << 30` regardless of caller. A host that reads `quota_provider.go`'s "nil-safe: tools constructed against a nil QuotaProvider do NOT enforce" and passes `StoreDeps{Quota: nil}` expecting disabled enforcement actually gets a 64 MiB KV / 1 GiB file cap on every skill. The doc on `StoreDeps` itself says "defaults to a generous static cap" (correct), but the `QuotaProvider` interface comment (line 22 of `quota_provider.go`) contradicts that. Fix: align the `QuotaProvider` interface comment to the actual wrap-time default, or rename `staticQuota` → `defaultStaticQuota` and document it as "always-on, not skipped". </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 34s</sub>

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

All findings confirmed against the actual code. The corrected review:

VERDICT: Blocking issues found

Security lens findings

1. ValidateScope ignores CallerIsAdmin — every tool hard-codes false, so admin overrides are dead code

tools/scope_validate.go:42ValidateScope(inv, scope, isAdmin) takes an explicit isAdmin bool, but every production caller passes the literal false. Verified via grep across all 11 call sites: file_save.go:87, file_list.go:44, file_get.go:85, file_get_metadata.go:73, file_get_text.go:80, file_delete.go:55, file_search.go:101, kv_get.go:46, kv_set.go:62, kv_list.go:52, kv_delete.go:36, summarize.go:201.

The comment at scope_validate.go:20-24 and :36-40 claims "production tools always pass isAdmin=false" and "Invocation does NOT carry an admin flag." That is stale: tool/registry.go:188 shows Invocation.CallerIsAdmin exists and is populated by the executor (documented as backed by Bot.GetMember), and is already read by code_exec for v15 admin-only network mode. The correct, available value is inv.CallerIsAdmin, and it is dropped on the floor everywhere.

Security impact: the entire admin branch of the authorization matrix (user:<other> allowed for admin at scope_validate.go:53, run:<other> allowed for admin at :107, root_run:<other> allowed for admin at :72) is unreachable in production. The documented ruleset (file doc lines 5-12) promises admin escalation, but it does not actually exist. This is a defense-in-depth gap: fail-closed today (safe), but any future reliance on admin status via these tools silently fails. One-token fix: pass inv.CallerIsAdmin instead of false at each call site, and correct the stale comments.

2. create_file_url missing scope check — a caller can mint a public URL for a file bypassing row-scope authorization

tools/create_file_url.go:109-130 — the handler fetches FileDomainMeta via FileGet, checks meta.SkillID != inv.SkillID with the descendant grant (descendantFileGrant), but never calls ValidateScope. Grep confirms no ValidateScope reference anywhere in create_file_url.go. Compare every other file-access tool: file_get.go:85, file_get_metadata.go:73, file_get_text.go:80, summarize.go:201 all run ValidateScope(inv, meta.Scope, false) after the cross-skill check; file_save.go:87 checks scope before writing.

Security impact: create_file_url is a publication primitive — it mints an unauthenticated mort.sh/files/<token> link. A skill that legitimately owns a file stored under user:bob scope can mint a public URL for it, whereas file_get correctly rejects the same caller reading that file's bytes because ValidateScope enforces user:<id> == inv.CallerID. Same-skill ownership (meta.SkillID == inv.SkillID) passes the cross-skill check at :127, then falls straight through to token minting with no scope gate. A file saved under run:<other_run_id> or user:<other> by the same skill becomes publicly publishable without the scope authorization every read path enforces. This is the one file-access tool that drops the scope check, and it drops it at exactly the wrong (publication) boundary.

Suggested fix: mirror file_get_metadata's pattern — track grantedViaDescendant, and when the file is owned by this skill (meta.SkillID == inv.SkillID), still run ValidateScope(inv, meta.Scope, inv.CallerIsAdmin) before minting the public link; skip it only for the descendant-grant case.

3. file_search lets an empty scope bypass ValidateScope and reach the searcher unvalidated

tools/file_search.go:99-104 — when args.Scope == "", the scope check is skipped and "" is passed straight to searcher.SearchFiles(ctx, inv.SkillID, "", ...). The handler pins skill_id to inv.SkillID (the primary cross-skill boundary), so the skill_id dimension is safe. But the scope dimension is left to the FileSearcher backend to interpret: an empty scope handed to a backend that treats "" as "all scopes within this skill" could surface rows from user:<other> scopes that ValidateScope would have rejected for a non-admin caller.

Security impact: minor / defense-in-depth — the skill_id pin is the primary boundary and is enforced here (:110). The empty-scope path is the one place a caller-supplied scope string flows into storage without validation. The FileSearcher backend is not in this PR (lives in mort.go), so whether "" is interpreted as cross-scope is an assumption about the future adapter. Worth either rejecting empty scope or documenting that the backend must not interpret "" as cross-scope.

🎯 Correctness — Blocking issues found

Both findings confirmed against the actual code. Verifying the remaining details and recomputing.

VERDICT: Blocking issues found

  • tools/create_file_url.go:120-129 — missing scope check (confirmed). file_get (line 85), file_get_text (line 80), file_get_metadata (line 73), and file_delete (line 55) all call ValidateScope(inv, meta.Scope, false) after the SkillID/descendant check. create_file_url does not: when meta.SkillID == inv.SkillID it falls straight through from line 129 to minting a public URL at line 154. A file saved by the same skill under user:<callerA> can therefore be published to a public /files/<token> URL by a different caller (callerB) of that shared skill who only knows the file_id. This breaks the per-user scope isolation the other four file tools enforce and is a concrete cross-user exposure — public link minting, not merely a read. Fix: track grantedViaDescendant (as the sibling tools do) and add if !grantedViaDescendant { if err := ValidateScope(inv, meta.Scope, false); err != nil { return "", fmt.Errorf("create_file_url: %w", err) } } after the skill/descendant branch, mirroring file_get.go:73-87.

  • tools/classify.go:286-298 — single-label returns a "winner" even when all scores are 0 (confirmed). selectClassifyLabels initialises bestScore := -1.0 and iterates with scores[c] > bestScore. If the LLM reply yields all-zero scores (empty/absent keys default to 0 in normaliseClassifyScores), the first category always satisfies 0 > -1, so the tool returns labels: [firstCategory] with scores all 0. The bestCat == "" guard at line 295 only fires when categories is empty, which earlier validation (line ~140) already rejects, so it never triggers here. An agent consuming just labels[] gets a false-positive classification. Consider returning nil (or surfacing an error) when the best score is 0/below a small epsilon so "no category fit" is distinguishable from "category A won".

  • tools/file_delete.go:52-53 vs the other file tools — descendant-grant inconsistency (confirmed). file_get/file_get_text/file_get_metadata/create_file_url honour descendantFileGrant for cross-skill access; file_delete rejects outright when meta.SkillID != inv.SkillID with no descendant escape hatch. Deleting a worker's artifact is arguably a distinct act from reading/publishing it, so the omission may be deliberate — but it's worth a one-line comment in file_delete stating the omission is intentional, since the package's stated blanket rule otherwise routes every cross-skill site through descendantFileGrant.

🧯 Error handling & edge cases — Minor issues

All three findings confirmed against the actual code. Outputting the corrected review.

VERDICT: Minor issues

  • tools/tools.go:59MetaDeps.MaxPerRun is silently ignored via RegisterMeta. When d.Budget is nil (the normal path), RegisterMeta constructs NewInMemorySearchBudget(nil) — an empty caps map (tools.go:59, research_defaults.go:47-54) — and passes that shared budget to all three meta tools. Inside each handler, the if budget == nil fallback that would have consulted cfg.MaxPerRun(ctx) is skipped (budget is non-nil, classify.go:118, extract_entities.go:128, summarize.go:108), so CheckAndIncrement lands in research_defaults.go:62-64 where b.cap[kind] is 0 → hardcoded max = 10. The fixedMetaConfig.MaxPerRun value derived from MetaDeps.MaxPerRun (tools.go:67,88) is therefore dead code in the RegisterMeta path. A host that sets MetaDeps.MaxPerRun = 50 expecting 50 calls/run still gets capped at 10. The default (10) happens to coincide, masking the bug. Fix: RegisterMeta should build the budget with the configured caps, e.g. NewInMemorySearchBudget(map[string]int{"classify": d.MaxPerRun, "extract_entities": d.MaxPerRun, "summarize": d.MaxPerRun}).

  • tools/create_file_url.go:133-138ExpiresInSeconds overflow bypasses the max-expiry cap. expiry = time.Duration(args.ExpiresInSeconds) * time.Second (create_file_url.go:134) can overflow int64 nanoseconds for very large int values (e.g. near math.MaxInt64), producing a negative time.Duration. The subsequent guard if expiry > MaxFileURLExpiry (create_file_url.go:136) is then false (negative < positive), so the cap is skipped and time.Now().Add(expiry) (create_file_url.go:139) yields a past expiresAt — the token is minted already-expired. The doc promises "Max 2592000 (30 days)" but the clamp only handles the over-but-positive case, not overflow/negative. Suggested fix: clamp args.ExpiresInSeconds to [0, int(MaxFileURLExpiry/time.Second)] before the multiply, or check expiry <= 0 after the multiply and fall back to the default.

  • tools/classify.go:118, extract_entities.go:128, summarize.go:108 — per-call budget fallback defeats enforcement when a tool is constructed directly with nil budget. Each handler, when budget == nil, builds a fresh InMemorySearchBudget on every invocation (classify.go:123-125, extract_entities.go:133-135, summarize.go:113-115), so the counter always starts at 0 → never exceeds → no effective cap. This is a footgun for hosts that construct the tools directly (rather than via RegisterMeta, which supplies a shared non-nil budget). The nil-safe claim ("budget defaults to an in-memory per-run cap") is misleading because the per-call allocation makes it per-call, not per-run. Not blocking (RegisterMeta is the documented path), but worth a comment or a constructor-level default so direct construction is still bounded.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 6m 32s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Blocking issues found</summary> All findings confirmed against the actual code. The corrected review: **VERDICT: Blocking issues found** ## Security lens findings ### 1. `ValidateScope` ignores `CallerIsAdmin` — every tool hard-codes `false`, so admin overrides are dead code `tools/scope_validate.go:42` — `ValidateScope(inv, scope, isAdmin)` takes an explicit `isAdmin` bool, but **every production caller passes the literal `false`**. Verified via grep across all 11 call sites: `file_save.go:87`, `file_list.go:44`, `file_get.go:85`, `file_get_metadata.go:73`, `file_get_text.go:80`, `file_delete.go:55`, `file_search.go:101`, `kv_get.go:46`, `kv_set.go:62`, `kv_list.go:52`, `kv_delete.go:36`, `summarize.go:201`. The comment at `scope_validate.go:20-24` and `:36-40` claims "production tools always pass isAdmin=false" and "Invocation does NOT carry an admin flag." That is stale: `tool/registry.go:188` shows `Invocation.CallerIsAdmin` exists and is populated by the executor (documented as backed by `Bot.GetMember`), and is already read by `code_exec` for v15 admin-only network mode. The correct, available value is `inv.CallerIsAdmin`, and it is dropped on the floor everywhere. **Security impact:** the entire admin branch of the authorization matrix (`user:<other>` allowed for admin at `scope_validate.go:53`, `run:<other>` allowed for admin at `:107`, `root_run:<other>` allowed for admin at `:72`) is unreachable in production. The documented ruleset (file doc lines 5-12) promises admin escalation, but it does not actually exist. This is a defense-in-depth gap: fail-closed today (safe), but any future reliance on admin status via these tools silently fails. One-token fix: pass `inv.CallerIsAdmin` instead of `false` at each call site, and correct the stale comments. ### 2. `create_file_url` missing scope check — a caller can mint a public URL for a file bypassing row-scope authorization `tools/create_file_url.go:109-130` — the handler fetches `FileDomainMeta` via `FileGet`, checks `meta.SkillID != inv.SkillID` with the descendant grant (`descendantFileGrant`), but **never calls `ValidateScope`**. Grep confirms no `ValidateScope` reference anywhere in `create_file_url.go`. Compare every other file-access tool: `file_get.go:85`, `file_get_metadata.go:73`, `file_get_text.go:80`, `summarize.go:201` all run `ValidateScope(inv, meta.Scope, false)` after the cross-skill check; `file_save.go:87` checks scope before writing. **Security impact:** `create_file_url` is a *publication* primitive — it mints an unauthenticated `mort.sh/files/<token>` link. A skill that legitimately owns a file stored under `user:bob` scope can mint a public URL for it, whereas `file_get` correctly rejects the same caller reading that file's bytes because `ValidateScope` enforces `user:<id>` == `inv.CallerID`. Same-skill ownership (`meta.SkillID == inv.SkillID`) passes the cross-skill check at `:127`, then falls straight through to token minting with no scope gate. A file saved under `run:<other_run_id>` or `user:<other>` by the same skill becomes publicly publishable without the scope authorization every read path enforces. This is the one file-access tool that drops the scope check, and it drops it at exactly the wrong (publication) boundary. **Suggested fix:** mirror `file_get_metadata`'s pattern — track `grantedViaDescendant`, and when the file is owned by this skill (`meta.SkillID == inv.SkillID`), still run `ValidateScope(inv, meta.Scope, inv.CallerIsAdmin)` before minting the public link; skip it only for the descendant-grant case. ### 3. `file_search` lets an empty `scope` bypass `ValidateScope` and reach the searcher unvalidated `tools/file_search.go:99-104` — when `args.Scope == ""`, the scope check is skipped and `""` is passed straight to `searcher.SearchFiles(ctx, inv.SkillID, "", ...)`. The handler pins `skill_id` to `inv.SkillID` (the primary cross-skill boundary), so the `skill_id` dimension is safe. But the *scope* dimension is left to the `FileSearcher` backend to interpret: an empty scope handed to a backend that treats `""` as "all scopes within this skill" could surface rows from `user:<other>` scopes that `ValidateScope` would have rejected for a non-admin caller. **Security impact:** minor / defense-in-depth — the `skill_id` pin is the primary boundary and is enforced here (`:110`). The empty-scope path is the one place a caller-supplied scope string flows into storage without validation. The `FileSearcher` backend is not in this PR (lives in `mort.go`), so whether `""` is interpreted as cross-scope is an assumption about the future adapter. Worth either rejecting empty scope or documenting that the backend must not interpret `""` as cross-scope. </details> <details><summary><b>🎯 Correctness</b> — Blocking issues found</summary> Both findings confirmed against the actual code. Verifying the remaining details and recomputing. **VERDICT: Blocking issues found** - **`tools/create_file_url.go:120-129` — missing scope check (confirmed).** `file_get` (line 85), `file_get_text` (line 80), `file_get_metadata` (line 73), and `file_delete` (line 55) all call `ValidateScope(inv, meta.Scope, false)` after the `SkillID`/descendant check. `create_file_url` does not: when `meta.SkillID == inv.SkillID` it falls straight through from line 129 to minting a public URL at line 154. A file saved by the same skill under `user:<callerA>` can therefore be published to a public `/files/<token>` URL by a *different* caller (`callerB`) of that shared skill who only knows the `file_id`. This breaks the per-user scope isolation the other four file tools enforce and is a concrete cross-user exposure — public link minting, not merely a read. Fix: track `grantedViaDescendant` (as the sibling tools do) and add `if !grantedViaDescendant { if err := ValidateScope(inv, meta.Scope, false); err != nil { return "", fmt.Errorf("create_file_url: %w", err) } }` after the skill/descendant branch, mirroring `file_get.go:73-87`. - **`tools/classify.go:286-298` — single-label returns a "winner" even when all scores are 0 (confirmed).** `selectClassifyLabels` initialises `bestScore := -1.0` and iterates with `scores[c] > bestScore`. If the LLM reply yields all-zero scores (empty/absent keys default to 0 in `normaliseClassifyScores`), the first category always satisfies `0 > -1`, so the tool returns `labels: [firstCategory]` with `scores` all `0`. The `bestCat == ""` guard at line 295 only fires when `categories` is empty, which earlier validation (line ~140) already rejects, so it never triggers here. An agent consuming just `labels[]` gets a false-positive classification. Consider returning `nil` (or surfacing an error) when the best score is `0`/below a small epsilon so "no category fit" is distinguishable from "category A won". - **`tools/file_delete.go:52-53` vs the other file tools — descendant-grant inconsistency (confirmed).** `file_get`/`file_get_text`/`file_get_metadata`/`create_file_url` honour `descendantFileGrant` for cross-skill access; `file_delete` rejects outright when `meta.SkillID != inv.SkillID` with no descendant escape hatch. Deleting a worker's artifact is arguably a distinct act from reading/publishing it, so the omission may be deliberate — but it's worth a one-line comment in `file_delete` stating the omission is intentional, since the package's stated blanket rule otherwise routes every cross-skill site through `descendantFileGrant`. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> All three findings confirmed against the actual code. Outputting the corrected review. VERDICT: Minor issues - **`tools/tools.go:59` — `MetaDeps.MaxPerRun` is silently ignored via `RegisterMeta`.** When `d.Budget` is nil (the normal path), `RegisterMeta` constructs `NewInMemorySearchBudget(nil)` — an empty caps map (`tools.go:59`, `research_defaults.go:47-54`) — and passes that shared budget to all three meta tools. Inside each handler, the `if budget == nil` fallback that would have consulted `cfg.MaxPerRun(ctx)` is skipped (budget is non-nil, `classify.go:118`, `extract_entities.go:128`, `summarize.go:108`), so `CheckAndIncrement` lands in `research_defaults.go:62-64` where `b.cap[kind]` is 0 → hardcoded `max = 10`. The `fixedMetaConfig.MaxPerRun` value derived from `MetaDeps.MaxPerRun` (`tools.go:67,88`) is therefore dead code in the `RegisterMeta` path. A host that sets `MetaDeps.MaxPerRun = 50` expecting 50 calls/run still gets capped at 10. The default (10) happens to coincide, masking the bug. Fix: `RegisterMeta` should build the budget with the configured caps, e.g. `NewInMemorySearchBudget(map[string]int{"classify": d.MaxPerRun, "extract_entities": d.MaxPerRun, "summarize": d.MaxPerRun})`. - **`tools/create_file_url.go:133-138` — `ExpiresInSeconds` overflow bypasses the max-expiry cap.** `expiry = time.Duration(args.ExpiresInSeconds) * time.Second` (`create_file_url.go:134`) can overflow `int64` nanoseconds for very large `int` values (e.g. near `math.MaxInt64`), producing a negative `time.Duration`. The subsequent guard `if expiry > MaxFileURLExpiry` (`create_file_url.go:136`) is then false (negative < positive), so the cap is skipped and `time.Now().Add(expiry)` (`create_file_url.go:139`) yields a past `expiresAt` — the token is minted already-expired. The doc promises "Max 2592000 (30 days)" but the clamp only handles the over-but-positive case, not overflow/negative. Suggested fix: clamp `args.ExpiresInSeconds` to `[0, int(MaxFileURLExpiry/time.Second)]` before the multiply, or check `expiry <= 0` after the multiply and fall back to the default. - **`tools/classify.go:118`, `extract_entities.go:128`, `summarize.go:108` — per-call budget fallback defeats enforcement when a tool is constructed directly with `nil` budget.** Each handler, when `budget == nil`, builds a *fresh* `InMemorySearchBudget` on every invocation (`classify.go:123-125`, `extract_entities.go:133-135`, `summarize.go:113-115`), so the counter always starts at 0 → never exceeds → no effective cap. This is a footgun for hosts that construct the tools directly (rather than via `RegisterMeta`, which supplies a shared non-nil budget). The nil-safe claim ("budget defaults to an in-memory per-run cap") is misleading because the per-call allocation makes it per-call, not per-run. Not blocking (RegisterMeta is the documented path), but worth a comment or a constructor-level default so direct construction is still bounded. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 6m 32s</sub>

🪰 Gadfly review — deepseek-v4-flash:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

🔒 Security Review — Verdict: Blocking issues found

1. tools/create_file_url.go:127 — Missing scope validation allows cross-user public URL minting

Confirmed. The create_file_url handler checks cross-skill ownership (line 127) but never calls ValidateScope(inv, meta.Scope, false) after fetching the file metadata. Every other file-access tool (file_get.go:85, file_get_metadata.go:73, file_get_text.go:80, file_delete.go:55, summarize.go:201) performs this scope check.

Impact: A caller in skill X can mint a public, unauthenticated URL for any file in skill X, regardless of its scope. For example, if a file was saved with scope user:bob, caller alice (same skill) can still create a public URL for it. The file's bytes become accessible to anyone with the link, bypassing the per-user isolation that ValidateScope enforces. This is a privilege escalation / access control bypass.

Suggested fix: Add a scope validation after the cross-skill check, matching the pattern in file_get.go:

// After the cross-skill check (line 129), add:
if err := ValidateScope(inv, meta.Scope, false); err != nil {
    return "", fmt.Errorf("create_file_url: %w", err)
}

The descendant-grant exception should also be handled: when descendantFileGrant passes, the scope check should be skipped (as file_get.go does with grantedViaDescendant), because the worker's file scope is the worker's run, not the caller's.


2. tools/file_delete.go:52 — Missing descendant grant for cross-skill file deletion (fail-closed, lower severity)

Confirmed. file_delete.go rejects any file where meta.SkillID != inv.SkillID (line 52) without consulting descendantFileGrant. All other file-access tools (file_get.go:75, file_get_metadata.go:67, file_get_text.go:74, create_file_url.go:127) allow the descendant-grant exception.

Impact: A parent skill that spawned a worker cannot delete the worker's files, even though it can read them. This is fail-closed (safe by default) but breaks the parent/worker handoff pattern that the descendant grant was designed to support. A worker's orphaned files would accumulate until the run-scope sweeper cleans them.

Suggested fix: Add the descendant grant check before the cross-skill rejection, matching the pattern in file_get.go:

if meta.SkillID != inv.SkillID {
    if !descendantFileGrant(ctx, storage, inv, meta.SkillID) {
        return "", fmt.Errorf("file_delete: file does not belong to this skill")
    }
}

3. tools/research_defaults.goInMemorySearchBudget is explicitly not production-correct (documented, but worth flagging)

Confirmed. The InMemorySearchBudget (lines 37–85) is a process-wide map that never resets across runs. The documentation at lines 29–36 explicitly says: "NOT production-correct because the map persists across the process lifetime; production wiring MUST plug a per-run reset." The default wiring in tools.go:59 uses this implementation.

Impact: In a test or misconfigured deployment where this default is used in production, budget counters bleed across runs. A single run could exhaust the budget for all subsequent runs of the same kind (e.g., classify, summarize), causing false "budget exceeded" rejections. This is a documented gap, not a code bug, but it is a security-relevant deployment pitfall.

Suggested fix: Ensure production wiring supplies a per-run-resetting SearchBudget implementation. The InMemorySearchBudget should be restricted to test use only, perhaps by unexporting it or adding a // +build test constraint.

🎯 Correctness — Minor issues

Now I have all the information I need to evaluate each finding.

Evaluation of Finding 1

The draft claims that when budget == nil, a new InMemorySearchBudget is created inside the handler closure, and "the local variable shadows the closure's captured budget", so every call gets a fresh budget.

Looking at the actual code in classify.go lines 118-126:

if budget == nil {
    maxPerRun := classifyFallbackMaxPerRun
    if cfg != nil {
        maxPerRun = cfg.MaxPerRun(ctx)
    }
    budget = NewInMemorySearchBudget(map[string]int{
        "classify": maxPerRun,
    })
}

The budget here is the parameter of NewClassify (line 70). The closure captures this variable. The assignment budget = NewInMemorySearchBudget(...) uses = (not :=), so it modifies the captured variable. There is no new local variable declaration for budget — only maxPerRun uses :=. On the next invocation, budget will be non-nil and the if budget == nil branch is skipped.

The draft's claim that "the local variable shadows the closure's captured budget" is incorrect. The budget is created once on the first nil-budget call and reused thereafter. The per-run cap IS enforced (though the InMemorySearchBudget is process-wide, as documented).

Finding 1 is DROPPED.

Evaluation of Finding 2

The draft claims coerceClassifyScore accepts malformed percent strings like "50%extra". Let me trace:

  1. strings.TrimSpace("50%extra")"50%extra"
  2. strings.TrimSuffix("50%extra", "%")"50extra"
  3. fmt.Sscanf("50extra", "%f", &f) parses 50.0 and stops (partial match), returns err == nil
  4. strings.HasSuffix(strings.TrimSpace(v), "%")"50%extra" ends with "a", not %false

So the draft's specific example is wrong: "50%extra" does NOT end with %, so HasSuffix returns false and the function returns 50.0 (not 0.5). However, the underlying issue is real: Sscanf with %f silently ignores trailing garbage like "50extra", returning 50.0 for a clearly malformed input. The function should validate that the entire string was consumed.

Finding 2 is KEPT (with corrected analysis).

Evaluation of Finding 3

  • tools/file_save.go:46: const defaultFileMaxBytes = 10 * 1024 * 1024 // 10 MiB
  • tools/store.go:37: comment says MaxFileBytes int // file_save per-file cap; default 16 MiB
  • tools/store.go:44-45: if d.MaxFileBytes <= 0 { d.MaxFileBytes = 16 << 20 }

Confirmed: RegisterStore defaults to 16 MiB, NewFileSave called directly with 0 defaults to 10 MiB. Real inconsistency.

Finding 3 is KEPT.


Corrected Review

🎯 Correctness — Verdict: Minor issues

1. coerceClassifyScore percent-string parsing accepts malformed input silently

File: tools/classify.go:249-265

case string:
    s := strings.TrimSuffix(strings.TrimSpace(v), "%")
    var f float64
    if _, err := fmt.Sscanf(s, "%f", &f); err == nil {
        if strings.HasSuffix(strings.TrimSpace(v), "%") {
            f = f / 100.0
        }
        return f, true
    }

fmt.Sscanf with %f parses a leading float and silently ignores trailing garbage. A string like "50extra" (no percent sign) is accepted: s becomes "50extra", Sscanf parses 50.0 and stops, HasSuffix is false, and the function returns (50.0, true). The caller gets a score of 50.0 for a clearly malformed input — the clamping in normaliseClassifyScores would cap it to 1.0, but the value is still silently accepted when it should be rejected.

Impact: An LLM that emits "50extra" instead of 50 gets a score of 1.0 (after clamping) rather than a rejection. This silently corrupts the classification distribution.

Fix: Validate that after removing the % suffix, the remainder is a pure numeric string with no trailing garbage. For example, use strconv.ParseFloat instead of fmt.Sscanf, which rejects trailing characters:

case string:
    trimmed := strings.TrimSpace(v)
    hasPct := strings.HasSuffix(trimmed, "%")
    s := strings.TrimSuffix(trimmed, "%")
    f, err := strconv.ParseFloat(s, 64)
    if err == nil {
        if hasPct {
            f = f / 100.0
        }
        return f, true
    }

2. Minor: defaultFileMaxBytes (10 MiB) vs RegisterStore default (16 MiB) inconsistency

Files: tools/file_save.go:46, tools/store.go:37

NewFileSave falls back to defaultFileMaxBytes = 10 * 1024 * 1024 (10 MiB) when maxFileBytes <= 0. But RegisterStore defaults MaxFileBytes to 16 << 20 (16 MiB) before passing it to NewFileSave. A caller using RegisterStore gets 16 MiB; a caller calling NewFileSave directly with 0 gets 10 MiB.

Not a correctness bug per se (both are valid caps), but the inconsistency is confusing and suggests one of the two constants is stale. The doc comment on StoreDeps.MaxFileBytes says "default 16 MiB", which matches RegisterStore — so defaultFileMaxBytes in file_save.go is likely the stale one.

🧯 Error handling & edge cases — Minor issues

All three findings are confirmed against the actual code. Here is the corrected review.


🧯 Error handling & edge cases — review

Verdict: Minor issues


1. create_file_url.go:181 — Swallowed json.Marshal error

b, _ := json.Marshal(res)

The error from json.Marshal is silently discarded. If marshaling fails (e.g. due to a non-serializable value in the struct), the function returns an empty string with no error, which would confuse the agent loop. Every other tool in this package (file_get.go:97, file_get_metadata.go:84, file_list.go:68, etc.) correctly returns the marshal error.

Fix: Change to b, err := json.Marshal(res) and return the error:

b, err := json.Marshal(res)
if err != nil {
    return "", fmt.Errorf("create_file_url: marshal: %w", err)
}

2. create_file_url.go — Missing ValidateScope call after file fetch

create_file_url fetches the file metadata and checks meta.SkillID against inv.SkillID (cross-skill guard), but it never calls ValidateScope on the file's meta.Scope. Every other file-access tool in this package does:

File Has ValidateScope?
file_get.go:85
file_get_text.go:80
file_get_metadata.go:73
file_delete.go:55
create_file_url.go

Impact: A caller in skill S could mint a public URL for a file that belongs to skill S but was saved under user:bob scope, even when the caller is alice. The cross-skill check passes (same skill), but the scope isolation is bypassed — the file was meant to be private to bob.

Fix: Add the same scope-validation pattern used in file_get.go:

grantedViaDescendant := false
if meta.SkillID != inv.SkillID {
    if !descendantFileGrant(ctx, fileStorage, inv, meta.SkillID) {
        return "", fmt.Errorf("create_file_url: file_id %q does not belong to this skill (cross-skill refs rejected)", args.FileID)
    }
    grantedViaDescendant = true
}
if err := ValidateScope(inv, meta.Scope, false); err != nil && !grantedViaDescendant {
    return "", fmt.Errorf("create_file_url: %w", err)
}

3. Byte-boundary truncation can split multi-byte UTF-8 characters

Three tools truncate input text at a hard byte boundary without regard for UTF-8 rune boundaries:

  • classify.go:114: text = text[:classifyMaxInputBytes] (16 KiB)
  • extract_entities.go:124: text = text[:extractEntitiesMaxInputBytes] (32 KiB)
  • summarize.go:213: return text[:summarizeMaxInputBytes], true, nil (32 KiB)

If the byte boundary falls in the middle of a multi-byte UTF-8 character, the resulting string contains an invalid trailing byte sequence. While LLMs are generally tolerant, this can cause tokenization artifacts or trigger validation warnings in downstream processing.

Fix: Truncate at the nearest rune boundary instead:

func truncateAtRuneBoundary(s string, maxBytes int) string {
    if len(s) <= maxBytes {
        return s
    }
    // Find the last complete rune within the limit.
    s = s[:maxBytes]
    for len(s) > 0 && !utf8.Valid([]byte{s[len(s)-1]}) {
        s = s[:len(s)-1]
    }
    return s
}

(Add "unicode/utf8" to imports.)

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 10m 8s

<!-- gadfly-review:ollama:deepseek-v4-flash:cloud --> ### 🪰 Gadfly review — `deepseek-v4-flash:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Blocking issues found</summary> ## 🔒 Security Review — Verdict: **Blocking issues found** ### 1. `tools/create_file_url.go:127` — Missing scope validation allows cross-user public URL minting **Confirmed.** The `create_file_url` handler checks cross-skill ownership (line 127) but **never calls `ValidateScope(inv, meta.Scope, false)`** after fetching the file metadata. Every other file-access tool (`file_get.go:85`, `file_get_metadata.go:73`, `file_get_text.go:80`, `file_delete.go:55`, `summarize.go:201`) performs this scope check. **Impact:** A caller in skill X can mint a **public, unauthenticated URL** for *any* file in skill X, regardless of its scope. For example, if a file was saved with scope `user:bob`, caller `alice` (same skill) can still create a public URL for it. The file's bytes become accessible to anyone with the link, bypassing the per-user isolation that `ValidateScope` enforces. This is a privilege escalation / access control bypass. **Suggested fix:** Add a scope validation after the cross-skill check, matching the pattern in `file_get.go`: ```go // After the cross-skill check (line 129), add: if err := ValidateScope(inv, meta.Scope, false); err != nil { return "", fmt.Errorf("create_file_url: %w", err) } ``` The descendant-grant exception should also be handled: when `descendantFileGrant` passes, the scope check should be skipped (as `file_get.go` does with `grantedViaDescendant`), because the worker's file scope is the worker's run, not the caller's. --- ### 2. `tools/file_delete.go:52` — Missing descendant grant for cross-skill file deletion (fail-closed, lower severity) **Confirmed.** `file_delete.go` rejects any file where `meta.SkillID != inv.SkillID` (line 52) without consulting `descendantFileGrant`. All other file-access tools (`file_get.go:75`, `file_get_metadata.go:67`, `file_get_text.go:74`, `create_file_url.go:127`) allow the descendant-grant exception. **Impact:** A parent skill that spawned a worker cannot delete the worker's files, even though it can read them. This is fail-closed (safe by default) but breaks the parent/worker handoff pattern that the descendant grant was designed to support. A worker's orphaned files would accumulate until the run-scope sweeper cleans them. **Suggested fix:** Add the descendant grant check before the cross-skill rejection, matching the pattern in `file_get.go`: ```go if meta.SkillID != inv.SkillID { if !descendantFileGrant(ctx, storage, inv, meta.SkillID) { return "", fmt.Errorf("file_delete: file does not belong to this skill") } } ``` --- ### 3. `tools/research_defaults.go` — `InMemorySearchBudget` is explicitly not production-correct (documented, but worth flagging) **Confirmed.** The `InMemorySearchBudget` (lines 37–85) is a process-wide map that never resets across runs. The documentation at lines 29–36 explicitly says: *"NOT production-correct because the map persists across the process lifetime; production wiring MUST plug a per-run reset."* The default wiring in `tools.go:59` uses this implementation. **Impact:** In a test or misconfigured deployment where this default is used in production, budget counters bleed across runs. A single run could exhaust the budget for all subsequent runs of the same kind (e.g., `classify`, `summarize`), causing false "budget exceeded" rejections. This is a documented gap, not a code bug, but it is a security-relevant deployment pitfall. **Suggested fix:** Ensure production wiring supplies a per-run-resetting `SearchBudget` implementation. The `InMemorySearchBudget` should be restricted to test use only, perhaps by unexporting it or adding a `// +build test` constraint. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> Now I have all the information I need to evaluate each finding. ## Evaluation of Finding 1 The draft claims that when `budget == nil`, a new `InMemorySearchBudget` is created inside the handler closure, and "the local variable shadows the closure's captured `budget`", so every call gets a fresh budget. Looking at the actual code in `classify.go` lines 118-126: ```go if budget == nil { maxPerRun := classifyFallbackMaxPerRun if cfg != nil { maxPerRun = cfg.MaxPerRun(ctx) } budget = NewInMemorySearchBudget(map[string]int{ "classify": maxPerRun, }) } ``` The `budget` here is the **parameter** of `NewClassify` (line 70). The closure captures this variable. The assignment `budget = NewInMemorySearchBudget(...)` uses `=` (not `:=`), so it **modifies the captured variable**. There is no new local variable declaration for `budget` — only `maxPerRun` uses `:=`. On the next invocation, `budget` will be non-nil and the `if budget == nil` branch is skipped. The draft's claim that "the local variable shadows the closure's captured budget" is **incorrect**. The budget is created once on the first nil-budget call and reused thereafter. The per-run cap IS enforced (though the `InMemorySearchBudget` is process-wide, as documented). **Finding 1 is DROPPED.** ## Evaluation of Finding 2 The draft claims `coerceClassifyScore` accepts malformed percent strings like `"50%extra"`. Let me trace: 1. `strings.TrimSpace("50%extra")` → `"50%extra"` 2. `strings.TrimSuffix("50%extra", "%")` → `"50extra"` 3. `fmt.Sscanf("50extra", "%f", &f)` parses `50.0` and stops (partial match), returns `err == nil` 4. `strings.HasSuffix(strings.TrimSpace(v), "%")` — `"50%extra"` ends with `"a"`, not `%` → **false** So the draft's specific example is wrong: `"50%extra"` does NOT end with `%`, so `HasSuffix` returns false and the function returns `50.0` (not `0.5`). However, the underlying issue is real: `Sscanf` with `%f` silently ignores trailing garbage like `"50extra"`, returning `50.0` for a clearly malformed input. The function should validate that the entire string was consumed. **Finding 2 is KEPT** (with corrected analysis). ## Evaluation of Finding 3 - `tools/file_save.go:46`: `const defaultFileMaxBytes = 10 * 1024 * 1024 // 10 MiB` - `tools/store.go:37`: comment says `MaxFileBytes int // file_save per-file cap; default 16 MiB` - `tools/store.go:44-45`: `if d.MaxFileBytes <= 0 { d.MaxFileBytes = 16 << 20 }` Confirmed: `RegisterStore` defaults to 16 MiB, `NewFileSave` called directly with `0` defaults to 10 MiB. Real inconsistency. **Finding 3 is KEPT.** --- ## Corrected Review ## 🎯 Correctness — Verdict: **Minor issues** ### 1. `coerceClassifyScore` percent-string parsing accepts malformed input silently **File:** `tools/classify.go:249-265` ```go case string: s := strings.TrimSuffix(strings.TrimSpace(v), "%") var f float64 if _, err := fmt.Sscanf(s, "%f", &f); err == nil { if strings.HasSuffix(strings.TrimSpace(v), "%") { f = f / 100.0 } return f, true } ``` `fmt.Sscanf` with `%f` parses a leading float and silently ignores trailing garbage. A string like `"50extra"` (no percent sign) is accepted: `s` becomes `"50extra"`, `Sscanf` parses `50.0` and stops, `HasSuffix` is false, and the function returns `(50.0, true)`. The caller gets a score of `50.0` for a clearly malformed input — the clamping in `normaliseClassifyScores` would cap it to `1.0`, but the value is still silently accepted when it should be rejected. **Impact:** An LLM that emits `"50extra"` instead of `50` gets a score of `1.0` (after clamping) rather than a rejection. This silently corrupts the classification distribution. **Fix:** Validate that after removing the `%` suffix, the remainder is a pure numeric string with no trailing garbage. For example, use `strconv.ParseFloat` instead of `fmt.Sscanf`, which rejects trailing characters: ```go case string: trimmed := strings.TrimSpace(v) hasPct := strings.HasSuffix(trimmed, "%") s := strings.TrimSuffix(trimmed, "%") f, err := strconv.ParseFloat(s, 64) if err == nil { if hasPct { f = f / 100.0 } return f, true } ``` --- ### 2. Minor: `defaultFileMaxBytes` (10 MiB) vs `RegisterStore` default (16 MiB) inconsistency **Files:** `tools/file_save.go:46`, `tools/store.go:37` `NewFileSave` falls back to `defaultFileMaxBytes = 10 * 1024 * 1024` (10 MiB) when `maxFileBytes <= 0`. But `RegisterStore` defaults `MaxFileBytes` to `16 << 20` (16 MiB) before passing it to `NewFileSave`. A caller using `RegisterStore` gets 16 MiB; a caller calling `NewFileSave` directly with `0` gets 10 MiB. Not a correctness bug per se (both are valid caps), but the inconsistency is confusing and suggests one of the two constants is stale. The doc comment on `StoreDeps.MaxFileBytes` says "default 16 MiB", which matches `RegisterStore` — so `defaultFileMaxBytes` in `file_save.go` is likely the stale one. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> All three findings are confirmed against the actual code. Here is the corrected review. --- ## 🧯 Error handling & edge cases — review **Verdict: Minor issues** --- ### 1. `create_file_url.go:181` — Swallowed `json.Marshal` error ```go b, _ := json.Marshal(res) ``` The error from `json.Marshal` is silently discarded. If marshaling fails (e.g. due to a non-serializable value in the struct), the function returns an empty string with no error, which would confuse the agent loop. Every other tool in this package (`file_get.go:97`, `file_get_metadata.go:84`, `file_list.go:68`, etc.) correctly returns the marshal error. **Fix:** Change to `b, err := json.Marshal(res)` and return the error: ```go b, err := json.Marshal(res) if err != nil { return "", fmt.Errorf("create_file_url: marshal: %w", err) } ``` --- ### 2. `create_file_url.go` — Missing `ValidateScope` call after file fetch `create_file_url` fetches the file metadata and checks `meta.SkillID` against `inv.SkillID` (cross-skill guard), but it **never calls `ValidateScope`** on the file's `meta.Scope`. Every other file-access tool in this package does: | File | Has `ValidateScope`? | |---|---| | `file_get.go:85` | ✅ | | `file_get_text.go:80` | ✅ | | `file_get_metadata.go:73` | ✅ | | `file_delete.go:55` | ✅ | | `create_file_url.go` | ❌ | **Impact:** A caller in skill `S` could mint a public URL for a file that belongs to skill `S` but was saved under `user:bob` scope, even when the caller is `alice`. The cross-skill check passes (same skill), but the scope isolation is bypassed — the file was meant to be private to `bob`. **Fix:** Add the same scope-validation pattern used in `file_get.go`: ```go grantedViaDescendant := false if meta.SkillID != inv.SkillID { if !descendantFileGrant(ctx, fileStorage, inv, meta.SkillID) { return "", fmt.Errorf("create_file_url: file_id %q does not belong to this skill (cross-skill refs rejected)", args.FileID) } grantedViaDescendant = true } if err := ValidateScope(inv, meta.Scope, false); err != nil && !grantedViaDescendant { return "", fmt.Errorf("create_file_url: %w", err) } ``` --- ### 3. Byte-boundary truncation can split multi-byte UTF-8 characters Three tools truncate input text at a hard byte boundary without regard for UTF-8 rune boundaries: - `classify.go:114`: `text = text[:classifyMaxInputBytes]` (16 KiB) - `extract_entities.go:124`: `text = text[:extractEntitiesMaxInputBytes]` (32 KiB) - `summarize.go:213`: `return text[:summarizeMaxInputBytes], true, nil` (32 KiB) If the byte boundary falls in the middle of a multi-byte UTF-8 character, the resulting string contains an invalid trailing byte sequence. While LLMs are generally tolerant, this can cause tokenization artifacts or trigger validation warnings in downstream processing. **Fix:** Truncate at the nearest rune boundary instead: ```go func truncateAtRuneBoundary(s string, maxBytes int) string { if len(s) <= maxBytes { return s } // Find the last complete rune within the limit. s = s[:maxBytes] for len(s) > 0 && !utf8.Valid([]byte{s[len(s)-1]}) { s = s[:len(s)-1] } return s } ``` (Add `"unicode/utf8"` to imports.) </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 10m 8s</sub>
steve added 1 commit 2026-06-27 02:32:01 +00:00
fix: address verified gadfly P3 review (3-cloud fleet)
executus CI / test (pull_request) Successful in 59s
ee6e9ef9f8
All 3 cloud models converged on a real access-control bug; fixed it + the
other genuine findings (the false-positives were dropped):

Security (HIGH — all 3 models):
- create_file_url skipped ValidateScope: a same-skill caller could mint a
  PUBLIC url for a file scoped to another user/run. Now runs ValidateScope
  (admin-aware), skipped only for the descendant-grant case — mirroring the
  read tools.

Other real fixes:
- ValidateScope hard-coded `false` at every call site (admin branch dead) ->
  pass inv.CallerIsAdmin (the executor sets it via the host AdminPolicy; still
  false/fail-closed when no admin). Stale "no admin flag" comment corrected.
- create_file_url: ExpiresInSeconds clamped BEFORE the *time.Second multiply
  (huge values overflowed to a negative duration that slipped under the cap,
  minting already-expired tokens); swallowed json.Marshal error now returned.
- RegisterMeta: build the default budget WITH the configured MaxPerRun (was
  NewInMemorySearchBudget(nil) -> hardcoded 10, ignoring MetaDeps.MaxPerRun).
- classify: all-zero scores no longer return a false-positive top-1 winner;
  coerceClassifyScore uses strconv.ParseFloat (rejects trailing garbage like
  "50extra" that fmt.Sscanf silently accepted).
- file_delete: honor the descendant grant (parent can clean up a worker's
  artifacts) — was the lone cross-skill-reject-outright file tool.
- meta tools: input caps truncate at a UTF-8 rune boundary (truncateUTF8), not
  mid-rune.
- think: removed the dead `var _ = fmt.Errorf` import-keeper; file_save default
  aligned to 16 MiB (matched RegisterStore).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve closed this pull request 2026-06-27 04:16:08 +00:00
steve deleted branch phase-3-tools 2026-06-27 04:16:08 +00:00
Some checks are pending
executus CI / test (pull_request) Successful in 59s

Pull request closed

Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/executus#3