d0bd3ec3d9
executus CI / test (push) Has been cancelled
All 3 cloud models converged on a real access-control bug; fixed it + the other genuine findings (the false-positives were dropped): Security (HIGH — all 3 models): - create_file_url skipped ValidateScope: a same-skill caller could mint a PUBLIC url for a file scoped to another user/run. Now runs ValidateScope (admin-aware), skipped only for the descendant-grant case — mirroring the read tools. Other real fixes: - ValidateScope hard-coded `false` at every call site (admin branch dead) -> pass inv.CallerIsAdmin (the executor sets it via the host AdminPolicy; still false/fail-closed when no admin). Stale "no admin flag" comment corrected. - create_file_url: ExpiresInSeconds clamped BEFORE the *time.Second multiply (huge values overflowed to a negative duration that slipped under the cap, minting already-expired tokens); swallowed json.Marshal error now returned. - RegisterMeta: build the default budget WITH the configured MaxPerRun (was NewInMemorySearchBudget(nil) -> hardcoded 10, ignoring MetaDeps.MaxPerRun). - classify: all-zero scores no longer return a false-positive top-1 winner; coerceClassifyScore uses strconv.ParseFloat (rejects trailing garbage like "50extra" that fmt.Sscanf silently accepted). - file_delete: honor the descendant grant (parent can clean up a worker's artifacts) — was the lone cross-skill-reject-outright file tool. - meta tools: input caps truncate at a UTF-8 rune boundary (truncateUTF8), not mid-rune. - think: removed the dead `var _ = fmt.Errorf` import-keeper; file_save default aligned to 16 MiB (matched RegisterStore). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
71 lines
3.0 KiB
Go
// Package tools: v11 think.
//
// Pure prompt-engineering tool: the agent's "thought" is recorded
// to skill_run_logs (via the audit hook the gated wrapper applies
// transparently) but produces no side effect. The literature on
// agent design notes that giving an agent an explicit `think` tool
// keeps it on plan better than giving it nothing. Without one,
// agents tend to either skip planning or babble into the final
// output. With one, planning lands in tool calls and the final
// output stays clean.
//
// v11 deliberately rejects empty thoughts. An agent that learns
// "calling think with empty args is free" will spam it; a
// rejection forces the call to actually carry reasoning.
package tools

import (
	"context"
	"strings"

	"gitea.stevedudenhoeffer.com/steve/executus/tool"
)

type thinkParams struct {
	Thought string `json:"thought" description:"Your reasoning. May be a plan, a working hypothesis, an analysis of a tool result, or anything else you'd note in a private scratchpad. Empty input is rejected, so make this load-bearing."`
}

// thinkResponse is intentionally minimal. The agent doesn't need
// machine-readable output; the value is the audit trail + the
// implicit "now you've planned, what's next" prompting the call
// gives the agent loop.
type thinkResponse struct {
	OK    bool   `json:"ok"`
	Error string `json:"error,omitempty"`
}

// NewThink constructs the v11 think tool. No deps: the audit
// hook wrapper handles persistence transparently.
func NewThink() tool.Tool {
	return tool.NewGatedTool[thinkParams](
		"think",
		"Record a thought / plan / working hypothesis. The thought is logged to the run trace but does NOT affect any external state. Use to slow down before a tricky tool call, sketch a multi-step plan, or summarise findings before continuing. Empty thoughts are rejected.",
		tool.Permission{
			AuthoringRequirement: tool.RequirementAnyone,
			OperatesOn:           tool.ScopeGlobal,
			SafeForShare:         true,
			Categories:           []string{"utility"},
		},
		func(_ context.Context, _ tool.Invocation, p thinkParams) (string, error) {
			if strings.TrimSpace(p.Thought) == "" {
				// Return ok:false in a structured envelope rather
				// than a Go error so the agent loop continues with
				// a recoverable signal.
				return `{"ok":false,"error":"empty_thought"}`, nil
			}
			// A successful think emits a flat JSON object. The audit
			// hook (auto-injected by NewGatedTool) writes the args +
			// result pair so the trace UI shows the thought verbatim.
			return `{"ok":true}`, nil
		},
	)
}

// Note: returning a hand-rolled JSON literal instead of calling a
// marshaller keeps think the cheapest possible tool: no heap
// allocation, no json.Marshal call, no encoder buffer churn. The two
// output shapes are static and match what json.Marshal would produce
// for thinkResponse. If a future field is added to thinkResponse,
// switch back to json.Marshal; until then, the literal is the
// idiom that matches the tool's "do nothing" intent.
var _ = thinkResponse{} // referenced so unused-code linters don't flag the struct
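
The closing note says the two hand-rolled literals stand in for json.Marshal on thinkResponse. A minimal standalone sketch of that equivalence follows. It assumes the struct shape shown above; thinkLiteral and thinkMarshalled are hypothetical helpers written for this illustration, not functions from the executus repository, and the sketch leaves out the tool.NewGatedTool wrapper entirely.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// thinkResponse copies the struct from the file above so this
// sketch compiles on its own.
type thinkResponse struct {
	OK    bool   `json:"ok"`
	Error string `json:"error,omitempty"`
}

// thinkLiteral is a hypothetical stand-in for the handler body:
// whitespace-only input is rejected, anything else is accepted,
// and both results are static string literals.
func thinkLiteral(thought string) string {
	if strings.TrimSpace(thought) == "" {
		return `{"ok":false,"error":"empty_thought"}`
	}
	return `{"ok":true}`
}

// thinkMarshalled (also hypothetical) builds the same envelope
// through json.Marshal, the fallback the note recommends once
// thinkResponse gains another field.
func thinkMarshalled(thought string) (string, error) {
	resp := thinkResponse{OK: true}
	if strings.TrimSpace(thought) == "" {
		resp = thinkResponse{OK: false, Error: "empty_thought"}
	}
	b, err := json.Marshal(resp)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, thought := range []string{"", "  \n\t", "check the cache first"} {
		marshalled, err := thinkMarshalled(thought)
		if err != nil {
			panic(err)
		}
		literal := thinkLiteral(thought)
		fmt.Printf("%q literal=%s marshalled=%s match=%t\n",
			thought, literal, marshalled, literal == marshalled)
	}
}
```

Because encoding/json emits struct fields in declaration order and omitempty drops the empty Error string, the marshalled output is byte-for-byte the same as the literals for both shapes. A comparison like this could serve as a guard test if the struct ever changes.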