P4: Tier-2 batteries — audit + budget + persona noun #4

Closed
steve wants to merge 4 commits from phase-4-batteries into main
Owner

Phase 4 (first slice): the opt-in batteries that plug into run.Ports, each a clean lift of mort's interface/logic with the GORM impl left in mort, a zero-dependency in-memory default, and a CI-verified core → battery import count of 0.

Batteries

  • audit/run.Audit Sink + per-run Writer (already matches run.RunRecorder) + a queryable in-process Memory store (all 17 read/filter/purge/walk methods). End-to-end test: wire audit.NewSink(audit.NewMemory()) into the executor, run an agent, and the run is recorded with terminal status/output and queryable by caller. NewSink(nil) == no audit.
  • budget/DBBudget (rolling 7-day window) + NoOp satisfying run.Budget (Check/Commit), with a BudgetStorage seam + in-memory rollover. Tests: per-user cap enforced, window rolls over, NoOp always allows.
  • persona/ — the Agent noun decoupled from its Discord shell: agent.go/storage.go/builtin_loader.go with the command-binding embedding dropped (host concern). Agent.ToRunnable() lowers a persona into run.RunnableAgent — the bridge that lets the executor run a persona without importing this battery (the inversion of agentexec.Run(*agents.Agent)). Memory store implements all 11 CRUD + trigger-query methods.

Verification

  • go build/vet/test -race ./... green; core go.sum still free of gorm/redis/discordgo/sqlite.
  • Each battery imports core (run); core imports none of them (verified via go list -deps).

Remaining P4 (separate PR)

skill noun (mirrors persona), critic / schedule / checkpoint batteries, and contrib/store (pure-Go SQLite) — in progress now.

Independent of P3 (#3) — both branch off the merged main. Gadfly's 3-cloud review is advisory; findings will be independently verified.

🤖 Generated with Claude Code

Phase 4 (first slice): the opt-in batteries that plug into `run.Ports`, each a clean lift of mort's interface/logic with the GORM impl left in mort, a zero-dependency in-memory default, and a **CI-verified `core → battery` import count of 0**. ## Batteries - **`audit/`** — `run.Audit` `Sink` + per-run `Writer` (already matches `run.RunRecorder`) + a queryable in-process `Memory` store (all 17 read/filter/purge/walk methods). End-to-end test: wire `audit.NewSink(audit.NewMemory())` into the executor, run an agent, and the run is recorded with terminal status/output and queryable by caller. `NewSink(nil)` == no audit. - **`budget/`** — `DBBudget` (rolling 7-day window) + `NoOp` satisfying `run.Budget` (Check/Commit), with a `BudgetStorage` seam + in-memory rollover. Tests: per-user cap enforced, window rolls over, NoOp always allows. - **`persona/`** — the **Agent noun** decoupled from its Discord shell: `agent.go`/`storage.go`/`builtin_loader.go` with the command-binding embedding dropped (host concern). `Agent.ToRunnable()` lowers a persona into `run.RunnableAgent` — the bridge that lets the executor run a persona without importing this battery (the inversion of `agentexec.Run(*agents.Agent)`). `Memory` store implements all 11 CRUD + trigger-query methods. ## Verification - `go build/vet/test -race ./...` green; core `go.sum` still free of gorm/redis/discordgo/sqlite. - Each battery imports core (`run`); core imports none of them (verified via `go list -deps`). ## Remaining P4 (separate PR) skill noun (mirrors persona), critic / schedule / checkpoint batteries, and `contrib/store` (pure-Go SQLite) — in progress now. Independent of P3 (#3) — both branch off the merged main. Gadfly's 3-cloud review is advisory; findings will be independently verified. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 3 commits 2026-06-27 02:35:52 +00:00
First Tier-2 battery, plugging into run.Ports.Audit:
- storage.go/writer.go: skillaudit's Storage interface + per-run Writer moved
  clean (only utils->fmt); the Writer already matches run.RunRecorder's shape.
- sink.go: Sink adapts a Storage to run.Audit (StartRun -> a run row + a Writer
  wrapped as run.RunRecorder, converting run.RunStats on Close). NewSink(nil) is
  equivalent to no audit. Compile-time proofs: Sink is run.Audit, recorder is
  run.RunRecorder.
- memory.go: NewMemory() — a zero-dependency, queryable in-process Storage
  (retains runs + logs; all 17 read/filter/purge/walk methods) so a light host
  gets run history with no setup. Mort keeps its GORM Storage; contrib/store
  adds durable SQLite at P4.

End-to-end test: wire audit.NewSink(audit.NewMemory()) into the executor, run an
agent, and the run is recorded with terminal status/output and queryable by
caller. CI invariant verified: core imports ZERO from the audit battery (proper
battery direction; battery imports core, never the reverse).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second Tier-2 battery, plugging into run.Ports.Budget:
- budget.go: skillexec's BudgetTracker / NoOpBudget / DBBudget moved clean
  (stdlib only). Check/Commit match run.Budget exactly (compile-time proof in
  run.go: NoOpBudget and *DBBudget are run.Budget).
- storage.go: the BudgetStorage seam + SkillBudget domain, split out of mort's
  GORM file (the GORM impl stays in mort).
- memory.go: NewMemory() — zero-dependency in-process BudgetStorage with the
  7-day rolling-window rollover in Add.

Tests: per-user cap enforced, window rolls over after 7 days, NoOp always
allows. CI invariant: core imports ZERO from the budget battery.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
P4: persona noun — Agent + ToRunnable bridge + Memory store
executus CI / test (pull_request) Failing after 58s
Adversarial Review (Gadfly) / review (pull_request) Successful in 20m1s
f521c583bd
The headline P4 piece (clean redesign): the Agent persona noun, decoupled from
its Discord shell.
- agent.go/storage.go/builtin_loader.go moved from mort's pkg/logic/agents; the
  Storage seam drops the Discord CommandBindingStorage embedding (a host
  concern). The host-entangled files (commands, chatbot_provider, command-
  binding dispatcher, personalization, system) stay in mort.
- runnable.go: Agent.ToRunnable() lowers a persona into run.RunnableAgent — the
  bridge that lets run.Executor run a persona without importing this battery
  (the inversion of agentexec.Run(*agents.Agent)).
- memory.go: NewMemory() — zero-dep in-process persona Storage (all 11 CRUD +
  trigger-query methods).

Tests: ToRunnable field/phase mapping; Memory round-trip. CI invariant: core
imports ZERO from persona.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

Confirmed: OnTool (lines 209-225) writes "args": string(call.Arguments) and "result": truncate(result, 4000) unconditionally for every tool, while OnStep (lines 149-155) + stepHasSecretTool (lines 171-185) only redact the assistant narration. The storage layer (memory.go AppendLog) persists the payload map verbatim, so secret-bearing tool args/results land in skill_run_logs exactly where the redaction was meant to keep secrets out. Both findings in the draft are confirmed against the actual code.

Verdict: Blocking issues found

  • audit/writer.go:209-225OnTool persists raw tool arguments and results for the very tools the writer explicitly flags as secret-bearing, defeating the OnStep redaction. stepHasSecretTool (lines 171-185) redacts the assistant narration on OnStep when a step fires mcp_call / email_send / any http_* tool, on the stated grounds that "the audit trace never persists secret-adjacent assistant text." But the matching OnTool callback writes "args": string(call.Arguments) and "result": truncate(result, 4000) to the log verbatim, unconditionally, for every tool call — these are the fields that actually carry the secret (MCP args, email body/recipients, raw HTTP request/response bodies), not the narration. The secret therefore lands in skill_run_logs exactly where the redaction was meant to keep it out. The redaction is theatre: it covers the text that might echo a secret while persisting the secret itself one event later. Confirmed by reading writer.go in full — there is no gating/redaction between OnTool and appendLog, and memory.go's AppendLog persists the payload map as-is. Suggested fix: apply the same stepHasSecretTool-style predicate (on call.Name) inside OnTool and redact args/result (e.g., "args_redacted": true) for the secret-bearing tool names, mirroring the OnStep policy.

  • audit/writer.go:171-185 — redaction policy is name-prefix based and brittle. strings.HasPrefix(c.Name, "http_") plus an enumerated case "mcp_call", "email_send" is an allowlist-of-shame approach: any new secret-bearing tool added without updating this list silently leaks. This is security-by-coincidence (it relies on every secret tool happening to be named with a known prefix). Worth at minimum a code comment pointing maintainers here when adding tools, or better, a single shared redaction-policy registry consulted by both OnStep and OnTool. Not blocking on its own, but it amplifies the finding above.

🎯 Correctness — Minor issues

VERDICT: Minor issues

The cross-package seams (run.Audit/run.RunRecorder/run.Budget/run.RunnableAgent) line up with the kernel — Sink.StartRun maps RunInfoSkillRun field-by-field, recorder.Close maps run.RunStatsaudit.RunStats, and the budget test logic checks out (rollover at 8d, cap enforcement at 110/100). The surviving issues are divergences between audit/memory.go and its own documented Storage contract — semantic bugs a caller relying on the doc would hit. Termination is still safe everywhere (cycles are caught by the seen map), so none of these are blocking.

  • audit/memory.go:209WalkParentChain returns the chain in the WRONG order. The Storage contract (audit/storage.go:202) states "oldest = root first, newest = runID last". The implementation appends runID first then walks up via ParentRunID, producing [runID, parent, …, root] (newest-first) — the exact reverse. A consumer relying on root-first ordering (the doc names "the loop guard in skill_invoke") gets the opposite. Fix: reverse chain before returning, or build it by appending to the front.

  • audit/memory.go:203ListChildrenByParent returns newest-first; contract says oldest-first. audit/storage.go:192 documents "oldest first"; the impl calls newestFirst(...) (reverse insertion order). Same divergence in ListFinishedRunsBefore at memory.go:226 vs. the "oldest first" wording at storage.go:212.

  • audit/memory.go:191PurgeOlderThan purges by FinishedAt, but the contract (storage.go:187) says StartedAt. The impl only purges rows where r.FinishedAt != nil && r.FinishedAt.Before(t). A run stuck in status="running" (the doc elsewhere calls this "crash mid-run, useful signal") is never purged regardless of age, and a run started before t but finished after t is retained — both contradict "deletes runs whose StartedAt is strictly before t". Align the doc and the impl (the GORM impl lives outside this checkout, so which side is canonical needs confirming against mort).

  • audit/memory.go:102page(...,0,limit) returns ALL matching rows for limit<=0, not "no rows". audit/storage.go:212 documents for ListFinishedRunsBefore that "limit <= 0 yields no rows (the caller is expected to specify a real bound)". page only truncates when limit > 0, so a non-positive limit returns the entire matching set. The same page helper carries the defect through ListRunsByCaller/ListRunsBySkill/ListRunsBySkillPaginated.

  • audit/memory.go:172ListRunsFiltered documents a limit clamp (<=0 or >500 → 50) that memory.go does not implement. audit/storage.go:179 promises a bounded admin scan; page neither substitutes 50 for non-positive limits nor caps at 500, so the Memory backend returns unbounded result sets where the contract promises bounded ones. Align the doc or add the clamp.

  • audit/storage.go:241MaxParentChainDepth = 32 is defined but unused. The interface doc (:205) says "Cap walk depth at 32… if the chain forms a cycle, we want a bounded result." The Memory WalkParentChain relies solely on the seen map (which does terminate cycles) and never honors the documented depth cap. Minor — termination is still guaranteed by seen, but the documented bound is unenforced.

🧯 Error handling & edge cases — Minor issues

Verdict: Minor issues

  • audit/memory.go:184-200 (Memory.PurgeOlderThan) — the Storage interface contract (audit/storage.go:187-189) states "deletes runs (and their logs) whose StartedAt is strictly before t". The implementation instead purges only when r.FinishedAt != nil && r.FinishedAt.Before(t). Concrete edge impact: a run that crashed mid-execution (status stuck at "running", FinishedAt == nil) is never purged regardless of how old its StartedAt is, so on a long-lived light-host process using Memory, crashed rows accumulate forever — the very retention concern the Memory godoc warns about. Either fix the predicate to match the documented StartedAt semantics (falling back to StartedAt when FinishedAt == nil), or update the interface comment to reflect that purge is keyed on FinishedAt and unfinished rows are intentionally retained. Confirmed against both files.

  • audit/memory.go:209-224 (WalkParentChain) — the interface doc (audit/storage.go:204-208) promises "Cap walk depth at 32 ... corrupted data ... bounded result rather than an infinite loop", and MaxParentChainDepth is exported as 32 (storage.go:245). The implementation guards via a seen map with no depth cap, and grep confirms MaxParentChainDepth has no other reference in the repo (dead constant). A cyclic chain still terminates via the seen set, so there's no infinite-loop risk — but the bounded-depth contract documented on the interface is not honored. Either enforce the depth (if len(chain) >= MaxParentChainDepth { break }) or correct the doc to say the seen set is the sole guard. Confirmed.

  • budget/budget.go:96-106 (NewDBBudget) — the constructor nil-guards now (defaults to time.Now) but does not guard storage or weeklyLimit. DBBudget.Check (line 128) immediately calls b.storage.Get(...) and (line 134) b.weeklyLimit(); a caller passing nil for either gets a hard nil-pointer / nil-func panic. The constructor's own doc only declares now nil-able ("now may be nil — defaults to time.Now"), so this is not a contract violation, but the silent panic-vs-loud-construction asymmetry with now is worth a defensive guard or an explicit "required-non-nil" note in the doc. Real panic risk; no buggy caller exists today (only the test wires it, with valid args).

  • audit/sink.go:39-50 (Sink.StartRun)s.storage.StartRun error is deliberately swallowed (_ = ...). The recorder is returned regardless, and on Close the writer calls FinishRun, which against Memory returns ErrNotFound for a run that never started — also swallowed via slog.Warn (audit/writer.go:288-289). Net effect on the unhappy path: a failed start yields a silent no-op audit row plus a periodic warn-noise stream rather than a clean failure. Confirmed intentional ("Best-effort: a failed StartRun must not break the user-visible run"); noted only as a swallowed-error consequence — no change required if the design choice is conscious.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9m 50s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Blocking issues found</summary> Confirmed: `OnTool` (lines 209-225) writes `"args": string(call.Arguments)` and `"result": truncate(result, 4000)` unconditionally for every tool, while `OnStep` (lines 149-155) + `stepHasSecretTool` (lines 171-185) only redact the assistant *narration*. The storage layer (`memory.go` `AppendLog`) persists the payload map verbatim, so secret-bearing tool args/results land in `skill_run_logs` exactly where the redaction was meant to keep secrets out. Both findings in the draft are confirmed against the actual code. Verdict: Blocking issues found - **`audit/writer.go:209-225` — `OnTool` persists raw tool arguments and results for the very tools the writer explicitly flags as secret-bearing, defeating the `OnStep` redaction.** `stepHasSecretTool` (lines 171-185) redacts the assistant *narration* on `OnStep` when a step fires `mcp_call` / `email_send` / any `http_*` tool, on the stated grounds that "the audit trace never persists secret-adjacent assistant text." But the matching `OnTool` callback writes `"args": string(call.Arguments)` and `"result": truncate(result, 4000)` to the log **verbatim, unconditionally, for every tool call** — these are the fields that actually *carry* the secret (MCP args, email body/recipients, raw HTTP request/response bodies), not the narration. The secret therefore lands in `skill_run_logs` exactly where the redaction was meant to keep it out. The redaction is theatre: it covers the text that *might echo* a secret while persisting the secret itself one event later. Confirmed by reading `writer.go` in full — there is no gating/redaction between `OnTool` and `appendLog`, and `memory.go`'s `AppendLog` persists the payload map as-is. Suggested fix: apply the same `stepHasSecretTool`-style predicate (on `call.Name`) inside `OnTool` and redact `args`/`result` (e.g., `"args_redacted": true`) for the secret-bearing tool names, mirroring the `OnStep` policy. - **`audit/writer.go:171-185` — redaction policy is name-prefix based and brittle.** `strings.HasPrefix(c.Name, "http_")` plus an enumerated `case "mcp_call", "email_send"` is an allowlist-of-shame approach: any new secret-bearing tool added without updating this list silently leaks. This is security-by-coincidence (it relies on every secret tool happening to be named with a known prefix). Worth at minimum a code comment pointing maintainers here when adding tools, or better, a single shared redaction-policy registry consulted by both `OnStep` and `OnTool`. Not blocking on its own, but it amplifies the finding above. </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> **VERDICT: Minor issues** The cross-package seams (`run.Audit`/`run.RunRecorder`/`run.Budget`/`run.RunnableAgent`) line up with the kernel — `Sink.StartRun` maps `RunInfo`→`SkillRun` field-by-field, `recorder.Close` maps `run.RunStats`→`audit.RunStats`, and the budget test logic checks out (rollover at 8d, cap enforcement at 110/100). The surviving issues are divergences between `audit/memory.go` and its own documented `Storage` contract — semantic bugs a caller relying on the doc would hit. Termination is still safe everywhere (cycles are caught by the `seen` map), so none of these are blocking. - **`audit/memory.go:209` — `WalkParentChain` returns the chain in the WRONG order.** The `Storage` contract (`audit/storage.go:202`) states *"oldest = root first, newest = runID last"*. The implementation appends `runID` first then walks up via `ParentRunID`, producing `[runID, parent, …, root]` (newest-first) — the exact reverse. A consumer relying on root-first ordering (the doc names "the loop guard in skill_invoke") gets the opposite. Fix: reverse `chain` before returning, or build it by appending to the front. - **`audit/memory.go:203` — `ListChildrenByParent` returns newest-first; contract says oldest-first.** `audit/storage.go:192` documents *"oldest first"*; the impl calls `newestFirst(...)` (reverse insertion order). Same divergence in `ListFinishedRunsBefore` at `memory.go:226` vs. the "oldest first" wording at `storage.go:212`. - **`audit/memory.go:191` — `PurgeOlderThan` purges by `FinishedAt`, but the contract (`storage.go:187`) says `StartedAt`.** The impl only purges rows where `r.FinishedAt != nil && r.FinishedAt.Before(t)`. A run stuck in `status="running"` (the doc elsewhere calls this "crash mid-run, useful signal") is never purged regardless of age, and a run started before `t` but finished after `t` is retained — both contradict "deletes runs whose StartedAt is strictly before t". Align the doc and the impl (the GORM impl lives outside this checkout, so which side is canonical needs confirming against mort). - **`audit/memory.go:102` — `page(...,0,limit)` returns ALL matching rows for `limit<=0`, not "no rows".** `audit/storage.go:212` documents for `ListFinishedRunsBefore` that *"limit <= 0 yields no rows (the caller is expected to specify a real bound)"*. `page` only truncates when `limit > 0`, so a non-positive limit returns the entire matching set. The same `page` helper carries the defect through `ListRunsByCaller`/`ListRunsBySkill`/`ListRunsBySkillPaginated`. - **`audit/memory.go:172` — `ListRunsFiltered` documents a limit clamp (`<=0 or >500 → 50`) that `memory.go` does not implement.** `audit/storage.go:179` promises a bounded admin scan; `page` neither substitutes 50 for non-positive limits nor caps at 500, so the Memory backend returns unbounded result sets where the contract promises bounded ones. Align the doc or add the clamp. - **`audit/storage.go:241` — `MaxParentChainDepth = 32` is defined but unused.** The interface doc (`:205`) says *"Cap walk depth at 32… if the chain forms a cycle, we want a bounded result."* The Memory `WalkParentChain` relies solely on the `seen` map (which does terminate cycles) and never honors the documented depth cap. Minor — termination is still guaranteed by `seen`, but the documented bound is unenforced. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> Verdict: Minor issues - **`audit/memory.go:184-200` (`Memory.PurgeOlderThan`)** — the `Storage` interface contract (`audit/storage.go:187-189`) states *"deletes runs (and their logs) whose **StartedAt** is strictly before t"*. The implementation instead purges only when `r.FinishedAt != nil && r.FinishedAt.Before(t)`. Concrete edge impact: a run that crashed mid-execution (status stuck at `"running"`, `FinishedAt == nil`) is **never** purged regardless of how old its `StartedAt` is, so on a long-lived light-host process using `Memory`, crashed rows accumulate forever — the very retention concern the `Memory` godoc warns about. Either fix the predicate to match the documented `StartedAt` semantics (falling back to `StartedAt` when `FinishedAt == nil`), or update the interface comment to reflect that purge is keyed on `FinishedAt` and unfinished rows are intentionally retained. Confirmed against both files. - **`audit/memory.go:209-224` (`WalkParentChain`)** — the interface doc (`audit/storage.go:204-208`) promises *"Cap walk depth at 32 ... corrupted data ... bounded result rather than an infinite loop"*, and `MaxParentChainDepth` is exported as 32 (`storage.go:245`). The implementation guards via a `seen` map with **no depth cap**, and `grep` confirms `MaxParentChainDepth` has no other reference in the repo (dead constant). A cyclic chain still terminates via the `seen` set, so there's no infinite-loop risk — but the bounded-depth contract documented on the interface is not honored. Either enforce the depth (`if len(chain) >= MaxParentChainDepth { break }`) or correct the doc to say the `seen` set is the sole guard. Confirmed. - **`budget/budget.go:96-106` (`NewDBBudget`)** — the constructor nil-guards `now` (defaults to `time.Now`) but does **not** guard `storage` or `weeklyLimit`. `DBBudget.Check` (line 128) immediately calls `b.storage.Get(...)` and (line 134) `b.weeklyLimit()`; a caller passing `nil` for either gets a hard nil-pointer / nil-func panic. The constructor's own doc only declares `now` nil-able ("now may be nil — defaults to time.Now"), so this is not a contract violation, but the silent panic-vs-loud-construction asymmetry with `now` is worth a defensive guard or an explicit "required-non-nil" note in the doc. Real panic risk; no buggy caller exists today (only the test wires it, with valid args). - **`audit/sink.go:39-50` (`Sink.StartRun`)** — `s.storage.StartRun` error is deliberately swallowed (`_ = ...`). The `recorder` is returned regardless, and on `Close` the writer calls `FinishRun`, which against `Memory` returns `ErrNotFound` for a run that never started — also swallowed via `slog.Warn` (`audit/writer.go:288-289`). Net effect on the unhappy path: a failed start yields a silent no-op audit row plus a periodic warn-noise stream rather than a clean failure. Confirmed intentional ("Best-effort: a failed StartRun must not break the user-visible run"); noted only as a swallowed-error consequence — no change required if the design choice is conscious. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9m 50s</sub>

🪰 Gadfly review — deepseek-v4-flash:cloud (ollama-cloud)

Verdict: Blocking issues found · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

I have independently verified the draft's finding against the actual code. Here is the corrected review.

🔒 Security Review — Verdict: Minor issues

Finding 1: Tool call arguments logged without redaction — potential secret leakage in audit trail

audit/writer.go:214-218 — The OnTool method persists the full raw call.Arguments (a json.RawMessage) to the audit log as the "args" field of a tool_call event:

w.appendLog("tool_call", map[string]any{
    "name": call.Name,
    "args": string(call.Arguments),   // <-- raw, unredacted
    "id":   call.ID,
})

The codebase already has a stepHasSecretTool function (audit/writer.go:171-185) that redacts narration text when secret-bearing tools (mcp_call, email_send, http_*) are called — but that redaction only applies to the "text" field in the "step" event. The tool call arguments themselves are logged without any equivalent filtering. This means:

  • email_send: email body, recipients, subject are persisted in the audit log
  • http_*: HTTP request bodies, headers, and URL query parameters (which may contain API keys, bearer tokens, etc.) are persisted
  • mcp_call: MCP server arguments that could contain secrets are persisted

The audit store (Memory or persistent) is queryable by callers via ListRunsByCaller, ListRunsFiltered, etc. — so any user who can query the audit trail can read these arguments.

Suggested fix: Apply the same secret-tool detection from stepHasSecretTool to the OnTool method. For secret-bearing tools, either redact the arguments (e.g. replace with "[redacted]") or log only the tool name and ID without the arguments. Example:

func (w *Writer) OnTool(call llm.ToolCall, result string) {
    if w == nil || w.storage == nil {
        return
    }
    w.calls.Add(1)
    args := string(call.Arguments)
    if isSecretTool(call.Name) {
        args = "[redacted]"
    }
    w.appendLog("tool_call", map[string]any{
        "name": call.Name,
        "args": args,
        "id":   call.ID,
    })
    // ... tool_result logging (result may also need redaction)
}

(Verified by reading audit/writer.go lines 209-225 and 171-185, confirming stepHasSecretTool exists but is unused in OnTool.)

🎯 Correctness⚠️ could not complete

⚠️ This reviewer failed to complete: agent produced no output

🧯 Error handling & edge cases — Blocking issues found

Now I have read all the relevant files. Let me verify each finding.

Finding 1 (PurgeOlderThan checks FinishedAt instead of StartedAt): The interface doc at audit/storage.go:187-189 says "whose StartedAt is strictly before t". The implementation at audit/memory.go:191 checks r.FinishedAt != nil && r.FinishedAt.Before(t). This is a clear contract violation. CONFIRMED.

Finding 2 (WalkParentChain ignores the 32-depth cap): The interface doc at audit/storage.go:205-208 says "Cap walk depth at 32". The constant MaxParentChainDepth = 32 exists at audit/storage.go:245. The implementation at audit/memory.go:209-223 has only cycle detection (seen map) but no depth counter. CONFIRMED.

Finding 3 (ListChildrenByParent returns newest-first, not oldest-first): The interface doc at audit/storage.go:191-199 says "oldest first". The implementation at audit/memory.go:203-206 delegates to m.newestFirst(). CONFIRMED.

Finding 4 (ListFinishedRunsBefore returns newest-first, not oldest-first): The interface doc at audit/storage.go:211-223 says "oldest first". The implementation at audit/memory.go:226-231 uses m.newestFirst(). CONFIRMED.

Finding 5 (ListRunsFiltered does not clamp limit): The interface doc at audit/storage.go:179-180 says "limit is clamped (<=0 or >500 → 50)". The implementation at audit/memory.go:172-175 passes limit straight to page() which only applies the limit when limit > 0 && limit < len(rs). A caller passing limit=0 gets all results. CONFIRMED.

Finding 6 (appendLog uses context.Background() with no timeout): At audit/writer.go:312, appendLog calls w.storage.AppendLog(context.Background(), log) with no deadline. This is inconsistent with Close (line 286) which uses context.WithTimeout(context.WithoutCancel(ctx), auditFinishTimeout). CONFIRMED.

Finding 7 (Silently swallowed StartRun error): At audit/sink.go:40, _ = s.storage.StartRun(...) discards the error. If StartRun fails, the recorder is still returned, and subsequent appendLog calls will succeed but CloseFinishRun will fail with ErrNotFound. CONFIRMED.

Finding 8 (budget/memory.go:33Get uses full Lock() instead of RLock()): The mutex is sync.Mutex (line 13), not sync.RWMutex. The Get method at line 24-33 is a read-only operation (map lookup + struct copy) but uses the exclusive Lock(). This serializes all concurrent Get calls. CONFIRMED.

All 8 findings are confirmed. The verdict remains Blocking issues found.


🧯 Error handling & edge cases — VERDICT: Blocking issues found

1. audit/memory.go:191PurgeOlderThan checks FinishedAt instead of StartedAt (contract violation)

The Storage interface doc (audit/storage.go:187-189) explicitly says: "PurgeOlderThan deletes runs (and their logs) whose StartedAt is strictly before t." The Memory implementation at line 191 checks r.FinishedAt != nil && r.FinishedAt.Before(t) instead. This means:

  • Runs that started long ago but are still in-flight (status="running", FinishedAt==nil) are never purged, even though their StartedAt is well before the cutoff.
  • Runs that finished recently but started long ago are purged correctly by accident (if FinishedAt happens to be before t).
  • Runs that started recently but finished long ago are not purged, even though the contract says they should be.

Fix: Change line 191 to if r.StartedAt.Before(t) { (and remove the FinishedAt != nil guard, or keep it only to skip unfinished runs if that's the intent — but the contract says StartedAt).


2. audit/memory.go:209WalkParentChain ignores the documented 32-depth cap

The Storage interface doc (audit/storage.go:205-208) says: "Cap walk depth at 32 to prevent pathological loops in the data itself." A MaxParentChainDepth = 32 constant exists at audit/storage.go:245 but is never referenced by the Memory implementation. The Memory impl at line 209-223 only has cycle detection (seen map) — a corrupted chain of 1000+ entries would walk all of them under the read lock, blocking all other readers.

Fix: Add a depth counter and break after MaxParentChainDepth iterations.


3. audit/memory.go:203ListChildrenByParent returns newest-first, not oldest-first

The interface doc (audit/storage.go:191-199) says "oldest first". The implementation delegates to newestFirst() which returns reverse insertion order. Callers expecting oldest-first ordering (e.g. the call-tree view) get the wrong order.

Fix: Either reverse the result before returning, or add a dedicated oldest-first helper.


4. audit/memory.go:226ListFinishedRunsBefore returns newest-first, not oldest-first

Same pattern as #3. The interface doc (audit/storage.go:211) says "oldest first", but the implementation uses newestFirst().

Fix: Reverse the result or use an oldest-first iteration.


5. audit/memory.go:172ListRunsFiltered does not clamp limit as the interface contract requires

The interface doc (audit/storage.go:179-180) says: "limit is clamped (<=0 or >500 → 50) to bound admin scans." The Memory implementation passes limit straight to page() which only applies the limit when limit > 0 && limit < len(rs). A caller passing limit=0 gets all results (unbounded), and limit=1000 returns up to 1000 rows. This violates the documented safety clamp.

Fix: Add clamping logic in ListRunsFiltered (or in page()) before calling page():

if limit <= 0 || limit > 500 {
    limit = 50
}

6. audit/writer.go:310appendLog uses context.Background() with no timeout

Every OnStep, OnTool, LogEvent, and LogError call goes through appendLog, which uses context.Background() with no deadline. If the storage implementation is a remote DB that hangs, the goroutine blocks indefinitely — there is no way to cancel or time out. This is inconsistent with Close (line 286) which properly uses context.WithTimeout(context.WithoutCancel(ctx), auditFinishTimeout).

Fix: Use a context with a short timeout, e.g.:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := w.storage.AppendLog(ctx, log); err != nil { ... }

7. audit/sink.go:73 — Silently swallowed StartRun error creates orphaned log entries

The StartRun call to storage at line 73 discards its error with _ =. If StartRun fails (e.g. storage is temporarily unavailable), the recorder is still returned. Subsequent appendLog calls will succeed (appending to the logs map), but CloseFinishRun will fail with ErrNotFound because the run row was never created. This leaves orphaned log entries with no corresponding run row — a data integrity hole.

Fix: At minimum, log the StartRun failure so operators can detect the data loss. Alternatively, return nil from StartRun when the storage write fails (skipping audit entirely for that run, which is cleaner than partial data).


8. budget/memory.go:33Get uses full Lock() instead of RLock()

The Get method is a read-only operation (map lookup + struct copy) but the mutex is sync.Mutex (not sync.RWMutex), so it acquires the exclusive write lock. This serializes all concurrent Get calls even though they could safely run in parallel. Not a correctness bug, but needlessly limits throughput under concurrent budget checks.

Fix: Change mu sync.Mutex to mu sync.RWMutex, then use m.mu.RLock() / m.mu.RUnlock() in Get.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 20m 0s

<!-- gadfly-review:ollama:deepseek-v4-flash:cloud --> ### 🪰 Gadfly review — `deepseek-v4-flash:cloud` (ollama-cloud) **Verdict: Blocking issues found · ⚠️ 1/3 lens(es) errored** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> I have independently verified the draft's finding against the actual code. Here is the corrected review. ## 🔒 Security Review — Verdict: **Minor issues** ### Finding 1: Tool call arguments logged without redaction — potential secret leakage in audit trail **`audit/writer.go:214-218`** — The `OnTool` method persists the full raw `call.Arguments` (a `json.RawMessage`) to the audit log as the `"args"` field of a `tool_call` event: ```go w.appendLog("tool_call", map[string]any{ "name": call.Name, "args": string(call.Arguments), // <-- raw, unredacted "id": call.ID, }) ``` The codebase already has a `stepHasSecretTool` function (`audit/writer.go:171-185`) that redacts *narration text* when secret-bearing tools (`mcp_call`, `email_send`, `http_*`) are called — but that redaction only applies to the `"text"` field in the `"step"` event. The tool call arguments themselves are logged without any equivalent filtering. This means: - **`email_send`**: email body, recipients, subject are persisted in the audit log - **`http_*`**: HTTP request bodies, headers, and URL query parameters (which may contain API keys, bearer tokens, etc.) are persisted - **`mcp_call`**: MCP server arguments that could contain secrets are persisted The audit store (`Memory` or persistent) is queryable by callers via `ListRunsByCaller`, `ListRunsFiltered`, etc. — so any user who can query the audit trail can read these arguments. **Suggested fix**: Apply the same secret-tool detection from `stepHasSecretTool` to the `OnTool` method. For secret-bearing tools, either redact the arguments (e.g. replace with `"[redacted]"`) or log only the tool name and ID without the arguments. Example: ```go func (w *Writer) OnTool(call llm.ToolCall, result string) { if w == nil || w.storage == nil { return } w.calls.Add(1) args := string(call.Arguments) if isSecretTool(call.Name) { args = "[redacted]" } w.appendLog("tool_call", map[string]any{ "name": call.Name, "args": args, "id": call.ID, }) // ... tool_result logging (result may also need redaction) } ``` (Verified by reading `audit/writer.go` lines 209-225 and 171-185, confirming `stepHasSecretTool` exists but is unused in `OnTool`.) </details> <details><summary><b>🎯 Correctness</b> — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent produced no output </details> <details><summary><b>🧯 Error handling & edge cases</b> — Blocking issues found</summary> Now I have read all the relevant files. Let me verify each finding. **Finding 1** (`PurgeOlderThan` checks `FinishedAt` instead of `StartedAt`): The interface doc at `audit/storage.go:187-189` says "whose **StartedAt** is strictly before t". The implementation at `audit/memory.go:191` checks `r.FinishedAt != nil && r.FinishedAt.Before(t)`. This is a clear contract violation. **CONFIRMED.** **Finding 2** (`WalkParentChain` ignores the 32-depth cap): The interface doc at `audit/storage.go:205-208` says "Cap walk depth at 32". The constant `MaxParentChainDepth = 32` exists at `audit/storage.go:245`. The implementation at `audit/memory.go:209-223` has only cycle detection (`seen` map) but no depth counter. **CONFIRMED.** **Finding 3** (`ListChildrenByParent` returns newest-first, not oldest-first): The interface doc at `audit/storage.go:191-199` says "**oldest first**". The implementation at `audit/memory.go:203-206` delegates to `m.newestFirst()`. **CONFIRMED.** **Finding 4** (`ListFinishedRunsBefore` returns newest-first, not oldest-first): The interface doc at `audit/storage.go:211-223` says "**oldest first**". The implementation at `audit/memory.go:226-231` uses `m.newestFirst()`. **CONFIRMED.** **Finding 5** (`ListRunsFiltered` does not clamp `limit`): The interface doc at `audit/storage.go:179-180` says "limit is clamped (<=0 or >500 → 50)". The implementation at `audit/memory.go:172-175` passes `limit` straight to `page()` which only applies the limit when `limit > 0 && limit < len(rs)`. A caller passing `limit=0` gets all results. **CONFIRMED.** **Finding 6** (`appendLog` uses `context.Background()` with no timeout): At `audit/writer.go:312`, `appendLog` calls `w.storage.AppendLog(context.Background(), log)` with no deadline. This is inconsistent with `Close` (line 286) which uses `context.WithTimeout(context.WithoutCancel(ctx), auditFinishTimeout)`. **CONFIRMED.** **Finding 7** (Silently swallowed `StartRun` error): At `audit/sink.go:40`, `_ = s.storage.StartRun(...)` discards the error. If `StartRun` fails, the recorder is still returned, and subsequent `appendLog` calls will succeed but `Close` → `FinishRun` will fail with `ErrNotFound`. **CONFIRMED.** **Finding 8** (`budget/memory.go:33` — `Get` uses full `Lock()` instead of `RLock()`): The mutex is `sync.Mutex` (line 13), not `sync.RWMutex`. The `Get` method at line 24-33 is a read-only operation (map lookup + struct copy) but uses the exclusive `Lock()`. This serializes all concurrent `Get` calls. **CONFIRMED.** All 8 findings are confirmed. The verdict remains **Blocking issues found**. --- ## 🧯 Error handling & edge cases — VERDICT: **Blocking issues found** ### 1. `audit/memory.go:191` — `PurgeOlderThan` checks `FinishedAt` instead of `StartedAt` (contract violation) The `Storage` interface doc (`audit/storage.go:187-189`) explicitly says: *"PurgeOlderThan deletes runs (and their logs) whose **StartedAt** is strictly before t."* The Memory implementation at line 191 checks `r.FinishedAt != nil && r.FinishedAt.Before(t)` instead. This means: - Runs that started long ago but are still in-flight (status=`"running"`, `FinishedAt==nil`) are **never** purged, even though their `StartedAt` is well before the cutoff. - Runs that finished recently but started long ago are purged correctly by accident (if `FinishedAt` happens to be before `t`). - Runs that started recently but finished long ago are **not** purged, even though the contract says they should be. **Fix:** Change line 191 to `if r.StartedAt.Before(t) {` (and remove the `FinishedAt != nil` guard, or keep it only to skip unfinished runs if that's the intent — but the contract says `StartedAt`). --- ### 2. `audit/memory.go:209` — `WalkParentChain` ignores the documented 32-depth cap The `Storage` interface doc (`audit/storage.go:205-208`) says: *"Cap walk depth at 32 to prevent pathological loops in the data itself."* A `MaxParentChainDepth = 32` constant exists at `audit/storage.go:245` but is **never referenced** by the Memory implementation. The Memory impl at line 209-223 only has cycle detection (`seen` map) — a corrupted chain of 1000+ entries would walk all of them under the read lock, blocking all other readers. **Fix:** Add a depth counter and break after `MaxParentChainDepth` iterations. --- ### 3. `audit/memory.go:203` — `ListChildrenByParent` returns newest-first, not oldest-first The interface doc (`audit/storage.go:191-199`) says *"oldest first"*. The implementation delegates to `newestFirst()` which returns reverse insertion order. Callers expecting oldest-first ordering (e.g. the call-tree view) get the wrong order. **Fix:** Either reverse the result before returning, or add a dedicated oldest-first helper. --- ### 4. `audit/memory.go:226` — `ListFinishedRunsBefore` returns newest-first, not oldest-first Same pattern as #3. The interface doc (`audit/storage.go:211`) says *"oldest first"*, but the implementation uses `newestFirst()`. **Fix:** Reverse the result or use an oldest-first iteration. --- ### 5. `audit/memory.go:172` — `ListRunsFiltered` does not clamp `limit` as the interface contract requires The interface doc (`audit/storage.go:179-180`) says: *"limit is clamped (<=0 or >500 → 50) to bound admin scans."* The Memory implementation passes `limit` straight to `page()` which only applies the limit when `limit > 0 && limit < len(rs)`. A caller passing `limit=0` gets **all** results (unbounded), and `limit=1000` returns up to 1000 rows. This violates the documented safety clamp. **Fix:** Add clamping logic in `ListRunsFiltered` (or in `page()`) before calling `page()`: ```go if limit <= 0 || limit > 500 { limit = 50 } ``` --- ### 6. `audit/writer.go:310` — `appendLog` uses `context.Background()` with no timeout Every `OnStep`, `OnTool`, `LogEvent`, and `LogError` call goes through `appendLog`, which uses `context.Background()` with **no deadline**. If the storage implementation is a remote DB that hangs, the goroutine blocks indefinitely — there is no way to cancel or time out. This is inconsistent with `Close` (line 286) which properly uses `context.WithTimeout(context.WithoutCancel(ctx), auditFinishTimeout)`. **Fix:** Use a context with a short timeout, e.g.: ```go ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) defer cancel() if err := w.storage.AppendLog(ctx, log); err != nil { ... } ``` --- ### 7. `audit/sink.go:73` — Silently swallowed `StartRun` error creates orphaned log entries The `StartRun` call to storage at line 73 discards its error with `_ =`. If `StartRun` fails (e.g. storage is temporarily unavailable), the recorder is still returned. Subsequent `appendLog` calls will succeed (appending to the logs map), but `Close` → `FinishRun` will fail with `ErrNotFound` because the run row was never created. This leaves orphaned log entries with no corresponding run row — a data integrity hole. **Fix:** At minimum, log the `StartRun` failure so operators can detect the data loss. Alternatively, return `nil` from `StartRun` when the storage write fails (skipping audit entirely for that run, which is cleaner than partial data). --- ### 8. `budget/memory.go:33` — `Get` uses full `Lock()` instead of `RLock()` The `Get` method is a read-only operation (map lookup + struct copy) but the mutex is `sync.Mutex` (not `sync.RWMutex`), so it acquires the exclusive write lock. This serializes all concurrent `Get` calls even though they could safely run in parallel. Not a correctness bug, but needlessly limits throughput under concurrent budget checks. **Fix:** Change `mu sync.Mutex` to `mu sync.RWMutex`, then use `m.mu.RLock()` / `m.mu.RUnlock()` in `Get`. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 20m 0s</sub>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

VERDICT: Blocking issues found

🔒 Security findings

  • audit/writer.go:209-224 (OnTool) — Tool-arg secret leak. Confirmed. OnStep checks stepHasSecretTool(resp) and redactions the assistant text (text_redacted: true) when a secret-bearing tool fired (mcp_call, email_send, http_* — see stepHasSecretTool at audit/writer.go:171-185). But OnTool at audit/writer.go:214-224 persists args: string(call.Arguments) and result: truncate(result, 4000) verbatim for every tool call, including those same tools. An MCP token, email recipient/body, or HTTP auth header passed as call.Arguments lands in plaintext in skill_run_logs.payload.args, defeating the policy OnStep sets up. Fix: gate args/result on the same stepHasSecretTool-style check (or pass the resp/tool-call list context through) and emit args_redacted: true + length hint for secret-bearing tools, mirroring the existing text_redacted behavior.

  • audit/sink.go:40-50 — Inputs persisted unredacted. Confirmed. info.Inputs from run.RunInfo is copied straight into the SkillRun.Inputs column (audit/storage.go:29 — no comment about redaction; field type map[string]any). The stepHasSecretTool guard in OnStep is per-event, not row-level; nothing redacts the input snapshot, so a user prompt containing secrets/PII persists verbatim.

  • audit/writer.go:303-315 (appendLog) — Unbounded context.Background() on hot path. Confirmed. Every OnStep/OnTool/LogEvent/LogError call ultimately invokes w.storage.AppendLog(context.Background(), …) with no timeout. Only Close uses auditFinishTimeout (10s, audit/writer.go:297). A hung audit DB blocks the run goroutine on every step. Fix: route appendLog through a short-timeout context (auditAppendTimeout, e.g. 1s) for defence-in-depth parity with Close.

  • audit/sink.go:40-50StartRun error silently dropped, recorder still returned. Confirmed. _ = s.storage.StartRun(…) at audit/sink.go:40 swallows the error; appendLog (audit/writer.go:303-315) does not verify the run exists before writing AppendLog events. A failed StartRun produces a silent audit gap with no surfaced signal beyond a per-event slog.Warn.

  • persona/builtin_loader.go:466,562WebhookIPAllowlist not CIDR-validated. Confirmed. The YAML tag webhook_ip_allowlist (builtin_loader.go:466) flows straight into Agent.WebhookIPAllowlist []string (builtin_loader.go:562); grep confirms no net.ParseCIDR / net.ParseIP call anywhere in persona/. The struct comment at persona/agent.go:111 documents "CIDR strings" but the loader accepts arbitrary strings, so malformed entries propagate silently. Fix: validate each entry with net.ParseCIDR at decode time in decodeAgentManifest, slog.Warn per bad entry, skip it (matching the existing slog.Warn pattern for missing skills / extends errors at builtin_loader.go:135,153,160,168,189,194).

🔒 Clean (verified)

  • WebhookSecret deliberately not seeded from YAML; preserved across reloads via fresh.WebhookSecret = existing.WebhookSecret (persona/builtin_loader.go:411, verified). Comment explicitly calls out the credential-leak risk.
  • WebhookSignatureRequired carried forward via fresh.WebhookSignatureRequired = existing.WebhookSignatureRequired (persona/builtin_loader.go:412, verified).
  • dec.KnownFields(true) in decodeAgentManifest (persona/builtin_loader.go:511, verified).
  • audit.NewSink(nil) returns nil recorder; StartRun early-returns nil when s.storage == nil (audit/sink.go:31-33, verified); covered by TestNilSinkRecordsNothing (audit/audit_test.go:71-78).
🎯 Correctness — Minor issues

VERDICT: Minor issues

Findings (Correctness lens):

  • audit/memory.go:188 — kept := m.order[:0:0] is benign but stylistically off. The [:0:0] zero-cap form forces a fresh allocation each call rather than reusing the existing slice's capacity (m.order[:0]). Not a correctness bug; flagged only as a deliberate-looking pattern that may not be intentional.

  • audit/storage.go:187-189 vs audit/memory.go:191 — PurgeOlderThan doc/impl mismatch. The Storage interface comment says "deletes runs (and their logs) whose StartedAt is strictly before t," but the Memory implementation keys on r.FinishedAt.Before(t). Callers passing t = time.Now() expecting "drop old starts" will silently keep running-but-stuck rows. Suggested fix: align the doc to FinishedAt.Before(t) (the actual semantics, which intentionally retain stuck rows) — or change the impl to key on StartedAt if that semantics is wanted.

  • budget/memory.go:47 vs budget/budget.go:133 — boundary asymmetry on the rolling 7-day window. Memory.Add rolls over at now.Sub(WindowStart) >= 7*24*time.Hour (inclusive), while DBBudget.Check enforces at < 7*24*time.Hour (exclusive). The two predicates disagree by exactly one nanosecond. Not a real-world bug, but the asymmetry is real and is worth aligning (both to inclusive, or both to exclusive).

  • persona/runnable.go:10-37 — ToRunnable silently drops a large slice of Agent fields. EncryptionEnabled, ExecutionLane, PersonalizationSources, Schedule, NextRunAt, LastScheduledRunAt, WebhookSecret, WebhookSignatureRequired, WebhookIPAllowlist, ChatbotChannelFilter, DefaultEmoji, StateReactEmoji, Tags, plus Phases[].FallbackMessage and Phases[].IsRunFunc are all lost in the DTO. The doc says "static shape only" so this is by design, but a future phase wiring any of these via RunnableAgent will silently get nothing. Worth a doc note naming the supported lowering set.

  • persona/memory.go:85-98 — GetAgentByWebhookSecret linear-scans every agent under m.mu.RLock. O(N) per incoming webhook. With thousands of personas with webhook secrets enabled, every webhook triggers a full scan holding the RLock. Not a correctness issue today, but a contention hazard that scales poorly and would be cleaner as a map[string]string keyed on WebhookSecret.

  • audit/writer.go:267-271 — Close comment names mort callers that are not in this PR. The comment says "the caller (skillexec.runInner / agentexec.runInner) ALSO detaches at the call site" — those identifiers are mort package paths, not executus code. The actual in-PR caller (run/executor.go:288 via finishAudit) does detach via the package's own detach() helper (run/executor.go:314), so the defence-in-depth behaviour is real; but the comment reads as dead-history for this battery. Substantively harmless, just misleading documentation.

  • go.mod — gopkg.in/yaml.v3 v3.0.1 added at top-level. Consumed by persona/builtin_loader.go. The PR framing claims core/batteries stay stdlib-only; yaml.v3 is the first non-stdlib third-party dep in a new package. Leaf decoder, low risk — but it does deviate from the stated "stdlib-only core" framing, so worth confirming the dependency budget is OK with this addition.

Nothing blocking found. run/ package confirmed to import none of audit, budget, or persona (no matches in grep). The end-to-end audit test wires through run.Ports.Audit correctly (Sink.StartRun with SubjectID = ra.ID verified). ToRunnable mapping round-trips the fields the test checks. DBBudget rolling-window test passes against Memory.Add semantics.

🧯 Error handling & edge cases — Blocking issues found

🧯 Error handling & edge cases review

VERDICT: Blocking issues found

Findings

  • audit/memory.go:144-170 (matchesFilter) contradicts the documented RunFilter contract. The doc on audit/storage.go:104-106 says of RunFilter.IncludeDryRun: "Ignored when Status is set explicitly (an explicit Status=='dry_run' still matches)." But the implementation evaluates the dry-run gate before the Status check and unconditionally excludes status="dry_run" whenever IncludeDryRun=false, even when the caller explicitly set Status="dry_run". Net effect: a host asking for exactly the dry-run rows (Status:"dry_run", IncludeDryRun:false) gets an empty page. Fix: when f.Status != "", skip the dry-run pre-filter. Verified against audit/storage.go:104-106 and audit/memory.go:145-148.

  • audit/memory.go:247 (LastRunBySkills) does not match its documented semantics. Doc on audit/storage.go:232-234 states: "When false, only status='ok' rows count." The implementation only excludes error and timeout; rows with status="budget_exceeded", "preempted", "lane_busy", or "cancelled" still pass through and surface as the "last successful run" timestamp on dashboards. Fix: change the predicate to !includeFailed && r.Status != "ok", or explicitly enumerate every non-ok status you want to drop. Verified against audit/storage.go:230-238 and audit/memory.go:247.

  • audit/writer.go:312 (appendLog) silently drops events on storage failure. Every OnStep / OnTool / LogEvent / LogError ultimately calls AppendLog with context.Background(). If AppendLog consistently errors (broken DB connection), the failure is only surfaced as a slog.Warn with no upper bound on how many warns get emitted. Under a sustained outage a single run can flood the log with thousands of warnings. Acceptable as "audit must not break user-visible execution" but worth a rate-limit / circuit-break (or at least debounce to the first + last error of a run) so a misconfigured DB doesn't drown mort.log. Edge-case concern.

  • budget/budget.go:1 stale package doc. The header comment says // Package skillexec runs saved Skill definitions via majordomo's agent loop — that's mort's skillexec package, not budget. Cosmetic but signals an untidy lift; fix the doc to describe the budget package.

  • budget/budget.go:133 duplicates the window constant. Check uses the literal 7*24*time.Hour while Memory.Add uses the named budgetWindow constant declared in budget/storage.go:32 (same package — so budget.go could reference it directly). Same value, but the duplicate literal is exactly the kind of magic number that drifts on future edits. Fix: use budgetWindow in Check.

  • budget/budget.go:38 NewNoOpBudget returns the internal BudgetTracker, not run.Budget. The constructor signature is func NewNoOpBudget() BudgetTracker { return NoOpBudget{} }. A caller who writes var b run.Budget = budget.NewNoOpBudget() will get a type-mismatch — run.Budget (run/ports.go:95-100) and BudgetTracker (budget/budget.go:24-30) have identical methods but are distinct interface types. The intent stated in budget/run.go:6-8 is that these plug into run.Ports.Budget, so the public constructor should return something assignable to run.Budget.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 35s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Blocking issues found</summary> **VERDICT: Blocking issues found** 🔒 **Security findings** - **`audit/writer.go:209-224` (OnTool) — Tool-arg secret leak.** Confirmed. `OnStep` checks `stepHasSecretTool(resp)` and redactions the assistant `text` (`text_redacted: true`) when a secret-bearing tool fired (`mcp_call`, `email_send`, `http_*` — see `stepHasSecretTool` at `audit/writer.go:171-185`). But `OnTool` at `audit/writer.go:214-224` persists `args: string(call.Arguments)` and `result: truncate(result, 4000)` verbatim for every tool call, including those same tools. An MCP token, email recipient/body, or HTTP auth header passed as `call.Arguments` lands in plaintext in `skill_run_logs.payload.args`, defeating the policy `OnStep` sets up. Fix: gate `args`/`result` on the same `stepHasSecretTool`-style check (or pass the resp/tool-call list context through) and emit `args_redacted: true` + length hint for secret-bearing tools, mirroring the existing `text_redacted` behavior. - **`audit/sink.go:40-50` — Inputs persisted unredacted.** Confirmed. `info.Inputs` from `run.RunInfo` is copied straight into the `SkillRun.Inputs` column (`audit/storage.go:29` — no comment about redaction; field type `map[string]any`). The `stepHasSecretTool` guard in `OnStep` is per-event, not row-level; nothing redacts the input snapshot, so a user prompt containing secrets/PII persists verbatim. - **`audit/writer.go:303-315` (appendLog) — Unbounded `context.Background()` on hot path.** Confirmed. Every `OnStep`/`OnTool`/`LogEvent`/`LogError` call ultimately invokes `w.storage.AppendLog(context.Background(), …)` with no timeout. Only `Close` uses `auditFinishTimeout` (10s, `audit/writer.go:297`). A hung audit DB blocks the run goroutine on every step. Fix: route `appendLog` through a short-timeout context (`auditAppendTimeout`, e.g. 1s) for defence-in-depth parity with `Close`. - **`audit/sink.go:40-50` — `StartRun` error silently dropped, recorder still returned.** Confirmed. `_ = s.storage.StartRun(…)` at `audit/sink.go:40` swallows the error; `appendLog` (`audit/writer.go:303-315`) does not verify the run exists before writing `AppendLog` events. A failed `StartRun` produces a silent audit gap with no surfaced signal beyond a per-event slog.Warn. - **`persona/builtin_loader.go:466,562` — `WebhookIPAllowlist` not CIDR-validated.** Confirmed. The YAML tag `webhook_ip_allowlist` (`builtin_loader.go:466`) flows straight into `Agent.WebhookIPAllowlist []string` (`builtin_loader.go:562`); `grep` confirms no `net.ParseCIDR` / `net.ParseIP` call anywhere in `persona/`. The struct comment at `persona/agent.go:111` documents "CIDR strings" but the loader accepts arbitrary strings, so malformed entries propagate silently. Fix: validate each entry with `net.ParseCIDR` at decode time in `decodeAgentManifest`, slog.Warn per bad entry, skip it (matching the existing `slog.Warn` pattern for missing skills / extends errors at `builtin_loader.go:135,153,160,168,189,194`). 🔒 **Clean (verified)** - `WebhookSecret` deliberately not seeded from YAML; preserved across reloads via `fresh.WebhookSecret = existing.WebhookSecret` (`persona/builtin_loader.go:411`, verified). Comment explicitly calls out the credential-leak risk. - `WebhookSignatureRequired` carried forward via `fresh.WebhookSignatureRequired = existing.WebhookSignatureRequired` (`persona/builtin_loader.go:412`, verified). - `dec.KnownFields(true)` in `decodeAgentManifest` (`persona/builtin_loader.go:511`, verified). - `audit.NewSink(nil)` returns nil recorder; `StartRun` early-returns nil when `s.storage == nil` (`audit/sink.go:31-33`, verified); covered by `TestNilSinkRecordsNothing` (`audit/audit_test.go:71-78`). </details> <details><summary><b>🎯 Correctness</b> — Minor issues</summary> ## VERDICT: Minor issues **Findings (Correctness lens):** - **audit/memory.go:188 — `kept := m.order[:0:0]`** is benign but stylistically off. The `[:0:0]` zero-cap form forces a fresh allocation each call rather than reusing the existing slice's capacity (`m.order[:0]`). Not a correctness bug; flagged only as a deliberate-looking pattern that may not be intentional. - **audit/storage.go:187-189 vs audit/memory.go:191 — `PurgeOlderThan` doc/impl mismatch.** The Storage interface comment says "deletes runs (and their logs) whose StartedAt is strictly before t," but the `Memory` implementation keys on `r.FinishedAt.Before(t)`. Callers passing `t = time.Now()` expecting "drop old starts" will silently keep running-but-stuck rows. Suggested fix: align the doc to `FinishedAt.Before(t)` (the actual semantics, which intentionally retain stuck rows) — or change the impl to key on `StartedAt` if that semantics is wanted. - **budget/memory.go:47 vs budget/budget.go:133 — boundary asymmetry on the rolling 7-day window.** `Memory.Add` rolls over at `now.Sub(WindowStart) >= 7*24*time.Hour` (inclusive), while `DBBudget.Check` enforces at `< 7*24*time.Hour` (exclusive). The two predicates disagree by exactly one nanosecond. Not a real-world bug, but the asymmetry is real and is worth aligning (both to inclusive, or both to exclusive). - **persona/runnable.go:10-37 — `ToRunnable` silently drops a large slice of `Agent` fields.** `EncryptionEnabled`, `ExecutionLane`, `PersonalizationSources`, `Schedule`, `NextRunAt`, `LastScheduledRunAt`, `WebhookSecret`, `WebhookSignatureRequired`, `WebhookIPAllowlist`, `ChatbotChannelFilter`, `DefaultEmoji`, `StateReactEmoji`, `Tags`, plus `Phases[].FallbackMessage` and `Phases[].IsRunFunc` are all lost in the DTO. The doc says "static shape only" so this is by design, but a future phase wiring any of these via `RunnableAgent` will silently get nothing. Worth a doc note naming the supported lowering set. - **persona/memory.go:85-98 — `GetAgentByWebhookSecret` linear-scans every agent under `m.mu.RLock`.** O(N) per incoming webhook. With thousands of personas with webhook secrets enabled, every webhook triggers a full scan holding the RLock. Not a correctness issue today, but a contention hazard that scales poorly and would be cleaner as a `map[string]string` keyed on `WebhookSecret`. - **audit/writer.go:267-271 — `Close` comment names mort callers that are not in this PR.** The comment says "the caller (skillexec.runInner / agentexec.runInner) ALSO detaches at the call site" — those identifiers are mort package paths, not executus code. The actual in-PR caller (`run/executor.go:288` via `finishAudit`) does detach via the package's own `detach()` helper (run/executor.go:314), so the defence-in-depth behaviour is real; but the comment reads as dead-history for this battery. Substantively harmless, just misleading documentation. - **go.mod — `gopkg.in/yaml.v3 v3.0.1` added at top-level.** Consumed by `persona/builtin_loader.go`. The PR framing claims core/batteries stay stdlib-only; `yaml.v3` is the first non-stdlib third-party dep in a new package. Leaf decoder, low risk — but it does deviate from the stated "stdlib-only core" framing, so worth confirming the dependency budget is OK with this addition. **Nothing blocking found.** `run/` package confirmed to import none of `audit`, `budget`, or `persona` (no matches in grep). The end-to-end audit test wires through `run.Ports.Audit` correctly (Sink.StartRun with `SubjectID = ra.ID` verified). `ToRunnable` mapping round-trips the fields the test checks. `DBBudget` rolling-window test passes against `Memory.Add` semantics. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Blocking issues found</summary> # 🧯 Error handling & edge cases review **VERDICT: Blocking issues found** ## Findings - **`audit/memory.go:144-170` (`matchesFilter`) contradicts the documented `RunFilter` contract.** The doc on `audit/storage.go:104-106` says of `RunFilter.IncludeDryRun`: *"Ignored when Status is set explicitly (an explicit Status=='dry_run' still matches)."* But the implementation evaluates the dry-run gate **before** the Status check and unconditionally excludes `status="dry_run"` whenever `IncludeDryRun=false`, even when the caller explicitly set `Status="dry_run"`. Net effect: a host asking for exactly the dry-run rows (`Status:"dry_run", IncludeDryRun:false`) gets an empty page. Fix: when `f.Status != ""`, skip the dry-run pre-filter. Verified against `audit/storage.go:104-106` and `audit/memory.go:145-148`. - **`audit/memory.go:247` (`LastRunBySkills`) does not match its documented semantics.** Doc on `audit/storage.go:232-234` states: *"When false, only status='ok' rows count."* The implementation only excludes `error` and `timeout`; rows with `status="budget_exceeded"`, `"preempted"`, `"lane_busy"`, or `"cancelled"` still pass through and surface as the "last successful run" timestamp on dashboards. Fix: change the predicate to `!includeFailed && r.Status != "ok"`, or explicitly enumerate every non-ok status you want to drop. Verified against `audit/storage.go:230-238` and `audit/memory.go:247`. - **`audit/writer.go:312` (`appendLog`) silently drops events on storage failure.** Every `OnStep` / `OnTool` / `LogEvent` / `LogError` ultimately calls `AppendLog` with `context.Background()`. If `AppendLog` consistently errors (broken DB connection), the failure is only surfaced as a `slog.Warn` with no upper bound on how many warns get emitted. Under a sustained outage a single run can flood the log with thousands of warnings. Acceptable as "audit must not break user-visible execution" but worth a rate-limit / circuit-break (or at least debounce to the first + last error of a run) so a misconfigured DB doesn't drown `mort.log`. Edge-case concern. - **`budget/budget.go:1` stale package doc.** The header comment says `// Package skillexec runs saved Skill definitions via majordomo's agent loop` — that's mort's skillexec package, not `budget`. Cosmetic but signals an untidy lift; fix the doc to describe the `budget` package. - **`budget/budget.go:133` duplicates the window constant.** `Check` uses the literal `7*24*time.Hour` while `Memory.Add` uses the named `budgetWindow` constant declared in `budget/storage.go:32` (same package — so `budget.go` could reference it directly). Same value, but the duplicate literal is exactly the kind of magic number that drifts on future edits. Fix: use `budgetWindow` in `Check`. - **`budget/budget.go:38` `NewNoOpBudget` returns the internal `BudgetTracker`, not `run.Budget`.** The constructor signature is `func NewNoOpBudget() BudgetTracker { return NoOpBudget{} }`. A caller who writes `var b run.Budget = budget.NewNoOpBudget()` will get a type-mismatch — `run.Budget` (`run/ports.go:95-100`) and `BudgetTracker` (`budget/budget.go:24-30`) have identical methods but are distinct interface types. The intent stated in `budget/run.go:6-8` is that these plug into `run.Ports.Budget`, so the public constructor should return something assignable to `run.Budget`. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 35s</sub>
steve added 1 commit 2026-06-27 03:44:36 +00:00
fix: address verified gadfly P4/#4 findings (audit/budget/persona)
executus CI / test (pull_request) Failing after 59s
09fc1a07e8
Security (all 3 models — HIGH): audit OnTool persisted raw tool args + results
verbatim for the very tools the OnStep narration-redaction flags as secret
(mcp_call/email_send/http_*) — the args/results are what CARRY the secret, so
they landed in skill_run_logs unredacted. Factored the predicate into
isSecretTool() (single source of truth) and OnTool now emits
args_redacted/result_redacted (+ lengths) for secret tools. Test asserts no
secret reaches the log. (persona) webhook_ip_allowlist entries are now
CIDR/IP-validated at load (malformed dropped + warned) instead of accepted raw.

Contract correctness (glm-5.2 + deepseek) — audit Memory now honors its
documented Storage contract: ListChildrenByParent/ListFinishedRunsBefore return
oldest-first; WalkParentChain returns root-first and honors MaxParentChainDepth;
ListRunsFiltered clamps limit (<=0 or >500 -> 50); ListFinishedRunsBefore with
limit<=0 returns none; an explicit RunFilter.Status (incl. "dry_run") matches
regardless of IncludeDryRun; LastRunBySkills counts only status=="ok" unless
includeFailed. (PurgeOlderThan's FinishedAt key is the SAFE behavior — in-flight
runs retained — so the doc was aligned to it, not the impl.)

Error-handling: appendLog now uses a bounded context (auditAppendTimeout=3s) so
a hung backend can't block the run goroutine on the hot path; Sink.StartRun
logs its (still best-effort) failure instead of swallowing it; budget Memory.Get
uses RLock (RWMutex); budget package doc fixed (was skillexec's); Check uses the
budgetWindow constant, not a duplicated literal.

Triaged false-positive: NewNoOpBudget returning BudgetTracker is assignable to
run.Budget (identical method sets) — no change needed.

Core go.sum still free of host/DB deps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve closed this pull request 2026-06-27 04:16:08 +00:00
steve deleted branch phase-4-batteries 2026-06-27 04:16:08 +00:00
Some checks are pending
executus CI / test (pull_request) Failing after 59s

Pull request closed

Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/executus#4