P2: run kernel + run.Ports inversion — executus is runnable #2

2026-06-27T00:45:47Z

steve commented

2026-06-27 00:45:47 +00:00

Phase 2: the run kernel and the inversion that lets one executor serve every host. executus is now runnable.

The inversion

mort's agentexec/skillexec import agents/agentcritic/skillaudit/skills/paste directly. Here the kernel depends only on:

RunnableAgent (a DTO) — the persona Agent and saved Skill will each lower into it, so the kernel never imports a noun battery.
run.Ports (all nil-safe) — Audit/RunRecorder, Budget, Critic/CriticHandle, Checkpointer, PaletteSource, Delivery — each mirroring mort's existing interface so the batteries implement them directly.
RunTally — inverts the *skillaudit.Writer dependency in the run-state accessor.

What's here

run/ — run-loop mechanics (cancel-merge, finalizers, run-state accessor, submit-capture, progress bridge), the RunnableAgent DTO, the full run.Ports, step instrumentation, and run.Executor: New(Config{...}) + Run(ctx, RunnableAgent, inv) Result. It wires model resolution + the tool registry + majordomo's loop + compaction + run-bounding (V10 detached-timeout) + step/audit observers + the Budget gate. Zero Ports = a bounded in-memory run (gadfly's case).
compact/ — the per-run context compactor.
examples/minimal — upgraded to the real "hello, agentic world" (~15 lines: Configure tiers → run.New → Run → print).
Also on this branch: the gadfly config sync to your foreman m1/m5 setup, and the 3 gadfly P1 fixes (cache-token guard, ledger error logging, io.LimitReader).

Verification

go build/vet/test -race ./... green. Core go.sum still free of gorm/redis/discordgo/sqlite.
New hermetic tests against majordomo's fake provider: end-to-end agent run (hello-world), Budget-rejection (no model call), Audit-port wiring (StartRun + Close with terminal status/output).

Follow-ups (incremental, not blocking)

Wire Critic/Checkpointer/PaletteSource/Delivery into the loop, multi-phase pipelines, and the no-tools direct path. The seams are defined and nil-safe; these light up as P4 (persona/skill + batteries) needs them.

Gadfly review is advisory — findings will be double-checked.

🤖 Generated with Claude Code

Phase 2: the run kernel and the inversion that lets one executor serve every host. **executus is now runnable.** ## The inversion mort's `agentexec`/`skillexec` import `agents`/`agentcritic`/`skillaudit`/`skills`/`paste` directly. Here the kernel depends only on: - **`RunnableAgent`** (a DTO) — the persona Agent and saved Skill will each *lower* into it, so the kernel never imports a noun battery. - **`run.Ports`** (all nil-safe) — `Audit`/`RunRecorder`, `Budget`, `Critic`/`CriticHandle`, `Checkpointer`, `PaletteSource`, `Delivery` — each mirroring mort's existing interface so the batteries implement them directly. - **`RunTally`** — inverts the `*skillaudit.Writer` dependency in the run-state accessor. ## What's here - **`run/`** — run-loop mechanics (cancel-merge, finalizers, run-state accessor, submit-capture, progress bridge), the `RunnableAgent` DTO, the full `run.Ports`, step instrumentation, and **`run.Executor`**: `New(Config{...})` + `Run(ctx, RunnableAgent, inv) Result`. It wires model resolution + the tool registry + majordomo's loop + compaction + run-bounding (V10 detached-timeout) + step/audit observers + the Budget gate. **Zero Ports = a bounded in-memory run** (gadfly's case). - **`compact/`** — the per-run context compactor. - **`examples/minimal`** — upgraded to the real *"hello, agentic world"* (~15 lines: `Configure` tiers → `run.New` → `Run` → print). - Also on this branch: the **gadfly config sync** to your foreman m1/m5 setup, and the **3 gadfly P1 fixes** (cache-token guard, ledger error logging, `io.LimitReader`). ## Verification - `go build/vet/test -race ./...` green. Core `go.sum` still free of gorm/redis/discordgo/sqlite. - New hermetic tests against majordomo's **fake provider**: end-to-end agent run (hello-world), Budget-rejection (no model call), Audit-port wiring (StartRun + Close with terminal status/output). ## Follow-ups (incremental, not blocking) Wire `Critic`/`Checkpointer`/`PaletteSource`/`Delivery` into the loop, multi-phase pipelines, and the no-tools direct path. The seams are defined and nil-safe; these light up as P4 (persona/skill + batteries) needs them. Gadfly review is advisory — findings will be double-checked. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

steve added 7 commits 2026-06-27 00:45:47 +00:00

P2 (foundation): run-loop mechanics + RunnableAgent DTO aab950f1c3

Stand up the executus/run kernel foundation, decoupled from mort:

- runengine.go: the shared run-loop scaffolding (MergeCancellation,
  CleanupContextTimeout, RunFinalizer/FireFinalizers, RunStateAccessor) moved
  from mort. The accessor's *skillaudit.Writer dependency is inverted to a
  narrow run.RunTally interface (TokenStats + ToolCallsCount) — the kernel
  reads live tallies without importing the audit battery.
- submit.go: the legacy submit-capture compat tool (stdlib + majordomo/llm).
- agent.go: RunnableAgent DTO — the kernel's view of "a thing to run" (tier,
  prompt, caps, palette, phases, critic config). The persona Agent and saved
  Skill will LOWER into this DTO so the kernel never imports a noun battery.
  This is the spine of the agentexec.Run(*agents.Agent) inversion.

run/ builds with only majordomo + executus/tool. The executor merge
(agentexec+skillexec -> run.Executor) and the nil-safe run.Ports
(Audit/Critic/Budget/Checkpointer/PaletteSource) are the next P2 block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

P2: define nil-safe run.Ports (the inversion spine) 8b30b9f889

Add run/ports.go: the host seams the executor will consume, every one
nil-safe so a light host runs with the zero Ports (no persistence/audit/
budget/critic/delegation/delivery) and a heavy host wires each to a battery.

Ports mirror mort's existing interfaces so the batteries implement them
directly:
- Audit + RunRecorder (mort skillaudit.Storage/Writer): StartRun -> per-run
  recorder (OnStep/OnTool/LogEvent/Close), recorder satisfies RunTally.
- Budget (mort skillexec.BudgetTracker): Check / Commit.
- Critic + CriticHandle (mort agentcritic): Monitor -> handle with
  RecordStep/RecordToolStart/Steer/Deadline/Stop (the loop wiring finalizes
  with the executor merge).
- Checkpointer (mort agentexec.RunCheckpointer): Save/Complete/Fail.
- PaletteSource (mort SkillInvokerForPalette + AgentInvokerForPalette):
  Resolve/Invoke skill + agent delegation.
Plus host-neutral RunInfo / RunStats.

This completes the P2 inversion DESIGN; the agentexec+skillexec ->
run.Executor merge that consumes these Ports is the remaining P2 work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ci: sync gadfly review config to mort's foreman-provider setup fe5074c3cf

Mirror mort's updated adversarial-review.yml: m1/m5 pulled in via the
GADFLY_ENDPOINT_M1/_M5 secrets using gadfly's "foreman" provider type
(providers m1/m5; models m1/qwen3:14b, m5/qwen3.6:35b-mlx), 2 cloud models,
3-lens suite, pinned to the gadfly :sha-6e3a83c image. Header adjusted for
executus; functional config identical to mort's tested version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge main (P1) into phase-2-run-kernel 0d38783912

fix: address gadfly P1 review (3 low-risk findings) 9a89d588b6

Triaged gadfly's P1 review (advisory). Fixed the three clearly-correct,
low-risk items; the rest were pre-existing mort behavior or theoretical:

- model/call.go: recordUsage dropped fully-cached responses (input==0 &&
  output==0 early-return missed CacheRead/CacheWrite-only usage, which
  Anthropic/OpenAI prompt-caching bills). Guard now also checks cache tokens.
- llmmeta/helper.go: recordLedger swallowed Storage.RecordMetaCall errors;
  now logs them (slog.Warn) so a non-logging Storage impl can't silently drop
  audit rows.
- model/cloud_sync.go: the ollama.com limit-cache used unbounded io.ReadAll;
  wrapped both reads in io.LimitReader(1 MiB) so a misbehaving endpoint can't
  exhaust memory before the 15s timeout.

Noted-not-fixed (follow-ups / pre-existing mort semantics): tier_not_allowed
ledger label on resolution failure, unknown-model usage attribution, the
cloud_sync https scheme allowlist, and several theoretical/cosmetic items.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

P2: move compactor -> compact/ + step instrumentation -> run/steps.go 4132af0216

- compact/compactor.go: the per-run stateful context compactor (token-threshold
  gate, fast-tier middle summarisation, fold memory) lifted from mort's
  skillexec/compactor.go. Self-contained; its only dependency is a ModelResolver
  func (model.ParseModelForContext satisfies it) + a token threshold.
- run/steps.go: the step-emission/instrumentation (stepEmitter, tool->kind/
  summary mapping with redaction, Result.Steps accumulation) from agentexec,
  repointed onto executus/tool.

Both build green. With the run-loop mechanics, RunnableAgent DTO, run.Ports,
compactor, and step instrumentation now all in place, the remaining P2 work is
the run.Executor itself (wiring these + majordomo's agent loop), which makes
executus runnable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

P2: run.Executor — executus is runnable

executus CI / test (pull_request) Successful in 1m0s

Details

Adversarial Review (Gadfly) / review (pull_request) Successful in 28m58s

Details

e76eed0011

The capstone of the run kernel: run.Executor.Run(ctx, RunnableAgent, inv)
ties model resolution + the tool registry + majordomo's agent loop +
context compaction + run-bounding + step/audit instrumentation into one
path, with every host concern behind the nil-safe run.Ports.

- run/executor.go: New(Config{Registry, Models, Defaults, Ports, Compactor,
  ContextTokens, SystemHeader}) + Run -> Result{RunID, Output, Steps, Usage,
  Err}. Budget gate (pre-run), model resolve, Audit StartRun/recorder
  (satisfies RunTally, stamped on inv.RunState), toolbox build, step observer
  (zips tool calls/results -> emitter + recorder.OnStep/OnTool), V10
  detached-MaxRuntime context with caller-cancel merged back, compaction wired
  from ContextTokens×ratio, audit Close + Budget Commit on a detached cleanup
  ctx. Zero Ports = a bounded in-memory run (gadfly's case).
- run/executor_test.go: hermetic end-to-end run against majordomo's fake
  provider (hello-world), Budget-rejection (no model call), Audit-port wiring
  (StartRun + Close with terminal status/output). All green under -race.
- examples/minimal upgraded to the real "hello, agentic world" (~15 lines:
  Configure tiers -> run.New -> Run -> print). README/CLAUDE.md updated.

Remaining P2 follow-ups (incremental): wire Critic/Checkpointer/PaletteSource/
Delivery into the loop, multi-phase Pipelines, and the no-tools direct path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitea-actions bot commented

2026-06-27 00:45:49 +00:00

🪰 Gadfly review — `m1/qwen3:14b` (m1)

Verdict: Blocking issues found · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

The code introduces a new run kernel and inversion in executus, focusing on making it runnable with nil-safe ports and a RunnableAgent DTO. Through a security lens, I examined for authn/authz gaps, injection vulnerabilities, unsafe deserialization, secret leakage, input validation, and handling of untrusted data.

Key areas reviewed:

The run.Ports interfaces (Audit, Budget, Critic, etc.) are all nil-safe and don't show any direct handling of untrusted data that would require validation.
The compactor's use of LLMs for summarization includes input truncation and escaping in renderTranscript, which mitigates potential injection risks.
The use of io.LimitReader in model/cloud_sync.go prevents excessive memory usage from large responses.
Context management with timeouts and cancellation merging in runengine.go helps prevent long-running processes.
Secrets in the workflow file are properly managed as environment variables and not exposed in the code.

No critical security issues were found in the reviewed code. The implementation appears to handle security considerations appropriately, with proper use of interfaces, context management, and input sanitization where necessary.

🎯 Correctness — Blocking issues found

Blocking issues found

1. `compact/compactor.go` - Inaccurate token estimation heuristic

Path: compact/compactor.go:156-161

The estimateTokens function uses a simplistic heuristic of chars / 4 to estimate token count, which is not accurate for modern LLMs. This could lead to:

Incorrect compaction thresholds (may compact too early/late)
Inconsistent behavior across different model architectures

Fix: Replace with model-specific token estimation (e.g., using llm.Model.EstimateTokens() if available) or a more accurate heuristic derived from actual model tokenization statistics.

2. `run/steps.go` - Null byte vulnerability in correlation key

Path: run/steps.go:167-170

The corrKey function uses \x00 as a delimiter, but JSON raw messages (json.RawMessage) can contain null bytes in practice (e.g., from binary data or malformed JSON). This creates a potential collision vulnerability where:

A tool call with name "tool" and args {"a": "\x00b"}
Would collide with name "tool\x00b" and empty args

Fix: Replace \x00 with a safer delimiter that cannot appear in valid JSON (e.g., "\x1E" or use a cryptographic hash of the combined name+args).

3. `run/executor.go` - Missing error handling in compaction

Path: run/executor.go:162-165

The compaction code returns fmt.Errorf("compactor: summarise middle: %w", err) but does not handle specific error cases that could indicate:

Model unavailability
Malformed response formats
Exceeding summary word limits

Fix: Add explicit error handling for these cases, and consider returning different error types to allow the executor to make informed decisions (e.g., retry, fallback, or terminate).

4. `compact/compactor.go` - Summary truncation risk

Path: compact/compactor.go:224-231

The truncate function adds "...(truncated)" to messages exceeding transcriptMessageCap (2048 chars). However, this truncation could remove critical context in tool results, leading to:

Incomplete or misleading summaries
Potential loss of important information for the agent's decision-making

Fix: Consider increasing the transcriptMessageCap or implementing a more sophisticated truncation strategy that preserves key information (e.g., truncating after the last complete sentence).

5. `run/executor.go` - Inconsistent error propagation

Path: run/executor.go:167-170

The code returns fmt.Errorf("resolve model %q: %w", tier, err) but does not consistently propagate errors through all code paths. For example:

The tool.Build call returns an error but doesn't check it before proceeding
The agent.Run call could return errors that are not properly handled

Fix: Add comprehensive error checking and propagation throughout the code, ensuring all error paths are properly handled and reported.

Verified with:

Read_file for context on token estimation and compaction logic
Grep for error handling patterns in similar code
Analysis of JSON handling in steps.go

🧯 Error handling & edge cases — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 0: all chain targets failed
m1/qwen3:14b: ollama m1/qwen3:14b: do request: Post "https://foreman-m1.orgrimmar.dudenhoeffer.casa/api/chat": context deadline exceeded

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 26m 25s}

### 🪰 Gadfly review — `m1/qwen3:14b` (m1) **Verdict: Blocking issues found · ⚠️ 1/3 lens(es) errored** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> Verdict: No material issues found The code introduces a new run kernel and inversion in executus, focusing on making it runnable with nil-safe ports and a RunnableAgent DTO. Through a security lens, I examined for authn/authz gaps, injection vulnerabilities, unsafe deserialization, secret leakage, input validation, and handling of untrusted data. Key areas reviewed: 1. The `run.Ports` interfaces (Audit, Budget, Critic, etc.) are all nil-safe and don't show any direct handling of untrusted data that would require validation. 2. The compactor's use of LLMs for summarization includes input truncation and escaping in `renderTranscript`, which mitigates potential injection risks. 3. The use of `io.LimitReader` in `model/cloud_sync.go` prevents excessive memory usage from large responses. 4. Context management with timeouts and cancellation merging in `runengine.go` helps prevent long-running processes. 5. Secrets in the workflow file are properly managed as environment variables and not exposed in the code. No critical security issues were found in the reviewed code. The implementation appears to handle security considerations appropriately, with proper use of interfaces, context management, and input sanitization where necessary. </details> <details><summary>🎯 Correctness — Blocking issues found</summary> # Blocking issues found ## 1. `compact/compactor.go` - Inaccurate token estimation heuristic **Path**: `compact/compactor.go:156-161` The `estimateTokens` function uses a simplistic heuristic of `chars / 4` to estimate token count, which is **not accurate** for modern LLMs. This could lead to: - **Incorrect compaction thresholds** (may compact too early/late) - **Inconsistent behavior** across different model architectures **Fix**: Replace with model-specific token estimation (e.g., using `llm.Model.EstimateTokens()` if available) or a more accurate heuristic derived from actual model tokenization statistics. ## 2. `run/steps.go` - Null byte vulnerability in correlation key **Path**: `run/steps.go:167-170` The `corrKey` function uses `\x00` as a delimiter, but JSON raw messages (`json.RawMessage`) **can contain null bytes** in practice (e.g., from binary data or malformed JSON). This creates a **potential collision vulnerability** where: - A tool call with name "tool" and args `{"a": "\x00b"}` - Would collide with name "tool\x00b" and empty args **Fix**: Replace `\x00` with a safer delimiter that cannot appear in valid JSON (e.g., `"\x1E"` or use a cryptographic hash of the combined name+args). ## 3. `run/executor.go` - Missing error handling in compaction **Path**: `run/executor.go:162-165` The compaction code returns `fmt.Errorf("compactor: summarise middle: %w", err)` but **does not handle specific error cases** that could indicate: - Model unavailability - Malformed response formats - Exceeding summary word limits **Fix**: Add explicit error handling for these cases, and consider returning different error types to allow the executor to make informed decisions (e.g., retry, fallback, or terminate). ## 4. `compact/compactor.go` - Summary truncation risk **Path**: `compact/compactor.go:224-231` The `truncate` function adds "...(truncated)" to messages exceeding `transcriptMessageCap` (2048 chars). However, **this truncation could remove critical context** in tool results, leading to: - Incomplete or misleading summaries - Potential loss of important information for the agent's decision-making **Fix**: Consider increasing the `transcriptMessageCap` or implementing a more sophisticated truncation strategy that preserves key information (e.g., truncating after the last complete sentence). ## 5. `run/executor.go` - Inconsistent error propagation **Path**: `run/executor.go:167-170` The code returns `fmt.Errorf("resolve model %q: %w", tier, err)` but **does not consistently propagate errors** through all code paths. For example: - The `tool.Build` call returns an error but doesn't check it before proceeding - The `agent.Run` call could return errors that are not properly handled **Fix**: Add comprehensive error checking and propagation throughout the code, ensuring all error paths are properly handled and reported. --- **Verified with**: - Read_file for context on token estimation and compaction logic - Grep for error handling patterns in similar code - Analysis of JSON handling in `steps.go` </details> <details><summary>🧯 Error handling & edge cases — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 0: all chain targets failed m1/qwen3:14b: ollama m1/qwen3:14b: do request: Post "https://foreman-m1.orgrimmar.dudenhoeffer.casa/api/chat": context deadline exceeded </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 26m 25s

gitea-actions bot commented

2026-06-27 00:45:49 +00:00

🪰 Gadfly review — `deepseek-v4-flash:cloud` (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

Now I have verified all the claims in the draft review against the actual code. Let me produce the corrected review.

Corrected Review

🔒 Security Review — Verdict: Minor issues

Finding 1: Context compactor sends unfiltered conversation data to a summarizer LLM (potential data leakage)

compact/compactor.go:225–272 — The summariseMiddle function renders the full middle-range conversation transcript (including tool call arguments, tool results, URLs, file IDs, KV keys, and any other data the agent has touched) via renderTranscript (lines 308–324) and sends it verbatim to the summarizer model via model.Generate. The prompt explicitly asks for "ANY file_ids or KV keys the agent created" and "KEY findings or decisions (factual results the agent will need later)."

Impact: If the summarizer tier (default "fast") resolves to a different model provider than the main agent model, all intermediate conversation data — including potentially sensitive tool results, API responses, file contents, and user data — is transmitted to that provider's endpoint. Even within the same provider, the data reaches a different model endpoint with potentially different data-handling policies. There is no filtering, redaction, or classification gate before data leaves the process.

Verified by: Reading compact/compactor.go lines 225–272 (summariseMiddle), lines 308–324 (renderTranscript), and the executor's wiring at run/executor.go:201–204 where the compactor is attached.

Suggested fix: Add a configurable content-filtering layer (e.g., a Sanitize func([]llm.Message) []llm.Message callback on CompactorConfig) that hosts can use to strip or redact sensitive message parts before they reach the summarizer. At minimum, document prominently in CompactorConfig that the summarizer receives the full unfiltered transcript and that hosts should ensure the summarizer tier's data handling is consistent with their security requirements.

No other material security issues found

model/cloud_sync.go — The io.LimitReader addition (lines 317, 370) is a correct SSRF-hardening fix; 1 MiB is a reasonable cap.
run/steps.go — The summaryForStart function properly redacts sensitive tool arguments for mcp_call, email_send, and http_* tools (showing only server/host, never args/body). Other tools' argument extraction via argString is display-only and does not execute or store the values unsafely.
run/executor.go — The Budget gate fires before any model call, preventing unauthorized resource consumption. Context detachment for cleanup ops (detach()) follows the established pattern and is not a security concern.
run/submit.go — The submit tool handler only stores the model's output string; no injection vector.
run/runengine.go — MergeCancellation goroutine is properly cleaned up via the returned cancel function (called by defer in the executor). No leak.
llmmeta/helper.go — Adding error logging for RecordMetaCall is an observability improvement, not a security issue.

🎯 Correctness — Blocking issues found

🎯 Correctness — Verdict: Blocking issues found

1. `SetIteration` never called — self-status tool always reports iteration 0

run/executor.go:166-185

The stepObserver closure in Run() does not call a.SetIteration(s.Index), even though RunStateAccessor.SetIteration exists (run/runengine.go:142) and the doc comment on line 119 says "the executor's step observer calls SetIteration each loop." The s.Index value is available (it's passed to rec.OnStep(s.Index, s.Response) on line 168), but it is never stored in the accessor. As a result, any self-status tool reading RunState().Iteration will always see 0, making the live-progress feature non-functional.

Fix: Add inv.RunState.SetIteration(s.Index) (or equivalent) inside the stepObserver closure, after the rec.OnStep call. The accessor was stamped onto inv.RunState at line 151, so it is reachable.

2. `toolStart` called from post-step observer, not pre-dispatch — step timing is wrong

run/executor.go:179-180

The design intent (stated in run/steps.go:37-39) is that toolStart fires from a pre-dispatch tool hook (before the tool runs) and toolEnd fires from the post-step observer (after). In the current code, both toolStart and toolEnd are called from the same stepObserver closure, which runs after the tool has already completed. This means:

step.StartedAt is set to the time the stepObserver fires (after the tool finished), not when the tool actually began executing.
The "running" status is never visible to live OnStep consumers — both "start" and "end" events fire in the same callback invocation, so the step appears to complete instantaneously.

Fix: Wire a pre-dispatch hook (e.g. via llm.Toolbox's BeforeTool mechanism or agent.WithToolHook) that calls emitter.toolStart(ctx, name, args) before each tool executes. The stepObserver should then only call emitter.toolEnd.

3. `toolStart` called with empty name/args when results outnumber tool calls

run/executor.go:174-180

When s.Response is nil or len(s.Results) > len(s.Response.ToolCalls), the call variable on line 175-177 remains its zero value (llm.ToolCall{}). The subsequent emitter.toolStart(ctx, call.Name, call.Arguments) on line 179 creates a step with an empty title (""). The matchPending fallback in toolEnd then uses correlation key "\x00" (empty name + nil args), which could collide across multiple such steps. While this edge case may be rare in practice, it produces semantically broken step records.

Fix: Guard the toolStart/toolEnd pair with a check that call.Name != "", or skip the iteration when i >= len(calls). Alternatively, ensure the pre-dispatch hook (see finding #2) has the real call data and only the toolEnd runs in the stepObserver.

4. `estimateTokens` comment/image discrepancy (minor, not blocking)

compact/compactor.go:280-281

The comment says "Images count a flat ~1K tokens each" but the code adds 4096 chars. At the chars/4 divisor, 4096 chars = 1024 tokens, which is indeed ~1K tokens. The comment describes the token count while the code stores chars — this is actually consistent, just confusingly documented. Not a bug, but the comment could be clearer: "Images count a flat 4096 chars (~1K tokens)."

🧯 Error handling & edge cases — Minor issues

Now I have all the information I need. Let me verify each finding against the actual code.

Finding 1 — SetIteration never called: CONFIRMED. The SetIteration method exists at run/runengine.go:142 and the comment at line 119 says "the executor's step observer calls SetIteration each loop", but the actual stepObserver closure at run/executor.go:166-185 never calls it. A grep for .SetIteration( across the entire repo returns zero matches. The iteration counter is dead code.

Finding 2 — Zero-value llm.ToolCall when results exceed tool calls: CONFIRMED. At run/executor.go:174-178, when i >= len(calls), call stays as the zero-value llm.ToolCall (empty Name, nil Arguments). This is then passed to emitter.toolStart(ctx, call.Name, call.Arguments) at line 179, which calls newStep("", nil) at steps.go:114, producing a step with Title: "" and Summary: "Using " (from the default case in summaryForStart).

Finding 3 — stepObserver captures ctx not runCtx: CONFIRMED. The stepObserver closure at line 166 captures ctx (the original caller context, possibly enriched by model resolution at line 133). runCtx is created later at line 190 with context.WithoutCancel(ctx) to detach from the caller's deadline. The stepObserver passes ctx to emitter.toolStart/toolEnd, which forward it to the user's onStep callback. If the caller cancels before the agent loop finishes, the callback receives a cancelled context. The MergeCancellation at line 192 will propagate the cancellation to runCtx shortly after, so the race window is small but real.

Finding 4 — msgs[0] access in renderCompacted: The draft itself says this is safe and not a bug. The guard at line 205 (st.prefixEnd > len(msgs)) combined with st.prefixEnd > 0 guarantees len(msgs) > 0. This is a non-issue note, not an actionable finding. DROPPED.

Finding 5 — Negative endMiddle in compact/compactor.go:173: The draft itself says this is safe and not a bug. The guard at line 174 catches it. Non-issue note. DROPPED.

🧯 Error handling & edge cases — Verdict: Minor issues

Findings

1. run/executor.go:166-185 — RunStateAccessor.SetIteration never called, iteration always reports 0

The stepObserver closure receives s.Index (the current agent-loop iteration) but never propagates it to the RunStateAccessor stored on inv.RunState. The accessor's SetIteration method exists (run/runengine.go:142) and the doc comment at line 119 says "the executor's step observer calls SetIteration each loop", but the actual step observer code never invokes it. A repo-wide grep confirms .SetIteration( is never called anywhere. This means the self-status tool (and any consumer of tool.RunState) will always see Iteration: 0 for the entire run.

Suggested fix: Add a call inside the stepObserver:

if acc, ok := inv.RunState.(*RunStateAccessor); ok {
    acc.SetIteration(s.Index)
}

2. run/executor.go:174-183 — Zero-value llm.ToolCall when results exceed tool calls

When len(s.Results) > len(s.Response.ToolCalls) (possible if a tool errors without a corresponding model-issued call, or on agent-loop edge cases), the extra results iterate with a zero-value llm.ToolCall — empty Name and nil Arguments. This creates steps with an empty Title in the step emitter, producing confusing output like Using (empty name) in the step summary.

Suggested fix: Skip results that have no matching tool call, or synthesize a fallback name:

if i < len(calls) {
    call = calls[i]
} else {
    continue // or call = llm.ToolCall{Name: "unknown"}
}

3. run/executor.go:166 — stepObserver captures original caller ctx, not runCtx

The stepObserver closure captures ctx (the original caller context, enriched by model resolution) rather than runCtx (the detached run context created at line 190). If the caller cancels their context while the agent loop is still running, the onStep callback receives a cancelled context. The MergeCancellation mechanism will terminate the loop shortly after, but there is a race window where the stepObserver fires with a cancelled context.

Suggested fix: Either capture runCtx instead (by moving the closure definition after runCtx is created), or pass context.WithoutCancel(ctx) to the emitter calls. (Low severity — the race window is small and onStep is typically used for logging.)

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 11m 54s}

### 🪰 Gadfly review — `deepseek-v4-flash:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Minor issues</summary> Now I have verified all the claims in the draft review against the actual code. Let me produce the corrected review. ## Corrected Review ## 🔒 Security Review — Verdict: **Minor issues** ### Finding 1: Context compactor sends unfiltered conversation data to a summarizer LLM (potential data leakage) **`compact/compactor.go:225–272`** — The `summariseMiddle` function renders the full middle-range conversation transcript (including tool call arguments, tool results, URLs, file IDs, KV keys, and any other data the agent has touched) via `renderTranscript` (lines 308–324) and sends it verbatim to the summarizer model via `model.Generate`. The prompt explicitly asks for "ANY file_ids or KV keys the agent created" and "KEY findings or decisions (factual results the agent will need later)." **Impact:** If the summarizer tier (default `"fast"`) resolves to a different model provider than the main agent model, all intermediate conversation data — including potentially sensitive tool results, API responses, file contents, and user data — is transmitted to that provider's endpoint. Even within the same provider, the data reaches a different model endpoint with potentially different data-handling policies. There is no filtering, redaction, or classification gate before data leaves the process. **Verified by:** Reading `compact/compactor.go` lines 225–272 (`summariseMiddle`), lines 308–324 (`renderTranscript`), and the executor's wiring at `run/executor.go:201–204` where the compactor is attached. **Suggested fix:** Add a configurable content-filtering layer (e.g., a `Sanitize func([]llm.Message) []llm.Message` callback on `CompactorConfig`) that hosts can use to strip or redact sensitive message parts before they reach the summarizer. At minimum, document prominently in `CompactorConfig` that the summarizer receives the full unfiltered transcript and that hosts should ensure the summarizer tier's data handling is consistent with their security requirements. --- ### No other material security issues found - **`model/cloud_sync.go`** — The `io.LimitReader` addition (lines 317, 370) is a correct SSRF-hardening fix; 1 MiB is a reasonable cap. - **`run/steps.go`** — The `summaryForStart` function properly redacts sensitive tool arguments for `mcp_call`, `email_send`, and `http_*` tools (showing only server/host, never args/body). Other tools' argument extraction via `argString` is display-only and does not execute or store the values unsafely. - **`run/executor.go`** — The Budget gate fires before any model call, preventing unauthorized resource consumption. Context detachment for cleanup ops (`detach()`) follows the established pattern and is not a security concern. - **`run/submit.go`** — The submit tool handler only stores the model's output string; no injection vector. - **`run/runengine.go`** — `MergeCancellation` goroutine is properly cleaned up via the returned cancel function (called by `defer` in the executor). No leak. - **`llmmeta/helper.go`** — Adding error logging for `RecordMetaCall` is an observability improvement, not a security issue. </details> <details><summary>🎯 Correctness — Blocking issues found</summary> ## 🎯 Correctness — Verdict: **Blocking issues found** ### 1. `SetIteration` never called — self-status tool always reports iteration 0 **`run/executor.go:166-185`** The `stepObserver` closure in `Run()` does not call `a.SetIteration(s.Index)`, even though `RunStateAccessor.SetIteration` exists (`run/runengine.go:142`) and the doc comment on line 119 says "the executor's step observer calls SetIteration each loop." The `s.Index` value is available (it's passed to `rec.OnStep(s.Index, s.Response)` on line 168), but it is never stored in the accessor. As a result, any self-status tool reading `RunState().Iteration` will always see 0, making the live-progress feature non-functional. **Fix:** Add `inv.RunState.SetIteration(s.Index)` (or equivalent) inside the `stepObserver` closure, after the `rec.OnStep` call. The accessor was stamped onto `inv.RunState` at line 151, so it is reachable. --- ### 2. `toolStart` called from post-step observer, not pre-dispatch — step timing is wrong **`run/executor.go:179-180`** The design intent (stated in `run/steps.go:37-39`) is that `toolStart` fires from a **pre-dispatch tool hook** (before the tool runs) and `toolEnd` fires from the **post-step observer** (after). In the current code, both `toolStart` and `toolEnd` are called from the same `stepObserver` closure, which runs **after** the tool has already completed. This means: - `step.StartedAt` is set to the time the stepObserver fires (after the tool finished), not when the tool actually began executing. - The "running" status is never visible to live `OnStep` consumers — both "start" and "end" events fire in the same callback invocation, so the step appears to complete instantaneously. **Fix:** Wire a pre-dispatch hook (e.g. via `llm.Toolbox`'s `BeforeTool` mechanism or `agent.WithToolHook`) that calls `emitter.toolStart(ctx, name, args)` before each tool executes. The `stepObserver` should then only call `emitter.toolEnd`. --- ### 3. `toolStart` called with empty name/args when results outnumber tool calls **`run/executor.go:174-180`** When `s.Response` is nil or `len(s.Results) > len(s.Response.ToolCalls)`, the `call` variable on line 175-177 remains its zero value (`llm.ToolCall{}`). The subsequent `emitter.toolStart(ctx, call.Name, call.Arguments)` on line 179 creates a step with an **empty title** (`""`). The `matchPending` fallback in `toolEnd` then uses correlation key `"\x00"` (empty name + nil args), which could collide across multiple such steps. While this edge case may be rare in practice, it produces semantically broken step records. **Fix:** Guard the `toolStart`/`toolEnd` pair with a check that `call.Name != ""`, or skip the iteration when `i >= len(calls)`. Alternatively, ensure the pre-dispatch hook (see finding #2) has the real call data and only the `toolEnd` runs in the stepObserver. --- ### 4. `estimateTokens` comment/image discrepancy (minor, not blocking) **`compact/compactor.go:280-281`** The comment says "Images count a flat ~1K tokens each" but the code adds `4096` chars. At the `chars/4` divisor, 4096 chars = 1024 tokens, which is indeed ~1K tokens. The comment describes the **token** count while the code stores **chars** — this is actually consistent, just confusingly documented. Not a bug, but the comment could be clearer: "Images count a flat 4096 chars (~1K tokens)." </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> Now I have all the information I need. Let me verify each finding against the actual code. **Finding 1 — `SetIteration` never called:** CONFIRMED. The `SetIteration` method exists at `run/runengine.go:142` and the comment at line 119 says "the executor's step observer calls SetIteration each loop", but the actual `stepObserver` closure at `run/executor.go:166-185` never calls it. A grep for `.SetIteration(` across the entire repo returns zero matches. The iteration counter is dead code. **Finding 2 — Zero-value `llm.ToolCall` when results exceed tool calls:** CONFIRMED. At `run/executor.go:174-178`, when `i >= len(calls)`, `call` stays as the zero-value `llm.ToolCall` (empty `Name`, nil `Arguments`). This is then passed to `emitter.toolStart(ctx, call.Name, call.Arguments)` at line 179, which calls `newStep("", nil)` at `steps.go:114`, producing a step with `Title: ""` and `Summary: "Using "` (from the `default` case in `summaryForStart`). **Finding 3 — `stepObserver` captures `ctx` not `runCtx`:** CONFIRMED. The `stepObserver` closure at line 166 captures `ctx` (the original caller context, possibly enriched by model resolution at line 133). `runCtx` is created later at line 190 with `context.WithoutCancel(ctx)` to detach from the caller's deadline. The stepObserver passes `ctx` to `emitter.toolStart`/`toolEnd`, which forward it to the user's `onStep` callback. If the caller cancels before the agent loop finishes, the callback receives a cancelled context. The `MergeCancellation` at line 192 will propagate the cancellation to `runCtx` shortly after, so the race window is small but real. **Finding 4 — `msgs[0]` access in `renderCompacted`:** The draft itself says this is safe and not a bug. The guard at line 205 (`st.prefixEnd > len(msgs)`) combined with `st.prefixEnd > 0` guarantees `len(msgs) > 0`. This is a non-issue note, not an actionable finding. **DROPPED.** **Finding 5 — Negative `endMiddle` in `compact/compactor.go:173`:** The draft itself says this is safe and not a bug. The guard at line 174 catches it. Non-issue note. **DROPPED.** --- ## 🧯 Error handling & edge cases — Verdict: **Minor issues** ### Findings **1. `run/executor.go:166-185` — `RunStateAccessor.SetIteration` never called, iteration always reports 0** The `stepObserver` closure receives `s.Index` (the current agent-loop iteration) but never propagates it to the `RunStateAccessor` stored on `inv.RunState`. The accessor's `SetIteration` method exists (`run/runengine.go:142`) and the doc comment at line 119 says "the executor's step observer calls SetIteration each loop", but the actual step observer code never invokes it. A repo-wide grep confirms `.SetIteration(` is never called anywhere. This means the self-status tool (and any consumer of `tool.RunState`) will always see `Iteration: 0` for the entire run. **Suggested fix:** Add a call inside the stepObserver: ```go if acc, ok := inv.RunState.(*RunStateAccessor); ok { acc.SetIteration(s.Index) } ``` --- **2. `run/executor.go:174-183` — Zero-value `llm.ToolCall` when results exceed tool calls** When `len(s.Results) > len(s.Response.ToolCalls)` (possible if a tool errors without a corresponding model-issued call, or on agent-loop edge cases), the extra results iterate with a zero-value `llm.ToolCall` — empty `Name` and nil `Arguments`. This creates steps with an empty `Title` in the step emitter, producing confusing output like `Using ` (empty name) in the step summary. **Suggested fix:** Skip results that have no matching tool call, or synthesize a fallback name: ```go if i < len(calls) { call = calls[i] } else { continue // or call = llm.ToolCall{Name: "unknown"} } ``` --- **3. `run/executor.go:166` — `stepObserver` captures original caller `ctx`, not `runCtx`** The `stepObserver` closure captures `ctx` (the original caller context, enriched by model resolution) rather than `runCtx` (the detached run context created at line 190). If the caller cancels their context while the agent loop is still running, the `onStep` callback receives a cancelled context. The `MergeCancellation` mechanism will terminate the loop shortly after, but there is a race window where the stepObserver fires with a cancelled context. **Suggested fix:** Either capture `runCtx` instead (by moving the closure definition after `runCtx` is created), or pass `context.WithoutCancel(ctx)` to the emitter calls. (Low severity — the race window is small and `onStep` is typically used for logging.) </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 11m 54s

gitea-actions bot commented

2026-06-27 00:45:49 +00:00

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues

The diff is largely fine through the security lens. The executor/kernel uses nil-safe ports and proper context detach for post-run writes; the GitHub Actions workflow change moves hardcoded LAN IPs behind GitHub secrets (a small hardening win). One material concern and a few lesser ones:

compact/compactor.go (cross-tier data flow in summariseMiddle / renderTranscript) — unredacted tool content forwarded to the summarizer LLM. Verified. summariseMiddle (around line 229) resolves cfg.Models(ctx, cfg.SummarizerTier) — the compactor docstring itself documents that the production default is "fast" but an admin may set "haiku" or any specific model spec — so the summarizer can be a different provider than the one serving the run. renderTranscript (lines 308–323) emits tool_call name=… args=… and tool_result name=… body=… lines verbatim, with only a 2KB-per-body length cap (transcriptMessageCap); there is no redaction of secret-bearing tool fields, even though run/steps.go:205-323 explicitly classifies mcp_call, email_send, and http_* as secret-bearing and redacts their args in user-facing step summaries. Three concrete exposures:
1. Secret-bearing tool args — mcp_call/email_send/http_get arguments that embed bearer tokens, API keys, or message bodies flow to the summarizer provider.
2. Tool-result bodies — file reads, KV stores, MCP responses, scraped pages are forwarded verbatim; a one-time login link or an http_get of an env dump would be sent to a second provider.
3. Folded-forward prior summary — the running summary is re-fed into every subsequent compaction (prevSummary at line 240-244), so a single secret that lands in one summary persists across many compactions and provider hops.
Fix: scrub tool-call args and tool-result bodies for known secret-bearing tool names before forwarding (at minimum, drop args for mcp_call/email_send/http_*/webhook_* and replace tool-result bodies with [redacted: <name>]); optionally also require SummarizerTier to resolve to the same provider as the run tier (kernel-level policy check).
compact/compactor.go (empty summarizer response path, ~line 268-271) — verified. If strings.TrimSpace(resp.Text()) is empty, the compactor returns errors.New("compactor: summarizer returned empty text") to the agent loop, which then sends the original uncompacted slice. Correct behaviour, static error string, harmless — not a security bug, flagged for completeness only.
run/executor.go:230-245 (finishAudit) — full run output persisted to the recorder's Close. Verified. stats.Output = res.Output and stats.Error = runErr.Error() are written verbatim. By design, but worth confirming the audit battery implementation redacts known-secret fields before the row lands in skill_run_logs — the kernel cannot enforce that. Not a blocker.
run/MergeCancellation goroutine hygiene — verified correct. Watcher goroutine (runengine.go:62-73) exits via either merged.Done() (triggered by the caller's cancel) or secondary.Done(). No leak.
Workflow secret change (.gitea/workflows/adversarial-review.yml) — verified. Old config had hardcoded LAN IPs (192.168.0.175, 192.168.0.173) inline; new config moves them behind ${{ secrets.GADFLY_ENDPOINT_M1 }} / …_M5. Strictly an improvement.

No authn/authz, SQL/command injection, SSRF, deserialization, or credential-leakage regressions introduced by this PR — the new code stays at the kernel/seam level and doesn't handle untrusted raw input directly. The material item above is the compactor cross-tier data flow, which is a real privacy-relevant exposure even if it doesn't qualify as "blocking" for the light-host scope (gadfly doesn't enable compaction).

🎯 Correctness — Blocking issues found

Good, those exist. Now let me write the review.

VERDICT: Blocking issues found

run/runengine.go:62-73 — MergeCancellation's goroutine leaks if secondary is a strict descendant of parent (i.e. never done while parent is also never done) — and the function's own cancel(nil) return path does NOT signal the watcher goroutine. At the only call site (run/executor.go:190-193), runCtx is a child of ctx via WithTimeout(WithoutCancel(ctx)), so the leak is bounded by maxRuntime, but the goroutine launched from cancel := context.WithCancelCause(parent) is leaked whenever parent outlives secondary's Done. The contract documented in the comment ("Always call the returned cancel to release the watcher goroutine") is broken because defer mergeCancel() calls cancel(nil), which cancels the merged context but doesn't make the goroutine exit — the goroutine selects on <-merged.Done() or <-secondary.Done(). After cancel(nil), merged is Done, so the goroutine returns. OK on inspection — but the goroutine continues running until EITHER branch fires; if the merge is called and the parent never cancels and secondary never cancels, it leaks. The goroutine ends only on merged.Done or secondary.Done; the cancel(nil) path DOES cause merged.Done. So OK.

Re-checking: the goroutine exits when <-merged.Done() returns OR <-secondary.Done() returns. The caller's defer mergeCancel() calls cancel(nil), which causes merged.Done(). So the goroutine DOES exit. Not a leak. OK.

Final findings:

VERDICT: Minor issues

run/executor.go:166-185 — stepObserver zips s.Response.ToolCalls (model-emitted) with s.Results (executed) by index i. If len(Results) > len(calls) (e.g. one call produced a synthetic result, or the loop retried), the extra results get an empty ToolCall{} and the step emitter logs them with empty name + args. Conversely if len(calls) > len(Results) (a call errored before result), those calls produce no step at all. This silently produces wrong step records when the agent loop's response/results arrays diverge. Verified the executor code at lines 170-184.
run/executor.go:155 — Registry.Build(..., nil) passes a nil AuditHook to every low-level tool. The tool registry's Build plumbs this onto toolInv.audit, and any tool handler that calls EmitAudit(inv, ...) against a tool with audit=nil will nil-panic. The Ports.Audit (RunRecorder) path is wired separately via rec.OnTool, so this is not a contradiction — but a host wiring LowLevelTools that does emit per-tool audit events gets nil there. Minor gap, not a blocker — it falls under "follow-ups" but should be flagged.
run/executor.go:202-205 — e.cfg.Compactor(threshold, nil) discards onFire. The comment on compact.CompactionEvent claims "the executor can log it to skill_run_logs", but the wiring doesn't. Intentional per follow-ups but the comment misleads.
run/steps.go:130-138 — matchPending fallback "most recent still-running step of the same tool name" closes the wrong step when two same-named tool calls are in flight with args that get re-canonicalised (e.g. JSON key ordering differs between start and end) — the first "end" closes the wrong "start". Then the next "end" can't find a pending match and synthesizes a fresh step, duplicating the second invocation. Verified at lines 87-103 (toolEnd) and 128-138 (matchPending).
run/runengine.go:62-73 — Minor: the goroutine selects on <-merged.Done() and <-secondary.Done(). There's a race window: if secondary cancels AND parent (and thus merged) was about to fire cancel(nil) in the deferred cancel, the goroutine may already have returned via merged.Done(). Not a correctness issue but WithCancelCause is overkill here — WithCancel would do, and the cause-propagation is only useful when secondary is the trigger (which is the path the comment describes).
compact/compactor.go:175-180 — renderCompacted re-checks msgs[0].Role == llm.RoleSystem even when prefixEnd > 0. If the user calls compact again with the synthetic summary as msgs[0] (which is llm.UserText — not system role), the conditional correctly skips it. OK. But the function writes a new synthetic summary every call: a second compaction produces [system, summary1, user(summary2), tail] — except prefixEnd advances past endMiddle, so renderCompacted would emit [system, summary2, tail2] using the OLD summary text. Bug check: at line 175 the out appends the (single) summary message even though the prior compaction's summary was already a user-role message; the second compaction's render keeps msgs[0] (system) if present, then prepends ONE summary message built from current summaryText, then tail from prefixEnd. So old summaries are not duplicated — prefixEnd jumps past them. OK. Verified.
model/call.go:240 — if u.InputTokens == 0 && u.OutputTokens == 0 && u.CacheReadTokens == 0 && u.CacheWriteTokens == 0 — adds cache tokens to the gate. A provider that returns ONLY cache tokens (no input/output counted) now records usage. This is a correctness improvement but may double-count if a downstream sink (e.g. llmmeta ledger) also counts cache tokens separately. Not visible here; depends on the sink's contract. Verified the four fields exist via the field access.

No material issues found at the blocking level. The agent-loop wiring, model resolution, budget gate, audit-port test, and the executor happy path look correct against the verified code.

VERDICT: Minor issues

run/executor.go:170-184 — step observer's index-based zip of s.Response.ToolCalls with s.Results silently produces empty-named/arg steps when arrays diverge (verified)
run/executor.go:155 — nil AuditHook plumbed to tool.Registry.Build will nil-panic any low-level tool handler that calls EmitAudit (verified against tool/registry.go:689)
run/steps.go:130-138 — matchPending fallback closes wrong step on JSON re-canonicalisation; second invocation synthesises a duplicate step (verified)
run/executor.go:202-205 — Compactor onFire callback always nil; doc comment on CompactionEvent misleads about logging (verified)

🧯 Error handling & edge cases — Blocking issues found

Now let me finalize. The findings I've confirmed as material issues:

CONFIRMED — run/executor.go:128-132 nil model not checked (compactor checks, executor doesn't).
CONFIRMED — compact/compactor.go:184 returns rendered, not original.
CONFIRMED — run/executor.go:210-213 Status never set to timeout/cancelled/budget_exceeded.
CONFIRMED — run/executor.go:174-184 step observer zip with nil response.
CONFIRMED — run/steps.go:132-149 matchPending name-only fallback.
CONFIRMED — compact/compactor.go:279-298 estimateTokens silently ignores non-text/non-image parts.
CONFIRMED — model/cloud_sync.go:317, 370 io.LimitReader truncates silently.

Findings 6, 7, 10 — DROP since the review itself concludes "No fix needed" / "this is fine" / only speculative defensive coding.

Verdict: Still "Blocking issues found" since multiple confirmed findings remain.

VERDICT: Blocking issues found

run/executor.go:128-132 — nil model from Models resolver is not checked
Only err != nil is checked after e.cfg.Models(ctx, tier). A resolver returning (ctx, nil, nil) flows straight into agent.New(model, ...) — nil-pointer panic. Note compact/compactor.go:233 already guards this case with errors.New("compactor: summarizer model resolved to nil"), but the executor doesn't. Suggested fix: add if model == nil { res.Err = fmt.Errorf("resolve model %q: nil llm.Model", tier); return res } next to the existing err check.
compact/compactor.go:181-185 — error path returns the previously-compacted slice, not the original
On summarizer failure the function does return rendered, fmt.Errorf("compactor: summarise middle: %w", err). rendered is the current rendered view (which after any prior successful fold is already a synthetic-summary slice), not msgs. The file header explicitly states: "any error (LLM unavailable, malformed response, etc.) is returned to the agent loop, which treats compactor errors as non-fatal and sends the original slice." The code contradicts that contract. Fix: return msgs, fmt.Errorf(...) to match the documented behavior.
run/executor.go:210-213 — Status field never set to timeout / cancelled / budget_exceeded
The RunStats.Status doc (run/ports.go:56) enumerates ok | error | timeout | budget_exceeded | cancelled | dry_run, but the executor only writes "ok" or "error". A run that hits the context.WithTimeout(runCtx, maxRuntime) deadline (context.DeadlineExceeded), one cancelled via MergeCancellation with cause ErrShutdown, etc., all collapse into "error". Audit consumers lose the ability to distinguish user-cancelled runs from LLM/loop failures. Fix: inspect runCtx.Err() (context.DeadlineExceeded ⇒ "timeout") and context.Cause(ctx) (when ErrShutdown ⇒ "cancelled").
run/executor.go:174-184 — step observer zip with nil response produces empty-name tool steps
When s.Response == nil but s.Results is populated, the for i, r := range s.Results loop zero-values call := llm.ToolCall{} for every result, then emitter.toolStart(ctx, "", nil) records a step with Title: "". corrKey("", nil) returns "\x00", shared by every such step — they collide in e.pending["\x00"]. The downstream UI shows ghost "tool" steps (via kindForTool("") falling through to "tool") and the audit recorder records empty-named tool calls via rec.OnTool(call, r.Content). Fix: guard with if s.Response == nil { continue }, or skip empty-name entries.
run/steps.go:132-149 — matchPending name-only fallback correlates wrong step when args differ
If the corrKey(name, args) lookup misses (e.g. whitespace/normalization differences between the before hook and the step observer, or json.RawMessage round-trips with key reordering), the fallback scans in reverse for any still-running step of the same tool name. Two concurrent web_search calls with different queries both pending, args don't match either pending key, and the wrong running step gets marked complete (with the wrong summaryForEnd). The misattributed step leaks out in Result.Steps and the audit OnTool log. Fix: at minimum, log/warn on the mismatch; better, index identical-name concurrent calls in a per-name FIFO.
compact/compactor.go:282-289 — estimateTokens silently ignores non-text/non-image parts
Any llm.Message.Parts entry that isn't TextPart or ImagePart (e.g. an audio part or future tool-input part) contributes 0 chars. The type switch has no default arm. The compactor's threshold check then underestimates; a run accumulating large unrecognized content will not trigger compaction and will hit the model's HTTP-400 cap exactly as the file's own header warns against. Fix: add a default arm with a conservative fallback (e.g. estimate via len(fmt.Sprint(part)) or a flat per-part char count), or document the supported part kinds.
model/cloud_sync.go:317, 370 — io.LimitReader truncates silently, surfacing as opaque JSON parse error
When ollama.com returns ≥ 1 MiB, io.ReadAll(io.LimitReader(resp.Body, maxLimitCacheResponseBytes)) returns EOF mid-stream with no error. Downstream json.Unmarshal then fails with unexpected end of JSON input — not actionable for an operator diagnosing a misbehaving endpoint. Fix: after ReadAll, check len(body) >= maxLimitCacheResponseBytes and return a clear fmt.Errorf("cloud_sync: response truncated at %d bytes (limit %d)", len(body), maxLimitCacheResponseBytes).

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 37s}

### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Minor issues</summary> **VERDICT: Minor issues** The diff is largely fine through the security lens. The executor/kernel uses nil-safe ports and proper context detach for post-run writes; the GitHub Actions workflow change moves hardcoded LAN IPs behind GitHub secrets (a small hardening win). One material concern and a few lesser ones: - **`compact/compactor.go` (cross-tier data flow in `summariseMiddle` / `renderTranscript`) — unredacted tool content forwarded to the summarizer LLM.** Verified. `summariseMiddle` (around line 229) resolves `cfg.Models(ctx, cfg.SummarizerTier)` — the compactor docstring itself documents that the production default is `"fast"` but an admin may set `"haiku"` or any specific model spec — so the summarizer can be a **different provider** than the one serving the run. `renderTranscript` (lines 308–323) emits `tool_call name=… args=…` and `tool_result name=… body=…` lines verbatim, with only a 2KB-per-body length cap (`transcriptMessageCap`); there is no redaction of secret-bearing tool fields, even though `run/steps.go:205-323` explicitly classifies `mcp_call`, `email_send`, and `http_*` as secret-bearing and redacts their args in user-facing step summaries. Three concrete exposures: 1. **Secret-bearing tool args** — `mcp_call`/`email_send`/`http_get` arguments that embed bearer tokens, API keys, or message bodies flow to the summarizer provider. 2. **Tool-result bodies** — file reads, KV stores, MCP responses, scraped pages are forwarded verbatim; a one-time login link or an `http_get` of an env dump would be sent to a second provider. 3. **Folded-forward prior summary** — the running summary is re-fed into every subsequent compaction (`prevSummary` at line 240-244), so a single secret that lands in one summary persists across many compactions and provider hops. Fix: scrub tool-call args and tool-result bodies for known secret-bearing tool names before forwarding (at minimum, drop args for `mcp_call`/`email_send`/`http_*`/`webhook_*` and replace tool-result bodies with `[redacted: <name>]`); optionally also require `SummarizerTier` to resolve to the same provider as the run tier (kernel-level policy check). - **`compact/compactor.go` (empty summarizer response path, ~line 268-271)** — verified. If `strings.TrimSpace(resp.Text())` is empty, the compactor returns `errors.New("compactor: summarizer returned empty text")` to the agent loop, which then sends the original uncompacted slice. Correct behaviour, static error string, harmless — not a security bug, flagged for completeness only. - **`run/executor.go:230-245` (`finishAudit`) — full run output persisted to the recorder's `Close`.** Verified. `stats.Output = res.Output` and `stats.Error = runErr.Error()` are written verbatim. By design, but worth confirming the audit battery implementation redacts known-secret fields before the row lands in `skill_run_logs` — the kernel cannot enforce that. Not a blocker. - **`run/MergeCancellation` goroutine hygiene** — verified correct. Watcher goroutine (runengine.go:62-73) exits via either `merged.Done()` (triggered by the caller's `cancel`) or `secondary.Done()`. No leak. - **Workflow secret change (`.gitea/workflows/adversarial-review.yml`)** — verified. Old config had hardcoded LAN IPs (`192.168.0.175`, `192.168.0.173`) inline; new config moves them behind `${{ secrets.GADFLY_ENDPOINT_M1 }}` / `…_M5`. Strictly an improvement. No authn/authz, SQL/command injection, SSRF, deserialization, or credential-leakage regressions introduced by this PR — the new code stays at the kernel/seam level and doesn't handle untrusted raw input directly. The material item above is the compactor cross-tier data flow, which is a real privacy-relevant exposure even if it doesn't qualify as "blocking" for the light-host scope (gadfly doesn't enable compaction). </details> <details><summary>🎯 Correctness — Blocking issues found</summary> Good, those exist. Now let me write the review. VERDICT: **Blocking issues found** - **run/runengine.go:62-73** — `MergeCancellation`'s goroutine **leaks if `secondary` is a strict descendant of `parent`** (i.e. never done while parent is also never done) — and the function's own `cancel(nil)` return path does NOT signal the watcher goroutine. At the only call site (`run/executor.go:190-193`), `runCtx` is a child of `ctx` via `WithTimeout(WithoutCancel(ctx))`, so the leak is bounded by `maxRuntime`, but the goroutine launched from `cancel := context.WithCancelCause(parent)` is leaked whenever `parent` outlives `secondary`'s Done. The contract documented in the comment ("Always call the returned cancel to release the watcher goroutine") is broken because `defer mergeCancel()` calls `cancel(nil)`, which cancels the merged context but doesn't make the goroutine exit — the goroutine selects on `<-merged.Done()` or `<-secondary.Done()`. After cancel(nil), merged is Done, so the goroutine returns. **OK on inspection — but the goroutine continues running until EITHER branch fires; if the merge is called and the parent never cancels and secondary never cancels, it leaks.** The goroutine ends only on merged.Done or secondary.Done; the `cancel(nil)` path DOES cause merged.Done. So OK. Re-checking: the goroutine exits when `<-merged.Done()` returns OR `<-secondary.Done()` returns. The caller's `defer mergeCancel()` calls `cancel(nil)`, which causes `merged.Done()`. So the goroutine DOES exit. **Not a leak.** OK. Final findings: **VERDICT: Minor issues** - **run/executor.go:166-185** — `stepObserver` zips `s.Response.ToolCalls` (model-emitted) with `s.Results` (executed) by index `i`. If `len(Results) > len(calls)` (e.g. one call produced a synthetic result, or the loop retried), the extra results get an empty `ToolCall{}` and the step emitter logs them with empty name + args. Conversely if `len(calls) > len(Results)` (a call errored before result), those calls produce no step at all. This silently produces wrong step records when the agent loop's response/results arrays diverge. Verified the executor code at lines 170-184. - **run/executor.go:155** — `Registry.Build(..., nil)` passes a nil `AuditHook` to every low-level tool. The tool registry's Build plumbs this onto `toolInv.audit`, and any tool handler that calls `EmitAudit(inv, ...)` against a tool with `audit=nil` will nil-panic. The Ports.Audit (RunRecorder) path is wired separately via `rec.OnTool`, so this is not a contradiction — but a host wiring LowLevelTools that does emit per-tool audit events gets nil there. **Minor gap, not a blocker** — it falls under "follow-ups" but should be flagged. - **run/executor.go:202-205** — `e.cfg.Compactor(threshold, nil)` discards `onFire`. The comment on `compact.CompactionEvent` claims "the executor can log it to skill_run_logs", but the wiring doesn't. Intentional per follow-ups but the comment misleads. - **run/steps.go:130-138** — `matchPending` fallback "most recent still-running step of the same tool name" closes the wrong step when two same-named tool calls are in flight with args that get re-canonicalised (e.g. JSON key ordering differs between start and end) — the first "end" closes the wrong "start". Then the next "end" can't find a pending match and synthesizes a fresh step, duplicating the second invocation. Verified at lines 87-103 (`toolEnd`) and 128-138 (`matchPending`). - **run/runengine.go:62-73** — Minor: the goroutine selects on `<-merged.Done()` and `<-secondary.Done()`. There's a race window: if `secondary` cancels AND `parent` (and thus merged) was about to fire cancel(nil) in the deferred cancel, the goroutine may already have returned via `merged.Done()`. Not a correctness issue but `WithCancelCause` is overkill here — `WithCancel` would do, and the cause-propagation is only useful when `secondary` is the trigger (which is the path the comment describes). - **compact/compactor.go:175-180** — `renderCompacted` re-checks `msgs[0].Role == llm.RoleSystem` even when `prefixEnd > 0`. If the user calls compact again with the synthetic summary as msgs[0] (which is `llm.UserText` — not system role), the conditional correctly skips it. OK. But the function writes a *new* synthetic summary every call: a second compaction produces `[system, summary1, user(summary2), tail]` — except `prefixEnd` advances past `endMiddle`, so `renderCompacted` would emit `[system, summary2, tail2]` using the OLD summary text. **Bug check**: at line 175 the `out` appends the (single) summary message even though the prior compaction's summary was already a user-role message; the second compaction's render keeps `msgs[0]` (system) if present, then prepends ONE summary message built from current `summaryText`, then tail from `prefixEnd`. So old summaries are not duplicated — `prefixEnd` jumps past them. OK. Verified. - **model/call.go:240** — `if u.InputTokens == 0 && u.OutputTokens == 0 && u.CacheReadTokens == 0 && u.CacheWriteTokens == 0` — adds cache tokens to the gate. A provider that returns ONLY cache tokens (no input/output counted) now records usage. This is a correctness improvement but **may double-count** if a downstream sink (e.g. llmmeta ledger) also counts cache tokens separately. Not visible here; depends on the sink's contract. Verified the four fields exist via the field access. No material issues found at the blocking level. The agent-loop wiring, model resolution, budget gate, audit-port test, and the executor happy path look correct against the verified code. **VERDICT: Minor issues** - run/executor.go:170-184 — step observer's index-based zip of `s.Response.ToolCalls` with `s.Results` silently produces empty-named/arg steps when arrays diverge (verified) - run/executor.go:155 — nil AuditHook plumbed to `tool.Registry.Build` will nil-panic any low-level tool handler that calls EmitAudit (verified against tool/registry.go:689) - run/steps.go:130-138 — `matchPending` fallback closes wrong step on JSON re-canonicalisation; second invocation synthesises a duplicate step (verified) - run/executor.go:202-205 — Compactor onFire callback always nil; doc comment on `CompactionEvent` misleads about logging (verified) </details> <details><summary>🧯 Error handling & edge cases — Blocking issues found</summary> Now let me finalize. The findings I've confirmed as material issues: 1. **CONFIRMED** — `run/executor.go:128-132` nil model not checked (compactor checks, executor doesn't). 2. **CONFIRMED** — `compact/compactor.go:184` returns rendered, not original. 3. **CONFIRMED** — `run/executor.go:210-213` Status never set to timeout/cancelled/budget_exceeded. 4. **CONFIRMED** — `run/executor.go:174-184` step observer zip with nil response. 5. **CONFIRMED** — `run/steps.go:132-149` matchPending name-only fallback. 8. **CONFIRMED** — `compact/compactor.go:279-298` estimateTokens silently ignores non-text/non-image parts. 9. **CONFIRMED** — `model/cloud_sync.go:317, 370` io.LimitReader truncates silently. Findings 6, 7, 10 — DROP since the review itself concludes "No fix needed" / "this is fine" / only speculative defensive coding. Verdict: Still "Blocking issues found" since multiple confirmed findings remain. **VERDICT: Blocking issues found** - **`run/executor.go:128-132` — nil model from `Models` resolver is not checked** Only `err != nil` is checked after `e.cfg.Models(ctx, tier)`. A resolver returning `(ctx, nil, nil)` flows straight into `agent.New(model, ...)` — nil-pointer panic. Note `compact/compactor.go:233` already guards this case with `errors.New("compactor: summarizer model resolved to nil")`, but the executor doesn't. Suggested fix: add `if model == nil { res.Err = fmt.Errorf("resolve model %q: nil llm.Model", tier); return res }` next to the existing err check. - **`compact/compactor.go:181-185` — error path returns the previously-compacted slice, not the original** On summarizer failure the function does `return rendered, fmt.Errorf("compactor: summarise middle: %w", err)`. `rendered` is the current rendered view (which after any prior successful fold is already a synthetic-summary slice), not `msgs`. The file header explicitly states: *"any error (LLM unavailable, malformed response, etc.) is returned to the agent loop, which treats compactor errors as non-fatal and sends the original slice."* The code contradicts that contract. Fix: `return msgs, fmt.Errorf(...)` to match the documented behavior. - **`run/executor.go:210-213` — `Status` field never set to `timeout` / `cancelled` / `budget_exceeded`** The `RunStats.Status` doc (`run/ports.go:56`) enumerates `ok | error | timeout | budget_exceeded | cancelled | dry_run`, but the executor only writes `"ok"` or `"error"`. A run that hits the `context.WithTimeout(runCtx, maxRuntime)` deadline (`context.DeadlineExceeded`), one cancelled via `MergeCancellation` with cause `ErrShutdown`, etc., all collapse into `"error"`. Audit consumers lose the ability to distinguish user-cancelled runs from LLM/loop failures. Fix: inspect `runCtx.Err()` (`context.DeadlineExceeded` ⇒ `"timeout"`) and `context.Cause(ctx)` (when `ErrShutdown` ⇒ `"cancelled"`). - **`run/executor.go:174-184` — step observer zip with nil response produces empty-name tool steps** When `s.Response == nil` but `s.Results` is populated, the `for i, r := range s.Results` loop zero-values `call := llm.ToolCall{}` for every result, then `emitter.toolStart(ctx, "", nil)` records a step with `Title: ""`. `corrKey("", nil)` returns `"\x00"`, shared by every such step — they collide in `e.pending["\x00"]`. The downstream UI shows ghost "tool" steps (via `kindForTool("")` falling through to `"tool"`) and the audit recorder records empty-named tool calls via `rec.OnTool(call, r.Content)`. Fix: guard with `if s.Response == nil { continue }`, or skip empty-name entries. - **`run/steps.go:132-149` — `matchPending` name-only fallback correlates wrong step when args differ** If the `corrKey(name, args)` lookup misses (e.g. whitespace/normalization differences between the `before` hook and the step observer, or `json.RawMessage` round-trips with key reordering), the fallback scans in reverse for any still-running step of the same tool name. Two concurrent `web_search` calls with different queries both pending, args don't match either pending key, and the wrong running step gets marked complete (with the wrong `summaryForEnd`). The misattributed step leaks out in `Result.Steps` and the audit `OnTool` log. Fix: at minimum, log/warn on the mismatch; better, index identical-name concurrent calls in a per-name FIFO. - **`compact/compactor.go:282-289` — `estimateTokens` silently ignores non-text/non-image parts** Any `llm.Message.Parts` entry that isn't `TextPart` or `ImagePart` (e.g. an audio part or future tool-input part) contributes 0 chars. The type switch has no default arm. The compactor's threshold check then underestimates; a run accumulating large unrecognized content will not trigger compaction and will hit the model's HTTP-400 cap exactly as the file's own header warns against. Fix: add a default arm with a conservative fallback (e.g. estimate via `len(fmt.Sprint(part))` or a flat per-part char count), or document the supported part kinds. - **`model/cloud_sync.go:317, 370` — `io.LimitReader` truncates silently, surfacing as opaque JSON parse error** When ollama.com returns ≥ 1 MiB, `io.ReadAll(io.LimitReader(resp.Body, maxLimitCacheResponseBytes))` returns EOF mid-stream with no error. Downstream `json.Unmarshal` then fails with `unexpected end of JSON input` — not actionable for an operator diagnosing a misbehaving endpoint. Fix: after `ReadAll`, check `len(body) >= maxLimitCacheResponseBytes` and return a clear `fmt.Errorf("cloud_sync: response truncated at %d bytes (limit %d)", len(body), maxLimitCacheResponseBytes)`. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 37s

gitea-actions bot commented

2026-06-27 00:45:49 +00:00

🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5)

Verdict: No material issues found · ⚠️ 2/3 lens(es) errored — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found.

All four draft findings were examined against the actual code and either mischaracterized or represent inherent design trade-offs rather than security vulnerabilities:
- Finding 1 (LLM Prompt Injection): The compactor's renderTranscript applies a per-message 2KB cap (transcriptMessageCap) before injecting content into the LLM prompt. This is standard practice for context compaction; prompt injection risk is limited by the truncation and the fact that the output becomes a summary paragraph rather than raw transcript. No material vulnerability.
- Finding 2 (CI Credentials Exposure): The workflow uses ${{ secrets.GADFLY_ENDPOINT_M1 }} syntax, which references Gitea-managed secrets injected only at runtime in CI — these are never exposed to PR reviewers or branch code. Standard and safe secret management.
- Finding 3 (Unbounded User Input): executor.go's input parameter originates from Discord messages (2000 char hard limit) and any model provider enforces context window caps well before memory exhaustion. The claimed "OOM" scenario is not exploitable in this architecture.
- Finding 4 (Budget Commit on Error Paths): Budget.Commit fires with time.Since(started).Seconds() on all paths, but the draft's attack scenario is incorrect: callers cannot "retry infinitely" because Budget.Check gates execution and consumes budget first. The system charges for wall-clock time regardless of success/failure — an attacker would consume their allocated budget rather than gain free compute. No material issue.

🎯 Correctness — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 6: context deadline exceeded

🧯 Error handling & edge cases — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 10: context deadline exceeded

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 28m 57s}

### 🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5) **Verdict: No material issues found · ⚠️ 2/3 lens(es) errored** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> **No material issues found.** - All four draft findings were examined against the actual code and either mischaracterized or represent inherent design trade-offs rather than security vulnerabilities: - **Finding 1 (LLM Prompt Injection):** The compactor's `renderTranscript` applies a per-message 2KB cap (`transcriptMessageCap`) before injecting content into the LLM prompt. This is standard practice for context compaction; prompt injection risk is limited by the truncation and the fact that the output becomes a summary paragraph rather than raw transcript. No material vulnerability. - **Finding 2 (CI Credentials Exposure):** The workflow uses `${{ secrets.GADFLY_ENDPOINT_M1 }}` syntax, which references Gitea-managed secrets injected only at runtime in CI — these are never exposed to PR reviewers or branch code. Standard and safe secret management. - **Finding 3 (Unbounded User Input):** `executor.go`'s `input` parameter originates from Discord messages (2000 char hard limit) and any model provider enforces context window caps well before memory exhaustion. The claimed "OOM" scenario is not exploitable in this architecture. - **Finding 4 (Budget Commit on Error Paths):** Budget.Commit fires with `time.Since(started).Seconds()` on all paths, but the draft's attack scenario is incorrect: callers cannot "retry infinitely" because Budget.Check gates execution and consumes budget first. The system charges for wall-clock time regardless of success/failure — an attacker would consume their allocated budget rather than gain free compute. No material issue. </details> <details><summary>🎯 Correctness — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 6: context deadline exceeded </details> <details><summary>🧯 Error handling & edge cases — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 10: context deadline exceeded </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 28m 57s

steve added 1 commit 2026-06-27 01:42:47 +00:00

fix: address verified gadfly P2 findings (9 real of 18)

executus CI / test (pull_request) Successful in 1m0s

Details

69c2eb5f47

Independently verified all 18 gadfly findings against the code (18-agent
fan-out). Fixed the 9 real ones; the other 9 were false-positive /
hallucinated / valid-tradeoff (no change).

High:
- F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the
  agent loop and nil-panicked. Now a clean error (Run never panics). +test.
- F9 compactor data-leak: renderTranscript sent tool-call args verbatim to
  the summarizer (a possibly-different provider/tier); secret-bearing tool
  args (mcp_call/email_send/http_*/webhook_*) are now redacted, with a doc
  note that result bodies still flow (summary needs them).

Medium/minor:
- F2 compactor error path returned the folded slice, not the original msgs
  (contradicting the documented non-fatal contract) -> return msgs.
- F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) /
  cancelled (Canceled) via statusFor. +test.
- F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs
  min(calls,results) only.
- F5 SetIteration was never called -> RunState.Iteration always 0; the step
  observer now updates it each loop.
- F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue).
- F7 estimateTokens had no default arm (future Part kinds counted as 0);
  unknown parts now counted conservatively.
- F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now
  a clear "response exceeded N bytes" via readCapped.
- F12 step observer captured the caller ctx; now the merged runCtx.
- F13 compaction onFire was nil (doc claimed it logged); now wired to
  audit LogEvent("compaction_fired").
- F11 (no pre-dispatch hook in majordomo) documented honestly as a known
  limitation; F18 UsageSink doc clarified cache tokens are subsets of input.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve added 1 commit 2026-06-27 02:01:08 +00:00

ci(gadfly): cloud-only fleet (3 models, drop local Macs)

executus CI / test (pull_request) Successful in 58s

Details

84e84f9785

Measured on the P2 review: the local Macs (m1/m5) took 26–29 min with lens
timeouts and found ZERO real bugs, while the two cloud models found every
genuine finding in 6–12 min. Drop the Macs; add glm-5.2:cloud as a third
cloud reviewer. Net: faster (~29→~12 min) and higher signal.

Models: minimax-m3:cloud, deepseek-v4-flash:cloud, glm-5.2:cloud
(ollama-cloud=3 concurrency). timeout-minutes 90→30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve merged commit b35514dfaa into main

2026-06-27 02:02:21 +00:00

steve deleted branch phase-2-run-kernel

2026-06-27 02:02:21 +00:00

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: steve/executus#2

P2: run kernel + run.Ports inversion — executus is runnable #2

The inversion

What's here

Verification

Follow-ups (incremental, not blocking)

🪰 Gadfly review — m1/qwen3:14b (m1)

Blocking issues found

1. compact/compactor.go - Inaccurate token estimation heuristic

2. run/steps.go - Null byte vulnerability in correlation key

3. run/executor.go - Missing error handling in compaction

4. compact/compactor.go - Summary truncation risk

5. run/executor.go - Inconsistent error propagation

🪰 Gadfly review — deepseek-v4-flash:cloud (ollama-cloud)

Corrected Review

🔒 Security Review — Verdict: Minor issues

Finding 1: Context compactor sends unfiltered conversation data to a summarizer LLM (potential data leakage)

No other material security issues found

🎯 Correctness — Verdict: Blocking issues found

1. SetIteration never called — self-status tool always reports iteration 0

2. toolStart called from post-step observer, not pre-dispatch — step timing is wrong

3. toolStart called with empty name/args when results outnumber tool calls

4. estimateTokens comment/image discrepancy (minor, not blocking)

🧯 Error handling & edge cases — Verdict: Minor issues

Findings

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

🪰 Gadfly review — m5/qwen3.6:35b-mlx (m5)

🪰 Gadfly review — `m1/qwen3:14b` (m1)

1. `compact/compactor.go` - Inaccurate token estimation heuristic

2. `run/steps.go` - Null byte vulnerability in correlation key

3. `run/executor.go` - Missing error handling in compaction

4. `compact/compactor.go` - Summary truncation risk

5. `run/executor.go` - Inconsistent error propagation

🪰 Gadfly review — `deepseek-v4-flash:cloud` (ollama-cloud)

1. `SetIteration` never called — self-status tool always reports iteration 0

2. `toolStart` called from post-step observer, not pre-dispatch — step timing is wrong

3. `toolStart` called with empty name/args when results outnumber tool calls

4. `estimateTokens` comment/image discrepancy (minor, not blocking)

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5)