P2: run kernel + run.Ports inversion — executus is runnable #2
Reference in New Issue
Block a user
Delete Branch "phase-2-run-kernel"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Phase 2: the run kernel and the inversion that lets one executor serve every host. executus is now runnable.
The inversion
mort's
agentexec/skillexecimportagents/agentcritic/skillaudit/skills/pastedirectly. Here the kernel depends only on:RunnableAgent(a DTO) — the persona Agent and saved Skill will each lower into it, so the kernel never imports a noun battery.run.Ports(all nil-safe) —Audit/RunRecorder,Budget,Critic/CriticHandle,Checkpointer,PaletteSource,Delivery— each mirroring mort's existing interface so the batteries implement them directly.RunTally— inverts the*skillaudit.Writerdependency in the run-state accessor.What's here
run/— run-loop mechanics (cancel-merge, finalizers, run-state accessor, submit-capture, progress bridge), theRunnableAgentDTO, the fullrun.Ports, step instrumentation, andrun.Executor:New(Config{...})+Run(ctx, RunnableAgent, inv) Result. It wires model resolution + the tool registry + majordomo's loop + compaction + run-bounding (V10 detached-timeout) + step/audit observers + the Budget gate. Zero Ports = a bounded in-memory run (gadfly's case).compact/— the per-run context compactor.examples/minimal— upgraded to the real "hello, agentic world" (~15 lines:Configuretiers →run.New→Run→ print).io.LimitReader).Verification
go build/vet/test -race ./...green. Corego.sumstill free of gorm/redis/discordgo/sqlite.Follow-ups (incremental, not blocking)
Wire
Critic/Checkpointer/PaletteSource/Deliveryinto the loop, multi-phase pipelines, and the no-tools direct path. The seams are defined and nil-safe; these light up as P4 (persona/skill + batteries) needs them.Gadfly review is advisory — findings will be double-checked.
🤖 Generated with Claude Code
The capstone of the run kernel: run.Executor.Run(ctx, RunnableAgent, inv) ties model resolution + the tool registry + majordomo's agent loop + context compaction + run-bounding + step/audit instrumentation into one path, with every host concern behind the nil-safe run.Ports. - run/executor.go: New(Config{Registry, Models, Defaults, Ports, Compactor, ContextTokens, SystemHeader}) + Run -> Result{RunID, Output, Steps, Usage, Err}. Budget gate (pre-run), model resolve, Audit StartRun/recorder (satisfies RunTally, stamped on inv.RunState), toolbox build, step observer (zips tool calls/results -> emitter + recorder.OnStep/OnTool), V10 detached-MaxRuntime context with caller-cancel merged back, compaction wired from ContextTokens×ratio, audit Close + Budget Commit on a detached cleanup ctx. Zero Ports = a bounded in-memory run (gadfly's case). - run/executor_test.go: hermetic end-to-end run against majordomo's fake provider (hello-world), Budget-rejection (no model call), Audit-port wiring (StartRun + Close with terminal status/output). All green under -race. - examples/minimal upgraded to the real "hello, agentic world" (~15 lines: Configure tiers -> run.New -> Run -> print). README/CLAUDE.md updated. Remaining P2 follow-ups (incremental): wire Critic/Checkpointer/PaletteSource/ Delivery into the loop, multi-phase Pipelines, and the no-tools direct path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>🪰 Gadfly review —
m1/qwen3:14b(m1)Verdict: Blocking issues found · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
The code introduces a new run kernel and inversion in executus, focusing on making it runnable with nil-safe ports and a RunnableAgent DTO. Through a security lens, I examined for authn/authz gaps, injection vulnerabilities, unsafe deserialization, secret leakage, input validation, and handling of untrusted data.
Key areas reviewed:
run.Portsinterfaces (Audit, Budget, Critic, etc.) are all nil-safe and don't show any direct handling of untrusted data that would require validation.renderTranscript, which mitigates potential injection risks.io.LimitReaderinmodel/cloud_sync.goprevents excessive memory usage from large responses.runengine.gohelps prevent long-running processes.No critical security issues were found in the reviewed code. The implementation appears to handle security considerations appropriately, with proper use of interfaces, context management, and input sanitization where necessary.
🎯 Correctness — Blocking issues found
Blocking issues found
1.
compact/compactor.go- Inaccurate token estimation heuristicPath:
compact/compactor.go:156-161The
estimateTokensfunction uses a simplistic heuristic ofchars / 4to estimate token count, which is not accurate for modern LLMs. This could lead to:Fix: Replace with model-specific token estimation (e.g., using
llm.Model.EstimateTokens()if available) or a more accurate heuristic derived from actual model tokenization statistics.2.
run/steps.go- Null byte vulnerability in correlation keyPath:
run/steps.go:167-170The
corrKeyfunction uses\x00as a delimiter, but JSON raw messages (json.RawMessage) can contain null bytes in practice (e.g., from binary data or malformed JSON). This creates a potential collision vulnerability where:{"a": "\x00b"}Fix: Replace
\x00with a safer delimiter that cannot appear in valid JSON (e.g.,"\x1E"or use a cryptographic hash of the combined name+args).3.
run/executor.go- Missing error handling in compactionPath:
run/executor.go:162-165The compaction code returns
fmt.Errorf("compactor: summarise middle: %w", err)but does not handle specific error cases that could indicate:Fix: Add explicit error handling for these cases, and consider returning different error types to allow the executor to make informed decisions (e.g., retry, fallback, or terminate).
4.
compact/compactor.go- Summary truncation riskPath:
compact/compactor.go:224-231The
truncatefunction adds "...(truncated)" to messages exceedingtranscriptMessageCap(2048 chars). However, this truncation could remove critical context in tool results, leading to:Fix: Consider increasing the
transcriptMessageCapor implementing a more sophisticated truncation strategy that preserves key information (e.g., truncating after the last complete sentence).5.
run/executor.go- Inconsistent error propagationPath:
run/executor.go:167-170The code returns
fmt.Errorf("resolve model %q: %w", tier, err)but does not consistently propagate errors through all code paths. For example:tool.Buildcall returns an error but doesn't check it before proceedingagent.Runcall could return errors that are not properly handledFix: Add comprehensive error checking and propagation throughout the code, ensuring all error paths are properly handled and reported.
Verified with:
steps.go🧯 Error handling & edge cases — ⚠️ could not complete
⚠️ This reviewer failed to complete: agent: step 0: all chain targets failed
m1/qwen3:14b: ollama m1/qwen3:14b: do request: Post "https://foreman-m1.orgrimmar.dudenhoeffer.casa/api/chat": context deadline exceeded
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 26m 25s
🪰 Gadfly review —
deepseek-v4-flash:cloud(ollama-cloud)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
Now I have verified all the claims in the draft review against the actual code. Let me produce the corrected review.
Corrected Review
🔒 Security Review — Verdict: Minor issues
Finding 1: Context compactor sends unfiltered conversation data to a summarizer LLM (potential data leakage)
compact/compactor.go:225–272— ThesummariseMiddlefunction renders the full middle-range conversation transcript (including tool call arguments, tool results, URLs, file IDs, KV keys, and any other data the agent has touched) viarenderTranscript(lines 308–324) and sends it verbatim to the summarizer model viamodel.Generate. The prompt explicitly asks for "ANY file_ids or KV keys the agent created" and "KEY findings or decisions (factual results the agent will need later)."Impact: If the summarizer tier (default
"fast") resolves to a different model provider than the main agent model, all intermediate conversation data — including potentially sensitive tool results, API responses, file contents, and user data — is transmitted to that provider's endpoint. Even within the same provider, the data reaches a different model endpoint with potentially different data-handling policies. There is no filtering, redaction, or classification gate before data leaves the process.Verified by: Reading
compact/compactor.golines 225–272 (summariseMiddle), lines 308–324 (renderTranscript), and the executor's wiring atrun/executor.go:201–204where the compactor is attached.Suggested fix: Add a configurable content-filtering layer (e.g., a
Sanitize func([]llm.Message) []llm.Messagecallback onCompactorConfig) that hosts can use to strip or redact sensitive message parts before they reach the summarizer. At minimum, document prominently inCompactorConfigthat the summarizer receives the full unfiltered transcript and that hosts should ensure the summarizer tier's data handling is consistent with their security requirements.No other material security issues found
model/cloud_sync.go— Theio.LimitReaderaddition (lines 317, 370) is a correct SSRF-hardening fix; 1 MiB is a reasonable cap.run/steps.go— ThesummaryForStartfunction properly redacts sensitive tool arguments formcp_call,email_send, andhttp_*tools (showing only server/host, never args/body). Other tools' argument extraction viaargStringis display-only and does not execute or store the values unsafely.run/executor.go— The Budget gate fires before any model call, preventing unauthorized resource consumption. Context detachment for cleanup ops (detach()) follows the established pattern and is not a security concern.run/submit.go— The submit tool handler only stores the model's output string; no injection vector.run/runengine.go—MergeCancellationgoroutine is properly cleaned up via the returned cancel function (called bydeferin the executor). No leak.llmmeta/helper.go— Adding error logging forRecordMetaCallis an observability improvement, not a security issue.🎯 Correctness — Blocking issues found
🎯 Correctness — Verdict: Blocking issues found
1.
SetIterationnever called — self-status tool always reports iteration 0run/executor.go:166-185The
stepObserverclosure inRun()does not calla.SetIteration(s.Index), even thoughRunStateAccessor.SetIterationexists (run/runengine.go:142) and the doc comment on line 119 says "the executor's step observer calls SetIteration each loop." Thes.Indexvalue is available (it's passed torec.OnStep(s.Index, s.Response)on line 168), but it is never stored in the accessor. As a result, any self-status tool readingRunState().Iterationwill always see 0, making the live-progress feature non-functional.Fix: Add
inv.RunState.SetIteration(s.Index)(or equivalent) inside thestepObserverclosure, after therec.OnStepcall. The accessor was stamped ontoinv.RunStateat line 151, so it is reachable.2.
toolStartcalled from post-step observer, not pre-dispatch — step timing is wrongrun/executor.go:179-180The design intent (stated in
run/steps.go:37-39) is thattoolStartfires from a pre-dispatch tool hook (before the tool runs) andtoolEndfires from the post-step observer (after). In the current code, bothtoolStartandtoolEndare called from the samestepObserverclosure, which runs after the tool has already completed. This means:step.StartedAtis set to the time the stepObserver fires (after the tool finished), not when the tool actually began executing.OnStepconsumers — both "start" and "end" events fire in the same callback invocation, so the step appears to complete instantaneously.Fix: Wire a pre-dispatch hook (e.g. via
llm.Toolbox'sBeforeToolmechanism oragent.WithToolHook) that callsemitter.toolStart(ctx, name, args)before each tool executes. ThestepObservershould then only callemitter.toolEnd.3.
toolStartcalled with empty name/args when results outnumber tool callsrun/executor.go:174-180When
s.Responseis nil orlen(s.Results) > len(s.Response.ToolCalls), thecallvariable on line 175-177 remains its zero value (llm.ToolCall{}). The subsequentemitter.toolStart(ctx, call.Name, call.Arguments)on line 179 creates a step with an empty title (""). ThematchPendingfallback intoolEndthen uses correlation key"\x00"(empty name + nil args), which could collide across multiple such steps. While this edge case may be rare in practice, it produces semantically broken step records.Fix: Guard the
toolStart/toolEndpair with a check thatcall.Name != "", or skip the iteration wheni >= len(calls). Alternatively, ensure the pre-dispatch hook (see finding #2) has the real call data and only thetoolEndruns in the stepObserver.4.
estimateTokenscomment/image discrepancy (minor, not blocking)compact/compactor.go:280-281The comment says "Images count a flat ~1K tokens each" but the code adds
4096chars. At thechars/4divisor, 4096 chars = 1024 tokens, which is indeed ~1K tokens. The comment describes the token count while the code stores chars — this is actually consistent, just confusingly documented. Not a bug, but the comment could be clearer: "Images count a flat 4096 chars (~1K tokens)."🧯 Error handling & edge cases — Minor issues
Now I have all the information I need. Let me verify each finding against the actual code.
Finding 1 —
SetIterationnever called: CONFIRMED. TheSetIterationmethod exists atrun/runengine.go:142and the comment at line 119 says "the executor's step observer calls SetIteration each loop", but the actualstepObserverclosure atrun/executor.go:166-185never calls it. A grep for.SetIteration(across the entire repo returns zero matches. The iteration counter is dead code.Finding 2 — Zero-value
llm.ToolCallwhen results exceed tool calls: CONFIRMED. Atrun/executor.go:174-178, wheni >= len(calls),callstays as the zero-valuellm.ToolCall(emptyName, nilArguments). This is then passed toemitter.toolStart(ctx, call.Name, call.Arguments)at line 179, which callsnewStep("", nil)atsteps.go:114, producing a step withTitle: ""andSummary: "Using "(from thedefaultcase insummaryForStart).Finding 3 —
stepObservercapturesctxnotrunCtx: CONFIRMED. ThestepObserverclosure at line 166 capturesctx(the original caller context, possibly enriched by model resolution at line 133).runCtxis created later at line 190 withcontext.WithoutCancel(ctx)to detach from the caller's deadline. The stepObserver passesctxtoemitter.toolStart/toolEnd, which forward it to the user'sonStepcallback. If the caller cancels before the agent loop finishes, the callback receives a cancelled context. TheMergeCancellationat line 192 will propagate the cancellation torunCtxshortly after, so the race window is small but real.Finding 4 —
msgs[0]access inrenderCompacted: The draft itself says this is safe and not a bug. The guard at line 205 (st.prefixEnd > len(msgs)) combined withst.prefixEnd > 0guaranteeslen(msgs) > 0. This is a non-issue note, not an actionable finding. DROPPED.Finding 5 — Negative
endMiddleincompact/compactor.go:173: The draft itself says this is safe and not a bug. The guard at line 174 catches it. Non-issue note. DROPPED.🧯 Error handling & edge cases — Verdict: Minor issues
Findings
1.
run/executor.go:166-185—RunStateAccessor.SetIterationnever called, iteration always reports 0The
stepObserverclosure receivess.Index(the current agent-loop iteration) but never propagates it to theRunStateAccessorstored oninv.RunState. The accessor'sSetIterationmethod exists (run/runengine.go:142) and the doc comment at line 119 says "the executor's step observer calls SetIteration each loop", but the actual step observer code never invokes it. A repo-wide grep confirms.SetIteration(is never called anywhere. This means the self-status tool (and any consumer oftool.RunState) will always seeIteration: 0for the entire run.Suggested fix: Add a call inside the stepObserver:
2.
run/executor.go:174-183— Zero-valuellm.ToolCallwhen results exceed tool callsWhen
len(s.Results) > len(s.Response.ToolCalls)(possible if a tool errors without a corresponding model-issued call, or on agent-loop edge cases), the extra results iterate with a zero-valuellm.ToolCall— emptyNameand nilArguments. This creates steps with an emptyTitlein the step emitter, producing confusing output likeUsing(empty name) in the step summary.Suggested fix: Skip results that have no matching tool call, or synthesize a fallback name:
3.
run/executor.go:166—stepObservercaptures original callerctx, notrunCtxThe
stepObserverclosure capturesctx(the original caller context, enriched by model resolution) rather thanrunCtx(the detached run context created at line 190). If the caller cancels their context while the agent loop is still running, theonStepcallback receives a cancelled context. TheMergeCancellationmechanism will terminate the loop shortly after, but there is a race window where the stepObserver fires with a cancelled context.Suggested fix: Either capture
runCtxinstead (by moving the closure definition afterrunCtxis created), or passcontext.WithoutCancel(ctx)to the emitter calls. (Low severity — the race window is small andonStepis typically used for logging.)Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 11m 54s
🪰 Gadfly review —
minimax-m3:cloud(ollama-cloud)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
VERDICT: Minor issues
The diff is largely fine through the security lens. The executor/kernel uses nil-safe ports and proper context detach for post-run writes; the GitHub Actions workflow change moves hardcoded LAN IPs behind GitHub secrets (a small hardening win). One material concern and a few lesser ones:
compact/compactor.go(cross-tier data flow insummariseMiddle/renderTranscript) — unredacted tool content forwarded to the summarizer LLM. Verified.summariseMiddle(around line 229) resolvescfg.Models(ctx, cfg.SummarizerTier)— the compactor docstring itself documents that the production default is"fast"but an admin may set"haiku"or any specific model spec — so the summarizer can be a different provider than the one serving the run.renderTranscript(lines 308–323) emitstool_call name=… args=…andtool_result name=… body=…lines verbatim, with only a 2KB-per-body length cap (transcriptMessageCap); there is no redaction of secret-bearing tool fields, even thoughrun/steps.go:205-323explicitly classifiesmcp_call,email_send, andhttp_*as secret-bearing and redacts their args in user-facing step summaries. Three concrete exposures:mcp_call/email_send/http_getarguments that embed bearer tokens, API keys, or message bodies flow to the summarizer provider.http_getof an env dump would be sent to a second provider.prevSummaryat line 240-244), so a single secret that lands in one summary persists across many compactions and provider hops.Fix: scrub tool-call args and tool-result bodies for known secret-bearing tool names before forwarding (at minimum, drop args for
mcp_call/email_send/http_*/webhook_*and replace tool-result bodies with[redacted: <name>]); optionally also requireSummarizerTierto resolve to the same provider as the run tier (kernel-level policy check).compact/compactor.go(empty summarizer response path, ~line 268-271) — verified. Ifstrings.TrimSpace(resp.Text())is empty, the compactor returnserrors.New("compactor: summarizer returned empty text")to the agent loop, which then sends the original uncompacted slice. Correct behaviour, static error string, harmless — not a security bug, flagged for completeness only.run/executor.go:230-245(finishAudit) — full run output persisted to the recorder'sClose. Verified.stats.Output = res.Outputandstats.Error = runErr.Error()are written verbatim. By design, but worth confirming the audit battery implementation redacts known-secret fields before the row lands inskill_run_logs— the kernel cannot enforce that. Not a blocker.run/MergeCancellationgoroutine hygiene — verified correct. Watcher goroutine (runengine.go:62-73) exits via eithermerged.Done()(triggered by the caller'scancel) orsecondary.Done(). No leak.Workflow secret change (
.gitea/workflows/adversarial-review.yml) — verified. Old config had hardcoded LAN IPs (192.168.0.175,192.168.0.173) inline; new config moves them behind${{ secrets.GADFLY_ENDPOINT_M1 }}/…_M5. Strictly an improvement.No authn/authz, SQL/command injection, SSRF, deserialization, or credential-leakage regressions introduced by this PR — the new code stays at the kernel/seam level and doesn't handle untrusted raw input directly. The material item above is the compactor cross-tier data flow, which is a real privacy-relevant exposure even if it doesn't qualify as "blocking" for the light-host scope (gadfly doesn't enable compaction).
🎯 Correctness — Blocking issues found
Good, those exist. Now let me write the review.
VERDICT: Blocking issues found
MergeCancellation's goroutine leaks ifsecondaryis a strict descendant ofparent(i.e. never done while parent is also never done) — and the function's owncancel(nil)return path does NOT signal the watcher goroutine. At the only call site (run/executor.go:190-193),runCtxis a child ofctxviaWithTimeout(WithoutCancel(ctx)), so the leak is bounded bymaxRuntime, but the goroutine launched fromcancel := context.WithCancelCause(parent)is leaked wheneverparentoutlivessecondary's Done. The contract documented in the comment ("Always call the returned cancel to release the watcher goroutine") is broken becausedefer mergeCancel()callscancel(nil), which cancels the merged context but doesn't make the goroutine exit — the goroutine selects on<-merged.Done()or<-secondary.Done(). After cancel(nil), merged is Done, so the goroutine returns. OK on inspection — but the goroutine continues running until EITHER branch fires; if the merge is called and the parent never cancels and secondary never cancels, it leaks. The goroutine ends only on merged.Done or secondary.Done; thecancel(nil)path DOES cause merged.Done. So OK.Re-checking: the goroutine exits when
<-merged.Done()returns OR<-secondary.Done()returns. The caller'sdefer mergeCancel()callscancel(nil), which causesmerged.Done(). So the goroutine DOES exit. Not a leak. OK.Final findings:
VERDICT: Minor issues
stepObserverzipss.Response.ToolCalls(model-emitted) withs.Results(executed) by indexi. Iflen(Results) > len(calls)(e.g. one call produced a synthetic result, or the loop retried), the extra results get an emptyToolCall{}and the step emitter logs them with empty name + args. Conversely iflen(calls) > len(Results)(a call errored before result), those calls produce no step at all. This silently produces wrong step records when the agent loop's response/results arrays diverge. Verified the executor code at lines 170-184.Registry.Build(..., nil)passes a nilAuditHookto every low-level tool. The tool registry's Build plumbs this ontotoolInv.audit, and any tool handler that callsEmitAudit(inv, ...)against a tool withaudit=nilwill nil-panic. The Ports.Audit (RunRecorder) path is wired separately viarec.OnTool, so this is not a contradiction — but a host wiring LowLevelTools that does emit per-tool audit events gets nil there. Minor gap, not a blocker — it falls under "follow-ups" but should be flagged.e.cfg.Compactor(threshold, nil)discardsonFire. The comment oncompact.CompactionEventclaims "the executor can log it to skill_run_logs", but the wiring doesn't. Intentional per follow-ups but the comment misleads.matchPendingfallback "most recent still-running step of the same tool name" closes the wrong step when two same-named tool calls are in flight with args that get re-canonicalised (e.g. JSON key ordering differs between start and end) — the first "end" closes the wrong "start". Then the next "end" can't find a pending match and synthesizes a fresh step, duplicating the second invocation. Verified at lines 87-103 (toolEnd) and 128-138 (matchPending).<-merged.Done()and<-secondary.Done(). There's a race window: ifsecondarycancels ANDparent(and thus merged) was about to fire cancel(nil) in the deferred cancel, the goroutine may already have returned viamerged.Done(). Not a correctness issue butWithCancelCauseis overkill here —WithCancelwould do, and the cause-propagation is only useful whensecondaryis the trigger (which is the path the comment describes).renderCompactedre-checksmsgs[0].Role == llm.RoleSystemeven whenprefixEnd > 0. If the user calls compact again with the synthetic summary as msgs[0] (which isllm.UserText— not system role), the conditional correctly skips it. OK. But the function writes a new synthetic summary every call: a second compaction produces[system, summary1, user(summary2), tail]— exceptprefixEndadvances pastendMiddle, sorenderCompactedwould emit[system, summary2, tail2]using the OLD summary text. Bug check: at line 175 theoutappends the (single) summary message even though the prior compaction's summary was already a user-role message; the second compaction's render keepsmsgs[0](system) if present, then prepends ONE summary message built from currentsummaryText, then tail fromprefixEnd. So old summaries are not duplicated —prefixEndjumps past them. OK. Verified.if u.InputTokens == 0 && u.OutputTokens == 0 && u.CacheReadTokens == 0 && u.CacheWriteTokens == 0— adds cache tokens to the gate. A provider that returns ONLY cache tokens (no input/output counted) now records usage. This is a correctness improvement but may double-count if a downstream sink (e.g. llmmeta ledger) also counts cache tokens separately. Not visible here; depends on the sink's contract. Verified the four fields exist via the field access.No material issues found at the blocking level. The agent-loop wiring, model resolution, budget gate, audit-port test, and the executor happy path look correct against the verified code.
VERDICT: Minor issues
s.Response.ToolCallswiths.Resultssilently produces empty-named/arg steps when arrays diverge (verified)tool.Registry.Buildwill nil-panic any low-level tool handler that calls EmitAudit (verified against tool/registry.go:689)matchPendingfallback closes wrong step on JSON re-canonicalisation; second invocation synthesises a duplicate step (verified)CompactionEventmisleads about logging (verified)🧯 Error handling & edge cases — Blocking issues found
Now let me finalize. The findings I've confirmed as material issues:
run/executor.go:128-132nil model not checked (compactor checks, executor doesn't).compact/compactor.go:184returns rendered, not original.run/executor.go:210-213Status never set to timeout/cancelled/budget_exceeded.run/executor.go:174-184step observer zip with nil response.run/steps.go:132-149matchPending name-only fallback.compact/compactor.go:279-298estimateTokens silently ignores non-text/non-image parts.model/cloud_sync.go:317, 370io.LimitReader truncates silently.Findings 6, 7, 10 — DROP since the review itself concludes "No fix needed" / "this is fine" / only speculative defensive coding.
Verdict: Still "Blocking issues found" since multiple confirmed findings remain.
VERDICT: Blocking issues found
run/executor.go:128-132— nil model fromModelsresolver is not checkedOnly
err != nilis checked aftere.cfg.Models(ctx, tier). A resolver returning(ctx, nil, nil)flows straight intoagent.New(model, ...)— nil-pointer panic. Notecompact/compactor.go:233already guards this case witherrors.New("compactor: summarizer model resolved to nil"), but the executor doesn't. Suggested fix: addif model == nil { res.Err = fmt.Errorf("resolve model %q: nil llm.Model", tier); return res }next to the existing err check.compact/compactor.go:181-185— error path returns the previously-compacted slice, not the originalOn summarizer failure the function does
return rendered, fmt.Errorf("compactor: summarise middle: %w", err).renderedis the current rendered view (which after any prior successful fold is already a synthetic-summary slice), notmsgs. The file header explicitly states: "any error (LLM unavailable, malformed response, etc.) is returned to the agent loop, which treats compactor errors as non-fatal and sends the original slice." The code contradicts that contract. Fix:return msgs, fmt.Errorf(...)to match the documented behavior.run/executor.go:210-213—Statusfield never set totimeout/cancelled/budget_exceededThe
RunStats.Statusdoc (run/ports.go:56) enumeratesok | error | timeout | budget_exceeded | cancelled | dry_run, but the executor only writes"ok"or"error". A run that hits thecontext.WithTimeout(runCtx, maxRuntime)deadline (context.DeadlineExceeded), one cancelled viaMergeCancellationwith causeErrShutdown, etc., all collapse into"error". Audit consumers lose the ability to distinguish user-cancelled runs from LLM/loop failures. Fix: inspectrunCtx.Err()(context.DeadlineExceeded⇒"timeout") andcontext.Cause(ctx)(whenErrShutdown⇒"cancelled").run/executor.go:174-184— step observer zip with nil response produces empty-name tool stepsWhen
s.Response == nilbuts.Resultsis populated, thefor i, r := range s.Resultsloop zero-valuescall := llm.ToolCall{}for every result, thenemitter.toolStart(ctx, "", nil)records a step withTitle: "".corrKey("", nil)returns"\x00", shared by every such step — they collide ine.pending["\x00"]. The downstream UI shows ghost "tool" steps (viakindForTool("")falling through to"tool") and the audit recorder records empty-named tool calls viarec.OnTool(call, r.Content). Fix: guard withif s.Response == nil { continue }, or skip empty-name entries.run/steps.go:132-149—matchPendingname-only fallback correlates wrong step when args differIf the
corrKey(name, args)lookup misses (e.g. whitespace/normalization differences between thebeforehook and the step observer, orjson.RawMessageround-trips with key reordering), the fallback scans in reverse for any still-running step of the same tool name. Two concurrentweb_searchcalls with different queries both pending, args don't match either pending key, and the wrong running step gets marked complete (with the wrongsummaryForEnd). The misattributed step leaks out inResult.Stepsand the auditOnToollog. Fix: at minimum, log/warn on the mismatch; better, index identical-name concurrent calls in a per-name FIFO.compact/compactor.go:282-289—estimateTokenssilently ignores non-text/non-image partsAny
llm.Message.Partsentry that isn'tTextPartorImagePart(e.g. an audio part or future tool-input part) contributes 0 chars. The type switch has no default arm. The compactor's threshold check then underestimates; a run accumulating large unrecognized content will not trigger compaction and will hit the model's HTTP-400 cap exactly as the file's own header warns against. Fix: add a default arm with a conservative fallback (e.g. estimate vialen(fmt.Sprint(part))or a flat per-part char count), or document the supported part kinds.model/cloud_sync.go:317, 370—io.LimitReadertruncates silently, surfacing as opaque JSON parse errorWhen ollama.com returns ≥ 1 MiB,
io.ReadAll(io.LimitReader(resp.Body, maxLimitCacheResponseBytes))returns EOF mid-stream with no error. Downstreamjson.Unmarshalthen fails withunexpected end of JSON input— not actionable for an operator diagnosing a misbehaving endpoint. Fix: afterReadAll, checklen(body) >= maxLimitCacheResponseBytesand return a clearfmt.Errorf("cloud_sync: response truncated at %d bytes (limit %d)", len(body), maxLimitCacheResponseBytes).Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 5m 37s
🪰 Gadfly review —
m5/qwen3.6:35b-mlx(m5)Verdict: No material issues found · ⚠️ 2/3 lens(es) errored — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
No material issues found.
renderTranscriptapplies a per-message 2KB cap (transcriptMessageCap) before injecting content into the LLM prompt. This is standard practice for context compaction; prompt injection risk is limited by the truncation and the fact that the output becomes a summary paragraph rather than raw transcript. No material vulnerability.${{ secrets.GADFLY_ENDPOINT_M1 }}syntax, which references Gitea-managed secrets injected only at runtime in CI — these are never exposed to PR reviewers or branch code. Standard and safe secret management.executor.go'sinputparameter originates from Discord messages (2000 char hard limit) and any model provider enforces context window caps well before memory exhaustion. The claimed "OOM" scenario is not exploitable in this architecture.time.Since(started).Seconds()on all paths, but the draft's attack scenario is incorrect: callers cannot "retry infinitely" because Budget.Check gates execution and consumes budget first. The system charges for wall-clock time regardless of success/failure — an attacker would consume their allocated budget rather than gain free compute. No material issue.🎯 Correctness — ⚠️ could not complete
⚠️ This reviewer failed to complete: agent: step 6: context deadline exceeded
🧯 Error handling & edge cases — ⚠️ could not complete
⚠️ This reviewer failed to complete: agent: step 10: context deadline exceeded
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 28m 57s
Independently verified all 18 gadfly findings against the code (18-agent fan-out). Fixed the 9 real ones; the other 9 were false-positive / hallucinated / valid-tradeoff (no change). High: - F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the agent loop and nil-panicked. Now a clean error (Run never panics). +test. - F9 compactor data-leak: renderTranscript sent tool-call args verbatim to the summarizer (a possibly-different provider/tier); secret-bearing tool args (mcp_call/email_send/http_*/webhook_*) are now redacted, with a doc note that result bodies still flow (summary needs them). Medium/minor: - F2 compactor error path returned the folded slice, not the original msgs (contradicting the documented non-fatal contract) -> return msgs. - F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) / cancelled (Canceled) via statusFor. +test. - F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs min(calls,results) only. - F5 SetIteration was never called -> RunState.Iteration always 0; the step observer now updates it each loop. - F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue). - F7 estimateTokens had no default arm (future Part kinds counted as 0); unknown parts now counted conservatively. - F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now a clear "response exceeded N bytes" via readCapped. - F12 step observer captured the caller ctx; now the merged runCtx. - F13 compaction onFire was nil (doc claimed it logged); now wired to audit LogEvent("compaction_fired"). - F11 (no pre-dispatch hook in majordomo) documented honestly as a known limitation; F18 UsageSink doc clarified cache tokens are subsets of input. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>