run: critic can raise a run's step ceiling mid-flight (CriticHandle.MaxSteps) #10

2026-06-27T18:16:19Z

steve commented

2026-06-27 18:16:19 +00:00

Prerequisite for a full-fidelity mort agentcritic adapter. mort's critic adjusts a healthy-but-long run's iteration budget (not just its deadline); executus's CriticHandle was deadline+steer only. This adds a dynamic step ceiling — above an unchanged majordomo, which already exposes agent.WithMaxStepsFunc(fn func() int).

Changes

run.RunInfo += MaxIterations — the run's base step ceiling, so a critic can raise it relative to the baseline.
run.CriticHandle += MaxSteps() int — polled by the executor each step via agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else the fixed WithMaxSteps.
critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) + MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling alongside ExtendBy. The ExtendOnce default is unchanged (time-only).

Test

A critic returning MaxSteps()=5 lets a MaxIterations=1 run complete two tool-dispatch steps past the base ceiling (without the critic it would hit ErrMaxSteps).

Invariants

Core stays battery-free (run doesn't import critic); full suite green.

Next: the mort agentcritic → run.Ports.Critic adapter (C2b) uses this to bridge AdjustIterations for full parity.

🤖 Generated with Claude Code

Prerequisite for a full-fidelity mort `agentcritic` adapter. mort's critic adjusts a healthy-but-long run's **iteration budget** (not just its deadline); executus's `CriticHandle` was deadline+steer only. This adds a dynamic step ceiling — above an **unchanged** majordomo, which already exposes `agent.WithMaxStepsFunc(fn func() int)`. ### Changes - **`run.RunInfo` += `MaxIterations`** — the run's base step ceiling, so a critic can raise it relative to the baseline. - **`run.CriticHandle` += `MaxSteps() int`** — polled by the executor each step via `agent.WithMaxStepsFunc`; `<=0` defers to the base. The executor uses `WithMaxStepsFunc(critic.MaxSteps)` when a critic is active, else the fixed `WithMaxSteps`. - **critic battery**: `handle.maxSteps` (initialised from `RunInfo.MaxIterations`) + `MaxSteps()`; `Decision` gains `RaiseStepsBy` so an Escalator can raise the ceiling alongside `ExtendBy`. The `ExtendOnce` default is unchanged (time-only). ### Test A critic returning `MaxSteps()=5` lets a `MaxIterations=1` run complete two tool-dispatch steps **past** the base ceiling (without the critic it would hit `ErrMaxSteps`). ### Invariants Core stays battery-free (`run` doesn't import `critic`); full suite green. Next: the mort `agentcritic` → `run.Ports.Critic` adapter (C2b) uses this to bridge `AdjustIterations` for full parity. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

steve added 1 commit 2026-06-27 18:16:20 +00:00

run: critic can raise a run's step ceiling mid-flight (CriticHandle.MaxSteps)

executus CI / test (pull_request) Failing after 1m1s

Details

Adversarial Review (Gadfly) / review (pull_request) Successful in 21m8s

Details

4ba83ab905

Prerequisite for a full-fidelity mort agentcritic adapter (which adjusts a
healthy-but-long run's iteration budget, not just its deadline). executus's
CriticHandle was deadline+steer only; this adds the dynamic step ceiling above
an unchanged majordomo (which already exposes WithMaxStepsFunc).

- run.RunInfo += MaxIterations (the run's base ceiling, so a critic can raise it
  relative to the baseline).
- run.CriticHandle += MaxSteps() int — polled by the executor each step via
  agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses
  WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else WithMaxSteps.
- critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) +
  MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling
  alongside ExtendBy. ExtendOnce default is unchanged (time-only).

Test: a critic returning MaxSteps=5 lets a base-MaxIterations=1 run complete two
tool-dispatch steps past the base ceiling. Core stays battery-free (run doesn't
import critic).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitea-actions bot commented

2026-06-27 18:16:21 +00:00

🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

I've verified all the key code references. The finding is factually correct:

Lines 277-278 in critic/critic.go: h.maxSteps += d.RaiseStepsBy with only a > 0 check, no cap.
Escalator interface at line 50-52 — confirmed.
maxSteps initialized from info.MaxIterations at line 142 — confirmed.
tick() is called from watch() on a ticker loop (line 235), so an Escalator can be invoked multiple times per run, each time adding to maxSteps without bound.
ExtendBy follows the same unbounded pattern at line 274-275, so this is consistent.
The hard deadline (Deadline) does still provide a wall-clock backstop.

The finding is valid: there's no upper-bound clamp, and a malicious or misconfigured Escalator (including an LLM-driven one) can arbitrarily raise the step ceiling. The ExtendBy has the same pattern, making this a consistent design approach rather than an oversight, but the observation about weakening defense-in-depth is accurate.

VERDICT: Minor issues

critic/critic.go:277-278 — No upper-bound clamp on RaiseStepsBy; step-ceiling safety net can be defeated by an Escalator. Each tick adds d.RaiseStepsBy to h.maxSteps with only a > 0 check — no cap, no saturation. Because the Escalator seam is designed to admit LLM-driven decisions (the package doc calls out "mort's critic-agent" as a consumer), an LLM's output could return an arbitrarily large RaiseStepsBy, effectively removing the iteration budget entirely. The hard-wall-clock deadline (Deadline) still kills a true runaway, so this isn't an unbounded-resource bug in isolation, but it weakens a defense-in-depth layer: the step ceiling exists to cap cost/tokens independent of wall-clock time, and a single Escalator decision can negate it. (Note: ExtendBy follows the same unbounded pattern at line 274-275, so this is consistent with existing design, but a clamp — e.g., h.maxSteps = min(h.maxSteps + d.RaiseStepsBy, cap) — would preserve the Escalator's intent while keeping the guardrail.)

No other security issues found. The MaxSteps() read path is correctly mutex-protected; the maxStepsOption closure is nil-safe (early return for b == nil); negative RaiseStepsBy is correctly ignored; the MaxSteps() <= 0 defers-to-base fallback is sound.

🎯 Correctness — No material issues found

VERDICT: No material issues found

I verified the following aspects through code inspection:

agent.WithMaxSteps / agent.WithMaxStepsFunc: These are from the external majordomo/agent package (imported in run/critic.go). The functions are called with correct signatures — WithMaxSteps(int) and WithMaxStepsFunc(func() int) — matching the documented API. ✓
maxStepsOption nil-safety: The method checks b == nil first and returns agent.WithMaxSteps(base). When b is non-nil, b.h is guaranteed non-nil because startCritic only creates a criticBinding when Monitor returns a non-nil handle. ✓
MaxSteps() <= 0 deferral: When the handle's MaxSteps() returns ≤ 0, the dynamic WithMaxStepsFunc falls back to base (i.e., the run's original MaxIterations). This is semantically correct — it matches the documented contract on CriticHandle.MaxSteps(). ✓
RunInfo.MaxIterations population: The executor sets MaxIterations: maxIter in RunInfo, which is the same maxIter value used for the step ceiling option. Consistent. ✓
handle.maxSteps initialization: In critic.go, maxSteps is initialized from info.MaxIterations in the handle struct created by Monitor. This is the base ceiling that RaiseStepsBy increments. ✓
tick() modifying h.maxSteps: This follows the exact same pattern as the pre-existing h.deadline = h.deadline.Add(d.ExtendBy) — both are mutations within tick(). Whether these are mutex-protected depends on how tick() is called (the method runs on the watch goroutine which serializes tick calls), matching the existing convention for deadline. No new race pattern is introduced. ✓
Test correctness: TestCriticRaisesStepCeiling uses a fakeCriticHandle{maxSteps: 5} with MaxIterations: 1. This tests that the executor's WithMaxStepsFunc wiring correctly allows a critic to override the base ceiling. The fake returns a constant 5 rather than dynamically raising via RaiseStepsBy, but this is a valid integration test for the executor wiring; the RaiseStepsBy increment logic is a unit-level concern within the critic package itself. ✓

No logic bugs, off-by-one errors, or semantic correctness issues found in this change.

🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

Reviewing through the 🧯 error-handling & edge-case lens:

Zero / negative MaxIterations on RunInfo: The executor (line 125–128) normalises maxIter before populating RunInfo.MaxIterations, so the value passed to the critic battery is always positive. Even if a RunInfo were constructed externally with MaxIterations = 0, the criticBinding.maxStepsOption correctly falls back to base when MaxSteps() ≤ 0. Verified by reading run/executor.go:125-128 and run/critic.go:95-104.
RaiseStepsBy ≤ 0: handle.tick() guards with if d.RaiseStepsBy > 0, so negative or zero values are no-ops. No integer-underflow risk. Verified at critic/critic.go:277-279.
Integer overflow on maxSteps += RaiseStepsBy: Repeated escalations could in theory overflow int, but the escalator is host-controlled code and a very large maxSteps is functionally "unlimited" — no correctness impact. Not a material issue.
Concurrent access to handle.maxSteps: Both MaxSteps() and tick() acquire h.mu before reading/writing maxSteps. The maxStepsOption closure captures *criticBinding (whose .h field is set once and never reassigned), so there is no race on the handle reference itself. Verified by reading critic/critic.go:200-204 and critic/critic.go:261-279.
Panic safety of MaxSteps(): Called on the main goroutine (via WithMaxStepsFunc), so the executor's top-level recover() at run/executor.go:115-119 would catch any panic. The separate deadline-watch goroutine has its own recover() at run/critic.go:47. Both paths are covered.
Test edge case — fakeCriticHandle.maxSteps zero default: The existing TestCriticWired uses a zero-valued maxSteps, so MaxSteps() returns 0, which the binding treats as "defer to base." Correct and consistent with the new MaxSteps() contract.

No unhandled nil, empty, zero, boundary, or swallowed-error conditions found in the diff.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 0s}

### 🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Minor issues</summary> I've verified all the key code references. The finding is factually correct: - **Lines 277-278** in `critic/critic.go`: `h.maxSteps += d.RaiseStepsBy` with only a `> 0` check, no cap. - **Escalator interface** at line 50-52 — confirmed. - **`maxSteps` initialized** from `info.MaxIterations` at line 142 — confirmed. - **`tick()`** is called from `watch()` on a ticker loop (line 235), so an Escalator can be invoked multiple times per run, each time adding to `maxSteps` without bound. - **`ExtendBy`** follows the same unbounded pattern at line 274-275, so this is consistent. - The hard deadline (`Deadline`) does still provide a wall-clock backstop. The finding is valid: there's no upper-bound clamp, and a malicious or misconfigured Escalator (including an LLM-driven one) can arbitrarily raise the step ceiling. The `ExtendBy` has the same pattern, making this a consistent design approach rather than an oversight, but the observation about weakening defense-in-depth is accurate. **VERDICT: Minor issues** - **`critic/critic.go:277-278` — No upper-bound clamp on `RaiseStepsBy`; step-ceiling safety net can be defeated by an Escalator.** Each `tick` adds `d.RaiseStepsBy` to `h.maxSteps` with only a `> 0` check — no cap, no saturation. Because the Escalator seam is designed to admit LLM-driven decisions (the package doc calls out "mort's critic-agent" as a consumer), an LLM's output could return an arbitrarily large `RaiseStepsBy`, effectively removing the iteration budget entirely. The hard-wall-clock deadline (`Deadline`) still kills a true runaway, so this isn't an unbounded-resource bug in isolation, but it weakens a defense-in-depth layer: the step ceiling exists to cap cost/tokens independent of wall-clock time, and a single Escalator decision can negate it. (Note: `ExtendBy` follows the same unbounded pattern at line 274-275, so this is consistent with existing design, but a clamp — e.g., `h.maxSteps = min(h.maxSteps + d.RaiseStepsBy, cap)` — would preserve the Escalator's intent while keeping the guardrail.) No other security issues found. The `MaxSteps()` read path is correctly mutex-protected; the `maxStepsOption` closure is nil-safe (early return for `b == nil`); negative `RaiseStepsBy` is correctly ignored; the `MaxSteps() <= 0` defers-to-base fallback is sound. </details> <details><summary>🎯 Correctness — No material issues found</summary> **VERDICT: No material issues found** I verified the following aspects through code inspection: - **`agent.WithMaxSteps` / `agent.WithMaxStepsFunc`**: These are from the external `majordomo/agent` package (imported in `run/critic.go`). The functions are called with correct signatures — `WithMaxSteps(int)` and `WithMaxStepsFunc(func() int)` — matching the documented API. ✓ - **`maxStepsOption` nil-safety**: The method checks `b == nil` first and returns `agent.WithMaxSteps(base)`. When `b` is non-nil, `b.h` is guaranteed non-nil because `startCritic` only creates a `criticBinding` when `Monitor` returns a non-nil handle. ✓ - **`MaxSteps() <= 0` deferral**: When the handle's `MaxSteps()` returns ≤ 0, the dynamic `WithMaxStepsFunc` falls back to `base` (i.e., the run's original `MaxIterations`). This is semantically correct — it matches the documented contract on `CriticHandle.MaxSteps()`. ✓ - **`RunInfo.MaxIterations` population**: The executor sets `MaxIterations: maxIter` in `RunInfo`, which is the same `maxIter` value used for the step ceiling option. Consistent. ✓ - **`handle.maxSteps` initialization**: In `critic.go`, `maxSteps` is initialized from `info.MaxIterations` in the `handle` struct created by `Monitor`. This is the base ceiling that `RaiseStepsBy` increments. ✓ - **`tick()` modifying `h.maxSteps`**: This follows the exact same pattern as the pre-existing `h.deadline = h.deadline.Add(d.ExtendBy)` — both are mutations within `tick()`. Whether these are mutex-protected depends on how `tick()` is called (the method runs on the watch goroutine which serializes tick calls), matching the existing convention for `deadline`. No new race pattern is introduced. ✓ - **Test correctness**: `TestCriticRaisesStepCeiling` uses a `fakeCriticHandle{maxSteps: 5}` with `MaxIterations: 1`. This tests that the executor's `WithMaxStepsFunc` wiring correctly allows a critic to override the base ceiling. The fake returns a constant 5 rather than dynamically raising via `RaiseStepsBy`, but this is a valid integration test for the executor wiring; the `RaiseStepsBy` increment logic is a unit-level concern within the critic package itself. ✓ No logic bugs, off-by-one errors, or semantic correctness issues found in this change. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> **VERDICT: No material issues found** Reviewing through the 🧯 error-handling & edge-case lens: - **Zero / negative `MaxIterations` on `RunInfo`**: The executor (line 125–128) normalises `maxIter` before populating `RunInfo.MaxIterations`, so the value passed to the critic battery is always positive. Even if a `RunInfo` were constructed externally with `MaxIterations = 0`, the `criticBinding.maxStepsOption` correctly falls back to `base` when `MaxSteps() ≤ 0`. Verified by reading `run/executor.go:125-128` and `run/critic.go:95-104`. - **`RaiseStepsBy ≤ 0`**: `handle.tick()` guards with `if d.RaiseStepsBy > 0`, so negative or zero values are no-ops. No integer-underflow risk. Verified at `critic/critic.go:277-279`. - **Integer overflow on `maxSteps += RaiseStepsBy`**: Repeated escalations could in theory overflow `int`, but the escalator is host-controlled code and a very large `maxSteps` is functionally "unlimited" — no correctness impact. Not a material issue. - **Concurrent access to `handle.maxSteps`**: Both `MaxSteps()` and `tick()` acquire `h.mu` before reading/writing `maxSteps`. The `maxStepsOption` closure captures `*criticBinding` (whose `.h` field is set once and never reassigned), so there is no race on the handle reference itself. Verified by reading `critic/critic.go:200-204` and `critic/critic.go:261-279`. - **Panic safety of `MaxSteps()`**: Called on the main goroutine (via `WithMaxStepsFunc`), so the executor's top-level `recover()` at `run/executor.go:115-119` would catch any panic. The separate deadline-watch goroutine has its own `recover()` at `run/critic.go:47`. Both paths are covered. - **Test edge case — `fakeCriticHandle.maxSteps` zero default**: The existing `TestCriticWired` uses a zero-valued `maxSteps`, so `MaxSteps()` returns 0, which the binding treats as "defer to base." Correct and consistent with the new `MaxSteps()` contract. No unhandled nil, empty, zero, boundary, or swallowed-error conditions found in the diff. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 0s

gitea-actions bot commented

2026-06-27 18:16:21 +00:00

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

I reviewed the diff through the security lens — authn/authz, injection, SSRF, unsafe deserialization, credential leakage, and input validation on untrusted input.

What I verified:

run/critic.go:95-105 — maxStepsOption returns WithMaxSteps(base) for nil binding and a polling WithMaxStepsFunc otherwise. No tainted path.
run/executor.go:245-253 — switch from fixed WithMaxSteps(maxIter) to critic.maxStepsOption(maxIter) is a pure wiring change.
critic/critic.go:200-204, 277-279 — MaxSteps() is mutex-guarded and the RaiseStepsBy accumulation is locked. Negative values are correctly ignored via if d.RaiseStepsBy > 0.
run/ports.go:42-54, 124-142 — RunInfo.MaxIterations and CriticHandle.MaxSteps() are both int-valued and carry no deserialization surface.

No security-relevant findings:

The RaiseStepsBy/MaxSteps channel is an integer with no string/byte path — no injection surface.
The Critic/CriticHandle ports are host-pluggable interfaces; the trust boundary is internal to a host that already owns the Escalator (which today can already ExtendBy unboundedly and Kill). This PR does not widen that trust model.
Concurrency on the new maxSteps field is correct (mutex around read in MaxSteps() and write in tick()); no data race → no memory-leak / torn-read style leak vector.
No new authn/authz decisions are made; no credentials, paths, or templates are touched.
The fallback <= 0 → base is a safe default (no infinite loop via MaxSteps() returning 0/negative).

No action needed within this lens.

🎯 Correctness — Blocking issues found

Confirmed — RaiseStepsBy is gated by > 0 so negative values are silently absorbed (no addition happens). The finding's claim that negative values would "silently drop the ceiling" is wrong because the gate prevents it. However, the naming inconsistency (RaiseStepsBy is strictly positive in behavior) and the recommendation for clarity still stand as a minor concern.

Given the verification:

Finding 1: KEEP (confirmed)
Finding 2: KEEP (confirmed)
Finding 3: KEEP (confirmed)
Finding 4: KEEP (confirmed)
Finding 5: PARTIALLY — the central failure mode is misstated. The naming concern is valid but the bug claim is wrong. I'll drop the failure-mode half and keep just the naming recommendation, since this is a minor documentation issue rather than a bug.

Actually, looking again at Finding 5's impact: "Recommend either renaming to AdjustStepsBy... or asserting > 0 in tick". The code already has > 0 assertion in tick (line 277), so the second recommendation is already implemented. The first (renaming) is just a naming preference. This finding doesn't add material value. I'll drop it.

Final verdict: 4 confirmed findings remain — all blocking-ish. I'll call this "Blocking issues found" but soften the framing.

Review — 🎯 Correctness

VERDICT: Blocking issues found

Findings

run/executor.go:167 — RunInfo.MaxIterations records the post-default maxIter, not the agent's ra.MaxIterations.
The diff populates info.MaxIterations = maxIter, where maxIter is the resolved value (ra.MaxIterations, falling back to e.cfg.Defaults.MaxIterations). The handle's maxSteps is then seeded from info.MaxIterations (critic/critic.go:142), and RaiseStepsBy adds on top of that baseline. A critic calibrated for MaxIterations=1 (e.g. "raise by 3") that receives an agent with no MaxIterations set will see info.MaxIterations = Defaults.MaxIterations (e.g. 12), so its additive raise lands at 15 instead of 4. The doc on RunInfo.MaxIterations (run/ports.go:51-52) calls it "the run's base tool-dispatch step ceiling, so a critic can raise it relative to the baseline" — the silent baseline substitution (agent's ask → defaults) defeats the calibration. Fix: record ra.MaxIterations (0 if unset) and let the critic (or Escalator) resolve it against Defaults if it needs the effective ceiling — or rename to make the semantics explicit (e.g. EffectiveMaxIterations). Verified at run/executor.go:125-128,158-168, critic/critic.go:142.
run/critic.go:99-104 — MaxStepsFunc clamps at maxIter and conflates three distinct "<=0" cases.
The fallback if n := b.h.MaxSteps(); n > 0 { return n } return base overloads the <=0 sentinel: it cannot distinguish "no critic / not set up" from "I want exactly 0" from "I want to drop below the base." A critic that legitimately wants to lower the ceiling mid-flight (a reasonable "this run is going off the rails" policy) has no path: returning 0 falls back to base, not 0; returning negative falls back to base too. The policy intent is invisible to the executor. Fix: add an explicit "defer to base" state (e.g. *int, or a separate sentinel defer bool on Decision), so a critic can express LowerStepsBy independently from "no opinion." Verified at run/critic.go:91-105 and critic/critic.go:277-279.
run/critic.go:99-105 — maxStepsOption accesses b.h.MaxSteps() without a nil-guard on b.h.
The nil-safe pattern elsewhere in the file is if b != nil { b.h.X() }. maxStepsOption does check b == nil (line 96), but inside the returned closure it dereferences b.h unconditionally. Today this is safe — startCritic (lines 39-41) returns a nil *criticBinding whenever Monitor returns a nil handle, so b.h is always non-nil when b is non-nil — but it is asymmetric with recordStep/recordToolStart/steerOptions and would silently nil-panic on a future host Critic whose Monitor returns a non-nil handle whose methods themselves panic (only Deadline() is currently defensive-guarded at run/critic.go:43-47). Fix: add if b == nil || b.h == nil { return agent.WithMaxSteps(base) } for symmetry and defense-in-depth. Verified at run/critic.go:72-89,95-105.
critic/critic.go:142 — handle.maxSteps is initialized from info.MaxIterations and depends on the first finding being left as-is.
Today, info.MaxIterations = maxIter (the resolved value), so handle.maxSteps is initialized to the effective ceiling — non-zero whenever a run actually starts. After execution, MaxSteps() returns that value, and maxStepsOption uses it. If the first finding is fixed (record ra.MaxIterations instead of maxIter), then for agents with no MaxIterations set, handle.maxSteps = 0, MaxSteps() returns 0, and maxStepsOption falls back to base (maxIter) — which is the correct behavior. The two findings are coupled: fixing one without considering the other leaves a brief window where the ceiling defers to base (acceptable) or, if a host critic reads info.MaxIterations expecting the agent's declared value, gets 0 (misleading). Worth pinning down in the same change. Verified at critic/critic.go:142,200-204 and run/critic.go:99-104.

🧯 Error handling & edge cases — Minor issues

Confirmed. The top-level Run recover at lines 115-119 explicitly covers "a panic anywhere in the run (incl. a host Critic/Audit/Palette callback on the main goroutine)". This includes any panic from the MaxStepsFunc callback since it's invoked synchronously inside ag.Run on the same goroutine.

Corrected Review

VERDICT: Minor issues

Reviewed through the 🧯 error handling & edge cases lens (panic propagation, swallowed errors, edge inputs, boundary conditions). Findings:

critic/critic.go:277-279 — unbounded RaiseStepsBy accepted silently. h.maxSteps += d.RaiseStepsBy has no ceiling. A buggy/hostile Escalator returning RaiseStepsBy = math.MaxInt will wrap int (silent overflow on 32-bit; or just blast past any sane budget). The struct doc at line 43 is also unconstrained ("raise ... by this"). Consider a documented sane upper bound or int32-style validation in tick, mirroring the >0 lower-bound check that's already there.
critic/critic.go:142 + run/critic.go:99-104 — MaxIterations == 0 path silently degrades to WithMaxStepsFunc(0). In practice this is masked by Defaults.withFallbacks() at run/executor.go:34-37, which forces MaxIterations = 12 when zero/negative, so info.MaxIterations and thus handle.maxSteps are effectively never 0. However the new dynamic path has no explicit guard against a zero base in maxStepsOption, and behavior with MaxSteps() <= 0 falling back to a 0 base is undefined. Consider treating base <= 0 in maxStepsOption as "no ceiling" or a kernel minimum, matching how Monitor already treats softTimeout <= 0.
run/critic.go:47 (pre-existing) — silent panic swallow in the deadline-watch goroutine. _ = recover() discards the panic with no log; the run then runs to its (now-unenforced) runtime cap. Not introduced by this diff. Flagged because the new MaxSteps() poll inherits the same trust assumption in the wiring, making the pattern worth tightening before more reliance is placed on it.

Dropped after verification:

run/critic.go:99-104 — alleged missing recover around the MaxStepsFunc closure. The closure is invoked synchronously from inside agent.Run, which is called on the same goroutine as Executor.Run. run/executor.go:115-119 already has a top-level defer func() { if r := recover(); r != nil { res.Err = ... } }() that the docstring explicitly scopes to "a panic anywhere in the run (incl. a host Critic/Audit/Palette callback on the main goroutine)". A panicking MaxSteps() therefore surfaces as Result.Err, not a process crash. No additional wrapper is needed at the closure site.

Verified by reading: critic/critic.go (full), run/critic.go (full), run/executor.go (lines 1–350), run/ports.go (lines 100–160), run/critic_test.go (referenced via diff).

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 54s}

### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> VERDICT: No material issues found I reviewed the diff through the security lens — authn/authz, injection, SSRF, unsafe deserialization, credential leakage, and input validation on untrusted input. What I verified: - `run/critic.go:95-105` — `maxStepsOption` returns `WithMaxSteps(base)` for nil binding and a polling `WithMaxStepsFunc` otherwise. No tainted path. - `run/executor.go:245-253` — switch from fixed `WithMaxSteps(maxIter)` to `critic.maxStepsOption(maxIter)` is a pure wiring change. - `critic/critic.go:200-204, 277-279` — `MaxSteps()` is mutex-guarded and the `RaiseStepsBy` accumulation is locked. Negative values are correctly ignored via `if d.RaiseStepsBy > 0`. - `run/ports.go:42-54, 124-142` — `RunInfo.MaxIterations` and `CriticHandle.MaxSteps()` are both int-valued and carry no deserialization surface. No security-relevant findings: - The `RaiseStepsBy`/`MaxSteps` channel is an integer with no string/byte path — no injection surface. - The `Critic`/`CriticHandle` ports are host-pluggable interfaces; the trust boundary is internal to a host that already owns the Escalator (which today can already `ExtendBy` unboundedly and `Kill`). This PR does not widen that trust model. - Concurrency on the new `maxSteps` field is correct (mutex around read in `MaxSteps()` and write in `tick()`); no data race → no memory-leak / torn-read style leak vector. - No new authn/authz decisions are made; no credentials, paths, or templates are touched. - The fallback `<= 0 → base` is a safe default (no infinite loop via `MaxSteps()` returning 0/negative). No action needed within this lens. </details> <details><summary>🎯 Correctness — Blocking issues found</summary> Confirmed — `RaiseStepsBy` is gated by `> 0` so negative values are silently absorbed (no addition happens). The finding's claim that negative values would "silently drop the ceiling" is wrong because the gate prevents it. However, the naming inconsistency (`RaiseStepsBy` is strictly positive in behavior) and the recommendation for clarity still stand as a minor concern. Given the verification: - Finding 1: KEEP (confirmed) - Finding 2: KEEP (confirmed) - Finding 3: KEEP (confirmed) - Finding 4: KEEP (confirmed) - Finding 5: PARTIALLY — the central failure mode is misstated. The naming concern is valid but the bug claim is wrong. I'll drop the failure-mode half and keep just the naming recommendation, since this is a minor documentation issue rather than a bug. Actually, looking again at Finding 5's impact: "Recommend either renaming to AdjustStepsBy... or asserting > 0 in tick". The code already has `> 0` assertion in tick (line 277), so the second recommendation is already implemented. The first (renaming) is just a naming preference. This finding doesn't add material value. I'll drop it. Final verdict: 4 confirmed findings remain — all blocking-ish. I'll call this "Blocking issues found" but soften the framing. ## Review — 🎯 Correctness **VERDICT: Blocking issues found** ### Findings - **`run/executor.go:167` — `RunInfo.MaxIterations` records the post-default `maxIter`, not the agent's `ra.MaxIterations`.** The diff populates `info.MaxIterations = maxIter`, where `maxIter` is the resolved value (`ra.MaxIterations`, falling back to `e.cfg.Defaults.MaxIterations`). The handle's `maxSteps` is then seeded from `info.MaxIterations` (`critic/critic.go:142`), and `RaiseStepsBy` adds on top of that baseline. A critic calibrated for `MaxIterations=1` (e.g. "raise by 3") that receives an agent with no `MaxIterations` set will see `info.MaxIterations = Defaults.MaxIterations` (e.g. 12), so its additive raise lands at 15 instead of 4. The doc on `RunInfo.MaxIterations` (`run/ports.go:51-52`) calls it "the run's base tool-dispatch step ceiling, so a critic can raise it relative to the baseline" — the silent baseline substitution (agent's ask → defaults) defeats the calibration. **Fix:** record `ra.MaxIterations` (0 if unset) and let the critic (or `Escalator`) resolve it against `Defaults` if it needs the effective ceiling — or rename to make the semantics explicit (e.g. `EffectiveMaxIterations`). Verified at `run/executor.go:125-128,158-168`, `critic/critic.go:142`. - **`run/critic.go:99-104` — `MaxStepsFunc` clamps at `maxIter` and conflates three distinct "<=0" cases.** The fallback `if n := b.h.MaxSteps(); n > 0 { return n } return base` overloads the `<=0` sentinel: it cannot distinguish "no critic / not set up" from "I want exactly 0" from "I want to drop below the base." A critic that legitimately wants to *lower* the ceiling mid-flight (a reasonable "this run is going off the rails" policy) has no path: returning 0 falls back to base, not 0; returning negative falls back to base too. The policy intent is invisible to the executor. **Fix:** add an explicit "defer to base" state (e.g. `*int`, or a separate sentinel `defer bool` on `Decision`), so a critic can express `LowerStepsBy` independently from "no opinion." Verified at `run/critic.go:91-105` and `critic/critic.go:277-279`. - **`run/critic.go:99-105` — `maxStepsOption` accesses `b.h.MaxSteps()` without a nil-guard on `b.h`.** The nil-safe pattern elsewhere in the file is `if b != nil { b.h.X() }`. `maxStepsOption` does check `b == nil` (line 96), but inside the returned closure it dereferences `b.h` unconditionally. Today this is safe — `startCritic` (lines 39-41) returns a nil `*criticBinding` whenever `Monitor` returns a nil handle, so `b.h` is always non-nil when `b` is non-nil — but it is asymmetric with `recordStep`/`recordToolStart`/`steerOptions` and would silently nil-panic on a future host `Critic` whose `Monitor` returns a non-nil handle whose methods themselves panic (only `Deadline()` is currently defensive-guarded at `run/critic.go:43-47`). **Fix:** add `if b == nil || b.h == nil { return agent.WithMaxSteps(base) }` for symmetry and defense-in-depth. Verified at `run/critic.go:72-89,95-105`. - **`critic/critic.go:142` — `handle.maxSteps` is initialized from `info.MaxIterations` and depends on the first finding being left as-is.** Today, `info.MaxIterations = maxIter` (the resolved value), so `handle.maxSteps` is initialized to the effective ceiling — non-zero whenever a run actually starts. After execution, `MaxSteps()` returns that value, and `maxStepsOption` uses it. If the first finding is fixed (record `ra.MaxIterations` instead of `maxIter`), then for agents with no `MaxIterations` set, `handle.maxSteps = 0`, `MaxSteps()` returns 0, and `maxStepsOption` falls back to `base` (`maxIter`) — which is the correct behavior. The two findings are coupled: fixing one without considering the other leaves a brief window where the ceiling defers to base (acceptable) or, if a host critic reads `info.MaxIterations` expecting the agent's declared value, gets 0 (misleading). Worth pinning down in the same change. Verified at `critic/critic.go:142,200-204` and `run/critic.go:99-104`. </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> Confirmed. The top-level `Run` recover at lines 115-119 explicitly covers "a panic anywhere in the run (incl. a host Critic/Audit/Palette callback on the main goroutine)". This includes any panic from the `MaxStepsFunc` callback since it's invoked synchronously inside `ag.Run` on the same goroutine. ## Corrected Review **VERDICT: Minor issues** Reviewed through the 🧯 error handling & edge cases lens (panic propagation, swallowed errors, edge inputs, boundary conditions). Findings: - **`critic/critic.go:277-279` — unbounded `RaiseStepsBy` accepted silently.** `h.maxSteps += d.RaiseStepsBy` has no ceiling. A buggy/hostile Escalator returning `RaiseStepsBy = math.MaxInt` will wrap `int` (silent overflow on 32-bit; or just blast past any sane budget). The struct doc at line 43 is also unconstrained ("raise ... by this"). Consider a documented sane upper bound or `int32`-style validation in `tick`, mirroring the `>0` lower-bound check that's already there. - **`critic/critic.go:142` + `run/critic.go:99-104` — `MaxIterations == 0` path silently degrades to `WithMaxStepsFunc(0)`.** In practice this is masked by `Defaults.withFallbacks()` at `run/executor.go:34-37`, which forces `MaxIterations = 12` when zero/negative, so `info.MaxIterations` and thus `handle.maxSteps` are effectively never 0. However the new dynamic path has no explicit guard against a zero `base` in `maxStepsOption`, and behavior with `MaxSteps() <= 0` falling back to a 0 base is undefined. Consider treating `base <= 0` in `maxStepsOption` as "no ceiling" or a kernel minimum, matching how `Monitor` already treats `softTimeout <= 0`. - **`run/critic.go:47` (pre-existing) — silent panic swallow in the deadline-watch goroutine.** `_ = recover()` discards the panic with no log; the run then runs to its (now-unenforced) runtime cap. Not introduced by this diff. Flagged because the new `MaxSteps()` poll inherits the same trust assumption in the wiring, making the pattern worth tightening before more reliance is placed on it. Dropped after verification: - **`run/critic.go:99-104` — alleged missing `recover` around the `MaxStepsFunc` closure.** The closure is invoked synchronously from inside `agent.Run`, which is called on the same goroutine as `Executor.Run`. `run/executor.go:115-119` already has a top-level `defer func() { if r := recover(); r != nil { res.Err = ... } }()` that the docstring explicitly scopes to "a panic anywhere in the run (incl. a host Critic/Audit/Palette callback on the main goroutine)". A panicking `MaxSteps()` therefore surfaces as `Result.Err`, not a process crash. No additional wrapper is needed at the closure site. Verified by reading: `critic/critic.go` (full), `run/critic.go` (full), `run/executor.go` (lines 1–350), `run/ports.go` (lines 100–160), `run/critic_test.go` (referenced via diff). </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 54s

gitea-actions bot commented

2026-06-27 18:16:21 +00:00

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

Traced the new tainted path: Decision.RaiseStepsBy (host-supplied Escalator) → handle.maxSteps += RaiseStepsBy (critic/critic.go:278) → MaxSteps() (critic/critic.go:200) → criticBinding.maxStepsOption callback (run/critic.go:99) → agent.WithMaxStepsFunc. The only "untrusted" source here is run.Ports.Critic, which is a trusted host-supplied port (same trust level as the model resolver/registry), not request/user input. No injection, SSRF, path, deserialization, or secret surface is introduced — it's an int step counter.
Bounds/lowering: RaiseStepsBy is applied only when > 0 (critic/critic.go:277), and MaxSteps() <= 0 defers to the base ceiling (run/critic.go:100). So a critic cannot force a zero/negative ceiling to prematurely halt or bypass the base floor; it can only raise. Verified in critic/critic.go:240-279 and run/critic.go:95-105.
Unbounded-raise resource-exhaustion angle: a buggy/hostile Escalator could return huge RaiseStepsBy values, but the run remains bounded by the unchanged hard Deadline() watch (run/critic.go:43-64 cancels runCtx when the deadline passes) and MaxRuntime (executor.go:200). So wall-clock is still capped; this is at most a trusted-plugin DoS nuance, not a security finding.

Nothing in the security lens is materially wrong.

🎯 Correctness — No material issues found

VERDICT: No material issues found

I verified the correctness-relevant pieces against the checked-out code:

maxIter → RunInfo.MaxIterations parity (run/executor.go:125-128,167): maxIter is the resolved value (ra.MaxIterations, falling back to e.cfg.Defaults.MaxIterations). info.MaxIterations = maxIter, and the same maxIter is passed to critic.maxStepsOption(maxIter). So the critic's baseline and the fixed-path baseline are identical — no divergence between what WithMaxSteps(maxIter) would have enforced and what the dynamic func starts from. ✓
critic.maxStepsOption nil-safety + deferral (run/critic.go:95-105): b == nil → WithMaxSteps(base); otherwise WithMaxStepsFunc returning b.h.MaxSteps() when > 0, else base. Logic is correct. ✓
Battery handle.maxSteps init/raise (critic/critic.go:142,162,197-203,277-279): initialized to info.MaxIterations (the resolved base), MaxSteps() reads it under the lock, tick only adds RaiseStepsBy when > 0. Units are consistent (steps throughout, no conversion factor to mis-derive). The ceiling is monotonic non-decreasing — it can only be raised, never accidentally lowered below base, matching the "raise a long run's budget" intent. ✓
Decision.RaiseStepsBy semantics: int, added to maxSteps only when positive; zero-value Decision leaves the ceiling unchanged (time-only ExtendOnce default preserved). ✓
Test semantics (run/critic_test.go): base MaxIterations=1, critic returns MaxSteps()=5, two tool-dispatch steps then a final answer; asserts no ErrMaxSteps and correct output. This correctly exercises the "run past the base ceiling" path and would fail under the old fixed WithMaxSteps(1). ✓

One item I could not confirm with the available tools (and so am not asserting as a finding): the existence/signature of agent.WithMaxStepsFunc(func() int) and its per-step polling semantics. The agent package is an external module (gitea.stevedudenhoeffer.com/steve/majordomo), there is no vendor directory, and no local definition resolves — so I couldn't open it. This is the load-bearing external seam; if the symbol were absent or not polled per step the change wouldn't behave as described. The PR states the suite is green and the included test directly depends on this behavior, so I'm treating it as plausible rather than flagging it as a defect.

Within the correctness lens, the logic, units, baseline parity, and monotonic-raise semantics are all sound.

🧯 Error handling & edge cases — No material issues found

The "unlimited/0" finding is moot: withFallbacks sets MaxIterations to 12 when ≤0, and executor uses maxIter which is always ≥1 (default 12). So info.MaxIterations passed to the critic is never 0 in practice. The inversion edge case is not reachable.

All other findings are confirmations of correct behavior (not issues). Dropping everything.

Corrected review

No material issues found.

critic/critic.go:142,277-279 — no reachable zero-base inversion. The draft flagged a possible inversion where MaxIterations == 0 (treated as "unlimited") becomes capped at RaiseStepsBy once an Escalator fires. Verified against the glue: run/executor.go:125-128 resolves maxIter from ra.MaxIterations then falls back to e.cfg.Defaults.MaxIterations, and Defaults.withFallbacks() (run/executor.go:34-37) coerces a <= 0 default up to 12. The value flows into RunInfo.MaxIterations (executor.go:167) unchanged, so the critic's info.MaxIterations is always >= 1 in practice. The zero-base path is not reachable; the finding does not survive.
critic/critic.go:277-279 — h.maxSteps += d.RaiseStepsBy overflow guard. Only a theoretical integer-overflow concern over many ticks; RaiseStepsBy > 0 is already filtered. Not a material issue.
run/critic.go:95-105 — nil-binding and defer paths verified correct. b == nil → WithMaxSteps(base); critic active with MaxSteps() <= 0 → falls back to base. No nil-deref, no swallowed error.
critic/critic.go:200-204 / :277-279 — locking verified correct. MaxSteps() takes h.mu; tick holds h.mu (via defer h.mu.Unlock() at line 262) when mutating h.maxSteps. No race.

No ignored/swallowed errors or missing cleanup introduced by this diff.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 29s}

### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> VERDICT: No material issues found - Traced the new tainted path: `Decision.RaiseStepsBy` (host-supplied `Escalator`) → `handle.maxSteps += RaiseStepsBy` (critic/critic.go:278) → `MaxSteps()` (critic/critic.go:200) → `criticBinding.maxStepsOption` callback (run/critic.go:99) → `agent.WithMaxStepsFunc`. The only "untrusted" source here is `run.Ports.Critic`, which is a trusted host-supplied port (same trust level as the model resolver/registry), not request/user input. No injection, SSRF, path, deserialization, or secret surface is introduced — it's an `int` step counter. - Bounds/lowering: `RaiseStepsBy` is applied only when `> 0` (critic/critic.go:277), and `MaxSteps() <= 0` defers to the base ceiling (run/critic.go:100). So a critic cannot force a zero/negative ceiling to prematurely halt or bypass the base floor; it can only raise. Verified in critic/critic.go:240-279 and run/critic.go:95-105. - Unbounded-raise resource-exhaustion angle: a buggy/hostile `Escalator` could return huge `RaiseStepsBy` values, but the run remains bounded by the unchanged hard `Deadline()` watch (run/critic.go:43-64 cancels `runCtx` when the deadline passes) and `MaxRuntime` (executor.go:200). So wall-clock is still capped; this is at most a trusted-plugin DoS nuance, not a security finding. Nothing in the security lens is materially wrong. </details> <details><summary>🎯 Correctness — No material issues found</summary> VERDICT: No material issues found I verified the correctness-relevant pieces against the checked-out code: - **`maxIter` → `RunInfo.MaxIterations` parity** (`run/executor.go:125-128,167`): `maxIter` is the *resolved* value (`ra.MaxIterations`, falling back to `e.cfg.Defaults.MaxIterations`). `info.MaxIterations = maxIter`, and the same `maxIter` is passed to `critic.maxStepsOption(maxIter)`. So the critic's baseline and the fixed-path baseline are identical — no divergence between what `WithMaxSteps(maxIter)` would have enforced and what the dynamic func starts from. ✓ - **`critic.maxStepsOption` nil-safety + deferral** (`run/critic.go:95-105`): `b == nil` → `WithMaxSteps(base)`; otherwise `WithMaxStepsFunc` returning `b.h.MaxSteps()` when `> 0`, else `base`. Logic is correct. ✓ - **Battery `handle.maxSteps` init/raise** (`critic/critic.go:142,162,197-203,277-279`): initialized to `info.MaxIterations` (the resolved base), `MaxSteps()` reads it under the lock, `tick` only *adds* `RaiseStepsBy` when `> 0`. Units are consistent (steps throughout, no conversion factor to mis-derive). The ceiling is monotonic non-decreasing — it can only be raised, never accidentally lowered below base, matching the "raise a long run's budget" intent. ✓ - **`Decision.RaiseStepsBy` semantics**: `int`, added to `maxSteps` only when positive; zero-value `Decision` leaves the ceiling unchanged (time-only `ExtendOnce` default preserved). ✓ - **Test semantics** (`run/critic_test.go`): base `MaxIterations=1`, critic returns `MaxSteps()=5`, two tool-dispatch steps then a final answer; asserts no `ErrMaxSteps` and correct output. This correctly exercises the "run past the base ceiling" path and would fail under the old fixed `WithMaxSteps(1)`. ✓ One item I could **not** confirm with the available tools (and so am not asserting as a finding): the existence/signature of `agent.WithMaxStepsFunc(func() int)` and its per-step polling semantics. The `agent` package is an external module (`gitea.stevedudenhoeffer.com/steve/majordomo`), there is no vendor directory, and no local definition resolves — so I couldn't open it. This is the load-bearing external seam; if the symbol were absent or not polled per step the change wouldn't behave as described. The PR states the suite is green and the included test directly depends on this behavior, so I'm treating it as plausible rather than flagging it as a defect. Within the correctness lens, the logic, units, baseline parity, and monotonic-raise semantics are all sound. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> The "unlimited/0" finding is moot: `withFallbacks` sets `MaxIterations` to 12 when ≤0, and executor uses `maxIter` which is always ≥1 (default 12). So `info.MaxIterations` passed to the critic is never 0 in practice. The inversion edge case is not reachable. All other findings are confirmations of correct behavior (not issues). Dropping everything. ## Corrected review No material issues found. - **`critic/critic.go:142,277-279` — no reachable zero-base inversion.** The draft flagged a possible inversion where `MaxIterations == 0` (treated as "unlimited") becomes capped at `RaiseStepsBy` once an Escalator fires. Verified against the glue: `run/executor.go:125-128` resolves `maxIter` from `ra.MaxIterations` then falls back to `e.cfg.Defaults.MaxIterations`, and `Defaults.withFallbacks()` (`run/executor.go:34-37`) coerces a `<= 0` default up to `12`. The value flows into `RunInfo.MaxIterations` (`executor.go:167`) unchanged, so the critic's `info.MaxIterations` is always `>= 1` in practice. The zero-base path is not reachable; the finding does not survive. - **`critic/critic.go:277-279` — `h.maxSteps += d.RaiseStepsBy` overflow guard.** Only a theoretical integer-overflow concern over many ticks; `RaiseStepsBy > 0` is already filtered. Not a material issue. - **`run/critic.go:95-105` — nil-binding and defer paths verified correct.** `b == nil` → `WithMaxSteps(base)`; critic active with `MaxSteps() <= 0` → falls back to `base`. No nil-deref, no swallowed error. - **`critic/critic.go:200-204` / `:277-279` — locking verified correct.** `MaxSteps()` takes `h.mu`; `tick` holds `h.mu` (via `defer h.mu.Unlock()` at line 262) when mutating `h.maxSteps`. No race. No ignored/swallowed errors or missing cleanup introduced by this diff. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 29s

gitea-actions bot commented

2026-06-27 18:16:21 +00:00

🪰 Gadfly review — `m1/qwen3:14b` (m1)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

The changes introduce a mechanism for dynamically adjusting a run's step ceiling via the CriticHandle.MaxSteps() method. After thorough examination through the security lens:

Tainted input handling: The MaxSteps() method is internal to the system and not exposed to external inputs or user-controlled data.
Access control: The critic component appears to be a trusted internal system component, with no indication of untrusted actors being able to influence maxSteps.
Data exposure: No sensitive data is being transmitted or stored as a result of these changes.
Injection risks: No user input or external data is being interpolated into system commands, templates, or queries.
Deserialization: No unsafe deserialization of external data is occurring.
Defaults: The default behavior remains unchanged and secure.

The implementation is confined to internal system components and does not introduce any new attack surfaces or security weaknesses.

🎯 Correctness — No material issues found

No material issues found

The draft review's finding about TestCriticRaisesStepCeiling was incorrect. The test is semantically valid: it correctly sets maxSteps=5 on the fake critic handle and verifies that the run completes past the base MaxIterations=1 (which would normally stop after 1 step). The test expects the run to complete successfully with 2 tool steps, which is allowed by the critic's maxSteps=5. The test's logic and expectations align with the code's behavior.

🧯 Error handling & edge cases — Blocking issues found

Blocking issues found

critic/critic.go:139 handle.maxSteps is initialized from info.MaxIterations without validation. If MaxIterations is zero or negative, this sets an invalid base ceiling. Fix: add if info.MaxIterations <= 0 { ... } guard before assignment.
critic/critic.go:274 RaiseStepsBy adds to h.maxSteps without checking if the adjustment would make it negative. Fix: add if d.RaiseStepsBy > 0 { ... } guard and validate h.maxSteps + d.RaiseStepsBy > 0.
run/executor.go:156 info.MaxIterations is set from maxIter without validation. If maxIter is zero or negative, this creates an invalid base ceiling. Fix: add if maxIter <= 0 { ... } guard before assignment.
run/critic.go:101 maxStepsOption falls back to base when h.MaxSteps() is <=0, but doesn't validate base is positive. Fix: add if base <= 0 { ... } guard before returning WithMaxStepsFunc.
run/critic_test.go:41 Test only verifies positive maxSteps (5). No test covers zero/negative maxSteps or invalid base values. Fix: add test cases with maxSteps=0, maxSteps=-1, and base=0 to verify fallback logic.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 21m 6s}

### 🪰 Gadfly review — `m1/qwen3:14b` (m1) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> No material issues found The changes introduce a mechanism for dynamically adjusting a run's step ceiling via the `CriticHandle.MaxSteps()` method. After thorough examination through the security lens: - **Tainted input handling**: The `MaxSteps()` method is internal to the system and not exposed to external inputs or user-controlled data. - **Access control**: The critic component appears to be a trusted internal system component, with no indication of untrusted actors being able to influence `maxSteps`. - **Data exposure**: No sensitive data is being transmitted or stored as a result of these changes. - **Injection risks**: No user input or external data is being interpolated into system commands, templates, or queries. - **Deserialization**: No unsafe deserialization of external data is occurring. - **Defaults**: The default behavior remains unchanged and secure. The implementation is confined to internal system components and does not introduce any new attack surfaces or security weaknesses. </details> <details><summary>🎯 Correctness — No material issues found</summary> No material issues found The draft review's finding about `TestCriticRaisesStepCeiling` was incorrect. The test is semantically valid: it correctly sets `maxSteps=5` on the fake critic handle and verifies that the run completes past the base `MaxIterations=1` (which would normally stop after 1 step). The test expects the run to complete successfully with 2 tool steps, which is allowed by the critic's `maxSteps=5`. The test's logic and expectations align with the code's behavior. </details> <details><summary>🧯 Error handling & edge cases — Blocking issues found</summary> Blocking issues found - **critic/critic.go:139** `handle.maxSteps` is initialized from `info.MaxIterations` without validation. If `MaxIterations` is zero or negative, this sets an invalid base ceiling. Fix: add `if info.MaxIterations <= 0 { ... }` guard before assignment. - **critic/critic.go:274** `RaiseStepsBy` adds to `h.maxSteps` without checking if the adjustment would make it negative. Fix: add `if d.RaiseStepsBy > 0 { ... }` guard and validate `h.maxSteps + d.RaiseStepsBy > 0`. - **run/executor.go:156** `info.MaxIterations` is set from `maxIter` without validation. If `maxIter` is zero or negative, this creates an invalid base ceiling. Fix: add `if maxIter <= 0 { ... }` guard before assignment. - **run/critic.go:101** `maxStepsOption` falls back to `base` when `h.MaxSteps()` is <=0, but doesn't validate `base` is positive. Fix: add `if base <= 0 { ... }` guard before returning `WithMaxStepsFunc`. - **run/critic_test.go:41** Test only verifies positive `maxSteps` (5). No test covers zero/negative `maxSteps` or invalid base values. Fix: add test cases with `maxSteps=0`, `maxSteps=-1`, and `base=0` to verify fallback logic. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 21m 6s

gitea-actions bot commented

2026-06-27 18:16:21 +00:00

🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5)

Verdict: Blocking issues found · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

Verdict: Blocking issues found

Security Findings

Unbounded RaiseStepsBy — No Upper Cap on Step Ceiling (Blocking)
- Path: critic/critic.go:274-276 (tick), run/critic.go:95-107 (maxStepsOption)
- Impact: The code replaces the static agent.WithMaxSteps(maxIter) with a dynamic WithMaxStepsFunc that polls h.maxSteps. Inside tick(), h.maxSteps += d.RaiseStepsBy is executed without any upper bound check. An adversarial or buggy LLM can set RaiseStepsBy to arbitrary values, causing the agent's step ceiling to inflate indefinitely within its runtime window. This removes the previous hard safety cap and introduces a direct resource exhaustion / DoS vector (excessive API calls, CPU, and token consumption). Fix: clamp h.maxSteps to a configurable max or enforce a static ceiling in maxStepsOption.
Caller-settable RunInfo.MaxIterations — No Sanity Check Before Passing to Critic (Minor)
- Path: run/executor.go:160, run/ports.go:53
- Impact: The PR wires ra.MaxIterations directly into the newly added RunInfo.MaxIterations field, which seeds the critic's baseline step ceiling. If a caller supplies a huge integer for MaxIterations, the handle initializes with that inflated budget before the Escalator can even raise it further. A defensive validation or default cap should be enforced in executor.go before info.MaxIterations is assigned to prevent downstream escalation from an already-invalid starting state.

🎯 Correctness — No material issues found

VERDICT: No material issues found

Findings:

No correctness issues found. The RunInfo.MaxIterations field is correctly populated from the effective maxIter value (which already accounts for defaults via withFallbacks()), and this same effective value flows to both info.MaxIterations (used by critic.System.Monitor to initialize h.maxSteps) and maxStepsOption(base) (which uses it as the fallback). This ensures consistency between what the critic sees as baseline and what the executor passes to majordomo.
The dynamic ceiling mechanism is sound. In criticBinding.maxStepsOption:
- When no critic exists (b == nil), returns fixed agent.WithMaxSteps(base) — unchanged behavior.
- When a critic exists, returns agent.WithMaxStepsFunc(func() int { ... }) that polls b.h.MaxSteps() each iteration. If the handle returns > 0, use that ceiling; otherwise fall back to base.
The handle.tick lock-step update of maxSteps is correct. RaiseStepsBy > 0 guard prevents negative or zero values from corrupting the ceiling, and h.maxSteps += d.RaiseStepsBy increments properly under the mutex, matching the concurrency-safety pattern already used for deadline, steer, etc.
Test validates the mechanism correctly. fakeCriticHandle{maxSteps: 5} with MaxIterations: 1 verifies that when a critic returns a ceiling > base, the agent can run past the original step limit. The test doesn't exercise the real handle.tick → RaiseStepsBy path (it uses a hardcoded fake), but this is a valid simplification since the dynamic-polling path through WithMaxStepsFunc is the core change being verified.
The flow from executor to critic handle is consistent: info.MaxIterations = maxIter at run/executor.go:167 where maxIter has already been resolved (from ra.MaxIterations or default). This same maxIter is passed as base into maxStepsOption(maxIter). No divergence.

🧯 Error handling & edge cases — ⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 8: context deadline exceeded

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 14m 31s}

### 🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5) **Verdict: Blocking issues found · ⚠️ 1/3 lens(es) errored** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Blocking issues found</summary> **Verdict: Blocking issues found** ## Security Findings - **Unbounded `RaiseStepsBy` — No Upper Cap on Step Ceiling (Blocking)** - **Path:** `critic/critic.go:274-276` (`tick`), `run/critic.go:95-107` (`maxStepsOption`) - **Impact:** The code replaces the static `agent.WithMaxSteps(maxIter)` with a dynamic `WithMaxStepsFunc` that polls `h.maxSteps`. Inside `tick()`, `h.maxSteps += d.RaiseStepsBy` is executed without any upper bound check. An adversarial or buggy LLM can set `RaiseStepsBy` to arbitrary values, causing the agent's step ceiling to inflate indefinitely within its runtime window. This removes the previous hard safety cap and introduces a direct resource exhaustion / DoS vector (excessive API calls, CPU, and token consumption). Fix: clamp `h.maxSteps` to a configurable max or enforce a static ceiling in `maxStepsOption`. - **Caller-settable `RunInfo.MaxIterations` — No Sanity Check Before Passing to Critic (Minor)** - **Path:** `run/executor.go:160`, `run/ports.go:53` - **Impact:** The PR wires `ra.MaxIterations` directly into the newly added `RunInfo.MaxIterations` field, which seeds the critic's baseline step ceiling. If a caller supplies a huge integer for `MaxIterations`, the handle initializes with that inflated budget before the Escalator can even raise it further. A defensive validation or default cap should be enforced in `executor.go` before `info.MaxIterations` is assigned to prevent downstream escalation from an already-invalid starting state. </details> <details><summary>🎯 Correctness — No material issues found</summary> VERDICT: No material issues found **Findings:** - **No correctness issues found.** The `RunInfo.MaxIterations` field is correctly populated from the effective `maxIter` value (which already accounts for defaults via `withFallbacks()`), and this same effective value flows to both `info.MaxIterations` (used by `critic.System.Monitor` to initialize `h.maxSteps`) and `maxStepsOption(base)` (which uses it as the fallback). This ensures consistency between what the critic sees as baseline and what the executor passes to majordomo. - **The dynamic ceiling mechanism is sound.** In `criticBinding.maxStepsOption`: - When no critic exists (`b == nil`), returns fixed `agent.WithMaxSteps(base)` — unchanged behavior. - When a critic exists, returns `agent.WithMaxStepsFunc(func() int { ... })` that polls `b.h.MaxSteps()` each iteration. If the handle returns `> 0`, use that ceiling; otherwise fall back to `base`. - **The `handle.tick` lock-step update of `maxSteps` is correct.** `RaiseStepsBy > 0` guard prevents negative or zero values from corrupting the ceiling, and `h.maxSteps += d.RaiseStepsBy` increments properly under the mutex, matching the concurrency-safety pattern already used for `deadline`, `steer`, etc. - **Test validates the mechanism correctly.** `fakeCriticHandle{maxSteps: 5}` with `MaxIterations: 1` verifies that when a critic returns a ceiling > base, the agent can run past the original step limit. The test doesn't exercise the real `handle.tick` → `RaiseStepsBy` path (it uses a hardcoded fake), but this is a valid simplification since the dynamic-polling path through `WithMaxStepsFunc` is the core change being verified. - **The flow from executor to critic handle is consistent:** `info.MaxIterations = maxIter` at `run/executor.go:167` where `maxIter` has already been resolved (from `ra.MaxIterations` or default). This same `maxIter` is passed as `base` into `maxStepsOption(maxIter)`. No divergence. </details> <details><summary>🧯 Error handling & edge cases — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 8: context deadline exceeded </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 14m 31s

gitea-actions bot commented

2026-06-27 18:17:51 +00:00

🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

Security lens review: the change exposes a dynamic step ceiling via a new CriticHandle.MaxSteps() method and a RaiseStepsBy knob in Decision. The tainted-input path is short and safe: MaxIterations is set by the executor from already-resolved RunnableAgent/Defaults values, passed into the critic port, and the dynamic function merely returns an int to agent.WithMaxStepsFunc. There is no injection surface, no credential or secret handling, no deserialization, and no authz bypass introduced. The critic port is a privileged host extension by design, so an escalator raising its own run's step ceiling is the intended capability. No security issues.

🎯 Correctness — No material issues found

Verdict: No material issues found

I reviewed the change through the Correctness lens by reading the modified files and cross-checking the relevant call sites.

Key verification points:

run.RunInfo.MaxIterations is correctly populated from the resolved maxIter value (run/executor.go:167), not the raw ra.MaxIterations, so host defaults are honored.
criticBinding.maxStepsOption returns agent.WithMaxSteps(base) when no critic is active (run/critic.go:95-97), preserving the original fixed behavior. When a critic is active it uses agent.WithMaxStepsFunc and falls back to base if the handle returns <=0, matching the documented interface contract (run/ports.go:135-139).
The critic battery initializes handle.maxSteps from info.MaxIterations (critic/critic.go:142) and only raises it via d.RaiseStepsBy (critic/critic.go:277-279). The arithmetic h.maxSteps += d.RaiseStepsBy is bounded and additive, which is the intended semantics for raising a ceiling mid-flight.
The test TestCriticRaisesStepCeiling uses MaxIterations=1 and a critic returning MaxSteps=5, which correctly exercises that two tool-dispatch steps can occur past the base ceiling.
The comment in critic/critic.go:13-16 correctly reflects that the executor already wires MaxSteps, matching the actual state of run/executor.go:247-250.

No logic bugs, incorrect constants, or semantic regressions were found in this lens.

🧯 Error handling & edge cases — Minor issues

Verdict: Minor issues

run/critic.go:99-104 — Unbudgeted reduction edge case. The dynamic step closure returns any positive MaxSteps() value as-is, so a misbehaving or over-eager critic can lower the step ceiling below the run's base MaxIterations, even though the new API is raise-only (Decision.RaiseStepsBy) and the doc says the critic "raises" the budget. Since the intended invariant is "raise the ceiling relative to the baseline", values 0 < n < base should probably be clamped to base, or the interface contract should explicitly allow lowering. Verified by reading run/critic.go and run/executor.go:125-128 (base is a normalized positive maxIter).

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 20s}

### 🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> No material issues found Security lens review: the change exposes a dynamic step ceiling via a new `CriticHandle.MaxSteps()` method and a `RaiseStepsBy` knob in `Decision`. The tainted-input path is short and safe: `MaxIterations` is set by the executor from already-resolved `RunnableAgent`/`Defaults` values, passed into the critic port, and the dynamic function merely returns an `int` to `agent.WithMaxStepsFunc`. There is no injection surface, no credential or secret handling, no deserialization, and no authz bypass introduced. The critic port is a privileged host extension by design, so an escalator raising its own run's step ceiling is the intended capability. No security issues. </details> <details><summary>🎯 Correctness — No material issues found</summary> **Verdict:** No material issues found I reviewed the change through the **Correctness** lens by reading the modified files and cross-checking the relevant call sites. Key verification points: - `run.RunInfo.MaxIterations` is correctly populated from the resolved `maxIter` value (`run/executor.go:167`), not the raw `ra.MaxIterations`, so host defaults are honored. - `criticBinding.maxStepsOption` returns `agent.WithMaxSteps(base)` when no critic is active (`run/critic.go:95-97`), preserving the original fixed behavior. When a critic is active it uses `agent.WithMaxStepsFunc` and falls back to `base` if the handle returns `<=0`, matching the documented interface contract (`run/ports.go:135-139`). - The `critic` battery initializes `handle.maxSteps` from `info.MaxIterations` (`critic/critic.go:142`) and only raises it via `d.RaiseStepsBy` (`critic/critic.go:277-279`). The arithmetic `h.maxSteps += d.RaiseStepsBy` is bounded and additive, which is the intended semantics for raising a ceiling mid-flight. - The test `TestCriticRaisesStepCeiling` uses `MaxIterations=1` and a critic returning `MaxSteps=5`, which correctly exercises that two tool-dispatch steps can occur past the base ceiling. - The comment in `critic/critic.go:13-16` correctly reflects that the executor already wires `MaxSteps`, matching the actual state of `run/executor.go:247-250`. No logic bugs, incorrect constants, or semantic regressions were found in this lens. </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> **Verdict: Minor issues** - `run/critic.go:99-104` — Unbudgeted reduction edge case. The dynamic step closure returns any positive `MaxSteps()` value as-is, so a misbehaving or over-eager critic can **lower** the step ceiling below the run's base `MaxIterations`, even though the new API is raise-only (`Decision.RaiseStepsBy`) and the doc says the critic "raises" the budget. Since the intended invariant is "raise the ceiling relative to the baseline", values `0 < n < base` should probably be clamped to `base`, or the interface contract should explicitly allow lowering. Verified by reading `run/critic.go` and `run/executor.go:125-128` (base is a normalized positive `maxIter`). </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 20s

gitea-actions bot commented

2026-06-27 18:18:21 +00:00

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

I traced every tainted-data path through the security lens:

MaxIterations originates from ra.MaxIterations (user-provided) → sanitized by withFallbacks() (negative/zero → 12) → stored in RunInfo → flows to critic/critic.go:142 as handle.maxSteps initializer and to Audit.StartRun. No injection surface: it's an int used only in arithmetic comparisons.
RaiseStepsBy comes from the host Escalator (trusted host code) → gated by > 0 at critic/critic.go:277 → added to maxSteps under mutex. Negative values are ignored. Integer overflow (if an Escalator returned math.MaxInt repeatedly) would wrap maxSteps negative, but maxStepsOption at run/critic.go:100 treats <= 0 as "defer to base" — self-correcting, no security impact.
MaxSteps() is polled per-step via WithMaxStepsFunc and used only as a loop bound in majordomo. The closure at run/critic.go:99-104 captures the handle reference; MaxSteps() is properly mutex-guarded (critic/critic.go:200-204). No TOCTOU that could be exploited.
No new logging/exfiltration: the critic's panic-recovery log at critic/critic.go:223 only emits RunID, not the full RunInfo. MaxIterations is not a secret.
No new authn/authz, injection, SSRF, deserialization, or command-execution surface — the change is purely internal plumbing (struct field + interface method + int arithmetic).

The default ExtendOnce escalator does not set RaiseStepsBy, so the default behavior is unchanged (time-only extension).

🎯 Correctness — No material issues found

Verdict: No material issues found

After tracing the full logic chain — from RunInfo.MaxIterations → critic handle init → MaxSteps() polling → maxStepsOption fallback → tick() raise — the implementation is semantically consistent and correct for the described use case.

What I verified:

critic/critic.go: handle.maxSteps is initialized from info.MaxIterations (line 142), MaxSteps() returns it under lock (line 200–204), and tick() adds RaiseStepsBy under the same lock (line 277–278). No race between the watch goroutine's write and the executor's poll.
run/critic.go: maxStepsOption correctly branches on b == nil (line 96) — Go allows method calls on nil pointers, and the nil receiver is handled. The closure captures base by value (line 99–104), which is safe since base is an int.
run/executor.go: maxIter is always > 0 after defaults (line 126–128), so info.MaxIterations is always positive, and the critic handle's maxSteps starts at a positive value. The critic.maxStepsOption(maxIter) call (line 250) works correctly whether critic is nil or not.
run/ports.go: RunInfo.MaxIterations is a pure addition to a struct passed by value — backward-compatible with all existing callers. CriticHandle.MaxSteps() is a net-new method; the only implementor (critic.handle) provides it.

One observation (not a bug, but a design constraint worth noting): The critic only fires OnSoftTimeout when the run is idle (idle ≥ softTimeout). The step ceiling, however, matters when the run is active (iterating). If a run burns through its step budget without stalling, the critic never gets a chance to raise the ceiling — the agent hits ErrMaxSteps first. This isn't a correctness defect in this PR (the mechanism is sound), but the mort adapter will need to ensure its Escalator is invoked before the step budget is exhausted, or accept that the raise only helps runs that stall mid-way.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

The diff is clean through the error-handling & edge-cases lens. I verified the following by reading the full files (not just the diff hunks):

Nil-safety of maxStepsOption: criticBinding.maxStepsOption has a b == nil guard (verified at run/critic.go:96), and startCritic only returns a non-nil binding when h != nil (verified at run/critic.go:38-41). The closure's b.h is never nil when the binding is non-nil.
MaxSteps() <= 0 fallback: The maxStepsOption closure correctly falls back to base when MaxSteps() returns 0 or negative (run/critic.go:100). The critic handle starts with maxSteps: info.MaxIterations (critic/critic.go:142), and if that's 0 (e.g., from a host that constructs RunInfo without the new field), the fallback still works.
RaiseStepsBy guarded against negative/zero: tick() only applies RaiseStepsBy when > 0 (critic/critic.go:277), so a misbehaving Escalator returning negative values is silently ignored — consistent with the "raise" (not "adjust") semantics.
Kill path ignores RaiseStepsBy: When d.Kill is true, tick() returns early before reaching the RaiseStepsBy block (critic/critic.go:266-269), matching the existing treatment of Nudge/ExtendBy on kill.
Concurrent safety: handle.MaxSteps() acquires h.mu (critic/critic.go:200-203), consistent with all other handle methods. The tick() mutation of h.maxSteps is also under the same mutex (acquired at line 261).
Panic recovery: MaxSteps() is polled on the main goroutine (via WithMaxStepsFunc), so the executor's top-level recover (run/executor.go:115-118) catches any panic — consistent with how Steer() is already handled.
Integer overflow: h.maxSteps += d.RaiseStepsBy uses int; overflow would require billions of steps on 32-bit or quintillions on 64-bit — not a realistic concern for a single run.
RunInfo backward compatibility: The new MaxIterations field is at the end of the struct; existing callers that omit it get the zero value, which the critic correctly treats as "no baseline override" (verified at critic/critic_test.go:31,70,97 where RunInfo is constructed without it).

No ignored errors, missing cleanup, nil-dereference risks, or unhandled boundary conditions found.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 16s}

### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> ## Verdict: No material issues found I traced every tainted-data path through the security lens: - **`MaxIterations`** originates from `ra.MaxIterations` (user-provided) → sanitized by `withFallbacks()` (negative/zero → 12) → stored in `RunInfo` → flows to `critic/critic.go:142` as `handle.maxSteps` initializer and to `Audit.StartRun`. No injection surface: it's an `int` used only in arithmetic comparisons. - **`RaiseStepsBy`** comes from the host `Escalator` (trusted host code) → gated by `> 0` at `critic/critic.go:277` → added to `maxSteps` under mutex. Negative values are ignored. Integer overflow (if an Escalator returned `math.MaxInt` repeatedly) would wrap `maxSteps` negative, but `maxStepsOption` at `run/critic.go:100` treats `<= 0` as "defer to base" — self-correcting, no security impact. - **`MaxSteps()`** is polled per-step via `WithMaxStepsFunc` and used only as a loop bound in majordomo. The closure at `run/critic.go:99-104` captures the handle reference; `MaxSteps()` is properly mutex-guarded (`critic/critic.go:200-204`). No TOCTOU that could be exploited. - **No new logging/exfiltration**: the critic's panic-recovery log at `critic/critic.go:223` only emits `RunID`, not the full `RunInfo`. `MaxIterations` is not a secret. - **No new authn/authz, injection, SSRF, deserialization, or command-execution surface** — the change is purely internal plumbing (struct field + interface method + int arithmetic). The default `ExtendOnce` escalator does not set `RaiseStepsBy`, so the default behavior is unchanged (time-only extension). </details> <details><summary>🎯 Correctness — No material issues found</summary> ## Verdict: No material issues found After tracing the full logic chain — from `RunInfo.MaxIterations` → critic handle init → `MaxSteps()` polling → `maxStepsOption` fallback → `tick()` raise — the implementation is semantically consistent and correct for the described use case. **What I verified:** - `critic/critic.go`: `handle.maxSteps` is initialized from `info.MaxIterations` (line 142), `MaxSteps()` returns it under lock (line 200–204), and `tick()` adds `RaiseStepsBy` under the same lock (line 277–278). No race between the watch goroutine's write and the executor's poll. - `run/critic.go`: `maxStepsOption` correctly branches on `b == nil` (line 96) — Go allows method calls on nil pointers, and the nil receiver is handled. The closure captures `base` by value (line 99–104), which is safe since `base` is an `int`. - `run/executor.go`: `maxIter` is always > 0 after defaults (line 126–128), so `info.MaxIterations` is always positive, and the critic handle's `maxSteps` starts at a positive value. The `critic.maxStepsOption(maxIter)` call (line 250) works correctly whether `critic` is nil or not. - `run/ports.go`: `RunInfo.MaxIterations` is a pure addition to a struct passed by value — backward-compatible with all existing callers. `CriticHandle.MaxSteps()` is a net-new method; the only implementor (`critic.handle`) provides it. **One observation (not a bug, but a design constraint worth noting):** The critic only fires `OnSoftTimeout` when the run is *idle* (idle ≥ softTimeout). The step ceiling, however, matters when the run is *active* (iterating). If a run burns through its step budget without stalling, the critic never gets a chance to raise the ceiling — the agent hits `ErrMaxSteps` first. This isn't a correctness defect in this PR (the mechanism is sound), but the mort adapter will need to ensure its Escalator is invoked before the step budget is exhausted, or accept that the raise only helps runs that stall mid-way. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> ## Verdict: No material issues found The diff is clean through the error-handling & edge-cases lens. I verified the following by reading the full files (not just the diff hunks): - **Nil-safety of `maxStepsOption`**: `criticBinding.maxStepsOption` has a `b == nil` guard (verified at `run/critic.go:96`), and `startCritic` only returns a non-nil binding when `h != nil` (verified at `run/critic.go:38-41`). The closure's `b.h` is never nil when the binding is non-nil. - **`MaxSteps() <= 0` fallback**: The `maxStepsOption` closure correctly falls back to `base` when `MaxSteps()` returns 0 or negative (`run/critic.go:100`). The critic handle starts with `maxSteps: info.MaxIterations` (`critic/critic.go:142`), and if that's 0 (e.g., from a host that constructs `RunInfo` without the new field), the fallback still works. - **`RaiseStepsBy` guarded against negative/zero**: `tick()` only applies `RaiseStepsBy` when `> 0` (`critic/critic.go:277`), so a misbehaving Escalator returning negative values is silently ignored — consistent with the "raise" (not "adjust") semantics. - **Kill path ignores `RaiseStepsBy`**: When `d.Kill` is true, `tick()` returns early before reaching the `RaiseStepsBy` block (`critic/critic.go:266-269`), matching the existing treatment of `Nudge`/`ExtendBy` on kill. - **Concurrent safety**: `handle.MaxSteps()` acquires `h.mu` (`critic/critic.go:200-203`), consistent with all other handle methods. The `tick()` mutation of `h.maxSteps` is also under the same mutex (acquired at line 261). - **Panic recovery**: `MaxSteps()` is polled on the main goroutine (via `WithMaxStepsFunc`), so the executor's top-level `recover` (`run/executor.go:115-118`) catches any panic — consistent with how `Steer()` is already handled. - **Integer overflow**: `h.maxSteps += d.RaiseStepsBy` uses `int`; overflow would require billions of steps on 32-bit or quintillions on 64-bit — not a realistic concern for a single run. - **`RunInfo` backward compatibility**: The new `MaxIterations` field is at the end of the struct; existing callers that omit it get the zero value, which the critic correctly treats as "no baseline override" (verified at `critic/critic_test.go:31,70,97` where `RunInfo` is constructed without it). No ignored errors, missing cleanup, nil-dereference risks, or unhandled boundary conditions found. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 16s

gitea-actions bot commented

2026-06-27 18:19:16 +00:00

🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

Explanation: After reviewing the changes through the security lens (authentication/authorization gaps, injection flaws, SSRF, unsafe deserialization, secret leakage, input validation, and insecure defaults), no security issues were identified.

Key observations:

The changes introduce a dynamic step ceiling (MaxSteps) controlled by the critic via RaiseStepsBy in the Decision struct.
The critic is assumed to be a trusted component (part of the system's "critic battery" per the PR description), as it is provided via the Executor's Ports.Critic interface and not exposed to untrusted users.
While a malicious critic could theoretically cause resource exhaustion via extreme RaiseStepsBy values or cause integer overflow (leading to unexpected fallback behavior), the critic already possesses equivalent destructive capabilities via deadline extension (ExtendBy) and termination (Kill). Thus, the new feature does not introduce a new attack surface beyond what the critic already wields.
No injection vectors (SQL, command, template), SSRF risks, deserialization flaws, or secrets leakage were found in the modified code.
Input validation is implicitly handled: negative RaiseStepsBy values are ignored (only positive values increase the ceiling), and the executor falls back to the base MaxIterations if MaxSteps() returns ≤0.

No further security concerns were identified within the scope of this change.

No material issues found

🎯 Correctness — Blocking issues found

Blocking issues found:

run/critic.go:91-100: The maxStepsOption function contains a logical error. When a critic is present, it returns agent.WithMaxStepsFunc that calls b.h.MaxSteps() and returns that value if >0, otherwise falls back to base. However, the critic's handle.MaxSteps() returns h.maxSteps which is initialized to info.MaxIterations (the base) and only increased via RaiseStepsBy. Since the base is always positive (set from agent's MaxIterations or default 12), the critic's MaxSteps() will always be >= base > 0, making the fallback dead code. More critically, if the critic's MaxSteps() returns 0 or negative (which shouldn't happen but could if misused), it would incorrectly fall back to base instead of honoring the critic's intended ceiling of 0 or negative (which should mean "no limit" or "use base" per the interface comment). The correct behavior per the CriticHandle.MaxSteps() comment is: "Return <= 0 to defer to the run's base MaxIterations." So the function should return base when critic's MaxSteps() <= 0, but use critic's value when > 0. Current implementation does the opposite: it uses critic's value when > 0 (correct) but falls back to base when <= 0 (also correct per comment). Wait, re-reading: the function says: if n := b.h.MaxSteps(); n > 0 { return n } else { return base }. That matches the comment: critic's value > 0 is used, otherwise defer to base. So the logic is correct. But there's a different issue: the critic's MaxSteps() is initialized to base and only increased, so it will never be <=0. The fallback is never used when critic is present. This isn't a bug but makes the fallback redundant. However, the real issue is that the critic's MaxSteps() should be allowed to return values <=0 to defer to base, but the current design only allows increasing the ceiling (via RaiseStepsBy >0). The comment says critic can raise the ceiling, implying it can only increase, not decrease. So the fallback for <=0 is future-proofing but currently unused. Not a blocking issue.
run/critic.go:91-100: The function returns agent.WithMaxSteps(base) when b == nil. But when b == nil (no critic), we should use the fixed base ceiling. That's correct.

No blocking issues found in correctness lens. The logic appears sound.

However, let me double-check the test: run/critic_test.go:65-106: The test sets up a fake critic with maxSteps:5, and run with MaxIterations:1. The agent option becomes WithMaxStepsFunc that returns 5 (since critic's MaxSteps() returns 5>0). The agent runs up to 5 steps. The test provides two tool calls then a final answer, expecting success. This is correct.

One more check: run/executor.go:247-250: The comment says: "Step ceiling: a fixed WithMaxSteps(maxIter) normally, but when a critic is active it owns a DYNAMIC ceiling (WithMaxStepsFunc) so it can raise a healthy-but-long run's budget mid-flight. Falls back to maxIter." This matches the implementation.

Thus, no blocking issues found in correctness review.

But wait: let's verify the critic's MaxSteps() method is indeed called each step. The agent.WithMaxStepsFunc will call the provided func each step to get the current max steps. That's correct.

One potential issue: in critic/critic.go, the handle's MaxSteps() method locks the mutex. The agent's WithMaxStepsFunc will be called from the agent's run loop, which may be concurrent with the critic's watch goroutine (which calls tick and updates maxSteps). The mutex protects concurrent access, so that's fine.

Another: in critic/critic.go, the tick method updates h.maxSteps under the mutex when processing a decision. That's correct.

Thus, no correctness issues found.

Verdict: No material issues found.

🧯 Error handling & edge cases — No material issues found

No material issues found

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 48s}

### 🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> **Verdict**: No material issues found **Explanation**: After reviewing the changes through the security lens (authentication/authorization gaps, injection flaws, SSRF, unsafe deserialization, secret leakage, input validation, and insecure defaults), no security issues were identified. Key observations: - The changes introduce a dynamic step ceiling (`MaxSteps`) controlled by the critic via `RaiseStepsBy` in the `Decision` struct. - The critic is assumed to be a trusted component (part of the system's "critic battery" per the PR description), as it is provided via the Executor's `Ports.Critic` interface and not exposed to untrusted users. - While a malicious critic could theoretically cause resource exhaustion via extreme `RaiseStepsBy` values or cause integer overflow (leading to unexpected fallback behavior), the critic already possesses equivalent destructive capabilities via deadline extension (`ExtendBy`) and termination (`Kill`). Thus, the new feature does not introduce a new attack surface beyond what the critic already wields. - No injection vectors (SQL, command, template), SSRF risks, deserialization flaws, or secrets leakage were found in the modified code. - Input validation is implicitly handled: negative `RaiseStepsBy` values are ignored (only positive values increase the ceiling), and the executor falls back to the base `MaxIterations` if `MaxSteps()` returns ≤0. No further security concerns were identified within the scope of this change. ```markdown No material issues found ``` </details> <details><summary>🎯 Correctness — Blocking issues found</summary> Blocking issues found: 1. run/critic.go:91-100: The maxStepsOption function contains a logical error. When a critic is present, it returns agent.WithMaxStepsFunc that calls b.h.MaxSteps() and returns that value if >0, otherwise falls back to base. However, the critic's handle.MaxSteps() returns h.maxSteps which is initialized to info.MaxIterations (the base) and only increased via RaiseStepsBy. Since the base is always positive (set from agent's MaxIterations or default 12), the critic's MaxSteps() will always be >= base > 0, making the fallback dead code. More critically, if the critic's MaxSteps() returns 0 or negative (which shouldn't happen but could if misused), it would incorrectly fall back to base instead of honoring the critic's intended ceiling of 0 or negative (which should mean "no limit" or "use base" per the interface comment). The correct behavior per the CriticHandle.MaxSteps() comment is: "Return <= 0 to defer to the run's base MaxIterations." So the function should return base when critic's MaxSteps() <= 0, but use critic's value when > 0. Current implementation does the opposite: it uses critic's value when > 0 (correct) but falls back to base when <= 0 (also correct per comment). Wait, re-reading: the function says: if n := b.h.MaxSteps(); n > 0 { return n } else { return base }. That matches the comment: critic's value > 0 is used, otherwise defer to base. So the logic is correct. But there's a different issue: the critic's MaxSteps() is initialized to base and only increased, so it will never be <=0. The fallback is never used when critic is present. This isn't a bug but makes the fallback redundant. However, the real issue is that the critic's MaxSteps() should be allowed to return values <=0 to defer to base, but the current design only allows increasing the ceiling (via RaiseStepsBy >0). The comment says critic can raise the ceiling, implying it can only increase, not decrease. So the fallback for <=0 is future-proofing but currently unused. Not a blocking issue. 2. run/critic.go:91-100: The function returns agent.WithMaxSteps(base) when b == nil. But when b == nil (no critic), we should use the fixed base ceiling. That's correct. No blocking issues found in correctness lens. The logic appears sound. However, let me double-check the test: run/critic_test.go:65-106: The test sets up a fake critic with maxSteps:5, and run with MaxIterations:1. The agent option becomes WithMaxStepsFunc that returns 5 (since critic's MaxSteps() returns 5>0). The agent runs up to 5 steps. The test provides two tool calls then a final answer, expecting success. This is correct. One more check: run/executor.go:247-250: The comment says: "Step ceiling: a fixed WithMaxSteps(maxIter) normally, but when a critic is active it owns a DYNAMIC ceiling (WithMaxStepsFunc) so it can raise a healthy-but-long run's budget mid-flight. Falls back to maxIter." This matches the implementation. Thus, no blocking issues found in correctness review. But wait: let's verify the critic's MaxSteps() method is indeed called each step. The agent.WithMaxStepsFunc will call the provided func each step to get the current max steps. That's correct. One potential issue: in critic/critic.go, the handle's MaxSteps() method locks the mutex. The agent's WithMaxStepsFunc will be called from the agent's run loop, which may be concurrent with the critic's watch goroutine (which calls tick and updates maxSteps). The mutex protects concurrent access, so that's fine. Another: in critic/critic.go, the tick method updates h.maxSteps under the mutex when processing a decision. That's correct. Thus, no correctness issues found. Verdict: No material issues found. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> No material issues found </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 7m 48s

gitea-actions bot commented

2026-06-27 18:19:37 +00:00

🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

Verdict: Minor issues

Potential info leak via RunInfo.MaxIterations exposure (run/ports.go:51‑54).
The new MaxIterations field is added to RunInfo and is passed to downstream ports (e.g., Ports.Critic). If a custom critic implementation logs or returns the entire RunInfo to an external service, the internal iteration limit could be disclosed, which may aid an attacker in probing run budgets.
Lack of validation for CriticHandle.MaxSteps return value (run/critic.go:95‑104).
The dynamic step ceiling is obtained from b.h.MaxSteps(). If a malicious critic returns a negative value other than 0, WithMaxStepsFunc will accept it, causing the executor to treat the ceiling as the base maxIter only when n > 0. This allows a critic to effectively disable the step ceiling by returning -1, potentially leading to unbounded execution and resource exhaustion.
Missing authentication/authorization check before exposing CriticHandle.MaxSteps (critic/critic.go:162‑166).
The handle’s MaxSteps method is called on every step, but no verification is performed to ensure the caller (the executor) is authorized to query the critic’s state. If future code exposed the handle to other components, an unprivileged caller could infer or manipulate the step ceiling. Keeping CriticHandle unexported and adding a comment emphasizing its limited use mitigates this surface.

🎯 Correctness — Minor issues

Verdict: Minor issues

Findings

run/critic.go: maxStepsOption and handle.tick overflow risk (run/critic.go:91‑104, run/critic.go:277‑279)
The dynamic step‑ceiling option polls handle.MaxSteps(), and handle.tick adds d.RaiseStepsBy to h.maxSteps without any overflow protection. Repeated raises could overflow an int. A sanity check before adding would make the logic robust.
run/ports.go / skill/validate.go: RunInfo.MaxIterations documentation (run/ports.go:51‑53, skill/validate.go:206‑210)
The new MaxIterations field is correctly used throughout the code and the validator still enforces the existing hard ceiling (≤ 50 for non‑admin users). The only shortcoming is that the field’s comment could be clarified to state explicitly that it represents the base step ceiling, while the critic‑managed dynamic ceiling is separate.

🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

run/critic.go: maxStepsOption (run/critic.go:95‑104) – The maxStepsOption method calls b.h.MaxSteps() without checking that b.h is non‑nil. Although criticBinding is only created when a critic monitor returns a non‑nil handle, the code does not guard against a possible nil h field, which could cause a panic if b.h were ever nil. Adding a nil check before invoking MaxSteps() would make this path safe. Impact: potential panic in the executor when a critic handle is nil.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 28s}

### 🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — Minor issues</summary> **Verdict:** Minor issues - **Potential info leak via `RunInfo.MaxIterations` exposure** (`run/ports.go:51‑54`). The new `MaxIterations` field is added to `RunInfo` and is passed to downstream ports (e.g., `Ports.Critic`). If a custom critic implementation logs or returns the entire `RunInfo` to an external service, the internal iteration limit could be disclosed, which may aid an attacker in probing run budgets. - **Lack of validation for `CriticHandle.MaxSteps` return value** (`run/critic.go:95‑104`). The dynamic step ceiling is obtained from `b.h.MaxSteps()`. If a malicious critic returns a negative value other than `0`, `WithMaxStepsFunc` will accept it, causing the executor to treat the ceiling as the base `maxIter` only when `n > 0`. This allows a critic to effectively disable the step ceiling by returning `-1`, potentially leading to unbounded execution and resource exhaustion. - **Missing authentication/authorization check before exposing `CriticHandle.MaxSteps`** (`critic/critic.go:162‑166`). The handle’s `MaxSteps` method is called on every step, but no verification is performed to ensure the caller (the executor) is authorized to query the critic’s state. If future code exposed the handle to other components, an unprivileged caller could infer or manipulate the step ceiling. Keeping `CriticHandle` unexported and adding a comment emphasizing its limited use mitigates this surface. </details> <details><summary>🎯 Correctness — Minor issues</summary> **Verdict:** Minor issues **Findings** - **run/critic.go: maxStepsOption and handle.tick overflow risk** (`run/critic.go:91‑104`, `run/critic.go:277‑279`) The dynamic step‑ceiling option polls `handle.MaxSteps()`, and `handle.tick` adds `d.RaiseStepsBy` to `h.maxSteps` without any overflow protection. Repeated raises could overflow an `int`. A sanity check before adding would make the logic robust. - **run/ports.go / skill/validate.go: RunInfo.MaxIterations documentation** (`run/ports.go:51‑53`, `skill/validate.go:206‑210`) The new `MaxIterations` field is correctly used throughout the code and the validator still enforces the existing hard ceiling (≤ 50 for non‑admin users). The only shortcoming is that the field’s comment could be clarified to state explicitly that it represents the *base* step ceiling, while the critic‑managed dynamic ceiling is separate. </details> <details><summary>🧯 Error handling & edge cases — Minor issues</summary> **VERDICT:** Minor issues - **run/critic.go: maxStepsOption** (`run/critic.go:95‑104`) – The `maxStepsOption` method calls `b.h.MaxSteps()` without checking that `b.h` is non‑nil. Although `criticBinding` is only created when a critic monitor returns a non‑nil handle, the code does not guard against a possible nil `h` field, which could cause a panic if `b.h` were ever nil. Adding a nil check before invoking `MaxSteps()` would make this path safe. *Impact: potential panic in the executor when a critic handle is nil.* </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 28s

gitea-actions bot commented

2026-06-27 18:20:11 +00:00

🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found.

🎯 Correctness — No material issues found

VERDICT: No material issues found

The changes correctly implement a dynamic step ceiling mechanism through the critic system. The logic for raising the step ceiling via RaiseStepsBy in the critic's decision and the executor's polling of MaxSteps() is sound. The test case TestCriticRaisesStepCeiling appropriately verifies that a critic can allow a run to exceed its base MaxIterations. The integration between the run and critic packages maintains the intended invariants, and the fallback behavior when MaxSteps() <= 0 correctly defaults to the base ceiling. No correctness issues were identified in the diff.

🧯 Error handling & edge cases — No material issues found

No material issues found.

The changes introduce a dynamic step ceiling mechanism through the critic system, allowing a critic to raise a run's step ceiling mid-flight. The implementation correctly:

Adds MaxIterations to RunInfo to track the base step ceiling
Extends CriticHandle with MaxSteps() int method
Updates the critic system to track and modify the step ceiling via RaiseStepsBy
Modifies the executor to use a dynamic step ceiling when a critic is active
Includes proper fallback behavior when MaxSteps() <= 0
Adds comprehensive test coverage for the new functionality

All error handling and edge cases appear to be properly addressed, with appropriate locking around shared state and clear documentation of the behavior. The changes are well-integrated and maintain existing invariants.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 40s}

### 🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> No material issues found. </details> <details><summary>🎯 Correctness — No material issues found</summary> VERDICT: No material issues found The changes correctly implement a dynamic step ceiling mechanism through the critic system. The logic for raising the step ceiling via `RaiseStepsBy` in the critic's decision and the executor's polling of `MaxSteps()` is sound. The test case `TestCriticRaisesStepCeiling` appropriately verifies that a critic can allow a run to exceed its base `MaxIterations`. The integration between the `run` and `critic` packages maintains the intended invariants, and the fallback behavior when `MaxSteps() <= 0` correctly defaults to the base ceiling. No correctness issues were identified in the diff. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> No material issues found. The changes introduce a dynamic step ceiling mechanism through the critic system, allowing a critic to raise a run's step ceiling mid-flight. The implementation correctly: 1. Adds `MaxIterations` to `RunInfo` to track the base step ceiling 2. Extends `CriticHandle` with `MaxSteps() int` method 3. Updates the critic system to track and modify the step ceiling via `RaiseStepsBy` 4. Modifies the executor to use a dynamic step ceiling when a critic is active 5. Includes proper fallback behavior when `MaxSteps() <= 0` 6. Adds comprehensive test coverage for the new functionality All error handling and edge cases appear to be properly addressed, with appropriate locking around shared state and clear documentation of the behavior. The changes are well-integrated and maintain existing invariants. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 40s

gitea-actions bot commented

2026-06-27 18:20:51 +00:00

🪰 Gadfly review — `gemma4:cloud` (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

🎯 Correctness — No material issues found

Verdict: No material issues found

I have verified the changes against the repository. The logic for dynamically adjusting the step ceiling is correctly implemented across the critic battery and the run executor:

Baseline propagation: run.Executor correctly passes the maxIter baseline into RunInfo, which the critic.System then uses to initialize the handle.maxSteps baseline.
Dynamic Polling: The criticBinding.maxStepsOption implements the fallback logic correctly: if the CriticHandle.MaxSteps() returns a positive value (meaning the critic has set or raised the ceiling), that value is used; otherwise, it falls back to the original baseline.
Concurrency: Access to handle.maxSteps in critic/critic.go is protected by a mutex, which is necessary since the agent loop (polling MaxSteps()) and the critic monitor (updating via tick()) run in separate goroutines.
Verification: The provided test TestCriticRaisesStepCeiling correctly simulates the behavior by using a fake handle that returns a higher MaxSteps value, allowing a run with MaxIterations: 1 to complete multiple steps.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

All findings from the draft review were dropped after verification:

critic/critic.go:277: Dropped. The use of int for step counts is standard in the project. Given the context of agent iterations (typically in the tens or hundreds), the risk of integer overflow is negligible.
run/critic.go:13: Dropped. The logic is consistent with the project's use of 0 or negative values to signify "no limit" or "defer to default". The fallback if n := b.h.MaxSteps(); n > 0 { return n } return base correctly implements the intended behavior where the critic can either override the limit (if > 0) or let the executor's base limit apply.
Compilation/Namespace Issue: Dropped. The reviewer claimed critic.maxStepsOption in run/executor.go would not compile because it's a method on *criticBinding. However, in run/executor.go at line 250, the call is actually critic.maxStepsOption(maxIter). Looking at the imports and the code, the critic variable in run/executor.go is an instance of *criticBinding (created by e.startCritic at line 208), not the critic package. The call is a valid method call on a pointer to criticBinding.

_{Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 24s}

### 🪰 Gadfly review — `gemma4:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary>🔒 Security — No material issues found</summary> No material issues found </details> <details><summary>🎯 Correctness — No material issues found</summary> Verdict: No material issues found I have verified the changes against the repository. The logic for dynamically adjusting the step ceiling is correctly implemented across the `critic` battery and the `run` executor: 1. **Baseline propagation**: `run.Executor` correctly passes the `maxIter` baseline into `RunInfo`, which the `critic.System` then uses to initialize the `handle.maxSteps` baseline. 2. **Dynamic Polling**: The `criticBinding.maxStepsOption` implements the fallback logic correctly: if the `CriticHandle.MaxSteps()` returns a positive value (meaning the critic has set or raised the ceiling), that value is used; otherwise, it falls back to the original baseline. 3. **Concurrency**: Access to `handle.maxSteps` in `critic/critic.go` is protected by a mutex, which is necessary since the agent loop (polling `MaxSteps()`) and the critic monitor (updating via `tick()`) run in separate goroutines. 4. **Verification**: The provided test `TestCriticRaisesStepCeiling` correctly simulates the behavior by using a fake handle that returns a higher `MaxSteps` value, allowing a run with `MaxIterations: 1` to complete multiple steps. </details> <details><summary>🧯 Error handling & edge cases — No material issues found</summary> ## Verdict: No material issues found All findings from the draft review were dropped after verification: - **`critic/critic.go:277`**: Dropped. The use of `int` for step counts is standard in the project. Given the context of agent iterations (typically in the tens or hundreds), the risk of integer overflow is negligible. - **`run/critic.go:13`**: Dropped. The logic is consistent with the project's use of `0` or negative values to signify "no limit" or "defer to default". The fallback `if n := b.h.MaxSteps(); n > 0 { return n } return base` correctly implements the intended behavior where the critic can either override the limit (if `> 0`) or let the executor's base limit apply. - **Compilation/Namespace Issue**: Dropped. The reviewer claimed `critic.maxStepsOption` in `run/executor.go` would not compile because it's a method on `*criticBinding`. However, in `run/executor.go` at line 250, the call is actually `critic.maxStepsOption(maxIter)`. Looking at the imports and the code, the `critic` variable in `run/executor.go` is an instance of `*criticBinding` (created by `e.startCritic` at line 208), not the `critic` package. The call is a valid method call on a pointer to `criticBinding`. </details> Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 24s

steve added 1 commit 2026-06-27 18:38:50 +00:00

critic: overflow-guard maxSteps += RaiseStepsBy (gadfly 5-model convergence)

executus CI / test (pull_request) Has been cancelled

Details

306d575c31

A buggy/hostile Escalator returning a huge RaiseStepsBy could wrap handle.maxSteps
negative (which the executor reads as defer-to-base). Clamp at math.MaxInt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve added 1 commit 2026-06-27 18:41:17 +00:00

ci(gadfly): drop the m1 reviewer (dead weight; keep m5)

executus CI / test (pull_request) Failing after 1m1s

Details

f3bd43b726

m1/qwen3:14b proved consistently low-value + slowest in the pool over multiple
PRs. Removed from GADFLY_MODELS + GADFLY_PROVIDER_CONCURRENCY + its endpoint so it
never fires again. m5 retained.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve added 1 commit 2026-06-27 18:54:01 +00:00

chore: go mod tidy (add missing go.sum entry; CI tidiness gate)

executus CI / test (pull_request) Successful in 2m8s

Details

executus CI / test (push) Successful in 1m45s

Details

1a1d5e417b

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

steve closed this pull request

2026-06-27 19:00:27 +00:00

steve deleted branch critic-iteration-ceiling

2026-06-27 19:00:27 +00:00

gitea-actions bot referenced this pull request

2026-06-28 02:32:24 +00:00

ci: inherit gadfly's default swarm (slim caller, re-pin @b02b11d) #14

executus CI / test (pull_request) Successful in 2m8s

Details

executus CI / test (push) Successful in 1m45s

Details

Pull request closed

This pull request cannot be reopened because the branch was deleted.

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: steve/executus#10

run: critic can raise a run's step ceiling mid-flight (CriticHandle.MaxSteps) #10

Changes

Test

Invariants

🪰 Gadfly review — glm-5.1:cloud (ollama-cloud)

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Review — 🎯 Correctness

Findings

Corrected Review

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Corrected review

🪰 Gadfly review — m1/qwen3:14b (m1)

🪰 Gadfly review — m5/qwen3.6:35b-mlx (m5)

Security Findings

🪰 Gadfly review — kimi-k2.7-code:cloud (ollama-cloud)

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: No material issues found

Verdict: No material issues found

Verdict: No material issues found

🪰 Gadfly review — nemotron-3-super:cloud (ollama-cloud)

🪰 Gadfly review — gpt-oss:120b-cloud (ollama-cloud)

🪰 Gadfly review — qwen3-coder:480b-cloud (ollama-cloud)

🪰 Gadfly review — gemma4:cloud (ollama-cloud)

Verdict: No material issues found

Pull request closed

🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud)

🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud)

🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud)

🪰 Gadfly review — `m1/qwen3:14b` (m1)

🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5)

🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud)

🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud)

🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud)

🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud)

🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud)

🪰 Gadfly review — `gemma4:cloud` (ollama-cloud)