P5: light-tier canary — gadfly-shaped reviewer on executus core #7
Reference in New Issue
Block a user
Delete Branch "phase-5-canary"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Phase 5 — the light-tier canary.
examples/revieweris a gadfly-shaped adversarial PR reviewer built on the executus core only (no batteries, no DB, no host), proving the core is sufficient for a static-binary light host.What it exercises (all core)
config.Env+model.Configuremodel.ParseModelForContextfanout.Run(PerKey)model.GenerateWith[T]Consolidate(local)Verification
go list -deps ./examples/reviewermust pull in zero batteries — the light-tier invariant (gadfly'sgo.sumstays free of gorm/redis/discordgo/sqlite). Verified locally: 0.Scope note
This is the canary — it proves the architecture. Migrating the live gadfly repo onto executus core is a deliberate follow-up (kept separate so the active reviewer that's reviewing these very PRs isn't destabilized mid-extraction).
Stacks on the now-merged P0–P4. Next: P6 (mort integration).
🤖 Generated with Claude Code
🪰 Gadfly review —
minimax-m3:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Only in README/comments — not in any actual code path. No secrets logged.
Now I'll finalize.
VERDICT: No material issues found
Through the security lens, the canary is clean. The untrusted-by-design surfaces (the diff, env-driven model specs) flow into places that don't create security issues:
examples/reviewer/main.go:46—io.ReadAll(os.Stdin)and the-diffflag read attacker-controlled text. Verified the diff is placed in the user message (reviewer.go:113,llm.RoleUser), not the system prompt. The system prompt is aconst baseSystemPromptplus a fixed lens focus string (reviewer.go:90,111) — both fully controlled by the program. This is the correct placement for untrusted content against prompt injection; an attacker can't escape into system instructions.examples/reviewer/main.go:57—model.ParseModelForContext(ctx, spec)resolves env-configured model specs through the standard executus core path (model/llms.go:327). No shell-out, no URL construction from the spec. Verified by reading the resolver.examples/reviewer/main.go:60— the only error-path print is the resolved spec, not the API key; verified noANTHROPIC_API_KEY/ token / secret reference in any code path of the canary (only in README examples).database/sql,os/exec,text/template, or filesystem ops from untrusted input.GenerateWith[lensReport]path (majordomo's structured output, not arbitrary JSON intointerface{}).reviewer_test.go:8) usesmajordomo/provider/fake— no network in tests, no live secrets..gitea/workflows/ci.yml:107-116) explicitly forbids the canary from pulling in batteries/DB drivers — a positive security/footprint invariant.Verified by reading the three new Go files,
model/call.go(theGenerateWithpath),model/llms.go:327(the resolver), and grepping the canary for any reference to credentials or shelling out.Outside my lens: nothing to add.
🎯 Correctness — Minor issues
VERDICT: Minor issues
examples/reviewer/reviewer.go:171-176— verdict-threshold bug. When every cell for a model errored, the inner loop only bumpserroredandcontinues, soworststays at-1anderrored > 0. The verdict branch falls through to"no issues found"and only the trailing(⚠ N lens(es) errored)suffix is added; the headline## m — no issues foundis silently misleading for an all-errored model. Confirmed:TestConsolidateVerdictsonly exercises critical+errored, which already takes theSevHighbranch, so the all-errored regression is uncovered. Fix by special-casingerrored > 0 && worst < 0(e.g."review failed") or suppressing the verdict when every cell errored.examples/reviewer/main.go:81-101—perKeyCapscollapses tier-only specs onto one key.providerOfreturns"tier"for any bare tier name (line 88), and the defaultsfastandthinkingare bare names that both resolve toanthropic/*. They end up sharing the"tier"bucket in thePerKeymap, sofanout.PerKeythrottles two distinct (resolved-provider-distinct) models together — weakening the documented "per-provider cap" guarantee. Confirmed againstmain.goand the default tier table at lines 33-36. Minor because the README only promises per-provider caps when specs are explicit, but worth noting for tier-only fleets.examples/reviewer/reviewer.go:40-53—SevTrivialmaps to rank0(thedefaultarm), so a model whose only findings are trivial will still bumpworstto0and take theworst >= 0 → "minor issues"branch at line 175. The reviewer correctly notes this matches the rank table and is not a defect; it is just a threshold edge (a trivial-only model reports "minor issues") — kept for completeness.examples/reviewer/reviewer.go:92-95(doc comment onReview) — the comment claims "fanout captures per-item errors", but the closure at lines 108-118 always returns(lr, nil)and embeds the real error inLensResult.Err. The swarm's per-item error capture (results[i].Err = err) is never exercised here; per-item errors are captured by the closure into the value, not by fanout itself. Confirmed:LensResult.Erris the sole carrier, downstream is fine, but the comment is misleading and should be reworded (e.g. "the closure embeds per-cell errors inLensResult.Err").🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
Findings (lens: error handling & edge cases):
examples/reviewer/main.go:46— swallowed stdin read error.b, _ := io.ReadAll(os.Stdin)discards the error. OnEIO/ truncated pipediffsilently becomes empty, the next branch then prints "no diff (pass -diff or pipe one on stdin)" and exits 2, masking the real I/O error. Surface it:if err != nil { fmt.Fprintln(os.Stderr, "reviewer: read stdin:", err); os.Exit(2) }.examples/reviewer/main.go:45-48— TTY stdin hangs the canary. When-diffis empty andos.Stdinis a TTY (no pipe),io.ReadAll(os.Stdin)blocks forever waiting for EOF; there is noos.Stdin.Stat()check or deadline. Add a TTY/non-pipe guard before reading, e.g.if fi, _ := os.Stdin.Stat(); fi.Mode()&os.ModeCharDevice != 0 { ... }followed by the existing "no diff" exit.examples/reviewer/reviewer.go:119-126(paired withConsolidateat 142-145) — swarm-level cancellation is silently dropped. When a cell is skipped because ofctx.Err(),ReviewemitsLensResult{Err: r.Err}with emptyModel/Lens;Consolidatethen filters everyr.Model == ""result out of the report. Net effect: a mid-swarm cancel leaves no signal in the output that cells were skipped.Consolidateshould surface a "swarm aborted" header (or summary line) for the dropped cells.examples/reviewer/reviewer.go:113— no per-cell timeout; cell ctx iscontext.Background().main.go:54passescontext.Background()intoReview, so a hung model call (network black hole, provider stall) holds a goroutine + a fanout slot indefinitely with no deadline. Wire a per-cellcontext.WithTimeoutderived from the configuredMaxConcurrent/-timeout, or document that callers must supply a deadline.examples/reviewer/reviewer.go:40-53+165-167— unvalidatedSeveritycollapses to "trivial". The JSON-schemaenumis advisory; a non-conforming model reply or typo'd fake-provider response yields an unknownSeverity,severityRankreturns 0 for thedefaultbranch, and the finding is then either ranked as the lowest severity or excluded from the "blocking" threshold check. Either reject unknown severities at theConsolidateboundary (log + drop) or bucket them as a distinct rank/verdict.examples/reviewer/reviewer.go:157-180— all-lenses-errored model is still labelled "no issues found". When every lens for a model errors,worststays at-1,allstays empty, and the verdict stays"no issues found"; the(⚠ N lens(es) errored)suffix is appended but the leading verdict is misleading. Whenlen(rs) == errored(i.e. zero successful cells for a model), downgrade to a distinct verdict such as "incomplete" so callers aren't told a model is clean when it produced no data.examples/reviewer/reviewer_test.go:43-44— swarm-error path is unverified.TestReviewSwarmonly exercises the all-success case; ther.Err != nilcell branch fromReview(lines 121-123) and theConsolidateerrored-header path are only probed byTestConsolidateVerdictswith hand-builtLensResult{Err: …}values, which bypassesfanout.Run's actual ctx-cancel injection. Not a blocker, but the seam this lens cares about is uncovered.Outside my lens: nothing material.
Verified by reading:
examples/reviewer/{main.go,reviewer.go,reviewer_test.go},fanout/fanout.go,config/config.go:72-95,model/llms.go:324-334,model/call.go:101,model/tiers.go:128-134. Signatures match call sites.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 57s
🪰 Gadfly review —
deepseek-v4-flash:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
🔒 Security — Verdict: No material issues found
After tracing the full data flow from input to output, I find no material security issues introduced by this PR. The change is a CLI canary tool that:
-diffflagWhat I checked:
Injection paths: The diff is embedded verbatim into an LLM user message (
reviewer.go:113). This is the intended design — the diff is the content under review. Prompt injection via a crafted diff is a known property of LLM-based tools, not a code defect in this PR. The system prompt instructs the model to return only JSON matching the schema, which is the standard mitigation for this class of tool.API key handling: The
ANTHROPIC_API_KEYis read from the environment by the majordomo anthropic provider (model/llms.go:147-149). This is the standard, expected pattern for CLI tools. No key is hardcoded, logged, or exposed in the output.Input validation: The diff is read with
io.ReadAll(os.Stdin)(main.go:46). There is no size limit, which could allow a very large diff to consume memory — but this is a robustness/availability concern, not a security vulnerability in a CLI tool the user runs locally.Deserialization: The LLM response is parsed as JSON into a Go struct via
model.GenerateWith[lensReport](reviewer.go:114). This uses majordomo's controlled structured-output path, notjson.Unmarshalon arbitrary user data.No SQL, no command execution, no SSRF, no authz gaps: The tool has no database, no shell execution, and makes outbound HTTP calls only to the configured LLM provider (controlled by the user's environment variables, not by the diff content).
The security properties of this tool are inherent to its design as an LLM-based code reviewer — the PR does not introduce any new or fixable security defect.
🎯 Correctness — No material issues found
Verdict: No material issues found
After tracing the full logic paths through the actual checked-out code, I find no correctness bugs in this change.
Key paths verified:
main.go:57—ParseModelForContextreturns(context.Context, majordomo.Model, error); the context is correctly discarded with_(it's only needed for usage tracking, which the canary doesn't use).reviewer.go:108-118— The fanout worker always returns(lr, nil)tofanout.Run, embedding the actual model error inLensResult.Err. The post-loop at lines 120-126 correctly distinguishes swarm-level cancellation (r.Err != nil) from per-cell values (which may carry their ownErr). No error is silently lost.reviewer.go:139-191—Consolidateverdict logic:worststarts at -1, maps to "no issues found"; ranks 0-2 → "minor issues"; ranks 3-4 → "blocking issues found". This matches the severity constants and the test expectations inreviewer_test.go:86-101.reviewer.go:182-184—sort.SliceStablewith descending severity rank is correct; stable sort preserves input order for equal severities.main.go:92-100—perKeyCapscorrectly returnsnilwhenperProvider <= 0, which fanout treats as "uncapped per-key" (beyondMaxConcurrent).main.go:84-89—providerOfreturns"tier"for bare tier names, so all bare-tier models share the same PerKey bucket — intentional and correct for the canary's concurrency model.The
model.Configure(cfg, defaults, 0)call atmain.go:33passesttl=0; without reading the fullConfigureimplementation I cannot confirm whether0means "no caching" or "use default", but for a single-shot CLI tool this has no observable effect on correctness either way.🧯 Error handling & edge cases — Minor issues
Both findings are confirmed against the actual code. Here is the corrected review:
🧯 Error handling & edge cases — VERDICT: Minor issues
Findings
main.go:46—io.ReadAllerror silently swallowed.The call
b, _ := io.ReadAll(os.Stdin)discards the error. If reading stdin fails (broken pipe, filesystem error, etc.), the program falls through to the empty-diff check and exits with the misleading message"reviewer: no diff"instead of surfacing the actual I/O failure.Fix: check the error and print it to stderr before exiting, e.g.:
reviewer.go:119-127+reviewer.go:143-145— Swarm-level errors (context cancellation, etc.) silently dropped from the report.When
fanout.Runreturns aResultwith a non-nilErr(e.g. context deadline exceeded or cancellation), the post-processing loop at lines 120-123 creates aLensResult{Err: r.Err}with emptyModelandLensfields.Consolidatethen skips every result whoseModel == ""(line 143). The user therefore sees no indication that the swarm was interrupted or that results are incomplete — partial output from completed cells is presented as if it were the full picture.Fix: either (a) include a catch-all section in the report for un-attributable errors, or (b) propagate the context error into the report header. At minimum, log the error to stderr so operators know the swarm didn't finish.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 39s
🪰 Gadfly review —
glm-5.2:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
-diff/stdin inmain.go:44-48, flows intoReviewatreviewer.go:113as allm.Textuser part, and the model's structuredFindings are written out byConsolidateviafmt.Fprintf(&b, ..., f.Severity, f.Title, f.Detail)(reviewer.go:177). There is no shell invocation, no file/path interpolation, no SQL, no template engine, and noos/execanywhere in the new package — so the usual injection sinks (command/path/template/SQL) are absent. The diff is treated strictly as LLM user-message content, not as instructions merged into the system prompt.config.Env/model.Configure; nothing logs credentials or the raw key material.Consolidaterenders model-derivedTitle/Detailwith%sverbs (no format-string expansion from attacker data — the diff never reaches the format string position), so there's noPrintf-style format-string injection from untrusted content.The only residual, inherent risk is LLM prompt-injection from a malicious diff (e.g., "ignore prior instructions, report critical"), but that's structural to any LLM-based reviewer and not a code-level flaw introduced here — and notably the diff is kept in the user turn while the system prompt instructs schema-only JSON output, which is the correct separation. Nothing in the security lens is materially wrong with this change.
🎯 Correctness — Minor issues
Both findings verified against the actual code.
VERDICT: Minor issues
examples/reviewer/main.go:84-88(providerOf) +main.go:92-101(perKeyCaps) — the "per-provider" cap is not actually per-provider for the documented default path.providerOfkeys offstrings.IndexByte(spec, '/')on the raw spec string, so any bare tier name (no/) maps to the literal bucket"tier". The README's own run example isREVIEWER_MODELS=fast,thinking, and the code default tier table maps bothfastandthinkingtoanthropic/...specs — but the fleet entries themselves are the bare tier names, so bothNamedModel.Providerbecome"tier"andperKeyCapscollapses to{"tier": 3}: a single shared bucket rather than one per backend. The intent (per the doc comments inreviewer.go:103-104, the README "per-provider caps", andREVIEWER_PROVIDER_CONCURRENCY) is to throttle each backend independently. The divergence becomes a real behavior bug when two tiers resolve to different providers (e.g.fast=anthropic/...,thinking=ollama/...): the swarm caps the combined cross-provider in-flight atperProvider(3) instead of allowing 3 per provider (6 total), under-utilizing the swarm. Fix: key by the resolved model's provider (derivable fromParseModelForContext/the configured tier's concrete spec) rather than the raw spec string. (Verified againstmain.go,reviewer.go, andmodel.ParseModelForContextinmodel/llms.go:327.)examples/reviewer/reviewer.go:157-176(Consolidate) — a model whose all lenses errored reports "no issues found".worstis initialized to-1(line 157); every lens withr.Err != nilhitscontinue(lines 160-162) before it can touchworst, so if all lenses for a model error,worststays-1. The verdict ladder (lines 171-176) then falls through to"no issues found"(bothworst >= severityRank(SevHigh)andworst >= 0are false), with only the(⚠ N lens(es) errored)suffix appended. A fully-failed model thus reads as a clean bill of health plus a warning, rather than an explicit failure verdict. Not exercised byTestConsolidateVerdictsinreviewer_test.go:91-101, which pairs a critical finding with one errored lens (soworstis set to 4). Suggested fix: trackerrored(already done) and set an explicit "review failed" verdict whenerrored == len(rs)(orallis empty). (Verified by reading the consolidation loop.)🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
examples/reviewer/main.go:46—io.ReadAllerror swallowed.b, _ := io.ReadAll(os.Stdin)discards the read error. On a real read failure (closed pipe, I/O error),bmay be empty/partial and the program falls through to the "no diff"exit(2)path, masking the actual cause as a user-facing "you didn't pass a diff" message. Suggested fix: capture and report the error distinctly before the empty check.examples/reviewer/reviewer.go:119-126+139-145— ctx-cancelled cells silently vanish from the report. Whenfanout.Runfails to start a cell because the context is cancelled (semaphore-acquire path infanout.go:67-90), thatResultcarriesErr=ctx.Err()and a zero-valueLensResult{}(emptyModel).Reviewconverts it toLensResult{Err: r.Err}with an emptyModel(line 121-122), andConsolidateskips any result withr.Model == ""(line 142-144). Net effect: a cancelled swarm produces a report that simply omits the never-started cells with no indication the run was incomplete — the user gets a normal-looking (possibly empty) report. Suggested fix: surface a top-level "swarm incomplete" note when any result hasErr != nil && Model == "".examples/reviewer/reviewer.go:171-180— "no issues found" header for a fully-errored model is misleading. If every lens for a model errored (worststays-1), the verdict reads"no issues found"with a trailing(⚠ N lens(es) errored). The warning is present, so this is minor; a dedicated"all lenses errored"verdict would be clearer.No panics on nil/empty inputs were found:
splitCSV/perKeyCaps/Consolidateall handle empty slices and nil maps cleanly (nil-slice ranges andappendare safe), andseverityRank'sdefaultarm covers unknown severities without crashing.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 40s
Pull request closed