fix: clean-lens findings + trim the dogfood swarm to strong reviewers #4

Merged
steve merged 2 commits from fix/clean-lens-no-findings into main 2026-06-27 22:14:08 +00:00
Owner

Two scoreboard/quality changes surfaced by grading the dogfood reviews.

1. Don't emit findings from clean "No material issues" lenses

A lens that reports "No material issues found" still writes path:line references — but as verification notes ("verified X at file:line is safe"), not problems. The telemetry extracted those as findings, polluting the store and unfairly penalizing thorough clean-pass reviewers (claude-code/sonnet, deepseek, minimax). emit() now skips extraction for a clean-verdict lens (the run is still recorded). New TestEmit_SkipsCleanVerdictLens; README updated.

2. Trim the dogfood swarm to the strong reviewers

Per the scoreboard, dropped the low-signal reviewers from gadfly's own swarm:

  • m5/qwen3.6:35b-mlx (slow local Mac — removes the last foreman/local lane, so GADFLY_ENDPOINT_M5 + the m5 concurrency lane are gone)
  • gemma4:cloud, gpt-oss:120b-cloud, kimi-k2.7-code:cloud (weak cloud models)

Remaining: minimax-m3, glm-5.2, glm-5.1, deepseek-v4-pro, nemotron-3-super, qwen3-coder:480b + claude-code/sonnet6 cloud + the Claude Code engine, all fast cloud lanes.

Both take effect once merged. gofmt/vet/build/test -race green; workflow YAML validated.

🤖 Generated with Claude Code

Two scoreboard/quality changes surfaced by grading the dogfood reviews. ### 1. Don't emit findings from clean "No material issues" lenses A lens that reports *"No material issues found"* still writes `path:line` references — but as **verification notes** ("verified X at `file:line` is safe"), not problems. The telemetry extracted those as findings, polluting the store and **unfairly penalizing thorough clean-pass reviewers** (`claude-code/sonnet`, `deepseek`, `minimax`). `emit()` now skips extraction for a clean-verdict lens (the run is still recorded). New `TestEmit_SkipsCleanVerdictLens`; README updated. ### 2. Trim the dogfood swarm to the strong reviewers Per the scoreboard, dropped the low-signal reviewers from gadfly's own swarm: - **m5/qwen3.6:35b-mlx** (slow local Mac — removes the last foreman/local lane, so `GADFLY_ENDPOINT_M5` + the m5 concurrency lane are gone) - **gemma4:cloud**, **gpt-oss:120b-cloud**, **kimi-k2.7-code:cloud** (weak cloud models) Remaining: `minimax-m3`, `glm-5.2`, `glm-5.1`, `deepseek-v4-pro`, `nemotron-3-super`, `qwen3-coder:480b` + `claude-code/sonnet` — **6 cloud + the Claude Code engine**, all fast cloud lanes. Both take effect once merged. `gofmt`/`vet`/`build`/`test -race` green; workflow YAML validated. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 1 commit 2026-06-27 21:59:59 +00:00
fix: don't emit findings from clean "No material issues" lenses
Build & push image / build-and-push (pull_request) Successful in 7s
Adversarial Review (Gadfly) / review (pull_request) Successful in 8m33s
85f3b2546b
A lens whose verdict is "No material issues found" still tends to write
path:line references — but as verification notes ("verified X at
file:line is safe"), not problems. The telemetry was extracting those as
findings, which (a) pollutes the gadfly-reports store with false
positives and (b) unfairly penalizes thorough reviewers that do clean
passes — the FP penalty hit clean security passes from claude-code/sonnet,
deepseek, and minimax even though they correctly found nothing.

emit() now skips findings extraction for a clean-verdict lens (the run is
still recorded). Surfaced by grading the dogfood reviews: a large share
of "false positives" were exactly these clean-verification bullets.

Added TestEmit_SkipsCleanVerdictLens; README telemetry section updated.
gofmt clean, go vet quiet, go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🪰 Gadfly — live review status

11/11 reviewers finished · updated 2026-06-27 22:08:32Z

claude-code/sonnet · claude-code — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

deepseek-v4-pro:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

gemma4:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

glm-5.1:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

glm-5.2:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

gpt-oss:120b-cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

kimi-k2.7-code:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

m5/qwen3.6:35b-mlx · m5 — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

minimax-m3:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

nemotron-3-super:cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

qwen3-coder:480b-cloud · ollama-cloud — done

  • security — No material issues found
  • correctness — No material issues found
  • error-handling — No material issues found

Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.

<!-- gadfly-status-board --> ## 🪰 Gadfly — live review status 11/11 reviewers finished · updated 2026-06-27 22:08:32Z #### `claude-code/sonnet` · claude-code — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `deepseek-v4-pro:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `gemma4:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `glm-5.1:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `glm-5.2:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `gpt-oss:120b-cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `kimi-k2.7-code:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `m5/qwen3.6:35b-mlx` · m5 — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `minimax-m3:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `nemotron-3-super:cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found #### `qwen3-coder:480b-cloud` · ollama-cloud — ✅ done - ✅ **security** — No material issues found - ✅ **correctness** — No material issues found - ✅ **error-handling** — No material issues found <sub>Live status board. Findings are posted in each model's own comment. Advisory only — does not block merge.</sub>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

The change is a heuristic refinement confined to the optional, advisory findings‑telemetry path in cmd/gadfly/emit.go. I checked the diff against the surrounding code (emit.go:96‑161, consolidate.go:8‑62, emit_test.go:182‑220) and traced the data flow.

Security checklist for this PR

  • Authn/authz: unchanged. The GADFLY_FINDINGS_TOKEN header logic in emit.go:151‑159 is untouched; the same postJSON path is reused.
  • Injection (SQL / command / template / path): none introduced. The new branch is if r.verdict == verdictClean { continue }verdict is a typed enum (consolidate.go:9‑16) set internally by parseVerdict, never built from external strings in this codepath. No new format strings, queries, or shell calls.
  • SSRF: no change. The base URL still comes from GADFLY_FINDINGS_URL (emit.go:97); the continue does not alter URL construction or fetch targets.
  • Unsafe deserialization: no change. JSON marshalling is the same; the new branch reduces (not expands) what is sent.
  • Secret / credential leakage: unchanged. No new env reads or log lines that would echo the token.
  • Input validation: no new untrusted surface. The verdict comparison is over a constrained enum, and the markdown‑extraction skip actually reduces the amount of untrusted lens output forwarded to the store.
  • Insecure defaults: telemetry still defaults to off (emit.go:98‑100) and never touches stdout or the exit code (per the file header, emit.go:7‑11).

Side‑effect worth noting (informational, not a finding)

A model that opens with the literal phrase "no material issue" but later describes a real problem will now have its findings suppressed in /reports (the store still gets /runs, and stdout output to the user is unchanged). This is a pre‑existing property of parseVerdict (lead‑phrase wins, consolidate.go:33‑54) and not introduced by this PR — flagging only because the security lens is supposed to trace tainted data; here the "tainted" data is the model's self‑reported verdict text, and the trust boundary has not moved.

No issues to report under the security lens.

🎯 Correctness — No material issues found

VERDICT: No material issues found

Through the correctness lens, the change is straightforward and sound:

  • verdictClean is correctly defined at cmd/gadfly/consolidate.go:13, and specialistResult.verdict carries it through from parseVerdict (cmd/gadfly/main.go:265) to the emit loop.
  • The new guard sits after the errored skip and before parseFindings, so a clean lens's path:line references cannot reach the reports payload — verified by reading cmd/gadfly/emit.go:120-148.
  • /runs is posted outside the per-lens loop (emit.go:153), so per-model run/throughput stats remain unaffected as claimed. /reports is gated by len(reports) > 0 (emit.go:156); suppressing findings in the loop correctly suppresses the POST.
  • The test (emit_test.go:185-220) constructs a clean-verdict result whose markdown does contain a path:line reference (run/executor.go:166) and asserts /reports is hit 0 times while /runs is hit once — that directly exercises the new branch and would fail if the guard were missing (the existing parseFindings regex would otherwise happily extract it, as TestParseFindings_FallbackParagraphTitle demonstrates).

Semantic note (not a bug, just verification): the guard trusts parseVerdict's earliest-phrase rule. If a model ever writes "No material issues found" and then lists real problems, the guard suppresses them. That's a deliberate design choice documented in the PR ("a clean-verdict lens emits no findings") and consistent with how worstVerdict already treats self-reported verdicts as authoritative. No unit-level test asserts this contract today, but the PR description and README explicitly call it out, and a contradictory self-verdict is a model-behavior problem rather than a telemetry bug.

Nothing else to flag within correctness.

🧯 Error handling & edge cases — No material issues found

I've reviewed the diff and the surrounding code. Here's my assessment through the error handling & edge cases lens.

Verification performed:

  • Read cmd/gadfly/emit.go (full function around the change at line 119–148; the errored short-circuit at 121 precedes the new verdictClean check, so ordering is correct)
  • Read cmd/gadfly/consolidate.go to confirm verdictClean is the right symbol and how parseVerdict produces it
  • Read cmd/gadfly/main.go:234–268 to confirm specialistResult.errored and verdict are always set together (panic recovery sets verdictUnknown, errored=true)
  • Read the full new test plus surrounding tests in cmd/gadfly/emit_test.go
  • Verified the existing TestParseFindings_FallbackParagraphTitle still tests parseFindings directly, so it is unaffected by the new emit() gate

Findings: none material.

Notes on edge-case behavior I checked, all correct:

  • r.errored is checked before r.verdict == verdictClean, so a panic-recovered lens (errored=true, verdictUnknown) is already short-circuited.
  • parseFindings already returns nil for empty/whitespace output and the ⚠️ failure notice, so the gate is defense-in-depth rather than a correctness fix.
  • verdictUnknown (the only remaining un-gated path) is the right carve-out: it covers lenses whose output couldn't be parsed as a verdict, and those may still contain real findings worth recording.
  • runID formatting, atomic counters in the test handler, t.Setenv cleanup, and defer srv.Close() are all in order.

VERDICT: No material issues found

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 29s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **VERDICT: No material issues found** The change is a heuristic refinement confined to the optional, advisory findings‑telemetry path in `cmd/gadfly/emit.go`. I checked the diff against the surrounding code (`emit.go:96‑161`, `consolidate.go:8‑62`, `emit_test.go:182‑220`) and traced the data flow. **Security checklist for this PR** - **Authn/authz**: unchanged. The `GADFLY_FINDINGS_TOKEN` header logic in `emit.go:151‑159` is untouched; the same `postJSON` path is reused. - **Injection (SQL / command / template / path)**: none introduced. The new branch is `if r.verdict == verdictClean { continue }` — `verdict` is a typed enum (`consolidate.go:9‑16`) set internally by `parseVerdict`, never built from external strings in this codepath. No new format strings, queries, or shell calls. - **SSRF**: no change. The base URL still comes from `GADFLY_FINDINGS_URL` (emit.go:97); the `continue` does not alter URL construction or fetch targets. - **Unsafe deserialization**: no change. JSON marshalling is the same; the new branch reduces (not expands) what is sent. - **Secret / credential leakage**: unchanged. No new env reads or log lines that would echo the token. - **Input validation**: no new untrusted surface. The verdict comparison is over a constrained enum, and the markdown‑extraction skip actually *reduces* the amount of untrusted lens output forwarded to the store. - **Insecure defaults**: telemetry still defaults to off (emit.go:98‑100) and never touches stdout or the exit code (per the file header, emit.go:7‑11). **Side‑effect worth noting (informational, not a finding)** A model that opens with the literal phrase "no material issue" but later describes a real problem will now have its findings suppressed in `/reports` (the store still gets `/runs`, and stdout output to the user is unchanged). This is a pre‑existing property of `parseVerdict` (lead‑phrase wins, `consolidate.go:33‑54`) and not introduced by this PR — flagging only because the security lens is supposed to trace tainted data; here the "tainted" data is the model's self‑reported verdict text, and the trust boundary has not moved. No issues to report under the security lens. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **VERDICT: No material issues found** Through the correctness lens, the change is straightforward and sound: - `verdictClean` is correctly defined at `cmd/gadfly/consolidate.go:13`, and `specialistResult.verdict` carries it through from `parseVerdict` (`cmd/gadfly/main.go:265`) to the emit loop. - The new guard sits after the `errored` skip and before `parseFindings`, so a clean lens's path:line references cannot reach the reports payload — verified by reading `cmd/gadfly/emit.go:120-148`. - `/runs` is posted outside the per-lens loop (`emit.go:153`), so per-model run/throughput stats remain unaffected as claimed. `/reports` is gated by `len(reports) > 0` (`emit.go:156`); suppressing findings in the loop correctly suppresses the POST. - The test (`emit_test.go:185-220`) constructs a clean-verdict result whose markdown *does* contain a path:line reference (`run/executor.go:166`) and asserts `/reports` is hit 0 times while `/runs` is hit once — that directly exercises the new branch and would fail if the guard were missing (the existing `parseFindings` regex would otherwise happily extract it, as `TestParseFindings_FallbackParagraphTitle` demonstrates). Semantic note (not a bug, just verification): the guard trusts `parseVerdict`'s earliest-phrase rule. If a model ever writes "No material issues found" and then lists real problems, the guard suppresses them. That's a deliberate design choice documented in the PR ("a clean-verdict lens emits no findings") and consistent with how `worstVerdict` already treats self-reported verdicts as authoritative. No unit-level test asserts this contract today, but the PR description and README explicitly call it out, and a contradictory self-verdict is a model-behavior problem rather than a telemetry bug. Nothing else to flag within correctness. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> I've reviewed the diff and the surrounding code. Here's my assessment through the error handling & edge cases lens. **Verification performed:** - Read `cmd/gadfly/emit.go` (full function around the change at line 119–148; the `errored` short-circuit at 121 precedes the new `verdictClean` check, so ordering is correct) - Read `cmd/gadfly/consolidate.go` to confirm `verdictClean` is the right symbol and how `parseVerdict` produces it - Read `cmd/gadfly/main.go:234–268` to confirm `specialistResult.errored` and `verdict` are always set together (panic recovery sets `verdictUnknown, errored=true`) - Read the full new test plus surrounding tests in `cmd/gadfly/emit_test.go` - Verified the existing `TestParseFindings_FallbackParagraphTitle` still tests `parseFindings` directly, so it is unaffected by the new `emit()` gate **Findings:** none material. Notes on edge-case behavior I checked, all correct: - `r.errored` is checked before `r.verdict == verdictClean`, so a panic-recovered lens (`errored=true, verdictUnknown`) is already short-circuited. - `parseFindings` already returns nil for empty/whitespace output and the `⚠️` failure notice, so the gate is defense-in-depth rather than a correctness fix. - `verdictUnknown` (the only remaining un-gated path) is the right carve-out: it covers lenses whose output couldn't be parsed as a verdict, and those may still contain real findings worth recording. - `runID` formatting, atomic counters in the test handler, `t.Setenv` cleanup, and `defer srv.Close()` are all in order. **VERDICT: No material issues found** </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 29s</sub>

🪰 Gadfly review — glm-5.1:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

The change is a straightforward, type-safe gate: r.verdict == verdictClean compares two verdict-typed ints against a const, so there's no string injection risk. The logic correctly:

  • Still posts /runs for a clean lens (no auth or telemetry gap).
  • Skips /reports only when verdict is verdictClean; verdictUnknown (unparseable output) falls through to the findings path — the safer default.
  • Reduces outbound data, which is strictly a surface reduction, not an expansion.

I verified the verdict type definition (consolidate.go:9-16), the parseVerdict function, the postJSON auth handling, and the test's coverage of the new branch. No auth bypass, injection, SSRF, secret leakage, or unsafe-untrusted-data path is introduced.

🎯 Correctness — No material issues found

VERDICT: No material issues found

The change is logically sound:

  • verdictClean (value 1, defined in consolidate.go:13) is the correct constant to check — it maps to "No material issues found" and is only assigned when parseVerdict detects that exact phrase in a lens's output.
  • The continue at emit.go:130 is placed correctly: after the errored skip (line 121–123) and before r.verdict.label() and parseFindings (lines 132–133), so no severity label or findings are produced for clean lenses.
  • The /runs POST (line 153) remains unconditional — only the /reports section is gated by the continue. This matches the stated intent ("the run itself is still recorded").
  • When all lenses are clean, reports stays nil, so len(reports) > 0 (line 156) prevents a /reports POST entirely, consistent with the test's assertion of 0 /reports calls.
  • verdictUnknown lenses are intentionally unchanged (their path:line references might be real issues expressed in an unrecognized format), matching the PR description.
🧯 Error handling & edge cases — No material issues found

VERDICT: No material issues found

I verified the following through code inspection:

  • verdictClean constant and type: Defined in consolidate.go:13 as verdictClean (iota value 1, distinct from verdictUnknown=0), so the == verdictClean comparison won't accidentally match a zero-initialized or unknown verdict.
  • Ordering vs. errored check: The r.errored guard (line 121) runs before the verdictClean guard (line 129), so an errored lens is always skipped regardless of verdict.
  • verdictUnknown lenses: Not equal to verdictClean (different iota values), so unknown-verdict lenses still emit findings — correct, since they may contain real problems.
  • Test correctness: TestEmit_SkipsCleanVerdictLens constructs a specialistResult{verdict: verdictClean} with path:line references in the markdown, and asserts /runs is hit once (run still recorded) but /reports is never hit (no findings extracted). The test server and env setup are consistent with the existing TestEmit_PostsRunsAndReports.
  • No swallowed errors or missing cleanup: The continue skips only findings extraction for clean lenses; the /runs POST and all error logging are unaffected.

No error-handling or edge-case issues in this change.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 16s

<!-- gadfly-review:ollama:glm-5.1:cloud --> ### 🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **VERDICT: No material issues found** The change is a straightforward, type-safe gate: `r.verdict == verdictClean` compares two `verdict`-typed ints against a const, so there's no string injection risk. The logic correctly: - Still posts `/runs` for a clean lens (no auth or telemetry gap). - Skips `/reports` only when verdict is `verdictClean`; `verdictUnknown` (unparseable output) falls through to the findings path — the safer default. - Reduces outbound data, which is strictly a surface reduction, not an expansion. I verified the verdict type definition (`consolidate.go:9-16`), the `parseVerdict` function, the `postJSON` auth handling, and the test's coverage of the new branch. No auth bypass, injection, SSRF, secret leakage, or unsafe-untrusted-data path is introduced. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **VERDICT: No material issues found** The change is logically sound: - `verdictClean` (value 1, defined in `consolidate.go:13`) is the correct constant to check — it maps to "No material issues found" and is only assigned when `parseVerdict` detects that exact phrase in a lens's output. - The `continue` at `emit.go:130` is placed correctly: after the `errored` skip (line 121–123) and before `r.verdict.label()` and `parseFindings` (lines 132–133), so no severity label or findings are produced for clean lenses. - The `/runs` POST (line 153) remains unconditional — only the `/reports` section is gated by the `continue`. This matches the stated intent ("the run itself is still recorded"). - When all lenses are clean, `reports` stays nil, so `len(reports) > 0` (line 156) prevents a `/reports` POST entirely, consistent with the test's assertion of 0 `/reports` calls. - `verdictUnknown` lenses are intentionally unchanged (their path:line references might be real issues expressed in an unrecognized format), matching the PR description. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> **VERDICT: No material issues found** I verified the following through code inspection: - **`verdictClean` constant and type**: Defined in `consolidate.go:13` as `verdictClean` (iota value 1, distinct from `verdictUnknown`=0), so the `== verdictClean` comparison won't accidentally match a zero-initialized or unknown verdict. ✅ - **Ordering vs. `errored` check**: The `r.errored` guard (line 121) runs before the `verdictClean` guard (line 129), so an errored lens is always skipped regardless of verdict. ✅ - **`verdictUnknown` lenses**: Not equal to `verdictClean` (different iota values), so unknown-verdict lenses still emit findings — correct, since they may contain real problems. ✅ - **Test correctness**: `TestEmit_SkipsCleanVerdictLens` constructs a `specialistResult{verdict: verdictClean}` with path:line references in the markdown, and asserts `/runs` is hit once (run still recorded) but `/reports` is never hit (no findings extracted). The test server and env setup are consistent with the existing `TestEmit_PostsRunsAndReports`. ✅ - **No swallowed errors or missing cleanup**: The `continue` skips only findings extraction for clean lenses; the `/runs` POST and all error logging are unaffected. ✅ No error-handling or edge-case issues in this change. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 16s</sub>

🪰 Gadfly review — m5/qwen3.6:35b-mlx (m5)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

Summary: Through the 🔒 Security lens, the diff introduces a single continue guard (if r.verdict == verdictClean { continue }) in cmd/gadfly/emit.go:129-130 that skips building reportPayload entries for clean-verdict lenses. No new code paths, no new input sources, no change to auth/token handling or HTTP behavior.

Verified details:

  • GADFLY_FINDINGS_URL (env var) → base string at emit.go:97; appended with /runs and /reports. Standard stdlib net/http POST with a fixed 10s timeout (findingsHTTPTimeout). No change.
  • GADFLY_FINDINGS_TOKEN passed to postJSON unchanged. No new secret exposure or leakage introduced.
  • verdictClean is an internal const in consolidate.go:13; the check is a simple integer comparison against a field already present on specialistResult.
  • The parseFindings function (which extracts path:line references via regex) is still called for non-clean lenses exactly as before.

This change does not introduce any authn/authz gaps, injection vectors, SSRF risks, secret leakage, or other security regressions. It is a benign data-quality guardrail on the existing best-effort telemetry pipeline.

🎯 Correctness — No material issues found

No material issues found

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

Through the Error handling & edge cases lens, this change is sound:

  • Logic correctness: The new verdictClean check (emit.go:129) sits correctly between the existing errored early-exit and the parseFindings call. When a lens reports clean, findings extraction is skipped entirely — preventing false-positive path:line references (verification notes) from polluting the /reports store. Verified against consolidate.go:13 where verdictClean is defined as the second iota value (clean=1 < minor=2 < blocking=3), and cross-referenced all other uses in the codebase (lens_concurrency_test.go, specialists_test.go) to confirm it's the intended "No material issues found" sentinel.

  • Edge cases handled:

    • All lenses clean: reports remains empty; /runs is still POSTed (line 153 fires unconditionally) while /reports is skipped by the len(reports) > 0 guard (line 156). No change in behavior — correct.
    • Mixed lenses (some clean, some not): Only non-clean/non-errored lenses contribute findings. The existing error handling (r.errored continue, line 121) still fires first, preserving the prior contract that errored lenses emit nothing.
    • Empty results slice: Loop body never enters; identical to pre-change behavior. No nil-pointer risk since results is []specialistResult (value type), not pointer slice.
  • Test coverage: TestEmit_SkipsCleanVerdictLens directly verifies the intended behavior — clean lens → /runs posted, /reports not. The test sets up a realistic clean markdown with path:line verification notes and confirms zero findings are emitted. Good test design.

  • Error handling preserved: No new error paths introduced. The postJSON timeout (10s stdlib), stderr-only logging, and best-effort semantics all remain untouched by this control-flow addition.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 23s

<!-- gadfly-review:ollama:m5/qwen3.6:35b-mlx --> ### 🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## No material issues found **Summary:** Through the 🔒 Security lens, the diff introduces a single `continue` guard (`if r.verdict == verdictClean { continue }`) in `cmd/gadfly/emit.go:129-130` that skips building `reportPayload` entries for clean-verdict lenses. No new code paths, no new input sources, no change to auth/token handling or HTTP behavior. **Verified details:** - `GADFLY_FINDINGS_URL` (env var) → `base` string at `emit.go:97`; appended with `/runs` and `/reports`. Standard stdlib `net/http` POST with a fixed 10s timeout (`findingsHTTPTimeout`). No change. - `GADFLY_FINDINGS_TOKEN` passed to `postJSON` unchanged. No new secret exposure or leakage introduced. - `verdictClean` is an internal const in `consolidate.go:13`; the check is a simple integer comparison against a field already present on `specialistResult`. - The `parseFindings` function (which extracts `path:line` references via regex) is still called for non-clean lenses exactly as before. This change does not introduce any authn/authz gaps, injection vectors, SSRF risks, secret leakage, or other security regressions. It is a benign data-quality guardrail on the existing best-effort telemetry pipeline. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> No material issues found </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> **Verdict:** No material issues found Through the Error handling & edge cases lens, this change is sound: - **Logic correctness**: The new `verdictClean` check (`emit.go:129`) sits correctly between the existing `errored` early-exit and the `parseFindings` call. When a lens reports clean, findings extraction is skipped entirely — preventing false-positive `path:line` references (verification notes) from polluting the `/reports` store. Verified against `consolidate.go:13` where `verdictClean` is defined as the second iota value (clean=1 < minor=2 < blocking=3), and cross-referenced all other uses in the codebase (`lens_concurrency_test.go`, `specialists_test.go`) to confirm it's the intended "No material issues found" sentinel. - **Edge cases handled**: - *All lenses clean*: `reports` remains empty; `/runs` is still POSTed (line 153 fires unconditionally) while `/reports` is skipped by the `len(reports) > 0` guard (line 156). No change in behavior — correct. - *Mixed lenses* (some clean, some not): Only non-clean/non-errored lenses contribute findings. The existing error handling (`r.errored` continue, line 121) still fires first, preserving the prior contract that errored lenses emit nothing. - *Empty results slice*: Loop body never enters; identical to pre-change behavior. No nil-pointer risk since `results` is `[]specialistResult` (value type), not pointer slice. - **Test coverage**: `TestEmit_SkipsCleanVerdictLens` directly verifies the intended behavior — clean lens → `/runs` posted, `/reports` not. The test sets up a realistic clean markdown with `path:line` verification notes and confirms zero findings are emitted. Good test design. - **Error handling preserved**: No new error paths introduced. The `postJSON` timeout (10s stdlib), stderr-only logging, and best-effort semantics all remain untouched by this control-flow addition. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 8m 23s</sub>

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

  • The diff adds a single continue guard (cmd/gadfly/emit.go:129) that skips findings extraction for lenses whose verdict is verdictClean. From a security standpoint this is a pure output-filter narrowing: it reduces what is posted to the telemetry endpoint, introduces no new inputs, no parsing, no deserialization, no auth/authz surface, and no change to secret/token handling (GADFLY_FINDINGS_TOKEN is read identically before/after). Verified by reading emit.go:100-161 and the verdict definitions in consolidate.go:12-15verdictClean is an internal enum value derived from the model's parsed output, not from any untrusted external input, so it cannot be tainted/injected to suppress findings maliciously.
  • The new test (emit_test.go) uses httptest over localhost and t.Setenv for config; no credentials are logged or leaked, and the test server returns canned responses. Nothing insecure.
  • README change only documents behavior; no security-relevant default changes.
🎯 Correctness — No material issues found

VERDICT: No material issues found

  • Verified emit.go:129 against consolidate.go:11-16: verdictClean is the correct constant for the "No material issues found" verdict, and r.verdict is the field actually populated by parseVerdict in main.go:265. The continue correctly skips findings extraction while leaving the run record intact (the /runs POST at line 153 is outside the loop and unaffected).
  • Verified the loop logic: the new verdictClean check sits between the existing errored guard and the findings-extraction block, so errored lenses and minor/blocking/unknown lenses are unaffected exactly as the PR description claims.
  • The "improvements" lens is advisory and can land on verdictBlocking without escalating the overall verdict (consolidate.go:69), but it was never special-cased in emit before this change either — and now, if it reports clean ("No suggestions."), it correctly emits no findings, which is the intended behavior. No regression.
  • Test TestEmit_SkipsCleanVerdictLens is consistent with the implementation: it constructs a specialistResult with verdict: verdictClean containing a path:line reference and asserts /runs is hit once and /reports zero times. This matches the code path I verified.
🧯 Error handling & edge cases — No material issues found

No material issues found

  • Verified cmd/gadfly/consolidate.go:11-16 for the verdict ordering: verdictUnknown=0, verdictClean=1, verdictMinor=2, verdictBlocking=3. The new guard r.verdict == verdictClean correctly targets only the clean case; verdictUnknown (unparseable, non-errored) lenses still emit findings as intended, matching the PR description.
  • Verified the panic/errored path at cmd/gadfly/main.go:258: an errored lens sets verdict: verdictUnknown, errored: true, so it is already skipped by the pre-existing r.errored check before reaching the new verdictClean guard — no double-skip or misclassification.
  • The new guard is a pure value comparison on a value-type field (verdict is int), so there is no nil/zero-value risk; an unset specialistResult.verdict defaults to verdictUnknown (0), which is intentionally not skipped, so a malformed result still contributes findings rather than being silently dropped.
  • The test TestEmit_SkipsCleanVerdictLens covers the clean-with-verification-notes edge case (markdown contains a path:line reference but verdict is clean) and correctly asserts /runs is still posted while /reports is not.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 16s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> VERDICT: No material issues found - The diff adds a single `continue` guard (`cmd/gadfly/emit.go:129`) that skips findings extraction for lenses whose verdict is `verdictClean`. From a security standpoint this is a pure output-filter narrowing: it reduces what is posted to the telemetry endpoint, introduces no new inputs, no parsing, no deserialization, no auth/authz surface, and no change to secret/token handling (`GADFLY_FINDINGS_TOKEN` is read identically before/after). Verified by reading `emit.go:100-161` and the verdict definitions in `consolidate.go:12-15` — `verdictClean` is an internal enum value derived from the model's parsed output, not from any untrusted external input, so it cannot be tainted/injected to suppress findings maliciously. - The new test (`emit_test.go`) uses `httptest` over localhost and `t.Setenv` for config; no credentials are logged or leaked, and the test server returns canned responses. Nothing insecure. - README change only documents behavior; no security-relevant default changes. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **VERDICT: No material issues found** - Verified `emit.go:129` against `consolidate.go:11-16`: `verdictClean` is the correct constant for the "No material issues found" verdict, and `r.verdict` is the field actually populated by `parseVerdict` in `main.go:265`. The `continue` correctly skips findings extraction while leaving the run record intact (the `/runs` POST at line 153 is outside the loop and unaffected). - Verified the loop logic: the new `verdictClean` check sits between the existing `errored` guard and the findings-extraction block, so errored lenses and minor/blocking/unknown lenses are unaffected exactly as the PR description claims. - The "improvements" lens is advisory and can land on `verdictBlocking` without escalating the overall verdict (`consolidate.go:69`), but it was never special-cased in `emit` before this change either — and now, if it reports clean ("No suggestions."), it correctly emits no findings, which is the intended behavior. No regression. - Test `TestEmit_SkipsCleanVerdictLens` is consistent with the implementation: it constructs a `specialistResult` with `verdict: verdictClean` containing a `path:line` reference and asserts `/runs` is hit once and `/reports` zero times. This matches the code path I verified. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> No material issues found - Verified `cmd/gadfly/consolidate.go:11-16` for the verdict ordering: `verdictUnknown=0, verdictClean=1, verdictMinor=2, verdictBlocking=3`. The new guard `r.verdict == verdictClean` correctly targets only the clean case; `verdictUnknown` (unparseable, non-errored) lenses still emit findings as intended, matching the PR description. - Verified the panic/errored path at `cmd/gadfly/main.go:258`: an errored lens sets `verdict: verdictUnknown, errored: true`, so it is already skipped by the pre-existing `r.errored` check before reaching the new `verdictClean` guard — no double-skip or misclassification. - The new guard is a pure value comparison on a value-type field (`verdict` is `int`), so there is no nil/zero-value risk; an unset `specialistResult.verdict` defaults to `verdictUnknown` (0), which is intentionally *not* skipped, so a malformed result still contributes findings rather than being silently dropped. - The test `TestEmit_SkipsCleanVerdictLens` covers the clean-with-verification-notes edge case (markdown contains a `path:line` reference but verdict is clean) and correctly asserts `/runs` is still posted while `/reports` is not. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 16s</sub>

🪰 Gadfly review — claude-code/sonnet (claude-code)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

The security-relevant change here is narrow: a new continue branch in emit() (emit.go:129) that short-circuits findings extraction when r.verdict == verdictClean. I verified the full data-flow:

  • The verdict is already parsed from LLM output by parseVerdict() (consolidate.go:33–54) before emit() is called. The new branch compares against a Go iota constant — no new taint path is introduced.
  • The continue is a reduction in scope, not an expansion: a clean-verdict lens contributes no entries to reports, so no new data reaches the external telemetry endpoint.
  • No new environment variables, network endpoints, regex patterns, or deserialization surfaces are added.

The broader codebase has pre-existing questions (LLM-controlled file paths POSTed to the findings store without path validation, verdict-spoofing via substring injection) but those are untouched by this diff and belong to a separate audit.

🎯 Correctness — No material issues found

No material issues found

Verified against the actual code:

  • verdictClean is correctly identified. consolidate.go:13 defines verdictClean as the iota value for "No material issues found", and parseVerdict (consolidate.go:33–54) sets it when "no material issue" appears earliest in the output. The new continue at emit.go:129–131 correctly targets exactly this case.

  • Ordering of guards is correct. The r.errored check (emit.go:121) fires first, the r.verdict == verdictClean check second. These two paths are disjoint in practice (errored results carry verdictUnknown, not verdictClean), and even if both flags were somehow set simultaneously the earlier guard handles it cleanly — no interaction bug.

  • /runs is always posted; /reports is conditional. emit() posts /runs unconditionally (line 153), then posts /reports only when len(reports) > 0 (line 156). For a clean lens, the new continue means no reportPayload is appended → reports stays empty → /reports is never called. This is precisely what the test asserts, and the logic is sound.

  • verdictUnknown lenses still contribute findings — they are not verdictClean (iota 0 vs 1), so the new guard doesn't silence them. Whether that's ideal is a pre-existing design question, not a regression from this change.

  • Test is correct. TestEmit_SkipsCleanVerdictLens constructs a specialistResult with verdict: verdictClean and a markdown body containing a path:line reference, then asserts exactly 1 /runs hit and 0 /reports hits — matching the intended invariant.

🧯 Error handling & edge cases — No material issues found

I now have everything I need. The control flow is clear: findings are collected in a loop first, then /runs is posted once outside the loop, then /reports only if len(reports) > 0. Let me render the review.


Verdict: No material issues found

The change is structurally sound through the error-handling & edge-cases lens:

  • Control-flow ordering is correct. The /runs POST happens after the results loop (line 153), so the continue at line 129 only skips appending to reports — it does not suppress the run record. Verified by reading emit.go:119–161.

  • Zero-value safety is not a concern. verdictClean = 1 (second in the iota); the zero value is verdictUnknown = 0. An accidentally-uninitialized verdict cannot silently match the clean guard. Verified by reading consolidate.go.

  • Errored-then-clean ordering is safe. The r.errored guard at line 121 precedes the clean check — an errored lens with a coincidentally-clean verdict would already be skipped. No double-path issue.

  • Empty-reports guard is intact. The existing if len(reports) > 0 at line 156 already handles the all-clean/all-errored case and prevents a vacuous /reports POST. The new path falls cleanly through this guard.

  • Test is race-free. emit() is synchronous (single-threaded callers), and the test correctly uses atomic.AddInt32/LoadInt32 as a defensive measure; httptest.NewServer handles requests serially in a goroutine the HTTP client blocks on. No timing window.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 52s

<!-- gadfly-review:ollama:claude-code/sonnet --> ### 🪰 Gadfly review — `claude-code/sonnet` (claude-code) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **No material issues found** The security-relevant change here is narrow: a new `continue` branch in `emit()` (`emit.go:129`) that short-circuits findings extraction when `r.verdict == verdictClean`. I verified the full data-flow: - The verdict is already parsed from LLM output by `parseVerdict()` (consolidate.go:33–54) before `emit()` is called. The new branch compares against a Go `iota` constant — no new taint path is introduced. - The `continue` is a reduction in scope, not an expansion: a clean-verdict lens contributes no entries to `reports`, so no new data reaches the external telemetry endpoint. - No new environment variables, network endpoints, regex patterns, or deserialization surfaces are added. The broader codebase has pre-existing questions (LLM-controlled file paths POSTed to the findings store without path validation, verdict-spoofing via substring injection) but those are untouched by this diff and belong to a separate audit. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **No material issues found** Verified against the actual code: - **`verdictClean` is correctly identified.** `consolidate.go:13` defines `verdictClean` as the iota value for "No material issues found", and `parseVerdict` (consolidate.go:33–54) sets it when "no material issue" appears earliest in the output. The new `continue` at `emit.go:129–131` correctly targets exactly this case. - **Ordering of guards is correct.** The `r.errored` check (emit.go:121) fires first, the `r.verdict == verdictClean` check second. These two paths are disjoint in practice (errored results carry `verdictUnknown`, not `verdictClean`), and even if both flags were somehow set simultaneously the earlier guard handles it cleanly — no interaction bug. - **`/runs` is always posted; `/reports` is conditional.** `emit()` posts `/runs` unconditionally (line 153), then posts `/reports` only when `len(reports) > 0` (line 156). For a clean lens, the new `continue` means no `reportPayload` is appended → `reports` stays empty → `/reports` is never called. This is precisely what the test asserts, and the logic is sound. - **`verdictUnknown` lenses still contribute findings** — they are not `verdictClean` (iota 0 vs 1), so the new guard doesn't silence them. Whether that's ideal is a pre-existing design question, not a regression from this change. - **Test is correct.** `TestEmit_SkipsCleanVerdictLens` constructs a `specialistResult` with `verdict: verdictClean` and a markdown body containing a `path:line` reference, then asserts exactly 1 `/runs` hit and 0 `/reports` hits — matching the intended invariant. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> I now have everything I need. The control flow is clear: findings are collected in a loop first, then `/runs` is posted once outside the loop, then `/reports` only if `len(reports) > 0`. Let me render the review. --- **Verdict: No material issues found** The change is structurally sound through the error-handling & edge-cases lens: - **Control-flow ordering is correct.** The `/runs` POST happens *after* the results loop (line 153), so the `continue` at line 129 only skips appending to `reports` — it does not suppress the run record. Verified by reading `emit.go:119–161`. - **Zero-value safety is not a concern.** `verdictClean = 1` (second in the `iota`); the zero value is `verdictUnknown = 0`. An accidentally-uninitialized verdict cannot silently match the clean guard. Verified by reading `consolidate.go`. - **Errored-then-clean ordering is safe.** The `r.errored` guard at line 121 precedes the clean check — an errored lens with a coincidentally-clean verdict would already be skipped. No double-path issue. - **Empty-reports guard is intact.** The existing `if len(reports) > 0` at line 156 already handles the all-clean/all-errored case and prevents a vacuous `/reports` POST. The new path falls cleanly through this guard. - **Test is race-free.** `emit()` is synchronous (single-threaded callers), and the test correctly uses `atomic.AddInt32`/`LoadInt32` as a defensive measure; `httptest.NewServer` handles requests serially in a goroutine the HTTP client blocks on. No timing window. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 52s</sub>

🪰 Gadfly review — kimi-k2.7-code:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found

I reviewed the change through the 🔒 Security lens. The PR only affects telemetry emission logic:

  • It adds a verdictClean skip in cmd/gadfly/emit.go:129 so clean lenses don't POST heuristic path:line extractions as findings.
  • It adds a test in cmd/gadfly/emit_test.go:185 and updates README documentation.

I verified:

  • verdictClean is declared in cmd/gadfly/consolidate.go:13 and the comparison is well-typed.
  • The HTTP client setup, token handling via GADFLY_FINDINGS_TOKEN, 10-second timeout, and stderr-only error logging in emit.go:150–160 are unchanged.
  • No untrusted data is parsed, executed, deserialized, or used to construct paths/URLs/DB queries.
  • No secrets are logged or exposed in new code paths.

The change does not introduce authn/authz gaps, injection, SSRF, unsafe deserialization, credential leakage, missing input validation, or insecure defaults. From a security perspective, this PR is clean.

🎯 Correctness — No material issues found

No material issues found.

🧯 Error handling & edge cases — No material issues found

No material issues found

No error-handling or edge-case issues in this change. Verified:

  • The clean-verdict skip is evaluated after the r.errored guard (cmd/gadfly/emit.go:121), so errored lenses still contribute nothing and a clean-but-errored result does not leak into findings.
  • The skip occurs before severity lookup / parseFindings (cmd/gadfly/emit.go:129), so a clean lens correctly emits zero reports while the /runs payload is still posted.
  • Empty or nil results behaves the same as before: the loop is a no-op, /runs is still posted.
  • The new TestEmit_SkipsCleanVerdictLens exercises the mixed clean+path:line case and asserts /reports is not hit.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 4s

<!-- gadfly-review:ollama:kimi-k2.7-code:cloud --> ### 🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **No material issues found** I reviewed the change through the 🔒 Security lens. The PR only affects telemetry emission logic: - It adds a `verdictClean` skip in `cmd/gadfly/emit.go:129` so clean lenses don't POST heuristic `path:line` extractions as findings. - It adds a test in `cmd/gadfly/emit_test.go:185` and updates README documentation. I verified: - `verdictClean` is declared in `cmd/gadfly/consolidate.go:13` and the comparison is well-typed. - The HTTP client setup, token handling via `GADFLY_FINDINGS_TOKEN`, 10-second timeout, and stderr-only error logging in `emit.go:150–160` are unchanged. - No untrusted data is parsed, executed, deserialized, or used to construct paths/URLs/DB queries. - No secrets are logged or exposed in new code paths. The change does not introduce authn/authz gaps, injection, SSRF, unsafe deserialization, credential leakage, missing input validation, or insecure defaults. From a security perspective, this PR is clean. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> No material issues found. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> No material issues found No error-handling or edge-case issues in this change. Verified: - The clean-verdict skip is evaluated **after** the `r.errored` guard (`cmd/gadfly/emit.go:121`), so errored lenses still contribute nothing and a clean-but-errored result does not leak into findings. - The skip occurs **before** severity lookup / `parseFindings` (`cmd/gadfly/emit.go:129`), so a clean lens correctly emits zero reports while the `/runs` payload is still posted. - Empty or nil `results` behaves the same as before: the loop is a no-op, `/runs` is still posted. - The new `TestEmit_SkipsCleanVerdictLens` exercises the mixed clean+path:line case and asserts `/reports` is not hit. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 4s</sub>

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

The change is purely a telemetry filtering decision — it skips findings extraction for lenses whose self-declared verdict is verdictClean. I reviewed the full emit.go, consolidate.go, and both test files. Through the security lens:

  • No new input processing: The verdictClean check (r.verdict == verdictClean) is an integer comparison against an enum derived from parseVerdict(), which itself does substring matching on AI-generated output. No new untrusted data enters the system.
  • No injection surface: The change doesn't alter how GADFLY_FINDINGS_URL, GADFLY_REPO, GADFLY_PR, GADFLY_MODEL, or GADFLY_FINDINGS_TOKEN are read or used. The runID construction (fmt.Sprintf("%s#%d:%s", ...)) is pre-existing and goes into a JSON field, not a URL path or query.
  • No secret exposure: The Bearer token handling in postJSON is untouched. No new logging or error paths that could leak the token.
  • No authz bypass: The /reports endpoint is still only called when there are actual findings to report. Skipping clean lenses doesn't grant or deny access to anything.
  • No SSRF change: GADFLY_FINDINGS_URL usage is unchanged.

The one theoretical concern — that a lens could be manipulated into declaring itself "clean" to suppress findings — is not a security issue here: the lens output is AI-generated (not attacker-controlled), the telemetry is explicitly best-effort/advisory, and the actual review output to stdout is unaffected.

🎯 Correctness — No material issues found

No material issues found

The change is a single, well-scoped guard: when a lens self-reports verdictClean, skip parseFindings for that lens. I verified the following against the checked-out code:

  • verdictClean constant (consolidate.go:13) is iota value 1, and the comparison r.verdict == verdictClean (emit.go:129) is correct.
  • Placement relative to errored (emit.go:121-123): the errored skip fires first, so a lens that both errored and somehow parsed as clean is still excluded — no regression.
  • /runs still posts: the run payload is built before the per-lens loop and posted unconditionally at line 153. The test (emit_test.go:214-216) confirms this.
  • /reports skipped when all lenses are clean: reports stays nil, and the len(reports) > 0 guard at line 156 prevents the POST. The test confirms zero /reports calls.
  • Non-clean verdicts unaffected: verdictMinor, verdictBlocking, and verdictUnknown all fall through to the existing parseFindings path.
  • parseVerdict leniency: the pre-existing earliest-match logic could theoretically misclassify a lens as clean if "no material issue" appears before a real finding phrase. That's a pre-existing concern in parseVerdict, not introduced here, and the PR's tradeoff (trust the lens's self-declared verdict) is explicitly the intended design.
🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

The change is minimal and well-defended. After reading emit.go in full and tracing the control flow:

  • The verdictClean skip is correctly placed after the r.errored guard (emit.go:121-131). An errored lens is already excluded before we reach the new check, so there's no double-counting or ordering ambiguity.
  • verdictUnknown (value 0) is intentionally not skipped — it's distinct from verdictClean (value 1), so a lens whose verdict couldn't be parsed still flows through parseFindings. This is conservative and matches the PR description ("Errored and minor/blocking/unknown lenses are unchanged").
  • The len(reports) > 0 guard at line 156 already prevents an empty /reports POST, so the new skip is belt-and-suspenders: it prevents parseFindings from being called at all for clean lenses, and even if something slipped through, an empty reports slice wouldn't hit the wire.
  • No nil, zero-value, or overflow riskr is a value type in the range loop, verdict is an int-based enum, and the comparison is a plain equality check.
  • The test (emit_test.go:185-220) correctly verifies that /runs is still posted (the run is recorded) while /reports is not, using markdown that contains a path:line reference a naive parser would extract.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 46s

<!-- gadfly-review:ollama:deepseek-v4-pro:cloud --> ### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## Verdict: No material issues found The change is purely a telemetry filtering decision — it skips findings extraction for lenses whose self-declared verdict is `verdictClean`. I reviewed the full `emit.go`, `consolidate.go`, and both test files. Through the security lens: - **No new input processing**: The `verdictClean` check (`r.verdict == verdictClean`) is an integer comparison against an enum derived from `parseVerdict()`, which itself does substring matching on AI-generated output. No new untrusted data enters the system. - **No injection surface**: The change doesn't alter how `GADFLY_FINDINGS_URL`, `GADFLY_REPO`, `GADFLY_PR`, `GADFLY_MODEL`, or `GADFLY_FINDINGS_TOKEN` are read or used. The `runID` construction (`fmt.Sprintf("%s#%d:%s", ...)`) is pre-existing and goes into a JSON field, not a URL path or query. - **No secret exposure**: The Bearer token handling in `postJSON` is untouched. No new logging or error paths that could leak the token. - **No authz bypass**: The `/reports` endpoint is still only called when there are actual findings to report. Skipping clean lenses doesn't grant or deny access to anything. - **No SSRF change**: `GADFLY_FINDINGS_URL` usage is unchanged. The one theoretical concern — that a lens could be manipulated into declaring itself "clean" to suppress findings — is not a security issue here: the lens output is AI-generated (not attacker-controlled), the telemetry is explicitly best-effort/advisory, and the actual review output to stdout is unaffected. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **No material issues found** The change is a single, well-scoped guard: when a lens self-reports `verdictClean`, skip `parseFindings` for that lens. I verified the following against the checked-out code: - **`verdictClean` constant** (`consolidate.go:13`) is `iota` value 1, and the comparison `r.verdict == verdictClean` (`emit.go:129`) is correct. - **Placement relative to `errored`** (`emit.go:121-123`): the errored skip fires first, so a lens that both errored and somehow parsed as clean is still excluded — no regression. - **`/runs` still posts**: the run payload is built before the per-lens loop and posted unconditionally at line 153. The test (`emit_test.go:214-216`) confirms this. - **`/reports` skipped when all lenses are clean**: `reports` stays nil, and the `len(reports) > 0` guard at line 156 prevents the POST. The test confirms zero `/reports` calls. - **Non-clean verdicts unaffected**: `verdictMinor`, `verdictBlocking`, and `verdictUnknown` all fall through to the existing `parseFindings` path. - **`parseVerdict` leniency**: the pre-existing earliest-match logic could theoretically misclassify a lens as clean if "no material issue" appears before a real finding phrase. That's a pre-existing concern in `parseVerdict`, not introduced here, and the PR's tradeoff (trust the lens's self-declared verdict) is explicitly the intended design. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> ## Verdict: No material issues found The change is minimal and well-defended. After reading `emit.go` in full and tracing the control flow: - **The `verdictClean` skip is correctly placed after the `r.errored` guard** (`emit.go:121-131`). An errored lens is already excluded before we reach the new check, so there's no double-counting or ordering ambiguity. - **`verdictUnknown` (value 0) is intentionally not skipped** — it's distinct from `verdictClean` (value 1), so a lens whose verdict couldn't be parsed still flows through `parseFindings`. This is conservative and matches the PR description ("Errored and minor/blocking/unknown lenses are unchanged"). - **The `len(reports) > 0` guard at line 156** already prevents an empty `/reports` POST, so the new skip is belt-and-suspenders: it prevents `parseFindings` from being called at all for clean lenses, and even if something slipped through, an empty reports slice wouldn't hit the wire. - **No nil, zero-value, or overflow risk** — `r` is a value type in the range loop, `verdict` is an `int`-based enum, and the comparison is a plain equality check. - **The test** (`emit_test.go:185-220`) correctly verifies that `/runs` is still posted (the run is recorded) while `/reports` is not, using markdown that contains a `path:line` reference a naive parser would extract. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 46s</sub>

🪰 Gadfly review — nemotron-3-super:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Security Lens Review: No Material Issues Found

After reviewing the changes through the security lens (focusing on authn/authz gaps, injection flaws, SSRF, unsafe deserialization, secret leakage, input validation, and insecure defaults), no material security issues were identified.

Findings Summary

  • No security-relevant changes: The modification solely adjusts telemetry emission logic to skip findings extraction for lenses reporting "No material issues found". This affects only advisory telemetry reporting (/reports endpoint), not the core security review functionality.
  • No change to attack surface:
    • The telemetry endpoint (GADFLY_FINDINGS_URL) remains gated by environment variable and unchanged in its implementation
    • Authentication (GADFLY_FINDINGS_TOKEN) and timeout handling are unaltered
    • Finding extraction logic (parseFindings) is unchanged for non-clean verdicts
  • No new taint paths: The change only adds an early continue for clean verdicts - it doesn't alter how untrusted lens output is processed or where it flows
  • Telemetry safety:
    • Reduces potential leakage of file paths/line numbers (by omitting verification notes from clean lenses)
    • Does not affect the actual review output (stdout) or exit code
    • Purely impacts advisory model quality tracking

Verification

  • Verified the added condition in emit.go (cmd/gadfly/emit.go:129-131 correctly skips report generation for verdictClean
  • Confirmed telemetry posting still occurs for /runs (run metadata) even when skipping /reports (findings)
  • Reviewed that finding extraction logic remains unchanged for non-clean lenses
  • No authentication/authorization mechanisms were modified
  • No input validation or sanitization code was altered

The change is purely a telemetry refinement to reduce false positives in findings data and does not introduce any security weaknesses.

Verdict: No material issues found

🎯 Correctness — No material issues found

No material issues found

(Note: Since no draft review findings were provided in the user's message, there are no findings to verify. The verdict is computed as "No material issues found" when no findings survive verification.)

🧯 Error handling & edge cases — No material issues found

No material issues found

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 6s

<!-- gadfly-review:ollama:nemotron-3-super:cloud --> ### 🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> ## Security Lens Review: No Material Issues Found After reviewing the changes through the security lens (focusing on authn/authz gaps, injection flaws, SSRF, unsafe deserialization, secret leakage, input validation, and insecure defaults), no material security issues were identified. ### Findings Summary - **No security-relevant changes**: The modification solely adjusts telemetry emission logic to skip findings extraction for lenses reporting "No material issues found". This affects only advisory telemetry reporting (`/reports` endpoint), not the core security review functionality. - **No change to attack surface**: - The telemetry endpoint (`GADFLY_FINDINGS_URL`) remains gated by environment variable and unchanged in its implementation - Authentication (`GADFLY_FINDINGS_TOKEN`) and timeout handling are unaltered - Finding extraction logic (`parseFindings`) is unchanged for non-clean verdicts - **No new taint paths**: The change only adds an early continue for clean verdicts - it doesn't alter how untrusted lens output is processed or where it flows - **Telemetry safety**: - Reduces potential leakage of file paths/line numbers (by omitting verification notes from clean lenses) - Does not affect the actual review output (stdout) or exit code - Purely impacts advisory model quality tracking ### Verification - Verified the added condition in `emit.go` (cmd/gadfly/emit.go:129-131 correctly skips report generation for `verdictClean` - Confirmed telemetry posting still occurs for `/runs` (run metadata) even when skipping `/reports` (findings) - Reviewed that finding extraction logic remains unchanged for non-clean lenses - No authentication/authorization mechanisms were modified - No input validation or sanitization code was altered The change is purely a telemetry refinement to reduce false positives in findings data and does not introduce any security weaknesses. **Verdict**: No material issues found </think> </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> No material issues found (Note: Since no draft review findings were provided in the user's message, there are no findings to verify. The verdict is computed as "No material issues found" when no findings survive verification.) </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> No material issues found </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 6s</sub>

🪰 Gadfly review — gpt-oss:120b-cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

  • The modification merely adds a check to skip emitting findings when a lens reports the "No material issues found" verdict.
  • No new handling of untrusted input, credential exposure, injection vectors, or insecure defaults is introduced.
  • All existing security‑related code paths (environment variable handling, HTTP POST logic, regular‑expression parsing) remain unchanged and continue to enforce the same validations.

Therefore, the change does not create any security regression.

🎯 Correctness — No material issues found

Verdict: No material issues found

  • The new early‑continue for verdictClean correctly prevents emission of findings while still allowing the run record to be posted.
  • No compile‑time problems: verdictClean is defined in the same package (cmd/gadfly) and requires no additional imports.
  • Logic flow remains sound: errored lenses are still skipped first, then clean lenses are skipped, leaving only non‑clean lenses to contribute findings.
  • Existing tests (TestEmit_SkipsCleanVerdictLens) confirm the intended behavior, and the implementation respects the maxFindingsPerLens and other limits unchanged.

No correctness defects identified.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

  • The new early‑return for clean lenses (if r.verdict == verdictClean { continue }) correctly prevents any findings from being emitted while still allowing the /runs telemetry to be posted.
  • All related error handling (skipping errored lenses, checking HTTP response status, and respecting the telemetry enable flag) remains unchanged.
  • No new edge cases (e.g., nil content, empty slices, zero/negative values) are introduced by this change, and the existing postJSON, envAtoi, and parsing logic continue to handle errors and boundary conditions as before.

All checks were verified by reading the updated emit.go implementation and confirming that the new logic does not introduce unchecked errors or missed cleanup.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 43s

<!-- gadfly-review:ollama:gpt-oss:120b-cloud --> ### 🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> **Verdict:** No material issues found - The modification merely adds a check to skip emitting findings when a lens reports the *"No material issues found"* verdict. - No new handling of untrusted input, credential exposure, injection vectors, or insecure defaults is introduced. - All existing security‑related code paths (environment variable handling, HTTP POST logic, regular‑expression parsing) remain unchanged and continue to enforce the same validations. Therefore, the change does not create any security regression. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **Verdict:** No material issues found - The new early‑continue for `verdictClean` correctly prevents emission of findings while still allowing the run record to be posted. - No compile‑time problems: `verdictClean` is defined in the same package (`cmd/gadfly`) and requires no additional imports. - Logic flow remains sound: errored lenses are still skipped first, then clean lenses are skipped, leaving only non‑clean lenses to contribute findings. - Existing tests (`TestEmit_SkipsCleanVerdictLens`) confirm the intended behavior, and the implementation respects the `maxFindingsPerLens` and other limits unchanged. No correctness defects identified. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> **Verdict: No material issues found** - The new early‑return for clean lenses (`if r.verdict == verdictClean { continue }`) correctly prevents any findings from being emitted while still allowing the `/runs` telemetry to be posted. - All related error handling (skipping errored lenses, checking HTTP response status, and respecting the telemetry enable flag) remains unchanged. - No new edge cases (e.g., nil content, empty slices, zero/negative values) are introduced by this change, and the existing `postJSON`, `envAtoi`, and parsing logic continue to handle errors and boundary conditions as before. All checks were verified by reading the updated `emit.go` implementation and confirming that the new logic does not introduce unchecked errors or missed cleanup. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 43s</sub>

🪰 Gadfly review — qwen3-coder:480b-cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

No material issues found.

🎯 Correctness — No material issues found

VERDICT: No material issues found

The change correctly implements the intended behavior of skipping findings extraction for lenses with a "No material issues found" verdict. The logic is sound and the new test case properly validates that:

  1. A clean lens does not emit findings (verified by checking that /reports is not called)
  2. The run is still recorded (verified by checking that /runs is called once)

The README update accurately documents this behavior. No correctness issues were found in the implementation or tests.

🧯 Error handling & edge cases — No material issues found

No material issues found.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 38s

<!-- gadfly-review:ollama:qwen3-coder:480b-cloud --> ### 🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> No material issues found. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> VERDICT: No material issues found The change correctly implements the intended behavior of skipping findings extraction for lenses with a "No material issues found" verdict. The logic is sound and the new test case properly validates that: 1. A clean lens does not emit findings (verified by checking that `/reports` is not called) 2. The run is still recorded (verified by checking that `/runs` is called once) The README update accurately documents this behavior. No correctness issues were found in the implementation or tests. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> No material issues found. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 38s</sub>

🪰 Gadfly review — gemma4:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

🎯 Correctness — No material issues found

VERDICT: No material issues found

The logic change correctly prevents findings extraction for lenses with a verdictClean status, which aligns with the goal of treating path:line references in "No material issues found" reviews as verification notes rather than bugs.

  • Verified that emit still proceeds to post to /runs even when a lens is skipped via the verdictClean check, as the /runs payload is constructed independently of the results loop (lines 108-117 in cmd/gadfly/emit.go) and posted afterward (line 153).
  • Verified that the new test TestEmit_SkipsCleanVerdictLens in cmd/gadfly/emit_test.go accurately simulates this behavior by asserting that /runs is called exactly once while /reports is not called.
  • The change is narrowly scoped to the telemetry extraction logic and does not affect the actual review output presented to the user.
🧯 Error handling & edge cases — No material issues found

No material issues found

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9s

<!-- gadfly-review:ollama:gemma4:cloud --> ### 🪰 Gadfly review — `gemma4:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> VERDICT: No material issues found </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> VERDICT: No material issues found The logic change correctly prevents findings extraction for lenses with a `verdictClean` status, which aligns with the goal of treating `path:line` references in "No material issues found" reviews as verification notes rather than bugs. - Verified that `emit` still proceeds to post to `/runs` even when a lens is skipped via the `verdictClean` check, as the `/runs` payload is constructed independently of the `results` loop (lines 108-117 in `cmd/gadfly/emit.go`) and posted afterward (line 153). - Verified that the new test `TestEmit_SkipsCleanVerdictLens` in `cmd/gadfly/emit_test.go` accurately simulates this behavior by asserting that `/runs` is called exactly once while `/reports` is not called. - The change is narrowly scoped to the telemetry extraction logic and does not affect the actual review output presented to the user. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> No material issues found </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9s</sub>
steve added 1 commit 2026-06-27 22:07:55 +00:00
ci: trim the dogfood swarm to the strong reviewers
Build & push image / build-and-push (pull_request) Successful in 7s
c87f97b7b2
Drop the low-signal reviewers from gadfly's own swarm per the scoreboard:
- m5/qwen3.6:35b-mlx (slow local Mac; removes the last foreman/local lane,
  so GADFLY_ENDPOINT_M5 + the m5 concurrency lane are gone)
- gemma4:cloud, gpt-oss:120b-cloud, kimi-k2.7-code:cloud (weak cloud models)

Remaining: minimax-m3, glm-5.2, glm-5.1, deepseek-v4-pro, nemotron-3-super,
qwen3-coder:480b + claude-code/sonnet — 6 cloud + the Claude Code engine,
all fast cloud lanes. Takes effect once this merges to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve changed title from fix: don't emit findings from clean "No material issues" lenses to fix: clean-lens findings + trim the dogfood swarm to strong reviewers 2026-06-27 22:08:15 +00:00
steve merged commit 80d8f53f63 into main 2026-06-27 22:14:08 +00:00
steve deleted branch fix/clean-lens-no-findings 2026-06-27 22:14:08 +00:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/gadfly#4