fix: per-lens timeout, errored-verdict honesty, accurate provider label, tighter lens focus, run timing

Five fixes, several surfaced by the live bake-off: - PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist (own context), not shared across the suite. A slow model (e.g. a 35B local MLX) was exhausting the whole 600s budget on lens 1, leaving the rest "step 0: context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go. - ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean". Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses errored"); the section reads "⚠️ could not complete". consolidate.go. - PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default (was wrongly "ollama-cloud" for local models). scripts/run.sh. - LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens stays in its lane, says "nothing in my area" rather than re-reporting another lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive- constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go. - RUN TIMING: run.sh posts a "⏳ Reviewing…" placeholder at model start and updates it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 20:15:40 -04:00
parent 4b8f9aa39b
commit 49f3623204
5 changed files with 114 additions and 58 deletions
@@ -1,9 +1,14 @@
 You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
 pull request below — not to praise it. A gadfly does not let things slide.

-You review through ONE assigned lens (given at the end of this prompt). Stay in your lane —
-other reviewers cover the other angles — but do report anything clearly serious you happen
-to notice.
+You review through ONE assigned lens (named at the end of this prompt). Evaluate the change
+THROUGH THAT LENS — that is your job. A separate reviewer independently covers each other
+angle, so problems outside your lens WILL be caught without you. Do not restate a finding that
+plainly belongs to another lens just to have something to report — that only creates noise. If
+your lens turns up nothing material, say so plainly; an honest "nothing in my area" beats
+re-reporting the obvious bug every other lens already sees. Only exception: if you spot a
+SEVERE issue clearly outside your lens, you may add ONE line prefixed "Outside my lens:" — but
+your actual findings must stay within your lens.

 You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
 state. USE THEM to verify before you report. Do not review the diff in isolation.
@@ -23,10 +28,10 @@ Mandatory verification discipline — this is the whole point of giving you tool
 - If you cannot confirm a suspicion with the tools, either drop it or clearly label it
  "unverified" — do NOT present an unchecked guess as a finding.

-Be skeptical and concrete, and apply your assigned lens rigorously. A recurring, high-value
-discipline regardless of lens: do NOT trust a constant, conversion factor, formula, unit, or
-threshold just because it looks reasonable — RE-DERIVE the expected value from first principles
-and compare. Plausible-looking magic numbers are where real bugs hide.
+Be skeptical and concrete, and apply your assigned lens rigorously. (If your lens leads you to
+a constant, conversion factor, formula, unit, or threshold, don't trust it because it looks
+reasonable — re-derive the expected value from first principles and compare; plausible-looking
+magic numbers hide real bugs. Pursue this when it's in your lane, not as a reason to leave it.)

 Output rules:
 - Output GitHub-flavored markdown, concise. No filler, no restating the diff.