fix: per-lens timeout, errored-verdict honesty, accurate provider label, tighter lens focus, run timing
Build & push image / build-and-push (push) Successful in 8s

Five fixes, several surfaced by the live bake-off:

- PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist
  (own context), not shared across the suite. A slow model (e.g. a 35B local MLX)
  was exhausting the whole 600s budget on lens 1, leaving the rest "step 0:
  context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go.
- ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean".
  Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses
  errored"); the section reads "⚠️ could not complete". consolidate.go.
- PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the
  spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default
  (was wrongly "ollama-cloud" for local models). scripts/run.sh.
- LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens
  stays in its lane, says "nothing in my area" rather than re-reporting another
  lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive-
  constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go.
- RUN TIMING: run.sh posts a " Reviewing…" placeholder at model start and updates
  it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Steve Dudenhoeffer
2026-06-25 20:15:40 -04:00
parent 4b8f9aa39b
commit 49f3623204
5 changed files with 114 additions and 58 deletions
+48 -29
View File
@@ -42,6 +42,14 @@ MAX_DIFF_CHARS="${MAX_DIFF_CHARS:-60000}"
MARKER="<!-- gadfly-review:${PROVIDER}:${MODEL} -->"
say() { echo "[gadfly-review:${PROVIDER}:${MODEL}] $*" >&2; }
# Display the model's ACTUAL backend: the provider segment of the spec
# ("m1pro/qwen3.6:35b-mlx" -> "m1pro"); a bare id uses GADFLY_PROVIDER (default
# ollama-cloud). This is what the comment header shows, not the run.sh lane.
case "$MODEL" in
*/*) MODEL_PROVIDER="${MODEL%%/*}" ;;
*) MODEL_PROVIDER="${GADFLY_PROVIDER:-ollama-cloud}" ;;
esac
# jq is required for payload building / response parsing; install if missing.
if ! command -v jq >/dev/null 2>&1; then
say "jq not found; attempting install"
@@ -55,6 +63,32 @@ fi
# binary / agy, not here.)
API_TIMEOUT="--connect-timeout 20 --max-time 30"
# upsert_comment BODY — create or update (by MARKER) this model's single comment.
upsert_comment() {
local body="$1" post_body existing_id page=1 cmts
post_body="$(jq -n --arg b "$body" '{body:$b}')"
existing_id=""
while [ "$page" -le 10 ]; do
cmts="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" \
"${GITEA_API}/issues/${PR}/comments?limit=50&page=${page}" || echo '[]')"
[ "$(echo "$cmts" | jq 'length')" = "0" ] && break
existing_id="$(echo "$cmts" | jq -r --arg m "$MARKER" \
'.[] | select(.body != null and (.body | startswith($m))) | .id' | head -n1)"
[ -n "$existing_id" ] && break
page=$((page+1))
done
if [ -n "$existing_id" ]; then
curl $API_TIMEOUT -sS -X PATCH -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
"${GITEA_API}/issues/comments/${existing_id}" -d "$post_body" >/dev/null
else
curl $API_TIMEOUT -sS -X POST -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
"${GITEA_API}/issues/${PR}/comments" -d "$post_body" >/dev/null
fi
}
# fmt_duration SECONDS -> "1m 23s" / "45s"
fmt_duration() { if [ "$1" -ge 60 ]; then echo "$(($1/60))m $(($1%60))s"; else echo "$1s"; fi; }
# --- fetch PR context -------------------------------------------------------
say "fetching PR #${PR} context"
DIFF="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/pulls/${PR}.diff" || true)"
@@ -81,6 +115,12 @@ SYS="$(cat "${SCRIPT_DIR}/system-prompt.txt")"
USR="$(printf 'PR #%s: %s\n\nDescription:\n%s\n\nUnified diff to review:\n```diff\n%s\n```%s' \
"$PR" "$TITLE" "$BODY" "$DIFF" "$TRUNC_NOTE")"
# --- announce start (placeholder comment) -----------------------------------
START_TS="$(date +%s)"
say "starting review with ${MODEL}"
upsert_comment "$(printf '%s\n### 🪰 Gadfly review — `%s` (%s)\n\n⏳ Reviewing… this comment will update with findings and run time.' \
"$MARKER" "$MODEL" "$MODEL_PROVIDER")"
# --- call the model ---------------------------------------------------------
REVIEW=""
case "$PROVIDER" in
@@ -96,7 +136,7 @@ case "$PROVIDER" in
if [ -n "${OLLAMA_CLOUD_API_KEY:-}" ] && [ -z "${OLLAMA_API_KEY:-}" ]; then
export OLLAMA_API_KEY="$OLLAMA_CLOUD_API_KEY"
fi
GADFLY_PROVIDER_EFF="${GADFLY_PROVIDER:-ollama-cloud}"
GADFLY_PROVIDER_EFF="$MODEL_PROVIDER"
# Only the default cloud provider strictly needs a key up front; local Ollama
# and other providers either need none or read their own standard env var.
@@ -152,31 +192,10 @@ $(tail -c 1500 agy.err 2>/dev/null)
say "unknown provider: ${PROVIDER}"; exit 1 ;;
esac
# --- assemble comment -------------------------------------------------------
COMMENT="$(printf '%s\n### 🪰 Gadfly review — `%s` (%s)\n\n%s\n\n<sub>Automated adversarial review by Gadfly. Advisory only — does not block merge.</sub>' \
"$MARKER" "$MODEL" "${GADFLY_PROVIDER_EFF:-$PROVIDER}" "$REVIEW")"
POST_BODY="$(jq -n --arg b "$COMMENT" '{body:$b}')"
# --- upsert by marker -------------------------------------------------------
EXISTING_ID=""
page=1
while [ "$page" -le 10 ]; do
CMTS="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" \
"${GITEA_API}/issues/${PR}/comments?limit=50&page=${page}" || echo '[]')"
[ "$(echo "$CMTS" | jq 'length')" = "0" ] && break
EXISTING_ID="$(echo "$CMTS" | jq -r --arg m "$MARKER" \
'.[] | select(.body != null and (.body | startswith($m))) | .id' | head -n1)"
[ -n "$EXISTING_ID" ] && break
page=$((page+1))
done
if [ -n "$EXISTING_ID" ]; then
say "updating existing comment ${EXISTING_ID}"
curl $API_TIMEOUT -sS -X PATCH -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
"${GITEA_API}/issues/comments/${EXISTING_ID}" -d "$POST_BODY" >/dev/null
else
say "creating new comment"
curl $API_TIMEOUT -sS -X POST -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
"${GITEA_API}/issues/${PR}/comments" -d "$POST_BODY" >/dev/null
fi
say "done"
# --- assemble + post final comment (with run time) --------------------------
ELAPSED="$(( $(date +%s) - START_TS ))"
DUR="$(fmt_duration "$ELAPSED")"
COMMENT="$(printf '%s\n### 🪰 Gadfly review — `%s` (%s)\n\n%s\n\n<sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in %s</sub>' \
"$MARKER" "$MODEL" "$MODEL_PROVIDER" "$REVIEW" "$DUR")"
upsert_comment "$COMMENT"
say "done in ${DUR}"
+12 -7
View File
@@ -1,9 +1,14 @@
You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
pull request below — not to praise it. A gadfly does not let things slide.
You review through ONE assigned lens (given at the end of this prompt). Stay in your lane —
other reviewers cover the other angles — but do report anything clearly serious you happen
to notice.
You review through ONE assigned lens (named at the end of this prompt). Evaluate the change
THROUGH THAT LENS — that is your job. A separate reviewer independently covers each other
angle, so problems outside your lens WILL be caught without you. Do not restate a finding that
plainly belongs to another lens just to have something to report — that only creates noise. If
your lens turns up nothing material, say so plainly; an honest "nothing in my area" beats
re-reporting the obvious bug every other lens already sees. Only exception: if you spot a
SEVERE issue clearly outside your lens, you may add ONE line prefixed "Outside my lens:" — but
your actual findings must stay within your lens.
You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
state. USE THEM to verify before you report. Do not review the diff in isolation.
@@ -23,10 +28,10 @@ Mandatory verification discipline — this is the whole point of giving you tool
- If you cannot confirm a suspicion with the tools, either drop it or clearly label it
"unverified" — do NOT present an unchecked guess as a finding.
Be skeptical and concrete, and apply your assigned lens rigorously. A recurring, high-value
discipline regardless of lens: do NOT trust a constant, conversion factor, formula, unit, or
threshold just because it looks reasonable — RE-DERIVE the expected value from first principles
and compare. Plausible-looking magic numbers are where real bugs hide.
Be skeptical and concrete, and apply your assigned lens rigorously. (If your lens leads you to
a constant, conversion factor, formula, unit, or threshold, don't trust it because it looks
reasonable — re-derive the expected value from first principles and compare; plausible-looking
magic numbers hide real bugs. Pursue this when it's in your lane, not as a reason to leave it.)
Output rules:
- Output GitHub-flavored markdown, concise. No filler, no restating the diff.