Gadfly: agentic adversarial PR reviewer (initial extraction)

Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/ find_files/get_diff), verifies findings before reporting, two-pass review + adversarial recheck, posts one labeled comment per model. Advisory only. - cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo - entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML) - Dockerfile: multi-stage; build-time module token never reaches the final image - .gitea/workflows/build-image.yml: tag v* → build & push image - examples/: ~15-line consumer stub - system prompt genericized + hardened to re-derive constants/formulas (semantic bugs) Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 18:42:20 -04:00
commit c0d0152a34
18 changed files with 1879 additions and 0 deletions
@@ -0,0 +1,171 @@
+#!/usr/bin/env bash
+# Adversarial PR review runner.
+#
+# Fetches a PR's unified diff + metadata from Gitea, asks ONE model to review it
+# adversarially, then upserts the result as a single labeled PR comment (so
+# re-runs on new commits update the comment in place instead of stacking dupes).
+#
+# The ollama lane is AGENTIC: it runs the cmd/gadfly Go binary, which drives a
+# tool-using agent (majordomo + Ollama Cloud) over the PR's checked-out repo so
+# the model can read_file/grep/etc. to VERIFY findings instead of guessing from
+# the diff alone. The antigravity lane stays a one-shot `agy` call (agy has its
+# own file tools).
+#
+# Required env:
+#   GITEA_API    e.g. https://gitea.stevedudenhoeffer.com/api/v1/repos/steve/mort
+#   GITEA_TOKEN  token with repo write access (posts the comment)
+#   PR           pull request index/number
+#   PROVIDER     "ollama" | "antigravity"
+#   MODEL        model id (e.g. qwen3-coder:480b-cloud, gemini-3-pro)
+#
+# Provider-specific env:
+#   ollama:      OLLAMA_CLOUD_API_KEY, GADFLY_BIN (path to the built reviewer),
+#                GADFLY_REPO_DIR (checked-out repo; default: this script's repo)
+#   antigravity: `agy` on PATH with credentials already seeded (~/.gemini)
+#
+# Optional:
+#   MAX_DIFF_CHARS  diff truncation cap for the prompt (default 60000)
+#
+# This script is advisory: it never fails the job for review content. It exits
+# non-zero only on a usage/configuration error.
+set -uo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+MAX_DIFF_CHARS="${MAX_DIFF_CHARS:-60000}"
+
+: "${GITEA_API:?GITEA_API required}"
+: "${GITEA_TOKEN:?GITEA_TOKEN required}"
+: "${PR:?PR required}"
+: "${PROVIDER:?PROVIDER required}"
+: "${MODEL:?MODEL required}"
+
+MARKER="<!-- gadfly-review:${PROVIDER}:${MODEL} -->"
+say() { echo "[gadfly-review:${PROVIDER}:${MODEL}] $*" >&2; }
+
+# jq is required for payload building / response parsing; install if missing.
+if ! command -v jq >/dev/null 2>&1; then
+  say "jq not found; attempting install"
+  { apt-get update -qq && apt-get install -y -qq jq; } >/dev/null 2>&1 \
+    || { sudo apt-get update -qq && sudo apt-get install -y -qq jq; } >/dev/null 2>&1 \
+    || { say "could not install jq"; exit 1; }
+fi
+
+# curl timeouts: Gitea API calls are quick. Word-split on purpose so the flags
+# expand as separate args. (The LLM call's own deadline lives in the reviewer
+# binary / agy, not here.)
+API_TIMEOUT="--connect-timeout 20 --max-time 30"
+
+# --- fetch PR context -------------------------------------------------------
+say "fetching PR #${PR} context"
+DIFF="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/pulls/${PR}.diff" || true)"
+META="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/pulls/${PR}" || echo '{}')"
+TITLE="$(echo "$META" | jq -r '.title // ""')"
+BODY="$(echo "$META" | jq -r '.body // ""')"
+
+if [ -z "$DIFF" ]; then
+  say "empty diff; nothing to review"
+  exit 0
+fi
+
+# Keep the FULL diff for the agentic (ollama) reviewer — it can pull the whole
+# thing via the get_diff tool and embeds a truncated copy in the prompt itself.
+# The truncated copy below is only for the one-shot antigravity prompt.
+FULL_DIFF="$DIFF"
+TRUNC_NOTE=""
+if [ "${#DIFF}" -gt "$MAX_DIFF_CHARS" ]; then
+  DIFF="${DIFF:0:$MAX_DIFF_CHARS}"
+  TRUNC_NOTE=$'\n\n[NOTE: diff truncated to '"${MAX_DIFF_CHARS}"' chars for length; review the rest manually.]'
+fi
+
+SYS="$(cat "${SCRIPT_DIR}/system-prompt.txt")"
+USR="$(printf 'PR #%s: %s\n\nDescription:\n%s\n\nUnified diff to review:\n```diff\n%s\n```%s' \
+  "$PR" "$TITLE" "$BODY" "$DIFF" "$TRUNC_NOTE")"
+
+# --- call the model ---------------------------------------------------------
+REVIEW=""
+case "$PROVIDER" in
+  ollama)
+    # Agentic lane: hand off to the cmd/gadfly binary, which runs a tool-using
+    # agent over the checked-out repo so it can verify findings instead of
+    # guessing from the diff. The workflow builds the binary and exports
+    # GADFLY_BIN + GADFLY_REPO_DIR; we fall back to sane defaults for a
+    # local run.
+    if [ -z "${OLLAMA_CLOUD_API_KEY:-}" ]; then
+      REVIEW="⚠️ \`OLLAMA_CLOUD_API_KEY\` is not configured; this reviewer was skipped."
+    else
+      BIN="${GADFLY_BIN:-gadfly}"
+      if ! command -v "$BIN" >/dev/null 2>&1 && [ ! -x "$BIN" ]; then
+        REVIEW="⚠️ Agentic reviewer binary not found (\`GADFLY_BIN=${BIN}\`); the workflow build step may have failed."
+      else
+        REPO_DIR="${GADFLY_REPO_DIR:-$(cd "${SCRIPT_DIR}/../../.." && pwd)}"
+        DIFF_FILE="$(mktemp)"
+        ERR_FILE="${DIFF_FILE}.err"
+        printf '%s' "$FULL_DIFF" > "$DIFF_FILE"
+        REVIEW="$(
+          OLLAMA_API_KEY="$OLLAMA_CLOUD_API_KEY" \
+          GADFLY_MODEL="$MODEL" \
+          GADFLY_REPO_DIR="$REPO_DIR" \
+          GADFLY_DIFF_FILE="$DIFF_FILE" \
+          GADFLY_SYSTEM_FILE="${SCRIPT_DIR}/system-prompt.txt" \
+          GADFLY_TITLE="$TITLE" \
+          GADFLY_BODY="$BODY" \
+          GADFLY_MAX_DIFF_CHARS="$MAX_DIFF_CHARS" \
+          "$BIN" 2>"$ERR_FILE"
+        )"
+        rc=$?
+        if [ "$rc" -ne 0 ] || [ -z "$REVIEW" ]; then
+          REVIEW="⚠️ Agentic reviewer for \`${MODEL}\` failed (exit ${rc}):
+\`\`\`
+$(tail -c 1500 "$ERR_FILE" 2>/dev/null)
+\`\`\`"
+        fi
+        rm -f "$DIFF_FILE" "$ERR_FILE"
+      fi
+    fi
+    ;;
+  antigravity)
+    if ! command -v agy >/dev/null 2>&1; then
+      REVIEW="⚠️ Antigravity CLI (\`agy\`) not found on PATH."
+    else
+      FULL="$(printf '%s\n\n%s' "$SYS" "$USR")"
+      if ! REVIEW="$(agy -p "$FULL" --model "$MODEL" 2>agy.err)"; then
+        REVIEW="⚠️ Antigravity CLI failed:
+\`\`\`
+$(tail -c 1500 agy.err 2>/dev/null)
+\`\`\`"
+      fi
+      [ -z "$REVIEW" ] && REVIEW="⚠️ Antigravity CLI returned no output (auth/quota?)."
+    fi
+    ;;
+  *)
+    say "unknown provider: ${PROVIDER}"; exit 1 ;;
+esac
+
+# --- assemble comment -------------------------------------------------------
+COMMENT="$(printf '%s\n### 🔭 Adversarial review — `%s` (%s)\n\n%s\n\n<sub>Automated adversarial review. Advisory only — does not block merge.</sub>' \
+  "$MARKER" "$MODEL" "$PROVIDER" "$REVIEW")"
+POST_BODY="$(jq -n --arg b "$COMMENT" '{body:$b}')"
+
+# --- upsert by marker -------------------------------------------------------
+EXISTING_ID=""
+page=1
+while [ "$page" -le 10 ]; do
+  CMTS="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" \
+    "${GITEA_API}/issues/${PR}/comments?limit=50&page=${page}" || echo '[]')"
+  [ "$(echo "$CMTS" | jq 'length')" = "0" ] && break
+  EXISTING_ID="$(echo "$CMTS" | jq -r --arg m "$MARKER" \
+    '.[] | select(.body != null and (.body | startswith($m))) | .id' | head -n1)"
+  [ -n "$EXISTING_ID" ] && break
+  page=$((page+1))
+done
+
+if [ -n "$EXISTING_ID" ]; then
+  say "updating existing comment ${EXISTING_ID}"
+  curl $API_TIMEOUT -sS -X PATCH -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
+    "${GITEA_API}/issues/comments/${EXISTING_ID}" -d "$POST_BODY" >/dev/null
+else
+  say "creating new comment"
+  curl $API_TIMEOUT -sS -X POST -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
+    "${GITEA_API}/issues/${PR}/comments" -d "$POST_BODY" >/dev/null
+fi
+say "done"
@@ -0,0 +1,47 @@
+You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
+pull request below — not to praise it. A gadfly does not let things slide.
+
+You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
+state. USE THEM to verify before you report. Do not review the diff in isolation.
+- read_file(path[, start_line, limit]) — read a file with line numbers.
+- list_dir([path]) — list a directory.
+- grep(pattern[, path, max_results]) — RE2 regex search across the repo.
+- find_files(name[, max_results]) — locate a file by path substring.
+- get_diff() — the full unified diff (the task message may truncate it).
+
+Mandatory verification discipline — this is the whole point of giving you tools:
+- Before claiming a missing/duplicate import, an undefined symbol, a wrong signature,
+  a type error, or any "this won't compile / won't resolve" issue: OPEN the file and
+  CHECK. The diff hunk shows only a few context lines; the declaration you're worried
+  about is almost always just outside it.
+- Before claiming a cross-file problem (a caller you think you broke, a missing update
+  to another layer/interface): grep for the symbol and read the other side.
+- If you cannot confirm a suspicion with the tools, either drop it or clearly label it
+  "unverified" — do NOT present an unchecked guess as a finding.
+
+Be skeptical and concrete. Hunt specifically for:
+- Correctness bugs and logic errors introduced by the change.
+- SEMANTIC / domain correctness — the failure mode plausible-looking code hides best.
+  Do NOT trust a constant, conversion factor, formula, unit, or threshold just because
+  it looks reasonable. Independently RE-DERIVE the expected value from first principles
+  (units, dimensions, edge values) and compare. A magic number that "looks about right"
+  is exactly where real bugs hide (e.g. a linear factor used where it must be squared).
+- Concurrency issues: data races, deadlocks, unsynchronized shared state, leaked tasks.
+- Security problems: injection, missing authz/authn, secret leakage, unsafe input handling.
+- Error handling gaps: ignored errors, swallowed exceptions, missing rollback/cleanup.
+- Resource leaks: unclosed handles/bodies/files, context/lifetime misuse, unbounded growth.
+- Missed edge cases: off-by-one, nil/null, empty collection, overflow, zero/negative.
+- Violations of THIS repo's own conventions. Discover them — do not assume. Read any
+  README / CONTRIBUTING / CLAUDE.md / AGENTS.md / lint config the repo ships, and hold
+  the change to the patterns the surrounding code actually uses.
+
+Output rules:
+- Output GitHub-flavored markdown, concise. No filler, no restating the diff.
+- Lead with a one-line VERDICT: exactly one of "No material issues found",
+  "Minor issues", or "Blocking issues found".
+- Then a short bulleted list of findings. For each finding cite `path:line` and explain
+  the concrete impact and a suggested fix. Note which findings you verified by reading
+  the code (and how) versus any you could not confirm.
+- Only report issues you are reasonably confident are real after checking. If the diff
+  is clean, say so plainly rather than inventing nits.
+- When you are done investigating, STOP calling tools and reply with the final review.