Gadfly: agentic adversarial PR reviewer (initial extraction)

Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in
Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/
find_files/get_diff), verifies findings before reporting, two-pass review +
adversarial recheck, posts one labeled comment per model. Advisory only.

- cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo
- entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML)
- Dockerfile: multi-stage; build-time module token never reaches the final image
- .gitea/workflows/build-image.yml: tag v* → build & push image
- examples/: ~15-line consumer stub
- system prompt genericized + hardened to re-derive constants/formulas (semantic bugs)

Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Steve Dudenhoeffer
2026-06-25 18:42:20 -04:00
commit c0d0152a34
18 changed files with 1879 additions and 0 deletions
+171
View File
@@ -0,0 +1,171 @@
#!/usr/bin/env bash
# Adversarial PR review runner.
#
# Fetches a PR's unified diff + metadata from Gitea, asks ONE model to review it
# adversarially, then upserts the result as a single labeled PR comment (so
# re-runs on new commits update the comment in place instead of stacking dupes).
#
# The ollama lane is AGENTIC: it runs the cmd/gadfly Go binary, which drives a
# tool-using agent (majordomo + Ollama Cloud) over the PR's checked-out repo so
# the model can read_file/grep/etc. to VERIFY findings instead of guessing from
# the diff alone. The antigravity lane stays a one-shot `agy` call (agy has its
# own file tools).
#
# Required env:
# GITEA_API e.g. https://gitea.stevedudenhoeffer.com/api/v1/repos/steve/mort
# GITEA_TOKEN token with repo write access (posts the comment)
# PR pull request index/number
# PROVIDER "ollama" | "antigravity"
# MODEL model id (e.g. qwen3-coder:480b-cloud, gemini-3-pro)
#
# Provider-specific env:
# ollama: OLLAMA_CLOUD_API_KEY, GADFLY_BIN (path to the built reviewer),
# GADFLY_REPO_DIR (checked-out repo; default: this script's repo)
# antigravity: `agy` on PATH with credentials already seeded (~/.gemini)
#
# Optional:
# MAX_DIFF_CHARS diff truncation cap for the prompt (default 60000)
#
# This script is advisory: it never fails the job for review content. It exits
# non-zero only on a usage/configuration error.
set -uo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
MAX_DIFF_CHARS="${MAX_DIFF_CHARS:-60000}"
: "${GITEA_API:?GITEA_API required}"
: "${GITEA_TOKEN:?GITEA_TOKEN required}"
: "${PR:?PR required}"
: "${PROVIDER:?PROVIDER required}"
: "${MODEL:?MODEL required}"
MARKER="<!-- gadfly-review:${PROVIDER}:${MODEL} -->"
say() { echo "[gadfly-review:${PROVIDER}:${MODEL}] $*" >&2; }
# jq is required for payload building / response parsing; install if missing.
if ! command -v jq >/dev/null 2>&1; then
say "jq not found; attempting install"
{ apt-get update -qq && apt-get install -y -qq jq; } >/dev/null 2>&1 \
|| { sudo apt-get update -qq && sudo apt-get install -y -qq jq; } >/dev/null 2>&1 \
|| { say "could not install jq"; exit 1; }
fi
# curl timeouts: Gitea API calls are quick. Word-split on purpose so the flags
# expand as separate args. (The LLM call's own deadline lives in the reviewer
# binary / agy, not here.)
API_TIMEOUT="--connect-timeout 20 --max-time 30"
# --- fetch PR context -------------------------------------------------------
say "fetching PR #${PR} context"
DIFF="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/pulls/${PR}.diff" || true)"
META="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/pulls/${PR}" || echo '{}')"
TITLE="$(echo "$META" | jq -r '.title // ""')"
BODY="$(echo "$META" | jq -r '.body // ""')"
if [ -z "$DIFF" ]; then
say "empty diff; nothing to review"
exit 0
fi
# Keep the FULL diff for the agentic (ollama) reviewer — it can pull the whole
# thing via the get_diff tool and embeds a truncated copy in the prompt itself.
# The truncated copy below is only for the one-shot antigravity prompt.
FULL_DIFF="$DIFF"
TRUNC_NOTE=""
if [ "${#DIFF}" -gt "$MAX_DIFF_CHARS" ]; then
DIFF="${DIFF:0:$MAX_DIFF_CHARS}"
TRUNC_NOTE=$'\n\n[NOTE: diff truncated to '"${MAX_DIFF_CHARS}"' chars for length; review the rest manually.]'
fi
SYS="$(cat "${SCRIPT_DIR}/system-prompt.txt")"
USR="$(printf 'PR #%s: %s\n\nDescription:\n%s\n\nUnified diff to review:\n```diff\n%s\n```%s' \
"$PR" "$TITLE" "$BODY" "$DIFF" "$TRUNC_NOTE")"
# --- call the model ---------------------------------------------------------
REVIEW=""
case "$PROVIDER" in
ollama)
# Agentic lane: hand off to the cmd/gadfly binary, which runs a tool-using
# agent over the checked-out repo so it can verify findings instead of
# guessing from the diff. The workflow builds the binary and exports
# GADFLY_BIN + GADFLY_REPO_DIR; we fall back to sane defaults for a
# local run.
if [ -z "${OLLAMA_CLOUD_API_KEY:-}" ]; then
REVIEW="⚠️ \`OLLAMA_CLOUD_API_KEY\` is not configured; this reviewer was skipped."
else
BIN="${GADFLY_BIN:-gadfly}"
if ! command -v "$BIN" >/dev/null 2>&1 && [ ! -x "$BIN" ]; then
REVIEW="⚠️ Agentic reviewer binary not found (\`GADFLY_BIN=${BIN}\`); the workflow build step may have failed."
else
REPO_DIR="${GADFLY_REPO_DIR:-$(cd "${SCRIPT_DIR}/../../.." && pwd)}"
DIFF_FILE="$(mktemp)"
ERR_FILE="${DIFF_FILE}.err"
printf '%s' "$FULL_DIFF" > "$DIFF_FILE"
REVIEW="$(
OLLAMA_API_KEY="$OLLAMA_CLOUD_API_KEY" \
GADFLY_MODEL="$MODEL" \
GADFLY_REPO_DIR="$REPO_DIR" \
GADFLY_DIFF_FILE="$DIFF_FILE" \
GADFLY_SYSTEM_FILE="${SCRIPT_DIR}/system-prompt.txt" \
GADFLY_TITLE="$TITLE" \
GADFLY_BODY="$BODY" \
GADFLY_MAX_DIFF_CHARS="$MAX_DIFF_CHARS" \
"$BIN" 2>"$ERR_FILE"
)"
rc=$?
if [ "$rc" -ne 0 ] || [ -z "$REVIEW" ]; then
REVIEW="⚠️ Agentic reviewer for \`${MODEL}\` failed (exit ${rc}):
\`\`\`
$(tail -c 1500 "$ERR_FILE" 2>/dev/null)
\`\`\`"
fi
rm -f "$DIFF_FILE" "$ERR_FILE"
fi
fi
;;
antigravity)
if ! command -v agy >/dev/null 2>&1; then
REVIEW="⚠️ Antigravity CLI (\`agy\`) not found on PATH."
else
FULL="$(printf '%s\n\n%s' "$SYS" "$USR")"
if ! REVIEW="$(agy -p "$FULL" --model "$MODEL" 2>agy.err)"; then
REVIEW="⚠️ Antigravity CLI failed:
\`\`\`
$(tail -c 1500 agy.err 2>/dev/null)
\`\`\`"
fi
[ -z "$REVIEW" ] && REVIEW="⚠️ Antigravity CLI returned no output (auth/quota?)."
fi
;;
*)
say "unknown provider: ${PROVIDER}"; exit 1 ;;
esac
# --- assemble comment -------------------------------------------------------
COMMENT="$(printf '%s\n### 🔭 Adversarial review — `%s` (%s)\n\n%s\n\n<sub>Automated adversarial review. Advisory only — does not block merge.</sub>' \
"$MARKER" "$MODEL" "$PROVIDER" "$REVIEW")"
POST_BODY="$(jq -n --arg b "$COMMENT" '{body:$b}')"
# --- upsert by marker -------------------------------------------------------
EXISTING_ID=""
page=1
while [ "$page" -le 10 ]; do
CMTS="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" \
"${GITEA_API}/issues/${PR}/comments?limit=50&page=${page}" || echo '[]')"
[ "$(echo "$CMTS" | jq 'length')" = "0" ] && break
EXISTING_ID="$(echo "$CMTS" | jq -r --arg m "$MARKER" \
'.[] | select(.body != null and (.body | startswith($m))) | .id' | head -n1)"
[ -n "$EXISTING_ID" ] && break
page=$((page+1))
done
if [ -n "$EXISTING_ID" ]; then
say "updating existing comment ${EXISTING_ID}"
curl $API_TIMEOUT -sS -X PATCH -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
"${GITEA_API}/issues/comments/${EXISTING_ID}" -d "$POST_BODY" >/dev/null
else
say "creating new comment"
curl $API_TIMEOUT -sS -X POST -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
"${GITEA_API}/issues/${PR}/comments" -d "$POST_BODY" >/dev/null
fi
say "done"
+47
View File
@@ -0,0 +1,47 @@
You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
pull request below — not to praise it. A gadfly does not let things slide.
You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
state. USE THEM to verify before you report. Do not review the diff in isolation.
- read_file(path[, start_line, limit]) — read a file with line numbers.
- list_dir([path]) — list a directory.
- grep(pattern[, path, max_results]) — RE2 regex search across the repo.
- find_files(name[, max_results]) — locate a file by path substring.
- get_diff() — the full unified diff (the task message may truncate it).
Mandatory verification discipline — this is the whole point of giving you tools:
- Before claiming a missing/duplicate import, an undefined symbol, a wrong signature,
a type error, or any "this won't compile / won't resolve" issue: OPEN the file and
CHECK. The diff hunk shows only a few context lines; the declaration you're worried
about is almost always just outside it.
- Before claiming a cross-file problem (a caller you think you broke, a missing update
to another layer/interface): grep for the symbol and read the other side.
- If you cannot confirm a suspicion with the tools, either drop it or clearly label it
"unverified" — do NOT present an unchecked guess as a finding.
Be skeptical and concrete. Hunt specifically for:
- Correctness bugs and logic errors introduced by the change.
- SEMANTIC / domain correctness — the failure mode plausible-looking code hides best.
Do NOT trust a constant, conversion factor, formula, unit, or threshold just because
it looks reasonable. Independently RE-DERIVE the expected value from first principles
(units, dimensions, edge values) and compare. A magic number that "looks about right"
is exactly where real bugs hide (e.g. a linear factor used where it must be squared).
- Concurrency issues: data races, deadlocks, unsynchronized shared state, leaked tasks.
- Security problems: injection, missing authz/authn, secret leakage, unsafe input handling.
- Error handling gaps: ignored errors, swallowed exceptions, missing rollback/cleanup.
- Resource leaks: unclosed handles/bodies/files, context/lifetime misuse, unbounded growth.
- Missed edge cases: off-by-one, nil/null, empty collection, overflow, zero/negative.
- Violations of THIS repo's own conventions. Discover them — do not assume. Read any
README / CONTRIBUTING / CLAUDE.md / AGENTS.md / lint config the repo ships, and hold
the change to the patterns the surrounding code actually uses.
Output rules:
- Output GitHub-flavored markdown, concise. No filler, no restating the diff.
- Lead with a one-line VERDICT: exactly one of "No material issues found",
"Minor issues", or "Blocking issues found".
- Then a short bulleted list of findings. For each finding cite `path:line` and explain
the concrete impact and a suggested fix. Note which findings you verified by reading
the code (and how) versus any you could not confirm.
- Only report issues you are reasonably confident are real after checking. If the diff
is clean, say so plainly rather than inventing nits.
- When you are done investigating, STOP calling tools and reply with the final review.