Gadfly: agentic adversarial PR reviewer (initial extraction)
Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/ find_files/get_diff), verifies findings before reporting, two-pass review + adversarial recheck, posts one labeled comment per model. Advisory only. - cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo - entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML) - Dockerfile: multi-stage; build-time module token never reaches the final image - .gitea/workflows/build-image.yml: tag v* → build & push image - examples/: ~15-line consumer stub - system prompt genericized + hardened to re-derive constants/formulas (semantic bugs) Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
+171
@@ -0,0 +1,171 @@
|
||||
#!/usr/bin/env bash
|
||||
# Adversarial PR review runner.
|
||||
#
|
||||
# Fetches a PR's unified diff + metadata from Gitea, asks ONE model to review it
|
||||
# adversarially, then upserts the result as a single labeled PR comment (so
|
||||
# re-runs on new commits update the comment in place instead of stacking dupes).
|
||||
#
|
||||
# The ollama lane is AGENTIC: it runs the cmd/gadfly Go binary, which drives a
|
||||
# tool-using agent (majordomo + Ollama Cloud) over the PR's checked-out repo so
|
||||
# the model can read_file/grep/etc. to VERIFY findings instead of guessing from
|
||||
# the diff alone. The antigravity lane stays a one-shot `agy` call (agy has its
|
||||
# own file tools).
|
||||
#
|
||||
# Required env:
|
||||
# GITEA_API e.g. https://gitea.stevedudenhoeffer.com/api/v1/repos/steve/mort
|
||||
# GITEA_TOKEN token with repo write access (posts the comment)
|
||||
# PR pull request index/number
|
||||
# PROVIDER "ollama" | "antigravity"
|
||||
# MODEL model id (e.g. qwen3-coder:480b-cloud, gemini-3-pro)
|
||||
#
|
||||
# Provider-specific env:
|
||||
# ollama: OLLAMA_CLOUD_API_KEY, GADFLY_BIN (path to the built reviewer),
|
||||
# GADFLY_REPO_DIR (checked-out repo; default: this script's repo)
|
||||
# antigravity: `agy` on PATH with credentials already seeded (~/.gemini)
|
||||
#
|
||||
# Optional:
|
||||
# MAX_DIFF_CHARS diff truncation cap for the prompt (default 60000)
|
||||
#
|
||||
# This script is advisory: it never fails the job for review content. It exits
|
||||
# non-zero only on a usage/configuration error.
|
||||
set -uo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
MAX_DIFF_CHARS="${MAX_DIFF_CHARS:-60000}"
|
||||
|
||||
: "${GITEA_API:?GITEA_API required}"
|
||||
: "${GITEA_TOKEN:?GITEA_TOKEN required}"
|
||||
: "${PR:?PR required}"
|
||||
: "${PROVIDER:?PROVIDER required}"
|
||||
: "${MODEL:?MODEL required}"
|
||||
|
||||
MARKER="<!-- gadfly-review:${PROVIDER}:${MODEL} -->"
|
||||
say() { echo "[gadfly-review:${PROVIDER}:${MODEL}] $*" >&2; }
|
||||
|
||||
# jq is required for payload building / response parsing; install if missing.
|
||||
if ! command -v jq >/dev/null 2>&1; then
|
||||
say "jq not found; attempting install"
|
||||
{ apt-get update -qq && apt-get install -y -qq jq; } >/dev/null 2>&1 \
|
||||
|| { sudo apt-get update -qq && sudo apt-get install -y -qq jq; } >/dev/null 2>&1 \
|
||||
|| { say "could not install jq"; exit 1; }
|
||||
fi
|
||||
|
||||
# curl timeouts: Gitea API calls are quick. Word-split on purpose so the flags
|
||||
# expand as separate args. (The LLM call's own deadline lives in the reviewer
|
||||
# binary / agy, not here.)
|
||||
API_TIMEOUT="--connect-timeout 20 --max-time 30"
|
||||
|
||||
# --- fetch PR context -------------------------------------------------------
|
||||
say "fetching PR #${PR} context"
|
||||
DIFF="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/pulls/${PR}.diff" || true)"
|
||||
META="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/pulls/${PR}" || echo '{}')"
|
||||
TITLE="$(echo "$META" | jq -r '.title // ""')"
|
||||
BODY="$(echo "$META" | jq -r '.body // ""')"
|
||||
|
||||
if [ -z "$DIFF" ]; then
|
||||
say "empty diff; nothing to review"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# Keep the FULL diff for the agentic (ollama) reviewer — it can pull the whole
|
||||
# thing via the get_diff tool and embeds a truncated copy in the prompt itself.
|
||||
# The truncated copy below is only for the one-shot antigravity prompt.
|
||||
FULL_DIFF="$DIFF"
|
||||
TRUNC_NOTE=""
|
||||
if [ "${#DIFF}" -gt "$MAX_DIFF_CHARS" ]; then
|
||||
DIFF="${DIFF:0:$MAX_DIFF_CHARS}"
|
||||
TRUNC_NOTE=$'\n\n[NOTE: diff truncated to '"${MAX_DIFF_CHARS}"' chars for length; review the rest manually.]'
|
||||
fi
|
||||
|
||||
SYS="$(cat "${SCRIPT_DIR}/system-prompt.txt")"
|
||||
USR="$(printf 'PR #%s: %s\n\nDescription:\n%s\n\nUnified diff to review:\n```diff\n%s\n```%s' \
|
||||
"$PR" "$TITLE" "$BODY" "$DIFF" "$TRUNC_NOTE")"
|
||||
|
||||
# --- call the model ---------------------------------------------------------
|
||||
REVIEW=""
|
||||
case "$PROVIDER" in
|
||||
ollama)
|
||||
# Agentic lane: hand off to the cmd/gadfly binary, which runs a tool-using
|
||||
# agent over the checked-out repo so it can verify findings instead of
|
||||
# guessing from the diff. The workflow builds the binary and exports
|
||||
# GADFLY_BIN + GADFLY_REPO_DIR; we fall back to sane defaults for a
|
||||
# local run.
|
||||
if [ -z "${OLLAMA_CLOUD_API_KEY:-}" ]; then
|
||||
REVIEW="⚠️ \`OLLAMA_CLOUD_API_KEY\` is not configured; this reviewer was skipped."
|
||||
else
|
||||
BIN="${GADFLY_BIN:-gadfly}"
|
||||
if ! command -v "$BIN" >/dev/null 2>&1 && [ ! -x "$BIN" ]; then
|
||||
REVIEW="⚠️ Agentic reviewer binary not found (\`GADFLY_BIN=${BIN}\`); the workflow build step may have failed."
|
||||
else
|
||||
REPO_DIR="${GADFLY_REPO_DIR:-$(cd "${SCRIPT_DIR}/../../.." && pwd)}"
|
||||
DIFF_FILE="$(mktemp)"
|
||||
ERR_FILE="${DIFF_FILE}.err"
|
||||
printf '%s' "$FULL_DIFF" > "$DIFF_FILE"
|
||||
REVIEW="$(
|
||||
OLLAMA_API_KEY="$OLLAMA_CLOUD_API_KEY" \
|
||||
GADFLY_MODEL="$MODEL" \
|
||||
GADFLY_REPO_DIR="$REPO_DIR" \
|
||||
GADFLY_DIFF_FILE="$DIFF_FILE" \
|
||||
GADFLY_SYSTEM_FILE="${SCRIPT_DIR}/system-prompt.txt" \
|
||||
GADFLY_TITLE="$TITLE" \
|
||||
GADFLY_BODY="$BODY" \
|
||||
GADFLY_MAX_DIFF_CHARS="$MAX_DIFF_CHARS" \
|
||||
"$BIN" 2>"$ERR_FILE"
|
||||
)"
|
||||
rc=$?
|
||||
if [ "$rc" -ne 0 ] || [ -z "$REVIEW" ]; then
|
||||
REVIEW="⚠️ Agentic reviewer for \`${MODEL}\` failed (exit ${rc}):
|
||||
\`\`\`
|
||||
$(tail -c 1500 "$ERR_FILE" 2>/dev/null)
|
||||
\`\`\`"
|
||||
fi
|
||||
rm -f "$DIFF_FILE" "$ERR_FILE"
|
||||
fi
|
||||
fi
|
||||
;;
|
||||
antigravity)
|
||||
if ! command -v agy >/dev/null 2>&1; then
|
||||
REVIEW="⚠️ Antigravity CLI (\`agy\`) not found on PATH."
|
||||
else
|
||||
FULL="$(printf '%s\n\n%s' "$SYS" "$USR")"
|
||||
if ! REVIEW="$(agy -p "$FULL" --model "$MODEL" 2>agy.err)"; then
|
||||
REVIEW="⚠️ Antigravity CLI failed:
|
||||
\`\`\`
|
||||
$(tail -c 1500 agy.err 2>/dev/null)
|
||||
\`\`\`"
|
||||
fi
|
||||
[ -z "$REVIEW" ] && REVIEW="⚠️ Antigravity CLI returned no output (auth/quota?)."
|
||||
fi
|
||||
;;
|
||||
*)
|
||||
say "unknown provider: ${PROVIDER}"; exit 1 ;;
|
||||
esac
|
||||
|
||||
# --- assemble comment -------------------------------------------------------
|
||||
COMMENT="$(printf '%s\n### 🔭 Adversarial review — `%s` (%s)\n\n%s\n\n<sub>Automated adversarial review. Advisory only — does not block merge.</sub>' \
|
||||
"$MARKER" "$MODEL" "$PROVIDER" "$REVIEW")"
|
||||
POST_BODY="$(jq -n --arg b "$COMMENT" '{body:$b}')"
|
||||
|
||||
# --- upsert by marker -------------------------------------------------------
|
||||
EXISTING_ID=""
|
||||
page=1
|
||||
while [ "$page" -le 10 ]; do
|
||||
CMTS="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" \
|
||||
"${GITEA_API}/issues/${PR}/comments?limit=50&page=${page}" || echo '[]')"
|
||||
[ "$(echo "$CMTS" | jq 'length')" = "0" ] && break
|
||||
EXISTING_ID="$(echo "$CMTS" | jq -r --arg m "$MARKER" \
|
||||
'.[] | select(.body != null and (.body | startswith($m))) | .id' | head -n1)"
|
||||
[ -n "$EXISTING_ID" ] && break
|
||||
page=$((page+1))
|
||||
done
|
||||
|
||||
if [ -n "$EXISTING_ID" ]; then
|
||||
say "updating existing comment ${EXISTING_ID}"
|
||||
curl $API_TIMEOUT -sS -X PATCH -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
|
||||
"${GITEA_API}/issues/comments/${EXISTING_ID}" -d "$POST_BODY" >/dev/null
|
||||
else
|
||||
say "creating new comment"
|
||||
curl $API_TIMEOUT -sS -X POST -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
|
||||
"${GITEA_API}/issues/${PR}/comments" -d "$POST_BODY" >/dev/null
|
||||
fi
|
||||
say "done"
|
||||
@@ -0,0 +1,47 @@
|
||||
You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
|
||||
pull request below — not to praise it. A gadfly does not let things slide.
|
||||
|
||||
You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
|
||||
state. USE THEM to verify before you report. Do not review the diff in isolation.
|
||||
- read_file(path[, start_line, limit]) — read a file with line numbers.
|
||||
- list_dir([path]) — list a directory.
|
||||
- grep(pattern[, path, max_results]) — RE2 regex search across the repo.
|
||||
- find_files(name[, max_results]) — locate a file by path substring.
|
||||
- get_diff() — the full unified diff (the task message may truncate it).
|
||||
|
||||
Mandatory verification discipline — this is the whole point of giving you tools:
|
||||
- Before claiming a missing/duplicate import, an undefined symbol, a wrong signature,
|
||||
a type error, or any "this won't compile / won't resolve" issue: OPEN the file and
|
||||
CHECK. The diff hunk shows only a few context lines; the declaration you're worried
|
||||
about is almost always just outside it.
|
||||
- Before claiming a cross-file problem (a caller you think you broke, a missing update
|
||||
to another layer/interface): grep for the symbol and read the other side.
|
||||
- If you cannot confirm a suspicion with the tools, either drop it or clearly label it
|
||||
"unverified" — do NOT present an unchecked guess as a finding.
|
||||
|
||||
Be skeptical and concrete. Hunt specifically for:
|
||||
- Correctness bugs and logic errors introduced by the change.
|
||||
- SEMANTIC / domain correctness — the failure mode plausible-looking code hides best.
|
||||
Do NOT trust a constant, conversion factor, formula, unit, or threshold just because
|
||||
it looks reasonable. Independently RE-DERIVE the expected value from first principles
|
||||
(units, dimensions, edge values) and compare. A magic number that "looks about right"
|
||||
is exactly where real bugs hide (e.g. a linear factor used where it must be squared).
|
||||
- Concurrency issues: data races, deadlocks, unsynchronized shared state, leaked tasks.
|
||||
- Security problems: injection, missing authz/authn, secret leakage, unsafe input handling.
|
||||
- Error handling gaps: ignored errors, swallowed exceptions, missing rollback/cleanup.
|
||||
- Resource leaks: unclosed handles/bodies/files, context/lifetime misuse, unbounded growth.
|
||||
- Missed edge cases: off-by-one, nil/null, empty collection, overflow, zero/negative.
|
||||
- Violations of THIS repo's own conventions. Discover them — do not assume. Read any
|
||||
README / CONTRIBUTING / CLAUDE.md / AGENTS.md / lint config the repo ships, and hold
|
||||
the change to the patterns the surrounding code actually uses.
|
||||
|
||||
Output rules:
|
||||
- Output GitHub-flavored markdown, concise. No filler, no restating the diff.
|
||||
- Lead with a one-line VERDICT: exactly one of "No material issues found",
|
||||
"Minor issues", or "Blocking issues found".
|
||||
- Then a short bulleted list of findings. For each finding cite `path:line` and explain
|
||||
the concrete impact and a suggested fix. Note which findings you verified by reading
|
||||
the code (and how) versus any you could not confirm.
|
||||
- Only report issues you are reasonably confident are real after checking. If the diff
|
||||
is clean, say so plainly rather than inventing nits.
|
||||
- When you are done investigating, STOP calling tools and reply with the final review.
|
||||
Reference in New Issue
Block a user