Gadfly: agentic adversarial PR reviewer (initial extraction)

Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in
Gitea Actions: reads the checked-out repo with read-only tools
(read_file/grep/find_files/get_diff), verifies findings before reporting,
two-pass review + adversarial recheck, posts one labeled comment per model.
Advisory only.

- cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo
- entrypoint.sh: the container brains (trigger gating, PR clone, model loop; logic kept out of YAML)
- Dockerfile: multi-stage; build-time module token never reaches the final image
- .gitea/workflows/build-image.yml: tag v* → build & push image
- examples/: ~15-line consumer stub
- system prompt genericized + hardened to re-derive constants/formulas (semantic bugs)

Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 18:42:20 -04:00
commit c0d0152a34
18 changed files with 1879 additions and 0 deletions
@@ -0,0 +1,6 @@
+.git
+*.orig.*
+*.orig
+README.md
+examples
+testdata
@@ -0,0 +1,47 @@
+name: Build & push image
+
+# Builds the Gadfly reviewer container and pushes it to the Gitea container
+# registry. Tag a release (v1, v1.2.0, …) to publish that version + :latest.
+# A manual workflow_dispatch run publishes dev-<sha8> and also moves :latest.
+#
+# Required repo secrets:
+#   REGISTRY_USER / REGISTRY_PASSWORD  Gitea creds with registry push + read
+#                                      access to the private majordomo module.
+
+on:
+  push:
+    tags: ["v*"]
+  workflow_dispatch: {}
+
+env:
+  IMAGE: gitea.stevedudenhoeffer.com/steve/gadfly
+
+jobs:
+  image:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Log in to the registry
+        run: |
+          echo "${{ secrets.REGISTRY_PASSWORD }}" \
+            | docker login gitea.stevedudenhoeffer.com -u "${{ secrets.REGISTRY_USER }}" --password-stdin
+
+      - name: Resolve tags
+        id: tags
+        run: |
+          if [ "${{ github.ref_type }}" = "tag" ]; then
+            echo "version=${{ github.ref_name }}" >> "$GITHUB_OUTPUT"
+          else
+            echo "version=dev-$(echo ${{ github.sha }} | cut -c1-8)" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Build & push
+        run: |
+          docker build \
+            --build-arg GIT_USER="${{ secrets.REGISTRY_USER }}" \
+            --build-arg GIT_TOKEN="${{ secrets.REGISTRY_PASSWORD }}" \
+            -t "${IMAGE}:${{ steps.tags.outputs.version }}" \
+            -t "${IMAGE}:latest" \
+            .
+          docker push "${IMAGE}:${{ steps.tags.outputs.version }}"
+          docker push "${IMAGE}:latest"
@@ -0,0 +1,3 @@
+/gadfly
+/out
+*.orig
@@ -0,0 +1,30 @@
+# syntax=docker/dockerfile:1
+#
+# Multi-stage so the private-module access token used to fetch the majordomo
+# dependency lives ONLY in the build stage and never lands in the final image.
+
+FROM golang:1.26 AS build
+ARG GIT_HOST=gitea.stevedudenhoeffer.com
+ARG GIT_USER=
+ARG GIT_TOKEN=
+ENV CGO_ENABLED=0 \
+    GOFLAGS=-mod=mod \
+    GOSUMDB=off
+ENV GOPRIVATE=${GIT_HOST}/* GONOSUMDB=${GIT_HOST}/*
+WORKDIR /src
+# Private Go module access (majordomo). Token is confined to this stage.
+RUN if [ -n "$GIT_TOKEN" ]; then \
+      git config --global url."https://${GIT_USER}:${GIT_TOKEN}@${GIT_HOST}/".insteadOf "https://${GIT_HOST}/"; \
+    fi
+COPY go.mod go.sum ./
+RUN go mod download
+COPY . .
+RUN go build -trimpath -ldflags="-s -w" -o /out/gadfly ./cmd/gadfly
+
+FROM alpine:3.20
+RUN apk add --no-cache bash git curl jq ca-certificates
+COPY --from=build /out/gadfly /usr/local/bin/gadfly
+COPY scripts /app/scripts
+COPY entrypoint.sh /entrypoint.sh
+RUN chmod +x /entrypoint.sh /app/scripts/run.sh /usr/local/bin/gadfly
+ENTRYPOINT ["/entrypoint.sh"]
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Steve Dudenhoeffer
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,101 @@
+# 🪰 Gadfly
+
+**An AI gadfly for your pull requests.** Gadfly is an *adversarial* code reviewer that
+runs in Gitea Actions: on every PR it reads your actual repository, hunts for real
+problems, verifies them against the code, and posts its findings as a comment. It does not
+praise your code. A gadfly does not let things slide.
+
+> ### 🤖 Heads up: this is a vibe-coded project
+> Gadfly was built almost entirely by an AI agent (Claude Code), prompts and all — the
+> reviewer's "brain" is a language model, and so was most of the author. It works and it's
+> tested, but treat it accordingly: **it is advisory only, it never blocks a merge, and you
+> should still review its reviews.** Issues and PRs welcome; expect the occasional
+> AI-flavored rough edge.
+
+## What makes it different
+
+Most LLM "review my diff" bots read the diff in isolation and hallucinate problems they
+can't actually see — a "missing import" that's three lines above the hunk, a "broken
+caller" in a file they never opened. Gadfly is **agentic**: the model has read-only tools
+over the checked-out repo and is *required* to use them before reporting anything.
+
+- **Tools:** `read_file`, `list_dir`, `grep`, `find_files`, `get_diff`.
+- **Verify-before-claiming discipline:** baked into the system prompt — open the file,
+  grep the symbol, or drop the finding.
+- **Two passes:** a *review* pass drafts findings, then an adversarial *recheck* pass
+  independently re-verifies each one against the code and drops the ones it can't confirm,
+  recomputing the verdict. This is what kills "confident but wrong."
+- **Semantic-bug hunting:** it's told not to trust a plausible-looking constant, conversion
+  factor, or formula — re-derive the expected value, because that's where real bugs hide.
+
+Every review leads with a one-line verdict: **No material issues found**, **Minor issues**,
+or **Blocking issues found**.
+
+## Turn it on for a repo
+
+Gadfly ships as a container image, so consuming repos don't build anything — they just run
+it. Drop one file in your repo and set a couple of secrets/vars:
+
+1. Copy [`examples/adversarial-review.yml`](examples/adversarial-review.yml) to
+   `.gitea/workflows/adversarial-review.yml` in your repo.
+2. Add repo config:
+   - **secret** `OLLAMA_CLOUD_API_KEY` — your [Ollama Cloud](https://ollama.com) key (empty
+     ⇒ Gadfly posts a harmless "not configured" notice instead of reviewing).
+   - **var** `OLLAMA_REVIEW_MODELS` *(optional)* — comma-separated model ids
+     (default `qwen3-coder:480b-cloud,gpt-oss:120b-cloud`). One comment per model.
+   - **var** `GADFLY_ALLOWED_USERS` *(optional)* — who may re-trigger via comment; empty ⇒
+     any repo collaborator.
+
+`GITEA_TOKEN` is provided automatically by Actions; comments post as the `gitea-actions`
+user, scoped to that repo — no bot account needed.
+
+### Triggers
+
+1. A **new/reopened/ready** non-draft PR — automatic.
+2. Commenting **`@gadfly review`** on a PR — re-review on demand (gated to allowed users).
+3. **workflow_dispatch** — manual, with a `pr_number` input.
+
+(Pushing new commits does *not* auto-re-review — comment `@gadfly review` after pushing
+fixes. This keeps usage down.)
+
+## How it's packaged
+
+```
+cmd/gadfly/            the agentic reviewer binary (majordomo + Ollama Cloud); zero deps beyond stdlib + majordomo
+scripts/run.sh         fetches the PR diff, runs the reviewer, upserts one labeled comment
+scripts/system-prompt.txt  the reviewer persona + verification discipline
+entrypoint.sh          the container brains: trigger gating, clone, model loop (logic lives here, not in YAML)
+Dockerfile             multi-stage; the build-time module token never reaches the final image
+.gitea/workflows/build-image.yml   tags v* → build & push the image
+examples/              the ~15-line stub a consuming repo drops in
+```
+
+The image is published to `gitea.stevedudenhoeffer.com/steve/gadfly`. Push a `v*` tag to
+build and publish a new version (and `:latest`).
+
+## Configuration (advanced)
+
+The reviewer binary and the container entrypoint read these (the stub/entrypoint set sane defaults):
+
+| Env | Default | Meaning |
+|-----|---------|---------|
+| `OLLAMA_API_KEY` | — | Ollama Cloud bearer key (required for real reviews) |
+| `GADFLY_MODEL` | — | model id |
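+| `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
+| `GADFLY_WRAPUP_RESERVE` | 4 | steps before the cap at which the agent is told to wrap up |
+| `GADFLY_RECHECK` | on | set `0`/`false`/`no`/`off` to skip the recheck pass |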
+| `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
+| `GADFLY_TIMEOUT_SECS` | 300 | overall deadline (both passes) |
+| `GADFLY_MAX_DIFF_CHARS` | 60000 | diff chars embedded in the prompt (full diff via `get_diff`) |
+| `GADFLY_TRIGGER_PHRASE` | `@gadfly review` | comment phrase that re-triggers |
+| `GADFLY_ALLOWED_USERS` | *(collaborators)* | comma-separated allow-list for comment triggers |
+
+## Building locally
+
+```sh
+go build ./cmd/gadfly      # needs read access to the private majordomo module
+go test ./...
+```
+
+## License
+
+MIT — see [LICENSE](LICENSE).
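Reviewer note on the README's trigger rules: the comment trigger combines a phrase match with an allow-list that falls back to "any repo collaborator" when empty. The real gating lives in entrypoint.sh, which is not shown in this part of the diff, so the following is only a minimal Go sketch of the rule as the README states it. The helper name `shouldTrigger`, its parameters, and the case-insensitive matching are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldTrigger is a hypothetical helper (not part of Gadfly) that models the
// README's comment-trigger rule: the comment must contain the trigger phrase,
// and the author must be on the allow-list. An empty allow-list falls back to
// "any repo collaborator". Case-insensitive matching is an assumption.
func shouldTrigger(comment, author, phrase, allowedCSV string, isCollaborator bool) bool {
	if !strings.Contains(strings.ToLower(comment), strings.ToLower(phrase)) {
		return false
	}
	if strings.TrimSpace(allowedCSV) == "" {
		return isCollaborator
	}
	for _, u := range strings.Split(allowedCSV, ",") {
		u = strings.TrimSpace(u)
		if u == "" {
			continue // ignore empty entries such as a trailing comma
		}
		if strings.EqualFold(u, author) {
			return true
		}
	}
	return false
}

func main() {
	const phrase = "@gadfly review"
	fmt.Println(shouldTrigger("please @gadfly review", "alice", phrase, "alice,bob", false))
	fmt.Println(shouldTrigger("@gadfly review", "mallory", phrase, "alice,bob", true))
	fmt.Println(shouldTrigger("@gadfly review", "carol", phrase, "", true))
}
```

In this sketch an explicit allow-list overrides collaborator status, which is one reading of "gated to allowed users"; the shell implementation may differ.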
@@ -0,0 +1,287 @@
+// Command gadfly is the agentic backend for the PR adversarial-review
+// workflow (examples/adversarial-review.yml, installed by consuming repos as
+// .gitea/workflows/adversarial-review.yml). Unlike the old
+// one-shot chat call, it runs a tool-using agent (majordomo + Ollama Cloud)
+// over the PR's CHECKED-OUT repository: the model can read_file / list_dir /
+// grep / find_files / get_diff to VERIFY a finding before reporting it, which
+// kills the "diff-only" false positives (claiming a missing import or a
+// non-existent method it simply couldn't see).
+//
+// It is a pure producer of review text: it reads the diff + the repo and
+// prints the review markdown to stdout. All Gitea I/O (fetching the diff,
+// upserting the comment) stays in run.sh, so this binary needs no repo write
+// access and is straightforward to unit-test.
+//
+// Two passes (unless the draft is a clean "no material issues" pass): a
+// REVIEW pass produces a draft, then an adversarial RECHECK pass independently
+// re-verifies every finding against the actual files with the same tools and
+// drops the ones it cannot confirm, recomputing the verdict. This catches the
+// "confident but wrong" findings that survive a single pass — e.g. claiming an
+// env var is unset when a wrapper script sets it (see recheck.go).
+//
+// Inputs (env):
+//
+//	OLLAMA_API_KEY           Ollama Cloud bearer key (required).
+//	GADFLY_MODEL             model id, e.g. "qwen3-coder:480b-cloud" (required).
+//	GADFLY_REPO_DIR          path to the checked-out repo (required; the FS sandbox root).
+//	GADFLY_DIFF_FILE         path to a file holding the full unified diff (required).
+//	GADFLY_SYSTEM_FILE       path to the reviewer system prompt (required).
+//	GADFLY_TITLE             PR title (optional).
+//	GADFLY_BODY              PR description (optional).
+//	GADFLY_MAX_STEPS         review-pass step cap (optional, default 24).
+//	GADFLY_WRAPUP_RESERVE    steps before the cap at which the agent is told to
+//	                         stop investigating and write its answer (optional,
+//	                         default 4). If a pass still ends without output, a
+//	                         tool-free finalization call makes one more attempt.
+//	GADFLY_RECHECK           set to 0/false/no/off to skip the recheck pass (optional, default on).
+//	GADFLY_RECHECK_MAX_STEPS recheck-pass step cap (optional, default 16).
+//	GADFLY_TIMEOUT_SECS      overall deadline in seconds, shared by both passes (optional, default 300).
+//	GADFLY_MAX_DIFF_CHARS    diff chars embedded in the prompt (optional, default 60000;
+//	                         the full diff is always available via the get_diff tool).
+//
+// On success it prints the review to stdout and exits 0. On a usage/config or
+// model error it prints a diagnostic to stderr and exits non-zero; run.sh then
+// posts a "reviewer failed" notice (advisory — never fails the CI job).
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"os"
+	"strconv"
+	"strings"
+	"time"
+
+	"gitea.stevedudenhoeffer.com/steve/majordomo/agent"
+	llm "gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+	"gitea.stevedudenhoeffer.com/steve/majordomo/provider/ollama"
+)
+
+const (
+	defaultMaxSteps     = 24
+	defaultTimeoutSecs  = 300
+	defaultMaxDiffChars = 60000
+	// defaultWrapUpReserve is how many steps before the cap the agent is told
+	// to stop investigating and write its final answer. Reserving a margin is
+	// what keeps a thorough reviewer from spending its whole budget on tool
+	// calls and then hard-failing with "max steps reached without a final
+	// answer" — it always has a few steps left to wrap up.
+	defaultWrapUpReserve = 4
+)
+
+// wrapUpInstruction is steered into a running agent once it comes within the
+// wrap-up reserve of its step cap: a forceful nudge to stop calling tools and
+// emit the final answer using only what it has already gathered.
+const wrapUpInstruction = "⚠️ You are almost out of your investigation budget — only a few tool steps remain. " +
+	"STOP calling tools now and write your FINAL answer immediately, using only what you have already verified. " +
+	"Do not begin any new investigation. If a finding could not be confirmed, drop it or mark it explicitly as unverified. " +
+	"Output the review in the required format right now."
+
+// finalizeInstruction is the user message sent on the tool-free fallback pass
+// when the agent exhausted its budget (or tripped a loop guard) without ever
+// producing a final answer. It forces the model to synthesize whatever it has.
+const finalizeInstruction = "You have run out of investigation steps. Do NOT call any tools. " +
+	"Based solely on what you have already gathered above, write your final answer now in the required format. " +
+	"If you could not confirm some findings, omit them or mark them as unverified, but produce the answer."
+
+func main() {
+	if err := run(); err != nil {
+		fmt.Fprintln(os.Stderr, "gadfly:", err)
+		os.Exit(1)
+	}
+}
+
+func run() error {
+	apiKey := os.Getenv("OLLAMA_API_KEY")
+	if apiKey == "" {
+		return errors.New("OLLAMA_API_KEY is required")
+	}
+	model := os.Getenv("GADFLY_MODEL")
+	repoDir := os.Getenv("GADFLY_REPO_DIR")
+	diffFile := os.Getenv("GADFLY_DIFF_FILE")
+	systemFile := os.Getenv("GADFLY_SYSTEM_FILE")
+	if model == "" || repoDir == "" || diffFile == "" || systemFile == "" {
+		return errors.New("GADFLY_MODEL, GADFLY_REPO_DIR, GADFLY_DIFF_FILE and GADFLY_SYSTEM_FILE are all required")
+	}
+
+	diffBytes, err := os.ReadFile(diffFile)
+	if err != nil {
+		return fmt.Errorf("read diff file: %w", err)
+	}
+	diff := string(diffBytes)
+	if strings.TrimSpace(diff) == "" {
+		return errors.New("empty diff; nothing to review")
+	}
+
+	systemBytes, err := os.ReadFile(systemFile)
+	if err != nil {
+		return fmt.Errorf("read system prompt: %w", err)
+	}
+
+	fsTools, err := newRepoFS(repoDir, diff)
+	if err != nil {
+		return err
+	}
+
+	mdl, err := ollama.Cloud(ollama.WithToken(apiKey)).Model(model)
+	if err != nil {
+		return fmt.Errorf("build model %q: %w", model, err)
+	}
+
+	timeout := time.Duration(envInt("GADFLY_TIMEOUT_SECS", defaultTimeoutSecs)) * time.Second
+	ctx, cancel := context.WithTimeout(context.Background(), timeout)
+	defer cancel()
+
+	// Pass 1 — review: produce the draft.
+	draft, err := runAgent(ctx, mdl, fsTools, string(systemBytes), buildTask(diff),
+		envInt("GADFLY_MAX_STEPS", defaultMaxSteps))
+	if err != nil {
+		return fmt.Errorf("review pass: %w", err)
+	}
+
+	// Pass 2 — recheck: adversarially re-verify the draft's findings and drop
+	// the unconfirmed ones. Skipped for a clean draft (nothing to verify) or
+	// when disabled. A recheck failure is non-fatal — we emit the unverified
+	// draft rather than losing the review entirely.
+	final := draft
+	if shouldRecheck(draft) {
+		rechecked, rerr := runAgent(ctx, mdl, fsTools, recheckSystemPrompt, buildRecheckTask(draft, diff),
+			envInt("GADFLY_RECHECK_MAX_STEPS", defaultRecheckMaxSteps))
+		if rerr != nil {
+			fmt.Fprintln(os.Stderr, "gadfly: recheck pass failed; emitting unverified draft:", rerr)
+		} else {
+			final = rechecked
+		}
+	}
+
+	fmt.Println(final)
+	return nil
+}
+
+// runAgent runs one agent pass (its own fresh toolbox over the sandbox) and
+// returns the final answer. An empty answer is an error — the caller decides
+// whether that is fatal (review pass) or recoverable (recheck pass). A
+// non-empty answer that ended on a budget/guard error is still returned: the
+// model wrote its output, then ran out of steps.
+//
+// Two mechanisms keep a step-hungry model from hard-failing with no output:
+//  1. A wrap-up steer: once the run comes within wrapUpReserve steps of the
+//     cap, a forceful "stop calling tools, write your final answer" message is
+//     injected so the model spends its remaining steps finalizing.
+//  2. A finalization fallback: if the loop still ends empty (the model ignored
+//     the nudge, or a loop guard tripped), one tool-free model call forces a
+//     final answer out of the transcript already gathered.
+func runAgent(ctx context.Context, mdl llm.Model, fsTools *repoFS, system, task string, maxSteps int) (string, error) {
+	box, err := fsTools.toolbox()
+	if err != nil {
+		return "", err
+	}
+	loop := agent.New(mdl, system,
+		agent.WithToolbox(box),
+		agent.WithMaxSteps(maxSteps),
+		// Guard rails: stop the model from spinning on failing or identical
+		// tool calls instead of writing its answer.
+		agent.WithToolErrorLimits(4, 4),
+	)
+
+	wrapUpAt := maxSteps - wrapUpReserve()
+	if wrapUpAt < 1 {
+		wrapUpAt = 1
+	}
+	var completed int // steps finished so far (updated after each step)
+	nudged := false
+
+	res, runErr := loop.Run(ctx, task,
+		agent.OnStep(func(s agent.Step) { completed = s.Index + 1 }),
+		agent.WithSteer(func() []llm.Message {
+			if !nudged && completed >= wrapUpAt {
+				nudged = true
+				return []llm.Message{llm.UserText(wrapUpInstruction)}
+			}
+			return nil
+		}),
+	)
+
+	out := ""
+	if res != nil {
+		out = strings.TrimSpace(res.Output)
+	}
+	if out != "" {
+		return out, nil
+	}
+
+	// No final answer. If we still have budget on the clock and a transcript to
+	// work from, force a tool-free finalization rather than losing the pass.
+	if res != nil && len(res.Messages) > 0 && ctx.Err() == nil {
+		if forced := forceFinalAnswer(ctx, mdl, system, res.Messages); forced != "" {
+			return forced, nil
+		}
+	}
+
+	if runErr != nil {
+		return "", runErr
+	}
+	return "", errors.New("agent produced no output")
+}
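Reviewer note on the wrap-up steer in runAgent: the threshold arithmetic and the fire-once flag are easy to check in isolation. The sketch below reproduces only that logic, with no majordomo dependency. The `nudger` type is a hypothetical stand-in for the steer closure, and it assumes the agent loop consults the steer callback once between steps, which this diff does not confirm.

```go
package main

import "fmt"

// wrapUpAt mirrors the threshold computed in runAgent: the nudge becomes due
// once the number of completed steps reaches maxSteps-reserve, clamped so the
// threshold is never below 1.
func wrapUpAt(maxSteps, reserve int) int {
	at := maxSteps - reserve
	if at < 1 {
		at = 1
	}
	return at
}

// nudger is a hypothetical stand-in for the steer closure: check returns true
// exactly once, the first time completed reaches the threshold.
type nudger struct {
	at     int
	nudged bool
}

func (n *nudger) check(completed int) bool {
	if !n.nudged && completed >= n.at {
		n.nudged = true
		return true
	}
	return false
}

func main() {
	n := &nudger{at: wrapUpAt(24, 4)}
	for completed := 0; completed <= 24; completed++ {
		if n.check(completed) {
			fmt.Printf("wrap-up nudge due after %d completed steps\n", completed)
		}
	}
	// A reserve at or above the cap clamps to 1, so the nudge is due after
	// the very first step.
	fmt.Println(wrapUpAt(3, 4))
}
```

With the defaults (24 steps, reserve 4) the nudge is due after 20 completed steps. For the recheck pass (16 steps) the same reserve leaves 12 steps of investigation.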
+
+// forceFinalAnswer makes one tool-free model call to squeeze a final answer out
+// of an agent that exhausted its step budget without producing one. Tools are
+// forbidden (ToolChoice "none") so the model must synthesize from the transcript
+// instead of investigating further. Best-effort: any error or empty reply
+// returns "" and the caller falls back to its normal empty-output handling.
+func forceFinalAnswer(ctx context.Context, mdl llm.Model, system string, transcript []llm.Message) string {
+	msgs := append(append([]llm.Message(nil), transcript...), llm.UserText(finalizeInstruction))
+	resp, err := mdl.Generate(ctx, llm.Request{
+		System:     system,
+		Messages:   msgs,
+		ToolChoice: "none",
+	})
+	if err != nil || resp == nil {
+		return ""
+	}
+	return strings.TrimSpace(resp.Text())
+}
+
+// wrapUpReserve is how many steps before the cap the wrap-up nudge fires,
+// overridable via GADFLY_WRAPUP_RESERVE.
+func wrapUpReserve() int {
+	return envInt("GADFLY_WRAPUP_RESERVE", defaultWrapUpReserve)
+}
+
+// buildTask assembles the user message: PR metadata plus the unified diff,
+// truncated for the prompt (the full diff stays available via get_diff).
+func buildTask(diff string) string {
+	title := os.Getenv("GADFLY_TITLE")
+	body := os.Getenv("GADFLY_BODY")
+
+	maxDiff := envInt("GADFLY_MAX_DIFF_CHARS", defaultMaxDiffChars)
+	truncNote := ""
+	if maxDiff > 0 && len(diff) > maxDiff {
+		diff = diff[:maxDiff]
+		truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars in this message; call get_diff for the full text.]", maxDiff)
+	}
+
+	var b strings.Builder
+	if title != "" {
+		fmt.Fprintf(&b, "PR title: %s\n\n", title)
+	}
+	if strings.TrimSpace(body) != "" {
+		fmt.Fprintf(&b, "PR description:\n%s\n\n", body)
+	}
+	b.WriteString("Review the following unified diff. Before reporting any cross-file or compile-correctness issue, use your tools (read_file, grep, find_files) to verify it against the actual checked-out code — do not rely on the diff alone.\n\n")
+	fmt.Fprintf(&b, "```diff\n%s\n```%s", diff, truncNote)
+	return b.String()
+}
+
+// envInt reads an integer env var, falling back to def when it is unset,
+// unparseable, or not a positive number.
+func envInt(name string, def int) int {
+	v := strings.TrimSpace(os.Getenv(name))
+	if v == "" {
+		return def
+	}
+	n, err := strconv.Atoi(v)
+	if err != nil || n <= 0 {
+		return def
+	}
+	return n
+}
@@ -0,0 +1,97 @@
+package main
+
+import (
+	"fmt"
+	"os"
+	"strings"
+)
+
+// defaultRecheckMaxSteps bounds the verification pass. It is smaller than the
+// review pass: re-checking a handful of existing findings needs fewer steps
+// than discovering them.
+const defaultRecheckMaxSteps = 16
+
+// recheckSystemPrompt drives the second, adversarial verification pass. The
+// model is given a DRAFT review and must independently confirm each finding
+// against the real code before letting it survive — the antidote to a
+// single-pass reviewer that reads a couple of files, mis-connects them, and
+// posts a confident but wrong "blocking" verdict.
+const recheckSystemPrompt = `You are a VERIFICATION GATE for an automated adversarial code review of the
+checked-out repository. You are given a DRAFT review produced
+by another model. Your job is NOT to write a new review — it is to confirm or
+reject each finding in the draft against the ACTUAL code, then output the
+corrected review.
+
+You have the same read-only repository tools as the original reviewer:
+- read_file(path[, start_line, limit]), list_dir([path]), grep(pattern[, path,
+  max_results]), find_files(name[, max_results]), get_diff().
+
+For EVERY finding in the draft:
+1. Independently reproduce the reasoning by reading the actual files with your
+   tools — do not trust the draft's claim, and do not trust the diff hunk alone.
+2. KEEP the finding only if you can positively confirm it against the code.
+3. DROP the finding if you cannot confirm it, or if the code contradicts it.
+
+Watch especially for findings that ignore the "glue" around a change — the most
+common false positive. Before keeping a claim that something is "missing",
+"undefined", "never set", "not exported", or "won't compile", GREP THE WHOLE
+REPO for it: the thing is very often satisfied in a place the original reviewer
+didn't look — a shell script or Makefile that sets an env var, a CI YAML, an
+adjacent file, generated code, or a wrapper that maps one name to another. A
+finding that an env var X is unset is wrong if any script invokes the program
+with "X=... prog". Check before you keep.
+
+Output rules:
+- Output the corrected review in the SAME format as the draft: a one-line
+  VERDICT ("No material issues found", "Minor issues", or "Blocking issues
+  found"), then the surviving findings as bullets with path:line and impact.
+- Recompute the VERDICT from what SURVIVES. If every finding was dropped, the
+  verdict is "No material issues found".
+- Do NOT invent new findings; this is a verification gate, not a fresh review.
+- Do NOT include meta-commentary about the verification process or which
+  findings you dropped — output only the final, corrected review markdown.
+- When done investigating, STOP calling tools and reply with the review.`
+
+// recheckEnabled reports whether the verification pass should run. On unless
+// GADFLY_RECHECK is explicitly a falsey value.
+func recheckEnabled() bool {
+	switch strings.ToLower(strings.TrimSpace(os.Getenv("GADFLY_RECHECK"))) {
+	case "0", "false", "no", "off":
+		return false
+	default:
+		return true
+	}
+}
+
+// shouldRecheck decides whether to run the verification pass for a given draft.
+// A clean "no material issues" draft has nothing to verify, so it is skipped
+// even when rechecking is enabled — saving a whole model pass on clean PRs.
+func shouldRecheck(draft string) bool {
+	if !recheckEnabled() {
+		return false
+	}
+	if strings.Contains(strings.ToLower(draft), "no material issues") {
+		return false
+	}
+	return true
+}
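Reviewer note on shouldRecheck: the clean-draft test matches "no material issues" anywhere in the draft, so a draft whose findings merely quote that phrase would skip the recheck. One possible tightening, sketched below, looks only at the first line that is neither blank nor a markdown heading. `isCleanVerdict` is a hypothetical helper, and the assumption that the verdict is the first such line comes from the README's "leads with a one-line verdict", not from the system prompt itself.

```go
package main

import (
	"fmt"
	"strings"
)

// isCleanVerdict is a hypothetical alternative to the substring check in
// shouldRecheck. It inspects only the first line that is neither blank nor a
// markdown heading, on the assumption that the verdict leads the review.
func isCleanVerdict(draft string) bool {
	for _, line := range strings.Split(draft, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		return strings.Contains(strings.ToLower(line), "no material issues")
	}
	return false // empty draft: nothing that counts as a clean verdict
}

func main() {
	fmt.Println(isCleanVerdict("### review\n\nNo material issues found.\n"))
	fmt.Println(isCleanVerdict("VERDICT: Blocking issues found\n- docs claim there are no material issues"))
}
```

The trade-off is that a model that buries its verdict below a preamble would always be rechecked, which costs a model pass but never hides a finding.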
+
+// buildRecheckTask is the verification pass's user message: the draft review to
+// scrutinize, with the full diff available via get_diff (and embedded here,
+// truncated, to save a tool call).
+func buildRecheckTask(draft, diff string) string {
+	maxDiff := envInt("GADFLY_MAX_DIFF_CHARS", defaultMaxDiffChars)
+	truncNote := ""
+	if maxDiff > 0 && len(diff) > maxDiff {
+		diff = diff[:maxDiff]
+		truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars here; call get_diff for the full text.]", maxDiff)
+	}
+
+	var b strings.Builder
+	b.WriteString("Verify the following DRAFT review against the actual code, drop every finding you cannot confirm, and output the corrected review.\n\n")
+	b.WriteString("## Draft review\n\n")
+	b.WriteString(draft)
+	b.WriteString("\n\n## PR diff under review\n\n")
+	fmt.Fprintf(&b, "```diff\n%s\n```%s", diff, truncNote)
+	return b.String()
+}
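Reviewer note on diff truncation: both buildTask and buildRecheckTask cut with `diff[:maxDiff]`, which is a byte offset, so a diff containing multi-byte UTF-8 can end on a split rune. A minimal sketch of a rune-safe cut follows. `truncateUTF8` is a hypothetical helper, not something in this commit, and it keeps the existing convention that a non-positive limit means "do not truncate".

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 is a hypothetical helper: it cuts s to at most max bytes and
// backs up to the nearest rune boundary so the result never ends in a partial
// UTF-8 sequence. The bool reports whether anything was cut.
func truncateUTF8(s string, max int) (string, bool) {
	if max <= 0 || len(s) <= max {
		return s, false
	}
	cut := max
	// s[cut] is the first byte that would be dropped; if it is a continuation
	// byte, the cut falls inside a rune, so step back to that rune's start.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut], true
}

func main() {
	out, cut := truncateUTF8("héllo", 2) // 'é' occupies bytes 1 and 2
	fmt.Printf("%q %v %v\n", out, cut, utf8.ValidString(out))
}
```

The result can be up to three bytes shorter than the limit, which is harmless for a prompt budget.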
@@ -0,0 +1,101 @@
+package main
+
+import (
+	"context"
+	"strings"
+	"testing"
+
+	llm "gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+	"gitea.stevedudenhoeffer.com/steve/majordomo/provider/fake"
+)
+
+func TestShouldRecheck(t *testing.T) {
+	t.Setenv("GADFLY_RECHECK", "") // default on
+
+	if !shouldRecheck("VERDICT: Blocking issues found\n- something is wrong") {
+		t.Error("a draft with findings should be rechecked")
+	}
+	if shouldRecheck("No material issues found.") {
+		t.Error("a clean draft should skip recheck")
+	}
+	if shouldRecheck("### review\n\nNo material issues found.\n") {
+		t.Error("clean draft detection should be case/whitespace tolerant")
+	}
+
+	// Explicit disable wins even when there are findings.
+	t.Setenv("GADFLY_RECHECK", "0")
+	if shouldRecheck("Blocking issues found\n- x") {
+		t.Error("GADFLY_RECHECK=0 must disable recheck")
+	}
+	t.Setenv("GADFLY_RECHECK", "false")
+	if shouldRecheck("Blocking issues found\n- x") {
+		t.Error("GADFLY_RECHECK=false must disable recheck")
+	}
+}
+
+func TestRecheckEnabled(t *testing.T) {
+	for _, v := range []string{"", "1", "true", "yes", "anything"} {
+		t.Setenv("GADFLY_RECHECK", v)
+		if !recheckEnabled() {
+			t.Errorf("GADFLY_RECHECK=%q should be enabled", v)
+		}
+	}
+	for _, v := range []string{"0", "false", "no", "off", "OFF", " False "} {
+		t.Setenv("GADFLY_RECHECK", v)
+		if recheckEnabled() {
+			t.Errorf("GADFLY_RECHECK=%q should be disabled", v)
+		}
+	}
+}
+
+func TestBuildRecheckTask(t *testing.T) {
+	t.Setenv("GADFLY_MAX_DIFF_CHARS", "")
+	draft := "VERDICT: Blocking issues found\n- foo.go:1 broken"
+	out := buildRecheckTask(draft, "diff --git a/x b/x\n+y\n")
+	if !strings.Contains(out, draft) {
+		t.Error("recheck task must include the draft review")
+	}
+	if !strings.Contains(out, "Verify") || !strings.Contains(out, "drop every finding you cannot confirm") {
+		t.Errorf("recheck task missing the verify instruction:\n%s", out)
+	}
+	if !strings.Contains(out, "diff --git") {
+		t.Error("recheck task should include the diff")
+	}
+}
+
+// fakeModel builds a fake majordomo model that always replies with the given
+// text (no tool calls), so the agent loop ends on its first step.
+func fakeModel(t *testing.T, reply string) llm.Model {
+	t.Helper()
+	p := fake.New("fake", fake.WithDefault(func(string, llm.Request) fake.Step {
+		return fake.Reply(reply)
+	}))
+	m, err := p.Model("mock")
+	if err != nil {
+		t.Fatal(err)
+	}
+	return m
+}
+
+func TestRunAgent_ReturnsOutput(t *testing.T) {
+	fs, err := newRepoFS(t.TempDir(), "diff")
+	if err != nil {
+		t.Fatal(err)
+	}
+	mdl := fakeModel(t, "  corrected review: No material issues found.  ")
+	out, err := runAgent(context.Background(), mdl, fs, "sys", "task", 4)
+	if err != nil {
+		t.Fatalf("runAgent: %v", err)
+	}
+	if out != "corrected review: No material issues found." {
+		t.Errorf("runAgent should return trimmed model output, got %q", out)
+	}
+}
+
+func TestRunAgent_EmptyIsError(t *testing.T) {
+	fs, err := newRepoFS(t.TempDir(), "diff")
+	if err != nil {
+		t.Fatal(err)
+	}
+	mdl := fakeModel(t, "   ")
+	if _, err := runAgent(context.Background(), mdl, fs, "sys", "task", 4); err == nil {
+		t.Error("runAgent should error on empty model output")
+	}
+}
@@ -0,0 +1,388 @@
+package main
+
+import (
+	"bufio"
+	"context"
+	"fmt"
+	"os"
+	"path/filepath"
+	"regexp"
+	"sort"
+	"strings"
+
+	llm "gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+)
+
+// Tool output bounds. The reviewer is a chat agent with a finite context, so
+// every tool caps how much it can pull in one call — a runaway read_file or
+// grep would blow the window and stall the loop.
+const (
+	maxFileBytes   = 64 * 1024 // per read_file call
+	maxReadLines   = 800       // per read_file call
+	maxGrepResults = 200       // per grep call
+	maxFindResults = 200       // per find_files call
+	maxLineLen     = 400       // truncate any single returned line to this
+)
+
+// skipDirs are never descended into by grep / find_files — noise and bulk that
+// a code reviewer never needs and that would swamp the results.
+var skipDirs = map[string]bool{
+	".git":         true,
+	"node_modules": true,
+	"vendor":       true,
+}
+
+// repoFS is a read-only, sandboxed view of the checked-out repository. Every
+// path argument from the model is resolved against root and rejected if it
+// escapes (symlink or `..` traversal), so a hostile diff can never make the
+// reviewer read outside the checkout.
+type repoFS struct {
+	root string // absolute, symlink-resolved repo root
+	diff string // the full PR unified diff (served by get_diff)
+}
+
+// newRepoFS resolves root to an absolute, symlink-free path.
+func newRepoFS(root, diff string) (*repoFS, error) {
+	abs, err := filepath.Abs(root)
+	if err != nil {
+		return nil, fmt.Errorf("resolve repo dir: %w", err)
+	}
+	// EvalSymlinks so prefix containment checks survive a symlinked root
+	// (e.g. macOS /tmp -> /private/tmp).
+	if resolved, err := filepath.EvalSymlinks(abs); err == nil {
+		abs = resolved
+	}
+	info, err := os.Stat(abs)
+	if err != nil {
+		return nil, fmt.Errorf("repo dir %q: %w", root, err)
+	}
+	if !info.IsDir() {
+		return nil, fmt.Errorf("repo dir %q is not a directory", root)
+	}
+	return &repoFS{root: abs, diff: diff}, nil
+}
+
+// resolve maps a model-supplied relative path to an absolute path inside the
+// sandbox, rejecting anything that escapes root. An empty path means root.
+func (r *repoFS) resolve(rel string) (string, error) {
+	rel = strings.TrimSpace(rel)
+	rel = strings.TrimPrefix(rel, "./")
+	if rel == "" || rel == "." {
+		return r.root, nil
+	}
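+	// Lexical check first: reject `..` traversal and out-of-sandbox absolutes.
+	var target string
+	if filepath.IsAbs(rel) {
+		// Allow an absolute path only if it already points inside the sandbox.
+		target = filepath.Clean(rel)
+	} else {
+		target = filepath.Clean(filepath.Join(r.root, rel))
+	}
+	if err := r.contains(target); err != nil {
+		return "", err
+	}
+	// Then re-check with symlinks resolved, so a link committed inside the
+	// checkout cannot point the reviewer at a file outside it. A path that does
+	// not resolve (e.g. it does not exist) is left for the caller's Stat/Open
+	// to report.
+	if resolved, err := filepath.EvalSymlinks(target); err == nil {
+		if err := r.contains(resolved); err != nil {
+			return "", err
+		}
+	}
+	return target, nil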
+}
+
+// contains verifies abs is root or lives beneath it.
+func (r *repoFS) contains(abs string) error {
+	if abs == r.root {
+		return nil
+	}
+	if !strings.HasPrefix(abs, r.root+string(os.PathSeparator)) {
+		return fmt.Errorf("path escapes the repository sandbox")
+	}
+	return nil
+}
+
+// toolbox builds the read-only review toolbox over this sandbox.
+func (r *repoFS) toolbox() (*llm.Toolbox, error) {
+	box := llm.NewToolbox("gadfly")
+	tools := []llm.Tool{
+		r.readFileTool(),
+		r.listDirTool(),
+		r.grepTool(),
+		r.findFilesTool(),
+		r.getDiffTool(),
+	}
+	for _, t := range tools {
+		if err := box.Add(t); err != nil {
+			return nil, fmt.Errorf("add tool %q: %w", t.Name, err)
+		}
+	}
+	return box, nil
+}
+
+type readFileArgs struct {
+	Path      string `json:"path" description:"Repository-relative path of the file to read, e.g. pkg/logic/agentexec/pipeline.go"`
+	StartLine int    `json:"start_line,omitempty" description:"Optional 1-based line to start from (default 1)."`
+	Limit     int    `json:"limit,omitempty" description:"Optional max number of lines to return (default/maximum 800)."`
+}
+
+func (r *repoFS) readFileTool() llm.Tool {
+	return llm.DefineTool[readFileArgs](
+		"read_file",
+		"Read a file from the repository at its current checked-out state, with line numbers. Use this to verify the surrounding code, imports, and symbols a diff hunk touches before reporting an issue.",
+		func(_ context.Context, args readFileArgs) (any, error) {
+			abs, err := r.resolve(args.Path)
+			if err != nil {
+				return nil, err
+			}
+			info, err := os.Stat(abs)
+			if err != nil {
+				return nil, fmt.Errorf("stat %q: %w", args.Path, err)
+			}
+			if info.IsDir() {
+				return nil, fmt.Errorf("%q is a directory; use list_dir", args.Path)
+			}
+			f, err := os.Open(abs)
+			if err != nil {
+				return nil, fmt.Errorf("open %q: %w", args.Path, err)
+			}
+			defer f.Close()
+
+			start := args.StartLine
+			if start < 1 {
+				start = 1
+			}
+			limit := args.Limit
+			if limit <= 0 || limit > maxReadLines {
+				limit = maxReadLines
+			}
+
+			var b strings.Builder
+			sc := bufio.NewScanner(f)
+			sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
+			lineNo := 0
+			emitted := 0
+			for sc.Scan() {
+				lineNo++
+				if lineNo < start {
+					continue
+				}
+				if emitted >= limit || b.Len() >= maxFileBytes {
+					fmt.Fprintf(&b, "... (truncated at line %d; call read_file again with start_line=%d for more)\n", lineNo, lineNo)
+					break
+				}
+				line := sc.Text()
+				if len(line) > maxLineLen {
+					line = line[:maxLineLen] + "…"
+				}
+				fmt.Fprintf(&b, "%d\t%s\n", lineNo, line)
+				emitted++
+			}
+			if err := sc.Err(); err != nil {
+				return nil, fmt.Errorf("read %q: %w", args.Path, err)
+			}
+			if emitted == 0 {
+				return fmt.Sprintf("(%s has no lines at/after %d; file has %d lines)", args.Path, start, lineNo), nil
+			}
+			return b.String(), nil
+		},
+	)
+}
+
+type listDirArgs struct {
+	Path string `json:"path,omitempty" description:"Optional repository-relative directory (default: repo root)."`
+}
+
+func (r *repoFS) listDirTool() llm.Tool {
+	return llm.DefineTool[listDirArgs](
+		"list_dir",
+		"List the entries of a directory in the repository (directories marked with a trailing /). Use it to discover where code lives before reading.",
+		func(_ context.Context, args listDirArgs) (any, error) {
+			abs, err := r.resolve(args.Path)
+			if err != nil {
+				return nil, err
+			}
+			entries, err := os.ReadDir(abs)
+			if err != nil {
+				return nil, fmt.Errorf("list %q: %w", args.Path, err)
+			}
+			names := make([]string, 0, len(entries))
+			for _, e := range entries {
+				name := e.Name()
+				if e.IsDir() {
+					name += "/"
+				}
+				names = append(names, name)
+			}
+			sort.Strings(names)
+			if len(names) == 0 {
+				return "(empty directory)", nil
+			}
+			return strings.Join(names, "\n"), nil
+		},
+	)
+}
+
+type grepArgs struct {
+	Pattern    string `json:"pattern" description:"A Go (RE2) regular expression to search for."`
+	Path       string `json:"path,omitempty" description:"Optional repository-relative file or subdirectory to scope the search (default: whole repo)."`
+	MaxResults int    `json:"max_results,omitempty" description:"Optional cap on matching lines returned (default/maximum 200)."`
+}
+
+func (r *repoFS) grepTool() llm.Tool {
+	return llm.DefineTool[grepArgs](
+		"grep",
+		"Search the repository's text files for a regular expression and return matching `path:line: text`. Use it to check whether a symbol, import, or call exists elsewhere before claiming a cross-file problem.",
+		func(_ context.Context, args grepArgs) (any, error) {
+			if strings.TrimSpace(args.Pattern) == "" {
+				return nil, fmt.Errorf("pattern is required")
+			}
+			re, err := regexp.Compile(args.Pattern)
+			if err != nil {
+				return nil, fmt.Errorf("invalid regexp: %w", err)
+			}
+			base, err := r.resolve(args.Path)
+			if err != nil {
+				return nil, err
+			}
+			limit := args.MaxResults
+			if limit <= 0 || limit > maxGrepResults {
+				limit = maxGrepResults
+			}
+
+			var out []string
+			truncated := false
+			walkErr := filepath.WalkDir(base, func(path string, d os.DirEntry, err error) error {
+				if err != nil {
+					return nil // skip unreadable entries
+				}
+				if d.IsDir() {
+					if skipDirs[d.Name()] && path != base {
+						return filepath.SkipDir
+					}
+					return nil
+				}
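+				if len(out) > limit {
+					truncated = true
+					return filepath.SkipAll
+				}
+				// Collect one match past the cap so the check after the walk can
+				// tell "exactly limit matches" from "more were available".
+				matchesInFile(path, r.root, re, limit+1, &out)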
+				return nil
+			})
+			if walkErr != nil {
+				return nil, fmt.Errorf("search: %w", walkErr)
+			}
+			if len(out) > limit {
+				out = out[:limit]
+				truncated = true
+			}
+			if len(out) == 0 {
+				return "(no matches)", nil
+			}
+			res := strings.Join(out, "\n")
+			if truncated {
+				res += fmt.Sprintf("\n... (truncated at %d matches; narrow the pattern or path)", limit)
+			}
+			return res, nil
+		},
+	)
+}
+
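+// matchesInFile appends "relpath:line: text" for each regexp match in a single
+// text file, stopping once the global cap is reached. A file whose first line
+// contains a NUL byte is treated as binary and skipped. A scanner error (such
+// as a line longer than the 4 MiB buffer) ends the scan silently, keeping any
+// matches already collected.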
+func matchesInFile(path, root string, re *regexp.Regexp, limit int, out *[]string) {
+	f, err := os.Open(path)
+	if err != nil {
+		return
+	}
+	defer f.Close()
+
+	rel, relErr := filepath.Rel(root, path)
+	if relErr != nil {
+		rel = path
+	}
+	sc := bufio.NewScanner(f)
+	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
+	lineNo := 0
+	for sc.Scan() {
+		if len(*out) >= limit {
+			return
+		}
+		lineNo++
+		line := sc.Text()
+		if lineNo == 1 && strings.IndexByte(line, 0) >= 0 {
+			return // looks binary
+		}
+		if re.MatchString(line) {
+			trimmed := strings.TrimSpace(line)
+			if len(trimmed) > maxLineLen {
+				trimmed = trimmed[:maxLineLen] + "…"
+			}
+			*out = append(*out, fmt.Sprintf("%s:%d: %s", rel, lineNo, trimmed))
+		}
+	}
+}
+
+type findFilesArgs struct {
+	Name       string `json:"name" description:"Case-insensitive substring of the file path to match, e.g. \"pipeline.go\" or \"agentexec/\"."`
+	MaxResults int    `json:"max_results,omitempty" description:"Optional cap on paths returned (default/maximum 200)."`
+}
+
+func (r *repoFS) findFilesTool() llm.Tool {
+	return llm.DefineTool[findFilesArgs](
+		"find_files",
+		"Find files whose repository-relative path contains a case-insensitive substring. Use it to locate a file by name when you don't know its directory.",
+		func(_ context.Context, args findFilesArgs) (any, error) {
+			needle := strings.ToLower(strings.TrimSpace(args.Name))
+			if needle == "" {
+				return nil, fmt.Errorf("name is required")
+			}
+			limit := args.MaxResults
+			if limit <= 0 || limit > maxFindResults {
+				limit = maxFindResults
+			}
+			var out []string
+			truncated := false
+			_ = filepath.WalkDir(r.root, func(path string, d os.DirEntry, err error) error {
+				if err != nil {
+					return nil
+				}
+				if d.IsDir() {
+					if skipDirs[d.Name()] && path != r.root {
+						return filepath.SkipDir
+					}
+					return nil
+				}
+				rel, relErr := filepath.Rel(r.root, path)
+				if relErr != nil {
+					return nil
+				}
+				if strings.Contains(strings.ToLower(rel), needle) {
+					// Flag truncation only when a further match actually exists.
+					if len(out) >= limit {
+						truncated = true
+						return filepath.SkipAll
+					}
+					out = append(out, rel)
+				}
+				return nil
+			})
+			sort.Strings(out)
+			if len(out) == 0 {
+				return "(no files matched)", nil
+			}
+			res := strings.Join(out, "\n")
+			if truncated {
+				res += fmt.Sprintf("\n... (truncated at %d files; narrow the name)", limit)
+			}
+			return res, nil
+		},
+	)
+}
+
+func (r *repoFS) getDiffTool() llm.Tool {
+	return llm.DefineTool[struct{}](
+		"get_diff",
+		"Return the complete unified diff under review. The diff is also included (possibly truncated) in the task message; call this to get the full, untruncated text.",
+		func(_ context.Context, _ struct{}) (any, error) {
+			if strings.TrimSpace(r.diff) == "" {
+				return "(empty diff)", nil
+			}
+			return r.diff, nil
+		},
+	)
+}
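The lexical half of the sandbox check above can be illustrated in isolation. The following is a minimal sketch, assuming Unix-style paths; `insideRoot` is a hypothetical name used only for illustration and is not part of the commit. It shows why the separator is appended before the prefix comparison.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// insideRoot is a hypothetical helper (not part of the commit) that mirrors
// the lexical check in contains: candidate must equal root or sit beneath it.
func insideRoot(root, candidate string) bool {
	root = filepath.Clean(root)
	candidate = filepath.Clean(candidate)
	if candidate == root {
		return true
	}
	// Appending the separator stops "/repo-evil" from matching root "/repo".
	return strings.HasPrefix(candidate, root+string(os.PathSeparator))
}

func main() {
	root := "/repo"
	for _, rel := range []string{"pkg/foo.go", "../outside", "pkg/../../escape"} {
		joined := filepath.Join(root, rel) // Join also cleans ".." segments
		fmt.Printf("%-18s inside=%v\n", rel, insideRoot(root, joined))
	}
	fmt.Println("/repo-evil/x inside =", insideRoot(root, "/repo-evil/x"))
}
```

A purely lexical check like this says nothing about symlinks, so it is only one layer of the containment guarantee.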
@@ -0,0 +1,243 @@
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+)
+
+// buildFixtureRepo lays down a small repo tree for the toolbox tests and
+// returns its root.
+func buildFixtureRepo(t *testing.T) string {
+	t.Helper()
+	root := t.TempDir()
+	write := func(rel, content string) {
+		p := filepath.Join(root, rel)
+		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
+			t.Fatal(err)
+		}
+		if err := os.WriteFile(p, []byte(content), 0o644); err != nil {
+			t.Fatal(err)
+		}
+	}
+	write("pkg/foo/foo.go", "package foo\n\nfunc Hello() string {\n\treturn \"hi\"\n}\n")
+	write("pkg/foo/bar.go", "package foo\n\n// TODO: refactor\nvar Answer = 42\n")
+	write("README.md", "# Fixture\n\nHello world.\n")
+	write(".git/config", "[core]\n\tbare = false\n") // must be skipped by grep/find
+	write("secret.txt", "this file lives at the repo root\n")
+	return root
+}
+
+// call invokes a tool from the sandbox's toolbox by name with JSON args and
+// returns the result string (or the error).
+func call(t *testing.T, fs *repoFS, name string, args map[string]any) (string, error) {
+	t.Helper()
+	box, err := fs.toolbox()
+	if err != nil {
+		t.Fatalf("toolbox: %v", err)
+	}
+	tool, ok := box.Get(name)
+	if !ok {
+		t.Fatalf("tool %q not in toolbox", name)
+	}
+	raw, err := json.Marshal(args)
+	if err != nil {
+		t.Fatal(err)
+	}
+	out, herr := tool.Handler(context.Background(), raw)
+	if herr != nil {
+		return "", herr
+	}
+	s, _ := out.(string)
+	return s, nil
+}
+
+func TestRepoFS_ResolveSandbox(t *testing.T) {
+	root := buildFixtureRepo(t)
+	fs, err := newRepoFS(root, "")
+	if err != nil {
+		t.Fatalf("newRepoFS: %v", err)
+	}
+
+	// In-bounds paths resolve.
+	if _, err := fs.resolve("pkg/foo/foo.go"); err != nil {
+		t.Errorf("in-bounds path rejected: %v", err)
+	}
+	if got, err := fs.resolve(""); err != nil || got != fs.root {
+		t.Errorf("empty path should be root: got %q err %v", got, err)
+	}
+
+	// Escapes are rejected.
+	for _, bad := range []string{"../outside", "../../etc/passwd", "pkg/../../escape", "/etc/passwd"} {
+		if _, err := fs.resolve(bad); err == nil {
+			t.Errorf("path %q escaped the sandbox but was allowed", bad)
+		}
+	}
+}
+
+func TestReadFileTool(t *testing.T) {
+	root := buildFixtureRepo(t)
+	fs, _ := newRepoFS(root, "")
+
+	out, err := call(t, fs, "read_file", map[string]any{"path": "pkg/foo/foo.go"})
+	if err != nil {
+		t.Fatalf("read_file: %v", err)
+	}
+	if !strings.Contains(out, "func Hello()") {
+		t.Errorf("expected file body, got:\n%s", out)
+	}
+	if !strings.Contains(out, "1\t") {
+		t.Errorf("expected line numbers, got:\n%s", out)
+	}
+
+	// Line slicing.
+	out, err = call(t, fs, "read_file", map[string]any{"path": "pkg/foo/foo.go", "start_line": 3, "limit": 1})
+	if err != nil {
+		t.Fatalf("read_file slice: %v", err)
+	}
+	if !strings.Contains(out, "func Hello()") || strings.Contains(out, "package foo") {
+		t.Errorf("slice should start at line 3 only, got:\n%s", out)
+	}
+
+	// Reading a directory is an error directing to list_dir.
+	if _, err := call(t, fs, "read_file", map[string]any{"path": "pkg/foo"}); err == nil {
+		t.Error("reading a directory should error")
+	}
+
+	// Escape is rejected.
+	if _, err := call(t, fs, "read_file", map[string]any{"path": "../escape"}); err == nil {
+		t.Error("read_file should reject sandbox escape")
+	}
+}
+
+func TestListDirTool(t *testing.T) {
+	root := buildFixtureRepo(t)
+	fs, _ := newRepoFS(root, "")
+
+	out, err := call(t, fs, "list_dir", map[string]any{"path": "pkg/foo"})
+	if err != nil {
+		t.Fatalf("list_dir: %v", err)
+	}
+	for _, want := range []string{"foo.go", "bar.go"} {
+		if !strings.Contains(out, want) {
+			t.Errorf("list_dir missing %q in:\n%s", want, out)
+		}
+	}
+
+	// Root listing marks directories with a trailing slash.
+	out, _ = call(t, fs, "list_dir", map[string]any{})
+	if !strings.Contains(out, "pkg/") {
+		t.Errorf("expected pkg/ (dir with trailing slash) in root listing:\n%s", out)
+	}
+}
+
+func TestGrepTool(t *testing.T) {
+	root := buildFixtureRepo(t)
+	fs, _ := newRepoFS(root, "")
+
+	out, err := call(t, fs, "grep", map[string]any{"pattern": "func Hello"})
+	if err != nil {
+		t.Fatalf("grep: %v", err)
+	}
+	if !strings.Contains(out, "pkg/foo/foo.go:") {
+		t.Errorf("grep should locate the func, got:\n%s", out)
+	}
+
+	// .git is skipped.
+	out, _ = call(t, fs, "grep", map[string]any{"pattern": "bare = false"})
+	if strings.Contains(out, ".git/") {
+		t.Errorf("grep must not descend into .git, got:\n%s", out)
+	}
+
+	// No matches is a clean message, not an error.
+	out, err = call(t, fs, "grep", map[string]any{"pattern": "zzz_no_such_token_zzz"})
+	if err != nil || !strings.Contains(out, "no matches") {
+		t.Errorf("expected clean no-match, got %q err %v", out, err)
+	}
+
+	// Invalid regexp surfaces as an error.
+	if _, err := call(t, fs, "grep", map[string]any{"pattern": "([unterminated"}); err == nil {
+		t.Error("invalid regexp should error")
+	}
+
+	// Scoped grep honors the path.
+	out, _ = call(t, fs, "grep", map[string]any{"pattern": "Answer", "path": "pkg/foo/bar.go"})
+	if !strings.Contains(out, "bar.go:") {
+		t.Errorf("scoped grep missed the match:\n%s", out)
+	}
+}
+
+func TestFindFilesTool(t *testing.T) {
+	root := buildFixtureRepo(t)
+	fs, _ := newRepoFS(root, "")
+
+	out, err := call(t, fs, "find_files", map[string]any{"name": "foo.go"})
+	if err != nil {
+		t.Fatalf("find_files: %v", err)
+	}
+	if !strings.Contains(out, "pkg/foo/foo.go") {
+		t.Errorf("find_files missed foo.go:\n%s", out)
+	}
+
+	// Case-insensitive substring on the path.
+	out, _ = call(t, fs, "find_files", map[string]any{"name": "PKG/FOO"})
+	if !strings.Contains(out, "pkg/foo/") {
+		t.Errorf("find_files should be case-insensitive on the path:\n%s", out)
+	}
+
+	// .git entries are not surfaced.
+	out, _ = call(t, fs, "find_files", map[string]any{"name": "config"})
+	if strings.Contains(out, ".git/") {
+		t.Errorf("find_files must skip .git, got:\n%s", out)
+	}
+}
+
+func TestGetDiffTool(t *testing.T) {
+	root := buildFixtureRepo(t)
+	const diff = "diff --git a/x b/x\n+added line\n"
+	fs, _ := newRepoFS(root, diff)
+
+	out, err := call(t, fs, "get_diff", map[string]any{})
+	if err != nil {
+		t.Fatalf("get_diff: %v", err)
+	}
+	if out != diff {
+		t.Errorf("get_diff returned %q, want %q", out, diff)
+	}
+}
+
+func TestNewRepoFS_BadRoot(t *testing.T) {
+	// A file (not a directory) is rejected.
+	f := filepath.Join(t.TempDir(), "afile")
+	if err := os.WriteFile(f, []byte("x"), 0o644); err != nil {
+		t.Fatal(err)
+	}
+	if _, err := newRepoFS(f, ""); err == nil {
+		t.Error("newRepoFS should reject a non-directory root")
+	}
+	if _, err := newRepoFS(filepath.Join(t.TempDir(), "missing"), ""); err == nil {
+		t.Error("newRepoFS should reject a missing root")
+	}
+}
+
+// Ensure the toolbox exposes every expected tool (guards against an
+// accidental rename breaking the system prompt's tool references). Extra
+// tools are not flagged.
+func TestToolbox_Names(t *testing.T) {
+	fs, _ := newRepoFS(t.TempDir(), "")
+	box, err := fs.toolbox()
+	if err != nil {
+		t.Fatalf("toolbox: %v", err)
+	}
+	got := map[string]bool{}
+	for _, tl := range box.Tools() {
+		got[tl.Name] = true
+	}
+	for _, want := range []string{"read_file", "list_dir", "grep", "find_files", "get_diff"} {
+		if !got[want] {
+			t.Errorf("toolbox missing tool %q", want)
+		}
+	}
+}
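If an exact match of the tool set were wanted rather than a presence check, a two-way set comparison is one option. This is a sketch only; `diffNames` is a hypothetical helper and does not exist in the commit, and the extra name `write_file` in the demo is invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// diffNames is a hypothetical helper: it reports which wanted names are
// absent from got (missing) and which names in got were not wanted (extra).
func diffNames(got, want []string) (missing, extra []string) {
	wantSet := make(map[string]bool, len(want))
	for _, w := range want {
		wantSet[w] = true
	}
	gotSet := make(map[string]bool, len(got))
	for _, g := range got {
		if !wantSet[g] && !gotSet[g] {
			extra = append(extra, g)
		}
		gotSet[g] = true
	}
	for _, w := range want {
		if !gotSet[w] {
			missing = append(missing, w)
		}
	}
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra
}

func main() {
	got := []string{"read_file", "grep", "get_diff", "write_file"}
	want := []string{"read_file", "list_dir", "grep", "find_files", "get_diff"}
	missing, extra := diffNames(got, want)
	fmt.Println("missing:", missing)
	fmt.Println("extra:", extra)
}
```

In a test, a non-empty `extra` slice would catch a tool added to the toolbox without a matching update to the system prompt.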
@@ -0,0 +1,143 @@
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"strings"
+	"testing"
+
+	llm "gitea.stevedudenhoeffer.com/steve/majordomo/llm"
+	"gitea.stevedudenhoeffer.com/steve/majordomo/provider/fake"
+)
+
+// spinToolCall is a response that asks for the get_diff tool (which succeeds and
+// ignores extra args), used to burn agent steps without producing a final
+// answer. The args vary by n so successive calls are not byte-identical — that
+// dodges the agent's same-call loop guard, exactly as a real reviewer making
+// distinct tool calls would.
+func spinToolCall(n int) fake.Step {
+	return fake.ReplyWith(llm.Response{
+		ToolCalls: []llm.ToolCall{{
+			ID:        "call",
+			Name:      "get_diff",
+			Arguments: json.RawMessage(fmt.Sprintf(`{"_n":%d}`, n)),
+		}},
+		FinishReason: llm.FinishToolCalls,
+		Usage:        llm.Usage{InputTokens: 1, OutputTokens: 1},
+	})
+}
+
+// lastUserText returns the text of the final message in the request, which is
+// what a fresh Generate call is reacting to.
+func lastUserText(req llm.Request) string {
+	if len(req.Messages) == 0 {
+		return ""
+	}
+	return req.Messages[len(req.Messages)-1].Text()
+}
+
+// TestRunAgent_WrapUpNudgeProducesAnswer: a model that keeps calling tools until
+// it is nudged to wrap up should still finish inside its budget — the steer
+// message arrives a few steps before the cap and the model writes its answer.
+func TestRunAgent_WrapUpNudgeProducesAnswer(t *testing.T) {
+	t.Setenv("GADFLY_WRAPUP_RESERVE", "4")
+
+	final := "VERDICT: No material issues found."
+	nudgeSeen := false
+	n := 0
+	p := fake.New("fake", fake.WithDefault(func(_ string, req llm.Request) fake.Step {
+		if strings.Contains(lastUserText(req), "almost out of your investigation budget") {
+			nudgeSeen = true
+			return fake.Reply(final)
+		}
+		n++
+		return spinToolCall(n)
+	}))
+	mdl, err := p.Model("mock")
+	if err != nil {
+		t.Fatal(err)
+	}
+	fs, _ := newRepoFS(t.TempDir(), "diff --git a/x b/x\n+y\n")
+
+	out, err := runAgent(context.Background(), mdl, fs, "sys", "task", 12)
+	if err != nil {
+		t.Fatalf("runAgent should succeed via wrap-up nudge, got error: %v", err)
+	}
+	if out != final {
+		t.Errorf("expected final review %q, got %q", final, out)
+	}
+	if !nudgeSeen {
+		t.Error("the wrap-up nudge was never delivered to the model")
+	}
+}
+
+// TestRunAgent_FinalizationFallback: a model that ignores the wrap-up nudge and
+// spins on tools until the cap should NOT hard-fail — the tool-free finalization
+// pass forces a final answer out of the transcript.
+func TestRunAgent_FinalizationFallback(t *testing.T) {
+	t.Setenv("GADFLY_WRAPUP_RESERVE", "2")
+
+	final := "VERDICT: Minor issues\n- something"
+	forcedCalled := false
+	n := 0
+	p := fake.New("fake", fake.WithDefault(func(_ string, req llm.Request) fake.Step {
+		// Only the tool-free finalization pass forbids tools — reply there.
+		if req.ToolChoice == "none" {
+			forcedCalled = true
+			return fake.Reply(final)
+		}
+		// Otherwise keep spinning, ignoring the wrap-up nudge entirely.
+		n++
+		return spinToolCall(n)
+	}))
+	mdl, err := p.Model("mock")
+	if err != nil {
+		t.Fatal(err)
+	}
+	fs, _ := newRepoFS(t.TempDir(), "diff --git a/x b/x\n+y\n")
+
+	out, err := runAgent(context.Background(), mdl, fs, "sys", "task", 6)
+	if err != nil {
+		t.Fatalf("runAgent should recover via finalization fallback, got error: %v", err)
+	}
+	if !forcedCalled {
+		t.Error("finalization fallback was never invoked")
+	}
+	if out != final {
+		t.Errorf("expected forced final answer %q, got %q", final, out)
+	}
+}
+
+// TestRunAgent_FallbackStillEmptyIsError: if even the tool-free finalization
+// yields nothing, runAgent surfaces an error rather than a phantom success.
+func TestRunAgent_FallbackStillEmptyIsError(t *testing.T) {
+	n := 0
+	p := fake.New("fake", fake.WithDefault(func(_ string, req llm.Request) fake.Step {
+		if req.ToolChoice == "none" {
+			return fake.Reply("   ") // finalization produces only whitespace
+		}
+		n++
+		return spinToolCall(n)
+	}))
+	mdl, err := p.Model("mock")
+	if err != nil {
+		t.Fatal(err)
+	}
+	fs, _ := newRepoFS(t.TempDir(), "diff --git a/x b/x\n+y\n")
+
+	if _, err := runAgent(context.Background(), mdl, fs, "sys", "task", 4); err == nil {
+		t.Error("runAgent should error when the finalization fallback also yields no output")
+	}
+}
+
+func TestWrapUpReserve(t *testing.T) {
+	t.Setenv("GADFLY_WRAPUP_RESERVE", "")
+	if got := wrapUpReserve(); got != defaultWrapUpReserve {
+		t.Errorf("default wrap-up reserve = %d, want %d", got, defaultWrapUpReserve)
+	}
+	t.Setenv("GADFLY_WRAPUP_RESERVE", "7")
+	if got := wrapUpReserve(); got != 7 {
+		t.Errorf("wrap-up reserve override = %d, want 7", got)
+	}
+}
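The implementation of `wrapUpReserve` is not part of this file, so the following is only a sketch of how an environment override with a default might be parsed. `parseReserve`, `nudgeStep`, the fallback value 3, and the "nudge when only the reserve remains" policy are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseReserve is a hypothetical stand-in for the env parsing behind
// wrapUpReserve: empty, non-numeric, or non-positive input yields def.
func parseReserve(raw string, def int) int {
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil || n <= 0 {
		return def
	}
	return n
}

// nudgeStep encodes an assumed policy: deliver the wrap-up nudge once only
// `reserve` steps remain, and never before the first step.
func nudgeStep(maxSteps, reserve int) int {
	step := maxSteps - reserve
	if step < 1 {
		step = 1
	}
	return step
}

func main() {
	// 3 is an illustrative default, not the value of defaultWrapUpReserve.
	reserve := parseReserve("4", 3)
	fmt.Println("reserve:", reserve, "nudge at step:", nudgeStep(12, reserve))
}
```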
@@ -0,0 +1,135 @@
+#!/usr/bin/env bash
+# Gadfly container entrypoint.
+#
+# This is the brains that used to live in the Gitea Actions workflow YAML. A
+# consuming repo only commits a ~15-line stub workflow that runs this image and
+# passes the event context as env; ALL the gating, cloning, model-looping and
+# comment I/O happens here, so the stub stays dumb (act_runner has weak YAML
+# expression support — keep logic in the image, not the workflow).
+#
+# What it does:
+#   1. Decides whether this event should trigger a review (draft skip, comment
+#      trigger phrase + allowed-user gate, PR detection). Non-triggers exit 0.
+#   2. Acknowledges a comment trigger with a 👀 reaction.
+#   3. Shallow-clones the PR's head branch (the agentic reviewer reads the
+#      checked-out tree to VERIFY findings, not just the diff).
+#   4. Runs the gadfly reviewer once per configured model via run.sh, which
+#      upserts one labeled PR comment per model.
+#
+# Advisory only: it never blocks a merge. Config/usage errors exit non-zero;
+# everything review-related is posted as a comment, never a failed check.
+#
+# Env (set by the consumer's stub workflow from the github.* context):
+#   GITEA_API             https://HOST/api/v1/repos/OWNER/REPO            (required)
+#   GITEA_TOKEN           built-in Actions token (posts comments)         (required)
+#   OLLAMA_CLOUD_API_KEY  Ollama Cloud key; empty => "not configured" notice
+#   EVENT_NAME            pull_request | issue_comment | workflow_dispatch (required)
+#   PR                    pull request number                             (required)
+#   PR_BRANCH             head branch (github.head_ref); empty => fetched from API
+#   IS_DRAFT              'true' on a draft PR => skipped
+#   COMMENT_BODY          comment text (issue_comment only)
+#   COMMENT_ID            comment id, for the 👀 reaction (issue_comment only)
+#   ACTOR                 github.actor (the user who triggered)
+# Optional config:
+#   OLLAMA_REVIEW_MODELS  comma-separated model ids (default below)
+#   GADFLY_TRIGGER_PHRASE comment phrase that triggers a re-review (default "@gadfly review")
+#   GADFLY_ALLOWED_USERS  comma-separated usernames allowed to comment-trigger;
+#                         empty => fall back to "is a repo collaborator"
+set -uo pipefail
+
+DEFAULT_MODELS="qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
+TRIGGER_PHRASE="${GADFLY_TRIGGER_PHRASE:-@gadfly review}"
+SCRIPTS_DIR="/app/scripts"
+WORKDIR="${WORKDIR:-/tmp/gadfly}"
+
+log() { echo "[gadfly] $*" >&2; }
+die() { log "ERROR: $*"; exit 1; }
+
+: "${GITEA_API:?GITEA_API required}"
+: "${GITEA_TOKEN:?GITEA_TOKEN required}"
+: "${PR:?PR required}"
+: "${EVENT_NAME:?EVENT_NAME required}"
+
+API() { curl -fsS --connect-timeout 20 --max-time 30 -H "Authorization: token ${GITEA_TOKEN}" "$@"; }
+
+# --- is the commenter allowed to trigger a re-review? ----------------------
+actor_allowed() {
+  local actor="$1"
+  [ -z "$actor" ] && return 1
+  if [ -n "${GADFLY_ALLOWED_USERS:-}" ]; then
+    local IFS=','
+    for u in $GADFLY_ALLOWED_USERS; do
+      [ "$(echo "$u" | tr -d '[:space:]')" = "$actor" ] && return 0
+    done
+    return 1
+  fi
+  # No explicit allow-list: allow any repo collaborator (the API answers 204).
+  local code
+  code="$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 20 --max-time 30 \
+    -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/collaborators/${actor}")"
+  [ "$code" = "204" ]
+}
+
+# --- trigger gating --------------------------------------------------------
+case "$EVENT_NAME" in
+  workflow_dispatch)
+    log "manual dispatch for PR #${PR}" ;;
+  pull_request)
+    if [ "${IS_DRAFT:-false}" = "true" ]; then
+      log "PR #${PR} is a draft; skipping"; exit 0
+    fi
+    log "new/updated PR #${PR}" ;;
+  issue_comment)
+    case "${COMMENT_BODY:-}" in
+      *"$TRIGGER_PHRASE"*) : ;;
+      *) log "comment does not contain trigger phrase ${TRIGGER_PHRASE}; skipping"; exit 0 ;;
+    esac
+    if ! actor_allowed "${ACTOR:-}"; then
+      log "actor '${ACTOR:-}' not allowed to trigger; skipping"; exit 0
+    fi
+    # Must be a comment on a PR, not a plain issue.
+    if ! API "${GITEA_API}/pulls/${PR}" >/dev/null 2>&1; then
+      log "issue #${PR} is not a pull request; skipping"; exit 0
+    fi
+    # Acknowledge with 👀.
+    if [ -n "${COMMENT_ID:-}" ]; then
+      curl -s -X POST -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
+        "${GITEA_API}/issues/comments/${COMMENT_ID}/reactions" -d '{"content":"eyes"}' >/dev/null 2>&1 || true
+    fi
+    log "comment-triggered review for PR #${PR} by ${ACTOR:-?}" ;;
+  *)
+    log "event '${EVENT_NAME}' not handled; skipping"; exit 0 ;;
+esac
+
+# --- resolve head branch ---------------------------------------------------
+BRANCH="${PR_BRANCH:-}"
+if [ -z "$BRANCH" ]; then
+  BRANCH="$(API "${GITEA_API}/pulls/${PR}" | jq -r '.head.ref // ""')"
+fi
+[ -z "$BRANCH" ] && die "could not determine PR #${PR} head branch"
+
+# --- clone the PR's checked-out tree (shallow) -----------------------------
+HOST="${GITEA_API%%/api/v1/*}"            # https://host
+REPO_PATH="${GITEA_API##*/api/v1/repos/}" # owner/repo
+CLONE_URL="https://token:${GITEA_TOKEN}@${HOST#https://}/${REPO_PATH}.git"
+REPO_DIR="${WORKDIR}/repo"
+rm -rf "$REPO_DIR"; mkdir -p "$WORKDIR"
+log "cloning ${REPO_PATH} @ ${BRANCH}"
+git clone --depth=1 --branch "$BRANCH" "$CLONE_URL" "$REPO_DIR" 2>/dev/null \
+  || die "clone of ${REPO_PATH}@${BRANCH} failed"
+
+# --- review once per model -------------------------------------------------
+MODELS="${OLLAMA_REVIEW_MODELS:-$DEFAULT_MODELS}"
+log "models: ${MODELS}"
+IFS=',' read -ra ARR <<< "$MODELS" || true
+for raw in "${ARR[@]}"; do
+  m="$(echo "$raw" | tr -d '[:space:]')"
+  [ -z "$m" ] && continue
+  log "::: reviewing with ${m}"
+  PROVIDER=ollama \
+  MODEL="$m" \
+  GADFLY_BIN="/usr/local/bin/gadfly" \
+  GADFLY_REPO_DIR="$REPO_DIR" \
+  bash "${SCRIPTS_DIR}/run.sh" || log "model ${m} failed (continuing)"
+done
+log "done"
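The parameter expansions that derive the host and `owner/repo` from `GITEA_API` can be easier to follow when written out as plain string operations. This is a rough Go sketch of the same derivation, not code from the commit; `splitGiteaAPI` and `cloneURL` are hypothetical names, and unlike the script this version keeps whatever scheme the URL carries.

```go
package main

import (
	"fmt"
	"strings"
)

const apiMarker = "/api/v1/repos/"

// splitGiteaAPI is a hypothetical helper mirroring the shell expansions: it
// splits "https://HOST/api/v1/repos/OWNER/REPO" into base URL and repo path.
func splitGiteaAPI(api string) (string, string, error) {
	base, repoPath, ok := strings.Cut(api, apiMarker)
	if !ok || base == "" || repoPath == "" {
		return "", "", fmt.Errorf("unexpected GITEA_API shape: %q", api)
	}
	return base, strings.TrimSuffix(repoPath, "/"), nil
}

// cloneURL embeds the token as basic-auth credentials in the clone URL.
func cloneURL(base, repoPath, token string) (string, error) {
	scheme, host, ok := strings.Cut(base, "://")
	if !ok {
		return "", fmt.Errorf("base URL %q has no scheme", base)
	}
	return scheme + "://token:" + token + "@" + host + "/" + repoPath + ".git", nil
}

func main() {
	base, repo, err := splitGiteaAPI("https://gitea.example.com/api/v1/repos/owner/repo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	u, _ := cloneURL(base, repo, "REDACTED")
	fmt.Println(base, repo, u)
}
```

Because the resulting URL contains the token, it should never be logged, which is presumably why the script discards the stderr of `git clone`.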
@@ -0,0 +1,52 @@
+# Drop this in ANY Gitea repo at .gitea/workflows/adversarial-review.yml to turn
+# Gadfly on. The image holds all the logic; this stub just forwards the event
+# context. Advisory only — it never blocks a merge.
+#
+# Per-repo setup (no code changes needed):
+#   secret  OLLAMA_CLOUD_API_KEY   your Ollama Cloud key
+#   var     OLLAMA_REVIEW_MODELS   (optional) comma-separated model ids
+#   var     GADFLY_ALLOWED_USERS   (optional) who may "@gadfly review"; empty =
+#                                  any repo collaborator
+# GITEA_TOKEN is provided automatically; comments post as the gitea-actions user.
+
+name: Adversarial Review (Gadfly)
+
+on:
+  pull_request:
+    types: [opened, reopened, ready_for_review]
+  issue_comment:
+    types: [created]
+  workflow_dispatch:
+    inputs:
+      pr_number:
+        description: "PR number to review"
+        required: true
+
+permissions:
+  contents: read
+  issues: write
+  pull-requests: write
+
+concurrency:
+  group: gadfly-${{ github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number }}
+  cancel-in-progress: true
+
+jobs:
+  review:
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    steps:
+      - uses: docker://gitea.stevedudenhoeffer.com/steve/gadfly:v1
+        env:
+          GITEA_API: ${{ github.server_url }}/api/v1/repos/${{ github.repository }}
+          GITEA_TOKEN: ${{ secrets.GITEA_TOKEN }}
+          OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }}
+          OLLAMA_REVIEW_MODELS: ${{ vars.OLLAMA_REVIEW_MODELS }}
+          GADFLY_ALLOWED_USERS: ${{ vars.GADFLY_ALLOWED_USERS }}
+          EVENT_NAME: ${{ github.event_name }}
+          PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }}
+          PR_BRANCH: ${{ github.head_ref }}
+          IS_DRAFT: ${{ github.event.pull_request.draft }}
+          COMMENT_BODY: ${{ github.event.comment.body }}
+          COMMENT_ID: ${{ github.event.comment.id }}
+          ACTOR: ${{ github.actor }}
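The comment-trigger gate that this stub feeds (trigger phrase plus `GADFLY_ALLOWED_USERS`) amounts to two string checks. The sketch below restates them in Go under stated assumptions: `commentTriggers` and `actorInList` are hypothetical names, and the collaborator fallback used when the list is empty is deliberately left out because it needs a live API call.

```go
package main

import (
	"fmt"
	"strings"
)

// commentTriggers reports whether the comment body contains the phrase.
func commentTriggers(body, phrase string) bool {
	return phrase != "" && strings.Contains(body, phrase)
}

// actorInList reports whether actor appears in a comma-separated allow-list.
// Entries are trimmed, so "alice, bob" and "alice,bob" behave the same.
func actorInList(list, actor string) bool {
	if actor == "" {
		return false
	}
	for _, u := range strings.Split(list, ",") {
		if strings.TrimSpace(u) == actor {
			return true
		}
	}
	return false
}

func main() {
	body := "LGTM overall, but @gadfly review once more please"
	fmt.Println("triggers:", commentTriggers(body, "@gadfly review"))
	fmt.Println("alice allowed:", actorInList("alice, bob", "alice"))
	fmt.Println("mallory allowed:", actorInList("alice, bob", "mallory"))
}
```

The match is exact and case-sensitive, so an allow-list entry must be spelled the same way as the actor name the workflow passes in.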
@@ -0,0 +1,5 @@
+module gitea.stevedudenhoeffer.com/steve/gadfly
+
+go 1.26.2
+
+require gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b
@@ -0,0 +1,2 @@
+gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b h1:/pglCqQW02kV2p9tKyQpIJoXZK2p7LKLeDCZL/V26MM=
+gitea.stevedudenhoeffer.com/steve/majordomo v0.0.0-20260610113006-0147a79d187b/go.mod h1:UZLveG17SmENt4sne2RSLIbioix30RZbRIQUzBAnOyY=
@@ -0,0 +1,171 @@
+#!/usr/bin/env bash
+# Adversarial PR review runner.
+#
+# Fetches a PR's unified diff + metadata from Gitea, asks ONE model to review it
+# adversarially, then upserts the result as a single labeled PR comment (so
+# re-runs on new commits update the comment in place instead of stacking dupes).
+#
+# The ollama lane is AGENTIC: it runs the cmd/gadfly Go binary, which drives a
+# tool-using agent (majordomo + Ollama Cloud) over the PR's checked-out repo so
+# the model can read_file/grep/etc. to VERIFY findings instead of guessing from
+# the diff alone. The antigravity lane stays a one-shot `agy` call (agy has its
+# own file tools).
+#
+# Required env:
+#   GITEA_API    e.g. https://gitea.stevedudenhoeffer.com/api/v1/repos/steve/mort
+#   GITEA_TOKEN  token with repo write access (posts the comment)
+#   PR           pull request index/number
+#   PROVIDER     "ollama" | "antigravity"
+#   MODEL        model id (e.g. qwen3-coder:480b-cloud, gemini-3-pro)
+#
+# Provider-specific env:
+#   ollama:      OLLAMA_CLOUD_API_KEY, GADFLY_BIN (path to the built reviewer),
+#                GADFLY_REPO_DIR (checked-out repo; default: this script's repo)
+#   antigravity: `agy` on PATH with credentials already seeded (~/.gemini)
+#
+# Optional:
+#   MAX_DIFF_CHARS  diff truncation cap for the prompt (default 60000)
+#
+# This script is advisory: it never fails the job for review content. It exits
+# non-zero only on a usage/configuration error.
+set -uo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+MAX_DIFF_CHARS="${MAX_DIFF_CHARS:-60000}"
+
+: "${GITEA_API:?GITEA_API required}"
+: "${GITEA_TOKEN:?GITEA_TOKEN required}"
+: "${PR:?PR required}"
+: "${PROVIDER:?PROVIDER required}"
+: "${MODEL:?MODEL required}"
+
+MARKER="<!-- gadfly-review:${PROVIDER}:${MODEL} -->"
+say() { echo "[gadfly-review:${PROVIDER}:${MODEL}] $*" >&2; }
+
+# jq is required for payload building / response parsing; install if missing.
+if ! command -v jq >/dev/null 2>&1; then
+  say "jq not found; attempting install"
+  { apt-get update -qq && apt-get install -y -qq jq; } >/dev/null 2>&1 \
+    || { sudo apt-get update -qq && sudo apt-get install -y -qq jq; } >/dev/null 2>&1 \
+    || { say "could not install jq"; exit 1; }
+fi
+
+# curl timeouts: Gitea API calls are quick. $API_TIMEOUT is left unquoted on
+# purpose so it word-splits into separate args. (The LLM call's own deadline
+# lives in the reviewer binary / agy, not here.)
+API_TIMEOUT="--connect-timeout 20 --max-time 30"
+
+# --- fetch PR context -------------------------------------------------------
+say "fetching PR #${PR} context"
+DIFF="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/pulls/${PR}.diff" || true)"
+META="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" "${GITEA_API}/pulls/${PR}" || echo '{}')"
+TITLE="$(echo "$META" | jq -r '.title // ""')"
+BODY="$(echo "$META" | jq -r '.body // ""')"
+
+if [ -z "$DIFF" ]; then
+  say "empty diff (or the fetch failed); nothing to review"
+  exit 0
+fi
+
+# Keep the FULL diff for the agentic (ollama) reviewer: the binary serves it
+# whole via the get_diff tool and embeds its own truncated copy in the prompt.
+# The truncated copy below is only for the one-shot antigravity prompt.
+FULL_DIFF="$DIFF"
+TRUNC_NOTE=""
+if [ "${#DIFF}" -gt "$MAX_DIFF_CHARS" ]; then
+  DIFF="${DIFF:0:$MAX_DIFF_CHARS}"
+  TRUNC_NOTE=$'\n\n[NOTE: diff truncated to '"${MAX_DIFF_CHARS}"' chars for length; review the rest manually.]'
+fi
+
+SYS="$(cat "${SCRIPT_DIR}/system-prompt.txt")"
+USR="$(printf 'PR #%s: %s\n\nDescription:\n%s\n\nUnified diff to review:\n```diff\n%s\n```%s' \
+  "$PR" "$TITLE" "$BODY" "$DIFF" "$TRUNC_NOTE")"
+
+# --- call the model ---------------------------------------------------------
+REVIEW=""
+case "$PROVIDER" in
+  ollama)
+    # Agentic lane: hand off to the cmd/gadfly binary, which runs a tool-using
+    # agent over the checked-out repo so it can verify findings instead of
+    # guessing from the diff. The workflow builds the binary and exports
+    # GADFLY_BIN + GADFLY_REPO_DIR; we fall back to sane defaults for a
+    # local run.
+    if [ -z "${OLLAMA_CLOUD_API_KEY:-}" ]; then
+      REVIEW="⚠️ \`OLLAMA_CLOUD_API_KEY\` is not configured; this reviewer was skipped."
+    else
+      BIN="${GADFLY_BIN:-gadfly}"
+      if ! command -v "$BIN" >/dev/null 2>&1 && [ ! -x "$BIN" ]; then
+        REVIEW="⚠️ Agentic reviewer binary not found (\`GADFLY_BIN=${BIN}\`); the workflow build step may have failed."
+      else
+        REPO_DIR="${GADFLY_REPO_DIR:-$(cd "${SCRIPT_DIR}/../../.." && pwd)}"
+        DIFF_FILE="$(mktemp)"
+        ERR_FILE="${DIFF_FILE}.err"
+        printf '%s' "$FULL_DIFF" > "$DIFF_FILE"
+        REVIEW="$(
+          OLLAMA_API_KEY="$OLLAMA_CLOUD_API_KEY" \
+          GADFLY_MODEL="$MODEL" \
+          GADFLY_REPO_DIR="$REPO_DIR" \
+          GADFLY_DIFF_FILE="$DIFF_FILE" \
+          GADFLY_SYSTEM_FILE="${SCRIPT_DIR}/system-prompt.txt" \
+          GADFLY_TITLE="$TITLE" \
+          GADFLY_BODY="$BODY" \
+          GADFLY_MAX_DIFF_CHARS="$MAX_DIFF_CHARS" \
+          "$BIN" 2>"$ERR_FILE"
+        )"
+        rc=$?
+        if [ "$rc" -ne 0 ] || [ -z "$REVIEW" ]; then
+          REVIEW="⚠️ Agentic reviewer for \`${MODEL}\` failed (exit ${rc}):
+\`\`\`
+$(tail -c 1500 "$ERR_FILE" 2>/dev/null)
+\`\`\`"
+        fi
+        rm -f "$DIFF_FILE" "$ERR_FILE"
+      fi
+    fi
+    ;;
+  antigravity)
+    if ! command -v agy >/dev/null 2>&1; then
+      REVIEW="⚠️ Antigravity CLI (\`agy\`) not found on PATH."
+    else
+      FULL="$(printf '%s\n\n%s' "$SYS" "$USR")"
+      if ! REVIEW="$(agy -p "$FULL" --model "$MODEL" 2>agy.err)"; then
+        REVIEW="⚠️ Antigravity CLI failed:
+\`\`\`
+$(tail -c 1500 agy.err 2>/dev/null)
+\`\`\`"
+      fi
+      [ -z "$REVIEW" ] && REVIEW="⚠️ Antigravity CLI returned no output (auth/quota?)."
+    fi
+    ;;
+  *)
+    say "unknown provider: ${PROVIDER}"; exit 1 ;;
+esac
+
+# --- assemble comment -------------------------------------------------------
+COMMENT="$(printf '%s\n### 🔭 Adversarial review — `%s` (%s)\n\n%s\n\n<sub>Automated adversarial review. Advisory only — does not block merge.</sub>' \
+  "$MARKER" "$MODEL" "$PROVIDER" "$REVIEW")"
+POST_BODY="$(jq -n --arg b "$COMMENT" '{body:$b}')"
+
+# --- upsert by marker -------------------------------------------------------
+EXISTING_ID=""
+page=1
+while [ "$page" -le 10 ]; do
+  CMTS="$(curl $API_TIMEOUT -fsS -H "Authorization: token ${GITEA_TOKEN}" \
+    "${GITEA_API}/issues/${PR}/comments?limit=50&page=${page}" || echo '[]')"
+  [ "$(echo "$CMTS" | jq 'length')" = "0" ] && break
+  EXISTING_ID="$(echo "$CMTS" | jq -r --arg m "$MARKER" \
+    '.[] | select(.body != null and (.body | startswith($m))) | .id' | head -n1)"
+  [ -n "$EXISTING_ID" ] && break
+  page=$((page+1))
+done
+
+if [ -n "$EXISTING_ID" ]; then
+  say "updating existing comment ${EXISTING_ID}"
+  curl $API_TIMEOUT -fsS -X PATCH -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
+    "${GITEA_API}/issues/comments/${EXISTING_ID}" -d "$POST_BODY" >/dev/null || say "failed to update comment ${EXISTING_ID}"
+else
+  say "creating new comment"
+  curl $API_TIMEOUT -fsS -X POST -H "Authorization: token ${GITEA_TOKEN}" -H "Content-Type: application/json" \
+    "${GITEA_API}/issues/${PR}/comments" -d "$POST_BODY" >/dev/null || say "failed to create comment"
+fi
+say "done"
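The diff-truncation step in the script above can be exercised on its own. The following is a minimal sketch, not part of the commit: the helper name `truncate_diff` is hypothetical, and it assumes the same substring expansion and note wording that the script uses for `DIFF` and `TRUNC_NOTE`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the script above): caps a diff at a maximum
# number of characters and appends a note only when something was cut,
# mirroring the DIFF / TRUNC_NOTE logic.
truncate_diff() {
  local diff="$1" max="$2" note=""
  if [ "${#diff}" -gt "$max" ]; then
    # Keep the first $max characters of the diff.
    diff="${diff:0:$max}"
    note=$'\n\n[NOTE: diff truncated to '"${max}"' chars for length; review the rest manually.]'
  fi
  printf '%s%s' "$diff" "$note"
}
```

Under these assumptions, `truncate_diff "$FULL_DIFF" "$MAX_DIFF_CHARS"` would yield the text embedded in the one-shot prompt, while `FULL_DIFF` stays untouched for the agentic lane.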
@@ -0,0 +1,47 @@
+You are Gadfly, an ADVERSARIAL code reviewer. Your job is to find real problems in the
+pull request below — not to praise it. A gadfly does not let things slide.
+
+You are AGENTIC: you have read-only tools over the repository AT THIS PR's checked-out
+state. USE THEM to verify before you report. Do not review the diff in isolation.
+- read_file(path[, start_line, limit]) — read a file with line numbers.
+- list_dir([path]) — list a directory.
+- grep(pattern[, path, max_results]) — RE2 regex search across the repo.
+- find_files(name[, max_results]) — locate a file by path substring.
+- get_diff() — the full unified diff (the task message may truncate it).
+
+Mandatory verification discipline — this is the whole point of giving you tools:
+- Before claiming a missing/duplicate import, an undefined symbol, a wrong signature,
+  a type error, or any "this won't compile / won't resolve" issue: OPEN the file and
+  CHECK. The diff hunk shows only a few context lines; the declaration you're worried
+  about is almost always just outside it.
+- Before claiming a cross-file problem (a caller the change may have broken, a missing
+  update to another layer/interface): grep for the symbol and read the other side.
+- If you cannot confirm a suspicion with the tools, either drop it or clearly label it
+  "unverified" — do NOT present an unchecked guess as a finding.
+
+Be skeptical and concrete. Hunt specifically for:
+- Correctness bugs and logic errors introduced by the change.
+- SEMANTIC / domain correctness — the failure mode plausible-looking code hides best.
+  Do NOT trust a constant, conversion factor, formula, unit, or threshold just because
+  it looks reasonable. Independently RE-DERIVE the expected value from first principles
+  (units, dimensions, edge values) and compare. A magic number that "looks about right"
+  is exactly where real bugs hide (e.g. a linear factor used where it must be squared).
+- Concurrency issues: data races, deadlocks, unsynchronized shared state, leaked tasks.
+- Security problems: injection, missing authz/authn, secret leakage, unsafe input handling.
+- Error handling gaps: ignored errors, swallowed exceptions, missing rollback/cleanup.
+- Resource leaks: unclosed handles/bodies/files, context/lifetime misuse, unbounded growth.
+- Missed edge cases: off-by-one, nil/null, empty collection, overflow, zero/negative.
+- Violations of THIS repo's own conventions. Discover them — do not assume. Read any
+  README / CONTRIBUTING / CLAUDE.md / AGENTS.md / lint config the repo ships, and hold
+  the change to the patterns the surrounding code actually uses.
+
+Output rules:
+- Output GitHub-flavored markdown, concise. No filler, no restating the diff.
+- Lead with a one-line VERDICT: exactly one of "No material issues found",
+  "Minor issues", or "Blocking issues found".
+- Then a short bulleted list of findings. For each finding cite `path:line` and explain
+  the concrete impact and a suggested fix. Note which findings you verified by reading
+  the code (and how) versus any you could not confirm.
+- Only report issues you are reasonably confident are real after checking. If the diff
+  is clean, say so plainly rather than inventing nits.
+- When you are done investigating, STOP calling tools and reply with the final review.
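The output rules above require each review to lead with one of three verdicts. A consumer of the review text could check for that before posting. The following is a minimal sketch under stated assumptions: the helper name `has_valid_verdict` is hypothetical and not part of this commit, and it assumes the verdict appears somewhere on the first non-empty line of the review.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): succeeds when the first
# non-empty line of a review contains one of the three allowed verdicts.
has_valid_verdict() {
  local first
  # Take the first line that is not blank or whitespace-only.
  first="$(printf '%s\n' "$1" | grep -m1 -v '^[[:space:]]*$')"
  case "$first" in
    *"No material issues found"*|*"Minor issues"*|*"Blocking issues found"*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

A caller might use it as `has_valid_verdict "$REVIEW" || say "review is missing a verdict line"`, which only logs and so keeps the run advisory.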