# 🪰 Gadfly

**An AI gadfly for your pull requests.**

Gadfly is an *adversarial* code reviewer that runs in Gitea Actions: on every PR it reads your actual repository, hunts for real problems, verifies them against the code, and posts its findings as a comment. It does not praise your code. A gadfly does not let things slide.

> ### 🤖 Heads up: this is a vibe-coded project
> Gadfly was built almost entirely by an AI agent (Claude Code), prompts and all; the
> reviewer's "brain" is a language model, and so was most of the author. It works and it's
> tested, but treat it accordingly: **it is advisory only, it never blocks a merge, and you
> should still review its reviews.** Issues and PRs welcome; expect the occasional
> AI-flavored rough edge.

## What makes it different

Most LLM "review my diff" bots read the diff in isolation and hallucinate problems they can't actually see: a "missing import" that's three lines above the hunk, a "broken caller" in a file they never opened. Gadfly is **agentic**: the model has read-only tools over the checked-out repo and is *required* to use them before reporting anything.

- **Tools:** `read_file`, `list_dir`, `grep`, `find_files`, `get_diff`.
- **Verify-before-claiming discipline:** baked into the system prompt: open the file, grep the symbol, or drop the finding.
- **Two passes:** a *review* pass drafts findings, then an adversarial *recheck* pass independently re-verifies each one against the code and drops the ones it can't confirm, recomputing the verdict. This is what kills "confident but wrong."
- **Semantic-bug hunting:** it's told not to trust a plausible-looking constant, conversion factor, or formula, but to re-derive the expected value, because that's where real bugs hide.

Every review leads with a one-line verdict: **No material issues found**, **Minor issues**, or **Blocking issues found**.
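
To make the recheck pass concrete, here is a minimal Go sketch of "drop what you can't confirm, then recompute the verdict". The `Finding` type, the `recheck` function, and the `verify` callback are hypothetical names invented for this example, not Gadfly's actual internals; only the drop rule and the three verdict strings come from the description above.

```go
package main

import "fmt"

// Severity and Finding are hypothetical types for this sketch only.
type Severity int

const (
	Minor Severity = iota
	Blocking
)

type Finding struct {
	Claim    string
	Severity Severity
}

// recheck keeps only the drafts that verify confirms, then derives the
// verdict from the survivors: none left means no material issues, any
// blocking survivor means blocking, otherwise minor.
func recheck(drafts []Finding, verify func(Finding) bool) ([]Finding, string) {
	var kept []Finding
	verdict := "No material issues found"
	for _, f := range drafts {
		if !verify(f) {
			continue // unconfirmed findings are dropped, not reported
		}
		kept = append(kept, f)
		if f.Severity == Blocking {
			verdict = "Blocking issues found"
		} else if verdict == "No material issues found" {
			verdict = "Minor issues"
		}
	}
	return kept, verdict
}

func main() {
	drafts := []Finding{
		{Claim: "missing import", Severity: Blocking},
		{Claim: "unused variable", Severity: Minor},
	}
	// Assumed verifier: pretend the "missing import" claim could not be confirmed.
	confirmed := func(f Finding) bool { return f.Claim != "missing import" }
	kept, verdict := recheck(drafts, confirmed)
	fmt.Println(len(kept), verdict)
}
```

In this sketch a blocking draft that fails verification cannot influence the verdict, which is the point of recomputing it after the recheck rather than reusing the review pass's result.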
## Turn it on for a repo

Gadfly ships as a container image, so consuming repos don't build anything; they just run it. Drop one file in your repo and set a couple of secrets/vars:

1. Copy a stub from [`examples/`](examples/) to `.gitea/workflows/adversarial-review.yml` in your repo: [`adversarial-review.yml`](examples/adversarial-review.yml) for the Ollama Cloud default, or a provider-specific one (local Ollama, OpenAI-compatible, endpoint aliases). See the [examples index](examples/README.md).
2. Add repo config:
   - **secret** `OLLAMA_CLOUD_API_KEY`: your [Ollama Cloud](https://ollama.com) key (empty ⇒ Gadfly posts a harmless "not configured" notice instead of reviewing). *Not needed if you point Gadfly at a different provider; see [Models & providers](#models--providers).*
   - **var** `OLLAMA_REVIEW_MODELS` *(optional)*: comma-separated model ids (default `qwen3-coder:480b-cloud,gpt-oss:120b-cloud`). One comment per model.
   - **var** `GADFLY_ALLOWED_USERS` *(optional)*: who may re-trigger via comment; empty ⇒ any repo collaborator.

`GITEA_TOKEN` is provided automatically by Actions; comments post as the `gitea-actions` user, scoped to that repo, so no bot account is needed.

## Models & providers

Gadfly is built on [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo), so the reviewer model is not hard-wired; it can target anything majordomo supports. Pick a provider by setting `GADFLY_PROVIDER` (used to prefix bare model ids); point at a custom endpoint with `GADFLY_BASE_URL`; supply a key with `GADFLY_API_KEY` or the provider's standard env var. A `GADFLY_MODEL`/`GADFLY_MODELS` value that already contains a `provider/` prefix (or is a majordomo failover chain / alias) is used verbatim.
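
The prefixing rule above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the code Gadfly or majordomo actually runs: `resolveSpec` and the `aliases` set are hypothetical, and the sketch assumes a spec counts as "already prefixed" whenever it contains a `/`.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveSpec is a hypothetical helper. It returns the spec verbatim when it
// already carries a provider/ prefix or names a known alias; otherwise it
// prefixes the bare model id with the provider (default "ollama-cloud").
func resolveSpec(model, provider string, aliases map[string]bool) string {
	if provider == "" {
		provider = "ollama-cloud" // documented default for GADFLY_PROVIDER
	}
	// Assumption: any "/" marks an existing provider prefix.
	if strings.Contains(model, "/") || aliases[model] {
		return model
	}
	return provider + "/" + model
}

func main() {
	aliases := map[string]bool{"fast": true}
	for _, m := range []string{"gpt-oss:120b-cloud", "bigbox/qwen2.5-coder:7b", "fast"} {
		fmt.Printf("%s -> %s\n", m, resolveSpec(m, "", aliases))
	}
}
```

Only the first id is rewritten (to `ollama-cloud/gpt-oss:120b-cloud`); the prefixed spec and the alias pass through untouched.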

| Provider | `GADFLY_PROVIDER` | Key env | Status |
|----------|-------------------|---------|--------|
| **Ollama Cloud** (default) | `ollama-cloud` | `OLLAMA_API_KEY` / `OLLAMA_CLOUD_API_KEY` | ✅ in active use |
| **Local Ollama** | `ollama` | none (`OLLAMA_HOST` or `GADFLY_BASE_URL` for a remote daemon) | ✅ tested |
| **[foreman](https://gitea.stevedudenhoeffer.com/steve/foreman)** (native-Ollama queue daemon) | `foreman` + `GADFLY_BASE_URL`, or a `GADFLY_ENDPOINT_*` / `LLM_*` `foreman://` entry | optional bearer (via the endpoint/DSN) | ✅ native-Ollama path |
| **OpenAI-compatible** (incl. local Ollama's `/v1`) | `openai` + `GADFLY_BASE_URL` | `OPENAI_API_KEY` (any non-empty for Ollama) | ✅ tested against Ollama |
| **OpenAI** | `openai` | `OPENAI_API_KEY` | ⚠️ wired, **untested** |
| **Anthropic** | `anthropic` | `ANTHROPIC_API_KEY` | ⚠️ wired, **untested** |
| **Google (Gemini)** | `google` | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | ⚠️ wired, **untested** |

> ### 🧪 Honest status
> Only the **Ollama** paths above are actually exercised. The OpenAI / Anthropic / Google
> providers come "for free" from majordomo's abstraction and *should* work, but I haven't
> spent money verifying them, so treat them as untested. The OpenAI-**compatible** path **is**
> tested, because you can point it at a local Ollama (`GADFLY_BASE_URL=http://localhost:11434/v1`)
> and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you
> try a cloud provider and it works (or doesn't), please open an issue.

### Endpoint aliases via env vars

For multiple named backends (e.g.
a couple of Ollama boxes on your LAN), register them by name with env vars and then reference `name/model` in `GADFLY_MODEL`/`GADFLY_MODELS`:

```sh
# http-capable (Gadfly-native): base URL used verbatim, so plaintext LAN works:
GADFLY_ENDPOINT_BIGBOX="ollama|http://192.168.1.50:11434"
GADFLY_ENDPOINT_GPU="openai|http://gpu.lan:8000/v1|sk-local"
GADFLY_ENDPOINT_M1="foreman|http://foreman-m1:8080|tok"   # native-Ollama queue daemon
GADFLY_MODELS="bigbox/qwen2.5-coder:7b,gpu/llama3.1,m1/qwen3:14b"

# pure spec alias (a model, or a failover chain):
GADFLY_ALIAS_FAST="bigbox/qwen2.5-coder:7b,ollama-cloud/gpt-oss:120b-cloud"
GADFLY_MODEL="fast"
```

The env var's name suffix is lowercased to form the registry name (`GADFLY_ENDPOINT_BIGBOX` → `bigbox`). This is the same idea as majordomo's built-in **`LLM_*` env DSNs** (`LLM_BIGBOX=ollama://tok@host`, `LLM_M1=foreman://tok@host`), which Gadfly also honors, but those are **HTTPS-only**, so for a plaintext local Ollama or `http://` foreman use `GADFLY_ENDPOINT_*` instead.

> **Gitea Actions note:** repo `vars`/`secrets` aren't auto-exposed as env; add each alias to
> the stub workflow's `env:` block, e.g. `GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}`.

## Specialists (the review swarm)

Instead of one generic reviewer, Gadfly runs a **suite of specialists**, each a focused lens with its own review (+recheck) pass, and merges them into **one comment**: a collapsible section per lens, led by an overall verdict (the worst across lenses; the optional `improvements` lens never escalates it).

**Default suite** (when nothing is configured): `security`, `correctness`, `maintainability` (code cleanliness), `performance`, `error-handling`.

**Also built in** (opt-in by name): `tests`, `docs`, `conventions`, and `improvements` (strict & quiet: at most 1–2 high-value, non-blocking suggestions, silent otherwise).
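
The "worst across lenses" rule for the overall verdict can be sketched as follows. The `Verdict` type and the `overall` function are hypothetical names for this illustration; the ordering of the three verdicts and the special case for the `improvements` lens are the only parts taken from the description above.

```go
package main

import "fmt"

// Verdict is ordered from best to worst so "worst" is simply the maximum.
type Verdict int

const (
	NoIssues Verdict = iota
	Minor
	Blocking
)

var labels = [...]string{"No material issues found", "Minor issues", "Blocking issues found"}

func (v Verdict) String() string { return labels[v] }

// overall returns the worst verdict across lenses, ignoring the optional
// "improvements" lens so that it can never escalate the result.
func overall(lenses map[string]Verdict) Verdict {
	worst := NoIssues
	for name, v := range lenses {
		if name == "improvements" {
			continue
		}
		if v > worst {
			worst = v
		}
	}
	return worst
}

func main() {
	got := overall(map[string]Verdict{
		"security":     NoIssues,
		"correctness":  Minor,
		"improvements": Minor,
	})
	fmt.Println(got)
}
```

Treating the verdict as an ordered enum keeps the merge independent of how many lenses run, including ad-hoc ones.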
Select which lenses run with **`GADFLY_SPECIALISTS`** (comma-separated names, or `all`):

```yaml
GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"
```

**Define your own** in two ways, which compose (env overrides file overrides built-ins):

```yaml
# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>" (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"
```

```yaml
# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
  - name: migrations
    title: "🗃️ DB migrations"
    focus: "Review schema migrations for destructive ops, missing indexes, table locks."
```

**Dynamic selection (`auto`):** set `GADFLY_SPECIALISTS: auto` and a selector model reads the changed files + PR description and picks only the lenses that materially apply (and may invent an ad-hoc one, e.g. a "migrations" lens for a schema change). The selector is `GADFLY_SELECTOR_MODEL` if set (a cheap tier is ideal), else the review model. The selection is capped and de-duplicated, and falls back to the default suite if selection fails.

**Worker-tier delegation:** set `GADFLY_WORKER_MODEL` (a cheap/fast model) to give every reviewer a `delegate_investigation` tool. It offloads mechanical legwork (trace all callers, gather every usage, check a pattern across files) to a worker sub-agent that returns a concise, evidence-cited digest, so the expensive model reasons over summaries instead of raw file dumps. Unset = no delegation (current behavior).

> **Cost:** each specialist is its own review+recheck, so cost ≈ *specialists × models × 2*.
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, let `auto` pick
> only what a diff needs, and point heavy legwork at a cheap `GADFLY_WORKER_MODEL`.
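
As a quick back-of-the-envelope check on that cost rule, the sketch below counts model passes per PR. `estimateRuns` is a hypothetical helper written for this README, and it counts passes rather than tokens or money; the factor of 2 drops to 1 when the recheck pass is switched off with `GADFLY_RECHECK=0`.

```go
package main

import "fmt"

// estimateRuns returns the number of model passes for one PR:
// specialists x models x passes, where passes is 2 (review + recheck)
// or 1 when the recheck pass is disabled.
func estimateRuns(specialists, models int, recheck bool) int {
	passes := 1
	if recheck {
		passes = 2
	}
	return specialists * models * passes
}

func main() {
	fmt.Println(estimateRuns(5, 1, true))  // default five-lens suite on one model
	fmt.Println(estimateRuns(5, 2, true))  // same suite across two models
	fmt.Println(estimateRuns(3, 2, false)) // three lenses, two models, recheck off
}
```

This prints 10, 20, and 6, which shows why trimming lenses or models matters more than any single setting.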
### Concurrency (per-provider lanes)

With multiple models, each **provider** is its own lane and lanes run in **parallel**, so a fast cloud provider isn't stuck behind a slow local box. Within a lane, at most `cap` models run at once; `cap` comes from `GADFLY_PROVIDER_CONCURRENCY` (a `provider=N` map), else `GADFLY_CONCURRENCY` (default `1`). The timeout is **per-lens** (`GADFLY_TIMEOUT_SECS`), so a slow model on one lens can't starve the others.

```yaml
# One local box (serial: it serves one model at a time) + 3 cloud reviews at once,
# both lanes running concurrently:
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,m1pro=1"
GADFLY_MODELS: "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
```

A model's provider is the spec's first segment (`m1pro/…` → `m1pro`), or `GADFLY_PROVIDER` (default `ollama-cloud`) for a bare id. The default (`cap 1`) keeps a single-provider pool fully sequential.

### Triggers

1. A **new/reopened/ready** non-draft PR: automatic.
2. Commenting **`@gadfly review`** on a PR: re-review on demand (gated to allowed users).
3. **workflow_dispatch**: manual, with a `pr_number` input.

(Pushing new commits does *not* auto-re-review; comment `@gadfly review` after pushing fixes. This keeps usage down.)

> **Comment trigger needs the workflow on your default branch.** Gitea runs `issue_comment`
> workflows from the **default branch**, so `@gadfly review` only works once this stub is
> merged to `main` (the `pull_request` auto-trigger works from the PR branch immediately).
>
> **Security:** the example stubs gate the comment trigger with a job-level
> `if: github.event_name != 'issue_comment' || github.actor == '<maintainer>'` so an untrusted
> commenter can't start a secret-bearing run. Edit it to your maintainers and keep it in
> sync with `GADFLY_ALLOWED_USERS` (the in-container check). `@gadfly review` is plain-text
> matched (configurable via `GADFLY_TRIGGER_PHRASE`), so no bot account is required; comments
> post as `gitea-actions`.
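
The in-container half of that gate could look roughly like the Go sketch below. It is a guess at the shape of the check, not the real `entrypoint.sh` logic: `shouldReview` is a hypothetical function, the phrase match is assumed to be a case-sensitive substring test, and the collaborator lookup is reduced to a boolean the caller supplies.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldReview decides whether a PR comment may start a re-review.
// phrase mirrors GADFLY_TRIGGER_PHRASE and allowedCSV mirrors
// GADFLY_ALLOWED_USERS; an empty allow-list falls back to "any collaborator".
func shouldReview(comment, actor, phrase, allowedCSV string, isCollaborator bool) bool {
	if phrase == "" {
		phrase = "@gadfly review" // documented default
	}
	// Assumption: plain-text matching means a substring test.
	if !strings.Contains(comment, phrase) {
		return false
	}
	if strings.TrimSpace(allowedCSV) == "" {
		return isCollaborator
	}
	for _, user := range strings.Split(allowedCSV, ",") {
		if strings.TrimSpace(user) == actor {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldReview("fixed, @gadfly review please", "alice", "", "alice, bob", false))
	fmt.Println(shouldReview("@gadfly review", "mallory", "", "alice,bob", true))
}
```

With an explicit allow-list, being a collaborator is not enough in this sketch: the first call passes because `alice` is listed, the second fails because `mallory` is not.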
## How it's packaged

```
cmd/gadfly/                        the agentic reviewer binary (majordomo + Ollama Cloud); zero deps beyond stdlib + majordomo
scripts/run.sh                     fetches the PR diff, runs the reviewer, upserts one labeled comment
scripts/system-prompt.txt          the reviewer persona + verification discipline
entrypoint.sh                      the container brains: trigger gating, clone, model loop (logic lives here, not in YAML)
Dockerfile                         multi-stage; build-time module creds (BuildKit secrets) never reach the final image
.gitea/workflows/build-image.yml   push to main → :latest; tag v* → :vN + :latest
examples/                          the ~15-line stub a consuming repo drops in
```

The image is published to `gitea.stevedudenhoeffer.com/steve/gadfly`. Every push to `main` rebuilds and republishes `:latest` (plus a `:sha-` prefixed tag); pushing a `v*` tag publishes that pinned version (plus `:latest`). Pin consumers to a `:vN` tag for stability, or track `:latest` to ride main.

## Configuration (advanced)

The reviewer binary reads these (the stub/entrypoint set sane defaults):

| Env | Default | Meaning |
|-----|---------|---------|
| `GADFLY_MODEL` | (unset) | model id, or `provider/model` spec, or majordomo alias/chain |
| `GADFLY_PROVIDER` | `ollama-cloud` | provider prefix for a bare model id |
| `GADFLY_BASE_URL` | (unset) | override endpoint (OpenAI/Ollama-compatible servers) |
| `GADFLY_API_KEY` | (unset) | provider key; falls back to the provider's standard env |
| `GADFLY_SPECIALISTS` | default suite | csv of lenses, `all`, or `auto` (dynamic selection) |
| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
| `GADFLY_WORKER_MODEL` | (unset) | cheap model for `delegate_investigation`; unset = no delegation |
| `GADFLY_WORKER_MAX_STEPS` | 8 | tool-step cap for a delegated worker run |
| `GADFLY_CONCURRENCY` | 1 | default max models run at once **per provider** |
| `GADFLY_PROVIDER_CONCURRENCY` | (unset) | per-provider overrides, e.g. `ollama-cloud=3,m1pro=1` |
| `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
| `GADFLY_TIMEOUT_SECS` | 300 | deadline **per specialist lens** (review+recheck) |
| `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
| `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
| `GADFLY_MAX_DIFF_CHARS` | 60000 | diff chars embedded in the prompt (full diff via `get_diff`) |
| `GADFLY_TRIGGER_PHRASE` | `@gadfly review` | comment phrase that re-triggers |
| `GADFLY_ALLOWED_USERS` | *(collaborators)* | comma-separated allow-list for comment triggers |

## Building locally

```sh
go build ./cmd/gadfly   # needs read access to the private majordomo module
go test ./...
```

## License

MIT; see [LICENSE](LICENSE).