🪰 Gadfly
An AI gadfly for your pull requests. Gadfly is an adversarial code reviewer that runs in Gitea Actions: on every PR it reads your actual repository, hunts for real problems, verifies them against the code, and posts its findings as a comment. It does not praise your code. A gadfly does not let things slide.
🤖 Heads up: this is a vibe-coded project
Gadfly was built almost entirely by an AI agent (Claude Code), prompts and all — the reviewer's "brain" is a language model, and so was most of the author. It works and it's tested, but treat it accordingly: it is advisory only, it never blocks a merge, and you should still review its reviews. Issues and PRs welcome; expect the occasional AI-flavored rough edge.
What makes it different
Most LLM "review my diff" bots read the diff in isolation and hallucinate problems they can't actually see — a "missing import" that's three lines above the hunk, a "broken caller" in a file they never opened. Gadfly is agentic: the model has read-only tools over the checked-out repo and is required to use them before reporting anything.
- Tools: read_file, list_dir, grep, find_files, get_diff.
- Verify-before-claiming discipline: baked into the system prompt — open the file, grep the symbol, or drop the finding.
- Two passes: a review pass drafts findings, then an adversarial recheck pass independently re-verifies each one against the code and drops the ones it can't confirm, recomputing the verdict. This is what kills "confident but wrong."
- Semantic-bug hunting: it's told not to trust a plausible-looking constant, conversion factor, or formula — re-derive the expected value, because that's where real bugs hide.
Every review leads with a one-line verdict: No material issues found, Minor issues, or Blocking issues found.
Turn it on for a repo
Gadfly ships as a container image, so consuming repos don't build anything — they just run it. Drop one file in your repo and set a couple of secrets/vars:
- Copy a stub from examples/ to .gitea/workflows/adversarial-review.yml in your repo. Two flavors: the slim reusable.yml — a tiny caller of Gadfly's reusable workflow (uses: steve/gadfly/.gitea/workflows/review-reusable.yml@…, forwarding only the secrets the reviewer needs), which ships a default swarm (3 cloud models + the Claude Code engine, 5-lens suite) you inherit by omitting with: or override per-input — or the full self-contained adversarial-review.yml (Ollama Cloud default, with inline notes for every provider / local Ollama / OpenAI-compatible / endpoint aliases). See the examples index.
- Add repo config:
  - secret OLLAMA_CLOUD_API_KEY — your Ollama Cloud key (empty ⇒ Gadfly posts a harmless "not configured" notice instead of reviewing). Not needed if you point Gadfly at a different provider — see Models & providers.
  - var OLLAMA_REVIEW_MODELS (optional) — comma-separated model ids (default qwen3-coder:480b-cloud,gpt-oss:120b-cloud). One comment per model.
  - var GADFLY_ALLOWED_USERS (optional) — who may re-trigger via comment; empty ⇒ any repo collaborator.

The GITEA_TOKEN secret is provided automatically by Actions; comments post as the gitea-actions
user, scoped to that repo — no bot account needed.
Models & providers
Gadfly is built on majordomo, so the
reviewer model is not hard-wired — it can target anything majordomo supports. Pick a provider
by setting GADFLY_PROVIDER (used to prefix bare model ids); point at a custom endpoint with
GADFLY_BASE_URL; supply a key with GADFLY_API_KEY or the provider's standard env var. A
GADFLY_MODEL/GADFLY_MODELS value that already contains a provider/ prefix (or is a
majordomo failover chain / alias) is used verbatim.
| Provider | GADFLY_PROVIDER | Key env | Status |
|---|---|---|---|
| Ollama Cloud (default) | ollama-cloud | OLLAMA_API_KEY / OLLAMA_CLOUD_API_KEY | ✅ in active use |
| Local Ollama | ollama | none (OLLAMA_HOST or GADFLY_BASE_URL for a remote daemon) | ✅ tested |
| foreman (native-Ollama queue daemon) | foreman + GADFLY_BASE_URL, or a GADFLY_ENDPOINT_* / LLM_* foreman:// entry | optional bearer (via the endpoint/DSN) | ✅ native-Ollama path |
| llama-swap (model-swapping proxy) | llama-swap/llama-swaps (un-hyphenated llamaswap/llamaswaps also accepted) + GADFLY_BASE_URL or a GADFLY_ENDPOINT_* entry, or an LLM_* llama-swap:// / llama-swaps:// DSN | optional bearer | ⚠️ wired, untested |
| OpenAI-compatible (incl. local Ollama's /v1) | openai + GADFLY_BASE_URL | OPENAI_API_KEY (any non-empty for Ollama) | ✅ tested against Ollama |
| OpenAI | openai | OPENAI_API_KEY | ⚠️ wired, untested |
| Anthropic | anthropic | ANTHROPIC_API_KEY | ⚠️ wired, untested |
| Google (Gemini) | google | GOOGLE_API_KEY / GEMINI_API_KEY | ⚠️ wired, untested |
🧪 Honest status
Only the Ollama paths above are actually exercised. The OpenAI / Anthropic / Google providers come "for free" from majordomo's abstraction and should work, but I haven't spent money verifying them — treat them as untested. The OpenAI-compatible path is tested, because you can point it at a local Ollama (GADFLY_BASE_URL=http://localhost:11434/v1) and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you try a cloud provider and it works (or doesn't), please open an issue.
Claude Code engine (claude-code)
Besides the majordomo model loop, Gadfly can review through the Claude Code
CLI: for each lens it shells out to claude -p inside the checked-out repo, so Claude Code
uses its own read tools (Read/Grep/Glob) to verify findings against real code, then Gadfly
parses the result and runs the same verdict-parse → recheck → consolidate → emit pipeline. The
CLI is bundled in the image (Node + @anthropic-ai/claude-code).
Select it as a model id — bare claude-code (CLI default model) or claude-code/<model> (the
suffix becomes --model, e.g. claude-code/sonnet, claude-code/opus). An optional
:<thinking> suffix forces an extended-thinking budget for that reviewer — :max (the high
"ultrathink" tier) or :<n> for a specific token budget — so you can run the same model at two
thinking depths as separate reviewers:
GADFLY_MODELS: "claude-code/sonnet,claude-code/opus,claude-code/opus:max"
The thinking budget is applied via the MAX_THINKING_TOKENS env on the CLI subprocess; it's
best-effort (a no-op if the installed CLI build doesn't honor it).
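The spec grammar above can be sketched as a small parser. This is only an illustration in Python, not Gadfly's actual code: the `ClaudeSpec` type and `parse_claude_spec` function are hypothetical names, and accepting a thinking suffix on the bare `claude-code` id is an assumption the text does not confirm.

```python
from typing import NamedTuple, Optional


class ClaudeSpec(NamedTuple):
    model: Optional[str]     # value for --model; None means the CLI default model
    thinking: Optional[str]  # "max", a token count as text, or None


def parse_claude_spec(spec: str) -> ClaudeSpec:
    """Split claude-code[/<model>][:<thinking>] into its parts (hypothetical helper)."""
    engine, _, rest = spec.partition("/")
    if rest:
        model, _, thinking = rest.partition(":")
    else:
        # Assumption: a bare "claude-code:max" is also accepted.
        engine, _, thinking = engine.partition(":")
        model = ""
    if engine != "claude-code":
        raise ValueError(f"not a claude-code spec: {spec!r}")
    if thinking and thinking != "max" and not thinking.isdigit():
        raise ValueError(f"thinking suffix must be 'max' or a number: {thinking!r}")
    return ClaudeSpec(model or None, thinking or None)


for spec in "claude-code/sonnet,claude-code/opus,claude-code/opus:max".split(","):
    print(spec, "->", parse_claude_spec(spec))
```

Because the thinking suffix is part of the spec, `claude-code/opus` and `claude-code/opus:max` parse to different values and can run side by side as two reviewers.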
Auth is read from the environment: the default is a Pro/Max subscription via
CLAUDE_CODE_OAUTH_TOKEN (from claude setup-token; no --bare), falling back to
ANTHROPIC_API_KEY. Don't set both. Tuning knobs (all optional):
| Env | Default | Meaning |
|---|---|---|
| GADFLY_CLAUDE_MODEL | (from the spec suffix) | overrides the --model value |
| GADFLY_CLAUDE_PERMISSION_MODE | plan | --permission-mode (read-only plan keeps it from editing) |
| GADFLY_CLAUDE_ALLOWED_TOOLS | (unset) | --allowedTools value, passed verbatim (e.g. Read,Grep,Glob) |
| GADFLY_CLAUDE_EXTRA_ARGS | (unset) | extra CLI args, whitespace-split (no shell quoting) and appended after the defaults (e.g. --max-turns 30) |
| GADFLY_CLAUDE_BIN | claude | CLI binary path |
These are operator knobs (workflow env), not PR-author input. Because GADFLY_CLAUDE_EXTRA_ARGS is appended after the defaults, it can override the read-only --permission-mode plan (e.g. passing --permission-mode acceptEdits), so keep it read-only unless you mean otherwise. It's whitespace-split, so values can't contain spaces — use GADFLY_CLAUDE_ALLOWED_TOOLS / _PERMISSION_MODE / _MODEL for those. The subprocess runs with a minimal environment (its auth token + PATH/HOME/locale/GADFLY_CLAUDE_*), not the runner's full env, so the Gitea token and provider keys aren't handed to the CLI.
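To make the ordering concrete, here is a rough Python sketch of how such an argument list could be assembled. The function name `build_claude_argv`, the exact flag order, and passing the prompt directly after `-p` are assumptions for illustration; only the flags and env names come from the table above.

```python
def build_claude_argv(env, prompt, spec_model=None):
    """Assemble a claude CLI argv: defaults first, extra args last (hypothetical helper)."""
    argv = [env.get("GADFLY_CLAUDE_BIN", "claude"), "-p", prompt]
    model = env.get("GADFLY_CLAUDE_MODEL") or spec_model
    if model:
        argv += ["--model", model]
    argv += ["--permission-mode", env.get("GADFLY_CLAUDE_PERMISSION_MODE", "plan")]
    if env.get("GADFLY_CLAUDE_ALLOWED_TOOLS"):
        # Passed verbatim as a single argument.
        argv += ["--allowedTools", env["GADFLY_CLAUDE_ALLOWED_TOOLS"]]
    # Plain whitespace split, no shell quoting: a value with spaces becomes several args.
    argv += env.get("GADFLY_CLAUDE_EXTRA_ARGS", "").split()
    return argv


print(build_claude_argv({"GADFLY_CLAUDE_EXTRA_ARGS": "--max-turns 30"}, "review", "sonnet"))
print(build_claude_argv({"GADFLY_CLAUDE_EXTRA_ARGS": "--permission-mode acceptEdits"}, "review"))
```

In the second call `--permission-mode` appears twice, with the operator-supplied value last, which is the override risk the note above warns about.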
Alternate backends (example only, not validated here). Because the subprocess env forwards
ANTHROPIC_* and CLAUDE_*, you can point the same engine at a non-Anthropic backend by setting
ANTHROPIC_BASE_URL (and ANTHROPIC_AUTH_TOKEN/ANTHROPIC_API_KEY) to an Anthropic-API-compatible
proxy — e.g. claude-code-router or LiteLLM in
front of Ollama — to run Ollama models through Claude Code's harness and compare it against the
native majordomo loop. Whether tool-use survives a given proxy/backend varies, so this is documented
as an example, not wired or tested here.
The Pro/Max path is dogfooded but otherwise lightly tested. claude-code/sonnet now runs on gadfly's own PRs (see .gitea/workflows/adversarial-review.yml), but treat the engine as new — and note that subscription auth in automated CI is a gray area in Anthropic's terms. auto specialist selection and the delegate_investigation worker are majordomo-only and are skipped with this engine (Claude Code does its own legwork).
Endpoint aliases via env vars
For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by
name with env vars and then reference name/model in GADFLY_MODEL/GADFLY_MODELS:
# http-capable (Gadfly-native) — base URL used verbatim, so plaintext LAN works:
GADFLY_ENDPOINT_BIGBOX="ollama|http://192.168.1.50:11434"
GADFLY_ENDPOINT_GPU="openai|http://gpu.lan:8000/v1|sk-local"
GADFLY_ENDPOINT_M1="foreman|http://foreman-m1:8080|tok" # native-Ollama queue daemon
GADFLY_MODELS="bigbox/qwen2.5-coder:7b,gpu/llama3.1,m1/qwen3:14b"
# pure spec alias (a model, or a failover chain):
GADFLY_ALIAS_FAST="bigbox/qwen2.5-coder:7b,ollama-cloud/gpt-oss:120b-cloud"
GADFLY_MODEL="fast"
<NAME> is lowercased to form the registry name (GADFLY_ENDPOINT_BIGBOX → bigbox). This
is the same idea as majordomo's built-in LLM_* env DSNs (LLM_BIGBOX=ollama://tok@host,
LLM_M1=foreman://tok@host), which Gadfly also honors — but those are HTTPS-only, so for a
plaintext local Ollama or http:// foreman use GADFLY_ENDPOINT_* instead.
Gitea Actions note: repo vars/secrets aren't auto-exposed as env — add each alias to the stub workflow's env: block, e.g. GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}.
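As an illustration of the `GADFLY_ENDPOINT_<NAME>` format, the Python sketch below splits the `provider|base-url|key` value and lowercases the name. `parse_endpoints` and the shape of the returned dictionary are hypothetical; the real registry lives inside Gadfly and may differ.

```python
PREFIX = "GADFLY_ENDPOINT_"


def parse_endpoints(env):
    """Map GADFLY_ENDPOINT_<NAME>="provider|url[|key]" entries to a name -> config dict."""
    endpoints = {}
    for key, value in env.items():
        if not key.startswith(PREFIX) or not value:
            continue
        name = key[len(PREFIX):].lower()  # GADFLY_ENDPOINT_BIGBOX -> bigbox
        provider, _, rest = value.partition("|")
        base_url, _, api_key = rest.partition("|")
        endpoints[name] = {
            "provider": provider,
            "base_url": base_url,        # used verbatim, so http:// is fine
            "api_key": api_key or None,  # the third field is optional
        }
    return endpoints


env = {
    "GADFLY_ENDPOINT_BIGBOX": "ollama|http://192.168.1.50:11434",
    "GADFLY_ENDPOINT_GPU": "openai|http://gpu.lan:8000/v1|sk-local",
    "HOME": "/root",
}
print(parse_endpoints(env))
```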
Specialists (the review swarm)
Instead of one generic reviewer, Gadfly runs a suite of specialists — each a focused lens
with its own review (+recheck) pass — and merges them into one comment, a collapsible
section per lens, led by an overall verdict (the worst across lenses; the optional
improvements lens never escalates it).
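The "worst across lenses" rule can be sketched in a few lines of Python. The verdict strings are the ones this README lists; the numeric ranking and the `overall_verdict` helper are assumptions made for the example.

```python
# Assumed ranking: higher means worse.
SEVERITY = {
    "No material issues found": 0,
    "Minor issues": 1,
    "Blocking issues found": 2,
}


def overall_verdict(lens_verdicts, non_escalating=("improvements",)):
    """Return the worst verdict across lenses, skipping lenses that never escalate."""
    counted = [v for lens, v in lens_verdicts.items() if lens not in non_escalating]
    if not counted:
        return "No material issues found"
    return max(counted, key=SEVERITY.__getitem__)


print(overall_verdict({
    "security": "No material issues found",
    "correctness": "Minor issues",
    "improvements": "Blocking issues found",  # ignored for the overall verdict
}))
```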
Default suite (when nothing is configured):
security, correctness, maintainability (code cleanliness), performance, error-handling.
Also built in (opt-in by name): tests, docs, conventions, and improvements
(strict & quiet — at most 1–2 high-value, non-blocking suggestions, silent otherwise).
Select which run with GADFLY_SPECIALISTS (comma-separated names, or all):
GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"
Define your own — two ways, which compose (env overrides file overrides built-ins):
# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>" (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"
# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
  - name: migrations
    title: "🗃️ DB migrations"
    focus: "Review schema migrations for destructive ops, missing indexes, table locks."
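The precedence rule (env overrides file overrides built-ins) amounts to a layered dictionary merge. The Python sketch below assumes each source has already been reduced to a name-to-focus mapping; `resolve_specialists` and the placeholder focus strings are hypothetical.

```python
ENV_PREFIX = "GADFLY_SPECIALIST_"


def resolve_specialists(builtins, file_defs, env):
    """Merge lens definitions so that env beats .gadfly.yml, which beats built-ins."""
    env_defs = {
        key[len(ENV_PREFIX):].lower(): focus
        for key, focus in env.items()
        # GADFLY_SPECIALISTS (the selection list) does not match the prefix.
        if key.startswith(ENV_PREFIX) and focus
    }
    return {**builtins, **file_defs, **env_defs}


merged = resolve_specialists(
    builtins={"security": "<built-in focus>"},
    file_defs={"migrations": "<focus from .gadfly.yml>"},
    env={
        "GADFLY_SPECIALIST_MIGRATIONS": "Review DB migrations for destructive or unindexed changes.",
        "GADFLY_SPECIALISTS": "security,correctness,migrations",
    },
)
print(merged["migrations"])
```

Later layers win key by key, so an env definition replaces a same-named file or built-in lens without touching the others.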
Dynamic selection (auto): set GADFLY_SPECIALISTS: auto and a selector model reads the
changed files + PR description and picks only the lenses that materially apply (and may invent
an ad-hoc one — e.g. a "migrations" lens for a schema change). The selector is
GADFLY_SELECTOR_MODEL if set (a cheap tier is ideal), else the review model. Capped and
de-duplicated; falls back to the default suite if selection fails.
Worker-tier delegation: set GADFLY_WORKER_MODEL (a cheap/fast model) to give every
reviewer a delegate_investigation tool — it offloads mechanical legwork (trace all callers,
gather every usage, check a pattern across files) to a worker sub-agent that returns a concise,
evidence-cited digest, so the expensive model reasons over summaries instead of raw file dumps.
Unset = no delegation (current behavior).
Cost: each specialist is its own review+recheck, so cost ≈ specialists × models × 2. The default suite runs on a single model. Trim with GADFLY_SPECIALISTS, let auto pick only what a diff needs, and point heavy legwork at a cheap GADFLY_WORKER_MODEL.
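A quick worked version of that cost estimate, in Python. It counts model passes only (not tokens or worker calls), and `review_passes` is a hypothetical helper name.

```python
def review_passes(specialists, models, recheck=True):
    """Approximate pass count: one review per lens per model, doubled by the recheck."""
    return specialists * models * (2 if recheck else 1)


# Default 5-lens suite on a single model.
print(review_passes(5, 1))                 # 10
# The reusable workflow's default swarm: 3 cloud models + Claude Code, 5 lenses.
print(review_passes(5, 4))                 # 40
# Same swarm with the recheck pass disabled (GADFLY_RECHECK=0).
print(review_passes(5, 4, recheck=False))  # 20
```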
Concurrency (per-provider lanes)
With multiple models, each provider is its own lane and lanes run in parallel, so a fast
cloud provider isn't stuck behind a slow local box. Within a lane, at most cap models run at
once — cap comes from GADFLY_PROVIDER_CONCURRENCY (a provider=N map) else GADFLY_CONCURRENCY
(default 1). The timeout is per-lens (GADFLY_TIMEOUT_SECS), so a slow model on one lens
can't starve the others.
# One local box (serial — it serves one model at a time) + 3 cloud reviews at once,
# both lanes running concurrently:
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,m1pro=1"
GADFLY_MODELS: "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
A model's provider is the spec's first segment (m1pro/… → m1pro), or GADFLY_PROVIDER/
ollama-cloud for a bare id. Default (cap 1) keeps a single-provider pool fully sequential.
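The lane rules above can be sketched in Python as follows. The helper names are hypothetical, and the sketch simplifies one thing: it treats any spec without a `/` as a bare id, so aliases and failover chains are not handled.

```python
def parse_caps(raw):
    """Parse a "provider=N,provider=N" map, ignoring malformed entries."""
    caps = {}
    for item in raw.split(","):
        name, sep, value = item.strip().partition("=")
        if sep and value.strip().isdigit():
            caps[name.strip()] = int(value)
    return caps


def provider_of(spec, default_provider="ollama-cloud"):
    """First segment of "provider/model", else the default provider for a bare id."""
    head, sep, _ = spec.partition("/")
    return head if sep else default_provider


def plan_lanes(models, provider_caps="", default_cap=1, default_provider="ollama-cloud"):
    """Group model specs into per-provider lanes, each with its concurrency cap."""
    caps = parse_caps(provider_caps)
    lanes = {}
    for spec in (s.strip() for s in models.split(",")):
        if not spec:
            continue
        provider = provider_of(spec, default_provider)
        lane = lanes.setdefault(
            provider, {"cap": caps.get(provider, default_cap), "models": []}
        )
        lane["models"].append(spec)
    return lanes


print(plan_lanes(
    "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud",
    provider_caps="ollama-cloud=3,m1pro=1",
))
```

With the example configuration this yields an `m1pro` lane capped at 1 and an `ollama-cloud` lane capped at 3 holding the two bare ids.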
Lens fan-out (within a model). By default the specialist lenses run sequentially inside
each model (GADFLY_LENS_CONCURRENCY=1). Raise it to overlap the independent per-lens
review+recheck passes — the model then posts its consolidated comment as soon as its lenses
finish (so with sequential models, results stream in per model and per-model timings stay
clean). Like the model cap, it's per-provider configurable: GADFLY_PROVIDER_LENS_CONCURRENCY
takes a provider=N map keyed by the same provider lanes as GADFLY_PROVIDER_CONCURRENCY,
falling back to the GADFLY_LENS_CONCURRENCY scalar (default 1). It multiplies with the
model cap: total in-flight requests ≈ models-at-once × lenses-at-once, so to fan lenses out
without oversubscribing a backend, keep its model cap low and raise its lens cap:
# Per provider: cloud runs one model at a time but fans its 3 lenses out (3 concurrent requests);
# the slow local box stays fully serial. Both provider lanes still run in parallel.
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=1,m1=1"
GADFLY_PROVIDER_LENS_CONCURRENCY: "ollama-cloud=3,m1=1"
GADFLY_SPECIALISTS: "security,correctness,error-handling"
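To see how the two caps multiply, the Python sketch below simulates one provider lane with nested semaphores and records the peak number of simultaneous requests. It is a toy model using asyncio, not Gadfly's scheduler; `run_provider` and the sleep that stands in for a review+recheck pass are assumptions.

```python
import asyncio


async def run_provider(models, lenses, model_cap, lens_cap):
    """Simulate one provider lane and return the peak number of in-flight lens runs."""
    model_slots = asyncio.Semaphore(model_cap)
    in_flight = 0
    peak = 0

    async def review_lens():
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for a review+recheck request
        in_flight -= 1

    async def review_model():
        async with model_slots:                      # at most model_cap models at once
            lens_slots = asyncio.Semaphore(lens_cap)  # lens cap applies per model

            async def guarded():
                async with lens_slots:
                    await review_lens()

            await asyncio.gather(*(guarded() for _ in lenses))

    await asyncio.gather(*(review_model() for _ in models))
    return peak


lenses = ["security", "correctness", "error-handling"]
# One model at a time, three lenses at once: peak of 3 concurrent requests.
print(asyncio.run(run_provider(["model-a", "model-b"], lenses, model_cap=1, lens_cap=3)))
```

Swapping the caps (two models at once, one lens each) gives a peak of 2, which is why a low model cap with a higher lens cap is the gentler way to fan out on one backend.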
Live status board
When several models (each with several lenses) review a PR, the individual findings land in
one comment per model — but while that's in flight all you'd see is a row of
⏳ Reviewing… placeholders. So Gadfly also upserts one consolidated status-board comment
that aggregates every model's per-lens progress as it happens:
## 🪰 Gadfly — live review status
1/3 reviewers finished · updated 2026-06-27 18:14:56Z
#### `glm-5.2:cloud` · ollama-cloud — ⏳ 2/4 lenses
- ✅ security — No material issues found
- 🔄 correctness — running
- ⏸️ performance — queued
…
Each model process publishes its lenses (queued → running → finished + verdict) to a small
JSON file, and a background renderer in entrypoint.sh re-renders + upserts the single comment
every GADFLY_STATUS_POLL_SECS (default 12s) until the swarm finishes. It's advisory and
best-effort — the per-model findings comments are unaffected — and entirely separate from those.
Turn it off with GADFLY_STATUS_BOARD=0.
Triggers
- A new/reopened/ready non-draft PR — automatic.
- Commenting @gadfly review on a PR — re-review on demand (gated to allowed users).
- workflow_dispatch — manual, with a pr_number input.
(Pushing new commits does not auto-re-review — comment @gadfly review after pushing
fixes. This keeps usage down.)
Comment trigger needs the workflow on your default branch. Gitea runs issue_comment workflows from the default branch, so @gadfly review only works once this stub is merged to main (the pull_request auto-trigger works from the PR branch immediately).

Security: the example stubs gate the comment trigger with a job-level if: github.event_name != 'issue_comment' || github.actor == '<you>' so an untrusted commenter can't start a secret-bearing run — edit it to your maintainers and keep it in sync with GADFLY_ALLOWED_USERS (the in-container check).

@gadfly review is plain-text matched (configurable via GADFLY_TRIGGER_PHRASE), so no bot account is required; comments post as gitea-actions.
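The in-container comment gate can be pictured roughly like this. It is a Python sketch under stated assumptions: the match is a case-sensitive substring test, and the collaborator list is passed in rather than looked up through the Gitea API. `should_review` is a hypothetical name.

```python
def should_review(comment_body, actor, allowed_users="", collaborators=(),
                  phrase="@gadfly review"):
    """Decide whether a PR comment should start a re-review (illustrative only)."""
    if phrase not in comment_body:  # plain-text match, no bot mention needed
        return False
    allow = {u.strip() for u in allowed_users.split(",") if u.strip()}
    if allow:
        return actor in allow
    # Empty allow-list: fall back to "any repo collaborator".
    return actor in set(collaborators)


print(should_review("Fixed it, @gadfly review please", "alice", allowed_users="alice,bob"))
print(should_review("@gadfly review", "mallory", allowed_users="alice,bob"))
```

This check is a second layer; the job-level `if:` in the workflow is what keeps an untrusted commenter from starting a secret-bearing run at all.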
How it's packaged
cmd/gadfly/ the agentic reviewer binary (majordomo + Ollama Cloud); zero deps beyond stdlib + majordomo
scripts/run.sh fetches the PR diff, runs the reviewer, upserts one labeled comment
scripts/status-board.sh renders + upserts the single live status-board comment (per-lens progress)
scripts/system-prompt.txt the reviewer persona + verification discipline
entrypoint.sh the container brains: trigger gating, clone, model loop (logic lives here, not in YAML)
Dockerfile multi-stage; build-time module creds (BuildKit secrets) never reach the final image
.gitea/workflows/build-image.yml push to main → :latest; tag v* → :<tag> + :latest
examples/ the ~15-line stub a consuming repo drops in
The image is published to gitea.stevedudenhoeffer.com/steve/gadfly. Every push to main
rebuilds and republishes :latest (plus :sha-<short>); pushing a v* tag publishes that
pinned version (plus :latest). Pin consumers to a :vN tag for stability, or track
:latest to ride main.
Configuration (advanced)
The reviewer binary reads these (the stub/entrypoint set sane defaults):
| Env | Default | Meaning |
|---|---|---|
| GADFLY_MODEL | — | model id, or provider/model spec, or majordomo alias/chain |
| GADFLY_PROVIDER | ollama-cloud | provider prefix for a bare model id |
| GADFLY_BASE_URL | — | override endpoint (OpenAI/Ollama-compatible servers) |
| GADFLY_API_KEY | — | provider key; falls back to the provider's standard env |
| claude-code model id | — | route a model through the bundled Claude Code CLI (claude-code / claude-code/<model>); see Claude Code engine for its GADFLY_CLAUDE_* knobs |
| GADFLY_SPECIALISTS | default suite | csv of lenses, all, or auto (dynamic selection) |
| GADFLY_SELECTOR_MODEL | review model | model that picks lenses in auto mode |
| GADFLY_WORKER_MODEL | — | cheap model for delegate_investigation; unset = no delegation |
| GADFLY_WORKER_MAX_STEPS | 8 | tool-step cap for a delegated worker run |
| GADFLY_CONCURRENCY | 1 | default max models run at once per provider |
| GADFLY_PROVIDER_CONCURRENCY | — | per-provider overrides, e.g. ollama-cloud=3,m1pro=1 |
| GADFLY_LENS_CONCURRENCY | 1 | specialist lenses run at once within a model (× model cap = total in-flight) |
| GADFLY_PROVIDER_LENS_CONCURRENCY | — | per-provider lens overrides, same lanes as GADFLY_PROVIDER_CONCURRENCY, e.g. ollama-cloud=3,m1=1 |
| GADFLY_MAX_STEPS | 24 | review-pass tool-step cap |
| GADFLY_TIMEOUT_SECS | 300 | deadline per specialist lens (review+recheck) |
| GADFLY_RECHECK | on | set 0/false to skip the recheck pass |
| GADFLY_RECHECK_MAX_STEPS | 16 | recheck-pass step cap |
| GADFLY_MAX_DIFF_CHARS | 60000 | diff chars embedded in the prompt (full diff via get_diff) |
| GADFLY_STATUS_BOARD | on | set 0 to disable the live status-board comment |
| GADFLY_STATUS_POLL_SECS | 12 | how often the status board re-renders/upserts |
| GADFLY_TRIGGER_PHRASE | @gadfly review | comment phrase that re-triggers |
| GADFLY_ALLOWED_USERS | (collaborators) | comma-separated allow-list for comment triggers |
| GADFLY_FINDINGS_URL | — | gadfly-reports store base URL; set to enable findings telemetry (off when empty) |
| GADFLY_FINDINGS_TOKEN | — | bearer token for the gadfly-reports store (sent as Authorization: Bearer …) |
| GADFLY_REPO | (from GITEA_API) | owner/repo slug stamped on emitted runs/findings (set by entrypoint.sh) |
| GADFLY_PR | (from event) | PR number stamped on emitted runs/findings (set by entrypoint.sh) |
Findings telemetry (optional)
Gadfly can record what it found so model quality can be tracked over time. It is
off by default and purely advisory: set GADFLY_FINDINGS_URL to a
gadfly-reports store base URL and,
after each review, the binary best-effort POSTs the run (/runs) and the
findings it surfaced (/reports) to that store. Add GADFLY_FINDINGS_TOKEN
to send an Authorization: Bearer … header. entrypoint.sh supplies the run
context (GADFLY_REPO, GADFLY_PR) automatically.
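A best-effort emit of this kind could look like the Python sketch below, using only the standard library. The `/runs` path, the bearer header, and the roughly 10 second timeout come from this section; the payload fields, the `reports.example` host, and the function names are made up for illustration.

```python
import json
import sys
import urllib.request


def build_request(base_url, path, payload, token=""):
    """Build a JSON POST for the findings store; adds a bearer header when a token is set."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def emit(base_url, path, payload, token="", opener=urllib.request.urlopen):
    """Best-effort POST: never raises, logs failures to stderr, returns success as a bool."""
    if not base_url:  # telemetry is off when the URL is empty
        return False
    try:
        with opener(build_request(base_url, path, payload, token), timeout=10) as resp:
            return 200 <= resp.status < 300
    except Exception as exc:  # any failure is logged and swallowed
        print(f"findings emit failed: {exc}", file=sys.stderr)
        return False


req = build_request("https://reports.example/", "/runs", {"repo": "owner/repo", "pr": 10}, "tok")
print(req.get_method(), req.full_url, req.get_header("Authorization"))
```

Returning a boolean instead of raising keeps the caller's review output and exit code independent of whether the store is reachable.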
Findings are extracted heuristically from each lens's markdown — a path:line
reference anchors a finding, titled by the nearest preceding heading / numbered
item / bold lead-in. A lens whose verdict is "No material issues found"
emits no findings: its path:line references are verification notes
("verified X is safe"), not problems, so extracting them would record false
positives and unfairly penalize thorough clean-pass reviewers. The emit is
strictly best-effort: a short (~10s) timeout, any error (or a non-2xx response)
is logged to stderr only, and it never changes the review output or the exit
code.
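The extraction heuristic described above might be approximated as follows. This Python sketch is a simplification, not Gadfly's extractor: both regular expressions and the `extract_findings` name are assumptions, and the path pattern will also match other `name.ext:digits` strings.

```python
import re

# A path with a file extension followed by :<line>, e.g. cmd/gadfly/main.go:42
PATH_LINE = re.compile(r"([\w./-]+\.\w+):(\d+)")
# Heading, numbered item, or bold lead-in at the start of a line.
TITLE = re.compile(r"^\s*(?:#{1,6}\s+(.+)|\d+[.)]\s+(.+)|\*\*(.+?)\*\*)")


def extract_findings(markdown, verdict):
    """Anchor findings on path:line references, titled by the nearest preceding marker."""
    if verdict == "No material issues found":
        return []  # references in a clean pass are verification notes, not problems
    findings, title = [], ""
    for line in markdown.splitlines():
        match = TITLE.match(line)
        if match:
            title = next(g for g in match.groups() if g).strip()
        for path, lineno in PATH_LINE.findall(line):
            findings.append({"title": title, "path": path, "line": int(lineno)})
    return findings


review = """### Unchecked error
The write at cmd/gadfly/main.go:42 ignores its error.

**Stale cache** in scripts/run.sh:7 after retry.
"""
for finding in extract_findings(review, "Minor issues"):
    print(finding)
```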
Building locally
go build ./cmd/gadfly # needs read access to the private majordomo module
go test ./...
License
MIT — see LICENSE.