After each review the binary POSTs the run + its heuristically-extracted findings to GADFLY_FINDINGS_URL (off unless set). Advisory: any error only goes to stderr — never touches stdout, the exit code, or the review. stdlib net/http only (no new deps). entrypoint.sh derives GADFLY_REPO/GADFLY_PR and passes through GADFLY_FINDINGS_URL/GADFLY_FINDINGS_TOKEN. Also renames store references from the old 'docket' name to 'gadfly-reports'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
16 KiB
🪰 Gadfly
An AI gadfly for your pull requests. Gadfly is an adversarial code reviewer that runs in Gitea Actions: on every PR it reads your actual repository, hunts for real problems, verifies them against the code, and posts its findings as a comment. It does not praise your code. A gadfly does not let things slide.
🤖 Heads up: this is a vibe-coded project
Gadfly was built almost entirely by an AI agent (Claude Code), prompts and all — the reviewer's "brain" is a language model, and so was most of the author. It works and it's tested, but treat it accordingly: it is advisory only, it never blocks a merge, and you should still review its reviews. Issues and PRs welcome; expect the occasional AI-flavored rough edge.
What makes it different
Most LLM "review my diff" bots read the diff in isolation and hallucinate problems they can't actually see — a "missing import" that's three lines above the hunk, a "broken caller" in a file they never opened. Gadfly is agentic: the model has read-only tools over the checked-out repo and is required to use them before reporting anything.
- Tools:
read_file,list_dir,grep,find_files,get_diff. - Verify-before-claiming discipline: baked into the system prompt — open the file, grep the symbol, or drop the finding.
- Two passes: a review pass drafts findings, then an adversarial recheck pass independently re-verifies each one against the code and drops the ones it can't confirm, recomputing the verdict. This is what kills "confident but wrong."
- Semantic-bug hunting: it's told not to trust a plausible-looking constant, conversion factor, or formula — re-derive the expected value, because that's where real bugs hide.
Every review leads with a one-line verdict: No material issues found, Minor issues, or Blocking issues found.
Turn it on for a repo
Gadfly ships as a container image, so consuming repos don't build anything — they just run it. Drop one file in your repo and set a couple of secrets/vars:
- Copy a stub from
examples/to.gitea/workflows/adversarial-review.ymlin your repo —adversarial-review.ymlfor the Ollama Cloud default, or a provider-specific one (local Ollama, OpenAI-compatible, endpoint aliases). See the examples index. - Add repo config:
- secret
OLLAMA_CLOUD_API_KEY— your Ollama Cloud key (empty ⇒ Gadfly posts a harmless "not configured" notice instead of reviewing). Not needed if you point Gadfly at a different provider — see Models & providers. - var
OLLAMA_REVIEW_MODELS(optional) — comma-separated model ids (defaultqwen3-coder:480b-cloud,gpt-oss:120b-cloud). One comment per model. - var
GADFLY_ALLOWED_USERS(optional) — who may re-trigger via comment; empty ⇒ any repo collaborator.
- secret
GITEA_TOKEN is provided automatically by Actions; comments post as the gitea-actions
user, scoped to that repo — no bot account needed.
Models & providers
Gadfly is built on majordomo, so the
reviewer model is not hard-wired — it can target anything majordomo supports. Pick a provider
by setting GADFLY_PROVIDER (used to prefix bare model ids); point at a custom endpoint with
GADFLY_BASE_URL; supply a key with GADFLY_API_KEY or the provider's standard env var. A
GADFLY_MODEL/GADFLY_MODELS value that already contains a provider/ prefix (or is a
majordomo failover chain / alias) is used verbatim.
| Provider | GADFLY_PROVIDER |
Key env | Status |
|---|---|---|---|
| Ollama Cloud (default) | ollama-cloud |
OLLAMA_API_KEY / OLLAMA_CLOUD_API_KEY |
✅ in active use |
| Local Ollama | ollama |
none (OLLAMA_HOST or GADFLY_BASE_URL for a remote daemon) |
✅ tested |
| foreman (native-Ollama queue daemon) | foreman + GADFLY_BASE_URL, or a GADFLY_ENDPOINT_* / LLM_* foreman:// entry |
optional bearer (via the endpoint/DSN) | ✅ native-Ollama path |
OpenAI-compatible (incl. local Ollama's /v1) |
openai + GADFLY_BASE_URL |
OPENAI_API_KEY (any non-empty for Ollama) |
✅ tested against Ollama |
| OpenAI | openai |
OPENAI_API_KEY |
⚠️ wired, untested |
| Anthropic | anthropic |
ANTHROPIC_API_KEY |
⚠️ wired, untested |
| Google (Gemini) | google |
GOOGLE_API_KEY / GEMINI_API_KEY |
⚠️ wired, untested |
🧪 Honest status
Only the Ollama paths above are actually exercised. The OpenAI / Anthropic / Google providers come "for free" from majordomo's abstraction and should work, but I haven't spent money verifying them — treat them as untested. The OpenAI-compatible path is tested, because you can point it at a local Ollama (
GADFLY_BASE_URL=http://localhost:11434/v1) and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you try a cloud provider and it works (or doesn't), please open an issue.
Endpoint aliases via env vars
For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by
name with env vars and then reference name/model in GADFLY_MODEL/GADFLY_MODELS:
# http-capable (Gadfly-native) — base URL used verbatim, so plaintext LAN works:
GADFLY_ENDPOINT_BIGBOX="ollama|http://192.168.1.50:11434"
GADFLY_ENDPOINT_GPU="openai|http://gpu.lan:8000/v1|sk-local"
GADFLY_ENDPOINT_M1="foreman|http://foreman-m1:8080|tok" # native-Ollama queue daemon
GADFLY_MODELS="bigbox/qwen2.5-coder:7b,gpu/llama3.1,m1/qwen3:14b"
# pure spec alias (a model, or a failover chain):
GADFLY_ALIAS_FAST="bigbox/qwen2.5-coder:7b,ollama-cloud/gpt-oss:120b-cloud"
GADFLY_MODEL="fast"
<NAME> is lowercased to form the registry name (GADFLY_ENDPOINT_BIGBOX → bigbox). This
is the same idea as majordomo's built-in LLM_* env DSNs (LLM_BIGBOX=ollama://tok@host,
LLM_M1=foreman://tok@host), which Gadfly also honors — but those are HTTPS-only, so for a
plaintext local Ollama or http:// foreman use GADFLY_ENDPOINT_* instead.
Gitea Actions note: repo
vars/secretsaren't auto-exposed as env — add each alias to the stub workflow'senv:block, e.g.GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}.
Specialists (the review swarm)
Instead of one generic reviewer, Gadfly runs a suite of specialists — each a focused lens
with its own review (+recheck) pass — and merges them into one comment, a collapsible
section per lens, led by an overall verdict (the worst across lenses; the optional
improvements lens never escalates it).
Default suite (when nothing is configured):
security, correctness, maintainability (code cleanliness), performance, error-handling.
Also built in (opt-in by name): tests, docs, conventions, and improvements
(strict & quiet — at most 1–2 high-value, non-blocking suggestions, silent otherwise).
Select which run with GADFLY_SPECIALISTS (comma-separated names, or all):
GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"
Define your own — two ways, which compose (env overrides file overrides built-ins):
# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>" (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"
# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
- name: migrations
title: "🗃️ DB migrations"
focus: "Review schema migrations for destructive ops, missing indexes, table locks."
Dynamic selection (auto): set GADFLY_SPECIALISTS: auto and a selector model reads the
changed files + PR description and picks only the lenses that materially apply (and may invent
an ad-hoc one — e.g. a "migrations" lens for a schema change). The selector is
GADFLY_SELECTOR_MODEL if set (a cheap tier is ideal), else the review model. Capped and
de-duplicated; falls back to the default suite if selection fails.
Worker-tier delegation: set GADFLY_WORKER_MODEL (a cheap/fast model) to give every
reviewer a delegate_investigation tool — it offloads mechanical legwork (trace all callers,
gather every usage, check a pattern across files) to a worker sub-agent that returns a concise,
evidence-cited digest, so the expensive model reasons over summaries instead of raw file dumps.
Unset = no delegation (current behavior).
Cost: each specialist is its own review+recheck, so cost ≈ specialists × models × 2. The default suite runs on a single model. Trim with
GADFLY_SPECIALISTS, letautopick only what a diff needs, and point heavy legwork at a cheapGADFLY_WORKER_MODEL.
Concurrency (per-provider lanes)
With multiple models, each provider is its own lane and lanes run in parallel, so a fast
cloud provider isn't stuck behind a slow local box. Within a lane, at most cap models run at
once — cap comes from GADFLY_PROVIDER_CONCURRENCY (a provider=N map) else GADFLY_CONCURRENCY
(default 1). The timeout is per-lens (GADFLY_TIMEOUT_SECS), so a slow model on one lens
can't starve the others.
# One local box (serial — it serves one model at a time) + 3 cloud reviews at once,
# both lanes running concurrently:
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,m1pro=1"
GADFLY_MODELS: "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
A model's provider is the spec's first segment (m1pro/… → m1pro), or GADFLY_PROVIDER/
ollama-cloud for a bare id. Default (cap 1) keeps a single-provider pool fully sequential.
Lens fan-out (within a model). By default the specialist lenses run sequentially inside
each model (GADFLY_LENS_CONCURRENCY=1). Raise it to overlap the independent per-lens
review+recheck passes — the model then posts its consolidated comment as soon as its lenses
finish (so with sequential models, results stream in per model and per-model timings stay
clean). Like the model cap, it's per-provider configurable: GADFLY_PROVIDER_LENS_CONCURRENCY
takes a provider=N map keyed by the same provider lanes as GADFLY_PROVIDER_CONCURRENCY,
falling back to the GADFLY_LENS_CONCURRENCY scalar (default 1). It multiplies with the
model cap: total in-flight requests ≈ models-at-once × lenses-at-once, so to fan lenses out
without oversubscribing a backend, keep its model cap low and raise its lens cap:
# Per provider: cloud runs one model at a time but fans its 3 lenses out (3 concurrent requests);
# the slow local box stays fully serial. Both provider lanes still run in parallel.
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=1,m1=1"
GADFLY_PROVIDER_LENS_CONCURRENCY: "ollama-cloud=3,m1=1"
GADFLY_SPECIALISTS: "security,correctness,error-handling"
Triggers
- A new/reopened/ready non-draft PR — automatic.
- Commenting
@gadfly reviewon a PR — re-review on demand (gated to allowed users). - workflow_dispatch — manual, with a
pr_numberinput.
(Pushing new commits does not auto-re-review — comment @gadfly review after pushing
fixes. This keeps usage down.)
Comment trigger needs the workflow on your default branch. Gitea runs
issue_commentworkflows from the default branch, so@gadfly reviewonly works once this stub is merged tomain(thepull_requestauto-trigger works from the PR branch immediately).Security: the example stubs gate the comment trigger with a job-level
if: github.event_name != 'issue_comment' || github.actor == '<you>'so an untrusted commenter can't start a secret-bearing run — edit it to your maintainers and keep it in sync withGADFLY_ALLOWED_USERS(the in-container check).@gadfly reviewis plain-text matched (configurable viaGADFLY_TRIGGER_PHRASE), so no bot account is required; comments post asgitea-actions.
How it's packaged
cmd/gadfly/ the agentic reviewer binary (majordomo + Ollama Cloud); zero deps beyond stdlib + majordomo
scripts/run.sh fetches the PR diff, runs the reviewer, upserts one labeled comment
scripts/system-prompt.txt the reviewer persona + verification discipline
entrypoint.sh the container brains: trigger gating, clone, model loop (logic lives here, not in YAML)
Dockerfile multi-stage; build-time module creds (BuildKit secrets) never reach the final image
.gitea/workflows/build-image.yml push to main → :latest; tag v* → :<tag> + :latest
examples/ the ~15-line stub a consuming repo drops in
The image is published to gitea.stevedudenhoeffer.com/steve/gadfly. Every push to main
rebuilds and republishes :latest (plus :sha-<short>); pushing a v* tag publishes that
pinned version (plus :latest). Pin consumers to a :vN tag for stability, or track
:latest to ride main.
Configuration (advanced)
The reviewer binary reads these (the stub/entrypoint set sane defaults):
| Env | Default | Meaning |
|---|---|---|
GADFLY_MODEL |
— | model id, or provider/model spec, or majordomo alias/chain |
GADFLY_PROVIDER |
ollama-cloud |
provider prefix for a bare model id |
GADFLY_BASE_URL |
— | override endpoint (OpenAI/Ollama-compatible servers) |
GADFLY_API_KEY |
— | provider key; falls back to the provider's standard env |
GADFLY_SPECIALISTS |
default suite | csv of lenses, all, or auto (dynamic selection) |
GADFLY_SELECTOR_MODEL |
review model | model that picks lenses in auto mode |
GADFLY_WORKER_MODEL |
— | cheap model for delegate_investigation; unset = no delegation |
GADFLY_WORKER_MAX_STEPS |
8 | tool-step cap for a delegated worker run |
GADFLY_CONCURRENCY |
1 | default max models run at once per provider |
GADFLY_PROVIDER_CONCURRENCY |
— | per-provider overrides, e.g. ollama-cloud=3,m1pro=1 |
GADFLY_LENS_CONCURRENCY |
1 | specialist lenses run at once within a model (× model cap = total in-flight) |
GADFLY_PROVIDER_LENS_CONCURRENCY |
— | per-provider lens overrides, same lanes as GADFLY_PROVIDER_CONCURRENCY, e.g. ollama-cloud=3,m1=1 |
GADFLY_MAX_STEPS |
24 | review-pass tool-step cap |
GADFLY_TIMEOUT_SECS |
300 | deadline per specialist lens (review+recheck) |
GADFLY_RECHECK |
on | set 0/false to skip the recheck pass |
GADFLY_RECHECK_MAX_STEPS |
16 | recheck-pass step cap |
GADFLY_MAX_DIFF_CHARS |
60000 | diff chars embedded in the prompt (full diff via get_diff) |
GADFLY_TRIGGER_PHRASE |
@gadfly review |
comment phrase that re-triggers |
GADFLY_ALLOWED_USERS |
(collaborators) | comma-separated allow-list for comment triggers |
GADFLY_FINDINGS_URL |
— | gadfly-reports store base URL; set to enable findings telemetry (off when empty) |
GADFLY_FINDINGS_TOKEN |
— | bearer token for the gadfly-reports store (sent as Authorization: Bearer …) |
GADFLY_REPO |
(from GITEA_API) |
owner/repo slug stamped on emitted runs/findings (set by entrypoint.sh) |
GADFLY_PR |
(from event) | PR number stamped on emitted runs/findings (set by entrypoint.sh) |
Findings telemetry (optional)
Gadfly can record what it found so model quality can be tracked over time. It is
off by default and purely advisory: set GADFLY_FINDINGS_URL to a
gadfly-reports store base URL and,
after each review, the binary best-effort POSTs the run (/runs) and the
findings it surfaced (/reports) to that store. Add GADFLY_FINDINGS_TOKEN
to send an Authorization: Bearer … header. entrypoint.sh supplies the run
context (GADFLY_REPO, GADFLY_PR) automatically.
Findings are extracted heuristically from each lens's markdown — a path:line
reference anchors a finding, titled by the nearest preceding heading / numbered
item / bold lead-in. The emit is strictly best-effort: a short (~10s) timeout,
any error (or a non-2xx response) is logged to stderr only, and it never
changes the review output or the exit code.
Building locally
go build ./cmd/gadfly # needs read access to the private majordomo module
go test ./...
License
MIT — see LICENSE.