Steve Dudenhoeffer 49f3623204
fix: per-lens timeout, errored-verdict honesty, accurate provider label, tighter lens focus, run timing
Five fixes, several surfaced by the live bake-off:

- PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist
  (own context), not shared across the suite. A slow model (e.g. a 35B local MLX)
  was exhausting the whole 600s budget on lens 1, leaving the rest "step 0:
  context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go.
- ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean".
  Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses
  errored"); the section reads "⚠️ could not complete". consolidate.go.
- PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the
  spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default
  (was wrongly "ollama-cloud" for local models). scripts/run.sh.
- LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens
  stays in its lane, says "nothing in my area" rather than re-reporting another
  lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive-
  constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go.
- RUN TIMING: run.sh posts a " Reviewing…" placeholder at model start and updates
  it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 20:15:40 -04:00

🪰 Gadfly

An AI gadfly for your pull requests. Gadfly is an adversarial code reviewer that runs in Gitea Actions: on every PR it reads your actual repository, hunts for real problems, verifies them against the code, and posts its findings as a comment. It does not praise your code. A gadfly does not let things slide.

🤖 Heads up: this is a vibe-coded project

Gadfly was built almost entirely by an AI agent (Claude Code), prompts and all — the reviewer's "brain" is a language model, and so was most of the author. It works and it's tested, but treat it accordingly: it is advisory only, it never blocks a merge, and you should still review its reviews. Issues and PRs welcome; expect the occasional AI-flavored rough edge.

What makes it different

Most LLM "review my diff" bots read the diff in isolation and hallucinate problems they can't actually see — a "missing import" that's three lines above the hunk, a "broken caller" in a file they never opened. Gadfly is agentic: the model has read-only tools over the checked-out repo and is required to use them before reporting anything.

  • Tools: read_file, list_dir, grep, find_files, get_diff.
  • Verify-before-claiming discipline: baked into the system prompt — open the file, grep the symbol, or drop the finding.
  • Two passes: a review pass drafts findings, then an adversarial recheck pass independently re-verifies each one against the code and drops the ones it can't confirm, recomputing the verdict. This is what kills "confident but wrong."
  • Semantic-bug hunting: it's told not to trust a plausible-looking constant, conversion factor, or formula — re-derive the expected value, because that's where real bugs hide.

Every review leads with a one-line verdict: No material issues found, Minor issues, or Blocking issues found.
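
The two-pass flow above can be sketched roughly as follows. This is a minimal illustration, not Gadfly's actual code: the Finding type, the recheck and verdict functions, and the verify callback are all hypothetical names, and the real confirmation step is a second model pass with repo tools rather than a simple predicate.

```go
package main

import "fmt"

// Finding is a hypothetical shape for one drafted review finding.
type Finding struct {
	File     string
	Claim    string
	Blocking bool
}

// recheck keeps only the findings that verify confirms. Here verify is a
// stand-in for the adversarial recheck pass (an assumption of this sketch).
func recheck(drafts []Finding, verify func(Finding) bool) []Finding {
	kept := make([]Finding, 0, len(drafts))
	for _, f := range drafts {
		if verify(f) {
			kept = append(kept, f)
		}
	}
	return kept
}

// verdict recomputes the one-line verdict from the surviving findings.
func verdict(findings []Finding) string {
	if len(findings) == 0 {
		return "No material issues found"
	}
	for _, f := range findings {
		if f.Blocking {
			return "Blocking issues found"
		}
	}
	return "Minor issues"
}

func main() {
	drafts := []Finding{
		{File: "api.go", Claim: "missing import", Blocking: true},
		{File: "units.go", Claim: "wrong conversion factor"},
	}
	// Pretend the recheck could only confirm the second finding.
	confirmed := recheck(drafts, func(f Finding) bool { return f.File == "units.go" })
	fmt.Println(len(confirmed), verdict(confirmed))
}
```

The point of the sketch is that the verdict is derived from what survives the recheck, so an unconfirmed "blocking" draft cannot escalate the final result.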

Turn it on for a repo

Gadfly ships as a container image, so consuming repos don't build anything — they just run it. Drop one file in your repo and set a couple of secrets/vars:

  1. Copy a stub from examples/ to .gitea/workflows/adversarial-review.yml in your repo — adversarial-review.yml for the Ollama Cloud default, or a provider-specific one (local Ollama, OpenAI-compatible, endpoint aliases). See the examples index.
  2. Add repo config:
    • secret OLLAMA_CLOUD_API_KEY — your Ollama Cloud key (empty ⇒ Gadfly posts a harmless "not configured" notice instead of reviewing). Not needed if you point Gadfly at a different provider — see Models & providers.
    • var OLLAMA_REVIEW_MODELS (optional) — comma-separated model ids (default qwen3-coder:480b-cloud,gpt-oss:120b-cloud). One comment per model.
    • var GADFLY_ALLOWED_USERS (optional) — who may re-trigger via comment; empty ⇒ any repo collaborator.

GITEA_TOKEN is provided automatically by Actions; comments post as the gitea-actions user, scoped to that repo — no bot account needed.

Models & providers

Gadfly is built on majordomo, so the reviewer model is not hard-wired — it can target anything majordomo supports. Pick a provider by setting GADFLY_PROVIDER (used to prefix bare model ids); point at a custom endpoint with GADFLY_BASE_URL; supply a key with GADFLY_API_KEY or the provider's standard env var. A GADFLY_MODEL/GADFLY_MODELS value that already contains a provider/ prefix (or is a majordomo failover chain / alias) is used verbatim.

| Provider | GADFLY_PROVIDER | Key env | Status |
| --- | --- | --- | --- |
| Ollama Cloud (default) | ollama-cloud | OLLAMA_API_KEY / OLLAMA_CLOUD_API_KEY | in active use |
| Local Ollama | ollama | none (OLLAMA_HOST or GADFLY_BASE_URL for a remote daemon) | tested |
| OpenAI-compatible (incl. local Ollama's /v1) | openai + GADFLY_BASE_URL | OPENAI_API_KEY (any non-empty for Ollama) | tested against Ollama |
| OpenAI | openai | OPENAI_API_KEY | ⚠️ wired, untested |
| Anthropic | anthropic | ANTHROPIC_API_KEY | ⚠️ wired, untested |
| Google (Gemini) | google | GOOGLE_API_KEY / GEMINI_API_KEY | ⚠️ wired, untested |
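
As a rough illustration of the prefixing rule described above (bare ids get the GADFLY_PROVIDER prefix, anything already qualified is used verbatim), here is a small sketch. The resolveSpec function and the aliases set are hypothetical; how Gadfly and majordomo actually detect aliases and chains is not shown in this README.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveSpec sketches the rule: a value that already names a provider
// (contains "/", which also covers failover chains) or is a known alias is
// used verbatim; a bare model id is prefixed with the provider.
// The aliases set is an assumption of this sketch.
func resolveSpec(model, provider string, aliases map[string]bool) string {
	if strings.Contains(model, "/") || aliases[model] {
		return model
	}
	return provider + "/" + model
}

func main() {
	aliases := map[string]bool{"fast": true}
	for _, m := range []string{"gpt-oss:120b-cloud", "bigbox/qwen2.5-coder:7b", "fast"} {
		fmt.Println(resolveSpec(m, "ollama-cloud", aliases))
	}
}
```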

🧪 Honest status

Only the Ollama paths above are actually exercised. The OpenAI / Anthropic / Google providers come "for free" from majordomo's abstraction and should work, but I haven't spent money verifying them — treat them as untested. The OpenAI-compatible path is tested, because you can point it at a local Ollama (GADFLY_BASE_URL=http://localhost:11434/v1) and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you try a cloud provider and it works (or doesn't), please open an issue.

Endpoint aliases via env vars

For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by name with env vars and then reference name/model in GADFLY_MODEL/GADFLY_MODELS:

# http-capable (Gadfly-native) — base URL used verbatim, so plaintext LAN works:
GADFLY_ENDPOINT_BIGBOX="ollama|http://192.168.1.50:11434"
GADFLY_ENDPOINT_GPU="openai|http://gpu.lan:8000/v1|sk-local"
GADFLY_MODELS="bigbox/qwen2.5-coder:7b,gpu/llama3.1"

# pure spec alias (a model, or a failover chain):
GADFLY_ALIAS_FAST="bigbox/qwen2.5-coder:7b,ollama-cloud/gpt-oss:120b-cloud"
GADFLY_MODEL="fast"

<NAME> is lowercased to form the registry name (GADFLY_ENDPOINT_BIGBOX → bigbox). This is the same idea as majordomo's built-in LLM_* env DSNs (LLM_BIGBOX=ollama://tok@host), which Gadfly also honors, but those are HTTPS-only, so for plaintext local Ollama use GADFLY_ENDPOINT_* instead.
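
To make the value format concrete, the following sketch parses one GADFLY_ENDPOINT_<NAME> variable of the form shown in the examples (provider, base URL, optional key, separated by "|"). The Endpoint type and parseEndpoint function are hypothetical, and the validation rules are assumptions rather than Gadfly's documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// Endpoint is a hypothetical holder for one named backend.
type Endpoint struct {
	Name, Provider, BaseURL, APIKey string
}

const endpointPrefix = "GADFLY_ENDPOINT_"

// parseEndpoint splits "provider|base-url[|key]" and lowercases the <NAME>
// suffix of the variable to form the registry name.
func parseEndpoint(envName, value string) (Endpoint, error) {
	if !strings.HasPrefix(envName, endpointPrefix) {
		return Endpoint{}, fmt.Errorf("%s is not an endpoint variable", envName)
	}
	name := strings.ToLower(strings.TrimPrefix(envName, endpointPrefix))
	parts := strings.SplitN(value, "|", 3)
	if name == "" || len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return Endpoint{}, fmt.Errorf("%s: want provider|base-url[|key]", envName)
	}
	ep := Endpoint{Name: name, Provider: parts[0], BaseURL: parts[1]}
	if len(parts) == 3 {
		ep.APIKey = parts[2] // the key is optional, e.g. for a local Ollama
	}
	return ep, nil
}

func main() {
	ep, err := parseEndpoint("GADFLY_ENDPOINT_GPU", "openai|http://gpu.lan:8000/v1|sk-local")
	fmt.Println(ep.Name, ep.Provider, ep.BaseURL, ep.APIKey, err)
}
```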

Gitea Actions note: repo vars/secrets aren't auto-exposed as env — add each alias to the stub workflow's env: block, e.g. GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}.

Specialists (the review swarm)

Instead of one generic reviewer, Gadfly runs a suite of specialists — each a focused lens with its own review (+recheck) pass — and merges them into one comment, a collapsible section per lens, led by an overall verdict (the worst across lenses; the optional improvements lens never escalates it).
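
One way to picture the "worst across lenses" rule, with the improvements lens excluded from escalation, is the sketch below. The Verdict type and overall function are hypothetical names chosen for illustration; only the three verdict strings come from this README.

```go
package main

import "fmt"

// Verdict is ordered so that a larger value is worse.
type Verdict int

const (
	Clean Verdict = iota
	Minor
	Blocking
)

func (v Verdict) String() string {
	switch v {
	case Blocking:
		return "Blocking issues found"
	case Minor:
		return "Minor issues"
	default:
		return "No material issues found"
	}
}

// overall returns the worst verdict across lenses, skipping the optional
// "improvements" lens so that it never escalates the result.
func overall(lenses map[string]Verdict) Verdict {
	worst := Clean
	for name, v := range lenses {
		if name == "improvements" {
			continue
		}
		if v > worst {
			worst = v
		}
	}
	return worst
}

func main() {
	fmt.Println(overall(map[string]Verdict{
		"security":     Clean,
		"correctness":  Minor,
		"improvements": Blocking, // ignored for the overall verdict
	}))
}
```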

Default suite (when nothing is configured): security, correctness, maintainability (code cleanliness), performance, error-handling.

Also built in (opt-in by name): tests, docs, conventions, and improvements (strict and quiet: at most 1–2 high-value, non-blocking suggestions, silent otherwise).

Select which run with GADFLY_SPECIALISTS (comma-separated names, or all):

GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"

Define your own — two ways, which compose (env overrides file overrides built-ins):

# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>"  (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"
# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
  - name: migrations
    title: "🗃️ DB migrations"
    focus: "Review schema migrations for destructive ops, missing indexes, table locks."
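
The precedence stated above (env overrides file overrides built-ins) amounts to layering three name-to-focus maps in order. The sketch below shows that idea only; mergeSpecialists and the sample focus strings for the built-in and env layers are hypothetical, not taken from Gadfly's source.

```go
package main

import "fmt"

// mergeSpecialists layers focus strings by lens name: built-ins first, then
// the repo file, then env, so a later layer wins on a name clash.
func mergeSpecialists(builtin, file, env map[string]string) map[string]string {
	merged := make(map[string]string, len(builtin)+len(file)+len(env))
	for _, layer := range []map[string]string{builtin, file, env} {
		for name, focus := range layer {
			merged[name] = focus
		}
	}
	return merged
}

func main() {
	builtin := map[string]string{"security": "built-in security focus"}
	file := map[string]string{"migrations": "Review schema migrations for destructive ops, missing indexes, table locks."}
	env := map[string]string{"migrations": "Review DB migrations for destructive or unindexed changes."}

	merged := mergeSpecialists(builtin, file, env)
	fmt.Println(merged["migrations"]) // the env definition wins over the file
	fmt.Println(merged["security"])   // untouched built-in
}
```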

Dynamic selection (auto): set GADFLY_SPECIALISTS: auto and a selector model reads the changed files + PR description and picks only the lenses that materially apply (and may invent an ad-hoc one — e.g. a "migrations" lens for a schema change). The selector is GADFLY_SELECTOR_MODEL if set (a cheap tier is ideal), else the review model. Capped and de-duplicated; falls back to the default suite if selection fails.

Worker-tier delegation: set GADFLY_WORKER_MODEL (a cheap/fast model) to give every reviewer a delegate_investigation tool — it offloads mechanical legwork (trace all callers, gather every usage, check a pattern across files) to a worker sub-agent that returns a concise, evidence-cited digest, so the expensive model reasons over summaries instead of raw file dumps. Unset = no delegation (current behavior).

Cost: each specialist is its own review+recheck, so cost ≈ specialists × models × 2. The default suite runs on a single model. Trim with GADFLY_SPECIALISTS, let auto pick only what a diff needs, and point heavy legwork at a cheap GADFLY_WORKER_MODEL.
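
The cost estimate above is simple arithmetic, and a tiny sketch makes the knobs explicit. The reviewPasses function is a hypothetical helper; it counts only review and recheck passes and ignores any selector or worker-model calls.

```go
package main

import "fmt"

// reviewPasses estimates how many model passes one PR review needs:
// specialists × models × 2, or × 1 when the recheck pass is skipped.
func reviewPasses(specialists, models int, recheck bool) int {
	passesPerLens := 1
	if recheck {
		passesPerLens = 2
	}
	return specialists * models * passesPerLens
}

func main() {
	fmt.Println(reviewPasses(5, 1, true))  // five-lens default suite, one model
	fmt.Println(reviewPasses(5, 2, true))  // same suite, two models
	fmt.Println(reviewPasses(2, 1, false)) // trimmed suite with GADFLY_RECHECK off
}
```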

Triggers

  1. A new/reopened/ready non-draft PR — automatic.
  2. Commenting @gadfly review on a PR — re-review on demand (gated to allowed users).
  3. workflow_dispatch — manual, with a pr_number input.

(Pushing new commits does not auto-re-review — comment @gadfly review after pushing fixes. This keeps usage down.)
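
The comment-trigger gate can be sketched as below. This is an illustration under stated assumptions: mayRetrigger is a hypothetical function, the phrase is matched as a plain substring, usernames are compared exactly, and the collaborator check is passed in as a boolean because how Gadfly queries Gitea for it is not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// mayRetrigger reports whether a PR comment should start a re-review.
// The comment must contain the trigger phrase, and the commenter must be on
// the allow-list or, when the list is empty, be a repo collaborator.
func mayRetrigger(comment, user, phrase, allowedCSV string, isCollaborator bool) bool {
	if !strings.Contains(comment, phrase) {
		return false
	}
	if strings.TrimSpace(allowedCSV) == "" {
		return isCollaborator
	}
	for _, u := range strings.Split(allowedCSV, ",") {
		if strings.TrimSpace(u) == user {
			return true
		}
	}
	return false
}

func main() {
	const phrase = "@gadfly review"
	fmt.Println(mayRetrigger("fixed, @gadfly review please", "alice", phrase, "alice, bob", false))
	fmt.Println(mayRetrigger("@gadfly review", "mallory", phrase, "alice, bob", true))
	fmt.Println(mayRetrigger("@gadfly review", "carol", phrase, "", true))
}
```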

How it's packaged

cmd/gadfly/            the agentic reviewer binary (majordomo + Ollama Cloud); zero deps beyond stdlib + majordomo
scripts/run.sh         fetches the PR diff, runs the reviewer, upserts one labeled comment
scripts/system-prompt.txt  the reviewer persona + verification discipline
entrypoint.sh          the container brains: trigger gating, clone, model loop (logic lives here, not in YAML)
Dockerfile             multi-stage; build-time module creds (BuildKit secrets) never reach the final image
.gitea/workflows/build-image.yml   push to main → :latest; tag v* → :<tag> + :latest
examples/              the ~15-line stub a consuming repo drops in

The image is published to gitea.stevedudenhoeffer.com/steve/gadfly. Every push to main rebuilds and republishes :latest (plus :sha-<short>); pushing a v* tag publishes that pinned version (plus :latest). Pin consumers to a :vN tag for stability, or track :latest to ride main.

Configuration (advanced)

The reviewer binary reads these (the stub/entrypoint set sane defaults):

| Env | Default | Meaning |
| --- | --- | --- |
| GADFLY_MODEL | | model id, or provider/model spec, or majordomo alias/chain |
| GADFLY_PROVIDER | ollama-cloud | provider prefix for a bare model id |
| GADFLY_BASE_URL | | override endpoint (OpenAI/Ollama-compatible servers) |
| GADFLY_API_KEY | | provider key; falls back to the provider's standard env |
| GADFLY_SPECIALISTS | default suite | csv of lenses, all, or auto (dynamic selection) |
| GADFLY_SELECTOR_MODEL | review model | model that picks lenses in auto mode |
| GADFLY_WORKER_MODEL | | cheap model for delegate_investigation; unset = no delegation |
| GADFLY_WORKER_MAX_STEPS | 8 | tool-step cap for a delegated worker run |
| GADFLY_MAX_STEPS | 24 | review-pass tool-step cap |
| GADFLY_RECHECK | on | set 0/false to skip the recheck pass |
| GADFLY_RECHECK_MAX_STEPS | 16 | recheck-pass step cap |
| GADFLY_TIMEOUT_SECS | 300 | per-lens deadline (both passes of that lens), not shared across the suite |
| GADFLY_MAX_DIFF_CHARS | 60000 | diff chars embedded in the prompt (full diff via get_diff) |
| GADFLY_TRIGGER_PHRASE | @gadfly review | comment phrase that re-triggers |
| GADFLY_ALLOWED_USERS | (collaborators) | comma-separated allow-list for comment triggers |

Building locally

go build ./cmd/gadfly      # needs read access to the private majordomo module
go test ./...

License

MIT — see LICENSE.

Description
🪰 Agentic adversarial code reviewer for Gitea Actions — hunts real bugs across a swarm of specialist lenses, verifies each against the checked-out repo, and posts one advisory PR comment. Provider-agnostic via majordomo. Advisory only, never blocks a merge.