T

Build & push image / build-and-push (pull_request) Successful in 6s

Details

docs: reconcile examples/README + CLAUDE.md with the heavier reusable default

From PR #10's own review (maintainability/perf lenses): examples/README.md
hadn't been updated for the default swarm, and CLAUDE.md's 'keep the default
model count low' cost guidance read as contradicting the new heavy default.
Clarify that the IMAGE default stays minimal while the REUSABLE ships an
opinionated heavier default consumers inherit/override.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-27 22:18:43 -04:00

.gitea/workflows

tune(reusable): claude-code=1 model x 5 lenses (cap peak at 5 concurrent)

2026-06-27 22:12:17 -04:00

cmd/gadfly

feat: bump majordomo + support llama-swap(s) provider spellings (#7 )

2026-06-27 23:18:56 +00:00

examples

docs: reconcile examples/README + CLAUDE.md with the heavier reusable default

2026-06-27 22:18:43 -04:00

scripts

feat: claude-code reviewer engine (#2 )

2026-06-27 20:40:41 +00:00

.dockerignore

Gadfly: agentic adversarial PR reviewer (initial extraction)

2026-06-25 18:42:20 -04:00

.gitignore

Gadfly: agentic adversarial PR reviewer (initial extraction)

2026-06-25 18:42:20 -04:00

CLAUDE.md

docs: reconcile examples/README + CLAUDE.md with the heavier reusable default

2026-06-27 22:18:43 -04:00

Dockerfile

feat: claude-code reviewer engine (#2 )

2026-06-27 20:40:41 +00:00

entrypoint.sh

feat: claude-code reviewer engine (#2 )

2026-06-27 20:40:41 +00:00

go.mod

feat: bump majordomo + support llama-swap(s) provider spellings (#7 )

2026-06-27 23:18:56 +00:00

go.sum

feat: bump majordomo + support llama-swap(s) provider spellings (#7 )

2026-06-27 23:18:56 +00:00

LICENSE

Gadfly: agentic adversarial PR reviewer (initial extraction)

2026-06-25 18:42:20 -04:00

README.md

feat(reusable): ship the curated swarm as the default config consumers inherit

2026-06-27 22:05:31 -04:00

README.md

🪰 Gadfly

An AI gadfly for your pull requests. Gadfly is an adversarial code reviewer that runs in Gitea Actions: on every PR it reads your actual repository, hunts for real problems, verifies them against the code, and posts its findings as a comment. It does not praise your code. A gadfly does not let things slide.

🤖 Heads up: this is a vibe-coded project

Gadfly was built almost entirely by an AI agent (Claude Code), prompts and all — the reviewer's "brain" is a language model, and so was most of the author. It works and it's tested, but treat it accordingly: it is advisory only, it never blocks a merge, and you should still review its reviews. Issues and PRs welcome; expect the occasional AI-flavored rough edge.

What makes it different

Most LLM "review my diff" bots read the diff in isolation and hallucinate problems they can't actually see — a "missing import" that's three lines above the hunk, a "broken caller" in a file they never opened. Gadfly is agentic: the model has read-only tools over the checked-out repo and is required to use them before reporting anything.

Tools: read_file, list_dir, grep, find_files, get_diff.
Verify-before-claiming discipline: baked into the system prompt — open the file, grep the symbol, or drop the finding.
Two passes: a review pass drafts findings, then an adversarial recheck pass independently re-verifies each one against the code and drops the ones it can't confirm, recomputing the verdict. This is what kills "confident but wrong."
Semantic-bug hunting: it's told not to trust a plausible-looking constant, conversion factor, or formula — re-derive the expected value, because that's where real bugs hide.

Every review leads with a one-line verdict: No material issues found, Minor issues, or Blocking issues found.

Turn it on for a repo

Gadfly ships as a container image, so consuming repos don't build anything — they just run it. Drop one file in your repo and set a couple of secrets/vars:

Copy a stub from examples/ to .gitea/workflows/adversarial-review.yml in your repo. Two flavors: the slim reusable.yml — a tiny caller of Gadfly's reusable workflow (uses: steve/gadfly/.gitea/workflows/review-reusable.yml@…, forwarding only the secrets the reviewer needs), which ships a default swarm (3 cloud models + the Claude Code engine, 5-lens suite) you inherit by omitting with: or override per-input — or the full self-contained adversarial-review.yml (Ollama Cloud default, with inline notes for every provider / local Ollama / OpenAI-compatible / endpoint aliases). See the examples index.
Add repo config:
- secret OLLAMA_CLOUD_API_KEY — your Ollama Cloud key (empty ⇒ Gadfly posts a harmless "not configured" notice instead of reviewing). Not needed if you point Gadfly at a different provider — see Models & providers.
- var OLLAMA_REVIEW_MODELS (optional) — comma-separated model ids (default qwen3-coder:480b-cloud,gpt-oss:120b-cloud). One comment per model.
- var GADFLY_ALLOWED_USERS (optional) — who may re-trigger via comment; empty ⇒ any repo collaborator.

GITEA_TOKEN is provided automatically by Actions; comments post as the gitea-actions user, scoped to that repo — no bot account needed.

Models & providers

Gadfly is built on majordomo, so the reviewer model is not hard-wired — it can target anything majordomo supports. Pick a provider by setting GADFLY_PROVIDER (used to prefix bare model ids); point at a custom endpoint with GADFLY_BASE_URL; supply a key with GADFLY_API_KEY or the provider's standard env var. A GADFLY_MODEL/GADFLY_MODELS value that already contains a provider/ prefix (or is a majordomo failover chain / alias) is used verbatim.

Provider	`GADFLY_PROVIDER`	Key env	Status
Ollama Cloud (default)	`ollama-cloud`	`OLLAMA_API_KEY` / `OLLAMA_CLOUD_API_KEY`	✅ in active use
Local Ollama	`ollama`	none (`OLLAMA_HOST` or `GADFLY_BASE_URL` for a remote daemon)	✅ tested
foreman (native-Ollama queue daemon)	`foreman` + `GADFLY_BASE_URL`, or a `GADFLY_ENDPOINT_` / `LLM_` `foreman://` entry	optional bearer (via the endpoint/DSN)	✅ native-Ollama path
llama-swap (model-swapping proxy)	`llama-swap`/`llama-swaps` (un-hyphenated `llamaswap`/`llamaswaps` also accepted) + `GADFLY_BASE_URL` or a `GADFLY_ENDPOINT_` entry, or an `LLM_` `llama-swap://` / `llama-swaps://` DSN	optional bearer	⚠️ wired, untested
OpenAI-compatible (incl. local Ollama's `/v1`)	`openai` + `GADFLY_BASE_URL`	`OPENAI_API_KEY` (any non-empty for Ollama)	✅ tested against Ollama
OpenAI	`openai`	`OPENAI_API_KEY`	⚠️ wired, untested
Anthropic	`anthropic`	`ANTHROPIC_API_KEY`	⚠️ wired, untested
Google (Gemini)	`google`	`GOOGLE_API_KEY` / `GEMINI_API_KEY`	⚠️ wired, untested

🧪 Honest status

Only the Ollama paths above are actually exercised. The OpenAI / Anthropic / Google providers come "for free" from majordomo's abstraction and should work, but I haven't spent money verifying them — treat them as untested. The OpenAI-compatible path is tested, because you can point it at a local Ollama (GADFLY_BASE_URL=http://localhost:11434/v1) and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you try a cloud provider and it works (or doesn't), please open an issue.

Claude Code engine (`claude-code`)

Besides the majordomo model loop, Gadfly can review through the Claude Code CLI: for each lens it shells out to claude -p inside the checked-out repo, so Claude Code uses its own read tools (Read/Grep/Glob) to verify findings against real code, then Gadfly parses the result and runs the same verdict-parse → recheck → consolidate → emit pipeline. The CLI is bundled in the image (Node + @anthropic-ai/claude-code).

Select it as a model id — bare claude-code (CLI default model) or claude-code/<model> (the suffix becomes --model, e.g. claude-code/sonnet, claude-code/opus). An optional :<thinking> suffix forces an extended-thinking budget for that reviewer — :max (the high "ultrathink" tier) or :<n> for a specific token budget — so you can run the same model at two thinking depths as separate reviewers:

GADFLY_MODELS: "claude-code/sonnet,claude-code/opus,claude-code/opus:max"

The thinking budget is applied via the MAX_THINKING_TOKENS env on the CLI subprocess; it's best-effort (a no-op if the installed CLI build doesn't honor it).

Auth is read from the environment: the default is a Pro/Max subscription via CLAUDE_CODE_OAUTH_TOKEN (from claude setup-token; no --bare), falling back to ANTHROPIC_API_KEY. Don't set both. Tuning knobs (all optional):

Env	Default	Meaning
`GADFLY_CLAUDE_MODEL`	(from the spec suffix)	overrides the `--model` value
`GADFLY_CLAUDE_PERMISSION_MODE`	`plan`	`--permission-mode` (read-only `plan` keeps it from editing)
`GADFLY_CLAUDE_ALLOWED_TOOLS`	(unset)	`--allowedTools` value, passed verbatim (e.g. `Read,Grep,Glob`)
`GADFLY_CLAUDE_EXTRA_ARGS`	(unset)	extra CLI args, whitespace-split (no shell quoting) and appended after the defaults (e.g. `--max-turns 30`)
`GADFLY_CLAUDE_BIN`	`claude`	CLI binary path

These are operator knobs (workflow env), not PR-author input. Because GADFLY_CLAUDE_EXTRA_ARGS is appended after the defaults, it can override the read-only --permission-mode plan (e.g. passing --permission-mode acceptEdits), so keep it read-only unless you mean otherwise. It's whitespace-split, so values can't contain spaces — use GADFLY_CLAUDE_ALLOWED_TOOLS / _PERMISSION_MODE / _MODEL for those. The subprocess runs with a minimal environment (its auth token + PATH/HOME/locale/GADFLY_CLAUDE_*), not the runner's full env, so the Gitea token and provider keys aren't handed to the CLI.

Alternate backends (example only, not validated here). Because the subprocess env forwards ANTHROPIC_* and CLAUDE_*, you can point the same engine at a non-Anthropic backend by setting ANTHROPIC_BASE_URL (and ANTHROPIC_AUTH_TOKEN/ANTHROPIC_API_KEY) to an Anthropic-API-compatible proxy — e.g. claude-code-router or LiteLLM in front of Ollama — to run Ollama models through Claude Code's harness and compare it against the native majordomo loop. Whether tool-use survives a given proxy/backend varies, so this is documented as an example, not wired or tested here.

The Pro/Max path is dogfooded but otherwise lightly tested. claude-code/sonnet now runs on gadfly's own PRs (see .gitea/workflows/adversarial-review.yml), but treat the engine as new — and note that subscription auth in automated CI is a gray area in Anthropic's terms. auto specialist selection and the delegate_investigation worker are majordomo-only and are skipped with this engine (Claude Code does its own legwork).

Endpoint aliases via env vars

For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by name with env vars and then reference name/model in GADFLY_MODEL/GADFLY_MODELS:

# http-capable (Gadfly-native) — base URL used verbatim, so plaintext LAN works:
GADFLY_ENDPOINT_BIGBOX="ollama|http://192.168.1.50:11434"
GADFLY_ENDPOINT_GPU="openai|http://gpu.lan:8000/v1|sk-local"
GADFLY_ENDPOINT_M1="foreman|http://foreman-m1:8080|tok"   # native-Ollama queue daemon
GADFLY_MODELS="bigbox/qwen2.5-coder:7b,gpu/llama3.1,m1/qwen3:14b"

# pure spec alias (a model, or a failover chain):
GADFLY_ALIAS_FAST="bigbox/qwen2.5-coder:7b,ollama-cloud/gpt-oss:120b-cloud"
GADFLY_MODEL="fast"

<NAME> is lowercased to form the registry name (GADFLY_ENDPOINT_BIGBOX → bigbox). This is the same idea as majordomo's built-in LLM_* env DSNs (LLM_BIGBOX=ollama://tok@host, LLM_M1=foreman://tok@host), which Gadfly also honors — but those are HTTPS-only, so for a plaintext local Ollama or http:// foreman use GADFLY_ENDPOINT_* instead.

Gitea Actions note: repo vars/secrets aren't auto-exposed as env — add each alias to the stub workflow's env: block, e.g. GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}.

Specialists (the review swarm)

Instead of one generic reviewer, Gadfly runs a suite of specialists — each a focused lens with its own review (+recheck) pass — and merges them into one comment, a collapsible section per lens, led by an overall verdict (the worst across lenses; the optional improvements lens never escalates it).

Default suite (when nothing is configured): security, correctness, maintainability (code cleanliness), performance, error-handling.

Also built in (opt-in by name): tests, docs, conventions, and improvements (strict & quiet — at most 1–2 high-value, non-blocking suggestions, silent otherwise).

Select which run with GADFLY_SPECIALISTS (comma-separated names, or all):

GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"

Define your own — two ways, which compose (env overrides file overrides built-ins):

# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>"  (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"

# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
  - name: migrations
    title: "🗃️ DB migrations"
    focus: "Review schema migrations for destructive ops, missing indexes, table locks."

Dynamic selection (auto): set GADFLY_SPECIALISTS: auto and a selector model reads the changed files + PR description and picks only the lenses that materially apply (and may invent an ad-hoc one — e.g. a "migrations" lens for a schema change). The selector is GADFLY_SELECTOR_MODEL if set (a cheap tier is ideal), else the review model. Capped and de-duplicated; falls back to the default suite if selection fails.

Worker-tier delegation: set GADFLY_WORKER_MODEL (a cheap/fast model) to give every reviewer a delegate_investigation tool — it offloads mechanical legwork (trace all callers, gather every usage, check a pattern across files) to a worker sub-agent that returns a concise, evidence-cited digest, so the expensive model reasons over summaries instead of raw file dumps. Unset = no delegation (current behavior).

Cost: each specialist is its own review+recheck, so cost ≈ specialists × models × 2. The default suite runs on a single model. Trim with GADFLY_SPECIALISTS, let auto pick only what a diff needs, and point heavy legwork at a cheap GADFLY_WORKER_MODEL.

Concurrency (per-provider lanes)

With multiple models, each provider is its own lane and lanes run in parallel, so a fast cloud provider isn't stuck behind a slow local box. Within a lane, at most cap models run at once — cap comes from GADFLY_PROVIDER_CONCURRENCY (a provider=N map) else GADFLY_CONCURRENCY (default 1). The timeout is per-lens (GADFLY_TIMEOUT_SECS), so a slow model on one lens can't starve the others.

# One local box (serial — it serves one model at a time) + 3 cloud reviews at once,
# both lanes running concurrently:
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,m1pro=1"
GADFLY_MODELS: "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud"

A model's provider is the spec's first segment (m1pro/… → m1pro), or GADFLY_PROVIDER/ ollama-cloud for a bare id. Default (cap 1) keeps a single-provider pool fully sequential.

Lens fan-out (within a model). By default the specialist lenses run sequentially inside each model (GADFLY_LENS_CONCURRENCY=1). Raise it to overlap the independent per-lens review+recheck passes — the model then posts its consolidated comment as soon as its lenses finish (so with sequential models, results stream in per model and per-model timings stay clean). Like the model cap, it's per-provider configurable: GADFLY_PROVIDER_LENS_CONCURRENCY takes a provider=N map keyed by the same provider lanes as GADFLY_PROVIDER_CONCURRENCY, falling back to the GADFLY_LENS_CONCURRENCY scalar (default 1). It multiplies with the model cap: total in-flight requests ≈ models-at-once × lenses-at-once, so to fan lenses out without oversubscribing a backend, keep its model cap low and raise its lens cap:

# Per provider: cloud runs one model at a time but fans its 3 lenses out (3 concurrent requests);
# the slow local box stays fully serial. Both provider lanes still run in parallel.
GADFLY_PROVIDER_CONCURRENCY:      "ollama-cloud=1,m1=1"
GADFLY_PROVIDER_LENS_CONCURRENCY: "ollama-cloud=3,m1=1"
GADFLY_SPECIALISTS: "security,correctness,error-handling"

Live status board

When several models (each with several lenses) review a PR, the individual findings land in one comment per model — but while that's in flight all you'd see is a row of ⏳ Reviewing… placeholders. So Gadfly also upserts one consolidated status-board comment that aggregates every model's per-lens progress as it happens:

## 🪰 Gadfly — live review status
1/3 reviewers finished · updated 2026-06-27 18:14:56Z

#### `glm-5.2:cloud` · ollama-cloud — ⏳ 2/4 lenses
- ✅ security — No material issues found
- 🔄 correctness — running
- ⏸️ performance — queued
…

Each model process publishes its lenses (queued → running → finished + verdict) to a small JSON file, and a background renderer in entrypoint.sh re-renders + upserts the single comment every GADFLY_STATUS_POLL_SECS (default 12s) until the swarm finishes. It's advisory and best-effort — the per-model findings comments are unaffected — and entirely separate from those. Turn it off with GADFLY_STATUS_BOARD=0.

Triggers

A new/reopened/ready non-draft PR — automatic.
Commenting @gadfly review on a PR — re-review on demand (gated to allowed users).
workflow_dispatch — manual, with a pr_number input.

(Pushing new commits does not auto-re-review — comment @gadfly review after pushing fixes. This keeps usage down.)

Comment trigger needs the workflow on your default branch. Gitea runs issue_comment workflows from the default branch, so @gadfly review only works once this stub is merged to main (the pull_request auto-trigger works from the PR branch immediately).

Security: the example stubs gate the comment trigger with a job-level if: github.event_name != 'issue_comment' || github.actor == '<you>' so an untrusted commenter can't start a secret-bearing run — edit it to your maintainers and keep it in sync with GADFLY_ALLOWED_USERS (the in-container check). @gadfly review is plain-text matched (configurable via GADFLY_TRIGGER_PHRASE), so no bot account is required; comments post as gitea-actions.

How it's packaged

cmd/gadfly/            the agentic reviewer binary (majordomo + Ollama Cloud); zero deps beyond stdlib + majordomo
scripts/run.sh         fetches the PR diff, runs the reviewer, upserts one labeled comment
scripts/status-board.sh    renders + upserts the single live status-board comment (per-lens progress)
scripts/system-prompt.txt  the reviewer persona + verification discipline
entrypoint.sh          the container brains: trigger gating, clone, model loop (logic lives here, not in YAML)
Dockerfile             multi-stage; build-time module creds (BuildKit secrets) never reach the final image
.gitea/workflows/build-image.yml   push to main → :latest; tag v* → :<tag> + :latest
examples/              the ~15-line stub a consuming repo drops in

The image is published to gitea.stevedudenhoeffer.com/steve/gadfly. Every push to main rebuilds and republishes :latest (plus :sha-<short>); pushing a v* tag publishes that pinned version (plus :latest). Pin consumers to a :vN tag for stability, or track :latest to ride main.

Configuration (advanced)

The reviewer binary reads these (the stub/entrypoint set sane defaults):

Env	Default	Meaning
`GADFLY_MODEL`	—	model id, or `provider/model` spec, or majordomo alias/chain
`GADFLY_PROVIDER`	`ollama-cloud`	provider prefix for a bare model id
`GADFLY_BASE_URL`	—	override endpoint (OpenAI/Ollama-compatible servers)
`GADFLY_API_KEY`	—	provider key; falls back to the provider's standard env
`claude-code` model id	—	route a model through the bundled Claude Code CLI (`claude-code` / `claude-code/<model>`); see Claude Code engine for its `GADFLY_CLAUDE_*` knobs
`GADFLY_SPECIALISTS`	default suite	csv of lenses, `all`, or `auto` (dynamic selection)
`GADFLY_SELECTOR_MODEL`	review model	model that picks lenses in `auto` mode
`GADFLY_WORKER_MODEL`	—	cheap model for `delegate_investigation`; unset = no delegation
`GADFLY_WORKER_MAX_STEPS`	8	tool-step cap for a delegated worker run
`GADFLY_CONCURRENCY`	1	default max models run at once per provider
`GADFLY_PROVIDER_CONCURRENCY`	—	per-provider overrides, e.g. `ollama-cloud=3,m1pro=1`
`GADFLY_LENS_CONCURRENCY`	1	specialist lenses run at once within a model (× model cap = total in-flight)
`GADFLY_PROVIDER_LENS_CONCURRENCY`	—	per-provider lens overrides, same lanes as `GADFLY_PROVIDER_CONCURRENCY`, e.g. `ollama-cloud=3,m1=1`
`GADFLY_MAX_STEPS`	24	review-pass tool-step cap
`GADFLY_TIMEOUT_SECS`	300	deadline per specialist lens (review+recheck)
`GADFLY_RECHECK`	on	set `0`/`false` to skip the recheck pass
`GADFLY_RECHECK_MAX_STEPS`	16	recheck-pass step cap
`GADFLY_MAX_DIFF_CHARS`	60000	diff chars embedded in the prompt (full diff via `get_diff`)
`GADFLY_STATUS_BOARD`	on	set `0` to disable the live status-board comment
`GADFLY_STATUS_POLL_SECS`	12	how often the status board re-renders/upserts
`GADFLY_TRIGGER_PHRASE`	`@gadfly review`	comment phrase that re-triggers
`GADFLY_ALLOWED_USERS`	(collaborators)	comma-separated allow-list for comment triggers
`GADFLY_FINDINGS_URL`	—	gadfly-reports store base URL; set to enable findings telemetry (off when empty)
`GADFLY_FINDINGS_TOKEN`	��	bearer token for the gadfly-reports store (sent as `Authorization: Bearer …`)
`GADFLY_REPO`	(from `GITEA_API`)	`owner/repo` slug stamped on emitted runs/findings (set by `entrypoint.sh`)
`GADFLY_PR`	(from event)	PR number stamped on emitted runs/findings (set by `entrypoint.sh`)

Findings telemetry (optional)

Gadfly can record what it found so model quality can be tracked over time. It is off by default and purely advisory: set GADFLY_FINDINGS_URL to a gadfly-reports store base URL and, after each review, the binary best-effort POSTs the run (/runs) and the findings it surfaced (/reports) to that store. Add GADFLY_FINDINGS_TOKEN to send an Authorization: Bearer … header. entrypoint.sh supplies the run context (GADFLY_REPO, GADFLY_PR) automatically.

Findings are extracted heuristically from each lens's markdown — a path:line reference anchors a finding, titled by the nearest preceding heading / numbered item / bold lead-in. A lens whose verdict is "No material issues found" emits no findings: its path:line references are verification notes ("verified X is safe"), not problems, so extracting them would record false positives and unfairly penalize thorough clean-pass reviewers. The emit is strictly best-effort: a short (~10s) timeout, any error (or a non-2xx response) is logged to stderr only, and it never changes the review output or the exit code.

Building locally

go build ./cmd/gadfly      # needs read access to the private majordomo module
go test ./...

License

MIT — see LICENSE.

Description

🪰 Agentic adversarial code reviewer for Gitea Actions — hunts real bugs across a swarm of specialist lenses, verifies each against the checked-out repo, and posts one advisory PR comment. Provider-agnostic via majordomo. Advisory only, never blocks a merge.

Readme MIT 1.1 MiB

README.md Unescape Escape

🪰 Gadfly

🤖 Heads up: this is a vibe-coded project

What makes it different

Turn it on for a repo

Models & providers

🧪 Honest status

Claude Code engine (claude-code)

Endpoint aliases via env vars

Specialists (the review swarm)

Concurrency (per-provider lanes)

Live status board

Triggers

How it's packaged

Configuration (advanced)

Findings telemetry (optional)

Building locally

License

README.md

Claude Code engine (`claude-code`)