feat: bump majordomo + support llama-swap(s) provider spellings (#7)

Bump majordomo to the latest build and accept every llama-swap spelling
(llama-swap/llama-swaps + un-hyphenated llamaswap/llamaswaps) in gadfly's
endpoint switches; the LLM_* llama-swap(s):// DSN path already worked via
majordomo.Parse. README + error messages + endpointProvider alias tests.

Swarm review: 8/9 clean; qwen3-coder's "Blocking" was a false positive
(claimed llamaswap was untested — it has dedicated test cases). Folded in
its one fair nit (README now lists the un-hyphenated aliases).

gofmt clean, go vet quiet, go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>

2026-06-27 23:18:56 +00:00

🪰 Gadfly

An AI gadfly for your pull requests. Gadfly is an adversarial code reviewer that runs in Gitea Actions: on every PR it reads your actual repository, hunts for real problems, verifies them against the code, and posts its findings as a comment. It does not praise your code. A gadfly does not let things slide.

🤖 Heads up: this is a vibe-coded project

Gadfly was built almost entirely by an AI agent (Claude Code), prompts and all — the reviewer's "brain" is a language model, and so was most of the author. It works and it's tested, but treat it accordingly: it is advisory only, it never blocks a merge, and you should still review its reviews. Issues and PRs welcome; expect the occasional AI-flavored rough edge.

What makes it different

Most LLM "review my diff" bots read the diff in isolation and hallucinate problems they can't actually see — a "missing import" that's three lines above the hunk, a "broken caller" in a file they never opened. Gadfly is agentic: the model has read-only tools over the checked-out repo and is required to use them before reporting anything.

- Tools: read_file, list_dir, grep, find_files, get_diff.
- Verify-before-claiming discipline: baked into the system prompt. Open the file, grep the symbol, or drop the finding.
- Two passes: a review pass drafts findings, then an adversarial recheck pass independently re-verifies each one against the code, drops the ones it can't confirm, and recomputes the verdict. This is what kills "confident but wrong."
- Semantic-bug hunting: it's told not to trust a plausible-looking constant, conversion factor, or formula, and to re-derive the expected value instead, because that's where real bugs hide.

Every review leads with a one-line verdict: No material issues found, Minor issues, or Blocking issues found.

Turn it on for a repo

Gadfly ships as a container image, so consuming repos don't build anything — they just run it. Drop one file in your repo and set a couple of secrets/vars:

1. Copy a stub from examples/ to .gitea/workflows/adversarial-review.yml in your repo: adversarial-review.yml for the Ollama Cloud default, or a provider-specific one (local Ollama, OpenAI-compatible, endpoint aliases). See the examples index.
2. Add repo config:
   - secret OLLAMA_CLOUD_API_KEY: your Ollama Cloud key (empty ⇒ Gadfly posts a harmless "not configured" notice instead of reviewing). Not needed if you point Gadfly at a different provider; see Models & providers.
   - var OLLAMA_REVIEW_MODELS (optional): comma-separated model ids (default qwen3-coder:480b-cloud,gpt-oss:120b-cloud). One comment per model.
   - var GADFLY_ALLOWED_USERS (optional): who may re-trigger via comment; empty ⇒ any repo collaborator.

GITEA_TOKEN is provided automatically by Actions; comments post as the gitea-actions user, scoped to that repo — no bot account needed.

Models & providers

Gadfly is built on majordomo, so the reviewer model is not hard-wired — it can target anything majordomo supports. Pick a provider by setting GADFLY_PROVIDER (used to prefix bare model ids); point at a custom endpoint with GADFLY_BASE_URL; supply a key with GADFLY_API_KEY or the provider's standard env var. A GADFLY_MODEL/GADFLY_MODELS value that already contains a provider/ prefix (or is a majordomo failover chain / alias) is used verbatim.

| Provider | `GADFLY_PROVIDER` | Key env | Status |
| --- | --- | --- | --- |
| Ollama Cloud (default) | `ollama-cloud` | `OLLAMA_API_KEY` / `OLLAMA_CLOUD_API_KEY` | ✅ in active use |
| Local Ollama | `ollama` | none (`OLLAMA_HOST` or `GADFLY_BASE_URL` for a remote daemon) | ✅ tested |
| foreman (native-Ollama queue daemon) | `foreman` + `GADFLY_BASE_URL`, or a `GADFLY_ENDPOINT_` / `LLM_` `foreman://` entry | optional bearer (via the endpoint/DSN) | ✅ native-Ollama path |
| llama-swap (model-swapping proxy) | `llama-swap`/`llama-swaps` (un-hyphenated `llamaswap`/`llamaswaps` also accepted) + `GADFLY_BASE_URL` or a `GADFLY_ENDPOINT_` entry, or an `LLM_` `llama-swap://` / `llama-swaps://` DSN | optional bearer | ⚠️ wired, untested |
| OpenAI-compatible (incl. local Ollama's `/v1`) | `openai` + `GADFLY_BASE_URL` | `OPENAI_API_KEY` (any non-empty for Ollama) | ✅ tested against Ollama |
| OpenAI | `openai` | `OPENAI_API_KEY` | ⚠️ wired, untested |
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` | ⚠️ wired, untested |
| Google (Gemini) | `google` | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | ⚠️ wired, untested |
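
The prefixing rule described above (a bare model id gets the `GADFLY_PROVIDER` prefix, a spec that already names a provider is used verbatim) can be sketched roughly as follows. `resolveSpec` is a hypothetical helper written for illustration, not Gadfly's code, and it assumes a spec counts as "bare" when it contains neither `/` nor `,`. A majordomo alias is also slash-free, so real code would have to consult the alias registry first.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveSpec is a hypothetical illustration, not Gadfly's implementation.
// Assumption: a spec is "bare" when it has no "/" (provider prefix) and no
// "," (failover chain). Aliases are not handled by this sketch.
func resolveSpec(model, provider string) string {
	if provider == "" {
		provider = "ollama-cloud" // documented default for GADFLY_PROVIDER
	}
	if strings.Contains(model, "/") || strings.Contains(model, ",") {
		return model // already qualified: use verbatim
	}
	return provider + "/" + model
}

func main() {
	fmt.Println(resolveSpec("gpt-oss:120b-cloud", ""))
	fmt.Println(resolveSpec("gpt-oss:120b-cloud", "ollama"))
	fmt.Println(resolveSpec("bigbox/qwen2.5-coder:7b", "ollama"))
}
```

The last call shows that an endpoint-qualified spec passes through unchanged regardless of the provider setting.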

🧪 Honest status

Only the Ollama paths above are actually exercised. The OpenAI / Anthropic / Google providers come "for free" from majordomo's abstraction and should work, but I haven't spent money verifying them — treat them as untested. The OpenAI-compatible path is tested, because you can point it at a local Ollama (GADFLY_BASE_URL=http://localhost:11434/v1) and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you try a cloud provider and it works (or doesn't), please open an issue.

Claude Code engine (`claude-code`)

Besides the majordomo model loop, Gadfly can review through the Claude Code CLI: for each lens it shells out to claude -p inside the checked-out repo, so Claude Code uses its own read tools (Read/Grep/Glob) to verify findings against real code, then Gadfly parses the result and runs the same verdict-parse → recheck → consolidate → emit pipeline. The CLI is bundled in the image (Node + @anthropic-ai/claude-code).

Select it as a model id — bare claude-code (CLI default model) or claude-code/<model> (the suffix becomes --model, e.g. claude-code/sonnet, claude-code/opus). An optional :<thinking> suffix forces an extended-thinking budget for that reviewer — :max (the high "ultrathink" tier) or :<n> for a specific token budget — so you can run the same model at two thinking depths as separate reviewers:

GADFLY_MODELS: "claude-code/sonnet,claude-code/opus,claude-code/opus:max"

The thinking budget is applied via the MAX_THINKING_TOKENS env on the CLI subprocess; it's best-effort (a no-op if the installed CLI build doesn't honor it).
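
As an illustration of the id grammar above, here is a minimal parsing sketch. The type `claudeSpec` and the function `parseClaudeSpec` are hypothetical names, and the sketch assumes the thinking suffix is whatever follows the last `:`; how Gadfly actually validates the suffix is not shown in this README.

```go
package main

import (
	"fmt"
	"strings"
)

// claudeSpec is a hypothetical type for this sketch.
type claudeSpec struct {
	Model    string // value for --model; "" means the CLI default model
	Thinking string // "max", a token count such as "8000", or "" for none
}

// parseClaudeSpec splits "claude-code[/<model>][:<thinking>]".
// ok is false when the id is not a claude-code spec.
func parseClaudeSpec(id string) (spec claudeSpec, ok bool) {
	const engine = "claude-code"
	if !strings.HasPrefix(id, engine) {
		return claudeSpec{}, false
	}
	rest := id[len(engine):]
	// Peel the optional thinking suffix off the end first.
	if i := strings.LastIndex(rest, ":"); i >= 0 {
		spec.Thinking = rest[i+1:]
		rest = rest[:i]
	}
	switch {
	case rest == "":
		return spec, true // bare "claude-code"
	case strings.HasPrefix(rest, "/") && len(rest) > 1:
		spec.Model = rest[1:]
		return spec, true
	}
	return claudeSpec{}, false // e.g. "claude-codex" or "claude-code/"
}

func main() {
	models := "claude-code/sonnet,claude-code/opus,claude-code/opus:max"
	for _, id := range strings.Split(models, ",") {
		spec, ok := parseClaudeSpec(id)
		fmt.Printf("%-22s ok=%v model=%q thinking=%q\n", id, ok, spec.Model, spec.Thinking)
	}
}
```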

Auth is read from the environment: the default is a Pro/Max subscription via CLAUDE_CODE_OAUTH_TOKEN (from claude setup-token; no --bare), falling back to ANTHROPIC_API_KEY. Don't set both. Tuning knobs (all optional):

| Env | Default | Meaning |
| --- | --- | --- |
| `GADFLY_CLAUDE_MODEL` | (from the spec suffix) | overrides the `--model` value |
| `GADFLY_CLAUDE_PERMISSION_MODE` | `plan` | `--permission-mode` (read-only `plan` keeps it from editing) |
| `GADFLY_CLAUDE_ALLOWED_TOOLS` | (unset) | `--allowedTools` value, passed verbatim (e.g. `Read,Grep,Glob`) |
| `GADFLY_CLAUDE_EXTRA_ARGS` | (unset) | extra CLI args, whitespace-split (no shell quoting) and appended after the defaults (e.g. `--max-turns 30`) |
| `GADFLY_CLAUDE_BIN` | `claude` | CLI binary path |

These are operator knobs (workflow env), not PR-author input. Because GADFLY_CLAUDE_EXTRA_ARGS is appended after the defaults, it can override the read-only --permission-mode plan (e.g. passing --permission-mode acceptEdits), so keep it read-only unless you mean otherwise. It's whitespace-split, so values can't contain spaces — use GADFLY_CLAUDE_ALLOWED_TOOLS / _PERMISSION_MODE / _MODEL for those. The subprocess runs with a minimal environment (its auth token + PATH/HOME/locale/GADFLY_CLAUDE_*), not the runner's full env, so the Gitea token and provider keys aren't handed to the CLI.
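
To make the "whitespace-split, appended after the defaults" behavior concrete, here is a small sketch. `buildArgs` is a hypothetical helper, the order of the default arguments is an assumption, and `--some-flag` is a made-up flag used only to show that quotes are not interpreted.

```go
package main

import (
	"fmt"
	"strings"
)

// buildArgs appends whitespace-split extra args after the defaults.
// Hypothetical helper: the real argument assembly may differ.
func buildArgs(defaults []string, extra string) []string {
	args := append([]string{}, defaults...) // copy so the caller's slice is untouched
	return append(args, strings.Fields(extra)...)
}

func main() {
	// Assumed default argv; "-p" and "--permission-mode plan" come from the text above.
	defaults := []string{"-p", "--permission-mode", "plan"}

	fmt.Printf("%q\n", buildArgs(defaults, "--max-turns 30"))

	// A later --permission-mode lands after the default one, which is why
	// extra args can override the read-only default.
	fmt.Printf("%q\n", buildArgs(defaults, "--permission-mode acceptEdits"))

	// No shell quoting: the quoted value is still split on the space.
	fmt.Printf("%q\n", buildArgs(nil, `--some-flag "two words"`))
}
```

The last line yields three tokens, with the quote characters left inside the second and third, which is why values containing spaces need the dedicated variables instead.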

Alternate backends (example only, not validated here). Because the subprocess env forwards ANTHROPIC_* and CLAUDE_*, you can point the same engine at a non-Anthropic backend by setting ANTHROPIC_BASE_URL (and ANTHROPIC_AUTH_TOKEN/ANTHROPIC_API_KEY) to an Anthropic-API-compatible proxy — e.g. claude-code-router or LiteLLM in front of Ollama — to run Ollama models through Claude Code's harness and compare it against the native majordomo loop. Whether tool-use survives a given proxy/backend varies, so this is documented as an example, not wired or tested here.

The Pro/Max path is dogfooded but otherwise lightly tested. claude-code/sonnet now runs on gadfly's own PRs (see .gitea/workflows/adversarial-review.yml), but treat the engine as new — and note that subscription auth in automated CI is a gray area in Anthropic's terms. auto specialist selection and the delegate_investigation worker are majordomo-only and are skipped with this engine (Claude Code does its own legwork).

Endpoint aliases via env vars

For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by name with env vars and then reference name/model in GADFLY_MODEL/GADFLY_MODELS:

# http-capable (Gadfly-native) — base URL used verbatim, so plaintext LAN works:
GADFLY_ENDPOINT_BIGBOX="ollama|http://192.168.1.50:11434"
GADFLY_ENDPOINT_GPU="openai|http://gpu.lan:8000/v1|sk-local"
GADFLY_ENDPOINT_M1="foreman|http://foreman-m1:8080|tok"   # native-Ollama queue daemon
GADFLY_MODELS="bigbox/qwen2.5-coder:7b,gpu/llama3.1,m1/qwen3:14b"

# pure spec alias (a model, or a failover chain):
GADFLY_ALIAS_FAST="bigbox/qwen2.5-coder:7b,ollama-cloud/gpt-oss:120b-cloud"
GADFLY_MODEL="fast"

<NAME> is lowercased to form the registry name (GADFLY_ENDPOINT_BIGBOX → bigbox). This is the same idea as majordomo's built-in LLM_* env DSNs (LLM_BIGBOX=ollama://tok@host, LLM_M1=foreman://tok@host), which Gadfly also honors — but those are HTTPS-only, so for a plaintext local Ollama or http:// foreman use GADFLY_ENDPOINT_* instead.

Gitea Actions note: repo vars/secrets aren't auto-exposed as env — add each alias to the stub workflow's env: block, e.g. GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}.
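
The `GADFLY_ENDPOINT_<NAME>` value format shown in the examples (`provider|base-url` with an optional `|key`) could be parsed along these lines. This is a sketch under assumptions: `endpoint` and `parseEndpoint` are hypothetical names, and the error handling is illustrative rather than Gadfly's actual behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// endpoint is a hypothetical type for this sketch.
type endpoint struct {
	Name, Provider, BaseURL, Key string
}

const endpointPrefix = "GADFLY_ENDPOINT_"

// parseEndpoint turns one GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]"
// pair into an endpoint; <NAME> is lowercased to form the registry name.
func parseEndpoint(envName, value string) (endpoint, error) {
	if !strings.HasPrefix(envName, endpointPrefix) || len(envName) == len(endpointPrefix) {
		return endpoint{}, fmt.Errorf("%s: not a GADFLY_ENDPOINT_<NAME> variable", envName)
	}
	parts := strings.SplitN(value, "|", 3)
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return endpoint{}, fmt.Errorf("%s: want provider|base-url[|key]", envName)
	}
	ep := endpoint{
		Name:     strings.ToLower(strings.TrimPrefix(envName, endpointPrefix)),
		Provider: parts[0],
		BaseURL:  parts[1], // used verbatim, so plain http:// is fine
	}
	if len(parts) == 3 {
		ep.Key = parts[2]
	}
	return ep, nil
}

func main() {
	for name, value := range map[string]string{
		"GADFLY_ENDPOINT_BIGBOX": "ollama|http://192.168.1.50:11434",
		"GADFLY_ENDPOINT_GPU":    "openai|http://gpu.lan:8000/v1|sk-local",
	} {
		ep, err := parseEndpoint(name, value)
		fmt.Printf("%+v err=%v\n", ep, err)
	}
}
```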

Specialists (the review swarm)

Instead of one generic reviewer, Gadfly runs a suite of specialists — each a focused lens with its own review (+recheck) pass — and merges them into one comment, a collapsible section per lens, led by an overall verdict (the worst across lenses; the optional improvements lens never escalates it).
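
The "worst across lenses" rule for the overall verdict can be sketched as a simple maximum over an ordered verdict type. The names below (`verdict`, `lensResult`, `overallVerdict`) are hypothetical, and the sketch assumes the improvements lens is identified by its name.

```go
package main

import "fmt"

// verdict is ordered from best to worst so that ">" means "more severe".
type verdict int

const (
	noIssues verdict = iota
	minorIssues
	blockingIssues
)

func (v verdict) String() string {
	switch v {
	case blockingIssues:
		return "Blocking issues found"
	case minorIssues:
		return "Minor issues"
	default:
		return "No material issues found"
	}
}

// lensResult is a hypothetical per-lens outcome.
type lensResult struct {
	Lens    string
	Verdict verdict
}

// overallVerdict returns the worst verdict across lenses, skipping the
// optional improvements lens so it can never escalate the result.
func overallVerdict(results []lensResult) verdict {
	worst := noIssues
	for _, r := range results {
		if r.Lens == "improvements" {
			continue
		}
		if r.Verdict > worst {
			worst = r.Verdict
		}
	}
	return worst
}

func main() {
	results := []lensResult{
		{"security", noIssues},
		{"correctness", minorIssues},
		{"improvements", blockingIssues}, // ignored for escalation
	}
	fmt.Println(overallVerdict(results))
}
```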

Default suite (when nothing is configured): security, correctness, maintainability (code cleanliness), performance, error-handling.

Also built in (opt-in by name): tests, docs, conventions, and improvements (strict & quiet — at most 1–2 high-value, non-blocking suggestions, silent otherwise).

Select which run with GADFLY_SPECIALISTS (comma-separated names, or all):

GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"

Define your own — two ways, which compose (env overrides file overrides built-ins):

# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>"  (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"

# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
  - name: migrations
    title: "🗃️ DB migrations"
    focus: "Review schema migrations for destructive ops, missing indexes, table locks."
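
The stated precedence (env overrides file overrides built-ins) amounts to layering three maps in order. The sketch below is illustrative only: `mergeFocus` is a hypothetical helper, and the built-in focus text is a placeholder because the real built-in prompts are not shown here.

```go
package main

import "fmt"

// mergeFocus layers lens definitions: built-ins first, then .gadfly.yml
// entries, then GADFLY_SPECIALIST_<NAME> env entries, so later layers win.
func mergeFocus(builtin, file, env map[string]string) map[string]string {
	out := make(map[string]string, len(builtin)+len(file)+len(env))
	for _, layer := range []map[string]string{builtin, file, env} {
		for name, focus := range layer {
			out[name] = focus
		}
	}
	return out
}

func main() {
	builtin := map[string]string{"security": "(placeholder for the built-in security focus)"}
	file := map[string]string{
		"migrations": "Review schema migrations for destructive ops, missing indexes, table locks.",
	}
	env := map[string]string{
		"migrations": "Review DB migrations for destructive or unindexed changes.",
	}
	merged := mergeFocus(builtin, file, env)
	fmt.Println(merged["migrations"]) // the env definition wins over the file one
	fmt.Println(merged["security"])   // untouched built-in
}
```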

Dynamic selection (auto): set GADFLY_SPECIALISTS: auto and a selector model reads the changed files + PR description and picks only the lenses that materially apply (and may invent an ad-hoc one — e.g. a "migrations" lens for a schema change). The selector is GADFLY_SELECTOR_MODEL if set (a cheap tier is ideal), else the review model. Capped and de-duplicated; falls back to the default suite if selection fails.

Worker-tier delegation: set GADFLY_WORKER_MODEL (a cheap/fast model) to give every reviewer a delegate_investigation tool — it offloads mechanical legwork (trace all callers, gather every usage, check a pattern across files) to a worker sub-agent that returns a concise, evidence-cited digest, so the expensive model reasons over summaries instead of raw file dumps. Unset = no delegation (current behavior).

Cost: each specialist is its own review+recheck, so cost ≈ specialists × models × 2. The default suite runs on a single model. Trim with GADFLY_SPECIALISTS, let auto pick only what a diff needs, and point heavy legwork at a cheap GADFLY_WORKER_MODEL.
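
The cost formula is simple enough to check by hand, but a tiny estimator makes the scaling visible. `estimatePasses` is a hypothetical helper; treating a disabled recheck (`GADFLY_RECHECK=0`) as one pass per lens instead of two is an assumption drawn from the two-pass description, not a documented figure.

```go
package main

import "fmt"

// estimatePasses returns the approximate number of model passes:
// specialists × models × passes per lens (2 with recheck, assumed 1 without).
func estimatePasses(specialists, models int, recheck bool) int {
	perLens := 1
	if recheck {
		perLens = 2
	}
	return specialists * models * perLens
}

func main() {
	fmt.Println(estimatePasses(5, 1, true))  // five-lens default suite, one model
	fmt.Println(estimatePasses(4, 2, true))  // four lenses, two models
	fmt.Println(estimatePasses(5, 1, false)) // recheck disabled
}
```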

Concurrency (per-provider lanes)

With multiple models, each provider is its own lane and lanes run in parallel, so a fast cloud provider isn't stuck behind a slow local box. Within a lane, at most cap models run at once — cap comes from GADFLY_PROVIDER_CONCURRENCY (a provider=N map) else GADFLY_CONCURRENCY (default 1). The timeout is per-lens (GADFLY_TIMEOUT_SECS), so a slow model on one lens can't starve the others.

# One local box (serial — it serves one model at a time) + 3 cloud reviews at once,
# both lanes running concurrently:
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,m1pro=1"
GADFLY_MODELS: "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud"

A model's provider is the spec's first segment (m1pro/… → m1pro), or the value of GADFLY_PROVIDER (default ollama-cloud) for a bare id. The default cap of 1 keeps a single-provider pool fully sequential.
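
A rough sketch of how a `provider=N` map and the lane lookup could fit together, using the example values above. The function names are hypothetical, and rejecting caps below 1 is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCaps parses a "provider=N,provider=N" map such as GADFLY_PROVIDER_CONCURRENCY.
func parseCaps(s string) (map[string]int, error) {
	caps := map[string]int{}
	for _, pair := range strings.Split(s, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		name, val, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("bad entry %q: want provider=N", pair)
		}
		n, err := strconv.Atoi(strings.TrimSpace(val))
		if err != nil || n < 1 {
			return nil, fmt.Errorf("bad cap in %q", pair)
		}
		caps[strings.TrimSpace(name)] = n
	}
	return caps, nil
}

// providerOf returns the spec's first segment, or defaultProvider for a bare id.
func providerOf(spec, defaultProvider string) string {
	if p, _, ok := strings.Cut(spec, "/"); ok {
		return p
	}
	return defaultProvider
}

// capFor picks the per-provider cap, falling back to the scalar default.
func capFor(provider string, caps map[string]int, fallback int) int {
	if n, ok := caps[provider]; ok {
		return n
	}
	return fallback
}

func main() {
	caps, err := parseCaps("ollama-cloud=3,m1pro=1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, spec := range strings.Split("m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud", ",") {
		p := providerOf(spec, "ollama-cloud")
		fmt.Printf("%-24s lane=%-13s cap=%d\n", spec, p, capFor(p, caps, 1))
	}
}
```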

Lens fan-out (within a model). By default the specialist lenses run sequentially inside each model (GADFLY_LENS_CONCURRENCY=1). Raise it to overlap the independent per-lens review+recheck passes — the model then posts its consolidated comment as soon as its lenses finish (so with sequential models, results stream in per model and per-model timings stay clean). Like the model cap, it's per-provider configurable: GADFLY_PROVIDER_LENS_CONCURRENCY takes a provider=N map keyed by the same provider lanes as GADFLY_PROVIDER_CONCURRENCY, falling back to the GADFLY_LENS_CONCURRENCY scalar (default 1). It multiplies with the model cap: total in-flight requests ≈ models-at-once × lenses-at-once, so to fan lenses out without oversubscribing a backend, keep its model cap low and raise its lens cap:

# Per provider: cloud runs one model at a time but fans its 3 lenses out (3 concurrent requests);
# the slow local box stays fully serial. Both provider lanes still run in parallel.
GADFLY_PROVIDER_CONCURRENCY:      "ollama-cloud=1,m1=1"
GADFLY_PROVIDER_LENS_CONCURRENCY: "ollama-cloud=3,m1=1"
GADFLY_SPECIALISTS: "security,correctness,error-handling"

Live status board

When several models (each with several lenses) review a PR, the individual findings land in one comment per model — but while that's in flight all you'd see is a row of ⏳ Reviewing… placeholders. So Gadfly also upserts one consolidated status-board comment that aggregates every model's per-lens progress as it happens:

## 🪰 Gadfly — live review status
1/3 reviewers finished · updated 2026-06-27 18:14:56Z

#### `glm-5.2:cloud` · ollama-cloud — ⏳ 2/4 lenses
- ✅ security — No material issues found
- 🔄 correctness — running
- ⏸️ performance — queued
…

Each model process publishes its lenses (queued → running → finished + verdict) to a small JSON file, and a background renderer in entrypoint.sh re-renders and upserts the single comment every GADFLY_STATUS_POLL_SECS (default 12s) until the swarm finishes. The board is advisory, best-effort, and entirely separate from the per-model findings comments, which it never affects. Turn it off with GADFLY_STATUS_BOARD=0.

Triggers

- A new/reopened/ready non-draft PR: automatic.
- Commenting @gadfly review on a PR: re-review on demand (gated to allowed users).
- workflow_dispatch: manual, with a pr_number input.

(Pushing new commits does not auto-re-review — comment @gadfly review after pushing fixes. This keeps usage down.)

Comment trigger needs the workflow on your default branch. Gitea runs issue_comment workflows from the default branch, so @gadfly review only works once this stub is merged to main (the pull_request auto-trigger works from the PR branch immediately).

Security: the example stubs gate the comment trigger with a job-level if: github.event_name != 'issue_comment' || github.actor == '<you>' so an untrusted commenter can't start a secret-bearing run — edit it to your maintainers and keep it in sync with GADFLY_ALLOWED_USERS (the in-container check). @gadfly review is plain-text matched (configurable via GADFLY_TRIGGER_PHRASE), so no bot account is required; comments post as gitea-actions.
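
The in-container gate (plain-text phrase match plus an allow-list, with an empty list meaning any collaborator) might look roughly like this. It is a sketch under assumptions: `shouldTrigger` and `parseAllowed` are hypothetical names, the phrase is matched as a case-sensitive substring, and usernames are compared exactly; the real matching rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowed splits a comma-separated GADFLY_ALLOWED_USERS value.
func parseAllowed(csv string) []string {
	var users []string
	for _, u := range strings.Split(csv, ",") {
		if u = strings.TrimSpace(u); u != "" {
			users = append(users, u)
		}
	}
	return users
}

// shouldTrigger reports whether a PR comment should start a re-review.
// An empty allow-list falls back to "any repo collaborator".
func shouldTrigger(body, author, phrase string, allowed []string, isCollaborator bool) bool {
	if phrase == "" {
		phrase = "@gadfly review" // documented default for GADFLY_TRIGGER_PHRASE
	}
	if !strings.Contains(body, phrase) {
		return false
	}
	if len(allowed) == 0 {
		return isCollaborator
	}
	for _, u := range allowed {
		if u == author {
			return true
		}
	}
	return false
}

func main() {
	allowed := parseAllowed("steve, alice")
	fmt.Println(shouldTrigger("fixed it, @gadfly review please", "steve", "", allowed, false))
	fmt.Println(shouldTrigger("@gadfly review", "mallory", "", allowed, true))
	fmt.Println(shouldTrigger("@gadfly review", "bob", "", nil, true))
}
```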

How it's packaged

cmd/gadfly/            the agentic reviewer binary (majordomo + Ollama Cloud); zero deps beyond stdlib + majordomo
scripts/run.sh         fetches the PR diff, runs the reviewer, upserts one labeled comment
scripts/status-board.sh    renders + upserts the single live status-board comment (per-lens progress)
scripts/system-prompt.txt  the reviewer persona + verification discipline
entrypoint.sh          the container brains: trigger gating, clone, model loop (logic lives here, not in YAML)
Dockerfile             multi-stage; build-time module creds (BuildKit secrets) never reach the final image
.gitea/workflows/build-image.yml   push to main → :latest; tag v* → :<tag> + :latest
examples/              the ~15-line stub a consuming repo drops in

The image is published to gitea.stevedudenhoeffer.com/steve/gadfly. Every push to main rebuilds and republishes :latest (plus :sha-<short>); pushing a v* tag publishes that pinned version (plus :latest). Pin consumers to a :vN tag for stability, or track :latest to ride main.

Configuration (advanced)

The reviewer binary reads these (the stub/entrypoint set sane defaults):

| Env | Default | Meaning |
| --- | --- | --- |
| `GADFLY_MODEL` | — | model id, or `provider/model` spec, or majordomo alias/chain |
| `GADFLY_PROVIDER` | `ollama-cloud` | provider prefix for a bare model id |
| `GADFLY_BASE_URL` | — | override endpoint (OpenAI/Ollama-compatible servers) |
| `GADFLY_API_KEY` | — | provider key; falls back to the provider's standard env |
| `claude-code` model id | — | route a model through the bundled Claude Code CLI (`claude-code` / `claude-code/<model>`); see Claude Code engine for its `GADFLY_CLAUDE_*` knobs |
| `GADFLY_SPECIALISTS` | default suite | csv of lenses, `all`, or `auto` (dynamic selection) |
| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
| `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |
| `GADFLY_WORKER_MAX_STEPS` | 8 | tool-step cap for a delegated worker run |
| `GADFLY_CONCURRENCY` | 1 | default max models run at once per provider |
| `GADFLY_PROVIDER_CONCURRENCY` | — | per-provider overrides, e.g. `ollama-cloud=3,m1pro=1` |
| `GADFLY_LENS_CONCURRENCY` | 1 | specialist lenses run at once within a model (× model cap = total in-flight) |
| `GADFLY_PROVIDER_LENS_CONCURRENCY` | — | per-provider lens overrides, same lanes as `GADFLY_PROVIDER_CONCURRENCY`, e.g. `ollama-cloud=3,m1=1` |
| `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
| `GADFLY_TIMEOUT_SECS` | 300 | deadline per specialist lens (review+recheck) |
| `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
| `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
| `GADFLY_MAX_DIFF_CHARS` | 60000 | diff chars embedded in the prompt (full diff via `get_diff`) |
| `GADFLY_STATUS_BOARD` | on | set `0` to disable the live status-board comment |
| `GADFLY_STATUS_POLL_SECS` | 12 | how often the status board re-renders/upserts |
| `GADFLY_TRIGGER_PHRASE` | `@gadfly review` | comment phrase that re-triggers |
| `GADFLY_ALLOWED_USERS` | (collaborators) | comma-separated allow-list for comment triggers |
| `GADFLY_FINDINGS_URL` | — | gadfly-reports store base URL; set to enable findings telemetry (off when empty) |
| `GADFLY_FINDINGS_TOKEN` | — | bearer token for the gadfly-reports store (sent as `Authorization: Bearer …`) |
| `GADFLY_REPO` | (from `GITEA_API`) | `owner/repo` slug stamped on emitted runs/findings (set by `entrypoint.sh`) |
| `GADFLY_PR` | (from event) | PR number stamped on emitted runs/findings (set by `entrypoint.sh`) |

Findings telemetry (optional)

Gadfly can record what it found so model quality can be tracked over time. It is off by default and purely advisory: set GADFLY_FINDINGS_URL to a gadfly-reports store base URL and, after each review, the binary best-effort POSTs the run (/runs) and the findings it surfaced (/reports) to that store. Add GADFLY_FINDINGS_TOKEN to send an Authorization: Bearer … header. entrypoint.sh supplies the run context (GADFLY_REPO, GADFLY_PR) automatically.

Findings are extracted heuristically from each lens's markdown — a path:line reference anchors a finding, titled by the nearest preceding heading / numbered item / bold lead-in. A lens whose verdict is "No material issues found" emits no findings: its path:line references are verification notes ("verified X is safe"), not problems, so extracting them would record false positives and unfairly penalize thorough clean-pass reviewers. The emit is strictly best-effort: a short (~10s) timeout, any error (or a non-2xx response) is logged to stderr only, and it never changes the review output or the exit code.
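
The extraction heuristic described above can be approximated with two regular expressions: one for `path:line` anchors and one for title-like lines. This is a simplified sketch, not the actual extractor: the patterns, the `finding` type, and `extractFindings` are all assumptions, and the path pattern will also match some non-path text (for example a `host:port` pair).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// finding is a hypothetical record for this sketch.
type finding struct {
	Path  string
	Line  int
	Title string
}

var (
	// path:line anchor, e.g. store/db.go:88 (assumed pattern).
	refRE = regexp.MustCompile(`([A-Za-z0-9_./-]+\.[A-Za-z0-9]+):(\d+)`)
	// heading, numbered item, or bold lead-in (assumed pattern).
	titleRE = regexp.MustCompile(`^(?:#{1,6}\s+(.+)|\d+\.\s+(.+)|\*\*(.+?)\*\*.*)$`)
)

// extractFindings anchors a finding at each path:line reference and titles it
// with the nearest preceding title-like line. A clean verdict emits nothing,
// since its references are verification notes rather than problems.
func extractFindings(verdict, markdown string) []finding {
	if verdict == "No material issues found" {
		return nil
	}
	var out []finding
	title := ""
	for _, line := range strings.Split(markdown, "\n") {
		trimmed := strings.TrimSpace(line)
		if m := titleRE.FindStringSubmatch(trimmed); m != nil {
			for _, g := range m[1:] {
				if g != "" {
					title = g
					break
				}
			}
		}
		for _, m := range refRE.FindAllStringSubmatch(line, -1) {
			n, err := strconv.Atoi(m[2])
			if err != nil {
				continue // line number too large to be real
			}
			out = append(out, finding{Path: m[1], Line: n, Title: title})
		}
	}
	return out
}

func main() {
	md := "### Unchecked error\n- `store/db.go:88` ignores the error from Close\n"
	fmt.Printf("%+v\n", extractFindings("Minor issues", md))
	fmt.Printf("%+v\n", extractFindings("No material issues found", md))
}
```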

Building locally

go build ./cmd/gadfly      # needs read access to the private majordomo module
go test ./...

License

MIT — see LICENSE.
