# 🪰 Gadfly

**An AI gadfly for your pull requests.** Gadfly is an *adversarial* code reviewer that
runs in Gitea Actions: on every PR it reads your actual repository, hunts for real
problems, verifies them against the code, and posts its findings as a comment. It does not
praise your code. A gadfly does not let things slide.

> ### 🤖 Heads up: this is a vibe-coded project
> Gadfly was built almost entirely by an AI agent (Claude Code), prompts and all. The
> reviewer's "brain" is a language model, and so was most of the author. It works and it's
> tested, but treat it accordingly: **it is advisory only, it never blocks a merge, and you
> should still review its reviews.** Issues and PRs welcome; expect the occasional
> AI-flavored rough edge.

## What makes it different

Most LLM "review my diff" bots read the diff in isolation and hallucinate problems they
can't actually see, such as a "missing import" that's three lines above the hunk or a
"broken caller" in a file they never opened. Gadfly is **agentic**: the model has read-only
tools over the checked-out repo and is *required* to use them before reporting anything.

- **Tools:** `read_file`, `list_dir`, `grep`, `find_files`, `get_diff`.
- **Verify-before-claiming discipline:** baked into the system prompt (open the file,
  grep the symbol, or drop the finding).
- **Two passes:** a *review* pass drafts findings, then an adversarial *recheck* pass
  independently re-verifies each one against the code, drops the ones it can't confirm,
  and recomputes the verdict. This is what kills "confident but wrong."
- **Semantic-bug hunting:** it's told not to trust a plausible-looking constant, conversion
  factor, or formula, and to re-derive the expected value instead, because that's where
  real bugs hide.

Every review leads with a one-line verdict: **No material issues found**, **Minor issues**,
or **Blocking issues found**.

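The review-then-recheck flow from the list above can be sketched in Go roughly as follows. This is only an illustration, not Gadfly's actual code: the `Finding` type, its `Blocking` flag, the `recheck` function, and the `confirm` callback are hypothetical names. In the real tool the confirmation step is a second model pass with tool access, not a plain function.

```go
package main

import "fmt"

// Finding is a hypothetical shape for one drafted review finding.
type Finding struct {
	Summary  string
	Blocking bool
}

// recheck keeps only the findings that confirm reports as verified, then
// recomputes the verdict from whatever survived.
func recheck(drafted []Finding, confirm func(Finding) bool) ([]Finding, string) {
	kept := make([]Finding, 0, len(drafted))
	for _, f := range drafted {
		if confirm(f) {
			kept = append(kept, f)
		}
	}
	verdict := "No material issues found"
	for _, f := range kept {
		if f.Blocking {
			return kept, "Blocking issues found"
		}
		verdict = "Minor issues"
	}
	return kept, verdict
}

func main() {
	drafted := []Finding{
		{Summary: "missing import in handler.go", Blocking: true},
		{Summary: "unclear variable name", Blocking: false},
	}
	// Pretend the recheck pass could not confirm the blocking finding.
	kept, verdict := recheck(drafted, func(f Finding) bool { return !f.Blocking })
	fmt.Println(len(kept), verdict)
}
```

The point of the sketch is that the verdict is derived from the surviving findings rather than carried over from the draft, so an unconfirmed blocking claim cannot leave a blocking verdict behind.
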
## Turn it on for a repo

Gadfly ships as a container image, so consuming repos don't build anything; they just run
it. Drop one file in your repo and set a couple of secrets/vars:

1. Copy a stub from [`examples/`](examples/) to `.gitea/workflows/adversarial-review.yml` in
   your repo: [`adversarial-review.yml`](examples/adversarial-review.yml) for the Ollama Cloud
   default, or a provider-specific one (local Ollama, OpenAI-compatible, endpoint aliases). See
   the [examples index](examples/README.md).
2. Add repo config:
   - **secret** `OLLAMA_CLOUD_API_KEY`: your [Ollama Cloud](https://ollama.com) key (empty
     ⇒ Gadfly posts a harmless "not configured" notice instead of reviewing). *Not needed if
     you point Gadfly at a different provider; see [Models & providers](#models--providers).*
   - **var** `OLLAMA_REVIEW_MODELS` *(optional)*: comma-separated model ids
     (default `qwen3-coder:480b-cloud,gpt-oss:120b-cloud`). One comment per model.
   - **var** `GADFLY_ALLOWED_USERS` *(optional)*: who may re-trigger via comment; empty ⇒
     any repo collaborator.

`GITEA_TOKEN` is provided automatically by Actions; comments post as the `gitea-actions`
user, scoped to that repo, so no bot account is needed.

## Models & providers

Gadfly is built on [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo), so the
reviewer model is not hard-wired: it can target anything majordomo supports. Pick a provider
by setting `GADFLY_PROVIDER` (used to prefix bare model ids); point at a custom endpoint with
`GADFLY_BASE_URL`; supply a key with `GADFLY_API_KEY` or the provider's standard env var. A
`GADFLY_MODEL`/`GADFLY_MODELS` value that already contains a `provider/` prefix (or is a
majordomo failover chain / alias) is used verbatim.

| Provider | `GADFLY_PROVIDER` | Key env | Status |
|----------|-------------------|---------|--------|
| **Ollama Cloud** (default) | `ollama-cloud` | `OLLAMA_API_KEY` / `OLLAMA_CLOUD_API_KEY` | ✅ in active use |
| **Local Ollama** | `ollama` | none (`OLLAMA_HOST` or `GADFLY_BASE_URL` for a remote daemon) | ✅ tested |
| **OpenAI-compatible** (incl. local Ollama's `/v1`) | `openai` + `GADFLY_BASE_URL` | `OPENAI_API_KEY` (any non-empty for Ollama) | ✅ tested against Ollama |
| **OpenAI** | `openai` | `OPENAI_API_KEY` | ⚠️ wired, **untested** |
| **Anthropic** | `anthropic` | `ANTHROPIC_API_KEY` | ⚠️ wired, **untested** |
| **Google (Gemini)** | `google` | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | ⚠️ wired, **untested** |

> ### 🧪 Honest status
> Only the **Ollama** paths above are actually exercised. The OpenAI / Anthropic / Google
> providers come "for free" from majordomo's abstraction and *should* work, but I haven't
> spent money verifying them, so treat them as untested. The OpenAI-**compatible** path **is**
> tested, because you can point it at a local Ollama (`GADFLY_BASE_URL=http://localhost:11434/v1`)
> and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you
> try a cloud provider and it works (or doesn't), please open an issue.

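The prefixing rule described at the top of this section (bare ids get the provider prefix, anything already qualified is used verbatim) might look something like the Go sketch below. The function name `resolveSpec` and the `aliases` set are hypothetical, and how Gadfly and majordomo actually detect aliases and failover chains is not shown in this README, so treat the detection logic here as an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveSpec returns the model spec to hand to the provider layer.
// Assumption: a value containing "/" is already "provider/model" (or a chain
// of such specs), and known alias names are passed in by the caller.
func resolveSpec(model, provider string, aliases map[string]bool) string {
	if strings.Contains(model, "/") || aliases[model] {
		return model // already qualified, or an alias: use verbatim
	}
	if provider == "" {
		provider = "ollama-cloud" // documented default for GADFLY_PROVIDER
	}
	return provider + "/" + model
}

func main() {
	aliases := map[string]bool{"fast": true}
	for _, m := range []string{"gpt-oss:120b-cloud", "bigbox/qwen2.5-coder:7b", "fast"} {
		fmt.Println(resolveSpec(m, "", aliases))
	}
}
```

With an empty provider, the bare id is prefixed with `ollama-cloud/`, while the qualified spec and the alias pass through unchanged.
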
### Endpoint aliases via env vars

For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by
name with env vars and then reference `name/model` in `GADFLY_MODEL`/`GADFLY_MODELS`:

```sh
# http-capable (Gadfly-native): base URL used verbatim, so plaintext LAN works
GADFLY_ENDPOINT_BIGBOX="ollama|http://192.168.1.50:11434"
GADFLY_ENDPOINT_GPU="openai|http://gpu.lan:8000/v1|sk-local"
GADFLY_MODELS="bigbox/qwen2.5-coder:7b,gpu/llama3.1"

# pure spec alias (a model, or a failover chain):
GADFLY_ALIAS_FAST="bigbox/qwen2.5-coder:7b,ollama-cloud/gpt-oss:120b-cloud"
GADFLY_MODEL="fast"
```

`<NAME>` is lowercased to form the registry name (`GADFLY_ENDPOINT_BIGBOX` → `bigbox`). This
is the same idea as majordomo's built-in **`LLM_*` env DSNs** (`LLM_BIGBOX=ollama://tok@host`),
which Gadfly also honors. Those are **HTTPS-only**, though, so for plaintext local Ollama use
`GADFLY_ENDPOINT_*` instead.

> **Gitea Actions note:** repo `vars`/`secrets` aren't auto-exposed as env, so add each alias to
> the stub workflow's `env:` block, e.g. `GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}`.

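As a rough illustration of the `GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]"` format shown above, here is a minimal Go parsing sketch. The `Endpoint` type and `parseEndpoint` function are hypothetical, and the validation rules (provider and base URL required, key optional) are inferred from the two examples, not taken from Gadfly's source.

```go
package main

import (
	"fmt"
	"strings"
)

// Endpoint is a hypothetical parsed form of one GADFLY_ENDPOINT_<NAME> value.
type Endpoint struct {
	Name, Provider, BaseURL, APIKey string
}

const endpointPrefix = "GADFLY_ENDPOINT_"

// parseEndpoint turns an env key/value pair into an Endpoint. The registry
// name is the lowercased <NAME> suffix; the API key field is optional.
func parseEndpoint(key, value string) (Endpoint, error) {
	if !strings.HasPrefix(key, endpointPrefix) || len(key) == len(endpointPrefix) {
		return Endpoint{}, fmt.Errorf("%q is not a %s<NAME> variable", key, endpointPrefix)
	}
	name := strings.ToLower(strings.TrimPrefix(key, endpointPrefix))
	parts := strings.SplitN(value, "|", 3)
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return Endpoint{}, fmt.Errorf("%s: want \"provider|base-url[|key]\", got %q", key, value)
	}
	ep := Endpoint{Name: name, Provider: parts[0], BaseURL: parts[1]}
	if len(parts) == 3 {
		ep.APIKey = parts[2]
	}
	return ep, nil
}

func main() {
	ep, err := parseEndpoint("GADFLY_ENDPOINT_GPU", "openai|http://gpu.lan:8000/v1|sk-local")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s -> provider=%s url=%s\n", ep.Name, ep.Provider, ep.BaseURL)
}
```

The base URL is kept exactly as written, which matches the note above that plaintext `http://` LAN addresses are allowed for these aliases.
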
## Specialists (the review swarm)

Instead of one generic reviewer, Gadfly runs a **suite of specialists**, each a focused lens
with its own review (+recheck) pass, and merges them into **one comment**: a collapsible
section per lens, led by an overall verdict (the worst across lenses; the optional
`improvements` lens never escalates it).

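The "worst verdict wins, except `improvements`" rule can be sketched as below. This is an assumed shape, not the contents of Gadfly's consolidation code: `Verdict`, `LensResult`, and `overall` are hypothetical names, and the severity ordering simply follows the three verdict strings used in this README.

```go
package main

import "fmt"

// Verdict is ordered from least to most severe so that ">" means "worse".
type Verdict int

const (
	NoIssues Verdict = iota
	Minor
	Blocking
)

func (v Verdict) String() string {
	switch v {
	case Blocking:
		return "Blocking issues found"
	case Minor:
		return "Minor issues"
	default:
		return "No material issues found"
	}
}

// LensResult is a hypothetical per-specialist outcome.
type LensResult struct {
	Lens    string
	Verdict Verdict
}

// overall returns the worst verdict across lenses, ignoring the optional
// "improvements" lens so its suggestions never escalate the headline.
func overall(results []LensResult) Verdict {
	worst := NoIssues
	for _, r := range results {
		if r.Lens == "improvements" {
			continue
		}
		if r.Verdict > worst {
			worst = r.Verdict
		}
	}
	return worst
}

func main() {
	results := []LensResult{
		{Lens: "security", Verdict: NoIssues},
		{Lens: "correctness", Verdict: Minor},
		{Lens: "improvements", Verdict: Blocking},
	}
	fmt.Println(overall(results))
}
```

Here the headline stays at "Minor issues" even though the `improvements` lens reported something more severe.
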
**Default suite** (when nothing is configured):
`security`, `correctness`, `maintainability` (code cleanliness), `performance`, `error-handling`.

**Also built in** (opt-in by name): `tests`, `docs`, `conventions`, and `improvements`
(strict & quiet: at most 1–2 high-value, non-blocking suggestions, silent otherwise).

Select which run with **`GADFLY_SPECIALISTS`** (comma-separated names, or `all`):

```yaml
GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"
```

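One way the `GADFLY_SPECIALISTS` value could be interpreted is sketched below in Go. The function `selectSpecialists` is hypothetical, custom-defined specialists are left out for brevity, and reporting unknown names separately (so the remaining lenses still run) is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

var (
	defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}
	optIn        = []string{"tests", "docs", "conventions", "improvements"}
)

// selectSpecialists maps the raw env value to lens names. An empty value
// selects the default suite and "all" selects every built-in. Unrecognized
// names are returned separately instead of aborting the selection.
func selectSpecialists(raw string) (selected, unknown []string) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return append([]string(nil), defaultSuite...), nil
	}
	all := append(append([]string(nil), defaultSuite...), optIn...)
	if raw == "all" {
		return all, nil
	}
	known := make(map[string]bool, len(all))
	for _, n := range all {
		known[n] = true
	}
	seen := map[string]bool{}
	for _, part := range strings.Split(raw, ",") {
		name := strings.TrimSpace(part)
		if name == "" || seen[name] {
			continue // skip blanks and duplicates
		}
		seen[name] = true
		if known[name] {
			selected = append(selected, name)
		} else {
			unknown = append(unknown, name)
		}
	}
	return selected, unknown
}

func main() {
	selected, unknown := selectSpecialists("security,correctness,maintainability,tests,bogus")
	fmt.Println(selected, unknown)
}
```
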
**Define your own** in two ways, which compose (env overrides file, file overrides built-ins):

```yaml
# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>" (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"
```

```yaml
# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
  - name: migrations
    title: "🗃️ DB migrations"
    focus: "Review schema migrations for destructive ops, missing indexes, table locks."
```

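The precedence order (built-ins, then file, then env) amounts to layering definitions by name, with later layers winning. A minimal Go sketch follows, assuming a hypothetical `Specialist` struct and `merge` helper. Whether Gadfly replaces a whole definition or merges field by field is not stated in this README; the sketch replaces the whole definition.

```go
package main

import "fmt"

// Specialist is a hypothetical in-memory form of one lens definition.
type Specialist struct {
	Name, Title, Focus string
}

// merge applies layers in order; a later layer overrides an earlier one
// whenever both define the same name.
func merge(layers ...[]Specialist) map[string]Specialist {
	out := map[string]Specialist{}
	for _, layer := range layers {
		for _, s := range layer {
			out[s.Name] = s
		}
	}
	return out
}

func main() {
	builtins := []Specialist{{Name: "security", Focus: "built-in security focus (placeholder text)"}}
	fromFile := []Specialist{{
		Name:  "migrations",
		Title: "🗃️ DB migrations",
		Focus: "Review schema migrations for destructive ops, missing indexes, table locks.",
	}}
	fromEnv := []Specialist{{
		Name:  "migrations",
		Focus: "Review DB migrations for destructive or unindexed changes.",
	}}

	merged := merge(builtins, fromFile, fromEnv)
	fmt.Println(merged["migrations"].Focus) // the env definition wins
}
```

Passing the layers in the order `builtins, fromFile, fromEnv` is what encodes the precedence; reordering the arguments would change which definition wins.
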
> **Cost:** each specialist is its own review+recheck, so cost ≈ *specialists × models × 2*.
> The default suite runs on a **single** model. Trim it with `GADFLY_SPECIALISTS`; a planned
> `auto` mode will let a cheap model pick only the lenses a given diff actually needs.

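As a quick worked example of the cost formula, the Go snippet below counts model passes. `estimatedPasses` is a hypothetical helper, and treating a disabled recheck as one pass per specialist is an inference from the `GADFLY_RECHECK` setting, not a documented guarantee.

```go
package main

import "fmt"

// estimatedPasses returns specialists × models × passes, where passes is 2
// (review + recheck) or 1 when the recheck pass is skipped.
func estimatedPasses(specialists, models int, recheck bool) int {
	passes := 1
	if recheck {
		passes = 2
	}
	return specialists * models * passes
}

func main() {
	fmt.Println(estimatedPasses(5, 1, true)) // default suite, one model: 10
	fmt.Println(estimatedPasses(9, 2, true)) // all nine lenses, two models: 36
}
```
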
### Triggers

1. A **new/reopened/ready** non-draft PR: automatic.
2. Commenting **`@gadfly review`** on a PR: re-review on demand (gated to allowed users).
3. **workflow_dispatch**: manual, with a `pr_number` input.

(Pushing new commits does *not* auto-re-review; comment `@gadfly review` after pushing
fixes. This keeps usage down.)

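The gating for the comment trigger could be sketched like this. In Gadfly the real logic lives in `entrypoint.sh`, so the Go below is only an illustration: `shouldReview` is a hypothetical function, and matching the phrase as a substring of the comment is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldReview decides whether a PR comment should start a re-review.
// allowed is the parsed GADFLY_ALLOWED_USERS list; when it is empty, any
// repo collaborator may trigger.
func shouldReview(comment, author, phrase string, allowed []string, isCollaborator bool) bool {
	if !strings.Contains(comment, phrase) {
		return false
	}
	if len(allowed) == 0 {
		return isCollaborator
	}
	for _, u := range allowed {
		if u == author {
			return true
		}
	}
	return false
}

func main() {
	phrase := "@gadfly review"
	fmt.Println(shouldReview("fixed it, @gadfly review please", "alice", phrase, []string{"alice"}, true))
	fmt.Println(shouldReview("@gadfly review", "mallory", phrase, []string{"alice"}, true))
}
```
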
## How it's packaged

```
cmd/gadfly/                        the agentic reviewer binary (majordomo + Ollama Cloud); zero deps beyond stdlib + majordomo
scripts/run.sh                     fetches the PR diff, runs the reviewer, upserts one labeled comment
scripts/system-prompt.txt          the reviewer persona + verification discipline
entrypoint.sh                      the container brains: trigger gating, clone, model loop (logic lives here, not in YAML)
Dockerfile                         multi-stage; build-time module creds (BuildKit secrets) never reach the final image
.gitea/workflows/build-image.yml   push to main → :latest; tag v* → :<tag> + :latest
examples/                          the ~15-line stub a consuming repo drops in
```

The image is published to `gitea.stevedudenhoeffer.com/steve/gadfly`. Every push to `main`
rebuilds and republishes `:latest` (plus `:sha-<short>`); pushing a `v*` tag publishes that
pinned version (plus `:latest`). Pin consumers to a `:vN` tag for stability, or track
`:latest` to ride main.

## Configuration (advanced)

The reviewer binary reads these (the stub/entrypoint set sane defaults):

| Env | Default | Meaning |
|-----|---------|---------|
| `GADFLY_MODEL` | *(none)* | model id, or `provider/model` spec, or majordomo alias/chain |
| `GADFLY_PROVIDER` | `ollama-cloud` | provider prefix for a bare model id |
| `GADFLY_BASE_URL` | *(none)* | override endpoint (OpenAI/Ollama-compatible servers) |
| `GADFLY_API_KEY` | *(none)* | provider key; falls back to the provider's standard env |
| `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
| `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
| `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
| `GADFLY_TIMEOUT_SECS` | 300 | overall deadline (both passes) |
| `GADFLY_MAX_DIFF_CHARS` | 60000 | diff chars embedded in the prompt (full diff via `get_diff`) |
| `GADFLY_TRIGGER_PHRASE` | `@gadfly review` | comment phrase that re-triggers |
| `GADFLY_ALLOWED_USERS` | *(collaborators)* | comma-separated allow-list for comment triggers |

## Building locally

```sh
go build ./cmd/gadfly   # needs read access to the private majordomo module
go test ./...
```

## License

MIT. See [LICENSE](LICENSE).