# 🪰 Gadfly

**An AI gadfly for your pull requests.** Gadfly is an *adversarial* code reviewer that
runs in Gitea Actions: on every PR it reads your actual repository, hunts for real
problems, verifies them against the code, and posts its findings as a comment. It does not
praise your code. A gadfly does not let things slide.

> ### 🤖 Heads up: this is a vibe-coded project
> Gadfly was built almost entirely by an AI agent (Claude Code), prompts and all — the
> reviewer's "brain" is a language model, and so was most of the author. It works and it's
> tested, but treat it accordingly: **it is advisory only, it never blocks a merge, and you
> should still review its reviews.** Issues and PRs welcome; expect the occasional
> AI-flavored rough edge.

## What makes it different

Most LLM "review my diff" bots read the diff in isolation and hallucinate problems they
can't actually see — a "missing import" that's three lines above the hunk, a "broken
caller" in a file they never opened. Gadfly is **agentic**: the model has read-only tools
over the checked-out repo and is *required* to use them before reporting anything.

- **Tools:** `read_file`, `list_dir`, `grep`, `find_files`, `get_diff`.
- **Verify-before-claiming discipline:** baked into the system prompt — open the file,
  grep the symbol, or drop the finding.
- **Two passes:** a *review* pass drafts findings, then an adversarial *recheck* pass
  independently re-verifies each one against the code and drops the ones it can't confirm,
  recomputing the verdict. This is what kills "confident but wrong."
- **Semantic-bug hunting:** it's told not to trust a plausible-looking constant, conversion
  factor, or formula — re-derive the expected value, because that's where real bugs hide.

Every review leads with a one-line verdict: **No material issues found**, **Minor issues**,
or **Blocking issues found**.

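The two-pass flow described above can be pictured with a small sketch. It is illustrative only and is not Gadfly's code: the `Finding` type, the `confirm` callback, and the rule that maps surviving findings to a verdict are assumptions. Only the three verdict strings come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding shape: a location, a summary, and a blocking flag."""
    path: str
    summary: str
    blocking: bool = False


def recheck(findings: Iterable[Finding], confirm: Callable[[Finding], bool]) -> list[Finding]:
    """Keep only the findings the recheck pass could confirm against the code."""
    return [f for f in findings if confirm(f)]


def verdict(findings: list[Finding]) -> str:
    """Recompute the one-line verdict from what survived (assumed mapping)."""
    if not findings:
        return "No material issues found"
    if any(f.blocking for f in findings):
        return "Blocking issues found"
    return "Minor issues"
```

The point of the second pass is that the verdict is derived from the confirmed findings only, so a dropped finding can also downgrade the headline.
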
## Turn it on for a repo

Gadfly ships as a container image, so consuming repos don't build anything — they just run
it. Drop one file in your repo and set a couple of secrets/vars:

1. Copy a stub from [`examples/`](examples/) to `.gitea/workflows/adversarial-review.yml` in
   your repo. There are two flavors (see the [examples index](examples/README.md)):
   - The slim [`reusable.yml`](examples/reusable.yml), a tiny caller of Gadfly's **reusable
     workflow** (`uses: steve/gadfly/.gitea/workflows/review-reusable.yml@…`, forwarding only
     the secrets the reviewer needs). Its **default swarm is set centrally via owner
     variables** (see [Central config via variables](#central-config-via-variables)) and is
     inherited by omitting `with:`.
   - The full self-contained [`adversarial-review.yml`](examples/adversarial-review.yml)
     (Ollama Cloud default, with inline notes for every provider / local Ollama /
     OpenAI-compatible / endpoint aliases).
2. Add repo config:
   - **secret** `OLLAMA_CLOUD_API_KEY` — your [Ollama Cloud](https://ollama.com) key (empty
     ⇒ Gadfly posts a harmless "not configured" notice instead of reviewing). *Not needed if
     you point Gadfly at a different provider — see [Models & providers](#models--providers).*
   - **var** `OLLAMA_REVIEW_MODELS` *(optional)* — comma-separated model ids
     (default `qwen3-coder:480b-cloud,gpt-oss:120b-cloud`). One comment per model.
   - **var** `GADFLY_ALLOWED_USERS` *(optional)* — who may re-trigger via comment; empty ⇒
     any repo collaborator.

`GITEA_TOKEN` is provided automatically by Actions; comments post as the `gitea-actions`
user, scoped to that repo — no bot account needed.

## Models & providers

Gadfly is built on [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo), so the
reviewer model is not hard-wired — it can target anything majordomo supports. Pick a provider
by setting `GADFLY_PROVIDER` (used to prefix bare model ids); point at a custom endpoint with
`GADFLY_BASE_URL`; supply a key with `GADFLY_API_KEY` or the provider's standard env var. A
`GADFLY_MODEL`/`GADFLY_MODELS` value that already contains a `provider/` prefix (or is a
majordomo failover chain / alias) is used verbatim.

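The prefixing rule for bare model ids can be sketched in a few lines. This is a simplified illustration, not Gadfly's or majordomo's implementation: `resolve_model` is a hypothetical name, and aliases and failover chains (which the text says are also used verbatim) are not modelled.

```python
def resolve_model(model_id: str, provider: str = "ollama-cloud") -> str:
    """Prefix a bare model id with the provider; leave provider/model specs untouched."""
    if "/" in model_id:
        # Already carries a provider/ prefix, so it is used verbatim.
        return model_id
    return f"{provider}/{model_id}"
```

With the default provider, `resolve_model("gpt-oss:120b-cloud")` yields an `ollama-cloud/…` spec, while a value that already names its provider passes through unchanged.
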
| Provider | `GADFLY_PROVIDER` | Key env | Status |
|----------|-------------------|---------|--------|
| **Ollama Cloud** (default) | `ollama-cloud` | `OLLAMA_API_KEY` / `OLLAMA_CLOUD_API_KEY` | ✅ in active use |
| **Local Ollama** | `ollama` | none (`OLLAMA_HOST` or `GADFLY_BASE_URL` for a remote daemon) | ✅ tested |
| **[foreman](https://gitea.stevedudenhoeffer.com/steve/foreman)** (native-Ollama queue daemon) | `foreman` + `GADFLY_BASE_URL`, or a `GADFLY_ENDPOINT_*` / `LLM_*` `foreman://` entry | optional bearer (via the endpoint/DSN) | ✅ native-Ollama path |
| **[llama-swap](https://github.com/mostlygeek/llama-swap)** (model-swapping proxy) | `llama-swap`/`llama-swaps` (un-hyphenated `llamaswap`/`llamaswaps` also accepted) + `GADFLY_BASE_URL` or a `GADFLY_ENDPOINT_*` entry, or an `LLM_*` `llama-swap://` / `llama-swaps://` DSN | optional bearer | ⚠️ wired, **untested** |
| **OpenAI-compatible** (incl. local Ollama's `/v1`) | `openai` + `GADFLY_BASE_URL` | `OPENAI_API_KEY` (any non-empty for Ollama) | ✅ tested against Ollama |
| **OpenAI** | `openai` | `OPENAI_API_KEY` | ⚠️ wired, **untested** |
| **Anthropic** | `anthropic` | `ANTHROPIC_API_KEY` | ⚠️ wired, **untested** |
| **Google (Gemini)** | `google` | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | ⚠️ wired, **untested** |

> ### 🧪 Honest status
> Only the **Ollama** paths above are actually exercised. The OpenAI / Anthropic / Google
> providers come "for free" from majordomo's abstraction and *should* work, but I haven't
> spent money verifying them — treat them as untested. The OpenAI-**compatible** path **is**
> tested, because you can point it at a local Ollama (`GADFLY_BASE_URL=http://localhost:11434/v1`)
> and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you
> try a cloud provider and it works (or doesn't), please open an issue.

### Claude Code engine (`claude-code`)

Besides the majordomo model loop, Gadfly can review through the **[Claude Code](https://claude.com/claude-code)
CLI**: for each lens it shells out to `claude -p` *inside the checked-out repo*, so Claude Code
uses its **own** read tools (Read/Grep/Glob) to verify findings against real code, then Gadfly
parses the result and runs the same verdict-parse → recheck → consolidate → emit pipeline. The
CLI is bundled in the image (Node + `@anthropic-ai/claude-code`).

Select it as a model id — bare `claude-code` (CLI default model) or `claude-code/<model>` (the
suffix becomes `--model`, e.g. `claude-code/sonnet`, `claude-code/opus`). An optional
`:<thinking>` suffix forces an extended-thinking budget for that reviewer — `:max` (the high
"ultrathink" tier) or `:<n>` for a specific token budget — so you can run the same model at two
thinking depths as separate reviewers:

```yaml
GADFLY_MODELS: "claude-code/sonnet,claude-code/opus,claude-code/opus:max"
```

The thinking budget is applied via the `MAX_THINKING_TOKENS` env on the CLI subprocess; it's
best-effort (a no-op if the installed CLI build doesn't honor it).

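To make the spec grammar above concrete, here is a minimal parsing sketch. It assumes the shape `claude-code[/<model>][:<thinking>]` as described in the text; `ClaudeSpec` and `parse_claude_spec` are hypothetical names, and how `:max` translates into a `MAX_THINKING_TOKENS` value is deliberately left out because the text does not specify it.

```python
from typing import NamedTuple, Optional


class ClaudeSpec(NamedTuple):
    model: Optional[str]     # value for --model, or None for the CLI default model
    thinking: Optional[str]  # "max", a token count as text, or None


def parse_claude_spec(spec: str) -> ClaudeSpec:
    """Split a claude-code model id into its model and thinking parts."""
    head, _, thinking = spec.partition(":")
    engine, _, model = head.partition("/")
    if engine != "claude-code":
        raise ValueError(f"not a claude-code spec: {spec!r}")
    if thinking and thinking != "max" and not thinking.isdigit():
        raise ValueError(f"unsupported thinking suffix: {thinking!r}")
    return ClaudeSpec(model or None, thinking or None)
```

Because the thinking suffix is part of the id, `claude-code/opus` and `claude-code/opus:max` parse to two distinct specs, which is what lets them run as separate reviewers.
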
Auth is read from the environment: the default is a **Pro/Max subscription** via
`CLAUDE_CODE_OAUTH_TOKEN` (from `claude setup-token`; no `--bare`), falling back to
`ANTHROPIC_API_KEY`. Don't set both. Tuning knobs (all optional):

| Env | Default | Meaning |
|-----|---------|---------|
| `GADFLY_CLAUDE_MODEL` | *(from the spec suffix)* | overrides the `--model` value |
| `GADFLY_CLAUDE_PERMISSION_MODE` | `plan` | `--permission-mode` (read-only `plan` keeps it from editing) |
| `GADFLY_CLAUDE_ALLOWED_TOOLS` | *(unset)* | `--allowedTools` value, passed verbatim (e.g. `Read,Grep,Glob`) |
| `GADFLY_CLAUDE_EXTRA_ARGS` | *(unset)* | extra CLI args, **whitespace-split** (no shell quoting) and appended after the defaults (e.g. `--max-turns 30`) |
| `GADFLY_CLAUDE_BIN` | `claude` | CLI binary path |

> These are **operator** knobs (workflow env), not PR-author input. Because
> `GADFLY_CLAUDE_EXTRA_ARGS` is appended *after* the defaults, it can override the
> read-only `--permission-mode plan` (e.g. passing `--permission-mode acceptEdits`),
> so keep it read-only unless you mean otherwise. It's whitespace-split, so values
> can't contain spaces — use `GADFLY_CLAUDE_ALLOWED_TOOLS` / `_PERMISSION_MODE` /
> `_MODEL` for those. The subprocess runs with a **minimal environment** (its auth
> token + `PATH`/`HOME`/locale/`GADFLY_CLAUDE_*`), not the runner's full env, so the
> Gitea token and provider keys aren't handed to the CLI.

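The ordering issue in the note above is easier to see with a sketch of how such a command line might be assembled. This is an assumption-laden illustration rather than Gadfly's code: `build_claude_argv` is a hypothetical helper, the exact argument order is a guess, and how the prompt itself reaches the CLI is omitted. Only the env names and flags come from the table.

```python
from typing import Mapping, Optional


def build_claude_argv(env: Mapping[str, str], spec_model: Optional[str] = None) -> list[str]:
    """Assemble a claude CLI argv from the GADFLY_CLAUDE_* knobs (assumed order)."""
    argv = [env.get("GADFLY_CLAUDE_BIN") or "claude", "-p"]
    argv += ["--permission-mode", env.get("GADFLY_CLAUDE_PERMISSION_MODE") or "plan"]
    model = env.get("GADFLY_CLAUDE_MODEL") or spec_model
    if model:
        argv += ["--model", model]
    tools = env.get("GADFLY_CLAUDE_ALLOWED_TOOLS")
    if tools:
        argv += ["--allowedTools", tools]  # passed verbatim, as a single argument
    # Plain whitespace split: no shell quoting, so a value cannot contain spaces.
    argv += env.get("GADFLY_CLAUDE_EXTRA_ARGS", "").split()
    return argv
```

Since the extra args land after the defaults, an operator value such as `--permission-mode acceptEdits` ends up as a second, later occurrence of the flag, which is how it can take precedence over the read-only default.
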
**Alternate backends (example only, not validated here).** Because the subprocess env forwards
`ANTHROPIC_*` and `CLAUDE_*`, you can point the same engine at a non-Anthropic backend by setting
`ANTHROPIC_BASE_URL` (and `ANTHROPIC_AUTH_TOKEN`/`ANTHROPIC_API_KEY`) to an **Anthropic-API-compatible
proxy** — e.g. [claude-code-router](https://github.com/musistudio/claude-code-router) or LiteLLM in
front of Ollama — to run *Ollama models through Claude Code's harness* and compare it against the
native majordomo loop. Whether tool-use survives a given proxy/backend varies, so this is documented
as an example, not wired or tested here.

> **The Pro/Max path is dogfooded but otherwise lightly tested.** `claude-code/sonnet` now runs on
> gadfly's own PRs (see `.gitea/workflows/adversarial-review.yml`), but treat the engine as new —
> and note that subscription auth in automated CI is a gray area in Anthropic's terms. `auto`
> specialist selection and the `delegate_investigation` worker are majordomo-only and are skipped
> with this engine (Claude Code does its own legwork).

### Endpoint aliases via env vars

For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by
name with env vars and then reference `name/model` in `GADFLY_MODEL`/`GADFLY_MODELS`:

```sh
# http-capable (Gadfly-native) — base URL used verbatim, so plaintext LAN works:
GADFLY_ENDPOINT_BIGBOX="ollama|http://192.168.1.50:11434"
GADFLY_ENDPOINT_GPU="openai|http://gpu.lan:8000/v1|sk-local"
GADFLY_ENDPOINT_M1="foreman|http://foreman-m1:8080|tok" # native-Ollama queue daemon
GADFLY_MODELS="bigbox/qwen2.5-coder:7b,gpu/llama3.1,m1/qwen3:14b"

# pure spec alias (a model, or a failover chain):
GADFLY_ALIAS_FAST="bigbox/qwen2.5-coder:7b,ollama-cloud/gpt-oss:120b-cloud"
GADFLY_MODEL="fast"
```

`<NAME>` is lowercased to form the registry name (`GADFLY_ENDPOINT_BIGBOX` → `bigbox`). This
is the same idea as majordomo's built-in **`LLM_*` env DSNs** (`LLM_BIGBOX=ollama://tok@host`,
`LLM_M1=foreman://tok@host`), which Gadfly also honors — but those are **HTTPS-only**, so for a
plaintext local Ollama or `http://` foreman use `GADFLY_ENDPOINT_*` instead.

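As an illustration of the naming rule, the sketch below turns `GADFLY_ENDPOINT_*` variables into a registry keyed by the lowercased name. The `provider|base_url|key` layout is inferred from the examples above, not from a documented grammar, and `Endpoint` and `parse_endpoints` are hypothetical names.

```python
from typing import Mapping, NamedTuple, Optional

PREFIX = "GADFLY_ENDPOINT_"


class Endpoint(NamedTuple):
    provider: str
    base_url: str
    api_key: Optional[str]


def parse_endpoints(env: Mapping[str, str]) -> dict[str, Endpoint]:
    """Build a name -> endpoint registry from GADFLY_ENDPOINT_<NAME> variables."""
    registry: dict[str, Endpoint] = {}
    for key, value in env.items():
        if not key.startswith(PREFIX) or not value:
            continue
        name = key[len(PREFIX):].lower()  # GADFLY_ENDPOINT_BIGBOX -> bigbox
        parts = value.split("|", 2)       # assumed layout: provider|base_url[|key]
        if len(parts) < 2:
            raise ValueError(f"{key}: expected 'provider|base_url[|key]'")
        api_key = parts[2] if len(parts) == 3 and parts[2] else None
        registry[name] = Endpoint(parts[0], parts[1], api_key)
    return registry
```

A model spec such as `bigbox/qwen2.5-coder:7b` would then look up `bigbox` in this registry to find its provider and base URL.
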
> **Gitea Actions note:** repo `vars`/`secrets` aren't auto-exposed as env — add each alias to
> the stub workflow's `env:` block, e.g. `GADFLY_ENDPOINT_BIGBOX: ${{ vars.GADFLY_ENDPOINT_BIGBOX }}`.

## Specialists (the review swarm)

Instead of one generic reviewer, Gadfly runs a **suite of specialists** — each a focused lens
with its own review (+recheck) pass — and merges them into **one comment**, a collapsible
section per lens, led by an overall verdict (the worst across lenses; the optional
`improvements` lens never escalates it).

**Default suite** (when nothing is configured):
`security`, `correctness`, `maintainability` (code cleanliness), `performance`, `error-handling`.

**Also built in** (opt-in by name): `tests`, `docs`, `conventions`, and `improvements`
(strict & quiet — at most 1–2 high-value, non-blocking suggestions, silent otherwise).

Select which run with **`GADFLY_SPECIALISTS`** (comma-separated names, or `all`):

```yaml
GADFLY_SPECIALISTS: "security,correctness,maintainability,tests"
```

**Define your own** — two ways, which compose (env overrides file overrides built-ins):

```yaml
# 1. env: GADFLY_SPECIALIST_<NAME>="<focus>" (also overrides a built-in by reusing its name)
GADFLY_SPECIALIST_MIGRATIONS: "Review DB migrations for destructive or unindexed changes."
GADFLY_SPECIALISTS: "security,correctness,migrations"
```

```yaml
# 2. a repo .gadfly.yml at the repo root (version-controlled). See examples/.gadfly.yml:
specialists: [security, correctness, maintainability, migrations]
define:
  - name: migrations
    title: "🗃️ DB migrations"
    focus: "Review schema migrations for destructive ops, missing indexes, table locks."
```

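The precedence rule (env overrides file overrides built-ins) amounts to a layered dictionary merge. The sketch below is a hedged illustration: `merge_specialists` is a hypothetical helper, the focus strings in any call are placeholders, and lowercasing the env suffix to get the lens name is inferred from the `GADFLY_SPECIALIST_MIGRATIONS` example.

```python
from typing import Mapping

ENV_PREFIX = "GADFLY_SPECIALIST_"


def merge_specialists(
    builtins: Mapping[str, str],
    from_file: Mapping[str, str],
    env: Mapping[str, str],
) -> dict[str, str]:
    """Layer lens definitions: built-ins first, then .gadfly.yml, then env on top."""
    merged = dict(builtins)
    merged.update(from_file)
    for key, focus in env.items():
        # The trailing underscore keeps GADFLY_SPECIALISTS (the selection list)
        # from being mistaken for a lens definition.
        if key.startswith(ENV_PREFIX) and focus:
            merged[key[len(ENV_PREFIX):].lower()] = focus
    return merged
```

Reusing a built-in's name in either layer replaces its focus, which matches the note that an env definition can override a built-in lens.
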
**Dynamic selection (`auto`):** set `GADFLY_SPECIALISTS: auto` and a selector model reads the
changed files + PR description and picks only the lenses that materially apply (and may invent
an ad-hoc one — e.g. a "migrations" lens for a schema change). The selector is
`GADFLY_SELECTOR_MODEL` if set (a cheap tier is ideal), else the review model. The selection is
capped and de-duplicated, and Gadfly falls back to the default suite if selection fails.

**Worker-tier delegation:** set `GADFLY_WORKER_MODEL` (a cheap/fast model) to give every
reviewer a `delegate_investigation` tool — it offloads mechanical legwork (trace all callers,
gather every usage, check a pattern across files) to a worker sub-agent that returns a concise,
evidence-cited digest, so the expensive model reasons over summaries instead of raw file dumps.
When unset, there is no delegation (the current behavior).

> **Cost:** each specialist is its own review+recheck, so cost ≈ *specialists × models × 2*.
> The default suite runs on a **single** model. Trim with `GADFLY_SPECIALISTS`, let `auto` pick
> only what a diff needs, and point heavy legwork at a cheap `GADFLY_WORKER_MODEL`.

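The cost rule of thumb above is simple enough to work through in code. `estimated_passes` is a hypothetical helper that counts model passes, not tokens or money; the `recheck` flag is an assumption about how skipping the recheck pass would change the factor of 2.

```python
def estimated_passes(specialists: int, models: int, recheck: bool = True) -> int:
    """Approximate number of model passes: one review (+ one recheck) per lens per model."""
    return specialists * models * (2 if recheck else 1)
```

For the five-lens default suite on one model this gives 10 passes; the same suite across three models gives 30, which is why trimming lenses or letting `auto` choose them matters.
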
### Concurrency (per-provider lanes)

With multiple models, each **provider** is its own lane and lanes run in **parallel**, so a fast
cloud provider isn't stuck behind a slow local box. Within a lane, at most `cap` models run at
once — `cap` comes from `GADFLY_PROVIDER_CONCURRENCY` (a `provider=N` map) else `GADFLY_CONCURRENCY`
(default `1`). The timeout is **per-lens** (`GADFLY_TIMEOUT_SECS`), so a slow model on one lens
can't starve the others.

```yaml
# One local box (serial — it serves one model at a time) + 3 cloud reviews at once,
# both lanes running concurrently:
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3,m1pro=1"
GADFLY_MODELS: "m1pro/qwen3:14b,qwen3-coder:480b-cloud,gpt-oss:120b-cloud"
```

A model's provider is the spec's first segment (`m1pro/…` → `m1pro`), or `GADFLY_PROVIDER`/
`ollama-cloud` for a bare id. Default (`cap 1`) keeps a single-provider pool fully sequential.

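The lane assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the scheduler Gadfly ships: the function names are hypothetical, and the sketch only groups models and looks up caps without running anything.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def parse_provider_map(raw: str) -> dict[str, int]:
    """Parse a 'provider=N,provider=N' string into a dict of caps."""
    caps: dict[str, int] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        provider, sep, n = item.partition("=")
        if not sep:
            raise ValueError(f"expected provider=N, got {item!r}")
        caps[provider.strip()] = int(n)
    return caps


def provider_of(spec: str, default_provider: str = "ollama-cloud") -> str:
    """First segment of a provider/model spec, or the default for a bare id."""
    head, sep, _ = spec.partition("/")
    return head if sep else default_provider


def plan_lanes(
    models: Iterable[str],
    provider_caps: Mapping[str, int],
    default_cap: int = 1,
    default_provider: str = "ollama-cloud",
) -> dict[str, tuple[int, list[str]]]:
    """Group models into per-provider lanes, each paired with its concurrency cap."""
    lanes: dict[str, list[str]] = defaultdict(list)
    for spec in models:
        lanes[provider_of(spec, default_provider)].append(spec)
    return {p: (provider_caps.get(p, default_cap), specs) for p, specs in lanes.items()}
```

Applied to the YAML example, the `m1pro` lane holds one model with cap 1 and the `ollama-cloud` lane holds the two bare ids with cap 3.
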
**Lens fan-out (within a model).** By default the specialist lenses run **sequentially** inside
each model (`GADFLY_LENS_CONCURRENCY=1`). Raise it to overlap the independent per-lens
review+recheck passes — the model then posts its consolidated comment as soon as its lenses
finish (so with sequential models, results stream in per model and per-model timings stay
clean). Like the model cap, it's **per-provider configurable**: `GADFLY_PROVIDER_LENS_CONCURRENCY`
takes a `provider=N` map keyed by the **same provider lanes** as `GADFLY_PROVIDER_CONCURRENCY`,
falling back to the `GADFLY_LENS_CONCURRENCY` scalar (default `1`). **It multiplies with the
model cap:** total in-flight requests ≈ *models-at-once × lenses-at-once*, so to fan lenses out
without oversubscribing a backend, keep its model cap low and raise its lens cap:

```yaml
# Per provider: cloud runs one model at a time but fans its 3 lenses out (3 concurrent requests);
# the slow local box stays fully serial. Both provider lanes still run in parallel.
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=1,m1=1"
GADFLY_PROVIDER_LENS_CONCURRENCY: "ollama-cloud=3,m1=1"
GADFLY_SPECIALISTS: "security,correctness,error-handling"
```

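The multiplication of the two caps can be checked with a short calculation. `in_flight` is a hypothetical helper that takes already-parsed `provider=N` maps; it returns an upper bound per provider and ignores the fact that a lane can never run more lenses than are configured.

```python
from typing import Mapping


def in_flight(
    model_caps: Mapping[str, int],
    lens_caps: Mapping[str, int],
    default_model_cap: int = 1,
    default_lens_cap: int = 1,
) -> dict[str, int]:
    """Upper bound on concurrent requests per provider: models-at-once x lenses-at-once."""
    providers = set(model_caps) | set(lens_caps)
    return {
        p: model_caps.get(p, default_model_cap) * lens_caps.get(p, default_lens_cap)
        for p in providers
    }
```

For the configuration above the bound is 3 requests against `ollama-cloud` and 1 against `m1`. Raising both caps to 3 for the same provider would allow up to 9, which is the oversubscription the text warns about.
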
### Live status board

When several models (each with several lenses) review a PR, the individual findings land in
**one comment per model** — but while that's in flight all you'd see is a row of
`⏳ Reviewing…` placeholders. So Gadfly also upserts **one consolidated status-board comment**
that aggregates every model's per-lens progress as it happens:

```
## 🪰 Gadfly — live review status
1/3 reviewers finished · updated 2026-06-27 18:14:56Z

#### `glm-5.2:cloud` · ollama-cloud — ⏳ 2/4 lenses
- ✅ security — No material issues found
- 🔄 correctness — running
- ⏸️ performance — queued
…
```

Each model process publishes its lenses (queued → running → finished + verdict) to a small
JSON file, and a background renderer in `entrypoint.sh` re-renders + upserts the single comment
every `GADFLY_STATUS_POLL_SECS` (default 12s) until the swarm finishes. The board is advisory,
best-effort, and entirely separate from the per-model findings comments, which are unaffected by it.
Turn it off with `GADFLY_STATUS_BOARD=0`.

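A rough sketch of the rendering step for one model is shown below. It is not the logic in `scripts/status-board.sh`: the status dictionary layout (as it might be loaded from the JSON file), the field names, and the use of ✅ as the header marker once all lenses finish are assumptions. The line format mirrors the sample board above.

```python
DASH = "\u2014"  # the separator character used in the sample board
ICONS = {"finished": "✅", "running": "🔄", "queued": "⏸️"}


def render_model(status: dict) -> str:
    """Render one model's block of the status board from a parsed status dict."""
    lenses = status["lenses"]
    done = sum(1 for lens in lenses if lens["state"] == "finished")
    marker = ICONS["finished"] if lenses and done == len(lenses) else "⏳"
    lines = [
        f"#### `{status['model']}` · {status['provider']} {DASH} {marker} {done}/{len(lenses)} lenses"
    ]
    for lens in lenses:
        state = lens["state"]
        # Finished lenses show their verdict; the others show their state.
        detail = lens.get("verdict", "finished") if state == "finished" else state
        lines.append(f"- {ICONS[state]} {lens['name']} {DASH} {detail}")
    return "\n".join(lines)
```

Keeping the renderer a pure function of the status data fits the polling design: each tick re-reads the files and re-renders the whole comment, so a missed update is corrected on the next pass.
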
### Triggers

1. A **new/reopened/ready** non-draft PR — automatic.
2. Commenting **`@gadfly review`** on a PR — re-review on demand (gated to allowed users).
3. **workflow_dispatch** — manual, with a `pr_number` input.

(Pushing new commits does *not* auto-re-review — comment `@gadfly review` after pushing
fixes. This keeps usage down.)

> **Comment trigger needs the workflow on your default branch.** Gitea runs `issue_comment`
> workflows from the **default branch**, so `@gadfly review` only works once this stub is
> merged to `main` (the `pull_request` auto-trigger works from the PR branch immediately).
>
> **Security:** the example stubs gate the comment trigger with a job-level
> `if: github.event_name != 'issue_comment' || github.actor == '<you>'` so an untrusted
> commenter can't start a secret-bearing run — edit it to your maintainers and keep it in
> sync with `GADFLY_ALLOWED_USERS` (the in-container check). `@gadfly review` is plain-text
> matched (configurable via `GADFLY_TRIGGER_PHRASE`), so no bot account is required; comments
> post as `gitea-actions`.

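The in-container comment gate can be approximated with the sketch below. It is a guess at the shape of the check, not the code in `entrypoint.sh`: `should_review` is a hypothetical name, the match is assumed to be a case-sensitive substring test, and collaborator status is passed in as a flag rather than looked up through the Gitea API.

```python
def should_review(
    comment: str,
    actor: str,
    allowed_users: str = "",
    is_collaborator: bool = False,
    phrase: str = "@gadfly review",
) -> bool:
    """Decide whether a PR comment should start a re-review."""
    if phrase not in comment:
        return False
    allowed = {u.strip() for u in allowed_users.split(",") if u.strip()}
    if allowed:
        # An explicit allow-list wins over collaborator status.
        return actor in allowed
    # Empty allow-list: any repo collaborator may trigger.
    return is_collaborator
```

This check complements the job-level `if:` in the workflow rather than replacing it: the workflow condition stops a secret-bearing run from starting at all, while this one decides what the container does once it runs.
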
## How it's packaged

```
cmd/gadfly/                       the agentic reviewer binary (majordomo + Ollama Cloud); zero deps beyond stdlib + majordomo
scripts/run.sh                    fetches the PR diff, runs the reviewer, upserts one labeled comment
scripts/status-board.sh           renders + upserts the single live status-board comment (per-lens progress)
scripts/system-prompt.txt         the reviewer persona + verification discipline
entrypoint.sh                     the container brains: trigger gating, clone, model loop (logic lives here, not in YAML)
Dockerfile                        multi-stage; build-time module creds (BuildKit secrets) never reach the final image
.gitea/workflows/build-image.yml  push to main → :latest; tag v* → :<tag> + :latest
examples/                         the ~15-line stub a consuming repo drops in
```

The image is published to `gitea.stevedudenhoeffer.com/steve/gadfly`. Every push to `main`
rebuilds and republishes `:latest` (plus `:sha-<short>`); pushing a `v*` tag publishes that
pinned version (plus `:latest`). Pin full-stub consumers to a `:vN` image tag for stability, or track
`:latest` to ride main. **Reusable-workflow consumers should pin the workflow ref to an immutable
`review-reusable.yml@<sha>`** — long-lived act_runners *cache the workflow file by ref*, so a moved tag
(`@v1`) or `@main` is often **not** re-fetched and silently runs a stale copy. A fresh `@<sha>` is the
only reliable way to roll out a *structural* change to the reusable workflow.

### Central config via variables

So you don't have to re-pin every consumer just to retune the swarm, the reusable workflow resolves
its config at **runtime** — `with:` input → owner **user/org-level variable** → image default — and
variables are injected per-run (not part of the cached file), so changing one variable propagates to
every consumer on its next review **without** a re-pin or a tag move:

| Variable (user/org scope) | Sets |
|---|---|
| `GADFLY_DEFAULT_MODELS` | `GADFLY_MODELS` (csv) |
| `GADFLY_DEFAULT_SPECIALISTS` | the lens suite |
| `GADFLY_DEFAULT_PROVIDER_CONCURRENCY` | models-at-once per provider |
| `GADFLY_DEFAULT_PROVIDER_LENS_CONCURRENCY` | lenses-at-once per provider |
| `GADFLY_ENDPOINT_RAGNAROS` | a named endpoint, e.g. `llamaswap\|https://host` |

Adding a *new* named endpoint still needs a one-line edit to the reusable workflow (Gitea can't
auto-expose arbitrary `vars.GADFLY_ENDPOINT_*`); the values of already-wired ones are pure variables.

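The three-step resolution order can be expressed as a first-non-empty lookup. This is a sketch of the idea only: `resolve_setting` is a hypothetical helper, and treating an empty string the same as "unset" is an assumption about how omitted inputs and variables behave.

```python
from typing import Optional


def resolve_setting(
    with_input: Optional[str],
    owner_variable: Optional[str],
    image_default: str,
) -> str:
    """Pick the first non-empty value: with: input, then owner variable, then image default."""
    for candidate in (with_input, owner_variable):
        if candidate:
            return candidate
    return image_default
```

A consumer that omits `with:` falls through to the owner variable, which is why editing that single variable retunes every such consumer on its next run.
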
## Configuration (advanced)

The reviewer binary reads these (the stub/entrypoint set sane defaults):

| Env | Default | Meaning |
|-----|---------|---------|
| `GADFLY_MODEL` | — | model id, or `provider/model` spec, or majordomo alias/chain |
| `GADFLY_PROVIDER` | `ollama-cloud` | provider prefix for a bare model id |
| `GADFLY_BASE_URL` | — | override endpoint (OpenAI/Ollama-compatible servers) |
| `GADFLY_API_KEY` | — | provider key; falls back to the provider's standard env |
| `claude-code` model id | — | route a model through the bundled Claude Code CLI (`claude-code` / `claude-code/<model>`); see [Claude Code engine](#claude-code-engine-claude-code) for its `GADFLY_CLAUDE_*` knobs |
| `GADFLY_SPECIALISTS` | default suite | csv of lenses, `all`, or `auto` (dynamic selection) |
| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
| `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |
| `GADFLY_WORKER_MAX_STEPS` | 8 | tool-step cap for a delegated worker run |
| `GADFLY_CONCURRENCY` | 1 | default max models run at once **per provider** |
| `GADFLY_PROVIDER_CONCURRENCY` | — | per-provider overrides, e.g. `ollama-cloud=3,m1pro=1` |
| `GADFLY_LENS_CONCURRENCY` | 1 | specialist lenses run at once **within a model** (× model cap = total in-flight) |
| `GADFLY_PROVIDER_LENS_CONCURRENCY` | — | per-provider lens overrides, same lanes as `GADFLY_PROVIDER_CONCURRENCY`, e.g. `ollama-cloud=3,m1=1` |
| `GADFLY_MAX_STEPS` | 24 | review-pass tool-step cap |
| `GADFLY_TIMEOUT_SECS` | 300 | deadline **per specialist lens** (review+recheck) |
| `GADFLY_RECHECK` | on | set `0`/`false` to skip the recheck pass |
| `GADFLY_RECHECK_MAX_STEPS` | 16 | recheck-pass step cap |
| `GADFLY_MAX_DIFF_CHARS` | 60000 | diff chars embedded in the prompt (full diff via `get_diff`) |
| `GADFLY_STATUS_BOARD` | on | set `0` to disable the live status-board comment |
| `GADFLY_STATUS_POLL_SECS` | 12 | how often the status board re-renders/upserts |
| `GADFLY_TRIGGER_PHRASE` | `@gadfly review` | comment phrase that re-triggers |
| `GADFLY_ALLOWED_USERS` | *(collaborators)* | comma-separated allow-list for comment triggers |
| `GADFLY_FINDINGS_URL` | — | gadfly-reports store base URL; set to enable findings telemetry (off when empty) |
| `GADFLY_FINDINGS_TOKEN` | — | bearer token for the gadfly-reports store (sent as `Authorization: Bearer …`) |
| `GADFLY_REPO` | *(from `GITEA_API`)* | `owner/repo` slug stamped on emitted runs/findings (set by `entrypoint.sh`) |
| `GADFLY_PR` | *(from event)* | PR number stamped on emitted runs/findings (set by `entrypoint.sh`) |

## Findings telemetry (optional)

Gadfly can record what it found so model quality can be tracked over time. It is
**off by default** and purely advisory: set **`GADFLY_FINDINGS_URL`** to a
[gadfly-reports](https://gitea.stevedudenhoeffer.com/steve/gadfly-reports) store base URL and,
after each review, the binary best-effort `POST`s the run (`/runs`) and the
findings it surfaced (`/reports`) to that store. Add **`GADFLY_FINDINGS_TOKEN`**
to send an `Authorization: Bearer …` header. `entrypoint.sh` supplies the run
context (`GADFLY_REPO`, `GADFLY_PR`) automatically.

Findings are extracted heuristically from each lens's markdown — a `path:line`
reference anchors a finding, titled by the nearest preceding heading / numbered
item / bold lead-in. A lens whose verdict is **"No material issues found"**
emits **no** findings: its `path:line` references are verification notes
("verified X is safe"), not problems, so extracting them would record false
positives and unfairly penalize thorough clean-pass reviewers. The emit is
strictly best-effort: a short (~10s) timeout, any error (or a non-2xx response)
is logged to stderr only, and it **never** changes the review output or the exit
code.

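The extraction heuristic described above might look roughly like the sketch below. It is an approximation for illustration, not the extractor in the binary: the regular expressions, the `extract_findings` name, and the output dictionary shape are all assumptions, and real lens output will contain cases this does not handle.

```python
import re

REF = re.compile(r"([\w./-]+\.\w+):(\d+)")          # path:line, e.g. cmd/main.go:42
HEADING = re.compile(r"^\s*#{1,6}\s+(.+)$")          # markdown heading
NUMBERED = re.compile(r"^\s*\d+[.)]\s+(.+)$")        # numbered list item
BOLD = re.compile(r"^\s*(?:[-*]\s+)?\*\*(.+?)\*\*")  # bold lead-in, optionally bulleted


def clean(text: str) -> str:
    """Strip emphasis and code markers from a title."""
    return re.sub(r"[*`]", "", text).strip()


def extract_findings(markdown: str, verdict: str) -> list[dict]:
    """Collect path:line references, each titled by the nearest preceding title line."""
    if verdict.strip().lower().startswith("no material issues"):
        return []  # clean pass: references are verification notes, not problems
    findings: list[dict] = []
    title = ""
    for line in markdown.splitlines():
        for pattern in (HEADING, NUMBERED, BOLD):
            match = pattern.match(line)
            if match:
                title = clean(match.group(1))
                break
        for path, lineno in REF.findall(line):
            findings.append({"title": title, "path": path, "line": int(lineno)})
    return findings
```

Checking the verdict before scanning is what keeps a thorough clean-pass review, full of "verified X is safe" references, from being recorded as a list of problems.
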
## Building locally

```sh
go build ./cmd/gadfly # needs read access to the private majordomo module
go test ./...
```

## License

MIT — see [LICENSE](LICENSE).