4654036dea
Build & push image / build-and-push (pull_request) Successful in 6s
From PR #10's own review (maintainability/perf lenses): examples/README.md hadn't been updated for the default swarm, and CLAUDE.md's 'keep the default model count low' cost guidance read as contradicting the new heavy default. Clarify that the IMAGE default stays minimal while the REUSABLE ships an opinionated heavier default consumers inherit/override. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
148 lines
9.4 KiB
Markdown
148 lines
9.4 KiB
Markdown
# Gadfly — Developer Guide
|
||
|
||
Gadfly (🪰) is an **agentic adversarial code reviewer** that runs in Gitea Actions. On a pull
|
||
request it reads the *checked-out repository* with read-only tools, hunts for real problems,
|
||
verifies each one against the actual code, and posts its findings as a comment. It is
|
||
**advisory only** — it never blocks a merge.
|
||
|
||
> This is a public, **vibe-coded** project (built largely by an AI agent). Keep that framing
|
||
> honest in the README; don't oversell it.
|
||
|
||
## Project goals (keep changes aligned to these)
|
||
|
||
1. **Find *real* problems, not nits.** The whole point of the agentic tools + two-pass
|
||
recheck is to kill diff-only false positives. Anything that raises the false-positive rate
|
||
(or removes verification) works against the project.
|
||
2. **Advisory, never blocking.** Gadfly must never fail a CI job for review *content*, never
|
||
merge, never deploy. Non-zero exit only on usage/config errors; even then run.sh posts a
|
||
notice rather than failing. Do not add it to branch-protection required checks.
|
||
3. **Easy to turn on for any repo.** Consumers should need only a ~15-line stub workflow + a
|
||
couple of secrets/vars. All real logic lives in the image (`entrypoint.sh`), not in the
|
||
consumer's YAML (Gitea's act_runner has weak YAML expression support).
|
||
4. **Provider-agnostic.** Powered by [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo),
|
||
so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any
|
||
OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
|
||
5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib + majordomo. Keep
|
||
it that way — no heavyweight deps, no coupling to any one consumer repo (e.g. mort).
|
||
|
||
## Architecture
|
||
|
||
```
|
||
cmd/gadfly/ the reviewer binary — pure producer of review markdown (stdout)
|
||
main.go orchestration: loop specialists, each a review pass + adversarial recheck
|
||
engine.go reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
|
||
specialists.go specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
|
||
auto.go dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
|
||
delegate.go worker-tier delegate_investigation tool (cheap sub-agent does legwork)
|
||
consolidate.go verdict parsing + one-comment consolidation (a section per specialist)
|
||
model.go provider/model + selector + worker resolution (majordomo.Parse) + endpoint aliases
|
||
tools.go the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
|
||
recheck.go second-pass verification prompt + verdict recompute
|
||
*_test.go sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests
|
||
scripts/run.sh fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
|
||
scripts/status-board.sh render+upsert ONE live status-board comment (per-model/per-lens progress)
|
||
scripts/system-prompt.txt the reviewer persona + verification discipline (generic, not repo-specific)
|
||
entrypoint.sh container brains: trigger gating, PR clone, model loop (the logic that
|
||
used to live in workflow YAML)
|
||
Dockerfile multi-stage; private-module creds via BuildKit secrets never reach the final image
|
||
.gitea/workflows/build-image.yml push main → :latest; tag v* → :<tag>+:latest; PR → build-only
|
||
.gitea/workflows/review-reusable.yml reusable (workflow_call) review job; ships the DEFAULT swarm as
|
||
input defaults (3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite;
|
||
claude models serial, 5 lenses each) so consumers inherit it by omitting `with:`. Consumers subscribe
|
||
with an ~8-line caller forwarding only the secrets the reviewer needs (Phase 4);
|
||
gadfly's own adversarial-review.yml is a thin caller of it (dogfoods the path).
|
||
examples/ copy-paste consumer stub workflows for different providers
|
||
```
|
||
|
||
**Data flow:** consumer stub workflow → container `entrypoint.sh` (gate + clone) →
|
||
`scripts/run.sh` (per model) → `cmd/gadfly` binary (agentic review) → markdown → run.sh
|
||
upserts a PR comment as `gitea-actions`.
|
||
|
||
**Two passes:** a *review* pass drafts findings; an adversarial *recheck* pass independently
|
||
re-verifies each finding against the code and drops the unconfirmed ones, recomputing the
|
||
verdict. Verdict is one of: `No material issues found` / `Minor issues` / `Blocking issues found`.
|
||
|
||
## Build / test
|
||
|
||
```sh
|
||
go build ./cmd/gadfly # needs read access to the private majordomo module
|
||
go test ./...
|
||
gofmt -l cmd/ # must be clean
|
||
docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .
|
||
```
|
||
|
||
Run it locally against a real diff without CI:
|
||
|
||
```sh
|
||
git -C /path/to/repo diff main > /tmp/x.diff
|
||
GADFLY_PROVIDER=ollama GADFLY_MODEL=qwen2.5-coder:7b \
|
||
GADFLY_REPO_DIR=/path/to/repo GADFLY_DIFF_FILE=/tmp/x.diff \
|
||
GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly
|
||
```
|
||
|
||
## Release / deploy
|
||
|
||
- **Push to `main`** → CI builds and pushes `:latest` (+ `:sha-<short>`).
|
||
- **Tag `v*`** → publishes `:<tag>` (+ `:latest`). Pin consumers to `:vN` for stability.
|
||
- Required CI secrets: `REGISTRY_USER` / `REGISTRY_PASSWORD` (registry push + read access to the
|
||
private majordomo module). Optional `DISCORD_WEBHOOK_URL`.
|
||
|
||
## Configuration
|
||
|
||
The full env reference lives in the **README** (`Specialists`, `Models & providers`,
|
||
`Configuration`). Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`),
|
||
`GADFLY_MODEL`/`GADFLY_MODELS`, `GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via
|
||
`GADFLY_ENDPOINT_<NAME>` / `GADFLY_ALIAS_<NAME>` (http-capable) and majordomo `LLM_*` DSNs
|
||
(HTTPS-only).
|
||
|
||
**Specialists (the swarm):** the reviewer runs a suite of focused lenses, one consolidated
|
||
comment with a section each. Default suite = security/correctness/maintainability/performance/
|
||
error-handling; opt-in built-ins = tests/docs/conventions/improvements. Select via
|
||
`GADFLY_SPECIALISTS` (csv or `all`); define/override via `GADFLY_SPECIALIST_<NAME>` env or a repo
|
||
`.gadfly.yml` (`specialists:` + `define:`). See `cmd/gadfly/specialists.go`. Cost ≈
|
||
specialists × models × 2 passes — the **image/entrypoint** default stays minimal (one model) for
|
||
that reason; the **reusable workflow** (`review-reusable.yml`) deliberately ships a heavier
|
||
opinionated default swarm (3 cloud + Claude Code, 5 lenses) for steve's own fleet, which consumers
|
||
inherit or override per-input.
|
||
**Dynamic `auto`** (`GADFLY_SPECIALISTS=auto`): a selector (`GADFLY_SELECTOR_MODEL` or the review
|
||
model) picks lenses per-diff and may invent ad-hoc ones (`cmd/gadfly/auto.go`). **Worker-tier**
|
||
(`GADFLY_WORKER_MODEL`): a `delegate_investigation` tool offloads grep/read legwork to a cheap
|
||
sub-agent (`cmd/gadfly/delegate.go`).
|
||
|
||
**Tested vs untested:** only the Ollama paths (local + OpenAI-compatible pointed at Ollama)
|
||
are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are
|
||
**untested** (no spend). Keep the README honest about this; update it if that changes.
|
||
|
||
## When making changes — maintenance rules
|
||
|
||
- **Keep the README and `examples/` current.** Any change to env vars, flags, defaults,
|
||
triggers, provider support, or the consumer stub MUST be reflected in `README.md` and the
|
||
relevant files under `examples/` in the *same* change. The README's `Configuration` table,
|
||
the `Models & providers` table, and the example workflows are the contract users rely on —
|
||
stale docs are a bug.
|
||
- **Preserve the advisory-only invariant** (goal #2). If you touch exit codes or the workflow,
|
||
re-confirm a review can never fail/block a consumer's CI.
|
||
- **Don't add mort-specific (or any single-consumer) assumptions** to the binary or system
|
||
prompt. The system prompt is intentionally generic; repo-specific conventions should be
|
||
discovered by the agent at runtime (it can read the repo's own CONTRIBUTING/CLAUDE.md), not
|
||
hardcoded here.
|
||
- **Keep secrets out of image layers.** Private-module creds flow via BuildKit `--mount=type=secret`
|
||
in the build stage only; never bake them into the final image or commit them.
|
||
- Add a test when you add logic (see the `*_test.go` patterns). Keep `gofmt` clean and `go vet` quiet.
|
||
|
||
## Lessons
|
||
|
||
- majordomo's `LLM_*` env DSNs are **HTTPS-only** (`DSN.BaseURL()` forces `https://`), so they
|
||
can't express a plaintext local Ollama. That's why Gadfly adds the http-capable
|
||
`GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]"` mechanism (see `cmd/gadfly/model.go`).
|
||
- Gitea `vars`/`secrets` are **not** auto-exposed as env in a job — the consumer stub must map
|
||
each one explicitly in its `env:` block (dynamic alias names can't be auto-enumerated).
|
||
- **`uses: docker://…:latest` is CACHED by act_runner** — a freshly-pushed `:latest` is often
|
||
NOT re-pulled, so the job silently runs the previous image. For a run that must use a specific
|
||
build (e.g. validating a just-pushed fix), pin the consumer stub to the immutable
|
||
`:sha-<short>` tag the build publishes, not `:latest`.
|
||
- **Concurrency is per-provider** (`entrypoint.sh`): each provider is a lane, lanes run in
|
||
parallel, `cap` (from `GADFLY_PROVIDER_CONCURRENCY` else `GADFLY_CONCURRENCY`, default 1) bounds
|
||
models-at-once within a lane. The review timeout (`GADFLY_TIMEOUT_SECS`) is **per-lens**, not
|
||
shared across the suite — a slow model can't starve later lenses (the original timeout bug).
|