Gadfly — Developer Guide
Gadfly (🪰) is an agentic adversarial code reviewer that runs in Gitea Actions. On a pull request it reads the checked-out repository with read-only tools, hunts for real problems, verifies each one against the actual code, and posts its findings as a comment. It is advisory only — it never blocks a merge.
This is a public, vibe-coded project (built largely by an AI agent). Keep that framing honest in the README; don't oversell it.
Project goals (keep changes aligned to these)
- Find real problems, not nits. The whole point of the agentic tools + two-pass recheck is to kill diff-only false positives. Anything that raises the false-positive rate (or removes verification) works against the project.
- Advisory, never blocking. Gadfly must never fail a CI job for review content, never merge, never deploy. Non-zero exit only on usage/config errors; even then run.sh posts a notice rather than failing. Do not add it to branch-protection required checks.
- Easy to turn on for any repo. Consumers should need only a ~15-line stub workflow + a couple of secrets/vars. All real logic lives in the image (entrypoint.sh), not in the consumer's YAML (Gitea's act_runner has weak YAML expression support).
- Provider-agnostic. Powered by majordomo, so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
- Portable & self-contained. cmd/gadfly depends only on the Go stdlib + majordomo. Keep it that way: no heavyweight deps, no coupling to any one consumer repo (e.g. mort).
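The advisory-only goal can be pictured as a small exit-code policy. This is a minimal sketch, not Gadfly's actual code: `errUsage`, `exitCode`, and the status value 2 are hypothetical names and choices made up for illustration. The only property taken from the goals above is that review content never produces a non-zero exit.

```go
package main

import (
	"errors"
	"fmt"
)

// errUsage marks a usage/config problem (hypothetical sentinel for this sketch).
var errUsage = errors.New("usage or configuration error")

// exitCode maps the outcome of a run to a process exit status.
// The verdict is deliberately ignored: review content must never fail a job.
// Only a usage/config error yields a non-zero status (2 is an arbitrary choice).
func exitCode(verdict string, err error) int {
	if errors.Is(err, errUsage) {
		return 2
	}
	return 0
}

func main() {
	// Even the harshest verdict exits 0.
	fmt.Println(exitCode("Blocking issues found", nil))
	// A wrapped usage error is the only non-zero case.
	fmt.Println(exitCode("", fmt.Errorf("missing GADFLY_MODEL: %w", errUsage)))
}
```

Keeping the verdict out of the decision entirely, rather than special-casing it, makes the invariant easy to re-check when exit handling changes.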
Architecture
cmd/gadfly/                           the reviewer binary: pure producer of review markdown (stdout)
  main.go                             orchestration: loop specialists, each a review pass + adversarial recheck
  engine.go                           reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
  specialists.go                      specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
  auto.go                             dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
  delegate.go                         worker-tier delegate_investigation tool (cheap sub-agent does legwork)
  consolidate.go                      verdict parsing + one-comment consolidation (a section per specialist)
  model.go                            provider/model + selector + worker resolution (majordomo.Parse) + endpoint aliases
  tools.go                            the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
  recheck.go                          second-pass verification prompt + verdict recompute
  *_test.go                           sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests
scripts/run.sh                        fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
scripts/status-board.sh               render+upsert ONE live status-board comment (per-model/per-lens progress)
scripts/system-prompt.txt             the reviewer persona + verification discipline (generic, not repo-specific)
entrypoint.sh                         container brains: trigger gating, PR clone, model loop (the logic that
                                      used to live in workflow YAML)
Dockerfile                            multi-stage; private-module creds via BuildKit secrets never reach the final image
.gitea/workflows/build-image.yml      push main → :latest; tag v* → :<tag>+:latest; PR → build-only
.gitea/workflows/review-reusable.yml  reusable (workflow_call) review job; ships the DEFAULT swarm as input
                                      defaults (3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite;
                                      claude models serial, 5 lenses each) so consumers inherit it by
                                      omitting `with:`. Consumers subscribe with an ~8-line caller forwarding
                                      only the secrets the reviewer needs (Phase 4); gadfly's own
                                      adversarial-review.yml is a thin caller of it (dogfoods the path).
examples/                             copy-paste consumer stub workflows for different providers
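The read-only tools in tools.go take paths from a model, so some form of sandboxing is implied (the test list mentions sandbox tests). The following is a sketch of one common way to confine a path to the checked-out repo; `resolveInRepo` is a hypothetical helper written for illustration and may not match what tools.go actually does. It does not resolve symlinks, which a real sandbox would also need to consider.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveInRepo joins a tool-supplied relative path onto root and rejects
// anything that would land outside it. Hypothetical helper, not Gadfly's code.
func resolveInRepo(root, rel string) (string, error) {
	if filepath.IsAbs(rel) {
		return "", fmt.Errorf("absolute path not allowed: %q", rel)
	}
	full := filepath.Join(root, rel) // Join also cleans ".." segments
	back, err := filepath.Rel(root, full)
	if err != nil {
		return "", err
	}
	// A cleaned relative path that starts with ".." points above root.
	if back == ".." || strings.HasPrefix(back, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes repository: %q", rel)
	}
	return full, nil
}

func main() {
	for _, p := range []string{"cmd/gadfly/main.go", "../etc/passwd", "a/../b.txt"} {
		full, err := resolveInRepo("/repo", p)
		fmt.Printf("%-22q -> %q, err=%v\n", p, full, err)
	}
}
```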
Data flow: consumer stub workflow → container entrypoint.sh (gate + clone) →
scripts/run.sh (per model) → cmd/gadfly binary (agentic review) → markdown → run.sh
upserts a PR comment as gitea-actions.
Two passes: a review pass drafts findings; an adversarial recheck pass independently
re-verifies each finding against the code and drops the unconfirmed ones, recomputing the
verdict. Verdict is one of: No material issues found / Minor issues / Blocking issues found.
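The drop-and-recompute step can be sketched as below. The `finding` struct and its `Blocking`/`Confirmed` flags are assumptions made for illustration (the real severity model in recheck.go and consolidate.go may differ); only the three verdict strings come from this guide.

```go
package main

import "fmt"

// finding is a hypothetical shape for one drafted issue.
type finding struct {
	Title     string
	Blocking  bool // assumed severity flag; the real model may be richer
	Confirmed bool // set by the adversarial recheck pass
}

// recompute drops unconfirmed findings and derives the verdict from what survives.
func recompute(drafted []finding) ([]finding, string) {
	var kept []finding
	verdict := "No material issues found"
	for _, f := range drafted {
		if !f.Confirmed {
			continue // the recheck could not verify it against the code
		}
		kept = append(kept, f)
		if f.Blocking {
			verdict = "Blocking issues found"
		} else if verdict == "No material issues found" {
			verdict = "Minor issues"
		}
	}
	return kept, verdict
}

func main() {
	drafted := []finding{
		{Title: "possible nil deref", Blocking: true, Confirmed: false},
		{Title: "unchecked error", Blocking: false, Confirmed: true},
	}
	kept, verdict := recompute(drafted)
	fmt.Println(len(kept), verdict) // prints "1 Minor issues"
}
```

In this example the unverified blocking finding is dropped, so the verdict is computed from the surviving minor finding only.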
Build / test
go build ./cmd/gadfly # needs read access to the private majordomo module
go test ./...
gofmt -l cmd/ # must be clean
docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .
Run it locally against a real diff without CI:
git -C /path/to/repo diff main > /tmp/x.diff
GADFLY_PROVIDER=ollama GADFLY_MODEL=qwen2.5-coder:7b \
GADFLY_REPO_DIR=/path/to/repo GADFLY_DIFF_FILE=/tmp/x.diff \
GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly
Release / deploy
- Push to main → CI builds and pushes :latest (+ :sha-<short>).
- Tag v* → publishes :<tag> (+ :latest). Pin consumers to :vN for stability.
- Required CI secrets: REGISTRY_USER / REGISTRY_PASSWORD (registry push + read access to the private majordomo module). Optional DISCORD_WEBHOOK_URL.
Configuration
The full env reference lives in the README (Specialists, Models & providers,
Configuration). Provider selection: GADFLY_PROVIDER (default ollama-cloud),
GADFLY_MODEL/GADFLY_MODELS, GADFLY_BASE_URL, GADFLY_API_KEY. Named endpoint aliases via
GADFLY_ENDPOINT_<NAME> / GADFLY_ALIAS_<NAME> (http-capable) and majordomo LLM_* DSNs
(HTTPS-only).
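A GADFLY_ENDPOINT_<NAME> value has the shape "provider|base-url[|key]". A minimal parsing sketch follows; the `endpoint` type and `parseEndpoint` function are hypothetical names for illustration, and the real parser in cmd/gadfly/model.go may validate differently.

```go
package main

import (
	"fmt"
	"strings"
)

// endpoint is a hypothetical holder for one parsed alias.
type endpoint struct {
	Provider, BaseURL, APIKey string
}

// parseEndpoint splits "provider|base-url[|key]". SplitN with a limit of 3
// keeps any further "|" characters inside the key.
func parseEndpoint(spec string) (endpoint, error) {
	parts := strings.SplitN(spec, "|", 3)
	if len(parts) < 2 {
		return endpoint{}, fmt.Errorf("want provider|base-url[|key], got %q", spec)
	}
	ep := endpoint{
		Provider: strings.TrimSpace(parts[0]),
		BaseURL:  strings.TrimSpace(parts[1]),
	}
	if ep.Provider == "" || ep.BaseURL == "" {
		return endpoint{}, fmt.Errorf("provider and base-url are required in %q", spec)
	}
	if len(parts) == 3 {
		ep.APIKey = parts[2]
	}
	return ep, nil
}

func main() {
	// A plaintext local Ollama, the case the HTTPS-only LLM_* DSNs cannot express.
	ep, err := parseEndpoint("ollama|http://localhost:11434")
	fmt.Printf("%+v err=%v\n", ep, err)
}
```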
Specialists (the swarm): the reviewer runs a suite of focused lenses, one consolidated
comment with a section each. Default suite = security/correctness/maintainability/performance/
error-handling; opt-in built-ins = tests/docs/conventions/improvements. Select via
GADFLY_SPECIALISTS (csv or all); define/override via GADFLY_SPECIALIST_<NAME> env or a repo
.gadfly.yml (specialists: + define:). See cmd/gadfly/specialists.go. Cost ≈
specialists × models × 2 passes — the image/entrypoint default stays minimal (one model) for
that reason; the reusable workflow (review-reusable.yml) deliberately ships a heavier
opinionated default swarm (3 cloud + Claude Code, 5 lenses) for steve's own fleet, which consumers
inherit or override per-input.
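To make the cost formula concrete, here is a rough sketch that resolves a GADFLY_SPECIALISTS-style value and counts passes. `resolve` and `passes` are hypothetical helpers; treating an empty value as the default suite and `all` as every built-in is an assumption, and the `auto` mode and .gadfly.yml overrides are left out. The count is model invocations (review + recheck), not tokens or money.

```go
package main

import (
	"fmt"
	"strings"
)

var (
	defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}
	optIn        = []string{"tests", "docs", "conventions", "improvements"}
)

// resolve turns a GADFLY_SPECIALISTS-style value into a lens list.
// Assumed semantics: "" = default suite, "all" = every built-in, else a csv.
func resolve(value string) []string {
	v := strings.TrimSpace(value)
	switch v {
	case "":
		return append([]string(nil), defaultSuite...)
	case "all":
		return append(append([]string(nil), defaultSuite...), optIn...)
	}
	var out []string
	for _, name := range strings.Split(v, ",") {
		if name = strings.TrimSpace(name); name != "" {
			out = append(out, name)
		}
	}
	return out
}

// passes estimates model invocations: specialists x models x 2 passes.
func passes(specialists, models int) int { return specialists * models * 2 }

func main() {
	lenses := len(resolve(""))
	fmt.Println(passes(lenses, 1)) // image default, one model: prints 10
	fmt.Println(passes(lenses, 6)) // reusable default, 3 cloud + 3 Claude Code: prints 60
}
```

Under these assumptions the reusable workflow's default swarm runs about six times as many passes as the image default, which is why the two defaults are kept separate.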
Dynamic auto (GADFLY_SPECIALISTS=auto): a selector (GADFLY_SELECTOR_MODEL or the review
model) picks lenses per-diff and may invent ad-hoc ones (cmd/gadfly/auto.go). Worker-tier
(GADFLY_WORKER_MODEL): a delegate_investigation tool offloads grep/read legwork to a cheap
sub-agent (cmd/gadfly/delegate.go).
Tested vs untested: only the Ollama paths (local + OpenAI-compatible pointed at Ollama) are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are untested (no spend). Keep the README honest about this; update it if that changes.
When making changes — maintenance rules
- Keep the README and examples/ current. Any change to env vars, flags, defaults, triggers, provider support, or the consumer stub MUST be reflected in README.md and the relevant files under examples/ in the same change. The README's Configuration table, the Models & providers table, and the example workflows are the contract users rely on; stale docs are a bug.
- Preserve the advisory-only invariant (goal #2). If you touch exit codes or the workflow, re-confirm a review can never fail/block a consumer's CI.
- Don't add mort-specific (or any single-consumer) assumptions to the binary or system prompt. The system prompt is intentionally generic; repo-specific conventions should be discovered by the agent at runtime (it can read the repo's own CONTRIBUTING/CLAUDE.md), not hardcoded here.
- Keep secrets out of image layers. Private-module creds flow via BuildKit --mount=type=secret in the build stage only; never bake them into the final image or commit them.
- Add a test when you add logic (see the *_test.go patterns). Keep gofmt clean and go vet quiet.
Lessons
- majordomo's LLM_* env DSNs are HTTPS-only (DSN.BaseURL() forces https://), so they can't express a plaintext local Ollama. That's why Gadfly adds the http-capable GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]" mechanism (see cmd/gadfly/model.go).
- Gitea vars/secrets are not auto-exposed as env in a job; the consumer stub must map each one explicitly in its env: block (dynamic alias names can't be auto-enumerated).
- uses: docker://…:latest is CACHED by act_runner; a freshly-pushed :latest is often NOT re-pulled, so the job silently runs the previous image. For a run that must use a specific build (e.g. validating a just-pushed fix), pin the consumer stub to the immutable :sha-<short> tag the build publishes, not :latest.
- Concurrency is per-provider (entrypoint.sh): each provider is a lane, lanes run in parallel, cap (from GADFLY_PROVIDER_CONCURRENCY else GADFLY_CONCURRENCY, default 1) bounds models-at-once within a lane. The review timeout (GADFLY_TIMEOUT_SECS) is per-lens, not shared across the suite, so a slow model can't starve later lenses (the original timeout bug).
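The lane model from the last lesson can be illustrated in Go, even though the real logic lives in shell in entrypoint.sh. This is only a sketch of the idea: `runLanes` is a hypothetical function, and the provider and model names in `main` are placeholders. Each provider gets its own semaphore, so the cap limits models in flight within a lane while lanes proceed independently.

```go
package main

import (
	"fmt"
	"sync"
)

// runLanes runs every provider's models in its own lane. Lanes run in parallel;
// within a lane at most `limit` models are in flight at once.
func runLanes(lanes map[string][]string, limit int, review func(provider, model string)) {
	if limit < 1 {
		limit = 1 // mirror the documented default cap of 1
	}
	var wg sync.WaitGroup
	for provider, models := range lanes {
		sem := make(chan struct{}, limit) // one semaphore per lane
		for _, model := range models {
			wg.Add(1)
			go func(provider, model string) {
				defer wg.Done()
				sem <- struct{}{}        // acquire a slot in this lane
				defer func() { <-sem }() // release it when the review ends
				review(provider, model)
			}(provider, model)
		}
	}
	wg.Wait()
}

func main() {
	lanes := map[string][]string{
		"ollama-cloud": {"model-a", "model-b"}, // placeholder model names
		"claude-code":  {"sonnet", "opus"},     // placeholder lane name
	}
	runLanes(lanes, 1, func(provider, model string) {
		fmt.Printf("reviewing with %s/%s\n", provider, model)
	})
}
```

Output order varies between runs because the lanes are concurrent; with a cap of 1, the only guarantee is that two models from the same provider never run at the same moment.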