gadfly/CLAUDE.md
Steve Dudenhoeffer 4654036dea
docs: reconcile examples/README + CLAUDE.md with the heavier reusable default
From PR #10's own review (maintainability/perf lenses): examples/README.md
hadn't been updated for the default swarm, and CLAUDE.md's 'keep the default
model count low' cost guidance read as contradicting the new heavy default.
Clarify that the IMAGE default stays minimal while the REUSABLE ships an
opinionated heavier default consumers inherit/override.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 22:18:43 -04:00


Gadfly — Developer Guide

Gadfly (🪰) is an agentic adversarial code reviewer that runs in Gitea Actions. On a pull request it reads the checked-out repository with read-only tools, hunts for real problems, verifies each one against the actual code, and posts its findings as a comment. It is advisory only — it never blocks a merge.

This is a public, vibe-coded project (built largely by an AI agent). Keep that framing honest in the README; don't oversell it.

Project goals (keep changes aligned to these)

  1. Find real problems, not nits. The whole point of the agentic tools + two-pass recheck is to kill diff-only false positives. Anything that raises the false-positive rate (or removes verification) works against the project.
  2. Advisory, never blocking. Gadfly must never fail a CI job for review content, never merge, never deploy. Non-zero exit only on usage/config errors; even then run.sh posts a notice rather than failing. Do not add it to branch-protection required checks.
  3. Easy to turn on for any repo. Consumers should need only a ~15-line stub workflow + a couple of secrets/vars. All real logic lives in the image (entrypoint.sh), not in the consumer's YAML (Gitea's act_runner has weak YAML expression support).
  4. Provider-agnostic. Powered by majordomo, so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
  5. Portable & self-contained. cmd/gadfly depends only on the Go stdlib + majordomo. Keep it that way — no heavyweight deps, no coupling to any one consumer repo (e.g. mort).

Architecture

cmd/gadfly/            the reviewer binary — pure producer of review markdown (stdout)
  main.go              orchestration: loop specialists, each a review pass + adversarial recheck
  engine.go            reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
  specialists.go       specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
  auto.go              dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
  delegate.go          worker-tier delegate_investigation tool (cheap sub-agent does legwork)
  consolidate.go       verdict parsing + one-comment consolidation (a section per specialist)
  model.go             provider/model + selector + worker resolution (majordomo.Parse) + endpoint aliases
  tools.go             the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
  recheck.go           second-pass verification prompt + verdict recompute
  *_test.go            sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests
scripts/run.sh         fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
scripts/status-board.sh    render+upsert ONE live status-board comment (per-model/per-lens progress)
scripts/system-prompt.txt   the reviewer persona + verification discipline (generic, not repo-specific)
entrypoint.sh          container brains: trigger gating, PR clone, model loop (the logic that
                       used to live in workflow YAML)
Dockerfile             multi-stage; private-module creds via BuildKit secrets never reach the final image
.gitea/workflows/build-image.yml   push main → :latest; tag v* → :<tag>+:latest; PR → build-only
.gitea/workflows/review-reusable.yml  reusable (workflow_call) review job; ships the DEFAULT swarm as
                       input defaults (3 cloud + Claude Code sonnet/opus/opus:max, 5-lens suite;
                       claude models serial, 5 lenses each) so consumers inherit it by omitting
                       `with:`. Consumers subscribe with an ~8-line caller forwarding only the
                       secrets the reviewer needs (Phase 4); gadfly's own adversarial-review.yml
                       is a thin caller of it (dogfoods the path).
examples/              copy-paste consumer stub workflows for different providers

Data flow: consumer stub workflow → container entrypoint.sh (gate + clone) → scripts/run.sh (per model) → cmd/gadfly binary (agentic review) → markdown → run.sh upserts a PR comment as gitea-actions.

Two passes: a review pass drafts findings; an adversarial recheck pass independently re-verifies each finding against the code and drops the unconfirmed ones, recomputing the verdict. Verdict is one of: No material issues found / Minor issues / Blocking issues found.
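
The two-pass flow can be sketched as a filter followed by a verdict recompute. The types and the severity-to-verdict mapping here are assumptions made for illustration; only the three verdict strings come from this guide, and the real logic lives in recheck.go and consolidate.go.

```go
package main

import "fmt"

// Severity and Finding are hypothetical types for illustration.
type Severity int

const (
	Minor Severity = iota
	Blocking
)

type Finding struct {
	Title     string
	Severity  Severity
	Confirmed bool // set by the adversarial recheck pass
}

// recheck keeps only the findings the second pass confirmed.
func recheck(draft []Finding) []Finding {
	kept := make([]Finding, 0, len(draft))
	for _, f := range draft {
		if f.Confirmed {
			kept = append(kept, f)
		}
	}
	return kept
}

// verdict recomputes the headline from whatever findings survive.
func verdict(findings []Finding) string {
	if len(findings) == 0 {
		return "No material issues found"
	}
	for _, f := range findings {
		if f.Severity == Blocking {
			return "Blocking issues found"
		}
	}
	return "Minor issues"
}

func main() {
	draft := []Finding{
		{"nil deref in handler", Blocking, false}, // not reproduced on recheck
		{"unchecked error from Close", Minor, true},
	}
	fmt.Println(verdict(draft))          // Blocking issues found
	fmt.Println(verdict(recheck(draft))) // Minor issues
}
```

The example shows why the verdict has to be recomputed after the recheck: dropping one unconfirmed finding changes the headline from blocking to minor.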

Build / test

go build ./cmd/gadfly      # needs read access to the private majordomo module
go test ./...
gofmt -l cmd/               # must be clean
docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .

Run it locally against a real diff without CI:

git -C /path/to/repo diff main > /tmp/x.diff
GADFLY_PROVIDER=ollama GADFLY_MODEL=qwen2.5-coder:7b \
GADFLY_REPO_DIR=/path/to/repo GADFLY_DIFF_FILE=/tmp/x.diff \
GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly

Release / deploy

  • Push to main → CI builds and pushes :latest (+ :sha-<short>).
  • Tag v* → publishes :<tag> (+ :latest). Pin consumers to :vN for stability.
  • Required CI secrets: REGISTRY_USER / REGISTRY_PASSWORD (registry push + read access to the private majordomo module). Optional DISCORD_WEBHOOK_URL.

Configuration

The full env reference lives in the README (Specialists, Models & providers, Configuration). Provider selection: GADFLY_PROVIDER (default ollama-cloud), GADFLY_MODEL/GADFLY_MODELS, GADFLY_BASE_URL, GADFLY_API_KEY. Named endpoint aliases via GADFLY_ENDPOINT_<NAME> / GADFLY_ALIAS_<NAME> (http-capable) and majordomo LLM_* DSNs (HTTPS-only).
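
The Lessons section gives the alias value format as "provider|base-url[|key]". A minimal parsing sketch for that format follows. The `Endpoint` type, `parseEndpoint` and the validation rules are assumptions, not the code in cmd/gadfly/model.go. The example URL uses Ollama's usual local port.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// Endpoint is a hypothetical parsed form of a GADFLY_ENDPOINT_<NAME> value.
type Endpoint struct {
	Provider string
	BaseURL  string
	APIKey   string // empty when the optional third field is omitted
}

// parseEndpoint splits "provider|base-url[|key]" and accepts http or https.
func parseEndpoint(raw string) (Endpoint, error) {
	// SplitN with 3 keeps any further "|" characters inside the key.
	parts := strings.SplitN(raw, "|", 3)
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return Endpoint{}, errors.New(`want "provider|base-url[|key]"`)
	}
	u, err := url.Parse(parts[1])
	if err != nil {
		return Endpoint{}, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return Endpoint{}, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	ep := Endpoint{Provider: parts[0], BaseURL: parts[1]}
	if len(parts) == 3 {
		ep.APIKey = parts[2]
	}
	return ep, nil
}

func main() {
	// A plaintext local endpoint, which an HTTPS-only DSN cannot express.
	ep, err := parseEndpoint("ollama|http://localhost:11434")
	fmt.Println(ep.Provider, ep.BaseURL, ep.APIKey == "", err)
}
```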

Specialists (the swarm): the reviewer runs a suite of focused lenses and posts one consolidated comment with a section for each. Default suite = security/correctness/maintainability/performance/error-handling; opt-in built-ins = tests/docs/conventions/improvements. Select via GADFLY_SPECIALISTS (csv or all); define/override via GADFLY_SPECIALIST_<NAME> env or a repo .gadfly.yml (specialists: + define:). See cmd/gadfly/specialists.go.

Cost ≈ specialists × models × 2 passes. The image/entrypoint default stays minimal (one model) for that reason; the reusable workflow (review-reusable.yml) deliberately ships a heavier, opinionated default swarm (3 cloud + Claude Code, 5 lenses) for Steve's own fleet, which consumers inherit or override per-input.

Dynamic auto (GADFLY_SPECIALISTS=auto): a selector (GADFLY_SELECTOR_MODEL or the review model) picks lenses per-diff and may invent ad-hoc ones (cmd/gadfly/auto.go).

Worker-tier (GADFLY_WORKER_MODEL): a delegate_investigation tool offloads grep/read legwork to a cheap sub-agent (cmd/gadfly/delegate.go).
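
To make the cost rule concrete, here is a small sketch that resolves a GADFLY_SPECIALISTS-style value and applies specialists × models × 2. The lens names come from this guide. The resolution rules (empty means the default suite, all means default plus opt-in) and the function names are assumptions, `auto` is not modeled, and the real logic is in cmd/gadfly/specialists.go. The 6-model figure assumes the Claude Code entries sonnet/opus/opus:max count as three models alongside the three cloud ones.

```go
package main

import (
	"fmt"
	"strings"
)

// Lens names come from this guide; the resolution rules below are assumptions.
var defaultSuite = []string{"security", "correctness", "maintainability", "performance", "error-handling"}
var optIn = []string{"tests", "docs", "conventions", "improvements"}

// resolveSpecialists interprets a GADFLY_SPECIALISTS-style value:
// "" -> default suite, "all" -> default + opt-in, otherwise a csv list.
func resolveSpecialists(v string) []string {
	switch v = strings.TrimSpace(v); v {
	case "":
		return defaultSuite
	case "all":
		return append(append([]string{}, defaultSuite...), optIn...)
	}
	var out []string
	for _, name := range strings.Split(v, ",") {
		if name = strings.TrimSpace(name); name != "" {
			out = append(out, name)
		}
	}
	return out
}

// estimatePasses applies cost ≈ specialists × models × 2 passes.
func estimatePasses(specialists, models int) int {
	return specialists * models * 2
}

func main() {
	fmt.Println(estimatePasses(len(resolveSpecialists("")), 1))    // one model, 5 lenses: 10
	fmt.Println(estimatePasses(len(resolveSpecialists("")), 6))    // 6 models, 5 lenses: 60
	fmt.Println(estimatePasses(len(resolveSpecialists("all")), 1)) // one model, 9 lenses: 18
}
```

Under these assumptions, the heavier reusable default runs six times as many model passes as the one-model image default, which is the reason the two defaults are kept separate.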

Tested vs untested: only the Ollama paths (local + OpenAI-compatible pointed at Ollama) are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are untested (no spend). Keep the README honest about this; update it if that changes.

When making changes — maintenance rules

  • Keep the README and examples/ current. Any change to env vars, flags, defaults, triggers, provider support, or the consumer stub MUST be reflected in README.md and the relevant files under examples/ in the same change. The README's Configuration table, the Models & providers table, and the example workflows are the contract users rely on — stale docs are a bug.
  • Preserve the advisory-only invariant (goal #2). If you touch exit codes or the workflow, re-confirm a review can never fail/block a consumer's CI.
  • Don't add mort-specific (or any single-consumer) assumptions to the binary or system prompt. The system prompt is intentionally generic; repo-specific conventions should be discovered by the agent at runtime (it can read the repo's own CONTRIBUTING/CLAUDE.md), not hardcoded here.
  • Keep secrets out of image layers. Private-module creds flow via BuildKit --mount=type=secret in the build stage only; never bake them into the final image or commit them.
  • Add a test when you add logic (see the *_test.go patterns). Keep gofmt clean and go vet quiet.

Lessons

  • majordomo's LLM_* env DSNs are HTTPS-only (DSN.BaseURL() forces https://), so they can't express a plaintext local Ollama. That's why Gadfly adds the http-capable GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]" mechanism (see cmd/gadfly/model.go).
  • Gitea vars/secrets are not auto-exposed as env in a job — the consumer stub must map each one explicitly in its env: block (dynamic alias names can't be auto-enumerated).
  • uses: docker://…:latest is CACHED by act_runner — a freshly-pushed :latest is often NOT re-pulled, so the job silently runs the previous image. For a run that must use a specific build (e.g. validating a just-pushed fix), pin the consumer stub to the immutable :sha-<short> tag the build publishes, not :latest.
  • Concurrency is per-provider (entrypoint.sh): each provider is a lane and lanes run in parallel; a cap (GADFLY_PROVIDER_CONCURRENCY if set, otherwise GADFLY_CONCURRENCY, default 1) bounds how many models run at once within a lane. The review timeout (GADFLY_TIMEOUT_SECS) is per-lens, not shared across the suite, so a slow model can't starve later lenses (the original timeout bug).