Files
gadfly/CLAUDE.md
T
Steve Dudenhoeffer 7809d1b93d
Build & push image / build-and-push (push) Successful in 8s
feat: specialist suite — configurable + custom review lenses (one consolidated comment)
Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).

- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
  maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
  improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
  GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
  Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
  Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
  focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
  specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
  CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:23:05 -04:00

7.3 KiB
Raw Blame History

Gadfly — Developer Guide

Gadfly (🪰) is an agentic adversarial code reviewer that runs in Gitea Actions. On a pull request it reads the checked-out repository with read-only tools, hunts for real problems, verifies each one against the actual code, and posts its findings as a comment. It is advisory only — it never blocks a merge.

This is a public, vibe-coded project (built largely by an AI agent). Keep that framing honest in the README; don't oversell it.

Project goals (keep changes aligned to these)

  1. Find real problems, not nits. The whole point of the agentic tools + two-pass recheck is to kill diff-only false positives. Anything that raises the false-positive rate (or removes verification) works against the project.
  2. Advisory, never blocking. Gadfly must never fail a CI job for review content, never merge, never deploy. Non-zero exit only on usage/config errors; even then run.sh posts a notice rather than failing. Do not add it to branch-protection required checks.
  3. Easy to turn on for any repo. Consumers should need only a ~15-line stub workflow + a couple of secrets/vars. All real logic lives in the image (entrypoint.sh), not in the consumer's YAML (Gitea's act_runner has weak YAML expression support).
  4. Provider-agnostic. Powered by majordomo, so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
  5. Portable & self-contained. cmd/gadfly depends only on the Go stdlib + majordomo. Keep it that way — no heavyweight deps, no coupling to any one consumer repo (e.g. mort).

Architecture

cmd/gadfly/            the reviewer binary — pure producer of review markdown (stdout)
  main.go              orchestration: loop specialists, each a review pass + adversarial recheck
  specialists.go       specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
  consolidate.go       verdict parsing + one-comment consolidation (a section per specialist)
  model.go             provider/model resolution (majordomo.Parse) + env endpoint aliases
  tools.go             the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
  recheck.go           second-pass verification prompt + verdict recompute
  *_test.go            sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests
scripts/run.sh         fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
scripts/system-prompt.txt   the reviewer persona + verification discipline (generic, not repo-specific)
entrypoint.sh          container brains: trigger gating, PR clone, model loop (the logic that
                       used to live in workflow YAML)
Dockerfile             multi-stage; private-module creds via BuildKit secrets never reach the final image
.gitea/workflows/build-image.yml   push main → :latest; tag v* → :<tag>+:latest; PR → build-only
examples/              copy-paste consumer stub workflows for different providers

Data flow: consumer stub workflow → container entrypoint.sh (gate + clone) → scripts/run.sh (per model) → cmd/gadfly binary (agentic review) → markdown → run.sh upserts a PR comment as gitea-actions.

Two passes: a review pass drafts findings; an adversarial recheck pass independently re-verifies each finding against the code and drops the unconfirmed ones, recomputing the verdict. Verdict is one of: No material issues found / Minor issues / Blocking issues found.

Build / test

go build ./cmd/gadfly      # needs read access to the private majordomo module
go test ./...
gofmt -l cmd/               # must be clean
docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .

Run it locally against a real diff without CI:

git -C /path/to/repo diff main > /tmp/x.diff
GADFLY_PROVIDER=ollama GADFLY_MODEL=qwen2.5-coder:7b \
GADFLY_REPO_DIR=/path/to/repo GADFLY_DIFF_FILE=/tmp/x.diff \
GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly

Release / deploy

  • Push to main → CI builds and pushes :latest (+ :sha-<short>).
  • Tag v* → publishes :<tag> (+ :latest). Pin consumers to :vN for stability.
  • Required CI secrets: REGISTRY_USER / REGISTRY_PASSWORD (registry push + read access to the private majordomo module). Optional DISCORD_WEBHOOK_URL.

Configuration

The full env reference lives in the README (Specialists, Models & providers, Configuration). Provider selection: GADFLY_PROVIDER (default ollama-cloud), GADFLY_MODEL/GADFLY_MODELS, GADFLY_BASE_URL, GADFLY_API_KEY. Named endpoint aliases via GADFLY_ENDPOINT_<NAME> / GADFLY_ALIAS_<NAME> (http-capable) and majordomo LLM_* DSNs (HTTPS-only).

Specialists (the swarm): the reviewer runs a suite of focused lenses, one consolidated comment with a section each. Default suite = security/correctness/maintainability/performance/ error-handling; opt-in built-ins = tests/docs/conventions/improvements. Select via GADFLY_SPECIALISTS (csv or all); define/override via GADFLY_SPECIALIST_<NAME> env or a repo .gadfly.yml (specialists: + define:). See cmd/gadfly/specialists.go. Cost ≈ specialists × models × 2 passes — keep the default model count low (entrypoint defaults to one). Dynamic auto selection (a cheap model picks lenses per-diff) is the planned next step.

Tested vs untested: only the Ollama paths (local + OpenAI-compatible pointed at Ollama) are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are untested (no spend). Keep the README honest about this; update it if that changes.

When making changes — maintenance rules

  • Keep the README and examples/ current. Any change to env vars, flags, defaults, triggers, provider support, or the consumer stub MUST be reflected in README.md and the relevant files under examples/ in the same change. The README's Configuration table, the Models & providers table, and the example workflows are the contract users rely on — stale docs are a bug.
  • Preserve the advisory-only invariant (goal #2). If you touch exit codes or the workflow, re-confirm a review can never fail/block a consumer's CI.
  • Don't add mort-specific (or any single-consumer) assumptions to the binary or system prompt. The system prompt is intentionally generic; repo-specific conventions should be discovered by the agent at runtime (it can read the repo's own CONTRIBUTING/CLAUDE.md), not hardcoded here.
  • Keep secrets out of image layers. Private-module creds flow via BuildKit --mount=type=secret in the build stage only; never bake them into the final image or commit them.
  • Add a test when you add logic (see the *_test.go patterns). Keep gofmt clean and go vet quiet.

Lessons

  • majordomo's LLM_* env DSNs are HTTPS-only (DSN.BaseURL() forces https://), so they can't express a plaintext local Ollama. That's why Gadfly adds the http-capable GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]" mechanism (see cmd/gadfly/model.go).
  • Gitea vars/secrets are not auto-exposed as env in a job — the consumer stub must map each one explicitly in its env: block (dynamic alias names can't be auto-enumerated).