gadfly/CLAUDE.md
Steve Dudenhoeffer daff6d08a1
docs: drop stale 'secrets: inherit' mentions (reusable comment + CLAUDE.md)
Self-review on PR #9 flagged two doc-drift spots left over from the
explicit-secret-forwarding switch. Cosmetic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 21:00:40 -04:00

# Gadfly — Developer Guide
Gadfly (🪰) is an **agentic adversarial code reviewer** that runs in Gitea Actions. On a pull
request it reads the *checked-out repository* with read-only tools, hunts for real problems,
verifies each one against the actual code, and posts its findings as a comment. It is
**advisory only** — it never blocks a merge.
> This is a public, **vibe-coded** project (built largely by an AI agent). Keep that framing
> honest in the README; don't oversell it.
## Project goals (keep changes aligned to these)
1. **Find *real* problems, not nits.** The whole point of the agentic tools + two-pass
recheck is to kill diff-only false positives. Anything that raises the false-positive rate
(or removes verification) works against the project.
2. **Advisory, never blocking.** Gadfly must never fail a CI job for review *content*, never
merge, never deploy. Non-zero exit only on usage/config errors; even then run.sh posts a
notice rather than failing. Do not add it to branch-protection required checks.
3. **Easy to turn on for any repo.** Consumers should need only a ~15-line stub workflow + a
couple of secrets/vars. All real logic lives in the image (`entrypoint.sh`), not in the
consumer's YAML (Gitea's act_runner has weak YAML expression support).
4. **Provider-agnostic.** Powered by [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo),
so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any
OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib + majordomo. Keep
it that way — no heavyweight deps, no coupling to any one consumer repo (e.g. mort).
## Architecture
```
cmd/gadfly/                           the reviewer binary: pure producer of review markdown (stdout)
  main.go                             orchestration: loop specialists, each a review pass + adversarial recheck
  engine.go                           reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
  specialists.go                      specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
  auto.go                             dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
  delegate.go                         worker-tier delegate_investigation tool (cheap sub-agent does legwork)
  consolidate.go                      verdict parsing + one-comment consolidation (a section per specialist)
  model.go                            provider/model + selector + worker resolution (majordomo.Parse) + endpoint aliases
  tools.go                            the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
  recheck.go                          second-pass verification prompt + verdict recompute
  *_test.go                           sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests
scripts/run.sh                        fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
scripts/status-board.sh               render+upsert ONE live status-board comment (per-model/per-lens progress)
scripts/system-prompt.txt             the reviewer persona + verification discipline (generic, not repo-specific)
entrypoint.sh                         container brains: trigger gating, PR clone, model loop (the logic that
                                      used to live in workflow YAML)
Dockerfile                            multi-stage; private-module creds via BuildKit secrets never reach the final image
.gitea/workflows/build-image.yml      push main → :latest; tag v* → :<tag>+:latest; PR → build-only
.gitea/workflows/review-reusable.yml  reusable (workflow_call) review job; consumers subscribe with
                                      an ~8-line caller forwarding only the secrets the reviewer needs (Phase 4). gadfly's own
                                      adversarial-review.yml is a thin caller of it (dogfoods the path).
examples/                             copy-paste consumer stub workflows for different providers
```
**Data flow:** consumer stub workflow → container `entrypoint.sh` (gate + clone) →
`scripts/run.sh` (per model) → `cmd/gadfly` binary (agentic review) → markdown → run.sh
upserts a PR comment as `gitea-actions`.

**Two passes:** a *review* pass drafts findings; an adversarial *recheck* pass independently
re-verifies each finding against the code and drops the unconfirmed ones, recomputing the
verdict. The verdict is one of: `No material issues found` / `Minor issues` / `Blocking issues found`.
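
To make the recompute step concrete, here is a minimal Go sketch of dropping unconfirmed findings and deriving the verdict from what survives. The `Finding` type, the two-level `Severity` scale, and the severity-to-verdict mapping are assumptions for illustration; the real logic lives in `cmd/gadfly/recheck.go` and may differ. Only the three verdict strings come from this guide.

```go
package main

import "fmt"

// Severity is a hypothetical two-level scale used only in this sketch.
type Severity int

const (
	Minor Severity = iota
	Blocking
)

// Finding is an illustrative shape for one drafted issue.
type Finding struct {
	Title     string
	Severity  Severity
	Confirmed bool // set by the recheck pass
}

// recomputeVerdict keeps only confirmed findings and derives the verdict
// from the most severe one that survived.
func recomputeVerdict(findings []Finding) (string, []Finding) {
	kept := make([]Finding, 0, len(findings))
	verdict := "No material issues found"
	for _, f := range findings {
		if !f.Confirmed {
			continue // unconfirmed findings are dropped
		}
		kept = append(kept, f)
		switch {
		case f.Severity == Blocking:
			verdict = "Blocking issues found"
		case verdict == "No material issues found":
			verdict = "Minor issues"
		}
	}
	return verdict, kept
}

func main() {
	drafted := []Finding{
		{Title: "nil deref in handler", Severity: Blocking, Confirmed: false},
		{Title: "unchecked error from Close", Severity: Minor, Confirmed: true},
	}
	verdict, kept := recomputeVerdict(drafted)
	fmt.Println(verdict, len(kept)) // Minor issues 1
}
```

In this sketch the drafted blocking finding fails the recheck, so the verdict falls back to `Minor issues` instead of staying at the first-pass level.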
## Build / test
```sh
go build ./cmd/gadfly # needs read access to the private majordomo module
go test ./...
gofmt -l cmd/ # must be clean
docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .
```
Run it locally against a real diff without CI:
```sh
git -C /path/to/repo diff main > /tmp/x.diff
GADFLY_PROVIDER=ollama GADFLY_MODEL=qwen2.5-coder:7b \
GADFLY_REPO_DIR=/path/to/repo GADFLY_DIFF_FILE=/tmp/x.diff \
GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly
```
## Release / deploy
- **Push to `main`** → CI builds and pushes `:latest` (+ `:sha-<short>`).
- **Tag `v*`** → publishes `:<tag>` (+ `:latest`). Pin consumers to `:vN` for stability.
- Required CI secrets: `REGISTRY_USER` / `REGISTRY_PASSWORD` (registry push + read access to the
private majordomo module). Optional `DISCORD_WEBHOOK_URL`.
## Configuration
The full env reference lives in the **README** (sections `Specialists`, `Models & providers`,
and `Configuration`). Provider selection uses `GADFLY_PROVIDER` (default `ollama-cloud`),
`GADFLY_MODEL`/`GADFLY_MODELS`, `GADFLY_BASE_URL`, and `GADFLY_API_KEY`. Named endpoint aliases
come from `GADFLY_ENDPOINT_<NAME>` / `GADFLY_ALIAS_<NAME>` (plain-HTTP capable) or from
majordomo `LLM_*` DSNs (HTTPS-only).
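
The `GADFLY_ENDPOINT_<NAME>` value format is `provider|base-url[|key]` (see Lessons). A minimal Go sketch of parsing such a value follows. The `Endpoint` type and `parseEndpoint` function are hypothetical and are not the code in `cmd/gadfly/model.go`; the URL is just an example of a plaintext local Ollama address.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Endpoint is an illustrative holder for one parsed alias value.
type Endpoint struct {
	Provider, BaseURL, APIKey string
}

// parseEndpoint splits "provider|base-url[|key]". SplitN with a limit of 3
// keeps any further "|" characters inside the key.
func parseEndpoint(value string) (Endpoint, error) {
	parts := strings.SplitN(value, "|", 3)
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return Endpoint{}, errors.New("want provider|base-url[|key]")
	}
	ep := Endpoint{Provider: parts[0], BaseURL: parts[1]}
	if len(parts) == 3 {
		ep.APIKey = parts[2] // optional third field
	}
	return ep, nil
}

func main() {
	ep, err := parseEndpoint("ollama|http://localhost:11434")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s at %s (key set: %t)\n", ep.Provider, ep.BaseURL, ep.APIKey != "")
	// ollama at http://localhost:11434 (key set: false)
}
```
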

**Specialists (the swarm):** the reviewer runs a suite of focused lenses and posts one
consolidated comment with a section per lens. The default suite is security, correctness,
maintainability, performance, and error-handling; the opt-in built-ins are tests, docs,
conventions, and improvements. Select lenses via `GADFLY_SPECIALISTS` (csv or `all`);
define or override them via `GADFLY_SPECIALIST_<NAME>` env or a repo `.gadfly.yml`
(`specialists:` + `define:`). See `cmd/gadfly/specialists.go`. Cost ≈ specialists × models ×
2 passes, so keep the default model count low (the entrypoint defaults to one).
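
The cost rule of thumb above works out as follows. This is only the arithmetic from the formula; `estimateRuns` is a hypothetical helper, and it counts agent passes, not tokens or money.

```go
package main

import "fmt"

// passesPerLens reflects the review pass plus the recheck pass.
const passesPerLens = 2

// estimateRuns returns the approximate number of agent passes for a review.
func estimateRuns(specialists, models int) int {
	return specialists * models * passesPerLens
}

func main() {
	fmt.Println(estimateRuns(5, 1)) // default suite, one model: 10
	fmt.Println(estimateRuns(9, 3)) // all nine built-ins, three models: 54
}
```
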

**Dynamic `auto`** (`GADFLY_SPECIALISTS=auto`): a selector (`GADFLY_SELECTOR_MODEL` or the review
model) picks lenses per-diff and may invent ad-hoc ones (`cmd/gadfly/auto.go`). **Worker-tier**
(`GADFLY_WORKER_MODEL`): a `delegate_investigation` tool offloads grep/read legwork to a cheap
sub-agent (`cmd/gadfly/delegate.go`).

**Tested vs untested:** only the Ollama paths (local + OpenAI-compatible pointed at Ollama)
are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are
**untested** (no money has been spent running them). Keep the README honest about this;
update it if that changes.
## When making changes — maintenance rules
- **Keep the README and `examples/` current.** Any change to env vars, flags, defaults,
triggers, provider support, or the consumer stub MUST be reflected in `README.md` and the
relevant files under `examples/` in the *same* change. The README's `Configuration` table,
the `Models & providers` table, and the example workflows are the contract users rely on —
stale docs are a bug.
- **Preserve the advisory-only invariant** (goal #2). If you touch exit codes or the workflow,
re-confirm a review can never fail/block a consumer's CI.
- **Don't add mort-specific (or any single-consumer) assumptions** to the binary or system
prompt. The system prompt is intentionally generic; repo-specific conventions should be
discovered by the agent at runtime (it can read the repo's own CONTRIBUTING/CLAUDE.md), not
hardcoded here.
- **Keep secrets out of image layers.** Private-module creds flow via BuildKit `--mount=type=secret`
in the build stage only; never bake them into the final image or commit them.
- Add a test when you add logic (see the `*_test.go` patterns). Keep `gofmt` clean and `go vet` quiet.
## Lessons
- majordomo's `LLM_*` env DSNs are **HTTPS-only** (`DSN.BaseURL()` forces `https://`), so they
can't express a plaintext local Ollama. That's why Gadfly adds the http-capable
`GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]"` mechanism (see `cmd/gadfly/model.go`).
- Gitea `vars`/`secrets` are **not** auto-exposed as env in a job — the consumer stub must map
each one explicitly in its `env:` block (dynamic alias names can't be auto-enumerated).
- **`uses: docker://…:latest` is CACHED by act_runner** — a freshly-pushed `:latest` is often
NOT re-pulled, so the job silently runs the previous image. For a run that must use a specific
build (e.g. validating a just-pushed fix), pin the consumer stub to the immutable
`:sha-<short>` tag the build publishes, not `:latest`.
- **Concurrency is per-provider** (`entrypoint.sh`): each provider is a lane, lanes run in
parallel, `cap` (from `GADFLY_PROVIDER_CONCURRENCY` else `GADFLY_CONCURRENCY`, default 1) bounds
models-at-once within a lane. The review timeout (`GADFLY_TIMEOUT_SECS`) is **per-lens**, not
shared across the suite — a slow model can't starve later lenses (the original timeout bug).