# Gadfly: Developer Guide

Gadfly (🪰) is an **agentic adversarial code reviewer** that runs in Gitea Actions. On a pull
request it reads the *checked-out repository* with read-only tools, hunts for real problems,
verifies each one against the actual code, and posts its findings as a comment. It is
**advisory only**: it never blocks a merge.

> This is a public, **vibe-coded** project (built largely by an AI agent). Keep that framing
> honest in the README; don't oversell it.

## Project goals (keep changes aligned to these)

1. **Find *real* problems, not nits.** The whole point of the agentic tools + two-pass
   recheck is to kill diff-only false positives. Anything that raises the false-positive rate
   (or removes verification) works against the project.
2. **Advisory, never blocking.** Gadfly must never fail a CI job for review *content*, never
   merge, never deploy. Non-zero exit only on usage/config errors; even then `run.sh` posts a
   notice rather than failing. Do not add it to branch-protection required checks.
3. **Easy to turn on for any repo.** Consumers should need only a ~15-line stub workflow + a
   couple of secrets/vars. All real logic lives in the image (`entrypoint.sh`), not in the
   consumer's YAML (Gitea's act_runner has weak YAML expression support).
4. **Provider-agnostic.** Powered by [majordomo](https://gitea.stevedudenhoeffer.com/steve/majordomo),
   so it can target Ollama (local/cloud), OpenAI, Anthropic, Google, or any
   OpenAI/Ollama-compatible endpoint. Don't re-hardcode a single provider.
5. **Portable & self-contained.** `cmd/gadfly` depends only on the Go stdlib + majordomo. Keep
   it that way: no heavyweight deps, no coupling to any one consumer repo (e.g. mort).

## Architecture

```
cmd/gadfly/                       the reviewer binary: pure producer of review markdown (stdout)
  main.go                         orchestration: loop specialists, each a review pass + adversarial recheck
  engine.go                       reviewEngine abstraction: majordomo agent loop vs claude-code CLI shell-out
  specialists.go                  specialist lenses: built-ins, default suite, env + .gadfly.yml resolution
  auto.go                         dynamic `auto` selection: a selector model picks lenses per-diff (may invent)
  delegate.go                     worker-tier delegate_investigation tool (cheap sub-agent does legwork)
  consolidate.go                  verdict parsing + one-comment consolidation (a section per specialist)
  model.go                        provider/model + selector + worker resolution (majordomo.Parse) + endpoint aliases
  tools.go                        the 5 read-only repo tools (read_file/list_dir/grep/find_files/get_diff)
  recheck.go                      second-pass verification prompt + verdict recompute
  *_test.go                       sandbox, recheck, wrap-up, spec/endpoint-parse, specialist-resolution tests
scripts/run.sh                    fetch PR diff+meta, run the binary, upsert ONE labeled PR comment
scripts/status-board.sh           render+upsert ONE live status-board comment (per-model/per-lens progress)
scripts/system-prompt.txt         the reviewer persona + verification discipline (generic, not repo-specific)
entrypoint.sh                     container brains: trigger gating, PR clone, model loop (the logic that
                                  used to live in workflow YAML)
Dockerfile                        multi-stage; private-module creds via BuildKit secrets never reach the final image
.gitea/workflows/build-image.yml  push main → :latest; tag v* → :<tag>+:latest; PR → build-only
examples/                         copy-paste consumer stub workflows for different providers
```

**Data flow:** consumer stub workflow → container `entrypoint.sh` (gate + clone) →
`scripts/run.sh` (per model) → `cmd/gadfly` binary (agentic review) → markdown → `run.sh`
upserts a PR comment as `gitea-actions`.

**Two passes:** a *review* pass drafts findings; an adversarial *recheck* pass independently
re-verifies each finding against the code and drops the unconfirmed ones, recomputing the
verdict. Verdict is one of: `No material issues found` / `Minor issues` / `Blocking issues found`.

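The two-pass flow can be pictured with a small Go sketch. It is illustrative only: the `Finding` and `Severity` types, the `Confirmed` flag, and the mapping from surviving severities to a verdict string are assumptions made for this example, not the actual code in `cmd/gadfly/recheck.go` or `consolidate.go`. Only the three verdict strings come from this guide.

```go
package main

import "fmt"

// Severity and Finding are hypothetical types for this sketch; the real
// structures in cmd/gadfly may look different.
type Severity int

const (
	Minor Severity = iota
	Blocking
)

type Finding struct {
	Title     string
	Severity  Severity
	Confirmed bool // assumed to be set by the recheck pass after re-reading the code
}

// recheck keeps only the findings the second pass confirmed.
func recheck(drafted []Finding) []Finding {
	kept := make([]Finding, 0, len(drafted))
	for _, f := range drafted {
		if f.Confirmed {
			kept = append(kept, f)
		}
	}
	return kept
}

// verdict recomputes the headline from the surviving findings only.
// The severity-to-verdict mapping is an assumption of this sketch.
func verdict(findings []Finding) string {
	if len(findings) == 0 {
		return "No material issues found"
	}
	for _, f := range findings {
		if f.Severity == Blocking {
			return "Blocking issues found"
		}
	}
	return "Minor issues"
}

func main() {
	drafted := []Finding{
		{Title: "nil deref in handler", Severity: Blocking, Confirmed: false},
		{Title: "unchecked error from Close", Severity: Minor, Confirmed: true},
	}
	fmt.Println(verdict(recheck(drafted))) // prints "Minor issues"
}
```

The point of the shape is that the verdict is derived after filtering, so an unconfirmed blocking claim from the first pass cannot leak into the headline.
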
## Build / test

```sh
go build ./cmd/gadfly          # needs read access to the private majordomo module
go test ./...
gofmt -l cmd/                  # must be clean
docker build -t gadfly:dev --secret id=REGISTRY_USER,env=REGISTRY_USER --secret id=REGISTRY_PASSWORD,env=REGISTRY_PASSWORD .
```

Run it locally against a real diff without CI:

```sh
git -C /path/to/repo diff main > /tmp/x.diff
GADFLY_PROVIDER=ollama GADFLY_MODEL=qwen2.5-coder:7b \
GADFLY_REPO_DIR=/path/to/repo GADFLY_DIFF_FILE=/tmp/x.diff \
GADFLY_SYSTEM_FILE=scripts/system-prompt.txt ./gadfly
```

## Release / deploy

- **Push to `main`** → CI builds and pushes `:latest` (+ `:sha-<short>`).
- **Tag `v*`** → publishes `:<tag>` (+ `:latest`). Pin consumers to `:vN` for stability.
- Required CI secrets: `REGISTRY_USER` / `REGISTRY_PASSWORD` (registry push + read access to the
  private majordomo module). Optional `DISCORD_WEBHOOK_URL`.

## Configuration

The full env reference lives in the **README** (`Specialists`, `Models & providers`,
`Configuration`). Provider selection: `GADFLY_PROVIDER` (default `ollama-cloud`),
`GADFLY_MODEL`/`GADFLY_MODELS`, `GADFLY_BASE_URL`, `GADFLY_API_KEY`. Named endpoint aliases via
`GADFLY_ENDPOINT_<NAME>` / `GADFLY_ALIAS_<NAME>` (http-capable) and majordomo `LLM_*` DSNs
(HTTPS-only).

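As a rough illustration of the endpoint alias mechanism, here is a minimal Go sketch that parses the `provider|base-url[|key]` value format this guide gives for `GADFLY_ENDPOINT_<NAME>` in its Lessons section. The `Endpoint` struct, the `parseEndpoint` function, and the validation rules are hypothetical; the real parsing lives in `cmd/gadfly/model.go` and may behave differently.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Endpoint is a hypothetical struct for this sketch.
type Endpoint struct {
	Provider string
	BaseURL  string
	APIKey   string // optional third field
}

// parseEndpoint splits "provider|base-url[|key]" into its parts.
// SplitN with a limit of 3 keeps any further "|" characters inside the key.
func parseEndpoint(spec string) (Endpoint, error) {
	parts := strings.SplitN(spec, "|", 3)
	if len(parts) < 2 {
		return Endpoint{}, fmt.Errorf("endpoint %q: want provider|base-url[|key]", spec)
	}
	ep := Endpoint{
		Provider: strings.TrimSpace(parts[0]),
		BaseURL:  strings.TrimSpace(parts[1]),
	}
	if ep.Provider == "" || ep.BaseURL == "" {
		return Endpoint{}, errors.New("endpoint: provider and base-url are required")
	}
	if len(parts) == 3 {
		ep.APIKey = strings.TrimSpace(parts[2])
	}
	return ep, nil
}

func main() {
	// A plaintext local Ollama URL, which an HTTPS-only DSN cannot express.
	ep, err := parseEndpoint("ollama|http://localhost:11434")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %s key=%t\n", ep.Provider, ep.BaseURL, ep.APIKey != "")
}
```

The example URL is only a placeholder for a local endpoint; the scheme is carried verbatim in the value, which is what makes the alias http-capable.
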
**Specialists (the swarm):** the reviewer runs a suite of focused lenses and posts one
consolidated comment with a section for each. Default suite =
security/correctness/maintainability/performance/error-handling; opt-in built-ins =
tests/docs/conventions/improvements. Select via `GADFLY_SPECIALISTS` (csv or `all`);
define/override via `GADFLY_SPECIALIST_<NAME>` env or a repo `.gadfly.yml` (`specialists:` +
`define:`). See `cmd/gadfly/specialists.go`. Cost ≈ specialists × models × 2 passes, so keep the
default model count low (entrypoint defaults to one).
**Dynamic `auto`** (`GADFLY_SPECIALISTS=auto`): a selector (`GADFLY_SELECTOR_MODEL` or the review
model) picks lenses per-diff and may invent ad-hoc ones (`cmd/gadfly/auto.go`). **Worker-tier**
(`GADFLY_WORKER_MODEL`): a `delegate_investigation` tool offloads grep/read legwork to a cheap
sub-agent (`cmd/gadfly/delegate.go`).

**Tested vs untested:** only the Ollama paths (local + OpenAI-compatible pointed at Ollama)
are actually exercised. OpenAI/Anthropic/Google come from majordomo's abstraction and are
**untested** (no spend). Keep the README honest about this; update it if that changes.

## When making changes: maintenance rules

- **Keep the README and `examples/` current.** Any change to env vars, flags, defaults,
  triggers, provider support, or the consumer stub MUST be reflected in `README.md` and the
  relevant files under `examples/` in the *same* change. The README's `Configuration` table,
  the `Models & providers` table, and the example workflows are the contract users rely on;
  stale docs are a bug.
- **Preserve the advisory-only invariant** (goal #2). If you touch exit codes or the workflow,
  re-confirm a review can never fail/block a consumer's CI.
- **Don't add mort-specific (or any single-consumer) assumptions** to the binary or system
  prompt. The system prompt is intentionally generic; repo-specific conventions should be
  discovered by the agent at runtime (it can read the repo's own CONTRIBUTING/CLAUDE.md), not
  hardcoded here.
- **Keep secrets out of image layers.** Private-module creds flow via BuildKit `--mount=type=secret`
  in the build stage only; never bake them into the final image or commit them.
- Add a test when you add logic (see the `*_test.go` patterns). Keep `gofmt` clean and `go vet` quiet.

## Lessons

- majordomo's `LLM_*` env DSNs are **HTTPS-only** (`DSN.BaseURL()` forces `https://`), so they
  can't express a plaintext local Ollama. That's why Gadfly adds the http-capable
  `GADFLY_ENDPOINT_<NAME>="provider|base-url[|key]"` mechanism (see `cmd/gadfly/model.go`).
- Gitea `vars`/`secrets` are **not** auto-exposed as env in a job; the consumer stub must map
  each one explicitly in its `env:` block (dynamic alias names can't be auto-enumerated).
- **`uses: docker://…:latest` is CACHED by act_runner**: a freshly-pushed `:latest` is often
  NOT re-pulled, so the job silently runs the previous image. For a run that must use a specific
  build (e.g. validating a just-pushed fix), pin the consumer stub to the immutable
  `:sha-<short>` tag the build publishes, not `:latest`.
- **Concurrency is per-provider** (`entrypoint.sh`): each provider is a lane, lanes run in
  parallel, `cap` (from `GADFLY_PROVIDER_CONCURRENCY` else `GADFLY_CONCURRENCY`, default 1) bounds
  models-at-once within a lane. The review timeout (`GADFLY_TIMEOUT_SECS`) is **per-lens**, not
  shared across the suite, so a slow model can't starve later lenses (the original timeout bug).