feat: claude-code reviewer engine (per-lens claude -p shell-out)
Build & push image / build-and-push (pull_request) Successful in 46s
Adversarial Review (Gadfly) / review (pull_request) Failing after 30m16s

Phase 1 of the gadfly-games build. Adds a second review engine alongside
the majordomo agent loop: for each lens, shell out to the Claude Code CLI
(`claude -p`) inside the checked-out repo so it verifies findings with
its OWN read tools, then reuse gadfly's verdict-parse + recheck +
consolidate + emit pipeline unchanged.

- cmd/gadfly/engine.go: new reviewEngine interface with two impls —
  majordomoEngine (wraps the existing runAgent path) and claudeCodeEngine
  (exec `claude -p ... --output-format json`, parse `.result`). main.go's
  runSpecialists/reviewWithSpecialist are now engine-agnostic.
- Select via a model id: `claude-code` (CLI default) or
  `claude-code/<model>` (suffix → --model). Auth inherits from the env:
  Pro/Max via CLAUDE_CODE_OAUTH_TOKEN (no --bare), else ANTHROPIC_API_KEY.
  Read-only by default (--permission-mode plan); tunable via GADFLY_CLAUDE_*.
- auto-select + delegate worker are majordomo-only and are skipped with
  this engine (Claude Code does its own legwork).
- Dockerfile bundles Node + @anthropic-ai/claude-code (larger image).
- Docs: README "Claude Code engine" section + config rows,
  examples/claude-code.yml stub, examples/README + CLAUDE.md updated.
  Notes honestly that subscription auth in CI is untested here and sits
  in a ToS gray area.
- Bumps the dogfood image pin to :sha-c3d09d3 so gadfly's own PRs now
  review with the live status board from Phase 3.

New engine_test.go covers spec detection, model derivation, and argv
building (no live CLI call). gofmt clean, go vet quiet, go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 15:26:57 -04:00
parent c3d09d3bd4
commit 4237a18d09
12 changed files with 450 additions and 33 deletions
@@ -79,6 +79,39 @@ majordomo failover chain / alias) is used verbatim.
> and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you
> try a cloud provider and it works (or doesn't), please open an issue.
### Claude Code engine (`claude-code`)
Besides the majordomo model loop, Gadfly can review through the **[Claude Code](https://claude.com/claude-code)
CLI**: for each lens it shells out to `claude -p` *inside the checked-out repo*, so Claude Code
uses its **own** read tools (Read/Grep/Glob) to verify findings against the real code. Gadfly then
feeds the CLI's output through the same verdict-parse → recheck → consolidate → emit pipeline the
majordomo engine uses. The CLI is bundled in the image (Node + `@anthropic-ai/claude-code`).

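
As a rough illustration of the per-lens shell-out, here is a Go sketch. It assumes the CLI's JSON output carries the review text in a top-level `result` field, which is what parsing `.result` implies. `runClaudeLens` and `parseClaudeJSON` are hypothetical names, not Gadfly's actual functions. The detail that matters is `cmd.Dir`, which runs the CLI inside the checkout so its read tools see the real files.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
)

// cliResult keeps only the field read from `claude -p --output-format json`;
// json.Unmarshal ignores every other field in the object.
type cliResult struct {
	Result string `json:"result"`
}

// parseClaudeJSON extracts .result from the CLI's JSON output.
func parseClaudeJSON(raw []byte) (string, error) {
	var r cliResult
	if err := json.Unmarshal(raw, &r); err != nil {
		return "", fmt.Errorf("decode claude output: %w", err)
	}
	return r.Result, nil
}

// runClaudeLens (hypothetical) runs one lens with the repo as working
// directory; extra holds any further flags (model, permission mode, ...).
func runClaudeLens(ctx context.Context, bin, repoDir, prompt string, extra []string) (string, error) {
	argv := append([]string{"-p", prompt, "--output-format", "json"}, extra...)
	cmd := exec.CommandContext(ctx, bin, argv...)
	cmd.Dir = repoDir // Read/Grep/Glob resolve paths against the checkout
	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("%s -p: %w", bin, err)
	}
	return parseClaudeJSON(out)
}

func main() {
	text, err := parseClaudeJSON([]byte(`{"type":"result","result":"VERDICT: no issues"}`))
	fmt.Println(text, err)
}
```

The returned text then enters the shared verdict-parse step unchanged, which is what keeps the rest of the pipeline engine-agnostic.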
Select it as a model id — bare `claude-code` (CLI default model) or `claude-code/<model>` (the
suffix becomes `--model`, e.g. `claude-code/sonnet`, `claude-code/opus`):
```yaml
GADFLY_MODELS: "claude-code/sonnet,claude-code/opus"
```
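
The model-id convention can be sketched as a small parser. `parseClaudeCodeSpec` is a hypothetical helper. It also assumes that an empty suffix (`claude-code/`) should not match, which the section doesn't specify.

```go
package main

import (
	"fmt"
	"strings"
)

const claudeCodePrefix = "claude-code"

// parseClaudeCodeSpec reports whether a GADFLY_MODELS entry selects the
// Claude Code engine and which --model value its suffix carries.
//   "claude-code"        -> ("", true): no --model, CLI default
//   "claude-code/sonnet" -> ("sonnet", true)
func parseClaudeCodeSpec(id string) (model string, ok bool) {
	id = strings.TrimSpace(id)
	if id == claudeCodePrefix {
		return "", true
	}
	// CutPrefix (Go 1.20+) requires the "/" so "claude-codex" won't match.
	if rest, found := strings.CutPrefix(id, claudeCodePrefix+"/"); found && rest != "" {
		return rest, true
	}
	return "", false
}

func main() {
	for _, id := range strings.Split("claude-code/sonnet,claude-code/opus,claude-code,some-provider/some-model", ",") {
		model, ok := parseClaudeCodeSpec(id)
		fmt.Printf("%-26s claude-code=%v model=%q\n", id, ok, model)
	}
}
```
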
Auth is read from the environment. The default is a **Pro/Max subscription** via
`CLAUDE_CODE_OAUTH_TOKEN` (created with `claude setup-token`; the CLI runs without `--bare`),
falling back to `ANTHROPIC_API_KEY`. Set only one of the two. Tuning knobs (all optional):
| Env | Default | Meaning |
|-----|---------|---------|
| `GADFLY_CLAUDE_MODEL` | *(from the spec suffix)* | overrides the `--model` value |
| `GADFLY_CLAUDE_PERMISSION_MODE` | `plan` | `--permission-mode` (read-only `plan` keeps it from editing) |
| `GADFLY_CLAUDE_ALLOWED_TOOLS` | *(unset)* | `--allowedTools` value, passed verbatim (e.g. `Read,Grep,Glob`) |
| `GADFLY_CLAUDE_EXTRA_ARGS` | *(unset)* | extra CLI args appended verbatim (e.g. `--max-turns 30`) |
| `GADFLY_CLAUDE_BIN` | `claude` | CLI binary path |
> **Untested, like the cloud providers.** The CLI wiring is covered by unit tests (spec detection,
> model derivation, argv building), but a live subscription-auth run hasn't been validated
> end-to-end here, and using subscription auth in automated CI is a gray area in Anthropic's
> terms. `auto` specialist selection and the `delegate_investigation` worker are majordomo-only
> and are skipped with this engine (Claude Code does its own legwork).
### Endpoint aliases via env vars
For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by
@@ -264,6 +297,7 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
| `GADFLY_PROVIDER` | `ollama-cloud` | provider prefix for a bare model id |
| `GADFLY_BASE_URL` | — | override endpoint (OpenAI/Ollama-compatible servers) |
| `GADFLY_API_KEY` | — | provider key; falls back to the provider's standard env |
| `claude-code` model id | — | route a model through the bundled Claude Code CLI (`claude-code` / `claude-code/<model>`); see [Claude Code engine](#claude-code-engine-claude-code) for its `GADFLY_CLAUDE_*` knobs |
| `GADFLY_SPECIALISTS` | default suite | csv of lenses, `all`, or `auto` (dynamic selection) |
| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
| `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |