feat: claude-code reviewer engine (#2)
Build & push image / build-and-push (push) Successful in 28s
Phase 1: a second review engine alongside the majordomo agent loop. For each lens, shell out to the Claude Code CLI (`claude -p --output-format json`) inside the checked-out repo so it verifies findings with its own read tools, then reuse gadfly's verdict-parse + recheck + consolidate + emit pipeline.

Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`; auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY; read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node + @anthropic-ai/claude-code.

Also bumped the dogfood pin to the status-board image (PR #2 was the first dogfood with the live board + full fleet).

Folded in the swarm's own review findings: minimal subprocess env (no GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr), process-group cleanup on timeout, rune-safe error truncation, and engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings graded via the gadfly MCP.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
This commit was merged in pull request #2.
> and exercise the exact same code an OpenAI/OpenRouter endpoint would hit, for free. If you
> try a cloud provider and it works (or doesn't), please open an issue.

### Claude Code engine (`claude-code`)

Besides the majordomo model loop, Gadfly can review through the **[Claude Code](https://claude.com/claude-code)
CLI**. For each lens it shells out to `claude -p` *inside the checked-out repo*, so Claude Code
verifies findings against the real code with its **own** read tools (Read/Grep/Glob). Gadfly then
parses the result and runs it through the same verdict-parse → recheck → consolidate → emit
pipeline as the majordomo engine. The CLI is bundled in the image (Node + `@anthropic-ai/claude-code`).
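A minimal Go sketch of the per-lens subprocess setup, assuming an `os/exec`-based runner. The helper name `newLensCmd`, the per-lens timeout parameter and the 5-second `WaitDelay` are hypothetical, and passing the prompt as the positional argument after `-p` is an assumption (the README doesn't say how Gadfly delivers it). The command runs with its working directory set to the checkout and, on Unix, in its own process group, which is one way to get the "process-group cleanup on timeout" the commit message mentions: a timeout kills the CLI together with anything it spawned.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// newLensCmd is a hypothetical helper: it prepares (but does not start) one
// Claude Code run for a single lens. Unix-only because of Setpgid/Kill.
func newLensCmd(bin, repoDir string, args []string, timeout time.Duration) (*exec.Cmd, context.CancelFunc) {
	// Assumption: each lens gets its own deadline.
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	cmd := exec.CommandContext(ctx, bin, args...)

	// Run inside the checkout so Read/Grep/Glob resolve against the real code.
	cmd.Dir = repoDir

	// Start the CLI in its own process group (pgid = child's pid)...
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

	// ...so that when the context expires the whole group is killed, not just
	// the top-level process. A negative PID signals every member of the group.
	cmd.Cancel = func() error {
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}

	// Don't block forever on output pipes a stray grandchild might hold open.
	cmd.WaitDelay = 5 * time.Second
	return cmd, cancel
}

func main() {
	cmd, cancel := newLensCmd("claude", "/tmp/checkout",
		[]string{"-p", "Review this diff for correctness bugs.", "--output-format", "json"},
		10*time.Minute)
	defer cancel()
	// A real caller would run cmd.Output() here and parse the JSON on stdout.
	fmt.Println(cmd.Dir, cmd.Args)
}
```

Overriding `cmd.Cancel` keeps the context-driven timeout of `exec.CommandContext` while swapping its default single-process kill for a group kill.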

Select it as a model id: bare `claude-code` uses the CLI's default model, while `claude-code/<model>`
passes the suffix as `--model` (e.g. `claude-code/sonnet`, `claude-code/opus`):

```yaml
GADFLY_MODELS: "claude-code/sonnet,claude-code/opus"
```
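The model-id convention can be sketched as a small parser. `parseClaudeCodeSpec` is a hypothetical name, and treating an empty suffix (`claude-code/`) like bare `claude-code` is an assumption; any id that isn't `claude-code` or `claude-code/...` is left for the majordomo engine.

```go
package main

import (
	"fmt"
	"strings"
)

// parseClaudeCodeSpec is a hypothetical helper for the model-id convention.
// "claude-code" means "use the CLI's default model" (no --model flag), and
// "claude-code/<model>" means "pass <model> as --model". ok is false for any
// other id, which would stay on the majordomo engine.
func parseClaudeCodeSpec(id string) (model string, ok bool) {
	id = strings.TrimSpace(id)
	if id == "claude-code" {
		return "", true
	}
	// Requiring the slash keeps ids like "claude-codex" from matching.
	if rest, found := strings.CutPrefix(id, "claude-code/"); found {
		return rest, true // empty rest: treated like bare claude-code (assumption)
	}
	return "", false
}

func main() {
	// GADFLY_MODELS is a comma-separated list, as in the YAML example.
	for _, id := range strings.Split("claude-code/sonnet,claude-code/opus,claude-code", ",") {
		model, ok := parseClaudeCodeSpec(id)
		fmt.Printf("%q -> model=%q claude-code=%v\n", id, model, ok)
	}
}
```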

Auth is read from the environment. The default is a **Pro/Max subscription** via
`CLAUDE_CODE_OAUTH_TOKEN` (from `claude setup-token`, used without `--bare`), falling back to
`ANTHROPIC_API_KEY`; don't set both. Tuning knobs (all optional):

| Env | Default | Meaning |
|-----|---------|---------|
| `GADFLY_CLAUDE_MODEL` | *(from the spec suffix)* | overrides the `--model` value |
| `GADFLY_CLAUDE_PERMISSION_MODE` | `plan` | `--permission-mode` value (the read-only `plan` mode keeps the CLI from editing files) |
| `GADFLY_CLAUDE_ALLOWED_TOOLS` | *(unset)* | `--allowedTools` value, passed verbatim (e.g. `Read,Grep,Glob`) |
| `GADFLY_CLAUDE_EXTRA_ARGS` | *(unset)* | extra CLI args, **whitespace-split** (no shell quoting) and appended after the defaults (e.g. `--max-turns 30`) |
| `GADFLY_CLAUDE_BIN` | `claude` | CLI binary path |
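A sketch of how the knobs in the table might translate into CLI arguments. The function name `claudeArgs`, the injected `getenv`, and the exact flag order are assumptions, and the prompt itself is left out. What it takes from the table is the `plan` default, the env model overriding the spec suffix, `--allowedTools` passed verbatim, and `GADFLY_CLAUDE_EXTRA_ARGS` split with `strings.Fields` (no shell quoting) and appended last. `GADFLY_CLAUDE_BIN` picks the executable rather than an argument, so it isn't handled here.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// claudeArgs is a hypothetical mapping from the GADFLY_CLAUDE_* knobs to CLI
// flags. specModel is the suffix from "claude-code/<model>" ("" for bare
// claude-code). getenv is injected so the mapping is testable.
func claudeArgs(specModel string, getenv func(string) string) []string {
	args := []string{"-p", "--output-format", "json"} // prompt omitted here

	model := specModel
	if m := getenv("GADFLY_CLAUDE_MODEL"); m != "" {
		model = m // the env knob overrides the spec suffix
	}
	if model != "" {
		args = append(args, "--model", model)
	}

	mode := getenv("GADFLY_CLAUDE_PERMISSION_MODE")
	if mode == "" {
		mode = "plan" // read-only default
	}
	args = append(args, "--permission-mode", mode)

	if tools := getenv("GADFLY_CLAUDE_ALLOWED_TOOLS"); tools != "" {
		args = append(args, "--allowedTools", tools) // passed verbatim
	}

	// Split on runs of whitespace with no quote handling, and append last:
	// a flag repeated here comes after the default above.
	return append(args, strings.Fields(getenv("GADFLY_CLAUDE_EXTRA_ARGS"))...)
}

func main() {
	fmt.Println(strings.Join(claudeArgs("sonnet", os.Getenv), " "))
}
```

Because the extra args come last, a `--permission-mode` placed in them follows the default `plan`, which is the override the operator note warns about.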

> These are **operator** knobs (workflow env), not PR-author input. Because
> `GADFLY_CLAUDE_EXTRA_ARGS` is appended *after* the defaults, it can override the
> read-only `--permission-mode plan` (e.g. by passing `--permission-mode acceptEdits`),
> so leave the permission mode out of it unless you really want the CLI to be able to edit.
> It's whitespace-split, so individual values can't contain spaces; use
> `GADFLY_CLAUDE_ALLOWED_TOOLS` / `_PERMISSION_MODE` / `_MODEL` for those. The subprocess runs
> with a **minimal environment** (its auth token + `PATH`/`HOME`/locale/`GADFLY_CLAUDE_*`), not
> the runner's full env, so the Gitea token and provider keys aren't handed to the CLI.
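The minimal environment, combined with the auth fallback described earlier, might look like this. `childEnv` is a hypothetical helper; reading "locale" as `LANG` plus `LC_*`, and forwarding only one credential (the OAuth token, else the API key) when both are set, are assumptions rather than documented behavior.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// keep reports whether a parent variable is forwarded as-is. Assumption:
// "locale" means LANG and any LC_* variable.
func keep(name string) bool {
	switch name {
	case "PATH", "HOME", "LANG":
		return true
	}
	return strings.HasPrefix(name, "LC_") || strings.HasPrefix(name, "GADFLY_CLAUDE_")
}

// childEnv is a hypothetical sketch of the CLI subprocess environment: an
// allow-list copy of the parent env plus exactly one auth credential.
// Everything else (GITEA_TOKEN, provider keys, ...) is dropped.
func childEnv(parent []string) []string {
	vars := make(map[string]string, len(parent))
	var out []string
	for _, kv := range parent {
		name, value, ok := strings.Cut(kv, "=")
		if !ok {
			continue // skip malformed entries
		}
		vars[name] = value
		if keep(name) {
			out = append(out, kv)
		}
	}
	// Subscription token first, API key as the fallback.
	if tok := vars["CLAUDE_CODE_OAUTH_TOKEN"]; tok != "" {
		out = append(out, "CLAUDE_CODE_OAUTH_TOKEN="+tok)
	} else if key := vars["ANTHROPIC_API_KEY"]; key != "" {
		out = append(out, "ANTHROPIC_API_KEY="+key)
	}
	return out
}

func main() {
	for _, kv := range childEnv(os.Environ()) {
		name, _, _ := strings.Cut(kv, "=")
		fmt.Println(name) // print names only, never secret values
	}
}
```

The result would be assigned to `cmd.Env`; a non-nil `Env` replaces the parent environment instead of extending it, which is what keeps the Gitea token out.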

> **Untested, like the cloud providers.** This wires the CLI in and is exercised by its unit
> tests, but a live subscription-auth run hasn't been validated end-to-end here, and using
> subscription auth in automated CI is a gray area in Anthropic's terms. `auto` specialist
> selection and the `delegate_investigation` worker are majordomo-only and are skipped with this
> engine (Claude Code does its own legwork).

### Endpoint aliases via env vars

For multiple named backends (e.g. a couple of Ollama boxes on your LAN), register them by

| `GADFLY_PROVIDER` | `ollama-cloud` | provider prefix for a bare model id |
| `GADFLY_BASE_URL` | — | override endpoint (OpenAI/Ollama-compatible servers) |
| `GADFLY_API_KEY` | — | provider key; falls back to the provider's standard env var |
| `claude-code` model id | — | route a model through the bundled Claude Code CLI (`claude-code` / `claude-code/<model>`); see [Claude Code engine](#claude-code-engine-claude-code) for its `GADFLY_CLAUDE_*` knobs |
| `GADFLY_SPECIALISTS` | default suite | csv of lenses, `all`, or `auto` (dynamic selection) |
| `GADFLY_SELECTOR_MODEL` | review model | model that picks lenses in `auto` mode |
| `GADFLY_WORKER_MODEL` | — | cheap model for `delegate_investigation`; unset = no delegation |