docs: document the Gadfly adversarial review loop in CLAUDE.md
Records the PR workflow: push work to a PR (never straight to main), wait for Gadfly to finish and weigh its findings, then grade each finding back to the gadfly-reports MCP (record_finding_grade / list_findings / scoreboard) so the telemetry can measure whether each model earns its keep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -129,6 +129,27 @@ CI: `.gitea/workflows/ci.yaml` (Gitea Actions, mirrors foreman). README.md
must match reality in the same commit that changes behavior — no
aspirational docs; unbuilt features are marked pending in the matrix.
## Adversarial review loop (Gadfly)
Ship work through PRs and let Gadfly review it before merge:
- **Push to a PR, never straight to `main`.** Branch, push, open a PR.
`.gitea/workflows/adversarial-review.yml` runs Gadfly (the standalone
agentic adversarial reviewer) — a full fleet of 9 ollama-cloud models +
the M1/M5 Macs via foreman, each running the 3-lens suite (security,
correctness, error-handling). Advisory only; it never blocks the merge.
- **Wait for Gadfly to finish, then read its output.** Don't merge while the
review is still running. Each model posts one consolidated comment; weigh
every finding on its merits and fix the real ones (Gadfly is a simple
system — findings are advisory, so confirm before acting).
- **Grade the findings back to the Gadfly MCP.** For each finding, call
`mcp__gadfly__record_finding_grade`: `is_real=true` + a `severity`
(trivial|small|medium|high|critical) for a genuine problem, or
`is_real=false` for a false positive; add `notes`/`usefulness` when
useful. Use `mcp__gadfly__list_findings` (`only_ungraded=true`) to find
what still needs grading and `mcp__gadfly__scoreboard` for the per-model
rollup. This telemetry is how we measure whether each model earns its keep.
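
The grading step above can be sketched as a small helper that assembles the arguments for one `mcp__gadfly__record_finding_grade` call. This is only an illustration under stated assumptions: the `finding_id` key and the `build_grade` helper are hypothetical names, and the rule that a false positive carries no `severity` is an inference from the wording above, not a documented constraint of the MCP. Only `is_real`, `severity`, and `notes` come from the text.

```python
from typing import Optional

# Severity scale as listed in the workflow above.
SEVERITIES = ("trivial", "small", "medium", "high", "critical")


def build_grade(
    finding_id: str,  # hypothetical identifier key; the real parameter name may differ
    is_real: bool,
    severity: Optional[str] = None,
    notes: Optional[str] = None,
) -> dict:
    """Assemble the arguments for one record_finding_grade call (sketch)."""
    if is_real:
        # A genuine problem needs a severity from the documented scale.
        if severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")
    elif severity is not None:
        # Assumption: false positives are graded without a severity.
        raise ValueError("a false positive should not carry a severity")

    args = {"finding_id": finding_id, "is_real": is_real}
    if severity is not None:
        args["severity"] = severity
    if notes:
        args["notes"] = notes
    return args
```

For example, `build_grade("f-1", True, "high")` yields a payload for a real, high-severity finding, while `build_grade("f-2", False, notes="duplicate of f-1")` yields one for a false positive. Validating locally before the call keeps malformed grades out of the scoreboard telemetry.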
## Out of scope (anti-creep)
No persistent store (health is in-memory behind the registry), no