feat: cross-model consensus consolidation (one ranked comment, not N walls)
With >=2 models the swarm now posts ONE consensus comment instead of a per-model comment each. Every model writes its structured findings to a shared dir (GADFLY_FINDINGS_OUT); after the swarm finishes, a consolidation pass (the binary in GADFLY_CONSOLIDATE_DIR mode) clusters findings by location (±3 lines), counts how many models independently flagged each, escalates to the highest reported severity, and renders an agreement-ranked table — cross-model agreement being the strongest real-vs-false-positive signal we have. - consensus.go: modelFindings artifact, collectFindings (shared with emit), writeFindingsOut, runConsolidate, clustering + renderConsensus. Lone low-severity findings fold away; a lone critical still surfaces; each model's full review is preserved folded inside the one comment. - main.go: GADFLY_CONSOLIDATE_DIR dispatches to consolidation; writeFindingsOut after a normal review. - emit.go: reuse collectFindings (one definition of "what a finding is"). - run.sh: when GADFLY_CONSOLIDATE=1, write findings + skip the per-model comment (live progress still on the status board). - entrypoint.sh: auto-enable for >=2 models; run the consolidator after the barrier and upsert ONE consensus comment; FALL BACK to per-model comments if consolidation yields nothing (advisory invariant: never lose output). - README + reusable workflow (`consolidate` input) + entrypoint docs updated. Tests: clustering/agreement/tolerance, severity escalation, nit folding, lone critical, write→consolidate round-trip, empty-dir error. gofmt/vet clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -277,6 +277,35 @@ every `GADFLY_STATUS_POLL_SECS` (default 12s) until the swarm finishes. It's adv
|
||||
best-effort — the per-model findings comments are unaffected — and entirely separate from those.
|
||||
Turn it off with `GADFLY_STATUS_BOARD=0`.
|
||||
|
||||
### Consensus consolidation
|
||||
|
||||
With **two or more models**, posting one comment each means a reader faces N walls of prose that
|
||||
mostly agree. Instead Gadfly consolidates: every model writes its findings to a shared file, and
|
||||
after the whole swarm finishes a single pass clusters those findings by location, counts **how
|
||||
many models independently flagged each one**, and posts **one consensus comment**:
|
||||
|
||||
```
|
||||
## 🪰 Gadfly review — consensus across 7 models
|
||||
|
||||
**Verdict: Blocking issues found** · 9 findings (3 with multi-model agreement)
|
||||
|
||||
| | Finding | Where | Models | Lens |
|
||||
|--|--|--|--|--|
|
||||
| 🔴 | Auth bypass: token not verified | `auth/login.go:42` | 6/7 | security |
|
||||
| 🟠 | Unbounded retry loop | `sync/worker.go:88` | 3/7 | error-handling |
|
||||
|
||||
<details><summary>4 single-model findings (lower confidence)</summary> … </details>
|
||||
<details><summary>Per-model detail</summary> … each model's full review, folded … </details>
|
||||
```
|
||||
|
||||
Cross-model agreement is the strongest real-vs-false-positive signal available, so findings are
|
||||
ranked by it (a lone low-severity finding folds away; a lone *critical* still surfaces). The
|
||||
per-model comments are suppressed in this mode — each model's full review is preserved, folded,
|
||||
inside the consensus comment — and nothing is lost: if consolidation can't run, Gadfly falls
|
||||
back to posting the per-model comments. Controlled by `GADFLY_CONSOLIDATE`: `auto` (default — on
|
||||
for ≥2 models), `1` (force on), `0` (force off, one comment per model). Single-model runs are
|
||||
unaffected.
|
||||
|
||||
### Triggers
|
||||
|
||||
1. A **new/reopened/ready** non-draft PR — automatic.
|
||||
@@ -362,6 +391,7 @@ The reviewer binary reads these (the stub/entrypoint set sane defaults):
|
||||
| `GADFLY_MAX_DIFF_CHARS` | 60000 | diff chars embedded in the prompt (full diff via `get_diff`) |
|
||||
| `GADFLY_STATUS_BOARD` | on | set `0` to disable the live status-board comment |
|
||||
| `GADFLY_STATUS_POLL_SECS` | 12 | how often the status board re-renders/upserts |
|
||||
| `GADFLY_CONSOLIDATE` | `auto` | cross-model consensus comment: `auto` (on for ≥2 models), `1` (force on), `0` (off — one comment per model) |
|
||||
| `GADFLY_TRIGGER_PHRASE` | `@gadfly review` | comment phrase that re-triggers |
|
||||
| `GADFLY_ALLOWED_USERS` | *(collaborators)* | comma-separated allow-list for comment triggers |
|
||||
| `GADFLY_FINDINGS_URL` | — | gadfly-reports store base URL; set to enable findings telemetry (off when empty) |
|
||||
|
||||
Reference in New Issue
Block a user