From PR #17's own adversarial review (now running on the Phase-1 image — note
the much cleaner, less-fragmented findings). Graded in gadfly-reports.
- Never lose output (advisory invariant): run.sh writes a stub findings file
when a model fails/skips under consolidation, so a failed model still shows
(as failed) and an all-models-fail run still posts a comment.
- Errored models excluded from the agreement denominator (modelFindings.Errored
+ allErrored); shown folded as "reviewer failed", not counted as clean.
- Sliding cluster window: span [line,maxLine] widened by tolerance on both
edges, so chains of nearby findings merge instead of splitting (fixes
agreement under-counting). Clustering is now per-file (no global O(n²) scan).
- Lone-finding headline threshold raised high -> critical (matches the docs);
only consensus or a lone critical earns the headline, the rest fold.
- findings_file_for appends a cksum so foo:bar vs foo/bar can't collide.
- mdCell escapes the location cell + neutralizes backticks (table-injection);
"No material issues" no longer shown when folded findings exist.
- Removed dead cluster.detail; named findingRef type; single partition/agreed
pass; writeFindingsOut MkdirAll; captureStdout defer-restore + close.
Tests added: sliding-window chain merge, errored-denominator exclusion, lone
HIGH folds. gofmt/vet/bash -n clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
With >=2 models the swarm now posts ONE consensus comment instead of a
per-model comment each. Every model writes its structured findings to a shared
dir (GADFLY_FINDINGS_OUT); after the swarm finishes, a consolidation pass
(the binary in GADFLY_CONSOLIDATE_DIR mode) clusters findings by location
(±3 lines), counts how many models independently flagged each, escalates to the
highest reported severity, and renders an agreement-ranked table — cross-model
agreement being the strongest real-vs-false-positive signal we have.
- consensus.go: modelFindings artifact, collectFindings (shared with emit),
writeFindingsOut, runConsolidate, clustering + renderConsensus. Lone
low-severity findings fold away; a lone critical still surfaces; each model's
full review is preserved folded inside the one comment.
- main.go: GADFLY_CONSOLIDATE_DIR dispatches to consolidation; writeFindingsOut
after a normal review.
- emit.go: reuse collectFindings (one definition of "what a finding is").
- run.sh: when GADFLY_CONSOLIDATE=1, write findings + skip the per-model comment
(live progress still on the status board).
- entrypoint.sh: auto-enable for >=2 models; run the consolidator after the
barrier and upsert ONE consensus comment; FALL BACK to per-model comments if
consolidation yields nothing (advisory invariant: never lose output).
- README + reusable workflow (`consolidate` input) + entrypoint docs updated.
Tests: clustering/agreement/tolerance, severity escalation, nit folding, lone
critical, write→consolidate round-trip, empty-dir error. gofmt/vet clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>