feat: cross-model consensus consolidation (one ranked comment, not N walls) #17

Merged
steve merged 2 commits from feat/consensus-consolidation into main 2026-06-28 22:56:15 +00:00

2 Commits

Author SHA1 Message Date
steve 58c46db695 fix: address swarm review of consensus consolidation (PR #17)
Build & push image / build-and-push (pull_request) Successful in 6s
From PR #17's own adversarial review (now running on the Phase-1 image — note
the much cleaner, less-fragmented findings). Graded in gadfly-reports.

- Never lose output (advisory invariant): run.sh writes a stub findings file
  when a model fails/skips under consolidation, so a failed model still shows
  (as failed) and an all-models-fail run still posts a comment.
- Errored models excluded from the agreement denominator (modelFindings.Errored
  + allErrored); shown folded as "reviewer failed", not counted as clean.
- Sliding cluster window: span [line,maxLine] widened by tolerance on both
  edges, so chains of nearby findings merge instead of splitting (fixes
  agreement under-counting). Clustering is now per-file (no global O(n²) scan).
- Lone-finding headline threshold raised high -> critical (matches the docs);
  only consensus or a lone critical earns the headline, the rest fold.
- findings_file_for appends a cksum so foo:bar vs foo/bar can't collide.
- mdCell escapes the location cell + neutralizes backticks (table-injection);
  "No material issues" no longer shown when folded findings exist.
- Removed dead cluster.detail; named findingRef type; single partition/agreed
  pass; writeFindingsOut MkdirAll; captureStdout defer-restore + close.

Tests added: sliding-window chain merge, errored-denominator exclusion, lone
HIGH folds. gofmt/vet/bash -n clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 18:56:05 -04:00
steve 052833d830 feat: cross-model consensus consolidation (one ranked comment, not N walls)
Build & push image / build-and-push (pull_request) Successful in 4s
Adversarial Review (Gadfly) / review (pull_request) Successful in 16m16s
With >=2 models the swarm now posts ONE consensus comment instead of a
per-model comment each. Every model writes its structured findings to a shared
dir (GADFLY_FINDINGS_OUT); after the swarm finishes, a consolidation pass
(the binary in GADFLY_CONSOLIDATE_DIR mode) clusters findings by location
(±3 lines), counts how many models independently flagged each, escalates to the
highest reported severity, and renders an agreement-ranked table — cross-model
agreement being the strongest real-vs-false-positive signal we have.

- consensus.go: modelFindings artifact, collectFindings (shared with emit),
  writeFindingsOut, runConsolidate, clustering + renderConsensus. Lone
  low-severity findings fold away; a lone critical still surfaces; each model's
  full review is preserved folded inside the one comment.
- main.go: GADFLY_CONSOLIDATE_DIR dispatches to consolidation; writeFindingsOut
  after a normal review.
- emit.go: reuse collectFindings (one definition of "what a finding is").
- run.sh: when GADFLY_CONSOLIDATE=1, write findings + skip the per-model comment
  (live progress still on the status board).
- entrypoint.sh: auto-enable for >=2 models; run the consolidator after the
  barrier and upsert ONE consensus comment; FALL BACK to per-model comments if
  consolidation yields nothing (advisory invariant: never lose output).
- README + reusable workflow (`consolidate` input) + entrypoint docs updated.

Tests: clustering/agreement/tolerance, severity escalation, nit folding, lone
critical, write→consolidate round-trip, empty-dir error. gofmt/vet clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 18:35:17 -04:00