feat: cross-model consensus consolidation (one ranked comment, not N walls)
Build & push image / build-and-push (pull_request) Successful in 4s
Adversarial Review (Gadfly) / review (pull_request) Successful in 16m16s

With >=2 models the swarm now posts ONE consensus comment instead of a
per-model comment each. Every model writes its structured findings to a shared
dir (GADFLY_FINDINGS_OUT); after the swarm finishes, a consolidation pass
(the binary in GADFLY_CONSOLIDATE_DIR mode) clusters findings by location
(±3 lines), counts how many models independently flagged each, escalates to the
highest reported severity, and renders an agreement-ranked table — cross-model
agreement being the strongest real-vs-false-positive signal we have.

- consensus.go: modelFindings artifact, collectFindings (shared with emit),
  writeFindingsOut, runConsolidate, clustering + renderConsensus. Lone
  low-severity findings fold away; a lone critical still surfaces; each model's
  full review is preserved folded inside the one comment.
- main.go: GADFLY_CONSOLIDATE_DIR dispatches to consolidation; writeFindingsOut
  after a normal review.
- emit.go: reuse collectFindings (one definition of "what a finding is").
- run.sh: when GADFLY_CONSOLIDATE=1, write findings + skip the per-model comment
  (live progress still on the status board).
- entrypoint.sh: auto-enable for >=2 models; run the consolidator after the
  barrier and upsert ONE consensus comment; FALL BACK to per-model comments if
  consolidation yields nothing (advisory invariant: never lose output).
- README + reusable workflow (`consolidate` input) + entrypoint docs updated.

Tests: clustering/agreement/tolerance, severity escalation, nit folding, lone
critical, write→consolidate round-trip, empty-dir error. gofmt/vet clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-28 18:35:17 -04:00
parent 84b891b1ba
commit 052833d830
8 changed files with 718 additions and 32 deletions
+22 -5
View File
@@ -50,6 +50,13 @@ MAX_DIFF_CHARS="${MAX_DIFF_CHARS:-60000}"
MARKER="<!-- gadfly-review:${PROVIDER}:${MODEL} -->"
say() { echo "[gadfly-review:${PROVIDER}:${MODEL}] $*" >&2; }
# When the swarm is consolidating (GADFLY_CONSOLIDATE=1, set by entrypoint.sh for
# a multi-model run), this model does NOT post its own comment — it writes its
# findings to GADFLY_FINDINGS_OUT and a single cross-model consensus comment is
# posted after the whole swarm finishes. Live progress still shows on the status
# board. Default 0 (post a per-model comment, the standalone behavior).
CONSOLIDATE="${GADFLY_CONSOLIDATE:-0}"
# Display the model's ACTUAL backend: the provider segment of the spec
# ("m1pro/qwen3.6:35b-mlx" -> "m1pro"); a bare id uses GADFLY_PROVIDER (default
# ollama-cloud). This is what the comment header shows, not the run.sh lane.
@@ -126,7 +133,9 @@ USR="$(printf 'PR #%s: %s\n\nDescription:\n%s\n\nUnified diff to review:\n```dif
# --- announce start (placeholder comment) -----------------------------------
START_TS="$(date +%s)"
say "starting review with ${MODEL}"
upsert_comment "$(printf '%s\n### 🪰 Gadfly review — `%s` (%s)\n\n⏳ Reviewing… this comment will update with findings and run time.' \
# Skip the per-model placeholder when consolidating (the consensus comment is
# posted later; live progress is on the status board).
[ "$CONSOLIDATE" = "1" ] || upsert_comment "$(printf '%s\n### 🪰 Gadfly review — `%s` (%s)\n\n⏳ Reviewing… this comment will update with findings and run time.' \
"$MARKER" "$MODEL" "$MODEL_PROVIDER")"
# --- call the model ---------------------------------------------------------
@@ -170,6 +179,7 @@ case "$PROVIDER" in
GADFLY_BODY="$BODY" \
GADFLY_MAX_DIFF_CHARS="$MAX_DIFF_CHARS" \
GADFLY_STATUS_FILE="${GADFLY_STATUS_FILE:-}" \
GADFLY_FINDINGS_OUT="${GADFLY_FINDINGS_OUT:-}" \
"$BIN" 2>"$ERR_FILE"
)"
rc=$?
@@ -204,7 +214,14 @@ esac
# --- assemble + post final comment (with run time) --------------------------
ELAPSED="$(( $(date +%s) - START_TS ))"
DUR="$(fmt_duration "$ELAPSED")"
COMMENT="$(printf '%s\n### 🪰 Gadfly review — `%s` (%s)\n\n%s\n\n<sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in %s</sub>' \
"$MARKER" "$MODEL" "$MODEL_PROVIDER" "$REVIEW" "$DUR")"
upsert_comment "$COMMENT"
say "done in ${DUR}"
# When consolidating, the binary has written this model's findings to
# GADFLY_FINDINGS_OUT; the consensus comment is posted by entrypoint.sh after the
# whole swarm finishes, so this model posts no comment of its own.
if [ "$CONSOLIDATE" = "1" ]; then
say "done in ${DUR} (consolidated; no per-model comment)"
else
COMMENT="$(printf '%s\n### 🪰 Gadfly review — `%s` (%s)\n\n%s\n\n<sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in %s</sub>' \
"$MARKER" "$MODEL" "$MODEL_PROVIDER" "$REVIEW" "$DUR")"
upsert_comment "$COMMENT"
say "done in ${DUR}"
fi