feat: structured findings contract (machine-readable gadfly-findings block)

Each lens now appends a fenced ```gadfly-findings JSON array of its findings ({file, line, severity, confidence, title}) after the prose review. The binary parses it exactly instead of scraping prose with a path:line regex, so telemetry carries PER-FINDING severity (not just the lens-level verdict) and downstream consolidation gets a clean contract to cluster on. - system-prompt.txt + recheck prompt: ask for / regenerate the block. - findings.go: extractStructuredFindings + stripFindingsBlock, with severity normalization and prose-paragraph detail enrichment. - emit.go: prefer the structured block; gracefully fall back to the heuristic scrape when a model doesn't emit a parseable one (provider-agnostic, robust). - consolidate.go: strip the block from the rendered comment (tooling-only). This is the enabling refactor for consensus consolidation + inline reviews. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 18:02:53 -04:00
parent 6e87a3e73f
commit 9bcd7b9696
6 changed files with 338 additions and 7 deletions
@@ -43,3 +43,21 @@ Output rules:
 - Only report issues you are reasonably confident are real after checking. If the diff
  is clean, say so plainly rather than inventing nits.
 - When you are done investigating, STOP calling tools and reply with the final review.
+
+Machine-readable findings — AFTER the prose review, append ONE fenced code block,
+tagged `gadfly-findings`, holding a JSON array of the SAME findings you described above
+(this block is consumed by tooling and hidden from the rendered comment):
+
+```gadfly-findings
+[
+  {"file": "path/to/file.go", "line": 123, "severity": "high", "confidence": "high", "title": "one-line summary of the issue"}
+]
+```
+
+- One object per real finding, in the same order as your prose. `file`/`line` must be a
+  concrete location you verified (the line the issue is at). `severity` is one of
+  `critical`, `high`, `medium`, `small`, `trivial`. `confidence` is your post-verification
+  confidence the issue is real: one of `high`, `medium`, `low`.
+- Include ONLY genuine problems — never verification notes ("confirmed X is safe at f:line"),
+  and never an "Outside my lens:" aside. If your lens is clean, emit an empty array `[]`.
+- This block is in ADDITION to the prose; do not drop the human-readable findings.