Files
gadfly/cmd/gadfly/recheck.go
T
steve f5ca813af3
Build & push image / build-and-push (pull_request) Successful in 8s
fix: address gadfly swarm review of the executus re-platform
Folds in the real findings the dogfood swarm raised on PR #20 (21 graded
real, 1 false positive):

- tools.go: anchor get_diff `path` matching to whole path tokens (foo.go no
  longer pulls in barfoo.go; a trailing "/" still scopes a directory); split
  the diff once + cache it; drop the spurious trailing blank line; fix the
  "truncated after line N" off-by-one wording. Share one tool-list source
  (allTools) between toolbox() and the executus registry.
- executus.go: drop the dead Config.Defaults caps (per-run RunnableAgent always
  overrides them); shared envBool/reviewTimeout helpers; resolveContextTokens
  logs a failed lookup and uses a 5s timeout (was 15s); note the budget guard
  is pass-granular (the wall-clock backstop covers mid-pass).
- main.go/recheck.go: shared envBool; fix package-doc drift (the removed
  finalization fallback, the paginated get_diff).
- entrypoint.sh/run.sh: export GADFLY_MAX_DIFF_CHARS directly (run.sh prefers
  it); guard the watchdog's delayed SIGKILL on a .disarmed marker so it can't
  catch the consolidation pass.
- tests: anchoring test; corrected obsolete env var + truncation-wording asserts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 11:32:23 -04:00

101 lines
5.0 KiB
Go

package main
import (
"fmt"
"strings"
)
// defaultRecheckMaxSteps bounds the verification pass. It is smaller than the
// review pass: re-checking a handful of existing findings needs fewer steps
// than discovering them.
const defaultRecheckMaxSteps = 16
// defaultRecheckDiffChars caps the diff embedded in the recheck task. It is much
// smaller than the review task's GADFLY_MAX_DIFF_CHARS: the recheck already has
// the draft findings to verify and can pull the exact hunks it needs via the
// paginated get_diff tool (optionally scoped to a path), so re-embedding the
// whole diff on every recheck step is pure burn. Override: GADFLY_RECHECK_DIFF_CHARS.
const defaultRecheckDiffChars = 20000
// recheckSystemPrompt drives the second, adversarial verification pass. The
// model is given a DRAFT review and must independently confirm each finding
// against the real code before letting it survive — the antidote to a
// single-pass reviewer that reads a couple of files, mis-connects them, and
// posts a confident but wrong "blocking" verdict.
const recheckSystemPrompt = `You are a VERIFICATION GATE for an automated adversarial code review. You are
given a DRAFT review produced by another model. Your job is NOT to write a new
review — it is to confirm or reject each finding in the draft against the ACTUAL
code, then output the corrected review.
You have read-only access to the checked-out repository — use your tools to read
files and search the code to independently verify each finding against the real
source.
For EVERY finding in the draft:
1. Independently reproduce the reasoning by reading the actual files with your
tools — do not trust the draft's claim, and do not trust the diff hunk alone.
2. KEEP the finding only if you can positively confirm it against the code.
3. DROP the finding if you cannot confirm it, or if the code contradicts it.
Watch especially for findings that ignore the "glue" around a change — the most
common false positive. Before keeping a claim that something is "missing",
"undefined", "never set", "not exported", or "won't compile", GREP THE WHOLE
REPO for it: the thing is very often satisfied in a place the original reviewer
didn't look — a shell script or Makefile that sets an env var, a CI YAML, an
adjacent file, generated code, or a wrapper that maps one name to another. A
finding that an env var X is unset is wrong if any script invokes the program
with "X=... prog". Check before you keep.
Output rules:
- Output the corrected review in the SAME format as the draft: a one-line
VERDICT ("No material issues found", "Minor issues", or "Blocking issues
found"), then the surviving findings as bullets with path:line and impact.
- Recompute the VERDICT from what SURVIVES. If every finding was dropped, the
verdict is "No material issues found".
- Do NOT invent new findings; this is a verification gate, not a fresh review.
- Do NOT include meta-commentary about the verification process or which
findings you dropped — output only the final, corrected review markdown.
- The draft ends with a fenced ` + "`gadfly-findings`" + ` JSON block. Regenerate it
so it lists ONLY the findings that SURVIVED your verification, in the same schema
({"file","line","severity","confidence","title"}; severity one of
critical/high/medium/small/trivial, confidence one of high/medium/low). If every
finding was dropped, emit an empty array ` + "`[]`" + `. Keep the block last.
- When done investigating, STOP calling tools and reply with the review.`
// recheckEnabled reports whether the verification pass should run. On unless
// GADFLY_RECHECK is explicitly a falsey value.
func recheckEnabled() bool { return envBool("GADFLY_RECHECK", true) }
// shouldRecheck decides whether to run the verification pass for a given draft.
// A clean "no material issues" draft has nothing to verify, so it is skipped
// even when rechecking is enabled — saving a whole model pass on clean PRs.
func shouldRecheck(draft string) bool {
if !recheckEnabled() {
return false
}
if strings.Contains(strings.ToLower(draft), "no material issues") {
return false
}
return true
}
// buildRecheckTask is the verification pass's user message: the draft review to
// scrutinize, with the full diff available via get_diff (and embedded here,
// truncated, to save a tool call).
func buildRecheckTask(draft, diff string) string {
maxDiff := envInt("GADFLY_RECHECK_DIFF_CHARS", defaultRecheckDiffChars)
truncNote := ""
if maxDiff > 0 && len(diff) > maxDiff {
diff = diff[:maxDiff]
truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars here; call get_diff (paginated; pass a `path` to scope it to one file) or read the changed files for the rest.]", maxDiff)
}
var b strings.Builder
b.WriteString("Verify the following DRAFT review against the actual code, drop every finding you cannot confirm, and output the corrected review.\n\n")
b.WriteString("## Draft review\n\n")
b.WriteString(draft)
b.WriteString("\n\n## PR diff under review\n\n")
fmt.Fprintf(&b, "```diff\n%s\n```%s", diff, truncNote)
return b.String()
}