f5ca813af3
Folds in the real findings the dogfood swarm raised on PR #20 (21 graded real, 1 false positive):

- tools.go: anchor get_diff `path` matching to whole path tokens (foo.go no longer pulls in barfoo.go; a trailing "/" still scopes a directory); split the diff once + cache it; drop the spurious trailing blank line; fix the "truncated after line N" off-by-one wording. Share one tool-list source (allTools) between toolbox() and the executus registry.
- executus.go: drop the dead Config.Defaults caps (per-run RunnableAgent always overrides them); shared envBool/reviewTimeout helpers; resolveContextTokens logs a failed lookup and uses a 5s timeout (was 15s); note the budget guard is pass-granular (the wall-clock backstop covers mid-pass).
- main.go/recheck.go: shared envBool; fix package-doc drift (the removed finalization fallback, the paginated get_diff).
- entrypoint.sh/run.sh: export GADFLY_MAX_DIFF_CHARS directly (run.sh prefers it); guard the watchdog's delayed SIGKILL on a .disarmed marker so it can't catch the consolidation pass.
- tests: anchoring test; corrected obsolete env var + truncation-wording asserts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
101 lines
5.0 KiB
Go
package main

import (
	"fmt"
	"strings"
)

// defaultRecheckMaxSteps bounds the verification pass. It is smaller than the
// review pass: re-checking a handful of existing findings needs fewer steps
// than discovering them.
const defaultRecheckMaxSteps = 16

// defaultRecheckDiffChars caps the diff embedded in the recheck task. It is much
// smaller than the review task's GADFLY_MAX_DIFF_CHARS: the recheck already has
// the draft findings to verify and can pull the exact hunks it needs via the
// paginated get_diff tool (optionally scoped to a path), so re-embedding the
// whole diff on every recheck step is pure burn. Override: GADFLY_RECHECK_DIFF_CHARS.
const defaultRecheckDiffChars = 20000

// recheckSystemPrompt drives the second, adversarial verification pass. The
// model is given a DRAFT review and must independently confirm each finding
// against the real code before letting it survive: the antidote to a
// single-pass reviewer that reads a couple of files, mis-connects them, and
// posts a confident but wrong "blocking" verdict.
const recheckSystemPrompt = `You are a VERIFICATION GATE for an automated adversarial code review. You are
given a DRAFT review produced by another model. Your job is NOT to write a new
review — it is to confirm or reject each finding in the draft against the ACTUAL
code, then output the corrected review.

You have read-only access to the checked-out repository — use your tools to read
files and search the code to independently verify each finding against the real
source.

For EVERY finding in the draft:
1. Independently reproduce the reasoning by reading the actual files with your
tools — do not trust the draft's claim, and do not trust the diff hunk alone.
2. KEEP the finding only if you can positively confirm it against the code.
3. DROP the finding if you cannot confirm it, or if the code contradicts it.

Watch especially for findings that ignore the "glue" around a change — the most
common false positive. Before keeping a claim that something is "missing",
"undefined", "never set", "not exported", or "won't compile", GREP THE WHOLE
REPO for it: the thing is very often satisfied in a place the original reviewer
didn't look — a shell script or Makefile that sets an env var, a CI YAML, an
adjacent file, generated code, or a wrapper that maps one name to another. A
finding that an env var X is unset is wrong if any script invokes the program
with "X=... prog". Check before you keep.

Output rules:
- Output the corrected review in the SAME format as the draft: a one-line
VERDICT ("No material issues found", "Minor issues", or "Blocking issues
found"), then the surviving findings as bullets with path:line and impact.
- Recompute the VERDICT from what SURVIVES. If every finding was dropped, the
verdict is "No material issues found".
- Do NOT invent new findings; this is a verification gate, not a fresh review.
- Do NOT include meta-commentary about the verification process or which
findings you dropped — output only the final, corrected review markdown.
- The draft ends with a fenced ` + "`gadfly-findings`" + ` JSON block. Regenerate it
so it lists ONLY the findings that SURVIVED your verification, in the same schema
({"file","line","severity","confidence","title"}; severity one of
critical/high/medium/small/trivial, confidence one of high/medium/low). If every
finding was dropped, emit an empty array ` + "`[]`" + `. Keep the block last.
- When done investigating, STOP calling tools and reply with the review.`

// recheckEnabled reports whether the verification pass should run. On unless
// GADFLY_RECHECK is explicitly a falsey value.
func recheckEnabled() bool { return envBool("GADFLY_RECHECK", true) }

// shouldRecheck decides whether to run the verification pass for a given draft.
// A clean "no material issues" draft has nothing to verify, so it is skipped
// even when rechecking is enabled, saving a whole model pass on clean PRs.
func shouldRecheck(draft string) bool {
	if !recheckEnabled() {
		return false
	}
	if strings.Contains(strings.ToLower(draft), "no material issues") {
		return false
	}
	return true
}

// buildRecheckTask is the verification pass's user message: the draft review to
// scrutinize, with the full diff available via get_diff (and embedded here,
// truncated, to save a tool call).
func buildRecheckTask(draft, diff string) string {
	maxDiff := envInt("GADFLY_RECHECK_DIFF_CHARS", defaultRecheckDiffChars)
	truncNote := ""
	if maxDiff > 0 && len(diff) > maxDiff {
		diff = diff[:maxDiff]
		truncNote = fmt.Sprintf("\n\n[NOTE: diff truncated to %d chars here; call get_diff (paginated; pass a `path` to scope it to one file) or read the changed files for the rest.]", maxDiff)
	}

	var b strings.Builder
	b.WriteString("Verify the following DRAFT review against the actual code, drop every finding you cannot confirm, and output the corrected review.\n\n")
	b.WriteString("## Draft review\n\n")
	b.WriteString(draft)
	b.WriteString("\n\n## PR diff under review\n\n")
	fmt.Fprintf(&b, "```diff\n%s\n```%s", diff, truncNote)
	return b.String()
}