Fixes the large-PR token burn: a ~250K-token diff was re-sent every agent
step across models × lenses × passes, draining a metered usage block in
minutes. Small PRs are untouched (every mitigation is size-gated / no-op
under threshold).
- Re-platform the in-process review path onto executus run.Executor: context
compaction (executus/compact, threshold from the model's real context window
via executus/model), run-bounding, a per-PR budget gate (Ports.Budget), and
the wrap-up nudge re-expressed as a run.Critic. Lens fan-out now uses
executus/fanout. gadfly keeps its own model.go, so GADFLY_ENDPOINT_<NAME>
aliases and the claude-code engine are unaffected. No majordomo bump; the
binary stays static (executus core is majordomo+stdlib only).
- Paginate get_diff (per-file `path` + start_line/limit) instead of dumping the
whole diff; trim the recheck diff embed (60k -> 20k chars).
- entrypoint.sh: downshift the fleet above GADFLY_HUGE_DIFF_BYTES (one cheap
model, fewer lenses/steps, no recheck) + a swarm-wide GADFLY_PR_BUDGET_SECS
wall-clock backstop (adds procps for pkill). All advisory; CI never fails.
- README + CLAUDE.md + tests updated.
Note: run.Result exposes no transcript, so the old transcript-based forced-
finalization fallback is dropped; the wrap-up critic nudge is the remaining
"always emit something" mechanism.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Five fixes, several surfaced by the live bake-off:
- PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist
(own context), not shared across the suite. A slow model (e.g. a 35B local MLX)
was exhausting the whole 600s budget on lens 1, leaving the rest "step 0:
context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go.
- ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean".
Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses
errored"); the section reads "⚠️ could not complete". consolidate.go.
- PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the
spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default
(was wrongly "ollama-cloud" for local models). scripts/run.sh.
- LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens
stays in its lane, says "nothing in my area" rather than re-reporting another
lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive-
constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go.
- RUN TIMING: run.sh posts a "⏳ Reviewing…" placeholder at model start and updates
it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).
- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in
Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/
find_files/get_diff), verifies findings before reporting, two-pass review +
adversarial recheck, posts one labeled comment per model. Advisory only.
- cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo
- entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML)
- Dockerfile: multi-stage; build-time module token never reaches the final image
- .gitea/workflows/build-image.yml: tag v* → build & push image
- examples/: ~15-line consumer stub
- system prompt genericized + hardened to re-derive constants/formulas (semantic bugs)
Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>