gadfly

Author	SHA1	Message	Date
steve	f5ca813af3	fix: address gadfly swarm review of the executus re-platform Build & push image / build-and-push (pull_request) Successful in 8s Details Folds in the real findings the dogfood swarm raised on PR #20 (21 graded real, 1 false positive): - tools.go: anchor get_diff `path` matching to whole path tokens (foo.go no longer pulls in barfoo.go; a trailing "/" still scopes a directory); split the diff once + cache it; drop the spurious trailing blank line; fix the "truncated after line N" off-by-one wording. Share one tool-list source (allTools) between toolbox() and the executus registry. - executus.go: drop the dead Config.Defaults caps (per-run RunnableAgent always overrides them); shared envBool/reviewTimeout helpers; resolveContextTokens logs a failed lookup and uses a 5s timeout (was 15s); note the budget guard is pass-granular (the wall-clock backstop covers mid-pass). - main.go/recheck.go: shared envBool; fix package-doc drift (the removed finalization fallback, the paginated get_diff). - entrypoint.sh/run.sh: export GADFLY_MAX_DIFF_CHARS directly (run.sh prefers it); guard the watchdog's delayed SIGKILL on a .disarmed marker so it can't catch the consolidation pass. - tests: anchoring test; corrected obsolete env var + truncation-wording asserts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 11:32:23 -04:00
steve	b860119332	feat: re-platform agentic review onto executus + large-PR cost controls Build & push image / build-and-push (pull_request) Successful in 11s Details Adversarial Review (Gadfly) / review (pull_request) Successful in 25m56s Details Fixes the large-PR token burn: a ~250K-token diff was re-sent every agent step across models × lenses × passes, draining a metered usage block in minutes. Small PRs are untouched (every mitigation is size-gated / no-op under threshold). - Re-platform the in-process review path onto executus run.Executor: context compaction (executus/compact, threshold from the model's real context window via executus/model), run-bounding, a per-PR budget gate (Ports.Budget), and the wrap-up nudge re-expressed as a run.Critic. Lens fan-out now uses executus/fanout. gadfly keeps its own model.go, so GADFLY_ENDPOINT_<NAME> aliases and the claude-code engine are unaffected. No majordomo bump; the binary stays static (executus core is majordomo+stdlib only). - Paginate get_diff (per-file `path` + start_line/limit) instead of dumping the whole diff; trim the recheck diff embed (60k -> 20k chars). - entrypoint.sh: downshift the fleet above GADFLY_HUGE_DIFF_BYTES (one cheap model, fewer lenses/steps, no recheck) + a swarm-wide GADFLY_PR_BUDGET_SECS wall-clock backstop (adds procps for pkill). All advisory; CI never fails. - README + CLAUDE.md + tests updated. Note: run.Result exposes no transcript, so the old transcript-based forced- finalization fallback is dropped; the wrap-up critic nudge is the remaining "always emit something" mechanism. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 10:43:34 -04:00
steve	3095ebff23	feat: inline COMMENT-state PR review (findings anchored to changed lines) (#18 ) Build & push image / build-and-push (push) Successful in 8s Details Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-29 01:59:36 +00:00
steve	88f74aa768	feat: cross-model consensus consolidation (one ranked comment, not N walls) (#17 ) Build & push image / build-and-push (push) Successful in 9s Details Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-28 22:56:15 +00:00
steve	53971603d3	feat: structured findings contract (machine-readable gadfly-findings block) (#16 ) Build & push image / build-and-push (push) Successful in 5s Details Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-28 22:23:02 +00:00
steve	b23eeb8cbf	feat: bump majordomo + support llama-swap(s) provider spellings (#7 ) Build & push image / build-and-push (push) Successful in 7s Details Bump majordomo to the latest build and accept every llama-swap spelling (llama-swap/llama-swaps + un-hyphenated llamaswap/llamaswaps) in gadfly's endpoint switches; the LLM_* llama-swap(s):// DSN path already worked via majordomo.Parse. README + error messages + endpointProvider alias tests. Swarm review: 8/9 clean; qwen3-coder's "Blocking" was a false positive (claimed llamaswap was untested — it has dedicated test cases). Folded in its one fair nit (README now lists the un-hyphenated aliases). gofmt clean, go vet quiet, go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 23:18:56 +00:00
steve	c342bdb905	feat: add claude-code/opus reviewer + max-thinking spec support (#5 ) Build & push image / build-and-push (push) Successful in 15s Details Adds claude-code/opus to gadfly's dogfood swarm (both sonnet and opus run end-to-end), bumps the image pin to :sha-80d8f53 so the clean-lens telemetry fix is live, and adds engine support for a "claude-code/<model>:max" extended-thinking spec (MAX_THINKING_TOKENS, best-effort). Validated: only 13 findings on this clean PR vs 43 on the comparable #4 — the telemetry fix works. Folded in the swarm's two real findings: a runPass env-injection test and keeping MAX_THINKING_TOKENS in claudeEnv. Follow-up enables claude-code/opus:max once this image builds. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 22:39:14 +00:00
steve	80d8f53f63	fix: clean-lens findings + trim the dogfood swarm to strong reviewers (#4 ) Build & push image / build-and-push (push) Successful in 9s Details emit() now skips findings extraction for a "No material issues found" lens (its path:line refs are verification notes, not problems), fixing the FP inflation that penalized thorough clean-pass reviewers. Also trims the dogfood swarm to the strong reviewers: drops m5/qwen3.6 (last local lane), gemma4, gpt-oss:120b, and kimi-k2.7-code — leaving 6 cloud + claude-code/sonnet. Fittingly, PR #4's own 11-model review produced 43 findings that were ALL clean-verification bullets (zero real) — a live demonstration of the bug this fixes. gofmt clean, go vet quiet, go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 22:14:07 +00:00
steve	82f7ef78d5	feat: claude-code backends + llamaswap provider + dogfood the CC engine (#3 ) Build & push image / build-and-push (push) Successful in 10s Details Phase 2: bump majordomo to latest and wire its new llamaswap provider into gadfly's endpoint switches; add claude-code/sonnet to gadfly's own dogfood swarm (pin :sha-86f12c1, map CLAUDE_CODE_OAUTH_TOKEN) so the Phase-1 engine runs as a live competitor; document the Ollama-through-CC ANTHROPIC_BASE_URL proxy path as example-only. The 11-model swarm (incl. claude-code/sonnet) reviewed it; 52 findings graded via the MCP. Folded in the two real ones: a llamaswap endpointProvider test (caught by claude-code/sonnet, citing CLAUDE.md) and adding "openai-compatible" to the provider error messages (gpt-oss). gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 21:53:41 +00:00
steve	86f12c126f	feat: claude-code reviewer engine (#2 ) Build & push image / build-and-push (push) Successful in 28s Details Phase 1: a second review engine alongside the majordomo agent loop. For each lens, shell out to the Claude Code CLI (`claude -p --output-format json`) inside the checked-out repo so it verifies findings with its own read tools, then reuse gadfly's verdict-parse + recheck + consolidate + emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`; auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY; read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node + @anthropic-ai/claude-code. Also bumped the dogfood pin to the status-board image (PR #2 was the first dogfood with the live board + full fleet). Folded in the swarm's own review findings: minimal subprocess env (no GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr), process-group cleanup on timeout, rune-safe error truncation, and engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings graded via the gadfly MCP. gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 20:40:41 +00:00
steve	c3d09d3bd4	feat: live status-board comment + full-fleet dogfood (#1 ) Build & push image / build-and-push (push) Successful in 6s Details Phase 3: one consolidated, live-updating PR comment aggregating every model's per-lens progress (queued -> running -> finished + verdict), so the swarm's progress is visible at a glance and a watcher can tell when it's done. Opt-in statusWriter in the binary (atomic writes) + a background status-board.sh renderer wired through entrypoint.sh; default on, GADFLY_STATUS_BOARD=0 to disable. Also restores gadfly's dogfood swarm to the full cloud fleet (9 cloud + M5; M1 dropped as too slow) matching mort, and folds in the 3 real bugs the swarm found on its own PR (skip-binary stuck-waiting, panic-stuck lens, busy-loop on bad poll interval). All 36 findings graded via the gadfly MCP (18 real / 18 false-positive). gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com> Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>	2026-06-27 19:00:12 +00:00
steve	d7f364d803	feat: optional findings telemetry — emit runs+findings to a gadfly-reports store Build & push image / build-and-push (push) Successful in 8s Details After each review the binary POSTs the run + its heuristically-extracted findings to GADFLY_FINDINGS_URL (off unless set). Advisory: any error only goes to stderr — never touches stdout, the exit code, or the review. stdlib net/http only (no new deps). entrypoint.sh derives GADFLY_REPO/GADFLY_PR and passes through GADFLY_FINDINGS_URL/GADFLY_FINDINGS_TOKEN. Also renames store references from the old 'docket' name to 'gadfly-reports'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 09:09:51 -04:00
steve	d0de034726	feat: configurable lens fan-out, per-provider like model concurrency Build & push image / build-and-push (push) Successful in 9s Details Specialist lenses ran strictly sequentially within a model. Add a GADFLY_LENS_CONCURRENCY knob (default 1 = unchanged) that overlaps the independent per-lens review+recheck passes, so a model posts its consolidated comment as soon as its lenses finish. Per-provider configurable, mirroring GADFLY_PROVIDER_CONCURRENCY: GADFLY_PROVIDER_LENS_CONCURRENCY takes a "provider=N,..." map keyed by the same provider lanes (modelProvider() mirrors entrypoint's provider_of; providerOverride() mirrors provider_cap). The override wins for the model's lane, else the scalar default. runSpecialists fans out via a bounded worker pool, order-preserving (results written by index) and keeping each lens's own timeout/recheck. repoFS is immutable + fresh-toolbox-per-pass, so lenses share no mutable state (verified under -race). Docs/examples updated; dropped a duplicate GADFLY_TIMEOUT_SECS README row. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 22:53:27 -04:00
steve	6e3a83c437	feat: add foreman provider type for endpoint overrides Build & push image / build-and-push (push) Successful in 7s Details Accept "foreman" in both resolveModel (GADFLY_BASE_URL) and endpointProvider (GADFLY_ENDPOINT_) switches, mapping to majordomo's ollama.Foreman() preset (handles foreman's non-streaming/long-poll quirks). Unlike the HTTPS-only LLM_ foreman:// DSN, the base URL is verbatim, so a plaintext http:// foreman queue works. Tests + README provider table + endpoint-aliases example updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 20:13:47 -04:00
Steve Dudenhoeffer	b409dff4ed	fix: parseVerdict matches leniently + earliest phrase wins Build & push image / build-and-push (push) Successful in 8s Details A section that led with 'Blocking issues' (no 'found') fell through to unknown, so the consolidated header wrongly read 'No material issues found' (seen live on gpt-oss). Now matches 'blocking issue'/'minor issue'/'no material issue' and picks the earliest-appearing phrase (the lead verdict). + tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 21:28:19 -04:00
Steve Dudenhoeffer	92bf22a1be	fix: correctness lens emoji ✅ -> 🎯 (✅ read like 'no issues') Build & push image / build-and-push (push) Successful in 8s Details A green check on a section reporting blocking issues was misleading; 🎯 signals accuracy/on-target. Section verdict text already conveys pass/fail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 20:50:39 -04:00
Steve Dudenhoeffer	49f3623204	fix: per-lens timeout, errored-verdict honesty, accurate provider label, tighter lens focus, run timing Build & push image / build-and-push (push) Successful in 8s Details Five fixes, several surfaced by the live bake-off: - PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist (own context), not shared across the suite. A slow model (e.g. a 35B local MLX) was exhausting the whole 600s budget on lens 1, leaving the rest "step 0: context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go. - ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean". Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses errored"); the section reads "⚠️ could not complete". consolidate.go. - PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default (was wrongly "ollama-cloud" for local models). scripts/run.sh. - LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens stays in its lane, says "nothing in my area" rather than re-reporting another lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive- constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go. - RUN TIMING: run.sh posts a "⏳ Reviewing…" placeholder at model start and updates it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 20:15:40 -04:00
Steve Dudenhoeffer	4b8f9aa39b	feat: dynamic `auto` specialist selection + worker-tier delegation Build & push image / build-and-push (push) Successful in 33s Details Two Phase-2 swarm upgrades: - auto.go: GADFLY_SPECIALISTS=auto routes the review — a selector model (GADFLY_SELECTOR_MODEL, else the review model) reads the changed files + PR description and picks the smallest relevant lens set from the catalog, and may propose ad-hoc lenses for gaps (e.g. migrations). Structured output via majordomo.Generate[T]; capped + de-duped; falls back to the default suite. - delegate.go: GADFLY_WORKER_MODEL adds a delegate_investigation tool so the reviewer offloads mechanical legwork (trace callers, gather usages) to a cheap worker sub-agent that returns an evidence-cited digest — the top model reasons over summaries, not raw file dumps. Workers get an fs-only toolbox (no sub-delegation). Unset = off. resolveSpecialists now also returns the registry + an auto flag. Docs (README Specialists + config table, CLAUDE.md, main.go header) + tests updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 19:35:59 -04:00
Steve Dudenhoeffer	7809d1b93d	feat: specialist suite — configurable + custom review lenses (one consolidated comment) Build & push image / build-and-push (push) Successful in 8s Details Replace the single generic review with a suite of focused specialists, each its own review+recheck pass, merged into ONE comment (a collapsible section per lens, led by the worst verdict; the optional `improvements` lens never escalates it). - cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness, maintainability, performance, error-handling) + opt-in (tests, docs, conventions, improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define). Precedence: built-ins < file < env. Unknown names error but don't sink the run. - cmd/gadfly/consolidate.go: verdict parse + one-comment render. - main.go: loop specialists; per-lens failure is an inline notice, never fatal. Default timeout bumped to 600s (suite runs sequentially). - base system prompt trimmed to persona+tools+discipline+output; lens-specific focus is appended per specialist (semantic re-derivation discipline kept in base). - entrypoint default models -> single model (suite already gives breadth; cost ~= specialists × models × 2). Adds gopkg.in/yaml.v3. - docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var, CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 19:23:05 -04:00
Steve Dudenhoeffer	bd76aa8286	feat: env-defined endpoint aliases (http-capable, local Ollama friendly) Build & push image / build-and-push (push) Successful in 9s Details majordomo's built-in LLM_* env DSNs are HTTPS-only (DSN.BaseURL forces https), so they can't express a plaintext local Ollama. Add Gadfly-native env families that register named providers/aliases with majordomo before resolution: GADFLY_ENDPOINT_<NAME>="<provider>\|<base-url>[\|<key>]" # base URL verbatim (http ok) GADFLY_ALIAS_<NAME>="<majordomo spec>" # plain alias / failover chain Then reference them as "<name>/<model>" (or the bare alias) in GADFLY_MODEL(S). <NAME> lowercases to the registry name, matching majordomo's LLM_* convention. LLM_* DSNs still work (and are documented) for HTTPS endpoints. + unit tests, README "Endpoint aliases via env vars", stub example. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 19:01:07 -04:00
Steve Dudenhoeffer	d9405f4f69	feat: multi-provider model support via majordomo (local Ollama, OpenAI-compatible, etc.) Build & push image / build-and-push (push) Successful in 18s Details Replace the hardcoded ollama.Cloud binding with majordomo's provider registry, so Gadfly can target any backend majordomo supports without code changes. - cmd/gadfly/model.go: resolveModel() — GADFLY_PROVIDER (default ollama-cloud) prefixes bare model ids; GADFLY_MODEL may be a full provider/model spec, alias, or failover chain (verbatim). GADFLY_BASE_URL constructs openai/ollama/anthropic/ google directly at a custom endpoint (OpenAI-compatible + local/remote Ollama). GADFLY_API_KEY else the provider's standard env var. + buildSpec unit tests. - run.sh: provider-aware key gate (local Ollama needs none); maps OLLAMA_CLOUD_API_KEY -> OLLAMA_API_KEY; provider/base-url/key inherited by the binary. Gadfly-branded comment. - entrypoint.sh: GADFLY_MODELS alias for OLLAMA_REVIEW_MODELS; provider passthrough. - examples + README: Models & providers section. Upfront: only the Ollama paths (local + OpenAI-compatible-against-Ollama) are tested; OpenAI/Anthropic/Google are wired via majordomo but UNTESTED (no spend). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 18:58:00 -04:00
Steve Dudenhoeffer	c0d0152a34	Gadfly: agentic adversarial PR reviewer (initial extraction) Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/ find_files/get_diff), verifies findings before reporting, two-pass review + adversarial recheck, posts one labeled comment per model. Advisory only. - cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo - entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML) - Dockerfile: multi-stage; build-time module token never reaches the final image - .gitea/workflows/build-image.yml: tag v* → build & push image - examples/: ~15-line consumer stub - system prompt genericized + hardened to re-derive constants/formulas (semantic bugs) Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 18:42:20 -04:00

22 Commits