gadfly

Author	SHA1	Message	Date
steve	a1b0691a1e	fix: fold in gadfly's own review findings (3 real bugs) The dogfood swarm reviewed PR #1; folding in the warranted findings (graded via the gadfly MCP — 18 real / 18 false-positive across the 4 completed reviewers): - entrypoint.sh: finalize a never-written status file when run.sh skips the binary (empty diff / no key / missing binary). The pre-seed stayed {started:0, done:false}, so the board showed that model "waiting to start" forever and the N/N counter never completed — breaking the board's own "tell when everything is finished" invariant. (glm-5.2, correctness — the strongest finding.) - main.go: recover() in the per-lens goroutine. A panic previously crashed the whole binary (killing every other lens's output) and left the lens stuck "running" on the board. Now it's recorded as an errored result and the lens is marked finished. (glm-5.2 + minimax-m3.) - status-board.sh: coerce a non-numeric GADFLY_STATUS_POLL_SECS back to 12. Under `set -uo pipefail` a bad `sleep "$POLL"` failed silently and the loop spun, hammering the Gitea API. (glm-5.2, error-handling.) The remaining real findings (sanitizer collision, page-10 pagination, markdown-injection via PR-controlled lens names, cosmetic blank line) were graded trivial and left as-is — documented in the finding notes. gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:56:41 -04:00
steve	1cdda32dbc	feat: live status-board comment — per-model/per-lens review progress Phase 3 of the gadfly-games build. With several models × several lenses reviewing a PR, all you'd see mid-run is a row of "⏳ Reviewing…" placeholders. Add ONE consolidated, live-updating status-board comment that aggregates every model's per-lens progress (queued → running → finished + verdict), so progress is visible at a glance and a watcher can tell when the whole swarm is done. - cmd/gadfly: opt-in statusWriter (GADFLY_STATUS_FILE) publishes this model's lenses to a JSON file, written atomically (temp+rename) as runSpecialists transitions each lens. Inert when unset — plain runs and tests are unaffected. - scripts/status-board.sh: background renderer that polls the status dir and upserts one marker comment every GADFLY_STATUS_POLL_SECS (default 12s), caching the comment id to PATCH in place. Advisory and best-effort; the per-model findings comments are untouched. - entrypoint.sh: pre-seeds every model as queued, launches the board, waits only on the review lanes, then signals .done for a final render. Default on; disable with GADFLY_STATUS_BOARD=0. - Docs: README config table + "Live status board" section, example stub note, CLAUDE.md architecture map. gofmt clean, go vet quiet, go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:18:28 -04:00
Steve Dudenhoeffer	49f3623204	fix: per-lens timeout, errored-verdict honesty, accurate provider label, tighter lens focus, run timing Five fixes, several surfaced by the live bake-off: - PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist (own context), not shared across the suite. A slow model (e.g. a 35B local MLX) was exhausting the whole 600s budget on lens 1, leaving the rest "step 0: context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go. - ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean". Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses errored"); the section reads "⚠️ could not complete". consolidate.go. - PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default (was wrongly "ollama-cloud" for local models). scripts/run.sh. - LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens stays in its lane, says "nothing in my area" rather than re-reporting another lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive-constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go. - RUN TIMING: run.sh posts a "⏳ Reviewing…" placeholder at model start and updates it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 20:15:40 -04:00
Steve Dudenhoeffer	7809d1b93d	feat: specialist suite — configurable + custom review lenses (one consolidated comment) Replace the single generic review with a suite of focused specialists, each its own review+recheck pass, merged into ONE comment (a collapsible section per lens, led by the worst verdict; the optional `improvements` lens never escalates it). - cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness, maintainability, performance, error-handling) + opt-in (tests, docs, conventions, improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define). Precedence: built-ins < file < env. Unknown names error but don't sink the run. - cmd/gadfly/consolidate.go: verdict parse + one-comment render. - main.go: loop specialists; per-lens failure is an inline notice, never fatal. Default timeout bumped to 600s (suite runs sequentially). - base system prompt trimmed to persona+tools+discipline+output; lens-specific focus is appended per specialist (semantic re-derivation discipline kept in base). - entrypoint default models -> single model (suite already gives breadth; cost ~= specialists × models × 2). Adds gopkg.in/yaml.v3. - docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var, CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 19:23:05 -04:00
Steve Dudenhoeffer	d9405f4f69	feat: multi-provider model support via majordomo (local Ollama, OpenAI-compatible, etc.) Replace the hardcoded ollama.Cloud binding with majordomo's provider registry, so Gadfly can target any backend majordomo supports without code changes. - cmd/gadfly/model.go: resolveModel() — GADFLY_PROVIDER (default ollama-cloud) prefixes bare model ids; GADFLY_MODEL may be a full provider/model spec, alias, or failover chain (verbatim). GADFLY_BASE_URL constructs openai/ollama/anthropic/google directly at a custom endpoint (OpenAI-compatible + local/remote Ollama). GADFLY_API_KEY else the provider's standard env var. + buildSpec unit tests. - run.sh: provider-aware key gate (local Ollama needs none); maps OLLAMA_CLOUD_API_KEY -> OLLAMA_API_KEY; provider/base-url/key inherited by the binary. Gadfly-branded comment. - entrypoint.sh: GADFLY_MODELS alias for OLLAMA_REVIEW_MODELS; provider passthrough. - examples + README: Models & providers section. Upfront: only the Ollama paths (local + OpenAI-compatible-against-Ollama) are tested; OpenAI/Anthropic/Google are wired via majordomo but UNTESTED (no spend). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 18:58:00 -04:00
Steve Dudenhoeffer	c0d0152a34	Gadfly: agentic adversarial PR reviewer (initial extraction) Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/find_files/get_diff), verifies findings before reporting, two-pass review + adversarial recheck, posts one labeled comment per model. Advisory only. - cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo - entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML) - Dockerfile: multi-stage; build-time module token never reaches the final image - .gitea/workflows/build-image.yml: tag v* → build & push image - examples/: ~15-line consumer stub - system prompt genericized + hardened to re-derive constants/formulas (semantic bugs) Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 18:42:20 -04:00

6 Commits