Adds the adversarial-review workflow to gadfly itself (copied from mort: 3 cloud + m1/m5 via foreman, findings telemetry, sha-d7f364d). Future gadfly PRs get reviewed by the swarm.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After each review the binary POSTs the run + its heuristically-extracted findings to GADFLY_FINDINGS_URL (off unless set). Advisory: any error only goes to stderr — never touches stdout, the exit code, or the review. stdlib net/http only (no new deps). entrypoint.sh derives GADFLY_REPO/GADFLY_PR and passes through GADFLY_FINDINGS_URL/GADFLY_FINDINGS_TOKEN. Also renames store references from the old 'docket' name to 'gadfly-reports'.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Specialist lenses ran strictly sequentially within a model. Add a
GADFLY_LENS_CONCURRENCY knob (default 1 = unchanged) that overlaps the
independent per-lens review+recheck passes, so a model posts its
consolidated comment as soon as its lenses finish.
Per-provider configurable, mirroring GADFLY_PROVIDER_CONCURRENCY:
GADFLY_PROVIDER_LENS_CONCURRENCY takes a "provider=N,..." map keyed by
the same provider lanes (modelProvider() mirrors entrypoint's provider_of;
providerOverride() mirrors provider_cap). The override wins for the model's
lane, else the scalar default.
runSpecialists fans out via a bounded worker pool, order-preserving
(results written by index) and keeping each lens's own timeout/recheck.
repoFS is immutable + fresh-toolbox-per-pass, so lenses share no mutable
state (verified under -race). Docs/examples updated; dropped a duplicate
GADFLY_TIMEOUT_SECS README row.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Accept "foreman" in both resolveModel (GADFLY_BASE_URL) and endpointProvider
(GADFLY_ENDPOINT_*) switches, mapping to majordomo's ollama.Foreman() preset
(handles foreman's non-streaming/long-poll quirks). Unlike the HTTPS-only
LLM_* foreman:// DSN, the base URL is verbatim, so a plaintext http:// foreman
queue works. Tests + README provider table + endpoint-aliases example updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per a Gadfly self-review finding (kimi-k2.7-code): an issue_comment can start a
secret-bearing run before the in-container allowed-users check. Add a workflow
if: that only lets trusted actors trigger via comment (PR/dispatch already
trusted); keep GADFLY_ALLOWED_USERS as the belt-and-suspenders layer. README
documents it + the default-branch caveat for comment triggers. (Docs/examples
only — paths-ignored, no image rebuild.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
EOF
A section that led with '**Blocking issues**' (no 'found') fell through to
unknown, so the consolidated header wrongly read 'No material issues found'
(seen live on gpt-oss). Now matches 'blocking issue'/'minor issue'/'no material
issue' and picks the earliest-appearing phrase (the lead verdict). + tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A green check on a section reporting blocking issues was misleading; 🎯 signals
accuracy/on-target. Section verdict text already conveys pass/fail.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
entrypoint.sh groups models by provider into lanes that run in PARALLEL; within
a lane at most `cap` models run at once. cap = GADFLY_PROVIDER_CONCURRENCY map
("ollama-cloud=3,m1pro=1") else GADFLY_CONCURRENCY (default 1). So a single
local box stays serial (1 at a time) while cloud models run several at once and
both lanes progress simultaneously. Portable bash (no associative arrays).
Default cap 1 keeps a single-provider pool sequential as before. Pairs with the
per-lens timeout so a slow lane can't starve others. Docs: README Concurrency
section + config table; CLAUDE.md lessons incl. the docker://:latest cache gotcha.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Five fixes, several surfaced by the live bake-off:
- PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist
(own context), not shared across the suite. A slow model (e.g. a 35B local MLX)
was exhausting the whole 600s budget on lens 1, leaving the rest "step 0:
context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go.
- ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean".
Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses
errored"); the section reads "⚠️ could not complete". consolidate.go.
- PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the
spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default
(was wrongly "ollama-cloud" for local models). scripts/run.sh.
- LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens
stays in its lane, says "nothing in my area" rather than re-reporting another
lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive-
constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go.
- RUN TIMING: run.sh posts a "⏳ Reviewing…" placeholder at model start and updates
it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two Phase-2 swarm upgrades:
- auto.go: GADFLY_SPECIALISTS=auto routes the review — a selector model
(GADFLY_SELECTOR_MODEL, else the review model) reads the changed files + PR
description and picks the smallest relevant lens set from the catalog, and may
propose ad-hoc lenses for gaps (e.g. migrations). Structured output via
majordomo.Generate[T]; capped + de-duped; falls back to the default suite.
- delegate.go: GADFLY_WORKER_MODEL adds a delegate_investigation tool so the
reviewer offloads mechanical legwork (trace callers, gather usages) to a cheap
worker sub-agent that returns an evidence-cited digest — the top model reasons
over summaries, not raw file dumps. Workers get an fs-only toolbox (no
sub-delegation). Unset = off.
resolveSpecialists now also returns the registry + an auto flag. Docs (README
Specialists + config table, CLAUDE.md, main.go header) + tests updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).
- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
majordomo's built-in LLM_* env DSNs are HTTPS-only (DSN.BaseURL forces https),
so they can't express a plaintext local Ollama. Add Gadfly-native env families
that register named providers/aliases with majordomo before resolution:
GADFLY_ENDPOINT_<NAME>="<provider>|<base-url>[|<key>]" # base URL verbatim (http ok)
GADFLY_ALIAS_<NAME>="<majordomo spec>" # plain alias / failover chain
Then reference them as "<name>/<model>" (or the bare alias) in GADFLY_MODEL(S).
<NAME> lowercases to the registry name, matching majordomo's LLM_* convention.
LLM_* DSNs still work (and are documented) for HTTPS endpoints. + unit tests,
README "Endpoint aliases via env vars", stub example.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hardcoded ollama.Cloud binding with majordomo's provider registry,
so Gadfly can target any backend majordomo supports without code changes.
- cmd/gadfly/model.go: resolveModel() — GADFLY_PROVIDER (default ollama-cloud)
prefixes bare model ids; GADFLY_MODEL may be a full provider/model spec, alias,
or failover chain (verbatim). GADFLY_BASE_URL constructs openai/ollama/anthropic/
google directly at a custom endpoint (OpenAI-compatible + local/remote Ollama).
GADFLY_API_KEY else the provider's standard env var. + buildSpec unit tests.
- run.sh: provider-aware key gate (local Ollama needs none); maps OLLAMA_CLOUD_API_KEY
-> OLLAMA_API_KEY; provider/base-url/key inherited by the binary. Gadfly-branded comment.
- entrypoint.sh: GADFLY_MODELS alias for OLLAMA_REVIEW_MODELS; provider passthrough.
- examples + README: Models & providers section. Upfront: only the Ollama paths
(local + OpenAI-compatible-against-Ollama) are tested; OpenAI/Anthropic/Google
are wired via majordomo but UNTESTED (no spend).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror mort-ci.yml's build-and-push: BuildKit secrets (REGISTRY_USER/
REGISTRY_PASSWORD) for private majordomo access instead of build-args, and the
LAN --add-host so the builder can reach the registry. push main -> :latest +
:sha-<short>; tag v* -> :<tag> + :latest; other branches -> :branch-<safe>;
PRs build-only (no push). Optional DISCORD_WEBHOOK_URL notifications.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in
Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/
find_files/get_diff), verifies findings before reporting, two-pass review +
adversarial recheck, posts one labeled comment per model. Advisory only.
- cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo
- entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML)
- Dockerfile: multi-stage; build-time module token never reaches the final image
- .gitea/workflows/build-image.yml: tag v* → build & push image
- examples/: ~15-line consumer stub
- system prompt genericized + hardened to re-derive constants/formulas (semantic bugs)
Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>