Commit Graph

24 Commits

Author SHA1 Message Date
steve c342bdb905 feat: add claude-code/opus reviewer + max-thinking spec support (#5)
Build & push image / build-and-push (push) Successful in 15s
Adds claude-code/opus to gadfly's dogfood swarm (both sonnet and opus run
end-to-end), bumps the image pin to :sha-80d8f53 so the clean-lens
telemetry fix is live, and adds engine support for a
"claude-code/<model>:max" extended-thinking spec (MAX_THINKING_TOKENS,
best-effort). Validated: only 13 findings on this clean PR vs 43 on the
comparable #4 — the telemetry fix works.

Folded in the swarm's two real findings: a runPass env-injection test and
keeping MAX_THINKING_TOKENS in claudeEnv. Follow-up enables
claude-code/opus:max once this image builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 22:39:14 +00:00
steve 80d8f53f63 fix: clean-lens findings + trim the dogfood swarm to strong reviewers (#4)
Build & push image / build-and-push (push) Successful in 9s
emit() now skips findings extraction for a "No material issues found"
lens (its path:line refs are verification notes, not problems), fixing
the FP inflation that penalized thorough clean-pass reviewers. Also trims
the dogfood swarm to the strong reviewers: drops m5/qwen3.6 (last local
lane), gemma4, gpt-oss:120b, and kimi-k2.7-code — leaving 6 cloud +
claude-code/sonnet.

Fittingly, PR #4's own 11-model review produced 43 findings that were ALL
clean-verification bullets (zero real) — a live demonstration of the bug
this fixes. gofmt clean, go vet quiet, go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 22:14:07 +00:00
steve 82f7ef78d5 feat: claude-code backends + llamaswap provider + dogfood the CC engine (#3)
Build & push image / build-and-push (push) Successful in 10s
Phase 2: bump majordomo to latest and wire its new llamaswap provider
into gadfly's endpoint switches; add claude-code/sonnet to gadfly's own
dogfood swarm (pin :sha-86f12c1, map CLAUDE_CODE_OAUTH_TOKEN) so the
Phase-1 engine runs as a live competitor; document the Ollama-through-CC
ANTHROPIC_BASE_URL proxy path as example-only.

The 11-model swarm (incl. claude-code/sonnet) reviewed it; 52 findings
graded via the MCP. Folded in the two real ones: a llamaswap
endpointProvider test (caught by claude-code/sonnet, citing CLAUDE.md)
and adding "openai-compatible" to the provider error messages (gpt-oss).

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 21:53:41 +00:00
steve 86f12c126f feat: claude-code reviewer engine (#2)
Build & push image / build-and-push (push) Successful in 28s
Phase 1: a second review engine alongside the majordomo agent loop. For
each lens, shell out to the Claude Code CLI (`claude -p --output-format
json`) inside the checked-out repo so it verifies findings with its own
read tools, then reuse gadfly's verdict-parse + recheck + consolidate +
emit pipeline. Select via GADFLY_MODELS `claude-code`/`claude-code/<model>`;
auth via CLAUDE_CODE_OAUTH_TOKEN (no --bare) else ANTHROPIC_API_KEY;
read-only by default; GADFLY_CLAUDE_* knobs. Dockerfile bundles Node +
@anthropic-ai/claude-code. Also bumped the dogfood pin to the status-board
image (PR #2 was the first dogfood with the live board + full fleet).

Folded in the swarm's own review findings: minimal subprocess env (no
GITEA_TOKEN leak to the CLI), runPass robustness (ctx/empty-result/runErr),
process-group cleanup on timeout, rune-safe error truncation, and
engine-neutral prompts (also de-mort-ified the recheck prompt). 66 findings
graded via the gadfly MCP.

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 20:40:41 +00:00
steve c3d09d3bd4 feat: live status-board comment + full-fleet dogfood (#1)
Build & push image / build-and-push (push) Successful in 6s
Phase 3: one consolidated, live-updating PR comment aggregating every
model's per-lens progress (queued -> running -> finished + verdict), so
the swarm's progress is visible at a glance and a watcher can tell when
it's done. Opt-in statusWriter in the binary (atomic writes) + a
background status-board.sh renderer wired through entrypoint.sh; default
on, GADFLY_STATUS_BOARD=0 to disable.

Also restores gadfly's dogfood swarm to the full cloud fleet (9 cloud +
M5; M1 dropped as too slow) matching mort, and folds in the 3 real bugs
the swarm found on its own PR (skip-binary stuck-waiting, panic-stuck
lens, busy-loop on bad poll interval). All 36 findings graded via the
gadfly MCP (18 real / 18 false-positive).

gofmt clean, go vet quiet, go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
Co-committed-by: Steve Dudenhoeffer <steve@stevedudenhoeffer.com>
2026-06-27 19:00:12 +00:00
steve 0ad5b66170 ci: dogfood — gadfly reviews its own PRs (mort's full-fleet setup)
Build & push image / build-and-push (push) Successful in 14s
Adds the adversarial-review workflow to gadfly itself (copied from mort: 3 cloud + m1/m5 via foreman, findings telemetry, sha-d7f364d). Future gadfly PRs get reviewed by the swarm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 13:26:37 -04:00
steve d7f364d803 feat: optional findings telemetry — emit runs+findings to a gadfly-reports store
Build & push image / build-and-push (push) Successful in 8s
After each review the binary POSTs the run + its heuristically-extracted findings to GADFLY_FINDINGS_URL (off unless set). Advisory: any error only goes to stderr — never touches stdout, the exit code, or the review. stdlib net/http only (no new deps). entrypoint.sh derives GADFLY_REPO/GADFLY_PR and passes through GADFLY_FINDINGS_URL/GADFLY_FINDINGS_TOKEN. Also renames store references from the old 'docket' name to 'gadfly-reports'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 09:09:51 -04:00
steve d0de034726 feat: configurable lens fan-out, per-provider like model concurrency
Build & push image / build-and-push (push) Successful in 9s
Specialist lenses ran strictly sequentially within a model. Add a
GADFLY_LENS_CONCURRENCY knob (default 1 = unchanged) that overlaps the
independent per-lens review+recheck passes, so a model posts its
consolidated comment as soon as its lenses finish.

Per-provider configurable, mirroring GADFLY_PROVIDER_CONCURRENCY:
GADFLY_PROVIDER_LENS_CONCURRENCY takes a "provider=N,..." map keyed by
the same provider lanes (modelProvider() mirrors entrypoint's provider_of;
providerOverride() mirrors provider_cap). The override wins for the model's
lane, else the scalar default.

runSpecialists fans out via a bounded worker pool, order-preserving
(results written by index) and keeping each lens's own timeout/recheck.
repoFS is immutable + fresh-toolbox-per-pass, so lenses share no mutable
state (verified under -race). Docs/examples updated; dropped a duplicate
GADFLY_TIMEOUT_SECS README row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:53:27 -04:00
steve 6e3a83c437 feat: add foreman provider type for endpoint overrides
Build & push image / build-and-push (push) Successful in 7s
Accept "foreman" in both resolveModel (GADFLY_BASE_URL) and endpointProvider
(GADFLY_ENDPOINT_*) switches, mapping to majordomo's ollama.Foreman() preset
(handles foreman's non-streaming/long-poll quirks). Unlike the HTTPS-only
LLM_* foreman:// DSN, the base URL is verbatim, so a plaintext http:// foreman
queue works. Tests + README provider table + endpoint-aliases example updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 20:13:47 -04:00
Steve Dudenhoeffer a1e9d109e5 security: add job-level if-guard to example stubs (gate comment trigger by actor)
Build & push image / build-and-push (push) Successful in 5s
Per a Gadfly self-review finding (kimi-k2.7-code): an issue_comment can start a
secret-bearing run before the in-container allowed-users check. Add a workflow
if: that only lets trusted actors trigger via comment (PR/dispatch already
trusted); keep GADFLY_ALLOWED_USERS as the belt-and-suspenders layer. README
documents it + the default-branch caveat for comment triggers. (Docs/examples
only — paths-ignored, no image rebuild.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
EOF
2026-06-25 21:49:23 -04:00
Steve Dudenhoeffer b409dff4ed fix: parseVerdict matches leniently + earliest phrase wins
Build & push image / build-and-push (push) Successful in 8s
A section that led with '**Blocking issues**' (no 'found') fell through to
unknown, so the consolidated header wrongly read 'No material issues found'
(seen live on gpt-oss). Now matches 'blocking issue'/'minor issue'/'no material
issue' and picks the earliest-appearing phrase (the lead verdict). + tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 21:28:19 -04:00
Steve Dudenhoeffer 92bf22a1be fix: correctness lens emoji -> 🎯 ( read like 'no issues')
Build & push image / build-and-push (push) Successful in 8s
A green check on a section reporting blocking issues was misleading; 🎯 signals
accuracy/on-target. Section verdict text already conveys pass/fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 20:50:39 -04:00
Steve Dudenhoeffer 9e582bfaca feat: per-provider concurrency lanes (cloud parallel while local churns)
Build & push image / build-and-push (push) Successful in 7s
entrypoint.sh groups models by provider into lanes that run in PARALLEL; within
a lane at most `cap` models run at once. cap = GADFLY_PROVIDER_CONCURRENCY map
("ollama-cloud=3,m1pro=1") else GADFLY_CONCURRENCY (default 1). So a single
local box stays serial (1 at a time) while cloud models run several at once and
both lanes progress simultaneously. Portable bash (no associative arrays).
Default cap 1 keeps a single-provider pool sequential as before. Pairs with the
per-lens timeout so a slow lane can't starve others. Docs: README Concurrency
section + config table; CLAUDE.md lessons incl. the docker://:latest cache gotcha.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 20:29:08 -04:00
Steve Dudenhoeffer 49f3623204 fix: per-lens timeout, errored-verdict honesty, accurate provider label, tighter lens focus, run timing
Build & push image / build-and-push (push) Successful in 8s
Five fixes, several surfaced by the live bake-off:

- PER-LENS TIMEOUT (critical): GADFLY_TIMEOUT_SECS now applies to EACH specialist
  (own context), not shared across the suite. A slow model (e.g. a 35B local MLX)
  was exhausting the whole 600s budget on lens 1, leaving the rest "step 0:
  context deadline exceeded". Default lowered to 300s (per-lens). cmd/gadfly/main.go.
- ERRORED VERDICT: a lens whose review pass failed no longer counts as "clean".
  Header shows "· ⚠️ N/M lens(es) errored" (or "Review incomplete — all lenses
  errored"); the section reads "⚠️ could not complete". consolidate.go.
- PROVIDER LABEL: the comment header now shows the model's ACTUAL backend from the
  spec ("m1pro/qwen3.6:35b-mlx" -> m1pro), not the global GADFLY_PROVIDER default
  (was wrongly "ollama-cloud" for local models). scripts/run.sh.
- LENS FOCUS: base prompt no longer licenses "report anything serious"; each lens
  stays in its lane, says "nothing in my area" rather than re-reporting another
  lens's bug, with a one-line "Outside my lens:" escape hatch. The re-derive-
  constants discipline is now lane-scoped, not "every lens". system-prompt.txt + specialists.go.
- RUN TIMING: run.sh posts a " Reviewing…" placeholder at model start and updates
  it with "⏱️ reviewed in 1m 23s" on finish, for per-model comparison.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 20:15:40 -04:00
Steve Dudenhoeffer 4b8f9aa39b feat: dynamic auto specialist selection + worker-tier delegation
Build & push image / build-and-push (push) Successful in 33s
Two Phase-2 swarm upgrades:

- auto.go: GADFLY_SPECIALISTS=auto routes the review — a selector model
  (GADFLY_SELECTOR_MODEL, else the review model) reads the changed files + PR
  description and picks the smallest relevant lens set from the catalog, and may
  propose ad-hoc lenses for gaps (e.g. migrations). Structured output via
  majordomo.Generate[T]; capped + de-duped; falls back to the default suite.
- delegate.go: GADFLY_WORKER_MODEL adds a delegate_investigation tool so the
  reviewer offloads mechanical legwork (trace callers, gather usages) to a cheap
  worker sub-agent that returns an evidence-cited digest — the top model reasons
  over summaries, not raw file dumps. Workers get an fs-only toolbox (no
  sub-delegation). Unset = off.

resolveSpecialists now also returns the registry + an auto flag. Docs (README
Specialists + config table, CLAUDE.md, main.go header) + tests updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:35:59 -04:00
Steve Dudenhoeffer 7809d1b93d feat: specialist suite — configurable + custom review lenses (one consolidated comment)
Build & push image / build-and-push (push) Successful in 8s
Replace the single generic review with a suite of focused specialists, each its
own review+recheck pass, merged into ONE comment (a collapsible section per lens,
led by the worst verdict; the optional `improvements` lens never escalates it).

- cmd/gadfly/specialists.go: built-in lenses + default suite (security, correctness,
  maintainability, performance, error-handling) + opt-in (tests, docs, conventions,
  improvements). Selection via GADFLY_SPECIALISTS (csv/"all"); custom defs via
  GADFLY_SPECIALIST_<NAME> env and a repo .gadfly.yml (specialists + define).
  Precedence: built-ins < file < env. Unknown names error but don't sink the run.
- cmd/gadfly/consolidate.go: verdict parse + one-comment render.
- main.go: loop specialists; per-lens failure is an inline notice, never fatal.
  Default timeout bumped to 600s (suite runs sequentially).
- base system prompt trimmed to persona+tools+discipline+output; lens-specific
  focus is appended per specialist (semantic re-derivation discipline kept in base).
- entrypoint default models -> single model (suite already gives breadth; cost ~=
  specialists × models × 2). Adds gopkg.in/yaml.v3.
- docs/examples: README "Specialists" section, examples/.gadfly.yml, stub var,
  CLAUDE.md architecture/config. Dynamic `auto` selection is the planned next step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:23:05 -04:00
Steve Dudenhoeffer 676c9d4f07 ci: skip image rebuild on docs/example-only changes (paths-ignore)
Build & push image / build-and-push (push) Successful in 5s
Tag pushes (v*) bypass path filters, so releases always build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:10:24 -04:00
Steve Dudenhoeffer 04cd260ff9 docs: add CLAUDE.md + provider example configs
Build & push image / build-and-push (push) Successful in 6s
- CLAUDE.md: project goals (advisory-only, real-bugs-not-nits, easy-to-enable,
  provider-agnostic, portable), architecture map, build/test/release, and
  maintenance rules — incl. "keep README + examples/ current with any env/flag/
  provider/trigger change" and the advisory-only invariant.
- examples/: local-ollama.yml, openai-compatible.yml, endpoint-aliases.yml +
  an examples/README index; README setup step points at them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:06:08 -04:00
Steve Dudenhoeffer bd76aa8286 feat: env-defined endpoint aliases (http-capable, local Ollama friendly)
Build & push image / build-and-push (push) Successful in 9s
majordomo's built-in LLM_* env DSNs are HTTPS-only (DSN.BaseURL forces https),
so they can't express a plaintext local Ollama. Add Gadfly-native env families
that register named providers/aliases with majordomo before resolution:

  GADFLY_ENDPOINT_<NAME>="<provider>|<base-url>[|<key>]"  # base URL verbatim (http ok)
  GADFLY_ALIAS_<NAME>="<majordomo spec>"                  # plain alias / failover chain

Then reference them as "<name>/<model>" (or the bare alias) in GADFLY_MODEL(S).
<NAME> lowercases to the registry name, matching majordomo's LLM_* convention.
LLM_* DSNs still work (and are documented) for HTTPS endpoints. + unit tests,
README "Endpoint aliases via env vars", stub example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:01:07 -04:00
Steve Dudenhoeffer d9405f4f69 feat: multi-provider model support via majordomo (local Ollama, OpenAI-compatible, etc.)
Build & push image / build-and-push (push) Successful in 18s
Replace the hardcoded ollama.Cloud binding with majordomo's provider registry,
so Gadfly can target any backend majordomo supports without code changes.

- cmd/gadfly/model.go: resolveModel() — GADFLY_PROVIDER (default ollama-cloud)
  prefixes bare model ids; GADFLY_MODEL may be a full provider/model spec, alias,
  or failover chain (verbatim). GADFLY_BASE_URL constructs openai/ollama/anthropic/
  google directly at a custom endpoint (OpenAI-compatible + local/remote Ollama).
  GADFLY_API_KEY else the provider's standard env var. + buildSpec unit tests.
- run.sh: provider-aware key gate (local Ollama needs none); maps OLLAMA_CLOUD_API_KEY
  -> OLLAMA_API_KEY; provider/base-url/key inherited by the binary. Gadfly-branded comment.
- entrypoint.sh: GADFLY_MODELS alias for OLLAMA_REVIEW_MODELS; provider passthrough.
- examples + README: Models & providers section. Upfront: only the Ollama paths
  (local + OpenAI-compatible-against-Ollama) are tested; OpenAI/Anthropic/Google
  are wired via majordomo but UNTESTED (no spend).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 18:58:00 -04:00
Steve Dudenhoeffer 6123604595 ci: auto build & push image on main (:latest) + v* tags
Build & push image / build-and-push (push) Successful in 58s
Mirror mort-ci.yml's build-and-push: BuildKit secrets (REGISTRY_USER/
REGISTRY_PASSWORD) for private majordomo access instead of build-args, and the
LAN --add-host so the builder can reach the registry. push main -> :latest +
:sha-<short>; tag v* -> :<tag> + :latest; other branches -> :branch-<safe>;
PRs build-only (no push). Optional DISCORD_WEBHOOK_URL notifications.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 18:45:48 -04:00
steve 48936d55b2 Merge initial repo scaffold (keep extracted content) 2026-06-25 18:43:10 -04:00
Steve Dudenhoeffer c0d0152a34 Gadfly: agentic adversarial PR reviewer (initial extraction)
Standalone, Docker-packaged extraction of the agentic PR reviewer that runs in
Gitea Actions: reads the checked-out repo with read-only tools (read_file/grep/
find_files/get_diff), verifies findings before reporting, two-pass review +
adversarial recheck, posts one labeled comment per model. Advisory only.

- cmd/gadfly: reviewer binary (majordomo + Ollama Cloud), zero deps beyond stdlib + majordomo
- entrypoint.sh: container brains — trigger gating, PR clone, model loop (logic out of YAML)
- Dockerfile: multi-stage; build-time module token never reaches the final image
- .gitea/workflows/build-image.yml: tag v* → build & push image
- examples/: ~15-line consumer stub
- system prompt genericized + hardened to re-derive constants/formulas (semantic bugs)

Vibe-coded with Claude Code; see README disclosure. Advisory, never blocks merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 18:42:20 -04:00
steve f2276238da Initial commit 2026-06-25 22:40:35 +00:00