Files
executus/.gitea/workflows/adversarial-review.yml
T
steve 84e84f9785
executus CI / test (pull_request) Successful in 58s
ci(gadfly): cloud-only fleet (3 models, drop local Macs)
Measured on the P2 review: the local Macs (m1/m5) took 26–29 min with lens
timeouts and found ZERO real bugs, while the two cloud models found every
genuine finding in 6–12 min. Drop the Macs; add glm-5.2:cloud as a third
cloud reviewer. Net: faster (~29→~12 min) and higher signal.

Models: minimax-m3:cloud, deepseek-v4-flash:cloud, glm-5.2:cloud
(ollama-cloud=3 concurrency). timeout-minutes 90→30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:01:07 -04:00

75 lines
3.4 KiB
YAML
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Gadfly — agentic adversarial PR reviewer (https://gitea.stevedudenhoeffer.com/steve/gadfly).
#
# Runs the published Gadfly image (pinned to an immutable :sha- tag — act_runner
# caches :latest, and this build is what carries foreman provider-type support)
# as a specialist swarm and posts
# ONE consolidated review comment as gitea-actions. Advisory only — never blocks a
# merge. This reviews executus PRs with 3 ollama-cloud models (3-lens suite). Gadfly
# is a simple system — findings are advisory; always double-check before acting.
name: Adversarial Review (Gadfly)
on:
pull_request:
types: [opened, reopened, ready_for_review]
issue_comment:
types: [created]
workflow_dispatch:
inputs:
pr_number:
description: "PR number to review"
required: true
permissions:
contents: read
issues: write
pull-requests: write
concurrency:
group: gadfly-${{ github.event.issue.number || github.event.pull_request.number || github.event.inputs.pr_number }}
cancel-in-progress: true
jobs:
review:
# Security: only trusted users may trigger a secret-bearing run via a PR
# comment (pull_request + workflow_dispatch are already trusted). Mirrors
# GADFLY_ALLOWED_USERS, the in-container belt-and-suspenders check.
if: >-
github.event_name != 'issue_comment'
|| (github.event.issue.pull_request
&& (github.actor == 'steve'
|| github.actor == 'fizi'
|| github.actor == 'dazed'))
runs-on: ubuntu-latest
# 3 cloud models, all concurrent, 3-lens suite. ~12 min typical.
timeout-minutes: 30
steps:
- uses: docker://gitea.stevedudenhoeffer.com/steve/gadfly:sha-6e3a83c
env:
GITEA_API: ${{ github.server_url }}/api/v1/repos/${{ github.repository }}
GITEA_TOKEN: ${{ secrets.GITEA_TOKEN }}
OLLAMA_CLOUD_API_KEY: ${{ secrets.OLLAMA_CLOUD_API_KEY }}
# executus uses CLOUD MODELS ONLY. The local Macs (m1/m5) were dropped:
# on a P2-review measurement they took 2629 min (with lens timeouts)
# and contributed ZERO real findings — the two cloud models found every
# genuine bug in 612 min. Cloud-only is faster AND higher-signal.
# 3 cloud models, one consolidated comment each, all run in parallel.
GADFLY_MODELS: "minimax-m3:cloud,deepseek-v4-flash:cloud,glm-5.2:cloud"
GADFLY_PROVIDER_CONCURRENCY: "ollama-cloud=3"
# Default => the 3-lens suite (security, correctness, error-handling).
# Set the repo var GADFLY_SPECIALISTS to override (csv / "all" / "auto").
GADFLY_SPECIALISTS: ${{ vars.GADFLY_SPECIALISTS || 'security,correctness,error-handling' }}
# Per-lens deadline + bounded steps so the slow local models stay sane.
GADFLY_TIMEOUT_SECS: "600"
GADFLY_MAX_STEPS: "14"
# Allow-list for the comment trigger (mirrors the job-level if: guard).
GADFLY_ALLOWED_USERS: "steve,fizi,dazed"
# --- event context (leave as-is) ---
EVENT_NAME: ${{ github.event_name }}
PR: ${{ github.event.pull_request.number || github.event.issue.number || github.event.inputs.pr_number }}
PR_BRANCH: ${{ github.head_ref }}
IS_DRAFT: ${{ github.event.pull_request.draft }}
COMMENT_BODY: ${{ github.event.comment.body }}
COMMENT_ID: ${{ github.event.comment.id }}
ACTOR: ${{ github.actor }}