Commit Graph

42 Commits

Author SHA1 Message Date
steve be4bbbcad5 run: fix statusFor — don't relabel a generic error / caller-cancel as timeout (gadfly #11)
executus CI / test (pull_request) Successful in 47s
executus CI / test (push) Successful in 45s
The WithCancelCause+timer rewrite made MaxRuntime surface as Canceled (not
DeadlineExceeded), so statusFor's context.Cause(DeadlineExceeded) check could
relabel (a) a genuine run error as 'timeout' and (b) a caller cancel/deadline as
'timeout' (was 'cancelled'). Convergent gadfly finding (glm-5.2 + cluster).

Fix: keep MaxRuntime as WithTimeout (its DeadlineExceeded propagates → 'timeout',
preserving own-timeout vs caller-cancel), add a NESTED WithCancelCause layer only
for the kill. statusFor consults context.Cause ONLY for ErrCriticKill; everything
else is classified by the run error itself. Tests: generic-error-not-relabeled +
caller-cancel-stays-cancelled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v0.1.0
2026-06-27 17:00:26 -04:00
steve 390e6cf905 run: critic parity — fuller RecordStep + cause-carrying Kill (distinct status)
executus CI / test (pull_request) Successful in 46s
Adversarial Review (Gadfly) / review (pull_request) Successful in 22m30s
Completes the run-critic seam so a host adapter (mort's agentcritic) has full
fidelity, closing the two limitations gadfly surfaced on mort #1334.

- RecordStep(iter int, resp *llm.Response): the completed step's model response
  is now passed to the critic (was index-only), so a host that records a trace
  (mort's ProgressRecorder) can show what the agent actually produced, not just
  an iteration count. The executor forwards s.Response; the battery ignores it
  (its Progress is count-based).
- CriticHandle.KillCause() error + ErrCriticKill: the executor now distinguishes
  an explicit critic KILL from a natural backstop expiry. runCtx uses a
  cause-carrying cancel (WithCancelCause + a MaxRuntime timer cancelling with
  DeadlineExceeded); the deadline-watch cancels with ErrCriticKill when
  KillCause()!=nil, else DeadlineExceeded. statusFor reads context.Cause →
  killed / timeout / cancelled are now distinct (were all "cancelled"). The
  battery sets killCause from Decision.KillReason on a Kill.

Tests: statusFor "killed" case (cause=ErrCriticKill, err=Canceled); fake handle
+ battery RecordStep/KillCause signatures. Core stays battery-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 16:35:13 -04:00
steve 1a1d5e417b chore: go mod tidy (add missing go.sum entry; CI tidiness gate)
executus CI / test (pull_request) Successful in 2m8s
executus CI / test (push) Successful in 1m45s
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:53:58 -04:00
steve f3bd43b726 ci(gadfly): drop the m1 reviewer (dead weight; keep m5)
executus CI / test (pull_request) Failing after 1m1s
m1/qwen3:14b proved consistently low-value + slowest in the pool over multiple
PRs. Removed from GADFLY_MODELS + GADFLY_PROVIDER_CONCURRENCY + its endpoint so it
never fires again. m5 retained.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:41:14 -04:00
steve 306d575c31 critic: overflow-guard maxSteps += RaiseStepsBy (gadfly 5-model convergence)
executus CI / test (pull_request) Has been cancelled
A buggy/hostile Escalator returning a huge RaiseStepsBy could wrap handle.maxSteps
negative (which the executor reads as defer-to-base). Clamp at math.MaxInt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:38:48 -04:00
steve 4ba83ab905 run: critic can raise a run's step ceiling mid-flight (CriticHandle.MaxSteps)
executus CI / test (pull_request) Failing after 1m1s
Adversarial Review (Gadfly) / review (pull_request) Successful in 21m8s
Prerequisite for a full-fidelity mort agentcritic adapter (which adjusts a
healthy-but-long run's iteration budget, not just its deadline). executus's
CriticHandle was deadline+steer only; this adds the dynamic step ceiling above
an unchanged majordomo (which already exposes WithMaxStepsFunc).

- run.RunInfo += MaxIterations (the run's base ceiling, so a critic can raise it
  relative to the baseline).
- run.CriticHandle += MaxSteps() int — polled by the executor each step via
  agent.WithMaxStepsFunc; <=0 defers to the base. The executor uses
  WithMaxStepsFunc(critic.MaxSteps) when a critic is active, else WithMaxSteps.
- critic battery: handle.maxSteps (initialised from RunInfo.MaxIterations) +
  MaxSteps(); Decision gains RaiseStepsBy so an Escalator can raise the ceiling
  alongside ExtendBy. ExtendOnce default is unchanged (time-only).

Test: a critic returning MaxSteps=5 lets a base-MaxIterations=1 run complete two
tool-dispatch steps past the base ceiling. Core stays battery-free (run doesn't
import critic).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:16:03 -04:00
steve a103cc5e9f ci(gadfly): 9-cloud panel @ 3 models x 3 lenses (9 concurrent)
executus CI / test (push) Failing after 1m57s
Match mort: minimax-m3, glm-5.2, glm-5.1 (SWE-Bench Pro SOTA), kimi-k2.7-code,
deepseek-v4-pro, nemotron-3-super, gpt-oss:120b, qwen3-coder:480b, gemma4 (8
families) + m1/m5 locals. ollama-cloud=3 x lens=3 = 9 concurrent (10 budget).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 12:17:24 -04:00
steve 4d28cd6e2c ci(gadfly): 4-cloud pool — add kimi-k2.7-code + deepseek-v4-pro, drop v4-flash
executus CI / test (push) Failing after 1m2s
Match mort's new cloud panel: minimax-m3, glm-5.2, kimi-k2.7-code (Moonshot),
deepseek-v4-pro (frontier, replaces v4-flash). Keeps m1/m5 locals + the existing
ollama-cloud=1 + lens-concurrency=3 serial-model style.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 11:59:13 -04:00
steve dcaefff756 ci(gadfly): add M1/M5 Macs back to the reviewer pool (full fleet)
executus CI / test (push) Failing after 1m23s
Re-adds the local Macs (m1/qwen3:14b, m5/qwen3.6:35b-mlx) via their foreman endpoints alongside the 3 cloud models. Cloud keeps lens fan-out (ollama-cloud=1 model + lens=3); each Mac runs one model with lenses serial (foreman serializes anyway); all provider lanes parallel. Bumps the job timeout 30->90m for the slow local lanes. With findings telemetry now on, gadfly-reports can quantify whether the Macs earn their keep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 10:44:22 -04:00
steve 97154395e6 C0b: document recordToolStart post-iteration timing (gadfly glm finding)
executus CI / test (pull_request) Failing after 59s
executus CI / test (push) Failing after 1m1s
majordomo's step observer fires post-iteration, so the critic's activity clock
refreshes per-iteration, not mid-tool — a single long tool call won't refresh it
until it returns. Documented + the host-progress-bridge mitigation (mort's
pattern). A true pre-dispatch hook needs majordomo support (follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 10:10:56 -04:00
steve 4aa06f652e C0b: address verified gadfly findings (panic-safety + test honesty)
executus CI / test (pull_request) Failing after 58s
From PR #9 (minimax + deepseek):
- Run now has a top-level recover() — the "never propagates a panic" promise was
  unenforced; a panicking host Port (Critic/Audit/Palette) on the run goroutine
  now becomes Result.Err instead of unwinding into the caller.
- The critic deadline-watch goroutine recovers panics from a host Deadline()
  (it's a separate goroutine, so Run's recover can't catch it) — a buggy
  CriticHandle can't crash the process.
- CriticHandle interface documents its concurrency contract (Record*/Steer on the
  run goroutine vs Deadline()/Stop() from the watch goroutine — impls must be
  concurrent-safe; the critic battery already is).
- startCritic's dead `soft <= 0 -> noop` guard (withFallbacks already coerces to
  90s) replaced with a defensive inline 90s default, so a bypass of withFallbacks
  still gets a working critic instead of silently none.
- Delivery tests made honest: the old "error path" test only checked the
  early-return (no delivery); added TestDeliverErrorOnRunFailure (in-loop model
  error -> DeliverError to the target) + renamed the early-return test.

Graded all #9 findings in the gadfly MCP.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 10:09:22 -04:00
steve 43b2471737 C0b: wire Critic + Delivery into run.Executor
executus CI / test (pull_request) Failing after 1m0s
Adversarial Review (Gadfly) / review (pull_request) Successful in 5m9s
Continues finishing the executor's run.Ports wiring (after C0's Palette).

Critic (run/critic.go): when Ports.Critic is set and the agent enables it, the
executor calls Monitor at run start, feeds RecordStep/RecordToolStart from the
step observer, drains the critic's Steer messages into the loop via
agent.WithSteer, and binds the run's hard cancellation to the critic's
(extendable) Deadline through a watch goroutine — a healthy-but-slow run gets
room while a hung one is killed. Stop() on run end. Soft timeout from
Defaults.CriticSoftTimeout (default 90s). nil-safe: no critic / not-enabled =
no-op.

Delivery (run/executor.go deliver): after the run, when Ports.Delivery is set
and inv.DeliveryID is non-empty, the executor posts Result.Output (or
DeliverError on failure) to a host-interpreted deliver.Target
{inv.DeliveryKind, inv.DeliveryID}. Empty target = caller reads Result.Output
itself (the synchronous default; the `.agent run` canary). Best-effort +
detached.

tool.Invocation gains DeliveryKind/DeliveryID (host-set egress target).

Tests: critic monitored/fed/steered/stopped when enabled, untouched when not;
delivery posts on a target, skips without one. Deferred: Checkpointer (needs a
majordomo hook to snapshot the running message history).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 10:00:05 -04:00
steve 0c80679719 C0: address verified gadfly findings (trivial fixes)
executus CI / test (pull_request) Failing after 1m31s
executus CI / test (push) Failing after 1m31s
From the PR #8 review (all graded in the gadfly MCP):
- skip empty palette names + dedupe by final tool name, instead of producing a
  "skill__" tool or an opaque box.Add duplicate error.
- delegationResult: no trailing blank line when a non-ok child produced no output.
- delegationErr: fold a child's partial output into the hard-failure error so it
  isn't silently dropped.

Deferred to C0b (design-level, not trivial): route delegation through the
tool.Registry gate/audit wrappers; expose the skill's real input schema to the
LLM instead of a generic inputs map. typed-nil PaletteSource is left as a caller
contract (the == nil guard catches the untyped-nil interface).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 09:53:11 -04:00
steve 9d41987b0e C0: wire Palette delegation into run.Executor (skill__/agent__ tools)
executus CI / test (pull_request) Failing after 1m2s
Adversarial Review (Gadfly) / review (pull_request) Successful in 3m47s
The first cutover prerequisite: the executor now turns an agent's SkillPalette /
SubAgentPalette into delegation tools so a mort agent that delegates works
through run.Executor (the piece the `.agent run` canary needs beyond the
already-wired audit/budget).

- run/palette.go: addDelegationTools builds a skill__<name> tool (structured
  inputs) per SkillPalette entry and an agent__<name> tool (prompt) per
  SubAgentPalette entry, each invoking run.Ports.Palette as a CHILD of the
  current run (parentRunID = inv.RunID, inheriting caller + channel). A non-ok
  child status is surfaced to the parent with the partial output. nil-safe: no
  PaletteSource or empty palette → no delegation tools (unchanged behavior).
- executor.go: call it right after building the low-level toolbox.

Tests: the model calls skill__helper → routed through Palette with the right
name/caller/inputs/parent; nil palette → run still works.

Deferred to C0b (the remaining run.Ports executor wiring): Critic (soft-timeout
monitor + deadline binding + steer), Delivery (output egress for surfaces that
need executor-side delivery), Checkpointer (needs a majordomo message-history
hook to snapshot resumable state). The `.agent run` canary delivers its returned
Result.Output itself, so these aren't on its critical path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 09:28:01 -04:00
steve e37cf415de ci(gadfly): emit findings to gadfly-reports + bump image to sha-d7f364d
executus CI / test (push) Failing after 2m40s
Adds GADFLY_FINDINGS_URL / GADFLY_FINDINGS_TOKEN (user-scope secrets) so each review POSTs its run + findings to the gadfly-reports store, and bumps the pinned gadfly image to sha-d7f364d (the build carrying the findings-emit). Advisory only — emit failures never affect the review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 09:12:46 -04:00
steve a87e7d2c72 fix: address verified gadfly P5 findings (canary robustness)
executus CI / test (pull_request) Failing after 3s
executus CI / test (push) Failing after 1m9s
All 3 cloud models converged (all "minor" — example code, no blocking):
- Consolidate: a model whose every lens errored now reads "review incomplete",
  not a misleading "no issues found" (all 3 models). + test.
- Consolidate: swarm-cancelled (unattributed) cells now surface a "swarm
  cancelled — N cell(s) did not run" banner instead of vanishing (all 3). + test.
- main: io.ReadAll(os.Stdin) error is surfaced (all 3); a TTY stdin no longer
  hangs forever (TTY guard, minimax).
- providerOf: a bare tier name now keys its own PerKey bucket instead of all
  bare tiers collapsing onto "tier" (minimax, glm-5.2) — distinct tiers throttle
  independently.
- Review doc reworded (the closure, not fanout, carries per-cell errors).

Left as documented example-scope behavior: no per-cell timeout (caller supplies
ctx), unknown-severity → lowest rank (no crash).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:34:01 -04:00
steve ea9475da54 P5: light-tier canary — gadfly-shaped reviewer on executus core
executus CI / test (pull_request) Failing after 1m5s
Adversarial Review (Gadfly) / review (pull_request) Successful in 8m18s
examples/reviewer proves the core is sufficient for a static-binary light host
(gadfly's shape) with NO batteries:
- config.Env + model.Configure  -> env-driven model fleet + tier overrides
- model.ParseModelForContext    -> tier resolution + failover
- fanout.Run (PerKey caps)      -> N models x M lenses swarm, per-provider bound
- model.GenerateWith[T]         -> structured findings per (model, lens) cell
- Consolidate                   -> one verdict-led report section per model

Hermetic test runs the full 2x3 swarm against majordomo's fake provider and
asserts the consolidated verdicts. A go list -deps CI check asserts the canary
imports ZERO batteries (the light-tier invariant) — gadfly's go.sum stays free
of gorm/redis/discordgo/sqlite. README + docs updated.

This is the canary; migrating the LIVE gadfly repo onto executus core is a
follow-up (kept separate to not destabilize the active reviewer).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:22:02 -04:00
steve dc2d4ec425 P4c: remaining batteries — checkpoint + schedule + critic
executus CI / test (push) Failing after 1m6s
Completes the P4 battery set (squashed onto main from phase-4c-batteries).
- checkpoint/: run.Checkpointer durable-resume (CheckpointStore + throttled
  handle + Memory).
- schedule/: generic cron Runner (Tick/Loop; no cron grammar of its own).
- critic/: two-tier timeout watchdog (run.Critic) + Escalator policy seam +
  ExtendOnce default.
Includes the verified gadfly #6 fixes (ExtendOnce per-run, Kill-sticky, watch
panic-recovery; checkpoint throttle-after-success; schedule Next-before-Run +
nil-guard + Loop recovery).

P4 battery set complete: audit, budget, persona, skill, checkpoint, schedule,
critic — each nil-safe, each with a default, each core-import-clean. Executor
wiring for Critic/Checkpointer remains a P2 follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:15:32 -04:00
steve c8559676ed P4b: skill noun + contrib/store (SQLite for budget/persona/skill/audit)
executus CI / test (push) Has been cancelled
Merges the skill half of the persona/skill pair plus the second nested module.
(Squashed onto main from phase-4b-skill; the audit/budget/persona batteries it
was stacked on already landed via the P4 merge.)

- skill/: clean-redesign Skill noun + LEAN SkillStore (lifecycle/versions/
  schedule only) + ToRunnable + Memory default.
- contrib/store/: separate go.mod carrying modernc.org/sqlite, so the driver
  never enters the core go.sum. db.Budget()/Personas()/Skills()/Audit() back
  all four store seams (JSON-blob + indexed columns; round-trip tested).
  Includes the verified gadfly #5 fixes (AppendVersion tx+UNIQUE+error,
  Mark*ScheduledRun atomic json_set, busy_timeout, NaN guard).
- CI: builds + tests the nested module and asserts it owns the sqlite driver.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:15:00 -04:00
steve d82cef46b4 fix: address verified gadfly P4/#4 findings (audit/budget/persona)
executus CI / test (push) Failing after 1m4s
Security (all 3 models — HIGH): audit OnTool persisted raw tool args + results
verbatim for the very tools the OnStep narration-redaction flags as secret
(mcp_call/email_send/http_*) — the args/results are what CARRY the secret, so
they landed in skill_run_logs unredacted. Factored the predicate into
isSecretTool() (single source of truth) and OnTool now emits
args_redacted/result_redacted (+ lengths) for secret tools. Test asserts no
secret reaches the log. (persona) webhook_ip_allowlist entries are now
CIDR/IP-validated at load (malformed dropped + warned) instead of accepted raw.

Contract correctness (glm-5.2 + deepseek) — audit Memory now honors its
documented Storage contract: ListChildrenByParent/ListFinishedRunsBefore return
oldest-first; WalkParentChain returns root-first and honors MaxParentChainDepth;
ListRunsFiltered clamps limit (<=0 or >500 -> 50); ListFinishedRunsBefore with
limit<=0 returns none; an explicit RunFilter.Status (incl. "dry_run") matches
regardless of IncludeDryRun; LastRunBySkills counts only status=="ok" unless
includeFailed. (PurgeOlderThan's FinishedAt key is the SAFE behavior — in-flight
runs retained — so the doc was aligned to it, not the impl.)

Error-handling: appendLog now uses a bounded context (auditAppendTimeout=3s) so
a hung backend can't block the run goroutine on the hot path; Sink.StartRun
logs its (still best-effort) failure instead of swallowing it; budget Memory.Get
uses RLock (RWMutex); budget package doc fixed (was skillexec's); Check uses the
budgetWindow constant, not a duplicated literal.

Triaged false-positive: NewNoOpBudget returning BudgetTracker is assignable to
run.Budget (identical method sets) — no change needed.

Core go.sum still free of host/DB deps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:12:19 -04:00
steve 2260480c81 P4: persona noun — Agent + ToRunnable bridge + Memory store
The headline P4 piece (clean redesign): the Agent persona noun, decoupled from
its Discord shell.
- agent.go/storage.go/builtin_loader.go moved from mort's pkg/logic/agents; the
  Storage seam drops the Discord CommandBindingStorage embedding (a host
  concern). The host-entangled files (commands, chatbot_provider, command-
  binding dispatcher, personalization, system) stay in mort.
- runnable.go: Agent.ToRunnable() lowers a persona into run.RunnableAgent — the
  bridge that lets run.Executor run a persona without importing this battery
  (the inversion of agentexec.Run(*agents.Agent)).
- memory.go: NewMemory() — zero-dep in-process persona Storage (all 11 CRUD +
  trigger-query methods).

Tests: ToRunnable field/phase mapping; Memory round-trip. CI invariant: core
imports ZERO from persona.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:12:19 -04:00
steve 9116abcae2 P4: budget battery — DBBudget (rolling 7-day) over run.Budget
Second Tier-2 battery, plugging into run.Ports.Budget:
- budget.go: skillexec's BudgetTracker / NoOpBudget / DBBudget moved clean
  (stdlib only). Check/Commit match run.Budget exactly (compile-time proof in
  run.go: NoOpBudget and *DBBudget are run.Budget).
- storage.go: the BudgetStorage seam + SkillBudget domain, split out of mort's
  GORM file (the GORM impl stays in mort).
- memory.go: NewMemory() — zero-dependency in-process BudgetStorage with the
  7-day rolling-window rollover in Add.

Tests: per-user cap enforced, window rolls over after 7 days, NoOp always
allows. CI invariant: core imports ZERO from the budget battery.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:12:19 -04:00
steve 4d2f85d139 P4: audit battery — run.Audit Sink + Writer + queryable Memory store
First Tier-2 battery, plugging into run.Ports.Audit:
- storage.go/writer.go: skillaudit's Storage interface + per-run Writer moved
  clean (only utils->fmt); the Writer already matches run.RunRecorder's shape.
- sink.go: Sink adapts a Storage to run.Audit (StartRun -> a run row + a Writer
  wrapped as run.RunRecorder, converting run.RunStats on Close). NewSink(nil) is
  equivalent to no audit. Compile-time proofs: Sink is run.Audit, recorder is
  run.RunRecorder.
- memory.go: NewMemory() — a zero-dependency, queryable in-process Storage
  (retains runs + logs; all 17 read/filter/purge/walk methods) so a light host
  gets run history with no setup. Mort keeps its GORM Storage; contrib/store
  adds durable SQLite at P4.

End-to-end test: wire audit.NewSink(audit.NewMemory()) into the executor, run an
agent, and the run is recorded with terminal status/output and queryable by
caller. CI invariant verified: core imports ZERO from the audit battery (proper
battery direction; battery imports core, never the reverse).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:12:19 -04:00
steve d0bd3ec3d9 fix: address verified gadfly P3 review (3-cloud fleet)
executus CI / test (push) Has been cancelled
All 3 cloud models converged on a real access-control bug; fixed it + the
other genuine findings (the false-positives were dropped):

Security (HIGH — all 3 models):
- create_file_url skipped ValidateScope: a same-skill caller could mint a
  PUBLIC url for a file scoped to another user/run. Now runs ValidateScope
  (admin-aware), skipped only for the descendant-grant case — mirroring the
  read tools.

Other real fixes:
- ValidateScope hard-coded `false` at every call site (admin branch dead) ->
  pass inv.CallerIsAdmin (the executor sets it via the host AdminPolicy; still
  false/fail-closed when no admin). Stale "no admin flag" comment corrected.
- create_file_url: ExpiresInSeconds clamped BEFORE the *time.Second multiply
  (huge values overflowed to a negative duration that slipped under the cap,
  minting already-expired tokens); swallowed json.Marshal error now returned.
- RegisterMeta: build the default budget WITH the configured MaxPerRun (was
  NewInMemorySearchBudget(nil) -> hardcoded 10, ignoring MetaDeps.MaxPerRun).
- classify: all-zero scores no longer return a false-positive top-1 winner;
  coerceClassifyScore uses strconv.ParseFloat (rejects trailing garbage like
  "50extra" that fmt.Sscanf silently accepted).
- file_delete: honor the descendant grant (parent can clean up a worker's
  artifacts) — was the lone cross-skill-reject-outright file tool.
- meta tools: input caps truncate at a UTF-8 rune boundary (truncateUTF8), not
  mid-rune.
- think: removed the dead `var _ = fmt.Errorf` import-keeper; file_save default
  aligned to 16 MiB (matched RegisterStore).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:11:54 -04:00
steve 78e6858751 P3: store group — kv_* + file_* tools (agent memory)
RegisterStore(reg, StoreDeps) registers the persistent-memory tools over the
host's KV and/or File backends:
- kv_get/set/list/delete (KVStorage seam)
- file_save/get/get_text/get_metadata/list/delete (FileStorage seam), plus
  file_search (FileSearcher) and create_file_url (FileTokenMinter) when wired.

Near-zero-config: Quota defaults to a generous static cap (staticQuota), the
per-value/per-file caps default, and the kv vs file groups register
independently (a host can take just one). Seams moved clean (interface-only):
kv_storage.go, quota_provider.go, file_descendant_grant.go. The default
in-memory KV/File backends come with contrib/store at P4.

Core go.sum still free of gorm/redis/discordgo/sqlite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:11:54 -04:00
steve 1e201550b3 P3: meta + primitive tool group (think/now/cite + classify/extract/summarize)
Grow executus/tools into a real generic tool library:

- Register(reg): the always-available, zero-config tools — think, now (UTC
  unless a CurrentTimeProvider is wired), cite (inert unless a CitationStorage
  is wired). All nil-safe; a light host calls Register and is useful.
- RegisterMeta(reg, MetaDeps): the LLM-backed meta tools — classify,
  extract_entities, summarize — over the llmmeta helper. Budget defaults to the
  shipped in-memory per-run cap; Files optional; caps default.
- Seams moved (interface/type-only, no host coupling): research_providers.go
  (CurrentTimeProvider/CitationStorage/SearchBudget/PageExtractor/PDFFetcher/…)
  and file_storage.go (FileStorage + FileDomainMeta). Plus the in-memory budget
  default (research_defaults.go) and scope_validate.go.

calculate deferred (drags github.com/Krognol/go-wolfram + a module-path replace
— not worth it in the lean core for one tool). Core go.sum still free of
gorm/redis/discordgo/sqlite/wolfram.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:11:54 -04:00
steve df95425bb5 P3 (kickoff): generic tools/ library + end-to-end tool-using-agent test
Stand up executus/tools — the generic, host-agnostic tool library — and prove
the full pattern end to end:

- tools/tools.go: Register(reg) adds the always-available zero-dependency tools
  (currently `think`). A light host calls it and is immediately useful; backed
  tools (web/store/meta groups) will register via grouped registrars with
  nil-safe Deps as they land.
- tools/think.go: the `think` tool moved from mort (imports only executus/tool).
- tools/integration_test.go: end-to-end proof that the executor runs an agent
  which CALLS a registered tool — the fake model emits a `think` tool call, the
  executor dispatches it through the registry, the model finalises, and the step
  instrumentation captures the `think` step. Exercises the full tool-dispatch
  loop through run.Executor.

Stacked on phase-2-run-kernel (P3 needs run.Executor). Remaining P3: the
meta/web/net/store/compose groups + their Deps + default backends (splitting
mort's default.go grab-bag).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 00:11:54 -04:00
steve 16ddd90914 ci(gadfly): new build sha-d0de034 + per-lens concurrency
executus CI / test (push) Successful in 59s
Bump the gadfly image to sha-d0de034 (adds GADFLY_PROVIDER_LENS_CONCURRENCY)
and move ollama-cloud's concurrency from the MODEL axis to the LENS axis:
- GADFLY_PROVIDER_CONCURRENCY: ollama-cloud=1 (one model at a time)
- GADFLY_PROVIDER_LENS_CONCURRENCY: ollama-cloud=3 (its 3 lenses concurrent)

Net: still 3 models, but reviewed serially — the first model's consolidated
comment lands sooner and each model finishes faster, while the other two
models' comments arrive in series after it (instead of all 3 in parallel).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:57:14 -04:00
steve b35514dfaa ci(gadfly): cloud-only fleet (3 models, drop local Macs)
executus CI / test (push) Successful in 57s
Measured on the P2 review: the local Macs (m1/m5) took 26–29 min with lens
timeouts and found ZERO real bugs, while the two cloud models found every
genuine finding in 6–12 min. Drop the Macs; add glm-5.2:cloud as a third
cloud reviewer. Net: faster (~29→~12 min) and higher signal.

Models: minimax-m3:cloud, deepseek-v4-flash:cloud, glm-5.2:cloud
(ollama-cloud=3 concurrency). timeout-minutes 90→30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve 7b3da87c08 fix: address verified gadfly P2 findings (9 real of 18)
Independently verified all 18 gadfly findings against the code (18-agent
fan-out). Fixed the 9 real ones; the other 9 were false-positive /
hallucinated / valid-tradeoff (no change).

High:
- F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the
  agent loop and nil-panicked. Now a clean error (Run never panics). +test.
- F9 compactor data-leak: renderTranscript sent tool-call args verbatim to
  the summarizer (a possibly-different provider/tier); secret-bearing tool
  args (mcp_call/email_send/http_*/webhook_*) are now redacted, with a doc
  note that result bodies still flow (summary needs them).

Medium/minor:
- F2 compactor error path returned the folded slice, not the original msgs
  (contradicting the documented non-fatal contract) -> return msgs.
- F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) /
  cancelled (Canceled) via statusFor. +test.
- F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs
  min(calls,results) only.
- F5 SetIteration was never called -> RunState.Iteration always 0; the step
  observer now updates it each loop.
- F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue).
- F7 estimateTokens had no default arm (future Part kinds counted as 0);
  unknown parts now counted conservatively.
- F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now
  a clear "response exceeded N bytes" via readCapped.
- F12 step observer captured the caller ctx; now the merged runCtx.
- F13 compaction onFire was nil (doc claimed it logged); now wired to
  audit LogEvent("compaction_fired").
- F11 (no pre-dispatch hook in majordomo) documented honestly as a known
  limitation; F18 UsageSink doc clarified cache tokens are subsets of input.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve dfbc5a42b9 P2: run.Executor — executus is runnable
The capstone of the run kernel: run.Executor.Run(ctx, RunnableAgent, inv)
ties model resolution + the tool registry + majordomo's agent loop +
context compaction + run-bounding + step/audit instrumentation into one
path, with every host concern behind the nil-safe run.Ports.

- run/executor.go: New(Config{Registry, Models, Defaults, Ports, Compactor,
  ContextTokens, SystemHeader}) + Run -> Result{RunID, Output, Steps, Usage,
  Err}. Budget gate (pre-run), model resolve, Audit StartRun/recorder
  (satisfies RunTally, stamped on inv.RunState), toolbox build, step observer
  (zips tool calls/results -> emitter + recorder.OnStep/OnTool), V10
  detached-MaxRuntime context with caller-cancel merged back, compaction wired
  from ContextTokens×ratio, audit Close + Budget Commit on a detached cleanup
  ctx. Zero Ports = a bounded in-memory run (gadfly's case).
- run/executor_test.go: hermetic end-to-end run against majordomo's fake
  provider (hello-world), Budget-rejection (no model call), Audit-port wiring
  (StartRun + Close with terminal status/output). All green under -race.
- examples/minimal upgraded to the real "hello, agentic world" (~15 lines:
  Configure tiers -> run.New -> Run -> print). README/CLAUDE.md updated.

Remaining P2 follow-ups (incremental): wire Critic/Checkpointer/PaletteSource/
Delivery into the loop, multi-phase Pipelines, and the no-tools direct path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve 130c2bdfab P2: move compactor -> compact/ + step instrumentation -> run/steps.go
- compact/compactor.go: the per-run stateful context compactor (token-threshold
  gate, fast-tier middle summarisation, fold memory) lifted from mort's
  skillexec/compactor.go. Self-contained; its only dependency is a ModelResolver
  func (model.ParseModelForContext satisfies it) + a token threshold.
- run/steps.go: the step-emission/instrumentation (stepEmitter, tool->kind/
  summary mapping with redaction, Result.Steps accumulation) from agentexec,
  repointed onto executus/tool.

Both build green. With the run-loop mechanics, RunnableAgent DTO, run.Ports,
compactor, and step instrumentation now all in place, the remaining P2 work is
the run.Executor itself (wiring these + majordomo's agent loop), which makes
executus runnable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve d9b44387f5 fix: address gadfly P1 review (3 low-risk findings)
Triaged gadfly's P1 review (advisory). Fixed the three clearly-correct,
low-risk items; the rest were pre-existing mort behavior or theoretical:

- model/call.go: recordUsage dropped fully-cached responses (input==0 &&
  output==0 early-return missed CacheRead/CacheWrite-only usage, which
  Anthropic/OpenAI prompt-caching bills). Guard now also checks cache tokens.
- llmmeta/helper.go: recordLedger swallowed Storage.RecordMetaCall errors;
  now logs them (slog.Warn) so a non-logging Storage impl can't silently drop
  audit rows.
- model/cloud_sync.go: the ollama.com limit-cache used unbounded io.ReadAll;
  wrapped both reads in io.LimitReader(1 MiB) so a misbehaving endpoint can't
  exhaust memory before the 15s timeout.

Noted-not-fixed (follow-ups / pre-existing mort semantics): tier_not_allowed
ledger label on resolution failure, unknown-model usage attribution, the
cloud_sync https scheme allowlist, and several theoretical/cosmetic items.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve e0e2c0451a ci: sync gadfly review config to mort's foreman-provider setup
Mirror mort's updated adversarial-review.yml: m1/m5 pulled in via the
GADFLY_ENDPOINT_M1/_M5 secrets using gadfly's "foreman" provider type
(providers m1/m5; models m1/qwen3:14b, m5/qwen3.6:35b-mlx), 2 cloud models,
3-lens suite, pinned to the gadfly :sha-6e3a83c image. Header adjusted for
executus; functional config identical to mort's tested version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve c732a677df P2: define nil-safe run.Ports (the inversion spine)
Add run/ports.go: the host seams the executor will consume, every one
nil-safe so a light host runs with the zero Ports (no persistence/audit/
budget/critic/delegation/delivery) and a heavy host wires each to a battery.

Ports mirror mort's existing interfaces so the batteries implement them
directly:
- Audit + RunRecorder (mort skillaudit.Storage/Writer): StartRun -> per-run
  recorder (OnStep/OnTool/LogEvent/Close), recorder satisfies RunTally.
- Budget (mort skillexec.BudgetTracker): Check / Commit.
- Critic + CriticHandle (mort agentcritic): Monitor -> handle with
  RecordStep/RecordToolStart/Steer/Deadline/Stop (the loop wiring finalizes
  with the executor merge).
- Checkpointer (mort agentexec.RunCheckpointer): Save/Complete/Fail.
- PaletteSource (mort SkillInvokerForPalette + AgentInvokerForPalette):
  Resolve/Invoke skill + agent delegation.
Plus host-neutral RunInfo / RunStats.

This completes the P2 inversion DESIGN; the agentexec+skillexec ->
run.Executor merge that consumes these Ports is the remaining P2 work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve fa644f1826 P2 (foundation): run-loop mechanics + RunnableAgent DTO
Stand up the executus/run kernel foundation, decoupled from mort:

- runengine.go: the shared run-loop scaffolding (MergeCancellation,
  CleanupContextTimeout, RunFinalizer/FireFinalizers, RunStateAccessor) moved
  from mort. The accessor's *skillaudit.Writer dependency is inverted to a
  narrow run.RunTally interface (TokenStats + ToolCallsCount) — the kernel
  reads live tallies without importing the audit battery.
- submit.go: the legacy submit-capture compat tool (stdlib + majordomo/llm).
- agent.go: RunnableAgent DTO — the kernel's view of "a thing to run" (tier,
  prompt, caps, palette, phases, critic config). The persona Agent and saved
  Skill will LOWER into this DTO so the kernel never imports a noun battery.
  This is the spine of the agentexec.Run(*agents.Agent) inversion.

run/ builds with only majordomo + executus/tool. The executor merge
(agentexec+skillexec -> run.Executor) and the nil-safe run.Ports
(Audit/Critic/Budget/Checkpointer/PaletteSource) are the next P2 block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve b424261aca P1: model layer (convar->config inversion) + llmmeta
executus CI / test (pull_request) Successful in 58s
Adversarial Review (Gadfly) / review (pull_request) Successful in 26m27s
executus CI / test (push) Successful in 1m2s
Lifts mort's pkg/logic/llms into executus/model, decoupled from mort:

- tiers.go: the tier resolver now reads a host-supplied config.Source under
  "model.tier.<name>" with host-supplied fallbacks (Configure(cfg, defaults,
  ttl)), instead of convar.Manager. Tier NAMES + specs are host config; the
  resolution mechanism (cache, reasoning-suffix dialect, chain validation) is
  generic. No tier names hard-coded in the harness.
- sink.go: usage/trace recording inverted off mort's llmusage/llmtrace into
  UsageSink / TraceSink seams + a model-owned Span, with nil-safe context
  attribution helpers (WithModel/WithTraceID/WithUsageTool/WithUsageUser).
  Both sinks optional (nil = off) so a light host records nothing.
- lane decoration repointed to executus/lane; utils.Errorf -> fmt.Errorf.
- call.go keeps GenerateWith[T] (instrumented structured output) — this is the
  structured-output primitive; no separate structured/ package.
- llmmeta moved over model/ (the meta-LLM helper: tier allowlist + JSON retry
  + ledger). Its tests configure a minimal tier table via TestMain.

New tests cover the inversion: config overrides fallback, tier registration,
reasoning-suffix survival, nested-tier rejection, nil-sink no-ops.

Full module: go build/vet/test -race green; core go.sum still free of
gorm/redis/discordgo/sqlite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:47:13 -04:00
steve 741d7816ed ci: add gadfly adversarial review on PRs (mirrors mort)
executus CI / test (push) Successful in 35s
Same setup as mort: the published gadfly:v1 image as a specialist swarm
(m1/m5 local Macs + 2 cloud models, 3-lens suite), posting one consolidated
advisory comment. Must live on main so it triggers on PRs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:46:47 -04:00
steve dc28b63ad8 P1 (part 1): move skilltools core -> tool/ (clean, verbatim)
executus CI / test (push) Successful in 36s
The tool registry core (registry, permission model, Invocation, gated-tool
wrapper, ssrf guard, hmac, encryption, argcoerce, helpers, rootrun,
session_tools, webhook_rate_limit) had zero mort coupling — it imports only
majordomo/llm + x/crypto/hkdf — so it moves verbatim with a package rename
(skilltools -> tool). All same-package tests came along and pass; the SSRF,
gated-wrapper, encryption and output-pattern invariants are re-anchored here.

majordomo re-enters the module graph (now pinned to the latest, incl. the
front-loaded-output fix). model/ + llmmeta + structured follow next.

Docs: CLAUDE.md now requires README/examples to stay in sync with changes in
the same commit; CI skips docs/example-only pushes via paths-ignore.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:31:47 -04:00
steve d2c18ad5bb ci: make tidy-clean check robust to a missing go.sum
executus CI / test (push) Successful in 26s
P0 has no external deps, so go.sum doesn't exist yet and
`git diff --exit-code go.mod go.sum` errored (exit 128) on the missing
path. Use `git status --porcelain` so the check survives a not-yet-created
go.sum and still catches an untidy go.mod or a newly-created go.sum.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:20:47 -04:00
steve ca243a2d50 P0: stand up executus harness module above majordomo
executus CI / test (push) Failing after 24s
Batteries-included agent-harness base, extracted from mort's agent layer.
This first cut establishes the module + the zero-coupling core primitives:

- lane, dispatchguard, pendingattach, run/progress.go: moved verbatim from mort
- config: host config Source seam + env-var default (nil-safe helpers)
- deliver: output-egress seam + Discard/Stdout defaults
- identity: AdminPolicy + MemberResolver seams (nil-safe)
- fanout: programmatic N×M swarm (bounded global + per-key concurrency)
- README/CLAUDE.md with the vibe-coded banner; CI with Go gates +
  the "core stays majordomo+stdlib only" invariant

Core builds with stdlib only today; majordomo enters at P1 (model/structured).
go build/vet/test -race all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:18:37 -04:00
steve 25feb63c00 Initial commit 2026-06-26 23:04:54 +00:00