steve/executus

P2: run kernel + run.Ports inversion — executus is runnable #2

Merged

steve merged 9 commits from phase-2-run-kernel into main

2026-06-27 02:02:21 +00:00

Author	SHA1	Message	Date
steve	84e84f9785	ci(gadfly): cloud-only fleet (3 models, drop local Macs) [executus CI / test (pull_request): Successful in 58s] Measured on the P2 review: the local Macs (m1/m5) took 26–29 min with lens timeouts and found ZERO real bugs, while the two cloud models found every genuine finding in 6–12 min. Drop the Macs; add glm-5.2:cloud as a third cloud reviewer. Net: faster (~29→~12 min) and higher signal. Models: minimax-m3:cloud, deepseek-v4-flash:cloud, glm-5.2:cloud (ollama-cloud=3 concurrency). timeout-minutes 90→30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 22:01:07 -04:00
steve	69c2eb5f47	fix: address verified gadfly P2 findings (9 real of 18) [executus CI / test (pull_request): Successful in 1m0s] Independently verified all 18 gadfly findings against the code (18-agent fan-out). Fixed the 9 real ones; the other 9 were false-positive / hallucinated / valid-tradeoff (no change). High: - F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the agent loop and nil-panicked. Now a clean error (Run never panics). +test. - F9 compactor data-leak: renderTranscript sent tool-call args verbatim to the summarizer (a possibly-different provider/tier); secret-bearing tool args (mcp_call/email_send/http_/webhook_) are now redacted, with a doc note that result bodies still flow (summary needs them). Medium/minor: - F2 compactor error path returned the folded slice, not the original msgs (contradicting the documented non-fatal contract) -> return msgs. - F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) / cancelled (Canceled) via statusFor. +test. - F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs min(calls,results) only. - F5 SetIteration was never called -> RunState.Iteration always 0; the step observer now updates it each loop. - F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue). - F7 estimateTokens had no default arm (future Part kinds counted as 0); unknown parts now counted conservatively. - F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now a clear "response exceeded N bytes" via readCapped. - F12 step observer captured the caller ctx; now the merged runCtx. - F13 compaction onFire was nil (doc claimed it logged); now wired to audit LogEvent("compaction_fired"). - F11 (no pre-dispatch hook in majordomo) documented honestly as a known limitation; F18 UsageSink doc clarified cache tokens are subsets of input. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 21:42:46 -04:00
steve	e76eed0011	P2: run.Executor — executus is runnable [executus CI / test (pull_request): Successful in 1m0s; Adversarial Review (Gadfly) / review (pull_request): Successful in 28m58s] The capstone of the run kernel: run.Executor.Run(ctx, RunnableAgent, inv) ties model resolution + the tool registry + majordomo's agent loop + context compaction + run-bounding + step/audit instrumentation into one path, with every host concern behind the nil-safe run.Ports. - run/executor.go: New(Config{Registry, Models, Defaults, Ports, Compactor, ContextTokens, SystemHeader}) + Run -> Result{RunID, Output, Steps, Usage, Err}. Budget gate (pre-run), model resolve, Audit StartRun/recorder (satisfies RunTally, stamped on inv.RunState), toolbox build, step observer (zips tool calls/results -> emitter + recorder.OnStep/OnTool), V10 detached-MaxRuntime context with caller-cancel merged back, compaction wired from ContextTokens×ratio, audit Close + Budget Commit on a detached cleanup ctx. Zero Ports = a bounded in-memory run (gadfly's case). - run/executor_test.go: hermetic end-to-end run against majordomo's fake provider (hello-world), Budget-rejection (no model call), Audit-port wiring (StartRun + Close with terminal status/output). All green under -race. - examples/minimal upgraded to the real "hello, agentic world" (~15 lines: Configure tiers -> run.New -> Run -> print). README/CLAUDE.md updated. Remaining P2 follow-ups (incremental): wire Critic/Checkpointer/PaletteSource/Delivery into the loop, multi-phase Pipelines, and the no-tools direct path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 20:45:10 -04:00
steve	4132af0216	P2: move compactor -> compact/ + step instrumentation -> run/steps.go - compact/compactor.go: the per-run stateful context compactor (token-threshold gate, fast-tier middle summarisation, fold memory) lifted from mort's skillexec/compactor.go. Self-contained; its only dependency is a ModelResolver func (model.ParseModelForContext satisfies it) + a token threshold. - run/steps.go: the step-emission/instrumentation (stepEmitter, tool->kind/summary mapping with redaction, Result.Steps accumulation) from agentexec, repointed onto executus/tool. Both build green. With the run-loop mechanics, RunnableAgent DTO, run.Ports, compactor, and step instrumentation now all in place, the remaining P2 work is the run.Executor itself (wiring these + majordomo's agent loop), which makes executus runnable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 20:36:32 -04:00
steve	9a89d588b6	fix: address gadfly P1 review (3 low-risk findings) Triaged gadfly's P1 review (advisory). Fixed the three clearly-correct, low-risk items; the rest were pre-existing mort behavior or theoretical: - model/call.go: recordUsage dropped fully-cached responses (input==0 && output==0 early-return missed CacheRead/CacheWrite-only usage, which Anthropic/OpenAI prompt-caching bills). Guard now also checks cache tokens. - llmmeta/helper.go: recordLedger swallowed Storage.RecordMetaCall errors; now logs them (slog.Warn) so a non-logging Storage impl can't silently drop audit rows. - model/cloud_sync.go: the ollama.com limit-cache used unbounded io.ReadAll; wrapped both reads in io.LimitReader(1 MiB) so a misbehaving endpoint can't exhaust memory before the 15s timeout. Noted-not-fixed (follow-ups / pre-existing mort semantics): tier_not_allowed ledger label on resolution failure, unknown-model usage attribution, the cloud_sync https scheme allowlist, and several theoretical/cosmetic items. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 20:32:58 -04:00
steve	0d38783912	Merge main (P1) into phase-2-run-kernel	2026-06-26 20:31:15 -04:00
steve	fe5074c3cf	ci: sync gadfly review config to mort's foreman-provider setup Mirror mort's updated adversarial-review.yml: m1/m5 pulled in via the GADFLY_ENDPOINT_M1/_M5 secrets using gadfly's "foreman" provider type (providers m1/m5; models m1/qwen3:14b, m5/qwen3.6:35b-mlx), 2 cloud models, 3-lens suite, pinned to the gadfly :sha-6e3a83c image. Header adjusted for executus; functional config identical to mort's tested version. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 20:22:26 -04:00
steve	8b30b9f889	P2: define nil-safe run.Ports (the inversion spine) Add run/ports.go: the host seams the executor will consume, every one nil-safe so a light host runs with the zero Ports (no persistence/audit/budget/critic/delegation/delivery) and a heavy host wires each to a battery. Ports mirror mort's existing interfaces so the batteries implement them directly: - Audit + RunRecorder (mort skillaudit.Storage/Writer): StartRun -> per-run recorder (OnStep/OnTool/LogEvent/Close), recorder satisfies RunTally. - Budget (mort skillexec.BudgetTracker): Check / Commit. - Critic + CriticHandle (mort agentcritic): Monitor -> handle with RecordStep/RecordToolStart/Steer/Deadline/Stop (the loop wiring finalizes with the executor merge). - Checkpointer (mort agentexec.RunCheckpointer): Save/Complete/Fail. - PaletteSource (mort SkillInvokerForPalette + AgentInvokerForPalette): Resolve/Invoke skill + agent delegation. Plus host-neutral RunInfo / RunStats. This completes the P2 inversion DESIGN; the agentexec+skillexec -> run.Executor merge that consumes these Ports is the remaining P2 work. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 20:17:26 -04:00
steve	aab950f1c3	P2 (foundation): run-loop mechanics + RunnableAgent DTO Stand up the executus/run kernel foundation, decoupled from mort: - runengine.go: the shared run-loop scaffolding (MergeCancellation, CleanupContextTimeout, RunFinalizer/FireFinalizers, RunStateAccessor) moved from mort. The accessor's skillaudit.Writer dependency is inverted to a narrow run.RunTally interface (TokenStats + ToolCallsCount) — the kernel reads live tallies without importing the audit battery. - submit.go: the legacy submit-capture compat tool (stdlib + majordomo/llm). - agent.go: RunnableAgent DTO — the kernel's view of "a thing to run" (tier, prompt, caps, palette, phases, critic config). The persona Agent and saved Skill will LOWER into this DTO so the kernel never imports a noun battery. This is the spine of the agentexec.Run(agents.Agent) inversion. run/ builds with only majordomo + executus/tool. The executor merge (agentexec+skillexec -> run.Executor) and the nil-safe run.Ports (Audit/Critic/Budget/Checkpointer/PaletteSource) are the next P2 block. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 19:58:20 -04:00
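
Finding F4 in commit 69c2eb5f47 fixes the step zip so that it pairs only min(calls, results) items and no longer emits empty-name "ghost" steps when there are more results than calls. The idea can be sketched as follows. The step type and pairSteps helper are hypothetical and use plain strings in place of the real tool-call and tool-result types.

```go
package main

import "fmt"

// step pairs one tool call with its result (hypothetical, simplified type).
type step struct {
	Tool   string
	Result string
}

// pairSteps zips calls with results, stopping at the shorter slice so an
// unmatched result never becomes a step with an empty tool name.
func pairSteps(calls, results []string) []step {
	n := len(calls)
	if len(results) < n {
		n = len(results)
	}
	steps := make([]step, 0, n)
	for i := 0; i < n; i++ {
		steps = append(steps, step{Tool: calls[i], Result: results[i]})
	}
	return steps
}

func main() {
	calls := []string{"http_get"}
	results := []string{"200 OK", "orphan result"}
	fmt.Println(len(pairSteps(calls, results)))
}
```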