Commit Graph

23 Commits

Author SHA1 Message Date
steve c8a87f1733 fix: address verified gadfly P5/#5 findings (contrib/store concurrency)
executus CI / test (pull_request) Successful in 1m34s
All 3 cloud models converged on real concurrency bugs in the SQLite stores:

- AppendVersion (HIGH): the seq key was `SELECT MAX(seq)+1` then INSERT in two
  un-transacted statements with a NON-unique index, AND the Scan error was
  swallowed (seq stayed 0 on failure). Concurrent appends could both land the
  same seq, silently breaking newest-first ordering. Now: one transaction, the
  Scan error is propagated, the (skill_id, seq) index is UNIQUE (the loser of a
  race fails loudly), and an empty SkillID is rejected.
- MarkScheduledRun / MarkAgentScheduledRun (all 3): replaced the Get→mutate→Save
  read-modify-write (lost-update window) with a single atomic UPDATE using
  json_set, so a concurrent Mark/edit can't clobber it. json_set keeps the JSON
  blob's NextRunAt/LastScheduledRunAt consistent with the indexed column;
  RFC3339Nano matches Go's time encoding so the blob still round-trips (tested).
- Open: actually applies PRAGMA busy_timeout=5000 (the doc advertised it but it
  was never set) — a contended writer waits instead of erroring SQLITE_BUSY.
- budgetStore.Add: rejects NaN/Inf secondsUsed (would irrecoverably poison the
  column).

Triaged-but-kept: plaintext webhook secret (documented design, high-entropy URL
key, pre-existing); SQL()/free-form `where` helpers (no untrusted input reaches
them — defense-in-depth notes only).

Core go.sum still free of host/DB deps; contrib/store green (incl. a json_set
blob-round-trip test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 23:37:37 -04:00
steve b194a9621d P4: contrib/store — audit SQLite store (run history complete)
executus CI / test (pull_request) Successful in 1m41s
db.Audit() satisfies audit.Storage (all 17 methods) over SQLite: one indexed
row per run (+ a JSON inputs blob), one row per log event. Filter/list/walk
queries are indexed on the columns they filter (skill_id, caller_id,
parent_run_id, started_at); WalkParentChain follows parent_run_id with a
seen-set guard; LastRunBySkills is a grouped MAX.

Test covers run start/finish round-trip (inputs map + token roll-up), log
append + ordered read, parent/child + ancestor-chain walks, caller listing,
TopLevelOnly filter, and the last-run-per-skill map.

contrib/store now backs ALL four store seams — budget + persona + skill + audit
— so a host gets turnkey durable persistence (run history, budgets, agents,
skills) with zero store code. Core go.sum still has 0 sqlite refs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:50:21 -04:00
steve 954efde474 P4: contrib/store — skill SQLite store (lifecycle + versions)
executus CI / test (pull_request) Successful in 1m36s
Adversarial Review (Gadfly) / review (pull_request) Successful in 11m23s
db.Skills() satisfies skill.SkillStore over SQLite, same JSON-blob + indexed
columns pattern. Versions live in their own table (each SkillVersion embeds a
full Skill snapshot as JSON), ordered newest-first by an append seq.

Test: round-trip (Tools, ExposeAsChatbotTool), visibility listing
(public/shared/private with SharedWith filtered in Go), chatbot-exposed,
newest-first versions + GetVersionByID, scheduled-due query + MarkScheduledRun.

contrib/store now covers budget + persona + skill; audit store next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:47:35 -04:00
steve cb16008b14 P4: contrib/store — persona SQLite store (JSON-blob round-trip)
db.Personas() satisfies persona.Storage over SQLite. Each Agent is stored as a
JSON blob with extracted indexed columns (owner_id, name, webhook_secret,
chatbot_channel_filter, schedule, next_run_at) — so the WHOLE struct round-trips
(no domain<->GORM<->DB field-loss footgun) while the lookups stay indexable.

Test proves the round-trip preserves nested + map fields (SkillPalette,
StateReactEmoji), the owner/name + webhook + chatbot-filter queries, the
scheduled-due query, and MarkAgentScheduledRun clearing the due window.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:46:09 -04:00
steve 95f564ac4e P4: contrib/store — second module (pure-Go SQLite), budget store
Establish the nested persistence module — the architectural reason the core
stays lean: a SEPARATE go.mod carrying modernc.org/sqlite (pure Go, no cgo), so
the SQLite driver NEVER enters the executus core go.sum. A static-binary host
(gadfly) importing only the core stays static; a host wanting turnkey
persistence imports contrib/store.

- sqlite.go: store.Open(dsn) -> *DB (one SQLite file), accessor-per-seam.
- budget_store.go: db.Budget() satisfies budget.BudgetStorage; Add() does the
  7-day window rollover atomically inside a transaction (concurrent Adds can't
  race the read-modify-write — the in-memory store's one weak spot).
- Conformance test: budget.NewDBBudget over the SQLite store passes the SAME
  rolling-window contract as the in-memory store.
- CI: a new step builds + tests contrib/store on its own AND asserts it carries
  the sqlite driver the core forbids (proof the split works). Verified: core
  go.sum has 0 sqlite refs; contrib/store go.sum has it.

persona/skill/audit SQLite stores follow next (same JSON-blob + indexed-columns
pattern, sidestepping the three-layer field-loss footgun).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:44:44 -04:00
steve 41659b2412 P4: skill noun — domain + LEAN SkillStore + ToRunnable + Memory
The skill half of the persona/skill pair, as a clean redesign (not a faithful
lift of mort's 60-method skills.Storage kitchen sink):
- skill.go/skill_version.go/validate.go/inputs.go/schedule.go moved clean; the
  only host couplings severed: llms.IsTierName -> model.IsTierName, and the
  chatbot DefaultChatbotInputName const localized.
- store.go: a DELIBERATELY LEAN SkillStore — skill lifecycle (CRUD + visibility)
  + versioning + scheduling ONLY. The KV/file/quota sub-stores that were fused
  into mort's interface are the tools/ store seams; email/channel grants stay
  host concerns.
- runnable.go: Skill.ToRunnable() lowers a skill into run.RunnableAgent (flat
  tool list, no palette — composition is a host concern); DueAt() helper.
- memory.go: NewMemory() — zero-dep in-process SkillStore (visibility filters,
  newest-first versions).

Tests: ToRunnable mapping, visibility (public/shared/private) listing, version
ordering + lookup. No mort dependency (go.mod tidy clean); core imports ZERO
from skill.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:40:58 -04:00
steve f521c583bd P4: persona noun — Agent + ToRunnable bridge + Memory store
executus CI / test (pull_request) Failing after 58s
Adversarial Review (Gadfly) / review (pull_request) Successful in 20m1s
The headline P4 piece (clean redesign): the Agent persona noun, decoupled from
its Discord shell.
- agent.go/storage.go/builtin_loader.go moved from mort's pkg/logic/agents; the
  Storage seam drops the Discord CommandBindingStorage embedding (a host
  concern). The host-entangled files (commands, chatbot_provider, command-
  binding dispatcher, personalization, system) stay in mort.
- runnable.go: Agent.ToRunnable() lowers a persona into run.RunnableAgent — the
  bridge that lets run.Executor run a persona without importing this battery
  (the inversion of agentexec.Run(*agents.Agent)).
- memory.go: NewMemory() — zero-dep in-process persona Storage (all 11 CRUD +
  trigger-query methods).

Tests: ToRunnable field/phase mapping; Memory round-trip. CI invariant: core
imports ZERO from persona.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:24:18 -04:00
steve 3f14aae032 P4: budget battery — DBBudget (rolling 7-day) over run.Budget
Second Tier-2 battery, plugging into run.Ports.Budget:
- budget.go: skillexec's BudgetTracker / NoOpBudget / DBBudget moved clean
  (stdlib only). Check/Commit match run.Budget exactly (compile-time proof in
  run.go: NoOpBudget and *DBBudget are run.Budget).
- storage.go: the BudgetStorage seam + SkillBudget domain, split out of mort's
  GORM file (the GORM impl stays in mort).
- memory.go: NewMemory() — zero-dependency in-process BudgetStorage with the
  7-day rolling-window rollover in Add.

Tests: per-user cap enforced, window rolls over after 7 days, NoOp always
allows. CI invariant: core imports ZERO from the budget battery.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:17:51 -04:00
steve addf3a19d1 P4: audit battery — run.Audit Sink + Writer + queryable Memory store
First Tier-2 battery, plugging into run.Ports.Audit:
- storage.go/writer.go: skillaudit's Storage interface + per-run Writer moved
  clean (only utils->fmt); the Writer already matches run.RunRecorder's shape.
- sink.go: Sink adapts a Storage to run.Audit (StartRun -> a run row + a Writer
  wrapped as run.RunRecorder, converting run.RunStats on Close). NewSink(nil) is
  equivalent to no audit. Compile-time proofs: Sink is run.Audit, recorder is
  run.RunRecorder.
- memory.go: NewMemory() — a zero-dependency, queryable in-process Storage
  (retains runs + logs; all 17 read/filter/purge/walk methods) so a light host
  gets run history with no setup. Mort keeps its GORM Storage; contrib/store
  adds durable SQLite at P4.

End-to-end test: wire audit.NewSink(audit.NewMemory()) into the executor, run an
agent, and the run is recorded with terminal status/output and queryable by
caller. CI invariant verified: core imports ZERO from the audit battery (proper
battery direction; battery imports core, never the reverse).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 22:14:37 -04:00
steve b35514dfaa ci(gadfly): cloud-only fleet (3 models, drop local Macs)
executus CI / test (push) Successful in 57s
Measured on the P2 review: the local Macs (m1/m5) took 26–29 min with lens
timeouts and found ZERO real bugs, while the two cloud models found every
genuine finding in 6–12 min. Drop the Macs; add glm-5.2:cloud as a third
cloud reviewer. Net: faster (~29→~12 min) and higher signal.

Models: minimax-m3:cloud, deepseek-v4-flash:cloud, glm-5.2:cloud
(ollama-cloud=3 concurrency). timeout-minutes 90→30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve 7b3da87c08 fix: address verified gadfly P2 findings (9 real of 18)
Independently verified all 18 gadfly findings against the code (18-agent
fan-out). Fixed the 9 real ones; the other 9 were false-positive /
hallucinated / valid-tradeoff (no change).

High:
- F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the
  agent loop and nil-panicked. Now a clean error (Run never panics). +test.
- F9 compactor data-leak: renderTranscript sent tool-call args verbatim to
  the summarizer (a possibly-different provider/tier); secret-bearing tool
  args (mcp_call/email_send/http_*/webhook_*) are now redacted, with a doc
  note that result bodies still flow (summary needs them).

Medium/minor:
- F2 compactor error path returned the folded slice, not the original msgs
  (contradicting the documented non-fatal contract) -> return msgs.
- F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) /
  cancelled (Canceled) via statusFor. +test.
- F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs
  min(calls,results) only.
- F5 SetIteration was never called -> RunState.Iteration always 0; the step
  observer now updates it each loop.
- F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue).
- F7 estimateTokens had no default arm (future Part kinds counted as 0);
  unknown parts now counted conservatively.
- F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now
  a clear "response exceeded N bytes" via readCapped.
- F12 step observer captured the caller ctx; now the merged runCtx.
- F13 compaction onFire was nil (doc claimed it logged); now wired to
  audit LogEvent("compaction_fired").
- F11 (no pre-dispatch hook in majordomo) documented honestly as a known
  limitation; F18 UsageSink doc clarified cache tokens are subsets of input.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve dfbc5a42b9 P2: run.Executor — executus is runnable
The capstone of the run kernel: run.Executor.Run(ctx, RunnableAgent, inv)
ties model resolution + the tool registry + majordomo's agent loop +
context compaction + run-bounding + step/audit instrumentation into one
path, with every host concern behind the nil-safe run.Ports.

- run/executor.go: New(Config{Registry, Models, Defaults, Ports, Compactor,
  ContextTokens, SystemHeader}) + Run -> Result{RunID, Output, Steps, Usage,
  Err}. Budget gate (pre-run), model resolve, Audit StartRun/recorder
  (satisfies RunTally, stamped on inv.RunState), toolbox build, step observer
  (zips tool calls/results -> emitter + recorder.OnStep/OnTool), V10
  detached-MaxRuntime context with caller-cancel merged back, compaction wired
  from ContextTokens×ratio, audit Close + Budget Commit on a detached cleanup
  ctx. Zero Ports = a bounded in-memory run (gadfly's case).
- run/executor_test.go: hermetic end-to-end run against majordomo's fake
  provider (hello-world), Budget-rejection (no model call), Audit-port wiring
  (StartRun + Close with terminal status/output). All green under -race.
- examples/minimal upgraded to the real "hello, agentic world" (~15 lines:
  Configure tiers -> run.New -> Run -> print). README/CLAUDE.md updated.

Remaining P2 follow-ups (incremental): wire Critic/Checkpointer/PaletteSource/
Delivery into the loop, multi-phase Pipelines, and the no-tools direct path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve 130c2bdfab P2: move compactor -> compact/ + step instrumentation -> run/steps.go
- compact/compactor.go: the per-run stateful context compactor (token-threshold
  gate, fast-tier middle summarisation, fold memory) lifted from mort's
  skillexec/compactor.go. Self-contained; its only dependency is a ModelResolver
  func (model.ParseModelForContext satisfies it) + a token threshold.
- run/steps.go: the step-emission/instrumentation (stepEmitter, tool->kind/
  summary mapping with redaction, Result.Steps accumulation) from agentexec,
  repointed onto executus/tool.

Both build green. With the run-loop mechanics, RunnableAgent DTO, run.Ports,
compactor, and step instrumentation now all in place, the remaining P2 work is
the run.Executor itself (wiring these + majordomo's agent loop), which makes
executus runnable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve d9b44387f5 fix: address gadfly P1 review (3 low-risk findings)
Triaged gadfly's P1 review (advisory). Fixed the three clearly-correct,
low-risk items; the rest were pre-existing mort behavior or theoretical:

- model/call.go: recordUsage dropped fully-cached responses (input==0 &&
  output==0 early-return missed CacheRead/CacheWrite-only usage, which
  Anthropic/OpenAI prompt-caching bills). Guard now also checks cache tokens.
- llmmeta/helper.go: recordLedger swallowed Storage.RecordMetaCall errors;
  now logs them (slog.Warn) so a non-logging Storage impl can't silently drop
  audit rows.
- model/cloud_sync.go: the ollama.com limit-cache used unbounded io.ReadAll;
  wrapped both reads in io.LimitReader(1 MiB) so a misbehaving endpoint can't
  exhaust memory before the 15s timeout.

Noted-not-fixed (follow-ups / pre-existing mort semantics): tier_not_allowed
ledger label on resolution failure, unknown-model usage attribution, the
cloud_sync https scheme allowlist, and several theoretical/cosmetic items.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve e0e2c0451a ci: sync gadfly review config to mort's foreman-provider setup
Mirror mort's updated adversarial-review.yml: m1/m5 pulled in via the
GADFLY_ENDPOINT_M1/_M5 secrets using gadfly's "foreman" provider type
(providers m1/m5; models m1/qwen3:14b, m5/qwen3.6:35b-mlx), 2 cloud models,
3-lens suite, pinned to the gadfly :sha-6e3a83c image. Header adjusted for
executus; functional config identical to mort's tested version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve c732a677df P2: define nil-safe run.Ports (the inversion spine)
Add run/ports.go: the host seams the executor will consume, every one
nil-safe so a light host runs with the zero Ports (no persistence/audit/
budget/critic/delegation/delivery) and a heavy host wires each to a battery.

Ports mirror mort's existing interfaces so the batteries implement them
directly:
- Audit + RunRecorder (mort skillaudit.Storage/Writer): StartRun -> per-run
  recorder (OnStep/OnTool/LogEvent/Close), recorder satisfies RunTally.
- Budget (mort skillexec.BudgetTracker): Check / Commit.
- Critic + CriticHandle (mort agentcritic): Monitor -> handle with
  RecordStep/RecordToolStart/Steer/Deadline/Stop (the loop wiring finalizes
  with the executor merge).
- Checkpointer (mort agentexec.RunCheckpointer): Save/Complete/Fail.
- PaletteSource (mort SkillInvokerForPalette + AgentInvokerForPalette):
  Resolve/Invoke skill + agent delegation.
Plus host-neutral RunInfo / RunStats.

This completes the P2 inversion DESIGN; the agentexec+skillexec ->
run.Executor merge that consumes these Ports is the remaining P2 work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve fa644f1826 P2 (foundation): run-loop mechanics + RunnableAgent DTO
Stand up the executus/run kernel foundation, decoupled from mort:

- runengine.go: the shared run-loop scaffolding (MergeCancellation,
  CleanupContextTimeout, RunFinalizer/FireFinalizers, RunStateAccessor) moved
  from mort. The accessor's *skillaudit.Writer dependency is inverted to a
  narrow run.RunTally interface (TokenStats + ToolCallsCount) — the kernel
  reads live tallies without importing the audit battery.
- submit.go: the legacy submit-capture compat tool (stdlib + majordomo/llm).
- agent.go: RunnableAgent DTO — the kernel's view of "a thing to run" (tier,
  prompt, caps, palette, phases, critic config). The persona Agent and saved
  Skill will LOWER into this DTO so the kernel never imports a noun battery.
  This is the spine of the agentexec.Run(*agents.Agent) inversion.

run/ builds with only majordomo + executus/tool. The executor merge
(agentexec+skillexec -> run.Executor) and the nil-safe run.Ports
(Audit/Critic/Budget/Checkpointer/PaletteSource) are the next P2 block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 02:02:21 +00:00
steve b424261aca P1: model layer (convar->config inversion) + llmmeta
executus CI / test (pull_request) Successful in 58s
Adversarial Review (Gadfly) / review (pull_request) Successful in 26m27s
executus CI / test (push) Successful in 1m2s
Lifts mort's pkg/logic/llms into executus/model, decoupled from mort:

- tiers.go: the tier resolver now reads a host-supplied config.Source under
  "model.tier.<name>" with host-supplied fallbacks (Configure(cfg, defaults,
  ttl)), instead of convar.Manager. Tier NAMES + specs are host config; the
  resolution mechanism (cache, reasoning-suffix dialect, chain validation) is
  generic. No tier names hard-coded in the harness.
- sink.go: usage/trace recording inverted off mort's llmusage/llmtrace into
  UsageSink / TraceSink seams + a model-owned Span, with nil-safe context
  attribution helpers (WithModel/WithTraceID/WithUsageTool/WithUsageUser).
  Both sinks optional (nil = off) so a light host records nothing.
- lane decoration repointed to executus/lane; utils.Errorf -> fmt.Errorf.
- call.go keeps GenerateWith[T] (instrumented structured output) — this is the
  structured-output primitive; no separate structured/ package.
- llmmeta moved over model/ (the meta-LLM helper: tier allowlist + JSON retry
  + ledger). Its tests configure a minimal tier table via TestMain.

New tests cover the inversion: config overrides fallback, tier registration,
reasoning-suffix survival, nested-tier rejection, nil-sink no-ops.

Full module: go build/vet/test -race green; core go.sum still free of
gorm/redis/discordgo/sqlite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:47:13 -04:00
steve 741d7816ed ci: add gadfly adversarial review on PRs (mirrors mort)
executus CI / test (push) Successful in 35s
Same setup as mort: the published gadfly:v1 image as a specialist swarm
(m1/m5 local Macs + 2 cloud models, 3-lens suite), posting one consolidated
advisory comment. Must live on main so it triggers on PRs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:46:47 -04:00
steve dc28b63ad8 P1 (part 1): move skilltools core -> tool/ (clean, verbatim)
executus CI / test (push) Successful in 36s
The tool registry core (registry, permission model, Invocation, gated-tool
wrapper, ssrf guard, hmac, encryption, argcoerce, helpers, rootrun,
session_tools, webhook_rate_limit) had zero mort coupling — it imports only
majordomo/llm + x/crypto/hkdf — so it moves verbatim with a package rename
(skilltools -> tool). All same-package tests came along and pass; the SSRF,
gated-wrapper, encryption and output-pattern invariants are re-anchored here.

majordomo re-enters the module graph (now pinned to the latest, incl. the
front-loaded-output fix). model/ + llmmeta + structured follow next.

Docs: CLAUDE.md now requires README/examples to stay in sync with changes in
the same commit; CI skips docs/example-only pushes via paths-ignore.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:31:47 -04:00
steve d2c18ad5bb ci: make tidy-clean check robust to a missing go.sum
executus CI / test (push) Successful in 26s
P0 has no external deps, so go.sum doesn't exist yet and
`git diff --exit-code go.mod go.sum` errored (exit 128) on the missing
path. Use `git status --porcelain` so the check survives a not-yet-created
go.sum and still catches an untidy go.mod or a newly-created go.sum.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:20:47 -04:00
steve ca243a2d50 P0: stand up executus harness module above majordomo
executus CI / test (push) Failing after 24s
Batteries-included agent-harness base, extracted from mort's agent layer.
This first cut establishes the module + the zero-coupling core primitives:

- lane, dispatchguard, pendingattach, run/progress.go: moved verbatim from mort
- config: host config Source seam + env-var default (nil-safe helpers)
- deliver: output-egress seam + Discard/Stdout defaults
- identity: AdminPolicy + MemberResolver seams (nil-safe)
- fanout: programmatic N×M swarm (bounded global + per-key concurrency)
- README/CLAUDE.md with the vibe-coded banner; CI with Go gates +
  the "core stays majordomo+stdlib only" invariant

Core builds with stdlib only today; majordomo enters at P1 (model/structured).
go build/vet/test -race all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 19:18:37 -04:00
steve 25feb63c00 Initial commit 2026-06-26 23:04:54 +00:00