Security (all 3 models — HIGH): audit OnTool persisted raw tool args + results
verbatim for the very tools the OnStep narration-redaction flags as secret
(mcp_call/email_send/http_*) — the args/results are what CARRY the secret, so
they landed in skill_run_logs unredacted. Factored the predicate into
isSecretTool() (single source of truth) and OnTool now emits
args_redacted/result_redacted (+ lengths) for secret tools. Test asserts no
secret reaches the log. (persona) webhook_ip_allowlist entries are now
CIDR/IP-validated at load (malformed dropped + warned) instead of accepted raw.
Contract correctness (glm-5.2 + deepseek) — audit Memory now honors its
documented Storage contract: ListChildrenByParent/ListFinishedRunsBefore return
oldest-first; WalkParentChain returns root-first and honors MaxParentChainDepth;
ListRunsFiltered clamps limit (<=0 or >500 -> 50); ListFinishedRunsBefore with
limit<=0 returns none; an explicit RunFilter.Status (incl. "dry_run") matches
regardless of IncludeDryRun; LastRunBySkills counts only status=="ok" unless
includeFailed. (PurgeOlderThan's FinishedAt key is the SAFE behavior — in-flight
runs retained — so the doc was aligned to it, not the impl.)
Error-handling: appendLog now uses a bounded context (auditAppendTimeout=3s) so
a hung backend can't block the run goroutine on the hot path; Sink.StartRun
logs its (still best-effort) failure instead of swallowing it; budget Memory.Get
uses RLock (RWMutex); budget package doc fixed (was skillexec's); Check uses the
budgetWindow constant, not a duplicated literal.
Triaged false-positive: NewNoOpBudget returning BudgetTracker is assignable to
run.Budget (identical method sets) — no change needed.
Core go.sum still free of host/DB deps.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The headline P4 piece (clean redesign): the Agent persona noun, decoupled from
its Discord shell.
- agent.go/storage.go/builtin_loader.go moved from mort's pkg/logic/agents; the
Storage seam drops the Discord CommandBindingStorage embedding (a host
concern). The host-entangled files (commands, chatbot_provider, command-
binding dispatcher, personalization, system) stay in mort.
- runnable.go: Agent.ToRunnable() lowers a persona into run.RunnableAgent — the
bridge that lets run.Executor run a persona without importing this battery
(the inversion of agentexec.Run(*agents.Agent)).
- memory.go: NewMemory() — zero-dep in-process persona Storage (all 11 CRUD +
trigger-query methods).
Tests: ToRunnable field/phase mapping; Memory round-trip. CI invariant: core
imports ZERO from persona.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second Tier-2 battery, plugging into run.Ports.Budget:
- budget.go: skillexec's BudgetTracker / NoOpBudget / DBBudget moved clean
(stdlib only). Check/Commit match run.Budget exactly (compile-time proof in
run.go: NoOpBudget and *DBBudget are run.Budget).
- storage.go: the BudgetStorage seam + SkillBudget domain, split out of mort's
GORM file (the GORM impl stays in mort).
- memory.go: NewMemory() — zero-dependency in-process BudgetStorage with the
7-day rolling-window rollover in Add.
Tests: per-user cap enforced, window rolls over after 7 days, NoOp always
allows. CI invariant: core imports ZERO from the budget battery.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First Tier-2 battery, plugging into run.Ports.Audit:
- storage.go/writer.go: skillaudit's Storage interface + per-run Writer moved
clean (only utils->fmt); the Writer already matches run.RunRecorder's shape.
- sink.go: Sink adapts a Storage to run.Audit (StartRun -> a run row + a Writer
wrapped as run.RunRecorder, converting run.RunStats on Close). NewSink(nil) is
equivalent to no audit. Compile-time proofs: Sink is run.Audit, recorder is
run.RunRecorder.
- memory.go: NewMemory() — a zero-dependency, queryable in-process Storage
(retains runs + logs; all 17 read/filter/purge/walk methods) so a light host
gets run history with no setup. Mort keeps its GORM Storage; contrib/store
adds durable SQLite at P4.
End-to-end test: wire audit.NewSink(audit.NewMemory()) into the executor, run an
agent, and the run is recorded with terminal status/output and queryable by
caller. CI invariant verified: core imports ZERO from the audit battery (proper
battery direction; battery imports core, never the reverse).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All 3 cloud models converged on a real access-control bug; fixed it + the
other genuine findings (the false-positives were dropped):
Security (HIGH — all 3 models):
- create_file_url skipped ValidateScope: a same-skill caller could mint a
PUBLIC url for a file scoped to another user/run. Now runs ValidateScope
(admin-aware), skipped only for the descendant-grant case — mirroring the
read tools.
Other real fixes:
- ValidateScope hard-coded `false` at every call site (admin branch dead) ->
pass inv.CallerIsAdmin (the executor sets it via the host AdminPolicy; still
false/fail-closed when no admin). Stale "no admin flag" comment corrected.
- create_file_url: ExpiresInSeconds clamped BEFORE the *time.Second multiply
(huge values overflowed to a negative duration that slipped under the cap,
minting already-expired tokens); swallowed json.Marshal error now returned.
- RegisterMeta: build the default budget WITH the configured MaxPerRun (was
NewInMemorySearchBudget(nil) -> hardcoded 10, ignoring MetaDeps.MaxPerRun).
- classify: all-zero scores no longer return a false-positive top-1 winner;
coerceClassifyScore uses strconv.ParseFloat (rejects trailing garbage like
"50extra" that fmt.Sscanf silently accepted).
- file_delete: honor the descendant grant (parent can clean up a worker's
artifacts) — was the lone cross-skill-reject-outright file tool.
- meta tools: input caps truncate at a UTF-8 rune boundary (truncateUTF8), not
mid-rune.
- think: removed the dead `var _ = fmt.Errorf` import-keeper; file_save default
aligned to 16 MiB (matched RegisterStore).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
RegisterStore(reg, StoreDeps) registers the persistent-memory tools over the
host's KV and/or File backends:
- kv_get/set/list/delete (KVStorage seam)
- file_save/get/get_text/get_metadata/list/delete (FileStorage seam), plus
file_search (FileSearcher) and create_file_url (FileTokenMinter) when wired.
Near-zero-config: Quota defaults to a generous static cap (staticQuota), the
per-value/per-file caps default, and the kv vs file groups register
independently (a host can take just one). Seams moved clean (interface-only):
kv_storage.go, quota_provider.go, file_descendant_grant.go. The default
in-memory KV/File backends come with contrib/store at P4.
Core go.sum still free of gorm/redis/discordgo/sqlite.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Grow executus/tools into a real generic tool library:
- Register(reg): the always-available, zero-config tools — think, now (UTC
unless a CurrentTimeProvider is wired), cite (inert unless a CitationStorage
is wired). All nil-safe; a light host calls Register and is useful.
- RegisterMeta(reg, MetaDeps): the LLM-backed meta tools — classify,
extract_entities, summarize — over the llmmeta helper. Budget defaults to the
shipped in-memory per-run cap; Files optional; caps default.
- Seams moved (interface/type-only, no host coupling): research_providers.go
(CurrentTimeProvider/CitationStorage/SearchBudget/PageExtractor/PDFFetcher/…)
and file_storage.go (FileStorage + FileDomainMeta). Plus the in-memory budget
default (research_defaults.go) and scope_validate.go.
calculate deferred (drags github.com/Krognol/go-wolfram + a module-path replace
— not worth it in the lean core for one tool). Core go.sum still free of
gorm/redis/discordgo/sqlite/wolfram.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stand up executus/tools — the generic, host-agnostic tool library — and prove
the full pattern end to end:
- tools/tools.go: Register(reg) adds the always-available zero-dependency tools
(currently `think`). A light host calls it and is immediately useful; backed
tools (web/store/meta groups) will register via grouped registrars with
nil-safe Deps as they land.
- tools/think.go: the `think` tool moved from mort (imports only executus/tool).
- tools/integration_test.go: end-to-end proof that the executor runs an agent
which CALLS a registered tool — the fake model emits a `think` tool call, the
executor dispatches it through the registry, the model finalises, and the step
instrumentation captures the `think` step. Exercises the full tool-dispatch
loop through run.Executor.
Stacked on phase-2-run-kernel (P3 needs run.Executor). Remaining P3: the
meta/web/net/store/compose groups + their Deps + default backends (splitting
mort's default.go grab-bag).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bump the gadfly image to sha-d0de034 (adds GADFLY_PROVIDER_LENS_CONCURRENCY)
and move ollama-cloud's concurrency from the MODEL axis to the LENS axis:
- GADFLY_PROVIDER_CONCURRENCY: ollama-cloud=1 (one model at a time)
- GADFLY_PROVIDER_LENS_CONCURRENCY: ollama-cloud=3 (its 3 lenses concurrent)
Net: still 3 models, but reviewed serially — the first model's consolidated
comment lands sooner and each model finishes faster, while the other two
models' comments arrive in series after it (instead of all 3 in parallel).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Measured on the P2 review: the local Macs (m1/m5) took 26–29 min with lens
timeouts and found ZERO real bugs, while the two cloud models found every
genuine finding in 6–12 min. Drop the Macs; add glm-5.2:cloud as a third
cloud reviewer. Net: faster (~29→~12 min) and higher signal.
Models: minimax-m3:cloud, deepseek-v4-flash:cloud, glm-5.2:cloud
(ollama-cloud=3 concurrency). timeout-minutes 90→30.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Independently verified all 18 gadfly findings against the code (18-agent
fan-out). Fixed the 9 real ones; the other 9 were false-positive /
hallucinated / valid-tradeoff (no change).
High:
- F1 nil model: a Models resolver returning (ctx,nil,nil) flowed into the
agent loop and nil-panicked. Now a clean error (Run never panics). +test.
- F9 compactor data-leak: renderTranscript sent tool-call args verbatim to
the summarizer (a possibly-different provider/tier); secret-bearing tool
args (mcp_call/email_send/http_*/webhook_*) are now redacted, with a doc
note that result bodies still flow (summary needs them).
Medium/minor:
- F2 compactor error path returned the folded slice, not the original msgs
(contradicting the documented non-fatal contract) -> return msgs.
- F3 RunStats.Status only ok/error; now timeout (DeadlineExceeded) /
cancelled (Canceled) via statusFor. +test.
- F4 step-zip emitted empty-name "ghost" steps when results>calls; now pairs
min(calls,results) only.
- F5 SetIteration was never called -> RunState.Iteration always 0; the step
observer now updates it each loop.
- F6 matchPending fallback was LIFO; now FIFO (matches the per-key queue).
- F7 estimateTokens had no default arm (future Part kinds counted as 0);
unknown parts now counted conservatively.
- F8 cloud_sync silently truncated >1MiB responses -> opaque JSON error; now
a clear "response exceeded N bytes" via readCapped.
- F12 step observer captured the caller ctx; now the merged runCtx.
- F13 compaction onFire was nil (doc claimed it logged); now wired to
audit LogEvent("compaction_fired").
- F11 (no pre-dispatch hook in majordomo) documented honestly as a known
limitation; F18 UsageSink doc clarified cache tokens are subsets of input.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The capstone of the run kernel: run.Executor.Run(ctx, RunnableAgent, inv)
ties model resolution + the tool registry + majordomo's agent loop +
context compaction + run-bounding + step/audit instrumentation into one
path, with every host concern behind the nil-safe run.Ports.
- run/executor.go: New(Config{Registry, Models, Defaults, Ports, Compactor,
ContextTokens, SystemHeader}) + Run -> Result{RunID, Output, Steps, Usage,
Err}. Budget gate (pre-run), model resolve, Audit StartRun/recorder
(satisfies RunTally, stamped on inv.RunState), toolbox build, step observer
(zips tool calls/results -> emitter + recorder.OnStep/OnTool), V10
detached-MaxRuntime context with caller-cancel merged back, compaction wired
from ContextTokens×ratio, audit Close + Budget Commit on a detached cleanup
ctx. Zero Ports = a bounded in-memory run (gadfly's case).
- run/executor_test.go: hermetic end-to-end run against majordomo's fake
provider (hello-world), Budget-rejection (no model call), Audit-port wiring
(StartRun + Close with terminal status/output). All green under -race.
- examples/minimal upgraded to the real "hello, agentic world" (~15 lines:
Configure tiers -> run.New -> Run -> print). README/CLAUDE.md updated.
Remaining P2 follow-ups (incremental): wire Critic/Checkpointer/PaletteSource/
Delivery into the loop, multi-phase Pipelines, and the no-tools direct path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- compact/compactor.go: the per-run stateful context compactor (token-threshold
gate, fast-tier middle summarisation, fold memory) lifted from mort's
skillexec/compactor.go. Self-contained; its only dependency is a ModelResolver
func (model.ParseModelForContext satisfies it) + a token threshold.
- run/steps.go: the step-emission/instrumentation (stepEmitter, tool->kind/
summary mapping with redaction, Result.Steps accumulation) from agentexec,
repointed onto executus/tool.
Both build green. With the run-loop mechanics, RunnableAgent DTO, run.Ports,
compactor, and step instrumentation now all in place, the remaining P2 work is
the run.Executor itself (wiring these + majordomo's agent loop), which makes
executus runnable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Triaged gadfly's P1 review (advisory). Fixed the three clearly-correct,
low-risk items; the rest were pre-existing mort behavior or theoretical:
- model/call.go: recordUsage dropped fully-cached responses (input==0 &&
output==0 early-return missed CacheRead/CacheWrite-only usage, which
Anthropic/OpenAI prompt-caching bills). Guard now also checks cache tokens.
- llmmeta/helper.go: recordLedger swallowed Storage.RecordMetaCall errors;
now logs them (slog.Warn) so a non-logging Storage impl can't silently drop
audit rows.
- model/cloud_sync.go: the ollama.com limit-cache used unbounded io.ReadAll;
wrapped both reads in io.LimitReader(1 MiB) so a misbehaving endpoint can't
exhaust memory before the 15s timeout.
Noted-not-fixed (follow-ups / pre-existing mort semantics): tier_not_allowed
ledger label on resolution failure, unknown-model usage attribution, the
cloud_sync https scheme allowlist, and several theoretical/cosmetic items.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror mort's updated adversarial-review.yml: m1/m5 pulled in via the
GADFLY_ENDPOINT_M1/_M5 secrets using gadfly's "foreman" provider type
(providers m1/m5; models m1/qwen3:14b, m5/qwen3.6:35b-mlx), 2 cloud models,
3-lens suite, pinned to the gadfly :sha-6e3a83c image. Header adjusted for
executus; functional config identical to mort's tested version.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add run/ports.go: the host seams the executor will consume, every one
nil-safe so a light host runs with the zero Ports (no persistence/audit/
budget/critic/delegation/delivery) and a heavy host wires each to a battery.
Ports mirror mort's existing interfaces so the batteries implement them
directly:
- Audit + RunRecorder (mort skillaudit.Storage/Writer): StartRun -> per-run
recorder (OnStep/OnTool/LogEvent/Close), recorder satisfies RunTally.
- Budget (mort skillexec.BudgetTracker): Check / Commit.
- Critic + CriticHandle (mort agentcritic): Monitor -> handle with
RecordStep/RecordToolStart/Steer/Deadline/Stop (the loop wiring finalizes
with the executor merge).
- Checkpointer (mort agentexec.RunCheckpointer): Save/Complete/Fail.
- PaletteSource (mort SkillInvokerForPalette + AgentInvokerForPalette):
Resolve/Invoke skill + agent delegation.
Plus host-neutral RunInfo / RunStats.
This completes the P2 inversion DESIGN; the agentexec+skillexec ->
run.Executor merge that consumes these Ports is the remaining P2 work.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stand up the executus/run kernel foundation, decoupled from mort:
- runengine.go: the shared run-loop scaffolding (MergeCancellation,
CleanupContextTimeout, RunFinalizer/FireFinalizers, RunStateAccessor) moved
from mort. The accessor's *skillaudit.Writer dependency is inverted to a
narrow run.RunTally interface (TokenStats + ToolCallsCount) — the kernel
reads live tallies without importing the audit battery.
- submit.go: the legacy submit-capture compat tool (stdlib + majordomo/llm).
- agent.go: RunnableAgent DTO — the kernel's view of "a thing to run" (tier,
prompt, caps, palette, phases, critic config). The persona Agent and saved
Skill will LOWER into this DTO so the kernel never imports a noun battery.
This is the spine of the agentexec.Run(*agents.Agent) inversion.
run/ builds with only majordomo + executus/tool. The executor merge
(agentexec+skillexec -> run.Executor) and the nil-safe run.Ports
(Audit/Critic/Budget/Checkpointer/PaletteSource) are the next P2 block.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lifts mort's pkg/logic/llms into executus/model, decoupled from mort:
- tiers.go: the tier resolver now reads a host-supplied config.Source under
"model.tier.<name>" with host-supplied fallbacks (Configure(cfg, defaults,
ttl)), instead of convar.Manager. Tier NAMES + specs are host config; the
resolution mechanism (cache, reasoning-suffix dialect, chain validation) is
generic. No tier names hard-coded in the harness.
- sink.go: usage/trace recording inverted off mort's llmusage/llmtrace into
UsageSink / TraceSink seams + a model-owned Span, with nil-safe context
attribution helpers (WithModel/WithTraceID/WithUsageTool/WithUsageUser).
Both sinks optional (nil = off) so a light host records nothing.
- lane decoration repointed to executus/lane; utils.Errorf -> fmt.Errorf.
- call.go keeps GenerateWith[T] (instrumented structured output) — this is the
structured-output primitive; no separate structured/ package.
- llmmeta moved over model/ (the meta-LLM helper: tier allowlist + JSON retry
+ ledger). Its tests configure a minimal tier table via TestMain.
New tests cover the inversion: config overrides fallback, tier registration,
reasoning-suffix survival, nested-tier rejection, nil-sink no-ops.
Full module: go build/vet/test -race green; core go.sum still free of
gorm/redis/discordgo/sqlite.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same setup as mort: the published gadfly:v1 image as a specialist swarm
(m1/m5 local Macs + 2 cloud models, 3-lens suite), posting one consolidated
advisory comment. Must live on main so it triggers on PRs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The tool registry core (registry, permission model, Invocation, gated-tool
wrapper, ssrf guard, hmac, encryption, argcoerce, helpers, rootrun,
session_tools, webhook_rate_limit) had zero mort coupling — it imports only
majordomo/llm + x/crypto/hkdf — so it moves verbatim with a package rename
(skilltools -> tool). All same-package tests came along and pass; the SSRF,
gated-wrapper, encryption and output-pattern invariants are re-anchored here.
majordomo re-enters the module graph (now pinned to the latest, incl. the
front-loaded-output fix). model/ + llmmeta + structured follow next.
Docs: CLAUDE.md now requires README/examples to stay in sync with changes in
the same commit; CI skips docs/example-only pushes via paths-ignore.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
P0 has no external deps, so go.sum doesn't exist yet and
`git diff --exit-code go.mod go.sum` errored (exit 128) on the missing
path. Use `git status --porcelain` so the check survives a not-yet-created
go.sum and still catches an untidy go.mod or a newly-created go.sum.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Batteries-included agent-harness base, extracted from mort's agent layer.
This first cut establishes the module + the zero-coupling core primitives:
- lane, dispatchguard, pendingattach, run/progress.go: moved verbatim from mort
- config: host config Source seam + env-var default (nil-safe helpers)
- deliver: output-egress seam + Discard/Stdout defaults
- identity: AdminPolicy + MemberResolver seams (nil-safe)
- fanout: programmatic N×M swarm (bounded global + per-key concurrency)
- README/CLAUDE.md with the vibe-coded banner; CI with Go gates +
the "core stays majordomo+stdlib only" invariant
Core builds with stdlib only today; majordomo enters at P1 (model/structured).
go build/vet/test -race all green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>