majordomo

Author	SHA1	Message	Date
Steve Dudenhoeffer	51f5ea0d2b	ci: pin gadfly reusable to immutable @7bc3c98 (vars-config reusable) [skip ci] The reusable now reads swarm config from user-scope vars (GADFLY_DEFAULT_* + GADFLY_ENDPOINT_*); this immutable @sha bumps past the long-lived-runner ref cache so the vars-config reusable is adopted. Direct to main + [skip ci] to avoid triggering the review swarm. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 02:05:29 -04:00
steve	a457e76ac7	ci: track gadfly's v1 release tag instead of a pinned sha (#7)	2026-06-28 04:08:34 +00:00
steve	78a1d1c3bb	ci: switch gadfly review to the reusable workflow (curated swarm, 5 lenses) (#6)	2026-06-28 02:48:28 +00:00
steve	aa25b2c334	Merge pull request 'feat(llamaswap): add llama-swaps (TLS) DSN scheme' (#4) from feat/llama-swaps-tls into main	2026-06-27 22:56:59 +00:00
steve	2b35f1741c	Merge pull request 'ci(gadfly): trim the weakest reviewers from the swarm' (#5) from ci/trim-gadfly-reviewers into main	2026-06-27 22:56:57 +00:00
steve	98a2164aba	ci(gadfly): trim the weakest reviewers from the swarm Drop the four lowest-graded reviewers — m5/qwen3.6:35b-mlx, gemma4:cloud, gpt-oss:120b-cloud, kimi-k2.7-code:cloud. Removing m5/qwen3.6 takes the last local Mac out, so this is now a cloud-only fleet of 6 ollama-cloud models; GADFLY_ENDPOINT_M5 and the m5 concurrency entry are gone and the per-job timeout drops to 45m. README/CLAUDE.md kept in sync. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 18:07:27 -04:00
steve	de2b2f0f28	feat(llamaswap): add llama-swaps (TLS) DSN scheme llama-swap was http-only by DSN, pushing TLS-fronted instances onto the openai:// scheme (which loses the management/image methods). Add a "llama-swaps" scheme that builds an https base URL, alongside "llama-swap" (http, local-first) — mirroring redis/rediss. Both share one factory; llama-swaps is scheme-only (no default built-in). The choice stays explicit because a DSN has no reliable http-vs-https signal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 17:58:59 -04:00
steve	b2487a1a37	Merge pull request 'feat(llamaswap): llama-swap provider + canonical imagegen interface' (#3) from feat/llama-swap-provider into main	2026-06-27 20:14:01 +00:00
steve	64642c43c4	fix(llamaswap): address Gadfly review findings - Unload: reject model ids containing path separators (/?#) so a model name can't redirect the request to another endpoint; ":" (common in ids) stays verbatim. - doJSON: take a model arg so image/management HTTP errors carry the target id (was always ""); add a base-URL guard so management methods fail clearly instead of building a bare-path request; cap the success-path JSON decode with io.LimitReader (64 MiB) and drain the body when out is nil for conn reuse. - image: reject negative Request.N before sending. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 16:04:23 -04:00
steve	3ba2dbefae	Merge remote-tracking branch 'origin/main' into feat/llama-swap-provider	2026-06-27 15:13:07 -04:00
steve	38b4e1a028	Merge pull request 'ci: add Gadfly adversarial PR reviewer + document the review loop' (#2) from ci/gadfly-adversarial-review into main	2026-06-27 19:10:53 +00:00
steve	96c612e707	feat(llamaswap): add llama-swap provider + canonical imagegen interface Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and a timeout-free client so cold model swaps rely on context deadlines. The "tailored" surface is concrete management methods (ListModels / Running / Unload) that don't belong on the canonical llm.Provider interface. The llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL built-in errors clearly on use, mirroring foreman. Add imagegen, a new canonical text-to-image interface separate from llm (Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed straight back into chat). First backend is llama-swap via OpenAI /v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is txt2img only. Hermetic httptest coverage for chat delegation, management endpoints, image decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix + image-gen section, CLAUDE.md package map, and progress.md updated in the same commit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 15:01:54 -04:00
steve	43eb155759	ci(gadfly): drop the M1 Mac from the review swarm M1 was consistently slow (26-29 min) for zero real findings, so pull it before this workflow ever fires. Leaves the 9 ollama-cloud models + the M5 Mac; removes GADFLY_ENDPOINT_M1 and the m1 concurrency entry. Mirrors the same change on executus. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:52:11 -04:00
steve	8dae9cc941	docs: document the Gadfly adversarial review loop in CLAUDE.md Records the PR workflow: push work to a PR (never straight to main), wait for Gadfly to finish and weigh its findings, then grade each finding back to the gadfly-reports MCP (record_finding_grade / list_findings / scoreboard) so the telemetry can measure whether each model earns its keep. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:32:25 -04:00
steve	a5adc6f4d1	ci: add Gadfly adversarial PR reviewer workflow Installs the standalone Gadfly agentic adversarial reviewer (advisory, never blocks merge), mirroring executus's setup on the latest pinned image (sha-d7f364d). Reviews majordomo PRs with the full fleet: 9 ollama-cloud models plus the M1/M5 Macs via foreman, each running the 3-lens suite (security, correctness, error-handling). Posts one consolidated comment per model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 14:31:48 -04:00
steve	1fd7109a42	fix(agent): recover front-loaded answer when terminal turn is degenerate The agent loop took the final answer only from the terminal (no-tool-call) turn. Models that "front-load" their answer into an earlier turn that also calls a tool — then close with a trivial pointer like "(Already answered above.)" — had their real answer discarded and the pointer delivered. This recurs across several open-weight models (glm-5.2, etc.); well-behaved models (Claude/GPT) defer their answer to the terminal turn and are unaffected. finalOutput() now falls back to the last substantive assistant content in the transcript when the terminal text is weak (empty, or a short back-reference). The predicate is narrow and back-reference-gated so short-but-correct answers ("42", "It's down, restarting now.") are never overridden; recovery only picks a prior turn that reads like a real answer, not a preamble. Zero extra model calls. Terminal-answer behavior for normal runs is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 18:37:38 -04:00
steve	74474c6da0	feat(chain): fail over on empty/degenerate responses A failover chain previously treated a successful-but-empty completion (no content parts and no tool calls — a "stop with nothing") as a valid result and returned it. The agent loop then ended the run with empty output, and the configured backup models were never tried because no error was raised. This let a single flaky model silently terminate an agent/skill run with no answer (observed in the wild with ollama-cloud/glm-5.2 returning empty completions right after a large tool/think turn). - Add llm.ErrEmptyResponse (classified transient) and Response.IsEmpty(): true only when there are no tool calls and no meaningful content (no parts, or whitespace-only text). A media/image part counts as content, so image-only responses are NOT empty. - chain.Generate converts an empty completion into ErrEmptyResponse so the chain fails over to the next target. Unlike an ordinary transient it is NOT retried on the same target (the model just produced it; these calls are expensive) — the chain penalizes health (so a persistently-empty target benches) and advances immediately. - When every target returns empty the call fails with ErrChainExhausted joined to ErrEmptyResponse — a visible error instead of a hollow success. Single-element chains therefore also surface empties as errors. Stream path is unchanged (can't inspect content before the consumer reads it). Tests: Response.IsEmpty table; chain fails over past an empty head; all-empty chain returns ErrChainExhausted/ErrEmptyResponse; repeated empties bench the target across requests. Full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 10:35:07 -04:00
Steve Dudenhoeffer	3e81fbd540	docs: public-readiness — vibe-coded disclosure + genericize internal hosts - README + CLAUDE.md: upfront "this is a vibe-coded project" disclosure for going public. - Replace internal LAN hostnames (*.orgrimmar.dudenhoeffer.casa) with example.com across README, ADR-0004, the envproviders example, and env_test.go (assertions updated together; suite still green). Token was already a "change-me" placeholder, not a real secret. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 19:25:58 -04:00
steve	1029feb0c7	docs: record Phase 9 completion — mort conversion PR open Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:08:53 +02:00
steve	7c760005f5	docs: README coverage for resolvers, DefineTool, agent hooks, ops controls Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:34:29 +02:00
steve	0147a79d18	feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls Phase 9a (ADR-0014): Registry.RegisterResolver for dynamic tiers; DefineTool[Args] typed tools; Usage cache/reasoning detail fields wired through anthropic/openai/google; WithPromptCaching (Anthropic cache_control); agent supervision hooks (WithMaxStepsFunc, WithSteer, WithCompactor, WithToolErrorLimits + ErrToolLoop); health Bench/Unbench/Snapshot; ChainConfig.Observer failover events. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:30:06 +02:00
steve	04b21fdad2	feat: live-validated against Ollama Cloud; schema instruction fallback for cloud Phase 8: all six live checks pass (tier aliases, thinking-tier chat, real tool invocation, structured Generate[T], forced failover with bench+skip, skill agent). Discovery: ollama.com ignores the format field — the provider now also states the schema as a system instruction (constrained decoding locally, instruction-guided JSON on cloud), with hermetic test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:22:54 +02:00
steve	97513141dc	docs: examples for every hard requirement + mort migration blueprint Phase 7: nine runnable examples/ programs (parse, failover chains with trailing alias, tiers, LLM_* env providers, multimodal, tool loop, Generate[T], agent, skills); docs/mort-migration.md mapping mort's go-llm/go-agentkit usage onto majordomo APIs with the planned additive library extensions and conversion order; README finalized with the complete matrix. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:17:20 +02:00
steve	76ecf0e49e	feat: skills — additive instruction+tool bundles, clock + calc examples Phase 6: skill.New constructor satisfying the agent.Skill contract; instruction-only skills; ordered additive composition; skill/clock (injectable-clock time tools) and skill/calc (recursive-descent arithmetic evaluator) as ready-made examples with full test suites incl. an agent-loop round trip. ADR-0013; README skills section + matrix synced. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:13:07 +02:00
steve	7dab4112ff	feat: agent run loop, Generate[T], reflect-derived schemas Phase 5: - agent/: model + system prompt + toolboxes composition; bounded tool-dispatch loop (default 10 steps); panic-proof tool execution; unknown-tool and duplicate-name handling; history continuation; step observers; partial results on ErrMaxSteps/errors (ADR-0012) - llm.SchemaFor[T]: strict-compatible JSON schemas from Go types (nullable pointers, description/enum tags, recursion rejected) - majordomo.Generate[T]: typed structured output with fence-stripping decode and model-naming errors - README agents/structured-output sections + matrix synced Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:10:18 +02:00
steve	1ca607906d	feat: Google (Gemini) provider on the official Gen AI SDK Phase 4: provider/google on google.golang.org/genai v1.59.0 — lazy cached client, FunctionResponse tool loop, raw-JSON-schema tools and structured output, ThinkingLevel reasoning mapping, iter.Pull2 streaming, hermetic httptest suite via HTTPOptions.BaseURL. Registry wires google + gemini schemes to the real client; stub machinery deleted (all built-ins real). ADR-0011; README matrix + CLAUDE.md synced. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:04:28 +02:00
steve	043249e0e1	feat: OpenAI, Anthropic, and native-Ollama providers + media pipeline Phase 3: - provider/openai: Chat Completions for OpenAI + compat endpoints (SSE streaming with by-index tool-call assembly, response_format json_schema, legacy max_tokens option, reasoning_effort) - provider/anthropic: Messages API (tool_use/tool_result, GA structured output via output_config.format, full SSE event parser, 529 transient) - provider/ollama: one native /api/chat client behind the ollama, ollama-cloud, and foreman built-ins (presets; NDJSON streaming tolerant of foreman's buffered single-object responses; object tool arguments; format-schema structured output; think mapping) - media/: capability normalization (sniff, downscale, transcode, byte ladder, ErrUnsupported), wired into the chain executor per target with penalty-free advance past incapable elements - registry: real provider + scheme wiring, WithHTTPClient option, required env-foreman TLS chat round-trip test - ADR-0009 multimodal strategy, ADR-0010 tools/structured mapping; README matrix + CLAUDE.md synced Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 12:58:08 +02:00
steve	323558ed72	feat(llm): ReasoningEffort request option and ErrUnsupported sentinel Groundwork for the provider phase: reasoning levels map to native knobs (OpenAI reasoning_effort, Ollama think); ErrUnsupported marks declared capability mismatches that chains advance past without health penalty. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 12:40:47 +02:00
steve	0d0e8e069e	test: deterministic failover matrix — cooldown re-admission, alias chains, policies Phase 2: proves ADR-0006/0008 semantics end to end with the fake provider and fake clock (cooldown expiry, backoff growth, inline-alias failover, permanent-error policies, retry budgets, bench-mid-request, exhaustion reporting, custom classifier, chain-of-one parity). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 12:37:32 +02:00
steve	dcd004289f	feat: foundations — canonical types, Parse grammar, env DSNs, health, chains Phase 1 of the majordomo build: - llm/ canonical contract (messages, parts, tools, capabilities, streaming, Model/Provider, error classification) - health/ clock-injected tracker (threshold bench, exponential capped cooldown, reset-on-success) - root Registry + Parse (verbatim model ids, inline recursive alias expansion with cycle detection, chain dedup), LLM_* env-DSN providers (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain executor behind the Model interface - provider/fake scriptable test provider; hermetic test suite incl. the trailing-thinking chain and foreman:// env loading - ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow, docs/phase-1-design.md Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 12:35:34 +02:00
steve	3025044817	Initial commit	2026-06-10 09:21:08 +00:00

31 Commits
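
Commit 74474c6da0 describes an emptiness predicate for completions: no tool calls, and no content other than whitespace-only text, with media parts counting as content. The sketch below illustrates only that rule. The `Part` and `Response` types and their fields are hypothetical stand-ins invented for this example; they are not majordomo's actual `llm` types, and the real `Response.IsEmpty()` may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Part is a hypothetical content part: text, or opaque media bytes.
// It is an assumption for illustration, not the library's type.
type Part struct {
	Text  string
	Media []byte
}

// Response is a hypothetical stand-in for a completion result.
type Response struct {
	Parts     []Part
	ToolCalls []string
}

// IsEmpty reports whether the response has no tool calls and no
// meaningful content. Whitespace-only text does not count as content;
// any media part does, so image-only responses are not empty.
func (r Response) IsEmpty() bool {
	if len(r.ToolCalls) > 0 {
		return false
	}
	for _, p := range r.Parts {
		if len(p.Media) > 0 || strings.TrimSpace(p.Text) != "" {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(Response{}.IsEmpty())                                      // true
	fmt.Println(Response{Parts: []Part{{Text: " \n"}}}.IsEmpty())          // true
	fmt.Println(Response{Parts: []Part{{Media: []byte{0x89}}}}.IsEmpty())  // false
	fmt.Println(Response{ToolCalls: []string{"lookup"}}.IsEmpty())         // false
}
```

Under this reading, a caller such as a failover chain could treat `IsEmpty()` returning true as an error condition and advance to the next target instead of returning a hollow success.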