Commit Graph

31 Commits

Author SHA1 Message Date
Steve Dudenhoeffer 51f5ea0d2b ci: pin gadfly reusable to immutable @7bc3c98 (vars-config reusable) [skip ci]
The reusable now reads swarm config from user-scope vars (GADFLY_DEFAULT_* +
GADFLY_ENDPOINT_*). Pinning to this immutable @sha gets past the ref cache on
the long-lived runner, so the vars-config reusable is actually adopted. Pushed
directly to main with [skip ci] to avoid triggering the review swarm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 02:05:29 -04:00
steve a457e76ac7 ci: track gadfly's v1 release tag instead of a pinned sha (#7)
CI / Tidy (push) Successful in 9m25s
CI / Build & Test (push) Successful in 9m45s
2026-06-28 04:08:34 +00:00
steve 78a1d1c3bb ci: switch gadfly review to the reusable workflow (curated swarm, 5 lenses) (#6)
CI / Tidy (push) Successful in 9m25s
CI / Build & Test (push) Successful in 10m13s
2026-06-28 02:48:28 +00:00
steve aa25b2c334 Merge pull request 'feat(llamaswap): add llama-swaps (TLS) DSN scheme' (#4) from feat/llama-swaps-tls into main
CI / Tidy (push) Successful in 9m23s
CI / Build & Test (push) Successful in 10m13s
2026-06-27 22:56:59 +00:00
steve 2b35f1741c Merge pull request 'ci(gadfly): trim the weakest reviewers from the swarm' (#5) from ci/trim-gadfly-reviewers into main
CI / Tidy (push) Successful in 9m25s
CI / Build & Test (push) Successful in 10m1s
2026-06-27 22:56:57 +00:00
steve 98a2164aba ci(gadfly): trim the weakest reviewers from the swarm
Adversarial Review (Gadfly) / review (pull_request) Successful in 5m27s
CI / Tidy (pull_request) Successful in 9m31s
CI / Build & Test (pull_request) Successful in 9m48s
Drop the four lowest-graded reviewers — m5/qwen3.6:35b-mlx, gemma4:cloud,
gpt-oss:120b-cloud, kimi-k2.7-code:cloud. Removing m5/qwen3.6 takes the last
local Mac out, so this is now a cloud-only fleet of 6 ollama-cloud models;
GADFLY_ENDPOINT_M5 and the m5 concurrency entry are gone and the per-job timeout
drops to 45m. README/CLAUDE.md kept in sync.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 18:07:27 -04:00
steve de2b2f0f28 feat(llamaswap): add llama-swaps (TLS) DSN scheme
CI / Tidy (pull_request) Successful in 9m43s
CI / Build & Test (pull_request) Successful in 10m26s
Adversarial Review (Gadfly) / review (pull_request) Successful in 11m47s
llama-swap was http-only by DSN, pushing TLS-fronted instances onto the openai://
scheme (which loses the management/image methods). Add a "llama-swaps" scheme
that builds an https base URL, alongside "llama-swap" (http, local-first) —
mirroring redis/rediss. Both share one factory; llama-swaps is scheme-only (no
default built-in). The choice stays explicit because a DSN has no reliable
http-vs-https signal.
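
A minimal sketch of the scheme split, assuming DSNs shaped like llama-swaps://host:port[/path]. `baseURLFromDSN` is a hypothetical helper, not majordomo's shared factory. It only shows how the scheme alone selects http or https, the same way redis:// and rediss:// differ only in TLS.

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURLFromDSN is a hypothetical illustration: "llama-swap" yields an
// http base URL and "llama-swaps" an https one. Nothing else in the DSN
// is consulted, because nothing else reliably signals TLS.
func baseURLFromDSN(dsn string) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", fmt.Errorf("parse DSN: %w", err)
	}
	var scheme string
	switch u.Scheme {
	case "llama-swap":
		scheme = "http"
	case "llama-swaps":
		scheme = "https"
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return "", fmt.Errorf("%s DSN has no host", u.Scheme)
	}
	// Keep host:port and any path prefix; drop DSN-only parts such as query params.
	base := url.URL{Scheme: scheme, Host: u.Host, Path: u.Path}
	return base.String(), nil
}
```

Because the scheme is the only signal, a TLS-fronted instance written as llama-swap:// still gets an http URL; choosing llama-swaps:// is the explicit opt-in.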

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 17:58:59 -04:00
steve b2487a1a37 Merge pull request 'feat(llamaswap): llama-swap provider + canonical imagegen interface' (#3) from feat/llama-swap-provider into main
CI / Tidy (push) Successful in 9m24s
CI / Build & Test (push) Successful in 10m11s
2026-06-27 20:14:01 +00:00
steve 64642c43c4 fix(llamaswap): address Gadfly review findings
CI / Tidy (pull_request) Successful in 9m25s
CI / Build & Test (pull_request) Successful in 10m15s
- Unload: reject model ids containing path separators (/?#) so a model name
  can't redirect the request to another endpoint; ":" (common in ids) stays
  verbatim.
- doJSON: take a model arg so image/management HTTP errors carry the target id
  (was always ""); add a base-URL guard so management methods fail clearly
  instead of building a bare-path request; cap the success-path JSON decode with
  io.LimitReader (64 MiB) and drain the body when out is nil for conn reuse.
- image: reject negative Request.N before sending.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 16:04:23 -04:00
steve 3ba2dbefae Merge remote-tracking branch 'origin/main' into feat/llama-swap-provider
CI / Build & Test (pull_request) Successful in 10m15s
CI / Tidy (pull_request) Successful in 10m20s
Adversarial Review (Gadfly) / review (pull_request) Successful in 18m24s
2026-06-27 15:13:07 -04:00
steve 38b4e1a028 Merge pull request 'ci: add Gadfly adversarial PR reviewer + document the review loop' (#2) from ci/gadfly-adversarial-review into main
CI / Tidy (push) Successful in 9m23s
CI / Build & Test (push) Successful in 10m16s
2026-06-27 19:10:53 +00:00
steve 96c612e707 feat(llamaswap): add llama-swap provider + canonical imagegen interface
CI / Tidy (pull_request) Successful in 9m25s
CI / Build & Test (pull_request) Successful in 10m15s
Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping
proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to
provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with
legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and
a timeout-free client so cold model swaps rely on context deadlines. The
"tailored" surface is concrete management methods (ListModels / Running /
Unload) that don't belong on the canonical llm.Provider interface. The
llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL
built-in errors clearly on use, mirroring foreman.

Add imagegen, a new canonical text-to-image interface separate from llm
(Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed
straight back into chat). First backend is llama-swap via OpenAI
/v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is
txt2img only.
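
A sketch of the bytes-only decode step, assuming the OpenAI-style /v1/images/generations response shape returned with b64_json, i.e. `{"data":[{"b64_json":"..."}]}`. `imagesResponse` and `decodeImages` are hypothetical names; wrapping the bytes in an llm.ImagePart is left out because that type's fields are not shown here.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// imagesResponse mirrors the part of an OpenAI-style images response
// that a bytes-only client needs.
type imagesResponse struct {
	Data []struct {
		B64JSON string `json:"b64_json"`
	} `json:"data"`
}

// decodeImages (hypothetical) returns the raw bytes of each generated image.
func decodeImages(body []byte) ([][]byte, error) {
	var resp imagesResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode images response: %w", err)
	}
	out := make([][]byte, 0, len(resp.Data))
	for i, d := range resp.Data {
		b, err := base64.StdEncoding.DecodeString(d.B64JSON)
		if err != nil {
			return nil, fmt.Errorf("image %d: %w", i, err)
		}
		out = append(out, b)
	}
	return out, nil
}
```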

Hermetic httptest coverage for chat delegation, management endpoints, image
decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix +
image-gen section, CLAUDE.md package map, and progress.md updated in the same
commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 15:01:54 -04:00
steve 43eb155759 ci(gadfly): drop the M1 Mac from the review swarm
CI / Build & Test (pull_request) Successful in 10m33s
CI / Tidy (pull_request) Successful in 9m26s
M1 was consistently slow (26-29 min) for zero real findings, so pull it before
this workflow ever fires. Leaves the 9 ollama-cloud models + the M5 Mac;
removes GADFLY_ENDPOINT_M1 and the m1 concurrency entry. Mirrors the same change
on executus.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:52:11 -04:00
steve 8dae9cc941 docs: document the Gadfly adversarial review loop in CLAUDE.md
CI / Build & Test (pull_request) Successful in 10m13s
Adversarial Review (Gadfly) / review (pull_request) Successful in 24m4s
CI / Tidy (pull_request) Successful in 9m26s
Records the PR workflow: push work to a PR (never straight to main), wait for
Gadfly to finish and weigh its findings, then grade each finding back to the
gadfly-reports MCP (record_finding_grade / list_findings / scoreboard) so the
telemetry can measure whether each model earns its keep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:32:25 -04:00
steve a5adc6f4d1 ci: add Gadfly adversarial PR reviewer workflow
Installs the standalone Gadfly agentic adversarial reviewer (advisory, never
blocks merge), mirroring executus's setup on the latest pinned image
(sha-d7f364d). Reviews majordomo PRs with the full fleet: 9 ollama-cloud models
plus the M1/M5 Macs via foreman, each running the 3-lens suite (security,
correctness, error-handling). Posts one consolidated comment per model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:31:48 -04:00
steve 1fd7109a42 fix(agent): recover front-loaded answer when terminal turn is degenerate
CI / Tidy (pull_request) Successful in 9m31s
CI / Build & Test (pull_request) Successful in 10m14s
CI / Tidy (push) Successful in 9m26s
CI / Build & Test (push) Successful in 10m19s
The agent loop took the final answer only from the terminal (no-tool-call)
turn. Models that "front-load" their answer into an earlier turn that also
calls a tool — then close with a trivial pointer like "(Already answered
above.)" — had their real answer discarded and the pointer delivered. This
recurs across several open-weight models (glm-5.2, etc.); well-behaved models
(Claude/GPT) defer their answer to the terminal turn and are unaffected.

finalOutput() now falls back to the last substantive assistant content in the
transcript when the terminal text is weak (empty, or a short back-reference).
The predicate is narrow and back-reference-gated so short-but-correct answers
("42", "It's down, restarting now.") are never overridden; recovery only picks
a prior turn that reads like a real answer, not a preamble. Zero extra model
calls. Terminal-answer behavior for normal runs is unchanged.
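
A sketch of the fallback order, under loudly stated assumptions: `turn`, the back-reference phrase list, the 80-character cutoff, and the "not a preamble" heuristic are hypothetical stand-ins, since the commit describes the predicate only in words. What it does show is the ordering: the terminal text wins unless it is weak, and only then is an earlier substantive turn recovered.

```go
package main

import "strings"

// turn is a hypothetical stand-in for one assistant message in a transcript.
type turn struct {
	Text string
}

// backRefs is an assumed list of "see above" style pointers.
var backRefs = []string{"already answered", "answered above", "see above"}

// isWeak: empty, or short and pointing back at an earlier turn. Short but
// self-contained answers such as "42" are not weak.
func isWeak(s string) bool {
	t := strings.ToLower(strings.TrimSpace(s))
	if t == "" {
		return true
	}
	if len(t) > 80 { // assumed cutoff: long text is treated as a real answer
		return false
	}
	for _, r := range backRefs {
		if strings.Contains(t, r) {
			return true
		}
	}
	return false
}

// looksSubstantive is a crude, assumed heuristic for "reads like an answer,
// not a preamble".
func looksSubstantive(s string) bool {
	t := strings.ToLower(strings.TrimSpace(s))
	return len(t) >= 40 && !strings.HasPrefix(t, "let me")
}

// pickFinalAnswer keeps the terminal text unless it is weak, then falls back
// to the most recent substantive earlier turn. No extra model calls are made.
func pickFinalAnswer(transcript []turn) string {
	if len(transcript) == 0 {
		return ""
	}
	last := transcript[len(transcript)-1].Text
	if !isWeak(last) {
		return last
	}
	for i := len(transcript) - 2; i >= 0; i-- {
		if looksSubstantive(transcript[i].Text) {
			return transcript[i].Text
		}
	}
	return last
}
```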

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 18:37:38 -04:00
steve 74474c6da0 feat(chain): fail over on empty/degenerate responses
CI / Tidy (push) Successful in 9m26s
CI / Build & Test (push) Successful in 10m29s
A failover chain previously treated a successful-but-empty completion (no
content parts and no tool calls — a "stop with nothing") as a valid result
and returned it. The agent loop then ended the run with empty output, and
the configured backup models were never tried because no error was raised.
This let a single flaky model silently terminate an agent/skill run with
no answer (observed in the wild with ollama-cloud/glm-5.2 returning empty
completions right after a large tool/think turn).

- Add llm.ErrEmptyResponse (classified transient) and Response.IsEmpty():
  true only when there are no tool calls and no meaningful content (no
  parts, or whitespace-only text). A media/image part counts as content,
  so image-only responses are NOT empty.
- chain.Generate converts an empty completion into ErrEmptyResponse so the
  chain fails over to the next target. Unlike an ordinary transient it is
  NOT retried on the same target (the model just produced it; these calls
  are expensive) — the chain penalizes health (so a persistently-empty
  target benches) and advances immediately.
- When every target returns empty the call fails with ErrChainExhausted
  joined to ErrEmptyResponse — a visible error instead of a hollow success.
  Single-element chains therefore also surface empties as errors.

Stream path is unchanged (can't inspect content before the consumer reads
it). Tests: Response.IsEmpty table; chain fails over past an empty head;
all-empty chain returns ErrChainExhausted/ErrEmptyResponse; repeated
empties bench the target across requests. Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 10:35:07 -04:00
Steve Dudenhoeffer 3e81fbd540 docs: public-readiness — vibe-coded disclosure + genericize internal hosts
CI / Tidy (push) Successful in 9m39s
CI / Build & Test (push) Successful in 10m21s
- README + CLAUDE.md: upfront "this is a vibe-coded project" disclosure for
  going public.
- Replace internal LAN hostnames (*.orgrimmar.dudenhoeffer.casa) with
  example.com across README, ADR-0004, the envproviders example, and env_test.go
  (assertions updated together; suite still green). Token was already a
  "change-me" placeholder, not a real secret.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:25:58 -04:00
steve 1029feb0c7 docs: record Phase 9 completion — mort conversion PR open
CI / Tidy (push) Successful in 9m33s
CI / Build & Test (push) Successful in 10m36s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:08:53 +02:00
steve 7c760005f5 docs: README coverage for resolvers, DefineTool, agent hooks, ops controls
CI / Build & Test (push) Successful in 10m8s
CI / Tidy (push) Successful in 9m25s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:34:29 +02:00
steve 0147a79d18 feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls
CI / Tidy (push) Successful in 9m31s
CI / Build & Test (push) Successful in 10m13s
Phase 9a (ADR-0014): Registry.RegisterResolver for dynamic tiers;
DefineTool[Args] typed tools; Usage cache/reasoning detail fields wired
through anthropic/openai/google; WithPromptCaching (Anthropic
cache_control); agent supervision hooks (WithMaxStepsFunc, WithSteer,
WithCompactor, WithToolErrorLimits + ErrToolLoop); health
Bench/Unbench/Snapshot; ChainConfig.Observer failover events.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:30:06 +02:00
steve 04b21fdad2 feat: live-validated against Ollama Cloud; schema instruction fallback for cloud
Phase 8: all six live checks pass (tier aliases, thinking-tier chat, real
tool invocation, structured Generate[T], forced failover with bench+skip,
skill agent). Discovery: ollama.com ignores the format field — the
provider now also states the schema as a system instruction (constrained
decoding locally, instruction-guided JSON on cloud), with hermetic test.
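
A sketch of the dual approach, assuming Ollama's /api/chat request body (model, messages, format, stream). The schema is still sent in `format`, where a local server applies constrained decoding, and is also restated in a system message for endpoints that ignore `format`. The instruction wording and the helper name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatRequest follows the Ollama /api/chat body fields used here.
type chatRequest struct {
	Model    string         `json:"model"`
	Messages []message      `json:"messages"`
	Format   map[string]any `json:"format,omitempty"`
	Stream   bool           `json:"stream"`
}

// buildStructuredRequest (hypothetical) sends the schema twice: in format
// for servers that constrain decoding, and as a system instruction for
// servers that ignore format.
func buildStructuredRequest(model, prompt string, schema map[string]any) (chatRequest, error) {
	b, err := json.Marshal(schema)
	if err != nil {
		return chatRequest{}, fmt.Errorf("marshal schema: %w", err)
	}
	instruction := fmt.Sprintf("Reply only with JSON that conforms to this JSON Schema:\n%s", b)
	return chatRequest{
		Model: model,
		Messages: []message{
			{Role: "system", Content: instruction},
			{Role: "user", Content: prompt},
		},
		Format: schema,
	}, nil
}
```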

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:22:54 +02:00
steve 97513141dc docs: examples for every hard requirement + mort migration blueprint
Phase 7: nine runnable examples/ programs (parse, failover chains with
trailing alias, tiers, LLM_* env providers, multimodal, tool loop,
Generate[T], agent, skills); docs/mort-migration.md mapping mort's
go-llm/go-agentkit usage onto majordomo APIs with the planned additive
library extensions and conversion order; README finalized with the
complete matrix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:17:20 +02:00
steve 76ecf0e49e feat: skills — additive instruction+tool bundles, clock + calc examples
Phase 6: skill.New constructor satisfying the agent.Skill contract;
instruction-only skills; ordered additive composition; skill/clock
(injectable-clock time tools) and skill/calc (recursive-descent arithmetic
evaluator) as ready-made examples with full test suites incl. an
agent-loop round trip. ADR-0013; README skills section + matrix synced.
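
For readers unfamiliar with the technique behind skill/calc, a compact recursive-descent evaluator looks like this: one function per grammar rule, with operator precedence coming from which rule calls which. It is an illustrative sketch, not the skill/calc source; the grammar and error messages are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
)

// parser is a hypothetical recursive-descent evaluator. Grammar:
//   expr = term {("+"|"-") term}; term = factor {("*"|"/") factor}
//   factor = number | "(" expr ")" | "-" factor
type parser struct {
	s   []rune
	pos int
}

// peek skips whitespace and returns the next rune, or 0 at end of input.
func (p *parser) peek() rune {
	for p.pos < len(p.s) && unicode.IsSpace(p.s[p.pos]) {
		p.pos++
	}
	if p.pos < len(p.s) {
		return p.s[p.pos]
	}
	return 0
}

func (p *parser) expr() (float64, error) {
	v, err := p.term()
	for err == nil && (p.peek() == '+' || p.peek() == '-') {
		op := p.s[p.pos]
		p.pos++
		var r float64
		if r, err = p.term(); err == nil && op == '+' {
			v += r
		} else if err == nil {
			v -= r
		}
	}
	return v, err
}

func (p *parser) term() (float64, error) {
	v, err := p.factor()
	for err == nil && (p.peek() == '*' || p.peek() == '/') {
		op := p.s[p.pos]
		p.pos++
		var r float64
		if r, err = p.factor(); err != nil {
			break
		}
		switch {
		case op == '*':
			v *= r
		case r == 0:
			err = fmt.Errorf("division by zero")
		default:
			v /= r
		}
	}
	return v, err
}

func (p *parser) factor() (float64, error) {
	switch c := p.peek(); {
	case c == '(':
		p.pos++
		v, err := p.expr()
		if err == nil && p.peek() != ')' {
			err = fmt.Errorf("expected ')' at offset %d", p.pos)
		}
		p.pos++
		return v, err
	case c == '-':
		p.pos++
		v, err := p.factor()
		return -v, err
	case unicode.IsDigit(c) || c == '.':
		start := p.pos
		for p.pos < len(p.s) && (unicode.IsDigit(p.s[p.pos]) || p.s[p.pos] == '.') {
			p.pos++
		}
		return strconv.ParseFloat(string(p.s[start:p.pos]), 64)
	default:
		return 0, fmt.Errorf("unexpected %q at offset %d", c, p.pos)
	}
}

// eval parses the whole input and rejects trailing garbage.
func eval(s string) (float64, error) {
	p := &parser{s: []rune(s)}
	v, err := p.expr()
	if err != nil {
		return 0, err
	}
	if p.peek() != 0 {
		return 0, fmt.Errorf("trailing input at offset %d", p.pos)
	}
	return v, nil
}
```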

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:13:07 +02:00
steve 7dab4112ff feat: agent run loop, Generate[T], reflect-derived schemas
Phase 5:
- agent/: model + system prompt + toolboxes composition; bounded
  tool-dispatch loop (default 10 steps); panic-proof tool execution;
  unknown-tool and duplicate-name handling; history continuation; step
  observers; partial results on ErrMaxSteps/errors (ADR-0012)
- llm.SchemaFor[T]: strict-compatible JSON schemas from Go types
  (nullable pointers, description/enum tags, recursion rejected)
- majordomo.Generate[T]: typed structured output with fence-stripping
  decode and model-naming errors
- README agents/structured-output sections + matrix synced
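
A sketch of the fence-stripping decode behind typed structured output. `stripFences` and `decodeAs` are hypothetical names, not majordomo's API; the fence string is built with strings.Repeat to keep literal backtick runs out of the source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is three backticks, the Markdown code-fence marker.
var fence = strings.Repeat("`", 3)

// stripFences removes a surrounding Markdown code fence (optionally tagged,
// e.g. "json"), which models often wrap around JSON anyway.
func stripFences(s string) string {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, fence) {
		return s
	}
	s = strings.TrimPrefix(s, fence)
	if i := strings.IndexByte(s, '\n'); i >= 0 {
		s = s[i+1:] // drop the info string on the opening fence line
	}
	s = strings.TrimSuffix(strings.TrimSpace(s), fence)
	return strings.TrimSpace(s)
}

// decodeAs (hypothetical) decodes model text into T; the error names the
// model so a failure inside a chain is attributable.
func decodeAs[T any](model, text string) (T, error) {
	var v T
	if err := json.Unmarshal([]byte(stripFences(text)), &v); err != nil {
		return v, fmt.Errorf("model %s: decode structured output: %w", model, err)
	}
	return v, nil
}
```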

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:10:18 +02:00
steve 1ca607906d feat: Google (Gemini) provider on the official Gen AI SDK
Phase 4: provider/google on google.golang.org/genai v1.59.0 — lazy cached
client, FunctionResponse tool loop, raw-JSON-schema tools and structured
output, ThinkingLevel reasoning mapping, iter.Pull2 streaming, hermetic
httptest suite via HTTPOptions.BaseURL. Registry wires google + gemini
schemes to the real client; stub machinery deleted (all built-ins real).
ADR-0011; README matrix + CLAUDE.md synced.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:04:28 +02:00
steve 043249e0e1 feat: OpenAI, Anthropic, and native-Ollama providers + media pipeline
Phase 3:
- provider/openai: Chat Completions for OpenAI + compat endpoints (SSE
  streaming with by-index tool-call assembly, response_format json_schema,
  legacy max_tokens option, reasoning_effort)
- provider/anthropic: Messages API (tool_use/tool_result, GA structured
  output via output_config.format, full SSE event parser, 529 transient)
- provider/ollama: one native /api/chat client behind the ollama,
  ollama-cloud, and foreman built-ins (presets; NDJSON streaming tolerant
  of foreman's buffered single-object responses; object tool arguments;
  format-schema structured output; think mapping)
- media/: capability normalization (sniff, downscale, transcode, byte
  ladder, ErrUnsupported), wired into the chain executor per target with
  penalty-free advance past incapable elements
- registry: real provider + scheme wiring, WithHTTPClient option, required
  env-foreman TLS chat round-trip test
- ADR-0009 multimodal strategy, ADR-0010 tools/structured mapping; README
  matrix + CLAUDE.md synced
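
In OpenAI's streaming Chat Completions, each delta.tool_calls entry carries an `index`, and a call's JSON arguments arrive as string fragments spread across chunks, so fragments must be grouped by index rather than by arrival order. A minimal assembler might look like this; the type and method names are hypothetical and the wire JSON parsing is omitted.

```go
package main

import "sort"

// toolCallDelta holds the fields of one streamed delta.tool_calls entry
// that matter for assembly (already parsed from the SSE chunk).
type toolCallDelta struct {
	Index     int
	ID        string
	Name      string
	Arguments string // fragment of the JSON-encoded arguments
}

type toolCall struct {
	ID, Name, Arguments string
}

// assembler groups fragments by index, not by arrival order.
type assembler struct {
	calls map[int]*toolCall
}

func (a *assembler) add(d toolCallDelta) {
	if a.calls == nil {
		a.calls = make(map[int]*toolCall)
	}
	c, ok := a.calls[d.Index]
	if !ok {
		c = &toolCall{}
		a.calls[d.Index] = c
	}
	// id and name normally arrive once, in a call's first fragment.
	if d.ID != "" {
		c.ID = d.ID
	}
	if d.Name != "" {
		c.Name = d.Name
	}
	c.Arguments += d.Arguments
}

// finish returns the assembled calls sorted by index.
func (a *assembler) finish() []toolCall {
	idx := make([]int, 0, len(a.calls))
	for i := range a.calls {
		idx = append(idx, i)
	}
	sort.Ints(idx)
	out := make([]toolCall, 0, len(idx))
	for _, i := range idx {
		out = append(out, *a.calls[i])
	}
	return out
}
```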

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:58:08 +02:00
steve 323558ed72 feat(llm): ReasoningEffort request option and ErrUnsupported sentinel
Groundwork for the provider phase: reasoning levels map to native knobs
(OpenAI reasoning_effort, Ollama think); ErrUnsupported marks declared
capability mismatches that chains advance past without health penalty.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:40:47 +02:00
steve 0d0e8e069e test: deterministic failover matrix — cooldown re-admission, alias chains, policies
Phase 2: proves ADR-0006/0008 semantics end to end with the fake provider
and fake clock (cooldown expiry, backoff growth, inline-alias failover,
permanent-error policies, retry budgets, bench-mid-request, exhaustion
reporting, custom classifier, chain-of-one parity).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:37:32 +02:00
steve dcd004289f feat: foundations — canonical types, Parse grammar, env DSNs, health, chains
Phase 1 of the majordomo build:
- llm/ canonical contract (messages, parts, tools, capabilities, streaming,
  Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped
  cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias
  expansion with cycle detection, chain dedup), LLM_* env-DSN providers
  (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain
  executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the
  trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow,
  docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:35:34 +02:00
steve 3025044817 Initial commit
2026-06-10 09:21:08 +00:00