Commit Graph

19 Commits

Author SHA1 Message Date
steve 43eb155759 ci(gadfly): drop the M1 Mac from the review swarm
CI / Build & Test (pull_request) Successful in 10m33s
CI / Tidy (pull_request) Successful in 9m26s
M1 was consistently slow (26-29 min) for zero real findings, so pull it before
this workflow ever fires. Leaves the 9 ollama-cloud models + the M5 Mac;
removes GADFLY_ENDPOINT_M1 and the m1 concurrency entry. Mirrors the same change
on executus.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:52:11 -04:00
steve 8dae9cc941 docs: document the Gadfly adversarial review loop in CLAUDE.md
CI / Build & Test (pull_request) Successful in 10m13s
Adversarial Review (Gadfly) / review (pull_request) Successful in 24m4s
CI / Tidy (pull_request) Successful in 9m26s
Records the PR workflow: push work to a PR (never straight to main), wait for
Gadfly to finish and weigh its findings, then grade each finding back to the
gadfly-reports MCP (record_finding_grade / list_findings / scoreboard) so the
telemetry can measure whether each model earns its keep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:32:25 -04:00
steve a5adc6f4d1 ci: add Gadfly adversarial PR reviewer workflow
Installs the standalone Gadfly agentic adversarial reviewer (advisory, never
blocks merge), mirroring executus's setup on the latest pinned image
(sha-d7f364d). Reviews majordomo PRs with the full fleet: 9 ollama-cloud models
plus the M1/M5 Macs via foreman, each running the 3-lens suite (security,
correctness, error-handling). Posts one consolidated comment per model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:31:48 -04:00
steve 1fd7109a42 fix(agent): recover front-loaded answer when terminal turn is degenerate
CI / Tidy (pull_request) Successful in 9m31s
CI / Build & Test (pull_request) Successful in 10m14s
CI / Tidy (push) Successful in 9m26s
CI / Build & Test (push) Successful in 10m19s
The agent loop took the final answer only from the terminal (no-tool-call)
turn. Models that "front-load" their answer into an earlier turn that also
calls a tool — then close with a trivial pointer like "(Already answered
above.)" — had their real answer discarded and the pointer delivered. This
recurs across several open-weight models (glm-5.2, etc.); well-behaved models
(Claude/GPT) defer their answer to the terminal turn and are unaffected.

finalOutput() now falls back to the last substantive assistant content in the
transcript when the terminal text is weak (empty, or a short back-reference).
The predicate is narrow and back-reference-gated so short-but-correct answers
("42", "It's down, restarting now.") are never overridden; recovery only picks
a prior turn that reads like a real answer, not a preamble. Zero extra model
calls. Terminal-answer behavior for normal runs is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 18:37:38 -04:00
steve 74474c6da0 feat(chain): fail over on empty/degenerate responses
CI / Tidy (push) Successful in 9m26s
CI / Build & Test (push) Successful in 10m29s
A failover chain previously treated a successful-but-empty completion (no
content parts and no tool calls — a "stop with nothing") as a valid result
and returned it. The agent loop then ended the run with empty output, and
the configured backup models were never tried because no error was raised.
This let a single flaky model silently terminate an agent/skill run with
no answer (observed in the wild with ollama-cloud/glm-5.2 returning empty
completions right after a large tool/think turn).

- Add llm.ErrEmptyResponse (classified transient) and Response.IsEmpty():
  true only when there are no tool calls and no meaningful content (no
  parts, or whitespace-only text). A media/image part counts as content,
  so image-only responses are NOT empty.
- chain.Generate converts an empty completion into ErrEmptyResponse so the
  chain fails over to the next target. Unlike an ordinary transient it is
  NOT retried on the same target (the model just produced it; these calls
  are expensive) — the chain penalizes health (so a persistently-empty
  target benches) and advances immediately.
- When every target returns empty the call fails with ErrChainExhausted
  joined to ErrEmptyResponse — a visible error instead of a hollow success.
  Single-element chains therefore also surface empties as errors.

Stream path is unchanged (can't inspect content before the consumer reads
it). Tests: Response.IsEmpty table; chain fails over past an empty head;
all-empty chain returns ErrChainExhausted/ErrEmptyResponse; repeated
empties bench the target across requests. Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 10:35:07 -04:00
Steve Dudenhoeffer 3e81fbd540 docs: public-readiness — vibe-coded disclosure + genericize internal hosts
CI / Tidy (push) Successful in 9m39s
CI / Build & Test (push) Successful in 10m21s
- README + CLAUDE.md: upfront "this is a vibe-coded project" disclosure for
  going public.
- Replace internal LAN hostnames (*.orgrimmar.dudenhoeffer.casa) with
  example.com across README, ADR-0004, the envproviders example, and env_test.go
  (assertions updated together; suite still green). Token was already a
  "change-me" placeholder, not a real secret.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:25:58 -04:00
steve 1029feb0c7 docs: record Phase 9 completion — mort conversion PR open
CI / Tidy (push) Successful in 9m33s
CI / Build & Test (push) Successful in 10m36s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:08:53 +02:00
steve 7c760005f5 docs: README coverage for resolvers, DefineTool, agent hooks, ops controls
CI / Build & Test (push) Successful in 10m8s
CI / Tidy (push) Successful in 9m25s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:34:29 +02:00
steve 0147a79d18 feat: conversion-driven extensions — resolvers, DefineTool, hooks, ops controls
CI / Tidy (push) Successful in 9m31s
CI / Build & Test (push) Successful in 10m13s
Phase 9a (ADR-0014): Registry.RegisterResolver for dynamic tiers;
DefineTool[Args] typed tools; Usage cache/reasoning detail fields wired
through anthropic/openai/google; WithPromptCaching (Anthropic
cache_control); agent supervision hooks (WithMaxStepsFunc, WithSteer,
WithCompactor, WithToolErrorLimits + ErrToolLoop); health
Bench/Unbench/Snapshot; ChainConfig.Observer failover events.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:30:06 +02:00
steve 04b21fdad2 feat: live-validated against Ollama Cloud; schema instruction fallback for cloud
Phase 8: all six live checks pass (tier aliases, thinking-tier chat, real
tool invocation, structured Generate[T], forced failover with bench+skip,
skill agent). Discovery: ollama.com ignores the format field — the
provider now also states the schema as a system instruction (constrained
decoding locally, instruction-guided JSON on cloud), with hermetic test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:22:54 +02:00
steve 97513141dc docs: examples for every hard requirement + mort migration blueprint
Phase 7: nine runnable examples/ programs (parse, failover chains with
trailing alias, tiers, LLM_* env providers, multimodal, tool loop,
Generate[T], agent, skills); docs/mort-migration.md mapping mort's
go-llm/go-agentkit usage onto majordomo APIs with the planned additive
library extensions and conversion order; README finalized with the
complete matrix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:17:20 +02:00
steve 76ecf0e49e feat: skills — additive instruction+tool bundles, clock + calc examples
Phase 6: skill.New constructor satisfying the agent.Skill contract;
instruction-only skills; ordered additive composition; skill/clock
(injectable-clock time tools) and skill/calc (recursive-descent arithmetic
evaluator) as ready-made examples with full test suites incl. an
agent-loop round trip. ADR-0013; README skills section + matrix synced.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:13:07 +02:00
steve 7dab4112ff feat: agent run loop, Generate[T], reflect-derived schemas
Phase 5:
- agent/: model + system prompt + toolboxes composition; bounded
  tool-dispatch loop (default 10 steps); panic-proof tool execution;
  unknown-tool and duplicate-name handling; history continuation; step
  observers; partial results on ErrMaxSteps/errors (ADR-0012)
- llm.SchemaFor[T]: strict-compatible JSON schemas from Go types
  (nullable pointers, description/enum tags, recursion rejected)
- majordomo.Generate[T]: typed structured output with fence-stripping
  decode and model-naming errors
- README agents/structured-output sections + matrix synced

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:10:18 +02:00
steve 1ca607906d feat: Google (Gemini) provider on the official Gen AI SDK
Phase 4: provider/google on google.golang.org/genai v1.59.0 — lazy cached
client, FunctionResponse tool loop, raw-JSON-schema tools and structured
output, ThinkingLevel reasoning mapping, iter.Pull2 streaming, hermetic
httptest suite via HTTPOptions.BaseURL. Registry wires google + gemini
schemes to the real client; stub machinery deleted (all built-ins real).
ADR-0011; README matrix + CLAUDE.md synced.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:04:28 +02:00
steve 043249e0e1 feat: OpenAI, Anthropic, and native-Ollama providers + media pipeline
Phase 3:
- provider/openai: Chat Completions for OpenAI + compat endpoints (SSE
  streaming with by-index tool-call assembly, response_format json_schema,
  legacy max_tokens option, reasoning_effort)
- provider/anthropic: Messages API (tool_use/tool_result, GA structured
  output via output_config.format, full SSE event parser, 529 transient)
- provider/ollama: one native /api/chat client behind the ollama,
  ollama-cloud, and foreman built-ins (presets; NDJSON streaming tolerant
  of foreman's buffered single-object responses; object tool arguments;
  format-schema structured output; think mapping)
- media/: capability normalization (sniff, downscale, transcode, byte
  ladder, ErrUnsupported), wired into the chain executor per target with
  penalty-free advance past incapable elements
- registry: real provider + scheme wiring, WithHTTPClient option, required
  env-foreman TLS chat round-trip test
- ADR-0009 multimodal strategy, ADR-0010 tools/structured mapping; README
  matrix + CLAUDE.md synced

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:58:08 +02:00
steve 323558ed72 feat(llm): ReasoningEffort request option and ErrUnsupported sentinel
Groundwork for the provider phase: reasoning levels map to native knobs
(OpenAI reasoning_effort, Ollama think); ErrUnsupported marks declared
capability mismatches that chains advance past without health penalty.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:40:47 +02:00
steve 0d0e8e069e test: deterministic failover matrix — cooldown re-admission, alias chains, policies
Phase 2: proves ADR-0006/0008 semantics end to end with the fake provider
and fake clock (cooldown expiry, backoff growth, inline-alias failover,
permanent-error policies, retry budgets, bench-mid-request, exhaustion
reporting, custom classifier, chain-of-one parity).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:37:32 +02:00
steve dcd004289f feat: foundations — canonical types, Parse grammar, env DSNs, health, chains
Phase 1 of the majordomo build:
- llm/ canonical contract (messages, parts, tools, capabilities, streaming,
  Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped
  cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias
  expansion with cycle detection, chain dedup), LLM_* env-DSN providers
  (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain
  executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the
  trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow,
  docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:35:34 +02:00
steve 3025044817 Initial commit 2026-06-10 09:21:08 +00:00