Commit Graph

9 Commits

Author SHA1 Message Date
steve de2b2f0f28 feat(llamaswap): add llama-swaps (TLS) DSN scheme
CI / Tidy (pull_request) Successful in 9m43s
CI / Build & Test (pull_request) Successful in 10m26s
Adversarial Review (Gadfly) / review (pull_request) Successful in 11m47s
llama-swap was http-only by DSN, pushing TLS-fronted instances onto the openai://
scheme (which loses the management/image methods). Add a "llama-swaps" scheme
that builds an https base URL, alongside "llama-swap" (http, local-first) —
mirroring redis/rediss. Both share one factory; llama-swaps is scheme-only (no
default built-in). The choice stays explicit because a DSN has no reliable
http-vs-https signal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 17:58:59 -04:00
steve 3ba2dbefae Merge remote-tracking branch 'origin/main' into feat/llama-swap-provider
CI / Build & Test (pull_request) Successful in 10m15s
CI / Tidy (pull_request) Successful in 10m20s
Adversarial Review (Gadfly) / review (pull_request) Successful in 18m24s
2026-06-27 15:13:07 -04:00
steve 96c612e707 feat(llamaswap): add llama-swap provider + canonical imagegen interface
CI / Tidy (pull_request) Successful in 9m25s
CI / Build & Test (pull_request) Successful in 10m15s
Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping
proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to
provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with
legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and
a timeout-free client so cold model swaps rely on context deadlines. The
"tailored" surface is concrete management methods (ListModels / Running /
Unload) that don't belong on the canonical llm.Provider interface. The
llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL
built-in errors clearly on use, mirroring foreman.

Add imagegen, a new canonical text-to-image interface separate from llm
(Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed
straight back into chat). First backend is llama-swap via OpenAI
/v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is
txt2img only.

Hermetic httptest coverage for chat delegation, management endpoints, image
decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix +
image-gen section, CLAUDE.md package map, and progress.md updated in the same
commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 15:01:54 -04:00
steve 8dae9cc941 docs: document the Gadfly adversarial review loop in CLAUDE.md
CI / Build & Test (pull_request) Successful in 10m13s
Adversarial Review (Gadfly) / review (pull_request) Successful in 24m4s
CI / Tidy (pull_request) Successful in 9m26s
Records the PR workflow: push work to a PR (never straight to main), wait for
Gadfly to finish and weigh its findings, then grade each finding back to the
gadfly-reports MCP (record_finding_grade / list_findings / scoreboard) so the
telemetry can measure whether each model earns its keep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 14:32:25 -04:00
steve 74474c6da0 feat(chain): fail over on empty/degenerate responses
CI / Tidy (push) Successful in 9m26s
CI / Build & Test (push) Successful in 10m29s
A failover chain previously treated a successful-but-empty completion (no
content parts and no tool calls — a "stop with nothing") as a valid result
and returned it. The agent loop then ended the run with empty output, and
the configured backup models were never tried because no error was raised.
This let a single flaky model silently terminate an agent/skill run with
no answer (observed in the wild with ollama-cloud/glm-5.2 returning empty
completions right after a large tool/think turn).

- Add llm.ErrEmptyResponse (classified transient) and Response.IsEmpty():
  true only when there are no tool calls and no meaningful content (no
  parts, or whitespace-only text). A media/image part counts as content,
  so image-only responses are NOT empty.
- chain.Generate converts an empty completion into ErrEmptyResponse so the
  chain fails over to the next target. Unlike an ordinary transient it is
  NOT retried on the same target (the model just produced it; these calls
  are expensive) — the chain penalizes health (so a persistently-empty
  target benches) and advances immediately.
- When every target returns empty the call fails with ErrChainExhausted
  joined to ErrEmptyResponse — a visible error instead of a hollow success.
  Single-element chains therefore also surface empties as errors.

Stream path is unchanged (can't inspect content before the consumer reads
it). Tests: Response.IsEmpty table; chain fails over past an empty head;
all-empty chain returns ErrChainExhausted/ErrEmptyResponse; repeated
empties bench the target across requests. Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 10:35:07 -04:00
Steve Dudenhoeffer 3e81fbd540 docs: public-readiness — vibe-coded disclosure + genericize internal hosts
CI / Tidy (push) Successful in 9m39s
CI / Build & Test (push) Successful in 10m21s
- README + CLAUDE.md: upfront "this is a vibe-coded project" disclosure for
  going public.
- Replace internal LAN hostnames (*.orgrimmar.dudenhoeffer.casa) with
  example.com across README, ADR-0004, the envproviders example, and env_test.go
  (assertions updated together; suite still green). Token was already a
  "change-me" placeholder, not a real secret.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 19:25:58 -04:00
steve 1ca607906d feat: Google (Gemini) provider on the official Gen AI SDK
Phase 4: provider/google on google.golang.org/genai v1.59.0 — lazy cached
client, FunctionResponse tool loop, raw-JSON-schema tools and structured
output, ThinkingLevel reasoning mapping, iter.Pull2 streaming, hermetic
httptest suite via HTTPOptions.BaseURL. Registry wires google + gemini
schemes to the real client; stub machinery deleted (all built-ins real).
ADR-0011; README matrix + CLAUDE.md synced.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:04:28 +02:00
steve 043249e0e1 feat: OpenAI, Anthropic, and native-Ollama providers + media pipeline
Phase 3:
- provider/openai: Chat Completions for OpenAI + compat endpoints (SSE
  streaming with by-index tool-call assembly, response_format json_schema,
  legacy max_tokens option, reasoning_effort)
- provider/anthropic: Messages API (tool_use/tool_result, GA structured
  output via output_config.format, full SSE event parser, 529 transient)
- provider/ollama: one native /api/chat client behind the ollama,
  ollama-cloud, and foreman built-ins (presets; NDJSON streaming tolerant
  of foreman's buffered single-object responses; object tool arguments;
  format-schema structured output; think mapping)
- media/: capability normalization (sniff, downscale, transcode, byte
  ladder, ErrUnsupported), wired into the chain executor per target with
  penalty-free advance past incapable elements
- registry: real provider + scheme wiring, WithHTTPClient option, required
  env-foreman TLS chat round-trip test
- ADR-0009 multimodal strategy, ADR-0010 tools/structured mapping; README
  matrix + CLAUDE.md synced

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:58:08 +02:00
steve dcd004289f feat: foundations — canonical types, Parse grammar, env DSNs, health, chains
Phase 1 of the majordomo build:
- llm/ canonical contract (messages, parts, tools, capabilities, streaming,
  Model/Provider, error classification)
- health/ clock-injected tracker (threshold bench, exponential capped
  cooldown, reset-on-success)
- root Registry + Parse (verbatim model ids, inline recursive alias
  expansion with cycle detection, chain dedup), LLM_* env-DSN providers
  (go-llm parity: lazy fallback + eager LoadEnv), health-aware chain
  executor behind the Model interface
- provider/fake scriptable test provider; hermetic test suite incl. the
  trailing-thinking chain and foreman:// env loading
- ADRs 0001-0008, CLAUDE.md, README (honest matrix), CI workflow,
  docs/phase-1-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:35:34 +02:00