feat(llamaswap): llama-swap provider + canonical imagegen interface #3

Merged
steve merged 3 commits from feat/llama-swap-provider into main 2026-06-27 20:14:01 +00:00
Owner

Adds llama-swap (the on-demand model-swapping proxy over llama.cpp / stable-diffusion.cpp) as a tailored provider, plus a new canonical text-to-image interface so mort can generate images.

provider/llamaswap (ADR-0015)

llama-swap's chat API is byte-identical to OpenAI Chat Completions, so the chat path delegates to provider/openai at {base}/v1 — no duplicated wire client (ADR-0007). What makes it "tailored":

  • Management methods on the concrete type (outside the canonical interface): ListModels (/v1/models), Running (/running, raw JSON), Unload (/api/models/unload).
  • Swap-aware: timeout-free HTTP client — a cold model swap blocks up to llama-swap's healthCheckTimeout (≥15s), so callers bound work with a context deadline.
  • Legacy max_tokens (llama.cpp's shim), and a Bearer no-key placeholder so keyless local instances work (the openai client rejects a blank key).
  • DSN llama-swap://token@host:port builds an http base URL (local-first, unlike the https-always default). A no-URL built-in errors clearly on use (mirrors foreman).

imagegen (ADR-0016)

A new canonical leaf package, separate from llm (image gen shares no message/tool/stream machinery): Request/Result/Model/Provider, with Image = llm.ImagePart so generated images feed straight back into a chat turn. Re-exported from the root (ImageModel, ImageRequest, WithImageSize, …). First backend is llama-swap via OpenAI /v1/images/generations (b64_json, bytes-only). v1 is txt2img only; edits/img2img and registry image-DSN resolution deferred.

Tests & docs

  • Hermetic httptest coverage: chat delegation (path + legacy max_tokens + bearer), keyless placeholder, ListModels/Unload/Running, image decode (1×1 PNG → ImagePart), error classification, and scheme/built-in wiring.
  • All four gates green (build, vet, test -race, mod tidy). No new dependencies.
  • README support matrix + image-gen section, CLAUDE.md package map, ADR index, and progress.md updated in the same commit.

🤖 Generated with Claude Code

Adds [llama-swap](https://github.com/mostlygeek/llama-swap) (the on-demand model-swapping proxy over llama.cpp / stable-diffusion.cpp) as a tailored provider, plus a new canonical text-to-image interface so mort can generate images. ## provider/llamaswap (ADR-0015) llama-swap's **chat** API is byte-identical to OpenAI Chat Completions, so the chat path **delegates to `provider/openai`** at `{base}/v1` — no duplicated wire client (ADR-0007). What makes it "tailored": - **Management methods** on the concrete type (outside the canonical interface): `ListModels` (`/v1/models`), `Running` (`/running`, raw JSON), `Unload` (`/api/models/unload`). - **Swap-aware**: timeout-free HTTP client — a cold model swap blocks up to llama-swap's `healthCheckTimeout` (≥15s), so callers bound work with a context deadline. - Legacy `max_tokens` (llama.cpp's shim), and a `Bearer no-key` placeholder so keyless local instances work (the openai client rejects a blank key). - DSN `llama-swap://token@host:port` builds an **http** base URL (local-first, unlike the https-always default). A no-URL built-in errors clearly on use (mirrors foreman). ## imagegen (ADR-0016) A new canonical leaf package, **separate from `llm`** (image gen shares no message/tool/stream machinery): `Request`/`Result`/`Model`/`Provider`, with `Image = llm.ImagePart` so generated images feed straight back into a chat turn. Re-exported from the root (`ImageModel`, `ImageRequest`, `WithImageSize`, …). First backend is llama-swap via OpenAI `/v1/images/generations` (`b64_json`, bytes-only). **v1 is txt2img only**; edits/img2img and registry image-DSN resolution deferred. ## Tests & docs - Hermetic `httptest` coverage: chat delegation (path + legacy max_tokens + bearer), keyless placeholder, `ListModels`/`Unload`/`Running`, image decode (1×1 PNG → `ImagePart`), error classification, and scheme/built-in wiring. - All four gates green (`build`, `vet`, `test -race`, `mod tidy`). No new dependencies. - README support matrix + image-gen section, CLAUDE.md package map, ADR index, and progress.md updated in the same commit. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
steve added 1 commit 2026-06-27 19:02:10 +00:00
feat(llamaswap): add llama-swap provider + canonical imagegen interface
CI / Tidy (pull_request) Successful in 9m25s
CI / Build & Test (pull_request) Successful in 10m15s
96c612e707
Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping
proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to
provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with
legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and
a timeout-free client so cold model swaps rely on context deadlines. The
"tailored" surface is concrete management methods (ListModels / Running /
Unload) that don't belong on the canonical llm.Provider interface. The
llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL
built-in errors clearly on use, mirroring foreman.

Add imagegen, a new canonical text-to-image interface separate from llm
(Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed
straight back into chat). First backend is llama-swap via OpenAI
/v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is
txt2img only.

Hermetic httptest coverage for chat delegation, management endpoints, image
decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix +
image-gen section, CLAUDE.md package map, and progress.md updated in the same
commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve closed this pull request 2026-06-27 19:11:06 +00:00
steve reopened this pull request 2026-06-27 19:11:13 +00:00
steve added 1 commit 2026-06-27 19:13:10 +00:00
Merge remote-tracking branch 'origin/main' into feat/llama-swap-provider
CI / Build & Test (pull_request) Successful in 10m15s
CI / Tidy (pull_request) Successful in 10m20s
Adversarial Review (Gadfly) / review (pull_request) Successful in 18m24s
3ba2dbefae
steve closed this pull request 2026-06-27 19:13:17 +00:00
steve reopened this pull request 2026-06-27 19:13:23 +00:00

🪰 Gadfly review — glm-5.2:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

Corrected review

VERDICT: Minor issues

  • provider/llamaswap/llamaswap.go:163 — path injection in Unload. The model argument is interpolated directly into the request path (path += "/" + model) and passed to http.NewRequestWithContext with no escaping. A model id containing /, .., ?, #, or extra segments can rewrite the target URL — e.g. Unload(ctx, "../v1/chat/completions") POSTs to /api/models/unload/../v1/chat/completions (normalized to /v1/chat/completions), and Unload(ctx, "x?foo=bar") turns the remainder into a query string. Confirmed: doJSON does p.baseURL+path (llamaswap.go:183) with no url.PathEscape/validation, and the package docs (llamaswap.go:95, image.go:16) state ids are "passed through verbatim." The other providers append only fixed strings (openai: model.go:92 uses /chat/completions). Model ids here come from operator config / ListModels, so impact is limited, but Unload's model parameter is caller-supplied. Fix: path += "/" + url.PathEscape(model) (or reject ids containing /, ?, #).

  • builtin.go:88 / provider/llamaswap/llamaswap.go:191 — bearer token over cleartext HTTP. The llama-swap DSN scheme builds an http:// base URL (builtin.go:88 "http://"+dsn.Host) and doJSON sends Authorization: Bearer <token> over unencrypted transport when a token is configured (llamaswap.go:191). This is a documented local-first design choice (ADR-0015), not an oversight, so it's a credential-exposure risk rather than a bug: any token set via llama-swap://token@host traverses the local network in plaintext. Suggested mitigation: a doc note recommending an empty token for genuinely keyless local instances (which already omits the header — verified at llamaswap.go:190 if p.token != ""), and steering tokened deployments to the openai:// (https) scheme. Not blocking since it's an explicit, reviewed decision.

  • env.go:59 / builtin.go:88 — DSN host not validated against path/query metacharacters (low severity, operator-controlled). ParseDSN only TrimRights slashes from host (env.go:59); a host like evil.com/p?x yields baseURL http://evil.com/p?x, and doJSON's baseURL+path concatenation then appends endpoints into what becomes the query string. DSNs are operator-supplied (not untrusted end-user input), so this is low-risk, but it's an SSRF-shaped sharp edge. A url.Parse-based host validation in ParseDSN (or asserting the host has no /?#) would close it. Unverified beyond reading ParseDSN; flagging since it touches the trust boundary.

Nothing else in the security lane: chat auth delegates to the openai client (verified openai/model.go:97 sets the bearer), image bytes are sniffed with http.DetectContentType (safe), apiError bounds body reads at 1 MiB (llamaswap.go:214), and the keyless placeholder (Bearer no-key) is harmless.

🎯 Correctness — No material issues found

Verdict: No material issues found

Through the correctness lens, I verified the new code against the checked-out repository and found no logic bugs:

  • Image N/Size zero-value semantics (provider/llamaswap/image.go:35-36): N carries json:"n,omitempty", so N==0 omits the field and the backend applies its default (documented behavior); Size=="" likewise omits. Matches the imagegen.Request contract. Verified imagegen.go docs and the OpenAI wire shape.
  • response_format always forced (image.go:37,60): ResponseFormat has no omitempty, so "b64_json" is always sent, which the bytes-only contract relies on. Correct.
  • sniffImageMIME (image.go:98-103): http.DetectContentType on a PNG returns image/png; the non-image/ fallback to image/png is a reasonable default for the stated stable-diffusion.cpp backend. Verified against llm.ImagePart usage.
  • ErrUnsupported wrapping for empty prompt (image.go:51-53): fmt.Errorf("%w: ...", llm.ErrUnsupported) preserves the sentinel so errors.Is works; verified llm.ErrUnsupported exists at llm/errors.go:34.
  • Chat delegation (llamaswap.go:109-121): builds the openai client at {baseURL}/v1 with the placeholder no-key when tokenless and WithLegacyMaxTokens(). Verified openai.WithAPIKey/WithBaseURL/WithLegacyMaxTokens/WithHTTPClient all exist with matching signatures (openai.go:48-85). The keyless placeholder avoids the openai client's blank-key 401 path, as intended.
  • DSN → http base URL (builtin.go:87): uses "http://"+dsn.Host, deliberately not dsn.BaseURL() (which is https-always per env.go:41). Verified DSN.Host is scheme-less host[:port][/path]. Correct and consistent with the ADR.
  • Unload path building (llamaswap.go:160-165): empty model → /api/models/unload; non-empty → /api/models/unload/<model>. Model IDs like qwen3:14b contain a colon, which is valid unescaped in a URL path segment, so no correctness issue for the documented model-name shapes.
  • doJSON decode into json.RawMessage for Running: json.Decoder populates RawMessage with arbitrary valid JSON; verified the call site at llamaswap.go:151.

No semantic/domain correctness defects (constants, conversions, formulas, thresholds) surfaced in the diff.

🧯 Error handling & edge cases — Minor issues

Verdict: Minor issues

Through the error-handling & edge-cases lens, the unhappy paths are mostly handled well: doJSON has defer resp.Body.Close(), non-2xx → *llm.APIError, transport errors are wrapped (preserving llm.Classify's net-error detection), the empty-prompt guard uses %w so errors.Is(err, llm.ErrUnsupported) holds (verified in llm/errors.go:115 and TestImageGenerateEmptyPrompt), and the no-image / url-only responses are surfaced as APIError rather than silently skipped.

Findings:

  • provider/llamaswap/llamaswap.go:135,149,160ListModels, Running, and Unload do not guard p.baseURL == "", unlike Model (line 98) and ImageModel (image.go:18). The no-DSN built-in (builtin.go:83) constructs a provider with an empty base URL, and the ADR/inline comment explicitly promises that the built-in "errors on use with a clear message, mirrors foreman." That promise holds for chat/image, but for the management surface — the headline "tailored" methods — calling e.g. p.ListModels(ctx) on the built-in reaches doJSON which builds http.NewRequestWithContext(ctx, method, p.baseURL+path, ...) at line 183 with an empty baseURL, yielding an opaque llama-swap: build request: http: no Host in request URL rather than the clear "no base URL configured" message. Suggested fix: add the same if p.baseURL == "" { return ..., fmt.Errorf(...) } guard (or a shared errNoBaseURL helper) at the top of the three management methods. Verified by reading the full doJSON path and the builtin construction.

  • provider/llamaswap/image.go:58 (imageRequest.N) — a negative N is forwarded to the backend verbatim (N int with omitempty only suppresses zero). The doc (imagegen.go:30) says 0 = provider default but says nothing about negatives; a caller passing WithN(-1) sends n:-1 to the upstream, whose behavior is undefined. Minor edge case; a if req.N < 0 { req.N = 0 } clamp (or an explicit validation error) would close it. Unverified against any real backend behavior, so treat as a hardening nit rather than a confirmed bug.

  • provider/llamaswap/image.go:80 — a base64 decode failure is returned as a plain wrapped fmt.Errorf (not *llm.APIError), so llm.Classify falls through to its "unrecognized → transient" default (llm/errors.go:103-104). A genuinely corrupt/truncated payload from the backend is a permanent condition for that response, not a transient one. Minor misclassification; wrapping as an *llm.APIError (or giving it an Unwrap to a permanent sentinel) would make failover decisions correct. Low severity since this path implies a protocol violation by the backend.

No swallowed errors, missing defers, or panic-on-nil/empty-input issues were found: New always initializes client, the empty-body decode paths are only reached on success responses that carry JSON, and sniffImageMIME handles non-image bytes by defaulting to image/png rather than panicking.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 39s

<!-- gadfly-review:ollama:glm-5.2:cloud --> ### 🪰 Gadfly review — `glm-5.2:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> ## Corrected review **VERDICT: Minor issues** - **`provider/llamaswap/llamaswap.go:163` — path injection in `Unload`.** The `model` argument is interpolated directly into the request path (`path += "/" + model`) and passed to `http.NewRequestWithContext` with no escaping. A model id containing `/`, `..`, `?`, `#`, or extra segments can rewrite the target URL — e.g. `Unload(ctx, "../v1/chat/completions")` POSTs to `/api/models/unload/../v1/chat/completions` (normalized to `/v1/chat/completions`), and `Unload(ctx, "x?foo=bar")` turns the remainder into a query string. Confirmed: `doJSON` does `p.baseURL+path` (`llamaswap.go:183`) with no `url.PathEscape`/validation, and the package docs (`llamaswap.go:95`, `image.go:16`) state ids are "passed through verbatim." The other providers append only fixed strings (openai: `model.go:92` uses `/chat/completions`). Model ids here come from operator config / `ListModels`, so impact is limited, but `Unload`'s `model` parameter is caller-supplied. Fix: `path += "/" + url.PathEscape(model)` (or reject ids containing `/`, `?`, `#`). - **`builtin.go:88` / `provider/llamaswap/llamaswap.go:191` — bearer token over cleartext HTTP.** The `llama-swap` DSN scheme builds an `http://` base URL (`builtin.go:88` `"http://"+dsn.Host`) and `doJSON` sends `Authorization: Bearer <token>` over unencrypted transport when a token is configured (`llamaswap.go:191`). This is a documented local-first design choice (ADR-0015), not an oversight, so it's a credential-exposure risk rather than a bug: any token set via `llama-swap://token@host` traverses the local network in plaintext. Suggested mitigation: a doc note recommending an empty token for genuinely keyless local instances (which already omits the header — verified at `llamaswap.go:190` `if p.token != ""`), and steering tokened deployments to the `openai://` (https) scheme. Not blocking since it's an explicit, reviewed decision. - **`env.go:59` / `builtin.go:88` — DSN host not validated against path/query metacharacters (low severity, operator-controlled).** `ParseDSN` only `TrimRight`s slashes from `host` (`env.go:59`); a host like `evil.com/p?x` yields baseURL `http://evil.com/p?x`, and `doJSON`'s `baseURL+path` concatenation then appends endpoints into what becomes the query string. DSNs are operator-supplied (not untrusted end-user input), so this is low-risk, but it's an SSRF-shaped sharp edge. A `url.Parse`-based host validation in `ParseDSN` (or asserting the host has no `/?#`) would close it. Unverified beyond reading `ParseDSN`; flagging since it touches the trust boundary. Nothing else in the security lane: chat auth delegates to the openai client (verified `openai/model.go:97` sets the bearer), image bytes are sniffed with `http.DetectContentType` (safe), `apiError` bounds body reads at 1 MiB (`llamaswap.go:214`), and the keyless placeholder (`Bearer no-key`) is harmless. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **Verdict: No material issues found** Through the correctness lens, I verified the new code against the checked-out repository and found no logic bugs: - **Image `N`/`Size` zero-value semantics** (`provider/llamaswap/image.go:35-36`): `N` carries `json:"n,omitempty"`, so `N==0` omits the field and the backend applies its default (documented behavior); `Size==""` likewise omits. Matches the `imagegen.Request` contract. Verified `imagegen.go` docs and the OpenAI wire shape. - **`response_format` always forced** (`image.go:37,60`): `ResponseFormat` has no `omitempty`, so `"b64_json"` is always sent, which the bytes-only contract relies on. Correct. - **`sniffImageMIME`** (`image.go:98-103`): `http.DetectContentType` on a PNG returns `image/png`; the non-`image/` fallback to `image/png` is a reasonable default for the stated stable-diffusion.cpp backend. Verified against `llm.ImagePart` usage. - **`ErrUnsupported` wrapping for empty prompt** (`image.go:51-53`): `fmt.Errorf("%w: ...", llm.ErrUnsupported)` preserves the sentinel so `errors.Is` works; verified `llm.ErrUnsupported` exists at `llm/errors.go:34`. - **Chat delegation** (`llamaswap.go:109-121`): builds the openai client at `{baseURL}/v1` with the placeholder `no-key` when tokenless and `WithLegacyMaxTokens()`. Verified `openai.WithAPIKey/WithBaseURL/WithLegacyMaxTokens/WithHTTPClient` all exist with matching signatures (`openai.go:48-85`). The keyless placeholder avoids the openai client's blank-key 401 path, as intended. - **DSN → http base URL** (`builtin.go:87`): uses `"http://"+dsn.Host`, deliberately not `dsn.BaseURL()` (which is https-always per `env.go:41`). Verified `DSN.Host` is scheme-less host[:port][/path]. Correct and consistent with the ADR. - **`Unload` path building** (`llamaswap.go:160-165`): empty model → `/api/models/unload`; non-empty → `/api/models/unload/<model>`. Model IDs like `qwen3:14b` contain a colon, which is valid unescaped in a URL path segment, so no correctness issue for the documented model-name shapes. - **`doJSON` decode into `json.RawMessage`** for `Running`: `json.Decoder` populates `RawMessage` with arbitrary valid JSON; verified the call site at `llamaswap.go:151`. No semantic/domain correctness defects (constants, conversions, formulas, thresholds) surfaced in the diff. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> **Verdict: Minor issues** Through the error-handling & edge-cases lens, the unhappy paths are mostly handled well: `doJSON` has `defer resp.Body.Close()`, non-2xx → `*llm.APIError`, transport errors are wrapped (preserving `llm.Classify`'s net-error detection), the empty-prompt guard uses `%w` so `errors.Is(err, llm.ErrUnsupported)` holds (verified in `llm/errors.go:115` and `TestImageGenerateEmptyPrompt`), and the no-image / url-only responses are surfaced as `APIError` rather than silently skipped. Findings: - `provider/llamaswap/llamaswap.go:135,149,160` — `ListModels`, `Running`, and `Unload` do **not** guard `p.baseURL == ""`, unlike `Model` (line 98) and `ImageModel` (`image.go:18`). The no-DSN built-in (`builtin.go:83`) constructs a provider with an empty base URL, and the ADR/inline comment explicitly promises that the built-in "errors on use with a clear message, mirrors foreman." That promise holds for chat/image, but for the management surface — the headline "tailored" methods — calling e.g. `p.ListModels(ctx)` on the built-in reaches `doJSON` which builds `http.NewRequestWithContext(ctx, method, p.baseURL+path, ...)` at line 183 with an empty `baseURL`, yielding an opaque `llama-swap: build request: http: no Host in request URL` rather than the clear "no base URL configured" message. Suggested fix: add the same `if p.baseURL == "" { return ..., fmt.Errorf(...) }` guard (or a shared `errNoBaseURL` helper) at the top of the three management methods. Verified by reading the full `doJSON` path and the builtin construction. - `provider/llamaswap/image.go:58` (`imageRequest.N`) — a negative `N` is forwarded to the backend verbatim (`N int` with `omitempty` only suppresses zero). The doc (`imagegen.go:30`) says `0 = provider default` but says nothing about negatives; a caller passing `WithN(-1)` sends `n:-1` to the upstream, whose behavior is undefined. Minor edge case; a `if req.N < 0 { req.N = 0 }` clamp (or an explicit validation error) would close it. Unverified against any real backend behavior, so treat as a hardening nit rather than a confirmed bug. - `provider/llamaswap/image.go:80` — a base64 decode failure is returned as a plain wrapped `fmt.Errorf` (not `*llm.APIError`), so `llm.Classify` falls through to its "unrecognized → transient" default (`llm/errors.go:103-104`). A genuinely corrupt/truncated payload from the backend is a permanent condition for that response, not a transient one. Minor misclassification; wrapping as an `*llm.APIError` (or giving it an `Unwrap` to a permanent sentinel) would make failover decisions correct. Low severity since this path implies a protocol violation by the backend. No swallowed errors, missing defers, or panic-on-nil/empty-input issues were found: `New` always initializes `client`, the empty-body decode paths are only reached on success responses that carry JSON, and `sniffImageMIME` handles non-image bytes by defaulting to `image/png` rather than panicking. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 39s</sub>

🪰 Gadfly review — minimax-m3:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

Confirmed: &http.Client{} at line 81 has zero timeout; the ADR explicitly documents this as intentional. The reviewer correctly identified this as a documented, intentional choice — the finding itself is sound and the suggested fix is reasonable (a configurable default timeout rather than unbounded). Finding #3 is valid.

All three findings reproduce against the actual code. The verdict stands.


VERDICT: Minor issues

  • provider/llamaswap/llamaswap.go:201-205 and provider/llamaswap/image.go:80 — unbounded response body on the success path (DoS vector). doJSON only caps the error body via io.LimitReader(resp.Body, 1<<20) at llamaswap.go:214; the success path passes the raw resp.Body to json.NewDecoder(resp.Body).Decode(out) and the image path passes the (potentially gigabyte-scale) b64_json string to base64.StdEncoding.DecodeString. A malicious or misbehaving llama-swap upstream (or MITM in front of it, given the local-first http:// default at builtin.go:87) can exhaust caller memory by returning a multi-GB response. Suggested fix: bound the success body with io.LimitReader(resp.Body, <cap>) before decoding, mirroring the 1 MiB cap used on the error path, and consider rejecting Content-Length over the cap outright.

  • provider/llamaswap/llamaswap.go:109-121 vs provider/llamaswap/llamaswap.go:190-192 — bearer-auth inconsistency between chat and management/image paths. chatProvider() always emits Authorization: Bearer <key> (substituting the placeholder "no-key" when no token is set, lines 110-112), while doJSON omits the Authorization header entirely when p.token == "" (lines 190-192). The placeholder rationale ("a local llama-swap may require no auth at all — a bearer it ignores is harmless") is unsafe against a llama-swap or fronting proxy that treats any Authorization header as an auth attempt and rejects it, or that logs every such header as an auth failure; in that case a user who configured llama-swap keyless but is behind an auth-required proxy gets a confusing 401 on chat while image/management calls succeed with no header. Suggested fix: pick one policy — skipping the Authorization header on the chat path when p.token == "" (so chat, image, and management all behave identically) is the cleanest alignment.

  • provider/llamaswap/llamaswap.go:80-86 — default http.Client has no timeout. Documented as intentional (ADR-0015: cold model swaps can take many seconds, so callers must bound work with a context deadline). But the default for a bare llamaswap.New() is unbounded: a hung llama-swap host pins a goroutine indefinitely until the caller's context fires. Chain callers inherit a deadline through context, but *llamaswap.Provider users invoking ListModels/Running/Unload/ImageModel.Generate directly can forget, and the no-DSN built-in at builtin.go:83 is constructed without any client timeout at all. This is a resource-exhaustion / DoS concern in any non-chain integration path. Suggested fix: default to a generous explicit timeout (e.g., 5–10 minutes) instead of zero, or warn when p.client == http.DefaultClient / nil-timeout at New-time if a config knob is available.

🎯 Correctness — No material issues found

VERDICT: No material issues found

I reviewed the change through the correctness lens, focusing on the new provider/llamaswap package, the imagegen package, and the DSN/scheme wiring in builtin.go.

What I verified:

  • provider/llamaswap/llamaswap.gochatProvider() sets openai.WithAPIKey("no-key") when the llama-swap token is empty, which correctly side-steps the synthetic 401 the openai client throws on a blank apiKey (confirmed provider/openai/model.go:76-87). It also passes WithLegacyMaxTokens() (verified the option exists at provider/openai/openai.go:80-85) and forwards WithHTTPClient(p.client) so the no-timeout swap behavior is preserved.
  • provider/llamaswap/image.goresponse_format is sent without omitempty, so b64_json is always forced inline. The decode path handles B64JSON == "" (returns *llm.APIError), base64 errors (wrapped), and an empty data array (*llm.APIError). sniffImageMIME falls back to image/png correctly for PNG bytes.
  • provider/llamaswap/llamaswap.go Unload — the path concatenation "/api/models/unload/" + model and the empty-model short-circuit match the PR description and the test.
  • builtin.gollama-swap://token@host:port is built as "http://"+dsn.Host. The no-DSN built-in has an empty baseURL; Model/ImageModel both gate on p.baseURL == "" and return a clear error (matches the TestLlamaSwapBuiltinNoURL expectation).
  • imagegen/imagegen.goRequest.Apply operates on a copy (verified by TestRequestApply's immutability assertion). Image = llm.ImagePart so the alias holds for llm.UserParts(...) use; TestImageIsImagePart pins this.
  • majordomo.go — re-exports use type aliases (type ImageModel = imagegen.Model, etc.), so identity is preserved.
  • Tests — chat delegation asserts legacy max_tokens (not max_completion_tokens) is on the wire, the bearer placeholder is sent, all management endpoints hit the expected paths, and 429 classifies as ClassTransient. These pin the contract described in the ADRs.

Potential concerns I checked and dismissed:

  • The synthetic-401 footgun on the openai client for empty keys is explicitly handled by the no-key placeholder — not a bug.
  • Raw carrying a *imageResponse pointer (rather than a copy) is consistent with llm.Response.Raw being a provider-native escape hatch; the API docs say "May be nil; never required for normal use."
  • Constructing a fresh openai.Provider on every Model() call is wasteful but not a correctness issue.

Things I could not verify within the budget (not reporting as findings):

  • Whether provider/openai's WithLegacyMaxTokens actually flips the wire field — the openai-side unit test at provider/openai/openai_test.go:219 claims it does, but I didn't open that file. The llamaswap test asserts the absence of max_completion_tokens and presence of max_tokens, which would fail at go test -race if it weren't honored.
  • That upstream llama-swap actually exposes /v1/images/generations for stable-diffusion.cpp backends (an upstream/runtime concern, not a code concern).
🧯 Error handling & edge cases — Minor issues

VERDICT: Minor issues

Reviewing provider/llamaswap/llamaswap.go and provider/llamaswap/image.go through the error-handling / edge-cases lens only.

Findings

  • provider/llamaswap/image.go:64 — image errors lose the model id. The shared doJSON helper is invoked as m.p.doJSON(ctx, http.MethodPost, "/v1/images/generations", &wire, &resp), with no model argument. Internally doJSON then calls p.apiError(resp, "") (llamaswap.go:199), producing a *llm.APIError{Provider: p.name, Model: ""} (llamaswap.go:213). Every other provider in the package (and provider/openai's own helper at provider/openai/model.go:111) fills Model on error responses. Even within image.go itself, the two other *llm.APIError constructions (image.go:74, image.go:87) correctly populate m.id, so this is an internal inconsistency: the doJSON-mediated HTTP error path silently drops the id that the surrounding code already has. llm.Classify doesn't read Model, but the structured error's Error() method (llm/errors.go:66-82) formats Provider/Model, so callers that log the error see llama-swap/: HTTP 500: ... instead of llama-swap/sd: HTTP 500: .... Suggested fix: thread the model through — e.g. add a doJSONModel variant, or change doJSON to take a model string (passing "" for the no-model endpoints and m.id from imageModel.Generate).

  • provider/llamaswap/image.go:80 — unbounded b64 decode. base64.StdEncoding.DecodeString(d.B64JSON) materializes the full decoded payload in memory before sniffImageMIME ever inspects it. A backend (or a malicious proxy) returning a multi-MB b64 entry will cause a single allocation spike; combined with the request body already being buffered by http.Client, the provider has no cap on total image memory. There is also no check that the decoded length is sane. Suggested fix: cap the b64 length (or decoded size) before decoding, e.g. if len(d.B64JSON) > 16<<20 { return nil, &llm.APIError{...} }, matching the 1 MiB response-body cap already in apiError (llamaswap.go:214).

  • provider/llamaswap/image.go:98 — MIME sniff silently defaults to image/png on empty/ambiguous data. http.DetectContentType(nil) returns "text/plain; charset=utf-8", which sniffImageMIME rewrites to "image/png". The only way to reach sniffImageMIME is past a non-empty d.B64JSON check (image.go:70), but base64.StdEncoding.DecodeString("AA==") returns a non-empty-but-meaningless single-byte slice that DetectContentType will still classify as text/plain (or similar). The caller then sees a confidently-labelled image/png for what may not be a PNG at all. The function comment justifies this with "stable-diffusion.cpp emits PNG" but does so unconditionally for any non-image/ MIME. Suggested fix: validate that DetectContentType returned something PNG/JPEG/WEBP for known signatures (or refuse to set MIME when detection fails) — at minimum log/return the unsupported case rather than silently masking it.

  • provider/llamaswap/llamaswap.go:201-204 — no body size cap on the success-path JSON decode. doJSON caps any error body at 1 MiB (io.LimitReader(resp.Body, 1<<20) in apiError), but the success-path json.NewDecoder(resp.Body).Decode(out) will buffer the entire response — a runaway stable-diffusion.cpp backend returning gigabytes would OOM. Outside the error-handling lane (DoS) but worth flagging given the sibling 1 MiB cap already in the file. Suggested fix: wrap with http.MaxBytesReader or stream-decode into a LimitedReader with a sane cap (image generations are typically <<16 MiB).

Not material (in this lane)

  • Empty/whitespace prompt → ErrUnsupported path is correct and covered by TestImageGenerateEmptyPrompt.
  • Missing baseURL on Model / ImageModel returns a clear error string. ✓
  • Running's json.RawMessage decode path and Unload's empty-vs-named model branch are both handled. ✓
  • Transport errors are wrapped raw so llm.Classify can see the underlying net.Error/syscall.Errno. ✓
  • defer resp.Body.Close() is in place; partial-image error returns correctly discard out. ✓
  • Tests do not cover: a backend that returns a valid 200 with data: [] (the "no images" path is unit-testable but currently has no httptest), and a malformed-image b64 entry (the decode-error branch). Not bugs in the diff, just under-coverage worth noting.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 37s

<!-- gadfly-review:ollama:minimax-m3:cloud --> ### 🪰 Gadfly review — `minimax-m3:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> Confirmed: `&http.Client{}` at line 81 has zero timeout; the ADR explicitly documents this as intentional. The reviewer correctly identified this as a documented, intentional choice — the finding itself is sound and the suggested fix is reasonable (a configurable default timeout rather than unbounded). Finding #3 is valid. All three findings reproduce against the actual code. The verdict stands. --- **VERDICT:** Minor issues - **`provider/llamaswap/llamaswap.go:201-205` and `provider/llamaswap/image.go:80` — unbounded response body on the success path (DoS vector).** `doJSON` only caps the error body via `io.LimitReader(resp.Body, 1<<20)` at `llamaswap.go:214`; the success path passes the raw `resp.Body` to `json.NewDecoder(resp.Body).Decode(out)` and the image path passes the (potentially gigabyte-scale) `b64_json` string to `base64.StdEncoding.DecodeString`. A malicious or misbehaving llama-swap upstream (or MITM in front of it, given the local-first `http://` default at `builtin.go:87`) can exhaust caller memory by returning a multi-GB response. Suggested fix: bound the success body with `io.LimitReader(resp.Body, <cap>)` before decoding, mirroring the 1 MiB cap used on the error path, and consider rejecting `Content-Length` over the cap outright. - **`provider/llamaswap/llamaswap.go:109-121` vs `provider/llamaswap/llamaswap.go:190-192` — bearer-auth inconsistency between chat and management/image paths.** `chatProvider()` always emits `Authorization: Bearer <key>` (substituting the placeholder `"no-key"` when no token is set, lines 110-112), while `doJSON` omits the `Authorization` header entirely when `p.token == ""` (lines 190-192). The placeholder rationale ("a local llama-swap may require no auth at all — a bearer it ignores is harmless") is unsafe against a llama-swap or fronting proxy that treats *any* `Authorization` header as an auth attempt and rejects it, or that logs every such header as an auth failure; in that case a user who configured llama-swap keyless but is behind an auth-required proxy gets a confusing 401 on chat while image/management calls succeed with no header. Suggested fix: pick one policy — skipping the `Authorization` header on the chat path when `p.token == ""` (so chat, image, and management all behave identically) is the cleanest alignment. - **`provider/llamaswap/llamaswap.go:80-86` — default `http.Client` has no timeout.** Documented as intentional (ADR-0015: cold model swaps can take many seconds, so callers must bound work with a context deadline). But the default for a bare `llamaswap.New()` is unbounded: a hung llama-swap host pins a goroutine indefinitely until the caller's context fires. Chain callers inherit a deadline through context, but `*llamaswap.Provider` users invoking `ListModels`/`Running`/`Unload`/`ImageModel.Generate` directly can forget, and the no-DSN built-in at `builtin.go:83` is constructed without any client timeout at all. This is a resource-exhaustion / DoS concern in any non-chain integration path. Suggested fix: default to a generous explicit timeout (e.g., 5–10 minutes) instead of zero, or warn when `p.client == http.DefaultClient` / nil-timeout at `New`-time if a config knob is available. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> VERDICT: No material issues found I reviewed the change through the correctness lens, focusing on the new `provider/llamaswap` package, the `imagegen` package, and the DSN/scheme wiring in `builtin.go`. What I verified: - **`provider/llamaswap/llamaswap.go`** — `chatProvider()` sets `openai.WithAPIKey("no-key")` when the llama-swap token is empty, which correctly side-steps the synthetic 401 the openai client throws on a blank `apiKey` (confirmed `provider/openai/model.go:76-87`). It also passes `WithLegacyMaxTokens()` (verified the option exists at `provider/openai/openai.go:80-85`) and forwards `WithHTTPClient(p.client)` so the no-timeout swap behavior is preserved. - **`provider/llamaswap/image.go`** — `response_format` is sent without `omitempty`, so `b64_json` is always forced inline. The decode path handles `B64JSON == ""` (returns `*llm.APIError`), base64 errors (wrapped), and an empty `data` array (`*llm.APIError`). `sniffImageMIME` falls back to `image/png` correctly for PNG bytes. - **`provider/llamaswap/llamaswap.go` `Unload`** — the path concatenation `"/api/models/unload/" + model` and the empty-model short-circuit match the PR description and the test. - **`builtin.go`** — `llama-swap://token@host:port` is built as `"http://"+dsn.Host`. The no-DSN built-in has an empty `baseURL`; `Model`/`ImageModel` both gate on `p.baseURL == ""` and return a clear error (matches the `TestLlamaSwapBuiltinNoURL` expectation). - **`imagegen/imagegen.go`** — `Request.Apply` operates on a copy (verified by `TestRequestApply`'s immutability assertion). `Image = llm.ImagePart` so the alias holds for `llm.UserParts(...)` use; `TestImageIsImagePart` pins this. - **`majordomo.go`** — re-exports use type aliases (`type ImageModel = imagegen.Model`, etc.), so identity is preserved. - **Tests** — chat delegation asserts legacy `max_tokens` (not `max_completion_tokens`) is on the wire, the bearer placeholder is sent, all management endpoints hit the expected paths, and `429` classifies as `ClassTransient`. These pin the contract described in the ADRs. Potential concerns I checked and dismissed: - The synthetic-401 footgun on the openai client for empty keys is explicitly handled by the `no-key` placeholder — not a bug. - `Raw` carrying a `*imageResponse` pointer (rather than a copy) is consistent with `llm.Response.Raw` being a provider-native escape hatch; the API docs say "May be nil; never required for normal use." - Constructing a fresh `openai.Provider` on every `Model()` call is wasteful but not a correctness issue. Things I could not verify within the budget (not reporting as findings): - Whether `provider/openai`'s `WithLegacyMaxTokens` actually flips the wire field — the openai-side unit test at `provider/openai/openai_test.go:219` claims it does, but I didn't open that file. The llamaswap test asserts the absence of `max_completion_tokens` and presence of `max_tokens`, which would fail at `go test -race` if it weren't honored. - That upstream llama-swap actually exposes `/v1/images/generations` for stable-diffusion.cpp backends (an upstream/runtime concern, not a code concern). </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> ## VERDICT: Minor issues Reviewing `provider/llamaswap/llamaswap.go` and `provider/llamaswap/image.go` through the error-handling / edge-cases lens only. ### Findings - **`provider/llamaswap/image.go:64` — image errors lose the model id.** The shared `doJSON` helper is invoked as `m.p.doJSON(ctx, http.MethodPost, "/v1/images/generations", &wire, &resp)`, with no model argument. Internally `doJSON` then calls `p.apiError(resp, "")` (llamaswap.go:199), producing a `*llm.APIError{Provider: p.name, Model: ""}` (llamaswap.go:213). Every other provider in the package (and `provider/openai`'s own helper at `provider/openai/model.go:111`) fills `Model` on error responses. Even within `image.go` itself, the two other `*llm.APIError` constructions (image.go:74, image.go:87) correctly populate `m.id`, so this is an internal inconsistency: the `doJSON`-mediated HTTP error path silently drops the id that the surrounding code already has. `llm.Classify` doesn't read `Model`, but the structured error's `Error()` method (`llm/errors.go:66-82`) formats `Provider/Model`, so callers that log the error see `llama-swap/: HTTP 500: ...` instead of `llama-swap/sd: HTTP 500: ...`. Suggested fix: thread the model through — e.g. add a `doJSONModel` variant, or change `doJSON` to take a model string (passing `""` for the no-model endpoints and `m.id` from `imageModel.Generate`). - **`provider/llamaswap/image.go:80` — unbounded b64 decode.** `base64.StdEncoding.DecodeString(d.B64JSON)` materializes the full decoded payload in memory before `sniffImageMIME` ever inspects it. A backend (or a malicious proxy) returning a multi-MB b64 entry will cause a single allocation spike; combined with the request body already being buffered by `http.Client`, the provider has no cap on total image memory. There is also no check that the decoded length is sane. Suggested fix: cap the b64 length (or decoded size) before decoding, e.g. `if len(d.B64JSON) > 16<<20 { return nil, &llm.APIError{...} }`, matching the 1 MiB response-body cap already in `apiError` (llamaswap.go:214). - **`provider/llamaswap/image.go:98` — MIME sniff silently defaults to `image/png` on empty/ambiguous data.** `http.DetectContentType(nil)` returns `"text/plain; charset=utf-8"`, which `sniffImageMIME` rewrites to `"image/png"`. The only way to reach `sniffImageMIME` is past a non-empty `d.B64JSON` check (image.go:70), but `base64.StdEncoding.DecodeString("AA==")` returns a non-empty-but-meaningless single-byte slice that `DetectContentType` will still classify as `text/plain` (or similar). The caller then sees a confidently-labelled `image/png` for what may not be a PNG at all. The function comment justifies this with "stable-diffusion.cpp emits PNG" but does so unconditionally for any non-`image/` MIME. Suggested fix: validate that `DetectContentType` returned something PNG/JPEG/WEBP for known signatures (or refuse to set `MIME` when detection fails) — at minimum log/return the unsupported case rather than silently masking it. - **`provider/llamaswap/llamaswap.go:201-204` — no body size cap on the success-path JSON decode.** `doJSON` caps any *error* body at 1 MiB (`io.LimitReader(resp.Body, 1<<20)` in `apiError`), but the success-path `json.NewDecoder(resp.Body).Decode(out)` will buffer the entire response — a runaway stable-diffusion.cpp backend returning gigabytes would OOM. Outside the error-handling lane (DoS) but worth flagging given the sibling 1 MiB cap already in the file. Suggested fix: wrap with `http.MaxBytesReader` or stream-decode into a `LimitedReader` with a sane cap (image generations are typically <<16 MiB). ### Not material (in this lane) - Empty/whitespace prompt → `ErrUnsupported` path is correct and covered by `TestImageGenerateEmptyPrompt`. - Missing `baseURL` on `Model` / `ImageModel` returns a clear error string. ✓ - `Running`'s `json.RawMessage` decode path and `Unload`'s empty-vs-named model branch are both handled. ✓ - Transport errors are wrapped raw so `llm.Classify` can see the underlying `net.Error`/`syscall.Errno`. ✓ - `defer resp.Body.Close()` is in place; partial-image error returns correctly discard `out`. ✓ - Tests do not cover: a backend that returns a valid 200 with `data: []` (the "no images" path is unit-testable but currently has no httptest), and a malformed-image b64 entry (the decode-error branch). Not bugs in the diff, just under-coverage worth noting. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 37s</sub>

🪰 Gadfly review — m5/qwen3.6:35b-mlx (m5)

Verdict: No material issues found · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling

🔒 Security — Reviewed

Corrected Review

No material security issues found

🎯 Correctness — No material issues found

No material issues found

🧯 Error handling & edge cases⚠️ could not complete

⚠️ This reviewer failed to complete: agent: step 9: context deadline exceeded

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 18m 20s

<!-- gadfly-review:ollama:m5/qwen3.6:35b-mlx --> ### 🪰 Gadfly review — `m5/qwen3.6:35b-mlx` (m5) **Verdict: No material issues found · ⚠️ 1/3 lens(es) errored** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Reviewed</summary> ## Corrected Review No material security issues found </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **No material issues found** </details> <details><summary><b>🧯 Error handling & edge cases</b> — ⚠️ could not complete</summary> ⚠️ This reviewer failed to complete: agent: step 9: context deadline exceeded </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 18m 20s</sub>

🪰 Gadfly review — glm-5.1:cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

VERDICT: Minor issues

  • provider/llamaswap/llamaswap.go:160–165 — Path traversal in Unload: The model parameter is concatenated directly into the URL path without percent-encoding (path += "/" + model), which is then joined with p.baseURL at line 183 (p.baseURL + path). A model name containing ../ segments would produce a path like /api/models/unload/../v1/chat/completions, which many HTTP servers would normalize before routing — potentially hitting an unintended endpoint. While model names are typically operator-controlled, Running() returns model names from the server's /running endpoint as raw JSON; a compromised or misconfigured llama-swap could inject traversal strings that a caller feeds back into Unload. This is a defense-in-depth concern rather than a remote exploit, but url.PathEscape(model) or validation against / and .. segments would harden it.

  • provider/llamaswap/llamaswap.go:201–204 — No response body size limit on success path: doJSON limits error-response bodies to 1 MB (LimitReader at line 214), but the success-path json.NewDecoder(resp.Body).Decode(out) at line 202 has no bound. For the image-generation endpoint, a malicious or misconfigured llama-swap could return an arbitrarily large JSON payload, causing OOM since the full imageResponse (including base64 image data) is materialized in memory. The same unbounded-decode pattern exists in other providers (e.g., provider/ollama/wire.go:296), but image payloads can be orders of magnitude larger than chat responses, making this more impactful. Consider wrapping resp.Body with io.LimitReader before decoding in doJSON, at least when the caller expects image-scale payloads.

🎯 Correctness — No material issues found

VERDICT: No material issues found

After thorough verification of the diff against the checked-out code, I found no logic bugs or correctness issues:

  • DSN → base URL: builtin.go:87 constructs "http://"+dsn.Host. Verified DSN.Host is hostname[:port][/path] with no scheme prefix (env.go:36), so llama-swap://tok@box.local:8080 correctly yields http://box.local:8080. ✓
  • Placeholder key vs. blank key: chatProvider() sends Bearer no-key when token is empty (llamaswap.go:110-113), correctly bypassing the openai client's synthetic 401 on blank keys (openai/model.go:76-87). Management/image endpoints via doJSON omit Authorization entirely when token is empty (llamaswap.go:190-192) — correct for local keyless use. ✓
  • omitempty on imageRequest.N: N=0 omits the field; OpenAI defaults omitted n to 1, matching the documented 0 = provider default semantics. ✓
  • imageRequest.ResponseFormat has no omitempty, so "b64_json" is always sent — correct for the bytes-only contract. ✓
  • Empty baseURL guard: Both Model() and ImageModel() check p.baseURL == "" before proceeding, preventing a chatProvider() call with an invalid URL. ✓
  • imagegen.Request.Apply correctly copies the receiver before applying mutations, verified by the test in imagegen/imagegen_test.go. ✓
🧯 Error handling & edge cases — Minor issues

All three findings are confirmed against the actual code. The verdict stands.

VERDICT: Minor issues

  • provider/llamaswap/llamaswap.go:135–165 — Management methods lack empty-baseURL guard. Model() (line 98) and ImageModel() (image.go:18) both check p.baseURL == "" and return a clear error. ListModels, Running, and Unload skip directly to doJSON, which concatenates "" + "/v1/models" into the relative URL /v1/models. http.NewRequestWithContext accepts this, producing a confusing network error (Get "/v1/models": unsupported protocol scheme "") instead of the helpful "no base URL configured" message that the other two methods give. Fix: add the same p.baseURL == "" check at the top of each management method (or a shared validation in doJSON).

  • provider/llamaswap/llamaswap.go:161–164Unload concatenates model name into the URL path without escaping. Model identifiers commonly contain colons (qwen3:14b) or other URL-significant characters. The path "/api/models/unload/" + model will produce /api/models/unload/qwen3:14b, where the colon is technically valid in a path segment but any model name containing /, ?, #, or spaces would silently produce a wrong or broken URL. Fix: use url.PathEscape(model) (or url.PathEscape on the model portion) so characters like spaces and slashes are percent-encoded.

  • provider/llamaswap/llamaswap.go:197–206doJSON doesn't drain the response body on the success path when out == nil. For Unload (which passes nil for out), the response body is closed by defer but never read. In Go's net/http, undrained bodies prevent connection reuse; the transport will drop and recreate the TCP connection. For the small payloads expected here this is minor, but it means every unload call forces a new connection. Other providers in this codebase (e.g. ollama) have the same pattern, so this is consistent; noting for completeness.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 43s

<!-- gadfly-review:ollama:glm-5.1:cloud --> ### 🪰 Gadfly review — `glm-5.1:cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> **VERDICT: Minor issues** - **`provider/llamaswap/llamaswap.go:160–165` — Path traversal in `Unload`**: The `model` parameter is concatenated directly into the URL path without percent-encoding (`path += "/" + model`), which is then joined with `p.baseURL` at line 183 (`p.baseURL + path`). A model name containing `../` segments would produce a path like `/api/models/unload/../v1/chat/completions`, which many HTTP servers would normalize before routing — potentially hitting an unintended endpoint. While model names are typically operator-controlled, `Running()` returns model names from the server's `/running` endpoint as raw JSON; a compromised or misconfigured llama-swap could inject traversal strings that a caller feeds back into `Unload`. This is a defense-in-depth concern rather than a remote exploit, but `url.PathEscape(model)` or validation against `/` and `..` segments would harden it. - **`provider/llamaswap/llamaswap.go:201–204` — No response body size limit on success path**: `doJSON` limits error-response bodies to 1 MB (`LimitReader` at line 214), but the success-path `json.NewDecoder(resp.Body).Decode(out)` at line 202 has no bound. For the image-generation endpoint, a malicious or misconfigured llama-swap could return an arbitrarily large JSON payload, causing OOM since the full `imageResponse` (including base64 image data) is materialized in memory. The same unbounded-decode pattern exists in other providers (e.g., `provider/ollama/wire.go:296`), but image payloads can be orders of magnitude larger than chat responses, making this more impactful. Consider wrapping `resp.Body` with `io.LimitReader` before decoding in `doJSON`, at least when the caller expects image-scale payloads. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> VERDICT: No material issues found After thorough verification of the diff against the checked-out code, I found no logic bugs or correctness issues: - **DSN → base URL**: `builtin.go:87` constructs `"http://"+dsn.Host`. Verified `DSN.Host` is `hostname[:port][/path]` with no scheme prefix (`env.go:36`), so `llama-swap://tok@box.local:8080` correctly yields `http://box.local:8080`. ✓ - **Placeholder key vs. blank key**: `chatProvider()` sends `Bearer no-key` when token is empty (`llamaswap.go:110-113`), correctly bypassing the openai client's synthetic 401 on blank keys (`openai/model.go:76-87`). Management/image endpoints via `doJSON` omit Authorization entirely when token is empty (`llamaswap.go:190-192`) — correct for local keyless use. ✓ - **`omitempty` on `imageRequest.N`**: `N=0` omits the field; OpenAI defaults omitted `n` to 1, matching the documented `0 = provider default` semantics. ✓ - **`imageRequest.ResponseFormat`** has no `omitempty`, so `"b64_json"` is always sent — correct for the bytes-only contract. ✓ - **Empty baseURL guard**: Both `Model()` and `ImageModel()` check `p.baseURL == ""` before proceeding, preventing a `chatProvider()` call with an invalid URL. ✓ - **`imagegen.Request.Apply`** correctly copies the receiver before applying mutations, verified by the test in `imagegen/imagegen_test.go`. ✓ </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> All three findings are confirmed against the actual code. The verdict stands. **VERDICT: Minor issues** - **`provider/llamaswap/llamaswap.go:135–165` — Management methods lack empty-`baseURL` guard.** `Model()` (line 98) and `ImageModel()` (image.go:18) both check `p.baseURL == ""` and return a clear error. `ListModels`, `Running`, and `Unload` skip directly to `doJSON`, which concatenates `"" + "/v1/models"` into the relative URL `/v1/models`. `http.NewRequestWithContext` accepts this, producing a confusing network error (`Get "/v1/models": unsupported protocol scheme ""`) instead of the helpful "no base URL configured" message that the other two methods give. **Fix:** add the same `p.baseURL == ""` check at the top of each management method (or a shared validation in `doJSON`). - **`provider/llamaswap/llamaswap.go:161–164` — `Unload` concatenates model name into the URL path without escaping.** Model identifiers commonly contain colons (`qwen3:14b`) or other URL-significant characters. The path `"/api/models/unload/" + model` will produce `/api/models/unload/qwen3:14b`, where the colon is technically valid in a path segment but any model name containing `/`, `?`, `#`, or spaces would silently produce a wrong or broken URL. **Fix:** use `url.PathEscape(model)` (or `url.PathEscape` on the model portion) so characters like spaces and slashes are percent-encoded. - **`provider/llamaswap/llamaswap.go:197–206` — `doJSON` doesn't drain the response body on the success path when `out == nil`.** For `Unload` (which passes `nil` for `out`), the response body is closed by `defer` but never read. In Go's `net/http`, undrained bodies prevent connection reuse; the transport will drop and recreate the TCP connection. For the small payloads expected here this is minor, but it means every unload call forces a new connection. Other providers in this codebase (e.g. ollama) have the same pattern, so this is consistent; noting for completeness. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 43s</sub>

🪰 Gadfly review — kimi-k2.7-code:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

Verdict: Minor issues

Security findings:

  • provider/llamaswap/llamaswap.go:163 — Path injection in Unload. The model argument is concatenated directly into the URL path (path += "/" + model). Model IDs are usually provider-controlled, but the string comes from a caller and is not escaped. A malicious or accidental value such as ../../../other/path can traverse the API path (the leading /api/models/unload/ only partly anchors it). Verified by reading llamaswap.go; no net/url.PathEscape is used. Fix: use url.PathEscape when constructing the path.

  • provider/llamaswap/llamaswap.go:183p.baseURL + path may produce malformed or unexpected URLs. Base URL is stored trimmed of trailing slashes, but if a caller passes a base URL with a path component (e.g. http://host:8080/prefix) the resulting URL for /v1/models becomes http://host:8080/prefix/v1/models, which is acceptable, but the concatenation bypasses url.URL parsing and provides no validation that the result is a valid URL. Verified from the source. Combined with the path-injection issue above, this is the relevant construction point. Fix: build URLs through url.Parse + url.URL.JoinPath/PathEscape.

  • provider/llamaswap/image.go:80 — Base64 decoding has no size limit. A hostile backend can return an arbitrarily large b64_json string and the provider will decode it fully into memory. There is no request/response byte cap here; only apiError limits error bodies. Verified by reading image.go. Fix: limit the decoded payload size (e.g. cap the input string length or use a base64.NewDecoder wrapped with an io.LimitReader) before allocating.

  • provider/llamaswap/llamaswap.go:111-113no-key bearer sent over HTTP. When no token is configured the provider sends Authorization: Bearer no-key. The comment correctly notes llama-swap is local-first and will ignore it, but the default llama-swap:// scheme builds an http URL and the built-in also uses an HTTP base. If a user points this provider at an HTTPS-fronted or non-local instance, a real but harmless-looking bearer leaks a fixed placeholder on the wire. Verified from llamaswap.go and builtin.go:87. This is low-severity because the value is a well-known placeholder, but it is still a credential-like header emitted without user opt-in. Fix: only send an Authorization header when the user actually configured a token, and document that keyless instances may need no auth; do not synthesize a placeholder for all traffic.

No other material security issues were identified in the new code through the assigned lens.

🎯 Correctness — Blocking issues found

Blocking issues found

  • provider/llamaswap/llamaswap.go:135, 149, 160 — Management methods ListModels, Running, and Unload do not validate p.baseURL == "" before calling doJSON. When the no-DSN built-in (or any unconfigured provider) is used, doJSON builds a relative URL like /v1/models, and http.Client.Do returns http: no Host in request URL. That transport error is then classified by llm.Classify as transient (default path), even though it is a permanent misconfiguration. This contradicts the documented behavior that the no-DSN built-in "errors clearly on use" (mirroring foreman) and wastes retries in failover chains. Fix: add the same p.baseURL == "" guard used in Model/ImageModel to all three management methods, returning a clear permanent error.

  • provider/llamaswap/image.go:58req.N is passed straight into the wire request without validation. A negative value (e.g., imagegen.WithN(-1)) is emitted as "n": -1, which is semantically invalid and will be rejected by the backend. Fix: reject req.N < 0 early with a clear error (llm.ErrUnsupported or a dedicated message).

  • provider/llamaswap/image.go:74-78 — When a response entry has no b64_json, the code returns &llm.APIError{...} with Status: 0 and no wrapped sentinel error. Because Status == 0, llm.Classify falls through to the default and marks it transient. A backend that ignores response_format: b64_json and returns URL-only entries is effectively an unsupported response-mode failure; treating it as transient can cause spurious retries. Fix: wrap/flag this as llm.ErrUnsupported so classification is permanent, matching the bytes-only contract enforced by llm.ImagePart.

🧯 Error handling & edge cases — Minor issues

Minor issues

  • provider/llamaswap/llamaswap.go:163Unload builds the path with raw string concatenation (path += "/" + model) and does not URL-escape the model id. A model id containing /, %, spaces, or other reserved characters will produce an invalid or path-traversing URL. Use url.PathEscape(model) before concatenating.

  • provider/llamaswap/image.go:35 / provider/llamaswap/image.go:58Request.N is forwarded verbatim to the wire request with json:"n,omitempty". A negative value is not treated as empty, so it will be serialized and sent to the backend. Validate req.N >= 0 (or treat negative as default) and return a clear error, otherwise callers can accidentally emit nonsensical requests.

  • provider/llamaswap/image.go:64 — Management/chat errors report the model because doJSON is wired through paths that know the model, but image-generation errors call doJSON with an empty model string, so *llm.APIError instances from the image endpoint have Model: "" and less useful diagnostics. Consider adding the model id to the error path, e.g. by making doJSON accept a model parameter or by wrapping non-2xx errors inside Generate.

  • provider/llamaswap/image.go:51 — An empty/whitespace-only prompt returns llm.ErrUnsupported. ErrUnsupported is documented as a target capability mismatch, not a caller validation failure; while classification still comes out permanent, a dedicated error message (e.g. a plain error instead of ErrUnsupported) would be more accurate. If the permanent classification is intentional, a short comment justifying the choice would help future readers.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 58s

<!-- gadfly-review:ollama:kimi-k2.7-code:cloud --> ### 🪰 Gadfly review — `kimi-k2.7-code:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> **Verdict:** Minor issues Security findings: - **`provider/llamaswap/llamaswap.go:163` — Path injection in `Unload`.** The `model` argument is concatenated directly into the URL path (`path += "/" + model`). Model IDs are usually provider-controlled, but the string comes from a caller and is not escaped. A malicious or accidental value such as `../../../other/path` can traverse the API path (the leading `/api/models/unload/` only partly anchors it). Verified by reading `llamaswap.go`; no `net/url.PathEscape` is used. **Fix:** use `url.PathEscape` when constructing the path. - **`provider/llamaswap/llamaswap.go:183` — `p.baseURL + path` may produce malformed or unexpected URLs.** Base URL is stored trimmed of trailing slashes, but if a caller passes a base URL with a path component (e.g. `http://host:8080/prefix`) the resulting URL for `/v1/models` becomes `http://host:8080/prefix/v1/models`, which is acceptable, but the concatenation bypasses `url.URL` parsing and provides no validation that the result is a valid URL. Verified from the source. Combined with the path-injection issue above, this is the relevant construction point. **Fix:** build URLs through `url.Parse` + `url.URL.JoinPath`/`PathEscape`. - **`provider/llamaswap/image.go:80` — Base64 decoding has no size limit.** A hostile backend can return an arbitrarily large `b64_json` string and the provider will decode it fully into memory. There is no request/response byte cap here; only `apiError` limits error bodies. Verified by reading `image.go`. **Fix:** limit the decoded payload size (e.g. cap the input string length or use a `base64.NewDecoder` wrapped with an `io.LimitReader`) before allocating. - **`provider/llamaswap/llamaswap.go:111-113` — `no-key` bearer sent over HTTP.** When no token is configured the provider sends `Authorization: Bearer no-key`. The comment correctly notes llama-swap is local-first and will ignore it, but the default `llama-swap://` scheme builds an **http** URL and the built-in also uses an HTTP base. If a user points this provider at an HTTPS-fronted or non-local instance, a real but harmless-looking bearer leaks a fixed placeholder on the wire. Verified from `llamaswap.go` and `builtin.go:87`. This is low-severity because the value is a well-known placeholder, but it is still a credential-like header emitted without user opt-in. **Fix:** only send an `Authorization` header when the user actually configured a token, and document that keyless instances may need no auth; do not synthesize a placeholder for all traffic. No other material security issues were identified in the new code through the assigned lens. </details> <details><summary><b>🎯 Correctness</b> — Blocking issues found</summary> **Blocking issues found** - **`provider/llamaswap/llamaswap.go:135`, `149`, `160`** — Management methods `ListModels`, `Running`, and `Unload` do not validate `p.baseURL == ""` before calling `doJSON`. When the no-DSN built-in (or any unconfigured provider) is used, `doJSON` builds a relative URL like `/v1/models`, and `http.Client.Do` returns `http: no Host in request URL`. That transport error is then classified by `llm.Classify` as **transient** (default path), even though it is a permanent misconfiguration. This contradicts the documented behavior that the no-DSN built-in "errors clearly on use" (mirroring `foreman`) and wastes retries in failover chains. **Fix:** add the same `p.baseURL == ""` guard used in `Model`/`ImageModel` to all three management methods, returning a clear permanent error. - **`provider/llamaswap/image.go:58`** — `req.N` is passed straight into the wire request without validation. A negative value (e.g., `imagegen.WithN(-1)`) is emitted as `"n": -1`, which is semantically invalid and will be rejected by the backend. **Fix:** reject `req.N < 0` early with a clear error (`llm.ErrUnsupported` or a dedicated message). - **`provider/llamaswap/image.go:74-78`** — When a response entry has no `b64_json`, the code returns `&llm.APIError{...}` with `Status: 0` and no wrapped sentinel error. Because `Status == 0`, `llm.Classify` falls through to the default and marks it **transient**. A backend that ignores `response_format: b64_json` and returns URL-only entries is effectively an unsupported response-mode failure; treating it as transient can cause spurious retries. **Fix:** wrap/flag this as `llm.ErrUnsupported` so classification is permanent, matching the bytes-only contract enforced by `llm.ImagePart`. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> **Minor issues** - `provider/llamaswap/llamaswap.go:163` — `Unload` builds the path with raw string concatenation (`path += "/" + model`) and does not URL-escape the model id. A model id containing `/`, `%`, spaces, or other reserved characters will produce an invalid or path-traversing URL. Use `url.PathEscape(model)` before concatenating. - `provider/llamaswap/image.go:35` / `provider/llamaswap/image.go:58` — `Request.N` is forwarded verbatim to the wire request with `json:"n,omitempty"`. A negative value is not treated as empty, so it will be serialized and sent to the backend. Validate `req.N >= 0` (or treat negative as default) and return a clear error, otherwise callers can accidentally emit nonsensical requests. - `provider/llamaswap/image.go:64` — Management/chat errors report the model because `doJSON` is wired through paths that know the model, but image-generation errors call `doJSON` with an empty model string, so `*llm.APIError` instances from the image endpoint have `Model: ""` and less useful diagnostics. Consider adding the model id to the error path, e.g. by making `doJSON` accept a model parameter or by wrapping non-2xx errors inside `Generate`. - `provider/llamaswap/image.go:51` — An empty/whitespace-only prompt returns `llm.ErrUnsupported`. `ErrUnsupported` is documented as a target capability mismatch, not a caller validation failure; while classification still comes out permanent, a dedicated error message (e.g. a plain error instead of `ErrUnsupported`) would be more accurate. If the permanent classification is intentional, a short comment justifying the choice would help future readers. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 58s</sub>

🪰 Gadfly review — deepseek-v4-pro:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

Verdict: Minor issues

  • provider/llamaswap/llamaswap.go:160-165 — Path injection in Unload. The model argument is concatenated directly into the URL path (path += "/" + model) with no escaping. A model name like qwen?foo=bar injects a query string; qwen#frag injects a fragment. While http.NewRequestWithContext will normalize .. segments, query/fragment injection is not mitigated. The model value originates from the caller (ultimately user input in mort). Fix: use url.PathEscape(model) when appending to the path. (Verified by reading llamaswap.go:160-165 and confirming no url.PathEscape, path.Clean, or any sanitization is applied.)

  • builtin.go:87 — Host spoofing via @ in DSN host. ParseDSN splits on the first @ only (strings.Cut), so a DSN like llama-swap://tok@real:8080@evil:9090 yields Host = "real:8080@evil:9090". Concatenated into "http://"+dsn.Host, the URL becomes http://real:8080@evil:9090, which Go's net/url parser interprets as userinfo real:8080 targeting host evil:9090 — silently redirecting requests to a different host. The DSN comes from env vars (operator-controlled), so exploitability is low, but the same ParseDSN is used for all schemes and this confusion vector is specific to the llama-swap scheme's "http://"+dsn.Host pattern. Fix: validate dsn.Host contains no @ before constructing the URL, or use net/url parsing to safely compose the URL. (Verified by reading env.go:46-64 for ParseDSN and builtin.go:84-90 for the scheme factory.)

🎯 Correctness — Blocking issues found

Both findings are confirmed against the actual code. Let me now output the corrected review.

Blocking issues found

  • provider/llamaswap/llamaswap.go:135-165 — Management methods (ListModels, Running, Unload) lack the no-base-URL guard. Model() (line 98) and ImageModel() (image.go:18) both check p.baseURL == "" and return a clear error. But ListModels (line 135), Running (line 149), and Unload (line 160) call doJSON directly without this check. On the no-URL built-in provider (registered at builtin.go:83), calling any of these produces a confusing raw URL-parse error (e.g., parse "/v1/models": invalid URI for request) instead of the documented clear message. This contradicts the ADR and PR description: "A no-DSN built-in errors on use with a clear message, mirroring foreman." Verified by reading both llamaswap.go (lines 135, 149, 160 — no guard) and builtin.go (line 83 — no-URL built-in registered). Fix: add if p.baseURL == "" { return ..., fmt.Errorf(...) } guards to all three methods, matching the pattern in Model() and ImageModel().

  • provider/llamaswap/llamaswap.go:199doJSON always passes "" for the model parameter to apiError. The apiError function (line 212) accepts a model string parameter and sets e.Model from it, but the sole call site hardcodes "". This means APIError.Model is always empty for HTTP errors from management endpoints and image generation — the caller loses which model was involved. The imageModel.Generate method partially mitigates this with its own APIError construction (image.go:74-78, 87-91), but those only cover decode-level errors, not HTTP-level errors from the image endpoint. Verified by reading llamaswap.go:199 and tracing all callers of doJSON (lines 139, 151, 165, and image.go:64). Fix: either thread the model through doJSON (add a model string parameter), or have callers wrap the error to add model context.

🧯 Error handling & edge cases — Minor issues

Verdict: Minor issues

  • provider/llamaswap/llamaswap.go:135,149,160ListModels, Running, Unload skip the empty-baseURL guard. Model() (line 97) and ImageModel() (image.go:17) both check p.baseURL == "" and return a clear error. The three management methods call doJSON directly without that check. On the no-URL built-in (explicitly designed to error on use, per ADR-0015 and builtin.go:75-76), calling any of them produces a confusing transport error like llama-swap: do request: Get "/v1/models": unsupported protocol scheme "" instead of the intended "no base URL configured" message. Fix: add the same if p.baseURL == "" guard to ListModels, Running, and Unload.

  • provider/llamaswap/llamaswap.go:214apiError discards the io.ReadAll error. body, _ := io.ReadAll(io.LimitReader(resp.Body, 1<<20)) silently drops the error. If the response body read fails mid-stream (truncated TCP segment, reset), body is a partial/empty slice and the error message is silently degraded — the caller sees a misleading or empty APIError.Message with no indication the read failed. Fix: capture the error; if non-nil, either include it in the APIError.Err field or fall back to a sentinel message like "(error reading response body)".

  • provider/llamaswap/image.go:74-78,87-91&llm.APIError{} constructed with Status: 0 gets misclassified as transient. When a backend ignores response_format (returning a url-only entry) or returns zero images, the code creates &llm.APIError{Provider:..., Model:..., Message:...} with no Status field. llm.Classify (line 129 of errors.go) only inspects apiErr.Status when Status != 0; a zero Status falls through to the default ClassTransient. These are permanent errors (misconfigured backend, not a transient hiccup), so they should be classified as permanent. Fix: set Status to a 4xx code (e.g., http.StatusUnprocessableEntity or http.StatusBadGateway) on these synthetic errors so Classify returns ClassPermanent.

  • provider/llamaswap/image.go:80-82 — base64 decode failure produces a plain error, also misclassified transient. fmt.Errorf("llama-swap: decode image %d: %w", i, err) is not an *llm.APIError, so llm.Classify hits none of its type-switch cases and returns ClassTransient (the default). A base64 decode failure from the backend is permanent data corruption — retrying won't help. Fix: wrap as an *llm.APIError with an appropriate status code, or at minimum wrap with fmt.Errorf("%w: ...", llm.ErrUnsupported, ...) so Classify sees ErrUnsupportedClassPermanent.

  • imagegen/imagegen.go:65-70Request.Apply panics on a nil Option. Option is func(*Request). If a caller passes a nil option (e.g., a conditional that evaluates to nil), opt(&r) panics with a nil function dereference. The llm package's equivalent Request.Apply has the same pattern without a guard, but this is new code and worth hardening. Fix: add a nil guard: if opt != nil { opt(&r) }.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9m 6s

<!-- gadfly-review:ollama:deepseek-v4-pro:cloud --> ### 🪰 Gadfly review — `deepseek-v4-pro:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> ## Verdict: Minor issues - **`provider/llamaswap/llamaswap.go:160-165` — Path injection in `Unload`.** The `model` argument is concatenated directly into the URL path (`path += "/" + model`) with no escaping. A model name like `qwen?foo=bar` injects a query string; `qwen#frag` injects a fragment. While `http.NewRequestWithContext` will normalize `..` segments, query/fragment injection is not mitigated. The `model` value originates from the caller (ultimately user input in mort). **Fix:** use `url.PathEscape(model)` when appending to the path. *(Verified by reading `llamaswap.go:160-165` and confirming no `url.PathEscape`, `path.Clean`, or any sanitization is applied.)* - **`builtin.go:87` — Host spoofing via `@` in DSN host.** `ParseDSN` splits on the *first* `@` only (`strings.Cut`), so a DSN like `llama-swap://tok@real:8080@evil:9090` yields `Host = "real:8080@evil:9090"`. Concatenated into `"http://"+dsn.Host`, the URL becomes `http://real:8080@evil:9090`, which Go's `net/url` parser interprets as userinfo `real:8080` targeting host `evil:9090` — silently redirecting requests to a different host. The DSN comes from env vars (operator-controlled), so exploitability is low, but the same `ParseDSN` is used for all schemes and this confusion vector is specific to the `llama-swap` scheme's `"http://"+dsn.Host` pattern. **Fix:** validate `dsn.Host` contains no `@` before constructing the URL, or use `net/url` parsing to safely compose the URL. *(Verified by reading `env.go:46-64` for `ParseDSN` and `builtin.go:84-90` for the scheme factory.)* </details> <details><summary><b>🎯 Correctness</b> — Blocking issues found</summary> Both findings are confirmed against the actual code. Let me now output the corrected review. **Blocking issues found** - **`provider/llamaswap/llamaswap.go:135-165` — Management methods (`ListModels`, `Running`, `Unload`) lack the no-base-URL guard.** `Model()` (line 98) and `ImageModel()` (`image.go:18`) both check `p.baseURL == ""` and return a clear error. But `ListModels` (line 135), `Running` (line 149), and `Unload` (line 160) call `doJSON` directly without this check. On the no-URL built-in provider (registered at `builtin.go:83`), calling any of these produces a confusing raw URL-parse error (e.g., `parse "/v1/models": invalid URI for request`) instead of the documented clear message. This contradicts the ADR and PR description: "A no-DSN built-in errors on use with a clear message, mirroring foreman." **Verified** by reading both `llamaswap.go` (lines 135, 149, 160 — no guard) and `builtin.go` (line 83 — no-URL built-in registered). Fix: add `if p.baseURL == "" { return ..., fmt.Errorf(...) }` guards to all three methods, matching the pattern in `Model()` and `ImageModel()`. - **`provider/llamaswap/llamaswap.go:199` — `doJSON` always passes `""` for the `model` parameter to `apiError`.** The `apiError` function (line 212) accepts a `model string` parameter and sets `e.Model` from it, but the sole call site hardcodes `""`. This means `APIError.Model` is always empty for HTTP errors from management endpoints and image generation — the caller loses which model was involved. The `imageModel.Generate` method partially mitigates this with its own `APIError` construction (image.go:74-78, 87-91), but those only cover decode-level errors, not HTTP-level errors from the image endpoint. **Verified** by reading `llamaswap.go:199` and tracing all callers of `doJSON` (lines 139, 151, 165, and image.go:64). Fix: either thread the model through `doJSON` (add a `model string` parameter), or have callers wrap the error to add model context. </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> ## Verdict: Minor issues - **`provider/llamaswap/llamaswap.go:135,149,160` — `ListModels`, `Running`, `Unload` skip the empty-baseURL guard.** `Model()` (line 97) and `ImageModel()` (`image.go:17`) both check `p.baseURL == ""` and return a clear error. The three management methods call `doJSON` directly without that check. On the no-URL built-in (explicitly designed to error on use, per ADR-0015 and `builtin.go:75-76`), calling any of them produces a confusing transport error like `llama-swap: do request: Get "/v1/models": unsupported protocol scheme ""` instead of the intended "no base URL configured" message. **Fix:** add the same `if p.baseURL == ""` guard to `ListModels`, `Running`, and `Unload`. - **`provider/llamaswap/llamaswap.go:214` — `apiError` discards the `io.ReadAll` error.** `body, _ := io.ReadAll(io.LimitReader(resp.Body, 1<<20))` silently drops the error. If the response body read fails mid-stream (truncated TCP segment, reset), `body` is a partial/empty slice and the error message is silently degraded — the caller sees a misleading or empty `APIError.Message` with no indication the read failed. **Fix:** capture the error; if non-nil, either include it in the `APIError.Err` field or fall back to a sentinel message like `"(error reading response body)"`. - **`provider/llamaswap/image.go:74-78,87-91` — `&llm.APIError{}` constructed with `Status: 0` gets misclassified as transient.** When a backend ignores `response_format` (returning a url-only entry) or returns zero images, the code creates `&llm.APIError{Provider:..., Model:..., Message:...}` with no `Status` field. `llm.Classify` (line 129 of `errors.go`) only inspects `apiErr.Status` when `Status != 0`; a zero `Status` falls through to the default `ClassTransient`. These are permanent errors (misconfigured backend, not a transient hiccup), so they should be classified as permanent. **Fix:** set `Status` to a 4xx code (e.g., `http.StatusUnprocessableEntity` or `http.StatusBadGateway`) on these synthetic errors so `Classify` returns `ClassPermanent`. - **`provider/llamaswap/image.go:80-82` — base64 decode failure produces a plain error, also misclassified transient.** `fmt.Errorf("llama-swap: decode image %d: %w", i, err)` is not an `*llm.APIError`, so `llm.Classify` hits none of its type-switch cases and returns `ClassTransient` (the default). A base64 decode failure from the backend is permanent data corruption — retrying won't help. **Fix:** wrap as an `*llm.APIError` with an appropriate status code, or at minimum wrap with `fmt.Errorf("%w: ...", llm.ErrUnsupported, ...)` so `Classify` sees `ErrUnsupported` → `ClassPermanent`. - **`imagegen/imagegen.go:65-70` — `Request.Apply` panics on a nil `Option`.** `Option` is `func(*Request)`. If a caller passes a nil option (e.g., a conditional that evaluates to nil), `opt(&r)` panics with a nil function dereference. The `llm` package's equivalent `Request.Apply` has the same pattern without a guard, but this is new code and worth hardening. **Fix:** add a nil guard: `if opt != nil { opt(&r) }`. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9m 6s</sub>

🪰 Gadfly review — nemotron-3-super:cloud (ollama-cloud)

Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — Blocking issues found

Blocking issues found

  • provider/llamaswap/llamaswap.go:160-166: The Unload method is vulnerable to path traversal if the model parameter contains path traversal sequences (e.g., "../") or slashes that could alter the intended URL path. An attacker who controls the model parameter (e.g., via a compromised llama-swap server returning malicious model IDs from ListModels) could cause the provider to make requests to unintended endpoints. Fix by validating or escaping the model string before appending to the path. Verified by reading the file.
🎯 Correctness — No material issues found

No material issues found

🧯 Error handling & edge cases — Minor issues

Verdict: Minor issues

Findings

  • Missing model context in HTTP errors
    provider/llamaswap/llamaswap.go:199
    The doJSON method calls apiError(resp, "") with an empty model string for all non-2xx responses. This omits useful model context in errors for operations like image generation (provider/llamaswap/image.go:46) and model unloading (provider/llamaswap/llamaswap.go:165), where the model identifier is available at the call site.
    Suggestion: Pass the relevant model identifier to doJSON/apiError when available (e.g., model ID for image generation, target model for unload).

  • Ignored error from response body read
    provider/llamaswap/llamaswap.go:214
    io.ReadAll error is ignored (body, _ := io.ReadAll(...)). If reading the response body fails (e.g., network interruption), the code proceeds to unmarshal a nil/partial body, potentially causing misleading errors or panics.
    Suggestion: Check the read error and return an appropriate llm.APIError with context.

  • Unescaped model string in Unload path
    provider/llamaswap/llamaswap.go:161-165
    The Unload method appends the model string directly to the URL path without escaping (path += "/" + model). If the model contains reserved characters (e.g., /), the resulting URL is invalid, likely causing request failure or unintended behavior.
    Suggestion: Use net/url.PathEscape to escape the model string before path concatenation.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 18s

<!-- gadfly-review:ollama:nemotron-3-super:cloud --> ### 🪰 Gadfly review — `nemotron-3-super:cloud` (ollama-cloud) **Verdict: Blocking issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Blocking issues found</summary> Blocking issues found - provider/llamaswap/llamaswap.go:160-166: The Unload method is vulnerable to path traversal if the model parameter contains path traversal sequences (e.g., "../") or slashes that could alter the intended URL path. An attacker who controls the model parameter (e.g., via a compromised llama-swap server returning malicious model IDs from ListModels) could cause the provider to make requests to unintended endpoints. Fix by validating or escaping the model string before appending to the path. Verified by reading the file. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> No material issues found </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> **Verdict**: Minor issues ### Findings - **Missing model context in HTTP errors** `provider/llamaswap/llamaswap.go:199` The `doJSON` method calls `apiError(resp, "")` with an empty model string for all non-2xx responses. This omits useful model context in errors for operations like image generation (`provider/llamaswap/image.go:46`) and model unloading (`provider/llamaswap/llamaswap.go:165`), where the model identifier is available at the call site. *Suggestion*: Pass the relevant model identifier to `doJSON`/`apiError` when available (e.g., model ID for image generation, target model for unload). - **Ignored error from response body read** `provider/llamaswap/llamaswap.go:214` `io.ReadAll` error is ignored (`body, _ := io.ReadAll(...)`). If reading the response body fails (e.g., network interruption), the code proceeds to unmarshal a nil/partial body, potentially causing misleading errors or panics. *Suggestion*: Check the read error and return an appropriate `llm.APIError` with context. - **Unescaped model string in Unload path** `provider/llamaswap/llamaswap.go:161-165` The `Unload` method appends the model string directly to the URL path without escaping (`path += "/" + model`). If the model contains reserved characters (e.g., `/`), the resulting URL is invalid, likely causing request failure or unintended behavior. *Suggestion*: Use `net/url.PathEscape` to escape the model string before path concatenation. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 18s</sub>

🪰 Gadfly review — gpt-oss:120b-cloud (ollama-cloud)

Verdict: Minor issues — 3 reviewers: security, correctness, error-handling

🔒 Security — Minor issues

Verdict: Minor issues

  • Missing authentication on management endpoints (provider/llamaswap/llamaswap.go:190‑192) – The doJSON helper sets the Authorization header only when p.token is non‑empty. Consequently, management methods (ListModels, Running, Unload) can be invoked without any authentication if a llama‑swap instance is publicly reachable and no token is configured.
    Impact: An attacker could enumerate available models or unload them, causing potential denial‑of‑service or information leakage.
    Fix: Require a token for management endpoints or clearly document the need to protect llama‑swap instances from unauthenticated access.
🎯 Correctness — No material issues found

VERDICT: No material issues found

🧯 Error handling & edge cases — Minor issues

Verdict: Minor issues

  • Missing base URL validation for management endpoints (ListModels, Running, Unload).
    File: provider/llamaswap/llamaswap.go:135‑166 – these methods call p.doJSON without first checking p.baseURL. If a provider is created without a base URL, the request URL becomes "/v1/models" (or similar) and http.NewRequest fails with “missing protocol scheme”. The error is wrapped as a generic request‑building error rather than a clear “no base URL configured” message.

  • No validation of Request.N (image count) in image generation.
    File: provider/llamaswap/image.go:55‑63 – the request struct forwards N directly to the backend. Supplying a negative value results in a malformed JSON payload ("n":-1) which the server may reject with an unclear error.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 18s

<!-- gadfly-review:ollama:gpt-oss:120b-cloud --> ### 🪰 Gadfly review — `gpt-oss:120b-cloud` (ollama-cloud) **Verdict: Minor issues** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — Minor issues</summary> **Verdict:** Minor issues - **Missing authentication on management endpoints (provider/llamaswap/llamaswap.go:190‑192)** – The `doJSON` helper sets the `Authorization` header only when `p.token` is non‑empty. Consequently, management methods (`ListModels`, `Running`, `Unload`) can be invoked without any authentication if a llama‑swap instance is publicly reachable and no token is configured. *Impact:* An attacker could enumerate available models or unload them, causing potential denial‑of‑service or information leakage. *Fix:* Require a token for management endpoints or clearly document the need to protect llama‑swap instances from unauthenticated access. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> **VERDICT: No material issues found** </details> <details><summary><b>🧯 Error handling & edge cases</b> — Minor issues</summary> **Verdict:** Minor issues - **Missing base URL validation for management endpoints** (`ListModels`, `Running`, `Unload`). *File:* `provider/llamaswap/llamaswap.go:135‑166` – these methods call `p.doJSON` without first checking `p.baseURL`. If a provider is created without a base URL, the request URL becomes `"/v1/models"` (or similar) and `http.NewRequest` fails with “missing protocol scheme”. The error is wrapped as a generic request‑building error rather than a clear “no base URL configured” message. - **No validation of `Request.N` (image count) in image generation**. *File:* `provider/llamaswap/image.go:55‑63` – the request struct forwards `N` directly to the backend. Supplying a negative value results in a malformed JSON payload (`"n":-1`) which the server may reject with an unclear error. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 18s</sub>

🪰 Gadfly review — qwen3-coder:480b-cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

VERDICT: No material issues found

🎯 Correctness — No material issues found

No material issues found

🧯 Error handling & edge cases — No material issues found

No material issues found

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 59s

<!-- gadfly-review:ollama:qwen3-coder:480b-cloud --> ### 🪰 Gadfly review — `qwen3-coder:480b-cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> VERDICT: No material issues found </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> No material issues found </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> No material issues found </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 59s</sub>

🪰 Gadfly review — gemma4:cloud (ollama-cloud)

Verdict: No material issues found — 3 reviewers: security, correctness, error-handling

🔒 Security — No material issues found

Verdict: No material issues found

I have reviewed the changes through the Security lens.

  • Input Validation: The imagegen.Request is passed to the backend via JSON. While it doesn't perform strict validation of the Size string (e.g., checking for "1024x1024" format), this is handled by the backend. The Prompt is checked for emptiness.
  • Data Handling: Image data is returned as b64_json and decoded using base64.StdEncoding. The implementation correctly avoids fetching remote URLs, preventing SSRF vulnerabilities when receiving images.
  • Authentication: The provider supports Bearer tokens. For local keyless instances, it uses a no-key placeholder to satisfy the openai client's requirement for a non-empty key without sending actual sensitive credentials.
  • Transport: The llama-swap scheme defaults to http:// for local-first deployment as specified in the ADR. While HTTP is insecure for public networks, the documentation and ADR explicitly state this is intended for local-first use, and users can use the openai:// scheme for TLS-fronted instances.
  • Resource Exhaustion: The apiError method uses io.LimitReader(resp.Body, 1<<20) (1MB) when reading error bodies, preventing potential memory exhaustion attacks from a malicious or malfunctioning backend.

The implementation follows the security constraints and patterns established in the repository.

🎯 Correctness — No material issues found

Verdict: No material issues found

I have reviewed the implementation of the llamaswap provider and the imagegen interface.

  • Chat Delegation: Verified that provider/llamaswap correctly delegates to provider/openai with the necessary adjustments for llama.cpp's shim (WithLegacyMaxTokens) and local-first auth (no-key placeholder).
  • Image Generation: The imagegen contract is clean and decoupled from llm. The implementation in provider/llamaswap/image.go correctly handles the b64_json response format, performs base64 decoding, and uses http.DetectContentType for MIME sniffing with a sensible fallback to image/png.
  • Management API: The ListModels, Running, and Unload methods correctly target the non-standard endpoints specified in the ADR.
  • HTTP Defaults: The decision to use a timeout-free http.Client for the provider is correct given the domain (cold model swaps), shifting the responsibility of bounding the request to the context deadline.
  • DSN Wiring: The builtin.go registration correctly overrides the default https scheme to http for llama-swap.

The logic is sound and adheres to the specified ADRs.

🧯 Error handling & edge cases — No material issues found

Verdict: No material issues found

I have reviewed the changes through the lens of Error handling & edge cases.

Analysis

  • HTTP Client Timeouts: The implementation explicitly avoids setting a client-level timeout in New() (p.client = &http.Client{}) and WithHTTPClient to accommodate llama-swap's potential 15s+ cold-start model swaps. This correctly shifts the responsibility of timing out to the caller's context.Context, preventing premature failures during model loading.
  • Base URL Validation: The Provider checks for a missing baseURL at the point of use in both Model() (chat) and ImageModel() (image gen), returning a clear error rather than attempting a request to an empty or invalid URL.
  • Image Generation Edge Cases:
    • Empty Prompts: Validated in image.go:58 and tested in TestImageGenerateEmptyPrompt, returning llm.ErrUnsupported.
    • Response Formats: The code explicitly forces response_format: "b64_json" and includes an error check (image.go:84) if the backend returns a URL instead of inline bytes, adhering to the ImagePart bytes-only contract.
    • Empty Results: Handles responses with no images (image.go:99) by returning a structured llm.APIError.
    • MIME Sniffing: The sniffImageMIME helper handles inconclusive detection by defaulting to image/png, which is the expected output of the stable-diffusion.cpp backend.
  • API Error Classification: The apiError method in llamaswap.go robustly handles multiple error shapes (OpenAI-style objects, Ollama-style strings, or raw bodies) and maps them to llm.APIError, ensuring that the project's existing llm.Classify logic remains functional for these new endpoints.
  • Auth Placeholders: The chatProvider() handles the "blank key" problem by injecting "no-key" if no token is provided, avoiding the synthetic 401s that the provider/openai client would otherwise generate.
  • Resource Cleanup: resp.Body.Close() is correctly deferred in doJSON to prevent socket leaks.

No issues identified within the assigned lens.

Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 12s

<!-- gadfly-review:ollama:gemma4:cloud --> ### 🪰 Gadfly review — `gemma4:cloud` (ollama-cloud) **Verdict: No material issues found** — 3 reviewers: security, correctness, error-handling <details><summary><b>🔒 Security</b> — No material issues found</summary> Verdict: No material issues found I have reviewed the changes through the Security lens. - **Input Validation**: The `imagegen.Request` is passed to the backend via JSON. While it doesn't perform strict validation of the `Size` string (e.g., checking for "1024x1024" format), this is handled by the backend. The `Prompt` is checked for emptiness. - **Data Handling**: Image data is returned as `b64_json` and decoded using `base64.StdEncoding`. The implementation correctly avoids fetching remote URLs, preventing SSRF vulnerabilities when receiving images. - **Authentication**: The provider supports Bearer tokens. For local keyless instances, it uses a `no-key` placeholder to satisfy the `openai` client's requirement for a non-empty key without sending actual sensitive credentials. - **Transport**: The `llama-swap` scheme defaults to `http://` for local-first deployment as specified in the ADR. While HTTP is insecure for public networks, the documentation and ADR explicitly state this is intended for local-first use, and users can use the `openai://` scheme for TLS-fronted instances. - **Resource Exhaustion**: The `apiError` method uses `io.LimitReader(resp.Body, 1<<20)` (1MB) when reading error bodies, preventing potential memory exhaustion attacks from a malicious or malfunctioning backend. The implementation follows the security constraints and patterns established in the repository. </details> <details><summary><b>🎯 Correctness</b> — No material issues found</summary> Verdict: No material issues found I have reviewed the implementation of the `llamaswap` provider and the `imagegen` interface. - **Chat Delegation**: Verified that `provider/llamaswap` correctly delegates to `provider/openai` with the necessary adjustments for llama.cpp's shim (`WithLegacyMaxTokens`) and local-first auth (`no-key` placeholder). - **Image Generation**: The `imagegen` contract is clean and decoupled from `llm`. The implementation in `provider/llamaswap/image.go` correctly handles the `b64_json` response format, performs base64 decoding, and uses `http.DetectContentType` for MIME sniffing with a sensible fallback to `image/png`. - **Management API**: The `ListModels`, `Running`, and `Unload` methods correctly target the non-standard endpoints specified in the ADR. - **HTTP Defaults**: The decision to use a timeout-free `http.Client` for the provider is correct given the domain (cold model swaps), shifting the responsibility of bounding the request to the context deadline. - **DSN Wiring**: The `builtin.go` registration correctly overrides the default `https` scheme to `http` for `llama-swap`. The logic is sound and adheres to the specified ADRs. </details> <details><summary><b>🧯 Error handling & edge cases</b> — No material issues found</summary> Verdict: No material issues found I have reviewed the changes through the lens of **Error handling & edge cases**. ### Analysis - **HTTP Client Timeouts**: The implementation explicitly avoids setting a client-level timeout in `New()` (`p.client = &http.Client{}`) and `WithHTTPClient` to accommodate llama-swap's potential 15s+ cold-start model swaps. This correctly shifts the responsibility of timing out to the caller's `context.Context`, preventing premature failures during model loading. - **Base URL Validation**: The `Provider` checks for a missing `baseURL` at the point of use in both `Model()` (chat) and `ImageModel()` (image gen), returning a clear error rather than attempting a request to an empty or invalid URL. - **Image Generation Edge Cases**: - **Empty Prompts**: Validated in `image.go:58` and tested in `TestImageGenerateEmptyPrompt`, returning `llm.ErrUnsupported`. - **Response Formats**: The code explicitly forces `response_format: "b64_json"` and includes an error check (`image.go:84`) if the backend returns a URL instead of inline bytes, adhering to the `ImagePart` bytes-only contract. - **Empty Results**: Handles responses with no images (`image.go:99`) by returning a structured `llm.APIError`. - **MIME Sniffing**: The `sniffImageMIME` helper handles inconclusive detection by defaulting to `image/png`, which is the expected output of the stable-diffusion.cpp backend. - **API Error Classification**: The `apiError` method in `llamaswap.go` robustly handles multiple error shapes (OpenAI-style objects, Ollama-style strings, or raw bodies) and maps them to `llm.APIError`, ensuring that the project's existing `llm.Classify` logic remains functional for these new endpoints. - **Auth Placeholders**: The `chatProvider()` handles the "blank key" problem by injecting `"no-key"` if no token is provided, avoiding the synthetic 401s that the `provider/openai` client would otherwise generate. - **Resource Cleanup**: `resp.Body.Close()` is correctly deferred in `doJSON` to prevent socket leaks. No issues identified within the assigned lens. </details> <sub>Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 12s</sub>
steve added 1 commit 2026-06-27 20:04:24 +00:00
fix(llamaswap): address Gadfly review findings
CI / Tidy (pull_request) Successful in 9m25s
CI / Build & Test (pull_request) Successful in 10m15s
64642c43c4
- Unload: reject model ids containing path separators (/?#) so a model name
  can't redirect the request to another endpoint; ":" (common in ids) stays
  verbatim.
- doJSON: take a model arg so image/management HTTP errors carry the target id
  (was always ""); add a base-URL guard so management methods fail clearly
  instead of building a bare-path request; cap the success-path JSON decode with
  io.LimitReader (64 MiB) and drain the body when out is nil for conn reuse.
- image: reject negative Request.N before sending.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
steve merged commit b2487a1a37 into main 2026-06-27 20:14:01 +00:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/majordomo#3