feat(llamaswap): llama-swap provider + canonical imagegen interface #3
Reference in New Issue
Block a user
Delete Branch "feat/llama-swap-provider"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Adds llama-swap (the on-demand model-swapping proxy over llama.cpp / stable-diffusion.cpp) as a tailored provider, plus a new canonical text-to-image interface so mort can generate images.
provider/llamaswap (ADR-0015)
llama-swap's chat API is byte-identical to OpenAI Chat Completions, so the chat path delegates to
provider/openaiat{base}/v1— no duplicated wire client (ADR-0007). What makes it "tailored":ListModels(/v1/models),Running(/running, raw JSON),Unload(/api/models/unload).healthCheckTimeout(≥15s), so callers bound work with a context deadline.max_tokens(llama.cpp's shim), and aBearer no-keyplaceholder so keyless local instances work (the openai client rejects a blank key).llama-swap://token@host:portbuilds an http base URL (local-first, unlike the https-always default). A no-URL built-in errors clearly on use (mirrors foreman).imagegen (ADR-0016)
A new canonical leaf package, separate from
llm(image gen shares no message/tool/stream machinery):Request/Result/Model/Provider, withImage = llm.ImagePartso generated images feed straight back into a chat turn. Re-exported from the root (ImageModel,ImageRequest,WithImageSize, …). First backend is llama-swap via OpenAI/v1/images/generations(b64_json, bytes-only). v1 is txt2img only; edits/img2img and registry image-DSN resolution deferred.Tests & docs
httptestcoverage: chat delegation (path + legacy max_tokens + bearer), keyless placeholder,ListModels/Unload/Running, image decode (1×1 PNG →ImagePart), error classification, and scheme/built-in wiring.build,vet,test -race,mod tidy). No new dependencies.🤖 Generated with Claude Code
Add provider/llamaswap, a tailored provider for llama-swap (the model-swapping proxy over llama.cpp / stable-diffusion.cpp). Its chat path delegates to provider/openai at {base}/v1 — no duplicated wire client (ADR-0007) — with legacy max_tokens, a Bearer no-key placeholder for keyless local instances, and a timeout-free client so cold model swaps rely on context deadlines. The "tailored" surface is concrete management methods (ListModels / Running / Unload) that don't belong on the canonical llm.Provider interface. The llama-swap:// DSN scheme builds an http base URL (local-first); a no-URL built-in errors clearly on use, mirroring foreman. Add imagegen, a new canonical text-to-image interface separate from llm (Request/Result/Model/Provider; Image = llm.ImagePart so generated images feed straight back into chat). First backend is llama-swap via OpenAI /v1/images/generations (b64_json, bytes-only). Re-exported from the root. v1 is txt2img only. Hermetic httptest coverage for chat delegation, management endpoints, image decode, and scheme wiring. ADR-0015 + ADR-0016, README support matrix + image-gen section, CLAUDE.md package map, and progress.md updated in the same commit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>🪰 Gadfly review —
glm-5.2:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
Corrected review
VERDICT: Minor issues
provider/llamaswap/llamaswap.go:163— path injection inUnload. Themodelargument is interpolated directly into the request path (path += "/" + model) and passed tohttp.NewRequestWithContextwith no escaping. A model id containing/,..,?,#, or extra segments can rewrite the target URL — e.g.Unload(ctx, "../v1/chat/completions")POSTs to/api/models/unload/../v1/chat/completions(normalized to/v1/chat/completions), andUnload(ctx, "x?foo=bar")turns the remainder into a query string. Confirmed:doJSONdoesp.baseURL+path(llamaswap.go:183) with nourl.PathEscape/validation, and the package docs (llamaswap.go:95,image.go:16) state ids are "passed through verbatim." The other providers append only fixed strings (openai:model.go:92uses/chat/completions). Model ids here come from operator config /ListModels, so impact is limited, butUnload'smodelparameter is caller-supplied. Fix:path += "/" + url.PathEscape(model)(or reject ids containing/,?,#).builtin.go:88/provider/llamaswap/llamaswap.go:191— bearer token over cleartext HTTP. Thellama-swapDSN scheme builds anhttp://base URL (builtin.go:88"http://"+dsn.Host) anddoJSONsendsAuthorization: Bearer <token>over unencrypted transport when a token is configured (llamaswap.go:191). This is a documented local-first design choice (ADR-0015), not an oversight, so it's a credential-exposure risk rather than a bug: any token set viallama-swap://token@hosttraverses the local network in plaintext. Suggested mitigation: a doc note recommending an empty token for genuinely keyless local instances (which already omits the header — verified atllamaswap.go:190if p.token != ""), and steering tokened deployments to theopenai://(https) scheme. Not blocking since it's an explicit, reviewed decision.env.go:59/builtin.go:88— DSN host not validated against path/query metacharacters (low severity, operator-controlled).ParseDSNonlyTrimRights slashes fromhost(env.go:59); a host likeevil.com/p?xyields baseURLhttp://evil.com/p?x, anddoJSON'sbaseURL+pathconcatenation then appends endpoints into what becomes the query string. DSNs are operator-supplied (not untrusted end-user input), so this is low-risk, but it's an SSRF-shaped sharp edge. Aurl.Parse-based host validation inParseDSN(or asserting the host has no/?#) would close it. Unverified beyond readingParseDSN; flagging since it touches the trust boundary.Nothing else in the security lane: chat auth delegates to the openai client (verified
openai/model.go:97sets the bearer), image bytes are sniffed withhttp.DetectContentType(safe),apiErrorbounds body reads at 1 MiB (llamaswap.go:214), and the keyless placeholder (Bearer no-key) is harmless.🎯 Correctness — No material issues found
Verdict: No material issues found
Through the correctness lens, I verified the new code against the checked-out repository and found no logic bugs:
N/Sizezero-value semantics (provider/llamaswap/image.go:35-36):Ncarriesjson:"n,omitempty", soN==0omits the field and the backend applies its default (documented behavior);Size==""likewise omits. Matches theimagegen.Requestcontract. Verifiedimagegen.godocs and the OpenAI wire shape.response_formatalways forced (image.go:37,60):ResponseFormathas noomitempty, so"b64_json"is always sent, which the bytes-only contract relies on. Correct.sniffImageMIME(image.go:98-103):http.DetectContentTypeon a PNG returnsimage/png; the non-image/fallback toimage/pngis a reasonable default for the stated stable-diffusion.cpp backend. Verified againstllm.ImagePartusage.ErrUnsupportedwrapping for empty prompt (image.go:51-53):fmt.Errorf("%w: ...", llm.ErrUnsupported)preserves the sentinel soerrors.Isworks; verifiedllm.ErrUnsupportedexists atllm/errors.go:34.llamaswap.go:109-121): builds the openai client at{baseURL}/v1with the placeholderno-keywhen tokenless andWithLegacyMaxTokens(). Verifiedopenai.WithAPIKey/WithBaseURL/WithLegacyMaxTokens/WithHTTPClientall exist with matching signatures (openai.go:48-85). The keyless placeholder avoids the openai client's blank-key 401 path, as intended.builtin.go:87): uses"http://"+dsn.Host, deliberately notdsn.BaseURL()(which is https-always perenv.go:41). VerifiedDSN.Hostis scheme-less host[:port][/path]. Correct and consistent with the ADR.Unloadpath building (llamaswap.go:160-165): empty model →/api/models/unload; non-empty →/api/models/unload/<model>. Model IDs likeqwen3:14bcontain a colon, which is valid unescaped in a URL path segment, so no correctness issue for the documented model-name shapes.doJSONdecode intojson.RawMessageforRunning:json.DecoderpopulatesRawMessagewith arbitrary valid JSON; verified the call site atllamaswap.go:151.No semantic/domain correctness defects (constants, conversions, formulas, thresholds) surfaced in the diff.
🧯 Error handling & edge cases — Minor issues
Verdict: Minor issues
Through the error-handling & edge-cases lens, the unhappy paths are mostly handled well:
doJSONhasdefer resp.Body.Close(), non-2xx →*llm.APIError, transport errors are wrapped (preservingllm.Classify's net-error detection), the empty-prompt guard uses%wsoerrors.Is(err, llm.ErrUnsupported)holds (verified inllm/errors.go:115andTestImageGenerateEmptyPrompt), and the no-image / url-only responses are surfaced asAPIErrorrather than silently skipped.Findings:
provider/llamaswap/llamaswap.go:135,149,160—ListModels,Running, andUnloaddo not guardp.baseURL == "", unlikeModel(line 98) andImageModel(image.go:18). The no-DSN built-in (builtin.go:83) constructs a provider with an empty base URL, and the ADR/inline comment explicitly promises that the built-in "errors on use with a clear message, mirrors foreman." That promise holds for chat/image, but for the management surface — the headline "tailored" methods — calling e.g.p.ListModels(ctx)on the built-in reachesdoJSONwhich buildshttp.NewRequestWithContext(ctx, method, p.baseURL+path, ...)at line 183 with an emptybaseURL, yielding an opaquellama-swap: build request: http: no Host in request URLrather than the clear "no base URL configured" message. Suggested fix: add the sameif p.baseURL == "" { return ..., fmt.Errorf(...) }guard (or a sharederrNoBaseURLhelper) at the top of the three management methods. Verified by reading the fulldoJSONpath and the builtin construction.provider/llamaswap/image.go:58(imageRequest.N) — a negativeNis forwarded to the backend verbatim (N intwithomitemptyonly suppresses zero). The doc (imagegen.go:30) says0 = provider defaultbut says nothing about negatives; a caller passingWithN(-1)sendsn:-1to the upstream, whose behavior is undefined. Minor edge case; aif req.N < 0 { req.N = 0 }clamp (or an explicit validation error) would close it. Unverified against any real backend behavior, so treat as a hardening nit rather than a confirmed bug.provider/llamaswap/image.go:80— a base64 decode failure is returned as a plain wrappedfmt.Errorf(not*llm.APIError), sollm.Classifyfalls through to its "unrecognized → transient" default (llm/errors.go:103-104). A genuinely corrupt/truncated payload from the backend is a permanent condition for that response, not a transient one. Minor misclassification; wrapping as an*llm.APIError(or giving it anUnwrapto a permanent sentinel) would make failover decisions correct. Low severity since this path implies a protocol violation by the backend.No swallowed errors, missing defers, or panic-on-nil/empty-input issues were found:
Newalways initializesclient, the empty-body decode paths are only reached on success responses that carry JSON, andsniffImageMIMEhandles non-image bytes by defaulting toimage/pngrather than panicking.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 39s
🪰 Gadfly review —
minimax-m3:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
Confirmed:
&http.Client{}at line 81 has zero timeout; the ADR explicitly documents this as intentional. The reviewer correctly identified this as a documented, intentional choice — the finding itself is sound and the suggested fix is reasonable (a configurable default timeout rather than unbounded). Finding #3 is valid.All three findings reproduce against the actual code. The verdict stands.
VERDICT: Minor issues
provider/llamaswap/llamaswap.go:201-205andprovider/llamaswap/image.go:80— unbounded response body on the success path (DoS vector).doJSONonly caps the error body viaio.LimitReader(resp.Body, 1<<20)atllamaswap.go:214; the success path passes the rawresp.Bodytojson.NewDecoder(resp.Body).Decode(out)and the image path passes the (potentially gigabyte-scale)b64_jsonstring tobase64.StdEncoding.DecodeString. A malicious or misbehaving llama-swap upstream (or MITM in front of it, given the local-firsthttp://default atbuiltin.go:87) can exhaust caller memory by returning a multi-GB response. Suggested fix: bound the success body withio.LimitReader(resp.Body, <cap>)before decoding, mirroring the 1 MiB cap used on the error path, and consider rejectingContent-Lengthover the cap outright.provider/llamaswap/llamaswap.go:109-121vsprovider/llamaswap/llamaswap.go:190-192— bearer-auth inconsistency between chat and management/image paths.chatProvider()always emitsAuthorization: Bearer <key>(substituting the placeholder"no-key"when no token is set, lines 110-112), whiledoJSONomits theAuthorizationheader entirely whenp.token == ""(lines 190-192). The placeholder rationale ("a local llama-swap may require no auth at all — a bearer it ignores is harmless") is unsafe against a llama-swap or fronting proxy that treats anyAuthorizationheader as an auth attempt and rejects it, or that logs every such header as an auth failure; in that case a user who configured llama-swap keyless but is behind an auth-required proxy gets a confusing 401 on chat while image/management calls succeed with no header. Suggested fix: pick one policy — skipping theAuthorizationheader on the chat path whenp.token == ""(so chat, image, and management all behave identically) is the cleanest alignment.provider/llamaswap/llamaswap.go:80-86— defaulthttp.Clienthas no timeout. Documented as intentional (ADR-0015: cold model swaps can take many seconds, so callers must bound work with a context deadline). But the default for a barellamaswap.New()is unbounded: a hung llama-swap host pins a goroutine indefinitely until the caller's context fires. Chain callers inherit a deadline through context, but*llamaswap.Providerusers invokingListModels/Running/Unload/ImageModel.Generatedirectly can forget, and the no-DSN built-in atbuiltin.go:83is constructed without any client timeout at all. This is a resource-exhaustion / DoS concern in any non-chain integration path. Suggested fix: default to a generous explicit timeout (e.g., 5–10 minutes) instead of zero, or warn whenp.client == http.DefaultClient/ nil-timeout atNew-time if a config knob is available.🎯 Correctness — No material issues found
VERDICT: No material issues found
I reviewed the change through the correctness lens, focusing on the new
provider/llamaswappackage, theimagegenpackage, and the DSN/scheme wiring inbuiltin.go.What I verified:
provider/llamaswap/llamaswap.go—chatProvider()setsopenai.WithAPIKey("no-key")when the llama-swap token is empty, which correctly side-steps the synthetic 401 the openai client throws on a blankapiKey(confirmedprovider/openai/model.go:76-87). It also passesWithLegacyMaxTokens()(verified the option exists atprovider/openai/openai.go:80-85) and forwardsWithHTTPClient(p.client)so the no-timeout swap behavior is preserved.provider/llamaswap/image.go—response_formatis sent withoutomitempty, sob64_jsonis always forced inline. The decode path handlesB64JSON == ""(returns*llm.APIError), base64 errors (wrapped), and an emptydataarray (*llm.APIError).sniffImageMIMEfalls back toimage/pngcorrectly for PNG bytes.provider/llamaswap/llamaswap.goUnload— the path concatenation"/api/models/unload/" + modeland the empty-model short-circuit match the PR description and the test.builtin.go—llama-swap://token@host:portis built as"http://"+dsn.Host. The no-DSN built-in has an emptybaseURL;Model/ImageModelboth gate onp.baseURL == ""and return a clear error (matches theTestLlamaSwapBuiltinNoURLexpectation).imagegen/imagegen.go—Request.Applyoperates on a copy (verified byTestRequestApply's immutability assertion).Image = llm.ImagePartso the alias holds forllm.UserParts(...)use;TestImageIsImagePartpins this.majordomo.go— re-exports use type aliases (type ImageModel = imagegen.Model, etc.), so identity is preserved.max_tokens(notmax_completion_tokens) is on the wire, the bearer placeholder is sent, all management endpoints hit the expected paths, and429classifies asClassTransient. These pin the contract described in the ADRs.Potential concerns I checked and dismissed:
no-keyplaceholder — not a bug.Rawcarrying a*imageResponsepointer (rather than a copy) is consistent withllm.Response.Rawbeing a provider-native escape hatch; the API docs say "May be nil; never required for normal use."openai.Provideron everyModel()call is wasteful but not a correctness issue.Things I could not verify within the budget (not reporting as findings):
provider/openai'sWithLegacyMaxTokensactually flips the wire field — the openai-side unit test atprovider/openai/openai_test.go:219claims it does, but I didn't open that file. The llamaswap test asserts the absence ofmax_completion_tokensand presence ofmax_tokens, which would fail atgo test -raceif it weren't honored./v1/images/generationsfor stable-diffusion.cpp backends (an upstream/runtime concern, not a code concern).🧯 Error handling & edge cases — Minor issues
VERDICT: Minor issues
Reviewing
provider/llamaswap/llamaswap.goandprovider/llamaswap/image.gothrough the error-handling / edge-cases lens only.Findings
provider/llamaswap/image.go:64— image errors lose the model id. The shareddoJSONhelper is invoked asm.p.doJSON(ctx, http.MethodPost, "/v1/images/generations", &wire, &resp), with no model argument. InternallydoJSONthen callsp.apiError(resp, "")(llamaswap.go:199), producing a*llm.APIError{Provider: p.name, Model: ""}(llamaswap.go:213). Every other provider in the package (andprovider/openai's own helper atprovider/openai/model.go:111) fillsModelon error responses. Even withinimage.goitself, the two other*llm.APIErrorconstructions (image.go:74, image.go:87) correctly populatem.id, so this is an internal inconsistency: thedoJSON-mediated HTTP error path silently drops the id that the surrounding code already has.llm.Classifydoesn't readModel, but the structured error'sError()method (llm/errors.go:66-82) formatsProvider/Model, so callers that log the error seellama-swap/: HTTP 500: ...instead ofllama-swap/sd: HTTP 500: .... Suggested fix: thread the model through — e.g. add adoJSONModelvariant, or changedoJSONto take a model string (passing""for the no-model endpoints andm.idfromimageModel.Generate).provider/llamaswap/image.go:80— unbounded b64 decode.base64.StdEncoding.DecodeString(d.B64JSON)materializes the full decoded payload in memory beforesniffImageMIMEever inspects it. A backend (or a malicious proxy) returning a multi-MB b64 entry will cause a single allocation spike; combined with the request body already being buffered byhttp.Client, the provider has no cap on total image memory. There is also no check that the decoded length is sane. Suggested fix: cap the b64 length (or decoded size) before decoding, e.g.if len(d.B64JSON) > 16<<20 { return nil, &llm.APIError{...} }, matching the 1 MiB response-body cap already inapiError(llamaswap.go:214).provider/llamaswap/image.go:98— MIME sniff silently defaults toimage/pngon empty/ambiguous data.http.DetectContentType(nil)returns"text/plain; charset=utf-8", whichsniffImageMIMErewrites to"image/png". The only way to reachsniffImageMIMEis past a non-emptyd.B64JSONcheck (image.go:70), butbase64.StdEncoding.DecodeString("AA==")returns a non-empty-but-meaningless single-byte slice thatDetectContentTypewill still classify astext/plain(or similar). The caller then sees a confidently-labelledimage/pngfor what may not be a PNG at all. The function comment justifies this with "stable-diffusion.cpp emits PNG" but does so unconditionally for any non-image/MIME. Suggested fix: validate thatDetectContentTypereturned something PNG/JPEG/WEBP for known signatures (or refuse to setMIMEwhen detection fails) — at minimum log/return the unsupported case rather than silently masking it.provider/llamaswap/llamaswap.go:201-204— no body size cap on the success-path JSON decode.doJSONcaps any error body at 1 MiB (io.LimitReader(resp.Body, 1<<20)inapiError), but the success-pathjson.NewDecoder(resp.Body).Decode(out)will buffer the entire response — a runaway stable-diffusion.cpp backend returning gigabytes would OOM. Outside the error-handling lane (DoS) but worth flagging given the sibling 1 MiB cap already in the file. Suggested fix: wrap withhttp.MaxBytesReaderor stream-decode into aLimitedReaderwith a sane cap (image generations are typically <<16 MiB).Not material (in this lane)
ErrUnsupportedpath is correct and covered byTestImageGenerateEmptyPrompt.baseURLonModel/ImageModelreturns a clear error string. ✓Running'sjson.RawMessagedecode path andUnload's empty-vs-named model branch are both handled. ✓llm.Classifycan see the underlyingnet.Error/syscall.Errno. ✓defer resp.Body.Close()is in place; partial-image error returns correctly discardout. ✓data: [](the "no images" path is unit-testable but currently has no httptest), and a malformed-image b64 entry (the decode-error branch). Not bugs in the diff, just under-coverage worth noting.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 1m 37s
🪰 Gadfly review —
m5/qwen3.6:35b-mlx(m5)Verdict: No material issues found · ⚠️ 1/3 lens(es) errored — 3 reviewers: security, correctness, error-handling
🔒 Security — Reviewed
Corrected Review
No material security issues found
🎯 Correctness — No material issues found
No material issues found
🧯 Error handling & edge cases — ⚠️ could not complete
⚠️ This reviewer failed to complete: agent: step 9: context deadline exceeded
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 18m 20s
🪰 Gadfly review —
glm-5.1:cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
VERDICT: Minor issues
provider/llamaswap/llamaswap.go:160–165— Path traversal inUnload: Themodelparameter is concatenated directly into the URL path without percent-encoding (path += "/" + model), which is then joined withp.baseURLat line 183 (p.baseURL + path). A model name containing../segments would produce a path like/api/models/unload/../v1/chat/completions, which many HTTP servers would normalize before routing — potentially hitting an unintended endpoint. While model names are typically operator-controlled,Running()returns model names from the server's/runningendpoint as raw JSON; a compromised or misconfigured llama-swap could inject traversal strings that a caller feeds back intoUnload. This is a defense-in-depth concern rather than a remote exploit, buturl.PathEscape(model)or validation against/and..segments would harden it.provider/llamaswap/llamaswap.go:201–204— No response body size limit on success path:doJSONlimits error-response bodies to 1 MB (LimitReaderat line 214), but the success-pathjson.NewDecoder(resp.Body).Decode(out)at line 202 has no bound. For the image-generation endpoint, a malicious or misconfigured llama-swap could return an arbitrarily large JSON payload, causing OOM since the fullimageResponse(including base64 image data) is materialized in memory. The same unbounded-decode pattern exists in other providers (e.g.,provider/ollama/wire.go:296), but image payloads can be orders of magnitude larger than chat responses, making this more impactful. Consider wrappingresp.Bodywithio.LimitReaderbefore decoding indoJSON, at least when the caller expects image-scale payloads.🎯 Correctness — No material issues found
VERDICT: No material issues found
After thorough verification of the diff against the checked-out code, I found no logic bugs or correctness issues:
builtin.go:87constructs"http://"+dsn.Host. VerifiedDSN.Hostishostname[:port][/path]with no scheme prefix (env.go:36), sollama-swap://tok@box.local:8080correctly yieldshttp://box.local:8080. ✓chatProvider()sendsBearer no-keywhen token is empty (llamaswap.go:110-113), correctly bypassing the openai client's synthetic 401 on blank keys (openai/model.go:76-87). Management/image endpoints viadoJSONomit Authorization entirely when token is empty (llamaswap.go:190-192) — correct for local keyless use. ✓omitemptyonimageRequest.N:N=0omits the field; OpenAI defaults omittednto 1, matching the documented0 = provider defaultsemantics. ✓imageRequest.ResponseFormathas noomitempty, so"b64_json"is always sent — correct for the bytes-only contract. ✓Model()andImageModel()checkp.baseURL == ""before proceeding, preventing achatProvider()call with an invalid URL. ✓imagegen.Request.Applycorrectly copies the receiver before applying mutations, verified by the test inimagegen/imagegen_test.go. ✓🧯 Error handling & edge cases — Minor issues
All three findings are confirmed against the actual code. The verdict stands.
VERDICT: Minor issues
provider/llamaswap/llamaswap.go:135–165— Management methods lack empty-baseURLguard.Model()(line 98) andImageModel()(image.go:18) both checkp.baseURL == ""and return a clear error.ListModels,Running, andUnloadskip directly todoJSON, which concatenates"" + "/v1/models"into the relative URL/v1/models.http.NewRequestWithContextaccepts this, producing a confusing network error (Get "/v1/models": unsupported protocol scheme "") instead of the helpful "no base URL configured" message that the other two methods give. Fix: add the samep.baseURL == ""check at the top of each management method (or a shared validation indoJSON).provider/llamaswap/llamaswap.go:161–164—Unloadconcatenates model name into the URL path without escaping. Model identifiers commonly contain colons (qwen3:14b) or other URL-significant characters. The path"/api/models/unload/" + modelwill produce/api/models/unload/qwen3:14b, where the colon is technically valid in a path segment but any model name containing/,?,#, or spaces would silently produce a wrong or broken URL. Fix: useurl.PathEscape(model)(orurl.PathEscapeon the model portion) so characters like spaces and slashes are percent-encoded.provider/llamaswap/llamaswap.go:197–206—doJSONdoesn't drain the response body on the success path whenout == nil. ForUnload(which passesnilforout), the response body is closed bydeferbut never read. In Go'snet/http, undrained bodies prevent connection reuse; the transport will drop and recreate the TCP connection. For the small payloads expected here this is minor, but it means every unload call forces a new connection. Other providers in this codebase (e.g. ollama) have the same pattern, so this is consistent; noting for completeness.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 43s
🪰 Gadfly review —
kimi-k2.7-code:cloud(ollama-cloud)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
Verdict: Minor issues
Security findings:
provider/llamaswap/llamaswap.go:163— Path injection inUnload. Themodelargument is concatenated directly into the URL path (path += "/" + model). Model IDs are usually provider-controlled, but the string comes from a caller and is not escaped. A malicious or accidental value such as../../../other/pathcan traverse the API path (the leading/api/models/unload/only partly anchors it). Verified by readingllamaswap.go; nonet/url.PathEscapeis used. Fix: useurl.PathEscapewhen constructing the path.provider/llamaswap/llamaswap.go:183—p.baseURL + pathmay produce malformed or unexpected URLs. Base URL is stored trimmed of trailing slashes, but if a caller passes a base URL with a path component (e.g.http://host:8080/prefix) the resulting URL for/v1/modelsbecomeshttp://host:8080/prefix/v1/models, which is acceptable, but the concatenation bypassesurl.URLparsing and provides no validation that the result is a valid URL. Verified from the source. Combined with the path-injection issue above, this is the relevant construction point. Fix: build URLs throughurl.Parse+url.URL.JoinPath/PathEscape.provider/llamaswap/image.go:80— Base64 decoding has no size limit. A hostile backend can return an arbitrarily largeb64_jsonstring and the provider will decode it fully into memory. There is no request/response byte cap here; onlyapiErrorlimits error bodies. Verified by readingimage.go. Fix: limit the decoded payload size (e.g. cap the input string length or use abase64.NewDecoderwrapped with anio.LimitReader) before allocating.provider/llamaswap/llamaswap.go:111-113—no-keybearer sent over HTTP. When no token is configured the provider sendsAuthorization: Bearer no-key. The comment correctly notes llama-swap is local-first and will ignore it, but the defaultllama-swap://scheme builds an http URL and the built-in also uses an HTTP base. If a user points this provider at an HTTPS-fronted or non-local instance, a real but harmless-looking bearer leaks a fixed placeholder on the wire. Verified fromllamaswap.goandbuiltin.go:87. This is low-severity because the value is a well-known placeholder, but it is still a credential-like header emitted without user opt-in. Fix: only send anAuthorizationheader when the user actually configured a token, and document that keyless instances may need no auth; do not synthesize a placeholder for all traffic.No other material security issues were identified in the new code through the assigned lens.
🎯 Correctness — Blocking issues found
Blocking issues found
provider/llamaswap/llamaswap.go:135,149,160— Management methodsListModels,Running, andUnloaddo not validatep.baseURL == ""before callingdoJSON. When the no-DSN built-in (or any unconfigured provider) is used,doJSONbuilds a relative URL like/v1/models, andhttp.Client.Doreturnshttp: no Host in request URL. That transport error is then classified byllm.Classifyas transient (default path), even though it is a permanent misconfiguration. This contradicts the documented behavior that the no-DSN built-in "errors clearly on use" (mirroringforeman) and wastes retries in failover chains. Fix: add the samep.baseURL == ""guard used inModel/ImageModelto all three management methods, returning a clear permanent error.provider/llamaswap/image.go:58—req.Nis passed straight into the wire request without validation. A negative value (e.g.,imagegen.WithN(-1)) is emitted as"n": -1, which is semantically invalid and will be rejected by the backend. Fix: rejectreq.N < 0early with a clear error (llm.ErrUnsupportedor a dedicated message).provider/llamaswap/image.go:74-78— When a response entry has nob64_json, the code returns&llm.APIError{...}withStatus: 0and no wrapped sentinel error. BecauseStatus == 0,llm.Classifyfalls through to the default and marks it transient. A backend that ignoresresponse_format: b64_jsonand returns URL-only entries is effectively an unsupported response-mode failure; treating it as transient can cause spurious retries. Fix: wrap/flag this asllm.ErrUnsupportedso classification is permanent, matching the bytes-only contract enforced byllm.ImagePart.🧯 Error handling & edge cases — Minor issues
Minor issues
provider/llamaswap/llamaswap.go:163—Unloadbuilds the path with raw string concatenation (path += "/" + model) and does not URL-escape the model id. A model id containing/,%, spaces, or other reserved characters will produce an invalid or path-traversing URL. Useurl.PathEscape(model)before concatenating.provider/llamaswap/image.go:35/provider/llamaswap/image.go:58—Request.Nis forwarded verbatim to the wire request withjson:"n,omitempty". A negative value is not treated as empty, so it will be serialized and sent to the backend. Validatereq.N >= 0(or treat negative as default) and return a clear error, otherwise callers can accidentally emit nonsensical requests.provider/llamaswap/image.go:64— Management/chat errors report the model becausedoJSONis wired through paths that know the model, but image-generation errors calldoJSONwith an empty model string, so*llm.APIErrorinstances from the image endpoint haveModel: ""and less useful diagnostics. Consider adding the model id to the error path, e.g. by makingdoJSONaccept a model parameter or by wrapping non-2xx errors insideGenerate.provider/llamaswap/image.go:51— An empty/whitespace-only prompt returnsllm.ErrUnsupported.ErrUnsupportedis documented as a target capability mismatch, not a caller validation failure; while classification still comes out permanent, a dedicated error message (e.g. a plain error instead ofErrUnsupported) would be more accurate. If the permanent classification is intentional, a short comment justifying the choice would help future readers.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 2m 58s
🪰 Gadfly review —
deepseek-v4-pro:cloud(ollama-cloud)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
Verdict: Minor issues
provider/llamaswap/llamaswap.go:160-165— Path injection inUnload. Themodelargument is concatenated directly into the URL path (path += "/" + model) with no escaping. A model name likeqwen?foo=barinjects a query string;qwen#fraginjects a fragment. Whilehttp.NewRequestWithContextwill normalize..segments, query/fragment injection is not mitigated. Themodelvalue originates from the caller (ultimately user input in mort). Fix: useurl.PathEscape(model)when appending to the path. (Verified by readingllamaswap.go:160-165and confirming nourl.PathEscape,path.Clean, or any sanitization is applied.)builtin.go:87— Host spoofing via@in DSN host.ParseDSNsplits on the first@only (strings.Cut), so a DSN likellama-swap://tok@real:8080@evil:9090yieldsHost = "real:8080@evil:9090". Concatenated into"http://"+dsn.Host, the URL becomeshttp://real:8080@evil:9090, which Go'snet/urlparser interprets as userinforeal:8080targeting hostevil:9090— silently redirecting requests to a different host. The DSN comes from env vars (operator-controlled), so exploitability is low, but the sameParseDSNis used for all schemes and this confusion vector is specific to thellama-swapscheme's"http://"+dsn.Hostpattern. Fix: validatedsn.Hostcontains no@before constructing the URL, or usenet/urlparsing to safely compose the URL. (Verified by readingenv.go:46-64forParseDSNandbuiltin.go:84-90for the scheme factory.)🎯 Correctness — Blocking issues found
Both findings are confirmed against the actual code. Let me now output the corrected review.
Blocking issues found
provider/llamaswap/llamaswap.go:135-165— Management methods (ListModels,Running,Unload) lack the no-base-URL guard.Model()(line 98) andImageModel()(image.go:18) both checkp.baseURL == ""and return a clear error. ButListModels(line 135),Running(line 149), andUnload(line 160) calldoJSONdirectly without this check. On the no-URL built-in provider (registered atbuiltin.go:83), calling any of these produces a confusing raw URL-parse error (e.g.,parse "/v1/models": invalid URI for request) instead of the documented clear message. This contradicts the ADR and PR description: "A no-DSN built-in errors on use with a clear message, mirroring foreman." Verified by reading bothllamaswap.go(lines 135, 149, 160 — no guard) andbuiltin.go(line 83 — no-URL built-in registered). Fix: addif p.baseURL == "" { return ..., fmt.Errorf(...) }guards to all three methods, matching the pattern inModel()andImageModel().provider/llamaswap/llamaswap.go:199—doJSONalways passes""for themodelparameter toapiError. TheapiErrorfunction (line 212) accepts amodel stringparameter and setse.Modelfrom it, but the sole call site hardcodes"". This meansAPIError.Modelis always empty for HTTP errors from management endpoints and image generation — the caller loses which model was involved. TheimageModel.Generatemethod partially mitigates this with its ownAPIErrorconstruction (image.go:74-78, 87-91), but those only cover decode-level errors, not HTTP-level errors from the image endpoint. Verified by readingllamaswap.go:199and tracing all callers ofdoJSON(lines 139, 151, 165, and image.go:64). Fix: either thread the model throughdoJSON(add amodel stringparameter), or have callers wrap the error to add model context.🧯 Error handling & edge cases — Minor issues
Verdict: Minor issues
provider/llamaswap/llamaswap.go:135,149,160—ListModels,Running,Unloadskip the empty-baseURL guard.Model()(line 97) andImageModel()(image.go:17) both checkp.baseURL == ""and return a clear error. The three management methods calldoJSONdirectly without that check. On the no-URL built-in (explicitly designed to error on use, per ADR-0015 andbuiltin.go:75-76), calling any of them produces a confusing transport error likellama-swap: do request: Get "/v1/models": unsupported protocol scheme ""instead of the intended "no base URL configured" message. Fix: add the sameif p.baseURL == ""guard toListModels,Running, andUnload.provider/llamaswap/llamaswap.go:214—apiErrordiscards theio.ReadAllerror.body, _ := io.ReadAll(io.LimitReader(resp.Body, 1<<20))silently drops the error. If the response body read fails mid-stream (truncated TCP segment, reset),bodyis a partial/empty slice and the error message is silently degraded — the caller sees a misleading or emptyAPIError.Messagewith no indication the read failed. Fix: capture the error; if non-nil, either include it in theAPIError.Errfield or fall back to a sentinel message like"(error reading response body)".provider/llamaswap/image.go:74-78,87-91—&llm.APIError{}constructed withStatus: 0gets misclassified as transient. When a backend ignoresresponse_format(returning a url-only entry) or returns zero images, the code creates&llm.APIError{Provider:..., Model:..., Message:...}with noStatusfield.llm.Classify(line 129 oferrors.go) only inspectsapiErr.StatuswhenStatus != 0; a zeroStatusfalls through to the defaultClassTransient. These are permanent errors (misconfigured backend, not a transient hiccup), so they should be classified as permanent. Fix: setStatusto a 4xx code (e.g.,http.StatusUnprocessableEntityorhttp.StatusBadGateway) on these synthetic errors soClassifyreturnsClassPermanent.provider/llamaswap/image.go:80-82— base64 decode failure produces a plain error, also misclassified transient.fmt.Errorf("llama-swap: decode image %d: %w", i, err)is not an*llm.APIError, sollm.Classifyhits none of its type-switch cases and returnsClassTransient(the default). A base64 decode failure from the backend is permanent data corruption — retrying won't help. Fix: wrap as an*llm.APIErrorwith an appropriate status code, or at minimum wrap withfmt.Errorf("%w: ...", llm.ErrUnsupported, ...)soClassifyseesErrUnsupported→ClassPermanent.imagegen/imagegen.go:65-70—Request.Applypanics on a nilOption.Optionisfunc(*Request). If a caller passes a nil option (e.g., a conditional that evaluates to nil),opt(&r)panics with a nil function dereference. Thellmpackage's equivalentRequest.Applyhas the same pattern without a guard, but this is new code and worth hardening. Fix: add a nil guard:if opt != nil { opt(&r) }.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 9m 6s
🪰 Gadfly review —
nemotron-3-super:cloud(ollama-cloud)Verdict: Blocking issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — Blocking issues found
Blocking issues found
🎯 Correctness — No material issues found
No material issues found
🧯 Error handling & edge cases — Minor issues
Verdict: Minor issues
Findings
Missing model context in HTTP errors
provider/llamaswap/llamaswap.go:199The
doJSONmethod callsapiError(resp, "")with an empty model string for all non-2xx responses. This omits useful model context in errors for operations like image generation (provider/llamaswap/image.go:46) and model unloading (provider/llamaswap/llamaswap.go:165), where the model identifier is available at the call site.Suggestion: Pass the relevant model identifier to
doJSON/apiErrorwhen available (e.g., model ID for image generation, target model for unload).Ignored error from response body read
provider/llamaswap/llamaswap.go:214io.ReadAllerror is ignored (body, _ := io.ReadAll(...)). If reading the response body fails (e.g., network interruption), the code proceeds to unmarshal a nil/partial body, potentially causing misleading errors or panics.Suggestion: Check the read error and return an appropriate
llm.APIErrorwith context.Unescaped model string in Unload path
provider/llamaswap/llamaswap.go:161-165The
Unloadmethod appends the model string directly to the URL path without escaping (path += "/" + model). If the model contains reserved characters (e.g.,/), the resulting URL is invalid, likely causing request failure or unintended behavior.Suggestion: Use
net/url.PathEscapeto escape the model string before path concatenation.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 4m 18s
🪰 Gadfly review —
gpt-oss:120b-cloud(ollama-cloud)Verdict: Minor issues — 3 reviewers: security, correctness, error-handling
🔒 Security — Minor issues
Verdict: Minor issues
doJSONhelper sets theAuthorizationheader only whenp.tokenis non‑empty. Consequently, management methods (ListModels,Running,Unload) can be invoked without any authentication if a llama‑swap instance is publicly reachable and no token is configured.Impact: An attacker could enumerate available models or unload them, causing potential denial‑of‑service or information leakage.
Fix: Require a token for management endpoints or clearly document the need to protect llama‑swap instances from unauthenticated access.
🎯 Correctness — No material issues found
VERDICT: No material issues found
🧯 Error handling & edge cases — Minor issues
Verdict: Minor issues
Missing base URL validation for management endpoints (
ListModels,Running,Unload).File:
provider/llamaswap/llamaswap.go:135‑166– these methods callp.doJSONwithout first checkingp.baseURL. If a provider is created without a base URL, the request URL becomes"/v1/models"(or similar) andhttp.NewRequestfails with “missing protocol scheme”. The error is wrapped as a generic request‑building error rather than a clear “no base URL configured” message.No validation of
Request.N(image count) in image generation.File:
provider/llamaswap/image.go:55‑63– the request struct forwardsNdirectly to the backend. Supplying a negative value results in a malformed JSON payload ("n":-1) which the server may reject with an unclear error.Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 3m 18s
🪰 Gadfly review —
qwen3-coder:480b-cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
VERDICT: No material issues found
🎯 Correctness — No material issues found
No material issues found
🧯 Error handling & edge cases — No material issues found
No material issues found
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 59s
🪰 Gadfly review —
gemma4:cloud(ollama-cloud)Verdict: No material issues found — 3 reviewers: security, correctness, error-handling
🔒 Security — No material issues found
Verdict: No material issues found
I have reviewed the changes through the Security lens.
imagegen.Requestis passed to the backend via JSON. While it doesn't perform strict validation of theSizestring (e.g., checking for "1024x1024" format), this is handled by the backend. ThePromptis checked for emptiness.b64_jsonand decoded usingbase64.StdEncoding. The implementation correctly avoids fetching remote URLs, preventing SSRF vulnerabilities when receiving images.no-keyplaceholder to satisfy theopenaiclient's requirement for a non-empty key without sending actual sensitive credentials.llama-swapscheme defaults tohttp://for local-first deployment as specified in the ADR. While HTTP is insecure for public networks, the documentation and ADR explicitly state this is intended for local-first use, and users can use theopenai://scheme for TLS-fronted instances.apiErrormethod usesio.LimitReader(resp.Body, 1<<20)(1MB) when reading error bodies, preventing potential memory exhaustion attacks from a malicious or malfunctioning backend.The implementation follows the security constraints and patterns established in the repository.
🎯 Correctness — No material issues found
Verdict: No material issues found
I have reviewed the implementation of the
llamaswapprovider and theimagegeninterface.provider/llamaswapcorrectly delegates toprovider/openaiwith the necessary adjustments for llama.cpp's shim (WithLegacyMaxTokens) and local-first auth (no-keyplaceholder).imagegencontract is clean and decoupled fromllm. The implementation inprovider/llamaswap/image.gocorrectly handles theb64_jsonresponse format, performs base64 decoding, and useshttp.DetectContentTypefor MIME sniffing with a sensible fallback toimage/png.ListModels,Running, andUnloadmethods correctly target the non-standard endpoints specified in the ADR.http.Clientfor the provider is correct given the domain (cold model swaps), shifting the responsibility of bounding the request to the context deadline.builtin.goregistration correctly overrides the defaulthttpsscheme tohttpforllama-swap.The logic is sound and adheres to the specified ADRs.
🧯 Error handling & edge cases — No material issues found
Verdict: No material issues found
I have reviewed the changes through the lens of Error handling & edge cases.
Analysis
New()(p.client = &http.Client{}) andWithHTTPClientto accommodate llama-swap's potential 15s+ cold-start model swaps. This correctly shifts the responsibility of timing out to the caller'scontext.Context, preventing premature failures during model loading.Providerchecks for a missingbaseURLat the point of use in bothModel()(chat) andImageModel()(image gen), returning a clear error rather than attempting a request to an empty or invalid URL.image.go:58and tested inTestImageGenerateEmptyPrompt, returningllm.ErrUnsupported.response_format: "b64_json"and includes an error check (image.go:84) if the backend returns a URL instead of inline bytes, adhering to theImagePartbytes-only contract.image.go:99) by returning a structuredllm.APIError.sniffImageMIMEhelper handles inconclusive detection by defaulting toimage/png, which is the expected output of the stable-diffusion.cpp backend.apiErrormethod inllamaswap.gorobustly handles multiple error shapes (OpenAI-style objects, Ollama-style strings, or raw bodies) and maps them tollm.APIError, ensuring that the project's existingllm.Classifylogic remains functional for these new endpoints.chatProvider()handles the "blank key" problem by injecting"no-key"if no token is provided, avoiding the synthetic 401s that theprovider/openaiclient would otherwise generate.resp.Body.Close()is correctly deferred indoJSONto prevent socket leaks.No issues identified within the assigned lens.
Automated adversarial review by Gadfly. Advisory only — does not block merge. · ⏱️ reviewed in 12s