llama-swap

Author	SHA1	Message	Date
Benson Wong	a4b91e08cf	Changes and fixes before the release (docs/small tweaks) (#750) - update README.md with new docker instructions - update docs/configuration.md - update .github/workflows to have pinned action versions - gofmt events package - fix small bugs in CI scripts - reduce config options for internal/perf/monitor and config. A ring buffer is used to keep 1hr of entries at max 5s granularity. For long term stats use prometheus monitoring on /metrics. Fixes #744	2026-05-13 21:18:19 -07:00
rhtenhove	a01afe261b	ci: use manifest-aware cleanup action for multi-arch :cpu (#751) actions/delete-package-versions can't see OCI manifest lists. When the cpu build pushes a multi-arch image, the registry gets a tagged index plus one untagged per-platform manifest per arch. The cleanup step with `delete-only-untagged-versions: true` then deletes the per-platform children, leaving the index dangling: `docker pull ghcr.io/mostlygeek/llama-swap:cpu` 404s on the referenced sha. Swap to dataaxiom/ghcr-cleanup-action, which inspects tagged manifest lists first and excludes their children from deletion. Single-arch backends behave the same as before. Fix #746	2026-05-12 18:04:46 -07:00
rhtenhove	174e8562aa	Multi arch cpu (#746) Encountered a similar problem as in https://github.com/mostlygeek/llama-swap/issues/709 but in my case I only needed the :cpu version. So decided to add the github action to build arm64 combined with the amd64 version on the same :cpu tag. Already tested it from this fork: ghcr.io/rhtenhove/llama-swap:cpu and it works perfectly fine. Adding GPU support is a whole other beast, needing quite a bit more work and isn't something I can test.	2026-05-11 21:03:48 -07:00
pdscomp	181f71ca11	.github,docker: add cuda13 architecture support (#551) Add `cuda13` as a supported build architecture, targeting the `ghcr.io/ggml-org/llama.cpp:server-cuda13` upstream base image. The `server-cuda13` image ships with CUDA 13 libraries, providing improved performance on recent NVIDIA hardware compared to the existing `server-cuda` (CUDA 12) image. Users with newer GPUs (e.g., RTX 50-series) benefit from reduced model load latency and higher token throughput. - Add `cuda13` to the allowed architectures list in `docker/build-container.sh` - Add `cuda13` to the CI matrix in `.github/workflows/containers.yml` so the container is built and pushed automatically	2026-03-01 09:37:08 -08:00
Benson Wong	17e5263a76	.github/workflows: fix expired token in publishing images (#522) Fixes: #517	2026-02-14 10:06:05 -08:00
Benson Wong	bc01e6f539	build: add stable-diffusion server to musa and vulkan container images (#504) Add sd-server from stable-diffusion.cpp docker image for vulkan and musa containers. closes #450	2026-02-01 16:17:26 -08:00
Benson Wong	3edb180c08	ci: free up disk space before ROCm container build (#460)	2026-01-14 22:03:42 -08:00
Benson Wong	66d555e625	Improve container build reliability (#457) * docker: add .env usage in build-container.sh * .github,docker: add rocm, improve logging * .github,CLAUDE.md: fix workflow and update guidelines Update containers workflow to only push images when triggered manually or on schedule, not on workflow file changes. - add push trigger for workflow file changes in containers.yml - update push condition to skip on regular push events - update CLAUDE.md commit message guidelines * docker: remove comma in build-container.sh * .github,docker: improve container build workflow Add pagination support for fetching llama.cpp tags and improve debugging. - add build-container.sh to workflow trigger paths - implement fetch_llama_tag() with pagination support - replace .env with local testing instructions - add DEBUG_ABORT_BUILD flag for testing	2026-01-10 22:14:33 -08:00
Thammachart Chinvarapon	cc33b6c270	restore intel docker builds (#163)	2025-06-16 11:13:49 -07:00
Benson Wong	b2a891f8f4	Disable building of intel container until it's fixed upstream	2025-05-23 22:54:43 -07:00
Thammachart Chinvarapon	9548931258	ci: re-enabled intel build pipeline (#121)	2025-05-11 00:19:57 -07:00
Benson Wong	9667989727	Disabling intel container build since it's been broken for weeks.	2025-05-04 21:39:42 -07:00
Benson Wong	29cd98878d	better container build logic when upstream containers do not exist	2025-03-09 13:02:06 -07:00
Benson Wong	ebabe55ff3	Delete untagged packages after build and push (#55)	2025-02-18 10:32:32 -08:00
Benson Wong	41a338297c	deletion of untagged containers happen after build-and-push	2025-02-18 10:11:59 -08:00
Benson Wong	7e3353efeb	add action step to remove untagged containers	2025-02-18 10:08:41 -08:00
Benson Wong	4ed58fb173	update container build action	2025-02-18 09:59:06 -08:00
Benson Wong	0acfdb9f78	update workflow to build `cpu` and disable `musa`	2025-02-14 15:26:59 -08:00
Benson Wong	f20f2c9b7a	add docs and container build improvements #43	2025-02-14 12:20:07 -08:00
Benson Wong	7a97c38828	enable parallel container built #46	2025-02-14 11:04:33 -08:00
Benson Wong	4885132565	more permissions futzing	2025-02-14 11:02:15 -08:00
Benson Wong	8b46a0b7f1	grant package:write to container workflow #46	2025-02-14 10:55:30 -08:00
Benson Wong	1b6736ec6f	rename workflow for containers	2025-02-14 10:50:15 -08:00
Benson Wong	11d024bbaa	just build cuda while debugging	2025-02-14 10:48:06 -08:00
Benson Wong	d7e1bb9f7c	add GITHUB_TOKEN to container build env	2025-02-14 10:43:44 -08:00
Benson Wong	ab93460a8b	first container code (#52)	2025-02-14 10:39:25 -08:00

26 Commits