// llama-swap/internal/router/loading_remarks.go

package router

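// loadingRemarks is a pool of tongue-in-cheek status messages,
// apparently intended for display while a model is loading.
var loadingRemarks = []string{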
	"Still faster than your last standup meeting",
	"Reticulating splines",
	"Waking up the hamsters",
	"Teaching the model manners",
	"Convincing the GPU to participate",
	"Loading weights (they're heavy)",
	"Please enjoy this elevator music in your head",
	"Pretending to be productive",
	"Reading the entire internet, page by page",
	"Staring at the abyss, the abyss is buffering",
	"Applying layer after layer of disembodied cognition",
	"Remembering everything it forgot during quantization",
	"Counting to 405 billion, one parameter at a time",
	"Summoning the stochastic parroting",
	"Hold on, the GPU is questioning its existence",
	"Deciding which facts to hallucinate today",
	"Untangling the transformer spaghetti",
	"Warming up the token soup",
	"Your prompt is in a queue, behind 7 billion other thoughts",
	"Running `sudo apt-get install intelligence`",
	"Defragmenting the latent space",
	"Polishing each matrix multiplication by hand",
	"Whispering sweet nothings to the attention heads",
	"Aligning with human values, one reluctant epoch at a time",
	"The model is thinking about what it's about to think about",
	"Loading... and by loading we mean making you wait",
	"Spinning up the cloud GPU, please be patient while we burn your credits",
	"Applying duct tape to the context window",
	"Bribing the GPU scheduler for a timeslice",
	"Would you like to hear a fun fact while we load? Too bad.",
	"Hot swapping your sanity for an LLM",
	"Compressing optimism into FP16",
	"Ignoring 90% of the attention to save you 50% of the time",
	"Counting the exact same thing three times just to be sure",
	"Sorry, the inference you have reached is not in service",
	"Rotating the positional encodings counterclockwise for good luck",
	"Your call is very important to us. Please continue to hold.",
	"Unpacking the blobs. All 300GB of them.",
	"Initializing the thing that initializes the other thing",
	"Converting electricity into existential dread",
	"Flattening the curve... wait, the tensor. Flattening the tensor.",
	"Fetching the fetch of a fetch, callback hell edition",
	"The GPU is at 100%. The fan is now a helicopter.",
	"Baking the weights at 350° for a golden-brown inference",
	"Recalibrating the confidence of things it's still wrong about",
	"Have you tried turning it off and on again? No? Good, wait here.",
	"Simulating deep thought by pausing dramatically",
	"Loading the model that knows more than you but still can't count r's in 'strawberry'",
	"Convincing CUDA to cooperate. This may take a while.",
	"VRAM: 23.9GB used of 24GB. Living on the edge.",
	"Processing your request with the urgency of a DMV employee",
	"This model was trained on the entire internet, including that embarrassing blog you wrote in 2008",
	"Dispatching tokens through a series of increasingly confused matrix multiplies",
	"Gently lowering your expectations",
	"Applying softmax to our feelings about this load time",
	"Autoregressively generating disappointment, one token at a time",
	"The magic is happening. Somewhere. Probably.",
	"Synchronizing the parallel processes that run in parallel but really don't",
	"Calculating the meaning of life. Spoiler: it's 42, but we're double-checking.",
	"Loading... just like it said 30 seconds ago. And will say 30 seconds from now.",
	"Pre-warming the cache so the first query is only slightly slower than the rest",
	"Have you considered that maybe your question wasn't worth all this compute?",
	"Downloading more RAM (no, really, we're mmap-ing the weights)",
	"Translating your prompt into math it barely understands",
	"Estimating your time remaining with 0% accuracy",
	"Buffering enthusiasm",
	"Model is loading. Go make some coffee. Or a three-course meal.",
	"Tokenizing the dictionary, filing a grievance on behalf of 'antidisestablishmentarianism'",
	"Polling for readiness in a loop that would make your CS professor weep",
	"Performing percussive maintenance on the attention mechanism",
	"This loading screen is singlehandedly reversing climate progress",
	"Decompressing the hopes and dreams of thousands of underpaid labelers",
	"Filling the key-value cache with the ghost of prompts past",
	"Currently at step 3 of 9,742 of loading. We'll get there. Eventually.",
	"If you stare at the spinner, it spins slower. It's science.",
	"Multiplying matrices with the enthusiasm of a teenager doing chores",
	"Applying `torch.nap()` until the model feels refreshed",
	"Reacquainting the model with the concept of 'facts' it forgot during fine-tuning",
	"Sorry for the wait. No, wait, we're not actually sorry.",
	"Your GPU is now a space heater with a side hustle in linear algebra",
	"Allocating memory like a billionaire allocates tax avoidance strategies",
	"The model saw \"As an AI language model\" and won't stop saying it now",
	"Installing dependencies you didn't know existed and will never use again",
	"Re-reading 'Attention Is All You Need' for the 400th time",
	"Convincing the embedding layer that context is overrated",
	"Manually untangling the residual connections with a tiny comb",
	"On hold with the cloud provider trying to explain why 8 H100s aren't enough",
	"Adjusting temperatures: model is 0.7, server room is 104°F",
	"Please hold while we justify this electricity bill to accounting",
	"Stacking decoder blocks like a Jenga tower at a LAN party",
	"Compensating for your lack of patience with our lack of speed",
	"This is a loading screen comment. Loading screens have comments now. Welcome to the future.",
	"Processing the entire works of Shakespeare backwards just in case",
	"The model is loading slower than your last `npm install`",
	"Rehearsing plausible-sounding explanations for why it got everything wrong",
	"Populating the context with filler while you wait for actual content",
	"Optimizing for BLEU score, which definitely correlates with making you laugh",
	"Generating an embedding for each and every letter of the alphabet, individually",
	"Coming soon: llama-swap v2 with actual performance improvements. Probably.",
	"Loading a model larger than your attention span",
	"Performing a seance to invoke the spirit of Geoff Hinton",
	"Did you know loading screens were invented to prevent users from smashing their monitors? Now you do.",
	"Converting all the internet's bad opinions into a surprisingly useful autocomplete",
	"Laying down each layer with the care of a Michelin-starred pastry chef",
	"Checking if the model still thinks birds are government drones. Yep.",
	"Activating the neurons responsible for 'I cannot assist with that request'",
	"This model was trained on the same internet that brought you Rickrolling. You're welcome.",
	"Realigning the alignment so it aligns with the previous alignment",
	"Running `nvidia-smi` and sighing heavily",
	"If you close your eyes, the loading bar moves faster. Proven by science.",
	"EULA said 'by using this software you agree to wait forever' and you clicked Accept",
	"Zipping the GPUs to make them go faster",
	"Padding the context window with existential padding",
	"We could have used a smaller model but someone wanted 'quality'",
	"Disentangling the latent space into something resembling coherence",
	"Slow is smooth, smooth is fast, but this is just slow",
	"Memory-mapping like it's a AAA title from 2012",
	"Your patience has been tokenized and added to the training set. Thank you for your contribution.",
	"Loading is CPU-bound and your CPU is busy regretting its life choices",
	"Exploring the high-dimensional manifold of ways to say 'just a moment'",
	"The model is experiencing a brief but intense moment of imposter syndrome",
	"Initializing 7B parameters by rolling 7B 16-sided dice",
	"Panic! at the disk I/O",
	"Intelligence is loading... your definition of intelligence may vary",
	"This model was distilled. Unlike your patience, which is evaporating.",
	"Unzipping the model. It's a .gguf file, not a metaphor.",
	"Running inference on the concept of 'soon' to estimate remaining time",
	"Loading with all the speed of a government-funded IT project",
	"A blank terminal is a terrible thing to waste. Here's a loading message instead.",
}