AI
Ollama Isn't a Competitor to vLLM (And Neither Is llama.cpp)
Stop comparing local LLM engines on tokens per second. Pick the one that matches your actual bottleneck: setup friction, KV-cache scheduling, or memory bandwidth.
Archive
Tutorials, teardowns and field notes across software, firmware and the machines between them.
1 piece tagged #llm-inference