Bytecairn

Archive

Writing

Everything I've published — on gaming, displays, AI, hardware, programming, and agents. The hype, examined. Filter by tag, or just scroll.

1 piece tagged #llm-inference

2026 (1)

AI · Jun 6, 2026 · 6 min

Ollama Isn't a Competitor to vLLM (And Neither Is llama.cpp)

Stop comparing local LLM engines on tokens per second. Pick the one that matches your actual bottleneck: setup friction, KV-cache scheduling, or memory bandwidth.