Bytecairn

Archive

Writing

Everything I've published — on gaming, displays, AI, hardware, programming, and agents. The hype, examined. Filter by tag, or just scroll.

12 pieces

2026

12

Business & Regulation | Jun 10, 2026 | 26 min
Enterprise AI Audits Find Zero Inference in Six Platforms
Enterprise AI audits now expose legacy rule engines masquerading as intelligent platforms, triggering regulatory fines and sustained market declines.

Backend Engineering | Jun 10, 2026 | 13 min
Laravel Scout Drops Meilisearch Task IDs
Tags: laravel, meilisearch, php, json-serialization, async-tasks

Embedded Systems / AI Infrastructure | Jun 9, 2026 | 13 min
TinyML Toolchains Block Production, Not Compression
Quantization solves model size, but fragmented TinyML toolchains still block production until engineers adopt a unified build system for reliable edge AI deployments.

Edge Architecture | Jun 8, 2026 | 17 min
When does Cloudflare Workers KV’s eventual consistency break your stateful logic?
A field-tested routing model that separates distributed caching from atomic state, so you stop overprovisioning Durable Objects and avoid silent KV corruption.

AI Hardware & Infrastructure | Jun 7, 2026 | 10 min
VRAM capacity and memory bandwidth, not raw compute, dictate which 2026 models actually run locally
The industry’s shift toward Mixture-of-Experts architectures and 128K context windows has turned GPU memory into a hard ceiling, forcing users to choose between aggressive quantization, slower unified memory, or NVIDIA’s $2,000+ 32GB cards.

AI | Jun 7, 2026 | 7 min
Stop Guessing GGUF Quants: A VRAM-to-Precision Lookup Table for Local LLMs
Consumer GPUs are bandwidth-bound, not precision-bound. Here’s the exact VRAM-to-quant lookup table that maximizes tokens/sec without crossing the perceptible quality threshold.

AI | Jun 6, 2026 | 6 min
Ollama Isn't a Competitor to vLLM (And Neither Is llama.cpp)
Stop comparing local LLM engines on tokens per second. Pick the one that matches your actual bottleneck: setup friction, KV-cache scheduling, or memory bandwidth.

AI | Jun 6, 2026 | 6 min
RTX Spark: 128GB Unified Memory Won't Fix the Bandwidth Bottleneck
NVIDIA's RTX Spark packs 128GB of unified memory, but ~300 GB/s bandwidth caps inference throughput. Here's the math on what you can actually run locally versus the cloud.

Hardware | May 28, 2026 | 7 min
Flashing custom firmware on an ESP32 without bricking it
A careful, repeatable workflow for replacing stock firmware, with a recovery path for every step.

AI | May 20, 2026 | 6 min
A practical mental model for LLM context windows
Stop thinking of the context window as memory. Think of it as a desk: finite, and everything competes for space.

Programming | May 11, 2026 | 8 min
Self-hosting product analytics with open source
You do not need a SaaS subscription to understand your users. A small stack gets you most of the way.

Programming | May 2, 2026 | 5 min
What actually changed in the React compiler
Automatic memoisation sounds like magic. Here is the unglamorous reality of what it does and does not do.