-
Rise of the Inference Stack
While language models promise to fundamentally change how AI is used across industries, serving them with low latency remains challenging, and inference is often slow even on expensive hardware. This inference problem has drawn many groups and organizations to build solutions for fast, efficient inference that works on various…
-
The shift from “large” to “lean” AI
The agentic AI revolution has popularized the idea that larger models are better, but Small Language Models (SLMs) are emerging as a powerful alternative. SLMs, smaller counterparts of large language models, run efficiently on consumer devices with lower latency and cost. They excel in agentic workflows by being more targeted, reliable, and energy-efficient than massive…