
The LLM-centered agentic AI revolution has fostered the popular belief that in AI, too, “bigger means better”. However, a contrarian alternative, Small Language Models (SLMs), has been steadily gaining mindshare. Dismissing the rise of SLMs as a mere trend would be a gross understatement; they may well be a game-changer, as reflected in the title of NVIDIA’s research paper, “Small Language Models are the Future of Agentic AI”.
Further momentum for SLMs comes from trailblazers such as Arcee.ai, who are boldly betting everything on SLMs. Armed with strong financial backing, Arcee is on a mission to drive SLM adoption among enterprises. NVIDIA’s arguments now strengthen the case further in favour of smarter, leaner, and more efficient models.
What are Small Language Models?
SLMs are the smaller counterparts of LLMs: language models that can fit on common consumer devices and run inference with latency low enough to practically serve a single user’s agentic requests. The size that defines an SLM is still debated; NVIDIA suggests a limit of under 10B parameters, while Arcee.ai leans towards anything below 70B. Like most thresholds in computing, this one will keep shifting year by year.
Why Small Language Models? Are LLMs not enough?
The outstanding aspect of SLMs, which are trained on much smaller and more targeted datasets, is their ability to outperform larger, generalist LLMs on the functional demands of agentic workflows.
For example, Arcee’s 70B-parameter model, SuperNova, outperforms GPT-4, reportedly a 1.8T-parameter model, on instruction-following. Similarly, Microsoft’s Phi-3 small (7B) achieves language understanding and commonsense reasoning on par with, and code generation scores up to the level of, 70B models of the same generation.
SLMs receive targeted training that reduces errors and improves problem-solving skills. While LLMs are versatile and excel at open-ended conversation, most tasks in agentic workflows are repetitive and narrowly scoped, which calls for models that are efficient, reliable, and cost-effective. In this context, SLMs are not merely sufficient; they are often the better choice.
SLMs offer many advantages
- Lower latency due to smaller model size, enabling faster real-time responses. Advances in on-device inference systems allow SLMs to run locally on consumer-grade GPUs, enabling real-time, offline agentic inference with lower latency and stronger data control (see the first sketch after this list).
- Reduced memory and computational requirements make them suitable for deployment in resource-constrained environments or on-device applications. Serving a 7B SLM is 10–30X cheaper (in latency, energy consumption, and FLOPs) than a 70–175B LLM.
- Significantly lower operational costs, as training and running SLMs consume less energy and require less expensive hardware. Fine-tuning an SLM takes only a few GPU-hours, so behaviors can be added, fixed, or specialized overnight rather than over weeks (see the second sketch after this list).
- Better security and privacy control since SLMs can be deployed on-premises or in private clouds, reducing data exposure risks.
- Efficiency and predictability in repetitive tasks common in agentic workflows, where general conversational skills of LLMs are less important.
- On top of competitive off-the-shelf performance, the reasoning capabilities of SLMs can be enhanced at inference time with self-consistency, verifier feedback, or tool augmentation (see the third sketch after this list). For example, Toolformer (6.7B) outperforms GPT-3 (175B) by learning to call APIs, and 1-3B models have rivaled 30B+ LLMs on math problems via structured reasoning.
- A particularly significant benefit of choosing SLMs over LLMs is the democratization of AI agents it enables. As more people and organizations gain the ability to build language models and deploy them in agentic systems, the overall pool of agents is more likely to reflect a broader variety of viewpoints and requirements. That diversity helps lower the risk of systemic biases.
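To make the first point concrete, here is a minimal sketch of running a quantized SLM fully on-device with the llama-cpp-python bindings. The GGUF file path, the chat prompt, and the sampling settings are illustrative assumptions, not a recipe from the paper.

```python
# Minimal sketch: local, offline SLM inference on a consumer GPU.
# Assumes llama-cpp-python is installed and a quantized GGUF checkpoint
# has been downloaded (the file name below is a hypothetical placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="./phi-3-mini-4k-instruct-q4.gguf",  # hypothetical local path
    n_gpu_layers=-1,   # offload all layers to the local GPU
    n_ctx=4096,        # context window
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You turn user requests into a single tool call."},
        {"role": "user", "content": "Book a meeting with Dana for 3pm tomorrow."},
    ],
    max_tokens=128,
    temperature=0.2,   # agentic steps favour near-deterministic output
)
print(response["choices"][0]["message"]["content"])
```

Nothing here leaves the machine: the model weights, the prompt, and the output all stay local, which is what gives the latency and data-control benefits described above.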
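The overnight-specialization claim rests on parameter-efficient fine-tuning. Below is a hedged sketch using the Hugging Face transformers, datasets, and peft libraries with LoRA adapters; the base model, the agent_traces.jsonl file, the target modules, and every hyperparameter are illustrative assumptions rather than the article’s actual setup.

```python
# Minimal sketch: specialize an SLM with LoRA adapters in a few GPU-hours.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "microsoft/Phi-3-mini-4k-instruct"   # assumed open-weight SLM
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Attach small low-rank adapters; only a tiny fraction of weights are trained.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["qkv_proj", "o_proj"],  # assumed attention modules
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Tokenize a small task-specific corpus (placeholder dataset of agent traces).
data = load_dataset("json", data_files="agent_traces.jsonl", split="train")
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-agent-lora", num_train_epochs=2,
                           per_device_train_batch_size=4, learning_rate=2e-4,
                           fp16=True, logging_steps=50),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                           # hours on a single consumer card, not weeks
model.save_pretrained("slm-agent-lora")   # saves only the adapter weights
```

Because only the adapters are trained and stored, a team can keep one base SLM and maintain a separate lightweight adapter per agentic skill.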
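Finally, a minimal sketch of the simplest inference-time enhancement named above, self-consistency: sample several reasoning paths from an SLM and majority-vote the final answer. The model name, the toy question, and the answer-extraction regex are assumptions for illustration only.

```python
# Minimal sketch: self-consistency with an SLM.
import re
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "microsoft/Phi-3-mini-4k-instruct"   # assumed instruction-tuned SLM
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16,
                                             device_map="auto")

prompt = ("Q: A warehouse ships 48 boxes a day for 5 days, then 30 boxes a day "
          "for 2 days. How many boxes in total? Think step by step and finish "
          "with 'Answer: <number>'.\nA:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample several diverse chains of thought.
outputs = model.generate(**inputs, do_sample=True, temperature=0.8, top_p=0.95,
                         num_return_sequences=8, max_new_tokens=256,
                         pad_token_id=tokenizer.eos_token_id)

answers = []
for seq in outputs:
    text = tokenizer.decode(seq[inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
    match = re.search(r"Answer:\s*(-?\d+)", text)
    if match:
        answers.append(match.group(1))

# The most frequent final answer wins (here: 48*5 + 30*2 = 300).
print(Counter(answers).most_common(1))
```

The same voting loop works with any generation backend; replacing the majority vote with a scoring model gives the verifier-feedback variant mentioned in the bullet above.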
Then why aren’t agents using SLMs?
There is a mixed bag of reasons why SLM adoption has not gone viral yet. Factors such as limited awareness of their benefits among potential users and the complexity of implementation for businesses contribute to the hesitance in fully embracing this approach.
Beyond these practical reasons, there may also be a David-vs-Goliath story at play. It is important to recognize the massive investments already put into building centralized LLM infrastructure, into marketing it as the leading paradigm for agentic AI, and into the community of developers, tools, and products that has grown around it. Much of the work on SLM design and development still follows LLM design, aiming at the same generalist benchmarks. However, on benchmarks that assess the actual utility of agents, SLMs easily outperform the larger models.
In essence, the reasons for slow SLM adoption are practical and strategic rather than any fundamental flaw in the SLM technology itself.
The Road Ahead
SLMs are becoming more important as users look for models that run well across a range of form factors. The community keeps producing efficient SLMs, each pushing the boundaries of what is possible, and I plan to cover several of them on this blog. Continued research and innovation in small language models holds great promise for the future of artificial intelligence, fostering creativity and making these capabilities easier to access.
Coming up next: The Art of Distillation
Citations
- Small Language Models are the Future of Agentic AI (NVIDIA Research)
- Why SLMs