
Why the future of agentic AI is not bigger – it is smaller

by Aveni | 29/07/2025

Aveni Labs has been building a new kind of foundation model: one designed specifically for financial services. FinLLM is our answer to the sector’s call for trusted, auditable and high-performance language intelligence. But not one trained to do everything, everywhere, all at once. That kind of ambition belongs to the generalist LLM.

A new paper out of NVIDIA Research, Small Language Models are the Future of Agentic AI, makes a clear, compelling case that the future of AI agents will not be powered by monolithic, trillion-parameter behemoths. It will be small, sharp and modular. And we agree.

Let us talk about why this matters, and how it reflects the design principles behind FinLLM.

The overuse of large models in a world of small tasks

The agentic AI boom has led to a tidal wave of LLM-first architectures – build agents, hook them into APIs, and let a generalist model reason through arbitrary tasks. But most agentic workloads do not actually require “general intelligence.” They require accuracy, reliability, and speed, often within a narrow functional domain.

What NVIDIA’s paper argues, and what we are seeing in practice, is that small language models (SLMs) are better positioned for the majority of tasks within an agentic system. Whether it is extracting data, summarising documents, validating financial statements, or generating templated output, these are scoped, deterministic, and repeatable actions. You do not need a GPT-4 for that. You need a fast, fine-tuned specialist.
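To make “scoped, deterministic, and repeatable” concrete, here is a minimal Python sketch of one such task: a fixed prompt template, a rigid output contract, and a validation step that catches any drift. It is illustrative only; call_slm is a hypothetical stand-in for whichever fine-tuned small model serves the task, not a FinLLM API.

```python
import json
from typing import Callable

# The task is scoped by design: fixed prompt template, fixed output contract.
PROMPT_TEMPLATE = (
    "Extract the counterparty, notional amount and currency from the passage "
    "below. Respond with JSON only, using exactly the keys "
    '"counterparty", "notional", "currency".\n\nPassage:\n{passage}'
)
REQUIRED_KEYS = {"counterparty", "notional", "currency"}

def extract_trade_details(passage: str, call_slm: Callable[[str], str]) -> dict:
    """Run one scoped extraction task and enforce its output contract.

    call_slm is whatever fine-tuned small model serves this task; because
    the format is rigid, off-format output is rejected here, not downstream.
    """
    raw = call_slm(PROMPT_TEMPLATE.format(passage=passage))
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(data, dict) or REQUIRED_KEYS - data.keys():
        raise ValueError(f"output violates the task schema: {raw!r}")
    return data
```

Because the task contract never changes, the same validation runs on every call, which is exactly what makes a small specialist both testable and cheap to swap out.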

Why smaller is better, especially in finance

Financial workflows are a natural fit for the shift from LLMs to SLMs:

  • Performance where it counts: The best SLMs today (e.g. Phi-3, Hymba, RETRO) match or exceed the performance of older 30–70B models on key reasoning, instruction-following and tool-use benchmarks. That is more than enough for the kinds of structured, format-sensitive outputs financial agents require.
  • Auditability and alignment: SLMs can be fine-tuned to follow rigid formats and behaviours, vital in an environment where hallucinations are unacceptable and every output may have regulatory implications (see the audit-logging sketch after this list).
  • Edge compatibility and data sovereignty: Running SLMs locally or in private environments becomes feasible, even on commodity hardware. That opens up options for hybrid deployments and improved control over sensitive data.
  • Speed and cost: In regulated sectors, inference latency and operating costs are not trivial line items. SLMs deliver 10–30× efficiency gains in FLOPs and energy usage compared to frontier LLMs, according to NVIDIA’s benchmarks.
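On the auditability point, the sketch below shows how little machinery the discipline actually requires. This is our illustration, not FinLLM code: each inference is reduced to a traceable record, with a hash of the input, a pinned model identifier and the validation outcome, appended to an append-only log.

```python
import hashlib
import json
import time

def audit_record(model_id: str, prompt: str, output: str, valid: bool) -> dict:
    """One traceable record per inference: which model answered, on what
    input, and whether the output passed schema validation."""
    return {
        "timestamp": time.time(),
        "model_id": model_id,  # pinned model name and version
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output": output,
        "passed_validation": valid,
    }

def append_audit_log(record: dict, path: str = "audit.jsonl") -> None:
    """Append-only JSON Lines log, one line per inference."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

Logs like these do double duty: they satisfy the traceability requirement, and they accumulate exactly the labelled examples continual fine-tuning needs.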

FinLLM is built for this moment

We did not set out to build yet another generalist foundation model. From day one, FinLLM has been designed around a different set of constraints: domain specialism, operational transparency, and composability. The architecture we are pursuing aligns closely with the SLM-first agentic model this paper promotes.

Our approach:

  • Leverages task-specific SLMs, trained from scratch or distilled from larger models, then fine-tuned for high-accuracy, low-latency performance on core financial NLP tasks.
  • Builds modular agents that route requests intelligently, reserving LLMs only for ambiguous or open-ended reasoning, not routine actions (a routing sketch follows this list).
  • Incorporates feedback loops and logging infrastructure that support continual fine-tuning and behavioural alignment, mirroring the LLM-to-SLM conversion pipeline proposed in the paper.
  • Prioritises deployment flexibility, so that firms can run FinLLM modules on-prem, in private cloud or hybrid environments, without vendor lock-in.
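Here is the routing sketch promised above, again illustrative rather than a description of FinLLM internals: scoped intents map to registered specialist SLMs, and only requests that match nothing escalate to a generalist LLM.

```python
from typing import Callable, Optional

# Hypothetical registry of scoped intents -> specialist SLM callables.
# In a real deployment each entry wraps a fine-tuned small model.
SPECIALISTS: dict[str, Callable[[str], str]] = {}

def register_specialist(intent: str, fn: Callable[[str], str]) -> None:
    SPECIALISTS[intent] = fn

def classify_intent(request: str) -> Optional[str]:
    """Toy router: keyword match. A production router would be a small
    trained classifier, itself an SLM."""
    for intent in SPECIALISTS:
        if intent in request.lower():
            return intent
    return None  # nothing scoped matched: treat as open-ended

def handle(request: str, call_llm: Callable[[str], str]) -> str:
    intent = classify_intent(request)
    if intent is not None:
        return SPECIALISTS[intent](request)  # cheap, fast, auditable path
    return call_llm(request)  # rare escalation to the generalist model
```

Swappability falls out of the registry: retraining or distilling the model behind one intent replaces a single entry without disturbing the rest of the system.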

The strategic case for SLMs in the enterprise

If you are a Chief AI Officer weighing infrastructure choices, here is the bottom line: sticking with centralised, general-purpose LLMs as your agent backbone is becoming an increasingly inefficient default. It is a model built for maximum versatility, but often at the cost of alignment, controllability and cost-efficiency.

In contrast, an SLM-first approach, one that favours multiple small, specialised, swappable components, delivers better economics, greater agility, and a clearer path to compliance and traceability.

At Aveni, we are leaning into this shift. We are building FinLLM not as a single static model, but as a system of expert models designed to work in concert, reflecting the actual architecture of financial services workflows.

Where this goes next

The industry is still early in its transition to agentic systems. But as workloads scale, budgets tighten, and regulation looms, we believe the questions of what powers your agents and how you scale them will become central to any enterprise AI strategy.

The NVIDIA paper is a shot across the bow of the LLM-first orthodoxy. It confirms much of what we have already seen in practice: in most real-world agent use cases, smaller is smarter.

If you are exploring how to make that transition toward modular, specialised, and cost-effective agentic AI, we would love to talk.

Read the original NVIDIA paper: Small Language Models are the Future of Agentic AI.