What question did this study set out to answer?

The aim is to develop a framework that orchestrates SLMs and LLMs effectively while optimizing performance and minimizing costs.

March 3, 2026 · Open Access

Adaptive agentic meta-controller (AAMC): A deep reinforcement learning framework for intelligent SLM/LLM orchestration

Key points

Developed the Adaptive Agentic Meta-Controller (AAMC) using deep reinforcement learning.
Designed a Task Complexity Estimator (TCE) and an RL-based Router (RLR).
Conducted experiments in a high-fidelity simulation environment to assess performance, cost, and latency.
Achieved a task success rate similar to an LLM-only approach.
Reduced operational costs by over 70%.
Significantly improved inference latency over the LLM-only approach.

Abstract

Agentic AI systems have proliferated rapidly, dominated by Large Language Models (LLMs), whose substantial operational costs and high latency present significant barriers to widespread adoption. In response, the research community has increasingly turned to Small Language Models (SLMs), which offer a compelling combination of efficiency, task-specificity, and cost-effectiveness. This paper introduces the Adaptive Agentic Meta-Controller (AAMC), a deep reinforcement learning (RL) framework designed for intelligent SLM/LLM orchestration. The AAMC transforms the model selection problem into a principled, multi-objective optimization task, learning a dynamic policy that routes each query to the most appropriate model: it prefers SLMs for routine tasks and escalates to LLMs only when necessary. Our framework features a Task Complexity Estimator (TCE) and an RL-based Router (RLR) that collaboratively balance the trade-offs between performance, cost, and latency. We conduct extensive experiments in a high-fidelity simulation environment, demonstrating that the AAMC achieves a task success rate comparable to an LLM-only approach while reducing operational costs by over 70% and significantly improving inference latency. We further introduce a comprehensive set of experiments on robustness, scalability, and fairness, including new ablation studies on the impact of the TCE and the sensitivity to user preferences, alongside a detailed complexity analysis and a discussion of real-world deployment, including an enterprise deployment case study. We also release code and experiments to support reproducibility.
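To make the routing idea in the abstract concrete, here is a minimal sketch and not the paper's implementation. It assumes a weighted-sum reward over success, cost, and latency, and it swaps the deep RL router for a tabular epsilon-greedy bandit over complexity buckets. The names `reward`, `BanditRouter`, `simulate`, and `train` are hypothetical, as are the weights and the toy environment, and the complexity score is assumed to arrive from a TCE-like estimator as a number in [0, 1).

```python
import random
from collections import defaultdict

MODELS = ("slm", "llm")


def reward(success, cost, latency, w_cost=0.5, w_latency=0.2):
    """Scalarized multi-objective reward (assumed weighted-sum form)."""
    return float(success) - w_cost * cost - w_latency * latency


class BanditRouter:
    """Epsilon-greedy stand-in for an RL router, keyed on complexity buckets."""

    def __init__(self, n_buckets=4, epsilon=0.1, lr=0.1, seed=0):
        self.n_buckets = n_buckets
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        # (bucket, model) -> running estimate of the reward
        self.q = defaultdict(float)

    def bucket(self, complexity):
        # Discretize a complexity score in [0, 1] into n_buckets bins.
        return min(int(complexity * self.n_buckets), self.n_buckets - 1)

    def choose(self, complexity, explore=True):
        b = self.bucket(complexity)
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(MODELS)
        return max(MODELS, key=lambda m: self.q[(b, m)])

    def update(self, complexity, model, r):
        key = (self.bucket(complexity), model)
        # Move the estimate a fraction lr toward the observed reward.
        self.q[key] += self.lr * (r - self.q[key])


def simulate(model, complexity):
    """Toy environment (assumption): returns (success, cost, latency).

    The SLM is cheap and fast but only solves tasks below 0.5 complexity;
    the LLM always succeeds at ten times the normalized cost and latency.
    """
    if model == "slm":
        return complexity < 0.5, 0.1, 0.1
    return True, 1.0, 1.0


def train(router, steps=2000, seed=1):
    rng = random.Random(seed)
    for _ in range(steps):
        complexity = rng.random()  # stand-in for a TCE score
        model = router.choose(complexity)
        success, cost, latency = simulate(model, complexity)
        router.update(complexity, model, reward(success, cost, latency))
    return router


if __name__ == "__main__":
    router = train(BanditRouter())
    for score in (0.1, 0.4, 0.6, 0.9):
        print(score, router.choose(score, explore=False))
```

Under these toy assumptions the learned policy keeps low-complexity queries on the SLM and escalates high-complexity ones to the LLM. Raising `w_cost` or `w_latency` shifts that boundary toward the SLM, which is one simple way to picture sensitivity to user preferences.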
