The rapid proliferation of agentic AI systems has been dominated by Large Language Models (LLMs), but their substantial operational costs and high latency present significant barriers to widespread adoption. In response, the research community has increasingly turned to Small Language Models (SLMs), which offer a compelling combination of efficiency, task-specificity, and cost-effectiveness. This paper introduces the Adaptive Agentic Meta-Controller (AAMC), a deep reinforcement learning (RL) framework designed for intelligent SLM/LLM orchestration. The AAMC casts model selection as a principled, multi-objective optimization task, learning a dynamic policy that routes each query to the most appropriate model, preferring SLMs for routine tasks and escalating to LLMs only when necessary. Our framework features a Task Complexity Estimator (TCE) and an RL-based Router (RLR) that together balance the trade-offs between performance, cost, and latency. We conduct extensive experiments in a high-fidelity simulation environment, demonstrating that the AAMC achieves a task success rate comparable to an LLM-only approach while reducing operational costs by over 70% and significantly lowering inference latency. We further introduce a comprehensive set of experiments on robustness, scalability, and fairness, including new ablation studies on the impact of the TCE and the sensitivity to user preferences, alongside a detailed complexity analysis and a real-world enterprise deployment case study. We also release code and experiments to support reproducibility.
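To make the routing idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a weighted-sum scalarization of performance, cost, and latency as the reward, and it substitutes a fixed complexity threshold for the learned RLR policy. The names `ModelProfile`, `scalarized_reward`, and `route`, along with every weight, cost, latency, and threshold value, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-query profile of a candidate model."""
    name: str
    cost: float     # assumed cost per query, arbitrary units
    latency: float  # assumed latency per query, seconds


def scalarized_reward(success: bool, model: ModelProfile,
                      w_perf: float = 1.0, w_cost: float = 0.5,
                      w_lat: float = 0.1) -> float:
    """Weighted-sum reward: reward task success, penalize cost and latency.

    The weights are illustrative stand-ins for user preferences.
    """
    return w_perf * float(success) - w_cost * model.cost - w_lat * model.latency


def route(complexity: float, slm: ModelProfile, llm: ModelProfile,
          threshold: float = 0.6) -> ModelProfile:
    """Threshold rule standing in for a learned routing policy.

    `complexity` is assumed to be a score in [0, 1] from a complexity estimator.
    """
    return llm if complexity >= threshold else slm


# Illustrative profiles; the numbers are assumptions, not measurements.
SLM = ModelProfile(name="slm", cost=0.1, latency=0.2)
LLM = ModelProfile(name="llm", cost=1.0, latency=1.5)

if __name__ == "__main__":
    for score in (0.2, 0.9):
        chosen = route(score, SLM, LLM)
        print(score, chosen.name, scalarized_reward(True, chosen))
```

Under these assumed numbers, a successful SLM answer earns a higher reward than a successful LLM answer, which is the pressure that would push a learned policy to escalate only when the SLM is likely to fail.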
Gaith Rjoub
Wei Wan
Shahed Bassam Almobydeen
Neurocomputing
University of Alberta
Polish Academy of Sciences
Concordia University
Rjoub et al. studied this question.
www.synapsesocial.com/papers/69a67dd6f353c071a6f09e01 — DOI: https://doi.org/10.1016/j.neucom.2026.133192