The rapid growth of multi-tenant Software-as-a-Service (SaaS) environments has increased the demand for intelligent, scalable, and autonomous cloud infrastructure management. Classical rule-based orchestration and reactive auto-scaling systems struggle to support dynamic workloads, heterogeneous resources, and complex service-level objectives (SLOs). This paper proposes an AI-agent framework for autonomous cloud orchestration, designed to optimize resource provisioning, workload placement, fault tolerance, and service scaling in multi-tenant SaaS settings. The proposed approach combines reinforcement learning, generative AI models, and self-adaptive control across several layers of the cloud: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and container orchestration systems such as Kubernetes. The intelligent agents continuously monitor system telemetry, forecast changes in workload, detect anomalies, and automatically scale compute, storage, and network resources to ensure performance isolation and cost-effectiveness across tenants. The framework also includes explainable decision modules, which increase the transparency of orchestration policies and build trust in automated cloud operations. Experimental results show that the AI-agent-based autonomous orchestration model achieves better scalability, fewer SLA violations, lower power consumption, and faster fault recovery than traditional threshold-based and heuristic orchestration models, enabling highly scalable and resilient SaaS systems.
Bijal Tayade (Mon,) studied this question.