What question did this study set out to answer?

This study aims to categorize and evaluate various ML-based autoscaling methods in cloud applications.

March 18, 2026 · Open Access

ML-Based Autoscaling for Elastic Cloud Applications: Taxonomy, Frameworks, and Evaluation

Vishwanath Srikanth Machiraju, Microsoft (India); Vijay Kumar, Dr. B. R. Ambedkar National Institute of Technology Jalandhar; Sahil Sharma, University of Ulster

Key Points

Analyzed 60 primary studies from 2015 to 2025
Classified approaches into a five-dimensional taxonomy
Examined supervised, unsupervised, and reinforcement learning methods
Synthesized common evaluation practices and identified challenges
Identified fragmentation in current ML-based autoscalers across platforms
Highlighted challenges like actuation delays and telemetry lag
Outlined key areas for further research in unified orchestration and sustainability-aware autoscaling

Abstract

Elastic cloud systems increasingly employ machine learning (ML) to automate resource scaling in response to variable workloads and stringent service-level objectives (SLOs). However, current ML-based autoscalers are fragmented across platforms, objectives, and evaluation frameworks. This survey examines 60 primary studies from 2015 to 2025 and categorizes them according to a five-dimensional taxonomy: goal, decision logic, scaling mode, control scope, and deployment. It classifies supervised, unsupervised, and reinforcement learning approaches and analyzes their integration into practical frameworks, including Kubernetes-based controllers and cloud provider services. It summarizes how machine learning is applied to workload prediction, proactive and hybrid horizontal–vertical scaling, and adaptive policy optimization. It also synthesizes common evaluation practices, covering workloads, metrics, and benchmarks. The analysis identifies ongoing challenges: actuation delays and telemetry lag, the intricacies of hybrid scaling, coordination across multi-service and edge-cloud deployments, and the limited joint consideration of cost, SLO, and energy objectives. These gaps call for further research on unified ML-driven orchestration, multi-agent and federated control, standardized benchmarks, and sustainability-aware autoscaling.
