• Automatic model-based load test generation for elastic microservice applications.
• Models explicitly couple application and autoscaler dynamics.
• Optimization identifies failure-inducing workload traces offline.
• Framework supports multiple objectives and workload scenarios.
• Generated tests expose performance violations in real microservice applications.

Microservice applications are required to consistently guarantee Service-Level Agreements (SLAs) under fluctuating workloads, a challenge commonly addressed through autoscaling mechanisms. However, the effectiveness of an autoscaler strongly depends on the workload scenario, and validating robustness across diverse workload conditions remains an open problem. To address this, we propose an offline model-based framework that automatically generates load test traces designed to expose performance failures in elastic microservice applications. The system under test is modeled as a closed-loop dynamical system where the microservice application and the autoscaler are explicitly coupled. Specifically, we encode both components as piecewise affine functions, allowing a wide set of applications and autoscalers to be captured. Test generation is framed using a falsification approach and solved as a mixed-integer linear program, eliminating the need for manual configuration or real system interactions during test generation. The generated test cases are designed to cause SLA violations, uncovering critical workload scenarios that may be overlooked by existing approaches. We evaluate the framework on both a realistic benchmark microservice application and a population of randomly generated systems, demonstrating that the generated traces consistently induce performance failures in real deployments. Furthermore, we show that the method generalizes across different autoscaling policies and workload patterns, producing valid test traces within short time intervals.
These experiments highlight both the effectiveness of the approach in exposing performance violations and its applicability to diverse autoscaling configurations. Finally, we discuss and compare alternative approaches for load test generation.
Zamponi et al. (Sun,) studied this question.
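To make the closed-loop falsification idea from the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-queue application whose backlog evolves as a piecewise affine function, a threshold-based autoscaler that adds or removes one replica per step, and an SLA expressed as a backlog limit. All constants and function names (`MU`, `SCALE_UP_AT`, `step`, `falsify`, and so on) are hypothetical. Exhaustive enumeration over a small set of workload levels stands in for the mixed-integer linear program, which is only practical at this toy scale.

```python
from itertools import product

# All values below are illustrative assumptions, not taken from the paper.
MU = 10.0             # requests one replica serves per time step
R_MIN, R_MAX = 1, 4   # replica bounds enforced by the autoscaler
R_INIT = 3            # replicas running when the test starts
SCALE_UP_AT = 15.0    # add a replica when backlog exceeds this
SCALE_DOWN_AT = 2.0   # remove a replica when backlog drops below this
SLA_BACKLOG = 40.0    # SLA proxy: backlog must never exceed this


def step(backlog, replicas, workload):
    """One closed-loop step: application update, then autoscaler decision."""
    # Application dynamics: piecewise affine because of the max with zero.
    next_backlog = max(0.0, backlog + workload - MU * replicas)
    # Autoscaler: piecewise constant rule driven by the new backlog.
    if next_backlog > SCALE_UP_AT:
        next_replicas = min(R_MAX, replicas + 1)
    elif next_backlog < SCALE_DOWN_AT:
        next_replicas = max(R_MIN, replicas - 1)
    else:
        next_replicas = replicas
    return next_backlog, next_replicas


def peak_backlog(trace, backlog=0.0, replicas=R_INIT):
    """Simulate a workload trace offline and return the largest backlog seen."""
    peak = backlog
    for workload in trace:
        backlog, replicas = step(backlog, replicas, workload)
        peak = max(peak, backlog)
    return peak


def falsify(levels, horizon):
    """Brute-force search for the trace that maximizes peak backlog."""
    best = max(product(levels, repeat=horizon), key=peak_backlog)
    return list(best), peak_backlog(best)


if __name__ == "__main__":
    trace, peak = falsify(levels=(5, 20, 35), horizon=6)
    print("worst trace:", trace)
    print("peak backlog:", peak, "| SLA violated:", peak > SLA_BACKLOG)
    print("constant peak load:", peak_backlog([35] * 6))
```

Under these assumed parameters, holding the workload at its maximum level never breaches the backlog limit, whereas the search returns a trace that first lets the autoscaler scale in and then bursts. This is the kind of scenario that only appears when the application and autoscaler are modeled together.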