Designing software systems within uncertain and evolving operational contexts is challenging, as pervasive uncertainty can compromise system objectives. Self-adaptive systems address this by dynamically adjusting configurations when objectives are at risk. However, exhaustive exploration of large configuration spaces is time-consuming and computationally expensive. Existing machine learning, search-based, and reinforcement learning approaches often degrade under changing conditions. To overcome these limitations, this paper introduces an architecture based on the MAPE-K loop integrated with a Deep Reinforcement Learning (DRL) module enhanced by a novel reward-shaping mechanism. The proposed Reward-Shaped Deep Reinforcement Learning (RS-DRL) method reshapes rewards during experience replay, improving generalization, convergence, and adaptability across dynamic environments. Experiments on IoT case studies (DeltaIoTv1/v2) show that RS-DRL achieves an asymptotic multi-objective reward of \(0.9905 \pm 0.0268\) compared to \(0.4559 \pm 0.2233\) for the \(\epsilon\)-greedy DQN approach, and near-optimal per-objective asymptotic results (e.g., packet loss \(0.9850 \pm 0.0071\), latency \(0.9964 \pm 0.0107\)). Under the TT goal, RS-DRL reduces packet loss by up to \(22.23\%\) and latency by up to \(39.07\%\) compared to the best competing method (DLASER+), while also consistently outperforming the exhaustive-search Reference baseline that analyzes the full adaptation space. All comparisons are based on 30 independent runs, with \(25\) of \(28\) baseline comparisons statistically significant (many with \(p<0.001\)). These results demonstrate that RS-DRL offers a robust, efficient, and adaptive optimization strategy for self-adaptive systems operating under uncertainty and dynamic conditions.
Kavianifar et al. (Mon,) studied this question.
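The abstract does not specify how RS-DRL reshapes rewards, so the following is only an illustrative sketch of the general idea of reshaping rewards at replay time, not the authors' mechanism. It substitutes classic potential-based shaping, \(r' = r + \gamma\,\Phi(s') - \Phi(s)\), and applies it when transitions are sampled from the buffer rather than when they are stored. The names `ShapedReplayBuffer`, `shape_reward`, and `quality_potential`, the two-element state of normalized packet loss and latency, and the choice of potential are all hypothetical.

```python
import random
from collections import deque
from typing import Callable, Deque, List, NamedTuple, Optional, Tuple

State = Tuple[float, ...]


class Transition(NamedTuple):
    state: State
    action: int
    reward: float        # raw environment reward, stored unmodified
    next_state: State
    done: bool


def shape_reward(t: Transition, potential: Callable[[State], float], gamma: float) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s), with Phi = 0 at terminal states."""
    next_phi = 0.0 if t.done else potential(t.next_state)
    return t.reward + gamma * next_phi - potential(t.state)


class ShapedReplayBuffer:
    """Replay buffer that keeps raw rewards and reshapes them only at sampling time.

    Hypothetical illustration: because shaping happens on sampling, the potential
    can be changed later without rewriting the stored experience.
    """

    def __init__(self, capacity: int, potential: Callable[[State], float],
                 gamma: float = 0.99, seed: Optional[int] = None) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._potential = potential
        self._gamma = gamma
        self._rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def __len__(self) -> int:
        return len(self._buffer)

    def sample(self, batch_size: int) -> List[Transition]:
        batch = self._rng.sample(list(self._buffer), batch_size)
        # Return copies carrying the shaped reward; stored transitions stay raw.
        return [
            t._replace(reward=shape_reward(t, self._potential, self._gamma))
            for t in batch
        ]


def quality_potential(state: State) -> float:
    """Assumed potential: lower normalized packet loss and latency give a higher value."""
    packet_loss, latency = state
    return -(packet_loss + latency) / 2.0
```

In a DQN-style training loop, the learner would call `sample` in place of a plain replay-buffer draw and compute its temporal-difference targets from the returned rewards. Storing raw rewards and shaping on read is a design choice of this sketch; it keeps the shaping function replaceable when operating conditions change.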