September 10, 2025 | Open Access

Correlation-Aware Collaborative Adaptive Window Algorithm for Multi-Armed Bandits

Key Points

Adaptive UCB reduces cumulative regret by 18.35% compared to Standard UCB and 44.66% versus Sliding Window UCB.
The algorithm improves the mean average reward by 5.6% over Standard UCB and 1.52% over Sliding Window UCB.
Dynamic Window Recalibration resizes the historical data window based on real-time covariance analysis, so the algorithm can adapt to both abrupt and gradual changes.
Hierarchical Correlation-Aware Exploration clusters arms and applies Upper Confidence Bound at the group level for efficient exploration.
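
To make the group-level idea concrete, here is a minimal Python sketch of two-stage UCB selection: first pick a cluster by its pooled UCB index, then pick an arm inside that cluster by its own index. It is an illustration under stated assumptions, not the paper's implementation: the clustering `groups` is a hypothetical fixed partition (the source does not say how clusters are formed or updated), the index is the standard UCB1 form, and the names `ucb_index` and `select_arm` are invented for this example.

```python
import math

def ucb_index(mean, count, t, c=2.0):
    """UCB1-style index; unplayed items get +inf so they are tried first."""
    if count == 0:
        return math.inf
    return mean + math.sqrt(c * math.log(t) / count)

def select_arm(groups, counts, sums, t):
    """Pick a group by its pooled UCB, then an arm inside it by its own UCB.

    groups: list of lists of arm indices (assumed, fixed clustering)
    counts, sums: per-arm pull counts and reward totals
    t: current round, starting at 1
    """
    def group_index(members):
        # Pool pulls and rewards over all arms in the cluster.
        n = sum(counts[a] for a in members)
        s = sum(sums[a] for a in members)
        return ucb_index(s / n if n else 0.0, n, t)

    def arm_index(a):
        mean = sums[a] / counts[a] if counts[a] else 0.0
        return ucb_index(mean, counts[a], t)

    best_group = max(groups, key=group_index)
    return max(best_group, key=arm_index)
```

Because the first stage scores whole clusters, a pull of one arm also tightens the confidence bound shared by its correlated neighbours, which is one plausible reading of how group-level UCB reduces redundant sampling.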

Abstract

The Multi-Armed Bandit (MAB) problem is central to reinforcement learning, where it formalizes the trade-off between exploration and exploitation. Traditional MAB algorithms, however, often struggle in non-stationary environments where the correlations between arms evolve over time. This paper introduces the Correlation-Aware Collaborative Adaptive Window Algorithm (Adaptive UCB), which combines two techniques: Dynamic Window Recalibration (DWR) and Hierarchical Correlation-Aware Exploration (HCAE). DWR adjusts the size of the historical data window based on real-time covariance analysis, which allows the algorithm to adapt to both abrupt and gradual changes in the environment. HCAE improves arm selection by clustering arms and applying the Upper Confidence Bound (UCB) at the group level, which guides exploration and reduces redundant sampling. In experiments, Adaptive UCB outperforms Standard UCB, Sliding Window UCB, and Restart UCB, with the largest advantage in volatile environments and where arms are highly correlated. It reduces cumulative regret by 18.35% relative to Standard UCB and by 44.66% relative to Sliding Window UCB. It also increases the mean average reward by 5.6% over Standard UCB and by 1.52% over Sliding Window UCB, indicating that the algorithm is efficient in dynamic conditions.
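
The window-recalibration idea can be sketched in a few lines of Python. The abstract does not give the actual DWR update rule, so everything specific here is an assumption made for illustration: the function names, the halve-or-grow policy, the `threshold`, `w_min`, and `w_max` parameters, and the simplification that a full reward vector (one entry per arm) is available each round. The sketch compares the covariance of the older and newer halves of the current window and shrinks the window when they diverge.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length reward vectors."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]

def recalibrate_window(history, window, w_min=4, w_max=200, threshold=0.05):
    """Return a new window size based on covariance drift (hypothetical rule).

    history: list of per-round reward vectors, oldest first
    window: current window size in rounds
    """
    recent = history[-window:]
    half = len(recent) // 2
    if half < 2:
        return window  # too little data to estimate a covariance
    old_cov = covariance_matrix(recent[:half])
    new_cov = covariance_matrix(recent[half:])
    k = len(old_cov)
    # Largest entrywise change between the two covariance estimates.
    shift = max(abs(new_cov[i][j] - old_cov[i][j])
                for i in range(k) for j in range(k))
    if shift > threshold:
        return max(w_min, window // 2)  # drift detected: forget faster
    return min(w_max, window + 1)       # stable: use more history
```

Halving on detected drift and growing by one round otherwise is a deliberately simple policy: it reacts quickly to abrupt changes while slowly accumulating history in stable periods. A real implementation would need a rule that works with bandit feedback, where only the pulled arm's reward is observed.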
