This paper investigates strategic learning approaches for resource allocation in decentralized edge computing environments where multiple agents compete for limited resources without direct coordination. The problem is modeled using a multi-player multi-armed bandit (MP-MAB) framework, which captures the exploration-exploitation trade-offs inherent in sequential decision-making. Building upon this foundation, the study incorporates game-theoretic principles such as strategic regret minimization to guide the development of learning strategies that can achieve both stable and efficient outcomes. Three representative algorithms—Upper Confidence Bound (UCB), Thompson Sampling (TS), and Sliding Window UCB—are implemented to evaluate performance across multiple dimensions. The experimental setup leverages the MovieLens dataset to simulate realistic user demand and resource constraints. Evaluation metrics include cumulative reward, conflict rate, and Jain's fairness index to capture efficiency, contention, and equity, respectively. Experimental results reveal that Thompson Sampling consistently outperforms the other strategies, delivering on average 10.2% higher rewards, a 25% reduction in conflict rate, and improved fairness scores across 50 interaction rounds. These findings underscore the advantages of probabilistic decision-making in competitive, distributed systems. The study offers practical implications for edge computing and other real-world systems requiring decentralized resource management.
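As a minimal sketch of the evaluation described above, the following Python code runs independent Thompson Sampling learners in a small multi-player bandit and reports the three metrics named in the abstract: cumulative reward, conflict rate, and Jain's fairness index. It is an illustration under stated assumptions, not the study's implementation. The function names `jain_index` and `simulate`, the Bernoulli reward model, the hypothetical arm means, and the rule that colliding players receive zero reward are all assumptions made here. The MovieLens-based demand model is not reproduced.

```python
import random


def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 means equal shares."""
    n = len(values)
    sum_sq = sum(v * v for v in values)
    if n == 0 or sum_sq == 0:
        return 1.0  # assumption: an all-zero allocation is treated as trivially fair
    total = sum(values)
    return total * total / (n * sum_sq)


def simulate(n_players=3, arm_means=(0.9, 0.7, 0.5, 0.3), rounds=50, seed=0):
    """Run independent Beta-Bernoulli Thompson Sampling agents without coordination.

    Assumed collision model: if two or more players pick the same arm in a
    round, each of them receives reward 0 for that round.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    # Each player keeps its own Beta(1, 1) prior over every arm.
    alpha = [[1.0] * n_arms for _ in range(n_players)]
    beta = [[1.0] * n_arms for _ in range(n_players)]
    rewards = [0.0] * n_players
    conflicts = 0  # counts player-rounds in which the chosen arm was contested

    for _ in range(rounds):
        # Every player samples a plausible mean per arm and plays the best sample.
        choices = []
        for p in range(n_players):
            samples = [rng.betavariate(alpha[p][a], beta[p][a]) for a in range(n_arms)]
            choices.append(max(range(n_arms), key=samples.__getitem__))

        for p, arm in enumerate(choices):
            if choices.count(arm) > 1:
                conflicts += 1
                reward = 0
            else:
                reward = 1 if rng.random() < arm_means[arm] else 0
            rewards[p] += reward
            # Posterior update: successes raise alpha, failures raise beta.
            alpha[p][arm] += reward
            beta[p][arm] += 1 - reward

    return {
        "cumulative_reward": sum(rewards),
        "conflict_rate": conflicts / (n_players * rounds),
        "jain_index": jain_index(rewards),
    }


if __name__ == "__main__":
    print(simulate())
```

Because collisions are recorded as failures in each player's posterior, contested arms look less attractive over time, which is one simple way a sampling-based learner can drift away from conflict without explicit coordination. A UCB or Sliding Window UCB baseline could be compared by replacing only the arm-selection step.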
Xiaolei Leng studied this question.