Complex interactions among individuals in a population are often influenced by both experiential memory and emotional factors. In this paper, we propose an anxiety-driven prisoner's dilemma model in which agents use the Q-learning algorithm to select their strategies. In the design of the immediate reward function, an individual's comprehensive payoff consists of both the actual payoff and the reputational payoff. Each individual acquires the comprehensive payoffs of the other individuals within its interaction range and calculates a weighted average to determine its own fitness. Anxiety modulates the generation of immediate rewards by scaling the fitness mechanism. The results show that the effect of the interaction radius D on the cooperation level depends on the interplay between the group anxiety level β and the environmental parameters (i.e., the temptation to defect b and the reputation factor θ). Specifically, theoretical analysis reveals the underlying cause of the non-monotonic relationship between the interaction radius D and the cooperation ratio f_C, which can also reasonably explain the formation of checkerboard patterns. The more adverse the environment, the stronger the inhibitory effect of anxiety on cooperation. In populations with high anxiety levels, appropriately expanding the interaction range becomes more important for facilitating cooperation when the temptation to defect b is low.

• A reinforcement learning-based model is established.
• Our model incorporates reputation, anxiety and interaction range.
• The effects of diverse parameter combinations on the outcomes are revealed.
• The classic checkerboard pattern is exhibited and analyzed.
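As a rough illustration of the learning loop summarized above, the following minimal sketch combines a weighted-average fitness with the standard tabular Q-learning update. It is not the authors' implementation. The function names, the use of the previous action as the state, the epsilon-greedy exploration rule, and the default values of alpha and gamma are all assumptions made for illustration. The immediate reward is passed in as a plain number because the abstract does not give the exact form of the anxiety scaling.

```python
import random

ACTIONS = ("C", "D")  # cooperate or defect


def weighted_fitness(payoffs, weights):
    """Weighted average of the comprehensive payoffs observed in the interaction range.

    The weighting scheme is an assumption; the abstract only says a weighted
    average is used.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(p * w for p, w in zip(payoffs, weights)) / total


def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy action choice (assumed exploration rule)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_row[a])


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q-value."""
    best_next = max(q_table[next_state].values())
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# Hypothetical usage: states are indexed by the agent's previous action.
q_table = {s: {a: 0.0 for a in ACTIONS} for s in ACTIONS}
fit = weighted_fitness([1.0, 3.0], [1.0, 1.0])
new_q = q_update(q_table, "C", "D", reward=fit, next_state="D")
```

In this sketch the fitness is fed directly into the update as the reward; an anxiety-dependent scaling by β would be applied to that value before the update, in whatever form the full model specifies.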
Yan et al. (Fri,) studied this question.