Abstract

Purpose
Reinforcement learning with reversal is a paradigm frequently employed in neuropsychology to assess individuals' adaptive capacity and flexibility in non-stationary environments. Each individual's behavior, however, depends on the superposition of multiple factors that are still not completely understood.

Methods
We optimize a neurocomputational model of the Basal Ganglia to simulate the behavior of 18 patients with Parkinson’s disease (both ON and OFF medication) and 14 control subjects during a two-choice probabilistic reversal learning task (with 80–20% reward probabilities). Each individual's behavior, measured as the cumulative number of correct responses during a 40-trial direct phase and a 40-trial reversal phase, is reproduced by fitting a few model parameters per individual. These parameters represent the tonic dopamine level, the Hebbian learning rate, and the exploratory attitude (noise level).

Results
Very different responses can be explained well by ascribing them to varying combinations of these individual factors. The tonic dopamine level differs significantly between patients ON and OFF medication, whereas the other parameters do not differ significantly. A regression analysis reveals that the Hebbian learning rate is correlated with the subject's sensitivity to punishments. In contrast, the noise standard deviation is correlated with exploration, i.e., the tendency to modify the choice even after a reward.

Conclusion
The results provide a mechanistic explanation of the various factors that affect adaptation and flexibility in reinforcement learning, representing a first step toward characterizing and understanding the diverse behaviors on an individual basis.
Piermaria et al. studied this question.
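To make the behavioral measures named in the abstract concrete, the following is a minimal sketch of how they could be computed from one subject's trial record. It is not the authors' analysis code. The function names, the coding of the two options as 0 and 1, the assumption that option 0 is the high-reward (80%) option in the direct phase, and the use of lose-shift and win-shift rates as proxies for "sensitivity to punishments" and "exploration" are all illustrative assumptions.

```python
from typing import List, Tuple


def correct_schedule(n_direct: int = 40, n_reversal: int = 40) -> List[int]:
    """Index of the high-reward option on each trial.

    Assumption: option 0 is the better (80%) option in the direct phase
    and option 1 becomes the better option after the reversal.
    """
    return [0] * n_direct + [1] * n_reversal


def cumulative_correct(choices: List[int], schedule: List[int]) -> List[int]:
    """Running count of trials on which the high-reward option was chosen."""
    total = 0
    curve = []
    for choice, best in zip(choices, schedule):
        total += int(choice == best)
        curve.append(total)
    return curve


def shift_rates(choices: List[int], rewards: List[bool]) -> Tuple[float, float]:
    """Return (win_shift_rate, lose_shift_rate).

    win-shift: fraction of rewarded trials followed by a change of choice
    (used here as a hypothetical proxy for exploration).
    lose-shift: fraction of unrewarded trials followed by a change of choice
    (used here as a hypothetical proxy for sensitivity to punishment).
    """
    wins = losses = win_shifts = lose_shifts = 0
    for t in range(len(choices) - 1):
        switched = choices[t + 1] != choices[t]
        if rewards[t]:
            wins += 1
            win_shifts += int(switched)
        else:
            losses += 1
            lose_shifts += int(switched)
    win_rate = win_shifts / wins if wins else 0.0
    lose_rate = lose_shifts / losses if losses else 0.0
    return win_rate, lose_rate
```

Under these assumptions, a subject's record would be passed as two equal-length lists (choices and reward outcomes), and the resulting cumulative curve could then be compared with the curve produced by a fitted model.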