July 3, 2024

RL or not RL? Parsing the processes that support human reward-based learning.

Anne Collins, Allen Institute for Brain Science

Abstract

Reinforcement Learning (RL) algorithms have had tremendous success accounting for reward-based learning across species, in both behavior and brain. In particular, simple model-free RL models, such as delta-rule or Q-learning, are routinely used to model instrumental learning in bandit tasks, and they capture variance in brain signals. However, reward-based learning in humans recruits multiple processes, including high-level processes such as memory and low-level ones such as choice perseveration; their contributions can easily be mistakenly attributed to RL computations. Here, we investigate how much of RL-like behavior is supported by RL computations in a context where other processes can be factored out. Re-analysis and computational modeling of seven data sets spanning hundreds of participants show that in this instrumental context, reward-based learning is best explained by a combination of working memory and a habit-like associative process, with no RL-like value-based incremental learning. Simulations show that this combination nevertheless approximates the adaptive policy of a value-based RL agent, explaining why RL computations are mistakenly inferred when working memory is not parsed out. Our results raise important questions for the interpretation of RL as a meaningful process across brain and behavior, and call for a reconsideration of how we interpret findings in reinforcement learning across levels of analysis.
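
For readers unfamiliar with the model-free baseline named in the abstract, the following is a minimal sketch of a delta-rule value update paired with a softmax choice rule. It is a generic textbook formulation, not the model fitted in the paper; the function names, the learning rate `alpha`, the inverse temperature `beta`, and the toy two-armed bandit are all illustrative assumptions.

```python
import math


def delta_rule_update(q, reward, alpha=0.1):
    """One delta-rule step: move the value estimate toward the observed reward."""
    return q + alpha * (reward - q)


def softmax_policy(q_values, beta=5.0):
    """Turn action values into choice probabilities (beta = inverse temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-armed bandit (an assumption for illustration):
# arm 0 is always rewarded, arm 1 never is, and both are sampled each trial.
q_values = [0.5, 0.5]
for _ in range(20):
    q_values[0] = delta_rule_update(q_values[0], reward=1.0)
    q_values[1] = delta_rule_update(q_values[1], reward=0.0)

probs = softmax_policy(q_values)
print(q_values, probs)
```

The incremental, reward-driven change in `q_values` is the kind of value-based learning that the abstract reports finding no evidence for once working memory is accounted for.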

Anne Collins studied this question.

synapsesocial.com/papers/68e61806b6db6435875aaaa1 https://doi.org/10.31234/osf.io/he3pm
