This preprint investigates the predictability of trial-by-trial human rule-learning behavior using the Badham et al. (2017) dataset. We compare cognitive and data-driven models, including Win-Stay-Lose-Shift, Q-learning, LSTM, and XGBoost, under leave-one-participant-out cross-validation. Results show that episodic resetting of behavioral-history features provides the largest performance gain, and that model performance approaches a practical predictability plateau near AUC = 0.676. Behavioral clustering further decomposes population-level predictability into four learner profiles.
Hamid et al. studied this question.
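The evaluation protocol named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`leave_one_participant_out`, `wsls_next_choice`, `auc`) are hypothetical, the Win-Stay-Lose-Shift rule assumes a two-option task with choices coded 0/1 (the actual task structure in the dataset may differ), and AUC is computed by direct pairwise comparison rather than with any particular library.

```python
from collections import defaultdict


def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_idx, test_idx), holding out one participant per fold."""
    rows_by_pid = defaultdict(list)
    for row, pid in enumerate(participant_ids):
        rows_by_pid[pid].append(row)
    for held_out in sorted(rows_by_pid):
        test_idx = rows_by_pid[held_out]
        # Training rows are all trials from every other participant.
        train_idx = sorted(
            row
            for pid, rows in rows_by_pid.items()
            if pid != held_out
            for row in rows
        )
        yield held_out, train_idx, test_idx


def wsls_next_choice(prev_choice, prev_rewarded):
    """Win-Stay-Lose-Shift, assuming a two-option task with choices coded 0/1."""
    return prev_choice if prev_rewarded else 1 - prev_choice


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Splitting by participant rather than by trial keeps every trial of the held-out person out of training, so the score reflects generalization to an unseen learner rather than memorization of an individual's history.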