Modern decision-making systems, from online marketplaces to large language models (LLMs), increasingly operate in high-dimensional environments with human feedback. However, the inherent heterogeneity of user preferences and the massive scale of feature spaces pose significant challenges for statistical efficiency and robust alignment. This dissertation develops and analyzes low-rank reinforcement learning (RL) methods that exploit latent structure to achieve scalability with theoretical guarantees. In the first part, we investigate the dynamic assortment problem in high-dimensional e-commerce settings. By imposing a low-rank structure on user–item interactions, we significantly reduce the complexity of estimating personalized utilities. We show how this structure enables efficient exploration-exploitation strategies and derive provable regret bounds that characterize the efficiency gain over traditional methods. We then evaluate our method on the Expedia Hotel recommendation dataset. The second part of this dissertation extends these principles to Reinforcement Learning from Human Feedback (RLHF) in large-scale contextual environments. We propose a low-rank contextual RLHF framework that simultaneously addresses diverse user preferences and the intricate latent spaces typical of modern LLMs. Our approach incorporates personalized reward modeling for alignment and offers theoretical guarantees on sample efficiency and robust performance under distribution shift. Throughout this work, we provide rigorous theoretical analyses, algorithmic descriptions, and extensive numerical experiments. Together, these contributions illustrate how a low-rank perspective unifies efficiency and robustness in personalized decision-making systems, providing a scalable path for aligning complex models with heterogeneous human values.
Seong Jin Lee (Fri,) studied this question.