Modern recommender systems increasingly rely on deep neural architectures to learn user-item relationships from interaction logs. Sequential recommendation has become a prominent paradigm, where RNN-based models such as GRU4Rec and Transformer-based models such as SASRec/BERT4Rec achieve strong performance on accuracy-oriented metrics (e.g., Recall and NDCG). However, real-world deployments expose fundamental limitations that accuracy-centric formulations do not address: (i) ID-based representations are platform-specific and difficult to transfer across domains; (ii) optimizing only for relevance often produces homogeneous recommendation lists and fails to satisfy users’ multifaceted needs for diversity, novelty, and serendipity; (iii) offline-learned policies degrade under distribution shift and face exploration risks in dynamic online environments; and (iv) black-box pipelines provide limited interpretability and offer little actionable value to stakeholders beyond end-users. This thesis studies these challenges under a unified theme of multi-objective personalization for sequential recommendation, and develops methods that improve transferability, controllability, deployability, and stakeholder-facing value. First, to address the transferability bottleneck, we propose TransRec, which learns from mixture-of-modality (MoM) feedback by encoding items with content encoders (e.g., text and images) rather than categorical IDs. By learning directly from raw MoM features in an end-to-end manner, TransRec enables effective cross-domain transfer without requiring overlapping users or items, and yields significant gains in cold-start and cross-domain settings. Second, to move beyond accuracy-centric optimization, we introduce two frameworks that reformulate recommendation as multi-objective sequential decision-making.
MODT4R leverages return-conditioned Decision Transformers to integrate multiple objectives within a stable supervised learning pipeline, allowing flexible objective trade-offs via inference-time adjustment. Building on this, HDT employs a hierarchical architecture to capture long-term preferences across sessions and short-term intent within sessions, and uses hierarchical (expected and unexpected) returns to balance accuracy with diversity, novelty, and serendipity. Across multiple datasets, MODT4R and HDT achieve up to 16% improvement in diversity-related metrics while maintaining competitive accuracy. Third, to bridge the offline-to-online gap for RL-based recommenders, we leverage Large Language Models (LLMs) as auxiliary components. We introduce LE/LEA to adapt LLMs as state and reward models and to augment offline learning signals via action synthesis. Furthermore, iALP and its adaptive variant A-iALP use LLM-distilled preferences to warm-start policies offline and adapt them online through fine-tuning and exploration strategies, achieving up to 20% improvement in long-horizon cumulative rewards in online simulation and reducing convergence time. Finally, to support multiple stakeholders, we propose PDiT-GIM, a two-stage diffusion framework that generates semantically meaningful preference representations and decodes them into interpretable, attribute-constrained textual and visual content, enabling actionable insights for retailers and designers in addition to end-user recommendation. Case studies report improved preference-aligned content generation and downstream engagement compared to generic baselines. Overall, through extensive experiments spanning e-commerce, multimedia recommendation, and simulated online environments, this thesis demonstrates that multi-objective personalization can simultaneously improve beyond-accuracy objectives and long-term policy performance while maintaining strong accuracy. 
The thesis is presented in a thesis-by-publication format, with chapters organized around the above tasks and objectives.
Jie Wang (Thu,) studied this question.