In music streaming applications, the next track typically plays automatically, and each track takes less time to consume than most other content (e.g., books and movies). These characteristics can lead to long and noisy listening histories, which make generic sequential recommendation models more error-prone when modeling user interests. To bridge this gap, we propose retrieving representative music tracks from the original listening history for recommendation. Unlike existing user behavior retrieval techniques, we pursue a fine-grained, content-based, and temporal user behavior retrieval framework, namely FactUBR, which better exploits music track content and the fine-grained connections between temporally dependent tracks in the listening history. Technically, FactUBR consists of a reinforced content-based (RCB) retrieval module and a differentiable temporal-channel (DTC) retrieval module. RCB follows a reinforcement-learning procedure to optimize sequential retrieval decisions based on content differences between the state and the observation, maximizing rewards for retrieval diversity and recommendation performance. DTC evaluates the fine-grained correlations between temporal listening segments and the candidate music track, and employs the perturbed maximum technique to optimize hard retrieval decisions. Extensive experiments on two public music recommendation benchmarks demonstrate that FactUBR enhances various representative sequential recommendation models and outperforms state-of-the-art (SOTA) behavior retrieval techniques.
Fan et al. (Thu,) studied this question.
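To make the perturbed maximum idea mentioned for DTC more concrete, the following is a minimal sketch and not the FactUBR implementation. It assumes that each listening segment already has a scalar relevance score with respect to the candidate track, that hard retrieval means selecting the top-k segments, and that the perturbation is Gaussian; the names `hard_top_k`, `perturbed_top_k`, `sigma`, and `num_samples` are hypothetical. Averaging hard selections over noisy copies of the scores yields a smoothed selection that varies continuously with the scores, which is the property that makes the hard choice amenable to gradient-based training.

```python
import random


def hard_top_k(scores, k):
    """Return a 0/1 indicator list marking the k highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [1.0 if i in chosen else 0.0 for i in range(len(scores))]


def perturbed_top_k(scores, k, sigma=0.5, num_samples=500, seed=0):
    """Monte Carlo estimate of the expected top-k indicator under noise.

    Assumption: Gaussian perturbations with scale `sigma`. Each sample adds
    noise to the scores, takes a hard top-k, and the results are averaged.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(scores)
    for _ in range(num_samples):
        noisy = [s + sigma * rng.gauss(0.0, 1.0) for s in scores]
        for i, picked in enumerate(hard_top_k(noisy, k)):
            totals[i] += picked
    return [t / num_samples for t in totals]


if __name__ == "__main__":
    # Hypothetical relevance scores of four listening segments.
    segment_scores = [2.0, 1.9, 0.1, -1.0]
    print("hard selection:    ", hard_top_k(segment_scores, 2))
    print("smoothed selection:", perturbed_top_k(segment_scores, 2))
```

With `sigma=0` the smoothed selection collapses to the hard top-k indicator, while larger values spread selection mass across segments with similar scores. The sketch covers only the forward pass; a trainable version would also need a gradient estimator for the averaged selection.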