Large Language Models (LLMs) offer a promising approach to recommendation by enabling the generation of user profiles in Natural Language (NL) form. When used as summarization devices, LLMs can produce interpretable and editable alternatives to opaque collaborative filtering representations, potentially increasing transparency and user control. However, it remains unclear whether users perceive these profiles as accurate representations of their preferences, which is key for trust and usability. Moreover, because LLMs inherit societal and data-driven biases, profile quality may systematically vary across user and item characteristics. In this paper, we investigate these issues in the context of music streaming, where personalization is challenged by large and culturally diverse catalogs. We conduct a user study in which participants evaluate NL profiles generated from their own listening histories. We analyze whether user identification with these profiles is biased by user attributes, such as mainstreamness and taste diversity, and by item features, including genre and country of origin. We further assess the usefulness of the generated profiles in a downstream recommendation task by analyzing their representations in a shared embedding space. Our results reveal systematic differences across models and user groups, highlighting both the potential and the limitations of scrutable, LLM-based profiling for personalized systems.
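The abstract says the generated profiles are assessed for downstream recommendation through their representations in a shared embedding space, but it does not specify the encoder, the similarity measure, or the ranking procedure. The sketch below is a minimal illustration of one common way to use such a space. It assumes a profile and a set of items have already been embedded into the same vector space by some unspecified text encoder, and it ranks items by cosine similarity to the profile. The function names `cosine` and `rank_items`, the toy three-dimensional vectors, and the choice of cosine similarity are all hypothetical; none of them is taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_items(
    profile_vec: Sequence[float],
    item_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Hypothetical scorer: return the top-k (item_id, score) pairs by cosine
    similarity between an embedded NL profile and embedded items, which are
    assumed to live in the same space."""
    scored = [(item_id, cosine(profile_vec, vec)) for item_id, vec in item_vecs.items()]
    # Highest similarity first; Python's sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for encoder outputs (illustrative only).
    profile = [0.9, 0.1, 0.0]
    items = {
        "track_a": [1.0, 0.0, 0.0],
        "track_b": [0.0, 1.0, 0.0],
        "track_c": [0.5, 0.5, 0.0],
    }
    for item_id, score in rank_items(profile, items, k=2):
        print(f"{item_id}: {score:.3f}")
```

With these toy vectors, `track_a` ranks above `track_c`, and `track_b` falls outside the top two. In a real evaluation, the ranked list would then be compared against held-out listening data. How the study performs that comparison is not stated in the abstract.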
Sguerra et al. studied this question.