Abstract Do large language models (LLMs) generate unbiased financial advice across investor and fund manager demographics? We develop a two-sided audit framework to evaluate demographic bias in LLM-generated investment advice and apply it to multiple large language models, with GPT-4 Turbo as the primary baseline. On the investor side, fund selections are similar across demographic groups and rely on financial criteria, but recommended investment amounts vary when investor names signal race or gender, despite identical age and income. On the fund manager side, capital allocations favor non-Black and male managers: racial disparities persist even under explicit disclosure, while gender-related differences are more pronounced under name-based cues. Bias patterns are qualitatively similar across models, with differences in magnitude between implicit and explicit demographic signaling. These results suggest that, even when LLMs incorporate core financial reasoning, demographic signals can affect allocation decisions, with effects that tend to be stronger under implicit signaling, potentially replicating existing market inequalities and raising concerns about impartiality in financial advising. The proposed audit framework provides a generalizable approach for identifying and evaluating demographic bias in AI-driven financial advisory systems.
Wang et al. (Fri,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: