Strategic decision-making often involves more candidates than can be thoroughly assessed, leading evaluators to rely on proxies like gender and race, disadvantaging underrepresented minorities (URMs). As large language models (LLMs) like OpenAI’s ChatGPT become increasingly adopted by organizations, we ask whether and how LLMs rely on gender and race in evaluations. Across 26,000 evaluations of innovative offerings (e.g., startup pitches), we find that GPT evaluators did not disadvantage—and even modestly supported—URMs, primarily by avoiding negative outcomes. We theorize that this reflects symbolic compliance: A superficial response to avoid overt discrimination rather than a genuine commitment to fairness. We test this mechanism through “Second Opinion” experiments, where LLMs evaluate alongside simulated human inputs. This study highlights the implications of LLM adoption in strategic evaluations.
Botelho et al. studied this question.