What type of study is this?

This is a quantitative study.

October 8, 2025 | Open Access

Bias In, Symbolic Compliance Out? GPT’s Reliance on Gender and Race in Strategic Evaluations

Key Points

GPT evaluators did not disadvantage underrepresented minorities (URMs) and even modestly supported them.
Across 26,000 evaluations of innovative offerings (e.g., startup pitches), that modest support came primarily from avoiding negative outcomes for URMs.
The authors theorize that this reflects symbolic compliance: a superficial response that avoids overt discrimination rather than a genuine commitment to fairness.
The study employed 'Second Opinion' experiments, testing LLM evaluations alongside simulated human inputs.

Abstract

Strategic decision-making often involves more candidates than can be thoroughly assessed, leading evaluators to rely on proxies like gender and race, disadvantaging underrepresented minorities (URMs). As large language models (LLMs) like OpenAI’s ChatGPT become increasingly adopted by organizations, we ask whether and how LLMs rely on gender and race in evaluations. Across 26,000 evaluations of innovative offerings (e.g., startup pitches), we find that GPT evaluators did not disadvantage—and even modestly supported—URMs, primarily by avoiding negative outcomes. We theorize that this reflects symbolic compliance: A superficial response to avoid overt discrimination rather than a genuine commitment to fairness. We test this mechanism through “Second Opinion” experiments, where LLMs evaluate alongside simulated human inputs. This study highlights the implications of LLM adoption in strategic evaluations.
