September 10, 2025

Redefining Bias Audits for Generative AI in Health Care

Key Points

Current bias evaluation frameworks do not adequately address the unique challenges posed by LLMs, so novel approaches are needed.
Real-world examples in mental health chatbots highlight existing biases that require effective auditing and mitigation.
Proposed guidelines aim to enhance the detection and categorization of biases in generative AI applications within health care.
New auditing methodologies could lead to improved health equity, reducing the risk of exacerbating existing disparities.

Abstract

Large language models (LLMs) are transforming health care by supporting a range of administrative and clinical tasks; however, recent studies have raised concerns about their potential to exacerbate existing health care inequities. Traditional algorithmic auditing approaches fall short in addressing the unique challenges posed by LLMs, which process complex text-based inputs and generate human-like outputs. In this perspective, we examine current approaches for evaluating LLM bias in clinical settings, identifying key gaps in existing audit methodologies. We propose comprehensive guidelines for categorizing and detecting biases in LLM applications and illustrate their application through two real-world deployed systems: in-basket patient response drafting and mental health chatbots. Finally, we offer concrete recommendations for advancing LLM bias evaluation in a rapidly evolving technological landscape.
