Abstract

Objectives
Large language models (LLMs) are increasingly explored as decision-support tools in medical imaging. However, their ability to align with country-specific guidelines, which often diverge, remains uncertain. We set out to evaluate the geographic neutrality of three state-of-the-art LLMs (GPT-o3, Mistral Large, and DeepSeek R1) and a biomedical LLM (MedGemma 1.5 4B) when applied to neuroradiology scenarios with conflicting U.S. and non-U.S. recommendations.

Materials and methods
Vignettes derived from contradictory international guidelines were presented to each model under two conditions: an implicit setting, where no guideline was specified and vignettes were provided in English and French; and an explicit setting, where prompts directed models to follow a named guideline. Performance was reviewed against the target guideline, and mitigation strategies were tested.

Results
Thirty clinical vignettes presenting conflicting guidelines were evaluated by GPT-o3, Mistral Large, and DeepSeek R1. In the implicit setting, all models favored U.S. guidelines, with GPT-o3, Mistral, and DeepSeek aligning with them in 27 of 30 scenarios (90.0%; 95% CI, 74.4–96.5). In the explicit setting, adherence declined sharply for non-U.S. recommendations for all models. Providing the complete guideline text was the most effective mitigation strategy, restoring accuracies above 90% across all models.

Conclusion
Across languages and model origins, LLMs exhibited a systematic bias toward U.S. neuroradiology guidelines, even when explicitly instructed otherwise. This U.S.-centrism likely reflects training data imbalances and raises concerns for safe global deployment. Strategies for local contextualization, such as guideline integration at deployment, are necessary to ensure context-appropriate clinical decision support.

Key Points

Question
Do large language models display geographical neutrality in neuroradiology decision support?

Findings
Even models developed in France and China systematically preferred United States guidelines, aligning with them in most implicit scenarios while failing to follow explicit guidelines from other sources.

Clinical relevance
This systematic United States-centric bias poses clinical and legal risks for global deployment. Safe implementation requires specific localization strategies, such as providing full guideline texts, to ensure recommendations align with local practice standards.
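
The abstract does not say how the 95% confidence interval for the 27 of 30 result was computed. As a hedged illustration only, the reported bounds (74.4 to 96.5) are consistent with a Wilson score interval for a binomial proportion. The sketch below assumes that method. The function name `wilson_interval` is hypothetical and not taken from the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: the paper does not name its CI method; Wilson is used here
    because it reproduces the bounds reported in the abstract.
    z defaults to the two-sided 95% normal quantile.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    p = successes / trials                       # observed proportion
    denom = 1 + z**2 / trials                    # shared denominator
    center = (p + z**2 / (2 * trials)) / denom   # shrunken midpoint
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# 27 of 30 implicit-setting vignettes aligned with U.S. guidelines.
low, high = wilson_interval(27, 30)
print(f"{27 / 30:.1%} (95% CI, {low:.1%} to {high:.1%})")
```

Under this assumption the script prints 90.0% with bounds of 74.4% and 96.5%, matching the abstract. With only 30 vignettes the interval is wide, so the lower bound is the more cautious figure to quote.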
Bazerbachi et al. (Tue,) studied this question.