Existing evaluations of LLMs mostly focus on the accuracy of question answering for medical examinations, without considering real patient care data. Dimensions such as fairness, bias, and toxicity, as well as deployment considerations, have received limited attention. Future evaluations should adopt standardized applications and metrics, use clinical data, and broaden their focus to include a wider range of tasks and specialties.
Suhana Bedi
Yutong Liu
Lucy Orr-Ewing
JAMA
Harvard University
Stanford University
University of California, San Diego
Bedi et al. studied this question.
www.synapsesocial.com/papers/69d78105a9e24f7f0ff30865 — DOI: https://doi.org/10.1001/jama.2024.21700