BACKGROUND Large language models (LLMs) have the potential to improve the accessibility and quality of medical information for patients. Assessing the quality of LLM-generated responses in real-world clinical settings is crucial for determining their suitability and optimizing healthcare efficiency.

OBJECTIVE This study aims to comprehensively evaluate the reliability of responses generated by an LLM-driven chatbot compared to those written by physicians, demonstrating that artificial intelligence (AI) can enhance the quality of otorhinolaryngological advice in complex, nuanced text-based workflows.

METHODS Inquiries and verified physician responses related to otorhinolaryngology posted on a public social media forum between December 20 and 21, 2023, were extracted and anonymized. ChatGPT-4 was tasked with generating responses to each inquiry. A panel of seven board-certified otorhinolaryngologists evaluated both physician and ChatGPT-4 responses in a masked, randomized manner. The responses were assessed based on six criteria: overall quality, empathy, alignment with medical consensus, accuracy or appropriateness of information, inquiry comprehension, and potential harm. Logistic regression analysis was employed to identify predictors of preference for ChatGPT-4 responses and their influence on overall quality.

RESULTS A total of 60 question–response pairs were included in the analysis. ChatGPT-4 responses were significantly longer (median: 162 words) compared to physician responses (median: 67 words; p [...]

CONCLUSIONS ChatGPT-4 outperformed physicians in generating high-quality responses. Therefore, integrating AI into clinical workflows may enhance the quality of physicians’ responses by improving comprehension of complex inquiries and providing more detailed information, thereby enhancing perceived quality.
Masaomi Motegi
Masato Shino
Mikio Kuwabara
Motegi et al. studied this question.
www.synapsesocial.com/papers/68e57654b6db643587515dab — DOI: https://doi.org/10.2196/preprints.66900