Purpose: Large language models (LLMs) have shown promise for synthesizing clinical practice guidelines into decision-support tools; however, their utility for clinicians has not been formally evaluated. This study aims to generate a structured clinical checklist from an otolaryngology guideline using an LLM and to assess clinician perceptions of its accuracy, usability, safety, and likelihood of adoption.

Materials and methods: An LLM (ChatGPT version 5.2, OpenAI, San Francisco, CA, USA) was provided with the American Academy of Otolaryngology-Head and Neck Surgery Clinical Practice Guideline: Evaluation of the Neck Mass in Adults and instructed to generate a concise checklist restricted to guideline content. A structured questionnaire comprising Likert-type items and free-text responses was distributed electronically to otolaryngologists. Quantitative responses were summarized descriptively, and thematic analysis was performed on free-text comments to identify key perceptions and concerns.

Results: Twenty-two otolaryngologists completed the survey, including attending physicians and trainees. Most respondents agreed that the checklist was accurate, clear, and safe; however, fewer indicated that it would save time or that they would be likely to use or recommend it in practice. Attending otolaryngologists endorsed checklist safety more frequently than trainees and expressed a greater willingness to use or recommend the checklist. Thematic analysis identified perceived clinical completeness and educational value as strengths, while omissions of specific examination elements were noted as limitations.

Conclusions: LLM-generated checklists derived from clinical practice guidelines were generally perceived as accurate and safe by otolaryngologists, but acceptance did not consistently translate into willingness to adopt them in practice. Perceived utility varied by level of training.
These findings highlight both the potential and current limitations of LLM-generated decision-support tools and underscore the need for human oversight and further evaluation before routine clinical implementation.
Iqbal et al. studied this question.