April 17, 2026 · Open Access

ChatGPT-assisted evaluation of dysphagia in head and neck cancer radiotherapy patients: A practical tool

Key Points

To evaluate how effectively ChatGPT 4.0 can assist physicians in assessing dysphagia in head and neck cancer patients undergoing radiotherapy.
Conducted a prospective study with 100 head and neck cancer patients.
Compared dysphagia assessments made by a physician working alone (control group) with those made by a similarly qualified physician assisted by ChatGPT 4.0 (experimental group).
Used the expert group's evaluations as the gold standard for consistency comparisons.
The experimental group achieved a Kappa index of 0.87 against the expert group.
The control group's Kappa index was 0.70, indicating lower consistency.
The experimental group identified dysphagia cases more accurately than the control group.
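The Key Points report agreement as a Kappa index without naming the variant. The sketch below assumes Cohen's kappa for two raters making a binary call (1 = dysphagia, 0 = no dysphagia), which compares observed agreement with the agreement expected by chance from each rater's label frequencies. The function name `cohens_kappa` and the ratings in the example are hypothetical illustrations, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (assumed variant)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for 10 patients (not taken from the study).
expert = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
physician = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]

print(round(cohens_kappa(expert, physician), 3))  # 0.583
```

Here the raters agree on 8 of 10 patients (p_o = 0.80), but because each labels 6 patients as having dysphagia, chance alone would give p_e = 0.52, so kappa is (0.80 - 0.52) / 0.48 ≈ 0.583. This is why kappa values such as 0.87 and 0.70 sit below the raw agreement rates.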

Abstract

Objectives: To explore the effectiveness of using ChatGPT 4.0 to assist human physicians in assessing dysphagia in patients undergoing radiotherapy for head and neck cancer.

Methods: This prospective study included 100 head and neck cancer (HNC) patients who visited our hospital between January 2025 and October 2025. Each participant first underwent an independent dysphagia assessment by a human physician (Physician A; control group) and was then evaluated by a similarly qualified physician (Physician B) with the assistance of ChatGPT 4.0 (experimental group). The consolidated assessment of an expert group of two senior head and neck surgeons with ten years of experience served as the “gold standard.” Agreement among the three groups' results was compared to validate the effectiveness of language-model-assisted assessment.

Results: The Kappa index between the experimental group and the expert group was 0.87, indicating a “good” level of consistency and significantly exceeding the control group's 0.70. Subgroup analysis by EAT-10 and MDADI score ranges showed that, among the 85 patients with EAT-10 scores ≥ 3, the control group correctly identified 72 cases (accuracy 84.7%) and the experimental group 80 cases (94.1%). Among the 78 patients with MDADI scores ≤ 69, the control group correctly identified 65 cases (83.3%) and the experimental group 73 cases (93.6%).

Conclusions: Combining a large language model with human physicians effectively improves the accuracy and consistency of dysphagia assessment in patients undergoing radiotherapy for head and neck cancer.
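The subgroup accuracies in the abstract are simple proportions: cases correctly identified divided by subgroup size. The short sketch below recomputes them from the reported counts. The helper name `accuracy` and the one-decimal rounding are illustrative choices, not taken from the paper.

```python
def accuracy(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 1)


# Counts as reported in the abstract: (correctly identified, subgroup size).
subgroups = {
    "EAT-10 >= 3, control": (72, 85),
    "EAT-10 >= 3, experimental": (80, 85),
    "MDADI <= 69, control": (65, 78),
    "MDADI <= 69, experimental": (73, 78),
}

for name, (correct, total) in subgroups.items():
    print(f"{name}: {accuracy(correct, total)}%")
```

Running it prints 84.7%, 94.1%, 83.3% and 93.6%, matching the abstract. In both subgroups the ChatGPT-assisted assessment corrected 8 additional cases: 8 of 85 patients (about 9.4 percentage points) in the EAT-10 subgroup and 8 of 78 patients (about 10.3 points) in the MDADI subgroup.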
