What question did this study set out to answer?

This research aims to assess the accuracy and educational value of various AI chatbots in strabismus care education for caregivers.

May 8, 2026

AI chatbots in strabismus care: A multidomain expert evaluation of caregiver-facing information

Key Points

This research aims to assess the accuracy and educational value of various AI chatbots in strabismus care education for caregivers.
Multidomain expert evaluation of AI chatbots including ChatGPT, Grok, DeepSeek, Gemini, and Llama.
Inter-rater reliability assessed using Fleiss' κ and Gwet's AC1 statistics.
Comparative analysis of chatbots based on accuracy, clarity, educational value, and safety.
ChatGPT demonstrated superior accuracy (p < 0.05) and clarity compared to Grok (OR 0.48) and DeepSeek (OR 0.61).
Gemini and Llama were rated higher in educational value and safety.
Expert agreement was moderate by Fleiss' κ (0.59) and strong by Gwet's AC1 (0.87), supporting AI chatbots as adjuncts in pediatric ophthalmology education.
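
The two agreement statistics named in the key points can be sketched in a few lines. This is a minimal illustration using the standard multi-rater formulas for unweighted Fleiss' κ and Gwet's AC1, not the study's actual analysis: the summary does not say how the authors computed them (for example, whether weighted variants were used for ordinal ratings). The function names and the `skewed` ratings table are hypothetical and exist only for this example.

```python
from typing import List

Counts = List[List[int]]  # counts[i][j] = number of raters placing item i in category j


def _raters_per_item(counts: Counts) -> int:
    """Return the (constant) number of raters per item, validating the table."""
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    return n


def observed_agreement(counts: Counts) -> float:
    """Mean proportion of agreeing rater pairs per item (shared by both statistics)."""
    n = _raters_per_item(counts)
    per_item = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    return sum(per_item) / len(per_item)


def category_shares(counts: Counts) -> List[float]:
    """Overall share of all ratings that fell into each category."""
    n = _raters_per_item(counts)
    total = n * len(counts)
    return [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]


def fleiss_kappa(counts: Counts) -> float:
    """Fleiss' kappa: chance agreement is the sum of squared category shares.

    Undefined (division by zero) when every rating falls in a single category.
    """
    p_a = observed_agreement(counts)
    p_e = sum(p * p for p in category_shares(counts))
    return (p_a - p_e) / (1 - p_e)


def gwet_ac1(counts: Counts) -> float:
    """Gwet's AC1: chance agreement is sum(p * (1 - p)) / (k - 1) over k categories."""
    p_a = observed_agreement(counts)
    shares = category_shares(counts)
    p_e = sum(p * (1 - p) for p in shares) / (len(shares) - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical data: 5 items, 3 raters, 2 categories, with almost all
# ratings in the first category (a heavily skewed distribution).
skewed = [[3, 0], [3, 0], [3, 0], [3, 0], [2, 1]]
print(f"Fleiss' kappa: {fleiss_kappa(skewed):.3f}")
print(f"Gwet's AC1:    {gwet_ac1(skewed):.3f}")
```

On the skewed table, AC1 comes out well above κ even though the raters agree on nearly every item. Sensitivity to skewed category prevalence is one commonly cited reason the two statistics can diverge; whether that explains the gap between the reported κ = 0.59 and AC1 = 0.87 cannot be determined from this summary.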

Abstract

< 0.05) but not for Safety. Compared with ChatGPT, lower odds of higher ratings were seen for Grok (OR 0.48) and DeepSeek (OR 0.61). Inter-rater reliability indicated moderate agreement (Fleiss' κ = 0.59) and strong consensus (Gwet's AC1 = 0.87).

Conclusion: ChatGPT showed superior accuracy and clarity, while Gemini and Llama excelled in educational value and safety. High expert agreement supports AI chatbots as adjuncts in pediatric ophthalmology education requiring continued validation.
