Differentiating jawbone-destroying malignancy from osteomyelitis remains a major diagnostic challenge in oral and maxillofacial surgery because these entities share overlapping radiologic features but require fundamentally different management strategies. This study evaluated the bilingual diagnostic performance of advanced multimodal large language models (LLMs) in distinguishing these conditions using stepwise multimodal inputs. In this retrospective diagnostic accuracy study, 50 patients with histopathologically confirmed malignancy or osteomyelitis of the maxilla or mandible were included. Three multimodal LLMs (ChatGPT, Claude, and Gemini) were assessed using standardized prompts in Korean and English under three imaging conditions: panoramic radiograph only (P), panoramic radiograph plus computed tomography (CT) (P + C), and panoramic radiograph plus CT plus histopathology slide (P + C + B). Diagnostic accuracy, sensitivity, and specificity were evaluated against histopathology as the reference standard using generalized linear mixed models. Overall diagnostic accuracy increased significantly with additional modalities, from 0.683 (95% CI, 0.542–0.797) under the P condition to 0.776 (95% CI, 0.652–0.865) under P + C, and to 0.978 (95% CI, 0.953–0.990) under P + C + B. Incorporation of histopathology slides markedly increased the odds of a correct diagnosis compared with the P and P + C conditions (both p < 0.0001), whereas the addition of CT alone showed a nonsignificant trend toward improvement. Under limited imaging conditions, models tended to overdiagnose malignancy, yielding high sensitivity but low specificity. With full multimodal input, all models achieved balanced diagnostic performance in both languages. Notably, ChatGPT demonstrated higher diagnostic accuracy in the Korean-language condition than in English.
Overall, these findings suggest that multimodal LLMs may support diagnostic interpretation by integrating heterogeneous imaging information, highlighting their potential role as adjunctive decision-support tools within existing maxillofacial diagnostic frameworks.
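To make concrete how over-calling malignancy produces high sensitivity but low specificity, the following minimal Python sketch computes accuracy, sensitivity, and specificity against a reference standard, treating malignancy as the positive class. The function name `diagnostic_metrics`, the `DiagnosticMetrics` container, and the toy labels are hypothetical and are not taken from the study; the study itself estimated these quantities with generalized linear mixed models, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    accuracy: float
    sensitivity: float
    specificity: float


def diagnostic_metrics(truth, predicted, positive="malignancy"):
    """Compute accuracy, sensitivity, and specificity for a binary diagnosis.

    `truth` holds reference-standard labels (histopathology in the study);
    `predicted` holds model outputs; `positive` names the target condition.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    pos, neg = tp + fn, tn + fp
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in truth")
    return DiagnosticMetrics(
        accuracy=(tp + tn) / (pos + neg),
        sensitivity=tp / pos,  # share of true malignancies detected
        specificity=tn / neg,  # share of osteomyelitis cases correctly ruled out
    )


# Hypothetical toy labels (NOT study data): a model that over-calls malignancy.
truth = ["malignancy"] * 5 + ["osteomyelitis"] * 5
predicted = ["malignancy"] * 8 + ["osteomyelitis"] * 2
metrics = diagnostic_metrics(truth, predicted)
print(metrics)
```

In this toy case every malignancy is detected, but three of the five osteomyelitis cases are mislabeled as malignant, so sensitivity is high while specificity is low, the same qualitative pattern the abstract reports under limited imaging conditions.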
Yoo et al. studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: