Automated diagnosis of pediatric dental diseases from panoramic radiographs remains challenging due to anatomical variability and limited availability of specialist expertise. Vision-language models offer a potential approach by integrating visual and textual information to improve diagnostic performance and interpretability. This study aimed to develop and evaluate a deep learning vision-language model for differentiating between caries and periapical infections in pediatric panoramic radiographs. A multimodal framework was proposed that combines visual features extracted from panoramic radiographs using non-linear dynamics and textural encoding with textual descriptions generated by a large language model. The fused multimodal representations were used to train a one-dimensional convolutional neural network classifier. Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and area under the receiver operating characteristic curve (AUC). Experiments conducted on a small, single-center dataset demonstrated that the proposed model outperformed conventional image-only convolutional neural networks and standalone language-based approaches, achieving an accuracy of 90%, sensitivity of 92%, specificity of 83%, precision of 92%, F1 score of 0.90, and an AUC of 0.96 within this dataset. However, the limited sample size and the absence of external or prospective clinical validation restrict the generalizability and immediate clinical applicability of these findings. The results suggest that integrating visual and textual representations can enhance diagnostic performance for pediatric dental disease classification. Nevertheless, the findings should be regarded as preliminary and hypothesis-generating. Future work will involve larger, multi-center studies, external validation, and prospective clinical evaluation to establish the robustness, generalizability, and real-world clinical impact of vision-language models in pediatric dental diagnostics.
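The abstract does not specify how the visual and textual representations are fused or how the reported metrics were computed, so the following is only a minimal sketch under stated assumptions. It assumes simple early fusion (each modality's feature vector is z-scored separately, then concatenated into one sequence that a 1D CNN could consume as a single channel), and it uses the standard confusion-matrix definitions of the listed metrics plus the rank-based (Mann-Whitney) formulation of AUC. The function names `zscore`, `fuse_features`, and `binary_metrics`, and the choice of which class counts as "positive", are hypothetical illustrations rather than the author's implementation.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one modality's feature vector to zero mean, unit variance."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # Constant vector: return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def fuse_features(image_feats, text_feats):
    """Hypothetical early fusion: z-score each modality, then concatenate.

    The result could be fed to a 1D CNN as a single-channel sequence of
    length len(image_feats) + len(text_feats).
    """
    return zscore(image_feats) + zscore(text_feats)


def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, precision, F1 and AUC.

    y_true: 0/1 labels (which disease is labeled 1 is an assumption).
    y_score: predicted probability of the positive class.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(a, b):
        return a / b if b else 0.0

    sensitivity = safe_div(tp, tp + fn)  # recall, true positive rate
    specificity = safe_div(tn, tn + fp)  # true negative rate
    precision = safe_div(tp, tp + fp)    # positive predictive value
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    accuracy = safe_div(tp + tn, len(y_true))

    # AUC = probability that a random positive outscores a random negative;
    # tied scores count as one half (Mann-Whitney formulation).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = safe_div(wins, len(pos) * len(neg))

    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f1": f1, "auc": auc}


if __name__ == "__main__":
    fused = fuse_features([1.0, 2.0, 3.0], [0.5, 1.5])
    print(len(fused))  # 5 fused features
    # Toy labels and scores, not data from the study.
    m = binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6, 0.1])
    # 8 of 9 positive-negative pairs are ranked correctly -> AUC = 8/9
    print(m)
```

Note that sensitivity and precision are reported here for a single positive class; in a two-class task such as caries versus periapical infection, macro-averaged variants (averaging over both classes) can give different F1 values, so the averaging scheme matters when comparing numbers.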
• Vision-language model integrates radiographic and textual features.
• Achieves 90% accuracy and 0.96 AUC in classifying caries versus periapical infection on pediatric panoramic radiographs (small, single-center dataset).
• Multimodal fusion outperforms image-only CNN and language-only baselines in this dataset.
Author: Tuan D. Pham.