February 26, 2026Open Access

A deep learning vision-language model for diagnosing pediatric dental diseases

Q: What does this research mean for the field?

A deep learning vision-language model achieves 90% accuracy and 0.96 AUC in diagnosing pediatric dental diseases from panoramic radiographs. Novelty: ClaimNovelty.NOVEL_FINDING. Consensus alignment: ConsensusAlignment.NEUTRAL.

Key Points

To develop and evaluate a deep learning vision-language model for diagnosing pediatric dental diseases from radiographs.
Proposed a multimodal framework combining visual features from panoramic radiographs and textual descriptions.
Used non-linear dynamics and textural encoding for visual features extraction.
Trained a one-dimensional convolutional neural network classifier on fused representations of visual and textual data.
Evaluated model performance using metrics such as accuracy, sensitivity, precision, F1 score, and AUC.
Achieved 90% accuracy and 0.96 AUC in diagnosing pediatric dental diseases.
Sensitivity was 92%, specificity was 83%, and precision was 92%.
Outperformed conventional image-only convolutional neural networks and standalone language-based models.

Abstract

Automated diagnosis of pediatric dental diseases from panoramic radiographs remains challenging due to anatomical variability and limited availability of specialist expertise. Vision-language models offer a potential approach by integrating visual and textual information to improve diagnostic performance and interpretability. To develop and evaluate a deep learning vision-language model for differentiating between caries and periapical infections in pediatric panoramic radiographs. A multimodal framework was proposed that combines visual features extracted from panoramic radiographs using non-linear dynamics and textural encoding with textual descriptions generated by a large language model. The fused multimodal representations were used to train a one-dimensional convolutional neural network classifier. Model performance was evaluated using accuracy, sensitivity, precision, F1 score, and area under the receiver operating characteristic curve (AUC). Experiments conducted on a small, single-center dataset demonstrated that the proposed model outperformed conventional image-only convolutional neural networks and standalone language-based approaches, achieving an accuracy of 90%, sensitivity of 92%, specificity of 83%, precision of 92%, F1 score of 0.90, and an AUC of 0.96 within this dataset. However, the limited sample size and absence of external or prospective clinical validation restrict the generalizability and immediate clinical applicability of these findings. The results suggest that integrating visual and textual representations can enhance diagnostic performance for pediatric dental disease classification. Nevertheless, the findings should be regarded as preliminary and hypothesis-generating. Future work will involve larger, multi-center studies, external validation, and prospective clinical evaluation to establish robustness, generalizability, and real-world clinical impact of vision-language models in pediatric dental diagnostics. • Vision–language model integrates radiographic and textual features. • Achieves 90% accuracy and 0.96 AUC in pediatric dental disease diagnosis. • Multimodal fusion improves accuracy and interpretability in radiology.

A deep learning vision-language model for diagnosing pediatric dental diseases

Key Points

Abstract

Cite This Study