This study aimed to develop and validate a multimodal deep learning model for pre-treatment prediction of radiation-induced temporal lobe injury (RTLI) and to evaluate its generalizability across nasopharyngeal carcinoma-endemic and non-endemic regions. In this multicenter retrospective study, 6847 patients with nasopharyngeal carcinoma from five institutions in southern and northern China were included. A 3D ResNet-based multimodal deep learning model integrating planning CT images, spatial dose distribution maps, dosimetric variables, and clinical factors was developed and validated, then evaluated in internal and multiple external test cohorts. Model performance was assessed using the concordance index and compared with reduced-modality deep learning models, a dosiomics risk model, and conventional reference models. Model interpretability was explored using gradient-weighted class activation mapping (Grad-CAM). The multimodal deep learning model demonstrated robust predictive performance, achieving concordance indices of 0.817 in the internal test cohort, 0.825, 0.767, and 0.901 in the three southern external test cohorts, and 0.815 in the northern external test cohort, and it significantly outperformed all comparator models (all P < 0.001). Performance improved incrementally with multimodal integration; CT imaging improved performance in the deep learning framework but not in the dosiomics risk model. The model enabled reliable identification of patients at high risk of RTLI. Grad-CAM visualization linked model outputs to spatial dose-tissue interactions, which may help guide radiotherapy plan optimization. In conclusion, multimodal deep learning integrating anatomical imaging, spatial dose information, and clinical context enables accurate and generalizable pre-treatment prediction of RTLI across diverse regions. Spatially interpretable outputs further support individualized risk stratification and establish a biologically and dosimetrically informed foundation for risk-adapted radiotherapy planning.
Yang et al. studied this question.
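For readers unfamiliar with the concordance index used as the performance metric, the following is a minimal sketch of Harrell's C-index for right-censored data. It is illustrative only: the abstract does not state which C-index variant or software the authors used, and the function name `concordance_index` and the toy follow-up times, event flags, and risk scores are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : follow-up time for each patient
    events : 1 if the event (e.g. injury) was observed, 0 if censored
    risks  : model-predicted risk score (higher = earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # 'a' is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored: pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk assigned to the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: months of follow-up, event flags, predicted risks.
times = [12, 30, 45, 60, 24]
events = [1, 0, 1, 0, 1]
risks = [0.9, 0.4, 0.8, 0.1, 0.7]

c_index = concordance_index(times, events, risks)
print(f"C-index: {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.767 to 0.901 should be read.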