Introduction
Induction chemotherapy (IC) is a standard treatment approach for locally advanced nasopharyngeal carcinoma (LA-NPC), yet treatment response varies markedly between patients. This study aimed to develop and evaluate a temporal Transformer-based fusion model that integrates baseline pretreatment and early intratreatment MRI to enable early risk stratification and guide individualized therapeutic management.

Materials and Methods
In this retrospective multicenter study, 488 patients with pathologically confirmed LA-NPC were enrolled from two institutions. All patients received IC and underwent contrast-enhanced T1-weighted imaging (CE-T1WI) before treatment initiation (Pre-IC) and at an early post-treatment time point (Post-IC). A dual-branch architecture with two independent branches and Twins-SVT as the backbone was used to extract deep learning features separately from the Pre-IC and Post-IC CE-T1WI scans. An attention-based temporal Transformer fusion module was then designed to model nonlinear longitudinal interactions and dynamic patterns of change between the pre- and post-treatment tumor representations, yielding a longitudinal temporal fusion prediction model. Gradient-weighted class activation mapping (Grad-CAM) was applied to improve interpretability by visualizing salient image regions.

Results
The deep learning model based on Pre-IC imaging achieved an AUC of 0.844 (95% CI: 0.767–0.922) in the internal validation cohort and 0.819 (95% CI: 0.725–0.913) in the external validation cohort. The Post-IC-based model achieved AUCs of 0.863 (95% CI: 0.790–0.936) and 0.838 (95% CI: 0.753–0.923) in the internal and external validation cohorts, respectively. The longitudinal temporal Transformer fusion model achieved higher predictive performance than either single-time-point model, with AUCs of 0.889 (95% CI: 0.824–0.955) in the internal validation cohort and 0.865 (95% CI: 0.791–0.939) in the external validation cohort.

Conclusion
Compared with single-time-point models, the temporal Transformer fusion model based on longitudinal contrast-enhanced MRI enabled more accurate early individualized prediction of IC response in patients with LA-NPC.
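The abstract does not specify the internals of the fusion module, so the following is only a minimal NumPy sketch, under our own assumptions, of how two independent branches and an attention-based temporal fusion step could fit together. Everything in it is hypothetical: the class name `DualBranchTemporalFusion`, the linear-plus-tanh encoders standing in for the Twins-SVT backbones, the two learned time embeddings marking Pre-IC and Post-IC, the single attention head with one residual connection (no layer norm or feed-forward block), mean pooling, and the sigmoid response head. The weights are random and untrained, so the outputs illustrate shapes and data flow only, not predictions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class DualBranchTemporalFusion:
    """Hypothetical sketch: two independent encoders plus attention fusion.

    Weights are random (untrained); each branch is a single linear layer with
    tanh, standing in for a Twins-SVT backbone (an assumption, not the paper's).
    """

    def __init__(self, in_dim, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        s_in, s = 1.0 / np.sqrt(in_dim), 1.0 / np.sqrt(dim)
        # Independent (non-shared) weights for the Pre-IC and Post-IC branches.
        self.w_pre = rng.normal(0.0, s_in, size=(in_dim, dim))
        self.w_post = rng.normal(0.0, s_in, size=(in_dim, dim))
        # Time embeddings: row 0 marks Pre-IC, row 1 marks Post-IC.
        self.time_emb = rng.normal(0.0, s, size=(2, dim))
        # Single-head self-attention projections.
        self.w_q, self.w_k, self.w_v, self.w_o = (
            rng.normal(0.0, s, size=(dim, dim)) for _ in range(4)
        )
        # Binary response classifier on the pooled fused representation.
        self.w_cls = rng.normal(0.0, s, size=dim)
        self.dim = dim

    def __call__(self, x_pre, x_post):
        # x_pre, x_post: (batch, in_dim) flattened images from each time point.
        f_pre = np.tanh(x_pre @ self.w_pre)
        f_post = np.tanh(x_post @ self.w_post)
        # Stack the two time points into a length-2 token sequence: (B, 2, D).
        tokens = np.stack([f_pre, f_post], axis=1) + self.time_emb
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        # Scaled dot-product attention between the two time points: (B, 2, 2).
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(self.dim))
        fused = tokens + (attn @ v) @ self.w_o  # residual connection
        pooled = fused.mean(axis=1)  # average over the two time points
        prob = 1.0 / (1.0 + np.exp(-(pooled @ self.w_cls)))  # sigmoid
        return prob, attn


# Toy usage with random 32x32 "images" for 4 patients.
rng = np.random.default_rng(1)
x_pre = rng.normal(size=(4, 32 * 32))
x_post = rng.normal(size=(4, 32 * 32))
model = DualBranchTemporalFusion(in_dim=32 * 32, dim=64)
prob, attn = model(x_pre, x_post)
print(prob.shape, attn.shape)  # (4,) (4, 2, 2)
```

Keeping the two encoders' weights separate mirrors the "independent branches" wording, while the 2×2 attention map per patient is where cross-time-point interaction happens; in a real model the stand-in encoders would be replaced by pretrained backbones and all weights learned from response labels.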
Han et al. (Mon,) studied this question.