Diffuse large B-cell lymphoma (DLBCL) is an aggressive and common subtype of non-Hodgkin lymphoma (NHL). Despite the availability of several risk stratification tools, there is still substantial room for improvement in personalized prognostic prediction. Furthermore, given the heterogeneity of DLBCL, selecting an appropriate treatment for each patient remains a clinical challenge. In this study, we developed a random survival forest model by integrating clinical and gene expression data from 677 DLBCL cases in the Gene Expression Omnibus (GEO) database. The model predicted overall survival with high concordance in both the training and validation datasets (C-index: 0.832 and 0.758, respectively), outperforming common prognostic markers such as cell-of-origin subtype, IPI score, and Ann Arbor stage. Time-dependent ROC curves also showed good predictive performance for 1-year, 3-year, and 5-year survival in the training and validation cohorts. The models are accessible via an open-access website. Survival analysis demonstrated that the group receiving the optimal treatment had more favorable survival. In addition, we used Kaplan-Meier curves, multivariate analysis, and a penalized Cox regression model to identify six genes (C2CD5, CD163, JADE3, BIRC3, TMEM200A, and LINC00877) related to the prognosis of DLBCL. In conclusion, we developed a machine learning model that integrates clinical characteristics and gene expression profiles, providing a reliable decision-support tool for DLBCL prognosis and treatment selection.
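The C-index reported above measures how often a model ranks two patients in the same order as their observed survival. As a minimal sketch of Harrell's concordance index for right-censored data (not the authors' implementation), the function below counts comparable pairs, where the patient with the shorter follow-up had an observed event, and checks whether that patient was assigned the higher risk. The function name, the tie handling, and the toy values are assumptions made for illustration; none of the numbers come from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       : follow-up time per patient
    events      : 1 if death was observed, 0 if censored
    risk_scores : model output, higher means worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk assigned to the earlier death
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort (illustrative values only)
toy_times = [5, 10, 12, 20, 30]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.4, 0.2, 0.3, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.832 and 0.758 should be read. In practice an established survival analysis library would normally be used instead of a hand-written loop.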
Lin et al. studied this question.