Purpose: To develop a gene expression-based prediction model for periodontitis by identifying a compact set of predictive genes and to validate the model using an independent cohort of patient samples analyzed by reverse transcription quantitative polymerase chain reaction (RT-qPCR).
Methods: Using a total of 9 Gene Expression Omnibus (GEO) series, we first performed feature selection through differential expression analysis and SHapley Additive exPlanations (SHAP) values in 2 GEO series (GSE10334 and GSE16134). The remaining datasets were then integrated to construct an extended multi-cohort dataset (680 samples: 193 healthy and 487 with periodontitis) for model development using the XGBoost classifier. An exhaustive search with nested cross-validation (CV) was conducted to identify the optimal gene subset. Model performance was estimated using repeated 10-fold CV and summarized by the area under the receiver operating characteristic curve (AUC) with corresponding standard deviations. Gene-level interpretation was performed using SHAP rankings and univariate analyses. Validation was conducted in a newly collected RT-qPCR patient cohort (n=20; 10 healthy individuals and 10 patients with periodontitis) derived from gingival tissue samples, using z-score-transformed Ct values without model retraining.
Results: The optimal 4-gene model (tissue inhibitor of metalloproteinases-4 [TIMP4], RNA binding motif protein 25 [RBM25], TLC domain containing 3A [TLCD3A], and TSR1 ribosome maturation factor [TSR1]) achieved an AUC of 0.936±0.031 in repeated 10-fold CV. Among individual genes, TIMP4 demonstrated the strongest discriminatory performance (AUC=0.867), followed by TLCD3A (AUC=0.849), TSR1 (AUC=0.782), and RBM25 (AUC=0.728). In the independent RT-qPCR cohort, the 4-gene model yielded an AUC of 0.790 (95% confidence interval, 0.548-0.979).
Conclusions: The compact 4-gene XGBoost model provides reproducible and interpretable prediction of periodontitis across GEO datasets and demonstrates moderate performance in a newly collected RT-qPCR patient cohort, suggesting preliminary cross-platform feasibility.
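As an illustration of the external validation step described in the Methods (z-score transformation of Ct values, followed by an AUC with a 95% confidence interval), the following is a minimal Python sketch, not the authors' pipeline. The function names and the toy inputs are hypothetical, and the percentile bootstrap is an assumption: the abstract does not state how the confidence interval was computed or which software was used. The scores passed to `auc` stand in for the outputs of an already trained, frozen classifier.

```python
import random
import statistics


def zscore_columns(rows):
    """Standardize each column (one gene) to mean 0 and SD 1 across samples.

    `rows` is a list of samples, each a list of Ct values in a fixed gene order.
    Assumes every gene varies across samples (a zero SD would divide by zero).
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # sample SD (n - 1)
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample needs both classes for the AUC to be defined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical Ct values for 4 samples x 2 genes, for illustration only.
    ct = [[24.1, 30.2], [25.3, 29.8], [27.0, 31.5], [26.2, 32.1]]
    print(zscore_columns(ct))

    # Hypothetical labels (1 = periodontitis) and frozen-model scores.
    y = [0, 0, 1, 1]
    p = [0.10, 0.40, 0.35, 0.80]
    print(auc(y, p), bootstrap_auc_ci(y, p, n_boot=500))
```

With only 20 samples, as in the RT-qPCR cohort, any resampling-based interval will be wide, which is consistent with the broad confidence interval reported in the Results.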
Lee et al. (Thu,) studied this question.