What question did this study set out to answer?

This research aims to classify gliomas and their grades using machine learning based on gene expression data.

May 6, 2026 | Open Access

Machine Learning-Based Classification of Gliomas and Tumor Grades with SHAP-Guided Feature Interpretation

Key Points

Developed an interpretable machine learning framework for classification of brain tumor subtypes and grades
Analyzed gene expression data from the REMBRANDT dataset containing 464 labeled samples
Evaluated performance metrics including accuracy and AUC for model predictions
Achieved 99.6% accuracy (AUC 99.89%) for glioblastoma classification
Reached 83.7% accuracy (AUC 88.2%) for grade II vs. III tumor classification
Identified key genes contributing to model predictions through SHAP analysis
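
For readers less familiar with the two reported metrics, the following is a minimal sketch of how accuracy and AUC can be computed for a single binary task such as grade II vs. III. It is not the study's code: the labels and scores are toy values invented for illustration, the 0.5 decision threshold is an assumption, and the helper names `accuracy` and `roc_auc` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values, invented for illustration only (1 = grade III, 0 = grade II).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes how well the scores rank positives above negatives across all thresholds, which is one reason the two numbers can differ for the same task.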

Abstract

Background: Gliomas are among the most common and heterogeneous primary brain tumors, exhibiting substantial molecular and transcriptomic diversity that complicates diagnosis, grading, and treatment planning. Advances in artificial intelligence (AI), particularly machine learning (ML), offer powerful opportunities to analyze high-dimensional gene expression data and support precision oncology.

Methods: This study proposes an interpretable ML framework to classify brain tumor subtypes (glioblastoma, astrocytoma, and oligodendroglioma) and to predict tumor grades (II, III, and IV) using microarray-based gene expression data. The analysis used the REMBRANDT dataset: 464 labeled samples (221 glioblastoma, 148 astrocytoma, 67 oligodendroglioma, and 28 controls) for disease classification and 314 tumor samples for grade classification.

Results: The ML models achieved high performance for disease classification, with accuracies of 99.6% (AUC 99.89%) for glioblastoma, 98.3% (AUC 99.83%) for astrocytoma, and 98.95% (AUC 100%) for oligodendroglioma. Pairwise tumor grade predictions reached 83.7% accuracy (AUC 88.2%) for grade II vs. III, 91.3% (AUC 94.8%) for grade II vs. IV, and 84.2% (AUC 90.8%) for grade III vs. IV. SHAP analysis identified key genes contributing to the model predictions (e.g., WIF1, STX6, RGS5, and ACTR2), and KEGG enrichment analysis pointed to candidate pathways involved in vesicular transport, metabolism, and immune signaling.

Conclusion: Our findings demonstrate that interpretable ML models can accurately differentiate glioma subtypes and grades, and that SHAP analysis can help identify the models' strongest predictors. These findings provide additional insight into the heterogeneous genetic and molecular landscape of brain gliomas and are intended to complement, not replace, conventional histopathological diagnosis.
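
To illustrate the idea behind SHAP-style attribution, the sketch below computes exact Shapley values for one sample of a tiny hypothetical scoring function by enumerating every feature subset. It rests on stated assumptions: `toy_model`, the feature names `gene_a`, `gene_b`, `gene_c`, the sample values, and the all-zero baseline are invented for illustration, and "missing" features are replaced by their baseline value. The abstract does not say which SHAP explainer or model the study used, and practical SHAP tools rely on faster approximations, since exact enumeration is infeasible for thousands of genes.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature subsets.

    Features outside a subset are set to their baseline value, which is one
    common (but not the only) way to represent a "missing" feature.
    """
    n = len(x)
    features = range(n)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical scoring function over three made-up expression features.
def toy_model(v):
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[0] * v[2]


sample = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, baseline)

for name, contribution in zip(["gene_a", "gene_b", "gene_c"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the model output for the sample and for the baseline, and the interaction term's effect is split equally between the two features involved. This additivity is what allows per-gene attributions to be ranked and compared.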
