Background: Accurate and dependable brain tumor classification from magnetic resonance imaging (MRI) is essential for clinical decision support, yet it remains challenging because of inter-dataset variability, heterogeneous tumor appearances, and the limited generalization of many deep learning models. Existing studies often rely on single-dataset evaluation, provide insufficient statistical validation, or lack interpretability, which restricts their clinical reliability and real-world deployment.

Methods: This study proposes a robust brain tumor classification framework based on the ConvNeXt Base architecture. The model is evaluated on three independent MRI datasets, each comprising four classes: glioma, meningioma, pituitary tumor, and no tumor. Performance is assessed using class-wise and aggregate metrics, including accuracy, precision, recall, F1-score, AUC, and Cohen's Kappa. The experimental analysis is complemented by ablation studies, a computational efficiency evaluation, and rigorous statistical validation using Friedman's aligned ranks test, Holm and Wilcoxon post hoc tests, Kendall's W, critical difference diagrams, and TOPSIS-based multi-criteria ranking. Model interpretability is examined using Grad-CAM++ and Gradient SHAP.

Results: ConvNeXt Base consistently achieves near-perfect classification performance across all three datasets, with accuracies exceeding 99.6% and AUC values approaching 1.0, while maintaining balanced class-wise behavior. Statistical analyses confirm that the observed performance gains over competing architectures are significant and reproducible. Efficiency results demonstrate favorable inference speed and resource usage, and explainability analyses show that predictions are driven by tumor-relevant regions.

Conclusions: The results demonstrate that ConvNeXt Base provides a reliable, generalizable, and explainable solution for MRI-based brain tumor classification. Its strong diagnostic accuracy, statistical robustness, and computational efficiency support its suitability for integration into real-world clinical and diagnostic workflows.
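
Two of the quantities named in the Methods, Cohen's Kappa and the TOPSIS ranking score, are compact enough to illustrate directly. The sketch below is a minimal, standard-library Python rendering of their textbook definitions; it is not the authors' implementation. The function names `cohens_kappa` and `topsis`, the choice of vector normalization in TOPSIS, and every number in the usage lines are assumptions made for illustration and are not taken from the study.

```python
import math


def cohens_kappa(confusion):
    """Cohen's Kappa from a square confusion matrix (rows: true, cols: predicted)."""
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        return float("nan")  # Kappa is undefined when chance agreement is total.
    return (observed - expected) / (1.0 - expected)


def topsis(matrix, weights, benefit):
    """Closeness scores in [0, 1] for each alternative (higher is better).

    matrix:  one row per alternative, one column per criterion.
    weights: one weight per criterion.
    benefit: True where larger values are better, False for cost criteria.
    """
    n_crit = len(weights)
    # Vector normalization (an assumption; other normalizations are in use).
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)    # distance to the ideal solution
        d_worst = math.dist(row, worst)  # distance to the anti-ideal solution
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.5)
    return scores


if __name__ == "__main__":
    # Hypothetical 2-class confusion matrix, for illustration only.
    print(cohens_kappa([[45, 5], [10, 40]]))
    # Hypothetical models scored on (accuracy, latency in ms), equal weights.
    print(topsis([[0.99, 10.0], [0.97, 8.0], [0.95, 20.0]],
                 [0.5, 0.5], [True, False]))
```

Treating accuracy as a benefit criterion and latency as a cost criterion mirrors how a multi-criteria ranking can combine predictive and efficiency measures; the weights actually used in the study are not stated in this abstract.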
Pant et al. studied this question.