What question did this study set out to answer?

This study aims to evaluate transformer-based architectures for accurate brain tumor classification using MRI data.

December 11, 2025 | Open Access

Comparative analysis of transformer architectures for brain tumor classification

Key Points

Compared eleven models from Vision Transformer, Data-efficient Image Transformer, and Swin Transformer families.
Used a publicly available four-class MRI dataset (Glioma, Meningioma, Pituitary, No Tumor).
Applied stratified sampling and extensive data augmentation to enhance model generalization.
All evaluated models achieved high accuracy (> 98.8%).
Swin-Small and Swin-Large tied for the highest accuracy (99.37%), with Swin-Small at a lower computational cost.
Swin-Small was also more than twice as fast at inference as Swin-Large (0.54 ms vs. 1.29 ms).
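
The efficiency comparison in the key points reduces to simple arithmetic on the two reported inference times. A minimal sketch in Python, using only the figures quoted above; the variable names are illustrative, and treating the latencies as per-image values is an assumption, since this summary does not state the batch size or hardware:

```python
# Reported inference times from the key points, in milliseconds.
swin_small_ms = 0.54
swin_large_ms = 1.29

# How many times faster Swin-Small is than Swin-Large.
speedup = swin_large_ms / swin_small_ms


def throughput(latency_ms):
    """Images per second implied by a per-image latency (assumed per-image)."""
    return 1000.0 / latency_ms


print(f"speedup: {speedup:.2f}x")
print(f"Swin-Small: {throughput(swin_small_ms):.0f} images/s")
print(f"Swin-Large: {throughput(swin_large_ms):.0f} images/s")
```

The ratio comes out to roughly 2.4, which is consistent with the "more than double" wording in the abstract.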

Abstract

Aim: Early and accurate diagnosis of brain tumors is critical for treatment success, but manual interpretation of magnetic resonance imaging (MRI) has limitations. This study evaluates state-of-the-art Transformer-based architectures for improving diagnostic efficiency and robustness in this task, aiming to identify models that balance high accuracy with computational feasibility.

Methods: We systematically compared the performance and computational cost of eleven models from the Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), and Swin Transformer (Hierarchical Vision Transformer using Shifted Windows) families. A publicly available four-class MRI dataset (Glioma, Meningioma, Pituitary, No Tumor) was used for multi-class classification. The dataset was partitioned using stratified sampling and extensively augmented to improve model generalization.

Results: All evaluated models achieved high accuracy (> 98.8%). The Swin-Small and Swin-Large models tied for the highest accuracy, 99.37%. Notably, Swin-Small delivered this performance at a fraction of the computational cost: Swin-Large is nearly four times its size, and Swin-Small's inference was more than twice as fast (0.54 ms vs. 1.29 ms).

Conclusions: The largest model does not inherently guarantee the best performance. Architecturally efficient, mid-sized models like Swin-Small provide an optimal trade-off between diagnostic accuracy and practical clinical applicability. This finding highlights a key direction for developing feasible and effective AI-based diagnostic systems in neuroradiology.
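
The stratified partitioning mentioned in the Methods can be illustrated with a short, dependency-free Python sketch. This is not the authors' code: the function name `stratified_split`, the 80/20 ratio, the seed, and the class counts are all hypothetical, since this summary does not report the actual split or dataset sizes.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same share
    in the train and test sets. Returns (train_idx, test_idx)."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical class counts for illustration only, not the dataset's real sizes.
counts = {"Glioma": 120, "Meningioma": 130, "Pituitary": 150, "No Tumor": 200}
labels = [name for name, n in counts.items() for _ in range(n)]

train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
test_counts = {name: sum(labels[i] == name for i in test_idx) for name in counts}
print(len(train_idx), len(test_idx), test_counts)
```

Splitting per class rather than over the whole pool keeps a minority class from being under-represented in the held-out set, which matters when per-class accuracy is compared across models.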

Çakmak et al. (2025) studied this question.

synapsesocial.com/papers/6940192a2d562116f28f6c32 https://doi.org/10.37349/emed.2025.1001377
