Over the last decade, brain tumors have emerged as a significant and potentially fatal medical problem. Traditional methods for detecting and classifying brain tumors from Magnetic Resonance Imaging (MRI) scans are often time-consuming and error-prone, yet effective diagnosis and therapy planning depend on precise tumor classification. This study proposes a Vision Transformer (ViT) model that classifies scans from the Brain Tumor MRI dataset available on Kaggle into four categories: no-tumor, meningioma, pituitary tumor, and glioma. Unlike conventional Convolutional Neural Networks (CNNs), the proposed ViT relies on self-attention, which makes it well suited to capturing global relationships and extracting complex features from medical images. The model divides each image into fixed-size patches, linearly embeds each patch, and adds positional encodings to the resulting embeddings. These encoded representations are fed into the transformer's encoder layers, and a fully connected output layer produces the final classification. The model is evaluated with standard multi-class classification metrics and achieves an average accuracy of 99.3%, outperforming all other Deep Learning (DL) models previously tested on this benchmark dataset. Because self-attention lets the model focus automatically on the most relevant image regions, it captures both fine-grained and large-scale features, improving the accuracy and reliability of brain tumor detection.
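The pipeline described above (patching, linear embedding, positional encoding, transformer encoding, fully connected classification) can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the weights are random and untrained, the encoder is a single self-attention head with a residual connection (no LayerNorm or MLP), the positional encoding is the fixed sinusoidal variant rather than a learned one, tokens are mean-pooled before the output layer instead of using a class token, and the 16×16 image and patch size of 4 are hypothetical values chosen only for illustration. The helper names (`patchify`, `tiny_vit_forward`, etc.) are likewise hypothetical.

```python
import math
import random

# The four labels named in the study; their order here is an assumption.
CLASSES = ["no-tumor", "meningioma", "pituitary tumor", "glioma"]

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened, non-overlapping patch x patch patches."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by the patch size"
    return [[image[r + i][c + j] for i in range(patch) for j in range(patch)]
            for r in range(0, h, patch) for c in range(0, w, patch)]

def matvec(weight, vec):
    """Multiply an (out x in) weight matrix by a length-in vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]

def sinusoidal_position(pos, dim):
    """Fixed sinusoidal positional encoding (assumption: ViT-style models often learn these instead)."""
    return [math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / dim)) for i in range(dim)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention: every token attends to every other token."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))])
    return out

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def tiny_vit_forward(image, patch=4, dim=8, seed=0):
    """Toy forward pass with random, untrained weights; returns one probability per class."""
    rng = random.Random(seed)
    patches = patchify(image, patch)
    w_embed = rand_matrix(dim, patch * patch, rng)
    # 1) linear patch embedding, 2) add positional encoding
    tokens = [[e + p for e, p in zip(matvec(w_embed, x), sinusoidal_position(i, dim))]
              for i, x in enumerate(patches)]
    # 3) one simplified encoder block: self-attention plus a residual connection
    attended = self_attention(tokens, *(rand_matrix(dim, dim, rng) for _ in range(3)))
    tokens = [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]
    # 4) mean-pool the tokens, then a fully connected output layer
    pooled = [sum(col) / len(tokens) for col in zip(*tokens)]
    logits = matvec(rand_matrix(len(CLASSES), dim, rng), pooled)
    return softmax(logits)

# Usage on a synthetic 16 x 16 gradient "scan" (hypothetical data, not from the dataset).
image = [[(r * 16 + c) / 255 for c in range(16)] for r in range(16)]
print(len(patchify(image, 4)))  # 16 patches of 16 values each
probs = tiny_vit_forward(image)
print(dict(zip(CLASSES, probs)))
```

Because the weights are untrained, the printed probabilities are meaningless as predictions; the sketch only shows how a grid of patches becomes a token sequence that self-attention can relate globally before a single linear layer maps it to the four classes.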
Murala et al. studied this question.