Accurate segmentation of spinal structures in magnetic resonance imaging (MRI) is an important step in supporting the diagnosis of several diseases and related conditions. In this work, we present a systematic evaluation of three deep learning architectures (U-Net, Feature Pyramid Network (FPN), and SegFormer), each combined with ResNet-50 and EfficientNet-B2 encoders, for the semantic segmentation of the cervical spine (C1–C7) in MRI volumes from the VerSe 2020 dataset. The proposed pipeline comprises dataset preparation; model training with Dice loss, the Adam optimizer, and early stopping; and evaluation with standard metrics, namely Intersection over Union (IoU), F1-score, and accuracy. Results show that FPN with ResNet-50 achieved the best overall performance, reaching an IoU of 0.6696 and an F1-score of 0.8021, while EfficientNet-B2 provided more consistent results across architectures. Data augmentation had limited impact, with gains restricted to a few specific configurations. Qualitative analysis through 3D surface reconstruction further highlighted the limitations of 2D slice-based segmentation, particularly at the extremities of the vertebrae, suggesting the need for volumetric approaches. The contributions of this work are a reproducible pipeline for spinal segmentation and a comparative evaluation of convolutional and Transformer-based models; the qualitative findings also motivate exploring 3D approaches in future work.
Carvalho et al. studied this question.
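The metrics and loss named in the abstract can be illustrated with a minimal NumPy sketch for a single binary mask. This is not the authors' implementation: the function names (`confusion_counts`, `iou_score`, `f1_score`, `accuracy`, `soft_dice_loss`), the smoothing term `eps`, and the toy masks are assumptions made for illustration, and the abstract does not say how the scores were averaged across classes or slices.

```python
import numpy as np


def confusion_counts(pred, target):
    """Return (tp, fp, fn, tn) pixel counts for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    tn = int(np.sum(~pred & ~target))
    return tp, fp, fn, tn


def iou_score(pred, target, eps=1e-7):
    """Intersection over Union: TP / (TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    return (tp + eps) / (tp + fp + fn + eps)


def f1_score(pred, target, eps=1e-7):
    """F1 (equivalently the Dice coefficient): 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)


def accuracy(pred, target):
    """Fraction of pixels labelled correctly, background included."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    return (tp + tn) / (tp + fp + fn + tn)


def soft_dice_loss(probs, target, eps=1e-7):
    """1 - soft Dice, computed on predicted probabilities (assumed form)."""
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(probs * target)
    return 1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)


# Toy 2x3 masks, invented for illustration only.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])

print(f"IoU      = {iou_score(pred, target):.4f}")   # 0.5000
print(f"F1       = {f1_score(pred, target):.4f}")    # 0.6667
print(f"Accuracy = {accuracy(pred, target):.4f}")    # 0.6667
```

When both scores are computed from the same pixel counts, they are linked by F1 = 2·IoU / (1 + IoU). The reported pair is consistent with this identity: 2 × 0.6696 / 1.6696 ≈ 0.8021.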