BACKGROUND: Medical image segmentation is crucial for accurate diagnosis and treatment because it helps identify organs and lesions. The Segment Anything Model (SAM) excels at natural image segmentation, but its direct application to medical images is limited by significant differences in image features. Existing adaptations such as MedSAM have made progress, yet they consume substantial computational resources and remain insufficiently accurate on fine-grained features.
PURPOSE: To address the high computational cost and insufficient segmentation accuracy of existing medical image segmentation models, this study proposes a novel model, VM-MedSAM, designed to be more efficient and more precise.
METHODS: Inspired by the Mamba architecture, we developed VM-MedSAM. The model incorporates a vision backbone network based on RVM+, freezes the prompt encoder, and optimizes the image encoder from MedSAM. This structural adjustment significantly reduces the number of parameters and improves training efficiency. The proposed model was validated on a medical image dataset covering 12 different abdominal organs.
RESULTS: VM-MedSAM achieved slightly higher abdominal organ segmentation accuracy than MedSAM, with significant improvements in lung cancer and brain tumor segmentation. Compared with MedSAM, VM-MedSAM also reduced the number of parameters by 65.11%, increased training speed by 3.82 times, and decreased model size by 85.41%.
CONCLUSIONS: VM-MedSAM effectively addresses the high computational cost and limited accuracy of existing medical image segmentation approaches. Its improved performance and efficiency make it a promising solution for medical image segmentation.
Li et al. (Fri,) studied this question.
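
The efficiency figures in the RESULTS section are relative to MedSAM, so it can help to restate them as "what remains" rather than "what was removed". The sketch below is only illustrative arithmetic on the three reported numbers. The helper names are hypothetical and not part of the study, and it assumes that "increased training speed by 3.82 times" means a 3.82x throughput ratio, which the abstract does not state explicitly.

```python
def remaining_fraction(reduction_percent: float) -> float:
    """Fraction of the baseline left after a percentage reduction."""
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction_percent must be in [0, 100)")
    return 1.0 - reduction_percent / 100.0


def shrink_factor(reduction_percent: float) -> float:
    """How many times smaller the reduced quantity is than the baseline."""
    return 1.0 / remaining_fraction(reduction_percent)


def time_fraction(speedup: float) -> float:
    """Fraction of baseline wall-clock time needed at a given speedup.

    Assumption: the speedup is a plain throughput ratio (new / old).
    """
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Figures reported in the abstract, all relative to MedSAM.
PARAM_REDUCTION_PCT = 65.11
SIZE_REDUCTION_PCT = 85.41
TRAINING_SPEEDUP = 3.82

if __name__ == "__main__":
    print(f"parameters kept: {remaining_fraction(PARAM_REDUCTION_PCT):.2%}")
    print(f"parameter shrink factor: {shrink_factor(PARAM_REDUCTION_PCT):.2f}x")
    print(f"model size kept: {remaining_fraction(SIZE_REDUCTION_PCT):.2%}")
    print(f"model size shrink factor: {shrink_factor(SIZE_REDUCTION_PCT):.2f}x")
    print(f"training time needed: {time_fraction(TRAINING_SPEEDUP):.2%}")
```

Under these assumptions, VM-MedSAM keeps roughly 34.89% of MedSAM's parameters and 14.59% of its model size, and would need roughly 26% of the training time for the same workload.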