What question did this study set out to answer?

This research aims to develop a self-supervised model for ultrasound imaging that improves representation learning and classification performance.

April 27, 2026 · Open Access

USF-MAE: Ultrasound self-supervised foundation model with masked autoencoding

Key Points

Pre-trained USF-MAE on approximately 370,000 2D and 3D ultrasound images from 46 open-source datasets (OpenUS-46).
Used an encoder–decoder architecture that reconstructs masked image patches, so no labels are needed during pre-training.
Fine-tuned the pre-trained encoder on three public classification benchmarks and evaluated it on ovarian tumour segmentation.
USF-MAE achieved F1-scores of 81.6%, 79.6%, and 82.4% on BUS-BRA, MMOTU-2D, and GIST514-DB, respectively.
Outperformed CNN and ViT baselines on all three classification tasks, approached the supervised foundation model UltraSam on breast cancer classification, and outperformed it on the other tasks, showing cross-anatomical generalization.
Achieved an mAP of 51.0% and an mAP@50 of 77.9% for ovarian tumour segmentation on MMOTU-2D.
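
The key points report F1-scores for classification and mAP / mAP@50 for segmentation. As a hedged illustration of what these metrics measure, here is a minimal NumPy sketch that computes a binary F1-score, an unweighted (macro) average over classes, and the mask IoU that decides whether a predicted mask counts as a match at the 0.5 threshold behind mAP@50. The function names are hypothetical. The paper's exact averaging scheme (macro, weighted, or per-class) and its mAP protocol are assumptions here; COCO-style averaging over IoU thresholds from 0.50 to 0.95 is a common convention, not a detail confirmed by the source.

```python
import numpy as np


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2 * precision * recall / (precision + recall) for one class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    if tp == 0:
        return 0.0  # avoids division by zero when the class is never correctly predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme, not the paper's)."""
    classes = np.union1d(y_true, y_pred)
    return float(np.mean([f1_score(y_true, y_pred, positive=c) for c in classes]))


def mask_iou(pred_mask, gt_mask):
    """Intersection over union of two binary segmentation masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred_mask, gt_mask).sum() / union)


def is_match_at_50(pred_mask, gt_mask):
    """At the mAP@50 threshold, a prediction matches a ground-truth mask if IoU >= 0.5."""
    return mask_iou(pred_mask, gt_mask) >= 0.5
```

For example, with y_true = [0, 1, 1, 0, 1] and y_pred = [0, 1, 0, 0, 1], both per-class F1-scores are 0.8, so the macro F1 is also 0.8. A full mAP computation additionally ranks predictions by confidence and integrates precision over recall, which this sketch omits.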

Abstract

Ultrasound imaging is a diagnostic modality that provides real-time, radiation-free evaluation across many clinical areas. However, noise, operator dependence, and a restricted field of view make ultrasound images difficult to interpret, resulting in inter-observer variability. In addition, the scarcity of labelled datasets and the domain gap between general-domain and sonographic images limit the transferability of deep learning models pre-trained on non-medical data. To address these challenges, we introduce the Ultrasound Self-Supervised Foundation Model with Masked Autoencoding (USF-MAE), the first large-scale self-supervised MAE framework pre-trained exclusively on ultrasound data. The model was pre-trained on ∼370,000 2D and 3D ultrasound images from 46 open-source datasets (OpenUS-46) covering more than 20 anatomical regions; this curated dataset has been made publicly available. Using an encoder–decoder architecture, USF-MAE reconstructs masked image patches, which lets it learn representations directly from unlabelled data. The pre-trained encoder was fine-tuned on three public downstream classification benchmarks: BUS-BRA, MMOTU-2D, and GIST514-DB. USF-MAE outperformed CNN and ViT baselines on all three tasks, attaining F1-scores of 81.6%, 79.6%, and 82.4%, respectively. Despite using no labels during pre-training, USF-MAE approached the supervised foundation model UltraSam on breast cancer classification and outperformed it on the other tasks, demonstrating cross-anatomical generalization. USF-MAE also performed strongly on ovarian tumour segmentation on the MMOTU-2D dataset, achieving an mAP of 51.0% and an mAP@50 of 77.9%. These findings establish USF-MAE as a scalable, label-efficient ultrasound foundation model. Because it can be continually pre-trained on future unlabelled public or institutional datasets without human annotation, its ultrasound-specific representation learning approach supports data-efficient clinical and research applications.
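
The masked-patch reconstruction objective described in the abstract can be sketched in NumPy following the generic MAE recipe of He et al. The procedure is to split each image into non-overlapping patches, hide a random subset, pass only the visible patches to the encoder, and compute the reconstruction loss only on the hidden ones. The 16-pixel patch size, the 0.75 mask ratio, the 224×224 single-channel input, and all function names are illustrative assumptions rather than USF-MAE's reported settings. The encoder and decoder networks themselves are omitted, and a zero prediction stands in for the decoder output.

```python
import numpy as np


def patchify(images, patch_size):
    """Split (N, H, W, C) images into (N, L, patch_size*patch_size*C) patch vectors."""
    n, h, w, c = images.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    x = images.reshape(n, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (N, H/p, W/p, p, p, C), patches in row-major order
    return x.reshape(n, (h // p) * (w // p), p * p * c)


def random_masking(batch_size, num_patches, mask_ratio, rng):
    """Per-sample random mask in which True marks a hidden patch.

    Each row keeps exactly round(L * (1 - mask_ratio)) patches visible.
    """
    num_keep = int(round(num_patches * (1 - mask_ratio)))
    noise = rng.random((batch_size, num_patches))
    # Rank of each patch's noise value within its row; the lowest-ranked patches stay visible.
    ranks = np.argsort(np.argsort(noise, axis=1), axis=1)
    return ranks >= num_keep


def visible_patches(patches, mask):
    """Gather only the unmasked patches, i.e. what an MAE encoder would receive."""
    n, _, d = patches.shape
    num_keep = int((~mask[0]).sum())
    return patches[~mask].reshape(n, num_keep, d)


def masked_mse(pred_patches, target_patches, mask):
    """Mean squared error averaged over masked patches only."""
    per_patch = ((pred_patches - target_patches) ** 2).mean(axis=-1)  # (N, L)
    return float((per_patch * mask).sum() / mask.sum())


# Hypothetical settings for illustration: patch size 16, mask ratio 0.75.
rng = np.random.default_rng(0)
images = rng.random((2, 224, 224, 1))  # random stand-in for grayscale ultrasound frames
patches = patchify(images, patch_size=16)  # (2, 196, 256)
mask = random_masking(2, patches.shape[1], mask_ratio=0.75, rng=rng)
encoder_input = visible_patches(patches, mask)  # (2, 49, 256)
# A real decoder would predict every patch; a zero prediction stands in for it here.
loss = masked_mse(np.zeros_like(patches), patches, mask)
```

Scoring only the masked patches forces the encoder to infer hidden content from surrounding context instead of copying visible pixels. The original MAE also optionally normalizes each target patch by its own mean and variance; that option is left out here for brevity.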

Cite This Study

Megahed et al. (2026) studied this question.

synapsesocial.com/papers/69eefd64fede9185760d4180
https://doi.org/10.1016/j.bspc.2026.110313