What question did this study set out to answer?

The aim is to create deep learning models that enhance the accuracy and efficiency of medical image segmentation, addressing data challenges.

March 28, 2026 · Open Access

Efficient and Generalised Deep Learning Models for Medical Image Segmentation


Key Points

Developed CTranS, combining CNN and Transformer for baseline segmentation.
Introduced MO-CTranS for learning from partially labeled datasets.
Proposed CRFTrans to improve computational efficiency with a novel recursive layer.
Created SSL-MedSAM2, a semi-supervised framework that integrates few-shot learning.
CTranS achieved state-of-the-art performance in medical image segmentation.
MO-CTranS enhanced generalisability and tackled label inconsistencies.
CRFTrans reduced computational complexity without sacrificing performance.
SSL-MedSAM2 significantly decreased the need for manual annotations, improving data efficiency.
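
The efficiency point about CRFTrans can be made concrete with a rough parameter count. This is only a sketch under stated assumptions: this page does not describe the internals of CRFTrans, so the code assumes that a "recursive" layer reuses one set of Transformer weights across refinement steps, and it uses illustrative ViT-Base-like sizes (`d_model=768`, `d_ff=3072`, `depth=12`) that do not come from the thesis. It counts parameters only; how CRFTrans reduces compute is not stated here.

```python
def transformer_layer_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one standard Transformer encoder layer (with biases)."""
    # Query, key, value and output projections: four d_model x d_model matrices plus biases.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer MLP: d_model -> d_ff -> d_model, each with a bias vector.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return attention + mlp + layer_norms


def stacked_params(depth: int, d_model: int, d_ff: int) -> int:
    """Cascaded design: every layer owns its own weights."""
    return depth * transformer_layer_params(d_model, d_ff)


def recursive_params(d_model: int, d_ff: int) -> int:
    """Assumed recursive design: one layer's weights are reused at every step."""
    return transformer_layer_params(d_model, d_ff)


# Illustrative sizes only (assumption, not taken from the thesis).
D_MODEL, D_FF, DEPTH = 768, 3072, 12
stacked = stacked_params(DEPTH, D_MODEL, D_FF)
recursive = recursive_params(D_MODEL, D_FF)
print(f"stacked: {stacked:,} parameters, recursive: {recursive:,} parameters")
```

Under this assumption the parameter count no longer grows with the number of refinement steps, which is one reason weight sharing is attractive for resource-constrained deployment.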

Abstract

Medical image segmentation is a critical task in computer-assisted diagnosis and therapy planning. Accurate and efficient fully automated segmentation methods are essential for accelerating decision-making processes. Compared with natural image segmentation methods, current deep learning–based segmentation models for medical imaging face challenges related to data scarcity, inconsistent annotations, and high computational cost. Consequently, this thesis aims to develop efficient and generalised medical image segmentation models, which are fundamental to the clinical adoption and industrial deployment of this technology in healthcare settings. This thesis systematically addresses these challenges through a series of progressively advanced deep learning models. The foundational work, CTranS, forms a baseline medical image segmentation model that combines a convolutional neural network (CNN) based encoder with a Transformer-based decoder. This initial contribution leverages the strengths of CNNs for local feature extraction and Transformers for global contextual understanding, achieving state-of-the-art (SOTA) performance. Subsequently, to enhance generalisability, MO-CTranS is introduced, which is capable of learning from a mixture of multiple partially labelled datasets. MO-CTranS employs a CNN encoder and Transformer decoder with task-specific tokens to handle label inconsistencies and class imbalances, tackling the challenges of training a single unified model from fragmented and heterogeneously annotated medical data. This innovation moves beyond single-task, fully-annotated model training towards a more scalable and clinically realistic approach. Despite the superior performance of CTranS and MO-CTranS, the computational cost of the Transformer blocks hinders their clinical application. Thus, CRFTrans is proposed to enhance the model efficiency.
CRFTrans is a novel recursive Vision Transformer layer designed to replace the commonly used cascaded Transformer layers. This approach represents a fundamental shift from simply stacking layers to a recursive refinement process. It demonstrates that comparable performance can be achieved with significantly reduced computational complexity, thus facilitating deployment in resource-constrained clinical environments. However, these methods still depend on extensive manual annotation, which remains challenging and time-consuming in clinical practice. Thus, to further enhance both efficiency and generality, a novel semi-supervised medical image segmentation framework, referred to as SSL-MedSAM2, has been developed. SSL-MedSAM2 integrates a training-free few-shot segmentation branch for pseudo labelling and a fully supervised learning branch for label refinement. This design allows the model to leverage a small fraction of labelled images together with a large unlabelled set, dramatically decreasing the annotation effort. Together, these contributions result in deep learning models that are significantly more data-efficient and broadly applicable than conventional approaches. The overall impact of this work is to facilitate the reliable and scalable deployment of segmentation models in real-world clinical settings by bridging the gap among accuracy, efficiency, and generalisability.
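
The pseudo-labelling idea behind SSL-MedSAM2 can be illustrated with a toy data flow. This is a minimal sketch, not the thesis implementation: `toy_few_shot_segment` is a hypothetical stand-in for the training-free few-shot branch (it simply thresholds intensities using the labelled support images), `build_training_set` is an illustrative helper name, and the supervised refinement branch is left out entirely.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # toy 2D greyscale image
Mask = List[List[int]]     # 1 = foreground, 0 = background
Pair = Tuple[Image, Mask]


def toy_few_shot_segment(query: Image, support: Sequence[Pair]) -> Mask:
    """Hypothetical stand-in for a training-free few-shot branch.

    Thresholds the query at the midpoint between the mean foreground and
    mean background intensity observed in the labelled support images.
    """
    fg = [p for img, mask in support
          for row, mrow in zip(img, mask)
          for p, label in zip(row, mrow) if label == 1]
    bg = [p for img, mask in support
          for row, mrow in zip(img, mask)
          for p, label in zip(row, mrow) if label == 0]
    if not fg or not bg:
        raise ValueError("support set needs both foreground and background pixels")
    threshold = (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2
    return [[1 if p > threshold else 0 for p in row] for row in query]


def build_training_set(
    labelled: Sequence[Pair],
    unlabelled: Sequence[Image],
    segment: Callable[[Image, Sequence[Pair]], Mask],
) -> List[Pair]:
    """Combine manual labels with pseudo labels for the unlabelled images."""
    pseudo = [(img, segment(img, labelled)) for img in unlabelled]
    return list(labelled) + pseudo


# One manually labelled image and one unlabelled image (toy data).
labelled = [([[0.9, 0.1], [0.8, 0.2]], [[1, 0], [1, 0]])]
unlabelled = [[[0.7, 0.3], [0.1, 0.95]]]
training_set = build_training_set(labelled, unlabelled, toy_few_shot_segment)
```

In a real pipeline the combined set would then train a supervised segmentation model, which in turn could refine the pseudo labels; that loop is only described, not specified, in the abstract above.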


Cite this study

Zhendi Gong studied this question.

synapsesocial.com/papers/69c771dd8bbfbc51511e1e8a — DOI: https://doi.org/10.17639/7969
