Medical image segmentation is a critical task in computer-assisted diagnosis and therapy planning. Accurate and efficient fully automated segmentation methods are essential for accelerating decision-making. Compared with natural image segmentation methods, current deep learning–based segmentation models for medical imaging face challenges related to data scarcity, inconsistent annotations, and high computational cost. Consequently, this thesis aims to develop efficient and generalisable medical image segmentation models, which are fundamental to the clinical adoption and industrial deployment of this technology in healthcare settings. This thesis addresses these challenges through a series of progressively more advanced deep learning models. The foundational work, CTranS, is a baseline medical image segmentation model that combines a convolutional neural network (CNN) based encoder with a Transformer-based decoder. This initial contribution leverages the strengths of CNNs for local feature extraction and Transformers for global contextual understanding, achieving state-of-the-art (SOTA) performance. Subsequently, to enhance generalisability, MO-CTranS is introduced, which can learn from a mixture of multiple partially labelled datasets. MO-CTranS employs a CNN encoder and a Transformer decoder with task-specific tokens to handle label inconsistencies and class imbalances, tackling the challenge of training a single unified model from fragmented and heterogeneously annotated medical data. This innovation moves beyond single-task, fully annotated model training towards a more scalable and clinically realistic approach. Despite the superior performance of CTranS and MO-CTranS, the computational cost of their Transformer blocks hinders clinical application. Thus, CRFTrans is proposed to improve model efficiency.
CRFTrans is a novel recursive Vision Transformer layer designed to replace the commonly used cascaded Transformer layers. This approach represents a fundamental shift from simply stacking layers to a recursive refinement process, demonstrating that comparable performance can be achieved with significantly reduced computational complexity, which facilitates deployment in resource-constrained clinical environments. However, these methods still depend on extensive manual annotation, which remains challenging and time-consuming in clinical practice. Thus, to further enhance both efficiency and generalisability, a novel semi-supervised medical image segmentation framework, SSL-MedSAM2, is developed. SSL-MedSAM2 integrates a training-free few-shot segmentation branch for pseudo-labelling and a fully supervised learning branch for label refinement. This design allows the model to leverage a small fraction of labelled images together with a large unlabelled set, substantially reducing annotation effort. Together, these contributions result in deep learning models that are significantly more data-efficient and broadly applicable than conventional approaches. The overall impact of this work is to facilitate the reliable and scalable deployment of segmentation models in real-world clinical settings by reconciling accuracy, efficiency, and generalisability.
Zhendi Gong (Wed,) studied this question.
synapsesocial.com/papers/69c771dd8bbfbc51511e1e8a — DOI: https://doi.org/10.17639/7969
Zhendi Gong