ABSTRACT Accurate medical image segmentation is essential for disease diagnosis, treatment planning and outcome monitoring. However, current segmentation methods rely heavily on large‐scale, pixel‐level annotations, which are costly and labour‐intensive to obtain. To address this challenge, we propose TriS‐Net (Triple‐Supervision Segmentation Network), a progressive framework that integrates image‐level, bounding box‐level and pixel‐level labels into a multigranularity supervision pipeline under limited annotation settings. In the first stage, TriS‐Net uses image‐level labels to train a classification branch, enabling the network to learn discriminative features and localise potential lesion regions. In the second stage, a box‐guided mask refinement (BMR) strategy combines Soft‐NMS filtering with a one‐to‐one matching mechanism to obtain reliable candidate regions. Complete IoU (CIoU) is then used to derive image‐level quality metrics that impose quality‐aware weighted constraints on segmentation learning, thereby improving spatial localisation and structural consistency. In the third stage, a small number of pixel‐level labels provide fine‐grained supervision, further improving segmentation accuracy and boundary detail. The proposed method is validated on the BraTS 2019 and LiTS 2017 datasets, on which it outperforms several existing methods under limited annotation settings. Additional experiments on the BUSI ultrasound dataset indicate that it generalises across imaging modalities.
Zhang et al. studied this question.
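The second stage described in the abstract relies on CIoU to score how well a candidate region agrees with a box annotation. As a rough illustration only, the following sketch computes the standard CIoU between two axis-aligned boxes and turns it into a per-image weight. The `(x1, y1, x2, y2)` box format, the function names `ciou` and `quality_weight`, and the clamping of the score to `[0, 1]` are all assumptions made for this example; the abstract does not specify how TriS‐Net maps CIoU to its quality-aware weights.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU between two boxes given as (x1, y1, x2, y2).

    Standard definition: IoU - rho^2 / c^2 - alpha * v, where rho is the
    distance between box centres, c is the diagonal of the smallest
    enclosing box, and v measures aspect-ratio disagreement.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Widths and heights of both boxes.
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Plain IoU from the overlap rectangle.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centres.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return iou - rho2 / c2 - alpha * v


def quality_weight(candidate_box, annotated_box):
    """Hypothetical quality weight: CIoU clamped to [0, 1] (an assumption)."""
    return min(1.0, max(0.0, ciou(candidate_box, annotated_box)))
```

Under this assumed scheme, a candidate region whose box closely matches the annotation would contribute to the segmentation loss with a weight near 1, while a poorly aligned or disjoint candidate would be weighted down towards 0.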