Emerging deep learning architectures and large foundation models have transformed medical image segmentation, yet significant challenges remain when annotations are scarce across diverse imaging modalities and anatomical structures. Current few-shot learning approaches struggle with limited training examples and fail to leverage anatomical expertise effectively, often relying on superficial visual similarities rather than the meaningful anatomical correspondences that medical image analysis requires. We propose FUSE-RAG, a novel framework based on retrieval-augmented generation (RAG) for few-shot medical image segmentation. For a query image, FUSE-RAG retrieves an anatomically relevant support set and conditions mask generation on it. Our key innovations are: (1) an ROI-aware retrieval mechanism that injects expert anatomical knowledge into medical foundation model features, guiding them toward clinically relevant regions and aligning with core RAG principles for medical imaging; and (2) a segmentation architecture that integrates Anatomical Correspondence Blocks (ACB) with Vision Mamba SS2D for efficient long-range modeling, Support Quality Assessment Blocks (SQAB) for adaptive feature weighting, and Support-Conditioned Skip Connections (SCSC) for propagating anatomical guidance. FUSE-RAG extends prompt engineering principles from natural language processing to medical imaging, demonstrating that carefully selected, anatomically relevant examples outperform larger, randomly chosen sets, a quality-over-quantity paradigm. Comprehensive evaluation on four standard datasets (ATLAS 2.0 stroke lesion, QaTa-COVID19 pneumonia, ISIC 2018 skin lesion, and DRIVE retinal vessel) demonstrates Dice coefficient improvements of 12.69%, 10.31%, 5.60%, and 11.79%, respectively, over state-of-the-art few-shot methods, with potential for real-time deployment. Code will be made publicly available at: https://github.com/MOHAMEDLamine77/FuseRAG.
Allaoui et al. (Thu,) studied this question.
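
To make the retrieval step and the reported metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each image is represented by a set of patch features (which in practice might come from a medical foundation model), that expert anatomical knowledge is available as a non-negative ROI weight per patch, and that support images are ranked by cosine similarity. All function names (`roi_weighted_pool`, `retrieve_support_set`, `dice_coefficient`) are hypothetical. The Dice function follows the standard definition, 2|P ∩ T| / (|P| + |T|).

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def roi_weighted_pool(patch_features: Sequence[Sequence[float]],
                      roi_weights: Sequence[float]) -> Vector:
    """Pool patch features into one embedding, weighting each patch by an
    ROI prior (assumption: higher weight = more clinically relevant)."""
    total = sum(roi_weights)
    if total <= 0:
        raise ValueError("ROI weights must sum to a positive value")
    dim = len(patch_features[0])
    return [
        sum(w * f[d] for w, f in zip(roi_weights, patch_features)) / total
        for d in range(dim)
    ]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def retrieve_support_set(query_embedding: Sequence[float],
                         candidate_embeddings: Sequence[Sequence[float]],
                         k: int) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k candidates closest to the query."""
    scored = [(i, cosine_similarity(query_embedding, e))
              for i, e in enumerate(candidate_embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice score for flat binary masks; defined as 1.0 when both are empty."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 if denom == 0 else 2.0 * intersection / denom


if __name__ == "__main__":
    # Toy query with two patches; the first patch lies inside the ROI.
    query = roi_weighted_pool([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
    bank = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.3]]  # toy support embeddings
    print(retrieve_support_set(query, bank, k=2))
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this toy setting, a small `k` selects only the candidates whose ROI-weighted embeddings best match the query, which mirrors the quality-over-quantity idea described in the abstract. How the retrieved support set conditions mask generation (ACB, SQAB, SCSC) is outside the scope of this sketch.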