With increasing reliance on multi-structural analysis in ophthalmic diagnosis and treatment, accurate segmentation of ocular structures and lesions is essential for effective clinical decision-making. Multi-modal imaging, such as color fundus photography (CFP) and anterior segment optical coherence tomography (AS-OCT), provides complementary views of the posterior and anterior segments, enabling comprehensive disease assessment and personalized treatment planning. However, significant modality differences hinder the generalization ability of existing segmentation models. Although the Transformer-based Segment Anything Model (SAM) demonstrates strong zero-shot performance on natural images, it struggles with medical images exhibiting inter-modal variations. To address this, we propose SAM Fine-Grained Fine-tuning (SAM-FGF), a framework for multi-modal, multi-target ophthalmic image segmentation. SAM-FGF incorporates a Fine-Grained Fine-tuning (FGF) module that employs cross-attention mechanisms to dynamically align and contrast input images with multi-modal feature representations, thereby extracting modality-adaptive features. These refined features serve as inputs to the HQ-Decoder, improving segmentation accuracy across diverse medical imaging tasks. In addition, we incorporate Low-Rank Adaptation (LoRA) to enable efficient fine-tuning while preserving structural details. Experiments on multiple datasets demonstrate that SAM-FGF achieves superior segmentation performance across diverse ophthalmic imaging modalities.
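As a rough illustration of the Low-Rank Adaptation idea mentioned above, the following is a minimal sketch in plain Python rather than the authors' implementation. It assumes the standard LoRA formulation, in which a frozen weight matrix W is augmented by a trainable low-rank product scaled by alpha / rank. The class name `LoRALinear`, the helper functions, and all hyperparameter values are hypothetical. The abstract does not state which SAM layers are adapted or which rank is used.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


class LoRALinear:
    """Hypothetical linear layer: frozen W (d_out x d_in) plus low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight, kept frozen
        # A (rank x d_in) starts with small random values (assumed init scheme).
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B (d_out x rank) starts at zero, so the layer initially matches W exactly.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def effective_weight(self):
        """Return W + (alpha / rank) * B @ A."""
        return add(self.weight, scale(matmul(self.B, self.A), self.scaling))

    def forward(self, x):
        """Apply the adapted layer to x (batch x d_in), giving batch x d_out."""
        w_t = [list(col) for col in zip(*self.effective_weight())]
        return matmul(x, w_t)

    def num_trainable(self):
        """Count entries of A and B, the only parameters updated in fine-tuning."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

In this sketch only `A` and `B` would be updated during fine-tuning, so a d_out x d_in layer contributes rank * (d_in + d_out) trainable values instead of d_in * d_out. That reduction is what makes this style of adaptation inexpensive for a large pretrained encoder.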
Liang et al. studied this question.