Infrared small target detection (ISTD) plays a crucial role in many real-world applications. However, the task remains highly challenging because targets are extremely small, have low contrast, and are embedded in complex background interference: under a commonly used ISTD criterion, an infrared small target often occupies fewer than 80 pixels in a 256×256 image (about 0.12% of the image area). Although the Segment Anything Model (SAM) generalizes well in image segmentation, applying it directly to ISTD is suboptimal, primarily because of the significant modality gap between RGB and infrared imagery and the prohibitive cost of full-parameter fine-tuning. To address these challenges, we propose a prompt-free, parameter-efficient fine-tuning framework that adapts SAM to ISTD. To bridge the cross-modality gap while preserving SAM's pretrained prior knowledge, we introduce a lightweight Infrared Adapter (IR-Adapter) into the image encoder, enabling effective task adaptation with only a small number of trainable parameters. Furthermore, to alleviate the loss of small-target information in deep network layers, we design a Multi-Scale Feature Fusion (MSF) module that integrates hierarchical features from different encoder stages. In addition, we propose a Coarse-to-Fine Head (CFH) with dual-branch prediction that incorporates fine-grained details for more accurate target localization and segmentation. Extensive experiments on two public datasets demonstrate that the proposed method achieves better overall performance than existing representative approaches, with higher intersection over union (IoU), normalized IoU (nIoU), and probability of detection (Pd).
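To make the reported metrics concrete, the following is a minimal sketch of how IoU and nIoU are commonly computed on binary masks in the ISTD literature. It assumes that IoU pools intersection and union pixel counts over the whole dataset, whereas nIoU averages the per-image IoU; it is not the evaluation code used in this work. The function names, the toy masks, and the convention of scoring an empty union as 0 are assumptions made for illustration. Pd is omitted because it requires target-level matching of predicted and ground-truth regions, whose exact criterion is not specified here.

```python
def overlap_counts(pred_mask, true_mask):
    """Return (intersection, union) pixel counts for one pair of binary masks."""
    inter = union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return inter, union


def dataset_iou(preds, truths):
    """IoU with intersection and union pooled over all images."""
    pairs = [overlap_counts(p, t) for p, t in zip(preds, truths)]
    total_inter = sum(i for i, _ in pairs)
    total_union = sum(u for _, u in pairs)
    # Assumption: an empty union (no target, no prediction) scores 0.
    return total_inter / total_union if total_union else 0.0


def dataset_niou(preds, truths):
    """nIoU: the mean of the per-image IoU values."""
    pairs = [overlap_counts(p, t) for p, t in zip(preds, truths)]
    scores = [i / u if u else 0.0 for i, u in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): image A is predicted perfectly,
# image B misses its single-pixel target by one pixel.
truth_a = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred_a = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
truth_b = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
pred_b = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

iou_value = dataset_iou([pred_a, pred_b], [truth_a, truth_b])
niou_value = dataset_niou([pred_a, pred_b], [truth_a, truth_b])
```

In this toy case the pooled IoU is 4/6, because the larger target dominates the pixel counts, while nIoU is 0.5, because the missed one-pixel target counts as much as the larger one. This is why nIoU is usually reported alongside IoU when target sizes vary.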
Li et al. studied this question.