Object detection is essential in biomedical image analysis, particularly for identifying small yet critical entities such as abnormal cells or lung nodules under 3 mm, which are often missed by existing methods. To address this, we propose CAF-YOLO, built on the YOLOv8 architecture. Our model combines CNNs for robust local feature extraction with transformers to capture long-range dependencies. Because convolutional kernels have a limited ability to model distant spatial interactions, we developed an attention and convolution fusion module (ACFM), which enhances both global and local feature modeling. Additionally, we designed a multi-scale neural network (MSNN) to aggregate features across multiple scales, overcoming the restricted feature aggregation of the standard feed-forward network (FFN) in transformer architectures. Together, these components improve detection accuracy for complex micro-lesions. Experimental results on benchmark datasets such as BCCD and LUNA16 demonstrate the efficacy of CAF-YOLO in capturing detailed biomedical entities. Our code is available at https://github.com/xiaochen925/CAF-YOLO.
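The two ideas named in the abstract, fusing a global attention branch with a local convolution branch and replacing a plain FFN with a multi-scale one, can be illustrated with a toy NumPy sketch. This is not the authors' ACFM or MSNN implementation (see the linked repository for that). It assumes a 1D token sequence instead of a 2D feature map, single-head attention, depthwise convolutions, and simple additive fusion; every function name, shape, and kernel size below is hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Global branch: every token attends to every other token."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def depthwise_conv1d(x, kernel):
    """Local branch: per-channel, zero-padded convolution along the token axis.

    x has shape (N, C); kernel has shape (K, C) with odd K.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        out += xp[i:i + x.shape[0]] * kernel[i]
    return out

def fuse_attention_and_conv(x, wq, wk, wv, kernel):
    """Assumed fusion rule: residual sum of the global and local branches."""
    return x + self_attention(x, wq, wk, wv) + depthwise_conv1d(x, kernel)

def multi_scale_ffn(x, kernels, w_out):
    """Assumed multi-scale FFN: parallel convolutions with different
    receptive fields, ReLU, concatenation, then a linear projection."""
    branches = [np.maximum(depthwise_conv1d(x, k), 0.0) for k in kernels]
    return x + np.concatenate(branches, axis=-1) @ w_out

# Hypothetical toy sizes: 16 tokens with 8 channels each.
rng = np.random.default_rng(0)
n_tokens, channels = 16, 8
x = rng.standard_normal((n_tokens, channels))
wq, wk, wv = (rng.standard_normal((channels, channels)) * 0.1 for _ in range(3))
k3 = rng.standard_normal((3, channels)) * 0.1
k5 = rng.standard_normal((5, channels)) * 0.1
w_out = rng.standard_normal((2 * channels, channels)) * 0.1

fused = fuse_attention_and_conv(x, wq, wk, wv, k3)
out = multi_scale_ffn(fused, [k3, k5], w_out)
print(fused.shape, out.shape)  # (16, 8) (16, 8)
```

In this sketch the attention branch lets distant tokens interact in a single step, while the convolution branch only sees a three-token neighbourhood. The multi-scale block widens that neighbourhood by running kernel sizes 3 and 5 in parallel before projecting back to the original channel count.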
Chen et al. studied this question.