Fine-grained image classification aims to distinguish subclasses within a single broader category. Because inter-class differences are small and intra-class variations are large, the task remains a challenging and valuable research topic in computer vision. Existing neural-network-based algorithms tend to lose fine-grained texture details during training and cannot effectively fuse the features extracted from different convolutional layers of the backbone network. To address these issues, this paper proposes a fine-grained image classification method built on a lightweight feature extraction network with MobileNet v2 at its core, combined with multi-scale feature fusion and an attention mechanism. Because high-level features carry rich semantic information and low-level features carry rich textural information, attention mechanisms are embedded at different scales to capture more diverse feature information. Experiments on the publicly available fine-grained dataset A Large Scale Fish Dataset yield a classification accuracy of 99.86%. The results demonstrate the superiority of the proposed method in fine-grained object classification.
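The abstract does not say which attention module or fusion operator is used, so the following is only a minimal sketch of the general idea (attention applied per scale, then fusion), not the authors' implementation. It assumes a parameter-free channel gate in place of a learned attention module, and fusion by global average pooling followed by concatenation. The function names `global_avg_pool`, `channel_attention`, and `fuse_multi_scale` are hypothetical, and the toy nested-list feature maps stand in for activations taken from different backbone layers.

```python
import math


def global_avg_pool(fmap):
    """Reduce a C x H x W feature map (nested lists) to a length-C vector of channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def channel_attention(fmap):
    """Parameter-free stand-in for a learned channel-attention gate (an assumption).

    Each channel is rescaled by sigmoid(channel mean - mean over all channels).
    A trained module would learn this gate instead of computing it directly.
    """
    means = global_avg_pool(fmap)
    center = sum(means) / len(means)
    gates = [1.0 / (1.0 + math.exp(-(m - center))) for m in means]
    return [[[g * v for v in row] for row in ch] for ch, g in zip(fmap, gates)]


def fuse_multi_scale(fmaps):
    """Apply attention at each scale, pool each result, and concatenate the vectors."""
    descriptor = []
    for fmap in fmaps:
        descriptor.extend(global_avg_pool(channel_attention(fmap)))
    return descriptor


# Toy inputs: a "low-level" 2-channel 4x4 map and a "high-level" 3-channel 2x2 map.
low = [[[float(c + 1)] * 4 for _ in range(4)] for c in range(2)]
high = [[[float(c)] * 2 for _ in range(2)] for c in range(3)]

fused = fuse_multi_scale([low, high])
print(len(fused))  # one entry per channel across both scales
```

Pooling before concatenation lets maps of different spatial sizes be fused without resampling. A real pipeline might instead upsample and merge the maps spatially, which the abstract neither confirms nor rules out.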
Miao et al. studied this question.