Vehicle classification in low-resolution surveillance scenarios remains challenging because the differences between classes are subtle and clear visual cues are scarce. This study aimed to improve classification performance under these conditions. To this end, we proposed KRepIncep-AF, a convolutional neural network that combined an InceptionNeXt-Tiny backbone with RepViT modules and a KernelWarehouse block to prioritize the assimilation of spatial cues and contextual information. A compound loss function combining linear adaptive cross-entropy and focal loss was applied to address class imbalance and improve robustness. Comparative experiments were carried out on a six-class vehicle dataset of 100 × 100 pixel images. The proposed model attained an accuracy of 99.58%, with macro-average F1, precision, and recall all exceeding 99.5%, and outperformed several competitive baselines. These results demonstrated the effectiveness of the proposed architecture in constrained surveillance environments. Heatmap visualizations further showed that the model highlighted silhouette-specific features such as bumpers and trailers. Together, these observations indicated that improvements in model structure and the domain-specific application of loss functions could yield considerable gains in classification accuracy, with meaningful implications for real-world traffic surveillance.
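The idea of a compound loss that mixes a cross-entropy term with a focal term can be illustrated with a minimal NumPy sketch. The abstract does not define the linear adaptive cross-entropy, so standard cross-entropy stands in for it here. The function name `compound_loss`, the focusing parameter `gamma`, and the fixed weight `mix` are hypothetical choices made for illustration, not the formulation used in the study.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def compound_loss(logits, targets, gamma=2.0, mix=0.5):
    """Weighted sum of cross-entropy and focal loss, averaged over the batch.

    Assumptions (not taken from the paper): plain cross-entropy replaces the
    linear adaptive cross-entropy, and `mix` is a fixed scalar weight.
    """
    probs = softmax(logits)
    # Probability assigned to the true class of each sample.
    p_true = np.clip(probs[np.arange(len(targets)), targets], 1e-12, 1.0)
    ce = -np.log(p_true)
    # The focal term down-weights samples that are already classified well.
    focal = (1.0 - p_true) ** gamma * ce
    return float(np.mean(mix * ce + (1.0 - mix) * focal))


# Usage: a batch of two samples over six vehicle classes.
logits = np.array([[4.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
targets = np.array([0, 3])
print(compound_loss(logits, targets))
```

Because the focal factor shrinks toward zero for confidently correct predictions, hard or minority-class samples contribute relatively more to the total, which is the usual motivation for adding a focal term under class imbalance.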
Zhang et al. (Sun,) studied this question.