What question did this study set out to answer?

This research aims to enhance vehicle detection in remote sensing images to support sustainable urban development and intelligent transportation.

March 15, 2026 | Open Access

MCViM-YOLO: Remote Sensing Vehicle Detection for Sustainable Intelligent Transportation

Key Points

Aims to improve vehicle detection in remote sensing images in support of sustainable urban development and intelligent transportation.
Proposes the MCViM-YOLO algorithm, built on YOLOv12, whose Mix-Mamba module captures both local details and global spatial dependencies.
Uses a dual-factor calibration fusion module (DCFM) to adaptively fuse heterogeneous features.
Uses a dual-branch attention detection head (DADH) to improve predictions on difficult samples such as occluded or small-scale vehicles.
Achieves an mAP@0.5 of 92.391% on the VEBAI dataset.
Achieves a recall of 86.070%.
Keeps computational complexity at 10.41 GFLOPs.

Abstract

Vehicle detection is a core task in smart city perception management and an important technical support for sustainable urban development and intelligent transportation optimization. In high-resolution unmanned aerial vehicle (UAV) remote sensing images, it faces challenges such as variable target scales, severe occlusion, and difficulty in modeling long-range dependencies. To address these issues, this study proposes the MCViM-YOLO algorithm, which integrates the local perception advantage of convolution with the global modeling capability of the state space model (Mamba). Based on YOLOv12, the algorithm reconstructs the neck network: it introduces the Mix-Mamba module (parallel multi-scale convolution and selective state space model) to simultaneously capture local details and global spatial dependencies, adopts the dual-factor calibration fusion module (DCFM) to adaptively fuse heterogeneous features, and employs a dual-branch attention detection head (DADH) to optimize the prediction of difficult samples (e.g., occluded, small-scale vehicles). Experiments on the VEBAI dataset demonstrate that our proposed model achieves an mAP@0.5 of 92.391% and a recall rate of 86.070%, with a computational complexity of 10.41 GFLOPs. The results show that the proposed method effectively improves the accuracy and efficiency of vehicle detection in complex remote sensing scenarios, provides technical support for traffic flow monitoring, low-carbon urban planning, and other sustainable applications, and offers an innovative paradigm for the deep integration of CNN and state space models with both theoretical research value and engineering application prospects.
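To make the reported metrics concrete, the following is a minimal sketch of how recall at an IoU threshold of 0.5 can be computed for one image. It is not the authors' evaluation code. The function names `iou` and `recall_at_iou`, the `(x1, y1, x2, y2)` box format, and the greedy matching rule are assumptions chosen for illustration. mAP@0.5 uses the same 0.5 IoU matching criterion but additionally averages precision over the recall range and over classes, which this sketch does not do.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(predictions, ground_truth, threshold=0.5):
    """Fraction of ground-truth boxes matched by some prediction.

    predictions: list of (box, confidence) pairs (hypothetical format).
    Predictions are visited in descending confidence order, and each
    ground-truth box can be matched at most once.
    """
    matched = set()
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_j, best_iou = None, threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
    return len(matched) / len(ground_truth) if ground_truth else 0.0


# Toy example: three vehicles, one detected well, one poorly, one missed.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
predictions = [
    ((1, 1, 10, 10), 0.9),         # IoU 0.81 with the first vehicle: a match
    ((22, 22, 32, 32), 0.8),       # IoU about 0.47: below the 0.5 threshold
    ((100, 100, 110, 110), 0.7),   # no overlap with any vehicle
]
recall = recall_at_iou(predictions, ground_truth)
```

In this toy case only one of the three ground-truth vehicles is matched, so the recall is one third. A recall of 86.070%, as reported above, would mean that roughly 86 of every 100 annotated vehicles are matched under the evaluation protocol the authors used.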

Cite This Study

Zhang et al. studied this question.

synapsesocial.com/papers/69b5ff5c83145bc643d1bcc6 https://doi.org/10.3390/su18062836
