March 26, 2026 · Open Access

Research on Cooperative Vehicle–Infrastructure Perception Integrating Enhanced Point-Cloud Features and Spatial Attention

Key Points

Aimed to improve the robustness of vehicle–infrastructure cooperative perception in complex urban environments.
Developed a 3D object detection framework that combines point cloud feature enhancement with spatially adaptive fusion.
Integrated a Redefined Squeeze-and-Excitation Network (R-SENet) attention module into the feature encoding stage for feature enhancement.
Designed a Feature Pyramid Backbone Network (FPB-Net) for multi-scale feature extraction and cross-layer aggregation.
Introduced a Spatial Adaptive Feature Fusion (SAFF) module to address feature misalignment between vehicle-side and roadside sensors.
Conducted extensive experiments on the DAIR-V2X benchmark and a custom dataset.
Achieved Average Precision (AP) scores of 0.762 and 0.694 on the two datasets, respectively, at an IoU threshold of 0.5.
Obtained AP scores of 0.617 and 0.563 on the two datasets, respectively, at an IoU threshold of 0.7.
Maintained real-time inference performance.
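
To make the IoU thresholds in the key points concrete, the following minimal sketch shows how a single predicted box can count as a correct detection at a threshold of 0.5 but not at 0.7. It assumes axis-aligned bird's-eye-view boxes, which is a simplification: detection benchmarks normally use rotated boxes, and the paper's exact evaluation protocol is not described on this page. The function names `bev_iou` and `is_true_positive` and the box coordinates are hypothetical.

```python
def bev_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Overlap extent along each axis, clamped at zero when the boxes are disjoint.
    overlap_x = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_y = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_x * overlap_y
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold):
    """A prediction matches a ground-truth box if their IoU reaches the threshold."""
    return bev_iou(pred, gt) >= threshold


# Illustrative boxes (made-up values): a 4 m x 2 m vehicle, prediction shifted by 1 m.
gt = (0.0, 0.0, 4.0, 2.0)
pred = (1.0, 0.0, 5.0, 2.0)

print(bev_iou(pred, gt))                # intersection 6, union 10
print(is_true_positive(pred, gt, 0.5))  # accepted at the looser threshold
print(is_true_positive(pred, gt, 0.7))  # rejected at the stricter threshold
```

A stricter threshold demands tighter localization, which is consistent with the AP scores at 0.7 being lower than those at 0.5.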

Abstract

Vehicle–infrastructure cooperative perception (VICP) extends the sensing capability of single-vehicle systems by integrating multi-source information from onboard and roadside sensors, thereby alleviating limitations in sensing range and field-of-view coverage. However, in complex urban environments, the robustness of such systems, particularly in terms of blind-spot coverage and feature representation, is severely affected by both static and dynamic occlusions, as well as distance-induced sparsity in point cloud data. To address these challenges, a 3D object detection framework incorporating point cloud feature enhancement and spatially adaptive fusion is proposed. First, to mitigate feature degradation under sparse and occluded conditions, a Redefined Squeeze-and-Excitation Network (R-SENet) attention module is integrated into the feature encoding stage. This module employs a dual-dimensional squeeze-and-excitation mechanism operating across pillars and intra-pillar points, enabling adaptive recalibration of critical geometric features. In addition, a Feature Pyramid Backbone Network (FPB-Net) is designed to improve target representation across varying distances through multi-scale feature extraction and cross-layer aggregation. Second, to address feature heterogeneity and spatial misalignment between heterogeneous sensing agents, a Spatial Adaptive Feature Fusion (SAFF) module is introduced. By explicitly encoding the origin of features and leveraging spatial attention mechanisms, the SAFF module enables dynamic weighting and complementary fusion between fine-grained vehicle-side features and globally informative roadside semantics. Extensive experiments conducted on the DAIR-V2X benchmark and a custom dataset demonstrate that the proposed approach outperforms several state-of-the-art methods. Specifically, Average Precision (AP) scores of 0.762 and 0.694 are achieved at an IoU threshold of 0.5, while AP scores of 0.617 and 0.563 are obtained at an IoU threshold of 0.7 on the two datasets, respectively. Furthermore, the proposed framework maintains real-time inference performance, highlighting its effectiveness and practical potential for real-world deployment.
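
The two attention ideas named in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' R-SENet or SAFF implementation, whose details are not given on this page. It only shows, under stated assumptions, (1) squeeze-and-excitation gating applied along both the pillar axis and the intra-pillar point axis, and (2) a per-cell softmax weighting of vehicle-side and roadside feature maps after adding a source embedding. All function names, tensor shapes, weights, and the random inputs are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_gate(descriptor, w1, w2):
    """Excitation step: bottleneck MLP (ReLU, then sigmoid) producing gates in (0, 1)."""
    hidden = np.maximum(descriptor @ w1, 0.0)
    return sigmoid(hidden @ w2)


def dual_se(features, point_w, pillar_w):
    """features: (P, N, C) = pillars x points per pillar x channels (assumed layout)."""
    # Point branch: squeeze over channels, then gate each point inside its pillar.
    point_gate = se_gate(features.mean(axis=2), *point_w)                 # (P, N)
    # Pillar branch: squeeze over points and channels, then gate each pillar.
    pillar_gate = se_gate(features.mean(axis=(1, 2))[None, :], *pillar_w)[0]  # (P,)
    return features * point_gate[:, :, None] * pillar_gate[:, None, None]


def spatial_fuse(veh, road, src_emb, score_w):
    """veh, road: (H, W, C) feature maps assumed to be aligned on the same grid."""
    # Tag each map with its origin, then score every cell of every source.
    stacked = np.stack([veh + src_emb[0], road + src_emb[1]])  # (2, H, W, C)
    scores = stacked @ score_w                                  # (2, H, W)
    scores = scores - scores.max(axis=0, keepdims=True)         # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    fused = weights[0][..., None] * veh + weights[1][..., None] * road
    return fused, weights


rng = np.random.default_rng(0)
P, N, C, r = 6, 4, 8, 2  # toy sizes; r is the bottleneck reduction ratio
features = rng.normal(size=(P, N, C))
point_w = (rng.normal(size=(N, N // r)), rng.normal(size=(N // r, N)))
pillar_w = (rng.normal(size=(P, P // r)), rng.normal(size=(P // r, P)))
enhanced = dual_se(features, point_w, pillar_w)

H, W = 5, 5
veh, road = rng.normal(size=(H, W, C)), rng.normal(size=(H, W, C))
fused, weights = spatial_fuse(veh, road, rng.normal(size=(2, C)), rng.normal(size=C))
print(enhanced.shape, fused.shape)
```

In this sketch the gates only rescale existing features, so sparse or occluded pillars can be down-weighted without changing tensor shapes, and the fusion weights sum to one in every grid cell, so each location draws on whichever source scores higher there. In a trained model the weights would be learned rather than random.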

Authors

Shiyang Yan

Queen's University Belfast

Yanfeng Wu

Zhennan Liu

Guizhou Institute of Technology

Journals

World Electric Vehicle Journal

Institutions

Henan University of Science and Technology

Yutong (China)
