What question did this study set out to answer?

The research aims to develop an effective framework for 3D multi-object tracking using radar-camera fusion without requiring complex data annotation.

March 29, 2026 | Open Access

Robust 3D Multi-Object Tracking via 4D mmWave Radar-Camera Fusion and Disparity-Domain Depth Recovery

Key Points

Introduced Gaussian distribution-based adaptive angle compression for noise suppression in radar measurements.
Employed IMU-based velocity compensation and an improved DBSCAN clustering scheme for radar detections.
Proposed a disparity-domain depth recovery method using sparse radar points as anchors.
Implemented Kalman filtering for temporal smoothing of depth recovery.
Designed a hierarchical fusion strategy for effective detection and tracking.
Achieved an overall MOTA (Multiple Object Tracking Accuracy) of 77.93%.
Outperformed single-modality baselines and other comparison methods by 11 to 31 percentage points.
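
For readers unfamiliar with the metric, MOTA is conventionally computed from the CLEAR MOT error counts. The sketch below assumes that standard definition; the page does not state which variant or matching threshold the authors used, and the function and argument names are illustrative only.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence.
    Assumption: this is the conventional definition, not necessarily
    the exact evaluation protocol used in the study.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt
```

Because the error counts can exceed the number of ground-truth objects, MOTA can be negative; a value of 77.93% means the summed errors amount to roughly 22% of the ground-truth object instances.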

Abstract

4D millimeter-wave radar provides high-precision ranging capability and exhibits strong robustness under adverse weather and low-visibility conditions, but its point clouds are relatively sparse and suffer from severe elevation-angle measurement noise. Monocular cameras, by contrast, provide rich semantic information and high recall, yet are fundamentally limited by scale ambiguity. To exploit the complementary characteristics of these two sensors, this paper proposes a radar-camera fusion 3D multi-object tracking framework that does not rely on complex 3D annotated data. First, on the radar signal-processing side, a Gaussian distribution-based adaptive angle compression method and IMU-based velocity compensation are introduced to effectively suppress measurement noise, and an improved DBSCAN clustering scheme with recursive cluster splitting and historical static-box guidance is employed to generate high-quality radar detections. Second, a disparity-domain metric depth recovery method is proposed. This method uses filtered radar points as sparse metric anchors, performs robust fitting with RANSAC, and applies Kalman filtering for temporal smoothing, thereby converting the relative depth output of the visual foundation model Depth Anything V2 into metric depth. Finally, a hierarchical fusion strategy is designed at both the detection and tracking levels to achieve stable cross-modal state association. Experimental results on a self-collected dataset show that the proposed method achieves an overall MOTA of 77.93%, outperforming single-modality baselines and other comparison methods by 11 to 31 percentage points. This study provides an effective solution for low-cost and robust environment perception in complex dynamic scenarios.
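
The disparity-domain recovery step can be illustrated with a minimal sketch. It assumes, since the abstract does not give the exact model, that the relative output d of the depth network is related to metric depth z by an affine map in inverse depth, 1/z = s * d + t, that radar points supply z at a few pixels, and that the fitted pair (s, t) is smoothed over time with a constant-state Kalman filter. All function names, the noise parameters q and r, and the inlier tolerance are hypothetical.

```python
import numpy as np

def fit_scale_shift_ransac(rel_disp, radar_depth, n_iters=200,
                           inlier_tol=1e-3, seed=None):
    """Robustly fit 1/z = s * d + t from sparse radar anchors.

    rel_disp    : relative disparity sampled at the radar pixels
    radar_depth : metric radar range at those pixels (metres, > 0)
    Returns (s, t, inlier_mask).
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(rel_disp, dtype=float)
    y = 1.0 / np.asarray(radar_depth, dtype=float)  # metric disparity
    n = d.size
    if n < 2:
        raise ValueError("need at least two radar anchors")
    best = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(n, size=2, replace=False)
        if np.isclose(d[i], d[j]):
            continue  # degenerate pair, the line is undefined
        s = (y[i] - y[j]) / (d[i] - d[j])
        t = y[i] - s * d[i]
        mask = np.abs(s * d + t - y) < inlier_tol
        if mask.sum() > best.sum():
            best = mask
    if best.sum() < 2:
        raise RuntimeError("RANSAC found no consensus set")
    # Refine with least squares on the consensus set.
    A = np.stack([d[best], np.ones(best.sum())], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y[best], rcond=None)
    return float(s), float(t), best

def to_metric_depth(rel_disp_map, s, t, eps=1e-6):
    """Convert a relative disparity map to metric depth in metres."""
    disp = s * np.asarray(rel_disp_map, dtype=float) + t
    return 1.0 / np.maximum(disp, eps)  # clamp to avoid division by zero

class ScaleShiftKalman:
    """Kalman filter over x = [s, t] with identity dynamics."""
    def __init__(self, q=1e-6, r=1e-4):
        self.x = None
        self.P = np.eye(2)
        self.Q = q * np.eye(2)  # process noise (assumed)
        self.R = r * np.eye(2)  # measurement noise (assumed)

    def update(self, z):
        z = np.asarray(z, dtype=float)
        if self.x is None:  # first frame initialises the state
            self.x = z.copy()
            return self.x
        P_pred = self.P + self.Q
        K = P_pred @ np.linalg.inv(P_pred + self.R)
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(2) - K) @ P_pred
        return self.x
```

Per frame, one would fit (s, t) from the current radar anchors, pass the pair through `ScaleShiftKalman.update`, and apply `to_metric_depth` to the full relative map with the smoothed values. Fitting in the disparity domain keeps the model linear, which is what makes a two-point RANSAC sample sufficient.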
