What question did this study set out to answer?

This work aims to enhance multi-target 3D object detection through an improved encoding technique using LiDAR data.

January 22, 2026 | Open Access

Point cloud-based multi-target 3D object detection using LiDAR sensor and deep learning

Key Points

Developed a point cloud-based multi-target detection framework using LiDAR sensors.
Implemented a unified classification head for improved training stability.
Combined a voxel-based encoder with a convolutional backbone and feature pyramid network for feature extraction.
Evaluated the model's performance against baseline methods on the KITTI dataset.
Achieved state-of-the-art performance on the KITTI dataset, surpassing PointPillars and VoxelNet.
Attained AP40 scores of 92.84%, 88.63%, and 85.68% for the car class at the Easy, Moderate, and Hard levels in bird’s-eye view (BEV) detection.
Attained AP40 scores of 87.84%, 76.65%, and 73.60% for the car class at the Easy, Moderate, and Hard levels in 3D detection.
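
The abstract reports these scores as AP40, which on KITTI is generally understood as interpolated Average Precision sampled at 40 recall positions (1/40, 2/40, ..., 1). The sketch below shows how such a score could be computed from a precision-recall curve. The function name and inputs are hypothetical; this is not the authors' evaluation code, and the official KITTI toolkit additionally handles box matching and difficulty filtering.

```python
import numpy as np

def average_precision_r40(recalls, precisions):
    """Interpolated AP over the 40 recall positions 1/40, 2/40, ..., 1.

    recalls, precisions: matching points on a precision-recall curve,
    both given as fractions in [0, 1] (hypothetical inputs).
    """
    recalls = np.asarray(recalls, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    thresholds = np.linspace(1.0 / 40, 1.0, 40)

    total = 0.0
    for t in thresholds:
        # Interpolated precision: best precision at any recall >= t.
        # The small tolerance guards against floating-point rounding.
        reachable = recalls >= t - 1e-9
        total += precisions[reachable].max() if reachable.any() else 0.0
    return total / 40.0
```

The function returns a fraction, so a reported value such as 92.84% would correspond to a return value of about 0.9284.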

Abstract

Point cloud-based three-dimensional (3D) object detection is a critical task in autonomous driving, robotics, and augmented reality, where accurate localization and classification of objects are essential under diverse and challenging scenarios. This work introduces a point cloud-based multi-target 3D object detection framework using Light Detection and Ranging (LiDAR) sensors. The key contribution lies in improving the encoding technique to preserve spatial information and adapt to varying point densities, enabling efficient processing of raw LiDAR data for robust 3D detection. Specifically, we design an efficient network architecture that incorporates a single unified classification head to jointly handle positive and negative samples, simplifying the design and improving training stability. Our model detects all classes (car, pedestrian, cyclist) simultaneously within a single network, enhancing computational efficiency while avoiding separate networks for each class. Furthermore, the proposed feature extraction pipeline combines a voxel-based encoder with a sparsely embedded convolutional backbone and a feature pyramid network, facilitating multi-scale feature representation and effective detection of objects at different scales. This encoding method facilitates the efficient processing of raw LiDAR data, enabling accurate object detection across diverse scenarios. As a result, our model achieves state-of-the-art performance on the KITTI dataset, surpassing baseline methods such as PointPillars and VoxelNet by delivering superior Average Precision (AP) across all difficulty levels for the car class in both bird’s-eye view (BEV) and 3D object detection tasks. Specifically, it achieves AP40 scores of 92.84%, 88.63%, and 85.68% in BEV detection and 87.84%, 76.65%, and 73.60% in 3D detection for Easy, Moderate, and Hard levels, respectively. These results highlight the effectiveness of our encoding technique in enhancing model efficiency and detection accuracy across diverse scenarios.
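
To make the idea of a voxel-based encoder concrete, here is a minimal sketch of one common approach: points are binned into a regular 3D grid and each occupied voxel is summarized by the mean of its points, so dense and sparse regions yield features of the same size. This is an illustration only and not the encoding proposed in the paper, whose details are not given in the abstract. The function name, the `voxel_size` value, and the `pc_range` bounds are assumptions chosen for the example.

```python
import numpy as np

def voxelize_mean(points,
                  voxel_size=(0.2, 0.2, 0.2),                   # assumed, metres
                  pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0)):  # assumed bounds
    """Group LiDAR points into voxels and average the points in each voxel.

    points: (N, C) array whose first three columns are x, y, z; any extra
            columns (e.g. reflectance) are averaged as well.
    Returns (coords, features, counts):
      coords   (V, 3) integer voxel indices along x, y, z
      features (V, C) mean of the points inside each voxel
      counts   (V,)   number of points per voxel
    """
    points = np.asarray(points, dtype=np.float64)
    lower = np.array(pc_range[:3])
    upper = np.array(pc_range[3:])
    size = np.array(voxel_size)

    # Discard points outside the region of interest.
    xyz = points[:, :3]
    inside = np.all((xyz >= lower) & (xyz < upper), axis=1)
    points = points[inside]

    # Integer voxel index of every remaining point.
    indices = np.floor((points[:, :3] - lower) / size).astype(np.int64)

    # One row per occupied voxel, plus a point-to-voxel mapping.
    coords, inverse, counts = np.unique(
        indices, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # flatten for consistency across NumPy versions

    # Sum the points of each voxel, then divide by the count to get the mean.
    features = np.zeros((len(coords), points.shape[1]))
    np.add.at(features, inverse, points)
    features /= counts[:, None]
    return coords, features, counts
```

In a full detector, the `coords` and `features` arrays would typically be passed to a sparse convolutional backbone; that stage is outside the scope of this sketch.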

Cite This Study

Soumya et al. (2026) studied this question.

synapsesocial.com/papers/6971be8d642b1836717e3399 https://doi.org/10.1007/s12652-026-05037-y