LiDAR-based human motion capture holds great promise for large-scale, unconstrained environments. However, existing approaches often rely on clean, pre-segmented point clouds and struggle with noisy or dynamic scenes, which limits their practical applicability. We propose OptimalCap, a robust and efficient LiDAR-based framework that integrates hierarchical skeletal modeling with kinematic-aware temporal optimization to enable accurate, temporally coherent, real-time multi-human motion capture. To support training and evaluation under realistic disturbances, we also introduce NoiseMotion, a large-scale synthetic dataset that simulates human-object interactions in noisy environments. Extensive experiments on public and synthetic benchmarks demonstrate that OptimalCap achieves state-of-the-art accuracy, robustness, and temporal consistency while supporting more than 20 individuals at 60 FPS and at ranges of up to 100 meters, setting a new standard for scalable, real-world LiDAR-based motion capture.
Ren et al. (Thu,) studied this question.