What type of study is this?

This is a quantitative study.

October 19, 2025 · Open Access

Non-differentiable Reward Optimization for Diffusion-based Autonomous Motion Planning

Key Points

Optimizing non-differentiable rewards, such as collision avoidance and goal-reaching, yields safer and more effective motion planning for autonomous robots.
The proposed reward-weighted dynamic thresholding algorithm shapes a dense reward signal, making training more effective and outperforming models trained with differentiable objectives.
Diffusion models capture complex agent interactions, a fundamental aspect of decision-making in dynamic environments.
State-of-the-art performance on pedestrian datasets (CrowdNav, ETH-UCY) against various baselines demonstrates the versatility of the approach for safe and effective motion planning.

Abstract

Safe and effective motion planning is crucial for autonomous robots. Diffusion models excel at capturing complex agent interactions, a fundamental aspect of decision-making in dynamic environments. Recent studies have successfully applied diffusion models to motion planning, demonstrating their competence in handling complex scenarios and accurately predicting multi-modal future trajectories. Despite their effectiveness, diffusion models have limitations in training objectives, as they approximate data distributions rather than explicitly capturing the underlying decision-making dynamics. However, the crux of motion planning lies in non-differentiable downstream objectives, such as safety (collision avoidance) and effectiveness (goal-reaching), which conventional learning algorithms cannot directly optimize. In this paper, we propose a reinforcement learning-based training scheme for diffusion motion planning models, enabling them to effectively learn non-differentiable objectives that explicitly measure safety and effectiveness. Specifically, we introduce a reward-weighted dynamic thresholding algorithm to shape a dense reward signal, facilitating more effective training and outperforming models trained with differentiable objectives. State-of-the-art performance on pedestrian datasets (CrowdNav, ETH-UCY) compared to various baselines demonstrates the versatility of our approach for safe and effective motion planning.
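
To make the idea of a non-differentiable objective concrete, here is a minimal Python sketch, not the paper's implementation. The abstract does not specify how the reward or the dynamic threshold is computed, so everything below is an assumption made for illustration: the function names trajectory_reward and reward_weights, the collision and goal radii, the use of a batch quantile as the moving threshold, and the exponential weighting. The point is only that hard collision and goal indicators give no useful gradient with respect to the trajectory, so they can score sampled plans and weight a training loss instead of being differentiated directly.

```python
import numpy as np


def trajectory_reward(ego, others, goal, collision_radius=0.5, goal_radius=0.3):
    """Score one planned trajectory with hard (non-differentiable) indicators.

    ego:    (T, 2) planned robot positions
    others: (N, T, 2) positions of N other agents over the same T steps
    goal:   (2,) goal position
    The radii are illustrative values, not taken from the paper.
    """
    # Distance from the robot to every other agent at every step: shape (N, T)
    dists = np.linalg.norm(others - ego[None, :, :], axis=-1)
    collided = bool((dists < collision_radius).any())
    reached = bool(np.linalg.norm(ego[-1] - goal) < goal_radius)
    # +1 for reaching the goal, -1 for any collision along the way
    return float(reached) - float(collided)


def reward_weights(rewards, quantile=0.5):
    """Hypothetical weighting: keep samples at or above a batch quantile.

    The threshold moves with the reward distribution of each batch; kept
    samples get exponential reward weights, normalized to sum to 1.
    """
    rewards = np.asarray(rewards, dtype=float)
    threshold = np.quantile(rewards, quantile)
    keep = rewards >= threshold
    weights = np.where(keep, np.exp(rewards), 0.0)
    return weights / weights.sum()
```

In a training loop of this kind, one would sample several trajectories from the diffusion planner, score each with trajectory_reward, and multiply each sample's denoising loss by its entry from reward_weights. Whether the paper's algorithm follows this pattern is not stated in the abstract.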
