This study proposes TNAP-DDQN, a deep reinforcement learning method for urban low-altitude UAV path planning under residential noise threshold constraints. Time cost and safety risk serve as the optimization objectives, and operational constraints such as collision risk and maximum altitude above ground level (AGL) are incorporated to jointly optimize noise compliance, operational safety, and efficiency. To mitigate the action-space contraction and training instability induced by multiple constraints, a Noise-Degradation-Mask-based Action Bias Network (NDM-ABN) is introduced at the action selection layer: a three-tier degradation scheme prevents empty candidate sets, and a bias-based decision rule resolves approximately tied actions to stabilize the policy. In addition, multi-step prioritized experience replay (PER) improves sample efficiency and long-horizon return modeling, and potential-based reward shaping (PBRS) converts sparse constraint signals into auxiliary rewards. Simulation results indicate that (1) NDM-ABN is the key module for stabilizing the noise-exposure process, which it does by suppressing high-noise actions; (2) the required AGL altitude varies with the UAV source noise level and local noise limits, implying the need for differentiated AGL altitude classes; and (3) the maximum admissible UAV source noise level increases as the threshold is relaxed. The proposed method provides quantitative guidance for noise-entry and AGL altitude regulation. Future work will incorporate additional metrics (e.g., A-weighted equivalent sound level) to better capture noise fluctuations and short-term peaks.
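As a rough illustration of the potential-based reward shaping (PBRS) step, the sketch below adds the standard shaping term F(s, s') = γΦ(s') − Φ(s) to an environment reward. It is not the formulation used in TNAP-DDQN: the potential function (negative Euclidean distance to the goal), the discount value, and the names `potential` and `shaped_reward` are all assumptions made for this example. The terminal potential is set to zero, a common convention in episodic tasks.

```python
import math

GAMMA = 0.99  # discount factor; an assumed value, not taken from the study


def potential(pos, goal):
    """Hypothetical potential Phi(s): negative Euclidean distance to the goal."""
    return -math.dist(pos, goal)


def shaped_reward(reward, pos, next_pos, goal, done, gamma=GAMMA):
    """Return reward + F, where F = gamma * Phi(s') - Phi(s).

    The potential of a terminal state is taken to be zero (assumed convention).
    """
    phi_next = 0.0 if done else potential(next_pos, goal)
    return reward + gamma * phi_next - potential(pos, goal)


# Example: a step that moves the UAV closer to the goal earns a positive bonus.
goal = (3.0, 4.0, 0.0)
bonus = shaped_reward(0.0, (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), goal, done=False)
```

Because the shaping term is a difference of potentials, a sparse signal is turned into a dense auxiliary reward while the underlying reward is left unchanged. A noise-aware potential (for example, one based on the margin to the local noise limit) could be substituted for the distance-based one assumed here.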
Chen et al. (Mon,) studied this question.