Real-time path planning for Unmanned Surface Vehicles (USVs) in complex marine environments remains challenging because of unstructured surroundings, ocean current disturbances, and dynamic obstacles. This paper proposes an improved Hybrid Safety and Reward-Sensitive Twin Delayed Deep Deterministic Policy Gradient (HRSTD3) algorithm and builds a high-fidelity simulation environment from GEBCO bathymetric data and CMEMS ocean current data. Path planning is formulated as a Markov Decision Process (MDP) whose state space combines multi-beam radar perception, ocean current disturbances, and relative goal information, and whose action space outputs continuous thrust and rudder commands subject to vehicle dynamics constraints. The framework integrates a risk-aware hybrid safety decision architecture, a Trajectory Predictor Network (TPN), a Curvature-driven Advantage-based Prioritized Experience Replay (CDA-PER) mechanism, and an uncertainty-aware conservative Q-learning strategy to improve navigation safety, sample efficiency, and policy stability. Comprehensive simulations show that, compared with baseline deep reinforcement learning methods, the proposed approach achieves faster convergence, greater stability, and competitive path efficiency while consistently maintaining sufficient obstacle clearance and millisecond-level inference latency, validating its effectiveness and practical feasibility for safe USV navigation in realistic dynamic marine environments.
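For reference, the critic target of standard TD3, which HRSTD3 extends, combines target policy smoothing with clipped double-Q learning. The sketch below shows only that vanilla TD3 target, not the paper's HRSTD3 update: the risk-aware safety layer, TPN, CDA-PER, and the uncertainty-aware conservative term are not modeled. The two-dimensional action `[thrust, rudder]`, the toy actor and critic callables, and the hyperparameter defaults (common TD3 choices) are illustrative assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical 2-D action [thrust, rudder], normalized to [-max_action, max_action].
Action = List[float]


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(
    reward: float,
    done: float,
    next_state: Sequence[float],
    target_actor: Callable[[Sequence[float]], Action],
    target_critic1: Callable[[Sequence[float], Action], float],
    target_critic2: Callable[[Sequence[float], Action], float],
    gamma: float = 0.99,
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
    max_action: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Vanilla TD3 critic target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    rng = rng or random.Random()
    smoothed: Action = []
    for a in target_actor(next_state):
        # Target policy smoothing: add clipped Gaussian noise to each action dimension.
        eps = rng.gauss(0.0, policy_noise) if policy_noise > 0 else 0.0
        eps = clip(eps, -noise_clip, noise_clip)
        # Keep the perturbed action inside the actuator bounds.
        smoothed.append(clip(a + eps, -max_action, max_action))
    # Clipped double-Q: use the smaller target critic estimate to curb overestimation.
    q_min = min(target_critic1(next_state, smoothed),
                target_critic2(next_state, smoothed))
    return reward + gamma * (1.0 - done) * q_min


if __name__ == "__main__":
    # Toy example with smoothing noise disabled so the result is deterministic.
    y = td3_target(
        reward=1.0, done=0.0, next_state=[0.0],
        target_actor=lambda s: [0.5, -0.25],
        target_critic1=lambda s, a: 2.0 + sum(a),
        target_critic2=lambda s, a: 1.0 + sum(a),
        gamma=0.5, policy_noise=0.0,
    )
    print(y)  # 1.625
```

Taking the minimum of two critics is what makes the target conservative in vanilla TD3; the paper's uncertainty-aware conservative Q-learning strategy is described as going further, and its exact form is not reproduced here.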
Zhang et al. studied this question.