In spatially constrained water domains, surface sensing agents (SSAs) must plan safe paths despite uncertain currents and sensor noise. We present a decentralized motion planning and collision-avoidance framework based on distributional reinforcement learning (DRL) that models the full return distribution to enable risk-aware decision making. Each SSA autonomously proceeds to its designated coordinates without rigid spatial constraints, coordinating implicitly through learned policies and a lightweight safety shield that enforces separation and kinematic limits. The method integrates (i) distributional value estimation for controllable risk sensitivity near hazards, (ii) domain randomization of sea states and disturbances for robustness, and (iii) a shielded action layer compatible with standard reactive rules (e.g., velocity-obstacle-style constraints) to guarantee feasible maneuvers. In simulations across cluttered maps and stochastic current fields, the proposed approach improves success rates and reduces near-miss events compared to non-distributional RL and classical planners, while maintaining competitive path length and computation time. The results indicate that DRL-based SSA navigation is a practical path toward safe, efficient environmental monitoring and surveying.
Dou et al. (Mon,) studied this question.
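To make the idea of risk-sensitive action selection over a return distribution concrete, the following is a minimal sketch and not the authors' implementation. It assumes the return distribution of each candidate action is represented by equally weighted quantile estimates, that risk sensitivity is expressed as a lower-tail CVaR level `alpha`, and that the safety shield can be reduced to a boolean mask of admissible actions. The abstract specifies none of these details, and the names `cvar`, `select_action`, and `safe_mask` are hypothetical.

```python
import math
from typing import Optional, Sequence


def cvar(quantiles: Sequence[float], alpha: float) -> float:
    """Lower-tail CVaR of a return distribution given as equally weighted quantiles.

    alpha in (0, 1]: alpha = 1 gives the mean (risk-neutral); smaller values
    average only the worst outcomes (more risk-averse). Assumed risk measure.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(quantiles)
    # Number of worst-case quantiles to average; at least one.
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def select_action(
    per_action_quantiles: Sequence[Sequence[float]],
    alpha: float,
    safe_mask: Optional[Sequence[bool]] = None,
) -> int:
    """Pick the admissible action with the highest CVaR score.

    safe_mask is a hypothetical stand-in for the shield: False entries are
    actions that would violate separation or kinematic limits.
    """
    best_idx: Optional[int] = None
    best_score = -math.inf
    for idx, quantiles in enumerate(per_action_quantiles):
        if safe_mask is not None and not safe_mask[idx]:
            continue  # shield rejects this action
        score = cvar(quantiles, alpha)
        if best_idx is None or score > best_score:
            best_idx, best_score = idx, score
    if best_idx is None:
        raise ValueError("no admissible action")
    return best_idx


# Action 0 has the higher mean but a heavy lower tail; action 1 is steadier.
example_quantiles = [[-10.0, 4.0, 5.0, 6.0], [0.0, 1.0, 1.0, 2.0]]
risk_neutral_choice = select_action(example_quantiles, alpha=1.0)
risk_averse_choice = select_action(example_quantiles, alpha=0.25)
```

In this toy example the risk-neutral setting prefers the action with the higher mean return, while the risk-averse setting switches to the action with the milder worst case. This is the kind of controllable behavior near hazards that a distributional value estimate makes possible.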