August 16, 2025 | Open Access

Autonomous Vehicle Maneuvering Using Vision–LLM Models for Marine Surface Vehicles

Key Points

Adding path planning raised the success rate from 23% to 73% across 30 experiments.
Path planning also increased the average travel distance from 39 m to 45 m and extended task completion time from 483 s to 672 s.
The multimodal vision–LLM system was evaluated in a simulated marine environment (ROS2–Gazebo), where zero-shot prompting provided real-time adaptability.
The gain in reliability comes at the cost of efficiency, so algorithm design has to balance the two.
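
To make the size of the trade-off concrete, the reported figures can be turned into relative changes. The short sketch below uses only the aggregate numbers quoted in the key points; the dictionary keys and the helper name `relative_change` are illustrative and do not come from the paper.

```python
# Aggregate figures quoted in the key points (without vs. with path planning).
# Success rate is kept in whole percent to avoid floating-point noise.
baseline = {"success_pct": 23, "distance_m": 39, "time_s": 483}
with_planning = {"success_pct": 73, "distance_m": 45, "time_s": 672}


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# Success rate is best compared in percentage points, not as a ratio.
success_gain_points = with_planning["success_pct"] - baseline["success_pct"]

# Distance and time are compared as relative increases over the baseline.
distance_increase = relative_change(baseline["distance_m"], with_planning["distance_m"])
time_increase = relative_change(baseline["time_s"], with_planning["time_s"])

print(f"Success rate: +{success_gain_points} percentage points")
print(f"Average distance: +{distance_increase:.1%}")
print(f"Completion time: +{time_increase:.1%}")
```

Read this way, path planning adds 50 percentage points of success rate for roughly 15% more distance and 39% more time per task.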

Abstract

Recent advances in vision–language models (VLMs) have transformed the field of robotics. Researchers are combining the reasoning capabilities of large language models (LLMs) with the visual information processing capabilities of VLMs in various domains. However, most efforts have focused on terrestrial robots, and their applicability is limited in volatile environments such as the ocean surface and underwater, where real-time judgment is required. We propose a system that integrates the cognition, decision making, path planning, and control of autonomous marine surface vehicles in the ROS2–Gazebo simulation environment, using a multimodal vision–LLM system with zero-shot prompting for real-time adaptability. Experiments were conducted to verify the effectiveness of the proposed system and to evaluate its performance with and without a path-planning step. In 30 experiments, adding the path-planning mode increased the success rate from 23% to 73%, while the average distance traveled increased from 39 m to 45 m and the time required to complete the task increased from 483 s to 672 s. The final algorithm with the path-planning sub-process therefore yields a higher success rate at the cost of a longer average path and completion time, a trade-off between improved reliability and reduced efficiency. We achieve real-time environmental adaptability and performance improvement through prompt engineering and the addition of a path-planning sub-process within a limited structure in which the LLM state is initialized with every application programming interface call (zero-shot prompting). Additionally, the developed system is independent of the vision–LLM archetype, making it scalable and adaptable to future models.
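
The abstract describes two structural properties: the LLM state is re-initialized on every API call, and the system does not depend on a particular vision–LLM. The sketch below illustrates one way such a stateless, backend-agnostic decision step could look. It is an assumption-laden illustration, not the authors' implementation: the action set, prompt wording, and every name (`VisionLLM`, `build_prompt`, `decide`, `StubModel`) are hypothetical, and the stub stands in for a real hosted model.

```python
from typing import Optional, Protocol


class VisionLLM(Protocol):
    """Hypothetical minimal interface; any backend with this method can be swapped in."""

    def query(self, prompt: str, image: bytes) -> str: ...


# Assumed action vocabulary and instruction text, chosen only for illustration.
VALID_ACTIONS = {"FORWARD", "LEFT", "RIGHT", "STOP"}
INSTRUCTIONS = "You steer a surface vehicle. Reply with one of: FORWARD, LEFT, RIGHT, STOP."


def build_prompt(goal: str, waypoint: Optional[str]) -> str:
    # No chat history is kept, so the full context is restated on every call.
    lines = [INSTRUCTIONS, f"Goal: {goal}"]
    if waypoint is not None:
        # Optional hint from a separate path-planning step.
        lines.append(f"Next waypoint from planner: {waypoint}")
    return "\n".join(lines)


def decide(model: VisionLLM, image: bytes, goal: str, waypoint: Optional[str] = None) -> str:
    """Run one stateless perceive-and-decide step; fall back to STOP on an unusable reply."""
    reply = model.query(build_prompt(goal, waypoint), image).strip().upper()
    return reply if reply in VALID_ACTIONS else "STOP"


class StubModel:
    """Stand-in backend for local testing; a real one would call a vision-LLM API."""

    def __init__(self) -> None:
        self.prompts: list[str] = []

    def query(self, prompt: str, image: bytes) -> str:
        self.prompts.append(prompt)
        return "left" if "waypoint" in prompt else "forward"


model = StubModel()
first = decide(model, b"", "reach the red buoy")
second = decide(model, b"", "reach the red buoy", waypoint="port side of buoy")
print(first, second)
```

Because each call carries its own complete prompt, nothing from the second call leaks into the first, and replacing `StubModel` with another backend requires no change to `decide`.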


Authors

Taeyeon Kim

Yonsei University

Woen-Sug Choi

Korea Maritime and Ocean University

Journals

Journal of Marine Science and Engineering

Institutions

Korea Maritime and Ocean University
