What question did this study set out to answer?

The study aims to improve decision-making in autonomous driving through a semantic-aligned multimodal framework.

January 23, 2026 | Open Access

Semantic-Aligned Multimodal Vision–Language Framework for Autonomous Driving Decision-Making

Key Points

The proposed SemAlign-E2E framework integrates visual, LiDAR, and task-oriented textual inputs.
It uses cross-modal attention for scene understanding and high-level driving command generation.
Evaluations were conducted on the nuScenes dataset and the CARLA simulation platform.
The framework improved driving stability and safety.
It demonstrated multi-task generalization and semantic comprehension.
It outperformed state-of-the-art baselines, with notably better behavioral consistency in complex traffic scenarios.

Abstract

Recent advances in Large Vision–Language Models (LVLMs) have demonstrated strong cross-modal reasoning capabilities, offering new opportunities for decision-making in autonomous driving. However, existing end-to-end approaches still suffer from limited semantic consistency, weak task controllability, and insufficient interpretability. To address these challenges, we propose SemAlign-E2E (Semantic-Aligned End-to-End), a semantic-aligned multimodal LVLM framework that unifies visual, LiDAR, and task-oriented textual inputs through cross-modal attention. This design enables end-to-end reasoning from scene understanding to high-level driving command generation. Beyond producing structured control instructions, the framework also provides natural-language explanations to enhance interpretability. We conduct extensive evaluations on the nuScenes dataset and CARLA simulation platform. Experimental results show that SemAlign-E2E achieves substantial improvements in driving stability, safety, multi-task generalization, and semantic comprehension, consistently outperforming state-of-the-art baselines. Notably, the framework exhibits superior behavioral consistency and risk-aware decision-making in complex traffic scenarios. These findings highlight the potential of LVLM-driven semantic reasoning for autonomous driving and provide a scalable pathway toward future semantic-enhanced end-to-end driving systems.
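The abstract does not spell out how the cross-modal attention is built, so the following is only a generic illustration of the idea, not the authors' implementation. It is a minimal sketch of scaled dot-product attention in which a task-instruction token attends over a combined set of visual and LiDAR tokens. The function names, the toy feature vectors, and the choice to simply concatenate the two sensor token lists are all assumptions made for illustration; learned projection matrices and multiple heads, which a real model would likely use, are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query token (e.g. text) attends over every key/value token
    (e.g. visual and LiDAR features). Returns the fused outputs and
    the attention weights for inspection.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity between this query and each key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        w = softmax(scores)
        # Weighted sum of value vectors, one output channel at a time.
        fused = [
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        outputs.append(fused)
        weights.append(w)
    return outputs, weights


# Hypothetical toy features; real tokens would come from learned encoders.
visual_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
lidar_tokens = [[0.0, 0.0, 1.0, 0.0]]
text_tokens = [[2.0, 0.0, 0.0, 0.0]]  # stands in for a task instruction

# Assumption: sensor tokens are pooled into one sequence before attention.
scene_tokens = visual_tokens + lidar_tokens
fused, attn = cross_modal_attention(text_tokens, scene_tokens, scene_tokens)
```

In this toy setup the text token is most similar to the first visual token, so that token receives the largest attention weight. The returned weights are the kind of signal a system could surface when explaining which scene elements influenced a decision, although the abstract does not say the framework derives its explanations this way.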

Cite This Study

Peng et al. studied this question.

synapsesocial.com/papers/69730f78c8125b09b0d1f3d4
https://doi.org/10.3390/machines14010125
