June 30, 2025

FusedVisionNet: A Multi-Modal Transformer Model for Real-Time Autonomous Navigation

Key Points

FusedVisionNet targets real-time autonomous navigation in rapidly changing environments by fusing vision, depth, and LiDAR data.
On benchmark datasets such as KITTI and nuScenes, the model is reported to outperform state-of-the-art methods in object detection, path planning, and obstacle avoidance.
The model uses a cross-attention transformer backbone to merge spatial and semantic information from the different sensor modalities.
The authors expect FusedVisionNet's robust navigation capabilities to benefit future autonomous vehicle technologies in real-world scenarios.

Abstract

Real-time autonomous navigation in rapidly changing environments increasingly demands fast, precise perception and decision-making across different types of sensors. We propose FusedVisionNet, a transformer-based, multi-modal navigation system that fuses vision, depth, and LiDAR data in real time. Unlike traditional convolutional single-modality or multi-modality systems, FusedVisionNet employs a cross-attention transformer backbone that combines spatial and semantic information extracted from several modalities, enabling better comprehension of complex scenes. The model also features a multi-scale fusion framework that captures features shared across modalities along with the characteristics unique to each, yielding more coherent representations. Evaluation on benchmark datasets such as KITTI and nuScenes shows that FusedVisionNet surpasses state-of-the-art methods in object detection, path planning, and obstacle avoidance while maintaining the low latency required for real-time use. Ablation studies demonstrate the contribution of each modality and of the fusion approach. FusedVisionNet also improves reliability under challenging weather and lighting conditions, performing well across diverse urban and off-road scenarios. The model marks an important advance toward robust and dependable autonomous navigation systems designed for real-world use, and it highlights the promise of transformer-based, multi-modal hierarchical fusion architectures for future autonomous vehicle technologies.
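
To make the cross-attention fusion idea concrete, the following is a minimal sketch of single-head scaled dot-product cross-attention in which camera tokens query LiDAR tokens. It is not the authors' implementation: the token counts, the feature width D_MODEL, the random weights, the residual fusion step, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Queries come from one modality; keys and values come from another."""
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = math.sqrt(len(q[0]))
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    # Each query token gets a probability distribution over context tokens.
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights or encoder outputs (hypothetical data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D_MODEL = 8  # assumed feature width, not taken from the paper

camera_tokens = random_matrix(4, D_MODEL, rng)  # e.g. 4 image patches
lidar_tokens = random_matrix(6, D_MODEL, rng)   # e.g. 6 point-cloud pillars
w_q, w_k, w_v = (random_matrix(D_MODEL, D_MODEL, rng) for _ in range(3))

attended, weights = cross_attention(camera_tokens, lidar_tokens, w_q, w_k, w_v)

# Assumed residual fusion: camera features enriched with LiDAR context.
fused = [
    [c + a for c, a in zip(cam_row, att_row)]
    for cam_row, att_row in zip(camera_tokens, attended)
]
```

In this sketch each camera token ends up as a weighted mixture of LiDAR values added to its own features, which is one common way to let one modality attend to another; a real system would use learned weights, multiple heads, and several fusion scales.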

Cite This Study

Van et al. (Mon,) studied this question.

synapsesocial.com/papers/68af4eaead7bf08b1ead70e3 https://doi.org/10.71086/iajir/v12i2/iajir1215
