What type of study is this?

This is an experimental study.

September 19, 2025 · Open Access

MM-RSTraj: A Remote Sensing–Assisted Multimodal Large Language Model for Trajectory Traffic Semantic Understanding and General Spatial Semantic Perception

Key Points

MM-RSTraj achieves superior performance in trajectory traffic semantic evaluation through remote sensing imagery.
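It also attains competitive results on general remote sensing tasks such as RSIC and RSVQA, and its two-stage fine-tuning strategy enhances cross-modal interaction between remote sensing imagery and trajectory features.
Two instruction datasets support training: RSI-Instruct provides multi-turn instruction–response supervision for general remote sensing semantics, and RSI-Traffic targets trajectory traffic semantic understanding.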
This work sets a new paradigm in combining environmental semantics with trajectory modeling through advanced MLLMs.

Abstract

Trajectory traffic semantic understanding is fundamental to applications such as intelligent transportation and urban mobility analysis. While multimodal large language models (MLLMs) have recently advanced remote sensing scene understanding, current models remain focused on general remote sensing semantics and lack tailored designs for trajectory-specific tasks. To bridge this gap, we propose MM-RSTraj, the first remote sensing–assisted multimodal framework tailored for trajectory traffic semantic understanding. Built upon the LLaVA-OneVision architecture, MM-RSTraj adopts a two-stage fine-tuning strategy to enhance cross-modal interaction between remote sensing imagery and trajectory features. To support this process, we construct two high-quality instruction datasets: RSI-Instruct, an extension of RSICap providing multi-turn instruction–response supervision for general remote sensing semantics; and RSI-Traffic, a dataset designed for trajectory traffic semantic understanding, emphasizing key environmental semantics such as road structures, building layouts, and trajectory-related features. Experimental results demonstrate that MM-RSTraj achieves superior performance in remote sensing trajectory traffic semantic evaluation, while also attaining competitive results in general remote sensing semantic tasks such as remote sensing image captioning (RSIC) and remote sensing visual question answering (RSVQA). This work establishes a new paradigm for integrating environmental semantics with trajectory modeling through MLLMs.
