What type of study is this?

This is a Quantitative Study study.

October 2, 2025Open Access

MSSA: A Multi-Scale Semantic-Aware Method for Remote Sensing Image–Text Retrieval

Key Points

MSSA improves cross-modal retrieval accuracy by ensuring semantic consistency in remote sensing images.
The method enhances multi-scale image features using Progressive Spatial Channel Joint Attention for better alignment.
Image-Guided Text Attention dynamically adjusts textual attention weights based on visual contexts.
Experiments show MSSA achieves superior retrieval performance across three baseline datasets.

Abstract

In recent years, the convenience and potential for information extraction offered by Remote Sensing Image–Text Retrieval (RSITR) have made it a significant focus of research in remote sensing (RS) knowledge services. Current mainstream methods for RSITR generally align fused image features at multiple scales with textual features, primarily focusing on the local information of RS images while neglecting potential semantic information. This results in insufficient alignment in the cross-modal semantic space. To overcome this limitation, we propose a Multi-Scale Semantic-Aware Remote Sensing Image–Text Retrieval method (MSSA). This method introduces Progressive Spatial Channel Joint Attention (PSCJA), which enhances the expressive capability of multi-scale image features through Window-Region-Global Progressive Attention (WRGPA) and Segmented Channel Attention (SCA). Additionally, the Image-Guided Text Attention (IGTA) mechanism dynamically adjust textual attention weights based on visual context. Furthermore, the Cross-Modal Semantic Extraction Module (CMSE) incorporated learnable semantic tokens at each scale, enabling attention interaction between multi-scale features of different modalities and the capturing of hierarchical semantic associations. This multi-scale semantic-guided retrieval method ensures cross-modal semantic consistency, significantly improving the accuracy of cross-modal retrieval in RS. MSSA demonstrates superior retrieval accuracy in experiments across three baseline datasets, achieving a new state-of-the-art performance.

Read Full Paperexternally

اسأل الذكاء الاصطناعي

Bookmark

View Full Paper