Remote sensing object detection is fundamental to Earth observation, yet remains challenging when relying on a single sensing modality. While optical imagery provides rich spatial and textural details, it is highly sensitive to illumination and adverse weather; conversely, Synthetic Aperture Radar (SAR) offers robust all-weather acquisition but suffers from speckle noise and limited semantic interpretability. To address these limitations, we leverage the potential of foundation models for optical–SAR object detection via a novel gated–guided fusion approach. By integrating transferable and generalizable representations from foundation models into the detection pipeline, we enhance semantic expressiveness and cross-environment robustness. Specifically, a gated–guided fusion mechanism is designed to selectively merge cross-modal features with foundational priors, enabling the network to prioritize informative cues while suppressing unreliable signals in complex scenes. Furthermore, we propose a dual-stream architecture incorporating attention mechanisms and State Space Models (SSMs) to simultaneously capture local and long-range dependencies. Extensive experiments on the large-scale M4-SAR dataset demonstrate that our method achieves state-of-the-art performance, significantly improving detection accuracy and robustness under challenging sensing conditions.
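The abstract does not give the exact form of the gated–guided fusion mechanism, so the following is only a minimal sketch of the general idea of gated fusion, not the authors' implementation. It assumes that each modality yields a feature vector of length C at a given spatial location, that the foundation-model prior is a third vector of the same length, and that the gate is a sigmoid over a linear map of all three. The names `gated_guided_fusion`, `weights`, and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_guided_fusion(optical, sar, prior, weights, bias):
    """Blend optical and SAR features with a gate conditioned on a prior.

    Illustrative assumption, not the paper's architecture.
    optical, sar, prior: lists of C floats (features at one location).
    weights: C rows of 3*C floats (hypothetical learned linear map).
    bias: list of C floats.
    Returns (fused, gates), each a list of C floats.
    """
    # Concatenate the two modalities and the prior into one vector of length 3C.
    stacked = optical + sar + prior
    fused, gates = [], []
    for row, b, o, s in zip(weights, bias, optical, sar):
        logit = sum(w * x for w, x in zip(row, stacked)) + b
        g = sigmoid(logit)  # g near 1 favors optical, near 0 favors SAR
        gates.append(g)
        fused.append(g * o + (1.0 - g) * s)
    return fused, gates


# With zero weights and bias the gate is 0.5 everywhere,
# so the output is the plain average of the two modalities.
optical = [1.0, 2.0]
sar = [3.0, 0.0]
prior = [0.5, -0.5]
weights = [[0.0] * 6 for _ in range(2)]
bias = [0.0, 0.0]
fused, gates = gated_guided_fusion(optical, sar, prior, weights, bias)
print(fused)  # [2.0, 1.0]
```

Because each output is a convex combination of the optical and SAR values, the fused feature always lies between the two inputs. In this sketch, that is what lets a learned gate down-weight an unreliable modality, such as cloud-covered optical imagery, without discarding it entirely.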
Jiang et al. (Wed,) studied this question.
synapsesocial.com/papers/69d895be6c1944d70ce06d56 — DOI: https://doi.org/10.3390/ijgi15040160
Qianyin Jiang
Guangzhou Maritime College
Jianshang Liao
Guangzhou Maritime College
Qiuyu Lin
Guangzhou Maritime College
ISPRS International Journal of Geo-Information
Nanjing University of Posts and Telecommunications
Guangzhou Maritime College