Hybrid multi-backbone architectures and the use of edge cues for auxiliary training have become two major research trends in salient object detection (SOD). It is widely acknowledged that CNNs effectively model local spatial structures, while Transformers capture long-range global dependencies. However, the representation discrepancy between CNN and Transformer features, together with boundary-detail degradation during multi-scale fusion, remains a major challenge. In addition, how to leverage edge cues as reliable structural guidance without introducing texture-induced false boundaries or boundary leakage remains an open issue. In this paper, we present SECA-Net, a unified framework that tightly couples CNN and Transformer representations. It explicitly bridges their inherent discrepancies through level-dependent interaction strategies, and it addresses structural degradation via a sequential “purify-and-guide” mechanism that first cleans the edge cues and then uses them to guide decoding. This enables the network to extract and exploit edge cues effectively, thereby alleviating boundary degradation and texture-induced false contours. Specifically, we design a dual-encoder structure to extract features. A level-wise feature interaction (LFI) module performs discrepancy-aware fusion across feature levels, stabilizing CNN–Transformer aggregation. Meanwhile, the features extracted from the CNN branch are fed into a semantic-aware edge refinement (SAER) module, which produces clean multi-scale edge priors under high-level semantic guidance and suppresses texture-induced spurious edges. Finally, we design an edge-guided cross-attention feature aggregation (ECFA) module, which progressively injects the refined edge priors as structural constraints into multi-scale saliency decoding via cascaded cross-attention, enabling effective structural refinement.
Overall, LFI reduces cross-branch discrepancy, SAER purifies boundary priors, and ECFA integrates semantics and structure through progressive decoding; together they form the unified SECA-Net framework. Extensive experiments on five benchmark SOD datasets show that SECA-Net outperforms 19 state-of-the-art methods. In particular, the proposed method ranks first in Fβ and BDE across all datasets, improving Fβ by 1.54% on the challenging DUTS-TE dataset.
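The edge-guided cross-attention idea behind ECFA can be illustrated with a minimal sketch. It assumes that saliency features act as queries and refined edge priors act as keys and values, and that the attended edge information is added back through a residual connection. Learned projections, multi-head splitting, normalization, and the cascade across scales are all omitted, and the function names are hypothetical; this is an illustration of generic cross-attention, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def edge_guided_cross_attention(saliency_tokens, edge_tokens):
    """Refine saliency tokens by attending to edge-prior tokens.

    saliency_tokens: list of N vectors (queries), each of length d.
    edge_tokens:     list of M vectors (keys and values), each of length d.
    Assumption: no learned Q/K/V projections, single head, residual add.
    """
    dim = len(saliency_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    refined = []
    for query in saliency_tokens:
        # Scaled dot-product similarity between this query and every edge token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in edge_tokens]
        weights = softmax(scores)
        # Weighted sum of edge tokens (the values).
        attended = [sum(w * value[j] for w, value in zip(weights, edge_tokens))
                    for j in range(dim)]
        # Residual connection keeps the original saliency information.
        refined.append([q + a for q, a in zip(query, attended)])
    return refined


# Hypothetical toy input: two saliency tokens, three edge tokens, d = 2.
saliency = [[1.0, 0.0], [0.0, 1.0]]
edges = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
refined = edge_guided_cross_attention(saliency, edges)
```

Because the attention weights sum to one, each refined token is the original saliency token plus a convex combination of the edge tokens. In a multi-scale decoder, a stage of this kind would be applied at each level in turn.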