Segmenting programmed cell death-ligand 1 (PD-L1) expression regions in lung squamous cell carcinoma from pathological H&E images is a challenging pixel-level prediction task because expression regions vary widely in morphology and size. Although hybrid CNN-Transformer architectures can extract local features and capture long-range dependencies, they do not adequately handle information interaction or the elimination of redundant information during feature fusion, which degrades PD-L1 segmentation accuracy. To address this, we propose a W-shaped dual-encoder network (DEW-Net) with novel attention fusion mechanisms. First, a CNN encoder and a Swin Transformer encoder are connected in parallel to extract multi-level local and global features, respectively, from pathological images. Second, a Cross-Attention Fusion (CAF) module is proposed to strengthen information interaction and semantic feature fusion. Additionally, a Channel Attention (CA) module is introduced in the skip connections to enhance the channel-wise information of shallow features, and a Bilateral-voting Position Attention (BPA) module is proposed to eliminate positional noise in same-scale shallow features and reinforce position-wise information. We conducted extensive experiments on four datasets. On the PD-L1 segmentation dataset, DEW-Net achieved superior performance, with a Dice similarity coefficient (DSC) of 79.93% and an intersection over union (IoU) of 71.27%. These results demonstrate strong performance and generalization capability compared with other state-of-the-art (SOTA) methods.
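The two reported metrics can be illustrated with a minimal sketch for a single pair of binary masks (1 = PD-L1 expression pixel). The abstract does not say whether scores are averaged per image or computed over pooled pixels, so the function name `dice_and_iou` and the `eps` smoothing term are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> tuple[float, float]:
    """Hypothetical helper: DSC and IoU for one pair of binary segmentation masks.

    DSC = 2|A ∩ B| / (|A| + |B|),  IoU = |A ∩ B| / |A ∪ B|.
    eps (an assumed smoothing constant) avoids division by zero; two empty
    masks are scored as perfect agreement (1.0).
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()  # |A ∩ B|
    union = np.logical_or(pred, target).sum()          # |A ∪ B|
    total = pred.sum() + target.sum()                  # |A| + |B|
    dsc = (2.0 * intersection + eps) / (total + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dsc), float(iou)


# Usage: 3 predicted pixels, 3 ground-truth pixels, 2 of them shared.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
dsc, iou = dice_and_iou(pred, target)  # DSC = 4/6, IoU = 2/4
```

For a single mask pair the two scores are linked by DSC = 2·IoU / (1 + IoU), so DSC is never lower than IoU; once scores are averaged over many images, as is common, that exact identity no longer holds for the means.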
Meng et al. studied this question.