Timely and accurate paddy field mapping remains challenging in tropical regions due to persistent cloud cover and complex cropping patterns. We propose DSSNet, a lightweight dual-encoder semantic segmentation framework that fuses Sentinel-1 SAR and Sentinel-2 optical imagery. DSSNet uses modality-specific backbones from two architectural paradigms: EfficientNet-B0, a convolutional encoder, and MaxVit-T, a transformer-based encoder. To further enhance multimodal feature discrimination, we introduce two axial attention mechanisms, Axial Spatial Attention (ASA) and Axial Channel Attention (ACA), which selectively emphasize directional spatial patterns and inter-channel relationships. Evaluated on imagery from Indonesian rice-growing regions during the 2019 season, DSSNet achieves an F1-score of 0.8982, a pixel accuracy of 0.8998, and an mIoU of 0.8156, outperforming ten benchmark models. These findings underscore the operational feasibility of lightweight dual-paradigm fusion architectures for large-scale, in-season agricultural mapping under complex environmental conditions. Our code and model will be publicly available at https://github.com/project4earth/DSSNet.

• Lightweight dual-stream CNN–Transformer network for efficient multimodal segmentation
• Axial spatial and channel attention capture directional context with low overhead
• Achieves strong paddy field segmentation performance on Sentinel-1/2 data
• Raw SAR and multispectral bands outperform vegetation indices in fusion settings
• Compact model enables near-real-time inference on 512 × 512 dual-stream inputs
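For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how F1-score, pixel accuracy, and mIoU can be computed for a binary paddy/non-paddy mask. It is not taken from the DSSNet code base: the function name `segmentation_metrics`, the 0/1 label encoding, the choice to report F1 for the paddy class, and the averaging of IoU over both classes are all assumptions made for illustration, and the paper's exact averaging convention may differ.

```python
from typing import Dict, Sequence


def segmentation_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute binary segmentation metrics from flattened masks.

    Hypothetical helper: assumes labels are 1 (paddy) or 0 (background).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Accumulate the four confusion-matrix counts pixel by pixel.
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    total = tp + fp + fn + tn
    iou_paddy = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)

    return {
        "pixel_accuracy": safe_div(tp + tn, total),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),  # F1 of the paddy class
        "miou": (iou_paddy + iou_background) / 2,  # mean IoU over both classes
    }


# Toy example with eight pixels (made-up values, not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = segmentation_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are three true positives, three true negatives, one false positive, and one false negative, so pixel accuracy and F1 are both 0.75 and mIoU is 0.6. On real 512 × 512 masks a vectorized array implementation would be preferable; the loop is kept here only to make each count explicit.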
Wijaya et al. studied this question.