Accurate road extraction from remote sensing images is important for map updating, traffic analysis, and infrastructure monitoring. Medium-resolution multispectral images provide useful surface and background information. Used alone, however, they lack the spatial detail needed to preserve narrow roads, intersection structures, and fine road topology. To address this problem, this paper proposes GeoRoad-UPerNet, a Geo-1-centered, weakly supervised multispectral framework for road extraction. In this framework, Geo-1 serves as the primary 16-band multispectral source and Sentinel-2 Level-2A imagery provides auxiliary contextual support. OpenStreetMap (OSM) road information is converted into proxy supervision in place of dense manual ground truth. GeoRoad-UPerNet contains three modules: a Geo Spectral Semantic Stem (GSSS), a Geo-Auxiliary Gated Fusion module (GAGF), and a Road Semantic Multi-Task Head (RSMH). GSSS strengthens road-sensitive multispectral responses in the Geo-1 branch. GAGF injects Sentinel-2 context through a Geo-centered gate rather than through symmetric channel concatenation. During training, RSMH applies restrained hierarchy- and material-aware semantic regularization to the shared decoder representation. On the fixed source-domain benchmark, the complete model achieves an IoU of 0.7204, an F1-score of 0.8375, a Precision of 0.8092, and a Recall of 0.8678 against OSM-derived proxy masks. Compared with the UPerNet-MiT-B3 early-fusion baseline, IoU, F1-score, and Precision improve by 6.29%, 3.65%, and 12.58% (relative), respectively. These results indicate that role-aware multisource organization improves road extraction under proxy supervision and reduces both boundary noise and background false positives.
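The four reported scores are standard pixel-level measures for the road class, computed against the OSM-derived proxy masks. The minimal Python sketch below shows how they relate to one another. It assumes dataset-level (pixel-aggregated) counting rather than per-image averaging, and the function names are hypothetical, not taken from the GeoRoad-UPerNet code. Under that assumption F1 = 2·IoU / (1 + IoU), so the reported IoU and F1 can be cross-checked against each other and against Precision and Recall.

```python
from typing import Iterable


def confusion_counts(pred: Iterable[int], target: Iterable[int]) -> tuple[int, int, int]:
    """Count true positives, false positives and false negatives for binary
    road masks flattened to 0/1 sequences (1 = road, 0 = background)."""
    tp = fp = fn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1  # road predicted where the proxy mask has road
        elif p and not t:
            fp += 1  # background predicted as road
        elif t and not p:
            fn += 1  # road pixel missed by the prediction
    return tp, fp, fn


def road_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision, Recall, F1 and IoU for the road class (assumes tp > 0)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


def f1_from_iou(iou: float) -> float:
    """With pixel-aggregated counts, F1 and IoU are linked by F1 = 2*IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


if __name__ == "__main__":
    # Toy 5-pixel example: TP=2, FP=1, FN=1.
    print(road_metrics(*confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])))
    # Cross-check the abstract's figures: both routes give the same F1.
    print(round(f1_from_iou(0.7204), 4))                       # 0.8375
    print(round(2 * 0.8092 * 0.8678 / (0.8092 + 0.8678), 4))   # 0.8375
```

The reported IoU, F1-score, Precision, and Recall are mutually consistent under this aggregated-count assumption.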
Chen et al. studied this question.