March 21, 2026 · Open Access

CalibMask: Addressing Class Head Miscalibration in Mask Transformers for Cross-Domain Building Footprint Extraction from Satellite Imagery

Key Points

This research aims to address class head miscalibration in Mask Transformers for accurate building footprint extraction.
Developed a calibration-aware training framework called CalibMask.
Used connected component analysis to generate instance labels automatically from semantic building masks.
Incorporated calibrated weight transfer from a regional model.
Implemented differential learning rate scheduling.
CalibMask achieved 50.1% IoU on USA imagery, 49.6% on UK, and 53.8% on France (zero-shot).
Instance labels were the most important component: removing them caused a 44% relative IoU drop.
Calibrated weight transfer was the second most important component: removing it caused a 19% relative IoU drop.
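Since instance labels matter most in the ablation, it helps to see what connected component analysis does to a semantic mask. The sketch below is a minimal illustration, assuming a binary building mask per tile and SciPy's `ndimage.label`. The function name `semantic_to_instances`, the `min_area` noise filter, and the choice between 4- and 8-connectivity are hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np
from scipy import ndimage


def semantic_to_instances(building_mask, min_area=1, connectivity=4):
    """Split a binary semantic building mask into per-building instance masks.

    Hypothetical helper: each connected component of 'building' pixels is
    treated as one instance. Components smaller than `min_area` pixels are
    dropped (an assumed noise filter, not taken from the paper).
    """
    # 4-connectivity: only edge neighbours join; 8-connectivity: diagonals join too.
    rank = 1 if connectivity == 4 else 2
    structure = ndimage.generate_binary_structure(2, rank)
    labelled, num_components = ndimage.label(np.asarray(building_mask) > 0, structure=structure)

    instances = []
    for label_id in range(1, num_components + 1):  # label 0 is background
        instance = labelled == label_id
        if instance.sum() >= min_area:
            instances.append(instance)
    return instances


# Example: a 5x5 tile containing three separate buildings.
mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
])
instances = semantic_to_instances(mask)
print(len(instances))                            # 3
print(sorted(int(m.sum()) for m in instances))   # [1, 3, 4]
```

One limitation is inherent to the technique: buildings whose footprints touch in the semantic mask fall into a single component and therefore become a single instance. The connectivity choice decides whether diagonal contact counts as touching.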

Abstract

Building footprint extraction from satellite imagery is essential for urban planning, population estimation, and disaster response. While Mask Transformers such as Mask2Former have achieved state-of-the-art results on standard benchmarks, deploying them for global-scale building detection reveals a previously undocumented failure mode: class head miscalibration. We find that when Mask2Former is trained with semantic-level labels rather than instance-level labels, the class prediction head becomes severely miscalibrated, producing maximum building confidence scores of only 8.5% despite accurate spatial predictions with 67% coverage. This miscalibration stems from the degeneration of Hungarian matching: semantic labels collapse all building instances in a tile into a single ground-truth object, so only one query is matched and the rest are supervised as "no object", an extreme 199:1 negative-to-positive class ratio. As a result, standard post-processing produces zero detections (0% IoU). We propose CalibMask, a calibration-aware training framework that resolves this through automatic instance label generation via connected component analysis, calibrated weight transfer from a properly trained regional model, and differential learning rate scheduling. Trained on 17,305 tiles spanning 9 countries, CalibMask achieves 50.1% IoU on USA, 49.6% on UK, and 53.8% on France (zero-shot), while restoring standard prediction to full functionality. A comprehensive ablation study confirms that instance labels are the most critical component (44% relative IoU drop without them), followed by calibrated transfer (19% relative drop).
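The 199:1 ratio follows directly from one-to-one matching. The sketch below is a minimal illustration using SciPy's `linear_sum_assignment` under several assumptions. It uses 200 object queries, a figure inferred from the ratio (one matched query against 199 unmatched) rather than stated in the source. It also assumes a single "building" class, random matching costs, an illustrative tile with 25 buildings, and a hypothetical 0.5 score cutoff. `matched_class_targets` and `count_detections` are illustrative names, not the paper's code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def matched_class_targets(cost, num_classes=1):
    """Per-query class targets after one-to-one (Hungarian) matching.

    cost: (num_queries, num_gt) matching-cost matrix.
    Matched queries get class 0 ('building'); every unmatched query gets
    index `num_classes`, the DETR-style 'no object' class.
    """
    num_queries = cost.shape[0]
    query_idx, _ = linear_sum_assignment(cost)  # at most min(num_queries, num_gt) matches
    targets = np.full(num_queries, num_classes)
    targets[query_idx] = 0
    return targets


def count_detections(building_scores, threshold=0.5):
    """Count queries whose building score clears a fixed cutoff (assumed 0.5)."""
    return int((np.asarray(building_scores) > threshold).sum())


rng = np.random.default_rng(0)
num_queries = 200  # inferred from the 199:1 ratio, not stated in the source

# Semantic label: all buildings merged into one ground-truth object.
semantic = matched_class_targets(rng.random((num_queries, 1)))
# Instance labels: e.g. 25 separate buildings in the same tile (illustrative).
instance = matched_class_targets(rng.random((num_queries, 25)))

for name, t in [("semantic", semantic), ("instance", instance)]:
    print(name, "negatives:positives =", int((t == 1).sum()), ":", int((t == 0).sum()))
# semantic negatives:positives = 199 : 1
# instance negatives:positives = 175 : 25

# A head whose best building score is 0.085 yields nothing at a 0.5 cutoff.
print(count_detections([0.085, 0.04, 0.01]))  # 0
```

Because the matcher is one-to-one, the number of positive queries equals the number of ground-truth objects, whatever the costs are. The random costs therefore do not affect the counts shown. Collapsing every building into one object leaves a single positive per tile, which is consistent with the abstract's account of why the class head learns to keep building confidence low.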