Background and objective: Rapid urbanization in Vietnam's municipalities, and in Ho Chi Minh City in particular, has created complex challenges for urban management and governance. A core obstacle is the limited availability of spatial data on urban infrastructure and its setting. This has prompted a national program to digitize urban infrastructure for better governance and more efficient management. To contribute urban infrastructure data, this study introduces a deep learning workflow that extracts building footprints from very high-resolution UAV imagery and converts them into a structured geospatial layer supporting spatial management of the urban environment.

Methods: The workflow relies on the DeepLabV3+ architecture with a ResNet-101 backbone, chosen for its strong semantic segmentation capability. Because only a small set of labeled UAV samples was available, transfer learning was applied: the model was first pretrained on a large collection of open images and then fine-tuned on manually annotated images acquired over the Phuoc Thien resettlement area in Long Binh Ward, on the eastern side of Ho Chi Minh City. Survey flights were conducted at an altitude of 100 m, yielding orthomosaics with a Ground Sampling Distance (GSD) of 2.82 cm/pixel. The imagery was processed through contrast enhancement, geometric correction, orthomosaic generation, and tiling before segmentation; subsequent refinement converts the raster masks to vector polygons and assigns spatial references.

Results: The resulting building footprint layer attains an Intersection over Union (IoU) of 0.91 and a Precision of 0.90, indicating reliable delineation across diverse roof geometries and illumination conditions.
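
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how IoU and Precision are commonly computed for a binary building mask. It is not the authors' evaluation code: the function names, the toy masks, and the convention of returning 1.0 when a denominator is zero are all assumptions made for illustration.

```python
# Illustrative only: pixel-wise IoU and Precision for binary masks
# (1 = building, 0 = background). Names and toy data are hypothetical.

def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over Union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    return tp / union if union else 1.0


def precision(pred, truth):
    """Precision: TP / (TP + FP)."""
    tp, fp, _ = confusion_counts(pred, truth)
    # Assumption: no predicted building pixels counts as precision 1.0.
    return tp / (tp + fp) if (tp + fp) else 1.0


# Toy 3x4 masks, not data from the study.
truth_mask = [[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]]
pred_mask = [[0, 1, 1, 1],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]

print(iou(pred_mask, truth_mask))        # 3 shared pixels out of 5 in the union
print(precision(pred_mask, truth_mask))  # 3 of 4 predicted pixels are correct
```
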
Delivered as a georeferenced vector layer, these footprints can be ingested directly into municipal spatial databases, providing an up-to-date dataset to support cadastral maintenance, zoning evaluation, and infrastructure monitoring.

Conclusion: The proposed method shows that deep learning combined with transfer learning can effectively extract urban building boundaries from UAV imagery when labeled data are limited. The workflow maintains both geometric and semantic accuracy, producing a spatial data layer that can be readily integrated into urban databases. The findings offer insight into supporting the national strategy for urban digital transformation.
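
As an illustration of what assigning spatial references to a footprint involves, the sketch below maps pixel coordinates to projected map coordinates using the stated GSD of 2.82 cm/pixel and computes the footprint area in square metres. It assumes a north-up orthomosaic with square pixels; the origin coordinates, the rectangular footprint, and the function names are hypothetical, and a production workflow would more likely rely on an established GIS library than on hand-written code.

```python
# Illustrative only: georeferencing a pixel-space polygon with a north-up
# affine mapping. Origin and footprint values are hypothetical.

GSD_M = 0.0282  # ground sampling distance from the text: 2.82 cm/pixel


def pixel_to_map(col, row, origin_x, origin_y, gsd=GSD_M):
    """Map a pixel (col, row) to map coordinates in metres.

    Assumes a north-up image: x grows with the column index,
    y shrinks as the row index grows.
    """
    return origin_x + col * gsd, origin_y - row * gsd


def polygon_area(coords):
    """Shoelace area of a simple polygon, in the units of coords squared."""
    x0, y0 = coords[0]
    # Shift to the first vertex so large map coordinates do not cost precision.
    shifted = [(x - x0, y - y0) for x, y in coords]
    total = 0.0
    for i, (x1, y1) in enumerate(shifted):
        x2, y2 = shifted[(i + 1) % len(shifted)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical upper-left corner of the orthomosaic in a projected CRS.
ORIGIN_X, ORIGIN_Y = 500000.0, 1200000.0

# Hypothetical rectangular footprint: 500 px wide, 300 px tall.
footprint_px = [(100, 100), (600, 100), (600, 400), (100, 400)]
footprint_map = [pixel_to_map(c, r, ORIGIN_X, ORIGIN_Y) for c, r in footprint_px]

area_m2 = polygon_area(footprint_map)
print(round(area_m2, 3))  # 14.1 m x 8.46 m rectangle
```
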
Trang et al. studied this question.