What question did this study set out to answer?

The aim is to compare the semantic attributes of six major pan-European open building datasets for consistency, completeness, and comparability.

June 6, 2026 · Open Access

Towards a Comparison of the Semantic Information of Pan-European Open Building Data

Key Points

Systematic comparison of six open building datasets: OpenStreetMap, EUBUCCO, Microsoft Global ML, Overture Maps, GHS-OBAT, and DBSM.
Five semantic attributes (height, typology, building age, number of floors, building material) were harmonised and analysed.
API-based data ingestion and high-performance computing were used to process around 1.25 billion building footprints.
Remote-sensing-derived datasets (GHS-OBAT and DBSM) show high levels of completeness for semantic attributes but use aggregated information.
Community-driven datasets (OpenStreetMap and Overture Maps) offer richer details but have lower completeness.
Completeness varies significantly across countries and urbanisation levels, highlighting that no single dataset is ideal for all contexts.

Abstract

Open, non-governmental building datasets have become increasingly important for urban analysis, exposure modelling, and policy support. Despite their growing use, little is known about the consistency, completeness, and comparability of the semantic information they provide at a continental scale. This study presents the first systematic comparison of the semantic attributes of six major pan-European open building datasets, using the 27 EU Member States as a common reference area: OpenStreetMap, EUBUCCO, Microsoft Global ML Building Footprints, Overture Maps, GHS-OBAT, and the Digital Building Stock Model (DBSM). Five key semantic attributes (height, typology, building age, number of floors, and building material) were harmonised and analysed in terms of completeness and value distributions across countries and degrees of urbanisation. The workflow combines API-based data ingestion, distributed geospatial processing, and high-performance computing to handle around 1.25 billion building footprints. Results reveal pronounced heterogeneity in semantic content across datasets. Remote-sensing-derived products (GHS-OBAT and DBSM) exhibit the highest levels of attribute completeness for height, typology, and building age, but rely on aggregated or coarse semantic representations. In contrast, community-driven and conflated datasets (OpenStreetMap and Overture Maps) provide richer and more detailed semantic schemas, albeit with low and spatially uneven completeness. Completeness patterns vary substantially across countries and urbanisation classes, and high completeness values often mask limited semantic informativeness due to the prevalence of unknown or aggregated attribute values. Overall, the findings demonstrate that no single dataset is universally optimal regarding consistency and completeness of building footprints’ semantic attributes. Nonetheless, the paper provides practical guidance for selecting suitable data sources depending on spatial scale, attribute requirements, and analytical objectives.
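The distinction the abstract draws between attribute completeness and semantic informativeness can be illustrated with a minimal sketch. This is not the authors' workflow: the function name, the record layout, and the set of placeholder values treated as uninformative are all assumptions made for illustration. For each group (for example a country or an urbanisation class), it reports the share of buildings with any value for an attribute and the share with a value that is not a generic placeholder.

```python
from collections import defaultdict

# Assumption: placeholder values counted as "filled but uninformative".
# The paper's actual list of such values is not given in this summary.
UNINFORMATIVE = {"unknown", "other", "yes"}


def completeness_by_group(records, attribute, group_key):
    """Return {group: (completeness, informative_share)} for one attribute.

    completeness      = share of records with any non-empty value
    informative_share = share of records whose value is not a placeholder
    """
    totals = defaultdict(int)
    filled = defaultdict(int)
    informative = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        value = rec.get(attribute)
        if value is None or value == "":
            continue  # attribute missing: counts against completeness
        filled[group] += 1
        if str(value).strip().lower() not in UNINFORMATIVE:
            informative[group] += 1
    return {
        g: (filled[g] / totals[g], informative[g] / totals[g])
        for g in totals
    }


# Hypothetical toy records, not data from the study.
buildings = [
    {"country": "IT", "typology": "residential"},
    {"country": "IT", "typology": "unknown"},
    {"country": "IT", "typology": None},
    {"country": "IT", "typology": "yes"},
    {"country": "DE", "typology": "commercial"},
    {"country": "DE", "typology": None},
]

result = completeness_by_group(buildings, "typology", "country")
```

In this toy input, the "IT" group has a higher completeness than "DE" but a lower informative share, which is the kind of masking effect the abstract describes.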

Cite This Study

Gabrielli et al. studied this question.

synapsesocial.com/papers/6a23ba1771a5da9775e75e7b https://doi.org/10.3390/ijgi15060252
