What question did this study set out to answer?

To develop a model for better interpolation of air temperature fields using crowdsourced data.

June 9, 2026 | Open Access

Probabilistic interpolation of crowdsourced meteorological data for higher-resolution gridded estimates of surface air temperature

Key Points

Developed a sparse variational Gaussian process model to address heteroscedasticity.
Applied the model to six years of hourly data across Durham County, North Carolina.
Compared predictions with linearly interpolated ERA5-Land data at held-out sensor locations.
Achieved mean absolute error (MAE) of 0.57 °C at held-out locations compared to ERA5-Land MAE of 3.20 °C.
Enabled high-resolution analysis of canopy urban heat island patterns over various synoptic conditions.
Visualized variations in heating and cooling demand as well as annual hours exceeding 35 °C by neighborhood.
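The held-out comparison and the hours-above-threshold summary listed above both reduce to simple computations. The following is a minimal sketch, not the authors' code: the function names `mean_absolute_error` and `hours_above` and the sample values are hypothetical and chosen only for illustration.

```python
def mean_absolute_error(observed, predicted):
    """Mean absolute difference between paired observations and predictions."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one pair is required")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def hours_above(hourly_temps_c, threshold_c=35.0):
    """Count hourly values strictly greater than the threshold (in degrees C)."""
    return sum(1 for t in hourly_temps_c if t > threshold_c)


# Illustrative values only; these are not data from the study.
held_out_obs = [20.0, 22.0, 24.0]
model_pred = [21.0, 22.0, 22.0]
print(mean_absolute_error(held_out_obs, model_pred))
print(hours_above([34.9, 35.0, 35.1, 36.0]))
```

Applied per neighborhood grid cell over a year of hourly estimates, `hours_above` would give the kind of exceedance count the last key point describes.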

Abstract

Crowdsourced air temperature data from networks like Weather Underground offer dense spatial coverage and are increasingly used to study the canopy urban heat island (CUHI) effect. However, these observations are noisy: siting conditions, environmental interference, and sensor failures introduce spatially and temporally varying bias. This complicates interpolation, limiting our ability to estimate neighborhood-level air temperature. While interpolation techniques such as kriging account for uncertainty, they do so under the assumption of homoscedasticity. Moreover, they struggle to scale beyond a few thousand observations, limiting their utility on crowdsourced data. To overcome these limitations, we develop a sparse variational Gaussian process model that accounts for heteroscedasticity, allowing us to efficiently interpolate air temperature fields with calibrated uncertainty quantification. To test our approach, we apply our model to six years of hourly data across Durham County, North Carolina, and compare predictions at held-out sensor locations with linearly-interpolated ERA5-Land. Our method improves estimates at held-out locations (MAE=0.57 °C versus ERA5-Land MAE=3.20 °C) and enables high-resolution analysis of CUHI patterns over space and time. We illustrate this by visualizing (1) how CUHI patterns vary with synoptic conditions, (2) differential impacts on heating and cooling demand, and (3) annual hours exceeding 35 °C by neighborhood. Our method provides a scalable and statistically rigorous framework for transforming crowdsourced climate data into a gridded reanalysis product. Using this product, we can better quantify urban heat exposure and its impact on health and energy.
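To make the role of heteroscedasticity concrete, the sketch below computes an exact Gaussian process posterior mean in which each observation carries its own noise variance. This is a deliberately simplified stand-in: it is an exact GP, not the sparse variational model the abstract describes, and the kernel choice, lengthscale, noise variances, one-dimensional locations, and all function names are assumptions made for illustration. The prior mean is zero, so the values are treated as temperature anomalies.

```python
import math


def rbf(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D locations (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gp_posterior_mean(x_train, y_train, noise_var, x_new, lengthscale=1.0):
    """Posterior mean at x_new for a zero-mean GP with per-observation noise.

    noise_var[i] is added to the i-th diagonal entry of the kernel matrix,
    so noisier sensors are down-weighted. A homoscedastic model would use
    a single shared value for every observation.
    """
    n = len(x_train)
    k = [
        [
            rbf(x_train[i], x_train[j], lengthscale)
            + (noise_var[i] if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    alpha = solve(k, y_train)
    return sum(rbf(x_new, x_train[i], lengthscale) * alpha[i] for i in range(n))


# Two hypothetical co-located sensors reporting different anomalies (degrees C).
x_train = [0.0, 0.0]
y_train = [2.0, 3.0]
equal_noise = gp_posterior_mean(x_train, y_train, [0.01, 0.01], 0.0)
noisy_second = gp_posterior_mean(x_train, y_train, [0.01, 100.0], 0.0)
print(equal_noise, noisy_second)
```

With equal noise the estimate sits roughly midway between the two readings. When the second sensor is assigned a much larger noise variance, the estimate moves toward the first sensor. In this sketch the noise variances are supplied by hand, and how they would be obtained in practice is left open.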

Journals

Urban Climate

Institutions

Duke University
