This study models insured loss data against environmental and socioeconomic variables to derive empirical hail vulnerability functions. The loss data come from three major hailstorms that occurred across Calgary, Canada, in 2020, 2021, and 2024. Radar-derived Maximum Estimated Size of Hail (MESH) at 1-km resolution and satellite-based precipitation data at 8-km resolution are combined with nine socioeconomic variables, transformed from 46 variables in the Canadian Census, to construct the vulnerability functions. Least absolute shrinkage and selection operator (Lasso) regression and Random Forest models are used to assess performance across a range of model complexities. Results show that cumulative precipitation significantly improves model performance and helps reduce event-specific biases. Variable selection through Lasso regression and Random Forest importance scores guides the construction of four models of differing complexity. These range from base models that use only MESHmax (the maximum MESH value at a location during a hailstorm) to refined models that add precipitation, socioeconomic variables, or both. Including these additional variables enhances interpretability and predictive power, achieving a coefficient of determination of 0.82 and a 15% to 40% reduction in mean squared error compared to the base models. Random Forest models consistently outperform Lasso regression, achieving a 35% to 50% reduction in mean squared error. The findings highlight the importance of conditioning on both environmental hazard and socioeconomic dimensions in hail loss modeling, offering a robust and scalable approach for future forecasting and resilience planning.
Li et al. studied this question.
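The two headline metrics reported above, the coefficient of determination and the relative reduction in mean squared error between a base and a refined model, can be computed as in the following minimal sketch. It only illustrates the standard definitions and is not the authors' evaluation code. The function names and all numeric values are hypothetical: the numbers are made up for illustration and are not data from the study.

```python
from statistics import fmean


def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    return fmean((o - p) ** 2 for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mse_reduction(observed, base_pred, refined_pred):
    """Fractional MSE reduction of a refined model relative to a base model."""
    base = mse(observed, base_pred)
    return (base - mse(observed, refined_pred)) / base


# Made-up loss values for illustration only (NOT from the study).
observed = [0.10, 0.25, 0.40, 0.55, 0.80]
# Hypothetical predictions from a MESHmax-only base model.
base_pred = [0.20, 0.20, 0.50, 0.45, 0.60]
# Hypothetical predictions from a model with additional variables.
refined_pred = [0.12, 0.22, 0.45, 0.50, 0.72]

print(f"base R^2:      {r_squared(observed, base_pred):.3f}")
print(f"refined R^2:   {r_squared(observed, refined_pred):.3f}")
print(f"MSE reduction: {mse_reduction(observed, base_pred, refined_pred):.1%}")
```

A reduction computed this way is relative to the base model's error, so a "15% to 40% reduction" means the refined model's mean squared error is 60% to 85% of the base model's.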