May 1, 2026 | Open Access

Impact of COVID-19-related data drift on machine-learning prognostic models predicting 30-day opioid-related emergency department visits, hospitalisation or mortality: a population-level administrative data study in Alberta, Canada

Abstract

Objective To develop machine-learning (ML) models on data from the COVID-19 pandemic and adjacent time periods and to evaluate the impact of data drift on model performance.
Design This prognostic study used population-level administrative health data to develop ML prediction models.
Setting Alberta, Canada, 2019–2023.
Participants All patients over 18 years of age who received at least one opioid dispensation from a community pharmacy in Alberta between 2019 and 2023.
Exposure Each opioid dispensation served as the unit of analysis.
Main outcomes/measures Opioid-related outcomes were identified from linked health administrative datasets. Light Gradient Boosting Machine (LightGBM) models were developed on pre-pandemic, pandemic and endemic data and temporally validated on 2023 data (the pre-pandemic model was also validated on 2020–2021 data) to predict the risk of an emergency department visit, hospitalisation or mortality within 30 days of an opioid dispensation. We described key feature distributions across the study period and changes in model prediction performance on the validation sets using relevant metrics.
Results Among 1.2 million study participants representing over 13 million opioid dispensations, there were 59 809 (2.1%), 134 402 (2.4%) and 62 143 (2.3%) events reported in the pre-pandemic (2019), pandemic (2020 and 2021) and endemic (2022) time periods, respectively (estimated 2023 validation set pre-test probability of 2.8%). Notable differences in key features were observed in the 2020–2021 model relative to other years. In the 2023 validation set, discrimination performance was higher for the pre-pandemic and endemic models than for the pandemic model (0.81, 0.83 and 0.74, respectively). A similar trend was observed in the changes from pre-test to post-test probabilities in the higher categories of predicted risk (23%, 40% and 16%, respectively). Validation on 2020–2021 data yielded the lowest discrimination performance (0.71) and uninformative post-test probabilities (<10%).
Conclusion COVID-19 pandemic health data contributed to significant ML drift. Although ML approaches allow for quick re-training to mitigate drift, health regulators should approach ML prediction with caution when using pandemic-era data.
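
The two kinds of validation metric mentioned in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: that discrimination is measured as the area under the ROC curve, that the post-test probability is the observed event rate among dispensations at or above a risk threshold, and that a single threshold stands in for the "higher categories of predicted risk". The function names, the threshold and the toy data are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a random event scores above a random non-event.

    Ties count as half a win. Assumption: the abstract's "discrimination
    performance" is an AUC-type statistic; the abstract does not name it.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pre_post_test(scores, labels, threshold):
    """Return (pre-test, post-test) probabilities for one risk category.

    Pre-test: overall event rate in the validation set.
    Post-test: event rate among records with predicted risk >= threshold
    (an assumed definition of a "higher category of predicted risk").
    """
    pre = sum(labels) / len(labels)
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    post = sum(flagged) / len(flagged) if flagged else float("nan")
    return pre, post


# Toy validation set (illustrative only, not study data):
# predicted 30-day risk per dispensation and the observed outcome.
scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

print("AUC:", auc(scores, labels))
print("pre/post-test:", pre_post_test(scores, labels, threshold=0.5))
```

Under these assumptions, a model affected by drift would show a lower AUC and a post-test probability closer to the pre-test probability when scored on a later validation period.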
