Teacher shortages are a challenge in many countries and a threat to the quality of schools, which makes it important to monitor shortages and their impact. However, school-level teacher shortages are multifaceted and influenced by numerous factors, making them costly to measure directly. This study develops a proxy for teacher shortages in Dutch primary schools using a rich set of predictors, including both administrative and web-scraped data. Applying machine learning techniques with high-dimensional statistics, we construct two proxies: one that predicts the degree of shortages and another that classifies whether a school experiences a shortage. Gradient boosting models generally outperform alternative approaches on three criteria: predictive accuracy (measured using the root mean squared error and Youden’s J statistic), parsimony, and validation analyses. These results demonstrate that school-level teacher shortages can be accurately predicted with available information, substantially reducing the time and costs associated with conventional measurement.
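As a minimal sketch of the two accuracy measures named above, the following Python snippet computes the root mean squared error for a continuous prediction and Youden's J statistic (sensitivity plus specificity minus one) for a binary classification. The pairing of the root mean squared error with the degree-of-shortage proxy and of Youden's J with the shortage classifier is an assumption here, and the function names and all data values are hypothetical illustrations, not results or code from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def youden_j(y_true, y_pred):
    """Youden's J = sensitivity + specificity - 1, for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Hypothetical, made-up values purely for illustration.
shortage_true = [0.0, 0.10, 0.25, 0.05]   # assumed observed degree of shortage
shortage_pred = [0.0, 0.20, 0.25, 0.05]   # assumed predicted degree of shortage
label_true = [1, 1, 0, 0, 0]              # 1 = school has a shortage
label_pred = [1, 0, 0, 0, 1]              # assumed classifier output

print(rmse(shortage_true, shortage_pred))
print(youden_j(label_true, label_pred))
```

A lower root mean squared error indicates closer predictions of the degree of shortage. Youden's J ranges from -1 to 1, where 1 corresponds to perfect classification and 0 to a classifier no better than chance.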
Rongen et al. studied this question.