March 12, 2026 | Open Access

Dynamic Feature Selection for Canadian GDP Forecasting: Machine Learning with Google Trends and Official Data

Key Points

The study forecasts monthly Canadian real GDP growth using dynamic feature selection and machine learning (ML) models.
Predictors are re-selected in each rolling window using PDC-SIS.
Models are tuned by cross-validation to support real-time forecasting and avoid data leakage.
Five ML models (GBM, XGBoost, LightGBM, CatBoost, Random Forest) are compared against an ARIMA baseline.
Forecasts are evaluated on three predictor sets: Official data, Google Trends (GT) data, and the two combined.
Official data deliver the strongest performance at short and medium horizons.
Combining Official and GT data yields the clearest improvement at the longest horizon.
With GT data alone, LightGBM is the only ML model that maintains a positive out-of-sample R² across all horizons.
Diebold–Mariano tests show that LightGBM dominates the other ML models with GT-only predictors; with Official and combined data, the horizon-specific best models significantly outperform ARIMA, with smaller differences among the leading tree-based methods.
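
The two evaluation tools named above, out-of-sample R² and the Diebold–Mariano test, can be illustrated with a short sketch. This is not the study's code: the function names `oos_r2` and `diebold_mariano` are hypothetical, the R² is assumed to be measured relative to a benchmark forecast, and the test shown is the basic asymptotic version with squared-error loss and no small-sample correction, which may differ from the variant the authors used.

```python
import math
import numpy as np


def oos_r2(actual, forecast, benchmark):
    """Out-of-sample R^2 of `forecast` relative to `benchmark` (assumed definition).

    Positive values mean the model has a lower sum of squared errors
    than the benchmark over the evaluation sample.
    """
    actual = np.asarray(actual, dtype=float)
    sse_model = np.sum((actual - np.asarray(forecast, dtype=float)) ** 2)
    sse_bench = np.sum((actual - np.asarray(benchmark, dtype=float)) ** 2)
    return 1.0 - sse_model / sse_bench


def diebold_mariano(actual, forecast_a, forecast_b, horizon=1):
    """Basic Diebold-Mariano test under squared-error loss (illustrative only).

    Returns (statistic, two-sided p-value from the standard normal).
    A positive statistic means forecast_a has the larger average loss.
    """
    actual = np.asarray(actual, dtype=float)
    e_a = actual - np.asarray(forecast_a, dtype=float)
    e_b = actual - np.asarray(forecast_b, dtype=float)
    d = e_a ** 2 - e_b ** 2            # loss differential
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar
    # Long-run variance: autocovariances up to lag horizon-1 (rectangular kernel).
    lrv = np.dot(dc, dc) / n
    for k in range(1, horizon):
        lrv += 2.0 * np.dot(dc[k:], dc[:-k]) / n
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

In this sketch, a model "maintains positive out-of-sample R²" when `oos_r2` exceeds zero at every horizon, and one forecast "significantly outperforms" another when the p-value from `diebold_mariano` falls below the chosen significance level.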

Abstract

We forecast monthly Canadian real GDP growth using machine learning (ML) models trained on Official macroeconomic indicators and Google Trends (GT) data. Predictors are selected dynamically in each rolling window using PDC-SIS, with cross-validation-based tuning to support real-time forecasting and avoid data leakage. The evaluation is conducted on the latest-available (final-vintage) series and should be interpreted as a pseudo out-of-sample forecasting exercise rather than real-time vintage nowcasting. We evaluate GBM, XGBoost, LightGBM, CatBoost, and Random Forest against an ARIMA baseline. Official data deliver the strongest performance at short and medium horizons, while combining Official and GT data yields the clearest improvement at the longest horizon. With GT data alone, LightGBM is the only ML model that maintains a positive out-of-sample R² across all horizons. Diebold–Mariano tests corroborate these patterns: LightGBM dominates the other ML models under GT-only predictors, whereas with Official and combined data, the horizon-specific best models significantly outperform ARIMA, with smaller differences among the leading tree-based methods.
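
To make the rolling-window design concrete, the following is a minimal sketch of a pseudo out-of-sample loop in which predictors are re-selected at every forecast origin using only data available at that origin. It is an illustration under stated assumptions, not the authors' pipeline: absolute Pearson correlation stands in for PDC-SIS, ordinary least squares stands in for the tree-based models, cross-validated tuning is omitted, and the names `screen_top_k` and `rolling_forecasts` are hypothetical.

```python
import numpy as np


def screen_top_k(X, y, k):
    """Return indices of the k predictors with the largest |correlation| with y.

    Simple marginal screening, used here only as a stand-in for PDC-SIS.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(corr)[::-1][:k]


def rolling_forecasts(X, y, window, horizon=1, k=5):
    """Forecast y[t + horizon] from X[t] with per-window predictor selection.

    At origin t the training pairs are (X[s], y[s + horizon]) with
    s + horizon <= t, so no target later than y[t] is ever used.
    """
    n = len(y)
    preds, targets, chosen = [], [], []
    for t in range(window + horizon - 1, n - horizon):
        s0 = t - horizon - window + 1      # first training row
        s1 = t - horizon + 1               # one past the last training row
        X_tr = X[s0:s1]
        y_tr = y[s0 + horizon:s1 + horizon]
        idx = screen_top_k(X_tr, y_tr, k)  # selection repeated in every window
        A = np.column_stack([np.ones(window), X_tr[:, idx]])
        beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)  # OLS stand-in model
        preds.append(beta[0] + X[t, idx] @ beta[1:])
        targets.append(y[t + horizon])
        chosen.append(idx)
    return np.array(preds), np.array(targets), chosen
```

Because both the screening step and the model fit see only observations dated at or before the forecast origin, the selected predictor set can change from window to window without leaking future information into the forecast.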
