We forecast monthly Canadian real GDP growth using machine learning models trained on Official macroeconomic indicators and Google Trends (GT) data. Predictors are selected dynamically in each rolling window using PDC-SIS, with cross-validation-based tuning to support real-time forecasting and avoid data leakage. The evaluation is conducted on the latest-available (final-vintage) series and should be interpreted as a pseudo out-of-sample forecasting exercise rather than real-time vintage nowcasting. We evaluate GBM, XGBoost, LightGBM, CatBoost, and Random Forest against an ARIMA baseline. Official data deliver the strongest performance at short and medium horizons, while combining Official and GT data yields the clearest improvement at the longest horizon. With GT data alone, LightGBM is the only ML model maintaining positive out-of-sample R² across all horizons. Diebold–Mariano tests corroborate these patterns: LightGBM dominates other ML models under GT-only predictors, whereas with Official and combined data, the horizon-specific best models significantly outperform ARIMA, with smaller differences among leading tree-based methods.
Qureshi et al. (Mon,) studied this question.
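
The two evaluation tools named in the abstract, out-of-sample R² and the Diebold–Mariano test, can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: the out-of-sample R² is measured against some benchmark forecast (the choice of benchmark is left to the caller), the loss is squared error, the long-run variance uses a simple truncated estimator with `horizon - 1` autocovariances, and the p-value uses the standard normal approximation. The function names `oos_r2` and `diebold_mariano` are hypothetical and are not taken from the paper.

```python
import math
import numpy as np


def oos_r2(actual, forecast, benchmark):
    """Out-of-sample R^2 of `forecast` relative to a `benchmark` forecast.

    Positive values mean the model has lower squared error than the
    benchmark; the benchmark choice is an assumption of this sketch.
    """
    actual = np.asarray(actual, dtype=float)
    sse_model = np.sum((actual - np.asarray(forecast, dtype=float)) ** 2)
    sse_bench = np.sum((actual - np.asarray(benchmark, dtype=float)) ** 2)
    return 1.0 - sse_model / sse_bench


def diebold_mariano(actual, forecast_a, forecast_b, horizon=1):
    """Diebold-Mariano statistic under squared-error loss.

    A negative statistic means forecast A has the lower average loss.
    Returns (statistic, two-sided p-value from the normal approximation).
    """
    actual = np.asarray(actual, dtype=float)
    err_a = actual - np.asarray(forecast_a, dtype=float)
    err_b = actual - np.asarray(forecast_b, dtype=float)
    d = err_a ** 2 - err_b ** 2          # loss differential series
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar

    # Truncated long-run variance: lag-0 variance plus twice the
    # autocovariances up to lag horizon - 1 (can be negative in small
    # samples; no correction is applied in this sketch).
    lrv = np.dot(dc, dc) / n
    for lag in range(1, horizon):
        lrv += 2.0 * np.dot(dc[lag:], dc[:-lag]) / n

    stat = d_bar / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

In a rolling-window setting, `actual`, `forecast_a`, and `forecast_b` would be the stacked one-window-ahead targets and forecasts for a given horizon, so that each element corresponds to one pseudo out-of-sample prediction.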