What question did this study set out to answer?

The aim is to improve daily log-return forecasting using a hybrid CNN and gradient-boosted tree model.

March 1, 2026Open Access

Strictly Chronological CNN Embeddings with Gradient-Boosted Trees for Next-Day Log-Return Forecasting

Key Points

The aim is to improve daily log-return forecasting using a hybrid CNN and gradient-boosted tree model.
Developed a CNN-LightGBM hybrid model for log-return forecasting.
Implemented embedding standardization for stable interface in tree learning.
Used split-wise, training-only standardization with a recency-aware rule.
Anchored return-related predictors on a one-sided wavelet-denoised close series.
Achieved statistically significant accuracy gains over Naive0 and competitive performance against deep sequence baselines.
Demonstrated positive risk-adjusted performance using a long-or-cash decision rule.
Supported practical relevance of the predictive signal through quantifiable metrics.

Abstract

Daily equity return forecasting is challenging due to low signal-to-noise ratios, heavy-tailed innovations, and persistent distribution drift. We study one-step-ahead log-return prediction using daily market variables and return-based transformations. We propose a CNN–LightGBM hybrid that transfers a last-step CNN embedding to a gradient-boosted tree regressor through explicit embedding standardization, which stabilizes the representation interface for tree learning. To reduce train-to-evaluation mismatch under drift, we adopt split-wise, training-only standardization with a recency-aware fit-latest-W rule. Return-related predictors are anchored on a one-sided wavelet-denoised close series, while other market channels are retained in their original form to preserve episodic extremes. Experiments on NIFTY50 with walk-forward model selection show statistically reliable accuracy gains over Naive0 and competitive performance against representative deep sequence baselines, and the supplementary evaluations on HDFC and INDA provide additional out-of-sample evidence on these two assets under the same strictly chronological protocol. A long-or-cash decision rule based on the return forecasts yields positive risk-adjusted performance under realistic transaction-cost assumptions, supporting the practical relevance of the predictive signal.

Strictly Chronological CNN Embeddings with Gradient-Boosted Trees for Next-Day Log-Return Forecasting

Key Points

Abstract

Cite This Study